Running Agents
Agents and team agents are both executed using the .run() method. This triggers the agent's reasoning loop, optionally invokes tools, and returns a structured or natural language response. Input can be simple text, structured data, conversation history, variables, or file references.
The response contains the generated output, session metadata, intermediate reasoning steps, execution statistics, and credit usage.
The main differences between agents and team agents are:
- Iteration control: Agents manage max_iterations internally, while team agents manage it externally across multiple agents.
- Response object: Agents return an AgentResponse, while team agents return a ModelResponse.
This guide explains how to run agents and team agents, customize input, and work with the returned data.
Running agents and team agents
Run a single agent:
response = agent.run(
query="Identify and qualify EdTech leads for AI-based personalized learning."
)
Run a team agent:
team_response = team.run(
query="Identify and qualify EdTech leads for AI-based personalized learning."
)
Parameters
- query (str): Required user input text.
- data (dict or str, optional): Structured input data or additional context.
- session_id (str, optional): Reuse a conversation memory.
- history (list, optional): Override conversation memory manually.
- parameters (dict, optional): Dynamic model configuration (works like kwargs in Python).
- output_format (OutputFormat, optional): TEXT or MARKDOWN (default: TEXT).
- timeout (float, optional): Maximum allowed runtime (default: 300 seconds).
- wait_time (float, optional): Polling frequency to check status (default: 0.5 seconds).
- max_tokens (int, optional): Maximum tokens generated by the LLM per iteration.
- max_iterations (int, optional): Number of reasoning or planning loops allowed.
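Because parameters works like kwargs, values you pass are overlaid on the model's defaults. The merge can be sketched in plain Python (merge_parameters is an illustration, not an SDK function, and the actual keys accepted depend on the underlying LLM):

```python
def merge_parameters(defaults, parameters=None):
    """Overlay user-supplied parameters on top of model defaults,
    the way **kwargs overrides a function's default arguments."""
    merged = dict(defaults)
    merged.update(parameters or {})
    return merged

# Hypothetical model defaults; only the keys you pass are overridden.
defaults = {"temperature": 0.7, "top_p": 1.0}
merged = merge_parameters(defaults, {"temperature": 0.2})
# merged == {"temperature": 0.2, "top_p": 1.0}
```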
Managing agent reasoning loops
Single agent:
max_iterations controls how many internal LLM-tool loops are allowed for that agent.
Team agent:
- Each agent inside the team has its own internal max_iterations.
- Every call to an agent counts as one step against the team's max_iterations, even if the agent performs several internal iterations.
- The team agent itself has a global external max_iterations controlling how many agent steps the team can take overall.
- If an agent inside the team fails after reaching its internal max_iterations, the team agent can still continue executing other steps if possible.
Best practices:
- Start with a higher value (e.g., 10–30) during development.
- Monitor how many iterations your agent or team usually needs.
- Tune max_iterations based on actual usage to balance cost and performance.
- For team agents: remember external max_iterations controls team steps, not individual agent retries.
Handling execution timing
- timeout: Maximum time (in seconds) to complete execution. Default is 300 seconds.
- wait_time: Interval (in seconds) between checking for completion. Default is 0.5 seconds.
Tip: Reduce wait_time carefully to avoid overwhelming the server with frequent polls.
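The interplay of timeout and wait_time can be sketched with a simplified stand-in job (FakeJob and wait_for_completion are illustrative, not the SDK's internals): the client polls at wait_time intervals until the job completes or the deadline passes.

```python
import time

class FakeJob:
    """Stand-in for an in-progress run; reports done after n polls."""
    def __init__(self, polls_until_done):
        self.polls_until_done = polls_until_done

    def poll(self):
        self.polls_until_done -= 1
        return self.polls_until_done <= 0  # True once "completed"

def wait_for_completion(job, timeout=300.0, wait_time=0.5):
    """Poll `job` every `wait_time` seconds until done or `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if job.poll():
            return True
        time.sleep(wait_time)  # smaller wait_time = more frequent polls
    return False  # timed out before the job finished

done = wait_for_completion(FakeJob(polls_until_done=3), timeout=5.0, wait_time=0.01)
# done == True: the job completed well within the deadline
```

Halving wait_time roughly doubles the request rate during a run, which is the trade-off the tip above warns about.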
Accessing the response
For a single agent:
response.data["output"] # Final result
response.data["intermediate_steps"] # Execution trace
response.data["execution_stats"] # Performance and session details
For a team agent:
team_response.data["output"] # Final result
team_response.data["intermediate_steps"] # Execution trace
team_response.data["executionStats"] # Performance and session details
Providing input to agents and teams
The run() method accepts a variety of inputs depending on the task. You can combine a plain query with context, memory, or variables using the content, session_id, and history fields.
Simple input
Pass a static string as the query:
response = agent.run(query="What is the capital of France?")
Contextual input
Use content to pass additional information the agent can use to answer:
response = agent.run(
query="Summarize this article.",
content=["First page...", "Second page..."]
)
You can also pass reasoning hints:
response = agent.run(
query="Solve this math problem",
content=["Show your work", "Break it down step by step"]
)
Files and paths
To process audio, images, or other files, pass paths or public URLs in content:
response = agent.run(
query="Translate this audio file to English:",
content=["DiscoveraiXplain.mp3", "https://aixplain.com/studio/"]
)
File type support depends on the tools and models your agent is using. Test with your specific configuration.
OnPrem deployments do not support local file uploads. Only text-based input and public URLs for files are supported, provided the agent has internet access.
Structured variables
To provide named values (e.g., for prompt templates), pass a dictionary in content:
response = agent.run(
query="Generate a personalized greeting for {{name}} at {{time_of_day}}.",
content={"name": "Alice", "time_of_day": "morning"}
)
You can also combine variables with unstructured context:
response = agent.run(
query="What should {{user}} focus on?",
content=[
{"user": "Alice"},
"Previous performance review: Excellent coding skills",
"Areas for growth: Project management"
]
)
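The {{name}}-style placeholders in the query map onto keys of the content dictionary. The substitution semantics can be sketched as follows (render_template is an illustration of the behavior, not the SDK's implementation):

```python
import re

def render_template(query, variables):
    """Replace {{var}} placeholders in `query` with values from `variables`."""
    def substitute(match):
        key = match.group(1).strip()
        # Leave unknown placeholders intact rather than failing.
        return str(variables.get(key, match.group(0)))
    return re.sub(r"\{\{(.*?)\}\}", substitute, query)

prompt = render_template(
    "Generate a personalized greeting for {{name}} at {{time_of_day}}.",
    {"name": "Alice", "time_of_day": "morning"},
)
# prompt == "Generate a personalized greeting for Alice at morning."
```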
Unified input
To pass everything in a single dictionary, use the data argument:
response = agent.run(
data={
"query": "Translate this text.",
"session_id": session,
"content": ["Hola, ¿cómo estás?"]
}
)
Asynchronous Run
For long-running tasks or workflows where blocking isn't ideal, use the asynchronous execution method. The run_async() call returns immediately, and you can use poll() to track progress.
import time
response = agent.run_async("What are AI Agents?")
while True:
result = agent.poll(response.url)
if result.get("completed"):
print(result)
break
else:
time.sleep(5)
Asynchronous runs provide a polling URL in response.url, which you can use with poll() to check for completion or errors.
Multi-turn conversations
Use a session_id to maintain memory between calls. The session lasts 15 days unless reused.
session_id = response.data.session_id
response = agent.run(query="What is the capital of France?", session_id=session_id)
Alternatively, use history to provide prior messages directly:
history = [
{ "role": "user", "content": "My name is Alex." },
{ "role": "assistant", "content": "Hi Alex! How can I help you?" }
]
response = agent.run(query="What’s my name?", history=history)
If both session_id and history are provided, history takes precedence and overrides the session context.
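The precedence rule amounts to a simple selection, sketched here for clarity (resolve_context is illustrative, not an SDK function):

```python
def resolve_context(session_memory, history=None):
    """History, when explicitly provided, overrides stored session memory."""
    return history if history is not None else session_memory

session_memory = [{"role": "user", "content": "My name is Alex."}]
override = [{"role": "user", "content": "My name is Sam."}]

# Without history, the stored session memory is used;
# with history, the session context is ignored entirely.
assert resolve_context(session_memory) == session_memory
assert resolve_context(session_memory, override) == override
```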
Output Format
Agents can return their final output in different formats.
from aixplain.modules.agent import OutputFormat
response = agent.run(
query="Summarize the latest AI breakthroughs.",
output_format=OutputFormat.MARKDOWN
)
print(response.data.output)
Available formats:
- TEXT: Plain language output (default).
- MARKDOWN: Markdown-formatted output.
- JSON: JSON-formatted output. JSON outputs must be defined by a Pydantic BaseModel class or a dictionary representing the JSON schema.
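For the dictionary option, the value is a standard JSON-schema dictionary. A sketch of one such schema (the field names here are illustrative, not required by the SDK):

```python
# JSON-schema dictionary describing an object with a required `name` (string),
# a required `age` (integer), and an optional `city` (string).
person_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
        "city": {"type": "string"},
    },
    "required": ["name", "age"],
}

# This dictionary would be passed as expected_output alongside
# output_format=OutputFormat.JSON, in place of a Pydantic class.
```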
Expected Output
The expected_output parameter allows you to customize the format of the agent's response based on a user-defined structure.
from pydantic import BaseModel
from typing import Optional, List
class Person(BaseModel):
name: str
age: int
city: Optional[str] = None
class Response(BaseModel):
result: List[Person]
Pass these custom output definitions to the agent through the expected_output parameter.
from aixplain.factories import AgentFactory
from aixplain.modules.agent import AgentResponse
from aixplain.modules.agent.output_format import OutputFormat
INSTRUCTIONS = """Answer questions based on the following context:
+-----------------+-------+----------------+
| Name | Age | City |
+=================+=======+================+
| João Silva | 34 | São Paulo |
+-----------------+-------+----------------+
| Maria Santos | 28 | Rio de Janeiro |
+-----------------+-------+----------------+
| Pedro Oliveira | 45 | |
+-----------------+-------+----------------+
| Ana Costa | 19 | Recife |
+-----------------+-------+----------------+
| Carlos Pereira | 52 | Belo Horizonte |
+-----------------+-------+----------------+
| Beatriz Lima | 31 | |
+-----------------+-------+----------------+
| Lucas Ferreira | 25 | Curitiba |
+-----------------+-------+----------------+
| Julia Rodrigues | 41 | Salvador |
+-----------------+-------+----------------+
| Miguel Almeida | 37 | |
+-----------------+-------+----------------+
| Sofia Carvalho | 29 | Brasília |
+-----------------+-------+----------------+"""
agent = AgentFactory.create(
name="Test Agent",
description="Test description",
instructions=INSTRUCTIONS,
llm_id="6646261c6eb563165658bbb1",
)
response = agent.run("Who is older than 30?", output_format=OutputFormat.JSON,
expected_output=Response)
print(response.data.output)
Best practices
- Use clear and consistent variable names (e.g., user_name, not user name)
- Avoid unnecessary nesting in content
- Limit content size for performance-sensitive tasks
- Track session_id to preserve context in chat-like interactions
Choose the right input method
| Scenario | Recommended input method |
|---|---|
| One-off queries | query="..." |
| Multi-turn conversation | Use session_id |
| Supplying documents or files | Use content=[...] |
| Dynamic variables | Use content={...} |
| Mixed context and variables | Use content=[{}, "..."] |
| Full control | Use data={...} |
Optimize content
- Use lists to provide multi-part background or reference material
- Use dictionaries for structured variables (e.g., {"user": "John"})
- Mix both in a single list to combine structured and unstructured context