Running Agents
Agents and team agents are both executed using the .run() method. This triggers the agent's reasoning loop, optionally invokes tools, and returns a structured or natural language response. Input can be simple text, structured data, conversation history, variables, or file references.
The response contains the generated output, session metadata, intermediate reasoning steps, execution statistics, and credit usage.
The main differences between agents and team agents are:
- Iteration control: Agents manage max_iterations internally, while team agents manage it externally across multiple agents.
- Response object: Agents return an AgentResponse, while team agents return a ModelResponse.
This guide explains how to run agents and team agents, customize input, and work with the returned data.
Running agents and team agents
Run a single agent:
response = agent.run(
query="Identify and qualify EdTech leads for AI-based personalized learning."
)
Run a team agent:
team_response = team.run(
query="Identify and qualify EdTech leads for AI-based personalized learning."
)
Parameters
- query (str): Required user input text.
- data (dict or str, optional): Structured input data or additional context.
- session_id (str, optional): Reuse a conversation memory.
- history (list, optional): Override conversation memory manually.
- parameters (dict, optional): Dynamic model configuration (works like kwargs in Python).
- output_format (OutputFormat, optional): TEXT or MARKDOWN (default: TEXT).
- timeout (float, optional): Maximum allowed runtime (default: 300 seconds).
- wait_time (float, optional): Polling frequency to check status (default: 0.5 seconds).
- max_tokens (int, optional): Maximum tokens generated by the LLM per iteration.
- max_iterations (int, optional): Number of reasoning or planning loops allowed.
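Because parameters works like kwargs, values you pass are overlaid on the model's defaults. The merge can be sketched in plain Python (merge_parameters is an illustration, not an SDK function, and the actual keys accepted depend on the underlying LLM):

```python
def merge_parameters(defaults, parameters=None):
    """Overlay user-supplied parameters on top of model defaults,
    the way **kwargs overrides a function's default arguments."""
    merged = dict(defaults)
    merged.update(parameters or {})
    return merged

# Hypothetical model defaults; only the keys you pass are overridden.
defaults = {"temperature": 0.7, "top_p": 1.0}
merged = merge_parameters(defaults, {"temperature": 0.2})
# merged == {"temperature": 0.2, "top_p": 1.0}
```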
Managing agent reasoning loops
Single agent:
max_iterations controls how many internal LLM-tool loops are allowed for that agent.
Team agent:
- Each agent inside the team has its own internal max_iterations.
- Every call to an agent counts as one step against the team's max_iterations, even if the agent performs several internal iterations.
- The team agent itself has a global external max_iterations controlling how many agent steps the team can take overall.
- If an agent inside the team fails after reaching its internal max_iterations, the team agent can still continue executing other steps if possible.
Best practices:
- Start with a higher value (e.g., 10–30) during development.
- Monitor how many iterations your agent or team usually needs.
- Tune max_iterations based on actual usage to balance cost and performance.
- For team agents: remember external max_iterations controls team steps, not individual agent retries.
Handling execution timing
- timeout: Maximum time (in seconds) to complete execution. Default is 300 seconds.
- wait_time: Interval (in seconds) between checking for completion. Default is 0.5 seconds.
Tip: Reduce wait_time carefully to avoid overwhelming the server with frequent polls.
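The interplay of timeout and wait_time can be sketched with a simplified stand-in job (FakeJob and wait_for_completion are illustrative, not the SDK's internals): the client polls at wait_time intervals until the job completes or the deadline passes.

```python
import time

class FakeJob:
    """Stand-in for an in-progress run; reports done after n polls."""
    def __init__(self, polls_until_done):
        self.polls_until_done = polls_until_done

    def poll(self):
        self.polls_until_done -= 1
        return self.polls_until_done <= 0  # True once "completed"

def wait_for_completion(job, timeout=300.0, wait_time=0.5):
    """Poll `job` every `wait_time` seconds until done or `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if job.poll():
            return True
        time.sleep(wait_time)  # smaller wait_time = more frequent polls
    return False  # timed out before the job finished

done = wait_for_completion(FakeJob(polls_until_done=3), timeout=5.0, wait_time=0.01)
# done == True: the job completed well within the deadline
```

Halving wait_time roughly doubles the request rate during a run, which is the trade-off the tip above warns about.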
Accessing the response
For a single agent:
response.data["output"] # Final result
response.data["intermediate_steps"] # Execution trace
response.data["execution_stats"] # Performance and session details
For a team agent:
team_response.data["output"] # Final result
team_response.data["intermediate_steps"] # Execution trace
team_response.data["executionStats"] # Performance and session details
Providing input to agents and teams
The run() method accepts a variety of inputs depending on the task. You can combine a plain query with context, memory, or variables using the content, session_id, and history fields.
Simple input
Pass a static string as the query:
response = agent.run(query="What is the capital of France?")
Contextual input
Use content to pass additional information the agent can use to answer:
response = agent.run(
query="Summarize this article.",
content=["First page...", "Second page..."]
)
You can also pass reasoning hints:
response = agent.run(
query="Solve this math problem",
content=["Show your work", "Break it down step by step"]
)
Files and paths
To process audio, images, or other files, pass paths or public URLs in content:
response = agent.run(
query="Translate this audio file to English:",
content=["DiscoveraiXplain.mp3", "https://aixplain.com/studio/"]
)
File type support depends on the tools and models your agent is using. Test with your specific configuration.
OnPrem deployments do not support local file uploads. Only text-based input and public URLs for files are supported, provided the agent has internet access.
Structured variables
To provide named values (e.g., for prompt templates), pass a dictionary in content:
response = agent.run(
query="Generate a personalized greeting for {{name}} at {{time_of_day}}.",
content={"name": "Alice", "time_of_day": "morning"}
)
You can also combine variables with unstructured context:
response = agent.run(
query="What should {{user}} focus on?",
content=[
{"user": "Alice"},
"Previous performance review: Excellent coding skills",
"Areas for growth: Project management"
]
)
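The {{name}}-style placeholders in the query map onto keys of the content dictionary. The substitution semantics can be sketched as follows (render_template is an illustration of the behavior, not the SDK's implementation):

```python
import re

def render_template(query, variables):
    """Replace {{var}} placeholders in `query` with values from `variables`."""
    def substitute(match):
        key = match.group(1).strip()
        # Leave unknown placeholders intact rather than failing.
        return str(variables.get(key, match.group(0)))
    return re.sub(r"\{\{(.*?)\}\}", substitute, query)

prompt = render_template(
    "Generate a personalized greeting for {{name}} at {{time_of_day}}.",
    {"name": "Alice", "time_of_day": "morning"},
)
# prompt == "Generate a personalized greeting for Alice at morning."
```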
Unified input
To pass everything in a single dictionary, use the data argument:
response = agent.run(
data={
"query": "Translate this text.",
"session_id": session,
"content": ["Hola, ¿cómo estás?"]
}
)
Asynchronous Run
For long-running tasks or workflows where blocking isn't ideal, use the asynchronous execution method. The run_async() call returns immediately, and you can use poll() to track progress.
import time
response = agent.run_async("What are AI Agents?")
while True:
result = agent.poll(response.url)
if result.get("completed"):
print(result)
break
else:
time.sleep(5)
Asynchronous runs provide a polling URL in response.url, which you can use with poll() to check for completion or errors.
Multi-turn conversations
Use a session_id to maintain memory between calls. The session lasts 15 days unless reused.
session_id = response.data.session_id
response = agent.run(query="What is the capital of France?", session_id=session_id)
Alternatively, use history to provide prior messages directly:
history = [
{ "role": "user", "content": "My name is Alex." },
{ "role": "assistant", "content": "Hi Alex! How can I help you?" }
]
response = agent.run(query="What’s my name?", history=history)
If both session_id and history are provided, history takes precedence and overrides the session context.
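The precedence rule amounts to a simple selection, sketched here for clarity (resolve_context is illustrative, not an SDK function):

```python
def resolve_context(session_memory, history=None):
    """History, when explicitly provided, overrides stored session memory."""
    return history if history is not None else session_memory

session_memory = [{"role": "user", "content": "My name is Alex."}]
override = [{"role": "user", "content": "My name is Sam."}]

# Without history, the stored session memory is used;
# with history, the session context is ignored entirely.
assert resolve_context(session_memory) == session_memory
assert resolve_context(session_memory, override) == override
```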
Output Format
Agents can return their final output in different formats.
from aixplain.modules.agent import OutputFormat
response = agent.run(
query="Summarize the latest AI breakthroughs.",
output_format=OutputFormat.MARKDOWN
)
print(response.data.output)
Available formats:
- TEXT: Plain language output (default).
- MARKDOWN: Markdown-formatted output.
- JSON: JSON-formatted output. JSON outputs must be defined by a Pydantic BaseModel class or a dictionary representing the JSON schema.
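For the dictionary option, the value is a standard JSON-schema dictionary. A sketch of one such schema (the field names here are illustrative, not required by the SDK):

```python
# JSON-schema dictionary describing an object with a required `name` (string),
# a required `age` (integer), and an optional `city` (string).
person_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
        "city": {"type": "string"},
    },
    "required": ["name", "age"],
}

# This dictionary would be passed as expected_output alongside
# output_format=OutputFormat.JSON, in place of a Pydantic class.
```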
Expected Output
The expected_output parameter allows you to customize the format of the agent's response based on a user-defined structure.
from pydantic import BaseModel
from typing import Optional, List
class Person(BaseModel):
name: str
age: int
city: Optional[str] = None
class Response(BaseModel):
result: List[Person]
Pass these custom output definitions to the agent through the expected_output parameter.
from aixplain.factories import AgentFactory
from aixplain.modules.agent import AgentResponse
from aixplain.modules.agent.output_format import OutputFormat
INSTRUCTIONS = """Answer questions based on the following context:
+-----------------+-------+----------------+
| Name | Age | City |
+=================+=======+================+
| João Silva | 34 | São Paulo |
+-----------------+-------+----------------+
| Maria Santos | 28 | Rio de Janeiro |
+-----------------+-------+----------------+
| Pedro Oliveira | 45 | |
+-----------------+-------+----------------+
| Ana Costa | 19 | Recife |
+-----------------+-------+----------------+
| Carlos Pereira | 52 | Belo Horizonte |
+-----------------+-------+----------------+
| Beatriz Lima | 31 | |
+-----------------+-------+----------------+
| Lucas Ferreira | 25 | Curitiba |
+-----------------+-------+----------------+
| Julia Rodrigues | 41 | Salvador |
+-----------------+-------+----------------+
| Miguel Almeida | 37 | |
+-----------------+-------+----------------+
| Sofia Carvalho | 29 | Brasília |
+-----------------+-------+----------------+"""
agent = AgentFactory.create(
name="Test Agent",
description="Test description",
instructions=INSTRUCTIONS,
llm_id="6646261c6eb563165658bbb1",
)
response = agent.run("Who is older than 30?", output_format=OutputFormat.JSON,
expected_output=Response)
print(response.data.output)
Best practices
- Use clear and consistent variable names (e.g., user_name, not user name)
- Avoid unnecessary nesting in content
- Limit content size for performance-sensitive tasks
- Track session_id to preserve context in chat-like interactions
Choose the right input method
| Scenario | Recommended input method |
|---|---|
| One-off queries | query="..." |
| Multi-turn conversation | Use session_id |
| Supplying documents or files | Use content=[...] |
| Dynamic variables | Use content={...} |
| Mixed context and variables | Use content=[{}, "..."] |
| Full control | Use data={...} |
Optimize content
- Use lists to provide multi-part background or reference material
- Use dictionaries for structured variables (e.g., {"user": "John"})
- Mix both in a single list to combine structured and unstructured context