Tracing
Tracing, monitoring, and debugging help you understand how an agent or team agent handled a request, which tools were used, what reasoning was applied, and where errors may have occurred.
Tracing agent executions
When you run an agent, the system returns an AgentResponse containing:
- input: The original input, chat history, and parameters.
- output: Final generated result.
- session_id: Used to track conversation sessions.
- intermediate_steps: A full trace of the agent's tool usage, responses, and reasoning.
- execution_stats: Timing, API calls made, credits consumed, and runtime breakdown by tools.
AgentResponse(
    status="SUCCESS",
    completed=True,
    data=AgentResponseData(
        input={...},
        output="Summarized news content...",
        session_id="abc123",
        intermediate_steps=[...],
        execution_stats={...}
    ),
    used_credits=0.0003,
    run_time=4.1
)
Example usage:
query = "Identify and qualify EdTech leads for AI-based personalized learning."
response = lead_finder.run(query)
from pprint import pprint
pprint(response)
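Individual fields can then be read from the response; a minimal sketch, assuming the dict-style access on response.data used elsewhere on this page:
# Final answer and the session ID to reuse for follow-up turns
print(response.data["output"])
print(response.data["session_id"])

# Top-level cost and timing for the run
print(response.used_credits, response.run_time)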
Each entry in intermediate_steps includes:
- agent: Name of the agent (e.g., Lead Finder, Summarizer, Translator).
- input: Input provided to the agent.
- output: Output generated by the agent.
- tool_steps: Tools used (if a tool was called).
- runTime: Runtime for that agent's step.
- usedCredits: Credits consumed for that step.
- apiCalls: API calls made (if any).
- thought: Records the agent's internal reasoning about its action.
- task: Used only inside team agents to define the assigned work for user-defined agents.
This makes it easy to review what happened inside the agent and debug if needed.
If thought is None or empty, it simply means the agent did not generate any internal reasoning; it moved straight from input to output.
task is not used when the user sends a message directly to an agent; it only applies inside team agent runs.
Example usage:
response.data["intermediate_steps"]
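To turn the raw list into a quick, readable trace, you can loop over the steps and print the fields listed above (a sketch, assuming each step is a dict keyed by those field names):
for step in response.data["intermediate_steps"]:
    # One line per agent step: name, runtime, credits
    print(f'{step["agent"]}: {step.get("runTime")} s, {step.get("usedCredits")} credits')

    # Internal reasoning, if the agent produced any
    if step.get("thought"):
        print("  thought:", step["thought"])

    # Tool calls made during this step, if any
    for tool_step in step.get("tool_steps") or []:
        print("  tool:", tool_step)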
Tracing team agent executions
Team agents extend this behavior by orchestrating multiple user-defined agents and micro agents (mentalists, inspectors, orchestrators, feedback combiners).
When you run a team agent, the system returns a ModelResponse containing:
- input: The original input and parameters.
- output: Final structured answer.
- session_id: Session ID for multi-step traces.
- intermediate_steps:
  - Micro agent actions (e.g., mentalist planning, inspector reviews, feedback combinations)
  - Each step run by different agents (Lead Finder, Lead Analyzer, etc.)
  - Tool usage and reasoning
- execution_stats: Overall run time, credits consumed, API calls made, and breakdowns per agent and micro agent.
- plan: Describes the intended execution flow: a sequence of tasks, assigned agents (workers), and their expected outcomes.
ModelResponse(
    status="SUCCESS",
    data={
        "input": {...},
        "output": "final output...",
        "session_id": "abc-123",
        "intermediate_steps": [...],
        "executionStats": {...},
        "plan": [
            {"step": "Task: find leads", "worker": "Lead Finder"},
            {"step": "Task: analyze leads", "worker": "Lead Analyzer"}
        ]
    },
    completed=True
)
Example usage:
query = "Identify and qualify EdTech leads for AI-based personalized learning."
team_response = team.run(query)
from pprint import pprint
pprint(team_response)
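Before drilling into individual steps, the plan shows how the mentalist split the request into tasks and which worker each task was assigned to; a sketch, assuming the same dict-style access used elsewhere on this page:
# Each plan entry pairs a task description with the agent assigned to it
for task in team_response.data["plan"]:
    print(f'{task["worker"]} -> {task["step"]}')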
intermediate_steps
The intermediate_steps for a team agent contain a detailed log of every micro agent and user-defined agent action. Each step includes:
- agent: Agent name (e.g., orchestrator, mentalist, Lead Finder, inspector, response generator).
- input: Input provided to that agent.
- output: Output generated by that agent.
- tool_steps: Tools used (if the agent called a tool).
- runTime: Runtime for that agent's step.
- usedCredits: Credits consumed for that step.
- apiCalls: API calls made by the agent (if any).
- thought: Internal reasoning for that step:
  - Mentalist: See how the tasks were planned.
  - Inspector: Review validation feedback or issues detected during or after agent execution.
  - Feedback combiner: Summarizes multiple inspector comments into one.
- task: Assignment information (when available). It helps orchestrate sub-tasks between user-defined agents and ensures each agent knows its role.
Example usage:
team_response.data["intermediate_steps"]
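Because a team agent trace interleaves micro agents with user-defined agents, filtering by agent name makes it easier to review a specific role, e.g. the mentalist's planning and the inspector's feedback (a sketch; the exact name strings are assumed to match the labels shown above):
# Micro agent labels are assumed to match the names listed above
micro_agents = {"mentalist", "inspector", "feedback combiner", "orchestrator"}

for step in team_response.data["intermediate_steps"]:
    if step["agent"] in micro_agents and step.get("thought"):
        print(f'[{step["agent"]}] {step["thought"]}')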
Understanding team agent behavior
Mentalist
- The mentalist started with no input and created a plan to fulfill the user's request: Identify and qualify EdTech leads for AI-based personalized learning.
- It broke the goal into two tasks:
- Find leads: Generate a list of EdTech companies with contact information.
- Analyze leads: Prioritize those companies based on their fit with the AI platform.
- It assigned:
- "Lead Finder" agent to handle task 1
- "Lead Analyzer" agent to handle task 2
Orchestrator
- The orchestrator took the user's request and the mentalist’s plan, and assigned the first task to the agent: Lead Finder.
- It injected the user request and instructions into the Lead Finder's input.
- After the Lead Finder completed the task, the orchestrator collected the results and passed them to the inspector.
- When the inspector reported that the results were incomplete (missing contact information, not focused enough on AI personalization), the orchestrator created a new assignment to have the Lead Finder redo the task, this time with stricter instructions based on the feedback.
- After two incomplete attempts by the Lead Finder, the orchestrator reassigned the work to the Lead Analyzer. When the Lead Analyzer could not provide a fully qualified output and the inspector confirmed no further progress could be made, the orchestrator issued a FINISH signal to end the task.
Inspector
- The inspector received the Lead Finder agent results after the first run.
- It analyzed the output, not just for existence but against the user's goals:
- Were the leads qualified?
- Was complete contact information provided?
- Was the AI personalization focus clear?
- The inspector found that:
- While some leads were relevant, the contact info was often missing.
- Some companies were not clearly focused on AI personalization.
- It gave structured feedback: Re-run the step to include comprehensive contact information and ensure each company focuses on AI-based personalized learning.
Monitoring execution
For both agents and team agents, execution stats help you monitor:
- Session ID: Allows resuming or analyzing multi-turn interactions.
- API call breakdown: Number of tool/API calls per agent.
- Credit consumption: Per tool, per agent, and total.
- Run time breakdown: How much time each step or agent consumed.
- Tool failures
- Unexpected behaviors
Monitor agents
response.data["execution_stats"]
Monitor team agents
team_response.data["executionStats"]
Key fields:
- status: Overall result of the execution (e.g., SUCCESS).
- apiCalls: Total number of API calls made during execution.
- credits: Total compute credits consumed by the execution.
- runtime: Total runtime (in seconds) for the execution.
- apiCallBreakdown: Number of API calls made by each asset.
- runtimeBreakdown: Time (in seconds) spent by each asset.
- creditBreakdown: Compute credits used by each asset.
- sessionId: Unique identifier for the execution session.
- environment: Execution environment (e.g., prod, dev).
- assetsUsed: List of agents, tools, or models utilized.
- timeStamp: Timestamp when execution finished (UTC).
- params: Additional parameters such as id and sessionId.
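The breakdown fields make it straightforward to spot expensive or slow assets; a sketch, assuming the breakdowns are dicts keyed by asset name:
stats = team_response.data["executionStats"]
print(stats["status"], "-", stats["runtime"], "s,", stats["credits"], "credits")

# Rank assets by credit consumption, with runtime alongside
for asset, credits in sorted(stats["creditBreakdown"].items(),
                             key=lambda item: item[1], reverse=True):
    runtime = stats["runtimeBreakdown"].get(asset, 0)
    print(f"{asset}: {credits} credits, {runtime} s")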
Debugging failures
- Internal failures: If an agent or micro agent fails (tool timeout, wrong output format), the step will show an error or failure status inside intermediate_steps.
- External failures: If the final output cannot be produced, the run status will be "FAILED" and the error will appear in the execution_stats.
Important:
- A single agent failure inside a team agent does not necessarily fail the team.
- However, if a critical step (e.g., response generation) fails, the entire team agent run can fail.
- Inspecting the intermediate_steps and the thoughts from inspectors will show exactly where and why a failure happened.
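A hedged sketch of a failure check: verify the run status first, then scan the trace for failed steps and inspector feedback (the exact status and error field names are assumptions and may differ):
# Overall run status and any top-level error (the "error" key is an assumption)
if team_response.status != "SUCCESS":
    print("Run failed:", team_response.data["executionStats"].get("error"))

# Per-step failures and inspector comments inside the trace
for step in team_response.data["intermediate_steps"]:
    status = str(step.get("status", "")).upper()   # step-level status field name is an assumption
    if "FAIL" in status or "ERROR" in status:
        print("Failed step:", step["agent"], "-", step.get("output"))
    if step["agent"] == "inspector" and step.get("thought"):
        print("Inspector feedback:", step["thought"])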
Best practices for tracing, monitoring, and debugging
- Always check intermediate_steps to understand agent decision-making.
- Review executionStats to catch expensive tool calls or slow steps.
- Use inspectors for early error catching.
- For complex team agents, trace the plan and each micro-agent individually.
- Investigate any failures in steps or missing outputs early.
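These practices can be combined into a small post-run review helper (a sketch built on the assumed field names used throughout this page):
def review_team_run(resp):
    """Print the plan, per-step cost, inspector feedback, and totals for a team agent run."""
    data = resp.data
    for task in data.get("plan", []):
        print("PLAN:", task["worker"], "->", task["step"])
    for step in data["intermediate_steps"]:
        print(f'{step["agent"]}: {step.get("runTime")} s, {step.get("usedCredits")} credits')
        if step["agent"] == "inspector" and step.get("thought"):
            print("  inspector feedback:", step["thought"])
    stats = data["executionStats"]
    print("TOTAL:", stats["runtime"], "s,", stats["credits"], "credits")

review_team_run(team_response)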