Version: v2.0

Agents

Agents are instruction-driven, model-powered reasoning components that follow a plan → act → observe → repeat loop. They use an LLM to decide the next action, call tools when needed, and return structured output in TEXT, MARKDOWN, or JSON format.

Setup

pip install aixplain

from aixplain import Aixplain

aix = Aixplain(api_key="YOUR_API_KEY")

Quick start

Create and run a minimal agent to validate your setup:

agent = aix.Agent(
    name="Hello Agent",
    description="Answers general questions clearly and concisely.",
    instructions="You are a helpful assistant.",
)
agent.save()

response = agent.run(query="What is machine learning?")
print(response.data.output)
print(agent.path)


agent.save() transitions the agent from DRAFT to ONBOARDED state and makes it callable.

How it works

Each run executes a reasoning loop:

1. INIT          → Load config, validate input, process variables
2. REASONING LOOP ⟲
   ├─> LLM plans next action
   ├─> Execute tools (if needed)
   ├─> Evaluate results
   └─> Repeat until complete or the iteration limit is reached
3. RETURN        → AgentResponse with output + metadata

The LLM autonomously decides which tools to call and when. Runs continue until the task is complete or the iteration limit (budget.max_iterations) is hit.
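The control flow above can be sketched in plain Python. This is an illustrative toy, not the aixplain runtime: `plan_next_action`, `TOOLS`, and the action dictionaries are hypothetical stand-ins for the LLM planner and real tools. Only the shape of the loop (plan, act, observe, stop on completion or at the iteration limit) follows the description.

```python
from typing import Callable, Dict, List

# Hypothetical tool registry: name -> callable. Real tools are platform assets.
TOOLS: Dict[str, Callable[[str], str]] = {
    "echo_upper": lambda text: text.upper(),
}


def plan_next_action(query: str, observations: List[str]) -> dict:
    """Stand-in for the LLM planner: call one tool, then finish."""
    if not observations:
        return {"type": "tool", "name": "echo_upper", "input": query}
    return {"type": "finish", "output": observations[-1]}


def run_loop(query: str, max_iterations: int = 5) -> dict:
    """Plan -> act -> observe until done or the iteration limit is reached."""
    observations: List[str] = []
    for iteration in range(1, max_iterations + 1):
        action = plan_next_action(query, observations)    # plan
        if action["type"] == "finish":
            return {"output": action["output"], "iterations": iteration, "completed": True}
        result = TOOLS[action["name"]](action["input"])   # act
        observations.append(result)                       # observe
    # Iteration limit reached before the planner decided to finish
    return {"output": None, "iterations": max_iterations, "completed": False}


print(run_loop("hello agent"))
# {'output': 'HELLO AGENT', 'iterations': 2, 'completed': True}
```

With `max_iterations=1` the same call returns `completed: False`, which is the toy equivalent of hitting the iteration limit.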

Agent states

State	Description
`DRAFT`	Created but not persisted. Call `agent.save()` to promote.
`ONBOARDED`	Persisted and production-ready.

Tools

Tools extend an agent beyond text generation. The agent decides autonomously when to invoke them.

# Marketplace tool
web_search_tool = aix.Tool.get("tavily/tavily-search-api")

# Model used as a tool
translation_model = aix.Model.get("google/translate-multi-lingual")

Pass tools at agent creation:

note

# replaces: LangChain AgentExecutor + ReAct prompt engineering
# one Agent() call, runtime loop handled by AgenticOS

INSTRUCTIONS = """
You are a technical documentation assistant.
Think step-by-step when solving problems. Explain non-obvious choices.
Use tools only when internal knowledge is insufficient.
Prefer official sources when citing.
"""

agent = aix.Agent(
    name="Research Agent",
    description="Researches topics using web search.",
    instructions=INSTRUCTIONS,
    tools=[web_search_tool],
)
agent.save()

response = agent.run(query="What are the latest developments in AI safety?")
print(response.data.output)


You can test a tool in isolation before attaching it:

print(web_search_tool.list_actions())
response = web_search_tool.run(data="What is aixplain?")
print(response.data)


tip

If the agent ignores a tool, check response.data.steps for what it attempted, then tighten the tool's name and description. If total parameters across all tools exceed 100, optional ones may be silently dropped — mark only the ones you need as required.

Overriding tool parameters

You can pin a tool's input parameters so the agent uses fixed values instead of letting the LLM choose them. Configured parameters are hidden from the LLM-facing schema and injected as arguments on the tool call.

Inspect a tool's declared inputs to find a parameter name to override:

tool = aix.Tool.get("aixplain/aixplain-web-search/aixplain")

for action in tool.list_inputs(" "):
    inputs = [(i.code, i.name) for i in (action.inputs or [])]
    print(f"action {action.name!r}: inputs={inputs}")


Create-time override (persisted). Set the parameter on the tool before creating the agent. The value is saved on the agent and reused on every run:

tool.actions.search.inputs.num_results = 2   # pin to 2 results

agent = aix.Agent(
    name="Research Agent",
    description="Researches topics using web search.",
    tools=[tool],
)
agent.save()

# Re-fetch to confirm the parameter persisted on the agent's tool
fetched = aix.Agent.get(agent.id)
print(fetched.tools[0].parameters)


Run-time override (wins per run). Set the parameter on the agent's tool before a run() call. The run-time value is merged by tool id over the persisted default, so it wins for that run only:

agent.tools[0].actions.search.inputs.num_results = 3   # this run uses 3 results

response = agent.run("What are the latest developments in fusion energy in 2026?")
print(response.data.output)


The persisted value (2) stays on the agent; the run-time value (3) applies only to that call. Check response.data.steps to confirm the tool ran with the run-time value.
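One way to picture the "merged by tool id" behaviour is a dictionary overlay. The sketch below is not the SDK's implementation: the tool id `web-search`, the function name, and the assumption that values are layered per parameter (rather than replacing the whole tool entry) are all illustrative.

```python
from typing import Any, Dict

ToolParams = Dict[str, Dict[str, Any]]  # tool id -> {parameter name: value}


def merge_tool_parameters(persisted: ToolParams, runtime: ToolParams) -> ToolParams:
    """Return per-tool parameters with run-time values layered over persisted ones."""
    # Copy the persisted defaults so the saved values are never mutated
    merged = {tool_id: dict(params) for tool_id, params in persisted.items()}
    for tool_id, params in runtime.items():
        merged.setdefault(tool_id, {}).update(params)
    return merged


persisted = {"web-search": {"num_results": 2}}   # saved with the agent
runtime = {"web-search": {"num_results": 3}}     # set just before run()

print(merge_tool_parameters(persisted, runtime))  # {'web-search': {'num_results': 3}}
print(persisted)                                  # {'web-search': {'num_results': 2}}
```

The second print shows why the next run falls back to 2: the overlay produces a new mapping and leaves the persisted default alone.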

Learn more about tools →

Skills

Skills package reusable expertise (instructions, helper scripts, and reference files) into a single asset an agent can load on demand. A skill is authored as a Claude-style folder (SKILL.md plus optional scripts/ and resources/ subfolders); the whole tree is uploaded and managed as one asset.

SKILL.md frontmatter defines the skill's metadata. Only description is exposed to the agent in base context as the routing signal; the rest loads when the skill is used.

---
name: pdf-filler
description: Fills PDF forms from structured data.
requires:
  - tavily/tavily-search
---
# PDF Filler
Use scripts/fill.py to populate the template.

Load a folder into a Skill object. The constructor parses the frontmatter into name, description, and required_tools, and the markdown body into instructions.

skill = aix.Skill(
    file_path="pdf-filler/",              # folder with SKILL.md at its root
    tags=["forms"],
    privacy=aix.Privacy.PRIVATE,
)
print("name           ->", skill.name)
print("description    ->", skill.description)
print("required_tools ->", skill.required_tools)

skill.save()                              # uploads the bundle and registers the asset
print("created        ->", skill.id, "| status:", skill.status)


save() creates the asset and uploads the folder's files and subfolders internally. There is no node-level API for individual files.

Parameter	Type	Required	Default	Description
`file_path`	`str`	Yes	(none)	Path to the skill folder containing `SKILL.md` at its root.
`tags`	`list[str]`	No	`[]`	Labels for organising and filtering skills.
`privacy`	`Privacy`	No	`PRIVATE`	`aix.Privacy.PRIVATE` or `aix.Privacy.PUBLIC`.
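To make the frontmatter-to-field mapping concrete, here is a minimal standard-library sketch that splits a SKILL.md into metadata and body. It is an assumption-laden illustration, not the `Skill` constructor: `parse_skill_md` is a hypothetical helper, it handles only `key: value` scalars and `- item` lists, and the idea that `requires` feeds `required_tools` is inferred from the example above.

```python
from typing import Dict, List, Tuple, Union

FrontValue = Union[str, List[str]]


def parse_skill_md(text: str) -> Tuple[Dict[str, FrontValue], str]:
    """Split SKILL.md text into (frontmatter dict, markdown body)."""
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text.strip()              # no frontmatter block at all
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        return {}, text.strip()              # opening delimiter never closed
    meta: Dict[str, FrontValue] = {}
    current_key = None
    for line in lines[1:end]:
        stripped = line.strip()
        if stripped.startswith("- ") and isinstance(meta.get(current_key), list):
            meta[current_key].append(stripped[2:].strip())   # list item under last key
        elif ":" in stripped:
            key, _, value = stripped.partition(":")
            current_key = key.strip()
            meta[current_key] = value.strip() if value.strip() else []
    body = "\n".join(lines[end + 1:]).strip()
    return meta, body


SKILL_MD = """---
name: pdf-filler
description: Fills PDF forms from structured data.
requires:
  - tavily/tavily-search
---
# PDF Filler
Use scripts/fill.py to populate the template.
"""

meta, instructions = parse_skill_md(SKILL_MD)
print(meta["name"])                    # pdf-filler
print(meta["requires"])                # ['tavily/tavily-search']
print(instructions.splitlines()[0])    # # PDF Filler
```

A production parser would use a YAML library; the hand-rolled version only keeps the example dependency-free.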

Fetch and search existing skills

Retrieve a skill by path or id with aix.Skill.get, or discover skills with the standard paginated aix.Skill.search.

skill = aix.Skill.get("6a22a671887747f23b6c5613")
print(skill.name, "|", skill.privacy, "|", skill.status)

page = aix.Skill.search("pdf", page_size=5)
print("total:", page.total, "| page:", page.page_number, "/", page.page_total)
for s in page.results:
    print("  -", s.id, s.name)


Attach a skill to an agent

Pass skills at agent creation via skills=[...]. Each entry is a Skill object or an id string. The agent loads the skill when its description matches the task.

agent = aix.Agent(
    name="Forms Assistant",
    description="Completes PDF forms from user-provided data.",
    instructions="You have an attached skill. Use it when asked to fill a form.",
    skills=[skill],                       # Skill object or id string
)
agent.save()

response = agent.run(query="What skill do you have attached? Be brief.")
print(response.data.output)


Manage the skill bundle

download retrieves the uploaded bundle as a zip; clone creates an independent copy; delete removes the asset.

path = skill.download(file_path="pdf-filler.zip")
print("downloaded ->", path)

clone = skill.clone(name="pdf-filler-copy").save()
print("clone ->", clone.id, clone.name)

print(clone.delete().status)
print(skill.delete().status)


LLM configuration

The default model is GPT-5.4. Override it at agent creation:

# Option A: Model ID
SONNET_MODEL_ID = "67be216bd8f6a65d6f74d5e9"
agent = aix.Agent(
    name="Sonnet Agent",
    description="...",
    llm=aix.Model.get(SONNET_MODEL_ID),
)

# Option B: Fine-grained parameters via the inputs proxy
llm = aix.Model.get("openai/gpt-5.4")
llm.inputs.temperature = 1
llm.inputs.max_tokens = 100_000

agent = aix.Agent(
    name="Custom LLM Agent",
    description="...",
    llm=llm,
)

# Option C: Reasoning effort (GPT-5.4 and other reasoning models)
llm = aix.Model.get("openai/gpt-5.4")
llm.inputs.reasoning_effort = "high"  # "low" | "medium" | "high"

agent = aix.Agent(
    name="Deep Reasoning Agent",
    description="Handles complex, multi-step analysis.",
    llm=llm,
)

Choose an LLM based on: context window size, reasoning depth, latency requirements, cost per 1M tokens, tool-calling reliability, and multilingual quality.

Output format

Available formats: text (default) | markdown | json.

When using json, pass an expected_output schema and set run_response_generation=True at run time. Any non-text output format requires response generation; without it the backend rejects the run (AX-VAL-1000). expected_output accepts three forms:

from pydantic import BaseModel
from typing import List, Dict
from aixplain.v2.agent import OutputFormat

# Option 1: Text description of the shape
expected_output = """{"name": "string", "calories": "string"}"""

# Option 2: Dict
expected_output = {"name": "string", "calories": "string"}

# Option 3: Pydantic model (recommended — adds type validation)
class RecipeOutput(BaseModel):
    name: str
    description: str
    ingredients: List[str]
    instructions: str
    nutrition: Dict[str, str]

expected_output = RecipeOutput

recipe_agent = aix.Agent(
    name="Recipe Structurer",
    description="Culinary assistant that returns structured recipes.",
    instructions="Extract and organise recipe data into the required JSON shape. Use web search to fill gaps.",
    tools=[web_search_tool],
    output_format=OutputFormat.JSON,
    expected_output=expected_output,
)
recipe_agent.save()

response = recipe_agent.run("Chocolate cake recipe", run_response_generation=True)
print(response.data.output)
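Downstream code usually wants the JSON result as a checked dictionary. The helper below is a hedged, standard-library sketch: `parse_structured_output` is a hypothetical name, the sample string is made up, and whether `response.data.output` arrives as a JSON string or an already-parsed dict is an assumption, so the helper accepts both.

```python
import json
from typing import Any, Dict, Iterable

# Keys taken from the RecipeOutput model above
REQUIRED_KEYS = ("name", "description", "ingredients", "instructions", "nutrition")


def parse_structured_output(raw: Any, required_keys: Iterable[str] = REQUIRED_KEYS) -> Dict[str, Any]:
    """Return the output as a dict, raising ValueError if it is not usable."""
    if isinstance(raw, str):
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as exc:
            raise ValueError(f"output is not valid JSON: {exc}") from exc
    else:
        data = raw
    if not isinstance(data, dict):
        raise ValueError(f"expected a JSON object, got {type(data).__name__}")
    missing = [key for key in required_keys if key not in data]
    if missing:
        raise ValueError(f"output is missing keys: {missing}")
    return data


# Made-up sample standing in for response.data.output
sample = (
    '{"name": "Chocolate cake", "description": "Rich and moist.", '
    '"ingredients": ["flour", "cocoa"], "instructions": "Mix and bake.", '
    '"nutrition": {"calories": "350"}}'
)
recipe = parse_structured_output(sample)
print(recipe["name"], len(recipe["ingredients"]))   # Chocolate cake 2
```

If Pydantic is already a dependency, validating with the `RecipeOutput` model gives stricter type checks than this key-presence test.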


Multimodal attachments

Attach images and audio to a run with the attachments parameter. Use a vision- and audio-capable model, and pass via_session=True so attachments are stored on the session message and read back by the worker.

Each entry in attachments can take two forms:

Form	Example	Notes
URL string	`"https://.../cat.jpg"`	`type` is auto-detected from the extension.
Local path string	`"/path/to/cat.jpg"`	Uploaded to aixplain storage automatically.
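The two forms can be told apart, and the media kind guessed from the extension, with the standard library alone. This sketch only illustrates the idea; how the SDK actually distinguishes URLs from paths or detects `type` is not documented here, and `describe_attachment` is a hypothetical helper.

```python
import mimetypes
from urllib.parse import urlparse


def describe_attachment(entry: str) -> dict:
    """Classify an attachment string as URL or local path and guess its media kind."""
    parsed = urlparse(entry)
    is_url = parsed.scheme in ("http", "https")
    # For URLs, guess from the path component so a query string cannot hide the extension
    mime, _ = mimetypes.guess_type(parsed.path if is_url else entry)
    kind = mime.split("/")[0] if mime else "unknown"   # e.g. "image" or "audio"
    return {"source": "url" if is_url else "local", "kind": kind}


print(describe_attachment("https://example.com/media/cat.jpg"))
# {'source': 'url', 'kind': 'image'}
print(describe_attachment("/path/to/sample.wav"))
# {'source': 'local', 'kind': 'audio'}
```

A check like this can be run before `agent.run()` to catch entries with an unrecognised extension early.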

Create an agent backed by a multimodal model:

# Gemini 3.5 Flash handles both vision and audio
agent = aix.Agent(
    name="Multimodal Assistant",
    instructions="You are a concise multimodal assistant. Answer from the media provided.",
    llm=aix.Model.get("6a2846019a0a2598b61e12b8"),
)
agent.save()

IMAGE_URL = "https://upload.wikimedia.org/wikipedia/commons/3/3a/Cat03.jpg"
AUDIO_URL = "https://aixplain-platform-assets.s3.us-east-1.amazonaws.com/temp/sample.wav"

Images

Pass an image as a URL string or a local path. Both attach the same way:

session = agent.create_session(name="image-demo")

response = agent.run(
    session_id=session.id,
    via_session=True,
    attachments=[IMAGE_URL],   # or a local path like "/path/to/cat.jpg"
    query="Describe what is in this image in one sentence.",
)
print(response.data.output)


A local path is uploaded to aixplain storage before the run; a URL is forwarded to the model as-is.

Audio

Audio attachments are downloaded and inlined into the request. Attach one and ask the model to work with it:

session = agent.create_session(name="audio-demo")

response = agent.run(
    session_id=session.id,
    via_session=True,
    attachments=[AUDIO_URL],   # or a local path like "/path/to/sample.wav"
    query="Transcribe the speech in this audio.",
)
print(response.data.output)


Audio as the prompt

When the audio itself is the user turn, pass a blank query (a single space) so the model responds to the recording directly:

session = agent.create_session(name="audio-as-prompt")

response = agent.run(
    session_id=session.id,
    via_session=True,
    attachments=[AUDIO_URL],
    query=" ",   # blank query: the audio becomes the user turn
)
print(response.data.output)


Multiple attachments

Pass several media in one call. The model reasons over all of them together:

session = agent.create_session(name="multi-media")

response = agent.run(
    session_id=session.id,
    via_session=True,
    attachments=[IMAGE_URL, AUDIO_URL],
    query="Describe what I attached here.",
)
print(response.data.output)


Runtime parameters

agent.run() accepts the following parameters:

Parameter	Type	Default	Description
`query`	`str | dict`	(none)	Main task or question. A `dict` is merged with `variables` and sent as structured input.
`variables`	`dict`	`None`	Values substituted into `{{placeholders}}` in `instructions` / `description`.
`session_id`	`str`	`None`	Resume a stateful session (14-day retention).
`attachments`	`list`	`None`	Images/audio to attach to the run. Each entry is a URL string or a local path string. See Multimodal attachments.
`via_session`	`bool`	`False`	Store attachments on the session message so the worker reads them back. Set `True` when passing `attachments`.
`history`	`list`	`None`	Inject prior turns without a session.
`run_response_generation`	`bool`	`False`	Generate a final synthesised response after tool steps. Set `True` when you need JSON output.
`progress_format`	`str | None`	`None`	`"status"` (single line) or `"logs"` (timeline). `None` disables output.
`progress_verbosity`	`int`	`1`	Detail level: `1` minimal, `2` includes thoughts, `3` full I/O.
`progress_truncate`	`bool`	`True`	Truncate long text in progress output.
`timeout`	`int`	`300`	Seconds to poll before the SDK stops waiting. The agent may continue server-side.
`wait_time`	`float`	`0.5`	Seconds between polling checks.

note

max_tokens is an agent setting, not a run parameter: set it on the agent and save() (see below). Passing it as a keyword argument to run() has no effect. Iteration limits are set on the agent's budget (budget.max_iterations).

Understanding `max_tokens`

max_tokens caps output tokens only (not input/context). There are two independent levers, both set before the run rather than per call:

# 1. The agent's own output cap (default 2048) — set on the agent, then save
agent.max_tokens = 4000
agent.save()

# 2. The LLM's persistent output cap — set on the model
llm = aix.Model.get("openai/gpt-5.4")
llm.inputs.max_tokens = 100_000
agent.llm = llm
agent.save()

Keep caps conservative. Raise the agent's max_tokens first; if truncation persists, raise the LLM's max_tokens.

Variable substitution

Use {{variable}} placeholders in instructions or description, then supply values at runtime via variables:

agent = aix.Agent(
    name="Multilingual Researcher",
    description="Research assistant for {{topic}}.",
    instructions="""
You are a research assistant specialising in {{topic}}.
Always respond in {{language}}.
Focus on peer-reviewed sources when available.
""",
    tools=[web_search_tool],
)
agent.save()

response = agent.run(
    query="What are the key challenges?",
    variables={"topic": "quantum computing", "language": "Spanish"},
)
print(response.data.output)


note

variables substitution applies to instructions and description only — not to query.
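Placeholder substitution of this kind can be previewed locally with a small regular expression. The sketch is an approximation, not the platform's renderer: `render_template` is a hypothetical helper, and leaving unknown placeholders untouched is an assumption (the real behaviour for missing variables is not stated above).

```python
import re
from typing import Dict

# Matches {{name}} with optional spaces inside the braces
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_template(template: str, variables: Dict[str, str]) -> str:
    """Replace each {{name}} with variables[name]; leave unknown names untouched."""
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        return str(variables.get(name, match.group(0)))
    return PLACEHOLDER.sub(substitute, template)


instructions = "You are a research assistant specialising in {{topic}}. Always respond in {{language}}."
print(render_template(instructions, {"topic": "quantum computing", "language": "Spanish"}))
# You are a research assistant specialising in quantum computing. Always respond in Spanish.
```

Rendering the instructions this way before a run is a cheap check that every placeholder has a matching key in `variables`.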

Progress streaming

# Disabled (default)
response = agent.run(query="What is machine learning?")

# Compact single-line status
response = agent.run(
    query="What is machine learning?",
    progress_format="status",
    progress_verbosity=1,
)

# Full timestamped log with agent reasoning
response = agent.run(
    query="What is machine learning?",
    progress_format="logs",
    progress_verbosity=2,
    progress_truncate=True,
)


Level	What's shown
`1`	Step names and tool invocations
`2`	Steps + agent reasoning / thoughts
`3`	Full inputs and outputs at every step

Session management

Pass session_id to persist multi-turn context (stored 14 days, not used for training). Pass history to inject prior turns from an external source.

# Start a session
session_id = agent.generate_session_id()

response = agent.run(query="What is machine learning?", session_id=session_id)
print(response.data.output)

# Follow-up — agent retains context from the first turn
followup = agent.run(query="Give me a practical example.", session_id=session_id)
print(followup.data.output)


# Inject history manually (no server-side memory)
history = [
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
]
response = agent.run("Tell me a fun fact about it.", history=history)
print(response.data.output)

# Seed a new session with existing history
session_id = agent.generate_session_id(history=history)
response = agent.run("Tell me more about that.", session_id=session_id)
print(response.data.output)


Budget and governance

agent.budget is the single source of truth for run governance: it caps cost, duration, and iterations for a run. Set the fields directly on the budget, then save() to persist them as the agent's default.

agent = aix.Agent(
    name="Governed Agent",
    description="Answers questions under a fixed run budget.",
    instructions="You are a helpful assistant. Respond briefly.",
)
agent.save()

# max_duration_seconds=0 forces an immediate block so the governance result is visible
agent.budget.max_cost = 1
agent.budget.max_duration_seconds = 0
agent.budget.max_iterations = 5

response = agent.run("In one sentence, what is aixplain?")
print("status:", response.status)
print(response.data.governance)


When any limit is exceeded, the run is blocked and response.data.governance reports BLOCKED_BY_BUDGET with the reason. A run within budget reports ALLOWED. The budget and every field inside it are optional; omit the budget entirely and no budget enforcement applies (backward compatible).

Field	Type	Description
`max_cost`	`float`	Spending cap for the run, in credits.
`max_duration_seconds`	`float`	Wall-clock seconds before the run is blocked.
`max_iterations`	`int`	Node-execution cap enforced as a budget guardrail.
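The enforcement rule can be modelled as a simple check of usage against each optional limit. Everything below is a sketch under assumptions: the `Budget` dataclass and `check_budget` are hypothetical, the `reason` text is invented, and treating a limit as hit once usage reaches it (so that a limit of 0 blocks immediately) is a guess at the comparison the backend uses.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Budget:
    """Mirror of the three documented fields; None means 'no limit'."""
    max_cost: Optional[float] = None
    max_duration_seconds: Optional[float] = None
    max_iterations: Optional[int] = None


def check_budget(budget: Budget, cost: float, elapsed_seconds: float, iterations: int) -> dict:
    """Return a governance-style result for the current usage."""
    checks = [
        ("max_cost", budget.max_cost, cost),
        ("max_duration_seconds", budget.max_duration_seconds, elapsed_seconds),
        ("max_iterations", budget.max_iterations, iterations),
    ]
    for field_name, limit, used in checks:
        # Assumption: a limit counts as hit once usage reaches it
        if limit is not None and used >= limit:
            return {"status": "BLOCKED_BY_BUDGET", "reason": f"{field_name} limit of {limit} reached"}
    return {"status": "ALLOWED", "reason": None}


budget = Budget(max_cost=1, max_duration_seconds=0, max_iterations=5)
print(check_budget(budget, cost=0.0, elapsed_seconds=0.0, iterations=0)["status"])
# BLOCKED_BY_BUDGET
print(check_budget(Budget(), cost=10.0, elapsed_seconds=99.0, iterations=50)["status"])
# ALLOWED
```

The second call shows the backward-compatible case: with no fields set, nothing is enforced.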

Persisted vs. run-time budget

A budget set before save() persists as the agent's default. A budget set after save() but before run() applies to that run only and merges field by field over the persisted default, with the run-time value winning per field.

# Persisted default: set before save()
agent.budget.max_cost = 1.0
agent.budget.max_iterations = 10
agent.save()

# Run-time override: set after save(), before run(). Only this field changes for the
# run; persisted max_cost and max_iterations still apply.
agent.budget.max_duration_seconds = 0
response = agent.run("In one sentence, what is aixplain?")
print(response.data.governance["status"], "|", response.data.governance["reason"])


The run-time value wins on max_duration_seconds, while the persisted max_cost and max_iterations continue to apply.
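The field-by-field merge can be pictured as an overlay in which only fields that were actually set at run time replace the persisted ones. This is a sketch with a hypothetical `merge_budget` helper and plain dicts in place of the budget object; representing "not set" as `None` is an assumption.

```python
from typing import Dict, Optional, Union

Number = Union[int, float]
BudgetFields = Dict[str, Optional[Number]]


def merge_budget(persisted: BudgetFields, runtime: BudgetFields) -> BudgetFields:
    """Overlay run-time fields on the persisted default; None means 'not set'."""
    merged = dict(persisted)
    for field_name, value in runtime.items():
        if value is not None:          # 0 is a real limit, so test for None explicitly
            merged[field_name] = value
    return merged


persisted = {"max_cost": 1.0, "max_iterations": 10, "max_duration_seconds": None}
runtime = {"max_duration_seconds": 0}

print(merge_budget(persisted, runtime))
# {'max_cost': 1.0, 'max_iterations': 10, 'max_duration_seconds': 0}
```

Note the explicit `is not None` test: a truthiness check would silently drop the `0` used in the example above.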

Round-trip

A persisted budget survives a re-fetch, so the limits you saved come back when you reload the agent.

agent.budget.max_cost = 1.0
agent.budget.max_iterations = 5
agent.save()

fetched = aix.Agent.get(agent.id)
print("cost:", fetched.budget.max_cost)
print("iterations:", fetched.budget.max_iterations)


Async calling

Use run_async() to start a run without blocking. The method returns immediately with a polling URL; call agent.poll(url) until result.completed is True.

import time

response = agent.run_async(query="Summarise the history of computing.")

while True:
    if not response.url:        # completed immediately (no polling needed)
        print(response.data.output)
        break

    result = agent.poll(response.url)

    if result.completed:
        print(result.data.output)
        break

    time.sleep(5)
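The loop above waits indefinitely if a run never completes. A bounded variant is sketched below. It relies only on `agent.poll(url)` and `result.completed` as described in this section; `poll_until_complete`, its defaults, and the `FakeAgent` used to exercise it offline are hypothetical.

```python
import time
from types import SimpleNamespace


def poll_until_complete(agent, url: str, timeout: float = 300.0, interval: float = 5.0):
    """Poll agent.poll(url) until result.completed, or raise TimeoutError at the deadline."""
    deadline = time.monotonic() + timeout
    while True:
        result = agent.poll(url)
        if result.completed:
            return result
        if time.monotonic() + interval > deadline:
            raise TimeoutError(f"run did not complete within {timeout} seconds")
        time.sleep(interval)


class FakeAgent:
    """Stand-in for a real agent so the helper can be exercised without the API."""

    def __init__(self, polls_needed: int):
        self.polls_needed = polls_needed
        self.calls = 0

    def poll(self, url: str):
        self.calls += 1
        done = self.calls >= self.polls_needed
        return SimpleNamespace(completed=done, data=SimpleNamespace(output="done" if done else None))


fake = FakeAgent(polls_needed=3)
result = poll_until_complete(fake, "https://example.invalid/poll/123", timeout=5, interval=0.01)
print(result.data.output, fake.calls)   # done 3
```

As with the `timeout` run parameter, giving up on the client side does not necessarily stop the run on the server.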


Batch async

Start multiple runs in parallel, then collect results as they finish:

import time

queries = [
    "What are the benefits of cloud computing?",
    "Explain blockchain in plain English.",
    "What is reinforcement learning?",
]

# Kick off all runs
pending = []
for query in queries:
    r = agent.run_async(query=query)
    if r.url:
        pending.append((query, r.url))
    else:
        print(f"[immediate] {r.data.output}\n")

# Poll until all finish
results = []
while pending:
    for query, url in pending[:]:
        result = agent.poll(url)
        if result.completed:
            results.append((query, result.data.output))
            pending.remove((query, url))
    time.sleep(3)

for query, output in results:
    print(f"Q: {query}\nA: {output}\n")


Tracing and monitoring

Every run returns structured traces. Use them for debugging and cost tracking:

note

# replaces: LangSmith tracing + Helicone logging + custom middleware
# step-level traces are on by default; no instrumentation needed

response = agent.run(
    query="What are the top programming languages in 2025?",
    progress_format="logs",
    progress_verbosity=1,
)

# Run outcome
print("Status:     ", response.status)        # SUCCESS / FAILED / IN_PROGRESS
print("Output:     ", response.data.output)
print("Completed:  ", response.completed)
print("Error:      ", response.error_message)

# Top-level run metrics
print("Steps:      ", len(response.data.steps or []))
print("Session ID: ", response.data.session_id)
print("Credits:    ", response.used_credits)
print("Run time:   ", response.run_time, "s")


Inspecting steps

response.data.steps contains every reasoning step. Each step has an agent (which sub-agent ran) and a unit (which model or tool it invoked).

import json

for i, step in enumerate(response.data.steps or []):
    agent_info = step.get("agent", {})
    unit_info  = step.get("unit", {})

    agent_name = agent_info.get("name", "Unknown") if isinstance(agent_info, dict) else str(agent_info)
    unit_name  = unit_info.get("name", "Unknown")  if isinstance(unit_info, dict) else str(unit_info)
    unit_type  = (unit_info.get("type") or "")      if isinstance(unit_info, dict) else ""  # tolerate a null type

    is_tool = unit_type.lower() == "tool"

    print(f"\n--- Step {i+1}: {agent_name} → {unit_name} ({'Tool' if is_tool else 'LLM'}) ---")
    print(f"  API calls: {step.get('api_calls', 0)}")
    print(f"  Credits:   {float(step.get('used_credits') or 0):.6f}")  # tolerate a null value

    if step.get("thought"):
        print(f"  Thought:   {str(step['thought'])[:200]}")
    if step.get("task"):
        print(f"  Task:      {step['task']}")
    if step.get("input"):
        inp = step["input"]
        print(f"  Input:     {json.dumps(inp)[:300] if isinstance(inp, dict) else str(inp)[:300]}")
    if step.get("output"):
        out = step["output"]
        print(f"  Output:    {json.dumps(out)[:300] if isinstance(out, dict) else str(out)[:300]}")
    if step.get("error") or step.get("error_message"):
        print(f"  Error:     {step.get('error') or step.get('error_message')}")


Step fields reference:

Field	Description
`agent`	Sub-agent that executed this step (dict with `name` and `id`).
`unit`	Model or tool invoked (dict with `name`, `id`, and `type`). `type` is `"tool"` for tool calls, otherwise an LLM step.
`api_calls`	Number of API calls made in this step.
`used_credits`	Credits consumed by this step.
`thought`	Agent's internal reasoning before acting (visible at `progress_verbosity=2`+).
`task`	Task name assigned to this step when tasks are configured; `None` otherwise.
`input`	Input passed to the unit.
`output`	Output returned by the unit.
`error` / `error_message`	Error if this step failed.
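For cost tracking, the per-step fields can be rolled up by unit. The sketch below assumes steps are dicts shaped like the reference table; `credits_by_unit` is a hypothetical helper and the sample steps and their credit values are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def credits_by_unit(steps: List[dict]) -> Dict[str, float]:
    """Sum used_credits per unit name, tolerating missing or non-dict fields."""
    totals: Dict[str, float] = defaultdict(float)
    for step in steps or []:
        unit = step.get("unit")
        name = unit.get("name", "Unknown") if isinstance(unit, dict) else str(unit)
        totals[name] += float(step.get("used_credits") or 0)
    return dict(totals)


# Made-up steps shaped like the field reference above
sample_steps = [
    {"unit": {"name": "gpt", "type": "llm"}, "used_credits": 0.002},
    {"unit": {"name": "web-search", "type": "tool"}, "used_credits": 0.01},
    {"unit": {"name": "gpt", "type": "llm"}, "used_credits": 0.003},
    {"unit": {"name": "web-search", "type": "tool"}, "used_credits": None},
]

totals = credits_by_unit(sample_steps)
for name, credits in sorted(totals.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {credits:.4f}")
# web-search: 0.0100
# gpt: 0.0050
```

In practice the input would be `response.data.steps or []` from a completed run.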

Execution metrics

stats = response.data.execution_stats or {}
print("API calls:", stats.get("api_calls"))
print("Credits:", stats.get("credits"))
print("Runtime:", stats.get("runtime"), "s")
print("Assets used:", stats.get("assets_used"))
print("Session ID:", stats.get("session_id"))
print("Run ID:", stats.get("params", {}).get("id"))
print("Request ID:", stats.get("request_id"))


Save and update

The typical lifecycle is create → save → run → update → save:

agent.output_format = "markdown"
agent.budget.max_iterations = 15
agent.save()

Call agent.save() after any change to name, description, instructions, tools, llm, output_format, or max_tokens, and after changing any budget field you want persisted as the agent's default.

Troubleshooting

Agent ignores tools - Inspect response.data.steps to see what the agent attempted. Check that the tool's name and description are unambiguous. If total tool parameters exceed 100, optional ones may be silently dropped.

"agent reached the maximum number of iterations" - The agent hit its iteration limit (default 5). Raise it through the budget for complex, multi-step tasks:

agent.budget.max_iterations = 20
agent.save()

"model response was cut off because the maximum token limit was reached" - Increase the LLM's persistent token cap:

llm = aix.Model.get("openai/gpt-5.4")
llm.inputs.max_tokens = 100_000
agent.llm = llm
agent.save()

Agent response is cropped - The agent's own max_tokens (default 2048) caps the final output, independent of the LLM cap. Raise it:

agent.max_tokens = 20_000
agent.save()
