Version: 2.0

Production

This guide covers how to operate aiXplain Agents in production after initial build and validation.

Choose your deployment mode

  • Use Serverless when you want managed infrastructure and fast scaling.
  • Use Private when you need strict network isolation, air-gapped operation, or full infrastructure control.

Production readiness checklist

Before exposing an agent to live traffic:

  • Confirm the agent's behavior in aiXplain Studio validation traces.
  • Validate outputs on representative and adversarial inputs.
  • Configure Inspectors for safety, quality, and compliance checks.
  • Set API access and quotas via API keys.
  • Verify workspace permissions in Workspaces.
  • Confirm cost expectations in Credits and billing.
  • Decide memory posture per agent or session (enabled or disabled) based on privacy and retention needs.

Reliability patterns

For stable runtime behavior under load and failure scenarios:

  • Keep retries and fallback strategy enabled for model and tool failures.
  • Configure a clear primary/secondary fallback chain for critical model or tool dependencies.
  • Use deterministic task structure where strict execution order is required.
  • Set clear termination criteria to avoid runaway loops.
  • Guard external dependencies with timeout-aware tools and graceful fallback behavior.
  • Test degraded scenarios (tool unavailable, model timeout, malformed tool response).

Configuration snippets

Use these SDK snippets to configure your agent and set per-run runtime parameters.

Configure your agent (Python SDK)

from aixplain import Aixplain

aix = Aixplain(api_key="YOUR_API_KEY")

# Pull any tool assets you want the agent to call at runtime
search_tool = aix.Tool.get("tavily/tavily-web-search/tavily")

# Configure the agent (including model config)
agent = aix.Agent(
    name="Production support agent",
    description="Resolves operational issues with safe, auditable actions.",
    instructions=(
        "Investigate incidents, call tools when needed, and return a concise "
        "triage summary with recommended actions."
    ),
    llm="YOUR_MODEL_ID",  # model is configured directly on the agent
    tools=[search_tool],
    output_format="json",
    expected_output={
        "summary": "string",
        "risk": "low|medium|high",
        "recommendedActions": ["string"],
    },
    max_iterations=8,
    max_tokens=1600,
    inspector_targets=["input", "steps", "output"],
    max_inspectors=2,
)
agent.save()

Configure runtime params per run (Python SDK)

from aixplain import Aixplain

aix = Aixplain(api_key="YOUR_API_KEY")
agent = aix.Agent.get("YOUR_AGENT_ID")

response = agent.run(
    query="Triage this issue and propose next actions.",
    variables={
        "tenant": "acme-corp",
        "priority": "high",
        "region": "us-east-1",
    },
    history=[
        {"role": "user", "content": "Checkout error rate spiked after 09:10 UTC."}
    ],
    executionParams={
        "outputFormat": "json",
        "expectedOutput": {
            "summary": "string",
            "rootCauseHypothesis": "string",
            "recommendedActions": ["string"],
        },
        "maxTokens": 1200,
        "maxIterations": 6,
        "maxTime": 180,
    },
    runResponseGeneration=True,
    progress_format="logs",  # "status" or "logs"
    progress_verbosity=2,  # 1=minimal, 2=thoughts, 3=full I/O
    progress_truncate=True,
)

print(response)
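Because the run declares a structured expectedOutput contract, it is worth validating the returned payload against that shape before acting on it. The helper below is a hypothetical application-side check, not part of the SDK; the field names mirror the contract in the run call above.

```python
def validate_triage_output(payload: dict) -> list[str]:
    """Return a list of contract violations for the triage output schema.

    Field names mirror the expectedOutput contract declared per run;
    adjust them if your agent returns a different shape.
    """
    errors = []
    for field in ("summary", "rootCauseHypothesis"):
        if not isinstance(payload.get(field), str):
            errors.append(f"{field}: expected string")
    actions = payload.get("recommendedActions")
    if not (isinstance(actions, list) and all(isinstance(a, str) for a in actions)):
        errors.append("recommendedActions: expected list of strings")
    return errors
```

Running this on every response lets you route contract violations to a retry or a human review queue instead of passing malformed output downstream.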

Configure runtime params per run (REST)

curl -X POST "https://platform-api.aixplain.com/v2/agents/YOUR_AGENT_ID/run" \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Triage this issue and propose next actions.",
    "variables": {
      "tenant": "acme-corp",
      "priority": "high",
      "region": "us-east-1"
    },
    "executionParams": {
      "outputFormat": "json",
      "maxTokens": 1200,
      "maxIterations": 6,
      "maxTime": 180
    }
  }'

Governance and access control

Treat governance as runtime behavior, not a one-time setup:

  • Enforce policies with Inspectors at input, step, and output stages where needed.
  • Scope API keys to the minimum required permissions.
  • Use model-scoped rate limits and workspace roles to prevent uncontrolled usage.
  • Agent-scoped rate limiting is coming soon.
  • Separate build/test/prod workspaces when teams need stronger operational isolation.
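One simple way to keep API keys scoped per environment is to resolve them from environment variables rather than hard-coding secrets. The variable names and environment labels below are an assumed convention for illustration, not an aiXplain requirement.

```python
import os

# Assumed convention: one scoped key per workspace/environment,
# supplied via environment variables rather than hard-coded secrets.
ENV_KEYS = {
    "build": "AIXPLAIN_BUILD_API_KEY",
    "test": "AIXPLAIN_TEST_API_KEY",
    "prod": "AIXPLAIN_PROD_API_KEY",
}

def api_key_for(env: str) -> str:
    """Look up the scoped API key for the given environment."""
    var = ENV_KEYS.get(env)
    if var is None:
        raise ValueError(f"unknown environment: {env}")
    key = os.environ.get(var)
    if not key:
        raise RuntimeError(f"missing {var}; set a scoped key for '{env}'")
    return key
```

Failing fast on a missing or unknown environment prevents a build-scoped key from silently reaching production traffic.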

Observability and diagnostics

Use both platform views and application-side telemetry:

  • In Studio, use validation traces and analytics for step-level behavior, latency, and cost.
  • In your app, persist the requestId returned by every run for correlation and incident analysis.
  • Poll run results and store status/metadata in your logging stack for long-term reporting.
  • Build service-level dashboards from your own logs for SLO tracking (latency, error rate, cost per run).
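The telemetry points above can be captured with a thin wrapper around each run. This is an application-side sketch: `run_fn` stands in for any run callable (e.g. agent.run), `log_sink` for whatever receives your JSON log lines, and the assumption that the response is dict-like and carries a requestId follows the note above.

```python
import json
import time

def run_with_telemetry(run_fn, log_sink, **kwargs):
    """Execute a run callable and persist a telemetry record.

    `run_fn` is a placeholder run callable returning a dict-like
    response; `log_sink` is any callable accepting one JSON line,
    such as a file writer or a logging handler.
    """
    start = time.monotonic()
    status, response = "succeeded", None
    try:
        response = run_fn(**kwargs)
        return response
    except Exception:
        status = "failed"
        raise
    finally:
        # Record requestId, status, and latency for SLO dashboards
        record = {
            "requestId": (response or {}).get("requestId"),
            "status": status,
            "latencySeconds": round(time.monotonic() - start, 3),
        }
        log_sink(json.dumps(record))
```

Emitting one structured line per run gives your logging stack the raw material for latency, error-rate, and cost-per-run dashboards.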

Note on analytics APIs:

  • Current API request docs describe run and poll endpoints for request-level results.
  • A dedicated aggregated analytics endpoint is not documented in API requests.

Next steps

  1. Configure mode-specific details in Serverless or Private.
  2. Integrate run/poll flows from API requests.
  3. Establish dashboard and alerting baselines in your app telemetry and Studio analytics.