Version: 2.0

Production

This guide covers how to operate aiXplain Agents in production after initial build and validation.

Choose your deployment mode

  • Use Serverless when you want managed infrastructure and fast scaling.
  • Use Private when you need strict network isolation, air-gapped operation, or full infrastructure control.

Production readiness checklist

Before exposing an agent to live traffic:

  • Confirm the agent's behavior in aiXplain Studio validation traces.
  • Validate outputs on representative and adversarial inputs.
  • Configure Inspectors for safety, quality, and compliance checks.
  • Set API access and quotas via API keys.
  • Verify workspace permissions in Workspaces.
  • Confirm cost expectations in Credits and billing.
  • Decide memory posture per agent or session (enabled or disabled) based on privacy and retention needs.

Reliability patterns

For stable runtime behavior under load and failure scenarios:

  • Keep retries and fallback strategy enabled for model and tool failures.
  • Configure a clear primary/secondary fallback chain for critical model or tool dependencies.
  • Use deterministic task structure where strict execution order is required.
  • Set clear termination criteria to avoid runaway loops.
  • Guard external dependencies with timeout-aware tools and graceful fallback behavior.
  • Test degraded scenarios (tool unavailable, model timeout, malformed tool response).

Configuration snippets

Use these SDK snippets to configure your agent and set per-run runtime parameters.

Configure your agent (Python SDK)

from aixplain import Aixplain

aix = Aixplain(api_key="YOUR_API_KEY")

# Pull any tool assets you want the agent to call at runtime
search_tool = aix.Tool.get("tavily/tavily-web-search/tavily")

# Configure the agent (including model config)
agent = aix.Agent(
    name="Production support agent",
    description="Resolves operational issues with safe, auditable actions.",
    instructions=(
        "Investigate incidents, call tools when needed, and return a concise "
        "triage summary with recommended actions."
    ),
    llm="YOUR_MODEL_ID",  # model is configured directly on the agent
    tools=[search_tool],
    output_format="json",
    expected_output={
        "summary": "string",
        "risk": "low|medium|high",
        "recommendedActions": ["string"],
    },
    max_iterations=8,
    max_tokens=1600,
    inspector_targets=["input", "steps", "output"],
    max_inspectors=2,
)
agent.save()

Configure runtime params per run (Python SDK)

from aixplain import Aixplain

aix = Aixplain(api_key="YOUR_API_KEY")
agent = aix.Agent.get("YOUR_AGENT_ID")

response = agent.run(
    query="Triage this issue and propose next actions.",
    variables={
        "tenant": "acme-corp",
        "priority": "high",
        "region": "us-east-1",
    },
    history=[
        {"role": "user", "content": "Checkout error rate spiked after 09:10 UTC."}
    ],
    executionParams={
        "outputFormat": "json",
        "expectedOutput": {
            "summary": "string",
            "rootCauseHypothesis": "string",
            "recommendedActions": ["string"],
        },
        "maxTokens": 1200,
        "maxIterations": 6,
        "maxTime": 180,
    },
    runResponseGeneration=True,
    progress_format="logs",  # "status" or "logs"
    progress_verbosity=2,  # 1=minimal, 2=thoughts, 3=full I/O
    progress_truncate=True,
)

print(response)
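Because the run declares a structured expectedOutput contract, it is worth validating the returned payload against that shape before acting on it. The helper below is a hypothetical application-side check, not part of the SDK; the field names mirror the contract in the run call above.

```python
def validate_triage_output(payload: dict) -> list[str]:
    """Return a list of contract violations for the triage output schema.

    Field names mirror the expectedOutput contract declared per run;
    adjust them if your agent returns a different shape.
    """
    errors = []
    for field in ("summary", "rootCauseHypothesis"):
        if not isinstance(payload.get(field), str):
            errors.append(f"{field}: expected string")
    actions = payload.get("recommendedActions")
    if not (isinstance(actions, list) and all(isinstance(a, str) for a in actions)):
        errors.append("recommendedActions: expected list of strings")
    return errors
```

Running this on every response lets you route contract violations to a retry or a human review queue instead of passing malformed output downstream.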

Configure runtime params per run (REST)

curl -X POST "https://platform-api.aixplain.com/v2/agents/YOUR_AGENT_ID/run" \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Triage this issue and propose next actions.",
    "variables": {
      "tenant": "acme-corp",
      "priority": "high",
      "region": "us-east-1"
    },
    "executionParams": {
      "outputFormat": "json",
      "maxTokens": 1200,
      "maxIterations": 6,
      "maxTime": 180
    }
  }'

Governance and access control

Treat governance as runtime behavior, not a one-time setup:

  • Enforce policies with Inspectors at input, step, and output stages where needed.
  • Scope API keys to the minimum required permissions.
  • Use model-scoped rate limits and workspace roles to prevent uncontrolled usage.
  • Agent-scoped rate limiting is coming soon.
  • Separate build/test/prod workspaces when teams need stronger operational isolation.
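One simple way to keep API keys scoped per environment is to resolve them from environment variables rather than hard-coding secrets. The variable names and environment labels below are an assumed convention for illustration, not an aiXplain requirement.

```python
import os

# Assumed convention: one scoped key per workspace/environment,
# supplied via environment variables rather than hard-coded secrets.
ENV_KEYS = {
    "build": "AIXPLAIN_BUILD_API_KEY",
    "test": "AIXPLAIN_TEST_API_KEY",
    "prod": "AIXPLAIN_PROD_API_KEY",
}

def api_key_for(env: str) -> str:
    """Look up the scoped API key for the given environment."""
    var = ENV_KEYS.get(env)
    if var is None:
        raise ValueError(f"unknown environment: {env}")
    key = os.environ.get(var)
    if not key:
        raise RuntimeError(f"missing {var}; set a scoped key for '{env}'")
    return key
```

Failing fast on a missing or unknown environment prevents a build-scoped key from silently reaching production traffic.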

Observability and diagnostics

Use both platform views and application-side telemetry:

  • In Studio, use validation traces and analytics for step-level behavior, latency, and cost.
  • In your app, persist the requestId returned by every run for correlation and incident analysis.
  • Poll run results and store status/metadata in your logging stack for long-term reporting.
  • Build service-level dashboards from your own logs for SLO tracking (latency, error rate, cost per run).
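The telemetry points above can be captured with a thin wrapper around each run. This is an application-side sketch: `run_fn` stands in for any run callable (e.g. agent.run), `log_sink` for whatever receives your JSON log lines, and the assumption that the response is dict-like and carries a requestId follows the note above.

```python
import json
import time

def run_with_telemetry(run_fn, log_sink, **kwargs):
    """Execute a run callable and persist a telemetry record.

    `run_fn` is a placeholder run callable returning a dict-like
    response; `log_sink` is any callable accepting one JSON line,
    such as a file writer or a logging handler.
    """
    start = time.monotonic()
    status, response = "succeeded", None
    try:
        response = run_fn(**kwargs)
        return response
    except Exception:
        status = "failed"
        raise
    finally:
        # Record requestId, status, and latency for SLO dashboards
        record = {
            "requestId": (response or {}).get("requestId"),
            "status": status,
            "latencySeconds": round(time.monotonic() - start, 3),
        }
        log_sink(json.dumps(record))
```

Emitting one structured line per run gives your logging stack the raw material for latency, error-rate, and cost-per-run dashboards.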

Note on analytics APIs:

  • Current API request docs describe run and poll endpoints for request-level results.
  • A dedicated aggregated analytics endpoint is not documented in API requests.

Next steps

  1. Configure mode-specific details in Serverless or Private.
  2. Integrate run/poll flows from API requests.
  3. Establish dashboard and alerting baselines in your app telemetry and Studio analytics.