Production
This guide covers how to operate aiXplain Agents in production after initial build and validation.
Choose your deployment mode
- Use Serverless when you want managed infrastructure and fast scaling.
- Use Private when you need strict network isolation, air-gapped operation, or full infrastructure control.
Production readiness checklist
Before exposing an agent to live traffic:
- Confirm the agent behavior in aiXplain Studio validation traces.
- Validate outputs on representative and adversarial inputs.
- Configure Inspectors for safety, quality, and compliance checks.
- Set API access and quotas via API keys.
- Verify workspace permissions in Workspaces.
- Confirm cost expectations in Credits and billing.
- Decide memory posture per agent or session (enabled or disabled) based on privacy and retention needs.
Reliability patterns
For stable runtime behavior under load and failure scenarios:
- Keep retries and fallback strategy enabled for model and tool failures.
- Configure a clear primary/secondary fallback chain for critical model or tool dependencies.
- Use deterministic task structure where strict execution order is required.
- Set clear termination criteria to avoid runaway loops.
- Guard external dependencies with timeout-aware tools and graceful fallback behavior.
- Test degraded scenarios (tool unavailable, model timeout, malformed tool response).
Configuration snippets
Use these SDK snippets to configure your agent and per-run runtime params.
Configure your agent (Python SDK)
from aixplain import Aixplain
aix = Aixplain(api_key="YOUR_API_KEY")
# Pull any tool assets you want the agent to call at runtime
search_tool = aix.Tool.get("tavily/tavily-web-search/tavily")
# Configure the agent (including model config)
agent = aix.Agent(
name="Production support agent",
description="Resolves operational issues with safe, auditable actions.",
instructions=(
"Investigate incidents, call tools when needed, and return a concise "
"triage summary with recommended actions."
),
llm="YOUR_MODEL_ID", # model is configured directly on the agent
tools=[search_tool],
output_format="json",
expected_output={
"summary": "string",
"risk": "low|medium|high",
"recommendedActions": ["string"],
},
max_iterations=8,
max_tokens=1600,
inspector_targets=["input", "steps", "output"],
max_inspectors=2,
)
agent.save()
Configure runtime params per run (Python SDK)
from aixplain import Aixplain
aix = Aixplain(api_key="YOUR_API_KEY")
agent = aix.Agent.get("YOUR_AGENT_ID")
response = agent.run(
query="Triage this issue and propose next actions.",
variables={
"tenant": "acme-corp",
"priority": "high",
"region": "us-east-1",
},
history=[
{"role": "user", "content": "Checkout error rate spiked after 09:10 UTC."}
],
executionParams={
"outputFormat": "json",
"expectedOutput": {
"summary": "string",
"rootCauseHypothesis": "string",
"recommendedActions": ["string"],
},
"maxTokens": 1200,
"maxIterations": 6,
"maxTime": 180,
},
runResponseGeneration=True,
progress_format="logs", # "status" or "logs"
progress_verbosity=2, # 1=minimal, 2=thoughts, 3=full I/O
progress_truncate=True,
)
print(response)
Configure runtime params per run (REST)
curl -X POST "https://platform-api.aixplain.com/v2/agents/YOUR_AGENT_ID/run" \
-H "x-api-key: YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"query": "Triage this issue and propose next actions.",
"variables": {
"tenant": "acme-corp",
"priority": "high",
"region": "us-east-1"
},
"executionParams": {
"outputFormat": "json",
"maxTokens": 1200,
"maxIterations": 6,
"maxTime": 180
}
}'
Governance and access control
Treat governance as runtime behavior, not a one-time setup:
- Enforce policies with Inspectors at input, step, and output stages where needed.
- Scope API keys to the minimum required permissions.
- Use model-scoped rate limits and workspace roles to prevent uncontrolled usage.
- Agent-scoped rate limiting is coming soon.
- Separate build/test/prod workspaces when teams need stronger operational isolation.
Observability and diagnostics
Use both platform views and application-side telemetry:
- In Studio, use validation traces and analytics for step-level behavior, latency, and cost.
- In your app, persist
requestIdfrom every run for correlation and incident analysis. - Poll run results and store status/metadata in your logging stack for long-term reporting.
- Build service-level dashboards from your own logs for SLO tracking (latency, error rate, cost per run).
Note on analytics APIs:
- Current API request docs describe run and poll endpoints for request-level results.
- A dedicated aggregated analytics endpoint is not documented in API requests.
Next steps
- Configure mode-specific details in Serverless or Private.
- Integrate run/poll flows from API requests.
- Establish dashboard and alerting baselines in your app telemetry and Studio analytics.