Version: v2.0

API Requests

This guide shows how to call aiXplain's production REST API directly—without the SDK—for Models and Agents. Use it when you're integrating from a language we don't have an SDK for, or when you want full control over the raw HTTP calls.

tip

Prefer a standardized interface? aiXplain assets are also reachable over the Model Context Protocol—see MCP Servers for another way to access models and agents from MCP-compatible clients.

Authentication

Every request requires your API key in the x-api-key header. Requests with a body also need Content-Type: application/json.

x-api-key: YOUR_API_KEY
Content-Type: application/json
note

Your API key is the workspace (team) API key from studio.aixplain.com. Keep it server-side—never expose it in client-side code.

Two different auth header schemes

Most endpoints (model/agent execution, polling, discovery) authenticate with x-api-key: YOUR_API_KEY. The file-upload endpoints (/sdk/file/upload/temp-url, /sdk/file/upload-url) instead expect Authorization: token YOUR_API_KEY. It's the same key, but the header name and format differ—an easy thing to trip on. See Upload a file via REST.
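As a minimal sketch of the rule above, the helper below picks the header scheme from the request path. The name `auth_headers` is hypothetical; the two upload paths are the ones listed in the paragraph above.

```python
# Hypothetical helper: choose the auth header scheme from the request path.
FILE_UPLOAD_PATHS = ("/sdk/file/upload/temp-url", "/sdk/file/upload-url")


def auth_headers(path: str, api_key: str) -> dict:
    """Return the authentication header for a request to `path`."""
    if path.startswith(FILE_UPLOAD_PATHS):
        # File-upload endpoints expect "Authorization: token <key>".
        return {"Authorization": f"token {api_key}"}
    # Everything else (execution, polling, discovery) uses x-api-key.
    return {"x-api-key": api_key}
```

Add Content-Type separately, since it depends on the body format of the individual endpoint.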

Requests are subject to per-workspace request, token, and credit limits. See Rate Limiting for how limits are enforced and configured.


How execution works

aiXplain endpoints respond in one of two ways:

  • Synchronous — fast models (most LLMs, cloud TTS/ASR) return the result directly in the POST response with "completed": true.
  • Asynchronous — longer-running models (video generation, large speech jobs) and all agents return a polling URL in data instead of the result. You then GET that URL until the job finishes.

You don't choose the mode; the model decides. Always inspect the response: if completed is true, the result is already in data (which may itself be a URL, for example to generated audio); if completed is false and data is a URL, poll it.

An asynchronous POST returns a status envelope like this:

{
"status": "IN_PROGRESS",
"completed": false,
"data": "https://models.aixplain.com/api/v1/data/<REQUEST_ID>"
}

| Field | Meaning |
| --- | --- |
| status | IN_PROGRESS, SUCCESS, or FAILED. |
| completed | false while running, true once finished. |
| data | While running: the URL to poll. Once finished: the result payload. |

tip

Always poll the exact URL returned in the data field rather than constructing the path yourself. The polling host and version (e.g. /api/v1/data/ vs /api/v2/data/) can differ by service, so trusting the returned URL keeps your client correct.

A typical async client loops: POST → read data URL → GET it every few seconds → stop when completed is true (or status is SUCCESS/FAILED).
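The loop above can be sketched as follows. This is a minimal sketch, not an official client: `wait_for_result` and the injected `fetch_json` callable (which is expected to GET a URL with the x-api-key header and return the decoded JSON) are hypothetical names, and the 3-second interval and 5-minute timeout are arbitrary defaults.

```python
import time
from typing import Callable


def wait_for_result(start_response: dict,
                    fetch_json: Callable[[str], dict],
                    interval_s: float = 3.0,
                    timeout_s: float = 300.0) -> dict:
    """Return the final envelope, polling the URL in `data` if needed."""
    if start_response.get("completed"):
        return start_response  # synchronous: the result is already in data
    poll_url = start_response["data"]  # use the returned URL verbatim
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        envelope = fetch_json(poll_url)
        finished = envelope.get("status") in ("SUCCESS", "FAILED")
        if envelope.get("completed") or finished:
            return envelope
        time.sleep(interval_s)
    raise TimeoutError(f"job did not finish within {timeout_s} seconds")
```

Injecting `fetch_json` keeps the loop independent of any HTTP library and makes it easy to exercise with canned responses.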


Models API

Base URL: https://models.aixplain.com

All models—LLMs, speech, vision, video—are executed through the same endpoint:

POST https://models.aixplain.com/api/v2/execute/{model_id}

What you put in the body, and what you get back, depends on the model's modality. The sections below cover each.

Run a model (LLM)

Request:

POST https://models.aixplain.com/api/v2/execute/669a63646eb56306647e1091
x-api-key: YOUR_API_KEY
Content-Type: application/json

{
"text": "What is 2 + 2?"
}

Response (synchronous):

{
"status": "SUCCESS",
"completed": true,
"data": "4",
"details": [
{"index": 0, "message": {"role": "assistant", "content": "4"}, "finish_reason": "stop"}
],
"runTime": 0.329,
"usedCredits": 3.75e-06,
"usage": {"prompt_tokens": 21, "completion_tokens": 1, "total_tokens": 22},
"asset": {"assetId": "669a63646eb56306647e1091", "id": "openai/gpt-4o-mini/openai"}
}
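
The request above could be issued from Python's standard library roughly as follows. This is a sketch under stated assumptions: the function names and the 60-second timeout are illustrative, and only the URL, headers, and body shape come from this guide.

```python
import json
import urllib.request

MODELS_BASE = "https://models.aixplain.com"


def build_execute_request(model_id: str, api_key: str,
                          body: dict) -> urllib.request.Request:
    """Build (but do not send) a POST to the execute endpoint."""
    return urllib.request.Request(
        url=f"{MODELS_BASE}/api/v2/execute/{model_id}",
        data=json.dumps(body).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def execute(model_id: str, api_key: str, body: dict,
            timeout_s: float = 60.0) -> dict:
    """Send the request and decode the JSON envelope."""
    request = build_execute_request(model_id, api_key, body)
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.load(response)
```

For example, `execute("669a63646eb56306647e1091", api_key, {"text": "What is 2 + 2?"})` would return the envelope shown above, with the answer in `data`.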

Output fields

| Field | Description |
| --- | --- |
| data | The model's answer (string), or, for async jobs, the polling URL. |
| status / completed | Job status; see How execution works. |
| details | Provider-native payload. For chat LLMs, the raw message object(s) with role, content, finish_reason. |
| usage | Token counts (prompt_tokens, completion_tokens, total_tokens). |
| usedCredits | Credits charged for this call. |
| runTime | Server-side execution time in seconds. |
| asset | The resolved model (assetId and human-readable id). |

Get the provider's raw output with includeRawData

includeRawData is a model-type-agnostic options flag—it works the same way for LLMs, speech, vision, and any other model. Add "options": {"includeRawData": true} to the request body to receive the backing provider's full, unmodified response in a rawData field, alongside the normalized data/details. The shape of rawData mirrors whatever the supplier returns, so it varies by model and provider. See the ASR example for a concrete payload (segments, tokens, log-probs).

Generation parameters

Pass model parameters as top-level fields alongside text. The common LLM parameters:

POST https://models.aixplain.com/api/v2/execute/{model_id}
x-api-key: YOUR_API_KEY
Content-Type: application/json

{
"text": "What are the colors of the rainbow?",
"max_tokens": 50,
"temperature": 0.8
}

The exact parameters a model accepts vary by model. To discover them, fetch the model's definition:

GET https://platform-api.aixplain.com/sdk/models/{model_id}
x-api-key: YOUR_API_KEY

The response includes a params array. Each entry tells you the parameter name, whether it's required, its dataType (text, label, number, audio, …), any availableOptions, and defaultValues. This is the source of truth for what a given model accepts—use it before assuming a field exists.
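
One way to use the params array is to check a request body before sending it. The sketch below is illustrative only: it assumes each entry exposes its name and required flag under the keys `name` and `required`, which this guide does not spell out, so confirm the key names against a real response first.

```python
def missing_required(model_definition: dict, body: dict) -> list:
    """List required parameter names that are absent from `body`.

    Assumes entries look like {"name": ..., "required": ...}; these key
    names are an assumption, not confirmed by this guide.
    """
    params = model_definition.get("params", [])
    return [
        param["name"]
        for param in params
        if param.get("required") and param["name"] not in body
    ]
```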

Streaming (LLMs)

Add "stream": true to receive tokens as Server-Sent Events instead of one final response. Each event is a data: line carrying an OpenAI-style chat.completion.chunk; the stream ends with data: [DONE].

Request:

POST https://models.aixplain.com/api/v2/execute/{model_id}
x-api-key: YOUR_API_KEY
Content-Type: application/json

{
"text": "Count to 3.",
"stream": true
}

Response (text/event-stream):

data: {"choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"choices":[{"index":0,"delta":{"content":"1"},"finish_reason":null}]}

data: {"choices":[{"index":0,"delta":{"content":", 2"},"finish_reason":null}]}

data: {"choices":[{"index":0,"delta":{"content":", 3."},"finish_reason":null}]}

data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: {"choices":[],"usage":{"prompt_tokens":12,"completion_tokens":8,"total_tokens":20}}

data: [DONE]

Read incrementally from choices[0].delta.content. The final non-[DONE] event carries the usage totals.
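
A minimal sketch of consuming this stream, assuming you already have the response as an iterable of decoded text lines (for example from your HTTP client's line iterator). The name `iter_stream` is hypothetical.

```python
import json
from typing import Iterable, Iterator, Optional, Tuple


def iter_stream(lines: Iterable[str]) -> Iterator[Tuple[str, Optional[dict]]]:
    """Yield (text_delta, usage) pairs from SSE lines until [DONE]."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # end of stream
        event = json.loads(payload)
        choices = event.get("choices") or []
        delta = choices[0].get("delta", {}) if choices else {}
        # The usage event has an empty choices list, so its text is "".
        yield delta.get("content") or "", event.get("usage")
```

Concatenating the first element of each pair rebuilds the full answer; the last pair before [DONE] carries the usage totals.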

Chat history

For conversational LLMs, pass an array of role/content messages as text instead of a plain string.

Request:

POST https://models.aixplain.com/api/v2/execute/{model_id}
x-api-key: YOUR_API_KEY
Content-Type: application/json

{
"text": [
{"role": "user", "content": "Hello!"},
{"role": "assistant", "content": "Hi there! How can I help?"},
{"role": "user", "content": "Tell me a fun fact."}
]
}

Multimodal input (images)

Vision-capable LLMs (e.g. GPT-4o) accept image content using the same text message array. Each message's content becomes an array of typed parts—text parts and image_url parts. The image can be a public URL or an inline base64 data URI.

Request (image URL):

POST https://models.aixplain.com/api/v2/execute/6646261c6eb563165658bbb1
x-api-key: YOUR_API_KEY
Content-Type: application/json

{
"text": [
{
"role": "user",
"content": [
{"type": "text", "text": "What animal is in this image? One word."},
{"type": "image_url", "image_url": {"url": "https://example.com/cat.jpg"}}
]
}
],
"max_tokens": 10
}

Request (inline base64 — works even when the image isn't hosted anywhere):

{
"text": [
{
"role": "user",
"content": [
{"type": "text", "text": "What color is this image? One word."},
{"type": "image_url", "image_url": {"url": "data:image/png;base64,iVBORw0KGgoAAAANS..."}}
]
}
],
"max_tokens": 10
}

Response:

{
"status": "SUCCESS",
"completed": true,
"data": "Red.",
"usage": {"prompt_tokens": 271, "completion_tokens": 2, "total_tokens": 273},
"asset": {"assetId": "6646261c6eb563165658bbb1", "id": "openai/gpt-4o/openai"}
}
warning

With an image URL, the upstream provider fetches it server-side—so the URL must be publicly reachable. Hosts that block automated fetchers return a FAILED status with code: "invalid_image_url". When in doubt, send the image as a base64 data URI, which never depends on an external fetch.
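
Building the base64 variant can be sketched like this. The helper names are hypothetical; the message structure mirrors the inline base64 request above.

```python
import base64


def image_part_from_bytes(image_bytes: bytes,
                          mime_type: str = "image/png") -> dict:
    """Wrap raw image bytes in an image_url part using a data: URI."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
    }


def vision_body(prompt: str, image_bytes: bytes,
                mime_type: str = "image/png", max_tokens: int = 10) -> dict:
    """Build an execute body with one text part and one inline image."""
    return {
        "text": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                image_part_from_bytes(image_bytes, mime_type),
            ],
        }],
        "max_tokens": max_tokens,
    }
```

Set `mime_type` to match the actual image format (for example image/jpeg for a JPEG file).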

Text-to-speech (TTS)

Send the text to synthesize. Cloud voices (AWS, Google, Azure) typically run synchronously and return a downloadable audio URL in data.

Request:

POST https://models.aixplain.com/api/v2/execute/618ba6e4e2e1a9153ca2a3a2
x-api-key: YOUR_API_KEY
Content-Type: application/json

{
"text": "The quick brown fox jumps over the lazy dog."
}

Response:

{
"status": "SUCCESS",
"completed": true,
"data": "https://aixplain-modelserving-data.s3.amazonaws.com/<id>.mp3?...signed...",
"runTime": 0.243,
"usedCredits": 0.00026,
"asset": {"assetId": "618ba6e4e2e1a9153ca2a3a2", "id": "aws/speech-synthesis-english-amy/AWS"}
}

data is a signed, time-limited URL to the generated audio—download it before it expires.

note

Some providers require extra parameters. ElevenLabs voices, for example, need a voice_id; omitting it returns a FAILED status. Check the model's params (see Generation parameters) for required fields like voice_id or language.

Speech-to-text (ASR)

Pass the audio as a URL in source_audio and the spoken language. Cloud ASR models return the transcript synchronously.

Request:

POST https://models.aixplain.com/api/v2/execute/615dd18b6eb56373643b09d1
x-api-key: YOUR_API_KEY
Content-Type: application/json

{
"language": "en",
"source_audio": "https://example.com/audio.mp3"
}

Response:

{
"status": "SUCCESS",
"completed": true,
"data": "the quick brown fox jumps over the lazy dog",
"confidence": 0.955996,
"details": {
"segments": [
{"segment_id": 0, "start_time": 0.1, "end_time": 3.4, "text": "the quick brown fox jumps over the lazy dog", "confidence": 0.955996, "speaker": ""}
]
}
}

The transcript is in data; details.segments carries per-segment timestamps and confidence. Required parameters (language, source_audio) and optional ones (dialect, script, …) vary by model—check its params.

Some ASR models also auto-detect the spoken language regardless of the language you pass.

Adding "options": {"includeRawData": true} (the model-agnostic flag described above) returns the provider's full Whisper payload in rawData—the auto-detected language, audio duration, and per-segment token IDs, avg_logprob, compression_ratio, and no_speech_prob for confidence scoring:

{
"data": "Hello, how are you?",
"rawData": {
"task": "transcribe",
"language": "English",
"duration": 4.56,
"segments": [
{"id": 0, "start": 0.0, "end": 4.56, "text": " Hello, how are you?",
"tokens": [50365, 2425, 11, 577, 366, 291, 30, 50593],
"temperature": 0, "avg_logprob": -0.191, "compression_ratio": 1.11, "no_speech_prob": 0.004}
]
}
}
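
Fields such as avg_logprob and no_speech_prob can feed a simple confidence filter. The sketch below assumes the Whisper-style payload shown above; the function name and both thresholds are illustrative assumptions, not values recommended by aiXplain or the provider.

```python
import math


def flag_uncertain_segments(raw_data: dict,
                            min_avg_prob: float = 0.5,
                            max_no_speech_prob: float = 0.5) -> list:
    """Return ids of segments that look unreliable.

    exp(avg_logprob) is roughly the geometric-mean token probability.
    Both thresholds are arbitrary starting points to tune per use case.
    """
    flagged = []
    for segment in raw_data.get("segments", []):
        avg_prob = math.exp(segment["avg_logprob"])
        too_uncertain = avg_prob < min_avg_prob
        likely_silence = segment["no_speech_prob"] > max_no_speech_prob
        if too_uncertain or likely_silence:
            flagged.append(segment["id"])
    return flagged
```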

Video generation

Long-running generative models (e.g. ByteDance Seedance) run asynchronously: the POST returns a polling URL, and the finished data is a URL to the generated video.

Request:

POST https://models.aixplain.com/api/v2/execute/695ea397253de54a56dc5aa1
x-api-key: YOUR_API_KEY
Content-Type: application/json

{
"text": "A red panda surfing a wave at sunset, cinematic.",
"resolution": "1080p",
"ratio": "16:9",
"duration": 5
}

For this model text is the prompt; resolution (480p/720p/1080p), ratio (16:9, 9:16, 1:1, 4:3, 3:4, 21:9), and duration (seconds) are optional. As always, the model's params endpoint is the authoritative list. Poll the returned URL until completed is true, then read the video URL from data.

Passing files and URLs

aiXplain models reference media by URL, not file upload, on the execute endpoint:

  • Pass a URL as-is. Put the URL string directly in the relevant field (source_audio for ASR, image_url.url for vision, or text for a document URL). The platform/provider fetches it server-side, so it must be publicly reachable (or a signed URL). Unreachable or blocked hosts fail with err.invalid_input_data_or_input_url (HTTP 492) or a supplier error such as 502 Bad Gateway.
  • Inline content with a data URI. To send bytes you don't host anywhere, base64-encode them into a data: URI (e.g. data:image/png;base64,...). This is the most reliable option because it needs no external fetch, as shown above for image input.
  • Local files. There's no multipart upload on the execute endpoint. Upload the file to reachable storage first and pass the resulting URL. The Python SDK does this for you (FileUploader), but you can do it with plain REST—see below.

Upload a file via REST

The execute endpoint takes URLs, not file bytes. To send a local file, first push it to aiXplain's temporary storage with a presigned S3 upload, then pass the returned downloadUrl to the model.

Step 1 — request a presigned upload URL. Note the auth header here is Authorization: token …, not x-api-key.

POST https://platform-api.aixplain.com/sdk/file/upload/temp-url
Authorization: token YOUR_API_KEY
Content-Type: application/x-www-form-urlencoded

contentType=audio/mpeg&originalName=audio.mp3

Response:

{
"key": "1/sdk/1780273955103-audio.mp3",
"uploadUrl": "https://s3.amazonaws.com/aixplain-platform-backend-temp/...&Signature=...",
"downloadUrl": "https://s3.amazonaws.com/aixplain-platform-backend-temp/...&Signature=..."
}

Step 2 — PUT the bytes to uploadUrl. The Content-Type must match the contentType you declared in step 1.

curl -X PUT 'PASTE_uploadUrl_HERE' \
-H 'Content-Type: audio/mpeg' \
--data-binary @audio.mp3
# → HTTP 200

Step 3 — pass downloadUrl to the model in the relevant field (source_audio, image_url.url, text, …). It's a signed, publicly reachable URL.

For a permanent (non-expiring) asset instead of temp storage, use POST https://platform-api.aixplain.com/sdk/file/upload-url with contentType, originalName, tags, and license in the body; the upload and reference steps are the same.
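
The three temp-storage steps can be sketched with Python's standard library as follows. The function names are hypothetical, error handling is omitted, and only the endpoint, headers, form fields, and response keys come from the steps above.

```python
import json
import urllib.parse
import urllib.request

TEMP_URL_ENDPOINT = "https://platform-api.aixplain.com/sdk/file/upload/temp-url"


def presign_request(api_key: str, content_type: str,
                    original_name: str) -> urllib.request.Request:
    """Step 1: build the form-encoded request for a presigned upload URL."""
    form = urllib.parse.urlencode(
        {"contentType": content_type, "originalName": original_name})
    return urllib.request.Request(
        TEMP_URL_ENDPOINT,
        data=form.encode("ascii"),
        headers={"Authorization": f"token {api_key}",  # not x-api-key
                 "Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def put_request(upload_url: str, file_bytes: bytes,
                content_type: str) -> urllib.request.Request:
    """Step 2: build the PUT; Content-Type must match step 1."""
    return urllib.request.Request(
        upload_url, data=file_bytes,
        headers={"Content-Type": content_type}, method="PUT")


def upload_temp_file(api_key: str, path: str, content_type: str,
                     original_name: str) -> str:
    """Run steps 1 and 2, returning the downloadUrl to use in step 3."""
    request = presign_request(api_key, content_type, original_name)
    with urllib.request.urlopen(request) as response:
        presigned = json.load(response)
    with open(path, "rb") as handle:
        file_bytes = handle.read()
    with urllib.request.urlopen(
            put_request(presigned["uploadUrl"], file_bytes, content_type)):
        pass  # a 2xx status means the upload succeeded
    return presigned["downloadUrl"]
```

The returned URL can then go straight into the model body, for example as `source_audio` for an ASR model.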

Upload size limits (enforced per file type):

| File type | Limit |
| --- | --- |
| Audio | 50 MB |
| Image | 25 MB |
| Documents (application/*) | 25 MB |
| Video | 300 MB |
| Database (.db, .sqlite, .sqlite3) | 300 MB |
| Other | 50 MB |

These are aiXplain's upload limits; an individual model may impose tighter format or duration limits—check its page in Studio.
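
A client-side pre-check against these limits might look like the sketch below. Several details are assumptions made for illustration: that 1 MB means 1024 × 1024 bytes, that database files are recognized by extension before the MIME type is considered, and the helper names themselves. The server remains the authority on what it accepts.

```python
MB = 1024 * 1024  # assumption: the guide does not say binary or decimal MB

LIMITS_MB = {"audio": 50, "image": 25, "document": 25,
             "video": 300, "database": 300, "other": 50}
DATABASE_EXTENSIONS = (".db", ".sqlite", ".sqlite3")


def upload_category(filename: str, content_type: str) -> str:
    """Map a file to one of the limit categories in the table above."""
    if filename.lower().endswith(DATABASE_EXTENSIONS):
        return "database"  # assumption: extension wins over MIME type
    major = content_type.split("/", 1)[0]
    if major in ("audio", "image", "video"):
        return major
    if major == "application":
        return "document"
    return "other"


def within_upload_limit(filename: str, content_type: str,
                        size_bytes: int) -> bool:
    """Return True if the file size fits its category's limit."""
    return size_bytes <= LIMITS_MB[upload_category(filename, content_type)] * MB
```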

Poll for the result (async models)

When a POST returns data as a URL, poll it until the job finishes.

Poll request:

GET https://models.aixplain.com/api/v1/data/{request_id}
x-api-key: YOUR_API_KEY

While the job is still running, model polls return a minimal payload:

{"completed": false}
note

Async model polling is intentionally sparse—expect little or no progress metadata while completed is false. Keep polling the URL from the start response until completed becomes true, at which point the result is returned in data.


Agents API

Base URL: https://platform-api.aixplain.com

Agent runs are always asynchronous: the POST returns a polling URL, and you GET the result URL until the run finishes.

warning

Send query as a top-level field. Wrapping it in an envelope such as {"data": {"query": "..."}} is rejected with 400 query should not be empty.

Run an agent

Request:

POST https://platform-api.aixplain.com/v2/agents/{agent_id}/run
x-api-key: YOUR_API_KEY
Content-Type: application/json

{
"query": "What is 5 + 5?"
}

Start response:

{
"requestId": "9c018efa-32bc-41df-86c8-787664f9572e",
"data": "https://platform-api.aixplain.com/sdk/agents/9c018efa-32bc-41df-86c8-787664f9572e/result"
}

Poll the URL returned in data for the result.

Run-time parameters

Pass these as top-level fields on the run request to control execution:

| Parameter | Purpose |
| --- | --- |
| maxTokens | Caps tokens generated in the agent's response. |
| maxIterations | Caps the agent's reasoning/tool-use loop. Raise it for tool-heavy agents that hit "max iterations reached". |
| outputFormat | Desired output format, e.g. text, markdown, or json. |

POST https://platform-api.aixplain.com/v2/agents/{agent_id}/run
x-api-key: YOUR_API_KEY
Content-Type: application/json

{
"query": "Summarize the latest AI news.",
"maxTokens": 1024,
"maxIterations": 15,
"outputFormat": "markdown"
}
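
A sketch of building this run request in Python, keeping query at the top level. The function name and the client-side allow-list are hypothetical conveniences; the field names are the ones documented in this guide.

```python
import json
import urllib.request

AGENTS_BASE = "https://platform-api.aixplain.com"
# Top-level fields described in this guide besides query.
ALLOWED_FIELDS = {"maxTokens", "maxIterations", "outputFormat",
                  "sessionId", "history"}


def build_agent_run_request(agent_id: str, api_key: str, query: str,
                            **fields) -> urllib.request.Request:
    """Build (but do not send) a POST that starts an agent run."""
    if not query.strip():
        raise ValueError("query must not be empty")
    unknown = set(fields) - ALLOWED_FIELDS
    if unknown:
        raise ValueError(f"unexpected fields: {sorted(unknown)}")
    body = {"query": query, **fields}  # query stays top-level, never nested
    return urllib.request.Request(
        url=f"{AGENTS_BASE}/v2/agents/{agent_id}/run",
        data=json.dumps(body).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```

Sending it with `urllib.request.urlopen` returns the start response, whose `data` field is the URL to poll.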

Poll for the result

Request:

GET https://platform-api.aixplain.com/sdk/agents/{request_id}/result
x-api-key: YOUR_API_KEY

Completed response:

{
"completed": true,
"status": "SUCCESS",
"data": {
"input": "What is 5 + 5?",
"output": "10",
"session_id": "6a0fc1f51ea81106dbb8e35d_20260531235330",
"intermediate_steps": [],
"plan": [],
"executionStats": {},
"runTime": 1.42,
"usedCredits": 0.0001
}
}

The agent's answer is in data.output. A reusable conversation handle is in data.session_id (see below). The payload also includes intermediate_steps, plan, and executionStats describing the agent's reasoning and tool calls. Keep polling while status is IN_PROGRESS; stop on SUCCESS or FAILED.

Multi-turn conversations

There are two independent ways to give an agent prior context. Pick one per request.

Option A: platform-managed session

Omit sessionId on the first run, read data.session_id from the result, then pass it back on later runs. The platform stores the conversation for you.

First run (no sessionId):

POST https://platform-api.aixplain.com/v2/agents/{agent_id}/run
x-api-key: YOUR_API_KEY
Content-Type: application/json

{
"query": "My name is Sam."
}

Follow-up run, reusing the session_id returned by the first run:

POST https://platform-api.aixplain.com/v2/agents/{agent_id}/run
x-api-key: YOUR_API_KEY
Content-Type: application/json

{
"query": "What is my name?",
"sessionId": "SESSION_ID_FROM_PREVIOUS_RESPONSE"
}
note

The request field is sessionId (camelCase); the value comes from data.session_id (snake_case) in the previous run's result.
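
The snake_case to camelCase hand-off can be sketched as a small helper. `follow_up_body` is a hypothetical name; it takes the completed result of the previous run and returns the body for the next one.

```python
def follow_up_body(previous_result: dict, query: str) -> dict:
    """Build the next run's body, reusing the previous run's session."""
    session_id = previous_result["data"]["session_id"]  # snake_case in results
    return {"query": query, "sessionId": session_id}    # camelCase in requests
```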

Option B — client-managed history

Send the prior turns explicitly as a history array. Use this when you keep the conversation state on your side rather than relying on a session.

Request:

POST https://platform-api.aixplain.com/v2/agents/{agent_id}/run
x-api-key: YOUR_API_KEY
Content-Type: application/json

{
"query": "Continue from where we left off",
"history": [
{"role": "user", "content": "Help me plan a project"},
{"role": "assistant", "content": "Sure! What kind of project?"}
]
}
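
If you manage history yourself, a small pair of helpers keeps the bookkeeping straight. This is a sketch with hypothetical names; it assumes, as in the request above, that history holds only prior turns and the new message goes in query.

```python
def history_body(history: list, query: str) -> dict:
    """Build a run body that sends prior turns alongside the new query."""
    return {"query": query, "history": list(history)}


def record_turn(history: list, query: str, answer: str) -> list:
    """Return a new history with the finished exchange appended."""
    return history + [
        {"role": "user", "content": query},
        {"role": "assistant", "content": answer},
    ]
```

After each completed run, call `record_turn` with the query you sent and the agent's `data.output`, then pass the result to `history_body` for the next request.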

Errors

Failed requests return a non-2xx HTTP status with a message describing the problem.

| Status | Typical cause | Example body |
| --- | --- | --- |
| 400 | Malformed body or missing required field (e.g. an empty or wrongly-nested query). | {"message":["query should not be empty"],"error":"Bad Request","statusCode":400} |
| 401 | Missing or invalid x-api-key. | {"error":"Invalid api key. Please generate a new api key and try again."} |
| 492 | Input URL could not be fetched (unreachable or blocked host). | {"error":"err.invalid_input_data_or_input_url"} |

A model can also return HTTP 201 with "status": "FAILED" when the upstream provider rejects the input—for example an unfetchable image URL (code: "invalid_image_url") or a missing required parameter. For asynchronous runs, a job that starts successfully can still finish FAILED. Always check the final status, not just the HTTP code of the initial POST.
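
Both failure paths can be folded into one check, sketched below. `AixplainError` and `raise_for_failure` are hypothetical names, and reading the failure reason from a top-level `code` or `error` key is an assumption; inspect a real FAILED payload to see where the detail actually lives.

```python
class AixplainError(Exception):
    """Hypothetical exception type used by this sketch."""


def raise_for_failure(http_status: int, body: dict) -> dict:
    """Return `body` unchanged, or raise if the call or the job failed."""
    if not 200 <= http_status < 300:
        detail = body.get("message") or body.get("error") or "unknown error"
        if isinstance(detail, list):  # 400 bodies carry a list of messages
            detail = "; ".join(detail)
        raise AixplainError(f"HTTP {http_status}: {detail}")
    if body.get("status") == "FAILED":
        # Assumption: the reason is exposed as a top-level code or error.
        reason = body.get("code") or body.get("error") or "no detail"
        raise AixplainError(f"job failed: {reason}")
    return body
```

Run it on the initial POST response and again on the final polled envelope, since an asynchronous job can start cleanly and still end FAILED.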