API Requests
This guide shows how to call aiXplain's production REST API directly—without the SDK—for Models and Agents. Use it when you're integrating from a language we don't have an SDK for, or when you want full control over the raw HTTP calls.
Prefer a standardized interface? aiXplain assets are also reachable over the Model Context Protocol—see MCP Servers for another way to access models and agents from MCP-compatible clients.
Authentication
Every request requires your API key in the x-api-key header. Requests with a body also need Content-Type: application/json.
x-api-key: YOUR_API_KEY
Content-Type: application/json
Your API key is the workspace (team) API key from studio.aixplain.com. Keep it server-side—never expose it in client-side code.
Most endpoints (model/agent execution, polling, discovery) authenticate with x-api-key: YOUR_API_KEY. The file-upload endpoints (/sdk/file/upload/temp-url, /sdk/file/upload-url) instead expect Authorization: token YOUR_API_KEY. It's the same key, but the header name and format differ—an easy thing to trip on. See Upload a file via REST.
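The header split described above can be captured in one small helper, so callers never pick the wrong style by hand. This is a minimal sketch: the function name `auth_headers` and the `json_body` flag are illustrative and not part of any aiXplain API; only the header names and the two upload paths come from this guide.

```python
from urllib.parse import urlparse

# The two file-upload paths named in this guide; everything else uses x-api-key.
UPLOAD_PATHS = {"/sdk/file/upload/temp-url", "/sdk/file/upload-url"}


def auth_headers(api_key: str, url: str, json_body: bool = False) -> dict:
    """Return the auth headers for `url` (hypothetical helper, not an aiXplain API)."""
    if urlparse(url).path in UPLOAD_PATHS:
        # Same key, different header name and format.
        headers = {"Authorization": f"token {api_key}"}
    else:
        headers = {"x-api-key": api_key}
    if json_body:
        # Only requests that carry a JSON body need this.
        headers["Content-Type"] = "application/json"
    return headers
```

Passing the full request URL rather than a flag keeps the decision tied to the endpoint itself, which is where the two styles actually differ.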
Requests are subject to per-workspace request, token, and credit limits. See Rate Limiting for how limits are enforced and configured.
How execution works
aiXplain endpoints respond in one of two ways:
- Synchronous: fast models (most LLMs, cloud TTS/ASR) return the result directly in the POST response with "completed": true.
- Asynchronous: longer-running models (video generation, large speech jobs) and all agents return a polling URL in data instead of the result. You then GET that URL until the job finishes.
You don't choose the mode—the model decides. Always inspect the response: if completed is true, the result is already in data; if data is a URL, poll it.
An asynchronous POST returns a status envelope like this:
{
"status": "IN_PROGRESS",
"completed": false,
"data": "https://models.aixplain.com/api/v1/data/<REQUEST_ID>"
}
| Field | Meaning |
|---|---|
| status | IN_PROGRESS, SUCCESS, or FAILED. |
| completed | false while running, true once finished. |
| data | While running: the URL to poll. Once finished: the result payload. |
Always poll the exact URL returned in the data field rather than constructing the path yourself. The polling host and version (e.g. /api/v1/data/ vs /api/v2/data/) can differ by service, so trusting the returned URL keeps your client correct.
A typical async client loops: POST → read data URL → GET it every few seconds → stop when completed is true (or status is SUCCESS/FAILED).
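The "inspect the response" rule can be sketched as a single decision function. This is an illustration under assumptions: the name `next_step` and the three labels it returns are hypothetical, and treating any `FAILED` status as terminal regardless of `completed` is this sketch's own choice.

```python
def next_step(response: dict) -> tuple:
    """Classify an execute response as ("result", data), ("poll", url) or ("failed", response)."""
    status = response.get("status")
    data = response.get("data")
    if status == "FAILED":
        # Terminal failure: hand the whole envelope back for error reporting.
        return ("failed", response)
    if response.get("completed") is True:
        # Synchronous (or finished) job: data already holds the result.
        return ("result", data)
    if isinstance(data, str) and data.startswith(("https://", "http://")):
        # Asynchronous job: data is the exact URL to poll.
        return ("poll", data)
    raise ValueError("response is neither completed nor pollable")
```

Checking `completed` before looking at the shape of `data` matters because some finished results are themselves URLs, for example a generated audio file.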
Models API
Base URL: https://models.aixplain.com
All models—LLMs, speech, vision, video—are executed through the same endpoint:
POST https://models.aixplain.com/api/v2/execute/{model_id}
What you put in the body, and what you get back, depends on the model's modality. The sections below cover each.
Run a model (LLM)
Request:
POST https://models.aixplain.com/api/v2/execute/669a63646eb56306647e1091
x-api-key: YOUR_API_KEY
Content-Type: application/json
{
"text": "What is 2 + 2?"
}
Response (synchronous):
{
"status": "SUCCESS",
"completed": true,
"data": "4",
"details": [
{"index": 0, "message": {"role": "assistant", "content": "4"}, "finish_reason": "stop"}
],
"runTime": 0.329,
"usedCredits": 3.75e-06,
"usage": {"prompt_tokens": 21, "completion_tokens": 1, "total_tokens": 22},
"asset": {"assetId": "669a63646eb56306647e1091", "id": "openai/gpt-4o-mini/openai"}
}
Output fields
| Field | Description |
|---|---|
| data | The model's answer (string), or, for async jobs, the polling URL. |
| status / completed | Job status; see How execution works. |
| details | Provider-native payload. For chat LLMs, the raw message object(s) with role, content, finish_reason. |
| usage | Token counts (prompt_tokens, completion_tokens, total_tokens). |
| usedCredits | Credits charged for this call. |
| runTime | Server-side execution time in seconds. |
| asset | The resolved model (assetId and human-readable id). |
includeRawData is a model-type-agnostic options flag: it works the same way for LLMs, speech, vision, and any other model. Add "options": {"includeRawData": true} to the request body to receive the backing provider's full, unmodified response in a rawData field, alongside the normalized data/details. The shape of rawData mirrors whatever the supplier returns, so it varies by model and provider. See the ASR example for a concrete payload (segments, tokens, log-probs).
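As a rough illustration, the execute call and the includeRawData option can be issued with Python's standard library alone. The helper names `build_execute_request` and `run_model` are hypothetical; the URL, headers, and body fields are the ones shown in this section.

```python
import json
from urllib.request import Request, urlopen

EXECUTE_URL = "https://models.aixplain.com/api/v2/execute/{model_id}"


def build_execute_request(model_id, api_key, body, include_raw_data=False):
    """Build (but do not send) the POST request for a model run."""
    payload = dict(body)  # copy so the caller's dict is not mutated
    if include_raw_data:
        options = dict(payload.get("options", {}))
        options["includeRawData"] = True
        payload["options"] = options
    return Request(
        EXECUTE_URL.format(model_id=model_id),
        data=json.dumps(payload).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def run_model(model_id, api_key, body, include_raw_data=False):
    """Send the request and return the decoded JSON envelope."""
    req = build_execute_request(model_id, api_key, body, include_raw_data)
    with urlopen(req) as resp:  # raises urllib.error.HTTPError on 4xx/5xx
        return json.load(resp)
```

Splitting construction from sending makes the request easy to inspect or log before any network call is made.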
Generation parameters
Pass model parameters as top-level fields alongside text. The common LLM parameters:
POST https://models.aixplain.com/api/v2/execute/{model_id}
x-api-key: YOUR_API_KEY
Content-Type: application/json
{
"text": "What are the colors of the rainbow?",
"max_tokens": 50,
"temperature": 0.8
}
The exact parameters a model accepts vary by model. To discover them, fetch the model's definition:
GET https://platform-api.aixplain.com/sdk/models/{model_id}
x-api-key: YOUR_API_KEY
The response includes a params array. Each entry tells you the parameter name, whether it's required, its dataType (text, label, number, audio, …), any availableOptions, and defaultValues. This is the source of truth for what a given model accepts—use it before assuming a field exists.
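One way to use the params array is to check a request body against it before sending. The sketch below assumes each entry exposes its name under a `name` key and its required flag under a `required` key; those key names are guesses based on the description above, so confirm them against a real response. The `ignore` default covers the request-level fields `options` and `stream`, which may not be listed as model parameters.

```python
def check_body(params, body, ignore=("options", "stream")):
    """Return (missing_required, unknown_fields) for a request body.

    `params` is the params array from the model definition. The "name" and
    "required" keys are assumed, not confirmed by this guide.
    """
    known = {p["name"] for p in params}
    required = {p["name"] for p in params if p.get("required")}
    missing = sorted(required - set(body))
    unknown = sorted(k for k in body if k not in known and k not in ignore)
    return missing, unknown
```

For example, a body that misspells `temperature` would show up in the second list instead of being silently accepted or rejected by the server.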
Streaming (LLMs)
Add "stream": true to receive tokens as Server-Sent Events instead of one final response. Each event is a data: line carrying an OpenAI-style chat.completion.chunk; the stream ends with data: [DONE].
Request:
POST https://models.aixplain.com/api/v2/execute/{model_id}
x-api-key: YOUR_API_KEY
Content-Type: application/json
{
"text": "Count to 3.",
"stream": true
}
Response (text/event-stream):
data: {"choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}
data: {"choices":[{"index":0,"delta":{"content":"1"},"finish_reason":null}]}
data: {"choices":[{"index":0,"delta":{"content":", 2"},"finish_reason":null}]}
data: {"choices":[{"index":0,"delta":{"content":", 3."},"finish_reason":null}]}
data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}
data: {"choices":[],"usage":{"prompt_tokens":12,"completion_tokens":8,"total_tokens":20}}
data: [DONE]
Read incrementally from choices[0].delta.content. The final non-[DONE] event carries the usage totals.
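A minimal sketch of consuming that stream: it takes an iterable of lines (for instance an open HTTP response, which yields bytes), concatenates the delta content, and captures the usage totals. The function name `collect_stream` is illustrative, and it only handles single-line `data:` events of the kind shown above.

```python
import json


def collect_stream(lines):
    """Return (full_text, usage) from an iterable of SSE lines (str or bytes)."""
    parts = []
    usage = None
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and any non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        event = json.loads(payload)
        if event.get("usage"):
            usage = event["usage"]  # carried by the final non-[DONE] event
        for choice in event.get("choices", []):
            piece = choice.get("delta", {}).get("content")
            if piece:
                parts.append(piece)
    return "".join(parts), usage
```

A live client would typically emit each piece as it arrives instead of joining at the end; the parsing logic is the same.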
Chat history
For conversational LLMs, pass an array of role/content messages as text instead of a plain string.
Request:
POST https://models.aixplain.com/api/v2/execute/{model_id}
x-api-key: YOUR_API_KEY
Content-Type: application/json
{
"text": [
{"role": "user", "content": "Hello!"},
{"role": "assistant", "content": "Hi there! How can I help?"},
{"role": "user", "content": "Tell me a fun fact."}
]
}
Multimodal input (images)
Vision-capable LLMs (e.g. GPT-4o) accept image content using the same text message array. Each message's content becomes an array of typed parts—text parts and image_url parts. The image can be a public URL or an inline base64 data URI.
Request (image URL):
POST https://models.aixplain.com/api/v2/execute/6646261c6eb563165658bbb1
x-api-key: YOUR_API_KEY
Content-Type: application/json
{
"text": [
{
"role": "user",
"content": [
{"type": "text", "text": "What animal is in this image? One word."},
{"type": "image_url", "image_url": {"url": "https://example.com/cat.jpg"}}
]
}
],
"max_tokens": 10
}
Request (inline base64 — works even when the image isn't hosted anywhere):
{
"text": [
{
"role": "user",
"content": [
{"type": "text", "text": "What color is this image? One word."},
{"type": "image_url", "image_url": {"url": "data:image/png;base64,iVBORw0KGgoAAAANS..."}}
]
}
],
"max_tokens": 10
}
Response:
{
"status": "SUCCESS",
"completed": true,
"data": "Red.",
"usage": {"prompt_tokens": 271, "completion_tokens": 2, "total_tokens": 273},
"asset": {"assetId": "6646261c6eb563165658bbb1", "id": "openai/gpt-4o/openai"}
}
With an image URL, the upstream provider fetches it server-side—so the URL must be publicly reachable. Hosts that block automated fetchers return a FAILED status with code: "invalid_image_url". When in doubt, send the image as a base64 data URI, which never depends on an external fetch.
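Building the base64 variant by hand is error-prone, so here is a small sketch that encodes raw bytes into a data URI and wraps it in the message shape shown above. The helper names `to_data_uri` and `vision_body` are hypothetical; the body layout mirrors the request examples in this section.

```python
import base64


def to_data_uri(data: bytes, mime: str) -> str:
    """Encode raw bytes as a data: URI, e.g. data:image/png;base64,...."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def vision_body(prompt: str, image_url: str, max_tokens: int = 10) -> dict:
    """Build a single-turn request body with one text part and one image part."""
    return {
        "text": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
        "max_tokens": max_tokens,
    }
```

Typical use would be reading a local file in binary mode and passing `to_data_uri(file_bytes, "image/png")` as the `image_url` argument; a public https URL works in the same position.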
Text-to-speech (TTS)
Send the text to synthesize. Cloud voices (AWS, Google, Azure) typically run synchronously and return a downloadable audio URL in data.
Request:
POST https://models.aixplain.com/api/v2/execute/618ba6e4e2e1a9153ca2a3a2
x-api-key: YOUR_API_KEY
Content-Type: application/json
{
"text": "The quick brown fox jumps over the lazy dog."
}
Response:
{
"status": "SUCCESS",
"completed": true,
"data": "https://aixplain-modelserving-data.s3.amazonaws.com/<id>.mp3?...signed...",
"runTime": 0.243,
"usedCredits": 0.00026,
"asset": {"assetId": "618ba6e4e2e1a9153ca2a3a2", "id": "aws/speech-synthesis-english-amy/AWS"}
}
data is a signed, time-limited URL to the generated audio—download it before it expires.
Some providers require extra parameters. ElevenLabs voices, for example, need a voice_id; omitting it returns a FAILED status. Check the model's params (see Generation parameters) for required fields like voice_id or language.
Speech-to-text (ASR)
Pass the audio as a URL in source_audio and the spoken language. Cloud ASR models return the transcript synchronously.
Request:
POST https://models.aixplain.com/api/v2/execute/615dd18b6eb56373643b09d1
x-api-key: YOUR_API_KEY
Content-Type: application/json
{
"language": "en",
"source_audio": "https://example.com/audio.mp3"
}
Response:
{
"status": "SUCCESS",
"completed": true,
"data": "the quick brown fox jumps over the lazy dog",
"confidence": 0.955996,
"details": {
"segments": [
{"segment_id": 0, "start_time": 0.1, "end_time": 3.4, "text": "the quick brown fox jumps over the lazy dog", "confidence": 0.955996, "speaker": ""}
]
}
}
The transcript is in data; details.segments carries per-segment timestamps and confidence. Required parameters (language, source_audio) and optional ones (dialect, script, …) vary by model—check its params.
Some ASR models also auto-detect the spoken language regardless of the language you pass.
Adding "options": {"includeRawData": true} (the model-agnostic flag described above) returns the provider's full Whisper payload in rawData—the auto-detected language, audio duration, and per-segment token IDs, avg_logprob, compression_ratio, and no_speech_prob for confidence scoring:
{
"data": "Hello, how are you?",
"rawData": {
"task": "transcribe",
"language": "English",
"duration": 4.56,
"segments": [
{"id": 0, "start": 0.0, "end": 4.56, "text": " Hello, how are you?",
"tokens": [50365, 2425, 11, 577, 366, 291, 30, 50593],
"temperature": 0, "avg_logprob": -0.191, "compression_ratio": 1.11, "no_speech_prob": 0.004}
]
}
}
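As an illustration of the confidence scoring mentioned above, the sketch below converts each segment's avg_logprob into a mean per-token probability with exp() and flags segments that look like silence. It assumes the Whisper-style rawData shape shown here; the function name and the 0.5 no_speech_prob threshold are arbitrary choices for this example, not platform defaults.

```python
import math


def segment_confidence(raw_data: dict, max_no_speech: float = 0.5) -> list:
    """Summarize Whisper-style segments from rawData (hypothetical helper)."""
    summary = []
    for seg in raw_data.get("segments", []):
        summary.append({
            "text": seg["text"].strip(),
            "start": seg["start"],
            "end": seg["end"],
            # exp(mean log-probability) is the geometric mean token probability.
            "mean_token_prob": math.exp(seg["avg_logprob"]),
            # Threshold is an assumption; tune it for your audio.
            "likely_silence": seg["no_speech_prob"] > max_no_speech,
        })
    return summary
```

Because rawData mirrors the supplier's response, this only applies to models backed by a provider that returns these fields.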
Video generation
Long-running generative models (e.g. ByteDance Seedance) run asynchronously: the POST returns a polling URL, and the finished data is a URL to the generated video.
Request:
POST https://models.aixplain.com/api/v2/execute/695ea397253de54a56dc5aa1
x-api-key: YOUR_API_KEY
Content-Type: application/json
{
"text": "A red panda surfing a wave at sunset, cinematic.",
"resolution": "1080p",
"ratio": "16:9",
"duration": 5
}
For this model text is the prompt; resolution (480p/720p/1080p), ratio (16:9, 9:16, 1:1, 4:3, 3:4, 21:9), and duration (seconds) are optional. As always, the model's params endpoint is the authoritative list. Poll the returned URL until completed is true, then read the video URL from data.
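A client can catch typos in these optional fields before spending a request. The sketch below hard-codes the values listed above for this one model, which is a simplification: the params endpoint remains the authoritative source, and the name `video_body` is illustrative.

```python
# Values copied from the description above; other models may differ.
RESOLUTIONS = {"480p", "720p", "1080p"}
RATIOS = {"16:9", "9:16", "1:1", "4:3", "3:4", "21:9"}


def video_body(prompt, resolution=None, ratio=None, duration=None):
    """Build the request body, including only the optional fields that were set."""
    body = {"text": prompt}
    if resolution is not None:
        if resolution not in RESOLUTIONS:
            raise ValueError(f"unsupported resolution: {resolution}")
        body["resolution"] = resolution
    if ratio is not None:
        if ratio not in RATIOS:
            raise ValueError(f"unsupported ratio: {ratio}")
        body["ratio"] = ratio
    if duration is not None:
        if duration <= 0:
            raise ValueError("duration must be a positive number of seconds")
        body["duration"] = duration
    return body
```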
Passing files and URLs
aiXplain models reference media by URL, not file upload, on the execute endpoint:
- Pass a URL as-is. Put the URL string directly in the relevant field (source_audio for ASR, image_url.url for vision, or text for a document URL). The platform/provider fetches it server-side, so it must be publicly reachable (or a signed URL). Unreachable or blocked hosts fail with err.invalid_input_data_or_input_url (HTTP 492) or a supplier error such as 502 Bad Gateway.
- Inline content with a data URI. To send bytes you don't host anywhere, base64-encode them into a data: URI (e.g. data:image/png;base64,...). This is the most reliable option because it needs no external fetch (verified above for image input).
- Local files. There's no multipart upload on the execute endpoint. Upload the file to reachable storage first and pass the resulting URL. The Python SDK does this for you (FileUploader), but you can do it with plain REST; see below.
Upload a file via REST
The execute endpoint takes URLs, not file bytes. To send a local file, first push it to aiXplain's temporary storage with a presigned S3 upload, then pass the returned downloadUrl to the model.
Step 1 — request a presigned upload URL. Note the auth header here is Authorization: token …, not x-api-key.
POST https://platform-api.aixplain.com/sdk/file/upload/temp-url
Authorization: token YOUR_API_KEY
Content-Type: application/x-www-form-urlencoded
contentType=audio/mpeg&originalName=audio.mp3
Response:
{
"key": "1/sdk/1780273955103-audio.mp3",
"uploadUrl": "https://s3.amazonaws.com/aixplain-platform-backend-temp/...&Signature=...",
"downloadUrl": "https://s3.amazonaws.com/aixplain-platform-backend-temp/...&Signature=..."
}
Step 2 — PUT the bytes to uploadUrl. The Content-Type must match the contentType you declared in step 1.
curl -X PUT 'PASTE_uploadUrl_HERE' \
-H 'Content-Type: audio/mpeg' \
--data-binary @audio.mp3
# → HTTP 200
Step 3 — pass downloadUrl to the model in the relevant field (source_audio, image_url.url, text, …). It's a signed, publicly reachable URL.
For a permanent (non-expiring) asset instead of temp storage, use POST https://platform-api.aixplain.com/sdk/file/upload-url with contentType, originalName, tags, and license in the body; the upload and reference steps are the same.
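The temp-storage flow (steps 1 to 3) can be sketched with the standard library as follows. The function names are hypothetical, and the form fields are percent-encoded by urlencode on the assumption that the server decodes standard form encoding; the endpoint, headers, and JSON keys are the ones shown in the steps.

```python
import json
import os
from urllib.parse import urlencode
from urllib.request import Request, urlopen

TEMP_URL_ENDPOINT = "https://platform-api.aixplain.com/sdk/file/upload/temp-url"


def presign_request(api_key, content_type, original_name):
    """Step 1: ask for a presigned upload URL (note the Authorization header)."""
    form = urlencode({"contentType": content_type, "originalName": original_name})
    return Request(
        TEMP_URL_ENDPOINT,
        data=form.encode("ascii"),
        headers={
            "Authorization": f"token {api_key}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


def put_request(upload_url, payload, content_type):
    """Step 2: PUT the bytes; Content-Type must match what step 1 declared."""
    return Request(upload_url, data=payload,
                   headers={"Content-Type": content_type}, method="PUT")


def upload_temp_file(api_key, path, content_type):
    """Run steps 1 and 2 and return the downloadUrl to pass to a model (step 3)."""
    with open(path, "rb") as fh:
        payload = fh.read()
    with urlopen(presign_request(api_key, content_type, os.path.basename(path))) as resp:
        grant = json.load(resp)
    with urlopen(put_request(grant["uploadUrl"], payload, content_type)) as resp:
        resp.read()  # drain the response; a non-2xx status raises HTTPError
    return grant["downloadUrl"]
```

The returned URL then goes into source_audio, image_url.url, or text, exactly as a hosted URL would.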
Upload size limits (enforced per file type):
| File type | Limit |
|---|---|
| Audio | 50 MB |
| Image | 25 MB |
| Documents (application/*) | 25 MB |
| Video | 300 MB |
| Database (.db, .sqlite, .sqlite3) | 300 MB |
| Other | 50 MB |
These are aiXplain's upload limits; an individual model may impose tighter format or duration limits—check its page in Studio.
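A pre-flight size check avoids starting an upload that will be rejected. This sketch makes several assumptions that the table does not settle: 1 MB is taken as 1,000,000 bytes (the stricter reading), the category is inferred from the MIME type's top-level part, and database extensions are checked first because such files often carry a generic application/* type.

```python
import os

# Limits from the table above, in MB.
LIMITS_MB = {"audio": 50, "image": 25, "document": 25,
             "video": 300, "database": 300, "other": 50}
DB_EXTENSIONS = {".db", ".sqlite", ".sqlite3"}
BYTES_PER_MB = 1_000_000  # assumption: decimal megabytes


def file_category(filename: str, content_type: str) -> str:
    """Map a file to one of the table's categories (matching rules are assumed)."""
    if os.path.splitext(filename)[1].lower() in DB_EXTENSIONS:
        return "database"
    major = content_type.split("/", 1)[0].lower()
    if major in ("audio", "image", "video"):
        return major
    if major == "application":
        return "document"
    return "other"


def within_upload_limit(filename: str, content_type: str, size_bytes: int) -> bool:
    """True if the file fits under the limit for its category."""
    limit = LIMITS_MB[file_category(filename, content_type)] * BYTES_PER_MB
    return size_bytes <= limit
```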
Poll for the result (async models)
When a POST returns data as a URL, poll it until the job finishes.
Poll request:
GET https://models.aixplain.com/api/v1/data/{request_id}
x-api-key: YOUR_API_KEY
While the job is still running, model polls return a minimal payload:
{"completed": false}
Async model polling is intentionally sparse—expect little or no progress metadata while completed is false. Keep polling the URL from the start response until completed becomes true, at which point the result is returned in data.
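The polling loop itself can be sketched like this. The fetch function is injected so the loop can be exercised without a network; `http_get_json` shows one way to back it with a real GET. The 3-second interval and 10-minute timeout are arbitrary defaults for illustration, and all function names are hypothetical.

```python
import json
import time
from urllib.request import Request, urlopen


def http_get_json(url, api_key):
    """GET a polling URL with the x-api-key header and decode the JSON body."""
    with urlopen(Request(url, headers={"x-api-key": api_key})) as resp:
        return json.load(resp)


def poll_until_done(url, fetch, interval=3.0, timeout=600.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call fetch(url) repeatedly until the job reports a terminal state.

    `url` must be the exact URL returned in the start response's data field.
    """
    deadline = clock() + timeout
    while True:
        body = fetch(url)
        if body.get("completed") is True or body.get("status") in ("SUCCESS", "FAILED"):
            return body  # the caller inspects status and data
        if clock() >= deadline:
            raise TimeoutError(f"job at {url} did not finish within {timeout} s")
        sleep(interval)
```

With a real key, a call might look like `poll_until_done(poll_url, lambda u: http_get_json(u, api_key))`.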