# Models

Models are the foundation of intelligence in the aiXplain ecosystem. They encapsulate capabilities such as language understanding, translation, summarization, speech processing, and more. Models can be used directly or integrated into agents as tools.
Key Concepts:

- Models are callable assets that accept text, audio, or other modalities.
- They support multiple functions: generation, translation, classification, etc.
- Models can be run synchronously, asynchronously, or in streaming mode.
- Streaming responses yield results incrementally while `status=IN_PROGRESS`.
## Listing Models

To explore all accessible models for your team:

```python
from aixplain.factories import ModelFactory
from aixplain.enums import Function

model_list = ModelFactory.list(
    function=Function.TRANSLATION,
)["results"]

for model in model_list:
    print(model.id, model.name, model.supplier)
```
Alternatively, browse models and other assets in the marketplace.
### Parameters

| Parameter | Type | Description |
|---|---|---|
| `function` | `Function` | Filter by model function (e.g., `TEXT_GENERATION`, `TRANSLATION`, `SUMMARIZATION`). |
| `query` | `str` | Optional keyword to search by name or description. |
| `suppliers` | `Supplier` or `List[Supplier]` | Filter by model supplier (e.g., `OPENAI`, `HUGGINGFACE`). |
| `source_languages` | `Language` or `List[Language]` | Accepted source/input language(s). |
| `target_languages` | `Language` or `List[Language]` | Output language(s); applicable to translation models. |
| `is_finetunable` | `bool` | Whether the model supports fine-tuning. |
| `ownership` | `Tuple[OwnershipType, List[OwnershipType]]` | Ownership filter: `OWNER`, `SUBSCRIBED`, `PUBLIC`. |
| `sort_by` | `SortBy` | Attribute to sort by (e.g., `NAME`, `CREATED_AT`). |
| `sort_order` | `SortOrder` | `ASCENDING` or `DESCENDING`. |
| `page_number` | `int` | Page number for paginated results. |
| `page_size` | `int` | Maximum number of results to return (default: 20). |
| `model_ids` | `List[str]` | Specific model IDs to retrieve. Cannot be combined with other filters. |
| `api_key` | `str` | Optional API key override for authorization. |
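Filters can be combined. A minimal sketch, assuming the enum members shown exist in your SDK version (names such as `Language.English` or `SortBy.CREATION_DATE` are illustrative; check `aixplain.enums` for the exact members):

```python
from aixplain.factories import ModelFactory
from aixplain.enums import Function, Language, SortBy, SortOrder

# Assumption: Language.English / Language.French and SortBy.CREATION_DATE
# are illustrative member names; verify them in aixplain.enums.
results = ModelFactory.list(
    function=Function.TRANSLATION,
    source_languages=Language.English,
    target_languages=Language.French,
    sort_by=SortBy.CREATION_DATE,
    sort_order=SortOrder.DESCENDING,
    page_number=0,
    page_size=10,
)["results"]

for model in results:
    print(model.id, model.name, model.supplier)
```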
## Running a Model

You can run a model by calling `run()` with the desired input:

```python
from aixplain.factories import ModelFactory

model = ModelFactory.get("65c51c556eb563350f6e1bb1")
response = model.run("Latest news about AI agents")
print(response.data.output)
```
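Beyond a plain string, many models accept extra keyword arguments on `run()` (see Streaming Output for LLMs and Custom Model Parameters below). A sketch, assuming an LLM that honours `temperature` and `max_tokens`; parameter support varies by model:

```python
# Assumption: this model accepts generation parameters as keyword arguments.
response = model.run(
    "Summarize the latest news about AI agents in two sentences.",
    temperature=0.3,
    max_tokens=120,
)
# For LLMs the generated text is typically in response.data.
print(response.data)
```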
## Asynchronous Run

For long-running tasks or workflows where blocking isn't ideal, use the asynchronous execution method. The `run_async()` call returns immediately, and you can use `poll()` to track progress.

```python
import time

from aixplain.factories import ModelFactory

model = ModelFactory.get("<MODEL_ID>")
start_response = model.run_async({"text": "<TEXT_TEXT_DATA>"})

# Polling loop: wait for the asynchronous request to complete.
while True:
    result = model.poll(start_response.url)
    if result.get("completed"):
        print(result)
        break
    time.sleep(5)  # Wait 5 seconds before checking the result again.
```
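If you would rather not poll indefinitely, a small variation bounds the wait (the 60-second budget here is arbitrary):

```python
import time

deadline = time.time() + 60  # arbitrary timeout budget for this sketch
while time.time() < deadline:
    result = model.poll(start_response.url)
    if result.get("completed"):
        print(result)
        break
    time.sleep(5)
else:
    # The while loop exhausted its budget without hitting break.
    raise TimeoutError("Model run did not complete before the deadline")
```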
## Streaming Output for LLMs

Streaming allows you to receive tokens incrementally as the LLM generates them, which is ideal for real-time applications. To enable streaming, use the `stream=True` flag when calling `model.run()`. You can also pass other parameters, such as `max_tokens`.

```python
response = model.run("Explain LLMs", stream=True)
for chunk in response:
    print(chunk)
```
Each chunk is a `ModelResponse` object with:

- `status`: The status of the chunk, either `IN_PROGRESS` or `COMPLETED`.
- `data`: The partial output (a token or phrase) generated by the model.

The `data` field may occasionally be empty. These "empty" chunks act as heartbeats, indicating that the connection is still active and more output is expected.
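Putting this together, a sketch that skips heartbeat chunks and assembles the full text (assuming `chunk.data` is a string fragment, as described above):

```python
parts = []
response = model.run("Explain LLMs", stream=True)
for chunk in response:
    if chunk.data:  # empty chunks are heartbeats; skip them
        print(chunk.data, end="", flush=True)
        parts.append(chunk.data)
print()
full_text = "".join(parts)
```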
## Custom Model Parameters

You can customise LLM parameters such as `temperature` and `max_tokens`:

```python
from aixplain.factories import ModelFactory

llm = ModelFactory.get("669a63646eb56306647e1091")  # GPT-4o Mini

original_temperature = llm.model_params.temperature
original_max_tokens = llm.model_params.max_tokens
print(f"Original LLM temperature: {original_temperature}")
print(f"Original LLM max_tokens: {original_max_tokens}")
```

Set custom parameters:

```python
llm.model_params.temperature = 0.2
llm.model_params.max_tokens = 1024
print(f"Custom LLM temperature: {llm.model_params.temperature}")
print(f"Custom LLM max_tokens: {llm.model_params.max_tokens}")
```
## API Requests
For scenarios where you are not using the aiXplain Python SDK, you can interact with models directly via REST API calls. This method is particularly useful for integrating with other programming languages or systems. The API uses a two-step process for asynchronous execution, which is recommended for long-running tasks.
### 1. Execute a Model (POST Request)
This endpoint initiates a model run, returning a unique request ID that you can use to track the job's status and retrieve its result later.
```bash
curl -X POST 'https://models.aixplain.com/api/v1/execute/<model_id>' \
  -H 'x-api-key: AIXPLAIN_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{"data": "Your input data"}'
```
- URL: The URL path includes the specific `<model_id>` for the model you want to run.
- `-H 'x-api-key: AIXPLAIN_API_KEY'`: This header is required for authentication. Replace `AIXPLAIN_API_KEY` with your actual API key.
- `-H 'Content-Type: application/json'`: This header specifies that the data you are sending in the body of the request is in JSON format.
- `-d '{"data": "Your input data"}'`: The `-d` (or `--data`) option sends the input payload to the model. For models that require a single input, use the `"data"` key. For models that accept multiple parameters, provide them as a JSON object, as detailed below.
Example Response:

```json
{
  "completed": false,
  "data": "https://models.aixplain.com/api/v1/data/<requestId>",
  "requestId": "<requestId>"
}
```
- `"completed": false`: Indicates that the request has been submitted but the model is still processing the task asynchronously.
- `"data"`: The URL endpoint to use for the subsequent `GET` request to poll for the final result.
- `"requestId"`: A unique identifier for your execution request.
### 2. Retrieve the Result (GET Request)
After initiating a model run, you can use this endpoint to poll the status of your request and retrieve the final output once the job is completed.
```bash
curl -X GET 'https://models.aixplain.com/api/v1/data/<requestId>' \
  -H 'x-api-key: AIXPLAIN_API_KEY' \
  -H 'Content-Type: application/json'
```
- URL: This URL uses the `<requestId>` obtained from the `POST` request to check the status of a specific job.
Example Response:

```json
{
  "completed": true,
  "data": "The output data from the model",
  "usedCredits": 0.00006,
  "runTime": 1.456,
  "details": {
    "modelSpecificField1": "Value1",
    "modelSpecificField2": "Value2"
    // Additional fields as required by the specific model
  }
}
```
- `"completed": true`: Signifies that the model has finished processing and the result is available in the `"data"` field.
- `"data"`: The final output of the model, which can be text, a URL to a file, or another data type depending on the model's function.
- `"usedCredits"` and `"runTime"`: Metrics on the cost and duration of the execution.
- `"details"`: An optional field containing additional information specific to the model or the execution run.
### Common Input Parameters

The following common parameters are accepted by many models, particularly LLMs, in the `POST` request's JSON payload. Consult the specific model's documentation to see which parameters it supports; a combined example follows the list.

- `"text": "<TEXT_TEXT_DATA>"`: The primary text input for tasks like summarization, translation, or content generation.
- `"prompt": "<PROMPT_TEXT_DATA>"`: A specific prompt or instruction to guide the model's output.
- `"context": "<CONTEXT_TEXT_DATA>"`: Additional context or a system message to set the tone or persona for the model's response.
- `"temperature": "<TEMPERATURE_TEXT_DATA>"`: A float value (e.g., `0.7`) that controls the randomness of the output. Higher values lead to more creative and varied responses.
- `"max_tokens": "<MAX_TOKENS_TEXT_DATA>"`: An integer value specifying the maximum number of tokens to generate in the output. Default is `128`.
- `"top_p": "<TOP_P_TEXT_DATA>"`: A float value (e.g., `0.9`) that controls nucleus sampling, where the model only considers tokens whose cumulative probability sums to `top_p`. Default is `1.0`.
- `"history": "<HISTORY_TEXT_DATA>"`: A list of past messages in a conversation, formatted to maintain context for a chat model.