Version: v2.0

Rate Limiting

Rate limiting lets workspace Admins and Owners cap LLM usage — requests, tokens, and credit spend — before a runaway agent or misconfigured app drains your budget. An admin key sets limits on member keys; member keys cannot modify limits or inspect other keys.

How it works

Rate limits are enforced by the Access layer inside AgenticOS on every request that hits a limited model, regardless of which application or agent originated the call.

Request → Access layer → check the key's limits → allow or reject

If a request exceeds a configured limit, it is rejected immediately (not queued) and the API returns a rate-limit error (see Detect rate-limit errors).
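
The reject-instead-of-queue behaviour can be pictured with a small fixed-window counter. This is only an illustrative sketch: the Access layer's actual algorithm is not described in this guide, and `FixedWindowLimiter` and its fields are hypothetical names, not part of the aiXplain SDK.

```python
from dataclasses import dataclass, field


@dataclass
class FixedWindowLimiter:
    """Hypothetical per-key request limiter with one fixed window."""

    limit: int           # requests allowed per window
    window_seconds: int  # window length, e.g. 60 for a per-minute limit
    _window_id: int = field(default=-1, init=False)
    _count: int = field(default=0, init=False)

    def allow(self, now: float) -> bool:
        """Return True to admit the request, False to reject it immediately."""
        window_id = int(now // self.window_seconds)
        if window_id != self._window_id:
            # A new window has started, so the counter resets.
            self._window_id = window_id
            self._count = 0
        if self._count >= self.limit:
            return False  # rejected, never queued
        self._count += 1
        return True


limiter = FixedWindowLimiter(limit=2, window_seconds=60)
decisions = [limiter.allow(now=t) for t in (0.0, 1.0, 2.0, 61.0)]
print(decisions)  # [True, True, False, True]
```

The third call is refused because the window is already full, and the fourth succeeds only because a new window has begun.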

Who can configure rate limits

Role	Can create admin keys	Can set rate limits
Owner	Yes	Yes
Admin	Yes	Yes
Member	No	No

What you can limit

Limits are attached to a key and can be set globally (across all models on the key) or per model:

Limit	What it controls
`request_per_minute` / `request_per_day`	API calls allowed per 60-second window / per 24-hour period
`token_per_minute` / `token_per_day`	Tokens (input + output, or output-only via `token_type`) per minute / per day
`budget`	Total credits available to the key

All limits can apply simultaneously — whichever threshold is reached first triggers enforcement.
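
As a rough illustration of "whichever threshold is reached first", the sketch below checks every configured limit and reports the first one a request would exceed. The helper `first_exceeded` and the idea of comparing a request's cost before admitting it are assumptions made for the example; the guide does not specify how the platform orders or times these checks.

```python
from typing import Optional


def first_exceeded(limits: dict, usage: dict, cost: dict) -> Optional[str]:
    """Return the name of the first limit this request would exceed, or None."""
    for name, cap in limits.items():
        if cap is None:
            continue  # no cap configured for this dimension
        if usage.get(name, 0) + cost.get(name, 0) > cap:
            return name
    return None


# Hypothetical caps, current usage, and the cost of one incoming request.
limits = {"request_per_minute": 2, "token_per_minute": 500, "token_per_day": 5000}
usage = {"request_per_minute": 1, "token_per_minute": 450, "token_per_day": 900}
cost = {"request_per_minute": 1, "token_per_minute": 80, "token_per_day": 80}

blocked_by = first_exceeded(limits, usage, cost)
print(blocked_by)  # token_per_minute
```

Here the request count and the daily token total still have headroom, but the per-minute token cap does not, so that limit is the one that triggers.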

A limit set on a model applies across every invocation path: the REST API, the SDK, LLMs used as an agent's backbone (including team agents), and LLMs inside multi-agent pipelines. Capping a model covers all agents and applications in the workspace that use it, not just one app.

Setup

pip install aixplain

from aixplain import Aixplain

# Member key — for inference and monitoring your own usage
aix = Aixplain(api_key="YOUR_MEMBER_API_KEY")

# Admin key — for creating and managing limits on other keys
aix_admin = Aixplain(api_key="YOUR_ADMIN_API_KEY")

Create an admin key in Console → Settings → API Keys. Admin keys cannot be used for inference.

Search existing keys

Member keys cannot list other keys — calling search() with a member key raises Forbidden. Admin keys can list all keys in the workspace.

# Admin can list all keys
result = aix_admin.APIKey.search()
for key in result["results"]:
    print(key.name, key.id)

Use get_by_access_key() when you have the key string itself (not the ID) and need to inspect or update it.

target = aix_admin.APIKey.get_by_access_key("TARGET_MEMBER_API_KEY")

print("id:", target.id)
print("name:", target.name)
print("is_admin:", target.is_admin)
print("global_limits:", target.global_limits.to_dict() if target.global_limits else None)
for limit in target.asset_limits:
    print("asset limit:", limit.to_dict())

Create a key with rate limits

Create a member key and attach per-model and global limits in one call.

from datetime import datetime
from aixplain.v2.api_key import APIKeyLimits

timestamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")

new_key = aix_admin.APIKey(
    name=f"member-key-{timestamp}",
    asset_limits=[
        APIKeyLimits(
            model="openai/gpt-5/openai",
            token_per_minute=10,
            token_per_day=30,
            request_per_minute=2,
            request_per_day=2,
        )
    ],
    global_limits=APIKeyLimits(
        token_per_minute=100,
        token_per_day=1000,
        request_per_minute=100,
        request_per_day=1000,
    ),
    budget=1000,
    expires_at=datetime(2030, 1, 1),
).save()

print(new_key.to_dict())

Parameter	Type	Required	Default	Description
`name`	`str`	✅	—	Label for the key.
`asset_limits`	`list[APIKeyLimits]`	—	`[]`	Per-model limits.
`global_limits`	`APIKeyLimits`	—	`None`	Limits across all models on this key.
`budget`	`int`	—	`None`	Total credits available. Increase the value to top up.
`expires_at`	`datetime`	—	`None`	Expiry date. Omit for a non-expiring key.

Note: Global limits and asset-specific limits are both enforced. Global limits do not override per-model limits; whichever is stricter takes effect first.
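
One way to read this rule is that a request needs headroom in both scopes at once. The sketch below models that with plain counters; `within` and `request_allowed` are hypothetical helpers, and treating a missing cap as "never blocks" is an assumption.

```python
from typing import Optional


def within(used: int, cap: Optional[int]) -> bool:
    """A missing cap (None) never blocks."""
    return cap is None or used < cap


def request_allowed(model_used: int, model_cap: Optional[int],
                    global_used: int, global_cap: Optional[int]) -> bool:
    """Both scopes must have headroom; neither overrides the other."""
    return within(model_used, model_cap) and within(global_used, global_cap)


# Caps from the create example: 2 requests/minute on the model, 100 globally.
model_full = request_allowed(model_used=2, model_cap=2, global_used=2, global_cap=100)
global_full = request_allowed(model_used=1, model_cap=2, global_used=100, global_cap=100)
both_ok = request_allowed(model_used=1, model_cap=2, global_used=50, global_cap=100)
print(model_full, global_full, both_ok)  # False False True
```

Note that the global counter includes traffic to every model on the key, so it can fill up even while an individual model is well under its own cap.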

Update rate limits

Modify limits on an existing key and call save(). Updated limits take effect at the start of the next timeframe (next minute for per-minute limits, next day for per-day limits).
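
To estimate when an update becomes active, you can compute the start of the next minute and the next day. The sketch assumes timeframes align to clock boundaries in UTC, which this guide does not state; check the platform's behaviour before relying on it. Both helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def next_minute_start(now: datetime) -> datetime:
    """Start of the minute after `now` (assumes clock-aligned windows)."""
    return now.replace(second=0, microsecond=0) + timedelta(minutes=1)


def next_day_start(now: datetime) -> datetime:
    """Midnight after `now` in the same timezone (assumes day-aligned windows)."""
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight + timedelta(days=1)


saved_at = datetime(2030, 1, 1, 14, 30, 45, tzinfo=timezone.utc)
minute_effective = next_minute_start(saved_at)
day_effective = next_day_start(saved_at)
print(minute_effective)  # 2030-01-01 14:31:00+00:00
print(day_effective)     # 2030-01-02 00:00:00+00:00
```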

If you have the member key string (not the ID), use get_by_access_key() to fetch it first:

from aixplain.v2.api_key import APIKeyLimits

key = aix_admin.APIKey.get_by_access_key("TARGET_MEMBER_API_KEY")
key.asset_limits = [
    APIKeyLimits(
        model="669a63646eb56306647e1091",
        request_per_minute=2,
        request_per_day=5,
        token_per_minute=500,
        token_per_day=5000,
    )
]
key.save()

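Alternatively, list all keys with search() and select the one to update. The example below simply takes the first result; in practice, match on the key's id or name: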

from aixplain.v2.api_key import APIKeyLimits, TokenType

# Fetch the key to update
result = aix_admin.APIKey.search()
key = result["results"][0]

# Update budget
key.budget = 1200

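# Update global limits (assumes the key already has global limits;
# if key.global_limits is None, assign a new APIKeyLimits object instead)
key.global_limits.token_per_minute = 50
key.global_limits.token_per_day = 500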

# Replace asset limits
key.asset_limits = [
    APIKeyLimits(
        model="openai/gpt-5/openai",
        token_per_minute=20 * 15000,
        token_per_day=8 * 60 * (20 * 15000),
        request_per_minute=60,
        request_per_day=8 * 60 * 60,
        token_type=TokenType.OUTPUT,
    ),
    APIKeyLimits(
        model="openai/gpt-5.1/openai",
        token_per_minute=60 * 8000,
        token_per_day=3 * 60 * (60 * 8000),
        request_per_minute=200,
        request_per_day=8 * 60 * 200,
        token_type=TokenType.OUTPUT,
    ),
]

key.save()
print(key.to_dict())

token_type=TokenType.OUTPUT counts only output tokens against the limit. Omit it to count all tokens (input + output).
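
The difference between the two counting modes can be shown with a few lines of arithmetic. `TokenKind` below is a local stand-in for the SDK's `TokenType` so that the sketch runs without the SDK, and `counted_tokens` is a hypothetical helper, not an SDK function.

```python
from enum import Enum
from typing import Optional


class TokenKind(Enum):
    """Local stand-in for the SDK's TokenType; only OUTPUT is modelled here."""
    OUTPUT = "output"


def counted_tokens(input_tokens: int, output_tokens: int,
                   token_type: Optional[TokenKind] = None) -> int:
    """Tokens charged against a limit for one request."""
    if token_type is TokenKind.OUTPUT:
        return output_tokens               # output-only counting
    return input_tokens + output_tokens    # default: input + output


output_only = counted_tokens(1200, 300, TokenKind.OUTPUT)
all_tokens = counted_tokens(1200, 300)
print(output_only, all_tokens)  # 300 1500
```

For prompt-heavy workloads such as long-context retrieval, output-only counting lets far more requests through under the same numeric limit.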

Monitor usage

get_usage_limits() returns daily consumption and configured limits. Members call it on aix.APIKey to check their own key. Admins call it on any key object returned from search() or get_by_access_key().

# All usage rows for your own key (member)
rows = aix.APIKey.get_usage_limits()
for row in rows:
    print(row)

A row with model=None is the global scope — None counts and limits mean no global cap is configured on this key. Ignore it unless you explicitly set global limits.

Pass model to filter to one model:

MODEL_ID = "669a63646eb56306647e1091"  # use IDs, not paths

rows = aix.APIKey.get_usage_limits(model=MODEL_ID)
for row in rows:
    if row.model is None:
        print("No model-specific limits configured")
    else:
        print(
            f"Model {row.model}: "
            f"{row.daily_request_count}/{row.daily_request_limit} daily requests, "
            f"{row.daily_token_count}/{row.daily_token_limit} daily tokens"
        )

Alert on threshold

Poll usage and fire an alert when consumption crosses a threshold.

import time

def usage_alerts(rows, threshold=0.8):
    alerts = []
    for row in rows:
        req_count = getattr(row, "daily_request_count", 0) or 0
        req_limit = getattr(row, "daily_request_limit", 0) or 0
        tok_count = getattr(row, "daily_token_count", 0) or 0
        tok_limit = getattr(row, "daily_token_limit", 0) or 0
        label = getattr(row, "model", None) or "global"
        if req_limit and req_count / req_limit >= threshold:
            alerts.append(f"{label}: requests {req_count}/{req_limit}")
        if tok_limit and tok_count / tok_limit >= threshold:
            alerts.append(f"{label}: tokens {tok_count}/{tok_limit}")
    return alerts

# Poll three times, one minute apart (reuses aix and MODEL_ID from the examples above)
for _ in range(3):
    rows = aix.APIKey.get_usage_limits(model=MODEL_ID)
    alerts = usage_alerts(rows, threshold=0.8)
    if alerts:
        for alert in alerts:
            print("ALERT:", alert)
    else:
        print("Usage within threshold")
    time.sleep(60)

Detect rate-limit errors

Rate-limit enforcement surfaces as HTTP 497 (aiXplain per-minute limit) or 429 (standard). Check for both when calling models on a rate-limited key.

from aixplain import Aixplain

aix = Aixplain(api_key="YOUR_MEMBER_API_KEY")
model = aix.Model.get("669a63646eb56306647e1091")

try:
    result = model.run(text="Hello")
    print(result.data)
except Exception as exc:
    code = getattr(exc, "status_code", None)
    if code in (429, 497):
        print("Rate limit hit — back off and retry")
    else:
        raise
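
The "back off and retry" step can be wrapped in a small helper. This is a minimal sketch, not an SDK feature: `call_with_backoff` is a hypothetical name, and it assumes, as the example above does, that the raised exception exposes a `status_code` attribute. The demo uses a fake error class so it runs without network access.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")
RATE_LIMIT_CODES = (429, 497)


def call_with_backoff(call: Callable[[], T], max_attempts: int = 4,
                      base_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry `call` on rate-limit errors, doubling the delay each time."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as exc:
            code = getattr(exc, "status_code", None)
            is_last = attempt == max_attempts - 1
            if code not in RATE_LIMIT_CODES or is_last:
                raise  # not a rate limit, or out of attempts
            sleep(base_delay * 2 ** attempt)
    raise ValueError("max_attempts must be at least 1")


class FakeRateLimitError(Exception):
    """Test double standing in for whatever the SDK raises."""
    status_code = 429


attempts = []

def flaky() -> str:
    attempts.append(1)
    if len(attempts) < 3:
        raise FakeRateLimitError("slow down")
    return "ok"


delays = []
result = call_with_backoff(flaky, sleep=delays.append)  # record delays instead of sleeping
print(result, delays)  # ok [1.0, 2.0]
```

In real use you would pass something like `lambda: model.run(text="Hello")`. Backing off helps with per-minute limits; once a per-day limit is exhausted, retries will keep failing until the next day.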

Delete a key

key.delete()

Deletion is immediate and irreversible. Any application using the key will receive authentication errors.

Rate limits vs. workspace credits

Rate limits and credits are two separate controls that work together:

	Rate limits	Credits
Set by	Admin / Owner via an admin key	Purchased by the Owner
Enforced by	Access layer (per key, per model)	Platform-wide billing
What it blocks	Requests over the limit	Any request when credits reach 0
Resets	Per minute / per day	When credits are topped up

Use rate limits for predictable spend control, and credits for a hard workspace-wide cap.

Common use cases

Cost control. Prevent runaway costs from misconfigured agents or traffic spikes: set a daily token cap on expensive models, a credit budget on production keys, or strict per-minute request limits on test environments.

Fair resource sharing. Cap requests per minute on shared models so a single application or team member can't consume disproportionate capacity.

Compliance and governance. Limit access to high-cost or sensitive models via budgets, and pair rate limits with Inspector policies for full runtime governance. Separate admin keys per environment (dev, staging, prod) create audit-friendly boundaries.

Gradual rollout. Deploy a new agent behind low RPM limits, monitor usage, and raise the limits incrementally as it proves stable.

Related

API Keys — Creating and managing API keys
Credits and Billing — Workspace credit management
Team Management — Roles and permissions
