Version: v2.0

Rate Limiting

Rate limiting lets workspace Admins and Owners cap LLM usage — requests, tokens, and credit spend — before a runaway agent or misconfigured app drains your budget. An admin key sets limits on member keys; member keys cannot modify limits or inspect other keys.

How it works

Rate limits are enforced by the Access layer inside AgenticOS on every request that hits a limited model, regardless of which application or agent originated the call.

Request → Access layer → check the key's limits → allow or reject

If a request exceeds a configured limit, it is rejected immediately (not queued) and the API returns a rate-limit error (see Detect rate-limit errors).
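The allow-or-reject flow above can be pictured with a small fixed-window counter. This is a minimal sketch for intuition only: the class name, the injectable clock, and the fixed-window strategy are assumptions, not the actual Access layer implementation.

```python
import time


class FixedWindowLimiter:
    """Illustrative fixed-window request counter (hypothetical, not the Access layer)."""

    def __init__(self, limit, window_seconds=60, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock
        self.window_start = clock()
        self.count = 0

    def allow(self):
        now = self.clock()
        # Start a fresh window once the current one has elapsed.
        if now - self.window_start >= self.window_seconds:
            self.window_start = now
            self.count = 0
        # Over the cap: reject right away, nothing is queued.
        if self.count >= self.limit:
            return False
        self.count += 1
        return True
```

With `limit=2`, the third call inside the same 60-second window returns `False`, and calls are accepted again once a new window starts.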

Who can configure rate limits

| Role | Can create admin keys | Can set rate limits |
| --- | --- | --- |
| Owner | Yes | Yes |
| Admin | Yes | Yes |
| Member | No | No |

What you can limit

Limits are attached to a key and can be set globally (across all models on the key) or per model:

| Limit | What it controls |
| --- | --- |
| request_per_minute / request_per_day | API calls allowed per 60-second window / per 24-hour period |
| token_per_minute / token_per_day | Tokens (input + output, or output-only via token_type) per minute / per day |
| budget | Total credits available to the key |

All limits can apply simultaneously — whichever threshold is reached first triggers enforcement.
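As a rough illustration of several limits applying at once, the sketch below compares a usage snapshot against each configured cap and reports the ones that have been reached. The function name and the dictionary layout are hypothetical; only the limit names come from the table above.

```python
def reached_limits(usage, limits):
    """Return the names of every configured limit that usage has reached."""
    return [
        name
        for name, cap in limits.items()
        # A cap of None means that dimension is not limited.
        if cap is not None and usage.get(name, 0) >= cap
    ]


limits = {"request_per_minute": 2, "token_per_minute": 10, "budget": 1000}
usage = {"request_per_minute": 1, "token_per_minute": 10, "budget": 12}
print(reached_limits(usage, limits))  # ['token_per_minute']
```

Here the request count and the budget still have room, but the token cap has been hit, so the next request would be rejected.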

A limit set on a model applies across every invocation path: the REST API, the SDK, LLMs used as an agent's backbone (including team agents), and LLMs inside multi-agent pipelines. Capping a model covers all agents and applications in the workspace that use it, not just one app.

Setup

pip install aixplain

from aixplain import Aixplain

# Member key — for inference and monitoring your own usage
aix = Aixplain(api_key="YOUR_MEMBER_API_KEY")

# Admin key — for creating and managing limits on other keys
aix_admin = Aixplain(api_key="YOUR_ADMIN_API_KEY")

Create an admin key in Console → Settings → API Keys. Admin keys cannot be used for inference.

Search existing keys

Member keys cannot list other keys — calling search() with a member key raises Forbidden. Admin keys can list all keys in the workspace.

# Admin can list all keys
result = aix_admin.APIKey.search()
for key in result["results"]:
    print(key.name, key.id)

Use get_by_access_key() when you have the key string itself (not the ID) and need to inspect or update it.

target = aix_admin.APIKey.get_by_access_key("TARGET_MEMBER_API_KEY")

print("id:", target.id)
print("name:", target.name)
print("is_admin:", target.is_admin)
print("global_limits:", target.global_limits.to_dict() if target.global_limits else None)
for limit in target.asset_limits:
    print("asset limit:", limit.to_dict())

Create a key with rate limits

Create a member key and attach per-model and global limits in one call.

from datetime import datetime
from aixplain.v2.api_key import APIKeyLimits

timestamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")

new_key = aix_admin.APIKey(
    name=f"member-key-{timestamp}",
    asset_limits=[
        APIKeyLimits(
            model="openai/gpt-5/openai",
            token_per_minute=10,
            token_per_day=30,
            request_per_minute=2,
            request_per_day=2,
        )
    ],
    global_limits=APIKeyLimits(
        token_per_minute=100,
        token_per_day=1000,
        request_per_minute=100,
        request_per_day=1000,
    ),
    budget=1000,
    expires_at=datetime(2030, 1, 1),
).save()

print(new_key.to_dict())
| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | str | Yes | | Label for the key. |
| asset_limits | list[APIKeyLimits] | No | [] | Per-model limits. |
| global_limits | APIKeyLimits | No | None | Limits across all models on this key. |
| budget | int | No | None | Total credits available. Increase the value to top up. |
| expires_at | datetime | No | None | Expiry date. Omit for a non-expiring key. |
Note: Global limits and asset-specific limits are both enforced. Global limits do not override per-model limits; whichever is stricter takes effect first.
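One way to read this rule is as two independent checks that must both pass. The sketch below is a simplified model under that assumption; the function name and arguments are hypothetical, and a cap of `None` stands for "not configured".

```python
def within_caps(model_used, model_cap, global_used, global_cap):
    """True only if neither the per-model cap nor the global cap has been reached."""

    def ok(used, cap):
        # No cap configured means this scope never blocks.
        return cap is None or used < cap

    return ok(model_used, model_cap) and ok(global_used, global_cap)
```

For example, with a per-model cap of 2 requests per minute and a global cap of 100, the model cap blocks first at `within_caps(2, 2, 50, 100)`, while heavy traffic on other models can still exhaust the global cap and block a model that is under its own limit.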

Update rate limits

Modify limits on an existing key and call save(). Updated limits take effect at the start of the next timeframe (next minute for per-minute limits, next day for per-day limits).
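To estimate when an updated limit becomes active, you can compute the start of the next window. This sketch assumes windows align to clock minutes and to UTC midnight, which the text above does not confirm; treat the boundaries and the function name as assumptions.

```python
from datetime import datetime, timedelta, timezone


def next_window_start(now, period):
    """Start of the next minute or day after `now` (assumed clock-aligned, UTC)."""
    if period == "minute":
        return now.replace(second=0, microsecond=0) + timedelta(minutes=1)
    if period == "day":
        midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
        return midnight + timedelta(days=1)
    raise ValueError(f"unknown period: {period!r}")


now = datetime.now(timezone.utc)
print("per-minute limits apply from:", next_window_start(now, "minute"))
print("per-day limits apply from:", next_window_start(now, "day"))
```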

If you have the member key string (not the ID), use get_by_access_key() to fetch it first:

from aixplain.v2.api_key import APIKeyLimits

key = aix_admin.APIKey.get_by_access_key("TARGET_MEMBER_API_KEY")
key.asset_limits = [
    APIKeyLimits(
        model="669a63646eb56306647e1091",
        request_per_minute=2,
        request_per_day=5,
        token_per_minute=500,
        token_per_day=5000,
    )
]
key.save()

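Or, if you have the key's ID, list all keys and select the matching one:

from aixplain.v2.api_key import APIKeyLimits, TokenType

# Fetch the key to update by matching its ID ("TARGET_KEY_ID" is a placeholder)
result = aix_admin.APIKey.search()
key = next(k for k in result["results"] if k.id == "TARGET_KEY_ID")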

# Update budget
key.budget = 1200

# Update global limits (assumes global limits are already set on this key)
key.global_limits.token_per_minute = 50
key.global_limits.token_per_day = 500

# Replace asset limits
key.asset_limits = [
    APIKeyLimits(
        model="openai/gpt-5/openai",
        token_per_minute=20 * 15000,
        token_per_day=8 * 60 * (20 * 15000),
        request_per_minute=60,
        request_per_day=8 * 60 * 60,
        token_type=TokenType.OUTPUT,
    ),
    APIKeyLimits(
        model="openai/gpt-5.1/openai",
        token_per_minute=60 * 8000,
        token_per_day=3 * 60 * (60 * 8000),
        request_per_minute=200,
        request_per_day=8 * 60 * 200,
        token_type=TokenType.OUTPUT,
    ),
]

key.save()
print(key.to_dict())

token_type=TokenType.OUTPUT counts only output tokens against the limit. Omit it to count all tokens (input + output).
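The difference between the two counting modes can be shown with a small helper. This is an illustrative sketch: the function and its `output_only` flag are hypothetical stand-ins for the effect of `token_type`, not part of the SDK.

```python
def counted_tokens(input_tokens, output_tokens, output_only=False):
    """Tokens charged against a token limit for a single request."""
    if output_only:
        # Mirrors the effect described for TokenType.OUTPUT: prompt tokens are free.
        return output_tokens
    return input_tokens + output_tokens


print(counted_tokens(120, 30))                    # 150
print(counted_tokens(120, 30, output_only=True))  # 30
```

Output-only counting suits workloads with long prompts and short completions, where counting input tokens would exhaust the cap much sooner.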

Monitor usage

get_usage_limits() returns daily consumption and configured limits. Members call it on aix.APIKey to check their own key. Admins call it on any key object returned from search() or get_by_access_key().

# All usage rows for your own key (member)
rows = aix.APIKey.get_usage_limits()
for row in rows:
    print(row)

A row with model=None represents the global scope. If its counts and limits are None, no global cap is configured on this key; ignore the row unless you explicitly set global limits.
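A small helper can turn a usage row into remaining quota while handling missing limits. The sketch below assumes each row exposes `daily_request_count`, `daily_request_limit`, `daily_token_count`, and `daily_token_limit`; the helper itself is hypothetical, and `SimpleNamespace` only stands in for a real row.

```python
from types import SimpleNamespace


def remaining(row):
    """Return (requests_left, tokens_left); None means no cap is configured."""

    def left(count, limit):
        if limit is None:
            return None
        # Clamp at zero so an overshoot never shows as negative quota.
        return max(limit - (count or 0), 0)

    return (
        left(row.daily_request_count, row.daily_request_limit),
        left(row.daily_token_count, row.daily_token_limit),
    )


sample = SimpleNamespace(
    model="669a63646eb56306647e1091",
    daily_request_count=3,
    daily_request_limit=5,
    daily_token_count=1200,
    daily_token_limit=5000,
)
print(remaining(sample))  # (2, 3800)
```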

Pass model to filter to one model:

MODEL_ID = "669a63646eb56306647e1091"  # use IDs, not paths

rows = aix.APIKey.get_usage_limits(model=MODEL_ID)
for row in rows:
    if row.model is None:
        print("No model-specific limits configured")
    else:
        print(
            f"Model {row.model}: "
            f"{row.daily_request_count}/{row.daily_request_limit} daily requests, "
            f"{row.daily_token_count}/{row.daily_token_limit} daily tokens"
        )


Alert on threshold

Poll usage and fire an alert when consumption crosses a threshold.

import time

def usage_alerts(rows, threshold=0.8):
    alerts = []
    for row in rows:
        req_count = getattr(row, "daily_request_count", 0) or 0
        req_limit = getattr(row, "daily_request_limit", 0) or 0
        tok_count = getattr(row, "daily_token_count", 0) or 0
        tok_limit = getattr(row, "daily_token_limit", 0) or 0
        label = getattr(row, "model", None) or "global"
        if req_limit and req_count / req_limit >= threshold:
            alerts.append(f"{label}: requests {req_count}/{req_limit}")
        if tok_limit and tok_count / tok_limit >= threshold:
            alerts.append(f"{label}: tokens {tok_count}/{tok_limit}")
    return alerts

for _ in range(3):
    rows = aix.APIKey.get_usage_limits(model=MODEL_ID)
    alerts = usage_alerts(rows, threshold=0.8)
    if alerts:
        for alert in alerts:
            print("ALERT:", alert)
    else:
        print("Usage within threshold")
    time.sleep(60)

Detect rate-limit errors

Rate-limit enforcement surfaces as HTTP 497 (aiXplain per-minute limit) or 429 (standard). Check for both when calling models on a rate-limited key.

from aixplain import Aixplain

aix = Aixplain(api_key="YOUR_MEMBER_API_KEY")
model = aix.Model.get("669a63646eb56306647e1091")

try:
    result = model.run(text="Hello")
    print(result.data)
except Exception as exc:
    code = getattr(exc, "status_code", None)
    if code in (429, 497):
        print("Rate limit hit — back off and retry")
    else:
        raise
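The "back off and retry" step can be wrapped in a small helper. This is a minimal sketch, assuming the raised exception carries a `status_code` attribute as in the example above; the helper name, retry count, and delays are illustrative choices, not SDK behavior.

```python
import random
import time

RATE_LIMIT_CODES = (429, 497)


def call_with_backoff(fn, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying with exponential backoff on rate-limit errors only."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            code = getattr(exc, "status_code", None)
            # Re-raise anything that is not a rate limit, or the final failure.
            if code not in RATE_LIMIT_CODES or attempt == max_attempts - 1:
                raise
            # Wait 1s, 2s, 4s, ... plus a little jitter to avoid synchronized retries.
            sleep(base_delay * 2 ** attempt + random.uniform(0, 0.1))
```

Usage would look like `call_with_backoff(lambda: model.run(text="Hello"))`. Short backoff only helps with per-minute limits; once a per-day limit or the budget is exhausted, retrying within seconds will keep failing.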

Delete a key

key.delete()

Deletion is immediate and irreversible. Any application using the key will receive authentication errors.

Rate limits vs. workspace credits

Rate limits and credits are two separate controls that work together:

| | Rate limits | Credits |
| --- | --- | --- |
| Set by | Admin / Owner via an admin key | Purchased by the Owner |
| Enforced by | Access layer (per key, per model) | Platform-wide billing |
| What it blocks | Requests over the limit | Any request when credits reach 0 |
| Resets | Per minute / per day | When credits are topped up |

Use rate limits for predictable spend control, and credits for a hard workspace-wide cap.

Common use cases

Cost control. Prevent runaway costs from misconfigured agents or traffic spikes: set a daily token cap on expensive models, a credit budget on production keys, or strict RPM limits on test environments.

Fair resource sharing. Cap requests per minute on shared models so a single application or team member can't consume disproportionate capacity.

Compliance and governance. Limit access to high-cost or sensitive models via budgets, and pair rate limits with Inspector policies for full runtime governance. Separate admin keys per environment (dev, staging, prod) create audit-friendly boundaries.

Gradual rollout. Deploy a new agent behind low RPM limits, monitor usage, and raise the limits incrementally as it proves stable.
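For a gradual rollout, it can help to plan the sequence of RPM values in advance. The sketch below generates one possible ramp by multiplying the limit at each step; the function name, the doubling factor, and the example numbers are assumptions, and each step would still be applied by updating request_per_minute on the key and calling save().

```python
def rpm_ramp(start, target, factor=2):
    """Return increasing RPM values from start up to and including target."""
    if start <= 0 or factor <= 1:
        raise ValueError("start must be positive and factor greater than 1")
    steps = []
    value = start
    while value < target:
        steps.append(value)
        value *= factor
    # Always finish exactly on the target limit.
    steps.append(target)
    return steps


print(rpm_ramp(10, 100))  # [10, 20, 40, 80, 100]
```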