Rate Limiting
Rate limiting lets workspace Admins and Owners cap LLM usage — requests, tokens, and credit spend — before a runaway agent or misconfigured app drains your budget. An admin key sets limits on member keys; member keys cannot modify limits or inspect other keys.
How it works
Rate limits are enforced by the Access layer inside AgenticOS on every request that hits a limited model, regardless of which application or agent originated the call.
Request → Access layer → check the key's limits → allow or reject
If a request exceeds a configured limit, it is rejected immediately (not queued) and the API returns a rate-limit error (see Detect rate-limit errors).
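The reject-rather-than-queue behavior described above can be illustrated with a small counter. This is only a sketch: the class name `FixedWindowLimiter` and its fields are hypothetical, and it assumes a fixed 60-second window, which may differ from how the Access layer actually tracks windows.

```python
from dataclasses import dataclass, field


@dataclass
class FixedWindowLimiter:
    """Hypothetical stand-in for a per-key request limit; not the real Access layer."""

    limit: int                 # requests allowed per window
    window_seconds: int = 60   # assumed window length for a per-minute limit
    _window_id: int = field(default=-1, init=False)
    _count: int = field(default=0, init=False)

    def allow(self, now: float) -> bool:
        """Return True if the request fits in the current window, else reject at once."""
        window_id = int(now // self.window_seconds)
        if window_id != self._window_id:
            # A new window has started, so the counter resets.
            self._window_id = window_id
            self._count = 0
        if self._count >= self.limit:
            return False  # rejected immediately, never queued
        self._count += 1
        return True


limiter = FixedWindowLimiter(limit=2)
# Three requests in the first minute, then one in the next minute.
decisions = [limiter.allow(now=t) for t in (0.0, 1.0, 2.0, 61.0)]
print(decisions)
```

The third request is refused because the window is already full, and the fourth succeeds because it falls in a new window.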
Who can configure rate limits
| Role | Can create admin keys | Can set rate limits |
|---|---|---|
| Owner | Yes | Yes |
| Admin | Yes | Yes |
| Member | No | No |
What you can limit
Limits are attached to a key and can be set globally (across all models on the key) or per model:
| Limit | What it controls |
|---|---|
| request_per_minute / request_per_day | API calls allowed per 60-second window / per 24-hour period |
| token_per_minute / token_per_day | Tokens (input + output, or output-only via token_type) per minute / per day |
| budget | Total credits available to the key |
All limits can apply simultaneously — whichever threshold is reached first triggers enforcement.
A limit set on a model applies across every invocation path: the REST API, the SDK, LLMs used as an agent's backbone (including team agents), and LLMs inside multi-agent pipelines. Capping a model covers all agents and applications in the workspace that use it, not just one app.
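To make "whichever threshold is reached first" concrete, the sketch below checks several limits at once and reports the one that blocks the next request. The function `first_exceeded` and the two dictionaries are illustrative assumptions, not part of the SDK; only the limit names come from the table above.

```python
from typing import Optional

# Limit names taken from the table above; the check order is arbitrary.
LIMIT_NAMES = (
    "request_per_minute",
    "request_per_day",
    "token_per_minute",
    "token_per_day",
    "budget",
)


def first_exceeded(usage: dict, limits: dict) -> Optional[str]:
    """Return the name of a limit that is already used up, or None if all have room.

    A limit that is missing or None is treated as "not configured".
    """
    for name in LIMIT_NAMES:
        cap = limits.get(name)
        if cap is not None and usage.get(name, 0) >= cap:
            return name
    return None


limits = {"request_per_minute": 100, "token_per_day": 1000, "budget": 1000}
usage = {"request_per_minute": 3, "token_per_day": 1000, "budget": 40}

blocked_by = first_exceeded(usage, limits)
print(blocked_by)
```

Here the key has plenty of request and budget headroom, but the daily token cap is exhausted, so that single limit is enough to block the request.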
Setup
```bash
pip install aixplain
```

```python
from aixplain import Aixplain

# Member key: for inference and for monitoring your own usage
aix = Aixplain(api_key="YOUR_MEMBER_API_KEY")

# Admin key: for creating and managing limits on other keys
aix_admin = Aixplain(api_key="YOUR_ADMIN_API_KEY")
```
Create an admin key in Console → Settings → API Keys. Admin keys cannot be used for inference.
Search existing keys
Member keys cannot list other keys — calling search() with a member key raises Forbidden. Admin keys can list all keys in the workspace.
```python
# Admin keys can list every key in the workspace
result = aix_admin.APIKey.search()
for key in result["results"]:
    print(key.name, key.id)
```
Use get_by_access_key() when you have the key string itself (not the ID) and need to inspect or update it.
```python
target = aix_admin.APIKey.get_by_access_key("TARGET_MEMBER_API_KEY")

print("id:", target.id)
print("name:", target.name)
print("is_admin:", target.is_admin)
# global_limits is None when no global cap is configured
print("global_limits:", target.global_limits.to_dict() if target.global_limits else None)
for limit in target.asset_limits:
    print("asset limit:", limit.to_dict())
```
Create a key with rate limits
Create a member key and attach per-model and global limits in one call.
```python
from datetime import datetime

from aixplain.v2.api_key import APIKeyLimits

# Timestamp suffix keeps key names unique across runs
timestamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")

new_key = aix_admin.APIKey(
    name=f"member-key-{timestamp}",
    # Per-model limits
    asset_limits=[
        APIKeyLimits(
            model="openai/gpt-5/openai",
            token_per_minute=10,
            token_per_day=30,
            request_per_minute=2,
            request_per_day=2,
        )
    ],
    # Limits across all models on this key
    global_limits=APIKeyLimits(
        token_per_minute=100,
        token_per_day=1000,
        request_per_minute=100,
        request_per_day=1000,
    ),
    budget=1000,
    expires_at=datetime(2030, 1, 1),
).save()

print(new_key.to_dict())
```
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| name | str | ✅ | — | Label for the key. |
| asset_limits | list[APIKeyLimits] | — | [] | Per-model limits. |
| global_limits | APIKeyLimits | — | None | Limits across all models on this key. |
| budget | int | — | None | Total credits available. Increase the value to top up. |
| expires_at | datetime | — | None | Expiry date. Omit for a non-expiring key. |
Global limits and asset-specific limits are both enforced. Global limits do not override per-model limits — whichever is stricter applies first.
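One way to reason about the interaction is to take the smaller of the two values. The helper `effective_limit` below is hypothetical and simplifies the real behavior: it assumes the key calls only one model, so the global counter and the per-model counter grow together.

```python
from typing import Optional


def effective_limit(global_limit: Optional[int], asset_limit: Optional[int]) -> Optional[int]:
    """Return the stricter of two limits; None means "no limit configured"."""
    configured = [value for value in (global_limit, asset_limit) if value is not None]
    return min(configured) if configured else None


# Values from the example above: global token_per_minute=100, per-model token_per_minute=10.
binding = effective_limit(global_limit=100, asset_limit=10)
print(binding)
```

When the key calls several models, the global counter sums their traffic, so the global limit can bind even while each per-model counter is still below its own cap.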
Update rate limits
Modify limits on an existing key and call save(). Updated limits take effect at the start of the next timeframe (next minute for per-minute limits, next day for per-day limits).
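To estimate when a change becomes active, you can compute the start of the next window. The sketch below assumes windows align to clock minutes and to UTC calendar days; the source does not state the alignment, so treat both as assumptions. The function name `next_window_start` is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def next_window_start(now: datetime, window: str) -> datetime:
    """Return the start of the next minute or day after `now` (assumed alignment)."""
    if window == "minute":
        return now.replace(second=0, microsecond=0) + timedelta(minutes=1)
    if window == "day":
        return now.replace(hour=0, minute=0, second=0, microsecond=0) + timedelta(days=1)
    raise ValueError(f"unknown window: {window!r}")


saved_at = datetime(2030, 1, 1, 12, 34, 56, tzinfo=timezone.utc)
print(next_window_start(saved_at, "minute"))  # when a per-minute change would apply
print(next_window_start(saved_at, "day"))     # when a per-day change would apply
```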
If you have the member key string (not the ID), use get_by_access_key() to fetch it first:
```python
from aixplain.v2.api_key import APIKeyLimits

key = aix_admin.APIKey.get_by_access_key("TARGET_MEMBER_API_KEY")

# Replace the per-model limits on this key
key.asset_limits = [
    APIKeyLimits(
        model="669a63646eb56306647e1091",
        request_per_minute=2,
        request_per_day=5,
        token_per_minute=500,
        token_per_day=5000,
    )
]
key.save()
```
If you have the key ID instead, list all keys with search() and select the matching one:
```python
from aixplain.v2.api_key import APIKeyLimits, TokenType

KEY_ID = "TARGET_KEY_ID"  # placeholder: ID of the key to update

# Fetch the key to update by matching its ID
result = aix_admin.APIKey.search()
key = next(k for k in result["results"] if k.id == KEY_ID)

# Update budget
key.budget = 1200

# Update global limits (assumes the key already has global limits;
# the daily cap should not be lower than the per-minute cap)
key.global_limits.token_per_minute = 50
key.global_limits.token_per_day = 500

# Replace asset limits
key.asset_limits = [
    APIKeyLimits(
        model="openai/gpt-5/openai",
        token_per_minute=20 * 15000,
        token_per_day=8 * 60 * (20 * 15000),
        request_per_minute=60,
        request_per_day=8 * 60 * 60,
        token_type=TokenType.OUTPUT,
    ),
    APIKeyLimits(
        model="openai/gpt-5.1/openai",
        token_per_minute=60 * 8000,
        token_per_day=3 * 60 * (60 * 8000),
        request_per_minute=200,
        request_per_day=8 * 60 * 200,
        token_type=TokenType.OUTPUT,
    ),
]
key.save()

print(key.to_dict())
```
token_type=TokenType.OUTPUT counts only output tokens against the limit. Omit it to count all tokens (input + output).
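The effect of the two counting modes can be shown with plain arithmetic. The function `counted_tokens` is a hypothetical illustration of the rule just described, not an SDK call, and the token figures are made up.

```python
def counted_tokens(input_tokens: int, output_tokens: int, output_only: bool = False) -> int:
    """Tokens charged against a token limit under each counting mode."""
    return output_tokens if output_only else input_tokens + output_tokens


# One request with a long prompt and a short completion.
all_tokens = counted_tokens(input_tokens=1200, output_tokens=300)
output_tokens_only = counted_tokens(input_tokens=1200, output_tokens=300, output_only=True)
print(all_tokens, output_tokens_only)
```

With output-only counting, prompt-heavy workloads such as retrieval with large contexts consume the token limit far more slowly than they would under the default mode.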
Monitor usage
get_usage_limits() returns daily consumption and configured limits. Members call it on aix.APIKey to check their own key. Admins call it on any key object returned from search() or get_by_access_key().
```python
# All usage rows for your own key (member)
rows = aix.APIKey.get_usage_limits()
for row in rows:
    print(row)
```
A row with model=None is the global scope — None counts and limits mean no global cap is configured on this key. Ignore it unless you explicitly set global limits.
Pass model to filter to one model:
```python
MODEL_ID = "669a63646eb56306647e1091"  # use IDs, not paths

rows = aix.APIKey.get_usage_limits(model=MODEL_ID)
for row in rows:
    if row.model is None:
        print("No model-specific limits configured")
    else:
        print(
            f"Model {row.model}: "
            f"{row.daily_request_count}/{row.daily_request_limit} daily requests, "
            f"{row.daily_token_count}/{row.daily_token_limit} daily tokens"
        )
```
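Remaining headroom is often more useful than raw counts. The sketch below derives it from a usage row; `UsageRow` is a hypothetical stand-in that only mirrors the attribute names read in the loop above, and the sample numbers are invented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UsageRow:
    """Hypothetical stand-in for a usage row; not the SDK's own class."""

    model: Optional[str]
    daily_request_count: Optional[int]
    daily_request_limit: Optional[int]
    daily_token_count: Optional[int]
    daily_token_limit: Optional[int]


def remaining(count: Optional[int], limit: Optional[int]) -> Optional[int]:
    """Return how much of a daily limit is left, or None if no limit is configured."""
    if limit is None:
        return None
    return max(limit - (count or 0), 0)


row = UsageRow(
    model="669a63646eb56306647e1091",
    daily_request_count=3,
    daily_request_limit=5,
    daily_token_count=5200,
    daily_token_limit=5000,
)
requests_left = remaining(row.daily_request_count, row.daily_request_limit)
tokens_left = remaining(row.daily_token_count, row.daily_token_limit)
print(requests_left, tokens_left)
```

Clamping at zero keeps the result readable when the last accepted request pushed the token count slightly past the limit.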
Alert on threshold
Poll usage and fire an alert when consumption crosses a threshold.
```python
import time


def usage_alerts(rows, threshold=0.8):
    """Return a message for each daily limit whose usage is at or above threshold."""
    alerts = []
    for row in rows:
        # Missing or None values become 0; a limit of 0 disables that check
        req_count = getattr(row, "daily_request_count", 0) or 0
        req_limit = getattr(row, "daily_request_limit", 0) or 0
        tok_count = getattr(row, "daily_token_count", 0) or 0
        tok_limit = getattr(row, "daily_token_limit", 0) or 0
        label = getattr(row, "model", None) or "global"

        if req_limit and req_count / req_limit >= threshold:
            alerts.append(f"{label}: requests {req_count}/{req_limit}")
        if tok_limit and tok_count / tok_limit >= threshold:
            alerts.append(f"{label}: tokens {tok_count}/{tok_limit}")
    return alerts


# Poll three times, one minute apart (MODEL_ID is defined in the previous example)
for _ in range(3):
    rows = aix.APIKey.get_usage_limits(model=MODEL_ID)
    alerts = usage_alerts(rows, threshold=0.8)
    if alerts:
        for alert in alerts:
            print("ALERT:", alert)
    else:
        print("Usage within threshold")
    time.sleep(60)
```
Detect rate-limit errors
Rate-limit enforcement surfaces as HTTP 497 (aiXplain per-minute limit) or 429 (standard). Check for both when calling models on a rate-limited key.
```python
from aixplain import Aixplain

aix = Aixplain(api_key="YOUR_MEMBER_API_KEY")
model = aix.Model.get("669a63646eb56306647e1091")

try:
    result = model.run(text="Hello")
    print(result.data)
except Exception as exc:
    # Read the HTTP status from the exception, if it carries one
    code = getattr(exc, "status_code", None)
    if code in (429, 497):
        print("Rate limit hit: back off and retry")
    else:
        raise
```
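The "back off and retry" step can be sketched as exponential backoff with jitter. Everything here is an assumption for illustration: `RateLimited` and `call_with_backoff` are hypothetical names, and the sketch relies on the error exposing a `status_code` attribute in the same way the example above reads it.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical error type carrying an HTTP status code; not an SDK class."""

    def __init__(self, status_code: int):
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


def call_with_backoff(fn, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying with exponential backoff when it raises a 429 or 497."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            code = getattr(exc, "status_code", None)
            # Re-raise anything that is not a rate-limit error, or the final failure.
            if code not in (429, 497) or attempt == max_attempts - 1:
                raise
            # The delay doubles each attempt; jitter spreads out simultaneous retries.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))


# Demo with a fake call that is rate limited twice before succeeding.
attempts = {"n": 0}


def flaky_call():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RateLimited(497)
    return "ok"


delays = []  # record the delays instead of actually sleeping
result = call_with_backoff(flaky_call, sleep=delays.append)
print(result, len(delays))
```

In real use, `fn` would wrap the `model.run(...)` call. Backoff of a few seconds only helps with per-minute limits; once a per-day limit or the budget is exhausted, retrying will keep failing until the limit resets or is raised.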
Delete a key
```python
key.delete()
```
Deletion is immediate and irreversible. Any application using the key will receive authentication errors.
Rate limits vs. workspace credits
Rate limits and credits are two separate controls that work together:
| | Rate limits | Credits |
|---|---|---|
| Set by | Admin / Owner via an admin key | Purchased by the Owner |
| Enforced by | Access layer (per key, per model) | Platform-wide billing |
| What it blocks | Requests over the limit | Any request when credits reach 0 |
| Resets | Per minute / per day | When credits are topped up |
Use rate limits for predictable spend control, and credits for a hard workspace-wide cap.
Common use cases
Cost control. Prevent runaway costs from misconfigured agents or traffic spikes: set a daily token cap on expensive models, a credit budget on production keys, or strict requests-per-minute (RPM) limits on test environments.
Fair resource sharing. Cap requests per minute on shared models so a single application or team member can't consume disproportionate capacity.
Compliance and governance. Limit access to high-cost or sensitive models via budgets, and pair rate limits with Inspector policies for full runtime governance. Separate admin keys per environment (dev, staging, prod) create audit-friendly boundaries.
Gradual rollout. Deploy a new agent behind low RPM limits, monitor usage, and raise the limits incrementally as it proves stable.
Related
- API Keys — Creating and managing API keys
- Credits and Billing — Workspace credit management
- Team Management — Roles and permissions