How to configure model rate limiting
This guide explains how to use the Model Rate Limiting feature in aiXplain to manage API usage. Follow the steps below to configure, monitor, and update rate limits in your workflow.
Overview
The Model Rate Limiting feature enables administrators to:
- Set rate limits for text generation models (token-based models).
- Monitor and update these limits for specific API keys.
- Enforce constraints on API calls to manage resource usage efficiently.
Step 1: Insert Admin Access Key
Admin access keys are required to configure and monitor rate limits. These keys are solely for management and cannot be used for inference. Create an admin key via the Integration page on aiXplain's Studio.
import os
os.environ["TEAM_API_KEY"] = "ADMIN_API_KEY" # Admin API key
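To avoid hardcoding the admin key in a script or notebook, the key can be read from a variable that is set outside the code. The sketch below assumes a hypothetical variable name, `AIXPLAIN_ADMIN_KEY`, and a hypothetical helper, `load_admin_key`; neither is part of the aiXplain SDK. Only `TEAM_API_KEY` comes from this guide.

```python
import os


def load_admin_key(source_var: str = "AIXPLAIN_ADMIN_KEY") -> str:
    """Copy the admin key from `source_var` into TEAM_API_KEY and return it.

    `source_var` is a hypothetical variable name chosen for this example.
    """
    key = os.environ.get(source_var, "").strip()
    if not key:
        # Fail early with a clear message instead of sending an empty key.
        raise RuntimeError(f"Set {source_var} to your admin access key first.")
    os.environ["TEAM_API_KEY"] = key
    return key
```

Calling `load_admin_key()` before importing `aixplain` is the safer order, in case the SDK reads `TEAM_API_KEY` at import time.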
Step 2: Set Rate Limits
Creating a Member API Key with Rate Limits
- Create a member access key with a specific name or label.
- Define asset-specific rate limits for individual models accessible by this API key.
- Define global rate limits that apply across all assets accessible by this API key.
Supported rate limits include:
- Tokens per minute/day
- Requests per minute/day
Optionally, you can add a budget that specifies the total credits available to the key. You can update this value later to add more credits.
Tokens include both input and output tokens. Global limits do not override asset-specific limits; both are enforced together.
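To make the interaction between the two kinds of limits concrete, here is a minimal sketch of a check in which a request must fit within both the asset-specific and the global per-minute limits. It is an illustration only: enforcement happens on the aiXplain platform, and the `Limits`, `WindowUsage`, and `is_allowed` names are hypothetical rather than part of the SDK.

```python
from dataclasses import dataclass


@dataclass
class Limits:
    """Per-minute limits for one scope (a single asset or the whole key)."""
    token_per_minute: int
    request_per_minute: int


@dataclass
class WindowUsage:
    """Usage already recorded in the current one-minute window."""
    tokens: int = 0
    requests: int = 0


def within(limits: Limits, usage: WindowUsage, new_tokens: int) -> bool:
    """Return True if one more request of `new_tokens` fits in this scope."""
    return (
        usage.requests + 1 <= limits.request_per_minute
        and usage.tokens + new_tokens <= limits.token_per_minute
    )


def is_allowed(asset_limits: Limits, asset_usage: WindowUsage,
               global_limits: Limits, global_usage: WindowUsage,
               input_tokens: int, output_tokens: int) -> bool:
    """A request passes only if it fits both the asset and the global scope."""
    new_tokens = input_tokens + output_tokens  # input and output both count
    return (
        within(asset_limits, asset_usage, new_tokens)
        and within(global_limits, global_usage, new_tokens)
    )
```

With an asset limit of 10 tokens per minute and a global limit of 100, a request totalling 13 tokens is rejected in this sketch even though the global limit alone would allow it.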
from aixplain.factories import APIKeyFactory
from aixplain.modules import APIKey, APIKeyGlobalLimits
from datetime import datetime
api_key = APIKeyFactory.create(
    name="Test API Key",
    asset_limits=[
        APIKeyGlobalLimits(
            model="6661df926d36df3b878e0697",  # The ID of the model to be rate-limited
            token_per_minute=10,
            token_per_day=30,
            request_per_minute=2,
            request_per_day=2
        )
    ],
    global_limits=APIKeyGlobalLimits(
        token_per_minute=100,
        token_per_day=1000,
        request_per_day=1000,
        request_per_minute=100
    ),
    budget=1000,  # Optional
    expires_at=datetime(2024, 11, 29)  # Expiration date, based on midnight (UTC)
)
api_key.__dict__
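Because the expiration date is based on midnight (UTC), it can help to compute it relative to the current UTC date instead of typing a fixed date. The helper below is a small sketch; `expiry_at_midnight_utc` is a hypothetical name, and it returns a naive `datetime` in the same form as the `datetime(2024, 11, 29)` value used above.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def expiry_at_midnight_utc(days_from_now: int,
                           now: Optional[datetime] = None) -> datetime:
    """Return midnight (UTC) on the date `days_from_now` days after today.

    `now` can be passed in for testing; it defaults to the current UTC time.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    # Work with the UTC calendar date so local time zones do not shift the day.
    target = now.astimezone(timezone.utc).date() + timedelta(days=days_from_now)
    return datetime(target.year, target.month, target.day)
```

For example, `expires_at=expiry_at_midnight_utc(30)` would request a key that expires 30 days from the current UTC date.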
Step 3: View Rate Limits
To review rate limits for an API key, use the code below.
api_key_info = APIKeyFactory.get(api_key=api_key.access_key).__dict__
api_key_info
global_limits = api_key_info['global_limits']
print("Global limits:", global_limits.__dict__)
# Loop through each asset limit
for asset_limit in api_key_info['asset_limits']:
    print("Asset limits:", asset_limit.__dict__)  # Print the details
Step 4: Monitor API Key Usage
Monitor overall API key usage
usage = api_key.get_usage()
for key in usage:
    print(key.__dict__)
Monitor usage for a specific model
from aixplain.factories import APIKeyFactory
# Replace "Key" with the access key whose usage you want to inspect
api_limit = APIKeyFactory.get_usage_limits(api_key="Key", asset_id="6661df926d36df3b878e0697")
for key in api_limit:
    print(key.__dict__)
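Raw usage records are easier to act on once they are turned into remaining headroom. The sketch below uses a stand-in `UsageRecord` class whose field names (`daily_token_count`, `daily_token_limit`, `daily_request_count`, `daily_request_limit`) are assumptions made for this example; check the `__dict__` output printed above for the actual attribute names before adapting it.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class UsageRecord:
    """Stand-in for one usage entry; field names are assumed, not confirmed."""
    daily_token_count: int
    daily_token_limit: Optional[int]
    daily_request_count: int
    daily_request_limit: Optional[int]


def headroom(count: int, limit: Optional[int]) -> Optional[int]:
    """Return how much of `limit` is left, or None when no limit is set."""
    if limit is None:
        return None
    return max(limit - count, 0)


def summarize(record: UsageRecord) -> Dict[str, Optional[int]]:
    """Summarize the remaining daily tokens and requests for one record."""
    return {
        "tokens_left": headroom(record.daily_token_count, record.daily_token_limit),
        "requests_left": headroom(record.daily_request_count, record.daily_request_limit),
    }
```

A record with 25 of 30 daily tokens and 2 of 2 daily requests used would leave 5 tokens and 0 requests for the rest of the day.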
Step 5: Update Rate Limits
You can update an existing API key’s limits and budget as needed.
Updated limits take effect at the start of the next time window (minute or day), depending on the limit type.
api_key_temp = APIKeyFactory.get(api_key.access_key)
print('Updating key limits: ' + api_key_temp.access_key)
# Update budget
api_key_temp.budget = 200
# Update global rate limits
api_key_temp.global_limits.token_per_day = 500
api_key_temp.global_limits.token_per_minute = 50
# Update rate limits of a specific asset
for i, asset_limit in enumerate(api_key_temp.asset_limits):
    # Match the ID of a model that already has an asset-specific limit on this key
    if asset_limit.model.id == "6661df926d36df3b878e0697":
        api_key_temp.asset_limits[i].token_per_minute = 6000
        break
api_key_temp = APIKeyFactory.update(api_key_temp)
print("Budget: ", api_key_temp.budget)
print("Global limits:", api_key_temp.global_limits.__dict__)
for asset in api_key_temp.asset_limits:
    print("Asset limits:", asset.__dict__)
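Before sending an update, it can be worth sanity-checking the new values locally. A per-minute limit that exceeds the per-day limit of the same type can never be reached, because the daily cap is hit first. The helper below is a sketch of such a check; `check_limit_pair` is a hypothetical name, not an SDK function, and the platform may apply its own validation.

```python
from typing import List, Optional


def check_limit_pair(label: str,
                     per_minute: Optional[int],
                     per_day: Optional[int]) -> List[str]:
    """Return a list of problems found in a per-minute/per-day limit pair.

    `label` is "token" or "request"; None means the limit is not set.
    """
    problems = []
    for window, value in (("minute", per_minute), ("day", per_day)):
        if value is not None and value <= 0:
            problems.append(f"{label}_per_{window} must be positive, got {value}")
    if per_minute is not None and per_day is not None and per_minute > per_day:
        problems.append(
            f"{label}_per_minute ({per_minute}) exceeds {label}_per_day ({per_day})"
        )
    return problems
```

An empty list means the pair is consistent; otherwise each string describes one issue to fix before calling `APIKeyFactory.update`.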
Step 6: Delete an API Key
To delete an API key that is no longer in use, use the following code.
api_key.delete()
Tips for Effective Rate Limiting
- Use admin keys exclusively for configuration and monitoring.
- Check usage regularly and adjust limits to match operational needs.
- Apply both global and asset-specific limits for detailed control.
With the Model Rate Limiting feature, you can control API usage and resource allocation for your models, which keeps your agents and workflows within predictable limits.