Security and Safety

Rate Limiting

Rate limiting constrains how many actions, application programming interface (API) calls, or tokens an agent can consume within a given time period. It prevents runaway loops, denial-of-service conditions, and unexpected cost spikes. Without rate limits, a single malfunctioning agent caught in a retry cycle (for example, retrying a failed tool call every two seconds across a 200-step planning loop) can generate a $400 bill from a single run before any human notices. This is not a hypothetical edge case but a recurring incident pattern in public agent deployments. Effective rate limiting operates at multiple levels: per-call limits (maximum tokens per request), per-session limits (maximum total spend per task), and circuit breakers that halt execution when spend or iteration counts cross a threshold set before the agent starts.
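The three levels described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the `RunBudget` class, its method names, and every threshold value are hypothetical, and the agent loop is simulated with fixed per-step costs instead of real model or tool calls.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when a limit is crossed; the caller should stop the agent."""


@dataclass
class RunBudget:
    # All limits below are hypothetical values chosen for illustration.
    max_tokens_per_call: int = 4_000   # per-call limit
    max_spend_usd: float = 5.00        # per-session limit
    max_iterations: int = 50           # circuit-breaker threshold
    spent_usd: float = 0.0
    iterations: int = 0

    def before_call(self, requested_tokens: int) -> None:
        """Check every limit before the next call is sent."""
        if requested_tokens > self.max_tokens_per_call:
            raise BudgetExceeded("per-call token limit exceeded")
        if self.iterations >= self.max_iterations:
            raise BudgetExceeded("iteration limit reached")
        if self.spent_usd >= self.max_spend_usd:
            raise BudgetExceeded("session spend limit reached")

    def after_call(self, cost_usd: float) -> None:
        """Record what the call that just finished consumed."""
        self.iterations += 1
        self.spent_usd += cost_usd


def run_agent(budget: RunBudget, step_cost_usd: float, tokens_per_step: int) -> str:
    """Simulated loop that never finishes on its own, like a stuck retry cycle."""
    try:
        while True:
            budget.before_call(tokens_per_step)
            # A real agent would call the model or a tool here.
            budget.after_call(step_cost_usd)
    except BudgetExceeded as exc:
        return f"halted: {exc}"


if __name__ == "__main__":
    budget = RunBudget(max_spend_usd=1.00)
    print(run_agent(budget, step_cost_usd=0.25, tokens_per_step=1_000))
    print(budget.iterations, budget.spent_usd)
```

Checking the limits before each call, instead of after, means the loop never starts a step it has no budget for. The thresholds are fixed when `RunBudget` is constructed, which matches the idea of setting them before the agent starts.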

subtopics

Token Budgets

Request Throttling

connected to

Cost Tracking

Error Recovery

resources

OpenAI: Rate Limits (platform.openai.com): Understanding and handling rate limits in OpenAI API calls

Anthropic: Rate Limits (docs.anthropic.com): Claude API rate limit tiers and best practices for handling them

Helicone: Rate Limiting (helicone.ai): Proxy-level rate limiting and cost controls for LLM API calls

Token Bucket Algorithm (en.wikipedia.org): The classic rate limiting algorithm applicable to agent systems
