Design a Distributed Rate Limiter in 45 Mins
How would you design a rate limiter for Google or Stripe? Learn the exact architectural patterns to handle high concurrency and distributed throttling.
1. Problem Definition and Scope
We are designing a distributed middleware system that limits the number of requests a user or client can send to an API within a specific time period. The goal is to prevent abuse, ensure fair usage, and protect backend services from being overwhelmed.
Main User Groups:
End Users/Clients: Mobile apps, websites, or 3rd party developers calling the API.
Internal Services: The API Gateway, which enforces the decisions.
Admins: Engineers who configure the rules (e.g., “100 requests per minute”).
Scope:
We will design a Server-side Rate Limiter integrated into an API Gateway.
We will focus on the Sliding Window Counter algorithm (specifically the weighted average approach) for high accuracy and efficiency.
Out of Scope: Client-side throttling (easily bypassed) or Network-layer DDoS mitigation (best handled by Cloudflare/AWS Shield).
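To make the weighted average approach concrete, here is a minimal single-process sketch. The class name, the in-memory dict (standing in for a shared store), and the explicit `now` parameter are assumptions for illustration, not part of the design above. The idea is to estimate the requests in the last window as the previous window's count, weighted by how much of it still overlaps the sliding window, plus the current window's count.

```python
from dataclasses import dataclass, field


@dataclass
class SlidingWindowCounter:
    """Sliding Window Counter using the weighted average of two fixed windows."""

    limit: int            # max requests allowed per window
    window_seconds: int   # window length, e.g. 60 for "per minute"
    # (key, window_index) -> count. A plain dict is an assumption here;
    # a distributed deployment would keep these counters in a shared store.
    counts: dict = field(default_factory=dict)

    def allow(self, key: str, now: float) -> bool:
        window = int(now // self.window_seconds)
        # Fraction of the current fixed window that has already elapsed.
        elapsed = (now % self.window_seconds) / self.window_seconds

        current = self.counts.get((key, window), 0)
        previous = self.counts.get((key, window - 1), 0)

        # Only (1 - elapsed) of the previous window still overlaps
        # the sliding window that ends at `now`.
        estimated = previous * (1.0 - elapsed) + current

        # Reject if admitting this request would push the estimate over the limit.
        if estimated + 1 > self.limit:
            return False

        self.counts[(key, window)] = current + 1
        return True
```

For example, with a limit of 10 per 60 seconds, a client that used all 10 requests in the previous window and arrives 15 seconds into the next one is estimated at 10 × 0.75 = 7.5 requests, so two more requests fit before it is throttled. The sketch never evicts old windows; a real store would expire them.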
2. Clarify Functional Requirements
Must Have:
Throttle Requests: Accurately block requests that exceed a defined threshold (e.g., 10 req/sec).
Distributed State: Limits must be global. If a user hits Server A, then Server B, the count must be cumulative.
Granularity: Support limiting by various keys: User ID, IP Address, or API Key.
Standard Rejection: Return HTTP 429 Too Many Requests with a Retry-After header.
High Performance: The check must complete in real time (< 20ms), because it sits on the critical path of every request.
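The granularity and rejection requirements above could be sketched as follows. The function names, the `rl:` key prefix, and the choice to set Retry-After to the seconds left in the current fixed window are all assumptions for illustration; the only fixed parts are the 429 status and the Retry-After header, which accepts a delay in whole seconds.

```python
import math
from typing import Dict, Tuple

# Hypothetical set of supported limiting dimensions.
ALLOWED_SCOPES = {"user_id", "ip", "api_key"}


def rate_limit_key(scope: str, identifier: str) -> str:
    """Build a counter key such as 'rl:user_id:42' or 'rl:ip:203.0.113.7'."""
    if scope not in ALLOWED_SCOPES:
        raise ValueError(f"unsupported scope: {scope}")
    return f"rl:{scope}:{identifier}"


def rejection_response(now: float, window_seconds: int) -> Tuple[int, Dict[str, str], str]:
    """Return (status, headers, body) for a throttled request."""
    # Assumption: hint the client to wait until the current fixed window ends.
    # With a sliding window the client may still be over the limit at that
    # point, so treat this as a hint rather than a guarantee.
    retry_after = math.ceil(window_seconds - (now % window_seconds))
    return 429, {"Retry-After": str(retry_after)}, "Too Many Requests"
```

Keeping the scope inside the key lets the same counter logic serve per-user, per-IP, and per-API-key rules without separate code paths.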
Nice to Have:
Soft Limits: Log a warning if a user crosses a threshold, but allow the request (monitoring mode).
Dynamic Rules: Ability to update limits without restarting the servers.
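The two nice-to-have items could be sketched together like this. `Rule`, `RuleStore`, `decide`, and the `enforce` flag are hypothetical names, and the sketch assumes the caller already has the request count for the key (including the current request). A rule in monitoring mode logs a warning instead of rejecting, and rules can be swapped in memory without a restart.

```python
import logging
from dataclasses import dataclass
from typing import Dict

logger = logging.getLogger("rate_limiter")


@dataclass(frozen=True)
class Rule:
    limit: int
    window_seconds: int
    enforce: bool = True  # False means soft limit (monitoring mode)


class RuleStore:
    """Holds the active rules; replace_all swaps them at runtime."""

    def __init__(self, rules: Dict[str, Rule]):
        self._rules = dict(rules)

    def get(self, name: str) -> Rule:
        return self._rules[name]

    def replace_all(self, rules: Dict[str, Rule]) -> None:
        # Build the new mapping first, then swap the reference in one step,
        # so readers see either the old rule set or the new one.
        self._rules = dict(rules)


def decide(rule: Rule, observed_count: int, key: str) -> bool:
    """Return True if the request should be allowed."""
    if observed_count <= rule.limit:
        return True
    if not rule.enforce:
        # Soft limit: record the violation but let the request through.
        logger.warning(
            "soft limit exceeded: key=%s count=%d limit=%d",
            key, observed_count, rule.limit,
        )
        return True
    return False
```

How new rules reach each gateway instance (polling a config service, a push channel, and so on) is left open here; the sketch only shows the in-process swap.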




