Design a Distributed Rate Limiter in 45 Mins

How would you design a rate limiter for Google or Stripe? Learn the exact architectural patterns to handle high concurrency and distributed throttling.

Jan 23, 2026


1. Problem Definition and Scope

We are designing a distributed middleware system that limits the number of requests a user or client can send to an API within a specific time period. The goal is to prevent abuse, ensure fair usage, and protect backend services from being overwhelmed.

Main User Groups:
- End Users/Clients: Mobile apps, websites, or third-party developers calling the API.
- Internal Services: The API Gateway, which enforces the decisions.
- Admins: Engineers who configure the rules (e.g., “100 requests per minute”).

Scope:
- We will design a Server-side Rate Limiter integrated into an API Gateway.
- We will focus on the Sliding Window Counter algorithm (specifically the weighted average approach) for high accuracy and efficiency.
- Out of Scope: Client-side throttling (easily bypassed) and network-layer DDoS mitigation (best handled by dedicated services such as Cloudflare or AWS Shield).
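
To make the weighted average approach concrete, here is a minimal single-process sketch of a Sliding Window Counter. It is an illustration under assumptions, not the final design: the class name `SlidingWindowCounter`, the in-memory dictionary, and the explicit `now` argument are all hypothetical choices made for readability. In a distributed deployment the two counters would presumably live in a shared store with atomic increments, and old windows would need to be expired.

```python
import math


class SlidingWindowCounter:
    """In-memory sketch of the weighted sliding window counter (one process only)."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        # (key, window_index) -> number of admitted requests in that fixed window.
        # Old entries are never evicted here; a real store would expire them.
        self.counts = {}

    def allow(self, key: str, now: float) -> bool:
        """Return True and count the request if `key` is under its limit at time `now`."""
        idx = math.floor(now / self.window)
        # Fraction of the current fixed window that has already elapsed (0.0 to 1.0).
        elapsed_fraction = (now - idx * self.window) / self.window
        prev = self.counts.get((key, idx - 1), 0)
        curr = self.counts.get((key, idx), 0)
        # The sliding window still overlaps (1 - elapsed_fraction) of the previous
        # fixed window, so the previous count is weighted by that overlap.
        estimated = prev * (1.0 - elapsed_fraction) + curr
        if estimated >= self.limit:
            return False
        self.counts[(key, idx)] = curr + 1
        return True
```

For example, with a limit of 10 requests per 60 seconds, a client that sends 10 requests at t = 30 s and then returns at t = 75 s starts with an estimate of 10 × 0.75 = 7.5, so three more requests are admitted before the estimate reaches the limit. The appeal of this approach is that it stores only two counters per key while smoothing the burst that a plain fixed window allows at window boundaries.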

2. Clarify Functional Requirements

Must Have:

- Throttle Requests: Accurately block requests that exceed a defined threshold (e.g., 10 req/sec).
- Distributed State: Limits must be global. If a user hits Server A and then Server B, both requests must count against the same limit.
- Granularity: Support limiting by various keys: User ID, IP Address, or API Key.
- Standard Rejection: Return HTTP 429 Too Many Requests with a Retry-After header.
- High Performance: The check sits on the critical path of every request, so it must complete in under 20 ms.
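
The granularity and rejection requirements can be sketched as two small helpers. This is an illustrative sketch only: the `Response` dataclass, the `ratelimit:` key prefix, and the function names are hypothetical, and telling the client to wait until the current fixed window ends is just one simple heuristic for the `Retry-After` value (with a sliding window it does not guarantee the next request is admitted).

```python
import math
from dataclasses import dataclass, field


@dataclass
class Response:
    """Hypothetical stand-in for whatever response type the gateway uses."""
    status: int
    headers: dict = field(default_factory=dict)
    body: str = ""


def limiter_key(scope: str, identifier: str) -> str:
    """Build the counter key for one limiting dimension (user, IP, or API key)."""
    if scope not in {"user", "ip", "api_key"}:
        raise ValueError(f"unsupported scope: {scope}")
    return f"ratelimit:{scope}:{identifier}"


def too_many_requests(now: float, window_seconds: float) -> Response:
    """Build a 429 response whose Retry-After points at the end of the current window."""
    window_end = (math.floor(now / window_seconds) + 1) * window_seconds
    # Retry-After is expressed in whole seconds; never advertise less than 1.
    retry_after = max(1, math.ceil(window_end - now))
    return Response(
        status=429,
        headers={"Retry-After": str(retry_after)},
        body="Too Many Requests",
    )
```

Because every gateway node derives the same key string for the same user, IP, or API key, all nodes increment the same counter, which is what makes the limit global rather than per-server.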

Nice to Have:

- Soft Limits: Log a warning if a user crosses a threshold, but allow the request (monitoring mode).
- Dynamic Rules: Ability to update limits without restarting the servers.
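
One possible way to model these two nice-to-haves is an `enforce` flag on each rule plus a rule store that can be updated at runtime. The sketch below is an assumption about how this could look, not a prescribed design: `Rule`, `RuleStore`, and `decide` are hypothetical names, and in practice the store would likely be refreshed from a configuration service rather than set in-process.

```python
import logging
import threading
from dataclasses import dataclass

logger = logging.getLogger("rate_limiter")


@dataclass(frozen=True)
class Rule:
    limit: int
    window_seconds: int
    enforce: bool = True  # False means monitoring mode (soft limit)


class RuleStore:
    """Thread-safe rule lookup that can be updated without a restart."""

    def __init__(self):
        self._lock = threading.Lock()
        self._rules = {}

    def set_rule(self, name: str, rule: Rule) -> None:
        with self._lock:
            self._rules[name] = rule

    def get_rule(self, name: str) -> Rule:
        with self._lock:
            return self._rules[name]


def decide(rule: Rule, over_limit: bool, key: str) -> bool:
    """Return True if the request should pass, given the limiter's verdict."""
    if not over_limit:
        return True
    if rule.enforce:
        return False
    # Soft limit: record the violation but let the request through.
    logger.warning(
        "soft limit exceeded for %s (limit=%d per %ds)",
        key, rule.limit, rule.window_seconds,
    )
    return True
```

Swapping a rule from `enforce=False` to `enforce=True` via `set_rule` turns a monitored limit into a hard one for subsequent requests, which is a convenient way to trial a new threshold before blocking real traffic.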
