The "Thundering Herd" Problem (and how to stop it)
Why 10,000 users will crash your database at 10:01 AM
Imagine you are running a ticket website for the biggest concert of the year.
For the first hour, everything is smooth. You are caching the “Available Seats” data in Redis with a Time-To-Live (TTL) of 60 seconds. All 50,000 active users are happily hitting the cache, and your database is effectively asleep, handling almost zero load.
Then, the clock strikes 10:01:00.
The cache key for “Available Seats” expires.
In that exact millisecond, 5,000 users refresh the page.
User 1 checks the cache: MISS. → Goes to Database.
User 2 checks the cache: MISS. → Goes to Database.
...
User 5,000 checks the cache: MISS. → Goes to Database.
Because the first request hasn’t finished writing the new data back to the cache yet, all 5,000 requests fall through to your database simultaneously.
Your database CPU spikes to 100%. Connections time out. The system crashes. This is the Thundering Herd (also known as a Cache Stampede).
The Solution: Add “Jitter”
The mistake wasn’t the caching; it was the predictability.
When you set a fixed TTL (e.g., exactly 60 seconds) for popular items, you are scheduling a future DDoS attack on your own database.
The fix is to add Jitter, a small amount of randomness to the expiration time.
Instead of: expires_in = 60 seconds
Do this: expires_in = 60 + random_int(-5, 5) seconds
Now, when the “herd” arrives, keys that were populated at the same time no longer expire at the same time. Cache misses are spread across a ten-second window instead of landing in the same millisecond. The first request to hit an expired key triggers the database read and repopulates the cache, saving the system, while the rest are still served the old (but still valid) data or the newly refreshed value a moment later.
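The idea above fits in a few lines. Here is a minimal sketch in Python; `jittered_ttl` is an illustrative helper name, not a real library API, and the commented-out Redis call just shows where the value would plug in:

```python
import random

def jittered_ttl(base_seconds=60, jitter_seconds=5):
    """Return the base TTL plus or minus a few random seconds,
    so hot keys written at the same moment don't all expire together."""
    return base_seconds + random.randint(-jitter_seconds, jitter_seconds)

# With a Redis client, usage would look something like:
#   redis.set("available_seats", payload, ex=jittered_ttl())
ttl = jittered_ttl()
print(ttl)  # somewhere between 55 and 65
```

Five seconds of randomness is enough to turn one synchronized wave of 5,000 misses into a trickle spread over ten seconds.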
Another Solution: Request Coalescing (The “Single-Flight” approach)
Adding jitter is a great, proactive way to spread out the load. But sometimes, you need a harder guarantee. You need a bouncer at the door of your database.
This is where Request Coalescing (sometimes called “request collapsing” or “single-flight”) comes in.
Imagine you have 50 people in an office who all want coffee at 9:00 AM.
Without Coalescing: All 50 people run to the single coffee machine at the same time. Chaos ensues. The machine breaks.
With Coalescing: You have an office manager. The first person asks for coffee. The manager says, “On it,” and goes to the machine. The next 49 people ask for coffee. The manager says, “I’m already getting that, just wait here.” Once the manager has the coffee, they distribute it to all 50 people at once.
How it works technically:
When a request comes into your application server for a cache key that is missing, the server doesn’t immediately run to the database. Instead, it checks an internal, short-lived lock for that key.
Request 1 arrives: It sees no lock. It acquires the lock for key x and proceeds to query the database.
Requests 2–100 arrive: They see that a lock for key x is already held. Instead of querying the DB, they subscribe to the result of the first request and wait.
Request 1 completes: It gets the data from the DB, populates the cache, and releases the lock.
The “Aha!” Moment: The application server then takes that single result and simultaneously returns it to all 100 waiting requests.
The Result: Your database sees only one query instead of 100, even though 100 users experienced a cache miss at the same second.
The Trade-off: This is more complex to implement than simple Jitter. It requires holding state within your application instances, and if not done correctly, a failure in the “leader” request could cause timeouts for everyone waiting. Libraries like Go’s singleflight make this easier, but it’s still added complexity.
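To make the pattern concrete, here is a minimal single-flight sketch in Python using threads. It is inspired by the idea behind Go's singleflight but is not that library, and it deliberately skips production concerns such as propagating the leader's exceptions to waiters:

```python
import threading
import time

class SingleFlight:
    """Collapse concurrent lookups for the same key into one call."""

    def __init__(self):
        self._mu = threading.Lock()
        self._inflight = {}  # key -> (done_event, result_holder)

    def do(self, key, fn):
        with self._mu:
            if key in self._inflight:
                done, holder = self._inflight[key]
                leader = False
            else:
                done, holder = threading.Event(), {}
                self._inflight[key] = (done, holder)
                leader = True
        if leader:
            try:
                holder["value"] = fn()  # the one real database query
            finally:
                with self._mu:
                    del self._inflight[key]
                done.set()  # wake every waiting request at once
            return holder["value"]
        done.wait()  # piggyback on the leader's result
        return holder["value"]

# Simulate 100 concurrent cache misses hitting one slow "database".
db_calls = []
sf = SingleFlight()

def fake_db_query():
    db_calls.append(1)
    time.sleep(0.1)  # pretend the query takes a while
    return "available_seats: 42"

results = []
results_lock = threading.Lock()

def handle_request():
    value = sf.do("available_seats", fake_db_query)
    with results_lock:
        results.append(value)

threads = [threading.Thread(target=handle_request) for _ in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(db_calls), len(results))  # the database saw one query for 100 requests
```

The first thread to register itself for a key becomes the leader; everyone else blocks on the leader's event and reads the shared result when it fires. That is the whole trick, and it is also where the complexity lives: the leader must always release the lock and set the event, even on failure, or every waiter hangs.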
Conclusion
The Thundering Herd problem is a classic example of a “success disaster.” It only happens when your product is popular enough to have massive concurrent traffic.
But as we’ve learned today, you don’t have to be a victim of your own success.
Remember these key takeaways:
Caching is a Loan: You are borrowing performance from the future. The Thundering Herd is what happens when the debt collector comes calling all at once.
Don’t be Predictable: Fixed TTLs are a ticking time bomb. Use Jitter to introduce chaos and spread out the repayment of that loan.
Be a Gatekeeper: Use Request Coalescing to shield your database, ensuring that 10,000 questions only result in one answer.
A junior engineer builds for fair weather, assuming the cache will always be there. A senior engineer assumes the cache will fail and builds a system that can survive the storm.
Go build systems that can weather the storm.



