System Design Interview Question: Design Discord in 45 Mins
What happens when you send a message on Discord? We trace the request lifecycle through load balancers, caching layers, and message queues in this full guide.
1. Problem Definition and Scope
We are designing a large-scale real-time communication platform organized into communities (Servers/Guilds) and channels. Unlike simple messenger apps, Discord focuses on persistent group chat for large communities (up to hundreds of thousands of users per server).
Main User Groups:
Regular Users: Join servers, chat in text channels, and see real-time presence (who is online).
Admins: Create servers, manage channels, and assign roles.
Main Actions: Sending text messages, reading message history, and receiving real-time updates.
Scope:
In Scope: Real-time Text Chat (1-on-1 and Group), Server/Channel management, Presence (Online/Offline status), and Message History.
Out of Scope: Voice and video media processing (we will handle the signaling to join a call, but not the media streaming itself), Screen Sharing, and Payments.
2. Clarify functional requirements
Must Have:
Users can create Servers (Guilds) and Channels.
Users can send text messages and receive them in real time (under 100 ms).
Users can view chat history (infinite scroll).
Users can see the online status of friends or guild members.
Messages are persistent and ordered by time.
Supports Unread indicators (badges).
Nice to Have:
“User is typing...” indicators.
Rich media (images/videos) in chat.
Push notifications for mobile devices.
3. Clarify non-functional requirements
Target Users: 50 Million Daily Active Users (DAU).
Concurrent Users: ~10 Million users online at once.
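Throughput: Write-heavy on ingest (every chat message is a write) and heavier still on delivery, because each message fans out to every listener in its channel.
Latency: Real-time delivery is critical. Users expect messages to arrive in under 100 ms, the target set in the functional requirements.
Availability: 99.99% (about 52 minutes of downtime per year). The service must be always on for gamers and communities.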
Consistency:
Strong Consistency: For message ordering within a channel.
Eventual Consistency: Acceptable for cross-device read status or presence updates.
Data Retention: Messages are stored forever unless deleted.
4. Back of the envelope estimates
Traffic (QPS):
50M DAU.
Average 20 messages per user/day = 1 Billion messages/day.
Write QPS: 1,000,000,000 / 86,400 ≈ 11,600, which we round to ~12,000 messages/sec (average).
Peak Write QPS: ~5x average ≈ 60,000 messages/sec.
Fan-out: If a user sends a message in a channel with 100 active listeners, that is 1 write but 100 outgoing WebSocket pushes.
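The traffic numbers above can be reproduced with a few lines of arithmetic. This is a minimal sketch using only the figures from the estimate; the variable names are ours, and applying the 100-listener fan-out to every message is a simplifying assumption rather than a measured average. Note that the unrounded peak comes out near 57,900 writes/sec, which the estimate rounds up to 60,000.

```python
SECONDS_PER_DAY = 86_400

# Inputs taken from the estimate above.
dau = 50_000_000
messages_per_user_per_day = 20
peak_multiplier = 5            # peak-to-average ratio used in the estimate
listeners_per_channel = 100    # example channel size; assumed for every message

messages_per_day = dau * messages_per_user_per_day
avg_write_qps = messages_per_day / SECONDS_PER_DAY
peak_write_qps = avg_write_qps * peak_multiplier

# Each write becomes one WebSocket push per listener.
avg_push_qps = avg_write_qps * listeners_per_channel
peak_push_qps = peak_write_qps * listeners_per_channel

print(f"messages/day:    {messages_per_day:,}")
print(f"avg write QPS:   {avg_write_qps:,.0f}")
print(f"peak write QPS:  {peak_write_qps:,.0f}")
print(f"avg pushes/sec:  {avg_push_qps:,.0f}")
print(f"peak pushes/sec: {peak_push_qps:,.0f}")
```

The takeaway is that the push rate, not the write rate, dominates: under these assumptions the gateways deliver on the order of a million events per second on average.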
Storage:
Avg message size = 100 bytes (text + metadata).
Daily Storage = 1B x 100 bytes = 100 GB / day.
Yearly Storage = 100 GB x 365 = 36.5 TB/year.
Conclusion: We need a distributed database that allows easy horizontal scaling (adding more nodes).
Bandwidth:
Text is cheap.
If 10% of messages (100M per day) carry a 500 KB image: 100M x 500 KB = 50 TB/day.
We need a CDN to offload this traffic.
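As a sanity check, the storage and bandwidth figures can be worked out the same way. This sketch assumes decimal units (1 KB = 1,000 bytes) and ignores replication and index overhead, neither of which the estimate covers; the average-throughput line at the end is our own derived figure, not one from the estimate.

```python
SECONDS_PER_DAY = 86_400
GB = 10**9
TB = 10**12

# Inputs taken from the estimate above.
messages_per_day = 1_000_000_000
avg_message_bytes = 100
image_percent = 10             # share of messages carrying an image
image_bytes = 500 * 1_000      # 500 KB, decimal units (assumption)

daily_text_bytes = messages_per_day * avg_message_bytes
yearly_text_bytes = daily_text_bytes * 365

messages_with_image = messages_per_day * image_percent // 100
daily_image_bytes = messages_with_image * image_bytes

# Derived: average image throughput if spread evenly over the day.
avg_image_gbps = daily_image_bytes * 8 / SECONDS_PER_DAY / 10**9

print(f"text per day:   {daily_text_bytes / GB:.0f} GB")
print(f"text per year:  {yearly_text_bytes / TB:.1f} TB")
print(f"images per day: {daily_image_bytes / TB:.0f} TB")
print(f"avg image throughput: {avg_image_gbps:.2f} Gbps")
```

Media outweighs text by a factor of about 500 per day, which is why the images go to object storage behind a CDN while the database only has to absorb the text.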
5. API design
We will use a Hybrid Approach: HTTP REST for actions (sending data) and WebSockets for events (receiving data).
1. REST API (Client -> Server Actions)
POST /channels/{channel_id}/messages
Request: { content: "Hello", nonce: "uuid", attachments: [...] }
Response: 200 OK, Message Object.
Why REST? It handles uploads, validation, and rate limiting more easily than WebSockets.
GET /channels/{channel_id}/messages
Request: ?limit=50&before={message_id}
Response: List of messages.
Note: Uses cursor-based pagination (using message ID) for stable scrolling.
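To see why a message-ID cursor gives stable scrolling, here is a minimal in-memory sketch of the `?limit=50&before={message_id}` query. It assumes message IDs are integers that increase with creation time and that a channel's messages are kept sorted by ID; `fetch_messages` and the list-of-dicts storage are hypothetical stand-ins for the real database query.

```python
from bisect import bisect_left


def fetch_messages(channel_messages, limit=50, before=None):
    """Return up to `limit` messages older than `before`, newest first.

    channel_messages: list of dicts sorted ascending by integer "id".
    before: message ID cursor; None means "start from the newest message".
    """
    ids = [m["id"] for m in channel_messages]
    # Index of the first message that is NOT older than the cursor.
    end = len(ids) if before is None else bisect_left(ids, before)
    start = max(0, end - limit)
    return channel_messages[start:end][::-1]


# Hypothetical channel with 120 messages, IDs 1..120.
messages = [{"id": i, "content": f"msg {i}"} for i in range(1, 121)]

page1 = fetch_messages(messages, limit=50)
page2 = fetch_messages(messages, limit=50, before=page1[-1]["id"])

# A new message arriving between requests does not shift older pages.
messages.append({"id": 121, "content": "msg 121"})
page2_again = fetch_messages(messages, limit=50, before=page1[-1]["id"])

print(page1[0]["id"], page1[-1]["id"])
print(page2[0]["id"], page2[-1]["id"])
print(page2 == page2_again)
```

With offset-based pagination, the newly arrived message would push every row down by one and the second page would repeat a message. Anchoring on the last seen ID avoids that.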
POST /gateway/url
Response: { url: "wss://gateway-3.discord.com" }
Purpose: Returns the best WebSocket server for the user to connect to.
2. WebSocket API (Server -> Client Real-time)
OpCodes (Commands):
IDENTIFY: Client sends auth token to log in.
HEARTBEAT: Client pings every 40s to say “I’m still here.”
Events (Push):
MESSAGE_CREATE: New message data.
PRESENCE_UPDATE: User X went offline.
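The link between HEARTBEAT and PRESENCE_UPDATE can be sketched as a small server-side tracker: each heartbeat refreshes a timestamp, and a periodic sweep turns stale sessions into offline events. This is an illustrative sketch, not Discord's actual gateway logic. The `HeartbeatTracker` class, the 1.5x grace period, and the event dictionary shape are all assumptions; only the 40-second interval comes from the text above.

```python
import time

HEARTBEAT_INTERVAL_S = 40                  # client ping interval from the text
TIMEOUT_S = HEARTBEAT_INTERVAL_S * 1.5     # assumed grace period before "offline"


class HeartbeatTracker:
    """Tracks the last time each identified user was heard from."""

    def __init__(self, timeout_s=TIMEOUT_S, clock=time.monotonic):
        self._timeout_s = timeout_s
        self._clock = clock                # injectable so tests can fake time
        self._last_seen = {}               # user_id -> timestamp

    def identify(self, user_id):
        """IDENTIFY: start a session (token validation is omitted here)."""
        self._last_seen[user_id] = self._clock()

    def heartbeat(self, user_id):
        """HEARTBEAT: refresh the session. Returns False if it is unknown."""
        if user_id not in self._last_seen:
            return False
        self._last_seen[user_id] = self._clock()
        return True

    def sweep(self):
        """Drop timed-out sessions and return offline PRESENCE_UPDATE events."""
        now = self._clock()
        expired = [
            user_id
            for user_id, seen in self._last_seen.items()
            if now - seen > self._timeout_s
        ]
        for user_id in expired:
            del self._last_seen[user_id]
        return [
            {"event": "PRESENCE_UPDATE", "user_id": user_id, "status": "offline"}
            for user_id in expired
        ]
```

In a real gateway, the events returned by `sweep` would be handed to the Presence Service and fanned out to friends and guild members rather than returned to the caller.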
6. High-level architecture
We separate the “Stateful” connection layer from the “Stateless” logic layer.
Components:
Clients: Web, Mobile, Desktop apps.
Load Balancer: Routes HTTP traffic to API servers.
REST API Servers: Stateless servers that handle “Sending Messages”, “Login”, “Guild Management”.
WebSocket Gateway: Stateful servers. Each server holds ~100k open TCP connections, so ~10 million concurrent users need roughly 100 gateway servers. Their main job is to push data to users.
Service Layer:
Chat Service: Processes messages.
Guild Service: Manages members and roles.
Presence Service: Tracks online status.
Message Queue (Kafka): Connects the API layer to the Gateway layer.
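The flow through these components (the API server enqueues, the gateway layer consumes and pushes) can be sketched in memory. This is a toy model under stated assumptions: a `deque` stands in for the Kafka topic, a list per user stands in for an open WebSocket, and the `Gateway` and `ChatPipeline` classes and their methods are hypothetical names. Persistence, permissions, and partitioning are left out.

```python
from collections import defaultdict, deque


class Gateway:
    """Stateful server: holds one 'connection' per online user."""

    def __init__(self, name):
        self.name = name
        self.connections = {}              # user_id -> delivered events

    def connect(self, user_id):
        self.connections[user_id] = []

    def push(self, user_id, event):
        self.connections[user_id].append(event)


class ChatPipeline:
    def __init__(self):
        self.queue = deque()                      # stand-in for the Kafka topic
        self.channel_members = defaultdict(set)   # channel_id -> user_ids
        self.user_gateway = {}                    # user_id -> Gateway (online only)

    def join(self, channel_id, user_id):
        self.channel_members[channel_id].add(user_id)

    def connect(self, gateway, user_id):
        gateway.connect(user_id)
        self.user_gateway[user_id] = gateway

    def post_message(self, channel_id, author_id, content):
        """Stateless API side: validate (omitted) and enqueue the event."""
        event = {
            "t": "MESSAGE_CREATE",
            "channel_id": channel_id,
            "author_id": author_id,
            "content": content,
        }
        self.queue.append(event)
        return event

    def drain(self):
        """Gateway side: consume events and push to every online member."""
        pushes = 0
        while self.queue:
            event = self.queue.popleft()
            for user_id in self.channel_members[event["channel_id"]]:
                gateway = self.user_gateway.get(user_id)
                if gateway is not None:    # offline members are skipped
                    gateway.push(user_id, event)
                    pushes += 1
        return pushes
```

The point of the split is that `post_message` never needs to know which gateway a recipient is connected to. Only the consumer side holds that mapping, so API servers can scale independently of the connection layer.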