System Design Nuggets


System Design Interview Question: Design Discord in 45 Mins

What happens when you send a message on Discord? We trace the request lifecycle through load balancers, caching layers, and message queues in this full guide.

Arslan Ahmad
Jan 22, 2026

1. Problem Definition and Scope

We are designing a large-scale real-time communication platform organized into communities (Servers/Guilds) and channels. Unlike simple messenger apps, Discord focuses on persistent group chat for large communities (up to hundreds of thousands of users per server).

  • Main User Groups:

    • Regular Users: Join servers, chat in text channels, and see real-time presence (who is online).

    • Admins: Create servers, manage channels, and assign roles.

  • Main Actions: Sending text messages, reading message history, and receiving real-time updates.

  • Scope:

    • In Scope: Real-time Text Chat (1-on-1 and Group), Server/Channel management, Presence (Online/Offline status), and Message History.

    • Out of Scope: Voice and Video media processing (we will handle the signaling needed to join a call, but not the actual media streaming), Screen Sharing, and Payments.

2. Clarify functional requirements

Must Have:

  • Users can create Servers (Guilds) and Channels.

  • Users can send text messages and receive them in real time (<100 ms).

  • Users can view chat history (infinite scroll).

  • Users can see the online status of friends or guild members.

  • Messages are persistent and ordered by time.

  • Users see unread indicators (badges) on channels with messages they have not read yet.
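A minimal sketch of how the unread badge could be computed, assuming each user has a per-channel "last read" cursor and that message IDs increase over time (which follows from messages being ordered by time). `ReadState`, `mark_read`, `has_unread`, and `unread_count` are hypothetical names, and everything lives in memory purely for illustration:

```python
from dataclasses import dataclass, field


@dataclass
class ReadState:
    """Hypothetical per-user read cursors: channel_id -> ID of the last message read."""

    last_read_id: dict[str, int] = field(default_factory=dict)

    def mark_read(self, channel_id: str, message_id: int) -> None:
        # Cursors only move forward, so opening an older message never "un-reads" newer ones.
        current = self.last_read_id.get(channel_id, 0)
        self.last_read_id[channel_id] = max(current, message_id)

    def has_unread(self, channel_id: str, latest_message_id: int) -> bool:
        # The badge shows when the channel's newest message is past the user's cursor.
        return latest_message_id > self.last_read_id.get(channel_id, 0)

    def unread_count(self, channel_id: str, message_ids: list[int]) -> int:
        # Linear scan for illustration; a real system would likely cap or precompute counts.
        cursor = self.last_read_id.get(channel_id, 0)
        return sum(1 for mid in message_ids if mid > cursor)


state = ReadState()
state.mark_read("general", 105)
state.mark_read("general", 101)  # older message: cursor stays at 105
print(state.has_unread("general", 110))                     # True
print(state.unread_count("general", [101, 105, 107, 110]))  # 2
```

Because the check is a single comparison of two IDs, the badge can be evaluated cheaply for every channel in a sidebar; only the exact count needs to look at individual messages.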

Nice to Have:

  • “User is typing...” indicators.

  • Rich media (images/videos) in chat.

  • Push notifications for mobile devices.

3. Clarify non-functional requirements

  • Target Users: 50 Million Daily Active Users (DAU).

  • Concurrent Users: ~10 Million users online at once.

  • Throughput: Write-heavy (every chat message is persisted) and even heavier on delivery, because each message fans out to every active listener in the channel.

  • Latency: Real-time delivery is critical; users expect messages to arrive within the <100 ms target from the functional requirements.

  • Availability: 99.99%. The service must be always on for gamers and communities.

  • Consistency:

    • Strong Consistency: For message ordering within a channel.

    • Eventual Consistency: Acceptable for cross-device read status or presence updates.

  • Data Retention: Messages are stored forever unless deleted.
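Strong ordering only has to hold within one channel, so one simple way to provide it is to serialize writes per channel through a sequencer that hands out increasing numbers. The sketch below is an assumption, not part of the design so far: it uses an in-process lock per channel, whereas a distributed deployment would need a single owner (a leader or partition) per channel to play the same role. `ChannelSequencer` is a hypothetical name.

```python
import threading


class ChannelSequencer:
    """Hypothetical sketch: a strictly increasing sequence number per channel."""

    def __init__(self) -> None:
        self._locks: dict[str, threading.Lock] = {}
        self._counters: dict[str, int] = {}
        self._guard = threading.Lock()  # protects creation of per-channel state

    def _lock_for(self, channel_id: str) -> threading.Lock:
        with self._guard:
            if channel_id not in self._locks:
                self._locks[channel_id] = threading.Lock()
                self._counters[channel_id] = 0
            return self._locks[channel_id]

    def next(self, channel_id: str) -> int:
        # Writers to the same channel are serialized; other channels are not blocked.
        with self._lock_for(channel_id):
            self._counters[channel_id] += 1
            return self._counters[channel_id]


seq = ChannelSequencer()
results: list[int] = []
results_lock = threading.Lock()


def send_many(n: int) -> None:
    for _ in range(n):
        value = seq.next("general")
        with results_lock:
            results.append(value)


threads = [threading.Thread(target=send_many, args=(100,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(results) == list(range(1, 401)))  # True: no gaps, no duplicates
print(seq.next("random"))                      # 1: each channel has its own counter
```

Scoping the lock to a channel is what lets ordering stay strong where it matters without turning the whole system into a single serialized queue.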

4. Back of the envelope estimates

  • Traffic (QPS):

    • 50M DAU.

    • Average 20 messages per user/day = 1 Billion messages/day.

    • Write QPS: 1,000,000,000/86,400 ≈ 12,000 messages/sec (average).

    • Peak Write QPS: ~5x average ≈ 60,000 messages/sec.

    • Fan-out: If a user sends a message in a channel with 100 active listeners, that is 1 write but 100 outgoing WebSocket pushes.

  • Storage:

    • Avg message size = 100 bytes (text + metadata).

    • Daily Storage = 1B x 100 bytes = 100 GB / day.

    • Yearly Storage = 100 GB x 365 ≈ 36 TB/year.

    • Conclusion: We need a distributed database that allows easy horizontal scaling (adding more nodes).

  • Bandwidth:

    • Text is cheap: 100 GB/day averages out to only about 1.2 MB/s of ingress.

    • Images are not: if 10% of messages carry a 500 KB image, that is 100M images x 500 KB = 50 TB/day.

    • We need a CDN to offload this traffic.
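The estimates above can be reproduced with a few lines of arithmetic, which makes it easy to re-run them under different assumptions. The function name and parameters are hypothetical; the default values are the ones stated in the text, and decimal units are used (1 GB = 10^9 bytes) to match the 100 GB/day figure. The fan-out line is an upper-bound illustration that assumes every message reaches 100 listeners.

```python
SECONDS_PER_DAY = 86_400
GB = 10**9   # decimal units, matching "1B x 100 bytes = 100 GB"
TB = 10**12


def estimate(dau: int = 50_000_000,
             msgs_per_user_per_day: int = 20,
             peak_factor: int = 5,
             avg_msg_bytes: int = 100,
             image_fraction: float = 0.10,
             image_bytes: int = 500_000,
             listeners_per_message: int = 100) -> dict[str, float]:
    messages_per_day = dau * msgs_per_user_per_day
    avg_write_qps = messages_per_day / SECONDS_PER_DAY
    peak_write_qps = avg_write_qps * peak_factor
    text_bytes_per_day = messages_per_day * avg_msg_bytes
    image_bytes_per_day = messages_per_day * image_fraction * image_bytes
    return {
        "messages_per_day": messages_per_day,
        "avg_write_qps": avg_write_qps,
        "peak_write_qps": peak_write_qps,
        # Upper bound: assumes every message reaches this many active listeners.
        "peak_pushes_per_sec": peak_write_qps * listeners_per_message,
        "text_gb_per_day": text_bytes_per_day / GB,
        "text_tb_per_year": text_bytes_per_day * 365 / TB,
        "image_tb_per_day": image_bytes_per_day / TB,
    }


for name, value in estimate().items():
    print(f"{name:>20}: {value:,.0f}")
```

The unrounded results (about 11,574 average and 57,870 peak writes/sec, 100 GB/day, 36.5 TB/year, 50 TB/day of images) agree with the rounded figures above; the fan-out line shows why delivery, not storage, dominates the real-time load.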

5. API design

We will use a Hybrid Approach: HTTP REST for actions (sending data) and WebSockets for events (receiving data).

1. REST API (Client -> Server Actions)

  • POST /channels/{channel_id}/messages

    • Request: { content: "Hello", nonce: "uuid", attachments: [...] }

    • Response: 200 OK, Message Object.

    • Why REST? File uploads, request validation, and rate limiting are easier to handle over plain HTTP than over WebSockets.

  • GET /channels/{channel_id}/messages

    • Request: ?limit=50&before={message_id}

    • Response: List of messages.

    • Note: Uses cursor-based pagination (using message ID) for stable scrolling.

  • GET /gateway/url

    • Response: { url: "wss://gateway-3.discord.com" }

    • Purpose: Returns the best WebSocket server for the user to connect to.
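A minimal in-memory sketch of the two message endpoints, showing why a message-ID cursor gives stable scrolling: a new message arriving between page fetches cannot shift an older page the way an offset would. It assumes message IDs increase over time within a channel (Discord, for example, uses time-based Snowflake IDs) and that the server treats a repeated `nonce` as a retry of the same send; both are assumptions for illustration, and `ChannelStore`, `create_message`, and `get_messages` are hypothetical names.

```python
import bisect
import itertools
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Message:
    id: int
    channel_id: str
    content: str
    nonce: Optional[str]


class ChannelStore:
    """Hypothetical in-memory backing store for the two message endpoints."""

    def __init__(self) -> None:
        self._next_id = itertools.count(1)             # assumption: IDs grow over time
        self._messages: dict[str, list[Message]] = {}  # per channel, ascending by id
        self._ids: dict[str, list[int]] = {}           # parallel ID lists for bisect
        self._by_nonce: dict[tuple[str, str], Message] = {}

    def create_message(self, channel_id: str, content: str,
                       nonce: Optional[str] = None) -> Message:
        """POST /channels/{channel_id}/messages"""
        # Assumption: a retry carrying the same nonce returns the original message.
        if nonce is not None and (channel_id, nonce) in self._by_nonce:
            return self._by_nonce[(channel_id, nonce)]
        msg = Message(next(self._next_id), channel_id, content, nonce)
        # IDs only increase, so appending keeps each channel sorted.
        self._messages.setdefault(channel_id, []).append(msg)
        self._ids.setdefault(channel_id, []).append(msg.id)
        if nonce is not None:
            self._by_nonce[(channel_id, nonce)] = msg
        return msg

    def get_messages(self, channel_id: str, limit: int = 50,
                     before: Optional[int] = None) -> list[Message]:
        """GET /channels/{channel_id}/messages?limit=50&before={message_id}"""
        msgs = self._messages.get(channel_id, [])
        ids = self._ids.get(channel_id, [])
        # The cursor is a message ID, not an offset, so new messages never shift older pages.
        end = len(msgs) if before is None else bisect.bisect_left(ids, before)
        return msgs[max(0, end - limit):end][::-1]  # newest first


store = ChannelStore()
first = store.create_message("general", "Hello", nonce="n-1")
retry = store.create_message("general", "Hello", nonce="n-1")  # e.g. client retried
for i in range(5):
    store.create_message("general", f"msg {i}")

page = store.get_messages("general", limit=2)
older = store.get_messages("general", limit=2, before=page[-1].id)
print(first.id == retry.id)   # True
print([m.id for m in page])   # [6, 5]
print([m.id for m in older])  # [4, 3]
```

Infinite scroll then reduces to repeatedly passing the oldest ID of the current page as the next `before` value.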

2. WebSocket API (Real-time; events flow mostly Server -> Client)

  • OpCodes (Commands):

    • IDENTIFY: Client sends auth token to log in.

    • HEARTBEAT: Client pings every 40s to say “I’m still here.”

  • Events (Push):

    • MESSAGE_CREATE: New message data.

    • PRESENCE_UPDATE: A user's status changed (e.g., User X went offline).
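The IDENTIFY/HEARTBEAT contract also gives presence almost for free: a user is online while their session keeps heartbeating and goes offline once it misses the window. The sketch below assumes one session per user and passes timestamps in explicitly so it is deterministic; the 40 s interval comes from the text, while the 20 s grace period, the token table, and all class and method names are hypothetical.

```python
HEARTBEAT_INTERVAL_S = 40.0  # from the text
GRACE_S = 20.0               # hypothetical slack before a client is declared dead


class Gateway:
    """Hypothetical single gateway node: one session per user, times passed in explicitly."""

    def __init__(self, tokens: dict[str, str]) -> None:
        self._tokens = tokens                     # hypothetical token -> user_id table
        self.last_seen: dict[str, float] = {}     # user_id -> last IDENTIFY/HEARTBEAT time
        self.presence: dict[str, str] = {}        # user_id -> "online" | "offline"
        self.events: list[tuple[str, dict]] = []  # events that would be pushed to clients

    def on_identify(self, token: str, now: float) -> str:
        user_id = self._tokens.get(token)
        if user_id is None:
            raise PermissionError("invalid token")
        self.last_seen[user_id] = now
        self._set_presence(user_id, "online")
        return user_id

    def on_heartbeat(self, user_id: str, now: float) -> None:
        if user_id not in self.last_seen:
            raise PermissionError("HEARTBEAT before IDENTIFY")
        self.last_seen[user_id] = now

    def reap(self, now: float) -> None:
        # Run periodically: sessions that missed their heartbeat window go offline.
        deadline = HEARTBEAT_INTERVAL_S + GRACE_S
        for user_id, last in list(self.last_seen.items()):
            if now - last > deadline:
                del self.last_seen[user_id]
                self._set_presence(user_id, "offline")

    def _set_presence(self, user_id: str, status: str) -> None:
        # Emit PRESENCE_UPDATE only on an actual change, to avoid needless fan-out.
        if self.presence.get(user_id) != status:
            self.presence[user_id] = status
            self.events.append(("PRESENCE_UPDATE", {"user_id": user_id, "status": status}))


gw = Gateway({"tok-a": "alice", "tok-b": "bob"})
gw.on_identify("tok-a", now=0.0)
gw.on_identify("tok-b", now=0.0)
gw.on_heartbeat("alice", now=40.0)
gw.on_heartbeat("alice", now=80.0)  # bob never sends a heartbeat
gw.reap(now=100.0)
print(gw.presence)    # {'alice': 'online', 'bob': 'offline'}
print(gw.events[-1])  # ('PRESENCE_UPDATE', {'user_id': 'bob', 'status': 'offline'})
```

Because presence is only eventually consistent (per the non-functional requirements), a few seconds of lag between a client dying and its PRESENCE_UPDATE is acceptable.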

6. High-level architecture

We separate the “Stateful” connection layer from the “Stateless” logic layer.

Components:

  1. Clients: Web, Mobile, Desktop apps.

  2. Load Balancer: Routes HTTP traffic to API servers.

  3. REST API Servers: Stateless servers that handle “Sending Messages”, “Login”, “Guild Management”.

  4. WebSocket Gateway: Stateful servers. Each server holds ~100k open TCP connections, so ~10 million concurrent users need roughly 100 gateway servers before adding headroom. Their only job is to push data to users.

  5. Service Layer:

    • Chat Service: Processes messages.

    • Guild Service: Manages members and roles.

    • Presence Service: Tracks online status.

  6. Message Queue (Kafka): Connects the API layer to the Gateway layer.
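The sketch below illustrates the split between stateless API servers and stateful gateways: the API path publishes each message once to a queue, and every gateway node pushes it only to the users connected to that node. `InMemoryTopic` is a stand-in for a Kafka topic (an append-only log read at per-consumer offsets), and all names are hypothetical. Having every gateway read every event is a simplification; at scale one would likely partition the topic (for example by guild) so that each node consumes only relevant events.

```python
from collections import defaultdict


class InMemoryTopic:
    """Stand-in for a Kafka topic: an append-only log each consumer reads at its own offset."""

    def __init__(self) -> None:
        self.log: list[dict] = []

    def publish(self, event: dict) -> None:
        self.log.append(event)


class GatewayServer:
    """Hypothetical gateway node holding only its own WebSocket connections."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.offset = 0  # this consumer's position in the topic log
        self.subscribers: defaultdict[str, set[str]] = defaultdict(set)  # channel -> local users
        self.pushed: list[tuple[str, dict]] = []  # (user_id, event) that would go over WebSocket

    def connect(self, user_id: str, channel_ids: list[str]) -> None:
        for channel_id in channel_ids:
            self.subscribers[channel_id].add(user_id)

    def poll(self, topic: InMemoryTopic) -> None:
        # Read every new event, but push only to users connected to this node.
        while self.offset < len(topic.log):
            event = topic.log[self.offset]
            self.offset += 1
            for user_id in sorted(self.subscribers.get(event["channel_id"], set())):
                self.pushed.append((user_id, event))


def api_send_message(topic: InMemoryTopic, channel_id: str, author: str, content: str) -> None:
    # Stateless REST path: persisting to the database is omitted; the event is published once.
    topic.publish({"type": "MESSAGE_CREATE", "channel_id": channel_id,
                   "author": author, "content": content})


topic = InMemoryTopic()
gw1, gw2 = GatewayServer("gw-1"), GatewayServer("gw-2")
gw1.connect("alice", ["general"])
gw1.connect("bob", ["general", "memes"])
gw2.connect("carol", ["general"])
gw2.connect("dave", ["memes"])

api_send_message(topic, "general", "alice", "hi")
gw1.poll(topic)
gw2.poll(topic)
print(len(topic.log), len(gw1.pushed) + len(gw2.pushed))  # 1 3
```

This is the "1 write, N pushes" pattern from the estimates: one queue entry, one push per connected listener, and no gateway ever needs to know about connections held by another node.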

© 2026 Arslan Ahmad