40+ System Design Questions Organized by Difficulty — So You Stop Practicing the Wrong Ones
40+ questions organized by L4, L5, and L6+ levels. What each question tests, the concepts that catch people, and links to detailed walkthroughs where we have them.
What This Guide Covers
40+ system design questions organized by level: L4 (mid-level), L5 (senior), and L6+ (staff)
What each question actually tests, so you know which concepts to prepare for each one
The key concepts, common pitfalls, and what a strong answer looks like for every question
Links to detailed walkthroughs for questions we have covered in depth
A difficulty progression so you know where to start and how to level up
System design interview questions are not created equal.
A question like “Design a URL Shortener” tests fundamentally different skills than “Design a Distributed Transaction System.”
The first tests whether you can build a clean, simple system with reasonable scaling decisions.
The second tests whether you can navigate distributed consensus, failure modes, and consistency guarantees under pressure.
Yet most prep guides list 20 questions in a flat list and tell you to study all of them. This is inefficient.
If you are interviewing for an L4 role, spending a week on distributed transaction design is wasted time.
If you are interviewing for an L6 role, practicing URL Shortener five times leaves points on the table. The interviewer expects you to handle it in 10 minutes and spend the remaining 35 minutes on depth that L4 candidates are not expected to reach.
This guide organizes every major system design question by the level at which it is most commonly asked and the depth at which it is evaluated.
Each question includes what it tests, the key concepts you need, and a walkthrough link if we have covered it.
Questions are listed roughly in order of how frequently they appear.
How to Use This Guide
If you are preparing for an L4 (mid-level) interview: Master all the L4 questions. These are the ones you will almost certainly see. Study 3-4 L5 questions as well, because interviewers sometimes stretch mid-level candidates with harder problems to calibrate the top of their range.
If you are preparing for an L5 (senior) interview: You should be able to handle every L4 question comfortably in 25 minutes. Spend your preparation time on L5 questions, which require deeper reasoning about scalability, consistency, and failure handling. Study 2-3 L6 questions to prepare for stretch scenarios.
If you are preparing for an L6+ (staff) interview: L4 questions will appear but the interviewer expects you to finish the basic design in 10-15 minutes and then spend the remaining time on depth that other candidates cannot reach: multi-region architecture, cost optimization, operational concerns, and evolving the design as requirements change. L6 questions are your primary focus.
For a detailed breakdown of what interviewers expect at each level, “What FAANG expects from L3 through L6” covers the specific scoring dimensions.
For the grading difference between L5 and L6 answers on the same question, “What separates L5 from L6 with real answer comparisons” shows the gap clearly.
L4 Questions (Mid-Level)
L4 questions test whether you understand the basic building blocks of system design and can assemble them into a working architecture.
The interviewer is looking for: a clear high-level diagram, reasonable database and caching choices, basic scaling (load balancer, read replicas), and the ability to explain your decisions.
You are not expected to design a globally distributed system or handle every edge case.
1. Design a URL Shortener (Bitly)
Why it is asked at L4: It is the simplest complete system design. It has a small feature set (shorten a URL, redirect to the original), a clear read-heavy access pattern, and straightforward scaling challenges.
It is the “Hello World” of system design interviews.
What it tests: Hashing (how to generate a short code from a long URL), database choice (SQL vs NoSQL for key-value lookups), read-heavy scaling (caching, read replicas), and basic capacity estimation (how many URLs per day, how much storage).
The concept that catches people: The hash collision problem. If two different URLs produce the same short code, you overwrite the first URL. Strong answers explain how to handle this: check for collisions before storing, use a counter-based approach instead of hashing, or use Base62 encoding of an auto-incrementing ID.
Strong answer signal: “I would use Base62 encoding of an auto-incrementing ID rather than hashing the URL. This guarantees uniqueness without collision checking. For 1 billion URLs, Base62 produces 6-character codes, which is short enough. The trade-off is that I need a centralized ID generator, but at our scale a single auto-increment sequence handles 10,000 writes per second.”
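As a rough illustration of the Base62 approach in that answer, the sketch below turns a numeric ID into a short code and back. The alphabet ordering and the function names `encode_base62` and `decode_base62` are assumptions made for this example; the ID itself is assumed to come from the auto-incrementing sequence mentioned above.

```python
import string

# Assumed alphabet order for this sketch: digits, lowercase, uppercase (62 symbols).
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)


def encode_base62(n: int) -> str:
    """Convert a non-negative integer ID into a Base62 short code."""
    if n < 0:
        raise ValueError("ID must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n > 0:
        n, remainder = divmod(n, BASE)
        chars.append(ALPHABET[remainder])
    return "".join(reversed(chars))


def decode_base62(code: str) -> int:
    """Convert a Base62 short code back into the integer ID."""
    n = 0
    for ch in code:
        n = n * BASE + ALPHABET.index(ch)
    return n
```

Because 62^5 is about 916 million and 62^6 is about 56.8 billion, the one-billionth ID is the point where codes grow from 5 to 6 characters, which matches the 6-character figure in the answer.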
We have covered this question in detail: Design a URL Shortener (Bitly) in 45 Minutes.
For the deeper analysis of why this “easy” question traps senior engineers and the specific follow-ups that catch people, read that as well.
2. Design a Paste Service (Pastebin)
Why it is asked at L4: Similar to URL Shortener but with the added dimension of storing content (text blobs) rather than just a mapping. Tests object storage decisions.
What it tests: Blob storage (S3 vs database for storing text), metadata vs content separation (store metadata in a database, content in object storage), TTL-based expiration, and basic access control (public vs private pastes).
The concept that catches people: Storing large text blobs directly in a relational database. Strong candidates separate metadata (paste ID, creation time, expiration, owner) into the database and store the actual content in object storage like S3.
Walkthrough: Design Pastebin
3. Design a Distributed Cache (Redis)
Why it is asked at L4: Caching is a fundamental building block that appears in every system design. Designing the cache itself tests whether you understand the internals rather than just using it as a black box.
What it tests: Cache eviction policies (LRU, LFU, TTL), data distribution (consistent hashing across cache nodes), replication for availability, and the failure scenario when a cache node goes down (thundering herd).
Strong answer signal: “I would use consistent hashing to distribute keys across cache nodes. When a node fails, only the keys mapped to that node are affected. For those keys, requests fall through to the database. To prevent a thundering herd where thousands of requests simultaneously hit the database for the same key, I would use request coalescing: the first request fetches from the database, and all other requests for the same key wait for that result.”
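To make the consistent hashing claim concrete, here is a minimal sketch of a hash ring with virtual nodes. The class name `HashRing`, the `vnodes` parameter, and the choice of SHA-256 as the ring hash are all assumptions for illustration, not how any particular cache implements it.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes
        self._hashes = []   # sorted ring positions
        self._owners = {}   # ring position -> node name
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(key):
        # Use the first 8 bytes of SHA-256 as a 64-bit ring position.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add_node(self, node):
        # Each node is placed on the ring at `vnodes` pseudo-random positions.
        for i in range(self.vnodes):
            position = self._hash(f"{node}#{i}")
            self._owners[position] = node
            bisect.insort(self._hashes, position)

    def remove_node(self, node):
        self._hashes = [h for h in self._hashes if self._owners[h] != node]
        self._owners = {h: n for h, n in self._owners.items() if n != node}

    def get_node(self, key):
        if not self._hashes:
            raise LookupError("ring has no nodes")
        # A key belongs to the first ring position clockwise from its hash.
        idx = bisect.bisect(self._hashes, self._hash(key)) % len(self._hashes)
        return self._owners[self._hashes[idx]]
```

Removing a node only deletes that node's ring positions, so a key whose owner was a different node keeps the same owner. That is the property the answer relies on when it says only the failed node's keys are affected.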
Detailed walkthrough: Design a Distributed Cache (like Redis) in 45 Minutes.
4. Design a Rate Limiter
Why it is asked at L4: Rate limiting is a core API protection mechanism. The question tests whether you can implement a specific algorithm and reason about its trade-offs.
What it tests: Rate limiting algorithms (token bucket, fixed window, sliding window), the distributed dimension (how to share rate limit state across multiple API servers), and failure modes (what happens when the rate limiter itself goes down).
Strong answer signal: Explaining the lazy refill optimization for token bucket. Instead of refilling tokens with a timer, calculate the tokens at request time based on elapsed time. This uses zero background threads and stores only two values per user.
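A minimal sketch of the lazy refill idea might look like the following. The class name `TokenBucket` and its parameters are assumptions for this example; the `now` argument exists only so the behavior can be shown with fixed timestamps.

```python
import time
from typing import Optional


class TokenBucket:
    """Token bucket with lazy refill: no timer, two stored values per user."""

    def __init__(self, capacity: float, refill_rate: float, now: Optional[float] = None):
        self.capacity = capacity          # maximum burst size
        self.refill_rate = refill_rate    # tokens added per second
        # The only per-user state: current tokens and the last refill time.
        self.tokens = capacity
        self.last_refill = time.monotonic() if now is None else now

    def allow(self, now: Optional[float] = None, cost: float = 1.0) -> bool:
        now = time.monotonic() if now is None else now
        # Refill lazily: credit the tokens earned since the last request.
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

For example, a bucket with capacity 2 and a refill rate of 1 token per second allows two immediate requests, rejects the third, and allows one more a second later. In a distributed setup the two values would live in a shared store rather than in process memory.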
Extensive answer: How to Design a Rate Limiter in 45 Mins?
5. Design a Unique Username Service
Why it is asked at L4: Simple on the surface but tests concurrency and race conditions. Two users trying to register the same username at the same time is a classic distributed systems problem.
What it tests: Uniqueness guarantees, concurrency handling (optimistic locking, database unique constraints), performance under high registration rates, and username validation rules.
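One way to sketch the unique-constraint approach is to let the database reject the second insert instead of checking first and inserting afterwards. The example below uses an in-memory SQLite database purely for illustration; the table layout, the `register` helper, and the lowercase normalization rule are assumptions.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # The UNIQUE constraint is what actually enforces uniqueness under races.
    conn.execute("CREATE TABLE users (username TEXT NOT NULL UNIQUE)")
    return conn


def register(conn: sqlite3.Connection, username: str) -> bool:
    """Try to claim a username; return False if it is already taken."""
    normalized = username.strip().lower()  # assumed validation rule
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO users (username) VALUES (?)", (normalized,))
        return True
    except sqlite3.IntegrityError:
        # Another registration won the race (or the name was already taken).
        return False
```

A "SELECT then INSERT" sequence leaves a window where two requests both see the name as free. Relying on the constraint closes that window because the database serializes the conflicting inserts.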
Walkthrough: How to Design a Unique Username Service.
6. Design a File Storage Service (Dropbox)
Why it is asked at L4: Tests chunking, sync protocols, and conflict resolution. More complex than URL Shortener but still within L4 scope because the core challenge (file upload and download) is well-understood.
What it tests: File chunking (breaking large files into small pieces for efficient upload and sync), deduplication (if two users upload the same file, store it once), sync conflict resolution (two users edit the same file offline), and metadata vs content storage.
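Chunking and deduplication can be sketched together with a content-addressed store: each chunk is keyed by its hash, and a file becomes an ordered list of chunk hashes. The `ChunkStore` class, the tiny 4-byte chunk size, and the use of SHA-256 are assumptions chosen to keep the example small; real services use chunks in the megabyte range.

```python
import hashlib

CHUNK_SIZE = 4  # bytes; deliberately tiny for the demo


class ChunkStore:
    """Content-addressed chunk store: identical chunks are stored once."""

    def __init__(self):
        self.chunks = {}  # sha256 hex digest -> chunk bytes

    def put_file(self, data: bytes, chunk_size: int = CHUNK_SIZE):
        """Split data into chunks and return the manifest (ordered digests)."""
        manifest = []
        for offset in range(0, len(data), chunk_size):
            chunk = data[offset:offset + chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            # setdefault stores the chunk only if this digest is new.
            self.chunks.setdefault(digest, chunk)
            manifest.append(digest)
        return manifest

    def get_file(self, manifest) -> bytes:
        """Reassemble a file from its manifest."""
        return b"".join(self.chunks[d] for d in manifest)
```

The manifest is the metadata and the chunk map is the content store, which mirrors the metadata vs content split mentioned above. Syncing an edited file then only requires uploading the chunks whose digests changed.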
Walkthrough: Design Dropbox in 45 Minutes.
7. Design a Gaming Leaderboard
Why it is asked at L4: Tests sorted data structures and real-time ranking. A good warm-up question that can go deep on data structure choice.
What it tests: Redis sorted sets (a common and efficient solution), database alternatives (SQL with ORDER BY, which becomes expensive when rankings over millions of rows are recomputed on every request), real-time vs batch ranking, and handling ties.
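The sorted-structure idea can be shown without Redis. The sketch below keeps scores in a sorted list and answers rank queries by binary search; it is an in-memory stand-in, not the Redis implementation, and the `Leaderboard` class and its tie rule (players with equal scores share a rank) are assumptions.

```python
import bisect


class Leaderboard:
    """In-memory ranking kept in sorted order (illustrative stand-in)."""

    def __init__(self):
        self._scores = {}   # player -> current score
        self._sorted = []   # sorted list of (-score, player), best first

    def set_score(self, player: str, score: int) -> None:
        old = self._scores.get(player)
        if old is not None:
            # Remove the player's previous entry before re-inserting.
            idx = bisect.bisect_left(self._sorted, (-old, player))
            self._sorted.pop(idx)
        self._scores[player] = score
        bisect.insort(self._sorted, (-score, player))

    def top(self, n: int):
        return [(player, -neg) for neg, player in self._sorted[:n]]

    def rank(self, player: str) -> int:
        """1-based rank; tied players share the same rank."""
        score = self._scores[player]
        # Count entries with a strictly higher score, then add one.
        return bisect.bisect_left(self._sorted, (-score, "")) + 1
```

List insertion here is linear in the number of players, so this only illustrates the interface. A skip list or balanced tree gives logarithmic updates, which is the reason a sorted set is the usual answer.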
Walkthrough: How to Design a Gaming Leaderboard.
Other L4 Questions
These are commonly asked at L4:
Design a Task Queue (like Celery/SQS): Tests message queue fundamentals, at-least-once delivery, and dead letter queues.
Design a URL Bookmark Service: Tests CRUD operations at scale, tagging, and search.
Design an Authentication System: Tests OAuth flows, JWT tokens, session management, and security basics.
Design a Poll/Voting System: Tests atomic counters, preventing duplicate votes, and real-time result streaming.
L5 Questions (Senior)
L5 questions add significant complexity.
The system has more moving parts, the scaling challenges are harder, the consistency requirements are stricter, and the interviewer expects you to discuss failure modes, monitoring, and operational concerns without being prompted.
You should spend roughly 10 minutes on the high-level design and 25 minutes going deep on 2-3 components.
8. Design Instagram / Photo Sharing
Why it is asked at L5: The feed generation problem is a genuinely hard distributed systems challenge. Fan-out on write vs fan-out on read is a trade-off with no universally correct answer, and the interviewer expects you to reason through both options.
What it tests: Blob storage for images (S3 + CDN), feed generation (fan-out strategies), celebrity problem (users with millions of followers break fan-out-on-write), database schema for social graph, and notification pipeline.
The concept that catches people: Fan-out on write works well for most users (pre-compute the feed when a user posts) but breaks for celebrities. A user with 50 million followers would require 50 million write operations per post. The strong answer uses a hybrid: fan-out on write for regular users, fan-out on read for celebrities.
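The hybrid strategy can be sketched in a few lines. Everything here is a simplification: the `FeedService` class, the in-memory dictionaries, the logical clock used for ordering, and the `CELEBRITY_THRESHOLD` value are assumptions for illustration.

```python
from collections import defaultdict

CELEBRITY_THRESHOLD = 3  # hypothetical cutoff; real systems use far larger values


class FeedService:
    """Hybrid feed: push for regular authors, pull for celebrities."""

    def __init__(self):
        self.followers = defaultdict(set)  # author -> users following them
        self.following = defaultdict(set)  # user -> authors they follow
        self.inbox = defaultdict(list)     # user -> precomputed feed entries
        self.posts = defaultdict(list)     # author -> that author's entries
        self._clock = 0                    # logical timestamp for ordering

    def follow(self, user, author):
        self.followers[author].add(user)
        self.following[user].add(author)

    def is_celebrity(self, author):
        return len(self.followers[author]) >= CELEBRITY_THRESHOLD

    def publish(self, author, text):
        self._clock += 1
        entry = (self._clock, author, text)
        self.posts[author].append(entry)
        if not self.is_celebrity(author):
            # Fan-out on write: one inbox append per follower.
            for follower in self.followers[author]:
                self.inbox[follower].append(entry)

    def feed(self, user, limit=10):
        # A set avoids duplicates if an author crossed the threshold later.
        entries = set(self.inbox[user])
        for author in self.following[user]:
            if self.is_celebrity(author):
                # Fan-out on read: merge celebrity posts at request time.
                entries.update(self.posts[author])
        return sorted(entries, reverse=True)[:limit]
```

The write cost of a celebrity post drops from one write per follower to a single write, at the price of a merge step on every feed read for users who follow celebrities.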
Detailed Walkthrough: Design Instagram in 45 Mins
9. Design TikTok / Short Video Platform
Why it is asked at L5: Combines video processing (encoding, transcoding, adaptive bitrate), content delivery (CDN architecture), and recommendation pipelines (ML-powered feed ranking). Tests breadth across multiple subsystems.
What it tests: Video upload pipeline (chunked upload, transcoding to multiple resolutions), CDN for video delivery, recommendation engine (candidate generation + ranking), and content moderation.
Walkthrough: Design TikTok in 45 Minutes.
10. Design Discord / Real-Time Chat
Why it is asked at L5: Real-time messaging introduces WebSockets, presence tracking, message ordering, and the challenge of maintaining persistent connections at scale.
What it tests: WebSocket management (how to maintain millions of persistent connections), message ordering (how to guarantee messages appear in the correct order), presence tracking (online/offline/idle status), and server-to-server message routing (when sender and receiver are connected to different servers).
Strong answer signal: “For message ordering, I would use a combination of server-generated timestamps and logical clocks. Each message receives a server timestamp on receipt. Within a channel, messages are ordered by server timestamp. For simultaneous messages (same timestamp), I use a sequence number per channel. The trade-off is that clock skew between servers can cause minor reordering, but for a chat application, sub-second ordering accuracy is acceptable.”
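The ordering rule in that answer (server timestamp first, per-channel sequence number as the tie-breaker) can be sketched as follows. The `ChannelSequencer` class and the message dictionary shape are assumptions, and the sketch assumes each channel's counter is owned by a single server.

```python
import itertools
from collections import defaultdict


class ChannelSequencer:
    """Assigns a monotonically increasing sequence number per channel."""

    def __init__(self):
        # One independent counter per channel, each starting at 0.
        self._counters = defaultdict(itertools.count)

    def stamp(self, channel_id, server_ts_ms, body):
        seq = next(self._counters[channel_id])
        return {"channel": channel_id, "ts": server_ts_ms, "seq": seq, "body": body}


def ordered(messages):
    """Order by server timestamp, breaking ties with the sequence number."""
    return sorted(messages, key=lambda m: (m["ts"], m["seq"]))
```

Two messages that arrive in the same millisecond still get a deterministic order, because the sequence number reflects the order in which the channel's owner received them.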
Walkthrough: Design Discord in 45 Minutes.
11. Design a Key-Value Store (DynamoDB)
Why it is asked at L5: This is a deeper version of the distributed cache question. It tests partitioning, replication, and consistency at the storage layer rather than the caching layer.
What it tests: Data partitioning (consistent hashing), replication (quorum-based reads and writes), conflict resolution (last-write-wins vs vector clocks), tunable consistency, and compaction/garbage collection.
Strong answer signal: Explaining how tunable consistency works. “With a replication factor of 3, I can configure W=2 and R=2 for strong consistency (any read sees the latest write because at least one node in the read quorum was in the write quorum). Or I can configure W=1 and R=1 for faster but eventually consistent reads.”
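The overlap argument can be checked mechanically. The sketch below compares the rule of thumb W + R > N against a brute-force check over every possible write quorum and read quorum; the function names are assumptions for this example.

```python
from itertools import combinations


def quorums_overlap(n: int, w: int, r: int) -> bool:
    """Rule of thumb: read and write quorums always intersect iff W + R > N."""
    return w + r > n


def quorums_overlap_bruteforce(n: int, w: int, r: int) -> bool:
    """Check every (write quorum, read quorum) pair for a shared replica."""
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )
```

With N=3, the W=2/R=2 configuration always overlaps, while W=1/R=1 does not: a read can land on a replica that has not yet received the write, which is where eventual consistency comes from.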
Walkthrough: Design a Key-Value Store (like DynamoDB) in 45 Minutes.
12. Design a Notification System
Why it is asked at L5: Multi-channel delivery (push, email, SMS, in-app) creates a fan-out problem with delivery guarantees, deduplication, and per-user preferences.
What it tests: Message queue architecture for reliable delivery, template rendering, per-channel rate limiting (you should not send a user 100 push notifications per hour), user preference management, and delivery tracking (was the notification opened?).
Detailed walkthroughs: Design a Notification System (Push, Email, SMS)
13. Design Netflix / Video Streaming
Why it is asked at L5: Video streaming introduces adaptive bitrate streaming, content encoding pipelines, CDN architecture, and recommendation systems. The breadth of subsystems makes it a demanding question.
What it tests: Video encoding pipeline (transcoding to multiple resolutions and codecs), adaptive bitrate streaming (adjusting quality based on network conditions), CDN placement and cache warming, recommendation engine, and DRM (digital rights management).
Strong answer signal: Explaining adaptive bitrate streaming. “The video is encoded at multiple quality levels (240p, 480p, 720p, 1080p, 4K). The client starts at a low quality and measures download speed. If the speed is sufficient, it switches to a higher quality for the next chunk. If the network degrades, it drops to a lower quality. This provides the best quality the user’s network can sustain without buffering.”
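The client-side decision in that answer can be sketched as a simple throughput-based rule. The bitrate ladder values, the `safety` margin, and the function name `pick_rendition` are assumptions for illustration; production players also take buffer level into account.

```python
# Hypothetical bitrate ladder: (label, required kbps), lowest quality first.
LADDER = [
    ("240p", 400),
    ("480p", 1000),
    ("720p", 2500),
    ("1080p", 5000),
    ("4K", 16000),
]


def pick_rendition(measured_kbps: float, safety: float = 0.8) -> str:
    """Pick the highest rendition that fits within a safety margin of throughput."""
    budget = measured_kbps * safety
    choice = LADDER[0]  # never go below the lowest rung
    for rendition in LADDER:
        if rendition[1] <= budget:
            choice = rendition
    return choice[0]
```

The player would call this before requesting each chunk, using the download speed measured on the previous chunks, so quality steps up or down as the network changes.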
Walkthrough: Design Netflix in 45 Minutes.
14. Design a Payment System
Why it is asked at L5: Financial correctness is non-negotiable. Tests idempotency, exactly-once processing, distributed transactions, and auditability.
What it tests: Idempotent API design (preventing double charges), payment state machine (pending, authorized, captured, refunded), integration with external payment processors (Stripe, Adyen), PCI compliance considerations, and reconciliation.
The concept that catches people: Double payment. If the client retries a payment request (because the network timed out), how do you prevent charging the user twice? Strong answers use idempotency keys: the client sends a unique key with every request, and the server deduplicates based on that key.
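A minimal sketch of server-side deduplication with idempotency keys is shown below. The `PaymentService` class, the response shape, and the in-memory dictionary are assumptions; in a real system the key lookup and insert would need to be atomic, for example through a unique constraint on the key column.

```python
class PaymentService:
    """Deduplicates charge requests by client-supplied idempotency key."""

    def __init__(self):
        self._results = {}      # idempotency key -> stored request and response
        self.charges_made = 0   # how many times the user was actually charged

    def charge(self, idempotency_key: str, user_id: str, amount_cents: int) -> dict:
        stored = self._results.get(idempotency_key)
        if stored is not None:
            # A retry must carry the same parameters as the original request.
            if stored["request"] != (user_id, amount_cents):
                raise ValueError("idempotency key reused with different parameters")
            return stored["response"]  # replay the original result, no new charge
        self.charges_made += 1
        response = {
            "charge_id": f"ch_{self.charges_made}",
            "status": "captured",
            "amount_cents": amount_cents,
        }
        self._results[idempotency_key] = {
            "request": (user_id, amount_cents),
            "response": response,
        }
        return response
```

If the client times out and retries with the same key, it receives the stored response and the user is charged once.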
Walkthroughs: Designing Stripe Payment Gateway
15. Design Spotify / Music Streaming
Why it is asked at L5: Similar to Netflix but with different challenges: smaller files (songs vs movies), offline playback, collaborative playlists, and social features (what your friends are listening to).
What it tests: Audio encoding and storage, CDN architecture for small frequent requests, offline sync (downloading playlists for offline playback), real-time collaboration (two users editing the same playlist), and social graph integration.
Walkthrough: Design Spotify in 45 Minutes.
16. Design YouTube
Why it is asked at L5: Combines video upload processing, search (title and description indexing), recommendation system, comments (high write volume), and live streaming.
What it tests: Video upload and processing pipeline, search indexing (Elasticsearch), recommendation engine, comment system at scale, and live streaming architecture (different from video-on-demand).
Walkthrough: Design YouTube in 45 Minutes.
17. Design Facebook Newsfeed
Why it is asked at L5: The canonical feed generation question. Tests fan-out strategies, ranking algorithms, and real-time updates.
What it tests: Feed generation (fan-out on write vs read), ranking model (chronological vs relevance-based), real-time updates (new posts appearing without refresh), and privacy (posts should only be visible to the intended audience).
Walkthrough: Designing Facebook Newsfeed in 45 Minutes.
18. Design a Shopping Cart and Checkout System
Why it is asked at L5: Tests stateful user sessions, inventory management (preventing overselling), price consistency (the price should not change between cart and checkout), and payment integration.
What it tests: Cart storage strategy (database vs session vs cookie), inventory reservation (soft lock on add-to-cart, hard deduction on checkout), price locking, abandoned cart recovery, and integration with payment and shipping services.
Walkthrough: Design a Shopping Cart and Stripe Payment Gateway.
Second walkthrough: Design an e-commerce platform
Other L5 Questions
Commonly asked at L5 but not yet covered in dedicated walkthroughs:
Design Uber/Lyft (Ride Matching): Tests geospatial indexing (how to find nearby drivers), real-time location tracking, matching algorithms, and ETA estimation.
Design WhatsApp / Messaging: Tests end-to-end encryption, message delivery guarantees (sent/delivered/read receipts), group messaging, and media transfer.
Design Twitter / X: Tests feed generation, trending topics, and the real-time firehose.
Design Google Maps: Tests geospatial data storage, shortest path algorithms (Dijkstra, A*), tile serving for map rendering, and real-time traffic integration.
Design a Search Autocomplete System: Tests trie data structures, prefix matching, ranking by popularity, and personalization.
Design a Web Crawler: Tests URL frontier management, politeness policies, deduplication, and distributed crawling coordination.
L6+ Questions (Staff)
L6 questions are the hardest.
They involve multiple interacting subsystems, require reasoning about distributed transactions and consensus, test operational maturity (monitoring, deployment, cost), and often have no “standard” answer.
The interviewer expects you to drive the conversation, proactively identify the hardest problems, and make design decisions that reflect production experience.
19. Design Amazon S3 (Object Storage)
Why it is asked at L6: Object storage requires reasoning about durability guarantees (eleven 9s means losing, on average, 1 object per 100 billion per year), erasure coding, multi-tenant isolation, and global replication.
What it tests: Data durability (how to guarantee 99.999999999% durability), erasure coding (storing data fragments across multiple nodes so that any k-of-n fragments can reconstruct the original), multi-tenant architecture, storage tiering (hot, warm, cold, archive), and metadata management at petabyte scale.
Strong answer signal: Explaining erasure coding. “Instead of storing 3 full copies (3x storage overhead), I use Reed-Solomon encoding to split each object into 10 data chunks and 4 parity chunks. Any 10 of the 14 chunks can reconstruct the object. This gives me tolerance for 4 simultaneous node failures with only 1.4x storage overhead instead of 3x.”
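The overhead numbers in that answer follow from simple arithmetic, sketched below. This only computes the storage and fault-tolerance trade-off; it does not implement Reed-Solomon encoding, and the function names are assumptions.

```python
def replication_profile(copies: int) -> dict:
    """Full replication: `copies` times the storage, survives copies - 1 losses."""
    return {"overhead": float(copies), "failures_tolerated": copies - 1}


def erasure_profile(data_chunks: int, parity_chunks: int) -> dict:
    """k data + m parity chunks: (k + m) / k storage, survives any m losses."""
    total = data_chunks + parity_chunks
    return {
        "overhead": total / data_chunks,
        "failures_tolerated": parity_chunks,
    }
```

For the 10+4 layout this gives an overhead of 14 / 10 = 1.4x with 4 tolerated failures, compared with 3x overhead and 2 tolerated failures for three full copies. The cost is extra CPU and network traffic when a lost chunk has to be reconstructed.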
Walkthrough: Design Amazon S3 in 45 Minutes.
20. Design a Distributed Rate Limiter
Why it is asked at L6: The single-server rate limiter is L4. The distributed version, where 20 API servers must enforce a shared rate limit without a single point of failure and with sub-millisecond overhead, is L6. The question tests distributed coordination, eventual consistency trade-offs, and performance under constraints.
What it tests: Centralized vs distributed counters, Redis-based atomic operations, local-counter-with-sync approaches, failure modes (what happens when the coordination layer is unavailable), and the accuracy-vs-latency trade-off.
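The local-counter-with-sync approach can be sketched as follows. `CentralCounter` is an in-memory stand-in for a shared store with an atomic increment, and `LocalLimiter`, `sync_every`, and the omission of window resets are all simplifying assumptions.

```python
class CentralCounter:
    """Stand-in for a shared store offering an atomic increment."""

    def __init__(self):
        self.counts = {}

    def incr_by(self, key: str, amount: int) -> int:
        self.counts[key] = self.counts.get(key, 0) + amount
        return self.counts[key]


class LocalLimiter:
    """Per-server limiter that batches increments to the central counter."""

    def __init__(self, central: CentralCounter, limit: int, sync_every: int):
        self.central = central
        self.limit = limit
        self.sync_every = sync_every
        self.pending = {}       # key -> requests not yet reported centrally
        self.known_global = {}  # key -> last global total seen by this server

    def allow(self, key: str) -> bool:
        pending = self.pending.get(key, 0)
        # Decide using a possibly stale view: last global total + local pending.
        if self.known_global.get(key, 0) + pending >= self.limit:
            return False
        pending += 1
        if pending >= self.sync_every:
            # One round trip per batch instead of one per request.
            self.known_global[key] = self.central.incr_by(key, pending)
            pending = 0
        self.pending[key] = pending
        return True
```

This shows the accuracy-vs-latency trade-off directly: most requests are decided locally with no network hop, but because each server's view is stale between syncs, the fleet can overshoot the limit by roughly the number of servers times the batch size.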
Walkthrough: Design a Distributed Rate Limiter in 45 Minutes.
21. Design a Metrics and Monitoring System (Datadog/Prometheus)
Why it is asked at L6: Metrics systems ingest millions of data points per second, store them efficiently in time series databases, and serve complex queries (P99 latency over the last hour, grouped by service). The scale and query complexity make this a staff-level question.
What it tests: Time series database choice (InfluxDB, Prometheus, custom), data ingestion at scale (millions of metrics per second), aggregation (pre-aggregation vs query-time aggregation), alerting engine, and dashboard serving.
Walkthrough: Design a Metrics and Monitoring System (like Datadog/Prometheus).
22. Design an AI Recommendation System
Why it is asked at L6: Recommendation systems combine ML pipelines (training, feature stores, model serving) with production infrastructure (A/B testing, monitoring, retraining). The breadth of subsystems and the ML-specific components make this staff-level.
What it tests: Candidate generation (retrieving a shortlist from millions of items), ranking model (deep learning or gradient-boosted trees), feature store (preventing training-serving skew), A/B testing framework, and model monitoring and retraining.
Walkthrough: How to Design an AI Recommendation System from Scratch.
23. Design an AI Content Moderation System
Why it is asked at L6: Content moderation combines ML inference (classifying images, text, and video), human review workflows, appeal processes, and the challenge of handling adversarial content. The system must be fast (moderate before content goes live) and accurate (low false positive rate).
What it tests: ML model serving pipeline, human-in-the-loop workflows, content queuing and prioritization, adversarial robustness, and policy enforcement at scale.
Walkthrough: Design a Real-Time AI Content Moderation System.
24. Design ChatGPT
Why it is asked at L6: The hottest system design question in 2026. Tests LLM serving infrastructure, context window management, streaming responses, and cost optimization (GPU inference is expensive).
What it tests: LLM serving architecture (model sharding across GPUs, batching inference requests), streaming token generation, conversation state management, rate limiting per user, and cost optimization (when to use smaller models vs larger models).
Walkthrough: System Design Case Study: How to Design ChatGPT.
25. Design a Multi-Agent AI System
Why it is asked at L6: Multi-agent systems are new to the interview circuit. They test orchestration patterns (how agents communicate), tool use (agents calling external APIs), memory management, and failure handling when one agent produces incorrect output.
What it tests: Agent orchestration (sequential vs parallel vs hierarchical), tool integration, shared memory and context passing, error recovery, and cost management (each agent call costs money).
Walkthrough: How to Design a Multi-Agent AI System.
Other L6+ Questions
Commonly asked at staff level but not yet covered in dedicated walkthroughs:
Design Google Search: Tests web crawling, indexing (inverted index), ranking (PageRank + ML), query parsing, and serving results with sub-200ms latency.
Design a Distributed Transaction System (2PC/Saga): Tests two-phase commit, saga pattern, compensating transactions, and the fundamental trade-off between consistency and availability.
Design a Global Content Delivery Network: Tests edge server placement, cache warming, origin shielding, and routing policies.
Design a Distributed File System (HDFS/GFS): Tests chunk-based storage, metadata management, replication, and fault tolerance at the storage layer.
Design a Real-Time Ad Auction System: Tests real-time bidding (RTB), latency constraints (respond in 100ms), auction mechanics (second-price auction), and budget pacing.
Design a Code Deployment System (CI/CD): Tests build pipelines, canary deployments, rollback strategies, and artifact management.
Design a Distributed Lock Service (Chubby/ZooKeeper): Tests consensus algorithms, lease-based locking, and the split-brain problem.
How the Interview Changes by Company
The same question produces different interviews at different companies because each company evaluates through a different lens.
Google asks open-ended questions and evaluates depth. They might say “Design a system to handle short URLs” without specifying the scale, and they expect you to ask the right questions to scope it. The follow-ups go very deep into specific components. “What Google interviewers actually evaluate” covers the lens.
Amazon evaluates through Leadership Principles. The same URL shortener question at Amazon expects you to discuss “Customer Obsession” (what does the user experience look like?), “Ownership” (how do you monitor and operate this?), and “Dive Deep” (explain the internals of your hashing approach). “12 system design questions commonly asked at Amazon” covers the question bank with LP framing.
Meta evaluates at extreme scale. Every question at Meta implicitly has the constraint “for 3 billion users.” If your design does not address what happens at billions of users, you are under-designing. “The 5 concepts Meta weighs most heavily” covers the focus areas.
Netflix and Stripe evaluate domain expertise. Netflix questions skew toward streaming, CDN, and recommendation systems. Stripe questions skew toward payments, financial correctness, and API design. “How Netflix and Stripe interview differently from FAANG” covers the differences.
Conclusion
This collection covers 40+ system design questions organized by the level at which they are most commonly asked and evaluated. For each question, you know what it tests, which concepts to prepare, what a strong answer looks like, and where to find a detailed walkthrough.
Start with the questions at your target level.
Master them. Then study a few questions at the level above to prepare for stretch scenarios.
Practice under time pressure. Record yourself. Review.
The questions will evolve.
AI-related questions (Design ChatGPT, Design a Multi-Agent System, Design an AI Content Moderation Pipeline) have entered the rotation in 2026 and will only become more common.
The fundamentals (databases, caching, distributed systems, and API design) will remain the foundation regardless of how the questions change.


