The Database and Caching Guide for System Design Interviews: SQL, NoSQL, Sharding, and Everything Between
SQL vs NoSQL from scratch. Indexing, replication, sharding, CAP theorem. Caching strategies, invalidation bugs, and the 5 database mistakes that fail interviews instantly.
What This Guide Covers
SQL vs NoSQL explained from scratch, with a decision framework for when to use each
How databases scale: indexing, replication, sharding, and partitioning
The consistency spectrum: ACID, BASE, CAP theorem, and PACELC
Caching from the ground up: strategies, invalidation, and the failure modes that catch everyone
Specialized databases: time series, vector, graph, and when each one earns its place
The database mistakes that interviewers catch immediately
Every system design interview is, at its core, a database conversation. You might be asked to design Instagram, a chat system, or a rate limiter, but the interviewer will always steer the discussion toward data.
How will you store it? How will you retrieve it quickly?
What happens when one server is not enough?
What trade-offs are you making between consistency and availability?
Candidates who understand databases deeply do well in system design interviews.
Candidates who treat databases as black boxes, drawing a cylinder labeled “DB” and moving on, do not.
The interviewer sees that cylinder and asks: “What kind of database? Why that one? How does it scale? What is your consistency model?”
If you cannot answer confidently, the interview stalls.
This guide teaches every database and caching concept you need for system design interviews, starting from the fundamentals and building toward the advanced topics that separate good answers from great ones.
If you are a beginner, start from the top and read through. If you already know the basics, skip to the sections where you feel weakest.
Part 1: SQL vs NoSQL — The First Decision
The first database question in any system design interview is: should you use a SQL database or a NoSQL database?
Getting this wrong is an immediate red flag.
Getting it right, with clear reasoning, is an immediate credibility signal.
What SQL Databases Do
SQL databases (PostgreSQL, MySQL, Oracle) store data in tables with rows and columns.
You define a schema upfront: this table has these columns with these data types.
Every row in the table follows that schema. You query data using SQL, a language that supports joins (combining data from multiple tables), aggregations (counting, summing, averaging), and transactions (grouping multiple operations so they either all succeed or all fail).
The power of SQL is relationships.
A user places an order.
The order contains items.
Each item belongs to a product.
In a SQL database, these are four tables (users, orders, order_items, products) connected by foreign keys.
When you need “all orders placed by user 42 that contain products from category Electronics,” a single SQL query with joins gives you the answer.
The limitation of SQL is scaling.
A single PostgreSQL instance handles roughly 10,000 to 30,000 queries per second. When you need more, you have options (read replicas, connection pooling, partitioning), but each adds complexity.
And horizontal scaling, splitting data across multiple machines, is harder with SQL because joins across machines are expensive.
What NoSQL Databases Do
NoSQL databases come in four flavors, each designed for a specific access pattern.
Key-value stores (Redis, DynamoDB, Memcached) map a key to a value. You can get a value by its key or put a value at a key, and that is essentially it. No joins. No complex queries. But reads and writes are extremely fast (sub-millisecond for Redis) and scaling is straightforward because each key maps to exactly one server.
Document stores (MongoDB, Couchbase) store data as JSON-like documents. Each document can have a different structure. This flexibility is useful when your data does not fit neatly into tables, like user profiles where some users have addresses and some do not, or product listings with different attributes per category.
Wide-column stores (Cassandra, HBase) store data in rows with dynamic columns. They are optimized for writing huge volumes of data quickly and reading it by row key. Think of them as the database behind time-series data, IoT sensors, and activity logs.
Graph databases (Neo4j, Amazon Neptune) store data as nodes and edges. They are optimized for queries about relationships: “find all friends of friends who live in the same city,” “find the shortest path between two users.” Social networks and recommendation engines use graph databases.
The Decision Framework
Here is how to decide in an interview.
Choose SQL when: Your data is structured and relational (users, orders, products). You need complex queries with joins. You need strong consistency (financial transactions, inventory management). Your scale is moderate (under 50,000 QPS). You need ACID transactions.
Choose NoSQL when: Your access pattern is simple (get by key, put by key). You need massive horizontal scale (millions of operations per second). Your data does not have a fixed schema. You can tolerate eventual consistency. You need very low latency (sub-millisecond reads).
In practice, most systems use both.
A SQL database for the core data model (users, orders) and a NoSQL database for specific use cases (Redis for caching, DynamoDB for session storage, Cassandra for activity logs).
Keep reading with a 7-day free trial
Subscribe to System Design Nuggets to keep reading this post and get 7 days of free access to the full post archives.


