System Design Interview Question: Design Amazon S3 in 45 Mins.
How do you achieve 11 nines of durability? Learn the specific architectural patterns Amazon S3 uses to ensure data never gets lost, from erasure coding to replication.
1. Problem Definition and Scope
We are designing a highly scalable, durable object storage service. Unlike a file system, this is a key-value store: the “key” is a URL path and the “value” is a file (object) that can range from a few bytes to terabytes.
Main User Groups: Developers (storing app assets, logs), Systems (backups, big data analytics), and End Users (viewing content).
Main Actions: Create buckets, upload files (PUT), download files (GET), delete files, and list files.
Scope: We will focus on the Standard storage tier (hot data).
Out of Scope: We will not cover cold archival storage (Glacier), static website hosting, object lifecycle policies, or cross-region replication details.
2. Clarify functional requirements
Must have:
Buckets: Users can create named containers. Names must be globally unique.
Put Object: Users can upload files. The system must support small files (kilobytes) and very large files (terabytes).
Get Object: Users can retrieve files via HTTP.
Delete Object: Users can remove files.
List Objects: Users can list keys in a bucket, supporting pagination and prefix filtering.
Immutability: Objects are immutable. To modify a file, you must overwrite it completely.
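To make the must-have operations concrete, here is a minimal in-memory sketch of the API surface. It is a toy model only, not how S3 is implemented: the class name `InMemoryObjectStore`, the `start_after` and `max_keys` parameters, and the use of the last returned key as the pagination token are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


class BucketAlreadyExists(Exception):
    """Raised when a bucket name is already taken (names are globally unique)."""


@dataclass
class InMemoryObjectStore:
    # bucket name -> {object key -> object bytes}; one shared namespace for all users
    buckets: Dict[str, Dict[str, bytes]] = field(default_factory=dict)

    def create_bucket(self, name: str) -> None:
        if name in self.buckets:
            raise BucketAlreadyExists(name)
        self.buckets[name] = {}

    def put_object(self, bucket: str, key: str, data: bytes) -> None:
        # Objects are immutable: a PUT to an existing key replaces the whole value.
        self.buckets[bucket][key] = bytes(data)

    def get_object(self, bucket: str, key: str) -> bytes:
        return self.buckets[bucket][key]

    def delete_object(self, bucket: str, key: str) -> None:
        # Deleting a missing key is treated as a no-op in this sketch.
        self.buckets[bucket].pop(key, None)

    def list_objects(
        self,
        bucket: str,
        prefix: str = "",
        start_after: str = "",
        max_keys: int = 1000,
    ) -> Tuple[List[str], Optional[str]]:
        # Return one page of keys in sorted order, plus a token for the next page.
        matching = sorted(
            k for k in self.buckets[bucket]
            if k.startswith(prefix) and k > start_after
        )
        page = matching[:max_keys]
        # The token is the last key of this page, or None when nothing is left.
        next_token = page[-1] if len(matching) > max_keys else None
        return page, next_token
```

A caller pages through a bucket by passing the returned token back as `start_after` until the token is `None`. A real service would keep this index in a sharded metadata store rather than sorting keys on every call.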
Nice to have:
Multipart Upload: Ability to upload a single large file in parallel chunks.
Presigned URLs: Ability to grant temporary access to a private object via a URL.
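One way to picture a presigned URL is as a path plus an expiry time, signed with a secret that only the service knows. The sketch below uses a plain HMAC-SHA256 scheme to show the idea. It is not AWS Signature Version 4, and the host `storage.example.com`, the query parameter names, and both function names are hypothetical.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlparse


def presign_url(secret: bytes, bucket: str, key: str, expires_at: int) -> str:
    """Build a GET URL that is valid until the Unix timestamp expires_at."""
    # Assumes bucket and key contain only URL-safe characters.
    path = f"/{bucket}/{key}"
    message = f"GET\n{path}\n{expires_at}".encode()
    signature = hmac.new(secret, message, hashlib.sha256).hexdigest()
    query = urlencode({"expires": expires_at, "signature": signature})
    return f"https://storage.example.com{path}?{query}"


def verify_url(secret: bytes, url: str, now: int) -> bool:
    """Recompute the signature server-side and reject expired or altered URLs."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires_at = int(params["expires"][0])
        signature = params["signature"][0]
    except (KeyError, ValueError):
        return False
    if now > expires_at:
        return False
    message = f"GET\n{parsed.path}\n{expires_at}".encode()
    expected = hmac.new(secret, message, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(signature, expected)
```

Because the expiry is part of the signed message, a client cannot extend the lifetime of a URL by editing the `expires` parameter: the recomputed signature would no longer match.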
3. Clarify non-functional requirements
Durability: The most critical requirement. Target 11 nines (99.999999999%). We must not lose data.
Availability: High (99.9% to 99.99%). Within that target, the service should keep accepting writes and reads.
Scale: Must support exabytes of data and trillions of objects.
Performance: Optimized for throughput (streaming large files) rather than ultra-low latency. Time-to-first-byte should be under 100 ms.
Consistency: Strong consistency for new objects. Once a client receives 200 OK for a write, a subsequent read must return that new data.
Cost: Must be cost-effective using commodity hardware.
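It helps to turn the availability and durability targets into numbers. The sketch below is rough arithmetic under two assumptions that are not stated above: availability is measured over a 365-day year, and durability is read as the annual probability that a given object survives. Both function names are made up for this example.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000


def downtime_minutes_per_year(availability: float) -> float:
    """Allowed downtime per year for an availability such as 0.999."""
    return (1 - availability) * SECONDS_PER_YEAR / 60


def expected_objects_lost_per_year(object_count: int, durability_nines: int) -> float:
    """Expected annual losses if each object is lost with probability 10**-nines."""
    return object_count / 10 ** durability_nines


# 99.9% allows roughly 525.6 minutes (about 8.8 hours) of downtime per year,
# while 99.99% allows roughly 52.6 minutes.
print(round(downtime_minutes_per_year(0.999), 1))
print(round(downtime_minutes_per_year(0.9999), 1))

# With 11 nines, a trillion stored objects still means about 10 expected losses per year.
print(expected_objects_lost_per_year(10**12, 11))
```

The last figure shows why durability is the hardest requirement here: at a trillion objects, even 11 nines leaves a nonzero expected loss under this reading.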
4. Back of the envelope estimates
Let’s design for a large regional deployment.
Traffic:
Writes: 100,000 QPS.
Reads: 1,000,000 QPS (10:1 read-to-write ratio).
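The QPS figures above translate into daily request volumes with simple arithmetic, sketched below. The helper `ingest_bytes_per_second` takes an average object size as a parameter. That size is a hypothetical input, not a figure from this estimate, so plug in whatever your interviewer agrees to.

```python
SECONDS_PER_DAY = 24 * 3600  # 86,400

WRITE_QPS = 100_000
READ_QPS = 1_000_000

writes_per_day = WRITE_QPS * SECONDS_PER_DAY  # 8,640,000,000
reads_per_day = READ_QPS * SECONDS_PER_DAY    # 86,400,000,000


def ingest_bytes_per_second(write_qps: int, avg_object_bytes: int) -> int:
    """Raw write bandwidth for an assumed (hypothetical) average object size."""
    return write_qps * avg_object_bytes


print(f"writes/day: {writes_per_day:,}")
print(f"reads/day:  {reads_per_day:,}")
```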





