System Design Deep Dive: Designing a URL Shortener with Base62
Master the system design of URL shorteners. Learn the math behind Base62 encoding, why hashing (MD5) can cause collisions, and how to scale ID generation using Twitter Snowflake and ZooKeeper.
Modern web applications rely heavily on Uniform Resource Locators (URLs) to address resources and carry request data.
As applications grow in complexity, these locators often become unwieldy. They frequently contain deep directory paths, tracking parameters for analytics, and serialized security tokens.
These extremely long strings create functional problems in data storage, user interface design, and communication protocols.
When a URL exceeds a character limit imposed by a client, a messaging channel, or a storage field, it can be truncated or split across lines, rendering the link useless.
To mitigate these issues, engineers utilize URL shortening services. These systems ingest arbitrary long strings and output compact, unique identifiers.
While the output appears simple, the internal architecture requires a robust understanding of mathematical bases and database theory.
This concept is a staple in system design because it challenges developers to think about data representation efficiency and uniqueness at scale.
This post explores the engineering logic behind Base62 encoding. It details how to mathematically compress database identifiers into short strings and the architectural decisions required to implement this at scale.
The Core Problem: Identification and Brevity
The fundamental challenge in designing a URL shortener is unique identification. The system must accept a long URL and return a short string. Crucially, that short string must serve as a unique key.
If two different long URLs are assigned the same short key, the system fails to redirect users correctly. This is known as a collision.
A second constraint is brevity.
The goal is to make the key as short as possible. A key that is 20 characters long defeats the purpose of the system.
The engineering goal is to pack the maximum amount of unique data into the minimum number of characters. Ideally, we want a string that is only 6 or 7 characters long but can still reference billions of unique database records.
To solve this, we cannot rely on the standard decimal system used in everyday mathematics.
We must look at how computers represent numbers and how we can manipulate those representations to increase information density.
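To make the density argument concrete, here is a minimal sketch that compares how many distinct keys fit into a fixed number of characters in Base10 versus Base62. It assumes the usual 62-symbol alphabet (digits plus lowercase and uppercase letters); the function name `keyspace` is illustrative only.

```python
def keyspace(base: int, length: int) -> int:
    """Number of distinct strings of exactly `length` characters over `base` symbols."""
    return base ** length


for length in (6, 7):
    decimal_keys = keyspace(10, length)
    base62_keys = keyspace(62, length)
    print(f"{length} chars: Base10 = {decimal_keys:,}  Base62 = {base62_keys:,}")
```

Under these assumptions, 6 characters cover about 56.8 billion keys and 7 characters about 3.5 trillion, compared with 1 million and 10 million for decimal digits alone.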
The Database Solution: Integers as Keys
The most reliable method for generating unique keys in a software system is a sequential counter.
Relational databases provide this functionality natively through Auto-Increment primary keys.
When a row is inserted into a database, the system assigns it the next available integer.
Row 1 gets ID 1.
Row 2 gets ID 2.
Row 1,000 gets ID 1,000.
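The counter behavior can be sketched with an in-memory SQLite database from Python's standard library. The table name `links`, the column `long_url`, and the helper `insert_link` are assumptions made for illustration, not part of any particular production schema.

```python
import sqlite3

# In-memory database so the sketch leaves nothing on disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE links ("
    "  id INTEGER PRIMARY KEY AUTOINCREMENT,"  # the database assigns the next integer
    "  long_url TEXT NOT NULL"
    ")"
)


def insert_link(long_url: str) -> int:
    """Store a long URL and return the integer ID the database assigned to it."""
    cur = conn.execute("INSERT INTO links (long_url) VALUES (?)", (long_url,))
    conn.commit()
    return cur.lastrowid


ids = [
    insert_link(url)
    for url in ("https://example.com/a", "https://example.com/b", "https://example.com/c")
]
print(ids)  # [1, 2, 3]
```

Each insert receives the next integer, and the primary key constraint prevents two rows from ever holding the same value.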
The integer ID satisfies the uniqueness requirement perfectly. No two rows will ever share the same ID because the database hands out each counter value only once and the primary key constraint rejects duplicates. However, using the raw integer as the short URL is inefficient from a user experience perspective.
As the system scales to billions of links, the integer ID grows in length.
An ID of 10,000,000,000 is 11 digits long.
To shorten this visual representation without changing the underlying data, we must convert the number from Base10 (Decimal) to Base62.
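As a rough illustration of that conversion, the sketch below encodes an integer ID by repeated division by 62 and decodes it back. The alphabet ordering (digits, then lowercase, then uppercase) and the names `base62_encode` and `base62_decode` are assumptions; any fixed ordering of 62 distinct characters works as long as the encoder and decoder agree on it.

```python
import string

# Assumed ordering: 0-9, a-z, A-Z (62 symbols in total).
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)


def base62_encode(n: int) -> str:
    """Convert a non-negative integer ID into its Base62 string."""
    if n < 0:
        raise ValueError("ID must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n > 0:
        n, remainder = divmod(n, BASE)  # peel off the least significant digit
        chars.append(ALPHABET[remainder])
    return "".join(reversed(chars))  # digits were collected in reverse order


def base62_decode(key: str) -> int:
    """Convert a Base62 string back into the integer ID."""
    n = 0
    for ch in key:
        n = n * BASE + ALPHABET.index(ch)
    return n


short = base62_encode(10_000_000_000)
print(short, len(short))
```

With this alphabet, the ID 10,000,000,000 encodes to a 6-character key, and decoding that key returns the original integer, so the underlying data is unchanged.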
Understanding Numerical Bases
To understand the solution, we must define what a “base” is.


