Article 3 of 8

Your First Practical System Design: URL Shortener

Apply everything you've learned to design a URL shortener from scratch.

15 min · Beginner
Key Takeaway

Theory without application is trivia. A URL shortener is the perfect first system design exercise: minimal product complexity, maximum architectural richness. You'll make real decisions about ID generation, storage trade-offs, caching strategy, and failure modes. More importantly, you'll see how the five mental models from the previous article directly drive every decision on the whiteboard. This is where principles become engineering.


You've got the mental models. You understand state vs behaviour, sync vs async, the read/write split, bounded contexts, and failure as a first-class citizen. Now comes the real test: can you apply them?

Theory without application is trivia. So let's design something real.

A URL shortener. From scratch.

I've used this exact exercise when mentoring engineers and during hundreds of design discussions, and it works every time. Not because it's hard — a URL shortener is a product you can describe in three sentences. It works because those three sentences hide rich, interconnected design decisions. The surface area is small enough to hold in your head, but the trade-offs are real enough to expose how you actually think about systems.


Why a URL Shortener Is the Perfect First Exercise

A URL shortener takes a long URL, returns a short code, and redirects anyone who visits that short code to the original URL.

That's it. But hiding inside that short description are questions about hashing algorithms, data storage, caching, consistency, scale, database selection, and failure handling.

It's also a system most engineers have used. You've clicked a bit.ly or t.co link. You understand the product intuitively — so during the design conversation, you can spend your mental energy on the how rather than the what.

When I use this exercise in mentoring, I'm not testing whether candidates know the "right" architecture. I'm watching how they think. Do they ask about requirements before drawing boxes? Do they consider failure modes? Do they notice the read/write ratio and let it drive their caching strategy? Those are the signals that matter.


Step 1: Requirements Gathering

Before drawing a single box, nail down what you're actually building. I've watched engineers jump to architecture diagrams within the first 30 seconds of a design conversation. That's a mistake. Requirements shape everything that follows.

Functional requirements:

  • Given a long URL, generate a unique short code
  • Given a short code, redirect to the original URL
  • URLs may optionally expire after a configurable TTL
  • Users may optionally choose a custom alias

Non-functional requirements:

  • High availability — if redirects stop working, every link pointing to your service is broken
  • Low redirect latency — sub-50ms ideally
  • Short codes should not be guessable or predictable
  • The system needs to handle a 100:1 read-to-write ratio

That last point is critical. Stop and notice it. This is a read-heavy system. Keep that in mind — it will drive nearly every architectural decision that follows.
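To get a feel for what 100:1 means in practice, a quick back-of-the-envelope calculation helps. The inputs below (one million new URLs per day, roughly 500 bytes per stored mapping, five years of retention) are illustrative assumptions, not figures from the requirements above. Only the ratio comes from the requirements.

```python
# Back-of-the-envelope load estimate. All inputs except the ratio are assumed.
SECONDS_PER_DAY = 86_400

new_urls_per_day = 1_000_000   # assumption
read_write_ratio = 100         # from the 100:1 requirement
bytes_per_mapping = 500        # assumption: short code + URL + metadata
retention_years = 5            # assumption

writes_per_second = new_urls_per_day / SECONDS_PER_DAY
reads_per_second = writes_per_second * read_write_ratio
total_mappings = new_urls_per_day * 365 * retention_years
storage_gb = total_mappings * bytes_per_mapping / 1e9

print(f"writes/s: {writes_per_second:.1f}")
print(f"reads/s:  {reads_per_second:.0f}")
print(f"mappings: {total_mappings:,}")
print(f"storage:  {storage_gb:.1f} GB")
```

Under these assumptions the write path sees about a dozen requests per second while the read path sees over a thousand. That gap is why the read path gets most of the optimisation attention later on.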


Step 2: API Design

Start with the API. Keep it as simple as the requirements demand.

Create short URL:

POST /api/shorten
Body: { "url": "https://example.com/very/long/path", "custom_alias": "my-link", "ttl": 86400 }
Response: { "short_url": "https://sho.rt/a3Bf9k" }

Redirect:

GET /{short_code}
Response: 301 Redirect → https://example.com/very/long/path

Two endpoints. That's your entire API surface. When you find yourself adding more, ask whether the new endpoint is a genuine requirement or premature feature creep.
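As a rough sketch, the two endpoints can be modelled as two plain functions over an in-memory dictionary. Everything here is a stand-in: the function names, the placeholder code generator, and the 400/404/409 status codes are my assumptions, and TTL handling is left out to keep the shape visible.

```python
import itertools

BASE_URL = "https://sho.rt"      # domain taken from the example response above
_store: dict[str, str] = {}      # short_code -> original_url (in-memory stand-in)
_counter = itertools.count(1)    # placeholder generator, replaced in Step 3


def shorten(body: dict) -> tuple[int, dict]:
    """Handle POST /api/shorten. Returns (status_code, json_body)."""
    url = body.get("url", "")
    if not url.startswith(("http://", "https://")):
        return 400, {"error": "url must be an absolute http(s) URL"}

    alias = body.get("custom_alias")
    if alias:
        if alias in _store:
            return 409, {"error": "alias already in use"}
        code = alias
    else:
        code = f"c{next(_counter)}"
        while code in _store:    # skip codes already claimed as custom aliases
            code = f"c{next(_counter)}"

    _store[code] = url
    return 201, {"short_url": f"{BASE_URL}/{code}"}


def redirect(short_code: str) -> tuple[int, dict]:
    """Handle GET /{short_code}. Returns (status_code, headers)."""
    url = _store.get(short_code)
    if url is None:
        return 404, {}
    return 301, {"Location": url}
```

Calling shorten({"url": "https://example.com/very/long/path", "custom_alias": "my-link"}) and then redirect("my-link") exercises both paths end to end.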

Now the data model. At its core, this is a key-value mapping:

Field          Type          Notes
short_code     VARCHAR(7)    Primary key, indexed
original_url   TEXT          The destination URL
created_at     TIMESTAMP     When it was created
expires_at     TIMESTAMP     Nullable, for TTL support
user_id        VARCHAR       Nullable, for authenticated users

Notice how clean this is. One entity, one table, one bounded context. The URL mapping service owns this data and nothing else. The user_id is a reference — not a foreign key joining to another service's database. That's bounded context in action from the very first schema decision.
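Here is one way the schema could look in practice. I'm using SQLite only because it ships with Python. The table name url_mapping, the helper names, and the choice to store timestamps as Unix seconds are assumptions for this sketch.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE url_mapping (
        short_code   VARCHAR(7) PRIMARY KEY,  -- primary key gets an index automatically
        original_url TEXT NOT NULL,
        created_at   INTEGER NOT NULL,        -- Unix seconds (assumed representation)
        expires_at   INTEGER,                 -- NULL means the link never expires
        user_id      VARCHAR                  -- opaque reference, deliberately no FOREIGN KEY
    )
    """
)


def insert_mapping(code, url, ttl=None, user_id=None, now=None):
    """Store a mapping; raises sqlite3.IntegrityError if the code is taken."""
    now = int(time.time()) if now is None else now
    expires_at = now + ttl if ttl is not None else None
    conn.execute(
        "INSERT INTO url_mapping VALUES (?, ?, ?, ?, ?)",
        (code, url, now, expires_at, user_id),
    )


def lookup(code, now=None):
    """Return the original URL, or None if the code is unknown or expired."""
    now = int(time.time()) if now is None else now
    row = conn.execute(
        "SELECT original_url FROM url_mapping "
        "WHERE short_code = ? AND (expires_at IS NULL OR expires_at > ?)",
        (code, now),
    ).fetchone()
    return row[0] if row else None
```

The primary-key constraint does double duty: it makes point lookups fast and it rejects a second insert for the same short code.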


Step 3: The Core Algorithm

You need to turn a long URL (or a database ID) into a short, unique 6-7 character string. Two credible approaches.

Approach 1: Base62 Encoding with a Counter

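Use an auto-incrementing counter (or a distributed ID generator in the style of Twitter's Snowflake) and convert the numeric ID to a base62 string (the 62 characters 0-9, A-Z, a-z).

With the digits ordered 0-9, A-Z, a-z, ID 12345 in base62 becomes 3D7. A 7-character base62 string gives you 62⁷ ≈ 3.5 trillion unique URLs. You will not run out. One caveat: 64-bit Snowflake-style IDs encode to as many as 11 base62 characters, so staying within 7 characters requires a smaller ID space, such as a plain counter.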

Pros: No collisions. Ever. Each ID is unique by construction.

Cons: Short codes are sequential and somewhat predictable. You need a centralised or coordinated counter.
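A minimal sketch of the encoding, assuming the digit order 0-9, A-Z, a-z (the order is a convention you pick; any fixed ordering of the 62 characters works):

```python
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
BASE = len(ALPHABET)  # 62


def encode_base62(n: int) -> str:
    """Convert a non-negative integer ID to its base62 string."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, BASE)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))  # most significant digit first


def decode_base62(code: str) -> int:
    """Inverse of encode_base62."""
    n = 0
    for ch in code:
        n = n * BASE + ALPHABET.index(ch)
    return n


print(encode_base62(12345))        # 3D7
print(encode_base62(62**7 - 1))    # zzzzzzz, the largest 7-character code
```

Because encoding is a bijection between integers and strings, uniqueness of the ID guarantees uniqueness of the short code.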

Approach 2: Hash-Based

Take the long URL, compute a hash (MD5 or SHA-256), and take the first 7 characters of the base62-encoded hash.

Pros: Stateless — no counter needed. The same URL always produces the same short code.

Cons: Collisions. Two different URLs can produce the same 7-character prefix. You need collision detection (query the database, re-hash with a salt, retry).
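To make the collision-handling cost concrete, here is a sketch of the hash-based approach with a salted retry. The salt format, the retry limit, and the dict standing in for the database are all assumptions. In a real database the check-then-write would need to be atomic, for example via a unique constraint.

```python
import hashlib

ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def _base62(n: int) -> str:
    digits = []
    while n:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits)) or ALPHABET[0]


def hash_code(url: str, attempt: int = 0, length: int = 7) -> str:
    """First `length` base62 characters of SHA-256 over the (optionally salted) URL."""
    salted = f"{url}#{attempt}" if attempt else url
    digest = hashlib.sha256(salted.encode("utf-8")).digest()
    return _base62(int.from_bytes(digest, "big")).rjust(length, ALPHABET[0])[:length]


def shorten(url: str, store: dict[str, str], max_attempts: int = 5) -> str:
    """Return a short code for url, retrying with a new salt on collision."""
    for attempt in range(max_attempts):
        code = hash_code(url, attempt)
        existing = store.get(code)
        if existing is None:
            store[code] = url   # free slot: claim it
            return code
        if existing == url:
            return code         # same URL shortened before: reuse its code
        # A different URL owns this code: collision, so try the next salt.
    raise RuntimeError("could not find a free short code")
```

Note that every write now involves at least one read, and the retry loop is code you have to test and monitor. That is the complexity the counter-based approach avoids.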

My recommendation: Base62 encoding with a distributed ID generator. It's simpler to operate, generates zero collisions, and the "predictability" concern is easily mitigated by adding a random offset or shuffling the encoding alphabet. Hash-based approaches sound elegant but collision handling adds real complexity to your write path for no operational benefit.
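As one illustration of the alphabet-shuffling idea, the sketch below derives a private digit ordering from a seed. The seed value and function names are hypothetical. Treat this as obfuscation rather than security: consecutive IDs still share a prefix, and Python's random module is not a cryptographic generator.

```python
import random

CANONICAL = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def make_alphabet(secret_seed: int) -> str:
    """Return the 62 characters in a seed-dependent, repeatable order."""
    chars = list(CANONICAL)
    random.Random(secret_seed).shuffle(chars)
    return "".join(chars)


def encode(n: int, alphabet: str) -> str:
    """Base62-encode n using the given digit ordering."""
    if n == 0:
        return alphabet[0]
    digits = []
    while n:
        n, rem = divmod(n, 62)
        digits.append(alphabet[rem])
    return "".join(reversed(digits))


def decode(code: str, alphabet: str) -> int:
    """Inverse of encode for the same alphabet."""
    n = 0
    for ch in code:
        n = n * 62 + alphabet.index(ch)
    return n


alphabet = make_alphabet(secret_seed=42)   # hypothetical seed, keep it private
code = encode(12345, alphabet)
assert decode(code, alphabet) == 12345     # still a lossless mapping
```

The mapping stays collision-free because a shuffled alphabet is still a bijection. Only the visible relationship between a code and its counter value changes.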


Step 4: Database Choice and Trade-offs

You need a database that's fast on point lookups by short_code. This is essentially a key-value lookup — not a complex relational join.

Option 1: Relational (PostgreSQL/MySQL)

  • Familiar, battle-tested, ACID guarantees make collision handling straightforward
  • With proper indexing on short_code, point lookups are fast
  • TTL support via expiry columns with a background cleanup job

Option 2: NoSQL (DynamoDB/Cassandra)

  • Purpose-built for high-throughput key-value access patterns
  • Scales horizontally with manageable operational complexity
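  • DynamoDB has native TTL support: expired items are deleted automatically in the background, though not at the exact moment of expiry, so the read path should still check expires_at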

For a URL shortener at scale, either works. The bottleneck won't be the database technology — it will be whether you've set up caching correctly. More on that in a moment.

This is State vs Behaviour in front of you: the state (URL mappings) is simple and well-defined. The behaviour (create, redirect) is thin. When your state model is this clean, database technology matters far less than the architecture surrounding it.


Step 5: Read/Write Path Analysis

That 100:1 read-to-write ratio we identified in requirements? Here's where it earns its place.

Write path (URL creation):

  1. Client sends POST with the long URL
  2. Service generates a unique short code
  3. Service writes the mapping to the database
  4. Service returns the short URL

This path is synchronous — the caller needs the short URL immediately to share it. Writes are infrequent. No need to complicate this.

Read path (redirect):

  1. Client sends GET with the short code
  2. Service looks up the short code
  3. Service returns a 301/302 redirect

This path runs 100 times more often than the write path. It needs to be fast — sub-50ms. And it's a pure key-value lookup. This is the read/write split mental model telling you exactly where to invest your optimisation effort.


Step 6: Caching Strategy

For the read path, add a cache layer between the service and the database. Redis or Memcached both work here.

Cache-aside (lazy loading):

  1. On redirect, check the cache first
  2. Cache hit → return immediately (sub-millisecond)
  3. Cache miss → query the database, populate the cache, return the result

With a 100:1 read-to-write ratio and a reasonable cache size, hit rates above 90% are realistic, because link traffic is heavily skewed. Popularity tends to follow a Zipf-like distribution, where a small fraction of URLs (on the order of the top 1%) can account for half or more of all traffic, so the most popular short URLs stay warm in cache.

Cache invalidation is straightforward here because URL mappings are essentially immutable. Once created, a short code always maps to the same URL. The only invalidation event is expiry, which you handle by setting a TTL on the cache entry matching the URL's expiry time.
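The cache-aside flow and the expiry-matched TTL can be sketched in a few lines. TTLCache below is a tiny in-process stand-in for Redis or Memcached, and db_lookup is a hypothetical callable returning (url, expires_at) or None. Neither is a real client API.

```python
import time
from typing import Callable, Optional


class TTLCache:
    """Tiny in-process stand-in for Redis/Memcached."""

    def __init__(self, clock: Callable[[], float] = time.time):
        self._data = {}          # key -> (value, expires_at or None)
        self._clock = clock

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and expires_at <= self._clock():
            del self._data[key]  # lazily evict expired entries
            return None
        return value

    def set(self, key: str, value: str, expires_at: Optional[float] = None) -> None:
        self._data[key] = (value, expires_at)


def resolve(code: str, cache: TTLCache, db_lookup) -> Optional[str]:
    """Cache-aside: try the cache, fall back to the database, then populate."""
    url = cache.get(code)
    if url is not None:
        return url                      # cache hit
    row = db_lookup(code)               # cache miss: (url, expires_at) or None
    if row is None:
        return None
    url, expires_at = row
    cache.set(code, url, expires_at)    # cache TTL mirrors the URL's own expiry
    return url
```

There is no explicit invalidation code anywhere in this sketch, which is the point: immutability plus a matching TTL covers it.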

Enjoy this. Most systems don't give you cache invalidation this clean.


Step 7: Scaling Considerations

As traffic grows, here's how each layer of the system evolves — and why:

Application tier: Stateless services behind a load balancer. The service holds no local state (everything is in the database and cache), so horizontal scaling is trivial. Add instances when CPU or memory becomes the bottleneck.

Cache tier: Redis cluster with sharding by short code. As your working set grows beyond what fits on a single node, add shards. The key space distributes cleanly across shards.
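To show what "sharding by short code" means at its simplest, here is a sketch using a stable hash and a modulo. This is an illustration of the idea, not how Redis Cluster itself assigns keys (it uses fixed hash slots). The function name is hypothetical.

```python
import zlib


def shard_for(short_code: str, num_shards: int) -> int:
    """Map a short code to a shard index using a stable hash."""
    # zlib.crc32 gives the same result in every process, unlike the
    # built-in hash() for strings, which is randomised per process.
    return zlib.crc32(short_code.encode("utf-8")) % num_shards


print(shard_for("a3Bf9k", 4))  # always the same shard for the same code
```

Plain modulo sharding remaps most keys whenever num_shards changes, which is why production systems tend to use hash slots or consistent hashing instead.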

Database tier: For reads, add read replicas (your 100:1 ratio rewards this immediately). For writes, if you're using counter-based ID generation, you need a strategy for generating unique IDs across multiple write nodes — pre-allocated ID ranges per node, or a dedicated ID generation service, both work.
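The pre-allocated range idea can be sketched as a coordinator that hands out blocks and a per-node generator that consumes them. Class names and the block size are assumptions. In a real deployment the allocator's state would live in a durable store, not in memory.

```python
import threading


class RangeAllocator:
    """Central coordinator handing out non-overlapping ID blocks."""

    def __init__(self, block_size: int = 1000, start: int = 1):
        self._next_start = start
        self._block_size = block_size
        self._lock = threading.Lock()

    def allocate(self) -> range:
        with self._lock:
            block = range(self._next_start, self._next_start + self._block_size)
            self._next_start += self._block_size
            return block


class NodeIdGenerator:
    """Per-node generator: serves IDs locally, refills when its block runs out."""

    def __init__(self, allocator: RangeAllocator):
        self._allocator = allocator
        self._ids = iter(())             # empty until the first block is fetched
        self._lock = threading.Lock()

    def next_id(self) -> int:
        with self._lock:
            try:
                return next(self._ids)
            except StopIteration:
                self._ids = iter(self._allocator.allocate())
                return next(self._ids)
```

Each node only talks to the coordinator once per block, so the coordinator sees a tiny fraction of write traffic. The trade-off is gaps: a node that crashes loses the unused IDs in its block, which is harmless here.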

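CDN layer: For the most popular short URLs, cache the redirect at the CDN edge. A 301 redirect with proper cache headers means subsequent requests never reach your application servers. For links with a TTL, keep the cache lifetime no longer than the time remaining until expiry, since browsers and CDNs may otherwise keep serving the redirect after the link has expired. This is the highest-leverage optimisation available once you understand your traffic distribution.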
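As a sketch of what "proper cache headers" might look like, the function below builds a redirect response whose cache lifetime never outlives the link. The function name, the one-day default, and the 404 for expired links are assumptions. Cache-Control with public and max-age is standard HTTP.

```python
from typing import Optional


def redirect_response(
    url: str,
    expires_at: Optional[float],
    now: float,
    max_edge_ttl: int = 86_400,   # assumed upper bound on edge caching: one day
) -> tuple[int, dict]:
    """Build (status, headers) for a redirect that a CDN may cache."""
    if expires_at is not None and expires_at <= now:
        return 404, {"Cache-Control": "no-store"}
    ttl = max_edge_ttl
    if expires_at is not None:
        ttl = min(ttl, int(expires_at - now))   # never cache past the link's expiry
    return 301, {"Location": url, "Cache-Control": f"public, max-age={ttl}"}
```

Capping max-age is what keeps edge caching compatible with the optional TTL from the requirements.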

The elegant property here: each layer scales independently. You don't expand the database because the cache is full. You don't add application servers because the database is slow. Each bottleneck has its own lever.


Step 8: Failure Modes

This is where Failure as a First-Class Citizen earns its keep. Let's walk through the failure scenarios before they find us in production.

What if the cache goes down?

All traffic hits the database directly. Latency increases, but the system still serves redirects. This is graceful degradation: the cache is a performance optimisation, not a dependency critical to functionality. Alert on cache availability and hit rate so you know the moment the cache stops serving traffic.
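In code, graceful degradation mostly means treating cache errors as misses. The sketch below assumes a hypothetical CacheUnavailable exception raised by the cache client. Real clients raise their own connection and timeout errors, which you would catch instead.

```python
import logging

log = logging.getLogger("shortener")


class CacheUnavailable(Exception):
    """Raised by the (hypothetical) cache client when the cache cannot be reached."""


def resolve_with_fallback(code, cache_get, cache_set, db_lookup):
    """Resolve a short code, surviving a cache outage by going to the database."""
    try:
        url = cache_get(code)
        if url is not None:
            return url
    except CacheUnavailable:
        log.warning("cache unavailable on read; falling back to database")

    url = db_lookup(code)
    if url is not None:
        try:
            cache_set(code, url)
        except CacheUnavailable:
            log.warning("cache unavailable on write; serving from database only")
    return url
```

The redirect succeeds either way. The warnings are what feed the alert, so the outage is visible even though users don't notice it.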

What if the database goes down?

Reads from cache continue working for all cached URLs — which, given our hit rate, is the vast majority of traffic. New URL creation fails. You return a 503 for creation requests. For most use cases, that's acceptable.

What if the ID generator fails?

URL creation stops. A centralised counter is a single point of failure worth addressing explicitly. Options: multiple generators with non-overlapping ID ranges, or fall back to UUID-based generation temporarily.

What about hot URLs?

A viral short URL might receive tens of millions of hits per minute. Your cache handles this naturally — one key, served from memory. But if that key expires and thousands of requests hit the database simultaneously, you have a cache stampede. Solution: use "probabilistic early recomputation" or a mutex pattern — when the cache misses, only one request queries the database while others wait a few milliseconds. The first request repopulates the cache; everyone else gets the fresh value.
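Here is a minimal sketch of the mutex variant (not probabilistic early recomputation), using one lock per key inside a single process. The class and loader names are hypothetical. Across multiple application servers you would need a distributed lock or request coalescing at the cache layer instead.

```python
import threading
import time
from collections import defaultdict


class SingleFlightCache:
    """On a miss, lets one thread per key load the value while the rest wait."""

    def __init__(self, load):
        self._load = load                              # slow loader, e.g. a database query
        self._values = {}
        self._key_locks = defaultdict(threading.Lock)  # one lock per key
        self._guard = threading.Lock()                 # protects _key_locks itself

    def get(self, key):
        value = self._values.get(key)
        if value is not None:
            return value                               # fast path: cache hit
        with self._guard:
            lock = self._key_locks[key]
        with lock:                                     # only one thread per key proceeds
            value = self._values.get(key)              # re-check: it may be filled by now
            if value is None:
                value = self._load(key)
                if value is not None:
                    self._values[key] = value
            return value


db_calls = []


def slow_db_lookup(key):
    """Stand-in for a database query; records each call so we can count them."""
    db_calls.append(key)
    time.sleep(0.05)
    return "https://example.com/very/long/path"


cache = SingleFlightCache(slow_db_lookup)
threads = [threading.Thread(target=cache.get, args=("hot",)) for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(db_calls))  # 1: twenty concurrent requests, one database query
```

The re-check inside the lock is the important line. Without it, every waiting thread would still query the database once the lock was released.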

None of these failure modes are theoretical. Every one of them will happen if your service runs long enough. Planning for them now is dramatically cheaper than firefighting them at 3am.


Connecting the Mental Models

Let's step back and see where each mental model from the previous article appeared in this design:

State vs Behaviour: We kept state (URL mappings in the database and cache) cleanly separated from behaviour (the stateless service that creates and resolves URLs). The service scales horizontally because it holds no local state.

Sync vs Async: The write path is synchronous because the caller needs the short URL immediately. Analytics (tracking click counts, referrer data) would be asynchronous — fire an event to a queue, process it without blocking the redirect.

Read/Write Split: The 100:1 ratio drove the entire caching strategy. The cache doesn't exist because caching is fashionable — it exists because we measured the ratio and followed the math.

Bounded Context: The URL mapping service owns its own data. It doesn't share a database with user management or analytics. Each concern has a clear boundary.

Failure First: We designed for cache failures, database outages, and cache stampedes before we wrote a line of code. Not because we're pessimists — because we're realistic about how production systems behave.

This is what applied system design looks like. Not memorised architectures, but mental models producing principled decisions for a specific problem.


Key Takeaways

  • A URL shortener is the ideal first system design exercise because the product surface is tiny but the design decisions are interconnected and real.
  • Always start with requirements, and identify the read/write ratio early — it drives your entire architecture from data model to caching strategy.
  • Base62 encoding with a counter is simpler and more reliable than hash-based approaches. It is collision-free and operationally straightforward, provided you obscure the sequential codes so they are not easily guessed.
  • Database choice matters less than caching strategy in a read-heavy system with simple access patterns. Optimise the common path.
  • Analyse the read path and write path separately. They run at different frequencies, have different latency requirements, and have different failure characteristics.
  • Design for failure upfront. Cache outages, database outages, hot keys, and stampedes are not edge cases; they are operational realities any long-running production service will encounter.
  • Mental models turn a blank whiteboard into structured decisions. Every choice in this design traces back to a principle. That's what principled engineering looks like.