System Design: Real-World Case Studies

Real-world case studies bring all system design concepts together into concrete, applied examples. Understanding how well-known systems are designed — and the specific problems they solved — builds the intuition needed to design new systems confidently. This topic walks through four classic system design problems, applying every concept covered in this course.

Case Study 1: Design a URL Shortener (like bit.ly)

Requirements

Functional:

  • Given a long URL, generate a short URL (e.g., bit.ly/abc123)
  • Visiting the short URL redirects to the original long URL
  • Short URLs expire after a configured time period

Non-Functional:

  • 100 million new URLs created per day
  • 10 billion redirects per day (~115,000 redirects/second)
  • Reads (redirects) are 100× more frequent than writes (URL creation)
  • 99.9% uptime requirement

Scale Estimation

Writes: 100M URLs/day = ~1,160 writes/second
Reads:  10B redirects/day = ~115,740 reads/second
Storage per URL: 500 bytes (long URL + metadata)
5-year storage: 100M × 365 × 5 × 500 bytes ≈ 91 TB

Core Design Decisions

Short URL Generation:

Option 1: Hash the long URL
MD5("https://estudy247.com/long-article") → Take first 7 characters
Risk: Collisions (two different URLs could produce same hash)
Solution: Check if short code exists, append counter if collision

Option 2: Unique ID + Base62 encoding
Generate auto-incrementing ID: 12345678
Encode the ID in base62 (0-9, a-z, A-Z): 12345678 → "PNFQ"
7-character base62 codes give 62^7 ≈ 3.5 trillion combinations → roughly 96 years of IDs at 100M URLs/day
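
A minimal sketch of the ID-to-code step in Python, assuming the 0-9/a-z/A-Z alphabet used in the example above; in a real system the integer ID would come from the database or a dedicated ID-generation service.

import string

ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase  # 62 characters

def encode_base62(n: int) -> str:
    """Convert an auto-incrementing integer ID into a short base62 code."""
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n > 0:
        n, remainder = divmod(n, 62)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))      # most significant digit first

print(encode_base62(12345678))            # "PNFQ"
print(encode_base62(62**7 - 1))           # "ZZZZZZZ" -- the largest 7-character code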

Architecture

Client
  ↓
Load Balancer (Round Robin across API servers)
  ↓
API Servers (Stateless, horizontally scalable)
  ↓              ↓
Cache          Database
(Redis:        (PostgreSQL)
 short→long    short_url | long_url | created_at | expires_at
 mapping)
  ↑
  Cache HIT (95%+ for popular links): Return immediately
  Cache MISS: Fetch from DB, store in cache, return

Redirect flow:
GET /abc123
→ Check Redis cache → HIT → Return 301 Redirect to long URL
→ Cache MISS → Query DB → Store in Redis → Return 301 Redirect

Key design insight:
301 (Moved Permanently): Browser caches the redirect → Future clicks bypass the server entirely (but per-click analytics are lost)
302 (Found, i.e. temporary): Server handles every click → Better analytics tracking
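
A minimal sketch of the redirect path, assuming a Flask app in front of Redis; lookup_long_url is a hypothetical stand-in for the PostgreSQL query, and the one-hour cache TTL is an illustrative choice.

import redis
from flask import Flask, abort, redirect

app = Flask(__name__)
cache = redis.Redis(host="localhost", port=6379, decode_responses=True)

def lookup_long_url(short_code):
    # Hypothetical stand-in for:
    #   SELECT long_url FROM urls WHERE short_url = %s AND expires_at > now()
    return None

@app.route("/<short_code>")
def follow(short_code):
    long_url = cache.get(short_code)               # cache HIT: ~95% of requests stop here
    if long_url is None:                           # cache MISS: fall back to the database
        long_url = lookup_long_url(short_code)
        if long_url is None:
            abort(404)
        cache.set(short_code, long_url, ex=3600)   # keep the mapping warm for an hour
    return redirect(long_url, code=301)            # switch to 302 if per-click analytics matter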

Case Study 2: Design a Notification System

Requirements

Functional:

  • Send push notifications, SMS, and emails
  • Support millions of notifications per day
  • Allow prioritization (critical alerts vs promotional)
  • Track delivery status (sent, delivered, failed)

Architecture

Triggering Services (Order Service, Marketing Service, etc.)
  ↓
Notification Service API
  ↓
Priority Queue (Message Broker - Kafka)
  High Priority Topic:  Password resets, security alerts
  Normal Priority Topic: Order confirmations, shipping updates
  Low Priority Topic:   Promotions, newsletters
  ↓
Notification Workers (pull from appropriate topics)
  ↓
  +------------------+------------------+------------------+
  |                  |                  |                  |
Push Worker       Email Worker       SMS Worker
(FCM/APNS)        (SendGrid/SES)     (Twilio/SNS)
  ↓                  ↓                  ↓
Mobile Device     Email Server       Phone (SMS)
  ↓
Delivery Status DB (tracks each notification)
  ↓
Analytics Dashboard (delivery rates, failure rates)
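
A sketch of how a triggering service could publish onto the priority topics shown in the diagram above, assuming the kafka-python client; the topic names and the event shape are illustrative assumptions.

import json
from kafka import KafkaProducer

TOPIC_BY_PRIORITY = {
    "high":   "notifications.high",     # password resets, security alerts
    "normal": "notifications.normal",   # order confirmations, shipping updates
    "low":    "notifications.low",      # promotions, newsletters
}

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def enqueue_notification(user_id, channel, payload, priority="normal"):
    """Publish one notification event; channel workers consume the matching topic."""
    event = {"user_id": user_id, "channel": channel, "payload": payload}
    producer.send(TOPIC_BY_PRIORITY[priority], value=event)

enqueue_notification(42, "push", {"title": "Password reset requested"}, priority="high")
producer.flush()   # block until the event has been handed to the broker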

Key Design Decisions

Retry Logic:

Failed notification → Retry with exponential backoff:
Attempt 1: Immediately
Attempt 2: 30 seconds later
Attempt 3: 5 minutes later
Attempt 4: 30 minutes later
Attempt 5: 2 hours later
After 5 failures → Move to Dead Letter Queue → Alert team
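
A sketch of that retry schedule in Python; send and dead_letter are hypothetical callables supplied by the worker (the provider call and the dead-letter-queue publish), and a real worker would re-enqueue with a delay rather than sleep in-process.

import time

# Delay before each attempt, mirroring the schedule above (seconds).
RETRY_DELAYS = [0, 30, 5 * 60, 30 * 60, 2 * 60 * 60]

def deliver_with_retries(notification, send, dead_letter):
    """Try up to five times with escalating delays, then hand off to the dead letter queue."""
    for attempt, delay in enumerate(RETRY_DELAYS, start=1):
        time.sleep(delay)                      # sketch only: a real worker re-enqueues instead
        try:                                   # of blocking the process for hours
            send(notification)
            return True
        except Exception as error:
            print(f"attempt {attempt} failed: {error}")
    dead_letter(notification)                  # after 5 failures: DLQ + alert the team
    return False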

Rate Limiting per User:

Limit: Max 10 push notifications per user per day
       Max 3 SMS per user per day
       Max 1 promotional email per user per day
→ Prevents notification fatigue, protects user experience
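
A sketch of those per-user daily caps using Redis counters that expire at the end of the day (UTC); the key naming scheme and channel names are assumptions.

import redis
from datetime import datetime, timedelta, timezone

r = redis.Redis(decode_responses=True)

DAILY_LIMITS = {"push": 10, "sms": 3, "promo_email": 1}

def seconds_until_midnight_utc():
    now = datetime.now(timezone.utc)
    midnight = (now + timedelta(days=1)).replace(hour=0, minute=0, second=0, microsecond=0)
    return int((midnight - now).total_seconds())

def allow_notification(user_id, channel):
    """Increment today's counter for this user/channel; allow only while within the cap."""
    key = f"notif:{channel}:{user_id}:{datetime.now(timezone.utc):%Y%m%d}"
    count = r.incr(key)
    if count == 1:                                   # first notification today: set expiry
        r.expire(key, seconds_until_midnight_utc())
    return count <= DAILY_LIMITS[channel]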

Case Study 3: Design a Social Media Feed (like Twitter/X)

Requirements

Functional:

  • Users post tweets (short messages)
  • Users follow other users
  • Home timeline shows latest tweets from all followed users
  • Tweets include text, images, links

Non-Functional:

  • 300 million active users
  • 100,000 tweets posted per second
  • Read-heavy: timeline views far exceed tweet posts

Feed Generation Approaches

Pull Model (Fanout on Read):

When user opens timeline:
→ Fetch the list of accounts the user follows (say, 500 accounts)
→ Query each account's recent tweets
→ Merge, sort by timestamp
→ Return timeline

Problem: Opening timeline requires 500+ queries → Slow!
Better for: Users following very few accounts
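
A sketch of fanout-on-read, assuming hypothetical get_followed_ids and get_recent_tweets helpers over the tweet store (each returning tweets newest-first); note that it pays one query per followed account, which is exactly the cost described above.

import heapq
from itertools import islice

def build_timeline_on_read(user_id, get_followed_ids, get_recent_tweets, limit=50):
    """Pull model: query every followed account at read time and merge by timestamp."""
    followed = get_followed_ids(user_id)                                 # e.g. 500 account IDs
    per_account = [get_recent_tweets(fid, limit) for fid in followed]    # 500 queries!
    merged = heapq.merge(*per_account, key=lambda t: t["created_at"], reverse=True)
    return list(islice(merged, limit))                                   # newest `limit` tweets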

Push Model (Fanout on Write):

When user posts a tweet:
→ Find all followers (say: 10,000 followers)
→ Write this tweet into each follower's pre-built timeline cache
→ When any follower opens their feed, the timeline is already built → Return instantly

Pre-built Timeline Cache (Redis):
User 42's timeline: [tweet789, tweet456, tweet123, ...]

Problem: Celebrity with 50M followers posts → 50M cache writes!
Better for: Most regular users
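
A sketch of fanout-on-write into per-follower Redis lists; get_follower_ids is a hypothetical helper over the social graph, and the 800-entry cap per timeline is an assumed limit to bound cache size.

import redis

r = redis.Redis(decode_responses=True)
TIMELINE_LENGTH = 800          # keep only the newest entries per user (assumed cap)

def fan_out_tweet(tweet_id, author_id, get_follower_ids):
    """Push model: write the new tweet ID into every follower's prebuilt timeline."""
    for follower_id in get_follower_ids(author_id):          # 10,000 writes for 10k followers
        key = f"timeline:{follower_id}"
        pipe = r.pipeline()
        pipe.lpush(key, tweet_id)                             # newest first
        pipe.ltrim(key, 0, TIMELINE_LENGTH - 1)               # drop entries beyond the cap
        pipe.execute()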

Hybrid Approach (Twitter's actual solution):

Regular users (< 10M followers): Push model (fanout on write)
Celebrities (> 10M followers):   Pull model (read at feed generation time)

Feed for a user following only regular accounts: served straight from the prebuilt cache
Feed for a user following celebrities:           prebuilt cache + celebrity tweets pulled and merged at read time (see the sketch below)
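
A sketch of that hybrid read path: start from the prebuilt cache filled by fanout-on-write, then pull followed celebrities live. get_tweets_by_ids, get_recent_tweets, and followed_celebrities are hypothetical helpers over the tweet store and social graph.

import redis

r = redis.Redis(decode_responses=True)

def read_timeline(user_id, get_tweets_by_ids, get_recent_tweets, followed_celebrities, limit=50):
    """Hybrid read: cached fanout for regular accounts, live pull for celebrity accounts."""
    cached_ids = r.lrange(f"timeline:{user_id}", 0, limit - 1)   # filled by fanout on write
    timeline = get_tweets_by_ids(cached_ids)                     # hydrate IDs into tweet objects
    for celeb_id in followed_celebrities(user_id):               # pull path: no fanout on write
        timeline.extend(get_recent_tweets(celeb_id, limit))
    timeline.sort(key=lambda t: t["created_at"], reverse=True)   # newest first
    return timeline[:limit]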

Architecture

Tweet Creation:
Client → API Gateway → Tweet Service → Tweet DB (MySQL sharded by TweetID)
                                     → Media Service (images → S3 → CDN)
                                     → Fanout Service → User timeline caches (Redis)

Feed Read:
Client → API Gateway → Timeline Service → Redis cache → Render feed
                                       → Merge celebrity tweets (for regular users)

Case Study 4: Design a Ride-Sharing System (like Uber)

Requirements

Functional:

  • Rider requests a ride with pickup and dropoff locations
  • System matches rider with nearest available driver
  • Both rider and driver see real-time location updates
  • Trip completes, payment processes automatically

Location Tracking Challenge

Millions of drivers update their location every 5 seconds.
5,000,000 drivers × 1 update/5 sec = 1,000,000 location writes/second

Solution: Location Service with write-optimized storage (sketched after this list)
- Use Cassandra for location data (high write throughput)
- Driver locations stored as: { driverID, lat, lng, timestamp }
- Recent location in Redis (fast read for matching)
- Historical locations in Cassandra (analytics, route replay)
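
A sketch of the ingest path for one location update, assuming redis-py for the hot store and kafka-python to feed an asynchronous Cassandra history writer; the key and topic names are illustrative.

import json
import time
import redis
from kafka import KafkaProducer

r = redis.Redis(decode_responses=True)
history = KafkaProducer(bootstrap_servers="localhost:9092",
                        value_serializer=lambda v: json.dumps(v).encode("utf-8"))

def record_location(driver_id, lat, lng):
    """Hot store for matching + async stream for the Cassandra history writer."""
    point = {"driverID": driver_id, "lat": lat, "lng": lng, "timestamp": time.time()}
    r.hset(f"driver:{driver_id}", mapping={"lat": lat, "lng": lng, "ts": point["timestamp"]})
    history.send("driver-locations", value=point)      # consumed and written to Cassandra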

Geospatial Matching

Problem: Rider requests ride in Mumbai.
How to find all drivers within 5km efficiently?

Naive approach: Check every driver's location → 5M calculations → Too slow

Solution: Geohashing
Divide the world into a grid of cells. Each cell has a unique string (geohash).
Nearby locations share the same geohash prefix.

Mumbai driver at lat 19.07, lng 72.87 → Geohash: "te7u6…"
Rider at         lat 19.08, lng 72.88 → Geohash: "te7ud…"

Both start with "te7u" → same grid cell, ~1.5 km apart → Nearby!
(Longer shared prefixes mean smaller cells: 4 characters ≈ a 39 × 20 km cell, 6 characters ≈ 1.2 × 0.6 km.)

Query: Find all drivers whose geohash starts with the rider's prefix (plus the 8 neighbouring cells,
since two nearby points can sit just across a cell boundary) → Fast index scan
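
A self-contained sketch of geohash encoding and prefix matching; production systems would use a library (such as python-geohash) or a database's native geospatial index, and the in-memory driver dictionary here is a stand-in for that index.

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash(lat, lng, precision=6):
    """Interleave longitude and latitude bits, emitting one base32 character per 5 bits."""
    lat_lo, lat_hi = -90.0, 90.0
    lng_lo, lng_hi = -180.0, 180.0
    code, bits, bit_count, use_lng = [], 0, 0, True
    while len(code) < precision:
        if use_lng:
            mid = (lng_lo + lng_hi) / 2
            bit = int(lng >= mid)
            lng_lo, lng_hi = (mid, lng_hi) if bit else (lng_lo, mid)
        else:
            mid = (lat_lo + lat_hi) / 2
            bit = int(lat >= mid)
            lat_lo, lat_hi = (mid, lat_hi) if bit else (lat_lo, mid)
        bits = bits * 2 + bit
        bit_count += 1
        use_lng = not use_lng
        if bit_count == 5:
            code.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(code)

# In-memory stand-in for the driver index; real systems keep this in Redis or a DB index.
drivers = {"driver-1": (19.07, 72.87), "driver-2": (19.10, 72.90), "driver-3": (28.61, 77.21)}

rider_prefix = geohash(19.08, 72.88, precision=4)        # "te7u"
nearby = [d for d, (la, ln) in drivers.items()
          if geohash(la, ln, precision=4) == rider_prefix]
print(nearby)    # the two Mumbai drivers share the rider's cell; the Delhi driver does not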

Real-Time Location Updates (WebSockets)

HTTP (polling) approach:
Rider app: "Where is driver?" → Server responds
Rider app: Wait 2 seconds → "Where is driver?" → Server responds
→ Many requests, delayed updates, server overhead

WebSocket approach:
Client and server maintain persistent two-way connection
Driver app → WebSocket → Location server → Updates all connected riders instantly
→ No polling, instant updates, efficient
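
A minimal sketch of the push path, assuming a recent version of the Python websockets library and a single-process, in-memory map of trip_id → connected rider sockets; message shapes are illustrative.

import asyncio
import json
import websockets

riders_by_trip = {}   # trip_id → set of connected rider websockets

async def handler(websocket):
    async for raw in websocket:
        msg = json.loads(raw)
        if msg["type"] == "subscribe":                    # rider app joins a trip
            riders_by_trip.setdefault(msg["trip_id"], set()).add(websocket)
        elif msg["type"] == "location":                   # driver app pushes a location update
            for rider in riders_by_trip.get(msg["trip_id"], set()):
                await rider.send(json.dumps({"lat": msg["lat"], "lng": msg["lng"]}))

async def main():
    async with websockets.serve(handler, "0.0.0.0", 8765):
        await asyncio.Future()                            # run forever

if __name__ == "__main__":
    asyncio.run(main())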

System Architecture

+--------+  WebSocket  +----------+  Kafka   +----------+
| Driver |-----------> | Location | -------> | Matching |
|  App   |             | Service  |          | Service  |
+--------+             +----------+          +----------+
                           |                      |
                        Cassandra              Redis (active
                        (history)              driver pool)
                                                   ↑
                                               Geohash index

+--------+  HTTP  +----------+               +----------+
| Rider  |------> |  Trip    | ------------> | Payment  |
|  App   |        | Service  |               | Service  |
+--------+        +----------+               +----------+
    ↑                  |                          |
    | WebSocket         → Notifications         Stripe/
    | (driver            (Push + SMS)           Braintree
    | location)

Common Patterns Across All Case Studies

Pattern Used       | URL Shortener      | Notifications      | Social Feed        | Ride Sharing
-------------------|--------------------|--------------------|--------------------|------------------
Caching            | Redis (URL map)    | User preferences   | Timeline cache     | Driver locations
Message Queue      | No                 | Kafka (priority)   | Fanout queue       | Location updates
Load Balancing     | API servers        | Worker nodes       | Feed servers       | All services
Horizontal Scaling | API + DB sharding  | Worker scaling     | Tweet DB sharding  | Location service
Async Processing   | Expiry cleanup     | All notifications  | Fanout writes      | Payment, receipts

How to Approach Any System Design Problem

Use this framework for any system design interview or real-world design:

  1. Clarify requirements – Ask about scale, features, and priorities. Confirm functional and non-functional requirements.
  2. Estimate scale – Calculate writes/second, reads/second, storage over 5 years.
  3. Define the API – What endpoints does the system expose? What do they accept and return?
  4. High-level design – Draw the major components: client, API gateway, services, caches, databases, queues.
  5. Deep dive into bottlenecks – Identify the hardest parts (fanout, location queries, payment consistency) and explain solutions.
  6. Address failure scenarios – What happens if the database goes down? If the queue fills up? If a service crashes?
  7. Trade-offs – Acknowledge what decisions sacrifice (e.g., AP vs CP, cost vs performance).

Summary

Real-world systems combine every concept from this course: caching, load balancing, sharding, replication, queues, CDN, rate limiting, and security — all working together. A URL shortener demonstrates read-heavy caching. A notification system shows priority queues and retry logic. A social feed reveals the fanout problem and hybrid push-pull strategies. A ride-sharing system highlights real-time geospatial challenges. Mastering system design means recognizing these patterns and knowing when and how to apply each one. The goal is always the same: build a system that is fast, reliable, scalable, and secure at any scale.
