Caching in System Design
Caching is the technique of storing a copy of data in a fast-access location so future requests for that data return much faster. Instead of recalculating or re-fetching data from a slow source every single time, a cache provides the answer instantly from memory.
Think of a chef who memorizes the most popular recipe instead of reading the cookbook every time a customer orders. The first order requires the cookbook (slow), but every repeat order comes from memory (fast). The memorized recipe is the cache.
Why Caching Is Critical in System Design
Database queries, external API calls, and complex calculations take time. When millions of users request the same data simultaneously, repeatedly hitting the database creates a bottleneck. Caching removes this bottleneck by serving the result from memory.
Without cache: 10,000 users each trigger a database query → database overloads → slow responses.
With cache: 10,000 users get the same data from cache → database receives very few requests → fast responses.
How Caching Works
              Check                       MISS:
+--------+             +---------+             +----------+
|        | ----------> |  Cache  | ----------> |          |
| Client |             | (Redis) |             | Database |
|        |             |         |             |          |
+--------+             +---------+             +----------+
    ^                       |                       |
    |      HIT: return      |     Store result      |
    +<----------------------+<----------------------+
      cached data
- Client requests data.
- The system checks the cache first.
- Cache HIT: Data is in cache → return immediately, no database query needed.
- Cache MISS: Data not in cache → query database → store result in cache → return to client.
- Next time the same data is requested, it comes from cache (HIT).
Cache Hit Rate
Cache hit rate is the percentage of requests served from cache. A high hit rate means the cache is effective.
Cache Hit Rate = (Cache Hits / Total Requests) × 100

Example:
- Total requests: 1,000
- Cache hits: 850
- Hit rate: 850/1,000 × 100 = 85%
A hit rate above 80% is generally considered good. Below 50% means the cache is not helping much and needs tuning.
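The calculation is simple enough to sanity-check in a few lines of Python, using the numbers from the example above:

```python
def cache_hit_rate(hits: int, total_requests: int) -> float:
    """Percentage of requests served from cache."""
    return hits / total_requests * 100

print(cache_hit_rate(850, 1000))  # 85.0
```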
Types of Caching
1. In-Memory Cache
Data is stored in RAM (computer memory) on the application server or a dedicated cache server. This is the fastest form of caching — RAM access takes nanoseconds.
Tools: Redis, Memcached
Example: Storing the top 10 trending products so every user gets them instantly without a database call.
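As a minimal sketch, here is what storing and reading such a value looks like with the redis-py client. The key name, TTL, and local server address are illustrative assumptions, not part of the original example:

```python
# pip install redis; assumes a Redis server running on localhost:6379.
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# Store the precomputed top-products list with a 5-minute TTL (illustrative values).
r.setex("top_products", 300, "laptop,phone,headphones")

print(r.get("top_products"))  # served from RAM, no database call
```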
2. Database Query Cache
The database stores the result of frequently run queries. The next time the same query runs, it returns the cached result instead of executing again.
Example: MySQL's query cache stored the result of "SELECT * FROM products WHERE category = 'electronics'" so repeated calls skipped query execution. (Note that MySQL removed its built-in query cache in version 8.0; modern deployments typically rely on an external cache such as Redis instead.)
3. CDN Cache (Content Delivery Network)
Static assets like images, videos, JavaScript files, and CSS are cached on servers distributed around the world. Users receive content from the nearest server, not the origin server.
Example: A YouTube thumbnail gets cached at CDN nodes in India, USA, and Europe. Users in each region load it from the nearest node.
4. Browser Cache
The browser stores web assets locally on the user's device. On a repeat visit, the browser loads files from local storage instead of downloading them again.
Example: A website's logo is cached by the browser after the first visit. Every page on the site loads the logo from cache, saving a network request each time.
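What makes this work is the server telling the browser how long a response may be reused, via the Cache-Control header. Here is a minimal sketch using Python's standard http.server; the content type, response body, and max-age value are illustrative assumptions:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "image/png")
        # Tell the browser it may reuse this response for 24 hours
        # (86,400 seconds) without contacting the server again.
        self.send_header("Cache-Control", "public, max-age=86400")
        self.end_headers()
        self.wfile.write(b"placeholder image bytes")

HTTPServer(("localhost", 8000), Handler).serve_forever()
```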
5. Application-Level Cache
The application code stores computed results in a variable or object. This is the simplest cache — just storing a value to reuse instead of recalculating.
Example: After computing a user's dashboard statistics, the result is stored in a dictionary for 60 seconds before recalculating.
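A minimal sketch of that idea in Python, assuming a hypothetical compute_stats function standing in for the expensive work:

```python
import time

_cache = {}  # user_id -> (stats, expires_at)

def get_dashboard_stats(user_id, compute_stats):
    entry = _cache.get(user_id)
    if entry is not None and entry[1] > time.time():
        return entry[0]                          # still fresh: reuse it
    stats = compute_stats(user_id)               # expensive recalculation
    _cache[user_id] = (stats, time.time() + 60)  # keep for 60 seconds
    return stats
```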
Cache Placement Strategies
Cache-Aside (Lazy Loading)
The application manages the cache manually. It checks the cache first, fetches from the database on a miss, and stores the result in cache before returning it.
Application → Check Cache
  Cache HIT  → Return data
  Cache MISS → Fetch from DB → Write to Cache → Return data
Advantage: Only requested data gets cached (no wasted space).
Disadvantage: First request for any data is always slow (cold start).
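In code, cache-aside looks like the sketch below. The cache and db objects are stand-ins for real clients (for example redis-py and a database driver), and their method names are assumptions for illustration:

```python
def get_product(product_id, cache, db):
    product = cache.get(product_id)          # 1. check the cache first
    if product is not None:
        return product                       # HIT: skip the database entirely
    product = db.query(product_id)           # 2. MISS: fetch from the database
    cache.set(product_id, product, ttl=300)  # 3. store the result for next time
    return product
```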
Write-Through Cache
Every time data gets written to the database, it simultaneously gets written to the cache. The cache is always up to date.
Application → Write to Database AND Cache at the same time
Advantage: Cache is never stale. Reads are always fast.
Disadvantage: Every write hits both database and cache, making writes slower.
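The write-through write path is only a few lines; as before, cache and db are illustrative stand-ins:

```python
def update_product(product_id, data, cache, db):
    db.update(product_id, data)   # write to the source of truth...
    cache.set(product_id, data)   # ...and to the cache in the same operation,
                                  # so reads never see stale data
```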
Write-Behind (Write-Back) Cache
Data writes go to the cache first. The cache then asynchronously writes to the database in the background, often in batches.
Application → Write to Cache → Return success
Cache → Batch write to Database (async)
Advantage: Writes are extremely fast.
Disadvantage: Risk of data loss if cache crashes before writing to the database.
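One way to sketch write-behind is a queue drained in batches by a background worker. The batch size and the bulk_update method are illustrative assumptions:

```python
import queue

write_queue = queue.Queue()

def update_product(product_id, data, cache):
    cache.set(product_id, data)          # fast path: memory only
    write_queue.put((product_id, data))  # the database write is deferred

def flush_batch(db, batch_size=100):
    """Run periodically by a background worker to drain pending writes."""
    batch = []
    while len(batch) < batch_size and not write_queue.empty():
        batch.append(write_queue.get())
    if batch:
        db.bulk_update(batch)            # one batched write instead of many
```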
Read-Through Cache
The cache itself handles fetching from the database. The application always talks to the cache, and the cache decides whether to fetch fresh data.
Application → Ask Cache → Cache checks itself
  HIT  → Return cached data
  MISS → Cache fetches from DB → Cache stores result → Returns to app
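A read-through cache can be sketched as a small class that owns a loader function, so callers never touch the database directly. The class and parameter names are illustrative:

```python
import time

class ReadThroughCache:
    def __init__(self, loader, ttl=300):
        self._store = {}       # key -> (value, expires_at)
        self._loader = loader  # called by the cache itself on a MISS
        self._ttl = ttl

    def get(self, key):
        entry = self._store.get(key)
        if entry is not None and entry[1] > time.time():
            return entry[0]                               # HIT
        value = self._loader(key)                         # MISS: cache fetches
        self._store[key] = (value, time.time() + self._ttl)
        return value

# Usage sketch: cache = ReadThroughCache(loader=lambda key: db.query(key))
```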
Cache Eviction Policies
Memory is limited. When the cache fills up, old entries must be removed to make space for new ones. This is called eviction.
| Policy | What Gets Removed | Best For |
|---|---|---|
| LRU (Least Recently Used) | The item not accessed for the longest time | General-purpose caching (most common) |
| LFU (Least Frequently Used) | The item accessed fewest times | When popularity matters more than recency |
| FIFO (First In First Out) | The oldest item added to cache | Simple use cases, time-ordered data |
| TTL (Time To Live) | Items that have expired based on set time | Data that changes on a known schedule |
| Random | A random item | When usage patterns are unpredictable |
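LRU, the most common policy, is worth seeing concretely. Below is a compact sketch using Python's OrderedDict; note the standard library also ships a ready-made decorator version as functools.lru_cache:

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self._data = OrderedDict()
        self._capacity = capacity

    def get(self, key):
        if key not in self._data:
            return None                     # MISS
        self._data.move_to_end(key)         # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self._capacity:
            self._data.popitem(last=False)  # evict the least recently used
```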
Cache Invalidation
When the source data changes, the cached copy becomes outdated (stale). Cache invalidation is the process of removing or updating stale cache entries.
Scenario: A product's price changes in the database.

Without invalidation:
- Cache still serves the old price → users see the wrong price

With invalidation:
- Option 1: Delete the cache entry → the next request fetches the fresh price from the DB
- Option 2: Update the cache entry immediately when the price changes
- Option 3: Set a TTL of 5 minutes → the cache auto-expires and refreshes
Cache invalidation is famously one of the two hard problems in computer science (the other being naming things). The challenge is deciding when data is "stale enough" to evict, balancing freshness against performance.
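Option 1 above (delete on write) is the most common pattern in practice. A sketch, with cache and db again as stand-in clients with illustrative method names:

```python
def change_price(product_id, new_price, cache, db):
    db.update_price(product_id, new_price)  # update the source of truth
    cache.delete(product_id)                # next read is a MISS and
                                            # refetches the fresh price
```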
Cache Stampede (Thundering Herd Problem)
When a popular cache entry expires, thousands of requests simultaneously find a cache MISS and all hit the database at once. This spike can overwhelm the database.
Cache TTL expires for "top_products" at 12:00:00
        ↓
1,000 requests arrive at 12:00:01
        ↓
All find MISS → All query database → Database overloads!
Solution: Use a lock or mutex so only one request rebuilds the cache while others wait, or use probabilistic early expiration to refresh the cache before it officially expires.
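Here is a sketch of the lock approach for a single process (a distributed system would use a distributed lock instead, for example one built on Redis SETNX). The key and method names are illustrative:

```python
import threading

_rebuild_lock = threading.Lock()

def get_top_products(cache, db):
    value = cache.get("top_products")
    if value is not None:
        return value                       # normal HIT path
    with _rebuild_lock:                    # only one request rebuilds
        value = cache.get("top_products")  # re-check: another thread may have
        if value is None:                  # rebuilt it while we waited
            value = db.query_top_products()
            cache.set("top_products", value, ttl=300)
    return value
```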
Distributed Caching
A single cache server becomes a bottleneck as the system grows. Distributed caching spreads the cache across multiple nodes, allowing the system to scale horizontally.
+----------+   +----------+   +----------+
|  Cache   |   |  Cache   |   |  Cache   |
|  Node 1  |   |  Node 2  |   |  Node 3  |
| Keys A-H |   | Keys I-P |   | Keys Q-Z |
+----------+   +----------+   +----------+

    Consistent Hashing distributes keys
Redis Cluster and Memcached are common distributed caching solutions used in large-scale systems.
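A minimal consistent-hashing sketch shows how keys map to nodes. Production rings add virtual nodes for better balance, which this sketch omits for brevity:

```python
import bisect
import hashlib

class HashRing:
    def __init__(self, nodes):
        # Place each node on the ring at the position given by its hash.
        self._ring = sorted((self._hash(node), node) for node in nodes)
        self._hashes = [h for h, _ in self._ring]

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def node_for(self, key):
        # The key belongs to the first node clockwise from its ring position.
        idx = bisect.bisect(self._hashes, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]

ring = HashRing(["node1", "node2", "node3"])
print(ring.node_for("user:42"))  # the same key always lands on the same node
```

The payoff of consistent hashing is that adding or removing a node moves only the keys adjacent to it on the ring, rather than reshuffling every key across all nodes.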
Summary
Caching dramatically improves system performance by serving frequently requested data from fast memory instead of slow storage. The right caching strategy depends on how often data changes, how critical freshness is, and how much memory is available. Mastering cache placement, eviction, and invalidation is essential for designing systems that stay fast under heavy load.
