Caching in System Design
Caching is the technique of storing a copy of data in a fast-access location so future requests for that data return much faster. Instead of recalculating or re-fetching data from a slow source every single time, a cache provides the answer instantly from memory.
Think of a chef who memorizes the most popular recipe instead of reading the cookbook every time a customer orders. The first order requires the cookbook (slow), but every repeat order comes from memory (fast). The memorized recipe is the cache.
Why Caching Is Critical in System Design
Database queries, external API calls, and complex calculations take time. When millions of users request the same data simultaneously, repeatedly hitting the database creates a bottleneck. Caching removes this bottleneck by serving the result from memory.
Without cache: 10,000 users each trigger a database query → database overloads → slow responses.
With cache: 10,000 users get the same data from cache → database receives very few requests → fast responses.
How Caching Works
              Check                       MISS:
+--------+             +---------+             +----------+
|        | ----------> |  Cache  | ----------> |          |
| Client |             | (Redis) |             | Database |
|        |             |         |             |          |
+--------+             +---------+             +----------+
    ^                       |                       |
    |      HIT: return      |     Store result      |
    +<----------------------+<----------------------+
      cached data
- Client requests data.
- The system checks the cache first.
- Cache HIT: Data is in cache → return immediately, no database query needed.
- Cache MISS: Data not in cache → query database → store result in cache → return to client.
- Next time the same data is requested, it comes from cache (HIT).
Cache Hit Rate
Cache hit rate is the percentage of requests served from cache. A high hit rate means the cache is effective.
Cache Hit Rate = (Cache Hits / Total Requests) × 100

Example:
- Total requests: 1,000
- Cache hits: 850
- Hit rate: 850/1,000 × 100 = 85%
A hit rate above 80% is generally considered good. Below 50% means the cache is not helping much and needs tuning.
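The calculation is simple enough to sanity-check in a few lines of Python, using the numbers from the example above:

```python
def cache_hit_rate(hits: int, total_requests: int) -> float:
    """Percentage of requests served from cache."""
    return hits / total_requests * 100

print(cache_hit_rate(850, 1000))  # 85.0
```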
Types of Caching
1. In-Memory Cache
Data is stored in RAM (computer memory) on the application server or a dedicated cache server. This is the fastest form of caching — RAM access takes nanoseconds.
Tools: Redis, Memcached
Example: Storing the top 10 trending products so every user gets them instantly without a database call.
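As a minimal sketch, here is what storing and reading such a value looks like with the redis-py client. The key name, TTL, and local server address are illustrative assumptions, not part of the original example:

```python
# pip install redis; assumes a Redis server running on localhost:6379.
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# Store the precomputed top-products list with a 5-minute TTL (illustrative values).
r.setex("top_products", 300, "laptop,phone,headphones")

print(r.get("top_products"))  # served from RAM, no database call
```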
2. Database Query Cache
The database stores the result of frequently run queries. The next time the same query runs, it returns the cached result instead of executing again.
Example: MySQL's query cache stored the result of "SELECT * FROM products WHERE category = 'electronics'" so repeated calls skipped query execution. (Note that MySQL removed its built-in query cache in version 8.0; modern deployments typically rely on an external cache such as Redis instead.)
3. CDN Cache (Content Delivery Network)
Static assets like images, videos, JavaScript files, and CSS are cached on servers distributed around the world. Users receive content from the nearest server, not the origin server.
Example: A YouTube thumbnail gets cached at CDN nodes in India, USA, and Europe. Users in each region load it from the nearest node.
4. Browser Cache
The browser stores web assets locally on the user's device. On a repeat visit, the browser loads files from local storage instead of downloading them again.
Example: A website's logo is cached by the browser after the first visit. Every page on the site loads the logo from cache, saving a network request each time.
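What makes this work is the server telling the browser how long a response may be reused, via the Cache-Control header. Here is a minimal sketch using Python's standard http.server; the content type, response body, and max-age value are illustrative assumptions:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "image/png")
        # Tell the browser it may reuse this response for 24 hours
        # (86,400 seconds) without contacting the server again.
        self.send_header("Cache-Control", "public, max-age=86400")
        self.end_headers()
        self.wfile.write(b"placeholder image bytes")

HTTPServer(("localhost", 8000), Handler).serve_forever()
```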
5. Application-Level Cache
The application code stores computed results in a variable or object. This is the simplest cache — just storing a value to reuse instead of recalculating.
Example: After computing a user's dashboard statistics, the result is stored in a dictionary for 60 seconds before recalculating.
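A minimal sketch of that idea in Python, assuming a hypothetical compute_stats function standing in for the expensive work:

```python
import time

_cache = {}  # user_id -> (stats, expires_at)

def get_dashboard_stats(user_id, compute_stats):
    entry = _cache.get(user_id)
    if entry is not None and entry[1] > time.time():
        return entry[0]                          # still fresh: reuse it
    stats = compute_stats(user_id)               # expensive recalculation
    _cache[user_id] = (stats, time.time() + 60)  # keep for 60 seconds
    return stats
```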
Cache Placement Strategies
Cache-Aside (Lazy Loading)
The application manages the cache manually. It checks the cache first, fetches from the database on a miss, and stores the result in cache before returning it.
Application → Check Cache
  Cache HIT  → Return data
  Cache MISS → Fetch from DB → Write to Cache → Return data
Advantage: Only requested data gets cached (no wasted space).
Disadvantage: First request for any data is always slow (cold start).
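In code, cache-aside looks like the sketch below. The cache and db objects are stand-ins for real clients (for example redis-py and a database driver), and their method names are assumptions for illustration:

```python
def get_product(product_id, cache, db):
    product = cache.get(product_id)          # 1. check the cache first
    if product is not None:
        return product                       # HIT: skip the database entirely
    product = db.query(product_id)           # 2. MISS: fetch from the database
    cache.set(product_id, product, ttl=300)  # 3. store the result for next time
    return product
```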
Write-Through Cache
Every time data gets written to the database, it simultaneously gets written to the cache. The cache is always up to date.
Application → Write to Database AND Cache at the same time
Advantage: Cache is never stale. Reads are always fast.
Disadvantage: Every write hits both database and cache, making writes slower.
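The write-through write path is only a few lines; as before, cache and db are illustrative stand-ins:

```python
def update_product(product_id, data, cache, db):
    db.update(product_id, data)   # write to the source of truth...
    cache.set(product_id, data)   # ...and to the cache in the same operation,
                                  # so reads never see stale data
```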
Write-Behind (Write-Back) Cache
Data writes go to the cache first. The cache then asynchronously writes to the database in the background, often in batches.
Application → Write to Cache → Return success
Cache → Batch write to Database (async)
Advantage: Writes are extremely fast.
Disadvantage: Risk of data loss if cache crashes before writing to the database.
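One way to sketch write-behind is a queue drained in batches by a background worker. The batch size and the bulk_update method are illustrative assumptions:

```python
import queue

write_queue = queue.Queue()

def update_product(product_id, data, cache):
    cache.set(product_id, data)          # fast path: memory only
    write_queue.put((product_id, data))  # the database write is deferred

def flush_batch(db, batch_size=100):
    """Run periodically by a background worker to drain pending writes."""
    batch = []
    while len(batch) < batch_size and not write_queue.empty():
        batch.append(write_queue.get())
    if batch:
        db.bulk_update(batch)            # one batched write instead of many
```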
Read-Through Cache
The cache itself handles fetching from the database. The application always talks to the cache, and the cache decides whether to fetch fresh data.
Application → Ask Cache → Cache checks itself
  HIT  → Return cached data
  MISS → Cache fetches from DB → Cache stores result → Returns to app
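A read-through cache can be sketched as a small class that owns a loader function, so callers never touch the database directly. The class and parameter names are illustrative:

```python
import time

class ReadThroughCache:
    def __init__(self, loader, ttl=300):
        self._store = {}       # key -> (value, expires_at)
        self._loader = loader  # called by the cache itself on a MISS
        self._ttl = ttl

    def get(self, key):
        entry = self._store.get(key)
        if entry is not None and entry[1] > time.time():
            return entry[0]                               # HIT
        value = self._loader(key)                         # MISS: cache fetches
        self._store[key] = (value, time.time() + self._ttl)
        return value

# Usage sketch: cache = ReadThroughCache(loader=lambda key: db.query(key))
```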
Cache Eviction Policies
Memory is limited. When the cache fills up, old entries must be removed to make space for new ones. This is called eviction.
| Policy | What Gets Removed | Best For |
|---|---|---|
| LRU (Least Recently Used) | The item not accessed for the longest time | General-purpose caching (most common) |
| LFU (Least Frequently Used) | The item accessed fewest times | When popularity matters more than recency |
| FIFO (First In First Out) | The oldest item added to cache | Simple use cases, time-ordered data |
| TTL (Time To Live) | Items that have expired based on set time | Data that changes on a known schedule |
| Random | A random item | When usage patterns are unpredictable |
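LRU, the most common policy, is worth seeing concretely. Below is a compact sketch using Python's OrderedDict; note the standard library also ships a ready-made decorator version as functools.lru_cache:

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self._data = OrderedDict()
        self._capacity = capacity

    def get(self, key):
        if key not in self._data:
            return None                     # MISS
        self._data.move_to_end(key)         # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self._capacity:
            self._data.popitem(last=False)  # evict the least recently used
```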
Cache Invalidation
When the source data changes, the cached copy becomes outdated (stale). Cache invalidation is the process of removing or updating stale cache entries.
Scenario: A product's price changes in the database.

Without invalidation:
- Cache still serves the old price → users see the wrong price

With invalidation:
- Option 1: Delete the cache entry → the next request fetches the fresh price from the DB
- Option 2: Update the cache entry immediately when the price changes
- Option 3: Set a TTL of 5 minutes → the cache auto-expires and refreshes
Cache invalidation is famously one of the two hard problems in computer science (the other being naming things). The challenge is deciding when data is "stale enough" to evict, balancing freshness against performance.
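Option 1 above (delete on write) is the most common pattern in practice. A sketch, with cache and db again as stand-in clients with illustrative method names:

```python
def change_price(product_id, new_price, cache, db):
    db.update_price(product_id, new_price)  # update the source of truth
    cache.delete(product_id)                # next read is a MISS and
                                            # refetches the fresh price
```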
Cache Stampede (Thundering Herd Problem)
When a popular cache entry expires, thousands of requests simultaneously find a cache MISS and all hit the database at once. This spike can overwhelm the database.
Cache TTL expires for "top_products" at 12:00:00
        ↓
1,000 requests arrive at 12:00:01
        ↓
All find MISS → All query database → Database overloads!
Solution: Use a lock or mutex so only one request rebuilds the cache while others wait, or use probabilistic early expiration to refresh the cache before it officially expires.
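Here is a sketch of the lock approach for a single process (a distributed system would use a distributed lock instead, for example one built on Redis SETNX). The key and method names are illustrative:

```python
import threading

_rebuild_lock = threading.Lock()

def get_top_products(cache, db):
    value = cache.get("top_products")
    if value is not None:
        return value                       # normal HIT path
    with _rebuild_lock:                    # only one request rebuilds
        value = cache.get("top_products")  # re-check: another thread may have
        if value is None:                  # rebuilt it while we waited
            value = db.query_top_products()
            cache.set("top_products", value, ttl=300)
    return value
```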
Distributed Caching
A single cache server becomes a bottleneck as the system grows. Distributed caching spreads the cache across multiple nodes, allowing the system to scale horizontally.
+----------+   +----------+   +----------+
|  Cache   |   |  Cache   |   |  Cache   |
|  Node 1  |   |  Node 2  |   |  Node 3  |
| Keys A-H |   | Keys I-P |   | Keys Q-Z |
+----------+   +----------+   +----------+

    Consistent Hashing distributes keys
Redis Cluster and Memcached are common distributed caching solutions used in large-scale systems.
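A minimal consistent-hashing sketch shows how keys map to nodes. Production rings add virtual nodes for better balance, which this sketch omits for brevity:

```python
import bisect
import hashlib

class HashRing:
    def __init__(self, nodes):
        # Place each node on the ring at the position given by its hash.
        self._ring = sorted((self._hash(node), node) for node in nodes)
        self._hashes = [h for h, _ in self._ring]

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def node_for(self, key):
        # The key belongs to the first node clockwise from its ring position.
        idx = bisect.bisect(self._hashes, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]

ring = HashRing(["node1", "node2", "node3"])
print(ring.node_for("user:42"))  # the same key always lands on the same node
```

The payoff of consistent hashing is that adding or removing a node moves only the keys adjacent to it on the ring, rather than reshuffling every key across all nodes.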
Summary
Caching dramatically improves system performance by serving frequently requested data from fast memory instead of slow storage. The right caching strategy depends on how often data changes, how critical freshness is, and how much memory is available. Mastering cache placement, eviction, and invalidation is essential for designing systems that stay fast under heavy load.
