Caching

A cache stores the result of an expensive operation so future requests can be served from fast memory rather than recomputed or re-fetched. A well-placed cache can reduce database load by orders of magnitude and cut response times from hundreds of milliseconds to single-digit milliseconds.

Cache Layers

Caches exist at every layer of the stack. Understanding which layer to target determines what you can cache and what the trade-offs are.

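Client-side — browser cache, HTTP cache headers (Cache-Control, ETag). No server load for repeated requests that are served from a still-fresh local copy; revalidating with an ETag still costs a lightweight conditional request.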
CDN — edge caches at Points of Presence globally. Reduces latency for static and semi-static content without touching your origin.
Application cache — an in-process cache (in-memory map, Caffeine in Java) or a distributed cache (Redis, Memcached). Stores computed results, database query results, and session data.
Database query cache — some databases internally cache query results. Largely deprecated; MySQL removed its query cache in 8.0 because it was a bottleneck under concurrent writes.
OS / hardware — CPU cache, OS page cache. Transparent to the application; the OS automatically caches recently accessed disk pages in RAM.

In system design discussions, "add a cache" usually means an application-level distributed cache, most often Redis.

Cache Strategies

Cache-aside (lazy loading) — the application checks the cache first. On a miss it fetches from the database, writes to the cache, and returns the result. The most common pattern. The cache is populated only on demand, so it never holds data that is never requested. Risk: cold start on cache restart; stale data until TTL expires.
Read-through — the cache sits in front of the database. On a miss the cache itself fetches from the database and populates itself, so the application talks only to the cache. This simplifies application code because the cache library handles the population logic.
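Write-through — every write goes to the cache and the database synchronously. The cache stays consistent with the database as long as both writes succeed. Trade-off: each write pays for two operations, so write latency rises (roughly doubling when the two are performed sequentially and cost about the same); the cache also fills with data that may never be read.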
Write-back (write-behind) — writes go to the cache immediately and to the database asynchronously. Fastest writes but carries a data loss risk if the cache crashes before flushing. Used for write-heavy workloads where a small window of potential loss is acceptable.
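
The read and write paths described above can be sketched in a few lines. This is a minimal illustration, not a production design: TTLCache is a hypothetical in-memory stand-in for a real cache such as Redis, the database is a plain dict, and the function names and the "user:" key prefix are assumptions made for the example.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """Tiny in-memory stand-in for a cache such as Redis (illustrative only)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._data: Dict[str, Tuple[Any, float]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Expired entries count as misses and are dropped lazily.
            del self._data[key]
            return None
        return value

    def set(self, key: str, value: Any, ttl_seconds: float) -> None:
        self._data[key] = (value, self._clock() + ttl_seconds)


def get_user(user_id: str, cache: TTLCache, db: Dict[str, Any],
             ttl_seconds: float = 60.0) -> Optional[Any]:
    """Cache-aside read: try the cache, fall back to the database on a miss."""
    key = f"user:{user_id}"  # hypothetical key naming scheme
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = db.get(user_id)  # stands in for the expensive query
    if value is not None:
        cache.set(key, value, ttl_seconds)  # populate on demand
    return value


def update_user_write_through(user_id: str, value: Any, cache: TTLCache,
                              db: Dict[str, Any],
                              ttl_seconds: float = 60.0) -> None:
    """Write-through: update the database and the cache in the same call."""
    db[user_id] = value
    cache.set(f"user:{user_id}", value, ttl_seconds)
```

With cache-aside alone, a value changed directly in the database stays stale in the cache until its TTL passes; routing writes through update_user_write_through removes that window at the cost of the extra cache write.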

Eviction Policies

When the cache fills up, something must be evicted to make room. The eviction policy determines what gets removed.

LRU (Least Recently Used) — evicts the item not accessed for the longest time. The most common policy; works well when recent access predicts future access (temporal locality).
LFU (Least Frequently Used) — evicts the item accessed the fewest total times. Better for skewed access patterns where a small set of hot keys are accessed very frequently. Downside: new items start with a low frequency count and are easily evicted before they get a chance to warm up.
FIFO (First In, First Out) — evicts the oldest inserted item regardless of access. Simple but ignores access popularity entirely.
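TTL (Time To Live) — items expire after a fixed duration regardless of access pattern. Simple and predictable; good for data that naturally goes stale (user sessions, API responses). Strictly speaking this is an expiration mechanism rather than a capacity-based eviction policy, so it is usually combined with one of the policies above.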
Random replacement — evicts a random item. Surprisingly competitive with LRU in some workloads, and used in some CPU caches (for example, several ARM cores) because it needs no access-tracking state.
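
As an illustration of the most common policy, here is a minimal LRU sketch built on Python's collections.OrderedDict. The class name and method names are assumptions for the example; real caches (Redis, Caffeine) typically use approximations of LRU rather than an exact ordering like this.

```python
from collections import OrderedDict
from typing import Any, Hashable, List, Optional


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used key."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items: "OrderedDict[Hashable, Any]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[Any]:
        if key not in self._items:
            return None
        # A read counts as a use: move the key to the "most recent" end.
        self._items.move_to_end(key)
        return self._items[key]

    def put(self, key: Hashable, value: Any) -> None:
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            # The first entry is the least recently used one.
            self._items.popitem(last=False)

    def keys(self) -> List[Hashable]:
        """Keys ordered from least to most recently used."""
        return list(self._items)
```

Both get and put run in constant time here, which is the main reason the ordered-dictionary (or hash map plus linked list) layout is the usual textbook choice.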

Cache Invalidation

Phil Karlton famously said that there are only two hard things in computer science: cache invalidation and naming things. The difficulty is that as soon as the underlying data changes, the cached copy is stale, and something has to notice and correct it.

Invalidation strategies

TTL expiry — accept eventual consistency up to the TTL window. No coordination needed. Choose the TTL based on how stale the data can be before it causes a user-visible problem.
Event-driven invalidation — when a write occurs, explicitly delete or update the relevant cache key. Requires coordination between the write path and the cache. More complex but gives near-real-time consistency.
Write-through — no separate invalidation step is needed because every database write also updates the cache. The cached value stays current provided all writes go through this path and both updates succeed; the trade-off is write latency.
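
A minimal sketch of event-driven invalidation, assuming the simplest variant: the write path updates the database and then deletes the cache key, so the next read repopulates it. ProductStore, its dict-backed cache and database, and the "product:" key prefix are all hypothetical names for illustration.

```python
from typing import Any, Dict, Optional


class ProductStore:
    """Cache-aside reads plus delete-on-write invalidation (illustrative)."""

    def __init__(self) -> None:
        self.db: Dict[str, Any] = {}     # stands in for the database
        self.cache: Dict[str, Any] = {}  # stands in for Redis or similar
        self.db_reads = 0                # counts how often reads hit the db

    def read(self, product_id: str) -> Optional[Any]:
        key = f"product:{product_id}"
        if key in self.cache:
            return self.cache[key]
        self.db_reads += 1
        value = self.db.get(product_id)
        if value is not None:
            self.cache[key] = value
        return value

    def write(self, product_id: str, value: Any) -> None:
        self.db[product_id] = value
        # Invalidate after the database write so the next read refetches.
        self.cache.pop(f"product:{product_id}", None)
```

Deleting the key rather than overwriting it keeps the write path simple and avoids caching values nobody reads, at the cost of one extra miss after each write. In a distributed setup the delete would typically be triggered by a message or change event rather than a direct call.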

Cache stampede (thundering herd)

When a popular cache key expires, many requests simultaneously miss and all fire database queries at once — potentially overwhelming the database. Solutions:

Mutex / lock — only one request rebuilds the cache; all others wait. Reduces database pressure but adds latency for waiting requests.
Probabilistic early expiry — begin refreshing the cache slightly before it expires using a random jitter. Prevents a cliff-edge stampede of concurrent requests.
Stale-while-revalidate — serve the stale cached value immediately while asynchronously refreshing it in the background. Zero latency increase; briefly stale data.
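
Two of these mitigations can be sketched briefly. SingleFlightCache shows the mutex approach with one lock per key; it omits TTL handling to stay short. should_refresh_early shows one common way to express probabilistic early expiry, using exponentially distributed jitter scaled by how long a rebuild takes. All names and parameters here are assumptions for illustration, not a specific library's API.

```python
import math
import random
import threading
from typing import Any, Callable, Dict

_MISSING = object()


class SingleFlightCache:
    """Lets one caller per key run the loader while the others wait."""

    def __init__(self, loader: Callable[[str], Any]) -> None:
        self._loader = loader
        self._values: Dict[str, Any] = {}
        self._key_locks: Dict[str, threading.Lock] = {}
        self._registry_lock = threading.Lock()

    def _lock_for(self, key: str) -> threading.Lock:
        # Guard the lock registry so each key gets exactly one lock.
        with self._registry_lock:
            return self._key_locks.setdefault(key, threading.Lock())

    def get(self, key: str) -> Any:
        value = self._values.get(key, _MISSING)
        if value is not _MISSING:
            return value  # fast path: no locking on a hit
        with self._lock_for(key):
            # Re-check: another thread may have filled the key while we waited.
            value = self._values.get(key, _MISSING)
            if value is _MISSING:
                value = self._loader(key)
                self._values[key] = value
            return value


def should_refresh_early(now: float, expires_at: float,
                         recompute_seconds: float, beta: float = 1.0,
                         rng: Callable[[], float] = random.random) -> bool:
    """Return True if this request should rebuild the entry before expiry.

    The closer `now` is to `expires_at`, the more likely a refresh becomes,
    so concurrent requests rarely all decide to rebuild at the same instant.
    """
    u = 1.0 - rng()  # in (0, 1], which avoids log(0)
    jitter = -recompute_seconds * beta * math.log(u)
    return now + jitter >= expires_at
```

A larger beta makes early refreshes more eager; 1.0 is used here only as a neutral default.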

Redis vs Memcached

Feature	Redis	Memcached
Data structures	Strings, hashes, lists, sets, sorted sets, streams	Strings only
Persistence	RDB snapshots + AOF log	None — memory only
Clustering	Built-in Redis Cluster	Client-side sharding
Threading	Single-threaded commands, multi-threaded I/O in v6+	Multi-threaded — better raw throughput on multi-core
Use cases	Cache, session store, pub/sub, leaderboards, rate limiting, job queues	Simple high-throughput key-value caching

Redis is usually the right choice unless you specifically need raw multi-threaded throughput and nothing else. Its data structures and built-in features let it double as a rate limiter (sorted sets), a session store (hashes), a message bus (pub/sub channels, or streams when messages must be persisted), and a job queue (lists with BLPOP), reducing the number of separate infrastructure components your system needs.
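
As one example of the sorted-set use case, here is a hedged sketch of a sliding-window rate limiter. It assumes a redis-py style client (for instance redis.Redis()) and uses only its zremrangebyscore, zcard, zadd, and expire methods. The function name, the "ratelimit:" key prefix, and the default limits are assumptions for illustration. The calls are not atomic as written; a real deployment would likely wrap them in a transaction or a Lua script.

```python
import time
import uuid
from typing import Any, Optional


def allow_request(client: Any, user_id: str, limit: int = 5,
                  window_seconds: float = 60.0,
                  now: Optional[float] = None) -> bool:
    """Return True if the user is under `limit` requests in the window.

    Each accepted request is stored as a sorted-set member whose score
    is its timestamp, so old requests can be trimmed by score range.
    """
    now = time.time() if now is None else now
    key = f"ratelimit:{user_id}"  # hypothetical key naming scheme
    # Drop timestamps that have fallen out of the sliding window.
    client.zremrangebyscore(key, 0, now - window_seconds)
    if client.zcard(key) >= limit:
        return False
    # Members must be unique, so use a random id; the score is the time.
    client.zadd(key, {uuid.uuid4().hex: now})
    # Let idle keys disappear on their own shortly after the window ends.
    client.expire(key, int(window_seconds) + 1)
    return True
```

Passing now explicitly is only a convenience for testing; in normal use the function reads the clock itself.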