[BEE-9001] Caching Fundamentals and Cache Hierarchy
CPU cache, application cache, CDN cache -- understanding the full stack of caching layers and the patterns that connect them.
Why Caching Matters
Every time your application fetches data, it pays a cost: network round-trips, disk I/O, CPU cycles for query execution, and the infrastructure that handles all of it. Caching reduces that cost by keeping a copy of data closer to where it is needed.
The three primary motivations are:
- Latency reduction -- A Redis lookup takes ~0.1 ms. A database query under load can take 10-100 ms. A CDN edge node can serve a response in single-digit milliseconds vs. 100-300 ms from the origin. Every layer you avoid saves real time for real users.
- Load reduction -- A popular product page queried 10,000 times per minute hits your database 10,000 times without a cache. With a cache hit ratio of 99%, that drops to 100 database calls. Your database can handle the remaining load without scaling.
- Cost savings -- Fewer database queries, fewer compute cycles, cheaper bandwidth. At scale, a well-tuned cache can reduce infrastructure spend significantly.
Cache Hierarchy
Modern systems have multiple caching layers. Data moves from slower, larger, cheaper storage toward faster, smaller, more expensive storage as it gets closer to the CPU or the end user.
| Layer | Technology | Typical Latency | Typical Capacity |
|---|---|---|---|
| CPU L1 cache | Hardware | ~1 ns | 32-512 KB |
| CPU L2 cache | Hardware | ~4 ns | 256 KB - 4 MB |
| CPU L3 cache | Hardware | ~10-40 ns | 4-64 MB |
| In-process memory | JVM heap, Node.js V8 heap | ~0.01-0.1 ms | MBs (limited by process) |
| Distributed cache | Redis, Memcached | ~0.1-2 ms | GBs |
| Reverse proxy / CDN PoP | Nginx, Varnish, Cloudflare | ~1-50 ms | GBs-TBs |
| Origin / database | PostgreSQL, MySQL, S3 | ~1-100+ ms | TBs+ |
The closer the cache is to the reader, the faster the response -- and the smaller the cache. The hierarchy forces you to be selective about what lives where.
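To make the traversal concrete, a read often walks this hierarchy explicitly: check an in-process map first, then a distributed cache, then the origin. A minimal Python sketch, assuming a redis-py client and a stand-in `query_database` helper (neither is prescribed by the hierarchy itself):

```python
import redis  # assumed: redis-py (pip install redis)

local_cache: dict[str, str] = {}              # in-process layer (~microseconds)
remote = redis.Redis(decode_responses=True)   # distributed layer (~0.1-2 ms)

def query_database(key: str) -> str:
    return f"row-for-{key}"                   # stand-in for the origin (~1-100+ ms)

def get(key: str) -> str:
    if key in local_cache:                    # innermost, fastest, smallest layer
        return local_cache[key]
    value = remote.get(key)                   # next layer out
    if value is None:
        value = query_database(key)           # authoritative source
        remote.set(key, value, ex=300)        # populate the outer cache with a TTL
    local_cache[key] = value                  # promote into the inner cache
    return value
```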
Caching Patterns
Cache-Aside (Lazy Loading)
The application owns the cache interaction. The cache is populated on demand, only when data is actually requested.
Read path:
```
function get(key):
    value = cache.get(key)
    if value is null:                  // cache miss
        value = database.query(key)
        cache.set(key, value, ttl=300) // populate cache with TTL
    return value
```

Write path:
```
function set(key, newValue):
    database.update(key, newValue) // write to source of truth first
    cache.delete(key)              // invalidate stale cache entry
    // next read will re-populate from DB
```

Characteristics:
- Cache is only populated with data that is actually read. No wasted memory on cold data.
- First request after a miss is slow (cold start). Can be mitigated with cache warming.
- The application code must handle both hit and miss paths.
- If you update the DB without invalidating the cache, you get stale reads until TTL expires.
Use when: read-heavy workloads, data that is read far more than it is written, acceptable tolerance for brief stale reads.
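A concrete version of both paths, sketched in Python with redis-py; the `db.query_product` and `db.update_product` calls are hypothetical placeholders for whatever database client you actually use:

```python
import json
import redis  # assumed: redis-py (pip install redis)

cache = redis.Redis(decode_responses=True)
TTL_SECONDS = 300

def get_product(db, product_id: str) -> dict:
    key = f"product:{product_id}"
    cached = cache.get(key)
    if cached is not None:                              # hit: skip the database
        return json.loads(cached)
    product = db.query_product(product_id)              # miss: read source of truth
    cache.set(key, json.dumps(product), ex=TTL_SECONDS) # populate with TTL
    return product

def update_product(db, product_id: str, fields: dict) -> None:
    db.update_product(product_id, fields)               # source of truth first
    cache.delete(f"product:{product_id}")               # invalidate; next read repopulates
```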
Read-Through
The cache layer sits between the application and the database. On a miss, the cache itself fetches the data from the database, stores it, and returns it. The application always talks to the cache.
```
function get(key):
    // Application calls cache only.
    // Cache handles miss internally: fetches from DB, stores, returns.
    return cache.get(key) // cache manages DB fetch transparently
```

Characteristics:
- Application code is simpler -- no explicit miss-handling logic.
- First miss is still slow; the cache does the heavy lifting, not the app.
- Often provided by caching libraries or frameworks (e.g., Spring Cache, NCache).
Use when: you want to keep caching logic out of business code; the caching layer supports read-through natively.
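Without framework support, read-through usually amounts to hiding the loader behind the cache's own interface so callers never see the miss path. A minimal in-memory Python sketch (the `loader` callable stands in for the DB fetch):

```python
import time
from typing import Callable

class ReadThroughCache:
    """Callers only ever call get(); misses are handled inside the cache."""

    def __init__(self, loader: Callable[[str], str], ttl: float = 300.0):
        self._loader = loader                           # e.g. a DB fetch
        self._ttl = ttl
        self._store: dict[str, tuple[str, float]] = {}  # key -> (value, expiry)

    def get(self, key: str) -> str:
        entry = self._store.get(key)
        if entry is not None and entry[1] > time.monotonic():
            return entry[0]                             # hit, not yet expired
        value = self._loader(key)                       # miss: the cache fetches
        self._store[key] = (value, time.monotonic() + self._ttl)
        return value

# Usage: business code never touches the database directly.
users = ReadThroughCache(loader=lambda key: f"row-for-{key}")
print(users.get("user:42"))  # first call loads; subsequent calls hit the cache
```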
Write-Through
Every write goes to the cache and the database synchronously before the write is acknowledged to the caller.
```
function set(key, newValue):
    cache.set(key, newValue)       // write to cache
    database.update(key, newValue) // write to DB (same transaction or two-phase)
    return success
```

Characteristics:
- Cache is always consistent with the database. Reads after writes never see stale data.
- Write latency increases -- you pay for two writes on every mutation.
- Can populate the cache with data that is never read again (write without subsequent reads).
- Combine with TTL to prevent stale data from cold-written keys.
Use when: read-after-write consistency is critical; write volume is manageable; data written is likely to be read again soon.
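The same idea as a write-side wrapper, sketched in Python with a `db_update` callable standing in for the durable write. This sketch writes the database before the cache so a failed database write never strands an unconfirmed value in the cache; as the pseudocode's comment notes, with transactional or two-phase support the ordering matters less:

```python
import time
from typing import Callable

class WriteThroughCache:
    """set() returns only after both the database and the cache are updated."""

    def __init__(self, db_update: Callable[[str, str], None], ttl: float = 300.0):
        self._db_update = db_update
        self._ttl = ttl
        self._store: dict[str, tuple[str, float]] = {}

    def set(self, key: str, value: str) -> None:
        self._db_update(key, value)  # durable write; exceptions propagate to caller
        self._store[key] = (value, time.monotonic() + self._ttl)

    def get(self, key: str) -> str | None:
        entry = self._store.get(key)
        if entry is None or entry[1] <= time.monotonic():
            return None              # miss or expired; fall back to the source
        return entry[0]
```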
Write-Behind (Write-Back)
Writes go to the cache immediately and are acknowledged. The cache asynchronously flushes changes to the database in the background.
```
function set(key, newValue):
    cache.set(key, newValue)               // write to cache, acknowledge immediately
    queue.enqueue(WriteJob(key, newValue)) // async flush to DB
    return success                         // caller gets ACK before DB is updated

// Background worker:
function flushWorker():
    while true:
        job = queue.dequeue()
        database.update(job.key, job.value)
```

Characteristics:
- Very low write latency for the caller.
- Risk of data loss if the cache crashes before the flush completes.
- Increased complexity: you need a reliable flush pipeline and failure recovery.
- Writes can be batched or coalesced, reducing total DB write load.
Use when: write throughput is the primary constraint; eventual persistence is acceptable; you have infrastructure to guarantee flush reliability (persistent queue, WAL-backed cache).
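A minimal in-process Python sketch using a worker thread and a queue. Note that an in-memory queue loses pending writes on a crash, which is exactly the risk this pattern carries; a production pipeline would use a persistent queue, as noted above:

```python
import queue
import threading
from typing import Callable

class WriteBehindCache:
    """set() acknowledges from memory; a worker thread flushes to the DB later."""

    def __init__(self, db_update: Callable[[str, str], None]):
        self._db_update = db_update
        self._store: dict[str, str] = {}
        self._pending: "queue.Queue[tuple[str, str]]" = queue.Queue()
        threading.Thread(target=self._flush_worker, daemon=True).start()

    def set(self, key: str, value: str) -> None:
        self._store[key] = value         # caller is acknowledged here
        self._pending.put((key, value))  # durable write happens later

    def _flush_worker(self) -> None:
        while True:
            key, value = self._pending.get()
            self._db_update(key, value)  # anything still queued is lost on a crash
            self._pending.task_done()
```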
Pattern Comparison
| Pattern | Read Consistency | Write Latency | Write Complexity | Risk |
|---|---|---|---|---|
| Cache-Aside | Eventual (TTL) | Low | Moderate | Stale on write |
| Read-Through | Eventual (TTL) | Low | Low | Cold start |
| Write-Through | Strong | High | Moderate | Cache bloat |
| Write-Behind | Eventual | Very Low | High | Data loss on crash |
Cache Warming
After a deployment, cache flush, or cold start, all cache entries are missing. Every request is a miss, and the origin absorbs the full load. This is the cold cache problem.
Cache warming is the process of pre-populating the cache before production traffic hits it:
- Proactive warming: Before deploying, run a script that fetches and caches high-traffic keys (e.g., top 1000 product pages, homepage data).
- Request replay: Replay recent production traffic against the new cache to simulate real usage patterns.
- Gradual rollout: Route a small percentage of traffic to the new cache first, let it warm up, then shift the rest.
Without cache warming, your origin can experience a thundering herd -- a spike of simultaneous misses overwhelming the database immediately after a deploy.
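A proactive warming script can be as simple as iterating the hottest keys and populating the cache before cutover. A Python sketch, where `db.top_product_ids` and `db.query_product` are hypothetical helpers for your analytics and database layers:

```python
import json
import redis  # assumed: redis-py (pip install redis)

cache = redis.Redis(decode_responses=True)

def warm_cache(db, top_n: int = 1000) -> None:
    """Pre-populate the hottest keys so the first real requests are hits."""
    for product_id in db.top_product_ids(top_n):  # hypothetical: hottest keys
        product = db.query_product(product_id)    # hypothetical: DB fetch
        cache.set(f"product:{product_id}", json.dumps(product), ex=300)
```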
Cache Hit Ratio
Cache hit ratio is the primary health metric for any cache:
```
hit ratio = cache hits / (cache hits + cache misses) * 100%
```

A ratio of 95% means 95 out of 100 requests are served from cache. The remaining 5% go to the origin.
Guidelines:
- Above 95% -- Healthy for read-heavy static content.
- 80-95% -- Acceptable for mixed workloads with moderate write rates.
- Below 80% -- Investigate: TTLs may be too short, the key space may be too large, or the data access pattern is not cacheable.
Always measure hit ratio before and after changing caching configuration. A change that feels like an optimization (e.g., adding more cache capacity) can sometimes lower hit ratio by diluting hot data across more nodes.
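Instrumenting hit ratio takes little more than two counters wrapped around every lookup. A minimal in-memory sketch:

```python
class InstrumentedCache:
    """An in-memory cache that counts hits and misses on every lookup."""

    def __init__(self):
        self._store: dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> str | None:
        value = self._store.get(key)
        if value is None:
            self.misses += 1
        else:
            self.hits += 1
        return value

    def set(self, key: str, value: str) -> None:
        self._store[key] = value

    @property
    def hit_ratio(self) -> float:
        total = self.hits + self.misses
        return 100.0 * self.hits / total if total else 0.0
```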
When NOT to Cache
Caching is a trade-off. There are cases where the cost outweighs the benefit:
- Frequently changing data -- If data changes faster than your TTL, you will serve stale results. A stock ticker updating every second with a 60-second TTL is actively misleading.
- Low read-to-write ratio -- If every row is written once and read once, caching adds overhead with no benefit. Session tokens with single-use semantics are an example.
- Highly personalized data -- Data unique to each user (e.g., a user's shopping cart with applied discounts) may not benefit from shared caching infrastructure. Cache key design becomes complex and memory wasteful.
- Security-sensitive data -- Cached data can be read by any process with cache access. Avoid caching authorization tokens, raw credentials, or PII unless the cache is properly access-controlled and encrypted.
- Cache as primary store -- A cache is not a database. If your cache is cleared (planned eviction, OOM, crash), all data must be recoverable from the source of truth. Never cache data that has no durable backing store.
Common Mistakes
- Caching without measuring hit ratio. You cannot know if your cache is helping without instrumenting it. Add cache hit/miss counters before deploying any caching layer.
- No TTL (stale data forever). Without a TTL, cached entries live until eviction. If the source data changes, the cache will serve stale results indefinitely. Always set a TTL appropriate to the data's change frequency.
- Caching everything. Memory is finite. Caching rarely-accessed data wastes capacity that could hold hot data, reducing overall hit ratio. Profile your access patterns before deciding what to cache.
- Ignoring cache consistency with the source of truth. If you update the database without invalidating or updating the cache, reads will return stale data. Design your write path to manage cache state explicitly (see the write patterns above and BEE-9002).
- Treating the cache as the only data store. Caches are ephemeral by default. Redis can be restarted, Memcached has no persistence, CDN caches are purged on deploy. Every cached value must be reconstructible from a durable source.
Related BEPs
- BEE-9002 -- Cache Invalidation Strategies: how to keep the cache consistent with the source of truth.
- BEE-9003 -- Cache Eviction Policies: LRU, LFU, TTL, and how eviction affects hit ratio.
- BEE-9004 -- Distributed Caching: consistent hashing, sharding, and replication in multi-node caches.
- BEE-9006 -- HTTP Caching: Cache-Control headers, ETags, conditional requests, and CDN semantics.
References
- Caching Strategies and How to Choose the Right One -- CodeAhoy
- Caching Patterns -- Database Caching Strategies Using Redis (AWS)
- ElastiCache Best Practices and Caching Strategies (AWS)
- Cache-Aside Pattern -- Azure Architecture Center (Microsoft)
- What is a Cache Hit Ratio? -- Cloudflare
- What Is Cache Warming? -- IOriver