Caching Strategies That Actually Work in Production


Every performance optimization guide recommends caching. Add Redis. Cache database queries. Cache API responses. Cache everything. The advice isn’t wrong, but it’s incomplete.

Caching introduces complexity. When it works, it’s invisible. When it breaks, it creates confusing bugs that are hard to debug. After years of implementing caching systems and then debugging them at 2 AM, I’ve developed some opinions about what actually works in production.

The Fundamental Tradeoff

Caching trades consistency for speed. Instead of fetching fresh data every time, you serve stored data that might be stale. This tradeoff is fine when staleness doesn’t matter. It’s a problem when it does.

The challenge is that staleness tolerance varies by use case, sometimes in the same application. User profile data? Cache aggressively, nobody cares if the bio updates in thirty seconds instead of instantly. Account balance? Cache cautiously or not at all, people get upset when numbers are wrong.

Most caching bugs I’ve debugged came from assumptions about staleness tolerance that turned out to be wrong. Someone added caching to speed up a slow query. It worked great. Then a new feature got added that assumed fresh data, and suddenly there’s a bug that only appears sometimes.

The lesson: before adding caching, document what staleness tolerance exists and who depends on fresh data. This sounds obvious, but it’s easy to skip when you’re just trying to make a slow page faster.

Cache Invalidation Is The Hard Part

Phil Karlton’s quote is famous: “There are only two hard things in Computer Science: cache invalidation and naming things.” It’s famous because it’s true.

The problem with cache invalidation is that knowing when to invalidate requires understanding what could change the cached data. That’s easy for simple cases—when you update a user record, invalidate that user’s cache. It gets complicated fast.

Consider a cached homepage that shows “recently published articles.” When a new article publishes, you need to invalidate the homepage cache. When an article gets unpublished, you need to invalidate it again. When an article’s title gets edited, you may need to invalidate it, depending on what the homepage shows. When an article’s author changes their name… does that invalidate the homepage?

This dependency tracking becomes a maintenance burden. Every code change requires thinking “what caches does this affect?” And people forget, so you end up with caches that don’t invalidate when they should.

Strategies That Work

Time-based expiration is simple and robust. Cache for five minutes, then let it expire. The worst-case staleness is five minutes. You don’t need to track dependencies or implement invalidation logic. The downside is you’ll occasionally serve stale data and you’ll regenerate caches regularly even when nothing changed.

For read-heavy, write-rarely data, this works great. Blog homepages, product catalogs, reference data—cache with a reasonable TTL and call it done.
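A minimal in-process TTL cache is only a few lines. This is a sketch in plain Python rather than Redis, and the key name is just illustrative:

```python
import time

class TTLCache:
    """Minimal in-process TTL cache: entries expire after ttl_seconds."""
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.store = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self.store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self.store[key]  # expired: drop it and report a miss
            return None
        return value

    def set(self, key, value):
        self.store[key] = (value, time.monotonic() + self.ttl)

# Cache the homepage article list for five minutes.
cache = TTLCache(ttl_seconds=300)
cache.set("homepage:articles", ["Article A", "Article B"])
print(cache.get("homepage:articles"))  # → ['Article A', 'Article B']
```

Note the simplicity: no dependency tracking, no invalidation hooks, and worst-case staleness is bounded by the TTL you chose.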

Write-through caching updates the cache whenever you update the database. When someone updates their profile, you update the database and update the cache at the same time. This keeps the cache fresh without invalidation logic.

The catch is it only works if all writes go through the same code path. If any code directly updates the database, or if writes can happen in multiple services, write-through caching becomes unreliable.
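The single-code-path requirement is easiest to see in a sketch. Here `db` and `cache` are stand-in dicts for the real database and cache, and `save_profile` is a hypothetical name, not a real API:

```python
db = {}
cache = {}

def save_profile(user_id, profile):
    """Write-through: every write updates both stores in one code path."""
    db[user_id] = profile      # 1. write the source of truth
    cache[user_id] = profile   # 2. refresh the cache in the same path

def get_profile(user_id):
    # Reads can trust the cache because every write refreshed it.
    return cache.get(user_id, db.get(user_id))

save_profile(42, {"name": "Ada"})
print(get_profile(42))  # → {'name': 'Ada'}
```

The guarantee holds only as long as nothing writes to `db` without going through `save_profile`, which is exactly the fragility described above.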

Cache-aside (lazy loading) is the pattern where you check the cache first, and if the data isn’t there, you fetch from the database and store it in the cache. This is simple to implement and means you only cache data that’s actually being requested.

The problem is the “thundering herd”—if the cache expires during high traffic, suddenly hundreds of requests all hit the database simultaneously trying to regenerate the same cached data. This can cause the exact outage you were trying to prevent with caching.

Probabilistic early expiration solves the thundering herd by starting cache regeneration slightly before actual expiration, with the probability increasing as expiration approaches. Team400’s engineers often implement this pattern for high-traffic endpoints.
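One well-known formulation of this idea (sometimes called “XFetch”) refreshes when the current time, shifted earlier by an exponentially distributed amount scaled by the regeneration cost, passes the expiry. A sketch, where `recompute_cost` is how long regeneration takes in seconds and `beta` tunes eagerness:

```python
import math
import random
import time

def should_refresh_early(expires_at, recompute_cost, beta=1.0):
    """Probabilistic early expiration: the closer we are to expiry,
    the more likely one request is to refresh the entry before it
    actually expires, so the herd never piles up on a cold key.

    beta > 1 refreshes more eagerly, beta < 1 less eagerly.
    """
    u = 1.0 - random.random()  # uniform in (0, 1], so log() is safe
    # -log(u) is an exponential draw; scaled by the recompute cost,
    # it shifts this request's effective "now" forward toward expiry.
    return time.monotonic() - recompute_cost * beta * math.log(u) >= expires_at
```

Far from expiry this almost never fires; at or past expiry it always fires, and in between, one request at a time tends to win the refresh instead of hundreds at once.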

When Not To Cache

Don’t cache data that changes frequently relative to its cache lifetime. If your cache TTL is one minute but data updates every ten seconds, you’re adding complexity without much benefit.

Don’t cache data where staleness causes user-visible bugs. Financial transactions, inventory counts, anything where seeing old data causes problems—either don’t cache it or cache very conservatively.

Don’t cache to fix a slow database query without understanding why it’s slow. Sometimes the right fix is query optimization or adding an index, not throwing caching at it. Caching obscures the underlying problem and adds technical debt.

Don’t cache across authentication boundaries. Never serve User A’s cached data to User B. This sounds obvious, but cache key generation bugs that leak data across users are common and severe.
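One defensive habit is to build cache keys through a single helper that refuses to produce a key without a user ID, so the classic bug becomes an immediate error rather than a silent data leak. A sketch with a hypothetical helper name and key scheme:

```python
def user_cache_key(user_id, resource, *parts):
    """Build a cache key that is always scoped to one user.
    Omitting the user ID from the key is the classic bug that
    serves User A's cached data to User B.
    """
    if user_id is None:
        raise ValueError("refusing to build a user cache key without a user ID")
    return ":".join(["user", str(user_id), resource, *map(str, parts)])

print(user_cache_key(7, "dashboard", "2024-01"))  # → user:7:dashboard:2024-01
```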

Caching Layers

In practice, you usually have multiple cache layers: CDN, application cache, database query cache, maybe others. Each layer has different characteristics and appropriate use cases.

CDN caching is great for static assets and public pages. It’s geographically distributed, reduces load on your servers, and is fast. But it’s hard to invalidate—you’re at the mercy of TTLs or manual purging.

Application caching (Redis, Memcached) gives you control and fast local access. You can implement complex invalidation logic and shared caching across application servers. The tradeoff is operational complexity and another service to maintain.

Database query caches are often automatic but limited in scope. They help with repeated identical queries but don’t cache computed results or aggregations.

I tend to start with TTL-based application caching for anything that needs to be cached, using conservative TTLs. Then I optimize specific cases based on monitoring. Most applications don’t need elaborate multi-layer caching with complex invalidation. They need a few specific slow operations cached sensibly.

Monitoring and Debugging

If you’re caching, you need cache metrics: hit rate, miss rate, eviction rate, and key count. A low hit rate means you’re caching data that doesn’t get reused. A high eviction rate means your cache is too small or TTLs are too long.
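The bookkeeping for hit rate is trivial to wire into whatever metrics system you use; a minimal stand-in counter looks like this:

```python
class CacheStats:
    """Track hit/miss counts so hit rate is visible in monitoring."""
    def __init__(self):
        self.hits = 0
        self.misses = 0

    def record(self, hit):
        if hit:
            self.hits += 1
        else:
            self.misses += 1

    @property
    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0

stats = CacheStats()
stats.record(True)
stats.record(True)
stats.record(False)
print(round(stats.hit_rate, 2))  # → 0.67
```

In a real system you'd emit these as counters to your metrics backend rather than holding them in process, but the point is the same: if you can't answer "what's the hit rate?", you can't tell whether the cache is earning its complexity.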

Include cache status in logs and debug tools. When debugging weird bugs, “is this cached?” should be easy to answer. I add cache indicators to server responses during development—HTTP headers or debug output showing whether data came from cache.

Make it easy to bypass caches in development and testing. A query parameter or header that forces fresh data saves time when debugging. Just make sure it doesn’t work in production.
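The production guard matters as much as the escape hatch itself. A sketch, where the header name, the `APP_ENV` variable, and its values are all assumptions, not a real framework's API:

```python
import os

def should_bypass_cache(headers):
    """Honor an X-No-Cache header, but only outside production.
    `headers` is any dict-like mapping of request headers.
    """
    if os.environ.get("APP_ENV") == "production":
        return False  # the escape hatch must not work in production
    return headers.get("X-No-Cache") == "1"

os.environ["APP_ENV"] = "development"
print(should_bypass_cache({"X-No-Cache": "1"}))  # → True
os.environ["APP_ENV"] = "production"
print(should_bypass_cache({"X-No-Cache": "1"}))  # → False
```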

What I Actually Do

For most projects, I start with no caching and optimize based on monitoring. If specific endpoints are slow, I measure where the time goes. If it’s database queries, I optimize the queries first. If that’s not enough, I add caching with simple TTL-based expiration.

I cache at the highest level possible—cache complete API responses rather than individual database queries when I can. This reduces cache complexity and gives the best performance improvement.

I avoid clever invalidation logic unless there’s a clear requirement for it. TTL-based expiration is boring and works. Manual cache invalidation is exciting and breaks in subtle ways.

When I do need invalidation, I prefer explicit cache clearing over automatic dependency tracking. When you update an article, explicitly clear relevant caches in the same transaction. It’s more code but easier to understand and debug.
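Explicit clearing looks like this in sketch form. The `db` and `cache` dicts and the key names are illustrative; in a real system the write and the cache clears would share a transaction or unit of work:

```python
db = {}
cache = {"homepage:articles": ["old list"], "article:1": {"title": "Old"}}

def update_article(article_id, fields):
    db[article_id] = fields
    # Clear every cache this write can affect, listed explicitly,
    # so the dependency is visible right next to the write.
    for key in (f"article:{article_id}", "homepage:articles"):
        cache.pop(key, None)

update_article(1, {"title": "New"})
print("homepage:articles" in cache)  # → False
```

The explicit list is the point: anyone reading `update_article` can see exactly which caches it touches, with no hidden dependency graph to reason about.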

And I document staleness tolerance. For every cached thing, there’s a comment explaining how stale the data can be and why. This helps future developers (including future me) make good decisions.

Caching is a useful tool. But like all tools, it solves some problems while creating others. Understanding the tradeoffs and keeping implementations simple makes the difference between caching that improves your application and caching that becomes a maintenance burden.

—Murtaza Khan