A mid-size online grocery retailer — call it FreshCart — has a problem that only shows up on Saturday mornings. That is when its 600,000 weekly shoppers all open the app at once to build their weekend baskets, and the product-catalog pages start taking four, five, six seconds to load. The engineering team’s first instinct is the wrong one: add more application servers and a bigger database. They do, the bill climbs, and the pages are still slow, because the bottleneck was never raw capacity. It was that the same product page, with the same price and the same “in stock” badge, was being rebuilt from scratch — re-querying the database, re-rendering the template — tens of thousands of times a minute. Every one of those rebuilds was identical. That is the entire case for caching in one sentence: stop recomputing answers that have not changed.
This article is a foundational walk through where you cache in a web app and when each layer earns its place. We will follow a request from a shopper’s phone all the way to the database and back, and at every hop ask the same two questions: can we keep a copy of this answer closer to the user, and how do we make sure that copy is not lying to them. Caching is the highest-leverage performance work most teams will ever do — and also the easiest to get subtly wrong, because a stale cache does not crash, it just quietly shows the wrong price.
Why caching, and why it is not just “make it faster”
For FreshCart the pressures are the ones every consumer app feels. Latency costs sales: every extra second on a product page measurably loses baskets, and a slow app on a Saturday is revenue walking out the door. Scale is spiky: traffic is 10× higher on weekend mornings than on a Tuesday at 3pm, and provisioning the database for the peak means paying for idle capacity all week. Cost is the database itself: managed relational databases are the most expensive tier to scale, and read traffic is the easiest thing to take off them. Correctness is the catch: a price or stock count that is wrong because it is stale is worse than a slow page, because it erodes trust and, for a promoted item, can mean selling at a loss.
Caching addresses all four at once when it is layered correctly. A cache is just a store of precomputed answers kept somewhere faster or closer than the original source, with a rule for how long the copy stays valid. The art is choosing which layer holds which answer, and that is what the rest of this article is about.
Architecture overview
A request from a FreshCart shopper passes through several places where an answer could already be waiting. Think of caching as a set of nested layers, each catching the requests the one in front of it missed:
- The browser holds static assets and some API responses locally, so a returning shopper re-uses files already on their device.
- The CDN edge — here Akamai — sits in hundreds of cities worldwide and serves cached images, scripts, stylesheets, and cacheable pages from a server physically near the shopper, so the request never crosses the ocean to the origin.
- The application cache — Redis (managed as Amazon ElastiCache on AWS, Memorystore on GCP, or Azure Cache for Redis) — holds expensive computed results and database query outputs in memory, shared across all the app servers.
- The database’s own cache — the buffer pool — keeps hot rows and indexes in RAM so even a query that reaches the database often avoids touching disk.
- The database itself is the source of truth, the slowest and most expensive layer, and the one every cache exists to protect.
The design principle that ties them together: serve every request from the furthest-out layer that can correctly answer it. A product image should come from Akamai’s edge, never the origin. A rendered “you may also like” list should come from Redis, not a fresh database query. A real-time stock count for the item in your cart might have to reach the database — but the page around it does not. Caching is the discipline of pushing each answer as far out as its freshness requirements allow.
The control flow on a cache hit is short and cheap: the shopper’s request meets a layer that already has the answer and returns immediately. The flow on a miss is the expensive path we are trying to make rare: the request falls through every layer to the database, the answer is computed, and — critically — it is written back into the caches on the way out so the next identical request is a hit. A cache that does not populate itself on a miss is not a cache.
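The hit and miss paths can be sketched in a few lines. This is a minimal illustration rather than FreshCart's actual code: the layers are plain dictionaries, and `layered_get` and `load_from_database` are hypothetical names standing in for the real edge, Redis, and database calls.

```python
from typing import Callable, Dict, List, Tuple


def layered_get(
    key: str,
    layers: List[Tuple[str, Dict[str, str]]],
    load_from_source: Callable[[str], str],
) -> Tuple[str, str]:
    """Return (value, served_by), checking the outermost layer first."""
    missed = []
    for name, store in layers:
        if key in store:
            value, served_by = store[key], name
            break
        missed.append(store)
    else:
        # Every layer missed: take the expensive path to the source of truth.
        value, served_by = load_from_source(key), "database"
    # Write the answer back into each layer that missed, so the next
    # identical request is a hit further out.
    for store in missed:
        store[key] = value
    return value, served_by


# Hypothetical layers, ordered from closest to the shopper to furthest away.
edge: Dict[str, str] = {}
app_cache: Dict[str, str] = {}
layers = [("edge", edge), ("app_cache", app_cache)]


def load_from_database(key: str) -> str:
    return f"rendered page for {key}"


first = layered_get("product:42", layers, load_from_database)
second = layered_get("product:42", layers, load_from_database)
```

The first call falls through to the source and populates both layers on the way out; the second is answered by the outermost layer without touching the loader.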
Layer 1: the CDN edge
The CDN is the first and cheapest win, and for a catalog-heavy app like FreshCart it is enormous. The vast majority of bytes on a product page are static: product photos, the JavaScript bundle, the CSS, fonts, icons. None of that changes between shoppers. Akamai (or AWS CloudFront) caches those assets at edge locations close to users and serves them without ever contacting FreshCart’s origin servers. A shopper in Mumbai hits an Akamai edge in Mumbai; the photos load in milliseconds and FreshCart’s servers never see the request.
What makes an edge cache work — or quietly break — is cache-control headers. The origin tells the CDN how long each asset may be cached. Static assets that never change get long lifetimes; anything personalized or fast-moving gets marked off-limits.
    # A product image: safe to cache hard, for a long time, everywhere
    Cache-Control: public, max-age=31536000, immutable

    # A logged-in shopper's basket: never cache this at a shared edge
    Cache-Control: private, no-store
The single most important distinction at the edge is public versus private. A shared CDN edge serving one shopper’s basket or saved address to the next shopper is a serious data-leak bug, not a performance win. Personalized and authenticated responses must be marked private (or no-store) so the CDN never holds a shared copy. This is exactly where identity matters: requests that carry a session — authenticated through FreshCart’s IdP, Microsoft Entra ID for staff-facing admin tools and a customer identity provider for shoppers — must bypass shared caching, while the anonymous catalog browse that makes up most traffic is fully cacheable.
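One way to keep the public/private rule from depending on each developer's memory is to centralize it in a single function. The sketch below is illustrative only: the `/static/` path prefix, the 60-second lifetime for anonymous pages, and the function name `cache_control_for` are assumptions, not FreshCart's real policy.

```python
def cache_control_for(path: str, has_session: bool) -> str:
    """Pick a Cache-Control value for a response (illustrative policy only)."""
    if path.startswith("/static/"):
        # Versioned static asset: identical for every shopper, cache for a year.
        return "public, max-age=31536000, immutable"
    if has_session:
        # Personalized response: no shared cache may store it.
        return "private, no-store"
    # Anonymous catalog page: cacheable at the edge, but only briefly
    # (the 60-second value is an assumed placeholder).
    return "public, max-age=60"


image_header = cache_control_for("/static/img/sourdough.jpg", has_session=True)
basket_header = cache_control_for("/basket", has_session=True)
browse_header = cache_control_for("/category/bakery", has_session=False)
```

Putting the decision in one place means a new endpoint is private by default whenever it carries a session, instead of being public until someone notices.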
The classic edge technique for content that changes occasionally — a price update, a new “out of stock” badge — is the cache busting / purge pair. Versioned asset URLs (app.4f9c2a.js) let you cache forever and simply ship a new URL when the file changes; the old one is never requested again. For pages, Akamai exposes a purge API so that when a product’s price changes, FreshCart’s catalog service tells the CDN to drop the cached copy of that one product page immediately, rather than waiting for a TTL to expire.
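Versioned asset URLs are usually produced by a build step that derives the name from the file's content. A minimal sketch of that idea, assuming a SHA-256 content hash truncated to six hex characters (the function name and the digest length are illustrative choices, not a description of any particular bundler):

```python
import hashlib
from pathlib import PurePosixPath


def versioned_name(filename: str, content: bytes, digest_chars: int = 6) -> str:
    """Insert a short content hash before the extension: app.js -> app.<hash>.js"""
    digest = hashlib.sha256(content).hexdigest()[:digest_chars]
    path = PurePosixPath(filename)
    return f"{path.stem}.{digest}{path.suffix}"


old_name = versioned_name("app.js", b"console.log('v1');")
new_name = versioned_name("app.js", b"console.log('v2');")
```

Because the name changes whenever the bytes change, the old URL can stay cached indefinitely; pages simply start referencing the new one.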
Layer 2: the application cache (Redis)
Past the edge, the requests that remain are the dynamic ones — a category page assembled from inventory, pricing, and personalization. Rebuilding that from the database on every request is what melted FreshCart on Saturday. The fix is an in-memory cache shared by all the app servers: Redis.
Redis sits beside the application and holds the results of expensive work keyed by what produced them. The “you may also like” recommendations for category bakery, the rendered HTML fragment for a top-50 product, the result of a slow aggregation query — each is computed once, stored in Redis with an expiry, and served from memory (sub-millisecond) to every subsequent request until it expires. Because it is shared, app server #7 benefits from the work app server #2 already did. Running it as a managed service — ElastiCache, Memorystore, or Azure Cache for Redis — means failover, patching, and backups are the cloud provider’s problem, not a 2am page for FreshCart’s on-call.
The decision that defines an application cache is how writes flow through it, and there are two foundational patterns every junior engineer should be able to draw.
Cache-aside (lazy loading)
The default, and the right starting point for most data. The application owns the logic: check the cache; on a miss, read the database, then populate the cache.
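    import json

    def get_product(product_id):
        # `redis` and `db` are the application's existing client objects.
        # Assumes query_product returns a JSON-serializable dict.
        key = f"product:{product_id}"
        cached = redis.get(key)
        if cached is not None:
            return json.loads(cached)                 # hit: fast path
        product = db.query_product(product_id)        # miss: go to the source of truth
        redis.set(key, json.dumps(product), ex=300)   # populate for next time, 5-minute TTL
        return product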
Cache-aside is simple, resilient (if Redis is down, the app still works, just slower, hitting the database directly, as long as cache errors are caught and treated as misses), and only ever caches data that is actually requested. Its weakness is that the first request after expiry is always a miss, and under heavy concurrent load many requests can miss the same key at once and stampede the database. We address that below.
Write-through
Here writes go through the cache: the application updates the cache and the database together, so the cache is always populated and never stale relative to the last write.
    import json

    def update_price(product_id, price):
        db.update_price(product_id, price)        # write to the source of truth first
        key = f"product:{product_id}"
        product = db.query_product(product_id)    # re-read the committed row
        redis.set(key, json.dumps(product))       # update the cache in lockstep
Write-through keeps reads fast and the cache fresh, at the cost of making every write a little slower and more complex. It shines for data that is read far more than written and where staleness is unacceptable — a product’s price is the canonical example. The trade is concrete enough to tabulate.
| | Cache-aside (lazy) | Write-through |
|---|---|---|
| Cache populated on | Read miss | Write |
| Read latency | Fast after first miss | Always fast |
| Write latency | Normal (no cache work) | Slower (cache + DB) |
| Staleness risk | Until TTL expires | Minimal |
| If cache is down | App still serves (slower) | Writes need a fallback path |
| Best for | Most read data; tolerates short staleness | Hot, read-heavy data that must stay fresh |
| FreshCart example | Recommendation lists, category pages | Live prices, promo flags |
A common, pragmatic combination is cache-aside for reads plus explicit invalidation on writes: read lazily, and when a price changes, update the database and then delete the cache key so the next read repopulates it from the database. That is often simpler and safer than full write-through, because a delete never writes a value into the cache that the database does not hold; the usual cost is one extra miss. A read that races the write can still repopulate the old value, so keep a TTL on the key as a backstop.
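The read-lazily, delete-on-write combination looks like this in miniature. The sketch uses two dictionaries as stand-ins for the database and Redis, and the names `read_price` and `write_price` are hypothetical; it shows the ordering of the two steps, not a production client.

```python
from typing import Dict

database: Dict[str, float] = {"product:42": 3.49}  # stand-in for the source of truth
cache: Dict[str, float] = {}                       # stand-in for Redis


def read_price(key: str) -> float:
    """Cache-aside read: serve from cache, or load and populate on a miss."""
    if key in cache:
        return cache[key]
    price = database[key]
    cache[key] = price
    return price


def write_price(key: str, price: float) -> None:
    """Invalidate on write: update the source, then drop the cached copy."""
    database[key] = price   # 1. update the source of truth first
    cache.pop(key, None)    # 2. delete the key; the next read repopulates it


before = read_price("product:42")      # miss, then cached
write_price("product:42", 2.99)        # cached copy is removed
after = read_price("product:42")       # miss again, reads the new price
```

Deleting rather than rewriting the key keeps the write path free of any logic about what the cached representation should look like; that stays in the read path alone.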
Layer 3: the database’s own cache
Even requests that reach the database should rarely touch disk. Relational engines keep a buffer pool — hot rows and index pages held in RAM — so frequently accessed data is served from memory inside the database itself. This layer is mostly automatic, but it is real and worth understanding: it is why the first query for a row is slow (disk) and the next is fast (buffered), and why giving the database enough memory to hold the working set is one of the cheapest performance levers available before you add any external cache at all. FreshCart’s job here is mostly to protect this layer — every request Redis and Akamai absorb is one the database’s buffer pool does not have to fight for.
A close cousin is the read replica: a copy of the database that serves read queries so the primary handles writes. It is not strictly a cache, but it serves the same goal — taking read load off the expensive primary — and pairs naturally with the layers above for the reads that genuinely need fresh, queryable data.
TTL and invalidation: the only hard part
There is an old joke that there are only two hard problems in computer science, and cache invalidation is one of them. For a junior engineer the practical version is this: a cache trades freshness for speed, and TTL plus invalidation is how you control that trade.
TTL (time-to-live) is the blunt, reliable tool: every cached entry carries an expiry, after which it is discarded and recomputed. Choosing the TTL is a business decision dressed as a technical one. A product image: effectively forever. A recommendation list: minutes — slightly stale suggestions hurt no one. A stock count for a fast-selling item: seconds, or do not cache the live number at all. The rule of thumb: set TTL to the longest staleness the business can tolerate for that specific piece of data, and no longer. Longer TTL means higher hit rates and lower cost, but more stale-data risk — the dial is yours to set per data type, and it should be a deliberate choice, not a copied default.
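One way to make the TTL a deliberate, reviewable choice is to keep a single policy table and refuse to cache anything that has no entry in it. The values below are illustrative assumptions drawn from the examples above, not recommendations, and `ttl_for` is a hypothetical helper.

```python
# Hypothetical per-data-type TTLs, in seconds. Each value is a business
# decision: the longest staleness tolerable for that kind of data.
TTL_SECONDS = {
    "product_image": 365 * 24 * 3600,  # effectively forever
    "recommendations": 10 * 60,        # minutes: slightly stale is harmless
    "stock_count": 5,                  # seconds: fast-selling items
}


def ttl_for(data_type: str) -> int:
    """Return the agreed TTL, failing loudly if nobody has chosen one."""
    try:
        return TTL_SECONDS[data_type]
    except KeyError:
        raise ValueError(f"no TTL policy defined for {data_type!r}") from None


recommendation_ttl = ttl_for("recommendations")
```

Raising on an unknown data type, instead of falling back to a default, forces the conversation about acceptable staleness to happen before the data is cached.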
Active invalidation is the precise tool for when waiting for a TTL is not good enough — a price change must be visible now, not in five minutes. The pattern: when the source data changes, explicitly evict the affected cache entries. FreshCart’s catalog service, on a price update, deletes the Redis key for that product and calls Akamai’s purge API for that product page, so both layers refetch fresh. The danger to internalize early is partial invalidation — clearing Redis but forgetting the CDN, so the edge keeps serving the old price for an hour. Every cached representation of a piece of data must be invalidated together, or you have simply moved the staleness somewhere harder to see.
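A simple guard against partial invalidation is to register every cached representation in one place and evict them together. The sketch below assumes each layer can be wrapped in a callable that takes a product ID; the class name and the dictionary stand-ins are hypothetical, and a real implementation would call the Redis client and the CDN's purge API where the lambdas are.

```python
from typing import Callable, Dict, List

Invalidator = Callable[[str], None]


class InvalidationFanout:
    """Evict one product from every registered cache layer in a single call."""

    def __init__(self) -> None:
        self._invalidators: Dict[str, Invalidator] = {}

    def register(self, layer_name: str, invalidate: Invalidator) -> None:
        self._invalidators[layer_name] = invalidate

    def invalidate(self, product_id: str) -> List[str]:
        """Try every layer; return the names of layers that failed."""
        failed = []
        for name, invalidate in self._invalidators.items():
            try:
                invalidate(product_id)
            except Exception:
                failed.append(name)  # keep going so one failure hides nothing
        return failed


# Dictionary stand-ins for the Redis keyspace and the CDN's cached pages.
redis_keys = {"product:42": "cached product JSON"}
edge_pages = {"/products/42": "<html>old price</html>"}

fanout = InvalidationFanout()
fanout.register("redis", lambda pid: redis_keys.pop(f"product:{pid}", None))
fanout.register("cdn", lambda pid: edge_pages.pop(f"/products/{pid}", None))

failed_layers = fanout.invalidate("42")
```

Returning the list of failed layers gives the caller something concrete to retry or alert on, so a failed purge does not silently leave a stale copy at the edge.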
Two more failure patterns belong in any foundational treatment because they cause real outages:
- Cache stampede (thundering herd). A popular key expires and thousands of concurrent requests all miss and hit the database at once, which can take the database down — the cache that was protecting it briefly turns into a load amplifier. Mitigations: add a small random jitter to TTLs so keys do not all expire together, and use a short lock so only one request recomputes a missing key while the others wait for it.
- Uncached negative results (the empty-result trap). If a lookup returns “no such product” and you do not cache that fact, every request for a bad ID hits the database, every time, which makes it a cheap denial-of-service vector. Cache negative results too, with a short TTL.
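Both mitigations, plus negative caching, fit in one small class. This is a sketch under stated assumptions: it is an in-process cache using `threading` locks, so it protects a single app server only (a shared Redis would need a lock held in Redis itself), and the class name, the TTL defaults, and the 10% jitter are illustrative.

```python
import random
import threading
import time
from typing import Callable, Dict, Optional, Tuple

_MISSING = object()  # sentinel stored when the source says "no such key"


class StampedeSafeCache:
    """In-process cache-aside with jittered TTLs, per-key locks, negative caching."""

    def __init__(
        self,
        loader: Callable[[str], Optional[str]],
        ttl: float = 300.0,
        negative_ttl: float = 30.0,
        jitter: float = 0.1,
    ) -> None:
        self._loader = loader
        self._ttl = ttl
        self._negative_ttl = negative_ttl
        self._jitter = jitter
        self._entries: Dict[str, Tuple[float, object]] = {}
        self._key_locks: Dict[str, threading.Lock] = {}
        self._guard = threading.Lock()

    def _fresh_entry(self, key: str) -> Optional[Tuple[float, object]]:
        entry = self._entries.get(key)
        if entry is not None and entry[0] > time.monotonic():
            return entry
        return None

    def get(self, key: str) -> Optional[str]:
        entry = self._fresh_entry(key)
        if entry is None:
            with self._guard:
                lock = self._key_locks.setdefault(key, threading.Lock())
            with lock:  # only one thread recomputes a given key
                entry = self._fresh_entry(key)  # re-check: it may be filled now
                if entry is None:
                    value = self._loader(key)
                    base = self._negative_ttl if value is None else self._ttl
                    # Spread expiries by +/- jitter so keys do not expire together.
                    ttl = base * (1 + random.uniform(-self._jitter, self._jitter))
                    stored = _MISSING if value is None else value
                    entry = (time.monotonic() + ttl, stored)
                    self._entries[key] = entry
        return None if entry[1] is _MISSING else entry[1]


# Usage: 20 concurrent requests for one key should trigger a single load.
load_calls: Dict[str, int] = {}
catalog = {"product:42": "Sourdough loaf"}


def slow_loader(key: str) -> Optional[str]:
    load_calls[key] = load_calls.get(key, 0) + 1
    time.sleep(0.05)  # simulate a slow database query
    return catalog.get(key)


cache = StampedeSafeCache(slow_loader)
threads = [threading.Thread(target=cache.get, args=("product:42",)) for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()

missing_first = cache.get("product:999")   # loads once, caches the "not found"
missing_second = cache.get("product:999")  # served from the negative entry
```

The second freshness check inside the lock is what turns twenty misses into one database query: threads that waited find the entry already filled and return it without calling the loader.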
Enterprise considerations
A cache is infrastructure, and at FreshCart’s scale it has to be operated like any other tier — secured, observed, scaled, and reasoned about for cost.
Security. Treat the cache as the sensitive store it is. Redis holds session data, personalization, and sometimes PII, so it must run inside a private network with no public endpoint, with encryption in transit and at rest enabled, and with AUTH turned on — the credential for which lives in a secrets manager such as HashiCorp Vault, leased to the application rather than baked into a config file. The CDN layer is part of the security perimeter too: Akamai terminates TLS at the edge and provides WAF and bot mitigation in front of the origin, and the public/private header discipline from Layer 1 is itself a security control, since a misconfigured shared cache leaking one shopper’s data to another is a breach. Cloud-posture tooling — Wiz (and Wiz Code scanning the infrastructure-as-code before it ships) — flags a Redis instance accidentally exposed to the public internet or left without encryption, and CrowdStrike Falcon sensors on the application and cache hosts provide runtime threat detection feeding the security team.
Observability. You cannot tune what you cannot see, and the single most important metric for any cache is the hit rate — the fraction of requests served from cache rather than falling through to the source. Datadog (or Dynatrace) collects hit rate, latency, evictions, and memory pressure per layer, so FreshCart can see at a glance that the catalog edge cache is running 95% hits while a newly added recommendation cache is only at 40% and needs its key design rethought. A falling hit rate or a rising eviction rate is the early warning that the cache is undersized or a TTL is too short; an alert on those metrics catches a Saturday-morning meltdown before shoppers do. When a cache-related incident does fire, it auto-raises a ServiceNow ticket so there is a tracked record and a change trail, not just a Slack message that scrolls away.
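Hit rate itself is simple arithmetic: hits divided by total lookups. A minimal sketch of the counter an application might keep per cache before exporting it to a monitoring system (the `CacheStats` class is hypothetical and not part of any monitoring product's API):

```python
from dataclasses import dataclass


@dataclass
class CacheStats:
    hits: int = 0
    misses: int = 0

    def record(self, hit: bool) -> None:
        """Count one cache lookup as a hit or a miss."""
        if hit:
            self.hits += 1
        else:
            self.misses += 1

    @property
    def hit_rate(self) -> float:
        """Fraction of lookups served from cache; 0.0 before any traffic."""
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


stats = CacheStats()
for _ in range(95):
    stats.record(hit=True)
for _ in range(5):
    stats.record(hit=False)
rate = stats.hit_rate
```

Tracking the two counters separately, instead of storing only the ratio, lets the dashboard also show absolute miss volume, which is what the database actually feels.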
Scaling and cost. The whole point of the cache is cost: by absorbing the bulk of read traffic, Akamai and Redis let FreshCart run a far smaller and cheaper database than the raw request count would imply, so the most expensive tier is shielded by two much cheaper ones. Redis scales by adding memory (a bigger node) or by clustering (sharding keys across nodes) for both capacity and throughput; the managed services handle failover for you, and some configurations can also scale automatically. The honest cost trade is summarized below.
| Lever | What it does | Effect |
|---|---|---|
| Raise CDN hit rate | More requests served at the edge | Fewer origin servers; lower egress and compute |
| Raise Redis hit rate | More reads served from memory | Smaller, cheaper database tier |
| Tune TTL up | Longer-lived cache entries | Higher hit rate, more staleness risk |
| Cache negative results | Bad-ID lookups stop hitting the DB | Removes a cheap DoS vector |
| Add read replicas | Offload fresh reads from the primary | Protects write capacity, adds infra cost |
Operations and delivery. The cache configuration is code, not a console click. Redis instances, ElastiCache parameter groups, and Akamai cache rules are defined in Terraform (with Ansible handling any host-level configuration), versioned and reviewed like everything else, so a TTL change goes through a pull request and not an ad-hoc edit nobody remembers. The pipeline that applies it — GitHub Actions or Jenkins for build and test, Argo CD for GitOps delivery to the Kubernetes clusters the app runs on — gives every cache change an audit trail and a one-click rollback. The same discipline that protects the database protects the cache config: change it deliberately, review it, and be able to revert it.
A note on where this pattern recurs: the same three-layer thinking applies far beyond a retail catalog. A university running Moodle for 40,000 students sees identical Saturday-morning-style spikes during exam season — course pages, quiz content, and reading lists that are read constantly and written rarely — and solves it with exactly this stack: a CDN for static course assets, Redis for Moodle’s session and application cache, and the database protected behind both. Internal virtual appliances (a caching reverse proxy, a WAF appliance) often slot into the same edge role on-premises. Caching is not a retail trick; it is a general-purpose answer to “this answer has not changed, stop recomputing it.”
Explicit tradeoffs
Caching is staleness in exchange for speed and cost — own that trade or do not cache. Every layer you add is another copy of the truth that can drift from the source, and another thing to invalidate when the truth changes. The complexity is real: cache-aside means writing miss-handling logic; write-through means coordinating two stores on every write; invalidation means remembering every place a piece of data is cached, or shipping a stale price to production. A cache also adds a moving part that can fail, so the application must degrade gracefully when Redis is unavailable — fall through to the database, slower but correct — rather than fall over.
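Graceful degradation mostly comes down to treating a cache error as a miss. The sketch below assumes the cache client signals an outage by raising an exception; `CacheUnavailable` is a hypothetical stand-in for whatever error the real client library raises, and the cache and database are passed in as plain callables.

```python
from typing import Callable, Optional


class CacheUnavailable(Exception):
    """Hypothetical error standing in for whatever the cache client raises."""


def get_with_fallback(
    key: str,
    cache_get: Callable[[str], Optional[str]],
    cache_set: Callable[[str, str], None],
    load_from_db: Callable[[str], str],
) -> str:
    try:
        cached = cache_get(key)
    except CacheUnavailable:
        return load_from_db(key)  # cache is down: slower, but still correct
    if cached is not None:
        return cached
    value = load_from_db(key)
    try:
        cache_set(key, value)
    except CacheUnavailable:
        pass  # failing to populate the cache must not fail the request
    return value


# Usage with a cache that is completely down.
def broken_get(key: str) -> Optional[str]:
    raise CacheUnavailable("connection refused")


def broken_set(key: str, value: str) -> None:
    raise CacheUnavailable("connection refused")


def load_from_db(key: str) -> str:
    return f"row for {key}"


result = get_with_fallback("product:42", broken_get, broken_set, load_from_db)
```

In practice a fallback like this should be paired with an alert, because a database sized for cached traffic may not survive the full read load for long.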
When not to cache. Data that changes on every request, or that is read once and never again, gains nothing from a cache and only adds risk — caching it is pure overhead. Highly personalized, security-sensitive responses should not sit in a shared cache at all. And a cache is never a fix for a fundamentally slow query or a missing index; cache a bad query and you have a fast-but-stale wrong answer instead of a slow right one. Fix the source first, then cache it.
The alternatives, and when they win. If your read load is heavy but the data must always be perfectly fresh and queryable, read replicas beat an application cache. If the expensive thing is a computation rather than a fetch, precomputing and storing the result (materialized views, scheduled rebuilds) can be better than caching on demand. And if traffic is genuinely flat and modest, the simplest answer is to skip the cache entirely and right-size the database — a cache you do not need is just a staleness bug waiting to happen.
The shape of the win
For FreshCart, the payoff is not “a faster app,” it is a cheaper and calmer one. After layering caching properly, the Saturday-morning catalog pages that took six seconds load in under one, served almost entirely from Akamai’s edge and Redis. The database — the tier they were about to spend a fortune scaling up — sees a fraction of the read traffic and comfortably handles the weekend peak on the hardware it already had. The on-call rotation stops dreading Saturdays. And the engineering lesson generalizes far past one grocery app: most performance problems are not a shortage of compute, they are the same unchanged answer being recomputed a million times. Caching, layered from the browser through the edge through Redis to the database, is how you stop doing that — and TTL and invalidation are the discipline that keeps the speed honest.