Architecture Multi-cloud

Database Selection 101: SQL, NoSQL, and When to Use Each

A mid-sized online-grocery company — call it FreshCart, serving a few metro cities with same-day delivery — is two years old and has exactly one database: a single PostgreSQL instance that started as “the obvious choice” and is now the load-bearing wall holding up the whole business. Catalog, carts, orders, user sessions, the delivery-tracking page, and the per-store inventory counter all live in it. The symptoms are textbook. The home page is slow at 7pm because every page load reruns the same catalog query. The “track my order” map refreshes every few seconds and hammers the same rows. The Friday-evening flash sale on strawberries oversold by 400 units because two checkout requests read the same stock number a millisecond apart. And a junior engineer, told to “make it faster,” is about to do the most expensive thing in this whole story: pick a trendy database off a conference slide and migrate to it on a hunch.

This article is the conversation that should happen before that migration. It is deliberately a fundamentals piece — no exotic engines, no heroic scale — because the single most valuable database skill is not knowing twenty databases. It is being able to look at one feature, describe how it reads and writes data, and from that derive which kind of store fits. The headline rule, which we will earn rather than assert: you do not choose a database and then design around it. You describe your access patterns and then choose the database that serves them.

Start with access patterns, not products

An access pattern is a plain-English sentence about how a piece of data is read and written. You can usually write it before you know any product name. FreshCart’s real patterns look like this:

Notice that these sentences already imply their stores. Two of them want strong relational integrity (orders, money, stock). Two of them are really caching problems (catalog, live map). One is a list-by-key read that a document store does beautifully. The art is hearing the requirement and recognizing the shape — and the rest of this article is a vocabulary for those shapes.

The four questions that pull the shape out of any feature:

  1. What is the unit you read? A single self-contained thing (an order, a user profile)? Or rows you slice, join, and aggregate across (all orders this quarter by region)?
  2. What are the consistency rules? Must a write be visible immediately and correct under concurrency (money, stock), or is “a few seconds stale” perfectly fine (a view counter, a delivery dot on a map)?
  3. What is the read/write ratio and the volume? Write-once read-rarely, or read-the-same-thing-ten-thousand-times?
  4. Do the relationships matter at query time? Will you constantly ask “join A to B to C,” or do you always fetch one aggregate whole?

The three families you actually need first

Most foundational systems are well served by three categories. Learn these cold before reaching for anything else.

Relational / SQL — PostgreSQL, MySQL, and their managed forms (AWS RDS / Aurora, Azure Database for PostgreSQL, Google Cloud SQL). Data lives in tables with a fixed schema and typed columns. The superpowers are ACID transactions (a multi-row change either fully happens or fully does not), joins (combine related tables at read time), and a mature query planner. This is the correct default for anything involving money, inventory, or records that must agree with each other. FreshCart’s orders, payments, and stock belong here and arguably never leave.

Document / NoSQLAmazon DynamoDB, Azure Cosmos DB, Google Firestore, MongoDB. Data lives as self-contained JSON-ish documents, usually with a flexible schema, retrieved by a key. The superpowers are horizontal scale (the store partitions across many nodes by a key, so throughput grows by adding machines) and predictable single-digit-millisecond reads when you query the way the table was designed. The crucial, commonly-missed catch: these stores reward you for knowing your access pattern in advance and punish you for ad-hoc queries. DynamoDB does not join; you model the access pattern into the keys, or you scan the whole table and weep. A user profile with embedded preferences and addresses, read by user ID, is a perfect document.

Cache / key-valueRedis, Memcached, and managed forms (Amazon ElastiCache, Azure Cache for Redis, Google Memorystore). Data lives in memory as keys mapped to values, optionally with a time-to-live. It is not your source of truth; it is a fast copy in front of one. The superpower is latency: microseconds, and it absorbs read storms so your real database stays calm. FreshCart’s catalog page and live delivery map are not really database problems — they are caching problems wearing a database costume.

Dimension Relational (SQL) Document (NoSQL) Cache (key-value)
Data shape Tables, fixed schema Self-contained documents Keys → values in memory
Read by Keys, joins, ad-hoc filters/sorts Primarily the partition key Exact key
Transactions Strong, multi-row ACID Limited; per-item or per-partition None (it is a copy)
Scales by Bigger box; read replicas Adding nodes (horizontal) Adding memory/nodes
Best for Money, inventory, related records High-volume known-key lookups Hot reads, sessions, derived views
Worst at Web-scale single-table throughput Ad-hoc queries and joins Being your system of record

Walking FreshCart’s data through the model

This is the worked example — the heart of the article. We take each access pattern and derive the store, showing the data and control flow as a request moves through the system.

Orders, payments, inventory → relational, and non-negotiably so

A checkout is a single logical event that touches several tables at once: insert an order row, insert line items, record a payment, and decrement stock for each product. Either all of that happens or none of it does — a paid order with no stock reserved, or stock reserved with no payment, is a corrupt business state. That “all or nothing” is the literal definition of a transaction, and it is why this stays on PostgreSQL.

The oversell bug is worth dwelling on because it teaches the core idea. Two checkouts both read stock = 1, both think “great, one left,” and both sell it. The fix is not a faster database; it is letting the database arbitrate the race with an atomic conditional write inside a transaction:

BEGIN;
UPDATE inventory
   SET stock = stock - 1
 WHERE product_id = 'strawberry-250g'
   AND stock > 0
RETURNING stock;            -- zero rows back? sold out — abort the checkout
INSERT INTO orders (...) VALUES (...);
COMMIT;

The AND stock > 0 guard plus row-level locking means exactly one of the two racing checkouts wins the last unit and the other is cleanly told “sold out.” A document store can approximate this with per-item conditional writes, but the moment a single business action must atomically span several tables — order, line items, payment, stock — relational ACID is doing real work that you would otherwise be reimplementing badly by hand. Do not move money or inventory off SQL to chase throughput you do not have yet.

The order-history and product pages → read replicas, then a cache

“Show me my last 90 days of orders” is a read-heavy query that does not need to see a write from one millisecond ago. So the first scaling move is not a new database — it is a read replica: a streaming copy of the primary that serves reads while the primary handles writes. RDS, Cloud SQL, and Azure Database all create one with a checkbox. Point reporting and history reads at the replica and the primary breathes.

The product catalog is the clearest “this is secretly a cache” case. The same query runs thousands of times an hour and the underlying data changes twice a day. Putting Redis in front turns a database storm into a memory lookup:

GET catalog:home:v37          -> HIT  : return JSON from memory in ~0.5 ms
                              -> MISS : query Postgres once, SET with TTL 600s, return

This is the cache-aside pattern: check cache, on a miss read the database and populate the cache, and invalidate the key when the catalog is republished. The live delivery map is the same shape — write the courier’s position to Redis with a short TTL and let the map read from memory, accepting a second of staleness, instead of pounding a relational row dozens of times a second per active order.

User profiles and the session store → where document and cache earn their place

A user profile — preferences, saved addresses, dietary tags, notification settings — is a self-contained blob almost always read by user ID. That is a document. You can keep it in SQL (and at FreshCart’s size, you happily do, as a jsonb column), but it is the canonical example of where a document store like DynamoDB or Cosmos DB fits naturally once profile volume and read rate climb: one key in, one document out, single-digit milliseconds, scales horizontally without you thinking about it.

Login sessions are the cleanest non-relational case of all. A session is a short-lived key (session:abc123 → {user, expiry}) read on nearly every request and thrown away when it expires. It does not belong in your transactional database competing with checkouts — it belongs in Redis with a TTL. This single move (sessions out of Postgres, into Redis) often buys back more primary-database headroom than any query tuning.

The decision, summarized

FreshCart feature Access pattern Store Why
Orders / payments Multi-table atomic write, read by key PostgreSQL (RDS/Cloud SQL) Needs ACID across tables
Inventory decrement Atomic conditional write under contention PostgreSQL Race correctness = transactions
Order history Read-heavy, slightly-stale OK SQL read replica Offload reads from primary
Product catalog Same read ×10,000/hr, rarely changes Redis (cache-aside) It is a caching problem
Live delivery map High-frequency, staleness OK Redis (short TTL) Absorb read storm
User profile Self-contained, read by user ID SQL jsonb now → document later Aggregate read by key
Login session Short-lived key, every request Redis (TTL) Ephemeral, not a record

The punchline: FreshCart’s “we need a new database” panic is really “we need a read replica and a cache, and we should move sessions out of the primary.” Two of the three families it needs are in front of or beside the relational store it already has — not a replacement for it.

Architecture overview

Database Selection 101: SQL, NoSQL, and When to Use Each — architecture

Here is the small, sane shape FreshCart should run, and how a request flows through it.

The application tier (the API behind the storefront) talks to three data stores, each chosen by the access patterns above. The relational primary is the system of record for orders, payments, and inventory — the single source of truth. A read replica trails it and serves history and reporting reads. Redis sits beside the application as a cache for the catalog and live-map reads and as the session store. The delivery edge — static assets, catalog API responses — is fronted by Akamai as the CDN and WAF, so a chunk of catalog traffic is served at the edge and never reaches the cache, let alone the database. Behind these, the document store is the next hop FreshCart grows into for profiles, not a day-one component.

Control flow for a catalog page load: request hits Akamai (CDN cache hit → served at the edge, done) → on a miss, the API checks Redis (GET catalog:home) → on a Redis miss, the API queries the read replica, writes the result into Redis with a TTL, and returns it. Three layers, each shielding the one below; the relational primary is touched only when both caches miss.

Control flow for a checkout (the write path): request hits the API → the API opens a transaction on the relational primary → atomically decrements inventory with the stock > 0 guard, inserts the order and line items, records the payment → commits → invalidates the affected catalog cache keys in Redis → returns. Money and stock only ever move through the one store that can guarantee they move correctly. The read replica and the caches are downstream of this truth, never the place the truth is written.

The operational and security plane wraps all three stores identically, which is the part juniors under-build:

That last bullet is the difference between a database you operate and one you merely have. Even a junior-scale system deserves Terraform-defined infrastructure, a migration that runs in a pipeline, and a metric that tells you the truth.

Managed vs. self-hosted: the tradeoff to get right early

FreshCart’s team is small, so this decision is mostly made for them — but understanding why matters.

Managed (RDS, Cloud SQL, Azure Database, DynamoDB, ElastiCache) means the cloud provider runs the engine: patching, backups, replica creation, failover, and minor-version upgrades are their job. You pay a premium and give up some low-level knobs. Self-hosted (PostgreSQL on a VM or in Kubernetes, configured by Ansible) means total control and lower sticker price — and that you now personally own 3am failover, backup verification, and the security patch that just dropped.

Factor Managed Self-hosted
Backups / PITR Built in, push-button You build and test restores
Failover / HA Automated multi-AZ You design and operate it
Patching Provider, in a window Your pager, your CVE triage
Cost Higher sticker price Lower price, real staff time
Control Limited extensions/knobs Full
Best for Small/lean teams Large teams with a real DBA

For a team of FreshCart’s size the answer is managed, almost always: the labor saved on backups, patching, and failover dwarfs the price premium, and a junior team should not be hand-rolling Postgres high-availability. Self-hosting earns its keep when you have genuine DBA depth, a specific extension or version the managed service will not run, or scale economics where the premium becomes a number worth engineering against. The honest cost framing: managed is not “more expensive,” it is salary you are not paying converted into a line item — and for a lean team that trade is almost always correct.

Failure modes, scaling, and the cost of getting it wrong

The failure modes are mostly choosing the wrong family, and each has a signature:

Scaling, in the order you actually do it:

  1. Cache the hot reads (Redis). Cheapest, biggest win, every time.
  2. Add a read replica for read-heavy, staleness-tolerant queries.
  3. Index and tune the queries your APM (Datadog/Dynatrace) flags as slow — often a missing index, not a missing database.
  4. Scale up the primary (a bigger instance) before doing anything exotic.
  5. Only then consider moving a specific high-volume, known-key access pattern (sessions, profiles, an event feed) to a horizontally-scaling document store — and move that pattern, not the whole system.

The deepest lesson is that step 5 is last for a reason: most teams that “needed NoSQL” actually needed steps 1 through 4. Polyglot persistence — using SQL and a document store and a cache, each for what it is good at — is the mature end state, but it is something you grow into one access pattern at a time, not a thing you architect up front out of fear.

Security and cost, briefly but non-optionally

Security is identical across all three stores and not advanced: encrypt at rest and in transit (managed services do this with a setting); never expose a database to the public internet (Wiz alerts the instant one drifts that way); authenticate with short-lived IAM/Vault-issued credentials federated through Okta/Entra ID, not a shared static password; least-privilege the app’s database role so a compromised app cannot drop tables; and keep CrowdStrike Falcon on the hosts for runtime detection. Backups must be not just taken but restore-tested — an untested backup is a hope. (Even the internal team’s training portal — a Moodle instance teaching new hires these very patterns — follows the same rules: managed database, Vault-issued credentials, no secrets in the repo, because the lesson and the practice should match.)

Cost tracks the choices directly. Relational read replicas and larger instances are predictable monthly line items. A cache is cheap insurance that reduces database cost by deflecting reads. Document stores like DynamoDB bill on read/write capacity, which is wonderfully cheap for the access pattern you designed and brutally expensive for the scan you did not — so a mis-modeled NoSQL table is the easiest way to get a surprise invoice. And the largest hidden cost is always the wrong-family migration done in a panic: the safe, boring path of cache-then-replica-then-tune is not only better engineering, it is dramatically cheaper than re-platforming under fire.

The shape of the win

FreshCart does not need a new database. It needs to name its access patterns and serve each from the right family: orders and inventory stay on transactional PostgreSQL because money and stock demand ACID; the catalog and live map move to a Redis cache because they are read storms, not write problems; sessions leave the primary for Redis where ephemeral keys belong; order history reads move to a replica; and profiles are a jsonb column today and the natural first thing to graduate to a document store tomorrow — if and when its volume actually warrants it. Wrap all three in the same operational plane — Vault for credentials, Okta/Entra ID for identity, Wiz for posture, CrowdStrike Falcon for runtime, Dynatrace/Datadog for visibility, Terraform/Ansible and GitHub Actions/Jenkins/Argo CD through a ServiceNow change gate for every change — and you have a foundational data layer that is correct, observable, and reversible. That is the whole skill: not memorizing engines, but reading a feature, hearing its access pattern, and reaching for the family that already fits.

DatabasesSQLNoSQLData ArchitectureCachingFundamentals
Need this built for real?

Vinod is a Senior Cloud Architect (22+ yrs) — available for Azure / AWS / GCP architecture, landing zones, and migrations.

Work with me

Comments

Keep Reading