Designing CQRS Read-Model Pipelines and Managing Eventual Consistency

CQRS gets adopted for the wrong reason about half the time. A team hears “it scales,” splits their controllers into CommandHandler and QueryHandler, and ships. Six months later they are firefighting a different problem than the one they thought they were solving — because the code-path split was never the hard part. The hard part is the read model: keeping a fleet of denormalized projections correct and current off an append-only event log, and being honest with the user about the window where a write has happened but the query side does not know it yet. CQRS — Command Query Responsibility Segregation — is the deliberate decision to use one model to change state and a different, purpose-built model to read it. That is the whole idea; everything difficult about it lives downstream of that sentence.

This article is the pipeline end to end, for the point where you have already decided (or are deciding) that CQRS earns its keep for at least one bounded context. We draw the line between write and read model, enumerate every projection mode (synchronous vs asynchronous, single vs multiple), and go deep on the three mechanics that separate a toy from production: checkpointing (where the projector remembers its place), idempotency (surviving redelivery without corrupting), and replay/rebuild (treating read models as disposable caches you regenerate from the log). Then we confront eventual consistency as a first-class design problem — read-your-writes, staleness windows, the UI patterns that hide the gap from the one user who notices — because a CQRS system that lies to users about freshness is worse than the CRUD app it replaced.

By the end you will be able to decide whether to reach for CQRS at all (the answer is often “not yet”), and if you do, build a pipeline that is idempotent, crash-safe, rebuildable, and honest. The mental model to carry through every section: your write side owns exactly one job — validate a command and emit facts. Everything a user reads is a cache. Once you internalise that read models are disposable caches over an authoritative log, most operational decisions fall out on their own.

What problem this solves

The pain CQRS addresses is a schema conflict you cannot win. A single data model serves both writers and readers, and their demands are opposed. Writers want normalization, tight invariants, small aggregates addressed by ID, a lock-friendly shape that makes “is this command legal?” cheap. Readers want the opposite: wide denormalized rows, pre-joined blobs that render a whole screen in one fetch, full-text search, roll-ups computed once and read a million times. When both live in one table, every choice is a compromise: add an index for a query and slow every write; denormalize for a report and now three writers must keep the copy in sync; shard for write throughput and your reporting joins fall apart.

What breaks without CQRS in the contexts that need it: read load starves write load on the same database (a heavy report stalls order ingestion at peak); the “single model” grows a dozen half-committed denormalizations that drift; and independent scaling is impossible because reads and writes share one connection pool and one failure domain. Teams paper over this with read replicas — which help throughput but not the schema conflict, because a replica is the same shape as the primary. The read wants a different model, not just a different copy.

Who hits this: systems with a genuine asymmetry between how state changes and how it is queried — high read fan-out, query patterns no single normalized schema serves (keyed lookup and full-text and dashboard roll-up over one entity), independent scaling needs (a public read tier surviving a spike without touching the write tier), or audit demands that make an append-only log valuable in its own right. If none describe your context — a CRUD app nobody queries in interesting ways — CQRS adds a pipeline, a consistency gap, and operational surface for zero benefit, and the honest recommendation is don’t. Half this article is about drawing that line correctly.

Before the deep dive, here is the field in one frame — the forces that push toward or away from CQRS, and what each one actually buys or costs:

Force in your context	Pushes you toward CQRS because…	The cost you take on
High read : write ratio	Reads scale independently on cheap denormalized stores	A projection pipeline per read model
Many query shapes over one entity	Each shape gets a purpose-built store, no schema compromise	N stores to keep current and consistent
Read spikes must not hurt writes	Read tier is a separate failure domain	Eventual-consistency gap between the tiers
Complex domain invariants	Write model stays small and normalized, invariants are cheap	Command side and query side diverge in shape
Audit / temporal requirements	The event log is a natural audit trail	Event versioning, upcasting, replay time
Simple CRUD, low fan-out	Nothing — this is the wrong tool	All of the above cost, none of the benefit
One query shape, no spikes	Nothing — a read replica or an index is enough	You’d pay complexity for a problem you don’t have

Learning objectives

By the end of this article you can:

Decide whether CQRS earns its complexity for a given bounded context, and articulate the cheaper alternatives (read replica, materialized view, cache) you should try first.
Separate a normalized, invariant-enforcing write model from purpose-built read models, and design the event as the immutable contract between them.
Choose read stores per query shape — relational, document, search, columnar, key-value — and justify each against the query it serves.
Build a projection worker that consumes an event log in order, applies events idempotently, and commits its checkpoint in the same transaction as the read-model write for exactly-once effects.
Reason precisely about delivery guarantees — at-most-once, at-least-once, exactly-once-delivery (a myth over a network) vs exactly-once-effects — and pick the pattern that matches your store.
Choose between synchronous and asynchronous projections, and know the ordering, latency, and consistency trade-offs of each.
Close the consistency gap with read-your-writes, version tokens, optimistic UI, and sticky strong-read fallback — paying for consistency only where a human perceives its absence.
Run replay/rebuild and zero-downtime read-model migrations with blue-green projections, and detect drift with a reconciliation job before it corrupts a customer’s view.
Distinguish CQRS from event sourcing, adopt each deliberately, and recognise the anti-patterns (leaking staleness into every screen, over-splitting models, building full event sourcing when you only needed CQRS).

Prerequisites & where this fits

You should be comfortable with event-driven fundamentals: the difference between a command (an imperative request that can be rejected) and an event (an immutable past-tense fact), and why append-only logs behave differently from mutable tables. You should understand transactions at the level of “these two writes both commit or both roll back,” and have working knowledge of at least one log or broker — Apache Kafka, EventStoreDB, Azure Service Bus / Event Hubs, AWS Kinesis, or Google Pub/Sub. Familiarity with the transactional outbox pattern helps.

This sits in the distributed systems / data architecture track, downstream of a few things. CQRS is frequently paired with event sourcing but is not the same thing; if you want the write side to make the log its system of record, read Event Sourcing in Production: Aggregate Design, Snapshots, and Projection Rebuilds. Reliable publishing (getting events onto the log atomically with the state change) is covered in Building the Transactional Outbox and Inbox Pattern for Exactly-Once Event Publishing; the projector’s idempotency mechanics are generalised in Designing Idempotent APIs and Deduplication for Reliable Distributed Systems; and the consistency vocabulary behind “eventual” is the subject of Multi-Region Data: Choosing Replication and Consistency Without Losing Writes.

Where it fits in a bigger picture: CQRS is a per-bounded-context tactic, not a system-wide architecture. In a microservices estate, one service might be CQRS with three read models, its neighbour a plain CRUD table, and a third pure event sourcing. The decision is local. A quick map of the adjacent layers and who owns each:

Layer	What lives here	Typical owner	How it relates to CQRS
Command API	Validates and accepts commands, returns 202 + version	App/dev team	The write side’s front door
Write model / aggregate	Enforces invariants, emits events	App/dev team	The source of truth (state or event log)
Event log / broker	Ordered, durable, replayable events	Platform team	The contract carrier; enables replay
Outbox / CDC	Atomic hand-off of events onto the log	App + platform	Solves the dual-write problem
Projection workers	Consume events, build read models	App/dev team	The heart of this article
Read stores	Denormalized, per-query-shape	App + DBA	Disposable, rebuildable caches
Read API	Serves queries, gates on checkpoints	App/dev team	Where eventual consistency meets the user

Core concepts

Six mental models make every later decision obvious. Internalise these and the rest of the article is application.

The write model enforces invariants; read models answer questions. An aggregate loads its prior state, decides whether a command is legal, and appends new events. It should be normalized to the point of being almost unqueryable: a single stream addressed by ID, optimized for consistency, not for the seventeen ways a UI wants to slice it. A read model goes the other direction — shaped for one query pattern, allowed to be redundant and lossy. The OrderSummary that powers order history is a different table than the WarehousePickList that powers fulfillment, even though both derive from OrderPlaced. Do not make one read model serve both — the point of CQRS is that you no longer have to compromise.

The event is the contract between the two sides. Keep events immutable, past-tense (PaymentCaptured, never CapturePayment), versioned, and free of any read-model concern. A projection failing, being fixed, or being added must never require touching the write side. This decoupling is what makes read models disposable: a new projector interprets the same historical facts into a totally different shape without rewriting history. The only hard constraint is a forward-compatible event schema — old events stay readable via upcasters (functions that transform an old version into the current one on read).
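
To make the upcaster idea concrete, here is a minimal sketch in TypeScript. The two event versions, their field names, and the default currency are hypothetical and chosen only for illustration; the point is that the stored v1 event is never edited, only translated to the current shape when it is read.

```typescript
// Hypothetical shapes: v1 stored a decimal `total`, v2 stores integer cents plus a currency.
type OrderPlacedV1 = { type: "OrderPlaced"; version: 1; orderId: string; total: number };
type OrderPlacedV2 = {
  type: "OrderPlaced";
  version: 2;
  orderId: string;
  totalCents: number;
  currency: string;
};
type StoredOrderPlaced = OrderPlacedV1 | OrderPlacedV2;

// Upcast on read: the log keeps the old event untouched, projectors only see v2.
function upcastOrderPlaced(e: StoredOrderPlaced): OrderPlacedV2 {
  if (e.version === 2) return e; // already current, pass through unchanged
  return {
    type: "OrderPlaced",
    version: 2,
    orderId: e.orderId,
    totalCents: Math.round(e.total * 100),
    currency: "USD", // assumed default for events written before the field existed
  };
}

const legacyEvent: StoredOrderPlaced = {
  type: "OrderPlaced",
  version: 1,
  orderId: "order-1234",
  total: 19.99,
};
const upgradedEvent = upcastOrderPlaced(legacyEvent);
```

Projectors then handle only `OrderPlacedV2`, so adding a v3 later means writing one more upcaster rather than touching every projection.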

A read model is a disposable, rebuildable cache. Because every read model is derived from the log, you can throw it away and rebuild it from position zero. A schema change is a rebuild, not a destructive migration; a projector bug is a reprojection, not a data-loss incident. Once you truly believe read models are caches, blue-green swaps, drift reconciliation, and adding a read store on a Tuesday become routine.

“Exactly-once delivery” does not exist over a network; exactly-once effects do. No messaging system can guarantee delivery precisely once — the acknowledgement itself can be lost, forcing redelivery. You build at-least-once delivery plus idempotent application, yielding exactly-once effects: the read model ends up in the same state whether an event arrives once or five times. This is the load-bearing wall of the pipeline — every projector must assume it will see events more than once and treat redelivery as a no-op.

Ordering is a per-stream guarantee, not a global one. Events for one aggregate (one order-1234 stream) must be applied in emission order or you get nonsense (OrderShipped before OrderPlaced). Events across aggregates usually have no meaningful order and process in parallel. The rule: one projector process (or partition consumer) per ordering domain. Kafka gives you this by keying on aggregate ID (all of one key’s events land on one partition); EventStoreDB gives per-stream order natively.
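
The keying rule can be sketched in a few lines. This is an illustration of the property only: the hash below is a simple polynomial hash picked for readability, not the one Kafka's default partitioner uses, and `partitionFor` is a hypothetical helper name.

```typescript
// Illustrative polynomial string hash. Kafka's default partitioner uses a
// different hash; only "same key, same partition" matters for this sketch.
function hashKey(key: string): number {
  let h = 0;
  for (let i = 0; i < key.length; i++) {
    h = (h * 31 + key.charCodeAt(i)) >>> 0; // keep it an unsigned 32-bit value
  }
  return h;
}

// Every event keyed by the same aggregate ID maps to the same partition,
// so a single partition consumer sees that aggregate's events in order.
function partitionFor(aggregateId: string, partitionCount: number): number {
  return hashKey(aggregateId) % partitionCount;
}

const placedPartition = partitionFor("order-1234", 12);  // e.g. OrderPlaced
const shippedPartition = partitionFor("order-1234", 12); // e.g. OrderShipped, same key
```

Because the mapping depends on the partition count, changing that count remaps keys; events written before and after the change may sit on different partitions for the same aggregate.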

Eventual consistency is a UX problem, not just a data problem. There is a window — usually milliseconds, sometimes seconds under load — between a write committing and every read model reflecting it. The window is unavoidable in an asynchronous pipeline; the user noticing is avoidable. Read-your-writes machinery exists so the one user who just made a change sees it immediately, while everyone else tolerates a staleness they cannot perceive. Treat visible staleness as a bug to design around, not a property to shrug at.
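
One way to picture the read-your-writes machinery is a version gate on the read path. The sketch below assumes the command API hands the writer a version number and that the read API can look up the version the projector has applied; `decideRead` and both parameter names are hypothetical.

```typescript
type ReadDecision = "serve-read-model" | "wait-or-fallback";

// `projectedVersion`: the version the projector has applied for this aggregate.
// `requiredVersion`: the version the command API returned to this client,
// or undefined for users who have not written recently.
function decideRead(projectedVersion: number, requiredVersion?: number): ReadDecision {
  if (requiredVersion === undefined) return "serve-read-model"; // this user cannot perceive the gap
  return projectedVersion >= requiredVersion ? "serve-read-model" : "wait-or-fallback";
}

const anonymousRead = decideRead(41);      // no token: serve the read model as-is
const writerTooEarly = decideRead(41, 42); // projector is behind the writer's own change
const writerCaughtUp = decideRead(42, 42); // projector has applied the writer's change
```

Only the user holding a token ever takes the slower path, which is how the cost of consistency stays limited to the person who would notice its absence.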

The vocabulary in one table

Pin down every moving part before the deep sections. The glossary at the end repeats these for lookup; this is the mental model side by side:

Concept	One-line definition	Where it lives	Why it matters
Command	An imperative request that may be rejected	Write API	The only thing that can change state
Event	An immutable fact in the past tense	Event log	The contract between write and read
Aggregate	A consistency boundary that emits events	Write model	Enforces invariants; one stream per instance
Write model	Normalized store optimized for invariants	Aggregate DB / event log	Source of truth
Read model / projection	Denormalized store for one query shape	Postgres, Cosmos, search…	A disposable cache
Projector / projection worker	Consumer that turns events into a read model	Long-running process	The heart of the pipeline
Checkpoint	The projector’s remembered position in the log	In the read store, ideally	Enables crash-safe resume
Idempotent apply	Applying the same event twice = applying it once	In the projector	Survives at-least-once redelivery
Catch-up subscription	Read history to head, then go live with no gap	Projector startup	How a projector resumes or bootstraps
Replay / rebuild	Regenerate a read model from position zero	Ops action	Fixes drift; enables schema change
Eventual consistency	Read models converge to the write, eventually	The whole system	The gap users can feel
Read-your-writes	The writer sees their own change immediately	Read API + client	Hides the gap from who notices
Projection lag	Distance between log head and a checkpoint	Metric	The SLA on freshness
Drift	Caught-up but wrong read model	Latent bug	The silent killer

When CQRS earns its complexity — and when it does not

The most valuable skill in this topic is not building projections; it is knowing when not to. CQRS is a genuine step up in operational cost: you go from one model to two, from one store to N, and from read-your-writes for free to a consistency gap you must manage. That cost is worth paying in specific circumstances and pure waste otherwise. Get the decision right and everything else is engineering; get it wrong and you have volunteered for a maintenance burden that delivers nothing.

Start from the cheaper alternatives and escalate only when they genuinely fail — most “we need CQRS” conversations end at a read replica or a materialized view. Read your context against the left column and follow it across:

If your context looks like…	Try first / it wants…	Escalate to CQRS when…
CRUD admin panel, low traffic, one shape	A plain table	Never — no asymmetry to exploit; CQRS is all cost
Slow reads on an existing shape	Add an index	One schema can’t serve all your query shapes
Reporting is heavy but shape matches OLTP	A read replica (same shape, different copy)	Readers need a different shape, not a copy
Pre-aggregated roll-ups, dashboards	An in-DB materialized view	You need document/search/columnar stores too
Hot-read latency, repeated identical queries	A cache (Redis) in front	Cache invalidation gets impossibly complex
One entity, keyed lookup and full-text and dashboard	CQRS with 3 read models	(already there — no single schema serves all three)
Public read tier must survive spikes without hurting writes	CQRS (separate read store)	(already there — failure-domain isolation is the point)
Complex invariants, but reads are simple keyed fetches	CQRS-lite (outbox → one read model)	Keep the small write model; project one view
You want an audit log anyway	Event sourcing (which enables CQRS)	The log is valuable in its own right
“We use CQRS everywhere as a standard”	Stop	A global mandate is a smell, not an achievement

Three sizing signals tell you the asymmetry is real enough to justify the split; if none are true, you are almost certainly better off without CQRS:

Signal	Threshold that argues for CQRS	How to measure it
Read : write ratio	≫ 10:1 (often 100:1 or 1000:1)	Requests per second on query vs command endpoints
Distinct query shapes over one entity	≥ 3 that no single schema serves well	Count the indexes/denormalizations you keep adding
Read-driven write contention	Reports/queries measurably stalling writes	Lock waits, replication lag, p99 write latency during report runs

The mistake to avoid is CQRS-as-default. A well-designed system frequently has one CQRS context surrounded by a dozen plain-CRUD ones. “We use CQRS everywhere” is the architectural equivalent of “microservices for everything” — applied past the point where it pays, generating consistency gaps and pipelines nobody needed.

There is also a middle ground teams miss: CQRS without event sourcing. Keep a conventional state-storing write side (a normal orders table with an aggregate on top), emit events via the transactional outbox, and build read models off those events. The log is a delivery mechanism, not the system of record — you get the read/write split and independent read scaling without event sourcing’s real costs (versioning, upcasting, full replay time). Coupling CQRS to event sourcing is a choice — make it deliberately, not by assuming they are a package deal.
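
The state-stored-plus-outbox shape can be sketched without a real database. The in-memory `InMemoryDb` below is a hypothetical stand-in whose `transaction` method mimics all-or-nothing commit; in a real system the same role is played by one database transaction covering the state table and the outbox table.

```typescript
type OutboxRow = { eventId: string; type: string; payload: string };
type OrderState = { status: string };

class InMemoryDb {
  orders = new Map<string, OrderState>();
  outbox: OutboxRow[] = [];

  // Runs `work` against staged copies and publishes them only if it does not
  // throw, mimicking a single database transaction.
  transaction(work: (orders: Map<string, OrderState>, outbox: OutboxRow[]) => void): void {
    const stagedOrders = new Map(this.orders);
    const stagedOutbox = this.outbox.slice();
    work(stagedOrders, stagedOutbox);
    this.orders = stagedOrders;
    this.outbox = stagedOutbox;
  }
}

// The state change and the event row commit together or not at all.
function placeOrder(db: InMemoryDb, orderId: string): void {
  db.transaction((orders, outbox) => {
    if (orders.has(orderId)) throw new Error("order " + orderId + " already exists");
    orders.set(orderId, { status: "placed" });
    outbox.push({
      eventId: orderId + ":placed",
      type: "OrderPlaced",
      payload: JSON.stringify({ orderId }),
    });
  });
}
```

A separate relay (or CDC) then reads the outbox rows and publishes them to the log, which is what the read-model projectors consume.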

Separating write and read models

Once you have committed to CQRS for a context, the first design act is drawing the line. Everything good downstream depends on this line being clean.

The write model: small, normalized, invariant-first

The write model’s only job is to answer “is this command legal, and if so, what facts does it produce?” It should be as small as the invariants allow. A single aggregate — the consistency boundary — loads its current state, validates the command against business rules, and emits events. It is addressed by ID, never queried by attribute, never denormalized for a UI. If you find yourself adding a column “so the dashboard can show it,” stop — that column belongs in a read model.

Two shapes of write model, and when each fits:

Write-model style	State storage	Emits events via	Best when	Cost
State-stored + outbox	Current state in normal tables	Transactional outbox row, same TX	You want CQRS but not event sourcing	Outbox table + relay/CDC
Event-sourced	Only the event stream (no state table)	Appending to the stream is the write	You want a full audit log and temporal queries	Versioning, upcasting, snapshots, replay time
Hybrid (state + published events)	State table, events published for integration	Outbox or CDC on the state table	Legacy write side you can’t rewrite	Two sources of “truth” to keep aligned

The event contract rules — break these and your read side becomes coupled to write-side internals, defeating the point:

Rule	Right	Wrong
Tense & naming	`OrderPlaced`, `PaymentCaptured`	`PlaceOrder`, `CapturePayment` (those are commands)
Immutability	Never edit a published event	“Fixing” a bad event in place
No read concerns	`ItemAddedToCart{sku, qty}`	`ItemAddedToCart{cartTotalForDashboard}`
Versioned	`OrderPlaced.v2` with an upcaster from v1	Silently changing v1’s shape
Self-contained facts	Include the data the fact asserts	Requiring a read-model lookup to interpret it
Explicit IDs	Carry `orderId`, `customerId`	Positional or implicit correlation

The read model: wide, denormalized, one shape per consumer

A read model is designed backwards from a single query. If the order-history screen needs order ID, status, total, item count, and a thumbnail, the order_summary row carries exactly those columns, pre-joined so the screen renders in one indexed fetch. If fulfillment needs a pick list grouped by warehouse bin, that is a different projection with a different shape, even though both consume OrderPlaced. The discipline: do not share a read model across consumers with different query shapes — a shared read model reintroduces the schema compromise you adopted CQRS to escape.
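
As a rough illustration of one event feeding two differently shaped projections, the sketch below maps a single `OrderPlaced` into an order-summary row and a pick list. The `lines` and `bin` fields are assumptions added for the example; the article's own event payload is not specified at this level of detail.

```typescript
type OrderPlaced = {
  type: "OrderPlaced";
  orderId: string;
  customerId: string;
  totalCents: number;
  lines: { sku: string; qty: number; bin: string }[]; // hypothetical line items
};

// Shape 1: one row per order, for the order-history screen.
type OrderSummaryRow = { orderId: string; status: string; totalCents: number; itemCount: number };

function toOrderSummary(e: OrderPlaced): OrderSummaryRow {
  const itemCount = e.lines.reduce((n, line) => n + line.qty, 0);
  return { orderId: e.orderId, status: "placed", totalCents: e.totalCents, itemCount };
}

// Shape 2: one row per line, ordered by warehouse bin, for fulfillment.
type PickListRow = { bin: string; orderId: string; sku: string; qty: number };

function toPickList(e: OrderPlaced): PickListRow[] {
  return e.lines
    .map((line) => ({ bin: line.bin, orderId: e.orderId, sku: line.sku, qty: line.qty }))
    .sort((a, b) => a.bin.localeCompare(b.bin));
}
```

Neither function knows the other exists, which is the property that lets you add, drop, or rebuild one projection without touching the other.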

The trade you are making, stated plainly:

Property	Write model	Read model
Optimized for	Correctness of change	Speed of a specific query
Normalization	High (small, keyed)	Low (wide, denormalized, redundant)
Addressed by	Aggregate ID	Whatever the query filters/sorts on
Number of them	One per aggregate type	As many as you have query shapes
Consistency	Strong (transactional)	Eventual (lags the log)
On corruption	Incident (it’s the truth)	Reproject (it’s a cache)
Schema change	Migration (careful)	Rebuild (routine)

Choosing read stores per query shape

There is no single correct read store. The whole freedom of CQRS is that each read model can live in the store that best fits its query. Match the store to the shape, not to a house standard.

Query shape	Store class	Example products	Why this store
Keyed lookup, joins, ad-hoc reporting	Relational	PostgreSQL, Azure SQL, Aurora	Transactional projection updates; mature indexing; SQL for reports
Document-per-aggregate, whole-screen render	Document	Cosmos DB, MongoDB, DynamoDB	One read returns a full denormalized blob; no joins
Full-text, faceted, relevance ranking	Search	Elasticsearch, OpenSearch, Azure AI Search	Inverted index and scoring an RDBMS can’t do well
Pre-aggregated counters, dashboards, OLAP	Columnar	ClickHouse, BigQuery, Snowflake	Roll-ups computed once, scanned cheaply
Hot single-key, sub-ms	Key-value / cache	Redis, DynamoDB, Memcached	Lowest-latency point reads
Graph traversals, relationships	Graph	Neo4j, Neptune, Cosmos Gremlin	Multi-hop queries that kill relational joins
Time-series, metrics	TSDB	TimescaleDB, InfluxDB, Prometheus	Time-bucketed aggregation and retention

A useful nuance for relational read models: lean on native materialized views for aggregation-heavy projections rather than hand-maintaining counters in projector code. Let the projector keep the row-level order_summary current, and let Postgres compute the roll-ups off it:

CREATE MATERIALIZED VIEW order_daily_totals AS
SELECT
  date_trunc('day', placed_at) AS day,
  region,
  count(*)                     AS order_count,
  sum(total_cents)             AS gross_cents
FROM order_summary
GROUP BY 1, 2
WITH NO DATA;

CREATE UNIQUE INDEX ON order_daily_totals (day, region);

-- First population must be a plain refresh: CONCURRENTLY is rejected
-- while the view is still unpopulated (it was created WITH NO DATA).
REFRESH MATERIALIZED VIEW order_daily_totals;

-- Later refreshes run concurrently so dashboard reads are not blocked during recompute
REFRESH MATERIALIZED VIEW CONCURRENTLY order_daily_totals;

REFRESH ... CONCURRENTLY requires that unique index and does not lock out readers, which matters when a dashboard hits the view continuously. The trade-off is a full recompute rather than an incremental update, so reserve materialized views for roll-ups, not row-level projections. Projector-maintained vs materialized view comes down to update frequency and aggregation cost:

Read-model shape	Maintain via	Why
Row-level (one row per aggregate)	Projector `UPSERT` per event	Incremental, low-latency, exact
Small roll-ups updated per event	Projector increments a counter	Cheap; stays real-time
Heavy aggregations, many source rows	Native materialized view, refreshed on a schedule	Full recompute is simpler and correct; slight staleness OK
Search / relevance	Projector writes to search index	The store owns the inverted index

Building the projection worker

A projection worker (projector, or consumer) is a long-running process that reads events in order and applies them to a read store. Three properties are non-negotiable: it processes events in order per stream, it is idempotent, and it records its position (checkpoint) so it can resume after a crash. Miss any one and you get corruption after the first restart.

Here is the shape of a worker consuming from EventStoreDB and writing to Postgres. Study the two load-bearing details before the code: the read-model write uses a natural upsert (so redelivery is harmless), and the checkpoint is committed in the same transaction as the read-model write (so the position advances if and only if the projection did).

// Assumes @eventstore/db-client for the log and pg-promise for Postgres.
// `client`, `db`, and `loadCheckpoint` are placeholders set up elsewhere.
import { streamNameFilter } from "@eventstore/db-client";

const sub = client.subscribeToAll({
  fromPosition: await loadCheckpoint(),          // resume point
  filter: streamNameFilter({ prefixes: ["order-"] }),
});

for await (const resolved of sub) {
  const event = resolved.event;
  if (!event) continue;

  await db.tx(async (t) => {
    switch (event.type) {
      case "OrderPlaced":
        await t.none(
          `INSERT INTO order_summary (order_id, customer_id, total_cents, status, placed_at)
           VALUES ($1,$2,$3,'placed',$4)
           ON CONFLICT (order_id) DO NOTHING`,
          [event.data.orderId, event.data.customerId, event.data.totalCents, event.data.placedAt]
        );
        break;
      case "OrderShipped":
        await t.none(
          `UPDATE order_summary SET status='shipped', shipped_at=$2
           WHERE order_id=$1`,
          [event.data.orderId, event.data.shippedAt]
        );
        break;
    }
    // The checkpoint moves ONLY if the read-model write above committed.
    // The $all position is taken from event.position ({ commit, prepare });
    // verify the field names against the client version you run.
    await t.none(
      `INSERT INTO projection_checkpoint (name, commit_pos, prepare_pos)
       VALUES ('order_summary', $1, $2)
       ON CONFLICT (name) DO UPDATE
         SET commit_pos = EXCLUDED.commit_pos,
             prepare_pos = EXCLUDED.prepare_pos`,
      [event.position?.commit.toString(), event.position?.prepare.toString()]
    );
  });
}

The ON CONFLICT DO NOTHING and the checkpoint-in-the-same-transaction are the whole game. Drop the upsert and a redelivered OrderPlaced throws a duplicate-key error. Drop the shared transaction and a crash between the read-model write and the checkpoint leaves the position lagging the data — on restart you reapply, safe only because you were idempotent. Get both right and the worker is exactly-once in effect over an at-least-once channel. And run one worker per ordering domain — one process (or consumer-group partition) per projection — so two threads never race to apply events for order-1234 out of order.
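
The worker above calls a `loadCheckpoint()` helper without showing it. A minimal sketch of its core is below, written as a pure function over the stored row so it can be tested without a database. The `LogPosition` shape and the `"start"` marker are assumptions about what the EventStoreDB client accepts for `fromPosition`; confirm them against the client you use.

```typescript
// Shape of a row in the projection_checkpoint table used by the worker
// (positions are stored as text so large integers survive the round trip).
type CheckpointRow = { commit_pos: string; prepare_pos: string };

// Assumed to match the position object the client accepts for `fromPosition`.
type LogPosition = { commit: bigint; prepare: bigint };

// "start" is assumed to be the client's marker for the beginning of $all.
function toResumePosition(row: CheckpointRow | null): LogPosition | "start" {
  if (row === null) return "start"; // first run, or a deliberate rebuild
  return { commit: BigInt(row.commit_pos), prepare: BigInt(row.prepare_pos) };
}

const freshStart = toResumePosition(null);
const resumed = toResumePosition({ commit_pos: "1234", prepare_pos: "1230" });
```

A `loadCheckpoint()` built on this would read the `order_summary` row from `projection_checkpoint` and pass it through `toResumePosition`. Deleting that row is then all it takes to trigger a rebuild from position zero, and idempotent apply covers any event that is seen again around the resume point.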

The apply-handler contract, event type by event type, is worth tabulating because getting the SQL verb wrong is a common corruption source:

Event	Read-model operation	Idempotency mechanism	Gotcha if wrong
`OrderPlaced` (create)	`INSERT … ON CONFLICT DO NOTHING`	Natural key = `order_id`	Plain `INSERT` throws on redelivery
`OrderShipped` (state change)	`UPDATE … WHERE order_id=$1`	Setting a value is naturally idempotent	Fine to reapply; sets same value
`ItemAdded` (append to a list)	`INSERT … ON CONFLICT DO NOTHING` on `(order_id,item_id)`	Composite key	Non-idempotent if you `count = count + 1`
`DiscountApplied` (increment)	`UPDATE SET total = base - discount` (recompute) not `total = total - x`	Recompute from a base, don’t accumulate	Accumulating double-applies on redelivery
`OrderCancelled` (delete/tombstone)	`UPDATE SET status='cancelled'` (soft)	Idempotent state set	A hard `DELETE` removes the conflict key, so a redelivered or replayed `OrderPlaced` re-inserts the row and resurrects the order; keep a tombstone

The rule hiding in that table: prefer naturally idempotent operations (set a value, upsert by key) over ones that are not (increment, append-with-count). When you cannot avoid an accumulation, recompute the derived value from a stable base carried on the event rather than adding a delta to the current row. total = total - discount is a landmine; total = order_base - total_discounts computed from the event’s own data is safe under redelivery.
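
The difference is easy to see by applying the same event twice. In this sketch the field names `baseCents`, `totalDiscountCents`, and `deltaCents` are hypothetical; what matters is that one handler reads the current row and the other reads only the event.

```typescript
type DiscountApplied = {
  orderId: string;
  baseCents: number;          // order total before any discount
  totalDiscountCents: number; // all discounts on the order as of this event
  deltaCents: number;         // the discount this event added
};
type SummaryRow = { totalCents: number };

// Landmine: subtracts a delta from whatever the row holds right now.
function applyAccumulating(row: SummaryRow, e: DiscountApplied): SummaryRow {
  return { totalCents: row.totalCents - e.deltaCents };
}

// Safe: derives the total only from values carried on the event.
function applyRecomputing(_row: SummaryRow, e: DiscountApplied): SummaryRow {
  return { totalCents: e.baseCents - e.totalDiscountCents };
}

const discountEvent: DiscountApplied = {
  orderId: "order-1234",
  baseCents: 10000,
  totalDiscountCents: 1500,
  deltaCents: 1500,
};
const initialRow: SummaryRow = { totalCents: 10000 };

// Simulate redelivery by applying the same event twice.
const accumulated = applyAccumulating(applyAccumulating(initialRow, discountEvent), discountEvent);
const recomputed = applyRecomputing(applyRecomputing(initialRow, discountEvent), discountEvent);
// accumulated.totalCents is 7000 (discount applied twice); recomputed.totalCents is 8500.
```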

Synchronous vs asynchronous projections

A projection can update synchronously (in the same request/transaction that produced the event) or asynchronously (a separate worker consuming the log after the fact). This is one of the most consequential choices in the pipeline, because it directly determines whether you even have an eventual-consistency gap.

Dimension	Synchronous projection	Asynchronous projection
When it runs	In the command’s transaction / request	A separate worker, after commit
Consistency for that read model	Strong (no gap)	Eventual (lags the log)
Coupling	Write path now depends on the read store	Fully decoupled
Write latency	Higher (does N projection writes inline)	Minimal (just append the event)
Failure blast radius	A read-store failure fails the write	Read-store failure just delays the projection
Scales reads independently?	No — tied to the write path	Yes — the whole point
Replayable / rebuildable?	Awkward (no independent consumer)	Yes, natively
Ordering	Trivially correct (inline)	Needs per-stream ordering discipline
Right for	1–2 critical views that must be read-your-writes always	Everything else; multiple read models; independent scaling

The guidance: default to asynchronous. Async projections give you independent read scaling, rebuildability, and decoupling — the reasons you adopted CQRS. Reserve synchronous projection for the rare read model that cannot tolerate any staleness and is cheap to maintain inline (a single-table view keyed like the aggregate). Even then you have coupled write availability to that read store — if the projection write fails, the command fails. A common hybrid: synchronous for one critical view (free read-your-writes there), asynchronous for the rest.

The subtle failure with synchronous projection is the dual-write problem in disguise: if it writes to a different store in a different transaction, you have two writes with no atomicity, and a crash between them leaves the stores inconsistent, which is exactly what CQRS-with-a-log was meant to solve. Synchronous projection is only truly safe when the read model lives in the same transactional store as the write model. Cross-store “synchronous” projection would need a distributed transaction to be atomic; prefer async + outbox there.

Delivery guarantees and idempotency

Every discussion of “exactly-once” in distributed systems is really a discussion of where the idempotency lives. Let us make the guarantees precise, because the words are routinely abused.

Guarantee	What it means	Achievable over a network?	Consequence if you rely on it
At-most-once	Delivered zero or one times; may be lost	Yes (fire-and-forget)	Silent data loss on any failure — almost never acceptable for projections
At-least-once	Delivered one or more times; never lost	Yes (ack after processing)	Duplicates — must be made idempotent
Exactly-once delivery	Delivered precisely once	No (the ack can be lost)	A myth; don’t design assuming it
Exactly-once effects	The result is as if delivered once	Yes (at-least-once + idempotency)	The real target — build for this
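
A toy simulation shows why a lost acknowledgement forces duplicates and why an idempotent handler makes them harmless. Everything here is illustrative: `deliverAtLeastOnce` is a hypothetical stand-in for a broker, and the `ackLost` callback decides which deliveries lose their ack.

```typescript
type OrderEvent = { id: string; orderId: string; status: string };

class Projector {
  rows = new Map<string, string>(); // orderId -> status (set by key: naturally idempotent)
  deliveries = 0;                   // how many times apply() was invoked

  apply(e: OrderEvent): void {
    this.rows.set(e.orderId, e.status);
    this.deliveries++;
  }
}

// Delivers each event in order, and delivers it a second time whenever the
// ack for the first attempt is reported lost.
function deliverAtLeastOnce(
  events: OrderEvent[],
  projector: Projector,
  ackLost: (e: OrderEvent) => boolean
): void {
  for (const e of events) {
    projector.apply(e);
    if (ackLost(e)) projector.apply(e); // the broker never saw the ack, so it sends again
  }
}

const simulatedProjector = new Projector();
deliverAtLeastOnce(
  [
    { id: "e1", orderId: "order-1234", status: "placed" },
    { id: "e2", orderId: "order-1234", status: "shipped" },
  ],
  simulatedProjector,
  (e) => e.id === "e1" // pretend the ack for the first event was lost
);
// Three deliveries for two events, yet the read model holds one row with status "shipped".
```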

So the entire pipeline is built on at-least-once delivery plus idempotent application. Five techniques get you to exactly-once effects, in rough order of preference:

Idempotency technique	How it works	Best when	Limitation
Natural upsert	`INSERT … ON CONFLICT` / `UPSERT` keyed by aggregate ID	The read row has a stable natural key	Only for set/replace ops, not accumulations
Transactional checkpoint	Store position with the write, atomically	The read store is transactional	Doesn’t help across non-transactional stores
Dedupe guard table	`processed_events(event_id)` checked before applying	No natural upsert available; accumulations	Extra write; the guard must be in the same TX
Deterministic derived values	Recompute `total` from event data, not a running delta	Aggregations/increments	Requires the base value on the event
Idempotency key on the sink	Downstream (e.g. a payment API) dedupes on a key you send	Side effects to external systems	Depends on the external system supporting it

For a store that cannot do a natural upsert (or when the event does accumulate), the guard table pattern makes any apply idempotent — but the guard insert must share the transaction, exactly like the checkpoint:

BEGIN;
-- Dedupe: skip if we've already applied this event
INSERT INTO processed_events (event_id) VALUES ($1)
  ON CONFLICT (event_id) DO NOTHING;
-- Only proceed if the row was actually inserted (this event is new)
-- (in app code: check rowCount; if 0, ROLLBACK/return early)

UPDATE customer_stats
   SET lifetime_orders = lifetime_orders + 1
 WHERE customer_id = $2;

INSERT INTO projection_checkpoint (name, position) VALUES ('customer_stats', $3)
  ON CONFLICT (name) DO UPDATE SET position = EXCLUDED.position;
COMMIT;
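The "check rowCount in app code" step is where this pattern usually goes wrong, so here is a minimal sketch of it. It uses Python's built-in `sqlite3` as a stand-in for the transactional read store (an assumption made so the example is self-contained; SQLite's `INSERT OR IGNORE` / `INSERT OR REPLACE` play the role of the `ON CONFLICT` clauses above). The function name `apply_order_placed` and the sample IDs are hypothetical.

```python
import sqlite3


def make_store() -> sqlite3.Connection:
    """In-memory stand-in for the read store, with the three tables used above."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE processed_events (event_id TEXT PRIMARY KEY);
        CREATE TABLE customer_stats (customer_id TEXT PRIMARY KEY,
                                     lifetime_orders INTEGER NOT NULL);
        CREATE TABLE projection_checkpoint (name TEXT PRIMARY KEY,
                                            position INTEGER NOT NULL);
        INSERT INTO customer_stats VALUES ('cust-A', 0);
        """
    )
    return conn


def apply_order_placed(conn: sqlite3.Connection, event_id: str,
                       customer_id: str, position: int) -> bool:
    """Return True if the event was applied, False if it was a duplicate."""
    with conn:  # one transaction: commits on success, rolls back on an exception
        cur = conn.execute(
            "INSERT OR IGNORE INTO processed_events (event_id) VALUES (?)",
            (event_id,),
        )
        if cur.rowcount == 0:   # guard row already existed: duplicate delivery
            return False
        conn.execute(
            "UPDATE customer_stats SET lifetime_orders = lifetime_orders + 1 "
            "WHERE customer_id = ?",
            (customer_id,),
        )
        conn.execute(
            "INSERT OR REPLACE INTO projection_checkpoint (name, position) "
            "VALUES ('customer_stats', ?)",
            (position,),
        )
        return True
```

Calling `apply_order_placed(conn, "evt-1", "cust-A", 1)` twice increments the counter once: the second call sees the guard row and returns before touching `customer_stats`. Because all three statements share one transaction, a crash leaves either all of them or none of them.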

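For Kafka-based pipelines, the closest equivalent of the transactional checkpoint is storing the consumed offset in the read database, inside the same transaction as the read-model write, and seeking to it on startup. If you keep offsets in Kafka instead, the minimum bar is disabling auto-commit and committing offsets only after the database transaction succeeds. That gives you write-then-checkpoint ordering, so it still relies on an idempotent apply: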

props.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed");
props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");

// Process the batch, write to the read DB, THEN commit offsets
// only after the DB transaction has committed.
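try {
    dbTransaction.begin();
    applyRecordsToReadModel(records);
    dbTransaction.commit();
} catch (Exception e) {
    dbTransaction.rollback();               // offsets NOT committed → batch is reprocessed
    throw e;                                // stop here: polling on would let a later commit skip this batch
}
consumer.commitSync(currentOffsets);        // ack only after the write is durable;
                                            // outside the try, so a failed ack never rolls back committed data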

enable.auto.commit=false plus a manual commitSync after the write is the floor. Auto-commit on a timer acknowledges events you have not yet projected, and a crash in that window silently drops them — turning at-least-once into at-most-once, the one thing you cannot tolerate for projections. Kafka’s read_committed isolation additionally ensures the consumer only sees events from committed producer transactions, which matters if the producer writes the event transactionally with its own state.

A comparison of how the major logs give you ordered, resumable, at-least-once consumption — the substrate the projector sits on:

Log / broker	Ordering unit	Resume mechanism	At-least-once via	Native replay
Kafka	Partition (key → partition)	Committed consumer offset	Manual offset commit after write	Seek to offset / earliest
EventStoreDB	Stream (and global `$all`)	Stored checkpoint position	Catch-up subscription	`fromPosition(0)`
Azure Event Hubs	Partition	Checkpoint (blob store)	Checkpoint after processing	Read from a stored offset/enqueued time
AWS Kinesis	Shard	Sequence number (KCL)	Checkpoint after processing	`TRIM_HORIZON` / `AT_SEQUENCE_NUMBER`
Google Pub/Sub	Ordering key (opt-in)	Ack after processing	Ack post-write; redelivery on nack	Replay via seek to a timestamp/snapshot
Azure Service Bus	Session (for ordering)	Complete/abandon per message	Complete post-write	Limited (not a log; use with outbox)

Catch-up subscriptions and checkpointing

A projector must bootstrap a new read model from the entire history, resume where it left off after a crash or deploy, and catch up if it falls behind live. A catch-up subscription does all three: on startup the worker reads historically from its stored checkpoint to the live head, then transitions seamlessly to a live subscription with no gap or duplicate at the seam. A brand-new projection just starts from position zero — the same code path with an empty checkpoint.
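The seam between historical and live reading is the part worth seeing in code. The following is a toy sketch over an in-memory list, not any real client library: `CatchUpSubscription`, its methods, and the 1-based positions are all assumptions made for illustration. It drops live pushes that catch-up already applied and fills any gap from the log.

```python
from typing import Callable, List


class CatchUpSubscription:
    """Toy catch-up subscription over an in-memory, append-only log.

    Positions are 1-based; `checkpoint` is the last position already applied.
    """

    def __init__(self, log: List[str], checkpoint: int,
                 handler: Callable[[int, str], None]) -> None:
        self.log = log
        self.position = checkpoint
        self.handler = handler
        self.live = False

    def catch_up(self) -> None:
        # Historical phase: read from the checkpoint until the head is reached.
        # The head may move while we read, so len(log) is re-checked every pass.
        while self.position < len(self.log):
            self.position += 1
            self.handler(self.position, self.log[self.position - 1])
        self.live = True

    def on_live_event(self, position: int, event: str) -> None:
        # Live phase: a push can overlap with what catch-up already read.
        if position <= self.position:
            return                   # duplicate at the seam: already applied
        if position > self.position + 1:
            self.catch_up()          # gap at the seam: fill it from the log
            return
        self.position = position
        self.handler(position, event)
```

A brand-new projection is the same object constructed with `checkpoint=0`; a restarted one passes its stored checkpoint. Neither case needs a separate code path.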

Where you store the checkpoint has real consequences. The cardinal rule: checkpoint into the read store itself, atomically with the projection write. Checkpointing to a separate system on a separate timeline reintroduces the dual-write problem — position and data can disagree after a crash.

Checkpoint location	Atomic with the write?	Crash behaviour	Verdict
Same transactional store, same TX	Yes	Position and data always agree	Best — the default
Same store, separate TX	No	Can advance past unwritten data (skip) or lag (reapply)	Only safe if written projection-first and fully idempotent
Separate database	No	Two timelines drift on failure	Reintroduces dual-write — avoid
Broker-managed offset (auto-commit)	No	Acks events not yet projected → data loss	Never for projections
In-memory only	No	Full replay on every restart	Fine for a cache you happily rebuild; costly at scale

The failure semantics depend on the order of the two writes. “Which do I write first?” is both a real interview question and a real source of bugs:

Order of operations	If it crashes between them	Net effect (with idempotent apply)
Write projection, then checkpoint (separate)	Projection done, checkpoint not advanced	Event reapplied on restart — safe because idempotent
Checkpoint, then write projection (separate)	Checkpoint advanced, projection not done	Event skipped — data loss — never do this
Both in one transaction	Whole transaction rolls back	Event reapplied cleanly — best

The rule: if you must separate them, write the projection first and the checkpoint second, and be idempotent. Skipping is unrecoverable without a rebuild; reapplying is free when idempotent. The single-transaction form sidesteps the question entirely — reach for it whenever the read store is transactional.
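The difference between the two orderings is easy to reproduce. The sketch below is a deliberately small simulation with hypothetical names (`run`, `crash_and_restart`, a three-event `LOG`): it kills the projector between the two writes while handling the second event, restarts it from the stored checkpoint, and reports which rows survive.

```python
from typing import Dict, Set

LOG = ["order-1", "order-2", "order-3"]   # one OrderPlaced event per entry


class Crash(Exception):
    """Stands in for the process dying between the two writes."""


def run(ordering: str, state: Dict, crash_at: int = 0) -> None:
    """Apply every event after state['checkpoint']; die mid-event at `crash_at`."""
    for pos in range(state["checkpoint"] + 1, len(LOG) + 1):
        order_id = LOG[pos - 1]
        if ordering == "write_then_checkpoint":
            state["rows"].add(order_id)        # idempotent write (set semantics)
            if pos == crash_at:
                raise Crash()
            state["checkpoint"] = pos
        else:                                  # "checkpoint_then_write"
            state["checkpoint"] = pos
            if pos == crash_at:
                raise Crash()
            state["rows"].add(order_id)


def crash_and_restart(ordering: str) -> Set[str]:
    state = {"checkpoint": 0, "rows": set()}
    try:
        run(ordering, state, crash_at=2)       # first run dies while handling event 2
    except Crash:
        pass
    run(ordering, state)                       # restart resumes from the stored checkpoint
    return state["rows"]


print(sorted(crash_and_restart("write_then_checkpoint")))
print(sorted(crash_and_restart("checkpoint_then_write")))
```

With write-then-checkpoint, event 2 is reapplied on restart and all three orders are present. With checkpoint-then-write, the restart begins at event 3 and `order-2` is gone for good.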

Replay and rebuild: read models as disposable caches

Because read models are derived, a schema change is not a destructive migration — it is a rebuild. This is the operational superpower of CQRS, and the blue-green pattern makes it safe with zero read downtime:

Create order_summary_v2 alongside the live order_summary (blue).
Start a new projector that replays the entire log from position zero into v2 (green), using a separate checkpoint row.
When green’s checkpoint catches up to and stays at the live head, atomically swap reads (with a table rename, a view, or a feature flag).
Keep blue for a rollback window, then drop it.

The swap itself is one transaction, and its inverse is the rollback:

BEGIN;
ALTER TABLE order_summary    RENAME TO order_summary_blue;
ALTER TABLE order_summary_v2 RENAME TO order_summary;
COMMIT;
-- Reads cut over in one transaction; rollback is the inverse rename.

This works only because events carry no read-model concerns: a new projector can interpret the same historical facts into a totally different shape. The one true constraint is that every historical event version stays readable. The v2 projector must be able to read events written long before it existed (backward compatibility, in schema-evolution terms), which is what upcasters are for. You cannot rewrite history, so you must be able to read the past.
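An upcaster is just a pure function from an old event version to the next one, applied at read time. The sketch below assumes a hypothetical history in which `OrderPlaced` v1 stored a floating-point `total` and v2 stores integer `totalCents` plus a `currency`; the field names, the `USD` default, and the registry layout are illustrative assumptions, not something the log format dictates.

```python
from typing import Any, Callable, Dict, Tuple

Event = Dict[str, Any]


def upcast_order_placed_v1_to_v2(event: Event) -> Event:
    """Build a v2-shaped event from a v1 event without mutating the stored one."""
    data = event["data"]
    return {
        "type": "OrderPlaced",
        "version": 2,
        "data": {
            "orderId": data["orderId"],
            "totalCents": round(data["total"] * 100),
            "currency": "USD",   # assumed default for events that predate the field
        },
    }


# Registry keyed by (event type, version the upcaster reads).
UPCASTERS: Dict[Tuple[str, int], Callable[[Event], Event]] = {
    ("OrderPlaced", 1): upcast_order_placed_v1_to_v2,
}
LATEST_VERSION = {"OrderPlaced": 2}


def upcast(event: Event) -> Event:
    """Chain upcasters until the event is at the latest version for its type."""
    while event["version"] < LATEST_VERSION[event["type"]]:
        event = UPCASTERS[(event["type"], event["version"])](event)
    return event
```

The projector only ever handles the latest shape: it calls `upcast(event)` on everything it reads, and the log itself is never edited.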

When to reach for a rebuild vs a surgical fix vs a fresh projection:

Situation	Action	Why
New query shape needed	New projection, replay from zero	Additive; blue stays untouched
Schema change to an existing read model	Blue-green rebuild + swap	Zero-downtime; instant rollback
Projector bug corrupted the read model	Reproject (rebuild) from zero	The log is intact; the cache was wrong
A single row is wrong for a known one-off reason	Still prefer reprojecting the affected key	A manual patch hides the root cause; the next replay reintroduces it
Event schema itself was wrong	Cannot rewrite history — add an upcaster + new event version, then reproject	History is immutable

Rebuild cost is the one real tax, and it scales with log size. Techniques to keep replay tractable:

Rebuild-cost lever	What it does	Trade-off
Snapshots (event-sourced write side)	Replay from a periodic state snapshot, not zero	Snapshot maintenance; only helps the write replay, not read projections
Parallel replay by partition/key	Rebuild N streams concurrently	Must preserve per-stream order within each worker
Batched writes during catch-up	Bulk-insert rows historically, index after	Live-mode must switch back to per-event upserts
Filtered subscription	Only replay the streams this projection needs	Requires stream/topic filtering support
Warm the green table off-peak	Run the heavy replay when traffic is low	Longer wall-clock window to catch up

A concrete sense of scale: replaying 50 million events through a projector that sustains 5,000 events/second takes about 2.8 hours of wall clock. That is fine for a weekend blue-green migration and painful for an emergency mid-incident rebuild — which is the argument for catching drift early (next section) rather than discovering it when a customer does.
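The arithmetic is worth keeping as a helper so a rebuild ETA is a lookup rather than a guess. This is a back-of-the-envelope sketch: `rebuild_hours` is a hypothetical name, and the `parallel_workers` parameter assumes throughput scales linearly across partitions, which real systems only approximate.

```python
def rebuild_hours(total_events: int, events_per_second: float,
                  parallel_workers: int = 1) -> float:
    """Estimated wall-clock hours to replay `total_events` from position zero.

    Assumes each worker sustains `events_per_second` and that workers scale
    linearly (an idealisation; shared read-store limits usually bite first).
    """
    if events_per_second <= 0 or parallel_workers < 1:
        raise ValueError("throughput and worker count must be positive")
    seconds = total_events / (events_per_second * parallel_workers)
    return seconds / 3600


print(round(rebuild_hours(50_000_000, 5_000), 2))
print(round(rebuild_hours(50_000_000, 5_000, parallel_workers=4), 2))
```

The first call reproduces the figure above (10,000 seconds, roughly 2.8 hours); the second shows what four partition-parallel workers would buy under the linear-scaling assumption.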

Multiple read models and ordering

You run multiple read models because each serves a query shape no other can. It is safe to run many because they are independent consumers of the same log — one can be down, slow, or mid-rebuild without affecting the others. One event stream feeding several projections:

Event	`order_summary` (Postgres, customer view)	`parcel_doc` (Cosmos, public tracking)	`order_search` (Elasticsearch, ops search)	`order_daily_totals` (materialized view, dashboard)
`OrderPlaced`	Insert row, status=placed	Upsert doc, status=placed	Index doc	(counted on refresh)
`PaymentCaptured`	Update paid=true	Update doc	Update field	—
`OrderShipped`	Update status, shipped_at	Update status + tracking #	Reindex	—
`OrderDelivered`	Update status=delivered	Update status	Update field	—
`OrderCancelled`	status=cancelled	status=cancelled	Update field	recount on refresh

Each projection has its own checkpoint, its own lag, and its own rebuild schedule. That independence is precisely why you never share a read model across query shapes: sharing couples their availability and consistency windows.
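Independence falls out of giving every projection its own checkpoint over the same log. The sketch below is an in-memory illustration with hypothetical names (`Projection`, `apply_summary`, `apply_shipped_index`); the `max_events` parameter exists only to simulate a projector that is slow or stalled.

```python
from typing import Callable, List, Optional, Tuple

Event = Tuple[str, str]  # (event_type, order_id) - illustrative shape


class Projection:
    def __init__(self, name: str, apply: Callable[[dict, Event], None]) -> None:
        self.name = name
        self.apply = apply
        self.state: dict = {}
        self.checkpoint = 0          # each projection remembers its own place

    def catch_up(self, log: List[Event], max_events: Optional[int] = None) -> None:
        """Apply events after the checkpoint; `max_events` simulates a slow worker."""
        end = len(log) if max_events is None else min(len(log), self.checkpoint + max_events)
        for pos in range(self.checkpoint, end):
            self.apply(self.state, log[pos])
            self.checkpoint = pos + 1

    def lag(self, log: List[Event]) -> int:
        return len(log) - self.checkpoint


def apply_summary(state: dict, event: Event) -> None:
    event_type, order_id = event
    state[order_id] = "shipped" if event_type == "OrderShipped" else "placed"


def apply_shipped_index(state: dict, event: Event) -> None:
    event_type, order_id = event
    if event_type == "OrderShipped":
        state[order_id] = True


log = [("OrderPlaced", "order-1"), ("OrderShipped", "order-1"), ("OrderPlaced", "order-2")]
summary = Projection("order_summary", apply_summary)
shipped = Projection("shipped_index", apply_shipped_index)
summary.catch_up(log)                 # fully caught up
shipped.catch_up(log, max_events=1)   # falls behind without affecting `summary`
```

After this runs, `summary` has lag 0 and a complete view while `shipped` has lag 2; a later `shipped.catch_up(log)` closes the gap on its own schedule.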

Ordering is where correctness quietly breaks. The guarantees you have depend on your log and consumer topology:

Ordering scope	Guarantee available	How to preserve it	What breaks if you don’t
Within one stream/aggregate	Total order (must preserve)	One consumer/partition per key; key by aggregate ID	`OrderShipped` applied before `OrderPlaced`
Across aggregates	Usually no meaningful order	Parallelise freely across keys	Nothing — they’re independent
Across event types in one stream	Emission order within the stream	Same-stream ordering already gives it	Causal violations (paid before placed)
Global total order	Only some logs (`$all`, single-partition) provide it	Single partition (kills parallelism) or a global sequence	Rarely needed; expensive to demand

The practical rules for keeping ordering correct at scale:

Rule	Rationale
Key events by aggregate ID on the log	Guarantees all of one aggregate’s events land in one ordered partition
One consumer per partition in a group	No two threads race on the same key
Never assume cross-partition order	Two aggregates’ events can arrive in any relative order
Carry a per-stream version/sequence on events	Lets a projector detect and reject out-of-order or gap conditions
Handle the “update before insert” race defensively	An `UPDATE … WHERE order_id=$1` that matches zero rows may mean the `OrderPlaced` hasn’t been applied — buffer or upsert-with-defaults

That last row is a subtle, common bug: if two events for one order reach two workers (misconfigured partitioning), the OrderShipped update can run before the OrderPlaced insert and silently no-op (zero rows updated), leaving a shipped order that never appears. The structural fix is one-consumer-per-key; the defensive fix is an upsert that creates a stub row on the update path so no event is silently lost.
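The defensive fix can be sketched with a dictionary standing in for the read table. The handler names and row shape below are hypothetical; the point is only that both handlers create the row if it is missing and neither overwrites what the other knows, so the two arrival orders converge on the same row.

```python
from typing import Dict, Optional

Row = Dict[str, Optional[str]]


def _get_or_create(table: Dict[str, Row], order_id: str) -> Row:
    # Upsert-with-stub: a missing row becomes a placeholder instead of a no-op.
    return table.setdefault(order_id, {"customer_id": None, "status": None})


def apply_order_placed(table: Dict[str, Row], order_id: str, customer_id: str) -> None:
    row = _get_or_create(table, order_id)
    row["customer_id"] = customer_id
    if row["status"] is None:        # keep a later status that arrived first
        row["status"] = "placed"


def apply_order_shipped(table: Dict[str, Row], order_id: str) -> None:
    row = _get_or_create(table, order_id)
    row["status"] = "shipped"


in_order: Dict[str, Row] = {}
apply_order_placed(in_order, "order-1", "cust-A")
apply_order_shipped(in_order, "order-1")

raced: Dict[str, Row] = {}
apply_order_shipped(raced, "order-1")            # arrives before OrderPlaced
apply_order_placed(raced, "order-1", "cust-A")

print(in_order == raced)
```

This treats the symptom, not the cause: a stub row is visible to readers until the missing event lands, so the structural fix (one consumer per key) is still the one to reach for first.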

Closing the consistency gap: read-your-writes and staleness

This is the part users actually feel, and the part most CQRS write-ups hand-wave. A command returns 202 Accepted, the UI navigates to a list, and the new item is not there yet because the async projection has not caught up. You cannot eliminate the gap — but you can hide it from the one user who just made the change, the only user who notices. Everyone else sees a staleness of milliseconds no human perceives.

The mechanisms, cheapest to strongest, with their trade-offs:

Mechanism	How it works	Backend cost	Covers	Weakness
Optimistic UI	Client renders the data it just submitted, reconciles later	Zero	The writer’s own screen, common case	Diverges if the write is later rejected
Version token (read-your-writes)	Command returns the event’s position; client passes it back; read API gates until the projection’s checkpoint ≥ token	One checkpoint read per query	The writer, precisely	Adds a small wait or a `pending` flag
Poll-until-ready	Client polls the read API with the token; API returns `pending` until caught up	Cheap polling	The writer	UX must handle the brief spinner
Sticky strong-read fallback	Route the one critical query straight to the write model (load aggregate by ID)	A write-model read	“Show me the thing I just created”	Defeats the split; use sparingly
Subscribe / push (WebSocket/SSE)	Server pushes the update when the projection catches up	A subscription channel	Live screens	Infra weight of a push channel

The version-token flow is the workhorse. The command returns the log position of the event it produced; the client carries that token on its next query; the read API compares it to the projection’s checkpoint (the same checkpoint the worker advances):

POST /orders            -> 202, body: { "orderId": "...", "version": 481523 }
GET  /orders?after=481523
  -> 200 with the list         if projection checkpoint >= 481523
  -> 200 { "consistencyPending": true, "retryAfterMs": 300 } otherwise

-- The read API gates on the SAME checkpoint row the worker writes:
SELECT position >= $1 AS is_current
FROM projection_checkpoint
WHERE name = 'order_summary';
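Both halves of the flow fit in a short sketch. The function names (`read_orders`, `poll_until_current`) and the response shape are assumptions modelled on the HTTP exchange above, not a prescribed API; `sleep` is injectable so the client loop can be exercised without real waiting.

```python
import time
from typing import Any, Callable, Dict, List, Optional


def read_orders(checkpoint: int, rows: List[Dict[str, Any]],
                after: Optional[int] = None,
                retry_after_ms: int = 300) -> Dict[str, Any]:
    """Server side: serve the list only if the projection has reached the token."""
    if after is not None and checkpoint < after:
        return {"consistencyPending": True, "retryAfterMs": retry_after_ms}
    return {"orders": rows}


def poll_until_current(fetch: Callable[[int], Dict[str, Any]], token: int,
                       max_attempts: int = 5,
                       sleep: Callable[[float], None] = time.sleep) -> Dict[str, Any]:
    """Client side: retry with the token until the read is current or attempts run out."""
    response = fetch(token)
    for _ in range(max_attempts - 1):
        if not response.get("consistencyPending"):
            break
        sleep(response["retryAfterMs"] / 1000)
        response = fetch(token)
    return response
```

Readers who send no token skip the gate entirely, which is the point: only the writer pays. If the attempts run out, the client still holds a `consistencyPending` response and can show a pending state rather than stale data.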

The sticky strong-read fallback is the escape hatch for the rare screen that cannot tolerate any staleness: route that one query straight to the write model — load the aggregate by ID. Use it sparingly, since it defeats the split; reserve it for “show me the exact thing I just created,” not list or search views (which cannot be served from a single aggregate anyway).

Which staleness tolerance maps to which mechanism, so you spend effort only where it is perceived:

Screen / read	Staleness tolerance	Recommended mechanism
The item the user just created/edited	Zero (they’ll notice instantly)	Optimistic UI + version token
A list the user just added to	Seconds is fine for others; zero for the adder	Version token for the adder; plain read for others
Public dashboard / analytics	Seconds to minutes	Plain async read; no gating
Full-text search results	Seconds	Plain async read
“Confirm my payment went through”	Zero — legal/financial weight	Sticky strong read or synchronous projection
Someone else’s data you’re viewing	Seconds	Plain async read

The governing principle: pay for consistency only where a human will perceive its absence, and only for that human. Gating every query on a checkpoint adds latency for millions of readers to solve a problem only the writer has; gating the writer’s own next read solves it for exactly the person who cares.

Designing the UI around eventual consistency

The worst outcome is leaking the architecture onto users as spinners and “your change may take a moment” on every screen — an implementation detail pushed into the product. The UI patterns that keep the common path feeling synchronous:

UI pattern	Effect	When to use
Optimistic insert with rollback	The new row appears instantly; removed if the write is rejected	Creates/edits by the current user
Confirmed-state badge	Row shows “pending confirmation” until the projection catches up, then clears	Financial/critical actions
Version-gated refresh	The list refresh waits (briefly) for the checkpoint before showing	The writer’s own list view
Silent reconciliation	Client trusts its optimistic state, quietly re-syncs on the next server response	Low-stakes, high-frequency edits
Explicit “processing” state	A genuinely long async op shows an honest progress state	Long-running commands (batch import)

Monitoring lag and reconciling drift

Two metrics decide whether a CQRS system is healthy, and they fail in opposite ways.

Projection lag is the distance between the live log head and each projection’s checkpoint. Track it as both count and wall-clock time, and alert on time — that is what your read-your-writes SLA is denominated in. Lag creeping up means a projector is falling behind: a slow read store, a poison event, a scaled-down worker.

ProjectionLag_CL
| where TimeGenerated > ago(1h)
| summarize max_lag_seconds = max(lag_seconds_d) by projection_name_s, bin(TimeGenerated, 1m)
| where max_lag_seconds > 10
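How the `lag_seconds` value gets computed before it is shipped to a metrics table is a design choice. One reasonable definition, sketched below with hypothetical names, measures time lag as the age of the oldest event the projection has not yet applied; the 10-second default mirrors the example threshold in the query above.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional


def projection_lag(head_position: int, checkpoint: int,
                   oldest_unapplied_at: Optional[datetime],
                   now: datetime) -> Dict[str, float]:
    """Lag as an event count and as wall-clock seconds.

    `oldest_unapplied_at` is the write timestamp of the first event after the
    checkpoint (None when the projection is caught up).
    """
    lag_events = max(head_position - checkpoint, 0)
    if lag_events == 0 or oldest_unapplied_at is None:
        lag_seconds = 0.0
    else:
        lag_seconds = max((now - oldest_unapplied_at).total_seconds(), 0.0)
    return {"lag_events": lag_events, "lag_seconds": lag_seconds}


def breaches_sla(lag: Dict[str, float], sla_seconds: float = 10.0) -> bool:
    return lag["lag_seconds"] > sla_seconds


now = datetime(2026, 6, 8, 12, 0, 0, tzinfo=timezone.utc)
behind = projection_lag(481_600, 481_523, now - timedelta(seconds=42), now)
current = projection_lag(481_600, 481_600, None, now)
print(behind, breaches_sla(behind))
print(current, breaches_sla(current))
```

Alerting on `lag_seconds` rather than `lag_events` keeps the alarm meaningful across traffic levels: 77 events behind is trivial at peak and alarming at 3 a.m.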

Drift is the silent killer: a projection caught up (lag zero) but wrong, because an apply-handler bug corrupted data before it was fixed — lag reads zero while the customer sees garbage. The only defense is a periodic reconciliation job that recomputes a checksum or count from the read model and compares it to an independent recompute from the authoritative source (the log or the write model):

-- Reconcile: per-customer order counts in the read model
-- must equal the count derived from the authoritative source.
SELECT COALESCE(rm.customer_id, src.customer_id) AS customer_id, rm.cnt AS read_model_cnt, src.cnt AS source_cnt
FROM   (SELECT customer_id, count(*) cnt FROM order_summary GROUP BY 1) rm
FULL JOIN (SELECT customer_id, count(*) cnt FROM order_events_placed GROUP BY 1) src
       ON rm.customer_id = src.customer_id
WHERE COALESCE(rm.cnt,0) <> COALESCE(src.cnt,0);

Any non-empty result is drift. The remediation is almost always a targeted reprojection rather than a manual patch, because a patch hides the root cause and the next replay reintroduces the bug. The two metrics and how to act on each:

Metric	Detects	Alert on	Remediation
Projection lag (time)	A projector falling behind live	> your read-your-writes SLA (e.g. 10 s)	Scale the worker; fix the slow store; skip/quarantine a poison event
Projection lag (count)	Backlog size, replay progress	Sustained growth	Same; also informs rebuild ETA
Drift (reconciliation)	Caught-up-but-wrong data	Any non-zero mismatch	Reproject the affected read model from zero
Poison-event rate	Events the projector can’t process	> 0 into a DLQ	Fix the handler; replay from the DLQ
Consumer liveness	A dead/stuck worker	No checkpoint movement for N minutes	Restart; check partition assignment
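The poison-event row deserves a sketch, because "skip/quarantine" has to be a deliberate code path rather than a swallowed exception. The function below is an illustration under stated assumptions: `run_projector` and its in-memory dead-letter list are hypothetical, the handler is assumed to be transactional (a failed attempt leaves no partial write), and retries are immediate where a real worker would back off.

```python
from typing import Callable, List, Tuple

DeadLetter = Tuple[int, dict, str]   # (1-based position, event, error text)


def run_projector(log: List[dict], checkpoint: int,
                  handle: Callable[[dict], None],
                  max_attempts: int = 3) -> Tuple[int, List[DeadLetter]]:
    """Apply events after `checkpoint`; quarantine any that keep failing."""
    dead_letters: List[DeadLetter] = []
    for pos in range(checkpoint, len(log)):
        event = log[pos]
        for attempt in range(1, max_attempts + 1):
            try:
                handle(event)
                break                                  # applied successfully
            except Exception as exc:
                if attempt == max_attempts:            # give up: quarantine, do not stall
                    dead_letters.append((pos + 1, event, repr(exc)))
        checkpoint = pos + 1                           # advance past applied or quarantined
    return checkpoint, dead_letters
```

Skipping keeps the projection moving, but later events for the same aggregate may then build on a missing fact. That is why the dead-letter entry records the position: once the handler is fixed, the affected key can be reprojected from there.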

Architecture at a glance

Follow the diagram left to right. A command enters the write API and is validated by an aggregate, which enforces invariants and emits events. Those events reach the append-only event log reliably — via a transactional outbox if the write side is state-stored, or by appending directly to the stream if it is event-sourced. Independent projection workers consume the log in order (keyed by aggregate ID to preserve per-stream order), apply events idempotently, and commit each read model’s checkpoint in the same transaction as its write. The result is a fan-out of purpose-built read models: Postgres order_summary, a Cosmos document per parcel, and an Azure AI Search index. The read API serves queries from these projections and, for the user who just wrote, gates on the projection’s checkpoint against a version token — returning fresh state or an honest consistencyPending, never stale data dressed as current. Two loops watch it all: a lag monitor alerts when a checkpoint falls behind the head beyond the read-your-writes SLA, and a scheduled reconciliation job recomputes counts from the log and compares them to each read model to catch silent drift, remedied by a targeted reprojection.

Real-world scenario

A logistics platform — call it Portivo — ran order tracking on a single Azure SQL database. Reads had overwhelmed writes: a public “where is my parcel” page, an ops console, and a partner API all hammered the same tables, and a heavy nightly report could stall ingestion at peak. The read:write ratio measured roughly 220:1 in business hours, three distinct query shapes shared one schema, and the reporting run demonstrably added 400–900 ms to write p99. All three CQRS signals were true — justified, not CQRS-as-default.

They moved to CQRS with the write side kept state-stored, emitting events through a transactional outbox into Azure Service Bus via a relay. Three asynchronous projections consumed the stream: a Cosmos DB document per parcel for the public page, a Postgres read model for the ops console, and Azure AI Search for partner search. Each projector committed its checkpoint in the same transaction as its write, upserted by parcel ID for idempotency, and consumed with one consumer per Service Bus session (keyed by parcel ID) to preserve per-parcel order.

The consistency gap surfaced in week one, exactly where the theory predicts. The public tracking page is the screen a customer opens immediately after a status change — the textbook read-your-writes case — and the Cosmos projection ran a few hundred milliseconds behind under load. Customers refreshed, saw the old status, and filed “tracking is broken” tickets. The fix was version tokens: the status-update API returned the event’s sequence number, the page passed it back on its poll, and the read API gated on the Cosmos checkpoint:

// Read API: only serve the projection once it has caught up
// to the version the client was told about.
var checkpoint = await store.GetCheckpointAsync("parcel_tracking");
if (checkpoint < request.AfterVersion)
    return Results.Ok(new { consistencyPending = true, retryAfterMs = 400 });

var parcel = await container.ReadItemAsync<ParcelView>(
    request.ParcelId, new PartitionKey(request.ParcelId));
return Results.Ok(parcel.Resource);

The tickets stopped. Reporting load no longer touched the write database, ingestion throughput became predictable (write p99 dropped under 120 ms), and — the payoff that sold the team on the pattern — a later schema change to the ops read model shipped as a blue-green reprojection over a weekend with zero downtime: a v2 Postgres projection replayed the log from zero into a green table, caught up, and swapped in under one transaction. Six weeks later an apply-handler bug that had briefly miscounted per-customer parcel totals was caught not by a customer but by the nightly reconciliation job, and remediated with a targeted reprojection rather than a manual patch. The lesson Portivo wrote into their guidance: adopt CQRS per context where the numbers justify it, and budget for the consistency gap on day one rather than in production tickets.

Advantages and disadvantages

Advantages	Disadvantages
Read and write models each get an optimal, uncompromised schema	Two models to design, build, and keep aligned
Reads scale independently on cheap denormalized stores	An eventual-consistency gap you must actively manage
Read spikes cannot starve the write path (separate failure domains)	More moving parts: log, projectors, checkpoints, monitoring
Multiple query shapes served by purpose-built stores	Each read model is another pipeline to operate and reproject
Read models are disposable — rebuild on schema change or bug	Rebuild time grows with log size; emergency reprojects can be slow
The event log doubles as an audit trail (especially with event sourcing)	Event versioning and upcasting become permanent obligations
New read models added without touching the write side	Debugging spans producer, log, and consumer — more surface
Read-your-writes machinery localises staleness to who notices	Naive implementations leak staleness onto every screen

The advantages dominate in high-fan-out, multi-shape, spike-sensitive contexts — a public read tier over a transactional core, genuinely divergent query needs, or an audit log required anyway. The disadvantages dominate in simple CRUD contexts, where you have volunteered for a consistency gap and a pipeline to solve a problem an index would have handled. The single best predictor of regret is applying CQRS by mandate rather than measured asymmetry.

Hands-on lab

This lab builds a minimal but real CQRS projection pipeline using Postgres as both the (simulated) event log and the read store, so you see idempotency, checkpointing, replay, and read-your-writes with your own eyes. It is free and needs only Docker and psql.

Step 1 — Start Postgres.

docker run -d --name cqrs-lab -e POSTGRES_PASSWORD=lab -p 5432:5432 postgres:16
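# Wait until the server accepts TCP connections instead of guessing with a fixed sleep
until docker exec cqrs-lab pg_isready -h 127.0.0.1 -U postgres >/dev/null 2>&1; do sleep 1; done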
export PGPASSWORD=lab
psql -h localhost -U postgres -c "SELECT version();"
# Expected: PostgreSQL 16.x ...

Step 2 — Create the event log and the read model with its checkpoint.

psql -h localhost -U postgres <<'SQL'
CREATE TABLE event_log (
  global_pos bigserial PRIMARY KEY,
  stream_id  text NOT NULL,
  event_type text NOT NULL,
  data       jsonb NOT NULL
);
CREATE TABLE order_summary (
  order_id    text PRIMARY KEY,
  customer_id text,
  total_cents int,
  status      text,
  placed_at   timestamptz
);
CREATE TABLE projection_checkpoint (
  name     text PRIMARY KEY,
  position bigint NOT NULL DEFAULT 0
);
INSERT INTO projection_checkpoint (name) VALUES ('order_summary');
SQL
echo "schema created"

Step 3 — Append some events (the write side’s output).

psql -h localhost -U postgres <<'SQL'
INSERT INTO event_log (stream_id, event_type, data) VALUES
 ('order-1','OrderPlaced', '{"orderId":"order-1","customerId":"cust-A","totalCents":2500,"placedAt":"2026-06-08T10:00:00Z"}'),
 ('order-1','OrderShipped','{"orderId":"order-1","shippedAt":"2026-06-08T12:00:00Z"}'),
 ('order-2','OrderPlaced', '{"orderId":"order-2","customerId":"cust-B","totalCents":900,"placedAt":"2026-06-08T11:00:00Z"}');
SQL
echo "3 events appended"

Step 4 — Run the projector once (idempotent apply + checkpoint in one transaction). This SQL function projects every event after the stored checkpoint and advances it atomically.

psql -h localhost -U postgres <<'SQL'
CREATE OR REPLACE FUNCTION project_order_summary() RETURNS void AS $$
DECLARE ev RECORD; cp bigint;
BEGIN
  SELECT position INTO cp FROM projection_checkpoint WHERE name='order_summary' FOR UPDATE;
  FOR ev IN SELECT * FROM event_log WHERE global_pos > cp ORDER BY global_pos LOOP
    IF ev.event_type = 'OrderPlaced' THEN
      INSERT INTO order_summary (order_id, customer_id, total_cents, status, placed_at)
      VALUES (ev.data->>'orderId', ev.data->>'customerId', (ev.data->>'totalCents')::int,
              'placed', (ev.data->>'placedAt')::timestamptz)
      ON CONFLICT (order_id) DO NOTHING;                 -- idempotent create
    ELSIF ev.event_type = 'OrderShipped' THEN
      UPDATE order_summary SET status='shipped' WHERE order_id = ev.data->>'orderId';
    END IF;
    UPDATE projection_checkpoint SET position = ev.global_pos WHERE name='order_summary';
  END LOOP;
END; $$ LANGUAGE plpgsql;

SELECT project_order_summary();
SELECT order_id, status, total_cents FROM order_summary ORDER BY order_id;
SELECT position FROM projection_checkpoint WHERE name='order_summary';
SQL

Expected output: order-1 | shipped | 2500, order-2 | placed | 900, and checkpoint position = 3.

Step 5 — Prove idempotency and resume. Run the projector again with no new events: nothing changes and the checkpoint holds. Then append a new event and run once more — only the new one is applied.

psql -h localhost -U postgres -c "SELECT project_order_summary();"   # no-op, checkpoint stays 3
psql -h localhost -U postgres -c \
 "INSERT INTO event_log (stream_id,event_type,data) VALUES ('order-2','OrderShipped','{\"orderId\":\"order-2\"}');"
psql -h localhost -U postgres -c "SELECT project_order_summary();"
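psql -h localhost -U postgres -c "SELECT order_id,status FROM order_summary ORDER BY order_id;"
psql -h localhost -U postgres -c "SELECT position FROM projection_checkpoint WHERE name='order_summary';"
# Expected: order-2 now 'shipped'; checkpoint advanced to 4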

Step 6 — Simulate a rebuild (read model is a disposable cache). Wipe the read model and reset the checkpoint, then reproject the whole log from zero — you get the identical result, proving the read model is derivable.

psql -h localhost -U postgres <<'SQL'
TRUNCATE order_summary;
UPDATE projection_checkpoint SET position = 0 WHERE name='order_summary';
SELECT project_order_summary();          -- replays the WHOLE log
SELECT order_id, status FROM order_summary ORDER BY order_id;
SQL
echo "rebuilt from position zero — identical result"

Step 7 — Read-your-writes gate. A read API would compare a client’s version token to the checkpoint. Simulate it: the token 4 is served (checkpoint ≥ 4); a token 9 returns “pending.”

psql -h localhost -U postgres -c \
 "SELECT position >= 4 AS serve_token_4, position >= 9 AS serve_token_9 FROM projection_checkpoint WHERE name='order_summary';"
# Expected: serve_token_4 = t (true), serve_token_9 = f (false → consistencyPending)

Step 8 — Reconciliation (drift check). Compare read-model order count to the count derived from the log; a zero-row result means no drift.

psql -h localhost -U postgres <<'SQL'
SELECT a.rm, b.src
FROM (SELECT count(*) AS rm  FROM order_summary) a,
     (SELECT count(*) AS src FROM event_log WHERE event_type='OrderPlaced') b
WHERE a.rm <> b.src;   -- empty result = consistent
SQL
echo "reconciliation ran (empty = no drift)"

Step 9 — Teardown.

docker rm -f cqrs-lab
echo "lab environment removed"

You have now built, in miniature, every load-bearing mechanic of the article: an ordered log, an idempotent projector, a transactional checkpoint, a full rebuild, a read-your-writes gate, and a reconciliation check. Scaling to production means swapping the event_log table for Kafka/EventStoreDB/Event Hubs, running the projector as a long-lived catch-up subscription, and adding the second and third read stores.

Common mistakes & troubleshooting

The projection pipeline has a characteristic set of failures; knowing them by name turns a two-hour mystery into a two-minute diagnosis. Each row below is symptom → root cause → how to confirm → fix:

#	Symptom	Root cause	How to confirm	Fix
1	Duplicate rows after a redeploy or crash	Non-idempotent apply (plain `INSERT`)	Query for `count(*) > 1` per aggregate key	Natural upsert (`ON CONFLICT`) or dedupe guard; reproject to clean
2	An order that exists is missing from a read model	Event skipped — checkpoint advanced before/without the write	Diff read-model keys vs `OrderPlaced` events in the log	Checkpoint in the same TX; if separate, write-then-checkpoint; reproject
3	“Shipped” order with no base row	Out-of-order apply (two consumers on one key)	Check partition/session assignment; look for zero-row `UPDATE`s	One consumer per partition/session keyed by aggregate ID; upsert-with-stub
4	Projector crash-loops, whole projection stalls	Poison event the handler can’t process	Logs repeat on the same `global_pos`/offset	Route to a DLQ and skip; fix the handler; replay from DLQ
5	Read model caught up (lag=0) but data is wrong	Silent drift from an earlier handler bug	Reconciliation query returns non-zero rows	Reproject the read model from zero
6	Read-your-writes SLA breached under load	Runaway lag — slow store or under-scaled worker	Lag metric (time) climbing past the SLA	Scale the worker; batch writes during catch-up; faster read store
7	After a crash, position and data disagree	Checkpoint in a separate store/transaction (dual write)	Compare stored checkpoint to `max(applied position)` in the read model	Move the checkpoint into the read store, committed in the same TX
8	Counters/totals are too high	Accumulation double-counted on redelivery (`x = x + delta`)	Reconcile the counter against a log-derived count	Recompute from a stable base carried on the event, not a running delta
9	New projector can’t read historical events	Event schema changed with no upcaster for the old version	Projector throws on old `event_type`/version	Add upcasters; version events; never edit history
10	Emergency reproject won’t finish in time	Single-threaded replay over a huge log	Estimate: events ÷ throughput = wall clock	Parallel replay by key/partition; batch inserts; snapshots on the write side
11	Auto-committed offsets, events lost after crash	Broker auto-commit acked events not yet projected	`enable.auto.commit` is `true` (default in some clients)	`enable.auto.commit=false`; `commitSync` after the DB write
12	“Synchronous” projection to another store is inconsistent	Cross-store synchronous write = disguised dual write	Two writes, two transactions, no atomicity	Make it async + outbox, or keep the projection in the same TX store

Best practices

Decide CQRS per bounded context, by measured asymmetry — never by mandate. Verify a real read:write ratio, multiple genuine query shapes, or read-driven write contention before splitting. “We use CQRS everywhere” is a smell.
Try the cheap alternatives first. An index, a read replica, or a materialized view solves most “we need CQRS” conversations. Escalate only when they genuinely cannot resolve the conflict between the write schema and the shapes your queries need.
Keep events immutable, past-tense, versioned, and free of read-model concerns. The event is the contract; a projection change must never require touching the write side.
Make every projector idempotent. Prefer naturally idempotent operations (upsert, set) over accumulations; when you must accumulate, recompute from a base on the event.
Commit the checkpoint in the same transaction as the read-model write. If you cannot, always write the projection first and the checkpoint second — never the reverse (skipping is unrecoverable; reapplying is free when idempotent).
One consumer per ordering domain. Route every event for an aggregate key to the same partition and run one worker per partition; that preserves per-stream order. Never assume cross-partition order.
Default to asynchronous projections. Reserve synchronous for the rare view that must be strongly consistent and lives in the same transactional store as the write model.
Treat read models as disposable caches. Schema changes and bug fixes are blue-green rebuilds, not destructive migrations or manual patches.
Gate read-your-writes only for the writer. Return a version token from commands; gate the writer’s own next query on the checkpoint; leave everyone else on plain reads.
Alert on projection lag in wall-clock time, tied to the read-your-writes SLA, and run a scheduled reconciliation job to catch silent drift.
Design the UI to hide staleness on the common path with optimistic rendering, so eventual consistency never becomes the default user experience.
Keep the event schema forward-compatible with upcasters so any future projector can read all of history — you can never rewrite the log.
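Two of the practices above, the idempotent upsert and the checkpoint committed in the same transaction, fit in one small function. This is a hedged sketch under stated assumptions: in-memory SQLite stands in for the read store, the `projector_checkpoint` table and the event dictionary shape (`position`, `order_id`, `status`) are hypothetical, and a real projector would also handle batching and error reporting.

```python
import sqlite3


def open_read_store() -> sqlite3.Connection:
    # Assumption: in-memory SQLite stands in for the real read store.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE order_summary (order_id TEXT PRIMARY KEY, status TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE projector_checkpoint ("
        "projector TEXT PRIMARY KEY, position INTEGER NOT NULL)"
    )
    conn.execute("INSERT INTO projector_checkpoint VALUES ('order_summary', 0)")
    conn.commit()
    return conn


def apply_event(conn: sqlite3.Connection, event: dict) -> bool:
    """Apply one event and advance the checkpoint atomically.

    Returns False when the event is at or behind the checkpoint (a redelivery).
    """
    with conn:  # commits both writes together, or rolls both back
        (checkpoint,) = conn.execute(
            "SELECT position FROM projector_checkpoint WHERE projector = 'order_summary'"
        ).fetchone()
        if event["position"] <= checkpoint:
            return False  # already applied; skipping is safe
        # Keyed upsert: reapplying the same event yields the same row.
        conn.execute(
            "INSERT INTO order_summary (order_id, status) VALUES (?, ?) "
            "ON CONFLICT(order_id) DO UPDATE SET status = excluded.status",
            (event["order_id"], event["status"]),
        )
        conn.execute(
            "UPDATE projector_checkpoint SET position = ? WHERE projector = 'order_summary'",
            (event["position"],),
        )
    return True
```

If the process dies anywhere inside the `with` block, neither the row nor the checkpoint is committed, so the restart resumes from a position that matches the data.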

Security notes

Events can carry sensitive data forever. Because the log is append-only and replayable, PII in an event is effectively permanent and re-read on every rebuild. Minimise sensitive fields; where you must carry them, encrypt the payload (envelope encryption) so deleting the key crypto-shreds the data — the practical answer to “right to erasure” over an immutable log.
Least privilege per read store. Each projector needs write access only to its read store and read access only to the log. Grant scoped credentials (a Postgres role limited to order_summary, a Cosmos key scoped to one container) so a compromised projector cannot corrupt sibling read models.
Protect the log itself — it is the source of truth. Use append-only permissions for producers, restrict who can seek/replay, and audit administrative access; a tampered log corrupts every read model on the next rebuild.
Do not leak internal topology through read APIs. Return status, not the checkpoint position, projector health, or dependency hostnames. A consistencyPending flag is fine; internal positions are not for external callers.
Secure the reconciliation and rebuild tooling. These jobs read all history and can rewrite read models — treat them as privileged operations behind managed identity and RBAC, not scripts anyone can run.
Version tokens are not secrets, but they leak ordering. Don’t let a token double as an authorization grant — authorise the request independently of the token it carries.
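The rule about returning status rather than internal positions can be illustrated with a small read gate. This is a sketch only: `gated_read` and its parameters are hypothetical names, the version token is assumed to be a plain integer log position, and authorization is assumed to have happened before this function is called.

```python
from typing import Any, Callable, Optional


def gated_read(
    projection_checkpoint: int,
    version_token: Optional[int],
    load_view: Callable[[], Any],
) -> dict:
    """Serve a read, gating only callers that present a version token.

    The response carries a boolean status and the data, never the
    checkpoint position or any other internal detail.
    """
    caught_up = version_token is None or projection_checkpoint >= version_token
    if not caught_up:
        # The writer's own event is not projected yet: report status only.
        return {"consistencyPending": True, "data": None}
    return {"consistencyPending": False, "data": load_view()}
```

A caller without a token takes the plain read path unchanged, so only the writer's own follow-up query pays for the gate.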

Cost & sizing

The bill drivers of a CQRS pipeline, and how to keep them sane:

Cost driver	What you pay for	Rough INR / month (small prod)	What it buys	Watch-out
Event log / broker	Managed Kafka / Event Hubs / EventStoreDB throughput + retention	~₹8,000–25,000	Ordered, durable, replayable events	Retention long enough to rebuild; throughput-unit (TU) and partition sizing
Read store: relational	Postgres/Azure SQL compute + storage	~₹6,000–20,000	Keyed/joined queries, materialized views	Index bloat; right-size to read load
Read store: document	Cosmos/Mongo RU/s or vCore	~₹8,000–30,000	Whole-screen single reads	RU/s provisioning; partition-key design
Read store: search	Elasticsearch nodes / Azure AI Search search units	~₹10,000–40,000	Full-text, facets, relevance	Sized by index size + query rate
Projection compute	Long-running worker containers	~₹4,000–12,000	The projectors themselves	One per ordering domain; scale by lag
Monitoring / telemetry	Lag + reconciliation metrics ingestion	~₹1,000–3,000	Detecting lag and drift	Sample high-volume signals

Sizing rules of thumb:

Log retention must exceed your worst-case rebuild need. If you ever want to reproject from zero, the log must still hold position zero — otherwise you need snapshots or an archived log. Retention is a correctness parameter, not just a cost knob.
Size projection compute by lag, not CPU. A projector is I/O-bound on the read store; scale workers (or partitions) when lag time creeps toward your SLA, not when CPU is high.
Denormalized read stores trade storage for read speed — you are deliberately storing the same fact many times. Budget for read storage being several times the write model’s size; that is the pattern working, not waste.
Rebuild is a periodic burst cost. A full reproject temporarily doubles a read store (blue + green) and spikes broker read throughput. Plan capacity for the migration window, then release it.
The cheapest CQRS is the one you didn’t build. The largest cost saving is refusing CQRS for contexts that don’t need it — an index or a read replica is an order of magnitude cheaper to run than a projection pipeline.
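The rebuild and retention rules reduce to simple arithmetic that is worth running before an emergency. The helpers below are a rough sketch: the function names are hypothetical, and the estimate assumes throughput scales linearly with the number of parallel workers, which is optimistic once the read store becomes the bottleneck.

```python
def rebuild_hours(
    total_events: int, events_per_second: float, parallel_workers: int = 1
) -> float:
    """Estimate wall-clock hours for a full reproject: events / throughput."""
    if events_per_second <= 0 or parallel_workers < 1:
        raise ValueError("throughput and worker count must be positive")
    # Assumption: linear speed-up across workers (one per partition/key range).
    seconds = total_events / (events_per_second * parallel_workers)
    return seconds / 3600


def can_reproject_from_zero(oldest_retained_position: int) -> bool:
    """A rebuild from zero is only possible if the log still holds position zero."""
    return oldest_retained_position == 0
```

With illustrative numbers, 500 million events at 5,000 events per second on a single worker is about 27.8 hours; spread across 8 workers under the linear assumption it drops to about 3.5 hours. If `can_reproject_from_zero` is false, the plan needs snapshots or an archived log instead.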

Interview & exam questions

1. What exactly does CQRS separate, and what does it not require? CQRS separates the model used to change state (commands) from the model used to read it (queries) — two models, often two stores, shaped independently. It does not require event sourcing; you can run CQRS off a state-stored write side that emits events via a transactional outbox. Coupling CQRS to event sourcing is a deliberate choice, not a prerequisite.

2. Why is “exactly-once delivery” a myth, and what do you build instead? Over a network the acknowledgement of a delivered message can itself be lost, forcing the sender to redeliver — so no system can guarantee exactly-once delivery. You build at-least-once delivery plus idempotent application, which yields exactly-once effects: the read model reaches the same state whether an event arrives once or many times.

3. Where should a projector store its checkpoint, and why? In the read store itself, committed in the same transaction as the read-model write. That makes the position advance if and only if the projection did, so a crash can never leave them disagreeing. Checkpointing to a separate system on a separate timeline reintroduces the dual-write problem CQRS was meant to escape.

4. If you must write the projection and the checkpoint separately, which goes first? Always write the projection first, then the checkpoint — and be idempotent. A crash between them then only causes a harmless reapply on restart. The reverse order (checkpoint first) can skip an event, which is unrecoverable without a full rebuild.

5. Why is one consumer per partition/key essential? Events for one aggregate must be applied in emission order (you can’t ship an order before placing it). Keying events by aggregate ID and running one consumer per partition guarantees all of an aggregate’s events are processed in order by a single worker. Two consumers on one key can apply events out of order and corrupt the read model.

6. What is drift, why is it more dangerous than lag, and how do you catch it? Lag is a projection being behind (visible, self-healing as it catches up). Drift is a projection being caught up but wrong — a past handler bug corrupted the data, and lag reads zero while customers see garbage. You catch drift with a scheduled reconciliation job that recomputes counts/checksums from the log and compares them to the read model; the fix is a reprojection, not a manual patch.

7. Explain read-your-writes and why you don’t just make every read strongly consistent. After a write, the writer’s UI may query a projection that hasn’t caught up, showing stale data. Read-your-writes has the command return the event’s position (a version token); the client passes it back and the read API gates on the projection’s checkpoint, serving the writer only once caught up. You gate only the writer’s query because making every read strongly consistent adds latency for millions of readers to solve a problem only the one writer has.

8. When is a synchronous projection appropriate, and what’s the trap? For a rare read model that cannot tolerate any staleness and is cheap to maintain inline. The trap is that if the “synchronous” projection writes to a different store in a different transaction, it’s a disguised dual write — a crash between the two leaves them inconsistent. Synchronous projection is only truly safe in the same transactional store as the write model; cross-store should be async + outbox.

9. How do you change a read model’s schema with zero downtime? Blue-green rebuild: create the v2 table, start a new projector that replays the whole log from zero into it (separate checkpoint), wait until it catches up to and holds at the head, then swap reads atomically (rename/view/flag). Keep the old table for a rollback window. This works because read models are disposable caches derived from the log.

10. What constrains your ability to reproject from zero, and how do you make big rebuilds tractable? The log must still hold position zero, so retention is a correctness parameter. Big rebuilds are made tractable by parallel replay across partitions/keys, batched bulk writes during catch-up (with secondary indexes built afterward), filtered subscriptions that only replay relevant streams, and (for event-sourced write sides) snapshots that shorten write-side replay.

11. Give three signals that a context genuinely needs CQRS, and three that it doesn’t. Needs it: a read:write ratio far above 10:1, three-plus query shapes no single schema serves, or reporting demonstrably stalling writes. Doesn’t: a simple CRUD panel with one query shape, reporting whose shape matches the OLTP schema (a read replica fits), or “it’s our standard.” The single best predictor of regret is applying CQRS by mandate rather than measured asymmetry.

12. How do idempotent and non-idempotent apply operations differ, and how do you handle an increment? Set/replace and keyed upsert are naturally idempotent — reapplying yields the same state. Increment/append-with-count are not — redelivery double-counts. Handle an increment by recomputing the derived value from a stable base carried on the event (total = order_base - total_discounts) rather than accumulating a delta, or guard it with a processed_events(event_id) dedupe table in the same transaction.

These map to AWS Certified Solutions Architect – Professional and Azure Solutions Architect Expert (AZ-305) for the event-driven, decoupling, and data-store-selection domains; the delivery-guarantee and idempotency material is core distributed-systems interview territory at the senior/staff level. A compact mapping for revision:

Question theme	Where it’s tested
CQRS vs event sourcing; when to use	AZ-305, SA-Pro; architecture interviews
Delivery guarantees; exactly-once effects	Distributed-systems / staff interviews
Idempotency & checkpointing	Senior backend interviews; event-driven design
Read-store selection per query shape	AZ-305, SA-Pro (data-store selection)
Eventual consistency & read-your-writes	Distributed-systems; system-design interviews
Replay / blue-green rebuilds	Senior architecture interviews

Quick check

Your team says “we should use CQRS everywhere as a standard.” Give the one-sentence reason this is a smell, and name the two cheaper alternatives you’d try first for a typical context.
A projector crashes mid-batch and restarts; afterward the read model has two rows for one order. What single property was missing, and what’s the fix?
True or false: you should write the checkpoint before the read-model write so you never reprocess an event. Explain.
A read model shows lag = 0, but a customer reports their order count is wrong. What is this called, why didn’t lag catch it, and how do you find and fix it?
After a write, the user’s own list doesn’t show the new item. What mechanism fixes this for that user without slowing down everyone else’s reads, and what does the command need to return?

Answers

It’s a smell because CQRS is a per-bounded-context decision justified by measured asymmetry (read:write ratio, multiple query shapes, read-driven write contention), not a global mandate — applied everywhere it adds a consistency gap and a pipeline for zero benefit. The two cheaper alternatives to try first are a read replica (for throughput on the same shape) and a materialized view / index (for roll-ups or a slow query).
Idempotency was missing — a plain INSERT inserted a duplicate on redelivery. Fix it with a natural upsert (INSERT … ON CONFLICT (order_id) DO NOTHING) or a processed_events dedupe guard, then reproject to clean the existing duplicate.
False. Writing the checkpoint first means a crash between checkpoint and projection skips the event permanently (unrecoverable without a rebuild). Always write the projection first, then the checkpoint, and be idempotent so a crash only causes a harmless reapply — or commit both in one transaction.
It’s drift — the projection is caught up but wrong, from a handler bug that corrupted data before it was fixed. Lag can’t catch it because lag only measures how far behind the checkpoint is, not correctness. Find it with a reconciliation job that recomputes the count from the event log and compares it to the read model; fix it by reprojecting the read model from zero, not by patching the row.
Read-your-writes via a version token: the command returns the position of the event it produced; the client passes that token on its next query, and the read API gates that query on the projection’s checkpoint (serving the writer only once it’s caught up, or returning consistencyPending). Because you gate only the writer’s own query, everyone else stays on fast plain reads. The command must return the event’s log position / version token.
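The reconciliation job from the drift answer can be sketched in a few lines. This is an illustration under assumptions, not a production job: events are plain dictionaries with hypothetical `event_id`, `type`, and `customer_id` fields, the read model is a dictionary of per-customer order counts, and a real job would stream the log rather than hold it in memory.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def expected_order_counts(events: Iterable[dict]) -> Counter:
    """Recompute per-customer order counts from the log, ignoring duplicates."""
    counts: Counter = Counter()
    seen = set()
    for event in events:
        if event["type"] == "OrderPlaced" and event["event_id"] not in seen:
            seen.add(event["event_id"])
            counts[event["customer_id"]] += 1
    return counts


def find_drift(
    events: Iterable[dict], read_model: Dict[str, int]
) -> Dict[str, Tuple[int, int]]:
    """Return {customer_id: (expected, actual)} for every mismatch."""
    expected = expected_order_counts(events)
    drift = {}
    for customer_id in expected.keys() | read_model.keys():
        want = expected.get(customer_id, 0)
        have = read_model.get(customer_id, 0)
        if want != have:
            drift[customer_id] = (want, have)
    return drift
```

An empty result means the read model agrees with the log for this measure. Any non-empty result is a signal to reproject, since patching individual rows leaves the underlying cause unexamined.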

Glossary

CQRS (Command Query Responsibility Segregation) — using one model to change state (commands) and a separate, purpose-built model to read it (queries); often two stores.
Event — an immutable, named fact in the past tense (OrderPlaced); the contract between write and read sides.
Write model / aggregate — the normalized, invariant-enforcing source of truth; the aggregate loads state, validates a command, and emits events (one stream per instance).
Read model / projection — a denormalized store shaped for exactly one query pattern; a disposable, rebuildable cache derived from the log.
Projector / projection worker — a long-running consumer that reads events in order and applies them to a read model.
Checkpoint — the projector’s remembered position in the log; committed atomically with the read-model write so the two never disagree.
Catch-up subscription — a consumer that reads history from its checkpoint to the live head, then transitions to live with no gap or duplicate at the seam.
Idempotent apply — applying the same event twice produces the same read-model state as applying it once; the defense against at-least-once redelivery.
Exactly-once effects — the result is as if each event were applied once, achieved by at-least-once delivery plus idempotent application (versus the myth of exactly-once delivery).
Eventual consistency — read models converge to reflect the write eventually, with a (usually small) window where they lag the log.
Read-your-writes — ensuring the user who made a change sees it on their next read, typically via a version token gated on the projection checkpoint.
Projection lag — the distance between the live log head and a projection’s checkpoint, tracked as time and count; the freshness SLA.
Drift — a projection that is caught up (lag zero) but wrong, due to a past handler bug; detected by reconciliation, fixed by reprojection.
Replay / reprojection / rebuild — regenerating a read model from position zero (blue-green for zero downtime); enables schema changes and drift repair.
Transactional outbox — writes events to an outbox row in the same transaction as the state change, then relays them to the log, avoiding dual writes.
Upcaster — a function that transforms an old event version into the current schema on read, keeping the log forward-compatible with new projectors.
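To make the upcaster entry concrete, here is a minimal sketch. Everything specific in it is hypothetical: the `OrderPlaced` v1 and v2 shapes, the `currency` field with its `"INR"` default, and the registry layout are invented for illustration and are not taken from any event-store library.

```python
from typing import Callable, Dict, Tuple

Event = dict


def _order_placed_v1_to_v2(event: Event) -> Event:
    # Hypothetical change: v2 added a currency field; old events get a default.
    upgraded = dict(event)  # copy, so the stored event is never mutated
    upgraded.setdefault("currency", "INR")
    upgraded["version"] = 2
    return upgraded


# Registry keyed by (event type, version the upcaster reads).
UPCASTERS: Dict[Tuple[str, int], Callable[[Event], Event]] = {
    ("OrderPlaced", 1): _order_placed_v1_to_v2,
}


def upcast(event: Event) -> Event:
    """Chain upcasters until the event reaches the newest known version."""
    current = event
    while (current["type"], current["version"]) in UPCASTERS:
        current = UPCASTERS[(current["type"], current["version"])](current)
    return current
```

The projector calls `upcast` on every event it reads and only ever handles the current shape, while the log itself stays untouched. Adding a v3 later means registering one more function for `("OrderPlaced", 2)`.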

Next steps

You can now decide whether CQRS fits a context and build a projection pipeline that is idempotent, crash-safe, rebuildable, and honest about consistency. Build outward:

Next: Event Sourcing in Production: Aggregate Design, Snapshots, and Projection Rebuilds — when you decide the event log should be your system of record, not just a delivery channel.
Related: Building the Transactional Outbox and Inbox Pattern for Exactly-Once Event Publishing — how a state-stored write side reliably gets events onto the log without a dual write.
Related: Designing Idempotent APIs and Deduplication for Reliable Distributed Systems — the idempotency mechanics your projectors depend on, generalised.
Related: Multi-Region Data: Choosing Replication and Consistency Without Losing Writes — the consistency-model vocabulary behind “eventual,” and what stronger guarantees cost.
Related: Implementing Distributed Transactions with Sagas: Orchestration vs Choreography in Depth — coordinating multi-step business processes across the aggregates your events describe.
Related: Database Selection 101: SQL, NoSQL, and When to Use Each — picking the right store for each read model’s query shape.