Every distributed system I have reviewed that “loses events” has the same root cause, and it is almost never the broker. It is the assumption that a service can write to its database and publish to Kafka or Service Bus in one atomic step. It cannot. The database commit and the broker publish are two independent network operations, and any failure between them either loses the event or duplicates it. This article builds the fix end to end: a transactional outbox so producers never lose an event, a change-data-capture relay so events ship reliably, and a consumer inbox so handlers process each event exactly once even when the broker redelivers.
Step 1 – Understand the dual-write problem
Consider an order service that must persist an order and publish OrderPlaced. The naive code does two writes:
// BROKEN: dual write with no atomicity
await _db.SaveOrderAsync(order); // 1. commits to Postgres
await _producer.PublishAsync(orderPlaced); // 2. publishes to Kafka
There is no atomicity spanning these two systems. Walk the failure modes:
- The process crashes after line 1 but before line 2. The order exists; the event was never published. Downstream services (billing, inventory) never react. Silent data loss.
- Line 1 commits, line 2 publishes, but the publish acknowledgement is lost on the wire. Your retry republishes. Now you have a duplicate OrderPlaced.
- You reverse the order (publish first, then save). Now a publish followed by a save failure emits an event for an order that does not exist – a phantom.
You cannot solve this with a distributed transaction. Two-phase commit (XA) across a relational database and Kafka is not generally supported, and where a coordinator is available it couples your services to it and holds locks across network boundaries. Most teams building event-driven systems avoid 2PC for exactly these reasons. The outbox sidesteps it entirely by making the event part of the same local transaction as the business data.
The core insight: turn the dual write into a single write to one transactional resource (your database), then relay asynchronously. That converts an unsolvable atomicity problem into a solvable delivery problem.
Step 2 – Design the outbox table, envelope, and ordering keys
The outbox is a table in the same database and same schema as your business data, so a single ACID transaction covers both. Here is a Postgres schema:
CREATE TABLE outbox (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
aggregate_type TEXT NOT NULL, -- e.g. 'order'
aggregate_id TEXT NOT NULL, -- routing/ordering key
event_type TEXT NOT NULL, -- e.g. 'OrderPlaced'
payload JSONB NOT NULL, -- the event envelope
headers JSONB NOT NULL DEFAULT '{}',
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
-- only used by the polling publisher (Step 3a); CDC ignores it
published_at TIMESTAMPTZ NULL
);
-- Polling publisher reads the unpublished backlog in creation order.
CREATE INDEX idx_outbox_unpublished
ON outbox (created_at)
WHERE published_at IS NULL;
aggregate_id is the single most important column. It is both your partition key (so all events for one order land on the same partition and stay ordered) and your idempotency anchor downstream.
The payload is a self-describing envelope, never a raw domain object. Version it from day one, because consumers will outlive your current schema:
{
"eventId": "0f7c0b2e-2b1a-4f9e-9b7e-2c8a1d3f4a5b",
"eventType": "OrderPlaced",
"eventVersion": 1,
"aggregateType": "order",
"aggregateId": "ORD-10042",
"occurredAt": "2026-06-08T09:14:32.118Z",
"traceId": "4bf92f3577b34da6a3ce929d0e0e4736",
"data": {
"orderId": "ORD-10042",
"customerId": "CUST-77",
"totalCents": 14999,
"currency": "EUR"
}
}
eventId is the deduplication token the inbox keys on. traceId carries W3C Trace Context for end-to-end correlation (Step 7).
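As an illustration of why the version field matters, here is a minimal sketch of a consumer reading the envelope with System.Text.Json and dispatching on eventVersion. The type names (EventEnvelope, OrderPlacedV1, EnvelopeReader) are hypothetical and not part of the original code; the sketch assumes .NET 6 or later.

```csharp
using System;
using System.Text.Json;

// Hypothetical types mirroring the JSON envelope above; names are illustrative.
public sealed record EventEnvelope(
    Guid EventId,
    string EventType,
    int EventVersion,
    string AggregateType,
    string AggregateId,
    DateTimeOffset OccurredAt,
    string TraceId,
    JsonElement Data); // kept raw until the version is known

public sealed record OrderPlacedV1(
    string OrderId, string CustomerId, long TotalCents, string Currency);

public static class EnvelopeReader
{
    // Web defaults give camelCase names and case-insensitive matching.
    private static readonly JsonSerializerOptions Options =
        new(JsonSerializerDefaults.Web);

    public static EventEnvelope Parse(string json) =>
        JsonSerializer.Deserialize<EventEnvelope>(json, Options)
        ?? throw new JsonException("Envelope was null.");

    // Dispatch on the version so an unknown schema fails loudly
    // instead of being half-parsed.
    public static OrderPlacedV1 ReadOrderPlaced(EventEnvelope envelope)
    {
        if (envelope.EventType != "OrderPlaced")
            throw new NotSupportedException($"Unexpected type {envelope.EventType}.");

        return envelope.EventVersion switch
        {
            1 => envelope.Data.Deserialize<OrderPlacedV1>(Options)
                 ?? throw new JsonException("Data was null."),
            _ => throw new NotSupportedException(
                $"Unsupported OrderPlaced version {envelope.EventVersion}.")
        };
    }
}
```

Keeping Data as a raw JsonElement lets the handler decide how to interpret the payload only after it has checked eventType and eventVersion, which is what makes adding a version 2 later a local change.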
The producer write becomes one transaction with no broker call at all:
await using var tx = await _db.BeginTransactionAsync();
await _db.SaveOrderAsync(order, tx);
var envelope = OutboxEnvelope.From(new OrderPlaced(order));
await _db.InsertOutboxAsync(new OutboxRow
{
AggregateType = "order",
AggregateId = order.Id, // partition + ordering key
EventType = "OrderPlaced",
Payload = envelope
}, tx);
await tx.CommitAsync(); // order + event commit atomically, or neither does
If the commit fails, neither the order nor the event exists. If it succeeds, both are durable. The dual write is gone.
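To make the difference concrete, the following toy simulation injects a fault into both approaches. It is only a sketch: ToyDb is a hypothetical in-memory stand-in that stages writes and applies them together, not a real transaction manager, and the broker is a plain list.

```csharp
using System;
using System.Collections.Generic;

// Toy in-memory stand-in for a database (hypothetical, for illustration only).
public sealed class ToyDb
{
    public List<string> Orders { get; } = new();
    public List<string> Outbox { get; } = new();

    // Stages writes and applies them together only if `work` completes.
    public void InTransaction(Action<List<string>, List<string>> work)
    {
        var stagedOrders = new List<string>();
        var stagedOutbox = new List<string>();
        work(stagedOrders, stagedOutbox); // an exception discards both staged lists
        Orders.AddRange(stagedOrders);
        Outbox.AddRange(stagedOutbox);
    }
}

public static class DualWriteDemo
{
    // Naive dual write: the fault lands between the save and the publish.
    public static (int Orders, int Events) Naive(bool crashBetweenWrites)
    {
        var db = new ToyDb();
        var broker = new List<string>();
        try
        {
            db.Orders.Add("ORD-1"); // 1. already durable
            if (crashBetweenWrites)
                throw new InvalidOperationException("simulated crash");
            broker.Add("OrderPlaced:ORD-1"); // 2. never reached on crash
        }
        catch (InvalidOperationException) { }
        return (db.Orders.Count, broker.Count);
    }

    // Outbox: order and event are staged and applied as one unit.
    public static (int Orders, int Events) WithOutbox(bool crashBeforeCommit)
    {
        var db = new ToyDb();
        try
        {
            db.InTransaction((orders, outbox) =>
            {
                orders.Add("ORD-1");
                if (crashBeforeCommit)
                    throw new InvalidOperationException("simulated crash");
                outbox.Add("OrderPlaced:ORD-1");
            });
        }
        catch (InvalidOperationException) { }
        return (db.Orders.Count, db.Outbox.Count);
    }
}
```

With the fault injected, Naive leaves an order without an event, while WithOutbox leaves neither; without the fault, WithOutbox leaves both.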
Step 3 – Relay events out of the outbox
The relay moves rows from the outbox to the broker. There are two production-grade approaches. Pick based on your latency budget and operational appetite.
3a. Polling publisher
A background worker polls for unpublished rows, publishes them, and marks them done. Use FOR UPDATE SKIP LOCKED so multiple relay instances can run concurrently without double-publishing or blocking each other:
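-- Each relay worker claims a disjoint batch. Run this inside the relay's
-- transaction: the row locks are released at commit or rollback.
SELECT id, aggregate_type, aggregate_id, event_type, payload, headers
FROM outbox
WHERE published_at IS NULL
ORDER BY created_at
LIMIT 100
FOR UPDATE SKIP LOCKED;
// Claim, publish, and mark in one transaction so the locks taken by
// FOR UPDATE SKIP LOCKED are held until the batch is marked published.
// (Assumes the helpers accept the transaction, like SaveOrderAsync in Step 2.)
await using var tx = await _db.BeginTransactionAsync();
var batch = await _db.ClaimUnpublishedBatchAsync(100, tx);
foreach (var row in batch)
{
    await _producer.ProduceAsync(
        topic: $"{row.AggregateType}.events",
        key: row.AggregateId, // partitioning key
        value: row.Payload);
    await _db.MarkPublishedAsync(row.Id, tx);
}
await tx.CommitAsync(); // releases the row locks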
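This is at-least-once: if the worker crashes after ProduceAsync but before MarkPublishedAsync, the row is reclaimed and republished on restart. Duplicates are expected and handled by the inbox. Polling is simple and has no external dependency, but it costs latency (up to one poll interval) and adds read load on your primary. One caveat: with several workers claiming disjoint batches, two events for the same aggregate can be published out of order, so run a single relay (or shard workers by aggregate_id) when strict per-aggregate ordering matters.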
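The crash window described above can be reproduced with a small in-memory simulation. This is a sketch under stated assumptions: ToyOutboxRow and ToyRelay are hypothetical names, the broker is a list, and the "crash" is an exception thrown after the produce but before the row is marked.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory outbox row (illustration only).
public sealed class ToyOutboxRow
{
    public Guid Id { get; init; } = Guid.NewGuid();
    public string Payload { get; init; } = "";
    public DateTimeOffset? PublishedAt { get; set; }
}

public sealed class ToyRelay
{
    private readonly List<ToyOutboxRow> _outbox;

    // Stands in for the topic: every produced payload is appended here.
    public List<string> Broker { get; } = new();

    public ToyRelay(List<ToyOutboxRow> outbox) => _outbox = outbox;

    // Publishes every unpublished row in order. When crashAfterFirstProduce
    // is set, it fails after producing but before marking the row published.
    public void RunOnce(bool crashAfterFirstProduce = false)
    {
        foreach (var row in _outbox.Where(r => r.PublishedAt is null).ToList())
        {
            Broker.Add(row.Payload); // produce
            if (crashAfterFirstProduce)
                throw new InvalidOperationException("simulated crash");
            row.PublishedAt = DateTimeOffset.UtcNow; // mark published
        }
    }
}
```

Running RunOnce with the crash flag and then again without it delivers the first payload twice and the rest once: nothing is lost, and the duplicate is left for the consumer to absorb.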
3b. CDC with Debezium (log tailing)
Change-data-capture tails the database’s write-ahead log instead of querying the table. There is no polling, no published_at update, and near-real-time latency. Debezium’s outbox event router SMT reshapes each inserted outbox row into a properly-keyed broker message. With Postgres, enable logical replication first:
# postgresql.conf
wal_level = logical
max_replication_slots = 4
max_wal_senders = 4
Then register the connector against Kafka Connect:
{
"name": "order-outbox-connector",
"config": {
"connector.class": "io.debezium.connector.postgresql.PostgresConnector",
"database.hostname": "orders-db",
"database.port": "5432",
"database.user": "debezium",
"database.password": "${file:/secrets/db.properties:password}",
"database.dbname": "orders",
"topic.prefix": "ordersvc",
"table.include.list": "public.outbox",
"plugin.name": "pgoutput",
"transforms": "outbox",
"transforms.outbox.type": "io.debezium.transforms.outbox.EventRouter",
"transforms.outbox.route.by.field": "aggregate_type",
"transforms.outbox.table.field.event.key": "aggregate_id",
"transforms.outbox.table.field.event.payload": "payload",
"transforms.outbox.route.topic.replacement": "${routedByValue}.events"
}
}
The router uses aggregate_id as the Kafka message key (preserving per-aggregate ordering) and routes to a topic per aggregate_type. Because Debezium reads the WAL, it captures every committed insert exactly as the database serialized it – you get at-least-once delivery with the database log as the durable source of truth, and connector offsets survive restarts.
Rule of thumb: start with the polling publisher unless you already operate Kafka Connect. CDC is superior at scale and latency, but Debezium plus a replication slot is real infrastructure to own. A neglected, lagging replication slot will pin WAL and eventually fill your disk – monitor it (Step 7).
Step 4 – Build the consumer inbox for exactly-once processing
The broker gives you at-least-once delivery, which means duplicates are guaranteed, not hypothetical. “Exactly-once processing” is achieved on the consumer by making handling idempotent: record every processed eventId and skip repeats. The inbox table is that ledger, and – critically – it lives in the consumer’s own database so the dedup check and the business update commit in one transaction.
CREATE TABLE inbox (
event_id UUID PRIMARY KEY, -- from envelope.eventId
consumer TEXT NOT NULL, -- which logical consumer
processed_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
The handler attempts to claim the event by inserting its id. The primary key makes the claim atomic and the unique-violation tells you it is a duplicate:
await using var tx = await _db.BeginTransactionAsync();
var claimed = await _db.TryInsertInboxAsync(envelope.EventId, "billing", tx);
if (!claimed)
{
await tx.RollbackAsync();
ack(); // already processed -> ack and move on
return;
}
await _db.ApplyBillingSideEffectAsync(envelope.Data, tx); // business work
await tx.CommitAsync(); // inbox row + side effect commit together
ack();
TryInsertInboxAsync runs INSERT ... ON CONFLICT (event_id) DO NOTHING and returns whether a row was inserted. Because the inbox insert and the business side effect share one transaction, you can never apply the side effect without recording the dedup key, and you can never record the key without applying the side effect. A redelivery after the commit hits the primary key, is recognized as a duplicate, and is acked without re-running the work. If several logical consumers share one database, make the primary key (event_id, consumer) instead, so one consumer's claim does not hide the event from another.
This is “effectively-once” end to end. The producer is at-least-once via the outbox, the broker is at-least-once, and the consumer collapses duplicates via the inbox. True broker-level exactly-once (Kafka transactions / processing.guarantee=exactly_once_v2) only covers consume-transform-produce within Kafka – it does not extend to your external database. The inbox is what makes the side effect idempotent, so prefer it for any handler that touches an external system.
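The claim-then-apply logic can be exercised without a database. The sketch below is a hypothetical in-memory model: a HashSet plays the role of the inbox primary key, and removing the id on failure mirrors the transaction rollback. ToyBillingConsumer is an illustrative name, not part of the original code.

```csharp
using System;
using System.Collections.Generic;

// In-memory model of an inbox-guarded handler (illustration only).
public sealed class ToyBillingConsumer
{
    // Stands in for the inbox table's primary key on event_id.
    private readonly HashSet<Guid> _inbox = new();

    public long BilledCents { get; private set; }
    public int InboxCount => _inbox.Count;

    // Returns true if the event was applied, false if it was a duplicate.
    public bool Handle(Guid eventId, long totalCents)
    {
        // Like INSERT ... ON CONFLICT DO NOTHING affecting zero rows.
        if (!_inbox.Add(eventId))
            return false;

        try
        {
            if (totalCents < 0)
                throw new ArgumentOutOfRangeException(nameof(totalCents));
            BilledCents += totalCents; // the business side effect
            return true;
        }
        catch
        {
            // Mirror the rollback: no side effect means no dedup record,
            // so a later redelivery can still be processed.
            _inbox.Remove(eventId);
            throw;
        }
    }
}
```

Delivering the same eventId ten times applies the side effect once and returns false for the other nine calls, which is the behaviour the real handler gets from the primary key and the shared transaction.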
Step 5 – Preserve per-aggregate ordering and partition keys
Ordering is only ever guaranteed per partition, and only when one aggregate maps to one partition. Always set the message key to aggregate_id:
// Kafka: same key -> same partition -> ordered per aggregate.
await _producer.ProduceAsync("order.events",
new Message<string, string> { Key = order.Id, Value = payload });
For Azure Service Bus, the equivalent is SessionId. Enable sessions on the queue/subscription and stamp the aggregate id:
var message = new ServiceBusMessage(payload)
{
SessionId = order.Id, // FIFO within a session
MessageId = envelope.EventId, // broker-side dedup hint
ContentType = "application/json"
};
await _sender.SendMessageAsync(message);
Two non-obvious traps:
- Kafka producer retries can reorder within a partition. Set enable.idempotence=true (the modern default in current clients), which requires max.in.flight.requests.per.connection <= 5 and guarantees ordered, deduplicated writes per partition.
- Service Bus MessageId dedup requires duplicate detection enabled on the entity, and it only deduplicates within the configured detection window. It is a best-effort optimization, not a replacement for the inbox. Keep the inbox as the authoritative dedup.
Repartitioning later is painful: changing partition count rehashes keys and breaks the “one aggregate, one partition” invariant for in-flight ordering. Size partitions generously up front.
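The key-to-partition mapping and the effect of changing the partition count can be sketched with a toy partitioner. This is an assumption-laden illustration: it uses FNV-1a as a stand-in hash, whereas real clients use their own functions (the Java client uses murmur2), so the partition numbers here will not match a real cluster.

```csharp
using System;
using System.Text;

// Toy partitioner: hash(key) mod partitionCount (illustration only).
public static class ToyPartitioner
{
    // 32-bit FNV-1a, a stand-in for the client's real hash function.
    public static uint Fnv1a(string key)
    {
        unchecked
        {
            uint hash = 2166136261;
            foreach (byte b in Encoding.UTF8.GetBytes(key))
            {
                hash ^= b;
                hash *= 16777619;
            }
            return hash;
        }
    }

    // Same key and same partition count always give the same partition.
    public static int PartitionFor(string key, int partitionCount)
    {
        if (partitionCount <= 0)
            throw new ArgumentOutOfRangeException(nameof(partitionCount));
        return (int)(Fnv1a(key) % (uint)partitionCount);
    }

    // Counts how many of the keys ORD-0 .. ORD-(n-1) land on a different
    // partition after the partition count changes.
    public static int CountMoved(int keys, int oldCount, int newCount)
    {
        int moved = 0;
        for (int i = 0; i < keys; i++)
        {
            string key = $"ORD-{i}";
            if (PartitionFor(key, oldCount) != PartitionFor(key, newCount))
                moved++;
        }
        return moved;
    }
}
```

Calling ToyPartitioner.CountMoved(1000, 12, 16) shows how many keys change partition when a topic grows from 12 to 16 partitions; any key that moves can have older events on one partition and newer events on another while consumers are still draining the old one.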
Step 6 – Handle poison messages, dead-letters, and replay
Some messages will never succeed – a schema your consumer cannot parse, a referenced entity that was hard-deleted. Retrying them forever blocks the partition and starves healthy traffic. Bound the retries, then dead-letter.
Service Bus has a native dead-letter sub-queue. Cap delivery attempts and the broker moves poison messages aside automatically:
az servicebus queue update \
--resource-group rg-messaging \
--namespace-name sb-orders-prod \
--name billing-commands \
--max-delivery-count 5
After 5 failed deliveries the message lands in billing-commands/$DeadLetterQueue with the failure reason in its properties, where an operator or a remediation job can inspect and replay it. Kafka has no built-in DLQ, so the convention is an explicit dead-letter topic after a retry budget:
if (attempt >= maxAttempts)
{
await _producer.ProduceAsync("billing.events.DLT",
new Message<string, string> { Key = key, Value = payload });
consumer.Commit(result); // advance past the poison message
return;
}
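The retry budget itself is simple enough to isolate and unit test. The sketch below keeps the same attempt >= maxAttempts rule as the snippet above and adds capped exponential backoff between attempts; the backoff is an assumption of this sketch rather than something the original handler specifies, and the names Disposition and RetryBudget are hypothetical.

```csharp
using System;

public enum Disposition { Retry, DeadLetter }

// Pure decision logic for bounded retries (illustration only).
public static class RetryBudget
{
    // `attempt` is the number of delivery attempts made so far (1-based).
    public static Disposition Decide(int attempt, int maxAttempts) =>
        attempt >= maxAttempts ? Disposition.DeadLetter : Disposition.Retry;

    // Delay before the next attempt: baseDelay * 2^(attempt - 1), capped.
    public static TimeSpan BackoffFor(int attempt, TimeSpan baseDelay, TimeSpan maxDelay)
    {
        if (attempt < 1)
            throw new ArgumentOutOfRangeException(nameof(attempt));

        double factor = Math.Pow(2, attempt - 1);
        double millis = Math.Min(
            baseDelay.TotalMilliseconds * factor,
            maxDelay.TotalMilliseconds);
        return TimeSpan.FromMilliseconds(millis);
    }
}
```

Keeping the decision separate from the broker calls means the handler only has to act on the result: wait and retry, or produce to the dead-letter topic and commit the offset.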
The outbox gives you a second superpower here: replay. Because the outbox retains the original events, recovering from a botched consumer deploy does not require the broker to still hold the messages. Re-emit a window from the source of truth:
-- Re-publish everything an aggregate emitted in an incident window.
SELECT id, aggregate_id, event_type, payload
FROM outbox
WHERE aggregate_type = 'order'
AND created_at BETWEEN '2026-06-07 00:00Z' AND '2026-06-07 06:00Z'
ORDER BY created_at;
Feed those rows back through the relay. Consumers that already processed them no-op via the inbox, so replay is safe to run broadly – the inbox makes redelivery idempotent by construction.
Step 7 – Observe publish lag, relay backlog, and correlation
You cannot operate this pattern blind. Three signals matter most.
Relay backlog – how many events are stuck in the outbox. A growing backlog means the relay is down or behind:
SELECT count(*) AS unpublished,
now() - min(created_at) AS oldest_age
FROM outbox
WHERE published_at IS NULL; -- for CDC, alert on connector lag instead
Replication slot lag (CDC only) – a Debezium slot that stops advancing pins WAL and threatens the database. Watch retained bytes:
SELECT slot_name,
pg_size_pretty(
pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn)
) AS retained_wal
FROM pg_replication_slots
WHERE slot_type = 'logical';
Consumer lag – standard Kafka offset lag per consumer group, scraped by Prometheus / Kafka Lag Exporter and alerted in your platform. In Azure, query Service Bus active and dead-letter message counts. A KQL alert over the metric:
AzureMetrics
| where ResourceProvider == "MICROSOFT.SERVICEBUS"
| where MetricName == "DeadletteredMessages"
| summarize maxDLQ = max(Maximum) by Resource, bin(TimeGenerated, 5m)
| where maxDLQ > 0
For end-to-end correlation, the traceId in the envelope (Step 2) is the thread that stitches producer span -> relay -> broker -> consumer span into one distributed trace. Restore it on the consumer side so the handler’s spans link back to the originating request:
var parentCtx = Propagators.DefaultTextMapPropagator.Extract(
default, envelope.Headers, (h, k) => h.TryGetValue(k, out var v) ? new[]{v} : Array.Empty<string>());
using var activity = _activitySource.StartActivity(
"process OrderPlaced", ActivityKind.Consumer, parentCtx.ActivityContext);
Now a single trace shows the order commit, the relay hop, and every downstream handler – including publish lag as the gap between producer and consumer spans.
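For teams not using the OpenTelemetry propagator, the same round trip can be sketched with only System.Diagnostics. This assumes the producer stores a W3C traceparent value under a "traceparent" key in the outbox headers, which the original schema allows but does not prescribe; TraceHeaders is a hypothetical helper name.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Minimal W3C trace-context helpers over a string dictionary (illustration only).
public static class TraceHeaders
{
    // Producer side: copy the current activity's traceparent into the headers.
    public static void Inject(Activity? activity, IDictionary<string, string> headers)
    {
        if (activity?.Id is null || activity.IdFormat != ActivityIdFormat.W3C)
            return;

        headers["traceparent"] = activity.Id; // "00-<trace-id>-<span-id>-<flags>"
        if (!string.IsNullOrEmpty(activity.TraceStateString))
            headers["tracestate"] = activity.TraceStateString;
    }

    // Consumer side: rebuild the parent context from the headers.
    public static bool TryExtract(
        IReadOnlyDictionary<string, string> headers, out ActivityContext context)
    {
        context = default;
        if (!headers.TryGetValue("traceparent", out var traceParent))
            return false;

        headers.TryGetValue("tracestate", out var traceState);
        return ActivityContext.TryParse(traceParent, traceState, out context);
    }
}
```

The extracted ActivityContext can be passed as the parent when starting the consumer activity, so the handler's spans share the trace id of the request that wrote the outbox row.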
Step 8 – Operational runbook
The outbox grows forever unless you prune it. Decoupling cleanup from publish matters: never delete on publish, because you want a retention window for replay and audit. Run a separate retention job:
-- Nightly: delete published events older than the replay window.
DELETE FROM outbox
WHERE published_at IS NOT NULL
AND published_at < now() - interval '7 days';
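For CDC (where published_at is never set), retain by created_at and tune the window to cover your worst-case incident replay. Batch the delete on hot tables so you never hold a long lock or hand autovacuum one huge burst of dead rows. Postgres DELETE has no LIMIT clause, so select a bounded batch of ids in a subquery (DELETE ... WHERE id IN (SELECT id ... LIMIT n)) and repeat until no rows are affected.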
Failure recovery checklist for an on-call engineer:
| Symptom | Likely cause | Action |
|---|---|---|
| Outbox backlog climbing | Relay worker / connector down | Restart relay; confirm broker reachable; backlog drains automatically (at-least-once) |
| retained_wal growing | Debezium slot stalled or removed | Restart connector; if slot was dropped, recreate it before WAL fills disk |
| Duplicate side effects | Inbox bypassed or wrong dedup key | Verify handler keys on eventId and inbox insert shares the business transaction |
| Out-of-order processing | Key not set to aggregate_id, or idempotence off | Fix producer key; set enable.idempotence=true / Service Bus SessionId |
| Messages in DLQ | Poison payload / missing dependency | Inspect dead-letter reason; fix consumer; replay from outbox or DLQ |
Enterprise scenario
A payments platform team I worked with ran a ledger service on Postgres and published PaymentCaptured to Kafka with a plain dual write. During a routine broker rolling restart, the producer’s publish calls timed out after the ledger had committed. The application’s retry then succeeded – but for thousands of payments the first attempt had also landed before the timeout fired client-side. Downstream the reconciliation service double-counted captures, and finance opened a severity-1 when the daily settlement file overstated revenue by a six-figure sum.
The constraint that shaped the fix: reconciliation was a third-party system they could not modify to be idempotent, and regulators required a replayable audit trail of every emitted event for seven years. They could not simply “turn on Kafka exactly-once” – it would not have covered the external reconciler, which was the actual point of duplication.
They moved to outbox plus CDC. The ledger write and the PaymentCaptured envelope now commit in one Postgres transaction; Debezium tails the WAL and keys each message on payment_id. For the unmodifiable reconciler, they inserted a thin idempotent inbox shim in front of it, keyed on the envelope eventId:
INSERT INTO reconciler_inbox (event_id, processed_at)
VALUES (@eventId, now())
ON CONFLICT (event_id) DO NOTHING;
-- forward to reconciler only when this INSERT actually inserted a row
The shim forwards a payment to reconciliation only when the insert affected a row, collapsing every Kafka redelivery into a single forward. The seven-year audit requirement was satisfied by retaining the outbox in cold storage instead of a 7-day prune. Duplicate settlements went to zero, and the next broker restart – which previously meant an incident – passed unnoticed, because at-least-once delivery plus inbox dedup is exactly what the pattern is built to survive.
Verify
Prove the pattern works before you trust it in production:
- Atomicity. Kill the producer process between the business save and commit (inject a fault). Confirm neither the row nor the outbox entry exists – the transaction rolled back cleanly.
- At-least-once relay. Crash the relay worker right after ProduceAsync but before MarkPublishedAsync. On restart, confirm the event is republished and no event is lost.
- Exactly-once processing. Manually re-deliver the same eventId to a consumer 10 times. Confirm exactly one inbox row, one side effect, and 9 fast no-op acks.
- Ordering. Emit 1,000 events for a single aggregate_id and assert the consumer observes them in created_at order despite running multiple consumer threads.
- Replay. Run the Step 6 replay query over a past window into a fresh consumer; confirm correct state with no duplicates (inbox absorbs prior processing).
- Backlog alarm. Stop the relay and confirm the backlog query and your alert fire within the SLO; restart and confirm the backlog drains to zero.