Building the Transactional Outbox and Inbox Pattern for Exactly-Once Event Publishing

Every distributed system I have reviewed that “loses events” has the same root cause, and it is almost never the broker. It is the assumption that a service can write to its database and publish to Kafka or Service Bus in one atomic step. It cannot. The database commit and the broker publish are two independent network operations, and any failure between them either loses the event or duplicates it. This article builds the fix end to end: a transactional outbox so producers never lose an event, a change-data-capture relay so events ship reliably, and a consumer inbox so handlers process each event exactly once even when the broker redelivers.

Step 1 – Understand the dual-write problem

Consider an order service that must persist an order and publish OrderPlaced. The naive code does two writes:

// BROKEN: dual write with no atomicity
await _db.SaveOrderAsync(order);          // 1. commits to Postgres
await _producer.PublishAsync(orderPlaced); // 2. publishes to Kafka

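There is no atomicity spanning these two systems. Walk the failure modes. If the process crashes after the commit but before the publish, the order exists and the event is lost. If the publish lands but its acknowledgement times out, the retry publishes OrderPlaced a second time. Reversing the order does not help: publish first, and a failed commit leaves consumers holding an event for an order that does not exist.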

You cannot solve this with a distributed transaction. Two-phase commit (XA) across a relational database and Kafka is not generally supported, and where 2PC is available it couples your services to a transaction coordinator and holds locks across network boundaries. Most teams building on message brokers have moved away from 2PC for exactly these reasons. The outbox sidesteps it entirely by making the event part of the same local transaction as the business data.

The core insight: turn the dual write into a single write to one transactional resource (your database), then relay asynchronously. That converts an unsolvable atomicity problem into a solvable delivery problem.
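
To make the failure window concrete, here is a minimal simulation, sketched in Python rather than the service's C#. Plain lists stand in for the database and the broker, and `dual_write`, `Crash`, and the `fail_at` switch are hypothetical names introduced only for this illustration.

```python
class Crash(Exception):
    """Stands in for a process crash or a client-side timeout."""


def dual_write(db, broker, order, fail_at=None):
    """Naive dual write; fail_at injects a fault at a chosen point."""
    db.append(order)                      # 1. database commit succeeds
    if fail_at == "before_publish":
        raise Crash("died after commit, before publish")
    broker.append(order)                  # 2. publish reaches the broker
    if fail_at == "ack_lost":
        raise Crash("publish landed but the ack timed out")


# Failure mode A: the order is committed, the event never ships.
db_a, broker_a = [], []
try:
    dual_write(db_a, broker_a, "ORD-1", fail_at="before_publish")
except Crash:
    pass

# Failure mode B: the caller retries the publish after a lost ack.
db_b, broker_b = [], []
try:
    dual_write(db_b, broker_b, "ORD-2", fail_at="ack_lost")
except Crash:
    broker_b.append("ORD-2")              # blind retry duplicates the event
```

Neither outcome can be repaired from inside `dual_write`, because the caller cannot tell which side of the gap the failure fell on.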

Step 2 – Design the outbox table, envelope, and ordering keys

The outbox is a table in the same database and same schema as your business data, so a single ACID transaction covers both. Here is a Postgres schema:

CREATE TABLE outbox (
    id              UUID         PRIMARY KEY DEFAULT gen_random_uuid(),
    aggregate_type  TEXT         NOT NULL,        -- e.g. 'order'
    aggregate_id    TEXT         NOT NULL,        -- routing/ordering key
    event_type      TEXT         NOT NULL,        -- e.g. 'OrderPlaced'
    payload         JSONB        NOT NULL,        -- the event envelope
    headers         JSONB        NOT NULL DEFAULT '{}',
    created_at      TIMESTAMPTZ  NOT NULL DEFAULT now(),
    -- only used by the polling publisher (Step 3a); CDC ignores it
    published_at    TIMESTAMPTZ  NULL
);

-- Polling publisher reads the unpublished backlog in creation order.
CREATE INDEX idx_outbox_unpublished
    ON outbox (created_at)
    WHERE published_at IS NULL;

aggregate_id is the single most important column. It is both your partition key (so all events for one order land on the same partition and stay ordered) and your idempotency anchor downstream.

The payload is a self-describing envelope, never a raw domain object. Version it from day one, because consumers will outlive your current schema:

{
  "eventId": "0f7c0b2e-2b1a-4f9e-9b7e-2c8a1d3f4a5b",
  "eventType": "OrderPlaced",
  "eventVersion": 1,
  "aggregateType": "order",
  "aggregateId": "ORD-10042",
  "occurredAt": "2026-06-08T09:14:32.118Z",
  "traceId": "4bf92f3577b34da6a3ce929d0e0e4736",
  "data": {
    "orderId": "ORD-10042",
    "customerId": "CUST-77",
    "totalCents": 14999,
    "currency": "EUR"
  }
}

eventId is the deduplication token the inbox keys on. traceId carries W3C Trace Context for end-to-end correlation (Step 7).
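
As a rough illustration of how such an envelope might be assembled, the Python sketch below wraps a domain payload in the fields shown above. `build_envelope` is a hypothetical helper, not part of any library, and the timestamp is rendered with a `+00:00` offset rather than the `Z` suffix used in the sample.

```python
import json
import uuid
from datetime import datetime, timezone


def build_envelope(event_type, aggregate_type, aggregate_id, data,
                   trace_id, event_version=1):
    """Wrap a domain payload in a versioned, self-describing envelope."""
    return {
        "eventId": str(uuid.uuid4()),      # dedup token the inbox keys on
        "eventType": event_type,
        "eventVersion": event_version,     # bump when the data shape changes
        "aggregateType": aggregate_type,
        "aggregateId": aggregate_id,       # partition / ordering key
        "occurredAt": datetime.now(timezone.utc).isoformat(timespec="milliseconds"),
        "traceId": trace_id,
        "data": data,
    }


envelope = build_envelope(
    "OrderPlaced", "order", "ORD-10042",
    {"orderId": "ORD-10042", "customerId": "CUST-77",
     "totalCents": 14999, "currency": "EUR"},
    trace_id="4bf92f3577b34da6a3ce929d0e0e4736",
)
payload = json.dumps(envelope)             # what would land in outbox.payload
```

Generating `eventId` once, at the producer, is the point: every redelivery downstream carries the same token.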

The producer write becomes one transaction with no broker call at all:

await using var tx = await _db.BeginTransactionAsync();
await _db.SaveOrderAsync(order, tx);

var envelope = OutboxEnvelope.From(new OrderPlaced(order));
await _db.InsertOutboxAsync(new OutboxRow
{
    AggregateType = "order",
    AggregateId   = order.Id,          // partition + ordering key
    EventType     = "OrderPlaced",
    Payload       = envelope
}, tx);

await tx.CommitAsync(); // order + event commit atomically, or neither does

If the commit fails, neither the order nor the event exists. If it succeeds, both are durable. The dual write is gone.
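
The both-or-neither behavior is easy to check locally. The sketch below uses Python's built-in sqlite3 as a stand-in for Postgres, with a simplified two-table schema; `place_order` and the `fail_before_commit` flag are hypothetical and exist only to inject a fault.

```python
import json
import sqlite3
import uuid


def place_order(conn, order_id, total_cents, fail_before_commit=False):
    """Insert the order and its outbox row in one local transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "INSERT INTO orders (id, total_cents) VALUES (?, ?)",
                (order_id, total_cents))
            conn.execute(
                "INSERT INTO outbox (id, aggregate_type, aggregate_id, event_type, payload) "
                "VALUES (?, 'order', ?, 'OrderPlaced', ?)",
                (str(uuid.uuid4()), order_id, json.dumps({"orderId": order_id})))
            if fail_before_commit:
                raise RuntimeError("injected fault before commit")
        return True
    except RuntimeError:
        return False


def counts(conn):
    """Return (number of orders, number of outbox rows)."""
    orders = conn.execute("SELECT count(*) FROM orders").fetchone()[0]
    events = conn.execute("SELECT count(*) FROM outbox").fetchone()[0]
    return orders, events


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, total_cents INTEGER)")
conn.execute("CREATE TABLE outbox (id TEXT PRIMARY KEY, aggregate_type TEXT, "
             "aggregate_id TEXT, event_type TEXT, payload TEXT)")
```

Calling `place_order(conn, "ORD-1", 14999)` leaves one order and one outbox row; a second call with `fail_before_commit=True` leaves both counts unchanged.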

Step 3 – Relay events out of the outbox

The relay moves rows from the outbox to the broker. There are two production-grade approaches. Pick based on your latency budget and operational appetite.

3a. Polling publisher

A background worker polls for unpublished rows, publishes them, and marks them done. Use FOR UPDATE SKIP LOCKED so multiple relay instances can run concurrently without double-publishing or blocking each other:

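-- Each relay worker claims a disjoint batch. Row locks last only until the
-- enclosing transaction ends, so claim, publish, and mark inside one transaction.
SELECT id, aggregate_type, aggregate_id, event_type, payload, headers
FROM outbox
WHERE published_at IS NULL
ORDER BY created_at
LIMIT 100
FOR UPDATE SKIP LOCKED;

// Publish each claimed row, then mark it as published.
var batch = await _db.ClaimUnpublishedBatchAsync(100);
foreach (var row in batch)
{
    await _producer.ProduceAsync(
        topic: $"{row.AggregateType}.events",
        key:   row.AggregateId,        // partitioning key
        value: row.Payload);

    // Mark only after the broker accepted the message.
    await _db.MarkPublishedAsync(row.Id);
}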

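This is at-least-once: if the worker crashes after ProduceAsync but before MarkPublishedAsync, the row is reclaimed and republished on restart. Duplicates are expected and handled by the inbox. Polling is simple and has no external dependency, but it costs latency (up to one poll interval) and adds read load against your primary. One caveat: concurrent workers claim disjoint batches, so two events for the same aggregate can be published out of order. If strict per-aggregate ordering matters, run a single relay or shard workers by aggregate_id.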
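
The crash window can be reproduced in a few lines. This Python sketch uses sqlite3 and an in-memory list as a fake broker; SQLite has no SKIP LOCKED, so it models a single relay worker only. `relay_once` and `crash_before_mark` are hypothetical names for the illustration.

```python
import sqlite3


def relay_once(conn, publish, batch_size=100, crash_before_mark=False):
    """Publish one batch of unpublished rows, marking each after its send."""
    rows = conn.execute(
        "SELECT id, aggregate_id, payload FROM outbox "
        "WHERE published_at IS NULL ORDER BY created_at, id LIMIT ?",
        (batch_size,)).fetchall()
    for row_id, key, payload in rows:
        publish(key, payload)                    # broker send comes first
        if crash_before_mark:
            raise RuntimeError("injected crash after publish")
        with conn:                               # mark only after a successful send
            conn.execute(
                "UPDATE outbox SET published_at = datetime('now') WHERE id = ?",
                (row_id,))
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE outbox (id TEXT PRIMARY KEY, aggregate_id TEXT, "
             "payload TEXT, created_at INTEGER, published_at TEXT)")
with conn:
    conn.execute("INSERT INTO outbox VALUES ('e1', 'ORD-1', '{}', 1, NULL)")

sent = []
try:
    relay_once(conn, lambda key, payload: sent.append(key), crash_before_mark=True)
except RuntimeError:
    pass                                         # the worker "restarts" below
relay_once(conn, lambda key, payload: sent.append(key))
```

After the restart the row is still unpublished, so it is sent again: the event is never lost, and the duplicate is left for the consumer to absorb.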

3b. CDC with Debezium (log tailing)

Change-data-capture tails the database’s write-ahead log instead of querying the table. There is no polling, no published_at update, and near-real-time latency. Debezium’s outbox event router SMT reshapes each inserted outbox row into a properly-keyed broker message. With Postgres, enable logical replication first:

# postgresql.conf
wal_level = logical
max_replication_slots = 4
max_wal_senders = 4

Then register the connector against Kafka Connect:

{
  "name": "order-outbox-connector",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "orders-db",
    "database.port": "5432",
    "database.user": "debezium",
    "database.password": "${file:/secrets/db.properties:password}",
    "database.dbname": "orders",
    "topic.prefix": "ordersvc",
    "table.include.list": "public.outbox",
    "plugin.name": "pgoutput",
    "transforms": "outbox",
    "transforms.outbox.type": "io.debezium.transforms.outbox.EventRouter",
    "transforms.outbox.route.by.field": "aggregate_type",
    "transforms.outbox.table.field.event.key": "aggregate_id",
    "transforms.outbox.table.field.event.payload": "payload",
    "transforms.outbox.route.topic.replacement": "${routedByValue}.events"
  }
}

The router uses aggregate_id as the Kafka message key (preserving per-aggregate ordering) and routes to a topic per aggregate_type. Because Debezium reads the WAL, it captures every committed insert exactly as the database serialized it – you get at-least-once delivery with the database log as the durable source of truth, and connector offsets survive restarts.
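
The mapping that this configuration expresses can be written out as a plain function. The Python sketch below is not Debezium's implementation, only a restatement of the routing rule; `route_outbox_row` is a hypothetical name.

```python
def route_outbox_row(row):
    """Map an outbox row to (topic, key, value) per the routing rule above."""
    topic = f"{row['aggregate_type']}.events"   # mirrors "${routedByValue}.events"
    key = row["aggregate_id"]                   # mirrors table.field.event.key
    value = row["payload"]                      # mirrors table.field.event.payload
    return topic, key, value


row = {
    "aggregate_type": "order",
    "aggregate_id": "ORD-10042",
    "event_type": "OrderPlaced",
    "payload": '{"eventType": "OrderPlaced"}',
}
topic, key, value = route_outbox_row(row)
```

Here the row is routed to the `order.events` topic with `ORD-10042` as the message key, which is what keeps every event for that order on one partition.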

Rule of thumb: start with the polling publisher unless you already operate Kafka Connect. CDC is superior at scale and latency, but Debezium plus a replication slot is real infrastructure to own. A neglected, lagging replication slot will pin WAL and eventually fill your disk – monitor it (Step 7).

Step 4 – Build the consumer inbox for exactly-once processing

The broker gives you at-least-once delivery, which means duplicates are guaranteed, not hypothetical. “Exactly-once processing” is achieved on the consumer by making handling idempotent: record every processed eventId and skip repeats. The inbox table is that ledger, and – critically – it lives in the consumer’s own database so the dedup check and the business update commit in one transaction.

CREATE TABLE inbox (
    event_id      UUID         PRIMARY KEY,   -- from envelope.eventId
    consumer      TEXT         NOT NULL,      -- which logical consumer
    processed_at  TIMESTAMPTZ  NOT NULL DEFAULT now()
);

The handler attempts to claim the event by inserting its id. The primary key makes the claim atomic, and a conflict on it tells you the event is a duplicate:

await using var tx = await _db.BeginTransactionAsync();

var claimed = await _db.TryInsertInboxAsync(envelope.EventId, "billing", tx);
if (!claimed)
{
    await tx.RollbackAsync();
    ack();                      // already processed -> ack and move on
    return;
}

await _db.ApplyBillingSideEffectAsync(envelope.Data, tx); // business work
await tx.CommitAsync();         // inbox row + side effect commit together
ack();

TryInsertInboxAsync runs INSERT ... ON CONFLICT (event_id) DO NOTHING and returns whether a row was inserted. Because the inbox insert and the business side effect share one transaction, you can never apply the side effect without recording the dedup key, and you can never record the key without applying the side effect. A redelivery after the commit hits the primary key, is recognized as a duplicate, and is acked without re-running the work.

This is “effectively-once” end to end. The producer is at-least-once via the outbox, the broker is at-least-once, and the consumer collapses duplicates via the inbox. True broker-level exactly-once (Kafka transactions / processing.guarantee=exactly_once_v2) only covers consume-transform-produce within Kafka – it does not extend to your external database. The inbox is what makes the side effect idempotent, so prefer it for any handler that touches an external system.
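
A quick way to convince yourself of the dedup behavior is to redeliver one event several times against a local database. The Python sketch below uses sqlite3, where `INSERT OR IGNORE` plays the role of Postgres's `ON CONFLICT DO NOTHING`; the `ledger` table and `handle` function are hypothetical stand-ins for the billing side effect.

```python
import sqlite3


def handle(conn, event_id, amount_cents, consumer="billing"):
    """Apply the side effect once per event_id; return True if it was applied."""
    with conn:  # inbox row and side effect commit or roll back together
        cur = conn.execute(
            "INSERT OR IGNORE INTO inbox (event_id, consumer) VALUES (?, ?)",
            (event_id, consumer))
        if cur.rowcount == 0:
            return False                # duplicate delivery: nothing to do
        conn.execute(
            "UPDATE ledger SET billed_cents = billed_cents + ?",
            (amount_cents,))
        return True


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inbox (event_id TEXT PRIMARY KEY, consumer TEXT NOT NULL)")
conn.execute("CREATE TABLE ledger (billed_cents INTEGER NOT NULL)")
with conn:
    conn.execute("INSERT INTO ledger VALUES (0)")

# Simulate the broker redelivering the same event ten times.
event_id = "0f7c0b2e-2b1a-4f9e-9b7e-2c8a1d3f4a5b"
results = [handle(conn, event_id, 14999) for _ in range(10)]
billed = conn.execute("SELECT billed_cents FROM ledger").fetchone()[0]
```

Only the first delivery applies the charge; the other nine return early without touching the ledger.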

Step 5 – Preserve per-aggregate ordering and partition keys

Ordering is only ever guaranteed per partition, and only when one aggregate maps to one partition. Always set the message key to aggregate_id:

// Kafka: same key -> same partition -> ordered per aggregate.
await _producer.ProduceAsync("order.events",
    new Message<string, string> { Key = order.Id, Value = payload });

For Azure Service Bus, the equivalent is SessionId. Enable sessions on the queue/subscription and stamp the aggregate id:

var message = new ServiceBusMessage(payload)
{
    SessionId   = order.Id,          // FIFO within a session
    MessageId   = envelope.EventId,  // broker-side dedup hint
    ContentType = "application/json"
};
await _sender.SendMessageAsync(message);

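Two non-obvious traps:

Producer retries can reorder messages: without enable.idempotence=true, a retried batch can land behind a later one when several requests are in flight, so keep idempotence on for the relay's producer.

Repartitioning later is painful: changing partition count rehashes keys and breaks the “one aggregate, one partition” invariant for in-flight ordering. Size partitions generously up front.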
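
The effect of a partition-count change is easy to see with a toy partitioner. The Python sketch below hashes keys with CRC32 purely for illustration (Kafka's default partitioner uses murmur2, so actual partition numbers will differ); `partition_for` and the 12 and 16 partition counts are assumptions for the example.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Illustrative key-to-partition mapping: stable hash modulo partition count."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


keys = [f"ORD-{n}" for n in range(1000)]
before = {k: partition_for(k, 12) for k in keys}   # original topic size
after = {k: partition_for(k, 16) for k in keys}    # after adding partitions

# Keys whose partition changed: their new events no longer queue behind
# the older events still sitting on the previous partition.
moved = sum(1 for k in keys if before[k] != after[k])
```

With a fixed partition count the mapping is deterministic, which is the whole ordering guarantee. Once the count changes, a large share of keys move, and events for those aggregates can be consumed out of order until the old partitions drain.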

Step 6 – Handle poison messages, dead-letters, and replay

Some messages will never succeed – a schema your consumer cannot parse, a referenced entity that was hard-deleted. Retrying them forever blocks the partition and starves healthy traffic. Bound the retries, then dead-letter.

Service Bus has a native dead-letter sub-queue. Cap delivery attempts and the broker moves poison messages aside automatically:

az servicebus queue update \
  --resource-group rg-messaging \
  --namespace-name sb-orders-prod \
  --name billing-commands \
  --max-delivery-count 5

After 5 failed deliveries the message lands in billing-commands/$DeadLetterQueue with the failure reason in its properties, where an operator or a remediation job can inspect and replay it. Kafka has no built-in DLQ, so the convention is an explicit dead-letter topic after a retry budget:

if (attempt >= maxAttempts)
{
    await _producer.ProduceAsync("billing.events.DLT",
        new Message<string, string> { Key = key, Value = payload });
    consumer.Commit(result);   // advance past the poison message
    return;
}
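
The retry budget itself is a small loop. This Python sketch runs in memory with lists in place of a topic and a dead-letter topic; `consume`, `handler`, and the default of three attempts are hypothetical choices, and a real consumer would also add a backoff between attempts.

```python
def consume(messages, handler, max_attempts=3):
    """Process messages in order; park any message that keeps failing."""
    processed, dead_letters = [], []
    for msg in messages:
        for attempt in range(1, max_attempts + 1):
            try:
                handler(msg)
                processed.append(msg)
                break
            except Exception as exc:
                if attempt == max_attempts:
                    # Retry budget exhausted: park the message and move on
                    # so it cannot block the messages behind it.
                    dead_letters.append({"message": msg, "reason": str(exc)})
    return processed, dead_letters


def handler(msg):
    """Fake handler that can never process the 'poison' message."""
    if msg == "poison":
        raise ValueError("cannot parse payload")


processed, dead_letters = consume(["a", "poison", "b"], handler)
```

Recording the failure reason next to the parked message is what makes the dead-letter topic useful to whoever has to triage it later.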

The outbox gives you a second superpower here: replay. Because the outbox retains the original events, recovering from a botched consumer deploy does not require the broker to still hold the messages. Re-emit a window from the source of truth:

-- Re-publish everything an aggregate emitted in an incident window.
SELECT id, aggregate_id, event_type, payload
FROM outbox
WHERE aggregate_type = 'order'
  AND created_at BETWEEN '2026-06-07 00:00Z' AND '2026-06-07 06:00Z'
ORDER BY created_at;

Feed those rows back through the relay. Consumers that already processed them no-op via the inbox, so replay is safe to run broadly – the inbox makes redelivery idempotent by construction.

Step 7 – Observe publish lag, relay backlog, and correlation

You cannot operate this pattern blind. Three signals matter most.

Relay backlog – how many events are stuck in the outbox. A growing backlog means the relay is down or behind:

SELECT count(*) AS unpublished,
       now() - min(created_at) AS oldest_age
FROM outbox
WHERE published_at IS NULL;   -- for CDC, alert on connector lag instead
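
Turning those two numbers into an alert decision might look like the Python sketch below. `backlog_alert` is a hypothetical helper, and the thresholds of 1,000 rows and 5 minutes are placeholders to be replaced by your own SLO.

```python
from datetime import datetime, timedelta, timezone


def backlog_alert(unpublished_created_at, now,
                  max_rows=1000, max_age=timedelta(minutes=5)):
    """Return (should_alert, count, oldest_age) for the unpublished backlog."""
    count = len(unpublished_created_at)
    oldest_age = now - min(unpublished_created_at) if count else timedelta(0)
    should_alert = count > max_rows or oldest_age > max_age
    return should_alert, count, oldest_age


now = datetime(2026, 6, 8, 9, 20, tzinfo=timezone.utc)
created = [now - timedelta(minutes=12), now - timedelta(minutes=1)]
alert, count, oldest_age = backlog_alert(created, now)
```

Alerting on the age of the oldest row as well as the count catches a stalled relay even when traffic is low and the backlog stays small.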

Replication slot lag (CDC only) – a Debezium slot that stops advancing pins WAL and threatens the database. Watch retained bytes:

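SELECT slot_name,
       pg_size_pretty(
         pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)
       ) AS retained_wal,     -- WAL the slot forces Postgres to keep
       pg_size_pretty(
         pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn)
       ) AS unconfirmed_wal   -- WAL the connector has not yet acknowledged
FROM pg_replication_slots
WHERE slot_type = 'logical';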

Consumer lag – standard Kafka offset lag per consumer group, scraped by Prometheus / Kafka Lag Exporter and alerted in your platform. In Azure, query Service Bus active and dead-letter message counts. A KQL alert over the metric:

AzureMetrics
| where ResourceProvider == "MICROSOFT.SERVICEBUS"
| where MetricName == "DeadletteredMessages"
| summarize maxDLQ = max(Maximum) by Resource, bin(TimeGenerated, 5m)
| where maxDLQ > 0

For end-to-end correlation, the traceId in the envelope (Step 2) is the thread that stitches producer span -> relay -> broker -> consumer span into one distributed trace. Restore it on the consumer side so the handler’s spans link back to the originating request:

var parentCtx = Propagators.DefaultTextMapPropagator.Extract(
    default, envelope.Headers, (h, k) => h.TryGetValue(k, out var v) ? new[]{v} : Array.Empty<string>());
using var activity = _activitySource.StartActivity(
    "process OrderPlaced", ActivityKind.Consumer, parentCtx.ActivityContext);

Now a single trace shows the order commit, the relay hop, and every downstream handler – including publish lag as the gap between producer and consumer spans.
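
For reference, the W3C `traceparent` header that carries this context has four dash-separated hex fields: version, trace id, parent span id, and flags. The Python sketch below builds and parses one. It is simplified (it does not reject all-zero ids, which the specification treats as invalid), and the span id used is an arbitrary example value.

```python
import re

_TRACEPARENT = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def build_traceparent(trace_id: str, span_id: str, sampled: bool = True) -> str:
    """Format a version-00 traceparent header value."""
    return f"00-{trace_id}-{span_id}-{'01' if sampled else '00'}"


def parse_traceparent(value: str):
    """Return (trace_id, parent_span_id, sampled), or None if malformed."""
    match = _TRACEPARENT.match(value)
    if not match:
        return None
    _version, trace_id, span_id, flags = match.groups()
    return trace_id, span_id, bool(int(flags, 16) & 0x01)


header = build_traceparent("4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7")
parsed = parse_traceparent(header)
```

The trace id stays constant across every hop, while each hop supplies its own span id as the parent for the next one.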

Step 8 – Operational runbook

The outbox grows forever unless you prune it. Decoupling cleanup from publish matters: never delete on publish, because you want a retention window for replay and audit. Run a separate retention job:

-- Nightly: delete published events older than the replay window.
DELETE FROM outbox
WHERE published_at IS NOT NULL
  AND published_at < now() - interval '7 days';

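For CDC (where published_at is never set), retain by created_at and tune the window to cover your worst-case incident replay. On hot tables, delete in batches so you never hold a long lock or hand autovacuum one huge burst of dead rows. Postgres DELETE has no LIMIT clause, so each batch is a DELETE ... WHERE id IN (SELECT id ... LIMIT n), run in a loop until no rows are affected.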
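
A batched prune can be sketched as a loop of small transactions. The Python example below runs against sqlite3 with integer timestamps for brevity; `prune_outbox`, the cutoff value, and the batch size are hypothetical, and the same `id IN (SELECT ... LIMIT n)` shape carries over to Postgres.

```python
import sqlite3


def prune_outbox(conn, cutoff, batch_size=1000):
    """Delete published rows older than cutoff in batches; return rows deleted."""
    total = 0
    while True:
        with conn:  # one short transaction per batch
            cur = conn.execute(
                "DELETE FROM outbox WHERE id IN ("
                "  SELECT id FROM outbox"
                "  WHERE published_at IS NOT NULL AND published_at < ?"
                "  LIMIT ?)",
                (cutoff, batch_size))
        if cur.rowcount == 0:
            return total
        total += cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE outbox (id INTEGER PRIMARY KEY, published_at INTEGER)")
with conn:
    # 25 old published rows, 3 recent published rows, 5 unpublished rows.
    conn.executemany(
        "INSERT INTO outbox (published_at) VALUES (?)",
        [(100,)] * 25 + [(900,)] * 3 + [(None,)] * 5)

deleted = prune_outbox(conn, cutoff=500, batch_size=10)
remaining = conn.execute("SELECT count(*) FROM outbox").fetchone()[0]
```

Unpublished rows are never touched, whatever their age, so a stalled relay cannot cause the retention job to discard events that have not shipped yet.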

Failure recovery checklist for an on-call engineer:

| Symptom | Likely cause | Action |
|---|---|---|
| Outbox backlog climbing | Relay worker / connector down | Restart relay; confirm broker reachable; backlog drains automatically (at-least-once) |
| retained_wal growing | Debezium connector stalled, or removed while its slot remains | Restart connector; if the connector is gone for good, drop the orphaned slot before WAL fills the disk |
| Duplicate side effects | Inbox bypassed or wrong dedup key | Verify handler keys on eventId and inbox insert shares the business transaction |
| Out-of-order processing | Key not set to aggregate_id, or idempotence off | Fix producer key; set enable.idempotence=true (Kafka) or SessionId (Service Bus) |
| Messages in DLQ | Poison payload / missing dependency | Inspect dead-letter reason; fix consumer; replay from outbox or DLQ |

Enterprise scenario

A payments platform team I worked with ran a ledger service on Postgres and published PaymentCaptured to Kafka with a plain dual write. During a routine broker rolling restart, the producer’s publish calls timed out after the ledger had committed. The application’s retry then succeeded, but for thousands of payments the first attempt had also reached the broker before the client-side timeout fired. Downstream, the reconciliation service double-counted captures, and finance opened a severity-1 when the daily settlement file overstated revenue by a six-figure sum.

The constraint that shaped the fix: reconciliation was a third-party system they could not modify to be idempotent, and regulators required a replayable audit trail of every emitted event for seven years. They could not simply “turn on Kafka exactly-once” – it would not have covered the external reconciler, which was the actual point of duplication.

They moved to outbox plus CDC. The ledger write and the PaymentCaptured envelope now commit in one Postgres transaction; Debezium tails the WAL and keys each message on payment_id. For the unmodifiable reconciler, they inserted a thin idempotent inbox shim in front of it, keyed on the envelope eventId:

INSERT INTO reconciler_inbox (event_id, processed_at)
VALUES (@eventId, now())
ON CONFLICT (event_id) DO NOTHING;
-- forward to reconciler only when this INSERT actually inserted a row

The shim forwards a payment to reconciliation only when the insert affected a row, collapsing every Kafka redelivery into a single forward. The seven-year audit requirement was satisfied by retaining the outbox in cold storage instead of a 7-day prune. Duplicate settlements went to zero, and the next broker restart – which previously meant an incident – passed unnoticed, because at-least-once delivery plus inbox dedup is exactly what the pattern is built to survive.

Verify

Prove the pattern works before you trust it in production:

  1. Atomicity. Kill the producer process between the business save and the commit (inject a fault). Confirm neither the order row nor the outbox entry exists, which shows the transaction rolled back cleanly.
  2. At-least-once relay. Crash the relay worker right after ProduceAsync but before MarkPublishedAsync. On restart, confirm the event is republished and no event is lost.
  3. Exactly-once processing. Manually re-deliver the same eventId to a consumer 10 times. Confirm exactly one inbox row, one side effect, and 9 fast no-op acks.
  4. Ordering. Emit 1,000 events for a single aggregate_id and assert the consumer observes them in created_at order despite running multiple consumer threads.
  5. Replay. Run the Step 6 replay query over a past window into a fresh consumer; confirm correct state with no duplicates (inbox absorbs prior processing).
  6. Backlog alarm. Stop the relay and confirm the backlog query and your alert fire within the SLO; restart and confirm the backlog drains to zero.
