Designing Idempotent APIs and Deduplication for Reliable Distributed Systems

Every reliable distributed system I have reviewed eventually duplicates a write, and the team is always surprised. They shouldn’t be. The moment you put a network between a client and your API – or a broker between a producer and a consumer – retries become mandatory, and retries make duplicate delivery a certainty, not an edge case. A client times out at 10 seconds, retries, and your server processes both requests. A consumer crashes after handling a message but before acking, and the broker redelivers. The fix is not “retry less.” It is to make the operation idempotent: executing it once or five times produces the same observable result and the same side effects. This article builds that property end to end – idempotency keys, a deduplication store, concurrency control, conditional writes, and consumer-side ledgers – with the failure modes that bite teams in production.

Step 1 – Accept that at-least-once is the only honest contract

Start with the uncomfortable truth: there is no exactly-once delivery over an unreliable network. The Two Generals problem guarantees it. If a client sends a request and the response is lost, the client cannot distinguish “the server never received it” from “the server processed it but the ack vanished.” Its only safe move is to retry.

Figure: End-to-end architecture for idempotent writes and consumer deduplication. A client mints one idempotency key and reuses it across retries; the synchronous HTTP write path makes an atomic idempotency-store claim and uses conditional writes; calls to the external card processor are keyed deterministically and fail closed; async consumers dedup on a processed-message ledger written in the same transaction as the side effect.

The retry that causes the duplicate looks like this – the client cannot tell a lost response from an unprocessed request, so it resends and the server charges twice:

Client                         Server
  | --- POST /payments ----------> |  (1) processed, charged $50
  |                                |
  |     X  response lost           |  (2) ack never arrives
  |                                |
  | --- POST /payments (retry) --> |  (3) processed AGAIN, charged $50 again

So delivery is at-least-once whether you like it or not. The pattern is: at-least-once delivery + idempotent processing = effectively-once results. You cannot remove the duplication; you can only make it harmless. Walk through the sources of retries so you account for all of them:

Client libraries. SDKs like the AWS SDK, gRPC, and most HTTP clients retry on connection resets and 5xx by default.
Proxies and gateways. Load balancers and API gateways reissue requests on timeouts.
Message brokers. Kafka, RabbitMQ, Azure Service Bus, and SQS all redeliver when an ack/commit is missing. Their native contract is at-least-once.
Humans. A user double-clicks “Pay.” A support engineer reruns a batch job.

Do not try to make a non-idempotent operation safe by tightening timeouts or reducing retry counts. That only changes the probability of a duplicate, never the possibility. Design for the duplicate that will happen, because at scale “one in a million” is several times a day.
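The equation "at-least-once delivery + idempotent processing = effectively-once results" can be illustrated with a toy simulation. This is only a sketch: PaymentServer and send_with_retry are hypothetical names, and the "network" is a loop that drops the first response.

```python
import itertools

class PaymentServer:
    """Toy server: every real side effect is appended to self.charges."""
    def __init__(self, idempotent: bool):
        self.idempotent = idempotent
        self.charges = []   # side effects that actually happened
        self.seen = {}      # idempotency key -> cached response

    def post_payment(self, key: str, amount: int) -> dict:
        if self.idempotent and key in self.seen:
            return self.seen[key]          # replay: no new side effect
        self.charges.append(amount)
        response = {"charge_no": len(self.charges), "amount": amount}
        self.seen[key] = response
        return response

def send_with_retry(server, key, amount, lose_first_n_responses):
    """The server always processes the request; early responses are 'lost'."""
    for attempt in itertools.count(1):
        response = server.post_payment(key, amount)
        if attempt > lose_first_n_responses:
            return response                # this response finally arrives
        # Otherwise the client times out and retries with the same key.

naive = PaymentServer(idempotent=False)
send_with_retry(naive, "k1", 50, lose_first_n_responses=1)
print(len(naive.charges))   # 2: the retry charged again

safe = PaymentServer(idempotent=True)
send_with_retry(safe, "k1", 50, lose_first_n_responses=1)
print(len(safe.charges))    # 1: the retry was a harmless replay
```

The retry happens in both runs; only the second server makes it harmless.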

Step 2 – Decide which operations even need an idempotency key

Not everything does. Map your operations to HTTP semantics first, because the method already tells you the contract.

Method	Idempotent by spec?	Notes
`GET`, `HEAD`	Yes	Read-only. Safe to retry freely.
`PUT`	Yes	Full replacement – same body, same result. Use natural idempotency (Step 6).
`DELETE`	Yes	Deleting twice leaves the resource deleted. Return 204 on the second call.
`POST`	No	Creates a new resource each call. This is where you need keys.
`PATCH`	No (usually)	Partial mutations like “increment balance” are not naturally idempotent.

The hard cases are POST and non-idempotent PATCH: “create an order,” “charge a card,” “increment a counter.” Those carry a client-generated idempotency key. The well-known reference is Stripe’s Idempotency-Key header, which is now a de facto standard and the subject of an IETF draft (draft-ietf-httpapi-idempotency-key-header). We will model on it.

Step 3 – Generate the key on the client, scope it on the server

The key must originate on the client, before the first attempt, and be reused for every retry of that same logical operation. If the server generates it, a retried request gets a new key and you have learned nothing.

# Client generates a UUIDv4 once, reuses it across retries of the SAME intent.
IDEM_KEY=$(uuidgen)

curl -sS -X POST https://api.kloudvin.io/v1/payments \
  -H "Authorization: Bearer $TOKEN" \
  -H "Idempotency-Key: $IDEM_KEY" \
  -H "Content-Type: application/json" \
  -d '{"amount": 5000, "currency": "usd", "order_id": "ORD-10042"}'
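In application code the same rule applies: mint the key outside the retry loop. The sketch below assumes a caller-supplied send function and a hypothetical TransientError standing in for timeouts and connection resets; backoff sleeps are omitted to keep it short.

```python
import uuid

class TransientError(Exception):
    """Hypothetical stand-in for a timeout or connection reset."""

def post_with_retries(send, payload: dict, max_attempts: int = 4):
    idem_key = str(uuid.uuid4())     # minted ONCE, before the first attempt
    last_error = None
    for _ in range(max_attempts):
        try:
            # Every attempt of this logical operation carries the same key.
            return send(headers={"Idempotency-Key": idem_key}, json=payload)
        except TransientError as exc:
            last_error = exc         # retry with the SAME key
    raise last_error

# Demo with a fake transport that fails twice, then succeeds.
seen_keys = []

def flaky_send(headers, json):
    seen_keys.append(headers["Idempotency-Key"])
    if len(seen_keys) < 3:
        raise TransientError("simulated timeout")
    return {"status": 201}

result = post_with_retries(flaky_send, {"amount": 5000, "currency": "usd"})
print(result, len(set(seen_keys)))   # three attempts, one distinct key
```

Moving the uuid4() call inside the loop would reproduce the bug described above: every retry would look like a brand-new operation.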

Two server-side rules make this robust:

Scope the key. An idempotency record is unique per (authenticated_principal, route, idempotency_key). Never globally. A leaked or guessed key from tenant A must never collide with tenant B, and the same key on POST /payments versus POST /refunds are different operations.
Fingerprint the request. Store a hash of the canonical request body alongside the key. If a client reuses a key but sends a different body, that is a client bug – reject it with 422 Unprocessable Entity rather than silently returning the first response. This catches the nasty class where a key gets reused for two genuinely different charges.

import hashlib, json

def request_fingerprint(method: str, path: str, body: dict) -> str:
    # Canonicalize: sorted keys, no insignificant whitespace, stable separators.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    material = f"{method}\n{path}\n{canonical}".encode("utf-8")
    return hashlib.sha256(material).hexdigest()
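A quick check of what canonicalization buys you. The function is repeated here only so the snippet runs on its own; the sample bodies are illustrative.

```python
import hashlib
import json

def request_fingerprint(method: str, path: str, body: dict) -> str:
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    material = f"{method}\n{path}\n{canonical}".encode("utf-8")
    return hashlib.sha256(material).hexdigest()

# Same logical body, different key order: must fingerprint identically.
a = request_fingerprint("POST", "/v1/payments", {"amount": 5000, "currency": "usd"})
b = request_fingerprint("POST", "/v1/payments", {"currency": "usd", "amount": 5000})
# Different amount: must NOT match, which is what triggers the 422.
c = request_fingerprint("POST", "/v1/payments", {"amount": 9999, "currency": "usd"})

print(a == b)   # True
print(a == c)   # False
```

Without sort_keys, two clients serializing the same payload in a different field order would be rejected as a key reuse.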

Step 4 – Build the idempotency store with response caching and TTLs

The store records, for each scoped key: the request fingerprint, a state, the cached response, and an expiry. When a request arrives, you either start fresh, return the cached response, or signal “in progress.”

CREATE TABLE idempotency_keys (
    principal_id     TEXT        NOT NULL,
    route            TEXT        NOT NULL,
    idem_key         TEXT        NOT NULL,
    fingerprint      TEXT        NOT NULL,          -- sha256 of canonical request
    state            TEXT        NOT NULL,          -- 'in_progress' | 'completed'
    response_status  INT,                           -- cached HTTP status
    response_body    JSONB,                         -- cached response payload
    created_at       TIMESTAMPTZ NOT NULL DEFAULT now(),
    expires_at       TIMESTAMPTZ NOT NULL,
    PRIMARY KEY (principal_id, route, idem_key)
);

-- Reap expired keys so the table does not grow unbounded.
CREATE INDEX idx_idem_expiry ON idempotency_keys (expires_at);

The control flow, expressed as a single atomic insert-or-detect:

import json
from datetime import datetime, timedelta, timezone

TTL = timedelta(hours=24)

def handle_idempotent(conn, principal, route, key, fingerprint, do_work):
    # Assumes a psycopg 3 connection in autocommit mode with a namedtuple row
    # factory: the claim must be committed (and visible to concurrent
    # requests) before do_work() runs, and rows are read by attribute below.
    now = datetime.now(timezone.utc)
    # Atomic claim: succeeds only if no row exists for this scoped key.
    row = conn.execute("""
        INSERT INTO idempotency_keys
            (principal_id, route, idem_key, fingerprint, state, expires_at)
        VALUES (%s, %s, %s, %s, 'in_progress', %s)
        ON CONFLICT (principal_id, route, idem_key) DO NOTHING
        RETURNING state
    """, (principal, route, key, fingerprint, now + TTL)).fetchone()

    if row is None:
        # A record already exists. Read it and branch.
        existing = conn.execute("""
            SELECT fingerprint, state, response_status, response_body
            FROM idempotency_keys
            WHERE principal_id=%s AND route=%s AND idem_key=%s
        """, (principal, route, key)).fetchone()

        if existing.fingerprint != fingerprint:
            return 422, {"error": "idempotency_key_reused_with_different_body"}
        if existing.state == "completed":
            # The happy replay: return the original response verbatim.
            return existing.response_status, existing.response_body
        # state == 'in_progress' -> a concurrent duplicate is mid-flight.
        return 409, {"error": "request_in_progress", "retry_after": 1}

    # We won the claim. Execute the real work, then persist the response.
    # If do_work() raises, the row stays 'in_progress' until it is reaped,
    # so do_work must itself be safe to re-run (Step 6).
    status, body = do_work()
    conn.execute("""
        UPDATE idempotency_keys
        SET state='completed', response_status=%s, response_body=%s
        WHERE principal_id=%s AND route=%s AND idem_key=%s
    """, (status, json.dumps(body), principal, route, key))
    return status, body
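To exercise the four outcomes (fresh, replay, body mismatch, in progress) without a Postgres instance, here is a sketch of the same control flow against in-memory SQLite. This is a substitution for illustration, not the production store: open_store and handle are hypothetical names, INSERT OR IGNORE plays the role of ON CONFLICT DO NOTHING, and the expiry column is left out.

```python
import json
import sqlite3

def open_store():
    conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit
    conn.execute("""CREATE TABLE idempotency_keys (
        principal_id TEXT, route TEXT, idem_key TEXT, fingerprint TEXT,
        state TEXT, response_status INT, response_body TEXT,
        PRIMARY KEY (principal_id, route, idem_key))""")
    return conn

def handle(conn, principal, route, key, fingerprint, do_work):
    scope = (principal, route, key)
    # Atomic claim: rowcount is 1 only for the caller that created the row.
    claimed = conn.execute(
        "INSERT OR IGNORE INTO idempotency_keys "
        "(principal_id, route, idem_key, fingerprint, state) "
        "VALUES (?, ?, ?, ?, 'in_progress')", scope + (fingerprint,)
    ).rowcount == 1
    if not claimed:
        fp, state, status, body = conn.execute(
            "SELECT fingerprint, state, response_status, response_body "
            "FROM idempotency_keys "
            "WHERE principal_id=? AND route=? AND idem_key=?", scope
        ).fetchone()
        if fp != fingerprint:
            return 422, {"error": "idempotency_key_reused_with_different_body"}
        if state == "completed":
            return status, json.loads(body)      # replay the cached response
        return 409, {"error": "request_in_progress", "retry_after": 1}
    status, body = do_work()
    conn.execute(
        "UPDATE idempotency_keys SET state='completed', "
        "response_status=?, response_body=? "
        "WHERE principal_id=? AND route=? AND idem_key=?",
        (status, json.dumps(body)) + scope)
    return status, body

store = open_store()
charges = []

def charge():
    charges.append(5000)
    return 201, {"payment_id": "pay_1"}

first = handle(store, "tenant-7", "POST /payments", "k1", "fp-a", charge)
replay = handle(store, "tenant-7", "POST /payments", "k1", "fp-a", charge)
mismatch = handle(store, "tenant-7", "POST /payments", "k1", "fp-b", charge)

# Simulate another worker that claimed key k2 and has not finished yet.
store.execute("INSERT INTO idempotency_keys "
              "(principal_id, route, idem_key, fingerprint, state) "
              "VALUES ('tenant-7', 'POST /payments', 'k2', 'fp-a', 'in_progress')")
busy = handle(store, "tenant-7", "POST /payments", "k2", "fp-a", charge)

print(first[0], replay[0], mismatch[0], busy[0], len(charges))
```

Four calls, four different outcomes, and the charge function ran exactly once.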

TTL choice matters. Stripe retains idempotency results for 24 hours, which comfortably outlives any reasonable client retry window while bounding storage. Pick a TTL longer than your longest client retry budget (including exponential backoff and any human-initiated retries), but short enough that the table stays small. 24 hours is a sane default; go to 7 days only if you have offline batch retries.
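A rough way to sanity-check the TTL against an automated retry budget. The retry parameters below are illustrative assumptions, retry_budget is a hypothetical helper, and jitter and human-initiated retries are ignored, so treat the result as a lower bound.

```python
from datetime import timedelta

def retry_budget(max_attempts, base_delay_s, max_delay_s, per_attempt_timeout_s):
    """Worst-case time from the first attempt to the end of the last one."""
    # Exponential backoff between attempts, capped at max_delay_s.
    waits = [min(base_delay_s * 2 ** i, max_delay_s)
             for i in range(max_attempts - 1)]
    return timedelta(seconds=sum(waits) + max_attempts * per_attempt_timeout_s)

TTL = timedelta(hours=24)
budget = retry_budget(max_attempts=6, base_delay_s=1, max_delay_s=60,
                      per_attempt_timeout_s=30)

print(budget)         # 0:03:31
print(TTL > budget)   # True
```

For a typical SDK retry policy the automated budget is minutes, so 24 hours is mostly headroom for users and operators who retry by hand.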

Step 5 – Handle concurrent duplicates with locks and conflict responses

Step 4 already closes the most dangerous race using the database primary key as a lock: two simultaneous requests with the same key both run the INSERT ... ON CONFLICT DO NOTHING, but only one row gets created, so only one caller proceeds. The loser sees in_progress and gets a 409.

That 409 is deliberate. The alternative – blocking the second request until the first finishes, then returning its cached response – is tempting but dangerous: if the first request hangs, you tie up a connection and can cascade into thread-pool exhaustion. Returning 409 Conflict with a Retry-After keeps the server responsive and pushes the wait to the client, where backoff already lives.
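On the client side, handling that 409 might look like the sketch below. call_until_settled is a hypothetical helper; it assumes the response body carries the retry_after field shown in Step 4, and it takes the sleep function as a parameter so the demo runs instantly.

```python
import time

def call_until_settled(send, max_wait_s: float = 10.0, sleep=time.sleep):
    """Re-issue the same request while the server reports it is in progress."""
    waited = 0.0
    while True:
        status, body = send()
        if status != 409 or body.get("error") != "request_in_progress":
            return status, body          # settled: success, replay, or real error
        delay = float(body.get("retry_after", 1))
        if waited + delay > max_wait_s:
            return status, body          # budget exhausted: surface the 409
        sleep(delay)
        waited += delay

# Demo: two in-progress answers, then the cached result of the first request.
responses = iter([
    (409, {"error": "request_in_progress", "retry_after": 1}),
    (409, {"error": "request_in_progress", "retry_after": 1}),
    (201, {"payment_id": "pay_1"}),
])
sleeps = []
result = call_until_settled(lambda: next(responses), sleep=sleeps.append)
print(result, sleeps)
```

The waiting lives entirely in the client, which is the point of returning 409 instead of blocking a server thread.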

If you front the store with Redis for latency, use an atomic SET NX with an expiry as the lock, and fall back to the durable store as the source of truth:

# Atomic lock acquisition: set only if absent, auto-expire to avoid deadlock
# if the holder crashes. PX is the lock TTL in milliseconds.
redis-cli SET "idem:{tenant-7}:payments:8f3c...e1" "in_progress" NX PX 30000
# -> "OK"  means you own the operation
# -> (nil) means someone else is mid-flight; return 409

A crashed worker holding a non-expiring lock is the classic deadlock. Always set an expiry on the lock, longer than your worst-case processing time but shorter than the idempotency record's TTL. Redis locks are an optimization for the common path, never the durability guarantee – the database row is what survives a Redis flush.
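The lock semantics can be modelled without a Redis server. The class below is a hypothetical in-memory stand-in that mimics SET key value NX PX ttl, with an explicit clock argument so expiry is deterministic; it is meant to show the behavior, not to replace Redis.

```python
class InMemoryLocks:
    """Mimics 'SET key token NX PX ttl_ms' with an injectable clock."""
    def __init__(self):
        self._locks = {}   # key -> (owner_token, expires_at_ms)

    def set_nx_px(self, key, token, ttl_ms, now_ms):
        held = self._locks.get(key)
        if held is not None and held[1] > now_ms:
            return False                       # someone else is mid-flight
        self._locks[key] = (token, now_ms + ttl_ms)
        return True                            # absent or expired: we own it

    def release(self, key, token):
        # Release only if we still own the lock; it may have expired and
        # been re-acquired by another worker in the meantime.
        held = self._locks.get(key)
        if held is not None and held[0] == token:
            del self._locks[key]
            return True
        return False

locks = InMemoryLocks()
key = "idem:tenant-7:payments:k1"
a_first = locks.set_nx_px(key, "worker-a", 30000, now_ms=0)      # acquired
b_early = locks.set_nx_px(key, "worker-b", 30000, now_ms=1000)   # rejected
b_late = locks.set_nx_px(key, "worker-b", 30000, now_ms=30001)   # a's lock expired
a_release = locks.release(key, "worker-a")                       # not the owner now
print(a_first, b_early, b_late, a_release)   # True False True False
```

The last line is the reason to store an owner token as the value: a slow worker must not release a lock that has since passed to someone else.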

Step 6 – Prefer natural idempotency: conditional writes and upserts

Idempotency keys are necessary when an operation has no natural identity. But many “POST” operations are secretly PUTs in disguise, and you can get idempotency for free by writing against a unique constraint or a version precondition. Reach for this first; it has no extra store to maintain.

Unique constraints (upsert). If the resource has a natural or client-supplied identity, let the database enforce single-creation:

INSERT INTO orders (order_id, customer_id, total_cents, status)
VALUES ('ORD-10042', 'CUST-88', 5000, 'placed')
ON CONFLICT (order_id) DO NOTHING
RETURNING order_id;
-- Second call returns no row -> already created -> respond 200 with existing order.

Conditional writes (compare-and-set). For updates, gate the mutation on the current state so a replay is a no-op. This is ETag/If-Match at the HTTP layer and optimistic concurrency at the data layer:

-- Move to 'shipped' ONLY if currently 'paid'. A duplicate that arrives after
-- the first success matches zero rows and changes nothing.
UPDATE orders
SET status='shipped', version = version + 1
WHERE order_id='ORD-10042' AND status='paid' AND version=7;

Cloud datastores expose the same primitive natively. DynamoDB conditional expressions and Azure Cosmos DB optimistic concurrency via If-Match on the _etag let you make a write succeed at most once without a separate dedup table:

aws dynamodb put-item \
  --table-name Orders \
  --item '{"order_id":{"S":"ORD-10042"},"status":{"S":"placed"}}' \
  --condition-expression "attribute_not_exists(order_id)"
# Returns ConditionalCheckFailedException on a duplicate -> treat as success.

The lesson: a PATCH /balance that says “increment by 100” is not idempotent, but “set version 7 to 8 with balance 600” is. Whenever you can express the write as a conditional state transition, do that instead of an additive one – it is idempotent by construction.
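The difference is easy to demonstrate by applying each write twice. The sketch uses in-memory SQLite and a hypothetical accounts table; the numbers mirror the example above (version 7, balance 500 to 600).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INT, version INT)")
conn.execute("INSERT INTO accounts VALUES ('A', 500, 7), ('B', 500, 7)")

def increment(account_id, amount):
    # Additive write: every replay changes the state again.
    conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                 (amount, account_id))

def transition(account_id, expected_version, new_balance):
    # Conditional write: applies only if the row is still at expected_version.
    cur = conn.execute(
        "UPDATE accounts SET balance = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_balance, account_id, expected_version))
    return cur.rowcount == 1

for _ in range(2):                 # the same request delivered twice
    increment("A", 100)
applied = [transition("B", 7, 600) for _ in range(2)]

balances = dict(conn.execute("SELECT id, balance FROM accounts"))
print(balances)   # {'A': 700, 'B': 600}
print(applied)    # [True, False]
```

Account A absorbed the duplicate as a second increment; account B rejected it because the version precondition no longer held.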

Step 7 – Deduplicate message consumers with a processed-message ledger

Brokers deliver at-least-once, so the consumer is where you enforce single processing. The pattern mirrors the HTTP store: a processed-message ledger keyed by a stable message identity, written in the same transaction as the side effect.

The non-negotiable detail is that the ledger insert and the business write commit atomically. If you process the message, then separately mark it processed, a crash in between gives you reprocessing.

BEGIN;

-- 1. Claim the message id. If it already exists, this violates the PK and
--    the whole transaction rolls back -> we know it is a duplicate, skip it.
INSERT INTO processed_messages (message_id, consumer_group, processed_at)
VALUES ('msg-7f3c-...-e1', 'inventory-svc', now());

-- 2. The actual side effect, in the SAME transaction.
UPDATE inventory SET reserved = reserved + 5 WHERE sku = 'SKU-99';

COMMIT;
-- Only after COMMIT do we ack/commit the offset to the broker.

Two rules govern the message identity:

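Use a stable, producer-assigned id, carried in the message envelope – never a broker-assigned value. A RabbitMQ delivery tag changes on redelivery, and a Kafka offset, although stable across redeliveries of one record, differs when a producer retry writes the same logical message twice. A UUID minted by the producer (or the business aggregate id for ordered streams) is correct.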
Order the operations as: process-in-transaction, commit, then ack. Ack last. If you ack before committing and then crash, the message is gone but the work never landed – the worst outcome.
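Put together, a consumer handler following both rules could look like this sketch. It uses in-memory SQLite instead of a production database, and handle_message and the ack callback are hypothetical names rather than any broker client's API.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE processed_messages (
        message_id TEXT, consumer_group TEXT,
        PRIMARY KEY (message_id, consumer_group));
    CREATE TABLE inventory (sku TEXT PRIMARY KEY, reserved INT);
    INSERT INTO inventory VALUES ('SKU-99', 0);
""")

def handle_message(message, ack):
    try:
        with conn:   # one transaction: commit on success, rollback on error
            # 1. Claim the message id; a duplicate violates the primary key.
            conn.execute("INSERT INTO processed_messages VALUES (?, ?)",
                         (message["message_id"], "inventory-svc"))
            # 2. The side effect, in the SAME transaction as the claim.
            conn.execute(
                "UPDATE inventory SET reserved = reserved + ? WHERE sku = ?",
                (message["qty"], message["sku"]))
        outcome = "processed"
    except sqlite3.IntegrityError:
        outcome = "duplicate"     # already handled: skip the work
    ack(message["message_id"])    # ack LAST, after the commit (or the skip)
    return outcome

acks = []
msg = {"message_id": "msg-1", "sku": "SKU-99", "qty": 5}
outcomes = [handle_message(msg, acks.append) for _ in range(2)]   # redelivery
reserved = conn.execute(
    "SELECT reserved FROM inventory WHERE sku = 'SKU-99'").fetchone()[0]
print(outcomes, reserved)   # ['processed', 'duplicate'] 5
```

The duplicate is still acked, otherwise the broker would keep redelivering a message that has already been applied.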

For high-throughput streams where a relational ledger is too slow, Kafka offers the transactional/idempotent producer (enable.idempotence=true, transactional.id) to dedup on the produce side, and consumers use read_committed isolation. That handles producer retries within Kafka, but cross-system side effects (writing to your database) still need the ledger above – Kafka’s exactly-once is internal to Kafka.

# Producer side: dedup retries and enable transactions within Kafka.
enable.idempotence=true
transactional.id=order-producer-1
acks=all
# Consumer side: only see messages from committed transactions.
isolation.level=read_committed

Step 8 – Carry idempotency end to end through a payment/order flow

The keys must propagate, or each hop reintroduces duplication. Tie the whole chain to a single correlation id plus per-operation idempotency keys.

POST /orders            Idempotency-Key: K1   correlation_id: C1
   -> creates order (upsert on order_id)         [idempotent: Step 6]
   -> publishes OrderPlaced { message_id: M1, correlation_id: C1 }

payment-svc consumes M1                          [ledger dedup: Step 7]
   -> calls Stripe with Idempotency-Key derived from order_id + attempt
   -> publishes PaymentCaptured { message_id: M2, correlation_id: C1 }

fulfilment-svc consumes M2                        [ledger dedup: Step 7]
   -> conditional UPDATE status='paid'->'shipped' [idempotent: Step 6]

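Derive the downstream payment idempotency key deterministically from durable business data, so a replay of OrderPlaced regenerates the same key and Stripe dedups the charge. That means payment_attempt must be a persisted value that advances only on a deliberate new attempt (for example, the customer retrying after a decline), never a delivery or retry counter: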

# Deterministic: the same order always yields the same payment idem key,
# so a redelivered OrderPlaced cannot double-charge.
payment_idem_key = f"pay:{order_id}:{payment_attempt}"

The anti-pattern is generating a fresh UUID at each hop. If payment-svc mints a random key every time it consumes OrderPlaced, a redelivery produces a new key and Stripe sees a brand-new charge. Determinism from durable inputs is what makes idempotency survive redelivery. The correlation_id (C1) is for tracing the whole flow; the per-step idempotency keys are what enforce single execution.
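The contrast between the two approaches, as a small sketch. The uuid5 variant is an optional extra for downstream APIs that expect UUID-shaped keys; the namespace name is a made-up placeholder.

```python
import uuid

# Hypothetical namespace for this service; any fixed UUID works.
PAYMENT_NS = uuid.uuid5(uuid.NAMESPACE_DNS, "payments.example.invalid")

def payment_idem_key(order_id: str, payment_attempt: int) -> str:
    return f"pay:{order_id}:{payment_attempt}"

def payment_idem_uuid(order_id: str, payment_attempt: int) -> str:
    # uuid5 is name-based: the same inputs always produce the same UUID.
    return str(uuid.uuid5(PAYMENT_NS, payment_idem_key(order_id, payment_attempt)))

# Deterministic: first delivery and redelivery of OrderPlaced agree.
first_delivery = payment_idem_uuid("ORD-10042", 1)
redelivery = payment_idem_uuid("ORD-10042", 1)

# Anti-pattern: a random key per delivery never matches itself.
random_first = str(uuid.uuid4())
random_redelivery = str(uuid.uuid4())

print(first_delivery == redelivery)        # True
print(random_first == random_redelivery)   # False
```

A deliberate second attempt (payment_attempt=2) yields a different key, which is exactly the case where a new charge is intended.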

Step 9 – Engineer for the failure modes that actually bite

The pattern fails in specific, predictable ways. Close each one explicitly.

Partial writes. The side effect commits but the idempotency record does not (or vice versa). Mitigation: write the idempotency/ledger row in the same transaction as the side effect (Step 7). For the HTTP store where the work touches an external system, persist the response only after the work succeeds, and make the work itself idempotent (Step 6) so a re-run is safe.
Key collisions. Two distinct operations end up with the same key (client bug, weak key generation). Mitigation: fingerprint the body (Step 3) and reject reuse-with-different-body via 422; scope keys per principal and route.
Store outages. Your Redis or dedup database is down. Decide the policy before the incident: fail closed for money-movement (reject the write rather than risk a double-charge) and fail open only for operations that are naturally idempotent at the data layer (Step 6), where a duplicate is harmless. Make this an explicit, documented decision per endpoint, not an accident of which exception handler fires.
TTL expiry mid-retry. A client retries after the record expired; you treat it as new and double-execute. Mitigation: set the TTL strictly longer than the maximum client retry budget, and lean on natural idempotency (unique constraints) as the backstop that outlives any TTL.
Non-deterministic responses. You cache a response containing a freshly generated id or timestamp, and the replay returns stale data. That is usually fine and even desirable (the replay should mirror the original), but be deliberate: never cache responses that embed short-lived secrets or one-time tokens.
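The store-outage policy in particular is easier to keep honest when it is data, not scattered exception handlers. A minimal sketch, assuming hypothetical route names, a StoreUnavailable exception raised by the claim step, and a default of fail closed for anything unlisted:

```python
class StoreUnavailable(Exception):
    """Hypothetical error raised when the idempotency store cannot be reached."""

# Explicit, reviewable decision per endpoint.
FAIL_POLICY = {
    "POST /payments": "closed",     # money movement: never charge blind
    "PUT /orders/{id}": "open",     # naturally idempotent at the data layer
}

def guarded(route, claim, do_work):
    try:
        claim()
    except StoreUnavailable:
        if FAIL_POLICY.get(route, "closed") == "closed":
            return 503, {"error": "idempotency_store_unavailable"}
        # Fail open: proceed without dedup because a duplicate is harmless.
    return do_work()

calls = []

def broken_claim():
    raise StoreUnavailable("store is down")

def work():
    calls.append("ran")
    return 200, {"ok": True}

payment = guarded("POST /payments", broken_claim, work)
order = guarded("PUT /orders/{id}", broken_claim, work)
print(payment[0], order[0], calls)   # 503 200 ['ran']
```

During the simulated outage the payment is refused and the order update goes through, which matches the policy described in the list above.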

Verify

Confirm the property holds under the conditions that actually produce duplicates.

# Assumes $TOKEN holds a valid bearer token, as in the Step 3 example.
# 1. Replay safety: fire the SAME key twice; expect one side effect, identical body.
KEY=$(uuidgen)
for i in 1 2; do
  curl -sS -o /tmp/resp_$i.json -w "%{http_code}\n" \
    -X POST https://api.kloudvin.io/v1/payments \
    -H "Authorization: Bearer $TOKEN" \
    -H "Idempotency-Key: $KEY" \
    -H "Content-Type: application/json" \
    -d '{"amount":5000,"currency":"usd","order_id":"ORD-VERIFY"}'
done
diff /tmp/resp_1.json /tmp/resp_2.json && echo "IDEMPOTENT: identical responses"

# 2. Body-mismatch protection: reuse the key with a different amount -> expect 422.
curl -sS -o /dev/null -w "%{http_code}\n" -X POST https://api.kloudvin.io/v1/payments \
  -H "Authorization: Bearer $TOKEN" \
  -H "Idempotency-Key: $KEY" -H "Content-Type: application/json" \
  -d '{"amount":9999,"currency":"usd","order_id":"ORD-VERIFY"}'   # -> 422

# 3. Concurrency: fire 20 parallel requests with one FRESH key; exactly one wins.
#    $KEY is already 'completed' by now, so reusing it would only test replay.
CONC_KEY=$(uuidgen)
seq 20 | xargs -P 20 -I{} curl -sS -o /dev/null -w "%{http_code}\n" \
  -X POST https://api.kloudvin.io/v1/payments \
  -H "Authorization: Bearer $TOKEN" \
  -H "Idempotency-Key: $CONC_KEY" -H "Content-Type: application/json" \
  -d '{"amount":5000,"currency":"usd","order_id":"ORD-VERIFY-CONC"}' | sort | uniq -c
# Expect: one 200/201, the rest 200 (cached) or 409 (in_progress). Never two charges.
# The charge count for ORD-VERIFY-CONC must also be exactly 1.

Then assert the side effect count directly – the response codes lie if the handler is buggy:

-- The ground truth: exactly one charge for the verification order.
SELECT count(*) FROM payments WHERE order_id = 'ORD-VERIFY';  -- must be 1

For consumers, the equivalent verification is to redeliver the same message_id (replay the partition or requeue the message) and confirm the ledger blocks the second execution: SELECT count(*) FROM processed_messages WHERE message_id = 'msg-...' returns 1, and the side-effect table is unchanged.

Enterprise scenario

A payments platform team I worked with ran a checkout API behind an API gateway with a 29-second integration timeout. Under a traffic spike, their downstream card processor slowed to 30+ seconds per call. The gateway timed out and retried; the original request was still processing and also succeeded. Customers were charged two and three times. The post-incident metric was brutal: roughly 0.4% of high-value transactions during the spike were double-charged, and reconciliation took the finance team a week.

The constraint was that they could not lower the gateway timeout (legitimate slow calls existed) and could not make the third-party processor synchronous-and-fast. So they could not stop the retry; they had to make it harmless. The fix had three parts. First, they made the client mint an Idempotency-Key per checkout intent and reuse it across gateway retries. Second, they added a Postgres idempotency store with the INSERT ... ON CONFLICT DO NOTHING claim (Step 4) so concurrent duplicates serialized on the primary key – the in-flight retry got a 409, not a second charge. Third, and most importantly, they derived the processor’s idempotency key deterministically from the order id, so even a fully replayed request hit the processor with the same key and the processor itself deduped:

# The deterministic key that finally stopped the double-charges.
def processor_idem_key(order_id: str, attempt: int) -> str:
    return f"chk:{order_id}:{attempt}"   # same order + attempt -> same key, always

The key engineering judgment was the fail-closed policy: when the idempotency store was unreachable, the checkout endpoint returned 503 rather than charging blind. Losing a sale is recoverable; double-charging a customer and refunding under a chargeback dispute is not. After the change, double-charge incidents during the next comparable spike went to zero, verified by the count(*) = 1 assertion baked into their synthetic canary.

Written by Vinod
