Azure Event Hubs at Scale: Partitioning, Capture, Kafka Endpoint, and Stream Analytics Processing

Event Hubs is the part of an Azure data platform that breaks in the least obvious way. A queue that overflows pages you; an ingestion pipeline that silently caps at its throughput-unit ceiling just adds latency, then drops the oldest events off the retention window while every dashboard stays green. The mistakes that cause this are made on day one — wrong partition count, a checkpoint store nobody load-tested, a consumer that commits offsets it never processed — and partition count in particular is immutable on the standard tier. You live with the choice.

This guide builds a high-throughput pipeline the way it survives production: capacity sized against real numbers, partitions chosen for parallelism without lying about ordering, consumers that checkpoint correctly, Capture archiving every event to ADLS for replay, the Kafka endpoint fronting existing producers, and a Stream Analytics job doing windowed aggregation with exactly-once output. Examples use the az eventhubs CLI and the modern Azure.Messaging.EventHubs SDK family — the supported successor to the old Microsoft.Azure.EventHubs and WindowsAzure.ServiceBus packages.

Tier shapes everything. The Standard tier bills in throughput units (TUs) and caps partitions at creation. Premium and Dedicated bill in processing units (PUs) / capacity units (CUs), allow far higher partition counts, and let you raise partition count after creation. Capture is included in Premium/Dedicated and is a paid add-on on Standard. Decide the tier before you provision anything below.

1. Throughput units, processing units, and auto-inflate

Capacity on the Standard tier is a throughput unit: 1 TU = 1 MB/s or 1000 events/s ingress, and 2 MB/s or 4096 events/s egress, whichever limit you hit first. The egress allowance is double ingress for a reason — every event is typically read by multiple consumer groups, so a single 1 MB/s producer feeding three consumer groups already needs 3 MB/s of egress, i.e. you are egress-bound, not ingress-bound. Size against the larger of your two numbers.

TUs are shared across the entire namespace, not per hub. Ten hubs in one namespace contend for the same TU pool. A noisy hub starves a quiet one. Isolate latency-sensitive workloads into their own namespace.

Auto-inflate raises TUs automatically up to a ceiling as load grows. It never scales down — that is a manual operation or a scheduled job. Treat the ceiling as a cost guardrail, not a target.

RG=rg-eh-telemetry
NS=eh-telemetry-prod          # globally unique
LOC=eastus

az group create -n $RG -l $LOC

# Standard namespace, auto-inflate 2 -> 10 TUs
az eventhubs namespace create -g $RG -n $NS -l $LOC \
  --sku Standard --capacity 2 \
  --enable-auto-inflate true \
  --maximum-throughput-units 10

The Premium/Dedicated unit is the processing unit. PUs are CPU/memory reservations that deliver predictable latency and isolated tenancy; there is no per-second event throttle to reason about the way TUs impose. A rough planning figure is roughly 5–10 MB/s ingress per PU depending on payload size and partition spread, but you validate with a load test — never a brochure number.

Auto-inflate is an ingress trigger. If you are egress-bound (many consumer groups), auto-inflate may not fire even while consumers throttle. Alarm on egress throttling directly (Section 8), not on ingress.

2. Partition count: ordering vs parallelism

A partition is an ordered, append-only log. Two truths follow and they are in tension:

Ordering is per-partition only. Events land in a partition in send order; across partitions there is no global order.
Parallelism is capped by partition count. Within one consumer group, at most one active reader owns a partition. N partitions means at most N concurrent processors. You cannot out-scale your partition count.

So partition count is simultaneously your maximum read parallelism and the granularity at which ordering holds. The decision hinges on the partition key.

Send strategy	Ordering	Parallelism	Use when
No key (round-robin)	None	Best balanced spread	Order does not matter; max throughput
Partition key (hashed)	Per key	Good if keys are high-cardinality	Per-entity ordering (per device, per account)
Explicit partition id	Per partition	Manual, risks hot partitions	Almost never — you are pinning yourself

Always prefer a partition key over an explicit partition id. The key is hashed to a partition, so all events for device-42 stay ordered together, while the hub keeps the spread even — provided your keys are high-cardinality. A key like “region” with five values wastes everything beyond five partitions and creates hot partitions.

Sizing rule of thumb: partitions >= peak TUs (or PU-equivalent throughput), and >= your maximum desired consumer parallelism, with headroom. On Standard, partition count is fixed at creation (1–32 by default, more by request) and cannot change — so over-provision modestly. 32 is a sane default for a hub you expect to grow; it costs nothing extra on Standard and gives you room.

EH=device-telemetry
az eventhubs eventhub create -g $RG --namespace-name $NS -n $EH \
  --partition-count 32 \
  --cleanup-policy Delete \
  --retention-time-in-hours 168     # 7 days (Standard max is 7d; Premium up to 90d)

On Premium/Dedicated you can increase partition count after creation, but only upward, and only with the Delete cleanup policy (not compaction). Existing data is not rebalanced — new partitions start empty and key-to-partition mapping shifts for new sends. Plan it as a migration, not a knob.

3. Consumer groups, checkpoint stores, and the processor client

A consumer group is an independent view over the hub with its own cursor — each group reads the full stream at its own pace. Give every distinct downstream its own group; never share one group across two unrelated apps, or they will steal partition ownership from each other. Standard allows up to 20 consumer groups per hub.

az eventhubs eventhub consumer-group create -g $RG --namespace-name $NS \
  --eventhub-name $EH -n stream-analytics
az eventhubs eventhub consumer-group create -g $RG --namespace-name $NS \
  --eventhub-name $EH -n enrichment-worker

The right consumer abstraction is the EventProcessorClient (or EventHubConsumerClient with a checkpoint store in Python/JS). It does three things you must not hand-roll: distributes partition ownership across instances, load-balances when instances are added or removed, and persists progress to a checkpoint store (a Blob container). Checkpointing is what makes restart resumable — without it, a restart replays from the configured start position.

// Azure.Messaging.EventHubs.Processor — checkpoints to Blob
var storage = new BlobContainerClient(
    new Uri("https://ehcheckpoints.blob.core.windows.net/enrichment"),
    new DefaultAzureCredential());

var processor = new EventProcessorClient(
    storage,
    consumerGroup: "enrichment-worker",
    fullyQualifiedNamespace: "eh-telemetry-prod.servicebus.windows.net",
    eventHubName: "device-telemetry",
    credential: new DefaultAzureCredential());

processor.ProcessEventAsync += async args =>
{
    await HandleAsync(args.Data);          // your idempotent work
    await args.UpdateCheckpointAsync();    // checkpoint AFTER successful processing
};
processor.ProcessErrorAsync += args =>
{
    Log.Error(args.Exception, "partition {P}", args.PartitionId);
    return Task.CompletedTask;
};

await processor.StartProcessingAsync();

Three rules carry the design:

Checkpoint after processing, not before. A checkpoint is “I am done through offset X.” Checkpoint first and a crash loses every in-flight event between the checkpoint and the failure — silent data loss.
Checkpoint in batches, not per event. Every checkpoint is a Blob write. Per-event checkpointing throttles the storage account and dominates latency. Checkpoint every N events or every few seconds. The tradeoff: more replay on restart. Tune N against your idempotency budget.
Make processing idempotent. Event Hubs is at-least-once. After a crash you reprocess from the last checkpoint. Dedup on a business key or design effects to be replay-safe.

4. Event Hubs Capture to ADLS with Avro and replay

Capture automatically writes the raw stream to Blob or ADLS Gen2 as Avro, with no consumer to run and no impact on your TU egress budget. It is the cheapest durable archive you will get and the foundation of replay, late-arriving reprocessing, and audit. Files roll on a size or time window, whichever trips first.

SA=ehcapturelake          # ADLS Gen2 account, hierarchical namespace ON

az eventhubs eventhub update -g $RG --namespace-name $NS -n $EH \
  --enable-capture true \
  --capture-interval 300 \
  --capture-size-limit 314572800 \
  --skip-empty-archives true \
  --destination-name EventHubArchive.AzureBlockBlob \
  --storage-account "/subscriptions/<sub-id>/resourceGroups/$RG/providers/Microsoft.Storage/storageAccounts/$SA" \
  --blob-container capture \
  --archive-name-format '{Namespace}/{EventHub}/{PartitionId}/{Year}/{Month}/{Day}/{Hour}/{Minute}/{Second}'

--capture-interval is seconds (60–900), --capture-size-limit is bytes (10 MB–500 MB). Always set --skip-empty-archives true — otherwise quiet partitions write empty Avro files every interval and you pay for storage, transactions, and downstream scan time on noise.

The archive-name-format must contain {PartitionId}, {Namespace}, and {EventHub} (the time tokens are optional but you want them for partition pruning in queries). Each Avro record carries SequenceNumber, Offset, EnqueuedTimeUtc, Body, and Properties, so the archive is self-describing — you can reconstruct exact stream order per partition from the captured files alone.

Replay is “read the Avro back.” Point Spark, Stream Analytics (reference/blob input), or a backfill job at the container and filter by the time path. Capture is the answer to “a downstream had a bug for six hours, reprocess yesterday” without ever touching live retention.

Capture writes Avro only — there is no JSON or Parquet option. If your lake standard is Parquet, run a downstream conversion (e.g. a Synapse/Databricks job, or Stream Analytics with a Parquet output). Do not try to “fix” the Capture format; you cannot.

5. Kafka-protocol endpoint for existing producers and consumers

Every Standard+ namespace exposes a Kafka endpoint on port 9093 — your existing Kafka producers and consumers connect to Event Hubs as if it were a broker, no code change, just config. A namespace maps to a bootstrap server; an event hub maps to a Kafka topic; partitions and consumer groups map 1:1. Authentication is SASL/PLAIN over TLS, where the username is the literal string $ConnectionString and the password is the namespace connection string.

# Kafka client config pointing at an Event Hubs namespace
bootstrap.servers=eh-telemetry-prod.servicebus.windows.net:9093
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \
  username="$ConnectionString" \
  password="Endpoint=sb://eh-telemetry-prod.servicebus.windows.net/;SharedAccessKeyName=...;SharedAccessKey=...";

For production, prefer OAuth (SASL/OAUTHBEARER) with Microsoft Entra ID over a connection string so you are not shipping a shared key into every client. Either way, the endpoint is wire-compatible enough for librdkafka, the Java client, Kafka Connect, and most Kafka tooling.

Know the gaps before you bet on it: the Kafka endpoint does not support the Kafka transactions API, idempotent producer EOS, compacted topics in the Kafka sense (Event Hubs has its own log-compaction feature, configured separately), or the Kafka admin operations that imply broker-level control. It is excellent for lift-and-shift of producers/consumers; it is not a drop-in for a Kafka Streams app that depends on transactions. Validate your specific client’s feature use against the supported set.

6. Stream Analytics: windowing, watermarks, and exactly-once

Azure Stream Analytics (ASA) reads an event hub as input and runs SQL-like queries with temporal windows. The four window types are the vocabulary of stream processing:

Window	Shape	Use
Tumbling	Fixed, non-overlapping	Per-minute counts, billing buckets
Hopping	Fixed size, overlapping by a hop	Moving aggregates (“last 5 min, every 1 min”)
Sliding	Emits on event arrival/expiry	“alert if >N in any 30s”
Session	Gap-defined, variable length	User sessions, activity bursts

The non-negotiable detail is event time vs arrival time. By default ASA windows on arrival time; for correct results under network delay and retries you must declare the event timestamp and a watermark tolerance via TIMESTAMP BY ... OVER .... The watermark is how far ASA waits for stragglers before closing a window. Too tight drops late events; too loose adds latency. Make it explicit.

-- Per-device 1-minute counts on EVENT time, partitioned for scale
SELECT
    deviceId,
    System.Timestamp() AS windowEnd,
    COUNT(*)           AS events,
    AVG(temperature)   AS avgTemp
INTO   [aggregates-output]
FROM   [telemetry-input] TIMESTAMP BY enqueuedAt OVER deviceId
GROUP BY deviceId, TumblingWindow(minute, 1)

TIMESTAMP BY enqueuedAt OVER deviceId applies the timestamp per partition key — substreams — so one slow device cannot hold the global watermark hostage. This is the single highest-leverage clause in an ASA query at scale.

Exactly-once is real but conditional. ASA guarantees exactly-once processing internally, and exactly-once delivery end-to-end only to specific sinks that support it — notably Azure SQL Database, Cosmos DB, and (in the v2/streaming-ingestion path) Synapse and Blob/ADLS via Delta. Plain Blob/JSON, Event Hubs output, and most others are at-least-once — the sink must dedup. Match your sink to your guarantee; do not assume the ASA badge means exactly-once for every output.

To use all partitions in parallel, make the job embarrassingly parallel: keep the same partition key from input through GROUP BY to output, and the input partition count, query partitioning, and streaming units (SUs) all align. Add SUs (the ASA compute unit, in the V2 model) when the job’s SU% utilization or watermark delay climbs.

7. Networking: private endpoints and dedicated clusters

By default the namespace has a public FQDN. To pull it onto the VNet, disable public access and add a private endpoint, which maps the namespace into private DNS zone privatelink.servicebus.windows.net (Event Hubs shares the Service Bus DNS namespace — this is correct, not a typo).

# Lock down public access, then private endpoint onto the VNet
az eventhubs namespace update -g $RG -n $NS --public-network-access Disabled

az network private-endpoint create -g $RG -n pe-$NS \
  --vnet-name vnet-data --subnet snet-pe \
  --private-connection-resource-id $(az eventhubs namespace show -g $RG -n $NS --query id -o tsv) \
  --group-id namespace \
  --connection-name eh-conn

az network private-dns zone create -g $RG -n privatelink.servicebus.windows.net
az network private-dns link vnet create -g $RG -n link-eh \
  --zone-name privatelink.servicebus.windows.net --virtual-network vnet-data --registration-enabled false
az network private-endpoint dns-zone-group create -g $RG \
  --endpoint-name pe-$NS -n zg \
  --private-dns-zone privatelink.servicebus.windows.net --zone-name servicebus

The Kafka endpoint over a private endpoint still uses port 9093, and Capture still needs a network path to the storage account — give the storage account a private endpoint too, or enable the “trusted Microsoft services” firewall exception so Capture can write. Forgetting this is the classic “private networking broke Capture” outage.

A Dedicated cluster is the top tier: single-tenant capacity in CUs, the highest partition counts, 90-day retention, and isolation from noisy-neighbor TU contention. You move to Dedicated when sustained throughput crosses roughly the 100+ MB/s range or when tenancy/compliance demands single-tenant isolation — not before, because it carries a substantial fixed monthly floor.

8. Diagnostics: throttling, lag, and capacity planning

The three metrics that tell you the truth, in Azure Monitor:

ThrottledRequests — you have hit the TU ceiling. On Standard, raise TUs or the auto-inflate max. Persistent throttling at the max means you are undersized or it is time for Premium.
IncomingBytes vs OutgoingBytes — confirms whether you are ingress- or egress-bound. Egress >> ingress means consumer-group fan-out is your real constraint (Section 1).
Consumer lag — there is no single built-in “lag” metric; you derive it. Compare each partition’s last enqueued sequence number (exposed on the partition properties / LastEnqueuedSequenceNumber) against your checkpointed sequence number. A growing gap means a consumer is falling behind — scale it out (up to partition count) or speed up its processing.

// Throttling and capture failures over the last hour, from diagnostic logs
AzureDiagnostics
| where ResourceProvider == "MICROSOFT.EVENTHUB"
| where Category in ("OperationalLogs", "ArchiveLogs")
| where TimeGenerated > ago(1h)
| where ResultDescription has_any ("throttl", "ServerBusy", "Capture")
| summarize count() by Category, OperationName, ResultDescription
| order by count_ desc

Capacity planning is arithmetic, not guesswork: required_TUs = max(ingress_MBps / 1, total_egress_MBps / 2) where total_egress = ingress * number_of_consumer_groups, then round up and add headroom for spikes. Re-derive it every time you add a consumer group, because each one adds a full copy of egress.

Verify

Confirm each layer end to end before declaring done:

# Hub config: partition count, retention, Capture state
az eventhubs eventhub show -g $RG --namespace-name $NS -n $EH \
  --query '{partitions:partitionCount, retentionHrs:retentionDescription.retentionTimeInHours, capture:captureDescription.enabled, skipEmpty:captureDescription.skipEmptyArchives}'

# Namespace: SKU, auto-inflate, public access
az eventhubs namespace show -g $RG -n $NS \
  --query '{sku:sku.name, capacity:sku.capacity, autoInflate:isAutoInflateEnabled, maxTU:maximumThroughputUnits, publicAccess:publicNetworkAccess}'

# Consumer groups present
az eventhubs eventhub consumer-group list -g $RG --namespace-name $NS --eventhub-name $EH -o table

Send a keyed event; confirm all events for that key land in one partition (read PartitionId on the received event).
Wait one capture interval; list the capture container and confirm Avro files appear under the {PartitionId}/.../{Hour} path, and that no empty files exist on idle partitions.
Restart a consumer instance; confirm it resumes from the checkpoint, not from the start of retention (watch for a replay storm — that means checkpointing is broken).
Point a Kafka client at :9093 with $ConnectionString auth; produce and consume one message.
Start the ASA job; confirm SU% utilization is healthy and watermark delay is near zero under load.
Force throttling with a load test; confirm ThrottledRequests fires and auto-inflate raises TUs.

Enterprise scenario

A logistics platform ingested GPS pings from ~120,000 vehicles into a single Standard hub created two years earlier with 4 partitions — chosen when the fleet was 3,000 trucks. As the fleet grew, the enrichment consumer group could run at most 4 parallel readers, processing fell behind during the morning surge, and lag grew until events aged past the 7-day retention and were lost — discovered only when a customer’s route-history report had gaps. Because partition count is immutable on Standard, no amount of consumer scaling could help: 4 partitions is a hard ceiling of 4 readers.

The fix was a side-by-side migration, not an in-place change. They provisioned a new hub with 64 partitions, kept the partition key as vehicleId (high cardinality, so the spread is even and per-vehicle ordering holds), and dual-wrote from producers to old and new hubs for one retention window. The enrichment service was pointed at the new hub’s consumer group, scaled to 16 instances (well under the 64-partition ceiling, leaving headroom), and lag dropped to seconds. Capture was enabled on the new hub from minute one so any future consumer bug is recoverable by replaying Avro from ADLS instead of racing the retention clock. The old hub was decommissioned after the cutover window closed.

# New hub sized for the real fleet, Capture on from day one
az eventhubs eventhub create -g $RG --namespace-name $NS -n vehicle-telemetry-v2 \
  --partition-count 64 \
  --retention-time-in-hours 168 \
  --enable-capture true --capture-interval 300 --skip-empty-archives true \
  --destination-name EventHubArchive.AzureBlockBlob \
  --storage-account "/subscriptions/<sub-id>/resourceGroups/$RG/providers/Microsoft.Storage/storageAccounts/$SA" \
  --blob-container capture \
  --archive-name-format '{Namespace}/{EventHub}/{PartitionId}/{Year}/{Month}/{Day}/{Hour}/{Minute}/{Second}'

The lesson the team internalized: partition count is a day-one capacity decision with no day-two undo on Standard. Over-provision partitions (they are nearly free), keep a high-cardinality key, and turn on Capture before you need it — so the next surprise is a replay, not a data-loss incident.

Azure Event Hubs at Scale: Partitioning, Capture, Kafka Endpoint, and Stream Analytics Processing

1. Throughput units, processing units, and auto-inflate

2. Partition count: ordering vs parallelism

3. Consumer groups, checkpoint stores, and the processor client

4. Event Hubs Capture to ADLS with Avro and replay

5. Kafka-protocol endpoint for existing producers and consumers

6. Stream Analytics: windowing, watermarks, and exactly-once

7. Networking: private endpoints and dedicated clusters

8. Diagnostics: throttling, lag, and capacity planning

Verify

Enterprise scenario

Checklist

Written by Vinod

Comments

Keep Reading

Application Gateway for Containers: Gateway API on AKS with Traffic Splitting, mTLS, and Header Routing

Azure Service Bus at Scale: Sessions, Deduplication, and Dead-Letter Handling

GPU Workloads and KAITO Inference on AKS: Node Pools, Drivers, and Autoscaling