Event-Driven Architectures with Azure Event Grid: MQTT, Routing, and Reliable Delivery

Most teams meet Event Grid as “the thing that fires a function when a blob lands” — the original product, a global push-only router built on custom topics and system topics. The newer surface, Event Grid namespaces, is a different animal: an MQTT v5 broker, a queue-like pull delivery API, namespace topics with 7-day retention, and dead-lettering to Blob Storage. For fleet telemetry ingestion or back-pressure-tolerant fan-out to slow consumers, namespaces are the tier you want, and the design decisions are not obvious.

This guide builds an end-to-end namespace system: MQTT clients publishing telemetry, messages routed into a namespace topic, and two consumer styles — push to Event Hubs and pull for a throughput-controlled worker — with retries and dead-lettering wired correctly. Every command targets the namespace tier, which behaves nothing like the basic tier you may know.

1. Namespaces vs. custom topics vs. system topics

Pick the wrong resource and you will fight the platform for the life of the system. The three are not interchangeable.

Capability	System topic	Custom topic (basic)	Namespace topic
Source	Azure services (Blob, Resource Groups, etc.)	Your app	Your app
MQTT broker	No	No	Yes
Pull delivery	No	No	Yes
Push to Event Hubs	Yes	Yes	Yes
Push to Functions, Service Bus, Storage queues, webhooks	Yes	Yes	Not yet (Event Hubs only today)
Schema	EventGridSchema / CloudEvents	EventGridSchema / CloudEvents	CloudEvents 1.0 JSON only
Max throughput (ingress / egress)	~5 MB/s	~5 MB/s	40 MB/s / 80 MB/s
Retention	Best-effort, 24h retry	24h retry	7 days

The key trade-off: namespace topics give you MQTT, pull delivery, high throughput, and durable retention, but the push destination set is still narrower than basic (Event Hubs only at time of writing — more are rolling out). A common production shape is therefore MQTT into a namespace topic, push to Event Hubs, then Event Hubs fans out to Stream Analytics, Functions, or Fabric. Namespace topics also accept only CloudEvents 1.0 JSON — no proprietary EventGridSchema.

Namespace topics cannot host system topics, domain topics, or partner topics, and they cannot subscribe to Azure service events. They carry your events only. If you need Blob-created events, that is still a system topic.

Create the namespace with both MQTT and a system-assigned identity (you will need the identity for routing and dead-letter):

RG=rg-eventing
LOC=eastus
NS=egns-telemetry

az eventgrid namespace create \
  --resource-group $RG \
  --name $NS \
  --location $LOC \
  --topic-spaces-configuration "{state:Enabled}" \
  --identity "{type:SystemAssigned}"

Setting topicSpacesConfiguration.state to Enabled is what turns on the MQTT broker; without it you get a pull-delivery-only namespace.

2. MQTT broker: clients, topic spaces, and permission bindings

The broker speaks MQTT v3.1.1 and v5, each over TCP or WebSocket. QoS 0 and 1 are supported; QoS 2 is not. Authorization is not configured per client per topic, which would be unmanageable at fleet scale. Instead you compose four resources:

Clients — one registry entry per device/app, keyed by an authentication name (an X.509 cert subject / thumbprint, or a Microsoft Entra identity).
Client groups — a query over client attributes that buckets clients (e.g. all building == "b12" sensors).
Topic spaces — a set of MQTT topic templates (e.g. devices/${client.authenticationName}/telemetry).
Permission bindings — grant a client group Publisher or Subscriber rights on a topic space.

az eventgrid namespace client create \
  --resource-group $RG \
  --namespace-name $NS \
  --client-name sensor-0007 \
  --authentication-name sensor-0007 \
  --state Enabled \
  --client-certificate-authentication "{validationScheme:ThumbprintMatch,allowedThumbprints:[A1B2C3D4E5F6...]}" \
  --attributes "{building:'b12',role:'sensor'}"

Define a topic space whose template scopes each device to its own subtree, then create a client group that selects the sensors:

az eventgrid namespace topic-space create \
  --resource-group $RG \
  --namespace-name $NS \
  --name ts-telemetry \
  --topic-templates "devices/\${client.authenticationName}/telemetry/#"

az eventgrid namespace client-group create \
  --resource-group $RG \
  --namespace-name $NS \
  --name cg-sensors \
  --group-query "attributes.role = 'sensor'"

The ${client.authenticationName} variable is the whole point: a single topic space template gives each client publish rights to only its own topic, without one binding per device. Bind publish permission:

az eventgrid namespace permission-binding create \
  --resource-group $RG \
  --namespace-name $NS \
  --name pb-sensors-pub \
  --client-group-name cg-sensors \
  --topic-space-name ts-telemetry \
  --permission Publisher

A registered client can authenticate and connect, but it cannot publish or subscribe to anything until a permission binding explicitly allows it. Default-deny is the security posture, and it is correct for IoT.
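To make the authorization chain concrete, here is a minimal Python sketch of how a group query, a topic template, and a permission binding combine. It is a toy model, not the broker's implementation: the Client class and the functions in_sensor_group, expand_template, topic_matches, and may_publish are hypothetical names, the group query is hard-coded as a predicate rather than parsed, and only the ${client.authenticationName} variable is handled.

```python
from dataclasses import dataclass, field


@dataclass
class Client:
    authentication_name: str
    attributes: dict = field(default_factory=dict)


def in_sensor_group(client: Client) -> bool:
    # Stand-in for the group query: attributes.role = 'sensor'
    return client.attributes.get("role") == "sensor"


def expand_template(template: str, client: Client) -> str:
    # Substitute the one variable used in this guide's topic space.
    return template.replace("${client.authenticationName}", client.authentication_name)


def topic_matches(topic_filter: str, topic: str) -> bool:
    # Standard MQTT wildcard rules: '+' is one level, '#' is everything below.
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            return True
        if i >= len(topic_levels):
            return False
        if level not in ("+", topic_levels[i]):
            return False
    return len(filter_levels) == len(topic_levels)


TEMPLATE = "devices/${client.authenticationName}/telemetry/#"


def may_publish(client: Client, topic: str) -> bool:
    # Default-deny: allowed only if the client is in the bound group
    # and the topic falls inside its own expanded template.
    if not in_sensor_group(client):
        return False
    return topic_matches(expand_template(TEMPLATE, client), topic)


sensor = Client("sensor-0007", {"building": "b12", "role": "sensor"})
print(may_publish(sensor, "devices/sensor-0007/telemetry/temp"))  # True
print(may_publish(sensor, "devices/sensor-0008/telemetry/temp"))  # False
```

The second call fails because the template expands to the caller's own authentication name, so one binding covers the whole fleet without letting any device write into another device's subtree.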

3. Routing MQTT messages into a topic

MQTT messages live inside the broker. To get them into the rest of Azure, configure routing: every message is wrapped in a CloudEvents envelope and published to one namespace topic (or custom topic) you nominate. From there, event subscriptions take over.

First create the destination namespace topic:

az eventgrid namespace topic create \
  --resource-group $RG \
  --namespace-name $NS \
  --name mqtt-ingest

Routing is set on the namespace’s topicSpacesConfiguration. The two fields that matter are routeTopicResourceId (where messages land) and routingIdentityInfo (which identity authenticates the publish; for a namespace topic in the same namespace, None works because no cross-resource role assignment is needed). The target state looks like this:

{
  "properties": {
    "topicSpacesConfiguration": {
      "state": "Enabled",
      "routeTopicResourceId": "/subscriptions/<SUB>/resourceGroups/rg-eventing/providers/Microsoft.EventGrid/namespaces/egns-telemetry/topics/mqtt-ingest",
      "routingIdentityInfo": { "type": "None" }
    }
  }
}

One way to apply it is a generic az resource update, which patches individual property paths with --set (az resource update has no --properties or --is-full-object flags; those belong to az resource create):

az resource update \
  --ids "/subscriptions/<SUB>/resourceGroups/$RG/providers/Microsoft.EventGrid/namespaces/$NS" \
  --set "properties.topicSpacesConfiguration.routeTopicResourceId=/subscriptions/<SUB>/resourceGroups/$RG/providers/Microsoft.EventGrid/namespaces/$NS/topics/mqtt-ingest" \
        'properties.topicSpacesConfiguration.routingIdentityInfo={"type":"None"}'

If you route to a custom topic instead (to reach a push destination namespace topics do not yet support, like Service Bus), the topic must use CloudEvents v1.0, sit in the same region, and have the namespace identity granted the EventGrid Data Sender role — set routingIdentityInfo.type to SystemAssigned. Disabling public network access on the namespace breaks routing, so plan private networking on the consumer side, not the broker.

When the broker wraps a message, the CloudEvent’s subject carries the original MQTT topic and data carries the payload — exactly what you filter on next.
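As a rough illustration of that mapping, the sketch below builds a CloudEvent from an MQTT topic and payload. The function name wrap_mqtt_message is hypothetical, and the payload handling is a simplifying assumption: JSON payloads go into data, and anything else goes into data_base64, which is the CloudEvents JSON-format convention for binary content. The broker's real rules for choosing between the two may differ.

```python
import base64
import json


def wrap_mqtt_message(mqtt_topic, payload, namespace, event_id, time_utc):
    """Build a CloudEvents 1.0 envelope shaped like the routed events in this guide."""
    event = {
        "specversion": "1.0",
        "id": event_id,
        "source": namespace,
        "subject": mqtt_topic,           # the original MQTT topic
        "type": "MQTT.EventPublished",
        "time": time_utc,
    }
    try:
        # Assumption: a payload that parses as JSON is carried as structured data.
        event["data"] = json.loads(payload.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        # Binary or non-JSON payloads are base64-encoded instead.
        event["data_base64"] = base64.b64encode(payload).decode("ascii")
    return event


event = wrap_mqtt_message(
    "devices/sensor-0007/telemetry/temp",
    b'{"celsius": 91.4, "battery": 0.62}',
    namespace="egns-telemetry",
    event_id="B688-1234-1235",
    time_utc="2026-06-08T17:31:00Z",
)
print(event["subject"], event["data"]["celsius"])
```

Because subject is the device topic and data is the parsed payload, both are addressable by subscription filters without the consumer ever decoding the message.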

4. Push vs. pull delivery, and when pull wins

This is the design fork that defines your consumer architecture.

Push delivery registers a destination in the subscription, and Event Grid POSTs (or AMQP-sends) events to it as they arrive. It is reactive and zero-polling, but the consumer must expose a reachable endpoint and absorb whatever rate Event Grid pushes (within batching limits).

Pull delivery inverts control: the consumer connects to Event Grid and receives events with queue-like semantics — receive, then acknowledge, release, or reject. Reach for pull when:

The consumer cannot expose an endpoint (locked-down network, batch job, on-prem worker).
You need back-pressure — a struggling consumer slows its receive cadence instead of being overwhelmed.
You need a private link to consume over private IP space (push cannot do this).
You want to process at a chosen time (overnight batch) rather than as events occur.

A push subscription to Event Hubs (the supported namespace push destination today):

az eventgrid namespace topic event-subscription create \
  --resource-group $RG \
  --namespace-name $NS \
  --topic-name mqtt-ingest \
  --name sub-eventhubs \
  --delivery-configuration '{
    "deliveryMode": "Push",
    "push": {
      "deliveryWithResourceIdentity": {
        "identity": { "type": "SystemAssigned" },
        "destination": {
          "endpointType": "EventHub",
          "properties": {
            "resourceId": "/subscriptions/<SUB>/resourceGroups/rg-eventing/providers/Microsoft.EventHub/namespaces/ehns-telemetry/eventhubs/telemetry"
          }
        }
      }
    }
  }'

A pull subscription is just deliveryMode: Queue:

az eventgrid namespace topic event-subscription create \
  --resource-group $RG \
  --namespace-name $NS \
  --topic-name mqtt-ingest \
  --name sub-worker \
  --delivery-configuration '{
    "deliveryMode": "Queue",
    "queue": {
      "receiveLockDurationInSeconds": 60,
      "maxDeliveryCount": 5,
      "eventTimeToLive": "P1D"
    }
  }'

receiveLockDurationInSeconds is the window in which a received event must be acknowledged before it becomes available again; maxDeliveryCount caps redeliveries before the event is dead-lettered or dropped.
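The interplay of lock duration, acknowledge, release, reject, and maxDeliveryCount is easier to reason about with a toy model. The Python class below is an in-memory stand-in, not the Event Grid service or its SDK: ToyPullSubscription and its method names are hypothetical, time is passed in as plain seconds, and it assumes an event that exhausts maxDeliveryCount is dead-lettered rather than dropped.

```python
import itertools


class ToyPullSubscription:
    """In-memory stand-in for a Queue-mode subscription. Not the real service."""

    def __init__(self, lock_seconds=60, max_delivery_count=5):
        self.lock_seconds = lock_seconds
        self.max_delivery_count = max_delivery_count
        self.available = []       # events waiting to be received
        self.locked = {}          # lock token -> (event, lock expiry time)
        self.dead_lettered = []   # rejected events and events out of attempts
        self._delivery_counts = {}
        self._token_ids = itertools.count(1)

    def publish(self, event):
        self.available.append(event)

    def receive(self, now, max_events=1):
        self._expire_locks(now)
        batch = []
        while self.available and len(batch) < max_events:
            event = self.available.pop(0)
            count = self._delivery_counts.get(event["id"], 0) + 1
            self._delivery_counts[event["id"]] = count
            token = f"lock-{next(self._token_ids)}"
            self.locked[token] = (event, now + self.lock_seconds)
            batch.append({
                "brokerProperties": {"lockToken": token, "deliveryCount": count},
                "event": event,
            })
        return batch

    def acknowledge(self, token):
        del self.locked[token]            # processed successfully, never redelivered

    def release(self, token):
        event, _ = self.locked.pop(token)
        self._requeue(event)              # hand it back for another attempt

    def reject(self, token):
        event, _ = self.locked.pop(token)
        self.dead_lettered.append(event)  # poison event, no further attempts

    def _expire_locks(self, now):
        for token, (event, expiry) in list(self.locked.items()):
            if now >= expiry:
                del self.locked[token]
                self._requeue(event)

    def _requeue(self, event):
        if self._delivery_counts[event["id"]] >= self.max_delivery_count:
            self.dead_lettered.append(event)
        else:
            self.available.append(event)


sub = ToyPullSubscription(lock_seconds=60, max_delivery_count=2)
sub.publish({"id": "e1"})
first = sub.receive(now=0)[0]     # worker takes the event, then crashes without acking
second = sub.receive(now=61)[0]   # lock expired, so the event is delivered again
sub.release(second["brokerProperties"]["lockToken"])
print(second["brokerProperties"]["deliveryCount"], sub.dead_lettered)
```

The usage lines show why the lock duration should comfortably exceed your worst-case processing time: a worker that is merely slow looks identical to one that crashed, and each expired lock spends one delivery attempt.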

5. CloudEvents, advanced filters, and subject-based routing

Namespace topics are CloudEvents-native, so filtering keys off CloudEvents attributes and into the data payload. A receive response nests each CloudEvent under event alongside brokerProperties (the lock token and delivery count):

{
  "value": [
    {
      "brokerProperties": { "lockToken": "CiYK...", "deliveryCount": 1 },
      "event": {
        "specversion": "1.0",
        "id": "B688-1234-1235",
        "source": "egns-telemetry",
        "subject": "devices/sensor-0007/telemetry/temp",
        "type": "MQTT.EventPublished",
        "time": "2026-06-08T17:31:00Z",
        "data": { "celsius": 91.4, "battery": 0.62 }
      }
    }
  ]
}

Filter so a subscription only sees the events it cares about. Two complementary tools:

Subject filters — cheap prefix/suffix matching on subject, which for routed MQTT is the device topic.
Advanced filters — typed comparisons (NumberGreaterThan, StringIn, BoolEquals, StringContains) against any attribute or data field via JSON path.

A subscription that only wakes the worker for over-temperature readings published under devices/ (string operators take a values array, numeric comparisons take a single value):

{
  "filtersConfiguration": {
    "includedEventTypes": ["MQTT.EventPublished"],
    "filters": [
      { "operatorType": "StringBeginsWith", "key": "subject", "values": ["devices/"] },
      { "operatorType": "NumberGreaterThan", "key": "data.celsius", "value": 85 }
    ]
  }
}

Doing this server-side is not a nicety — it is throughput and cost. Every event a subscription does not match is one your consumer never receives, never locks, and never pays to process. Filter aggressively at the subscription; reserve client-side logic for genuinely dynamic cases.
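It can help to unit-test a filter definition against sample events before deploying it. The Python sketch below is a local approximation of the matching logic, not Event Grid's evaluator: lookup and matches are hypothetical helpers, only the two operators used in this guide are modelled, and it assumes filters are ANDed together while the values inside one string filter are ORed.

```python
def lookup(event, key):
    # Walk a dotted key such as "data.celsius"; None if any step is missing.
    value = event
    for part in key.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


def matches(event, filters_configuration):
    included = filters_configuration.get("includedEventTypes")
    if included and event.get("type") not in included:
        return False
    for f in filters_configuration.get("filters", []):
        actual = lookup(event, f["key"])
        op = f["operatorType"]
        if op == "StringBeginsWith":
            ok = isinstance(actual, str) and any(actual.startswith(v) for v in f["values"])
        elif op == "NumberGreaterThan":
            ok = isinstance(actual, (int, float)) and actual > f["value"]
        else:
            raise ValueError(f"operator not modelled in this sketch: {op}")
        if not ok:
            return False   # every filter must pass
    return True


FILTERS = {
    "includedEventTypes": ["MQTT.EventPublished"],
    "filters": [
        {"operatorType": "StringBeginsWith", "key": "subject", "values": ["devices/"]},
        {"operatorType": "NumberGreaterThan", "key": "data.celsius", "value": 85},
    ],
}

hot = {
    "type": "MQTT.EventPublished",
    "subject": "devices/sensor-0007/telemetry/temp",
    "data": {"celsius": 91.4, "battery": 0.62},
}
cool = dict(hot, data={"celsius": 40.0, "battery": 0.62})

print(matches(hot, FILTERS), matches(cool, FILTERS))  # True False
```

A handful of such fixtures in CI catches the most common mistake, a key path that silently matches nothing, before it shows up as a subscription that never delivers.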

6. Retries, batching, and dead-letter to Blob Storage

Reliable delivery is three coordinated settings: how hard Event Grid retries, how it batches on push, and where poison events go to die.

Retry budget. On a pull subscription, eventTimeToLive (the P1D ISO-8601 duration above) is the wall-clock ceiling; maxDeliveryCount is the attempt ceiling. Whichever is hit first ends delivery. On push, Event Grid retries with exponential backoff against transient failures; a hard 4xx (other than throttling) is treated as non-retryable and goes straight to dead-letter.
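The "whichever is hit first" rule can be written down in a few lines. This is a sketch of the decision only, under the assumption that the time-to-live clock starts when the event is enqueued; delivery_ended is a hypothetical helper, and P1D is represented as a Python timedelta of one day.

```python
from datetime import datetime, timedelta, timezone


def delivery_ended(enqueued_at, now, delivery_count, event_time_to_live, max_delivery_count):
    """Return the reason delivery stops, or None while the event is still deliverable."""
    if now - enqueued_at >= event_time_to_live:
        return "eventTimeToLive expired"
    if delivery_count >= max_delivery_count:
        return "maxDeliveryCount reached"
    return None


enqueued = datetime(2026, 6, 8, 17, 31, tzinfo=timezone.utc)
ttl = timedelta(days=1)   # the P1D from the pull subscription

# Five failed attempts inside the first hour: the attempt ceiling wins.
print(delivery_ended(enqueued, enqueued + timedelta(hours=1), 5, ttl, 5))
# One attempt, but a day has passed: the wall-clock ceiling wins.
print(delivery_ended(enqueued, enqueued + timedelta(days=1), 1, ttl, 5))
```

The practical consequence is that a short lock duration combined with a low maxDeliveryCount can exhaust an event in minutes, long before the time-to-live matters, so tune the two together.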

Dead-letter. Configure a Blob Storage destination so undeliverable events are preserved instead of dropped. Prerequisites: enable a managed identity on the namespace and grant it Storage Blob Data Contributor on the storage account. The subscription property is deadLetterDestinationWithResourceIdentity, and deliveryRetryPeriodInDays sets the maximum dead-letter retry window (max 2 days):

{
  "deadLetterDestinationWithResourceIdentity": {
    "deliveryRetryPeriodInDays": 2,
    "endpointType": "StorageBlob",
    "StorageBlob": {
      "blobContainerName": "deadletter",
      "resourceId": "/subscriptions/<SUB>/resourceGroups/rg-eventing/providers/Microsoft.Storage/storageAccounts/stegdeadletter"
    },
    "identity": { "type": "SystemAssigned" }
  }
}

Dead-lettered events are written as CloudEvents JSON with an added deadletterProperties block — deadletterreason, deliveryattempts, deliveryresult, and timestamps — so a replay job knows why each event failed. Blobs land under a time-partitioned path:

<container>/<namespace>/<topic>/<subscription>/<yyyy>/<MM>/<dd>/<HH>/<guid>.json

That deadletterreason is the difference between a five-minute replay and an afternoon of forensics. An Unauthorized reason means fix the consumer’s auth and rehydrate; a parse failure means the producer shipped a bad schema and those events should probably not be replayed at all.
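Because the path is time-partitioned, a replay job can target exactly the hours an incident covered. The helpers below are a small sketch built on the layout shown above, taking the blob name to be everything after the container; hour_prefix and parse_blob_name are hypothetical names, and treating the path timestamp as UTC is an assumption to verify against your own container.

```python
from datetime import datetime, timezone


def hour_prefix(namespace, topic, subscription, when):
    """Blob-name prefix for one hour of dead-lettered events."""
    return f"{namespace}/{topic}/{subscription}/{when:%Y/%m/%d/%H}/"


def parse_blob_name(name):
    """Split a dead-letter blob name into its path components."""
    namespace, topic, subscription, yyyy, mm, dd, hh, filename = name.split("/")
    return {
        "namespace": namespace,
        "topic": topic,
        "subscription": subscription,
        # Assumption: the partition timestamp is UTC.
        "hour": datetime(int(yyyy), int(mm), int(dd), int(hh), tzinfo=timezone.utc),
        "id": filename.removesuffix(".json"),
    }


incident_hour = datetime(2026, 6, 8, 17, tzinfo=timezone.utc)
print(hour_prefix("egns-telemetry", "mqtt-ingest", "sub-worker", incident_hour))

parsed = parse_blob_name("egns-telemetry/mqtt-ingest/sub-worker/2026/06/08/17/0f8fad5b.json")
print(parsed["subscription"], parsed["hour"].isoformat())
```

The blob file name in the example is made up for illustration; real names are GUIDs generated by the service.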

7. Securing endpoints with managed identity and webhook validation

Three security surfaces, three mechanisms:

MQTT clients authenticate with X.509 certificates (CA-signed or thumbprint-pinned) or Microsoft Entra ID / JWT. Authorization is the default-deny permission-binding model from step 2.
Push to Azure services (Event Hubs, and the destinations rolling out) uses the namespace’s managed identity plus an RBAC role on the target — deliveryWithResourceIdentity above. No keys, no SAS tokens, no secrets to rotate.
Push to webhooks must complete the CloudEvents abuse-protection handshake: Event Grid issues an OPTIONS request with a WebHook-Request-Origin header, and your endpoint must echo it in WebHook-Allowed-Origin. This proves endpoint ownership and stops Event Grid being used to flood a third party. Better still, front the webhook with Microsoft Entra and validate the presented token.
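The webhook handshake is small enough to sketch without a web framework. The function below computes the response headers for the OPTIONS validation request; abuse_protection_headers and the trusted_origins allow-list are hypothetical, and the origin value in the example is an assumption, so log what your subscription actually sends before pinning it.

```python
def abuse_protection_headers(request_headers, trusted_origins):
    """Return response headers for the OPTIONS validation request, or None to refuse."""
    # HTTP header names are case-insensitive, so normalise before looking up.
    headers = {name.lower(): value for name, value in request_headers.items()}
    origin = headers.get("webhook-request-origin")
    if origin is None or origin not in trusted_origins:
        return None   # caller should answer with an error status and no grant
    # Echo the origin back to grant it permission to deliver events.
    return {"WebHook-Allowed-Origin": origin}


TRUSTED = {"eventgrid.azure.net"}   # assumed origin value, verify in your environment

print(abuse_protection_headers({"WebHook-Request-Origin": "eventgrid.azure.net"}, TRUSTED))
print(abuse_protection_headers({"WebHook-Request-Origin": "evil.example"}, TRUSTED))
```

Echoing only origins on an allow-list, rather than whatever arrives, is the design choice that matters: an endpoint that reflects any origin has opted in to deliveries from any sender that speaks the protocol.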

Grant the namespace identity rights on the Event Hub used in step 4:

NS_PRINCIPAL=$(az eventgrid namespace show -g $RG -n $NS --query identity.principalId -o tsv)
EH_ID=$(az eventhubs eventhub show -g $RG --namespace-name ehns-telemetry -n telemetry --query id -o tsv)

az role assignment create \
  --assignee-object-id $NS_PRINCIPAL \
  --assignee-principal-type ServicePrincipal \
  --role "Azure Event Hubs Data Sender" \
  --scope $EH_ID

8. Observability, delivery metrics, and replaying failed events

You cannot operate what you cannot see. Route the namespace’s diagnostic metrics to Log Analytics and watch the delivery counters — a climbing dead-letter count or non-zero drop count is telling you something is wrong now.

AzureMetrics
| where ResourceProvider == "MICROSOFT.EVENTGRID"
| where MetricName in ("DeliverySuccessCount", "DeliveryAttemptFailCount", "DeadLetteredCount", "DroppedEventCount")
| summarize Total = sum(Total) by MetricName, bin(TimeGenerated, 5m)
| order by TimeGenerated desc

DroppedEventCount is the alarm metric: events dropped were not dead-lettered (no destination configured, or the dead-letter retry window itself expired) — they are gone. Alert on it at zero tolerance.

Replay is your recovery path. Dead-letter blobs are CloudEvents JSON, so rehydration is: read the blob, strip deadletterProperties, re-publish to the topic. A worker can drain the dead-letter container directly:

az storage blob list \
  --account-name stegdeadletter \
  --container-name deadletter \
  --prefix "egns-telemetry/mqtt-ingest/sub-worker/2026/06/08/" \
  --auth-mode login \
  --query "[].name" -o tsv

Pipe each blob through a function that re-posts the inner event object to the topic endpoint. Decide replay policy by deadletterreason: transient auth/throttle failures rehydrate cleanly; schema or business-rule rejections usually should not.
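The rehydration step might look like the sketch below. It assumes one dead-lettered CloudEvent per blob, with deadletterProperties as a top-level key on the event; rehydrate is a hypothetical helper, and the strings in REPLAYABLE_MARKERS are illustrative guesses rather than a documented list, so replace them with the deadletterreason and deliveryresult values you actually observe.

```python
import json

# Hypothetical markers for transient failures that are worth replaying.
REPLAYABLE_MARKERS = ("unauthorized", "throttl", "too many requests")


def rehydrate(blob_text):
    """Return the original CloudEvent if it should be replayed, else None."""
    dead = json.loads(blob_text)
    props = dead.pop("deadletterProperties", {})   # strip before re-publishing
    reason = str(props.get("deadletterreason", "")).lower()
    result = str(props.get("deliveryresult", "")).lower()
    if any(marker in reason or marker in result for marker in REPLAYABLE_MARKERS):
        return dead
    return None   # schema or business-rule failure: leave it for a human


sample = json.dumps({
    "specversion": "1.0",
    "id": "B688-1234-1235",
    "source": "egns-telemetry",
    "subject": "devices/sensor-0007/telemetry/temp",
    "type": "MQTT.EventPublished",
    "data": {"celsius": 91.4},
    "deadletterProperties": {
        "deadletterreason": "Unauthorized",
        "deliveryattempts": 5,
        "deliveryresult": "Unauthorized",
    },
})

event = rehydrate(sample)
print(event["id"], "deadletterProperties" in event)
```

Keeping the original id on the replayed event lets downstream consumers deduplicate if the first delivery partially succeeded.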

Enterprise scenario

A connected-vehicle platform team ingested telemetry from roughly 200,000 vehicles over MQTT into a namespace, routed to a namespace topic, and pushed to Event Hubs for a Stream Analytics pipeline. The constraint surfaced during a regional Stream Analytics incident: the job stalled for ninety minutes, Event Hubs back-pressured, and the push subscription began burning through maxDeliveryCount. Because the subscription had no dead-letter destination, DroppedEventCount climbed — they were silently losing trip data they were contractually required to retain.

The fix was two-part and structural. First, every namespace-topic subscription got a mandatory dead-letter destination, enforced in the subscription module so a subscription literally could not be provisioned without one:

{
  "deadLetterDestinationWithResourceIdentity": {
    "deliveryRetryPeriodInDays": 2,
    "endpointType": "StorageBlob",
    "StorageBlob": {
      "blobContainerName": "vehicle-deadletter",
      "resourceId": "/subscriptions/<SUB>/resourceGroups/rg-fleet/providers/Microsoft.Storage/storageAccounts/stfleetdlq"
    },
    "identity": { "type": "SystemAssigned" }
  }
}

Second, they added a parallel pull subscription on the same topic feeding a back-pressure-tolerant archival worker. Because each subscription gets its own independent copy of every event, the archival path drained at its own pace and could not be starved by the analytics path stalling. During the next incident DroppedEventCount stayed flat, the dead-letter container captured the analytics overflow, and a replay job rehydrated it once Stream Analytics recovered — zero data loss, SLA held. The lesson written into their platform standards: on a namespace topic, fan out by subscription, dead-letter every subscription, and alert on dropped — not failed — events.

Verify

# 1. Namespace exists with MQTT (topic spaces) enabled
az eventgrid namespace show -g $RG -n $NS \
  --query "{state:topicSpacesConfiguration.state, identity:identity.type}" -o table

# 2. Permission bindings are in place (default-deny means no binding = no access)
az eventgrid namespace permission-binding list -g $RG --namespace-name $NS -o table

# 3. Routing points at the ingest topic
az eventgrid namespace show -g $RG -n $NS \
  --query "topicSpacesConfiguration.routeTopicResourceId" -o tsv

# 4. Subscriptions show the expected delivery modes and dead-letter config
az eventgrid namespace topic event-subscription show \
  -g $RG --namespace-name $NS --topic-name mqtt-ingest --name sub-worker \
  --query "{mode:deliveryConfiguration.deliveryMode, dlq:deadLetterDestinationWithResourceIdentity.StorageBlob.blobContainerName}" -o table

Then publish a test message from an MQTT client (e.g. mosquitto_pub with the client cert), and confirm via the KQL query that DeliverySuccessCount increments and DroppedEventCount stays zero. Force a failure (point a push subscription at a bad endpoint) and confirm a blob appears under the dead-letter prefix within a few minutes.


Written by Vinod
