A Data Mesh is not a product you buy or a single Azure resource you deploy. It is an operating model — and the architecture’s job is to make that operating model cheap to live in. The four principles (domain ownership, data as a product, self-serve data platform, and federated computational governance) only pay off when the platform team can hand a domain a paved road: “here is your landing zone, here is how you publish a product, here is the contract you must honour, and the rest is automated.” This article describes a concrete Azure realisation of that paved road built around ADLS Gen2, Synapse/Fabric compute, and Microsoft Purview for federated governance.
The business scenario
Picture a mid-sized insurer, a multi-brand retailer, or a bank that has spent the last decade building a central data warehouse and a central data engineering team. The pattern is always the same and it always breaks the same way.
Every new analytics request — “I need claims fraud signals joined with policy data and the new telematics feed” — becomes a ticket in the central team’s backlog. The central team does not deeply understand claims, or policy, or telematics; they understand pipelines. So they spend weeks reverse-engineering source semantics, the domain experts spend weeks in review meetings correcting them, and by the time the dataset ships the business question has moved on. The warehouse becomes a tangle of 4,000 undocumented tables that nobody dares change because some unknown dashboard might depend on them. Lineage is tribal knowledge. Data quality is “whatever the last pipeline produced.” The central team is simultaneously a bottleneck and blamed for every quality problem in data they never owned.
This is the failure mode Data Mesh targets: the central team does not scale with the number of domains, and centralising ownership separates the people who understand the data from the people responsible for it. The scenario spans company sizes. A 200-person company might have 4 domains (Sales, Finance, Product, Support) and a two-person platform team. A 20,000-person enterprise might have 60 domains across 8 business units. The architecture is the same shape; only the cardinality changes. That scale-invariance is exactly why it belongs in an architecture center.
The goal of the build is to invert ownership: each domain owns its data end-to-end and publishes data products — versioned, documented, quality-guaranteed, discoverable datasets with an owner and an SLA — while a small platform team owns the self-serve infrastructure and a federated governance body owns the rules (not the data). The promise to the business: a new domain can publish a trustworthy, governed data product in days, and any other domain can find and consume it without filing a ticket.
A deliberate framing note before we go further: do not adopt this if you have one or two domains and a happy central warehouse. Data Mesh is an answer to organisational scale, and it imposes real overhead (platform team, governance forum, per-domain ownership). The “When to use it” section is blunt about this. Everything below assumes you have already concluded that central ownership is your bottleneck.
Architecture overview
The architecture has three planes that map directly onto three different owners, and keeping them separate is the entire trick.
1. The data product plane (owned by domains). Each domain gets its own isolated landing zone — in practice an Azure subscription (or a tightly-scoped resource group for smaller orgs) containing its own ADLS Gen2 storage account, its own compute, and its own pipelines. A domain ingests from its operational sources (its own Azure SQL / Cosmos DB / SaaS APIs / event streams), transforms the data inside its boundary, and publishes the output ports of its data products into a curated, read-optimised zone in its storage account. Crucially, a data product is not just a folder of Parquet — it is the data plus its schema contract, its quality metrics, its documentation, its access policy, and its registration in the catalog. Domains never write into each other’s storage. They consume each other’s products through shared, governed access.
2. The self-serve platform plane (owned by the platform team). This is the paved road: a set of Terraform/Bicep modules, a CI/CD template, and shared services that every domain consumes identically. When a new domain onboards, the platform pipeline stamps out their landing zone (storage with the medallion container layout, a Synapse workspace or Fabric capacity, managed identities, private endpoints, default Purview registration, default policies, cost tags) from a template. The platform team also runs the shared services that must be central: the Purview account, the network hub, the identity/entitlements integration, and the data-product catalog/marketplace UI. The platform team’s KPI is lead time to a new data product, not the number of pipelines they personally write — they should write zero domain pipelines.
3. The federated governance plane (owned by a cross-domain guild). A governance forum — platform team plus a data steward from each domain — agrees on global policies that are then enforced computationally rather than by review. “Global” means cross-cutting: PII classification and masking rules, the interoperability standards (every product must expose a canonical customer_id, timestamps in UTC, a documented schema, a freshness SLA), retention, and the residency rules. These become Purview classification rules, Purview DLP/data-policies, Azure Policy definitions, and CI checks — encoded once, applied to every domain automatically.
The end-to-end data path. Follow a single fact from source to consumption:
- An operational event (a new claim) lands in the Claims domain’s source system.
- The Claims domain’s own pipeline (Synapse pipeline, Fabric Data Factory, or Databricks job — the platform allows a small menu, not a free-for-all) ingests it into the Bronze container of the Claims storage account: raw, immutable, append-only.
- Domain-owned transformations clean and conform it into Silver (validated, deduplicated, business entities).
- The domain shapes the consumer-facing Gold data product — e.g.
claims_enrichedas Delta tables — and runs data quality checks (Great Expectations / Fabric Data Quality / dbt tests) as a gate. Failing the contract blocks publication. - Publication registers the product: the Purview scan picks up the schema and lineage, the product’s metadata (owner, SLA, classification, sample, contract version) is written to the catalog, and an access policy is attached. The product is now discoverable.
- A consumer in the Fraud domain searches the catalog/marketplace, finds
claims_enriched, sees its schema, freshness, owner, and quality score, and requests access. Approval grants entitlement (an Entra group / Purview policy) — no copy, no ticket to the Claims engineers. - Fraud queries the product in place via Synapse serverless SQL
OPENROWSET/external tables or a Fabric shortcut (so there is one physical copy, queried from many places), joins it with their own telematics product, and builds their fraud model. Their model output becomes a new data product, and the lineage in Purview now shows the full chain Claims → Fraud automatically.
The diagram, described in words: three horizontal swim-lanes. Top lane is several domain boxes (Claims, Policy, Telematics, Finance), each an identical stack — Sources → ADLS Gen2 (Bronze/Silver/Gold) → domain compute → Gold “data product” output ports. Bottom-left is the platform plane: an IaC factory feeding each domain box (dotted “provisions” arrows), plus a network hub and CI/CD. Bottom-right is the governance plane: Microsoft Purview at the center with arrows scanning every domain’s Gold zone (pulling lineage and classifications), and pushing policies down into each domain. A horizontal “consumption fabric” connects domains to each other — Synapse serverless and Fabric shortcuts — so arrows between domain Gold zones represent query-in-place, never bulk copies. The catalog/marketplace sits above everything as the single discovery surface.
The defining property: data flows peer-to-peer between domains, governance flows top-down as policy, and infrastructure flows top-down as templates. No single component is a runtime bottleneck for queries, and no central team is a bottleneck for delivery.
Component breakdown
| Component | Azure service | Role in the mesh | Key configuration choices |
|---|---|---|---|
| Domain landing zone | Azure Subscription / Resource Group + Management Group | Blast-radius and cost isolation per domain | One subscription per domain at scale; RG-per-domain for small orgs. Domains sit under a “Data Domains” management group with inherited Azure Policy. |
| Data product storage | ADLS Gen2 (one account per domain) | Physical home of Bronze/Silver/Gold + the product output ports | Hierarchical namespace on; containers bronze/silver/gold; Gold as Delta/Parquet; lifecycle rules archive Bronze; private endpoints only. |
| Domain compute | Synapse Analytics, Microsoft Fabric, or Azure Databricks | Domain-owned transform + serve | Platform offers a curated menu. Synapse serverless SQL for query-in-place; Spark for transforms; Fabric for low-code domains. |
| Cross-domain query | Synapse serverless OPENROWSET / Fabric OneLake shortcuts |
Read another domain’s Gold product without copying | Shortcuts/external tables point at the owning domain’s Gold; access enforced by ACLs + Purview policy. One physical copy. |
| Catalog & governance | Microsoft Purview (single account) | Discovery, classification, lineage, federated policy | One Purview account, collections per domain delegating curation. Scheduled scans of Gold. Data Map + Unified Catalog as the marketplace. |
| Entitlements | Microsoft Entra ID groups + Purview data policies | Who can read which product | Per-product Entra group dp-<domain>-<product>-reader; access requests flow through catalog → group membership. |
| Self-serve factory | Terraform (or Bicep) modules + Azure DevOps/GitHub Actions | Stamps domain landing zones & publishes products | A data-domain module + a data-product module. Publishing is a pipeline, not a portal click. |
| Network hub | Azure Virtual Network + Private DNS + Private Endpoints | Keep all data-plane traffic private | Hub-and-spoke; each domain a spoke; Private Endpoints for Storage, Synapse, Purview; no public storage endpoints. |
| Observability | Azure Monitor + Log Analytics + Purview Insights | Platform & product health, quality trends | Per-product dashboards: freshness, quality score, query volume, cost. Central platform SLO dashboard. |
A few choices deserve the why, not just the what:
-
ADLS Gen2 per domain, not one giant lake. A shared storage account re-creates the monolith — shared throughput limits, a shared blast radius, and ACLs nobody can reason about. One account per domain gives each domain its own scaling envelope, its own RBAC root, and a clean ownership boundary. The “single logical lake” experience is restored at the query layer (shortcuts/serverless), not the storage layer.
-
Query-in-place over copy-on-consume. The instinct to copy a product into the consumer’s account is what created the 4,000-table warehouse. Synapse serverless external tables and Fabric OneLake shortcuts let Fraud read Claims’ Gold where it lives. One copy, owned by the producer, governed once. The producer can evolve the product behind a contract; consumers don’t hold stale forks.
-
Purview collections mirror the org, not the platform. Purview’s collection hierarchy is set to mirror the domain structure so each domain’s steward curates their assets and owns their classifications, while global classification rules (PII, residency) are defined at the root and inherited. This is the literal implementation of “federated” governance — central rules, local curation.
-
Publishing is a pipeline. A data product is “published” by a CI/CD job that runs the quality gate, registers/updates the Purview asset, attaches the access policy, and bumps the contract version. Making publication a click in a portal loses the audit trail and the gate; making it a pipeline makes the contract enforceable and reproducible.
Implementation guidance
Landing-zone topology. Use a management-group hierarchy: a data-platform MG with two children — platform-services (Purview, network hub, shared CI) and data-domains (one subscription per domain). Azure Policy assigned at data-domains enforces the non-negotiables on every domain automatically: deny public blob access, require private endpoints, require the cost-center and data-domain tags, allowed regions only, and require diagnostic settings to the central Log Analytics workspace. New domains inherit all of this the moment their subscription lands in the MG — governance by construction.
IaC: two modules, one factory. The platform team owns exactly two reusable modules.
module "data_domain"provisions the landing zone: ADLS Gen2 account (HNS on) withbronze/silver/goldcontainers, a Synapse workspace or Fabric capacity, a user-assigned managed identity, private endpoints + Private DNS A-records, the domain’s Purview collection, default Azure Policy assignments, and budget alerts. Inputs are deliberately tiny:domain_name,cost_center,data_classification_default,region. A domain onboards by adding ~8 lines to a config file; the pipeline does the rest.module "data_product"registers a product: creates the Gold table location, thedp-<domain>-<product>-readerEntra group, the Purview asset + glossary terms + classification, the access-policy stub, and wires the freshness/quality dashboards. The domain’s publish pipeline calls this module plus runs the quality gate.
Terraform vs Bicep: Bicep is the cleaner fit for the pure-Azure control plane (subscriptions, Synapse, storage, policy) and gets day-one support for new resource types. Terraform wins if you must also manage Entra groups, Purview, and (often) Databricks/Fabric in the same graph, since those providers cover resources Bicep cannot, and it gives you a single plan/apply across identity + data + governance. This reference uses Terraform for that reason; keep state in an Azure Storage backend with a state file per domain so a domain’s apply can never corrupt another’s. Pin module versions and require PR review on the platform modules — they are the contract every domain depends on.
Identity & access wiring. Everything is managed-identity and Entra-group first; no keys, no SAS tokens, no connection strings in pipelines.
- Domain pipelines authenticate to their own storage with the user-assigned managed identity provisioned by the module (
Storage Blob Data Contributoron their own account only). - Cross-domain read access is never a direct RBAC grant from a human to storage. A consumer requests access in the catalog; approval adds their Entra group to the product’s
...-readergroup, and a Purview data-access policy (or storage ACL on the Gold path) grantsStorage Blob Data Readerscoped to that one product’s path. This keeps “who can read this product” answerable from one place and revocable in one place. - Synapse/Fabric query the foreign product via the same identity, so the audit log shows which principal read which product — the lineage and the access log line up.
Networking. Hub-and-spoke. The hub holds Private DNS zones (privatelink.dfs.core.windows.net, privatelink.sql.azuresynapse.net, privatelink.purview.azure.com) and shared DNS resolution; each domain is a spoke VNet. All storage accounts are set to public network access = Disabled with private endpoints; Synapse uses a managed VNet with managed private endpoints to reach foreign domains’ storage. The result is that cross-domain queries traverse the Microsoft backbone privately — a domain in subscription A reads a private-endpoint’d storage account in subscription B, with DNS resolved from the hub.
The medallion contract inside a domain. Bronze is raw and immutable (append-only, retain for replay, lifecycle-archive after N days). Silver is conformed and validated. Gold is the product — and only Gold is exposed across the mesh. The platform template ships the container layout and a sample dbt/Spark project so domains start from a known-good shape rather than inventing their own.
Enterprise considerations
Security & Zero Trust. The mesh’s isolation model is its Zero Trust story: per-domain subscriptions and storage accounts mean a compromised domain pipeline cannot read another domain’s raw data, because it has no identity that grants it. Every access is explicit, scoped to a single product path, granted via group, and logged. Layer on: private endpoints everywhere (no public data-plane), CMK encryption on ADLS for regulated domains, and Purview classification-driven masking so PII columns are auto-tagged on scan and masked for readers who lack the entitlement. Defender for Storage and Defender for Cloud run across all domain subscriptions centrally. The principle to hold onto: governance is enforced computationally — a steward classifies a column as PII.NationalID once, and the masking + access policy follow automatically, rather than relying on each consumer to “remember” to handle it.
Cost optimization. The single biggest lever is serverless and query-in-place: Synapse serverless SQL bills per TB scanned, so a consumer querying a foreign Gold product pays only for what they read, and there is no idle warehouse and no duplicated storage of copied datasets. Partition and Z-order/OPTIMIZE Gold tables so consumer queries prune aggressively (a well-partitioned product can cut scan cost 10x). Per-domain subscriptions make cost attributable — each domain sees its own bill and is accountable for its products’ efficiency, which is itself a cost control (central pools hide waste). Use ADLS lifecycle management to tier Bronze to Cool/Archive. Right-size Fabric capacities and Synapse Spark pools with auto-pause. Tag everything with data-domain and cost-center (enforced by policy) so showback/chargeback is automatic.
Scalability. The architecture scales organisationally — adding the 30th domain is the same templated operation as the 3rd, and it adds no load to existing domains because each has its own compute and storage envelope. It scales technically because there is no shared runtime bottleneck: ADLS Gen2 scales to petabytes per account, serverless compute scales per query, and consumers parallelise across products. The one component that is genuinely central — Purview — scales with capacity units and scheduled (not continuous) scanning; keep scan scope to Gold to avoid scanning churny Bronze.
Reliability & DR (RTO/RPO). DR is per data product, tiered by business criticality, which is far cheaper than DR-ing one monolith uniformly.
| Product tier | Example | RPO | RTO | Mechanism |
|---|---|---|---|---|
| Tier 1 (critical) | claims_enriched, customer_master |
~15 min | ~1 hr | GZRS storage + object replication of Gold to paired region; IaC redeploy of compute |
| Tier 2 (important) | marketing_segments |
~24 hr | ~8 hr | ZRS + nightly cross-region copy of Gold |
| Tier 3 (best-effort) | exploratory products | Rebuildable | Rebuild | Re-run pipeline from Bronze in paired region |
Bronze immutability is the ultimate safety net: any Silver/Gold product can be rebuilt deterministically from Bronze, so the true RPO for derived products is bounded by Bronze replication, not by snapshotting every table. Storaccounts use GZRS for Tier-1 domains; the medallion + IaC pairing means recovery is “replay from Bronze on redeployed compute,” which is testable in a DR drill.
Observability. Two altitudes. Platform SLOs: lead time to onboard a domain, lead time to publish a product, % of products meeting their freshness SLA, scan success rate — these tell the platform team whether the paved road works. Product health (per product, owned by the domain): freshness (time since last successful load vs SLA), quality score (pass rate of contract tests), schema-drift alerts, query volume and cost, and top consumers. All diagnostics flow to the central Log Analytics workspace (enforced by policy); Purview Insights gives the cross-cutting view of classification coverage and stale assets. A product that misses its SLA pages its domain, not the platform team — accountability follows ownership.
Governance. This is where Data Mesh lives or dies, so make it concrete. The federated governance forum publishes a small set of global standards and encodes each one as enforcement, not a wiki page:
- Interoperability — every product exposes canonical join keys (
customer_id), UTC timestamps, a documented schema, and a freshness SLA → enforced by a CI contract check in the publish pipeline. - Classification & privacy — PII rules defined as Purview classification rules at the collection root, inherited by all domains; masking policies attached automatically.
- Discoverability — a product is not “published” unless it has an owner, description, glossary terms, and a sample in Purview → the publish pipeline fails otherwise.
- Residency & retention — Azure Policy on the
data-domainsMG restricts regions; lifecycle rules enforce retention.
The cultural commitment is that the governance body owns the rules and the platform owns the enforcement, but no central team owns the data — each product has a named domain owner accountable for its quality and its SLA.
Reference enterprise example
NorthPeak Insurance is a fictional mid-market insurer: ~4,500 employees, ~3.1M policyholders across motor, home, and life, ~28 TB of analytical data growing ~1 TB/quarter. Their starting point is the classic monolith: a central 11-person data team, a single Synapse dedicated SQL pool warehouse, ~2,800 tables, and an 9-week median lead time from “business asks for a dataset” to “dataset in production.” Claims fraud detection has been “next quarter’s project” for five quarters because the central team can’t get to it.
The decision. NorthPeak has 6 natural domains (Policy, Claims, Pricing/Actuarial, Customer, Marketing, Finance) and a clear bottleneck — textbook Data Mesh conditions. They keep the central team but re-charter it: 4 people become the platform team (own the Terraform factory + Purview + network), and the other 7 embed into domains as domain data engineers reporting into the business units. A governance guild forms with one steward per domain.
The build (first two quarters). The platform team ships the data_domain and data_product Terraform modules and a publish pipeline. Each of the 6 domains gets a subscription, an ADLS Gen2 account (bronze/silver/gold), a Synapse workspace, managed identity, private endpoints, and a Purview collection — all stamped from the module. A single Purview account holds 6 collections mirroring the domains. Global policies (PII classification, customer_id/UTC interoperability, India + EU residency split for life vs motor) are encoded as Purview rules + Azure Policy on the data-domains management group.
First products. Customer domain publishes customer_master (Tier 1, GZRS, 15-min RPO) as the canonical customer_id source. Policy publishes policies_active. Claims publishes claims_enriched. Each is a Delta product in its domain’s Gold container with a contract (schema + freshness SLA of 1 hour + dbt quality tests) enforced at publish time.
The payoff query. The long-stalled fraud use case: the Claims domain’s own engineers (who actually understand claims) build a fraud_signals product. They consume policies_active and customer_master in place via Synapse serverless external tables — no copies, access granted through the catalog by adding Claims’ Entra group to each product’s reader group. Purview lineage now shows customer_master + policies_active → fraud_signals automatically. What had been stalled for five quarters shipped in under three weeks because the people building it owned the data and could self-serve the dependencies.
Outcomes after ~9 months.
| Metric | Before (monolith) | After (mesh) |
|---|---|---|
| Median lead time, new dataset to prod | ~9 weeks | ~6 days |
| Cross-team access request | Ticket to central team, ~2 weeks | Catalog request, ~1 day, no copy |
| Duplicated copies of “customer” data | 7 divergent versions | 1 product (customer_master) |
| Analytical storage growth from copies | ~1 TB/qtr of duplicates | near-zero (query-in-place) |
| Owner accountable for a given dataset | “the central team” (i.e. nobody) | named domain owner + SLA |
| Fraud-detection use case | Stalled 5 quarters | Live in <3 weeks |
NorthPeak’s monthly Azure analytics spend went up about 12% in raw infra (6 isolated stacks cost more than one shared pool) but down in total cost of ownership: they retired the duplicate-copy storage, the central team stopped being a 24/7 bottleneck, and — the number the CFO cared about — the fraud product paid for the entire migration within two quarters of going live. The honest caveat: the first quarter felt slower and more expensive while the platform and the governance habits were being built. The inflection came once the third and fourth domains onboarded with zero platform-team effort.
When to use it
Use a Data Mesh when your bottleneck is organisational: many distinct business domains, a central data team that is structurally unable to keep up, repeated duplication of “the same” entity (five versions of “customer”), and a clear separation between people who understand the data and the people currently responsible for it. The signal is that your problem is ownership and lead time, not technology. It also presupposes a real platform-engineering capability — someone has to build and run the paved road.
Do not use it when you have one or two domains and a central warehouse that is keeping up — you would be paying the full overhead (platform team, governance forum, N isolated stacks, per-product DR) to solve a problem you don’t have. It is also the wrong move if leadership won’t actually devolve ownership: a “mesh” where domains are named but the central team still does all the work is the worst of both worlds — monolith effort with distributed-system complexity. And if your organisation lacks the engineering maturity to run IaC, CI/CD, and managed identities, build that muscle first; a mesh on shaky platform foundations just multiplies the chaos.
Anti-patterns to avoid. The distributed monolith — domains write into each other’s storage or copy products on consume, recreating the tangle with extra steps; the fix is query-in-place plus the per-domain storage boundary. Governance-by-wiki — writing standards in a Confluence page and hoping; if a rule isn’t a Purview rule, an Azure Policy, or a CI gate, it doesn’t exist. The hidden central team — the platform team quietly writing domain pipelines “to help,” which re-creates the bottleneck; their KPI must be lead time, with zero domain pipelines authored. Mesh-washing the lake — renaming lake folders “data products” without owners, contracts, SLAs, or catalog registration; a product without a contract and an owner is just a folder.
Alternatives. If you have organisational scale but want less operational overhead, a Data Lakehouse (a single governed lake — Fabric/OneLake or Databricks Unity Catalog — with central-but-federated governance and clear domain zones rather than separate subscriptions) gives you much of the discoverability and governance with one platform to run; it is the right step before a full mesh for most companies, and many never need to go further. If your need is primarily a curated, well-modelled serving layer for BI rather than peer-to-peer domain exchange, a classic central data warehouse / dimensional model is simpler and entirely appropriate — Data Mesh would be over-engineering. The decision rule: choose the mesh only when the cost of central ownership (lead time, bottleneck, duplication) visibly exceeds the cost of distributed ownership (platform team, governance, N stacks). Below that line, a lakehouse or a warehouse will serve you better and cost you less.