AWS Enterprise Architecture: Data Mesh

A centralized data lake works right up until it doesn’t. One team owns the pipelines, every other team files a ticket, and the backlog grows faster than the data. A data mesh flips the ownership model: the teams that produce the data also own it as a product, publish it through a shared governance layer, and let other domains discover and consume it without a central bottleneck. On AWS the load-bearing primitives for this are AWS Lake Formation (fine-grained, cross-account permissions), the AWS Glue Data Catalog (the technical metadata backbone), and per-domain AWS accounts wired together under AWS Organizations. This article is a reusable reference for standing that up — from a three-domain startup to a fifty-domain enterprise.

The business scenario

Picture a mid-market retailer, growing fast, that has accumulated the classic “data gravity” problem. Sales, Marketing, Supply Chain, and Finance each generate operational data in their own systems. Eighteen months ago a small platform team built a central data lake in a single AWS account and offered to ingest everyone’s data. It worked beautifully for the first three pipelines. Now it’s a chokepoint:

The platform team has a 9-week backlog of “please add this dataset” requests, and they don’t understand the domain semantics well enough to model the data correctly — so they keep getting it subtly wrong.
Producers have no incentive to keep schemas clean because “data quality is the lake team’s problem now.”
Every consumer has read access to the entire bucket because per-table IAM policies became unmanageable past ~40 tables. Finance data, including columns with customer PII, is one s3:GetObject away from anyone in the analytics account.
The single account is now a blast-radius and a billing nightmare: nobody can tell whether Marketing’s ad-hoc Athena habit or Supply Chain’s nightly Spark job is responsible for the spend.

This is the moment a data mesh pays for itself. The four principles — domain ownership, data as a product, self-serve platform, and federated computational governance — map almost one-to-one onto AWS building blocks. The goal isn’t “more technology”; it’s to decentralize the production of data while centralizing the governance of access. Crucially, this is not a big-bang rewrite: the same pattern that serves a 3-domain company scales to 50 domains by adding accounts, not by re-architecting.

What “good” looks like at the end:

Each business domain owns its own AWS account, its own pipelines, and its own S3 storage. They publish curated data products (well-described tables with SLAs, an owner, and a contact).
A central governance account holds the authoritative Lake Formation catalog and the tag taxonomy. It grants and audits access; it does not own the data.
Consumers discover products in a catalog/portal, request access, and — once approved — query data in place via Athena/Redshift/EMR with column- and row-level controls enforced by Lake Formation. No copying, no full-bucket access, no 9-week tickets.

Architecture overview

The end-to-end shape is a hub-and-spoke catalog with in-place, cross-account data sharing. Storage and compute live in the spokes (domains); governance and the source-of-truth catalog live in the hub.

Figure: AWS data mesh reference architecture. Producer domain accounts register S3 data products in the Glue Data Catalog and apply LF-Tags, share their catalog to a central governance account via AWS RAM and Lake Formation, and consumer analytics accounts query the data in place with Athena and Redshift Spectrum through resource links under Lake Formation-vended, column- and row-filtered credentials.

Accounts (the spokes and the hub). Under a single AWS Organization, you create one OU per data concern. A Governance OU holds the central governance account. A Domains OU holds one producer account per domain (Sales, Marketing, Supply Chain, Finance, …). Optionally a Consumers OU holds analytics/BI accounts for teams that consume but don’t produce. AWS Organizations + Service Control Policies (SCPs) provide the guardrails; AWS RAM (Resource Access Manager) is the wire that Lake Formation uses to share catalog resources across these account boundaries.

Producer (domain) path — how data becomes a product. Inside a domain account, source data lands in a domain-owned S3 bucket (e.g. s3://acme-sales-dataproducts/). An AWS Glue crawler or an explicit Glue ETL/CREATE TABLE job registers the schema into that account’s Glue Data Catalog, and the underlying S3 location is registered as a Lake Formation data lake location so that Lake Formation — not raw S3 IAM — mediates access. The domain team curates the table: partitions, a schema contract, and LF-Tags (Lake Formation tag-based access control attributes) such as domain=sales, sensitivity=pii, layer=curated. At this point the table is a candidate data product.
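The registration and tagging steps above can be sketched as the request payloads a producer would send to the Lake Formation API. This is a minimal illustration, not the article's own tooling: the account ID, role name, database name `sales`, and table name `orders` are hypothetical, and the helper functions are invented for the example. Only the payload shapes follow the `register_resource` and `add_lf_tags_to_resource` operations.

```python
import json


def build_register_location_request(bucket_arn: str, role_arn: str) -> dict:
    """Keyword arguments for lakeformation.register_resource."""
    return {"ResourceArn": bucket_arn, "RoleArn": role_arn}


def build_tag_table_request(database: str, table: str, tags: dict) -> dict:
    """Keyword arguments for lakeformation.add_lf_tags_to_resource.

    A resource carries one value per LF-Tag key, so each TagValues
    list holds exactly one entry.
    """
    return {
        "Resource": {"Table": {"DatabaseName": database, "Name": table}},
        "LFTags": [
            {"TagKey": key, "TagValues": [value]}
            for key, value in sorted(tags.items())
        ],
    }


register_req = build_register_location_request(
    "arn:aws:s3:::acme-sales-dataproducts",
    "arn:aws:iam::111111111111:role/lf-data-access",  # hypothetical role
)
tag_req = build_tag_table_request(
    "sales",   # hypothetical database name
    "orders",  # hypothetical table name
    {"domain": "sales", "sensitivity": "pii", "layer": "curated"},
)

# With boto3 installed and credentials configured, these would be sent as:
#   lf = boto3.client("lakeformation")
#   lf.register_resource(**register_req)
#   lf.add_lf_tags_to_resource(**tag_req)
print(json.dumps(tag_req, indent=2))
```

Building the payload separately from the API call keeps the tagging logic easy to review and unit test before anything touches a real catalog.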

Governance path — the hub takes the source of truth. The recommended pattern (AWS calls it the centralized catalog / central governance account model) is that the producer shares its database/tables to the central governance account via Lake Formation cross-account grants. The central account becomes the authoritative catalog: it owns the LF-Tag taxonomy, holds the resource links, and is where all consumer-facing grants are issued. This gives you one place to define “who can see PII,” one place to audit, and one tag ontology for the whole company — federated governance, decentralized ownership.

Consumer path — discover, request, query in place. A consumer (say, a Marketing analyst in the analytics account) browses available products in a data catalog/portal — Amazon DataZone (now folded into the next-generation SageMaker Catalog / Amazon SageMaker Unified Studio) is the AWS-native option, or a lightweight internal portal backed by Glue Catalog APIs. They submit an access request. Governance approves it by issuing a Lake Formation grant — ideally an LF-Tag policy (“grant SELECT on all tables where domain=sales AND sensitivity=public to the Marketing analytics role”) rather than a per-table grant. AWS RAM propagates the share; the consumer account creates a resource link to the shared database, and the analyst runs Athena (or Redshift Spectrum, or EMR/Spark) directly against the producer’s S3 data. Lake Formation enforces column-level, row-level, and cell-level filters at query time, and the data is never copied — the analyst’s compute reads the producer’s bucket through the Lake Formation credential vending path.

So the request/data path for a single query is: analyst → Athena in consumer account → resource link → Lake Formation (central grant + LF-Tag policy + data-filter) → vended temporary credentials → producer’s registered S3 location → filtered result back to Athena. Metadata flows hub-and-spoke; bytes flow point-to-point from producer storage to consumer compute, governed end to end.
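To make the resource-link hop concrete, here is a small sketch of what the consumer side might look like. It assumes the Glue `create_database` operation with a `TargetDatabase` block, which is how a database resource link is expressed; the link name, account ID, and table and column names are hypothetical.

```python
def build_resource_link_request(link_name: str, owner_account_id: str,
                                shared_database: str) -> dict:
    """Keyword arguments for glue.create_database that create a resource link.

    owner_account_id is the account whose catalog owns the shared database.
    """
    return {
        "DatabaseInput": {
            "Name": link_name,
            "TargetDatabase": {
                "CatalogId": owner_account_id,
                "DatabaseName": shared_database,
            },
        }
    }


def athena_query_via_link(link_name: str, table: str, columns: list,
                          limit: int = 10) -> str:
    """SQL the analyst runs; the link is addressed like a local database."""
    cols = ", ".join(f'"{c}"' for c in columns)
    return f'SELECT {cols} FROM "{link_name}"."{table}" LIMIT {limit}'


link_req = build_resource_link_request(
    "sales_shared",   # hypothetical local link name
    "222222222222",   # hypothetical owning account
    "sales",          # hypothetical shared database
)
sql = athena_query_via_link("sales_shared", "orders", ["order_id", "order_date"])

# With boto3: boto3.client("glue").create_database(**link_req)
print(sql)
```

The analyst's SQL never mentions the producer account or bucket. The link resolves the catalog object, and Lake Formation decides which rows and columns come back.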

Component breakdown

Component	What it does	Why it’s here	Key configuration choices
AWS Organizations + OUs	Account hierarchy and policy inheritance	Hard account boundaries give per-domain isolation, blast-radius control, and clean cost attribution	OUs: Governance, Domains, Consumers, Security/Log-archive. Enable trusted access for Lake Formation, RAM, and CloudTrail org trail
Service Control Policies (SCPs)	Org-wide guardrails	Prevent domains from disabling encryption, leaving the org, or creating public S3 buckets: governance you can’t opt out of	Deny changes to S3 Block Public Access settings (`s3:PutBucketPublicAccessBlock`, `s3:PutAccountPublicAccessBlock`), deny weakening default bucket encryption, restrict regions, protect Lake Formation settings
Domain S3 buckets	Physical storage of each domain’s data products	Producers own their bytes; storage lives where the domain lives	Bucket-per-domain (or per-layer: raw/curated/product). SSE-KMS with a domain CMK; register the bucket/prefix as a Lake Formation location
AWS Glue Data Catalog	Technical metadata (databases, tables, schemas, partitions)	The lingua franca every engine (Athena, Redshift, EMR, Spark) reads; the thing that gets shared across accounts	One catalog per account per Region; per-database settings. Use Glue crawlers or explicit DDL; enable partition indexes for large tables
AWS Lake Formation	Fine-grained permissions + cross-account sharing	Replaces unmanageable per-table IAM with column/row/cell-level grants and tag policies; vends scoped credentials	Switch databases to Lake Formation permissions (remove `IAMAllowedPrincipals`). Define LF-Tags. Set the cross-account version setting to version 3 or higher. Set up data filters for row/column security
LF-Tags (TBAC)	Attribute taxonomy on catalog resources	Grant on attributes (`sensitivity`, `domain`, `layer`) instead of on hundreds of individual tables — this is what makes the mesh scale	Centralize tag definitions in the governance account; delegate tag assignment to domains for their own resources
AWS RAM	Shares catalog resources across accounts	The transport Lake Formation uses for cross-account grants	Enable sharing within the org; accept shares (or auto-accept for trusted org); creates the cross-account principal plumbing
Resource links	Account-local pointer to a shared database/table	Lets consumers query a remote catalog object as if it were local	Create in the consumer account after a share is accepted; point Athena/Redshift at the link
Athena / Redshift Spectrum / EMR	Query and compute engines in consumer accounts	In-place query — compute is decentralized and billed to the consumer	Athena workgroups per team with result-location + cost controls; Redshift via Spectrum or data sharing; EMR/Spark with Lake Formation integration enabled
Amazon DataZone / SageMaker Catalog	Business catalog, data portal, subscription workflow	Human-facing discovery, glossaries, and a request/approve flow on top of the technical catalog	Map DataZone domains/projects to AWS accounts; let subscriptions drive the underlying Lake Formation grants
CloudTrail + Lake Formation access logs + CloudWatch	Audit and observability	Prove who accessed what, across every account, from one place	Org-level CloudTrail to a central log-archive account; Lake Formation audit events; per-domain cost & query dashboards

A few of these deserve emphasis. LF-Tags are the single most important scaling decision. Without them, every new consumer means a fresh round of per-table grants and your governance team becomes the new bottleneck — you’ve just moved the ticket queue. With them, you grant once against an attribute (SELECT where layer=curated AND sensitivity=public) and every current and future table that carries those tags is automatically in scope. Data filters (row-level expressions and column projections, including cell-level via combining both) are how a single shared table serves both a Finance user who may see salaries and a regional manager who may see only their region’s rows.
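The "grant once, cover future tables" behaviour is easier to see in a toy model. The sketch below is not Lake Formation code; it only imitates the matching rule of an LF-Tag expression, where keys are combined with AND and the values listed for one key are combined with OR. The table names and tag assignments are illustrative.

```python
def matches(expression: dict, resource_tags: dict) -> bool:
    """True when the resource carries an allowed value for every key in the expression."""
    return all(
        resource_tags.get(key) in allowed_values
        for key, allowed_values in expression.items()
    )


# One grant, expressed on attributes rather than on table names.
grant_expression = {"layer": {"curated"}, "sensitivity": {"public", "internal"}}

tables = {
    "sales.orders": {"domain": "sales", "layer": "curated", "sensitivity": "internal"},
    "finance.gl_transactions": {"domain": "finance", "layer": "curated", "sensitivity": "confidential"},
    "sales.raw_clicks": {"domain": "sales", "layer": "raw", "sensitivity": "internal"},
}

in_scope = sorted(name for name, tags in tables.items() if matches(grant_expression, tags))
print(in_scope)

# A table published later needs no new grant: carrying the tags is enough.
tables["marketing.campaigns"] = {"domain": "marketing", "layer": "curated", "sensitivity": "public"}
in_scope_later = sorted(name for name, tags in tables.items() if matches(grant_expression, tags))
print(in_scope_later)
```

Because the expression never names a table, adding `marketing.campaigns` changes the result without changing the grant. That property is what keeps the governance team out of the per-dataset ticket queue.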

Implementation guidance

Bootstrapping the org and accounts. Use AWS Control Tower to lay down the landing zone (OUs, guardrails, centralized logging, an account factory). Provision domain accounts through Account Factory for Terraform (AFT) so every new domain comes pre-baked with the same baseline. The data-mesh-specific wiring is best expressed as Terraform and applied per account from a CI/CD pipeline (CodePipeline or GitHub Actions assuming an OIDC role per account):

Lake Formation settings (aws_lakeformation_data_lake_settings): set the governance/admin principals, and critically remove IAMAllowedPrincipals as a default so that catalog access is governed by Lake Formation rather than legacy IAM-only mode.
Register storage with aws_lakeformation_resource, associating the S3 location with either the Lake Formation service-linked role or a custom role that has bucket access (the credential-vending role).
LF-Tags: define the taxonomy centrally with aws_lakeformation_lf_tag (e.g. sensitivity = [public, internal, confidential, pii], domain = [sales, marketing, …], layer = [raw, curated, product]).
Grants: prefer aws_lakeformation_permissions with an lf_tag_policy block for tag-based grants, plus aws_lakeformation_lf_tag_expression where supported, over enumerating tables. Cross-account grants name the consumer account ID (or an Organization/OU as the principal for org-wide shares).
Glue: aws_glue_catalog_database, aws_glue_crawler (or explicit aws_glue_catalog_table for contract-first schemas), with crawler schedules and a dedicated Glue IAM role.
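The article expresses this wiring in Terraform. As a language-neutral illustration of what a tag-based grant contains, here is the equivalent request shape for the Lake Formation `grant_permissions` operation, sketched in Python. The principal ARN and catalog account ID are hypothetical, and the helper is invented for the example.

```python
import json


def build_lf_tag_grant(principal_arn: str, catalog_id: str, expression: dict,
                       permissions=("SELECT",)) -> dict:
    """Keyword arguments for lakeformation.grant_permissions using an LF-Tag policy."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "LFTagPolicy": {
                "CatalogId": catalog_id,
                "ResourceType": "TABLE",
                "Expression": [
                    {"TagKey": key, "TagValues": sorted(values)}
                    for key, values in sorted(expression.items())
                ],
            }
        },
        "Permissions": list(permissions),
    }


grant = build_lf_tag_grant(
    principal_arn="arn:aws:iam::333333333333:role/marketing-analytics",  # hypothetical
    catalog_id="999999999999",                                           # hypothetical governance account
    expression={"domain": {"sales"}, "sensitivity": {"public"}},
)

# With boto3: boto3.client("lakeformation").grant_permissions(**grant)
print(json.dumps(grant, indent=2))
```

In practice a table-level SELECT grant is usually paired with a DESCRIBE grant on the matching databases (the same structure with `ResourceType` set to `DATABASE`), so the consumer can see the database it is querying.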

Networking. The catalog/RAM control plane is account-to-account over AWS’s backbone — no VPC needed for the sharing. For the data plane, keep S3 and analytics traffic private: use S3 Gateway VPC Endpoints in each consumer VPC, Interface (PrivateLink) Endpoints for Glue, Lake Formation, Athena, and KMS, and avoid routing analytics traffic over the public internet. If consumers use Redshift or EMR in private subnets, this keeps the entire query path inside your network perimeter. Cross-account KMS access must be granted in the key policy of each domain’s CMK so consumer roles can decrypt the data they’re authorized to read.

Identity wiring. Standardize on IAM Identity Center (SSO) for human access, mapping enterprise groups (e.g. marketing-analysts) to permission sets that assume the right roles in consumer accounts. The role that actually queries data is registered as a Lake Formation principal and is what grants target. For machine/pipeline access inside domains, use account-scoped IAM roles assumed via OIDC from CI. The chain to internalize: Identity Center group → permission set → consumer-account role → Lake Formation grant (via LF-Tag policy) → data filter → vended S3+KMS credentials. Lake Formation does the last-mile authorization; IAM only gets the principal to the door.

Producer onboarding flow (the self-serve part). A domain team should be able to publish a product without a governance ticket: (1) drop curated data in their registered bucket, (2) run the crawler / apply the table contract, (3) assign LF-Tags they’re delegated to manage, (4) share to the central governance account (a templated Terraform module), and (5) register the product in DataZone. Governance only intervenes to approve cross-domain consumption, and even that can be policy-driven for low-sensitivity tiers.

Enterprise considerations

Security & Zero Trust. The mesh is a Zero-Trust data architecture by construction: no principal has standing access to a bucket; every read is an explicit, attribute-based grant evaluated at query time, with temporary vended credentials rather than long-lived bucket policies. Enforce least privilege with LF-Tag policies scoped to the minimum sensitivity tier, column-level security to hide PII columns from analysts who don’t need them, and row-level filters for tenant/region isolation. SCPs make the non-negotiables (encryption, public-access-block, region pinning) un-bypassable. Always remove IAMAllowedPrincipals — leaving it on silently bypasses Lake Formation and is the single most common misconfiguration. Encrypt with per-domain KMS CMKs so a domain can cryptographically revoke access, and audit decrypt usage via CloudTrail.
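A guardrail SCP covering the non-negotiables named above might look like the following sketch. The statement set is illustrative rather than complete: the exempted admin role, the region list, and the short list of global services excluded from region pinning are all assumptions to adapt.

```python
import json

ADMIN_ROLE = "arn:aws:iam::*:role/platform-admin"  # hypothetical exempt role
ALLOWED_REGIONS = ["ap-south-1", "us-east-1"]      # hypothetical region pin


def build_guardrail_scp(admin_role_arn: str, allowed_regions: list) -> dict:
    """Return an SCP document with deny-only guardrails."""
    exempt_admin = {"ArnNotLike": {"aws:PrincipalArn": admin_role_arn}}
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # Nobody but the platform role may change Block Public Access.
                "Sid": "ProtectS3BlockPublicAccess",
                "Effect": "Deny",
                "Action": ["s3:PutAccountPublicAccessBlock",
                           "s3:PutBucketPublicAccessBlock"],
                "Resource": "*",
                "Condition": exempt_admin,
            },
            {   # Keeps domains from re-enabling IAM-only catalog defaults.
                "Sid": "ProtectLakeFormationSettings",
                "Effect": "Deny",
                "Action": "lakeformation:PutDataLakeSettings",
                "Resource": "*",
                "Condition": exempt_admin,
            },
            {
                "Sid": "DenyLeavingOrganization",
                "Effect": "Deny",
                "Action": "organizations:LeaveOrganization",
                "Resource": "*",
            },
            {   # Region pinning; global services are excluded via NotAction.
                "Sid": "PinRegions",
                "Effect": "Deny",
                "NotAction": ["iam:*", "organizations:*", "sts:*", "support:*"],
                "Resource": "*",
                "Condition": {"StringNotEquals": {"aws:RequestedRegion": allowed_regions}},
            },
        ],
    }


scp = build_guardrail_scp(ADMIN_ROLE, ALLOWED_REGIONS)
scp_json = json.dumps(scp, separators=(",", ":"))
print(len(scp_json), "characters")
```

SCPs have a tight size limit (5,120 characters), so compact serialization and a small number of well-scoped deny statements matter more here than in ordinary IAM policies.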

Cost optimization. Decentralization is itself a cost lever: each domain’s storage and each consumer’s compute hit their own bill, so showback/chargeback is automatic and teams feel their Athena scans. Concretely: store products in Parquet, partitioned and compacted, so Athena/Spectrum scan less; use Glue partition indexes to cut partition-filtering cost; set Athena per-query and per-workgroup data-scanned limits; lifecycle raw data to S3 Intelligent-Tiering / Glacier; and prefer in-place query over copy to avoid duplicating petabytes across accounts. Lake Formation and RAM themselves carry no per-request fee — you pay for Glue, S3, KMS, and the query engines.

Scalability. This architecture scales by adding accounts and tags, not by re-architecting. Going from 5 to 50 domains is 45 more applications of the same Terraform baseline. The LF-Tag taxonomy means grant complexity grows with the number of attributes (a handful) rather than the number of tables (thousands). Watch the real limits: Glue/Lake Formation have account- and region-level quotas on databases, tables, partitions, and concurrent grants — design partitioning to stay well under partition limits, and federate very large domains across multiple accounts if needed.

Reliability & DR (RTO/RPO). S3 is designed for 11 nines of durability within a Region; for regional resilience, enable S3 Cross-Region Replication (CRR) on product buckets, replicate the Glue Data Catalog (export/replicate databases and tables, or rebuild via crawlers from replicated data), and re-create the Lake Formation grants (they’re Terraform, so re-apply them in the DR region). A practical posture: RPO ≈ 15 min for product data via CRR with S3 Replication Time Control enabled (plain CRR is asynchronous with no fixed time bound), and RTO ≈ 1–2 hours to re-point the catalog and re-issue grants in the secondary region, since compute (Athena/EMR) is stateless and stands up quickly. Keeping the catalog and grant definitions in version-controlled IaC is what makes a fast RTO realistic: your “DR plan” is largely terraform apply in another region.

Observability. Centralize an organization CloudTrail in a log-archive account so every cross-account share and data access is captured in one place. Use Lake Formation’s access logging to answer “who queried this PII table last quarter.” Per-domain CloudWatch dashboards track pipeline health and freshness against the product’s SLA; per-consumer Athena dashboards track bytes scanned and cost. Surface data product health (freshness, completeness, last-updated) in the DataZone portal so consumers trust what they subscribe to.

Governance. This is federated computational governance: standards (tag taxonomy, encryption, naming, PII classification) are defined centrally and enforced as code (SCPs, Lake Formation policies, IaC modules), while domains retain autonomy over modeling and publishing. The governance account is the policy authority and the audit point, not a data owner. Define a lightweight data contract per product (schema, SLA, owner, sensitivity) and make breaking schema changes a versioned, reviewed event rather than a silent crawler update.
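One way to turn "breaking schema changes are a reviewed event" into something a CI pipeline can enforce is a schema diff between the published contract and the proposed one. The sketch below assumes a simple rule set that is not prescribed by the article: removing a column or changing its type is breaking, while adding a column is not. The column names are illustrative.

```python
def diff_schema(old: dict, new: dict) -> dict:
    """Compare two {column_name: type} mappings."""
    shared = set(old) & set(new)
    return {
        "removed": sorted(set(old) - set(new)),
        "added": sorted(set(new) - set(old)),
        "retyped": sorted(col for col in shared if old[col] != new[col]),
    }


def is_breaking(diff: dict) -> bool:
    """Assumed rule: removals and type changes break consumers; additions do not."""
    return bool(diff["removed"] or diff["retyped"])


published = {"order_id": "bigint", "order_date": "date", "amount": "decimal(12,2)"}
proposed = {"order_id": "string", "order_date": "date", "channel": "string"}

change = diff_schema(published, proposed)
print(change, "breaking" if is_breaking(change) else "compatible")
```

A pipeline could run this check against the current Glue table definition and fail the build on a breaking diff unless the contract version was bumped, which keeps crawlers from silently rewriting a product's schema.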

Reference enterprise example

NimbusCart, a mid-market online retailer (~1,200 employees, ₹1,800 crore revenue), ran the exact centralized-lake bottleneck described above: one platform team, a 9-week backlog, and full-bucket access in a shared analytics account. They migrated to a data mesh over two quarters.

What they built. Under a Control Tower landing zone, they created a Governance OU (one central governance account) and a Domains OU with four producer accounts: Sales, Marketing, Supply Chain, and Finance. A separate Consumers OU held an Analytics account (BI + ad-hoc) and a Data Science account.

Finance registered s3://nimbus-finance-products/ as a Lake Formation location, published a gl_transactions table tagged domain=finance, layer=curated, sensitivity=confidential, with a column-level filter hiding employee_salary and a row-level filter so regional controllers see only their region_code.
Sales published orders and order_lines (sensitivity=internal) and a derived public product daily_sales_by_category (sensitivity=public).
All four domains shared to the central governance account, which owned the LF-Tag taxonomy (sensitivity ∈ {public, internal, confidential, pii}).
Governance issued exactly two broad LF-Tag grants instead of hundreds of per-table grants: Analytics-team role gets SELECT where sensitivity IN {public, internal} across all domains; Data Science role gets the same plus specific confidential Sales tables it requested through DataZone. PII columns were granted to nobody by default.
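The Finance filter described above (hide `employee_salary`, restrict rows by `region_code`) could be expressed as one data cells filter per region. This is a sketch of the request payload for the Lake Formation `create_data_cells_filter` operation. The catalog account ID, the database name `finance`, the filter naming scheme, and the region codes are hypothetical.

```python
def build_region_filter(catalog_id: str, database: str, table: str,
                        region_code: str, hidden_columns: list) -> dict:
    """Keyword arguments for lakeformation.create_data_cells_filter.

    Combines a row filter with a column exclusion list, which is what
    gives cell-level control on a single shared table.
    """
    return {
        "TableData": {
            "TableCatalogId": catalog_id,
            "DatabaseName": database,
            "TableName": table,
            "Name": f"{table}_region_{region_code.lower()}",
            "RowFilter": {"FilterExpression": f"region_code = '{region_code}'"},
            "ColumnWildcard": {"ExcludedColumnNames": list(hidden_columns)},
        }
    }


filters = [
    build_region_filter(
        catalog_id="444444444444",   # hypothetical Finance account
        database="finance",          # hypothetical database name
        table="gl_transactions",
        region_code=code,
        hidden_columns=["employee_salary"],
    )
    for code in ("NORTH", "SOUTH")   # hypothetical region codes
]

# With boto3: boto3.client("lakeformation").create_data_cells_filter(**filters[0])
for f in filters:
    print(f["TableData"]["Name"], "->", f["TableData"]["RowFilter"]["FilterExpression"])
```

Each regional controller's role is then granted SELECT on the matching filter rather than on the table itself, so the same physical table serves every region without copies.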

The numbers.

Producer onboarding for a new dataset dropped from ~3 weeks (ticket + central modeling) to under a day (self-serve: crawl, tag, share via the templated module).
The Marketing analyst’s Athena bill became visible and halved once products were Parquet+partitioned — typical ad-hoc queries went from scanning ~120 GB to ~14 GB thanks to partition pruning and column projection.
Cross-account sharing meant zero data duplication — the previous design had copied Sales data into the analytics account nightly (~4 TB/day of redundant transfer and storage); that disappeared.
A quarterly access audit that used to take two days of manual IAM spelunking became a single CloudTrail/Lake Formation query in the log-archive account.
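The scan reduction quoted above translates into per-query cost with simple arithmetic. The sketch below uses the 120 GB and 14 GB figures from the example; the price of 5 USD per TB scanned and the 1,024 GB-per-TB conversion are assumptions, so substitute the current Athena price for your Region.

```python
GB_PER_TB = 1024  # assumed conversion


def scan_cost_usd(gb_scanned: float, usd_per_tb: float = 5.0) -> float:
    """Cost of one query under a flat per-TB-scanned price (assumed)."""
    return gb_scanned / GB_PER_TB * usd_per_tb


before_gb, after_gb = 120, 14
before_cost = scan_cost_usd(before_gb)
after_cost = scan_cost_usd(after_gb)
scan_reduction = 1 - after_gb / before_gb

print(f"before: ${before_cost:.3f} per query")
print(f"after:  ${after_cost:.3f} per query")
print(f"bytes scanned reduced by {scan_reduction:.1%}")
```

The per-query saving is larger than the halving of the overall bill reported in the example, which is plausible when only some queries benefit from pruning or when usage grows once data is easier to reach.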

A decision they got right. Early on, an engineer proposed per-table grants “to keep it simple for the first three tables.” Governance overruled it and mandated LF-Tags from day one. When the company later onboarded its 5th and 6th domains and tripled its consumers, no new grants were needed for public/internal data — the new tables simply inherited the existing tag policies. That single choice is the difference between a mesh that scales and a mesh that becomes the new bottleneck.

The outcome. Six months in, the central platform team of four had repositioned from “pipeline operators” to “platform + governance,” each domain owned its products end-to-end, PII exposure went from “everyone in the analytics account” to “explicitly granted, audited, and mostly nobody,” and the 9-week backlog was gone because the backlog’s owner — the central team — was no longer in the critical path.

When to use it

Use a data mesh on AWS when you have multiple independent data-producing domains with their own teams, a central lake that has become a bottleneck, and a real need for fine-grained, auditable cross-team data sharing (especially with PII/regulatory pressure). It shines when domains genuinely understand their data better than any central team could, and when organizational ownership can actually be moved — mesh is as much an org change as a tech change.

Trade-offs. You’re trading the simplicity of one account and one catalog for operational complexity across many accounts: more IAM, more Terraform, a tag taxonomy to govern, and RAM/Lake Formation mechanics to learn. There’s real upfront platform investment (Control Tower, AFT, the self-serve modules) before the first domain benefits. And it demands organizational maturity — domains must actually staff data ownership; if they won’t, you get the worst of both worlds (decentralized chaos with no central safety net).

Anti-patterns to avoid.

Mesh for a single team / small dataset. If one team produces almost all the data, a single governed lake (Lake Formation in one account, LF-Tags, no cross-account sharing) is simpler and sufficient. Don’t pay the multi-account tax for organizational scale you don’t have.
Leaving IAMAllowedPrincipals enabled — it silently bypasses every Lake Formation control you just built.
Per-table grants at scale — recreates the bottleneck under a new name; commit to LF-Tags early.
A “mesh” with no governance — decentralized ownership without federated computational governance is just data chaos with extra accounts.
Treating the central account as a data owner — it should govern and audit, not hoard data, or you’ve rebuilt the centralized lake.

Alternatives. A single-account governed lake (Lake Formation + LF-Tags, no RAM) for organizations below the domain-complexity threshold. A lake house with Redshift data sharing when most consumers are warehouse users and producers are few. Amazon DataZone / SageMaker Unified Studio as the primary surface if you want the business-catalog and subscription experience to drive Lake Formation under the hood with less hand-rolled tooling. And for cross-cloud or open-format strategies, an Apache Iceberg + open catalog approach (S3 Tables / Glue Iceberg REST catalog) layered with the same Lake Formation governance — the mesh pattern holds; only the table format and catalog surface change.

Written by Vinod
