
AWS Enterprise Architecture: Data Mesh

A centralized data lake works right up until it doesn’t. One team owns the pipelines, every other team files a ticket, and the backlog grows faster than the data. A data mesh flips the ownership model: the teams that produce the data also own it as a product, publish it through a shared governance layer, and let other domains discover and consume it without a central bottleneck. On AWS the load-bearing primitives for this are AWS Lake Formation (fine-grained, cross-account permissions), the AWS Glue Data Catalog (the technical metadata backbone), and per-domain AWS accounts wired together under AWS Organizations. This article is a reusable reference for standing that up — from a three-domain startup to a fifty-domain enterprise.

The business scenario

Picture a mid-market retailer, growing fast, that has accumulated the classic “data gravity” problem. Sales, Marketing, Supply Chain, and Finance each generate operational data in their own systems. Eighteen months ago a small platform team built a central data lake in a single AWS account and offered to ingest everyone’s data. It worked beautifully for the first three pipelines. Now it’s a chokepoint.

This is the moment a data mesh pays for itself. The four principles — domain ownership, data as a product, self-serve platform, and federated computational governance — map almost one-to-one onto AWS building blocks. The goal isn’t “more technology”; it’s to decentralize the production of data while centralizing the governance of access. Crucially, this is not a big-bang rewrite: the same pattern that serves a 3-domain company scales to 50 domains by adding accounts, not by re-architecting.

What “good” looks like at the end:

Architecture overview

The end-to-end shape is a hub-and-spoke catalog with in-place, cross-account data sharing. Storage and compute live in the spokes (domains); governance and the source-of-truth catalog live in the hub.

AWS data mesh reference architecture: producer domain accounts register S3 data products in the Glue Data Catalog and apply LF-Tags, share their catalog to a central governance account via AWS RAM and Lake Formation, and consumer analytics accounts query the data in place with Athena and Redshift Spectrum through resource links under Lake Formation-vended, column- and row-filtered credentials.

Accounts (the spokes and the hub). Under a single AWS Organization, you create one OU per data concern. A Governance OU holds the central governance account. A Domains OU holds one producer account per domain (Sales, Marketing, Supply Chain, Finance, …). Optionally a Consumers OU holds analytics/BI accounts for teams that consume but don’t produce. AWS Organizations + Service Control Policies (SCPs) provide the guardrails; AWS RAM (Resource Access Manager) is the wire that Lake Formation uses to share catalog resources across these account boundaries.
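Below is a minimal sketch of the guardrail layer. The Python builds an SCP document covering three of the non-negotiables, along with the keyword arguments for boto3's `organizations.create_policy`. It does not call AWS.

- The exempt role pattern `PlatformAdmin` is a hypothetical name for whatever role your platform team uses to manage baselines.
- Region pinning is left out for brevity.

```python
import json

# Hypothetical name of the platform role allowed to change guarded settings.
PLATFORM_ROLE_PATTERN = "arn:aws:iam::*:role/PlatformAdmin"


def data_mesh_guardrail_scp(platform_role_pattern: str = PLATFORM_ROLE_PATTERN) -> dict:
    """SCP document that denies guarded actions to every principal except
    the platform role (matched on aws:PrincipalArn)."""
    exempt_platform = {"ArnNotLike": {"aws:PrincipalArn": platform_role_pattern}}
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Domain accounts may not remove themselves from the org's guardrails.
                "Sid": "DenyLeaveOrganization",
                "Effect": "Deny",
                "Action": "organizations:LeaveOrganization",
                "Resource": "*",
            },
            {
                # Only the platform role may change S3 Block Public Access settings.
                "Sid": "ProtectBlockPublicAccess",
                "Effect": "Deny",
                "Action": [
                    "s3:PutBucketPublicAccessBlock",
                    "s3:PutAccountPublicAccessBlock",
                ],
                "Resource": "*",
                "Condition": exempt_platform,
            },
            {
                # Only the platform role may change Lake Formation admins and defaults.
                "Sid": "ProtectLakeFormationSettings",
                "Effect": "Deny",
                "Action": "lakeformation:PutDataLakeSettings",
                "Resource": "*",
                "Condition": exempt_platform,
            },
        ],
    }


def create_policy_request(name: str) -> dict:
    """Keyword arguments for boto3's organizations.create_policy()."""
    return {
        "Name": name,
        "Description": "Data mesh guardrails for the Domains OU",
        "Type": "SERVICE_CONTROL_POLICY",
        "Content": json.dumps(data_mesh_guardrail_scp()),
    }
```

Test the resulting policy on a sandbox OU before attaching it to the Domains OU (for example with `organizations.attach_policy`). A deny that accidentally catches the platform role would block your own baseline pipeline.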

Producer (domain) path — how data becomes a product. Inside a domain account, source data lands in a domain-owned S3 bucket (e.g. s3://acme-sales-dataproducts/). An AWS Glue crawler or an explicit Glue ETL/CREATE TABLE job registers the schema into that account’s Glue Data Catalog, and the underlying S3 location is registered as a Lake Formation data lake location so that Lake Formation — not raw S3 IAM — mediates access. The domain team curates the table: partitions, a schema contract, and LF-Tags (Lake Formation tag-based access control attributes) such as domain=sales, sensitivity=pii, layer=curated. At this point the table is a candidate data product.
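The registration and tagging steps can be sketched as request builders for boto3's `lakeformation.register_resource` and `lakeformation.add_lf_tags_to_resource`. This is an illustration, not a module from this article.

- The registration role is passed in. It needs access to the bucket and permission to use the domain's KMS key.
- `tag_catalog_id` is assumed to be the account that owns the LF-Tag definitions. In this architecture that is the central governance account.

```python
from typing import Optional


def register_location_request(bucket: str, role_arn: str, prefix: str = "") -> dict:
    """Kwargs for lakeformation.register_resource(): hand an S3 location to
    Lake Formation so it, rather than raw S3 IAM, mediates access."""
    arn = f"arn:aws:s3:::{bucket}"
    if prefix.strip("/"):
        arn += "/" + prefix.strip("/")
    return {
        "ResourceArn": arn,
        "UseServiceLinkedRole": False,
        "RoleArn": role_arn,  # role Lake Formation uses to access this location
    }


def tag_table_request(catalog_id: str, database: str, table: str,
                      tags: dict, tag_catalog_id: Optional[str] = None) -> dict:
    """Kwargs for lakeformation.add_lf_tags_to_resource(): attach one value per
    LF-Tag key (e.g. domain=sales) to a table in `catalog_id`. `tag_catalog_id`
    is the account that owns the tag definitions (defaults to `catalog_id`)."""
    owner = tag_catalog_id or catalog_id
    return {
        "CatalogId": catalog_id,
        "Resource": {
            "Table": {"CatalogId": catalog_id, "DatabaseName": database, "Name": table}
        },
        "LFTags": [
            {"CatalogId": owner, "TagKey": key, "TagValues": [value]}
            for key, value in sorted(tags.items())
        ],
    }
```

Each dict is passed as keyword arguments, e.g. `boto3.client("lakeformation").register_resource(**req)`.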

Governance path — the hub takes the source of truth. The recommended pattern (AWS calls it the centralized catalog / central governance account model) is that the producer shares its database/tables to the central governance account via Lake Formation cross-account grants. The central account becomes the authoritative catalog: it owns the LF-Tag taxonomy, holds the resource links, and is where all consumer-facing grants are issued. This gives you one place to define “who can see PII,” one place to audit, and one tag ontology for the whole company — federated governance, decentralized ownership.
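Here is a sketch of the producer-side share. It assumes the central account needs grantable permissions so that it can issue consumer grants itself.

The sketch builds two `lakeformation.grant_permissions` requests, run in the producer account:

- DESCRIBE on the database, with grant option.
- SELECT/DESCRIBE on all of the database's tables, with grant option.

Account IDs and database names are placeholders.

```python
def share_to_central_requests(central_account_id: str, producer_catalog_id: str,
                              database: str) -> list:
    """Kwargs for two lakeformation.grant_permissions() calls made in the
    producer account. Both grants are grantable, so the central account can
    re-grant to consumers."""
    principal = {"DataLakePrincipalIdentifier": central_account_id}
    return [
        {
            "Principal": principal,
            "Resource": {"Database": {"CatalogId": producer_catalog_id, "Name": database}},
            "Permissions": ["DESCRIBE"],
            "PermissionsWithGrantOption": ["DESCRIBE"],
        },
        {
            "Principal": principal,
            "Resource": {
                "Table": {
                    "CatalogId": producer_catalog_id,
                    "DatabaseName": database,
                    "TableWildcard": {},  # all tables in the database
                }
            },
            "Permissions": ["SELECT", "DESCRIBE"],
            "PermissionsWithGrantOption": ["SELECT", "DESCRIBE"],
        },
    ]
```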

Consumer path — discover, request, query in place. A consumer (say, a Marketing analyst in the analytics account) browses available products in a data catalog/portal — Amazon DataZone (now folded into the next-generation SageMaker Catalog / Amazon SageMaker Unified Studio) is the AWS-native option, or a lightweight internal portal backed by Glue Catalog APIs. They submit an access request. Governance approves it by issuing a Lake Formation grant — ideally an LF-Tag policy (“grant SELECT on all tables where domain=sales AND sensitivity=public to the Marketing analytics role”) rather than a per-table grant. AWS RAM propagates the share; the consumer account creates a resource link to the shared database, and the analyst runs Athena (or Redshift Spectrum, or EMR/Spark) directly against the producer’s S3 data. Lake Formation enforces column-level, row-level, and cell-level filters at query time, and the data is never copied — the analyst’s compute reads the producer’s bucket through the Lake Formation credential vending path.
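The approval step and the consumer's resource link can be sketched the same way:

- The first function builds a `lakeformation.grant_permissions` request for an LF-Tag policy. The expression entries are combined with AND, and multiple values for one key are alternatives.
- The second builds the `glue.create_database` request that creates a resource link in the consumer account.

The principal string is whatever your cross-account setup grants to, such as the consumer account ID or the analytics role ARN from the example above. All names are illustrative.

```python
def tag_policy_grant_request(principal: str, tag_catalog_id: str, expression: dict) -> dict:
    """Kwargs for lakeformation.grant_permissions(): SELECT on every table whose
    LF-Tags match `expression` ({tag_key: [allowed values]})."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal},
        "Resource": {
            "LFTagPolicy": {
                "CatalogId": tag_catalog_id,  # account that owns the tag definitions
                "ResourceType": "TABLE",
                "Expression": [
                    {"TagKey": key, "TagValues": list(values)}
                    for key, values in sorted(expression.items())
                ],
            }
        },
        "Permissions": ["SELECT", "DESCRIBE"],
    }


def resource_link_request(link_name: str, owner_catalog_id: str, shared_database: str) -> dict:
    """Kwargs for glue.create_database() in the consumer account: a resource
    link pointing at a database in another account's Data Catalog."""
    return {
        "DatabaseInput": {
            "Name": link_name,
            "TargetDatabase": {"CatalogId": owner_catalog_id, "DatabaseName": shared_database},
        }
    }
```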

So the request/data path for a single query is: analyst → Athena in consumer account → resource link → Lake Formation (central grant + LF-Tag policy + data-filter) → vended temporary credentials → producer’s registered S3 location → filtered result back to Athena. Metadata flows hub-and-spoke; bytes flow point-to-point from producer storage to consumer compute, governed end to end.
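On the consumer side, the query itself is ordinary Athena. Here is a minimal sketch, assuming a boto3 Athena client and an existing workgroup: start the query with the resource link as the database, then poll `get_query_execution` until it reaches a terminal state.

`StubAthena` is a hypothetical offline stand-in so the loop can be exercised without AWS. Nothing in the client call mentions Lake Formation, because enforcement happens when Athena resolves the table through the resource link.

```python
import time

TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def run_query(athena, sql: str, resource_link_db: str, workgroup: str,
              poll_seconds: float = 1.0) -> str:
    """Start an Athena query in the consumer account and poll until it ends.
    `athena` is a boto3 Athena client (or any object with the same two
    methods). Returns the final state, e.g. "SUCCEEDED" or "FAILED"."""
    started = athena.start_query_execution(
        QueryString=sql,
        # The resource link is addressed like any local database in the
        # consumer account's default Glue catalog.
        QueryExecutionContext={"Catalog": "AwsDataCatalog", "Database": resource_link_db},
        WorkGroup=workgroup,
    )
    query_id = started["QueryExecutionId"]
    while True:
        execution = athena.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in TERMINAL_STATES:
            return state
        time.sleep(poll_seconds)


class StubAthena:
    """Hypothetical offline stand-in that replays a fixed list of states."""

    def __init__(self, states):
        self.states = list(states)
        self.start_kwargs = None

    def start_query_execution(self, **kwargs):
        self.start_kwargs = kwargs
        return {"QueryExecutionId": "stub-query-1"}

    def get_query_execution(self, QueryExecutionId):
        return {"QueryExecution": {"Status": {"State": self.states.pop(0)}}}
```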

Component breakdown

| Component | What it does | Why it’s here | Key configuration choices |
|---|---|---|---|
| AWS Organizations + OUs | Account hierarchy and policy inheritance | Hard account boundaries give per-domain isolation, blast-radius control, and clean cost attribution | OUs: Governance, Domains, Consumers, Security/Log-archive. Enable trusted access for Lake Formation, RAM, and the CloudTrail organization trail |
| Service Control Policies (SCPs) | Org-wide guardrails | Prevent domains from disabling encryption, leaving the org, or creating public S3 buckets; this is governance you can’t opt out of | Deny changes to S3 Block Public Access (s3:PutBucketPublicAccessBlock), deny weakening default encryption, restrict Regions, protect Lake Formation settings |
| Domain S3 buckets | Physical storage of each domain’s data products | Producers own their bytes; storage lives where the domain lives | Bucket-per-domain (or per-layer: raw/curated/product). SSE-KMS with a domain CMK; register the bucket/prefix as a Lake Formation location |
| AWS Glue Data Catalog | Technical metadata (databases, tables, schemas, partitions) | The lingua franca every engine (Athena, Redshift, EMR, Spark) reads, and the thing that gets shared across accounts | One catalog per account; per-database settings. Use Glue crawlers, Glue 4.0+ ETL jobs, or explicit DDL; enable partition indexes for large tables |
| AWS Lake Formation | Fine-grained permissions + cross-account sharing | Replaces unmanageable per-table IAM with column/row/cell-level grants and tag policies; vends scoped credentials | Switch databases to Lake Formation permissions (remove IAMAllowedPrincipals). Define LF-Tags. Use cross-account version 3 or later. Set up data filters for row/column security |
| LF-Tags (TBAC) | Attribute taxonomy on catalog resources | Grant on attributes (sensitivity, domain, layer) instead of on hundreds of individual tables; this is what makes the mesh scale | Centralize tag definitions in the governance account; delegate tag assignment to domains for their own resources |
| AWS RAM | Shares catalog resources across accounts | The transport Lake Formation uses for cross-account grants | Enable sharing with AWS Organizations so in-org shares need no manual acceptance; accept invitations explicitly for shares from outside the org |
| Resource links | Account-local pointer to a shared database/table | Lets consumers query a remote catalog object as if it were local | Create in the consumer account after a share is accepted; point Athena/Redshift at the link |
| Athena / Redshift Spectrum / EMR | Query and compute engines in consumer accounts | In-place query: compute is decentralized and billed to the consumer | Athena workgroups per team with a result location and cost controls; Redshift via Spectrum or data sharing; EMR/Spark with Lake Formation integration enabled |
| Amazon DataZone / SageMaker Catalog | Business catalog, data portal, subscription workflow | Human-facing discovery, glossaries, and a request/approve flow on top of the technical catalog | Map DataZone domains/projects to AWS accounts; let subscriptions drive the underlying Lake Formation grants |
| CloudTrail + Lake Formation access logs + CloudWatch | Audit and observability | Prove who accessed what, across every account, from one place | Org-level CloudTrail to a central log-archive account; Lake Formation audit events; per-domain cost and query dashboards |

A few of these deserve emphasis. LF-Tags are the single most important scaling decision. Without them, every new consumer means a fresh round of per-table grants and your governance team becomes the new bottleneck — you’ve just moved the ticket queue. With them, you grant once against an attribute (SELECT where layer=curated AND sensitivity=public) and every current and future table that carries those tags is automatically in scope. Data filters (row-level expressions and column projections, including cell-level via combining both) are how a single shared table serves both a Finance user who may see salaries and a regional manager who may see only their region’s rows.
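To make the scaling argument concrete, here is a toy, in-memory model of how a tag policy selects tables. A table is in scope when, for every key in the expression, its tag value is one of the allowed values.

This is not Lake Formation's evaluation engine. For example, it ignores tags that tables inherit from their database. Table and tag names are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    """A catalog table and its LF-Tags (one value per key in this toy model)."""
    name: str
    tags: dict = field(default_factory=dict)


def matches(expression: dict, tags: dict) -> bool:
    """AND across keys, OR across the allowed values for each key.
    A table that lacks a key named in the expression never matches."""
    return all(tags.get(key) in allowed for key, allowed in expression.items())


def tables_in_scope(expression: dict, tables: list) -> list:
    """Names of the tables that a single tag-policy grant would cover."""
    return [table.name for table in tables if matches(expression, table.tags)]
```

Appending a new curated, public table to the catalog list brings it into scope of `{"layer": {"curated"}, "sensitivity": {"public"}}` with no change to the policy. That is the property the paragraph above relies on.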

Implementation guidance

Bootstrapping the org and accounts. Use AWS Control Tower to lay down the landing zone (OUs, guardrails, centralized logging, an account factory). Provision domain accounts through Account Factory for Terraform (AFT) so every new domain comes pre-baked with the same baseline. The data-mesh-specific wiring is best expressed as Terraform and applied per account from a CI/CD pipeline (CodePipeline or GitHub Actions assuming an OIDC role per account):
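The Terraform baseline itself isn't shown here. As a sketch of the pipeline side only, the Python below does two things:

- Lists the accounts directly under the Domains OU, using boto3's `list_accounts_for_parent` paginator.
- Turns each ACTIVE account into a deploy target, paired with the ARN of a per-account role to assume.

The role name is a hypothetical parameter. A real pipeline would hand these targets to its OIDC/assume-role step.

```python
def list_domain_accounts(org, domains_ou_id: str) -> list:
    """Fetch account records directly under the Domains OU.
    `org` is a boto3 Organizations client."""
    accounts = []
    paginator = org.get_paginator("list_accounts_for_parent")
    for page in paginator.paginate(ParentId=domains_ou_id):
        accounts.extend(page["Accounts"])
    return accounts


def deploy_targets(accounts: list, role_name: str) -> list:
    """One deploy target per ACTIVE account, sorted by account name.
    `role_name` is the (hypothetical) per-account deploy role."""
    targets = [
        {
            "account_id": account["Id"],
            "name": account["Name"],
            "role_arn": f"arn:aws:iam::{account['Id']}:role/{role_name}",
        }
        for account in accounts
        if account["Status"] == "ACTIVE"  # skip suspended or closing accounts
    ]
    return sorted(targets, key=lambda target: target["name"])
```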

Networking. The catalog/RAM control plane is account-to-account over AWS’s backbone — no VPC needed for the sharing. For the data plane, keep S3 and analytics traffic private: use S3 Gateway VPC Endpoints in each consumer VPC, Interface (PrivateLink) Endpoints for Glue, Lake Formation, Athena, and KMS, and avoid routing analytics traffic over the public internet. If consumers use Redshift or EMR in private subnets, this keeps the entire query path inside your network perimeter. Cross-account KMS access must be granted in the key policy of each domain’s CMK so consumer roles can decrypt the data they’re authorized to read.
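Here is a sketch of the endpoint set described above, written as `ec2.create_vpc_endpoint` request builders:

- One S3 gateway endpoint attached to the VPC's route tables.
- One interface endpoint each for Glue, Lake Formation, Athena, and KMS.

Service names follow the `com.amazonaws.<region>.<service>` pattern; confirm each is offered in your Region. Treat the VPC, subnet, and security-group IDs as placeholders.

```python
INTERFACE_SERVICES = ("glue", "lakeformation", "athena", "kms")


def vpc_endpoint_requests(region: str, vpc_id: str, route_table_ids: list,
                          subnet_ids: list, security_group_ids: list) -> list:
    """Kwargs for ec2.create_vpc_endpoint(), one dict per endpoint."""
    requests = [{
        # Gateway endpoint: S3 traffic from these route tables skips the IGW/NAT path.
        "VpcEndpointType": "Gateway",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.s3",
        "RouteTableIds": list(route_table_ids),
    }]
    for service in INTERFACE_SERVICES:
        requests.append({
            # Interface (PrivateLink) endpoint: network interfaces in the given subnets.
            "VpcEndpointType": "Interface",
            "VpcId": vpc_id,
            "ServiceName": f"com.amazonaws.{region}.{service}",
            "SubnetIds": list(subnet_ids),
            "SecurityGroupIds": list(security_group_ids),
            "PrivateDnsEnabled": True,  # default SDK hostnames resolve to the endpoint
        })
    return requests
```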

Identity wiring. Standardize on IAM Identity Center (SSO) for human access, mapping enterprise groups (e.g. marketing-analysts) to permission sets that assume the right roles in consumer accounts. The role that actually queries data is registered as a Lake Formation principal and is what grants target. For machine/pipeline access inside domains, use account-scoped IAM roles assumed via OIDC from CI. The chain to internalize: Identity Center group → permission set → consumer-account role → Lake Formation grant (via LF-Tag policy) → data filter → vended S3+KMS credentials. Lake Formation does the last-mile authorization; IAM only gets the principal to the door.

Producer onboarding flow (the self-serve part). A domain team should be able to publish a product without a governance ticket: (1) drop curated data in their registered bucket, (2) run the crawler / apply the table contract, (3) assign LF-Tags they’re delegated to manage, (4) share to the central governance account (a templated Terraform module), and (5) register the product in DataZone. Governance only intervenes to approve cross-domain consumption, and even that can be policy-driven for low-sensitivity tiers.
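Step (3) is where self-serve most easily drifts from the central taxonomy, so a publish pipeline might run a cheap check before the share in step (4). This is a hypothetical sketch:

- `TAXONOMY` and `DELEGATED_KEYS` stand in for whatever the governance account actually defines.
- `certified` is an invented governance-only key, used to show a tag that domains are not delegated to assign.

```python
# Hypothetical central taxonomy: allowed values per LF-Tag key.
TAXONOMY = {
    "domain": {"sales", "marketing", "supply_chain", "finance"},
    "sensitivity": {"public", "internal", "pii"},
    "layer": {"raw", "curated", "product"},
    "certified": {"yes"},  # hypothetical governance-only key
}

# Hypothetical delegation: keys a domain team may assign on its own tables.
DELEGATED_KEYS = {"domain", "sensitivity", "layer"}


def validate_product_tags(domain: str, tags: dict) -> list:
    """Return the problems that should block publishing; [] means OK."""
    problems = []
    for key, value in sorted(tags.items()):
        if key not in TAXONOMY:
            problems.append(f"unknown tag key {key!r}")
        elif key not in DELEGATED_KEYS:
            problems.append(f"domain {domain!r} may not assign {key!r}")
        elif value not in TAXONOMY[key]:
            problems.append(f"{key}={value!r} is not an allowed value")
    if tags.get("domain") != domain:
        problems.append(f"domain tag must equal the publishing domain {domain!r}")
    if "sensitivity" not in tags:
        problems.append("every product needs a sensitivity tag")
    return problems
```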

Enterprise considerations

Security & Zero Trust. The mesh is a Zero-Trust data architecture by construction: no principal has standing access to a bucket; every read is an explicit, attribute-based grant evaluated at query time, with temporary vended credentials rather than long-lived bucket policies. Enforce least privilege with LF-Tag policies scoped to the minimum sensitivity tier, column-level security to hide PII columns from analysts who don’t need them, and row-level filters for tenant/region isolation. SCPs make the non-negotiables (encryption, public-access-block, region pinning) un-bypassable. Always remove IAMAllowedPrincipals — leaving it on silently bypasses Lake Formation and is the single most common misconfiguration. Encrypt with per-domain KMS CMKs so a domain can cryptographically revoke access, and audit decrypt usage via CloudTrail.
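Column- and row-level security can be sketched as two requests:

- `lakeformation.create_data_cells_filter`, for a filter that keeps only rows matching a predicate and hides named PII columns.
- `lakeformation.grant_permissions` on that filter, so the principal can read the table only through it.

All names are placeholders, and the row predicate is a simple SQL-style comparison.

```python
def pii_safe_filter_request(catalog_id: str, database: str, table: str, name: str,
                            row_filter: str, pii_columns: list) -> dict:
    """Kwargs for lakeformation.create_data_cells_filter(): rows limited by a
    predicate, and all columns except the listed PII ones."""
    return {
        "TableData": {
            "TableCatalogId": catalog_id,
            "DatabaseName": database,
            "TableName": table,
            "Name": name,
            "RowFilter": {"FilterExpression": row_filter},
            "ColumnWildcard": {"ExcludedColumnNames": list(pii_columns)},
        }
    }


def grant_on_filter_request(principal: str, catalog_id: str, database: str,
                            table: str, name: str) -> dict:
    """Kwargs for lakeformation.grant_permissions(): SELECT through the filter only."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal},
        "Resource": {
            "DataCellsFilter": {
                "TableCatalogId": catalog_id,
                "DatabaseName": database,
                "TableName": table,
                "Name": name,
            }
        },
        "Permissions": ["SELECT"],
    }
```

For a regional manager, for example, a filter might use `row_filter="region = 'south'"` and `pii_columns=["salary"]`. A Finance principal would get a different filter (or a plain table grant) on the same shared table.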

Cost optimization. Decentralization is itself a cost lever: each domain’s storage and each consumer’s compute hit their own bill, so showback/chargeback is automatic and teams feel their Athena scans. Concretely: store products in Parquet, partitioned and compacted, so Athena/Spectrum scan less; use Glue partition indexes to cut partition-filtering cost; set Athena per-query and per-workgroup data-scanned limits; lifecycle raw data to S3 Intelligent-Tiering / Glacier; and prefer in-place query over copy to avoid duplicating petabytes across accounts. Lake Formation and RAM themselves carry no per-request fee — you pay for Glue, S3, KMS, and the query engines.
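Two of the cost controls above can be sketched as request builders:

- `athena.create_work_group`, with an enforced per-query scan cutoff and CloudWatch metrics for showback.
- `glue.create_partition_index`, for a large partitioned table.

The 10 MB floor is the minimum Athena accepts for `BytesScannedCutoffPerQuery`. Workgroup, bucket, and index names are placeholders, and the index keys must be partition keys of the table.

```python
MIN_CUTOFF_BYTES = 10_000_000  # smallest BytesScannedCutoffPerQuery Athena accepts


def workgroup_request(name: str, results_s3_uri: str, max_gib_per_query: float) -> dict:
    """Kwargs for athena.create_work_group(): a per-team workgroup whose
    settings users cannot override, with a hard per-query scan limit."""
    cutoff = int(max_gib_per_query * 1024**3)
    if cutoff < MIN_CUTOFF_BYTES:
        raise ValueError("per-query cutoff must be at least 10 MB")
    return {
        "Name": name,
        "Configuration": {
            "ResultConfiguration": {"OutputLocation": results_s3_uri},
            "EnforceWorkGroupConfiguration": True,
            "PublishCloudWatchMetricsEnabled": True,  # per-workgroup metrics for showback
            "BytesScannedCutoffPerQuery": cutoff,     # Athena cancels queries that exceed this
        },
    }


def partition_index_request(database: str, table: str, keys: list, index_name: str) -> dict:
    """Kwargs for glue.create_partition_index(): speeds up partition
    filtering on these partition keys for large tables."""
    return {
        "DatabaseName": database,
        "TableName": table,
        "PartitionIndex": {"Keys": list(keys), "IndexName": index_name},
    }
```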

Scalability. This architecture scales by adding accounts and tags, not by re-architecting. Going from 5 to 50 domains is 45 more applications of the same Terraform baseline. The LF-Tag taxonomy means grant complexity grows with the number of attributes (a handful) rather than the number of tables (thousands). Watch the real limits: Glue/Lake Formation have account- and region-level quotas on databases, tables, partitions, and concurrent grants — design partitioning to stay well under partition limits, and federate very large domains across multiple accounts if needed.

Reliability & DR (RTO/RPO). S3 is designed for 11 nines of durability within a Region. For regional resilience, enable S3 Cross-Region Replication on product buckets and replicate the Glue Data Catalog (export/replicate databases and tables, or rebuild them via crawlers from replicated data). The Lake Formation grants are Terraform, so re-apply them in the DR region. A practical posture: RPO ≈ 15 min for product data via CRR with S3 Replication Time Control (designed to replicate 99.99% of new objects within 15 minutes, backed by an SLA), and RTO ≈ 1–2 hours to re-point the catalog and re-issue grants in the secondary region, since compute (Athena/EMR) is stateless and stands up quickly. Keeping the catalog and grant definitions in version-controlled IaC is what makes a fast RTO realistic: your “DR plan” is largely terraform apply in another region.
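The product-data half of that posture can be sketched as an `s3.put_bucket_replication` request. The sketch assumes:

- Versioning is already enabled on both buckets.
- A replication role exists.
- The DR bucket uses its own KMS key.

Replication Time Control and its metrics are switched on because plain CRR has no time guarantee. Bucket, role, and key identifiers are placeholders.

```python
def product_crr_request(source_bucket: str, dest_bucket_arn: str,
                        replication_role_arn: str, dest_kms_key_arn: str) -> dict:
    """Kwargs for s3.put_bucket_replication() on a data-product bucket."""
    return {
        "Bucket": source_bucket,
        "ReplicationConfiguration": {
            "Role": replication_role_arn,
            "Rules": [{
                "ID": "dataproducts-to-dr",
                "Priority": 1,
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # the whole bucket
                "DeleteMarkerReplication": {"Status": "Enabled"},
                # Objects encrypted with SSE-KMS are only replicated when opted in.
                "SourceSelectionCriteria": {"SseKmsEncryptedObjects": {"Status": "Enabled"}},
                "Destination": {
                    "Bucket": dest_bucket_arn,
                    "EncryptionConfiguration": {"ReplicaKmsKeyID": dest_kms_key_arn},
                    # S3 Replication Time Control (15-minute target); requires Metrics.
                    "ReplicationTime": {"Status": "Enabled", "Time": {"Minutes": 15}},
                    "Metrics": {"Status": "Enabled", "EventThreshold": {"Minutes": 15}},
                },
            }],
        },
    }
```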

Observability. Centralize an organization CloudTrail in a log-archive account so every cross-account share and data access is captured in one place. Use Lake Formation’s access logging to answer “who queried this PII table last quarter.” Per-domain CloudWatch dashboards track pipeline health and freshness against the product’s SLA; per-consumer Athena dashboards track bytes scanned and cost. Surface data product health (freshness, completeness, last-updated) in the DataZone portal so consumers trust what they subscribe to.
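The organization trail can be sketched as a `cloudtrail.create_trail` request. The sketch assumes the trail is created from the management account (or a CloudTrail delegated administrator), and that the log-archive bucket and KMS key policies already allow CloudTrail to write. Names are placeholders.

```python
def org_trail_request(trail_name: str, log_archive_bucket: str, kms_key_arn: str) -> dict:
    """Kwargs for cloudtrail.create_trail(): one trail for every account in the org."""
    return {
        "Name": trail_name,
        "S3BucketName": log_archive_bucket,  # bucket in the central log-archive account
        "IsOrganizationTrail": True,         # covers all member accounts
        "IsMultiRegionTrail": True,
        "IncludeGlobalServiceEvents": True,
        "EnableLogFileValidation": True,     # digest files make tampering detectable
        "KmsKeyId": kms_key_arn,
    }
```

A trail created through the API does not record events until `cloudtrail.start_logging(Name=...)` is called.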

Governance. This is federated computational governance: standards (tag taxonomy, encryption, naming, PII classification) are defined centrally and enforced as code (SCPs, Lake Formation policies, IaC modules), while domains retain autonomy over modeling and publishing. The governance account is the policy authority and the audit point, not a data owner. Define a lightweight data contract per product (schema, SLA, owner, sensitivity) and make breaking schema changes a versioned, reviewed event rather than a silent crawler update.
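A data contract does not need heavy tooling to be enforceable. The sketch below is a hypothetical minimal version:

- A `DataContract` record with the four fields named above, plus a version number.
- A check that treats removed columns and changed types as breaking (added columns are not).
- A rule that rejects a breaking change arriving without a version bump.

```python
from dataclasses import dataclass


@dataclass
class DataContract:
    """Hypothetical minimal contract for one data product."""
    product: str
    owner: str
    sensitivity: str
    freshness_sla_hours: int
    version: int
    schema: dict  # column name -> type, e.g. {"order_id": "bigint"}


def breaking_changes(old: DataContract, new: DataContract) -> list:
    """Removed columns and changed types break consumers; added columns don't."""
    changes = []
    for column, old_type in sorted(old.schema.items()):
        if column not in new.schema:
            changes.append(f"removed column {column}")
        elif new.schema[column] != old_type:
            changes.append(f"{column}: {old_type} -> {new.schema[column]}")
    return changes


def validate_change(old: DataContract, new: DataContract) -> list:
    """Reject a breaking change that does not bump the contract version."""
    broken = breaking_changes(old, new)
    if broken and new.version <= old.version:
        return ["breaking change without a version bump: " + "; ".join(broken)]
    return []
```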

Reference enterprise example

NimbusCart, a mid-market online retailer (~1,200 employees, ₹1,800 crore revenue), had hit exactly the centralized-lake bottleneck described above: one platform team, a 9-week backlog, and full-bucket access in a shared analytics account. They migrated to a data mesh over two quarters.

What they built. Under a Control Tower landing zone, they created a Governance OU (one central governance account) and a Domains OU with four producer accounts: Sales, Marketing, Supply Chain, and Finance. A separate Consumers OU held an Analytics account (BI + ad-hoc) and a Data Science account.

The numbers.

A decision they got right. Early on, an engineer proposed per-table grants “to keep it simple for the first three tables.” Governance overruled it and mandated LF-Tags from day one. When the company later onboarded its 5th and 6th domains and tripled its consumers, no new grants were needed for public/internal data — the new tables simply inherited the existing tag policies. That single choice is the difference between a mesh that scales and a mesh that becomes the new bottleneck.

The outcome. Six months in, the central platform team of four had repositioned from “pipeline operators” to “platform + governance,” each domain owned its products end-to-end, PII exposure went from “everyone in the analytics account” to “explicitly granted, audited, and mostly nobody,” and the 9-week backlog was gone because the backlog’s owner — the central team — was no longer in the critical path.

When to use it

Use a data mesh on AWS when you have multiple independent data-producing domains with their own teams, a central lake that has become a bottleneck, and a real need for fine-grained, auditable cross-team data sharing (especially with PII/regulatory pressure). It shines when domains genuinely understand their data better than any central team could, and when organizational ownership can actually be moved — mesh is as much an org change as a tech change.

Trade-offs. You’re trading the simplicity of one account and one catalog for operational complexity across many accounts: more IAM, more Terraform, a tag taxonomy to govern, and RAM/Lake Formation mechanics to learn. There’s real upfront platform investment (Control Tower, AFT, the self-serve modules) before the first domain benefits. And it demands organizational maturity — domains must actually staff data ownership; if they won’t, you get the worst of both worlds (decentralized chaos with no central safety net).

Anti-patterns to avoid.

Alternatives. A single-account governed lake (Lake Formation + LF-Tags, no RAM) for organizations below the domain-complexity threshold. A lake house with Redshift data sharing when most consumers are warehouse users and producers are few. Amazon DataZone / SageMaker Unified Studio as the primary surface if you want the business-catalog and subscription experience to drive Lake Formation under the hood with less hand-rolled tooling. And for cross-cloud or open-format strategies, an Apache Iceberg + open catalog approach (S3 Tables / Glue Iceberg REST catalog) layered with the same Lake Formation governance — the mesh pattern holds; only the table format and catalog surface change.


Vinod is a Senior Cloud Architect (22+ yrs) — available for Azure / AWS / GCP architecture, landing zones, and migrations.
