GCP Enterprise Architecture: Data Mesh

A central data team is the most over-subscribed resource in almost every enterprise. The teams that create the data — Orders, Marketing, Logistics, Risk — throw it over a wall into a shared warehouse, and one small platform group becomes the single thread through which every model, every join, and every “can you add this column” request must pass. The backlog grows, the platform team learns each domain’s semantics badly and slowly, and producers stop caring about quality because quality became somebody else’s job. A data mesh is the organizational and architectural answer: the domain that produces the data also owns it as a product, publishes it through a shared self-serve platform, and lets other domains discover and consume it under one governance regime — without a central choke point.

On Google Cloud the mesh has an unusually clean mapping, because Google ships a service whose entire purpose is to be the domain abstraction layer: Dataplex. A Dataplex lake is a domain; zones inside it separate raw from curated; the assets it manages are BigQuery datasets and Cloud Storage buckets that physically live in the domain’s own project. Dataplex Universal Catalog (the unification of Data Catalog and Dataplex) is the discovery, metadata, and policy-tag plane for the whole organization. Analytics Hub is the governed exchange where a domain publishes a data product and other domains subscribe to it — a read-only, zero-copy link rather than an export. This article is a reusable reference for standing that up, from a three-domain company on a few terabytes to a fifty-domain regulated enterprise on petabytes. What changes with scale is the number of lakes and the strictness of the policy tags, not the shape.

The business scenario

Picture a mid-market online retailer that has grown into something larger and is now drowning in its own success. Four years ago a two-person platform team built a single BigQuery project, acme-analytics-prod, and offered to ingest everyone’s data. For the first handful of pipelines it was wonderful. Today it is the bottleneck the whole company complains about:

The platform team carries an 11-week backlog of “please onboard this dataset” and “please add this field” requests. They do not understand Logistics’ carrier-SLA semantics or Risk’s chargeback model well enough to model the tables correctly, so they get them subtly wrong and the rework loop never closes.
Producers have no incentive to keep schemas clean or document anything, because once the data lands in acme-analytics-prod its quality is “the analytics team’s problem.”
Access is all-or-nothing. Past ~50 tables, per-dataset IAM became unmanageable, so almost everyone who needs anything has bigquery.dataViewer on the whole project. Finance tables with customer PII and unhashed card BINs are one query away from any analyst in the building.
The single project is a cost and blast-radius problem: one runaway SELECT * on a 40 TB table, or one mis-scheduled hourly transform, hits everybody’s slots, and nobody can attribute the spend to a team because it is all one billing line.
Nobody can find anything. There is no catalog; tribal knowledge lives in a Confluence page that was last accurate in 2023.

This is the moment a data mesh pays for itself. The four mesh principles — domain ownership, data as a product, self-serve platform, and federated computational governance — map almost one-to-one onto Google Cloud primitives. The objective is not “more technology.” It is to decentralize the production of data while centralizing the governance of access and the discoverability of products. Critically, this is not a big-bang rewrite. The same pattern that serves three domains scales to fifty by adding projects and Dataplex lakes, not by re-architecting.

What “good” looks like at the end:

Each business domain owns its own GCP project, its own BigQuery datasets and Cloud Storage buckets, and its own pipelines. The domain registers that storage into a Dataplex lake it administers, and publishes curated, documented data products (well-described tables/views with an owner, an SLA, a freshness guarantee, and a data-quality scorecard).
A central governance project holds the organization-wide policy-tag taxonomy, Universal Catalog tag templates, and the org policies — and audits access. It does not own anyone’s data.
Consumers search one catalog across all domains, find a product, and consume it through an Analytics Hub subscription (zero-copy, read-only) or a governed BigQuery grant. Column-level security via policy tags and row-level security via row-access policies are enforced at query time. No copying, no full-project access, no 11-week tickets.

Architecture overview

The end-to-end shape is a mesh of self-governing domains over a shared catalog and exchange. Storage, compute, and pipelines live in the domain projects (the nodes); the catalog, the policy taxonomy, and the publish/subscribe exchange are the shared fabric that connects them.

Figure: GCP data mesh on Dataplex, BigQuery and Analytics Hub. Decentralized domain projects (Dataplex lakes with raw GCS and curated BigQuery zones) sit over a shared governance project, the Universal Catalog and a zero-copy Analytics Hub exchange, with the cross-domain consumer query path numbered 1 to 7.

Resource hierarchy — the nodes and the fabric. Under the GCP Organization, you create a folder layout that mirrors the mesh: a domains/ folder holds one project per domain (acme-orders, acme-marketing, acme-logistics, acme-risk, …); a platform/ folder holds the shared governance/catalog project and the self-serve tooling project (Composer, Terraform CI, the data-product template). Organization policies and IAM provide guardrails you cannot opt out of; project boundaries give you per-domain isolation, blast-radius control, and clean cost attribution by default — every domain is its own billing slice.

Producer (domain) path — how raw data becomes a product. Inside a domain project, source data lands in a domain-owned Cloud Storage bucket or streams into BigQuery (via Dataflow, Datastream CDC, Pub/Sub + BigQuery subscriptions, or BigLake over object storage). The domain creates a Dataplex lake (orders-domain) and inside it two or more zones: a raw zone (Cloud Storage, schema-on-read) and a curated zone (BigQuery datasets and/or BigLake tables, schema-validated). It attaches its buckets and datasets to those zones as assets. Dataplex then runs automatic discovery over those assets — crawling files and tables, inferring schema and partitions, and registering everything into the Universal Catalog without a human writing DDL. The domain team curates the product: it writes a transform (BigQuery SQL, a dbt model, or a Dataform pipeline) that produces a clean, documented curated table, attaches Universal Catalog tag templates (owner, SLA, freshness, classification), tags sensitive columns with policy tags, and wires a Dataplex data-quality scan so the product ships with a measured, published quality score. At this point the curated table is a candidate data product.

Governance path — federated, computational, central where it must be. The shared governance project owns exactly two things that must be global: the policy-tag taxonomy (a BigQuery taxonomy of policy tags like pii.email, pii.card_bin, confidential.financial, each mapped to a fine-grained reader role) and the catalog tag-template definitions (the metadata contract every product must fill in — owner, domain, SLA, freshness, sensitivity). Domains apply those tags to their own resources; governance defines them once for everyone. This is the “federated computational governance” pillar made concrete: policy is authored centrally as code, enforced automatically by BigQuery and Dataplex at query time, and the data-quality and lineage signals roll up so a platform owner can see, across all domains, which products are healthy, which contain PII, and where any column flows downstream — without owning a single byte.
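
The metadata contract can be pictured as a small validation step that CI runs before a product is published. The sketch below is illustrative only: the field names mirror the contract described above (owner, domain, SLA, freshness, sensitivity), while `REQUIRED_FIELDS`, `missing_contract_fields`, and the sample values are hypothetical and do not correspond to any Google Cloud API.

```python
# Hypothetical sketch: check that a product's tag values fill in the
# metadata contract. Field names follow the article; nothing here calls GCP.
REQUIRED_FIELDS = ("owner", "domain", "sla", "freshness", "sensitivity")


def missing_contract_fields(tags: dict) -> list:
    """Return the contract fields that are absent or empty, in contract order."""
    return [f for f in REQUIRED_FIELDS if tags.get(f) in (None, "")]


# Illustrative values only; "sensitivity" is deliberately left out.
product_tags = {
    "owner": "grp-orders-data-owners",
    "domain": "orders",
    "sla": "24h",
    "freshness": "60 min",
}
print(missing_contract_fields(product_tags))
```

A check like this keeps the contract computational: a product with a non-empty result would simply not be promoted.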

Consumer path: discover, subscribe, query in place. A consumer (say a Marketing analyst) opens Dataplex search / the Universal Catalog and searches across every domain’s published products at once, filtered by tag template, classification, or business glossary term. They find orders.curated.fact_order and read its product page: owner, SLA, freshness, quality score, schema, and lineage. To consume it they either (a) subscribe via Analytics Hub, where the Orders domain has published the product into a data exchange as a listing and the analyst’s project receives a linked dataset (a read-only, zero-copy pointer that always reflects the source, with compute billed to the consumer’s project), or (b) receive a direct BigQuery IAM grant for tighter internal sharing. Either way, when they run the query, column-level access is enforced by the policy tags and row-level security by any row-access policies on the table. An analyst without the Fine-Grained Reader role on pii.email cannot read that column in cleartext: if a data-masking rule is attached to the tag and the analyst holds the Masked Reader role, the query succeeds and returns masked values; otherwise a query that references the column is denied, and the analyst selects around it (for example with SELECT * EXCEPT). The bytes are never copied: the consumer’s BigQuery slots read the producer’s storage through the governed link.
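
The query-time decision can be illustrated with a toy model. This is a sketch of the access logic only, not of BigQuery itself: it assumes a caller with Fine-Grained Reader on a tag sees cleartext, a caller with Masked Reader (and a masking rule on the tag) sees a masked value, and anyone else is denied. The names `read_column` and `apply_row_policy`, and the `"****"` mask, are hypothetical.

```python
# Toy model of column- and row-level enforcement; it mimics the decision,
# it does not call BigQuery.
from typing import Optional


def read_column(value: str, policy_tag: Optional[str],
                reader_tags: set, masked_tags: set) -> str:
    """Return what a caller sees for one column value.

    reader_tags: policy tags on which the caller holds Fine-Grained Reader.
    masked_tags: policy tags on which the caller holds Masked Reader
                 (assumes a masking rule is attached to the tag).
    """
    if policy_tag is None or policy_tag in reader_tags:
        return value                      # untagged, or fully authorized
    if policy_tag in masked_tags:
        return "****"                     # stand-in for a masking rule
    raise PermissionError(f"access denied on column tagged {policy_tag}")


def apply_row_policy(rows: list, predicate) -> list:
    """Keep only the rows the caller's row-access predicate allows."""
    return [r for r in rows if predicate(r)]


rows = [
    {"order_id": 1, "region": "IN", "customer_email": "a@example.com"},
    {"order_id": 2, "region": "SG", "customer_email": "b@example.com"},
]
# A caller limited to region IN who holds only the masked role on pii.email.
visible = apply_row_policy(rows, lambda r: r["region"] == "IN")
masked = [read_column(r["customer_email"], "pii.email", set(), {"pii.email"})
          for r in visible]
print(masked)
```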

So the request/data path for a single cross-domain query is: analyst → Dataplex search (find product) → Analytics Hub subscription (linked dataset) → BigQuery query in the consumer project → policy-tag check (column ACL) + row-access policy (row filter) → producer’s BigQuery storage read in place → governed result back. Metadata and discovery are centralized in the catalog; policy is centralized as code; ownership, storage, pipelines, and compute are decentralized to the domains; and bytes flow point-to-point, zero-copy, from producer storage to consumer compute — governed end to end.

Component breakdown

Component	What it does	Why it’s there	Key configuration choices
Organization + folders + projects	Resource hierarchy and policy inheritance	Project-per-domain gives hard isolation, blast-radius control, and per-domain cost attribution for free	Folders: `domains/`, `platform/`. One project per domain; one shared governance project; one self-serve tooling project
Organization policies	Org-wide guardrails	Guardrails that domains cannot disable: block public buckets, restrict regions/data residency, enforce CMEK	`storage.publicAccessPrevention`, `gcp.resourceLocations`, `constraints/bigquery.disableBQOmniAWSConnections` as needed, domain-restricted sharing
Dataplex lake	The domain abstraction — a logical container of a domain’s data assets	This is the mesh node; one lake = one domain, administered by that domain	One lake per domain; lake-level IAM delegated to the domain’s data-product team; metastore attached if Spark/Hive interop is needed
Dataplex zones	Sub-divide a lake into raw vs. curated tiers	Separates schema-on-read landing data from schema-validated, product-grade data	Raw zone → Cloud Storage assets; Curated zone → BigQuery + BigLake assets with schema enforcement
Dataplex assets	Attach a specific bucket or BigQuery dataset to a zone	Brings existing storage under domain governance without moving it	Reference existing buckets/datasets in the domain project; set discovery schedule per asset
Dataplex auto-discovery	Crawls assets, infers schema/partitions, registers entries	Eliminates hand-written DDL and keeps the catalog current as data lands	Enable per zone; schedule (e.g. hourly for hot landing zones); CSV/JSON/Parquet/Avro inference options
BigQuery datasets & tables/views	The physical, queryable data products	The serving surface of the mesh; authorized views are the classic “product port”	Curated datasets per product; authorized views/authorized datasets to expose a product without granting base tables
BigLake tables	BigQuery-governed tables over Cloud Storage / external object data	Lets object-storage data be a first-class, policy-tagged product, not a second-class export	BigLake connection per domain; fine-grained security so even external data honors policy tags
Dataplex Universal Catalog	Org-wide search, technical + business metadata, tag templates	One place to find any product across all domains; the metadata contract	Tag templates for owner/SLA/freshness/classification; business glossary terms; search facets
Policy tags (taxonomy)	Column-level access control taxonomy in BigQuery	Grant on a classification (`pii.email`) once, not on hundreds of columns — this is what makes column security scale	Define taxonomy in the governance project; map each tag to a fine-grained reader role; enforce on columns at query time
Row-access policies	Row-level filters on a table	Same table, different rows per consumer (e.g. region, tenant) without copies	Defined by the producing domain on its tables; expressed as SQL predicates bound to groups
Analytics Hub	Publish/subscribe exchange for data products	The governed, zero-copy port for cross-domain and even cross-org sharing	Data exchanges per trust boundary; listings = published products; linked datasets = read-only subscriptions billed to consumer
Dataplex data quality & profiling	Scheduled DQ rules + profiling scans on assets	Ships the product with a measured, published quality score (the “as a product” SLA)	Auto-recommended + custom rules; publish results to the catalog; gate promotion raw→curated on pass
Data lineage (Dataplex)	Automatic column/table lineage across BigQuery & beyond	Impact analysis and trust — see where a product’s columns flow before you change them	Auto-captured for BigQuery/Dataflow/Dataform; surfaced on the product page in the catalog
Dataform / Dataflow / Datastream / Pub/Sub	The pipelines that build products	Decentralized — each domain owns and runs its own ELT/ETL/CDC	Dataform for in-warehouse SQL ELT; Datastream for CDC from operational DBs; Pub/Sub→BigQuery for streaming
Cloud Composer (platform)	Optional orchestration / golden-path templates	The self-serve plane — a paved road so domains onboard in days, not weeks	Shared in the platform project, or per-domain; ships a Terraform “data-product” module
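
As an illustration of the row-access policies row above, a producing domain might render the policy DDL from a small helper and apply it in CI. This is a sketch: the helper name `row_access_policy_ddl`, the policy name, the group address, and the `region` column are hypothetical; the statement shape follows BigQuery's `CREATE ROW ACCESS POLICY` DDL.

```python
def row_access_policy_ddl(policy: str, table: str, groups: list,
                          predicate: str) -> str:
    """Render a BigQuery row-access policy binding a SQL predicate to groups."""
    grantees = ", ".join(f'"group:{g}"' for g in groups)
    return (
        f"CREATE OR REPLACE ROW ACCESS POLICY {policy}\n"
        f"ON `{table}`\n"
        f"GRANT TO ({grantees})\n"
        f"FILTER USING ({predicate});"
    )


# Hypothetical policy: Marketing analysts only see rows for region IN.
ddl = row_access_policy_ddl(
    "in_only",
    "acme-orders.curated.fact_order",
    ["grp-marketing-analysts@example.com"],
    'region = "IN"',
)
print(ddl)
```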

Implementation guidance

The whole mesh is infrastructure-as-code, and the most important IaC decision is the module boundary: ship a single, opinionated “domain” Terraform module so a new domain is a 30-line call, not a research project. That module is the self-serve platform pillar expressed as code.

The domain module (Terraform). Google provides first-class google_dataplex_* and google_bigquery_* resources; a domain module composes them:

google_project (or a reference to an existing one) under the domains/ folder, with the domain’s billing label.
google_dataplex_lake for the domain, plus google_dataplex_zone (one RAW, one CURATED) and google_dataplex_asset resources attaching the domain’s google_storage_bucket and google_bigquery_dataset.
Discovery (discovery_spec) enabled on each zone with a schedule.
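A google_bigquery_dataset per curated product, plus the authorized-view wiring (google_bigquery_dataset_iam_* grants for consumers on the product dataset, and a google_bigquery_dataset_access entry that authorizes the view on the base dataset) so the product port is exposed without base-table access.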
google_dataplex_datascan resources for data quality and data profiling, pointed at the curated tables, with rules and a schedule.
Standard IAM bindings that delegate lake admin and dataset owner to the domain’s group, while the org-level taxonomy stays in the governance project.
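
To make the module boundary concrete, here is a minimal sketch that emits Terraform JSON syntax (a `.tf.json` file) for one domain's lake and its two zones. It is not the author's module: the function name `domain_lake_config`, the hourly cron schedule, and the `SINGLE_REGION` choice are assumptions, and the argument names reflect my reading of the Google provider's `google_dataplex_lake` and `google_dataplex_zone` resources, so they should be checked against the provider documentation before use.

```python
import json


def domain_lake_config(domain: str, project: str, location: str) -> dict:
    """Build Terraform JSON for one lake with a RAW and a CURATED zone."""
    zones = {}
    for tier in ("raw", "curated"):
        zones[f"{domain}_{tier}"] = {
            "name": f"{domain}-{tier}",
            "project": project,
            "location": location,
            # Terraform interpolation, resolved at plan time.
            "lake": f"${{google_dataplex_lake.{domain}.name}}",
            "type": tier.upper(),
            # Assumed schedule: run discovery at the top of every hour.
            "discovery_spec": {"enabled": True, "schedule": "0 * * * *"},
            "resource_spec": {"location_type": "SINGLE_REGION"},
        }
    return {
        "resource": {
            "google_dataplex_lake": {
                domain: {
                    "name": f"{domain}-domain",
                    "project": project,
                    "location": location,
                }
            },
            "google_dataplex_zone": zones,
        }
    }


cfg = domain_lake_config("orders", "acme-orders", "us-central1")
print(json.dumps(cfg, indent=2))
```

Generating the configuration from three inputs is one way to keep a new domain down to a short call; a native HCL module with the same three variables would serve equally well.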

The governance module (Terraform). Lives in the platform/governance project and is owned by the central team:

google_data_catalog_taxonomy + google_data_catalog_policy_tag to define the policy-tag taxonomy once (pii.email, pii.card_bin, confidential.financial, …). Domains reference these tags by resource name when tagging columns, via the policyTags field of each column in a google_bigquery_table schema.
google_data_catalog_tag_template resources for the metadata contract (owner, SLA hours, freshness minutes, classification, on-call) that every product must populate.
google_bigquery_analytics_hub_data_exchange for each trust boundary; domains create google_bigquery_analytics_hub_listing resources to publish into it.
Fine-grained reader roles bound to the right groups so policy-tag enforcement actually gates columns.
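
On the domain side, applying the central taxonomy amounts to adding a `policyTags` entry to the relevant columns of the table schema JSON. The sketch below shows that transformation under stated assumptions: `tag_columns` is a hypothetical helper, and the project, location, and numeric IDs in the policy-tag resource name are placeholders, not real identifiers.

```python
def tag_columns(schema: list, column_tags: dict) -> list:
    """Return a copy of a BigQuery schema with policyTags set on named columns."""
    tagged = []
    for field in schema:
        field = dict(field)  # shallow copy so the input schema is not mutated
        tag = column_tags.get(field["name"])
        if tag:
            field["policyTags"] = {"names": [tag]}
        tagged.append(field)
    return tagged


# Placeholder resource name; the taxonomy and tag IDs are made up.
PII_EMAIL = ("projects/acme-governance/locations/us-central1/"
             "taxonomies/1234/policyTags/5678")

schema = [
    {"name": "order_id", "type": "INT64"},
    {"name": "customer_email", "type": "STRING"},
]
tagged_schema = tag_columns(schema, {"customer_email": PII_EMAIL})
print(tagged_schema)
```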

Because Dataplex and the catalog are location-scoped, pin every domain’s lake, datasets, and buckets to the single location the org policy mandates (e.g. EU or us-central1). Analytics Hub linked datasets and BigQuery queries cannot silently cross region boundaries, and data-residency rules depend on it.

Networking and identity wiring. A data mesh is mostly an identity problem, not a network one — the heavy lifting is IAM, not VPC:

Identity. Sync your IdP (e.g. Entra ID / Google Workspace) groups via Cloud Identity, and grant groups, never users. Each domain has a grp-<domain>-data-owners (lake admin, dataset owner) and the platform team has grp-platform-governance (taxonomy and exchange admin). Policy-tag reader roles are bound to consumer groups like grp-marketing-analysts. Workload Identity Federation lets domain CI/CD (GitHub Actions, GitLab) deploy without long-lived service-account keys; pipelines run as dedicated service accounts per domain with least privilege.
Network. Lock down data egress with VPC Service Controls: put all the data projects in a service perimeter so BigQuery and Cloud Storage data cannot be exfiltrated to a project outside the org, even with valid credentials. Use Private Google Access / Private Service Connect so analytics traffic to BigQuery and GCS stays on Google’s backbone and never traverses the public internet. Analytics Hub sharing within the perimeter is allowed; cross-perimeter listings are an explicit, audited bridge.
Encryption. CMEK (Cloud KMS) on every dataset and bucket, with keys in a domain (or central security) key project, enforced by org policy.

Promotion and contracts. The golden path is: data lands in the raw zone → a Dataform/SQL transform builds the curated table → a Dataplex DQ scan runs → on pass, the table is tagged with the metadata contract, policy tags are applied to sensitive columns, and an Analytics Hub listing is created (or refreshed) by CI. A failed DQ scan blocks promotion and pages the domain’s on-call. The data contract is literally the tag template plus the DQ ruleset, both versioned in Git.
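
The gate in that golden path reduces to a small decision function. The following is a sketch of the control flow only: `ScanResult`, `promotion_steps`, and the step names are hypothetical labels for whatever CI jobs a team actually wires up, and no Dataplex API is called.

```python
from dataclasses import dataclass


@dataclass
class ScanResult:
    table: str
    passed: bool
    score: float  # fraction of DQ rules passed, 0.0 to 1.0


def promotion_steps(result: ScanResult) -> list:
    """Return the ordered CI actions for a curated table after its DQ scan."""
    if not result.passed:
        # A failed scan blocks promotion and pages the domain's on-call.
        return ["block_promotion", "page_on_call"]
    return ["attach_contract_tags", "apply_policy_tags", "publish_listing"]


print(promotion_steps(ScanResult("fact_order", passed=True, score=1.0)))
print(promotion_steps(ScanResult("fact_order", passed=False, score=0.9)))
```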

Enterprise considerations

Security & Zero Trust. The mesh defaults to Zero Trust because nothing is granted at the project level for data: access is per-product, per-column, per-row, and bound to groups. The enforcement layers stack: VPC Service Controls stop exfiltration at the perimeter; IAM on Analytics Hub listings controls who can subscribe; policy tags protect sensitive columns at query time for anyone lacking the Fine-Grained Reader role (with a data-masking rule attached, the query succeeds and returns masked values, which keeps analysts productive; without one, a query that references the column is denied); row-access policies filter rows per group; CMEK owns the encryption boundary; and dynamic data masking can show email as a hash to one role and cleartext to another on the same column. Every access is logged (see Observability).

Cost optimization. Project-per-domain is the FinOps win: each domain is a billing slice, so the “whose query cost ₹3 lakh this month” question answers itself. On top of that: use BigQuery editions with reservations and autoscaling (assign a baseline + autoscale slots per domain so one domain’s runaway job cannot starve another), or on-demand with per-project/per-user bytes-billed quotas as a hard cap for smaller domains. Storage is cheap and shared-by-reference: Analytics Hub linked datasets mean a product consumed by ten domains is stored once and never duplicated — the classic mesh anti-pattern of “everyone exports a copy” is structurally impossible here. Partition and cluster every large product table, set partition expiration on raw zones, and lifecycle raw Cloud Storage to coldline. Tag every resource with domain, cost-center, and environment for showback/chargeback in BigQuery billing export.
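
Showback by label is a plain group-and-sum. The sketch below assumes billing rows have already been flattened into dictionaries with a `labels` mapping and a `cost` string; the real BigQuery billing export stores labels as repeated key/value records, so `showback` and the row shape here are simplifications, and the amounts are made up.

```python
from collections import defaultdict
from decimal import Decimal


def showback(rows: list) -> dict:
    """Sum cost per 'domain' label; unlabeled spend is reported separately."""
    totals = defaultdict(Decimal)  # Decimal() starts each total at 0
    for row in rows:
        domain = row["labels"].get("domain", "unlabeled")
        totals[domain] += Decimal(row["cost"])
    return dict(totals)


# Made-up rows for illustration.
billing_rows = [
    {"labels": {"domain": "orders", "environment": "prod"}, "cost": "5.50"},
    {"labels": {"domain": "orders", "environment": "dev"}, "cost": "2.50"},
    {"labels": {"domain": "risk"}, "cost": "9.25"},
    {"labels": {}, "cost": "1.00"},
]
print(showback(billing_rows))
```

Surfacing an `unlabeled` bucket makes missing tags visible instead of silently dropping that spend.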

Scalability. The pattern scales by addition. Going from 4 to 40 domains means 36 more module calls — more lakes, more projects, more listings — not a re-architecture, because there is no central pipeline to overload. BigQuery itself scales to petabytes and to thousands of concurrent queries; the catalog search scales across the whole org; Analytics Hub fans a single product out to many subscribers at zero marginal storage. The thing that must scale organizationally is the federated governance council that owns the shared taxonomy — keep the policy-tag and tag-template set small and stable.

Reliability & DR (RTO/RPO). BigQuery storage is durable and replicated across zones within a location. Note that a multi-region dataset (US, EU) does not by itself provide cross-region redundancy, so plan for regional outages explicitly. For stricter needs, BigQuery managed disaster recovery / cross-region dataset replication provides a standby with a defined failover; table snapshots and time travel (a 7-day window by default, configurable between 2 and 7 days) cover accidental deletes and bad transforms with near-zero RPO for recent state. Cloud Storage uses dual-region or multi-region buckets for the raw zone. Practical targets for a mid-size deployment: RPO ≈ minutes (continuous replication + time travel) and RTO ≈ 1 hour (failover to the replica region and re-point Analytics Hub listings). Because pipelines are per-domain and IaC-defined, recovering a single domain is independent of the others: the failure blast radius is bounded to one lake, not the whole platform.
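
Recovering from a bad transform with time travel is a query against the table as of an earlier moment. The helper below is a sketch that renders such a query and rejects points outside the configured window; `time_travel_sql` is a hypothetical name, and the SQL uses BigQuery's `FOR SYSTEM_TIME AS OF` clause.

```python
def time_travel_sql(table: str, hours_back: int, window_days: int = 7) -> str:
    """Render a query reading `table` as it was `hours_back` hours ago."""
    if not 2 <= window_days <= 7:
        raise ValueError("time travel window must be between 2 and 7 days")
    if not 0 <= hours_back <= window_days * 24:
        raise ValueError("requested point is outside the time travel window")
    return (
        f"SELECT * FROM `{table}`\n"
        f"FOR SYSTEM_TIME AS OF "
        f"TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL {hours_back} HOUR)"
    )


sql = time_travel_sql("acme-orders.curated.fact_order", 3)
print(sql)
```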

Observability. Three layers, all native: (1) Cloud Audit Logs + BigQuery INFORMATION_SCHEMA views give you who queried what, which columns/policy tags were touched, bytes billed, and slot utilization per domain — the governance audit trail. (2) Dataplex data-quality and profiling scans publish a live scorecard per product, and Dataplex lineage shows column-level flow for impact analysis. (3) Cloud Monitoring dashboards and alerts on slot contention, DQ-scan failures, and freshness SLA breaches, routed to each domain’s on-call. The product page in the Universal Catalog ties these together so a consumer sees the SLA, the latest quality score, and the lineage before they subscribe.
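
A freshness SLA breach, one of the alert conditions above, is a comparison between a product's last update time and its declared freshness budget. This sketch assumes the budget comes from the freshness field of the tag template; `freshness_breached` and the timestamps are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def freshness_breached(last_updated: datetime, freshness_minutes: int,
                       now: datetime) -> bool:
    """True when the product is older than its declared freshness budget."""
    return now - last_updated > timedelta(minutes=freshness_minutes)


# Fixed timestamps so the example is deterministic.
now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
fresh = datetime(2025, 1, 1, 11, 30, tzinfo=timezone.utc)
stale = datetime(2025, 1, 1, 10, 0, tzinfo=timezone.utc)

print(freshness_breached(fresh, 60, now))
print(freshness_breached(stale, 60, now))
```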

Governance. This is the centralized pillar, and it is computational, not manual: policy tags and tag templates are authored once as code in the governance project and enforced automatically by BigQuery/Dataplex; promotion is gated on DQ scans in CI; data residency and CMEK are enforced by org policy; and a lightweight federated governance council (platform + one rep per domain) owns the shared taxonomy and the data-contract standard. The principle is standards centralized, execution decentralized — governance defines the rules and audits compliance; domains own and run everything else.

Reference enterprise example

NorthBank Retail Group is a fictional ₹9,000-crore omnichannel retailer (≈ 4,000 stores, online, and a co-branded card) running on Google Cloud. They had the exact pain in the scenario: one BigQuery project, an 11-week onboarding backlog, and a co-branded-card PII exposure that nearly failed a PCI audit. They adopted the mesh over two quarters.

Topology. Five domains, one lake each, under a domains/ folder, all pinned to the asia-south1 (Mumbai) region for data residency:

Domain (project / lake)	Headline data products	Sensitive columns (policy tags)	Monthly BigQuery spend
Orders (`nb-orders`)	`fact_order`, `fact_returns`	`customer_email` (`pii.email`)	₹6.2 L
Marketing (`nb-marketing`)	`dim_customer_360`, `campaign_response`	`email`, `phone` (`pii.*`)	₹4.1 L
Logistics (`nb-logistics`)	`shipment_sla`, `carrier_perf`	— (none)	₹2.3 L
Card / Risk (`nb-card-risk`)	`txn_authz`, `chargeback_cases`	`card_bin` (`pii.card_bin`), `pan_hash` (`confidential.financial`)	₹7.8 L
Merchandising (`nb-merch`)	`sku_margin`, `assortment_plan`	—	₹3.0 L

Shared nb-platform-gov holds the policy-tag taxonomy (pii.email, pii.phone, pii.card_bin, confidential.financial), the tag templates (owner, SLA-hours, freshness-min, classification, on-call), and two Analytics Hub exchanges: internal-exchange (all five domains) and partner-exchange (a tightly-scoped exchange that publishes only shipment_sla to their 3PL logistics partner’s separate GCP org).

Key decisions. (1) Card/Risk publishes via Analytics Hub, never via direct grant — txn_authz is listed with card_bin and pan_hash carrying policy tags; only grp-fraud-analysts holds the pii.card_bin and confidential.financial reader roles, so a Marketing analyst subscribing to the same product sees every column except those two, masked at query time. This is what flipped the PCI finding from fail to pass. (2) Marketing’s dim_customer_360 consumes Orders and Card data by subscription, not export — it joins three domains’ products as linked datasets, so there is exactly one physical fact_order, governed once, and Marketing’s queries are billed to nb-marketing. (3) Logistics’ shipment_sla is dual-published — internally and, via the partner exchange, to the 3PL across an org boundary, zero-copy, with an audited cross-perimeter bridge. (4) Promotion is gated on Dataplex DQ — fact_order ships with a 99.2% completeness and 100% uniqueness scorecard on the product page; a failed scan blocks the listing refresh and pages Orders’ on-call.

Reservations & cost. Each domain gets a small BigQuery Enterprise reservation with autoscaling (Card/Risk baseline 500 slots, others 100–200, autoscaling to a per-domain ceiling), so Card/Risk’s nightly fraud scoring can never starve Marketing’s dashboards. Total platform spend landed at ≈ ₹23.4 L/month — lower than the old single-project bill, because the zero-copy subscriptions eliminated five redundant nightly export pipelines and their storage, and per-domain quotas killed the runaway-SELECT * incidents.
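
The total can be checked against the per-domain figures in the topology table. The snippet below only re-adds those five numbers (in ₹ lakh per month) using `Decimal` to avoid floating-point noise; the variable names are illustrative.

```python
from decimal import Decimal

# Monthly BigQuery spend per domain, in ₹ lakh, copied from the table above.
monthly_spend_lakh = {
    "nb-orders": Decimal("6.2"),
    "nb-marketing": Decimal("4.1"),
    "nb-logistics": Decimal("2.3"),
    "nb-card-risk": Decimal("7.8"),
    "nb-merch": Decimal("3.0"),
}

total = sum(monthly_spend_lakh.values(), Decimal("0"))
print(total)  # 23.4
```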

Outcome. Onboarding a new data product dropped from 11 weeks to under 4 days (a domain calls the Terraform module, lands data, the DQ scan passes, CI publishes the listing). Cross-domain discovery went from “ask in Slack” to a single Universal Catalog search. The PCI audit passed with the card columns provably masked-by-default. And the central platform team — still two people — stopped writing other teams’ pipelines and instead owned the paved road: the Terraform module, the taxonomy, the exchanges, and the governance council. Decentralized production, centralized governance, exactly as intended.

When to use it

Use a GCP data mesh when you have multiple data-producing domains with distinct semantics, a central data team that has become a delivery bottleneck, and a real need for fine-grained, classification-driven access (PII/PCI/financial) across organizational boundaries. The sweet spot starts around three or more domains with genuinely different data and ownership, and the value compounds as you grow — the pattern is explicitly designed so that scaling is additive.

Trade-offs. The mesh moves cost and complexity to the producers: every domain now owns pipelines, quality scans, and an on-call rotation it did not have before. That is the point — accountability follows ownership — but it demands domains that are willing and able to own data as a product. It also requires a real federated governance function; without someone owning the shared taxonomy and the data-contract standard, you get fifty inconsistent catalogs and the mesh degrades into a swamp.

Anti-patterns to avoid. (1) “Mesh-washing” a central lake — slapping Dataplex lakes onto datasets that one central team still builds and owns is not a mesh; it is a relabeled monolith with extra YAML. The ownership has to actually move. (2) Export-and-copy instead of subscribe — if domains email each other CSVs or schedule cross-project exports, you have rebuilt the four-divergent-copies problem; force consumption through Analytics Hub linked datasets or authorized views so there is one physical copy. (3) Per-column IAM instead of policy tags — grant on classifications, not on hundreds of individual columns, or column security will not scale past the first few products. (4) A sprawling taxonomy — keep policy tags and tag templates small and stable; a 200-tag ontology nobody understands is worse than ten that everyone does. (5) Mesh for one domain — if you have a single analytics team on a few terabytes, you do not need a mesh; you need a clean lakehouse (one governed copy in BigQuery + BigLake, one team), which is dramatically simpler.

Alternatives. For a single team / single domain, use the GCP lakehouse pattern (BigQuery + BigLake + Dataplex governance, one queryable copy) — same building blocks, none of the federation overhead. For a lift-and-shift warehouse with light governance needs, a well-organized BigQuery project with authorized views and policy tags may be enough without the full mesh apparatus. If you are multi-cloud and the data physically lives in S3 or Azure, BigQuery Omni / BigLake over external stores can bring those into the same catalog and policy model so the mesh spans clouds. And if your organization is not ready to make domains own their data, do not start with the mesh — fix the ownership culture first, because the architecture cannot manufacture accountability the org is unwilling to take on.

Written by Vinod
