Most “microservices on Kubernetes” diagrams are a lie of omission. They show a tidy box of services talking to each other and call it an architecture. The real work — the part that separates a platform you can run with a small team from a 3 a.m. pager that never stops — is everything around the services: who is allowed to call whom and over what (the mesh), what is actually allowed to run (the supply chain), which Google APIs each pod can touch (identity), and how a fleet of clusters across regions stays identical without a human ever running kubectl apply against production (GitOps). This article is a complete, reusable GCP reference architecture for production microservices built on GKE Autopilot, Anthos Service Mesh, Artifact Registry, Workload Identity, and Config Sync. It is sized to start with one cluster and a dozen services, and to grow to a multi-region fleet running hundreds — without changing shape.
The business scenario
Picture an engineering organisation that has outgrown a monolith. Maybe it is a 40-engineer fintech that started with one Rails app and now has a dozen teams each wanting to ship on their own cadence; maybe it is a 600-engineer retailer breaking a fifteen-year-old Java monolith into domains. The company name and stack differ, but the forces pushing them onto this exact architecture are the same three every time.
The first is deployment independence colliding with operational reality. The whole point of microservices is that the payments team can ship without waiting on the catalog team. But the moment you have twenty services, the platform questions arrive all at once: how do they discover each other, how is traffic between them encrypted, how do you roll out version N+1 to 5% of traffic, how do you stop a bad deploy from taking down checkout? Teams that answer these per-service, in application code, end up with twenty subtly different retry policies and zero consistent security. The org needs these to be platform properties, not per-team folklore.
The second is the cluster-ops tax. Self-managed Kubernetes is a part-time job that quietly becomes three full-time jobs: node pool sizing, OS patching, autoscaler tuning, security CVE chases, control-plane upgrades, and the perennial “why is this node NotReady.” A 40-person company cannot spare two SREs to babysit node pools, and a 600-person company would rather spend those SREs on reliability of the product, not the substrate. The org wants Kubernetes’ API and ecosystem without operating the machines underneath it.
The third is the audit-and-supply-chain squeeze. Once you are processing payments or holding customer data, someone will ask: prove that only code that passed CI, was scanned, and was signed is running in production. Prove that service A cannot read service B’s database. Prove that prod and the DR region are configured identically. “Trust me, we use Kubernetes” is not an answer a SOC 2 or PCI auditor accepts, and it is not an answer a board accepts after the first incident.
This architecture answers all three. GKE Autopilot removes the cluster-ops tax by running and securing the nodes for you and billing per pod resource request. Anthos Service Mesh makes mTLS, traffic-splitting, and resilience platform properties applied by sidecar, not by application code. Artifact Registry plus Binary Authorization turns “only trusted code runs” from a slogan into an admission-control gate. Workload Identity gives every service its own scoped Google identity with no static keys. And Config Sync makes Git the single source of truth so every cluster in the fleet is, provably, what the repo says it is. The same blueprint runs a single us-central1 cluster for a startup and a three-region fleet for an enterprise; only the cluster count and quotas change.
Architecture overview
Follow one request and one deploy, and the design explains itself.
The request path. A user hits the public hostname, which resolves to a Global External Application Load Balancer fronted by Cloud Armor (WAF + L7 DDoS). The load balancer does not target VMs; it targets the cluster directly through container-native load balancing: the GKE Gateway controller programs the LB’s backend service with zonal network endpoint groups (NEGs) whose endpoints are the actual pod IPs of the ingress gateway. Traffic therefore goes edge → pod with no extra NodePort hop. The request lands on the Anthos Service Mesh ingress gateway (an Envoy pod) running inside the cluster.
From the gateway inward, every hop is mesh-managed. Each application pod runs the workload container plus an Envoy sidecar injected automatically; pod-to-pod traffic is upgraded to mutual TLS with certificates issued by the mesh’s CA and rotated on a short clock — the application code sends plaintext HTTP to localhost and the sidecars do the encryption. The mesh’s control plane (managed ASM, running on Google’s infrastructure, not your nodes) distributes routing, retry, timeout, and authorization policy to every sidecar. So a request from frontend to checkout to payments is encrypted, authenticated by SPIFFE identity, authorised by L7 policy, retried on transient failure, and traced end-to-end — without any of those three services containing a line of code for it.
When a service needs a Google API — checkout writing to Cloud SQL, inventory publishing to Pub/Sub, a worker reading from a GCS bucket — it does not use a downloaded service-account key. Its Kubernetes service account is bound via Workload Identity Federation for GKE to a Google service account (or, in the newer model, granted IAM directly on the KSA principal). The pod’s metadata server hands out short-lived tokens scoped to exactly that identity. payments can reach the payments database; catalog cannot, because its identity has no such grant.
The deploy path. A developer merges to main. Cloud Build (or any CI) builds the image, runs tests and a vulnerability scan, pushes to Artifact Registry, and — critically — produces a cryptographic attestation that this digest passed the pipeline. A separate Git repo of Kubernetes manifests (the “config” repo) is updated with the new image digest. Config Sync (part of Config Management) is continuously watching that repo; it pulls the change and reconciles the cluster to match, so the rollout is a Git commit, not a kubectl command. As the new pod tries to start, Binary Authorization intercepts the admission request and checks: is this exact image digest signed by the required attestors? No attestation, no admission — a hand-pushed or tampered image is rejected by the cluster itself.
So the end-to-end picture is two intersecting loops. The runtime loop: user → Cloud Armor → Global LB → (container-native NEG) → ASM ingress gateway → mTLS mesh of Autopilot pods → Google services via Workload Identity, with telemetry flowing to Cloud Operations. The delivery loop: commit → Cloud Build → Artifact Registry (+ attestation) → config repo → Config Sync → cluster, gated at the door by Binary Authorization.
The diagram in words: at the top, users hitting a single anycast VIP guarded by Cloud Armor. Below it, a Global LB whose arrow lands inside a cluster boundary on an ingress-gateway pod. Inside the boundary, a lattice of service pods each drawn with a small sidecar square, every connecting line labelled “mTLS.” Each pod has a thin dotted line out to a Google service (Cloud SQL, Pub/Sub, GCS, Secret Manager) labelled “Workload Identity — no keys.” Off to the left, a CI/CD column: Git → Cloud Build → Artifact Registry, with a lock icon (“attestation”) feeding a gate icon (“Binary Authorization”) sitting on the cluster’s admission boundary. A second Git repo (“config”) feeds a Config Sync agent inside the cluster. The whole cluster box is duplicated faintly to the right to signify a second region in the fleet, both fed by the same two Git repos.
Component breakdown
Each component earns its place by removing a specific failure mode rather than adding a feature.
| Component | Role in this architecture | Key configuration choices |
|---|---|---|
| GKE Autopilot | The managed substrate: runs and secures nodes, schedules pods, bills per pod request. Eliminates node-pool ops. | Autopilot mode (no node pools to manage). Regional clusters (control plane + nodes spread across 3 zones) for HA. Release channel = Regular. Private cluster: private nodes, authorized networks on the control-plane endpoint. Set per-pod CPU/memory requests carefully — they are the billing and scheduling unit. |
| Anthos Service Mesh (managed) | East-west security and traffic control: automatic mTLS, L7 authz, traffic-splitting, retries/timeouts, golden-signal telemetry, all by sidecar. | Managed control plane (Google-hosted, auto-upgraded). Strict PeerAuthentication (mTLS STRICT) mesh-wide. Default-deny AuthorizationPolicy, then explicit allow per service-pair. VirtualService weights + DestinationRule subsets for canaries. Sidecar injection via namespace label. |
| Artifact Registry | The single trusted store for container images and Helm/OCI artifacts; the source of truth for “what can run.” | One regional repo per environment (or per team) co-located with the cluster region to cut pull latency/egress. CMEK encryption. On-push vulnerability scanning (Artifact Analysis). Cleanup policies to expire untagged digests. Reader IAM bound to the cluster’s node identity only. |
| Binary Authorization | Admission-time gate that lets only signed, policy-compliant image digests run. Turns supply-chain policy into enforcement. | Cluster policy: requireAttestationsBy the CI attestor (and optionally a vuln-scan attestor). evaluationMode = REQUIRE_ATTESTATION, enforcementMode = ENFORCED_BLOCK_AND_AUDIT_LOG. Break-glass annotation for emergencies (audited). Continuous validation to flag drift after admission. |
| Workload Identity | Per-pod, keyless access to Google APIs via short-lived tokens mapped from KSA → IAM. Kills static service-account keys. | Workload Identity enabled on the cluster. Each service’s KSA bound to a least-privilege GSA (or IAM granted directly to the KSA principal). One identity per service, never a shared node SA. iam.disableServiceAccountKeyCreation org policy on, so keys cannot be minted. |
| Config Sync (Config Management) | GitOps reconciliation: every cluster continuously converges to a Git repo. Makes the fleet provably identical. | Unstructured or hierarchical repo. Sync from the config repo’s main. RootSync for platform-wide policy (namespaces, NetworkPolicy, quotas), RepoSync per team namespace for app manifests. Drift is auto-reverted. Pair with Policy Controller (OPA Gatekeeper) for guardrails. |
| Global External ALB + Cloud Armor | The north-south edge: anycast entry, TLS, WAF, container-native routing straight to gateway pods. | GKE Gateway API resource provisions the LB. Controller-managed zonal NEGs = container-native LB (edge → pod). Google-managed cert. Cloud Armor preconfigured OWASP rules + rate-based bans + Adaptive Protection. |
| Cloud SQL / Spanner / Memorystore / Pub/Sub | The stateful tier the (stateless) services depend on; reached over the VPC with Workload Identity. | Private Service Connect / private IP only — no public DB endpoints. Cloud SQL Auth Proxy or direct private IP. Pub/Sub for async fan-out between services. Memorystore for shared caches/sessions. |
| Cloud Operations (Ops suite) | Observability plane: logs, metrics, traces, SLOs — fed natively by Autopilot and ASM. | ASM emits the four golden signals per service automatically. Managed Service for Prometheus for app metrics. Cloud Trace context propagated by sidecars. SLOs with burn-rate alerts on the services that matter. |
Three of these choices deserve emphasis because they are where teams most often go wrong.
Autopilot’s billing unit is the pod request, so right-sizing requests is cost control. On a Standard cluster you pay for nodes whether pods use them or not, and slack hides in node headroom. On Autopilot you pay for the sum of pod CPU/memory requests (rounded to Autopilot’s allowed shapes). An over-requested replicas: 10 deployment asking for 2 vCPU when it uses 0.3 is now a line item you can see and fix. This is a feature, but it changes the discipline: VPA in recommendation mode and a habit of setting requests close to real usage are not optional.
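A recommendation-only VPA is a small manifest. The sketch below assumes a Deployment named checkout in a namespace of the same name (both hypothetical); with updateMode set to "Off" the autoscaler only publishes suggested requests in the object's status and never evicts or resizes a pod.

```yaml
# Hypothetical target: a Deployment called "checkout" in namespace "checkout".
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: checkout-recommender
  namespace: checkout
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: checkout
  updatePolicy:
    updateMode: "Off"   # recommend only; requests are still changed via Git
```

The recommendations appear under status.recommendation.containerRecommendations, which gives a team a concrete number to commit back into the Deployment's requests.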
Mesh mTLS and authorization are two separate switches, and only turning on the first is a common, dangerous half-measure. PeerAuthentication: STRICT encrypts and authenticates traffic, but by itself it still lets any meshed service call any other meshed service — encrypted. The Zero-Trust property you actually want comes from a default-deny AuthorizationPolicy plus explicit allows (“frontend may call checkout on POST /cart; nothing may call payments except checkout”). Encryption without authorization is a locked door with no lock on the inner rooms.
Binary Authorization is only as strong as where the attestation is created. If your CI signs the image before the vulnerability scan and tests pass, you have a signature that proves nothing. The attestor must sign the digest at the end of a pipeline that has already gated on scan and test results — and the signing key must live somewhere CI can use but humans cannot exfiltrate (Cloud KMS with tight IAM). Done right, a developer literally cannot docker push something into prod, because the cluster will refuse it at admission.
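On the enforcement side, the gate itself is a short policy document. A minimal sketch, assuming a single attestor named ci-pipeline (a hypothetical name) in a project written here as PROJECT_ID, in the YAML form that can be imported as the project's Binary Authorization policy:

```yaml
# PROJECT_ID and the attestor name "ci-pipeline" are placeholders.
globalPolicyEvaluationMode: ENABLE        # exempt Google-maintained system images
defaultAdmissionRule:
  evaluationMode: REQUIRE_ATTESTATION     # every digest needs a valid attestation
  enforcementMode: ENFORCED_BLOCK_AND_AUDIT_LOG
  requireAttestationsBy:
    - projects/PROJECT_ID/attestors/ci-pipeline
```

The policy says nothing about when the attestation was produced; that ordering guarantee has to come from the pipeline that holds the signing key.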
Implementation guidance
The whole platform should be code: Terraform for the Google-side infrastructure, and Git-backed Kubernetes manifests reconciled by Config Sync for everything inside the cluster. Resist the urge to gcloud/kubectl your way to a running system — the entire value proposition here is reproducibility.
Project and IaC layout. Use a small set of projects under a folder: a platform project for shared infra (Artifact Registry, KMS, the config repo’s deploy identity), and one project per environment (dev, staging, prod) — or per region in the fleet. Terraform provisions the cluster and the Google primitives; it should not manage in-cluster app objects (that is Config Sync’s job). Keep state in a GCS backend with versioning and per-environment state isolation.
A representative Terraform skeleton for the cluster (HCL):
```hcl
resource "google_container_cluster" "platform" {
  name             = "platform-prod"
  location         = "us-central1" # regional = control plane in 3 zones
  enable_autopilot = true          # Autopilot

  release_channel {
    channel = "REGULAR"
  }

  # Keyless access to Google APIs from pods
  workload_identity_config {
    workload_pool = "${var.project_id}.svc.id.goog"
  }

  # Private cluster: nodes have no public IPs
  private_cluster_config {
    enable_private_nodes    = true
    enable_private_endpoint = false # control plane reachable from authorized nets
    master_ipv4_cidr_block  = "172.16.0.0/28"
  }

  master_authorized_networks_config {
    cidr_blocks {
      cidr_block   = var.admin_cidr
      display_name = "ci-and-bastion"
    }
  }

  # Only signed images may run
  binary_authorization {
    evaluation_mode = "PROJECT_SINGLETON_POLICY_ENFORCE"
  }
}
```
Managed ASM and Config Management are then enabled as fleet features on the cluster (via the gke_hub feature/membership resources or the gcloud container fleet equivalents), so the mesh and GitOps agents are installed and auto-upgraded by Google rather than pinned by you.
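Once Config Sync is running, pointing a cluster at the config repo is itself a declarative object. The following is a sketch only: the repository URL, the clusters/prod directory, and the git-creds Secret are hypothetical, and a fleet could equally supply the same settings through the fleet feature configuration.

```yaml
# Hypothetical repo URL, directory, and Secret name.
apiVersion: configsync.gke.io/v1beta1
kind: RootSync
metadata:
  name: root-sync
  namespace: config-management-system
spec:
  sourceFormat: unstructured
  sourceType: git
  git:
    repo: https://git.example.com/platform/config.git
    branch: main
    dir: clusters/prod
    auth: token
    secretRef:
      name: git-creds   # Secret in config-management-system holding the Git token
```

Every cluster in the fleet would carry an equivalent object, which is what lets a new region converge on the same configuration without manual steps.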
Networking and identity wiring. Use VPC-native (alias IP) clusters so pods get real VPC IPs — this is what makes container-native NEGs and direct DB connectivity work. Databases sit behind Private Service Connect or private IP; there are no public database endpoints anywhere in the design. North-south traffic uses the Gateway API (gke-l7-global-external-managed GatewayClass) to provision the Global LB; attach Cloud Armor via a GCPBackendPolicy. The Gateway routes to the ASM ingress gateway service, and from there HTTPRoute/VirtualService rules carry traffic into the mesh.
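The edge wiring described above might look roughly like this. It is a sketch under several assumptions: the asm-ingress namespace, the asm-ingressgateway Service, the pre-provisioned certificate shop-cert, and the Cloud Armor policy edge-waf are all hypothetical names, and the cluster is assumed to serve the Gateway API at version v1.

```yaml
# All names below (namespace, Service, certificate, security policy) are illustrative.
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: external-https
  namespace: asm-ingress
spec:
  gatewayClassName: gke-l7-global-external-managed
  listeners:
    - name: https
      protocol: HTTPS
      port: 443
      tls:
        mode: Terminate
        options:
          networking.gke.io/pre-shared-certs: shop-cert
---
# Send everything that reaches the load balancer to the mesh ingress gateway.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: to-mesh
  namespace: asm-ingress
spec:
  parentRefs:
    - name: external-https
  rules:
    - backendRefs:
        - name: asm-ingressgateway
          port: 80
---
# Attach the Cloud Armor policy to the backend service created for that Service.
apiVersion: networking.gke.io/v1
kind: GCPBackendPolicy
metadata:
  name: edge-waf
  namespace: asm-ingress
spec:
  default:
    securityPolicy: edge-waf
  targetRef:
    group: ""
    kind: Service
    name: asm-ingressgateway
```

A real deployment would probably also need a HealthCheckPolicy aimed at the gateway's health port; that is left out here to keep the sketch short.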
For identity, the chain is: KSA → GSA → IAM roles. Annotate each service’s Kubernetes service account with the matching Google service account, allow that KSA to impersonate it, then grant the Google identity only the roles it needs:
```yaml
# config repo: payments service account
apiVersion: v1
kind: ServiceAccount
metadata:
  name: payments
  namespace: payments
  annotations:
    iam.gke.io/gcp-service-account: payments@PROJECT.iam.gserviceaccount.com
```

```hcl
# Terraform: bind the KSA to the GSA, then grant least privilege
resource "google_service_account_iam_member" "payments_wi" {
  service_account_id = google_service_account.payments.name
  role               = "roles/iam.workloadIdentityUser"
  member             = "serviceAccount:${var.project_id}.svc.id.goog[payments/payments]"
}

resource "google_project_iam_member" "payments_sql" {
  project = var.project_id
  role    = "roles/cloudsql.client" # only payments gets this
  member  = "serviceAccount:${google_service_account.payments.email}"
}
```
Turn on the constraints/iam.disableServiceAccountKeyCreation org policy so the old escape hatch — downloading a JSON key — is closed entirely.
In-cluster config (the GitOps repo). The config repo holds namespaces, ResourceQuota, NetworkPolicy, the ASM PeerAuthentication/AuthorizationPolicy set, and each team’s Deployments/Services/HTTPRoutes. A RootSync reconciles the platform-wide objects; per-team RepoSync objects let teams own their namespace’s manifests without cluster-admin. The strict-mTLS and default-deny posture lives here, not in any pipeline:
```yaml
# Mesh-wide: encrypt everything, then deny by default
apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata: { name: default, namespace: istio-system }
spec: { mtls: { mode: STRICT } }
---
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata: { name: deny-all, namespace: payments }
spec: {}   # empty spec on a namespace = deny all by default
---
# Explicit allow: only checkout may POST to payments
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata: { name: allow-checkout, namespace: payments }
spec:
  selector: { matchLabels: { app: payments } }
  action: ALLOW
  rules:
    - from: [{ source: { principals: ["cluster.local/ns/checkout/sa/checkout"] }}]
      to: [{ operation: { methods: ["POST"], paths: ["/charge"] }}]
```
A canary is then just a weighted VirtualService committed to Git and picked up by Config Sync — 95/5, watch the SLO dashboards, then 50/50, then 100 — with no privileged access required to shift traffic.
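As an illustration, the first 95/5 step for the payments service could be committed as the pair of objects below. The sketch assumes the two Deployments are distinguished by a version label (v1 and v2, hypothetical values) and that the mesh serves the networking.istio.io/v1 API; older meshes would use v1beta1.

```yaml
# Subsets map a name to a pod label; the label values here are assumptions.
apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
  name: payments
  namespace: payments
spec:
  host: payments.payments.svc.cluster.local
  subsets:
    - name: stable
      labels: { version: v1 }
    - name: canary
      labels: { version: v2 }
---
# Weights must sum to 100; promoting the canary is a one-line diff.
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: payments
  namespace: payments
spec:
  hosts:
    - payments.payments.svc.cluster.local
  http:
    - route:
        - destination: { host: payments.payments.svc.cluster.local, subset: stable }
          weight: 95
        - destination: { host: payments.payments.svc.cluster.local, subset: canary }
          weight: 5
```

Moving to 50/50 and then 100 means editing only the two weight fields, so each promotion step is a reviewable commit that can be reverted the same way.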
Enterprise considerations
Security and Zero Trust. This architecture is Zero Trust by construction, not by add-on. Every east-west call is mutually authenticated (mTLS) and explicitly authorised (default-deny + allowlist) by the mesh; every workload has its own least-privilege Google identity with no static keys; only attested images run; and the network is private (private nodes, private DB endpoints, authorized control-plane networks). Add Policy Controller (managed OPA Gatekeeper, reconciled by Config Sync) to enforce guardrails like “no :latest tags,” “all images from our Artifact Registry,” “every pod sets resource requests,” and Pod Security Admission at the restricted level. Secrets come from Secret Manager via the CSI driver or Workload-Identity-scoped reads — never baked into images. The result is defence in depth where each layer (image → identity → network → mesh authz) is an independent gate.
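Two of those guardrails can be expressed with constraint kinds from the Gatekeeper constraint template library, which Policy Controller bundles. This is a sketch: the repository prefix is a placeholder, and it assumes the corresponding templates are installed on the cluster.

```yaml
# The repo prefix below is a placeholder for your Artifact Registry path.
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sAllowedRepos
metadata:
  name: images-from-artifact-registry
spec:
  enforcementAction: deny
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
  parameters:
    repos:
      - "us-central1-docker.pkg.dev/PROJECT_ID/"
---
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sDisallowedTags
metadata:
  name: no-latest-tag
spec:
  enforcementAction: deny
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
  parameters:
    tags: ["latest"]
```

In a meshed cluster the allowed-repos list would also have to cover the injected sidecar image and any system namespaces, or those would need to be excluded in the match block. Otherwise the constraint blocks the platform's own pods.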
Cost optimisation. Autopilot’s per-pod billing flips the cost model: there is no node slack to waste, but there is no slack to hide in either, so over-requesting becomes visible spend. Run VPA in recommendation mode and tune requests to real usage; use Horizontal Pod Autoscaler on golden-signal or custom metrics so replica count tracks demand. Use Spot Pods for fault-tolerant batch/async work (large discount, can be evicted). Co-locate Artifact Registry with the cluster region to avoid cross-region image-pull egress. For an enterprise fleet, a single shared platform cluster per region with hard ResourceQuota per team namespace is dramatically cheaper than cluster-per-team sprawl, and Autopilot makes that multi-tenancy safer because Google manages and hardens the nodes and restricts the privileged workloads that could otherwise reach across tenants. Buy Committed Use Discounts once your steady-state request baseline is known.
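Requesting Spot capacity is a one-line node selector on the pod template. A minimal sketch, assuming a batch Job in a reporting namespace; the job name, image path, and request sizes are illustrative only.

```yaml
# Hypothetical batch job; PROJECT_ID and DIGEST are placeholders.
apiVersion: batch/v1
kind: Job
metadata:
  name: nightly-reconciliation
  namespace: reporting
spec:
  backoffLimit: 6                 # tolerate evictions by retrying
  template:
    spec:
      restartPolicy: OnFailure
      nodeSelector:
        cloud.google.com/gke-spot: "true"   # schedule onto Spot capacity
      containers:
        - name: reconcile
          image: us-central1-docker.pkg.dev/PROJECT_ID/apps/reconcile@sha256:DIGEST
          resources:
            requests:
              cpu: 500m
              memory: 1Gi
```

Because Spot Pods can be evicted at short notice, this only suits work that is idempotent or checkpointed, which is why the design keeps it to the batch and async tier.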
Scalability. Three independent axes scale here. Pods scale via HPA. Services scale organisationally because the mesh and GitOps make adding a service a manifest change, not a platform project. Clusters/regions scale via the fleet: add a regional Autopilot cluster, register it to the same fleet, point Config Sync at the same repos, and it converges to an identical configuration — then add it as a backend to the global LB (or use Multi-Cluster Services to make cross-cluster service discovery transparent). The control planes (Autopilot’s, managed ASM’s) are Google’s problem to scale, not yours.
Reliability and DR (RTO/RPO). A single regional cluster already survives a zone failure with no action, because control plane and pods span three zones. For region loss, run a second regional cluster in the fleet, fed by the same Git repos, so it is already running the identical configuration. RTO is then bounded by how fast the global LB drains the failed region’s NEGs and by your DB failover, typically minutes rather than the hours a rebuild would take. RPO is set by the data tier, never by Kubernetes: Cloud SQL cross-region replicas replicate asynchronously, so RPO equals the replication lag at the moment of failure (typically seconds); Spanner multi-region replicates synchronously and gives RPO = 0. The stateless mesh and the GitOps reconciliation mean the compute side of DR is “the cluster is already up and configured”, so your DR drills become data-failover drills, which is exactly where the real risk lives.
Observability. ASM emits the four golden signals (latency, traffic, errors, saturation) per service automatically, so you get a service-topology and a per-edge latency/error view without instrumenting anything. Add Managed Service for Prometheus for app-specific metrics and Cloud Trace (sidecars propagate context) for distributed traces across the service graph. Define SLOs with multi-window burn-rate alerts on customer-facing services; alert on symptoms (SLO burn), not causes (CPU). Config Sync exposes its own sync status, so “is prod what the repo says” is a dashboard, not a guess.
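For the app-specific metrics, Managed Service for Prometheus is configured per workload with a PodMonitoring object. The sketch below assumes pods labelled app: checkout that expose Prometheus metrics on a container port named metrics; both names are hypothetical.

```yaml
# Label and port name are assumptions about the workload.
apiVersion: monitoring.googleapis.com/v1
kind: PodMonitoring
metadata:
  name: checkout
  namespace: checkout
spec:
  selector:
    matchLabels:
      app: checkout
  endpoints:
    - port: metrics
      interval: 30s
```

Whether the managed collector can reach that port under STRICT mTLS depends on how the sidecar is configured for the metrics port, so that would need checking in a real mesh.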
Governance. Git is the audit log. Every production change — a new image digest, a policy, a quota, a traffic weight — is a reviewed, attributable commit, and Config Sync guarantees the cluster matches it (drift is auto-reverted). Binary Authorization’s enforcement decisions are written to Cloud Audit Logs, giving an auditor a literal record of every admission. Policy Controller makes organisational standards machine-enforced rather than wiki-documented. Together this is what turns “we use Kubernetes” into “here is the cryptographic and Git evidence that only reviewed, signed, scanned code runs, with least-privilege identity, in a configuration that matches the repo.”
Reference enterprise example
Meridian Pay is a (fictional) B2B payments platform: 90 engineers across nine product teams, processing card and ACH transactions for mid-market merchants. They began on a single Heroku-style monolith, hit deployment contention (every team blocked on one release train) and a PCI audit that the monolith’s flat access model could not satisfy. They adopted this architecture over a quarter.
What they built. Two regional GKE Autopilot clusters — us-central1 (primary) and us-east4 (DR) — in a prod project, registered to one fleet, plus a smaller staging cluster. Managed ASM across all three. Twenty-three services at launch (gateway-api, merchant, ledger, card-auth, ach, risk, notifications, reporting, and so on), each in its own namespace owned by the responsible team via a RepoSync. Artifact Registry in us-central1 with on-push scanning and CMEK; Cloud Build pipelines that scan, test, push, and attest with a Cloud KMS key no human can use. Binary Authorization in ENFORCED_BLOCK_AND_AUDIT_LOG. Cloud Spanner (multi-region nam3) for the ledger — chosen specifically for RPO = 0 on money — and Cloud SQL for less critical service databases. Workload Identity for all twenty-three services; the iam.disableServiceAccountKeyCreation org policy on from day one.
Decisions and numbers. Their steady-state pod requests summed to roughly 180 vCPU and 360 GB across prod — about 40% less than the node capacity their old self-managed cluster ran, once VPA recommendations stripped years of accreted over-requesting (the monolith era had taught everyone to “ask for plenty”). Spot Pods ran the reporting and reconciliation batch tier at a steep discount. The mesh’s default-deny authz turned the PCI “segmentation” requirement into a reviewable set of AuthorizationPolicy files: the auditor could read, in Git, that nothing but card-auth could call the ledger write path, and that notifications had zero access to cardholder data services. The single most valuable artifact in the audit was Binary Authorization’s audit log — direct evidence that every running digest was signed by the post-scan attestor.
The incident that proved it. Four months in, a us-central1 zone had a networking degradation. The regional cluster shed the affected zone’s pods and rescheduled them in the other two zones automatically; the global LB drained the unhealthy NEGs. Customer-facing impact was a brief latency blip, no outage, no data loss — and no human ran a command. Separately, a contractor once tried to hot-patch a fix by pushing an unsigned image straight to the registry and applying it; the cluster refused admission, Binary Authorization logged the block, and the fix went through the pipeline an hour later as it should have. The platform team stayed at four people while the service count grew past forty in the following year, because adding a service was a pull request, not a project.
Outcome. Deployment contention vanished — teams shipped on their own cadence behind canary weights they controlled via Git. The PCI audit passed on the strength of machine-enforced segmentation and supply-chain evidence rather than narrative. And the org spent its SREs on product reliability and SLOs, not on patching nodes and chasing autoscaler bugs.
When to use it
Use this architecture when you have genuine microservices (multiple teams shipping independently), you want Kubernetes’ API and ecosystem without operating nodes, and you have real security/compliance pressure (payments, health, regulated data) that demands provable east-west isolation and a trusted supply chain. It shines precisely when the number of services and teams is the thing growing — the mesh and GitOps make that growth cheap, where ad-hoc Kubernetes makes it exponentially expensive.
Trade-offs to accept. There is real conceptual weight: a team must understand the mesh (mTLS, authz, VirtualService), GitOps reconciliation, Workload Identity, and Binary Authorization. Sidecars add a small per-request latency and per-pod memory cost (the managed data plane and ambient-style options narrow this, but it is non-zero). Autopilot trades flexibility for managed-ness — you cannot SSH to a node, run privileged DaemonSets freely, or pick arbitrary node shapes; workloads needing GPUs with exotic drivers, host-level access, or very large single pods may chafe. And the per-pod billing punishes sloppy requests, which is good discipline but a behaviour change.
Anti-patterns. Turning on mTLS but not authorization — encryption without a default-deny policy is not Zero Trust. Signing images before the scan/test gate — an attestation that proves nothing. Running Config Sync but still hand-applying kubectl to prod — you have reintroduced drift and destroyed the audit story. One cluster per team — multiplies cost and ops for no isolation benefit Autopilot doesn’t already give you via namespaces and quotas. A shared, over-privileged node service account instead of per-service Workload Identity — collapses your blast radius back to “any pod can do anything.”
Alternatives, honestly. If you have only a handful of stateless HTTP services and no mesh requirements, Cloud Run (see the companion GCP Global Web Application reference) is simpler and cheaper — no cluster to reason about at all; reach for GKE only when you need sidecars, gRPC streaming, stateful workloads, DaemonSets, or fine-grained mesh policy. If you want the same patterns across clouds and on-prem, the Anthos / GKE Enterprise fleet model extends this exact design to attached and bare-metal clusters. If you are firmly on GKE Standard for cost or node-control reasons, the mesh/registry/identity/GitOps layers of this article still apply unchanged — only the node-management story differs, and you take back the cluster-ops tax Autopilot was removing. The decision is rarely “Kubernetes vs not”; it is “how much of the platform do I want Google to run,” and this architecture is the answer for teams who want to run services, not machines.