Most “we run microservices on Kubernetes” stories are really “we run a distributed monolith on a cluster nobody is sure how to upgrade.” The pods are there, the YAML is there, but the cluster has one giant flat namespace, secrets are baked into images, traffic between services is plaintext, deploys happen by kubectl apply from a laptop, and the security team has quietly given up on auditing any of it. This reference architecture is about the other thing: a production-grade Azure Kubernetes Service (AKS) platform where dozens of microservices owned by many teams share a cluster safely — every pod has its own least-privilege Azure identity with no stored secrets, every service-to-service hop is mutually authenticated and encrypted, every image is signed and scanned before it can run, and every change to what’s deployed arrives through Git, not a human with cluster-admin. It scales down to a ten-service startup on one cluster and up to a regulated enterprise running hundreds of services across a fleet — the same component set serves both; what changes is the number of node pools and the strictness of the policy, not the diagram.
This article follows the format of the major architecture centers: the scenario, the end-to-end request and delivery flow, a component-by-component breakdown, concrete implementation and IaC wiring, the enterprise concerns (security, cost, reliability, observability, governance), a named worked example with real numbers, and an honest section on when not to build this.
The business scenario
Picture an engineering organization that has outgrown a single deployable. It might be a Series-B SaaS company with four squads, or a bank’s digital channel with forty. The shape of the pain is the same at both ends:
- Teams are blocked on each other’s release trains. A monolith means one deploy pipeline, one rollback blast radius, and a change-freeze that stops everyone. The business wants squads to ship independently, multiple times a day, without a cross-team change-approval meeting.
- The cluster has become a shared-fate liability. Everything runs in `default`. One team’s runaway pod starves another team’s service; one compromised container can read every secret in the namespace; nobody can prove who can talk to what. The security and platform teams cannot sign off on putting customer data behind it.
- Secrets and identity are a mess. Connection strings live in environment variables and `Secret` objects; service principals with client secrets are shared across services and rotated “annually, in theory.” An auditor asking “which workload accessed this database, and with what credential?” gets a shrug.
- Delivery is artisanal and unauditable. Deploys are `kubectl` or `helm install` commands run by whoever is on call, against whatever image they built locally. There is no single source of truth for what should be running, so drift between environments is constant and the question “is prod actually what’s in Git?” has no answer.
- Upgrades are terrifying. The cluster is a pet. Nobody wants to touch the Kubernetes version because the last upgrade broke ingress, so the fleet sits two versions behind and out of support.
The problem this architecture solves is precise: let many teams run many services on shared AKS infrastructure with hard security and tenancy boundaries, zero stored secrets, encrypted and authorized service-to-service traffic, a trusted software supply chain, and a fully auditable Git-driven delivery path, while keeping the cluster disposable and continuously upgradable (cattle, not a pet). The non-goals matter too: this is not “Kubernetes for a single app” (that is over-engineering; use Azure Container Apps or App Service), and it is not a multi-cluster service-mesh federation (a different, heavier article). It is the smallest coherent platform that makes a shared AKS cluster safe for real multi-team production.
Architecture overview
The organizing idea is a paved-road platform: the platform team owns one (or a small fleet of) hardened, private AKS clusters and a set of golden guardrails; application teams own namespaces and the Git repos that describe what runs in them. Three planes are kept deliberately separate — the traffic plane (how requests get in and move between services), the identity plane (how workloads prove who they are without secrets), and the delivery plane (how desired state becomes running state). Get those three right and the cluster becomes boring in the best way.
The request path, end to end, for external traffic hitting a service:
- A client resolves the application hostname to Azure Front Door (anycast edge, TLS termination, WAF with OWASP managed rules, bot and rate-limit policies, edge caching for static assets). Dynamic requests are forwarded to the cluster’s regional origin over a Private Link connection (a Front Door Premium feature), so the cluster is never directly reachable from the internet.
- The request lands on the cluster’s ingress: either Application Gateway for Containers (AGC, driven by its ALB Controller) or an Istio ingress gateway behind an internal Azure Load Balancer. Ingress is implemented through the Kubernetes Gateway API (`Gateway` + `HTTPRoute`), not legacy `Ingress`, so routing is portable and expressive.
- Inside the cluster the request enters the service mesh. There are two production-supported choices on AKS: Istio (the AKS-managed Istio add-on, increasingly run in ambient, sidecar-less mode) for rich L7 traffic management, or Cilium (Azure CNI Powered by Cilium) with Cilium service mesh and Hubble for an eBPF-based data plane with identity-aware L3/L4 network policy at low overhead. Either way, the first hop into a workload is mutually authenticated and encrypted: the calling identity is cryptographically verified, not trusted by IP.
- The target microservice pod runs in its team namespace, scheduled onto a user node pool appropriate to its workload class (general, memory-optimized, spot, or GPU). Its Kubernetes ServiceAccount is federated to an Azure managed identity via Workload Identity Federation, so when it needs to call Azure it exchanges its projected service-account token (signed by the cluster’s OIDC issuer) for a Microsoft Entra access token, with no client secret anywhere.
- The service reads its configuration and secrets from Azure Key Vault through the Secrets Store CSI Driver (mounted as a tmpfs volume and optionally synced to a Kubernetes `Secret`), authenticating with that same workload identity over a private endpoint. It calls Azure data services (Azure SQL, Cosmos DB, Service Bus, Storage) using the workload identity’s Microsoft Entra token, so the database sees a named, least-privilege principal, not a shared connection string.
- Service-to-service calls (service A → service B) stay inside the mesh: mTLS encrypts the hop, an `AuthorizationPolicy` (Istio) or `CiliumNetworkPolicy` decides whether A is allowed to call B at all, and the mesh records the call. East-west traffic that is not explicitly allowed is denied by default.
- Telemetry flows out continuously: Azure Monitor managed Prometheus scrapes metrics, Container Insights / Azure Monitor collects logs and the control-plane audit log, managed Grafana dashboards it, and the mesh emits distributed traces and a live service-dependency map.
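The ingress hop in this path can be sketched as two Gateway API objects: a shared `Gateway` owned by the platform team and an `HTTPRoute` owned by the service team in its own namespace. This is a minimal sketch assuming an Istio-based Gateway API implementation (`gatewayClassName: istio`; AGC uses its own class name) that copies `spec.infrastructure.annotations` onto the Service it creates. The `gateway-infra` namespace, the `gateway-access` label, the hostname and the backend port are all hypothetical.

```yaml
# Shared entry point, owned by the platform team (all names hypothetical).
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: public-gw
  namespace: gateway-infra
spec:
  gatewayClassName: istio              # assumption: Istio implementation
  infrastructure:
    annotations:
      # Ask the Azure cloud provider for an internal, not public, load balancer.
      service.beta.kubernetes.io/azure-load-balancer-internal: "true"
  listeners:
  - name: http
    protocol: HTTP                     # plain HTTP for brevity; add HTTPS to re-encrypt from Front Door
    port: 80
    allowedRoutes:
      namespaces:
        from: Selector
        selector:
          matchLabels:
            gateway-access: "true"     # hypothetical opt-in label on team namespaces
---
# Route owned by the orders team, attached to the shared Gateway.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: orders
  namespace: orders
spec:
  parentRefs:
  - name: public-gw
    namespace: gateway-infra
  hostnames:
  - "orders.example.com"
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /orders
    backendRefs:
    - name: orders                     # Kubernetes Service in the orders namespace
      port: 8080
```

Splitting ownership this way lets a team change its own routes without touching the listener, while the `allowedRoutes` selector keeps unapproved namespaces from attaching to the shared entry point.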
The delivery path, end to end, for getting a change into the cluster:
- A developer merges to the app repo. Azure Pipelines or GitHub Actions builds the container, runs tests, and pushes the image to Azure Container Registry (ACR). The image is scanned (Microsoft Defender for Containers in the registry, or Trivy in the pipeline) and signed by the pipeline (Notation or Cosign).
- The pipeline does not deploy. Instead it opens a pull request against a GitOps config repo that bumps the image digest in the service’s manifests/Helm values.
- A GitOps controller running in the cluster, Argo CD or Flux (available as the AKS GitOps cluster extension, `microsoft.flux`), continuously reconciles cluster state to that repo. When the PR merges, the controller pulls the change and applies it. Git is the single source of truth; the cluster converges to it; drift is detected and (optionally) auto-corrected.
- Before anything runs, an admission policy engine, Azure Policy for AKS (Gatekeeper) or Kyverno, validates the manifests: only images from the trusted ACR with a valid signature are admitted, every pod must run as non-root and set resource limits, host networking is forbidden, and so on. A non-conforming deploy is rejected at the API server, before scheduling.
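The pull model in the last two steps can be sketched with two Flux objects: a `GitRepository` that polls the config repo and a `Kustomization` that applies one path from it and prunes whatever was removed from Git. The repository URL, path and intervals are hypothetical, a private repo would also need a `secretRef`, and on the `microsoft.flux` extension these objects are normally generated from a Flux configuration rather than written by hand.

```yaml
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: config-repo
  namespace: flux-system
spec:
  interval: 1m                         # how often to poll Git for new commits
  url: https://github.com/example-org/k8s-config   # hypothetical config repo
  ref:
    branch: main
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: orders
  namespace: flux-system
spec:
  interval: 10m                        # periodic re-apply, which also corrects drift
  sourceRef:
    kind: GitRepository
    name: config-repo
  path: ./clusters/prod/orders         # hypothetical path inside the config repo
  prune: true                          # delete cluster objects that were removed from Git
  targetNamespace: orders
  wait: true                           # report Ready only once the applied workloads are healthy
```

Note that nothing here needs inbound access to the cluster: the controller reaches out to Git, which is what lets the API server stay private.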
The mental model: Front Door and the Gateway get traffic in; the mesh moves and authorizes it with mTLS; workload identity removes every secret from the picture; ACR + signing + admission control guarantees only trusted code runs; and GitOps makes the whole thing converge to Git, auditable commit by commit.
Component breakdown
| Component | Azure service / project | What it does | Key configuration choices |
|---|---|---|---|
| Cluster | Azure Kubernetes Service (AKS) | Managed Kubernetes control plane + nodes | Private cluster (no public API endpoint) or API-server VNet integration with authorized IP ranges; Azure CNI Overlay (or CNI powered by Cilium) for IP efficiency; auto-upgrade channel = stable with planned maintenance windows; Uptime SLA / Standard tier for the control plane; system node pool tainted CriticalAddonsOnly |
| Node pools | AKS user node pools + VMSS | Run workloads, segmented by class | Separate pools for general / memory / spot / GPU; cluster autoscaler per pool + KEDA for event-driven scale; ephemeral OS disks; Azure Linux (Mariner) nodes; taints/labels so workloads land on the right pool |
| Ingress / Gateway | App Gateway for Containers (AGC) or Istio ingress gateway | North-south entry, L7 routing, TLS | Gateway API (Gateway/HTTPRoute) over legacy Ingress; WAF at Front Door (and optionally AGC); internal LB so origin is private behind Front Door Private Link |
| Service mesh | AKS-managed Istio add-on or Cilium (Azure CNI Powered by Cilium) | mTLS, L7 traffic mgmt, authz, observability | Istio: prefer ambient mode (ztunnel + waypoints) to drop per-pod sidecar cost; PeerAuthentication: STRICT mTLS; AuthorizationPolicy default-deny. Cilium: eBPF dataplane, CiliumNetworkPolicy (identity-aware), Hubble for flow visibility; mutual auth via SPIFFE |
| Registry | Azure Container Registry (Premium) | Stores & secures images and Helm/OCI charts | Premium for private endpoints, geo-replication, content trust / Notation signing; quarantine pattern: image scanned before promotion; ACR Tasks for base-image patching; pull via workload/kubelet managed identity, not admin user (which is disabled) |
| Secrets | Azure Key Vault + Secrets Store CSI Driver | Source of truth for secrets/certs | Key Vault with RBAC authorization + private endpoint; CSI SecretProviderClass mounts secrets as files; rotation enabled; prefer passwordless (Entra tokens) over storing connection strings at all |
| Workload identity | Microsoft Entra Workload Identity Federation | Pods get Azure tokens with no secret | OIDC issuer enabled on AKS; federated credential binds ServiceAccount ↔ User-Assigned Managed Identity; annotate SA + label pod azure.workload.identity/use: "true"; one identity per service, scoped RBAC |
| Delivery (GitOps) | Argo CD or Flux (microsoft.flux AKS cluster extension) | Reconciles cluster to a Git repo | App-of-apps / Flux Kustomizations; digest-pinned images (no :latest); progressive delivery via Argo Rollouts / Flagger (canary, blue-green); separate app repo vs config repo |
| Admission / policy | Azure Policy for AKS (Gatekeeper) or Kyverno | Enforces guardrails at the API server | Image-source + signature verification, non-root, read-only rootfs, required limits/requests, no host network, allowed registries; built-in Azure Policy initiative for AKS baseline |
| Edge | Azure Front Door (Premium) + WAF | Global entry, TLS, WAF, caching | OWASP managed ruleset, bot manager, rate limiting; Private Link origin (a Premium-only feature) to a Private Link Service on the internal LB |
| Observability | Azure Monitor managed Prometheus, Container Insights, managed Grafana, Application Insights | Metrics, logs, traces, dashboards, audit | Managed Prometheus scrape configs; control-plane diagnostic logs (kube-audit) to Log Analytics; mesh traces + service map; Grafana dashboards as code |
A few choices deserve the “why,” because they are where teams most often go wrong.
Why a private cluster. A public AKS API endpoint is a standing internet-facing attack surface for the most powerful credential in your platform. A private cluster (or API-server VNet integration) means the control plane is reachable only from your network; CI reaches it through a self-hosted agent in the VNet or via the GitOps pull model (which needs no inbound access to the cluster at all — the controller reaches out to Git). Pair it with local accounts disabled and Entra + Azure RBAC for Kubernetes authorization, so cluster access is governed by Entra groups and Conditional Access, and cluster-admin is a break-glass PIM-elevated role, not a kubeconfig on a laptop.
Why ambient-mode Istio (or Cilium) instead of classic sidecars. The sidecar model puts an Envoy proxy in every pod: a real CPU/memory tax per replica, plus a coupling between app and proxy lifecycles that complicates upgrades. Istio ambient mode moves mTLS and L4 handling to a per-node ztunnel and deploys an L7 waypoint proxy only where you actually need L7 policy, so most pods carry no proxy at all. Cilium takes a different route: identity-aware policy is enforced by eBPF programs in the kernel, with no userspace proxy on the hot path for L3/L4 traffic (Cilium still routes traffic through Envoy when you ask for L7 policy). Choose Istio when you need rich L7 traffic shaping (header-based canaries, retries/timeouts, fault injection) mesh-wide; choose Cilium when you want high-throughput identity-aware L3/L4 security with minimal overhead and great flow visibility via Hubble. Both give you the non-negotiable: encrypted, authenticated, default-deny east-west traffic.
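For the Istio option, the non-negotiable can be sketched as three objects: mesh-wide STRICT mTLS, an allow-nothing policy in a team namespace, and one explicit allow. This assumes the AKS-managed add-on's root namespace is `aks-istio-system` and the default `cluster.local` trust domain; the `checkout` caller, its ServiceAccount and the `app: orders` label are hypothetical. The allow rule matches only on caller identity, which both sidecars and the ambient ztunnel can enforce without an L7 waypoint.

```yaml
# Mesh-wide: workloads accept only mutually authenticated TLS, never plaintext.
apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: default
  namespace: aks-istio-system          # assumption: root namespace of the AKS add-on
spec:
  mtls:
    mode: STRICT
---
# Namespace-wide default deny: an ALLOW policy with no rules matches nothing,
# so any request not matched by another ALLOW policy is rejected.
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: allow-nothing
  namespace: orders
spec: {}
---
# Explicit allow: only the checkout service's mTLS identity may call orders.
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: allow-checkout
  namespace: orders
spec:
  selector:
    matchLabels:
      app: orders
  action: ALLOW
  rules:
  - from:
    - source:
        principals:
        - cluster.local/ns/checkout/sa/checkout-sa   # SPIFFE-style identity of the caller
```

Every new caller then needs its own explicit allow entry, which is the point: the permitted call graph lives in Git and is reviewed like any other change.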
Why workload identity federation, emphatically. This is the single highest-leverage security decision in the architecture. The legacy alternatives fall short: a service principal with a client secret in a `Secret` leaves a long-lived credential on the cluster that can be exfiltrated, and the now-deprecated AAD Pod Identity assigned identities to the node VMs and brokered tokens by intercepting IMDS traffic, which made the node, not the pod, the real trust boundary. Federation means the pod presents its projected Kubernetes service-account token (a short-lived OIDC JWT) to Entra and receives a short-lived Azure access token in exchange. There is no secret to steal, rotate, or leak. One managed identity per service, each with exactly the Azure RBAC it needs (this service reads this Key Vault and writes that Storage container, nothing else), gives you per-workload least privilege and a clean audit trail.
Implementation guidance
Provision in layers, each with its own IaC stack and state, so the platform team owns the cluster and app teams own their namespaces. Terraform is the common choice on Azure; Bicep is equally valid and avoids state management. The layering matters more than the tool.
- Layer 0 — Landing zone (platform team). Resource groups, the hub-spoke VNet, private DNS zones (`privatelink.vaultcore.azure.net`, `privatelink.azurecr.io`, `privatelink.<region>.azmk8s.io`), and policy assignments. This is your existing Azure landing zone; the cluster is a spoke.
- Layer 1 — Platform (platform team). ACR (Premium, private endpoint), Key Vault, Log Analytics + managed Prometheus + managed Grafana, the AKS cluster itself with the add-ons enabled, and the system identities.
- Layer 2 — Cluster bootstrap (platform team, but via GitOps). The mesh, the GitOps controller, the policy engine, ingress, and shared CRDs. Bootstrap GitOps once with IaC, then let GitOps manage everything else — including itself.
- Layer 3 — Workloads (app teams). Namespaces, manifests/Helm charts in the config repo, reconciled by the GitOps controller.
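Layer 3 begins with the namespace itself, and the scenario’s “one team’s runaway pod starves another team’s service” problem is largely a missing quota. A minimal per-team sketch the platform team might stamp out; the `orders` name and every figure are hypothetical and should be derived from real usage.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: orders
  labels:
    team: orders                       # hypothetical label used for showback and policy targeting
    # Enroll in the mesh here; the exact label depends on sidecar vs ambient mode.
---
# Hard ceiling for the whole namespace, so one team cannot exhaust shared nodes.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: orders-quota
  namespace: orders
spec:
  hard:
    requests.cpu: "20"
    requests.memory: 40Gi
    limits.cpu: "40"
    limits.memory: 80Gi
    pods: "100"
---
# With a CPU/memory quota in place the API server rejects pods that omit
# requests/limits, so this LimitRange fills in defaults for such containers.
apiVersion: v1
kind: LimitRange
metadata:
  name: orders-defaults
  namespace: orders
spec:
  limits:
  - type: Container
    defaultRequest:
      cpu: 100m
      memory: 128Mi
    default:
      cpu: 500m
      memory: 512Mi
```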
A representative Terraform skeleton for Layer 1. Note the settings that turn a bare cluster into this architecture: OIDC issuer + workload identity, the managed Istio mesh, monitoring, and the Key Vault secrets provider:
```hcl
# azurerm provider v4.x attribute names
resource "azurerm_kubernetes_cluster" "prod" {
  name                = "aks-prod-eus2"
  resource_group_name = azurerm_resource_group.platform.name
  location            = "eastus2"
  dns_prefix          = "aks-prod"
  kubernetes_version  = "1.31"     # track n-1 of latest stable
  sku_tier            = "Standard" # control-plane Uptime SLA

  # Keep the cluster current: stable channel for Kubernetes, node-image channel for the OS
  automatic_upgrade_channel = "stable"
  node_os_upgrade_channel   = "NodeImage"

  oidc_issuer_enabled       = true # required for workload identity
  workload_identity_enabled = true
  azure_policy_enabled      = true # Gatekeeper guardrails
  local_account_disabled    = true # Entra-only access

  # Private cluster: no public API server
  private_cluster_enabled = true

  default_node_pool {
    name                         = "system"
    vm_size                      = "Standard_D4ds_v5"
    auto_scaling_enabled         = true
    min_count                    = 3
    max_count                    = 5
    only_critical_addons_enabled = true # taint: CriticalAddonsOnly
    zones                        = ["1", "2", "3"]
    os_sku                       = "AzureLinux"
  }

  network_profile {
    network_plugin      = "azure"
    network_plugin_mode = "overlay" # CNI Overlay for IP efficiency
    network_data_plane  = "cilium"  # Azure CNI Powered by Cilium (eBPF dataplane)
    network_policy      = "cilium"  # must be "cilium" with the Cilium dataplane; mesh handles mTLS
    load_balancer_sku   = "standard"
  }

  # AKS-managed Istio service mesh add-on
  service_mesh_profile {
    mode      = "Istio"
    revisions = ["asm-1-23"]
  }

  key_vault_secrets_provider {
    secret_rotation_enabled = true
  }

  oms_agent {
    log_analytics_workspace_id      = azurerm_log_analytics_workspace.platform.id
    msi_auth_for_monitoring_enabled = true
  }

  azure_active_directory_role_based_access_control {
    azure_rbac_enabled = true # Azure RBAC for Kubernetes
    tenant_id          = data.azurerm_client_config.current.tenant_id
  }

  identity {
    type         = "UserAssigned"
    identity_ids = [azurerm_user_assigned_identity.cluster.id]
  }

  lifecycle {
    # The stable channel upgrades the cluster outside Terraform; don't revert it on the next apply
    ignore_changes = [kubernetes_version]
  }
}

# Spot + GPU + general user pools added as azurerm_kubernetes_cluster_node_pool ...
```
Wire the workload identity for one service — this is the pattern every microservice repeats. Create a user-assigned identity, grant it only the Azure RBAC it needs, federate it to the service’s Kubernetes ServiceAccount, then annotate the ServiceAccount:
```hcl
resource "azurerm_user_assigned_identity" "orders" {
  name                = "id-orders-svc"
  resource_group_name = azurerm_resource_group.platform.name
  location            = "eastus2"
}

# Least-privilege Azure RBAC: this service reads ONLY its own KV secrets
resource "azurerm_role_assignment" "orders_kv" {
  scope                = azurerm_key_vault.orders.id
  role_definition_name = "Key Vault Secrets User"
  principal_id         = azurerm_user_assigned_identity.orders.principal_id
}

# Federate the K8s ServiceAccount -> the managed identity (no secret)
resource "azurerm_federated_identity_credential" "orders" {
  name                = "orders-fed"
  resource_group_name = azurerm_resource_group.platform.name
  parent_id           = azurerm_user_assigned_identity.orders.id
  issuer              = azurerm_kubernetes_cluster.prod.oidc_issuer_url
  subject             = "system:serviceaccount:orders:orders-sa"
  audience            = ["api://AzureADTokenExchange"]
}
```
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: orders-sa
  namespace: orders
  annotations:
    azure.workload.identity/client-id: "<id-orders-svc client id>"
---
# In the Deployment pod template:
#   labels: { azure.workload.identity/use: "true" }
#   serviceAccountName: orders-sa
# The Azure SDK in the pod now gets tokens via the projected SA token; no secret.
```
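Filling in the pod-template comments above, a minimal sketch of the `orders` Deployment using the federated identity both for Azure SDK calls and for a Key Vault CSI mount. The vault name `kv-orders`, the secret `payment-gateway-key`, the image reference and the ID placeholders are hypothetical, and the image is assumed to run as a non-root UID. The security context mirrors the non-root, read-only-rootfs, drop-all-capabilities guardrails the admission policy is meant to enforce.

```yaml
# Which Key Vault objects to mount, fetched with the pod's workload identity.
apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
  name: orders-kv
  namespace: orders
spec:
  provider: azure
  parameters:
    usePodIdentity: "false"
    clientID: "<id-orders-svc client id>"   # identity to federate with; not a secret
    keyvaultName: kv-orders                 # hypothetical vault
    tenantId: "<tenant id>"
    objects: |
      array:
        - |
          objectName: payment-gateway-key
          objectType: secret
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: orders
  namespace: orders
spec:
  replicas: 3
  selector:
    matchLabels:
      app: orders
  template:
    metadata:
      labels:
        app: orders
        azure.workload.identity/use: "true"  # webhook injects the projected token + env vars
    spec:
      serviceAccountName: orders-sa          # federated to id-orders-svc
      securityContext:
        runAsNonRoot: true                   # assumes the image sets a non-root UID
        seccompProfile:
          type: RuntimeDefault
      containers:
      - name: orders
        image: contoso.azurecr.io/orders@sha256:<digest>   # hypothetical, digest-pinned
        resources:
          requests: { cpu: 250m, memory: 256Mi }
          limits: { cpu: "1", memory: 512Mi }
        securityContext:
          allowPrivilegeEscalation: false
          readOnlyRootFilesystem: true
          capabilities:
            drop: ["ALL"]
        volumeMounts:
        - name: kv-secrets
          mountPath: /mnt/secrets            # each Key Vault object appears as a file
          readOnly: true
      volumes:
      - name: kv-secrets
        csi:
          driver: secrets-store.csi.k8s.io
          readOnly: true
          volumeAttributes:
            secretProviderClass: orders-kv
```

With this in place the container reads the secret from the file `/mnt/secrets/payment-gateway-key`, and the workload identity webhook injects `AZURE_CLIENT_ID`, `AZURE_TENANT_ID` and `AZURE_FEDERATED_TOKEN_FILE`, which the Azure SDK's default credential chain picks up for passwordless calls to SQL, Service Bus or Storage.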
Networking and identity wiring, the load-bearing rules:
- Cluster networking: Azure CNI Overlay gives pods IPs that are routable within the cluster without burning your VNet space; pod CIDRs are NAT’d, so a `/24` of node IPs supports thousands of pods. Use Cilium as the dataplane if you want network policy enforced in eBPF.
- Private endpoints everywhere: ACR, Key Vault, SQL, Storage, and the cluster API all have private endpoints in the spoke VNet, with the corresponding private DNS zones linked. No platform dependency is reachable over the public internet.
- Ingress is private: the cluster’s ingress LB is internal; the only public surface is Front Door, which reaches that internal LB through a Private Link Service attached to it (the Private Link origin). The cluster has no public inbound IP.
- Egress is controlled: route node egress through Azure Firewall and allow-list the FQDNs AKS needs (`*.azurecr.io`, `*.hcp.<region>.azmk8s.io`, `mcr.microsoft.com`, `login.microsoftonline.com`, and the Ubuntu/Azure Linux package mirrors; the firewall’s `AzureKubernetesService` FQDN tag covers the AKS-required set). A NAT gateway gives stable egress IPs but cannot filter by FQDN. This is also where you contain a compromised pod’s ability to call out.
- Authorization is layered and default-deny: (1) Azure RBAC decides who (which Entra user, group, or identity) can do what on the cluster API; (2) Kubernetes RBAC scopes app teams to their namespaces; (3) the mesh `AuthorizationPolicy` / `CiliumNetworkPolicy` decides which service may call which; (4) admission policy decides what may run at all. Four gates, each closed by default.
Progressive delivery: wire Argo Rollouts or Flagger so a new digest rolls out as a canary — 5% of traffic, watch the mesh’s success-rate and latency metrics from managed Prometheus, auto-promote if healthy, auto-rollback if not. The mesh provides the traffic-splitting primitive; the rollout controller provides the analysis and the abort.
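A minimal Flagger `Canary` for the 5% canary described above, assuming Flagger runs with its Istio provider and that its metrics server points at a Prometheus-compatible endpoint it can query (Azure Monitor managed Prometheus requires Entra-authenticated queries, so this may need an authenticating proxy). The `orders` Deployment, port and thresholds are hypothetical; `request-success-rate` and `request-duration` are Flagger's built-in checks.

```yaml
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: orders
  namespace: orders
spec:
  provider: istio                      # assumption: mesh provider used for traffic splitting
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: orders                       # Flagger creates orders-primary and orders-canary from this
  service:
    port: 8080
  analysis:
    interval: 1m                       # run the checks every minute
    threshold: 5                       # roll back after 5 failed checks
    stepWeight: 5                      # start at 5% and add 5% per healthy interval
    maxWeight: 50                      # promote once 50% of traffic has looked healthy
    metrics:
    - name: request-success-rate       # built-in: % of non-5xx responses
      thresholdRange:
        min: 99
      interval: 1m
    - name: request-duration           # built-in: P99 latency in milliseconds
      thresholdRange:
        max: 500
      interval: 1m
```

The GitOps PR only bumps the image digest on the `orders` Deployment; Flagger notices the change and runs this analysis, so the rollback decision is made by metrics rather than by whoever is on call.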
Enterprise considerations
Security and Zero Trust. This architecture is a Zero Trust implementation for compute, applied at four layers. Identity: every workload has its own short-lived, secret-less Entra identity (workload identity federation) and every Azure data call is a named principal with least-privilege RBAC — there is no shared credential to compromise. Network: default-deny east-west via the mesh/CNI, mTLS on every hop, private endpoints for every dependency, and a private API server — a pod can only reach what policy explicitly allows. Supply chain: images are scanned (Defender for Containers / Trivy) and signed, and admission control refuses to run anything unsigned or from an untrusted registry; this closes the “someone pushed a malicious image” path that most clusters leave wide open. Runtime: Defender for Containers provides runtime threat detection (crypto-miner, reverse-shell, suspicious exec) on the nodes; pods run non-root, read-only-rootfs, with dropped capabilities, enforced by policy. Pull CIS AKS benchmark and Microsoft cloud security baseline assessments into Defender for Cloud and treat the findings as a backlog, not a one-time audit.
Cost optimization (FinOps). A shared cluster is itself the biggest cost lever: bin-packing many services onto common nodes beats a VM-per-service estate. Beyond that: (1) Spot node pools for stateless, interruptible, and batch workloads, often 60–90% cheaper; run enough replicas (or queue-driven workers whose unacknowledged messages are redelivered) that an eviction is absorbed, since PDBs cover voluntary drains, not spot evictions; (2) right-size with the VPA recommender and set requests from real usage, because over-requested pods waste reserved capacity even when idle; (3) cluster autoscaler + scale-to-zero user node pools and KEDA so capacity tracks demand, plus the AKS Stop/Start feature for non-prod overnight; (4) Savings Plans / Reserved Instances for the steady-state baseline node count; (5) ambient-mode mesh to remove the per-pod sidecar tax across hundreds of replicas; (6) OpenCost / Microsoft Cost Management + Kubernetes cost views to show each team a per-namespace bill, which is the single most effective behavior change. Showback by namespace turns “the cluster is expensive” into “your service is expensive,” which is actionable.
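Spot capacity only saves money if the right pods land on it. AKS taints spot node pools with `kubernetes.azure.com/scalesetpriority=spot:NoSchedule` and labels their nodes with the same key, so a workload opts in with a toleration plus node affinity. A minimal pod-spec fragment for a hypothetical interruption-tolerant worker:

```yaml
# Fragment of a Deployment's spec.template.spec (worker is hypothetical).
tolerations:
- key: kubernetes.azure.com/scalesetpriority
  operator: Equal
  value: spot
  effect: NoSchedule                   # matches the taint AKS puts on spot pools
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.azure.com/scalesetpriority
          operator: In
          values: ["spot"]             # schedule only onto spot nodes
```

Required affinity pins the worker to spot; switch to `preferredDuringSchedulingIgnoredDuringExecution` if it should fall back to regular nodes when spot capacity is evicted or unavailable.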
Scalability. Four independent axes: pods scale via HPA (CPU/memory) and KEDA (queue depth, event rate, custom metrics) including scale-to-zero; nodes scale via the cluster autoscaler per pool and the faster Node Autoprovisioning (Karpenter for AKS) where available; the cluster itself has generous limits (thousands of nodes), and the platform scales by adding clusters to a fleet managed by Azure Kubernetes Fleet Manager when one cluster’s blast radius or limits become the constraint. Design services stateless so any of these can scale them freely; push state to Azure data services.
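Queue-driven scaling, including scale-to-zero, can be sketched with a KEDA `ScaledObject` on a Service Bus queue. This assumes KEDA authenticates through workload identity using its own operator identity, which must be federated and granted a Service Bus role that can read queue metrics; the namespace `sb-contoso`, the queue, the target Deployment and the thresholds are hypothetical.

```yaml
# Tell KEDA to authenticate to Azure with workload identity, not a connection string.
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: servicebus-wi
  namespace: orders
spec:
  podIdentity:
    provider: azure-workload
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: orders-worker
  namespace: orders
spec:
  scaleTargetRef:
    name: orders-worker                # hypothetical Deployment
  minReplicaCount: 0                   # scale to zero when the queue is empty
  maxReplicaCount: 30
  triggers:
  - type: azure-servicebus
    metadata:
      namespace: sb-contoso            # Service Bus namespace name (hypothetical)
      queueName: orders
      messageCount: "50"               # target messages per replica
    authenticationRef:
      name: servicebus-wi
```

KEDA handles the zero-to-one step itself and feeds the queue length to an HPA it manages beyond that, aiming for roughly `messageCount` messages per replica.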
Reliability and DR (RTO/RPO). Inside a region: node pools span availability zones, Pod Disruption Budgets keep minimum replicas during upgrades and node churn, topology spread constraints avoid single-node and single-zone concentration, and planned maintenance windows + surge upgrades make Kubernetes-version and node-image upgrades routine and non-disruptive, so the cluster stays current and in support by construction. Region loss: the cluster is stateless and rebuildable, which is the whole point of GitOps. Your RTO is “how fast can IaC stand up a cluster in the paired region and the GitOps controller reconcile every service onto it”: realistically 15–45 minutes for a warm-standby cluster (pre-provisioned, GitOps paused), longer for a cold one. Your RPO is governed entirely by the data tier, not the cluster: it is the replication lag of Cosmos DB (multi-region writes, seconds), Azure SQL failover groups, or geo-replicated Storage, because the cluster holds no durable state to lose. Geo-replicate ACR so the standby region can pull images during a primary outage. The reliability win of this architecture is that DR means pointing a fresh cluster at the same Git repo and letting it reconcile, which you can (and must) rehearse on a schedule.
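The in-region mechanics can be sketched for one service: a `PodDisruptionBudget` that upgrades and drains must respect, and topology spread constraints that place replicas in different zones first and on different nodes second. Names, replica count and image are hypothetical.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: orders
  namespace: orders
spec:
  minAvailable: 2                      # drains and upgrades may never leave fewer than 2 pods
  selector:
    matchLabels:
      app: orders
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: orders
  namespace: orders
spec:
  replicas: 3
  selector:
    matchLabels:
      app: orders
  template:
    metadata:
      labels:
        app: orders
    spec:
      topologySpreadConstraints:
      - maxSkew: 1                     # hard rule: one replica per zone before doubling up
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: orders
      - maxSkew: 1                     # soft rule: prefer distinct nodes within a zone
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: ScheduleAnyway
        labelSelector:
          matchLabels:
            app: orders
      containers:
      - name: orders
        image: contoso.azurecr.io/orders@sha256:<digest>   # hypothetical
```

With three replicas across three zones and `minAvailable: 2`, a surge upgrade can drain one node at a time without the service dropping below two serving replicas.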
Observability. Three signals plus audit. Metrics: Azure Monitor managed Prometheus scrapes app and mesh metrics; managed Grafana dashboards them (golden signals per service, mesh success-rate/latency, node and cost views). Logs: Container Insights collects container stdout/stderr, and an AKS diagnostic setting sends the control-plane kube-audit log to Log Analytics; the audit log is your “who changed what on the cluster” record. Traces: the mesh emits distributed traces (exported to Application Insights, typically through OpenTelemetry), and Hubble, or Kiali for Istio, draws a live service-dependency map so you can see, not guess, who calls whom. Alert on SLOs (error budget burn), not raw CPU. The mesh and GitOps controller both expose health you should alert on: mesh mTLS coverage and GitOps sync/drift status (a service that has drifted from Git, or a failed reconcile, is an incident).
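“Alert on SLOs, not raw CPU” can be sketched as a multiwindow burn-rate rule in standard Prometheus rule-file format. This assumes a 99.9% availability SLO and that L7 request metrics (`istio_requests_total`) are being emitted, which in ambient mode requires a waypoint; on Azure Monitor managed Prometheus the same expressions would go into a Prometheus rule group rather than a file. The 14.4 factor is the common fast-burn threshold from the Google SRE Workbook, and the service name is hypothetical.

```yaml
groups:
- name: orders-slo
  rules:
  - alert: OrdersErrorBudgetFastBurn
    # SLO 99.9% => error budget 0.001. A 14.4x burn sustained for 1h consumes
    # 14.4 * (1h / 720h) = 2% of a 30-day budget. The 5m window makes the
    # alert clear quickly once the errors stop.
    expr: |
      (
        sum(rate(istio_requests_total{destination_service_name="orders", response_code=~"5.."}[1h]))
        /
        sum(rate(istio_requests_total{destination_service_name="orders"}[1h]))
      ) > (14.4 * 0.001)
      and
      (
        sum(rate(istio_requests_total{destination_service_name="orders", response_code=~"5.."}[5m]))
        /
        sum(rate(istio_requests_total{destination_service_name="orders"}[5m]))
      ) > (14.4 * 0.001)
    for: 2m
    labels:
      severity: page
    annotations:
      summary: "orders is burning its 30-day error budget at more than 14.4x"
```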
Governance. Enforce, do not document. Azure Policy for AKS (Gatekeeper) applies the org’s guardrails cluster-wide — allowed registries, required labels/limits, no privileged pods, no host network — and reports compliance into Azure Policy. Kyverno covers mutation and finer policy (auto-inject securityContext, enforce image-digest pinning). Microsoft Entra + Azure RBAC for Kubernetes ties cluster access to Entra groups and PIM so cluster-admin is time-bound and approved. The GitOps repo’s PR history is your change-management record — every production change is a reviewed, attributed, revertable commit, which is exactly what auditors want and exactly what kubectl apply from a laptop never provides.
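The allowed-registries and digest-pinning guardrails can be sketched as a Kyverno `ClusterPolicy`; Kyverno auto-generates the matching rules for Deployments, StatefulSets and other pod controllers. The registry `contoso.azurecr.io` is hypothetical, newer Kyverno releases prefer a per-rule `failureAction` over the spec-level `validationFailureAction` shown here, and signature verification (Kyverno `verifyImages`, or Ratify alongside Gatekeeper) is a separate rule not shown.

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-trusted-digest-images
spec:
  validationFailureAction: Enforce     # reject at admission instead of only auditing
  background: true                     # also report violations in existing resources
  rules:
  - name: trusted-registry-and-digest
    match:
      any:
      - resources:
          kinds: ["Pod"]
    validate:
      message: "Images must come from contoso.azurecr.io and be pinned by digest."
      pattern:
        spec:
          containers:
          - image: "contoso.azurecr.io/*@sha256:*"
          =(initContainers):             # if initContainers exist, they must comply too
          - image: "contoso.azurecr.io/*@sha256:*"
```

A tag such as `:latest` fails the `@sha256:` wildcard, so the same rule closes both the untrusted-registry and the mutable-tag paths.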
Reference enterprise example
Northwind Mobility is a (fictional) mid-market mobility-and-logistics SaaS: a driver app, a shipper portal, and a partner API, serving ~120,000 daily active users across the US, run by six squads. They started as a Django monolith on App Service. By 2025 the monolith’s single release train was the bottleneck — squads waited days for each other’s changes, a payments hotfix required a full-app deploy, and a SOC 2 audit flagged shared service-principal secrets and plaintext internal traffic. They decided to decompose into ~28 services on AKS, deliberately as a platform, not a pile of pods.
What they built. One production AKS cluster in East US 2 (Standard tier, private API server, Azure CNI Overlay with Cilium), with a warm-standby cluster in Central US kept current by the same IaC and a paused GitOps controller. Node pools: a 3-node zonal system pool, a general user pool (D-series, autoscaling 6→30), a spot pool for the trip-pricing batch and notification fan-out (saving ~70% on that bursty compute), and a small GPU pool for their ETA-prediction model. They chose the AKS-managed Istio add-on in ambient mode because the squads wanted header-based canaries and per-route retries, and ambient kept the mesh tax low across ~400 pods. Each of the 28 services got its own user-assigned managed identity federated to its ServiceAccount, so zero client secrets remained anywhere and the SOC 2 finding closed itself. Secrets that genuinely had to exist (a third-party payment-gateway key) came from Key Vault via the CSI driver over a private endpoint; everything else (SQL, Service Bus, Blob) went passwordless via Entra tokens. Delivery moved to Flux (the microsoft.flux cluster extension) with an app repo / config repo split and Flagger canaries; Azure Policy for AKS enforced “only signed images from northwind.azurecr.io, non-root, limits required.” Front Door + WAF fronted the cluster’s internal ingress over a Private Link origin.
The numbers and decisions. Roughly $6,800/month all-in for the production cluster: ~$4,100 compute (heavily offset by spot and a 1-year Savings Plan on the baseline nodes), ~$700 ACR Premium + geo-replication, ~$900 Front Door + WAF, ~$1,100 Azure Monitor/Prometheus/Grafana/Log Analytics ingestion. The warm-standby cluster added ~$1,500 (mostly its idle baseline nodes). They debated classic Istio sidecars vs ambient and chose ambient, saving an estimated ~$900/month in sidecar CPU/memory at their replica count. They debated a cluster-per-squad model and rejected it as six times the platform toil for tenancy they could get with namespaces + mesh policy + Azure RBAC.
The outcome. Deploy frequency went from ~3/week (whole monolith) to 40+/day across squads, each squad shipping independently behind canaries. Mean time to recovery for a bad deploy dropped to under 4 minutes (Flagger auto-rollback on success-rate dip). They ran a region-loss game day: failed Front Door to Central US, un-paused the standby cluster’s Flux controller, and had all 28 services reconciled and serving in 31 minutes (RTO), with RPO of seconds because Cosmos DB multi-region writes and the SQL failover group held the state — the cluster held none. The SOC 2 auditor’s “credential management” and “encryption in transit (internal)” findings were both closed by workload identity and mesh mTLS respectively. The platform team’s recurring nightmare — Kubernetes upgrades — became a scheduled, automated, non-event via the stable auto-upgrade channel and surge upgrades within maintenance windows. Net: independent team velocity and a defensible security posture, on shared infrastructure, for under $8.5k/month.
When to use it
Use this architecture when you have multiple teams shipping multiple services that must release independently, you need a defensible security and tenancy boundary on shared infrastructure (encrypted/authorized east-west traffic, per-workload secret-less identity, a trusted supply chain), and you want auditable, Git-driven delivery with a cluster you can upgrade and rebuild without fear. It scales cleanly from one cluster running a dozen services to a Fleet-managed estate running hundreds; the diagram is the same, only the number of clusters and the strictness of policy change. The prerequisite is operational maturity: a platform team that owns the paved road, and app teams willing to live on it.
Trade-offs to accept going in. Kubernetes, a service mesh, GitOps, workload identity, and policy-as-code are a substantial amount of platform to learn and operate. You are buying enormous flexibility and a strong security posture, and paying for it in platform complexity and a real platform team. If you have one or two services and a single squad, this is over-engineering — the operational surface will cost you more than it returns.
Anti-patterns that quietly defeat the design:
- The shared-cluster-without-boundaries trap. AKS with everything in `default`, no mesh policy, no admission control, and `cluster-admin` kubeconfigs floating around is not this architecture; it is a distributed monolith with worse security than the monolith had. The boundaries are the architecture.
- Secrets in `Secret` objects “for now.” The moment a connection string or client secret lives in a Kubernetes `Secret`, you have re-created the credential-leak path workload identity was meant to remove. Go passwordless; pull the unavoidable exceptions from Key Vault via CSI.
- `kubectl apply` alongside GitOps. A human applying changes out-of-band defeats the single-source-of-truth model, causes drift the controller will fight or revert, and erases the audit trail. Once GitOps owns a namespace, all changes go through Git.
- `:latest` tags and unsigned images. Mutable tags make rollbacks ambiguous and let an attacker swap an image under a tag you already approved. Pin digests; sign; enforce signatures at admission.
- A mesh with permissive mTLS and no AuthorizationPolicy. Installing Istio/Cilium and leaving mTLS in `PERMISSIVE` mode with an allow-all policy gives you the cost of a mesh and none of the security. Go `STRICT` and default-deny, or you have bought a dashboard, not a boundary.
- The pet cluster. Refusing to upgrade because the last upgrade hurt means sitting out of support and accumulating risk. With zones, PDBs, surge upgrades, and the `stable` auto-upgrade channel, upgrades are routine, and thanks to GitOps the worst case is rebuilding from Git.
Alternatives, in increasing capability and operational cost: (1) Azure Container Apps — managed, serverless Kubernetes-without-the-cluster, with built-in Dapr, KEDA, and ingress; the right choice for a small-to-medium set of microservices that do not need full cluster control, custom operators, or a specific mesh. Most teams should start here and graduate only when they hit its ceiling. (2) App Service / Functions — for a handful of web apps and event handlers, no orchestration needed. (3) A single AKS cluster, namespace-per-team (this article) — the default for real multi-team production at moderate scale. (4) AKS Fleet (multi-cluster) — when one cluster’s blast radius, scale limits, or hard regulatory isolation forces a fleet; same components, federated. Pick the lowest tier that meets your team count and isolation requirements; most organizations reach for AKS when Container Apps would have done, and pay for cluster operations they did not need. The platform you can actually operate beats the platform you merely deployed.