Where this fits
Platform Automation & DevOps is the eighth and final design area of the Azure Landing Zone (ALZ) conceptual architecture, and it is the one that makes the other seven durable. Identity, Network Topology, Resource Organization, Security, Management, Governance and the Billing/Entra-tenant decisions all describe a desired state; Platform Automation & DevOps decides how that state is expressed as code, reviewed, tested, deployed, and continuously reconciled so it never drifts. Concretely, this design area answers: in what language is the platform defined (Bicep, Terraform, or both)? Which accelerator bootstraps the management-group tree, policies and platform subscriptions? How does an application team get a governed subscription on demand (subscription vending)? What CI/CD deploys changes to the platform itself, with what gates and what identity? And what operating model does the platform team run so that “the landing zone” is a product with versioned releases rather than a one-off click-ops build that decays the day after go-live? Skip this design area and you have a beautiful architecture that is true exactly once; do it well and every other design area becomes a pull request.

Infrastructure as Code with Bicep and Terraform
What it is
Infrastructure as Code (IaC) is the practice of declaring your Azure resources — management groups, policy assignments, subscriptions, hub networks, Log Analytics workspaces, the lot — in version-controlled, machine-readable definitions that a deployment engine reconciles against the live tenant. For Azure landing zones there are two first-class, Microsoft-supported choices, and the ALZ ships both as maintained reference implementations:
- Bicep — a domain-specific language that transpiles to Azure Resource Manager (ARM) JSON. It is Azure-native, has no external state store (ARM is the state, recorded as deployment objects at the relevant scope), and is deployed by ARM directly. The ALZ ships ALZ-Bicep modules.
- Terraform — HashiCorp’s cloud-agnostic IaC tool driving the AzureRM and AzAPI providers, with an explicit, external state file as the source of truth for what it manages. The ALZ ships the Azure Verified Modules for Platform Landing Zone (ALZ) pattern module, `Azure/avm-ptn-alz`, and the older `Azure/caf-enterprise-scale` (“CAF Enterprise-scale”) module.
Why it matters
The landing zone is governance infrastructure: a Deny policy at the intermediate-root management group affects every subscription in the tenant. You cannot afford for that to be a hand-edited portal setting that one admin remembers. IaC gives you review (PR + approvals), audit (git history of who changed which guardrail and why), repeatability (rebuild the platform in a DR tenant or a second geo), testability (validate and what-if before apply), and drift detection (the live tenant is continuously compared to the declared state). It is also the only sane way to operate at scale: hundreds of policy assignments, dozens of subscriptions and spoke networks are not maintainable by clicking.
Bicep vs Terraform for the platform — the real decision
This is the single most consequential decision in this design area. The honest answer: both are fully supported and both deploy the same ALZ reference architecture; choose on team skills and existing estate, not on a feature scoreboard.
| Dimension | Bicep | Terraform |
|---|---|---|
| Vendor / support | Microsoft-native, ships with Azure CLI; ARM is the engine | HashiCorp + Microsoft-published modules; AzureRM/AzAPI providers |
| State management | No external state — ARM stores deployments at the scope; nothing to secure/lock yourself | Explicit state file you must host (Azure Storage backend) and lock (blob lease) |
| Multi-cloud | Azure only | Cloud-agnostic — preferred if you also run AWS/GCP with one toolchain |
| Day-0 coverage of new Azure features | Often same-day (ARM-native) | Lags until the AzureRM provider adds it; AzAPI provider closes the gap |
| Preview before apply | `az deployment ... what-if` | `terraform plan` (clearer diff, the gold standard) |
| Drift handling | What-if on redeploy; ARM deployments are idempotent | plan shows drift; can import existing resources |
| Modularity | Modules + template specs; registry via ACR or Bicep public registry | Modules + a rich public/private registry; Azure Verified Modules (AVM) |
| ALZ reference impl | ALZ-Bicep | Azure/avm-ptn-alz, caf-enterprise-scale |
| Best fit | Azure-only shop, .NET/Microsoft-aligned team, wants zero state ops | Multi-cloud, existing Terraform skills, wants explicit plan/state |
Pragmatic guidance I give clients: pick one as the platform’s primary language and standardize. Mixing Bicep and Terraform across the same layer multiplies the skills you must hire for and the failure modes you must debug. A common, legitimate split is Terraform for the platform landing zone (because the org already runs Terraform for everything) while application teams use whatever they like inside their own subscription — the guardrails are enforced by Policy, not by mandating the app team’s IaC tool.
How to do it well
- Adopt Azure Verified Modules (AVM). AVM is Microsoft’s program of pre-built, well-architected, tested Bicep and Terraform modules with consistent interfaces, telemetry, and a published contract. Building landing-zone components from AVM modules instead of raw resources gives you battle-tested defaults and a clear upgrade path. The ALZ platform pattern itself is an AVM pattern module (`avm-ptn-alz`).
- Separate config (data) from code (logic). The ALZ accelerators are explicitly designed so your inputs (management-group names, policy parameters, regions, hub address spaces) live in declarative config files (`.tfvars`, Bicep `.bicepparam`, or library JSON/YAML), and the module logic is consumed unchanged from upstream. This lets you `git pull` upstream improvements without re-merging your customizations.
- Pin and version every module. Reference upstream modules by immutable version tag (a Terraform `version = "x.y.z"`, a Bicep registry tag, or a git SHA), never `latest`/`main`. Upgrades are then a deliberate, reviewed PR with a `what-if`/`plan`, not a surprise on the next run.
- Remote, locked, encrypted state (Terraform). Host state in an Azure Storage account with versioning, soft-delete, RBAC + Microsoft Entra auth (not account keys), blob-lease locking, and ideally a customer-managed key. Put the state account in the Management platform subscription, locked with `CanNotDelete`. (Bicep sidesteps all of this: there is no state file to lose.)
- Run `what-if`/`plan` on every PR and require it as a gate. A platform change that can’t show a clean, expected diff doesn’t merge.
- Decompose by lifecycle, not by one giant root module. Keep the management-group + policy layer, the connectivity hub layer, and subscription vending in separate state/deployment units so a hub change can’t accidentally re-plan the whole tenant.
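The pinning rule can be enforced as a CI check. The following is a minimal Python sketch, assuming a hypothetical earlier step has already extracted each module's version constraint into a dict (parsing `.tf` or `.bicep` files is out of scope here). The acceptance rule is also an assumption to tune to your own policy: an exact `x.y.z` version (optionally `v`-prefixed, for git tags) or a full 40-character git SHA.

```python
import re
from typing import Mapping, Optional

# Exact semantic version such as "1.2.3" (a leading "v" is tolerated for git tags).
EXACT_VERSION = re.compile(r"^v?\d+\.\d+\.\d+$")
# Full 40-hex-character git commit SHA, e.g. taken from a "?ref=<sha>" source.
GIT_SHA = re.compile(r"^[0-9a-f]{40}$")


def unpinned_modules(pins: Mapping[str, Optional[str]]) -> list[str]:
    """Return the names of modules whose version is missing, a range, or a branch."""
    offenders = []
    for name, version in sorted(pins.items()):
        value = (version or "").strip()
        if not (EXACT_VERSION.match(value) or GIT_SHA.match(value)):
            offenders.append(name)
    return offenders


# Hypothetical module names and versions, for illustration only.
pins = {
    "alz": "1.2.3",
    "hub": "~> 0.3",     # range constraint: floats to newer minor versions
    "vending": None,     # no version at all
    "naming": "main",    # branch name: moves with every commit
    "custom": "a" * 40,  # exact commit SHA
}
print(unpinned_modules(pins))
```

In a pipeline, a non-empty result would fail the validate stage before any `plan` runs.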
Concrete artifacts, decisions, and tools
- Artifacts: the platform IaC repository (modules + environment config); a Terraform backend definition (storage account, container, key) or the Bicep deployment-stack/scope strategy; a module version-pinning manifest; a `what-if`/`plan` artifact attached to every PR; a module-source decision record (AVM vs custom).
- Decisions: Bicep vs Terraform as the primary platform language; AVM adoption; state hosting and locking (if Terraform); module versioning policy; how config is separated from code; what is deployed at MG scope vs subscription scope vs resource-group scope.
- Tools/services: Bicep + Azure Resource Manager (with `what-if`, deployment stacks, template specs); Terraform with the AzureRM and AzAPI providers and an Azure Storage backend; Azure Verified Modules; ALZ-Bicep and `Azure/avm-ptn-alz`/`Azure/caf-enterprise-scale`; the `Azure/naming` module for deterministic names.
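The next sketch makes “decompose by lifecycle” concrete for Terraform. This small Python function renders one `azurerm` backend block per platform layer, so each layer reads and locks its own state blob in the shared, Entra-authenticated storage account. The layer, storage-account, resource-group and container names are hypothetical. `use_azuread_auth` and `use_oidc` are arguments of Terraform's `azurerm` backend; confirm them against the Terraform version you run.

```python
def backend_block(layer: str, storage_account: str, resource_group: str,
                  container: str = "tfstate") -> str:
    """Render an HCL azurerm backend block whose state key is unique per layer."""
    if not layer.replace("-", "").isalnum():
        raise ValueError(f"layer name must be alphanumeric/hyphens: {layer!r}")

    def kv(name: str, value: str) -> str:
        # Align the "=" signs the way terraform fmt would.
        return f"    {name.ljust(20)} = {value}"

    return "\n".join([
        "terraform {",
        '  backend "azurerm" {',
        kv("resource_group_name", f'"{resource_group}"'),
        kv("storage_account_name", f'"{storage_account}"'),
        kv("container_name", f'"{container}"'),
        kv("key", f'"{layer}.terraform.tfstate"'),  # one state blob per layer
        kv("use_azuread_auth", "true"),  # Entra ID auth to storage, not account keys
        kv("use_oidc", "true"),          # pipeline signs in via workload identity federation
        "  }",
        "}",
    ])


# Hypothetical layer and resource names: one state unit per lifecycle.
LAYERS = ["mg-policy", "connectivity", "vending"]
blocks = {layer: backend_block(layer, "stplatformtfstate01", "rg-platform-tfstate")
          for layer in LAYERS}
print(blocks["connectivity"])
```

Because the keys differ, a hub change only ever locks and re-plans `connectivity.terraform.tfstate`.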
The landing-zone accelerator
What it is
The Azure landing zone accelerator is Microsoft’s opinionated, ready-to-deploy implementation of the entire ALZ conceptual architecture — the management-group hierarchy, the platform subscriptions (Management, Connectivity, Identity), the Azure Policy initiatives and assignments, RBAC, the Log Analytics workspace, hub networking, and Defender for Cloud — bootstrapped from a single configurable starting point. It comes in three flavours that all converge on the same reference design:
- Portal accelerator — a guided Azure portal experience (“Deploy Azure landing zones”) that asks for your inputs and provisions the platform. Excellent for a first deployment and for understanding the shape; less ideal as the long-term operating model because it is not, by itself, a CI/CD pipeline.
- Bicep accelerator (ALZ-Bicep) — modular Bicep that you run from a pipeline.
- Terraform accelerator (`Azure/alz` accelerator + `avm-ptn-alz`): a bootstrap that scaffolds both the platform IaC and the CI/CD plumbing (repos, pipelines, identity, state backend) for you.
Why it matters
The accelerator is the difference between spending months hand-assembling a CAF-compliant platform and standing up a known-good, Microsoft-maintained baseline in days — one that already encodes the design decisions of the other seven design areas (the canonical MG tree, the ALZ custom policy set, the hub-spoke topology). Just as importantly, it gives you a supported upgrade path: as Microsoft improves the reference (new policies, new well-architected defaults), you consume those as versioned module updates rather than re-architecting. It is the fastest correct start and the thing that keeps your platform aligned with CAF over time.
Bootstrap vs platform — the key mental model
The Terraform accelerator draws a deliberate line you should preserve:
| Phase | What it creates | Run how often |
|---|---|---|
| Bootstrap | The DevOps scaffolding: the git repos, the CI/CD pipelines/workflows, the deployment identity (a Microsoft Entra app/managed identity with federated credentials / OIDC), the Terraform state storage account, and the agent/runner config | Once, to set up the factory |
| Platform (continuous) | The actual landing zone — MGs, policies, platform subs, hub network, Log Analytics — deployed by the pipelines the bootstrap created | Every change, forever, via PRs |
Internalising this prevents the classic mistake of treating the accelerator as a one-shot installer. The accelerator’s real value is that the platform phase is now a CI/CD loop you own.
How to do it well
- Start from the accelerator; don’t fork-and-forget. Take the upstream modules, drive them with your config, and stay on the upstream release train. Resist copy-pasting the module internals into your repo (you lose upgrades and inherit maintenance).
- Choose your starter config deliberately. The accelerators ship multiple starter configurations (e.g. minimal “management groups + policy only”, “complete” with connectivity, hub-and-spoke vNet vs Virtual WAN). Pick the one matching your Network Topology design-area decision so the hub the accelerator builds is the hub you actually want.
- Let the bootstrap create OIDC identity, not secrets. The Terraform accelerator can wire workload-identity federation between your pipeline and Entra so deployments use short-lived OIDC tokens with no stored client secret. Take it.
- Treat the accelerator output as your baseline, then layer org specifics. Org-specific policy exemptions, extra MGs (e.g. geo tiers for data residency), and custom initiatives go in your config layer on top of the accelerator’s defaults.
- Re-run idempotently in lower environments first. Stand the platform up in a canary/management-test scope (or a separate tenant) and prove the `plan`/`what-if` is clean before touching production management groups.
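On the Bicep path, “prove the what-if is clean” can be automated. The sketch below assumes the JSON printed by `az deployment mg what-if ... --no-pretty-print` contains a `changes` list whose items carry `resourceId` and `changeType`. Treating `NoChange` and `Ignore` as clean is also an assumption; verify both against your CLI version and deployment mode.

```python
import json

# changeType values treated as "nothing would change" (assumption: Ignore covers
# resources that exist at the scope but are not described by the template).
CLEAN_CHANGE_TYPES = {"NoChange", "Ignore"}


def what_if_is_clean(what_if_json: str) -> tuple[bool, list[str]]:
    """Return (clean?, list of "changeType: resourceId" lines that block promotion)."""
    result = json.loads(what_if_json)
    blocking = [
        f"{change.get('changeType')}: {change.get('resourceId')}"
        for change in result.get("changes", [])
        if change.get("changeType") not in CLEAN_CHANGE_TYPES
    ]
    return (not blocking, blocking)


# Illustrative payload: one unchanged MG, one modified policy assignment.
sample = json.dumps({
    "status": "Succeeded",
    "changes": [
        {"resourceId": "/providers/Microsoft.Management/managementGroups/corp",
         "changeType": "NoChange"},
        {"resourceId": "/providers/Microsoft.Management/managementGroups/corp"
                       "/providers/Microsoft.Authorization/policyAssignments/example",
         "changeType": "Modify"},
    ],
})
print(what_if_is_clean(sample))
```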
Concrete artifacts, decisions, and tools
- Artifacts: the chosen starter configuration; the bootstrapped repos + pipelines + state backend + deployment identity; your org-specific config overlay (MG names, policy params, hub address space, regions); an accelerator version/upgrade log.
- Decisions: portal vs Bicep vs Terraform accelerator; which starter config (hub-spoke vs Virtual WAN; minimal vs complete); GitHub vs Azure DevOps as the bootstrap target; how org customizations layer onto upstream defaults.
- Tools/services: the Azure landing zone portal accelerator; ALZ-Bicep; the `Azure/alz` Terraform accelerator + `avm-ptn-alz`; Microsoft Entra workload identity federation (OIDC); GitHub or Azure DevOps for the bootstrapped repos and pipelines; Azure Policy built-in + ALZ custom initiatives baked into the accelerator.
Subscription vending
What it is
Subscription vending is the automated, self-service flow by which an application team receives a fully governed, network-connected, policy-compliant application landing zone subscription on demand — without a central team hand-building it. It is the productized answer to “how do app teams get into the platform,” and it is the practical embodiment of the ALZ principle of democratization (subscriptions are a unit of scale you mint freely, inside guardrails, rather than a precious thing gated by tickets). The reference implementation is the Azure/lz-vending Terraform module (with Bicep equivalents); it is the natural companion to the platform accelerator.
Why it matters
The subscription is Azure’s primary boundary for scale (quotas/limits), cost, RBAC/policy inheritance, and blast radius. If creating one is a manual, multi-week ticket, you get two failure modes: cloud velocity dies, and teams route around you (shadow IT, everything crammed into one mega-subscription that hits limits and muddies cost). Vending fixes both: app teams move at their own pace, and the platform team governs by policy-as-code applied at birth — the subscription is compliant the instant it exists because it lands under a governed management group with the right policies inherited. It is also where the platform’s self-service contract lives.
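Placement delivers “governance at birth” through inheritance: a subscription receives every policy assigned at its management group and at each ancestor. The following toy Python model shows that walk. The hierarchy and assignment names are hypothetical, and the model ignores `notScopes` exclusions and policy exemptions, both of which real Azure Policy evaluation also applies.

```python
from typing import Mapping, Optional

# Hypothetical ALZ-style hierarchy: child management group -> parent (None at the top).
PARENT: Mapping[str, Optional[str]] = {
    "alz": None,
    "platform": "alz",
    "landing-zones": "alz",
    "corp": "landing-zones",
    "online": "landing-zones",
}
# Hypothetical policy assignments made directly at each management group.
ASSIGNMENTS: Mapping[str, list[str]] = {
    "alz": ["audit-diagnostics", "deny-classic-resources"],
    "landing-zones": ["allowed-locations"],
    "corp": ["deny-public-endpoints"],
}


def effective_assignments(mg: str) -> list[str]:
    """Collect assignments from the top of the tree down to `mg` (inclusive)."""
    chain: list[str] = []
    node: Optional[str] = mg
    while node is not None:
        if node not in PARENT:
            raise KeyError(f"unknown management group: {node}")
        if node in chain:
            raise ValueError(f"cycle detected at {node}")
        chain.append(node)
        node = PARENT[node]
    inherited: list[str] = []
    for scope in reversed(chain):  # top-level group first
        inherited.extend(ASSIGNMENTS.get(scope, []))
    return inherited


# A subscription vended into "corp" is governed by all three levels at once.
print(effective_assignments("corp"))
```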
What a vending request actually does
A good vending pipeline turns a small metadata request into a complete, compliant subscription. The steps the lz-vending module orchestrates:
| Step | What happens | Why |
|---|---|---|
| Create / reference subscription | Mint a new subscription under the right billing scope (EA enrollment account or MCA invoice section), or alias an existing one | Correct cost lineage from day one |
| Management-group placement | Place it in the correct MG (e.g. Corp / Online) | Inherits the right guardrails immediately |
| RBAC assignment | Grant the app team Owner/Contributor scoped to their subscription only | Self-service without platform access |
| Networking | Create a spoke vNet and peer it to the regional hub (or connect to Virtual WAN); register Private DNS | Connected and resolvable, hub-routed |
| Budget + tags + alerts | Apply a Cost Management budget with action-group alerts and mandatory tags (cost center, owner, env, classification) | Cost control and showback at birth |
| Resource providers + baseline | Register required resource providers; seed baseline RGs/Key Vault/identities | Ready to deploy into |
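The table above can be read as an ordered plan derived from a handful of request fields. The Python sketch below illustrates that derivation; it is not the `lz-vending` module's interface. The request fields, the Corp-vs-Online placement rule, the no-hub-peering choice for Online, and the tag keys are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VendRequest:
    """Hypothetical request fields an app team would submit in its PR."""
    app: str
    cost_center: str
    environment: str
    region: str
    monthly_budget: int
    needs_corp_connectivity: bool  # private, hub-routed access to internal systems?


def vending_plan(req: VendRequest, billing_scope: str) -> list[tuple[str, str]]:
    """Translate one request into the ordered vending steps."""
    mg = "corp" if req.needs_corp_connectivity else "online"
    tags = {"app": req.app, "costCenter": req.cost_center, "env": req.environment}
    if req.needs_corp_connectivity:
        network = f"create spoke vNet in {req.region}, peer it to the regional hub, link Private DNS"
    else:
        network = f"create spoke vNet in {req.region} without corp hub peering"
    return [
        ("subscription", f"create alias {req.app}-{req.environment} under {billing_scope}"),
        ("placement", f"move subscription into management group '{mg}'"),
        ("rbac", "grant the app team's group Owner on this subscription only"),
        ("networking", network),
        ("budget", f"monthly budget {req.monthly_budget} with action-group alerts; tags {tags}"),
        ("baseline", "register resource providers; seed baseline resource groups"),
    ]


# Example: a production payments app that needs private access to internal systems.
request = VendRequest("payments", "CC-1234", "prod", "centralindia", 50000, True)
for step, action in vending_plan(request, "<billing-scope-id>"):
    print(f"{step:>12}: {action}")
```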
How to do it well
- Make the request a PR with a tiny schema. App teams submit a small YAML/`.tfvars` file (app name, cost center, environment, criticality, region/residency, budget, required connectivity) as a pull request to a `landing-zones` repo. PR review is the approval gate; merge triggers the vend. This gives you audit, review, and rollback for free.
- No standing platform access for app teams. The vending pipeline runs under the platform deployment identity; it grants the app team rights scoped to their own subscription. Any elevated platform access stays behind Microsoft Entra PIM (just-in-time, approved, time-boxed).
- Govern at birth via the default management group + placement. Set the tenant’s default management group to a locked-down quarantine MG so any subscription created outside vending is non-compliant-by-default until reviewed; vending places subs explicitly so they inherit guardrails on creation.
- Standardize the baseline, parameterize the variance. Everything every subscription needs (spoke peering, DNS, budget, tags, diagnostic settings to the central Log Analytics workspace) is fixed in the module; only a handful of inputs vary per request.
- Decompose state per subscription. Each vended subscription ideally gets its own Terraform state (or a clearly partitioned slice) so one app’s changes never re-plan another’s.
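Because the request arrives as a PR, the schema check can run as an ordinary CI step before a human reviews it. Here is a minimal Python sketch, assuming the YAML has already been parsed into a dict (for example with a YAML library in the real pipeline). Every field name, allowed value, and the app-name rule are hypothetical.

```python
import re

# Hypothetical intake contract for a vending PR (field names are illustrative).
SCHEMA = {
    "app_name": str,
    "cost_center": str,
    "environment": str,
    "criticality": str,
    "region": str,
    "budget": int,
    "connectivity": str,
}
ALLOWED_VALUES = {
    "environment": {"dev", "test", "prod"},
    "criticality": {"low", "medium", "high"},
    "connectivity": {"corp", "online"},
}
APP_NAME = re.compile(r"^[a-z][a-z0-9-]{2,23}$")  # 3-24 chars, starts with a letter


def validate_request(req: dict) -> list[str]:
    """Return human-readable errors; an empty list means the PR may proceed to review."""
    errors = [f"unknown field: {field}" for field in sorted(set(req) - set(SCHEMA))]
    for field, expected in SCHEMA.items():
        if field not in req:
            errors.append(f"missing field: {field}")
            continue
        value = req[field]
        # bool is a subclass of int in Python, so reject it explicitly.
        if not isinstance(value, expected) or isinstance(value, bool):
            errors.append(f"{field}: expected {expected.__name__}")
            continue
        allowed = ALLOWED_VALUES.get(field)
        if allowed and value not in allowed:
            errors.append(f"{field}: must be one of {sorted(allowed)}")
    name = req.get("app_name")
    if isinstance(name, str) and not APP_NAME.match(name):
        errors.append("app_name: 3-24 chars, lowercase letters, digits, hyphens")
    budget = req.get("budget")
    if isinstance(budget, int) and not isinstance(budget, bool) and budget <= 0:
        errors.append("budget: must be positive")
    return errors
```

A failing check posts the error list back to the PR, so the platform reviewer only sees requests that already satisfy the contract.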
Concrete artifacts, decisions, and tools
- Artifacts: the vending module + pipeline; the request intake schema (the per-subscription YAML/`.tfvars` contract); an RBAC delegation model; the per-sub budget/action-group definitions; the spoke-network and Private DNS wiring template; a subscription register (what was vended, for whom, under which MG and billing scope).
- Decisions: per-workload vs per-environment subscription granularity; which billing scope subscriptions are minted under (EA enrollment account vs MCA invoice section vs CSP); default budget thresholds and alert recipients; whether app owners get Owner (with policy guardrails) or a constrained custom role; hub-peering vs Virtual WAN connection.
- Tools/services: `Azure/lz-vending` (Terraform) / Bicep equivalent; Microsoft Cost Management + Billing (EA/MCA APIs to create subscriptions programmatically and set budgets); Azure Policy + default management group (governance at birth); Microsoft Entra PIM for platform elevation; Azure Private DNS and vNet peering / Virtual WAN for connectivity.
CI/CD for the platform
What it is
Platform CI/CD is the pipeline that takes a change to the landing zone itself — a new policy assignment, an MG tweak, a hub-network change, a new vended subscription — from a pull request through validation, preview, approval, and deployment, with the right identity, gates, and environment promotion. It is distinct from application CI/CD (which deploys app code into a landing zone). Here the “artifact” being deployed is governance and platform infrastructure, and the blast radius is potentially tenant-wide, so the controls are correspondingly stricter.
Why it matters
Because a single merged PR can change a Deny policy across every subscription, the platform pipeline is the control plane for your control plane. Done well it gives you peer-reviewed, auditable, gated, repeatable changes with a clear preview of impact and the ability to promote the same change through canary → production. Done badly (or not at all) you are back to admins clicking in the portal with no review, no preview, and no record — exactly the state IaC exists to eliminate.
The platform pipeline stages
A reference platform pipeline (whether GitHub Actions or Azure DevOps Pipelines) runs roughly:
| Stage | What runs | Gate |
|---|---|---|
| Validate / lint | `terraform validate` / `bicep build`, fmt/format checks, `tflint`, naming linter (`Azure/naming`) | Block on failure |
| Static / policy test | Checkov/tfsec/PSRule for Azure (well-architected + ALZ rules), secret scanning | Block on findings |
| Plan / what-if | `terraform plan` / `az deployment ... what-if`, output attached to the PR | Human review of the diff |
| PR approval | Required reviewers (CODEOWNERS), branch protection | Approvals + green checks |
| Apply to canary | Deploy to a non-prod MG/management-test scope or canary tenant | Automated post-deploy checks |
| Apply to production | Deploy to the real MG hierarchy / platform subs | Environment protection / manual approval |
| Post-deploy verify | Azure Resource Graph compliance queries, policy-compliance check | Alert on drift |
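The Plan / what-if stage is easiest to review when the bot comment leads with counts and calls out anything destructive. This Python sketch reads the JSON that `terraform show -json <planfile>` produces, using its `resource_changes[].address` and `change.actions` fields, and builds a short Markdown comment. The comment wording and the rule that any delete needs extra sign-off are assumptions.

```python
import json
from collections import Counter


def summarize_plan(plan_json: str) -> tuple[str, bool]:
    """Return (markdown comment, has_destructive_changes) for a Terraform JSON plan."""
    plan = json.loads(plan_json)
    counts: Counter = Counter()
    destructive: list[str] = []
    for rc in plan.get("resource_changes", []):
        actions = rc["change"]["actions"]
        if actions in (["no-op"], ["read"]):
            continue  # nothing will change for this resource
        if "delete" in actions and "create" in actions:
            kind = "replace"  # ["delete", "create"] or ["create", "delete"]
        else:
            kind = actions[0]  # "create", "update" or "delete"
        counts[kind] += 1
        if "delete" in actions:
            destructive.append(rc["address"])
    summary = ", ".join(f"{counts[k]} to {k}" for k in ("create", "update", "replace", "delete"))
    lines = [f"**terraform plan**: {summary}"]
    if destructive:
        lines.append("Destructive changes (need explicit reviewer sign-off):")
        lines.extend(f"- `{address}`" for address in destructive)
    return "\n".join(lines), bool(destructive)
```

In a workflow, the returned string would be posted as the PR comment. The boolean could be used to require an additional CODEOWNERS reviewer.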
How to do it well
- OIDC / workload identity federation, never stored secrets. The pipeline authenticates to Azure with a Microsoft Entra app or managed identity using federated credentials — short-lived tokens, no client secret in the CI system. This is the single biggest security win and the accelerator wires it for you.
- Least-privilege, scoped deployment identities. Don’t give one god-identity Owner on the tenant root. Use separate identities per layer (one for MG/policy, one for connectivity, one for vending), each scoped to the smallest MG/subscription it needs, ideally with deployment-time elevation via PIM-for-groups.
- Branch protection + CODEOWNERS + environments. Protect `main`; require PR review and a passing `plan`/`what-if`; use GitHub Environments / Azure DevOps environment approvals so production apply needs an explicit human gate.
- Promote the same artifact through stages. Run the identical config against a canary scope (a management-test MG branch or a second tenant) before production. Never let “tested in dev” mean “different code than prod.”
- Make `plan`/`what-if` a required, readable PR comment. Reviewers approve a diff, not a hope. Attach the plan output to the PR automatically.
- Detect drift on a schedule. A nightly `plan`/`what-if` (or Azure Policy compliance scan + Resource Graph query) flags anything that changed out-of-band, closing the loop between declared and live state.
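A scheduled Terraform drift job can rely on `terraform plan -detailed-exitcode`, which exits 0 when the diff is empty, 2 when the plan contains changes (drift, for a configuration that was already applied), and 1 on error. The following is a minimal Python wrapper, assuming `terraform init` has already run in each layer's working directory. The alerting hook is left to your pipeline.

```python
import subprocess
from enum import Enum


class DriftStatus(Enum):
    IN_SYNC = "in-sync"
    DRIFTED = "drifted"
    ERROR = "error"


def classify_plan_exit(code: int) -> DriftStatus:
    """Map `terraform plan -detailed-exitcode` exit codes to a drift status."""
    if code == 0:
        return DriftStatus.IN_SYNC
    if code == 2:
        return DriftStatus.DRIFTED
    return DriftStatus.ERROR  # 1 (or anything unexpected) means the check itself failed


def check_drift(workdir: str) -> DriftStatus:
    """Run a plan (which changes no infrastructure) in an initialised working directory."""
    proc = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=workdir,
        capture_output=True,
        text=True,
        check=False,  # exit code 2 is a valid result, not an exception
    )
    return classify_plan_exit(proc.returncode)
```

A `DRIFTED` result should open an issue or alert and be remediated via PR. An `ERROR` result needs separate attention, because it means the drift check itself did not run.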
Concrete artifacts, decisions, and tools
- Artifacts: the pipeline/workflow definitions (one per platform layer); the federated-credential configuration; branch-protection + CODEOWNERS rules; environment-approval definitions; the `plan`/`what-if` PR-comment automation; static-analysis (PSRule/Checkov) gate config; a drift-detection scheduled job.
- Decisions: GitHub Actions vs Azure DevOps Pipelines; OIDC federation vs (rejected) stored secrets; how many deployment identities and at what scopes; canary/promotion topology (multi-MG vs multi-tenant); which static-analysis gates are blocking vs advisory; self-hosted vs Microsoft-hosted runners/agents (self-hosted if deploying into private-network platform resources).
- Tools/services: GitHub Actions or Azure DevOps Pipelines; Microsoft Entra workload identity federation; PSRule for Azure / Checkov / tfsec; Azure Policy compliance + Azure Resource Graph for post-deploy verification; Microsoft Entra PIM for deployment-identity elevation; GitHub Environments / Azure DevOps Environments & approvals for gating.
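The “no god-identity” rule can be checked mechanically against an export of role assignments, for example from Azure Resource Graph or `az role assignment list`. This Python sketch flags principals that hold broad roles at the root scope (`/`) or at the tenant root group, whose ID is the tenant ID. The record shape, the set of roles treated as broad, and the principal names are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoleAssignment:
    principal: str
    role: str
    scope: str


# Roles that can grant access (and therefore escalate) at the scope they are held.
BROAD_ROLES = {"Owner", "User Access Administrator"}


def god_identities(assignments: list[RoleAssignment], tenant_id: str) -> list[str]:
    """Return principals holding a broad role at "/" or the tenant root group."""
    # Azure resource IDs compare case-insensitively, so normalise to lower case.
    root_scopes = {
        "/",
        f"/providers/microsoft.management/managementgroups/{tenant_id}".lower(),
    }
    return sorted({
        a.principal
        for a in assignments
        if a.role in BROAD_ROLES and a.scope.lower() in root_scopes
    })
```

Run nightly, a non-empty result is a finding to remove or move behind PIM. Well-scoped per-layer identities (for example a policy identity at the intermediate-root MG) do not appear in the result.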
GitOps and the platform team operating model
What it is
GitOps is the operating discipline where git is the single source of truth for desired state, all changes flow through pull requests, and an automated system continuously reconciles the live environment to what git says — with drift detected and (optionally) auto-corrected. The platform team operating model is the human side: a dedicated platform engineering team (often the Cloud Center of Excellence / CCoE) that builds and operates “the landing zone” as an internal product — versioned, documented, supported, with application teams as its customers. Together they turn the platform from a project into a product with a release cadence.
Why it matters
Two failure modes kill landing zones over time: drift (someone clicks a change in the portal and the declared state silently diverges) and ownership rot (the original build team disbands and no one owns the platform’s evolution). GitOps attacks drift by making git authoritative and reconciliation continuous. The platform-team operating model attacks ownership rot by making the landing zone someone’s product with a backlog, releases, and SLOs. The combination is what lets a 50-team enterprise keep a single coherent, compliant platform years after go-live — and it shifts the platform team from gatekeepers (manually approving every request) to enablers (publishing self-service capabilities behind guardrails).
The operating model — platform team vs application teams
| Concern | Platform team (CCoE) owns | Application team owns |
|---|---|---|
| Management-group hierarchy & policy | ✔ (as code, via the platform repo) | — |
| Connectivity hub, firewall, DNS | ✔ | Spoke usage within guardrails |
| Subscription vending capability | ✔ (builds the vending product) | Requests a subscription via PR |
| Guardrails (Azure Policy) | ✔ (authors/assigns) | Operates inside them |
| Their application + its resources | — | ✔ (deploys into their own sub) |
| RBAC for their workload | — | ✔ (scoped to their sub) |
| Golden modules / paved road | ✔ (publishes AVM-based modules) | ✔ (consumes them) |
The principle: the platform team delegates with guardrails, not gatekeeps. App teams get Owner on their own subscription and a paved road of approved, reusable modules; the platform team ensures the guardrails they cannot remove are correct.
How to do it well
- Everything through PRs — no portal changes to the platform. Make manual changes to platform MGs/policies an exception requiring a follow-up PR to reconcile. Ideally lock down portal write access to platform scopes so the only path is the pipeline.
- Run the landing zone as a product. Versioned releases of the platform (semantic versioning), a public changelog, a backlog of guardrail/feature requests from app teams, documented SLOs (e.g. vending turnaround, policy-change lead time), and a support channel. The accelerator’s upstream releases feed your release train.
- Continuous reconciliation + drift detection. Schedule the platform pipeline (or a Policy compliance scan) to re-run `plan`/`what-if` and alert on any divergence; remediate via PR. For workload clusters, real GitOps tooling (Flux, the native AKS GitOps extension, or Argo CD) reconciles Kubernetes state from git continuously; the same philosophy applies to the platform via scheduled IaC reconciliation.
- Publish a paved road. Give app teams golden modules (AVM-based Bicep/Terraform for common patterns: a compliant web app, an AKS baseline, a data platform) so the easy path is also the compliant path. This is what makes guardrails feel like help, not handcuffs.
- Measure the platform like a product. Track DORA-style and platform-specific KPIs so you can prove the operating model is working (see below).
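One way to keep platform releases honest is to derive the next semantic version from labels on the merged PRs. The mapping in this Python sketch is a hypothetical team convention, not a standard:

- breaking: a change that can start denying deployments that work today, such as a new Deny assignment.
- feature: a new capability, such as an Audit policy or a paved-road module.
- fix: a parameter correction.

```python
VALID_LABELS = {"breaking", "feature", "fix"}


def next_version(current: str, labels: list[str]) -> str:
    """Bump MAJOR.MINOR.PATCH according to the most significant change label."""
    unknown = set(labels) - VALID_LABELS
    if unknown:
        raise ValueError(f"unknown change labels: {sorted(unknown)}")
    major, minor, patch = (int(part) for part in current.split("."))
    if "breaking" in labels:
        return f"{major + 1}.0.0"
    if "feature" in labels:
        return f"{major}.{minor + 1}.0"
    if "fix" in labels:
        return f"{major}.{minor}.{patch + 1}"
    return current  # nothing releasable merged since the last tag
```

Under this convention, a major bump signals app teams to read the changelog before the release reaches production.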
KPIs for the platform product
| KPI | What it tells you | Healthy direction |
|---|---|---|
| Subscription vending lead time | Self-service health | Minutes, not weeks |
| Policy-change lead time (PR → enforced) | Governance agility | Hours/days |
| Platform deployment frequency | How safely you can change the platform | Frequent, small PRs |
| Change failure rate / MTTR | Platform-change safety | Low / fast |
| % platform changes via git (vs portal) | GitOps adherence | ~100% |
| Policy-compliance % across the estate | Guardrails actually holding | High and rising |
| Configuration-drift incidents detected | Reconciliation working | Detected fast, trending down |
| Paved-road module adoption | Is the easy path the compliant path? | Rising |
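Several of these KPIs fall out of data you already collect once every change is a PR. This Python sketch computes three of them from hypothetical change and vending records. In practice you would pull those records from your git host, pipeline runs, and the subscription register.

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class PlatformChange:
    via_git: bool  # False if the change was made directly in the portal
    failed: bool   # caused an incident, rollback or hotfix in production


def platform_kpis(changes: list[PlatformChange], vend_minutes: list[float]) -> dict[str, float]:
    """Percent of changes via git, change failure rate, and median vending lead time."""
    if not changes or not vend_minutes:
        raise ValueError("need at least one change and one vending record")
    total = len(changes)
    return {
        "pct_changes_via_git": round(100 * sum(c.via_git for c in changes) / total, 1),
        "change_failure_rate_pct": round(100 * sum(c.failed for c in changes) / total, 1),
        "median_vending_minutes": median(vend_minutes),
    }
```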
Concrete artifacts, decisions, and tools
- Artifacts: the platform repo(s) as single source of truth; a platform product backlog, changelog, and versioned releases; documented operating-model RACI (who owns what); SLOs and a KPI dashboard; the paved-road module catalog; a drift-detection/reconciliation job; the contribution model (how app teams request guardrail changes via PR).
- Decisions: monorepo vs multi-repo for the platform; release cadence and versioning scheme; how (and whether) to auto-remediate drift vs alert-and-PR; whether to enforce Flux/Argo CD for workload clusters; the boundary of platform-team vs app-team responsibility; how the team is staffed (dedicated platform engineering vs rotation).
- Tools/services: GitHub/Azure DevOps (repos, PRs, branch protection, CODEOWNERS); Azure Policy + Azure Resource Graph (continuous compliance/drift visibility); Flux (AKS GitOps extension) / Argo CD for Kubernetes reconciliation; Azure Verified Modules for the paved road; a KPI/dashboard surface (Azure Workbooks / Power BI over Resource Graph and Cost Management).
Real-world enterprise scenario
Aranya Logistics is a fictional pan-India freight and last-mile-delivery company with ~2,300 employees, 38 application teams, a data-residency obligation (India regions only for customer data), and a board mandate to move off a chaotic single-subscription Azure estate onto a governed platform within two quarters. Their newly formed platform engineering team (six engineers, branded internally as the CCoE) owns the Platform Automation & DevOps design area end to end.
IaC with Bicep and Terraform. Aranya already runs Terraform for their on-prem and a small AWS analytics footprint, so they standardize the platform on Terraform + Azure Verified Modules, building the landing zone from Azure/avm-ptn-alz and vending from Azure/lz-vending. Application teams are not forced onto Terraform — guardrails are enforced by Policy, so app teams use Bicep or Terraform as they prefer inside their own subscriptions. Platform state lives in an Azure Storage account in the Management subscription with versioning, soft-delete, Entra-auth (no account keys), blob-lease locking, a customer-managed key, and a CanNotDelete lock. Every upstream module is pinned to an exact version; upgrades are deliberate PRs.
The landing-zone accelerator. They bootstrap with the Azure/alz Terraform accelerator, choosing the hub-and-spoke “complete” starter (matching their Network Topology decision: Azure Firewall + ExpressRoute hub, not Virtual WAN). The bootstrap phase (run once) creates the GitHub repos, the GitHub Actions workflows, the Terraform state backend, and a Microsoft Entra app with OIDC federated credentials — no client secrets anywhere. The platform phase then runs continuously via PRs. On top of the accelerator’s defaults, Aranya layers org specifics in a config overlay: two extra geo MG tiers (Corp-IN-Central, Corp-IN-South) carrying an Allowed locations policy pinned to centralindia/southindia, plus three custom policy initiatives. They stand the whole platform up first in a management-test MG branch and prove a clean plan before touching production MGs.
Subscription vending. They productize onboarding with lz-vending. An app team opens a pull request to the landing-zones repo containing a ~15-line YAML: app name, cost center, environment, criticality, residency (IN-Central/IN-South), budget, and connectivity. PR review by the platform team is the approval gate; merge triggers a GitHub Actions workflow that mints the subscription under their MCA invoice section, places it in Corp-IN-Central or Online, peers its spoke vNet to the regional hub, registers Private DNS, grants the app team Owner scoped to that subscription only (platform elevation stays behind Entra PIM), seeds four baseline RGs (-app/-data/-net/-shared), wires diagnostic settings to the central Log Analytics workspace, and sets a ₹-denominated Cost Management budget with action-group alerts at 60/90/100%. A previously three-week, ticket-driven subscription request now completes in under 18 minutes, fully compliant.
Platform CI/CD. All platform changes run through GitHub Actions authenticating via OIDC workload identity federation (zero stored secrets). Each layer has its own scoped deployment identity (MG/policy, connectivity, vending) — no tenant-root god-identity. The pipeline lints (terraform validate, tflint, Azure/naming), runs PSRule for Azure and Checkov as blocking gates, posts the terraform plan as a PR comment for human review, requires CODEOWNERS approval and branch protection on main, applies first to the management-test canary scope with post-deploy Azure Resource Graph checks, then applies to production behind a GitHub Environment manual approval. A nightly plan flags any out-of-band drift.
GitOps and the platform team operating model. Aranya runs the landing zone as an internal product: semantically versioned releases, a changelog, a backlog fed by app-team guardrail requests (submitted as PRs/issues), and published SLOs (vending < 30 min; policy-change lead time < 2 days). No portal changes to platform scopes are permitted — portal write access to platform MGs is locked down, so the pipeline is the only path; the nightly reconciliation alerts on drift and the team remediates via PR. They publish a paved road of AVM-based golden modules (a compliant App Service web app, an AKS baseline using the Flux AKS GitOps extension for workload reconciliation, and a data-platform module) so the easy path is the compliant path. A Power-BI-over-Resource-Graph KPI dashboard tracks vending lead time, policy-compliance %, % of changes via git, and drift incidents.
Measurable outcome after two quarters: 41 application landing zones vended (target 35); ~99% reduction in subscription provisioning lead time (3 weeks → 18 min); 100% of platform changes via git (portal writes to platform scopes disabled); 0 stored deployment secrets (all OIDC); policy-compliance across the new estate at 98% (live via Resource Graph); zero residency violations (the Allowed locations policy pinned at the geo MG tier is inherited everywhere beneath); and paved-road module adoption at 73% of new workloads — the landing zone now evolves through reviewed pull requests rather than tribal knowledge.
Deliverables & checklist
Common pitfalls
- Treating the accelerator as a one-shot installer. Running the portal accelerator once and walking away leaves you with a platform that immediately starts drifting and has no upgrade path. Keep the bootstrap-vs-platform distinction: the accelerator builds the factory; the platform must then be a continuous CI/CD loop you own, consuming upstream releases as versioned updates.
- Mixing Bicep and Terraform in the same layer without a reason. Each tool multiplies the skills you must hire for and the failure modes you must debug. Standardize one primary language for the platform; if app teams want a different tool inside their own subscription, that’s fine — guardrails are enforced by Policy, not by mandating their IaC tool.
- Storing pipeline secrets instead of using OIDC. A long-lived client secret in your CI system is a tenant-wide credential waiting to leak. Use Microsoft Entra workload identity federation (OIDC) so deployments use short-lived tokens with no stored secret — the accelerator wires this for you.
- One god deployment identity with Owner on the tenant root. A single over-privileged identity is a catastrophic blast radius and a compliance red flag. Use least-privilege, per-layer deployment identities scoped to the smallest MG/subscription each needs, with elevation via PIM where required.
- Manual portal changes to the platform (drift). The moment someone clicks a policy or MG change in the portal, git is no longer the truth and your reconciliation lies. Enforce GitOps — disable portal write access to platform scopes, route every change through a PR, and run scheduled drift detection to catch exceptions.
- Centralized, ticket-driven subscription creation. Hand-building subscriptions kills velocity and breeds shadow IT. Productize vending (`Azure/lz-vending`) so app teams self-serve via a small PR and receive a governed, connected, budgeted subscription in minutes, with no standing platform access.
- Building the platform but never running it as a product. Without an owning platform team, a backlog, versioned releases and SLOs, the landing zone decays into tribal knowledge the day the build team disbands. Run it as an internal product with a clear operating model and a paved road, measured by platform KPIs.
What’s next
This installment closes the eight Azure Landing Zone Design Areas — from here, the next part of the series steps back to the operating cadence that keeps the whole platform healthy: landing-zone lifecycle management and continuous compliance, turning the design areas you’ve now built into an ongoing, measurable practice.