Azure Landing Zone: Governance — Azure Policy Initiatives, Cost Guardrails, Compliance Frameworks & Tag Enforcement

Where this fits

The Azure Landing Zone (ALZ) conceptual architecture splits into eight design areas across two themes — Environment design (Identity, Network Topology and Connectivity, Resource Organization) and Governance & operations (Security, Management, Governance, Platform Automation and DevOps, plus the cross-cutting Billing and Microsoft Entra Tenant decision). Governance is part 7, and it is the design area that turns intent into enforced reality: it takes the management group hierarchy you built in Resource Organization and the controls you scoped in Security and Management, and it expresses them as policy-as-code guardrails, cost controls, and compliance baselines that apply automatically and inescapably. Microsoft’s own framing is blunt — Governance exists to replace change-advisory-board gatekeeping with automated guardrails and continuous compliance auditing so application teams can move fast inside boundaries they cannot remove. This article goes deep on the five engines that make that real: Azure Policy and initiatives, cost controls and budgets, the audit/deny/deployIfNotExists guardrail decision, compliance-framework mapping, and tag enforcement.

[Figure: Azure Landing Zone Design Areas, animated overview]

Azure Policy and policy initiatives

What it is

Azure Policy is the rules engine of the landing zone. A policy definition is a single rule with an if condition (a logical test over resource properties — type, location, tags, kind, ARM field aliases) and a then effect that fires when the condition matches. A policy initiative (a.k.a. a policy set definition) is a named bundle of many definitions, exposing a consolidated set of parameters and rolling up into a single compliance percentage. You assign a definition or initiative to a scope — a management group, subscription, or resource group — and it flows down through inheritance to every descendant, where it evaluates existing resources and intercepts new deployments at the ARM control plane.
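To make the if/then shape concrete, here is a minimal Python sketch of a rule and a toy evaluator. The rule layout mirrors the definition structure described above, but the evaluator is purely illustrative: it supports only a handful of operators, compares strings case-sensitively, and is not how the Azure Policy engine is implemented. The names ALLOWED_LOCATIONS_RULE, matches, and evaluate are hypothetical.

```python
from typing import Any, Optional

# Illustrative rule: deny storage accounts outside two approved regions.
ALLOWED_LOCATIONS_RULE: dict[str, Any] = {
    "if": {
        "allOf": [
            {"field": "type", "equals": "Microsoft.Storage/storageAccounts"},
            {"field": "location", "notIn": ["centralindia", "southeastasia"]},
        ]
    },
    "then": {"effect": "deny"},
}


def matches(condition: dict[str, Any], resource: dict[str, Any]) -> bool:
    """Evaluate a small subset of condition operators against a flat resource dict."""
    if "allOf" in condition:
        return all(matches(c, resource) for c in condition["allOf"])
    if "anyOf" in condition:
        return any(matches(c, resource) for c in condition["anyOf"])
    if "not" in condition:
        return not matches(condition["not"], resource)
    value = resource.get(condition["field"])
    if "equals" in condition:
        return value == condition["equals"]
    if "notIn" in condition:
        return value not in condition["notIn"]
    if "in" in condition:
        return value in condition["in"]
    raise ValueError(f"unsupported operator in {condition!r}")


def evaluate(rule: dict[str, Any], resource: dict[str, Any]) -> Optional[str]:
    """Return the rule's effect when the condition matches, otherwise None."""
    return rule["then"]["effect"] if matches(rule["if"], resource) else None
```

Calling evaluate with a storage account in an unapproved region yields the deny effect, while a compliant resource, or a resource of another type, yields None and the request would proceed.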

Azure Policy is complementary to RBAC, not a substitute: RBAC governs who can perform an action; Policy governs what the resulting resource is allowed to look like. A user with Owner can still be blocked from creating a public-IP NIC or an unencrypted disk by a Deny policy — that separation is the whole point.

Why it matters

Policy is the only mechanism in Azure that is simultaneously preventive (it can refuse non-compliant deployments before they exist), detective (it continuously audits the live estate and produces a compliance score), and corrective (it can deploy or modify resources to bring them into line). Without it, governance degrades into manual reviews, tribal knowledge, and drift. With it, a control written once at the intermediate-root management group is enforced on the 4,000th subscription exactly as on the first — and cannot be weakened by a child scope, only made stricter.

Why initiatives, not loose policies

You assign initiatives, not dozens of individual policies, for four reasons:

| Reason | Without initiatives | With an initiative |
| --- | --- | --- |
| Assignment sprawl | N separate assignments per scope to manage and exclude | One assignment, one set of exclusions |
| Compliance rollup | N separate compliance figures, no single number | One initiative compliance % you can report to auditors |
| Parameter consistency | Each policy parameterized independently | Shared initiative parameters set the same value once |
| Scope limits | You burn through the per-scope assignment ceiling fast | One assignment counts as one against the limit |

Azure enforces limits here — there are caps on policy/initiative definitions and assignments per scope — which is exactly why the ALZ pattern is “few initiatives, assigned high, parameterized per scope” rather than “many policies, assigned everywhere.”
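The parameter-consistency point can be sketched as follows. This is a rough illustration of an initiative body in which every member policy reads one shared initiative parameter; the helper name build_initiative and the definition IDs are hypothetical placeholders, and the member parameter name listOfAllowedLocations is an assumption you should check against the definitions you actually bundle.

```python
from typing import Any


def build_initiative(display_name: str, member_definition_ids: list[str]) -> dict[str, Any]:
    """Build an initiative body whose members all share one 'allowedLocations' parameter."""
    return {
        "properties": {
            "displayName": display_name,
            "policyType": "Custom",
            # Declared once at initiative level...
            "parameters": {
                "allowedLocations": {
                    "type": "Array",
                    "metadata": {"displayName": "Allowed locations"},
                }
            },
            # ...and passed through to every member policy by reference.
            "policyDefinitions": [
                {
                    "policyDefinitionReferenceId": f"member-{index}",
                    "policyDefinitionId": definition_id,
                    "parameters": {
                        # Assumed member parameter name; verify per definition.
                        "listOfAllowedLocations": {"value": "[parameters('allowedLocations')]"}
                    },
                }
                for index, definition_id in enumerate(member_definition_ids)
            ],
        }
    }


# Placeholder IDs: substitute real definition resource IDs.
initiative = build_initiative(
    "Residency guardrails",
    [
        "/providers/Microsoft.Authorization/policyDefinitions/<definition-id-a>",
        "/providers/Microsoft.Authorization/policyDefinitions/<definition-id-b>",
    ],
)
```

Because both members resolve the same expression, a single value set at assignment time applies to both, which is the "set the same value once" behaviour in the table above.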

How to do it well

Concrete artifacts, decisions, and tools

Cost controls and budgets

What it is

Cost governance in the landing zone is delivered through Microsoft Cost Management + Billing: budgets (a cost or usage threshold evaluated against a scope — billing account, MCA invoice section / EA enrollment account, subscription, resource group, or a management-group cost view), alerts and action groups that fire at percentage thresholds of actual or forecast spend, and the commercial levers Azure exposes to bend the cost curve — Reservations, the Azure savings plan for compute, Azure Hybrid Benefit, Spot VMs, and dev/test subscriptions. Tags are the connective tissue: they make cost allocatable to a cost center, application, or business unit.

Why it matters

A landing zone that governs security but not spend produces a different kind of incident — the surprise invoice. Cost governance has to be architected in at the platform layer, not bolted on per app, because the people who can prevent overspend (the platform team, via budgets and policy) are not the people generating it (app teams). Done well, budgets give you forecast-based early warning (alert at 60% of projected month-end, not after the money is gone), commitment discounts cut the bill structurally, and tag-driven allocation turns one opaque invoice into accurate showback/chargeback per business unit.

The two halves: guardrails (preventive) and budgets (detective)

Cost control splits cleanly into prevent and detect:

| Control | Mechanism | Effect |
| --- | --- | --- |
| Restrict expensive SKUs/regions | Azure Policy Deny on Microsoft.Compute/virtualMachines/sku.name, allowed-locations, allowed resource types | Stops a wrong-SKU/wrong-region deploy before it costs anything |
| Auto-tier / expire data | Azure Storage lifecycle management rules | Moves blobs to cool/cold/archive or deletes at end of lifecycle |
| Right-size & shut down idle | Azure Advisor cost recommendations; autoscale; start/stop automation | Continuous structural savings |
| Budget alerting | Cost Management budgets + action groups at 60/80/100% actual & forecast | Detective early-warning, routed to owners |
| Commitment discounts | Reservations (up to ~72% vs PAYG), savings plan for compute (up to ~65%), Hybrid Benefit, Spot | Structural reduction of the run-rate |

The crucial nuance: a Cost Management budget does not stop spend — it alerts (and can trigger an automation runbook via the action group, e.g. to deallocate dev VMs). To actually prevent cost you need Azure Policy allow-lists for SKUs/regions/types. Mature landing zones use both: Policy to cap what can be created, budgets to watch what it costs.
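The alert-only nature of a budget is easy to see in a toy model. The sketch below, with hypothetical names Alert and fired_alerts, simply reports which percentage thresholds have been crossed by actual and forecast spend. It is not the Cost Management API, and nothing in it blocks a deployment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    kind: str       # "actual" or "forecast"
    threshold: int  # percent of the budget amount


def fired_alerts(budget: float, actual: float, forecast: float,
                 thresholds: tuple[int, ...] = (60, 80, 100)) -> list[Alert]:
    """Return every threshold crossed by actual or forecast spend.

    The result is only a list of notifications to route; spend continues
    regardless, which is why preventive controls need Azure Policy.
    """
    alerts = []
    for kind, spend in (("actual", actual), ("forecast", forecast)):
        for pct in thresholds:
            if spend >= budget * pct / 100:
                alerts.append(Alert(kind, pct))
    return alerts


# Mid-month: 45% of the budget spent, but the forecast projects 85% by month-end.
early_warning = fired_alerts(budget=1000, actual=450, forecast=850)
```

In this example only the forecast thresholds at 60% and 80% fire, which is the early warning the forecast-based alerts are meant to provide while actual spend is still below every threshold.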

How to do it well

Concrete artifacts, decisions, and tools

Guardrails: audit vs deny vs deployIfNotExists (and the rest)

What it is

The effect is the verb of a policy — what actually happens when a resource matches. Azure Policy supports these effects, and choosing the right one per control is the single most consequential decision in the whole Governance design area:

| Effect | What it does | Preventive / Detective / Corrective | Remediatable? |
| --- | --- | --- | --- |
| Audit | Logs a non-compliant entry; allows the deployment | Detective | No |
| AuditIfNotExists | Audits when a related resource is missing/misconfigured (e.g. no diagnostic setting on a VM) | Detective | No |
| Deny | Blocks the create/update at the control plane | Preventive | No |
| DenyAction | Blocks an action (today only DELETE) to protect critical resources from deletion | Preventive | No |
| Modify | Adds/updates/removes properties or tags on create/update; can fix existing resources via a remediation task | Corrective | Yes |
| Append | Adds fields/properties (e.g. a default tag) at create time; cannot remediate existing resources | Corrective (create-time) | No |
| DeployIfNotExists (DINE) | Deploys an ARM template when a related resource is missing (e.g. auto-deploy a diagnostic setting, enable Defender plan, install an agent) | Corrective | Yes |
| Manual | Tracks attestation-based controls Azure can’t evaluate automatically (process/people controls); you attest compliance | Detective (attested) | N/A |
| Disabled | Turns the policy off for testing without deleting the assignment | N/A | No |
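For an effect-per-control register it can help to hold the table above as data. The following is one possible sketch; EffectInfo, EFFECTS, and the two helper functions are hypothetical names, and the classification simply restates the table rather than anything read from Azure.

```python
from typing import NamedTuple, Optional


class EffectInfo(NamedTuple):
    role: Optional[str]   # "preventive", "detective", "corrective", or None
    remediatable: bool    # can a remediation task fix existing resources?


# Restates the effects table; Manual is recorded as not remediatable (you attest instead).
EFFECTS: dict[str, EffectInfo] = {
    "audit": EffectInfo("detective", False),
    "auditIfNotExists": EffectInfo("detective", False),
    "deny": EffectInfo("preventive", False),
    "denyAction": EffectInfo("preventive", False),
    "modify": EffectInfo("corrective", True),
    "append": EffectInfo("corrective", False),
    "deployIfNotExists": EffectInfo("corrective", True),
    "manual": EffectInfo("detective", False),
    "disabled": EffectInfo(None, False),
}


def effects_with_role(role: str) -> list[str]:
    """List the effects that play the given role, sorted by name."""
    return sorted(name for name, info in EFFECTS.items() if info.role == role)


def remediatable_effects() -> list[str]:
    """List the effects that support remediation tasks, sorted by name."""
    return sorted(name for name, info in EFFECTS.items() if info.remediatable)
```

A register built on this can flag, for instance, any control that claims to fix the existing estate while using an effect outside remediatable_effects().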

Why the choice matters

Pick the wrong effect and you either break the business (a premature Deny that blocks legitimate work) or achieve nothing (an Audit on a control that needed teeth). The art is matching effect to control intent and blast radius.

The operational reality of corrective effects

DeployIfNotExists and Modify are the only remediatable effects: they alone can change the existing estate, whereas the other effects only report on resources or block new requests. Two facts shape how you operate them:

  1. New or updated resources are handled automatically. Modify is applied in-line as part of the create/update request, while DINE evaluates after a configurable evaluationDelay: by default it waits ~10 minutes (and you can extend it for resources that provision slowly, so the existence check runs after the dependency is ready).
  2. Pre-existing non-compliant resources are not auto-fixed — you must create a remediation task (which runs the deployment/modification using a managed identity that needs the right RBAC role, e.g. Contributor or a targeted role on the scope). Ongoing remediation is typically driven from the pipeline or a scheduled job so newly-discovered drift gets swept up.
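The two operational facts above map onto a few fields in the details block of a DeployIfNotExists rule. The sketch below assembles a skeleton of that block and sanity-checks the delay string. It is a simplified illustration: the helper names are hypothetical, the role definition ID is a placeholder, the deployment template is left empty, and the parser handles only simple hour/minute durations.

```python
import re
from typing import Any

# Hypothetical placeholder: substitute the role the remediation identity needs.
ROLE_DEFINITION_ID = "/providers/Microsoft.Authorization/roleDefinitions/<role-definition-id>"


def delay_minutes(value: str) -> int:
    """Convert a simple ISO 8601 duration such as PT30M or PT1H30M into minutes."""
    match = re.fullmatch(r"PT(?:(\d+)H)?(?:(\d+)M)?", value)
    if not match or not any(match.groups()):
        raise ValueError(f"unsupported duration: {value!r}")
    hours, minutes = (int(group) if group else 0 for group in match.groups())
    return hours * 60 + minutes


def dine_details(related_type: str, evaluation_delay: str = "PT10M") -> dict[str, Any]:
    """Skeleton of the 'details' block for a DeployIfNotExists effect."""
    delay_minutes(evaluation_delay)  # raises on durations this sketch cannot parse
    return {
        "type": related_type,                       # the related resource to look for
        "evaluationDelay": evaluation_delay,        # wait before the existence check
        "roleDefinitionIds": [ROLE_DEFINITION_ID],  # roles granted to the managed identity
        "deployment": {
            "properties": {"mode": "incremental", "template": {}},  # template omitted here
        },
    }


# A slow-provisioning service: extend the delay from the default to 30 minutes.
details = dine_details("Microsoft.Insights/diagnosticSettings", "PT30M")
```

The same roleDefinitionIds drive the permissions the assignment's managed identity needs when a remediation task later runs against pre-existing resources.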

Effects like Audit, AuditIfNotExists, Deny, DenyAction, Disabled, and Manual have no remediation capability — non-compliance found through them is resolved by changing the resource yourself (or attesting, for Manual).

Concrete artifacts, decisions, and tools

Compliance frameworks

What it is

A compliance framework in Azure terms is a regulatory-compliance initiative — a built-in policy set that maps an external standard’s controls to concrete Azure Policy definitions (mostly Audit/AuditIfNotExists, some DeployIfNotExists). Azure ships maintained initiatives for the big regimes, and the default ALZ baseline assigns the Microsoft cloud security benchmark (MCSB) — the successor to the Azure Security Benchmark — as the foundational guardrail set. On top of MCSB you layer the regimes your business is actually subject to.

| Framework | Typical applicability | How it shows up in Azure |
| --- | --- | --- |
| Microsoft cloud security benchmark (MCSB) | Everyone (the ALZ default baseline) | Built-in initiative, the backbone of Defender for Cloud’s secure score |
| PCI-DSS | Card/payment data | Built-in regulatory-compliance initiative |
| HIPAA / HITRUST | US healthcare PHI | Built-in initiative |
| SOC 2 (Trust Services Criteria) | SaaS / service orgs | Built-in initiative |
| ISO/IEC 27001 | Broad infosec certification | Built-in initiative |
| NIST SP 800-53 / CSF | US federal, regulated industries | Built-in initiative |
| CIS Microsoft Azure Foundations Benchmark | Hardening baseline | Built-in initiative |
| Local/sovereign (RBI, GDPR-aligned, sovereign-cloud sets) | Region-specific obligations | Built-in + custom initiatives, often pinned at geo MG tiers |

Why it matters

Microsoft’s guidance is to map regulatory and compliance requirements to Azure Policy definitions and Azure role assignments and to assign the initiatives at the management-group level from day zero — because retrofitting compliance onto a populated estate is dramatically more expensive than inheriting it from the start. Assigning, say, the PCI-DSS and ISO 27001 initiatives at the right MG means every subscription that lands beneath it is born measured against those controls, and you get a continuous, auditor-ready compliance percentage instead of a once-a-year scramble.
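Assigning at the management-group level can be sketched as a small plan of scope-plus-body pairs. This is an assumption-laden illustration rather than a deployment script: build_assignment is a hypothetical helper, the initiative IDs are placeholders, and the management group names are invented for the example.

```python
from typing import Any, Sequence


def management_group_scope(mg_id: str) -> str:
    """Resource ID of a management group, used as the assignment scope."""
    return f"/providers/Microsoft.Management/managementGroups/{mg_id}"


def build_assignment(name: str, initiative_id: str, mg_id: str,
                     enforce: bool = True,
                     not_scopes: Sequence[str] = ()) -> dict[str, Any]:
    """Describe one initiative assignment: where it goes and what the body holds."""
    return {
        "scope": management_group_scope(mg_id),
        "name": name,
        "body": {
            "properties": {
                "displayName": name,
                "policyDefinitionId": initiative_id,
                # DoNotEnforce evaluates compliance without applying the effect.
                "enforcementMode": "Default" if enforce else "DoNotEnforce",
                "notScopes": list(not_scopes),
            }
        },
    }


# Placeholder initiative IDs and invented management group names.
PLAN = [
    build_assignment("mcsb-baseline", "<mcsb-initiative-id>", "northwind"),
    build_assignment("pci-dss", "<pci-dss-initiative-id>", "card-processing", enforce=False),
]
```

Keeping the plan as data makes the "which framework at which management group" decision reviewable in a pull request before anything is assigned.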

How to do it well

Concrete artifacts, decisions, and tools

Tag enforcement

What it is

Tags are key/value metadata on resources, resource groups, and subscriptions that carry what the name can’t or shouldn’t — owner, cost center, environment, data classification, criticality. Tag enforcement is the use of Azure Policy to guarantee tags exist, hold allowed values, and inherit correctly, because — critically — tags do not inherit by default: a resource does not automatically pick up its resource group’s or subscription’s tags.

Why it matters

Tags are the join key for almost every downstream governance function: cost allocation/showback in Cost Management, operational routing (who to page), automation targeting (start/stop, backup selection, sandbox expiry), and compliance reporting (which resources hold regulated data). A landing zone with weak tagging produces half-empty cost reports and unattributable resources. Because most tag values feed governance-critical processes and tags don’t propagate natively, enforcement is non-optional. Microsoft calls it out explicitly in the Governance design area, including using the Append effect to enforce required tags and Modify to manage them.

The enforcement pattern (the part that actually works)

| Policy intent | Built-in policy | Effect | Why this effect |
| --- | --- | --- | --- |
| Block resources missing a required tag | Require a tag and its value on resources | Deny | Stop non-compliant creation at source |
| Block RGs missing a required tag | Require a tag on resource groups | Deny | RGs are the inheritance source for resources |
| Make resources inherit a tag from their RG | Inherit a tag from the resource group | Modify | Tags don’t inherit natively; Modify can remediate existing |
| Make resources inherit a tag from the subscription | Inherit a tag from the subscription | Modify | Same, sourced from the subscription |
| Backfill a default tag value | Add or replace a tag on resources | Modify | Set org defaults; remediatable |

The deliberate design: Deny on mandatory tags at the RG/subscription level (the authoritative source), Modify (inherit) so child resources automatically acquire CostCenter/Environment/etc. from their parent, and remediation tasks to backfill the existing estate — not just new resources. Use Modify over Append for tags because Modify supports more operations and remediates existing resources, whereas Append only acts at create time.
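The Deny-at-source plus Modify-inherit design can be modelled in a few lines. The sketch below is a behavioural toy, not policy code: missing_rg_tags stands in for the Deny check on resource groups and inherit_tags for the add-or-replace behaviour of the inherit policies. Both function names and the tag list are assumptions for illustration.

```python
REQUIRED_RG_TAGS = ("CostCenter", "Environment")


def missing_rg_tags(rg_tags: dict[str, str],
                    required: tuple[str, ...] = REQUIRED_RG_TAGS) -> list[str]:
    """Tags a resource group lacks; a non-empty result models a Deny."""
    return [tag for tag in required if not rg_tags.get(tag)]


def inherit_tags(resource_tags: dict[str, str], rg_tags: dict[str, str],
                 inherited: tuple[str, ...] = REQUIRED_RG_TAGS) -> dict[str, str]:
    """Model Modify/inherit: add or replace each tag with the resource group's value."""
    result = dict(resource_tags)  # never mutate the caller's dict
    for tag in inherited:
        rg_value = rg_tags.get(tag)
        if rg_value and result.get(tag) != rg_value:
            result[tag] = rg_value
    return result


rg = {"CostCenter": "CC-1042", "Environment": "prod"}
vm = {"Owner": "team-routing", "Environment": "dev"}  # stale Environment value
fixed = inherit_tags(vm, rg)
```

Here the resource keeps its own Owner tag, gains CostCenter, and has its stale Environment value replaced by the resource group's, which is the outcome a remediation task would produce across an existing estate.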

How to do it well

Concrete artifacts, decisions, and tools

Real-world enterprise scenario

Northwind Logistics Cloud is a fictional pan-Asia freight and supply-chain platform: ~2,300 employees, a regulated-data profile (payments via a card-processing arm, plus India/Singapore data-residency obligations), and 80+ application teams running on the ALZ Bicep accelerator. Their Cloud Centre of Excellence (CCoE) works the Governance design area onto the management group hierarchy already deployed in Resource Organization (northwind intermediate root → Platform, Landing Zones {Corp, Online}, Sandbox, Decommissioned).

Azure Policy and initiatives. The CCoE deploys the ALZ default policy assignments module as its baseline, then authors three custom initiatives (defined at the northwind MG so they’re assignable anywhere): a Security Guardrails initiative, a Cost Guardrails initiative, and a Tagging initiative. Every new restrictive policy lands first as a Deny with enforcement mode DoNotEnforce for a two-week soak, during which they read the compliance dashboard, fix or exempt the backlog (each exemption carries a justification and a 90-day expiry), then flip to enforce. They hold assignments at the root MG to a deliberate minimum and push workload-specific rules down to Corp/Online. App-platform leads get Resource Policy Contributor scoped to their own subscription for app-level governance. Artifact: a policy-assignment matrix — MCSB + ALZ defaults + the three custom initiatives at northwind, residency Deny (allowed-locations centralindia, southindia, southeastasia) pinned at geo sub-tiers under Corp.
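The exemption discipline in this scenario (a justification plus a 90-day expiry) could be enforced in tooling along these lines. This is a hedged sketch: build_exemption is a hypothetical helper, the assignment ID is a placeholder, and the property names follow the general shape of a policy exemption body, so check them against the current API before relying on them.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Optional


def build_exemption(assignment_id: str, justification: str,
                    now: Optional[datetime] = None, days: int = 90,
                    category: str = "Waiver") -> dict[str, Any]:
    """Build an exemption body that always carries a justification and an expiry."""
    if not justification.strip():
        raise ValueError("an exemption needs a written justification")
    if category not in ("Waiver", "Mitigated"):
        raise ValueError(f"unknown exemption category: {category!r}")
    start = now or datetime.now(timezone.utc)
    expires = start + timedelta(days=days)
    return {
        "properties": {
            "policyAssignmentId": assignment_id,
            "exemptionCategory": category,
            "description": justification,
            "expiresOn": expires.strftime("%Y-%m-%dT%H:%M:%SZ"),
        }
    }


# Placeholder assignment ID; a fixed start date keeps the example deterministic.
exemption = build_exemption(
    "<policy-assignment-id>",
    "Legacy gateway retires next quarter; compensating firewall rule in place.",
    now=datetime(2025, 1, 1, tzinfo=timezone.utc),
)
```

Refusing to build an exemption without a justification, and always stamping an expiry, keeps the soak-period backlog from turning into permanent exceptions.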

Cost controls and budgets. Every vended subscription is born with a ₹/S$ budget and an action group alerting the owning team + FinOps at 60/80/100% of actual and forecast spend; dev/test subscriptions additionally trigger an automation runbook at 100% forecast that deallocates idle VMs. The Cost Guardrails initiative denies VM SKUs outside an approved list and blocks expensive regions. Northwind commits to a savings plan for compute covering ~70% of steady-state compute, layers Reservations on the always-on SQL and the hub firewall, applies Hybrid Benefit to Windows/SQL, and routes nightly route-optimization batch to Spot VMs. Cost Management is sliced by CostCenter and Application for monthly showback. Outcome: a structural ~31% reduction in compute run-rate within two quarters and zero surprise invoices, because budgets give forecast-based early warning and Policy caps what can be created.

Guardrails (effect choices). Their effect-per-control register: Deny for the non-negotiables (no public IP on Corp, residency, encryption-at-rest, no classic resources); DenyAction/DELETE protecting the central Log Analytics workspace, the hub Azure Firewall, and the platform key vaults; DeployIfNotExists to auto-deploy diagnostic settings to the central workspace, auto-enable Defender for Cloud plans on every subscription, and configure backup — with evaluationDelay extended to 30 minutes on a slow-provisioning data service; Modify for tag inheritance. Remediation tasks (running under a managed identity with a scoped Contributor role) sweep the brownfield estate weekly. The audit-first discipline means not a single Deny rollout has blocked legitimate work.

Compliance frameworks. MCSB stays assigned tenant-wide (it drives the Defender for Cloud secure score, which the CISO tracks at 78% and climbing). The PCI-DSS regulatory initiative is scoped only to the card-processing management group/subscriptions, ISO/IEC 27001 at northwind for the certification audit, and a custom residency initiative at the India/Singapore geo tiers. The CCoE maps each framework’s controls to Policy and RBAC (access reviews + PIM via Entra ID Governance), tracks status in Defender for Cloud’s Regulatory Compliance dashboard, and exports quarterly evidence packs. A Manual-effect set with an attestation register covers the process controls Azure can’t evaluate.

Tag enforcement. Five mandatory tags — Environment, CostCenter, Owner, Application, DataClassification — with enumerated allowed values, enforced by the Tagging initiative: Deny missing tags on resource groups and subscriptions, Modify/inherit so child resources acquire CostCenter and Environment from their RG, add-default for ManagedBy=bicep. Remediation tasks tagged ~21,000 pre-existing resources; Azure Resource Graph KQL audits compliance live.

Measurable outcome after two quarters: Defender for Cloud secure score 62% → 78%; 100% mandatory-tag compliance on new resources, 97% across the legacy estate (audited via Resource Graph); first PCI-DSS and ISO 27001 audits passed on continuous evidence rather than a manual scramble; compute run-rate down ~31% via savings-plan/reservation/Spot layering; zero residency violations (allowed-locations pinned and inherited at the geo MG tiers); change-advisory-board reviews for cloud deployments eliminated, replaced by automated guardrails.

Deliverables & checklist

Common pitfalls

  1. Going straight to Deny and breaking the business. A restrictive policy flipped to enforce on a populated estate blocks legitimate work and triggers a fire drill. Always land it as Audit (or Deny with enforcement mode DoNotEnforce) first, measure the non-compliant count, fix or exempt the backlog, then enforce.
  2. Expecting Deny/Audit to fix existing resources. Only DeployIfNotExists and Modify are remediatable — and even they don’t auto-fix pre-existing resources without a remediation task (and a managed identity with the right RBAC). Plan remediation explicitly; don’t assume the live estate self-heals.
  3. Confusing budgets with spending caps. A Cost Management budget alerts; it does not stop spend. To actually prevent cost you need Azure Policy SKU/region/type allow-lists. Use both — budgets to watch, Policy to cap — or you’ll get the alert after the money is gone.
  4. Assigning regulatory initiatives everywhere. Most regulatory-compliance built-ins are audit-only and noisy if blanket-applied. Scope each regime (PCI-DSS, etc.) to where it actually applies, keep MCSB as the broad baseline, and pair audit initiatives with your own Deny/DINE enforcement for the controls you must guarantee.
  5. Assuming tags inherit. Resources do not inherit their RG’s or subscription’s tags by default, so cost reports come out half-empty. Enforce mandatory tags with Deny at the RG/subscription source, propagate with Modify/inherit, and run remediation tasks on the legacy estate. Prefer Modify over Append for tags.
  6. Hoarding assignments at the root management group. Piling policies onto the Tenant Root / intermediate root forces endless exclusions at inherited scopes and risks hitting assignment limits. Define high, assign at the right altitude, exclude low — push workload-specific guardrails down to the archetype MGs.

What’s next

With governance guardrails, cost controls, and compliance baselines enforced as code, part 8 of the Azure Landing Zone Design Areas turns to Platform Automation and DevOps — the GitOps pipelines, IaC modules, and subscription-vending machinery that deploy and continuously reconcile everything you’ve designed across the previous seven design areas.

// part 7 of 8 · Azure Landing Zone Design Areas
