Almost every Azure estate I have been asked to rescue started the same way: a single subscription, Owner handed to a dozen people, a VNet someone sized at 10.0.0.0/16 “to be safe,” and governance that lived in a wiki page nobody read. It worked until it didn’t — until the second business unit wanted in, until an auditor asked who could create a public IP, until a VNet peering failed because two teams had both picked 10.0.0.0/16. The Enterprise-Scale Landing Zone exists to make those failures impossible by construction, before the first workload lands, by deciding the irreversible things once — centrally, as code — and then handing application teams a pre-governed subscription they can move fast inside without filing a ticket for every network or policy change.
This is not a methodology article about the Cloud Adoption Framework’s Ready phase; it is the deployable reference architecture itself — the management-group tree, the Virtual WAN secured-hub network, the Azure Firewall egress chokepoint, the ExpressRoute hybrid link, the Azure Policy guardrail set, and the Defender for Cloud security plane — described as one coherent system with an actual traffic path and an actual governance path through it. It follows the format of the major architecture centers: the scenario, the end-to-end flow, a component-by-component breakdown, concrete implementation and IaC wiring, the enterprise concerns, a named worked example with real numbers, and an honest section on when not to build the full thing.
The business scenario
Picture an organization standing at the threshold where ad-hoc Azure stops scaling. It might be a 600-person fintech that has outgrown its first three subscriptions, or a 40,000-person manufacturer migrating two data centers over eighteen months — the forcing functions are the same, and the architecture that answers them is the same shape at both sizes:
- A second team needs in, and “just give them Owner” no longer works. The moment more than one application team shares the estate, you need a boundary between what the platform controls (network egress, encryption, allowed regions, diagnostic logging) and what the team controls (everything inside their subscription). Without that boundary, every team can break every other team, and the security review never ends.
- An auditor or regulator is now in the picture. Someone — a PCI assessor, an ISO 27001 auditor, a board risk committee — asks a question the wiki cannot answer: who can expose a resource to the internet, prove every data store is encrypted with our keys, show me the control that enforces it, and show me it has never been violated. That demands policy-as-enforcement, not policy-as-documentation, and a security posture you can report on continuously.
- Hybrid connectivity has become load-bearing. On-prem ERP, a mainframe, a factory floor, or a regulator-mandated private link means traffic must reach Azure over a private circuit (ExpressRoute) — not the internet — and on-prem teams need a single, predictable, audited path into the cloud, not a mesh of one-off VPNs that no one can reason about.
- Subscriptions are multiplying faster than governance can keep up. Cost is sprawling with no per-team budget, IP ranges are colliding, and every new subscription is a fresh snowflake. The estate needs a vending model: minting a new, fully-governed, network-attached, budget-capped subscription must be a templated, same-day operation, not a two-week project.
The problem this architecture solves is precise: stand up a scalable, modular Azure foundation where central platform guardrails (identity, network, governance, security, operations) are enforced once and inherit automatically to hundreds of subscriptions, while application teams receive a pre-wired, policy-compliant subscription they can deploy into immediately — and where adding the next subscription, the next region, or the next on-prem site is a parameter change, not a project. The non-goals matter too. This is not “lock everything down so nothing ships”; the entire point is to make the safe path the fast path. And it is not a one-size build — the same conceptual architecture deploys “thin” for a mid-market company and “full” for a regulated enterprise; what changes is which optional planes you switch on, not the diagram.
Architecture overview
The organizing idea is a two-axis separation: a governance axis (a management-group hierarchy that carries policy and RBAC down by inheritance) crossed with a topology axis (a small set of platform subscriptions that host shared services, and a growing set of application subscriptions — the actual landing zones — that consume them). Decisions flow down the governance axis; traffic flows through the platform subscriptions. Get those two axes right and everything else is detail.
The governance path, top to bottom:
- At the apex sits the Microsoft Entra ID tenant, one identity boundary for the whole estate. Directly beneath the tenant root management group sits an intermediate root management group (e.g. contoso) that you own and can assign policy to without touching the tenant-root scope, which is shared with the rest of the tenant.
- Under the intermediate root, the hierarchy branches into the canonical Enterprise-Scale archetypes: Platform (with Management, Connectivity, and Identity children), Landing zones (with corp and online children), Sandbox, and Decommissioned. Each branch is a policy scope.
- Azure Policy initiatives are assigned at the management-group level and inherit downward: a “deny public IP on NICs” policy assigned at corp applies to every current and future subscription under it; a “require diagnostic settings to the central workspace” policy at the intermediate root applies to the entire estate. RBAC role assignments inherit the same way, so a platform-operator group granted Reader at the intermediate root sees everything without per-subscription grants.
- The leaves of the tree are subscriptions: three or four platform subscriptions, and N application subscriptions that are vended into the corp or online archetype already carrying the right policies, RBAC, budget, and network attachment.
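The inheritance described in the list above can be sketched by hand in Terraform. This is a minimal illustration, not the reference module's own code: the management-group names mirror the contoso example, and the `deny_public_ip_definition_id` variable is a hypothetical input standing in for whichever policy definition you use. It assumes a configured `azurerm` provider.

```hcl
# Sketch only: names and the variable below are hypothetical.
variable "deny_public_ip_definition_id" {
  type        = string
  description = "Resource ID of the policy definition that denies public IPs on NICs."
}

# Intermediate root. With no parent set, it is created under the tenant root group.
resource "azurerm_management_group" "contoso" {
  name         = "contoso"
  display_name = "Contoso"
}

resource "azurerm_management_group" "landing_zones" {
  name                       = "contoso-landingzones"
  display_name               = "Landing zones"
  parent_management_group_id = azurerm_management_group.contoso.id
}

resource "azurerm_management_group" "corp" {
  name                       = "contoso-landingzones-corp"
  display_name               = "Corp"
  parent_management_group_id = azurerm_management_group.landing_zones.id
}

# Assigned once at corp; every subscription placed under corp inherits it.
resource "azurerm_management_group_policy_assignment" "deny_public_ip" {
  name                 = "deny-public-ip"
  display_name         = "Deny public IPs on network interfaces"
  management_group_id  = azurerm_management_group.corp.id
  policy_definition_id = var.deny_public_ip_definition_id
}
```

Nothing in the sketch references a subscription: the assignment targets the management group, so a subscription vended under corp later picks up the policy with no further change.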
The traffic path, through the platform:
- An on-prem user or system reaches Azure over ExpressRoute (private peering), landing on an ExpressRoute Gateway inside a Virtual WAN secured hub in the nearest region. (Branch offices and remote users reach the same hub via Site-to-Site / Point-to-Site VPN gateways co-located in the hub.)
- The secured hub’s routing intent sends the flow to the hub’s Azure Firewall Premium (deployed as the hub’s security provider). The firewall is the policy enforcement point for the network: it inspects the flow, applies network/application rules, optionally runs TLS inspection + IDPS, and either forwards or drops.
- From the firewall, traffic routes to the destination spoke VNet (an application landing zone’s network), which is connected to the hub by Virtual WAN. A corp spoke is reachable only via the hub (no public ingress); an online spoke may additionally sit behind Azure Front Door / Application Gateway + WAF for internet ingress, but its egress still funnels through the hub firewall.
- East-west traffic between two spokes also transits the hub firewall (routing intent forces spoke-to-spoke through the security provider), so lateral movement between landing zones is inspected, not implicit.
- Outbound (egress) traffic from any spoke to the internet is forced through the hub firewall by the default route Virtual WAN programs into the spokes — there is no split-tunnel, no per-spoke NAT gateway, no workload talking directly to the internet. The firewall’s public IP / NAT is the single, known, allow-listed egress identity for the whole estate.
- Name resolution for private endpoints flows through Azure DNS Private Resolver (or DNS forwarders) in the connectivity hub, with the Private DNS zones centralized so every spoke resolves privatelink.* records consistently and on-prem can resolve Azure private endpoints over ExpressRoute.
Overlaying both axes is the management and security plane: every subscription’s diagnostic logs flow to a central Log Analytics workspace in the Management platform subscription, and Microsoft Defender for Cloud is enabled across the whole estate (via policy at the intermediate root) so secure-score, regulatory-compliance dashboards, and threat alerts span every landing zone, with detections wired into Microsoft Sentinel on that same workspace.
If you sketch this on a whiteboard it is two overlaid pictures. The governance picture is an org-chart-shaped tree: tenant root → intermediate root → {Platform → (Management, Connectivity, Identity); Landing zones → (corp, online); Sandbox; Decommissioned}, with little “policy” tags hanging off each node and arrows pointing down. The traffic picture is a hub-and-spoke star centered on the Virtual WAN secured hub: ExpressRoute and VPN gateways on one side feeding into the hub, the Azure Firewall sitting inside the hub as a chokepoint, and corp/online spokes radiating out — every arrow into or out of a spoke passing through the firewall in the middle.
Component breakdown
Each component does exactly one job: carry governance down, or carry traffic through, or watch the whole estate. The table summarizes; the prose covers the decisions that bite.
| Component | Role in the architecture | Key configuration choices |
|---|---|---|
| Management group hierarchy | The governance backbone — policy & RBAC inheritance scopes | Intermediate root under tenant root; archetypes Platform/{Mgmt,Conn,Identity}, Landing zones/{corp,online}, Sandbox, Decommissioned; assign at MG, never per-subscription |
| Platform subscriptions | Host shared, estate-wide services | Three: Management (Log Analytics, Defender, automation), Connectivity (hub, firewall, gateways, DNS), Identity (DC VMs / Entra DS if needed) — owned by the platform team |
| Application subscriptions (landing zones) | Where workloads actually run | Vended into corp (private, hub-routed) or online (internet-facing) archetype; pre-wired with policy, RBAC, budget, spoke VNet |
| Azure Virtual WAN | Microsoft-managed global network backbone & secured hubs | Standard SKU; one secured hub per region; routing intent for internet + private traffic; replaces hand-built hub-and-spoke + UDR sprawl |
| Azure Firewall (Premium) | The single network policy-enforcement & egress point | Deployed as the hub’s security provider; Premium for TLS inspection + IDPS + URL filtering; Firewall Policy managed centrally and inherited by child policies; forced-tunnel egress |
| ExpressRoute (+ Gateway) | Private, resilient hybrid connectivity to on-prem | Private peering; dual circuits across two peering locations for peering-location resilience (the availability SLA additionally needs both connections of each circuit in use); ExpressRoute Gateway in the secured hub; VPN as encrypted backup / for branches |
| Azure DNS Private Resolver + Private DNS zones | Consistent private-endpoint name resolution across the estate and on-prem | Centralized privatelink.* zones linked to the hub; resolver inbound/outbound endpoints so on-prem resolves Azure private endpoints over ER |
| Azure Policy (initiatives) | Policy-as-code guardrails enforced by inheritance | Curated initiative set at intermediate root / archetype MGs; mix of Deny, Audit, DeployIfNotExists, Modify; remediation tasks for existing resources |
| Microsoft Defender for Cloud | Cloud-native posture management (CSPM) + workload protection (CWPP) | Enabled estate-wide via policy; CSPM + per-resource-type plans; regulatory compliance packs (PCI/ISO/CIS); secure score as a governance KPI |
| Microsoft Sentinel | SIEM/SOAR on the central workspace | Connected to the same Log Analytics workspace; Defender + Entra + Firewall connectors; analytics rules and automated playbooks |
| Log Analytics workspace | Central observability sink for the whole estate | One workspace in the Management sub; diagnostic settings forced by policy on every resource; data-collection rules; retention by data class |
The management-group hierarchy is the single most consequential and least reversible decision in the whole architecture. Subscriptions are cheap to mint but painful to re-parent once they carry live, policy-bound resources — moving a production subscription to a different management group can flip dozens of inherited policy effects underneath running workloads. So you design the tree first, deploy it before any workload subscription exists, and keep it deliberately shallow and archetype-based rather than mirroring your org chart (org charts re-org; archetypes don’t). The corp vs online split is the one that earns its keep: it lets you assign genuinely different policy at the two scopes — corp denies public IPs outright and forces all traffic through the hub, while online permits a vetted public-ingress pattern (WAF in front) but still forces egress through the firewall. Assign policy and RBAC at the management-group level, never per subscription — the entire value proposition is “assign once, inherit everywhere,” and per-subscription assignments are exactly the drift you are trying to eliminate.
Virtual WAN is the deliberate choice over hand-built hub-and-spoke, and the reason is operational, not technical novelty. A classic hub-and-spoke works, but at scale it accretes a sprawl of user-defined routes, manual VNet peerings, and gateway-transit toggles that one person eventually “understands,” which is a liability. Virtual WAN gives you a Microsoft-managed hub where the backbone, route propagation, and gateway integration are handled for you, and — critically — routing intent lets you declare “send all internet-bound and all private traffic through the hub’s security provider (the firewall)” as a single policy instead of authoring and maintaining UDRs on every spoke. That declaration is what forces east-west and egress through the firewall by construction, and it scales to a global mesh (hubs in multiple regions, automatically interconnected over Microsoft’s backbone) by adding a hub, not by re-architecting. Use Standard Virtual WAN (Basic does not support secured hubs or ExpressRoute), and deploy a hub per region where you have landing zones.
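To make the routing-intent declaration concrete, here is a minimal sketch using the `azurerm_virtual_hub_routing_intent` resource. The two input variables are hypothetical placeholders for the secured hub and the Azure Firewall deployed inside it. If the landing-zone module you deploy already manages routing intent for the hub, a standalone resource like this would duplicate it, so treat it as an illustration of the concept rather than something to add alongside the module.

```hcl
# Sketch only: both variables are hypothetical inputs.
variable "virtual_hub_id" {
  type        = string
  description = "Resource ID of the Virtual WAN secured hub."
}

variable "hub_firewall_id" {
  type        = string
  description = "Resource ID of the Azure Firewall deployed in that hub."
}

resource "azurerm_virtual_hub_routing_intent" "via_firewall" {
  name           = "hub-routing-intent"
  virtual_hub_id = var.virtual_hub_id

  # Internet-bound traffic from every connected spoke goes to the firewall.
  routing_policy {
    name         = "InternetTrafficPolicy"
    destinations = ["Internet"]
    next_hop     = var.hub_firewall_id
  }

  # Spoke-to-spoke and on-prem traffic goes to the firewall as well.
  routing_policy {
    name         = "PrivateTrafficPolicy"
    destinations = ["PrivateTraffic"]
    next_hop     = var.hub_firewall_id
  }
}
```

The two policies together stand in for the per-spoke route tables a classic hub-and-spoke would need.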
Azure Firewall Premium is the estate’s chokepoint, and “Premium” is a deliberate regulatory/security call, not a default. The firewall is deployed as the secured hub’s security provider, which is what lets routing intent point traffic at it. Standard Azure Firewall gives you FQDN-based application rules, network rules, threat-intel filtering, and DNS proxy — enough for many mid-market estates. You step up to Premium when you need TLS inspection (decrypt-inspect-re-encrypt outbound flows, e.g. to enforce data-egress controls or catch malware in encrypted traffic), IDPS (signature-based intrusion detection/prevention), and URL filtering / web categories — typically a regulatory or high-security requirement. The structural decision that pays off is Firewall Policy with inheritance: define a parent policy with the estate-wide rules (block known-bad, allow the sanctioned egress FQDNs, the IDPS configuration) and let per-hub or per-environment child policies inherit it and add their own rules — the same “define once, inherit” pattern as Azure Policy, applied to network rules. And the egress identity matters operationally: because all outbound traffic exits through the firewall’s public IP(s), that becomes the single, stable, allow-listable source IP that SaaS vendors and partners whitelist for the entire organization.
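A parent/child Firewall Policy arrangement might look like the following sketch. Resource names, the source range, and the `api.example.com` FQDN are illustrative assumptions, and TLS inspection is omitted because it needs a certificate and Key Vault wiring that the article does not specify.

```hcl
# Sketch only: names, addresses, and the FQDN are hypothetical.
variable "resource_group_name" {
  type = string
}

variable "location" {
  type = string
}

# Estate-wide baseline: threat intelligence and IDPS in deny mode.
resource "azurerm_firewall_policy" "parent" {
  name                     = "fwpol-estate-parent"
  resource_group_name      = var.resource_group_name
  location                 = var.location
  sku                      = "Premium"
  threat_intelligence_mode = "Deny"

  intrusion_detection {
    mode = "Deny"
  }
}

# Per-hub child: inherits the parent's rules through base_policy_id.
resource "azurerm_firewall_policy" "child_westeurope" {
  name                = "fwpol-hub-westeurope"
  resource_group_name = var.resource_group_name
  location            = var.location
  sku                 = "Premium"
  base_policy_id      = azurerm_firewall_policy.parent.id
}

# Rules that apply only to this hub live on the child policy.
resource "azurerm_firewall_policy_rule_collection_group" "child_rules" {
  name               = "rcg-hub-westeurope"
  firewall_policy_id = azurerm_firewall_policy.child_westeurope.id
  priority           = 500

  application_rule_collection {
    name     = "allow-sanctioned-egress"
    priority = 500
    action   = "Allow"

    rule {
      name              = "allow-example-saas"
      source_addresses  = ["10.20.0.0/22"]
      destination_fqdns = ["api.example.com"]

      protocols {
        type = "Https"
        port = 443
      }
    }
  }
}
```

The hub's firewall would then be associated with the child policy, so a change to the parent reaches every hub without touching the per-hub rules.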
ExpressRoute is where the resilience math is real, and “one circuit” is the classic trap. Microsoft’s ExpressRoute availability SLA covers a circuit only when both of its redundant connections (primary and secondary) are provisioned and in use, and even then a single circuit depends on a single peering location, so losing that location takes all hybrid connectivity with it. The resilient design is two circuits terminating at two different peering locations (geo-redundant), each on diverse on-prem links. The ExpressRoute Gateway lives in the secured hub (Virtual WAN supports this directly), and you keep a Site-to-Site VPN as an encrypted backup path so a circuit failure degrades to VPN rather than dropping on-prem entirely. ExpressRoute private peering carries your routes and your address space and does not traverse the public internet, which is precisely why regulated workloads and latency-sensitive on-prem integrations require it. Branch offices that don’t justify a circuit, and remote/admin users, attach to the same hub via S2S/P2S VPN, so on-prem of every shape converges on one audited entry point.
Defender for Cloud and Policy are two halves of one governance loop, and the wiring between them is the point. Azure Policy is the enforcement engine — Deny stops a non-compliant resource at creation, DeployIfNotExists and Modify auto-remediate (e.g. deploy the Defender agent, add diagnostic settings, append the required tag). Defender for Cloud is the posture engine — it continuously assesses every resource against security recommendations, rolls them up into a secure score, and maps them to regulatory compliance packs (PCI DSS, ISO 27001, CIS Azure, NIST). The connection: you turn on Defender’s plans via Azure Policy assigned at the intermediate root (so every current and future subscription is covered automatically), and Defender’s findings feed back as the audit signal that tells you whether your policy set is actually sufficient. Detections (alerts) flow into Microsoft Sentinel on the central workspace, where analytics rules and automated playbooks (Logic Apps) turn a Defender alert into an investigation or an auto-containment action. The whole estate’s security becomes one dashboard and one score, not a per-subscription scavenger hunt.
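The enforcement half of that loop has one wiring detail that is easy to miss: a DeployIfNotExists or Modify assignment runs its deployments as a managed identity, and that identity needs role assignments before remediation can succeed. The sketch below shows the shape under stated assumptions: the variables are hypothetical, the `logAnalytics` parameter name depends on the policy definition you assign, and the role shown is a guess at what such a definition would require (the definition itself lists the roles it needs).

```hcl
# Sketch only: variables, the parameter name, and the role are assumptions.
variable "root_management_group_id" {
  type        = string
  description = "Resource ID of the intermediate root management group."
}

variable "diagnostics_policy_definition_id" {
  type        = string
  description = "Resource ID of a DeployIfNotExists policy definition."
}

variable "central_workspace_id" {
  type        = string
  description = "Resource ID of the central Log Analytics workspace."
}

resource "azurerm_management_group_policy_assignment" "diagnostics" {
  name                 = "deploy-diagnostics"
  management_group_id  = var.root_management_group_id
  policy_definition_id = var.diagnostics_policy_definition_id
  location             = "westeurope" # needed because the assignment carries an identity

  # DeployIfNotExists and Modify run their deployments as this identity.
  identity {
    type = "SystemAssigned"
  }

  # Parameter names come from the policy definition; "logAnalytics" is assumed.
  parameters = jsonencode({
    logAnalytics = { value = var.central_workspace_id }
  })
}

# Grant the identity the role the definition requires (assumed here).
resource "azurerm_role_assignment" "diagnostics_remediator" {
  scope                = var.root_management_group_id
  role_definition_name = "Monitoring Contributor"
  principal_id         = azurerm_management_group_policy_assignment.diagnostics.identity[0].principal_id
}

# Brings resources that existed before the assignment into compliance.
resource "azurerm_management_group_policy_remediation" "diagnostics" {
  name                 = "remediate-diagnostics"
  management_group_id  = var.root_management_group_id
  policy_assignment_id = azurerm_management_group_policy_assignment.diagnostics.id
}
```

New resources are handled by the assignment alone; the remediation resource exists only to catch what was already deployed.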
Implementation guidance
The entire landing zone should be expressed as infrastructure-as-code in a Git repository, deployed through a reviewed pipeline — the platform itself becomes code that is version-controlled, peer-reviewed, and reversible. You do not click this together in the portal; a clicked-together landing zone has no source of truth, drifts immediately, and cannot answer “who changed the firewall policy and when.” The two mainstream paths are Azure Verified Modules (AVM) for Platform Landing Zones — the Microsoft-owned, tested module library, available for both Bicep and Terraform — and, for Terraform shops specifically, the long-standing Azure/caf-enterprise-scale (ALZ) module. Both deploy the same conceptual architecture; pick by your team’s IaC skill.
IaC structure (Terraform with the ALZ pattern shown; AVM/Bicep maps one-to-one with modules and parameter files):
    # 1. The governance backbone: management-group hierarchy + policy assignments,
    #    deployed FIRST, before any workload subscription exists.
    module "enterprise_scale" {
      source = "Azure/caf-enterprise-scale/azurerm"

      root_parent_id = data.azurerm_client_config.core.tenant_id
      root_id        = "contoso" # the intermediate root MG
      root_name      = "Contoso Platform"

      # Turn on the canonical archetypes (Platform, Landing zones/corp+online, Sandbox, Decommissioned)
      deploy_core_landing_zones = true

      # Deploy the platform resources INTO the platform subscriptions:
      deploy_management_resources   = true # Log Analytics + Defender + Sentinel solutions
      subscription_id_management    = var.sub_management
      deploy_connectivity_resources = true # the network hub lives here
      subscription_id_connectivity  = var.sub_connectivity
      deploy_identity_resources     = true
      subscription_id_identity      = var.sub_identity

      # The opinionated thing: deploy a Virtual WAN secured hub, not classic hub-and-spoke.
      configure_connectivity_resources = {
        settings = {
          vwan_hub_networks = [{
            config = {
              address_prefix = "10.100.0.0/23"
              location       = "westeurope"
              # Routing intent: force internet + private traffic through Azure Firewall.
              secure_hub = {
                enabled              = true
                azure_firewall_sku   = "Premium" # TLS inspection + IDPS
                expressroute_gateway = true
                vpn_gateway          = true
              }
            }
          }]
        }
      }
    }

    # 2. Subscription vending: minting a NEW, pre-governed application landing zone.
    #    Each call yields a subscription already in the corp/online archetype,
    #    spoke-attached to the hub, budget-capped, and policy-compliant on day one.
    module "lz_payments" {
      source = "Azure/lz-vending/azurerm" # the subscription-vending module

      subscription_alias_enabled = true
      subscription_billing_scope = var.mca_billing_scope
      subscription_display_name  = "lz-payments-prod"

      # Place it under the corp archetype -> inherits "deny public IP", "force hub route", etc.
      subscription_management_group_id = "contoso-landingzones-corp"

      # Pre-wire the spoke into the Virtual WAN hub (no manual peering, ever).
      virtual_network_enabled = true
      virtual_networks = {
        spoke = {
          address_space               = ["10.20.0.0/22"] # from the central IPAM allocation
          vwan_connection_enabled     = true
          vwan_hub_resource_id        = module.enterprise_scale.vwan_hub_id
          vwan_security_configuration = { secure_internet_traffic = true } # egress via firewall
        }
      }

      # Budget + owner RBAC handed to the app team at the subscription scope only.
      budget_amount = 8000
      role_assignments = {
        team = { principal_id = var.payments_team_group, definition = "Contributor", scope = "subscription" }
      }
    }
Three implementation rules carry most of the weight:
- Deploy the hierarchy and platform before the first workload, and never re-parent live subs. The management-group tree and the three platform subscriptions go in first. Order is not cosmetic: a subscription created outside the tree and moved in later inherits a different set of policy effects at move-time, under load. Build the tree, then vend into it.
- Vend subscriptions; never hand-craft them. Every application landing zone comes from the same vending module call — that is what guarantees it lands in the right archetype, attached to the hub, budgeted, and compliant. A hand-made subscription is a snowflake that the platform team will spend the next year reconciling.
- Allocate IP space from a central plan, not by improvisation. A spoke’s address_space comes from a documented IPAM allocation (Azure Virtual Network Manager now offers native IPAM; a spreadsheet or scripted source of truth also works) so ranges never collide. Overlapping CIDRs are the single most common reason a spoke cannot connect to the hub, and re-IP-ing a live VNet is brutal.
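The third rule can be kept honest even without a dedicated IPAM service by deriving every spoke range from one supernet in code. The sketch below uses Terraform's built-in `cidrsubnet` function; the supernet and the landing-zone names are hypothetical, chosen so the first allocation matches the 10.20.0.0/22 spoke in the vending example.

```hcl
locals {
  # Hypothetical supernet reserved for application spokes.
  spoke_supernet = "10.20.0.0/16"

  # Hypothetical landing zones. Append only, so existing allocations never shift.
  landing_zones = ["payments", "ledger", "reporting"]

  # /16 plus 6 new bits gives one /22 per spoke, carved consecutively by list index.
  spoke_prefixes = {
    for idx, name in local.landing_zones :
    name => cidrsubnet(local.spoke_supernet, 6, idx)
  }
}

output "spoke_prefixes" {
  value = local.spoke_prefixes
}
```

With these inputs the output maps payments to 10.20.0.0/22, ledger to 10.20.4.0/22, and reporting to 10.20.8.0/22. Because each range is derived from its position in one list, two spokes cannot be handed the same block, provided entries are only ever appended.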
Networking and identity wiring. The Connectivity subscription hosts everything shared on the wire: the Virtual WAN, the secured hub(s), the Azure Firewall, the ExpressRoute and VPN gateways, the DNS Private Resolver, and the centralized privatelink.* Private DNS zones. Application spokes live in their own subscriptions and connect to the hub via Virtual WAN connections — there is no spoke-to-spoke peering and no per-spoke internet egress; routing intent forces both through the hub firewall. Identity is group-and-PIM-first: humans never get standing privileged access — they are made eligible for roles via Microsoft Entra Privileged Identity Management and elevate just-in-time with approval and an audit trail, and all access flows through Entra security groups mapped to roles at the management-group scope so it inherits. Workloads authenticate with managed identities (no secrets, no connection strings in code), and secrets that must exist live in Key Vault behind a private endpoint resolved through the central DNS. Defender’s plans are switched on through policy at the intermediate root so coverage follows every new subscription automatically — you never “remember” to protect a new landing zone.
Enterprise considerations
Security and Zero Trust. The architecture is Zero-Trust-shaped by construction. Verify explicitly: Entra ID + PIM means no standing admin rights and every elevation is approved and logged; Conditional Access gates the control plane. Least privilege: RBAC inherits from management groups so app teams get Contributor on their subscription and nothing more, and the corp archetype denies them the ability to even create a public IP. Assume breach: the Azure Firewall inspects east-west traffic between spokes (lateral movement is not implicit — it is inspected, and can be denied), TLS inspection + IDPS on egress catches exfiltration and malware in encrypted flows, every data service sits behind a private endpoint with public network access disabled (enforced by policy), and Defender for Cloud plus Sentinel give continuous detection across the whole estate. Network micro-segmentation is real here: a compromise in one landing zone’s spoke cannot reach another spoke without transiting the firewall, where it is logged and (if it violates a rule) dropped. Pair Defender’s regulatory compliance dashboard (PCI/ISO/CIS) with the policy set so “are we compliant” is a live query, not an annual audit scramble.
Cost optimization. The platform itself has a real, mostly-fixed cost that you must own and justify before the first workload, and naming it up front prevents the “why is the empty landing zone expensive?” conversation later. The big-ticket fixed items are the Azure Firewall (Premium carries a meaningful hourly + data-processing charge and is frequently the largest single line in the connectivity subscription), the ExpressRoute circuits (port + the chosen bandwidth tier, doubled for the geo-redundant pair), the Virtual WAN hub and gateway scale units, and the Log Analytics ingestion (which grows with the estate). Levers that genuinely reduce the bill without breaking the pattern: share the firewall and gateways across all spokes (centralizing them in the hub is already a cost optimization versus a firewall per VNet); right-size to Standard Azure Firewall if you don’t need TLS inspection/IDPS; apply reservations or savings plans to always-on platform VMs and any other service that is actually eligible (check eligibility first: commitment discounts are compute-centric, and the firewall and gateways are generally billed pay-as-you-go); tier Log Analytics with the Basic/Auxiliary logs plans and a Data Collection Rule that drops noisy verbose logs at ingestion; and, as the governance lever, put a budget with action-group alerts on every vended subscription so each app team sees and owns its spend, plus Azure Advisor cost recommendations to catch idle and over-provisioned resources. The honest framing: the landing zone is shared overhead amortized across every workload. The more landing zones ride on it, the cheaper it is per workload, which is itself the argument for vending many subscriptions onto one platform rather than each team building its own.
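The per-subscription budget lever might be expressed as the following sketch, using the `azurerm_consumption_budget_subscription` resource. The amount reuses the 8000 figure from the vending example; the thresholds, resource name, and all three variables are illustrative assumptions rather than recommended values.

```hcl
# Sketch only: variables and thresholds are hypothetical.
variable "subscription_id" {
  type        = string
  description = "Subscription resource ID, in the form /subscriptions/<guid>."
}

variable "action_group_id" {
  type        = string
  description = "Resource ID of the Azure Monitor action group to notify."
}

variable "budget_start" {
  type        = string
  description = "RFC 3339 timestamp for the first day of a month."
}

resource "azurerm_consumption_budget_subscription" "team" {
  name            = "budget-lz-payments-prod"
  subscription_id = var.subscription_id
  amount          = 8000
  time_grain      = "Monthly"

  time_period {
    start_date = var.budget_start
  }

  # Warn the team when actual spend reaches 80% of the budget.
  notification {
    enabled        = true
    operator       = "GreaterThanOrEqualTo"
    threshold      = 80
    threshold_type = "Actual"
    contact_groups = [var.action_group_id]
  }

  # Warn again when the forecast says the month will end over budget.
  notification {
    enabled        = true
    operator       = "GreaterThanOrEqualTo"
    threshold      = 100
    threshold_type = "Forecasted"
    contact_groups = [var.action_group_id]
  }
}
```

A budget of this kind alerts but does not stop spend; anything stronger would have to be an action the action group triggers.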
Scalability. Growth is a parameter, not a project — that is the whole promise. Another team? A single vending-module call yields a new governed, hub-attached, budgeted subscription the same day; policy and RBAC inherit automatically, so it is compliant on creation with zero manual setup. Another region? Add a vwan_hub_networks entry — Virtual WAN stands up the new secured hub and auto-connects it to the existing hubs over Microsoft’s backbone, and spokes in that region get the same firewall-forced routing by inheritance. Another on-prem site? Attach it as a VPN connection (or another ExpressRoute) to the nearest hub. The architecture is explicitly modular — platform decisions are made once and reused — so it scales from three subscriptions to several hundred without a re-platform. The constraints to watch are the ones you sized early: the management-group depth (keep it shallow), the IP plan (allocate generously from a central IPAM so you don’t run out of contiguous space), and the firewall’s throughput tier (scale the hub’s firewall up before egress saturates it).
Reliability and DR (RTO/RPO). Reliability here is mostly about the platform services, since the workloads bring their own DR, and each number below belongs to a specific tier and should not be conflated with the others. The connectivity hub is the critical shared dependency: ExpressRoute reliability comes from the two-circuits-at-two-peering-locations design (a single circuit remains a single point of failure at its peering location, even with both of its redundant connections up), with VPN as the automatic encrypted backup so a circuit loss degrades rather than disconnects; Virtual WAN hubs and Azure Firewall are zone-redundant within a region when deployed across availability zones, and multiple regional hubs mean a regional hub failure is survivable for a multi-region estate. The governance and management planes are resilient by nature: management groups and policy live in Azure Resource Manager (no DR to manage), and the central Log Analytics workspace can be regionally paired or its critical data exported. The DR numbers that matter for the platform: aim for the connectivity path to survive a circuit failure with near-zero RTO (VPN backup takes over) and a regional hub failure with an RTO of minutes (traffic re-homes to another regional hub), and document that workload RTO/RPO is the application team’s responsibility inside their landing zone, with the platform providing the substrate (paired regions, backup vault patterns, Site Recovery options) but not owning each app’s recovery. The non-negotiable practice: rehearse the failovers. Fail an ExpressRoute circuit in a drill and confirm VPN carries on-prem traffic; the DR you have tested beats the DR you have drawn.
Observability. The central Log Analytics workspace in the Management subscription is the single pane: Azure Policy forces diagnostic settings onto every resource in the estate (a DeployIfNotExists policy at the intermediate root), so logs and metrics from every subscription land in one place automatically — no team can “forget” to wire up logging. Data Collection Rules shape what is ingested (and at what cost tier), Microsoft Sentinel sits on that workspace as the SIEM with connectors for Defender, Entra ID sign-in/audit logs, and the Azure Firewall, and Azure Monitor workbooks/alerts plus the Defender secure-score trend give the platform team estate-wide health and posture at a glance. The signals unique to this architecture to alert on: firewall deny-rate and IDPS hits (a spike is either an attack or a misconfigured app), ExpressRoute circuit BGP / availability (the hybrid lifeline), policy-compliance drift (a sudden drop means someone created non-compliant resources or a remediation failed), Defender secure-score regressions, and budget burn per subscription.
Governance. This is the governance article, so the meta-point is that governance is enforced as code and inherited, not documented and hoped for. The curated Azure Policy initiative assigned at the intermediate root / archetype scopes is the heart of it — a deliberate mix of Deny (no public IPs in corp, no public network access on PaaS data services, only approved regions, only approved resource types), DeployIfNotExists/Modify (auto-deploy diagnostic settings and Defender agents, append mandatory tags: CostCentre, Environment, Owner, DataClassification), and Audit (everything you want visibility on before you enforce). Start from the AVM/ALZ baseline initiatives and customize deliberately — add the controls your compliance regime demands, and keep a policy-exemptions register documenting every intentional deviation so exceptions are auditable rather than invisible. Cost governance rides the same inheritance: budgets at the management-group and subscription scope, Cost Management views per archetype, and tag-enforcement so every dollar is attributable. The result is the boundary the scenario demanded: central control where it matters (security, network, compliance, cost visibility), team velocity everywhere else.
Reference enterprise example
Meridian Pay is a fictional mid-market payments company (≈900 employees, UK + EU, PCI-DSS Level 1) that had grown to seven Azure subscriptions the organic way: a flat estate, Owner shared widely, three teams’ VNets with overlapping 10.0.0.0/16 ranges, and “governance” in a Confluence page. Two forcing events converged. First, their QSA flagged that they could not demonstrate a control (only a policy document) preventing public exposure of cardholder-data stores, nor prove every store used customer-managed encryption — a finding that threatened their attestation. Second, a new bank partner mandated private connectivity (no internet path) for the settlement integration, which their VPN-mesh could not deliver in an auditable way. The board signed off on a landing-zone foundation with a hard target: pass the next PCI assessment with policy-enforced controls, deliver private bank connectivity over ExpressRoute, and onboard new application teams in days, not weeks.
What they built. A single meridian intermediate root management group with the canonical archetypes. Three platform subscriptions — Management (central Log Analytics + Defender for Cloud + Sentinel), Connectivity (a Virtual WAN Standard secured hub in uksouth with a second in westeurope), and Identity. The hub ran Azure Firewall Premium with TLS inspection and IDPS enabled — a direct PCI-driven decision so egress from the cardholder-data environment is decrypted, inspected, and logged. Dual ExpressRoute circuits terminated at two peering locations gave the bank settlement integration a private, SLA-backed path, with S2S VPN as the encrypted backup. Their cardholder-data workloads landed in the corp archetype (public IPs denied by policy, all traffic forced through the hub firewall, private endpoints mandatory); the customer-facing payment-status portal landed in online (Front Door + WAF for ingress, but egress still funneled through the firewall). The whole thing was deployed from a Git repo via Azure DevOps using the ALZ Terraform module, and new landing zones came from the lz-vending module.
A decision that bit, and the fix. Their first instinct was to keep the existing seven subscriptions and “just move them under the new management-group tree.” In a staging rehearsal they discovered that moving a live subscription into the corp archetype immediately triggered the Deny public IP and private-endpoint-required policies against running resources that had public IPs and public PaaS endpoints — the policies didn’t delete anything, but new deployments and scale operations started failing, and several resources showed as non-compliant overnight. The lesson — governance applies the instant you re-parent, under load — pushed them to a migrate-via-vending approach instead: they vended fresh compliant subscriptions, redeployed workloads into them from IaC, remediated the public-exposure findings during the move, and decommissioned the old snowflakes into the Decommissioned archetype. It was more work than a re-parent, but it turned the PCI finding into a fixed state rather than a flood of exemptions.
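The staging rehearsal amounts to a pre-flight check: before re-parenting a subscription into corp, list every resource that would sit in violation of the Deny rules. The sketch below assumes an inventory has already been exported into simple records; `InventoryItem`, `preflight`, `safe_to_reparent`, and the set of PaaS data kinds are hypothetical names for illustration, not an Azure API.

```python
from dataclasses import dataclass

# Assumption: which resource kinds count as PaaS data services for this check.
PAAS_DATA_KINDS = {"sql", "storage", "keyvault"}


@dataclass(frozen=True)
class InventoryItem:
    """One exported resource record (hypothetical shape)."""
    resource_id: str
    kind: str
    has_public_ip: bool = False
    public_network_access: bool = False
    has_private_endpoint: bool = False


def preflight(inventory):
    """Map resource_id -> reasons it would violate the corp guardrails."""
    findings = {}
    for item in inventory:
        reasons = []
        if item.has_public_ip:
            reasons.append("public IP attached")
        if item.kind in PAAS_DATA_KINDS:
            if item.public_network_access:
                reasons.append("public network access enabled")
            if not item.has_private_endpoint:
                reasons.append("no private endpoint")
        if reasons:
            findings[item.resource_id] = reasons
    return findings


def safe_to_reparent(findings):
    """Only re-parent in place when nothing would land non-compliant."""
    return not findings


inventory = [
    InventoryItem("vm-legacy-01", "vm", has_public_ip=True),
    InventoryItem("sql-cards", "sql", public_network_access=True),
    InventoryItem("st-logs", "storage", has_private_endpoint=True),
]
for resource_id, reasons in preflight(inventory).items():
    print(resource_id, "->", ", ".join(reasons))
```

A non-empty result is the signal Meridian acted on: remediate first, or vend a fresh subscription and redeploy into it, rather than moving the subscription and discovering the violations in production.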
The numbers.
| Dimension | Before (organic estate) | After (Enterprise-Scale Landing Zone) |
|---|---|---|
| Governance model | Confluence page, Owner shared | ~70 Azure Policy assignments enforced by MG inheritance |
| PCI control evidence | Documentation only | Live Defender regulatory-compliance dashboard + Deny policies |
| Public-exposure risk (CDE) | Possible, unaudited | Deny public IP + private-endpoint-required in corp |
| Bank connectivity | VPN mesh, internet-adjacent | Dual ExpressRoute, private peering, VPN backup |
| Egress path | Per-team, uninspected | Single firewall, TLS-inspected + IDPS, one allow-listed egress IP |
| New team onboarding | ~2–3 weeks, manual | Vended subscription, same day, compliant on creation |
| Privileged access | Standing Owner | PIM just-in-time, approved & audited |
| Defender secure score | Not measured | >82% within the first month |
The decision worth copying is the migrate-via-vending path: rather than dragging live, non-compliant subscriptions under enforcement and drowning in exemptions, Meridian treated the landing zone as a clean foundation and moved workloads into it from IaC, remediating as they went. The outcome after one quarter: the PCI assessment passed with the public-exposure and encryption controls demonstrated as enforced policy (not documents), the bank settlement integration ran over private ExpressRoute with a tested VPN failover, eleven application landing zones were vended onto the shared platform (each carrying its own budget and inheriting every guardrail), and the platform team’s per-subscription toil effectively vanished because compliance was now a property of creation, not a follow-up task.
When to use it
Use this architecture when more than one team shares your Azure estate, governance and compliance must be enforced and provable (not documented), hybrid connectivity is load-bearing (ExpressRoute/VPN to on-prem), and you expect the estate to grow to many subscriptions — and when the organization will fund a shared platform team and the platform’s fixed cost (firewall, ExpressRoute, hub, central logging) as overhead amortized across workloads. It scales from a mid-market company running the “thin” version (a single region, Standard firewall, the core archetypes) to a regulated global enterprise running the “full” version (multi-region hubs, Premium firewall with TLS/IDPS, dual ExpressRoute, a large policy estate). The diagram is the same; you choose which optional planes to switch on.
Trade-offs to accept going in. There is real fixed cost and real platform-engineering effort before any workload ships — the firewall, gateways, and central logging cost money on day one, and standing up the IaC + pipelines + policy set is weeks of senior work. You are buying a foundation, and a foundation has up-front cost that only pays back as workloads land on it. The discipline tax is also real: subscriptions must be vended (not hand-made), IP space must be allocated (not improvised), and changes go through pipelines (not the portal) — teams that resist that discipline will route around the platform and recreate the sprawl.
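"Allocated, not improvised" can be as small as a function that hands out the next free block from a central pool and a check that catches collisions before peering does. The following is a minimal sketch using Python's standard `ipaddress` module; the pool `10.100.0.0/16`, the team names, and the helper names `find_overlaps` and `allocate` are assumptions for illustration, and a real platform would typically back this with an IPAM service or a state file.

```python
import ipaddress


def find_overlaps(ranges):
    """Return pairs of names whose CIDR ranges overlap."""
    nets = [(name, ipaddress.ip_network(cidr)) for name, cidr in ranges.items()]
    overlaps = []
    for i, (name_a, net_a) in enumerate(nets):
        for name_b, net_b in nets[i + 1:]:
            if net_a.overlaps(net_b):
                overlaps.append((name_a, name_b))
    return overlaps


def allocate(supernet, allocated, new_prefix):
    """Return the first block of size /new_prefix in supernet that is still free."""
    pool = ipaddress.ip_network(supernet)
    taken = [ipaddress.ip_network(cidr) for cidr in allocated]
    for candidate in pool.subnets(new_prefix=new_prefix):
        if not any(candidate.overlaps(t) for t in taken):
            return str(candidate)
    raise ValueError(f"no free /{new_prefix} left in {supernet}")


# The organic estate: two teams both picked the same range.
organic = {"team-a": "10.0.0.0/16", "team-b": "10.0.0.0/16", "team-c": "10.1.0.0/16"}
print(find_overlaps(organic))

# The vended estate: the next spoke gets the first free /24 from the pool.
print(allocate("10.100.0.0/16", ["10.100.0.0/22", "10.100.4.0/24"], 24))
```

Calling `allocate` from the vending pipeline, rather than letting teams pick a range, is what turns the IP plan from a convention into a property of creation.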
Anti-patterns that quietly defeat the design:
- A flat estate with no management groups, “we’ll add governance later.” Governance bolted on after workloads are live means re-parenting production subscriptions under enforcement — exactly the painful, exemption-flooding migration Meridian hit. Build the tree first.
- Per-subscription policy and RBAC instead of inheritance. Assigning policy/RBAC on each subscription individually is the drift you are trying to eliminate — and it does not scale past a handful of subscriptions. Assign at the management-group scope and let it inherit.
- A single ExpressRoute circuit. One circuit at one peering location is a single point of failure for all hybrid connectivity: a provider or peering-location outage takes every on-prem path down at once. Run two circuits at two peering locations, plus VPN backup, or accept that on-prem connectivity has no resilience against that failure.
- Per-spoke internet egress / split tunnel. The moment a workload talks to the internet directly (its own NAT gateway, a public IP on the VM), you have lost the single inspected, allow-listable egress point — and your firewall logs no longer tell the whole story. Force all egress through the hub via routing intent.
- Clicking the landing zone together in the portal. No source of truth, instant drift, and no answer to “who changed this and when.” The platform must be code in Git, deployed through a reviewed pipeline.
- Hand-crafting every new subscription. Snowflake subscriptions that the platform team perpetually reconciles. Vend from one module so every landing zone is identical and compliant on creation.
- Building the full enterprise-scale rig when you needed the thin version (or vice versa). A three-team company stalling for months on Premium-firewall-plus-dual-ExpressRoute it doesn’t need, or a regulated enterprise standing up a flat “start small” estate it must re-platform in a year. Match the switched-on planes to your actual maturity and compliance obligations — but keep the management-group bones in both cases, because that is the part that is expensive to retrofit.
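The per-subscription anti-pattern above is cheap to detect once assignments are exported as (name, scope) pairs: anything scoped below a management group is a candidate for drift. This is a rough sketch under that assumption. The scope strings follow the standard Azure resource ID shapes, while `scope_level`, `direct_assignments`, and the sample data are hypothetical, and some direct assignments will be legitimate and belong in the exemptions register rather than being removed.

```python
import re

MG_SCOPE = re.compile(
    r"^/providers/Microsoft\.Management/managementGroups/[^/]+$", re.IGNORECASE)
SUB_SCOPE = re.compile(r"^/subscriptions/[^/]+$", re.IGNORECASE)
RG_SCOPE = re.compile(r"^/subscriptions/[^/]+/resourceGroups/[^/]+$", re.IGNORECASE)


def scope_level(scope):
    """Classify an assignment scope by the level it was made at."""
    if MG_SCOPE.match(scope):
        return "management-group"
    if SUB_SCOPE.match(scope):
        return "subscription"
    if RG_SCOPE.match(scope):
        return "resource-group"
    return "other"


def direct_assignments(assignments):
    """Return (name, level) for every assignment not made at management-group scope."""
    flagged = []
    for name, scope in assignments:
        level = scope_level(scope)
        if level != "management-group":
            flagged.append((name, level))
    return flagged


assignments = [
    ("deny-public-ip", "/providers/Microsoft.Management/managementGroups/meridian-corp"),
    ("allowed-regions", "/subscriptions/00000000-0000-0000-0000-000000000001"),
    ("team-tags", "/subscriptions/00000000-0000-0000-0000-000000000002/resourceGroups/rg-app"),
]
for name, level in direct_assignments(assignments):
    print(f"{name}: assigned at {level} scope, review for drift")
```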
Alternatives, in increasing capability and cost: (1) Single subscription, well-tagged, with a few policies — fine for a single team and a single workload; survives until the second team arrives. (2) “Start small” landing zone — the same management-group bones with a trimmed policy set and a simple hub, grown incrementally as workloads demand; the pragmatic on-ramp that becomes enterprise-scale without a re-platform because it sat on the right tree. (3) Partner-led landing zone — a Microsoft partner delivers their validated implementation mapped back to CAF; right when you lack in-house platform-engineering capacity. (4) This architecture — full Enterprise-Scale Landing Zone — the complete foundation, for many workloads under central governance with enforced compliance and hybrid connectivity. Pick the lowest tier that meets the control and connectivity requirements the business will actually fund — but resolve the management-group hierarchy and IP plan up front in every tier, because those are the decisions you cannot cheaply walk back.