Most teams that adopt Azure Verified Modules (AVM) stop at “I called a module and got a resource.” That is the demo, not the value. The real win is using AVM as the substrate for an opinionated platform layer your application teams consume without ever touching a raw azurerm_ resource — a layer that bakes in private endpoints, diagnostic settings, mandatory tags and naming as types, validated at plan, so a publicly-exposed Key Vault is not something an app team can ship by accident because the lever does not exist. This guide builds that layer end to end: composing AVM resource modules into your own pattern modules, pinning them sanely (the pre-1.0 ~> trap that nukes 40 data planes in one Renovate merge), testing them at two altitudes, and shipping them through a private registry — with the state-migration discipline that keeps every upgrade a zero-destroy event.
The thing that makes this hard is not Terraform syntax; it is that AVM modules are pre-1.0, so the version-constraint intuition every engineer carries (~> 0.9 is “patches only”) is exactly wrong, and the place that intuition fails is the shared wrapper that 40 repos depend on. The thing that makes it worth doing is that once the wrapper exists, your platform team upgrades the entire estate by merging one pinned-version PR with a reviewed plan diff, and your app teams ship spokes, vaults and storage that are private, tagged and observable by construction. This article is the reference you keep open while you build that: every interface input, every pin rule, every test layer, every migration block, laid out as scannable tables so you read the prose once and then work from the tables.
By the end you will stop treating AVM as a fancier resource and start treating it as the brick library underneath your org’s non-negotiables. You will know precisely which version constraint pins a 0.x module without admitting a breaking minor, which inputs to expose and which to weld shut, how to assert a wrapper’s shape at plan without deploying, how to run a real apply/destroy against an ephemeral subscription with keyless OIDC, how to publish to a private registry on a semver tag, and — the part that separates a senior platform engineer from someone who read the README — how to absorb AVM’s internal resource-address churn inside the wrapper so a minor bump never shows up as destroy/create in a consumer’s plan.
What problem this solves
The pain this solves is module sprawl plus silent drift from your standards. Before a platform layer, every app team copies a different community module (or hand-rolls azurerm_ resources), each with different input names, none of which reliably support diagnostic settings, locks, role assignments or private endpoints. Security finds a publicly-accessible storage account in a quarterly review; the team that built it points at a wiki page nobody read. “Please tag things” and “please use private endpoints” live as documentation, which means they live as suggestions. Multiply by 40 repos and you have an estate you cannot reason about, where the answer to “is everything private?” is “let me go check, repo by repo.”
What breaks without this layer: governance becomes archaeology. You cannot upgrade a defaulting convention across the estate because there is no single place that owns it. A new compliance rule (force-Entra-auth on storage, deny public Key Vault) becomes 40 pull requests against 40 inconsistent codebases instead of one version bump. And when you do try to standardise by swapping hand-rolled modules for AVM, the naive attempt shows every storage account scheduled for destroy/create in the plan — because the resource address changed — so the migration gets reverted and “AVM doesn’t work for us” enters the team’s folklore.
Who hits this: every platform / cloud-engineering team operating Terraform at more than a handful of repos, especially under a landing-zone program where the Azure Cloud Adoption Framework landing zones defines the guardrails but leaves how app teams provision inside a spoke to you. It bites hardest where pre-1.0 AVM modules are pinned with ~> (the breaking-minor trap), where wrappers are accidental passthroughs of the full AVM surface (no guardrail value), and where nobody reads the migration plan before merging a Renovate AVM bump.
To frame the whole field before the deep dive, here is every layer of the module supply chain this article builds, who owns it, and the single failure that bites at each:
| Layer | What it is | Who owns it | The failure that bites here |
|---|---|---|---|
| Upstream AVM resource module (`avm-res-*`) | One logical resource + its children, WAF-aligned | Microsoft | Pre-1.0: `~> 0.x` admits a breaking minor |
| Upstream AVM pattern module (`avm-ptn-*`) | A multi-resource architecture (hub-spoke, LZ) | Microsoft | Heavier blast radius on a bad bump |
| Your platform wrapper | Org pattern composed from AVM bricks + injected policy | Platform team | Accidental passthrough = no guardrail |
| Test + release gate | `terraform test` + Terratest + replace-gate in CI | Platform team | Unacknowledged `destroy`/`create` ships |
| Private registry / git ref | Versioned, semver-tagged distribution | Platform team | Copy-paste instead of `source`/`version` |
| App-team consumption | Narrow inputs only; deploy into Azure | App teams | A leaked lever lets them go public |
Learning objectives
By the end of this article you can:
- Distinguish AVM resource modules from pattern modules, and place your wrapper as a deliberate third tier that composes resource bricks and injects org policy, without forking AVM.
- Read an AVM module’s real interface (`tags`, `lock`, `role_assignments`, `diagnostic_settings`, `private_endpoints`, `managed_identities`, `enable_telemetry`) instead of guessing from prose.
- Pin AVM dependencies correctly for pre-1.0 modules: why `~> 0.9` is dangerous, why `~> 0.9.1` is right in a wrapper, and why exact pins belong in the platform layer while `~> X.Y.Z` belongs in app repos.
- Compose AVM resource modules into a pattern wrapper that forces private endpoints, diagnostics and tags as non-negotiable inputs, with `validation` blocks that turn conventions into hard `plan`-time failures.
- Test wrappers at two altitudes: fast `terraform test` plan-level contract assertions, and nightly Terratest against an ephemeral subscription with OIDC keyless auth.
- Publish wrappers to a private registry (Terraform Cloud/Enterprise) or a versioned git ref (Azure DevOps), and consume them by `source`/`version` with a semver contract.
- Migrate hand-rolled modules to AVM without state churn using `moved` and `import` blocks, and gate CI so an unacknowledged `destroy`/`create` can never merge.
- Automate AVM upgrades with Renovate so each bump is one reviewable PR carrying a `terraform plan` diff.
Prerequisites & where this fits
You should be comfortable with core Terraform: HCL, providers, the init/plan/apply workflow, modules with inputs and outputs, and remote state. If any of that is shaky, the Terraform fundamentals: HCL, providers, state & workflow and Terraform state deep dive come first. Module authoring conventions — inputs, outputs, versioning — are assumed from Authoring Terraform modules: structure, inputs, outputs, versioning. You should know what a version constraint means in principle (we will sharpen it for 0.x), and have an Azure subscription plus the azurerm provider configured.
This sits at the infrastructure-as-code / platform-engineering layer of an Azure estate. It assumes the landing-zone scaffolding above it — management groups, policy, the hub — from Azure Cloud Adoption Framework landing zones, and it produces the spokes app teams deploy into. It pairs with Terraform module design: composition, versioning (the composition theory), Terraform testing: native & Terratest (the test mechanics), and Terraform refactoring: moved, import & removed blocks (the migration mechanics this article applies to AVM specifically). For teams that prefer Bicep, the equivalent distribution story is Bicep private module registry with ACR & CI/CD.
A quick map of who confirms what when something goes wrong, so you route a problem to the right layer fast:
| Concern | Where it lives | Confirm with | Owns the fix |
|---|---|---|---|
| “Which AVM version actually resolved?” | `.terraform/modules/modules.json` (`.terraform.lock.hcl` records providers only) | `terraform init` output / read the `Version` field in `modules.json` | Platform team |
| “Why is the plan showing a replace?” | Wrapper resource addresses | `terraform show -json` + `jq` | Platform team |
| “Why did the plan error on a deployment?” | `enable_telemetry` in a locked sub | Plan error text | Platform + governance |
| “Why can this team go public?” | Wrapper `variables.tf` surface | `grep` for the exposed lever | Platform team |
| “Is the published version right?” | Registry / git tag | `terraform init` in a consumer | Platform + app team |
| “Did the migration churn state?” | Plan actions on adopt | replace-gate in CI | Platform team |
Core concepts
Five mental models make every later decision obvious.
AVM is a specification, not just a module set. The reason AVM is worth building on is not “Microsoft published modules” — it is that every module conforms to the same interface contract. Consistent input names, mandatory support for diagnostic settings, locks, role assignments, and (where the service supports them) private endpoints, plus Well-Architected (WAF) defaults rather than the bare minimum that compiles. You learn one shape and it generalises across services. That shared shape is what lets you write generic org policy (force diagnostics everywhere) instead of bespoke wiring per resource.
There are two AVM module classes, and your wrapper is a third tier. AVM ships resource modules (Azure/avm-res-<service>-<resource>/azurerm) — one logical resource plus its directly-dependent children — and pattern modules (Azure/avm-ptn-<pattern>/azurerm) — a whole multi-resource architecture. The mental model: resource modules are LEGO bricks; pattern modules are pre-built assemblies. Your platform layer is neither — it is a third tier: your own pattern modules, composed from AVM resource bricks, that encode your org’s non-negotiables. You generally do not fork AVM; you wrap it.
Pre-1.0 changes the meaning of ~>. This is the single most consequential fact in the article. AVM resource modules are below 1.0, and AVM treats the minor segment as the breaking-change segment while below 1.0. So ~> 0.9 (which feels like “0.9.x only”) actually expands to >= 0.9.0, < 1.0.0 and will happily pull a breaking 0.10.0. The constraint that pins to a non-breaking range is the three-part ~> 0.9.1 (allows 0.9.1 .. 0.9.x, blocks 0.10.0). If you remember one thing, remember this.
The wrapper’s value is what it does not expose. A platform module is valuable in proportion to the levers it removes. If your variables.tf mirrors the AVM module’s inputs, you have built a passthrough, not a platform — an app team can still ship a public Key Vault. The discipline is to expose a narrow contract (workload name, tags, the central LAW id) and inject the rest (public_network_access_enabled = false, enable_telemetry = false, forced diagnostics and private endpoints) as constants the caller cannot override. Guardrails as types, validated at plan, not as a wiki page.
Every AVM upgrade is a potential state migration. Exact version pins control when you take an upgrade, not whether it is safe. A minor bump can move a resource under a for_each map, changing its resource address — and a changed address means Terraform plans destroy + create, which on a storage account is a data-plane deletion. The senior move is to read every AVM bump as a possible state migration, absorb the address change inside the wrapper with a moved block shipped in the same version, and gate CI so an unacknowledged replace can never merge.
The vocabulary in one table
Pin down every moving part before the deep sections; the glossary repeats these for lookup, this is the mental model side by side:
| Concept | One-line definition | Where it lives | Why it matters |
|---|---|---|---|
| Resource module (`avm-res-*`) | One logical resource + children | Public Terraform registry | The brick you compose |
| Pattern module (`avm-ptn-*`) | A multi-resource architecture | Public registry | A pre-built assembly |
| Platform wrapper | Your pattern over AVM bricks | Your private registry / repo | Encodes org non-negotiables |
| AVM interface | The shared optional input contract | Each module’s `variables.tf` | Lets you write generic policy |
| `enable_telemetry` | Empty ARM deployment for usage metrics | An AVM input (default `true`) | Fails plan in locked subs |
| `~> 0.9.1` vs `~> 0.9` | Three-part vs two-part 0.x pin | Module `version` arg | One blocks breaking minors, one doesn’t |
| `validation` block | Custom input precondition | Wrapper `variables.tf` | Turns conventions into plan failures |
| `terraform test` | Native plan/apply assertion runner | `tests/*.tftest.hcl` | Fast contract checks, no deploy |
| Terratest | Go E2E apply/assert/destroy | `test/*.go` | Real Azure validation, nightly |
| `moved` block | Declares old→new resource address | Wrapper `.tf` | Absorbs AVM address churn |
| `import` block | Brings existing Azure into state | Wrapper / consumer `.tf` | Brownfield adoption, no recreate |
| Replace gate | CI check rejecting `destroy`+`create` | Pipeline step | Stops accidental data-plane loss |
Why AVM exists: the resource vs. pattern split
AVM is Microsoft’s effort to replace the sprawl of inconsistent community modules with a single, owned, specification-driven set. Two things make it worth building on. A shared specification: every module conforms to the same interface contracts — consistent input names, mandatory support for diagnostic settings, locks, role assignments, and (where relevant) private endpoints; you learn one shape and it generalises. WAF alignment: modules encode Well-Architected defaults rather than the bare minimum that compiles. The two module classes you actually compose with are the resource and pattern modules — and your platform layer is a third tier over them.
| Class | Terraform registry prefix | Scope | When you reach for it |
|---|---|---|---|
| Resource module | `Azure/avm-res-<service>-<resource>/azurerm` | One logical resource + its directly dependent child resources | The brick for your own pattern |
| Pattern module | `Azure/avm-ptn-<pattern>/azurerm` | A multi-resource architecture (hub-spoke, AKS landing zone) | A whole assembly you accept as-is |
| Platform wrapper (yours) | your private registry / git ref | Org pattern composed from AVM resource bricks + injected policy | What app teams actually consume |
The mental model: resource modules are LEGO bricks; pattern modules are pre-built assemblies. Your platform layer is a third tier — your own pattern modules, composed from AVM resource bricks, that encode your org’s non-negotiables. You generally do not fork AVM; you wrap it.
The decision of which tier to consume, by situation:
| If you need… | Consume | Why |
|---|---|---|
| A single Key Vault with org defaults | Resource module, wrapped | You inject policy the bare brick doesn’t enforce |
| A whole hub-spoke exactly as Microsoft ships it | Pattern module directly | No org-specific deltas; accept the assembly |
| A spoke with your naming, tags, PE, diagnostics | Your wrapper over resource bricks | The pattern module won’t encode your non-negotiables |
| A one-off experiment / spike | Resource module directly | Not worth a wrapper yet |
| To change a default across 40 repos | Your wrapper (one bump) | The only place that owns the convention |
Why build on AVM at all rather than community modules or raw resources — the three approaches side by side on the axes that matter at estate scale:
| Axis | Raw `azurerm_` resources | Community modules | AVM (wrapped) |
|---|---|---|---|
| Interface consistency | None (you write it all) | Varies wildly per author | Mandated, identical shape across services |
| Diagnostics / locks / PE support | Hand-wired each time | Sometimes, inconsistently | First-class, standard inputs |
| Defaults | Whatever you type | Author’s opinion | WAF-aligned (good baseline) |
| Ownership / maintenance | You own everything | Author may abandon it | Microsoft-owned, supported |
| Upstream fixes | N/A | If the author ships them | Flow to you (you compose, not fork) |
| Org policy injection | Manual, per resource | Fork or pray | Inject once in your wrapper tier |
| Estate-wide change | N PRs, N codebases | N PRs | One wrapper bump |
A bare resource-module call looks like this — the starting point you will deliberately narrow and harden in your wrapper:
```hcl
module "kv" {
  source  = "Azure/avm-res-keyvault-vault/azurerm"
  version = "0.9.1"

  name                = "kv-platform-eus-01"
  resource_group_name = azurerm_resource_group.platform.name
  location            = "eastus"
  tenant_id           = data.azurerm_client_config.current.tenant_id
}
```
That call gets you a vault, but with AVM’s defaults and the full AVM surface exposed — neither of which is what you ship to app teams. The whole rest of this article is turning that into a guarded, distributed, upgrade-safe platform brick.
Reading an AVM module’s interface
Before wrapping anything, read the interface — not the README prose, the actual variables. Because the AVM spec mandates a shared shape, resource modules share a recognisable set of optional inputs beyond the resource-specific ones. Know this set cold; it is the surface you decide to expose, inject or forbid in your wrapper.
| AVM input | Type (shape) | What it does | Your wrapper’s stance |
|---|---|---|---|
| `tags` | `map(string)` | Tags applied to the resource | Expose (validated for mandatory keys) |
| `lock` | object | Apply `CanNotDelete` / `ReadOnly` management lock | Inject (org default) or expose narrowly |
| `role_assignments` | `map(object)` | RBAC assignments, keyed for add/remove without reindexing | Inject baseline; optionally extend |
| `diagnostic_settings` | `map(object)` | Log/metric categories → workspace/storage/Event Hub | Inject (non-negotiable → central LAW) |
| `private_endpoints` | `map(object)` | PE definitions (subnet, private DNS zone group) | Inject (non-negotiable on PE-capable services) |
| `managed_identities` | object | System- and/or user-assigned identity wiring | Inject or expose per pattern |
| `enable_telemetry` | `bool` (default `true`) | Tiny empty ARM deployment for usage metrics | Inject `false` org-wide |
| `<resource>`-specific | varies | e.g. `public_network_access_enabled`, `sku_name` | Mostly forbid; expose only the safe ones |
That enable_telemetry row deserves a callout because it fails in a way that wastes an afternoon:
`enable_telemetry`: AVM modules deploy a tiny, empty ARM deployment whose name encodes the module and version. It sends no resource data to Microsoft; it only lets the AVM team measure module usage. It is harmless, but in locked-down subscriptions where `Microsoft.Resources/deployments` is policy-denied, it will fail a plan with a confusing error. Decide your org default once (we set it `false` and bake that into our wrappers) rather than per-call.
Inspect the real inputs instead of guessing — pull the module and read its variables directly:
```bash
terraform init
terraform providers schema -json > /dev/null   # sanity-check provider wiring

# Read the module's own variables directly:
find .terraform/modules/kv -name 'variables.tf' -exec grep -E '^variable' {} +
```
The AVM-standard inputs and their direct-resource equivalents, so you know what the brick is wiring under the hood:
| AVM input | Underlying `azurerm` mechanism it wraps | Why the wrapper is nicer |
|---|---|---|
| `diagnostic_settings` | `azurerm_monitor_diagnostic_setting` | One map vs N resource blocks + category enumeration |
| `private_endpoints` | `azurerm_private_endpoint` + DNS zone group | Subnet + zone wiring abstracted, keyed |
| `role_assignments` | `azurerm_role_assignment` | Keyed map survives reordering; no index churn |
| `lock` | `azurerm_management_lock` | Single object, attached to the resource scope |
| `managed_identities` | `identity {}` block + `azurerm_user_assigned_identity` | System/user identity wiring normalised |
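To make the keyed-map and single-object shapes in these tables concrete, here is a minimal sketch of a resource-module call that sets `lock` and `role_assignments`. The attribute names (`kind`, `role_definition_id_or_name`, `principal_id`) are assumed from the AVM interface convention, and `var.platform_reader_principal_id` is a hypothetical variable; confirm both against the pinned module's own `variables.tf` before relying on them.

```hcl
# Sketch only: attribute names are assumed from the AVM interface convention.
# Verify them against the variables.tf of the version you actually pin.
module "kv" {
  source  = "Azure/avm-res-keyvault-vault/azurerm"
  version = "0.9.1" # illustrative pin, as elsewhere in this article

  name                = "kv-platform-eus-01"
  resource_group_name = azurerm_resource_group.platform.name
  location            = "eastus"
  tenant_id           = data.azurerm_client_config.current.tenant_id
  enable_telemetry    = false

  # Single object: one management lock on the vault's scope.
  lock = {
    kind = "CanNotDelete"
  }

  # Keyed map: adding or removing an entry leaves the other keys untouched.
  role_assignments = {
    platform_secrets_reader = {
      role_definition_id_or_name = "Key Vault Secrets User"
      principal_id               = var.platform_reader_principal_id # hypothetical input
    }
  }
}
```

Because each assignment is addressed by its map key rather than a list index, deleting one entry should plan as a single removal instead of a reshuffle of every assignment after it.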
Pinning and dependency strategy
AVM resource modules are pre-1.0, and this breaks the intuition most people have about ~>. The constraint that feels safe is the one that bites.
```hcl
# DANGEROUS for a 0.x module:
version = "~> 0.9" # allows 0.9.x AND 0.10.0, 0.11.0, ...
```
For 0.x releases, ~> 0.9 is equivalent to >= 0.9.0, < 1.0.0. Because AVM treats the minor segment as the breaking-change segment while below 1.0, that constraint happily pulls in a breaking 0.10.0. The constraint that actually pins to a non-breaking range is the three-part form:
```hcl
# Allows 0.9.1 .. 0.9.x, blocks 0.10.0:
version = "~> 0.9.1"
```
The full constraint-operator behaviour, made explicit so you never guess what a given string admits:
| Constraint written | Expands to | Admits a breaking 0.10.0? | Verdict for a 0.x AVM module |
|---|---|---|---|
| `0.9.1` | exactly `0.9.1` | No | Best in a wrapper: deliberate, reviewed bumps |
| `= 0.9.1` | exactly `0.9.1` | No | Same as above, explicit form |
| `~> 0.9.1` | `>= 0.9.1, < 0.10.0` | No | Good: allows safe patch drift inside 0.9.x |
| `~> 0.9` | `>= 0.9.0, < 1.0.0` | Yes | Dangerous: the classic AVM mistake |
| `>= 0.9.0` | `>= 0.9.0` (unbounded) | Yes (and beyond) | Never: unbounded, will break |
| `>= 0.9.0, < 0.10.0` | that range | No | Verbose but correct; equivalent to `~> 0.9.0` (also admits 0.9.0, unlike `~> 0.9.1`) |
| (omitted) | latest available | Yes | Never in shared code: irreproducible |
My rule across the platform repo, and why each tier pins differently:
| Repo tier | Pin AVM dependencies as | Pin your wrapper as | Rationale |
|---|---|---|---|
| Wrapper (platform) modules | exact (`version = "0.9.1"`) | n/a (this is the wrapper) | The platform layer is where you absorb upgrade risk deliberately, in a PR, with a reviewed plan diff |
| Consuming (app) repos | inherited from the wrapper | `~> X.Y.Z` on your wrapper | Your wrappers are semver-disciplined, so `~>` is safe here; app teams inherit the AVM versions you chose |
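The app-repo side of that rule might look like the sketch below. The registry address `app.terraform.io/example-org/...` and the version `2.3.0` are hypothetical placeholders, and the `var.*` references stand in for values the app repo already has; the input names are the ones this article's wrapper exposes.

```hcl
# App repo: consumes the platform wrapper, never an AVM module directly.
# Registry address and version are hypothetical placeholders.
module "spoke" {
  source  = "app.terraform.io/example-org/spoke-landing-zone/azurerm"
  version = "~> 2.3.0" # three-part pin: 2.3.x patch drift only

  workload       = "checkout"
  location       = "eastus"
  location_short = "eus"

  resource_group_name = var.resource_group_name
  tenant_id           = var.tenant_id
  address_space       = ["10.20.0.0/24"]
  pe_subnet_prefix    = "10.20.0.0/27"

  log_analytics_workspace_id = var.log_analytics_workspace_id
  kv_private_dns_zone_id     = var.kv_private_dns_zone_id

  tags = {
    costCenter = "cc-1234" # placeholder value
    owner      = "team-checkout"
  }
}
```

Nothing in this call names an AVM version: the app team picks a wrapper release, and the exact AVM pins come along with it.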
Automate the bumps with Renovate so you review upgrades instead of chasing them. Renovate understands Terraform registry sources natively:
```json
{
  "$schema": "https://docs.renovatebot.com/renovate-schema.json",
  "extends": ["config:recommended"],
  "terraform": { "enabled": true },
  "packageRules": [
    {
      "matchManagers": ["terraform"],
      "matchPackageNames": ["/^Azure/avm-/"],
      "groupName": "azure-verified-modules",
      "schedule": ["before 9am on monday"]
    }
  ]
}
```
Each Renovate PR becomes a single reviewable unit: the version bump plus the terraform plan your CI attaches as a comment. The lock file is what makes any of this reproducible — what each artifact pins and where:
| Artifact | Pins | Committed? | Bumped by |
|---|---|---|---|
| `version =` in module block | Module version constraint | Yes (in code) | Renovate PR / manual |
| `.terraform.lock.hcl` | Provider versions + checksums | Yes (always commit) | `terraform init -upgrade` |
| `required_version` (`versions.tf`) | Terraform CLI version range | Yes | Manual, deliberate |
| `required_providers` (`versions.tf`) | Provider source + version range | Yes | Manual / Renovate |
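For the last two rows, a wrapper's `versions.tf` could be sketched as follows. The ranges are placeholders rather than recommendations; they need to be compatible with whatever the pinned AVM modules themselves require.

```hcl
# versions.tf (sketch): ranges are placeholders, not recommendations.
terraform {
  # CLI range the wrapper is tested against.
  required_version = ">= 1.9.0, < 2.0.0"

  required_providers {
    azurerm = {
      source = "hashicorp/azurerm"
      # A reusable module declares a compatible range; the exact provider
      # version is recorded in the consuming root's .terraform.lock.hcl.
      version = ">= 4.0.0, < 5.0.0"
    }
  }
}
```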
Wrapping resource modules into pattern modules
Here is the core of the platform layer. We want app teams to ask for “a spoke” and get a VNet, a Key Vault, and a storage account — all with private endpoints, diagnostics, and tags already correct. They should not be able to opt out of those. The directory layout that scales:
```text
platform-modules/
└── spoke-landing-zone/
    ├── main.tf          # composes AVM resource modules
    ├── variables.tf     # the narrow contract app teams see
    ├── outputs.tf
    ├── versions.tf      # required_providers + required_version
    └── tests/
        └── defaults.tftest.hcl
```
What each file owns, and the rule that keeps the wrapper a platform and not a passthrough:
| File | Owns | The discipline |
|---|---|---|
| `main.tf` | Composition of AVM bricks + injected policy | Inject `enable_telemetry`, `diagnostic_settings`, `private_endpoints` here; never pass them through |
| `variables.tf` | The narrow caller contract | Expose only safe inputs; `validation` on naming + mandatory tags |
| `outputs.tf` | Stable outputs (ids, URIs) | Treat as API: renaming an output is a major version bump |
| `versions.tf` | `required_version` + `required_providers` | Pin the CLI and provider ranges deliberately |
| `tests/*.tftest.hcl` | Plan-level contract assertions | Assert the locked-down defaults resolve as expected |
The wrapper’s main.tf composes AVM bricks and injects org policy. Note enable_telemetry, diagnostic_settings, and private_endpoints are set by us, not passed through from the caller:
```hcl
locals {
  base_tags = merge(var.tags, {
    managedBy = "platform-team"
    module    = "spoke-landing-zone"
  })
}

module "vnet" {
  source  = "Azure/avm-res-network-virtualnetwork/azurerm"
  version = "0.8.1"

  name                = "vnet-${var.workload}-${var.location_short}"
  resource_group_name = var.resource_group_name
  location            = var.location
  address_space       = var.address_space
  tags                = local.base_tags
  enable_telemetry    = false

  subnets = {
    pe = {
      name             = "snet-private-endpoints"
      address_prefixes = [var.pe_subnet_prefix]
    }
  }
}

module "kv" {
  source  = "Azure/avm-res-keyvault-vault/azurerm"
  version = "0.9.1"

  name                = "kv-${var.workload}-${var.location_short}"
  resource_group_name = var.resource_group_name
  location            = var.location
  tenant_id           = var.tenant_id
  tags                = local.base_tags
  enable_telemetry    = false

  # Org default: no public access, ever.
  public_network_access_enabled = false

  diagnostic_settings = {
    central = {
      name                  = "to-law"
      workspace_resource_id = var.log_analytics_workspace_id
    }
  }

  private_endpoints = {
    vault = {
      subnet_resource_id            = module.vnet.subnets["pe"].resource_id
      private_dns_zone_resource_ids = [var.kv_private_dns_zone_id]
    }
  }
}

module "sa" {
  source  = "Azure/avm-res-storage-storageaccount/azurerm"
  version = "0.6.4"

  name                = "st${var.workload}${var.location_short}"
  resource_group_name = var.resource_group_name
  location            = var.location
  tags                = local.base_tags
  enable_telemetry    = false

  public_network_access_enabled = false
  shared_access_key_enabled     = false # force Entra auth

  diagnostic_settings = {
    central = {
      name                  = "to-law"
      workspace_resource_id = var.log_analytics_workspace_id
    }
  }
}
```
The version numbers above are illustrative pins from the time of writing. Resolve the current ones for your repo from the registry and pin them exactly — never copy version strings from a blog post into production. (Yes, including this one.)
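The wrapper's `outputs.tf` is the other half of its API. A minimal sketch, assuming each AVM brick exposes the standard `resource_id` output; the output names on the left are this wrapper's own choice and are illustrative:

```hcl
# outputs.tf (sketch). The names are the wrapper's public API, so renaming
# one is a breaking change. `resource_id` is assumed to be the AVM-standard
# output on each brick; check the pinned module's outputs.tf.
output "virtual_network_id" {
  description = "Resource ID of the spoke virtual network."
  value       = module.vnet.resource_id
}

output "key_vault_id" {
  description = "Resource ID of the spoke Key Vault."
  value       = module.kv.resource_id
}

output "storage_account_id" {
  description = "Resource ID of the spoke storage account."
  value       = module.sa.resource_id
}
```

Exposing only identifiers keeps the surface small: consumers can wire role assignments or app settings from them without reaching into the AVM modules' internals.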
The naming convention the wrapper encodes (so app teams never hand-name a resource), with the Azure abbreviation and a worked example:
| Resource | Pattern in the wrapper | Azure abbrev. | Example (workload=checkout, eus) | Constraint to respect |
|---|---|---|---|---|
| Resource group | `rg-${workload}-${loc}` | `rg` | `rg-checkout-eus` | ≤ 90 chars |
| Virtual network | `vnet-${workload}-${loc}` | `vnet` | `vnet-checkout-eus` | ≤ 64 chars |
| Subnet (PE) | `snet-private-endpoints` | `snet` | `snet-private-endpoints` | ≤ 80 chars |
| Key Vault | `kv-${workload}-${loc}` | `kv` | `kv-checkout-eus` | 3–24, globally unique |
| Storage account | `st${workload}${loc}` | `st` | `stcheckouteus` | 3–24, lowercase alnum only |
| Private endpoint | `pe-${resource}-${workload}` | `pe` | `pe-kv-checkout` | ≤ 80 chars |
| Log Analytics ws | `law-${scope}` | `law` | `law-central` | ≤ 63 chars |
Note the storage-account row is why the wrapper drops the hyphen and lowercases — storage names reject hyphens and uppercase, so encoding the rule in the module stops a whole class of plan-time naming failures.
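The length limits in that table only hold if the naming suffix is bounded too. One way to close the gap, sketched here with an assumed 2-6 character bound on `location_short` (the bound is illustrative, not taken from the wrapper above):

```hcl
variable "location_short" {
  type        = string
  description = "Short region code used as a naming suffix, e.g. eus."

  validation {
    # Length budget with workload capped at 12 chars by its own validation:
    #   storage:   "st" (2) + workload (12) + location_short (6)            = 20 <= 24
    #   key vault: "kv-" (3) + workload (12) + "-" (1) + location_short (6) = 22 <= 24
    condition     = can(regex("^[a-z0-9]{2,6}$", var.location_short))
    error_message = "location_short must be 2-6 lowercase alphanumeric chars."
  }
}
```

With both inputs bounded, the worst-case generated names stay inside the 24-character limits, so an over-long name is rejected by validation at plan time rather than by Azure.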
The three bricks this wrapper composes, and the policy injected onto each — the table app teams never see but every reviewer should:
| Brick (AVM resource module) | Pinned | Injected non-negotiable | What it would default to bare |
|---|---|---|---|
| `avm-res-network-virtualnetwork` | `0.8.1` | PE subnet pre-created; telemetry off | No PE subnet; telemetry on |
| `avm-res-keyvault-vault` | `0.9.1` | `public_network_access_enabled=false`; PE + diag forced | Public access allowed; no PE/diag wired |
| `avm-res-storage-storageaccount` | `0.6.4` | `shared_access_key_enabled=false`; public off; diag forced | Key auth on; public allowed |
Why these specific defaults are the non-negotiables, in plain risk terms:
| Injected default | Risk it removes | Equivalent Azure Policy (defence in depth) |
|---|---|---|
| `public_network_access_enabled = false` (KV) | Vault reachable from the internet | Deny public network access on Key Vault |
| `private_endpoints = { vault = … }` | Secrets traffic leaving the backbone | Audit/deny resources without a PE |
| `shared_access_key_enabled = false` (SA) | Long-lived account keys to steal | Deny storage account key access |
| `diagnostic_settings` → central LAW | Blind spot: no audit trail | Deploy-if-not-exists diagnostic settings |
| `enable_telemetry = false` | Plan failure in locked subs | (operational, not security) |
Enforcing org defaults as non-negotiable inputs
The discipline that makes a platform layer valuable is what the wrapper does not expose. Compare the AVM surface (dozens of inputs) to your variables.tf:
```hcl
variable "workload" {
  type        = string
  description = "Short workload name, used in resource naming."

  validation {
    condition     = can(regex("^[a-z0-9]{2,12}$", var.workload))
    error_message = "workload must be 2-12 lowercase alphanumeric chars."
  }
}

variable "tags" {
  type        = map(string)
  description = "Caller tags; merged with mandatory platform tags."

  validation {
    condition     = contains(keys(var.tags), "costCenter") && contains(keys(var.tags), "owner")
    error_message = "tags must include costCenter and owner."
  }
}

variable "log_analytics_workspace_id" {
  type        = string
  description = "Central LAW resource ID for diagnostic settings."
}

# ... resource_group_name, location, location_short, tenant_id,
#     address_space, pe_subnet_prefix, kv_private_dns_zone_id
```
There is no public_network_access_enabled, no enable_telemetry, no way to skip diagnostics. App teams cannot ship a publicly exposed Key Vault through this module because the lever does not exist. That is the entire point — guardrails as types, validated at plan, not as a wiki page nobody reads. The validation blocks turn “please remember to tag things” into a hard failure.
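That hard failure can itself be pinned down with a negative test. The sketch below uses the native test runner's `expect_failures` to assert that a call missing `costCenter` is rejected. The file name is hypothetical, the values mirror this article's other examples, and it assumes the `azurerm` provider can be configured (or mocked) in the test environment.

```hcl
# tests/guardrails.tftest.hcl (hypothetical file name)
run "missing_mandatory_tags_is_rejected" {
  command = plan

  variables {
    workload                   = "checkout"
    tags                       = { owner = "team-checkout" } # costCenter omitted on purpose
    location                   = "eastus"
    location_short             = "eus"
    resource_group_name        = "rg-checkout"
    tenant_id                  = "00000000-0000-0000-0000-000000000000"
    address_space              = ["10.20.0.0/24"]
    pe_subnet_prefix           = "10.20.0.0/27"
    log_analytics_workspace_id = "/subscriptions/.../workspaces/law-central"
    kv_private_dns_zone_id     = "/subscriptions/.../privateDnsZones/privatelink.vaultcore.azure.net"
  }

  # The run passes only if the validation on var.tags fails.
  expect_failures = [var.tags]
}
```

If someone later loosens the tag validation, this run starts failing, so the guardrail is protected by CI rather than by memory.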
The full contract — every input the wrapper exposes, its type, whether it is validated, and why it is safe to expose:
| Input | Type | Validated? | Why it’s safe to expose |
|---|---|---|---|
| `workload` | `string` | regex `^[a-z0-9]{2,12}$` | Drives naming only; bounded charset |
| `tags` | `map(string)` | must contain `costCenter`, `owner` | Merged with platform tags; can’t drop mandatory keys |
| `location` | `string` | (optional: allow-list of regions) | Placement choice, not a security lever |
| `location_short` | `string` | (optional: regex) | Naming suffix |
| `resource_group_name` | `string` | none | Where it lands; caller owns the RG |
| `tenant_id` | `string` | none | Required by KV; not a guardrail |
| `address_space` | `list(string)` | (optional: CIDR check) | IPAM choice, governed upstream |
| `pe_subnet_prefix` | `string` | (optional: CIDR check) | Must fit inside `address_space` |
| `log_analytics_workspace_id` | `string` | none | Forces diagnostics to your LAW |
| `kv_private_dns_zone_id` | `string` | none | The PE zone; injecting PE needs it |
And the inputs the wrapper deliberately forbids (does not expose), with what each would let an app team do if leaked:
| Forbidden lever | What leaking it would allow | Kept as |
|---|---|---|
| `public_network_access_enabled` | Ship a public KV / storage account | Hard-coded `false` in `main.tf` |
| `shared_access_key_enabled` (SA) | Re-enable stealable account keys | Hard-coded `false` |
| `enable_telemetry` | Break plans in locked subs by accident | Hard-coded `false` |
| `diagnostic_settings` | Skip the audit trail / point elsewhere | Injected → central LAW |
| `private_endpoints` | Deploy without a PE | Injected from the wrapper’s PE subnet |
| `role_assignments` (raw) | Grant arbitrary RBAC inline | Baseline injected; extensions reviewed |
The validation patterns worth standardising, with the message your colleague sees at plan:
| Validate | Condition (sketch) | Error message |
|---|---|---|
| Workload name shape | can(regex("^[a-z0-9]{2,12}$", var.workload)) | “workload must be 2-12 lowercase alphanumeric chars.” |
| Mandatory tags present | contains(keys(var.tags), "costCenter") && … | “tags must include costCenter and owner.” |
| Region allow-list | contains(["eastus","centralindia"], var.location) | “location must be an approved region.” |
| PE subnet inside VNet | cidrhost(var.address_space[0], 0) != "" (+ range check) | “pe_subnet_prefix must fall inside address_space.” |
| Env name enum | contains(["dev","test","prod"], var.environment) | “environment must be dev, test, or prod.” |
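The PE-subnet row is only sketched in the table, so here is one way the range check might be written out. This is a minimal sketch, not the wrapper's confirmed implementation: it assumes Terraform 1.9 or later (the release that allows a validation condition to reference another variable) and checks only the first entry of address_space. The variable names and the containment error message come from the table; the syntax message is illustrative.

```hcl
variable "address_space" {
  type = list(string)
}

variable "pe_subnet_prefix" {
  type = string

  validation {
    # Syntax check: cidrhost() errors on a malformed prefix and can() turns that into false.
    condition     = can(cidrhost(var.pe_subnet_prefix, 0))
    error_message = "pe_subnet_prefix must be a valid CIDR prefix."
  }

  validation {
    # Containment check (ASSUMPTION: Terraform >= 1.9, first VNet range only).
    # 1. The subnet's prefix length must not be shorter than the VNet's.
    # 2. Re-masking the subnet's network address with the VNet's prefix length
    #    must give back the VNet's own network address.
    # try() converts any evaluation error (bad prefix, empty list) into false.
    condition = try(
      tonumber(split("/", var.pe_subnet_prefix)[1]) >= tonumber(split("/", var.address_space[0])[1]) &&
      cidrhost("${cidrhost(var.pe_subnet_prefix, 0)}/${split("/", var.address_space[0])[1]}", 0) == cidrhost(var.address_space[0], 0),
      false
    )
    error_message = "pe_subnet_prefix must fall inside address_space."
  }
}
```

With address_space = ["10.20.0.0/24"], a prefix of 10.20.0.0/27 satisfies both conditions, while 10.21.0.0/27 fails the second one because re-masking it to /24 yields a different network address.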
Testing modules: terraform test and Terratest
Two layers, two tools — and they answer different questions. Native terraform test answers “does the wrapper produce the right shape?” cheaply and without deploying; Terratest answers “does it actually work in Azure?” expensively and occasionally.
| Dimension | terraform test (native) | Terratest (Go) |
|---|---|---|
| Altitude | plan-level (also apply if you ask) | Real apply against live Azure |
| Speed | Seconds | Minutes (deploy + destroy) |
| Cost | Free (no resources) | Real Azure spend on an ephemeral sub |
| Deploys resources? | No (for command = plan) | Yes: apply then destroy |
| Best for | Contract / shape assertions, guardrail proofs | End-to-end behaviour, real PE/DNS resolution |
| Runs in CI… | On every push / PR | Nightly / pre-release |
| Language | HCL | Go |
| Failure means | Wrapper composed the wrong shape | Azure rejected or behaviour drifted |
Native terraform test is fast, runs in-process, and is perfect for plan-level contract assertions — “does the wrapper produce the right shape?” No deployment needed:
# tests/defaults.tftest.hcl
run "defaults_are_locked_down" {
command = plan
variables {
workload = "checkout"
location = "eastus"
location_short = "eus"
resource_group_name = "rg-checkout"
tenant_id = "00000000-0000-0000-0000-000000000000"
address_space = ["10.20.0.0/24"]
pe_subnet_prefix = "10.20.0.0/27"
log_analytics_workspace_id = "/subscriptions/.../workspaces/law-central"
kv_private_dns_zone_id = "/subscriptions/.../privateDnsZones/privatelink.vaultcore.azure.net"
tags = { costCenter = "1234", owner = "team@contoso.com" }
}
assert {
condition = module.kv.... == false # assert the resolved public-access value
error_message = "Key Vault must never allow public network access."
}
}
Run it with:
terraform init
terraform test
The contract assertions worth writing — each proves a guardrail holds, at plan, for free:
| Test (run block) | command | Asserts | Catches |
|---|---|---|---|
| defaults_are_locked_down | plan | KV public_network_access_enabled == false | A future edit re-exposing the vault |
| storage_forces_entra | plan | SA shared_access_key_enabled == false | Key auth creeping back in |
| diagnostics_present | plan | Each resource has a diagnostic_settings entry | Someone dropping the audit trail |
| tags_merged | plan | Output tags include managedBy + caller’s | Tag-merge logic regressions |
| mandatory_tags_rejected | plan (expect fail) | Missing costCenter fails validation | The validation block being removed |
| bad_workload_rejected | plan (expect fail) | CHECKOUT! fails the regex | Naming rule regressions |
| pe_wired_to_subnet | plan | KV PE references the pe subnet id | PE wiring breaking on refactor |
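As an illustration of one row, the tags_merged case might look like the sketch below. It assumes the wrapper merges tags into a local named base_tags (as the hands-on lab's main.tf does) and exposes it through an output; the output name effective_tags and the file name tests/tags.tftest.hcl are hypothetical, and the variable set is the reduced one from the lab rather than the full contract.

```hcl
# outputs.tf (hypothetical output, added only so the test has something to read)
output "effective_tags" {
  value = local.base_tags
}

# tests/tags.tftest.hcl
run "tags_merged" {
  command = plan

  variables {
    workload            = "checkout"
    location            = "eastus"
    resource_group_name = "rg-checkout"
    tags                = { costCenter = "1234", owner = "team@contoso.com" }
  }

  # The platform-owned tag must be injected by the wrapper.
  assert {
    condition     = output.effective_tags["managedBy"] == "platform-team"
    error_message = "Platform tag managedBy must always be present."
  }

  # The caller's mandatory tag must survive the merge.
  assert {
    condition     = output.effective_tags["costCenter"] == "1234"
    error_message = "Caller-supplied tags must survive the merge."
  }
}
```

Because the merge involves only input values and constants, both assertions can be evaluated at plan time without creating anything.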
The terraform test building blocks you actually use, so you can read and write .tftest.hcl fluently:
| Construct | Goes in | Purpose | Note |
|---|---|---|---|
| run "<name>" {} | Test file | One test case (a plan or apply) | Runs in order; later runs see earlier state |
| command = plan | run block | Assert without deploying | The usual choice for contract tests (Terraform’s own default is apply) |
| command = apply | run block | Deploy then assert (real resources) | Needs creds + a real/ephemeral sub |
| variables {} | run block | Inputs for this case | Override per-run |
| assert {} | run block | condition + error_message | The check itself |
| expect_failures | run block | Assert a validation should fail | Proves guardrails reject bad input |
| provider {} / providers | File / run | Wire/alias providers for the test | Mock or real |
| module {} / override_module {} | run block / file | Run a helper module for this case, or stub a child module’s outputs | Isolate the unit under test |
Terratest (Go) is for real end-to-end validation against an ephemeral subscription — apply, assert against live Azure, destroy. Use it in CI nightly, not on every push:
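package test

import (
	"github.com/gruntwork-io/terratest/modules/terraform"
	"github.com/stretchr/testify/assert"
	"testing"
)

func TestSpokeLandingZone(t *testing.T) {
	opts := terraform.WithDefaultRetryableErrors(t, &terraform.Options{
		TerraformDir: "../examples/default",
	})
	// Deferred before apply so teardown runs even when apply or an assertion fails.
	defer terraform.Destroy(t, opts)
	terraform.InitAndApply(t, opts)
	kvURI := terraform.Output(t, opts, "key_vault_uri")
	assert.Contains(t, kvURI, "vault.azure.net")
}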
In CI, authenticate with OIDC workload identity federation (no stored secrets), and target a disposable subscription so a failed destroy never pollutes a real environment:
az login --service-principal -u "$ARM_CLIENT_ID" \
--tenant "$ARM_TENANT_ID" --federated-token "$IDTOKEN"
export ARM_USE_OIDC=true
export ARM_SUBSCRIPTION_ID="$EPHEMERAL_SUB_ID"
cd test && go test -timeout 45m ./...
The Terratest assertions worth the spend — behaviour you cannot prove at plan:
| Terratest assertion | Proves | Why plan-level can’t catch it |
|---|---|---|
| key_vault_uri contains vault.azure.net | The vault actually came up | Plan doesn’t materialise computed URIs reliably |
| Private DNS resolves the KV PE name | PE + DNS zone group wired correctly | DNS resolution is runtime behaviour |
| Storage data-plane rejects shared-key auth | Entra-only enforcement is real | Plan asserts intent, not Azure enforcement |
| Diagnostic setting visible in LAW | Logs flow to the workspace | Ingestion is runtime |
| terraform destroy leaves zero resources | Clean teardown (no orphans) | Only an apply/destroy cycle exposes orphans |
The CI auth model, made explicit — OIDC keyless is the right default; the same pattern powers GitHub Actions + Terraform OIDC plan/PR automation:
| Auth approach | Secret stored? | Blast radius | Verdict |
|---|---|---|---|
| OIDC workload identity federation | None | Short-lived, scoped token | Use this |
| Service principal + client secret | Yes (long-lived) | Leaked secret = standing access | Avoid; rotate if unavoidable |
| Managed identity (self-hosted runner) | None | Scoped to the runner identity | Good for self-hosted agents |
| Personal az login on a runner | Yes (interactive) | The human’s full access | Never in CI |
Publishing to a private registry
Wrappers are useless if teams copy-paste them. Publish them and consume by source/version. Two common backends, and the trade-off between them:
| Backend | Native registry? | Source string consumers use | Versioning mechanism | Best when |
|---|---|---|---|---|
| Terraform Cloud / Enterprise | Yes | app.terraform.io/<org>/<name>/azurerm | Git tags (valid semver) | You’re on TFC/TFE already |
| Azure DevOps (git ref) | No | git::https://dev.azure.com/...?ref=v1.3.0 | Tag ref in the URL | Azure DevOps shop, no TFC |
| Private VCS git ref (generic) | No | git::ssh://...//module?ref=v1.3.0 | Tag ref in the URL | Any git host, lowest setup |
| Storage/HTTP archive | No | https://.../module-1.3.0.zip | Versioned artifact name | Air-gapped / artifact-store shops |
Terraform Cloud / Enterprise private registry. Modules must live in repos named terraform-<provider>-<name> and are published from git tags that are valid semver. Tag, push, and the registry ingests the version:
git tag v1.3.0
git push origin v1.3.0
Consumers then reference it through the registry hostname:
module "spoke" {
source = "app.terraform.io/contoso/spoke-landing-zone/azurerm"
version = "~> 1.3.0"
# ... only the narrow contract inputs
}
Azure DevOps. There is no native Terraform registry product, so the pragmatic pattern is consuming wrappers as versioned git sources (a tag ref) pointed at Azure Repos, fronted by a CI pipeline that runs validate/test on tag:
module "spoke" {
source = "git::https://dev.azure.com/contoso/_git/platform-modules//spoke-landing-zone?ref=v1.3.0"
}
Consumption contract: semver is a promise. Bump patch for fixes, minor for additive inputs/outputs, major for anything that changes or removes an input or alters resource addresses. The moment you rename a wrapper variable, that’s a major, and app teams pinned with ~> must opt in.
The semver decision table — what each kind of change costs in version terms:
| Change you made to the wrapper | Bump | Why | Consumer impact (~> X.Y.Z pin) |
|---|---|---|---|
| Fix a bug, no interface change | patch | Behaviour-preserving | Auto-picked up |
| Add a new optional input/output | minor | Additive, backward-compatible | Needs a pin bump (auto-picked up only under the looser ~> X.Y) |
| Tighten a validation (stricter) | major | May reject previously-valid input | Must opt in |
| Rename / remove an input | major | Breaks callers | Must opt in |
| Change a resource address (for_each, etc.) | major (+ moved) | State migration for consumers | Must opt in; needs moved |
| Bump an internal AVM dep (no surface change) | patch/minor | Depends on AVM’s own change | Usually transparent |
| Change a default value | major | Silent behaviour change | Must opt in |
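What a registry consumer actually receives depends on which form of ~> they wrote. The fragment below contrasts the two forms; the module labels spoke_patch_only and spoke_minor_ok are hypothetical, and the inputs are elided exactly as in the consumer example earlier, so treat it as an illustration rather than a complete configuration.

```hcl
# Three-part constraint: resolves within >= 1.3.0, < 1.4.0.
# Patch releases arrive on the next init -upgrade; a minor needs a PR.
module "spoke_patch_only" {
  source  = "app.terraform.io/contoso/spoke-landing-zone/azurerm"
  version = "~> 1.3.0"
  # ... only the narrow contract inputs
}

# Two-part constraint: resolves within >= 1.3.0, < 2.0.0.
# Additive minors arrive automatically; only a major needs a PR.
module "spoke_minor_ok" {
  source  = "app.terraform.io/contoso/spoke-landing-zone/azurerm"
  version = "~> 1.3"
  # ... only the narrow contract inputs
}
```

Consumers on a git source get neither behaviour: the version argument is not supported for git sources, so a ?ref=v1.3.0 pin is exact and every release, patch included, is an explicit ref bump.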
The publish pipeline gates that should run on a tag, in order:
| Stage | Command | Gate |
|---|---|---|
| Format | terraform fmt -check -recursive | Block on diff |
| Validate | terraform init -backend=false && terraform validate | Block on error |
| Lint | tflint (+ ruleset) | Block on error |
| Contract tests | terraform test | Block on any failed run |
| Security scan | checkov / tfsec / trivy | Block on high severity (see scanning article) |
| Tag → publish | git tag vX.Y.Z && git push --tags | Registry ingests the version |
Migration path: replacing hand-rolled modules without state churn
The objection that kills AVM adoption: “we have hundreds of resources in state; switching modules means destroy/recreate.” It does not — if you use moved blocks. When you swap your old module "storage" for the AVM wrapper, the resource address changes (e.g. module.storage.azurerm_storage_account.this becomes module.sa.azurerm_storage_account.this[0]). Tell Terraform it’s the same object:
moved {
from = module.storage.azurerm_storage_account.this
to = module.sa.azurerm_storage_account.this[0]
}
moved blocks are declarative and version-controlled — they survive across the whole team, unlike a one-off terraform state mv. For resources that AVM creates as a child but you previously managed standalone (or that exist in Azure but not in state), use an import block instead:
import {
to = module.sa.azurerm_storage_account.this[0]
id = "/subscriptions/<sub>/resourceGroups/rg-checkout/providers/Microsoft.Storage/storageAccounts/stcheckouteus"
}
Migrate one module type at a time, behind a PR, and read the plan. A correct migration shows the resource moving with zero destroy/create lines — only in-place diffs for AVM’s added defaults (diagnostics, etc.). The mechanism-to-situation map — pick the right tool for what you’re migrating:
| Situation | Tool | What it does | Plan should show |
|---|---|---|---|
| Same resource, address changed (your module → AVM) | moved block | Re-points state at the new address | Move, no destroy/create |
| Resource in Azure but not in Terraform state | import block | Brings the existing object under management | Import + in-place diffs |
| AVM moved a resource under for_each (internal) | moved block (indexed key) | Maps old address → keyed address | Move, no destroy/create |
| Resource truly being replaced (rename forces new) | (accept) | Genuine destroy/create | Acknowledge explicitly in the PR |
| One-off local fix, not for the team | terraform state mv (avoid) | Imperative, non-versioned | (use moved instead) |
The migration playbook as a table — symptom in the plan, what it means, how to confirm, and the fix:
| # | Plan symptom on adopting AVM | Root cause | Confirm with | Fix |
|---|---|---|---|---|
| 1 | Every storage account shows destroy + create | Resource address changed (your module → AVM) | terraform plan lists a destroy at the old address and a create at ...this[0] | moved block from old → new address |
| 2 | A resource shows create though it exists in Azure | It’s in Azure but not in state | Portal/CLI shows the resource live | import block with the resource id |
| 3 | Replace appears only after a minor AVM bump | AVM moved the resource under for_each | Diff the module’s main.tf across versions | moved to the keyed address, same wrapper bump |
| 4 | Diagnostic settings show as new (in-place add) | AVM injects diagnostics you didn’t have | Plan shows + azurerm_monitor_diagnostic_setting | Expected; accept the additive diff |
| 5 | Plan errors: deployment denied | enable_telemetry = true in a locked sub | Error names Microsoft.Resources/deployments | Set enable_telemetry = false |
| 6 | RBAC assignment churns on reorder | Unkeyed role_assignments list reindexed | Plan shows delete+add of identical roles | Use the keyed map form AVM expects |
| 7 | Private endpoint shows replace | PE subnet id changed under the hood | Compare subnet_resource_id old vs new | moved the PE resource; align the subnet |
| 8 | Whole module shows replace after provider bump | Provider major changed a schema | .terraform.lock.hcl provider delta | Pin provider; migrate per the provider guide |
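The CI gate that makes an unacknowledged destroy impossible to merge. A true replace appears in the plan JSON as one change with actions ["delete","create"] (or ["create","delete"]), but an address change with no moved block appears as a plain ["delete"] on the old address plus a separate ["create"] on the new one, so the gate fails on any change whose actions include delete:
terraform plan -no-color -out tfplan
terraform show -json tfplan \
| jq -e '[(.resource_changes // [])[]
| select(.change.actions | index("delete"))] | length == 0' \
|| { echo "::error::Unacknowledged destroy or replace in plan"; exit 1; }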
Architecture at a glance
The diagram traces the module supply chain left to right — the path a resource definition travels from Microsoft’s public registry to a deployed, private, tagged spoke in your subscription. Read it as five zones. At the far left, upstream AVM ships the avm-res-* bricks (pre-1.0, the version trap lives here) and the avm-ptn-* assemblies. Those bricks flow by source + version into your platform wrapper tier — the heart of the system — where the spoke-landing-zone module composes a VNet, a Key Vault and a storage account and injects the org non-negotiables: public_network_access_enabled = false, private endpoints, forced diagnostics to the central LAW, Entra-only storage auth, and enable_telemetry = false. The wrapper’s narrow variables.tf is the membrane: app teams pass a workload name and tags, nothing dangerous.
From the wrapper, the path runs through the test + release gate — terraform test for plan-level contract assertions, Terratest for a real apply/destroy against an ephemeral subscription over keyless OIDC, and the replace gate that fails CI on any unacknowledged destroy/create. Only a green build tags a version into the private registry (Terraform Cloud or a git ref), from which the rightmost zone — 40+ app repos — consumes the wrapper by source/version with narrow inputs only, and terraform apply lands a spoke that is private and observable by construction. The five numbered badges mark the real hazards on this path: the ~> 0.x pin trap on the upstream brick, telemetry failing in a locked subscription, a passthrough wrapper that leaks a public lever, the state-address churn a minor bump can cause, and the brownfield-import gap when adopting AVM over existing Azure resources. Follow the numbers and you have both the architecture and the failure map in one view.
Real-world scenario
Northwind Cloud Platform is the four-engineer central team behind a retailer’s Azure estate: 40+ application repos, each owning one or more spokes inside a CAF-aligned landing zone, all on Terraform with state in Azure Storage and CI in Azure DevOps. Eighteen months ago every app team hand-rolled azurerm_ resources; a security review found nine publicly-accessible storage accounts and a Key Vault open to the internet, and “fix it” meant nine separate PRs against nine codebases. The platform team’s mandate after that review: make “private, tagged, observable” the only way to ship a spoke, and make estate-wide convention changes a one-PR operation. Their answer was the spoke-landing-zone wrapper over AVM resource bricks, distributed as a versioned git ref, pinned Azure/avm-res-storage-storageaccount/azurerm at an exact 0.6.x, consumed by every repo with ?ref=v1.x.
It worked beautifully for six months — until a routine Renovate PR bumped a single AVM minor in the wrapper, and the terraform plan that CI attached showed every storage account across 40 repos scheduled for destroy/create. The exact pin had not saved them, because the upgrade itself was the breaking event: that AVM release had moved the storage account resource under a for_each map, changing its address from ...this to ...this["default"]. A naive merge — and the team’s normal flow was “Renovate is green-ish, approve” — would have nuked 40 production data planes in a single apply. The engineer reviewing it noticed the plan was suspiciously long, scrolled, and saw the -/+ lines. That was the whole margin: one human reading a plan.
The breakthrough was reframing the problem. The issue was never “which version” — exact pins control when you take an upgrade, not whether it is safe. The issue was that an AVM bump in a shared wrapper is, structurally, a state migration, and they had been treating it as a dependency bump. The fix was to absorb the address change inside the wrapper with a moved block, shipped in the same version bump so all 40 consumers inherited it transparently:
moved {
from = module.sa.azurerm_storage_account.this
to = module.sa.azurerm_storage_account.this["default"]
}
Then they made this class of failure impossible to miss rather than relying on a tired reviewer: a CI step that parses terraform show -json and fails the build if the plan contains any destroy/create not explicitly acknowledged in the PR description. They also moved the wrapper’s exact-pin bumps behind a dedicated review checklist (“is this AVM minor a possible address change? diff the module’s main.tf”) and added a nightly Terratest run against an ephemeral subscription so behaviour regressions surfaced before a tag, not after.
Six months on, the estate is in a different posture. A new compliance rule (force customer-managed keys on storage) landed as one wrapper PR with a reviewed plan and propagated to all 40 repos through a ?ref=v1.x tag bump, with the replace-gate guaranteeing zero data-plane loss. The lesson on their wall: “Every AVM bump in a shared wrapper is a state migration until a moved-aware plan proves otherwise.” The incident, as a timeline, because the order of moves is the lesson:
| Time | Event | Action taken | Effect | What it should have been |
|---|---|---|---|---|
| Day 0 | Renovate bumps an AVM minor | (PR opened, CI green-ish) | — | Treat every AVM bump as a possible migration |
| Day 0 +5 min | Plan attached to PR | Reviewer scrolls the long plan | Spots 40× destroy/create | The save, but by luck, not process |
| Day 0 +20 min | Root cause found | Diff the module main.tf across versions | Resource moved under for_each | — |
| Day 0 +1 h | Fix drafted | Add moved block in the same wrapper bump | Plan now shows move, zero replace | Correct fix |
| Day 1 | Shipped | Tag v1.4.0; consumers inherit transparently | 40 repos migrate with no churn | — |
| Day 2 | Hardened | Add jq replace-gate to CI | Unacknowledged replace can’t merge | The durable fix |
| +1 week | Institutionalised | Nightly Terratest + AVM-bump checklist | Regressions caught pre-tag | The process change |
Advantages and disadvantages
The wrap-don’t-fork, AVM-as-substrate model both enables a real platform layer and carries sharp edges you must respect. Weigh it honestly:
| Advantages (why this model helps) | Disadvantages (why it bites) |
|---|---|
| One shared spec means generic org policy (force diagnostics/PE everywhere) instead of bespoke per-resource wiring | The shared spec is still pre-1.0 — ~> semantics are inverted and the trap is in the shared wrapper |
| WAF-aligned defaults: you inherit good defaults instead of the bare minimum that compiles | “Good defaults” still aren’t your defaults — you must inject org policy, or it’s just a fancier resource |
| Estate-wide convention changes become one pinned-version wrapper PR | A bad wrapper bump has a 40-repo blast radius; discipline is non-optional |
| Migration is non-destructive with moved/import: adopt over existing state, zero recreate | A wrong moved target silently destroys and recreates, so you must read the plan |
| Guardrails as types (validation, omitted levers) catch violations at plan, not in a quarterly review | Narrowing the surface is work; the lazy path (passthrough) gives none of the value |
| Upstream fixes flow to you for free because you compose, don’t fork | You’re coupled to AVM’s release cadence and its internal address choices |
| Keyed maps (role_assignments, diagnostic_settings) survive reordering with no index churn | Telemetry’s empty deployment fails plans in locked subscriptions until you set it false |
| Two-tier testing (terraform test + Terratest) proves both shape and behaviour | Terratest costs real Azure spend and time; you must run it judiciously |
The model is right when you operate Terraform at scale (many repos, many spokes) and need conventions enforced by construction rather than by review. It is overkill for a single-team, handful-of-resources estate where a wrapper is more ceremony than value — there, consume AVM resource modules directly. It bites hardest on teams that pin pre-1.0 modules with ~>, that build passthrough “wrappers” with no injected policy, and that merge AVM bumps without reading the plan. Every one of those is a manageable failure — but only if you know it exists, which is the point of the deep sections above.
Hands-on lab
Build a minimal spoke-landing-zone wrapper over an AVM brick, prove its guardrail holds at plan with terraform test, and prove the ~> 0.x trap is real — all without deploying a thing (free). Run in any shell with Terraform ≥ 1.7 (for terraform test) and the azurerm provider available. No Azure spend: every step is plan-level.
Step 1 — Scaffold the wrapper.
mkdir -p spoke-landing-zone/tests && cd spoke-landing-zone
cat > versions.tf <<'EOF'
terraform {
required_version = ">= 1.7.0"
required_providers {
azurerm = { source = "hashicorp/azurerm", version = "~> 4.0" }
}
}
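provider "azurerm" {
features {}
# plan creates nothing, but azurerm 4.x requires a subscription ID and the provider
# still authenticates: export ARM_SUBSCRIPTION_ID and run az login (or set ARM_* variables)
resource_provider_registrations = "none" # supersedes skip_provider_registration in 4.x
}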
EOF
Step 2 — A narrow contract with a validated input. This is the membrane app teams see.
cat > variables.tf <<'EOF'
variable "workload" {
type = string
validation {
condition = can(regex("^[a-z0-9]{2,12}$", var.workload))
error_message = "workload must be 2-12 lowercase alphanumeric chars."
}
}
variable "location" { type = string }
variable "resource_group_name" { type = string }
variable "tags" {
type = map(string)
validation {
condition = contains(keys(var.tags), "costCenter") && contains(keys(var.tags), "owner")
error_message = "tags must include costCenter and owner."
}
}
EOF
Step 3 — Compose one AVM brick with injected, non-negotiable policy. A storage account: public off, Entra-only, telemetry off — none of it exposed to the caller.
cat > main.tf <<'EOF'
locals { base_tags = merge(var.tags, { managedBy = "platform-team" }) }
module "sa" {
source = "Azure/avm-res-storage-storageaccount/azurerm"
version = "0.6.4" # exact pin — the wrapper absorbs upgrade risk deliberately
name = "st${var.workload}eus"
resource_group_name = var.resource_group_name
location = var.location
tags = local.base_tags
enable_telemetry = false # injected, not exposed
public_network_access_enabled = false # injected — the lever app teams DON'T get
shared_access_key_enabled = false # force Entra auth
}
EOF
terraform init
Expected: terraform init downloads Azure/avm-res-storage-storageaccount/azurerm at 0.6.4 and the azurerm provider; “Terraform has been successfully initialized!”.
Step 4 — Prove the guardrail at plan with a contract test.
cat > tests/defaults.tftest.hcl <<'EOF'
run "storage_is_locked_down" {
command = plan
variables {
workload = "checkout"
location = "eastus"
resource_group_name = "rg-checkout"
tags = { costCenter = "1234", owner = "team@contoso.com" }
}
assert {
condition = module.sa.... == false # resolve the public-access output your AVM version exposes
error_message = "Storage must never allow public network access."
}
}
EOF
terraform test
Expected: terraform test runs the run block at plan level and reports the assertion result — no resources created. (Adjust the module.sa.... reference to the actual output your pinned AVM version surfaces; terraform output/the module’s outputs.tf tells you the name.)
Step 5 — Prove the validation rejects bad input. Feed an illegal workload name and watch it fail at plan, not in production.
terraform plan -var 'workload=CHECKOUT!' \
-var 'location=eastus' -var 'resource_group_name=rg-x' \
-var 'tags={costCenter="1",owner="a@b.com"}'
# Expected: Error — "workload must be 2-12 lowercase alphanumeric chars."
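The same rejection can be captured as a test so it runs in CI rather than by hand. A minimal sketch, assuming the lab's four variables; the file name tests/rejections.tftest.hcl and the run names are arbitrary choices that mirror the contract-test table.

```hcl
# tests/rejections.tftest.hcl

# Shared valid inputs; each run overrides only the value it wants to break.
variables {
  workload            = "checkout"
  location            = "eastus"
  resource_group_name = "rg-checkout"
  tags                = { costCenter = "1234", owner = "team@contoso.com" }
}

run "bad_workload_rejected" {
  command = plan

  variables {
    workload = "CHECKOUT!" # upper case and punctuation break the regex
  }

  # The run passes only if the validation on var.workload fails.
  expect_failures = [var.workload]
}

run "mandatory_tags_rejected" {
  command = plan

  variables {
    tags = { owner = "team@contoso.com" } # costCenter deliberately missing
  }

  # The run passes only if the validation on var.tags fails.
  expect_failures = [var.tags]
}
```

If someone later deletes a validation block, the matching run stops seeing the expected failure and terraform test reports it, which is the regression the table calls "the validation block being removed".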
Step 6 — Prove the ~> 0.x trap is real. Loosen the pin and watch init -upgrade reach for a higher minor than you intended.
# Temporarily change the version line to the DANGEROUS form and re-init:
# version = "~> 0.6" # allows 0.7.0, 0.8.0, ... a BREAKING minor
sed -i.bak 's/version = "0.6.4"/version = "~> 0.6"/' main.tf
terraform init -upgrade
# Read which version actually resolved:
jq -r '.Modules[] | select(.Key == "sa") | .Version' .terraform/modules/modules.json
# Restore the safe exact pin:
mv main.tf.bak main.tf && terraform init -upgrade
The point: ~> 0.6 silently admits a breaking 0.7.x/0.8.x; only 0.6.4 or ~> 0.6.4 holds the line.
Validation checklist. You built a wrapper that injects three security non-negotiables the caller cannot override, proved the guardrail at plan with terraform test (no deploy), proved the validation block rejects bad naming, and demonstrated the pre-1.0 ~> trap first-hand. The lab steps mapped to what each proves:
| Step | What you did | What it proves | Real-world analogue |
|---|---|---|---|
| 3 | Inject public_network_access_enabled=false | The lever app teams don’t get | Every guarded brick in the wrapper |
| 4 | terraform test the locked-down default | Guardrails verified at plan, for free | The CI contract suite |
| 5 | Feed an illegal workload | Conventions are hard failures, not wiki text | Naming policy enforced as a type |
| 6 | Loosen to ~> 0.6, re-init | The pre-1.0 ~> trap is real | The Renovate-bump near-miss |
Cleanup. No Azure resources were created (everything was plan-level), so just remove the directory.
cd .. && rm -rf spoke-landing-zone
Cost note. Zero — every command is init/plan/test, which create nothing in Azure. (A real Terratest run would cost a few rupees of ephemeral-subscription spend for the minutes resources exist; this lab deliberately avoids apply.)
Common mistakes & troubleshooting
This is the playbook — the part you bookmark. First as a scannable table, then the entries that bite hardest with the full reasoning underneath.
| # | Symptom | Root cause | Confirm (exact cmd / check) | Fix |
|---|---|---|---|---|
| 1 | A breaking module version got pulled despite ~> | Pre-1.0: ~> 0.9 admits 0.10.0 | Read resolved version in .terraform/modules/modules.json | Pin ~> 0.9.1 (three-part) or exact in wrappers |
| 2 | terraform plan errors on a deployment resource | enable_telemetry=true in a policy-locked sub | Plan error names Microsoft.Resources/deployments | Set enable_telemetry=false in every wrapper |
| 3 | App team shipped a public Key Vault / storage | Wrapper exposes the lever (passthrough) | grep wrapper for public_network_access_enabled | Remove the input; hard-code false in main.tf |
| 4 | Plan shows 40× destroy/create after a bump | AVM moved the resource under for_each | terraform show -json → actions containing "delete" | moved block to the keyed address, same wrapper bump |
| 5 | A resource shows create but exists in Azure | In Azure, not in Terraform state | Portal/CLI shows the live resource | import block with the resource id |
| 6 | RBAC assignments churn every plan | Unkeyed role_assignments list reindexed | Plan shows delete+add of identical roles | Use the keyed map form AVM expects |
| 7 | version constraint won’t resolve | Constraint impossible (e.g. >= 0.9, < 0.9) | terraform init “no available releases” error | Fix the range; check the registry for real versions |
| 8 | Consumer can’t find the published module | Repo not named terraform-<provider>-<name> or no semver tag | Registry shows no versions | Rename repo; push a valid vX.Y.Z tag |
| 9 | terraform test passes but apply fails in Azure | Plan-level test can’t catch runtime behaviour | Terratest apply surfaces the real error | Add a Terratest assertion for that behaviour |
| 10 | Wrapper variables.tf is huge | It mirrors the AVM surface (no narrowing) | Count inputs vs the AVM module’s | Narrow deliberately; inject the rest |
| 11 | OIDC login fails in CI | Federated credential / subject mismatch | az login --federated-token error | Fix the federated credential subject/audience |
| 12 | Provider major bump replaces everything | azurerm v3→v4 schema change | .terraform.lock.hcl provider delta | Pin provider; follow the provider upgrade guide |
| 13 | moved block did nothing (still replaces) | from/to address wrong | Plan still shows -/+ | Correct the exact source/target address strings |
| 14 | Renovate raises no AVM PRs | packageRules pattern doesn’t match | Renovate logs / dry-run | Fix matchPackageNames to /^Azure/avm-/ |
The expanded form, for the entries that cause the most damage:
1. A breaking module version got pulled despite a ~> constraint.
Root cause: The module is pre-1.0 and ~> 0.9 expands to >= 0.9.0, < 1.0.0, so it admits a breaking 0.10.0 because AVM treats the minor segment as breaking below 1.0.
Confirm: Read the resolved version in .terraform/modules/modules.json and compare it to what you intended (.terraform.lock.hcl records provider versions only, not module versions).
Fix: In wrappers, pin exact (version = "0.9.1"); if you must allow drift, use the three-part ~> 0.9.1 (allows 0.9.x, blocks 0.10.0). Never ~> 0.9 on a 0.x module.
2. terraform plan fails on a deployment resource in a locked subscription.
Root cause: enable_telemetry = true (the AVM default) deploys a tiny empty Microsoft.Resources/deployments; in subscriptions where that operation is policy-denied, the plan fails with a confusing error that doesn’t obviously point at telemetry.
Confirm: The plan/apply error names a Microsoft.Resources/deployments operation being denied by policy.
Fix: Bake enable_telemetry = false into every wrapper, decided once org-wide — not per call.
3. An app team shipped a publicly-exposed Key Vault or storage account.
Root cause: The wrapper exposed public_network_access_enabled (a passthrough), so the team could set it true — the wrapper added no guardrail.
Confirm: grep -r public_network_access_enabled in the wrapper finds it in variables.tf (exposed) rather than only hard-coded in main.tf.
Fix: Remove the input from variables.tf; hard-code public_network_access_enabled = false in main.tf. Guardrails are the levers you omit. Back it with an Azure Policy deny for defence in depth.
4. After a minor AVM bump, the plan shows every instance of a resource scheduled for destroy/create.
Root cause: The AVM release changed the resource’s address (typically moving it under a for_each map), so Terraform sees the old address gone and a new one created: a destroy plus a create, which on data-bearing resources such as storage accounts means data loss.
Confirm: terraform show -json tfplan | jq -r '.resource_changes[] | select(.change.actions | index("delete")) | .address' lists every address scheduled for destruction; each should have a matching create at the new keyed address.
Fix: Add a moved block from the old address to the new keyed address, shipped in the same wrapper version so consumers inherit it transparently; gate CI to reject unacknowledged replaces.
5. A resource shows create in the plan even though it already exists in Azure.
Root cause: The resource exists in Azure but is not in Terraform state (created out-of-band, or being adopted into AVM as a child it didn’t manage before).
Confirm: The portal/CLI shows the resource live; terraform state list doesn’t include it.
Fix: Use an import block (to = the AVM resource address, id = the Azure resource id) so Terraform adopts it instead of creating a duplicate; read the plan for in-place-only diffs.
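A minimal sketch of such an import block, assuming a root configuration that calls a module named storage and that the AVM module's primary resource is addressed as azurerm_storage_account.this (both names are assumptions; confirm the real address in the module source for your pinned version). The resource ID is a placeholder.

```hcl
# Lives in the root module; import blocks require Terraform 1.5 or later.
import {
  # Assumed address inside the module; verify before applying.
  to = module.storage.azurerm_storage_account.this

  # Placeholder ID: substitute the real Azure resource ID.
  id = "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/rg-example/providers/Microsoft.Storage/storageAccounts/stexample001"
}
```

Run terraform plan afterwards and expect an import plus in-place changes only; any create or replace on that address means the to address or the id is wrong.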
6. RBAC role assignments churn (delete + re-add identical roles) on every plan.
Root cause: role_assignments passed as an unkeyed list gets reindexed when the order changes, so Terraform sees deletes and adds of the same assignments.
Confirm: The plan shows azurerm_role_assignment deletes and creates with identical role/scope.
Fix: Pass role_assignments as the keyed map AVM expects, so add/remove never reindexes the survivors.
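A sketch of the keyed form, assuming the AVM role_assignments interface with role_definition_id_or_name and principal_id attributes (the variable names and map keys are hypothetical):

```hcl
variable "app_principal_id" { type = string }    # hypothetical input
variable "ops_group_object_id" { type = string } # hypothetical input

locals {
  # Keys are static labels, so adding or removing one entry
  # never changes the address of the surviving entries.
  role_assignments = {
    app_secrets_user = {
      role_definition_id_or_name = "Key Vault Secrets User"
      principal_id               = var.app_principal_id
    }
    ops_reader = {
      role_definition_id_or_name = "Key Vault Reader"
      principal_id               = var.ops_group_object_id
    }
  }
}

# In the module call: role_assignments = local.role_assignments
```

Keep the keys as literal strings rather than deriving them from values that are only known after apply, otherwise for_each inside the module cannot resolve them at plan time.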
10. The wrapper’s variables.tf is nearly a copy of the AVM module’s inputs.
Root cause: The “wrapper” is a passthrough — it forwards the full AVM surface, so it provides no narrowing and no guardrails (the entire reason it exists).
Confirm: The input count roughly matches the AVM module’s, and security levers (public_network_access_enabled, shared_access_key_enabled) appear in variables.tf.
Fix: Narrow deliberately to the small contract app teams need; inject the rest as constants. A platform layer is defined by what it refuses to expose.
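A narrowed contract might look like the sketch below. The input names, the naming pattern and the mandatory tag keys are assumptions chosen for illustration; substitute your organisation's conventions.

```hcl
# variables.tf: a small, validated contract instead of the full AVM surface.
variable "workload" {
  type        = string
  description = "Short workload name used to derive resource names."

  validation {
    # Lowercase letter first, then 2 to 11 lowercase letters or digits.
    condition     = can(regex("^[a-z][a-z0-9]{2,11}$", var.workload))
    error_message = "workload must be 3-12 lowercase alphanumeric characters and start with a letter."
  }
}

variable "tags" {
  type        = map(string)
  description = "Resource tags; the mandatory keys are enforced at plan."

  validation {
    # Every mandatory key must be present in the supplied map.
    condition = alltrue([
      for k in ["owner", "cost_center", "environment"] : contains(keys(var.tags), k)
    ])
    error_message = "tags must include owner, cost_center and environment."
  }
}
```

Everything not declared here is set as a constant in main.tf, which is what separates this from a passthrough.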
Best practices
- Wrap, never fork. Compose AVM resource bricks and inject org policy; forking means you own maintenance forever and lose upstream fixes. The wrapper is your code; the bricks stay upstream.
- Pin AVM dependencies exactly in wrappers. version = "0.9.1", not ~> 0.9: pre-1.0 minors are breaking. Take upgrades deliberately, in a PR, with a reviewed plan. Let app repos pin your wrapper with ~> X.Y.Z.
- Treat every AVM bump as a possible state migration. Before merging a bump in a shared wrapper, diff the module’s main.tf across versions for address changes and assume a moved block may be required.
- Gate CI on unacknowledged replaces. A jq check over terraform show -json that fails on any destroy/create not acknowledged in the PR turns a 40-data-plane disaster into a build failure.
- Define guardrails by omission. The levers you don’t expose (public_network_access_enabled, shared_access_key_enabled, enable_telemetry) are the guardrails. Inject them as constants in main.tf.
- Validate conventions as types. validation blocks on naming and mandatory tags turn “please remember” into a hard plan-time failure that no one can skip.
- Set enable_telemetry = false org-wide. Decide once and bake into wrappers; it prevents the empty-deployment plan failure in locked subscriptions.
- Test at two altitudes. terraform test for fast plan-level contract assertions on every push; Terratest against an ephemeral subscription nightly for real behaviour (PE/DNS resolution, Entra-only enforcement).
- Authenticate CI with OIDC. Workload identity federation (no stored secrets, short-lived scoped tokens) for both plan automation and Terratest.
- Publish; never copy-paste. Distribute wrappers by source/version from a private registry or semver git tag, and honour semver: renaming an input or changing an address is a major.
- Use moved/import, not state mv. Declarative, version-controlled migration survives across the team; imperative state surgery doesn’t.
- Commit .terraform.lock.hcl. It’s the only thing that makes provider versions reproducible across the team and CI.
- Layer Azure Policy under the wrapper. The wrapper prevents violations at author time; Policy catches anything provisioned outside it. Belt and braces; see Azure Policy as code.
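To make the declarative-migration practice concrete, here is a sketch of a moved block for the case where a hand-rolled Key Vault resource is replaced by an AVM module call. The old address azurerm_key_vault.main, the module name key_vault and the inner address azurerm_key_vault.this are all assumptions; read the real addresses from terraform state list and from the module source. To my understanding, moving a resource into a module installed from a registry needs Terraform 1.3 or later, so verify that against your toolchain.

```hcl
# Declared in the module that used to own the hand-rolled resource.
moved {
  from = azurerm_key_vault.main                  # hypothetical old address
  to   = module.key_vault.azurerm_key_vault.this # assumed address inside the AVM module
}
```

Because the block is committed alongside the code change, every consumer's next plan re-points state the same way, with no one running state surgery by hand.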
The practices as a pre-flight checklist for any new wrapper:
| Check | Pass criterion | Why it matters |
|---|---|---|
| AVM deps pinned exactly | No ~> 0.x anywhere in the wrapper | Avoids the breaking-minor trap |
| enable_telemetry injected false | Not exposed; constant in main.tf | No plan failure in locked subs |
| Security levers omitted | public_network_access_enabled etc. not in variables.tf | Guardrail by construction |
| Diagnostics + PE injected | Forced to central LAW / PE subnet | Observability + isolation non-optional |
| Naming + tags validated | validation blocks present | Conventions are hard failures |
| Contract tests exist | terraform test covers the guardrails | Regressions caught at plan |
| Replace gate in CI | jq check fails on destroy/create | No accidental data-plane loss |
| Published by semver | Tag + registry/git ref, not copy-paste | One bump propagates everywhere |
| Lock file committed | .terraform.lock.hcl in VCS | Reproducible provider versions |
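The "contract tests exist" row could be satisfied by a test file along these lines. This is a sketch under several assumptions: the wrapper declares workload and tags inputs with validation blocks, the file path is hypothetical, and mock_provider requires Terraform 1.7 or later. If the wrapped modules pull in providers other than azurerm, those may need mocking as well.

```hcl
# tests/contract.tftest.hcl (hypothetical path)
mock_provider "azurerm" {}

# Valid defaults shared by every run block below.
variables {
  workload = "payments"
  tags = {
    owner       = "team-payments"
    cost_center = "cc-1234"
    environment = "dev"
  }
}

run "rejects_bad_workload_name" {
  command = plan

  variables {
    workload = "Bad_Name" # violates the assumed naming validation
  }

  # The run passes only if the workload validation fails.
  expect_failures = [var.workload]
}

run "rejects_missing_mandatory_tag" {
  command = plan

  variables {
    tags = { owner = "team-payments" } # cost_center and environment missing
  }

  expect_failures = [var.tags]
}
```

Running terraform test from the wrapper's root executes both runs at plan level, so nothing is deployed.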
Security notes
- Guardrail by omission is a security control. The single highest-leverage security decision here is not exposing public_network_access_enabled and shared_access_key_enabled. An app team cannot ship a public vault or key-auth storage account if the lever doesn’t exist in the contract; this beats any after-the-fact scan.
- Force Entra-only auth and private endpoints. Inject shared_access_key_enabled = false (kills stealable account keys) and private_endpoints = {…} (keeps secrets/data traffic on the backbone) as non-negotiables. Pair with the private DNS zones the PE needs.
- Keyless CI with OIDC. Use workload identity federation for plan automation and Terratest; never store a long-lived service-principal secret in a pipeline. Scope the federated identity to the specific repo/branch/environment.
- Least-privilege test subscription. Terratest runs apply with real credentials; scope that identity to a disposable subscription so a failed destroy or a compromised runner can’t touch production.
- Inject a baseline role_assignments, don’t expose raw RBAC. Let the wrapper grant the minimal assignments the pattern needs (keyed map); review any extension. Don’t let app teams hand-write arbitrary role grants inline.
- Diagnostics to a central, access-controlled LAW. Forcing diagnostic_settings to your central Log Analytics workspace means every spoke is auditable; lock down who can read that workspace.
- Pin and scan the supply chain. Exact AVM pins plus a committed lock file mean you know exactly what code runs; run checkov/tfsec/trivy in the publish pipeline so a wrapper can’t ship a misconfiguration. See Checkov, Trivy & tfsec IaC scanning.
- Defence in depth with Azure Policy. The wrapper enforces at author time; Azure Policy deny/deployIfNotExists enforces at the platform, so anything provisioned outside the wrapper still gets caught.
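For the keyless-CI note, the Terraform side can be as small as the provider block sketched below. The variable name is hypothetical, and the federated credential itself (the trust between the CI system and the Entra application or managed identity) is assumed to be configured already.

```hcl
variable "subscription_id" {
  type        = string
  description = "Disposable test subscription targeted by CI."
}

provider "azurerm" {
  features {}

  use_oidc        = true # provider-block equivalent of ARM_USE_OIDC=true
  subscription_id = var.subscription_id

  # client_id and tenant_id are typically supplied by the pipeline
  # through ARM_CLIENT_ID and ARM_TENANT_ID, so no secret appears here.
}
```

No client secret is set anywhere; the pipeline's short-lived federated token is exchanged for an Azure access token at run time.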
The security controls mapped to the threat each removes and the policy backstop:
| Control (in the wrapper) | Threat removed | Azure Policy backstop |
|---|---|---|
| Omit public_network_access_enabled | Internet-exposed KV/SA | Deny public network access |
| shared_access_key_enabled = false | Long-lived account-key theft | Deny storage key access |
| Injected private_endpoints | Data/secrets off the backbone | Audit/deny resources without PE |
| Injected diagnostic_settings → LAW | Unaudited resource | DeployIfNotExists diagnostics |
| OIDC keyless CI | Stolen long-lived pipeline secret | (conditional access on the identity) |
| Disposable test subscription | Blast radius of a CI compromise | Management-group scoping |
| Baseline keyed role_assignments | Over-broad inline RBAC | Deny role assignments at wrong scope |
| Supply-chain scan in publish CI | Shipping a misconfigured wrapper | (gate is the policy here) |
Cost & sizing
There is no per-hour charge for “an AVM module” — the cost story here is operational spend plus the resources your wrappers deploy, and the way the platform layer saves money is by making convention changes one PR instead of forty. The drivers, what each costs, and how the platform layer moves the number:
| Cost driver | What you pay for | Rough INR / month | How the platform layer affects it |
|---|---|---|---|
| Terratest on an ephemeral subscription | Minutes of real resources during apply→destroy | ~₹500–2,000 (nightly, small spokes) | Keep spokes minimal; destroy reliably; run nightly not per-push |
| CI compute (plan/test/publish) | Pipeline minutes | Often free tier / ~₹0–1,000 | Plan-level terraform test is cheap; gate Terratest to nightly |
| Terraform Cloud / Enterprise | Per-user or per-run, if used | Varies (free tier exists) | Optional — git-ref distribution is ₹0 |
| Private registry storage | Negligible (git tags) | ~₹0 | Tags cost nothing; storage/HTTP archive is tiny |
| The deployed spoke itself | VNet (free), KV (per-op), SA, PE | Per the resources (PE ~₹600–900/PE/mo) | Wrapper standardises sizing; PEs add a per-endpoint hourly charge |
| Renovate (self-hosted or app) | Compute / free GitHub app | ~₹0 | Saves engineer-hours chasing bumps |
| Engineer time (the real cost) | Hours per estate-wide change | (the big one) | One wrapper PR vs 40 hand-edits — the whole ROI |
Right-sizing guidance: the only recurring infra cost the platform layer adds is the ephemeral-subscription Terratest spend — keep example spokes minimal (one VNet, one KV, one SA, the PEs under test) and ensure destroy is reliable (the replace-gate and a defer terraform.Destroy prevent orphans that quietly bill). Private endpoints are the one line item to watch in the deployed spoke: each PE carries a small hourly charge plus per-GB processing, so don’t inject PEs on services that don’t need them. Everything else — the registry (git tags), CI plan/test, Renovate — is effectively free. The justification is engineer-time: a single compliance change (force CMK, deny public) that used to be 40 PRs becomes one reviewed wrapper bump, and the replace-gate makes that bump safe — which is worth far more than the few hundred rupees of nightly test spend.
A rough monthly picture for a 40-repo estate: nightly Terratest (~₹1,000–2,000), CI (free tier to ~₹1,000), registry/Renovate (~₹0), plus whatever the spokes themselves cost (dominated by PEs and the storage/KV operations, not the platform layer). The platform layer’s line on the bill is small; its line on the risk and engineer-hours ledger is where it pays for itself.
Interview & exam questions
1. What is the difference between an AVM resource module and a pattern module, and where does your platform wrapper fit? A resource module (avm-res-*) provisions one logical resource plus its directly-dependent children; a pattern module (avm-ptn-*) provisions a whole multi-resource architecture. Resource modules are LEGO bricks, pattern modules are pre-built assemblies. Your wrapper is a third tier — your own pattern module composed from AVM resource bricks that injects your org’s non-negotiables. You wrap, not fork.
2. Why is version = "~> 0.9" dangerous for an AVM module, and what should you use instead? AVM modules are pre-1.0, and AVM treats the minor segment as breaking below 1.0. ~> 0.9 expands to >= 0.9.0, < 1.0.0, so it admits a breaking 0.10.0. In wrappers, pin exact (0.9.1); if you need patch drift, use the three-part ~> 0.9.1 (allows 0.9.x, blocks 0.10.0).
3. What is enable_telemetry and why does it sometimes break a plan? AVM modules deploy a tiny, empty Microsoft.Resources/deployments whose name encodes module + version, used to measure usage — it sends no resource data. In subscriptions where that deployment operation is policy-denied, the plan fails with a confusing error. Set enable_telemetry = false once, org-wide, in your wrappers.
4. What makes a wrapper a real platform layer rather than a passthrough? What it does not expose. A passthrough forwards the full AVM surface, so an app team can still ship a public Key Vault. A platform layer exposes a narrow, validated contract (workload, tags, central LAW id) and injects the rest (public_network_access_enabled = false, forced PE/diagnostics, telemetry off) as constants the caller cannot override — guardrails as types, validated at plan.
5. How do you migrate a hand-rolled module to AVM without destroying resources? Use a moved block to re-point state from the old resource address to the new AVM address (the address changes, the object doesn’t), and an import block for resources that exist in Azure but not in state. Migrate one module type per PR and read the plan — a correct migration shows moves and in-place diffs with zero destroy/create.
6. A minor AVM bump in a shared wrapper shows every storage account scheduled for destroy/create. What happened and how do you fix it? The AVM release changed the resource’s address (moved it under a for_each map), so Terraform sees the old address removed and a new one created. Absorb the change with a moved block to the new keyed address, shipped in the same wrapper version so consumers inherit it transparently, and gate CI to reject unacknowledged replaces.
7. Why pin AVM exactly in wrappers but ~> X.Y.Z in app repos? The wrapper is where you absorb upgrade risk deliberately, in a reviewed PR with a plan diff — so exact pins. Your wrappers are semver-disciplined, so app repos can safely use ~> X.Y.Z on your wrapper and inherit the AVM versions you chose, getting your patches/minors automatically without ever pinning AVM directly.
8. When do you use terraform test versus Terratest? terraform test runs assertions in-process at plan level (or at apply level, when a run block asks for it). At plan level it is fast, free and deploys nothing, which makes it ideal for contract/shape checks (“does the wrapper produce the locked-down shape?”) on every push. Terratest runs a real apply/assert/destroy against an ephemeral subscription; it is slow and costs money, so reserve it for behaviour you can’t see at plan (PE/DNS resolution, Entra-only enforcement) and run it nightly.
9. How do you authenticate Terraform CI to Azure without storing secrets? OIDC workload identity federation: the CI system presents a short-lived federated token (az login --federated-token, ARM_USE_OIDC=true) scoped to a specific repo/branch/environment, so there’s no long-lived service-principal secret to leak. Target a disposable subscription for any apply.
10. What semver bump does renaming a wrapper input require, and why? A major — renaming or removing an input is a breaking change to the contract; app teams pinned with ~> X.Y.Z won’t pick it up until they opt in. The same applies to changing a resource address (also needs a moved), tightening a validation, or changing a default value.
11. How does Renovate fit the AVM upgrade workflow? Renovate understands Terraform registry sources natively. A packageRules entry matching the regex /^Azure\/avm-/ groups AVM bumps into one PR on a schedule; CI attaches the terraform plan so each upgrade is a single reviewable unit. You review upgrades instead of chasing them, and the replace-gate guards the merge.
12. Why layer Azure Policy under the wrapper if the wrapper already enforces guardrails? Defence in depth. The wrapper enforces at author time for anything provisioned through it — but resources created out-of-band (portal, another tool, a non-wrapper module) bypass it. Azure Policy deny/deployIfNotExists enforces at the platform regardless of how a resource was created, catching what the wrapper can’t see.
These map to the HashiCorp Terraform Associate (modules, version constraints, state, moved/import) and Azure platform/DevOps exams: AZ-400 (IaC, release pipelines, secure CI) and AZ-104/AZ-305 (governance, landing zones, Policy). A compact cert mapping for revision:
| Question theme | Primary cert | Objective area |
|---|---|---|
| Module classes, composition, wrapping | Terraform Associate | Use and create modules |
| Version constraints (~>, pre-1.0) | Terraform Associate | Module versioning & sources |
| moved / import migration | Terraform Associate | State & refactoring |
| terraform test / Terratest | Terraform Associate / AZ-400 | Testing IaC; CI |
| OIDC keyless CI, replace-gate | AZ-400 | Secure pipelines; release gates |
| Guardrails, Policy backstop, landing zones | AZ-305 / AZ-104 | Governance & design |
Quick check
- You pin an AVM module with version = "~> 0.9". A teammate’s terraform init -upgrade pulls 0.10.0 and the plan goes haywire. Why, and what should the constraint have been?
- A terraform plan fails in a locked-down subscription with an error about a Microsoft.Resources/deployments being denied. What AVM setting is the likely cause and what’s the fix?
- True or false: the more inputs your wrapper’s variables.tf exposes, the more useful it is to app teams.
- After a Renovate AVM bump in a shared wrapper, the plan shows 40 storage accounts as destroy/create. Name the root cause and the two-part fix.
- You’re adopting AVM over a storage account that already exists in Azure but isn’t in Terraform state. Which block do you use, and what should a correct plan show?
Answers
- The module is pre-1.0, and ~> 0.9 expands to >= 0.9.0, < 1.0.0; because AVM treats the minor segment as breaking below 1.0, that admits a breaking 0.10.0. The constraint should have been exact (0.9.1) in a wrapper, or the three-part ~> 0.9.1 (allows 0.9.x, blocks 0.10.0).
- enable_telemetry = true (the AVM default): it deploys a tiny empty Microsoft.Resources/deployments, which a policy-locked subscription denies, failing the plan. Fix: set enable_telemetry = false in the wrapper, once, org-wide.
- False. A wrapper’s value is what it doesn’t expose. Exposing the full AVM surface makes it a passthrough with no guardrails, so an app team could ship a public Key Vault. Expose a narrow, validated contract and inject the rest.
- Root cause: the AVM bump changed the resource’s address (moved it under a for_each map), so Terraform plans destroy+create. Fix: (a) add a moved block to the new keyed address in the same wrapper version, and (b) gate CI to reject any unacknowledged destroy/create in the plan.
- Use an import block (to = the AVM resource address, id = the Azure resource id). A correct plan shows the resource imported with only in-place diffs (e.g. AVM’s added diagnostics) and zero destroy/create.
Glossary
- Azure Verified Modules (AVM): Microsoft-owned, specification-driven set of Terraform/Bicep modules with consistent interfaces and Well-Architected defaults, replacing inconsistent community modules.
- Resource module (avm-res-*): an AVM module provisioning one logical resource plus its directly-dependent child resources; the “brick” you compose.
- Pattern module (avm-ptn-*): an AVM module provisioning a whole multi-resource architecture (e.g. hub-spoke, a landing zone); a “pre-built assembly”.
- Platform wrapper: your own pattern module composed from AVM resource bricks that injects org non-negotiables and exposes a narrow contract; a third tier over AVM. You wrap, not fork.
- AVM interface: the shared set of optional inputs AVM mandates (tags, lock, role_assignments, diagnostic_settings, private_endpoints, managed_identities, enable_telemetry) that lets you write generic policy.
- enable_telemetry: an AVM input (default true) that deploys a tiny empty ARM deployment for usage metrics; sends no resource data, but fails plans where Microsoft.Resources/deployments is policy-denied.
- Pre-1.0 (0.x) versioning: AVM resource modules are below 1.0 and treat the minor segment as breaking, which inverts the usual ~> intuition.
- ~> X.Y.Z (pessimistic constraint): allows only the rightmost specified segment to increase; ~> 0.9.1 means >= 0.9.1, < 0.10.0, whereas ~> 0.9 means >= 0.9.0, < 1.0.0 (dangerous for 0.x).
- validation block: a custom input precondition in variables.tf (with a condition and error_message) that turns conventions into hard plan-time failures.
- terraform test: native HCL test runner (*.tftest.hcl) that asserts at plan (or apply) level; at plan level it is fast, free and deploys nothing, so it is used for contract/shape assertions.
- Terratest: a Go testing library that runs a real apply → assert → destroy against live Azure; slow and costs money, so it is used for end-to-end behaviour, typically nightly.
- OIDC workload identity federation: keyless CI auth where the pipeline presents a short-lived federated token scoped to a repo/branch/environment, with no stored long-lived secret.
- moved block: a declarative, version-controlled statement that a resource’s address changed from from to to, so Terraform re-points state instead of destroying and recreating.
- import block: a declarative statement (to + id) that brings an existing Azure resource under Terraform management without recreating it.
- Replace gate: a CI check (e.g. jq over terraform show -json) that fails the build on any unacknowledged destroy/create in the plan.
- .terraform.lock.hcl: the dependency lock file pinning provider versions and checksums; commit it for reproducibility across the team and CI.
- Private module registry: a registry (Terraform Cloud/Enterprise) or versioned git ref from which consumers pull wrappers by source/version instead of copy-pasting.
Next steps
You can now build, guard, test, distribute and safely upgrade an AVM-based Terraform platform layer. Build outward:
- Next: Terraform module design: composition, versioning — the composition theory that underpins clean wrappers and a healthy module graph.
- Related: Terraform testing: native & Terratest — go deeper on the two-altitude test strategy this article applies to wrappers.
- Related: Terraform refactoring: moved, import & removed blocks — master the migration mechanics that keep every AVM bump a zero-destroy event.
- Related: Azure Cloud Adoption Framework landing zones — the guardrail and management-group scaffolding your spokes deploy into.
- Related: Checkov, Trivy & tfsec IaC scanning — wire a security scan into the publish pipeline so a wrapper can’t ship a misconfiguration.
- Related: GitHub Actions + Terraform OIDC plan/PR automation — the keyless CI pattern that runs plan-on-PR and Terratest without stored secrets.
- Related: Bicep private module registry with ACR & CI/CD — the equivalent platform-layer distribution story for teams that prefer Bicep AVM.