Most Terraform codebases don’t fail because of a missing feature. They fail because a module’s interface is sloppy, its versions float, and every consumer copy-pastes the last team’s mistakes. This is a practitioner’s guide to building modules that behave like real software: stable contracts, validated inputs, semantic versions, and a private registry that platform teams can trust.
1. Root vs. child modules: where to draw the boundary
A root module is the directory you run terraform apply against. It owns the backend, the providers, and the environment-specific wiring. A child module is a reusable unit you call from elsewhere with module blocks. It owns resources and a contract, never a backend and never provider configuration.
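The single most useful rule: child modules must not declare provider blocks or backend configuration. They inherit providers from the root. Declaring providers inside a shared module makes it impossible to use that module more than once with different provider aliases. It also makes the module call incompatible with count, for_each, and depends_on. Finally, it’s a one-way door that’s painful to undo: once resources exist, removing the module call also removes the provider configuration Terraform needs to destroy them.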
```hcl
# child module: modules/storage-account/versions.tf
# Declare REQUIRED providers (the contract), never CONFIGURE them.
terraform {
  required_version = ">= 1.6.0"
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = ">= 3.80, < 5.0"
    }
  }
}
```
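Here is a minimal root-module sketch of that inheritance. It configures the provider twice and hands the aliased copy to a second instance of the module through the providers meta-argument. The alias dr, the dr_subscription_id variable, and the account and resource group names are hypothetical. It also assumes credentials come from the environment, and on azurerm 4.x so does the default subscription (for example ARM_SUBSCRIPTION_ID).

```hcl
# Root module: main.tf (sketch; names, regions, and the DR subscription are illustrative)
terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = ">= 3.80, < 5.0"
    }
  }
}

variable "dr_subscription_id" {
  description = "Hypothetical subscription ID for the disaster-recovery copy."
  type        = string
}

# Default (unaliased) configuration: child modules inherit it implicitly.
provider "azurerm" {
  features {}
}

# Aliased configuration pointed at a second subscription.
provider "azurerm" {
  alias           = "dr"
  subscription_id = var.dr_subscription_id
  features {}
}

module "storage_primary" {
  source = "./modules/storage-account"

  name                = "stkloudvinprimary01"
  resource_group_name = "rg-app-primary"
  location            = "eastus2"
}

module "storage_dr" {
  source = "./modules/storage-account"

  # Map the module's "azurerm" requirement to the aliased configuration.
  providers = {
    azurerm = azurerm.dr
  }

  name                = "stkloudvindr01"
  resource_group_name = "rg-app-dr"
  location            = "centralus"
}
```

Because the child module only declares required_providers, both calls can use the same source directory without any changes to the module.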
Draw the boundary along ownership and lifecycle, not along resource types. A good child module encapsulates one cohesive thing a team would reason about as a unit: a storage account with its network rules and diagnostic settings; a virtual network with its subnets and NSG associations. If two resources are always created, destroyed, and scaled together, they belong in the same module. If they have independent lifecycles, split them.
Rule of thumb: if you can’t write a one-sentence description of what a module produces, it’s doing too much.
2. Designing the module interface: inputs and outputs as a contract
Treat variables.tf and outputs.tf as your public API. Everything inside is an implementation detail you’re free to refactor; everything in the interface is a promise you have to keep.
Required vs. optional inputs. A variable is required when it has no default. Make the irreducible facts required (name, resource group, location) and give everything else a sensible default. Resist the urge to default things that have no safe universal value — a default that’s wrong for half your consumers is worse than a required input.
```hcl
# modules/storage-account/variables.tf
variable "name" {
  description = "Globally unique storage account name (3-24 lowercase alphanumeric chars)."
  type        = string
}
variable "resource_group_name" {
  description = "Resource group to create the account in."
  type        = string
}
variable "location" {
  description = "Azure region, e.g. eastus2."
  type        = string
}
variable "account_tier" {
  description = "Performance tier."
  type        = string
  default     = "Standard"
}
variable "replication_type" {
  description = "Storage replication strategy."
  type        = string
  default     = "ZRS"
}
variable "tags" {
  description = "Tags merged onto all resources."
  type        = map(string)
  default     = {}
}
```
Outputs are a contract too. Export the values consumers actually need to wire things together (IDs, names, endpoints) and nothing more. Every output you publish is something you can’t remove without a breaking change, so be deliberate. Mark secret outputs with sensitive = true so Terraform redacts them from plan and apply output. Remember that the value is still stored in plain text in state.
```hcl
# modules/storage-account/outputs.tf
output "id" {
  description = "Resource ID of the storage account."
  value       = azurerm_storage_account.this.id
}
output "name" {
  description = "The storage account name."
  value       = azurerm_storage_account.this.name
}
output "primary_blob_endpoint" {
  description = "Primary blob service endpoint."
  value       = azurerm_storage_account.this.primary_blob_endpoint
}
output "primary_access_key" {
  description = "Primary access key. Prefer managed identity over this."
  value       = azurerm_storage_account.this.primary_access_key
  sensitive   = true
}
```
A useful convention: name the primary resource this (e.g. azurerm_storage_account.this) when a module manages exactly one of a kind. It reads cleanly in outputs and signals intent.
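The outputs above assume a main.tf along these lines. This minimal sketch only wires the interface variables into azurerm_storage_account. A real module would add network rules, diagnostic settings, and similar resources on top.

```hcl
# modules/storage-account/main.tf (minimal sketch)
resource "azurerm_storage_account" "this" {
  name                     = var.name
  resource_group_name      = var.resource_group_name
  location                 = var.location
  account_tier             = var.account_tier
  account_replication_type = var.replication_type
  tags                     = var.tags
}
```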
3. Input hardening: validation, type constraints, and optional() attributes
Type constraints are your first line of defense; validation blocks are your second. Catch bad input at plan time with a clear message instead of at apply time with a 400 from the provider.
```hcl
variable "name" {
  description = "Globally unique storage account name."
  type        = string
  validation {
    condition     = can(regex("^[a-z0-9]{3,24}$", var.name))
    error_message = "name must be 3-24 lowercase alphanumeric characters."
  }
}
variable "replication_type" {
  description = "Storage replication strategy."
  type        = string
  default     = "ZRS"
  validation {
    condition     = contains(["LRS", "ZRS", "GRS", "GZRS", "RAGRS", "RAGZRS"], var.replication_type)
    error_message = "replication_type must be one of LRS, ZRS, GRS, GZRS, RAGRS, RAGZRS."
  }
}
```
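You can also lock these rules in with a test, so a later refactor can’t quietly loosen them. The sketch below uses the native terraform test framework with expect_failures. A run with expect_failures passes only when the named variable’s validation fails. It assumes Terraform 1.7 or later for mock_provider, which means no Azure credentials are needed. The file name and input values are illustrative.

```hcl
# modules/storage-account/tests/validation.tftest.hcl (hypothetical file name)
# A mocked provider lets plan run without Azure credentials (Terraform >= 1.7).
mock_provider "azurerm" {}

# Baseline inputs shared by every run block below.
variables {
  name                = "stkloudvindemo01"
  resource_group_name = "rg-demo"
  location            = "eastus2"
}

run "rejects_invalid_name" {
  command = plan

  variables {
    name = "Invalid_Name"
  }

  # The run succeeds only if var.name's validation block fails.
  expect_failures = [
    var.name,
  ]
}

run "rejects_unknown_replication_type" {
  command = plan

  variables {
    replication_type = "XRS"
  }

  expect_failures = [
    var.replication_type,
  ]
}
```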
For structured input, use object types with optional() attributes (stable since Terraform 1.3). This lets callers pass a partial object and have the rest defaulted, which is far cleaner than a flat sprawl of network_* variables. optional() takes a second argument for the default value.
```hcl
variable "network_rules" {
  description = "Optional network ACLs for the account."
  type = object({
    default_action = optional(string, "Deny")
    bypass         = optional(set(string), ["AzureServices"])
    ip_rules       = optional(list(string), [])
    subnet_ids     = optional(list(string), [])
  })
  default = {}
}
```
A consumer can now write network_rules = { ip_rules = ["203.0.113.0/24"] } and inherit default_action = "Deny" automatically. Note that defaults supplied by optional() are applied per-attribute, so the top-level default = {} plus per-attribute defaults gives you a fully populated object even when the caller passes nothing.
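Object inputs can be hardened further. The sketch below assumes the same network_rules shape. Setting nullable = false makes an explicit null from the caller fall back to the default. Two validation blocks then check the enum-like field and each IP rule. The IP check relies on cidrhost() erroring on malformed input, and can() turns that error into false. strcontains() needs Terraform 1.5 or later.

```hcl
variable "network_rules" {
  description = "Optional network ACLs for the account."
  type = object({
    default_action = optional(string, "Deny")
    bypass         = optional(set(string), ["AzureServices"])
    ip_rules       = optional(list(string), [])
    subnet_ids     = optional(list(string), [])
  })
  default = {}
  # An explicit null from the caller falls back to the default above,
  # so the conditions below never see a null object.
  nullable = false

  validation {
    condition     = contains(["Allow", "Deny"], var.network_rules.default_action)
    error_message = "The network_rules.default_action value must be \"Allow\" or \"Deny\"."
  }

  validation {
    # Bare addresses get a /32 suffix so cidrhost() can parse them;
    # can() converts a parse error into false.
    condition = alltrue([
      for ip in var.network_rules.ip_rules :
      can(cidrhost(strcontains(ip, "/") ? ip : "${ip}/32", 0))
    ])
    error_message = "Each entry in network_rules.ip_rules must be an IP address or CIDR range."
  }
}
```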
You can also cross-validate multiple variables. Since Terraform 1.9, a validation block can reference other variables directly:
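```hcl
variable "enable_https_only" {
  description = "Only allow HTTPS traffic to the account."
  type        = bool
  default     = true
}
variable "min_tls_version" {
  description = "Minimum TLS version accepted by the account."
  type        = string
  default     = "TLS1_2"
  validation {
    # Refers to another variable, so raise required_version to ">= 1.9.0".
    # Fails only when HTTPS-only is enabled AND TLS1_0 is requested.
    condition     = !var.enable_https_only || var.min_tls_version != "TLS1_0"
    error_message = "TLS1_0 is only permissible when HTTPS-only is disabled."
  }
}
```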
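Inside the module, the resource consumes these hardened inputs. The sketch below shows how main.tf might wire min_tls_version and the network_rules object onto azurerm_storage_account, assuming the variables defined in this section. The HTTPS-only flag appears only as a comment because its argument name differs between azurerm major versions.

```hcl
# modules/storage-account/main.tf (sketch wiring the hardened inputs)
resource "azurerm_storage_account" "this" {
  name                     = var.name
  resource_group_name      = var.resource_group_name
  location                 = var.location
  account_tier             = var.account_tier
  account_replication_type = var.replication_type
  min_tls_version          = var.min_tls_version
  tags                     = var.tags

  # var.enable_https_only maps to enable_https_traffic_only on azurerm 3.x
  # and to https_traffic_only_enabled on 4.x; use the name for your pinned major.

  network_rules {
    default_action             = var.network_rules.default_action
    bypass                     = var.network_rules.bypass
    ip_rules                   = var.network_rules.ip_rules
    virtual_network_subnet_ids = var.network_rules.subnet_ids
  }
}
```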
4. Composition patterns: thin wrappers, facades, and the god-module trap
Thin wrapper. Wrap an upstream module (or raw resource set) to bake in your org’s defaults — naming conventions, mandatory tags, diagnostic settings — while passing everything else straight through. The wrapper is small and exists only to encode policy.
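A thin wrapper can be very small. The sketch below wraps the storage-account module, forces replication to ZRS, and merges in two mandatory tags that callers cannot override. The module path org-storage, the cost_center input, and the tag keys are hypothetical.

```hcl
# modules/org-storage/main.tf (hypothetical thin wrapper)
variable "name" { type = string }
variable "resource_group_name" { type = string }
variable "location" { type = string }
variable "cost_center" { type = string }

variable "tags" {
  type    = map(string)
  default = {}
}

locals {
  # merge() lets later arguments win, so these keys always survive.
  mandatory_tags = {
    managed_by  = "terraform"
    cost_center = var.cost_center
  }
}

module "storage" {
  source  = "app.terraform.io/kloudvin/storage-account/azurerm"
  version = "~> 1.7"

  name                = var.name
  resource_group_name = var.resource_group_name
  location            = var.location
  replication_type    = "ZRS" # org policy, deliberately not exposed to callers
  tags                = merge(var.tags, local.mandatory_tags)
}

output "id" {
  value = module.storage.id
}
```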
Facade module. Compose several child modules behind one coarse-grained interface that represents an application’s footprint. The facade calls the network, storage, and identity modules and exposes a curated set of outputs. Consumers get one front door; you keep small, independently testable building blocks behind it.
```hcl
# modules/app-platform/main.tf (a facade)
module "network" {
  source  = "app.terraform.io/kloudvin/network/azurerm"
  version = "~> 2.4"

  name           = "${var.app_name}-vnet"
  address_space  = var.address_space
  resource_group = var.resource_group_name
  location       = var.location
  tags           = local.tags
}

module "storage" {
  source  = "app.terraform.io/kloudvin/storage-account/azurerm"
  version = "~> 1.7"

  name                = var.storage_name
  resource_group_name = var.resource_group_name
  location            = var.location
  network_rules = {
    subnet_ids = [module.network.subnet_ids["app"]]
  }
  tags = local.tags
}
```
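The value of a facade lies in what it chooses to expose. Below is a sketch of its locals and curated outputs. It assumes a var.tags input that the snippet above doesn’t declare, plus the subnet_ids map output that the snippet already relies on.

```hcl
# modules/app-platform/locals.tf (sketch)
locals {
  tags = merge(var.tags, { app = var.app_name })
}

# modules/app-platform/outputs.tf: a curated subset of the child outputs
output "storage_account_id" {
  description = "Resource ID of the application's storage account."
  value       = module.storage.id
}

output "app_subnet_id" {
  description = "ID of the subnet the application runs in."
  value       = module.network.subnet_ids["app"]
}
```

Note what is missing. The facade deliberately does not re-export the storage module’s primary_access_key, so its consumers never see that key.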
The god-module anti-pattern is the failure mode to avoid: one module with 60 input variables, dozens of count/for_each toggles, and a feature_flags object that turns whole resource trees on and off. It’s impossible to test, terrifying to upgrade, and every consumer uses a different 20% of it. When you see a module growing a boolean for every resource it might create, that’s the signal to decompose it into smaller modules and compose them with a facade.
| Pattern | Use when | Interface size |
|---|---|---|
| Thin wrapper | You need org defaults over an existing module | Small, mostly pass-through |
| Facade | An app needs several modules wired together | Medium, curated |
| God-module | Never | Enormous, flag-driven |
5. Semantic versioning and source pinning with ~>
Version modules with SemVer (MAJOR.MINOR.PATCH) and mean it:
- MAJOR: a breaking change to the interface, such as removing or renaming a variable or output, changing a default in a way that alters real infrastructure, or tightening validation so that previously valid input now fails.
- MINOR: backward-compatible additions, such as a new optional variable or a new output.
- PATCH: bug fixes that don’t touch the interface.
Pin module sources with the pessimistic constraint operator ~>. It allows the rightmost version component to increase but no further:
- ~> 1.7 permits >= 1.7.0, < 2.0.0: any new minor or patch release, but never a new major.
- ~> 1.7.2 permits >= 1.7.2, < 1.8.0: patch releases only.
```hcl
module "storage" {
  source  = "app.terraform.io/kloudvin/storage-account/azurerm"
  version = "~> 1.7" # auto-adopt 1.x improvements, block 2.0 breaking changes
  # ...
}
```
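For comparison, here are the other common constraint forms for the same registry module. The # ... lines stand in for the module’s required inputs, and the ranges in the comments follow from the ~> rules listed earlier.

```hcl
module "storage_exact" {
  source  = "app.terraform.io/kloudvin/storage-account/azurerm"
  version = "1.7.3" # exactly 1.7.3; every upgrade is a manual edit
  # ...
}

module "storage_patch_only" {
  source  = "app.terraform.io/kloudvin/storage-account/azurerm"
  version = "~> 1.7.3" # >= 1.7.3, < 1.8.0
  # ...
}

module "storage_explicit_range" {
  source  = "app.terraform.io/kloudvin/storage-account/azurerm"
  version = ">= 1.7.0, < 2.0.0" # the same range as "~> 1.7", spelled out
  # ...
}
```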
Use ~> MAJOR.MINOR for modules you want to keep current within a major line, and the stricter ~> MAJOR.MINOR.PATCH for anything you need frozen against even minor drift. Note the version argument only applies to registry sources and provider requirements; Git sources are pinned with a ?ref= tag instead:
```hcl
module "legacy" {
  source = "git::https://github.com/kloudvin/tf-modules.git//storage?ref=v1.7.3"
}
```
Always pin to an immutable tag, never a branch. ?ref=main is a production incident waiting for a quiet afternoon.
6. Publishing to a private registry with tagged releases
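Terraform’s private registries publish module versions from Git tags that follow x.y.z (a leading v is accepted). HCP Terraform’s VCS-connected registry also requires repositories named terraform-<PROVIDER>-<NAME>, e.g. terraform-azurerm-storage-account. The release flow starts the same way everywhere: tag and push. A VCS-connected registry indexes the new tag on its own. API-driven registries such as GitLab’s need a CI job on the tag to upload the version.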
```shell
# Cut a release from a clean main branch
git switch main
git pull
git tag -a v1.7.3 -m "fix: correct default network ACL bypass set"
git push origin v1.7.3
```
Terraform Cloud / HCP Terraform. Connect the VCS repo (or publish via the API), and each new tag shows up as a new module version automatically. Consumers reference the four-part address <HOSTNAME>/<ORG>/<NAME>/<PROVIDER>:
```hcl
module "storage" {
  source  = "app.terraform.io/<ORG>/storage-account/azurerm"
  version = "~> 1.7"
}
```
Authenticate developer machines with terraform login app.terraform.io. For non-interactive runs such as CI, use a credentials block in the CLI configuration file or set the TF_TOKEN_app_terraform_io environment variable.
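The credentials block lives in the CLI configuration file, not in any .tf file. Below is a minimal sketch with a placeholder token. In CI, the TF_TOKEN_app_terraform_io environment variable (the hostname with dots replaced by underscores) avoids writing the token to disk at all.

```hcl
# ~/.terraformrc on Linux/macOS (terraform.rc under %APPDATA% on Windows)
credentials "app.terraform.io" {
  # Placeholder: substitute an HCP Terraform user or team API token.
  token = "REPLACE_WITH_API_TOKEN"
}
```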
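Azure DevOps / GitLab. GitLab exposes a native private module registry. Azure DevOps does not, so teams there usually consume modules as Git sources pinned with ?ref= instead. GitLab uses a namespaced address and a CI_JOB_TOKEN (or a personal or group token) for auth: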
```hcl
module "storage" {
  source  = "gitlab.example.com/<group>/storage-account/azurerm"
  version = "~> 1.7"
}
```
Whatever the host, the discipline is identical: one module per repo, releases via signed/annotated tags, and a changelog entry per tag. A registry without a changelog just hides the diff from the people who need it most.
7. Documenting and validating modules
Generate documentation from the code so it can never drift. terraform-docs reads your variables and outputs and injects a table into the README between marker comments.
```shell
# Inject/refresh the docs table in README.md
terraform-docs markdown table --output-file README.md --output-mode inject .
```

```markdown
<!-- BEGIN_TF_DOCS -->
<!-- terraform-docs writes the inputs/outputs tables here -->
<!-- END_TF_DOCS -->
```
Lint for correctness and provider-specific footguns with tflint (add the Azure ruleset plugin for deprecated arguments and invalid instance types):
```shell
tflint --init
tflint --recursive
```
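tflint loads the Azure ruleset only when a .tflint.hcl file declares it, and tflint --init is the step that downloads it. Below is a minimal configuration sketch. The ruleset version shown is a placeholder; replace it with a current release of tflint-ruleset-azurerm.

```hcl
# .tflint.hcl at the module repository root
plugin "terraform" {
  enabled = true
  preset  = "recommended"
}

plugin "azurerm" {
  enabled = true
  version = "0.27.0" # placeholder: pin to a current ruleset release
  source  = "github.com/terraform-linters/tflint-ruleset-azurerm"
}
```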
Keep at least one runnable example fixture per module under examples/. It doubles as documentation and as the thing your CI actually plans, so the module is proven to instantiate.
```text
modules/storage-account/
  main.tf
  variables.tf
  outputs.tf
  versions.tf
  README.md
  examples/
    basic/
      main.tf
```
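A minimal examples/basic/main.tf might look like the sketch below. The resource group and account names are illustrative. The line source = "../.." assumes the layout above, with the example two directories below the module root. CI only runs init -backend=false and validate, so this file needs no Azure credentials to pass.

```hcl
# examples/basic/main.tf: a runnable fixture for modules/storage-account
terraform {
  required_version = ">= 1.6.0"
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = ">= 3.80, < 5.0"
    }
  }
}

provider "azurerm" {
  features {}
}

resource "azurerm_resource_group" "example" {
  name     = "rg-storage-basic-example"
  location = "eastus2"
}

module "storage" {
  source = "../.."

  name                = "stbasicexample001"
  resource_group_name = azurerm_resource_group.example.name
  location            = azurerm_resource_group.example.location
  tags = {
    purpose = "example"
  }
}

output "storage_account_id" {
  value = module.storage.id
}
```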
Wire it all into CI. This pipeline fails on unformatted code, invalid config, lint findings, and a broken example:
```yaml
# .github/workflows/module-ci.yml
name: module-ci
on: [pull_request]
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - name: Format check
        run: terraform fmt -recursive -check
      - name: Validate example
        run: |
          terraform -chdir=examples/basic init -backend=false
          terraform -chdir=examples/basic validate
      - name: Set up TFLint
        uses: terraform-linters/setup-tflint@v4
      - name: Install TFLint plugins
        run: tflint --init
        env:
          # Avoids GitHub API rate limits when downloading ruleset plugins.
          GITHUB_TOKEN: ${{ github.token }}
      - name: Run tflint
        run: tflint --recursive
```
terraform validate checks syntax and internal consistency, not whether the plan is correct against a real cloud. Use init -backend=false in the example so CI doesn’t need cloud credentials just to validate.
8. Migration strategy: from monolith to versioned modules with moved blocks
Refactoring a flat root module into child modules normally means Terraform wants to destroy and recreate every resource, because its address in state changed (azurerm_storage_account.main becomes module.storage.azurerm_storage_account.this). A moved block (Terraform 1.1 and later) records the rename. The plan then updates the state address instead of destroying anything.
```hcl
# Place in the root module that's being refactored
moved {
  from = azurerm_storage_account.main
  to   = module.storage.azurerm_storage_account.this
}
moved {
  from = azurerm_virtual_network.main
  to   = module.network.azurerm_virtual_network.this
}
```
Run terraform plan and confirm it reports moves, not replacements. A correct refactor shows lines like # azurerm_storage_account.main has moved to module.storage.azurerm_storage_account.this, with a final plan of 0 to add, 0 to change, 0 to destroy. If you see destroys, an address is wrong; fix the moved block before applying.
moved blocks are declarative and idempotent; leave them in place for at least one release cycle so every workspace and every collaborator’s state gets migrated, then remove them. For module-to-module renames across packages, moved works as long as both addresses are reachable from the configuration being applied. For relocating resources to a different state file entirely, that’s terraform state mv (or import/removed blocks), not moved.
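Terraform 1.7 and later can handle both halves of the cross-state case declaratively. In the old configuration, a removed block with destroy = false drops the resource from that state without deleting it. In the new configuration, an import block (Terraform 1.5+) adopts it. In the sketch below, the address azurerm_storage_account.legacy and the angle-bracket parts of the resource ID are placeholders.

```hcl
# Old root module: delete the resource block itself, then add this
# so Terraform forgets the account instead of destroying it.
removed {
  from = azurerm_storage_account.legacy

  lifecycle {
    destroy = false
  }
}

# New root module (a different state file): adopt the existing account.
import {
  to = module.storage.azurerm_storage_account.this
  id = "/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RG_NAME>/providers/Microsoft.Storage/storageAccounts/<ACCOUNT_NAME>"
}
```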
Verify
Confirm the whole chain works end to end:
```shell
# 1. Formatting and static validation pass
terraform fmt -recursive -check
terraform -chdir=examples/basic init -backend=false
terraform -chdir=examples/basic validate

# 2. The published version resolves from the private registry
terraform init      # in a consumer root that pins ~> 1.7
terraform version   # Terraform and provider versions in use
# .terraform.lock.hcl records providers only; init records the resolved
# module version in .terraform/modules/modules.json (jq must be installed).
jq '.Modules[] | select(.Key == "storage") | .Version' .terraform/modules/modules.json

# 3. A refactor plan moves state instead of destroying it
terraform plan      # expect: "has moved" lines, 0 to destroy
```
A green run means: the interface validates, the tagged version is discoverable, the constraint resolves to the version you expect, and migrations are non-destructive.
Pitfalls and next steps
The recurring mistakes are predictable: floating versions (source without version, or ?ref=main); breaking changes shipped as a minor bump because nobody audited the interface; provider blocks smuggled into shared modules; and god-modules that grow a feature flag instead of a sibling. Each one is cheap to prevent and expensive to unwind.
From here, add contract tests with the native terraform test framework (.tftest.hcl files, Terraform 1.6 and later) so assertions about outputs run in CI. Also consider policy as code (Sentinel or OPA/Conftest) evaluated against every plan, so changes that violate org standards are rejected before they are applied. Modules that ship with tests, generated docs, semantic versions, and a private registry stop being a liability and start behaving like the platform primitives your teams actually deserve.
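A first contract test can be small. The sketch below plans the module against a mocked provider (Terraform 1.7 or later, no credentials). It asserts on an output and on a default that consumers depend on, assuming main.tf passes var.replication_type to account_replication_type. The file name and input values are illustrative.

```hcl
# modules/storage-account/tests/outputs.tftest.hcl (hypothetical file name)
mock_provider "azurerm" {}

variables {
  name                = "stkloudvindemo01"
  resource_group_name = "rg-demo"
  location            = "eastus2"
}

run "contract_holds_at_plan_time" {
  command = plan

  assert {
    condition     = output.name == var.name
    error_message = "The name output must echo the requested account name."
  }

  assert {
    # Changing this default would be a MAJOR version bump.
    condition     = azurerm_storage_account.this.account_replication_type == "ZRS"
    error_message = "Default replication must remain ZRS."
  }
}
```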