Quick take — Reusable hashicorp/azurerm ~> 4.0 module for Azure Capacity Reservation Groups: reserve VM SKU capacity per zone, wire reservations, and expose IDs for downstream VM scheduling. New here? Jump to the Quickstart below to deploy it in minutes; read on for how it works and when to reach for it.
Quickstart (copy-paste)
Minimal, runnable configuration — drop this in a .tf file and fill in the "..." placeholders (each required input is commented):
provider "azurerm" {
features {}
}
module "capacity_reservation" {
source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-capacity-reservation?ref=v1.0.0"
name = "..." # Name of the Capacity Reservation Group (3-80 chars, sta…
resource_group_name = "..." # Resource group the group is created in.
location = "..." # Azure region; must support capacity reservations.
}
Then terraform init && terraform apply. Every other input has a sensible default — see Inputs below to override behaviour.
What this module is
An Azure Capacity Reservation Group is a logical container that holds one or more capacity reservations, each of which pre-allocates compute capacity for a specific VM SKU (for example Standard_D4s_v5) in a region — optionally pinned to a single availability zone. Once a reservation is provisioned, Azure guarantees that the reserved number of instances of that SKU is available to you the moment you deploy VMs or scale a VM Scale Set into the group, even when the underlying datacenter is otherwise capacity-constrained. To draw against it you simply reference the group’s ID from the VM’s or VMSS’s capacity_reservation_group_id.
The raw Terraform footprint is two resources — azurerm_capacity_reservation_group and azurerm_capacity_reservation — but production usage is fiddly: each reservation is a child of the group, the SKU capacity must be a non-zero integer, a reservation can only target a zone the group itself spans, and you almost always want a for_each over a map of SKU+zone combinations rather than one hard-coded reservation. Wrapping this in a module gives you a single var-driven map of reservations, input validation (capacity > 0, valid zone strings, consistent naming), uniform tagging across environments, and clean outputs so a downstream VM, VMSS, or AKS module can consume the group ID without knowing the internals.
When to use it
- You run business-critical or bursty batch workloads that must scale out on a schedule (month-end finance jobs, ML training runs, trading-day capacity) and cannot tolerate AllocationFailed/ZonalAllocationFailed errors at deploy time.
- You need zone-resilient capacity guarantees: reserving the same SKU in zones 1, 2 and 3 so a zonal VMSS or AKS node pool always has headroom to recover after a node loss.
- You deploy into a constrained region or popular SKU family (GPU NC/ND series, large-memory M/E series) where on-demand allocation frequently fails.
- You want capacity decoupled from the VM lifecycle: reserve now, attach VMs later, and keep the reservation warm between deployment windows.
- You do not need this for cost savings alone; that is what Reserved Instances and Savings Plans are for. Capacity Reservation Groups are about availability; you pay for reserved-but-unused capacity at the on-demand rate.
Module structure
terraform-module-azure-capacity-reservation/
├── versions.tf
├── main.tf
├── variables.tf
└── outputs.tf
versions.tf
terraform {
required_version = ">= 1.9.0" # the reservations validation references var.zones; cross-variable validation needs Terraform 1.9+
required_providers {
azurerm = {
source = "hashicorp/azurerm"
version = "~> 4.0"
}
}
}
main.tf
resource "azurerm_capacity_reservation_group" "this" {
name = var.name
resource_group_name = var.resource_group_name
location = var.location
# Zones the group spans. Each child reservation may target one of these.
zones = var.zones
tags = var.tags
}
resource "azurerm_capacity_reservation" "this" {
for_each = var.reservations
name = each.key
capacity_reservation_group_id = azurerm_capacity_reservation_group.this.id
sku {
name = each.value.sku_name
capacity = each.value.capacity
}
# A reservation may be pinned to a single zone (must be one the group spans),
# or left null to be non-zonal / regional within the group.
zone = each.value.zone
}
variables.tf
variable "name" {
type = string
description = "Name of the Capacity Reservation Group."
validation {
condition = can(regex("^[a-zA-Z0-9][a-zA-Z0-9._-]{1,78}[a-zA-Z0-9_]$", var.name))
error_message = "name must be 3-80 chars, start with an alphanumeric character, contain only alphanumerics, '.', '_' or '-', and end with an alphanumeric character or underscore."
}
}
variable "resource_group_name" {
type = string
description = "Name of the resource group the group is created in."
}
variable "location" {
type = string
description = "Azure region (e.g. westeurope). Must support capacity reservations."
}
variable "zones" {
type = list(string)
description = "Availability zones the group spans. Empty list => regional (non-zonal) group."
default = []
validation {
condition = alltrue([for z in var.zones : contains(["1", "2", "3"], z)])
error_message = "zones must only contain the strings \"1\", \"2\" or \"3\"."
}
}
variable "reservations" {
type = map(object({
sku_name = string
capacity = number
zone = optional(string, null)
}))
description = <<-EOT
Map of capacity reservations keyed by reservation name. For each entry:
sku_name : VM SKU to reserve, e.g. "Standard_D4s_v5".
capacity : Number of instances to reserve (must be > 0).
zone : Optional single zone ("1"/"2"/"3"); must be one the group spans.
EOT
default = {}
validation {
condition = alltrue([for r in values(var.reservations) : r.capacity > 0 && floor(r.capacity) == r.capacity])
error_message = "Every reservation capacity must be a positive integer."
}
validation {
condition = alltrue([
for r in values(var.reservations) :
# Conditional instead of coalesce(r.zone, ""): coalesce errors when every argument is null or empty.
r.zone == null ? true : contains(var.zones, r.zone)
])
error_message = "Each reservation zone must be null or one of the group's zones."
}
}
variable "tags" {
type = map(string)
description = "Tags applied to the Capacity Reservation Group."
default = {}
}
outputs.tf
output "id" {
description = "Resource ID of the Capacity Reservation Group. Attach VMs/VMSS via this."
value = azurerm_capacity_reservation_group.this.id
}
output "name" {
description = "Name of the Capacity Reservation Group."
value = azurerm_capacity_reservation_group.this.name
}
output "zones" {
description = "Availability zones the group spans."
value = azurerm_capacity_reservation_group.this.zones
}
output "reservation_ids" {
description = "Map of reservation name => capacity reservation resource ID."
value = { for k, r in azurerm_capacity_reservation.this : k => r.id }
}
output "reserved_capacity" {
description = "Map of reservation name => reserved instance count for that SKU."
value = { for k, r in var.reservations : k => r.capacity }
}
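The validation rules in variables.tf can be exercised without touching Azure. The following is a minimal sketch, not part of the original module: it assumes Terraform 1.9 or later (the zone rule references var.zones, and mock_provider needs 1.7+), and the file name tests/validation.tftest.hcl, the run names and the input values are all hypothetical.

```hcl
# tests/validation.tftest.hcl (hypothetical file inside the module directory)

# Mock the provider so no Azure credentials or real resources are needed.
mock_provider "azurerm" {}

# Inputs shared by every run block below; all values are placeholders.
variables {
  name                = "crg-validation-test"
  resource_group_name = "rg-validation-test"
  location            = "westeurope"
  zones               = ["1", "2"]
}

run "rejects_zero_capacity" {
  command = plan

  variables {
    reservations = {
      "res-bad-capacity" = { sku_name = "Standard_D4s_v5", capacity = 0, zone = "1" }
    }
  }

  # The capacity > 0 rule on var.reservations is expected to fail.
  expect_failures = [var.reservations]
}

run "rejects_zone_outside_group" {
  command = plan

  variables {
    reservations = {
      "res-bad-zone" = { sku_name = "Standard_D4s_v5", capacity = 1, zone = "3" }
    }
  }

  # Zone "3" is not in var.zones, so the zone rule is expected to fail.
  expect_failures = [var.reservations]
}

run "accepts_valid_reservation" {
  command = plan

  variables {
    reservations = {
      "res-ok" = { sku_name = "Standard_D4s_v5", capacity = 2, zone = "1" }
    }
  }

  assert {
    condition     = length(azurerm_capacity_reservation.this) == 1
    error_message = "Expected exactly one reservation to be planned."
  }
}
```

Running terraform init followed by terraform test from the module directory should execute the three run blocks; the mocked provider keeps the checks focused on input validation rather than on Azure behaviour.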
How to use it
module "capacity_reservation_group" {
source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-capacity-reservation?ref=v1.0.0"
name = "crg-batch-weu-prod"
resource_group_name = azurerm_resource_group.compute.name
location = "westeurope"
zones = ["1", "2", "3"]
reservations = {
"res-d4sv5-z1" = { sku_name = "Standard_D4s_v5", capacity = 20, zone = "1" }
"res-d4sv5-z2" = { sku_name = "Standard_D4s_v5", capacity = 20, zone = "2" }
"res-d4sv5-z3" = { sku_name = "Standard_D4s_v5", capacity = 20, zone = "3" }
}
tags = {
environment = "prod"
workload = "month-end-batch"
owner = "platform-team"
}
}
# Downstream: a zonal VMSS draws its capacity from the reservation group.
resource "azurerm_linux_virtual_machine_scale_set" "batch" {
name = "vmss-batch-weu-prod"
resource_group_name = azurerm_resource_group.compute.name
location = "westeurope"
sku = "Standard_D4s_v5"
instances = 60
zones = ["1", "2", "3"]
admin_username = "azureuser"
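# Guarantees scale-out lands on reserved capacity instead of failing.
# Verify against your provider version: the azurerm docs note that
# single_placement_group must be false when a capacity reservation group is set.
single_placement_group = false
capacity_reservation_group_id = module.capacity_reservation_group.id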
# ... admin_ssh_key, network_interface, os_disk, source_image_reference ...
}
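A single VM can draw on the same group through its own capacity_reservation_group_id argument. The sketch below is illustrative only: the VM name, the network interface azurerm_network_interface.worker, the SSH key path and the image reference are assumptions that do not appear in the original example.

```hcl
# Hypothetical single VM drawing on the zone-1 reservation (res-d4sv5-z1).
resource "azurerm_linux_virtual_machine" "worker" {
  name                = "vm-batch-worker-01" # illustrative name
  resource_group_name = azurerm_resource_group.compute.name
  location            = "westeurope"

  # Size and zone are chosen to match a reservation in the group.
  size = "Standard_D4s_v5"
  zone = "1"

  capacity_reservation_group_id = module.capacity_reservation_group.id

  admin_username        = "azureuser"
  network_interface_ids = [azurerm_network_interface.worker.id] # assumed to be defined elsewhere

  admin_ssh_key {
    username   = "azureuser"
    public_key = file("~/.ssh/id_rsa.pub") # assumed key location
  }

  os_disk {
    caching              = "ReadWrite"
    storage_account_type = "Premium_LRS"
  }

  source_image_reference {
    publisher = "Canonical"
    offer     = "0001-com-ubuntu-server-jammy"
    sku       = "22_04-lts-gen2"
    version   = "latest"
  }
}

# Re-export the per-reservation IDs so other stacks can see what is reserved.
output "batch_reservation_ids" {
  value = module.capacity_reservation_group.reservation_ids
}
```

Only the group ID crosses the module boundary here; the VM never references an individual reservation, which is why the module exposes id as its primary output.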
With Terragrunt
Terragrunt keeps this module DRY across environments — define the backend and provider once in a root config, then a thin terragrunt.hcl per environment supplies only the inputs that differ.
1. Root config — live/terragrunt.hcl (inherited by every module):
remote_state {
backend = "azurerm"
generate = { path = "backend.tf", if_exists = "overwrite" }
config = {
# ...azurerm state bucket/container + key per path...
}
}
2. Module config — live/prod/capacity_reservation/terragrunt.hcl:
include "root" {
path = find_in_parent_folders()
}
terraform {
source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-capacity-reservation?ref=v1.0.0"
}
inputs = {
name = "..."
resource_group_name = "..."
location = "..."
}
3. Deploy one environment, or roll out all modules together:
(cd live/prod/capacity_reservation && terragrunt apply) # this module only
(cd live/prod && terragrunt run-all apply) # every module under live/prod
Why Terragrunt here: the backend and provider live in one place instead of being copy-pasted into every module; inputs is overridden per environment (dev / stage / prod) without forking the module; and run-all orchestrates dependencies across modules. Reach for it once you have more than one environment or more than a handful of modules — for a single stack, the plain Quickstart above is enough.
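To make the per-environment override concrete, here is a sketch of what a dev counterpart to the prod config might look like. The path live/dev/capacity_reservation/terragrunt.hcl, the resource names and the reduced capacity are assumptions chosen for illustration.

```hcl
# live/dev/capacity_reservation/terragrunt.hcl (hypothetical dev environment)
include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-capacity-reservation?ref=v1.0.0"
}

inputs = {
  name                = "crg-batch-weu-dev"  # illustrative name
  resource_group_name = "rg-compute-weu-dev" # illustrative resource group
  location            = "westeurope"
  zones               = ["1"]

  # Dev keeps a small single-zone reservation to limit on-demand spend.
  reservations = {
    "res-d4sv5-z1" = { sku_name = "Standard_D4s_v5", capacity = 2, zone = "1" }
  }

  tags = {
    environment = "dev"
    workload    = "month-end-batch"
    owner       = "platform-team"
  }
}
```

The module source and the root include are identical to prod; only the inputs block differs, which is the point of keeping the environment configs thin.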
Inputs
| Name | Type | Default | Required | Description |
|---|---|---|---|---|
| name | string | (none) | Yes | Name of the Capacity Reservation Group (3-80 chars, starts alphanumeric). |
| resource_group_name | string | (none) | Yes | Resource group the group is created in. |
| location | string | (none) | Yes | Azure region; must support capacity reservations. |
| zones | list(string) | [] | No | Availability zones the group spans ("1"/"2"/"3"); empty means regional. |
| reservations | map(object({ sku_name, capacity, zone })) | {} | No | Reservations keyed by name; capacity must be > 0, zone must be one the group spans or null. |
| tags | map(string) | {} | No | Tags applied to the Capacity Reservation Group. |
Outputs
| Name | Description |
|---|---|
| id | Resource ID of the Capacity Reservation Group; attach VMs/VMSS via capacity_reservation_group_id. |
| name | Name of the Capacity Reservation Group. |
| zones | Availability zones the group spans. |
| reservation_ids | Map of reservation name to capacity reservation resource ID. |
| reserved_capacity | Map of reservation name to reserved instance count for that SKU. |
Enterprise scenario
A European insurance firm runs its actuarial reserving engine as a 60-node Standard_D4s_v5 VMSS that only spins up for three days each month-end. In previous cycles the West Europe region was capacity-constrained and the scale-out occasionally failed with ZonalAllocationFailed, delaying regulatory reporting. The platform team now provisions this module with 20 reserved instances in each of zones 1, 2 and 3 (60 total), so the month-end VMSS is guaranteed to land instantly across all three zones; between cycles they scale the VMSS to zero while keeping the reservation warm, accepting the on-demand cost as cheap insurance against a missed reporting SLA.
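The scale-to-zero pattern in this scenario could be expressed roughly as follows. This is a sketch under stated assumptions, not the firm's actual configuration: the variable month_end_active and the local names are hypothetical, and the arithmetic simply mirrors the 20 instances in each of 3 zones described above.

```hcl
variable "month_end_active" {
  type        = bool
  description = "Hypothetical toggle: set to true during the month-end window."
  default     = false
}

locals {
  reserved_per_zone = 20
  zone_count        = 3

  # 60 instances during the window, zero otherwise.
  # The reservations themselves are not affected by this value.
  vmss_instances = var.month_end_active ? local.reserved_per_zone * local.zone_count : 0
}

output "vmss_instances" {
  description = "Instance count to pass to the scale set's instances argument."
  value       = local.vmss_instances
}
```

Setting instances = local.vmss_instances on the scale set would let a pipeline flip one variable at month-end while the module's reservations stay in place between cycles.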
Best practices
- Match SKU, region and zones exactly between the reservation and the consuming VM/VMSS: a Standard_D4s_v5 reservation in zone 1 does nothing for a Standard_D8s_v5 request or a zone-2 instance. Mismatches silently fall back to on-demand allocation.
- Right-size capacity and review it monthly: you pay the on-demand rate for reserved-but-unused instances, so track utilization and trim capacity once the peak window passes rather than over-provisioning “just in case”.
- Reserve symmetrically across zones for resilient workloads (equal capacity in zones 1/2/3) so a zonal outage or node loss always has headroom to recover into the surviving zones.
- Use a clear, queryable naming and tagging convention (e.g. crg-<workload>-<region>-<env> and reservation keys like res-<sku>-z<zone>), and tag every group with owner, workload and environment so idle reservations can be found and cost-attributed in Cost Management.
- Combine with a Reserved Instance or Savings Plan covering the same SKU/region to recover the spend: Capacity Reservation Groups guarantee availability, while the RI/Savings Plan reduces the price of the same committed capacity.
- Scope RBAC tightly: capacity reservations can incur significant ongoing cost, so grant create/delete on these resources only to the platform team and use a separate group per workload to keep blast radius and billing boundaries clean.
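The symmetric-reservation and naming practices above can be combined by generating the reservations map rather than writing each entry by hand. The following is one possible sketch; the local names (sku_slug, capacity_per_zone) and the resource group name are illustrative assumptions.

```hcl
locals {
  # Assumed workload parameters; adjust to your own SKU and headroom.
  sku_name          = "Standard_D4s_v5"
  sku_slug          = "d4sv5"
  zones             = ["1", "2", "3"]
  capacity_per_zone = 20

  # One reservation per zone, keyed res-<sku>-z<zone>, with equal capacity.
  reservations = {
    for z in local.zones :
    "res-${local.sku_slug}-z${z}" => {
      sku_name = local.sku_name
      capacity = local.capacity_per_zone
      zone     = z
    }
  }
}

module "capacity_reservation_group" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-capacity-reservation?ref=v1.0.0"

  name                = "crg-batch-weu-prod"
  resource_group_name = "rg-compute-weu-prod" # illustrative resource group
  location            = "westeurope"
  zones               = local.zones
  reservations        = local.reservations

  tags = {
    environment = "prod"
    workload    = "month-end-batch"
    owner       = "platform-team"
  }
}
```

Because the group's zones and the reservation zones come from the same local, a reservation cannot drift outside the zones the group spans, and changing capacity_per_zone keeps all zones equal.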