Terraform Module: Azure Capacity Reservation Group — Guaranteed VM Capacity on Demand

Quick take — Reusable hashicorp/azurerm ~> 4.0 module for Azure Capacity Reservation Groups: reserve VM SKU capacity per zone, wire reservations, and expose IDs for downstream VM scheduling. New here? Jump to the Quickstart below to deploy it in minutes; read on for how it works and when to reach for it.

Quickstart (copy-paste)

Minimal, runnable configuration — drop this in a .tf file and fill in the "..." placeholders (each required input is commented):

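provider "azurerm" {
  features {}
  # azurerm 4.x needs a subscription ID: set subscription_id here
  # or export ARM_SUBSCRIPTION_ID before running terraform.
}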

module "capacity_reservation" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-capacity-reservation?ref=v1.0.0"

  name                = "..."  # Name of the Capacity Reservation Group (3-80 chars, sta…
  resource_group_name = "..."  # Resource group the group is created in.
  location            = "..."  # Azure region; must support capacity reservations.
}

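Then run terraform init && terraform apply. Every other input has a default, but note that reservations defaults to {}, so this minimal configuration creates an empty group that reserves no capacity. Add entries to reservations (see Inputs below) to reserve instances.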

What this module is

An Azure Capacity Reservation Group is a logical container that holds one or more capacity reservations, each of which pre-allocates compute capacity for a specific VM SKU (for example Standard_D4s_v5) in a region — optionally pinned to a single availability zone. Once a reservation is provisioned, Azure guarantees that the reserved number of instances of that SKU is available to you the moment you deploy VMs or scale a VM Scale Set into the group, even when the underlying datacenter is otherwise capacity-constrained. To draw against it you simply reference the group’s ID from the VM’s or VMSS’s capacity_reservation_group_id.

The raw Terraform footprint is two resources — azurerm_capacity_reservation_group and azurerm_capacity_reservation — but production usage is fiddly: each reservation is a child of the group, the SKU capacity must be a non-zero integer, a reservation can only target a zone the group itself spans, and you almost always want a for_each over a map of SKU+zone combinations rather than one hard-coded reservation. Wrapping this in a module gives you a single var-driven map of reservations, input validation (capacity > 0, valid zone strings, consistent naming), uniform tagging across environments, and clean outputs so a downstream VM, VMSS, or AKS module can consume the group ID without knowing the internals.

When to use it

You run business-critical or bursty batch workloads that must scale out on a schedule (month-end finance jobs, ML training runs, trading-day capacity) and cannot tolerate AllocationFailed / ZonalAllocationFailed errors at deploy time.
You need zone-resilient capacity guarantees — reserving the same SKU in zones 1, 2 and 3 so a zonal VMSS or AKS node pool always has headroom to recover after a node loss.
You deploy into a constrained region or popular SKU family (GPU NC/ND series, large-memory M/E series) where on-demand allocation frequently fails.
You want capacity decoupled from the VM lifecycle — reserve now, attach VMs later, and keep the reservation warm between deployment windows.
You do not need this for cost savings alone — that is what Reserved Instances and Savings Plans are for. Capacity Reservation Groups are about availability; you pay for reserved-but-unused capacity at the on-demand rate.

Module structure

terraform-module-azure-capacity-reservation/
├── versions.tf
├── main.tf
├── variables.tf
└── outputs.tf

versions.tf

terraform {
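  # 1.9+ is needed because the reservations validation in variables.tf
  # references another variable (var.zones).
  required_version = ">= 1.9.0"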

  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 4.0"
    }
  }
}

main.tf

resource "azurerm_capacity_reservation_group" "this" {
  name                = var.name
  resource_group_name = var.resource_group_name
  location            = var.location

  # Zones the group spans. Each child reservation may target one of these.
  zones = var.zones

  tags = var.tags
}

resource "azurerm_capacity_reservation" "this" {
  for_each = var.reservations

  name                          = each.key
  capacity_reservation_group_id = azurerm_capacity_reservation_group.this.id

  sku {
    name     = each.value.sku_name
    capacity = each.value.capacity
  }

  # A reservation may be pinned to a single zone (must be one the group spans),
  # or left null to be non-zonal / regional within the group.
  zone = each.value.zone
}

variables.tf

variable "name" {
  type        = string
  description = "Name of the Capacity Reservation Group."

  validation {
    condition     = can(regex("^[a-zA-Z0-9][a-zA-Z0-9._-]{1,78}[a-zA-Z0-9_]$", var.name))
    error_message = "name must be 3-80 chars, start with an alphanumeric character, end with an alphanumeric character or underscore, and contain only letters, digits, '.', '_' or '-'."
  }
}

variable "resource_group_name" {
  type        = string
  description = "Name of the resource group the group is created in."
}

variable "location" {
  type        = string
  description = "Azure region (e.g. westeurope). Must support capacity reservations."
}

variable "zones" {
  type        = list(string)
  description = "Availability zones the group spans. Empty list => regional (non-zonal) group."
  default     = []

  validation {
    condition     = alltrue([for z in var.zones : contains(["1", "2", "3"], z)])
    error_message = "zones must only contain the strings \"1\", \"2\" or \"3\"."
  }
}

variable "reservations" {
  type = map(object({
    sku_name = string
    capacity = number
    zone     = optional(string, null)
  }))
  description = <<-EOT
    Map of capacity reservations keyed by reservation name. For each entry:
      sku_name : VM SKU to reserve, e.g. "Standard_D4s_v5".
      capacity : Number of instances to reserve (must be > 0).
      zone     : Optional single zone ("1"/"2"/"3"); must be one the group spans.
  EOT
  default     = {}

  validation {
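    # number also accepts fractions, so check for a whole number explicitly.
    condition = alltrue([
      for r in values(var.reservations) :
      r.capacity > 0 && floor(r.capacity) == r.capacity
    ])
    error_message = "Every reservation capacity must be a positive integer."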
  }

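  validation {
    # Conditional instead of `||`: older Terraform releases evaluate both
    # operands of `||`, so coalesce(null, "") would raise an error for
    # reservations that leave zone unset.
    condition = alltrue([
      for r in values(var.reservations) :
      r.zone == null ? true : contains(var.zones, r.zone)
    ])
    error_message = "Each reservation zone must be null or one of the group's zones."
  }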
}

variable "tags" {
  type        = map(string)
  description = "Tags applied to the Capacity Reservation Group."
  default     = {}
}

outputs.tf

output "id" {
  description = "Resource ID of the Capacity Reservation Group. Attach VMs/VMSS via this."
  value       = azurerm_capacity_reservation_group.this.id
}

output "name" {
  description = "Name of the Capacity Reservation Group."
  value       = azurerm_capacity_reservation_group.this.name
}

output "zones" {
  description = "Availability zones the group spans."
  value       = azurerm_capacity_reservation_group.this.zones
}

output "reservation_ids" {
  description = "Map of reservation name => capacity reservation resource ID."
  value       = { for k, r in azurerm_capacity_reservation.this : k => r.id }
}

output "reserved_capacity" {
  description = "Map of reservation name => reserved instance count for that SKU."
  value       = { for k, r in var.reservations : k => r.capacity }
}
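
The validation rules in variables.tf can be exercised without touching Azure. The following is a minimal sketch using Terraform's native test framework with a mocked provider, which assumes Terraform 1.7 or later for mock_provider and 1.9 or later for the cross-variable zone check. The file name tests/validation.tftest.hcl and all values in it are hypothetical, not part of the published module.

```hcl
# tests/validation.tftest.hcl (hypothetical file, run with: terraform test)

# Mock the provider so no Azure credentials or real resources are needed.
mock_provider "azurerm" {}

# Inputs shared by every run block below.
variables {
  name                = "crg-test"
  resource_group_name = "rg-test"
  location            = "westeurope"
  zones               = ["1", "2"]
}

run "rejects_zero_capacity" {
  command = plan

  variables {
    reservations = {
      "res-bad" = { sku_name = "Standard_D4s_v5", capacity = 0, zone = "1" }
    }
  }

  # The capacity validation on var.reservations is expected to fail.
  expect_failures = [var.reservations]
}

run "rejects_zone_outside_group" {
  command = plan

  variables {
    reservations = {
      "res-z3" = { sku_name = "Standard_D4s_v5", capacity = 1, zone = "3" }
    }
  }

  # Zone "3" is not in the group's zones, so the zone validation should fail.
  expect_failures = [var.reservations]
}

run "accepts_valid_reservation" {
  command = plan

  variables {
    reservations = {
      "res-z1" = { sku_name = "Standard_D4s_v5", capacity = 2, zone = "1" }
    }
  }

  assert {
    condition     = output.reserved_capacity["res-z1"] == 2
    error_message = "reserved_capacity should echo the requested capacity."
  }
}
```

Because every run uses command = plan against a mock, the suite only checks the module's own input handling; it says nothing about whether Azure would accept the SKU or quota.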

How to use it

module "capacity_reservation_group" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-capacity-reservation?ref=v1.0.0"

  name                = "crg-batch-weu-prod"
  resource_group_name = azurerm_resource_group.compute.name
  location            = "westeurope"
  zones               = ["1", "2", "3"]

  reservations = {
    "res-d4sv5-z1" = { sku_name = "Standard_D4s_v5", capacity = 20, zone = "1" }
    "res-d4sv5-z2" = { sku_name = "Standard_D4s_v5", capacity = 20, zone = "2" }
    "res-d4sv5-z3" = { sku_name = "Standard_D4s_v5", capacity = 20, zone = "3" }
  }

  tags = {
    environment = "prod"
    workload    = "month-end-batch"
    owner       = "platform-team"
  }
}

# Downstream: a zonal VMSS draws its capacity from the reservation group.
resource "azurerm_linux_virtual_machine_scale_set" "batch" {
  name                = "vmss-batch-weu-prod"
  resource_group_name = azurerm_resource_group.compute.name
  location            = "westeurope"
  sku                 = "Standard_D4s_v5"
  instances           = 60
  zones               = ["1", "2", "3"]
  admin_username      = "azureuser"

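  # Guarantees scale-out lands on reserved capacity instead of failing.
  capacity_reservation_group_id = module.capacity_reservation_group.id

  # The azurerm provider documents that single_placement_group must be false
  # when capacity_reservation_group_id is set.
  single_placement_group = false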

  # ... admin_ssh_key, network_interface, os_disk, source_image_reference ...
}
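
A standalone VM attaches to the group the same way as the scale set above. This is a sketch only: the two input variables, the VM name, and the Ubuntu image reference are illustrative assumptions, and the VM's size and zone are chosen to match the res-d4sv5-z1 reservation from the example.

```hcl
# Hypothetical inputs supplied by the calling configuration.
variable "network_interface_id" {
  type        = string
  description = "ID of an existing NIC for the VM (assumed to exist already)."
}

variable "admin_ssh_public_key" {
  type        = string
  description = "SSH public key for the admin user."
}

resource "azurerm_linux_virtual_machine" "batch_head" {
  name                = "vm-batch-head-weu-prod"
  resource_group_name = azurerm_resource_group.compute.name
  location            = "westeurope"

  # Size and zone must match a reservation in the group.
  size = "Standard_D4s_v5"
  zone = "1"

  admin_username        = "azureuser"
  network_interface_ids = [var.network_interface_id]

  # Draw this VM's capacity from the reservation group.
  capacity_reservation_group_id = module.capacity_reservation_group.id

  admin_ssh_key {
    username   = "azureuser"
    public_key = var.admin_ssh_public_key
  }

  os_disk {
    caching              = "ReadWrite"
    storage_account_type = "Premium_LRS"
  }

  # Illustrative image; substitute whatever your workload uses.
  source_image_reference {
    publisher = "Canonical"
    offer     = "0001-com-ubuntu-server-jammy"
    sku       = "22_04-lts-gen2"
    version   = "latest"
  }
}
```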

With Terragrunt

Terragrunt keeps this module DRY across environments — define the backend and provider once in a root config, then a thin terragrunt.hcl per environment supplies only the inputs that differ.

1. Root config — live/terragrunt.hcl (inherited by every module):

remote_state {
  backend  = "azurerm"
  generate = { path = "backend.tf", if_exists = "overwrite" }
  config = {
    # ...azurerm storage account/container + key per path...
  }
}

2. Module config — live/prod/capacity_reservation/terragrunt.hcl:

include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-capacity-reservation?ref=v1.0.0"
}

inputs = {
  name                = "..."
  resource_group_name = "..."
  location            = "..."
}

3. Deploy one environment, or roll out all modules together (run each command from the repository root):

cd live/prod/capacity_reservation && terragrunt apply   # this module only
cd live/prod && terragrunt run-all apply                 # every module under live/prod

Why Terragrunt here: the backend and provider live in one place instead of being copy-pasted into every module; inputs is overridden per environment (dev / stage / prod) without forking the module; and run-all orchestrates dependencies across modules. Reach for it once you have more than one environment or more than a handful of modules — for a single stack, the plain Quickstart above is enough.
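
The root config shown earlier only generates the backend. One way to also keep the provider in a single place is a Terragrunt generate block in the same root file. This is a minimal sketch: the generated file name provider.tf is an assumption, and the provider body simply mirrors the Quickstart.

```hcl
# live/terragrunt.hcl (added alongside the remote_state block)

# Writes provider.tf into every module's working directory before Terraform runs.
generate "provider" {
  path      = "provider.tf"
  if_exists = "overwrite_terragrunt"
  contents  = <<EOF
provider "azurerm" {
  features {}
  # subscription_id is read from ARM_SUBSCRIPTION_ID when not set here.
}
EOF
}
```

With if_exists = "overwrite_terragrunt", Terragrunt replaces only files it generated itself, so a hand-written provider.tf in a module would raise an error rather than be silently overwritten.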

Inputs

Name	Type	Default	Required	Description
name	`string`	—	Yes	Name of the Capacity Reservation Group (3-80 chars, starts alphanumeric).
resource_group_name	`string`	—	Yes	Resource group the group is created in.
location	`string`	—	Yes	Azure region; must support capacity reservations.
zones	`list(string)`	`[]`	No	Availability zones the group spans (`"1"`/`"2"`/`"3"`); empty means regional.
reservations	`map(object({ sku_name, capacity, zone }))`	`{}`	No	Reservations keyed by name; `capacity` must be > 0, `zone` must be one the group spans or null.
tags	`map(string)`	`{}`	No	Tags applied to the Capacity Reservation Group.

Outputs

Name	Description
id	Resource ID of the Capacity Reservation Group; attach VMs/VMSS via `capacity_reservation_group_id`.
name	Name of the Capacity Reservation Group.
zones	Availability zones the group spans.
reservation_ids	Map of reservation name to capacity reservation resource ID.
reserved_capacity	Map of reservation name to reserved instance count for that SKU.

Enterprise scenario

A European insurance firm runs its actuarial reserving engine as a 60-node Standard_D4s_v5 VMSS that only spins up for three days each month-end. In previous cycles the West Europe region was capacity-constrained and the scale-out occasionally failed with ZonalAllocationFailed, delaying regulatory reporting. The platform team now provisions this module with 20 reserved instances in each of zones 1, 2 and 3 (60 total), so the month-end VMSS allocates against reserved capacity in all three zones instead of competing for on-demand capacity. Between cycles they scale the VMSS to zero while keeping the reservation warm, accepting the on-demand cost as cheap insurance against a missed reporting SLA.

Best practices

Match SKU, region and zones exactly between the reservation and the consuming VM/VMSS: a Standard_D4s_v5 reservation in zone 1 does nothing for a Standard_D8s_v5 request or a zone-2 instance. A mismatched request gets no benefit from the reservation, and Azure may reject it outright if the group holds no matching reservation.
Right-size capacity and review it monthly — you pay the on-demand rate for reserved-but-unused instances, so track utilization and trim capacity once the peak window passes rather than over-provisioning “just in case”.
Reserve symmetrically across zones for resilient workloads (equal capacity in zones 1/2/3) so a zonal outage or node loss always has headroom to recover into the surviving zones.
Use a clear, queryable naming and tagging convention (e.g. crg-<workload>-<region>-<env> and reservation keys like res-<sku>-z<zone>), and tag every group with owner, workload and environment so idle reservations can be found and cost-attributed in Cost Management.
Combine with a Reserved Instance or Savings Plan covering the same SKU/region to recover the spend — Capacity Reservation Groups guarantee availability, while the RI/Savings Plan reduces the price of the same committed capacity.
Scope RBAC tightly — capacity reservations can incur significant ongoing cost, so grant create/delete on these resources only to the platform team and use a separate group per workload to keep blast radius and billing boundaries clean.
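
The symmetric-zone and naming practices above can be enforced by generating the reservations map instead of writing each entry by hand. The sketch below reuses the values from the usage example; the locals and the sku_short code are illustrative names, not part of the module.

```hcl
locals {
  zones             = ["1", "2", "3"]
  sku_name          = "Standard_D4s_v5"
  sku_short         = "d4sv5" # hypothetical short code used only in reservation names
  capacity_per_zone = 20

  # One reservation per zone, keyed as res-<sku>-z<zone>, all with equal capacity.
  reservations = {
    for z in local.zones :
    "res-${local.sku_short}-z${z}" => {
      sku_name = local.sku_name
      capacity = local.capacity_per_zone
      zone     = z
    }
  }
}

module "capacity_reservation_group" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-capacity-reservation?ref=v1.0.0"

  name                = "crg-batch-weu-prod"
  resource_group_name = azurerm_resource_group.compute.name
  location            = "westeurope"
  zones               = local.zones
  reservations        = local.reservations

  tags = {
    environment = "prod"
    workload    = "month-end-batch"
    owner       = "platform-team"
  }
}

# Total reserved instances across all zones (3 zones x 20 = 60 here).
output "total_reserved_instances" {
  value = sum(values(module.capacity_reservation_group.reserved_capacity))
}
```

Changing capacity_per_zone or the zones list then updates every reservation consistently, which makes the monthly right-sizing review a one-line change.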