
Terraform Module: Azure Recovery Services Vault — immutable, soft-delete-protected backup at scale

Quick take — A reusable hashicorp/azurerm ~> 4.0 Terraform module for Azure Recovery Services Vault: CMK encryption, immutability, soft delete, cross-region restore, and a VM backup policy wired in. New here? Jump to the Quickstart below to deploy it in minutes; read on for how it works and when to reach for it.

Quickstart (copy-paste)

Minimal, runnable configuration — drop this in a .tf file and fill in the "..." placeholders (each required input is commented):

provider "azurerm" {
  features {}
}

module "recovery_services_vault" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-recovery-services-vault?ref=v1.0.0"

  name                = "..."  # Vault name; 2-50 chars, starts with a letter, alphanume…
  resource_group_name = "..."  # Resource group to deploy into.
  location            = "..."  # Azure region for the vault.
}

Then terraform init && terraform apply. Every other input has a sensible default — see Inputs below to override behaviour.

What this module is

An Azure Recovery Services Vault (RSV) is the management container for Azure Backup and Azure Site Recovery. It holds backup data and recovery points for Azure VMs, Azure Files, SQL/SAP HANA in Azure VMs, and on-prem workloads (via MARS/MABS), and it stores replication state for ASR. Because the vault is where your last line of defence lives, the security and durability knobs on it matter more than on almost any other resource: redundancy (GeoRedundant + cross-region restore), soft delete, immutability, public-network access, and customer-managed key (CMK) encryption are the difference between “we restored in an hour” and “ransomware deleted our backups too.”

Wrapping the vault in a module is worth it because the defaults are dangerous and the safe configuration is verbose. The raw azurerm_recovery_services_vault resource defaults to soft_delete_enabled = true but leaves immutability Disabled, leaves the network open, and won’t enable cross-region restore unless you explicitly opt in. CRR can only be turned on while the vault is GeoRedundant, and it cannot be turned off again. A module lets a platform team encode “vaults are immutable, geo-redundant, CMK-encrypted, private-only, and ship with a baseline VM backup policy” once, then hand consumers three variables. It also packages the common companions, an azurerm_backup_policy_vm and the identity/CMK plumbing, so application teams don’t reinvent retention math or forget to give the vault a managed identity before attaching a key.

When to use it

Skip the module (or strip it down) for a throwaway lab vault where LocallyRedundant and no immutability is fine and the extra identity/CMK surface is just noise.

Module structure

terraform-module-azure-recovery-services-vault/
├── versions.tf      # provider pin
├── main.tf          # vault + immutability + CRR + baseline VM backup policy
├── variables.tf     # var-driven inputs with validation
└── outputs.tf       # id/name + identity + policy ids

versions.tf

terraform {
  required_version = ">= 1.5.0"

  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 4.0"
    }
  }
}

main.tf

locals {
  # Cross-region restore is only valid with GeoRedundant storage.
  cross_region_restore = var.storage_mode_type == "GeoRedundant" ? var.cross_region_restore_enabled : false

  # Immutability is passed through unchanged. Locked is irreversible, so the
  # variable defaults to Unlocked and consumers must set Locked explicitly.
  immutability = var.immutability
}

resource "azurerm_recovery_services_vault" "this" {
  name                = var.name
  resource_group_name = var.resource_group_name
  location            = var.location
  sku                 = var.sku

  storage_mode_type            = var.storage_mode_type
  cross_region_restore_enabled = local.cross_region_restore

  soft_delete_enabled = var.soft_delete_enabled
  immutability        = local.immutability

  public_network_access_enabled = var.public_network_access_enabled

  dynamic "identity" {
    for_each = var.identity_type == null ? [] : [1]
    content {
      type         = var.identity_type
      identity_ids = var.identity_type == "UserAssigned" || var.identity_type == "SystemAssigned, UserAssigned" ? var.identity_ids : null
    }
  }

  dynamic "encryption" {
    for_each = var.cmk_key_id == null ? [] : [1]
    content {
      key_id                            = var.cmk_key_id
      infrastructure_encryption_enabled = var.infrastructure_encryption_enabled
      user_assigned_identity_id         = var.cmk_user_assigned_identity_id
      use_system_assigned_identity      = var.cmk_user_assigned_identity_id == null
    }
  }

  tags = var.tags
}

# Baseline daily VM backup policy shipped with the vault so consumers can
# protect VMs immediately without hand-writing retention math.
resource "azurerm_backup_policy_vm" "daily" {
  count = var.create_vm_backup_policy ? 1 : 0

  name                = var.vm_backup_policy_name
  resource_group_name = var.resource_group_name
  recovery_vault_name = azurerm_recovery_services_vault.this.name

  policy_type                    = "V2"
  timezone                       = var.vm_backup_timezone
  instant_restore_retention_days = var.instant_restore_retention_days

  backup {
    frequency = "Daily"
    time      = var.vm_backup_time
  }

  retention_daily {
    count = var.retention_daily_count
  }

  retention_weekly {
    count    = var.retention_weekly_count
    weekdays = ["Sunday"]
  }

  retention_monthly {
    count    = var.retention_monthly_count
    weekdays = ["Sunday"]
    weeks    = ["First"]
  }

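  # Yearly retention is optional: retention_yearly_count = 0 omits the block.
  dynamic "retention_yearly" {
    for_each = var.retention_yearly_count > 0 ? [1] : []
    content {
      count    = var.retention_yearly_count
      weekdays = ["Sunday"]
      weeks    = ["First"]
      months   = ["January"]
    }
  }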
}
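The locals comment in main.tf talks about guarding Locked, but the listing passes the value straight through. One way such a guard could look is sketched below, assuming a hypothetical opt-in variable confirm_immutability_lock that is not part of the module as shown. The same preconditions could instead sit in a lifecycle block on the vault resource itself.

```hcl
# Hypothetical opt-in flag; not part of the module as published.
variable "confirm_immutability_lock" {
  description = "Must be true before immutability = \"Locked\" is accepted."
  type        = bool
  default     = false
}

# terraform_data creates nothing in Azure; it only hosts the preconditions,
# which fail the plan before any vault change is attempted.
resource "terraform_data" "input_guards" {
  lifecycle {
    precondition {
      condition     = var.immutability != "Locked" || var.confirm_immutability_lock
      error_message = "immutability = \"Locked\" is irreversible; set confirm_immutability_lock = true to proceed."
    }

    precondition {
      condition     = var.cmk_key_id == null || var.identity_type != null
      error_message = "cmk_key_id requires a managed identity; set identity_type."
    }
  }
}
```

Preconditions are used here rather than variable validation because the checks span more than one variable, and the module's required_version (>= 1.5.0) allows Terraform releases in which a validation block may only refer to its own variable.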

variables.tf

variable "name" {
  description = "Name of the Recovery Services Vault. 2-50 chars, must start with a letter and contain only alphanumerics and hyphens."
  type        = string

  validation {
    condition     = can(regex("^[A-Za-z][A-Za-z0-9-]{1,49}$", var.name))
    error_message = "Vault name must be 2-50 chars, start with a letter, and use only letters, numbers, and hyphens."
  }
}

variable "resource_group_name" {
  description = "Name of the resource group to deploy the vault into."
  type        = string
}

variable "location" {
  description = "Azure region for the vault (e.g. centralindia)."
  type        = string
}

variable "sku" {
  description = "Vault SKU. Standard is the normal choice; RS0 is a legacy value."
  type        = string
  default     = "Standard"

  validation {
    condition     = contains(["Standard", "RS0"], var.sku)
    error_message = "sku must be either Standard or RS0."
  }
}

variable "storage_mode_type" {
  description = "Backup storage redundancy: LocallyRedundant, ZoneRedundant, or GeoRedundant. GeoRedundant is required for cross-region restore."
  type        = string
  default     = "GeoRedundant"

  validation {
    condition     = contains(["LocallyRedundant", "ZoneRedundant", "GeoRedundant"], var.storage_mode_type)
    error_message = "storage_mode_type must be LocallyRedundant, ZoneRedundant, or GeoRedundant."
  }
}

variable "cross_region_restore_enabled" {
  description = "Enable cross-region restore. Only effective with GeoRedundant storage and CANNOT be disabled once enabled."
  type        = bool
  default     = true
}

variable "soft_delete_enabled" {
  description = "Keep deleted backup items in a soft-deleted state for 14 days. Strongly recommended to leave true."
  type        = bool
  default     = true
}

variable "immutability" {
  description = "Vault immutability: Disabled, Unlocked, or Locked. Locked is irreversible — recovery points cannot be deleted before expiry."
  type        = string
  default     = "Unlocked"

  validation {
    condition     = contains(["Disabled", "Unlocked", "Locked"], var.immutability)
    error_message = "immutability must be Disabled, Unlocked, or Locked."
  }
}

variable "public_network_access_enabled" {
  description = "Allow access over the public internet. Set false and use a private endpoint for production."
  type        = bool
  default     = false
}

variable "identity_type" {
  description = "Managed identity type for the vault: SystemAssigned, UserAssigned, or 'SystemAssigned, UserAssigned'. Null disables identity. Required when using CMK."
  type        = string
  default     = "SystemAssigned"

  validation {
    condition = var.identity_type == null || contains(
      ["SystemAssigned", "UserAssigned", "SystemAssigned, UserAssigned"],
      var.identity_type
    )
    error_message = "identity_type must be SystemAssigned, UserAssigned, 'SystemAssigned, UserAssigned', or null."
  }
}

variable "identity_ids" {
  description = "List of user-assigned managed identity resource IDs. Required when identity_type includes UserAssigned."
  type        = list(string)
  default     = []
}

variable "cmk_key_id" {
  description = "Key Vault key ID for customer-managed key encryption. Null uses Microsoft-managed keys."
  type        = string
  default     = null
}

variable "cmk_user_assigned_identity_id" {
  description = "User-assigned identity ID used to access the CMK. Null falls back to the vault's system-assigned identity."
  type        = string
  default     = null
}

variable "infrastructure_encryption_enabled" {
  description = "Enable double (infrastructure) encryption on top of CMK. Only honoured when cmk_key_id is set; cannot be changed after creation."
  type        = bool
  default     = false
}

variable "create_vm_backup_policy" {
  description = "Create the baseline daily Azure VM backup policy in this vault."
  type        = bool
  default     = true
}

variable "vm_backup_policy_name" {
  description = "Name of the baseline VM backup policy."
  type        = string
  default     = "policy-vm-daily"
}

variable "vm_backup_timezone" {
  description = "Timezone for the VM backup schedule (e.g. India Standard Time, UTC)."
  type        = string
  default     = "UTC"
}

variable "vm_backup_time" {
  description = "Daily backup start time in 24h HH:MM, must be on the hour or half-hour for Azure Backup."
  type        = string
  default     = "23:00"

  validation {
    condition     = can(regex("^([01][0-9]|2[0-3]):(00|30)$", var.vm_backup_time))
    error_message = "vm_backup_time must be HH:MM on the hour or half-hour (e.g. 23:00 or 23:30)."
  }
}

variable "instant_restore_retention_days" {
  description = "Days to retain instant-restore snapshots (1-30 for V2 policies)."
  type        = number
  default     = 5

  validation {
    condition     = var.instant_restore_retention_days >= 1 && var.instant_restore_retention_days <= 30
    error_message = "instant_restore_retention_days must be between 1 and 30."
  }
}

variable "retention_daily_count" {
  description = "Number of daily recovery points to retain (7-9999)."
  type        = number
  default     = 30

  validation {
    condition     = var.retention_daily_count >= 7 && var.retention_daily_count <= 9999
    error_message = "retention_daily_count must be between 7 and 9999."
  }
}

variable "retention_weekly_count" {
  description = "Number of weekly recovery points to retain."
  type        = number
  default     = 12
}

variable "retention_monthly_count" {
  description = "Number of monthly recovery points to retain."
  type        = number
  default     = 12
}

variable "retention_yearly_count" {
  description = "Number of yearly recovery points to retain (set 0 to skip long-term compliance retention)."
  type        = number
  default     = 7
}

variable "tags" {
  description = "Tags applied to the vault."
  type        = map(string)
  default     = {}
}

outputs.tf

output "id" {
  description = "Resource ID of the Recovery Services Vault."
  value       = azurerm_recovery_services_vault.this.id
}

output "name" {
  description = "Name of the Recovery Services Vault."
  value       = azurerm_recovery_services_vault.this.name
}

output "location" {
  description = "Region of the vault."
  value       = azurerm_recovery_services_vault.this.location
}

output "principal_id" {
  description = "Principal ID of the vault's system-assigned managed identity (null if not enabled). Use to grant Key Vault and storage access."
  value       = try(azurerm_recovery_services_vault.this.identity[0].principal_id, null)
}

output "tenant_id" {
  description = "Tenant ID of the vault's managed identity (null if not enabled)."
  value       = try(azurerm_recovery_services_vault.this.identity[0].tenant_id, null)
}

output "vm_backup_policy_id" {
  description = "Resource ID of the baseline VM backup policy (null when not created). Attach to azurerm_backup_protected_vm."
  value       = try(azurerm_backup_policy_vm.daily[0].id, null)
}

output "vm_backup_policy_name" {
  description = "Name of the baseline VM backup policy (null when not created)."
  value       = try(azurerm_backup_policy_vm.daily[0].name, null)
}
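The principal_id output is described as the hook for Key Vault grants. A minimal sketch of that grant follows, assuming the Key Vault uses Azure RBAC (not access policies) and that its resource ID arrives through a hypothetical key_vault_id variable in the calling configuration.

```hcl
# Hypothetical input: resource ID of the Key Vault that holds the CMK.
variable "key_vault_id" {
  type = string
}

# Lets the vault's system-assigned identity wrap and unwrap with the key.
resource "azurerm_role_assignment" "vault_cmk_access" {
  scope                = var.key_vault_id
  role_definition_name = "Key Vault Crypto Service Encryption User"
  principal_id         = module.recovery_services_vault.principal_id
}
```

With a system-assigned identity the principal only exists once the vault has been created, so cmk_key_id probably has to be set in a later apply, after this grant is in place. A user-assigned identity that already holds the role avoids that two-step rollout.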

How to use it

resource "azurerm_resource_group" "backup" {
  name     = "rg-backup-prod-cin"
  location = "centralindia"
}

module "recovery_services_vault" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-recovery-services-vault?ref=v1.0.0"

  name                = "rsv-prod-cin-01"
  resource_group_name = azurerm_resource_group.backup.name
  location            = azurerm_resource_group.backup.location

  # Ransomware-resilient posture
  sku                           = "Standard"
  storage_mode_type             = "GeoRedundant"
  cross_region_restore_enabled  = true
  soft_delete_enabled           = true
  immutability                  = "Unlocked" # promote to "Locked" after validating retention
  public_network_access_enabled = false

  # Baseline VM backup policy: 30 daily / 12 weekly / 12 monthly / 7 yearly
  create_vm_backup_policy        = true
  vm_backup_policy_name          = "policy-vm-prod-daily"
  vm_backup_timezone             = "India Standard Time"
  vm_backup_time                 = "23:00"
  instant_restore_retention_days = 5
  retention_daily_count          = 30
  retention_yearly_count         = 7

  tags = {
    environment = "prod"
    owner       = "platform-team"
    workload    = "backup"
  }
}

# Downstream: protect an existing VM using the policy the module created.
resource "azurerm_backup_protected_vm" "app01" {
  resource_group_name = azurerm_resource_group.backup.name
  recovery_vault_name = module.recovery_services_vault.name
  source_vm_id        = azurerm_linux_virtual_machine.app01.id
  backup_policy_id    = module.recovery_services_vault.vm_backup_policy_id
}
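Because the example sets public_network_access_enabled = false, backup traffic needs a private path to the vault. The sketch below shows one possible private endpoint for it. It reuses azurerm_resource_group.backup and the module call above, while the subnet variable and the resource names are assumptions. Private DNS zone wiring is left out.

```hcl
# Hypothetical input: subnet that hosts private endpoints.
variable "private_endpoint_subnet_id" {
  type = string
}

resource "azurerm_private_endpoint" "rsv" {
  name                = "pe-rsv-prod-cin-01" # illustrative name
  location            = azurerm_resource_group.backup.location
  resource_group_name = azurerm_resource_group.backup.name
  subnet_id           = var.private_endpoint_subnet_id

  private_service_connection {
    name                           = "psc-rsv-prod-cin-01" # illustrative name
    private_connection_resource_id = module.recovery_services_vault.id
    subresource_names              = ["AzureBackup"] # Azure Backup sub-resource of the vault
    is_manual_connection           = false
  }
}
```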

With Terragrunt

Terragrunt keeps this module DRY across environments — define the backend and provider once in a root config, then a thin terragrunt.hcl per environment supplies only the inputs that differ.

1. Root config, live/terragrunt.hcl (inherited by every module):

remote_state {
  backend = "azurerm"
  generate = { path = "backend.tf", if_exists = "overwrite" }
  config = {
    # ...azurerm state bucket/container + key per path...
  }
}

2. Module config, live/prod/recovery_services_vault/terragrunt.hcl:

include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-recovery-services-vault?ref=v1.0.0"
}

inputs = {
  name                = "..."
  resource_group_name = "..."
  location            = "..."
}

3. Deploy one module, or roll out every module in an environment (run each command from the repository root):

(cd live/prod/recovery_services_vault && terragrunt apply)   # this module only
(cd live/prod && terragrunt run-all apply)                   # every module under live/prod

Why Terragrunt here: the backend and provider live in one place instead of being copy-pasted into every module; inputs is overridden per environment (dev / stage / prod) without forking the module; and run-all orchestrates dependencies across modules. Reach for it once you have more than one environment or more than a handful of modules — for a single stack, the plain Quickstart above is enough.
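To make the per-environment override concrete, here is a sketch of what a dev counterpart might look like. The path live/dev/recovery_services_vault/terragrunt.hcl and the relaxed values are assumptions chosen for illustration, and the "..." placeholders are left for the consumer to fill in.

```hcl
# live/dev/recovery_services_vault/terragrunt.hcl (hypothetical path)
include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-recovery-services-vault?ref=v1.0.0"
}

inputs = {
  name                = "..."
  resource_group_name = "..."
  location            = "..."

  # Relaxed posture for a non-production vault.
  storage_mode_type     = "LocallyRedundant"
  immutability          = "Disabled"
  retention_daily_count = 7
}
```

cross_region_restore_enabled does not need overriding here: the module's local forces it to false whenever storage_mode_type is not GeoRedundant.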

Inputs

| Name | Type | Default | Required | Description |
| --- | --- | --- | --- | --- |
| name | string | - | Yes | Vault name; 2-50 chars, starts with a letter, alphanumerics + hyphens. |
| resource_group_name | string | - | Yes | Resource group to deploy into. |
| location | string | - | Yes | Azure region for the vault. |
| sku | string | "Standard" | No | Vault SKU (Standard or RS0). |
| storage_mode_type | string | "GeoRedundant" | No | Redundancy: LocallyRedundant, ZoneRedundant, or GeoRedundant. |
| cross_region_restore_enabled | bool | true | No | Enable CRR (GeoRedundant only; irreversible once on). |
| soft_delete_enabled | bool | true | No | Retain deleted backup items for 14 days. |
| immutability | string | "Unlocked" | No | Disabled, Unlocked, or Locked (Locked is irreversible). |
| public_network_access_enabled | bool | false | No | Allow public network access; keep false in prod. |
| identity_type | string | "SystemAssigned" | No | SystemAssigned, UserAssigned, "SystemAssigned, UserAssigned", or null. |
| identity_ids | list(string) | [] | No | User-assigned identity IDs (required when identity includes UserAssigned). |
| cmk_key_id | string | null | No | Key Vault key ID for customer-managed key encryption. |
| cmk_user_assigned_identity_id | string | null | No | User-assigned identity for CMK access; null uses system-assigned. |
| infrastructure_encryption_enabled | bool | false | No | Double encryption on top of CMK (create-time only). |
| create_vm_backup_policy | bool | true | No | Create the baseline daily VM backup policy. |
| vm_backup_policy_name | string | "policy-vm-daily" | No | Name of the baseline VM backup policy. |
| vm_backup_timezone | string | "UTC" | No | Timezone for the backup schedule. |
| vm_backup_time | string | "23:00" | No | Daily start time, HH:MM on the hour/half-hour. |
| instant_restore_retention_days | number | 5 | No | Instant-restore snapshot retention (1-30). |
| retention_daily_count | number | 30 | No | Daily recovery points retained (7-9999). |
| retention_weekly_count | number | 12 | No | Weekly recovery points retained. |
| retention_monthly_count | number | 12 | No | Monthly recovery points retained. |
| retention_yearly_count | number | 7 | No | Yearly recovery points retained. |
| tags | map(string) | {} | No | Tags applied to the vault. |

Outputs

| Name | Description |
| --- | --- |
| id | Resource ID of the Recovery Services Vault. |
| name | Name of the vault. |
| location | Region of the vault. |
| principal_id | System-assigned identity principal ID (null if disabled); use for Key Vault/storage role grants. |
| tenant_id | Tenant ID of the vault’s managed identity (null if disabled). |
| vm_backup_policy_id | Resource ID of the baseline VM backup policy (null when not created). |
| vm_backup_policy_name | Name of the baseline VM backup policy (null when not created). |

Enterprise scenario

A regulated fintech runs ~400 production VMs across centralindia and southindia and must prove to auditors that backups are tamper-proof for a seven-year retention window. The platform team consumes this module from every workload subscription with immutability = "Locked", storage_mode_type = "GeoRedundant", cross_region_restore_enabled = true, and CMK encryption backed by a Key Vault key. With immutability locked, even a compromised subscription owner cannot shorten retention or delete recovery points before expiry. Application teams simply reference module.recovery_services_vault.vm_backup_policy_id from their azurerm_backup_protected_vm resources, and an Azure Policy assignment reports any vault in the tenant that drifts from the locked, geo-redundant baseline.
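A module call matching this scenario might look roughly like the sketch below. The two variables are hypothetical inputs of the calling configuration, and the identity is assumed to already hold key permissions on the Key Vault. Given the earlier advice to promote to Locked only after validating retention, a team would likely apply with "Unlocked" first and switch to "Locked" in a later change.

```hcl
# Hypothetical inputs of the calling configuration.
variable "cmk_key_id" { type = string }         # Key Vault key ID
variable "backup_identity_id" { type = string } # user-assigned identity resource ID

module "recovery_services_vault" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-recovery-services-vault?ref=v1.0.0"

  name                = "..."
  resource_group_name = "..."
  location            = "centralindia"

  # Tamper-proof, geo-redundant posture.
  storage_mode_type            = "GeoRedundant"
  cross_region_restore_enabled = true
  immutability                 = "Locked" # irreversible end state

  # CMK through a user-assigned identity.
  identity_type                 = "UserAssigned"
  identity_ids                  = [var.backup_identity_id]
  cmk_key_id                    = var.cmk_key_id
  cmk_user_assigned_identity_id = var.backup_identity_id

  # Seven yearly recovery points for the audit window.
  retention_yearly_count = 7
}
```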

Best practices


Vinod is a Senior Cloud Architect (22+ yrs) — available for Azure / AWS / GCP architecture, landing zones, and migrations.
