Quick take — A reusable hashicorp/azurerm ~> 4.0 Terraform module for Azure Recovery Services Vault: CMK encryption, immutability, soft delete, cross-region restore, and a VM backup policy wired in. New here? Jump to the Quickstart below to deploy it in minutes; read on for how it works and when to reach for it.
Quickstart (copy-paste)
Minimal, runnable configuration — drop this in a .tf file and fill in the "..." placeholders (each required input is commented):
provider "azurerm" {
  features {}
  # azurerm 4.x also needs a subscription: set subscription_id here or export ARM_SUBSCRIPTION_ID.
}
module "recovery_services_vault" {
  source              = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-recovery-services-vault?ref=v1.0.0"
  name                = "..." # Vault name; 2-50 chars, starts with a letter, alphanume…
  resource_group_name = "..." # Resource group to deploy into (must already exist).
  location            = "..." # Azure region for the vault.
}
Then terraform init && terraform apply. Every other input has a sensible default — see Inputs below to override behaviour.
What this module is
An Azure Recovery Services Vault (RSV) is the management container for Azure Backup and Azure Site Recovery. It holds backup data and recovery points for Azure VMs, Azure Files, SQL/SAP HANA in Azure VMs, and on-prem workloads (via MARS/MABS), and it stores replication state for ASR. Because the vault is where your last line of defence lives, the security and durability knobs on it matter more than on almost any other resource: redundancy (GeoRedundant + cross-region restore), soft delete, immutability, public-network access, and customer-managed key (CMK) encryption are the difference between “we restored in an hour” and “ransomware deleted our backups too.”
Wrapping the vault in a module is worth it because the defaults are dangerous and the safe configuration is verbose. The raw azurerm_recovery_services_vault resource defaults to soft_delete_enabled = true, but it leaves immutability Disabled, leaves public network access on, and won’t enable cross-region restore unless you explicitly opt in. CRR can only be turned on while the vault is GeoRedundant, and once on it cannot be turned off again on the same vault. A module lets a platform team encode “vaults are immutable, geo-redundant, CMK-encrypted, private-only, and ship with a baseline VM backup policy” once, then hand consumers three variables. It also packages the common companions, an azurerm_backup_policy_vm and the identity/CMK plumbing, so application teams don’t reinvent retention math or forget to give the vault a managed identity before attaching a key.
When to use it
- You run Azure VM, Azure Files, or SQL-in-VM backups and want every vault to be immutable + soft-delete-protected by org policy, not by hope.
- You need ransomware-resilient backups: immutability locked, geo-redundant storage, cross-region restore on, and a long-retention policy baked in.
- You’re standing up Azure Site Recovery and need the vault and identity scaffolding (with public access off, ready for a private endpoint) consistent across regions.
- You operate a landing-zone / platform model where dozens of subscriptions each need a compliant vault and you want one reviewed module instead of copy-pasted HCL.
- You must satisfy CMK encryption or data-residency requirements and want the user-assigned identity, key, and infrastructure_encryption_enabled wired up deterministically.
Skip the module (or strip it down) for a throwaway lab vault where LocallyRedundant and no immutability is fine and the extra identity/CMK surface is just noise.
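If you do keep the module for a lab, a stripped-down call might look like the sketch below. It assumes only the inputs documented under Inputs. The vault and resource group names are hypothetical, and the resource group must already exist. The baseline policy is switched off so the lab never inherits production retention.

```hcl
# Hypothetical lab vault: cheap, reversible settings only. Not for production data.
module "recovery_services_vault_lab" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-recovery-services-vault?ref=v1.0.0"

  name                = "rsv-lab-cin-01"    # hypothetical
  resource_group_name = "rg-backup-lab-cin" # hypothetical; must already exist
  location            = "centralindia"

  storage_mode_type             = "LocallyRedundant" # cheapest redundancy option
  cross_region_restore_enabled  = false              # CRR needs GeoRedundant anyway
  immutability                  = "Disabled"         # no immutability guarantees in a lab
  public_network_access_enabled = true               # lab only; no private endpoint
  create_vm_backup_policy       = false              # define ad-hoc policies per experiment
}
```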
Module structure
terraform-module-azure-recovery-services-vault/
├── versions.tf # provider pin
├── main.tf # vault + immutability + CRR + baseline VM backup policy
├── variables.tf # var-driven inputs with validation
└── outputs.tf # id/name + identity + policy ids
versions.tf
terraform {
  required_version = ">= 1.5.0"
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 4.0"
    }
  }
}
main.tf
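locals {
  # Cross-region restore is only valid with GeoRedundant storage; force it off otherwise.
  cross_region_restore = var.storage_mode_type == "GeoRedundant" ? var.cross_region_restore_enabled : false
  # Immutability is passed through unchanged. The variable defaults to "Unlocked" and
  # "Locked" is irreversible, so consumers must opt in to it explicitly.
  immutability = var.immutability
  # True when identity_type includes a user-assigned identity. The conditional form
  # avoids calling strcontains() on null when identity is disabled.
  uses_user_assigned_identity = var.identity_type == null ? false : strcontains(var.identity_type, "UserAssigned")
}
resource "azurerm_recovery_services_vault" "this" {
  name                          = var.name
  resource_group_name           = var.resource_group_name
  location                      = var.location
  sku                           = var.sku
  storage_mode_type             = var.storage_mode_type
  cross_region_restore_enabled  = local.cross_region_restore
  soft_delete_enabled           = var.soft_delete_enabled
  immutability                  = local.immutability
  public_network_access_enabled = var.public_network_access_enabled
  dynamic "identity" {
    for_each = var.identity_type == null ? [] : [1]
    content {
      type         = var.identity_type
      identity_ids = local.uses_user_assigned_identity ? var.identity_ids : null
    }
  }
  # CMK encryption is configured only when a key ID is supplied.
  dynamic "encryption" {
    for_each = var.cmk_key_id == null ? [] : [1]
    content {
      key_id                            = var.cmk_key_id
      infrastructure_encryption_enabled = var.infrastructure_encryption_enabled
      user_assigned_identity_id         = var.cmk_user_assigned_identity_id
      # Must be false whenever a user-assigned identity reads the key.
      use_system_assigned_identity = var.cmk_user_assigned_identity_id == null
    }
  }
  tags = var.tags
  lifecycle {
    # Catch inconsistent identity/CMK inputs at plan time instead of at the Azure API.
    precondition {
      condition     = var.cmk_key_id == null || var.identity_type != null
      error_message = "cmk_key_id requires a managed identity: set identity_type."
    }
    precondition {
      condition     = !local.uses_user_assigned_identity || length(var.identity_ids) > 0
      error_message = "identity_ids must not be empty when identity_type includes UserAssigned."
    }
    precondition {
      # Conditional form so contains() is never called with a null value.
      condition     = var.cmk_user_assigned_identity_id == null ? true : contains(var.identity_ids, var.cmk_user_assigned_identity_id)
      error_message = "cmk_user_assigned_identity_id must also be listed in identity_ids."
    }
  }
}
# Baseline daily VM backup policy shipped with the vault so consumers can
# protect VMs immediately without hand-writing retention math.
resource "azurerm_backup_policy_vm" "daily" {
  count                          = var.create_vm_backup_policy ? 1 : 0
  name                           = var.vm_backup_policy_name
  resource_group_name            = var.resource_group_name
  recovery_vault_name            = azurerm_recovery_services_vault.this.name
  policy_type                    = "V2"
  timezone                       = var.vm_backup_timezone
  instant_restore_retention_days = var.instant_restore_retention_days
  backup {
    frequency = "Daily"
    time      = var.vm_backup_time
  }
  retention_daily {
    count = var.retention_daily_count
  }
  retention_weekly {
    count    = var.retention_weekly_count
    weekdays = ["Sunday"]
  }
  retention_monthly {
    count    = var.retention_monthly_count
    weekdays = ["Sunday"]
    weeks    = ["First"]
  }
  # Yearly retention is optional: retention_yearly_count = 0 omits the block,
  # because the provider only accepts counts of 1 or more inside it.
  dynamic "retention_yearly" {
    for_each = var.retention_yearly_count > 0 ? [1] : []
    content {
      count    = var.retention_yearly_count
      weekdays = ["Sunday"]
      weeks    = ["First"]
      months   = ["January"]
    }
  }
}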
variables.tf
variable "name" {
  description = "Name of the Recovery Services Vault. 2-50 chars, must start with a letter and contain only alphanumerics and hyphens."
  type        = string
  validation {
    condition     = can(regex("^[A-Za-z][A-Za-z0-9-]{1,49}$", var.name))
    error_message = "Vault name must be 2-50 chars, start with a letter, and use only letters, numbers, and hyphens."
  }
}
variable "resource_group_name" {
  description = "Name of the resource group to deploy the vault into."
  type        = string
}
variable "location" {
  description = "Azure region for the vault (e.g. centralindia)."
  type        = string
}
variable "sku" {
  description = "Vault SKU. Standard is the normal choice; RS0 is a legacy value."
  type        = string
  default     = "Standard"
  validation {
    condition     = contains(["Standard", "RS0"], var.sku)
    error_message = "sku must be either Standard or RS0."
  }
}
variable "storage_mode_type" {
  description = "Backup storage redundancy: LocallyRedundant, ZoneRedundant, or GeoRedundant. GeoRedundant is required for cross-region restore."
  type        = string
  default     = "GeoRedundant"
  validation {
    condition     = contains(["LocallyRedundant", "ZoneRedundant", "GeoRedundant"], var.storage_mode_type)
    error_message = "storage_mode_type must be LocallyRedundant, ZoneRedundant, or GeoRedundant."
  }
}
variable "cross_region_restore_enabled" {
  description = "Enable cross-region restore. Only effective with GeoRedundant storage and CANNOT be disabled once enabled."
  type        = bool
  default     = true
}
variable "soft_delete_enabled" {
  description = "Keep deleted backup items in a soft-deleted state for 14 days. Strongly recommended to leave true."
  type        = bool
  default     = true
}
variable "immutability" {
  description = "Vault immutability: Disabled, Unlocked, or Locked. Locked is irreversible: recovery points cannot be deleted before expiry."
  type        = string
  default     = "Unlocked"
  validation {
    condition     = contains(["Disabled", "Unlocked", "Locked"], var.immutability)
    error_message = "immutability must be Disabled, Unlocked, or Locked."
  }
}
variable "public_network_access_enabled" {
  description = "Allow access over the public internet. Set false and use a private endpoint for production."
  type        = bool
  default     = false
}
variable "identity_type" {
  description = "Managed identity type for the vault: SystemAssigned, UserAssigned, or 'SystemAssigned, UserAssigned'. Null disables identity. Required when using CMK."
  type        = string
  default     = "SystemAssigned"
  validation {
    # Null is allowed; the conditional form keeps contains() from receiving a null value.
    condition = var.identity_type == null ? true : contains(
      ["SystemAssigned", "UserAssigned", "SystemAssigned, UserAssigned"],
      var.identity_type
    )
    error_message = "identity_type must be SystemAssigned, UserAssigned, 'SystemAssigned, UserAssigned', or null."
  }
}
variable "identity_ids" {
  description = "List of user-assigned managed identity resource IDs. Required when identity_type includes UserAssigned."
  type        = list(string)
  default     = []
}
variable "cmk_key_id" {
  description = "Key Vault key ID for customer-managed key encryption. Null uses Microsoft-managed keys."
  type        = string
  default     = null
}
variable "cmk_user_assigned_identity_id" {
  description = "User-assigned identity ID used to access the CMK. Null falls back to the vault's system-assigned identity."
  type        = string
  default     = null
}
variable "infrastructure_encryption_enabled" {
  description = "Enable double (infrastructure) encryption on top of CMK. Only honoured when cmk_key_id is set; cannot be changed after creation."
  type        = bool
  default     = false
}
variable "create_vm_backup_policy" {
  description = "Create the baseline daily Azure VM backup policy in this vault."
  type        = bool
  default     = true
}
variable "vm_backup_policy_name" {
  description = "Name of the baseline VM backup policy."
  type        = string
  default     = "policy-vm-daily"
}
variable "vm_backup_timezone" {
  description = "Timezone for the VM backup schedule (e.g. India Standard Time, UTC)."
  type        = string
  default     = "UTC"
}
variable "vm_backup_time" {
  description = "Daily backup start time in 24h HH:MM, must be on the hour or half-hour for Azure Backup."
  type        = string
  default     = "23:00"
  validation {
    condition     = can(regex("^([01][0-9]|2[0-3]):(00|30)$", var.vm_backup_time))
    error_message = "vm_backup_time must be HH:MM on the hour or half-hour (e.g. 23:00 or 23:30)."
  }
}
variable "instant_restore_retention_days" {
  description = "Days to retain instant-restore snapshots (1-30 for V2 policies)."
  type        = number
  default     = 5
  validation {
    condition     = var.instant_restore_retention_days >= 1 && var.instant_restore_retention_days <= 30
    error_message = "instant_restore_retention_days must be between 1 and 30."
  }
}
variable "retention_daily_count" {
  description = "Number of daily recovery points to retain (7-9999)."
  type        = number
  default     = 30
  validation {
    condition     = var.retention_daily_count >= 7 && var.retention_daily_count <= 9999
    error_message = "retention_daily_count must be between 7 and 9999."
  }
}
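variable "retention_weekly_count" {
  description = "Number of weekly recovery points to retain (1-9999)."
  type        = number
  default     = 12
  validation {
    condition     = var.retention_weekly_count >= 1 && var.retention_weekly_count <= 9999
    error_message = "retention_weekly_count must be between 1 and 9999."
  }
}
variable "retention_monthly_count" {
  description = "Number of monthly recovery points to retain (1-9999)."
  type        = number
  default     = 12
  validation {
    condition     = var.retention_monthly_count >= 1 && var.retention_monthly_count <= 9999
    error_message = "retention_monthly_count must be between 1 and 9999."
  }
}
variable "retention_yearly_count" {
  description = "Number of yearly recovery points to retain (set 0 to skip long-term compliance retention)."
  type        = number
  default     = 7
  validation {
    condition     = var.retention_yearly_count >= 0 && var.retention_yearly_count <= 9999
    error_message = "retention_yearly_count must be between 0 and 9999 (0 skips yearly retention)."
  }
}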
variable "tags" {
  description = "Tags applied to the vault."
  type        = map(string)
  default     = {}
}
outputs.tf
output "id" {
  description = "Resource ID of the Recovery Services Vault."
  value       = azurerm_recovery_services_vault.this.id
}
output "name" {
  description = "Name of the Recovery Services Vault."
  value       = azurerm_recovery_services_vault.this.name
}
output "location" {
  description = "Region of the vault."
  value       = azurerm_recovery_services_vault.this.location
}
output "principal_id" {
  description = "Principal ID of the vault's system-assigned managed identity (null if not enabled). Use to grant Key Vault and storage access."
  value       = try(azurerm_recovery_services_vault.this.identity[0].principal_id, null)
}
output "tenant_id" {
  description = "Tenant ID of the vault's managed identity (null if not enabled)."
  value       = try(azurerm_recovery_services_vault.this.identity[0].tenant_id, null)
}
output "vm_backup_policy_id" {
  description = "Resource ID of the baseline VM backup policy (null when not created). Attach to azurerm_backup_protected_vm."
  value       = try(azurerm_backup_policy_vm.daily[0].id, null)
}
output "vm_backup_policy_name" {
  description = "Name of the baseline VM backup policy (null when not created)."
  value       = try(azurerm_backup_policy_vm.daily[0].name, null)
}
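The principal_id output exists for grants like the one below. This is a minimal sketch. It assumes the module block from How to use it with the default SystemAssigned identity, and an existing Key Vault that uses Azure RBAC. The Key Vault and resource group names are hypothetical.

The system-assigned identity only exists once the vault has been created, so this is a two-step flow:

1. Apply once to create the vault and the role assignment.
2. Set cmk_key_id on the module and apply again.

Microsoft's Azure Backup CMK guidance states that CMK can only be enabled before any item is protected, so do the second step first.

```hcl
# Hypothetical existing Key Vault; it must use Azure RBAC for this role to apply.
data "azurerm_key_vault" "backup_cmk" {
  name                = "kv-backup-prod-cin"   # hypothetical
  resource_group_name = "rg-security-prod-cin" # hypothetical
}

# Let the vault's system-assigned identity wrap/unwrap keys in this Key Vault.
resource "azurerm_role_assignment" "rsv_cmk" {
  scope                = data.azurerm_key_vault.backup_cmk.id
  role_definition_name = "Key Vault Crypto Service Encryption User"
  principal_id         = module.recovery_services_vault.principal_id
}
```

On the second apply, leave cmk_user_assigned_identity_id null. main.tf then sets use_system_assigned_identity to true.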
How to use it
resource "azurerm_resource_group" "backup" {
  name     = "rg-backup-prod-cin"
  location = "centralindia"
}
module "recovery_services_vault" {
  source              = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-recovery-services-vault?ref=v1.0.0"
  name                = "rsv-prod-cin-01"
  resource_group_name = azurerm_resource_group.backup.name
  location            = azurerm_resource_group.backup.location
  # Ransomware-resilient posture
  sku                           = "Standard"
  storage_mode_type             = "GeoRedundant"
  cross_region_restore_enabled  = true
  soft_delete_enabled           = true
  immutability                  = "Unlocked" # promote to "Locked" after validating retention
  public_network_access_enabled = false
  # Baseline VM backup policy: 30 daily / 12 weekly / 12 monthly / 7 yearly
  create_vm_backup_policy        = true
  vm_backup_policy_name          = "policy-vm-prod-daily"
  vm_backup_timezone             = "India Standard Time"
  vm_backup_time                 = "23:00"
  instant_restore_retention_days = 5
  retention_daily_count          = 30
  retention_yearly_count         = 7
  tags = {
    environment = "prod"
    owner       = "platform-team"
    workload    = "backup"
  }
}
# Downstream: protect a VM with the policy the module created.
# Assumes azurerm_linux_virtual_machine.app01 is declared elsewhere in this configuration.
resource "azurerm_backup_protected_vm" "app01" {
  resource_group_name = azurerm_resource_group.backup.name
  recovery_vault_name = module.recovery_services_vault.name
  source_vm_id        = azurerm_linux_virtual_machine.app01.id
  backup_policy_id    = module.recovery_services_vault.vm_backup_policy_id
}
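For CMK in a single apply, a user-assigned identity avoids the chicken-and-egg problem of the system-assigned one, because the identity exists before the vault does. This is a sketch, assuming it sits alongside the configuration above. The two variables are hypothetical inputs pointing at an existing RBAC-enabled Key Vault and an RSA key in it. Azure Backup expects that Key Vault to have soft delete and purge protection enabled. RBAC assignments can take a few minutes to propagate, so the first apply may need a retry.

```hcl
variable "key_vault_id" {
  description = "Hypothetical input: ID of an existing RBAC-enabled Key Vault (soft delete + purge protection on)."
  type        = string
}

variable "backup_key_id" {
  description = "Hypothetical input: ID of an RSA key in that Key Vault used to encrypt the vault."
  type        = string
}

# The identity exists before the vault, so it can be granted key access first.
resource "azurerm_user_assigned_identity" "rsv_cmk" {
  name                = "id-rsv-prod-cin-02-cmk" # hypothetical
  resource_group_name = azurerm_resource_group.backup.name
  location            = azurerm_resource_group.backup.location
}

resource "azurerm_role_assignment" "rsv_cmk_uai" {
  scope                = var.key_vault_id
  role_definition_name = "Key Vault Crypto Service Encryption User"
  principal_id         = azurerm_user_assigned_identity.rsv_cmk.principal_id
}

module "recovery_services_vault_cmk" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-recovery-services-vault?ref=v1.0.0"

  name                = "rsv-prod-cin-02" # hypothetical
  resource_group_name = azurerm_resource_group.backup.name
  location            = azurerm_resource_group.backup.location

  # The same identity is attached to the vault and used to read the key.
  identity_type                     = "UserAssigned"
  identity_ids                      = [azurerm_user_assigned_identity.rsv_cmk.id]
  cmk_key_id                        = var.backup_key_id
  cmk_user_assigned_identity_id     = azurerm_user_assigned_identity.rsv_cmk.id
  infrastructure_encryption_enabled = true # create-time only

  # The key grant must exist before the vault tries to use the key.
  depends_on = [azurerm_role_assignment.rsv_cmk_uai]
}
```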
With Terragrunt
Terragrunt keeps this module DRY across environments — define the backend and provider once in a root config, then a thin terragrunt.hcl per environment supplies only the inputs that differ.
1. Root config — live/terragrunt.hcl (inherited by every module):
remote_state {
  backend  = "azurerm"
  generate = { path = "backend.tf", if_exists = "overwrite" }
  config = {
    # ...azurerm backend settings: resource group, storage account, container, and a per-path state key...
  }
}
2. Module config — live/prod/recovery_services_vault/terragrunt.hcl:
include "root" {
  path = find_in_parent_folders()
}
terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-recovery-services-vault?ref=v1.0.0"
}
inputs = {
  name                = "..."
  resource_group_name = "..."
  location            = "..."
}
3. Deploy one environment, or roll out all modules together:
cd live/prod/recovery_services_vault && terragrunt apply # this module
terragrunt run-all apply # every module under live/prod
Why Terragrunt here: the backend and provider live in one place instead of being copy-pasted into every module; inputs are overridden per environment (dev / stage / prod) without forking the module; and run-all orchestrates dependencies across modules. Reach for it once you have more than one environment or more than a handful of modules; for a single stack, the plain Quickstart above is enough.
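The root config in step 1 only shows the backend, although the provider is meant to live there too. A sketch of the matching generate block for live/terragrunt.hcl follows. It assumes the subscription is supplied through the ARM_SUBSCRIPTION_ID environment variable.

```hcl
# live/terragrunt.hcl (root): write provider.tf into every module's working directory.
generate "provider" {
  path      = "provider.tf"
  if_exists = "overwrite_terragrunt" # only replace a provider.tf that Terragrunt generated
  contents  = <<EOF
provider "azurerm" {
  features {}
  # azurerm 4.x reads the subscription from ARM_SUBSCRIPTION_ID when subscription_id is unset.
}
EOF
}
```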
Inputs
| Name | Type | Default | Required | Description |
|---|---|---|---|---|
| name | string | — | Yes | Vault name; 2-50 chars, starts with a letter, alphanumerics + hyphens. |
| resource_group_name | string | — | Yes | Resource group to deploy into. |
| location | string | — | Yes | Azure region for the vault. |
| sku | string | "Standard" | No | Vault SKU (Standard or RS0). |
| storage_mode_type | string | "GeoRedundant" | No | Redundancy: LocallyRedundant, ZoneRedundant, or GeoRedundant. |
| cross_region_restore_enabled | bool | true | No | Enable CRR (GeoRedundant only; irreversible once on). |
| soft_delete_enabled | bool | true | No | Retain deleted backup items for 14 days. |
| immutability | string | "Unlocked" | No | Disabled, Unlocked, or Locked (Locked is irreversible). |
| public_network_access_enabled | bool | false | No | Allow public network access; keep false in prod. |
| identity_type | string | "SystemAssigned" | No | SystemAssigned, UserAssigned, "SystemAssigned, UserAssigned", or null. |
| identity_ids | list(string) | [] | No | User-assigned identity IDs (required when identity includes UserAssigned). |
| cmk_key_id | string | null | No | Key Vault key ID for customer-managed key encryption. |
| cmk_user_assigned_identity_id | string | null | No | User-assigned identity for CMK access; null uses system-assigned. |
| infrastructure_encryption_enabled | bool | false | No | Double encryption on top of CMK (create-time only). |
| create_vm_backup_policy | bool | true | No | Create the baseline daily VM backup policy. |
| vm_backup_policy_name | string | "policy-vm-daily" | No | Name of the baseline VM backup policy. |
| vm_backup_timezone | string | "UTC" | No | Timezone for the backup schedule. |
| vm_backup_time | string | "23:00" | No | Daily start time, HH:MM on the hour/half-hour. |
| instant_restore_retention_days | number | 5 | No | Instant-restore snapshot retention (1-30). |
| retention_daily_count | number | 30 | No | Daily recovery points retained (7-9999). |
| retention_weekly_count | number | 12 | No | Weekly recovery points retained. |
| retention_monthly_count | number | 12 | No | Monthly recovery points retained. |
| retention_yearly_count | number | 7 | No | Yearly recovery points retained. |
| tags | map(string) | {} | No | Tags applied to the vault. |
Outputs
| Name | Description |
|---|---|
| id | Resource ID of the Recovery Services Vault. |
| name | Name of the vault. |
| location | Region of the vault. |
| principal_id | System-assigned identity principal ID (null if disabled); use for Key Vault/storage role grants. |
| tenant_id | Tenant ID of the vault’s managed identity (null if disabled). |
| vm_backup_policy_id | Resource ID of the baseline VM backup policy (null when not created). |
| vm_backup_policy_name | Name of the baseline VM backup policy (null when not created). |
Enterprise scenario
A regulated fintech runs ~400 production VMs across centralindia and southindia and must prove to auditors that backups are tamper-proof for a seven-year retention window. The platform team consumes this module from every workload subscription with immutability = "Locked", storage_mode_type = "GeoRedundant", cross_region_restore_enabled = true, and CMK encryption backed by a Key Vault key. Because immutability is Locked, even a compromised subscription owner cannot shorten retention or delete recovery points before expiry. Application teams simply reference module.recovery_services_vault.vm_backup_policy_id from their azurerm_backup_protected_vm resources, and Azure Policy compliance reporting flags any vault in the tenant that drifts from the locked, geo-redundant baseline.
Best practices
- Lock immutability last, never first. Deploy as Unlocked, validate that your retention durations are correct and that no team needs to delete recovery points, then promote to Locked. Once Locked, the setting is irreversible and you cannot reduce retention or remove items before expiry.
- Turn on cross-region restore deliberately. CRR requires GeoRedundant storage and is a one-way switch. Enable it up front for tier-1 workloads: you cannot retrofit it after downgrading redundancy, and you can never turn it back off.
- Keep the vault private and CMK-encrypted for sensitive data. Set public_network_access_enabled = false with a private endpoint, and encrypt with a Key Vault key read through a managed identity (plus infrastructure_encryption_enabled where double encryption is mandated). Grant that identity wrap/unwrap access to the key:
  - the user-assigned identity, when you set cmk_user_assigned_identity_id;
  - otherwise the vault's system-assigned identity, exposed by the principal_id output.
  
  Azure Backup also expects the Key Vault to have soft delete and purge protection enabled.
- Never lower redundancy after backups exist. storage_mode_type can only be changed before any item is protected in the vault. Once backups are configured the setting is fixed, so choose GeoRedundant (or ZoneRedundant for in-region zonal resilience) at creation.
- Right-size retention for cost. GRS backup storage is billed per GB, and long yearly retention multiplies that cost fast. Use the retention_* inputs to match each tier (e.g. drop to retention_yearly_count = 0 for dev) instead of one global policy, and set instant_restore_retention_days only as high as your RTO needs.
- Name and tag for fleet operations. Use a predictable convention like rsv-<env>-<region>-<nn> and tag environment/owner/workload. That makes cross-region pairs obvious, lets you audit soft-delete and immutability state per environment, and lets you split cost by workload.
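The module only switches off public network access; it does not create the private endpoint the list above relies on. A minimal sketch of one follows, assuming it sits next to the module block from How to use it. The subnet and DNS zone IDs are hypothetical inputs. Per Microsoft's Azure Backup private-endpoint guidance, the endpoint should exist before any item is protected in the vault. That guidance also lists the DNS zones to link (a region-specific backup zone plus blob and queue zones), so check it for the current set.

```hcl
variable "private_endpoint_subnet_id" {
  description = "Hypothetical input: subnet that hosts the vault's private endpoint."
  type        = string
}

variable "backup_private_dns_zone_ids" {
  description = "Hypothetical input: private DNS zone IDs Azure Backup needs (regional backup zone, blob, queue)."
  type        = list(string)
}

resource "azurerm_private_endpoint" "rsv_backup" {
  name                = "pe-rsv-prod-cin-01-backup"                  # hypothetical
  location            = azurerm_resource_group.backup.location       # must match the subnet's VNet region
  resource_group_name = azurerm_resource_group.backup.name
  subnet_id           = var.private_endpoint_subnet_id

  private_service_connection {
    name                           = "psc-rsv-prod-cin-01-backup" # hypothetical
    private_connection_resource_id = module.recovery_services_vault.id
    subresource_names              = ["AzureBackup"] # "AzureSiteRecovery" for ASR traffic
    is_manual_connection           = false
  }

  private_dns_zone_group {
    name                 = "rsv-backup-dns"
    private_dns_zone_ids = var.backup_private_dns_zone_ids
  }
}
```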