Quick take: A reusable Terraform module (built against hashicorp/aws ~> 5.0) that provisions an AWS Backup vault, plan, rules, and resource selection, with KMS encryption, optional cross-region copy, and a least-privilege service role for compliant, automated backups. New here? Jump to the Quickstart below to deploy it in minutes; read on for how it works and when to reach for it.
Quickstart (copy-paste)
Minimal, runnable configuration — drop this in a .tf file and fill in the "..." placeholders (each required input is commented):
provider "aws" {
region = "us-east-1"
}
module "backup" {
source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-backup?ref=v1.0.0"
vault_name = "..." # Name of the backup vault and prefix for the IAM role (2…
plan_name = "..." # Name of the backup plan.
rules = [ # Backup rules (schedule, retention, lifecycle, optional …
{ rule_name = "...", schedule = "..." }, # one object per rule; rule_name and schedule are required
]
}
Then terraform init && terraform apply. Every other input has a sensible default — see Inputs below to override behaviour.
What this module is
AWS Backup is a fully managed, policy-based service that centralizes and automates data protection across services such as EBS, RDS, DynamoDB, EFS, Aurora, FSx, and Storage Gateway. Instead of writing per-service backup logic, you define a backup plan — a set of scheduled rules that say when to back up, how long to retain each recovery point, where to store it, and whether to copy it to another region or account for disaster recovery.
The core resource here is aws_backup_plan, but a plan on its own does nothing useful. In production it always travels with three companions: an aws_backup_vault (the encrypted, access-controlled store for recovery points), an aws_backup_selection (which tags or ARNs the plan actually protects), and an IAM role that AWS Backup assumes to read source data and write recovery points. Wrapping all four in a module gives every team the same encrypted-by-default, tag-driven, cross-region-capable backup posture from a single module block — instead of each squad hand-rolling cron schedules and forgetting to set retention or a copy destination.
This module creates a customer-managed KMS-encrypted vault, a plan with one or more rules, an optional vault lock for ransomware/compliance protection, optional cross-region copy, and a least-privilege service role wired to the AWS-managed backup policies. Selection is tag-based by default, so onboarding a new resource is as simple as adding a tag.
When to use it
- You need consistent, auditable backups across many AWS accounts and want one well-tested module rather than ad-hoc lifecycle rules.
- You must meet retention/compliance mandates (e.g. keep daily backups 35 days, monthlies 7 years) and want WORM-style protection via Backup Vault Lock.
- You want cross-region disaster recovery copies of EBS/RDS/DynamoDB recovery points without scripting it per resource.
- You prefer tag-based selection so application teams opt resources in (backup = "daily") without touching backup infrastructure.
- You are standardizing on customer-managed KMS keys for backup data and need the vault, role, and copy targets to all honor that key.
Reach for native per-service snapshots only when you need sub-hour RPOs or app-consistent quiescing that AWS Backup does not yet cover for your engine; otherwise this module is the lower-toil default.
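To make the tag-based opt-in concrete, here is a minimal sketch of what an application team might write, assuming the module was deployed with selection_tags = { backup = "daily" }. The resource name, size, and availability zone are hypothetical; only the tag matters to the backup selection.

```hcl
# Hypothetical application-side resource; it is not part of this module.
resource "aws_ebs_volume" "orders_data" {
  availability_zone = "us-east-1a" # assumed AZ, pick your own
  size              = 100          # GiB, illustrative
  encrypted         = true

  tags = {
    Name   = "orders-data"
    backup = "daily" # matches the module's selection_tags, so the plan picks it up
  }
}
```

Removing the tag opts the volume out of future scheduled backups without any change to the backup stack itself.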
Module structure
terraform-module-aws-backup/
├── versions.tf
├── main.tf
├── variables.tf
└── outputs.tf
versions.tf
terraform {
required_version = ">= 1.5.0"
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 5.0"
}
}
}
main.tf
locals {
# Tag-based selection conditions: { backup = "daily" } => one StringEquals condition.
selection_tags = [
for k, v in var.selection_tags : {
type = "STRINGEQUALS"
key = k
value = v
}
]
}
# ---------------------------------------------------------------------------
# Backup vault (customer-managed KMS, optional Vault Lock)
# ---------------------------------------------------------------------------
resource "aws_backup_vault" "this" {
name = var.vault_name
kms_key_arn = var.kms_key_arn
force_destroy = var.force_destroy
tags = merge(var.tags, { Name = var.vault_name })
}
resource "aws_backup_vault_lock_configuration" "this" {
count = var.enable_vault_lock ? 1 : 0
backup_vault_name = aws_backup_vault.this.name
changeable_for_days = var.vault_lock_changeable_for_days
min_retention_days = var.vault_lock_min_retention_days
max_retention_days = var.vault_lock_max_retention_days
}
# ---------------------------------------------------------------------------
# IAM service role assumed by AWS Backup
# ---------------------------------------------------------------------------
data "aws_iam_policy_document" "assume" {
statement {
effect = "Allow"
actions = ["sts:AssumeRole"]
principals {
type = "Service"
identifiers = ["backup.amazonaws.com"]
}
}
}
resource "aws_iam_role" "backup" {
name = "${var.vault_name}-backup-role"
assume_role_policy = data.aws_iam_policy_document.assume.json
permissions_boundary = var.permissions_boundary_arn
tags = var.tags
}
resource "aws_iam_role_policy_attachment" "backup" {
role = aws_iam_role.backup.name
policy_arn = "arn:aws:iam::aws:policy/service-role/AWSBackupServiceRolePolicyForBackup"
}
resource "aws_iam_role_policy_attachment" "restore" {
count = var.attach_restore_policy ? 1 : 0
role = aws_iam_role.backup.name
policy_arn = "arn:aws:iam::aws:policy/service-role/AWSBackupServiceRolePolicyForRestores"
}
# ---------------------------------------------------------------------------
# Backup plan + rules
# ---------------------------------------------------------------------------
resource "aws_backup_plan" "this" {
name = var.plan_name
dynamic "rule" {
for_each = var.rules
content {
rule_name = rule.value.rule_name
target_vault_name = aws_backup_vault.this.name
schedule = rule.value.schedule
start_window = rule.value.start_window
completion_window = rule.value.completion_window
enable_continuous_backup = try(rule.value.enable_continuous_backup, false)
lifecycle {
cold_storage_after = rule.value.cold_storage_after
delete_after = rule.value.delete_after
}
# Optional cross-region (and/or cross-account) copy of recovery points.
dynamic "copy_action" {
for_each = try(rule.value.copy_action, null) == null ? [] : [rule.value.copy_action]
content {
destination_vault_arn = copy_action.value.destination_vault_arn
lifecycle {
cold_storage_after = try(copy_action.value.cold_storage_after, null)
delete_after = try(copy_action.value.delete_after, null)
}
}
}
recovery_point_tags = merge(var.tags, try(rule.value.recovery_point_tags, {}))
}
}
dynamic "advanced_backup_setting" {
for_each = var.enable_windows_vss ? [1] : []
content {
backup_options = { WindowsVSS = "enabled" }
resource_type = "EC2"
}
}
tags = var.tags
}
# ---------------------------------------------------------------------------
# Resource selection (tag-based and/or explicit ARNs)
# ---------------------------------------------------------------------------
resource "aws_backup_selection" "this" {
name = "${var.plan_name}-selection"
plan_id = aws_backup_plan.this.id
iam_role_arn = aws_iam_role.backup.arn
resources = var.selection_resources
not_resources = var.selection_not_resources
dynamic "selection_tag" {
for_each = local.selection_tags
content {
type = selection_tag.value.type
key = selection_tag.value.key
value = selection_tag.value.value
}
}
}
variables.tf
variable "vault_name" {
description = "Name of the backup vault (and prefix for the IAM role)."
type = string
validation {
condition = can(regex("^[A-Za-z0-9._-]{2,50}$", var.vault_name))
error_message = "vault_name must be 2-50 chars: letters, numbers, dot, hyphen, underscore."
}
}
variable "plan_name" {
description = "Name of the backup plan."
type = string
}
variable "kms_key_arn" {
description = "ARN of the customer-managed KMS key used to encrypt the vault. Strongly recommended over the AWS-managed key."
type = string
default = null
}
variable "force_destroy" {
description = "Allow Terraform to delete the vault even if it still contains recovery points. Keep false in production."
type = bool
default = false
}
variable "rules" {
description = <<-EOT
List of backup rules. Each rule:
rule_name (string, required)
schedule (string, required) cron in UTC, e.g. "cron(0 5 * * ? *)"
start_window (number) minutes after the scheduled time within which the job must start, otherwise it is canceled
completion_window (number) minutes the job has to finish
cold_storage_after (number) days before transition to cold storage (delete_after must be >= cold_storage_after + 90 when both are set)
delete_after (number) retention in days before the recovery point is deleted
enable_continuous_backup (bool) point-in-time recovery (RDS/Aurora/S3)
copy_action = { destination_vault_arn, cold_storage_after, delete_after } (optional)
recovery_point_tags (map(string), optional)
EOT
type = list(object({
rule_name = string
schedule = string
start_window = optional(number, 60)
completion_window = optional(number, 360)
cold_storage_after = optional(number)
delete_after = optional(number)
enable_continuous_backup = optional(bool, false)
copy_action = optional(object({
destination_vault_arn = string
cold_storage_after = optional(number)
delete_after = optional(number)
}))
recovery_point_tags = optional(map(string), {})
}))
validation {
condition = length(var.rules) > 0
error_message = "At least one backup rule is required."
}
validation {
# If cold_storage_after and delete_after are both set, delete must be >= cold + 90 days (AWS constraint).
condition = alltrue([
for r in var.rules :
r.cold_storage_after == null || r.delete_after == null ? true : r.delete_after >= r.cold_storage_after + 90
])
error_message = "delete_after must be at least cold_storage_after + 90 days when both are set."
}
}
variable "selection_tags" {
description = "Map of tag key/value pairs; resources carrying ALL of these tags are included in the plan."
type = map(string)
default = {}
}
variable "selection_resources" {
description = "Explicit resource ARNs to include. Use [\"*\"] to select all supported resources, or [] when using tags only."
type = list(string)
default = []
}
variable "selection_not_resources" {
description = "Resource ARNs to explicitly exclude from selection."
type = list(string)
default = []
}
variable "attach_restore_policy" {
description = "Attach AWSBackupServiceRolePolicyForRestores so the role can also perform restores."
type = bool
default = true
}
variable "permissions_boundary_arn" {
description = "Optional IAM permissions boundary ARN applied to the backup service role."
type = string
default = null
}
variable "enable_vault_lock" {
description = "Enable Backup Vault Lock (WORM) to protect recovery points from early deletion."
type = bool
default = false
}
variable "vault_lock_changeable_for_days" {
description = "Grace period in days during which the vault lock can still be changed or removed (minimum 3). A non-null value puts the lock in compliance mode once the period ends; null leaves it in governance mode."
type = number
default = 3
}
variable "vault_lock_min_retention_days" {
description = "Minimum retention enforced by the vault lock, in days."
type = number
default = 7
}
variable "vault_lock_max_retention_days" {
description = "Maximum retention enforced by the vault lock, in days."
type = number
default = 36500
}
variable "enable_windows_vss" {
description = "Enable Windows VSS application-consistent backups for EC2 instances."
type = bool
default = false
}
variable "tags" {
description = "Tags applied to all created resources."
type = map(string)
default = {}
}
outputs.tf
output "vault_id" {
description = "Name/ID of the backup vault."
value = aws_backup_vault.this.id
}
output "vault_arn" {
description = "ARN of the backup vault (use as a copy_action destination from other regions)."
value = aws_backup_vault.this.arn
}
output "vault_recovery_points" {
description = "Number of recovery points currently stored in the vault."
value = aws_backup_vault.this.recovery_points
}
output "plan_id" {
description = "ID of the backup plan."
value = aws_backup_plan.this.id
}
output "plan_arn" {
description = "ARN of the backup plan."
value = aws_backup_plan.this.arn
}
output "plan_version" {
description = "Unique version ID of the backup plan (changes on every update)."
value = aws_backup_plan.this.version
}
output "selection_id" {
description = "ID of the backup selection."
value = aws_backup_selection.this.id
}
output "backup_role_arn" {
description = "ARN of the IAM role AWS Backup assumes for backup and restore jobs."
value = aws_iam_role.backup.arn
}
How to use it
# A customer-managed key for backup data (or reference an existing one).
resource "aws_kms_key" "backup" {
description = "CMK for AWS Backup vault"
enable_key_rotation = true
deletion_window_in_days = 30
}
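# DR-region key and vault to receive cross-region copies.
# Assumes a provider alias "aws.dr" (eu-west-1 here) is declared in the root module.
resource "aws_kms_key" "backup_dr" {
provider = aws.dr
description = "CMK for the DR backup vault"
enable_key_rotation = true
deletion_window_in_days = 30
}
resource "aws_backup_vault" "dr" {
provider = aws.dr # aliased provider in eu-west-1
name = "prod-app-dr"
kms_key_arn = aws_kms_key.backup_dr.arn
}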
module "backup" {
source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-backup?ref=v1.0.0"
vault_name = "prod-app"
plan_name = "prod-app-plan"
kms_key_arn = aws_kms_key.backup.arn
# Tag-driven selection: any resource tagged backup = "daily" is protected.
selection_tags = {
backup = "daily"
}
rules = [
{
rule_name = "daily-35d"
schedule = "cron(0 5 * * ? *)" # 05:00 UTC every day
start_window = 60
completion_window = 360
delete_after = 35
# Copy each daily recovery point to the DR region, kept 14 days.
copy_action = {
destination_vault_arn = aws_backup_vault.dr.arn
delete_after = 14
}
},
{
rule_name = "monthly-7y"
schedule = "cron(0 6 1 * ? *)" # 06:00 UTC on the 1st
cold_storage_after = 90
delete_after = 2555 # ~7 years
}
]
# Compliance: prevent recovery points being deleted before 7 days.
enable_vault_lock = true
vault_lock_min_retention_days = 7
vault_lock_changeable_for_days = 3
tags = {
Environment = "prod"
Team = "platform"
CostCenter = "1042"
}
}
# Downstream reference: let an operator role start on-demand restore jobs and
# pass the module's backup role to AWS Backup. Only iam:PassRole is scoped to the
# module output here; backup:StartRestoreJob is left on "*" in this example.
data "aws_iam_policy_document" "restore_ops" {
statement {
effect = "Allow"
actions = ["backup:StartRestoreJob"]
resources = ["*"]
}
statement {
effect = "Allow"
actions = ["iam:PassRole"]
resources = [module.backup.backup_role_arn]
}
}
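The example above references an aliased provider (aws.dr) and builds a policy document without attaching it to anything. A minimal sketch of the missing glue might look like this, assuming eu-west-1 as the DR region and a hypothetical, pre-existing operator role named restore-operators:

```hcl
# Assumed DR provider alias used by the DR vault in the example above.
provider "aws" {
  alias  = "dr"
  region = "eu-west-1"
}

# Turn the restore_ops policy document into a managed policy
# (the policy name is a placeholder).
resource "aws_iam_policy" "restore_ops" {
  name   = "prod-app-restore-ops"
  policy = data.aws_iam_policy_document.restore_ops.json
}

# Attach it to an existing operator role (hypothetical role name).
resource "aws_iam_role_policy_attachment" "restore_ops" {
  role       = "restore-operators"
  policy_arn = aws_iam_policy.restore_ops.arn
}
```

Keeping iam:PassRole limited to module.backup.backup_role_arn means operators can hand AWS Backup only the role this module created, not an arbitrary role in the account.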
With Terragrunt
Terragrunt keeps this module DRY across environments — define the backend and provider once in a root config, then a thin terragrunt.hcl per environment supplies only the inputs that differ.
1. Root config — live/terragrunt.hcl (inherited by every module):
remote_state {
backend = "s3"
generate = { path = "backend.tf", if_exists = "overwrite" }
config = {
# ...s3 state bucket/container + key per path...
}
}
2. Module config — live/prod/backup/terragrunt.hcl:
include "root" {
path = find_in_parent_folders()
}
terraform {
source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-backup?ref=v1.0.0"
}
inputs = {
vault_name = "..."
plan_name = "..."
rules = [{ rule_name = "...", schedule = "..." }]
}
3. Deploy one environment, or roll out all modules together:
(cd live/prod/backup && terragrunt apply) # this module only
(cd live/prod && terragrunt run-all apply) # every module under live/prod
Why Terragrunt here: the backend and provider live in one place instead of being copy-pasted into every module; inputs is overridden per environment (dev / stage / prod) without forking the module; and run-all orchestrates dependencies across modules. Reach for it once you have more than one environment or more than a handful of modules — for a single stack, the plain Quickstart above is enough.
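As a sketch of the per-environment override described above, a dev stack could reuse the same module source with shorter retention and a disposable vault. The path live/dev/backup/terragrunt.hcl and every value below are illustrative assumptions, not something the module prescribes:

```hcl
# live/dev/backup/terragrunt.hcl (hypothetical path)
include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-backup?ref=v1.0.0"
}

inputs = {
  vault_name    = "dev-app"
  plan_name     = "dev-app-plan"
  force_destroy = true # acceptable for throwaway dev vaults only

  selection_tags = { backup = "daily" }

  # A single short-retention rule; optional attributes fall back to module defaults.
  rules = [
    {
      rule_name    = "daily-7d"
      schedule     = "cron(0 5 * * ? *)"
      delete_after = 7
    }
  ]
}
```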
Inputs
| Name | Type | Default | Required | Description |
|---|---|---|---|---|
| vault_name | string | — | yes | Name of the backup vault and prefix for the IAM role (2-50 chars). |
| plan_name | string | — | yes | Name of the backup plan. |
| kms_key_arn | string | null | no | Customer-managed KMS key ARN encrypting the vault. Recommended. |
| force_destroy | bool | false | no | Allow vault deletion while it still holds recovery points. |
| rules | list(object) | — | yes | Backup rules (schedule, retention, lifecycle, optional copy_action). At least one required. |
| selection_tags | map(string) | {} | no | Tag key/values; resources with ALL tags are backed up. |
| selection_resources | list(string) | [] | no | Explicit ARNs to include (use ["*"] for all supported resources). |
| selection_not_resources | list(string) | [] | no | ARNs to explicitly exclude. |
| attach_restore_policy | bool | true | no | Also attach the AWS-managed restore policy to the role. |
| permissions_boundary_arn | string | null | no | IAM permissions boundary for the backup service role. |
| enable_vault_lock | bool | false | no | Enable Backup Vault Lock (WORM) on the vault. |
| vault_lock_changeable_for_days | number | 3 | no | Grace period in days before the vault lock becomes immutable. |
| vault_lock_min_retention_days | number | 7 | no | Minimum retention in days enforced by the vault lock. |
| vault_lock_max_retention_days | number | 36500 | no | Maximum retention in days enforced by the vault lock. |
| enable_windows_vss | bool | false | no | Enable Windows VSS application-consistent EC2 backups. |
| tags | map(string) | {} | no | Tags applied to all created resources. |
Outputs
| Name | Description |
|---|---|
| vault_id | Name/ID of the backup vault. |
| vault_arn | ARN of the backup vault (use as a cross-region copy_action destination). |
| vault_recovery_points | Number of recovery points currently stored in the vault. |
| plan_id | ID of the backup plan. |
| plan_arn | ARN of the backup plan. |
| plan_version | Unique version ID of the plan (changes on each update). |
| selection_id | ID of the backup selection. |
| backup_role_arn | ARN of the IAM role AWS Backup assumes for backup/restore jobs. |
Enterprise scenario
A financial-services platform runs ~600 RDS, EBS, and DynamoDB resources across 25 application accounts governed by AWS Organizations. The platform team publishes this module once and onboards each account through a stack that sets selection_tags = { backup = "daily" }, so application squads protect a new database simply by tagging it — no backup PRs required. The daily-35d rule copies every recovery point into a hardened DR vault in eu-west-1, and enable_vault_lock with a 7-day minimum retention satisfies the regulator’s ransomware-resilience control by making recent backups immutable. When an auditor asks “prove these backups cannot be deleted early,” the team points at the single vault-lock configuration and the plan_version output captured in state.
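Before hardening a production vault like the one in this scenario, the lock settings can be rehearsed in a sandbox. The sketch below assumes that passing null for vault_lock_changeable_for_days reaches the resource unchanged (the variable in variables.tf does not set nullable = false), which should leave the lock in governance mode, where it can still be removed. All names and values are hypothetical:

```hcl
module "backup_sandbox" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-backup?ref=v1.0.0"

  vault_name = "sandbox-app"
  plan_name  = "sandbox-app-plan"

  enable_vault_lock             = true
  vault_lock_min_retention_days = 7
  # No grace period: the lock stays in governance mode and remains removable.
  vault_lock_changeable_for_days = null

  selection_tags = { backup = "daily" }

  rules = [
    {
      rule_name    = "daily-35d"
      schedule     = "cron(0 5 * * ? *)"
      delete_after = 35 # must fall within the lock's min/max retention
    }
  ]
}
```

Once the retention bounds behave as expected, the same inputs with a grace period of 3 or more days can be promoted to production, where the lock becomes immutable after that period.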
Best practices
- Always supply a customer-managed kms_key_arn. The AWS-managed aws/backup key cannot be shared cross-account and gives you no key policy control; a CMK lets you scope who can decrypt restores and enables key rotation.
- Lock production vaults, but rehearse first. Set enable_vault_lock = true with changeable_for_days >= 3 so you have a grace window; once compliance mode hardens, even the root account cannot shorten retention or delete recovery points.
- Tier retention with cold_storage_after to control cost. Long-retention (monthly/yearly) rules should transition to cold storage; remember AWS requires delete_after to be at least cold_storage_after + 90 days (the module validates this for you).
- Size start_window/completion_window to your backup volume. Large EBS/RDS fleets can miss a too-short completion window and silently fail; monitor BackupJob failures via AWS Backup Audit Manager or EventBridge.
- Prefer tag-based selection over ["*"]. Wildcard selection sweeps in every supported resource (and its cost); explicit tags keep ownership and spend attributable, and avoid backing up ephemeral or already-replicated data.
- Use a consistent vault_name/plan_name convention (e.g. <app>-<env>) and apply tags with Environment, Team, and CostCenter so backup storage shows up cleanly in Cost Explorer and the IAM role name stays predictable.
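The monitoring advice above (watching for failed backup jobs through EventBridge) could be wired up roughly as follows. This is a sketch outside the module: the topic and rule names are hypothetical, and the event pattern assumes AWS Backup emits "Backup Job State Change" events with a state field, so check the pattern against the current AWS Backup event reference before relying on it.

```hcl
# Hypothetical SNS topic that receives backup failure notifications.
resource "aws_sns_topic" "backup_alerts" {
  name = "backup-job-alerts"
}

# Match AWS Backup job state changes that mean a job did not complete.
resource "aws_cloudwatch_event_rule" "backup_failed" {
  name        = "backup-job-failed"
  description = "AWS Backup jobs that failed, were aborted, or expired"

  event_pattern = jsonencode({
    source        = ["aws.backup"]
    "detail-type" = ["Backup Job State Change"]
    detail = {
      state = ["FAILED", "ABORTED", "EXPIRED"]
    }
  })
}

# Send matching events to the SNS topic.
resource "aws_cloudwatch_event_target" "backup_failed_sns" {
  rule = aws_cloudwatch_event_rule.backup_failed.name
  arn  = aws_sns_topic.backup_alerts.arn
}

# Allow EventBridge to publish to the topic.
data "aws_iam_policy_document" "backup_alerts" {
  statement {
    effect    = "Allow"
    actions   = ["sns:Publish"]
    resources = [aws_sns_topic.backup_alerts.arn]

    principals {
      type        = "Service"
      identifiers = ["events.amazonaws.com"]
    }
  }
}

resource "aws_sns_topic_policy" "backup_alerts" {
  arn    = aws_sns_topic.backup_alerts.arn
  policy = data.aws_iam_policy_document.backup_alerts.json
}
```

Subscribing an on-call channel to the topic turns a missed completion window into an alert rather than something discovered at restore time.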