
Terraform Module: AWS Backup — centralized, policy-driven backups with cross-region copies

Quick take: a reusable Terraform module (hashicorp/aws ~> 5.0) that provisions an AWS Backup vault, plan, rules, and resource selection, with KMS encryption, cross-region copy, and a least-privilege service role for compliant, automated backups. New here? Jump to the Quickstart below to deploy it in minutes; read on for how it works and when to reach for it.

Quickstart (copy-paste)

Minimal, runnable configuration — drop this in a .tf file and fill in the "..." placeholders (each required input is commented):

provider "aws" {
  region = "us-east-1"
}

module "backup" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-backup?ref=v1.0.0"

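  vault_name = "..." # Name of the backup vault and prefix for the IAM role (2…
  plan_name  = "..." # Name of the backup plan.

  # Backup rules (schedule, retention, lifecycle, optional …
  # Each rule is an object; rule_name and schedule are the required keys.
  rules = [
    { rule_name = "...", schedule = "..." },
  ]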
}

Then terraform init && terraform apply. Every other input has a sensible default — see Inputs below to override behaviour.

What this module is

AWS Backup is a fully managed, policy-based service that centralizes and automates data protection across services such as EBS, RDS, DynamoDB, EFS, Aurora, FSx, and Storage Gateway. Instead of writing per-service backup logic, you define a backup plan — a set of scheduled rules that say when to back up, how long to retain each recovery point, where to store it, and whether to copy it to another region or account for disaster recovery.

The core resource here is aws_backup_plan, but a plan on its own does nothing useful. In production it always travels with three companions: an aws_backup_vault (the encrypted, access-controlled store for recovery points), an aws_backup_selection (which tags or ARNs the plan actually protects), and an IAM role that AWS Backup assumes to read source data and write recovery points. Wrapping all four in a module gives every team the same encrypted-by-default, tag-driven, cross-region-capable backup posture from a single module block — instead of each squad hand-rolling cron schedules and forgetting to set retention or a copy destination.

This module creates a KMS-encrypted vault (customer-managed when kms_key_arn is supplied, otherwise the AWS-managed key), a plan with one or more rules, an optional vault lock for ransomware/compliance protection, optional cross-region copy, and a least-privilege service role wired to the AWS-managed backup policies. Selection is tag-driven: once selection_tags is set, onboarding a new resource is as simple as adding a tag.
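To make the tag-driven onboarding concrete, here is a minimal sketch of a workload resource that a plan would pick up, assuming the module is called with selection_tags = { backup = "daily" }. The DynamoDB table and its names are hypothetical; only the tag matters to the module.

```hcl
# Hypothetical workload resource owned by an application team.
resource "aws_dynamodb_table" "orders" {
  name         = "orders"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "order_id"

  attribute {
    name = "order_id"
    type = "S"
  }

  tags = {
    # Matches selection_tags = { backup = "daily" } on the module call,
    # so the table is included in the plan's selection.
    backup = "daily"
  }
}
```

Removing the tag takes the resource out of future scheduled backups without touching the backup stack itself.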

When to use it

Reach for native per-service snapshots only when you need sub-hour RPOs or app-consistent quiescing that AWS Backup does not yet cover for your engine; otherwise this module is the lower-toil default.

Module structure

terraform-module-aws-backup/
├── versions.tf
├── main.tf
├── variables.tf
└── outputs.tf

versions.tf

terraform {
  required_version = ">= 1.5.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

main.tf

locals {
  # Tag-based selection conditions: { backup = "daily" } => one StringEquals condition.
  selection_tags = [
    for k, v in var.selection_tags : {
      type  = "STRINGEQUALS"
      key   = k
      value = v
    }
  ]
}

# ---------------------------------------------------------------------------
# Backup vault (customer-managed KMS, optional Vault Lock)
# ---------------------------------------------------------------------------
resource "aws_backup_vault" "this" {
  name          = var.vault_name
  kms_key_arn   = var.kms_key_arn
  force_destroy = var.force_destroy

  tags = merge(var.tags, { Name = var.vault_name })
}

resource "aws_backup_vault_lock_configuration" "this" {
  count = var.enable_vault_lock ? 1 : 0

  backup_vault_name   = aws_backup_vault.this.name
  changeable_for_days = var.vault_lock_changeable_for_days
  min_retention_days  = var.vault_lock_min_retention_days
  max_retention_days  = var.vault_lock_max_retention_days
}

# ---------------------------------------------------------------------------
# IAM service role assumed by AWS Backup
# ---------------------------------------------------------------------------
data "aws_iam_policy_document" "assume" {
  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRole"]

    principals {
      type        = "Service"
      identifiers = ["backup.amazonaws.com"]
    }
  }
}

resource "aws_iam_role" "backup" {
  name                 = "${var.vault_name}-backup-role"
  assume_role_policy   = data.aws_iam_policy_document.assume.json
  permissions_boundary = var.permissions_boundary_arn

  tags = var.tags
}

resource "aws_iam_role_policy_attachment" "backup" {
  role       = aws_iam_role.backup.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AWSBackupServiceRolePolicyForBackup"
}

resource "aws_iam_role_policy_attachment" "restore" {
  count = var.attach_restore_policy ? 1 : 0

  role       = aws_iam_role.backup.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AWSBackupServiceRolePolicyForRestores"
}

# ---------------------------------------------------------------------------
# Backup plan + rules
# ---------------------------------------------------------------------------
resource "aws_backup_plan" "this" {
  name = var.plan_name

  dynamic "rule" {
    for_each = var.rules
    content {
      rule_name                = rule.value.rule_name
      target_vault_name        = aws_backup_vault.this.name
      schedule                 = rule.value.schedule
      start_window             = rule.value.start_window
      completion_window        = rule.value.completion_window
      enable_continuous_backup = try(rule.value.enable_continuous_backup, false)

      lifecycle {
        cold_storage_after = rule.value.cold_storage_after
        delete_after       = rule.value.delete_after
      }

      # Optional cross-region (and/or cross-account) copy of recovery points.
      dynamic "copy_action" {
        for_each = try(rule.value.copy_action, null) == null ? [] : [rule.value.copy_action]
        content {
          destination_vault_arn = copy_action.value.destination_vault_arn

          lifecycle {
            cold_storage_after = try(copy_action.value.cold_storage_after, null)
            delete_after       = try(copy_action.value.delete_after, null)
          }
        }
      }

      recovery_point_tags = merge(var.tags, try(rule.value.recovery_point_tags, {}))
    }
  }

  dynamic "advanced_backup_setting" {
    for_each = var.enable_windows_vss ? [1] : []
    content {
      backup_options = { WindowsVSS = "enabled" }
      resource_type  = "EC2"
    }
  }

  tags = var.tags
}

# ---------------------------------------------------------------------------
# Resource selection (tag-based and/or explicit ARNs)
# ---------------------------------------------------------------------------
resource "aws_backup_selection" "this" {
  name         = "${var.plan_name}-selection"
  plan_id      = aws_backup_plan.this.id
  iam_role_arn = aws_iam_role.backup.arn

  resources     = var.selection_resources
  not_resources = var.selection_not_resources

  dynamic "selection_tag" {
    for_each = local.selection_tags
    content {
      type  = selection_tag.value.type
      key   = selection_tag.value.key
      value = selection_tag.value.value
    }
  }
}
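AWS Backup evaluates multiple selection_tag entries as alternatives: a resource matching any one of them is selected. Where every tag must match, the provider's condition block on aws_backup_selection is one option. The following is a minimal sketch kept outside the module, assuming the module is instantiated as module "backup"; the selection name and the Environment tag are hypothetical.

```hcl
# Extra selection attached to the module's plan that requires BOTH tags.
resource "aws_backup_selection" "all_tags" {
  name         = "prod-app-all-tags" # hypothetical name
  plan_id      = module.backup.plan_id
  iam_role_arn = module.backup.backup_role_arn

  # Start from every supported resource, then narrow with conditions.
  resources = ["*"]

  condition {
    # Conditions inside one block are combined with AND.
    string_equals {
      key   = "aws:ResourceTag/backup"
      value = "daily"
    }
    string_equals {
      key   = "aws:ResourceTag/Environment"
      value = "prod"
    }
  }
}
```

Note that condition keys use the aws:ResourceTag/ prefix, while selection_tag takes the bare tag key.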

variables.tf

variable "vault_name" {
  description = "Name of the backup vault (and prefix for the IAM role)."
  type        = string

  validation {
    condition     = can(regex("^[A-Za-z0-9_-]{2,50}$", var.vault_name))
    error_message = "vault_name must be 2-50 chars: letters, numbers, hyphen, underscore."
  }
}

variable "plan_name" {
  description = "Name of the backup plan."
  type        = string
}

variable "kms_key_arn" {
  description = "ARN of the customer-managed KMS key used to encrypt the vault. Strongly recommended over the AWS-managed key."
  type        = string
  default     = null
}

variable "force_destroy" {
  description = "Allow Terraform to delete the vault even if it still contains recovery points. Keep false in production."
  type        = bool
  default     = false
}

variable "rules" {
  description = <<-EOT
    List of backup rules. Each rule:
      rule_name                (string, required)
      schedule                 (string, required) cron in UTC, e.g. "cron(0 5 * * ? *)"
      start_window             (number) minutes after the scheduled time within which the job must start
      completion_window        (number) minutes the job has to finish
      cold_storage_after       (number) days before transition to cold storage (delete_after, if set, must be >= cold_storage_after + 90)
      delete_after             (number) retention in days before the recovery point is deleted
      enable_continuous_backup (bool)   point-in-time recovery (RDS/Aurora/S3)
      copy_action = { destination_vault_arn, cold_storage_after, delete_after } (optional)
      recovery_point_tags      (map(string), optional)
  EOT
  type = list(object({
    rule_name                = string
    schedule                 = string
    start_window             = optional(number, 60)
    completion_window        = optional(number, 360)
    cold_storage_after       = optional(number)
    delete_after             = optional(number)
    enable_continuous_backup = optional(bool, false)
    copy_action = optional(object({
      destination_vault_arn = string
      cold_storage_after    = optional(number)
      delete_after          = optional(number)
    }))
    recovery_point_tags = optional(map(string), {})
  }))

  validation {
    condition     = length(var.rules) > 0
    error_message = "At least one backup rule is required."
  }

  validation {
    # If cold_storage_after and delete_after are both set, delete must be >= cold + 90 days (AWS constraint).
    condition = alltrue([
      for r in var.rules :
      r.cold_storage_after == null || r.delete_after == null ? true : r.delete_after >= r.cold_storage_after + 90
    ])
    error_message = "delete_after must be at least cold_storage_after + 90 days when both are set."
  }
}

variable "selection_tags" {
  description = "Map of tag key/value pairs; a resource carrying ANY of these tags is included in the plan (AWS Backup ORs multiple selection_tag conditions)."
  type        = map(string)
  default     = {}
}

variable "selection_resources" {
  description = "Explicit resource ARNs to include. Use [\"*\"] to select all supported resources, or [] when using tags only."
  type        = list(string)
  default     = []
}

variable "selection_not_resources" {
  description = "Resource ARNs to explicitly exclude from selection."
  type        = list(string)
  default     = []
}

variable "attach_restore_policy" {
  description = "Attach AWSBackupServiceRolePolicyForRestores so the role can also perform restores."
  type        = bool
  default     = true
}

variable "permissions_boundary_arn" {
  description = "Optional IAM permissions boundary ARN applied to the backup service role."
  type        = string
  default     = null
}

variable "enable_vault_lock" {
  description = "Enable Backup Vault Lock (WORM) to protect recovery points from early deletion."
  type        = bool
  default     = false
}

variable "vault_lock_changeable_for_days" {
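  description = "Grace period (days) during which the vault lock can still be changed/removed. Must be >= 3; setting it puts the lock in compliance mode, which becomes immutable when the period ends. Pass null for governance mode."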
  type        = number
  default     = 3
}

variable "vault_lock_min_retention_days" {
  description = "Minimum retention enforced by the vault lock, in days."
  type        = number
  default     = 7
}

variable "vault_lock_max_retention_days" {
  description = "Maximum retention enforced by the vault lock, in days."
  type        = number
  default     = 36500
}

variable "enable_windows_vss" {
  description = "Enable Windows VSS application-consistent backups for EC2 instances."
  type        = bool
  default     = false
}

variable "tags" {
  description = "Tags applied to all created resources."
  type        = map(string)
  default     = {}
}
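As a worked example of the lifecycle validation on rules in variables.tf above: with cold_storage_after = 30, the smallest delete_after that passes is 30 + 90 = 120. This is a sketch only; the relative source path, the names, and the weekly schedule are assumptions for illustration.

```hcl
module "backup_archive" {
  source = "../terraform-module-aws-backup" # hypothetical local checkout

  vault_name     = "archive-example"      # hypothetical
  plan_name      = "archive-example-plan" # hypothetical
  selection_tags = { backup = "weekly" }

  rules = [
    {
      rule_name          = "weekly-archive"
      schedule           = "cron(0 4 ? * SUN *)" # 04:00 UTC every Sunday
      cold_storage_after = 30
      delete_after       = 120 # 30 + 90: passes the validation

      # delete_after = 100 would fail at plan time with:
      # "delete_after must be at least cold_storage_after + 90 days when both are set."
    }
  ]
}
```

A rule that sets only one of the two values skips the check, because the validation returns true as soon as either is null.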

outputs.tf

output "vault_id" {
  description = "Name/ID of the backup vault."
  value       = aws_backup_vault.this.id
}

output "vault_arn" {
  description = "ARN of the backup vault (use as a copy_action destination from other regions)."
  value       = aws_backup_vault.this.arn
}

output "vault_recovery_points" {
  description = "Number of recovery points currently stored in the vault."
  value       = aws_backup_vault.this.recovery_points
}

output "plan_id" {
  description = "ID of the backup plan."
  value       = aws_backup_plan.this.id
}

output "plan_arn" {
  description = "ARN of the backup plan."
  value       = aws_backup_plan.this.arn
}

output "plan_version" {
  description = "Unique version ID of the backup plan (changes on every update)."
  value       = aws_backup_plan.this.version
}

output "selection_id" {
  description = "ID of the backup selection."
  value       = aws_backup_selection.this.id
}

output "backup_role_arn" {
  description = "ARN of the IAM role AWS Backup assumes for backup and restore jobs."
  value       = aws_iam_role.backup.arn
}
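The vault_arn output is described as a copy_action destination for other regions. One way that could look is sketched below, assuming the primary instance is declared as module "backup" and a second instance runs in another region. The provider alias, region, and names are hypothetical.

```hcl
# Hypothetical second region that copies its recovery points into the primary vault.
provider "aws" {
  alias  = "west"
  region = "us-west-2" # assumption: any region other than the primary's
}

module "backup_west" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-backup?ref=v1.0.0"

  providers = {
    aws = aws.west
  }

  vault_name     = "prod-app-west"      # hypothetical; also keeps the IAM role name unique
  plan_name      = "prod-app-west-plan" # hypothetical
  selection_tags = { backup = "daily" }

  rules = [
    {
      rule_name    = "daily-35d"
      schedule     = "cron(0 5 * * ? *)"
      delete_after = 35

      copy_action = {
        # vault_arn output of the primary-region module instance
        destination_vault_arn = module.backup.vault_arn
        delete_after          = 14
      }
    }
  ]
}
```

Because the IAM role name is derived from vault_name and IAM is global, the two instances need different vault names.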

How to use it

# A customer-managed key for backup data (or reference an existing one).
resource "aws_kms_key" "backup" {
  description             = "CMK for AWS Backup vault"
  enable_key_rotation     = true
  deletion_window_in_days = 30
}

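# Aliased provider for the DR region (the default provider stays in the primary region).
provider "aws" {
  alias  = "dr"
  region = "eu-west-1"
}

# KMS keys are regional, so the DR vault needs its own key in eu-west-1.
resource "aws_kms_key" "backup_dr" {
  provider                = aws.dr
  description             = "CMK for AWS Backup DR vault"
  enable_key_rotation     = true
  deletion_window_in_days = 30
}

# DR vault in a second region to receive cross-region copies.
resource "aws_backup_vault" "dr" {
  provider    = aws.dr
  name        = "prod-app-dr"
  kms_key_arn = aws_kms_key.backup_dr.arn
}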

module "backup" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-backup?ref=v1.0.0"

  vault_name  = "prod-app"
  plan_name   = "prod-app-plan"
  kms_key_arn = aws_kms_key.backup.arn

  # Tag-driven selection: any resource tagged backup = "daily" is protected.
  selection_tags = {
    backup = "daily"
  }

  rules = [
    {
      rule_name          = "daily-35d"
      schedule           = "cron(0 5 * * ? *)" # 05:00 UTC every day
      start_window       = 60
      completion_window  = 360
      delete_after       = 35

      # Copy each daily recovery point to the DR region, kept 14 days.
      copy_action = {
        destination_vault_arn = aws_backup_vault.dr.arn
        delete_after          = 14
      }
    },
    {
      rule_name          = "monthly-7y"
      schedule           = "cron(0 6 1 * ? *)" # 06:00 UTC on the 1st
      cold_storage_after = 90
      delete_after       = 2555 # ~7 years
    }
  ]

  # Compliance: prevent recovery points being deleted before 7 days.
  # Once the 3-day grace period ends, the lock can no longer be changed or removed.
  enable_vault_lock              = true
  vault_lock_min_retention_days  = 7
  vault_lock_changeable_for_days = 3

  tags = {
    Environment = "prod"
    Team        = "platform"
    CostCenter  = "1042"
  }
}

# Downstream reference: let an operations role start on-demand restore jobs.
# backup:StartRestoreJob is left unscoped here; iam:PassRole is pinned to the
# module's service role via its output ARN.
data "aws_iam_policy_document" "restore_ops" {
  statement {
    effect    = "Allow"
    actions   = ["backup:StartRestoreJob"]
    resources = ["*"]
  }
  statement {
    effect    = "Allow"
    actions   = ["iam:PassRole"]
    resources = [module.backup.backup_role_arn]
  }
}
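The example above enables the vault lock with a grace period, which makes it permanent once that period ends. For a non-production trial, a governance-mode lock (removable at any time by principals with sufficient IAM permissions) may be a safer starting point. The sketch below assumes that passing null for vault_lock_changeable_for_days leaves the argument unset on the lock resource, which is how Terraform treats an explicit null for a nullable variable. The staging names are hypothetical.

```hcl
module "backup_staging" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-backup?ref=v1.0.0"

  vault_name     = "staging-app"      # hypothetical
  plan_name      = "staging-app-plan" # hypothetical
  selection_tags = { backup = "daily" }

  rules = [
    {
      rule_name    = "daily-35d"
      schedule     = "cron(0 5 * * ? *)"
      delete_after = 35
    }
  ]

  enable_vault_lock              = true
  vault_lock_min_retention_days  = 7
  vault_lock_changeable_for_days = null # no grace period set => governance mode
}
```

Once retention settings have been proven in staging, production can switch to a numeric grace period to make the lock immutable.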

With Terragrunt

Terragrunt keeps this module DRY across environments — define the backend and provider once in a root config, then a thin terragrunt.hcl per environment supplies only the inputs that differ.

1. Root config: live/terragrunt.hcl (inherited by every module):

remote_state {
  backend = "s3"
  generate = { path = "backend.tf", if_exists = "overwrite" }
  config = {
    # ...s3 state bucket/container + key per path...
  }
}

2. Module config: live/prod/backup/terragrunt.hcl:

include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-backup?ref=v1.0.0"
}

inputs = {
  vault_name = "..."
  plan_name  = "..."
  rules      = [{ rule_name = "...", schedule = "..." }] # list of rule objects, not strings
}

3. Deploy one environment, or roll out all modules together:

(cd live/prod/backup && terragrunt apply)   # this module only
(cd live/prod && terragrunt run-all apply)  # every module under live/prod

Why Terragrunt here: the backend and provider live in one place instead of being copy-pasted into every module; inputs is overridden per environment (dev / stage / prod) without forking the module; and run-all orchestrates dependencies across modules. Reach for it once you have more than one environment or more than a handful of modules — for a single stack, the plain Quickstart above is enough.
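The root config shown in step 1 contains only remote_state. A provider block can be generated from the same file with Terragrunt's generate block. This is a minimal sketch; the region is an assumption carried over from the Quickstart.

```hcl
# live/terragrunt.hcl (root), alongside the remote_state block
generate "provider" {
  path      = "provider.tf"
  if_exists = "overwrite_terragrunt"
  contents  = <<EOF
provider "aws" {
  region = "us-east-1" # assumption: same region as the Quickstart
}
EOF
}
```

Every child terragrunt.hcl that includes the root then gets an identical provider.tf written into its working directory before Terraform runs.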

Inputs

| Name | Type | Default | Required | Description |
| --- | --- | --- | --- | --- |
| vault_name | string | - | yes | Name of the backup vault and prefix for the IAM role (2-50 chars). |
| plan_name | string | - | yes | Name of the backup plan. |
| kms_key_arn | string | null | no | Customer-managed KMS key ARN encrypting the vault. Recommended. |
| force_destroy | bool | false | no | Allow vault deletion while it still holds recovery points. |
| rules | list(object) | - | yes | Backup rules (schedule, retention, lifecycle, optional copy_action). At least one required. |
| selection_tags | map(string) | {} | no | Tag key/values; resources with ANY of the tags are backed up. |
| selection_resources | list(string) | [] | no | Explicit ARNs to include (use ["*"] for all supported resources). |
| selection_not_resources | list(string) | [] | no | ARNs to explicitly exclude. |
| attach_restore_policy | bool | true | no | Also attach the AWS-managed restore policy to the role. |
| permissions_boundary_arn | string | null | no | IAM permissions boundary for the backup service role. |
| enable_vault_lock | bool | false | no | Enable Backup Vault Lock (WORM) on the vault. |
| vault_lock_changeable_for_days | number | 3 | no | Grace period before the vault lock becomes immutable. |
| vault_lock_min_retention_days | number | 7 | no | Minimum retention enforced by the vault lock. |
| vault_lock_max_retention_days | number | 36500 | no | Maximum retention enforced by the vault lock. |
| enable_windows_vss | bool | false | no | Enable Windows VSS application-consistent EC2 backups. |
| tags | map(string) | {} | no | Tags applied to all created resources. |

Outputs

| Name | Description |
| --- | --- |
| vault_id | Name/ID of the backup vault. |
| vault_arn | ARN of the backup vault (use as a cross-region copy_action destination). |
| vault_recovery_points | Number of recovery points currently stored in the vault. |
| plan_id | ID of the backup plan. |
| plan_arn | ARN of the backup plan. |
| plan_version | Unique version ID of the plan (changes on each update). |
| selection_id | ID of the backup selection. |
| backup_role_arn | ARN of the IAM role AWS Backup assumes for backup/restore jobs. |

Enterprise scenario

A financial-services platform runs ~600 RDS, EBS, and DynamoDB resources across 25 application accounts governed by AWS Organizations. The platform team publishes this module once and onboards each account through a stack that sets selection_tags = { backup = "daily" }, so application squads protect a new database simply by tagging it — no backup PRs required. The daily-35d rule copies every recovery point into a hardened DR vault in eu-west-1, and enable_vault_lock with a 7-day minimum retention satisfies the regulator’s ransomware-resilience control by making recent backups immutable. When an auditor asks “prove these backups cannot be deleted early,” the team points at the single vault-lock configuration and the plan_version output captured in state.

Best practices
