Terraform Module: AWS CloudHSM — FIPS 140-2 Level 3 key custody as repeatable code

Quick take — Provision a single-tenant AWS CloudHSM v2 cluster with Terraform: VPC-pinned HSM instances, FIPS 140-2 Level 3 backups, retention windows and security-group wiring as a reusable module. New here? Jump to the Quickstart below to deploy it in minutes; read on for how it works and when to reach for it.

Quickstart (copy-paste)

Minimal, runnable configuration — drop this in a .tf file and fill in the "..." placeholders (each required input is commented):

provider "aws" {
  region = "us-east-1"
}

module "cloudhsm" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-cloudhsm?ref=v1.0.0"

  cluster_name = "..."           # Logical name; drives tags and SG name prefix (3–40 char…
  vpc_id       = "..."           # VPC for the cluster security group.
  subnet_ids   = ["...", "..."]  # Subnets the cluster spans; ≥2 in different AZs for HA (…
}

Then terraform init && terraform apply. Every other input has a sensible default — see Inputs below to override behaviour.

What this module is

AWS CloudHSM gives you single-tenant, dedicated Hardware Security Modules validated to FIPS 140-2 Level 3, running inside your own VPC. Unlike AWS KMS (a multi-tenant managed service), CloudHSM hands you exclusive control of the key material: AWS never holds your crypto user credentials, and AWS cannot recover keys if you lose them. Typical uses are SSL/TLS offload, transparent database encryption, custom CA signing, code signing, and PKCS#11 / JCE / KSP workloads that have a regulatory mandate for customer-controlled, tamper-resistant key storage.

The raw moving parts are fiddly: an aws_cloudhsm_v2_cluster is created in UNINITIALIZED state, then one or more aws_cloudhsm_v2_hsm instances are added across subnets in different Availability Zones, then you must initialize the cluster with a certificate signed by your own self-signed CA and activate it out-of-band with the CloudHSM CLI. The cluster also drops an ENI per HSM into your subnets, so the security group and subnet placement have to be right or the client simply cannot reach the device on TCP 2223–2225.

This module wraps that into a single, variable-driven unit. It creates the cluster, fans HSM instances out across the AZs you give it, manages a dedicated security group with the required self-referencing ingress, and exposes the cluster ID, state, security-group ID and the cluster CSR so a downstream automation step can sign and initialize. The backup to restore from and the intended backup retention are inputs too, although the retention value is recorded as a tag on the cluster rather than enforced as a backup policy.
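
To see how the round-robin fan-out behaves, here is a minimal standalone sketch of the same modulo expression the module uses for HSM placement. The subnet IDs and the example_* names are hypothetical, chosen only for illustration.

```hcl
# Standalone illustration; needs no provider and creates no resources.
locals {
  example_subnet_ids = ["subnet-aaa", "subnet-bbb"] # hypothetical IDs
  example_hsm_count  = 3

  # HSM i lands in subnet (i mod number_of_subnets).
  example_placement = [
    for i in range(local.example_hsm_count) :
    local.example_subnet_ids[i % length(local.example_subnet_ids)]
  ]
}

output "example_placement" {
  value = local.example_placement
}
```

With three HSMs over two subnets, the placement comes out as subnet-aaa, subnet-bbb, subnet-aaa: the third instance wraps back to the first subnet, so an instance count above the subnet count stacks extra HSMs in an AZ that already has one.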

When to use it

Reach for plain AWS KMS instead if you don't have a single-tenancy mandate: CloudHSM bills per HSM instance per hour and needs at least two HSMs for production HA, so it is materially more expensive and more operationally involved.

Module structure

terraform-module-aws-cloudhsm/
├── versions.tf
├── main.tf
├── variables.tf
└── outputs.tf
# versions.tf
terraform {
  required_version = ">= 1.5.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}
# main.tf
locals {
  # An HSM lives in a single AZ; spread instances across the supplied
  # subnets (which must be in different AZs) round-robin for HA.
  hsm_subnet_ids = [
    for i in range(var.hsm_instance_count) :
    var.subnet_ids[i % length(var.subnet_ids)]
  ]

  base_tags = merge(
    {
      Name      = var.cluster_name
      ManagedBy = "terraform"
      Module    = "terraform-module-aws-cloudhsm"
    },
    var.tags
  )
}

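# Dedicated security group for the cluster ENIs.
# CloudHSM clients talk to the HSM ENIs on TCP 2223-2225; the HSMs in a
# cluster must also reach each other, hence the self-referencing rule.
# NOTE: CloudHSM also creates its own cluster security group (exported as
# aws_cloudhsm_v2_cluster.this.security_group_id); confirm which group the
# HSM ENIs actually carry before relying on the rules below.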
resource "aws_security_group" "this" {
  name_prefix = "${var.cluster_name}-cloudhsm-"
  description = "CloudHSM ${var.cluster_name} cluster + client access"
  vpc_id      = var.vpc_id

  tags = local.base_tags

  lifecycle {
    create_before_destroy = true
  }
}

# HSM-to-HSM intra-cluster traffic.
resource "aws_vpc_security_group_ingress_rule" "cluster_self" {
  security_group_id            = aws_security_group.this.id
  description                  = "Intra-cluster HSM traffic"
  ip_protocol                  = "tcp"
  from_port                    = 2223
  to_port                      = 2225
  referenced_security_group_id = aws_security_group.this.id
}

# Client (EC2 / EKS) access to the HSM ENIs.
resource "aws_vpc_security_group_ingress_rule" "client" {
  for_each = toset(var.client_cidr_blocks)

  security_group_id = aws_security_group.this.id
  description       = "CloudHSM client access from ${each.value}"
  ip_protocol       = "tcp"
  from_port         = 2223
  to_port           = 2225
  cidr_ipv4         = each.value
}

resource "aws_vpc_security_group_egress_rule" "all" {
  security_group_id = aws_security_group.this.id
  description       = "Allow all egress"
  ip_protocol       = "-1"
  cidr_ipv4         = "0.0.0.0/0"
}

# The cluster itself. Created UNINITIALIZED; you initialise + activate
# it out-of-band with the CloudHSM CLI using the exported CSR.
resource "aws_cloudhsm_v2_cluster" "this" {
  hsm_type   = var.hsm_type
  subnet_ids = var.subnet_ids

  # Restore from an existing backup instead of a fresh cluster (DR / clone).
  source_backup_identifier = var.source_backup_identifier

  # No backup-retention argument is set on this resource; the intended
  # retention (7-379 days) is only recorded in the tag below.

  tags = merge(local.base_tags, {
    BackupRetentionDays = tostring(var.backup_retention_days)
  })
}

# Spread N HSM instances across the AZs of the supplied subnets.
resource "aws_cloudhsm_v2_hsm" "this" {
  count = var.hsm_instance_count

  cluster_id = aws_cloudhsm_v2_cluster.this.cluster_id
  subnet_id  = local.hsm_subnet_ids[count.index]
}
# variables.tf
variable "cluster_name" {
  description = "Logical name for the cluster; used for tags and the security group name prefix."
  type        = string

  validation {
    condition     = can(regex("^[a-zA-Z0-9-]{3,40}$", var.cluster_name))
    error_message = "cluster_name must be 3-40 chars: letters, numbers and hyphens only."
  }
}

variable "vpc_id" {
  description = "VPC in which to create the cluster security group."
  type        = string
}

variable "subnet_ids" {
  description = "Subnet IDs the cluster spans. Provide at least two, each in a DIFFERENT Availability Zone, for production HA."
  type        = list(string)

  validation {
    condition     = length(var.subnet_ids) >= 1 && length(var.subnet_ids) <= 10
    error_message = "Provide between 1 and 10 subnet IDs (>= 2 recommended for HA)."
  }
}

variable "hsm_type" {
  description = "HSM hardware type. hsm1.medium (FIPS 140-2 L3) or hsm2m.medium (FIPS 140-3 L3, larger key store)."
  type        = string
  default     = "hsm1.medium"

  validation {
    condition     = contains(["hsm1.medium", "hsm2m.medium"], var.hsm_type)
    error_message = "hsm_type must be one of: hsm1.medium, hsm2m.medium."
  }
}

variable "hsm_instance_count" {
  description = "Number of HSM instances to provision. 1 is single-AZ (non-HA); 2+ recommended for production."
  type        = number
  default     = 2

  validation {
    condition     = var.hsm_instance_count >= 1 && var.hsm_instance_count <= 28
    error_message = "hsm_instance_count must be between 1 and 28."
  }
}

variable "backup_retention_days" {
  description = "How long FIPS-encrypted automatic backups are retained (7-379 days)."
  type        = number
  default     = 90

  validation {
    condition     = var.backup_retention_days >= 7 && var.backup_retention_days <= 379
    error_message = "backup_retention_days must be between 7 and 379."
  }
}

variable "source_backup_identifier" {
  description = "Optional backup ID to restore the cluster from (DR or clone). Null creates a fresh cluster."
  type        = string
  default     = null
}

variable "client_cidr_blocks" {
  description = "CIDR ranges allowed to reach the HSM ENIs on TCP 2223-2225 (e.g. your EC2/EKS subnets)."
  type        = list(string)
  default     = []

  validation {
    condition     = alltrue([for c in var.client_cidr_blocks : can(cidrhost(c, 0))])
    error_message = "Every entry in client_cidr_blocks must be a valid IPv4 CIDR."
  }
}

variable "tags" {
  description = "Additional tags merged onto every resource."
  type        = map(string)
  default     = {}
}
# outputs.tf
output "cluster_id" {
  description = "The CloudHSM cluster ID (e.g. cluster-abcd1234)."
  value       = aws_cloudhsm_v2_cluster.this.cluster_id
}

output "cluster_name" {
  description = "Logical cluster name."
  value       = var.cluster_name
}

output "cluster_state" {
  description = "Cluster lifecycle state (UNINITIALIZED, INITIALIZED, ACTIVE, ...)."
  value       = aws_cloudhsm_v2_cluster.this.cluster_state
}

output "cluster_csr" {
  description = "The cluster CSR. Sign this with your CA, then initialise + activate the cluster out-of-band."
  value       = aws_cloudhsm_v2_cluster.this.cluster_certificates[0].cluster_csr
}

output "security_group_id" {
  description = "Security group attached to the cluster ENIs; attach this to client EC2/EKS nodes too."
  value       = aws_security_group.this.id
}

output "vpc_id" {
  description = "VPC the cluster ENIs live in."
  value       = aws_cloudhsm_v2_cluster.this.vpc_id
}

output "hsm_ip_addresses" {
  description = "Private ENI IP of each HSM instance — feed these to the CloudHSM client config."
  value       = [for h in aws_cloudhsm_v2_hsm.this : h.ip_address]
}

output "hsm_ids" {
  description = "IDs of the provisioned HSM instances."
  value       = [for h in aws_cloudhsm_v2_hsm.this : h.hsm_id]
}

How to use it

module "cloudhsm" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-cloudhsm?ref=v1.0.0"

  cluster_name = "payments-signing-prod"
  vpc_id       = module.network.vpc_id

  # Two private subnets in two different AZs => HA across AZs.
  subnet_ids = [
    module.network.private_subnet_ids["az-a"],
    module.network.private_subnet_ids["az-b"],
  ]

  hsm_type              = "hsm2m.medium"
  hsm_instance_count    = 2
  backup_retention_days = 180

  # Only the payments app subnets may reach the HSMs.
  client_cidr_blocks = [
    module.network.app_subnet_cidr["az-a"],
    module.network.app_subnet_cidr["az-b"],
  ]

  tags = {
    Environment = "prod"
    CostCentre  = "payments"
    Compliance  = "PCI-DSS"
  }
}

# Downstream: attach the cluster SG to the EC2 client that runs the
# CloudHSM client daemon / PKCS#11 library, so it can reach the HSM ENIs.
resource "aws_instance" "hsm_client" {
  ami                    = data.aws_ami.al2023.id
  instance_type          = "c6i.large"
  subnet_id              = module.network.app_subnet_ids["az-a"]
  vpc_security_group_ids = [module.cloudhsm.security_group_id]

  tags = { Name = "payments-hsm-client" }
}

# Downstream: surface the cluster ID + CSR for the activation pipeline.
output "cloudhsm_cluster_id" {
  value = module.cloudhsm.cluster_id
}

output "cloudhsm_csr" {
  description = "Hand to the activation job that signs the cert and runs the CloudHSM CLI."
  value       = module.cloudhsm.cluster_csr
}

After apply, the cluster is UNINITIALIZED. Sign the exported cluster_csr with your own CA, run aws cloudhsmv2 initialize-cluster with the signed certificate and your CA certificate as the trust anchor, then use the CloudHSM CLI on the client instance to activate the cluster (which sets the first admin, or crypto officer, password) and create the first crypto user. Terraform deliberately stops at the hardware boundary: key custody never passes through state.
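
If a pipeline needs to know whether those out-of-band steps are still pending, one option is to derive a flag from the module's cluster_state output and place it next to the module block above. This is a sketch, not part of the module: the output name cloudhsm_needs_initialization is an assumption, and the value reflects only what Terraform saw at its last refresh.

```hcl
# Hypothetical root-module output; stays true until the cluster has been
# initialised out-of-band and Terraform has refreshed its state.
output "cloudhsm_needs_initialization" {
  description = "True while the cluster still reports UNINITIALIZED."
  value       = module.cloudhsm.cluster_state == "UNINITIALIZED"
}
```

An activation job could read this flag with terraform output and skip signing when it is false.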

With Terragrunt

Terragrunt keeps this module DRY across environments — define the backend and provider once in a root config, then a thin terragrunt.hcl per environment supplies only the inputs that differ.

1. Root config: live/terragrunt.hcl (inherited by every module):

remote_state {
  backend = "s3"
  generate = { path = "backend.tf", if_exists = "overwrite" }
  config = {
    # ...s3 state bucket/container + key per path...
  }
}

2. Module config: live/prod/cloudhsm/terragrunt.hcl:

include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-cloudhsm?ref=v1.0.0"
}

inputs = {
  cluster_name = "..."
  vpc_id       = "..."
  subnet_ids   = ["...", "..."]
}

3. Deploy one environment, or roll out all modules together:

(cd live/prod/cloudhsm && terragrunt apply)   # this module only
(cd live/prod && terragrunt run-all apply)    # every module under live/prod

Why Terragrunt here: the backend and provider live in one place instead of being copy-pasted into every module; inputs is overridden per environment (dev / stage / prod) without forking the module; and run-all orchestrates dependencies across modules. Reach for it once you have more than one environment or more than a handful of modules — for a single stack, the plain Quickstart above is enough.
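
As an illustration of the per-environment override and cross-module dependency points, a dev environment might look like the sketch below. The path live/dev/cloudhsm/terragrunt.hcl, the sibling network unit and its output names (vpc_id, and private_subnet_ids assumed to be a list of strings) are all hypothetical.

```hcl
# live/dev/cloudhsm/terragrunt.hcl (hypothetical path)
include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-cloudhsm?ref=v1.0.0"
}

# Assumes a sibling unit at live/dev/network exposing these outputs.
# run-all uses this block to apply the network unit first.
dependency "network" {
  config_path = "../network"
}

inputs = {
  cluster_name = "payments-signing-dev"
  vpc_id       = dependency.network.outputs.vpc_id
  subnet_ids   = dependency.network.outputs.private_subnet_ids

  # Dev trades HA for cost: one HSM and the shortest allowed retention.
  hsm_instance_count    = 1
  backup_retention_days = 7
}
```

Only the inputs differ from the prod config; the module source and the inherited root config stay the same.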

Inputs

| Name | Type | Default | Required | Description |
| --- | --- | --- | --- | --- |
| cluster_name | string | | Yes | Logical name; drives tags and SG name prefix (3–40 chars, [a-zA-Z0-9-]). |
| vpc_id | string | | Yes | VPC for the cluster security group. |
| subnet_ids | list(string) | | Yes | Subnets the cluster spans; ≥2 in different AZs for HA (1–10 allowed). |
| hsm_type | string | "hsm1.medium" | No | hsm1.medium (FIPS 140-2 L3) or hsm2m.medium (FIPS 140-3 L3). |
| hsm_instance_count | number | 2 | No | Number of HSM instances (1–28); 2+ for production HA. |
| backup_retention_days | number | 90 | No | FIPS-encrypted backup retention, 7–379 days. |
| source_backup_identifier | string | null | No | Restore the cluster from this backup ID (DR/clone) instead of fresh. |
| client_cidr_blocks | list(string) | [] | No | CIDRs allowed to the HSM ENIs on TCP 2223–2225. |
| tags | map(string) | {} | No | Extra tags merged onto every resource. |

Outputs

| Name | Description |
| --- | --- |
| cluster_id | The CloudHSM cluster ID (e.g. cluster-abcd1234). |
| cluster_name | Logical cluster name. |
| cluster_state | Lifecycle state (UNINITIALIZED, INITIALIZED, ACTIVE, ...). |
| cluster_csr | Cluster CSR to sign and use for initialise/activate. |
| security_group_id | SG on the cluster ENIs; attach to client nodes too. |
| vpc_id | VPC hosting the cluster ENIs. |
| hsm_ip_addresses | Private ENI IP of each HSM (for client config). |
| hsm_ids | IDs of the provisioned HSM instances. |
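
One way to hand hsm_ip_addresses to client hosts is to publish them in SSM Parameter Store, where a bootstrap script can read them. The sketch below assumes the module "cloudhsm" block from the usage example; the parameter name is hypothetical.

```hcl
# Hypothetical parameter name; adjust to your own naming scheme.
resource "aws_ssm_parameter" "hsm_ips" {
  name = "/cloudhsm/${module.cloudhsm.cluster_name}/hsm-ips"
  type = "StringList"

  # StringList values are stored as one comma-separated string.
  value = join(",", module.cloudhsm.hsm_ip_addresses)
}
```

Because the module requires at least one HSM instance, the joined value is never empty.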

Enterprise scenario

A regional payments processor runs a card-issuance and PIN-translation platform under PCI-PIN and PCI-DSS. Their auditors will not accept multi-tenant KMS for the issuer master keys, so each environment gets one payments-signing-prod cluster from this module — two hsm2m.medium instances split across eu-west-1a and eu-west-1b, 180-day FIPS backup retention, and ingress locked to just the two PCI app subnets. Because the topology, retention and SG rules are codified, the platform team stamps an identical, evidence-ready cluster into the DR region by re-using the module with a source_backup_identifier, and the whole config is the artefact they hand to the QSA each year.
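
The DR stamp described above could look roughly like this. It is a sketch under several assumptions: eu-central-1 as the DR region (the scenario does not name one), a hypothetical network_dr module whose private_subnet_ids output is a list, and a hypothetical dr_source_backup_id variable holding a backup that already exists in that region.

```hcl
# Assumption: eu-central-1 stands in for the unnamed DR region.
provider "aws" {
  alias  = "dr"
  region = "eu-central-1"
}

# Hypothetical variable: ID of a backup already present in the DR
# region, e.g. copied with `aws cloudhsmv2 copy-backup-to-region`.
variable "dr_source_backup_id" {
  type = string
}

module "cloudhsm_dr" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-cloudhsm?ref=v1.0.0"

  # Route every resource in the module to the DR region.
  providers = {
    aws = aws.dr
  }

  cluster_name = "payments-signing-dr"
  vpc_id       = module.network_dr.vpc_id             # hypothetical module
  subnet_ids   = module.network_dr.private_subnet_ids # assumed list(string)

  hsm_type              = "hsm2m.medium"
  hsm_instance_count    = 2
  backup_retention_days = 180

  source_backup_identifier = var.dr_source_backup_id
}
```

Keeping every other input identical to the primary cluster is what makes the two configurations comparable as audit evidence.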

Best practices
