Quick take: provision a single-tenant AWS CloudHSM v2 cluster with Terraform as a reusable module. It covers VPC-pinned HSM instances on FIPS-validated (Level 3) hardware, encrypted backups with a retention setting, and security-group wiring for client access. New here? Jump to the Quickstart below to deploy it in minutes; read on for how it works and when to reach for it.
Quickstart (copy-paste)
Minimal, runnable configuration — drop this in a .tf file and fill in the "..." placeholders (each required input is commented):
```hcl
provider "aws" {
  region = "us-east-1"
}

module "cloudhsm" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-cloudhsm?ref=v1.0.0"

  cluster_name = "..."          # Logical name; drives tags and SG name prefix (3–40 chars, [a-zA-Z0-9-]).
  vpc_id       = "..."          # VPC for the module's security group (the VPC of subnet_ids).
  subnet_ids   = ["...", "..."] # Subnets the cluster spans; ≥2 in different AZs for HA (1–10 allowed).
}
```
Then terraform init && terraform apply. Every other input has a sensible default — see Inputs below to override behaviour.
What this module is
AWS CloudHSM gives you single-tenant, dedicated Hardware Security Modules, validated to FIPS 140-2 or FIPS 140-3 Level 3 depending on the HSM type, running inside your own VPC. Unlike AWS KMS (a multi-tenant managed service), CloudHSM gives you exclusive control of the key material: AWS never holds your crypto user credentials and cannot recover keys if you lose them. Typical workloads are SSL/TLS offload, transparent database encryption, custom CA signing, code signing, and PKCS#11 / JCE / KSP applications under a regulatory mandate for customer-controlled, tamper-resistant key storage.
The raw moving parts are fiddly. An aws_cloudhsm_v2_cluster is created in the UNINITIALIZED state; you then add one or more aws_cloudhsm_v2_hsm instances across subnets in different Availability Zones; finally you sign the cluster's CSR with your own issuing CA (often a self-signed root), initialize the cluster with the signed certificate, and activate it out-of-band with the CloudHSM CLI. Each HSM also places an ENI in your subnets, so the security-group rules and subnet placement have to be right, or clients simply cannot reach the device on TCP 2223–2225.
This module wraps that into a single, variable-driven unit. It creates the cluster, fans HSM instances out across the AZs you give it, wires client access to the HSM ENIs on TCP 2223–2225 (through a dedicated security group plus any client CIDRs you list), and exposes the cluster ID, state, security-group ID and cluster CSR so a downstream automation step can sign and initialize the cluster. Restoring from an encrypted, cross-region-copyable backup is a single input; backup retention is also an input, although the module records it as a cluster tag rather than applying it to the cluster's backup policy.
When to use it
- You have a compliance requirement (PCI-DSS, PCI-PIN/P2PE, FIPS 140-2 L3, eIDAS, GDPR sovereignty) that says key material must live in a single-tenant, customer-controlled HSM — KMS multi-tenancy is not acceptable.
- You are building a private CA, code-signing, or document-signing service and want the signing keys non-exportable in tamper-responsive hardware.
- You need Oracle TDE, SQL Server EKM, or SSL offload backed by PKCS#11 against dedicated HSMs.
- You want CloudHSM-backed KMS custom key stores (the cluster must first be ACTIVE and contain at least two active HSMs in different Availability Zones).
- You are standardising a fleet across accounts/regions and want identical, peer-reviewed cluster topology, backup retention and SG rules instead of hand-built clusters that drift.
Reach for plain AWS KMS instead if you don’t have a single-tenancy mandate — CloudHSM bills per HSM-hour per instance and needs at least two HSMs for production HA, so it is materially more expensive and more operationally involved.
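For the KMS custom key store case, the Terraform side of the wiring might look like the sketch below. It assumes the cluster is already ACTIVE with at least two HSMs in different AZs, that a crypto user named kmsuser exists in the cluster, and that you supply the trust anchor certificate used to sign the cluster CSR. The variable kmsuser_password, the store name and the file name customerCA.crt are hypothetical. Note that key_store_password is persisted in Terraform state, so protect the backend accordingly; connecting the key store to the cluster is a separate step not shown here.

```hcl
# Hypothetical caller-side wiring of a KMS custom key store to the cluster.
variable "kmsuser_password" {
  description = "Password of the kmsuser crypto user in the CloudHSM cluster."
  type        = string
  sensitive   = true
}

resource "aws_kms_custom_key_store" "this" {
  custom_key_store_name = "payments-hsm-keystore" # hypothetical name
  cloud_hsm_cluster_id  = module.cloudhsm.cluster_id
  key_store_password    = var.kmsuser_password

  # The same CA certificate that signed the cluster CSR during initialization.
  trust_anchor_certificate = file("${path.module}/customerCA.crt")
}
```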
Module structure
```
terraform-module-aws-cloudhsm/
├── versions.tf
├── main.tf
├── variables.tf
└── outputs.tf
```
```hcl
# versions.tf
terraform {
  required_version = ">= 1.5.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}
```
```hcl
# main.tf
locals {
  # An HSM lives in a single AZ; spread instances across the supplied
  # subnets (which must be in different AZs) round-robin for HA.
  hsm_subnet_ids = [
    for i in range(var.hsm_instance_count) :
    var.subnet_ids[i % length(var.subnet_ids)]
  ]

  base_tags = merge(
    {
      Name      = var.cluster_name
      ManagedBy = "terraform"
      Module    = "terraform-module-aws-cloudhsm"
    },
    var.tags
  )
}

# The cluster itself. Created UNINITIALIZED; you initialise + activate
# it out-of-band with the CloudHSM CLI using the exported CSR.
resource "aws_cloudhsm_v2_cluster" "this" {
  hsm_type   = var.hsm_type
  subnet_ids = var.subnet_ids

  # Restore from an existing backup instead of a fresh cluster (DR / clone).
  source_backup_identifier = var.source_backup_identifier

  # cluster_certificates is a read-only attribute, so it is not declared here.
  # The module does not set the cluster's backup retention policy; it records
  # the intended value (7-379 days) as a tag.
  tags = merge(local.base_tags, {
    BackupRetentionDays = tostring(var.backup_retention_days)
  })
}

# CloudHSM creates its own security group for the cluster and attaches it to
# every HSM ENI; aws_cloudhsm_v2_cluster has no argument to replace it. That
# group already allows TCP 2223-2225 between its members (HSM-to-HSM traffic).
# This module adds a dedicated client group and opens the cluster group to it.
resource "aws_security_group" "this" {
  name_prefix = "${var.cluster_name}-cloudhsm-"
  description = "CloudHSM ${var.cluster_name} client access"
  vpc_id      = var.vpc_id
  tags        = local.base_tags

  lifecycle {
    create_before_destroy = true
  }
}

# Clients carrying the dedicated group may reach the HSM ENIs.
resource "aws_vpc_security_group_ingress_rule" "from_client_sg" {
  security_group_id            = aws_cloudhsm_v2_cluster.this.security_group_id
  description                  = "CloudHSM client access from ${var.cluster_name} client SG"
  ip_protocol                  = "tcp"
  from_port                    = 2223
  to_port                      = 2225
  referenced_security_group_id = aws_security_group.this.id
}

# Optional CIDR-based client (EC2 / EKS) access to the HSM ENIs.
resource "aws_vpc_security_group_ingress_rule" "client" {
  for_each = toset(var.client_cidr_blocks)

  security_group_id = aws_cloudhsm_v2_cluster.this.security_group_id
  description       = "CloudHSM client access from ${each.value}"
  ip_protocol       = "tcp"
  from_port         = 2223
  to_port           = 2225
  cidr_ipv4         = each.value
}

# Clients need outbound access to the HSM ENIs (and anything else they call).
resource "aws_vpc_security_group_egress_rule" "all" {
  security_group_id = aws_security_group.this.id
  description       = "Allow all egress"
  ip_protocol       = "-1"
  cidr_ipv4         = "0.0.0.0/0"
}

# Spread N HSM instances across the AZs of the supplied subnets.
resource "aws_cloudhsm_v2_hsm" "this" {
  count = var.hsm_instance_count

  cluster_id = aws_cloudhsm_v2_cluster.this.cluster_id
  subnet_id  = local.hsm_subnet_ids[count.index]
}
```
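As written, the module only stamps backup_retention_days onto the cluster as a BackupRetentionDays tag; nothing pushes the value into CloudHSM's own backup retention policy. If you want Terraform to apply it as well, one option is a provisioner that calls the AWS CLI's cloudhsmv2 modify-cluster command. The sketch below assumes AWS CLI v2 is installed wherever Terraform runs, with credentials allowed to modify the cluster; the names backup_retention and current are illustrative.

```hcl
# Hypothetical addition to main.tf: push the retention value into CloudHSM.
data "aws_region" "current" {}

resource "terraform_data" "backup_retention" {
  # Re-run the command when the cluster is replaced or the value changes.
  triggers_replace = [
    aws_cloudhsm_v2_cluster.this.cluster_id,
    var.backup_retention_days,
  ]

  provisioner "local-exec" {
    command = join(" ", [
      "aws cloudhsmv2 modify-cluster",
      "--cluster-id ${aws_cloudhsm_v2_cluster.this.cluster_id}",
      "--backup-retention-policy Type=DAYS,Value=${var.backup_retention_days}",
    ])

    # Target the same region the AWS provider is using.
    environment = {
      AWS_DEFAULT_REGION = data.aws_region.current.name
    }
  }
}
```

A provisioner only runs when its resource is created or replaced, which is why both the cluster ID and the retention value sit in triggers_replace; removing the resource later does not revert the policy.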
```hcl
# variables.tf
variable "cluster_name" {
  description = "Logical name for the cluster; used for tags and the security group name prefix."
  type        = string

  validation {
    condition     = can(regex("^[a-zA-Z0-9-]{3,40}$", var.cluster_name))
    error_message = "cluster_name must be 3-40 chars: letters, numbers and hyphens only."
  }
}

variable "vpc_id" {
  description = "VPC in which to create the module's security group; must be the VPC that contains subnet_ids."
  type        = string
}

variable "subnet_ids" {
  description = "Subnet IDs the cluster spans. Provide at least two, each in a DIFFERENT Availability Zone, for production HA."
  type        = list(string)

  validation {
    condition     = length(var.subnet_ids) >= 1 && length(var.subnet_ids) <= 10
    error_message = "Provide between 1 and 10 subnet IDs (>= 2 recommended for HA)."
  }
}

variable "hsm_type" {
  description = "HSM hardware type. hsm1.medium (FIPS 140-2 L3) or hsm2m.medium (FIPS 140-3 L3, larger key store)."
  type        = string
  default     = "hsm1.medium"

  validation {
    condition     = contains(["hsm1.medium", "hsm2m.medium"], var.hsm_type)
    error_message = "hsm_type must be one of: hsm1.medium, hsm2m.medium."
  }
}

variable "hsm_instance_count" {
  description = "Number of HSM instances to provision. 1 is single-AZ (non-HA); 2+ recommended for production."
  type        = number
  default     = 2

  validation {
    condition     = var.hsm_instance_count >= 1 && var.hsm_instance_count <= 28
    error_message = "hsm_instance_count must be between 1 and 28."
  }
}

variable "backup_retention_days" {
  description = "Intended retention for automatic backups (7-379 days). Recorded as a cluster tag; the module does not set the cluster's retention policy itself."
  type        = number
  default     = 90

  validation {
    condition     = var.backup_retention_days >= 7 && var.backup_retention_days <= 379
    error_message = "backup_retention_days must be between 7 and 379."
  }
}

variable "source_backup_identifier" {
  description = "Optional backup ID to restore the cluster from (DR or clone). Null creates a fresh cluster."
  type        = string
  default     = null
}

variable "client_cidr_blocks" {
  description = "IPv4 CIDR ranges allowed to reach the HSM ENIs on TCP 2223-2225 (e.g. your EC2/EKS subnets)."
  type        = list(string)
  default     = []

  validation {
    # cidrhost() also accepts IPv6 CIDRs, so reject anything containing ":".
    condition     = alltrue([for c in var.client_cidr_blocks : can(cidrhost(c, 0)) && !strcontains(c, ":")])
    error_message = "Every entry in client_cidr_blocks must be a valid IPv4 CIDR."
  }
}

variable "tags" {
  description = "Additional tags merged onto the cluster and the module's security group."
  type        = map(string)
  default     = {}
}
```
```hcl
# outputs.tf
output "cluster_id" {
  description = "The CloudHSM cluster ID (e.g. cluster-abcd1234)."
  value       = aws_cloudhsm_v2_cluster.this.cluster_id
}

output "cluster_name" {
  description = "Logical cluster name."
  value       = var.cluster_name
}

output "cluster_state" {
  description = "Cluster lifecycle state (UNINITIALIZED, INITIALIZED, ACTIVE, ...)."
  value       = aws_cloudhsm_v2_cluster.this.cluster_state
}

output "cluster_csr" {
  description = "The cluster CSR, populated only after the first HSM is added (it may be empty until the next refresh). Sign it with your CA, then initialise + activate the cluster out-of-band."
  value       = aws_cloudhsm_v2_cluster.this.cluster_certificates[0].cluster_csr
}

output "security_group_id" {
  description = "Module-managed security group; attach it to client EC2/EKS nodes that need to reach the HSM ENIs."
  value       = aws_security_group.this.id
}

output "vpc_id" {
  description = "VPC the cluster ENIs live in."
  value       = aws_cloudhsm_v2_cluster.this.vpc_id
}

output "hsm_ip_addresses" {
  description = "Private ENI IP of each HSM instance; feed these to the CloudHSM client config."
  value       = [for h in aws_cloudhsm_v2_hsm.this : h.ip_address]
}

output "hsm_ids" {
  description = "IDs of the provisioned HSM instances."
  value       = [for h in aws_cloudhsm_v2_hsm.this : h.hsm_id]
}
```
How to use it
```hcl
module "cloudhsm" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-cloudhsm?ref=v1.0.0"

  cluster_name = "payments-signing-prod"
  vpc_id       = module.network.vpc_id

  # Two private subnets in two different AZs => HA across AZs.
  subnet_ids = [
    module.network.private_subnet_ids["az-a"],
    module.network.private_subnet_ids["az-b"],
  ]

  hsm_type              = "hsm2m.medium"
  hsm_instance_count    = 2
  backup_retention_days = 180

  # Only the payments app subnets may reach the HSMs.
  client_cidr_blocks = [
    module.network.app_subnet_cidr["az-a"],
    module.network.app_subnet_cidr["az-b"],
  ]

  tags = {
    Environment = "prod"
    CostCentre  = "payments"
    Compliance  = "PCI-DSS"
  }
}

# Downstream: attach the module's security group to the EC2 client that runs
# the CloudHSM client daemon / PKCS#11 library, so it can reach the HSM ENIs.
resource "aws_instance" "hsm_client" {
  ami                    = data.aws_ami.al2023.id
  instance_type          = "c6i.large"
  subnet_id              = module.network.app_subnet_ids["az-a"]
  vpc_security_group_ids = [module.cloudhsm.security_group_id]

  tags = { Name = "payments-hsm-client" }
}

# Downstream: surface the cluster ID + CSR for the activation pipeline.
output "cloudhsm_cluster_id" {
  value = module.cloudhsm.cluster_id
}

output "cloudhsm_csr" {
  description = "Hand to the activation job that signs the cert and runs the CloudHSM CLI."
  value       = module.cloudhsm.cluster_csr
}
```
After apply, the cluster is UNINITIALIZED. Sign the exported cluster_csr with your own CA, run aws cloudhsmv2 initialize-cluster with the signed certificate and your trust anchor, then use the CloudHSM CLI on the client instance to activate the cluster (which sets the initial admin, or crypto officer, password) and create your first crypto user (CU). Terraform deliberately stops at the hardware boundary: key custody never passes through state.
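A cluster only produces its CSR after its first HSM exists, and the CSR is available only while the cluster is UNINITIALIZED. Because the cluster resource is read before the HSMs are added, the module's cluster_csr output can come back empty from the first apply. One caller-side workaround, sketched here, re-reads the cluster through the aws_cloudhsm_v2_cluster data source after everything in the module has been applied, then writes the CSR to disk for the signing step. It assumes the hashicorp/local provider is available, and the names activation, cluster_csr and cluster.csr are illustrative. Passing the module's current cluster_state into the data source is a precaution in case its lookup filters on state.

```hcl
# Illustrative caller-side step: re-read the cluster once the HSMs exist.
data "aws_cloudhsm_v2_cluster" "activation" {
  cluster_id    = module.cloudhsm.cluster_id
  cluster_state = module.cloudhsm.cluster_state

  # Defer the read until every resource in the module has been applied.
  depends_on = [module.cloudhsm]
}

# Write the CSR next to the root module for the CA signing step.
resource "local_file" "cluster_csr" {
  content  = try(data.aws_cloudhsm_v2_cluster.activation.cluster_certificates[0].cluster_csr, "")
  filename = "${path.root}/cluster.csr"

  lifecycle {
    # The CSR disappears after initialization; keep the file as first written.
    ignore_changes = [content]
  }
}
```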
With Terragrunt
Terragrunt keeps this module DRY across environments — define the backend and provider once in a root config, then a thin terragrunt.hcl per environment supplies only the inputs that differ.
1. Root config — live/terragrunt.hcl (inherited by every module):
```hcl
remote_state {
  backend  = "s3"
  generate = { path = "backend.tf", if_exists = "overwrite" }
  config = {
    # ...s3 state bucket/container + key per path...
  }
}
```
2. Module config — live/prod/cloudhsm/terragrunt.hcl:
```hcl
include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-cloudhsm?ref=v1.0.0"
}

inputs = {
  cluster_name = "..."
  vpc_id       = "..."
  subnet_ids   = ["...", "..."]
}
```
3. Deploy one environment, or roll out all modules together:
```shell
cd live/prod/cloudhsm && terragrunt apply   # this module
cd live/prod && terragrunt run-all apply    # every module under live/prod
```
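The root config in step 1 only generates the backend, even though the point of a root config is to hold the provider as well. Terragrunt's generate block can write a provider file into every module. A minimal sketch, assuming every module under live/ uses the AWS provider in a single region (us-east-1 here, as in the Quickstart):

```hcl
# Hypothetical addition to live/terragrunt.hcl: write provider.tf into every module.
generate "provider" {
  path      = "provider.tf"
  if_exists = "overwrite_terragrunt"
  contents  = <<EOF
provider "aws" {
  region = "us-east-1"
}
EOF
}
```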
Why Terragrunt here: the backend and provider live in one place instead of being copy-pasted into every module; the inputs block is overridden per environment (dev / stage / prod) without forking the module; and run-all orchestrates dependencies across modules. Reach for it once you have more than one environment or more than a handful of modules; for a single stack, the plain Quickstart above is enough.
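run-all can only order modules it knows are related, and a dependency block is how Terragrunt declares that relationship. Below is a sketch of the module config with its placeholders drawn from a sibling network stack. The ../network path and the vpc_id and cloudhsm_subnet_ids output names are hypothetical and must match whatever your network module actually exports.

```hcl
# Hypothetical live/prod/cloudhsm/terragrunt.hcl fed by a network dependency.
include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-cloudhsm?ref=v1.0.0"
}

# run-all applies ../network before this module and exposes its outputs.
dependency "network" {
  config_path = "../network"
}

inputs = {
  cluster_name = "payments-signing-prod"
  vpc_id       = dependency.network.outputs.vpc_id
  subnet_ids   = dependency.network.outputs.cloudhsm_subnet_ids
}
```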
Inputs
| Name | Type | Default | Required | Description |
|---|---|---|---|---|
| cluster_name | string | none | Yes | Logical name; drives tags and SG name prefix (3–40 chars, [a-zA-Z0-9-]). |
| vpc_id | string | none | Yes | VPC for the module's security group (must contain subnet_ids). |
| subnet_ids | list(string) | none | Yes | Subnets the cluster spans; ≥2 in different AZs for HA (1–10 allowed). |
| hsm_type | string | "hsm1.medium" | No | hsm1.medium (FIPS 140-2 L3) or hsm2m.medium (FIPS 140-3 L3). |
| hsm_instance_count | number | 2 | No | Number of HSM instances (1–28); 2+ for production HA. |
| backup_retention_days | number | 90 | No | Intended backup retention, 7–379 days (recorded as a cluster tag). |
| source_backup_identifier | string | null | No | Restore the cluster from this backup ID (DR/clone) instead of creating a fresh one. |
| client_cidr_blocks | list(string) | [] | No | IPv4 CIDRs allowed to the HSM ENIs on TCP 2223–2225. |
| tags | map(string) | {} | No | Extra tags merged onto the cluster and the module's security group. |
Outputs
| Name | Description |
|---|---|
| cluster_id | The CloudHSM cluster ID (e.g. cluster-abcd1234). |
| cluster_name | Logical cluster name. |
| cluster_state | Lifecycle state (UNINITIALIZED → ACTIVE). |
| cluster_csr | Cluster CSR to sign and use for initialise/activate (available once an HSM exists). |
| security_group_id | Security group to attach to client nodes that need to reach the HSMs. |
| vpc_id | VPC hosting the cluster ENIs. |
| hsm_ip_addresses | Private ENI IP of each HSM (for client config). |
| hsm_ids | IDs of the provisioned HSM instances. |
Enterprise scenario
A regional payments processor runs a card-issuance and PIN-translation platform under PCI-PIN and PCI-DSS. Their auditors will not accept multi-tenant KMS for the issuer master keys, so each environment gets its own cluster from this module. In production that is payments-signing-prod: two hsm2m.medium instances split across eu-west-1a and eu-west-1b, 180-day backup retention, and ingress locked to just the two PCI app subnets. Because the topology, retention and SG rules are codified, the platform team stamps an identical, evidence-ready cluster into the DR region by reusing the module with a source_backup_identifier, and the configuration itself is the artefact they hand to the QSA each year.
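The DR clone described above could be a second module block behind an aliased provider. This sketch assumes the production backup has already been copied into the DR region (for example with aws cloudhsmv2 copy-backup-to-region). The region eu-central-1, module.network_dr with its cloudhsm_subnet_ids output, and var.dr_backup_id are hypothetical stand-ins.

```hcl
# Hypothetical DR clone of the production cluster from a copied backup.
provider "aws" {
  alias  = "dr"
  region = "eu-central-1" # hypothetical DR region
}

variable "dr_backup_id" {
  description = "ID of a CloudHSM backup already copied into the DR region."
  type        = string
}

module "cloudhsm_dr" {
  source    = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-cloudhsm?ref=v1.0.0"
  providers = { aws = aws.dr }

  cluster_name             = "payments-signing-dr"
  vpc_id                   = module.network_dr.vpc_id
  subnet_ids               = module.network_dr.cloudhsm_subnet_ids
  hsm_type                 = "hsm2m.medium"
  hsm_instance_count       = 2
  backup_retention_days    = 180
  source_backup_identifier = var.dr_backup_id
}
```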
Best practices
- Never expect Terraform to hold key material. The module stops at UNINITIALIZED and only exposes the CSR by design; sign and activate out-of-band with the CloudHSM CLI, and keep CO/CU credentials in a separate secrets vault, not in state.
- Always run ≥2 HSMs in different AZs in production. A single HSM is a single point of failure; CloudHSM only guarantees durability of generated keys once a second active HSM exists, and a KMS custom key store requires at least two active HSMs in different AZs anyway.
- Right-size for cost. HSMs bill per instance-hour and are among the priciest AWS primitives, so don't leave non-prod clusters running overnight, and choose hsm2m.medium only where you genuinely need the larger key store or FIPS 140-3.
- Lock the ENIs down. Drive client_cidr_blocks from the exact app/EKS subnets that need PKCS#11 access; the self-referencing 2223–2225 rule is required for intra-cluster traffic, so don't widen it to 0.0.0.0/0.
- Set backup retention to your compliance floor and copy backups cross-region for DR. 7 days is rarely enough for a regulated workload; pin backup_retention_days to the audited value and replicate backups to the DR region so a clone is always restorable.
- Name and tag for audit. Encode environment, cost centre and the controlling standard (e.g. Compliance = "PCI-DSS") in tags so the cluster is traceable in Cost Explorer and Config evidence packs.
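The "≥2 HSMs in different AZs" rule can also be surfaced at plan time. Here is a sketch of a Terraform 1.5+ check block that could be added to the module; the data source name cluster_subnets is illustrative. A failing check is reported as a warning rather than blocking the apply, which suits non-prod stacks that deliberately run a single HSM.

```hcl
# Hypothetical addition to the module: warn when the topology is not HA.
data "aws_subnet" "cluster_subnets" {
  for_each = toset(var.subnet_ids)
  id       = each.value
}

check "ha_topology" {
  assert {
    condition = (
      var.hsm_instance_count >= 2 &&
      length(distinct([for s in data.aws_subnet.cluster_subnets : s.availability_zone])) >= 2
    )
    error_message = "Production HA needs at least 2 HSMs across subnets in at least 2 Availability Zones."
  }
}
```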