Every byte you write to Azure is already encrypted at rest with Microsoft-managed keys. That fact lulls a lot of teams into stopping there. This article is about the next two rungs of the ladder: replacing the platform key with a customer-managed key (CMK) you control, and stacking a second independent encryption layer underneath it so that a single compromised key never exposes plaintext.
We will wire CMK across Storage, managed disks, and databases, anchor the keys in a FIPS 140-2 Level 3 Managed HSM, automate rotation, and rehearse the failure modes that actually hurt: a deleted key, a regional outage, a broken identity grant.
1. The three encryption layers, and what each actually buys you
Azure encryption at rest is layered. Understanding which layer a control belongs to is the difference between a defensible design and cargo-culted config.
| Layer | What it protects | Who holds the key | Threat it addresses |
|---|---|---|---|
| Platform (default) | All data at rest | Microsoft | Lost/stolen physical media |
| Customer-managed key (CMK) | The DEK that encrypts your data | You, in Key Vault / Managed HSM | Insider access, compliance separation of duties, instant revocation |
| Infrastructure encryption (double) | A second, independent AES-256 pass | Microsoft (separate key) | Cryptographic failure or implementation flaw in a single layer |
The mental model is envelope encryption. Your data is encrypted with a data encryption key (DEK). The DEK is wrapped by a key encryption key (KEK) — that KEK is your CMK. Revoke or delete the CMK and the DEK can no longer be unwrapped, so the data is cryptographically inaccessible even though the ciphertext still physically exists. That is the entire point of CMK: you can render data unreadable on your terms, without Microsoft in the loop.
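To make the envelope model concrete, here is a minimal local sketch using OpenSSL rather than Azure. It is an illustration under stated assumptions, not Azure's implementation: the file names and the sample record are made up, a local RSA key pair stands in for the CMK, and the DEK is fed to openssl enc as a passphrase purely for brevity.

```shell
#!/bin/sh
# Envelope-encryption sketch with local OpenSSL (illustrative only).
set -eu
work=$(mktemp -d)

# KEK: an RSA key pair standing in for the customer-managed key.
openssl genpkey -algorithm RSA -pkeyopt rsa_keygen_bits:3072 -out "$work/kek.pem" 2>/dev/null
openssl pkey -in "$work/kek.pem" -pubout -out "$work/kek.pub.pem"

# DEK: 256 random bits, hex-encoded.
openssl rand -hex 32 > "$work/dek.hex"

# Encrypt the data with the DEK (used as a passphrase here for brevity).
printf 'sample record' > "$work/plain.txt"
openssl enc -aes-256-cbc -pbkdf2 -pass "file:$work/dek.hex" \
  -in "$work/plain.txt" -out "$work/data.enc"

# Wrap the DEK with the KEK public key, then discard the plaintext DEK.
openssl pkeyutl -encrypt -pubin -inkey "$work/kek.pub.pem" \
  -pkeyopt rsa_padding_mode:oaep -in "$work/dek.hex" -out "$work/dek.wrapped"
rm "$work/dek.hex"

# Normal read path: unwrap the DEK with the KEK, then decrypt the data.
openssl pkeyutl -decrypt -inkey "$work/kek.pem" \
  -pkeyopt rsa_padding_mode:oaep -in "$work/dek.wrapped" -out "$work/dek.unwrapped"
recovered=$(openssl enc -d -aes-256-cbc -pbkdf2 \
  -pass "file:$work/dek.unwrapped" -in "$work/data.enc")
echo "recovered: $recovered"

# Revocation: once the KEK is gone, the wrapped DEK can no longer be unwrapped.
rm "$work/kek.pem" "$work/dek.unwrapped"
if openssl pkeyutl -decrypt -inkey "$work/kek.pem" -pkeyopt rsa_padding_mode:oaep \
     -in "$work/dek.wrapped" -out "$work/dek.unwrapped" 2>/dev/null; then
  revoked=no
else
  revoked=yes
fi
echo "revoked: $revoked"
rm -rf "$work"
```

The ciphertext in data.enc is untouched by the last step; only the ability to recover the DEK disappears, which is the property CMK revocation relies on.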
Infrastructure encryption is orthogonal. It adds a second AES-256 pass at the storage-infrastructure level using a separate Microsoft-managed key, applied in addition to the service-level encryption. The two layers use independent keys and independent implementations, so a flaw in one layer does not expose plaintext on its own. It must be enabled at resource creation; you cannot retrofit it.
Callout: CMK and infrastructure encryption are not substitutes. CMK gives you control; infrastructure encryption gives you defense in depth. Regulated workloads usually want both.
2. Choosing a key store: Key Vault Premium vs Managed HSM
Both Key Vault Premium and Managed HSM give you HSM-backed keys. The difference is the boundary and the assurance level.
| | Key Vault Premium | Managed HSM |
|---|---|---|
| HSM model | Shared, multi-tenant HSM | Single-tenant, dedicated HSM pool |
| FIPS validation | FIPS 140-2 Level 2 | FIPS 140-2 Level 3 |
| Admin model | Azure RBAC / access policies | Local RBAC (data-plane), separate from control plane |
| Key ceremony / BYOK | Supported | Supported, with security-domain export |
| Cost | Per-operation | Provisioned (always-on pool) |
Pick Managed HSM when you need Level 3, full single-tenancy, or strict separation between Azure subscription admins and key administrators (the security domain means even a global admin cannot exfiltrate your keys). Pick Key Vault Premium when Level 2 satisfies your auditors and you want pay-per-use economics.
Provisioning a Managed HSM requires an activation step where you supply RSA public keys for the quorum of administrators who hold the security domain.
# Create the Managed HSM (control plane). It starts in a provisioned-but-not-activated state.
az keyvault create \
--hsm-name "kv-hsm-prod" \
--resource-group "rg-security" \
--location "eastus2" \
--retention-days 90 \
--administrators "$(az ad signed-in-user show --query id -o tsv)"
# Activate the HSM by downloading its security domain, encrypted to three RSA public keys
# (one certificate per quorum holder, generated beforehand).
# '--sd-quorum 2' means any 2 of the 3 holders can recover the HSM.
az keyvault security-domain download \
--hsm-name "kv-hsm-prod" \
--sd-wrapping-keys cert1.cer cert2.cer cert3.cer \
--sd-quorum 2 \
--security-domain-file "kv-hsm-prod-SD.json"
The downloaded security-domain file is the crown jewel. Store it offline, and keep each wrapping private key with a different quorum holder. If fewer private keys survive than the quorum requires, the HSM is unrecoverable by anyone, including Microsoft.
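The files passed to --sd-wrapping-keys are X.509 certificates carrying each holder's RSA public key. A minimal sketch of producing them with OpenSSL follows. The sd-keys directory, the sd-holder-N subject names, the 2048-bit size, and the 365-day validity are illustrative assumptions. In a real ceremony each holder would generate their own pair on their own offline machine rather than have one script produce all three.

```shell
#!/bin/sh
# Generate three self-signed certificates (and private keys) for the
# security-domain quorum. Output names match cert1.cer..cert3.cer used above.
set -eu
out_dir="${SD_KEY_DIR:-./sd-keys}"   # assumed location; override via SD_KEY_DIR
mkdir -p "$out_dir"
chmod 700 "$out_dir"

for i in 1 2 3; do
  openssl req -newkey rsa:2048 -nodes -x509 -days 365 \
    -subj "/CN=sd-holder-$i" \
    -keyout "$out_dir/cert$i.key" -out "$out_dir/cert$i.cer" 2>/dev/null
  chmod 600 "$out_dir/cert$i.key"    # private key: readable by its owner only
done

ls "$out_dir"
```

Only the .cer files are handed to the download command; the .key files are what the quorum holders must protect.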
Grant a key-management role on the data plane (Managed HSM uses its own local RBAC, not subscription RBAC):
az keyvault role assignment create \
--hsm-name "kv-hsm-prod" \
--role "Managed HSM Crypto Officer" \
--assignee "$(az ad signed-in-user show --query id -o tsv)" \
--scope "/keys"
3. CMK for Storage and managed disks
3.1 Create the key
Create an RSA key in the HSM. Storage CMK wraps the DEK with RSA, so use an RSA-HSM key; 3072 bits or larger is a sensible floor.
az keyvault key create \
--hsm-name "kv-hsm-prod" \
--name "cmk-storage" \
--kty RSA-HSM \
--size 3072 \
--ops wrapKey unwrapKey
3.2 Storage account with CMK and a user-assigned identity
The clean pattern is a user-assigned managed identity that you grant access before the storage account references the key. This avoids the chicken-and-egg problem you hit with system-assigned identities.
resource "azurerm_user_assigned_identity" "storage_cmk" {
name = "id-storage-cmk"
resource_group_name = azurerm_resource_group.sec.name
location = azurerm_resource_group.sec.location
}
# Grant the identity crypto rights on the HSM (local RBAC role).
resource "azurerm_key_vault_managed_hardware_security_module_role_assignment" "storage" {
managed_hsm_id = azurerm_key_vault_managed_hardware_security_module.prod.id
name = "00000000-0000-0000-0000-000000000abc"
scope = "/keys"
role_definition_id = "/providers/Microsoft.KeyVault/providers/Microsoft.Authorization/roleDefinitions/21dbd100-6940-42c2-9190-5d6cb909625b" # Managed HSM Crypto User
principal_id = azurerm_user_assigned_identity.storage_cmk.principal_id
}
resource "azurerm_storage_account" "data" {
name = "stkvdataprod"
resource_group_name = azurerm_resource_group.sec.name
location = azurerm_resource_group.sec.location
account_tier = "Standard"
account_replication_type = "GRS"
min_tls_version = "TLS1_2"
infrastructure_encryption_enabled = true # double encryption, must be set at creation
identity {
type = "UserAssigned"
identity_ids = [azurerm_user_assigned_identity.storage_cmk.id]
}
customer_managed_key {
managed_hsm_key_id = azurerm_key_vault_managed_hardware_security_module_key.cmk_storage.versionless_id
user_assigned_identity_id = azurerm_user_assigned_identity.storage_cmk.id
}
}
Two things worth calling out. First, infrastructure_encryption_enabled = true is the double-encryption switch and is immutable after creation. Second, referencing the versionless key ID is what enables automatic key-version rotation: when the key rolls to a new version, Storage picks it up without you touching the account.
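If you are handed a versioned key URL and need the versionless form, the conversion is plain string handling. The helper below is a hypothetical convenience function, not part of the Azure CLI, and the version segment in the usage line is a made-up placeholder.

```shell
#!/bin/sh
# versionless_key_id URL
# Prints the key URL without a trailing version segment. Fails on anything
# that does not look like https://<host>/keys/<name>[/<version>].
versionless_key_id() {
  url=${1%/}                                   # tolerate one trailing slash
  case "$url" in
    https://*/keys/*/*/*) return 1 ;;          # too many path segments
    https://*/keys/*/*)   printf '%s\n' "${url%/*}" ;;   # drop the version
    https://*/keys/?*)    printf '%s\n' "$url" ;;        # already versionless
    *)                    return 1 ;;
  esac
}

versionless_key_id "https://kv-hsm-prod.managedhsm.azure.net/keys/cmk-storage/0123456789abcdef"
```

The usage line prints the URL ending in /keys/cmk-storage, which is the form to reference when you want the service to follow new key versions.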
3.3 Managed disks via Disk Encryption Sets
Managed disks do not reference Key Vault directly. They go through a Disk Encryption Set (DES), which holds the identity and key binding. The DES supports three encryption types. Pick EncryptionAtRestWithCustomerKey for CMK, or EncryptionAtRestWithPlatformAndCustomerKeys for double encryption (platform key + CMK); the third, ConfidentialVmEncryptedWithCustomerKey, applies to confidential VM disks.
az disk-encryption-set create \
--name "des-prod" \
--resource-group "rg-security" \
--key-url "https://kv-hsm-prod.managedhsm.azure.net/keys/cmk-disks/<version>" \
--encryption-type "EncryptionAtRestWithPlatformAndCustomerKeys" \
--mi-system-assigned
# Grant the DES identity crypto rights, then create a disk bound to it.
DES_PRINCIPAL=$(az disk-encryption-set show -n des-prod -g rg-security --query identity.principalId -o tsv)
az keyvault role assignment create \
--hsm-name "kv-hsm-prod" \
--role "Managed HSM Crypto Service Encryption User" \
--assignee "$DES_PRINCIPAL" \
--scope "/keys"
az disk create \
--name "osdisk-app01" \
--resource-group "rg-security" \
--size-gb 128 \
--disk-encryption-set "des-prod"
Note the DES key-url here is versioned. For auto-rotation on disks, set the DES to rotate to the latest version with az disk-encryption-set update --enable-auto-key-rotation true.
4. CMK for Azure SQL and PostgreSQL (TDE with BYOK)
Azure SQL Database uses Transparent Data Encryption (TDE). By default the TDE protector is service-managed; BYOK swaps it for your CMK. The server’s managed identity needs get, wrapKey, and unwrapKey on the key.
# Assign the SQL server's identity rights on the HSM
SQL_PRINCIPAL=$(az sql server show -n sql-prod -g rg-security --query identity.principalId -o tsv)
az keyvault role assignment create \
--hsm-name "kv-hsm-prod" \
--role "Managed HSM Crypto Service Encryption User" \
--assignee "$SQL_PRINCIPAL" \
--scope "/keys"
# Register the key with the server, then promote it to the active TDE protector
az sql server key create \
--server "sql-prod" \
--resource-group "rg-security" \
--kid "https://kv-hsm-prod.managedhsm.azure.net/keys/cmk-sql/<version>"
az sql server tde-key set \
--server "sql-prod" \
--resource-group "rg-security" \
--server-key-type "AzureKeyVault" \
--kid "https://kv-hsm-prod.managedhsm.azure.net/keys/cmk-sql/<version>"
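For automatic key-version rotation, enable it on the server’s TDE protector so a new key version is adopted without re-running the set command. The server needs a managed identity before the role assignment above can resolve its principal ID, so the first command only matters if one was never assigned:
# -i assigns a system-assigned managed identity to the server
az sql server update -n sql-prod -g rg-security -i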
# Then enable auto-rotation of the TDE protector key version:
az sql server tde-key set \
--server "sql-prod" --resource-group "rg-security" \
--server-key-type "AzureKeyVault" \
--auto-rotation-enabled true \
--kid "https://kv-hsm-prod.managedhsm.azure.net/keys/cmk-sql/<version>"
Azure Database for PostgreSQL Flexible Server follows the same envelope pattern: a user-assigned identity, get/wrapKey/unwrapKey on the key, and the CMK configured at server level. It is set with az postgres flexible-server create --key <key-id> --identity <uami> .... The operational caveat is the same across all of these: if the key becomes inaccessible, the database transitions to an Inaccessible state and goes offline until access is restored.
5. Encryption scopes for per-container key isolation
A single CMK on a storage account is coarse. Encryption scopes let you bind different keys (or the platform key) to individual blob containers or even individual blobs — useful for multi-tenant blob stores where each tenant demands key isolation.
# Create a scope backed by a dedicated CMK
az storage account encryption-scope create \
--account-name "stkvdataprod" \
--name "tenant-acme-scope" \
--key-source "Microsoft.KeyVault" \
--key-uri "https://kv-hsm-prod.managedhsm.azure.net/keys/cmk-tenant-acme/<version>"
# Create a container that defaults to that scope and forbids overrides
az storage container create \
--account-name "stkvdataprod" \
--name "acme-data" \
--default-encryption-scope "tenant-acme-scope" \
--prevent-encryption-scope-override true \
--auth-mode login
--prevent-encryption-scope-override true is the control that matters: it stops a caller from writing a blob under a different scope, guaranteeing every object in the container uses the tenant’s key. You can also enable infrastructure (double) encryption per scope with --require-infrastructure-encryption true at creation.
6. BYOK import and key escrow
CMK does not require Microsoft to generate your key. With BYOK you can import a key generated in your own on-prem HSM, wrapped so the plaintext key never transits in the clear. The workflow: pull a wrapping key (KEK) from the target HSM, wrap your target key with it inside your on-prem HSM, then upload the wrapped blob.
# 1. Create a non-exportable RSA-HSM KEK in the target HSM to wrap with
az keyvault key create --hsm-name "kv-hsm-prod" --name "byok-kek" \
--kty RSA-HSM --size 4096 --ops import
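# 2. Download the KEK public key, then (on your on-prem HSM) wrap your target key
# with it using the CKM_RSA_AES_KEY_WRAP mechanism, producing key-to-import.byok
az keyvault key download --hsm-name "kv-hsm-prod" --name "byok-kek" \
--file "byok-kek.publickey.pem"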
# 3. Import the wrapped blob — plaintext key material never leaves your HSM
az keyvault key import --hsm-name "kv-hsm-prod" --name "cmk-imported" \
--byok-file "key-to-import.byok"
Escrow consideration: once you own key generation, you own key loss. If your only copy of an imported key lives in one HSM and that key is deleted and then purged once the retention window lapses, the data is gone. Maintain an offline escrow copy of the source key material under the same controls as your security domain, and document who can reconstitute it.
7. Confidential computing and Secure Key Release
For confidential VMs and confidential containers, you can gate a key so it is only released to a workload that proves, via hardware attestation, that it is running in a genuine trusted execution environment with the expected measurements. This is Secure Key Release (SKR).
The mechanism: mark the key as exportable and attach a release policy that references Microsoft Azure Attestation (MAA). The workload obtains an attestation token from MAA, presents it, and the HSM releases the wrapped key only if the token’s claims satisfy the policy.
{
"version": "1.0.0",
"anyOf": [
{
"authority": "https://sharedeus2.eus2.attest.azure.net",
"allOf": [
{ "claim": "x-ms-isolation-tee.x-ms-attestation-type", "equals": "sevsnpvm" },
{ "claim": "x-ms-isolation-tee.x-ms-compliance-status", "equals": "azure-compliant-cvm" }
]
}
]
}
az keyvault key create --hsm-name "kv-hsm-prod" --name "skr-key" \
--kty RSA-HSM --size 3072 --exportable true \
--policy "skr-release-policy.json"
The key is exportable: true but that does not mean anyone can read it — export is only ever the wrapped form, and only to a caller whose attestation satisfies the policy. This is how you bind a decryption capability to a verified, measured workload rather than to a static identity.
Enterprise scenario
A payments platform we ran enabled CMK with auto key-version rotation on a tier-0 Azure SQL server, sourcing the TDE protector from Managed HSM. Standard pattern, passed audit. Three months in, the HSM Crypto Officer who had originally been granted access offboarded, and an IAM cleanup job removed their now-orphaned local RBAC assignments. Nothing broke — until the next automatic key rotation. The new key version was created, but the SQL server’s user-assigned identity had only ever been granted Crypto Service Encryption User on the specific key, not at /keys scope. The freshly minted version inherited no assignment the server could resolve, the unwrap failed, and the database flipped to Inaccessible. Failover groups did not help — the replica wrapped against the same HSM key.
The fix had two parts. First, grant the encryption identity at the collection scope so every future version is covered, not just the one present at setup:
az keyvault role assignment create \
--hsm-name "kv-hsm-prod" \
--role "Managed HSM Crypto Service Encryption User" \
--assignee-object-id "$SQL_PRINCIPAL" \
--assignee-principal-type "ServicePrincipal" \
--scope "/keys"
Second, we added an Azure Monitor alert on the HSM’s KeyNearExpiry and on any unwrapKey failure, plus a synthetic probe that runs az sql db show --query status against a canary database every five minutes. The real lesson: with CMK you have moved a hard dependency into your own IAM blast radius. Any identity-hygiene automation that touches the HSM is now a potential outage trigger, so key grants must be scoped to survive rotation and excluded from generic cleanup jobs.
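The synthetic probe can be as small as a shell function that a scheduler runs every five minutes. The sketch below assumes a canary database named canary-db (a hypothetical name; the server and resource group reuse the ones above), that the caller is already logged in to the Azure CLI, and that alerting is wired to the script's non-zero exit code.

```shell
#!/bin/sh
# probe_db_online SERVER RESOURCE_GROUP DATABASE
# Succeeds only when the database reports status "Online".
probe_db_online() {
  # If the az call itself fails, treat the database as unreachable.
  status=$(az sql db show --server "$1" --resource-group "$2" --name "$3" \
             --query status --output tsv 2>/dev/null) || status="Unreachable"
  if [ "$status" = "Online" ]; then
    echo "OK: $3 is Online"
    return 0
  fi
  echo "ALERT: $3 status is ${status:-unknown}" >&2
  return 1
}

# Example invocation (canary-db is a hypothetical database name):
# probe_db_online sql-prod rg-security canary-db
```

Treating a failed CLI call as an alert is a deliberate choice here: a probe that silently passes when it cannot reach the control plane would hide exactly the kind of identity failure described above.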
Verify
Confirm each layer is actually doing what you configured.
# Storage: CMK source + infrastructure (double) encryption both on
az storage account show -n stkvdataprod -g rg-security \
--query "{cmk:encryption.keySource, infra:encryption.requireInfrastructureEncryption, vaultUri:encryption.keyVaultProperties.keyVaultUri}" -o table
# Managed disk: encryption type via its DES
az disk show -n osdisk-app01 -g rg-security \
--query "encryption.type" -o tsv
# expect: EncryptionAtRestWithPlatformAndCustomerKeys
# SQL: active TDE protector should be your AzureKeyVault key, not ServiceManaged
az sql server tde-key show --server sql-prod --resource-group rg-security \
--query "{type:serverKeyType, uri:uri}" -o table
# HSM: confirm the key version history (proves rotation happened)
az keyvault key list-versions --hsm-name kv-hsm-prod --name cmk-storage \
--query "[].{created:attributes.created, enabled:attributes.enabled}" -o table
For a true negative test, disable the key version in a non-prod HSM and confirm the dependent storage account / database becomes inaccessible — then re-enable and confirm recovery. That is the only way to know revocation works before you need it.
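One way to script that drill is sketched below. It is a sketch under assumptions, not a definitive procedure: revocation_drill is a hypothetical helper, the check command is whatever data-plane read you trust for the resource under test, and the 900-second default wait is a guess you should tune per service, since each service notices key changes on its own schedule.

```shell
#!/bin/sh
# revocation_drill HSM KEY CHECK_CMD [ARGS...]
# Disables the current key version, expects CHECK_CMD to fail, then re-enables
# the version and expects CHECK_CMD to succeed again. Non-prod only.
revocation_drill() {
  hsm=$1; key=$2; shift 2
  wait_s=${DRILL_WAIT_SECONDS:-900}   # assumed propagation delay; tune per service

  # Resolve the current version from the key identifier's last path segment.
  kid=$(az keyvault key show --hsm-name "$hsm" --name "$key" \
          --query key.kid --output tsv) || return 1
  version=${kid##*/}

  az keyvault key set-attributes --hsm-name "$hsm" --name "$key" \
    --version "$version" --enabled false >/dev/null || return 1
  sleep "$wait_s"
  if "$@" >/dev/null 2>&1; then revoked=no; else revoked=yes; fi

  # Re-enable regardless of the check result so the key is not left disabled.
  az keyvault key set-attributes --hsm-name "$hsm" --name "$key" \
    --version "$version" --enabled true >/dev/null || return 1
  sleep "$wait_s"
  if "$@" >/dev/null 2>&1; then recovered=yes; else recovered=no; fi

  echo "revoked=$revoked recovered=$recovered"
  [ "$revoked" = yes ] && [ "$recovered" = yes ]
}

# Example (kv-hsm-nonprod and ./check-access.sh are hypothetical):
# revocation_drill kv-hsm-nonprod cmk-storage ./check-access.sh
```

The function returns non-zero unless both halves of the drill behave as expected, so it can gate a pipeline stage.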
Hardening checklist
Pitfalls
- The Inaccessible cascade. Deleting or disabling a CMK takes down every resource that wraps with it. Purge protection is your seatbelt: turn it on everywhere, and never grant Purge to automation.
- Infrastructure encryption is creation-time only. If you forgot it, your only path is recreate-and-migrate. Bake it into your modules as a default-true variable.
- Versioned vs versionless key IDs. Versionless enables auto-rotation; versioned pins you to one key version until you update the reference yourself or turn on a service-side rotation flag. Know which your resource needs (Storage wants versionless; some services historically required versioned).
- HSM local RBAC is not subscription RBAC. Owner on the subscription grants nothing on the HSM data plane. Forgetting this is the most common reason a CMK grant silently fails.
- Losing the security domain. Past the quorum threshold, no one — not even Microsoft — can recover a Managed HSM. Treat that file like root key material.
Next steps
Wrap all of the above into a reusable Terraform/Bicep module with enable_double_encryption and key_rotation_policy as first-class inputs, then enforce it with Azure Policy: deny storage accounts where keySource != Microsoft.KeyVault and deny disks not bound to an approved DES. That turns encryption-at-rest from a per-resource decision into a platform guarantee.