
Azure Enterprise Architecture: Confidential & Regulated Workloads

Most “encryption everywhere” stories on Azure quietly stop at two of the three states of data. Encryption at rest (TDE, storage-service encryption) and in transit (TLS) are table stakes — but the moment a record is decrypted into RAM so the CPU can actually compute on it, it sits in cleartext inside host memory, visible in principle to the hypervisor, the host OS, a privileged platform operator, or anyone who can dump a memory page. For most workloads that residual exposure is an accepted risk. For confidential and regulated workloads — payment data under PCI DSS, patient records under HIPAA, EU personal data under GDPR with sovereignty clauses, or anything where the threat model explicitly includes the cloud operator itself — it is not. This article is a complete, reusable Azure reference architecture for running such workloads on confidential computing: hardware-based Trusted Execution Environments (AMD SEV-SNP and Intel TDX) that encrypt data in use, with remote attestation proving the environment is genuine before any secret is released, Key Vault Managed HSM holding keys under FIPS 140-3 Level 3 with secure key release, and Azure Policy plus sovereign controls turning all of it into an auditable, enforced platform rather than a one-off VM.

The business scenario

The pattern below is deliberately scale-agnostic. The same architecture protects a 60-person digital-health startup processing its first 100,000 patient records and a multinational bank running a cross-border payments engine — only the SKU counts, the number of landing zones, and the depth of the audit trail change.

The recurring problem looks like this. An organization must process genuinely sensitive data in the cloud, and the people who sign off on that decision — a CISO, a Data Protection Officer, a regulator, a customer’s third-party risk team — are not satisfied by “it’s encrypted at rest and in transit.” Their hard requirements are:

The architecture that follows satisfies all of these with first-party Azure confidential-computing services and no third-party security appliances.

Architecture overview

The design has two intertwined planes: a data/compute plane where sensitive processing happens inside TEEs, and a trust plane — attestation plus a hardware key store — that decides whether the data plane is allowed to touch any secret at all. Nothing sensitive flows until the trust plane says yes.

Figure: Azure confidential & regulated workloads reference architecture. A confidential compute data plane (AKS/CVM/ACI on SEV-SNP/TDX) sits inside a private VNet, alongside an attestation-gated trust plane (Microsoft Azure Attestation + Key Vault Managed HSM secure key release), private SQL/Blob data services, and a governance plane (Azure Policy, Defender, Confidential Ledger). The numbered steps 1–7 trace the request and key-release flow.

The request and data path runs like this. A request — a payment authorization, a clinical query, an inference call — arrives at Azure Front Door / Application Gateway with WAF, the only public ingress, which terminates TLS and inspects the request before forwarding it privately into a regional VNet. The workload itself runs on confidential compute: either a pool of Confidential VMs (DCasv5/ECasv5 on AMD SEV-SNP, or DCesv5/ECesv5 on Intel TDX) hosting the application, an AKS confidential node pool (CVM worker nodes, optionally with Kata-based confidential containers for per-pod isolation), or Confidential Container Instances for a serverless burst. In every case the guest’s memory is hardware-encrypted with a per-VM key held in the AMD Secure Processor / Intel TDX module — the hypervisor and host OS see only ciphertext.

Before that workload is trusted with anything, it goes through attestation. On boot (and again before any secret release) the TEE produces a signed hardware attestation report — a measurement of the firmware, the guest’s launch state, the TEE’s security version, and a freshness nonce. That report is sent to Microsoft Azure Attestation (MAA), an independent service running in its own TEE, which verifies the AMD/Intel signature chain, evaluates the measurement against a customer-authored attestation policy (“must be SEV-SNP, debug disabled, minimum TCB version N”), and — only on success — issues a short-lived, signed attestation JWT.
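
The checks an attestation policy expresses can be sketched from the relying party's side. This is a minimal illustration, assuming the token's signature has already been verified and its claims decoded into a dictionary. The flat claim names used here (tee_type, debuggable, tcb_version, nonce, exp) are hypothetical simplifications, not MAA's actual claim names.

```python
import time


def evaluate_attestation(claims, expected_nonce, min_tcb, now=None):
    """Return (ok, reasons) for an already-decoded claims dictionary.

    The claim names are hypothetical stand-ins for the real token claims.
    """
    now = time.time() if now is None else now
    reasons = []
    if claims.get("tee_type") != "sevsnpvm":
        reasons.append("unexpected TEE type")
    # A missing debug claim is treated as debuggable, so the check fails closed.
    if claims.get("debuggable", True):
        reasons.append("debug is enabled")
    if claims.get("tcb_version", -1) < min_tcb:
        reasons.append("TCB below floor")
    if claims.get("nonce") != expected_nonce:
        reasons.append("stale or missing nonce")
    if claims.get("exp", 0) <= now:
        reasons.append("token expired")
    return (not reasons, reasons)


if __name__ == "__main__":
    sample = {"tee_type": "sevsnpvm", "debuggable": False,
              "tcb_version": 7, "nonce": "abc", "exp": time.time() + 60}
    print(evaluate_attestation(sample, expected_nonce="abc", min_tcb=5))
```

Every check fails closed: a claim that is absent counts as a failure rather than a pass, which matches the intent of a strict policy.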

That JWT is the key that unlocks the trust plane. The workload presents it to Azure Key Vault Managed HSM, which holds the organization’s master keys under FIPS 140-3 Level 3 in single-tenant hardware. The key is marked exportable under a Secure Key Release (SKR) policy that says, in effect, “release this key only to a caller holding a valid MAA token that proves SEV-SNP, debug off, TCB ≥ N.” Managed HSM validates the token against the policy and releases the key wrapped to the enclave’s public key — so the key materializes only inside encrypted enclave memory, never on the wire in cleartext, never in the host. With that key in hand, the enclave decrypts data pulled from Azure SQL / Blob Storage (all reached over Private Endpoints, public access disabled), computes on it in encrypted memory, re-encrypts results, and returns. The data is cleartext only for the microseconds it lives inside a hardware-encrypted page.

Wrapping both planes is governance. Azure Policy, assigned at the regulated management group, enforces that only confidential SKUs may be deployed, that secure boot and vTPM are on, that storage and SQL are private and customer-managed-key encrypted, and that resources are confined to approved regions. Microsoft Defender for Cloud scores the subscriptions against the relevant regulatory compliance frameworks (PCI DSS 4.0, HIPAA/HITRUST, ISO 27001, the regulator’s own benchmark), and Azure Monitor / Log Analytics plus immutable audit logs give the auditor a single, tamper-evident evidence trail. If the workload’s threat model demands the cloud operator be fully excluded, the entire stack can be deployed into Microsoft Cloud for Sovereignty with a sovereign landing zone and confidential-computing-by-default policies.

In one sentence: the data plane runs inside hardware-encrypted enclaves; the trust plane (attestation + Managed HSM secure key release) refuses to hand over any secret until the enclave proves itself; and the governance plane makes confidential-by-construction the only legal way to deploy.

Component breakdown

Each component below earns its place by closing a specific gap in the “data in use is exposed” threat model.

Confidential compute (the data plane)

| Option | Hardware / TEE | Isolation boundary | Best for |
|---|---|---|---|
| Confidential VMs (DCasv5/ECasv5) | AMD SEV-SNP | Whole VM memory encrypted; guest attested | Lift-and-shift of existing VM workloads; full-OS confidentiality |
| Confidential VMs (DCesv5/ECesv5) | Intel TDX | Whole VM (trust domain) encrypted | Same, on Intel; TDX-specific attestation |
| AKS confidential node pool | SEV-SNP CVM nodes | Per-node VM encryption; optional per-pod via Kata | Kubernetes platforms; mixed confidential/standard pools |
| Confidential containers (ACI / AKS Kata) | SEV-SNP utility VM per pod | Per-container-group enclave + container attestation | Serverless or per-tenant isolation without app changes |
| Application enclaves (Intel SGX, DCsv3) | Intel SGX | Process-level enclave, tiny TCB | Highest assurance for a small, refactored code path |

Why it’s here: this is the only layer that encrypts data while the CPU computes on it. The key configuration choices are: enable secure boot + vTPM (mandatory for CVMs and for guest attestation); choose VMGS/OS-disk confidential encryption with a customer-managed key so even the OS disk is sealed to the enterprise’s HSM; and pin a minimum TCB / security version so VMs on outdated, potentially-vulnerable microcode are rejected at attestation time. For SEV-SNP CVMs prefer the full-disk confidential OS encryption option over the VMGS-only mode when the OS disk itself carries sensitive state.

A blunt note on SGX vs. CVMs: SGX gives the smallest trusted computing base (just your enclave code, not the OS) and therefore the strongest assurance, but it forces you to refactor your app to the enclave SDK and live with limited enclave page cache. Confidential VMs trust the whole guest OS (a larger TCB) but run unmodified applications. Most enterprises start with CVMs for breadth and reserve SGX for the one crown-jewel routine (a signing operation, a key-derivation step) that justifies the rewrite.

Microsoft Azure Attestation (the trust verifier)

What it does: independently verifies the TEE’s hardware attestation evidence and issues a signed JWT asserting “this is a genuine, policy-compliant enclave.” Why it’s here: without independent attestation, “I’m running in a confidential VM” is an unverifiable claim — an attacker could spoof it. MAA runs in its own enclave so that even Microsoft’s operators cannot forge a passing attestation. Key choices: author a strict attestation policy (reject debuggable=true, require the specific TEE type, set x-ms-sevsnpvm-tcbm / security-version floors), and run a dedicated MAA provider per environment so you control its policy and signing certificate rather than sharing the regional default. The MAA-issued token’s claims (e.g. x-ms-isolation-tee, x-ms-attestation-type) are exactly what the Managed HSM release policy matches against.

Azure Key Vault Managed HSM + Secure Key Release (the key boundary)

What it does: stores the organization’s master/wrapping keys in single-tenant, FIPS 140-3 Level 3 hardware and releases an exportable key only to a caller presenting a valid attestation token that satisfies the key’s SKR policy. Why it’s here: this is the technical control that makes “the operator cannot read our data” true rather than aspirational — keys never exist in cleartext outside HSM hardware or an attested enclave. Key choices: use Managed HSM, not standard Key Vault, when you need single-tenant hardware, full key sovereignty, and high-throughput crypto; enable BYOK/HYOK import (or even an external HSM via the double-key-encryption pattern) if regulation demands the enterprise generate keys outside Azure; define the release policy in the key’s release_policy with exportable = true; and split duties using Managed HSM’s local RBAC roles (Crypto Officer vs. Crypto User) so no single identity can both author a release policy and use the key. Pair every key with a purge-protection + soft-delete configuration so a deleted key (and the data it protects) cannot be silently destroyed.

Private networking & data services

What it does: keeps SQL, Blob, the HSM, and the attestation calls off the public internet. Why it’s here: confidential compute protects data in use, but the surrounding data services still need the Zero-Trust treatment — Private Endpoints for SQL and Storage with public network access = Disabled, Private DNS zones so *.database.windows.net/*.blob.core.windows.net/*.managedhsm.azure.net resolve to private addresses, and customer-managed-key encryption (rooted in the same Managed HSM) on SQL TDE and storage. Key choice: turn on Storage infrastructure encryption (double encryption) and SQL TDE with a Managed HSM key so at-rest, in-transit, and in-use protection share one sovereign key hierarchy.
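
One way to spot-check the private DNS wiring is to map each service hostname to the private zone it should be served from, and to confirm that whatever address it resolves to is a private one. This is a small sketch: the privatelink zone names follow Azure's usual Private Link naming and should be confirmed against current documentation, and the helper names are hypothetical.

```python
import ipaddress

# Public service suffix -> private DNS zone expected to answer for it.
PRIVATE_ZONES = {
    ".database.windows.net": "privatelink.database.windows.net",
    ".blob.core.windows.net": "privatelink.blob.core.windows.net",
    ".managedhsm.azure.net": "privatelink.managedhsm.azure.net",
}


def private_zone_for(fqdn):
    """Return the private DNS zone for a service hostname, or None."""
    name = fqdn.lower().rstrip(".")
    for suffix, zone in PRIVATE_ZONES.items():
        if name.endswith(suffix):
            return zone
    return None


def resolves_privately(ip_text):
    """True when the resolved address falls in a private address range."""
    return ipaddress.ip_address(ip_text).is_private


if __name__ == "__main__":
    print(private_zone_for("example.blob.core.windows.net"))
    print(resolves_privately("10.20.1.4"))
```

In practice the address passed to resolves_privately would come from a DNS lookup performed inside the spoke VNet; that lookup is left out so the sketch stays self-contained.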

Governance: Azure Policy, Defender for Cloud, and sovereign controls

| Control surface | Concrete mechanism | What it enforces |
|---|---|---|
| Azure Policy (deny/audit) | Built-in + custom initiative at the regulated management group | Only confidential SKUs; secure boot + vTPM on; private endpoints; CMK; allowed regions only |
| Defender for Cloud | Regulatory compliance dashboard | Continuous PCI DSS 4.0 / HIPAA / ISO scoring with evidence |
| Microsoft Cloud for Sovereignty | Sovereign landing zone + policy baseline | Operator-exclusion, residency, and confidential-by-default at the platform level |
| Confidential Ledger (optional) | Blockchain-backed, tamper-evident store | Immutable audit of attestation decisions and key releases |

Why it’s here: a single confidential VM is a science project; an enforced estate where non-confidential is impossible is an architecture. Azure Policy’s deny effects make the platform self-enforcing, and Azure Confidential Ledger can give the auditor a cryptographically tamper-evident record of every key release and attestation decision — exactly the kind of evidence a regulator wants to see.

Implementation guidance

The whole estate is reproducible IaC — confidential computing should never be click-ops, because the entire value proposition rests on provable, repeatable configuration.

Confidential VMs / AKS in Terraform/Bicep. Provision a CVM by setting the confidential security profile on the VM resource. In Terraform’s azurerm_linux_virtual_machine, that means setting vtpm_enabled = true and secure_boot_enabled = true on the resource, plus an os_disk block with security_encryption_type = "DiskWithVMGuestState" (or "VMGuestStateOnly") and, for a customer-managed key, secure_vm_disk_encryption_set_id pointing at a disk encryption set rooted in your Managed HSM key; the VM size must be a DC/EC-asv5 (SEV-SNP) or -esv5 (TDX) SKU. For Kubernetes, add an azurerm_kubernetes_cluster_node_pool whose vm_size is a confidential SKU; the node pool inherits SEV-SNP, and you layer the confidential-containers / Kata add-on for per-pod isolation. Bicep mirrors this with the securityProfile.securityType: 'ConfidentialVM' and uefiSettings properties, which is useful when you want the deployment template itself attached to an Azure landing zone.

Managed HSM + Secure Key Release. Managed HSM has a deliberately heavyweight bootstrap: after az keyvault create --hsm-name, it must be activated by downloading the security domain, which means supplying a set of RSA public keys and a quorum threshold (e.g. 3-of-5). Script the ceremony but perform it with multiple human custodians, because a security domain whose quorum of private keys has been lost cannot be recovered. Then create an exportable key with a release policy. The policy is a small JSON document asserting the required attestation claims; conceptually:

{
  "version": "1.0.0",
  "anyOf": [{
    "authority": "https://myattestprovider.eus.attest.azure.net",
    "allOf": [
      { "claim": "x-ms-isolation-tee.x-ms-attestation-type", "equals": "sevsnpvm" },
      { "claim": "x-ms-isolation-tee.x-ms-compliance-status", "equals": "azure-compliant-cvm" }
    ]
  }]
}

The workload’s runtime flow is then: (1) call the on-host attestation client to get a hardware report, (2) POST it to MAA and receive a JWT, (3) call Managed HSM release with that JWT — Azure returns the key wrapped to the enclave, which unwraps it inside encrypted memory. The Azure confidential-computing SKR samples package this loop; treat it as a sidecar/init step so application code only ever sees an already-unwrapped key handle.
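
How a release policy of this shape gates step (3) can be sketched as a small evaluator. This is an illustrative model of the anyOf/allOf/equals structure only, not Managed HSM's actual policy engine. It assumes the attestation token has already been validated and decoded, and that a dotted claim name addresses a claim nested inside another (an assumption about token layout).

```python
def lookup(claims, dotted_name):
    """Resolve 'a.b' to claims['a']['b']; return None if any part is missing."""
    node = claims
    for part in dotted_name.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def release_allowed(policy, token_issuer, claims):
    """True if one anyOf rule names the issuer and all of its claims match."""
    for rule in policy.get("anyOf", []):
        if rule.get("authority") != token_issuer:
            continue
        conditions = rule.get("allOf", [])
        # An empty condition list is rejected rather than treated as a match.
        if conditions and all(
            lookup(claims, c["claim"]) == c["equals"] for c in conditions
        ):
            return True
    return False


POLICY = {
    "version": "1.0.0",
    "anyOf": [{
        "authority": "https://myattestprovider.eus.attest.azure.net",
        "allOf": [
            {"claim": "x-ms-isolation-tee.x-ms-attestation-type",
             "equals": "sevsnpvm"},
            {"claim": "x-ms-isolation-tee.x-ms-compliance-status",
             "equals": "azure-compliant-cvm"},
        ],
    }],
}
```

Rejecting an empty allOf list is a deliberate choice in this sketch: a rule with no conditions would otherwise release the key to any token from the named authority.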

Identity wiring. Every confidential workload authenticates with a user-assigned managed identity — no client secrets. That identity gets the Managed HSM Crypto User local-RBAC role (to use keys), while a separate operations identity holds Crypto Officer (to manage keys); the SKR policy then adds a second, attestation-based gate on top of RBAC, so even a stolen managed-identity token cannot release the key outside a genuine enclave. SQL and Storage access likewise uses the managed identity and Entra tokens — Zero-Trust identity is a prerequisite for, not a replacement for, confidential compute.

Networking. Lay the workload into an Azure landing-zone spoke: a dedicated subnet for the CVM/AKS pool, Private Endpoints for SQL, Storage, Key Vault/Managed HSM, and a private endpoint for Azure Attestation so even the attestation handshake stays off the public internet. Force egress through Azure Firewall with FQDN rules, and lock the data services to public_network_access_enabled = false. Link the relevant Private DNS zones to the spoke (and to the hub for resolution from on-prem over ExpressRoute).

Pipeline. Run the IaC through a CI/CD pipeline whose own runners are, ideally, confidential agents, and gate merges with Azure Policy compliance checks so a PR that introduces a non-confidential SKU fails before deployment, not after.
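
A pre-merge gate of this kind can be approximated with a short script that scans the planned resources before anything is deployed. The sketch below is a hedged illustration. The input shape (a list of dictionaries with type, name, sku, location, and public_network_access) is hypothetical rather than any tool's real plan format, the SKU pattern is an assumption about confidential v5 size names, and the allowed regions are example values.

```python
import re

# Assumed naming pattern for SEV-SNP / TDX confidential v5 sizes.
CONFIDENTIAL_SKU = re.compile(r"^Standard_[DE]C\d+(as|ads|es|eds)_v5$")
# Example allow-list; substitute the regions your policy permits.
ALLOWED_REGIONS = {"germanywestcentral", "westeurope"}


def find_violations(resources):
    """Return human-readable violations for a list of planned resources."""
    violations = []
    for res in resources:
        name = res["name"]
        if res.get("location") not in ALLOWED_REGIONS:
            violations.append(f"{name}: region not allowed")
        if res["type"] == "vm" and not CONFIDENTIAL_SKU.match(res.get("sku", "")):
            violations.append(f"{name}: non-confidential SKU")
        # Public access is assumed enabled unless explicitly switched off.
        if res["type"] in ("storage", "sql") and res.get("public_network_access", True):
            violations.append(f"{name}: public network access enabled")
    return violations


if __name__ == "__main__":
    plan = [{"type": "vm", "name": "score-1",
             "sku": "Standard_D4as_v5", "location": "germanywestcentral"}]
    problems = find_violations(plan)
    print(problems)
    raise SystemExit(1 if problems else 0)
```

A script like this complements the deny effects in Azure Policy instead of replacing them: the pipeline check gives fast feedback on the PR, and the platform policy remains the enforcement point.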

Enterprise considerations

Security & Zero Trust. Confidential computing is the missing third pillar that lets you claim end-to-end Zero Trust: identity is the perimeter (Entra + managed identities), the network is assumed hostile (private endpoints, WAF), data is encrypted in all three states, and the host itself is removed from the trust boundary. The crown-jewel control is secure key release: it converts “we trust Azure operators not to read memory” into “the HSM releases no key unless the caller proves, with a valid attestation token, that it is a genuine enclave.” Keep the TCB floor in your attestation policy current: when AMD/Intel ship a microcode fix for a TEE vulnerability, raise the required security version so unpatched hosts fail attestation automatically.

Cost optimization. Confidential SKUs carry a premium: DC/ECasv5 VMs typically run roughly 5–12% above their non-confidential Dasv5/Easv5 equivalents, and Managed HSM bills a flat hourly rate (on the order of ~USD 3/hour, i.e. ~USD 2,000+/month, for the standard pool) regardless of key count. So scope confidentiality to the regulated workload only (run the public marketing site and the non-sensitive batch jobs on standard SKUs in a separate, non-confidential landing zone), and share one Managed HSM across many keys/applications since its cost is per-pool, not per-key. Use autoscale on the AKS confidential node pool and reservations/savings plans on the steady-state CVM baseline. The single biggest waste pattern is making the entire estate confidential “to be safe”; confidentiality is a targeted control, not a blanket.
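
The flat-rate arithmetic is easy to check. The sketch below uses the rough ~USD 3/hour figure quoted above and a 730-hour month (a common billing approximation, assumed here), and shows how sharing one pool drives the per-key cost down. The function names are illustrative.

```python
HOURS_PER_MONTH = 730  # assumed approximation: 8,760 hours per year / 12


def hsm_monthly_cost(hourly_rate_usd):
    """Flat monthly cost of one Managed HSM pool at the given hourly rate."""
    return hourly_rate_usd * HOURS_PER_MONTH


def hsm_cost_per_key(hourly_rate_usd, key_count):
    """Monthly pool cost amortized across the keys that share it."""
    return hsm_monthly_cost(hourly_rate_usd) / key_count


def premium_range(base_monthly, low=0.05, high=0.12):
    """Extra monthly spend implied by a 5-12% confidential-SKU premium."""
    return (round(base_monthly * low, 2), round(base_monthly * high, 2))


if __name__ == "__main__":
    print(hsm_monthly_cost(3.0))      # 3.0 * 730 = 2190.0
    print(hsm_cost_per_key(3.0, 40))  # 2190.0 / 40 = 54.75
    print(premium_range(1000.0))
```

At 40 shared keys the pool works out to under USD 55 per key per month at that rate, which is why a single pool per estate is usually the sensible default.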

Scalability. The data plane scales like any other Azure compute — VMSS for CVMs, cluster + pod autoscaler for AKS confidential pools, per-request elasticity for confidential ACI. Attestation and key release add a small per-cold-start latency (a report → JWT → unwrap round-trip, tens of milliseconds), so cache the released key inside enclave memory for the VM’s lifetime and re-attest on a schedule rather than per request. Managed HSM sustains thousands of crypto ops/second; if wrap/unwrap throughput becomes the bottleneck, do envelope encryption (HSM wraps a per-session data key; the data key does the bulk crypto in-enclave).
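
The "cache the key, re-attest on a schedule" advice can be sketched as a small wrapper around whatever function performs the attest-and-release round trip. This is a minimal, single-threaded illustration: release_key is a hypothetical callable standing in for the report → JWT → unwrap sequence, and the injectable clock exists only to make the behavior testable.

```python
import time


class ReleasedKeyCache:
    """Hold a released key in memory and refresh it after max_age_seconds."""

    def __init__(self, release_key, max_age_seconds, clock=time.monotonic):
        self._release_key = release_key  # hypothetical attest-and-release call
        self._max_age = max_age_seconds
        self._clock = clock
        self._key = None
        self._released_at = None

    def get(self):
        """Return the cached key, re-running attestation once it is too old."""
        now = self._clock()
        if self._key is None or now - self._released_at >= self._max_age:
            self._key = self._release_key()
            self._released_at = now
        return self._key


if __name__ == "__main__":
    cache = ReleasedKeyCache(lambda: b"example-key", max_age_seconds=3600)
    cache.get()  # first call performs the release
    cache.get()  # later calls within the hour reuse the cached key
```

Requests then call get() on the hot path and pay the attestation cost only on the first call and after each refresh interval.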

Reliability & DR (RTO/RPO). Confidential VMs and AKS confidential pools support availability zones — spread the pool across three zones for in-region resilience. For DR, the critical, non-obvious dependency is the Managed HSM security domain: it does not automatically replicate cross-region, so a multi-region design must either provision a second Managed HSM and re-import the same key material under a matching SKR policy, or use Managed HSM’s backup/restore into the paired region. Treat RPO as governed by your data-tier replication (SQL failover group / geo-redundant storage with CMK), and RTO by how fast a confidential pool plus its HSM can be stood up in region B — practice it, because the HSM activation ceremony is the long pole. A realistic target for this pattern is RTO ≤ 1 hour, RPO ≤ 5 minutes, with RPO 0 achievable via synchronous SQL replication at a latency cost.

Observability. Pipe CVM/AKS logs and the attestation client’s decisions to Azure Monitor / Log Analytics; alert on attestation failures (a spike means either an attack attempt or a TCB regression after a host update) and on key-release denials. Critically, you cannot run a memory-dump-based EDR inside a confidential VM the way you would a normal host — that’s the whole point — so lean on guest-level telemetry, network observability, and the attestation/key-release audit trail as your detection surface, and write every key release to Confidential Ledger for a tamper-evident record.
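
An attestation-failure alert reduces to counting failures in a sliding time window. In a real deployment this logic would more likely live in an Azure Monitor alert rule; the sketch below only illustrates the idea, with hypothetical class and parameter names.

```python
from collections import deque


class FailureSpikeDetector:
    """Flag a spike when `threshold` failures land within `window_seconds`."""

    def __init__(self, window_seconds, threshold):
        self._window = window_seconds
        self._threshold = threshold
        self._events = deque()

    def record_failure(self, timestamp):
        """Record one failure; return True if the window now holds a spike."""
        self._events.append(timestamp)
        # Drop failures that have aged out of the window.
        while self._events and self._events[0] <= timestamp - self._window:
            self._events.popleft()
        return len(self._events) >= self._threshold


if __name__ == "__main__":
    detector = FailureSpikeDetector(window_seconds=60, threshold=3)
    for t in (0, 10, 20):
        print(t, detector.record_failure(t))
```

Timestamps are passed in explicitly and assumed to arrive in non-decreasing order, so the detector can be replayed over historical log data as well as fed live events.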

Governance. The estate is held together by an Azure Policy initiative assigned at the regulated management group with deny effects (non-confidential SKU → blocked; public storage in scope → blocked; out-of-region deployment → blocked) and Defender for Cloud continuously scoring against PCI DSS 4.0 / HIPAA / ISO 27001. For the strongest sovereignty posture, deploy on Microsoft Cloud for Sovereignty, whose sovereign landing zone ships these confidential-and-residency policies as the default baseline rather than something you assemble by hand.

Reference enterprise example

MeridianHealth Analytics is a fictional EU digital-health company (Frankfurt-headquartered, ~280 staff) that builds a clinical-decision-support service. Hospitals across Germany and France send de-identified-but-still-sensitive patient datasets; MeridianHealth runs risk-scoring models over them and returns insights. Three forces collide: GDPR (personal health data, special-category), each hospital’s third-party risk team demanding that “even the cloud provider cannot read our patients’ data,” and a hard contractual clause that data must never leave the EU. A standard “encrypted at rest + TLS” architecture loses every deal in security review, because the hospitals’ DPOs ask the one question it cannot answer: who can read the data while you’re computing on it?

MeridianHealth builds the confidential architecture. The scoring service runs on an AKS confidential node pool (DCasv5 SEV-SNP nodes) in Germany West Central, with a small SGX-based DCsv3 enclave for the one routine that derives the per-hospital de-identification key. Master keys live in a single Managed HSM in the same region, activated in a 3-of-5 ceremony split across the CTO, the DPO, and three board-appointed custodians. Each hospital’s data-encryption key is an exportable HSM key whose SKR policy requires a sevsnpvm, debuggable=false, TCB-floor attestation token from MeridianHealth’s dedicated MAA provider. When a scoring job starts, the AKS pod attests, MAA issues a JWT, Managed HSM releases the key wrapped to the enclave, the pod decrypts the dataset from private-endpoint Blob Storage entirely inside SEV-SNP-encrypted memory, scores it, re-encrypts the result, and the cleartext never exists outside a hardware-encrypted page. An Azure Policy initiative at the mg-regulated management group denies any non-confidential VM SKU, any storage account with public access, and — using an allowed-locations policy — any deployment outside Germany West Central / West Europe, making the EU-residency clause a platform guarantee rather than a promise. Every key release writes to Azure Confidential Ledger.

The numbers: roughly 18 DCasv5 cores of steady-state AKS capacity (about €2,900/month on a 1-year reservation, versus ~€2,600 for the non-confidential equivalent — a ~12% confidentiality premium), one Managed HSM pool (~€1,900/month, shared across all 40+ hospital keys so the marginal cost per hospital is near zero), plus storage, networking, and Front Door. Total regulated-platform run-rate lands near €6,500/month — trivial against the contract values it unlocks. The outcome: MeridianHealth now passes hospital security reviews by demonstrating, with an attestation report and a key-release ledger entry, that no MeridianHealth engineer and no Azure operator can read patient data in use. The confidential architecture stopped being a cost line and became the sales enabler — three previously-blocked hospital contracts closed within a quarter of go-live, and the DPO signs off because residency is enforced by deny-policy, not by trust.

When to use it

Use this architecture when the threat model genuinely includes the infrastructure operator or a host-level compromise — regulated data (PCI/HIPAA/GDPR-special-category), multi-party computation where parties don’t trust each other or the host, sovereign/government workloads, IP-sensitive model weights or signing keys, or any deal where a customer’s risk team demands technical (not policy) operator-exclusion. It is also the right call when data residency must be guaranteed, since the same governance plane that enforces confidentiality enforces region-confinement.

The trade-offs are real. Confidential SKUs cost a single-digit-to-low-double-digit premium and Managed HSM adds a fixed monthly floor; attestation and key release add cold-start latency and an HSM activation/DR ceremony that is operationally heavy. You also accept a larger blast radius if you misconfigure the trust plane — a sloppy SKR policy (forgetting to require debuggable=false) silently defeats the entire control, so the attestation policy is now a crown-jewel artifact that deserves code review and change control.

Anti-patterns to avoid. Do not make the whole estate confidential “to be safe” — it wastes money and dilutes focus; scope it to the regulated workload. Do not run a confidential VM without a strict attestation policy and secure key release — an attested-but-ungated enclave gives you encrypted memory but still hands keys to anyone, which is theatre. Do not store the master key in standard Key Vault when the requirement is single-tenant FIPS 140-3 L3 hardware sovereignty — that’s the gap Managed HSM exists to fill. And do not treat confidential computing as a substitute for Zero-Trust identity and private networking; it is the third pillar on top of them, not a replacement.

Alternatives. If the operator is inside your trust boundary (a typical internal LOB app), plain Zero-Trust web architecture — WAF, Entra Conditional Access, private endpoints, CMK — is sufficient and far cheaper; reach for confidential computing only when “data in use” exposure is an explicit, signed-off concern. If you need the smallest possible TCB for one tiny routine, use Intel SGX application enclaves rather than whole confidential VMs. If regulation demands the keys never touch Azure at all, layer double key encryption (DKE) or an external HSM in front of this design. And if sovereignty is the dominant driver, start from Microsoft Cloud for Sovereignty’s sovereign landing zone, which gives you this entire pattern as an opinionated, policy-enforced baseline rather than a build.
