Azure Compute

Azure Specialized Compute: Dedicated Hosts, Spot, Confidential VMs, HPC & Batch

Most Azure compute lessons stop at the standard virtual machine — pick a size, attach a disk, put it in an availability zone, done. But a surprising amount of real architecture lives in the specialized compute territory beyond that default: the workloads that need a whole physical server to themselves for compliance, the trading systems that measure success in microseconds of network latency, the batch pipelines that want ten thousand cores for twenty minutes and nothing the rest of the day, the regulated data that must stay encrypted even while the CPU is processing it, and the engineering simulations that only run fast if the nodes are wired together with InfiniBand. Azure has a distinct feature or service for each of these, and a senior architect is expected to know not just that they exist but when each is the right answer, what it costs, and where the sharp edges are.

This lesson is the map of that territory. We cover Azure Dedicated Hosts and host groups (single-tenant physical servers you control, for isolation, bring-your-own-licence economics and maintenance scheduling); proximity placement groups (forcing VMs physically close together to shave network latency); Spot Virtual Machines (Azure’s deeply discounted spare capacity, and the eviction model you must design around); Confidential VMs and confidential containers (hardware-based memory encryption with AMD SEV-SNP and Intel TDX, plus remote attestation — confidential computing, the third state of data protection); the HPC VM families (HB, HC, HX, ND) and the InfiniBand/RDMA fabric that makes tightly-coupled parallel jobs scale; Azure Batch and Azure CycleCloud (two different ways to run large-scale job scheduling on pools of VMs that grow and shrink automatically); and Scheduled Events, the in-VM signal that lets any application become maintenance-aware. Each gets the architect’s treatment — what it is, the choices, the defaults, when to pick it, the trade-off, the limits and the cost lever.

By the end you will be able to look at an unusual compute requirement — “this has to be PCI-isolated”, “this trading engine needs sub-millisecond hops”, “we want to run this CFD model on 500 cores overnight”, “the regulator says the data can’t be readable even to Microsoft” — and reach confidently for the right Azure primitive.

Learning objectives

By the end of this lesson you can:

Prerequisites

You should be comfortable with the standard Azure virtual machine — sizes and families, disks, availability zones, availability sets and scale sets — covered in the Azure Virtual Machines deep dive and the VM resilience deep dive. You should know the subscription → resource group → resource hierarchy, how to run az in Cloud Shell, and the basics of reservations and Azure Hybrid Benefit, because the licensing economics return here. This lesson sits in the Compute module of the Azure Zero-to-Hero course as the “everything beyond the standard VM” capstone, immediately after the compliance and sovereignty lesson (confidential computing is a sovereignty control) and before we close the compute track. No prior HPC or confidential-computing experience is assumed — every term is defined.

Core concepts

Before the individual features, fix five mental models. They explain why this whole family exists.

Standard VMs are multi-tenant by design; sometimes you must opt out of that. On a normal VM, Microsoft’s fabric places your virtual machine on whatever physical host has room, alongside other customers’ VMs (strongly isolated by the hypervisor, but sharing the silicon). For the vast majority of workloads that is correct and economical. Dedicated Hosts exist for the minority that must not share a physical server — for a compliance mandate, a software licence that is bound to physical cores, or a need to control exactly when the host is patched. You are trading away the efficiency of shared infrastructure for isolation and control, and you pay for that with the host whether you fill it or not.

Latency is physics; placement is the lever. Two VMs in the same region might be in different datacentres several kilometres apart — fine for almost everything, but a problem for a chatty, latency-sensitive cluster (think a trading matching engine and its feed handlers, or HPC nodes exchanging data every millisecond). A proximity placement group tells Azure “put these VMs as physically close as possible”, typically in the same datacentre and ideally the same network spine, trading some allocation flexibility for the lowest possible network latency between them.

Spare capacity is cheap if you can give it back. At any moment Azure has unused compute that would otherwise sit idle. Spot VMs let you rent that spare capacity at a steep discount — often 60-90% off pay-as-you-go — on one condition: Azure can evict (reclaim) the VM with as little as 30 seconds’ notice when it needs the capacity back (or when your max price is exceeded). This is the single most important cost lever in compute if and only if your workload can survive an instance vanishing — batch jobs, dev/test, stateless web tiers behind a scale set, CI agents.

There is a third state of data, and confidential computing protects it. We routinely protect data at rest (disk encryption) and data in transit (TLS). But while a CPU is actually processing data, that data sits in plain memory, readable in principle by anyone with sufficient access to the host — including, in the threat model that matters for the most sensitive regulated workloads, the cloud operator. Confidential computing closes that gap: the CPU encrypts the VM’s (or container’s) memory with a key the hardware holds and the host OS/hypervisor never sees, and remote attestation lets you cryptographically prove the workload is genuinely running inside such a protected, unmodified environment before you release secrets to it. This is data-in-use protection.

Tightly-coupled HPC is a network problem, not a CPU problem. A parallel simulation split across many nodes is only as fast as the slowest communication step between them. Ordinary Ethernet (even accelerated) adds microseconds of latency and CPU overhead per message; at scale, that overhead dominates and the job stops scaling — adding nodes makes it slower. The HPC VM families solve this with InfiniBand and RDMA (Remote Direct Memory Access), a fabric that lets one node write directly into another node’s memory, bypassing the OS and CPU, at single-digit-microsecond latency. That is what lets an MPI job scale to hundreds of nodes.

Key terms used throughout: dedicated host (a single-tenant physical server), host group (the container/collection of hosts, the zonal/fault-domain boundary), fault domain (a rack-level isolation boundary), PPG (proximity placement group), eviction (reclamation of a Spot VM), TEE / enclave (trusted execution environment — the hardware-protected memory region), attestation (proving a TEE’s identity/integrity), RDMA / InfiniBand (the low-latency HPC fabric), MPI (Message Passing Interface — the programming model for tightly-coupled HPC), pool (a managed, autoscaling group of compute nodes in Batch), and Scheduled Event (an in-VM notification of imminent maintenance).

Azure Dedicated Hosts & host groups

A standard VM lives on a physical host that Microsoft chooses and shares among customers. An Azure Dedicated Host flips that: you provision an entire physical server that is yours alone, and you place your VMs onto it. Nobody else’s workload runs on that silicon.

Why dedicated hosts exist

Three drivers justify the premium:

  1. Physical isolation / compliance. Some regulatory regimes and security policies require that workloads run on hardware not shared with other tenants. A dedicated host gives you a single-tenant boundary you can attest to.
  2. Bring-your-own-licence (BYOL) economics. Many per-physical-core software licences (Windows Server and SQL Server via Azure Hybrid Benefit, and some third-party products) are far cheaper when you control the underlying physical cores. On a dedicated host you can see and licence the actual cores, sockets and host type, often making the host cheaper overall than the equivalent fully-licensed standard VMs.
  3. Maintenance control. On standard VMs, Microsoft decides when host maintenance happens (within an SLA). On a dedicated host you can use maintenance control to defer and self-schedule platform updates within a 35-day rolling window, which is critical for systems with strict change windows.

Host groups, fault domains and zones

You don’t create a host directly into thin air — you create a host group first, then add hosts to it. The host group is the resilience and placement boundary.

| Concept | What it is | Choices / defaults | Notes |
| --- | --- | --- | --- |
| Host group | The container/collection that holds one or more dedicated hosts | Created per region; can be pinned to a single availability zone or left zoneless | The host group’s zone is fixed at creation; you cannot move it later |
| Fault domains | Rack-level isolation within the host group; hosts in different FDs sit on separate racks (power/network) | 1–5 (you choose at host-group creation; default 1) | Spread hosts across FDs so a single rack failure doesn’t take out all of them |
| Availability zone | A physically separate datacentre group within the region | Optional; one zone per host group | For zone resilience, deploy multiple host groups, one per zone |
| Host (the resource) | The physical server itself, of a specific host SKU | e.g. Dsv5-Type1, Esv5-Type1, Fsv2-Type1 | The SKU family must match the VM series you intend to place |

A common production pattern for a resilient dedicated-host estate: one host group per availability zone, each with 2-3 fault domains, hosts spread across the FDs, and VMs balanced across the hosts. That gives you both rack-level (FD) and datacentre-level (zone) resilience on single-tenant hardware.

Host SKUs and capacity

Each host SKU corresponds to a VM family and a fixed amount of physical capacity — a set number of physical cores, a memory size, and therefore a number of VMs of a given size it can hold. For example a host of the Dsv5-Type1 family exposes a fixed pool of physical cores; you can pack it with any mix of Dsv5 VM sizes until the cores are exhausted (one large VM, or many small ones, the cores are the limit). You pay for the whole host per hour regardless of how many VMs you place on it — so the economics only work when you fill the host or when BYOL savings outweigh the unused capacity.
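
To make the fill-the-host economics concrete, here is a minimal shell sketch. The capacity (80 cores) and the hourly price (8.00) are invented placeholders, not published figures for any host SKU, and the function names are hypothetical.

```shell
# Illustrative numbers only: look up the real capacity and hourly price
# of your host SKU before drawing conclusions from this arithmetic.

# vms_that_fit HOST_CORES VM_CORES -> how many VMs of one size fit on the host
vms_that_fit() {
  echo $(( $1 / $2 ))
}

# host_utilization_pct HOST_CORES USED_CORES -> whole-number percentage in use
host_utilization_pct() {
  echo $(( $2 * 100 / $1 ))
}

# cost_per_vm_hour HOST_PRICE_PER_HOUR VM_COUNT -> effective hourly cost per VM
cost_per_vm_hour() {
  awk -v price="$1" -v n="$2" 'BEGIN { printf "%.2f\n", price / n }'
}

HOST_CORES=80     # assumed capacity (placeholder)
HOST_PRICE=8.00   # assumed hourly host price, arbitrary currency units

echo "4-core VMs that fit:            $(vms_that_fit "$HOST_CORES" 4)"
echo "Utilization with 5 such VMs:    $(host_utilization_pct "$HOST_CORES" 20)%"
echo "Cost per VM-hour, host full:    $(cost_per_vm_hour "$HOST_PRICE" 20)"
echo "Cost per VM-hour, only 5 VMs:   $(cost_per_vm_hour "$HOST_PRICE" 5)"
```

Because the host price is fixed, the effective cost per VM rises as the host empties. That is the number to compare against the standard VM price plus any licence savings.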

Automatic vs manual placement

When you create the host group you choose how VMs land on hosts:

| Placement mode | Behaviour | When to use |
| --- | --- | --- |
| Manual placement (default) | You explicitly assign each VM to a named host via --host. If you forget, the VM fails to deploy | Maximum control; small estates; when you must guarantee exactly which host a VM is on |
| Automatic placement | Azure chooses a host within the group with room and places the VM for you (you target the host group, not a host) | Larger estates; scale sets on dedicated hosts; less operational toil |

Automatic placement is the modern default recommendation for anything beyond a handful of VMs, and it is required for placing a Virtual Machine Scale Set on a host group.

Maintenance control

Maintenance control is the headline operational benefit. You create a maintenance configuration (scope = Host), assign it to the host group (or individual hosts), and Azure then holds back all non-zero-impact platform updates for those hosts. You apply pending updates on your schedule (a recurring window, or on-demand), one fault domain at a time, so you control exactly when reboots/live-migrations touch your isolated hardware. Without maintenance control, Microsoft applies host updates on its own cadence (still within SLA, but not on your clock).

Limits and gotchas

# Create a zonal host group with 2 fault domains and automatic placement
az vm host group create \
  -g rg-dedicated -n hg-prod-z1 \
  --location centralindia --zone 1 \
  --platform-fault-domain-count 2 \
  --automatic-placement true

# Add a dedicated host of a specific SKU into fault domain 0
az vm host create \
  -g rg-dedicated --host-group hg-prod-z1 -n host-dsv5-0 \
  --sku Dsv5-Type1 --platform-fault-domain 0

# Create a VM that auto-places onto the host group (Hybrid Benefit for Windows)
az vm create \
  -g rg-dedicated -n vm-app-1 --image Win2022Datacenter \
  --host-group hg-prod-z1 --zone 1 \
  --size Standard_D4s_v5 --license-type Windows_Server
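
The commands above build one zonal host group. As a rough sketch of the "one host group per zone, hosts spread across fault domains" pattern described earlier, the loop below prints the equivalent commands for several zones. It is a dry run that only echoes; `plan_host_estate` and the `host-z<zone>-fd<n>` naming are hypothetical, and the resource group, region and SKU are reused from the example above.

```shell
# Dry run only: prints the az commands instead of running them.
plan_host_estate() {
  zones="$1"      # space-separated availability zones, e.g. "1 2 3"
  fd_count="$2"   # fault domains per host group
  for zone in $zones; do
    echo "az vm host group create -g rg-dedicated -n hg-prod-z${zone} --location centralindia --zone ${zone} --platform-fault-domain-count ${fd_count} --automatic-placement true"
    fd=0
    while [ "$fd" -lt "$fd_count" ]; do
      # One host per fault domain, so a single rack failure cannot remove every host in the zone
      echo "az vm host create -g rg-dedicated --host-group hg-prod-z${zone} -n host-z${zone}-fd${fd} --sku Dsv5-Type1 --platform-fault-domain ${fd}"
      fd=$((fd + 1))
    done
  done
}

plan_host_estate "1 2 3" 2
```

Review the printed plan first, then pipe it to a shell (or remove the `echo`) once the names, region and SKU match your estate. Remember that every host created this way bills from the moment it exists.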

Proximity placement groups (low latency)

Two VMs in the same region can be far enough apart that the network round-trip between them is a meaningful fraction of a millisecond. For most applications that is irrelevant. For a latency-sensitive, chatty cluster — a stock-exchange matching engine and its order gateways, a SAP application tier and its database, HPC nodes exchanging boundary data — it can be the difference between meeting an SLA and missing it.

A proximity placement group (PPG) is a logical grouping that tells Azure: place every VM in this group as physically close together as possible — same datacentre, and where possible the same network spine. The result is the lowest and most consistent network latency Azure can offer between those VMs.

How it works and the trade-off

You create an empty PPG, then deploy VMs (and scale sets) into it. The first VM you start “anchors” the PPG to a specific datacentre; every subsequent VM is placed near that anchor. That is also the catch:

| Aspect | Detail |
| --- | --- |
| Benefit | Lowest, most consistent inter-VM network latency Azure can offer over Ethernet (sub-millisecond round trips; the exact figure depends on VM size and accelerated networking) |
| Anchor behaviour | The first allocated VM pins the location; later VMs must fit there |
| Allocation risk | The smaller the target region/datacentre, the higher the chance a needed VM size isn’t available at the anchor location, which causes an allocation failure |
| Mitigation | Deploy the largest / rarest VM sizes first so the anchor lands somewhere that can host them; deploy all PPG VMs together |
| Zones interaction | A PPG is, by nature, a single physical location, so it is effectively within one availability zone. You cannot have one PPG span zones (that would defeat the purpose). For zone resilience you run one PPG per zone |
| Availability sets | An availability set can be aligned to a PPG, combining low latency with fault/update-domain spread within that location |

The core architectural tension: PPG pulls VMs together (latency); availability zones push them apart (resilience). You cannot have both maximally — so you decide per tier. A latency-critical compute cluster might accept single-zone placement in a PPG and rely on a second PPG in another zone for DR; a web tier that doesn’t need microsecond latency stays zone-redundant.

az ppg create -g rg-lowlat -n ppg-trading --location centralindia
# Deploy the rarest/largest size first to anchor well, then the rest
az vm create -g rg-lowlat -n vm-engine --ppg ppg-trading \
  --size Standard_F32s_v2 --image Ubuntu2204
az vm create -g rg-lowlat -n vm-gateway --ppg ppg-trading \
  --size Standard_F8s_v2 --image Ubuntu2204
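
The "largest first" ordering used above can be automated when the list of VMs grows. The sketch below sorts a hand-written inventory by vCPU count and prints the create commands in anchor-friendly order. The inventory format (`name:size:vcpus`), the `vm-feed` entry and the `ppg_deploy_order` helper are assumptions made for illustration; the vCPU column is typed in by hand, not read from Azure.

```shell
# name:size:vcpus, one VM per line (hypothetical inventory)
VMS="vm-gateway:Standard_F8s_v2:8
vm-engine:Standard_F32s_v2:32
vm-feed:Standard_F16s_v2:16"

# Print name:size pairs, largest vCPU count first, so the anchor VM
# is the one that is hardest to place.
ppg_deploy_order() {
  printf '%s\n' "$1" | sort -t: -k3,3nr | cut -d: -f1,2
}

ppg_deploy_order "$VMS" | while IFS=: read -r name size; do
  # echo keeps this a dry run; remove it to actually create the VMs
  echo az vm create -g rg-lowlat -n "$name" --ppg ppg-trading --size "$size" --image Ubuntu2204
done
```

vCPU count is only a proxy for "rarest"; if a smaller but scarcer size (a GPU SKU, for example) is in the group, move it to the front manually.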

Spot Virtual Machines & eviction

Spot VMs rent Azure’s unused capacity at a deep discount — frequently 60-90% below pay-as-you-go, varying by region, size and demand. The deal is simple and asymmetric: you get cheap compute, and Azure can take it back (“evict” it) at any time with as little as 30 seconds’ notice (delivered via a Scheduled Event — see below) when it needs the capacity for full-price customers or when your price cap is exceeded.

Spot is the highest-leverage cost optimization in compute — for the right workloads. It is wrong for anything that must stay up; it is excellent for anything interruptible.

Eviction type: capacity vs price

When you create a Spot VM you choose why it can be evicted:

| Eviction type | What triggers eviction | Behaviour |
| --- | --- | --- |
| Capacity only (set max price = -1) | Azure needs the capacity back | You pay the current Spot price (capped at the pay-as-you-go rate) and are only ever evicted for capacity, never for price. The common choice |
| Price or capacity (set a max price, e.g. 0.05) | The Spot price rises above your max price, or Azure needs capacity | You also get evicted if market price exceeds your cap; useful to enforce a hard budget ceiling |

Setting max price = -1 means “I’ll pay up to the standard pay-as-you-go price, just don’t evict me on price” — this is what most batch/HPC users want.
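
The two eviction types reduce to a small decision rule, sketched below in shell. The function name and the yes/no capacity flag are hypothetical; this models the rule described above and is not how Azure exposes the decision.

```shell
# eviction_reason MAX_PRICE SPOT_PRICE CAPACITY_RECLAIMED
#   MAX_PRICE          -1 means "capacity only" (never evicted on price)
#   SPOT_PRICE         current Spot price per hour
#   CAPACITY_RECLAIMED yes|no, whether Azure needs the capacity back
eviction_reason() {
  awk -v max="$1" -v spot="$2" -v reclaimed="$3" 'BEGIN {
    if (reclaimed == "yes")             print "capacity"
    else if (max != -1 && spot > max)   print "price"
    else                                print "none"
  }'
}

eviction_reason -1   0.09 no    # capacity-only VM, price spike: stays up
eviction_reason 0.05 0.09 no    # capped VM, price above cap: evicted on price
eviction_reason 0.05 0.03 no    # capped VM, price below cap: stays up
eviction_reason -1   0.03 yes   # any VM when capacity is reclaimed: evicted
```

The prices are made-up values chosen to exercise each branch. The point is that a max price of -1 removes the price branch entirely, leaving capacity as the only eviction trigger.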

Eviction policy: Deallocate vs Delete

Separately, you choose what happens to the VM when it is evicted:

| Eviction policy | On eviction | Cost while evicted | When to use |
| --- | --- | --- | --- |
| Deallocate (default) | VM is stopped (deallocated); OS/data disks kept; you can restart it later when capacity returns | You still pay for the disks (and any static IP) while deallocated | Stateful-ish workloads you want to resume; you keep the VM identity and disks |
| Delete | VM and (optionally) its disks are deleted | Nothing (resources gone) | Truly ephemeral nodes, especially in Spot scale sets, where you want capacity to come and go cleanly with no lingering disk bills |

Designing for eviction

Spot only works if the workload tolerates a node disappearing. Patterns that make it safe:

# Spot VM: capacity-only eviction (max price -1), Delete on eviction
az vm create -g rg-batch -n vm-spot-worker --image Ubuntu2204 \
  --size Standard_D4s_v5 \
  --priority Spot --eviction-policy Delete --max-price -1

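# Confirm the size is offered (and not restricted) in the region before committing.
# list-skus does not report Spot prices or eviction rates; the portal shows both when you select a Spot size.
az vm list-skus --location centralindia --size Standard_D4s_v5 --output table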

Confidential VMs & confidential containers

Disk encryption protects data at rest; TLS protects data in transit. Confidential computing protects the third state — data in use, while it is being processed in memory — by running the workload inside a hardware-based Trusted Execution Environment (TEE). The CPU encrypts the VM’s or container’s memory with a key generated and held inside the processor, never exposed to the host OS, the hypervisor, or the cloud operator. Even an administrator with full access to the physical host cannot read the workload’s live memory.

The hardware: AMD SEV-SNP and Intel TDX

Azure Confidential VMs are built on two CPU technologies; the VM size family tells you which:

| Technology | Vendor | What it does | Azure VM families |
| --- | --- | --- | --- |
| AMD SEV-SNP (Secure Encrypted Virtualization – Secure Nested Paging) | AMD EPYC | Encrypts VM memory per-VM with a hardware key; SNP adds integrity protection against the hypervisor | DCasv5/DCadsv5, ECasv5/ECadsv5 (and newer) |
| Intel TDX (Trust Domain Extensions) | Intel Xeon | Creates hardware-isolated “trust domains” with encrypted, integrity-protected memory | DCesv5/DCedsv5, ECesv5/ECedsv5 (and newer) |

Both deliver the same architectural promise — a confidential VM whose entire memory is hardware-encrypted — using different silicon. You choose the family; the size letter scheme is the standard one (D = general purpose, E = memory-optimized; the C denotes confidential).

Attestation: proving the TEE is real

Encryption alone isn’t enough — you must be able to prove a workload is genuinely running inside a legitimate, unmodified TEE before you trust it with secrets. That is remote attestation: the hardware produces a signed attestation report describing the TEE’s identity and measurements (firmware, boot state), and a verifier checks it. On Azure this is the job of Microsoft Azure Attestation (MAA), a managed service that validates the evidence and issues a token. A typical confidential pattern: the workload boots, attests via MAA, and only on a valid token does Key Vault / Managed HSM release the keys the workload needs (secure key release). For confidential VMs, the OS disk can also be confidential-encrypted with keys bound to the VM’s TEE.

Confidential containers

You don’t need a whole VM to get a TEE. Azure offers confidential containers in two forms:

| Form | Where | Model |
| --- | --- | --- |
| Confidential containers on AKS | AKS with confidential VM node pools or Confidential Containers (Kata) | Pod-level isolation in a hardware-backed enclave; for lift-and-shift of standard containers into a TEE |
| Confidential containers on ACI | Azure Container Instances | Serverless confidential containers backed by SEV-SNP, with an enforced security policy and attestation |

These let you protect data-in-use for containerized workloads — useful for multi-party data analytics (several organizations compute over combined data that none can read), confidential AI inference, and processing regulated PII where even the operator must be excluded from the trust boundary.

When to use (and the trade-offs)

| Aspect | Detail |
| --- | --- |
| Use when | Regulatory/contractual requirement to exclude the cloud operator from the trust boundary; multi-party computation; highly sensitive PII/financial/health data; sovereignty controls |
| Cost | Confidential families carry a premium over equivalent standard sizes |
| Performance | Small overhead from memory encryption; generally modest for typical workloads |
| Constraints | Limited region/family availability vs standard VMs; specific supported OS images; some features differ |
| Not a silver bullet | Protects memory confidentiality/integrity; it is not a substitute for patching, network security, or identity controls |

# Create a Confidential VM (AMD SEV-SNP family) with a confidential OS disk
az vm create -g rg-conf -n cvm-1 \
  --size Standard_DC4as_v5 \
  --image "Canonical:0001-com-ubuntu-confidential-vm-jammy:22_04-lts-cvm:latest" \
  --security-type ConfidentialVM \
  --os-disk-security-encryption-type DiskWithVMGuestState \
  --enable-vtpm true --enable-secure-boot true \
  --admin-username azureuser --generate-ssh-keys

HPC VM families & InfiniBand

High-Performance Computing (HPC) workloads — computational fluid dynamics, weather and climate models, molecular dynamics, finite-element crash simulation, seismic processing, large-scale AI training — split one big problem across many nodes that must constantly exchange data. As noted in the core concepts, the bottleneck for these tightly-coupled jobs is the inter-node network, not the CPUs. Azure’s HPC VM families pair fast, HPC-grade CPUs/GPUs with a back-end InfiniBand fabric and RDMA, which is what lets a job scale efficiently to hundreds of nodes.

The families

| Family | Optimized for | Interconnect | Typical workloads |
| --- | --- | --- | --- |
| HB-series (HBv3, HBv4) | Memory-bandwidth-bound HPC | InfiniBand (NDR/HDR) | CFD, weather, explicit FEA, fluid dynamics |
| HC-series | Compute / dense-FP HPC (high clock, all cores) | InfiniBand | Implicit FEA, molecular dynamics, computational chemistry |
| HX-series | Very large memory HPC | InfiniBand | EDA (chip design), large structural/mechanical models |
| ND-series (NDv2/v4/v5 and newer) | GPU HPC & large-scale AI training | InfiniBand between GPU nodes (e.g. NVIDIA, GPUDirect RDMA) | Distributed deep-learning training, GPU simulation |
| NC / NV-series | GPU compute / visualization (often not InfiniBand-coupled) | Ethernet (typically) | Single-node GPU compute, inference, remote viz |

The H-prefix families (HB/HC/HX) and the InfiniBand-equipped ND families are the tightly-coupled HPC SKUs. The N-series without InfiniBand are for single-node or loosely-coupled GPU work (one big inference box, visualization), where node-to-node RDMA isn’t needed.

InfiniBand, RDMA and MPI

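Three ingredients make tightly-coupled HPC scale, and you need all three:

  1. InfiniBand: the dedicated back-end fabric between nodes, with single-digit-microsecond latency.
  2. RDMA (Remote Direct Memory Access): lets one node write directly into another node’s memory, bypassing the OS and CPU.
  3. MPI (Message Passing Interface): the programming model that tightly-coupled jobs use to exchange data between nodes over that fabric.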

To use the fabric you also need the right OS image and drivers: the Azure HPC marketplace images (AlmaLinux-HPC / Ubuntu-HPC) ship with the InfiniBand drivers, RDMA stack and tuned MPI pre-installed — strongly preferred over hand-installing on a base image. For predictable performance, HPC nodes are typically placed in a proximity placement group (often within a single scale set) so they sit on the same InfiniBand spine.

A classic interview point: why does a parallel job sometimes get slower when you add nodes? Because past a point, communication overhead grows faster than the compute you’ve added (Amdahl’s law plus network cost). InfiniBand/RDMA pushes that point much further out — which is exactly why the HPC families exist.
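
A toy model makes the slowdown visible. The sketch below extends Amdahl's law with a communication term that grows linearly with node count. That linear term and both overhead values are assumptions chosen for illustration, not measurements of Ethernet or InfiniBand. The normalized runtime on N nodes is taken as (1 - p) + p/N + c(N - 1), and speedup is its reciprocal.

```shell
# speedup P C N
#   P = fraction of the work that parallelizes (Amdahl's p)
#   C = assumed communication cost per extra node, as a fraction of serial runtime
#   N = node count
speedup() {
  awk -v p="$1" -v c="$2" -v n="$3" 'BEGIN {
    t = (1 - p) + p / n + c * (n - 1)   # normalized runtime on n nodes
    printf "%.1f\n", 1 / t
  }'
}

for n in 16 64 256; do
  echo "nodes=$n  high-overhead fabric: $(speedup 0.99 0.001 "$n")x  low-overhead fabric: $(speedup 0.99 0.00001 "$n")x"
done
```

With the higher overhead the modelled speedup peaks and then falls as nodes are added, while the lower overhead keeps improving over the same range. Real jobs have more complex communication patterns, but the shape of the curve is the point.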

Azure Batch & CycleCloud

Running one HPC job by hand on a few VMs is easy. Running thousands of jobs, or a job across hundreds of nodes that you only want to exist while the job runs, needs a scheduler that provisions compute, queues and dispatches work, scales the fleet up and down, and tears it all down afterwards. Azure gives you two distinct services for this, aimed at two different audiences.

Azure Batch

Azure Batch is a fully managed, cloud-native job-scheduling service. You don’t manage a scheduler or head node — Azure does. The model is three nested concepts:

| Concept | What it is |
| --- | --- |
| Pool | A managed, autoscaling collection of compute nodes (VMs of a chosen size/image). The pool can grow and shrink on a formula, and can be Spot (low-priority) nodes for cheap throughput |
| Job | A logical container for work, attached to a pool |
| Task | A unit of work (a command line, with input files and output handling) that runs on a node. Tasks can be many thousands; Batch dispatches them across the pool |

Batch shines for embarrassingly parallel and HPC workloads: rendering (every frame a task), Monte-Carlo and financial risk runs, genomics/parametric sweeps, media transcoding, and MPI multi-node jobs (Batch supports multi-instance tasks over InfiniBand). Its autoscale formula is the key cost lever: you write an expression (based on pending tasks, time of day, etc.) and Batch resizes the pool — including scaling to zero when idle, so you pay only while work runs. Combine that with Spot nodes and Batch becomes extremely cheap per unit of work.
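
As a rough illustration of such a formula, the sketch below sizes a Spot (low-priority) pool to the number of pending tasks, capped at an arbitrary 50 nodes, and drops to zero when nothing is queued. The cap, the pool id `spot-pool` and the shell variable name are placeholders. The final command is only printed, so check it against `az batch pool autoscale enable --help` before running it for real.

```shell
# Quoted heredoc: the $-prefixed names are Batch service variables,
# not shell variables, so the shell must not expand them.
FORMULA=$(cat <<'EOF'
maxNodes = 50;
pending = max(0, $PendingTasks.GetSample(1));
$TargetDedicatedNodes = 0;
$TargetLowPriorityNodes = min(maxNodes, pending);
$NodeDeallocationOption = taskcompletion;
EOF
)

# Dry run: show the command that would apply the formula to an existing pool.
printf 'az batch pool autoscale enable --pool-id spot-pool --auto-scale-formula "%s"\n' "$FORMULA"
```

Using only the latest sample keeps the example short but makes the pool react to every blip. Production formulas usually average samples over a window and check that enough samples exist first. `taskcompletion` lets running tasks finish before a node is removed during scale-in.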

# Create a Batch account and log in to it (pool/job/task creation follows)
BATCH_ACCOUNT=kvbatch$RANDOM   # capture the generated name so later commands can reuse it
az batch account create -g rg-batch -n "$BATCH_ACCOUNT" -l centralindia
az batch account login -g rg-batch -n "$BATCH_ACCOUNT"
# (pool/job/task creation continues via `az batch pool create`, `job create`, `task create`)

Azure CycleCloud

Azure CycleCloud is an orchestration tool for deploying and managing traditional HPC cluster schedulers on Azure. Where Batch replaces the scheduler with a managed service, CycleCloud lets HPC teams keep the scheduler they already know (Slurm, PBS Pro/OpenPBS, LSF, Grid Engine) and run it on Azure with autoscaling, cost controls and a familiar cluster experience. You install CycleCloud (often from the Marketplace), point it at your subscription, and it provisions head nodes and dynamically autoscaling execute nodes, mounts shared filesystems, and presents the cluster exactly as on-prem HPC users expect.

| Aspect | Azure Batch | Azure CycleCloud |
| --- | --- | --- |
| Model | Managed job-scheduling service (no scheduler to run) | Orchestrator that deploys your HPC scheduler (Slurm/PBS/LSF/GE) |
| Audience | Developers building scalable batch into apps; cloud-native pipelines | HPC teams lifting traditional clusters to the cloud; researchers used to Slurm |
| You manage | Pools/jobs/tasks via API/SDK | A familiar cluster (queues, the scheduler), with autoscale handled |
| Best for | Embarrassingly parallel, render/transcode, parametric sweeps, app-integrated batch | Tightly-coupled MPI HPC where users want their existing scheduler/workflow |
| Autoscale | Built-in formula on the pool (to zero) | Scheduler-driven autoscale of execute nodes (to zero) |

The decision is largely cultural and architectural: build batch into an application → Azure Batch; bring an existing HPC cluster and its users → CycleCloud. Both autoscale to zero and both can use Spot for the worker fleet.

Azure specialized compute options

The diagram lays out the specialized-compute landscape — Dedicated Hosts and host groups for isolation, PPGs for latency, Spot for cheap interruptible capacity, Confidential VMs/containers for data-in-use, the HPC families on InfiniBand, and Batch/CycleCloud orchestrating pools — so you can see at a glance which primitive answers which requirement.

Scheduled events & maintenance awareness

Every VM, no matter how special, eventually faces maintenance — and the difference between a graceful workload and an outage is whether the application knew it was coming. Azure exposes that knowledge through Scheduled Events.

Planned vs unplanned maintenance

| Type | What it is | Impact |
| --- | --- | --- |
| Unplanned (unexpected) | A hardware failure on the host | Azure auto-recovers the VM (typically a few minutes of downtime as it restarts on healthy hardware). Availability zones/sets limit the blast radius |
| Planned | Routine platform updates (host OS, firmware) | Often zero-impact via live migration (the VM is moved with a brief pause of a few seconds); some updates need a reboot. Maintenance control (above) lets you self-schedule these on eligible resources |

Scheduled Events: the in-VM signal

Scheduled Events is part of the Azure Instance Metadata Service (IMDS) — a non-routable endpoint (169.254.169.254) reachable only from inside the VM. The VM polls it to learn about imminent maintenance affecting that VM, with advance warning (typically up to 15 minutes for planned operations, but as little as ~30 seconds for a Spot eviction), giving the application time to react — drain connections, fail over, checkpoint, deregister from a load balancer.

Event types you’ll see in the payload include:

| EventType | Meaning | Typical reaction |
| --- | --- | --- |
| Reboot | The VM will be rebooted for maintenance | Flush state, checkpoint, quiesce |
| Redeploy | The VM will be moved to another host | Same as Reboot; expect a brief outage |
| Freeze | The VM is paused briefly (e.g. live migration) | Usually nothing, but pause time-sensitive ops |
| Preempt | A Spot VM is being evicted (~30s notice) | Checkpoint, deregister, save work now |
| Terminate | The VM (often scale-set instance) is being deleted | Graceful shutdown of the app |

A maintenance-aware app polls the Scheduled Events endpoint, and on seeing a relevant event for its own node, performs its drain/checkpoint logic and then acknowledges (approves) the event to let Azure proceed immediately rather than waiting out the timer. This is exactly how a Spot scale set drains an evicted node, and how a clustered database fails over before a reboot instead of after.

# From inside a Linux VM: read pending Scheduled Events via IMDS
curl -s -H "Metadata:true" \
  "http://169.254.169.254/metadata/scheduledevents?api-version=2020-07-01" | jq .
# To acknowledge an event, POST its EventId back to the same endpoint.
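
The poll, react and acknowledge loop described above can be sketched in plain shell. The sample payload below follows the documented shape but its values are invented, and the helper names are hypothetical. To stay dependency-free the sketch extracts fields with `tr`/`grep`/`sed`, which is fragile; a real handler should use a proper JSON parser such as `jq`.

```shell
IMDS_URL="http://169.254.169.254/metadata/scheduledevents?api-version=2020-07-01"

# Made-up sample in the shape returned by the endpoint.
SAMPLE='{"DocumentIncarnation":3,"Events":[{"EventId":"A1B2-0001","EventType":"Preempt","ResourceType":"VirtualMachine","Resources":["vm-spot-worker"],"EventStatus":"Scheduled","NotBefore":"Mon, 19 Sep 2022 18:29:47 GMT"}]}'

# event_ids_of_type JSON TYPE -> EventIds of events whose EventType is TYPE.
# Splits the document at each "{" so every event object lands on its own line.
event_ids_of_type() {
  printf '%s' "$1" | tr '{' '\n' \
    | grep "\"EventType\": *\"$2\"" \
    | sed -n 's/.*"EventId": *"\([^"]*\)".*/\1/p'
}

# ack_body EVENT_ID -> JSON body that approves the event.
ack_body() {
  printf '{"StartRequests":[{"EventId":"%s"}]}' "$1"
}

# acknowledge_event EVENT_ID -> POST the approval (only works inside an Azure VM).
acknowledge_event() {
  curl -s -H "Metadata:true" -X POST -d "$(ack_body "$1")" "$IMDS_URL"
}

# Demo against the sample payload; nothing is sent over the network here.
for id in $(event_ids_of_type "$SAMPLE" Preempt); do
  echo "would checkpoint and deregister, then POST: $(ack_body "$id")"
done
```

In a real agent, replace `$SAMPLE` with the output of the `curl` GET shown above, run the loop every second or so, and call `acknowledge_event` only after the drain or checkpoint step has finished.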

Hands-on lab

In this lab you exercise three of the most practical specialized-compute features from Cloud Shell — a proximity placement group, a Spot VM with an eviction policy, and reading Scheduled Events from inside the VM. (Dedicated hosts, confidential VMs and full HPC clusters incur real cost and quota; they are described above rather than provisioned here.) Everything below runs on standard, low-cost sizes and is fully torn down at the end.

Step 1 — Resource group, region and a proximity placement group.

RG=rg-spec-compute-lab
LOC=centralindia
az group create -n $RG -l $LOC
az ppg create -g $RG -n ppg-lab --location $LOC

Expected: the PPG is created ("proximityPlacementGroupType": "Standard").

Step 2 — Deploy a small Spot VM into the PPG with a Delete eviction policy.

az vm create -g $RG -n vm-spot-lab \
  --image Ubuntu2204 --size Standard_D2s_v5 \
  --ppg ppg-lab \
  --priority Spot --eviction-policy Delete --max-price -1 \
  --admin-username azureuser --generate-ssh-keys

Expected: the VM deploys with "priority": "Spot" and "evictionPolicy": "Delete". B-series sizes are not supported for Spot, which is why the lab uses a small D-series size. (If you get a Spot allocation error, that region/size has no spare capacity right now; try another small Spot-capable size such as Standard_D2as_v5, or a different region.)

Step 3 — Confirm the Spot and PPG settings.

az vm show -g $RG -n vm-spot-lab \
  --query "{priority:priority, eviction:evictionPolicy, ppg:proximityPlacementGroup.id}" -o jsonc

Validation: priority is Spot, eviction is Delete, and ppg references ppg-lab — proving the VM is both a Spot instance and pinned to the proximity placement group.

Step 4 — Read Scheduled Events from inside the VM.

# SSH in (the IMDS endpoint is only reachable from inside the VM)
ssh azureuser@$(az vm show -d -g $RG -n vm-spot-lab --query publicIps -o tsv)

# Inside the VM:
curl -s -H "Metadata:true" \
  "http://169.254.169.254/metadata/scheduledevents?api-version=2020-07-01"
exit

Expected: a JSON document {"DocumentIncarnation": N, "Events": []} — an empty Events array means no maintenance is currently scheduled for this VM. (When Azure later schedules maintenance, or evicts this Spot VM, an event with EventType such as Preempt or Reboot would appear here, which a real app would poll for and act on.)
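The "poll for and act on" behaviour described above could look roughly like this. It is a minimal sketch: has_preempt, on_preempt and poll_scheduled_events are hypothetical names, the match is a plain text search rather than real JSON parsing, and the one-second interval is a choice made to leave time to react within the short Preempt notice.

```shell
#!/usr/bin/env bash
# Sketch: poll Scheduled Events and react to a Spot eviction (Preempt).
# has_preempt, on_preempt and poll_scheduled_events are illustrative names.

IMDS_URL="http://169.254.169.254/metadata/scheduledevents?api-version=2020-07-01"

# Succeeds (exit 0) if the JSON on stdin contains an event of type Preempt.
has_preempt() {
  grep -Eq '"EventType"[[:space:]]*:[[:space:]]*"Preempt"'
}

# Placeholder for the application's own checkpoint/drain logic.
on_preempt() {
  echo "Preempt event seen: checkpointing and draining"
}

# Poll once per second until a Preempt event appears, then react once.
# Intended to run as a background service inside the VM.
poll_scheduled_events() {
  while true; do
    if curl -s -H "Metadata:true" "$IMDS_URL" | has_preempt; then
      on_preempt
      break
    fi
    sleep 1
  done
}
```

A production agent would parse the document properly (for example with jq), check that the event's Resources list names its own VM, and track DocumentIncarnation to skip unchanged responses.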

Step 5 (read-only) — Inspect what a host group would look like. Without creating one, you can list the host SKUs available in the region to see what dedicated-host families you could deploy:

az vm list-skus --location $LOC --resource-type hostGroups/hosts -o table 2>/dev/null \
  || echo "Host SKUs vary by region/subscription; check the portal Dedicated Hosts blade."

Cleanup.

az group delete -n $RG --yes --no-wait

Cost note (INR): the lab’s only running cost is the small Spot VM, billed per second at the Spot price, which is typically a fraction of the pay-as-you-go rate for the same size. A 20-30 minute lab is therefore on the order of a rupee or two, plus a few paise for the small OS disk while it exists. The PPG, the Scheduled Events endpoint and listing SKUs are free. az group delete removes everything in the resource group, including the OS disk, returning the cost to zero (and because the eviction policy is Delete, an evicted VM leaves no disk behind either). Dedicated hosts and confidential VMs are materially more expensive and were deliberately not provisioned here. Never leave a dedicated host running idle, as it bills for the whole physical server.
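The arithmetic behind that estimate is simply hourly rate times run time. A small sketch, where estimate_cost is a hypothetical helper and the rate in the example is a made-up figure to be replaced with the current Spot price for your size and region:

```shell
# Sketch: estimate the lab's VM cost from an hourly Spot rate and a run time.
# estimate_cost is an illustrative helper; pass in the rate shown on the
# pricing page at the time you run the lab.
estimate_cost() {
  local rate_per_hour="$1" minutes="$2"
  awk -v r="$rate_per_hour" -v m="$minutes" 'BEGIN { printf "%.2f\n", r * m / 60 }'
}

# Example with a made-up rate of INR 2.00 per hour for a 30-minute lab:
estimate_cost 2.00 30   # prints 1.00
```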

Common mistakes & troubleshooting

| Symptom | Likely cause | Fix |
| --- | --- | --- |
| Dedicated-host bill far higher than expected | You pay for the whole host whether or not VMs fill it; an idle/under-packed host still bills | Right-size and pack hosts; apply a reserved instance to the host; delete hosts you aren’t using |
| VM fails to deploy onto a dedicated host | Host group is in manual placement and you didn’t specify --host (or the host is full / wrong SKU family) | Specify the target host, or switch the group to automatic placement; check the host has free cores and matches the VM family |
| PPG deployment fails with allocation error | The PPG’s anchor location can’t host the requested size | Deploy the largest/rarest size first; deploy all PPG VMs together; try another zone/region |
| Spot VM keeps getting evicted | Volatile capacity for that size/region, or your max price is below market | Use --max-price -1 (capacity-only); pick a less contested size/region; spread across sizes; mix Spot + regular in a scale set |
| Spot eviction caused data loss | Workload wasn’t checkpointing and ignored the Preempt event | Consume Scheduled Events; checkpoint frequently; use Deallocate policy if you must keep disks, or design the job to resume |
| HPC job doesn’t scale past a few nodes | Not using InfiniBand/RDMA (wrong VM family, base image without drivers, or non-RDMA MPI) | Use an HB/HC/HX/ND family + the HPC marketplace image + an RDMA-capable MPI; place nodes in a PPG/scale set |
| Confidential VM won’t boot a given image | The OS image isn’t a supported confidential-VM image, or the family isn’t available in the region | Use a marketplace confidential-VM image; check region/family availability; verify --security-type ConfidentialVM and vTPM/secure-boot settings |
| App got rebooted with no warning | The application never polled Scheduled Events, so it didn’t drain before planned maintenance | Poll the IMDS Scheduled Events endpoint; react to and acknowledge events; for control over timing on eligible resources, apply maintenance control |

Best practices

Security notes

Interview & exam questions

1. When would you choose an Azure Dedicated Host over standard VMs? When you need single-tenant physical isolation (compliance/regulatory mandate), bring-your-own-licence economics (per-physical-core licences like SQL/Windows via Hybrid Benefit are cheaper on hardware you control), or maintenance control (self-scheduling host updates within a change window). You trade shared-infrastructure efficiency for isolation and control, and you pay for the whole host regardless of utilization.

2. What is the difference between a host group and a dedicated host? A dedicated host is the physical server; a host group is the container that holds one or more hosts and defines the placement boundary — its availability zone and fault-domain count (both fixed at creation). You create the host group first, then add hosts into its fault domains.

3. Automatic vs manual placement on dedicated hosts? Manual (default) requires you to assign each VM to a named host (--host); automatic lets Azure choose a host within the group. Automatic is recommended at scale and is required to put a scale set on a host group.

4. What does a proximity placement group do, and what’s the trade-off? It forces VMs to be physically co-located for the lowest, most consistent inter-VM network latency. The trade-off is allocation flexibility: the first VM anchors the location and later VMs must fit there, so rare/large sizes can fail to allocate — mitigate by deploying the largest size first. A PPG is effectively within a single zone, so it pulls against zone resilience.

5. Explain Spot VM eviction — the two “types” and the two “policies”. Eviction type = why it’s evicted: capacity-only (max price -1, evicted only when Azure needs capacity) or price-or-capacity (a max price; also evicted if market price exceeds it). Eviction policy = what happens: Deallocate (stop, keep disks, resume later — you still pay for disks) or Delete (remove the VM/disks, no further cost). Notice comes via a Preempt Scheduled Event with ~30 seconds’ warning.

6. What is confidential computing and which CPU technologies back it on Azure? Hardware-based protection of data in use: the CPU encrypts VM/container memory with a key the host/hypervisor never sees, inside a Trusted Execution Environment. On Azure it’s AMD SEV-SNP (DCas/ECas families) and Intel TDX (DCes/ECes families). It complements at-rest and in-transit encryption — the “third state”.

7. What is remote attestation and why does it matter for confidential workloads? It’s the process of cryptographically proving a workload is genuinely running in a legitimate, unmodified TEE before trusting it. The hardware emits a signed attestation report; Microsoft Azure Attestation verifies it and issues a token, which gates secure key release from Key Vault/Managed HSM — so secrets only ever reach a verified enclave.

8. Why do HPC VM families have InfiniBand, and what is RDMA? Tightly-coupled HPC is limited by inter-node communication, not CPU. InfiniBand is a dedicated low-latency, high-bandwidth back-end fabric; RDMA lets one node access another’s memory directly, bypassing the OS/CPU at single-digit-microsecond latency. Together with an RDMA-capable MPI, they let parallel jobs scale to hundreds of nodes instead of stalling on communication overhead.

9. Azure Batch vs Azure CycleCloud — when each? Azure Batch is a managed job-scheduling service (pools → jobs → tasks, autoscale to zero, Spot nodes) for cloud-native, app-integrated, embarrassingly-parallel and HPC work — you don’t run a scheduler. CycleCloud orchestrates a traditional HPC scheduler (Slurm/PBS/LSF/Grid Engine) on autoscaling Azure capacity for teams who want to bring their existing cluster and workflow. Build batch into an app → Batch; lift an existing HPC cluster → CycleCloud.

10. What are Scheduled Events and how does an application use them? A part of the Instance Metadata Service (169.254.169.254, reachable only inside the VM) that warns of imminent maintenance (Reboot/Redeploy/Freeze/Preempt/Terminate) with advance notice. A maintenance-aware app polls the endpoint, drains/checkpoints/fails over when it sees an event for its node, then acknowledges the event to let Azure proceed immediately.

11. Planned vs unplanned maintenance — what’s the difference in impact? Unplanned is a hardware failure: Azure auto-recovers the VM with a few minutes’ downtime (zones/sets limit blast radius). Planned is routine platform updates, often zero-impact via live migration (brief pause) or sometimes a reboot; on eligible resources you can self-schedule these with maintenance control.

12. How do you architect a cost-efficient render farm on Azure? Use Azure Batch with an autoscaling pool of Spot nodes (each frame a task), set --max-price -1 and Delete eviction policy so capacity comes and goes cleanly, scale the pool to zero when idle, and consume Preempt events to requeue interrupted frames. You pay only while frames render, at the deep Spot discount.
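The host group, host and VM ordering from questions 2 and 3 can be sketched with the CLI as below. Treat it as an illustration under assumptions: the resource names, the host SKU (DSv3-Type3) and the VM size are placeholders to check against what your region offers, and the script defaults to a dry run that only prints the commands, because a dedicated host bills for the whole physical server.

```shell
#!/usr/bin/env bash
# Sketch: host group -> host -> VM on a dedicated host.
# Names, host SKU and VM size are placeholders. DRY_RUN=1 (default) only
# prints the commands; set DRY_RUN=0 to actually run them (this incurs cost).

RG="${RG:-rg-spec-compute-lab}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

deploy_host_group() {
  # 1. Host group: zone and fault-domain count are fixed at creation.
  run az vm host group create -g "$RG" -n hg-lab -z 1 \
    --platform-fault-domain-count 1 --automatic-placement true

  # 2. Host: added into one of the group's fault domains.
  run az vm host create -g "$RG" --host-group hg-lab -n host-0 \
    --sku DSv3-Type3 --platform-fault-domain 0

  # 3. VM: with automatic placement, name only the group and let Azure pick the host.
  run az vm create -g "$RG" -n vm-on-host --image Ubuntu2204 \
    --size Standard_D2s_v3 --zone 1 --host-group hg-lab \
    --admin-username azureuser --generate-ssh-keys
}

deploy_host_group
```

With manual placement, step 3 would name a specific host with --host instead of --host-group.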

Quick check

  1. You must run a workload on a server not shared with any other tenant, and licence SQL Server by physical core as cheaply as possible. Which compute option?
  2. A trading cluster needs the lowest possible network latency between its VMs. What do you create, and what’s the main risk?
  3. You want the cheapest compute for an interruptible batch job and want no lingering disk cost after eviction. Which --priority, --eviction-policy and --max-price?
  4. Your regulator requires that even Microsoft cannot read the data while it’s being processed. Which class of compute, and what proves the environment is genuine?
  5. An MPI simulation stops scaling past 8 nodes. Name the three things that must all be in place for it to scale.

Answers

  1. Azure Dedicated Host — single-tenant physical isolation, and you can licence the actual physical cores (BYOL / Azure Hybrid Benefit), usually making it cheaper than fully-licensed standard VMs for steady-state SQL.
  2. A proximity placement group — the main risk is allocation failure, because the first VM anchors the physical location and later (especially large/rare) sizes may not fit there; deploy the largest size first and all VMs together.
  3. --priority Spot --eviction-policy Delete --max-price -1 — Spot for the deep discount, Delete so the VM and disks vanish on eviction (no further cost), max price -1 for capacity-only eviction at up to the pay-as-you-go rate.
  4. Confidential VMs / confidential containers (AMD SEV-SNP or Intel TDX) protect data in use; remote attestation (verified by Microsoft Azure Attestation) cryptographically proves the workload runs in a genuine, unmodified TEE before secrets are released.
  5. An InfiniBand-equipped HPC VM family (HB/HC/HX/ND), the HPC marketplace image with the RDMA drivers, and an RDMA-capable MPI — plus, ideally, a proximity placement group so the nodes share the InfiniBand spine.

Exercise

Design (on paper or in Bicep) a resilient, cost-aware specialized-compute estate for a quantitative-research firm with three needs. (1) A regulated risk-calc tier that, per the compliance team, must run on single-tenant hardware with self-scheduled patching — specify the host group layout (zones, fault domains, placement mode), the maintenance configuration, and how you’d apply a reserved instance to control cost. (2) An overnight Monte-Carlo batch that should cost as little as possible — choose between Azure Batch and CycleCloud, justify it, and specify the pool sizing, Spot settings (eviction type/policy/max price) and the autoscale-to-zero behaviour, plus how tasks survive eviction. (3) A tightly-coupled CFD model the engineers run on ~100 cores — pick the HPC VM family, the image, the interconnect/MPI, and the placement strategy, and explain in two sentences why this combination scales where standard VMs wouldn’t. Finish with a short note on which tiers (if any) should use Confidential VMs and why.

Certification mapping

This lesson supports the compute-specialization corners of both major Azure certifications:

Glossary

Next steps


Vinod is a Senior Cloud Architect (22+ yrs) — available for Azure / AWS / GCP architecture, landing zones, and migrations.
