Almost every interesting failure, cost surprise, and performance complaint on Azure IaaS eventually traces back to a disk. A SQL Server that “randomly” stalls under load is usually a Standard SSD pretending to be Premium. A surprise line on the bill is usually an Ultra Disk left attached to a deallocated VM, or a forgotten 4 TB Premium snapshot. A VM that won’t boot after a “harmless” caching change is a data disk that had write-back caching turned on while the application assumed durability. Disks are where the abstraction of “just a virtual machine” meets the very physical reality of IOPS, throughput, latency, replication, and money.
This is the deep dive that makes disks stop being a mystery. An Azure managed disk is a block-storage device that Azure provisions and manages for you as a first-class Azure Resource Manager (ARM) resource — you pick a type and a size, and Azure handles the underlying storage account, replication, and placement. You will leave this lesson knowing every disk type and when to choose it, how disk size maps to performance, what host caching actually does (and when it will corrupt your data if you get it wrong), every encryption option Azure offers and how they stack, and the operational toolkit — snapshots, images, shared disks, performance tiers, online resize, OS-disk swap, and ephemeral OS disks. We will cover the settings you choose when you create a disk and the ones you can (and cannot) change afterwards, with working az CLI and Bicep for each core operation.
Learning objectives
By the end of this lesson you will be able to:
- Choose the correct disk type (Standard HDD, Standard SSD, Premium SSD v1, Premium SSD v2, Ultra) for a given workload by reasoning about IOPS, throughput, latency, and cost.
- Explain how disk size tiers (P/E/S) map to provisioned performance, and how Premium SSD v2 and Ultra decouple size from performance.
- Configure host caching (None / ReadOnly / ReadWrite) correctly per disk role and explain why the wrong choice can corrupt data.
- Distinguish the OS disk, data disks, and the temporary disk, and know which one survives a deallocate.
- Apply Azure’s encryption options — server-side encryption with platform or customer-managed keys, encryption at host, Azure Disk Encryption, double encryption, and confidential disk encryption — and know when each is required.
- Operate disks day-to-day: incremental snapshots, images, shared disks for clustering, performance tiers, online resize, OS-disk swap, and ephemeral OS disks, with az and Bicep.
Prerequisites & where this fits
You should be comfortable creating a virtual machine and reasoning about regions and resource groups; if VMs are new, read the Azure Virtual Machines deep dive first, since the Disks tab of VM creation is where most people first meet these options. You will get more out of the encryption section if you have seen Azure Key Vault before. This lesson sits in the Compute module of the Azure Zero-to-Hero course, immediately after the VM and VM-resilience lessons and immediately before networking — disks are the storage layer that every VM stands on, so we cover them while VMs are fresh and before we move the discussion to the network.
Core concepts
Managed vs unmanaged (the history that explains the model). In the original Azure model you created storage accounts yourself and dropped VM disks into them as page blobs (“unmanaged disks”). You had to spread VMs across many storage accounts to avoid hitting a per-account IOPS cap (20,000 IOPS), you managed your own naming and containers, and an availability-set deployment could silently put two VMs’ disks in the same storage scale unit and defeat the whole point of the availability set. Managed disks, now the default and the only type you should use, make the disk itself the ARM resource: Azure picks and manages the backing storage, enforces the per-disk performance you provisioned, automatically distributes disks of availability-set VMs across fault domains, and gives you role-based access control, resource locks, tags, and Azure Policy on the disk like any other resource. Unmanaged disks are deprecated and being retired — treat “disk” and “managed disk” as synonyms from here on.
The three roles a disk can play. Every VM has exactly one OS disk (a registered, bootable disk, max 4 TiB, mounted as C: on Windows or / on Linux, with ReadWrite caching on by default). A VM can have zero or more data disks: empty block devices you attach for application data, databases, and logs; the number you can attach is capped by the VM size (a small VM might allow 4, a large one 64). And most VM sizes ship a temporary disk (the “temp disk”; D: on Windows, /dev/sdb mounted at /mnt on Linux): a local SSD physically attached to the host, not a managed disk, not persisted, and wiped on deallocate, host maintenance, or resize. The temp disk is free and fast and perfect for OS paging files, tempdb, and scratch, and catastrophic if you ever store anything you care about on it.
Provisioned performance is what you pay for. With the classic tiers (Standard HDD/SSD, Premium SSD v1), performance is a fixed function of the size you pick — choosing a bigger disk is how you buy more IOPS. With the newer types (Premium SSD v2 and Ultra), capacity and performance are decoupled: you provision GiB, IOPS, and MB/s independently and are billed for each. The mental model to carry: you pay for provisioned capacity and (on v2/Ultra) provisioned performance, not for what you actually use.
IOPS vs throughput vs latency. IOPS is operations per second (matters for small random I/O — databases, OLTP). Throughput is MB/s (matters for large sequential I/O — backups, analytics, media). Latency is the time per operation (matters for chatty, latency-sensitive apps). A disk can be IOPS-bound, throughput-bound, or latency-bound, and the VM size has its own disk IOPS/throughput ceiling — your effective performance is the minimum of the disk limits and the VM’s limits. A Premium disk on an undersized VM, or a VM with no Premium support, will never hit the disk’s rated numbers.
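As a rough illustration of the "minimum of the disk limits and the VM's limits" rule, the sketch below computes the ceiling a workload would actually see. The function name and the sample figures are invented for the example; real numbers come from the disk's provisioned performance and the VM size's published disk limits.

```bash
#!/usr/bin/env bash
# effective_limit DISK_LIMIT VM_LIMIT
# Prints the smaller of the two limits, i.e. the ceiling the workload sees.
effective_limit() {
  local disk=$1 vm=$2
  if (( disk < vm )); then
    echo "$disk"
  else
    echo "$vm"
  fi
}

# Hypothetical case: a 5,000-IOPS disk attached to a VM capped at 3,200 disk IOPS.
effective_limit 5000 3200   # prints 3200
```

The same comparison applies separately to throughput (MB/s), so a disk can be capped by the VM on one dimension and not the other.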
Disk types: the master comparison
This is the single most important table in the lesson. The five managed-disk types, side by side:
| Disk type | Media | Max size | Max IOPS (per disk) | Max throughput | Typical latency | Performance model | Best for |
|---|---|---|---|---|---|---|---|
| Standard HDD | Magnetic | 32 TiB | ~2,000 | ~500 MB/s | ms (10ms+), variable | Fixed by size tier (S) | Dev/test, backup, infrequent/cold, cost-first |
| Standard SSD | SSD | 32 TiB | ~6,000 | ~750 MB/s | single-digit ms | Fixed by size tier (E) | Web servers, light prod, dev/test that needs consistency |
| Premium SSD v1 | SSD | 32 TiB | 20,000 | 900 MB/s | low single-digit ms | Fixed by size tier (P) | Production, databases, latency-sensitive; Premium or Ultra required for the 99.9% single-VM SLA |
| Premium SSD v2 | SSD | 64 TiB | 80,000 | 1,200 MB/s | sub-ms | Independently provisioned IOPS + MB/s | Most new production; best price/performance |
| Ultra Disk | NVMe-class | 64 TiB | 400,000 | 10,000 MB/s | sub-ms | Independently provisioned IOPS + MB/s, adjustable live | Top-tier OLTP, SAP HANA, high-end SQL, message queues |
Read that as a ladder of price and capability: Standard HDD is the cheapest and slowest; Ultra is the fastest and (for high performance) the priciest. A few load-bearing nuances behind the numbers:
- Premium SSD v1 unlocks the top single-instance VM SLA. Microsoft’s 99.9% SLA for a single VM (no availability set/zone) requires all OS and data disks to be Premium SSD or Ultra. A single VM on Standard SSD drops to 99.5%, and on Standard HDD to 95%. This is a classic exam point.
- Premium SSD v2 is usually the new default for production. It typically costs less than v1 for the same provisioned performance, has sub-millisecond latency, starts every disk with a free baseline (3,000 IOPS and 125 MB/s) regardless of size, and lets you dial IOPS and throughput up or down independently. Its main constraints: it cannot (yet) be an OS disk in most configurations, it does not support host caching, and zone/region availability is narrower than v1.
- Ultra is for the extreme tail. Up to 400,000 IOPS and 10,000 MB/s, sub-millisecond latency, and you can change provisioned IOPS/throughput without detaching or rebooting. It cannot be an OS disk, does not support host caching, does not support snapshots in the same simple way (incremental snapshot support arrived later and has constraints), and you must enable Ultra compatibility on the VM (--ultra-ssd-enabled) in a supported zone.
Provisioned vs on-demand, and bursting
There are two different “elasticity” stories you must not conflate:
Provisioned vs on-demand performance (a property of the type):
- Standard HDD/SSD and Premium SSD v1: performance is provisioned by size — the tier you pick fixes baseline IOPS/throughput. You change performance by changing the size tier (or, for Premium v1, by setting a higher performance tier without growing the disk — see below).
- Premium SSD v2 and Ultra: performance is on-demand provisioned — you set IOPS and MB/s as independent dials, separate from capacity, and you are billed for the provisioned numbers. There is no “size tier” forcing your hand.
Bursting — two distinct models, both about handling short spikes above baseline:
| Bursting model | Applies to | How it works | Cost |
|---|---|---|---|
| Credit-based bursting | Premium SSD v1 (P20 and smaller), Standard SSD (E30 and smaller) | Disk accrues burst credits while idle/below baseline; spends them to burst up to a fixed ceiling (e.g. 3,500 IOPS / 170 MB/s on Premium) for up to ~30 min. Free, automatic, on by default. | Free |
| On-demand bursting | Premium SSD v1 (P30 and larger) | Disk can burst up to a much higher ceiling (e.g. 30,000 IOPS / 1,000 MB/s) with no credit limit — burst as long as you need. Must be explicitly enabled; billed per transaction above baseline plus an enablement fee. | Paid |
Premium SSD v2 and Ultra don’t “burst” in this sense — you simply provision the IOPS/MB/s you want. The interview-grade summary: credit-based bursting is free, automatic, and capped by accrued credits and a low ceiling; on-demand bursting is paid, uncapped in duration, and only on larger Premium v1 disks.
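Because on-demand bursting has to be switched on explicitly, a small wrapper like the one below may be useful. This is a sketch, not a definitive procedure: the function name and the disk name in the usage comment are hypothetical, and the `AZ` variable is a convention introduced here so the command can be previewed with `AZ=echo` instead of being sent to Azure. It is also worth checking whether the disk has to be detached (or the VM deallocated) before the setting can change.

```bash
#!/usr/bin/env bash
# Set AZ=echo to print the command instead of running it (dry run).
AZ=${AZ:-az}

# enable_on_demand_bursting RESOURCE_GROUP DISK_NAME
# Turns on paid, on-demand bursting for an eligible Premium SSD v1 disk.
enable_on_demand_bursting() {
  local rg=$1 disk=$2
  "$AZ" disk update -g "$rg" -n "$disk" --enable-bursting true
}

# Usage (hypothetical disk name):
#   enable_on_demand_bursting rg-disks-lab data-premium-1024
```

Passing `--enable-bursting false` in the same command would turn the feature off again.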
OS disk vs data disk vs temp disk
Putting the three roles together with the operational facts you must remember:
| Aspect | OS disk | Data disk | Temporary disk |
|---|---|---|---|
| Count per VM | Exactly 1 | 0 to (VM-size limit, up to 64) | 0 or 1 (depends on size) |
| Persisted (managed)? | Yes | Yes | No — local to host |
| Survives deallocate? | Yes | Yes | No — wiped |
| Default caching | ReadWrite | ReadOnly (Premium) / None | n/a |
| Max size | 4 TiB | up to 64 TiB (type-dependent) | fixed by VM size |
| Typical use | Boot volume, OS | App data, DB files, logs | Page file, tempdb, scratch |
| Billed | Yes | Yes | Free (included in VM) |
Three rules that prevent most data-loss incidents: never put durable data on the temp disk; never store data you need on the OS disk if you can use a data disk (separating OS and data makes resize, swap, and backup cleaner); and remember that on Linux the temp disk device letter can change — mount data disks by UUID in /etc/fstab, not by /dev/sdX, or a reboot/resize can mount the wrong device.
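To make the mount-by-UUID rule concrete, the helper below builds an /etc/fstab entry from a filesystem UUID. It is a minimal sketch: the function name, the UUID, and the mount point are placeholders, and the `nofail` option is an addition not mentioned above (it is commonly used so that a missing data disk does not block boot).

```bash
#!/usr/bin/env bash
# fstab_line UUID MOUNTPOINT FSTYPE
# Prints an /etc/fstab entry that identifies the filesystem by UUID
# rather than by a /dev/sdX name that can change between boots.
fstab_line() {
  local uuid=$1 mountpoint=$2 fstype=$3
  printf 'UUID=%s %s %s defaults,nofail 0 2\n' "$uuid" "$mountpoint" "$fstype"
}

# On a real VM the UUID would come from something like:
#   sudo blkid -s UUID -o value /dev/sdc1
# Placeholder UUID and mount point for illustration:
fstab_line "33333333-3b3b-3c3c-3d3d-3e3e3e3e3e3e" /datadrive xfs
```

The printed line is what you would append to /etc/fstab after formatting and mounting the data disk once by hand.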
Disk size tiers (P/E/S) and how size maps to performance
The classic types use lettered size tiers, and each tier is a fixed bundle of capacity + baseline performance:
- P-series = Premium SSD v1 (P1, P2, … P80). Example: P10 = 128 GiB / 500 IOPS / 100 MB/s; P30 = 1 TiB / 5,000 IOPS / 200 MB/s; P40 = 2 TiB / 7,500 IOPS / 250 MB/s; P80 = 32 TiB / 20,000 IOPS / 900 MB/s.
- E-series = Standard SSD (Economy SSD): E1…E80, same capacities as P but lower, less consistent performance and a different (cheaper) price.
- S-series = Standard HDD: S4…S80, magnetic, lowest cost, highest and most variable latency.
The pattern to internalise: on classic types, bigger = faster. If a P10 doesn’t give you enough IOPS, you don’t tune IOPS — you move to P15/P20/P30. This is also why people over-provision capacity just to buy IOPS, and exactly the pain that Premium SSD v2 removes.
| Need | Classic-type answer | v2/Ultra answer |
|---|---|---|
| More capacity | Pick a larger P/E/S tier | Increase provisioned GiB |
| More IOPS | Pick a larger tier (or set a higher performance tier on Premium v1) | Increase provisioned IOPS independently |
| More throughput | Pick a larger tier | Increase provisioned MB/s independently |
A subtle gotcha: on the classic types, the size you ask for is rounded up to the next tier for billing and performance (e.g. a 100 GiB Premium disk is billed, and performs, as a 128 GiB P10), so there’s rarely a reason to provision in-between sizes.
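The tier round-up can be sketched as a simple lookup. The function name is made up for the example, and the table covers Premium SSD v1 capacities only; the E and S series use the same capacity steps under different names.

```bash
#!/usr/bin/env bash
# premium_tier_for_gib REQUESTED_GIB
# Prints the smallest Premium SSD v1 tier that holds the requested size,
# followed by that tier's capacity in GiB.
premium_tier_for_gib() {
  local want=$1
  local names=(P1 P2 P3 P4 P6 P10 P15 P20 P30 P40 P50 P60 P70 P80)
  local sizes=(4 8 16 32 64 128 256 512 1024 2048 4096 8192 16384 32767)
  local i
  for i in "${!sizes[@]}"; do
    if (( want <= sizes[i] )); then
      echo "${names[i]} ${sizes[i]}"
      return 0
    fi
  done
  echo "no Premium SSD v1 tier holds ${want} GiB" >&2
  return 1
}

premium_tier_for_gib 100   # prints: P10 128
```

Asking for 129 GiB lands on P15 (256 GiB), which is why sizing just over a tier boundary is the expensive case.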
Host caching: None / ReadOnly / ReadWrite
Host caching uses the VM host’s RAM and local SSD as a cache in front of a managed disk. It is one of the highest-impact and most dangerous settings, because the wrong choice silently trades durability for speed.
| Caching mode | What it caches | Effect | Use it for | Danger |
|---|---|---|---|---|
| None | Nothing | All reads/writes go straight to the disk | Write-heavy disks, log disks, any disk where every write must be durable immediately; required for Premium v2/Ultra (they don’t support caching) | None — safest |
| ReadOnly | Reads | Read hits served from host cache (fast, low-latency reads); writes go straight through to disk | Read-heavy data disks, database data files (read-mostly), boot performance | None for durability; cache can serve stale data only if another writer bypasses the host (rare) |
| ReadWrite | Reads and writes | Writes are acknowledged from the host cache (write-back) before hitting the disk | The OS disk (default), and only apps that manage their own write consistency / flushing | Data loss / corruption if the host fails before cached writes are flushed and the app assumed the write was durable |
The rules that matter in practice and in exams:
- OS disk → ReadWrite (the default). The OS handles its own flush semantics, and boot/read performance benefits.
- Database log files / write-heavy disks → None. A transaction log must be durable on every commit; write-back caching breaks that guarantee.
- Database data files (read-heavy) → ReadOnly. SQL Server and similar see a large read-latency win and write durability is preserved.
- Premium SSD v2 and Ultra → None only. They don’t support host caching at all; the portal will force None.
- Caching is per-disk and you can change it after creation (it triggers a brief detach/reattach for data disks, which on a live VM may need the disk offline momentarily — plan it).
The single sentence to memorise: ReadWrite caching is safe for the OS disk and dangerous for data you can’t afford to lose; logs get None; read-heavy data gets ReadOnly.
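The caching rules above are simple enough to encode as a lookup, which can be handy in provisioning scripts. The role names used here (os, log, data-read, v2, ultra) are labels invented for this sketch, not Azure terms.

```bash
#!/usr/bin/env bash
# recommended_caching ROLE
# Maps a disk role to the host-caching mode recommended in this lesson.
recommended_caching() {
  case "$1" in
    os)        echo ReadWrite ;;  # OS disk default
    log)       echo None ;;       # every write must be durable
    data-read) echo ReadOnly ;;   # read-heavy data files
    v2|ultra)  echo None ;;       # host caching not supported
    *)         echo "unknown role: $1" >&2; return 1 ;;
  esac
}

recommended_caching log   # prints None
```

The output can be passed straight to the --caching flag of az vm disk attach.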
Encryption: every option and how they stack
Azure gives you several encryption mechanisms that operate at different layers. They are not mutually exclusive — some compose. Work through them as a stack.
| Mechanism | Where it runs | Key owner | Encrypts | When required / chosen |
|---|---|---|---|---|
| SSE with platform-managed keys (PMK) | Azure storage infrastructure | Microsoft | Data at rest on the disk (and snapshots) | Always on by default, free, transparent, no action needed |
| SSE with customer-managed keys (CMK) | Azure storage infrastructure | You (in Key Vault, via a Disk Encryption Set) | Data at rest | Compliance/regulatory key-control, key rotation, revocation |
| Encryption at host | The VM host (hypervisor) | Microsoft (PMK) or you (CMK) | OS disk, data disks, AND the temp disk + caches — end to end from the host | When you need the temp disk and host caches encrypted too; modern recommended default |
| Azure Disk Encryption (ADE) | Inside the guest OS (BitLocker on Windows, DM-Crypt on Linux) | You (Key Vault) | OS and data volumes from inside the guest | Legacy/compliance requiring in-guest, OS-level encryption keys |
| Double encryption at rest | Storage infra (two layers: platform + CMK) | Microsoft + you | Data at rest, twice with two different keys/algorithms | Highest at-rest assurance / specific compliance mandates |
| Confidential disk encryption | Confidential VM, key bound to a vTPM in the TEE | You/Microsoft, bound to the VM’s TEE | OS disk, with the key protected inside the confidential compute boundary | Confidential VMs (AMD SEV-SNP / Intel TDX) needing the OS disk key sealed to the TEE |
How to reason about them:
- SSE/PMK is the floor — every managed disk and snapshot is encrypted at rest with a Microsoft-managed key, always, at no cost. There is nothing to enable.
- SSE/CMK swaps the platform key for a key you hold in Azure Key Vault, referenced through a Disk Encryption Set (DES). You can rotate or revoke the key (revoking it makes the disk unreadable — your kill switch). Still transparent to the OS; still doesn’t cover the temp disk.
- Encryption at host is the modern recommendation when you want everything (OS, data, and the ephemeral temp disk and host caches) encrypted, with no performance hit and no in-guest agent. It must be enabled at the subscription feature level and then per-VM (--encryption-at-host); the VM size must support it.
- ADE is the older, in-guest approach (BitLocker/DM-Crypt). It encrypts inside the OS, needs a Key Vault, has VM-size and OS constraints, and is generally being superseded by encryption at host for new builds. You’ll still meet it in compliance estates and on the exam.
- Double encryption at rest layers a platform key and your CMK at the infrastructure level for defence-in-depth.
- Confidential disk encryption only applies to Confidential VMs and seals the OS-disk key to the hardware-based trusted execution environment.
These compose along their layers: you can run encryption at host with a CMK, and confidential disks build on the confidential-VM TEE. ADE and encryption-at-host are generally not combined. Exam reflex: PMK is automatic and free; CMK gives you key control via a Disk Encryption Set; encryption at host is the only platform-side option that also covers the temp disk and host caches; ADE is in-guest BitLocker/DM-Crypt.
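For the CMK path, the moving parts are a Key Vault key, a Disk Encryption Set that points at it, and a disk that references the set. The sketch below strings the two az calls together under stated assumptions: every name and the key URL are placeholders, the Key Vault and key already exist, and `AZ` is a convention introduced here so the commands can be previewed with `AZ=echo`. Granting the Disk Encryption Set's managed identity access to the key is a required step that is deliberately left out.

```bash
#!/usr/bin/env bash
# Set AZ=echo to print the commands instead of running them (dry run).
AZ=${AZ:-az}

# create_cmk_disk RG LOCATION VAULT KEY_URL DES_NAME DISK_NAME
create_cmk_disk() {
  local rg=$1 loc=$2 vault=$3 key_url=$4 des=$5 disk=$6

  # 1. Disk Encryption Set that wraps the Key Vault key.
  "$AZ" disk-encryption-set create -g "$rg" -n "$des" -l "$loc" \
    --source-vault "$vault" --key-url "$key_url"

  # 2. Not shown: give the DES identity get/wrapKey/unwrapKey on the key.

  # 3. Managed disk encrypted at rest through the DES.
  "$AZ" disk create -g "$rg" -n "$disk" -l "$loc" --size-gb 128 \
    --sku Premium_LRS --disk-encryption-set "$des"
}

# Usage (all values hypothetical):
#   create_cmk_disk rg-disks-lab eastus kv-lab "$KEY_URL" des-lab data-cmk
```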
Snapshots: full vs incremental
A snapshot is a read-only, point-in-time copy of a managed disk, stored as its own ARM resource. Two flavours:
| | Full snapshot | Incremental snapshot |
|---|---|---|
| What it stores | The entire disk every time | Only the changes since the previous snapshot of that disk |
| Cost | Billed for full disk size each time | Billed only for the delta (much cheaper for a snapshot chain) |
| Redundancy | LRS or ZRS | LRS or ZRS |
| Restore | Standalone | Each incremental is independently restorable (Azure stitches the chain) |
| Recommendation | Legacy | Use these — cheaper and the modern default |
Always prefer incremental snapshots (--incremental true). Despite the name, each incremental snapshot is independently restorable — Azure manages the chain, so deleting an old one doesn’t break newer ones. Snapshots are crash-consistent by default; for application-consistent backups (e.g. quiescing a database) use Azure Backup, which coordinates with the guest. Snapshots inherit encryption from the source and can themselves be PMK/CMK-encrypted. A common cost leak: orphaned full snapshots of large disks — audit and prefer incrementals.
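A back-of-the-envelope comparison shows why a chain of incrementals is cheaper. The sketch assumes the first incremental snapshot stores a full copy and each later one stores only the changed GiB, and it uses the disk's full size for every full snapshot as in the table above. The function name and figures are illustrative, and actual billing depends on used size and region.

```bash
#!/usr/bin/env bash
# snapshot_gib_stored DISK_GIB CHANGED_GIB_PER_SNAPSHOT SNAPSHOT_COUNT
# Compares GiB stored by a chain of full snapshots vs incremental snapshots.
snapshot_gib_stored() {
  local disk=$1 delta=$2 count=$3
  local full=$(( disk * count ))                 # every snapshot is a full copy
  local incr=$(( disk + delta * (count - 1) ))   # one full copy, then deltas
  echo "full=${full} incremental=${incr}"
}

# Hypothetical: 1 TiB disk, 20 GiB of change between snapshots, 7 snapshots.
snapshot_gib_stored 1024 20 7   # prints: full=7168 incremental=1144
```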
Images vs snapshots
A managed image captures a generalized VM (one that has been run through sysprep on Windows or waagent -deprovision on Linux to strip machine-specific identity) so you can deploy many new VMs from it. A snapshot captures a single disk’s bytes as-is (specialized, not generalized) for backup or for cloning one disk. Rule of thumb: snapshot = back up or clone one disk; image = template for new VMs. For production image management at scale, use the Azure Compute Gallery (versioning, replication across regions, scaling) rather than standalone managed images — covered in the VMSS lesson.
Shared disks (clustering)
A shared disk is a managed disk attached to multiple VMs simultaneously (maxShares > 1), exposing shared block storage for clustered applications that bring their own cluster manager — Windows Server Failover Cluster with SCSI Persistent Reservations, SQL Server Failover Cluster Instances, Linux Pacemaker/SBD, scale-out file servers. Key facts and gotchas:
- Supported on Ultra, Premium SSD v2, Premium SSD v1, and Standard SSD (not Standard HDD); data disks only, not OS disks.
- maxShares is set on the disk and capped by disk size; host caching must be None on a shared disk.
- Azure does not arbitrate access: the application/cluster software must coordinate writes via SCSI PR, or you will corrupt the filesystem. A shared disk is raw shared block storage, not a clustered filesystem.
- This is for clustering, not general file sharing — for shared file access use Azure Files instead.
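Creating a shared disk differs from a normal disk only in the maxShares setting. The following is a sketch with placeholder names; the 1 TiB size is an arbitrary choice, and `AZ` is a convention introduced here so the command can be previewed with `AZ=echo`.

```bash
#!/usr/bin/env bash
# Set AZ=echo to print the command instead of running it (dry run).
AZ=${AZ:-az}

# create_shared_disk RG LOCATION DISK_NAME MAX_SHARES
# Creates a Premium SSD v1 data disk that several VMs may attach at once.
create_shared_disk() {
  local rg=$1 loc=$2 disk=$3 shares=$4
  "$AZ" disk create -g "$rg" -n "$disk" -l "$loc" \
    --size-gb 1024 --sku Premium_LRS --max-shares "$shares"
}

# Usage (hypothetical names): a disk shared by a two-node cluster.
#   create_shared_disk rg-disks-lab eastus shared-data 2
```

Each cluster node then attaches the same disk with caching set to None, and the cluster software takes over write coordination.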
Performance tiers (Premium SSD v1)
For Premium SSD v1, a performance tier lets you temporarily provision the IOPS/throughput of a larger tier without changing the disk’s capacity. A P10 (128 GiB) can be set to run at P30 performance for a predictable busy period (a sale, a batch window), then dialled back — without resizing the disk or any downtime. You pay for the higher tier while it’s active. This is distinct from credit-based bursting (free, automatic, credit-limited) and on-demand bursting (paid, transaction-billed). On Premium SSD v2 and Ultra you don’t need performance tiers at all — you just change the provisioned IOPS/MB/s dials directly.
Resize without downtime (online expansion)
You can grow a managed disk’s capacity, and on supported configurations you can do it without deallocating the VM (“online” or “live” resize, which applies to data disks). The constraints:
- You can only ever grow a disk, never shrink it. To go smaller you must create a new smaller disk and copy data.
- Growing the managed disk only enlarges the device; you must then extend the filesystem/partition inside the guest (resize2fs/xfs_growfs on Linux, Disk Management or Resize-Partition on Windows) to actually use the new space.
- For classic types, growing across a size-tier boundary also changes the provisioned performance and the price.
- Online (no-downtime) resize has VM-size and disk-type prerequisites; where unsupported, you stop/deallocate the VM, resize, then start.
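The in-guest half of a resize on Linux usually comes down to two commands: grow the partition, then grow the filesystem. The helper below only prints the commands so they can be reviewed first. It is a sketch that assumes /dev/sdX-style device names, a single data partition, and that growpart (from cloud-utils) is installed; the function name and paths are placeholders.

```bash
#!/usr/bin/env bash
# grow_fs_commands FSTYPE DEVICE PARTITION_NUMBER MOUNTPOINT
# Prints the commands that extend a partition and its filesystem after the
# managed disk itself has been grown. Nothing is executed.
grow_fs_commands() {
  local fstype=$1 dev=$2 part=$3 mnt=$4
  echo "sudo growpart ${dev} ${part}"
  case "$fstype" in
    ext4) echo "sudo resize2fs ${dev}${part}" ;;   # ext4 grows by device
    xfs)  echo "sudo xfs_growfs ${mnt}" ;;         # XFS grows by mount point
    *)    echo "unsupported filesystem: ${fstype}" >&2; return 1 ;;
  esac
}

grow_fs_commands ext4 /dev/sdc 1 /datadrive
```

Running `df -h` on the mount point before and after is a quick way to confirm the new space is visible.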
Swap OS disk
You can replace a VM’s OS disk with a different managed disk (or one restored from a snapshot) while keeping the same VM resource: its name, NICs, IP, size, and data disks all stay. This is the standard recovery move when the OS disk is corrupted or you want to roll the VM back to a known-good snapshot: create a new disk from the snapshot, stop/deallocate the VM, point the VM at the new OS disk, start. The new and old OS disks must be the same OS type and (ideally) generation. The swap itself is a single az vm update --os-disk command.
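The recovery sequence described above could be scripted roughly as follows. Treat it as a sketch: the function and resource names are hypothetical, the snapshot is assumed to be in the same resource group and of the same OS type, and `AZ` is a convention introduced here so the four commands can be previewed with `AZ=echo`.

```bash
#!/usr/bin/env bash
# Set AZ=echo to print the commands instead of running them (dry run).
AZ=${AZ:-az}

# swap_os_disk RG VM_NAME SNAPSHOT_NAME NEW_DISK_NAME
swap_os_disk() {
  local rg=$1 vm=$2 snapshot=$3 new_disk=$4
  # 1. Materialise a managed disk from the known-good snapshot.
  "$AZ" disk create -g "$rg" -n "$new_disk" --source "$snapshot"
  # 2. The VM must be stopped (deallocated) before its OS disk changes.
  "$AZ" vm deallocate -g "$rg" -n "$vm"
  # 3. Point the VM at the new OS disk.
  "$AZ" vm update -g "$rg" -n "$vm" --os-disk "$new_disk"
  # 4. Boot from the swapped disk.
  "$AZ" vm start -g "$rg" -n "$vm"
}

# Usage (hypothetical names):
#   swap_os_disk rg-disks-lab vm-disklab os-snap-good os-restored
```

The old OS disk is left in place and keeps billing until it is deleted.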
Ephemeral OS disks
An ephemeral OS disk stores the OS disk on the VM host’s local storage (cache or temp disk) instead of in remote managed storage. The trade:
- Pros: the OS disk is free (no managed-disk charge), and read/write latency to it is very low (local). Reimage is fast. Ideal for stateless, identical, reimage-on-the-fly fleets — VMSS nodes, AKS node pools, stateless web tiers.
- Cons: the OS disk is not persisted — a deallocate, host failure, or reimage wipes it back to the image. You cannot snapshot or back it up, and you cannot use it where the OS must survive a stop. The image must fit in the host’s local cache/temp space, which constrains image and VM size.
Set it with --ephemeral-os-disk true (optionally --ephemeral-os-disk-placement CacheDisk|ResourceDisk). The mental model: ephemeral OS disk = cattle, not pets — fast and free, but the OS is disposable.
The disk landscape at a glance
The diagram below ties the pieces together: a VM with its single OS disk, multiple persistent data disks, and the non-persistent temp disk; how the size tier or provisioned dials set performance; where host caching sits between the VM host and the disk; and where the encryption layers and snapshots attach.
Use it as the map for the rest of this lesson — every term in the diagram has a section above explaining its choices, defaults, and trade-offs.
Creating and configuring disks: every setting
When you add a disk in the portal (VM creation Disks tab, or Disks → Create and attach a new disk), or via az/Bicep, these are the fields you set. For each one the table gives the choices, the default, and when to use it, along with the main trade-off or gotcha:
| Setting | Choices | Default | When / trade-off / gotcha |
|---|---|---|---|
| Disk SKU (type) | Standard HDD / Standard SSD / Premium SSD v1 / Premium SSD v2 / Ultra | Premium SSD (varies by VM size) | The master choice (see comparison table). v2/Ultra need VM support; Premium or Ultra needed for the 99.9% single-VM SLA. |
| Size / capacity | Tier (P/E/S) or GiB (v2/Ultra) | — | Classic: size also sets performance. v2/Ultra: capacity is independent of performance. Round-up billing on tiny disks. |
| Provisioned IOPS | Number (v2/Ultra only) | 3,000 (v2 baseline) | Independent dial; billed. Capped by disk size and VM limits. |
| Provisioned throughput | MB/s (v2/Ultra only) | 125 MB/s (v2 baseline) | Independent dial; billed. |
| Host caching | None / ReadOnly / ReadWrite | OS=ReadWrite, data=ReadOnly | See caching section. v2/Ultra force None. |
| Encryption type | PMK (SSE) / CMK (SSE) / double / confidential | PMK | CMK needs a Disk Encryption Set + Key Vault. |
| Enable shared disk | Yes (maxShares) / No | No | Premium v1/v2, Ultra, and Standard SSD only; caching must be None; cluster SW must coordinate. |
| Bursting | On-demand on/off (Premium v1 ≥P30) | Off (on-demand); credit-based auto-on | On-demand is paid; credit-based is free. |
| Network access | Public / private endpoint / deny | Public (with auth) | For disk export/import; lock down with private endpoints for sensitive estates. |
| Availability zone | None / 1 / 2 / 3 | Inherit VM | Zonal disks must match the VM’s zone. |
| LUN (data disk) | 0–63 | next free | The logical unit number the guest sees the disk on; keep stable for automation. |
az CLI — the core operations
# Variables
RG=rg-disks-lab
LOC=eastus
VM=vm-disklab
# Create a standalone Premium SSD v1 data disk (256 GiB)
az disk create -g $RG -n data-premium-256 \
--size-gb 256 --sku Premium_LRS --location $LOC
# Create a Premium SSD v2 disk with independent performance dials
az disk create -g $RG -n data-v2 \
--sku PremiumV2_LRS --size-gb 100 \
--disk-iops-read-write 5000 --disk-mbps-read-write 200 \
--location $LOC --zone 1
# Attach the Premium data disk created above to a VM with ReadOnly caching, on LUN 0
az vm disk attach -g $RG --vm-name $VM \
--name data-premium-256 --caching ReadOnly --lun 0
# Create an Ultra-enabled VM (Ultra must be enabled at create, supported zone)
az vm create -g $RG -n vm-ultra --image Ubuntu2204 --zone 1 \
--size Standard_D4s_v5 --ultra-ssd-enabled true \
--generate-ssh-keys
# Change caching on an attached data disk (detach/reattach under the hood)
# Quote the --set path so the shell does not treat [0] as a glob pattern
az vm update -g $RG -n $VM \
--set "storageProfile.dataDisks[0].caching=None"
# Resize (grow) a data disk to 512 GiB (never shrink)
az disk update -g $RG -n data-premium-256 --size-gb 512
# Set a Premium v1 performance tier without growing capacity (512 GiB P20 -> P30 perf)
az disk update -g $RG -n data-premium-256 --set tier=P30
# Enable encryption at host on a VM (feature must be registered first;
# the VM must be deallocated while the setting changes)
az feature register --namespace Microsoft.Compute --name EncryptionAtHost
az provider register --namespace Microsoft.Compute
az vm deallocate -g $RG -n $VM
az vm update -g $RG -n $VM --set securityProfile.encryptionAtHost=true
az vm start -g $RG -n $VM
Bicep — a Premium SSD v2 data disk and a CMK disk
// Premium SSD v2 data disk with independent IOPS/throughput
resource dataV2 'Microsoft.Compute/disks@2024-03-02' = {
  name: 'data-v2'
  location: resourceGroup().location
  zones: ['1']
  sku: {
    name: 'PremiumV2_LRS'
  }
  properties: {
    creationData: { createOption: 'Empty' }
    diskSizeGB: 100
    diskIOPSReadWrite: 5000
    diskMBpsReadWrite: 200
    publicNetworkAccess: 'Disabled'
  }
}
// Reference to an existing Disk Encryption Set ('des-lab' is a placeholder name)
resource diskEncryptionSet 'Microsoft.Compute/diskEncryptionSets@2024-03-02' existing = {
  name: 'des-lab'
}
// A disk encrypted with a customer-managed key via a Disk Encryption Set
resource cmkDisk 'Microsoft.Compute/disks@2024-03-02' = {
  name: 'data-cmk'
  location: resourceGroup().location
  sku: { name: 'Premium_LRS' }
  properties: {
    creationData: { createOption: 'Empty' }
    diskSizeGB: 128
    encryption: {
      type: 'EncryptionAtRestWithCustomerKey'
      diskEncryptionSetId: diskEncryptionSet.id
    }
  }
}
After creation: what you can (and can’t) change
| Operation | Possible after creation? | Notes |
|---|---|---|
| Grow capacity | Yes | Never shrink; extend the in-guest filesystem afterwards; online resize where supported. |
| Change disk SKU/type | Yes (most pairs) | e.g. Standard → Premium; usually requires the disk unattached or VM deallocated. Migrating to Premium v2/Ultra often means create-new + copy, not in-place. |
| Change host caching | Yes | Triggers detach/reattach; brief I/O interruption on a live data disk. |
| Change performance tier (Premium v1) | Yes | No capacity change, no downtime; billed at the higher tier while active. |
| Change provisioned IOPS/MB/s (v2/Ultra) | Yes | Ultra can change live without reboot; v2 supported with constraints. |
| Change encryption (PMK ↔ CMK) | Yes | Attach/detach or update via Disk Encryption Set. |
| Enable on-demand bursting (Premium v1 ≥P30) | Yes | Paid; can disable again (cooldown applies). |
| Convert to/from shared (maxShares) | Yes (supported types) | Disk must be detached from all VMs to change maxShares. |
| Detach / attach | Yes | Data disks hot-attach/detach on most VM sizes; OS disk requires VM stopped (swap). |
| Swap the OS disk | Yes | VM deallocated; same OS type/generation. |
| Move zones | No (not in place) | Zonal placement is fixed at create; to move, snapshot → create in new zone. |
The two hard "no"s to remember: you cannot shrink a disk, and you cannot move a disk to a different availability zone in place — both require create-new + copy/snapshot.
Hands-on lab
In this lab you attach a Premium data disk to a VM, set its caching, take an incremental snapshot, then clean everything up. Uses az CLI (Cloud Shell or local). The smallest disks/VM keep cost to a few rupees if you delete promptly.
0. Setup variables and a tiny VM.
RG=rg-disks-lab
LOC=eastus
VM=vm-disklab
az group create -n $RG -l $LOC
az vm create -g $RG -n $VM --image Ubuntu2204 \
--size Standard_B1s --generate-ssh-keys --no-wait
az vm wait -g $RG -n $VM --created
1. Create and attach a Premium SSD data disk with ReadOnly caching.
az disk create -g $RG -n lab-data --size-gb 32 --sku Premium_LRS -l $LOC
az vm disk attach -g $RG --vm-name $VM --name lab-data \
--caching ReadOnly --lun 0
Expected: the disk shows diskState: Attached and caching: ReadOnly.
2. Validate the attachment and caching.
az vm show -g $RG -n $VM \
--query "storageProfile.dataDisks[].{name:name, lun:lun, caching:caching, gb:diskSizeGb}" -o table
Expected output:
Name Lun Caching Gb
-------- ----- --------- ----
lab-data 0 ReadOnly 32
3. Change caching to None (e.g. this will become a log disk).
az vm update -g $RG -n $VM --set "storageProfile.dataDisks[0].caching=None"
az vm show -g $RG -n $VM \
--query "storageProfile.dataDisks[0].caching" -o tsv
Expected: None.
4. Take an incremental snapshot of the data disk.
DISK_ID=$(az disk show -g $RG -n lab-data --query id -o tsv)
az snapshot create -g $RG -n lab-data-snap \
--source "$DISK_ID" --incremental true -l $LOC
az snapshot show -g $RG -n lab-data-snap \
--query "{name:name, incremental:incremental, state:provisioningState}" -o table
Expected: the table shows Incremental as True and State as Succeeded.
5. (Optional) Prove restore works — create a new disk from the snapshot.
SNAP_ID=$(az snapshot show -g $RG -n lab-data-snap --query id -o tsv)
az disk create -g $RG -n lab-data-restored \
--source "$SNAP_ID" --sku Premium_LRS -l $LOC
6. Cleanup — delete everything.
az group delete -n $RG --yes --no-wait
Validation: az group exists -n $RG eventually returns false. If you only want to remove the disk artifacts: detach the disk (az vm disk detach -g $RG --vm-name $VM --name lab-data), then az disk delete/az snapshot delete.
Cost note (INR-aware): a 32 GiB Premium SSD (P4-class) is only a few hundred rupees per month if left provisioned; for a lab measured in minutes the cost is a rupee or two. The two things that quietly cost money here are the incremental snapshot (cheap, since it stores only the delta, but real if you forget it) and the data disk staying provisioned after you stop the VM (a deallocated VM stops compute charges but disks keep billing). Deleting the resource group removes the VM, all of its disks, and the snapshot in one shot, so always finish the lab with step 6.
Common mistakes & troubleshooting
| Symptom | Likely cause | Fix |
|---|---|---|
| VM not hitting the disk’s rated IOPS/throughput | VM size’s disk limit is below the disk’s, or VM doesn’t support Premium | Right-size the VM (Premium-capable, higher disk cap); check the VM-size disk limits, not just the disk. |
| Data corruption after a host failure on a data disk | ReadWrite (write-back) caching on a disk whose app assumed durable writes | Set caching to None (logs) or ReadOnly (read-heavy data); reserve ReadWrite for the OS disk. |
| “Disk size still old” after resize | Grew the managed disk but didn’t extend the in-guest filesystem | resize2fs/xfs_growfs (Linux) or Disk Management/Resize-Partition (Windows). |
| Can’t shrink a disk | Disks can only grow | Create a smaller disk and copy data; you cannot shrink in place. |
| Ultra/Premium v2 option greyed out | VM size/zone doesn’t support it, or Ultra not enabled at create | Use a supported size in a supported zone; set --ultra-ssd-enabled at VM create. |
| Surprise bill after deallocating VMs | Disks (and snapshots) bill even when the VM is stopped/deallocated | Delete unused disks/snapshots; deallocate stops compute, not storage. |
| Wrong device mounted after reboot/resize (Linux) | Mounted data disk by /dev/sdX, which can shift | Mount by UUID in /etc/fstab. |
| Shared disk filesystem corrupted | Two VMs wrote without cluster coordination | Shared disks need SCSI PR / cluster manager; caching must be None; or use Azure Files for shared files. |
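
The first row of the table comes down to simple arithmetic: the IOPS a VM can actually drive is the sum of its disks' rated IOPS, capped by the VM size's own disk limit. A minimal sketch of that check follows; effective_iops is a hypothetical helper, and the numbers in the examples are made-up inputs rather than the limits of any particular VM size.

```shell
# effective_iops VM_CAP DISK_IOPS [DISK_IOPS...]
# Prints the IOPS the VM can drive: the sum of the disks' rated IOPS,
# capped at the VM size's disk IOPS limit.
effective_iops() {
  local vm_cap="$1" total=0 d
  shift
  for d in "$@"; do
    total=$((total + d))
  done
  if [ "$total" -gt "$vm_cap" ]; then
    echo "$vm_cap"
  else
    echo "$total"
  fi
}

effective_iops 3200 5000         # one disk rated above the VM cap: prints 3200
effective_iops 12800 5000 5000   # two disks under the VM cap: prints 10000
```

When the printed value equals the VM cap, a faster disk will not help; only a larger VM size will. The same comparison applies to throughput in MB/s.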
Best practices
- Default to Premium SSD v2 for new production data disks — best price/performance, sub-ms latency, free baseline, independent IOPS/MB/s. Use Premium SSD v1 when you need the single-VM SLA, an OS disk, host caching, or a region where v2 isn’t available; reserve Ultra for the extreme tail.
- Separate OS and data. Keep application/database data on dedicated data disks (not the OS disk), so you can resize, swap, snapshot, and back up cleanly.
- Cache by role: OS disk ReadWrite, read-heavy data ReadOnly, logs/write-heavy None.
- Never store durable data on the temp disk: it’s wiped on deallocate/resize/maintenance. Use it only for page files, tempdb, and scratch.
- Use incremental snapshots and a retention policy; for application-consistent backups use Azure Backup, not raw snapshots.
- Right-size the VM to the disk — the VM-size disk cap is the real ceiling.
- Mount Linux data disks by UUID; stripe multiple disks (RAID0/LVM) when one disk can’t supply the IOPS you need.
- Tag disks with owner/app/environment and use Azure Policy to forbid unmanaged disks and enforce encryption/SKU standards.
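The "mount by UUID" practice can be sketched as a helper that builds the /etc/fstab line. This is an illustration under assumptions: fstab_entry is a hypothetical function, the device name /dev/sdc1 and mount point /datadrive in the comments are placeholders, and the nofail option is included so that a missing data disk does not block boot.

```shell
# fstab_entry UUID MOUNT_POINT [FSTYPE]
# Prints an /etc/fstab line that mounts the filesystem by UUID instead of
# by a /dev/sdX name, which can change between boots.
fstab_entry() {
  local uuid="$1" mount_point="$2" fstype="${3:-ext4}"
  printf 'UUID=%s %s %s defaults,nofail 0 2\n' "$uuid" "$mount_point" "$fstype"
}

# Usage on the VM (placeholder device and mount point, run as root):
#   uuid=$(blkid -s UUID -o value /dev/sdc1)
#   fstab_entry "$uuid" /datadrive ext4 >> /etc/fstab
#   mount -a   # confirm the new entry mounts cleanly before rebooting
```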
Security notes
- Encryption at rest is always on (PMK) and free — but for regulated workloads move to CMK via a Disk Encryption Set so you control rotation and can revoke (your kill switch).
- Use encryption at host when you need the temp disk and host caches encrypted too — PMK/CMK doesn’t cover the temp disk; ADE or encryption-at-host does. Encryption at host has no agent and no measurable performance hit.
- Lock down disk network access: set publicNetworkAccess: Disabled and use private endpoints for export/import in sensitive estates; SAS-based disk exports are a common exfiltration path if left open.
- Snapshots inherit but also expose: a snapshot of a sensitive disk is a full copy; protect it with the same RBAC/encryption and don’t generate long-lived SAS download URLs.
- RBAC the disk resource: managed disks honour Azure RBAC and resource locks; restrict Microsoft.Compute/disks/write and snapshot/export rights, and put CanNotDelete locks on disks holding production data.
- For Confidential VMs, use confidential disk encryption so the OS-disk key is sealed to the hardware TEE.
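Two of the notes above (network access and delete locks) map to short az commands. The sketch below wraps them in a hypothetical harden_disk function with a DRY_RUN switch so the commands can be reviewed before anything changes. The lock name suffix is an arbitrary choice, and DenyAll is an assumption that suits disks which are never exported; estates that export over private endpoints would use AllowPrivate with a disk access resource instead.

```shell
# run CMD [ARGS...]: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# harden_disk RESOURCE_GROUP DISK_NAME
# Blocks public network access and export for the disk, then adds a
# CanNotDelete lock so it cannot be removed by accident.
harden_disk() {
  local rg="$1" disk="$2"
  run az disk update -g "$rg" -n "$disk" \
    --public-network-access Disabled --network-access-policy DenyAll || return 1
  run az lock create --name "${disk}-nodelete" --lock-type CanNotDelete \
    -g "$rg" --resource "$disk" --resource-type Microsoft.Compute/disks || return 1
}

# Usage (placeholder names): preview first, then apply.
#   DRY_RUN=1; harden_disk rg-prod sql-data-01
#   DRY_RUN=0; harden_disk rg-prod sql-data-01
```

Remember that a CanNotDelete lock also blocks resource-group deletion until the lock is removed, which is the point for production data but inconvenient in a lab.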
Cost & sizing
The levers that move the disk bill, in order of impact:
- Disk type. Standard HDD is the cheapest and Ultra the most expensive, so pay for Ultra only when the workload needs its performance. Right-typing (e.g. Standard SSD for a web server instead of Premium) is the biggest single saving.
- Provisioned capacity (all types) and provisioned performance (v2/Ultra). You pay for what you provision, not what you use — an over-provisioned 8 TB disk used at 5% still bills for 8 TB. On v2/Ultra, IOPS and MB/s above the free baseline are additional line items.
- Snapshots — full snapshots of large disks are a classic silent cost; prefer incremental and prune old chains.
- Disks on stopped VMs — deallocating a VM stops compute charges but disks keep billing. Delete disks you don’t need.
- On-demand bursting and performance tiers are paid — great for a known busy window, wasteful if left on permanently.
- Premium SSD v2 is usually cheaper than v1 for equivalent performance — migrating mature workloads to v2 is often a net saving and a performance gain.
- Redundancy: ZRS disks cost more than LRS; only pay for ZRS where you need zone-resilient storage for a zonal workload.
Sizing heuristic: estimate peak IOPS and MB/s and steady-state capacity; on classic types pick the smallest tier that meets both performance and capacity; on v2/Ultra provision capacity for data and dial IOPS/MB/s to peak (plus headroom). Then confirm the VM size can actually deliver those numbers.
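For the classic-tier half of the heuristic, "smallest tier that meets both performance and capacity" is a first-match scan over the tier table. The sketch below does that for Premium SSD v1; pick_tier is a hypothetical helper, and the tier figures are reproduced as an illustrative subset (P4 to P50) that should be checked against the current Azure documentation before being relied on.

```shell
# pick_tier NEED_GIB NEED_IOPS NEED_MBPS
# Prints the smallest Premium SSD v1 tier whose capacity, IOPS and MB/s
# all meet the requirement; returns 1 if no listed tier is large enough.
pick_tier() {
  local need_gib="$1" need_iops="$2" need_mbps="$3" name gib iops mbps
  # Columns: tier, GiB, provisioned IOPS, provisioned MB/s (smallest first).
  # Illustrative figures; verify against current Azure documentation.
  while read -r name gib iops mbps; do
    if [ "$gib" -ge "$need_gib" ] && [ "$iops" -ge "$need_iops" ] \
      && [ "$mbps" -ge "$need_mbps" ]; then
      echo "$name"
      return 0
    fi
  done <<'EOF'
P4 32 120 25
P6 64 240 50
P10 128 500 100
P15 256 1100 125
P20 512 2300 150
P30 1024 5000 200
P40 2048 7500 250
P50 4096 7500 250
EOF
  return 1
}

pick_tier 100 2000 100   # 100 GiB of data but 2,000 IOPS needed: prints P20
```

The example shows the usual classic-tier trade-off: the workload needs only 100 GiB, but the IOPS requirement forces a 512 GiB tier, which is exactly the case where Premium SSD v2's independent dials tend to be cheaper.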
Interview & exam questions
1. What’s the difference between a managed disk and an unmanaged disk, and why did managed win? Unmanaged disks are page blobs you place in storage accounts you manage yourself, with per-account IOPS caps and manual placement; managed disks are first-class ARM resources where Azure manages the backing storage, enforces provisioned performance, spreads availability-set disks across fault domains automatically, and supports RBAC/locks/Policy/tags. Managed is the default; unmanaged is deprecated.
2. Walk me through the five disk types and when you’d pick each. Standard HDD (cheapest, dev/test/backup, magnetic), Standard SSD (light prod/web, consistent-ish), Premium SSD v1 (production/DB, low ms latency, required for the single-VM SLA), Premium SSD v2 (best price/performance, sub-ms, independent IOPS/MB/s — the new default for most production), Ultra (extreme: up to 400k IOPS/10 GB/s, live-tunable, SAP HANA/top-tier SQL).
3. A single VM with no availability set or zone: what disks does it need for the 99.9% SLA? All OS and data disks must be Premium SSD or Ultra to get the 99.9% single-instance VM SLA. With Standard SSD disks the single-instance SLA drops to 99.5%, and with Standard HDD to 95%.
4. Explain host caching modes and when each is correct. None (no cache, all I/O straight through — logs/write-heavy, and forced on v2/Ultra), ReadOnly (cache reads, write-through — read-heavy DB data files), ReadWrite (write-back — the OS disk default; dangerous for data because the host can fail before cached writes flush). OS=ReadWrite, data-read-heavy=ReadOnly, logs=None.
5. Why can ReadWrite caching cause data loss? ReadWrite is write-back: the write is acknowledged from the host cache before reaching the durable disk. If the host fails before the cache flushes and the application believed the write was committed (e.g. a DB log), that data is lost — so logs must use None.
6. Difference between credit-based and on-demand bursting? Credit-based: free, automatic, accrues credits while idle and spends them to burst to a low ceiling (e.g. 3,500 IOPS) for a limited time — on Premium v1 (small) and Standard SSD. On-demand: paid, no duration limit, much higher ceiling, only on larger Premium v1 disks, billed per transaction above baseline.
7. How does disk size relate to performance, and how do Premium SSD v2/Ultra change that? On classic types (P/E/S), performance is fixed by the size tier — bigger disk = more IOPS/throughput. Premium SSD v2 and Ultra decouple capacity from performance: you provision GiB, IOPS, and MB/s independently and pay for each.
8. What is the temp disk and what’s the number-one rule about it? A local SSD on the host (D: on Windows, /mnt on Linux), free and fast, used for page files/tempdb/scratch. It is not persisted: it’s wiped on deallocate, resize, or host maintenance. Rule: never store durable data on it.
9. Full vs incremental snapshot — which and why? Incremental: stores only the delta since the last snapshot (cheaper), each is independently restorable, Azure manages the chain. Prefer incremental; use full only for legacy. For app-consistent backups use Azure Backup, not raw snapshots.
10. Name the encryption options and which one also encrypts the temp disk. SSE/PMK (default, free), SSE/CMK (your key via a Disk Encryption Set), encryption at host (the one that also covers the temp disk and host caches), Azure Disk Encryption (in-guest BitLocker/DM-Crypt), double encryption at rest, and confidential disk encryption (key sealed to the VM’s TEE).
11. What’s an ephemeral OS disk and its trade-off? The OS disk lives on the host’s local storage — free and very low latency — but not persisted: a deallocate/host failure/reimage wipes it, and you can’t snapshot or back it up. Ideal for stateless, reimage-on-the-fly fleets (VMSS/AKS); wrong for stateful single VMs.
12. How do you roll a corrupted VM back to a known-good OS state without rebuilding the VM? Create a new managed disk from a known-good OS snapshot, deallocate the VM, swap the OS disk (az vm update --os-disk), and start. The VM keeps its name, NICs, IP, size, and data disks.
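The OS-disk swap from question 12 can be sketched as four az calls in sequence. This is an outline under assumptions, not a tested runbook: swap_os_disk and run are hypothetical helpers, the resource names in the usage comment are placeholders, and the resource group is assumed to be in the same region as the snapshot (otherwise add -l to the disk create call).

```shell
# run CMD [ARGS...]: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# swap_os_disk RESOURCE_GROUP VM_NAME SNAPSHOT_ID NEW_DISK_NAME
# Builds a fresh OS disk from a known-good snapshot and swaps it onto the VM.
# Stops at the first failing step so the VM is not started half-swapped.
swap_os_disk() {
  local rg="$1" vm="$2" snapshot_id="$3" new_disk="$4"
  run az disk create -g "$rg" -n "$new_disk" --source "$snapshot_id" || return 1
  run az vm deallocate -g "$rg" -n "$vm" || return 1
  run az vm update -g "$rg" -n "$vm" --os-disk "$new_disk" || return 1
  run az vm start -g "$rg" -n "$vm" || return 1
}

# Usage (placeholder names): preview the commands first.
#   DRY_RUN=1; swap_os_disk rg-prod vm-app01 "$SNAP_ID" vm-app01-os-restored
```

The previous OS disk is detached rather than deleted by the swap, so it keeps billing until you remove it; keep it only as long as you need it for investigation.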
Quick check
- Which disk type is required on all disks for a single VM (no AS/zone) to get the highest single-instance SLA, and what is that SLA?
- What host caching mode belongs on a database transaction log disk, and why?
- True/false: you can shrink a managed disk in place.
- Which encryption option also encrypts the temporary disk?
- You deallocate a VM to save money but the bill barely drops. Why?
Answers
- Premium SSD (or Ultra) on all OS and data disks → 99.9% single-instance SLA. Standard SSD lowers it to 99.5% and Standard HDD to 95%.
- None — a transaction log must be durable on every commit, and ReadWrite/write-back caching could lose acknowledged writes if the host fails before flush.
- False — disks can only grow; to shrink you create a smaller disk and copy data.
- Encryption at host (SSE PMK/CMK don’t cover the temp disk; ADE/in-guest also can, but encryption at host is the agentless platform answer).
- Deallocating stops compute charges, but managed disks (and snapshots) keep billing regardless of VM power state — you must delete unused disks to stop their cost.
Exercise
Take a workload you actually run (or invent a small e-commerce app: a web tier, a SQL Server, and a nightly analytics job). For each VM, produce a one-page disk plan that states, per disk: (a) the role (OS/data/temp), (b) the disk type and why, (c) the size and, for v2/Ultra, the provisioned IOPS and MB/s, (d) the host caching mode and the justification, (e) the encryption choice, and (f) the snapshot/backup cadence. Then write the az disk create / az vm disk attach commands that would build it, and estimate the monthly INR cost of the disks alone (capacity + any provisioned performance + snapshots). Bonus: identify one disk where Premium SSD v2 would be cheaper and faster than v1.
Certification mapping
AZ-104 (Azure Administrator):
- Deploy and manage Azure compute resources — create/configure disks, attach/detach data disks, resize disks, change disk SKU, configure host caching, ephemeral OS disks, and encryption at host; create VMs with the correct disk choices.
- Implement and manage storage — disk snapshots (incremental), creating disks/VMs from snapshots and images, shared disks, and disk encryption with PMK/CMK.
AZ-305 (Azure Solutions Architect Expert):
- Design data storage solutions — select the right managed-disk type for performance/cost/SLA, design encryption (CMK via Disk Encryption Set, encryption at host, confidential), and plan capacity/IOPS/throughput.
- Design business continuity — snapshot and Azure Backup strategy for disks, OS-disk swap for recovery, and zone-resilient (ZRS) disks within a resilience design.
Glossary
- Managed disk — a block-storage device provisioned and managed by Azure as a first-class ARM resource.
- OS disk / data disk / temp disk — the boot volume / attached persistent volumes / non-persistent local host volume.
- IOPS / throughput / latency — operations per second / MB per second / time per operation.
- Provisioned performance — the IOPS/MB/s you reserve and pay for (set by size on classic types; independent dials on v2/Ultra).
- Bursting (credit-based / on-demand) — short spikes above baseline; credit-based is free and credit-limited, on-demand is paid and uncapped in duration.
- Host caching (None/ReadOnly/ReadWrite) — using host RAM/SSD to cache disk I/O; write-back (ReadWrite) trades durability for speed.
- SSE PMK / CMK — server-side encryption with a platform-managed or customer-managed key (CMK via a Disk Encryption Set).
- Encryption at host — host-level encryption that also covers the temp disk and caches, agentless.
- Azure Disk Encryption (ADE) — in-guest encryption (BitLocker / DM-Crypt).
- Disk Encryption Set (DES) — the resource that binds a Key Vault key to disks for CMK.
- Snapshot (full / incremental) — point-in-time read-only copy of a disk; incremental stores only the delta.
- Image — a generalized template for deploying many VMs (vs a snapshot, which clones one disk).
- Shared disk (maxShares) — a disk attached to multiple VMs for clustering; the cluster software must coordinate writes.
- Performance tier — temporarily running a Premium SSD v1 disk at a larger tier’s performance without changing capacity.
- Ephemeral OS disk — an OS disk stored on host-local storage: free, fast, non-persistent.
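The IOPS and throughput entries are linked by I/O size: throughput is roughly IOPS multiplied by the size of each operation. A small worked sketch follows; throughput_mib is a hypothetical helper using integer arithmetic, and the workload figures are made-up examples.

```shell
# throughput_mib IOPS IO_SIZE_KIB
# Prints the throughput in MiB/s (rounded down) that results from
# issuing IOPS operations per second of IO_SIZE_KIB each.
throughput_mib() {
  local iops="$1" io_kib="$2"
  echo $((iops * io_kib / 1024))
}

throughput_mib 5000 8    # small 8 KiB operations: prints 39
throughput_mib 5000 64   # larger 64 KiB operations: prints 312
```

The same 5,000 IOPS yields very different throughput depending on I/O size, which is why a disk can hit its MB/s limit long before its IOPS limit (or the reverse).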
Next steps
- Continue the course with the Azure Virtual Networks deep dive: every setting from subnets to peering — disks store the VM’s bytes; the VNet connects it to the world.
- Revisit where disks are chosen during VM creation in the Azure Virtual Machines deep dive.
- See how disk durability concepts (snapshots, lifecycle, immutability, soft delete) apply to object storage in Azure Blob Storage: lifecycle, immutability & soft delete.