Azure Lesson 33 of 137

Azure Backup Hardening: Immutable Vaults, Multi-User Authorization, Soft Delete, and Cross-Region Restore

Every ransomware tabletop I have run ends at the same uncomfortable question: when the attacker has Backup Contributor on your subscription, what actually stops them from stopping backups, dropping retention to one day, and waiting you out? The honest answer for most tenants is “nothing.” Backups are the last line of defence, which makes the backup control plane the highest-value target in the blast radius. Modern attackers know this – they delete recovery points before they encrypt, because a clean restore turns a seven-figure extortion into a Tuesday-afternoon rebuild. Azure Backup – the platform service that schedules, stores and restores recovery points for Azure VMs, SQL/SAP in VMs, Blobs, Disks, PostgreSQL Flexible Server, Azure Files and AKS – ships four independent controls that together make the vault tamper-resistant even against an admin-level compromise: immutability locks the retention floor, multi-user authorization (MUA) puts destructive operations behind a second tenant’s approval, enhanced soft delete keeps deleted backups recoverable, and cross-region restore (CRR) gives you an out-of-region copy when the primary region is gone.

This is the deep dive on wiring all four correctly, in the right order, and proving they work. The emphasis throughout is sequencing and irreversibility: three of these controls have a one-way door (Locked immutability, AlwaysON soft delete, and the day-zero redundancy choice), and getting the order wrong leaves a gap an attacker walks through. We treat each control as a setting with a value matrix, a default, a “when to flip it”, a trade-off, and a gotcha – because “I enabled immutability” is not the same as “I locked it after a retention review”, and the difference is whether the lock is protection or a self-inflicted ten-year bill.

By the end you will stop trusting the configuration blade. You will know which vault type protects which workload, why GeoRedundant is a prerequisite you cannot retrofit after onboarding, exactly which operations a Resource Guard gates, how to undelete a maliciously deleted backup, and how to prove recoverability by booting from a secondary-region recovery point and timing it. Because this is a reference you will return to during an incident, every control, limit, error and recovery path is laid out as scannable tables – read the prose once, then keep the tables open when the pager goes off.

What problem this solves

The pain is concrete and it is always the same: backups are the one resource whose destruction is irreversible and whose owner – the backup admin – has exactly the standing rights an attacker wants. Standard RBAC does not save you here. Backup Contributor legitimately includes “stop protection and delete backup data”, “disable soft delete”, and “modify retention” – those are normal day-job operations. So a compromised CI service principal, a phished admin, or a malicious insider with that role can quietly demolish your recovery path before anyone notices the encryption, and standard role separation does nothing because the role is doing what it is designed to do.

What breaks without these controls: the recovery point that would have saved you is gone before you reach for it. The team discovers during the incident, the worst possible time, that “we have backups” meant “we had backups until the attacker, holding our own admin role, deleted them.” Soft delete was either off or still on the old fixed 14-day basic tier, and the same script that deleted the backups disabled it first. Immutability was never enabled, or was enabled but never locked, so the attacker disabled it before deleting. The vault was LocallyRedundant, so when the region had a real outage there was no second copy to restore from. Each of these is a five-minute configuration that nobody sequenced.

Who hits this: every platform team that centralises backup, every regulated estate (finance, health, public sector) that must demonstrate WORM (write-once-read-many) retention to an auditor, and every organisation that has done – or fears – a ransomware tabletop and asked the uncomfortable question above. It bites hardest where backup rights are inherited broadly (CI principals with Contributor at subscription scope), where the same team owns both the workload and the guard (separation of duties that is theatre), and where “protected” was assumed to mean “recoverable” without a single restore drill.

To frame the whole field before the deep dive, here is each control, the attack it defeats, the one-way-door risk, and where it is configured:

| Control | Attack it blocks | Reversible? | Configured on | Day-zero or anytime |
|---|---|---|---|---|
| Redundancy = GeoRedundant | Region loss with no second copy | Only while 0 protected items | Vault backup-properties | Day-zero (locks after first item) |
| Cross-region restore (CRR) | “Region is down” turning into an outage | Flag toggles, needs GRS | Vault backup-properties | Day-zero (needs GRS first) |
| Immutability (Unlocked) | Delete-before-retention, retention cut | Yes (admin can disable) | Vault securitySettings | Anytime (soak here) |
| Immutability (Locked) | Same, but attacker-proof | No (irreversible) | Vault securitySettings | After soak + retention review |
| Enhanced soft delete (AlwaysON) | Deleting backups to destroy recovery | No (can’t disable) | Vault backup-properties | Anytime (extend only) |
| MUA via Resource Guard | Disabling any of the above | Yes (unmap the guard, itself a gated operation) | Cross-tenant guard mapping | Anytime (put guard cross-tenant) |

Learning objectives

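By the end of this article you can:

- choose the right vault type (Recovery Services vault or Backup vault) for each workload;
- explain why GeoRedundant storage is a day-zero prerequisite for cross-region restore that you cannot retrofit after onboarding;
- sequence immutability from Unlocked to Locked without locking in the wrong retention;
- state exactly which operations a Resource Guard gates, and place the guard so MUA is real separation of duties;
- undelete a maliciously deleted backup using enhanced soft delete;
- prove recoverability by booting from a secondary-region recovery point and timing it.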

Prerequisites & where this fits

You should already understand Azure Backup basics: a vault is the resource that holds backup policies and recovery points; an App Service plan-style “rent the capacity” model does not apply – you pay for protected-instance count and storage consumed. You should know how to run az in Cloud Shell, read JSON output, and that RBAC roles like Backup Contributor / Backup Operator / Backup Reader scope to a vault or its parent. Familiarity with RTO (recovery time objective) and RPO (recovery point objective), geo-paired regions, and the difference between LRS/ZRS/GRS storage redundancy helps a great deal.

This sits in the Backup, DR & Resilience track and is the security-hardening capstone for it. It assumes the storage-redundancy fundamentals from the Azure Storage Accounts Deep Dive and the data-protection model in Azure Blob Storage: lifecycle, immutability & soft delete. It builds directly on Azure Backup & Site Recovery Deep Dive for the protection mechanics, and the cross-region story pairs with Azure Site Recovery: zone-to-zone & region failover runbooks and the RTO/RPO framing in HA vs DR. The MUA pattern leans on Azure PIM for resources & groups and break-glass emergency access. For the broader pattern across clouds, see Ransomware resilience: immutable backup & isolated recovery environment.

A quick map of who owns and confirms each control during a hardening project, so you assign the work correctly:

| Layer | What lives here | Who usually owns it | What it defends |
|---|---|---|---|
| Vault redundancy / CRR | Storage replication, paired-region copy | Platform / backup squad | Region loss, out-of-region restore |
| Immutability | WORM retention floor | Backup squad + compliance | Delete-before-retention, retention cut |
| Soft delete | Deleted-item recovery window | Backup squad | Accidental/malicious deletion |
| Resource Guard (MUA) | Approval gate for destructive ops | Security team (separate tenant) | Insider / compromised-admin attack |
| PIM on the guard | Just-in-time Backup Operator | Identity / security team | Standing-access elimination |
| Diagnostics & alerts | Job logs, destructive-op alerts | Observability / SOC | Detection of attempted strips |

Core concepts

Five mental models make every later decision obvious.

The backup control plane is the attack surface, not the data plane. Attackers do not brute-force your encrypted recovery points; they use your own RBAC to delete them through the management API. Every control here defends the control plane: it makes a destructive management operation either impossible (immutability, soft-delete AlwaysON) or subject to out-of-band approval (MUA). The data is incidental; the operation is what you gate.

Three of these doors only open once. The day-zero redundancy choice (changeable only at zero protected items), Locked immutability, and AlwaysON soft delete are all one-way. This is deliberate – a control an admin can switch off is a control an attacker-as-admin can switch off. The cost of the one-way door is that you must get the value right before you walk through it: lock a 10-year retention by mistake and you pay for 10 years; that is the trade for tamper-resistance.

Immutability gates the destructive direction only. Vault immutability blocks operations that reduce protection of existing recovery points – deleting data before retention expires, shortening a policy’s retention, disabling soft delete. It never blocks creating new backups or extending retention. So immutability is not “freeze the vault”; it is “you can add and lengthen, you can never shorten or delete early.”

MUA is separation of duties, not a checkbox. A Resource Guard is a separate resource (Microsoft.DataProtection/resourceGuards) that you place where the backup admin has no permissions – ideally a different tenant owned by the security team. After you map a vault to it, the gated destructive operations require a just-in-time Backup Operator role on the guard, granted by the other team via PIM. If you put the guard in the same subscription the backup admin owns, the admin (or the attacker who became them) can self-approve, and you have built a speed bump, not a control.

“Protected” is not “recoverable” until you have restored. A green configuration blade and an untested restore is the oldest trap in DR. Cross-region restore in particular fails for boring reasons – the staging storage account or target resource group does not exist in the secondary region, the redundancy was never GeoRedundant, the CRR flag was never set. You only know you can recover after you have booted a VM from a secondary-region recovery point and recorded the RTO.
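To make “recorded the RTO” concrete, here is a minimal sketch of a drill timer in the same shell the rest of this lesson uses. `measure_rto` is a hypothetical helper, not part of the az CLI. It runs whatever drill command you give it and sends that command's output to stderr. It then prints the elapsed wall-clock minutes on stdout, so a pipeline can compare the result with its RTO target.

```bash
#!/usr/bin/env bash
# Sketch: time a restore drill so the RTO is a measured number, not an assumption.
# measure_rto is a hypothetical helper; pass it the full drill command.
# The drill's own output goes to stderr so stdout carries only the minutes.
measure_rto() {
  local start end status minutes
  start=$(date +%s)
  "$@" >&2
  status=$?
  end=$(date +%s)
  minutes=$(( (end - start) / 60 ))
  echo "RTO drill '$*': ${minutes} min, exit ${status}" >&2
  echo "$minutes"
  return "$status"
}
```

Time the whole drill, not just the restore trigger. `az backup restore restore-disks` submits a backup job and returns. A drill script (the hypothetical `crr-drill.sh`) should therefore also wait for the job, build the VM from the restored disks and confirm that it boots. You then run `measure_rto ./crr-drill.sh`.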

The vocabulary in one table

Before the deep sections, pin down every moving part. The glossary repeats these for lookup; this table is the mental model side by side:

| Concept | One-line definition | Where it lives | Why it matters |
|---|---|---|---|
| Recovery Services vault | Vault for VM / SQL-in-VM / SAP HANA / Files | Microsoft.RecoveryServices/vaults | The classic IaaS protection plane |
| Backup vault | Vault for Blob / Disk / PostgreSQL / AKS | Microsoft.DataProtection/backupVaults | The newer managed-data-store plane |
| Immutability | Blocks retention-reducing operations | Vault securitySettings | WORM floor; Locked is irreversible |
| Soft delete | Keeps deleted backups restorable for 14–180 days | Vault backup-properties | Recover from deletion; AlwaysON = can’t disable |
| Resource Guard | Approval gate for destructive ops | Microsoft.DataProtection/resourceGuards | MUA / separation of duties |
| MUA | Multi-user authorization | Vault ↔ guard mapping | Second-team approval for strips |
| Cross-region restore (CRR) | On-demand restore in paired region | Vault flag (needs GRS) | Region-loss recovery without failover |
| GeoRedundant (GRS) | 6 copies, paired-region async | Vault redundancy | Prerequisite for CRR |
| Instant restore | Local snapshot tier (1–5 days) | Backup policy | Fast same-region restore, snapshot cost |
| Recovery point (RP) | One restorable backup at a point in time | In the vault | The thing an attacker deletes |
| GFS | Grandfather-father-son retention ladder | Backup policy | Daily/weekly/monthly/yearly retention |
| Backup Contributor | RBAC role with destructive rights | Vault / parent scope | The role the attacker wants |

The hard limits and quotas worth committing to memory – the ones that shape design decisions:

| Limit | Value | Why it matters |
|---|---|---|
| Soft-delete retention | 14–180 days | Floor 14 (can’t go lower), ceiling 180 |
| Instant-restore snapshot retention | 1–5 days | Snapshot cost lever; default 2 |
| VMs protectable per vault | ~2,000 | Shard large estates across vaults |
| Backup items per vault (all types) | ~5,000 | Plan vault topology for big fleets |
| Daily scheduled backups (enhanced) | Up to 6/day | 4-hour minimum interval |
| Yearly retention | Max 99 years | Effectively permanent once locked |
| Redundancy change | Only at 0 protected items | Day-zero decision, frozen after |
| Resource Guard protected ops | ~7 by default | Some excludable, MUA-disable is not |
| Geo-replication (GRS) lag | Up to several hours | Secondary RPs are not instant |
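Several of these limits are worth enforcing before any API call, so a bad value fails in CI rather than halfway through a hardening change. This is a minimal sketch that assumes you keep the values in pipeline variables. The function names are hypothetical, and the ranges are copied from the table above, so re-check them against current Azure documentation.

```bash
#!/usr/bin/env bash
# Sketch: fail fast on values outside the limits quoted in this lesson.
# Function names are hypothetical; ranges come from the limits table.

# in_range VALUE MIN MAX LABEL: succeed only for a whole number in [MIN, MAX].
in_range() {
  local value=$1 min=$2 max=$3 label=$4
  case "$value" in
    ''|*[!0-9]*)
      echo "$label: '$value' is not a whole number" >&2
      return 1 ;;
  esac
  # test(1) parses operands as base-10 integers, so "08" is read as 8.
  if [ "$value" -lt "$min" ] || [ "$value" -gt "$max" ]; then
    echo "$label: $value is outside $min-$max" >&2
    return 1
  fi
  return 0
}

check_soft_delete_days() { in_range "$1" 14 180 "soft-delete retention (days)"; }
check_instant_rp_days()  { in_range "$1" 1 5 "instant-restore snapshot retention (days)"; }
check_yearly_years()     { in_range "$1" 1 99 "yearly retention (years)"; }
```

Call these in the pipeline step that runs before any az command setting the corresponding value. A non-zero exit then stops the change before it reaches the vault.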

1. Recovery Services vault vs Backup vault, and what each protects

Azure has two vault resource types and they are not interchangeable. Picking the wrong one means re-onboarding workloads later, so get this right on day one. The split is historical: the Recovery Services vault is the original IaaS/in-guest protection plane; the Backup vault is the newer plane for managed data stores that arrived with the Data Protection API.

| Capability | Recovery Services vault | Backup vault |
|---|---|---|
| Resource type | Microsoft.RecoveryServices/vaults | Microsoft.DataProtection/backupVaults |
| Azure VMs | Yes (snapshot + vault) | No |
| SQL in Azure VM | Yes | No |
| SAP HANA in Azure VM | Yes | No |
| Azure Files | Yes (snapshot) | Yes (vaulted) |
| Azure Blobs | No | Yes (operational + vaulted) |
| Azure Managed Disks | No | Yes |
| Azure Database for PostgreSQL Flexible Server | No | Yes |
| AKS (cluster state + PV) | No | Yes |
| Immutability | Yes | Yes |
| MUA via Resource Guard | Yes | Yes |
| Enhanced soft delete | Yes | Yes |
| Cross-region restore | Yes (VM/SQL/HANA) | Yes (selected workloads) |

The rule of thumb: Recovery Services vault for the classic IaaS and in-guest workloads (VMs, SQL-in-VM, SAP HANA-in-VM, snapshot-based Azure Files), Backup vault for the newer managed-data-store estate (Blobs, Disks, PostgreSQL Flexible Server, AKS, vaulted Azure Files). Many platform teams run both, and that is expected – they are governed the same way for immutability and MUA, which is the whole point of this article. Map your estate before you create anything:

| Workload | Vault to use | Backup type | CRR available |
|---|---|---|---|
| Azure VM (Windows/Linux) | Recovery Services | Snapshot + vaulted | Yes |
| SQL Server in Azure VM | Recovery Services | Log + full/diff | Yes |
| SAP HANA in Azure VM | Recovery Services | HANA backint | Yes |
| Azure Files (snapshot) | Recovery Services | Share snapshot | No (snapshot is in-region) |
| Azure Files (vaulted) | Backup vault | Vaulted | Limited |
| Azure Blob (operational) | Backup vault | Operational (no data copy) | No |
| Azure Blob (vaulted) | Backup vault | Vaulted copy | Selected |
| Azure Managed Disk | Backup vault | Incremental snapshot | No |
| PostgreSQL Flexible Server | Backup vault | Vaulted | Selected |
| AKS | Backup vault | Cluster + PV | No |
| On-prem servers (MARS agent) | Recovery Services | File/folder + system state | No |
| On-prem VMs (MABS/DPM) | Recovery Services | Disk-to-disk-to-vault | No |
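If vault creation is automated, the mapping table can double as routing logic, so a workload never lands in the wrong vault type. This is a sketch only. The workload labels are hypothetical shorthand invented for this example, and the mapping simply mirrors the table above.

```bash
#!/usr/bin/env bash
# Sketch: route a workload label to the vault resource type from the mapping table.
# Labels (vm, aks, files-vaulted, ...) are hypothetical shorthand for this example.
vault_type_for() {
  case "$1" in
    vm|sql-in-vm|sap-hana-in-vm|files-snapshot|mars|mabs-dpm)
      echo "Microsoft.RecoveryServices/vaults" ;;
    files-vaulted|blob-operational|blob-vaulted|managed-disk|postgresql-flex|aks)
      echo "Microsoft.DataProtection/backupVaults" ;;
    *)
      echo "unknown workload '$1': map it before creating anything" >&2
      return 1 ;;
  esac
}
```

For an inventory file with one label per line, `while read -r w; do printf '%s\t%s\n' "$w" "$(vault_type_for "$w")"; done < inventory.txt` prints the vault type for each workload. Any unmapped label is reported on stderr.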

Now the day-zero properties. Create a Recovery Services vault and immediately set redundancy and CRR – storage redundancy is only changeable while the vault has zero protected items, so this is the first decision, not a later tuning step:

az backup vault create \
  --resource-group rg-backup-prod \
  --name rsv-prod-weu \
  --location westeurope

# GeoRedundant + CrossRegionRestore enabled is the prerequisite for CRR.
# This MUST happen before you onboard the first item.
az backup vault backup-properties set \
  --resource-group rg-backup-prod \
  --name rsv-prod-weu \
  --backup-storage-redundancy GeoRedundant \
  --cross-region-restore-flag true

// Bicep equivalent:
resource vault 'Microsoft.RecoveryServices/vaults@2024-04-01' = {
  name: 'rsv-prod-weu'
  location: 'westeurope'
  sku: { name: 'RS0', tier: 'Standard' }
  identity: { type: 'SystemAssigned' } // for cross-tenant guard + CMK later
  properties: {}
}

// Redundancy + CRR are set on the backup config sub-resource.
resource vaultConfig 'Microsoft.RecoveryServices/vaults/backupstorageconfig@2023-04-01' = {
  parent: vault
  name: 'vaultstorageconfig'
  properties: {
    storageModelType: 'GeoRedundant'
    crossRegionRestoreFlag: true
  }
}
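Redundancy freezes after the first protected item, so a pipeline should check the item count before it tries to set GeoRedundant. This sketch rests on two assumptions. First, it assumes `az backup item list` without `--backup-management-type` returns items of every type; if your CLI version needs the flag, loop over the types you use. Second, it reuses the same redundancy and CRR values as the command above. `protected_item_count` and `set_day_zero_redundancy` are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Sketch: refuse to change redundancy once the vault protects anything.
# Assumption: omitting --backup-management-type lists items of all types.

# Print the number of backup items in the vault.
protected_item_count() {
  az backup item list \
    --resource-group "$1" \
    --vault-name "$2" \
    --query "length(@)" -o tsv
}

# Set GeoRedundant + CRR only while the vault still has zero items.
set_day_zero_redundancy() {
  local rg=$1 vault=$2 count
  count=$(protected_item_count "$rg" "$vault") || return 1
  if [ "$count" != "0" ]; then
    echo "$vault already protects $count item(s); redundancy is frozen" >&2
    return 1
  fi
  az backup vault backup-properties set \
    --resource-group "$rg" \
    --name "$vault" \
    --backup-storage-redundancy GeoRedundant \
    --cross-region-restore-flag true
}
```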

The redundancy options, what they cost you, and what they protect against:

| Redundancy | Copies | Protects against | CRR support | Relative cost |
|---|---|---|---|---|
| LocallyRedundant (LRS) | 3 (one datacentre) | Disk/rack/node failure | No | Lowest |
| ZoneRedundant (ZRS) | 3 (across AZs) | Single-AZ loss in-region | No (no 2nd region) | Medium |
| GeoRedundant (GRS) | 6 (3 local + 3 paired) | Full region loss | Yes | Highest |
| Geo-Zone-Redundant (GZRS) | 6 (3 AZ-spread + 3 paired) | AZ loss and region loss | n/a (a storage-account option, not a vault redundancy setting) | Highest+ |

Cross-region restore requires GeoRedundant storage. It does not work with LocallyRedundant or ZoneRedundant. If you need both zone resilience and CRR, that is not a single setting – ZRS protects you within the region, GRS+CRR uses the geo-paired region. Decide which failure mode dominates your risk model before you onboard anything, because after the first protected item the redundancy is frozen.

2. Immutable vaults: unlocked vs locked, and the operational trade-off

Vault immutability prevents operations that would reduce the protection of existing recovery points: deleting backup data before its retention expires, shortening retention in a policy, or disabling soft delete. It does not block creating new backups or extending retention – only the destructive direction is gated. This is the single most-misunderstood control: people enable it, feel safe, and never lock it – which means an admin (or attacker) can simply disable it and then delete.

There are two states, and the difference is whether you can ever go back:

| State | Protection active? | Can an admin disable it? | Attacker-proof? | Use it as |
|---|---|---|---|---|
| Disabled | No | n/a | No | Pre-hardening default |
| Unlocked | Yes | Yes | No | The soak / test period |
| Locked | Yes | No (irreversible) | Yes | Final production state |

Enable it unlocked first via the vault’s securitySettings. With the CLI you patch the vault property:

# Step 1: enable immutability in the "Unlocked" state for a soak period.
az resource update \
  --resource-group rg-backup-prod \
  --name rsv-prod-weu \
  --resource-type Microsoft.RecoveryServices/vaults \
  --set properties.securitySettings.immutabilitySettings.state=Unlocked

Run unlocked for a release cycle or two. Confirm no automation breaks – the usual offenders are decommissioning pipelines that delete backups early, or policy-as-code that lowers retention. Once you are confident, lock it. In Bicep the locked state is explicit and intentional:

resource vault 'Microsoft.RecoveryServices/vaults@2024-04-01' = {
  name: 'rsv-prod-weu'
  location: 'westeurope'
  sku: { name: 'RS0', tier: 'Standard' }
  properties: {
    securitySettings: {
      immutabilitySettings: {
        // 'Locked' is irreversible. Deploy this only after soaking on 'Unlocked'.
        state: 'Locked'
      }
    }
  }
}
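A pipeline can enforce the Disabled → Unlocked → Locked path, so nobody jumps straight to Locked. This is a minimal sketch. `allowed_transition` encodes this lesson's advice to soak on Unlocked first; that is a policy choice, not a service rule. The exception is Locked: nothing can leave it, and the service itself enforces that. `set_immutability` reads the current state with `az resource show` and writes the new one with the same `az resource update --set` shown above. Treating an empty state as Disabled is an assumption.

```bash
#!/usr/bin/env bash
# Sketch: only allow the immutability transitions this lesson recommends.
# Function names are hypothetical.

allowed_transition() {
  case "$1:$2" in
    Disabled:Unlocked|Unlocked:Locked|Unlocked:Disabled) return 0 ;;
    Locked:*) echo "Locked is irreversible; $1 -> $2 is impossible" >&2 ;;
    Disabled:Locked) echo "soak on Unlocked before locking" >&2 ;;
    *) echo "unknown transition $1 -> $2" >&2 ;;
  esac
  return 1
}

set_immutability() {
  local rg=$1 vault=$2 target=$3 current
  current=$(az resource show \
    --resource-group "$rg" --name "$vault" \
    --resource-type Microsoft.RecoveryServices/vaults \
    --query "properties.securitySettings.immutabilitySettings.state" -o tsv) || return 1
  [ -n "$current" ] || current=Disabled   # assumption: no setting means Disabled
  if [ "$current" = "$target" ]; then
    echo "$vault is already $target" >&2
    return 0
  fi
  allowed_transition "$current" "$target" || return 1
  az resource update \
    --resource-group "$rg" --name "$vault" \
    --resource-type Microsoft.RecoveryServices/vaults \
    --set properties.securitySettings.immutabilitySettings.state="$target"
}
```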

Exactly which operations immutability blocks once active – this is the contract, memorise it:

| Operation | Blocked by immutability? | Why |
|---|---|---|
| Create a new backup / recovery point | No | Adds protection |
| Extend retention in a policy | No | Lengthens protection |
| Stop protection, retain data | No | Data is kept |
| Delete a recovery point before retention expires | Yes | Reduces protection |
| Shorten retention duration in a policy | Yes | Reduces protection |
| Stop protection with delete data | Yes | Destroys protection |
| Disable soft delete | Yes | Removes the safety net |
| Reduce soft-delete retention | Yes | Shrinks the recovery window |
| Modify a policy to lower retention | Yes | Reduces protection of existing RPs |
| Change vault redundancy | n/a | Separately frozen after first item |

The operational trade-off is real: once locked, you cannot shorten retention even for a legitimate cost-cutting exercise. If you set a 10-year policy by mistake and lock the vault, you pay for 10 years. Treat the lock like a production change-freeze decision – review every active policy’s retention before you flip it. The decision of when to move between states:

| If you are… | Immutability state | Because |
|---|---|---|
| Standing up a brand-new vault | Disabled → Unlocked same day | Start soaking immediately |
| Mid-soak, automation still being audited | Unlocked | You may need to disable if a pipeline breaks |
| Soaked clean, retention reviewed, compliance signed off | Locked | Now attacker-proof; one-way door accepted |
| Unsure whether a policy is over-long | Do not lock yet | Trim retention first; locking freezes it |
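Before you flip to Locked, list every policy's long-term retention so the lock review has numbers in front of it. This sketch assumes IaaS-style `LongTermRetentionPolicy` policies whose JSON exposes `properties.retentionPolicy.yearlySchedule.retentionDuration.count`. Other policy types, and policies without a yearly tier, come back empty and are skipped. Extend the query if you also need to review weekly and monthly tiers or workload policies. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: flag policies whose yearly retention exceeds a review threshold.
# Assumed JSON path: properties.retentionPolicy.yearlySchedule.retentionDuration.count

# Reads "name<TAB>years" lines on stdin; prints policies over the limit in $1.
policies_over_years() {
  local limit=$1 tab name years
  tab=$(printf '\t')
  while IFS="$tab" read -r name years; do
    case "$years" in ''|*[!0-9]*) continue ;; esac   # no yearly tier / not numeric
    if [ "$years" -gt "$limit" ]; then
      echo "$name ($years years)"
    fi
  done
}

# review_before_lock RG VAULT MAX_YEARS
review_before_lock() {
  az backup policy list --resource-group "$1" --vault-name "$2" \
    --query "[].[name, properties.retentionPolicy.yearlySchedule.retentionDuration.count]" \
    -o tsv | policies_over_years "$3"
}
```

An empty result from `review_before_lock rg-backup-prod rsv-prod-weu 7` means no policy keeps yearly points longer than seven years. Anything it prints should be trimmed, or consciously accepted, before the lock.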

3. Multi-user authorization with Resource Guard across tenants

Immutability stops you reducing protection on existing data. MUA stops the other class of attack: disabling soft delete, deleting the protection entirely, or removing immutability while it is still unlocked. It does this by requiring that destructive vault operations be authorized through a Resource Guard – a separate Microsoft.DataProtection/resourceGuards resource that you deliberately place where the backup admin has no permissions.

The architecture that actually resists insider compromise puts the Resource Guard in a different tenant (or at minimum a different subscription governed by a different team):

Tenant A (workload)                         Tenant B (security)
+-----------------------+                   +------------------------+
| Recovery Services     |   protected by    | Resource Guard         |
| vault                 |------------------>| (no Backup Operator     |
|                       |                   |  for Tenant A admins)   |
| Backup admin: full    |                   | Security admin: owns    |
| rights EXCEPT the     |                   | the guard, approves     |
| guard-protected ops   |                   | critical operations     |
+-----------------------+                   +------------------------+

Create the guard in the security tenant/subscription:

az dataprotection resource-guard create \
  --resource-group rg-security-guards \
  --name rg-prod-resourceguard \
  --location westeurope

// Bicep equivalent:
resource guard 'Microsoft.DataProtection/resourceGuards@2024-04-01' = {
  name: 'rg-prod-resourceguard'
  location: 'westeurope'
  properties: {} // protects the default critical-operation set
}

By default the guard protects a set of critical operations. Knowing exactly which ones are gated – and which you can optionally exclude – is the difference between real MUA and a guard that protects nothing useful:

| Gated operation | Default protected? | Excludable? | Attack it stops |
|---|---|---|---|
| Disable MUA (remove the guard) | Yes | No | Attacker turning off the gate itself |
| Disable soft delete | Yes | Yes | Pre-deletion safety-net removal |
| Reduce soft-delete retention | Yes | Yes | Shrinking the recovery window |
| Disable immutability (while Unlocked) | Yes | Yes | Removing the WORM floor |
| Stop protection with delete data | Yes | Yes | Destroying recovery points |
| Modify / delete a backup policy | Yes | Yes | Retention tampering |
| Change passphrase (MARS agent) | Yes | Yes | Encryption-key theft for on-prem |
| Remove the Resource Guard mapping | Yes | No | Detaching the gate from the vault |
| Unregister a protected container | Yes | Yes | Orphaning recovery points |

Inspect and tune which operations are gated:

az dataprotection resource-guard list-protected-operations \
  --resource-group rg-security-guards \
  --name rg-prod-resourceguard \
  --resource-type Microsoft.RecoveryServices/vaults

Now associate the vault with the guard. The backup admin in Tenant A needs Reader on the guard (cross-tenant) to create the association, and after this is in place they can no longer perform the protected operations without a just-in-time approval from Tenant B:

# Run as the backup admin, authenticated to BOTH tenants.
az backup vault resource-guard-mapping update \
  --resource-group rg-backup-prod \
  --vault-name rsv-prod-weu \
  --resource-guard-id "/subscriptions/<security-sub>/resourceGroups/rg-security-guards/providers/Microsoft.DataProtection/resourceGuards/rg-prod-resourceguard"
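Hand-typing the guard's resource ID across two tenants is an easy place to slip. This sketch builds the ID, then confirms that the signed-in identity can actually read the guard, which is the Reader assignment the mapping needs. Only then does it run the same mapping command shown above. `resource_guard_id` and `map_vault_to_guard` are hypothetical names. The pre-check assumes `az resource show --ids` can reach the guard from the current login context.

```bash
#!/usr/bin/env bash
# Sketch: build the Resource Guard ID and pre-check access before mapping.
# Function names are hypothetical.

# resource_guard_id SUBSCRIPTION RESOURCE_GROUP GUARD_NAME
resource_guard_id() {
  printf '/subscriptions/%s/resourceGroups/%s/providers/Microsoft.DataProtection/resourceGuards/%s\n' \
    "$1" "$2" "$3"
}

# map_vault_to_guard RG VAULT GUARD_ID
map_vault_to_guard() {
  local rg=$1 vault=$2 guard_id=$3
  # Fails when the cross-tenant Reader assignment or login is missing.
  if ! az resource show --ids "$guard_id" --query name -o tsv >/dev/null; then
    echo "cannot read $guard_id; check the cross-tenant Reader assignment and login" >&2
    return 1
  fi
  az backup vault resource-guard-mapping update \
    --resource-group "$rg" \
    --vault-name "$vault" \
    --resource-guard-id "$guard_id"
}
```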

The operating model after association: when the backup team genuinely needs to perform a protected operation (say, retire a workload), the security team grants the backup operator’s identity a time-bound Backup Operator role on the Resource Guard via Microsoft Entra PIM (formerly Azure AD PIM). The operation is performed within the activation window, and the role then expires. An attacker who has only compromised Tenant A cannot self-approve, because they lack any standing access to the guard.

The roles involved, where they are assigned, and what each can do – get the scope wrong and you either break MUA or lock yourself out:

| Role | Assigned on | Held by | Purpose |
|---|---|---|---|
| Backup Contributor | Vault (Tenant A) | Backup squad (standing) | Day-job: configure, protect, restore |
| Reader | Resource Guard (Tenant B) | Backup admin (standing) | Create the vault↔guard mapping |
| Backup MUA Operator / Backup Operator | Resource Guard (Tenant B) | Backup admin (JIT via PIM only) | Perform a single gated operation within the activation window |
| Owner / User Access Admin | Resource Guard (Tenant B) | Security team only | Grant the JIT role; never Tenant A |
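The role table only holds if nobody has those rights permanently. This sketch is something the security team could run in Tenant B on a schedule. It lists role assignments on the guard, including inherited ones, and flags any privileged role held by a principal outside an allowlist. The function names are hypothetical. PIM-eligible assignments do not appear in `az role assignment list` until they are activated. A Backup Operator finding during an approved activation window is therefore expected. Outside such a window, it is exactly the standing access MUA is meant to remove.

```bash
#!/usr/bin/env bash
# Sketch: flag standing privileged roles on the Resource Guard.
# Function names are hypothetical; allowlist is space-separated principal names.

# Reads "principal<TAB>role" lines; prints privileged roles not on the allowlist ($1).
flag_privileged() {
  local allow=" $1 " tab principal role
  tab=$(printf '\t')
  while IFS="$tab" read -r principal role; do
    case "$role" in
      "Backup Operator"|"Backup MUA Operator"|Owner|Contributor|"User Access Administrator") ;;
      *) continue ;;
    esac
    case "$allow" in *" $principal "*) continue ;; esac
    echo "$principal has standing '$role' on the guard"
  done
}

# audit_guard GUARD_ID "ALLOWED_PRINCIPALS": non-zero exit if anything is flagged.
audit_guard() {
  local guard_id=$1 allowlist=$2 findings
  findings=$(az role assignment list --scope "$guard_id" --include-inherited \
    --query "[].[principalName, roleDefinitionName]" -o tsv | flag_privileged "$allowlist")
  if [ -n "$findings" ]; then
    echo "$findings" >&2
    return 1
  fi
  return 0
}
```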

That separation of duties is the entire value of MUA. The placement decision is the whole control – if you co-locate the guard, you get nothing:

| Guard placement | Separation strength | An attacker-as-backup-admin can… | Verdict |
|---|---|---|---|
| Same subscription as vault | None | Self-grant Backup Operator on the guard | Theatre; do not do this |
| Different subscription, same tenant, same team | Weak | Escalate via tenant-level role | Better than nothing |
| Different subscription, same tenant, different team | Good | Nothing without the other team | Acceptable minimum |
| Different tenant, security team | Strong | Nothing: no cross-tenant standing access | Target architecture |
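The placement table reduces to a small decision. This sketch classifies a vault/guard pair using the table's own labels. The function is hypothetical, and you supply the subscription and tenant IDs as inputs. One case is not in the table: a guard in a different tenant but owned by the same team. The sketch treats that case conservatively as weak.

```bash
#!/usr/bin/env bash
# Sketch: classify Resource Guard placement using the table's labels.
# guard_separation VAULT_SUB GUARD_SUB VAULT_TENANT GUARD_TENANT SAME_TEAM(yes|no)
guard_separation() {
  local vault_sub=$1 guard_sub=$2 vault_tenant=$3 guard_tenant=$4 same_team=$5
  if [ "$vault_sub" = "$guard_sub" ]; then
    echo none      # backup admin can self-grant on the guard
  elif [ "$same_team" = yes ]; then
    echo weak      # one team holds both sides (cross-tenant + same team: assumed weak)
  elif [ "$vault_tenant" != "$guard_tenant" ]; then
    echo strong    # target architecture
  else
    echo good      # acceptable minimum
  fi
}
```

To look up the tenant of each subscription, run `az account show --subscription <id> --query tenantId -o tsv`.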

4. Enhanced soft delete and recovering from deletion

Soft delete keeps backup data retrievable after someone deletes a backup item or stops protection with “delete data.” Enhanced soft delete (the current model for Recovery Services vaults) makes the feature always-on and configurable: you set a retention between 14 and 180 days, and you can optionally make soft delete itself immutable (non-disablable). Basic soft delete was a fixed 14 days and could be turned off – enhanced is the one you want.

The two soft-delete generations side by side:

| Property | Basic soft delete | Enhanced soft delete |
|---|---|---|
| Retention | Fixed 14 days | Configurable 14–180 days |
| Can be disabled | Yes | Yes, unless set to AlwaysON (then permanent) |
| Cost during retention | Free | Free for 14 days, then charged |
| Applies to | Recovery Services vault | Recovery Services + Backup vault |
| Recommended | No | Yes |

The three soft-delete feature states and what each means operationally:

| State | Soft delete active? | Disablable? | When to use |
|---|---|---|---|
| Disable | No | n/a | Never in production |
| Enable | Yes | Yes (an admin can turn it off) | Soak period only |
| AlwaysON | Yes | No (irreversible) | Production target |

Configure it:

# Configure enhanced soft delete to 30 days. AlwaysON makes it non-disablable.
az backup vault backup-properties set \
  --resource-group rg-backup-prod \
  --name rsv-prod-weu \
  --soft-delete-feature-state AlwaysON \
  --soft-delete-duration 30

// Bicep equivalent; `vault` is the Recovery Services vault resource declared earlier.
resource vaultProps 'Microsoft.RecoveryServices/vaults/backupconfig@2023-04-01' = {
  parent: vault
  name: 'vaultconfig'
  properties: {
    enhancedSecurityState: 'Enabled'
    softDeleteFeatureState: 'AlwaysON'   // irreversible
    softDeleteRetentionPeriodInDays: 30  // 14-180
  }
}

AlwaysON is irreversible in the same spirit as locked immutability – you can extend the retention but never disable the feature. Combined with immutability and MUA, you now have three controls that an admin-level attacker cannot individually defeat: they cannot delete inside retention (immutability), cannot turn off soft delete (AlwaysON), and cannot disable any of it without the guard (MUA).

When a backup is deleted – maliciously or by a fat-fingered decommission script – the item moves to a soft-deleted state. Recovery is undelete-then-resume:

# List soft-deleted items.
az backup item list \
  --resource-group rg-backup-prod \
  --vault-name rsv-prod-weu \
  --backup-management-type AzureIaasVM \
  --query "[?properties.isScheduledForDeferredDelete].name" -o tsv

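# Undelete a specific VM's backup item. It returns to "stop protection, retain data",
# so resume protection afterwards to start taking new recovery points again.
az backup protection undelete \
  --resource-group rg-backup-prod \
  --vault-name rsv-prod-weu \
  --container-name <container> \
  --item-name <vm-name> \
  --backup-management-type AzureIaasVM \
  --workload-type VM

az backup protection resume \
  --resource-group rg-backup-prod \
  --vault-name rsv-prod-weu \
  --container-name <container> \
  --item-name <vm-name> \
  --policy-name policy-iaas-gfs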
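After a malicious bulk deletion you rarely have just one VM to undelete. This sketch loops over every soft-deleted IaaS VM item and undeletes each one with the same `az backup protection undelete` call. It assumes the item JSON exposes `properties.containerName` and that `--item-name` accepts the item's `name`. Confirm both against `az backup item show` output before relying on the script in an incident. `undelete_all_vms` is a hypothetical name, and resuming protection is left as a separate step.

```bash
#!/usr/bin/env bash
# Sketch: undelete every soft-deleted IaaS VM backup item in a vault.
# Assumed fields: properties.containerName and name (check with az backup item show).
undelete_all_vms() {
  local rg=$1 vault=$2 tab
  tab=$(printf '\t')
  az backup item list \
    --resource-group "$rg" --vault-name "$vault" \
    --backup-management-type AzureIaasVM \
    --query "[?properties.isScheduledForDeferredDelete].[properties.containerName, name]" \
    -o tsv |
  (
    rc=0
    while IFS="$tab" read -r container item; do
      [ -n "$item" ] || continue
      echo "undeleting $item" >&2
      # </dev/null keeps az from consuming the item list on stdin.
      az backup protection undelete \
        --resource-group "$rg" --vault-name "$vault" \
        --container-name "$container" --item-name "$item" \
        --backup-management-type AzureIaasVM --workload-type VM </dev/null || rc=1
    done
    exit "$rc"   # non-zero if any undelete failed
  )
}
```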

The deletion-state lifecycle, so you know what is recoverable and for how long:

| Item state | What happened | Recoverable? | Window | Action to recover |
|---|---|---|---|---|
| Protected | Normal, active backups | n/a | n/a | None |
| Stop protection, retain data | Backups paused, RPs kept | Yes | Until retention expires | Resume protection |
| Soft-deleted | Deleted but within soft-delete window | Yes | 14–180 days | Undelete + resume |
| Permanently deleted | Soft-delete window expired or skipped | No | Gone | None in this vault: the GRS secondary copy is deleted along with the primary |
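In an incident, the number that matters is the date the soft-delete window closes. This minimal sketch assumes GNU `date` (present in Cloud Shell). You supply the deletion time in UTC and the vault's configured soft-delete retention in days. `soft_delete_deadline` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch: last moment a soft-deleted item can still be undeleted.
# soft_delete_deadline DELETED_AT_ISO8601_UTC RETENTION_DAYS  (needs GNU date)
soft_delete_deadline() {
  local deleted_epoch
  deleted_epoch=$(date -u -d "$1" +%s) || return 1
  date -u -d "@$(( deleted_epoch + $2 * 86400 ))" +%Y-%m-%dT%H:%M:%SZ
}
```

For example, an item deleted at `2026-06-08T00:00:00Z` in a vault with 30-day retention can be undeleted until `2026-07-08T00:00:00Z`.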

For Backup vaults (Blobs, Disks, PostgreSQL), the equivalent is configured through the vault’s softDeleteSettings with the same 14–180 day window, set via az dataprotection backup-vault update or the portal:

az dataprotection backup-vault update \
  --resource-group rg-backup-prod \
  --vault-name bv-prod-weu \
  --soft-delete-state AlwaysOn \
  --soft-delete-retention-in-days 30

5. Backup policies, retention, and instant-restore snapshots

Policy is where retention lives, and retention is what immutability and MUA enforce. Build the policy deliberately. For Azure VMs, the instant restore tier keeps local snapshots (1–5 days) for fast restores that never touch vault storage, while GRS-replicated recovery points serve long-term and cross-region needs.

A defensible IaaS policy template – daily plus a weekly/monthly/yearly grandfather-father-son ladder:

{
  "properties": {
    "backupManagementType": "AzureIaasVM",
    "schedulePolicy": {
      "schedulePolicyType": "SimpleSchedulePolicy",
      "scheduleRunFrequency": "Daily",
      "scheduleRunTimes": ["2026-06-08T01:00:00Z"]
    },
    "retentionPolicy": {
      "retentionPolicyType": "LongTermRetentionPolicy",
      "dailySchedule":   { "retentionTimes": ["2026-06-08T01:00:00Z"], "retentionDuration": { "count": 30, "durationType": "Days" } },
      "weeklySchedule":  { "daysOfTheWeek": ["Sunday"], "retentionTimes": ["2026-06-08T01:00:00Z"], "retentionDuration": { "count": 12, "durationType": "Weeks" } },
      "monthlySchedule": { "retentionScheduleFormatType": "Weekly", "retentionScheduleWeekly": { "daysOfTheWeek": ["Sunday"], "weeksOfTheMonth": ["First"] }, "retentionTimes": ["2026-06-08T01:00:00Z"], "retentionDuration": { "count": 36, "durationType": "Months" } },
      "yearlySchedule":  { "retentionScheduleFormatType": "Weekly", "monthsOfYear": ["January"], "retentionScheduleWeekly": { "daysOfTheWeek": ["Sunday"], "weeksOfTheMonth": ["First"] }, "retentionTimes": ["2026-06-08T01:00:00Z"], "retentionDuration": { "count": 7, "durationType": "Years" } }
    },
    "instantRpRetentionRangeInDays": 5,
    "timeZone": "UTC"
  }
}

# Save the JSON above as iaas-policy.json. The "properties" wrapper and backupManagementType
# follow the object shape `az backup policy show` returns; retentionTimes mirror scheduleRunTimes.
az backup policy set \
  --resource-group rg-backup-prod \
  --vault-name rsv-prod-weu \
  --name policy-iaas-gfs \
  --policy @iaas-policy.json

The GFS ladder explained – each tier, its purpose, the typical count, and the cost driver:

| Tier | Frequency | Typical retention | Purpose | Cost driver |
|---|---|---|---|---|
| Instant restore | Per backup | 1–5 days (snapshot) | Fast same-region restore | Snapshot storage in source sub |
| Daily | Daily | 7–30 days | Operational recovery | Vault storage, churn |
| Weekly | 1/week | 4–12 weeks | Rollback past a bad week | Vault storage |
| Monthly | 1/month | 12–36 months | Monthly compliance points | Vault storage |
| Yearly | 1/year | 1–10 years | Long-term / audit WORM | Vault storage (locked = permanent) |
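For cost conversations, it helps to know how many vault recovery points a ladder keeps. This sketch adds up the tiers. A point that qualifies for several tiers (say, a first-Sunday backup that is also the monthly point) is stored only once, so the sum is a ceiling rather than an exact count. `gfs_summary` is hypothetical. The example uses the counts from the policy template above.

```bash
#!/usr/bin/env bash
# Sketch: upper bound on vault recovery points kept by a GFS ladder.
# gfs_summary DAILY WEEKLY MONTHLY YEARLY INSTANT_DAYS
gfs_summary() {
  local daily=$1 weekly=$2 monthly=$3 yearly=$4 instant=$5
  # Overlapping tiers share one stored point, so this sum is a ceiling.
  echo "vault recovery points (ceiling): $(( daily + weekly + monthly + yearly ))"
  # One snapshot per day for a once-daily schedule.
  echo "instant-restore snapshots: $instant"
  echo "oldest vault point kept: about $yearly years"
}
```

With the template's values, `gfs_summary 30 12 36 7 5` reports a ceiling of 85 vault points and 5 instant-restore snapshots.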

The retention limits and policy knobs that catch people:

| Setting | Range / default | When to change | Trade-off / gotcha |
|---|---|---|---|
| instantRpRetentionRangeInDays | 1–5, default 2 | Lower for large chatty VMs (cost) | Snapshot cost in source sub; shorter means more restores come from the slower vault tier |
| Daily retention | Up to 9999 days | Match operational RPO | Storage grows with churn × retention |
| Weekly/monthly/yearly | Up to 99 years (yearly) | Compliance mandate | Locked immutability freezes this |
| Backup frequency | Up to several/day (enhanced) | Tighter RPO | More RPs = more storage + snapshot cost |
| Time zone | Any | Match maintenance window | Wrong TZ = backup during peak |
| Daily backups per policy (enhanced) | Up to 6/day (4-hour min interval) | Tighter RPO on critical DBs | Snapshot + storage cost scales |
| Log backup frequency (SQL) | 15 min–24 h | RPO down to ~15 minutes for transactions | Storage churn; log chain integrity |

Two retention facts that catch people:

6. Cross-region restore and zone-redundant storage

CRR lets you restore a VM, SQL-in-VM, or SAP HANA backup into the Azure-paired region without waiting for a regional failover or a Microsoft-declared outage – you choose to restore in the secondary on demand. It is the control that turns “the region is down” from an outage into a runbook. The prerequisites, in order:

# | Prerequisite | Set where | When | If missing
1 | Redundancy = GeoRedundant | Vault backup-properties | Day zero, before any item is protected | No second copy; CRR cannot be enabled
2 | crossRegionRestore flag = true | Vault backup-properties | Any time once the vault is GRS | Secondary RPs not exposed
3 | Workload type supports CRR | n/a (VM/SQL/HANA only) | By design | Other types: no CRR
4 | Staging storage account in secondary region | Pre-provisioned | Before the incident | Restore fails mid-incident
5 | Target resource group in secondary region | Pre-provisioned | Before the incident | Nowhere to land disks
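The prerequisites above can be checked as a preflight before every drill. This bash sketch is an assumption-laden illustration: the helper names are hypothetical, it reuses the `storageModelType` and `crossRegionRestoreFlag` fields from the verification query in this lesson, and it treats the flag case-insensitively because tsv output may print `True`.

```shell
#!/usr/bin/env bash
# Pure check, testable without Azure: redundancy and CRR flag values.
check_crr_props() {
  local redundancy=$1 crr_flag=$2 rc=0
  if [[ $redundancy != "GeoRedundant" ]]; then
    echo "FAIL redundancy=$redundancy (CRR needs GeoRedundant; fixed once items exist)"
    rc=1
  fi
  if [[ ${crr_flag,,} != "true" ]]; then
    echo "FAIL crossRegionRestoreFlag=$crr_flag"
    rc=1
  fi
  (( rc == 0 )) && echo "OK vault properties"
  return "$rc"
}

# Live wrapper (hypothetical): vault properties, DR resource group, staging account.
crr_preflight() {
  local rg=$1 vault=$2 dr_rg=$3 staging_sa=$4 props
  props=$(az backup vault backup-properties show -g "$rg" -n "$vault" \
    --query "[storageModelType, crossRegionRestoreFlag]" -o tsv) || return 1
  # shellcheck disable=SC2086  # split the two tsv fields into two arguments
  check_crr_props $props || return 1
  [[ $(az group exists -n "$dr_rg") == "true" ]] || { echo "FAIL missing $dr_rg"; return 1; }
  az storage account show -n "$staging_sa" -g "$dr_rg" --query name -o tsv >/dev/null \
    || { echo "FAIL missing staging account $staging_sa"; return 1; }
  echo "OK landing zone"
}
```

Keeping the comparison logic in a pure function means the same checks can run in CI against captured output, not only against a live vault.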

CRR and zone-redundant storage solve different problems and you cannot have both on one vault. The comparison that drives the day-zero choice:

Dimension | ZoneRedundant (ZRS) | GeoRedundant (GRS) + CRR
Protects against | Single-AZ loss in primary region | Full primary-region loss
Second-region copy | None | Yes (paired region)
CRR (on-demand secondary restore) | No | Yes
Restore latency | In-region, fast | Cross-region, slower
Best when dominant risk is | Zone failure, low-latency in-region HA | Region outage, ransomware, compliance

For a production vault whose dominant risk is regional or ransomware, choose GeoRedundant + CRR. List the secondary-region recovery points and restore:

# Enumerate recovery points available in the SECONDARY (paired) region.
az backup recoverypoint list \
  --resource-group rg-backup-prod \
  --vault-name rsv-prod-weu \
  --container-name <container> \
  --item-name <vm-name> \
  --backup-management-type AzureIaasVM \
  --workload-type VM \
  --use-secondary-region \
  --query "[].{name:name, time:properties.recoveryPointTime}" -o table

# Restore disks into the secondary region from a secondary recovery point.
az backup restore restore-disks \
  --resource-group rg-backup-prod \
  --vault-name rsv-prod-weu \
  --container-name <container> \
  --item-name <vm-name> \
  --rp-name <recovery-point-id> \
  --use-secondary-region \
  --target-resource-group rg-dr-northeurope \
  --storage-account <staging-sa-in-secondary>

The restore lands disks in the secondary region; you then build the VM from those disks (or use the full-VM restore flow). Note the staging storage account and target resource group must already exist in the secondary region – pre-provision them as part of your DR landing zone, not during the incident. The common geo-pairs you will target:

Primary region | Azure-paired secondary | Notes
West Europe | North Europe | Classic EU pair
North Europe | West Europe | Symmetric
East US | West US | US pair
Central India | South India | In-country pair (data residency)
Southeast Asia | East Asia | APAC pair
UK South | UK West | In-country pair
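Pre-provisioning the DR landing zone is easy to script once the pair is known. A minimal bash sketch, assuming the az location names for the pairs in the table (plus their reverse direction); the helper names, resource names and `DRY_RUN` switch are hypothetical, and a general-purpose v2 Standard_LRS account is assumed to be acceptable as the staging account.

```shell
#!/usr/bin/env bash
# Paired-region lookup for the pairs listed above; unknown regions fail loudly.
paired_region() {
  case "$1" in
    westeurope)    echo northeurope ;;
    northeurope)   echo westeurope ;;
    eastus)        echo westus ;;
    westus)        echo eastus ;;
    centralindia)  echo southindia ;;
    southindia)    echo centralindia ;;
    southeastasia) echo eastasia ;;
    eastasia)      echo southeastasia ;;
    uksouth)       echo ukwest ;;
    ukwest)        echo uksouth ;;
    *) echo "unknown region: $1" >&2; return 1 ;;
  esac
}

# Print instead of execute when DRY_RUN=1.
_run() {
  if [[ ${DRY_RUN:-0} == 1 ]]; then echo "$*"; else "$@"; fi
}

# Create the target resource group and staging account in the paired region.
provision_dr_landing_zone() {
  local primary=$1 dr_rg=$2 staging_sa=$3 secondary
  secondary=$(paired_region "$primary") || return 1
  _run az group create -n "$dr_rg" -l "$secondary"
  _run az storage account create -n "$staging_sa" -g "$dr_rg" -l "$secondary" \
    --sku Standard_LRS --kind StorageV2
}
```

For example, `DRY_RUN=1 provision_dr_landing_zone westeurope rg-dr-northeurope stdrstaging001` prints the two commands for North Europe without touching the subscription, which makes it safe to review in a pull request.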

Verify – prove each control

Do not trust the configuration blade. Prove each control with a command and, for restore, with an actual recovery:

# 1. Immutability state is Locked.
az resource show \
  --resource-group rg-backup-prod --name rsv-prod-weu \
  --resource-type Microsoft.RecoveryServices/vaults \
  --query "properties.securitySettings.immutabilitySettings.state" -o tsv
# Expect: Locked

# 2. Soft delete is AlwaysON with your retention; redundancy + CRR set.
az backup vault backup-properties show \
  --resource-group rg-backup-prod --name rsv-prod-weu \
  --query "{softDelete:softDeleteFeatureState, days:softDeleteRetentionPeriodInDays, redundancy:storageModelType, crr:crossRegionRestoreFlag}"
# Expect: AlwaysON / 30 / GeoRedundant / true

# 3. Resource Guard mapping exists.
az backup vault resource-guard-mapping show \
  --resource-group rg-backup-prod --vault-name rsv-prod-weu \
  --query "properties.resourceGuardOperationDetails" -o table

// 4. In Log Analytics (vault diagnostics in resource-specific mode ->
// AddonAzureBackupJobs), confirm a completed restore job in the last 7 days.
// Run the CRR drill first so that it is the restore listed here.
AddonAzureBackupJobs
| where TimeGenerated > ago(7d)
| where BackupItemUniqueId != ""
| where JobOperation == "Restore"
| where JobStatus == "Completed"
| project TimeGenerated, JobStatus, JobOperation, BackupManagementType, JobUniqueId
| order by TimeGenerated desc

The fourth check is the one that matters. A green config and an untested restore is exactly the trap from the ASR world: “protected” is not “recoverable” until you have booted from a secondary-region recovery point and timed it. The verification matrix you run before signing off:
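Timing the drill is the part most often skipped. A small bash sketch for turning two UTC timestamps (for example, when the CRR restore was triggered and when the rebuilt VM first answered a health check) into an RTO in whole minutes. The helper name is hypothetical, and it assumes GNU `date`, which supports `-d`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: whole minutes between two UTC timestamps (GNU date).
# Capture timestamps with: date -u '+%Y-%m-%d %H:%M:%S'
drill_rto_minutes() {
  local start end
  start=$(date -u -d "$1" +%s) || return 1
  end=$(date -u -d "$2" +%s) || return 1
  (( end >= start )) || { echo "end before start" >&2; return 1; }
  echo $(( (end - start) / 60 ))
}
```

`drill_rto_minutes '2024-05-01 10:00:00' '2024-05-01 10:42:00'` prints 42; record that number next to the recovery point you restored from.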

Control | Confirm (command / path) | Expected | If wrong
Redundancy | backup-properties show → storageModelType | GeoRedundant | Recreate vault (frozen after items)
CRR flag | backup-properties show → crossRegionRestoreFlag | true | Set flag (needs GRS)
Immutability | resource show → immutabilitySettings.state | Locked | Soak, then lock
Soft delete | backup-properties show → softDeleteFeatureState | AlwaysON | Set AlwaysON
MUA mapping | resource-guard-mapping show | Guard present | Re-map cross-tenant guard
Restore proof | AddonAzureBackupJobs, Restore in last 7 days | Completed | Run a real drill
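The matrix above can be run as a script that prints PASS/FAIL per control. A minimal bash sketch: `check` and `verify_vault` are hypothetical names, the live calls reuse the az commands and field names from the verify block in this section, and comparisons are case-insensitive because tsv output may print `True`. The MUA mapping and restore-proof rows are left to the commands shown earlier.

```shell
#!/usr/bin/env bash
# Pure comparison: prints PASS/FAIL and returns non-zero on mismatch.
check() {
  local control=$1 expected=$2 actual=$3
  if [[ ${actual,,} == "${expected,,}" ]]; then
    echo "PASS $control = $actual"
  else
    echo "FAIL $control: expected $expected, got ${actual:-<empty>}"
    return 1
  fi
}

# Live wrapper: pulls the vault properties once and checks four matrix rows.
verify_vault() {
  local rg=$1 vault=$2 failed=0 sd redundancy crr imm
  read -r sd redundancy crr < <(az backup vault backup-properties show -g "$rg" -n "$vault" \
    --query "[softDeleteFeatureState, storageModelType, crossRegionRestoreFlag]" -o tsv)
  imm=$(az resource show -g "$rg" -n "$vault" \
    --resource-type Microsoft.RecoveryServices/vaults \
    --query "properties.securitySettings.immutabilitySettings.state" -o tsv)
  check "Redundancy" GeoRedundant "$redundancy" || failed=1
  check "CRR flag" true "$crr" || failed=1
  check "Immutability" Locked "$imm" || failed=1
  check "Soft delete" AlwaysON "$sd" || failed=1
  return "$failed"
}
```

Running `verify_vault rg-backup-prod rsv-prod-weu` on a schedule turns the sign-off matrix into a regression check that catches drift after someone "temporarily" changes a setting.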

The az command cheat-sheet for this whole posture, in one place to keep open during operations:

Task | Command (az …)
Set redundancy + CRR | backup vault backup-properties set --backup-storage-redundancy GeoRedundant --cross-region-restore-flag true
Enable soft delete AlwaysON | backup vault backup-properties set --soft-delete-feature-state AlwaysON --soft-delete-retention-period-in-days 30
Enable immutability (Unlocked) | resource update --set properties.securitySettings.immutabilitySettings.state=Unlocked
Lock immutability (irreversible) | resource update --set properties.securitySettings.immutabilitySettings.state=Locked
Create Resource Guard | dataprotection resource-guard create -g <rg> -n <guard> -l <loc>
Map vault to guard | backup vault resource-guard-mapping update --resource-guard-id <id>
List soft-deleted items | backup item list --query "[?properties.isScheduledForDeferredDelete].name"
Undelete an item | backup protection undelete --container-name <c> --item-name <i>
List secondary-region RPs | backup recoverypoint list --use-secondary-region
Cross-region restore disks | backup restore restore-disks --use-secondary-region --target-resource-group <dr-rg>
Stream diagnostics | monitor diagnostic-settings create --workspace <law-id> --export-to-resource-specific true --logs '[{"categoryGroup":"allLogs","enabled":true}]'
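During an incident you may need to undelete many items at once, not one. A bash sketch for Azure VM items, assuming `<container-name><TAB><item-name>` input; the `properties.containerName` query path in the comment is an assumption to confirm against your CLI output, and the helper name and `DRY_RUN` switch are hypothetical. Undelete only restores the item; resuming protection is a separate step.

```shell
#!/usr/bin/env bash
# Bulk-undelete sketch for Azure VM backup items (hypothetical helper).
# Input lines: "<container-name>\t<item-name>", for example from (path assumed):
#   az backup item list -g rg-backup-prod -v rsv-prod-weu \
#     --backup-management-type AzureIaasVM \
#     --query "[?properties.isScheduledForDeferredDelete].[properties.containerName, name]" -o tsv
# DRY_RUN=1 prints each command instead of running it.
undelete_items() {
  local rg=$1 vault=$2 container item
  while IFS=$'\t' read -r container item; do
    [[ -n $container && -n $item ]] || continue
    local cmd=(az backup protection undelete -g "$rg" -v "$vault"
      --container-name "$container" --item-name "$item"
      --backup-management-type AzureIaasVM --workload-type VM)
    if [[ ${DRY_RUN:-0} == 1 ]]; then echo "${cmd[*]}"; else "${cmd[@]}"; fi
  done
}
```

Run it with `DRY_RUN=1` first and paste the output into the incident channel; a second pair of eyes on the item list is cheap compared with undeleting the wrong thing.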

7. Monitoring with Backup center, alerts, and Backup reports

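Backup center is the single pane across every vault in the tenant – jobs, alerts, policy compliance, and security posture in one place. Even with MUA gating destructive operations, you want to be told the moment one is attempted, because an attempted strip is a detection signal. Two monitoring layers matter: vault diagnostics streamed to Log Analytics, and alerts on the security-relevant operations. First, stream the diagnostics in resource-specific mode so that the CoreAzureBackup and AddonAzureBackup* tables are populated (the default mode writes to the AzureDiagnostics table instead):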

az monitor diagnostic-settings create \
  --name backup-diag \
  --resource "/subscriptions/<sub>/resourceGroups/rg-backup-prod/providers/Microsoft.RecoveryServices/vaults/rsv-prod-weu" \
  --workspace "/subscriptions/<sub>/resourceGroups/rg-obs/providers/Microsoft.OperationalInsights/workspaces/law-platform" \
  --export-to-resource-specific true \
  --logs '[{"categoryGroup":"allLogs","enabled":true}]'
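Empty reports are usually a missing diagnostic setting, so it helps to check for one per vault. A bash sketch with hypothetical helper names; the resource ID shape matches the `--resource` value in the command above, and the live check assumes a recent az CLI where `diagnostic-settings list` returns a JSON array.

```shell
#!/usr/bin/env bash
# Build the Recovery Services vault resource ID used by az monitor commands.
vault_resource_id() {
  local sub=$1 rg=$2 vault=$3
  echo "/subscriptions/$sub/resourceGroups/$rg/providers/Microsoft.RecoveryServices/vaults/$vault"
}

# Succeeds if the vault has at least one diagnostic setting (live az call).
has_diagnostics() {
  local id names
  id=$(vault_resource_id "$1" "$2" "$3")
  names=$(az monitor diagnostic-settings list --resource "$id" --query "[].name" -o tsv) || return 1
  [[ -n $names ]]
}
```

Looping `has_diagnostics` over every vault in a subscription is a quick way to find the one vault whose alerts would never fire.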

Alert on the security-relevant operations specifically:

CoreAzureBackup
| where TimeGenerated > ago(1d)
| where OperationName has_any ("StopProtectionWithRetainData", "StopProtectionWithDeleteData", "DisableSoftDelete")
| project TimeGenerated, OperationName, BackupItemUniqueId, State

The alerts to wire, what they catch, and their severity:

Alert | Fires on | Severity | Route to
Backup failure | Job status = Failed | Sev 2 | Backup squad
Stop protection + delete data | Destructive op attempted | Sev 0 | SOC + backup squad
Disable soft delete | Safety-net removal attempt | Sev 0 | SOC
Disable / reduce immutability | WORM floor tampering | Sev 0 | SOC
Restore started | Any restore job | Sev 3 (informational) | Backup squad
Resource Guard unmap attempt | MUA being disabled | Sev 0 | Security team
Delete backup data | RP deletion | Sev 1 | SOC + backup squad
Reduce retention in policy | Retention tampering | Sev 1 | Compliance + backup squad
GRS replication lag high | Secondary copy falling behind | Sev 2 | Backup squad

The diagnostic log categories worth streaming and what each powers:

Category | Contains | Powers
CoreAzureBackup | Vault-level operations + state | Destructive-op alerting
AddonAzureBackupJobs | Job success/failure/duration | Restore-drill proof, SLA
AddonAzureBackupPolicy | Policy associations | Compliance reporting
AddonAzureBackupStorage | Storage consumed | Cost trend in Backup reports
AddonAzureBackupProtectedInstance | Protected-instance count | Billing reconciliation

Architecture at a glance

The diagram traces the destructive path through all four controls, left to right, exactly as an attacker would attempt it and exactly as your hardening blocks it. On the left, workload Tenant A holds the backup admin (standing Backup Contributor) and the ~900 protected items – VMs, SQL, Blob, Disk, PostgreSQL. The admin’s normal flow is the blue “protect / backup” arrow into the primary vault (West Europe), which is GeoRedundant with the CRR flag set. That vault carries two of the four controls stacked: the immutability WORM floor (state=Locked, badge 1) and enhanced soft delete (AlwaysON, 14–180 days, badge 2). When any destructive operation is attempted – the red “destructive op → authorize” arrow – it cannot complete inside Tenant A; it must round-trip to the security Tenant B, where the Resource Guard (badge 3) gates the destructive operations and a PIM activation grants a just-in-time, time-bound Backup MUA Operator role on the guard, which flows back as the green “JIT approval” arrow. There is no standing path for an attacker-as-admin to self-approve.

The right half is recovery and proof. The vault asynchronously replicates over the teal “GRS replicate” arrow to the paired region (North Europe), where the read-only secondary recovery points live and CRR (badge 4) restores disks into a pre-provisioned DR resource group and staging storage account. Finally everything – the destructive-operation attempts especially (badge 5) – streams as diagnostics to Log Analytics (CoreAzureBackup), where the Sev-0 destructive-operation alert fires the moment someone tries a strip, even though MUA already blocked it. Read the five legend numbers as the five things that must hold: immutability floor, soft-delete net, MUA approval, cross-region copy, and the detection alert. Any one of them on its own can be worked around; together and locked, they hold.

Ransomware-resilient Azure Backup architecture: workload Tenant A backup admin and ~900 protected items feed a GeoRedundant West Europe Recovery Services vault carrying locked immutability and AlwaysON soft delete; destructive operations must round-trip to a Resource Guard and PIM approval in a separate security Tenant B; the vault GRS-replicates to a North Europe paired region for on-demand cross-region restore into a pre-provisioned DR resource group; all operations and destructive-op attempts stream to Log Analytics CoreAzureBackup with Sev-0 alerts. Five numbered badges mark immutability, soft delete, MUA, cross-region restore and the destructive-op alert.

Real-world scenario

A European fintech platform team – call them Helvetia Pay – ran ~900 production VMs across two GeoRedundant Recovery Services vaults (West Europe primary, North Europe pair), governed by a central backup squad holding Backup Contributor on the landing-zone subscriptions. Their CI/CD platform used a service principal that, through role inheritance at subscription scope, also held Backup Contributor – a fact nobody had registered as a risk. A scheduled red-team exercise compromised that CI service principal via a leaked pipeline secret.

The red team’s playbook was textbook ransomware: before touching any workload, destroy the recovery path. The first phase report was damning. With only immutability enabled in the Unlocked state, the attacker path was: (1) disable immutability (it was never locked, because the team feared losing the ability to shorten retention), (2) disable soft delete in the same call, then (3) stop-protection-with-delete-data on the 40 crown-jewel VMs. Every one of those operations succeeded in the lab because the compromised principal held the rights and nothing gated them. The simulated blast radius: zero recoverable backups for the payment-processing tier, against a regulatory RPO of 24 hours. Had this been real ransomware, Helvetia Pay would have been choosing between paying and going out of business.

The fix was sequencing and separation, not new technology. Over a two-week hardening sprint they:

Step | Action | Control hardened | Why this order
1 | Audited every policy’s retention; trimmed 3 over-long yearly schedules from 10y to 7y | Cost / pre-lock hygiene | Locking freezes retention forever
2 | Set enhanced soft delete to AlwaysON / 30 days on both vaults | Soft delete | Net must exist before lock
3 | Flipped immutability to Locked on both vaults | Immutability | Now attacker-proof, retention reviewed
4 | Stood up a Resource Guard in a separate security tenant, mapped both vaults | MUA | Removes self-approval entirely
5 | Replaced CI’s standing Backup Contributor with PIM-activated, scoped roles | Least privilege | Kills the inherited-rights path
6 | Wired vault diag → Log Analytics; enabled Sev-0 destructive-op alerts | Detection | Get paged on attempts
7 | Ran a timed CRR drill into North Europe; recorded RTO = 42 min | Recoverability proof | “Protected” ≠ “recoverable”
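The order in that table can be enforced rather than remembered. A minimal bash gate, with a hypothetical name and inputs: it refuses to call the vault ready to lock unless the retention review is signed off, soft delete is already AlwaysON, and immutability is sitting in its Unlocked soak state.

```shell
#!/usr/bin/env bash
# Sequencing gate (hypothetical): audit -> soft delete -> lock.
# Args: soft-delete state, immutability state, retention review signed off (yes/no).
ready_to_lock() {
  local soft_delete=$1 immutability=$2 retention_reviewed=$3
  [[ $retention_reviewed == yes ]] || { echo "BLOCK: retention review not signed off"; return 1; }
  [[ $soft_delete == AlwaysON ]] || { echo "BLOCK: soft delete is $soft_delete, set AlwaysON first"; return 1; }
  [[ $immutability == Unlocked ]] || { echo "BLOCK: immutability is $immutability, expected Unlocked soak state"; return 1; }
  echo "READY: lock is irreversible; proceed deliberately"
}
```

Only when this prints READY would you run the irreversible lock command from the cheat-sheet; wiring it into the change pipeline makes skipping step 1 a failed build instead of a ten-year bill.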

The re-run red team, holding the same compromised Backup Contributor in the workload tenant, was fully blocked. Their first destructive call failed authorization:

# Re-run attacker, holding Backup Contributor in the workload tenant, tries to
# strip protection. With the Resource Guard mapped, this FAILS authorization
# because the identity has no Backup MUA Operator role on the guard in Tenant B.
az backup protection disable \
  --resource-group rg-backup-prod --vault-name rsv-prod-weu \
  --container-name <c> --item-name <vm> \
  --backup-management-type AzureIaasVM --delete-backup-data true --yes
# -> ResourceGuard: operation requires authorization on the Resource Guard.

Better still, the attempt tripped the Sev-0 alert and the SOC saw it within seconds. The lesson the team wrote into their platform standard: these four controls are only worth anything combined and locked. Any single one, left unlocked or co-located with the admin who would be the attacker, is theatre. The total spend was roughly ₹40,000/month in extra soft-deleted and GRS storage across both vaults – trivially less than one hour of the outage they avoided.

Advantages and disadvantages

The hardened posture is not free – the irreversibility that makes it attacker-proof is the same property that bites you if you mis-size before locking. The honest trade-off:

Advantages | Disadvantages
Survives an admin-level / insider compromise | Three controls are one-way doors (Locked, AlwaysON, redundancy)
Satisfies WORM / regulatory retention audits | Locked immutability freezes retention – mis-sizing = years of overpay
Deleted backups recoverable for up to 180 days | Soft-deleted RPs cost storage after the free 14 days
Out-of-region copy and on-demand CRR | GRS costs more than LRS/ZRS; no ZRS + CRR combination
MUA removes standing destructive rights | Cross-tenant guard adds operational friction (PIM round-trip)
Attempted strips are detected and alerted | Requires a second team / tenant to operate the guard
Recoverability is provable via timed drills | Drills take effort and a pre-built DR landing zone

When each matters: the irreversibility is your friend in any regulated or high-extortion-risk estate – it is exactly what an auditor wants to see and exactly what defeats the attacker. It is your enemy only if you skip the soak and retention review, which is why the sequencing discipline (audit → soft delete → lock → MUA) is non-negotiable. The GRS cost premium matters most for very large, high-churn estates; for them, tune instant-restore retention down and consider operational-only Blob backup where a second copy is not mandated. The cross-tenant friction matters for small teams who do not have a separate security org – for them, a different-subscription-different-team guard is the pragmatic floor.

Hands-on lab

This builds a fully hardened single-VM vault end to end, proves every control, then tears it down. It uses a B1s VM and minimal storage; the soft-deleted/GRS storage cost for an afternoon is negligible. You need an Azure subscription, the az CLI, and (for the MUA step) Owner on a second subscription to host the guard.

1. Resource group and a tiny VM to protect.

az group create -n rg-bkup-lab -l westeurope
az vm create -g rg-bkup-lab -n vm-lab --image Ubuntu2204 \
  --size Standard_B1s --admin-username azureuser --generate-ssh-keys

2. Create the vault and set redundancy + CRR FIRST (day-zero).

az backup vault create -g rg-bkup-lab -n rsv-bkup-lab -l westeurope
az backup vault backup-properties set -g rg-bkup-lab -n rsv-bkup-lab \
  --backup-storage-redundancy GeoRedundant --cross-region-restore-flag true

3. Enable enhanced soft delete (AlwaysON) and immutability Unlocked.

az backup vault backup-properties set -g rg-bkup-lab -n rsv-bkup-lab \
  --soft-delete-feature-state AlwaysON --soft-delete-retention-period-in-days 14

az resource update -g rg-bkup-lab -n rsv-bkup-lab \
  --resource-type Microsoft.RecoveryServices/vaults \
  --set properties.securitySettings.immutabilitySettings.state=Unlocked

4. Protect the VM with the default IaaS policy and run a backup now.

az backup protection enable-for-vm -g rg-bkup-lab --vault-name rsv-bkup-lab \
  --vm vm-lab --policy-name DefaultPolicy

az backup protection backup-now -g rg-bkup-lab --vault-name rsv-bkup-lab \
  --container-name vm-lab --item-name vm-lab \
  --backup-management-type AzureIaasVM \
  --retain-until 30-06-2026
# Expected: a Backup job appears; wait for Completed.

5. Create a Resource Guard (in a second subscription) and map the vault.

az account set --subscription <security-sub-id>
az group create -n rg-guard-lab -l westeurope
az dataprotection resource-guard create -g rg-guard-lab -n guard-lab -l westeurope
GUARD_ID=$(az dataprotection resource-guard show -g rg-guard-lab -n guard-lab --query id -o tsv)

az account set --subscription <workload-sub-id>
az backup vault resource-guard-mapping update -g rg-bkup-lab \
  --vault-name rsv-bkup-lab --resource-guard-id "$GUARD_ID"

6. Prove every control (the verification matrix).

az resource show -g rg-bkup-lab -n rsv-bkup-lab \
  --resource-type Microsoft.RecoveryServices/vaults \
  --query "properties.securitySettings.immutabilitySettings.state" -o tsv  # Unlocked

az backup vault backup-properties show -g rg-bkup-lab -n rsv-bkup-lab \
  --query "{sd:softDeleteFeatureState, crr:crossRegionRestoreFlag, r:storageModelType}"
# Expect: AlwaysON / true / GeoRedundant

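7. Test the MUA gate – this should be authorization-blocked. Run it as an identity that holds Backup Contributor on the vault but no role on the guard, such as a second test user or service principal; your own account inherits rights on the guard from Owner on the security subscription and would pass the check. Unlocked immutability also rejects deleting backup data, so confirm that the error names the Resource Guard.

# With the guard mapped and no Backup MUA Operator role on it, this FAILS:
az backup protection disable -g rg-bkup-lab --vault-name rsv-bkup-lab \
  --container-name vm-lab --item-name vm-lab \
  --backup-management-type AzureIaasVM --delete-backup-data true --yes
# -> Expected: ResourceGuard authorization error. The gate works.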
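Negative tests are easy to get backwards, so a tiny wrapper that passes only when the wrapped command fails keeps the drill honest. A bash sketch with a hypothetical name: it checks the exit code only, so a typo or network error would also read as "held"; read the error text to confirm the Resource Guard rejected the call.

```shell
#!/usr/bin/env bash
# Hypothetical negative-test helper: succeeds only if the wrapped command FAILS.
expect_failure() {
  if "$@" >/dev/null 2>&1; then
    echo "GATE OPEN: '$*' succeeded"
    return 1
  fi
  echo "GATE HELD: '$*' was rejected"
}
```

For example, prefix the step 7 command with `expect_failure` and keep the GATE HELD line as evidence in the drill record.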

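8. (Optional) Test soft-delete recovery. Set immutability to Disabled first, because even Unlocked immutability blocks deleting backup data; then stop protection with retain, delete the backup data, list the soft-deleted item and undelete it as shown in section 4.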

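9. Teardown. Disable immutability (possible only because it is Unlocked), stop protection with delete data, then delete both resource groups; your account passes the guard check because it owns the security subscription, and you can unmap the guard first if you prefer. Note that with Locked immutability you could not delete protected data early – which is why the lab stops at Unlocked. Because soft delete is AlwaysON, the deleted backup data stays soft-deleted for the 14-day window, and the vault (and with it rg-bkup-lab) cannot be deleted until that window expires; re-run the group delete then.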

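# Unlocked immutability can still be switched off; Locked could not.
az resource update -g rg-bkup-lab -n rsv-bkup-lab \
  --resource-type Microsoft.RecoveryServices/vaults \
  --set properties.securitySettings.immutabilitySettings.state=Disabled
# Stop protection and delete backup data (it moves to the soft-deleted state).
az backup protection disable -g rg-bkup-lab --vault-name rsv-bkup-lab \
  --container-name vm-lab --item-name vm-lab \
  --backup-management-type AzureIaasVM --delete-backup-data true --yes
# The vault itself only deletes once the soft-delete window has passed.
az group delete -n rg-bkup-lab --yes --no-wait
az group delete -n rg-guard-lab --subscription <security-sub-id> --yes --no-wait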

Common mistakes & troubleshooting

The failure modes here are operational and they cluster around the irreversible doors and the cross-tenant plumbing. This is the playbook – symptom, root cause, the exact command or portal path to confirm, and the fix:

# | Symptom | Root cause | Confirm (exact command / path) | Fix
1 | Can’t enable CRR – flag won’t set | Redundancy is LRS/ZRS, not GRS | backup-properties show → storageModelType | Set GRS before items; if items exist, new vault
2 | “Redundancy change not allowed” | Vault already has protected items | az backup item list (non-empty) | Recreate vault empty; redundancy is day-zero
3 | Destructive op succeeds despite “immutability on” | Immutability is Unlocked; attacker disabled it first | resource show → immutabilitySettings.state = Unlocked | Lock it after soak + retention review
4 | Can’t shorten an over-long retention | Vault is Locked immutable | state = Locked | None – right-size before locking
5 | MUA op blocked even for a legit change | No PIM-activated Backup MUA Operator role on the guard | resource-guard-mapping show | Security team grants JIT role for the window
6 | Can’t map vault to guard | Backup admin lacks Reader on the guard (cross-tenant) | RBAC on the guard resource | Grant Reader on the guard in Tenant B
7 | Soft-deleted item gone before expected | Soft delete was Enable (disablable) and got turned off | backup-properties show → softDeleteFeatureState | Set AlwaysON; can’t be disabled
8 | CRR restore fails: “storage account not found” | No staging SA in secondary region | Check secondary RG/SA exists | Pre-provision staging SA + target RG in pair
9 | Secondary recovery points empty | GRS replication lag (can be hours) or CRR flag off | recoverypoint list --use-secondary-region | Wait for replication; confirm CRR flag true
10 | Backup reports / KQL empty | Diagnostics not wired to a workspace, or sent to AzureDiagnostics instead of resource-specific tables | az monitor diagnostic-settings list on vault | Create diagnostic setting → Log Analytics, resource-specific mode
11 | Destructive-op alert never fired | Built-in alert rules not enabled | Backup center → Alerts config | Enable failure + destructive-op alert rules
12 | Locked the wrong (10y) retention | Skipped pre-lock retention review | Yearly schedule count = 10 | None – this is the cautionary tale; audit first
13 | Guard “protects nothing” | Excluded all operations when creating it | list-protected-operations (short list) | Re-add the critical ops to the guard
14 | MUA bypassed in incident | Guard co-located in a subscription the admin owns | Guard subscription = vault subscription | Move guard cross-tenant/cross-team

The decision table for “which control failed me” during a real incident:

If you see… | It’s probably… | Do this
Backups deleted despite “immutability” | Immutability was Unlocked, not Locked | Lock it everywhere; this is the #1 gap
A destructive op went through with no approval | No MUA, or guard co-located | Map a cross-tenant Resource Guard
Deleted backup unrecoverable after 5 days | Soft delete was turned off (even basic soft delete keeps items 14 days) | Enhanced soft delete, AlwaysON
No copy when the region went down | Vault was LRS/ZRS, not GRS | GRS + CRR (rebuild if items exist)
Restore worked but took 6 hours | No pre-built DR landing zone | Pre-provision staging SA + target RG

Best practices

Security notes

The backup RBAC roles, what each can and cannot do, and who should hold them:

Role | Can configure | Can trigger backup/restore | Can delete data / stop protection | Typical holder
Backup Reader | No | No (read-only) | No | Monitoring, auditors, SOC
Backup Operator | No (no policy create) | Yes | No (cannot delete backup data) | Day-job operators
Backup Contributor | Yes (policies, protection) | Yes | Yes | Backup squad (sparingly)
Owner | Yes (everything) | Yes | Yes | Break-glass only
Reader (on guard, Tenant B) | No | No | No | Backup admin (to map guard)
Backup MUA Operator (on guard) | n/a | n/a | Authorizes the holder to run a gated op within the PIM window | Backup admin via PIM only

Cost & sizing

What drives the Azure Backup bill, in rough order of impact – price points are indicative (≈₹/USD, vary by region and commitment):

Cost driver | What it is | Rough indicative cost | How to right-size
Protected-instance fee | Per protected VM/DB/etc. per month | ~₹400–₹800 / $5–$10 per instance/mo (size-banded) | Decommission stale items; consolidate
Vault storage (GRS) | Backup data stored, geo-replicated | ~₹2/GB/mo LRS; GRS ≈ 2× | Right-size retention; LRS where a second copy is not mandated
Instant-restore snapshots | Local snapshots in source subscription (1–5 days) | Snapshot storage rate × churn | Lower to 1–2 days for large, chatty VMs
Soft-deleted storage | Deleted RPs kept past the free 14 days | Same as vault storage rate | Right-size the soft-delete window (14–180 days)
CRR / cross-region egress | Geo-replication + restore data movement | Per-GB egress on restore | Drill cost is one-off; replication is in the GRS price
Log Analytics ingestion | Diagnostic logs for reports/alerts | ~₹/GB ingested | Filter categories; cap retention

Sizing guidance: the protected-instance fee dominates for fleets of small VMs; the storage dominates for a few large, high-churn machines with long retention. The single biggest lever you control is retention × churn – a 7-year yearly point on a high-churn VM is expensive, and once the vault is locked you cannot reduce it, so size it before locking. GRS doubles storage versus LRS; pay it where a second-region copy or CRR is a real requirement, and consider LRS/ZRS for non-critical or operational-only protection. There is no free tier for production Azure Backup, but the lab in this article (one B1s VM, one afternoon) costs a few rupees. For the broader cost-conscious DR pattern on small teams, see Disaster recovery on a budget: backup & restore for small teams.
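The guidance reduces to simple arithmetic: instances × fee plus stored GB × rate, doubled for GRS. A back-of-envelope bash sketch using whole-rupee inputs; the function name is hypothetical, the rates you pass in are the indicative figures from the table (not a quote), and it ignores snapshots, soft-deleted storage and Log Analytics ingestion.

```shell
#!/usr/bin/env bash
# Rough monthly estimate in rupees (integer arithmetic, indicative rates only).
# Args: instances, fee per instance, stored GB, rate per GB (LRS), redundancy.
estimate_monthly_inr() {
  local instances=$1 fee_per_instance=$2 stored_gb=$3 rate_per_gb=$4 redundancy=$5
  local multiplier=1
  # The table treats GRS storage as roughly twice the LRS rate.
  [[ $redundancy == GeoRedundant ]] && multiplier=2
  echo $(( instances * fee_per_instance + stored_gb * rate_per_gb * multiplier ))
}
```

Comparing `estimate_monthly_inr 10 600 1000 2 GeoRedundant` with the LocallyRedundant variant shows how quickly the storage term overtakes the instance fee as retention × churn grows.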

Interview & exam questions

These map to AZ-104 (Azure Administrator), AZ-305 (Solutions Architect), and SC-100 (Cybersecurity Architect) backup/resilience objectives.

  1. Why can’t you change vault redundancy after onboarding the first item? Redundancy determines where backup data is physically stored; changing it would require re-replicating all existing recovery points, so Azure freezes it once any item is protected. It is a day-zero decision – set GeoRedundant before onboarding if you want CRR.

  2. What is the difference between Unlocked and Locked immutability? Both block retention-reducing operations on existing recovery points. Unlocked can be disabled by an admin (a soak state); Locked is irreversible – not even the subscription owner or Microsoft support can disable it, which is what makes it attacker-proof and WORM-compliant.

  3. What exactly does immutability NOT block? Creating new backups and extending retention. It only gates the destructive direction: delete-before-retention, shortening retention, and disabling soft delete.

  4. What is a Resource Guard and why place it in a different tenant? A Microsoft.DataProtection/resourceGuards resource that gates destructive vault operations behind a second authorization (MUA). Placing it in a tenant the backup admin does not control means a compromised backup admin cannot self-approve a destructive operation – that is the separation of duties.

  5. Which operations does a Resource Guard gate by default? Disabling MUA itself, disabling/reducing soft delete, disabling immutability, stop-protection-with-delete-data, and modifying/deleting backup policies (and the MARS passphrase change). Several can be optionally excluded.

  6. Basic vs enhanced soft delete? Basic is a fixed 14 days and can be turned off. Enhanced is configurable 14–180 days and can be made AlwaysON (non-disablable). Enhanced is the current recommended model.

  7. Prerequisites for cross-region restore? Vault redundancy = GeoRedundant, the crossRegionRestore flag enabled, a workload type that supports CRR (Azure VM, SQL-in-VM, SAP HANA-in-VM), and a pre-provisioned staging storage account + target resource group in the paired region.

  8. Can a vault have both ZRS and CRR? No. ZoneRedundant protects within the primary region against a single-AZ loss but has no second-region copy. CRR reads from the GeoRedundant paired-region copy. Choose based on whether zone-loss or region-loss dominates your risk.

  9. How do you recover a maliciously deleted backup? If enhanced soft delete is on, the item is soft-deleted (14–180 day window). List soft-deleted items and run az backup protection undelete, then resume protection. After the window, only a CRR/secondary copy can help.

  10. An attacker holds Backup Contributor. Which single control stops them from stopping protection and deleting data? MUA via a cross-tenant Resource Guard – it requires an approval they cannot grant. Immutability (Locked) independently stops early deletion; combined, neither can be defeated. The exam answer for “stop the operation entirely” is the Resource Guard.

  11. Why is instantRpRetentionRangeInDays a cost lever? Instant-restore snapshots live in the source subscription and incur snapshot storage; lowering the range (1–5 days) reduces that cost at the price of slower same-region restore once snapshots age out.

  12. How do you prove a vault is recoverable, not just protected? Run an actual cross-region restore drill: enumerate secondary-region recovery points, restore disks into the paired region, boot the VM, and record the RTO. A green configuration blade is not proof.

Quick check

  1. Your vault is LocallyRedundant and already protecting 50 VMs. The CISO now wants cross-region restore. What must you do?
  2. Immutability is enabled but a red team still deleted backups. What state was it in, and what is the fix?
  3. You set a 10-year yearly retention, locked immutability, then realised it should be 3 years. Can you fix it? Why or why not?
  4. Where must the Resource Guard live for MUA to actually resist a compromised backup admin?
  5. A backup item was deleted and is unrecoverable after 6 days. What was misconfigured?

Answers

  1. Recreate the vault as GeoRedundant and re-onboard. Redundancy is frozen once any item is protected, so you cannot convert in place – create a new GRS vault, enable the CRR flag, stop protection on the old vault (retaining its data until it ages out), and re-protect the VMs in the new vault.
  2. It was Unlocked, so an admin (the red team) disabled immutability first and then deleted. The fix is to Lock it after a retention review – Locked is irreversible and cannot be disabled by anyone.
  3. No. Locked immutability blocks shortening retention. You will pay for the 10-year retention for its full duration. This is why you right-size and review retention before locking.
  4. In a separate tenant (or at minimum a different subscription owned by a different team) where the backup admin has no permissions – so a compromised admin cannot self-approve the destructive operation.
  5. Soft delete had been turned off – either basic soft delete, or enhanced soft delete left at Enable rather than AlwaysON – so nothing held the deleted item in a recovery window (even basic soft delete would have kept it for 14 days). Enhanced soft delete at AlwaysON with a 14–180 day window prevents this.

Glossary

Next steps

Tags: Azure, Azure Backup, Ransomware, Data Protection, Resilience

Vinod is a Senior Cloud Architect (22+ yrs) — available for Azure / AWS / GCP architecture, landing zones, and migrations.
