Azure Update Manager: Maintenance Configurations, Scheduled Patching, and Hybrid Coverage with Arc

Patching is where good intentions go to die. Every estate I have inherited had a patch “strategy” that was really three strategies – a Windows team on WSUS, a Linux team running unattended-upgrades on a cron, and a cloud team hoping the images were recent enough. Nobody could answer the only question that matters at audit time: which machines are missing which CVEs right now, and when will they be patched? Azure Update Manager (AUM) is Microsoft’s answer, and unlike its predecessor it needs no Log Analytics workspace, no Automation account, and no agent of its own – it is a native VM platform capability that also reaches off-Azure through Arc. This is how to wire it up: assessment, on-demand remediation, recurring maintenance configurations, tag-driven dynamic scopes, pre/post automation, hybrid coverage, hotpatching, and the Policy that keeps it all honest.

Mental model. Update Manager has two planes. The data plane assesses and installs updates on a single machine on demand. The scheduling plane – maintenance configurations – is what turns one-off actions into a governed, recurring program. Most teams stall because they treat AUM as a button to click rather than a schedule to declare. The schedule is the product.

1. Architecture and migrating off Automation Update Management

The legacy Automation Update Management (under an Automation account) is retired – it reached end of support on 31 August 2024. If you still run it, migration is overdue. The new Azure Update Manager differs in ways that matter operationally:

Concern	Automation Update Management (legacy)	Azure Update Manager
Dependencies	Log Analytics workspace + Automation account	None – native to the VM/Arc platform
Agent	Log Analytics agent (MMA/OMS)	No separate agent; uses the VM/Arc extension framework
Scheduling	Automation schedules + Update Deployments	Maintenance configurations
Targeting	Saved searches / groups	Dynamic scopes (Azure Resource Graph queries on tags/sub/RG)
Off-Azure	Hybrid Runbook Worker	Azure Arc-enabled servers

The MMA/OMS agent retired in August 2024 as well, so anything depending on it is on borrowed time. Microsoft ships a portal migration experience and runbooks that recreate legacy schedules as maintenance configurations – use them rather than hand-translating, because the dynamic-scope mapping is the part people get wrong.

Update Manager itself requires no enablement resource, but it does require that the machine’s update settings allow the platform to orchestrate. The single most important property is the patch orchestration mode on each VM. For scheduled patching to apply, the VM must be set to Azure-orchestrated (surfaced as AutomaticByPlatform) and, critically, have bypassPlatformSafetyChecksOnUserSchedule = true so the platform does not also apply its own automatic patches and collide with your schedule.

# Providers AUM and its scheduling/policy surface depend on
az provider register --namespace Microsoft.Maintenance
az provider register --namespace Microsoft.Compute
az provider register --namespace Microsoft.PolicyInsights
az provider register --namespace Microsoft.HybridCompute   # Arc-enabled servers

# Put an existing Linux VM into customer-managed scheduled patching
az vm update \
  --resource-group rg-fleet-prod \
  --name vm-app-01 \
  --set osProfile.linuxConfiguration.patchSettings.patchMode=AutomaticByPlatform \
  --set osProfile.linuxConfiguration.patchSettings.assessmentMode=AutomaticByPlatform

# Hand control to YOUR maintenance schedule (suppress platform auto-patching).
# The flag is nested under patchSettings.automaticByPlatformSettings.
az vm update \
  --resource-group rg-fleet-prod \
  --name vm-app-01 \
  --set osProfile.linuxConfiguration.patchSettings.automaticByPlatformSettings.bypassPlatformSafetyChecksOnUserSchedule=true

For Windows the equivalent path is osProfile.windowsConfiguration.patchSettings.* with patchMode=AutomaticByPlatform. Get this property wrong and your maintenance configuration will appear to run while the VM quietly patches itself on Microsoft’s cadence instead.
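To make that check mechanical, here is a minimal Python sketch that inspects one machine's settings and lists what would stop a customer-managed schedule from applying. It assumes the JSON shape returned by az vm show (osProfile at the top level) and that the bypass flag is nested under patchSettings.automaticByPlatformSettings, as in the Compute REST schema. The function name is hypothetical, not part of any SDK.

```python
from typing import Any, Dict, List


def scheduled_patching_gaps(vm: Dict[str, Any]) -> List[str]:
    """Return the reasons this VM would not honor a customer-managed schedule."""
    os_profile = vm.get("osProfile") or {}
    # Only one of these is populated, depending on the guest OS.
    os_config = (
        os_profile.get("linuxConfiguration")
        or os_profile.get("windowsConfiguration")
        or {}
    )
    patch = os_config.get("patchSettings") or {}
    # Assumption: the bypass flag lives under automaticByPlatformSettings.
    platform = patch.get("automaticByPlatformSettings") or {}

    gaps: List[str] = []
    if patch.get("patchMode") != "AutomaticByPlatform":
        gaps.append("patchMode is not AutomaticByPlatform")
    if platform.get("bypassPlatformSafetyChecksOnUserSchedule") is not True:
        gaps.append("bypassPlatformSafetyChecksOnUserSchedule is not true")
    return gaps
```

An empty list means the machine is ready for scheduled patching. Anything else is worth fixing before you associate it with a configuration.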

2. On-demand assessment and one-time update deployments

Before scheduling anything, learn what the fleet actually needs. Assessment is read-only: it queries each machine’s update source (Windows Update / WSUS for Windows, the distro package manager for Linux) and reports missing updates by classification and KB/package. It does not install anything.

# One-off assessment of a single machine (results land in Update Manager)
az vm assess-patches \
  --resource-group rg-fleet-prod \
  --name vm-app-01

You can drive this at fleet scale and read the results out of Azure Resource Graph, which is the only sane way to query patch state across hundreds of machines. The patchassessmentresources table holds both the per-machine summary and the individual patches children:

// Machines with pending CRITICAL or SECURITY updates, Azure VMs and Arc together
patchassessmentresources
| where type =~ "microsoft.compute/virtualmachines/patchassessmentresults/softwarepatches"
   or type =~ "microsoft.hybridcompute/machines/patchassessmentresults/softwarepatches"
| where properties.classifications has_any ("Critical", "Security")
| where properties.patchState =~ "Available"
| extend machine = tostring(split(id, "/")[8])
| summarize pendingUpdates = count() by machine, tostring(properties.classifications)
| order by pendingUpdates desc

When you need to remediate now – an out-of-band CVE, say – run a one-time update deployment (an install run). Filter by classification and, importantly, give it an explicit maximum duration (an ISO 8601 value such as PT120M) plus a reboot policy. The window bounds when work may start rather than killing work in flight: AUM stops starting new package installs once the window is exhausted, so size it for the slowest machine in the batch.

# Install only Critical + Security updates, 120-minute window, reboot only if required
az vm install-patches \
  --resource-group rg-fleet-prod \
  --name vm-app-01 \
  --maximum-duration PT120M \
  --reboot-setting IfRequired \
  --classifications-to-include-linux Critical Security

For Windows, swap to --classifications-to-include-win Critical Security UpdateRollUp and you can additionally pass --kb-numbers-to-include / --kb-numbers-to-exclude to pin or block specific KBs. The --reboot-setting values are IfRequired, Always, and Never – Never is how you decouple patch install from reboot when a change window forbids restarts.

3. Maintenance configurations, schedules, and reboot settings

This is the heart of AUM. A maintenance configuration of scope InGuestPatch is a first-class Azure resource that declares when to patch, what classifications to include, and how to handle reboots. Machines are then associated to it – statically or, far better, via dynamic scopes (next section).

Key fields, and the traps in each:

maintenanceWindow.startDateTime / duration – the window. Duration is HH:mm, minimum 1 hour 30 minutes (01:30), and the last 10 minutes are reserved by the platform to finalize, so effective install time is duration - 10m.
recurEvery – the cadence, e.g. 1Day, 1Week, or a monthly expression like Month Third Sunday.
rebootSetting – IfRequired, Always, or Never: reboot only if an update needs it, always reboot, or never reboot. These are the spellings the Bicep/JSON property takes.
windowsParameters / linuxParameters – per-OS classification and KB/package include/exclude lists.
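The window arithmetic is easy to get wrong in review, so a small sketch helps. It takes the 01:30 minimum and the 10-minute reserve straight from the description above and treats them as assumptions to re-check against current documentation. The function name is hypothetical.

```python
MIN_WINDOW_MINUTES = 90      # 01:30 minimum, per the text above
RESERVED_TAIL_MINUTES = 10   # finalization reserve, per the text above


def effective_install_minutes(duration: str) -> int:
    """Parse an HH:mm duration and return the minutes usable for installs."""
    hours_text, minutes_text = duration.split(":")
    hours, minutes = int(hours_text), int(minutes_text)
    if hours < 0 or not 0 <= minutes < 60:
        raise ValueError(f"not a valid HH:mm duration: {duration!r}")
    total = hours * 60 + minutes
    if total < MIN_WINDOW_MINUTES:
        raise ValueError(f"window {duration} is shorter than 01:30")
    return total - RESERVED_TAIL_MINUTES
```

Under those assumptions a 03:00 window leaves 170 minutes of install time, and a 01:00 window is rejected outright.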

Here is a production-grade monthly Windows configuration in Bicep, patching on the third Sunday at 02:00 UTC with a 3-hour window:

resource patchMonthly 'Microsoft.Maintenance/maintenanceConfigurations@2023-10-01-preview' = {
  name: 'mc-win-prod-monthly'
  location: 'eastus2'
  properties: {
    maintenanceScope: 'InGuestPatch'
    extensionProperties: {
      // Required so the config is treated as a guest-patch (AUM) schedule
      InGuestPatchMode: 'User'
    }
    maintenanceWindow: {
      startDateTime: '2026-06-21 02:00'
      duration: '03:00'
      timeZone: 'UTC'
      recurEvery: 'Month Third Sunday'
    }
    installPatches: {
      rebootSetting: 'IfRequired'
      windowsParameters: {
        classificationsToInclude: [
          'Critical'
          'Security'
          'UpdateRollUp'
        ]
        kbNumbersToExclude: [
          'KB5099999'   // block a known-bad KB until validated
        ]
      }
    }
  }
}
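A mismatch between startDateTime and recurEvery is an easy mistake to ship, so it is worth confirming that the start date really is a third Sunday. The sketch below uses only the Python standard library. It is a calendar check under my own helper names, not a reimplementation of how the platform expands recurEvery.

```python
import calendar
from datetime import date
from typing import List


def nth_weekday(year: int, month: int, weekday: int, n: int) -> date:
    """Return the n-th given weekday (calendar.MONDAY..SUNDAY) of a month."""
    first_weekday, days_in_month = calendar.monthrange(year, month)
    # Days from the 1st of the month to the first occurrence of the weekday.
    offset = (weekday - first_weekday) % 7
    day = 1 + offset + 7 * (n - 1)
    if day > days_in_month:
        raise ValueError(f"{year}-{month:02d} has no occurrence {n} of that weekday")
    return date(year, month, day)


def third_sundays(year: int) -> List[date]:
    """All twelve 'Month Third Sunday' dates for a year."""
    return [nth_weekday(year, month, calendar.SUNDAY, 3) for month in range(1, 13)]
```

For June 2026 this returns 21 June, which matches the startDateTime in the configuration above.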

Deploy it and you have a schedule with no machines yet – intentionally. Never staple machines into a config by hand at scale; declare the intent of the config (what it patches, when) and let dynamic scopes decide membership. To associate a single machine explicitly when you must, use a configuration assignment:

az maintenance assignment create \
  --resource-group rg-fleet-prod \
  --resource-name vm-app-01 \
  --resource-type virtualMachines \
  --provider-name Microsoft.Compute \
  --configuration-assignment-name assign-vm-app-01 \
  --maintenance-configuration-id "/subscriptions/<sub>/resourceGroups/rg-fleet-prod/providers/Microsoft.Maintenance/maintenanceConfigurations/mc-win-prod-monthly"

The most common silent failure: the VM’s patchMode is not AutomaticByPlatform, or bypassPlatformSafetyChecksOnUserSchedule is false. The maintenance run will be skipped with a status that looks benign. Reconcile machine settings to the schedule before you trust the schedule.

4. Dynamic scopes and tag-based targeting at scale

A dynamic scope attaches an Azure Resource Graph filter to a maintenance configuration. Membership is evaluated at run time, so a newly created VM that carries the right tag is patched on the next window with zero manual onboarding. This is the difference between a patch program that scales and one that rots.

Filters are expressed over subscriptions, resource groups, resource types, locations, OS types, and – the one you will actually lean on – tags. Define a small, governed tag vocabulary up front: PatchGroup = ring0|ring1|ring2|exempt, plus an Environment tag. Then bind one configuration per ring.

# Attach a dynamic scope: Windows machines tagged PatchGroup=ring1 in two regions of the current subscription
az maintenance assignment create-or-update-subscription \
  --maintenance-configuration-id "/subscriptions/<sub>/resourceGroups/rg-fleet-prod/providers/Microsoft.Maintenance/maintenanceConfigurations/mc-win-prod-monthly" \
  --configuration-assignment-name "scope-ring1" \
  --filter-tags '{"PatchGroup":["ring1"]}' \
  --filter-tags-operator "All" \
  --filter-os-types "Windows" \
  --filter-locations "eastus2" "centralus"

The CLI surface for dynamic scopes has churned across versions; in practice many teams declare scopes in Bicep/ARM alongside the configuration so they are reproducible. The decision that matters is not syntax, it is ring design: ring0 is a thin canary (build agents, a few non-critical app servers) on the earliest window; ring1 the broad fleet a few days later; ring2 the latency-sensitive tier last. The exempt tag routes to no configuration and is audited separately – exemptions must be visible, time-boxed, and few.

Validate that a scope resolves to the machines you expect before the window fires, using a Resource Graph query that mirrors the scope’s tag filter (add location and OS-type predicates if the scope has them):

resources
| where type in~ ("microsoft.compute/virtualmachines", "microsoft.hybridcompute/machines")
| where tags["PatchGroup"] =~ "ring1"
| project name, type, location, resourceGroup, subscriptionId
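If you export that inventory (for example as JSON), you can dry-run membership offline before changing a scope. The sketch below is a simplified model of a dynamic-scope filter, not the platform's evaluator. The machine keys (tags, osType, location) are a hypothetical export shape, and comparing tags case-insensitively is my assumption, chosen to match the =~ operator in the query.

```python
from typing import Any, Dict, Iterable, Sequence, Set


def _lower_set(values: Iterable[str]) -> Set[str]:
    return {str(value).lower() for value in values}


def matches_scope(
    machine: Dict[str, Any],
    tag_filter: Dict[str, Sequence[str]],
    tags_operator: str = "All",
    os_types: Iterable[str] = (),
    locations: Iterable[str] = (),
) -> bool:
    """Return True if the machine satisfies the tag, OS, and location filters."""
    tags = {
        str(key).lower(): str(value).lower()
        for key, value in (machine.get("tags") or {}).items()
    }
    # One boolean per filtered tag key: does the machine carry an allowed value?
    hits = [
        tags.get(key.lower()) in _lower_set(allowed)
        for key, allowed in tag_filter.items()
    ]
    if tags_operator == "All":
        tags_ok = all(hits)
    elif tags_operator == "Any":
        tags_ok = any(hits) or not hits
    else:
        raise ValueError(f"unknown tags operator: {tags_operator!r}")

    # An empty OS or location filter means "no restriction".
    os_allowed = _lower_set(os_types)
    loc_allowed = _lower_set(locations)
    os_ok = not os_allowed or str(machine.get("osType", "")).lower() in os_allowed
    loc_ok = not loc_allowed or str(machine.get("location", "")).lower() in loc_allowed
    return tags_ok and os_ok and loc_ok
```

Run it over the whole export and compare the count to what you expect. A surprising zero is much cheaper to find here than after a window has fired.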

5. Pre and post maintenance events with automation hooks

Patching is rarely just patching. You drain a load balancer first, you quiesce a database, you snapshot a disk, you re-run smoke tests after. AUM exposes this through pre and post maintenance events delivered via Event Grid on the maintenance configuration. A pre-event fires before the window starts; a post-event after it completes. Subscribe an Azure Function, Logic App, Automation runbook, or webhook and you have orchestration hooks without bolting on a separate scheduler.

Wire a pre-event to a Function that cordons machines and a post-event that validates health:

# Pre-maintenance event -> Function App endpoint
az eventgrid event-subscription create \
  --name pre-maint-drain \
  --source-resource-id "/subscriptions/<sub>/resourceGroups/rg-fleet-prod/providers/Microsoft.Maintenance/maintenanceConfigurations/mc-win-prod-monthly" \
  --endpoint-type azurefunction \
  --endpoint "/subscriptions/<sub>/resourceGroups/rg-ops/providers/Microsoft.Web/sites/fn-patch-orchestrator/functions/PreMaintenanceDrain" \
  --included-event-types Microsoft.Maintenance.PreMaintenanceEvent

# Post-maintenance event -> validation Function
az eventgrid event-subscription create \
  --name post-maint-validate \
  --source-resource-id "/subscriptions/<sub>/resourceGroups/rg-fleet-prod/providers/Microsoft.Maintenance/maintenanceConfigurations/mc-win-prod-monthly" \
  --endpoint-type azurefunction \
  --endpoint "/subscriptions/<sub>/resourceGroups/rg-ops/providers/Microsoft.Web/sites/fn-patch-orchestrator/functions/PostMaintenanceValidate" \
  --included-event-types Microsoft.Maintenance.PostMaintenanceEvent

The contract to understand: the pre-event handler runs inside a bounded pre-window (on the order of ~20 minutes) and the maintenance run proceeds when it completes or times out. Keep handlers idempotent and fast – this is the wrong place for a 30-minute backup. Use it to call the operation (kick off a snapshot, drain a node) and let the long-running work happen asynchronously, with the post-event reconciling state.
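A minimal sketch of that contract in Python follows. It assumes the handler receives an event in the Event Grid schema (the id and eventType fields), and it relies on the fact that Event Grid delivers at least once, so duplicates must be tolerated. The class name, the in-memory set, and the start_async callback are all hypothetical stand-ins: a real handler would track seen ids in a durable store and hand the work to a queue.

```python
import threading
from typing import Any, Callable, Dict, Set

PRE_EVENT = "Microsoft.Maintenance.PreMaintenanceEvent"


class PreEventDispatcher:
    """Acknowledge quickly, start long-running work once, ignore duplicates."""

    def __init__(self, start_async: Callable[[Dict[str, Any]], None]) -> None:
        # start_async must return immediately (for example, enqueue a message).
        self._start_async = start_async
        self._seen: Set[str] = set()   # stand-in for a durable dedupe store
        self._lock = threading.Lock()

    def handle(self, event: Dict[str, Any]) -> str:
        if event.get("eventType") != PRE_EVENT:
            return "ignored"
        event_id = event["id"]
        with self._lock:
            if event_id in self._seen:
                return "duplicate"     # redelivery: the work was already started
            self._seen.add(event_id)
        self._start_async(event)
        return "started"
```

The handler does no long-running work itself. It records the event, triggers the operation, and returns, which keeps it well inside the pre-window.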

6. Hybrid and multicloud patching via Azure Arc-enabled servers

Update Manager’s real leverage is that an Arc-enabled server is, to AUM, just another machine. Onboard a server in your datacenter, in AWS, or in GCP, and it gets the same assessment, the same one-time deployments, the same maintenance configurations and dynamic scopes. There is no per-machine charge for Update Manager on native Azure VMs; for Arc-enabled servers, Update Manager is billed (a small per-server monthly charge) – budget for it, but it is far cheaper than running a parallel patch stack off-cloud.

Connect a Linux server to Arc:

# On the target server, once the Connected Machine agent (azcmagent) is installed
sudo azcmagent connect \
  --resource-group "rg-arc-servers" \
  --tenant-id "<tenant-id>" \
  --location "eastus2" \
  --subscription-id "<sub>" \
  --cloud "AzureCloud" \
  --tags "PatchGroup=ring1,Environment=prod"

Because you tagged the machine on connect, your existing ring1 dynamic scope picks it up automatically – no separate onboarding into AUM. That is the whole point: one targeting model spanning Azure, on-prem, and other clouds. Two operational notes that bite hybrid fleets:

Arc machines need outbound connectivity to the Arc and update endpoints. If you are behind a proxy, set it on the agent (azcmagent config set proxy.url ...); air-gapped or WSUS-restricted estates must point Windows machines at an internal update server and confirm Linux repos are reachable.
The patch orchestration / patchMode concept applies to Arc machines too. On Arc, set it via the machine’s osProfile patch settings (portal, Policy, or az connectedmachine update) so scheduled patching is permitted rather than blocked.

7. Hotpatching and Windows Server orchestration patterns

For supported Windows Server SKUs, hotpatching installs OS security updates without a reboot by patching in-memory code, dramatically shrinking your reboot-driven maintenance windows. It is available on Windows Server Azure Edition (Datacenter: Azure Edition) and, more recently, on Windows Server 2025 – including, notably, Arc-enabled Windows Server 2025 machines under a subscription. The cadence is the pattern to internalize:

Baseline months (Jan, Apr, Jul, Oct) ship a cumulative update and require a reboot.
The two months after each baseline ship hotpatches – security fixes, no reboot.

So a year is four reboots, not twelve, with no loss of security coverage. The orchestration implication: design your maintenance configuration with rebootSetting: IfRequired, and the platform reboots only on baseline months and skips it on hotpatch months automatically. You do not script the calendar; AUM and the hotpatch service handle it. Enable hotpatch on the OS profile:

// Windows Server Azure Edition VM with hotpatch enabled
properties: {
  osProfile: {
    windowsConfiguration: {
      provisionVMAgent: true
      enableAutomaticUpdates: true
      patchSettings: {
        patchMode: 'AutomaticByPlatform'
        enableHotpatching: true
      }
    }
  }
}
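The planned baseline/hotpatch calendar described above is simple enough to encode, which helps when you forecast reboot windows for change management. This sketch models only the planned cadence. It does not account for an unplanned baseline replacing a hotpatch month, and the function names are hypothetical.

```python
BASELINE_MONTHS = {1, 4, 7, 10}  # Jan, Apr, Jul, Oct, per the cadence above


def release_kind(month: int) -> str:
    """Classify a calendar month (1-12) as a baseline or hotpatch month."""
    if not 1 <= month <= 12:
        raise ValueError(f"month out of range: {month}")
    return "baseline" if month in BASELINE_MONTHS else "hotpatch"


def planned_reboots_per_year() -> int:
    """Count the months in which the planned cadence requires a reboot."""
    return sum(1 for month in range(1, 13) if release_kind(month) == "baseline")
```

Under the planned cadence this counts four reboot months, one per quarter.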

Even where hotpatch is not available, the orchestration pattern holds: separate install from reboot using --reboot-setting Never on disruption-sensitive tiers, then drive the reboot through a controlled post-event so it lands inside an approved restart window rather than mid-patch.

8. Reporting, compliance dashboards, and Policy-driven enforcement

A patch program you cannot report on is a liability. AUM surfaces compliance in the portal, but the durable answer is Azure Policy – it both enforces the prerequisites (so new machines are born compliant) and reports drift across the estate.

There are built-in policy definitions for exactly this. Assign these at a management group so the whole tenant inherits them:

“Configure periodic checking for missing system updates on azure virtual machines” – sets assessmentMode = AutomaticByPlatform so every Azure VM is assessed periodically without anyone triggering it (a sibling built-in definition covers Arc-enabled servers).
“Schedule recurring updates using Azure Update Manager” – associates in-scope machines to a maintenance configuration via a DeployIfNotExists effect, so onboarding a VM auto-enrolls it into your schedule.

# Enforce periodic assessment across the management group via the built-in policy
az policy assignment create \
  --name "enforce-periodic-assessment" \
  --display-name "AUM: periodic assessment on all VMs" \
  --scope "/providers/Microsoft.Management/managementGroups/mg-platform" \
  --policy "59efceea-0c96-497e-a4a1-4eb2290dac15" \
  --mi-system-assigned --location eastus2 \
  --role "Contributor" \
  --identity-scope "/providers/Microsoft.Management/managementGroups/mg-platform"   # --role only takes effect together with --identity-scope

DeployIfNotExists and Modify policies need a managed identity with the right role at the assigned scope. Skip the --mi-system-assigned / --role flags and remediation tasks silently fail: the assignment still evaluates and reports compliance, but it never changes a machine. Always provision the identity.

Report compliance from Resource Graph so it feeds a workbook or your existing dashboards rather than living only in the AUM blade:

// Fleet patch-compliance rollup: compliant vs non-compliant by environment
patchassessmentresources
| where type =~ "microsoft.compute/virtualmachines/patchassessmentresults"
   or type =~ "microsoft.hybridcompute/machines/patchassessmentresults"
// The result id ends in /patchAssessmentResults/<name>; trim it to the parent machine id for the join
| extend machineId = tostring(split(tolower(id), "/patchassessmentresults/")[0])
| extend pending = toint(properties.availablePatchCountByClassification.security)
                 + toint(properties.availablePatchCountByClassification.critical)
| extend state = iff(pending == 0, "Compliant", "NonCompliant")
| join kind=leftouter (
    resources
    | project machineId = tolower(id), env = tostring(tags["Environment"])
  ) on machineId
| summarize machines = count() by state, env
| order by env asc, state asc
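If the dashboard lives outside Azure, the same rollup can be computed from exported rows. The sketch below is a rough Python equivalent under an assumed row shape (env, critical, security), which is hypothetical and not the Resource Graph schema. It differs from a two-state rollup in one way: machines with no counts go into a separate Unknown bucket, so an unassessed server is never reported as compliant.

```python
from collections import Counter
from typing import Any, Dict, Iterable, Tuple


def rollup(rows: Iterable[Dict[str, Any]]) -> Dict[Tuple[str, str], int]:
    """Count machines per (environment, state)."""
    counts: Counter = Counter()
    for row in rows:
        critical, security = row.get("critical"), row.get("security")
        if critical is None or security is None:
            state = "Unknown"          # never assessed, or counts missing
        elif critical + security == 0:
            state = "Compliant"
        else:
            state = "NonCompliant"
        env = row.get("env") or "untagged"
        counts[(env, state)] += 1
    return dict(counts)
```

Keeping Unknown visible is a design choice. An auditor will ask about machines that never reported, and a two-state view hides them.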

Verify

Walk this end to end before you call it done:

Orchestration mode is correct. For each machine in scope, confirm patchMode == AutomaticByPlatform and bypassPlatformSafetyChecksOnUserSchedule == true. The fastest check is a Resource Graph query over osProfile patch settings.
Assessment returns data. Run az vm assess-patches (or trigger via Policy) and confirm rows appear in patchassessmentresources. No rows usually means the assessment never ran or the machine cannot reach its update source.
A one-time install actually patches. Run az vm install-patches on a canary, then re-assess and confirm the pending count drops. This proves the data plane before you trust the schedule.
The maintenance configuration resolves to the right machines. Run the dynamic-scope Resource Graph query and reconcile the count against expectation. A scope that resolves to zero machines is the number-one cause of “the schedule ran but nothing happened.”
A scheduled run completes. After the first window, inspect the maintenance run status (portal -> Update Manager -> History, or maintenanceresources in Resource Graph). Confirm install results and that reboots happened only where intended.
Pre/post events fired. Check the Function/Logic App invocation logs for the configuration’s Event Grid subscriptions around the window time.
Policy is enforcing, not just auditing. Confirm the DeployIfNotExists assignments have a managed identity, a non-zero remediation history, and that a freshly created test VM gets auto-enrolled.

Enterprise scenario

A healthcare ISV ran ~900 servers split across Azure (Windows + Linux app tiers) and two on-prem datacenters still hosting a regulated records system that legally could not move to the cloud yet. Their old world was Automation Update Management for the Azure VMs and a hand-maintained WSUS-plus-cron arrangement on-prem. When the legacy service hit end of support, two constraints collided: an external auditor required a single, queryable compliance view across the entire estate, and the on-prem records servers had a hard rule – no unscheduled reboots during clinical hours (06:00-22:00 local), ever.

They solved it with one targeting model. The on-prem servers were onboarded to Arc and tagged PatchGroup=ring2,Environment=prod at connect time, which dropped them straight into an existing dynamic scope – no bespoke onboarding. The reboot constraint was handled by splitting install from restart: the ring2 maintenance configuration used rebootSetting: Never, so AUM installed packages inside a late-evening window but never restarted anything itself. A post-maintenance Event Grid handler then queued required reboots and released them only after 22:00 via a controlled runbook, machine by machine, with health checks between. Finally, all compliance reporting – Azure and Arc alike – came from a single Resource Graph query feeding one workbook, which is exactly the artifact the auditor wanted. The reboot-suppressing slice of the config:

installPatches: {
  rebootSetting: 'Never'   // install in-window; reboots handled by post-event after clinical hours
  windowsParameters: {
    classificationsToInclude: [ 'Critical', 'Security' ]
  }
}
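The release logic in the post-event runbook can be sketched as a small gate. This is an illustration, not the team's actual runbook. The 06:00-22:00 clinical window comes from the scenario, but treating 22:00 itself as allowed is my assumption, and the function names and the reboot and is_healthy callbacks are hypothetical.

```python
from datetime import datetime, time
from typing import Callable, Iterable, List

CLINICAL_START = time(6, 0)
CLINICAL_END = time(22, 0)   # assumption: reboots are allowed from 22:00 sharp


def reboot_allowed(local_now: datetime) -> bool:
    """True only outside clinical hours (06:00 inclusive to 22:00 exclusive)."""
    return not (CLINICAL_START <= local_now.time() < CLINICAL_END)


def release_reboots(
    pending: Iterable[str],
    now: Callable[[], datetime],
    is_healthy: Callable[[str], bool],
    reboot: Callable[[str], None],
) -> List[str]:
    """Reboot queued machines one at a time, stopping on the first problem."""
    released: List[str] = []
    for machine in pending:
        if not reboot_allowed(now()):   # re-check the clock before every reboot
            break
        reboot(machine)
        released.append(machine)
        if not is_healthy(machine):     # halt the rollout on a failed health check
            break
    return released
```

Re-checking the clock before each machine matters. A long queue that starts at 23:00 must still stop cleanly if it runs toward 06:00.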

The lesson the team took away: Arc plus dynamic scopes collapsed three patch programs into one, and decoupling install from reboot turned a hard compliance constraint into a scheduling detail rather than a blocker.

Written by Vinod
