Azure Update Manager: Maintenance Configurations, Scheduled Patching, and Hybrid Coverage with Arc

Patching is where good intentions go to die. Every estate I have inherited had a patch “strategy” that was really three strategies – a Windows team on WSUS, a Linux team running unattended-upgrades on a cron, and a cloud team hoping the images were recent enough. Nobody could answer the only question that matters at audit time: which machines are missing which CVEs right now, and when will they be patched? Azure Update Manager (AUM) is Microsoft’s answer, and unlike its predecessor it needs no Log Analytics workspace, no Automation account, and no agent of its own – it is a native VM platform capability that also reaches off-Azure through Azure Arc. This is how to wire it up end to end: assessment, on-demand remediation, recurring maintenance configurations, tag-driven dynamic scopes, pre/post automation, hybrid coverage, hotpatching, and the Azure Policy that keeps it all honest.

Update Manager has two planes, and confusing them is the single most common reason a patch program stalls. The data plane assesses and installs updates on a single machine on demand – it is a button you click. The scheduling plane – maintenance configurations – is what turns one-off actions into a governed, recurring program that runs at 02:00 on the third Sunday whether or not anyone is awake. Most teams treat AUM as a button to click rather than a schedule to declare, and so they never escape the cycle of manual, panic-driven, audit-deadline patching. The schedule is the product. Everything in this article builds toward a maintenance configuration that targets the right machines, at the right time, with the right reboot behaviour, across Azure and everything you run outside it.

By the end you will be able to put a heterogeneous fleet – Windows and Linux, Azure and on-prem and other-cloud – under one targeting model, prove a patch landed before you trust a schedule, decouple install from reboot so a restart window never blocks a security fix, and produce the single queryable compliance view an auditor actually wants. You will also know the half-dozen silent failure modes (the wrong patchMode, a dynamic scope that resolves to zero machines, a window 10 minutes too short) that make a run look successful while patching nothing, and the exact az/Resource Graph command to confirm each.

What problem this solves

In production, “we patch monthly” is a sentence with no evidence behind it. The pain is concrete: an auditor asks for the current CVE exposure of 900 servers and you cannot produce it without three spreadsheets and a week. A critical zero-day drops and you have no fleet-wide mechanism to assess who is exposed and remediate in a bounded window. A latency-sensitive database reboots at 14:00 because someone’s cron fired, and you take an outage during clinical hours. Each of these is a governance failure dressed up as a tooling failure, and each is exactly what AUM exists to remove.

What breaks without it: patching becomes per-team, per-OS, and per-cloud, so there is no single answer to “are we compliant?” New machines are born unpatched because onboarding into the patch program is manual and gets forgotten. Reboots are uncoordinated because install and restart are welded together. And the legacy answer – Automation Update Management on a Log Analytics workspace plus the MMA/OMS agent – reached end of support on 31 August 2024, so anything still depending on it is running on a retired stack with no security backstop.

Who hits this: anyone operating more than a handful of VMs, and acutely anyone with a hybrid or multicloud estate, a regulated workload with an audit obligation, or a tier that legally or contractually cannot reboot during business hours. The fix is almost never “patch harder by hand” – it is “declare one schedule, target it by tag, let the platform install at run-time, and report from one query.”

To frame the whole field before the deep dive, here is every capability this article covers, the production pain it removes, and the AUM construct that delivers it:

Capability	Production pain without it	AUM construct that delivers it	First place to look
Fleet assessment	No fleet-wide CVE exposure answer	On-demand + periodic assessment	`patchassessmentresources` in Resource Graph
Out-of-band remediation	Zero-day with no bounded fix mechanism	One-time `install-patches` run	Update Manager -> History
Recurring program	“We patch monthly” with no evidence	Maintenance configuration (`InGuestPatch`)	`maintenanceresources`
Targeting at scale	New machines forgotten at onboarding	Dynamic scopes (ARG tag queries)	`resources` ARG query on tags
Orchestration hooks	Uncoordinated drain/snapshot/validate	Pre/post events via Event Grid	Function/Logic App invocation logs
Hybrid + multicloud	Separate patch stack off-cloud	Arc-enabled servers	`microsoft.hybridcompute/machines` resources
Reboot control	Outage during business hours	`rebootSetting` + post-event reboot	`installPatches.rebootSetting`
Born-compliant	Drift the moment a VM is created	Policy `DeployIfNotExists` enrolment	Policy compliance blade

Learning objectives

By the end of this article you can:

Explain the two planes of Update Manager (assessment/install vs scheduling) and migrate cleanly off the retired Automation Update Management without hand-translating schedules.
Set every in-scope machine to the correct patch orchestration mode (AutomaticByPlatform + bypassPlatformSafetyChecksOnUserSchedule = true) and prove it with a Resource Graph query before you trust any schedule.
Run on-demand assessment and one-time install runs with the right classification filters, maintenance window, and reboot policy – and read the results out of Resource Graph at fleet scale.
Author production-grade maintenance configurations in Bicep (InGuestPatch scope, window, cadence, reboot, OS-specific include/exclude) and bind machines via dynamic scopes on a governed tag vocabulary rather than static assignments.
Wire pre/post maintenance events to idempotent, fast Event Grid handlers (drain, snapshot, validate) and explain the bounded pre-window contract.
Extend the identical targeting model to Arc-enabled servers in your datacenter, AWS, or GCP, and reason about hotpatching’s baseline-vs-hotpatch reboot cadence.
Enforce prerequisites and report drift with Azure Policy at management-group scope, and build a single Azure-plus-Arc compliance view from Resource Graph.
Diagnose the silent failures – wrong patchMode, zero-resolving scope, window too short, unreachable update source, DeployIfNotExists with no managed identity – with the exact command to confirm each.

Prerequisites & where this fits

You should already understand Azure resource basics: subscriptions, resource groups, tags, and how to run az in Cloud Shell and read JSON output. You should know what an Azure VM is and that VMs carry an osProfile with OS-specific configuration. Familiarity with Azure Policy effects (Audit, DeployIfNotExists, Modify) and a passing knowledge of Kusto (KQL) for Resource Graph queries will let you use the reporting sections directly. No prior exposure to the legacy Automation Update Management is required – if anything it is baggage.

This sits in the Governance & Operations track. It assumes the platform foundation from Azure Policy: Governance at Scale (the enforcement engine AUM leans on) and the targeting fundamentals from Azure Resource Hierarchy Explained (management groups are where you assign the policies). It pairs tightly with Azure Arc-Enabled Servers: Machine Configuration & Extended Security Updates, because Arc is what makes the hybrid story work, and with Azure Monitor & Application Insights for Observability for surfacing compliance in workbooks. If you orchestrate pre/post events with serverless, Azure Functions: Serverless Patterns is the layer those handlers live in.

A quick map of who owns what during a patch program, so you route work to the right team:

Layer	What lives here	Who usually owns it	Failure classes it can cause
Policy / governance	Enrolment, assessment enforcement, drift reporting	Platform / governance team	Machines never enrolled; remediation silently no-ops
Maintenance configuration	Window, cadence, reboot, classifications	Platform / ops team	Window too short; wrong reboot setting
Targeting (dynamic scope)	Tag vocabulary, ring design	Platform + app owners	Scope resolves to zero; wrong ring patched
Machine settings	`patchMode`, `bypass`, assessment mode	VM / app team	Run skipped; platform auto-patches instead
Update source	WSUS, distro repos, egress	Network / Linux / Windows teams	Assessment empty; install fails
Orchestration hooks	Drain, snapshot, validate handlers	App / SRE team	Un-drained run; pre-event timeout

Core concepts

Six mental models make every later decision obvious.

Two planes, one product. The data plane – az vm assess-patches and az vm install-patches – acts on one machine, now. The scheduling plane – a maintenance configuration of scope InGuestPatch – declares when to patch, what classifications, and how to reboot, then machines are associated to it. You use the data plane to learn and to prove; you use the scheduling plane to operate. A patch program that only ever uses the data plane is a person clicking buttons forever.

The orchestration mode is the master switch. Every machine has a patch orchestration mode (patchMode). For a maintenance configuration to install anything, the machine must be AutomaticByPlatform (Azure-orchestrated) and carry bypassPlatformSafetyChecksOnUserSchedule = true so the platform does not also apply its own automatic patches on Microsoft’s cadence and collide with your window. Get this wrong and your run is silently skipped – the single most common “it ran but nothing happened.”

Targeting is a query, not a list. A dynamic scope attaches an Azure Resource Graph filter (over subscriptions, resource groups, locations, OS types, and – above all – tags) to a maintenance configuration. Membership is evaluated at run time, so a VM created an hour before the window, carrying the right tag, is patched with zero manual onboarding. Static assignments rot; dynamic scopes scale.

Arc makes off-Azure machines first-class. An Arc-enabled server – on-prem, in AWS, in GCP – is, to AUM, just another machine. It gets the same assessment, the same one-time runs, the same maintenance configurations and dynamic scopes. There is no per-machine charge for AUM on native Azure VMs; Arc-enabled servers carry a small per-server monthly charge. One targeting model spans every cloud.

Install and reboot are separable. rebootSetting has three values – IfRequired, Always, Never. Setting it to Never lets AUM install packages inside a window but restart nothing, so you can drive the reboot later through a controlled post-event – turning “no reboots during business hours” from a blocker into a scheduling detail. Hotpatching takes this further: on supported Windows Server SKUs, security updates apply without a reboot at all for two of every three months.

A window is a hard stop with a tax. A maintenance window has a duration (minimum 1 hour 30 minutes), and the platform reserves the last 10 minutes to finalize, so effective install time is duration - 10m. AUM stops starting new package installs once the window is exhausted; in-flight installs finish, but anything not yet started is deferred to the next window. Size the window for the slowest machine in the batch.
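
To make the window arithmetic concrete, here is a minimal Python sketch. It assumes the two figures stated above (a 01:30 minimum and a 10-minute reserved tail); the function names are illustrative and not part of any Azure SDK.

```python
from datetime import timedelta

# Figures taken from the paragraph above; treat them as assumptions and
# re-check them against current Azure documentation before relying on them.
MIN_WINDOW = timedelta(hours=1, minutes=30)
RESERVED_TAIL = timedelta(minutes=10)


def parse_hhmm(duration: str) -> timedelta:
    """Parse an 'HH:mm' window duration such as '03:00'."""
    hours, minutes = duration.split(":")
    return timedelta(hours=int(hours), minutes=int(minutes))


def effective_install_time(duration: str) -> timedelta:
    """Return the part of the window in which new installs can still start."""
    window = parse_hhmm(duration)
    if window < MIN_WINDOW:
        raise ValueError(f"window {duration} is shorter than the 01:30 minimum")
    return window - RESERVED_TAIL


def fits(duration: str, slowest_machine_minutes: int) -> bool:
    """True if the slowest machine's estimated install time fits the window."""
    return timedelta(minutes=slowest_machine_minutes) <= effective_install_time(duration)
```

With a 03:00 window, `effective_install_time("03:00")` is 2 hours 50 minutes, so a machine estimated at 175 minutes would not fit and its remaining updates would slip to the next window.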

The vocabulary in one table

Before the deep sections, pin down every moving part. The glossary at the end repeats these for lookup; this table is the mental model side by side:

Concept	One-line definition	Where it lives	Why it matters
Assessment	Read-only scan of missing updates	Per machine; results in Resource Graph	Source of CVE exposure; never installs
One-time deployment	Ad-hoc install run	`az vm install-patches`	Out-of-band remediation, proving a patch lands
Maintenance configuration	The recurring schedule resource	`Microsoft.Maintenance/maintenanceConfigurations`	The product – when/what/how to patch
`maintenanceScope`	What the config governs	Config property	Must be `InGuestPatch` for guest OS patching
`patchMode`	Orchestration mode of the machine	`osProfile` patch settings	Must be `AutomaticByPlatform` or run is skipped
`bypassPlatformSafetyChecks…`	Suppress platform auto-patch	`osProfile` patch settings	Must be `true` so your schedule owns patching
Dynamic scope	ARG filter binding machines to a config	Configuration assignment	Membership by tag; scales onboarding to zero
Configuration assignment	The binding of a machine/scope to a config	`Microsoft.Maintenance/configurationAssignments`	Static (one machine) or dynamic (a query)
Pre/post event	Hook fired before/after the window	Event Grid on the config	Drain, snapshot, validate, controlled reboot
Arc-enabled server	Off-Azure machine projected into Azure	`Microsoft.HybridCompute/machines`	Same patch model off-cloud; billed per server
Hotpatching	Reboot-less OS security updates	OS profile (WS Azure Ed / WS 2025)	4 reboots/yr instead of 12
`rebootSetting`	Reboot behaviour of a run	`installPatches`	Decouple install from restart
Classification	Update category to include	`windowsParameters`/`linuxParameters`	Critical/Security vs everything
Ring	A wave of the fleet with its own window	Tag value (`PatchGroup`)	Canary -> broad -> sensitive sequencing

Migrating off Automation Update Management

The legacy Automation Update Management (under an Automation account, backed by a Log Analytics workspace) is retired – it reached end of support on 31 August 2024, and the MMA/OMS agent it depended on retired the same month. If any of your patch program still runs on it, migration is overdue, not optional. The two services differ in ways that matter operationally, and the table below is the translation map:

Concern	Automation Update Management (legacy)	Azure Update Manager	Migration action
Dependencies	Log Analytics workspace + Automation account	None – native to VM/Arc platform	Decommission workspace dependency for patching
Agent	Log Analytics agent (MMA/OMS)	No separate agent; VM/Arc extension framework	Remove MMA/OMS after migration
Scheduling	Automation schedules + Update Deployments	Maintenance configurations (`InGuestPatch`)	Recreate via the portal migration tool
Targeting	Saved searches / computer groups	Dynamic scopes (ARG on tags/sub/RG)	Re-express groups as tag filters
Off-Azure	Hybrid Runbook Worker	Arc-enabled servers	Onboard servers to Arc
Reporting	Log Analytics queries	Resource Graph (`patchassessmentresources`)	Rebuild queries/workbooks on ARG
Pre/post tasks	Pre/post scripts in the deployment	Event Grid pre/post events	Re-wire to Functions/Logic Apps
Cost model	Log Analytics ingestion + Automation	Free on Azure VMs; per-server on Arc	Re-baseline the bill

Use Microsoft’s portal migration experience and the supplied runbooks that recreate legacy schedules as maintenance configurations – do not hand-translate. The dynamic-scope mapping (turning saved searches into tag filters) is precisely the part teams get wrong by hand, and the tool reduces the error surface. The migration sequence that avoids a coverage gap:

#	Migration step	Why this order	Verify
1	Inventory legacy schedules + groups	Know the target state before you move	Export Update Deployments list
2	Register AUM resource providers	Prerequisite for any AUM action	`az provider show` = Registered
3	Set `patchMode`/`bypass` on machines	Without it the new schedule no-ops	ARG over `osProfile` patch settings
4	Run the portal migration tool	Recreates schedules as maint. configs	Configs exist in `Microsoft.Maintenance`
5	Validate dynamic scopes resolve	Catch tag-mapping errors pre-cutover	ARG count matches legacy group size
6	Run a canary window on one ring	Prove the new path before fleet cutover	`maintenanceresources` shows installs
7	Disable legacy Update Deployments	Avoid double-patching during overlap	Legacy schedules disabled
8	Remove MMA/OMS agent	Retired dependency, attack surface	Agent absent; assessment still returns

# Providers AUM and its scheduling/policy surface depend on
az provider register --namespace Microsoft.Maintenance
az provider register --namespace Microsoft.Compute
az provider register --namespace Microsoft.PolicyInsights
az provider register --namespace Microsoft.HybridCompute   # Arc-enabled servers

# Confirm they are Registered before doing anything else
az provider show --namespace Microsoft.Maintenance --query registrationState -o tsv

Patch orchestration mode: the master switch

Update Manager itself requires no enablement resource, but it does require that each machine’s update settings allow the platform to orchestrate. The property that controls this is patchMode on the OS profile, and it is the master switch behind every scheduled patch. For Windows the path is osProfile.windowsConfiguration.patchSettings.*; for Linux it is osProfile.linuxConfiguration.patchSettings.*. Here is every value and what it means operationally:

`patchMode` value	OS	Who patches	Works with maintenance config?	When to use
`AutomaticByPlatform`	Win + Linux	The platform, on your schedule (with bypass)	Yes – required for AUM scheduling	Any machine you want AUM to schedule
`AutomaticByOS`	Windows	Windows Update automatically	No (the OS owns timing)	Standalone auto-patch, no governance
`Manual`	Windows	You, by hand / your own tooling	No	Fully manual control
`ImageDefault`	Linux	The image’s default (e.g. unattended-upgrades)	No	Legacy/cron patching, not AUM

The crucial pairing is AutomaticByPlatform plus bypassPlatformSafetyChecksOnUserSchedule = true. The assessmentMode property is separate and controls scanning: set it to AutomaticByPlatform for continuous periodic assessment, or ImageDefault for on-demand only. The full setting matrix you must get right per machine:

Setting	Values	Default	What it controls	Set it to	Gotcha if wrong
`patchMode`	`AutomaticByPlatform`, `AutomaticByOS`, `Manual`, `ImageDefault`	varies by image	Who installs updates and when	`AutomaticByPlatform`	Any other value -> schedule installs nothing
`bypassPlatformSafetyChecksOnUserSchedule`	`true` / `false`	`false`	Suppress platform auto-patch so your schedule owns it	`true`	`false` -> platform patches on its own cadence, collides
`assessmentMode`	`AutomaticByPlatform`, `ImageDefault`	`ImageDefault`	Continuous vs on-demand scanning	`AutomaticByPlatform`	`ImageDefault` -> stale exposure data, no periodic scan
`provisionVMAgent` (Win)	`true` / `false`	`true`	VM agent present (prerequisite)	`true`	`false` -> no extension framework, no AUM
`enableAutomaticUpdates` (Win)	`true` / `false`	`true`	Windows Update service enabled	`true` (with platform mode)	`false` -> WU disabled, install can fail

Set the orchestration mode explicitly on an existing Linux VM, then hand control to your schedule:

# Put an existing Linux VM into customer-managed scheduled patching
az vm update \
  --resource-group rg-fleet-prod \
  --name vm-app-01 \
  --set osProfile.linuxConfiguration.patchSettings.patchMode=AutomaticByPlatform \
  --set osProfile.linuxConfiguration.patchSettings.assessmentMode=AutomaticByPlatform

# Hand control to YOUR maintenance schedule (suppress platform auto-patching)
az vm update \
  --resource-group rg-fleet-prod \
  --name vm-app-01 \
  --set osProfile.linuxConfiguration.patchSettings.automaticByPlatformSettings.bypassPlatformSafetyChecksOnUserSchedule=true

For Windows, swap to osProfile.windowsConfiguration.patchSettings.patchMode=AutomaticByPlatform. In Bicep, bake it into the VM definition so machines are born correct rather than reconciled later:

// VM born with platform orchestration + bypass so a maintenance config can drive it
properties: {
  osProfile: {
    linuxConfiguration: {
      patchSettings: {
        patchMode: 'AutomaticByPlatform'
        assessmentMode: 'AutomaticByPlatform'
        // bypass lives under automaticByPlatformSettings on some API versions:
        automaticByPlatformSettings: {
          bypassPlatformSafetyChecksOnUserSchedule: true
        }
      }
    }
  }
}

Prove the whole fleet is set correctly with one Resource Graph query – never trust a schedule until this returns clean:

// Machines NOT correctly configured for AUM scheduling (the silent-failure hunt)
resources
| where type in~ ("microsoft.compute/virtualmachines", "microsoft.hybridcompute/machines")
| extend ps = properties.osProfile.linuxConfiguration.patchSettings
| extend pw = properties.osProfile.windowsConfiguration.patchSettings
| extend mode = tostring(coalesce(ps.patchMode, pw.patchMode))
| extend bypass = tobool(coalesce(
    ps.automaticByPlatformSettings.bypassPlatformSafetyChecksOnUserSchedule,
    pw.automaticByPlatformSettings.bypassPlatformSafetyChecksOnUserSchedule))
| where mode != "AutomaticByPlatform" or bypass != true
| project name, type, mode, bypass, resourceGroup
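
The same predicate can be applied to a single machine in a pipeline gate. The sketch below is a hedged Python equivalent of the query above. It assumes the input is the JSON that `az vm show -o json` prints, with `osProfile` at the top level; `scheduling_problems` is a hypothetical helper name, not an Azure API.

```python
from typing import Any


def patch_settings(vm: dict[str, Any]) -> dict[str, Any]:
    """Return the Linux or Windows patchSettings block, or {} if absent."""
    os_profile = vm.get("osProfile") or {}
    for key in ("linuxConfiguration", "windowsConfiguration"):
        # The configuration for the other OS is null, so guard with `or {}`.
        settings = (os_profile.get(key) or {}).get("patchSettings")
        if settings:
            return settings
    return {}


def scheduling_problems(vm: dict[str, Any]) -> list[str]:
    """List the reasons a maintenance configuration would skip this machine."""
    settings = patch_settings(vm)
    problems = []
    mode = settings.get("patchMode")
    if mode != "AutomaticByPlatform":
        problems.append(f"patchMode is {mode!r}, expected 'AutomaticByPlatform'")
    platform = settings.get("automaticByPlatformSettings") or {}
    if platform.get("bypassPlatformSafetyChecksOnUserSchedule") is not True:
        problems.append("bypassPlatformSafetyChecksOnUserSchedule is not true")
    return problems
```

An empty list means the machine is ready for a schedule; anything else is a reason to fix the machine before the window, not after.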

On-demand assessment and one-time deployments

Before scheduling anything, learn what the fleet actually needs. Assessment is read-only: it queries each machine’s update source (Windows Update / WSUS for Windows; the distro package manager for Linux) and reports missing updates by classification and KB/package. It installs nothing. Run it on one machine:

# One-off assessment of a single machine (results land in Update Manager)
az vm assess-patches \
  --resource-group rg-fleet-prod \
  --name vm-app-01

Drive it at fleet scale and read the results out of Azure Resource Graph, the only sane way to query patch state across hundreds of machines. The patchassessmentresources table holds both the per-machine summary and a child record for each individual patch:

// Machines with pending CRITICAL or SECURITY updates, Azure VMs and Arc together
patchassessmentresources
| where type =~ "microsoft.compute/virtualmachines/patchassessmentresults/softwarepatches"
   or type =~ "microsoft.hybridcompute/machines/patchassessmentresults/softwarepatches"
| where properties.classifications has_any ("Critical", "Security")
| where properties.patchState =~ "Available"
| extend machine = tostring(split(id, "/")[8])
| summarize pendingUpdates = count() by machine, tostring(properties.classifications)
| order by pendingUpdates desc

When you need to remediate now – an out-of-band CVE – run a one-time deployment (an install run). Filter by classification, give it an explicit maximum duration (an ISO 8601 duration such as PT120M), and choose a reboot policy:

# Install only Critical + Security updates, 120-minute window, reboot only if required
az vm install-patches \
  --resource-group rg-fleet-prod \
  --name vm-app-01 \
  --maximum-duration PT120M \
  --reboot-setting IfRequired \
  --classifications-to-include-linux Critical Security

For Windows, swap to --classifications-to-include-win and you can pin or block specific KBs. Every classification value, per OS, and when to include it:

Classification	OS	What it covers	Include in scheduled runs?
`Critical`	Win + Linux	Critical-severity fixes	Always
`Security`	Win + Linux	Security updates	Always
`UpdateRollUp`	Windows	Cumulative roll-ups	Usually
`FeaturePack`	Windows	New feature packages	Rarely – test first
`ServicePack`	Windows	Service packs	Rarely – test first
`Definition`	Windows	AV/defender definitions	Often (fast-moving)
`Tools`	Windows	Utilities	Optional
`Updates`	Windows	Non-security updates	Optional
`Other`	Linux	Distro “other” bucket	Optional

The az vm install-patches flags you will actually use, with their values and effect:

Flag	Values	Default	Effect	Gotcha
`--maximum-duration`	ISO 8601 e.g. `PT120M`	required	Hard stop on starting new installs	Size for the slowest machine; last bit is reserved
`--reboot-setting`	`IfRequired`, `Always`, `Never`	`IfRequired`	Reboot behaviour	`Never` decouples install from restart
`--classifications-to-include-win`	Windows classifications	none	What to install (Windows)	Empty = nothing installs
`--classifications-to-include-linux`	Linux classifications	none	What to install (Linux)	Empty = nothing installs
`--kb-numbers-to-include`	KB list	none	Pin specific KBs (Windows)	Overrides classification filter union semantics
`--kb-numbers-to-exclude`	KB list	none	Block known-bad KBs (Windows)	Exclusions take precedence even when a classification would include the KB
`--packages-to-include`	package names	none	Pin specific packages (Linux)	Distro-specific naming
`--packages-to-exclude`	package names	none	Block packages (Linux)	Use to hold back a problematic package

The --reboot-setting values decoded – this is the lever the whole reboot-control story hinges on:

`--reboot-setting`	Behaviour	Use when
`IfRequired`	Reboot only if an installed update needs it	Default; balances currency and disruption
`Always`	Reboot after the run regardless	Force a clean state; maintenance windows that expect it
`Never`	Install, never restart	Restart-sensitive tiers; reboot driven by post-event later

After any install run, re-assess and confirm the pending count dropped – proving the data plane works before you trust a schedule. The result statuses you will see and what each means:

Run status	Meaning	Likely cause	Next step
`Succeeded`	All in-scope updates installed	–	Re-assess to confirm zero pending
`CompletedWithWarnings`	Some updates failed / pending reboot	A KB failed, or window cut it short	Inspect per-update detail; re-run
`Failed`	Run could not complete	Update source unreachable, agent issue	Check egress + agent; see playbook
`InProgress`	Still installing	Long window, large batch	Wait; do not start a second run
`NotStarted` / skipped	Run never began on the machine	`patchMode` wrong, machine off	Fix orchestration mode; power on

Maintenance configurations, schedules, and reboot settings

This is the heart of AUM. A maintenance configuration of scope InGuestPatch is a first-class Azure resource that declares when to patch, what classifications to include, and how to handle reboots. Machines are then associated to it – statically or, far better, via dynamic scopes (next section). Every field that matters, its format, default, and the trap in each:

Field	Format / values	Default	What it controls	Trap
`maintenanceScope`	`InGuestPatch` (for OS patching)	–	What the config governs	Wrong scope = not a guest-patch schedule
`extensionProperties.InGuestPatchMode`	`User` / `Platform`	–	Treats config as a user (AUM) schedule	Omit -> config ignored by AUM
`maintenanceWindow.startDateTime`	`YYYY-MM-DD HH:mm`	required	First window start	Local vs UTC confusion
`maintenanceWindow.duration`	`HH:mm`, min `01:30`	–	Window length	Last 10 min reserved; effective = duration - 10m
`maintenanceWindow.timeZone`	Windows time zone name (e.g. `UTC`, `Pacific Standard Time`)	UTC	Window time zone	DST shifts the wall-clock window
`maintenanceWindow.recurEvery`	`1Day`, `1Week`, `Month Third Sunday`	–	Cadence	Monthly expression syntax is exact
`installPatches.rebootSetting`	`IfRequired`, `Always`, `Never`	`IfRequired`	Reboot behaviour	`Never` needs a post-event to ever reboot
`installPatches.windowsParameters`	classifications + KB include/exclude	–	Windows patch selection	Empty classifications = nothing installs
`installPatches.linuxParameters`	classifications + package include/exclude	–	Linux patch selection	Distro package naming

The recurEvery cadence expressions, with concrete examples:

Cadence intent	`recurEvery` expression	Notes
Every day	`1Day`	Aggressive; rings/canaries
Every week	`1Week`	Common for non-prod
Every N days	`7Days`, `14Days`	Numeric multiplier
Monthly, nth weekday	`Month Third Sunday`	“Patch Tuesday + a week” pattern
Monthly, last weekday	`Month Last Saturday`	End-of-month window
Monthly, specific day	`Month day23`	Calendar-day cadence
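
A quick way to sanity-check that a startDateTime lands on the day a monthly expression names is to resolve the expression yourself. The Python sketch below handles only the `Month <Ordinal> <Weekday>` form from the table. It is an illustration of the calendar arithmetic, not the service's parser, and `monthly_occurrence` is a hypothetical name.

```python
import calendar
from datetime import date

ORDINALS = {"First": 0, "Second": 1, "Third": 2, "Fourth": 3, "Last": -1}
# Ordered so the list index matches date.weekday() (Monday == 0).
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def monthly_occurrence(recur_every: str, year: int, month: int) -> date:
    """Resolve 'Month <Ordinal> <Weekday>' to a calendar date in one month."""
    parts = recur_every.split()
    if len(parts) != 3 or parts[0] != "Month":
        raise ValueError(f"unsupported expression: {recur_every!r}")
    _, ordinal, weekday = parts
    if ordinal not in ORDINALS or weekday not in WEEKDAYS:
        raise ValueError(f"unsupported expression: {recur_every!r}")
    target = WEEKDAYS.index(weekday)
    days_in_month = calendar.monthrange(year, month)[1]
    # Collect every date in the month that falls on the target weekday.
    matches = [
        date(year, month, day)
        for day in range(1, days_in_month + 1)
        if date(year, month, day).weekday() == target
    ]
    return matches[ORDINALS[ordinal]]
```

For June 2026, `monthly_occurrence("Month Third Sunday", 2026, 6)` resolves to 2026-06-21, which is the start date used in this article's monthly Windows configuration.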

Here is a production-grade monthly Windows configuration in Bicep – patching on the third Sunday at 02:00 UTC with a 3-hour window, blocking a known-bad KB:

resource patchMonthly 'Microsoft.Maintenance/maintenanceConfigurations@2023-10-01-preview' = {
  name: 'mc-win-prod-monthly'
  location: 'eastus2'
  properties: {
    maintenanceScope: 'InGuestPatch'
    extensionProperties: {
      // Required so the config is treated as a guest-patch (AUM) schedule
      InGuestPatchMode: 'User'
    }
    maintenanceWindow: {
      startDateTime: '2026-06-21 02:00'
      duration: '03:00'
      timeZone: 'UTC'
      recurEvery: 'Month Third Sunday'
    }
    installPatches: {
      rebootSetting: 'IfRequired'
      windowsParameters: {
        classificationsToInclude: [
          'Critical'
          'Security'
          'UpdateRollUp'
        ]
        kbNumbersToExclude: [
          'KB5099999'   // block a known-bad KB until validated
        ]
      }
    }
  }
}

Deploy it and you have a schedule with no machines yet – intentionally. Never staple machines into a config by hand at scale; declare the intent (what it patches, when) and let dynamic scopes decide membership. To associate a single machine explicitly when you must, use a configuration assignment:

az maintenance assignment create \
  --resource-group rg-fleet-prod \
  --resource-name vm-app-01 \
  --resource-type virtualMachines \
  --provider-name Microsoft.Compute \
  --configuration-assignment-name assign-vm-app-01 \
  --maintenance-configuration-id "/subscriptions/<sub>/resourceGroups/rg-fleet-prod/providers/Microsoft.Maintenance/maintenanceConfigurations/mc-win-prod-monthly"

Static assignment versus dynamic scope – when to reach for each:

Dimension	Static assignment	Dynamic scope
Membership	One named machine	ARG query (tags/sub/RG/OS)
New-machine onboarding	Manual, per machine	Automatic on next window
Scale	Tens of machines	Hundreds to thousands
Drift risk	High – forgotten machines	Low – query-driven
Reproducibility	Per-resource assignment	One scope in IaC
When to use	A one-off exception	The fleet, every ring

The most common silent failure: the VM’s patchMode is not AutomaticByPlatform, or bypassPlatformSafetyChecksOnUserSchedule is false. The maintenance run is skipped with a status that looks benign. Reconcile machine settings to the schedule before you trust the schedule – run the ARG hunt from the previous section.

Dynamic scopes and tag-based targeting at scale

A dynamic scope attaches an Azure Resource Graph filter to a maintenance configuration. Membership is evaluated at run time, so a newly created VM that carries the right tag is patched on the next window with zero manual onboarding. This is the difference between a patch program that scales and one that rots. Filters are expressed over several dimensions; the one you will lean on is tags:

Filter dimension	Example value	Operator semantics	Notes
Subscriptions	`/subscriptions/<id>`	In-list	Scope across multiple subs
Resource groups	`rg-fleet-prod`	In-list	Narrow to an RG
Resource types	`microsoft.compute/virtualmachines`	In-list	VMs vs Arc machines
Locations	`eastus2`, `centralus`	In-list	Region-bounded windows
OS types	`Windows`, `Linux`	In-list	Per-OS configs
Tags	`PatchGroup=ring1`	`All` (AND) or `Any` (OR)	The primary targeting axis

Define a small, governed tag vocabulary up front and bind one configuration per ring. The recommended vocabulary:

Tag	Values	Purpose	Audit note
`PatchGroup`	`ring0`, `ring1`, `ring2`, `exempt`	Wave / ring membership	`exempt` routes to no config – audited separately
`Environment`	`prod`, `nonprod`, `dev`	Environment dimension	Combine with ring for prod-only windows
`OSFamily`	`windows`, `linux`	Optional OS hint	Redundant with OS-type filter but readable
`Owner`	team alias	Accountability	Who to call when a ring fails
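
A governed vocabulary is only useful if something checks it. The Python sketch below audits a list of machines against the PatchGroup values in the table. It assumes each machine is a dict with `name` and `tags` keys (for example, rows exported from a Resource Graph query), matches tag names case-insensitively and tag values exactly; `audit_patch_group` is a hypothetical helper.

```python
from collections import defaultdict

# Allowed values from the vocabulary table above.
ALLOWED_PATCH_GROUPS = {"ring0", "ring1", "ring2", "exempt"}


def audit_patch_group(machines: list[dict]) -> dict[str, list[str]]:
    """Group machine names by PatchGroup finding: missing, unknown, exempt."""
    findings = defaultdict(list)
    for machine in machines:
        # Normalise tag names to lower case so 'patchgroup' and 'PatchGroup'
        # are treated as the same tag.
        tags = {k.lower(): v for k, v in (machine.get("tags") or {}).items()}
        value = tags.get("patchgroup")
        if value is None:
            findings["missing"].append(machine["name"])
        elif value not in ALLOWED_PATCH_GROUPS:
            findings["unknown"].append(machine["name"])
        elif value == "exempt":
            findings["exempt"].append(machine["name"])
    return dict(findings)
```

Machines reported as `missing` or `unknown` will not be picked up by any ring's scope, and the `exempt` list is the one to review separately at audit time.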

Attach a dynamic scope binding all machines tagged PatchGroup=ring1 in chosen subscriptions and regions:

# Attach a dynamic scope: all machines tagged PatchGroup=ring1
az maintenance assignment create-or-update-subscription \
  --maintenance-configuration-id "/subscriptions/<sub>/resourceGroups/rg-fleet-prod/providers/Microsoft.Maintenance/maintenanceConfigurations/mc-win-prod-monthly" \
  --configuration-assignment-name "scope-ring1" \
  --filter-tags '{"PatchGroup":["ring1"]}' \
  --filter-tags-operator "All" \
  --filter-os-types "Windows" \
  --filter-locations "eastus2" "centralus"

The CLI surface for dynamic scopes has churned across versions; many teams declare scopes in Bicep/ARM alongside the configuration so they are reproducible. The decision that matters is not syntax, it is ring design. The reference ring model:

Ring	Tag	Membership	Window timing	Reboot posture	Risk tolerance
`ring0` (canary)	`PatchGroup=ring0`	Build agents, a few non-critical app servers	Earliest (e.g. 1st Sat)	`IfRequired`	Highest – catch bad patches here
`ring1` (broad)	`PatchGroup=ring1`	The bulk of the fleet	A few days after ring0	`IfRequired`	Medium
`ring2` (sensitive)	`PatchGroup=ring2`	Latency/availability-critical tier	Last; off-hours	`Never` + post-event reboot	Lowest
`exempt`	`PatchGroup=exempt`	Deliberate, time-boxed exceptions	None	n/a	Audited separately

Validate that a scope resolves to the machines you expect before the window fires, using a Resource Graph query that mirrors the scope's filters (add the location and OS-type predicates if your scope uses them) – a scope that resolves to zero is the number-one cause of “the schedule ran but nothing happened”:

resources
| where type in~ ("microsoft.compute/virtualmachines", "microsoft.hybridcompute/machines")
| where tags["PatchGroup"] == "ring1"   // case-sensitive on purpose, so a Ring1 typo shows up as a missing row
| project name, type, location, resourceGroup, subscriptionId

A scope-design decision table – match the symptom to the cause:

If you see…	It’s probably…	Do this
Scope resolves to 0 machines	Tag typo or wrong sub/region in the filter	Run the ARG query manually; fix the filter
A machine patched by the wrong ring	Two configs both match its tags	Make tags mutually exclusive; one ring per machine
New VM not patched on first window	Tag applied after the window evaluated, or missing	Tag at create time (IaC/Policy), confirm before window
`exempt` machine still patched	A broad scope (e.g. by RG) overrides the tag	Exclude `exempt` explicitly or avoid RG-wide scopes
Arc machine not in scope	Resource type filter excludes hybridcompute	Include `microsoft.hybridcompute/machines`
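Two rows of that table (a scope that resolves to zero, and a machine matched by two configs) can be caught mechanically before the window. The sketch below assumes you have already run each scope's validation query and collected the machine ids per configuration; `audit_scopes` and the sample ids are hypothetical names, not part of any Azure SDK.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def audit_scopes(scopes: Dict[str, Iterable[str]],
                 expected_min: int = 1) -> Dict[str, List[str]]:
    """Flag empty scopes and machines claimed by more than one configuration.

    `scopes` maps a configuration name to the machine ids its validation
    query returned. Ids are lowercased because resource ids are not
    case-stable across APIs.
    """
    members = {name: {m.lower() for m in ids} for name, ids in scopes.items()}
    empty = sorted(name for name, ids in members.items() if len(ids) < expected_min)

    owners = defaultdict(list)
    for name, ids in members.items():
        for machine in ids:
            owners[machine].append(name)
    overlapping = sorted(m for m, names in owners.items() if len(names) > 1)

    return {"empty": empty, "overlapping": overlapping}


report = audit_scopes({
    "mc-ring0": ["/vm/build-01"],
    "mc-ring1": ["/vm/app-01", "/vm/App-02"],
    "mc-ring2": [],                      # tag typo: resolves to nothing
    "mc-rg-wide": ["/vm/app-02"],        # RG-wide scope overlapping ring1
})
print(report)  # {'empty': ['mc-ring2'], 'overlapping': ['/vm/app-02']}
```

Run something like this as a pre-window gate and refuse to proceed while either list is non-empty.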

Pre and post maintenance events with automation hooks

Patching is rarely just patching. You drain a load balancer first, you quiesce a database, you snapshot a disk, you re-run smoke tests after. AUM exposes this through pre and post maintenance events delivered via Event Grid on the maintenance configuration. A pre-event fires before the window starts; a post-event after it completes. Subscribe an Azure Function, Logic App, Automation runbook, or webhook and you have orchestration hooks without bolting on a separate scheduler. The two event types and their contract:

Event type	Fires	Bounded by	Run proceeds when	Use it for
`Microsoft.Maintenance.PreMaintenanceEvent`	Before the window	~20-minute pre-window	Handler completes or times out	Drain node, snapshot, quiesce DB
`Microsoft.Maintenance.PostMaintenanceEvent`	After the window	(post-completion)	–	Validate health, queue/release reboots

The handler options, and when each fits:

Endpoint type	Latency	State / orchestration	Best for
Azure Function	Low	Stateless (or Durable for long flows)	Fast drain/snapshot triggers, validation
Logic App	Medium	Visual, connectors, stateful	Multi-step approvals, ticketing integration
Automation runbook	Medium	PowerShell/Python, hybrid worker	Existing runbook estates, on-prem actions
Webhook	Low	Whatever you build	Custom orchestrators, ChatOps

Wire a pre-event to a Function that cordons machines and a post-event that validates health:

# Pre-maintenance event -> Function App endpoint
az eventgrid event-subscription create \
  --name pre-maint-drain \
  --source-resource-id "/subscriptions/<sub>/resourceGroups/rg-fleet-prod/providers/Microsoft.Maintenance/maintenanceConfigurations/mc-win-prod-monthly" \
  --endpoint-type azurefunction \
  --endpoint "/subscriptions/<sub>/resourceGroups/rg-ops/providers/Microsoft.Web/sites/fn-patch-orchestrator/functions/PreMaintenanceDrain" \
  --included-event-types Microsoft.Maintenance.PreMaintenanceEvent

# Post-maintenance event -> validation Function
az eventgrid event-subscription create \
  --name post-maint-validate \
  --source-resource-id "/subscriptions/<sub>/resourceGroups/rg-fleet-prod/providers/Microsoft.Maintenance/maintenanceConfigurations/mc-win-prod-monthly" \
  --endpoint-type azurefunction \
  --endpoint "/subscriptions/<sub>/resourceGroups/rg-ops/providers/Microsoft.Web/sites/fn-patch-orchestrator/functions/PostMaintenanceValidate" \
  --included-event-types Microsoft.Maintenance.PostMaintenanceEvent

The contract to internalize: the pre-event handler runs inside a bounded pre-window (on the order of ~20 minutes) and the maintenance run proceeds when it completes or times out. Keep handlers idempotent and fast – this is the wrong place for a 30-minute backup. Use it to call the operation (kick off a snapshot, drain a node) and let the long-running work happen asynchronously, with the post-event reconciling state. Good versus bad handler patterns:

Handler concern	Do	Don’t	Why
Duration	Trigger async work, return fast	Run a 30-min backup inline	Pre-window is ~20 min; you will time out
Idempotency	Make re-runs safe (no double-drain)	Assume exactly-once delivery	Event Grid can redeliver
Failure handling	Fail closed for safety-critical drains	Swallow errors silently	A silent failure patches an un-drained node
Long work	Snapshot kicked off, reconciled in post	Block the pre-event on completion	Decouple trigger from completion
Reboot control	Queue reboots in post, release in window	Reboot mid-business-hours	Post-event is where controlled reboot lives
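The "trigger async, stay idempotent, fail closed" rows above can be sketched in a few lines. This is an in-memory illustration, not a production Function: the class `PreEventHandler`, the in-process set and queue (stand-ins for a durable dedupe store and a real work queue), and the use of the event's `subject` field as the thing to drain are all assumptions. Only the `id`, `eventType`, and `subject` envelope fields of the Event Grid event schema are relied on.

```python
import queue
from typing import Dict, Set

PRE_EVENT = "Microsoft.Maintenance.PreMaintenanceEvent"


class PreEventHandler:
    """Accept a pre-maintenance event, enqueue the drain, return immediately."""

    def __init__(self) -> None:
        self.seen_ids: Set[str] = set()                 # stand-in for a durable dedupe store
        self.work: "queue.Queue[str]" = queue.Queue()   # stand-in for a real work queue

    def handle(self, event: Dict) -> str:
        if event.get("eventType") != PRE_EVENT:
            return "ignored"
        event_id = event["id"]
        if event_id in self.seen_ids:
            return "duplicate"               # redelivery: do not drain twice
        # Trigger only; a separate worker does the slow part. If the enqueue
        # raises, the exception propagates and the handler fails loudly.
        self.work.put(event["subject"])
        self.seen_ids.add(event_id)          # mark seen only after the enqueue succeeded
        return "accepted"


handler = PreEventHandler()
evt = {"id": "evt-001", "eventType": PRE_EVENT, "subject": "mc-win-prod-monthly"}
print(handler.handle(evt))    # accepted
print(handler.handle(evt))    # duplicate
print(handler.work.qsize())   # 1
```

Marking the event as seen only after the enqueue succeeds is the design choice that matters: a failed trigger can be retried on redelivery instead of being silently swallowed.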

Hybrid and multicloud patching via Azure Arc-enabled servers

Update Manager’s real leverage is that an Arc-enabled server is, to AUM, just another machine. Onboard a server in your datacenter, in AWS, or in GCP, and it gets the same assessment, the same one-time deployments, the same maintenance configurations and dynamic scopes. There is no per-machine charge for Update Manager on native Azure VMs; for Arc-enabled servers, Update Manager is billed (a small per-server monthly charge) – budget for it, but it is far cheaper than running a parallel patch stack off-cloud. Native VM versus Arc server, feature by feature:

Aspect	Native Azure VM	Arc-enabled server
AUM charge	Free	Small per-server / month
Agent	VM agent (built-in)	Connected Machine agent (`azcmagent`)
`patchMode` set via	`az vm update` / VM `osProfile`	`az connectedmachine update` / Policy
Update source	Windows Update / distro repo	WSUS / internal repo / distro repo
Egress requirement	Platform-managed	Outbound 443 to Arc + update endpoints
Dynamic scope membership	By tag/sub/RG/OS	Identical – tag at connect time
Hotpatching	WS Azure Ed / WS 2025	WS 2025 (Arc) under subscription

Connect a Linux server to Arc, tagging it at connect time so an existing scope picks it up:

# On the target server, with the Connected Machine agent already installed (this connects and tags; it does not install)
sudo azcmagent connect \
  --resource-group "rg-arc-servers" \
  --tenant-id "<tenant-id>" \
  --location "eastus2" \
  --subscription-id "<sub>" \
  --cloud "AzureCloud" \
  --tags "PatchGroup=ring1,Environment=prod"

Because you tagged the machine on connect, your existing ring1 dynamic scope picks it up automatically – no separate onboarding into AUM. That is the whole point: one targeting model spanning Azure, on-prem, and other clouds. The connectivity requirements that bite hybrid fleets, and how to satisfy each:

Requirement	Why	How to satisfy	Confirm
Outbound 443 to Arc endpoints	Agent heartbeat, config pull	Allow `*.his.arc.azure.com`, `*.guestconfiguration.azure.com` etc.	`azcmagent check`
Proxy support (if behind one)	No direct egress	`azcmagent config set proxy.url http://proxy:8080`	`azcmagent show` proxy line
Reachable update source (Windows)	Assessment + install need it	Point at internal WSUS or Windows Update	Assessment returns rows
Reachable distro repos (Linux)	Package manager needs them	Mirror/repo reachable; air-gapped needs a local mirror	`apt`/`yum` update succeeds
`patchMode` on the machine	Same master switch applies	`az connectedmachine update` patch settings	ARG over `osProfile`

Set the orchestration mode on an Arc machine through the connected-machine surface; the concept is the same as on a native VM:

# Arc machines honour the same patchMode concept; set it via the connectedmachine surface
az connectedmachine update \
  --resource-group rg-arc-servers \
  --name arc-records-01 \
  --set properties.osProfile.windowsConfiguration.patchSettings.patchMode=AutomaticByPlatform

Hotpatching and Windows Server orchestration patterns

For supported Windows Server SKUs, hotpatching installs OS security updates without a reboot by patching in-memory code, dramatically shrinking your reboot-driven maintenance windows. It is available on Windows Server Azure Edition (Datacenter: Azure Edition) and, more recently, on Windows Server 2025 – including, notably, Arc-enabled Windows Server 2025 machines under a subscription. The cadence is the pattern to internalize:

Month type	Months	What ships	Reboot?
Baseline	Jan, Apr, Jul, Oct	Cumulative update	Yes – required
Hotpatch	The two months after each baseline	Security fixes patched in-memory	No

So a year is four reboots, not twelve, with no loss of security coverage. Where hotpatch is and is not available:

Platform	Hotpatch support	Notes
Windows Server Datacenter: Azure Edition	Yes	The original hotpatch SKU
Windows Server 2025 (Azure VM)	Yes	Broader availability
Windows Server 2025 (Arc, under subscription)	Yes	Hotpatch reaches hybrid
Windows Server 2022/2019 Standard/Datacenter	No	Standard cumulative + reboot
Linux	N/A	Distro live-patch is separate, not AUM hotpatch
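The baseline/hotpatch cadence described above reduces to a tiny calendar rule. The sketch below only encodes the planned quarterly baselines from the cadence table; the function names are hypothetical and unplanned baselines are deliberately out of scope.

```python
BASELINE_MONTHS = {1, 4, 7, 10}  # Jan, Apr, Jul, Oct, per the cadence table


def month_type(month: int) -> str:
    """Classify a calendar month (1-12) as a baseline or hotpatch month."""
    if not 1 <= month <= 12:
        raise ValueError(f"month must be 1-12, got {month}")
    return "baseline" if month in BASELINE_MONTHS else "hotpatch"


def planned_reboots_per_year() -> int:
    """Count months where a reboot is expected under the planned cadence."""
    return sum(1 for m in range(1, 13) if month_type(m) == "baseline")


print(month_type(4))               # baseline
print(month_type(5))               # hotpatch
print(planned_reboots_per_year())  # 4
```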

The orchestration implication: design your maintenance configuration with rebootSetting: IfRequired, and the platform reboots only on baseline months and skips it on hotpatch months automatically. You do not script the calendar; AUM and the hotpatch service handle it. Enable hotpatch on the OS profile:

// Windows Server Azure Edition VM with hotpatch enabled
properties: {
  osProfile: {
    windowsConfiguration: {
      provisionVMAgent: true
      enableAutomaticUpdates: true
      patchSettings: {
        patchMode: 'AutomaticByPlatform'
        enableHotpatching: true
      }
    }
  }
}

Even where hotpatch is not available, the orchestration pattern holds: separate install from reboot using Never on disruption-sensitive tiers, then drive the reboot through a controlled post-event so it lands inside an approved restart window rather than mid-patch. The reboot-decoupling patterns side by side:

Pattern	How	Reboots/yr	Window disruption	Best for
Hotpatch (`IfRequired` + hotpatch on)	Platform skips reboot on hotpatch months	~4	Minimal	WS Azure Ed / WS 2025
Install now, reboot later (`Never` + post-event)	AUM installs; post-event reboots off-hours	As needed	Deferred to approved window	Restart-sensitive tiers, no hotpatch
Standard (`IfRequired`)	Reboot whenever an update needs it	~12	Per window	General fleet
Always reboot (`Always`)	Force clean state every run	Per window	Highest	Machines that must restart to apply config

Reporting, compliance dashboards, and Policy-driven enforcement

A patch program you cannot report on is a liability. AUM surfaces compliance in the portal, but the durable answer is Azure Policy – it both enforces the prerequisites (so new machines are born compliant) and reports drift across the estate. There are built-in policy definitions for exactly this; assign them at a management group so the whole tenant inherits them:

Built-in policy	Effect	What it does	Why you need it
Configure periodic checking for missing system updates on Azure VMs	`Modify` / DINE	Sets `assessmentMode = AutomaticByPlatform`	Continuous exposure data, no stale scans
Schedule recurring updates using Azure Update Manager	`DeployIfNotExists`	Associates in-scope machines to a maintenance config	New VMs auto-enrol; no forgotten onboarding
Configure periodic checking on Arc machines	`Modify` / DINE	Assessment mode on Arc servers	Hybrid parity for exposure data
Machines should be configured to periodically check for missing updates	`Audit`	Reports machines not in periodic assessment	Drift visibility before you enforce

Enforce periodic assessment tenant-wide via the built-in policy. The critical part is the managed identity – without it, the policy reports but never acts:

# Enforce periodic assessment tenant-wide via the built-in policy
az policy assignment create \
  --name "enforce-periodic-assessment" \
  --display-name "AUM: periodic assessment on all VMs" \
  --scope "/providers/Microsoft.Management/managementGroups/mg-platform" \
  --policy "59efceea-0c96-497e-a4a1-4eb2290dac15" \
  --mi-system-assigned --location eastus2 \
  --identity-scope "/providers/Microsoft.Management/managementGroups/mg-platform" \
  --role "Contributor"

DeployIfNotExists and Modify policies need a managed identity with the right role at the assigned scope. Omit --mi-system-assigned and --role and remediation tasks have no identity to deploy with: the assignment evaluates and reports, but never changes a machine. Always provision the identity.

The role each policy effect requires at the assigned scope:

Effect	Needs MI?	Typical role	If omitted
`Audit`	No	–	Reports only; no change
`Modify`	Yes	Contributor (or scoped role)	Tags/settings never applied
`DeployIfNotExists`	Yes	Contributor + resource-specific	Remediation never deploys; looks compliant
`Deny`	No	–	Blocks non-compliant creates
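The table above is simple enough to lint for. The sketch below assumes you have exported your assignments into plain dictionaries carrying a `name`, the definition's `effect`, and the assignment's `identity` block; that joined shape and the function name `missing_identity` are hypothetical (the effect lives on the policy definition, not the assignment, so you would need to join the two yourself).

```python
from typing import Dict, List

EFFECTS_NEEDING_IDENTITY = {"modify", "deployifnotexists"}


def missing_identity(assignments: List[Dict]) -> List[str]:
    """Return names of assignments whose effect needs an identity but has none."""
    flagged = []
    for a in assignments:
        effect = a["effect"].lower()
        # An absent or null identity block is treated the same as type "None".
        identity_type = (a.get("identity") or {}).get("type", "None")
        if effect in EFFECTS_NEEDING_IDENTITY and identity_type == "None":
            flagged.append(a["name"])
    return flagged


sample = [
    {"name": "enforce-periodic-assessment", "effect": "Modify", "identity": None},
    {"name": "schedule-recurring-updates", "effect": "DeployIfNotExists",
     "identity": {"type": "SystemAssigned"}},
    {"name": "audit-periodic-assessment", "effect": "Audit"},
]
print(missing_identity(sample))  # ['enforce-periodic-assessment']
```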

Report compliance from Resource Graph so it feeds a workbook or your existing dashboards rather than living only in the AUM blade:

// Fleet patch-compliance rollup: compliant vs non-compliant by environment
patchassessmentresources
| where type =~ "microsoft.compute/virtualmachines/patchassessmentresults"
   or type =~ "microsoft.hybridcompute/machines/patchassessmentresults"
// The assessment id is the machine id plus a /patchAssessmentResults/... suffix; trim it to get the join key
| extend machineId = tostring(split(tolower(id), "/patchassessmentresults/")[0])
| extend pending = toint(properties.availablePatchCountByClassification.security)
                 + toint(properties.availablePatchCountByClassification.critical)
| extend state = iff(pending == 0, "Compliant", "NonCompliant")
| join kind=leftouter (
    resources
    | project machineId = tolower(id), env = tostring(tags["Environment"])
  ) on machineId
| summarize machines = count() by state, env
| order by env asc, state asc
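If you export the raw rows instead (for a pipeline check or a unit test of your compliance definition), the same rollup logic can be sketched offline. The row shapes below are simplified stand-ins for Resource Graph output, and the assumption that an assessment id is the machine id followed by a `/patchAssessmentResults/...` suffix is mine; `machine_id` and `rollup` are hypothetical helper names.

```python
from collections import Counter
from typing import Dict, List, Tuple


def machine_id(assessment_id: str) -> str:
    """Trim the assessment suffix so the id matches the parent machine's id."""
    lowered = assessment_id.lower()
    idx = lowered.find("/patchassessmentresults/")
    return lowered if idx == -1 else lowered[:idx]


def rollup(assessments: List[Dict], machines: List[Dict]) -> Dict[Tuple[str, str], int]:
    """Count machines by (environment, compliance state)."""
    env_by_id = {m["id"].lower(): (m.get("tags") or {}).get("Environment", "")
                 for m in machines}
    counts: Counter = Counter()
    for a in assessments:
        by_class = a["properties"].get("availablePatchCountByClassification") or {}
        pending = int(by_class.get("security") or 0) + int(by_class.get("critical") or 0)
        state = "Compliant" if pending == 0 else "NonCompliant"
        counts[(env_by_id.get(machine_id(a["id"]), ""), state)] += 1
    return dict(counts)


assessments = [
    {"id": "/sub/s1/vm/vm1/patchAssessmentResults/latest",
     "properties": {"availablePatchCountByClassification": {"security": 0, "critical": 0}}},
    {"id": "/sub/s1/vm/vm2/patchAssessmentResults/latest",
     "properties": {"availablePatchCountByClassification": {"security": 3, "critical": 1}}},
]
machines = [
    {"id": "/sub/s1/vm/VM1", "tags": {"Environment": "prod"}},
    {"id": "/sub/s1/vm/vm2", "tags": None},   # untagged machine lands in the "" bucket
]
print(rollup(assessments, machines))
```

An untagged machine shows up under an empty environment rather than disappearing, which is what you want an auditor to see.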

The Resource Graph tables you will query for patch reporting, and what each holds:

Table	Holds	Key columns	Use for
`patchassessmentresources`	Assessment summary + per-patch children	`classifications`, `patchState`, `availablePatchCountByClassification`	Exposure, pending counts
`patchinstallationresources`	Install run results + per-patch	`installationState`, `patchName`	What actually installed
`maintenanceresources`	Maintenance config + run history	`maintenanceScope`, run status	Did the schedule run?
`resources`	Machines + tags + `osProfile`	`tags`, `patchSettings`	Scope validation, `patchMode` hunt

Architecture at a glance

The diagram traces patch orchestration as it actually flows, left to right, across the four planes that make AUM work – and marks the five places a run silently does nothing. Read it as a control loop. On the left, the control plane is authored once as code: Azure Policy enrols machines and enforces assessment, a maintenance configuration (InGuestPatch, window ≥ 1h30m) declares the schedule, and a dynamic scope – an Azure Resource Graph query over the PatchGroup tag – decides membership at run time. That intent flows into the orchestration plane, where the AUM engine assesses and installs (only if the machine is AutomaticByPlatform with bypass = true) and an optional pre/post Event Grid handler drains, snapshots, and validates inside a bounded pre-window.

From orchestration the same schedule fans out to two execution targets: the Azure fleet (no per-VM charge, including hotpatch-capable Windows Server SKUs that take four reboots a year instead of twelve) and the hybrid/multicloud estate of Arc-enabled servers in your datacenter, AWS, or GCP, each of which must reach its update source – WSUS or a distro repo – over outbound 443. Both targets emit assessment data into the report-and-enforce plane, where Resource Graph (patchassessmentresources) and a compliance workbook give one queryable Azure-plus-Arc view, and detected drift loops back to Policy for remediation. The five numbered badges mark the silent failures: a window too short or wrong scope, a dynamic scope that resolves to zero machines, the wrong patchMode/bypass, a pre-event that times out, and an unreachable update source. Each badge in the legend reads as symptom · how to confirm · fix – the same diagnostic loop the playbook below expands.

Real-world scenario

Meridian Health Systems, a healthcare ISV, ran ~900 servers split across Azure (Windows + Linux app tiers) and two on-prem datacenters still hosting a regulated records system that legally could not move to the cloud yet. Their old world was Automation Update Management for the Azure VMs and a hand-maintained WSUS-plus-cron arrangement on-prem. When the legacy service hit end of support, two constraints collided: an external auditor required a single, queryable compliance view across the entire estate, and the on-prem records servers had a hard rule – no unscheduled reboots during clinical hours (06:00-22:00 local), ever.

They solved it with one targeting model. The on-prem servers were onboarded to Arc and tagged PatchGroup=ring2,Environment=prod at connect time, which dropped them straight into an existing dynamic scope – no bespoke onboarding. The Azure app tiers were tagged ring0 (a thin canary of build agents and two non-critical app servers) and ring1 (the broad fleet). Every machine’s patchMode was driven to AutomaticByPlatform with bypass = true by an Azure Policy Modify assignment at the platform management group, so newly created VMs were born correct. The reboot constraint on ring2 was handled by splitting install from restart: the ring2 maintenance configuration used rebootSetting: Never, so AUM installed packages inside a late-evening window but never restarted anything itself. A post-maintenance Event Grid handler then queued required reboots and released them only after 22:00 via a controlled runbook, machine by machine, with health checks between. Finally, all compliance reporting – Azure and Arc alike – came from a single Resource Graph query feeding one workbook, which is exactly the artifact the auditor wanted.

The first scheduled window did not go cleanly, and the failure is instructive. The ring1 run completed with a green status but the pending-update count barely moved. The cause was the classic one: a subset of ring1 VMs had been created from an older image whose patchMode was ImageDefault, and the Policy Modify remediation task had never run because the assignment was created without a managed identity – it reported compliant definitions but never acted. The ARG patchMode hunt surfaced 140 machines in the wrong mode within seconds. They added the system-assigned identity and Contributor role to the assignment, kicked a remediation task, re-validated the hunt to zero, and the next window patched all 900. The reboot-suppressing slice of the ring2 config:

installPatches: {
  rebootSetting: 'Never'   // install in-window; reboots handled by post-event after clinical hours
  windowsParameters: {
    classificationsToInclude: [ 'Critical', 'Security' ]
  }
}

The lessons the team took away: Arc plus dynamic scopes collapsed three patch programs into one; decoupling install from reboot turned a hard compliance constraint into a scheduling detail rather than a blocker; and a DeployIfNotExists/Modify policy without a managed identity is worse than no policy, because it looks like governance while doing nothing.

Advantages and disadvantages

The native-platform, schedule-declared model both removes a huge amount of operational toil and introduces a handful of sharp edges. Weigh it honestly:

Advantages (why this model helps you)	Disadvantages (why it bites)
No Log Analytics workspace, Automation account, or dedicated agent – it is native to the VM/Arc platform	The orchestration mode (`patchMode` + `bypass`) is an easy-to-miss prerequisite; get it wrong and runs silently no-op
One targeting model (dynamic scopes on tags) spans Azure, on-prem, AWS, GCP via Arc	Off-Azure machines need reachable update sources and 443 egress – air-gapped/WSUS estates need real network work
Free on native Azure VMs; only Arc servers carry a small per-server charge	Arc per-server billing must be budgeted; large hybrid fleets add up
`DeployIfNotExists` policy makes machines born-compliant and auto-enrolled	A policy without a managed identity looks compliant but never acts – a dangerous false signal
Install and reboot are separable (`Never` + post-event); hotpatch removes most reboots entirely	Reboot decoupling adds an orchestration handler you must build, test, and keep idempotent
Single Resource Graph query gives one Azure+Arc compliance view for audit	Reporting lives in ARG/KQL, not a turnkey dashboard – you build the workbook
Dynamic scopes evaluate at run time, so new machines need zero manual onboarding	A scope that resolves to zero (tag typo) fails silently – “ran but nothing happened”
Pre/post events integrate drain/snapshot/validate without a separate scheduler	The ~20-minute pre-window forces async design; long inline work times out

The model is right for any fleet beyond a handful of VMs, and especially for hybrid/multicloud and regulated estates that need one auditable answer. It bites hardest on teams that treat AUM as a button (never declaring a schedule), estates with restrictive egress (Arc machines that cannot reach their update source), and anyone who assigns a remediation policy without the identity that lets it act. Every disadvantage is manageable – but only if you know it exists, which is the point of the playbook below.

Hands-on lab

Stand up a single Linux VM, put it under platform orchestration, prove an assessment and a one-time install work, then attach it to a maintenance configuration – all free-tier-friendly on native Azure VMs (AUM has no per-VM charge; you pay only for the VM). Run in Cloud Shell (Bash) and tear it down at the end.

Step 1 – Variables and resource group.

RG=rg-aum-lab
LOC=eastus2
VM=vm-aum-lab
az group create -n $RG -l $LOC -o table

Step 2 – Register the Microsoft.Maintenance provider (idempotent; a no-op if already registered).

az provider register --namespace Microsoft.Maintenance
az provider show --namespace Microsoft.Maintenance --query registrationState -o tsv
# Expected: Registered

Step 3 – Create a small Ubuntu VM born with platform orchestration.

az vm create -g $RG -n $VM --image Ubuntu2204 --size Standard_B1s \
  --admin-username azureuser --generate-ssh-keys \
  --patch-mode AutomaticByPlatform -o table

Expected: a VM row with a public IP. The --patch-mode AutomaticByPlatform flag sets the orchestration mode at create time.

Step 4 – Set the bypass flag so YOUR schedule will own patching.

az vm update -g $RG -n $VM \
  --set osProfile.linuxConfiguration.patchSettings.automaticByPlatformSettings.bypassPlatformSafetyChecksOnUserSchedule=true \
  --set osProfile.linuxConfiguration.patchSettings.assessmentMode=AutomaticByPlatform

Step 5 – Assess the machine and read the result from Resource Graph.

az vm assess-patches -g $RG -n $VM -o table
# Then query the result (may take a minute to surface):
az graph query -q "patchassessmentresources
| where id contains '$VM'
| project name, type, properties" -o jsonc

Expected: an assessment summary with availablePatchCountByClassification. Non-zero counts mean updates are pending.

Step 6 – Run a one-time install of Critical + Security, no reboot.

az vm install-patches -g $RG -n $VM \
  --maximum-duration PT60M \
  --reboot-setting Never \
  --classifications-to-include-linux Critical Security -o table

Expected: a run that returns Succeeded or CompletedWithWarnings. Re-run Step 5’s assessment to confirm the pending count dropped – this proves the data plane before you trust a schedule.

Step 7 – Create a daily maintenance configuration and attach the VM.

MCID=$(az maintenance configuration create -g $RG --resource-name mc-lab-daily -l $LOC \
  --maintenance-scope InGuestPatch \
  --extension-properties InGuestPatchMode=User \
  --duration 01:30 --recur-every 1Day --start-date-time "2026-06-09 03:00" --time-zone "UTC" \
  --reboot-setting IfRequired \
  --query id -o tsv)

az maintenance assignment create -g $RG --resource-name $VM --resource-type virtualMachines \
  --provider-name Microsoft.Compute \
  --configuration-assignment-name assign-lab \
  --maintenance-configuration-id "$MCID" -o table

Expected: the assignment is created. The VM is now associated to a daily schedule; the next window will assess and install per the config.

Step 8 – Verify the association and the machine’s orchestration mode.

az graph query -q "resources
| where name == '$VM'
| extend mode = properties.osProfile.linuxConfiguration.patchSettings.patchMode
| extend bypass = properties.osProfile.linuxConfiguration.patchSettings.automaticByPlatformSettings.bypassPlatformSafetyChecksOnUserSchedule
| project name, mode, bypass" -o jsonc
# Expected: mode = AutomaticByPlatform, bypass = true

Step 9 – Teardown.

az group delete -n $RG --yes --no-wait

This deletes the VM, the maintenance configuration, and the assignment in one shot.

Common mistakes & troubleshooting

This is the differentiator. Patch programs fail quietly – a green status that patched nothing is worse than a red one. The playbook below is the structured map: symptom → root cause → confirm (exact command/portal path) → fix. Scan it first; the prose after expands the worst offenders.

#	Symptom	Root cause	Confirm (exact command / portal path)	Fix
1	Run shows green, pending count barely drops	`patchMode` not `AutomaticByPlatform` or `bypass=false`	ARG `patchMode` hunt over `osProfile` patch settings	Set `patchMode=AutomaticByPlatform` + `bypass=true`
2	“Schedule ran but nothing happened”	Dynamic scope resolves to zero machines	Run the scope’s ARG query manually; count rows	Fix the tag/filter; re-validate count before window
3	Only some machines in a ring patched	Mixed `patchMode` across the ring (old image)	ARG hunt filtered to that ring’s tag	Remediate via Policy `Modify`; re-run with identity
4	Most patches skipped, run hit time limit	Window too short for the batch	`maintenanceresources` run status + `duration`	Raise `duration` (≥ 01:30); size for slowest machine
5	Assessment returns no rows	Machine can’t reach its update source	`patchassessmentresources` empty for that machine; `azcmagent check` (Arc)	Open 443 egress/proxy; point Windows at WSUS; fix repos
6	Arc machine not patched	Not in scope (resource-type filter) or agent disconnected	ARG scope query; `az connectedmachine show` status	Include `microsoft.hybridcompute`; reconnect agent
7	Machine reboots during business hours	`rebootSetting` = `IfRequired`/`Always` on sensitive tier	Inspect config `installPatches.rebootSetting`	Set `Never`; drive reboot via post-event off-hours
8	Policy shows compliant but VMs not enrolled	`DeployIfNotExists` assignment has no managed identity	Policy assignment -> Identity tab is empty	Add system-assigned MI + role; run remediation task
9	Pre-event handler errors / run un-drained	Handler exceeds ~20-min pre-window or not idempotent	Function/Logic App invocation logs around window time	Make handler fast + idempotent; do long work async
10	A known-bad KB keeps installing	KB not excluded in config	Config `windowsParameters.kbNumbersToExclude`	Add the KB to `kbNumbersToExclude`
11	Two configs patch the same machine	Overlapping scopes (both tag and RG-wide match)	List assignments on the machine	Make tags mutually exclusive; one ring per machine
12	New VM missed its first window	Tag applied after scope evaluated, or missing	ARG query for the tag at the time	Tag at create (IaC/Policy); confirm before window
13	Linux install fails on specific package	Held/broken package or repo conflict	Package-manager logs on the host (`apt`, `dnf`/`yum`, `zypper`)	Exclude the package; fix the repo; re-run
14	Hotpatch month still rebooted	Hotpatch not enabled or unsupported SKU	OS profile `enableHotpatching`; SKU check	Enable hotpatch on a supported SKU; baseline months still reboot

The wrong orchestration mode (the #1 silent failure)

A run completes with a benign-looking status and the pending-update count barely changes. The machine’s patchMode is not AutomaticByPlatform, or bypassPlatformSafetyChecksOnUserSchedule is false, so the platform either never installs on your schedule or auto-patches on its own cadence. Confirm with the ARG patchMode hunt from earlier in this article – it returns every misconfigured machine in seconds. Fix: drive the property to AutomaticByPlatform + bypass=true (by az vm update, az connectedmachine update, or a Policy Modify assignment for born-correct machines), then re-run the hunt to confirm zero.
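To turn the hunt into a gate rather than something you eyeball, feed its rows into a check like the one below. It assumes rows shaped like the `name, mode, bypass` projection used in the lab's verification query, and that `bypass` arrives as a JSON boolean; `misconfigured` is a hypothetical helper, not an Azure SDK call.

```python
from typing import Dict, List, Tuple


def misconfigured(rows: List[Dict]) -> List[Tuple[str, List[str]]]:
    """Return (machine name, reasons) for every row that would silently no-op."""
    bad = []
    for row in rows:
        reasons = []
        if row.get("mode") != "AutomaticByPlatform":
            reasons.append(f"patchMode={row.get('mode')}")
        if row.get("bypass") is not True:
            reasons.append("bypass is not true")
        if reasons:
            bad.append((row["name"], reasons))
    return bad


rows = [
    {"name": "vm-ok", "mode": "AutomaticByPlatform", "bypass": True},
    {"name": "vm-old-image", "mode": "ImageDefault", "bypass": None},
    {"name": "vm-half", "mode": "AutomaticByPlatform", "bypass": False},
]
for name, reasons in misconfigured(rows):
    print(name, reasons)
# vm-old-image ['patchMode=ImageDefault', 'bypass is not true']
# vm-half ['bypass is not true']
```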

A dynamic scope that resolves to zero

The window fires, the run logs success, and nothing is patched – because the scope’s tag filter matched no machines. A single typo (ring1 vs Ring1, or the wrong subscription in the filter) silently empties the scope. Confirm by running the scope’s exact ARG query manually and counting rows; zero is the smoking gun. Fix: correct the tag or filter, re-validate the count against expectation, and make this validation a pre-window gate – never trust a scope you have not counted.

`DeployIfNotExists` policy with no managed identity

The Policy compliance blade shows your enrolment policy as compliant, yet new VMs are never associated to a maintenance configuration. The DeployIfNotExists/Modify assignment was created without a managed identity, so it evaluates definitions but never deploys the remediation. Confirm: open the assignment, check the Identity tab – it is empty – and look for a remediation history of zero. Fix: add a system-assigned identity and the required role (Contributor or a scoped equivalent) at the assignment scope, then trigger a remediation task and confirm the deployment runs.

An unreachable update source

Assessment returns no rows for a machine, or installs fail outright – because the machine cannot reach Windows Update / WSUS (Windows) or its distro repos (Linux), or, for Arc, the Arc endpoints on 443. Confirm: the machine is absent from patchassessmentresources; on Arc, azcmagent check flags the failing endpoint. Fix: open the egress/proxy path, point Windows machines at an internal WSUS, confirm Linux repos are reachable (or stand up a local mirror for air-gapped estates), then re-assess.

A window too short for the batch

Most patches are skipped and the run hits its time limit, because the window’s effective install time (duration - 10m) was too small for the slowest machine in the batch. Confirm: maintenanceresources shows the run hitting the time limit; compare duration against the batch’s worst-case install time. Fix: raise duration (minimum 01:30), split a large ring into smaller batches with their own windows, or move slow machines to a dedicated config with a longer window.
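The sizing rule is plain arithmetic, and worth encoding so nobody re-derives it at 02:00. The sketch below takes the 10-minute reservation and the 01:30 minimum straight from the text above; the `HH:MM` parsing and the function names are illustrative assumptions.

```python
from datetime import timedelta

RESERVED = timedelta(minutes=10)                 # held back from the window, per the rule above
MINIMUM_WINDOW = timedelta(hours=1, minutes=30)  # smallest window the article recommends


def parse_duration(text: str) -> timedelta:
    """Parse an 'HH:MM' window duration such as '01:30'."""
    hours, minutes = text.split(":")
    return timedelta(hours=int(hours), minutes=int(minutes))


def window_fits(duration: str, worst_case_install: timedelta) -> bool:
    """True if the slowest machine's install fits the effective install time."""
    window = parse_duration(duration)
    if window < MINIMUM_WINDOW:
        raise ValueError(f"window {duration} is below the 01:30 minimum")
    return window - RESERVED >= worst_case_install


print(window_fits("01:30", timedelta(minutes=80)))  # True
print(window_fits("01:30", timedelta(minutes=85)))  # False
print(window_fits("03:00", timedelta(minutes=85)))  # True
```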

Best practices

Declare a schedule, do not click a button. The maintenance configuration is the product; one-time runs are only for proving the data plane and out-of-band CVEs.
Set patchMode=AutomaticByPlatform + bypass=true by Policy, not by hand, so every machine is born correct and the #1 silent failure cannot occur.
Author configs as code (Bicep/ARM): scope InGuestPatch, window ≥ 01:30, explicit reboot and classification settings, KB/package excludes for known-bad updates.
Target by tag with a governed vocabulary (PatchGroup = ring0|ring1|ring2|exempt, Environment); never staple machines into a config statically at scale.
Design rings deliberately: a thin canary first, the broad fleet next, latency-sensitive tiers last with reboots decoupled.
Validate every dynamic scope resolves to the expected count before the first window – make it a pre-window gate.
Prove a one-time install lands on a canary (re-assess, confirm the count drops) before you trust any schedule.
Decouple install from reboot (Never + post-event) for restart-sensitive tiers; enable hotpatch where supported to cut twelve reboots a year to about four.
Keep pre/post handlers fast and idempotent; trigger long work asynchronously and reconcile in the post-event.
Onboard off-Azure machines via Arc and tag at connect time so they inherit existing scopes with zero bespoke onboarding.
Always provision a managed identity for DeployIfNotExists/Modify assignments, and verify a non-zero remediation history.
Build one compliance view from Resource Graph (Azure + Arc) and surface it in a workbook for audit; keep exemptions visible, time-boxed, and few.
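
The "validate every dynamic scope" practice can be turned into a hard gate. The sketch below is a local Python stand-in, not the Resource Graph call itself: it assumes you have already exported the fleet as a list of dicts with `name` and `tags` keys (a hypothetical shape), and it matches tag values exactly, which is an assumption of this sketch rather than a statement about how the platform compares them.

```python
def resolve_scope(machines, tag_filter):
    """Return names of machines whose tags contain every key/value in tag_filter."""
    return sorted(
        m["name"]
        for m in machines
        if all(m.get("tags", {}).get(k) == v for k, v in tag_filter.items())
    )


def scope_gate(resolved, expected_min=1):
    """Raise if the scope resolves to fewer machines than expected; else return the count."""
    if len(resolved) < expected_min:
        raise RuntimeError(
            f"scope resolved to {len(resolved)} machine(s), "
            f"expected at least {expected_min}"
        )
    return len(resolved)


if __name__ == "__main__":
    # Hypothetical fleet export; 'app-02' carries a tag typo.
    fleet = [
        {"name": "app-01", "tags": {"PatchGroup": "ring1", "Environment": "prod"}},
        {"name": "app-02", "tags": {"PatchGroup": "Ring1", "Environment": "prod"}},
        {"name": "can-01", "tags": {"PatchGroup": "ring0", "Environment": "prod"}},
    ]
    matched = resolve_scope(fleet, {"PatchGroup": "ring1"})
    print(matched, scope_gate(matched, expected_min=1))
```

Run a check like this in the pipeline that deploys the configuration, with `expected_min` set to the ring size you actually intend, so a tag typo fails the deployment instead of producing an empty window.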

Security notes

Patching is a security control, but the patch pipeline itself is also an attack surface and a least-privilege problem. Treat it accordingly:

Concern	Risk	Control
Policy remediation identity	Over-broad Contributor at MG scope	Scope the role to the minimum (a custom role granting only maintenance-assignment + VM patch settings) where feasible
Pre/post handler identity	A Function that drains/reboots needs power	Use a managed identity with the least role; never store credentials in the handler
Update source integrity	WSUS/repo poisoning installs malicious “updates”	Use trusted, signed sources; restrict who can publish to internal WSUS/mirrors
Arc agent egress	Broad outbound from on-prem to the internet	Allow-list the specific Arc + update endpoints on 443; use a proxy, not open egress
Known-bad / supply-chain KBs	A bad patch breaks or backdoors machines	Canary ring first; `kbNumbersToExclude`/`packagesToExclude` to hold back; validate before broad rollout
Exemptions	`exempt` machines accumulate unpatched	Time-box exemptions, audit them separately, require a documented owner and expiry
Reboot suppression	`Never` leaves machines pending-reboot, partially patched	Track pending-reboot state; drive the controlled reboot promptly via post-event
Compliance data exposure	CVE exposure is sensitive	Restrict Resource Graph/workbook access; treat the exposure view as confidential
Secrets in handlers	Snapshot/drain handlers touching storage/DB	Reference Key Vault via managed identity; no secrets in app settings

The throughline: the only identities that can act on the fleet (the remediation MI, the handler MI) should hold the minimum role at the minimum scope, the only sources machines pull from should be trusted and signed, and the only machines left unpatched should be deliberately, visibly, and temporarily exempt.

Cost & sizing

AUM’s pricing model is deliberately simple, and the headline is that it is free on native Azure VMs. The cost surface and rough figures (always verify current rates for your region/currency):

Cost driver	Native Azure VM	Arc-enabled server	Notes
Update Manager itself	Free	Small per-server / month	The only AUM line item is Arc machines
Underlying compute	You pay for the VM	You pay for the on-prem/other-cloud host	AUM does not change compute cost
Pre/post handler (Function)	Consumption per execution	Same	Tiny at monthly cadence
Event Grid	Per operation	Same	Negligible for patch events
Resource Graph queries	Free	Free	No charge for ARG
Log Analytics (optional)	Per GB if you route logs there	Same	AUM does not require it; only if you choose to

Rough INR/USD framing for a hybrid estate:

Scenario	Machines	AUM cost driver	Rough monthly cost
All-Azure fleet	200 Azure VMs	AUM free; pay only compute	₹0 for AUM (compute separate)
Small hybrid	50 Azure + 50 Arc	50 Arc × per-server charge	~50 × small fee (USD low single digits each)
Large hybrid	500 Azure + 400 Arc	400 Arc × per-server charge	400 × per-server fee – budget explicitly
Orchestration overhead	any	Functions + Event Grid at monthly cadence	Effectively rounding error

Sizing is about windows and batches, not money: size each maintenance window for the slowest machine in its batch (duration - 10m effective), split very large rings into multiple windowed batches to avoid time-limit skips, and prefer hotpatch-capable SKUs where reboot disruption (not licence cost) is the constraint. The one real spend decision is Arc per-server billing on large hybrid fleets – it is far cheaper than running a parallel patch stack, but on 400+ servers it is a line item you must put in the budget, not discover.
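
To put that line item in the budget, the arithmetic is a single multiplication. A minimal Python sketch follows; the per-server fee is a parameter you must fill in from the current price sheet, and the 2.50 used in the example is a placeholder, not a quoted rate.

```python
def monthly_aum_cost(azure_vms: int, arc_servers: int, arc_fee_per_server: float) -> float:
    """Estimate the monthly AUM charge: native Azure VMs are free, Arc servers are billed per server."""
    if azure_vms < 0 or arc_servers < 0 or arc_fee_per_server < 0:
        raise ValueError("counts and fee must be non-negative")
    # azure_vms is accepted only to make the "free on native VMs" point explicit.
    return arc_servers * arc_fee_per_server


if __name__ == "__main__":
    placeholder_fee = 2.50  # hypothetical figure; verify the real rate for your region
    for label, azure, arc in [("all-Azure", 200, 0), ("small hybrid", 50, 50), ("large hybrid", 500, 400)]:
        print(label, monthly_aum_cost(azure, arc, placeholder_fee))
```

Compute, Functions, and Event Grid are deliberately left out: they are either unchanged by AUM or rounding error at a monthly cadence.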

Interview & exam questions

These map to AZ-104 (Azure Administrator) and AZ-800/AZ-801 (Windows Server Hybrid); the governance angle touches AZ-305.

What are the two planes of Azure Update Manager? The data plane (assess-patches/install-patches) acts on one machine on demand; the scheduling plane (maintenance configurations of scope InGuestPatch) declares a recurring, governed program. You prove with the data plane and operate with the scheduling plane.
Why must a machine be AutomaticByPlatform with bypassPlatformSafetyChecksOnUserSchedule = true for scheduled patching? AutomaticByPlatform lets the platform orchestrate installs; bypass=true stops the platform from also applying its own automatic patches on Microsoft’s cadence, which would collide with your window. Without both, the run is silently skipped.
What replaced Automation Update Management, and when did it retire? Azure Update Manager replaced it; Automation Update Management reached end of support on 31 August 2024, as did the MMA/OMS agent it depended on.
What is a dynamic scope and why prefer it over static assignment? A dynamic scope binds machines to a maintenance configuration via a Resource Graph filter (tags, subscriptions, resource groups, OS type) evaluated at run time, so new machines carrying the right tag are patched with zero manual onboarding – static assignments rot as the fleet changes.
What is the minimum maintenance window duration, and what is the 10-minute caveat? Minimum 01:30; the platform reserves the last 10 minutes to finalize, so effective install time is duration - 10m, and AUM stops starting new installs once the window is exhausted.
How do you patch an on-prem or AWS/GCP server with AUM? Onboard it as an Arc-enabled server (azcmagent connect), tag it at connect time, and it inherits the same maintenance configurations and dynamic scopes – billed a small per-server monthly charge, unlike free native Azure VMs.
How do you patch a machine that cannot reboot during business hours? Set rebootSetting: Never so AUM installs but never restarts, then drive the reboot through a controlled post-maintenance Event Grid handler that releases reboots in an approved window.
What is hotpatching and what is its reboot cadence? On Windows Server Azure Edition / WS 2025, hotpatch installs OS security updates without a reboot. Baseline months (Jan/Apr/Jul/Oct) ship a cumulative update and require a reboot; the two months after each ship hotpatches with no reboot – four reboots a year, not twelve.
Which two built-in policies operationalize AUM, and what do they do? “Configure periodic checking for missing system updates” sets assessmentMode=AutomaticByPlatform; “Schedule recurring updates using Azure Update Manager” uses DeployIfNotExists to auto-enrol in-scope machines into a maintenance configuration.
Why does a DeployIfNotExists policy sometimes show compliant but never act? It needs a managed identity with the right role at the assigned scope to deploy the remediation. Without the identity, the policy still evaluates resources and reports a compliance state, but it has nothing to deploy with, so no remediation ever runs.
Where do you query fleet patch compliance for both Azure and Arc? Azure Resource Graph – patchassessmentresources for exposure, patchinstallationresources for what installed, maintenanceresources for run history – joined to resources for tags; no Log Analytics workspace required.
What is the bounded contract for a pre-maintenance event handler? It runs inside a ~20-minute pre-window; the maintenance run proceeds when the handler completes or times out, so handlers must be fast and idempotent and trigger long work asynchronously.
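
The patchMode question is the one worth being able to demonstrate, not just recite. Here is a minimal Python sketch that checks an Azure VM's exported resource JSON for both settings. It assumes the ARM property path `properties.osProfile.<windows|linux>Configuration.patchSettings` with `automaticByPlatformSettings.bypassPlatformSafetyChecksOnUserSchedule` nested inside it; treat that path as an assumption to verify against your own export, and note that the sample VM dict is hypothetical.

```python
def patch_settings(vm: dict) -> dict:
    """Pull patchSettings out of a VM resource dict, for either OS family."""
    os_profile = vm.get("properties", {}).get("osProfile", {})
    config = (
        os_profile.get("windowsConfiguration")
        or os_profile.get("linuxConfiguration")
        or {}
    )
    return config.get("patchSettings") or {}


def schedule_ready(vm: dict) -> bool:
    """True only when both settings required for scheduled patching are present."""
    settings = patch_settings(vm)
    by_platform = settings.get("automaticByPlatformSettings") or {}
    return (
        settings.get("patchMode") == "AutomaticByPlatform"
        and by_platform.get("bypassPlatformSafetyChecksOnUserSchedule") is True
    )


if __name__ == "__main__":
    # Hypothetical VM export with the right mode but the bypass flag missing.
    vm = {
        "name": "app-01",
        "properties": {
            "osProfile": {
                "linuxConfiguration": {
                    "patchSettings": {"patchMode": "AutomaticByPlatform"}
                }
            }
        },
    }
    print(vm["name"], schedule_ready(vm))
```

The sample reports not ready: the mode is correct but the bypass flag is absent, which is exactly the half-configured state that gets a run silently skipped.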

Quick check

Your maintenance run shows a green status but the pending-update count barely moved. What is the single most likely cause and the one command to confirm it?
You attach a dynamic scope filtering PatchGroup=ring1 and the window fires with nothing patched. What do you check first?
A regulated tier cannot reboot between 06:00 and 22:00. How do you patch it without violating that rule?
Why is Azure Update Manager free on native Azure VMs but billed on Arc-enabled servers?
Which two osProfile patch settings must be set on a VM for scheduled patching, what value does each need, and what is the effect of each?

Answers

The machine’s patchMode is not AutomaticByPlatform (or bypass=false), so the run is silently skipped. Confirm with the Resource Graph patchMode hunt over osProfile patch settings, which returns every misconfigured machine.
Run the scope’s exact ARG query manually and count the rows – a scope resolving to zero (a tag typo or wrong subscription in the filter) is the number-one cause of “ran but nothing happened.” Fix the filter and re-validate the count.
Set rebootSetting: Never so AUM installs packages inside a window but restarts nothing, then drive the reboot through a controlled post-maintenance Event Grid handler that releases reboots only after 22:00.
AUM is a native platform capability for Azure VMs (no extra charge); Arc-enabled servers are off-Azure machines projected into Azure, and AUM coverage for them carries a small per-server monthly charge – still far cheaper than a parallel off-cloud patch stack.
patchMode = AutomaticByPlatform (lets the platform orchestrate installs on your schedule) and bypassPlatformSafetyChecksOnUserSchedule = true (stops the platform from auto-patching on its own cadence and colliding with your window). Both are required or the run is skipped.
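
For the regulated-tier answer, the post-maintenance handler needs one small decision: is a reboot allowed right now? A minimal Python sketch of that check, using the 06:00 to 22:00 no-reboot interval from the question as defaults. The function name and the way the handler obtains the current local time are assumptions; wiring it to Event Grid is out of scope here.

```python
from datetime import time


def reboot_allowed(now: time,
                   blocked_start: time = time(6, 0),
                   blocked_end: time = time(22, 0)) -> bool:
    """Return True when `now` falls outside the blocked interval [blocked_start, blocked_end)."""
    if blocked_start <= blocked_end:
        blocked = blocked_start <= now < blocked_end
    else:
        # Blocked interval wraps past midnight, e.g. 22:00 to 06:00.
        blocked = now >= blocked_start or now < blocked_end
    return not blocked


if __name__ == "__main__":
    for t in (time(5, 59), time(6, 0), time(12, 0), time(22, 0), time(23, 30)):
        print(t.strftime("%H:%M"), reboot_allowed(t))
```

A handler that gets a False here should queue the reboot rather than drop it, so the machine does not sit pending-reboot indefinitely.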

Glossary

Azure Update Manager (AUM) – Native Azure capability that assesses and installs OS updates on Azure VMs and Arc-enabled servers, with maintenance configurations for scheduling. No Log Analytics workspace or dedicated agent required.
Maintenance configuration – A first-class Azure resource (Microsoft.Maintenance/maintenanceConfigurations) of scope InGuestPatch declaring when to patch, what classifications, and how to reboot.
maintenanceScope – The kind of maintenance a configuration governs; must be InGuestPatch for guest-OS patching.
Patch orchestration mode (patchMode) – The OS-profile property controlling who patches a machine and when; AutomaticByPlatform is required for AUM scheduling.
bypassPlatformSafetyChecksOnUserSchedule – OS-profile flag that suppresses platform auto-patching so your maintenance schedule owns patching; must be true.
Assessment mode – Controls scanning cadence (AutomaticByPlatform for continuous periodic assessment; ImageDefault for on-demand only).
Dynamic scope – An Azure Resource Graph filter (tags/subscriptions/RGs/locations/OS) binding machines to a maintenance configuration, evaluated at run time.
Configuration assignment – The binding of a machine (static) or a scope (dynamic) to a maintenance configuration.
Pre/post maintenance event – Event Grid events fired before/after a maintenance window for drain, snapshot, validation, or controlled reboot.
Arc-enabled server – An off-Azure machine (on-prem, AWS, GCP) projected into Azure via the Connected Machine agent, treated by AUM like any other machine (billed per server).
Hotpatching – Reboot-less installation of OS security updates on supported Windows Server SKUs; baseline months require a reboot, hotpatch months do not.
Classification – The category of an update (Critical, Security, UpdateRollUp, etc.) used to scope what a run installs.
Ring – A wave of the fleet (canary → broad → sensitive) with its own window and reboot posture, expressed as a PatchGroup tag value.
rebootSetting – A run’s reboot behaviour: IfRequired, Always, or Never (the lever for decoupling install from restart).
patchassessmentresources – The Azure Resource Graph table holding per-machine assessment summaries and per-patch detail for Azure VMs and Arc machines.

Next steps

Lock down the enforcement engine behind AUM with Azure Policy: Governance at Scale – the DeployIfNotExists and Modify assignments that make machines born-compliant.
Extend the hybrid story with Azure Arc-Enabled Servers: Machine Configuration & Extended Security Updates for the off-Azure machine lifecycle.
Build the compliance workbook and alerting on top of Azure Monitor & Application Insights for Observability.
Wire your pre/post orchestration handlers using Azure Functions: Serverless Patterns.
Place these maintenance configurations and policies correctly in the tenant with Azure Resource Hierarchy Explained.

Written by Vinod
