
Azure Arc-Enabled Servers: Onboarding at Scale, Machine Configuration Guest Policy, and Extended Security Updates

A fleet of 600 Windows and Linux servers spread across two colo datacenters, an AWS account, and a GCP project is not a “we’ll get to it” problem. The moment your security team asks “which of these is missing a CIS baseline, which still runs an out-of-support OS, and who can touch them,” you need one control plane. Azure Arc-enabled servers projects each machine into Azure Resource Manager as a Microsoft.HybridCompute/machines resource. From there, the same management groups, Azure Policy assignments, RBAC, and Resource Graph queries you use for native Azure VMs reach into your hybrid estate — without lifting a single workload.

This walkthrough does the work a platform team actually has to do: onboard at scale non-interactively, manage extensions, enforce in-guest configuration with Machine Configuration (formerly Guest Configuration), report compliance through Azure Policy, deliver Extended Security Updates (ESU) for Windows Server 2012/2012 R2 through Arc, lock agent traffic behind a Private Link Scope, and scope RBAC so a stolen credential’s blast radius stays small. I assume Owner on the subscription and root/administrator on the machines.

1. The Connected Machine agent: architecture and connectivity modes

The agent is a single package — the azcmagent CLI plus services (the Hybrid Instance Metadata Service, the GuestConfig service, the extension manager). It runs low-privilege, polls Azure over outbound HTTPS (443) only, and never opens an inbound port. Three facts shape every design decision:

Configure the proxy before connecting so onboarding itself can route out:

azcmagent config set proxy.url "http://proxy.corp.local:3128"
azcmagent config set proxy.bypass "Arc,ArcData"   # built-in bypass lists

# Verify reachability of every required endpoint BEFORE onboarding
azcmagent check --location eastus

azcmagent check returns a pass/fail per required FQDN (*.his.arc.azure.com, *.guestconfiguration.azure.com, login.microsoftonline.com, management.azure.com, and the download CDN). Bake it into golden-image validation.

2. At-scale onboarding with a service principal

Interactive login does not scale to 600 servers. Create a dedicated onboarding service principal with the narrowest role for the job — Azure Connected Machine Onboarding — scoped to the single resource group that holds the machines. It can create Arc server resources and nothing else; a leaked secret cannot pivot.

# Dedicated onboarding identity, scoped to one RG, narrowest built-in role
az ad sp create-for-rbac \
  --name "sp-arc-onboarding" \
  --role "Azure Connected Machine Onboarding" \
  --scopes "/subscriptions/<sub-id>/resourceGroups/rg-arc-servers"

Never put the secret on a command line — azcmagent echoes arguments to logs in some failure paths. Use a config file referenced with --config; the agent reads the credential from disk and keeps it out of the console:

# /etc/arc-onboard.json  (mode 0600, deleted after onboarding)
# Keys match the long azcmagent connect flag names, without the leading dashes.
install -m 600 /dev/null /etc/arc-onboard.json   # create owner-only before any secret is written
cat > /etc/arc-onboard.json <<'JSON'
{
  "subscription-id": "<sub-id>",
  "resource-group": "rg-arc-servers",
  "location": "eastus",
  "tenant-id": "<tenant-id>",
  "service-principal-id": "<app-id>",
  "service-principal-secret": "<secret>",
  "cloud": "AzureCloud"
}
JSON

azcmagent connect \
  --config /etc/arc-onboard.json \
  --tags "Datacenter=COLO1,App=Payments,Owner='Platform Eng'" \
  --correlation-id "$(uuidgen)"

shred -u /etc/arc-onboard.json   # remove the secret immediately
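
If first-boot automation writes that file for you, a cheap guard before connect catches a config that landed world-readable. This is a minimal sketch for a POSIX shell; config_is_private is a hypothetical helper of mine, not part of azcmagent:

```shell
# Hypothetical helper: succeed only if the file is a regular file with mode 0600.
# Compares the first ten characters of `ls -ld` output, which hold the type and permission bits.
config_is_private() {
  perms=$(ls -ld "$1" 2>/dev/null | cut -c1-10)
  [ "$perms" = "-rw-------" ]
}

# Usage (not executed here):
#   config_is_private /etc/arc-onboard.json || { echo "refusing: config is not 0600" >&2; exit 1; }
```

The check is deliberately strict: anything other than owner read/write fails, including a missing file.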

Windows uses the same azcmagent connect --config from an elevated session. For golden images, do not onboard before cloning — install the agent, leave it disconnected, and let first-boot automation (cloud-init, Ansible, an MDT/Intune task) run connect with a per-machine --resource-name so hostnames do not collide. Certificate-based SPN auth (--service-principal-cert) is better where you can distribute certs — it removes the long-lived secret entirely. Use --use-azcli (agent 1.59+) only for ad-hoc operator onboarding, never unattended fleets.
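
One way to get a collision-free --resource-name on first boot is to derive it from the datacenter and the short hostname. The sketch below assumes a <datacenter>-<short hostname> convention and a conservative character set (lowercase letters, digits, hyphens); both are my assumptions, and arc_resource_name is a hypothetical helper, so check the result against Azure's naming rules for Arc machines:

```shell
# Hypothetical helper: build "<datacenter>-<short hostname>" in lowercase.
# Runs in a subshell (parentheses body) so its variables do not leak to the caller.
arc_resource_name() (
  dc=$1
  host=$2
  short=${host%%.*}    # drop any DNS suffix, e.g. PAY-01.corp.local -> PAY-01
  printf '%s-%s' "$dc" "$short" \
    | tr '[:upper:]' '[:lower:]' \
    | tr -c 'a-z0-9-' '-'    # replace anything outside the assumed safe set with a hyphen
)

# Usage (not executed here):
#   azcmagent connect --config /etc/arc-onboard.json \
#     --resource-name "$(arc_resource_name COLO1 "$(hostname)")"
```

Prefixing the datacenter keeps two sites that both have a host called pay-01 from fighting over one resource name.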

3. Machine Configuration: audit and remediation in-guest

Machine Configuration runs DSC-style packages inside the OS to assert state Azure Policy alone cannot see — registry values, file contents, installed packages, service states, sysctl/secedit settings. The engine is in-box on Arc servers, so you only assign policy.

Every configuration is a MOF compiled into a .zip package (signing is optional but recommended), published to Blob Storage, then referenced by a policy definition. Author it with the GuestConfiguration PowerShell module:

Install-Module -Name GuestConfiguration -Scope CurrentUser

# Compile your DSC config (here: assert a registry value), then package it.
# -Type is Audit or AuditAndSet; AuditAndSet is required for any package
# that will remediate (ApplyAndMonitor or ApplyAndAutoCorrect).
New-GuestConfigurationPackage `
  -Name 'EnforceTlsRegistry' `
  -Configuration './EnforceTlsRegistry.mof' `
  -Type 'AuditAndSet' `
  -Path './package'

# Test against the local machine before publishing
Get-GuestConfigurationPackageComplianceStatus `
  -Path './package/EnforceTlsRegistry.zip'

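The package -Type only declares whether the package can remediate (Audit or AuditAndSet). Behavior is set by the -Mode you pass to New-GuestConfigurationPolicy, which maps directly to the assignmentType on the resulting guest assignment resource: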

assignmentType | Test result false ⇒ | Use it for
Audit | report NonCompliant, do nothing | read-only compliance reporting
ApplyAndMonitor | apply once at assignment, then only report drift | one-time enforcement, manual re-apply
ApplyAndAutoCorrect | run Set to remediate on every evaluation | continuous, self-healing enforcement

A subtlety that bites people: when a custom policy first deploys an assignment, assignmentType can briefly read Null before resolving (typically within an hour). Do not alert on that transient state.
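
To keep that transient state out of the pager, the alert rule can carry a grace window. A minimal sketch, assuming your alerting pipeline hands over the assignmentType string, the compliance status, and the assignment's age in minutes; alert_decision and the 60-minute window are my assumptions, taken from the "typically within an hour" figure above:

```shell
# Hypothetical helper: prints "alert" or "quiet".
#   $1 assignmentType (may be empty or Null while the assignment resolves)
#   $2 compliance status, e.g. Compliant or NonCompliant
#   $3 age of the assignment in minutes
alert_decision() (
  assignment_type=$1
  compliance=$2
  age_minutes=$3
  grace_minutes=60    # assumed grace window

  case "$assignment_type" in
    ''|null|Null)
      # Unresolved type: stay quiet inside the window, alert if it is stuck beyond it.
      if [ "$age_minutes" -lt "$grace_minutes" ]; then echo quiet; else echo alert; fi
      ;;
    *)
      if [ "$compliance" = "NonCompliant" ]; then echo alert; else echo quiet; fi
      ;;
  esac
)
```

An assignment that still reads Null after the window is treated as a real problem rather than suppressed forever.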

Generate a policy definition from the package and assign it. For audit-only baselines (the CIS/STIG built-ins), the initiatives already exist — assign those directly rather than authoring your own.

New-GuestConfigurationPolicy `
  -PolicyId (New-Guid) `
  -ContentUri 'https://stgarc.blob.core.windows.net/pkgs/EnforceTlsRegistry.zip' `
  -DisplayName 'Enforce TLS registry baseline' `
  -Description 'Enforces the TLS registry settings defined in the EnforceTlsRegistry package' `
  -Platform 'Windows' `
  -PolicyVersion '1.0.0' `
  -Mode 'ApplyAndAutoCorrect' `
  -Path './policy'

4. Extension management and VM-like operations

Arc servers accept the same extension model as Azure VMs through Microsoft.HybridCompute/machines/extensions. Day one you deploy the Azure Monitor Agent (telemetry to a Data Collection Rule) and, where used, the Custom Script Extension. Push them at scale with policy, or imperatively for a single box:

# Install the Azure Monitor Agent extension on an Arc server
az connectedmachine extension create \
  --resource-group "rg-arc-servers" \
  --machine-name "colo1-pay-01" \
  --name "AzureMonitorWindowsAgent" \
  --publisher "Microsoft.Azure.Monitor" \
  --type "AzureMonitorWindowsAgent" \
  --enable-auto-upgrade true

--enable-auto-upgrade true opts the extension into automatic minor-version upgrades — set it everywhere so you are not chasing CVEs in the agents themselves. Keep the agent current too via automatic agent upgrade so azcmagent self-updates.

Beyond extensions, Arc unlocks VM-like operations: SSH/RDP over Arc (az ssh arc, through ARM with no inbound port), Azure Update Manager for cross-fleet patch orchestration, and Run Command for one-off scripts audited through ARM — replacing bastion and jump-box sprawl with RBAC-governed, logged access.

5. Azure Policy guest assignments and compliance reporting

The standard path is the built-in initiative “Deploy prerequisites to enable Guest Configuration policies on virtual machines.” On Azure VMs it deploys the extension and a system-assigned identity; on Arc servers the extension half is a no-op (it’s in-box) but the identity wiring still applies. Assign it at the management-group level so new machines inherit it.

DeployIfNotExists and Modify assignments act only on new or updated resources. To bring the existing 600 into compliance you must create a remediation task — the single most common reason teams see “0% compliant” and panic. Trigger it on the assignment:

# Remediate all existing in-scope machines for one policy assignment.
# For an initiative assignment, also pass --definition-reference-id to select the member policy.
az policy remediation create \
  --name "remediate-machinecfg-baseline" \
  --policy-assignment "<assignment-id>" \
  --resource-discovery-mode ReEvaluateCompliance

Compliance lands in Azure Resource Graph — how you answer “what’s broken” across the whole fleet in one query instead of clicking through the portal:

// Non-compliant Machine Configuration assignments across the estate
guestconfigurationresources
| where type =~ "microsoft.guestconfiguration/guestconfigurationassignments"
| extend status = tostring(properties.complianceStatus)
| extend machine = tostring(split(id, "/")[8])
| where status =~ "NonCompliant"
| project machine, name, status, lastComplianceChecked = properties.lastComplianceStatusChecked
| order by machine asc
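
The [8] in that query is easy to misread. KQL's split() is zero-based, and the leading slash in a resource ID produces an empty element 0, so the machine name sits at index 8. The same arithmetic in shell, on a made-up assignment ID (the subscription GUID and assignment name are placeholders):

```shell
# Placeholder resource ID in the shape of a guest assignment on an Arc server.
id="/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/rg-arc-servers/providers/Microsoft.HybridCompute/machines/colo1-pay-01/providers/Microsoft.GuestConfiguration/guestConfigurationAssignments/EnforceTlsRegistry"

# cut counts fields from 1, and field 1 is the empty string before the first "/",
# so KQL index 8 corresponds to cut field 9.
machine_from_assignment_id() {
  printf '%s\n' "$1" | cut -d/ -f9
}

machine_from_assignment_id "$id"    # prints colo1-pay-01
```

Azure VM assignment IDs have the VM name at the same depth (Microsoft.Compute/virtualMachines/<name>), so the one index should cover both resource types; verify against a sample row from your own tenant before relying on it.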

6. Extended Security Updates for Windows Server 2012/2012 R2 through Arc

This is frequently the business case that funds the entire rollout. Windows Server 2012/2012 R2 are out of support; ESU delivers patches for up to three more years, and Arc is the delivery mechanism for machines not in Azure. You provision a license resource, then link it to each eligible server; patches flow through Windows Update / Azure Update Manager and bill monthly — no MAK keys to distribute.

The license is a first-class ARM resource (Microsoft.HybridCompute/licenses). Provision it with the CLI, attesting to Software Assurance or SPLA coverage:

# Provision a Datacenter physical-core ESU license (min 16 physical cores)
az connectedmachine license create \
  --license-name "esu-ws2012-dc-colo1" \
  --resource-group "rg-arc-servers" \
  --location "eastus" \
  --license-type "ESU" \
  --state "Activated" \
  --target "Windows Server 2012 R2" \
  --edition "Datacenter" \
  --type "pCore" \
  --processors 16

Watch the licensing rules that actually cost money:
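
The per-license core floor is the one that surprises people on small hosts. A rough sketch of the arithmetic, using the 16-physical-core minimum from the command above; the 8-core minimum for vCore licenses is my understanding of the published ESU terms and should be verified, and esu_billable_cores is a hypothetical helper:

```shell
# Hypothetical helper: cores you are billed for on one license.
#   $1 license core type: pCore or vCore
#   $2 actual core count of the host or VM
esu_billable_cores() (
  license_type=$1
  cores=$2
  case "$license_type" in
    pCore) floor=16 ;;   # minimum stated in the license example above
    vCore) floor=8 ;;    # assumed minimum per VM; confirm against current licensing terms
    *) echo "unknown license type: $license_type" >&2; exit 2 ;;
  esac
  if [ "$cores" -lt "$floor" ]; then echo "$floor"; else echo "$cores"; fi
)

# Usage (not executed here):
#   esu_billable_cores pCore 12    # a 12-core host is still billed at the floor
```

Run it over the inventory before provisioning: a rack of small physical hosts licensed per pCore can cost more than the same workloads licensed per vCore.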

Linking is a licenseProfiles/default child on the machine. Declare it in Bicep so it lives in source control alongside the license:

@description('Resource ID of the ESU license to assign')
param esuLicenseId string
param machineName string

resource esuLink 'Microsoft.HybridCompute/machines/licenseProfiles@2023-06-20-preview' = {
  name: '${machineName}/default'
  location: resourceGroup().location
  properties: {
    esuProfile: {
      assignedLicense: esuLicenseId
    }
  }
}

To unlink (machine decommissioned, or moved into Azure where ESU is free), PUT the same licenseProfiles/default with an empty esuProfile: {}. Deactivate a license by PATCHing its state to Deactivated so billing stops.

7. Private Link Scope for secure agent-to-Azure traffic

For regulated estates that forbid agent traffic over the public internet, an Azure Arc Private Link Scope (Microsoft.HybridCompute/privateLinkScopes) routes the his and guestconfiguration data planes through one private endpoint over ExpressRoute or VPN. One scope serves many machines; a virtual network maps to at most one scope.

# Create the scope, then a private endpoint bound to its 'hybridcompute' group
az connectedmachine private-link-scope create \
  --resource-group "rg-arc-net" \
  --location "eastus" \
  --scope-name "pls-arc-prod" \
  --public-network-access Disabled

scopeId=$(az connectedmachine private-link-scope show \
  --resource-group "rg-arc-net" --scope-name "pls-arc-prod" --query id -o tsv)

az network private-endpoint create \
  --resource-group "rg-arc-net" \
  --name "pe-arc-prod" \
  --location "eastus" \
  --vnet-name "vnet-hub" \
  --subnet "snet-pe" \
  --private-connection-resource-id "$scopeId" \
  --group-id "hybridcompute" \
  --connection-name "arc-conn"

Two private DNS zones must resolve to the endpoint’s private IPs (a third only if you also run Arc-enabled Kubernetes):

privatelink.his.arc.azure.com
privatelink.guestconfiguration.azure.com
# privatelink.dp.kubernetesconfiguration.azure.com   # only for Arc K8s

Critically, Microsoft Entra ID (login.microsoftonline.com, pas.windows.net) and Azure Resource Manager (management.azure.com) do not traverse the scope — they keep using public endpoints. Allow those via the AzureActiveDirectory and AzureResourceManager service tags on your firewall/NSG, or servers fail to authenticate even with a healthy private endpoint. Onboard new machines with --private-link-scope <scope-resource-id>; associate existing ones afterward (up to 15 minutes to start accepting connections).
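
When a firewall review asks which hostnames go where, it helps to encode the split once. A small sketch using only the hostnames named in this section; arc_endpoint_path is a hypothetical helper, and the list is not the agent's full endpoint inventory:

```shell
# Hypothetical helper: prints which path an agent endpoint takes behind a Private Link Scope.
arc_endpoint_path() {
  case "$1" in
    *.his.arc.azure.com|*.guestconfiguration.azure.com)
      echo private ;;    # resolved through the privatelink.* zones to the private endpoint
    login.microsoftonline.com|pas.windows.net|management.azure.com)
      echo public ;;     # Entra ID and ARM stay on public endpoints
    *)
      echo unknown ;;    # not covered by this section; look it up before writing a rule
  esac
}

# Usage (not executed here):
#   arc_endpoint_path management.azure.com    # public
```

Anything that comes back unknown deserves a look at the current network requirements list rather than a guess.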

8. RBAC scoping and operational guardrails

The whole point of projecting servers into ARM is that existing governance applies. Use the purpose-built Arc roles instead of broad Contributor:

Role | Grants | Give it to
Azure Connected Machine Onboarding | create/read Arc server resources only | the onboarding SPN
Azure Connected Machine Resource Administrator | manage Arc servers, extensions, ESU | platform/ops team
Reader on the Arc RG | read-only inventory and compliance | auditors, monitoring

Layer policy guardrails on top so the estate stays inside the rails:

// Deny any Arc extension not on the allowlist (policyRule fragment)
"if": {
  "allOf": [
    { "field": "type", "equals": "Microsoft.HybridCompute/machines/extensions" },
    { "not": {
        "field": "Microsoft.HybridCompute/machines/extensions/type",
        "in": ["AzureMonitorWindowsAgent", "AzureMonitorLinuxAgent", "CustomScriptExtension"]
    }}
  ]
},
"then": { "effect": "deny" }

Enterprise scenario

A payments platform team I worked with ran 420 Windows Server 2012 R2 hosts across two PCI-scoped datacenters. The constraint was hard: the auditor would not accept agent telemetry crossing the public internet, and every box needed ESU because migrating the legacy payment gateway off 2012 R2 was an 18-month project they could not front-load. Their first attempt onboarded everything in direct mode and immediately failed the network review.

The fix had three parts. First, a Private Link Scope per datacenter, fronted by a private endpoint on the existing ExpressRoute-connected hub VNet, with the his and guestconfiguration zones in central private DNS. Second — the part that broke — they had blocked all outbound internet at the firewall, and onboarding hung. The agent still needs Entra ID and ARM over the public internet even behind Private Link, so they added exactly two service-tag rules and nothing else:

# The only public egress the agent needs behind Private Link
az network nsg rule create -g rg-pci-net --nsg-name nsg-arc \
  --name AllowAAD --priority 150 --direction Outbound --access Allow \
  --protocol Tcp --source-address-prefixes VirtualNetwork \
  --destination-address-prefixes AzureActiveDirectory --destination-port-ranges 443
az network nsg rule create -g rg-pci-net --nsg-name nsg-arc \
  --name AllowARM --priority 151 --direction Outbound --access Allow \
  --protocol Tcp --source-address-prefixes VirtualNetwork \
  --destination-address-prefixes AzureResourceManager --destination-port-ranges 443

Third, ESU as code: one Datacenter pCore license per physical host (2-socket boxes, well over the 16-core floor), provisioned and linked through a Bicep loop over an inventory file with assignedLicense referencing the license resource ID. Compliance (the CIS baseline via Machine Configuration ApplyAndMonitor, plus ESU coverage) rolled up into one Resource Graph dashboard the auditor could query directly. The network review passed on the second attempt, and the only public traffic on the wire went to the two service tags for identity and ARM.

Verify

Confirm the agent is connected and healthy from both the server and Azure:

azcmagent show   # on the server: status, mode, last heartbeat, MI

# From Azure: machine is Connected with a managed identity
az connectedmachine show -g rg-arc-servers -n colo1-pay-01 \
  --query "{status:status, agentVersion:agentVersion, mi:identity.principalId}"

Then estate-wide, in Resource Graph:

resources
| where type =~ "microsoft.hybridcompute/machines"
| project name, status = tostring(properties.status),
          osName = tostring(properties.osName), location
| order by status asc, name asc

Confirm Machine Configuration assignments actually evaluated (not just deployed) via the guestconfigurationresources query in section 5, and verify ESU by confirming each 2012/2012 R2 machine has a non-empty licenseProfiles/default.esuProfile.assignedLicense.

Checklist
