A regional insurer has a datacenter lease expiring in nine months and roughly 400 VMs running on vSphere 7 — a mix of claims-processing apps, an Oracle estate, and a stubborn set of Windows VMs that the application owners swear cannot be re-platformed before the lease runs out. Re-architecting everything into native Azure IaaS in that window is fantasy. The realistic move is a lift-and-shift onto Azure VMware Solution (AVS): stand up a VMware private cloud inside Azure, bridge the on-prem vSphere environment to it with VMware HCX, and live-migrate the running VMs with zero downtime so the application teams barely notice. This guide walks the full path — provision the AVS private cloud, wire ExpressRoute, deploy and pair HCX, extend the networks, and run a vMotion migration — with the real commands and the failure modes that will page you if you skip a step.
The appeal of AVS is that it is the same VMware stack — vCenter, ESXi, vSAN, NSX-T — running on dedicated bare-metal hosts in an Azure region, operated by Microsoft as a first-party service. Your operators keep their tooling and runbooks. HCX is the migration fabric VMware ships specifically for this: it builds an encrypted transport between two vCenters, stretches Layer-2 networks so a VM keeps its IP across the move, and performs bulk and vMotion (live) migrations across that fabric. The combination lets you evacuate a datacenter on a deadline without an IP re-addressing project or an application-by-application rewrite.
Prerequisites
- An Azure subscription with the Microsoft.AVS resource provider registered and an approved host quota (AVS hosts are dedicated bare metal; quota is requested via a support ticket and is not instant, so start this days ahead).
- A /22 (or larger) non-overlapping private CIDR reserved exclusively for the AVS management network (vCenter, NSX-T, HCX, vMotion, vSAN all carve subnets from it). It must not overlap on-prem or any peered VNet.
- On-prem vSphere 6.5+/7.x with vCenter, and admin credentials; outbound connectivity to Azure.
- An ExpressRoute circuit from on-prem into Azure, which ExpressRoute Global Reach will bridge to the AVS-managed circuit (AVS is reached over private connectivity, not the public internet).
- Azure CLI 2.55+ with the vmware extension, and the Contributor role on the target resource group.
- A maintenance window for the cutover steps (the network extension unstretch), though the migrations themselves are online.
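The non-overlap requirement on the management /22 is easy to check mechanically before anything is provisioned. The following is a minimal sketch in plain shell arithmetic, not part of any Azure tooling: the function names are illustrative, it handles IPv4 only, and it assumes octets are written in plain decimal without leading zeros.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  set -- $(echo "$1" | tr '.' ' ')
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Print the "first last" integer bounds of a CIDR block such as 10.100.0.0/22.
cidr_range() {
  base=$(ip_to_int "${1%/*}")
  bits=${1#*/}
  size=$(( 1 << (32 - bits) ))
  first=$(( base / size * size ))   # align down to the block boundary
  echo "$first $(( first + size - 1 ))"
}

# Return 0 (true) when the two CIDR blocks share at least one address.
cidr_overlap() {
  # After set: $1/$2 bound the first block, $3/$4 bound the second.
  set -- $(cidr_range "$1") $(cidr_range "$2")
  [ "$1" -le "$4" ] && [ "$3" -le "$2" ]
}
```

Run cidr_overlap with the candidate management block against each on-prem and VNet range exported from IPAM; any call that returns true is a block to avoid.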
Target topology
The shape is two VMware estates bridged by an HCX transport fabric. On the left, your on-prem vSphere datacenter with the HCX Connector. On the right, an AVS private cloud in an Azure region — a vSphere cluster on dedicated hosts with its own vCenter, NSX-T, and vSAN, fronted by the HCX Cloud Manager. Between them, an ExpressRoute path carries the HCX Service Mesh: the encrypted WAN/bulk transport, the Network Extension appliance that stretches L2 segments, and the vMotion path that moves live VMs. On the Azure side, the AVS management network connects to a hub VNet via an ExpressRoute gateway, where you land jump hosts, Entra ID-gated administrative access, and the observability and security agents that watch the migrated workloads. Migrated VMs keep their original IPs on stretched segments until you cut the gateway over to NSX-T and unstretch.
1. Register the provider and request host quota
Quota is the long pole — do this first. Register the provider, then raise the quota request so it is approved by the time you need hosts.
# Register the AVS resource provider (idempotent)
az provider register --namespace Microsoft.AVS --wait
az provider show --namespace Microsoft.AVS --query registrationState -o tsv
# Add the VMware CLI extension
az extension add --name vmware --upgrade
# Set working context
RG=rg-avs-prod-sea
LOCATION=southeastasia
PRIVATE_CLOUD=avs-insurer-prod
az group create -n "$RG" -l "$LOCATION"
Host quota is requested through Help + Support → New support request (Issue type Service and subscription limits (quotas), quota type Azure VMware Solution). Request at least the cluster minimum (3 hosts of the AV36 / AV36P / AV52 SKU available in your region). Capture the approved SKU — you need it in the next step.
2. Provision the AVS private cloud
Provisioning bare-metal hosts and the full SDDC takes 3–4 hours. Kick it off early. Pick a management /22 that overlaps nothing.
az vmware private-cloud create \
--resource-group "$RG" \
--name "$PRIVATE_CLOUD" \
--location "$LOCATION" \
--sku AV36P \
--cluster-size 3 \
--network-block 10.100.0.0/22 \
--internet Disabled \
--accept-eula
--network-block is the management CIDR the SDDC subdivides — do not reuse it anywhere. --internet Disabled keeps the private cloud off the public internet (the default and the right posture). When it completes, pull the auto-generated vCenter and NSX-T credentials and the management endpoints:
# vCenter and NSX-T admin credentials (rotate after handoff)
az vmware private-cloud list-admin-credentials \
--resource-group "$RG" --private-cloud-name "$PRIVATE_CLOUD" -o jsonc
# Management network endpoints (vCenter, NSX-T, HCX URLs)
az vmware private-cloud show \
--resource-group "$RG" --name "$PRIVATE_CLOUD" \
--query "{vcsa:endpoints.vcsa, nsxt:endpoints.nsxtManager, hcx:endpoints.hcxCloudManager}" -o jsonc
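Because provisioning runs for hours, a pipeline that starts the create in one job and consumes the credentials in another needs a way to block until the private cloud is ready. One possible shape is a generic polling helper; wait_until is a hypothetical name, and the exit-code convention is this sketch's own choice, not something the Azure CLI defines.

```shell
# Poll a command until it prints the expected value.
# usage: wait_until <expected> <interval_seconds> <max_attempts> <command> [args...]
# Returns 0 on match, 1 when attempts run out, 2 if the command prints "Failed".
wait_until() {
  expected=$1; interval=$2; attempts=$3
  shift 3
  n=1
  while [ "$n" -le "$attempts" ]; do
    state=$("$@" 2>/dev/null) || state=""
    echo "attempt $n/$attempts: ${state:-<no output>}" >&2
    [ "$state" = "$expected" ] && return 0
    [ "$state" = "Failed" ] && return 2
    # Sleep between attempts, but not after the last one.
    [ "$n" -lt "$attempts" ] && sleep "$interval"
    n=$(( n + 1 ))
  done
  return 1
}

# Example (checks every 5 minutes for up to 5 hours):
# wait_until Succeeded 300 60 \
#   az vmware private-cloud show -g "$RG" -n "$PRIVATE_CLOUD" \
#   --query provisioningState -o tsv
```

The same helper works for any of the state queries in this guide, since it only compares the command's printed output with the expected string.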
3. Connect AVS to your network over ExpressRoute
AVS exposes an ExpressRoute circuit of its own (Microsoft-managed). Connect it to your hub VNet’s ExpressRoute gateway, and use Global Reach to bridge it to your on-prem circuit so HCX traffic can flow datacenter-to-AVS.
# Authorize the AVS-managed ExpressRoute circuit
EXPR_ID=$(az vmware private-cloud show -g "$RG" -n "$PRIVATE_CLOUD" \
--query "circuit.expressRouteId" -o tsv)
az vmware authorization create \
--resource-group "$RG" --private-cloud "$PRIVATE_CLOUD" \
--name avs-er-auth
AUTH_KEY=$(az vmware authorization show -g "$RG" --private-cloud "$PRIVATE_CLOUD" \
--name avs-er-auth --query expressRouteAuthorizationKey -o tsv)
# Connect AVS circuit to the hub VNet's ExpressRoute gateway
az network vpn-connection create \
--name conn-avs-to-hub \
--resource-group rg-network-hub \
--vnet-gateway1 ergw-hub \
--express-route-circuit2 "$EXPR_ID" \
--authorization-key "$AUTH_KEY" \
--routing-weight 0
Then bridge on-prem to AVS with Global Reach (this is what lets the HCX Connector on-prem reach the HCX Cloud Manager in AVS):
az vmware global-reach-connection create \
--resource-group "$RG" --private-cloud "$PRIVATE_CLOUD" \
--name gr-onprem-to-avs \
--peer-express-route-circuit "/subscriptions/<sub>/resourceGroups/rg-onprem-er/providers/Microsoft.Network/expressRouteCircuits/er-onprem-circuit" \
--authorization-key "<onprem-circuit-auth-key>"
4. Deploy HCX on the AVS side and download the Connector
HCX is an AVS add-on. Enable it, which deploys the HCX Cloud Manager inside the private cloud, then download the Connector OVA to deploy on-prem.
# Enable the HCX add-on (deploys HCX Cloud Manager in AVS)
az vmware addon hcx create \
--resource-group "$RG" --private-cloud "$PRIVATE_CLOUD" \
--offer "VMware MaaS Cloud Provider (Enterprise)"
# Get the HCX Cloud Manager URL
HCX_URL=$(az vmware private-cloud show -g "$RG" -n "$PRIVATE_CLOUD" \
--query "endpoints.hcxCloudManager" -o tsv)
echo "HCX Cloud Manager: $HCX_URL"
Log into the HCX Cloud Manager UI (https://<hcx-cloud-manager>/) with the cloudadmin credentials. Under Administration → System Updates → Request Download Link, generate the HCX Connector OVA download. On the on-prem vCenter, deploy that OVA as a VM, give it a management IP and gateway on your on-prem network, and let it boot. The Connector is the on-prem half of the fabric; the Cloud Manager is the AVS half.
Generate the activation key the on-prem Connector needs from the Cloud Manager UI under Administration → Activation Keys (key type HCX Connector), and apply it during the Connector’s initial appliance configuration at https://<connector-ip>:9443.
5. Pair the sites and build the Service Mesh
This is the core of HCX: a Site Pairing links the two HCX managers, and a Service Mesh deploys the appliances (Interconnect/WAN, Network Extension, vMotion) that actually carry traffic. Most of this is driven in the Connector UI, but the work is concrete and ordered.
In the on-prem HCX Connector (https://<connector-ip>:9443 for appliance config, then the plugin in on-prem vCenter for operations):
- Site Pairing → Connect to Remote Site: enter the AVS HCX Cloud Manager URL and cloudadmin credentials. A successful pairing shows the AVS site as connected.
- Compute Profile (create one on each side): select the resource pool, datastore, and deployment networks the HCX appliances will use. On AVS the deployment is largely pre-baked; on-prem you choose the cluster and the management/uplink/vMotion/replication networks.
- Network Profile: define the IP pools HCX appliances draw from for management, uplink, vMotion, and vSphere Replication — on-prem these are your network’s ranges; misallocating here is the #1 mesh failure.
- Service Mesh → Create: pick the paired sites and the two compute profiles, then enable the services you need:
  - HCX Interconnect (IX): the WAN-optimized, encrypted bulk transport.
  - HCX Network Extension (NE): stretches on-prem L2 VLANs into AVS NSX-T segments so VMs keep their IP.
  - HCX vMotion: the live-migration path.
  - WAN Optimization: dedup/compression over the transport.
After Finish, watch the Service Mesh → Appliances view until every appliance reports a green Tunnel Status: Up. No green tunnels means no migrations — debug here before going further (almost always a Network Profile IP or a firewall rule on UDP 4500).
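Since a bad Network Profile pool is the usual culprit, it can help to sanity-check each pool before typing it into HCX. The sketch below is an illustration only: check_pool and its rules (pool inside the subnet, range in order, gateway not inside the pool) are assumptions about what "misallocated" usually means, not checks that HCX itself exposes.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  set -- $(echo "$1" | tr '.' ' ')
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Return 0 when <ip> lies inside <cidr>.
ip_in_cidr() {
  ip=$(ip_to_int "$1")
  base=$(ip_to_int "${2%/*}")
  size=$(( 1 << (32 - ${2#*/}) ))
  first=$(( base / size * size ))
  [ "$ip" -ge "$first" ] && [ "$ip" -lt $(( first + size )) ]
}

# usage: check_pool <subnet_cidr> <gateway> <pool_start> <pool_end>
# Prints "ok: N addresses" or the first problem found (and returns 1).
check_pool() {
  ip_in_cidr "$2" "$1" || { echo "gateway $2 outside $1"; return 1; }
  ip_in_cidr "$3" "$1" || { echo "pool start $3 outside $1"; return 1; }
  ip_in_cidr "$4" "$1" || { echo "pool end $4 outside $1"; return 1; }
  gw=$(ip_to_int "$2"); lo=$(ip_to_int "$3"); hi=$(ip_to_int "$4")
  [ "$lo" -le "$hi" ] || { echo "pool range $3-$4 is reversed"; return 1; }
  if [ "$gw" -ge "$lo" ] && [ "$gw" -le "$hi" ]; then
    echo "gateway $2 is inside pool $3-$4"; return 1
  fi
  echo "ok: $(( hi - lo + 1 )) addresses"
}
```

It does not detect addresses already in use on the network, so treat a passing result as necessary rather than sufficient.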
6. Stretch the networks (Network Extension)
To migrate a VM live and keep its IP, the VLAN it lives on must be extended into AVS. For each VLAN your in-scope VMs use:
In the HCX plugin → Network Extension → Extend Networks:
- Select the source on-prem distributed port group / VLAN.
- Provide the gateway IP + prefix for that subnet (the on-prem default gateway stays authoritative until cutover).
- Submit. HCX creates a corresponding NSX-T segment in AVS bridged over the NE appliance.
You can confirm the extension from the AVS side: the HCX-created segment appears in NSX-T Manager under Networking → Segments. Plan to keep extensions only as long as the migration wave runs. A stretched L2 with a remote gateway is a latency tax (traffic trombones back to on-prem) that you retire by unstretching once the gateway moves to NSX-T.
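To work out which VLANs actually need extending, it is usually enough to count in-scope VMs per port group. The sketch below assumes a hypothetical CSV export with a header row and two columns, vm and portgroup; the file layout and the function name are illustrative and would need adapting to whatever your inventory tool produces.

```shell
# Print each distinct port group with its VM count, busiest first.
# Input on stdin: CSV with a header row and columns vm,portgroup
# (a hypothetical export format, not an HCX or vCenter file type).
portgroups_to_extend() {
  awk -F',' 'NR > 1 && $2 != "" { count[$2]++ }
             END { for (pg in count) print count[pg], pg }' |
    sort -k1,1nr -k2,2
}

# Example:
# portgroups_to_extend < inscope-vms.csv
```

Port groups with only one or two VMs are candidates for re-addressing instead of a stretch, which keeps the number of NE appliances down.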
7. Run a pilot migration (bulk), then live vMotion
Always validate with a non-production VM first using a Bulk migration (offline-style, restartable, switches over at a scheduled time), then move production with vMotion (live, zero downtime).
In the HCX plugin → Migration → Migrate:
- Select the source VMs (start with one pilot VM).
- Set the destination: the AVS resource pool, datastore (vsanDatastore), folder, and the stretched network (the NSX-T segment from step 6) so the IP is preserved.
- Choose the migration type:
  - Bulk Migration for the pilot and for large batches: replicates in the background, then cuts over (a brief reboot) at a switchover window. Highest throughput per wave.
  - HCX vMotion for live, zero-downtime moves of production VMs one at a time.
  - Replication Assisted vMotion (RAV) when you want both batch scale and zero downtime for a production wave.
- Validate (HCX runs pre-checks), then start. Track each VM’s progress under Migration → Tracking.
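Once the pilot passes, the remaining VMs have to be divided into waves. A minimal way to do that from a plain list of VM names is sketched below; plan_waves is a hypothetical helper, and it splits purely by count, so grouping by application or by port group is left to how you order the input.

```shell
# Split VM names (one per line on stdin) into numbered waves.
# usage: plan_waves <vms_per_wave>   -> prints "wave<TAB>vm" lines
plan_waves() {
  case "$1" in
    ''|*[!0-9]*|0) echo "usage: plan_waves <positive integer>" >&2; return 1 ;;
  esac
  # NF skips blank lines; n counts the VMs seen so far.
  awk -v size="$1" 'NF { printf "%d\t%s\n", int(n / size) + 1, $0; n++ }'
}

# Example:
# plan_waves 25 < inscope-vms.txt > waves.tsv
```

Keeping the wave assignment in a file gives the change request something concrete to reference.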
To migrate at scale repeatably, drive HCX via its REST API from your pipeline (Jenkins or GitHub Actions) so each wave is a reviewed, logged job rather than UI clicks. A skeleton call to enumerate migratable VMs:
# Authenticate to HCX and list VMs (token used for subsequent migration calls)
# HCX_URL from step 4 is already a full URL (https://<ip>/), so drop the trailing
# slash instead of prefixing another scheme. -k skips TLS verification; remove it
# once the appliance certificate is trusted.
HCX_BASE="${HCX_URL%/}"
HCX_TOKEN=$(curl -sk -X POST "${HCX_BASE}/hybridity/api/sessions" \
-H "Content-Type: application/json" \
-d '{"authData":{"username":"cloudadmin@vsphere.local","password":"<pwd>"}}' \
-D - -o /dev/null | awk '/x-hm-authorization/{print $2}' | tr -d '\r')
curl -sk -X POST "${HCX_BASE}/hybridity/api/service/inventory/virtualmachines" \
-H "x-hm-authorization: ${HCX_TOKEN}" \
-H "Content-Type: application/json" \
-d '{"filter":{"cloud":{"local":true}}}' | jq '.data.items[].name'
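In a pipeline it is worth failing fast when the login did not return a token, rather than sending the next call with an empty header. The awk filter in the snippet above matches the header name case-sensitively; a slightly more defensive variant, sketched here with the hypothetical name extract_hcx_token, matches in any case and returns non-zero when the header is missing.

```shell
# Read raw HTTP response headers on stdin and print the x-hm-authorization value.
# The header name is matched case-insensitively; returns 1 if it is absent.
extract_hcx_token() {
  awk 'BEGIN { rc = 1 }
       tolower($1) == "x-hm-authorization:" {
         sub(/\r$/, "", $2)   # strip the CR that ends each HTTP header line
         print $2
         rc = 0
         exit
       }
       END { exit rc }'
}

# Example: pipe the header dump (curl ... -D - -o /dev/null) into the function and
# abort the job if it fails:
# HCX_TOKEN=$(curl ... -D - -o /dev/null | extract_hcx_token) || exit 1
```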
Use Terraform to manage everything around the migration — the AVS private cloud, ExpressRoute connections, hub VNet, and post-landing AVS NSX-T segments and DHCP — so the target estate is reproducible and code-reviewed; use Ansible for in-guest day-2 config of the migrated VMs (agents, domain join, patch baselines) once they are running in AVS.
8. Wire up identity, security, and observability on the landing zone
The migrated VMs are now in Azure’s blast radius — bring the enterprise controls before, not after, production traffic lands.
- Administrative access to vCenter/NSX-T and the jump hosts is gated through Entra ID (federated from Okta if Okta is your workforce IdP), with Conditional Access and PIM on the jump-host roles — no standing admin, no shared cloudadmin passwords floating in chat.
- Secrets the migration tooling and migrated apps need — the HCX cloudadmin password, vCenter service accounts, app DB strings — live in HashiCorp Vault, leased dynamically and pulled at runtime, instead of being baked into pipeline variables or guest config.
- CrowdStrike Falcon sensors are pushed to the migrated guest OSes (via the Ansible day-2 playbook) for runtime threat detection on the workloads now sitting in Azure, feeding the SOC.
- Wiz (with Wiz Code scanning the Terraform that builds the landing zone) provides cloud posture and attack-path analysis across the AVS-adjacent Azure resources — the hub VNet, the ExpressRoute, the storage — flagging any drift to public exposure.
- Dynatrace (or Datadog) OneAgent goes on the migrated VMs for application and infrastructure observability, so you can compare p95 latency on-prem vs. post-migration and prove the move was non-regressive.
- A migration wave is gated by a ServiceNow change request, and any HCX appliance tunnel-down or failed migration auto-raises a ServiceNow incident so there is a ticket, not just a log line.
- Public-facing apps that move keep their Akamai edge (WAF, TLS, anycast) — repoint the origin from the on-prem IP to the AVS-landed IP at cutover, which is seamless because HCX preserved the IP on the stretched segment.
9. Cutover: move the gateway and unstretch
A stretched network is a temporary state. Once all VMs on a segment are in AVS, migrate the gateway to NSX-T and unstretch so traffic stops tromboning to on-prem.
In the HCX Network Extension view, select the extended segment and choose Migrate Default Gateway / Unextend (HCX’s NE with Mobility-Optimized Networking can pre-stage this). Confirm the NSX-T segment in AVS now hosts the gateway and routes egress through the AVS-side path. Update DNS/routes as needed, then retire the NE appliance for that segment.
Validation
Confirm each layer before declaring a wave done:
# 1. Private cloud is healthy and host count matches the order
az vmware private-cloud show -g "$RG" -n "$PRIVATE_CLOUD" \
--query "{state:provisioningState, hosts:managementCluster.clusterSize, sku:sku.name}" -o jsonc
# 2. ExpressRoute connection is up
az network vpn-connection show -n conn-avs-to-hub -g rg-network-hub \
--query "connectionStatus" -o tsv # expect: Connected
# 3. Global Reach (on-prem <-> AVS) is provisioned
az vmware global-reach-connection show -g "$RG" \
--private-cloud "$PRIVATE_CLOUD" --name gr-onprem-to-avs \
--query "provisioningState" -o tsv
Then in the HCX UI / vCenter:
- Service Mesh → Appliances: every IX, NE, and vMotion appliance shows Tunnel Up (green).
- Migration → Tracking: target VMs show Migration complete; open each in AVS vCenter and confirm it is powered on, has its original IP, and answers an in-guest ping / app health check.
- From a jump host on the hub VNet, ping and hit the app endpoint of a migrated VM by its preserved IP; this proves the L2 extension worked.
- After cutover (step 9), traceroute from a migrated VM shows egress through the AVS NSX-T gateway, not back to on-prem.
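The post-cutover traceroute check can be scripted so it runs for every VM in a wave. The sketch below assumes Linux guests and the numeric output format of traceroute -n; Windows tracert output is laid out differently and would need its own parser. The function names are illustrative.

```shell
# Print the first-hop address from `traceroute -n` output read on stdin.
first_hop() {
  awk '$1 == "1" { print $2; exit }'
}

# Return 0 when the first hop equals the expected gateway address.
# usage: traceroute -n <destination> | egress_via <expected_gateway_ip>
egress_via() {
  hop=$(first_hop)
  [ "$hop" = "$1" ] || {
    echo "first hop is ${hop:-unknown}, expected $1" >&2
    return 1
  }
}
```

Pass the NSX-T segment gateway address as the expected value; a first hop that still shows the on-prem gateway means the segment has not been cut over.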
Rollback / teardown
HCX migrations are reversible before you unstretch, which is exactly why you migrate gateway-last.
- Failed or suspect migration (pre-cutover): what remains on-prem depends on the migration type. Bulk Migration leaves the source VM on-prem, powered off, so you can power off the AVS copy and power the original back on; the stretched segment means clients reconnect to the same IP. vMotion and RAV relocate the VM rather than copying it, so rollback there is a reverse migration from AVS back to on-prem, started from the HCX Migration view. Do not delete retained source VMs until the wave is validated and the gateway is moved.
- Roll back a network extension: Network Extension → Unextend removes the NSX-T segment bridge; the on-prem VLAN is untouched.
- Tear down the Service Mesh: Interconnect → Service Mesh → Delete removes all deployed appliances cleanly.
- Decommission AVS entirely (last, irreversible — destroys vSAN data):
# Remove Global Reach, then the ER connection, then the private cloud
az vmware global-reach-connection delete -g "$RG" \
--private-cloud "$PRIVATE_CLOUD" --name gr-onprem-to-avs --yes
az network vpn-connection delete -n conn-avs-to-hub -g rg-network-hub
az vmware private-cloud delete -g "$RG" -n "$PRIVATE_CLOUD" --yes
Deleting the private cloud destroys the vSAN datastore and every VM on it — only run it once workloads are confirmed migrated off AVS or no longer needed.
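Given that the delete is irreversible, some teams wrap it in a guard that makes the operator retype the resource name. A minimal sketch of such a guard follows; confirm_by_name is a hypothetical helper, not an Azure CLI feature, and it complements rather than replaces the --yes flag.

```shell
# Ask the operator to retype the resource name; return 0 only on an exact match.
# usage: confirm_by_name <expected_name>   (reads one line from stdin)
confirm_by_name() {
  printf 'Type "%s" to confirm deletion: ' "$1" >&2
  IFS= read -r answer || true
  [ "$answer" = "$1" ]
}

# Example:
# confirm_by_name "$PRIVATE_CLOUD" &&
#   az vmware private-cloud delete -g "$RG" -n "$PRIVATE_CLOUD" --yes
```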
Common pitfalls
- CIDR overlap. The --network-block /22 overlapping on-prem or a peered VNet breaks routing in ways that look like random connectivity loss. Reserve it exclusively and check it against IPAM before provisioning.
- Quota not approved in time. Host quota is a support-ticket gate, not self-service. A team that starts provisioning on day one of the window and discovers a multi-day quota wait has burned a week. Request it first.
- HCX tunnels down. A red tunnel is almost always a wrong Network Profile IP pool on-prem or a firewall blocking UDP 4500 / TCP 443 between the Connector and Cloud Manager. Fix the mesh before touching migrations.
- Forgetting Global Reach. The AVS ExpressRoute connects to your VNet, but on-prem-to-AVS HCX traffic needs Global Reach to bridge the two circuits. Without it, site pairing fails and engineers chase phantom DNS issues.
- Leaving networks stretched forever. A permanent stretched L2 with the gateway still on-prem trombones all egress back to the datacenter you are trying to vacate. Plan the unstretch as part of each wave’s exit criteria.
- Migrating to the wrong network at cutover. If you target a non-stretched AVS segment, the VM gets a new IP and connections drop. Always target the stretched NSX-T segment until the gateway is moved.
- Disk/throughput planning. Bulk migration is bandwidth-hungry; sizing waves to your ExpressRoute capacity (and using WAN Optimization) keeps a migration from saturating the link other workloads share.
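The throughput pitfall comes down to simple arithmetic that is worth running before a wave is scheduled. The sketch below is a rough first-order estimate: it assumes decimal units (1 GB = 8000 Mbit) and a fixed share of the circuit reserved for migration, and it ignores protocol overhead, WAN Optimization gains, and the data that changes during replication. The function name and parameters are illustrative.

```shell
# Estimate hours to move <data_gb> over a <link_mbps> circuit when the
# migration may use <share_pct> percent of it.
# usage: transfer_hours <data_gb> <link_mbps> <share_pct>
transfer_hours() {
  awk -v gb="$1" -v mbps="$2" -v pct="$3" 'BEGIN {
    if (mbps <= 0 || pct <= 0) {
      print "link_mbps and share_pct must be positive" > "/dev/stderr"
      exit 1
    }
    # seconds = megabits to move / usable megabits per second
    printf "%.1f\n", (gb * 8000) / (mbps * pct / 100) / 3600
  }'
}
```

For example, transfer_hours 4500 1000 50 prints 20.0: a 4.5 TB wave on a 1 Gbps circuit, with half the link reserved for other traffic, needs roughly 20 hours of replication before switchover.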
Security notes
AVS is reached over private connectivity only (--internet Disabled, ExpressRoute) — there is no public data-plane surface by default, and that is the posture to keep. Rotate the auto-generated cloudadmin / NSX-T credentials immediately after handoff and store them in HashiCorp Vault, never in pipeline variables. Gate all human admin access to vCenter/NSX-T through Entra ID (federated from Okta) with Conditional Access and just-in-time elevation. Put CrowdStrike Falcon on the migrated guests for runtime detection, and run Wiz / Wiz Code continuously over the landing zone and its Terraform so any drift to public exposure or an over-broad NSG is caught before it is exploited. HCX transport is encrypted end to end, but the management plane is yours to lock down.
Cost notes
AVS bills per dedicated host (3-host cluster minimum), so the dominant lever is host count and SKU: right-size the cluster to actual vSAN and CPU/RAM demand rather than over-provisioning for a peak that the on-prem estate never hit, and commit to 1- or 3-year reserved instances for the steady baseline (substantial discount vs. pay-as-you-go) once the migration settles. ExpressRoute and egress are line items to budget — a migration wave moves a lot of data, so schedule heavy bulk transfers within your circuit’s committed bandwidth. Retire the HCX appliances and any temporary extra hosts once migrations finish; HCX is a migration tool, not a permanent tax. Track per-cluster spend and pipe it to Dynatrace / Datadog so the team sees the cost of an over-sized cluster the same week, not on the next invoice. Longer term, the AVS landing is a staging post: workloads that can re-platform to native Azure IaaS/PaaS later will cost less than dedicated VMware hosts — plan the modernization backlog so AVS is the bridge, not the destination, for everything that does not genuinely need vSphere.