Hyper-V Live Migration and Replica for Zero-Downtime VM Mobility

Two Hyper-V mobility features get conflated constantly, and the conflation costs outages. Live migration moves a running virtual machine from one host to another with no perceptible downtime — a planned, online operation you lean on for patching, load balancing, and evacuating a host before maintenance. Hyper-V Replica asynchronously ships a VM’s changes to a second copy, usually in another site, so you have something to recover to when the primary is gone — disaster recovery, with a recovery point objective (RPO) of seconds to minutes, never zero. They are not substitutes: live migration keeps a healthy VM available across a planned event; Replica gives you a warm standby for an unplanned disaster. You want both, configured correctly, so the first time you lean on either is not the day you discover it was never set up right.

The confusion runs deeper than the two-line summary, because “live migration” itself is three operations depending on where the VM’s storage lives, authenticates two different ways (one of which fails silently from a remote session), moves memory over three transports with wildly different throughput, and behaves completely differently inside a failover cluster than between standalone hosts. Layer on quick migration (the pre-2012 cousin that saves and restores state, with real downtime), storage migration (moving the VHDX while the VM keeps running), and shared-nothing live migration (moving compute and storage together with zero shared infrastructure), and you have a topic where the words matter and the mistakes are expensive.

This article is the practitioner’s map of that territory, written the way an infrastructure engineer with 22 years in the field reasons about it: as decisions with trade-offs, not features to enable. The reference environment: two standalone hosts, hv01 (10.10.20.11) and hv02 (10.10.20.12), each with a dedicated 25 GbE migration NIC on 10.10.99.0/24; a two-node failover cluster (hv-clu01/hv-clu02) backed by Cluster Shared Volumes; and a DR host hv-dr01 (10.20.20.11) in a second site across a routed WAN. All are domain-joined to contoso.local. Commands target Windows Server 2022; cmdlets are identical on 2019 and 2025 unless noted. You’ll finish able to pick the right migration type, configure delegation without the number-one silent failure, tier replication like the bandwidth decision it is, and run planned, unplanned, and test failover without losing data to the wrong procedure.

What problem this solves

Servers need maintenance, and workloads need to move. Firmware updates, host patching, a failing power supply, a rebalancing exercise because one host is at 90% memory while its neighbour idles — every one of these needs the VMs off that host, and every one used to mean downtime. Before live migration, moving a running VM meant saving its state to disk (a pause users felt) and restoring it elsewhere. Live migration erased that pause: the guest keeps executing, its memory is copied underneath it, and a sub-second cutover switches execution to the destination. That single capability lets you patch a host in the middle of the business day without a change window and without a dropped session.

But mobility for a planned event solves nothing for an unplanned one. When a site loses power, a SAN fails, or ransomware encrypts a volume, live migration is useless: the source is gone, and there is nothing running to migrate. That is the gap Hyper-V Replica fills: a continuously updated copy in a different failure domain, ready to boot with bounded data loss. Teams that conflate the two discover the gap at the worst moment: “we have live migration, so we’re covered,” right up until the data center floods and there is no second copy anywhere.

Who hits this: anyone running Hyper-V beyond a single host. It bites hardest on teams with standalone hosts and no shared storage (needing shared-nothing migration and constrained delegation, the two most misconfigured pieces), estates with a thin or shared inter-site WAN (where Replica frequency and resync become scheduled bandwidth decisions), and anyone who set up Replica once, watched the initial copy succeed, and never ran a test failover. The cost of getting it wrong is not subtle: a stalled migration that pins a VM half-moved, a delegation misconfiguration that blocks every remote migration, or a Replica that’s been Critical for three weeks and is useless the day you need it.

Learning objectives

By the end of this article you can:

Distinguish the three live migration types (shared-storage, shared-nothing, storage-only) and the two migration kinds (quick vs live), and pick the right one for a given storage topology and downtime tolerance.
Explain the memory-copy phases of a live migration — setup, iterative pre-copy of dirty pages, the brownout/blackout final cutover — and reason about why a busy VM converges slowly.
Choose between CredSSP and Kerberos constrained delegation correctly, configure the two required delegated services (Microsoft Virtual System Migration Service and cifs), and avoid the ticket-cache trap that makes delegation “not work.”
Select the right migration transport — TCP/IP, Compression, or SMB (with SMB Direct/RDMA and SMB Multichannel) — for your NICs, and confirm RDMA is actually engaging rather than silently falling back.
Configure live migration inside a failover cluster with Cluster Shared Volumes (CSV), and understand why clustered live migration only moves memory while shared-nothing moves memory and disk.
Stand up Hyper-V Replica end to end — authorization entries, transport (Kerberos/HTTP vs certificate/HTTPS), replication frequency, recovery-point history, and VSS application-consistent snapshots — and tier it by workload.
Execute test, planned, and unplanned failover correctly, know exactly how much data each one loses, and reverse replication for failback.
Diagnose the common migration and replication failures — delegation errors, processor-compatibility mismatches, Critical replica health, resync storms — from the exact event IDs and cmdlet output.

Prerequisites & where this fits

You should already run Hyper-V comfortably: create VMs, understand VHDX vs pass-through disks, know a virtual switch connects VMs to a NIC, and be fluent enough in PowerShell to adapt an Invoke-Command block. You should know what Active Directory is, that hosts and users are security principals with accounts in it, and roughly what Kerberos does (issues tickets that prove identity). Familiarity with SMB 3, RDMA-capable NICs (RoCE/iWARP), and failover-clustering basics (quorum, cluster network) helps; where they matter, this article defines them.

This sits in the Windows Server compute and availability track. Live migration and clustering are two halves of a whole, so this pairs tightly with Windows Failover Clustering and Storage Spaces Direct: A Production Build, which builds the cluster and CSV storage this article migrates across, and Patching Failover Clusters with Cluster-Aware Updating and Stretch Clusters via Storage Replica, which uses live migration to drain nodes during patching. Kerberos delegation assumes a healthy directory — if AD replication or FSMO roles are shaky, delegation edits won’t propagate, so Active Directory Replication and FSMO Troubleshooting with repadmin and dcdiag is upstream of it. For enforcing migration settings across a fleet as code, see Configuration Management for Windows Server with PowerShell DSC and Ansible.

A map of who owns what during a migration or DR event, so you escalate fast:

Layer	What lives here	Who usually owns it	Failure classes it can cause
Active Directory	Computer accounts, delegation, Kerberos	Identity / AD team	Delegation not set → remote migration fails
Migration network	Dedicated subnet, DCB/PFC, RDMA	Network team	Wrong NIC used → saturates prod; no RDMA
Hyper-V host config	Auth type, performance option, limits	Virtualization team	CredSSP-only, defaults left, wrong transport
Storage (CSV / SMB / local)	Where the VHDX lives	Storage + virtualization	Shared-nothing needed vs memory-only
Failover cluster	Quorum, CSV, cluster network	Virtualization team	Live migration network priority, quorum loss
WAN / inter-site link	Bandwidth between primary and DR	Network team	Replica frequency/resync collisions
Replica relationship	Frequency, recovery points, auth	Virtualization + DR	`Critical` health, resync storms, no failover test

Core concepts

Six mental models make every later decision obvious.

Live migration copies memory while the VM runs; the guest never stops until a sub-second final switch. The engine does not pause the VM and copy it — that would be quick migration. It copies the running VM’s memory iteratively to the destination, tracking pages the guest dirties during the copy and re-sending them, until the remaining dirty set is small enough to transfer in a blink. Only then does it briefly suspend the VM (the blackout), copy the last pages and CPU/device state, and resume on the destination. The pause is measured in milliseconds — below the threshold that drops TCP connections or RDP sessions. A very busy VM that dirties memory faster than the link can ship it converges slowly or not at all; that is the fundamental tension of the operation.

“Live migration” is three operations, distinguished by where the storage sits. If the VHDX is on storage both hosts can see (a CSV in a cluster, or an SMB 3 file share), only memory and device state move — the fast path. If the VHDX is on the source host’s local disk, both memory and disk must move over the network — shared-nothing live migration, slower but requiring zero shared infrastructure. Moving only the disk to a new volume while the VM keeps running on the same host is storage migration. Naming the storage topology tells you which operation you’re running and how long it will take.

Quick migration is not live migration. The pre-Windows-Server-2012 mechanism, still available in clusters, saves the VM’s state to disk, moves ownership, and restores it — so the guest is genuinely offline for the save-and-restore window (seconds to minutes depending on RAM). It is not zero-downtime. It survives as a fallback for cases live migration cannot run, most notably when source and destination CPUs are incompatible in ways processor compatibility mode cannot bridge.

Authentication is delegated, and delegation has two models with a sharp trade-off. The source host must authenticate to the destination on your behalf and pull files across. CredSSP delegates your interactive credentials but only one hop — so you must be logged on at the source host to start the move; you cannot kick it off from your workstation. Kerberos constrained delegation configures the hosts’ computer accounts in AD so the source is trusted to delegate specific services to specific destinations, letting you initiate migrations remotely from Hyper-V Manager, VMM, or a remote session. For any managed estate, Kerberos is the answer — CredSSP’s only virtue is needing no AD configuration.

Replica is asynchronous, log-shipped, and independent of live migration. It tracks writes to a VM’s disks in a Hyper-V Replication Log (HRL) and ships that log to the replica server on a fixed cadence (30, 300, or 900 seconds). Because it ships a log of changes rather than mirroring every write synchronously, it tolerates a slow or intermittent WAN — the trade-off is that the replica is always slightly behind, so a failover loses up to one replication interval of data. It uses its own listener and authorization model, separate from live migration’s networking. Replica gives you a recoverable copy, not high availability.

Recovery points are a time machine, not just a mirror. By default the replica holds only the latest state. Enable recovery history and Replica keeps additional crash-consistent snapshots (hourly, up to 24) so you can fail over to a point before a corruption or ransomware event. Layer VSS (application-consistent) snapshots on top and VSS-aware guests (SQL Server, Exchange) recover to a transactionally clean state. Every extra recovery point costs storage and processing on the replica side — a granularity-vs-overhead dial.

The vocabulary in one table

Pin down every moving part before the deep sections. The glossary at the end repeats these for lookup; this is the mental model side by side.

Term	One-line definition	Where it lives	Why it matters
Live migration	Move a running VM with ~0 downtime	Between hosts	The planned-maintenance workhorse
Quick migration	Save/move/restore a VM (has downtime)	Clusters	Fallback when live can’t run
Shared-nothing	Live-migrate memory + disk, no shared storage	Between standalone hosts	Zero shared infra required
Storage migration	Move only the VHDX, VM stays put	Same host	Rebalance LUNs, escape a failing disk
Blackout	The brief final VM pause during cutover	Migration engine	Must stay sub-second
Brownout	The pre-copy phase; VM runs, memory copies	Migration engine	Where most of the time is spent
CredSSP	One-hop credential delegation	Source host	Requires interactive logon at source
Kerberos delegation	Computer-account delegation in AD	Active Directory	Enables remote-initiated migration
CSV	Cluster Shared Volume — concurrent multi-node access	Failover cluster	All nodes see the VHDX → memory-only migration
SMB Direct	RDMA transport for SMB 3	RDMA NICs	Fastest memory copy, offloads CPU
Hyper-V Replica	Async change-shipping to a second copy	Primary → replica host	Disaster recovery, not HA
HRL	Hyper-V Replication Log (tracked writes)	Primary host	The delta shipped each interval
Recovery point	A crash-consistent replica snapshot	Replica host	Fail over to before an event
RPO	Max data loss on failover	Replica relationship	One replication interval, worst case

The migration taxonomy: three types, two kinds

Get the words right and the rest of the article follows. Hyper-V has three live migration types (by storage location) and two migration kinds (live vs quick). They compose.

Type	Storage situation	What moves	Requires	Typical use
Shared-storage live	VHDX on CSV or SMB 3 both hosts see	Memory + device state only	Shared storage; cluster or SOFS	Fast host evacuation in a cluster
Shared-nothing live	VHDX on the source host’s local disk	Memory and storage over the network	Only network + delegation	Moving between standalone hosts
Storage migration	VM stays on the same host	Only the VHDX files, to a new path	Nothing special	Rebalance LUNs, move off a failing disk

The two kinds — how the move actually happens:

Kind	Mechanism	Downtime	Availability	When it runs
Live migration	Iterative memory pre-copy, sub-second cutover	~0 (milliseconds)	Guest stays up	Default for planned moves
Quick migration	Save state → move → restore state	Seconds to minutes	Guest offline during save/restore	Cluster fallback; CPU incompatibility

Shared-nothing live migration is the headline capability of the standalone world: it relocates a running VM between two hosts that share nothing — no SAN, no cluster, no common storage — copying disk and live memory in one operation. It is slower than memory-only migration because it physically moves the VHDX, but it needs zero shared infrastructure, which is exactly why it is the workhorse for standalone hosts. All three live types are online; only quick migration incurs a felt pause.

A decision table for picking the right operation:

If your situation is…	Storage topology	Use	Because
Evacuating a clustered host for patching	VHDX on CSV	Shared-storage live	Only memory moves — fastest
Moving a VM between two standalone hosts	VHDX on local disk	Shared-nothing live	No shared storage exists
Moving a VM off a failing local disk, same host	Local disk	Storage migration	Only the disk needs to move
Rebalancing a VM to a different SMB share	SMB 3	Storage migration (or shared-nothing)	Move the disk; keep it running
Source and destination CPUs are incompatible	Any	Quick migration	Live migration can’t bridge the CPU gap
Consolidating from many hosts onto a cluster	Local → CSV	Shared-nothing live (into the cluster)	Moves disk + memory onto shared storage

Inside a live migration: the memory-copy phases

Understanding why a migration is slow, or why a busy VM won’t converge, requires knowing the phases. A live migration runs in five distinct stages.

Phase	What happens	VM state	Dominant cost
1. Setup	Destination validates compatibility, allocates memory, creates the VM shell	Running normally	Negligible (validation)
2. Initial memory copy	The full working set is copied to the destination while the VM runs	Running (brownout)	Network bandwidth
3. Iterative pre-copy	Pages the guest dirtied during the copy are re-sent, repeatedly, shrinking the dirty set	Running (brownout)	Bandwidth vs dirty rate
4. Blackout / final copy	VM is briefly suspended; the last dirty pages + CPU/device state transfer	Paused (milliseconds)	Latency of the final flush
5. Cleanup	VM resumes on the destination; source releases resources; the switch relearns the MAC	Running on destination	Negligible

The interesting physics is in phases 3 and 4. During the brownout the VM is still executing and therefore still dirtying memory pages. The engine copies the working set, then the pages that changed during that copy, then the pages that changed during that copy, and so on — each round should transfer less than the last. This converges only if the link ships dirty pages faster than the guest produces them. A VM writing heavily to memory (a busy cache, a database under load) on a slow link can dirty pages as fast as they are sent; the dirty set never shrinks enough for a clean blackout. Hyper-V has a stop-and-copy fallback, but the practical answer is a faster migration network — the entire reason the reference build puts migration on dedicated 25 GbE.

The blackout (phase 4) is the only moment the guest is genuinely paused, and it must stay sub-second, well inside the window after which a TCP connection or RDP session would drop. A well-configured migration keeps it in the low tens of milliseconds; when you hear “no downtime,” this millisecond blackout is the technically precise version of that claim.

A few consequences follow directly. Bigger RAM means a longer migration, because phase 2 copies the whole working set — a 256 GB VM takes far longer than a 4 GB one. Busy VMs converge slowly, because phase 3 fights the dirty rate, so migrate memory-heavy VMs on the fastest link. A fast network shortens both brownout and blackout (less pre-copy, smaller final flush), which is why 25 GbE RDMA beats 1 GbE dramatically. But the blackout itself is a fixed ceiling, not RAM-scaled — only the residual dirty set moves in it — so downtime stays sub-second regardless of VM size. For shared-nothing, budget the VHDX copy time separately, on top of the memory copy.
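The convergence argument can be made concrete with a toy model. What follows is a minimal sketch in plain PowerShell, not Hyper-V's actual algorithm: it assumes a constant dirty rate and a constant usable throughput, and the function name Get-PreCopyEstimate, its parameters, the 0.05 GB blackout threshold, and the sample rates are all hypothetical values chosen for illustration.

```powershell
# Toy model of iterative pre-copy. Every figure here is an illustrative assumption.
function Get-PreCopyEstimate {
    param(
        [double]$WorkingSetGB,             # memory copied in the first pass
        [double]$LinkGBps,                 # usable migration throughput (gigabytes/second)
        [double]$DirtyGBps,                # rate at which the guest dirties memory
        [double]$BlackoutTargetGB = 0.05,  # residual small enough for the final flush
        [int]$MaxRounds = 10               # give up after this many pre-copy rounds
    )
    $remaining = $WorkingSetGB
    $elapsed   = 0.0
    $round     = 0
    while ($remaining -gt $BlackoutTargetGB -and $round -lt $MaxRounds) {
        $round++
        $seconds   = $remaining / $LinkGBps   # time to ship the current dirty set
        $elapsed  += $seconds
        $remaining = $DirtyGBps * $seconds    # pages dirtied while that copy ran
    }
    [pscustomobject]@{
        Rounds          = $round
        BrownoutSeconds = [math]::Round($elapsed, 1)
        ResidualGB      = [math]::Round($remaining, 3)
        Converged       = ($remaining -le $BlackoutTargetGB)
    }
}

# Fast link: each round ships less than the one before, so the dirty set shrinks.
Get-PreCopyEstimate -WorkingSetGB 64 -LinkGBps 2.5 -DirtyGBps 0.25

# Thin link: the guest dirties memory faster than the link ships it, so it never converges.
Get-PreCopyEstimate -WorkingSetGB 64 -LinkGBps 0.11 -DirtyGBps 0.25
```

In this model each round shrinks the dirty set by the ratio of dirty rate to link rate, so the loop converges only when that ratio is below one. That is the same condition the prose states: the link must ship pages faster than the guest dirties them.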

Authentication: CredSSP vs Kerberos constrained delegation

This is the decision that trips everyone up, and the misconfiguration that generates the most “it just fails” tickets. Live migration needs the source host to authenticate to the destination on your behalf and pull files across SMB. There are two ways to grant that.

Aspect	CredSSP	Kerberos constrained delegation
What’s delegated	Your interactive user credentials	The host computer accounts, per-service
Where you must be logged on	Interactively on the source host	Anywhere — workstation, VMM, remote session
AD configuration	None	`msDS-AllowedToDelegateTo` + delegation flag
Remote initiation (from your desktop)	No — it’s a double hop CredSSP can’t do	Yes — the whole point
Hyper-V Manager / VMM can drive it	Only from the console	Yes, remotely
Security posture	Credentials land on the destination	No credential exposure; scoped to named services
Setup effort	Trivial	Moderate (one-time AD edit per direction)
When to use	Quick lab, one-off from the console	Any managed estate — the default

The CredSSP trap is the “double hop.” When you run Move-VM -ComputerName hv01 -DestinationHost hv02 from your workstation, your session authenticates to hv01 (hop one), and then hv01 must authenticate to hv02 on your behalf (hop two). CredSSP delegates exactly one hop, so hop two fails with access-denied. It only works when you’re logged on at hv01 itself, collapsing it to a single hop. Kerberos constrained delegation solves this by trusting the computer account to delegate, independent of where you initiated the command.

Set the authentication type on every host to Kerberos:

$hosts = 'hv01','hv02','hv-dr01'
Invoke-Command -ComputerName $hosts -ScriptBlock {
    Enable-VMMigration
    Set-VMHost -VirtualMachineMigrationAuthenticationType Kerberos
}

Now configure constrained delegation in AD. Live migration requires delegating two services from each source host to each destination it might migrate to: Microsoft Virtual System Migration Service (the migration control channel) and cifs (SMB file access on the destination, so VM files can be transferred). Delegation is directional: configuring hv01 → hv02 does not configure hv02 → hv01, so bidirectional migration between the two hosts must be set up both ways.

Delegated service SPN	Purpose	Required for	Direction
`Microsoft Virtual System Migration Service/<dest FQDN>`	The live-migration control channel	Every live migration	Source → each destination
`Microsoft Virtual System Migration Service/<dest short name>`	Same, NetBIOS form	Robustness (both name forms)	Source → each destination
`cifs/<dest FQDN>`	SMB file access to pull the VHDX / config	Shared-nothing & storage moves	Source → each destination
`cifs/<dest short name>`	Same, NetBIOS form	Robustness	Source → each destination

Configure it with the ActiveDirectory module — delegation is set on the source computer account, targeting the destination’s SPNs:

Import-Module ActiveDirectory

function Add-MigrationDelegation {
    param($SourceHost, $DestHost)
    $dest = Get-ADComputer $DestHost
    $spns = @(
        "Microsoft Virtual System Migration Service/$($dest.DNSHostName)",
        "Microsoft Virtual System Migration Service/$DestHost",
        "cifs/$($dest.DNSHostName)",
        "cifs/$DestHost"
    )
    Set-ADComputer $SourceHost -Add @{ 'msDS-AllowedToDelegateTo' = $spns }
    # Constrained delegation needs the source account flagged for it:
    Get-ADComputer $SourceHost | Set-ADAccountControl -TrustedToAuthForDelegation $true
}

# Bidirectional between hv01 and hv02 (both directions, explicitly):
Add-MigrationDelegation -SourceHost 'hv01' -DestHost 'hv02'
Add-MigrationDelegation -SourceHost 'hv02' -DestHost 'hv01'

The number-one silent failure. Kerberos caches delegation data in ticket-granting tickets. After you edit msDS-AllowedToDelegateTo, the change is not effective on a host until its Kerberos ticket cache reflects the new AD state. In practice that means rebooting the affected hosts (or waiting for ticket renewal, up to ~10 hours by default) before testing. Nearly every “I configured delegation and it still fails” report is a host that hasn’t refreshed its tickets. Reboot, then test.

Verify delegation is present and correctly scoped:

# Show the delegation flag and the delegated SPN list together.
Get-ADComputer hv01 -Properties msDS-AllowedToDelegateTo, TrustedToAuthForDelegation |
    Format-List Name, TrustedToAuthForDelegation, msDS-AllowedToDelegateTo

You should see the four SPNs for hv02. An empty list means Set-ADComputer didn’t take (or the change hasn’t yet replicated to the DC you’re querying); a False TrustedToAuthForDelegation means the account flag is missing and delegation won’t fire.
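Eyeballing four SPNs per direction gets error-prone across a fleet, so the check can be scripted. The helper below is a sketch: Get-MissingMigrationSpn is a hypothetical name, not a built-in cmdlet. It only compares strings, so it runs without AD access, and the commented lines show how it might be fed from Get-ADComputer.

```powershell
# Hypothetical helper: reports which of the four expected SPNs are absent
# from a source host's msDS-AllowedToDelegateTo list.
function Get-MissingMigrationSpn {
    param(
        [string[]]$Delegated,      # contents of msDS-AllowedToDelegateTo on the source
        [string]$DestShortName,    # e.g. hv02
        [string]$DestFqdn          # e.g. hv02.contoso.local
    )
    $expected = foreach ($service in 'Microsoft Virtual System Migration Service', 'cifs') {
        "$service/$DestFqdn"
        "$service/$DestShortName"
    }
    # -notin compares case-insensitively by default.
    @($expected | Where-Object { $_ -notin $Delegated })
}

# Possible use against live AD (requires the ActiveDirectory module):
# $src = Get-ADComputer hv01 -Properties msDS-AllowedToDelegateTo
# Get-MissingMigrationSpn -Delegated $src.'msDS-AllowedToDelegateTo' `
#     -DestShortName 'hv02' -DestFqdn 'hv02.contoso.local'
```

An empty result means all four SPNs are present for that direction. Run it once per source and destination pair, because delegation is directional.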

Migration networks, SMB Direct, and performance options

By default, Hyper-V uses any available network for live migration — so migration traffic can saturate the management NIC or, worse, the production NIC carrying your VMs’ traffic, turning a “no downtime” migration into an “everything is slow” incident. Pin migration traffic to a dedicated subnet and order it explicitly.

Invoke-Command -ComputerName $hosts -ScriptBlock {
    # Stop using every network; opt in to specific subnets only.
    Set-VMHost -UseAnyNetworkForMigration $false

    # Remove the catch-all and add the dedicated migration subnet.
    Get-VMMigrationNetwork | Remove-VMMigrationNetwork -ErrorAction SilentlyContinue
    Add-VMMigrationNetwork -Subnet 10.10.99.0/24 -Priority 10
}

Then choose the performance option, which governs the transport used for the memory copy. This is the single biggest throughput lever.

Performance option	Transport	CPU cost	Best on	Requires
TCP/IP	Raw TCP, uncompressed	Low	Simple setups where you want neither	Nothing
Compression	Memory pages compressed on CPU before send	High (spends CPU)	Constrained links (1 GbE) with spare CPU	Nothing (default)
SMB	SMB 3 — enables SMB Direct (RDMA) + Multichannel	Very low with RDMA (offloaded)	Fast NICs, especially RDMA-capable	RDMA NICs for SMB Direct

Invoke-Command -ComputerName $hosts -ScriptBlock {
    # SMB  -> uses SMB 3; SMB Direct (RDMA) + SMB Multichannel if NICs support it
    # Compression -> compresses memory pages on the CPU before sending (default)
    # TCPIP -> raw TCP, no compression
    Set-VMHost -VirtualMachineMigrationPerformanceOption SMB
}

The trade-off is concrete. Compression spends CPU cycles to shrink the memory stream — a good bargain when the network is the bottleneck (1 GbE) and the host has spare cores. SMB rides SMB 3, which transparently uses SMB Direct (RDMA) when your NICs are RDMA-capable (RoCE or iWARP) and SMB Multichannel to aggregate multiple links or queues. On the reference 25 GbE RDMA hardware, SMB is dramatically faster than compression and offloads the CPU entirely — the memory copy runs through the NIC’s RDMA engine with the host CPUs barely involved. TCP/IP is the do-nothing fallback. The rule of thumb: below ~10 GbE with spare CPU, Compression buys effective bandwidth; at 10 GbE and above compression only wastes CPU, so use SMB (and if the NICs are RDMA-capable, SMB Direct is the clear winner — near wire speed with the host CPU untouched).

Live migration over SMB uses TCP port 6600 on the destination in addition to the standard SMB ports (445). Permit it on the host firewall and any inter-host ACLs on the migration subnet, or the migration will fail to establish the SMB channel.

Port	Protocol	Used by	Direction	Notes
6600	TCP	Live migration over SMB	To destination	Must be open on the migration subnet
445	TCP	SMB (file transfer)	To destination	Standard SMB; also needed
80	TCP	Hyper-V Replica (Kerberos/HTTP)	To replica server	Only if using Kerberos transport
443	TCP	Hyper-V Replica (Certificate/HTTPS)	To replica server	Only if using certificate transport
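A quick reachability probe can catch a closed port before a migration fails on it. This is a sketch rather than a complete validation: 10.10.99.12 is an assumed migration-subnet address for hv02 (the reference build does not specify one), and a successful TCP connect shows only that something is listening, not that the migration itself will authenticate.

```powershell
# Run from the source host (hv01).
# 10.10.99.12 is an ASSUMED address for hv02 on the migration subnet.
foreach ($port in 6600, 445) {
    Test-NetConnection -ComputerName 10.10.99.12 -Port $port |
        Select-Object ComputerName, RemotePort, TcpTestSucceeded
}
```

A False in TcpTestSucceeded for either port points at the host firewall or an inter-host ACL on the migration subnet.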

Confirm SMB Multichannel and RDMA are actually engaging once a migration runs — a silent fallback to plain SMB is a common, invisible performance loss:

# Run on the source host (the SMB client side) during or just after a migration.
Get-SmbMultichannelConnection -ServerName hv02 |
    Format-Table ClientIpAddress, ServerIpAddress, ClientRdmaCapable, ServerRdmaCapable

If *RdmaCapable reads False on a NIC you expected to be RDMA, you are silently on plain SMB. Check Get-NetAdapterRdma (is RDMA enabled on the adapter?) and the switch’s DCB/PFC configuration (RoCE needs lossless Ethernet via Priority Flow Control). Getting RDMA wrong is the difference between a 40-second migration and a 6-minute one.
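The adapter-side check can be sketched as follows, assuming PowerShell remoting to both hosts is available. It asks two questions: whether RDMA is enabled on each adapter, and whether the SMB client itself considers the interface RDMA-capable.

```powershell
$rdmaHosts = 'hv01', 'hv02'

# Is RDMA enabled on each adapter?
Invoke-Command -ComputerName $rdmaHosts -ScriptBlock { Get-NetAdapterRdma } |
    Format-Table PSComputerName, Name, Enabled

# Does the SMB client see the interface as RDMA-capable?
Invoke-Command -ComputerName $rdmaHosts -ScriptBlock { Get-SmbClientNetworkInterface } |
    Format-Table PSComputerName, FriendlyName, RdmaCapable, RssCapable
```

If the adapter reports Enabled but the SMB client reports RdmaCapable as False, the cause is more likely the driver or the fabric configuration than the Hyper-V settings.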

Executing and throttling migrations

With authentication, networking, and transport configured, the moves are one-liners.

A shared-nothing live migration — moves both compute and storage in one shot, between hosts that share nothing:

# Run remotely — Kerberos delegation makes this possible from your workstation.
Move-VM -Name 'app-web01' `
        -ComputerName hv01 `
        -DestinationHost hv02 `
        -IncludeStorage `
        -DestinationStoragePath 'D:\VMs\app-web01'

A pure storage migration — move the VHDX to a new volume while the VM keeps running on the same host:

Move-VMStorage -VMName 'app-db01' `
               -DestinationStoragePath 'E:\VMs\app-db01' `
               -ComputerName hv01

A memory-only live migration — when the VM is already on shared storage both hosts see (SMB 3 share or, in a cluster, a CSV):

Move-VM -Name 'app-cache01' -ComputerName hv01 -DestinationHost hv02

The key parameters, decoded:

Parameter	What it does	When to use
`-DestinationHost`	The target Hyper-V host	Every live migration between hosts
`-IncludeStorage`	Also move the VHDX (shared-nothing)	Local-disk VMs; no shared storage
`-DestinationStoragePath`	Where the VHDX lands on the destination	With `-IncludeStorage`
`-VirtualMachinePath` / `-SmartPagingFilePath`	Fine-grained placement of config/paging	When splitting files across volumes
`-Vhds`	Per-VHD destination mapping	Spreading disks across multiple volumes
`-ComputerName`	Where the command targets (the source)	Always (unless run on the source)

Throttling matters. Each host caps the number of simultaneous migrations to protect the network and disk from being overwhelmed. Defaults are conservative (2 of each); tune them to your fabric.

Invoke-Command -ComputerName $hosts -ScriptBlock {
    # MaximumVirtualMachineMigrations: concurrent live migrations
    # MaximumStorageMigrations:        concurrent storage moves
    Set-VMHost -MaximumVirtualMachineMigrations 4 `
               -MaximumStorageMigrations 2
}

Setting	Default	When to raise	When to keep low
`MaximumVirtualMachineMigrations`	2	25 GbE/RDMA — push to 4–8 for fast evacuation	1 GbE — leave at 2 or a host drain thrashes
`MaximumStorageMigrations`	2	Fast, separate volumes	Same-volume moves are disk-bound; don’t over-parallelize

On 25 GbE RDMA, push the live-migration count higher so a full evacuation completes fast. On 1 GbE, leave it low — parallelizing too many migrations over a thin link makes all of them slow and starves the guests still running. Storage migrations are disk-bound; running several against the same volume just makes them contend, so parallelize storage moves only across independent volumes.
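Because these settings are per host, drift between hosts is easy to miss. A read-only audit across the reference hosts might look like the sketch below. It changes nothing and only lists the migration-related properties that Get-VMHost exposes.

```powershell
$auditHosts = 'hv01', 'hv02', 'hv-dr01'

# Read-only: list each host's migration settings side by side.
Get-VMHost -ComputerName $auditHosts |
    Format-Table Name,
        VirtualMachineMigrationEnabled,
        VirtualMachineMigrationAuthenticationType,
        VirtualMachineMigrationPerformanceOption,
        UseAnyNetworkForMigration,
        MaximumVirtualMachineMigrations,
        MaximumStorageMigrations
```

Any host that differs from its peers in authentication type or performance option is the first place to look when migrations work in one direction but not the other.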

Processor compatibility: the migration blocker nobody expects

A live migration can fail at the setup phase for a reason that has nothing to do with networking or delegation: the destination CPU exposes a different instruction set than the source, and the running guest is using CPU features the destination lacks. The engine refuses to migrate a VM into a processor it can’t guarantee will keep running the guest correctly.

The fix is processor compatibility mode, a per-VM setting that presents the guest with a lowest-common-denominator CPU feature set — masking off the newer instructions so the VM can move between different processor generations of the same vendor.

# Must be set while the VM is OFF (it changes the CPU features exposed to the guest).
Stop-VM -Name 'app-web01' -ComputerName hv01
Set-VMProcessor -VMName 'app-web01' -ComputerName hv01 -CompatibilityForMigrationEnabled $true
Start-VM -Name 'app-web01' -ComputerName hv01

Scenario	Live migration works?	What to do
Identical CPU models	Yes	Nothing
Same vendor, different generation (e.g. Xeon Gen 2 → Gen 4)	Only with compatibility mode	Enable `CompatibilityForMigrationEnabled`
Different vendors (Intel ↔ AMD)	No — never	Cannot live-migrate; rebuild or quick-migrate offline
Compatibility mode enabled everywhere	Yes, across generations	Standard for mixed-generation clusters

The trade-off: compatibility mode hides the newest CPU instructions from the guest, costing a sliver of performance for workloads that would otherwise use them (some crypto and vector operations). In practice the cost is negligible and the mobility benefit large — in a cluster spanning CPU generations, enable it on every VM as standard. Note the hard wall: you can never live-migrate between Intel and AMD hosts. A mixed-vendor estate is separate migration domains, full stop.
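A compatibility problem can be surfaced before the move rather than at the setup phase. The sketch below uses Compare-VM, which produces a compatibility report without migrating anything. The VM name and storage path are reused from the earlier shared-nothing example, and the exact messages returned depend on the environment.

```powershell
# Dry run: ask the destination whether it could accept the VM. Nothing is moved.
$report = Compare-VM -Name 'app-web01' `
                     -ComputerName hv01 `
                     -DestinationHost hv02 `
                     -IncludeStorage `
                     -DestinationStoragePath 'D:\VMs\app-web01'

# An empty list means the check found no blockers.
$report.Incompatibilities | Format-Table MessageId, Message -Wrap
```

If a processor-related incompatibility is listed, enabling compatibility mode on the VM as shown above is the usual next step.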

Live migration inside a failover cluster

Everything so far centered on standalone hosts, where the manual configuration and the mistakes concentrate. Clustered live migration is a distinct, simpler world — worth its own section because the storage model changes what actually moves.

In a failover cluster, VM storage lives on a Cluster Shared Volume (CSV) — an NTFS/ReFS volume that all nodes read and write simultaneously. Because every node already sees the VHDX, a clustered live migration only moves memory and device state, never the disk. That makes it the fast path: no -IncludeStorage, no VHDX copy, just the iterative memory pre-copy and a sub-second cutover.

Aspect	Standalone (shared-nothing)	Clustered (CSV-backed)
What moves	Memory and the VHDX	Memory only
Storage requirement	None (disk moves over the network)	Shared CSV all nodes see
Speed	Slower (disk copy)	Fast (memory only)
Authentication	Kerberos delegation (or CredSSP)	Same options; cluster identity simplifies
Failure handling	Manual	Cluster can auto-recover on node failure
Tooling	`Move-VM`	`Move-ClusterVirtualMachineRole` or `Move-VM`
Networking	Dedicated migration subnet	Cluster network with live-migration priority

Clustered live migration is driven through the cluster role rather than the raw VM:

# Live-migrate a clustered VM role to a specific node (memory-only over the CSV).
Move-ClusterVirtualMachineRole -Name 'app-web01' `
    -Node hv-clu02 `
    -MigrationType Live `
    -Cluster hv-clu01

The cluster decides which networks live migration may use, ranked by priority — so you tell the cluster which network is the migration network rather than pinning it per host:

# Order cluster networks for live migration; put the dedicated migration net first.
$mig = Get-ClusterNetwork -Cluster hv-clu01 | Where-Object { $_.Address -eq '10.10.99.0' }
$mig.Role = 'Cluster'   # cluster traffic only, no client access; still eligible for live migration
# Set the live-migration network preference in Failover Cluster Manager (Live Migration Settings)
# or with Set-ClusterParameter on the 'Virtual Machine' resource type (MigrationNetworkOrder);
# the dedicated 25 GbE net should be the top preference.
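
The preference order can also be written from PowerShell. The sketch below is a hedged illustration, not a definitive procedure: it assumes the order is held in the MigrationNetworkOrder parameter of the 'Virtual Machine' resource type as a semicolon-separated list of cluster network IDs, and Get-MigrationNetworkOrder is a hypothetical helper that only builds that string.

```powershell
# Hypothetical helper: build a semicolon-separated list of cluster network IDs
# with the dedicated migration network first. Expects objects that expose
# Id and Address, as Get-ClusterNetwork returns.
function Get-MigrationNetworkOrder {
    param(
        [Parameter(Mandatory)] [object[]]$Network,
        [Parameter(Mandatory)] [string]$PreferredAddress   # e.g. '10.10.99.0'
    )
    $first = @($Network | Where-Object { $_.Address -eq $PreferredAddress })
    $rest  = @($Network | Where-Object { $_.Address -ne $PreferredAddress })
    ($first + $rest | ForEach-Object { $_.Id }) -join ';'
}

# Usage on a real cluster (FailoverClusters module; left commented):
# $order = Get-MigrationNetworkOrder -Network (Get-ClusterNetwork -Cluster hv-clu01) `
#              -PreferredAddress '10.10.99.0'
# Get-ClusterResourceType -Name 'Virtual Machine' -Cluster hv-clu01 |
#     Set-ClusterParameter -Name MigrationNetworkOrder -Value $order
```

Read the parameter back with Get-ClusterParameter before trusting it; if the value format differs on your build, Failover Cluster Manager remains the safe way to set the order.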

The MigrationType parameter is where quick vs live lives in the cluster:

`MigrationType`	Behaviour	Downtime	When the cluster uses it
`Live`	Iterative memory pre-copy, ~0 downtime	Milliseconds	Planned moves, node drain
`Quick`	Save state → move → restore	Seconds–minutes	Explicit request; low-priority VMs during a node drain
`Shutdown`	Guest OS shutdown, move, restart	Full boot	Last resort / forced

A crucial distinction the cluster introduces: live migration is planned mobility; failover is unplanned recovery. If a cluster node dies unexpectedly, its VMs do not “live migrate” anywhere — there was no time to pre-copy memory from a dead host. The cluster fails them over: it restarts them on a surviving node from their last-persisted CSV state (the equivalent of pulling the power and booting elsewhere) — a reboot of the guest, not a zero-downtime move. Confusing the two leads people to expect zero-downtime survival of a crashed host, which clustering does not provide.

Draining a node for maintenance uses live migration under the hood to move every VM off first:

# Pause a node and live-migrate all its VMs to other nodes before patching.
Suspend-ClusterNode -Name hv-clu01 -Drain -Cluster hv-clu01
# ...patch the node...
Resume-ClusterNode -Name hv-clu01 -Failback Immediate -Cluster hv-clu01
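
Before the patching step it is worth confirming that the drain actually finished rather than assuming it did. A minimal sketch, assuming the node object exposes the State and DrainStatus properties that Get-ClusterNode returns; Test-NodeDrained is a hypothetical name.

```powershell
# Hypothetical gate: a node is safe to patch only when it is paused and its
# drain has completed. Expects an object with State and DrainStatus.
function Test-NodeDrained {
    param([Parameter(Mandatory)] [object]$Node)
    ("$($Node.State)" -eq 'Paused') -and ("$($Node.DrainStatus)" -eq 'Completed')
}

# Usage on a real cluster (left commented):
# Test-NodeDrained -Node (Get-ClusterNode -Name hv-clu01 -Cluster hv-clu01)
```

A False result usually means a VM could not move; check the cluster events for that role before forcing anything.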

This is exactly what Patching Failover Clusters with Cluster-Aware Updating and Stretch Clusters via Storage Replica automates across a whole cluster.

System Center Virtual Machine Manager (VMM), briefly

Everything above is host-by-host PowerShell. At fleet scale — dozens of hosts, hundreds of VMs — you graduate to System Center Virtual Machine Manager (VMM/SCVMM), Microsoft’s management fabric on top of Hyper-V. It doesn’t replace the mechanics here; it orchestrates them and adds placement intelligence. Its key mobility value: Dynamic Optimization live-migrates VMs to balance cluster load automatically, and Power Optimization consolidates VMs onto fewer hosts off-peak — both driven by live migration under the hood. VMM also standardizes migration and networking settings across a host group, so you set transport and delegation policy once rather than per host.

VMM capability	What it does	Underlying mechanism
Dynamic Optimization	Rebalances VMs across cluster nodes on load thresholds	Automated live migration
Power Optimization	Consolidates VMs off-peak; powers down idle hosts	Live migration + host power control
Intelligent placement	Rates hosts by fit when you deploy/migrate a VM	Star-rating against CPU/RAM/storage
Host groups + profiles	Uniform migration/network settings across many hosts	Applies host config as policy
Cross-cluster / storage migration	Wizard-driven moves between clusters and storage	`Move-VM` / storage migration

The trade-off is operational weight — VMM is a full System Center product with its own database, agents, and licensing. For a handful of hosts, the raw cmdlets here are the right tool; VMM earns its keep once per-host management and manual rebalancing stop scaling.

Enabling Hyper-V Replica

Replica is independent of live migration, uses its own listener, and lives on a separate configuration surface. Enable the replica server role on the DR host (hv-dr01) — the side that receives replicas. Decide the transport first; it dictates ports and whether you need certificates.

Transport	Port	Encryption	Requires	When to use
Kerberos (HTTP)	80	None on the wire	Same-forest domain join	Trusted/private network or VPN link
Certificate (HTTPS)	443	Mutual TLS end to end	A cert on each host	Across untrusted domains or the public internet

Kerberos/HTTP is simple — no certificates — but unencrypted on the wire, so it’s only acceptable over an already-private or VPN-tunneled link. Certificate/HTTPS is mandatory across untrusted networks or the internet, and needs a certificate on each host whose Enhanced Key Usage includes both Server Authentication and Client Authentication (each host acts as both).

Kerberos/HTTP, suited to the cross-site-but-private-WAN reference:

Invoke-Command -ComputerName hv-dr01 -ScriptBlock {
    Set-VMReplicationServer `
        -ReplicationEnabled $true `
        -AllowedAuthenticationType Kerberos `
        -KerberosAuthenticationPort 80 `
        -DefaultStorageLocation 'R:\Replicas'

    # Authorize which primary servers may send, and where their replicas land.
    New-VMReplicationAuthorizationEntry `
        -AllowedPrimaryServer '*.contoso.local' `
        -ReplicaStorageLocation 'R:\Replicas' `
        -TrustGroup 'PrimarySite'
}

Open the listener on the DR host’s firewall — the rule ships built-in, just disabled:

Invoke-Command -ComputerName hv-dr01 -ScriptBlock {
    Enable-NetFirewallRule -DisplayName 'Hyper-V Replica HTTP Listener (TCP-In)'
    # For certificate-based replication instead:
    # Enable-NetFirewallRule -DisplayName 'Hyper-V Replica HTTPS Listener (TCP-In)'
}

The Set-VMReplicationServer knobs, decoded:

Parameter	What it controls	Notes
`-ReplicationEnabled`	Turns the host into a replica server	Must be `$true` on the receiving side
`-AllowedAuthenticationType`	Kerberos, Certificate, or both	Match the primary’s `-AuthenticationType`
`-KerberosAuthenticationPort`	HTTP listener port	Default 80
`-CertificateAuthenticationPort`	HTTPS listener port	Default 443; certificate transport
`-CertificateThumbprint`	The host’s TLS cert	Required for certificate transport
`-DefaultStorageLocation`	Where replicas land by default	Applies when any authenticated server may replicate; each authorization entry names its own `-ReplicaStorageLocation`

The authorization entry is the security boundary — it scopes which primaries may push replicas here and where their data lands. TrustGroup segments multiple primaries so one tenant’s replicas can’t be confused with another’s.

Authorization parameter	Purpose
`-AllowedPrimaryServer`	Which primaries may replicate (FQDN or wildcard like `*.contoso.local`)
`-ReplicaStorageLocation`	Where this group’s replicas are stored
`-TrustGroup`	A label grouping primaries that trust each other; isolates tenants

For certificate auth: set -AllowedAuthenticationType Certificate -CertificateAuthenticationPort 443 -CertificateThumbprint <thumbprint> on Set-VMReplicationServer, and enable replication on each VM with -AuthenticationType Certificate -CertificateThumbprint <primary-side-thumbprint>. The certificate’s EKU must include Server Authentication and Client Authentication, or the mutual-TLS handshake fails one direction and replication never establishes.
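
The EKU requirement is easy to check before you ever point Replica at a certificate. The sketch below is illustrative only: Test-ReplicaEku is a hypothetical helper, and the commented usage assumes the EnhancedKeyUsageList property that the Windows certificate provider adds to certificate objects.

```powershell
# Hypothetical helper: true when a list of EKU OIDs contains both usages
# that mutual TLS needs.
function Test-ReplicaEku {
    param([string[]]$EkuOid)
    $serverAuth = '1.3.6.1.5.5.7.3.1'   # Server Authentication
    $clientAuth = '1.3.6.1.5.5.7.3.2'   # Client Authentication
    ($EkuOid -contains $serverAuth) -and ($EkuOid -contains $clientAuth)
}

# Usage on a host (Windows certificate provider; left commented):
# Get-ChildItem Cert:\LocalMachine\My | Where-Object {
#     Test-ReplicaEku -EkuOid $_.EnhancedKeyUsageList.ObjectId
# } | Select-Object Subject, Thumbprint, NotAfter
```

Run it on both hosts, since each side presents its own certificate and validates the other's.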

Tuning frequency, recovery points, and resync windows

Now enable replication for a VM, pointing the primary host at the replica server. These parameters define your RPO and recovery granularity — the heart of a Replica design.

Enable-VMReplication -VMName 'app-web01' `
    -ComputerName hv01 `
    -ReplicaServerName 'hv-dr01.contoso.local' `
    -ReplicaServerPort 80 `
    -AuthenticationType Kerberos `
    -ReplicationFrequencySec 300 `
    -RecoveryHistory 12 `
    -VSSSnapshotFrequencyHour 4

# Kick off the first full copy (defer to off-hours with -InitialReplicationStartTime).
Start-VMInitialReplication -VMName 'app-web01' -ComputerName hv01

What each knob does:

Parameter	Accepts	What it controls	Guidance
`-ReplicationFrequencySec`	30, 300, or 900 only	How often the delta log ships → your RPO floor	30 for tier-1 DBs; 300 general; 900 bulk/archival
`-RecoveryHistory`	0–24	Extra crash-consistent recovery points kept	0 = latest only; higher = more time-travel, more storage
`-VSSSnapshotFrequencyHour`	1–12	App-consistent (VSS) snapshot cadence	Only meaningful when `RecoveryHistory > 0`
`-AuthenticationType`	Kerberos / Certificate	Transport, must match the replica server	Kerberos for private links
`-ReplicaServerName` / `-ReplicaServerPort`	FQDN / port	Where to send	Port 80 (Kerberos) or 443 (cert)
`-InitialReplicationStartTime`	datetime	Defer the first full copy (a `Start-VMInitialReplication` parameter)	Schedule initial replication off-peak

The three parameters that define your recovery story: -ReplicationFrequencySec accepts exactly 30, 300, or 900 seconds — the cadence the HRL delta ships, and effectively your RPO floor (die just before a cycle and you lose up to one interval). -RecoveryHistory (0–24) keeps additional crash-consistent recovery points beyond the latest so you can fail over to a point before a corruption event — 0 is fine for stateless VMs, dangerous for anything that can be silently corrupted, and each extra point costs storage on the replica. -VSSSnapshotFrequencyHour (1–12) layers application-consistent VSS snapshots on top, so VSS-aware guests (SQL Server, Exchange) recover transactionally clean rather than crash-consistent; it’s only meaningful when RecoveryHistory > 0.
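
To make those three knobs concrete, here is a small sketch that validates a proposed combination against the ranges above and reports what it buys you. Get-ReplicaRecoveryProfile is a hypothetical helper, not a Hyper-V cmdlet, and treating 0 as "VSS not requested" is this helper's own convention.

```powershell
# Hypothetical planning helper: validate the three recovery knobs and
# summarize the resulting recovery story.
function Get-ReplicaRecoveryProfile {
    param(
        [Parameter(Mandatory)]
        [ValidateSet(30, 300, 900)]
        [int]$ReplicationFrequencySec,

        [ValidateRange(0, 24)]
        [int]$RecoveryHistory = 0,

        [ValidateRange(0, 12)]          # 0 = VSS not requested (helper convention)
        [int]$VSSSnapshotFrequencyHour = 0
    )
    [pscustomobject]@{
        WorstCaseLossSec   = $ReplicationFrequencySec        # up to one interval
        RecoveryPointsKept = 1 + $RecoveryHistory            # latest + additional
        VssEffective       = ($RecoveryHistory -gt 0) -and ($VSSSnapshotFrequencyHour -gt 0)
    }
}

Get-ReplicaRecoveryProfile -ReplicationFrequencySec 300 -RecoveryHistory 12 -VSSSnapshotFrequencyHour 4
```

Passing a frequency such as 60 fails parameter validation, which is the point: the bad value is caught in review rather than on the host.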

Consistency is the subtle part — know what each recovery-point type guarantees:

Recovery-point type	Consistency	Guest on recovery	Use for
Latest (standard)	Crash-consistent	As if power was pulled; may replay logs	Stateless / tolerant apps
Additional recovery points	Crash-consistent	Same, at an earlier time	Escaping recent corruption
VSS (application-consistent)	Application-consistent	Transactionally clean, no recovery needed	SQL, Exchange, VSS-aware apps

For initial replication of large VMs, copying the full VHDX over a thin WAN is brutal and can take days. Better options:

Initial replication method	Mechanism	Best when
Over the network (default)	Full VHDX copy across the wire	Small VMs, fat link
Send over the network, deferred	`-InitialReplicationStartTime` to off-peak	Medium VMs; avoid business-hours WAN load
Seed from a restored backup	Pre-place a restored copy on the replica	Large VMs; a recent backup exists on DR side
Export/import (out-of-band)	Ship a disk physically; import at DR	Huge VMs; WAN too thin to ever seed
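
Choosing between these methods is mostly arithmetic. The following back-of-the-envelope sketch estimates how long a full copy takes over a given link; Get-SeedTimeHours is a hypothetical helper, and the usable fraction of the link is an assumption you should replace with what you actually measure.

```powershell
# Hypothetical estimator: hours to push a full disk across a link.
function Get-SeedTimeHours {
    param(
        [Parameter(Mandatory)] [double]$SizeGB,      # decimal GB (10^9 bytes)
        [Parameter(Mandatory)] [double]$LinkMbps,    # nominal link speed, megabits/s
        [ValidateRange(0.01, 1.0)]
        [double]$UsableFraction = 0.5                # assumed share Replica may use
    )
    $bits          = $SizeGB * 1e9 * 8
    $bitsPerSecond = $LinkMbps * 1e6 * $UsableFraction
    [math]::Round($bits / $bitsPerSecond / 3600, 2)
}

# A 2,000 GB disk over a 1 Gb/s link at half utilization:
Get-SeedTimeHours -SizeGB 2000 -LinkMbps 1000 -UsableFraction 0.5
```

On those assumptions the copy needs about 8.89 hours of sustained transfer, before any retries or competing traffic, which is why seeding from a backup or shipping a disk wins for large VMs.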

If replication ever breaks badly enough to need a full resync, schedule it — resync re-hashes the entire disk and is bandwidth-heavy, exactly the operation you don’t want firing at 2pm on a shared WAN:

# Constrain when an out-of-sync VM may resynchronize (off-peak window only).
Set-VMReplication -VMName 'app-web01' -ComputerName hv01 `
    -AutoResynchronizeEnabled $true `
    -AutoResynchronizeIntervalStart '22:00:00' `
    -AutoResynchronizeIntervalEnd '06:00:00'
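
A window such as 22:00 to 06:00 wraps past midnight, which is easy to get wrong when you reason about it in scripts or runbooks. The sketch below shows the wrap-around logic under the assumption that the window is start-inclusive and end-exclusive; Test-InResyncWindow is a hypothetical helper and does not claim to mirror Hyper-V's internal evaluation.

```powershell
# Hypothetical helper: is a time of day inside a window that may cross midnight?
function Test-InResyncWindow {
    param(
        [Parameter(Mandatory)] [timespan]$TimeOfDay,
        [Parameter(Mandatory)] [timespan]$Start,
        [Parameter(Mandatory)] [timespan]$End
    )
    if ($Start -le $End) {
        # Same-day window, e.g. 01:00 -> 05:00.
        return (($TimeOfDay -ge $Start) -and ($TimeOfDay -lt $End))
    }
    # Window wraps past midnight, e.g. 22:00 -> 06:00.
    return (($TimeOfDay -ge $Start) -or ($TimeOfDay -lt $End))
}

Test-InResyncWindow -TimeOfDay '23:30:00' -Start '22:00:00' -End '06:00:00'
Test-InResyncWindow -TimeOfDay '14:00:00' -Start '22:00:00' -End '06:00:00'
```

The first call falls inside the overnight window and the second does not, which is the behaviour you want from a resync that must never fire at 2pm.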

Extended (chained) replication adds a third copy: replicate primary → replica, then from that replica to a third server (run Enable-VMReplication ... -ReplicaServerName hv-dr02... on the replica server, against the replica VM), giving a two-tier DR chain where the tertiary sits further away on a looser frequency (extended replication accepts 300 or 900 seconds, not 30). Use it when a single DR site isn’t enough, for example a regional replica plus a distant archival copy.

Planned, unplanned, and test failover

These three procedures are not interchangeable, and using the wrong one loses data. Internalize the difference before you touch a production DR event.

Failover type	When to use	Data loss	Run it on	Reverses replication?
Test	Verify the replica boots — anytime	None (isolated copy)	Replica (DR)	No — production keeps replicating
Planned	Primary is healthy (site maintenance)	Zero — final delta flushed first	Primary, then replica	Yes, after cutover
Unplanned	Primary is gone (disaster)	Up to one replication interval	Replica (DR)	Not until primary returns

Test failover — non-disruptive. Spins up an isolated copy of the replica on the DR host, disconnected from the network, so you can verify the VM boots and the app works while production replication runs untouched. This is the only failover you run routinely, and you should — Normal health proves data is arriving, not that the VM boots.

# On the DR host (replica side): create an isolated test copy.
Start-VMFailover -VMName 'app-web01' -ComputerName hv-dr01 -AsTest
# ...start 'app-web01 - Test', verify it boots and the app is intact, then clean it up.
# Stop-VMFailover takes no -AsTest switch; it stops whichever failover is in progress.
Stop-VMFailover -VMName 'app-web01' -ComputerName hv-dr01

Planned failover — zero data loss, when the primary is healthy and reachable (site maintenance, a controlled migration). It flushes the final delta before cutover, so nothing is lost, then reverses direction. The prepare step runs on the primary:

# 1. On the PRIMARY: shut the VM down and ship the last changes.
Stop-VM -Name 'app-web01' -ComputerName hv01
Start-VMFailover -VMName 'app-web01' -ComputerName hv01 -Prepare

# 2. On the REPLICA (DR): bring it online as the live copy.
Start-VMFailover -VMName 'app-web01' -ComputerName hv-dr01
Start-VM -Name 'app-web01' -ComputerName hv-dr01

# 3. On the REPLICA: reverse replication so DR is now primary, hv01 is the replica.
Set-VMReplication -VMName 'app-web01' -ComputerName hv-dr01 -Reverse

Unplanned failover — the primary site is gone. There is no final flush; you accept losing whatever had not yet replicated (up to one ReplicationFrequencySec interval). Run it on the replica, optionally choosing an older recovery point to escape corruption:

# On the REPLICA (DR), primary is unreachable:
Start-VMFailover -VMName 'app-web01' -ComputerName hv-dr01
Start-VM -Name 'app-web01' -ComputerName hv-dr01

To recover to an earlier point (to escape a corruption or ransomware event), pass -VMRecoverySnapshot with a snapshot from Get-VMSnapshot:

# List available recovery points (sorted oldest first) and fail over to a chosen earlier one.
# Index 2 is only an example; pick the point that predates the corruption.
$rp = Get-VMSnapshot -VMName 'app-web01' -ComputerName hv-dr01 |
        Where-Object SnapshotType -eq 'Replica' | Sort-Object CreationTime
Start-VMFailover -VMName 'app-web01' -ComputerName hv-dr01 -VMRecoverySnapshot $rp[2]
Start-VM -Name 'app-web01' -ComputerName hv-dr01

Once the chosen point checks out, commit it with Complete-VMFailover to discard the others and finalize:

Complete-VMFailover -VMName 'app-web01' -ComputerName hv-dr01

The failover cmdlet map, so you never guess which switch does what:

Cmdlet + switch	Runs on	Effect
`Start-VMFailover -AsTest`	Replica	Isolated test copy; production unaffected
`Stop-VMFailover`	Replica	Tears down the test copy (no `-AsTest` switch on this cmdlet)
`Start-VMFailover -Prepare`	Primary	Flushes final delta (planned failover)
`Start-VMFailover` (no switch)	Replica	Brings the replica online as live
`Start-VMFailover -VMRecoverySnapshot`	Replica	Fails over to a chosen earlier point
`Complete-VMFailover`	Replica	Commits the chosen point, discards others
`Set-VMReplication -Reverse`	New primary	Reverses the replication direction

Failback: reversing replication after the primary returns

After an unplanned failover, the DR copy is running but not replicating anywhere — it’s a single point of failure until you re-establish protection. When the primary site comes back, reverse the relationship, let it resync DR→primary, then plan-failover back during a maintenance window.

# 1. With the VM running on DR, reverse replication so primary becomes the replica.
Set-VMReplication -VMName 'app-web01' -ComputerName hv-dr01 -Reverse
Start-VMInitialReplication -VMName 'app-web01' -ComputerName hv-dr01

# 2. Once health is 'Normal', do a PLANNED failover back to hv01 (see the previous section),
#    then -Reverse again so hv01 -> hv-dr01 is restored as the steady state.

Failback is just a planned failover in the opposite direction. Never skip the resync-and-verify step — cutting back to a primary whose disk drifted (because it was down while DR took writes) turns a recovered incident into a fresh corruption. The order is always: reverse → resync → verify Normal → planned failover back → reverse to steady state.

Failback step	Command	Why
1. Reverse direction	`Set-VMReplication -Reverse` (on DR)	Make the returned primary the replica
2. Resync DR → primary	`Start-VMInitialReplication` (on DR)	Bring the stale primary current
3. Wait for `Normal`	`Measure-VMReplication`	Never cut back to a drifted disk
4. Planned failover back	`Start-VMFailover -Prepare` on DR, bring up on primary	Zero-data-loss return
5. Reverse to steady state	`Set-VMReplication -Reverse` on primary	Restore primary → DR as normal
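
Step 3 is the one people skip under pressure, so it helps to make it a gate rather than a glance. A minimal polling sketch, assuming the Health value is readable as the string 'Normal'; Wait-ReplicationNormal is a hypothetical helper, and the health lookup is passed in as a script block so the loop itself stays independent of Hyper-V.

```powershell
# Hypothetical gate: poll a health lookup until it reports 'Normal' or a
# timeout expires. Returns $true on Normal, $false on timeout.
function Wait-ReplicationNormal {
    param(
        [Parameter(Mandatory)] [scriptblock]$GetHealth,
        [int]$TimeoutSec = 3600,
        [int]$PollSec = 30
    )
    $deadline = (Get-Date).AddSeconds($TimeoutSec)
    while ($true) {
        if ("$(& $GetHealth)" -eq 'Normal') { return $true }
        if ((Get-Date) -ge $deadline) { return $false }
        Start-Sleep -Seconds $PollSec
    }
}

# Usage during failback (needs the Hyper-V module; left commented):
# $ready = Wait-ReplicationNormal -GetHealth {
#     (Measure-VMReplication -VMName 'app-web01' -ComputerName hv-dr01).Health
# }
# if (-not $ready) { throw 'Do not fail back: replication never reached Normal.' }
```

Treat a timeout as a hard stop for the failback, not as a warning to click through.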

Verification and monitoring

Configuration you can’t verify is configuration you don’t have. Confirm the live-migration config is what you set, then prove a real move works:

# Performance + network config is what you intended:
Get-VMHost -ComputerName hv01 |
    Format-List VirtualMachineMigrationEnabled, `
        VirtualMachineMigrationAuthenticationType, `
        VirtualMachineMigrationPerformanceOption, `
        MaximumVirtualMachineMigrations
Get-VMMigrationNetwork -ComputerName hv01

# Move a low-risk VM hv01 -> hv02 and confirm it stayed up (ping it in another window).
Move-VM -Name 'test-canary' -ComputerName hv01 -DestinationHost hv02 -IncludeStorage `
        -DestinationStoragePath 'D:\VMs\test-canary'

Replica health is reported per VM by Measure-VMReplication — this is your DR dashboard:

Measure-VMReplication -ComputerName hv01 |
    Format-Table Name, State, Health, LReplTime, LReplSize, AvgReplSize, PrimaryServerName

The health states and what they mean:

`Health` value	Meaning	Action
`Normal`	Replicating on schedule	None — but still run test failovers
`Warning`	Missed some cycles; behind but recovering	Check WAN bandwidth, listener, and load
`Critical`	Replication failed / far behind / paused	Investigate now; may need resync

LReplTime (last replication time) should be within one frequency interval of now; a stale time with Normal health is a red flag. Confirm the listener is live on the DR host:

Get-VMReplicationServer -ComputerName hv-dr01 |
    Format-List ReplicationEnabled, AllowedAuthenticationType, KerberosAuthenticationPort
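# Query the DR host itself; without -CimSession this would check the machine you are typing on.
Get-NetTCPConnection -CimSession hv-dr01 -LocalPort 80 -State Listen -ErrorAction SilentlyContinue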
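
The stale-time rule mentioned above (last replication within one frequency interval of now) can be turned into a check. This is a sketch under stated assumptions: Test-ReplicationStale is a hypothetical helper, and the Tolerance parameter is an addition for links where one in-flight cycle would otherwise raise a false alarm.

```powershell
# Hypothetical check: flag a relationship whose last successful replication
# is older than the allowed number of frequency intervals.
function Test-ReplicationStale {
    param(
        [Parameter(Mandatory)] [datetime]$LastReplicationTime,
        [Parameter(Mandatory)] [int]$FrequencySec,
        [datetime]$Now = (Get-Date),
        [double]$Tolerance = 1.0     # intervals allowed before flagging
    )
    ($Now - $LastReplicationTime).TotalSeconds -gt ($FrequencySec * $Tolerance)
}

# Worked example with fixed times: 900 s old against a 300 s frequency.
$now = [datetime]'2024-01-01T12:00:00'
Test-ReplicationStale -LastReplicationTime ($now.AddSeconds(-900)) -FrequencySec 300 -Now $now
```

The example returns True. In practice you would feed it the last replication time reported by Measure-VMReplication for each VM and alert on any True, whatever the Health column says.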

And run a quarterly test failover — Normal health proves data is arriving, not that the VM boots. The most common real-world Replica failure is a relationship healthy for months but never test-failed, so nobody noticed the guest won’t boot (a missing integration driver) until the actual disaster.

Architecture at a glance

Picture the reference estate as three zones connected by two very different links. In the primary site sit hv01 and hv02, each with a management/production path for guest traffic and a dedicated 25 GbE RDMA path on 10.10.99.0/24 solely for live migration. When you evacuate hv01 to patch it, a VM’s running memory streams across that RDMA path — copied iteratively while the guest keeps serving — until a sub-second blackout switches execution to hv02. Because the two hosts share no storage, the VHDX rides that same path (shared-nothing); were they cluster nodes over a CSV, only memory would move. Kerberos constrained delegation, configured in contoso.local, is what lets you trigger the move from your workstation — the source’s computer account is trusted to delegate the migration service and cifs to the destination.

The second zone is the failover cluster (hv-clu01/hv-clu02) sharing a CSV — mobility is memory-only and fast, and a node that dies is handled not by live migration but by the cluster failing over the VMs onto the survivor from last-persisted CSV state (a reboot, not a zero-downtime move). The third zone is DR host hv-dr01, reachable only across a thinner WAN that changes everything: live migration is impractical, so Hyper-V Replica ships a change log (HRL) on a 30/300/900-second cadence, keeping a copy always slightly behind but recoverable. The WAN is the scarce resource — migration traffic never crosses it, and Replica frequency plus resync windows are scheduled around it. The two guarantees layer: within a site, live migration and clustering keep a VM available through planned events and node loss; across sites, Replica keeps a warm copy for the disaster where the whole primary is gone. The first question for any VM is which guarantee it needs, and the zone it lives in answers it.

Real-world scenario

Meridian Logistics ran roughly 40 standalone Hyper-V hosts across two metro data centres joined by a 1 GbE WAN they neither owned nor could upgrade — a carrier circuit with fixed capacity. Every VM replicated to the second site, and every relationship sat at the default 5-minute (300s) frequency, including a chatty 2 TB SQL Server VM generating enormous deltas. Live migration within each site used the default “any network” setting and the default Compression option. It had worked for a year, which is exactly why nobody questioned it.

The incident began during a planned data-centre power test. To protect the affected hosts, the team began live-migrating workloads off them — while Replica kept running. On the shared 1 GbE uplink two flows now collided: shared-nothing migration traffic (which, with no dedicated migration network, went out the same interface as everything else) and the continuous Replica deltas. The uplink saturated. Migrations that should have taken two minutes stalled mid-flight, pinning VMs half-migrated. Worse, several relationships fell far enough behind to go Critical, triggering Hyper-V’s automatic resynchronization — and resync re-hashes the entire disk. The 2 TB SQL VM began re-hashing 2 TB across an already-saturated 1 GbE link. The congestion became catastrophic; the window closed with VMs still stuck.

The root cause was structural: one un-ownable, saturated WAN carrying two competing flows with no isolation and no scheduling. The fix had three parts. First, they gave live migration a dedicated path within each site and switched the hosts carrying the memory-heavy VMs to the SMB transport (the performance option is a per-host setting), so evacuations stopped competing and stopped burning CPU on compression:

Invoke-Command -ComputerName $siteHosts -ScriptBlock {
    Set-VMHost -UseAnyNetworkForMigration $false
    Add-VMMigrationNetwork -Subnet 10.10.99.0/24 -Priority 10
    Set-VMHost -VirtualMachineMigrationPerformanceOption SMB
}

Second, they tiered replication frequency by workload and bounded resync to an overnight window, so a Critical event could never re-hash a disk during business hours:

Set-VMReplication -VMName 'sql-tier1' -ComputerName hv11 -ReplicationFrequencySec 30
Set-VMReplication -VMName 'file-bulk' -ComputerName hv11 `
    -ReplicationFrequencySec 900 `
    -AutoResynchronizeEnabled $true `
    -AutoResynchronizeIntervalStart '23:00:00' `
    -AutoResynchronizeIntervalEnd '05:00:00'

Third — the part most teams miss — they seeded the 2 TB SQL replica from a backup restore already on the DR host, instead of ever crawling the full disk across 1 GbE. The next power test was a non-event: migrations completed on the dedicated path in minutes, Replica stayed Normal, and no resync fired in business hours. The timeline told the whole lesson:

Time	Event	Root issue	What it should have been
T+0	Power test starts; begin evacuating hosts	Migration shares the WAN with Replica	Migration on a dedicated path
T+8m	Live migrations stall mid-flight	1 GbE uplink saturated by two flows	Isolated migration network
T+15m	Several replicas go `Critical`	Deltas fell behind under congestion	Tiered frequency; bulk on 900s
T+18m	Auto-resync fires, re-hashes 2 TB	Resync unbounded, business hours	Resync bounded to overnight window
T+40m	Window closes, VMs still stuck	No isolation, no scheduling	All three fixes in place
Next test	Non-event	—	The correct steady state

The lesson on the wall was not “Replica is fragile.” It was that frequency and resync are bandwidth decisions, a shared WAN forces you to schedule them like the scarce resource they are — and live migration must never compete for that same wire.
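
The "bandwidth decision" can be put in numbers before an incident does it for you. A rough sketch, assuming you know roughly how much data a VM changes per replication cycle; Get-ReplicaLinkLoad is a hypothetical helper and the figures in the example are illustrative, not Meridian's measurements.

```powershell
# Hypothetical estimator: sustained throughput a VM's deltas demand, and the
# share of the link that represents.
function Get-ReplicaLinkLoad {
    param(
        [Parameter(Mandatory)] [double]$DeltaMBPerInterval,   # decimal MB changed per cycle
        [Parameter(Mandatory)] [ValidateSet(30, 300, 900)] [int]$FrequencySec,
        [Parameter(Mandatory)] [double]$LinkMbps
    )
    $requiredMbps = $DeltaMBPerInterval * 8 / $FrequencySec
    [pscustomobject]@{
        RequiredMbps = [math]::Round($requiredMbps, 1)
        LinkShare    = [math]::Round($requiredMbps / $LinkMbps, 3)
        FitsInCycle  = ($requiredMbps -le $LinkMbps)
    }
}

# Illustrative: 15,000 MB of change every 300 s on a 1 Gb/s link.
Get-ReplicaLinkLoad -DeltaMBPerInterval 15000 -FrequencySec 300 -LinkMbps 1000
```

That single VM would need 400 Mb/s sustained, 40 percent of the circuit, before any other relationship or any migration traffic gets a turn. Summing the result across every replicated VM is a quick way to see whether a frequency tier is affordable.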

Advantages and disadvantages

Hyper-V’s mobility model — live migration for planned events, Replica for disasters, clustering for node failure — is powerful precisely because the three are separate tools, but that separation is also its sharpest edge.

Advantages	Disadvantages
Live migration moves running VMs with sub-second downtime — patch hosts in business hours	Busy, memory-heavy VMs on slow links converge slowly or need a stop-and-copy fallback
Shared-nothing needs zero shared infrastructure — works between plain standalone hosts	Slower than memory-only; the whole VHDX crosses the network
SMB Direct/RDMA makes the memory copy near-wire-speed and offloads the CPU entirely	RDMA is fiddly (DCB/PFC/RoCE) and falls back silently to plain SMB if misconfigured
Replica ships changes async, tolerating a slow/intermittent WAN	Always slightly behind — a failover loses up to one interval; not zero-RPO
Recovery points let you fail over to before a corruption/ransomware event	Each recovery point costs storage and processing on the replica
Clustering + CSV survives sudden node loss automatically	That survival is a reboot (failover), not a zero-downtime move — a common misconception
Everything is scriptable PowerShell — repeatable, reviewable, automatable	Kerberos delegation is a genuine footgun (directional, ticket-cached, silent when wrong)
Test failover proves recoverability non-disruptively	Only works if you actually run it — untested replicas fail the day you need them

The model is right for any Hyper-V estate beyond a single host. It bites hardest on estates with a thin or shared inter-site WAN (where Replica scheduling becomes a discipline), mixed-CPU-vendor hardware (which cannot live-migrate at all), and teams that configure delegation or Replica once and never verify — every one of these failure modes is invisible until the moment you depend on the feature.

Hands-on lab

Configure and verify a shared-nothing live migration between two hosts, then stand up and test a Replica relationship — all with real cmdlets and validation at each step. This assumes two domain-joined Hyper-V hosts (hv01, hv02) and a third (hv-dr01) for the Replica half; adapt the names to your lab. Run from an elevated PowerShell session that can reach all three.

Step 1 — Set the migration authentication type to Kerberos on both hosts.

$hosts = 'hv01','hv02'
Invoke-Command -ComputerName $hosts -ScriptBlock {
    Enable-VMMigration
    Set-VMHost -VirtualMachineMigrationAuthenticationType Kerberos
}

Expected: no errors. Verify: Get-VMHost -ComputerName hv01 | Select VirtualMachineMigrationAuthenticationType returns Kerberos.

Step 2 — Configure constrained delegation both directions, then reboot.

Import-Module ActiveDirectory
function Add-MigrationDelegation {
    param($SourceHost, $DestHost)
    $dest = Get-ADComputer $DestHost
    $spns = @(
        "Microsoft Virtual System Migration Service/$($dest.DNSHostName)",
        "Microsoft Virtual System Migration Service/$DestHost",
        "cifs/$($dest.DNSHostName)","cifs/$DestHost")
    Set-ADComputer $SourceHost -Add @{ 'msDS-AllowedToDelegateTo' = $spns }
    Get-ADComputer $SourceHost | Set-ADAccountControl -TrustedToAuthForDelegation $true
}
Add-MigrationDelegation -SourceHost 'hv01' -DestHost 'hv02'
Add-MigrationDelegation -SourceHost 'hv02' -DestHost 'hv01'
Restart-Computer -ComputerName $hosts -Wait -For PowerShell   # ticket cache MUST refresh

Expected: hosts reboot and come back. Verify: Get-ADComputer hv01 -Properties msDS-AllowedToDelegateTo | Select -Expand msDS-AllowedToDelegateTo lists the four hv02 SPNs.
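
Eyeballing four SPNs per direction gets error-prone across more than a couple of hosts. The sketch below compares what Step 2 should have written against what the account actually holds; Get-MissingDelegationSpn is a hypothetical helper that builds the same four SPN strings as Add-MigrationDelegation above.

```powershell
# Hypothetical verifier: return the expected delegation SPNs that are absent
# from the list actually present on the source computer account.
function Get-MissingDelegationSpn {
    param(
        [Parameter(Mandatory)] [string]$DestHost,    # short name, e.g. 'hv02'
        [Parameter(Mandatory)] [string]$DestFqdn,    # e.g. 'hv02.contoso.local'
        [string[]]$Actual = @()
    )
    $expected = foreach ($svc in 'Microsoft Virtual System Migration Service', 'cifs') {
        "$svc/$DestFqdn"
        "$svc/$DestHost"
    }
    # -notcontains compares case-insensitively.
    $expected | Where-Object { $Actual -notcontains $_ }
}

# Usage against Active Directory (ActiveDirectory module; left commented):
# $actual = (Get-ADComputer hv01 -Properties msDS-AllowedToDelegateTo).'msDS-AllowedToDelegateTo'
# Get-MissingDelegationSpn -DestHost 'hv02' -DestFqdn 'hv02.contoso.local' -Actual $actual
```

An empty result means the direction is fully configured; run it once per direction, since delegation from hv01 to hv02 says nothing about hv02 to hv01.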

Step 3 — Pin migration to a dedicated subnet and pick a transport.

Invoke-Command -ComputerName $hosts -ScriptBlock {
    Set-VMHost -UseAnyNetworkForMigration $false
    Get-VMMigrationNetwork | Remove-VMMigrationNetwork -ErrorAction SilentlyContinue
    Add-VMMigrationNetwork -Subnet 10.10.99.0/24 -Priority 10
    Set-VMHost -VirtualMachineMigrationPerformanceOption SMB
}

Expected: no errors. Verify: Get-VMMigrationNetwork -ComputerName hv01 shows the 10.10.99.0/24 subnet.

Step 4 — Create a small test VM on hv01 (if you don’t have one).

Invoke-Command -ComputerName hv01 -ScriptBlock {
    New-VM -Name 'test-canary' -MemoryStartupBytes 1GB -Generation 2 `
           -NewVHDPath 'D:\VMs\test-canary\test-canary.vhdx' -NewVHDSizeBytes 20GB `
           -SwitchName 'vSwitch-Prod'
    Start-VM -Name 'test-canary'
}

Expected: the VM is created and running. Verify: Get-VM -ComputerName hv01 -Name test-canary shows State = Running.

Step 5 — Shared-nothing live-migrate it hv01 → hv02 and confirm it stayed up. In a second window, start a continuous ping to the VM (or an RDP session) so you can watch it survive.

Move-VM -Name 'test-canary' -ComputerName hv01 -DestinationHost hv02 `
        -IncludeStorage -DestinationStoragePath 'D:\VMs\test-canary'

Expected: the cmdlet completes without error after copying disk + memory. The ping in the other window drops at most one packet. Verify: Get-VM -ComputerName hv02 -Name test-canary now shows it running on hv02.

Step 6 — Enable Hyper-V Replica on the DR host.

Invoke-Command -ComputerName hv-dr01 -ScriptBlock {
    Set-VMReplicationServer -ReplicationEnabled $true `
        -AllowedAuthenticationType Kerberos -KerberosAuthenticationPort 80 `
        -DefaultStorageLocation 'R:\Replicas'
    New-VMReplicationAuthorizationEntry -AllowedPrimaryServer '*.contoso.local' `
        -ReplicaStorageLocation 'R:\Replicas' -TrustGroup 'PrimarySite'
    Enable-NetFirewallRule -DisplayName 'Hyper-V Replica HTTP Listener (TCP-In)'
}

Expected: no errors. Verify: Get-VMReplicationServer -ComputerName hv-dr01 | Select ReplicationEnabled returns True, and Get-NetTCPConnection -LocalPort 80 -State Listen (run on hv-dr01) shows a listener.

Step 7 — Enable replication for the VM and start initial replication.

Enable-VMReplication -VMName 'test-canary' -ComputerName hv02 `
    -ReplicaServerName 'hv-dr01.contoso.local' -ReplicaServerPort 80 `
    -AuthenticationType Kerberos -ReplicationFrequencySec 300 -RecoveryHistory 4
Start-VMInitialReplication -VMName 'test-canary' -ComputerName hv02

Expected: initial replication begins copying the VHDX to hv-dr01. Verify: Measure-VMReplication -ComputerName hv02 shows the VM with State = InitialReplicationInProgress transitioning to Replicating, and eventually Health = Normal.

Step 8 — Run a test failover and confirm the replica boots.

Start-VMFailover -VMName 'test-canary' -ComputerName hv-dr01 -AsTest
Start-VM -Name 'test-canary - Test' -ComputerName hv-dr01
# ...confirm the isolated test VM boots (console/VMConnect), then clean up:
Stop-VMFailover -VMName 'test-canary' -ComputerName hv-dr01   # no -AsTest switch on Stop-VMFailover

Expected: an isolated - Test copy boots on hv-dr01 without touching production replication. Verify: the test VM reaches a login screen, and after Stop-VMFailover it’s gone while Measure-VMReplication still shows Normal.

Step 9 — Teardown. Remove the lab artifacts.

# Remove replication, then the test VM and its disks.
Remove-VMReplication -VMName 'test-canary' -ComputerName hv02
Invoke-Command -ComputerName hv-dr01 -ScriptBlock {
    Get-VM -Name 'test-canary' -ErrorAction SilentlyContinue | Remove-VM -Force
    Remove-Item 'R:\Replicas\test-canary' -Recurse -Force -ErrorAction SilentlyContinue
}
Invoke-Command -ComputerName hv02 -ScriptBlock {
    Stop-VM -Name 'test-canary' -Force -ErrorAction SilentlyContinue
    Remove-VM -Name 'test-canary' -Force
    Remove-Item 'D:\VMs\test-canary' -Recurse -Force -ErrorAction SilentlyContinue
}

Expected: no residual VM or replica on any host. Optionally revert the delegation and migration-network changes if this was a throwaway lab.
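
A quick way to confirm the teardown, sketched under the assumption that the Hyper-V module is available where you run it and that you hold rights on both hosts. The host names and paths are the ones used in this lab.

```powershell
# Look for any leftover lab VM (including a stray ' - Test' copy) on both hosts.
$leftovers = Get-VM -ComputerName hv02, hv-dr01 -Name 'test-canary*' -ErrorAction SilentlyContinue
if ($leftovers) {
    $leftovers | Select-Object ComputerName, Name, State
} else {
    'No lab VMs remain.'
}

# Confirm the on-disk artifacts are gone as well; both checks should return False.
Invoke-Command -ComputerName hv-dr01 -ScriptBlock { Test-Path 'R:\Replicas\test-canary' }
Invoke-Command -ComputerName hv02    -ScriptBlock { Test-Path 'D:\VMs\test-canary' }
```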

Common mistakes & troubleshooting

The playbook. Each row is a real failure mode with the symptom you see, the root cause, the exact way to confirm it, and the fix. This is the section to keep open mid-incident.

#	Symptom	Root cause	Confirm (exact command / where to look)	Fix
1	Remote `Move-VM` fails with access denied; works from the console	CredSSP (single-hop) in use, or Kerberos delegation not set	`Get-VMHost -ComputerName hv01 | Select VirtualMachineMigrationAuthenticationType`	Set Kerberos + configure constrained delegation both directions
2	Delegation configured, migration still fails	Ticket cache hasn’t picked up the AD change	Was the source host rebooted after the `msDS-AllowedToDelegateTo` edit?	Reboot the source host (or wait for ticket renewal)
3	Migration fails at setup: processor error	Destination CPU exposes fewer features than the guest uses	Compare CPU models; `Get-VMProcessor -VMName x | Select CompatibilityForMigrationEnabled`	Enable `CompatibilityForMigrationEnabled` (VM off)
4	Cannot migrate Intel host ↔ AMD host at all	Cross-vendor live migration is unsupported	Check both hosts’ CPU vendor	Not possible with a running or saved VM; shut the guest down and move it cold (export/import), or rebuild
5	Migration saturates production/management NIC	`UseAnyNetworkForMigration` still `$true`	`Get-VMHost | Select UseAnyNetworkForMigration`; `Get-VMMigrationNetwork`	Set `$false`; add only the dedicated migration subnet
6	SMB transport chosen but migration is slow	RDMA silently fell back to plain SMB	`Get-SmbMultichannelConnection`; `*RdmaCapable` = False	Check `Get-NetAdapterRdma` and enable RDMA on the migration NICs (`Enable-NetAdapterRdma`); configure DCB/PFC end to end where the NICs use RoCE
7	Live migration over SMB fails to start	Port 6600 blocked on the migration subnet	Test 6600 to the destination; check host firewall / ACLs	Open TCP 6600 (and 445) on the migration network
8	Shared-nothing move fails pulling files	`cifs` SPN not delegated (only the migration service was)	`Get-ADComputer <src> -Prop msDS-AllowedToDelegateTo` lacks `cifs/*`	Add both `cifs/<fqdn>` and `cifs/<short>` SPNs
9	Replica health shows `Warning` then `Critical`	WAN can’t keep up; deltas falling behind	`Measure-VMReplication`; check WAN utilization	Replicate bulk VMs less often (900s interval); verify the link/listener
10	A `Critical` replica triggers a resync that cripples the WAN	Auto-resync unbounded, running in business hours	`Get-VMReplication | Select AutoResynchronize*`	Bound resync to an off-peak window
11	Initial replication of a large VM never finishes	Full VHDX copy over a thin WAN	`Measure-VMReplication` stuck in `InitialReplicationInProgress`	Seed from a backup restore on the DR host instead
12	Replica listener not reachable from the primary	Firewall rule disabled, or wrong transport/port	`Get-NetTCPConnection -LocalPort 80 -State Listen` on DR host	Enable the Replica HTTP/HTTPS listener firewall rule
13	Certificate-based replication won’t establish	Cert missing Client or Server Authentication EKU	Inspect the cert’s Enhanced Key Usage	Reissue cert with both EKUs on each host
14	Replica health `Normal` but the VM won’t boot after failover	Never tested; missing driver / corrupt point	Run `Start-VMFailover -AsTest` and try to boot	Fix the guest; schedule quarterly test failovers
15	After unplanned failover, DR VM isn’t protected	Replication not re-established post-failover	`Measure-VMReplication` shows no relationship	Reverse replication and resync to the returned primary
16	Data lost after failover more than expected	Used unplanned failover when the primary was healthy	Which failover type was run?	Use planned failover (flushes final delta) when primary is up
17	Storage migration is painfully slow / contends	Multiple storage moves hitting the same volume	`Get-VMHost | Select MaximumStorageMigrations`	Serialize per-volume; parallelize only across volumes
18	VM stuck in a half-migrated state after a stall	Migration interrupted (network drop mid-copy)	Check both hosts for the VM; `Get-VM` on each	Cancel/clean up the partial move; retry on a healthy link
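
Rows 1, 5, and 17 all come down to properties that Get-VMHost already reports, so they can be checked in one pass. The function below is a sketch: Test-MigrationHostSetting is a hypothetical name, and it only reads the properties named in the table, so it accepts any object that exposes them.

```powershell
# Hypothetical pre-flight check over Get-VMHost output (rows 1, 5, and 17).
function Test-MigrationHostSetting {
    param([Parameter(Mandatory, ValueFromPipeline)] $VMHost)
    process {
        $findings = @()
        if ("$($VMHost.VirtualMachineMigrationAuthenticationType)" -ne 'Kerberos') {
            $findings += 'Row 1: not using Kerberos; remote Move-VM will hit the double hop'
        }
        if ($VMHost.UseAnyNetworkForMigration) {
            $findings += 'Row 5: any network allowed; migration can land on production NICs'
        }
        [pscustomobject]@{
            Host                     = $VMHost.ComputerName
            MaximumStorageMigrations = $VMHost.MaximumStorageMigrations  # row 17, informational
            Findings                 = $findings
            Healthy                  = ($findings.Count -eq 0)
        }
    }
}
```

Typical use would be Get-VMHost -ComputerName hv01, hv02 | Test-MigrationHostSetting. It deliberately does not verify delegation itself (rows 2 and 8), which lives in Active Directory rather than in host settings.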

Three distinctions that save the most time, because getting them backwards is expensive:

Distinction	The trap	How to tell them apart
Live migration vs node failover	Expecting zero-downtime survival of a crashed host	Live migration = planned, memory pre-copied; failover = crash, guest reboots from CSV state
Planned vs unplanned failover	Running unplanned when the primary was reachable, losing an interval of data	Planned flushes the final delta (zero loss); unplanned does not
`Normal` health vs bootable replica	Trusting green health as proof of recoverability	Only a test failover proves the guest actually boots

Best practices

Use Kerberos constrained delegation, never CredSSP, for any managed estate. Configure it both directions between every host pair, delegate both services (Microsoft Virtual System Migration Service and cifs) in both name forms, and reboot the hosts after the AD edit so the ticket cache refreshes.
Give live migration a dedicated network — UseAnyNetworkForMigration $false and pin it to a subnet not carrying guest or management traffic, so an evacuation never degrades production.
Match the transport to the NICs — SMB (with SMB Direct/RDMA) on fast hardware, Compression on constrained 1 GbE with spare CPU, TCP/IP only when you want neither — and verify RDMA is actually engaging rather than silently falling back.
Enable processor compatibility mode on every VM in a mixed-generation estate so migration never fails at setup on a CPU-feature mismatch — and remember cross-vendor (Intel↔AMD) migration is impossible.
Tier Replica frequency by workload (30s tier-1, 300s general, 900s bulk) rather than leaving everything at the default — frequency is a bandwidth decision.
Bound auto-resync to an off-peak window so a Critical event can never re-hash a disk during business hours, and set a sane MaximumStorageMigrations so storage moves don’t contend.
Seed initial replication for large VMs from a backup restore or out-of-band export — never crawl a multi-terabyte VHDX across a thin WAN.
Keep recovery points and VSS snapshots for anything that can be corrupted — stateless VMs can live at RecoveryHistory 0, but a database wants history and application-consistent snapshots.
Run test failovers on a schedule (quarterly minimum) — Normal health proves data is arriving, not that the VM boots; the untested replica is the one that fails you.
Know your three failover procedures cold and document runbooks — planned (zero loss, primary healthy), unplanned (interval loss, primary gone), test (isolated) — including the Set-VMReplication -Reverse step that must follow every real failover.
Monitor Measure-VMReplication health with alerting, not an occasional manual check, so Warning is caught before it becomes Critical.
Never confuse live migration, clustering failover, and Replica — planned mobility, automatic node-failure survival (a reboot), and cross-site DR respectively. A VM often needs more than one.
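
For the monitoring practice above, the smallest useful building block is a function that collapses replication health into a single alert level a scheduler or monitoring agent can act on. This is a sketch: Get-ReplicaAlertLevel is a hypothetical name, and it assumes only that each input object carries a Health property with the values Normal, Warning, or Critical.

```powershell
# Hypothetical helper: reduce a set of replication health records to one alert level.
function Get-ReplicaAlertLevel {
    param([object[]] $Replication)
    # Normalize each Health value to a string so enum and string inputs both work.
    $health = @($Replication | ForEach-Object { "$($_.Health)" })
    if ($health -contains 'Critical') { return 'Critical' }
    if ($health -contains 'Warning')  { return 'Warning' }
    return 'OK'
}

# Example wiring (requires the Hyper-V module, so it is left commented out here):
# $level = Get-ReplicaAlertLevel -Replication (Measure-VMReplication -ComputerName hv02)
# if ($level -ne 'OK') { Write-Warning "Replica health is $level" }
```

Note that an empty input returns OK, so a production check should also alert when the expected VMs are missing from the list.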

Security notes

Migration and replication both move a VM’s entire memory and disk across the network, so the security story is mostly about protecting that data in transit and scoping who can trigger it.

Live migration traffic can be unencrypted by default. Compression and TCP/IP send memory in the clear; SMB with SMB Encryption (or an isolated migration network) protects it. A captured live-migration stream is a captured copy of the running VM’s RAM, secrets included — so encrypt the SMB transport or isolate the subnet on any network you don’t fully trust.
Kerberos constrained delegation is least-privilege by design — keep it that way. It grants the source the right to delegate only the two named services to only the named destinations, unlike unconstrained delegation which trusts the host to impersonate to anything. Never widen it, and audit msDS-AllowedToDelegateTo so it lists only real targets.
Replica over Kerberos/HTTP is unencrypted on the wire — only acceptable over a private/VPN link. Across untrusted networks, use certificate/HTTPS with a mutual-TLS certificate (both Server and Client Authentication EKUs) on each host.
Scope the Replica authorization entry tightly. -AllowedPrimaryServer should name the real primaries (or a specific wildcard like *.contoso.local), not *; use -TrustGroup to isolate tenants.
Protect the replica storage like production data — the replica VHDX is your production VM’s data. Apply the same encryption-at-rest (BitLocker), access control, and backup posture to R:\Replicas as to the primary volumes.
Restrict who can initiate migrations and failovers — scope Hyper-V administrative rights (Hyper-V Administrators, or delegated VMM roles) so an arbitrary user cannot evacuate a host or fail a VM to DR.
Firewall the ports precisely. Open TCP 6600/445 only on the migration subnet, and the Replica listener (80/443) only from the primaries to the replica server.
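
The delegation audit above can be sketched as a pure check over the attribute's values. Find-UnexpectedDelegation is a hypothetical helper; it assumes SPNs in the usual service/host form and treats anything outside the two migration services, or pointing at a host you did not list, as unexpected.

```powershell
# Hypothetical audit helper: return SPNs that should not be in msDS-AllowedToDelegateTo.
function Find-UnexpectedDelegation {
    param(
        [string[]] $Spn,          # values read from msDS-AllowedToDelegateTo
        [string[]] $AllowedHost   # short and FQDN names of legitimate destinations
    )
    $allowedService = 'Microsoft Virtual System Migration Service', 'cifs'
    foreach ($entry in $Spn) {
        # Split "service/host[:port]" into its service and target parts.
        $service, $target = $entry -split '/', 2
        $hostOnly = ($target -split ':')[0]
        if ($allowedService -notcontains $service -or $AllowedHost -notcontains $hostOnly) {
            $entry
        }
    }
}
```

Fed from the directory, the input would be (Get-ADComputer hv01 -Properties msDS-AllowedToDelegateTo).'msDS-AllowedToDelegateTo', with -AllowedHost listing hv02 and hv02.contoso.local. Any output is a line to investigate or remove.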

Cost & sizing

Hyper-V live migration and Replica are features of Windows Server, not separately licensed products — the cost is the infrastructure that makes them fast plus host licensing.

Cost driver	What drives it	Rough magnitude	How to right-size
Migration network NICs	Speed and RDMA capability of the dedicated migration NICs	25 GbE RDMA NICs ≈ ₹25,000–60,000 (~$300–700) per port	Match to VM density and RAM; 10 GbE suffices for small estates
DR host + storage	The replica server must hold copies of every protected VM	A full second copy of protected VMs’ storage	Right-size DR storage to protected-VM footprint + recovery points
WAN bandwidth	Inter-site link carrying Replica deltas	Recurring carrier cost; the usual bottleneck	Tier frequency; seed large VMs; schedule resync
Recovery-point storage	`RecoveryHistory` and VSS snapshots on the replica	Each point adds delta storage on DR	More points for critical VMs only; 0 for stateless
Windows Server licensing	Datacenter vs Standard edition, per host	Datacenter licenses unlimited VMs per host	Datacenter pays off above ~11–14 VMs/host
CPU headroom	Compression transport burns host CPU	Opportunity cost during evacuations	Prefer SMB/RDMA to offload the copy off the CPU

Sizing guidance grounded in the mechanisms:

Migration network throughput scales evacuation time. A host with 512 GB of RAM across its VMs has roughly 4,096 gigabits to move: more than an hour at 1 GbE line rate versus under three minutes at 25 GbE, before dirty-page re-copy and protocol overhead. If you patch on a schedule, the migration network bounds your maintenance window. Size it to aggregate RAM per host, not a single VM.
Replica frequency is a WAN-bandwidth budget, not a free dial. 30s on a 2 TB SQL VM can generate more delta than a thin WAN can ship, forcing it perpetually behind. Compute the change rate against the link before promising a tight RPO.
Recovery points multiply DR storage. Reserve deep history for VMs that can be silently corrupted (databases); keep stateless VMs at RecoveryHistory 0.
Windows Server edition governs density economics. Datacenter licenses unlimited Windows guests per host — above roughly a dozen VMs per host it beats stacking Standard licenses. Live migration, Replica, and clustering carry no extra charge on either edition.
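
The first two sizing points reduce to arithmetic you can run before buying anything. The functions below are a back-of-envelope sketch with hypothetical names. The 0.8 link-efficiency factor is an assumption, the evacuation figure ignores dirty-page re-copy, and the replica figure is an average that real bursts will exceed.

```powershell
# Rough evacuation time in minutes: aggregate VM RAM over usable link throughput.
function Get-EvacuationEstimate {
    param(
        [double] $RamGB,
        [double] $LinkGbps,
        [double] $Efficiency = 0.8   # assumed usable fraction of line rate
    )
    $seconds = ($RamGB * 8) / ($LinkGbps * $Efficiency)
    [math]::Round($seconds / 60, 1)
}

# Average WAN bandwidth (Mbps) needed to ship a measured change rate.
function Get-ReplicaRequiredMbps {
    param([double] $ChangedGBPerHour)
    [math]::Round($ChangedGBPerHour * 8 * 1000 / 3600, 1)
}

Get-EvacuationEstimate -RamGB 512 -LinkGbps 1     # about 85 minutes
Get-EvacuationEstimate -RamGB 512 -LinkGbps 25    # about 3.4 minutes
Get-ReplicaRequiredMbps -ChangedGBPerHour 20      # about 44 Mbps sustained
```

If the required Mbps is close to what the WAN can sustain, the replica will fall behind at peak regardless of the frequency setting, which is the case for a longer interval or a bigger link.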

Interview & exam questions

Q: What is the difference between live migration and quick migration? Live migration copies a running VM’s memory iteratively and switches execution with a sub-second blackout, so there is essentially no downtime. Quick migration saves state to disk, moves ownership, and restores it, incurring real downtime for the save-and-restore; it survives as a cluster fallback for cases live migration can’t handle. Maps to Windows Server / Hyper-V administration certifications.

Q: Why can’t you initiate a live migration remotely when using CredSSP? CredSSP delegates credentials only one hop. From your workstation it’s a double hop — you authenticate to the source, and the source must then authenticate to the destination on your behalf — which CredSSP can’t do. You must be logged on interactively at the source host. Kerberos constrained delegation removes this by trusting the computer account.

Q: Which two services must be delegated for shared-nothing live migration, and why two? Microsoft Virtual System Migration Service (the migration control channel) and cifs (SMB file access, so the destination can pull the VHDX and config files). Shared-nothing moves the disk over the network, which is an SMB file transfer — hence cifs in addition to the migration service. Both are delegated from the source, to the destination’s SPNs, in both name forms.

Q: A team configured Kerberos delegation but migrations still fail. What’s the most likely cause? The hosts’ Kerberos ticket cache hasn’t picked up the AD change to msDS-AllowedToDelegateTo. Delegation edits aren’t effective until the ticket cache reflects the new state — reboot the affected hosts (or wait for ticket renewal, up to ~10 hours). This is the single most common “delegation doesn’t work” cause.

Q: Describe the memory-copy phases of a live migration. Setup (destination validates and allocates); initial memory copy (working set copied while the VM runs — the brownout); iterative pre-copy (pages dirtied during the copy re-sent repeatedly until the dirty set is small); blackout (VM briefly suspended, last dirty pages plus CPU/device state transfer); cleanup (VM resumes on the destination). The blackout is the only true pause and must stay sub-second.
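
To see why the iterative phase converges, or fails to, a toy model helps. The sketch below is not Hyper-V's actual algorithm. The function name, the fixed dirty rate, the 0.5-second blackout budget, and the 10-round cap are all assumptions chosen for illustration. Each round sends what is outstanding, and whatever the guest dirtied during that send becomes the next round's work.

```powershell
# Toy model of iterative pre-copy: how many passes until the remainder fits the blackout?
function Get-PreCopyRounds {
    param(
        [double] $MemoryGB,
        [double] $DirtyRateGbps,            # assumed constant rate the guest dirties memory
        [double] $LinkGbps,
        [double] $BlackoutBudgetSec = 0.5,  # assumed acceptable pause
        [int]    $MaxRounds = 10            # assumed cap on pre-copy passes
    )
    $toSend = $MemoryGB * 8                 # gigabits outstanding
    for ($round = 1; $round -le $MaxRounds; $round++) {
        $copyTime = $toSend / $LinkGbps
        if ($copyTime -le $BlackoutBudgetSec) {
            # The remainder is small enough to send while the VM is paused.
            return [pscustomobject]@{
                Rounds = $round - 1; FinalCopySec = [math]::Round($copyTime, 3); Converged = $true
            }
        }
        # Pages dirtied during this pass must be re-sent in the next one.
        $toSend = [math]::Min($MemoryGB * 8, $copyTime * $DirtyRateGbps)
    }
    [pscustomobject]@{
        Rounds = $MaxRounds; FinalCopySec = [math]::Round($toSend / $LinkGbps, 3); Converged = $false
    }
}
```

With 16 GB of memory, a 1 Gbps dirty rate, and a 10 Gbps link, the model converges after two pre-copy passes with a final copy of about 0.128 seconds. Raise the dirty rate to match the link and it never converges, which is the modeled version of a busy VM that struggles to migrate on a slow network.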

Q: When does live migration only move memory versus memory and disk? Memory-only when the VHDX sits on storage both hosts can see — a Cluster Shared Volume in a cluster, or an SMB 3 file share. Memory and disk (shared-nothing) when the VHDX is on the source host’s local disk and must physically move across the network. The storage topology, not a switch, determines which happens.

Q: What’s the difference between SMB, Compression, and TCP/IP as live-migration performance options? TCP/IP sends memory raw over TCP. Compression compresses pages on the host CPU before sending — good on constrained links with spare CPU. SMB rides SMB 3, enabling SMB Direct (RDMA, offloading the copy to the NIC) and SMB Multichannel (aggregating links) — the fastest and lowest-CPU option on RDMA-capable NICs. Choose based on your network speed and whether you have RDMA.

Q: What’s the RPO of Hyper-V Replica, and why isn’t it zero? Up to one replication interval — 30, 300, or 900 seconds depending on -ReplicationFrequencySec. Replica is asynchronous: it ships a change log on a cadence rather than mirroring every write synchronously, so the replica is always slightly behind. An unplanned failover loses whatever hadn’t shipped yet. Planned failover flushes the final delta first and loses nothing.

Q: Distinguish planned, unplanned, and test failover. Test spins up an isolated copy to verify it boots, disrupting nothing (run it routinely). Planned is for a healthy primary — flushes the final delta before cutover (zero loss), then reverses direction. Unplanned is for a gone primary — no final flush, so you lose up to one interval; run on the replica, optionally to an earlier recovery point.

Q: If a cluster node dies suddenly, do its VMs live-migrate to a survivor? No. Live migration requires pre-copying memory from a running source; a dead host offers none. The cluster fails over the VMs — restarting them on a surviving node from their last-persisted state on the CSV, which is a guest reboot, not a zero-downtime move. Live migration protects planned events; failover protects against sudden node loss.

Q: What is processor compatibility mode and when do you need it? A per-VM setting that presents the guest a lowest-common-denominator CPU feature set, masking newer instructions so the VM can live-migrate between different generations of the same-vendor CPU. You need it in a mixed-generation cluster; enable it (VM off) on affected VMs. It does not enable cross-vendor (Intel↔AMD) migration, which is impossible.

Q: What does a Normal Replica health actually prove, and what does it not? It proves the delta log is arriving on schedule — data is replicating. It does not prove the replica VM will boot: a missing integration driver, a corrupt recovery point, or a config problem can leave a healthy-replicating VM unbootable. Only a test failover proves recoverability, which is why you run one quarterly.

Quick check

Between two standalone Hyper-V hosts with no shared storage, which live-migration type do you use, and what moves?
You edited msDS-AllowedToDelegateTo and delegation still fails. What’s the one step you probably skipped?
Which performance option should you pick on 25 GbE RDMA NICs, and why?
The primary site is healthy and you’re doing planned DC maintenance. Which failover type gives zero data loss?
What does a Normal Replica health not prove, and how do you prove it?

Answers

Shared-nothing live migration; both the VM’s memory and its VHDX move across the network (use Move-VM -IncludeStorage -DestinationStoragePath).
Rebooting the hosts (or waiting for ticket renewal) so the Kerberos ticket cache picks up the new delegation state — the number-one silent failure.
SMB — it enables SMB Direct (RDMA), which offloads the memory copy to the NIC at near-wire-speed and leaves the host CPU almost untouched; compression would waste CPU when bandwidth is already ample.
Planned failover — it flushes the final delta from primary to replica before cutover, so nothing is lost, then reverses direction. (Unplanned failover would lose up to one interval.)
It doesn’t prove the replica VM actually boots — only that data is arriving. Prove bootability by running a test failover (Start-VMFailover -AsTest) and confirming the isolated copy comes up.

Glossary

Live migration — Moving a running VM between hosts with sub-second downtime via iterative memory pre-copy and a final blackout cutover.
Quick migration — Save-state → move → restore-state migration; incurs real downtime; a cluster fallback for cases live migration can’t handle.
Shared-nothing live migration — Live migration between hosts that share no storage; moves memory and the VHDX over the network.
Storage migration — Moving a VM’s VHDX to a new path/volume while the VM keeps running on the same host.
Brownout — The pre-copy phase where the VM keeps running while its memory is copied to the destination.
Blackout — The brief (sub-second) window where the VM is suspended to transfer the last dirty pages and CPU/device state.
CredSSP — Credential Security Support Provider; delegates the user’s credentials one hop, requiring interactive logon at the source to migrate.
Kerberos constrained delegation — Configuring a computer account in AD to delegate specific services to specific targets; enables remote-initiated migration.
msDS-AllowedToDelegateTo — The AD attribute listing the SPNs a computer account is allowed to delegate to (the constrained-delegation target list).
SMB Direct — RDMA transport for SMB 3; offloads the data copy to the NIC hardware for near-wire-speed, low-CPU transfer.
SMB Multichannel — SMB 3 feature aggregating multiple NICs/queues into one logical channel for higher throughput and resilience.
Cluster Shared Volume (CSV) — A volume all cluster nodes can read/write concurrently; because every node sees the VHDX, clustered live migration moves only memory.
Failover clustering — Grouping hosts so VMs restart on a surviving node when one fails (a reboot-based recovery, distinct from live migration).
Hyper-V Replica — Asynchronous change-shipping of a VM to a second copy for disaster recovery, with an RPO of one replication interval.
Hyper-V Replication Log (HRL) — The file tracking a VM’s disk writes between replication cycles; shipped to the replica each interval.
Recovery point — A crash-consistent snapshot retained on the replica so you can fail over to an earlier moment.
VSS snapshot — An application-consistent recovery point (via Volume Shadow Copy Service) for VSS-aware guests like SQL Server and Exchange.
RPO (Recovery Point Objective) — The maximum acceptable data loss; for Replica, up to one -ReplicationFrequencySec interval.
Processor compatibility mode — A per-VM setting masking CPU features to the lowest common denominator so VMs migrate across same-vendor CPU generations.
Test / planned / unplanned failover — Isolated boot test / zero-loss cutover from a healthy primary / interval-loss cutover from a gone primary, respectively.

Next steps

Build the cluster and shared storage this article migrates across in Windows Failover Clustering and Storage Spaces Direct: A Production Build.
Automate node-draining live migrations during patching with Patching Failover Clusters with Cluster-Aware Updating and Stretch Clusters via Storage Replica.
Keep the directory that underpins Kerberos delegation healthy with Active Directory Replication and FSMO Troubleshooting with repadmin and dcdiag.
Enforce host migration settings as code across the fleet using Configuration Management for Windows Server with PowerShell DSC and Ansible.
Harden the hosts and the WSUS pipeline that patches them via Hardening Windows Server and Building a Reliable WSUS Patch Pipeline.