When ransomware detonates inside an Active Directory forest, the question is not “which DC do I restore.” It is whether you can trust any surviving DC at all. Once a forest is fully compromised, every DC is a suspect: SYSVOL GPOs may be weaponized, the schema poisoned, and the attacker almost certainly holds a Golden Ticket forged with the stolen krbtgt key. Forest recovery is the deliberate teardown and rebuild of the entire forest from known-good backups in an environment the attacker cannot reach. This is the runbook I build and rehearse for that day.
Scope note: this is full forest recovery as described in the Microsoft AD Forest Recovery Guide, where you rebuild the forest from system-state backups in isolation. It is not object-level recovery (AD Recycle Bin, Restore-ADObject) and not a single-DC rebuild. If you can trust the rest of the forest, you do not need this. If you cannot, nothing short of this is safe.
1. Forest recovery vs object restore: knowing which fight you are in
Reach for object restore when the damage is bounded and the forest is otherwise trustworthy:
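- A few thousand objects deleted? If the AD Recycle Bin was already enabled before the deletion, use Restore-ADObject: you are reanimating, not rebuilding. Enabling the Recycle Bin afterward does not bring back objects deleted before it was turned on; those typically need an authoritative restore from backup instead.
- One DC corrupted or hardware-dead? Force-remove it, clean metadata, and re-promote a replica. The forest is fine.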
Reach for full forest recovery only when one of these is true:
- All DCs are encrypted, wiped, or otherwise non-functional.
- A DC was compromised at the OS level and you cannot prove the attacker did not alter the database, schema, or SYSVOL on every replica.
- krbtgt is suspected stolen and Golden Tickets are in play forest-wide.
The trap is treating a forest-wide compromise as an object problem: restore objects into a forest the attacker still controls and you restore into their hands. The decision gate is trust, not blast radius. When in doubt, recover the forest — over-recovering costs a weekend; under-recovering hands the keys back.
The recovery target is one writable DC per domain, restored from backup, fully isolated, cleansed, and used to seed a freshly rebuilt forest. Additional DCs, member servers, and trusts are rebuilt around that seed.
2. Backup strategy: system-state, offline copies, ransomware-proofing
Your recovery is only as good as the backup the attacker could not touch. Three non-negotiables:
Back up system state, not files. You need the NTDS database, SYSVOL, registry, and COM+ class registration — wbadmin captures all of it:
# Scheduled daily system-state backup to a dedicated, ACL-locked volume.
wbadmin start systemstatebackup -backupTarget:E: -quiet
Back up at least two DCs per domain (including an FSMO holder), and keep backups younger than the tombstone lifetime, which defaults to 180 days in forests created on Windows Server 2003 SP1 or later. A backup older than that is not a supported restore source for a DC. Objects that were deleted and garbage-collected elsewhere since the backup was taken would come back as lingering objects: zombies that replicate forest-wide.
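The age check is plain date arithmetic, but it is worth scripting so nobody eyeballs it mid-incident. The sketch below is minimal Python. It assumes you have already read the forest's real tombstoneLifetime value, and that you know each backup's timestamp. The 180-day constant is only a fallback.

```python
from datetime import datetime, timedelta

# Fallback only: 180 days is the default for forests created on Windows Server 2003 SP1
# or later. Read the real value from the tombstoneLifetime attribute on
# CN=Directory Service,CN=Windows NT,CN=Services,CN=Configuration,<forest root DN>.
DEFAULT_TOMBSTONE_LIFETIME_DAYS = 180


def backup_age_days(backup_time: datetime, now: datetime) -> int:
    """Whole days elapsed since the backup was taken."""
    return (now - backup_time).days


def is_within_tombstone_lifetime(backup_time: datetime, now: datetime,
                                 tsl_days: int = DEFAULT_TOMBSTONE_LIFETIME_DAYS) -> bool:
    """True if the backup is strictly younger than the tombstone lifetime."""
    return now - backup_time < timedelta(days=tsl_days)


def days_of_margin(backup_time: datetime, now: datetime,
                   tsl_days: int = DEFAULT_TOMBSTONE_LIFETIME_DAYS) -> int:
    """Days left before this backup ages past the tombstone lifetime (negative = too old)."""
    return tsl_days - backup_age_days(backup_time, now)


if __name__ == "__main__":
    now = datetime(2026, 6, 1)
    backup = datetime(2026, 5, 2)  # hypothetical backup timestamp
    print(is_within_tombstone_lifetime(backup, now), days_of_margin(backup, now))
```

Run it against every candidate backup in the quarterly test. A shrinking margin is your early warning that retention or backup scheduling has drifted.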
Keep offline, immutable copies. Online backups reachable from the domain are exactly what ransomware encrypts first. You need copies the production identity plane cannot reach or delete:
- Azure Backup with a Recovery Services vault that has immutability locked and multi-user authorization (MUA) enabled, so even a compromised backup admin cannot shorten retention or delete recovery points.
- Or write-once media / a pull-based backup host in a separate trust domain that authenticates to the DCs, never the reverse.
# Lock vault immutability so retention cannot be reduced (irreversible once locked).
az backup vault create \
--name rsv-ad-forest-recovery \
--resource-group rg-identity-dr \
--location eastus2 \
--immutability-state Locked
Protect the recovery credentials themselves. Store the DSRM password, FSMO role-holder map, and the runbook offline — printed in a safe, plus a break-glass vault that does not depend on the very AD you are recovering. If the only copy of your procedure authenticates against the dead forest, you do not have a procedure.
Test the backup, not just the backup job. A green backup report proves bytes were written. It does not prove they restore into a bootable DC. Restore one quarterly (Step 8) or you are running on faith.
3. Designing an isolated recovery environment (IRE)
The single biggest reason forest recoveries fail is restoring into a network the attacker still owns. The restored DC must come up air-gapped from production until proven clean. Build the IRE before the incident:
- A dedicated isolated VLAN or VNet with no route to production, no peering, no shared DNS, no VPN. In Azure, a standalone VNet with no peering and an NSG that denies all but intra-subnet traffic; on-prem, a physically separate switch.
- Its own time source. Kerberos dies if clocks skew past five minutes, so the IRE needs an NTP source independent of the production PDC emulator you are rebuilding.
- Standalone DNS inside the IRE. The restored DC hosts AD-integrated DNS; nothing should resolve against production.
- A clean management host — a freshly imaged jump box, not a production admin workstation that may itself be compromised.
- Capacity for the full seed set: one DC per domain plus the staging host.
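The five-minute rule is easy to check before anything authenticates. The sketch below is minimal Python. It assumes you have already collected a clock reading from each IRE host (however you gather them) and use the IRE's NTP source as the reference. The host names are hypothetical.

```python
from __future__ import annotations

from datetime import datetime, timedelta

# Default Kerberos policy "Maximum tolerance for computer clock synchronization": 5 minutes.
MAX_KERBEROS_SKEW = timedelta(minutes=5)


def hosts_outside_tolerance(reference: datetime,
                            host_clocks: dict[str, datetime],
                            tolerance: timedelta = MAX_KERBEROS_SKEW) -> list[str]:
    """Return hosts whose clock differs from the IRE time source by more than the tolerance."""
    return sorted(
        name for name, clock in host_clocks.items()
        if abs(clock - reference) > tolerance
    )


if __name__ == "__main__":
    ntp = datetime(2026, 6, 1, 9, 0, 0)
    # Hypothetical readings gathered from IRE hosts.
    readings = {
        "DC-RECOVERY-01": ntp + timedelta(seconds=40),
        "JUMP-01": ntp - timedelta(minutes=7),
    }
    print(hosts_outside_tolerance(ntp, readings))
```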
The contract: nothing the attacker controls reaches the IRE, and nothing in the IRE phones home to production. You lift the gap only once health is validated (Step 7).
4. Recovery sequence: first DC, FSMO seizure, metadata cleanup
This is the core of the runbook. Order matters; do not improvise it live.
4.1 Restore the first writable DC of the forest root. Restore the chosen backup into the IRE and boot it into Directory Services Restore Mode (DSRM) using the DSRM password you stored offline — not normal mode, because you do not want this DC replicating or advertising yet.
For a system-state restore that performs an authoritative SYSVOL restore in one shot:
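# Identify the system-state backup version on the backup volume, then restore it.
wbadmin get versions -backupTarget:E:
# -backupTarget points wbadmin at E: even if this rebuilt server's local catalog does not list the backup.
wbadmin start systemstaterecovery -version:06/01/2026-09:00 -backupTarget:E: -authsysvol -quiet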
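Picking the version deserves care. After ransomware, the newest backup is not automatically the right one, because it may already contain the attacker's changes. The minimal Python sketch below scans captured wbadmin get versions output for Version identifier lines. It returns the newest one taken before your earliest suspected compromise time. It assumes the identifiers use the MM/DD/YYYY-HH:MM form seen in the command above. Confirm which time zone your identifiers use before comparing.

```python
from __future__ import annotations

import re
from datetime import datetime

# Matches lines such as "Version identifier: 06/01/2026-09:00" in captured wbadmin output.
VERSION_LINE = re.compile(r"Version identifier:\s*(\d{2}/\d{2}/\d{4}-\d{2}:\d{2})")
VERSION_FORMAT = "%m/%d/%Y-%H:%M"


def version_identifiers(output: str) -> list[str]:
    """All version identifiers found in the captured command output, in order."""
    return VERSION_LINE.findall(output)


def newest_before(output: str, compromise_time: datetime) -> str | None:
    """Newest backup version taken strictly before the suspected compromise, or None."""
    candidates = [
        v for v in version_identifiers(output)
        if datetime.strptime(v, VERSION_FORMAT) < compromise_time
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda v: datetime.strptime(v, VERSION_FORMAT))
```

Treat the compromise time as the earliest evidence of attacker activity, not the moment of encryption. Dwell time is exactly what this filter is meant to exclude.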
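After restore, but before the DC reaches the network, disable replication so it cannot pull from a (potentially compromised, currently unreachable) partner. repadmin needs AD DS running, and AD DS is offline in DSRM. Run this on the first normal-mode boot, while the NIC is still disconnected or confined to the IRE: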
# Block inbound and outbound replication until the forest is rebuilt and trusted.
repadmin /options localhost +DISABLE_INBOUND_REPL +DISABLE_OUTBOUND_REPL
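4.2 Seize the FSMO roles. The original role holders are gone (or untrusted). Seize all five roles onto this restored forest-root DC. In each child domain, seize only the three domain-wide roles (PDC emulator, RID master, infrastructure master) onto that domain's restored DC, because the schema master and domain naming master exist only once per forest. Seize rather than transfer: a transfer requires the old holder online, which by definition it is not: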
# Seize all five FSMO roles onto the restored DC.
Move-ADDirectoryServerOperationMasterRole `
-Identity "DC-RECOVERY-01" `
-OperationMasterRole SchemaMaster,DomainNamingMaster,PDCEmulator,RIDMaster,InfrastructureMaster `
-Force
-Force triggers the seizure path when the holder is unreachable. The classic ntdsutil roles “seize” commands do the same; the PowerShell cmdlet is cleaner and scriptable.
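Seizing all five roles applies to the forest root only. The schema master and domain naming master exist once per forest, while each domain has its own PDC emulator, RID master, and infrastructure master. The small Python sketch below builds the seize command for each domain's seed DC from your offline FSMO map. The DC names are hypothetical. The output is text for you to review and run; the script does not execute anything.

```python
from __future__ import annotations

# Forest-wide roles live only in the forest root; domain-wide roles exist in every domain.
FOREST_WIDE_ROLES = ("SchemaMaster", "DomainNamingMaster")
DOMAIN_WIDE_ROLES = ("PDCEmulator", "RIDMaster", "InfrastructureMaster")


def roles_to_seize(domain: str, forest_root: str) -> tuple[str, ...]:
    """All five roles for the forest root domain, the three domain-wide roles elsewhere."""
    if domain.casefold() == forest_root.casefold():
        return FOREST_WIDE_ROLES + DOMAIN_WIDE_ROLES
    return DOMAIN_WIDE_ROLES


def seize_command(seed_dc: str, domain: str, forest_root: str) -> str:
    """Build the Move-ADDirectoryServerOperationMasterRole line for one domain's seed DC."""
    roles = ",".join(roles_to_seize(domain, forest_root))
    return (
        "Move-ADDirectoryServerOperationMasterRole "
        f'-Identity "{seed_dc}" -OperationMasterRole {roles} -Force'
    )


if __name__ == "__main__":
    # Hypothetical seed DCs per domain, taken from the offline FSMO map.
    seeds = {"contoso.com": "DC-RECOVERY-01", "child.contoso.com": "DC-CHILD-RECOVERY-01"}
    for domain, dc in seeds.items():
        print(seize_command(dc, domain, "contoso.com"))
```

Generating the lines from the map rather than typing them live removes one class of error from a high-stress step.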
4.3 Clean metadata of every other DC. AD still believes all the dead DCs exist; their NTDS Settings objects, computer accounts, DFSR members, and DNS records are stale and will poison replication. Remove every DC except the one you restored:
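# Remove the metadata of a dead/compromised DC from the directory (interactive ntdsutil session).
# The domain/site/server indexes come from each preceding "list" output; confirm the selected server is NOT the DC you are connected to before removing it.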
ntdsutil
metadata cleanup
connections
connect to server DC-RECOVERY-01
quit
select operation target
list domains
select domain 0
list sites
select site 0
list servers in site
select server 1
quit
remove selected server
quit
quit
On Windows Server 2008 and later, deleting the dead DC's computer account in Active Directory Users and Computers, or its NTDS Settings object in Active Directory Sites and Services, performs the same metadata cleanup. ntdsutil metadata cleanup remains the authoritative scrub when objects are orphaned. Afterward, delete stale DNS records (A, CNAME under _msdcs, SRV) and any lingering NTDS Settings objects in dssite.msc.
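Because the cleanup is destructive, derive the removal list from the offline DC inventory rather than typing it from memory. The sketch below is minimal Python. It assumes the inventory is a per-domain list of DC names and that you recorded which DC you restored as each domain's seed; all names are hypothetical. It refuses to produce a plan if a seed is missing from the inventory.

```python
from __future__ import annotations


def dcs_to_clean(inventory: dict[str, list[str]], seeds: dict[str, str]) -> dict[str, list[str]]:
    """Per domain: every DC in the pre-incident inventory except the restored seed."""
    plan: dict[str, list[str]] = {}
    for domain, dcs in inventory.items():
        seed = seeds.get(domain)
        if seed is None:
            raise ValueError(f"no restored seed recorded for {domain}")
        if seed.casefold() not in {dc.casefold() for dc in dcs}:
            raise ValueError(f"seed {seed} is not in the {domain} inventory")
        # Case-insensitive comparison, since NetBIOS-style DC names are not case-sensitive.
        plan[domain] = sorted(dc for dc in dcs if dc.casefold() != seed.casefold())
    return plan


if __name__ == "__main__":
    inventory = {"contoso.com": ["DC01", "DC02", "DC-RECOVERY-01"]}
    print(dcs_to_clean(inventory, {"contoso.com": "DC-RECOVERY-01"}))
```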
4.4 Raise the RID pool to prevent collisions. A restored DC has an older RID pool than the forest had at compromise. New objects could be assigned RIDs the lost DCs already issued, causing duplicate SIDs. Invalidate the current pool:
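# Invalidate the current RID pool so the restored DC discards its old block and requests a fresh one.
# Run on the restored RID master in each domain (technique from Microsoft's AD Forest Recovery Guide).
$Domain = New-Object System.DirectoryServices.DirectoryEntry
$DomainSid = $Domain.objectSid
$RootDSE = New-Object System.DirectoryServices.DirectoryEntry("LDAP://RootDSE")
$RootDSE.UsePropertyCache = $false
$RootDSE.Put("invalidateRidPool", $DomainSid.Value)
$RootDSE.SetInfo()
# Create any new security principal to force a new pool allocation, then verify:
dcdiag /test:ridmanager /v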
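Before invalidating the pool, the Microsoft Forest Recovery Guide also has you raise rIDAvailablePool on CN=RID Manager$,CN=System in each domain. The usual increment is 100,000, or more if you know more RIDs than that were issued after the backup. This makes the next pool start above anything the lost DCs handed out. Do not skip it: duplicate RIDs are silent until two objects collide months later. Re-read that section against your forest functional level before you rehearse.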
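rIDAvailablePool is a single 64-bit integer. As I understand its layout, the high 32 bits hold the RID ceiling (1,073,741,823 by default) and the low 32 bits hold the next RID the RID master will allocate. Raising it by 100,000 is therefore plain addition on the low half. The small Python sketch below does that arithmetic, which helps you double-check the value you type into ADSI Edit. The sample value is illustrative, not from a real forest.

```python
from __future__ import annotations

LOW_32_BITS = 0xFFFFFFFF


def decode_rid_available_pool(value: int) -> tuple[int, int]:
    """Split rIDAvailablePool into (next RID to allocate, RID ceiling)."""
    return value & LOW_32_BITS, value >> 32


def raise_rid_available_pool(value: int, increment: int = 100_000) -> int:
    """Return the new attribute value with the next-RID half raised by `increment`."""
    next_rid, ceiling = decode_rid_available_pool(value)
    new_next = next_rid + increment
    if new_next >= ceiling:
        raise ValueError("increment would push the next RID past the ceiling")
    return (ceiling << 32) | new_next


if __name__ == "__main__":
    current = 4611686014132422708                # illustrative: next RID 2100, ceiling 1073741823
    print(decode_rid_available_pool(current))    # (2100, 1073741823)
    print(raise_rid_available_pool(current))     # 4611686014132522708
```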
5. Resetting krbtgt twice, trust passwords, and DSRM
A restored forest still carries the secrets the attacker stole. Rotate every one while the forest is isolated, before a single production client authenticates.
5.1 Reset krbtgt twice. Golden Tickets are TGTs forged with the krbtgt key. Resetting krbtgt invalidates them, but the account's password history holds two keys: KDCs accept tickets protected by either the current or the previous krbtgt key. After a single reset, the stolen key becomes the “previous” key and Golden Tickets still validate. Reset it twice. Between resets, let the first new key replicate to every DC in the domain (or pause briefly in the single-DC case). The second reset then pushes the stolen key out of history entirely:
# First krbtgt reset. Record PasswordLastSet so you can prove the reset landed.
$before = (Get-ADUser krbtgt -Properties PasswordLastSet).PasswordLastSet
Set-ADAccountPassword -Identity krbtgt -Reset `
-NewPassword (ConvertTo-SecureString (New-Guid).Guid -AsPlainText -Force)
(Get-ADUser krbtgt -Properties PasswordLastSet).PasswordLastSet -gt $before   # expect True
# Wait out replication / key propagation to every DC, then reset a SECOND time.
Set-ADAccountPassword -Identity krbtgt -Reset `
-NewPassword (ConvertTo-SecureString (New-Guid).Guid -AsPlainText -Force)
Do this for the krbtgt of every domain. A single reset is the most common, most dangerous mistake in a hurried recovery — it feels done, and it is not.
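Why twice? A toy Python model makes the key history concrete. It is not how a KDC is implemented. It only captures the rule that a ticket validates against either the current or the previous krbtgt key.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class KrbtgtKeys:
    """Toy model: a KDC validates TGTs against the current and the previous krbtgt key."""
    current: str
    previous: Optional[str] = None

    def reset(self, new_key: str) -> None:
        # The old current key slides into the "previous" slot; the old previous key is dropped.
        self.previous, self.current = self.current, new_key

    def accepts(self, ticket_key: str) -> bool:
        return ticket_key in (self.current, self.previous)


if __name__ == "__main__":
    keys = KrbtgtKeys(current="stolen")
    keys.reset("reset-1")
    print(keys.accepts("stolen"))   # True: one reset is not enough
    keys.reset("reset-2")
    print(keys.accepts("stolen"))   # False: the stolen key is out of history
```

The model also shows why the pause between resets matters. Tickets issued with the first new key still validate after the second reset, so legitimate sessions survive while the stolen key does not.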
5.2 Reset trust passwords. Inter-domain and forest trust secrets may also be compromised. Reset the trust password on both sides of each trust:
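# Reset the trust secret on one side (child.contoso.com as the trusting domain).
netdom trust child.contoso.com /domain:contoso.com /resetOneSide /passwordT:* /userO:Administrator /passwordO:*
# Then the other side, entering the SAME trust password at the /passwordT prompt.
netdom trust contoso.com /domain:child.contoso.com /resetOneSide /passwordT:* /userO:Administrator /passwordO:*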
5.3 Reset the DSRM password and privileged accounts. Set a fresh DSRM password, and reset the built-in Administrator plus every Tier-0 account (Domain Admins, Enterprise Admins, elevated service accounts). Assume all are burned:
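# Set a new DSRM (Directory Services Restore Mode) password; "null" targets the local server.
# ntdsutil prompts for the new password. The DSRM account is local to each DC, so repeat on every DC.
ntdsutil
set DSRM password
reset password on server null
quit
quit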
Also rotate any gMSA/sMSA secrets used by Tier-0 services. The principle: every credential that existed during compromise is assumed stolen, and nothing leaves the IRE until it is rotated.
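To enforce that rule rather than trust memory, compare each Tier-0 account's last password change with the moment the restored DC came up in the IRE. Anything changed at or before that point still carries a value restored from backup, which the attacker must be assumed to know. The sketch below is minimal Python. It assumes you have exported last-change times (for example PasswordLastSet) into a dictionary. The account names are hypothetical.

```python
from __future__ import annotations

from datetime import datetime
from typing import Optional


def not_rotated_since(last_rotated: dict[str, Optional[datetime]], cutoff: datetime) -> list[str]:
    """Accounts whose secret was not changed after the cutoff (None = never rotated)."""
    return sorted(
        name for name, changed in last_rotated.items()
        if changed is None or changed <= cutoff
    )


if __name__ == "__main__":
    ire_boot = datetime(2026, 6, 1, 12, 0)  # hypothetical: restored DC first booted in the IRE
    accounts = {
        "krbtgt": datetime(2026, 6, 1, 14, 0),
        "Administrator": datetime(2026, 5, 20),
        "svc-backup": None,
    }
    print(not_rotated_since(accounts, ire_boot))
```

An empty list is the release condition; anything else stays inside the IRE.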
6. Rebuilding additional DCs and re-establishing replication
You now have one cleansed, authoritative DC per domain. Do not restore additional DCs from backup — those backups carry the same compromise. Build fresh and replicate from the clean seed.
- Stand up new Windows Server VMs in the IRE — fully patched, hardened baseline.
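- Re-enable replication on the seed DC first, still inside the IRE. A DC with outbound replication disabled rejects replication requests (typically surfacing as error 8456), including the initial sync a new replica needs. From here on, the IRE itself is the isolation boundary:
# Re-enable replication on the seed before any replica sources from it.
repadmin /options localhost -DISABLE_INBOUND_REPL -DISABLE_OUTBOUND_REPL
- Promote each new server as a replica, pulling its entire database from the clean DC:
# Promote a fresh server as an additional DC, replicating from the cleansed seed.
Install-ADDSDomainController `
-DomainName "contoso.com" `
-ReplicationSourceDC "DC-RECOVERY-01.contoso.com" `
-InstallDns `
-SiteName "RecoverySite" `
-SafeModeAdministratorPassword (Read-Host -AsSecureString) `
-Force
- Once you have enough DCs for redundancy and confidence is high, force a full sync and let convergence happen inside the IRE:
# Push all partitions, across sites, to every partner.
repadmin /syncall /AdeP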
- Redistribute FSMO roles to their intended permanent holders, and reconfigure Sites and Services subnets/site links to match the target topology you will cut back to.
Only after health validation (next section) do you lift the air gap — reconnecting on cleansed segments, reimaging or rejoining member servers and workstations (their machine secrets are also suspect), and rebuilding external trusts.
7. Verify
Do not declare recovery complete on vibes. Run an objective health gate before reintroducing the forest.
Replication and topology — look for any failures or large queues:
# Full replication health report; look for any failures or large queues.
repadmin /replsummary
repadmin /showrepl * /csv | ConvertFrom-Csv | Where-Object { [int]$_.'Number of Failures' -gt 0 }
# Comprehensive DC diagnostics across all tests.
dcdiag /v /c /e
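If you want the failure list in a report or a pipeline gate rather than on screen, the same CSV parses cleanly outside PowerShell. The sketch below is minimal Python. It assumes your export uses the column names referenced in the code: showrepl_COLUMNS, Source DSA, Destination DSA, Naming Context, Number of Failures and Last Failure Status. Verify them against the header of your own repadmin /showrepl * /csv output.

```python
from __future__ import annotations

import csv
import io


def failing_links(csv_text: str) -> list[tuple[str, str, str, int, str]]:
    """(source DSA, destination DSA, naming context, failure count, last status) per failing link."""
    failures = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row.get("showrepl_COLUMNS") != "showrepl_INFO":
            continue  # skip anything that is not a replication-link row
        count = int(row["Number of Failures"] or 0)
        if count > 0:
            failures.append((row["Source DSA"], row["Destination DSA"],
                             row["Naming Context"], count, row["Last Failure Status"]))
    return failures


if __name__ == "__main__":
    import sys
    # Usage: repadmin /showrepl * /csv > showrepl.csv, then: python this_script.py showrepl.csv
    with open(sys.argv[1], newline="") as f:
        for link in failing_links(f.read()):
            print(link)
```

Any non-empty result fails the gate; there is no "acceptable" replication failure in a forest you are about to reconnect.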
DNS: confirm the _msdcs zone, SRV records, and AD-integrated zones are healthy and that every DC registered its records:
dcdiag /test:dns /v /e
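SYSVOL / DFSR: verify that the chosen seed (PDC emulator) became the authoritative primary for DFSR and that GPOs replicated. If SYSVOL did not converge, run an authoritative SYSVOL restore:
- On the seed's CN=SYSVOL Subscription object, set msDFSR-Options to 1.
- Set msDFSR-Enabled to FALSE on every DC's SYSVOL subscription.
- Let AD replicate, then restart the DFSR service on the seed.
- Set msDFSR-Enabled back to TRUE on the seed first, then on the other DCs, which sync non-authoritatively from it.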
# Confirm SYSVOL replicates via DFSR (migration state) and the replicated folders are healthy.
dfsrmig /getmigrationstate
Get-CimInstance -Namespace "root\microsoftdfs" -ClassName DfsrReplicatedFolderInfo |
Select-Object ReplicatedFolderName, State   # State 4 = Normal, 5 = In Error
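The State column is numeric. The small Python sketch below turns it into a pass/fail list. It uses the state codes as I recall them from the DfsrReplicatedFolderInfo class documentation, so confirm them on your build. The folder name in the usage line is an assumption.

```python
from __future__ import annotations

# State values of the DfsrReplicatedFolderInfo WMI class.
DFSR_FOLDER_STATES = {
    0: "Uninitialized",
    1: "Initialized",
    2: "Initial Sync",
    3: "Auto Recovery",
    4: "Normal",
    5: "In Error",
}
NORMAL = 4


def folders_not_normal(folders: dict[str, int]) -> dict[str, str]:
    """Map each replicated folder that is not in the Normal state to a readable state name."""
    return {
        name: DFSR_FOLDER_STATES.get(state, f"Unknown ({state})")
        for name, state in folders.items()
        if state != NORMAL
    }


if __name__ == "__main__":
    # Hypothetical reading: SYSVOL still in initial sync on a new replica.
    print(folders_not_normal({"SYSVOL Share": 2}))
```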
Authentication smoke tests: prove Kerberos and the rebuilt secrets work end to end:
# From a clean client: discard cached tickets, touch SYSVOL to force a fresh TGT and service ticket, then inspect them.
klist purge
dir \\contoso.com\SYSVOL
klist
nltest /sc_verify:contoso.com
# Then: log on with a privileged account, create+delete a test object, run a GPO update.
gpupdate /force
Every one must be green before the gap comes down. A forest that replicates but cannot issue tickets, or shares SYSVOL but serves a poisoned GPO, is not recovered.
Enterprise scenario
A global manufacturer with a single-forest, three-domain AD (empty forest root, two regional child domains) took a ransomware hit that encrypted every DC across both regions inside forty minutes. Their backups were Azure Backup recovery points in a Recovery Services vault — but the vault’s backup contributor role was held by the same Tier-0 service account the attacker had compromised, and the attacker had already issued delete requests against the recovery points.
The save was multi-user authorization (MUA), enabled six months earlier for exactly this threat model. MUA puts deletion and retention-reduction operations behind a separate Azure RBAC principal protected by a resource guard. The compromised account could request deletion; it could not approve it.
# The resource guard that blocked the attacker's delete requests.
az dataprotection resource-guard create \
--name rg-guard-ad-dr \
--resource-group rg-identity-security \
--location eastus2
# Once the guard is associated with the vault, critical operations such as reducing retention
# or stopping protection with data deletion require a role on the resource guard, held in a separate scope.
With recovery points intact, they executed the runbook: restored the forest root’s PDC-emulator DC into a pre-built isolated VNet (no peering, standalone DNS, dedicated NTP), seized FSMO, cleaned metadata for the five dead DCs, reset krbtgt twice per domain, and rebuilt eight fresh replicas from the cleansed seeds. The constraint that nearly broke them was time: their stated RTO was 8 hours, but metadata cleanup and the double krbtgt reset with replication waits had never been timed. The first real run took 14 hours. They re-baselined the RTO to 12 hours, scripted the metadata cleanup, and added a quarterly one-DC restore test so the wait windows were known, not discovered. The lesson: the steps were correct on paper, but an untested RTO is a guess, and the gap between “we have backups” and “we are running again” is measured in rehearsals you actually performed.
8. Tabletop, runbook maintenance, and RTO
A forest recovery runbook decays the moment AD changes. Keep it alive:
- Quarterly restore test. Restore one DC into the IRE from real backups every quarter. This validates the backup and times the slow steps (metadata cleanup, krbtgt waits, DFSR convergence) so your RTO is measured, not aspirational.
- Annual full tabletop. Walk the entire recovery with the actual on-call team, FSMO map in hand, offline runbook open. Inject realistic constraints: “the backup admin’s account is compromised,” “the PDC emulator backup is corrupt, fail to the secondary.”
- Refresh the artifacts that drift. The FSMO map, DC inventory, DSRM passwords, trust list, and tombstone-lifetime value all change. Re-print and re-vault them on every material AD change.
- Set a defensible RTO/RPO. RPO is bounded by backup frequency (daily system state = up to 24h loss). RTO is whatever your timed rehearsal proves, with margin. Publish both; do not let leadership assume four hours when the rehearsal says twelve.
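The RTO rule can be made mechanical: sum the step timings from the last rehearsal, add margin, and compare the result with what leadership was told. The sketch below is minimal Python. The step names, durations and 25% margin are hypothetical placeholders for your own stopwatch numbers.

```python
from __future__ import annotations


def measured_rto_hours(step_hours: dict[str, float], margin: float = 0.25) -> float:
    """Sum of timed rehearsal steps, padded by a safety margin (0.25 = 25%)."""
    if margin < 0:
        raise ValueError("margin must be non-negative")
    return round(sum(step_hours.values()) * (1 + margin), 2)


def rto_shortfall_hours(published_rto: float, measured_rto: float) -> float:
    """How far the published RTO undershoots what the rehearsal proved (0 if it is honest)."""
    return max(0.0, round(measured_rto - published_rto, 2))


if __name__ == "__main__":
    # Hypothetical timings from a quarterly rehearsal.
    steps = {"restore seed": 2.0, "metadata cleanup": 3.0, "krbtgt resets": 4.0, "replicas": 3.0}
    rto = measured_rto_hours(steps)
    print(rto, rto_shortfall_hours(4.0, rto))
```

Publishing the measured figure, not the aspirational one, is the whole point; the shortfall number is the conversation to have before the incident, not during it.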
Recovery readiness checklist
The forest you can recover is the one you have already recovered — in a lab, on a Tuesday, with a stopwatch running. Build the IRE, lock the backups, rehearse the slow steps, and the worst day becomes a long shift instead of an extinction event.