A regional insurer runs roughly 320 VMs on a four-host vSphere 8 cluster, and the backup story is the kind that does not survive an audit: nightly jobs land on a single dedupe appliance in the same rack as production, retention is “whatever fits,” and the last restore test was an informal one nobody wrote down. Then the obvious thing happens to a peer in the sector — a ransomware crew finds the backup console, deletes the restore points, and then encrypts the VMs — and the insurer’s board asks one question: “if that were us, could we recover, and how do we know?” The answer this guide builds is a Veeam Backup & Replication estate that keeps fast local restore points on a performance tier and tiers older points to S3-compatible object storage with an immutability lock the backup admin themselves cannot override — a Scale-Out Backup Repository (SOBR) that gives you the 3-2-1-1-0 rule on paper and a tested restore in practice. Everything below is the build, in order, with the real commands and the real flags.
Prerequisites
- VMware vSphere 8.0+ with a vCenter Server you can reach over 443; an account for Veeam with at least the read/snapshot privileges (a dedicated veeam-backup vCenter role, not Administrator).
- A Veeam Backup & Replication 12.x server (Windows Server 2022, 8 vCPU / 32 GB RAM / 100 GB OS disk is a sane starting point). v12 is the first release that allows a direct-to-object-storage repository, but for VMware you will still front it with a performance tier.
- A performance-tier repository: a Windows or Linux server with fast block storage (NVMe/SAS). For ransomware resilience, use a hardened Linux repository with XFS fast-clone and single-use credentials.
- An S3-compatible object store with Object Lock (compliance mode) support — AWS S3, Wasabi, MinIO, Cloudian, or any vendor that honors S3 Object Lock. You need the bucket, an access key/secret key pair, and the bucket created with versioning + Object Lock enabled before Veeam touches it.
- Network egress from the Veeam gateway servers to the object endpoint (443), and DNS resolution for the endpoint FQDN.
- HashiCorp Vault reachable for secret storage (we pull the S3 keys from Vault, not from a sticky note).
Target topology
The data path is deliberately two-tier. Veeam proxies pull VM data from vSphere over the VMware vStorage APIs for Data Protection (VADP), landing fresh restore points on the Performance Tier (the hardened Linux repository) for fast operational restores. A Scale-Out Backup Repository then groups that performance tier with a Capacity Tier that points at the S3 bucket; a tiering policy moves (or copies) older restore points to object storage, where Object Lock makes them immutable for the retention window. A separate small Archive Tier can target S3 Glacier-class storage for long-hold compliance copies. Identity for the console is federated through Microsoft Entra ID (with Okta as the upstream workforce IdP), secrets live in HashiCorp Vault, Wiz watches the bucket’s posture, CrowdStrike Falcon guards the Windows and Linux servers, Dynatrace watches throughput and the repository fill rate, and ServiceNow is where a failed job or an immutability-change attempt becomes a ticket. The deployment itself is codified in Terraform (cloud-side: bucket, IAM, lock config) and Ansible (server-side: repository hardening), driven from a GitHub Actions pipeline.
1. Prepare and harden the object storage bucket
Create the bucket with versioning and Object Lock enabled at creation time — Object Lock cannot be turned on after the fact on most providers. We do this in Terraform so the lock configuration is reviewed in code, not clicked in a console. A minimal AWS shape:
resource "aws_s3_bucket" "veeam_capacity" {
bucket = "ins-veeam-capacity-prod"
object_lock_enabled = true
}
resource "aws_s3_bucket_versioning" "veeam" {
bucket = aws_s3_bucket.veeam_capacity.id
versioning_configuration { status = "Enabled" }
}
# Deliberately no aws_s3_bucket_object_lock_configuration with a default_retention rule:
# Veeam applies its own per-object COMPLIANCE-mode retention, and Veeam's documentation
# says to leave the bucket's default retention disabled to avoid unpredictable behavior.
Grant Veeam a least-privilege policy — list/get/put/delete on the bucket plus the Object Lock actions, and nothing else. Do not hand it s3:*:
{
"Version": "2012-10-17",
"Statement": [{
"Effect": "Allow",
"Action": [
"s3:ListBucket", "s3:ListBucketVersions", "s3:GetBucketLocation",
"s3:GetBucketVersioning", "s3:GetBucketObjectLockConfiguration",
"s3:GetObject", "s3:PutObject", "s3:DeleteObject",
"s3:GetObjectVersion", "s3:DeleteObjectVersion",
"s3:GetObjectRetention", "s3:PutObjectRetention",
"s3:GetObjectLegalHold", "s3:PutObjectLegalHold"
],
"Resource": [
"arn:aws:s3:::ins-veeam-capacity-prod",
"arn:aws:s3:::ins-veeam-capacity-prod/*"
]
}]
}
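Because a policy like this tends to get widened by accident in review, a small pre-merge check can help. The following is a minimal sketch, not part of Veeam or AWS tooling: the function name policy_has_wildcard and the temporary file are hypothetical, and it uses a plain grep instead of a JSON parser, so it only catches the literal strings "s3:*" and a bare "*" (which also flags a wildcard Resource).

```shell
# Hypothetical pre-merge check (an assumption, not Veeam or AWS tooling).
# Succeeds (exit 0) when the policy file contains "s3:*" or a bare "*".
policy_has_wildcard() {
  local policy_file="$1"
  # Match the quoted JSON strings "s3:*" or "*"; a trailing /* in an ARN does not match.
  grep -Eq '"(s3:)?\*"' "$policy_file"
}

# Demo against a throwaway policy file.
demo_policy="$(mktemp)"
cat > "$demo_policy" <<'EOF'
{ "Statement": [ { "Effect": "Allow",
    "Action": ["s3:GetObject", "s3:PutObject"],
    "Resource": ["arn:aws:s3:::ins-veeam-capacity-prod/*"] } ] }
EOF
if policy_has_wildcard "$demo_policy"; then
  echo "FAIL: wildcard action found"
else
  echo "OK: no wildcard action"
fi
rm -f "$demo_policy"
```

Run in the same pipeline that plans the Terraform change, a non-zero "FAIL" branch could block the merge before Wiz ever has to raise a finding.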
Store the resulting access/secret key in HashiCorp Vault rather than typing it into the Veeam wizard from memory:
vault kv put secret/veeam/s3-capacity \
access_key="AKIA..." \
secret_key="wJalr..." \
endpoint="s3.ap-south-1.amazonaws.com" \
bucket="ins-veeam-capacity-prod"
When you add the object repository in Veeam, read the values back at the console (vault kv get -field=access_key secret/veeam/s3-capacity) so the credential’s source of truth is Vault. Finally, point Wiz (or Wiz Code on the Terraform PR) at this bucket so any drift — public access, lock disabled, a policy widened to s3:* — raises a posture finding before it becomes an incident.
2. Install the Veeam server and add the vCenter
Install Veeam Backup & Replication 12.x from the ISO (or run the unattended installer). After the console is up, register vCenter rather than individual ESXi hosts — it scales and survives vMotion:
# Run inside the Veeam PowerShell module on the backup server
Connect-VBRServer -Server localhost
# Add vCenter using a credential whose vCenter role has only backup privileges
$cred = Get-VBRCredentials -Name "veeam-backup@vsphere.local"
Add-VBRvCenter -Name "vcenter.ins.local" -Credentials $cred
Create the dedicated vCenter role (don’t reuse Administrator). The privileges Veeam actually needs are well-documented; at minimum: Virtual machine > Snapshot management > Create/Remove snapshot, Datastore > Browse/Low level file ops, Global > Disable methods/Enable methods/Licenses, and the VADP transport privileges. Pull the service-account password from Vault and feed it to the credential store so it never lands in a script.
3. Build the hardened Linux performance-tier repository
The performance tier is where ransomware resilience starts on-prem. Use a Linux hardened repository with single-use credentials and immutability so that even a compromised Veeam server cannot delete fresh local restore points. On the Linux box (Ubuntu 22.04 / RHEL 9), create a non-root, non-sudo transport user and a dedicated XFS filesystem formatted for fast-clone:
# Create the XFS filesystem with reflink support for Veeam fast clone (synthetic fulls)
sudo mkfs.xfs -m reflink=1,crc=1 /dev/sdb1
sudo mkdir -p /mnt/veeam-repo
sudo mount -o noatime,nodiratime /dev/sdb1 /mnt/veeam-repo
echo '/dev/sdb1 /mnt/veeam-repo xfs noatime,nodiratime 0 0' | sudo tee -a /etc/fstab
# Dedicated repository user with NO sudo and NO ssh key persistence
sudo useradd -m -s /bin/bash veeamrepo
sudo chown -R veeamrepo:veeamrepo /mnt/veeam-repo
In the Veeam console, add this server as a Linux server using single-use credentials: Veeam connects once with the password, deploys its components, then discards the credential, so there is no standing SSH access for an attacker to find. When adding the repository, tick “Make recent backups immutable for N days” and set it to your local operational window (e.g. 14 days). Immutability here relies on the Linux immutable file attribute (the i flag that chattr sets), which Veeam’s own immutability service applies and later removes once the period has passed:
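$repoServer = Get-VBRServer -Name "veeam-repo01.ins.local"
# Parameter names follow the Veeam PowerShell reference as best recalled;
# confirm on your build with: Get-Help Add-VBRBackupRepository -Full
Add-VBRBackupRepository -Name "Perf-XFS-Hardened" `
-Type LinuxLocal -Server $repoServer `
-Folder "/mnt/veeam-repo" `
-EnableXFSFastClone:$true `
-EnableBackupImmutability:$true -ImmutabilityPeriod 14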
This XFS reflink layer gives you synthetic full backups that consume almost no extra space (block clones, not copies), which is what makes regular synthetic fulls affordable on the performance tier.
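Fast clone only works if the filesystem really was created with reflink support, so it is worth checking before the first job runs. The sketch below assumes the xfsprogs xfs_info tool is installed on the repository host; the helper name xfs_reflink_enabled is hypothetical, and the sample text is an abbreviated illustration of xfs_info output, not a capture from a real system.

```shell
# Reads xfs_info output on stdin; succeeds only if the reflink=1 feature flag is present.
xfs_reflink_enabled() {
  grep -Eq '(^|[[:space:],])reflink=1([[:space:],]|$)'
}

# Illustrative, abbreviated xfs_info-style text (an assumption for the demo).
sample='meta-data=/dev/sdb1 isize=512 agcount=4, agsize=65536 blks
         = crc=1 finobt=1, sparse=1, rmapbt=0
         = reflink=1 bigtime=0'

if printf '%s\n' "$sample" | xfs_reflink_enabled; then
  echo "fast clone ready"
else
  echo "reflink missing: reformat before adding the repository"
fi
```

On the repository host itself the equivalent call would be `xfs_info /mnt/veeam-repo | xfs_reflink_enabled`.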
4. Add the S3 object store as a capacity-tier repository
Add the bucket as an Object Storage repository. The critical toggle is “Make recent backups immutable” — this is what turns Object Lock into real ransomware protection on the capacity tier:
# Build the connection from the Vault-sourced credential
$s3cred = Add-VBRAmazonAccount -AccessKey (vault kv get -field=access_key secret/veeam/s3-capacity) `
-SecretKey (vault kv get -field=secret_key secret/veeam/s3-capacity)
# Register the S3-compatible repository with Object Lock immutability
Add-VBRAmazonS3CompatibleRepository -Name "Cap-S3-Immutable" `
-Connection (Connect-VBRAmazonS3CompatibleService -Account $s3cred `
-CustomRegionId "ap-south-1" `
-ServicePoint "https://s3.ap-south-1.amazonaws.com") `
-Bucket "ins-veeam-capacity-prod" -Folder "veeam-sobr" `
-EnableBackupImmutability:$true -ImmutabilityPeriod 30
A few flags that matter and that teams get wrong:
- -ServicePoint must be the exact endpoint URL with scheme. For MinIO or Cloudian on-prem, this is your internal endpoint and you must also import the CA certificate so TLS validates.
- Immutability period of 30 days is the capacity-tier floor; combined with your retention this is what survives an attacker who gets console access. Veeam writes a per-object Object Lock retention in COMPLIANCE mode that the bucket enforces; nobody, including the storage admin, can shorten it.
- For non-AWS providers, use Add-VBRAmazonS3CompatibleRepository (as above); for genuine AWS S3 you can use Add-VBRAmazonS3Repository. The wizard equivalents live under Backup Infrastructure > Backup Repositories > Add Repository > Object Storage.
5. Compose the Scale-Out Backup Repository (SOBR)
Now bind the two tiers into a single logical SOBR. The performance extent is the hardened XFS repo; the capacity extent is the immutable S3 repo. The tiering policy is the heart of it:
$perf = Get-VBRBackupRepository -Name "Perf-XFS-Hardened"
$cap = Get-VBRObjectStorageRepository -Name "Cap-S3-Immutable"
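# OperationalRestorePeriod 14: keep 14 days locally for fast restore
# Copy policy: COPY new backups to S3 immediately (3-2-1)
# Move policy: MOVE points older than the window off local disk
# Comments sit above the command because a backtick only continues a line when it is the last character.
# Confirm parameter names on your build with: Get-Help Add-VBRScaleOutBackupRepository -Full
Add-VBRScaleOutBackupRepository -Name "SOBR-Prod" `
-PolicyType DataLocality `
-Extent $perf `
-EnableCapacityTier:$true -ObjectStorageRepository $cap `
-OperationalRestorePeriod 14 `
-EnableCapacityTierCopyPolicy:$true `
-EnableCapacityTierMovePolicy:$true `
-EncryptionEnabled:$true `
-EncryptionKey (Get-VBREncryptionKey -Description "sobr-key")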
Understand the two policies because they are not mutually exclusive and you usually want both:
- Copy policy writes every new restore point to object storage as soon as it is created. This is your offsite copy (the first “1” in 3-2-1-1-0) and, with Object Lock, your immutable copy (the second “1”), so a freshly created backup is already protected the same day.
- Move policy relocates restore points older than the operational restore window off the expensive local tier into object storage, reclaiming performance-tier capacity. Local data is only moved once it has aged out of the fast-restore window you set.
-PolicyType DataLocality keeps all dependent files of a backup chain on the same extent (the right default for a single performance extent); the Performance policy, which places full and incremental files on different extents, only makes sense with multiple local extents. Encrypt the capacity tier: the encryption key protects data at rest in object storage, and you escrow that key in Vault, because losing it means losing every offsite restore point.
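To make the interaction of the two policies concrete, here is a simplified model of where a restore point lives when both are enabled. It is a sketch under stated assumptions, not Veeam's actual scheduler: the function name restore_point_placement and its arguments are hypothetical, and it reduces the move rule to "older than the operational restore window and part of a sealed (inactive) chain".

```shell
# restore_point_placement AGE_DAYS WINDOW_DAYS CHAIN_SEALED(yes|no)
# Simplified model with both copy and move policies enabled:
#   copy policy -> every point is in object storage from day one
#   move policy -> the local copy is released once the point is outside
#                  the window and its backup chain is sealed
restore_point_placement() {
  local age_days="$1" window_days="$2" sealed="$3"
  if [ "$age_days" -gt "$window_days" ] && [ "$sealed" = "yes" ]; then
    echo "object-only"
  else
    echo "local+object"
  fi
}

restore_point_placement 3 14 yes    # prints: local+object
restore_point_placement 20 14 yes   # prints: object-only
restore_point_placement 20 14 no    # prints: local+object
```

The third case is the one that surprises people: an old point in a still-active chain stays on the performance tier until a new full seals the chain, so local capacity planning has to allow for it.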
6. Build the backup job and point it at the SOBR
Create a job that backs up a vSphere folder or tag-based scope (tag-based is better — new VMs in the tag are protected automatically) and targets the SOBR:
# Scope by vSphere tag so new VMs are auto-included
$vms = Find-VBRViEntity -Tags -Name "Backup-Tier1"
Add-VBRViBackupJob -Name "Tier1-Daily" `
-Entity $vms `
-BackupRepository (Get-VBRBackupRepository -ScaleOut -Name "SOBR-Prod")
# Configure retention and a weekly synthetic full (Saturday)
$job = Get-VBRJob -Name "Tier1-Daily"
# "Syntethic" is the cmdlet's own parameter spelling; do not correct it
Set-VBRJobAdvancedBackupOptions -Job $job `
-Algorithm Incremental `
-TransformFullToSyntethic $true -TransformToSyntethicDays Saturday
Set-VBRJobRetentionPolicy -Job $job -Type RestorePoints -RestorePoints 30
# Application-aware processing for transactional VMs (SQL/AD) — quiesce + log truncation
Set-VBRViJobObjectVssOptions -Job $job -VssOptions (
New-VBRJobVssOptions -ForApplicationProcessing -GuestFSIndexingType None)
# Schedule and enable
Set-VBRJobSchedule -Job $job -Daily -At "21:00"
Enable-VBRJobSchedule -Job $job
Notes that pay off later: application-aware processing quiesces SQL Server and Active Directory via VSS so you get transactionally consistent restores and log truncation instead of crash-consistent guesses. Synthetic fulls on XFS are near-free thanks to the reflink fast-clone from step 3. Drive job creation from your IaC where possible — the GitHub Actions pipeline that runs Terraform for the bucket can also invoke an Ansible role that calls these Veeam cmdlets, so the whole estate is reproducible rather than hand-built.
Validation
Do not declare victory because a job went green. Prove the chain end to end.
- Run the job and confirm the offload copies to S3. Start it, then verify the capacity tier received the restore point:
Start-VBRJob -Job (Get-VBRJob -Name "Tier1-Daily")
Get-VBRBackup -Name "Tier1-Daily" | Get-VBRRestorePoint |
Select-Object Name, @{n='InCapacityTier';e={$_.IsInCapacityTier}}, CreationTime
- Confirm immutability is real on the object. Veeam reports the lock-until date; cross-check on the storage side:
aws s3api get-object-retention \
--bucket ins-veeam-capacity-prod \
--key veeam-sobr/Clients/.../<object> \
--query 'Retention.{Mode:Mode,Until:RetainUntilDate}'
# Expect: Mode=COMPLIANCE, Until ~30 days out. Then prove deletion is blocked.
# On a versioned bucket a plain delete only adds a delete marker, so target the locked version:
aws s3api delete-object --bucket ins-veeam-capacity-prod --key veeam-sobr/.../<object> --version-id <version-id>
# Expected: AccessDenied (WORM). This is the test that satisfies the auditor
- Restore a real workload, not a test file. Run an Instant VM Recovery from a capacity-tier point to prove you can recover directly from object storage:
$rp = Get-VBRRestorePoint -Name "app-sql01" | Sort-Object CreationTime -Descending | Select-Object -First 1
Start-VBRInstantRecovery -RestorePoint $rp -RestoredVMName "app-sql01-restore-test" -PowerOn
- Run a SureBackup verification job so recoverability is tested automatically and on a schedule: it boots the restored VM in an isolated virtual lab and runs heartbeat/ping/script tests, giving you the “0” in 3-2-1-1-0 (zero errors, verified). Pipe the SureBackup result and the per-job RPO into Dynatrace (or Datadog) so the backup window, throughput, and repository fill-rate are on the same dashboards as production, and a missed window alerts the on-call. Wire a failed SureBackup to auto-open a ServiceNow incident so a silent backup failure becomes a tracked ticket, not a surprise during an actual recovery.
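The "Until ~30 days out" check from the immutability step is easy to turn into a number an auditor can read. The sketch below assumes GNU date (it relies on `date -d` parsing ISO 8601 timestamps); the function name lock_days_remaining is hypothetical, and the timestamps in the demo are made-up sample values, not output from a real bucket.

```shell
# lock_days_remaining RETAIN_UNTIL [NOW]
# Prints whole days between NOW (default: current time) and RETAIN_UNTIL.
# Assumes GNU date; both arguments are ISO 8601 timestamps.
lock_days_remaining() {
  local until_s now_s
  until_s="$(date -u -d "$1" +%s)" || return 1
  now_s="$(date -u -d "${2:-now}" +%s)" || return 1
  echo $(( (until_s - now_s) / 86400 ))
}

# Sample values only: a lock expiring 31 March, checked on 1 March.
lock_days_remaining "2025-03-31T00:00:00Z" "2025-03-01T00:00:00Z"   # prints: 30
```

In practice the first argument would come from the same `aws s3api get-object-retention` call shown in the immutability step, with `--query 'Retention.RetainUntilDate' --output text`.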
Rollback / teardown
If you need to back this out (lab cleanup, repointing to a different store), order matters — and immutable data cannot be force-deleted until its lock expires, which is the whole point.
# 1. Disable and remove the job (keeps backups on disk)
Disable-VBRJob -Job (Get-VBRJob -Name "Tier1-Daily")
Remove-VBRJob -Job (Get-VBRJob -Name "Tier1-Daily") -Confirm:$false
# 2. Remove the SOBR (must remove extents/job mappings first)
Remove-VBRScaleOutBackupRepository -Repository (Get-VBRBackupRepository -ScaleOut -Name "SOBR-Prod") -Confirm:$false
# 3. Remove the object and performance repositories
Remove-VBRObjectStorageRepository -Repository (Get-VBRObjectStorageRepository -Name "Cap-S3-Immutable") -Confirm:$false
Remove-VBRBackupRepository -Repository (Get-VBRBackupRepository -Name "Perf-XFS-Hardened") -Confirm:$false
The S3 objects under Object Lock remain until their per-object retention expires. In COMPLIANCE mode you cannot delete them early, and trying to delete the bucket will fail with BucketNotEmpty / AccessDenied. That is correct and intended; budget for the locked window when you plan a teardown. Only after all locks expire, and every object version has been removed, can Terraform destroy the bucket. For the on-prem side, the immutability flag similarly holds files until the period lapses: chattr -i fails for the unprivileged repository user by design, but root can still clear the flag, which is why root and SSH access to the repository host must stay locked down.
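Planning the teardown date comes down to finding the latest retain-until timestamp across all locked objects. A minimal sketch, assuming every timestamp is in the same ISO 8601 UTC format (so a plain text sort is also a chronological sort); the function name latest_lock_expiry and the sample dates are hypothetical.

```shell
# Reads one ISO 8601 UTC timestamp per line on stdin and prints the latest.
# Relies on identically formatted timestamps so byte-wise order equals time order.
latest_lock_expiry() {
  LC_ALL=C sort | tail -n 1
}

# Sample values only: three locked objects with different expiry dates.
printf '%s\n' \
  "2025-04-02T21:00:00Z" \
  "2025-04-15T21:00:00Z" \
  "2025-03-30T21:00:00Z" | latest_lock_expiry   # prints: 2025-04-15T21:00:00Z
```

The bucket cannot be emptied before the printed date, so that date is the earliest point at which a `terraform destroy` is worth scheduling.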
Common pitfalls
- Object Lock not enabled at bucket creation. You cannot retrofit it on most providers: the bucket has to be born with object_lock_enabled = true. Recreate it; don’t fight it.
- GOVERNANCE instead of COMPLIANCE mode. GOVERNANCE retention can be bypassed by a sufficiently privileged IAM principal, which is exactly the principal an attacker pivots to. Use COMPLIANCE for true ransomware resilience.
- Clock skew breaks immutability math. Veeam computes lock-until from system time; if the Veeam server or repository drifts, immutability windows are wrong. Pin all servers to the same NTP source.
- Granting Veeam s3:*. Over-broad keys mean a stolen key can disable lifecycle, versioning, or the lock config. Scope to the policy in step 1 and let Wiz alert on any widening.
- Copy vs. move confusion. Teams enable only move and are shocked there is no same-day offsite copy. Enable copy for the immediate offsite protection and move for capacity reclaim.
- TLS failures to on-prem object stores. MinIO/Cloudian with a private CA need the CA cert imported on every Veeam gateway, or offload silently fails on certificate validation.
- Treating green jobs as recovery. A successful backup is not a tested restore. SureBackup + a quarterly real restore is the only proof that counts.
Security notes
Federate the Veeam console to Microsoft Entra ID for SSO and MFA, with Okta as the upstream workforce IdP brokered to Entra, so backup administrators authenticate with the same conditional-access and MFA posture as the rest of the estate — a shared local Veeam admin account is exactly what the peer-insurer attacker used. Keep all credentials — the S3 keys, the vCenter service account, the encryption key — in HashiCorp Vault, read at use time, never embedded in scripts. Run CrowdStrike Falcon sensors on the Veeam server and both repository hosts for runtime threat detection (the backup infrastructure is a prime target and deserves EDR), and run Wiz continuously against the bucket to catch any drift to public access or a disabled lock. The immutability itself is the keystone control: COMPLIANCE-mode Object Lock on the capacity tier plus XFS immutability on the performance tier means an attacker with full console access still cannot destroy the restore points — they can only wait out a window they do not control. Route any attempt to alter immutability, or any failed/auth-anomalous backup, into ServiceNow as an incident so the security team gets a ticket, not a buried log line.
Cost notes
Object storage cost is driven by stored capacity, API request volume, and egress — and Veeam’s tiering is the lever. Keep the operational restore window as short as your RTO allows (14 days here): every day kept on the performance tier is fast-restore convenience but also block storage you pay a premium for, while moving older points to S3 is far cheaper per TB. Use the move policy aggressively to reclaim the expensive local tier, and the archive tier to push long-retention compliance copies to Glacier-class storage where per-TB cost is lowest and retrieval-latency is acceptable for a once-a-year audit pull. Watch egress: Instant Recovery and large restores from the capacity tier pull data back across the wire, so size your DR tests and keep the most-likely-restored recent points local. Enable Veeam’s per-VM backup files and compression so block-clone and dedupe work efficiently, and put the repository fill-rate and monthly object-storage spend on the same Dynatrace dashboard as everything else, so the backup line item is visible to whoever owns the budget rather than a quarterly surprise on the cloud bill.
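The trade-off between the operational restore window and local capacity can be estimated with simple arithmetic. The sketch below is a rough first-pass model, not a Veeam sizing tool: it assumes reflink synthetic fulls add almost no space, so the performance tier holds about one full plus one increment per day in the window, plus headroom. The function name perf_tier_gb and all input numbers are hypothetical.

```shell
# perf_tier_gb FULL_GB DAILY_INCR_GB WINDOW_DAYS HEADROOM_PCT
# Rough estimate: one full + (daily increment x window), plus headroom.
# Integer arithmetic only; all inputs are whole numbers.
perf_tier_gb() {
  local full_gb="$1" incr_gb="$2" window_days="$3" headroom_pct="$4"
  local raw=$(( full_gb + incr_gb * window_days ))
  echo $(( raw + raw * headroom_pct / 100 ))
}

# Hypothetical inputs: 20 TB full, 600 GB/day change, 14-day window, 20% headroom.
perf_tier_gb 20000 600 14 20   # prints: 34080
```

Re-running it with a 7-day window shows how much premium block storage a shorter window releases, which is the number to weigh against your RTO.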