GCP Enterprise Architecture: Migration to Google Cloud

Every data-center exit dies the same two deaths. The first is starting to move before anyone has measured what is actually there — so the team discovers, in production cutover week, that the “stateless web tier” shares an NFS mount with three other apps and a forgotten Oracle box nobody has the password for. The second is treating the cloud as a colo with someone else’s hypervisor — lifting 400 VMs unchanged into a single flat VPC with public IPs and Owner everywhere, then spending the next two years and the entire savings case un-doing it. A real migration architecture refuses both. It begins with machine-level evidence of the estate, lands every workload inside a governed foundation that already exists, and then routes each application down one of two engineered paths — rehost for the ones that should move as-is, replatform to containers for the ones worth modernizing in flight — with a repeatable wave engine, a tested cutover, and a rollback that actually works. This article builds that end to end with Migration Center, Migrate to Virtual Machines (M4VM), and Migrate to Containers (M2C), sitting on top of a Google Cloud landing zone.

This is the deployable reference, not a slide about the “6 Rs.” It follows the shape of the major architecture centers — the scenario, the end-to-end data and control path, a component-by-component breakdown, the concrete services and IaC wiring, the enterprise concerns, a named worked example with real numbers, and an honest section on when not to do it this way. The landing zone itself (organization hierarchy, Shared VPC, VPC Service Controls, Security Command Center) is a prerequisite here, not the subject; I reference it where the migration plugs in and spend the depth on the migration machine.

The business scenario

The trigger is almost never “we want to be cloud-native.” It is a date and a number. A data-center lease ends in 18 months and renewal is a seven-figure capital commitment. A VMware renewal quote arrived and the licensing math changed overnight. A hardware refresh cycle is due and finance would rather not buy another rack. Whatever the spark, three companies feel the same forces, and the dial settings are all that differ.

All three share four root problems. No ground truth — the CMDB is stale, dependencies are tribal knowledge, and right-sizing is a guess, so every estimate is fiction. A binary false choice — the org is told it must pick “lift-and-shift everything” (cheap, fast, but you carry your technical debt forward) or “rewrite cloud-native” (great end state, but slow and expensive and it stalls the data-center exit). An ungoverned destination — if VMs land wherever, with public IPs and flat networking, the migration creates the mess the next project has to clean up. And no repeatable mechanics — moving five VMs by hand is fine; moving five hundred by hand is how you miss a cutover window and a dependency at the same time.

The design goal is precise: measure first, land governed, and offer two engineered paths, not one. Every server is assessed with machine-level data before a decision is made. Every workload lands inside the existing landing zone — the right project, the Shared VPC subnet, no public IP, least-privilege IAM — by construction. Apps that should move as-is take the Migrate to VMs rehost path; apps worth modernizing in flight take the Migrate to Containers replatform path onto GKE. And the whole thing runs as a wave engine: assess → group → plan → replicate → test-clone → cutover → validate → decommission, repeated until the source is empty.

Architecture overview

GCP enterprise migration reference architecture: an on-prem discovery collector and Migrate Connector replicate over Cloud Interconnect/HA VPN into Migration Center, which sequences move groups into waves that flow down two engineered paths — Migrate to VMs (staging → test-clone → cutover → right-sized Compute Engine) and Migrate to Containers (artifacts → Artifact Registry → GKE → Cloud SQL) — all landing inside a governed landing zone with a VPC Service Controls perimeter and a Cloud Operations + Security Command Center observation plane.

The migration is organized on two axes: a decision axis — assessment data flowing into a portfolio of grouped, wave-sequenced applications — crossed with a movement axis — two parallel migration pipelines (rehost and replatform) that both terminate inside the landing zone. Evidence flows in, decisions flow down into waves, and bytes flow through one of two engineered paths into a destination that is already governed. Get the assessment honest and the landing zone real, and the rest is throughput.

The assessment path — earning the right to move. Before any data moves, Migration Center builds ground truth. You feed it the estate three ways, often all three: a discovery client (a lightweight collector) installed against vCenter or running agentlessly that streams per-VM utilization (CPU, memory, disk IOPS, network) over weeks, not a single-day snapshot; guest-level collection for deeper OS/software inventory; and bulk file/RVTools or CSV import for environments where you cannot install anything. Migration Center then does three things you cannot eyeball: it right-sizes each machine against real utilization (the 32-vCPU box that idles at 3 vCPUs becomes an e2-standard-4, not an n2-standard-32), it builds a total cost of ownership comparison against your current spend, and — critically — it discovers dependencies from observed network connections so the “stateless web tier” reveals the database, the NFS mount, and the license server it actually talks to. The output is not a number; it is a portfolio: every server tagged with a right-sized target, a cost, a confidence, and a dependency graph. That graph is what lets you draw move groups — sets of machines that must move together because they talk to each other — and sequence them into waves.

The destination — a foundation that already exists. Nothing lands in a vacuum. The landing zone is assumed in place: an organization → folders → projects hierarchy, a Shared VPC per environment owned by the network team, VPC Service Controls perimeters around regulated data, Cloud Interconnect or HA VPN back to the data center, and Organization Policy constraints that forbid external IPs and enforce CMEK. Migrated workloads are consumers of this: a rehosted VM draws its IP from a host-project subnet, has no public IP (the policy refuses one), and egresses through Cloud NAT; a containerized workload lands on a GKE cluster in a service project attached to the same Shared VPC. The migration does not get to invent networking or identity — it inherits them. (If the foundation does not exist yet, that is a separate prerequisite project; see the landing-zone reference.)

Path A — rehost with Migrate to Virtual Machines. For workloads moving as-is, M4VM runs an agentless, block-level replication from the source hypervisor into Google Cloud. You deploy a Migrate Connector appliance next to vCenter; it continuously replicates each source VM’s disks into managed staging in your project over the Interconnect/VPN — while the source keeps running. Initial replication seeds the full disk; after that, only changed blocks ship, so the delta to “ready to cut over” stays small. When a wave is ready, you first launch test clones: isolated copies of the target VMs booted in a fenced subnet so the app team can validate the application on Compute Engine before touching production — login works, the database connects, the batch job runs. When validation passes, cutover is a short, scheduled event: a final incremental sync, the source is stopped, and the target Compute Engine instances boot from the replicated disks with the right machine type, the right subnet, no public IP, and disk CMEK. M4VM handles the OS adaptation (drivers, agents, boot config) so the booted VM is a working Google Cloud instance, not a raw disk image.

Path B: replatform with Migrate to Containers. For workloads worth modernizing in flight, M2C does something M4VM does not: it extracts the application from the VM and produces container artifacts (a Dockerfile and image plus Kubernetes manifests: Deployment, Service, ConfigMap, persistent-volume claims, and the discovered runtime config) that you deploy to GKE. It works best on the right candidates: Linux web/app servers (Tomcat, JBoss, Apache) and, in some cases, IIS applications on Windows, where the app is the unit of value and the VM is just packaging. The generated artifacts are a starting point you own, not a black box: they land in your Git repo, get reviewed, get wired into your existing CI/CD and Artifact Registry, and deploy through the same GitOps pipeline as your cloud-native services. The result is the application running as pods on GKE (stateless, horizontally scalable, patched by replacing images), while the heavy stateful pieces it depended on, such as a database, are pointed at a managed service such as Cloud SQL rather than carried along.

The control and observation plane, over everything. Migration Center remains the system of record for the whole program: which servers are assessed, grouped, in-flight, cut over, and decommissioned, with the running TCO. Every migrated workload immediately inherits the foundation’s observability — Cloud Logging/Monitoring from the first boot, audit logs to the central sink, findings into Security Command Center — so a freshly cut-over VM is governed and observed on day one, not retrofitted later.

The diagram in words. On the left, an on-prem boundary: a VMware cluster and some bare metal, with two appliances sitting beside it — a Migration Center discovery collector (dotted lines reaching every VM, labeled “utilization + dependencies”) and a Migrate Connector (a thick pipe labeled “block-level replication”). Both pipes cross a single labeled link — Cloud Interconnect / HA VPN — into the Google Cloud boundary. Inside the boundary, top-center, a Migration Center box holding a portfolio table and a dependency graph, with arrows fanning down into waves. From the waves, the flow splits into two lanes. The upper lane (Migrate to VMs) shows staging disks → test-clone VMs in a fenced subnet → a “cutover” gate → Compute Engine instances sitting in a Shared-VPC subnet (each drawn with “no public IP” and a small CMEK lock). The lower lane (Migrate to Containers) shows a VM icon transforming into a container-artifacts bundle → a Git repo → Artifact Registry → pods on a GKE cluster in the same Shared VPC, with a dotted line from the pods to a Cloud SQL instance (“DB → managed”). Wrapping both lanes: the Shared VPC host project (Cloud NAT, firewall, the Interconnect attachment), a VPC Service Controls perimeter dotted around the projects holding regulated data, and a thin telemetry layer feeding Cloud Operations and Security Command Center. A faint “source” box on the left fades out, labeled “decommission,” to close the loop.

Component breakdown

Each component earns its place by removing a specific failure mode rather than adding a feature.

| Component | Role in this architecture | Key configuration choices |
| --- | --- | --- |
| Migration Center | The assessment and program system-of-record: discovers the estate, right-sizes against real utilization, builds TCO, maps dependencies, and tracks every server through its lifecycle. | Run the discovery client / collector against vCenter for weeks (peak + steady-state), not a one-day snapshot. Layer in guest-level collection for software inventory; bulk-import the long tail via CSV/RVTools. Use dependency maps to draw move groups; export the right-sized targets to feed M4VM. Treat its portfolio as the single source of truth for wave status. |
| Migrate to Virtual Machines (M4VM) | The rehost pipeline: agentless block-level replication of source VMs into Compute Engine with OS adaptation, test clones, and a short cutover. | Deploy the Migrate Connector near the source hypervisor; replicate over Interconnect/VPN, not the public internet. Always test-clone into a fenced subnet before cutover. Apply the Migration Center right-sized machine type at cutover (don’t lift the over-provisioned size). Target subnet in the Shared VPC, no external IP, boot disk CMEK. Schedule cutover in a maintenance window; keep the source stopped-but-intact for rollback. |
| Migrate to Containers (M2C) | The replatform pipeline: extracts the application from a VM and emits container images + Kubernetes manifests for GKE, decoupling app from machine. | Use only on good candidates (Linux web/app tiers; stateless or externalizable state). Review the generated Dockerfile/manifests as a starting point: wire them into your repo, CI, and Artifact Registry; don’t deploy blind. Re-point persistent state to a managed service (Cloud SQL/Memorystore) rather than carrying volumes. Deploy via your existing GitOps/Binary Authorization path so migrated workloads obey the same supply-chain rules. |
| Landing zone (prerequisite) | The governed destination: resource hierarchy, Shared VPC, Org Policy guardrails, VPC-SC perimeters, central logging, SCC. Migrations land into it. | Must exist before bulk waves. Workloads are service-project consumers of the Shared VPC (compute.networkUser). Org Policy: no external IP, disable SA key creation, CMEK required, OS Login on. Regulated apps land inside a VPC-SC perimeter. |
| Hybrid connectivity | The replication and steady-state data path between data center and Google Cloud. | Dedicated/Partner Interconnect for large estates (predictable bandwidth for replication seed + steady traffic); HA VPN for smaller ones or as backup. Size for the initial seed (terabytes of disk) and the cutover delta, then steady app traffic. Terminate in the host project. |
| Compute Engine | The destination for rehosted VMs: managed IaaS with right-sized machine families, CMEK disks, and managed-instance-group options post-migration. | Land on E2/N2/N2D/C3 families matched to the assessed profile. Enable OS Login, Shielded VM, Ops Agent. Post-cutover, consider committed-use discounts once the steady size is known, and regional MIGs for the tiers that should become horizontally scalable. |
| GKE | The destination for replatformed (containerized) workloads. | Autopilot (or standard) cluster in a service project on the Shared VPC. Migrated pods deploy through the same Artifact Registry + CI/CD + Binary Authorization path as native services. Right-size pod requests from the assessment data. |
| Cloud SQL / managed data services | The destination for stateful pieces that should not be carried as VM disks. | Migrate databases with Database Migration Service (continuous, minimal-downtime) rather than lifting the DB VM. Private IP only, inside the perimeter. Frees the app tier to be stateless and elastic. |
| Cloud Operations + Security Command Center | The observation plane: every migrated workload is logged, monitored, and security-scanned from first boot. | Ops Agent on Compute Engine; native telemetry on GKE. Audit logs to the central sink. SCC Security Health Analytics catches a misconfigured migrated VM (public IP slip, open firewall) immediately. |

Three of these choices deserve emphasis, because they are where migration programs most often go wrong.

Assess for weeks, decide from data, and right-size down. The single biggest waste in cloud migration is lifting the allocated size instead of the used size. On-prem VMs are over-provisioned by habit: someone asked for 16 vCPU “to be safe” and it idles at two. If you replicate that allocation into Compute Engine, you pay for the habit forever. Migration Center’s multi-week utilization data exists precisely so the target is an e2-standard-2, not an n2-standard-16. Decide machine types from the 95th-percentile utilization, not the spec sheet, and revisit the sizing after a month of real cloud data, before locking in committed-use discounts. This is where the savings case is won or lost, and it is won at assessment, before a single byte moves.

Test-clone is not optional; it is the whole safety model. The difference between M4VM and “copy a disk image and pray” is the test clone: a fully booted copy of the target VM, in an isolated network, that the application team validates before the production cutover. This is what catches the hard-coded IP, the license bound to a MAC address, the agent that phones a server that no longer exists. Skipping the test clone to “save time” is how you turn a 30-minute cutover window into a four-hour rollback at 2 a.m. Budget a test-clone validation pass for every move group, owned by the app team, with a sign-off before cutover is scheduled.

Two paths, chosen per app, not per program. The architecture’s core idea is refusing the binary. You do not pick “rehost everything” or “rewrite everything” for the whole estate — you pick per application, from the assessment. The over-provisioned Windows LOB app nobody will touch again: rehost with M4VM and move on. The stateless Java product tier that is one Dockerfile away from cloud-native: replatform with M2C onto GKE in the same wave. The Oracle on AIX that is genuinely going away: don’t migrate it at all — retire or replace it. Mixing the paths within one program, governed by one assessment and one wave engine, is what lets you exit the data center on time and not carry every piece of debt forward.

Implementation guidance

Concrete service mapping. The program uses Migration Center for discovery, assessment (right-sizing + TCO), dependency mapping, and lifecycle tracking; Migrate to Virtual Machines (with one or more Migrate Connector appliances) for the rehost pipeline into Compute Engine; Migrate to Containers for the replatform pipeline into GKE (artifacts flowing through Artifact Registry and your CI/CD); Database Migration Service for stateful databases moving to Cloud SQL; Cloud Interconnect / HA VPN for the data path; and the existing Shared VPC, Organization Policy, VPC Service Controls, Cloud Operations, and Security Command Center for the governed destination. Migration Center’s groups become your waves.

IaC and the boundary of automation. A migration has two halves with very different automation models, and conflating them is a classic mistake. The destination is declarative IaC; the movement is imperative orchestration. Build the landing-zone scaffolding each wave lands into with Terraform — the service project, its attachment to the Shared VPC (google_compute_shared_vpc_service_project), the subnet grants, firewall rules, the Compute Engine targets’ boot config, the GKE cluster, the Cloud SQL instance, IAM bindings, and budgets. That is the part you want reproducible, reviewable, and identical across waves; a wave is “stamp this Terraform with these parameters.” The replication and cutover themselves — connector setup, replication scheduling, test-clone launch, cutover trigger — are driven through the M4VM/M2C control plane and gcloud (gcloud migration vms ...), because they are stateful, long-running operations against a live source, not declarative resources. The clean seam: Terraform builds the empty, governed shell; the migration tooling fills it; and once a VM is cut over, you can import it into Terraform state so the running estate is managed declaratively going forward.

A representative wave-shell skeleton (illustrative):

# Per-wave service project, attached to the env Shared VPC, no external IPs
module "wave_project" {
  source               = "terraform-google-modules/project-factory/google"
  name                 = "wave-07-orders-prod"
  folder_id            = var.production_folder_id
  billing_account      = var.billing_account
  # Recent module releases name this input svpc_host_project_id
  # (older releases called it shared_vpc).
  svpc_host_project_id = var.prod_host_project          # consume, don't own
  shared_vpc_subnets   = [var.orders_subnet_self_link]
  activate_apis        = ["compute.googleapis.com", "vmmigration.googleapis.com",
                          "container.googleapis.com", "sqladmin.googleapis.com"]
}

# Org Policy is inherited from the folder — external IPs are already denied,
# SA-key creation disabled, CMEK required. The wave cannot loosen them.
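
For the hand-off at the end of that seam, where a cut-over VM is adopted into Terraform state, a minimal sketch using Terraform's import block (available from Terraform 1.5) might look like the following. The instance name, zone, and resource address are hypothetical, and the project ID assumes the module above created wave-07-orders-prod without a random suffix.

```hcl
# Hypothetical: adopt a VM that Migrate to Virtual Machines already created.
# Nothing here creates or changes the instance; it only maps it into state.
import {
  to = google_compute_instance.orders_app_01
  id = "projects/wave-07-orders-prod/zones/us-south1-a/instances/orders-app-01"
}
```

Running terraform plan -generate-config-out=generated.tf alongside this block asks Terraform to draft the matching resource configuration, which you would then review and trim before committing it next to the wave shell.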

Networking wiring. Replication must not traverse the public internet at any meaningful scale: point the Migrate Connector at Google Cloud over the Interconnect/VPN, and size that link for the initial seed (full disks, terabytes) plus the cutover delta, separate from steady-state app traffic. Migrated VMs run in the wave’s service project and attach to Shared VPC subnets owned by the host project; they get no external IP (Org Policy enforces it) and egress through Cloud NAT; access to Google APIs goes via Private Google Access / Private Service Connect. Test clones launch into a fenced/isolated subnet with firewall rules that prevent them from talking to production (so a validation clone cannot accidentally write to the live database). For replatformed workloads, the GKE cluster is VPC-native on a Shared VPC subnet, and its pods reach Cloud SQL over private IP.
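
One way to express the test-clone fence in Terraform is a high-priority egress deny rule keyed on a network tag. This is a sketch under assumptions: the tag m4vm-test-clone, both variables, and the rule name are hypothetical, and it presumes the tag is set in each test clone's target details and that no higher-level firewall policy already allows the traffic.

```hcl
# Hypothetical inputs for this sketch.
variable "fence_network_self_link" { type = string }       # Shared VPC network
variable "fence_prod_cidr_ranges" { type = list(string) }  # production subnets

# Deny all egress from tagged test clones to the production ranges, so a
# validation clone cannot reach (or write to) the live database.
resource "google_compute_firewall" "test_clone_no_prod_egress" {
  project            = var.prod_host_project
  name               = "test-clone-deny-prod-egress"
  network            = var.fence_network_self_link
  direction          = "EGRESS"
  priority           = 100 # lower number is evaluated before broader allow rules
  target_tags        = ["m4vm-test-clone"]
  destination_ranges = var.fence_prod_cidr_ranges

  deny {
    protocol = "all"
  }
}
```

Keying the rule on a tag rather than on the fenced subnet's CIDR keeps the fence in place even if a clone is launched into the wrong subnet by mistake; a matching ingress rule on the production side would make it two-sided.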

Identity wiring. The migration’s own identity is a least-privilege service account for the M4VM/M2C control plane and the Terraform pipeline — not a human’s Owner. On the destination, migrated VMs use OS Login (SSH governed by IAM, no shared keys) and run with a dedicated, minimal service account, never the default Compute Engine SA with Editor. Replatformed pods use Workload Identity so they reach Cloud SQL and Secret Manager with short-lived tokens and no static keys. The principle is that the migration must land into least privilege, not promise to tighten it later — because “later” never comes.
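
The destination-side identities described above could be stamped per wave roughly as follows. All account IDs, the wms namespace, and the wms-app Kubernetes service account are hypothetical, and the sketch assumes the GKE cluster lives in the wave project so that its workload pool is the project ID followed by .svc.id.goog.

```hcl
# Minimal runtime identity for the rehosted VMs (instead of the default SA).
resource "google_service_account" "orders_vm" {
  project      = module.wave_project.project_id
  account_id   = "orders-vm"
  display_name = "Orders rehosted VMs"
}

# Only what the Ops Agent needs: write logs and metrics.
resource "google_project_iam_member" "orders_vm_telemetry" {
  for_each = toset(["roles/logging.logWriter", "roles/monitoring.metricWriter"])
  project  = module.wave_project.project_id
  role     = each.value
  member   = "serviceAccount:${google_service_account.orders_vm.email}"
}

# OS Login for every VM in the wave project (Org Policy may already force it).
resource "google_compute_project_metadata_item" "os_login" {
  project = module.wave_project.project_id
  key     = "enable-oslogin"
  value   = "TRUE"
}

# Google service account that the replatformed pods act as.
resource "google_service_account" "wms_pods" {
  project    = module.wave_project.project_id
  account_id = "wms-pods"
}

# Workload Identity: allow the Kubernetes service account wms/wms-app to
# obtain short-lived tokens for wms_pods; no static key is ever created.
resource "google_service_account_iam_member" "wms_workload_identity" {
  service_account_id = google_service_account.wms_pods.name
  role               = "roles/iam.workloadIdentityUser"
  member             = "serviceAccount:${module.wave_project.project_id}.svc.id.goog[wms/wms-app]"
}

# The pods' only project-level permission: connect to Cloud SQL.
resource "google_project_iam_member" "wms_cloudsql_client" {
  project = module.wave_project.project_id
  role    = "roles/cloudsql.client"
  member  = "serviceAccount:${google_service_account.wms_pods.email}"
}
```

The Kubernetes service account still needs the iam.gke.io/gcp-service-account annotation pointing at wms_pods in the reviewed manifests, and the VM service account is assigned in the M4VM target details rather than by Terraform.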

Enterprise considerations

Security and Zero Trust. The non-negotiable rule of a secure migration is that it must not lower the bar. Because workloads land inside the existing foundation, they inherit it: Org Policy denies external IPs and SA-key creation by construction, disks are CMEK, SSH is OS Login, and regulated apps land inside a VPC Service Controls perimeter so even a correctly-authorized token can’t exfiltrate their data across the perimeter boundary. East-west traffic is governed by hierarchical firewall policies at the organization and folder level, plus network-level rules in the host project, not per-VM rules a migration script forgot. The replatform path goes further: containerized workloads deploy through the same Artifact Registry + Binary Authorization supply chain as native services, so a migrated app cannot run an unsigned image. And Security Command Center is watching from first boot: if a migrated VM slips out with a public IP or an over-broad firewall rule, Security Health Analytics flags it the same day rather than a year later in a pen test. Zero Trust here means the migrated estate is identity-and-policy governed on day one, not “we’ll secure it in phase 2.”
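
The guardrails named here belong to the landing zone rather than to any wave, but a sketch of how they might be declared at the production folder helps show why a wave cannot loosen them. It assumes var.production_folder_id is the bare numeric folder ID; the resource names and the list of CMEK-restricted services are illustrative choices, not a complete policy.

```hcl
# Boolean constraints: simply enforced for everything under the folder.
resource "google_org_policy_policy" "boolean_guardrail" {
  for_each = toset([
    "iam.disableServiceAccountKeyCreation",
    "compute.requireOsLogin",
  ])

  name   = "folders/${var.production_folder_id}/policies/${each.value}"
  parent = "folders/${var.production_folder_id}"

  spec {
    rules {
      enforce = "TRUE"
    }
  }
}

# List constraint: no VM under the folder may receive an external IP.
resource "google_org_policy_policy" "no_external_ip" {
  name   = "folders/${var.production_folder_id}/policies/compute.vmExternalIpAccess"
  parent = "folders/${var.production_folder_id}"

  spec {
    rules {
      deny_all = "TRUE"
    }
  }
}

# List constraint: these services may only create CMEK-protected resources.
resource "google_org_policy_policy" "require_cmek" {
  name   = "folders/${var.production_folder_id}/policies/gcp.restrictNonCmekServices"
  parent = "folders/${var.production_folder_id}"

  spec {
    rules {
      values {
        denied_values = ["compute.googleapis.com", "sqladmin.googleapis.com"]
      }
    }
  }
}
```

Because the wave pipeline's service account holds no Organization Policy administration role, the per-wave Terraform can read these constraints' effects but cannot override them.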

Cost optimization (FinOps). The savings case lives and dies at three moments. At assessment, right-size down from real utilization — this is the largest lever and it is pulled before anything moves. At cutover, land on the assessed machine type, not the on-prem allocation. At steady state (after ~30 days of real cloud telemetry), apply committed-use discounts for the predictable baseline, switch dev/test tiers to stop-on-schedule, and move the heaviest stateful pieces to managed services (a Cloud SQL with right-sized tiers beats a perpetually-on DB VM). Track it all against the Migration Center TCO baseline so “we saved money” is a measured claim, not a hope, and tag every wave project so the budget alert is per-application. The classic anti-pattern is celebrating the data-center exit while the cloud bill quietly equals the old run-rate because everything was lifted at full allocation — the assessment discipline is the cure.
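
A per-wave budget alert can ride along in the same Terraform stamp. The sketch below assumes the pipeline's identity may manage budgets on the billing account; the variable name and the 4000 USD default are hypothetical placeholders for the figure the Migration Center TCO gives that application.

```hcl
variable "wave_monthly_budget_usd" {
  type    = number
  default = 4000 # hypothetical; take it from the assessed TCO for this wave
}

resource "google_billing_budget" "wave" {
  billing_account = var.billing_account
  display_name    = "wave-07-orders-prod"

  # Scope the budget to this wave's project only.
  budget_filter {
    projects = ["projects/${module.wave_project.project_number}"]
  }

  amount {
    specified_amount {
      currency_code = "USD"
      units         = tostring(var.wave_monthly_budget_usd)
    }
  }

  # Alert at 80% and 100% of the modeled monthly spend.
  threshold_rules {
    threshold_percent = 0.8
  }
  threshold_rules {
    threshold_percent = 1.0
  }
}
```

Setting the amount from the assessment rather than from last month's bill is what turns the alert into a check on the savings case itself.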

Scalability. Two kinds. The program scales through the wave engine: assessment, grouping, the Terraform shell, and the M4VM/M2C pipelines are the same for wave 7 and wave 70, so a 3,000-VM enterprise runs the same mechanics as a 120-VM manufacturer, just more iterations with more parallel connectors. The workloads scale only if you let them: a rehosted VM is exactly as elastic as it was on-prem (i.e., not), which is fine for the apps that don’t need to be — but the replatformed path is where elasticity is gained, turning a fixed app tier into horizontally-scaling pods. Choosing M2C for the tiers that will actually benefit from scaling, and M4VM for the ones that won’t, is how the architecture buys elasticity where it pays and skips the cost where it doesn’t.

Reliability and DR (RTO/RPO). During the migration, the safety model is the test clone plus a reversible cutover: the source stays stopped-but-intact through a stabilization window, so rollback is “restart the source” — a real RTO measured in minutes, not a heroic rebuild. For steady-state DR after migration, the rehosted estate inherits the foundation’s regional design: spread Compute Engine across zones in a regional MIG, snapshot disks on a schedule (RPO = snapshot interval), and for tier-1 apps, replicate to a second region. Databases that moved to Cloud SQL get its HA (regional) and cross-region replicas, turning RPO/RTO from “whatever the old tape backup gave us” into a configured SLA. The migration is also the cheapest moment to fix a workload’s DR posture — you are rebuilding its runtime anyway.
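
The steady-state half of this can be written down as configuration. The following is a sketch, not a sizing recommendation: the variables, resource names, four-hour snapshot cycle, PostgreSQL version, and machine tier are all hypothetical, and the private IP setting assumes private services access is already configured on the Shared VPC network.

```hcl
variable "dr_region" { type = string }          # region the wave runs in
variable "dr_private_network" { type = string } # Shared VPC network self link

# Snapshot schedule: a 4-hour cycle means an RPO of roughly 4 hours for disks.
resource "google_compute_resource_policy" "orders_snapshots" {
  project = module.wave_project.project_id
  name    = "orders-snap-4h"
  region  = var.dr_region

  snapshot_schedule_policy {
    schedule {
      hourly_schedule {
        hours_in_cycle = 4
        start_time     = "23:00"
      }
    }
    retention_policy {
      max_retention_days = 14
    }
  }
}

# Regional (HA) Cloud SQL instance on private IP with point-in-time recovery.
resource "google_sql_database_instance" "orders" {
  project             = module.wave_project.project_id
  name                = "orders-pg"
  region              = var.dr_region
  database_version    = "POSTGRES_15"
  deletion_protection = true

  settings {
    tier              = "db-custom-4-16384"
    availability_type = "REGIONAL"

    ip_configuration {
      ipv4_enabled    = false
      private_network = var.dr_private_network
    }

    backup_configuration {
      enabled                        = true
      point_in_time_recovery_enabled = true
    }
  }
}
```

The snapshot policy still has to be attached to each migrated disk (for example with google_compute_disk_resource_policy_attachment) once cutover has created the disks, and a cross-region replica for tier-1 databases would be a second instance that references this one as its primary.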

Observability. Every migrated workload is observable from first boot: the Ops Agent on Compute Engine ships metrics and logs; GKE is natively instrumented; audit logs flow to the central sink; and the four golden signals are available without a separate monitoring project. Crucially, Migration Center is the program-level observability — the dashboard that answers “how many servers assessed / grouped / replicating / cut over / decommissioned, and what’s the TCO” — which is the view executives actually ask for. The two together mean you can see both the program (are we on track to exit the data center) and the workloads (is the migrated order system healthy) in one place each.

Governance. The wave engine is the governance model. Because every wave lands through Terraform into a folder that already carries Org Policy, governance is inherited, not re-litigated per app: naming, network, identity, CMEK, and perimeter membership are properties of where the workload lands, which a wave cannot change. Migration Center provides the audit trail of decisions (assessed → grouped → path chosen → cut over → decommissioned). And the source-decommission step is a governed gate, not an afterthought — a server isn’t “done” until it is validated in the cloud and retired at the source, which is what actually ends the lease and the run-rate.

Reference enterprise example

MeridianWeave Logistics runs freight-brokerage and warehouse-management software for North American shippers. They have two data centers (primary in Dallas, DR in Columbus), ~520 VMs across VMware, and a hard deadline: the Dallas lease ends in 14 months and renewal would cost $1.4M/year plus a hardware refresh. The estate is mixed: ~60 Windows .NET LOB and back-office apps, a stateless Java warehouse-management tier (their crown jewel, ~40 VMs across web/app), a SQL Server cluster and a couple of PostgreSQL databases, a tail of ~180 “unknown” machines, and a handful of legacy boxes on an old Solaris pair.

Assessment (weeks 1-8). They stand up the landing zone first (org hierarchy, three Shared VPCs, VPC-SC around the data projects, Dedicated Interconnect to Dallas). In parallel, they deploy a Migration Center discovery collector against vCenter and let it run for six weeks. The data is sobering and clarifying: average VM CPU utilization is 8%; the “180 unknown” machines turn out to be 90 genuinely-idle dev/test boxes (kill them), 50 duplicates of a retired app (kill them), and 40 that map, via the dependency graph, into the warehouse-management cluster nobody had documented. Right-sizing against real utilization collapses the compute footprint by ~55%: the 32-vCPU app servers become e2-standard-4s. Migration Center’s TCO projects steady-state Google Cloud spend at ~$31k/month versus the ~$118k/month all-in cost of the Dallas DC, but only if they land right-sized, not lifted.

Path decisions, from the data. They refuse the binary. The 60 Windows LOB apps and back-office VMs: rehost with M4VM — nobody will modernize them and they don’t need to scale. The stateless Java warehouse-management tier (now 40 VMs): replatform with M2C onto GKE Autopilot, because it is already twelve-factor-ish and will benefit from elastic scaling during peak shipping seasons. The SQL Server and PostgreSQL databases: migrate to Cloud SQL via Database Migration Service (continuous replication, minimal-downtime cutover), not as VM disks. The Solaris pair: it runs an EOL app being replaced by SaaS in Q3 — retire, don’t migrate. The ~140 idle/duplicate machines: decommissioned at the source, zero migration effort.

Waves and movement (months 3-11). Migration Center’s dependency graph yields nine move groups, sequenced into seven waves, lowest-risk first. Each wave is a Terraform-stamped service project on the production Shared VPC (no external IPs, CMEK, OS Login, all inherited). For the rehost waves, M4VM replicates disks over the Interconnect for days while the source runs; the app team test-clones each group into a fenced subnet and signs off; cutover is a scheduled 45-minute window with the source left stopped-but-intact for a one-week rollback buffer. The Java tier’s M2C path runs alongside: M2C emits Dockerfiles and manifests, the platform team reviews and wires them into the existing Artifact Registry + Cloud Build + Config Sync pipeline (so the migrated app obeys Binary Authorization like everything else), re-points it at the new Cloud SQL instance, and deploys to GKE, where the tier comes up as autoscaling pods. Databases cut over via DMS with <10 minutes of write-downtime each.

Outcome. Dallas is emptied in 11 months, three months ahead of the lease deadline; Columbus DR is collapsed into a second Google Cloud region. Right-sizing holds: steady-state spend lands at ~$33k/month (close to the modeled $31k), against $118k on-prem, a ~72% run-rate reduction before committed-use discounts, which they apply after month one to shave another ~18% off the baseline compute. The warehouse-management tier, now on GKE, auto-scales through the holiday freight peak that used to require pre-provisioning a rack: during the November surge it scales to 3.4× baseline and back down, paying only for the peak hours. Crucially, the migration didn’t create a mess: every workload landed governed, no VM has a public IP, the regulated shipping data sits inside a VPC-SC perimeter, and Security Command Center showed zero new high-severity misconfigurations across the whole estate. The fear that a two-person team could not absorb the move never materialized, because the rehosted VMs are just VMs and only the one tier they chose to modernize required new skills.

When to use it

This two-path, assessment-first architecture is the right call when you have a portfolio to move (dozens to thousands of VMs), a deadline or cost trigger (a lease, a refresh, a licensing change), and a mix of workloads — some worth modernizing, most not. It is purpose-built for the data-center exit that must happen on time without either carrying all your debt forward or stalling on a full rewrite. The wave engine is what makes it scale from a 120-VM mid-market move to a 3,000-VM enterprise program without changing shape.

It is overkill for a handful of machines. If you have five VMs and a clear picture of all of them, you do not need a formal Migration Center assessment campaign and a wave engine — you assess them in an afternoon and use M4VM directly. The full apparatus pays off at portfolio scale, where the assessment data and the repeatable mechanics save you from the dependency you’d otherwise miss.

Anti-patterns this design exists to prevent. Moving before measuring — skipping the multi-week assessment and discovering dependencies in cutover week. Lifting the allocation — replicating over-provisioned on-prem sizes into Compute Engine and erasing the savings case. Skipping the test clone — turning a clean cutover into a 2 a.m. rollback. Landing ungoverned — VMs with public IPs in a flat VPC, creating the mess the next project must clean. The false binary — being told to choose rehost-everything or rewrite-everything for the whole estate instead of per app. Carrying state as disks — lifting database VMs instead of moving to a managed service, so you never get the reliability or cost benefit.

Alternatives and trade-offs. If the goal is genuinely “modernize everything” and there is no deadline pressure, a re-architect / refactor program (build cloud-native, strangle the monolith) yields a better end state — but it is slower and more expensive, and it is the wrong tool for a hard data-center exit; use it after the migration, on the apps that warrant it, not as the migration. If you only need DR or burst capacity and intend to keep the data center, hybrid extension (Interconnect + a thin cloud footprint, or Google Distributed Cloud / Anthos on-prem) beats a full migration. If the application portfolio is mostly commercial SaaS-replaceable (email, CRM, the EOL Solaris app in the example), the cheapest “migration” is retire-and-replace — don’t move what you can delete. And for the database tier specifically, Database Migration Service to a managed engine almost always beats lifting the DB VM with M4VM — reach for M4VM on databases only when a managed engine genuinely can’t host them. The architecture’s discipline is choosing the right path per workload from real data — and the single most valuable thing it produces is not the moved VMs, but the assessment that tells you which workloads to move, which to modernize, and which to simply switch off.
