
AWS Enterprise Architecture: Migration to AWS

A data-center exit is not a project; it is a production line. Most “lift and shift to AWS” write-ups stop at “install the replication agent and click launch,” which is fine for ten servers and a weekend. It falls apart at the scale where migration actually hurts: 1,200 servers, 300 applications, four years of undocumented dependencies, a hard lease-expiry date, and a steering committee that wants a burn-down chart. At that scale the unit of work is not a server — it is a wave, and the thing you are really building is a migration factory: a repeatable assembly line that ingests a portfolio, dispositions each application against the 7 Rs, replicates the keepers into a pre-governed landing zone, cuts them over on a schedule, and decommissions the source.

This article is a concrete reference architecture for that factory on AWS, built from four load-bearing pieces: AWS Application Discovery Service + Migration Hub as the assessment and tracking plane; the 7 Rs framework as the disposition engine that decides what happens to each app; AWS Application Migration Service (MGN) as the block-level rehost engine for the bulk of servers; and a multi-account landing zone as the governed target every wave lands into. It is designed to move a 40-server SMB in a few sprints and to run a 5,000-server enterprise exit over twelve to eighteen months without the wheels coming off.

The business scenario

Migration is forced, almost never chosen. The trigger is concrete and dated, and it shapes everything downstream. Three stages, one pattern:

All three share the same five problems that an architecture — not heroics — has to solve. You don’t know what you have: the CMDB is stale, nobody can list every server, and nobody can draw the dependency map, so you can’t safely group apps into waves. You don’t know what to do with each app: retire the dead ones, replatform the database, rehost the rest — but which is which, and who decides? Moving a running server without breaking it is genuinely hard: you need continuous block-level replication, a non-disruptive test boot, and a sub-hour cutover. The target has to be ready before the first wave lands: an ungoverned account is where migrations go to rot. And you have to prove progress and prove you didn’t regress security — a leadership burn-down on one side, an auditor asking “did anything land with a public S3 bucket or a hard-coded key?” on the other. (This org carries scar tissue from exactly that last failure mode — long-lived credentials committed to source control — so “the factory must not let static keys ride along into AWS” is a hard requirement, not a nicety.)

The design goal is a migration factory: discovery feeds disposition, disposition feeds replication, replication feeds a scheduled cutover, and every server lands in an account that was governed from its first second. A wave should move on rails, not on adrenaline.

Architecture overview

Think of the factory as four stations on a conveyor — Discover → Decide → Move → Land — with AWS Migration Hub as the control tower that tracks every server’s position on the line from a single pane.

Figure: AWS migration-factory reference architecture. An on-prem source data center feeds a four-station conveyor: Discover (Application Discovery Service), Decide (the 7 Rs disposition engine), Move (MGN block replication, DMS/SCT to RDS/Aurora, Snowball), and Land (a Control Tower governed landing zone with Transit Gateway, IAM Identity Center, SCP/Config guardrails and central CloudTrail), all tracked end-to-end by AWS Migration Hub.

Station 1 — Discover (assessment plane). You cannot wave-plan what you cannot see, so the line starts with AWS Application Discovery Service. It runs in one of two modes, and a real estate uses both. Agentless discovery is a connector OVA deployed into VMware vCenter; it inventories every VM, its specs, and utilization without touching the guests — perfect for a fast, low-friction first pass and for servers you can’t put an agent on. Agent-based discovery installs a small agent on individual hosts (Windows/Linux, virtual or physical) and captures the thing the agentless mode can’t: running processes and inbound/outbound network connections on a per-port basis — the raw material for the dependency map. Discovery data streams into Migration Hub, where it is grouped into applications (logical sets of servers) and explored with Migration Hub network-dependency views (and, for richer graphing and right-sizing, exported into the optional Migration Evaluator TCO model or a third-party tool). The output of this station is a portfolio: every server, its utilization profile, and — critically — what talks to what, which is what lets you draw wave boundaries that don’t sever a live dependency.
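To make "what talks to what" concrete, here is a minimal Python sketch of collapsing per-port connection records into a server-level dependency map. The record shape (source, destination, port) and the host names are assumptions for illustration; they are not the Application Discovery Service export schema.

```python
from collections import defaultdict

# Hypothetical connection records: (source_host, dest_host, dest_port).
# The real discovery export carries more fields; this shape is an assumption.
connections = [
    ("app-tier-07", "db-cluster-02", 1521),
    ("app-tier-07", "db-cluster-02", 1521),  # repeated observation
    ("web-01", "app-tier-07", 8443),
    ("batch-03", "db-cluster-02", 1521),
]


def build_dependency_map(records):
    """Collapse repeated connections into {source: {(dest, port), ...}}."""
    deps = defaultdict(set)
    for src, dst, port in records:
        if src != dst:  # ignore a host talking to itself
            deps[src].add((dst, port))
    return dict(deps)


def inbound_callers(records, host):
    """Return the sorted list of hosts that open connections to `host`."""
    return sorted({src for src, dst, _ in records if dst == host and src != host})


dependency_map = build_dependency_map(connections)
callers_of_db = inbound_callers(connections, "db-cluster-02")
```

The inbound view is the one that matters for wave planning: every caller of a database is a server that breaks if the database moves without it.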

Station 2 — Decide (the 7 Rs disposition engine). Each application is run through the 7 Rs to decide its fate: Retire (turn it off — discovery’s utilization data routinely finds 10–20% of servers are zombies), Retain (leave on-prem for now — latency, licensing, or compliance reasons), Relocate (move a whole VMware estate as-is to VMware Cloud on AWS, no server-level change), Rehost (lift-and-shift the VM to EC2 via MGN — the default for the bulk), Replatform (lift-and-reshape: move a self-managed database to Amazon RDS/Aurora via AWS DMS, or a server-hosted app onto Elastic Beanstalk/containers, with minimal code change), Repurchase (drop the app and buy SaaS instead), and Refactor/Re-architect (rewrite cloud-native — reserved for the few apps where the business case justifies it). This decision is recorded against each application in Migration Hub and becomes the routing instruction for the rest of the line. The discipline here is migrate sooner, modernize later: most servers should be Rehost so you make the lease date, with replatform/refactor scheduled as a later optimization rather than a blocker.

Station 3 — Move (replication engines, MGN-led). The disposition routes each app to an engine. The workhorse for Rehost is AWS Application Migration Service (MGN): a lightweight AWS Replication Agent installed on each source server performs continuous, block-level replication of its disks into a low-cost staging area subnet in the target AWS account (cheap staging EBS + a handful of replication servers MGN manages for you). Because replication is continuous and block-level, the source keeps running untouched while the target stays in near-real-time sync. When an app’s servers are ready, you launch test instances into an isolated test subnet — boot the real machine in AWS, validate the app against real data, confirm right-sizing — without disturbing production. You rehearse as many times as needed; only then do you schedule the cutover, which is a brief final-sync-and-launch that converts the staged volumes into a running EC2 instance. Databases dispositioned as Replatform take a parallel track: AWS Database Migration Service (DMS) does continuous logical replication into RDS/Aurora with minimal downtime, and the AWS Schema Conversion Tool (SCT) handles heterogeneous engine changes (e.g., Oracle → PostgreSQL). Huge cold datasets that would never replicate over the link in time ride AWS Snowball offline instead.

Station 4 — Land (the governed landing zone). Every launched instance lands not in a bare account but in a multi-account landing zone stood up before wave one — AWS Control Tower/Organizations with OUs, SCP guardrails (deny leaving approved regions, deny disabling CloudTrail, require IMDSv2), a Transit Gateway hub the migration VPCs attach to, IAM Identity Center for human SSO, and a Log Archive account capturing everything. MGN launch templates place instances into the correct workload account/VPC/subnet with the right security groups and instance profiles (IAM roles), so the migrated server gets AWS permissions via a role — not a baked-in access key. On-premises connectivity (Direct Connect or redundant VPN) terminates once at the Transit Gateway, so half-migrated apps can keep talking across the hybrid boundary throughout the cutover months.
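The three guardrails named above can be sketched as a single SCP document. This is a hedged illustration built as a Python dict, not the policy Control Tower ships: the approved regions are placeholders, and the list of exempted global services is deliberately short and would need extending for a real organization.

```python
import json

# Assumed values for illustration; substitute your organization's approved regions.
APPROVED_REGIONS = ["us-east-1", "us-west-2"]


def build_guardrail_scp(approved_regions):
    """Return an SCP document (as a dict) with three deny statements."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyOutsideApprovedRegions",
                "Effect": "Deny",
                # Global services are exempted so the region deny does not break them.
                # This list is an incomplete example.
                "NotAction": ["iam:*", "organizations:*", "sts:*", "support:*"],
                "Resource": "*",
                "Condition": {
                    "StringNotEquals": {"aws:RequestedRegion": approved_regions}
                },
            },
            {
                "Sid": "DenyDisablingCloudTrail",
                "Effect": "Deny",
                "Action": ["cloudtrail:StopLogging", "cloudtrail:DeleteTrail"],
                "Resource": "*",
            },
            {
                "Sid": "RequireImdsV2",
                "Effect": "Deny",
                "Action": "ec2:RunInstances",
                "Resource": "arn:aws:ec2:*:*:instance/*",
                # Deny any launch that does not require IMDSv2 session tokens.
                "Condition": {
                    "StringNotEquals": {"ec2:MetadataHttpTokens": "required"}
                },
            },
        ],
    }


scp = build_guardrail_scp(APPROVED_REGIONS)
scp_json = json.dumps(scp, indent=2)
```

Because SCPs only deny, an instance launched from an MGN launch template that still allows IMDSv1 would simply fail to start, which is the point: the guardrail holds even when a template is misconfigured.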

Put together: Discovery agents/connector → Migration Hub portfolio → 7 Rs disposition → MGN/DMS/Snowball replication into staging → test-launch → scheduled cutover into a governed landing-zone account, with Migration Hub tracking each server’s status (replicating → tested → cutover → validated) so leadership sees a live burn-down and operators see exactly what is where.

Component breakdown

| Component | What it does | Why it’s here | Key configuration choices |
| --- | --- | --- | --- |
| Application Discovery Service | Inventories servers (specs, utilization) and, in agent-based mode, captures running processes and per-port network connections | You cannot wave-plan or right-size what you haven’t measured; the dependency data is what makes waves safe | Agentless connector in vCenter for the fast broad sweep; agent-based on hosts that need process/connection (dependency) data; let it collect 2–6 weeks to capture month-end/peak utilization, not a quiet Tuesday |
| AWS Migration Hub | Single control tower: groups servers into applications, shows network dependencies, tracks migration status across tools and accounts | One pane for “where is every server on the line”; the source of the leadership burn-down | Pick a home region; group discovery data into applications; use it as the status board MGN/DMS report into; Migration Hub Orchestrator to template repeatable wave runbooks |
| The 7 Rs framework | Per-application disposition: Retire / Retain / Relocate / Rehost / Replatform / Repurchase / Refactor | Routing logic for the whole factory: decides which engine each app goes to and what “done” means | Default to Rehost to hit the date; Retire zombies discovery exposes (free wins); reserve Refactor for apps with a real business case; record the decision per app in Migration Hub |
| AWS Application Migration Service (MGN) | Block-level, continuous server replication into a staging area; non-disruptive test launches; scheduled cutover to EC2 | The rehost workhorse: moves the bulk of servers with near-zero downtime and full pre-cutover rehearsal | Replication Agent on each source; staging subnet with cheap EBS + MGN replication servers; launch templates set target account/subnet/SG/instance profile/right-sized type; test before cutover, every time; post-launch actions (SSM) to install agents/domain-join automatically |
| AWS DMS + Schema Conversion Tool | Continuous logical DB replication into RDS/Aurora; SCT converts heterogeneous schemas/code | The Replatform track for databases: minimal-downtime DB moves and engine modernization | CDC (change data capture) for near-zero downtime; SCT for Oracle/SQL Server → PostgreSQL/Aurora; validate row counts + checksums before cutover |
| AWS Snowball | Offline bulk data transfer (ship the appliance) | Some datasets are too large to replicate over the link inside the schedule | For multi-TB cold data / poor bandwidth; pairs with online replication for the delta |
| Landing zone (Control Tower + Organizations) | The governed multi-account target every wave lands into | Migrating into an ungoverned account is where security regressions and sprawl are born | OUs + SCP guardrails, Log Archive + Audit accounts, Transit Gateway hub, IAM Identity Center SSO; built before wave one (see the Landing Zone reference architecture) |
| AWS MAP (Migration Acceleration Program) | AWS’s funded methodology: Assess → Mobilize → Migrate & Modernize, with credits and partner support | The commercial/operating wrapper around the factory: funding and a phased plan | Tag every migrated resource with the MAP tag to qualify for credits; align waves to the Mobilize → Migrate phases |

Three of these choices deserve a sentence of why. Agent-based discovery is non-negotiable for the dependency map: the agentless connector tells you a server exists and how busy it is, but only the agent captures the per-port connections that tell you app-tier-07 opens a socket to db-cluster-02:1521, and that single fact is what stops you scheduling them into different waves and severing the app at cutover. MGN over the old CloudEndure / hand-rolled AMI copies matters because MGN is the AWS-native successor, the service itself is free for the first 90 days (2,160 hours) of replication per source server (the staging-area EC2 and EBS resources are still billed), and it gives you the test launch, the rehearsal that converts cutover from a leap of faith into a checklist. And the landing zone must precede wave one: SCPs that forbid leaving the approved regions and forbid disabling CloudTrail are what make “every server lands governed” structurally true rather than a hope pinned on operator discipline.
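One way to turn the app-tier-07 → db-cluster-02 observation into a rule is to treat dependencies as an undirected graph and never split a connected component across waves. The sketch below is a minimal illustration under that assumption; the host names other than the two above are hypothetical, and a real estate would first exclude shared services (directory, DNS, monitoring) that otherwise glue everything into one giant component.

```python
from collections import defaultdict


def group_into_dependency_clusters(servers, edges):
    """Return connected components of the undirected dependency graph."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen, clusters = set(), []
    for server in sorted(servers):
        if server in seen:
            continue
        # Depth-first walk collecting everything reachable from this server.
        stack, cluster = [server], []
        seen.add(server)
        while stack:
            node = stack.pop()
            cluster.append(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        clusters.append(sorted(cluster))
    return clusters


servers = ["app-tier-07", "db-cluster-02", "web-01", "file-09", "print-02"]
edges = [("app-tier-07", "db-cluster-02"), ("web-01", "app-tier-07")]
clusters = group_into_dependency_clusters(servers, edges)
```

Each cluster is a candidate unit that must cut over together; servers with no edges (here the file and print servers) are free to fill out whichever wave has capacity.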

Implementation guidance

Sequence the program in MAP’s three phases and build the factory before you scale it. Assess (portfolio + business case), Mobilize (build the landing zone, run a pilot wave end-to-end, harden the runbook), then Migrate & Modernize (run waves at throughput). The cardinal error is skipping Mobilize and migrating into an unfinished foundation.

Layer 1 — Assess (discovery, weeks 0–6). Deploy the Application Discovery Service agentless connector into vCenter for the broad sweep, and install discovery agents on the hosts where you need process/connection data (typically the tier-1 and shared-services estate). Let it run across at least one month-end / peak cycle — right-sizing off a quiet week is how you land oversized and overpay. In Migration Hub, group servers into applications and use the network-dependency view to find the cut lines. Export to Migration Evaluator for a TCO/right-sizing model if you need a CFO-grade business case.

Layer 2 — Decide (disposition). Run each application through the 7 Rs and record the decision in Migration Hub. Practical heuristics: anything with near-zero CPU and no inbound connections for weeks is a Retire candidate (confirm with the owner, then reclaim the license); a self-managed Oracle/SQL Server with a modernization mandate is Replatform via DMS/SCT; a commodity app with a credible SaaS equivalent is Repurchase; everything else is Rehost unless there’s a funded reason to refactor. Group the rehosts into waves along dependency boundaries and business-risk tiers — a wave is a set of apps that can cut over together without breaking a live dependency.
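The heuristics above can be written down as a first-pass rule set that proposes a disposition for a human owner to confirm. This is a sketch, not a decision engine: the field names and the idle-CPU threshold are assumptions, and Retain and Relocate are left out because the heuristics in this section do not cover them.

```python
from dataclasses import dataclass


@dataclass
class AppProfile:
    name: str
    avg_cpu_pct: float             # average CPU over the discovery window
    inbound_connections: int       # inbound connections seen in the window
    self_managed_db: bool = False  # e.g. self-managed Oracle / SQL Server
    modernization_mandate: bool = False
    saas_equivalent: bool = False
    funded_refactor: bool = False


# Threshold is an assumption for illustration; tune it to your estate.
IDLE_CPU_PCT = 2.0


def propose_disposition(app: AppProfile) -> str:
    """Return a proposed R; the application owner still confirms it."""
    if app.avg_cpu_pct < IDLE_CPU_PCT and app.inbound_connections == 0:
        return "Retire"
    if app.self_managed_db and app.modernization_mandate:
        return "Replatform"
    if app.saas_equivalent:
        return "Repurchase"
    if app.funded_refactor:
        return "Refactor"
    return "Rehost"  # the default that protects the deadline


# Hypothetical portfolio entries.
portfolio = [
    AppProfile("legacy-print", avg_cpu_pct=0.4, inbound_connections=0),
    AppProfile("reporting-db", 35.0, 120, self_managed_db=True, modernization_mandate=True),
    AppProfile("ticketing", 12.0, 40, saas_equivalent=True),
    AppProfile("claims-rating", 55.0, 900),
]
decisions = {app.name: propose_disposition(app) for app in portfolio}
```

The rule order encodes the priority: Retire is checked first because an idle server should never consume migration effort, and Rehost is the fall-through so that anything unclassified still makes the date.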

Layer 3 — Mobilize (landing zone, IaC, pilot). Stand up the landing zone with Terraform / AWS Control Tower Account Factory for Terraform (AFT) before you migrate anything: OUs, SCP guardrails, the Network account with the Transit Gateway, IAM Identity Center, Log Archive. Then initialize MGN in each target workload account, define the staging subnet and replication settings template (instance type for replication servers, EBS volume types, throttling, VPC endpoints), and build per-tier launch templates. Run a pilot wave of a few low-risk apps all the way through test-launch and cutover to prove the runbook and the right-sizing assumptions before you turn up the volume.

# The crux: MGN launch templates give the migrated server an IAM ROLE,
# not a baked-in key, so no static credentials ride into AWS.
# Sketch of the equivalent EC2 launch template settings in Terraform;
# the resource names are illustrative.
resource "aws_launch_template" "migrated_app" {
  name_prefix = "migrated-app-"

  iam_instance_profile {
    arn = aws_iam_instance_profile.migrated_app.arn
  }

  metadata_options {
    http_endpoint = "enabled"
    http_tokens   = "required" # IMDSv2 only; enforced org-wide by SCP too
  }
}
# Post-launch SSM action: install CloudWatch agent + SSM agent automatically

Networking & identity wiring — the load-bearing details.

Cutover is a deployment — define rollback before you start. The MGN cutover sequence per app: (1) confirm replication lag is effectively zero; (2) quiesce/read-only the source app in a planned low-traffic window; (3) MGN performs the final delta sync and launches the cutover instance; (4) validate the app and data (row counts / checksums for DB-backed apps); (5) repoint DNS / load balancers to the AWS instance; (6) watch health for 24–48 h. The rollback is cheap because MGN keeps the source intact and replicating until you explicitly mark the server finalized — if the cutover fails validation, you revert the DNS/LB change back to the still-running source. Never decommission the source on cutover day. Only after a clean hypercare window do you finalize in MGN, archive the source, and reclaim the on-prem capacity.
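The cutover sequence reads naturally as an ordered runbook with one rollback hook. The sketch below models that shape only: each step is a stub returning success or failure, and none of it calls MGN or DNS APIs. The step names mirror the numbered list above; the 24–48 h health watch is left out because it runs after the runbook completes.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], bool]]


def run_cutover(steps: List[Step], rollback: Callable[[], None]) -> Tuple[bool, List[str]]:
    """Run steps in order; on the first failure call rollback and stop."""
    log: List[str] = []
    for name, action in steps:
        if action():
            log.append(f"ok: {name}")
        else:
            log.append(f"failed: {name}")
            rollback()  # e.g. point DNS / load balancer back at the live source
            log.append("rolled back: source remains the system of record")
            return False, log
    return True, log


# Stubbed steps for illustration; validation is forced to fail here.
events: List[str] = []
steps: List[Step] = [
    ("confirm replication lag is near zero", lambda: True),
    ("quiesce source app", lambda: True),
    ("final sync and launch cutover instance", lambda: True),
    ("validate app and data", lambda: False),
    ("repoint DNS / load balancer", lambda: True),
]
succeeded, log = run_cutover(steps, rollback=lambda: events.append("rollback"))
```

Putting validation before the DNS repoint means a failed check never exposes users to the new instance; the rollback hook only has real work to do when a failure shows up after traffic has moved.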

(On other IaC: Bicep and Deployment Manager are Azure/GCP-native and don’t target AWS. For this factory, Terraform or AWS CDK/CloudFormation are the right tools — AFT in particular is purpose-built to vend governed accounts for the waves to land into.)

Enterprise considerations

Security & Zero Trust — don’t migrate your sins. A lift-and-shift can faithfully reproduce every on-prem misconfiguration in AWS, so the factory has to raise the floor as it moves. Identity: MGN launch templates attach IAM instance profiles, so migrated servers get AWS permissions from short-lived role credentials — the leaked-static-key incident this org was burned by simply cannot ride along, because no key is created. Guardrails as gravity: the landing zone’s SCPs make region-egress, CloudTrail-disable, and IMDSv1 impossible in any account a wave lands in, regardless of what the migrated app tries to do. Posture scanning: point AWS Security Hub, GuardDuty, and Inspector (agentless EC2 + ECR vulnerability scanning) at the workload accounts so every newly-landed instance is assessed within minutes; this is how you answer the auditor’s “did anything land with a public bucket or critical CVE?” with evidence, not a shrug. Network: migrated VPCs are private, reach the internet only through the landing zone’s central inspection/egress path, and are segmented by Transit Gateway route tables (prod cannot route to non-prod). Secrets: anything the app needs at runtime moves to Secrets Manager, pulled via the instance role — not migrated as a plaintext config.
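Instance profiles stop new keys being minted, but old keys can still ride along inside config files on the replicated disk. A hedged sketch of a pre-cutover check: scan text for long-term access key IDs, which begin with the `AKIA` prefix followed by 16 uppercase alphanumeric characters. The sample config is hypothetical and uses the example key ID from AWS documentation; a real scan would also need to look for secret keys and other credential types, which this pattern does not catch.

```python
import re

# Long-term IAM access key IDs start with "AKIA" followed by 16
# uppercase alphanumerics. This finds key IDs only, not secret keys.
ACCESS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


def find_static_key_ids(text: str):
    """Return (line_number, key_id) pairs for every access key ID found."""
    hits = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for match in ACCESS_KEY_ID.finditer(line):
            hits.append((line_number, match.group(0)))
    return hits


# Hypothetical config file content; the key is the AWS documentation example.
sample_config = """[default]
region = us-east-1
aws_access_key_id = AKIAIOSFODNN7EXAMPLE
"""
findings = find_static_key_ids(sample_config)
```

Any hit is a reason to rotate the key and move the dependency onto the instance role before the server is finalized.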

Cost optimization. The factory creates cost levers the data center never had. Right-size on the way in: discovery utilization data routinely shows on-prem servers sized for a 2018 peak; MGN launch templates land them on smaller, current-gen (often Graviton-eligible after replatform) instances — a downsize is the single biggest one-time saving. Retire first: the 7 Rs Retire column is pure margin — 10–20% of servers are zombies you stop paying for entirely. Kill staging cost promptly: MGN staging EBS is cheap but real; finalize and clean up cutover servers so staging volumes don’t linger. Buy commitments after stabilizing: run migrated workloads on On-Demand through hypercare, then cover the steady-state baseline with Savings Plans / Reserved Instances once utilization is proven. Claim the credits: tag every migrated resource with the MAP tag so the migration qualifies for AWS MAP funding — real money against the bill. Modernize for the next step-change: the replatform/refactor backlog (containers, serverless, managed databases) is where the recurring savings beyond rehost live — schedule it post-migration.

Scalability — of the factory, not just the apps. The thing that has to scale here is throughput: servers per wave and waves per month. MGN replicates hundreds to thousands of servers concurrently (mind the staging-area and replication-server limits, and Service Quotas), and Migration Hub Orchestrator templates the wave runbook so the Nth wave is the same checklist as the first. The bottleneck is rarely AWS — it’s link bandwidth for replication and human cutover validation; both are solved by parallelizing waves and automating post-launch validation, not by adding compute.
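Since link bandwidth is the usual bottleneck, it helps to do the arithmetic before committing to a wave calendar. The function below is a back-of-envelope estimate only: it assumes decimal units (1 TB = 10^12 bytes), a steady link, and a usable fraction of 70% as a placeholder for protocol overhead and competing traffic. It ignores ongoing change rate and compression.

```python
def initial_sync_days(total_tb: float, link_mbps: float, usable_fraction: float = 0.7) -> float:
    """Days to push total_tb over a link, using decimal TB and megabits per second."""
    if link_mbps <= 0 or not 0 < usable_fraction <= 1:
        raise ValueError("link_mbps must be > 0 and usable_fraction in (0, 1]")
    total_bits = total_tb * 1e12 * 8
    usable_bits_per_second = link_mbps * 1e6 * usable_fraction
    return total_bits / usable_bits_per_second / 86_400  # seconds per day


# Example: 100 TB of source disk over a 1 Gbps link at 70% usable throughput.
days_for_100tb_on_1gbps = initial_sync_days(100, 1000)
```

With those assumed inputs the initial sync takes roughly 13 days, which is the kind of number that decides whether a dataset replicates online or ships on a Snowball.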

Reliability & DR (a free upgrade you should bank). Two things to call out. First, the migration muscle carries over to DR: AWS Elastic Disaster Recovery (DRS) is a separate service built on the same kind of agent-based, continuous block-level replication as MGN, so the skills and runbooks you build migrating are the ones that give you a tested recovery posture afterward (the regulator’s untested-DR finding, solved). Second, set RTO/RPO targets per tier for the cutover itself: tier-1 apps get a near-zero-downtime cutover (RPO ≈ minutes via continuous replication, RTO = the short final-sync-and-launch window); tier-3 apps can take a planned-downtime cutover. After migration, the landing zone gives you AWS-native HA (Multi-AZ, ASGs) and the option of cross-region DRS for the workloads that warrant it.

Observability. Two dashboards, two audiences. Program: Migration Hub is the burn-down — servers by status (discovered → replicating → tested → cutover → finalized), wave progress, and the disposition mix, which is what the steering committee actually wants to see. Operational: migrated instances ship metrics/logs to CloudWatch (agents installed by post-launch actions), with Security Hub/GuardDuty findings aggregated in the Audit account. The post-cutover hypercare window watches health/error rates for 24–48 h against pre-defined go/no-go thresholds.
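Migration Hub produces the burn-down for you; the sketch below only illustrates the counting behind it, in case you need the same numbers in a spreadsheet or a steering-committee slide. The status names follow the pipeline described above and are not an AWS API enum, and the server snapshot is hypothetical.

```python
from collections import Counter

# Status names follow the article's pipeline; they are not an AWS API enum.
STATUS_ORDER = ["discovered", "replicating", "tested", "cutover", "finalized"]


def burn_down(server_status):
    """Summarize {server: status} into per-status counts and remaining work."""
    unknown = set(server_status.values()) - set(STATUS_ORDER)
    if unknown:
        raise ValueError(f"unknown status values: {sorted(unknown)}")
    counts = Counter(server_status.values())
    total = len(server_status)
    finalized = counts["finalized"]
    return {
        "counts": {status: counts[status] for status in STATUS_ORDER},
        "total": total,
        "remaining": total - finalized,
        "pct_finalized": round(100 * finalized / total, 1) if total else 0.0,
    }


# Hypothetical snapshot of five servers.
snapshot = {
    "web-01": "finalized",
    "app-tier-07": "finalized",
    "db-cluster-02": "cutover",
    "batch-03": "replicating",
    "file-09": "discovered",
}
report = burn_down(snapshot)
```

Counting a server as done only at "finalized" keeps the chart honest: a server that has cut over but is still in hypercare can still roll back.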

Governance. Every disposition decision and every server’s status lives in Migration Hub, so the plan is auditable. The landing zone’s Log Archive account captures CloudTrail/Config org-wide, so who launched what, where is tamper-evidently recorded. SCPs + Config rules enforce the non-negotiables continuously, and AWS Backup policies attach to migrated workloads by tag so backup isn’t a per-server afterthought. Change management and the audit story become the same artifacts: the wave runbook, the Migration Hub status, and the immutable logs.

Reference enterprise example

Aldermere Insurance is a (fictional) mid-market property-and-casualty insurer running a single leased data center whose contract expires in 11 months with a steep renewal and end-of-support hardware. Their estate: 620 servers across 140 applications — a mix of Windows policy-admin systems, a Linux claims-rating tier, three Oracle databases (one of them the policy system of record that the business will not allow extended downtime on), a SQL Server reporting stack, and the usual long tail of file servers and forgotten utilities. Two things forced an architecture rather than a scramble: a board mandate to exit the DC before renewal, and a prior security review that had found an AWS access key committed to an internal Git repo during an earlier ad-hoc cloud experiment — making “the migration must not let static credentials ride along” a hard, named requirement.

What they built. They ran MAP: a 5-week Assess with Application Discovery Service (agentless connector across vCenter, agents on the 90 tier-1 and shared hosts to capture dependencies), letting it collect through a month-end close so right-sizing reflected real peaks. Migration Hub grouped the 620 servers into 140 applications and surfaced the dependency edges that defined wave boundaries. The 7 Rs disposition: 18 apps Retired (zombie file/print and dev boxes discovery proved were idle — ~70 servers and a stack of Windows licenses reclaimed), 3 Replatformed (the SQL Server reporting DB and two smaller Oracle schemas to Aurora PostgreSQL via DMS + SCT), 2 Repurchased (an aging ticketing tool and an on-prem email-archive appliance, both swapped for SaaS), the policy-system-of-record Oracle DB Replatformed to RDS for Oracle via DMS CDC to honor the near-zero-downtime constraint, and the remaining 116 apps Rehosted with MGN. During a 6-week Mobilize they stood up the landing zone with AFT (Control Tower, OUs, SCP guardrails, a Transit Gateway hub reachable over a redundant Direct Connect, IAM Identity Center, Log Archive) and ran a pilot wave of 4 low-risk apps end-to-end. Then Migrate: nine MGN waves of ~12–16 apps each, scheduled along dependency lines, every server test-launched into an isolated subnet before a weekend cutover, DNS repointed at cutover, source kept live until a clean hypercare window. Every migrated instance landed with an IAM instance profile (zero static keys), IMDSv2 enforced by SCP, and Inspector + Security Hub scanning it within minutes.

The numbers and the outcome.

| Dimension | Before (leased DC) | After (AWS, post-factory) |
| --- | --- | --- |
| Servers in scope | 620 | 532 migrated, 88 retired (never moved) |
| DC exit timeline | 11-month hard deadline | DC vacated in ~9 months, ahead of renewal |
| Steady-state infra cost | ~$310k/mo (DC + hardware refresh path) | ~$188k/mo (right-sized + Retire + Savings Plans) |
| Cutover downtime, tier-1 system of record | n/a (never moved) | < 20 min (DMS CDC final sync) |
| Static AWS keys created during migration | (prior key leaked in Git) | zero (instance profiles only) |
| New servers landing non-compliant | unknown / unprovable | 0 (SCPs + Inspector enforce at landing) |
| Disaster recovery posture | second rented cage, never tested | tested DRS replication, RPO ≈ minutes for tier-1 |

The headline outcome wasn’t only the ~39% run-rate reduction or beating the lease date by two months. It was that the 88 retired servers (found by discovery, not guesswork) paid for a chunk of the program before anything moved; that the previously-leaked-key failure mode became structurally impossible because no key is ever minted; and that a forced data-center exit doubled as a tested DR upgrade the regulator had been asking for. Migration became a production line with a burn-down chart, not a heroic all-nighter.
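The headline figures follow directly from the table, and the arithmetic is short enough to check. The inputs below are the approximate values quoted for the fictional Aldermere example, so the results are approximate too.

```python
# Figures taken from the Aldermere table; the cost values are approximate ("~").
before_monthly_usd = 310_000
after_monthly_usd = 188_000
servers_in_scope = 620
servers_migrated = 532
deadline_months = 11
actual_months = 9

monthly_saving_usd = before_monthly_usd - after_monthly_usd
reduction_pct = 100 * monthly_saving_usd / before_monthly_usd
servers_never_moved = servers_in_scope - servers_migrated
months_ahead = deadline_months - actual_months

print(f"run-rate reduction: {reduction_pct:.1f}%")
print(f"monthly saving: ${monthly_saving_usd:,}")
print(f"servers never moved: {servers_never_moved}")
print(f"months ahead of deadline: {months_ahead}")
```

The reduction works out to about 39.4%, which the text rounds to ~39%, and the 88 servers that never moved are simply the 620 in scope minus the 532 migrated.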

When to use it

Use this architecture when you have a portfolio-scale move — roughly 50+ servers and/or a dozen+ applications — with a real deadline, an unknown dependency map, and a need to prove both progress and that you didn’t regress security on the way in. It is the right shape for data-center exits, co-lo lease expiries, M&A consolidation, and “cloud-first mandate” programs where the unit of work is a wave, not a weekend.

Trade-offs and anti-patterns to avoid.

Alternatives in brief. VMware Cloud on AWS (Relocate) — when you want the whole VMware estate moved with zero server-level change and your operating model stays vSphere; fastest path off the DC, but you defer the cloud-native benefits. Replatform-heavy programs — when the apps are modern enough that moving databases to managed services and apps onto Beanstalk/containers is cheap, you skip pure rehost and bank recurring savings sooner, at the cost of a slower move. Partner-led “migration factory as a service” — large enterprises often run this exact architecture through an AWS MAP partner for surge capacity; the design is the same, the labor is outsourced. Greenfield rebuild — for a small, modern portfolio it can be cheaper to re-deploy from IaC than to replicate VMs, but for a 600-server estate with undocumented apps, MGN’s lift-and-shift is what makes the deadline. The sweet spot for this reference architecture is the broad middle: enough servers and enough dependency fog that you need a measured, governed, repeatable line rather than a clever one-off.


Vinod is a Senior Cloud Architect (22+ yrs) — available for Azure / AWS / GCP architecture, landing zones, and migrations.
