
Building a Multi-Account AWS Landing Zone with Control Tower and Account Factory

A landing zone is the governed, pre-wired AWS foundation your workloads land in — accounts, OUs, identity, logging, and guardrails already in place so teams ship without re-litigating security per project. AWS Control Tower orchestrates this on top of AWS Organizations, and Account Factory for Terraform (AFT) turns account creation into a GitOps pipeline. This guide builds the whole thing the way it is done in regulated enterprises — and because you will return to it mid-build and mid-incident, almost every decision here is laid out as a scannable table: the controls, the SCP shapes, the OU choices, the limits, the error codes, and a symptom→cause→confirm→fix playbook for the day governance fights you.

The core principle is simple and load-bearing: the AWS account is your strongest isolation boundary. IAM principals, service quotas, networking, and billing all stop at the account edge. So instead of cramming prod and dev into one account separated by tags and hope, you give each workload-and-environment its own account. A compromised dev account cannot reach prod unless you explicitly grant cross-account access. A runaway Lambda cannot exhaust prod’s concurrency. A DeleteBucket blast radius is one account, not the company. Accounts are cheap; the hard part is governing hundreds of them consistently, and that is exactly what Control Tower and an organizational unit (OU) strategy solve.

By the end you will be able to enable Control Tower deliberately (home region and all), design an OU tree that scales to hundreds of accounts, attach the right controls and custom SCPs at the right level, stand up AFT in its own account, vend a governed account from a pull request, centralize logging into an immutable Log Archive, and run the drift, upgrade, and decommission lifecycle without hand-patching the platform into a corner.

What problem this solves

A single shared AWS account is a time bomb. Every team’s IAM policies pile up; one over-broad *:* grant or a leaked access key reaches everything; cost is impossible to attribute; an experiment in dev can throttle, delete, or bankrupt prod because they share the same limits, the same buckets, the same network. The first instinct — “we will separate environments with tags and IAM conditions” — fails because IAM is additive and tag-based isolation is one missing Condition away from collapse. The account boundary, by contrast, is the one isolation primitive AWS cannot leak across by accident.

But the moment you accept “an account per workload-environment,” you have a new problem: consistency at scale. Account #1 is hand-built lovingly. Account #50 is built at 5pm on a Friday and forgets the CloudTrail, the region lock, the break-glass role, the budget alarm. Six months later a security review finds forty accounts, each subtly different, none fully compliant, and no one able to say which control is where. That drift is the real enemy. Control Tower exists to make account #100 the same PR as account #1 — governed, logged, and consistent on day one — and to detect the moment any account diverges from that baseline.

Who hits this: any organization past its first few AWS accounts; anyone in a regulated industry (finance, health, public sector) that must prove centralized logging and preventive controls; any platform team asked to “give every squad their own account but keep us out of the news.” The failure mode without a landing zone is not dramatic — it is slow: a hundred snowflake accounts, an audit you cannot pass, and a blast radius the size of the company.

To frame the whole field before the build, here is every layer this article governs, who owns it, and the failure it prevents:

| Layer | What lives here | Who owns it | Failure it prevents |
| --- | --- | --- | --- |
| Management (payer) | Organizations, Control Tower, SCPs, billing | Cloud platform / security | One account that can dismantle everything |
| OU policy tree | SCPs, Config rules, control attachment | Platform + governance | Inconsistent guardrails per team |
| Security OU | Log Archive + Audit accounts | Security / SecOps | Tamperable logs; no central detection |
| Identity | IAM Identity Center, permission sets | Identity team | IAM-user sprawl; static keys |
| Account vending (AFT) | Account requests, customizations | Platform engineering | Snowflake, half-built accounts |
| Member accounts | The actual workloads | App / product teams | Blast radius beyond one account |
| Network (shared) | Transit Gateway, central egress, DNS | Network team | Re-inventing connectivity per account |

Learning objectives

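By the end of this article you can:

  1. Enable Control Tower deliberately, including the home-region choice.
  2. Design an OU tree that scales to hundreds of accounts.
  3. Attach the right controls and custom SCPs at the right level.
  4. Stand up AFT in its own account and vend a governed account from a pull request.
  5. Centralize logging into an immutable Log Archive.
  6. Run the drift, upgrade, and decommission lifecycle without hand-patching the platform.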

Prerequisites & where this fits

You should already understand AWS account basics: an account is the billing and isolation boundary; IAM governs identity within an account; STS issues temporary credentials for cross-account access. You should be comfortable running aws CLI v2 with named profiles, reading JSON output with --query, and writing basic Terraform (providers, modules, terraform apply). Familiarity with AWS Organizations (the root, OUs, member accounts, consolidated billing) is assumed at a conceptual level — this article builds the governance layer on top of it.

This sits at the foundation of any multi-account AWS estate. It is upstream of almost everything: networking, identity federation, workload deployment, and cost management all assume the landing zone exists. It pairs tightly with “AWS Organizations: SCP guardrails and delegated admin” for the policy mechanics, “Control Tower guardrails: the multi-account foundation” for the control catalog in depth, “Account Factory for Terraform: account vending and customizations” for the AFT pipeline internals, and “IAM Identity Center: permission sets and ABAC across accounts” for human access. Centralized logging connects to “CloudTrail and Config for audit and compliance”, and the shared network it enables is built in “Transit Gateway multi-account VPC architecture”.

Where the responsibility boundary sits between you and AWS, so you know what you can and cannot change:

| Concern | AWS owns | You own |
| --- | --- | --- |
| Control Tower control plane | The orchestration, managed roles, baseline logic | Which OUs you register, which controls you enable |
| Mandatory controls | The control definitions; you cannot disable them | Designing workflows that live within them |
| Org CloudTrail trail | The managed trail + delivery roles | The Log Archive bucket policy hardening, retention |
| SCPs | The evaluation engine | Authoring your own custom SCPs |
| Account vending | The Account Factory provisioning product | The AFT pipeline, requests, and customizations |
| Member account contents | Nothing; it is yours | All workloads, IAM, networking inside it |

Core concepts

Five mental models make every later decision obvious.

The account is the blast radius; the OU is the policy unit. Accounts are where isolation and billing stop; OUs are where you attach controls once and inherit everywhere beneath. You design the OU tree around how you want to govern (prod vs non-prod, security vs sandbox), not around your reporting lines. A new team account dropped into Workloads/Prod inherits prod’s stricter SCPs on creation — that inheritance is the entire point.

Controls come in three behaviors. Preventive controls are SCPs: they block a non-compliant API call outright (return AccessDenied no matter what IAM says). Detective controls are AWS Config rules: they flag drift but do not stop it. Proactive controls are CloudFormation hooks: they block a non-compliant resource before it is provisioned. “Block the call,” “flag the state,” “block the deploy” — three different points on the timeline.

The three foundational accounts enforce separation of duties. Control Tower is set up in the Management account (Organizations, billing, SCPs; no workloads, ever) and creates a Log Archive account (the immutable central log store) and an Audit account (cross-account security tooling). The people who can change the org are not the people who can read every log, and neither can tamper with the log store. That triangle is what makes the landing zone trustworthy to an auditor.

The landing zone has versioned state that can drift. Control Tower ships landing-zone versions and baseline versions. Registering an OU applies a baseline to every current and future account in it, but an in-place landing-zone upgrade does not re-apply the baseline to already-registered OUs. Accounts can therefore lag behind a new control set until something downstream trips over the gap. Treat “update landing zone” and “re-register OUs” as a single change, done together.
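One way to keep the upgrade and the re-registration together is to list, right after an upgrade, which OUs still sit on an older baseline. The following is a minimal sketch, assuming a hypothetical inventory that maps OU names to their enabled baseline version (in practice that data would come from the Control Tower API). The function names are illustrative, and versions are compared numerically rather than as strings.

```python
from typing import Dict, List, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '4.0' into (4, 0) so that '10.0' sorts after '9.1'."""
    return tuple(int(part) for part in version.split("."))


def ous_needing_reregistration(enabled: Dict[str, str], target: str) -> List[str]:
    """Return the OU names whose enabled baseline is older than the target."""
    wanted = parse_version(target)
    return sorted(ou for ou, ver in enabled.items() if parse_version(ver) < wanted)


# Hypothetical inventory: OU name -> baseline version currently enabled.
inventory = {"Workloads/Prod": "3.0", "Workloads/NonProd": "4.0", "Sandbox": "2.1"}
print(ous_needing_reregistration(inventory, "4.0"))
```

An empty list is the signal that the upgrade is actually finished, not just started.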

Account vending should be GitOps, not clicks. Clicking “Enroll account” does not scale and leaves no audit trail. AFT wraps Account Factory: you describe an account in a Terraform request, merge it, and a pipeline vends, enrolls, and customizes the account end to end — reproducible, reviewed, logged.

The vocabulary in one table

Pin down every moving part before the deep sections. The glossary repeats these for lookup; this is the mental model side by side:

| Term | One-line definition | Where it lives | Why it matters |
| --- | --- | --- | --- |
| Management account | The Organizations root payer | Top of the org | Can dismantle everything; keep it clean |
| Organizational unit (OU) | A container for accounts | Under the root | The unit of policy attachment |
| Service Control Policy (SCP) | Org-level Deny/Allow boundary | Attached to root/OU/account | Preventive control; caps IAM |
| Control (guardrail) | A managed governance rule | Applied to an OU | Preventive/detective/proactive |
| Baseline | The control set applied to an OU | Per registered OU | Drifts if not re-applied on upgrade |
| Log Archive account | Immutable central log store | Security OU | Tamper-proof audit trail |
| Audit account | Cross-account security tooling | Security OU | Delegated GuardDuty/Security Hub admin |
| IAM Identity Center | Workforce SSO + permission sets | Org-wide | Replaces IAM users for humans |
| Account Factory | Control Tower’s account-provisioning product | Service Catalog | Vends + enrolls accounts |
| AFT | Terraform GitOps wrapper for Account Factory | Its own account | Reproducible account vending |
| Customization | Terraform run in a vended account | AFT repos | Bakes VPC/roles/tags into every account |
| Home region | The region anchoring the landing zone | Chosen at setup | Painful to change later |
| Drift | An account diverging from its baseline | Detected by Control Tower | Re-register/re-enroll to fix |

Step 1 — Enable Control Tower and design the OU hierarchy

Control Tower is set up from the management account (the Organizations root payer). Before you click anything, decide your home region carefully — it anchors the landing zone, hosts the managed resources, and is painful to change later. Enable it from the console (Set up landing zone) or, if you prefer IaC from day one, the aws controltower API once the prerequisites exist. The console wizard is genuinely the right call for the initial enablement because it provisions the audit and log archive accounts atomically; automate everything after that point.

During setup you name the foundational Security OU and, optionally, an additional OU (Sandbox by default), and you pick the regions Control Tower governs. Once the landing zone is live, extend the tree to something that scales:

Root
├── Security                 (Control Tower foundational OU)
│   ├── Log Archive  (account)
│   └── Audit        (account)
├── Infrastructure           (shared platform: networking, CI/CD, DNS)
│   ├── Network      (account: Transit Gateway, central egress)
│   └── Shared-Services (account: AD, artifact registries)
├── Workloads
│   ├── Prod
│   └── NonProd
├── Sandbox                  (Control Tower additional OU; loose guardrails)
└── Suspended                (quarantine: deny-all SCP, pending closure)

This mirrors the AWS multi-account reference. Workloads/Prod and Workloads/NonProd are split so you can attach stricter SCPs (deny leaving approved regions, deny disabling CloudTrail) to prod without slowing down experimentation. Suspended exists for the day you decommission an account — you move it there, attach a deny-all SCP, and it sits inert until closure.

Each OU exists for a reason; placing an account in the wrong one gives it the wrong guardrails. The full tree, what governs each node, and the SCP posture:

| OU | Purpose | Typical accounts | SCP posture | Control Tower role |
| --- | --- | --- | --- | --- |
| Security | Audit + logging plane | Log Archive, Audit | Tightest; deny tampering | Foundational (mandatory) |
| Infrastructure | Shared platform services | Network, Shared-Services | Strict; region-locked | Custom (you register) |
| Workloads/Prod | Production workloads | per-team prod accounts | Strict; deny region/trail changes | Custom (you register) |
| Workloads/NonProd | Dev/test/stage | per-team non-prod accounts | Looser; region allowlist | Custom (you register) |
| Sandbox | Free experimentation | personal/POC accounts | Loosest; budget caps | Additional (optional at setup) |
| Suspended | Quarantine before closure | accounts being retired | Deny-all | Custom (you register) |
| PolicyStaging (opt.) | Test SCPs before prod | a throwaway account | Whatever you are testing | Custom (you register) |
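To see why placement decides the guardrails, it helps to walk the tree from the root down to an OU and collect every SCP attached along the way. The sketch below is a simplified local model, not a call to AWS. The tree, the attachment map, and the policy names are hypothetical, and it assumes a deny-list strategy in which each attached SCP only adds denies.

```python
from typing import Dict, List, Optional

# Hypothetical model of part of the tree: child -> parent (None marks the root).
PARENT: Dict[str, Optional[str]] = {
    "Root": None,
    "Workloads": "Root",
    "Workloads/Prod": "Workloads",
    "Workloads/NonProd": "Workloads",
    "Suspended": "Root",
}

# Hypothetical SCP attachments per node (names are illustrative only).
ATTACHED: Dict[str, List[str]] = {
    "Root": ["deny-leave-organization"],
    "Workloads": ["deny-non-approved-regions"],
    "Workloads/Prod": ["deny-cloudtrail-tampering"],
    "Suspended": ["deny-all"],
}


def inherited_scps(ou: str) -> List[str]:
    """Collect every SCP from the root down to the given OU, root first."""
    chain: List[str] = []
    node: Optional[str] = ou
    while node is not None:
        chain.append(node)
        node = PARENT[node]
    policies: List[str] = []
    for name in reversed(chain):
        policies.extend(ATTACHED.get(name, []))
    return policies


print(inherited_scps("Workloads/Prod"))
print(inherited_scps("Workloads/NonProd"))
```

An account moved from Workloads/NonProd to Workloads/Prod picks up the extra prod policy without anyone touching the account itself.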

Register additional OUs with Control Tower so it enrolls and governs accounts placed in them. With the CLI:

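# OUs themselves are Organizations objects; create under the root or a parent OU
ROOT_ID=$(aws organizations list-roots --query 'Roots[0].Id' --output text)

# Capture the new OU's ID so the child OU can reference it
WORKLOADS_OU_ID=$(aws organizations create-organizational-unit \
  --parent-id "$ROOT_ID" \
  --name "Workloads" \
  --query 'OrganizationalUnit.Id' --output text)

# Capture the child OU's ARN; enable-baseline targets an ARN, not an ID
WORKLOADS_PROD_OU_ARN=$(aws organizations create-organizational-unit \
  --parent-id "$WORKLOADS_OU_ID" \
  --name "Prod" \
  --query 'OrganizationalUnit.Arn' --output text)

# Then register the OU with Control Tower so its accounts are governed/enrolled.
# Register a parent OU before its children. If Control Tower manages IAM Identity
# Center, also pass --parameters with the IdentityCenterEnabledBaselineArn key.
aws controltower enable-baseline \
  --baseline-identifier "$AWS_CONTROL_TOWER_BASELINE_ARN" \
  --target-identifier "$WORKLOADS_PROD_OU_ARN" \
  --baseline-version "4.0"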

In Terraform, the OU tree and registration are declarative — this is how you keep the hierarchy in version control:

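# The organization itself, used below to look up the root ID
data "aws_organizations_organization" "this" {}

resource "aws_organizations_organizational_unit" "workloads" {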
  name      = "Workloads"
  parent_id = data.aws_organizations_organization.this.roots[0].id
}

resource "aws_organizations_organizational_unit" "prod" {
  name      = "Prod"
  parent_id = aws_organizations_organizational_unit.workloads.id
}

# Register the OU's baseline with Control Tower. The resource type below depends on
# your AWS provider version; confirm its exact name in the provider docs before applying.
resource "aws_controltower_baseline_enablement" "prod" {
  baseline_identifier = var.aws_control_tower_baseline_arn
  baseline_version    = "4.0"
  target_identifier   = aws_organizations_organizational_unit.prod.arn
}

Why this matters: an OU registered with Control Tower applies its baseline (the mandatory controls and a Config recorder) to every current and future account in that OU. New teams inherit guardrails on day one — that is the entire point.

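A note on OU and Organizations limits so your tree design does not hit a wall: nesting and counts are bounded, so design wide, not deep. Quotas change over time; confirm current values in the Organizations quotas documentation before you rely on a number.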

| Resource | Default limit | Adjustable? | Design implication |
| --- | --- | --- | --- |
| OU nesting depth | 5 levels below root | No | Keep the tree shallow; group by governance |
| OUs per organization | 1,000 | No | Plenty; do not create an OU per team if policy is shared |
| Accounts per organization | Default ~10, raise via Service Quotas | Yes (quota) | Request increases early for large estates |
| SCPs per organization | 5,000 | No | Reuse SCPs across OUs; do not template per account |
| SCPs attached per OU/account | 5 | No | Compose policy carefully; consolidate statements |
| SCP document size | 5,120 characters | No | Trim whitespace; split logical policies |
| Policy types per root | SCP, Tag, AI opt-out, Backup, RCP | n/a | Enable SCP type at the root before attaching |
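These limits are easy to check against a planned tree before anything is created. The sketch below is an offline lint, not an AWS API call: the two constants are copied from the table above, OU paths use the slash notation from this article, and the function names and sample data are hypothetical.

```python
from typing import Dict, Iterable, List

MAX_OU_DEPTH = 5         # levels below the root, per the table above
MAX_SCPS_PER_TARGET = 5  # SCPs attached to one OU or account, per the table above


def ou_depth(path: str) -> int:
    """Depth of an OU path such as 'Workloads/Prod' (the root itself is depth 0)."""
    return len([part for part in path.split("/") if part])


def design_violations(ou_paths: Iterable[str],
                      attachments: Dict[str, List[str]]) -> List[str]:
    """Return human-readable problems found in a planned OU design."""
    problems: List[str] = []
    for path in ou_paths:
        depth = ou_depth(path)
        if depth > MAX_OU_DEPTH:
            problems.append(f"{path}: depth {depth} exceeds {MAX_OU_DEPTH}")
    for path, scps in attachments.items():
        if len(scps) > MAX_SCPS_PER_TARGET:
            problems.append(f"{path}: {len(scps)} SCPs exceeds {MAX_SCPS_PER_TARGET}")
    return problems


# Hypothetical plan: one path nested too deeply, one OU with too many SCPs.
planned = ["Workloads", "Workloads/Prod", "A/B/C/D/E/F"]
attached = {"Workloads/Prod": ["s1", "s2", "s3", "s4", "s5", "s6"]}
for problem in design_violations(planned, attached):
    print(problem)
```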

Step 2 — Inside the landing zone: the three foundational accounts

Control Tower creates a deliberate separation of duties across three accounts. Treat these as platform infrastructure, not playgrounds.

| Account | Lives in | Owns | Never does |
| --- | --- | --- | --- |
| Management | Root | Organizations, Control Tower, consolidated billing, SCPs | Run workloads; hold IAM users |
| Log Archive | Security OU | The immutable, central S3 destination for org CloudTrail and Config logs | Allow member accounts to delete logs |
| Audit | Security OU | Cross-account security tooling: GuardDuty/Security Hub delegated admin, read/audit roles | Hold workloads or write access to prod |

The split exists so that the people who can change the org (management) are not the same as the people who can read every log (audit), and neither can tamper with the log store (archive). Lock the management account down hard: no IAM users, root protected with hardware MFA, and access only through IAM Identity Center (formerly AWS SSO) with a tightly scoped permission set. Because SCPs do not apply to the management account itself, pair that with SCPs on member accounts that prevent anyone from leaving the org or disabling Control Tower.

How to lock down the management account, concretely — each control and the exact mechanism:

| Hardening control | Why | How (mechanism) |
| --- | --- | --- |
| No IAM users | Static keys are a leading breach vector | Identity Center permission sets only; delete legacy IAM users |
| Root hardware MFA | The management-account root is not restricted by SCPs | FIDO2 security key on root; store offline |
| No root access keys | A root key is total compromise | Delete any root access keys; alarm on root usage |
| Restrict who can assume admin | Limit blast radius of the payer | Scoped permission set; role trust policies with aws:PrincipalOrgID conditions |
| Deny leaving the org | Stop an account escaping governance | SCP deny on organizations:LeaveOrganization (applies to member accounts) |
| Deny disabling Control Tower / trail | Preserve the foundation | Mandatory controls already do this; do not relax |
| Alarm on root + console login | Detect misuse fast | CloudTrail → EventBridge → SNS on root events |
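The root alarm in the last row comes down to one EventBridge event pattern. The sketch below builds that pattern as a Python dictionary and prints it as JSON for use in a rule. The `is_root_activity` helper is a hypothetical local stand-in for the pattern, useful for checking sample events offline; it is not how EventBridge itself evaluates rules.

```python
import json

# Event pattern matching CloudTrail-delivered activity by the root user.
ROOT_ACTIVITY_PATTERN = {
    "detail-type": [
        "AWS API Call via CloudTrail",
        "AWS Console Sign In via CloudTrail",
    ],
    "detail": {"userIdentity": {"type": ["Root"]}},
}


def is_root_activity(event: dict) -> bool:
    """Local approximation of the pattern: True when the event is root activity."""
    detail_type_ok = event.get("detail-type") in ROOT_ACTIVITY_PATTERN["detail-type"]
    identity_type = event.get("detail", {}).get("userIdentity", {}).get("type")
    return detail_type_ok and identity_type == "Root"


# Print the pattern as JSON, ready to paste into an EventBridge rule.
print(json.dumps(ROOT_ACTIVITY_PATTERN, indent=2))
```

Point the rule's target at an SNS topic that pages a human; root activity in a well-run management account should be rare enough that every event deserves a look.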

A useful pattern is to delegate administration of security services out of the management account to the audit account, keeping the payer account clean:

# Run from the management account: make Audit the org-wide GuardDuty admin
aws guardduty enable-organization-admin-account \
  --admin-account-id "$AUDIT_ACCOUNT_ID"

# Same idea for Security Hub
aws securityhub enable-organization-admin-account \
  --admin-account-id "$AUDIT_ACCOUNT_ID"

Which security services support delegated administration, and where to run them from:

| Service | Delegate to | Why delegate | Run org enable from |
| --- | --- | --- | --- |
| GuardDuty | Audit | Org-wide threat detection, single pane | Management → Audit becomes admin |
| Security Hub | Audit | Aggregate findings org-wide | Management → Audit becomes admin |
| IAM Access Analyzer | Audit | Org-level external-access analyzer | Management → Audit |
| Config aggregator | Audit | Single multi-account/region view | Management → Audit |
| Macie | Audit (or Security) | Central data-classification posture | Management → delegated admin |
| Detective | Audit | Cross-account investigation graph | Management → delegated admin |
| Firewall Manager | Security/Network | Centralized WAF/firewall policy | Management → delegated admin |

Step 3 — Baseline controls: mandatory, strongly recommended, elective

Control Tower governance is delivered through controls (historically “guardrails”). They come in three behaviors and three categories.

By behavior:

| Behavior | Implemented as | Effect | Timing | Example |
| --- | --- | --- | --- | --- |
| Preventive | Service Control Policy | Blocks the API call (AccessDenied) | At call time | Disallow changes to CloudTrail |
| Detective | AWS Config rule | Flags non-compliance (does not block) | After the fact | Detect public-read S3 buckets |
| Proactive | CloudFormation hook | Blocks the resource before provisioning | At deploy time | Block creating an unencrypted volume |

By category:

| Category | What it is | You can disable it? | How to treat it |
| --- | --- | --- | --- |
| Mandatory | Always on; the bedrock of the landing zone | No | Do not fight it; design within it |
| Strongly recommended | AWS best practice (Well-Architected aligned) | Yes | Enable broadly across OUs |
| Elective | Common but situational locks | Yes | Apply surgically to OUs that need them |

The mandatory set is what makes the landing zone trustworthy: it does things like disallow deleting the central log archive, disallow changes to the CloudTrail/Config roles, and detect public access to the log buckets. Do not fight the mandatory controls. A representative sample of what mandatory controls actually enforce (the catalog evolves; verify in your account):

| Mandatory control (representative) | Behavior | What it enforces |
| --- | --- | --- |
| Disallow changes to CloudTrail | Preventive | Members cannot stop/alter the org trail |
| Disallow deletion of the log archive | Preventive | The central log bucket cannot be removed |
| Detect public read access to the log archive | Detective | A log bucket that becomes publicly readable is flagged |
| Disallow changes to Config setup | Preventive | The Config recorder/role stay intact |
| Disallow changes to encryption config of log archive | Preventive | KMS on the log store cannot be weakened |
| Integrate CloudTrail with CloudWatch Logs | Preventive | Trail events keep reaching a log group for alarms |

Enable a strongly-recommended or elective control on an OU via the API:

aws controltower enable-control \
  --control-identifier "$STRONGLY_RECOMMENDED_CONTROL_ARN" \
  --target-identifier "$WORKLOADS_PROD_OU_ARN"

A starter map of high-value strongly-recommended / elective controls and where to apply them:

| Control (intent) | Category | Apply to | Rationale |
| --- | --- | --- | --- |
| Disallow internet access via IGW on EC2 (no public IP) | Strongly recommended | Workloads/Prod | Force traffic through controlled egress |
| Detect public-read/-write S3 buckets | Strongly recommended (detective) | All workload OUs | Flag the classic bucket leak |
| Detect EBS volumes not encrypted | Strongly recommended (detective) | All OUs | Flag unencrypted storage |
| Detect RDS public accessibility | Strongly recommended (detective) | Workloads/Prod | Flag internet-facing databases |
| Detect missing MFA for root | Strongly recommended (detective) | All OUs | Baseline identity hygiene |
| Disallow changes to AWS Config rules set by CT | Mandatory (already on) | All registered OUs | Prevent drift from the baseline; nothing to enable |
| Disallow cross-region networking (where unused) | Elective | NonProd | Reduce attack surface |

Reading note: a control’s behavior tells you how it acts (block now / flag later / block deploy); its category tells you whether you may turn it off. Mandatory + preventive is the strongest combination and the bedrock you never relax.

Custom SCPs: your org-specific non-negotiables

Beyond Control Tower’s catalog, layer your own custom SCPs at the OU level for org-specific rules — a region allowlist is the classic one:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyOutsideApprovedRegions",
      "Effect": "Deny",
      "NotAction": [
        "iam:*", "organizations:*", "route53:*",
        "cloudfront:*", "support:*", "sts:*"
      ],
      "Resource": "*",
      "Condition": {
        "StringNotEquals": {
          "aws:RequestedRegion": ["us-east-1", "eu-west-1"]
        }
      }
    }
  ]
}

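Callout: global services (IAM, Route 53, CloudFront, Organizations) are served from a single endpoint homed in us-east-1, so their requests carry us-east-1 as the requested region. If you region-lock with an SCP and do not exempt those actions, any allowlist that omits us-east-1 will lock you out of IAM. The NotAction list above is a minimal starting point; extend it with every global service you actually use.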
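It can help to reason about the statement above as a small function: the Deny applies only when the action is not in the NotAction list and the requested region is not in the approved list. The sketch below is a local approximation for thinking through cases, not the real policy engine. The two lists are copied from the JSON above, and wildcard matching is delegated to Python's `fnmatch` as a stand-in for IAM's matching rules.

```python
from fnmatch import fnmatchcase

# Copied from the NotAction list and the aws:RequestedRegion condition above.
EXEMPT_ACTIONS = ["iam:*", "organizations:*", "route53:*",
                  "cloudfront:*", "support:*", "sts:*"]
APPROVED_REGIONS = ["us-east-1", "eu-west-1"]


def scp_denies(action: str, region: str) -> bool:
    """True when the Deny statement above would block this request."""
    # NotAction: the statement applies to every action that is NOT listed.
    # Action names are compared case-insensitively.
    exempt = any(fnmatchcase(action.lower(), pattern.lower())
                 for pattern in EXEMPT_ACTIONS)
    if exempt:
        return False
    # StringNotEquals with several values holds only if the region matches none.
    return region not in APPROVED_REGIONS


for action, region in [("ec2:RunInstances", "ap-south-1"),
                       ("ec2:RunInstances", "eu-west-1"),
                       ("iam:CreateRole", "ap-south-1")]:
    print(action, region, "denied" if scp_denies(action, region) else "not denied")
```

"Not denied" is not the same as "allowed": the request still needs an IAM policy that allows it.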

SCPs are a boundary, not a grant — the mechanics trip up almost everyone the first time. Internalize these rules:

| SCP rule | What it means | Consequence if forgotten |
| --- | --- | --- |
| SCPs never grant permission | They only cap what IAM can allow | Expecting an SCP to “enable” an action wastes hours |
| Effective perms = IAM ∩ SCP | An action needs both to allow | A correct IAM policy still fails if SCP denies |
| Explicit Deny always wins | A Deny anywhere overrides any Allow | One stray Deny can break a whole account |
| SCPs do not apply to the management account | The payer is exempt | Test SCPs in a member account, never the root |
| SCPs do not affect service-linked roles | SLRs bypass SCPs | Some platform actions still work; do not rely on that |
| NotAction inverts the match | A Deny with NotAction hits every action except the ones listed | Forgetting global services = lockout |
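The first four rules reduce to a short decision function. This is a deliberately simplified sketch: it ignores permission boundaries, resource policies, and session policies, and the function and parameter names are hypothetical.

```python
def effective_decision(iam_allows: bool,
                       scp_allows: bool,
                       scp_explicit_deny: bool,
                       management_account: bool = False) -> str:
    """Combine an IAM result and an SCP result into one outcome."""
    if management_account:
        # SCPs are not evaluated for principals in the management account.
        scp_allows, scp_explicit_deny = True, False
    if scp_explicit_deny:
        return "explicit deny"   # a Deny overrides any Allow
    if iam_allows and scp_allows:
        return "allow"           # both layers must allow
    return "implicit deny"       # an SCP alone never grants anything


print(effective_decision(iam_allows=True, scp_allows=True, scp_explicit_deny=True))
print(effective_decision(iam_allows=False, scp_allows=True, scp_explicit_deny=False))
print(effective_decision(iam_allows=True, scp_allows=True, scp_explicit_deny=True,
                         management_account=True))
```

The last call shows why an SCP that appears to work when tested from the management account proves nothing about member accounts.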

Common custom-SCP patterns, what each blocks, and the gotcha that bites:

| SCP pattern | Blocks | Apply to | Gotcha |
| --- | --- | --- | --- |
| Region allowlist | API calls outside approved regions | Workloads OUs | Must exempt global services via NotAction |
| Deny leave-organization | organizations:LeaveOrganization | Root / all OUs | The management account is exempt automatically |
| Deny CloudTrail tampering | Stop/delete/update trail | All OUs (CT also does this) | Avoid duplicating or conflicting with CT’s control |
| Protect IAM roles (deny mutation) | Delete/modify of named platform roles | All OUs | Match exact role names/paths |
| Deny disabling default EBS encryption | ec2:DisableEbsEncryptionByDefault | All OUs | Pair with proactive control |
| Require IMDSv2 | RunInstances without IMDSv2 | Workloads | Use ec2:MetadataHttpTokens condition |
| Deny-all (quarantine) | Everything | Suspended OU | Use only for decommissioning |
| Data-perimeter (org-only access) | Principals/resources outside the org | All OUs | Combine with RCPs for resource-side perimeter |

Apply a custom SCP with the CLI or Terraform:

# Create the SCP, then attach it to an OU
SCP_ID=$(aws organizations create-policy \
  --name "deny-non-approved-regions" --type SERVICE_CONTROL_POLICY \
  --description "Deny API calls outside approved regions" \
  --content file://region-allowlist.json \
  --query 'Policy.PolicySummary.Id' --output text)

aws organizations attach-policy \
  --policy-id "$SCP_ID" --target-id "$WORKLOADS_OU_ID"

# The same policy and attachment in Terraform
resource "aws_organizations_policy" "region_allowlist" {
  name    = "deny-non-approved-regions"
  type    = "SERVICE_CONTROL_POLICY"
  content = file("${path.module}/policies/region-allowlist.json")
}

resource "aws_organizations_policy_attachment" "region_allowlist" {
  policy_id = aws_organizations_policy.region_allowlist.id
  target_id = aws_organizations_organizational_unit.workloads.id
}
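Because SCP documents have a hard size limit, a pre-flight check in CI saves a failed apply. The sketch below is a minimal offline lint: it parses the policy, re-serializes it without whitespace, and compares the length against the 5,120-character figure quoted earlier in this article. The function names are hypothetical.

```python
import json

SCP_MAX_CHARS = 5120  # SCP document size limit quoted earlier in this article


def minify_policy(text: str) -> str:
    """Parse the policy and re-serialize it without insignificant whitespace."""
    return json.dumps(json.loads(text), separators=(",", ":"))


def check_scp(text: str) -> str:
    """Return the minified policy, or raise ValueError if it is too large."""
    compact = minify_policy(text)
    if len(compact) > SCP_MAX_CHARS:
        raise ValueError(
            f"SCP is {len(compact)} characters; the limit is {SCP_MAX_CHARS}"
        )
    return compact


sample = '{ "Version": "2012-10-17", "Statement": [] }'
print(check_scp(sample))
```

Invalid JSON fails at the parse step, which catches syntax errors before the policy ever reaches Organizations.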

Step 4 — Automating account vending with AFT

Clicking “Enroll account” in the Account Factory console does not scale. Account Factory for Terraform (AFT) wraps Account Factory in a GitOps pipeline: you describe an account in a Terraform request, merge it, and AFT vends and customizes the account end to end.

AFT runs in its own dedicated account and is itself deployed with the published Terraform module. The bootstrap is a one-time apply from a management-context backend:

module "aft" {
  source  = "aws-ia/control_tower_account_factory/aws"
  version = "1.14.0"

  # Core account wiring
  ct_management_account_id    = "111111111111"
  log_archive_account_id      = "222222222222"
  audit_account_id            = "333333333333"
  aft_management_account_id   = "444444444444"
  ct_home_region              = "us-east-1"
  tf_backend_secondary_region = "us-west-2"

  # Point AFT at your four pipeline repos (CodeCommit by default,
  # or GitHub/GitLab/Bitbucket via *_vcs settings)
  account_request_repo_name             = "aft-account-request"
  global_customizations_repo_name       = "aft-global-customizations"
  account_customizations_repo_name      = "aft-account-customizations"
  account_provisioning_customizations_repo_name = "aft-account-provisioning-customizations"

  terraform_distribution = "oss"
}

AFT is driven by four repositories, each with a distinct job. Mixing up which code goes where is the most common AFT mistake — this table is the map:

| Repo | Holds | Runs when | Scope |
| --- | --- | --- | --- |
| aft-account-request | One module call per account | On merge → triggers vend | Per account (the request) |
| aft-global-customizations | Terraform for every account | After every account is provisioned | Universal baseline |
| aft-account-customizations | Named bundles (workload-prod, sandbox) | When a request selects that bundle | Per environment type |
| aft-account-provisioning-customizations | Step Functions hooks that extend provisioning | After the account is provisioned, before customizations | The provisioning pipeline itself |

Key AFT module inputs you will actually set, and what each controls:

| Input | Purpose | Typical value | Gotcha |
| --- | --- | --- | --- |
| ct_home_region | Must match Control Tower’s home region | us-east-1 | Mismatch breaks the pipeline |
| tf_backend_secondary_region | DR region for AFT’s state backend | us-west-2 | Pick a second region deliberately |
| terraform_distribution | oss / tfc / tfe | oss | TFC/TFE needs token wiring |
| vcs_provider | codecommit / github / gitlabselfmanaged | github | Needs connection/credentials |
| aft_feature_cloudtrail_data_events | Enable AFT data-event trail | false | Cost vs forensic depth |
| aft_feature_enterprise_support | Auto-enroll Enterprise Support | false | Only if you have the agreement |
| aft_metrics_reporting | Send anonymized AFT metrics to AWS | true/false | Disable in strict environments |

Once AFT is live, vending an account is a pull request to the account request repo. Each account is a module call:

module "team_payments_prod" {
  source = "./modules/aft-account-request"

  control_tower_parameters = {
    AccountEmail              = "aws+payments-prod@kloudvin.io"
    AccountName               = "payments-prod"
    ManagedOrganizationalUnit = "Prod (ou-xxxx-prod1234)"
    SSOUserEmail              = "platform@kloudvin.io"
    SSOUserFirstName          = "Platform"
    SSOUserLastName           = "Team"
  }

  account_tags = {
    "team"        = "payments"
    "environment" = "prod"
    "cost-center" = "CC-4012"
  }

  # Which customization sets run after provisioning
  account_customizations_name = "workload-prod"

  change_management_parameters = {
    change_requested_by = "vinod"
    change_reason       = "New prod account for payments service"
  }
}

The control_tower_parameters block has exact required keys; miss one and the vend fails at validation. The last three rows of the table are sibling inputs of the module, not keys inside that block:

| Parameter | Required | What it sets | Gotcha |
| --- | --- | --- | --- |
| AccountEmail | Yes | The new account’s unique root email | Must be globally unique; use plus-addressing |
| AccountName | Yes | Display name in Organizations | Cannot collide with an existing account |
| ManagedOrganizationalUnit | Yes | Which OU the account enrolls in | Wrong OU = wrong guardrails (see playbook) |
| SSOUserEmail | Yes | Initial Identity Center user | Becomes the account’s first SSO admin |
| SSOUserFirstName / SSOUserLastName | Yes | SSO user display | Cosmetic but required |
| account_tags | No (recommended) | Cost/ownership tags on the account | Drive showback; set cost-center |
| account_customizations_name | No | Which customization bundle to run | Must match a folder in the repo |
| change_management_parameters | No | Audit metadata for the change | Good practice for the trail |
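Most of these failures can be caught in the pull request instead of half an hour into a vend. The sketch below is a hypothetical pre-merge check, not part of AFT. It verifies the six required keys from the table, applies a loose email pattern, rejects an email that is already in use, and expects the `Name (ou-id)` form used in the example request. The regular expressions are illustrative assumptions rather than AFT's own validation.

```python
import re
from typing import Dict, List, Set

REQUIRED_KEYS = {
    "AccountEmail", "AccountName", "ManagedOrganizationalUnit",
    "SSOUserEmail", "SSOUserFirstName", "SSOUserLastName",
}
# Assumed shapes: 'Name (ou-xxxx-xxxxxxxx)' and a loose 'local@domain.tld'.
OU_FORMAT = re.compile(r"^.+ \(ou-[a-z0-9]{4,32}-[a-z0-9]{8,32}\)$")
EMAIL_FORMAT = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def validate_request(params: Dict[str, str], existing_emails: Set[str]) -> List[str]:
    """Return a list of problems; an empty list means the request looks sane."""
    errors = [f"missing key: {key}" for key in sorted(REQUIRED_KEYS - set(params))]
    email = params.get("AccountEmail", "")
    if email and not EMAIL_FORMAT.match(email):
        errors.append("AccountEmail is not a valid address")
    if email and email.lower() in existing_emails:
        errors.append("AccountEmail already used by another account")
    ou = params.get("ManagedOrganizationalUnit", "")
    if ou and not OU_FORMAT.match(ou):
        errors.append("ManagedOrganizationalUnit is not in 'Name (ou-id)' form")
    return errors


request = {
    "AccountEmail": "aws+payments-prod@kloudvin.io",
    "AccountName": "payments-prod",
    "ManagedOrganizationalUnit": "Prod (ou-xxxx-prod1234)",
    "SSOUserEmail": "platform@kloudvin.io",
    "SSOUserFirstName": "Platform",
    "SSOUserLastName": "Team",
}
print(validate_request(request, existing_emails=set()))
```

In practice `existing_emails` would be loaded from the organization's account list, lowercased, before the check runs.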

Merge to the main branch, and AFT’s pipeline calls Account Factory to create and enroll the account, then runs your customization layers against it — no console, full audit trail, fully reproducible. The end-to-end vend, stage by stage and roughly how long each takes:

| Stage | What happens | Typical duration | Fails if… |
| --- | --- | --- | --- |
| 1. PR merged | Account-request pipeline triggers | seconds | Branch protection blocks merge |
| 2. Account Factory provision | Service Catalog creates + enrolls account | ~25–35 min | Email collides; OU not registered |
| 3. Baseline applied | Mandatory controls + Config recorder land | minutes | OU not registered with CT |
| 4. Provisioning customizations | Step Functions hooks run before customizations | minutes | Hook code errors |
| 5. Global customizations | Universal baseline Terraform applies | minutes | Module/version error |
| 6. Account customizations | The selected bundle applies | minutes | Bundle name mismatch; IAM/KMS deny |
| 7. Done | Account governed + customized | n/a | n/a |

Step 5 — Customizations: baking VPCs, IAM roles, and CloudTrail into every account

AFT applies customizations in two tiers, and understanding the order is the key to a clean baseline:

  1. Global customizations — Terraform that runs on every account AFT touches. Put your universal baseline here: a standard VPC pattern, break-glass IAM roles, an account-level GuardDuty/Config posture, default budgets, mandatory tags.
  2. Account customizations — named bundles (e.g. workload-prod, sandbox) selected per request. Put environment-specific bits here: a larger VPC CIDR for prod, stricter password policy, prod-only backup vaults.

What belongs in each tier — the decision is “does every account need it, or only this type?”:

| Resource | Global or account-specific | Why |
| --- | --- | --- |
| Standard VPC (3-AZ, private/public) | Global (size varies per account) | Every account needs a network |
| Break-glass IAM role | Global | Emergency access everywhere |
| Default tags enforcement | Global | Org-wide cost/ownership hygiene |
| Account-level budget + alarm | Global | Catch runaway spend in any account |
| GuardDuty/Config member enable | Global | Detection in every account |
| Larger prod VPC CIDR | Account (workload-prod) | Only prod needs the headroom |
| Strict password policy | Account (workload-prod) | Prod-grade identity controls |
| Backup vault + Vault Lock | Account (workload-prod) | Compliance backups for prod only |
| Sandbox auto-nuke schedule | Account (sandbox) | Cost control for throwaway accounts |
| Per-team SSO permission set assignment | Account (per bundle) | Team-specific access |

A global customization that lays down a standard network and a cross-account automation role looks like ordinary Terraform — AFT just runs it in the target account:

# aft-global-customizations/terraform/network.tf
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "5.13.0"

  name = "core"
  cidr = var.vpc_cidr

  azs             = ["${var.region}a", "${var.region}b", "${var.region}c"]
  private_subnets = var.private_subnets
  public_subnets  = var.public_subnets

  enable_nat_gateway   = true
  single_nat_gateway   = var.environment != "prod"
  enable_dns_hostnames = true
}

# A standard role the platform pipeline assumes into this account
resource "aws_iam_role" "platform_automation" {
  name = "PlatformAutomation"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { AWS = "arn:aws:iam::${var.aft_mgmt_account_id}:root" }
      Action    = "sts:AssumeRole"
      Condition = {
        StringEquals = { "sts:ExternalId" = var.automation_external_id }
      }
    }]
  })
}
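
The module above expects var.private_subnets and var.public_subnets to be carved out of var.vpc_cidr. As a rough illustration of that arithmetic (not something AFT runs), the Python sketch below splits a VPC CIDR into three private and three public subnets. The helper name carve_subnets, the /20 subnet size, and the example CIDR are assumptions made for this example; inside Terraform you would more likely reach for the built-in cidrsubnet() function.

```python
import ipaddress

def carve_subnets(vpc_cidr: str, new_prefix: int = 20, az_count: int = 3):
    """Split a VPC CIDR into private and public subnets, one of each per AZ.

    Hypothetical helper for illustration; the sizes and ordering are assumptions.
    """
    blocks = list(ipaddress.ip_network(vpc_cidr).subnets(new_prefix=new_prefix))
    if len(blocks) < 2 * az_count:
        raise ValueError("VPC CIDR is too small for the requested layout")
    private = [str(b) for b in blocks[:az_count]]
    public = [str(b) for b in blocks[az_count:2 * az_count]]
    return private, public

# Example input: a made-up /16 for a prod account
private_subnets, public_subnets = carve_subnets("10.20.0.0/16")
print(private_subnets)
print(public_subnets)
```

The remaining /20 blocks stay unallocated, which leaves headroom for later subnet tiers without re-addressing the VPC.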

Note: you generally do not create a per-account CloudTrail trail in customizations. The organization trail (Step 6) already captures every account centrally; a redundant per-account trail just duplicates data and cost. Reserve account-level trails for narrow cases like a data-events trail scoped to one sensitive account.

A practical baseline budget for every account — fail loud before the bill does:

resource "aws_budgets_budget" "account_monthly" {
  name         = "account-monthly-baseline"
  budget_type  = "COST"
  limit_amount = var.environment == "prod" ? "5000" : "500"
  limit_unit   = "USD"
  time_unit    = "MONTHLY"

  notification {
    comparison_operator        = "GREATER_THAN"
    threshold                  = 80
    threshold_type             = "PERCENTAGE"
    notification_type          = "ACTUAL"
    subscriber_email_addresses = [var.finops_alert_email]
  }
}

Step 6 — Centralized logging: the CloudTrail org trail to Log Archive

Control Tower provisions an organization CloudTrail trail that captures management events across all accounts and delivers them to a hardened S3 bucket in the Log Archive account. The org trail is created in the management account but applies org-wide, so a new account is covered the moment it is enrolled — no per-account wiring.

The properties that make this trustworthy are the hardening layers on the Log Archive bucket. Each layer, and what it defends against:

| Layer | What it does | Defends against |
|---|---|---|
| Cross-account location | Bucket in Log Archive, not the workload account | A compromised account deleting its own logs |
| Mandatory SCP deny | Members cannot alter trail/bucket/roles | Insider/attacker disabling the trail |
| S3 Object Lock (WORM) | Write-once-read-many, retention enforced | Deletion/overwrite even by an admin |
| Log file validation | SHA-256 digest files per delivery | Silent tampering with delivered logs |
| Bucket KMS encryption | Logs encrypted at rest with a CMK | Reading logs without the key |
| Lifecycle to Glacier | Cheap long-term retention | Cost blowup on years of logs |
| Block Public Access | Account + bucket level | Accidental public exposure |

For long-term integrity, harden the bucket with S3 Object Lock (write-once-read-many) and a lifecycle policy. Verify CloudTrail integrity from a role in the audit account:

# Validate that delivered log files haven't been tampered with
aws cloudtrail validate-logs \
  --trail-arn "arn:aws:cloudtrail:us-east-1:111111111111:trail/aws-controltower-BaselineCloudTrail" \
  --start-time "2026-05-01T00:00:00Z"
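
Conceptually, validate-logs recomputes each delivered log file's hash and compares it with the value recorded in the digest file. The Python sketch below shows only that comparison step, under simplifying assumptions: the digest dictionary is a hypothetical, trimmed-down stand-in for a real digest file, the object key is made up, and the real command additionally verifies the digest's signature and the chain back to earlier digests.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 of a byte string."""
    return hashlib.sha256(data).hexdigest()

def find_tampered(digest: dict, delivered: dict) -> list:
    """Return object keys whose current content is missing or no longer matches the digest."""
    tampered = []
    for entry in digest["logFiles"]:
        key = entry["s3Object"]
        content = delivered.get(key)
        if content is None or sha256_hex(content) != entry["hashValue"]:
            tampered.append(key)
    return tampered

# Hypothetical log file and the digest entry recorded at delivery time
original = b'{"Records": [{"eventName": "StopLogging"}]}'
digest = {"logFiles": [{"s3Object": "logs/file-1.json.gz",
                        "hashValue": sha256_hex(original)}]}

untouched = find_tampered(digest, {"logs/file-1.json.gz": original})
modified = find_tampered(digest, {"logs/file-1.json.gz": original + b" "})
print(untouched)  # []
print(modified)   # ['logs/file-1.json.gz']
```

Even a single appended byte changes the hash, which is why silent edits to delivered logs show up on validation.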

If you need queryable history, point Athena or CloudTrail Lake at the archive bucket rather than standing up logging stacks in every account. Where each log type lands and how to query it:

| Log source | Where it lands | Query path | Retention strategy |
|---|---|---|---|
| Org CloudTrail (management events) | Log Archive S3 bucket | Athena / CloudTrail Lake | Object Lock + Glacier lifecycle |
| CloudTrail data events (opt-in) | S3 (scoped trail) | Athena | Expensive; scope to sensitive buckets only |
| AWS Config snapshots/history | Log Archive S3 bucket | Config aggregator (Audit) | Long-term in S3 |
| VPC Flow Logs | Per-account S3 or central | Athena | Often centralized to Log Archive |
| GuardDuty findings | Audit (delegated admin) | Security Hub console | Findings retained ~90 days |
| Access logs (S3/ALB) | App account or central | Athena | Per-workload decision |

A compact Athena query for the auditor’s first question, “who tried to stop logging or touch a bucket in the last day?”:

SELECT eventtime, useridentity.arn AS who, eventname, awsregion, sourceipaddress
FROM cloudtrail_logs
WHERE eventtime > to_iso8601(current_timestamp - interval '1' day)
  AND eventname IN ('DeleteTrail','StopLogging','PutBucketPolicy','DeleteBucket')
ORDER BY eventtime DESC
LIMIT 100;

Architecture at a glance

Read the diagram left to right as a build-and-govern path. On the far left sits the management plane: the Organizations root in ALL-features mode (consolidated billing, SCP support), Control Tower pinned to a home region and tracking a landing-zone version, and IAM Identity Center issuing permission sets so no human ever needs an IAM user. From here, policy flows down into the second zone — the OU policy tree — where preventive SCPs (the region allowlist, the leave-org deny) and detective Config rules attach to OUs (Workloads/Prod, NonProd, Sandbox, Suspended) and inherit to every account beneath. The third zone is the Security OU: the Log Archive account holding the immutable org-trail bucket (Object Lock WORM), the org CloudTrail trail with log validation, and the Audit account running GuardDuty and Security Hub as delegated administrator.

The right half is how accounts are born and where workloads live. The fourth zone is account vending: Account Factory (the Service Catalog product) provisions and enrolls an account, the AFT pipeline drives it from a pull request through CodeBuild across four git repos, and customizations bake a standard VPC, break-glass roles, budgets, and tags into the result. The vended member accounts in the fifth zone are where workloads actually run — each account a blast-radius boundary — with shared infrastructure (Transit Gateway, DNS) reaching them via RAM shares. Crucially, logs flow back from every member account into the Log Archive, closing the loop. The five numbered badges mark the real failure points: a wrong home region or stale landing-zone version, a region-lock SCP that forgot the global-service exemptions, an audit/log-archive tampering or delegation gap, an AFT pipeline failing on KMS after a baseline change, and a workload placed in the wrong OU or the management account.

Figure: multi-account AWS landing zone architecture. A management plane (Organizations, Control Tower, IAM Identity Center) attaches preventive SCPs and detective Config rules to an OU policy tree; a Security OU holds the Log Archive (Object Lock org-trail bucket), the org CloudTrail, and a delegated-admin Audit account; an AFT account-vending pipeline (Account Factory, CodeBuild across four repos, global plus per-account customizations) produces member accounts where workloads land, with logs flowing back to Log Archive. Five numbered failure-point badges mark home-region/version, region-lock SCP, audit tampering, AFT KMS drift, and wrong-OU placement.

Real-world scenario

A fintech platform team — call it NorthLedger — runs ~140 accounts under AFT for a payments product subject to PCI-DSS and SOC 2. Their OU tree was textbook: Security (Log Archive, Audit), Infrastructure (Network, Shared-Services), Workloads/Prod and Workloads/NonProd, Sandbox, and Suspended. Account vending was a pull request; a new squad got a governed prod and non-prod account within an afternoon, each with a baselined VPC, break-glass role, budget, and the org trail already capturing every API call into the immutable Log Archive bucket.

The wall they hit came during a routine landing-zone upgrade. A new baseline version shipped a stricter mandatory control on the org CloudTrail’s KMS key policy. After the platform lead clicked Update landing zone, every subsequent AFT account-customization run started failing at terraform apply with AccessDenied on kms:GenerateDataKey — but only in accounts vended before the upgrade. Brand-new accounts were fine. The on-call engineer’s first instinct was to edit the KMS key policy by hand to add the CodeBuild execution role back. That “fix” worked for ten minutes and then Control Tower’s drift detection flagged the key as non-compliant and the platform reconciled it back, breaking the pipeline again — now with an added drift alarm.

The actual root cause: an in-place landing-zone update updates the managed baseline definition, but it does not re-apply that baseline to already-enrolled OUs. The pre-existing accounts were sitting in a drifted state against the new control set, and their CodeBuild execution role could no longer write encrypted logs because the baseline that would have re-granted it had never been pushed down to their OU. New accounts vended after the upgrade got the new baseline at enrollment, which is why they worked.

The correct fix was not to touch the KMS policy at all. They re-registered each affected OU to push the new baseline down, then let AFT reconcile:

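# Re-apply the current baseline to a drifted OU so enrolled accounts converge.
# The OU is already registered, so update its existing enabled baseline
# (enable-baseline is only for an OU's first registration).
ENABLED_BASELINE_ARN=$(aws controltower list-enabled-baselines \
  --query "enabledBaselines[?targetIdentifier=='$WORKLOADS_PROD_OU_ARN'].arn | [0]" \
  --output text)

OP_ID=$(aws controltower update-enabled-baseline \
  --enabled-baseline-identifier "$ENABLED_BASELINE_ARN" \
  --baseline-version "4.0" \
  --parameters '[{"key":"IdentityCenterEnabledBaselineArn","value":"'"$IC_BASELINE_ARN"'"}]' \
  --query 'operationIdentifier' --output text)

# Then poll the operation until the OU baseline op reaches SUCCEEDED
aws controltower get-baseline-operation --operation-identifier "$OP_ID" \
  --query 'baselineOperation.{Op:operationType,Status:status}'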

Within twenty minutes of the OU baselines reaching SUCCEEDED, the CodeBuild roles regained kms:GenerateDataKey through the managed policy, the drift alarms cleared, and the stalled customization pipelines drained. The lasting lesson NorthLedger wrote into their runbook: treat update landing zone and re-register every registered OU as one atomic change-management step, gated behind approval and a maintenance window. Upgrading the baseline without re-registering OUs leaves your existing fleet quietly non-compliant until something downstream — usually a pipeline — trips over it at the worst possible time. They also added a synthetic canary: a throwaway account in a PolicyStaging OU that runs the full vend+customize flow nightly, so a baseline regression surfaces in CI, not in a payments incident.

Advantages and disadvantages

The honest trade-off of adopting Control Tower + AFT versus rolling your own or staying single-account:

| Advantages | Disadvantages |
|---|---|
| Account #100 is the same governed PR as #1 | The home region is hard to change later |
| Mandatory controls give an audit-ready baseline free | You live within mandatory controls; some workflows must adapt |
| Separation of duties (mgmt/log/audit) out of the box | More accounts = more operational surface (limits, billing, IAM) |
| Centralized immutable logging with no per-account wiring | Control Tower abstracts AWS primitives; hand-edits cause drift |
| AFT makes vending reproducible and fully audited | AFT has real setup complexity (its own account, 4 repos, a pipeline) |
| Drift detection surfaces divergence automatically | Upgrades are a two-step (update LZ + re-register OUs) people forget |
| SCPs give a hard preventive boundary above IAM | SCP mistakes (region lock without exemptions) can lock you out |
| Delegated admin keeps the payer account clean | Some newer services lag Control Tower governance support |

When each side matters: the advantages dominate the moment you are past a handful of accounts or face a compliance regime — the cost of not having a consistent baseline (a failed audit, a snowflake fleet, a breach that crosses accounts) dwarfs the operational overhead. The disadvantages matter most for tiny estates (a two-account startup may not need AFT yet) and for teams unwilling to invest in the upgrade/drift discipline — for them, the abstraction becomes a thing they fight rather than a thing that protects them. Build the muscle: never hand-edit managed resources, always re-register OUs on upgrade, and the disadvantages shrink to footnotes.

Hands-on lab

This lab builds the governance primitives you can practice safely without a full Control Tower enablement (which provisions billable accounts). You will create an OU, author and attach a region-allowlist SCP, prove it blocks a forbidden action, and verify the org posture an auditor checks. Run it in a sandbox organization you can afford to experiment in, from the management account.

Prerequisites: an AWS Organization in ALL features mode, the SCP policy type enabled at the root, and aws CLI v2 configured for the management account.

# 0. Confirm ALL features mode and that SCPs are enabled
aws organizations describe-organization --query 'Organization.FeatureSet'   # -> "ALL"
aws organizations list-roots --query 'Roots[0].PolicyTypes'                 # SERVICE_CONTROL_POLICY -> ENABLED
ROOT_ID=$(aws organizations list-roots --query 'Roots[0].Id' --output text)

# Only if SCPs are not yet enabled (the call errors if the type is already enabled):
aws organizations enable-policy-type --root-id "$ROOT_ID" \
  --policy-type SERVICE_CONTROL_POLICY

# 1. Create a Workloads OU and a NonProd child under it
WORKLOADS_OU=$(aws organizations create-organizational-unit \
  --parent-id "$ROOT_ID" --name "Lab-Workloads" \
  --query 'OrganizationalUnit.Id' --output text)

NONPROD_OU=$(aws organizations create-organizational-unit \
  --parent-id "$WORKLOADS_OU" --name "Lab-NonProd" \
  --query 'OrganizationalUnit.Id' --output text)
echo "Workloads=$WORKLOADS_OU  NonProd=$NONPROD_OU"

# 2. Write the region-allowlist SCP (note the global-service NotAction exemptions)
cat > /tmp/region-allowlist.json <<'JSON'
{ "Version": "2012-10-17",
  "Statement": [{
    "Sid": "DenyOutsideApprovedRegions", "Effect": "Deny",
    "NotAction": ["iam:*","organizations:*","route53:*","cloudfront:*","support:*","sts:*"],
    "Resource": "*",
    "Condition": { "StringNotEquals": { "aws:RequestedRegion": ["us-east-1"] } }
  }]
}
JSON

SCP_ID=$(aws organizations create-policy \
  --name "lab-deny-non-approved-regions" --type SERVICE_CONTROL_POLICY \
  --content file:///tmp/region-allowlist.json \
  --query 'Policy.PolicySummary.Id' --output text)

aws organizations attach-policy --policy-id "$SCP_ID" --target-id "$NONPROD_OU"
echo "Attached SCP $SCP_ID to $NONPROD_OU"

# 3. Prove it. Move/assume into a MEMBER account in the NonProd OU, then:
#    A call in an approved region succeeds:
aws ec2 describe-vpcs --region us-east-1 --query 'Vpcs[].VpcId' --output table
#    The SAME call in a denied region returns AccessDenied regardless of IAM:
aws ec2 describe-vpcs --region eu-west-1
# Expected: An error occurred (UnauthorizedOperation/AccessDenied) ... explicit deny

# 4. Audit the posture the way a reviewer would
aws organizations list-policies-for-target --target-id "$NONPROD_OU" \
  --filter SERVICE_CONTROL_POLICY --query 'Policies[].Name' --output table
aws organizations list-parents --child-id "$NONPROD_OU" --query 'Parents[].Id' --output table

# 5. Teardown: detach and delete the lab SCP and OUs (order matters)
aws organizations detach-policy --policy-id "$SCP_ID" --target-id "$NONPROD_OU"
aws organizations delete-policy --policy-id "$SCP_ID"
aws organizations delete-organizational-unit --organizational-unit-id "$NONPROD_OU"
aws organizations delete-organizational-unit --organizational-unit-id "$WORKLOADS_OU"

Expected outcomes and what each proves:

| Step | Expected result | What it proves |
|---|---|---|
| 0 | ALL and ENABLED | Org is ready for SCP governance |
| 1 | Two OU IDs printed | You can build the policy tree |
| 2 | An SCP ID, attached | Custom preventive control in place |
| 3a | us-east-1 call succeeds | The allowlist permits approved regions |
| 3b | eu-west-1 call denied | The SCP blocks regardless of IAM (boundary works) |
| 4 | SCP listed on the OU | The control is where you think it is |
| 5 | Clean delete | You can decommission governance safely |

Lab note: an OU cannot be deleted while it still contains accounts or child OUs, and an SCP cannot be deleted while still attached — hence the teardown order. If a delete fails with “not empty,” move the accounts out first.
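
To see why the NotAction list matters, the following Python sketch mimics how the lab's single Deny statement is evaluated for a request. It is a deliberately small model, not AWS's policy engine: scp_denies is a hypothetical helper, it only understands the NotAction and aws:RequestedRegion pieces used in the lab, and it assumes global services such as IAM report us-east-1 as the requested region.

```python
from fnmatch import fnmatchcase

# The lab's Deny statement, reduced to the fields this sketch evaluates
STATEMENT = {
    "Effect": "Deny",
    "NotAction": ["iam:*", "organizations:*", "route53:*",
                  "cloudfront:*", "support:*", "sts:*"],
    "Condition": {"StringNotEquals": {"aws:RequestedRegion": ["us-east-1"]}},
}

def scp_denies(statement: dict, action: str, region: str) -> bool:
    """True when the Deny statement applies to this action in this region."""
    exempt = any(fnmatchcase(action.lower(), pattern.lower())
                 for pattern in statement.get("NotAction", []))
    if exempt:
        return False  # NotAction: the Deny never applies to these services
    approved = statement["Condition"]["StringNotEquals"]["aws:RequestedRegion"]
    return region not in approved

print(scp_denies(STATEMENT, "ec2:DescribeVpcs", "us-east-1"))      # False
print(scp_denies(STATEMENT, "ec2:DescribeVpcs", "eu-west-1"))      # True
print(scp_denies(STATEMENT, "sts:GetCallerIdentity", "eu-west-1")) # False

# A hypothetical allowlist that omits us-east-1 and has no exemptions:
NO_EXEMPTIONS = {
    "Effect": "Deny",
    "Condition": {"StringNotEquals": {"aws:RequestedRegion": ["eu-west-1"]}},
}
print(scp_denies(NO_EXEMPTIONS, "iam:ListUsers", "us-east-1"))     # True
```

The last call is the lock-out case: with no exemptions and an allowlist that leaves out us-east-1, even a read-only IAM call is denied.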

Common mistakes & troubleshooting

The landing zone fights you in predictable ways. This is the symptom → root cause → confirm → fix playbook; scan for your symptom, confirm with the exact command, then apply the real fix (not the band-aid).

| # | Symptom | Root cause | Confirm (exact command / path) | Fix |
|---|---|---|---|---|
| 1 | AccessDenied on IAM after a region SCP | Region allowlist with no global-service exemption | Assume into the account; aws iam list-users → explicit deny | Add iam/route53/cloudfront/organizations/support/sts to NotAction |
| 2 | New account ignores its guardrails | OU was never registered with Control Tower | aws controltower list-enabled-baselines lacks the OU | enable-baseline on that OU; re-vend/enroll |
| 3 | AFT customization fails kms:GenerateDataKey | Baseline upgraded but OU not re-registered (drift) | list-enabled-baselines shows the OU on an older baseline version; CodeBuild log shows KMS deny | update-enabled-baseline (current version) on the OU; never hand-edit the key |
| 4 | Account vend fails at validation | Duplicate AccountEmail or AccountName | AFT Step Functions execution error in the AFT account | Use a unique plus-addressed email; unique name |
| 5 | Account landed in the wrong OU | ManagedOrganizationalUnit wrong in the request | aws organizations list-parents --child-id <acct> | Correct the request OU string; move + re-baseline |
| 6 | Control Tower shows an account “Not enrolled / drifted” | Manual change out-of-band (control disabled, account moved) | Control Tower dashboard → account status | Re-register OU or re-enroll the account; stop hand-editing |
| 7 | Logs missing for a new account | OU not registered, or trail config drifted | describe-trails --query "...IsOrganizationTrail" | Re-register OU; verify the org trail is multi-region + logging |
| 8 | Can delete the audit trail from a workload account | Mandatory control relaxed or trail not org-level | Try aws cloudtrail stop-logging from a member account | Restore mandatory controls; never relax trail protection |
| 9 | SCP “works” in console simulator but not live | Testing against the management account (exempt) | Run the action from a member account, not root | Always test SCPs in a member account |
| 10 | Landing-zone upgrade left fleet non-compliant | Updated LZ but skipped re-registering OUs | get-landing-zone version vs OU baseline versions | Re-register every registered OU as part of the upgrade |
| 11 | Cannot delete an account immediately | Organizations has no instant delete | close-account → ~90-day suspended state | Plan decommission as a multi-step runbook (below) |
| 12 | Runaway spend in a new account | No baseline budget in global customizations | Cost Explorer by cost-center tag | Add an account budget + alarm to global customizations |
| 13 | Quarantined account still reachable | Deny-all SCP not attached / wrong OU | list-policies-for-target on the account | Move to Suspended OU and attach the deny-all SCP |
| 14 | Hit “max accounts” on a big vend wave | Default account quota too low | Service Quotas console for Organizations | Request an increase early, before the wave |

The error/limit codes you will actually see across this surface, decoded:

| Code / message | Where it appears | Meaning | First fix |
|---|---|---|---|
| AccessDenied (explicit deny) | Any API in a member account | An SCP denies it (or IAM does not allow) | Check SCPs on the OU path and IAM |
| kms:GenerateDataKey denied | AFT CodeBuild / customization | Drifted baseline removed the KMS grant | Re-register the OU baseline |
| AccountEmailAlreadyExists | Account vend | Email not unique across all AWS | New plus-addressed root email |
| ConcurrentModificationException | Control Tower op | Another CT operation in progress | Wait; CT serializes landing-zone ops |
| DRIFTED (account/control status) | Control Tower dashboard | Account diverged from baseline | Re-register OU or re-enroll account |
| ServiceQuotaExceeded (accounts) | Org / vend | Org account limit reached | Raise quota via Service Quotas |
| OU is not empty | delete-organizational-unit | Accounts/child OUs still inside | Move children out first |
| Policy is attached | delete-policy | SCP still attached somewhere | Detach from all targets first |
| LandingZoneInProgress | update-landing-zone | An LZ op already running | Wait for the current op to finish |

Decision table: which fix actually fits

When the dashboard is red, this table points you at the right corrective action instead of guessing:

| If you see… | It is probably… | Do this |
|---|---|---|
| One account drifted, others fine | A manual out-of-band change in that account | Re-enroll that single account |
| Whole OU’s accounts drifted after an upgrade | Baseline not re-applied post-upgrade | Re-register the OU baseline |
| AccessDenied only outside one region | A region-lock SCP | Verify global-service exemptions |
| Pipeline KMS denies post-upgrade | Drift on the log-encryption baseline | Re-register OU; do not edit the key |
| Account in the wrong place | Wrong ManagedOrganizationalUnit | Move account + re-baseline |
| Spend alarm but no budget | Missing global-customization budget | Add budget to global customizations |
| Can tamper with logs from a member | Relaxed/missing mandatory control | Restore the mandatory control set |

Best practices

Security notes

Security in a landing zone is layered, and each layer has a least-privilege story:

| Control area | Least-privilege / hardening practice | Mechanism |
|---|---|---|
| Human access | No IAM users; short-lived sessions only | IAM Identity Center permission sets |
| Management account | Root behind hardware MFA; no root keys | FIDO2 key; alarm on root usage |
| Preventive boundary | Deny dangerous actions above IAM | SCPs at OU/root level |
| Resource perimeter | Allow only org principals to touch resources | RCPs (Resource Control Policies) + data-perimeter SCPs |
| Cross-account roles | External ID + confused-deputy protection | sts:ExternalId condition on assume-role |
| Log integrity | Tamper-proof, cross-account audit trail | Log Archive + Object Lock + validation |
| Detection | Org-wide threat + config monitoring | GuardDuty + Security Hub (delegated to Audit) |
| Encryption | Logs and data encrypted with managed keys | KMS CMKs; mandatory control protects the log key |
| Network egress | Controlled, inspected outbound | Central egress VPC via Transit Gateway |

Two deeper points are worth their own paragraphs. First, preventive beats detective for the things that must never happen. A Config rule that detects a public bucket fires after the data is already exposed; an SCP (or a proactive hook) that blocks it never lets the window open. Reserve detective controls for posture you want to measure, and reach for preventive/proactive controls for posture you must guarantee.

Second, the data perimeter is the modern frontier. SCPs cap what your principals can do; Resource Control Policies (RCPs) cap who can touch your resources from outside the org. Together with aws:PrincipalOrgID conditions they close the “confused deputy” and cross-org-access gaps that plain IAM leaves open. See “Resource Control Policies and the data perimeter” and “IAM least privilege and permission boundaries” for the deep mechanics, and “KMS encryption: keys, policies, envelope, rotation” for the key policies that protect the log store.

Cost & sizing

The landing zone itself is cheap; what drives the bill is the plumbing it standardizes — logging, detection, and centralized networking — multiplied across accounts. Control Tower has no per-account license fee; you pay for the AWS resources it provisions and the per-account baselines.

| Cost driver | What it is | Rough cost | How to control |
|---|---|---|---|
| Control Tower service | The orchestration itself | No direct charge | n/a |
| AWS Config (per account) | Configuration items recorded + rules evaluated | ~$0.003/config item + rule eval | Scope recording; avoid recording chatty global resources everywhere |
| Org CloudTrail (mgmt events) | First copy of management events | Free (1st mgmt trail) | Do not duplicate per-account trails |
| CloudTrail data events | S3/Lambda object-level events | ~$0.10 per 100k events | Scope to sensitive resources only |
| S3 Log Archive storage | Years of logs | ~$0.023/GB → Glacier ~$0.004/GB | Lifecycle to Glacier; Object Lock retention sized to compliance |
| GuardDuty | Threat detection per account | Per-GB analyzed (CloudTrail/DNS/flow) | Org-wide but watch flow-log volume |
| Security Hub | Findings + checks per account | Per check + finding ingestion | Disable unneeded standards |
| KMS | CMKs for log/data encryption | ~$1/key/month + API calls | Reuse keys where policy allows |
| NAT / central egress | Shared outbound data processing | ~$0.045/GB + hourly | Gateway endpoints for S3/DynamoDB |
| AFT pipeline | CodeBuild minutes, small backend | Pennies per vend | Negligible; runs on merge |

Rough INR framing for an Indian team: the fixed landing-zone overhead (a couple of KMS keys, the AFT backend, the base Config recording in the foundational accounts) lands in the low ₹2,000–6,000/month range before workloads. The variable cost scales with log and flow-log volume and GuardDuty/Security Hub per-account charges — for a 100-account estate, central logging + detection commonly runs ₹40,000–1,50,000/month depending on flow-log retention and data-event scope. The single biggest lever is VPC Flow Log and CloudTrail data-event volume: centralize and lifecycle aggressively, and scope data events to genuinely sensitive buckets rather than turning them on everywhere.
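
As a worked example of the storage lever, the Python sketch below estimates the monthly Log Archive storage bill from the rough per-GB rates in the cost table. The rates are the approximations quoted there, and the 20 TB volume and 10% hot fraction are made-up inputs for illustration; check current pricing for your region before relying on any figure.

```python
S3_STANDARD_PER_GB = 0.023  # USD per GB-month, rough rate from the cost table
GLACIER_PER_GB = 0.004      # USD per GB-month, rough rate from the cost table

def monthly_storage_cost(total_gb: float, hot_fraction: float) -> float:
    """Estimate monthly storage cost when only hot_fraction stays in S3 Standard."""
    if not 0.0 <= hot_fraction <= 1.0:
        raise ValueError("hot_fraction must be between 0 and 1")
    hot_gb = total_gb * hot_fraction
    cold_gb = total_gb - hot_gb
    return round(hot_gb * S3_STANDARD_PER_GB + cold_gb * GLACIER_PER_GB, 2)

total_gb = 20_000  # hypothetical 20 TB of accumulated logs
all_standard = monthly_storage_cost(total_gb, hot_fraction=1.0)
mostly_glacier = monthly_storage_cost(total_gb, hot_fraction=0.1)
print(all_standard)    # 460.0
print(mostly_glacier)  # 118.0
```

In this toy case, lifecycling 90% of the data to Glacier cuts the estimate from about $460 to about $118 a month, before request and retrieval charges.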

Sizing guidance — match the spend control to the estate:

| Estate size | Posture | What to enable | What to defer |
|---|---|---|---|
| 1–5 accounts | Single team / startup | Control Tower, org trail, basic Config | Full AFT (manual vend is fine) |
| 5–30 accounts | Growing platform | AFT, global customizations, GuardDuty | Heavy data-event trails |
| 30–150 accounts | Enterprise | Delegated admin, central egress, RCPs | Per-account bespoke trails |
| 150+ accounts | Regulated estate | Nightly vend canary, aggregators, Vault Lock backups | Anything that still depends on manual console clicks |

Free-tier and quota notes: AWS Config and GuardDuty offer limited free trials/tiers per account, and the first organization management-events CloudTrail trail is free — design around these so you are not paying for redundant copies. Request account-count quota increases well before a large vend wave, not during it.

Interview & exam questions

Q1. Why is the AWS account, not IAM or tags, the recommended isolation boundary in a landing zone? The account is where IAM, SCPs, networking, and billing all stop — it is the one boundary AWS cannot leak across by accident. IAM is additive and tag-based isolation is one missing Condition from collapse, whereas a compromised or runaway account cannot reach another account’s resources, limits, or data. Maps to SAP-C02 (multi-account strategy) and the Security specialty.

Q2. Distinguish preventive, detective, and proactive controls in Control Tower. Preventive controls are SCPs that block a non-compliant API call at call time; detective controls are Config rules that flag non-compliance after the fact; proactive controls are CloudFormation hooks that block a non-compliant resource before it is provisioned. “Block the call / flag the state / block the deploy.” Relevant to SAP-C02 and SCS-C02.

Q3. A region-lock SCP locked the team out of IAM. What went wrong and how do you fix it? Global services (IAM, Route 53, CloudFront, Organizations, and the global STS endpoint) are served from us-east-1, so a region Deny whose allowlist omits us-east-1 catches them. The fix is a NotAction exemption listing those services so the Deny applies only to regional services. Always test SCPs in a member account because the management account is exempt.

Q4. What are the three foundational accounts and why are they separated? Management (Organizations, billing, SCPs — no workloads), Log Archive (immutable central log store), and Audit (cross-account security tooling). The separation enforces separation of duties: whoever can change the org cannot read every log, and neither can tamper with the log store — the property auditors require.

Q5. After a landing-zone upgrade, old accounts started failing on KMS while new ones worked. Why? An in-place landing-zone update changes the managed baseline definition but does not re-apply it to already-enrolled OUs, so pre-existing accounts drift against the new control set. New accounts get the new baseline at enrollment. The fix is to re-register every registered OU (update or reset its enabled baseline) as part of the upgrade.

Q6. Why deploy AFT instead of clicking “Enroll account”? AFT turns account vending into a reviewed, reproducible GitOps pipeline: an account is a Terraform request that, on merge, vends, enrolls, and customizes the account with a full audit trail. Console enrollment does not scale, leaves no review trail, and produces snowflake accounts.

Q7. What is the difference between AFT global and account customizations? Global customizations run on every account AFT touches (the universal baseline: standard VPC, break-glass role, budgets, tags, GuardDuty/Config). Account customizations are named bundles selected per request (workload-prod, sandbox) for environment-specific resources. Global runs first, then the selected account bundle.

Q8. How does the landing zone make audit logs tamper-proof? The org CloudTrail trail delivers to an S3 bucket in the Log Archive account (a different account from workloads), mandatory controls deny members from altering the trail/bucket/roles, S3 Object Lock enforces WORM retention, and log-file validation produces tamper-evident digests. A workload-account compromise cannot delete its own audit trail.

Q9. What is SCP evaluation logic, and does it grant permissions? SCPs never grant permission; they cap what IAM can allow. Effective permissions are the intersection of IAM allows and SCP allows, and an explicit Deny anywhere overrides any Allow. They also do not apply to the management account or restrict service-linked roles.

Q10. How do you decommission a workload account? There is no instant delete. Move it to the Suspended OU and attach a deny-all SCP to freeze it, drain and back up needed data, remove its AFT request so the pipeline stops reconciling it, then close-account — it enters a ~90-day suspended state before AWS permanently deletes it.

Q11. What is the role of RCPs versus SCPs in a data perimeter? SCPs cap what your principals can do (identity perimeter); Resource Control Policies cap who can access your resources, including principals outside the org (resource perimeter). Together with aws:PrincipalOrgID conditions they close confused-deputy and cross-org-access gaps that plain IAM leaves open.

Q12. How do you detect and remediate drift in Control Tower? Control Tower’s dashboard surfaces drift when an account diverges from its baseline (a control disabled out-of-band, an account moved between OUs manually, a stopped Config recorder). Remediate by re-registering the OU or re-enrolling the account — never by hand-patching managed roles/SCPs/buckets, which itself creates drift.
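
The evaluation logic in Q9 can be modelled in a few lines. The Python sketch below is a simplification that assumes policies are flat lists of action patterns with no conditions or resources; is_allowed and its arguments are hypothetical names, and real evaluation also involves permission boundaries, session policies, and resource policies.

```python
from fnmatch import fnmatchcase

def matches(action: str, patterns) -> bool:
    """Case-insensitive wildcard match of an action against a list of patterns."""
    return any(fnmatchcase(action.lower(), p.lower()) for p in patterns)

def is_allowed(action, iam_allow, scp_allow, explicit_deny=()):
    """Allowed only if nothing denies it and both IAM and the SCP allow it."""
    if matches(action, explicit_deny):
        return False  # an explicit Deny overrides any Allow
    return matches(action, iam_allow) and matches(action, scp_allow)

iam_allow = ["s3:*", "ec2:Describe*"]
scp_allow = ["*"]  # a FullAWSAccess-style allow
scp_deny = ["organizations:LeaveOrganization", "s3:DeleteBucket"]

print(is_allowed("s3:GetObject", iam_allow, scp_allow, scp_deny))      # True
print(is_allowed("s3:DeleteBucket", iam_allow, scp_allow, scp_deny))   # False
print(is_allowed("ec2:RunInstances", iam_allow, scp_allow, scp_deny))  # False
print(is_allowed("s3:GetObject", iam_allow, ["ec2:*"], scp_deny))      # False
```

The last line shows the cap in action: IAM allows the call, but an SCP that only allows ec2 actions leaves nothing for S3 to intersect with.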

Quick check

  1. You apply a region-allowlist SCP and immediately lose access to IAM. What single change fixes it?
  2. A brand-new account vended fine but ignores all your guardrails. What was almost certainly skipped?
  3. After clicking “Update landing zone,” old accounts fail on kms:GenerateDataKey but new ones are fine. What is the cause and the fix?
  4. Which account holds the immutable central log store, and why is it a separate account from the management account?
  5. Where do you put a standard VPC that every vended account should receive — global customizations or account customizations?

Answers

  1. Add the global-service exemptions — list iam, organizations, route53, cloudfront, support, and sts in the SCP’s NotAction so the region Deny does not catch services that authenticate through us-east-1.
  2. The OU was never registered with Control Tower (enable-baseline on that OU). An unregistered OU does not apply the baseline/controls to accounts placed in it.
  3. Cause: the in-place upgrade updated the baseline definition but did not re-apply it to already-enrolled OUs, so old accounts drifted. Fix: re-register each affected OU at the current baseline version (update-enabled-baseline on its existing enabled baseline); never hand-edit the KMS key.
  4. The Log Archive account. Keeping it separate from management enforces separation of duties — a compromise of the org-changing account (management) cannot reach or delete the audit logs, and members cannot tamper with their own trail.
  5. Global customizations — they run on every account AFT touches, so the standard VPC lands everywhere automatically. Account customizations are for environment-specific differences (e.g. a larger prod CIDR).

Glossary

Next steps

Vinod is a Senior Cloud Architect (22+ yrs) — available for Azure / AWS / GCP architecture, landing zones, and migrations.
