Multi-Account AWS Governance: Tying Wiz, ServiceNow, and Control Tower Together

A national health-insurance carrier — call it the kind of payer that processes a few million claims a month under HIPAA and HITRUST — wakes up to the same problem every fast-growing regulated company eventually hits: it no longer has an AWS account, it has 220 of them, and the number grows every sprint. Each product squad spun up its own account to move fast, which worked beautifully until the security team tried to answer a board question after a peer breach: “are any of our member-data buckets public, and how would we know within the hour?” The honest answer was a quarterly spreadsheet assembled by hand, already stale by the time it reached the audit committee. The CISO’s mandate landed the next morning: every account, old and new, lives inside one governed Organization; a misconfiguration that exposes regulated data is detected in minutes, ticketed to the owning team automatically, and closed against an SLA — and the auditor can see the whole loop. This article is the reference architecture for that loop: AWS Control Tower as the policy floor, Wiz as the continuous posture engine, and ServiceNow as the system of record that makes remediation someone’s job, not a Slack message that scrolls away.

The pressures are the familiar enterprise four, and naming them frames every decision below. Regulation means a public PHI bucket is not a bug, it is a reportable event with a 60-day clock, so detection latency is a compliance metric. Scale means controls cannot depend on a human remembering to apply them to account number 221 — governance has to be the default a new account is born into. Velocity means the security team cannot be the bottleneck that approves every account, or the squads will route around them. And cost means you cannot afford an agent on every one of tens of thousands of resources just to know whether a bucket is public. The architecture threads all four by separating concerns cleanly: Control Tower prevents and standardizes, Wiz detects and prioritizes, ServiceNow assigns and tracks, and a thin automation layer closes the loop — each doing the one thing it is best at.

Why not the obvious shortcuts

Three cheaper-looking answers will be proposed in the first planning meeting, and each fails in a way worth naming so the room can move past them.

“Just turn on a few AWS Config rules per account.” Config is essential plumbing — and in fact sits underneath this design — but managing rules, conformance packs, and aggregators by hand across 220 accounts is its own sprawling problem, and Config tells you a resource is non-compliant without telling you whether it is reachable from the internet and holds PHI. A public bucket with no sensitive data and a public bucket full of member records get the same flat alert, and your team drowns.

“Write a Lambda that scans for public buckets nightly.” This is the path of a thousand bespoke scripts: it covers exactly the checks someone thought of, rots the moment AWS ships a new service, has no concept of an attack path (the IAM role that turns a low-severity finding into a crown-jewel breach), and quietly stops running when its execution role drifts. You are now maintaining a half-built CSPM with one engineer.

“Buy a scanner and email the findings to security.” A posture tool with no ticketing is a firehose into an inbox. Findings without an owner, a severity-driven SLA, and a closure record do not get fixed; they get triaged into oblivion, and the auditor sees an alert backlog instead of a remediation program. The missing piece is never detection — it is the closed loop.

The architecture below keeps the parts that work — Config as the data substrate, automation for the genuinely automatable fixes — and wraps them in the two systems that turn raw findings into governed outcomes: Control Tower to make good configuration the default, and Wiz-to-ServiceNow to make bad configuration an owned, tracked, expiring obligation.

Architecture overview

[Figure: architecture diagram for multi-account AWS governance with Wiz, ServiceNow, and Control Tower]

The design has three planes that operate on different rhythms, and keeping them separate in your head is the first step to running this well: a provisioning plane (Control Tower + Account Factory) that governs how accounts are born and standardized, a detection plane (Wiz) that continuously reads posture across the whole Organization, and a response plane (ServiceNow + automation) that turns findings into tracked, time-bound remediation. Underneath all three sits the Organization itself — the structural fact that everything else hangs off.

The defining property of the topology is the one the auditor cares about most: policy is enforced at the Organization root, not per account. Service Control Policies (SCPs) attached to Organizational Units set a ceiling on what any principal — including a root user in a member account — is permitted to do. A squad cannot disable CloudTrail, leave a region ungoverned, or detach the security tooling, because the SCP denies it above their account. Prevention lives in the org structure; detection and response handle everything prevention cannot.
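One way to picture the ceiling is as a walk from the Organization root down to the account: a request survives only if no SCP on that path denies it. The sketch below is a deliberately simplified model, not AWS's policy evaluation engine (it ignores allow-lists, conditions, and IAM policies). The OU name, account name, and denied actions are illustrative assumptions.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase


@dataclass
class Node:
    """An Organization root, OU, or account in a simplified hierarchy."""
    name: str
    denied_actions: list = field(default_factory=list)  # SCP deny patterns
    parent: "Node | None" = None


def path_to_root(node):
    """Return the node and all of its ancestors, nearest first."""
    path = []
    current = node
    while current is not None:
        path.append(current)
        current = current.parent
    return path


def is_blocked(account, action):
    """True if any SCP between the account and the root denies the action."""
    return any(
        fnmatchcase(action, pattern)
        for node in path_to_root(account)
        for pattern in node.denied_actions
    )


# Hypothetical hierarchy: root -> Prod OU -> one squad account.
root = Node("Root", denied_actions=["cloudtrail:StopLogging", "cloudtrail:DeleteTrail"])
prod = Node("Prod", denied_actions=["s3:PutAccountPublicAccessBlock"], parent=root)
claims_account = Node("claims-prod", parent=prod)

print(is_blocked(claims_account, "cloudtrail:StopLogging"))  # deny inherited from the root
print(is_blocked(claims_account, "ec2:RunInstances"))        # no deny anywhere on the path
```

The point of the model is that the account itself carries no policy at all: everything that constrains it is inherited, which is why moving an account between OUs changes its ceiling immediately.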

The Organization, laid out by OU: Control Tower’s landing zone organizes accounts into OUs that mirror governance intent, not the org chart. A Security OU holds the dedicated Log Archive account (the immutable, central sink for CloudTrail and Config from every account) and the Audit account (cross-account read for the security team and tooling). Workload OUs — Prod, Non-Prod, Sandbox — carry different guardrail strictness: Sandbox permits experimentation but still denies anything that exposes data or disables logging; Prod is locked down hard. A Suspended OU with a deny-all SCP is where a compromised or decommissioned account goes to be quarantined.

Provisioning flow, following an account’s birth:

A squad requests an account through a ServiceNow catalog item — not a console click — which captures the cost center, data classification, and owning team up front. Approval routes to the platform team.
On approval, Account Factory for Terraform (AFT) provisions the account: Control Tower enrolls it into the correct OU, which immediately inherits that OU’s SCPs and detective controls, and AFT layers on the customizations every account must have — a standard IAM baseline, Wiz’s cross-account scanner role, the CrowdStrike Falcon deployment hooks, centralized logging wiring, and tag enforcement.
The account is now born governed: CloudTrail to Log Archive, Config recording to the aggregator, SCP ceiling applied, Wiz scanning it, all before a single workload resource exists. There is no window where account 221 is ungoverned.

Detection flow, continuous and agentless-first: Wiz connects to the Organization through a least-privilege cross-account role and scans every enrolled account agentlessly — it snapshots EBS volumes and reads cloud configuration via the AWS APIs, so it sees misconfigurations, exposed data, vulnerabilities, and toxic combinations (a public-facing workload + a critical CVE + an over-permissioned role that reaches PHI) without an agent on every resource. The Wiz Security Graph is the differentiator: it does not just flag “this bucket is public,” it computes the attack path and tells you “this public bucket is reachable, holds data classified PHI, and an attached role can pivot to the claims database” — which is the difference between alert number 4,000 and the one finding that matters tonight.
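The attack-path idea can be illustrated with a plain graph search. This is a toy sketch, not how the Wiz Security Graph is implemented: the resource names, the edges, and the set of PHI-bearing assets are all invented for illustration, and an edge simply means "can reach or can assume".

```python
from collections import deque

# Hypothetical graph: an edge means "can reach or can assume".
EDGES = {
    "internet": ["public-bucket", "web-alb"],
    "web-alb": ["claims-api"],
    "claims-api": ["role/claims-reader"],
    "role/claims-reader": ["claims-db"],
    "public-bucket": [],
    "claims-db": [],
}
PHI_ASSETS = {"claims-db"}  # assumed output of data classification


def attack_paths(edges, source, targets):
    """Breadth-first search returning the shortest path to each reachable target."""
    paths = {}
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node in targets:
            paths[node] = path
        for neighbor in edges.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return paths


for asset, path in attack_paths(EDGES, "internet", PHI_ASSETS).items():
    print(asset, "is reachable via", " -> ".join(path))
```

In this toy graph the public bucket is exposed but leads nowhere, while the load balancer is the start of a path that ends at PHI. That asymmetry is what lets a prioritizing tool rank one finding far above the other.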

Response flow, the closed loop: A new or changed Wiz finding above a severity threshold is pushed (Wiz integration → EventBridge / direct connector) into ServiceNow, which opens a Security Incident enriched with the finding’s severity, the affected account and resource, the computed attack path, and — critically — the owning team derived from the account’s tags, so it lands in the right queue with an SLA clock already running. For a defined set of safe, deterministic fixes (re-enable S3 Block Public Access, remove a 0.0.0.0/0 SG rule on a sensitive port, revoke a public AMI share), an automation runbook executes the remediation, and ServiceNow records what changed; for everything else, a human owns it to closure. When the resource is fixed, Wiz re-scans, confirms, and the ticket auto-resolves — the loop closes itself, and the auditor sees detection-to-closure timestamps on every finding.
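The enrichment step is mostly a mapping exercise, which a short sketch makes concrete. Everything specific here is an assumption: the finding fields, the ticket field names, the SLA hours, and the fallback queue are hypothetical and do not reflect the actual Wiz or ServiceNow schemas. The function only builds a payload and calls no API.

```python
from datetime import datetime, timedelta, timezone

# Assumed SLA targets in hours; real values come from the security policy.
SLA_HOURS = {"CRITICAL": 4, "HIGH": 24, "MEDIUM": 72}


def build_incident(finding, account_tags, now=None):
    """Map a posture finding to a ticket payload, or None below the threshold."""
    severity = finding["severity"].upper()
    if severity not in SLA_HOURS:
        return None  # below the ticketing threshold, no incident opened
    now = now or datetime.now(timezone.utc)
    return {
        "short_description": f"[{severity}] {finding['title']}",
        "account_id": finding["account_id"],
        "resource": finding["resource"],
        "attack_path": " -> ".join(finding.get("attack_path", [])),
        # The owner comes from the tag captured when the account was vended.
        "assignment_group": account_tags.get("Owner", "security-triage"),
        # The SLA clock starts at ticket creation.
        "due_by": (now + timedelta(hours=SLA_HOURS[severity])).isoformat(),
    }
```

The fallback to a triage queue when the `Owner` tag is missing is a design choice worth keeping: an unowned finding should land somewhere visible rather than be dropped.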

Component breakdown

Component	Service / tool	Role in the governance system	Key configuration choices
Org structure & guardrails	AWS Control Tower	Landing zone, OU model, preventive + detective guardrails	OU-aligned strictness; Log Archive + Audit accounts; drift detection on
Policy ceiling	Service Control Policies	Org-wide deny boundary no member can exceed	Region lockdown, deny-disable-logging, deny-public-access at root
Account provisioning	Account Factory for Terraform (AFT)	Governed, repeatable account vending with baselines	Per-OU customizations; Wiz/Falcon hooks; tag + logging baseline
Config substrate	AWS Config + aggregator	Resource inventory, conformance packs, change feed	Org aggregator in Audit account; HIPAA conformance pack
Audit trail	CloudTrail (org trail)	Immutable, central API audit from every account	Org trail to Log Archive; log-file validation; SCP denies disabling it
Posture / CSPM	Wiz	Agentless misconfig, exposure, vuln & attack-path detection	Cross-account role; Security Graph; data classification for PHI
IaC scanning	Wiz Code	Catch misconfigs in Terraform before they deploy	PR scan in CI; policy-as-code; block on critical
Runtime security	CrowdStrike Falcon	Workload runtime threat detection on EC2/EKS	Sensor via AFT baseline; detections to the SOC
ITSM / system of record	ServiceNow	Account requests, security incidents, SLA-tracked remediation	Catalog vending; tag-derived assignment; auto-resolve on re-scan
Identity / SSO	Okta + Entra ID	Workforce SSO into AWS via federated, least-privilege roles	Okta → IAM Identity Center (SAML/SCIM); Entra for M365-side apps
Secrets	HashiCorp Vault	Dynamic, short-lived AWS creds for automation & pipelines	AWS secrets engine; STS leases; no long-lived keys in CI
Observability	Datadog / Dynatrace	Org-wide telemetry, compliance posture dashboards, alerting	Cloud integration per account; governance KPI dashboard
CI / IaC	GitHub Actions + Terraform	Provision and change infra; gate on policy and Wiz Code	OIDC to AWS (no stored keys); plan + scan as required checks
Edge / WAF	Akamai	Perimeter TLS, WAF, bot mitigation for public workloads	Origin shield; WAF rules feeding the same SOC

A few of these deserve the why, because they are the choices teams get wrong.

Why SCPs are the floor, not the whole story. An SCP can deny — it cannot grant, and it cannot detect a resource that became non-compliant within what it permits. SCPs are a blunt, powerful guarantee (“no one, ever, disables CloudTrail in this OU”) and the right tool for invariants you will never relax. But you cannot express “an S3 bucket holding PHI must not be public” as a clean SCP, because the SCP layer does not know which buckets hold PHI. Prevention handles the universal invariants; Wiz handles the contextual, data-aware judgments. Confusing the two — trying to make SCPs do detection’s job — produces brittle policies that block legitimate work and still miss the real exposures.

Why agentless-first scanning at this scale. Putting a posture agent on every resource across 220 accounts is operationally and financially untenable, and it cannot see a resource an agent was never installed on — which is exactly where the forgotten public bucket lives. Wiz’s agentless model reads the cloud’s own configuration plane and snapshots disks, so coverage is complete by default the moment an account is enrolled, with no per-resource rollout. Runtime threat detection — a different question, “is this running workload under active attack” — is where CrowdStrike Falcon earns its agent, deployed via the AFT baseline onto EC2 and EKS nodes. The two are complementary: Wiz answers “is this exposed,” Falcon answers “is this being exploited right now.”

Why the loop runs through ServiceNow, not a chat channel. A finding posted to Slack has no owner, no SLA, and no closure record; it is a notification, not a workflow. Routing through ServiceNow gives every finding a ticket with a tag-derived assignee, a severity-driven SLA clock, an auditable state machine (New → Assigned → In Progress → Resolved → Closed), and a permanent record the audit committee can pull. The same ServiceNow instance that vends the account (capturing its owner and data classification) is what later assigns its findings — the tag set at birth is what makes routing automatic years later.

Implementation guidance

Stand up the landing zone first, and treat the OU model as the real design artifact. The hierarchy is where governance intent becomes structure, and getting it wrong is expensive to undo once accounts are populated.

Enable Control Tower in the management account; it creates the Log Archive and Audit accounts and the baseline OUs.
Design OUs around governance strictness, not the org chart: Security, Infrastructure, Workloads/Prod, Workloads/Non-Prod, Sandbox, Suspended. Strictness, not team identity, is what determines which SCPs and guardrails apply.
Attach SCPs at the OU level. Keep them invariant-focused. A region-lockdown SCP is the canonical first one — it shrinks the attack and audit surface to the regions you actually operate in:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyOutOfScopeRegions",
      "Effect": "Deny",
      "NotAction": ["iam:*", "sts:*", "organizations:*", "cloudfront:*", "route53:*", "support:*"],
      "Resource": "*",
      "Condition": {
        "StringNotEquals": { "aws:RequestedRegion": ["us-east-1", "us-west-2"] }
      }
    }
  ]
}

Pair it with a deny on disabling CloudTrail/Config and a deny on deleting or modifying the Wiz and Falcon roles, so the security substrate itself is unremovable from below.
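A companion policy for the logging and tooling invariants might look like the sketch below, generated from Python so it can be reviewed and size-checked in the pipeline. The two security role names are hypothetical placeholders, and the exempted principals are assumptions to verify against your own landing zone: a blanket deny with no exemption can lock out the very automation that manages these resources.

```python
import json

# Hypothetical role names; substitute whatever the AFT baseline provisions.
PROTECTED_ROLES = [
    "arn:aws:iam::*:role/WizScannerRole",
    "arn:aws:iam::*:role/FalconIntegrationRole",
]
# Assumed automation principals that must keep managing these resources.
EXEMPT_PRINCIPALS = [
    "arn:aws:iam::*:role/AWSControlTowerExecution",
    "arn:aws:iam::*:role/AWSAFTExecution",
]
EXEMPTION = {"ArnNotLike": {"aws:PrincipalArn": EXEMPT_PRINCIPALS}}

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyDisablingAuditSubstrate",
            "Effect": "Deny",
            "Action": [
                "cloudtrail:StopLogging",
                "cloudtrail:DeleteTrail",
                "config:StopConfigurationRecorder",
                "config:DeleteConfigurationRecorder",
                "config:DeleteDeliveryChannel",
            ],
            "Resource": "*",
            "Condition": EXEMPTION,
        },
        {
            "Sid": "DenyTamperingWithSecurityRoles",
            "Effect": "Deny",
            "Action": [
                "iam:DeleteRole",
                "iam:UpdateAssumeRolePolicy",
                "iam:AttachRolePolicy",
                "iam:DetachRolePolicy",
                "iam:PutRolePolicy",
                "iam:DeleteRolePolicy",
            ],
            "Resource": PROTECTED_ROLES,
            "Condition": EXEMPTION,
        },
    ],
}

scp_json = json.dumps(policy, indent=2)
# SCPs have a 5,120-character size limit, so check before attaching.
assert len(scp_json) <= 5120
print(scp_json)
```

Control Tower's own mandatory controls already cover part of this ground, so check for overlap before attaching a hand-written policy like this one.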

Vend accounts through code, never the console. Wire Account Factory for Terraform (AFT) so the only path to a new account is the ServiceNow catalog request → approval → AFT pipeline. AFT applies per-account customizations from a Terraform repo: the IAM baseline, the cross-account roles for Wiz and CrowdStrike Falcon, centralized-logging wiring, and a mandatory-tag policy (Owner, CostCenter, DataClassification). That DataClassification tag is load-bearing — it is what later lets Wiz prioritize a PHI bucket and ServiceNow route the ticket — so enforce it at vending time, because retrofitting tags across 220 live accounts is the migration nobody wants.
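Tag enforcement at vending time can be as simple as a validation step that runs before the request reaches the AFT pipeline. The sketch below assumes the request arrives as a dictionary of tags. The three tag names come from the text above, but the allowed classification values are an invented vocabulary (only PHI is named in this article).

```python
REQUIRED_TAGS = ("Owner", "CostCenter", "DataClassification")
# Assumed classification vocabulary; adjust to the carrier's actual scheme.
ALLOWED_CLASSIFICATIONS = {"PHI", "Internal", "Public"}


def validate_account_request(tags):
    """Return a list of problems; an empty list means the request can proceed."""
    problems = [
        f"missing tag: {key}"
        for key in REQUIRED_TAGS
        if not tags.get(key, "").strip()
    ]
    classification = tags.get("DataClassification", "").strip()
    if classification and classification not in ALLOWED_CLASSIFICATIONS:
        problems.append(f"unknown DataClassification: {classification}")
    return problems


request = {"Owner": "claims-squad", "CostCenter": "CC-4410", "DataClassification": ""}
print(validate_account_request(request))
```

Rejecting the request outright, rather than vending with a default classification, keeps the tag trustworthy enough to drive prioritization and routing later.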

Shift posture left with Wiz Code. Finding a public bucket in production is good; never letting it deploy is better. Run Wiz Code as a required check in the GitHub Actions pipeline so a Terraform plan that would create a 0.0.0.0/0 ingress on a sensitive port, or an unencrypted PHI bucket, fails the pull request with the exact line flagged: the same policy logic Wiz enforces at runtime, applied to the IaC before merge. The pipeline authenticates to AWS via OIDC federation (no stored access keys). The sensitive credentials it cannot get from OIDC (third-party API tokens, the ServiceNow integration secret) are read from HashiCorp Vault at run time rather than stored as CI variables, and any additional AWS credentials that automation needs are leased from Vault’s AWS secrets engine as short-lived STS credentials, so nothing long-lived sits in the pipeline.
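To show what a pre-merge gate evaluates, here is a small stand-in check over the JSON that `terraform show -json` produces for a saved plan. It is not Wiz Code and does not use its policy language. The sensitive-port list is an assumption, and it only inspects inline `ingress` blocks on `aws_security_group` resources.

```python
SENSITIVE_PORTS = {22, 3389, 5432}  # assumed list of ports worth blocking
OPEN_CIDR = "0.0.0.0/0"


def exposed_ports(rule):
    """Sensitive ports that one ingress rule opens to the whole internet."""
    if OPEN_CIDR not in (rule.get("cidr_blocks") or []):
        return []
    if str(rule.get("protocol")) == "-1":  # "-1" means all protocols and ports
        return sorted(SENSITIVE_PORTS)
    low, high = rule.get("from_port", 0), rule.get("to_port", 0)
    return sorted(p for p in SENSITIVE_PORTS if low <= p <= high)


def open_ingress_violations(plan):
    """Scan a parsed Terraform plan for security groups with risky ingress."""
    violations = []
    for change in plan.get("resource_changes", []):
        if change.get("type") != "aws_security_group":
            continue
        # "after" is null for resources being destroyed.
        after = (change.get("change") or {}).get("after") or {}
        for rule in after.get("ingress") or []:
            ports = exposed_ports(rule)
            if ports:
                violations.append((change["address"], ports))
    return violations
```

In CI, the plan would be loaded with `json.load` and the job would exit non-zero when the returned list is non-empty, which is what turns the check into a required status on the pull request.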

Identity: federate the humans, kill the static keys. Wire workforce SSO as Okta → AWS IAM Identity Center over SAML with SCIM provisioning, so engineers assume least-privilege, time-bound roles into specific accounts and there are no IAM users with long-lived keys to leak; Microsoft Entra ID governs the M365-side applications and conditional access, federated where the two identity planes meet. Permission sets in Identity Center map Okta groups to scoped roles per OU — a Non-Prod developer never holds Prod write. Every automation identity (the remediation runbooks, the pipelines) pulls credentials from HashiCorp Vault with short TTLs, so even the machines hold nothing durable.

Enterprise considerations

The closed-loop remediation workflow, concretely. This is the heart of the system, so make its states explicit. A Wiz finding crosses a severity threshold → a ServiceNow Security Incident opens, enriched with severity, account, resource, attack path, and tag-derived owner, SLA clock started. The path forks by fix type:

Finding class	Example	Response	Who closes it
Safe & deterministic	Public S3 bucket; `0.0.0.0/0` on port 22; public AMI share	Automated runbook remediates; ServiceNow logs the change	Wiz re-scan auto-resolves
Contextual / risky	Over-permissive IAM role reaching PHI; exposed RDS	Ticket to owning squad with attack-path detail	Human, against SLA
Needs design change	Architecture exposes data by default	Ticket + change request; security review	Squad + security, tracked

Auto-remediating only the deterministic class is the deliberate tradeoff: an automated fix to an IAM policy or a security group on a production database can cause an outage worse than the finding, so those route to a human who understands the blast radius. The loop’s value is not full automation — it is that every finding has an owner, a clock, and a closure record, and the safe ones close themselves.
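The fork in the table reduces to a lookup with a conservative default. In this sketch the finding-type identifiers are hypothetical labels for the examples in the table, not identifiers from Wiz. The one design decision it encodes is that anything unrecognized goes to a human.

```python
# Responses per class, taken from the table above.
RESPONSES = {
    "safe_deterministic": {"response": "automated runbook", "closed_by": "Wiz re-scan"},
    "contextual": {"response": "ticket to owning squad", "closed_by": "human, against SLA"},
    "design_change": {"response": "ticket plus change request", "closed_by": "squad and security"},
}

# Hypothetical finding-type identifiers mapped to those classes.
FINDING_CLASS = {
    "s3_bucket_public": "safe_deterministic",
    "sg_open_port_22": "safe_deterministic",
    "ami_shared_publicly": "safe_deterministic",
    "iam_role_reaches_phi": "contextual",
    "rds_exposed": "contextual",
    "data_exposed_by_design": "design_change",
}


def route(finding_type):
    """Pick a response; unknown finding types default to a human, never automation."""
    finding_class = FINDING_CLASS.get(finding_type, "contextual")
    return {"class": finding_class, **RESPONSES[finding_class]}


print(route("s3_bucket_public"))
print(route("something_new"))
```

Keeping the automated class as an explicit allow-list means a new finding type never becomes auto-remediated by accident; someone has to add it deliberately.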

Security & Zero Trust. The model is defense-in-depth across the three planes: SCPs as an unremovable preventive floor; Wiz as continuous detection with attack-path context so the team works the reachable, data-bearing findings first; CrowdStrike Falcon for runtime threat detection on live workloads, feeding the SOC; Akamai at the edge with WAF and bot mitigation for the public-facing claims and member portals, its events flowing to the same SOC. Identity is least-privilege and federated (Okta → Identity Center), automation credentials are short-lived from Vault, and the Audit account’s read-only cross-account access means the security team observes the whole Organization without standing write access anywhere. Crucially, the controls verify each other: Control Tower’s drift detection flags a guardrail that was tampered with, and Wiz independently confirms the SCPs are actually holding — neither is trusted alone.

Cost optimization. Governance has a real bill; engineer it down deliberately.

Lever	Mechanism	Typical effect
Agentless posture	Wiz reads config + disk snapshots, no per-resource agent	Avoids agent licensing across tens of thousands of resources
Right-tiered scanning	Deep/frequent scans on Prod; lighter cadence on Sandbox	Cuts scan cost where risk is low
Centralized logging	One Log Archive sink, lifecycle to Glacier	Avoids duplicate CloudTrail/Config spend per account
Auto-remediate the safe class	Runbooks close deterministic findings without analyst time	Reclaims expensive analyst hours
Tag-driven chargeback	`CostCenter` tag enforced at vending → cost allocation	Each squad owns its spend

The CostCenter tag enforced at account birth feeds AWS Cost Categories, and the per-team posture-and-spend view lands in Datadog (or Dynatrace) as the dashboard the CFO and CISO read side by side — security posture and cost, per business unit, in one place.

Scalability. The architecture’s whole point is that it scales without linear human effort. A new account is governed the instant AFT enrolls it — SCPs, logging, Wiz, and Falcon all apply by OU inheritance, so account 221 and account 500 are identical-by-construction. Wiz’s agentless scanning extends to new accounts automatically through the Organization integration. The ceiling is not technical but procedural: the rate at which the platform team approves and vends accounts, which is why the ServiceNow catalog and AFT pipeline matter — they make vending self-service-within-guardrails rather than a security bottleneck the squads route around.

Failure modes, and what each looks like. Name them before they page you.

A squad deletes or breaks the Wiz role in one account, and that account goes dark to posture scanning while looking fine on the surface. Mitigation: deny changes to the role via SCP, and alert on missing scan coverage in Wiz, not just on findings.
The Wiz → ServiceNow integration silently fails — findings pile up in Wiz but no tickets open, so the loop is broken without an obvious alarm. Mitigation: a heartbeat/synthetic finding and a reconciliation check that the ticket count tracks the finding count.
An over-eager auto-remediation causes an outage — a runbook “fixes” a security group that a legitimate integration depended on. Mitigation: restrict automation to the deterministic class, require change records, and keep a fast rollback.
SCP over-reach blocks legitimate work — an invariant SCP is too broad and a squad cannot do its job, so pressure builds to weaken governance org-wide. Mitigation: test SCPs in a Sandbox OU first, scope tightly, and prefer detective controls for anything contextual.
Control Tower landing-zone drift — a manual console change diverges an account from its baseline. Mitigation: Control Tower drift detection plus AFT re-apply, and Wiz as the independent confirmation that posture still holds.
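The first two mitigations above are both set differences, so a scheduled reconciliation job can be sketched in a few lines. It assumes you can export four sets of identifiers (open finding IDs, finding IDs that have tickets, enrolled account IDs, and recently scanned account IDs). How those exports are produced is outside this sketch.

```python
def reconcile(finding_ids, ticketed_finding_ids, enrolled_accounts, scanned_accounts):
    """Compare what should exist with what does; all inputs are sets of IDs."""
    return {
        # Findings with no ticket suggest the integration is failing silently.
        "findings_without_ticket": sorted(finding_ids - ticketed_finding_ids),
        # Tickets whose finding is gone may be ready to auto-resolve.
        "tickets_without_finding": sorted(ticketed_finding_ids - finding_ids),
        # Enrolled accounts that were not scanned have gone dark.
        "accounts_not_scanned": sorted(enrolled_accounts - scanned_accounts),
    }


report = reconcile(
    finding_ids={"F-1", "F-2", "F-3"},
    ticketed_finding_ids={"F-1", "F-2"},
    enrolled_accounts={"111111111111", "222222222222"},
    scanned_accounts={"111111111111"},
)
print(report)
```

Any non-empty list in the report is worth an alert of its own, because each one describes a gap that produces no finding and therefore no ticket.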

Reliability & DR. Governance tooling is itself critical infrastructure. The Log Archive account is the durable source of truth: org CloudTrail and Config land there with log-file validation and an immutability/lifecycle policy, kept geo-resilient, and that archive is the real audit-recovery guarantee. Control Tower and the management account are single points of governance, so guard them hard. SCPs do not restrict principals in the management account, so its protection has to come from MFA on the root user, tightly limited access, keeping workloads out of it, and break-glass procedures documented in ServiceNow. For the response loop, ServiceNow is the carrier’s existing enterprise ITSM with its own HA/DR posture; Wiz is SaaS. A pragmatic stance: the preventive plane (SCPs, logging) must survive any single account or region event by design, while the detection and response planes degrade gracefully. A brief Wiz or ServiceNow outage delays ticketing, but the SCP floor and immutable audit trail never lapse.

Observability & governance. Instrument the program’s own KPIs, because “are we governed” is now a measurable question: mean-time-to-detect and mean-time-to-remediate per severity, percentage of accounts under full guardrail coverage, open critical findings by team against SLA, and auto-remediation success rate — surfaced in Datadog/Dynatrace as the dashboard the audit committee sees. Pin Control Tower and guardrail versions, keep SCPs and AFT customizations in version control so every policy change is reviewed and revertable, and run AWS Config conformance packs (the HIPAA pack as a baseline) in the Audit-account aggregator as a second, AWS-native compliance lens beside Wiz. New guardrails and SCP changes promote through the same GitHub Actions pipeline, scanned by Wiz Code, so a policy change is reviewed before it touches the Organization.
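The KPIs named above are simple aggregates once each finding carries its timestamps. The sketch below assumes each record is a dictionary holding a severity and `datetime` values under hypothetical keys such as `introduced`, `detected`, and `resolved`. The record shape is illustrative, not an export format from any of the tools.

```python
from datetime import datetime
from statistics import mean


def mean_hours(records, start_key, end_key):
    """Mean elapsed hours between two timestamps, grouped by severity."""
    by_severity = {}
    for record in records:
        if record.get(start_key) is None or record.get(end_key) is None:
            continue  # still open, or the start time is unknown
        elapsed = (record[end_key] - record[start_key]).total_seconds() / 3600
        by_severity.setdefault(record["severity"], []).append(elapsed)
    return {severity: mean(hours) for severity, hours in by_severity.items()}


def coverage_pct(covered_accounts, total_accounts):
    """Percentage of accounts under full guardrail coverage."""
    if total_accounts == 0:
        return 0.0
    return 100.0 * covered_accounts / total_accounts


findings = [
    {"severity": "CRITICAL", "detected": datetime(2024, 1, 1, 0), "resolved": datetime(2024, 1, 1, 2)},
    {"severity": "CRITICAL", "detected": datetime(2024, 1, 1, 0), "resolved": datetime(2024, 1, 1, 4)},
    {"severity": "HIGH", "detected": datetime(2024, 1, 1, 0), "resolved": None},
]
mttr = mean_hours(findings, "detected", "resolved")  # mean-time-to-remediate
print(mttr)
```

Mean-time-to-detect uses the same function with `introduced` and `detected` as the two keys, provided the time a misconfiguration was introduced can be recovered from CloudTrail or Config history.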

Explicit tradeoffs

Accept these or do not build it. This is a multi-system program, not a single product, and that is the cost. Control Tower’s OU and SCP model is a genuine learning curve, and a badly designed hierarchy is painful to refactor once accounts are populated, so invest in the OU design before vending. Three vendors (AWS, Wiz, ServiceNow) plus the identity and secrets layers means at least three integrations to keep healthy and a real risk that a silent integration failure breaks the loop, which is why the heartbeat checks above are not optional. Auto-remediation is deliberately limited to deterministic fixes, so a meaningful share of findings still needs human hands. This is governance, not magic, and selling it as full automation will burn the security team when an analyst still has to work the IAM finding at 2 a.m. And SCPs are a blunt instrument: the same property that makes them an ironclad guarantee makes them dangerous when over-broad, so contextual rules belong in Wiz’s detective layer, not in an SCP.

The alternatives, and when they win. If you have a handful of accounts, this is over-engineered — AWS Organizations with a few SCPs and Security Hub, ticketing by hand, is proportionate, and you graduate to this when account sprawl and audit pressure demand it. If you are AWS-native and budget-constrained, Security Hub + GuardDuty + Config conformance packs is the first-party stack that covers much of the detection plane without Wiz’s licensing — you trade away the cross-cloud Security Graph, the attack-path prioritization, and the agentless data-classification that make the PHI-bucket finding jump the queue. If you are genuinely multi-cloud (this carrier also has an Azure footprint governed by Entra ID and a learning platform on Moodle behind Akamai), a CNAPP like Wiz that spans clouds and even reaches into virtual appliances and self-managed VMs is precisely where it earns its price, because a per-cloud native tool leaves you stitching consoles. And if your blocker is cultural rather than technical — squads that will not adopt governance — no tool fixes that; the ServiceNow self-service catalog that makes the governed path also the fast path is the part that actually changes behavior.

The shape of the win

For the carrier’s security team, the payoff is not “a dashboard.” It is that the board question — “are any member-data buckets public, and how fast would we know” — now has a live, defensible answer: a public PHI bucket is detected by Wiz in minutes with its attack path computed, a ServiceNow incident opens automatically against the owning squad with an SLA clock, the safe class self-heals and the rest is tracked to closure, and the audit committee sees detection-to-remediation timestamps on every finding across all 220-plus accounts. Account 221 is born into the same guardrails as account 1, because Control Tower and AFT made governance the default rather than a checklist. Everything upstream — the OU-aligned SCP floor, the AFT vending pipeline, the agentless Wiz Security Graph, the Falcon runtime sensors, the Vault-leased automation creds, the Okta-federated least-privilege access — exists so that a CISO, an auditor, and a CFO each say yes. Start narrower if you must; one Organization, a strict OU model, Wiz on top, and the ServiceNow loop is where a regulated company at account-sprawl scale has to land.

Written by Vinod
