AWS Well-Architected: Security — Foundations, IAM, Detection, Infrastructure & Data Protection, Incident Response, and AppSec

Where this fits

The Security pillar is the second of the six pillars in the AWS Well-Architected Framework (after Operational Excellence, and before Reliability, Performance Efficiency, Cost Optimization, and Sustainability). Its design principles are familiar but worth restating because every decision below ladders up to them: implement a strong identity foundation, maintain traceability, apply security at all layers, automate security best practices, protect data in transit and at rest, keep people away from data, and prepare for security events. The pillar decomposes into seven areas — security foundations, identity and access management, detection, infrastructure protection, data protection, incident response, and application security — and the Framework expresses its expectations as numbered best-practice questions (SEC 1 through SEC 11). This article walks each area as you would actually implement it in a multi-account AWS organization, naming the concrete services, artifacts, and trade-offs.

[Figure: AWS Well-Architected Framework, animated overview]

Security foundations (SEC 1)

What it is. Foundations is the operating model that everything else sits on: how you separate workloads across AWS accounts, how you centrally govern those accounts, how you stay aware of threats and compliance obligations, and how you keep your security guardrails as code rather than as tribal knowledge. It maps to SEC 1 (“How do you securely operate your workload?”).

Why it matters. A single AWS account is a single blast radius. The moment you have production data, a CI/CD pipeline, and a sandbox in the same account, a leaked credential or a misconfigured IAM policy threatens all three. Account separation is the cheapest, strongest isolation boundary AWS gives you, and the foundations area is where you decide how to use it.

How to do it well. Use AWS Organizations to create an organizational unit (OU) structure, then govern it. The reference pattern is AWS Control Tower (or, increasingly, a hand-rolled landing zone), which stands up a multi-account environment with a management account, a dedicated Log Archive account, and a Security Tooling / Audit account in a Security OU. Apply service control policies (SCPs) at the OU level as preventive guardrails: they set the maximum permissions available in member accounts and cannot be overridden by an account-local administrator. Classic SCPs deny disabling CloudTrail, GuardDuty, or Config; deny leaving the organization; restrict activity to approved Regions; and deny use of the account root user for anything but the handful of tasks that require it. Newer resource control policies (RCPs) complement SCPs by setting an upper bound on resource-based policies (e.g., enforcing aws:PrincipalOrgID so an S3 bucket can never be shared outside the org). Declarative policies let you enforce a desired configuration for a service (such as blocking public AMIs) that persists even as the service adds features. Maintain everything as code (Terraform, CloudFormation, or AWS CDK) and stamp new accounts through Account Factory for Terraform (AFT) or Control Tower Account Factory so the baseline is identical every time.
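As a rough illustration of the "classic SCPs" described above, the sketch below assembles one policy document in Python. The function name, the Sid values, and the example Regions are assumptions made for illustration. The exempted global-service list is deliberately short and incomplete, so treat it as a starting point rather than a vetted guardrail.

```python
import json


def build_guardrail_scp(allowed_regions):
    """Return an SCP document (as a dict) with three deny statements."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Stop member accounts from switching off the detective controls.
                "Sid": "DenyDisablingSecurityServices",
                "Effect": "Deny",
                "Action": [
                    "cloudtrail:StopLogging",
                    "cloudtrail:DeleteTrail",
                    "guardduty:DeleteDetector",
                    "config:StopConfigurationRecorder",
                    "config:DeleteConfigurationRecorder",
                ],
                "Resource": "*",
            },
            {
                "Sid": "DenyLeavingOrganization",
                "Effect": "Deny",
                "Action": "organizations:LeaveOrganization",
                "Resource": "*",
            },
            {
                # NotAction exempts a few global services (illustrative, not exhaustive).
                "Sid": "DenyUnapprovedRegions",
                "Effect": "Deny",
                "NotAction": ["iam:*", "organizations:*", "sts:*",
                              "cloudfront:*", "route53:*", "support:*"],
                "Resource": "*",
                "Condition": {
                    "StringNotEquals": {"aws:RequestedRegion": sorted(allowed_regions)}
                },
            },
        ],
    }


# Example Regions only; substitute the ones your organization has approved.
example_scp = build_guardrail_scp(["eu-west-1", "ap-south-1"])
print(json.dumps(example_scp, indent=2))
```

Keeping the policy as a function in version control makes the Region list a reviewed parameter instead of a hand-edited JSON blob.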

Foundational control	AWS service	Scope	What it prevents/provides
Account/OU structure	AWS Organizations, Control Tower	Org-wide	Blast-radius isolation, central billing
Preventive guardrails	SCPs, RCPs, declarative policies	OU / account	Bound max permissions, enforce Region/config
Config baseline as code	AFT, CloudFormation StackSets	Per account	Repeatable, drift-free account provisioning
Central identity broker	IAM Identity Center	Org-wide	Single sign-on, no long-lived IAM users
Threat & advisory intel	AWS Security Hub, Trusted Advisor, AWS Health	Org-wide	Staying aware of vulnerabilities and posture

Artifacts and decisions. A landing-zone design document; the OU hierarchy diagram; the SCP/RCP policy set in version control; an account vending process; a Region-restriction decision; and a documented account baseline (centralized logging, mandatory CloudTrail, default encryption, tagging policy). The key decision is granularity: too few accounts and you lose isolation; too many and IAM Identity Center permission-set sprawl becomes its own problem. A common compromise is one account per workload per environment (prod/stage/dev), grouped into Workload, Sandbox, Infrastructure, and Security OUs.

Identity and access management (SEC 2, SEC 3)

What it is. IAM is how you manage human identities (workforce, partners) and machine identities (workloads, services), and how you grant each the least privilege needed. SEC 2 covers managing identities; SEC 3 covers managing permissions for those identities.

Why it matters. Most cloud breaches are identity breaches — a stolen access key, an over-permissive role, a forgotten admin user with no MFA. Getting identity right closes the most-exploited door.

How to do it well — humans. Stop creating IAM users. Federate workforce identity through IAM Identity Center (formerly AWS SSO), connected to your IdP (Microsoft Entra ID, Okta, or the built-in store) and ideally provisioned via SCIM. Users assume permission sets (which become IAM roles in the target accounts) and receive short-lived credentials — no long-term keys to leak. Enforce MFA universally; for the highest assurance, require phishing-resistant FIDO2/WebAuthn. The root user of every account is locked down: a hardware MFA, no access keys, and — for member accounts in an organization — centralized root access management so you don’t even store member-account root credentials.

How to do it well — machines. Never embed access keys in code or AMIs. EC2 workloads use IAM roles via instance profiles; EKS pods use IAM Roles for Service Accounts (IRSA) or the newer EKS Pod Identity; Lambda uses an execution role. For workloads outside AWS or in CI/CD, use IAM Roles Anywhere (X.509-based) or OIDC federation (e.g., GitHub Actions assuming a role with no stored secret). For application secrets and credentials that must exist, use AWS Secrets Manager with automatic rotation, not environment variables.
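The OIDC federation pattern for GitHub Actions comes down to a role trust policy. A minimal sketch follows. The account ID, organization, and repository names are placeholders, and the function name is hypothetical. It assumes the GitHub OIDC identity provider has already been registered in the account.

```python
import json

GITHUB_OIDC_HOST = "token.actions.githubusercontent.com"


def github_oidc_trust_policy(account_id, org, repo, branch="main"):
    """Trust policy letting one repo/branch assume the role, with no stored secret."""
    provider_arn = f"arn:aws:iam::{account_id}:oidc-provider/{GITHUB_OIDC_HOST}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Federated": provider_arn},
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        # Audience must be STS; subject pins the exact repo and branch.
                        f"{GITHUB_OIDC_HOST}:aud": "sts.amazonaws.com",
                        f"{GITHUB_OIDC_HOST}:sub": f"repo:{org}/{repo}:ref:refs/heads/{branch}",
                    }
                },
            }
        ],
    }


# Placeholder values for illustration only.
print(json.dumps(github_oidc_trust_policy("111122223333", "example-org", "example-repo"), indent=2))
```

Pinning the sub claim to a specific repository and branch matters: a trust policy that checks only the audience would let any GitHub repository assume the role.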

How to do it well — permissions. Practice least privilege as a lifecycle, not a one-time grant. Start from AWS managed policies, then tighten. Use IAM Access Analyzer to: find resources shared externally, generate fine-grained policies from CloudTrail access history, and validate policies against best practices in CI. Constrain with permissions boundaries (a ceiling on what a delegated admin can grant), session policies, and attribute-based access control (ABAC) using tags so policies scale without per-resource rewrites. Review continuously with last-accessed data to strip unused permissions.
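To make the ABAC idea concrete, here is a small sketch of a tag-matching identity policy. The tag key "team", the chosen EC2 actions, and the function name are assumptions for illustration. The policy-variable form compares the caller's principal tag with the resource's tag at evaluation time.

```python
import json


def abac_policy(actions, tag_key):
    """Allow the actions only when the resource tag equals the caller's principal tag."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": list(actions),
                "Resource": "*",
                "Condition": {
                    "StringEquals": {
                        # ${aws:PrincipalTag/...} is resolved by IAM at request time.
                        f"aws:ResourceTag/{tag_key}": f"${{aws:PrincipalTag/{tag_key}}}"
                    }
                },
            }
        ],
    }


# Illustrative: teams may start and stop only instances tagged with their own team.
print(json.dumps(abac_policy(["ec2:StartInstances", "ec2:StopInstances"], "team"), indent=2))
```

One policy then serves every team: onboarding a new team means tagging its principals and resources, not writing another policy.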

Concern	Anti-pattern	Well-Architected pattern
Workforce login	Shared IAM users + passwords	IAM Identity Center + IdP federation + SCIM
MFA	Optional, SMS-based	Mandatory, FIDO2/WebAuthn phishing-resistant
EC2 credentials	Access keys in user-data	Instance profile (IAM role)
CI/CD to AWS	Stored long-lived access key	OIDC/Roles Anywhere, short-lived STS creds
Permission scoping	`*:*` “to unblock the team”	Access Analyzer-generated least-privilege policy
Delegated admin	Full IAM access	Permissions boundary caps the grantable set

Artifacts and decisions. An identity architecture diagram (IdP -> Identity Center -> permission sets -> accounts); the permission-set catalog mapped to job functions; a secrets-rotation policy; an Access Analyzer finding-triage runbook; and a documented break-glass procedure for the rare case where federation is down.

Detection (SEC 4)

What it is. Detection is your ability to identify a misconfiguration, an unexpected change, or an active threat — and to investigate it. It maps to SEC 4 (“How do you detect and investigate security events?”).

Why it matters. Prevention will eventually fail or be bypassed; detection is what bounds the dwell time of an attacker and proves to auditors that you’d notice. The Framework’s traceability principle lives here.

How to do it well. Build on three layers — logging, analysis, and alerting — and centralize all three in your Security/Log Archive accounts.

Logging. Enable an organization trail in AWS CloudTrail capturing management events (and selectively data events) for every account into a central, locked S3 bucket. Turn on VPC Flow Logs, DNS query logging (Route 53 Resolver), and service-specific logs (ELB, CloudFront, WAF, S3 access). CloudTrail Lake gives you a managed, SQL-queryable event store if you don’t want to run your own.
Analysis & posture. AWS Config records resource configuration and evaluates it against rules and conformance packs (e.g., CIS, PCI DSS) — this is your continuous-compliance and change-detection engine. Amazon GuardDuty is the managed threat-detection service: it analyzes CloudTrail, VPC Flow Logs, DNS logs, and (via add-on protections) EKS audit logs, S3, RDS login activity, Lambda, and EBS malware, producing prioritized findings with no agents.
Aggregation & alerting. AWS Security Hub aggregates findings from GuardDuty, Inspector, Macie, IAM Access Analyzer, Config, and partners into a single normalized view (OCSF/ASFF), runs automated security standards (AWS FSBP, CIS, PCI, NIST 800-53), and calculates a security score. Route findings via Amazon EventBridge to ticketing, Slack, or automated remediation (SSM Automation / Lambda). For deep investigation, Amazon Detective builds behavior graphs to pivot from a finding to root cause; Amazon Security Lake normalizes logs into an OCSF data lake for your SIEM or Athena queries.
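The routing step can be sketched as an EventBridge event pattern plus a severity-to-destination mapping. The destination names and the thresholds chosen for routing are hypothetical. The pattern itself uses EventBridge numeric matching on the GuardDuty finding severity.

```python
import json

# EventBridge rule pattern: match GuardDuty findings with severity 7.0 or higher.
HIGH_SEVERITY_GUARDDUTY_PATTERN = {
    "source": ["aws.guardduty"],
    "detail-type": ["GuardDuty Finding"],
    "detail": {"severity": [{"numeric": [">=", 7]}]},
}


def route_for(severity):
    """Map a numeric GuardDuty severity to a (hypothetical) response channel."""
    if severity >= 7.0:
        return "page-on-call"   # high and critical: wake someone up
    if severity >= 4.0:
        return "ticket"         # medium: triage during working hours
    return "daily-digest"       # low: review in bulk


print(json.dumps(HIGH_SEVERITY_GUARDDUTY_PATTERN))
print(route_for(8.5), route_for(5.0), route_for(2.0))
```

Writing the mapping down as code gives each severity band an explicit owner and channel, so findings do not pile up unrouted.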

Capability	Primary service	What it answers
Who did what, when	CloudTrail (org trail)	API/management audit trail
Did config drift / violate policy	AWS Config + conformance packs	Continuous compliance & change detection
Is there active threat activity	Amazon GuardDuty	Anomalous/malicious behavior
One pane of glass + scoring	AWS Security Hub	Aggregated, normalized posture
Root-cause investigation	Amazon Detective	Entity behavior over time
Centralized log data lake	Amazon Security Lake	Long-term, queryable, OCSF logs

Artifacts and decisions. A logging architecture (sources -> central bucket/lake -> retention/lifecycle); the Config conformance packs you enforce; a GuardDuty/Security Hub delegated-administrator setup (run them org-wide from the Audit account); finding-severity-to-response mappings; and decisions on log retention (regulatory) versus cost (data events and Flow Logs can dominate the bill — sample or scope them).

Infrastructure protection (SEC 5, SEC 6)

What it is. Defending your networks (SEC 5) and your compute resources (SEC 6) through defense in depth. This is “security at all layers” applied to the plumbing.

Why it matters. A flat network or an unpatched host turns a single foothold into lateral movement across the estate. Layered controls ensure no single failure is catastrophic.

How to do it well: network. Design VPCs with private subnets for workloads and no direct internet path; use NAT gateways for egress and VPC endpoints (interface and gateway) so traffic to AWS services never traverses the internet. Segment with security groups (stateful, instance-level, default-deny) and network ACLs (stateless, subnet-level) as a second layer. Centralize egress and inspection through AWS Network Firewall (stateful IDS/IPS, domain filtering) and Route 53 Resolver DNS Firewall in an inspection VPC attached to a Transit Gateway. At the edge, AWS WAF protects HTTP(S) endpoints (ALB, CloudFront, API Gateway, AppSync) with managed and custom rules, and AWS Shield Advanced provides DDoS protection with cost protection and a response team. Govern WAF/Firewall rules org-wide with AWS Firewall Manager.
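One way to check the "no direct internet path" rule for private subnets is to look for a default route that targets an internet gateway. The sketch below assumes route tables shaped like the entries returned by the EC2 DescribeRouteTables API (a "Routes" list with "DestinationCidrBlock" and "GatewayId" keys). The function name and sample data are made up for illustration.

```python
DEFAULT_ROUTES = {"0.0.0.0/0", "::/0"}


def has_direct_internet_path(route_table):
    """True if any default route points straight at an internet gateway (igw-...)."""
    for route in route_table.get("Routes", []):
        dest = route.get("DestinationCidrBlock") or route.get("DestinationIpv6CidrBlock")
        if dest in DEFAULT_ROUTES and route.get("GatewayId", "").startswith("igw-"):
            return True
    return False


# Sample data: a public table routes to an IGW, a private one egresses via NAT.
public_rt = {"Routes": [{"DestinationCidrBlock": "0.0.0.0/0", "GatewayId": "igw-0abc"}]}
private_rt = {"Routes": [
    {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local"},
    {"DestinationCidrBlock": "0.0.0.0/0", "NatGatewayId": "nat-0def"},
]}

print(has_direct_internet_path(public_rt), has_direct_internet_path(private_rt))
```

A check like this could run in CI against planned infrastructure or periodically against live route tables. It covers only the route-table layer and does not replace security group or firewall review.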

How to do it well: compute. Reduce the attack surface and patch continuously. Harden AMIs with EC2 Image Builder, producing golden, scanned images on a schedule. Manage patching with AWS Systems Manager Patch Manager and reach hosts through SSM Session Manager, which needs the SSM Agent on the instance but no SSH bastions and no inbound port 22. Scan continuously with Amazon Inspector for CVEs and network reachability across EC2, ECR container images, and Lambda. Enforce IMDSv2 to mitigate SSRF-based credential theft. Prefer immutable infrastructure: rebuild and redeploy rather than patch in place. For serverless and containers, the principle is the same: minimal base images, scanned in the pipeline, with the smallest possible execution role.
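IMDSv2 helps against SSRF because it is session-oriented: a caller must first obtain a token with a PUT request carrying a TTL header, then present that token on every metadata read. A typical SSRF flaw can only coerce a simple GET, so it never gets the token. The sketch below uses only the standard library. The helper names are hypothetical, and fetch_instance_id works only when run on an EC2 instance.

```python
import urllib.request

IMDS_BASE = "http://169.254.169.254/latest"


def build_token_request(ttl_seconds=21600):
    """Step 1: PUT request that asks IMDSv2 for a session token."""
    return urllib.request.Request(
        f"{IMDS_BASE}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )


def build_metadata_request(path, token):
    """Step 2: GET request for a metadata path, presenting the session token."""
    return urllib.request.Request(
        f"{IMDS_BASE}/meta-data/{path}",
        headers={"X-aws-ec2-metadata-token": token},
    )


def fetch_instance_id(timeout=2):
    """Run both steps against the real endpoint (only reachable from an EC2 instance)."""
    with urllib.request.urlopen(build_token_request(), timeout=timeout) as resp:
        token = resp.read().decode()
    with urllib.request.urlopen(build_metadata_request("instance-id", token), timeout=timeout) as resp:
        return resp.read().decode()
```

When IMDSv2 is required on the instance, token-less IMDSv1 requests are rejected, which is the setting the enforcement guardrail aims for.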

Layer	Control	AWS service
Edge / L7	WAF rules, DDoS mitigation	AWS WAF, AWS Shield Advanced
Network perimeter	IDS/IPS, domain/egress filtering	AWS Network Firewall, DNS Firewall
Segmentation	Stateful + stateless rules	Security groups, network ACLs
Private connectivity	Keep traffic off the internet	VPC endpoints, PrivateLink
Host access	Shell via SSM Agent, no inbound SSH	SSM Session Manager
Patch & image hygiene	Golden images, patch baselines	EC2 Image Builder, Patch Manager
Vulnerability scanning	CVE + reachability	Amazon Inspector
Org-wide rule enforcement	Central WAF/SG policy	AWS Firewall Manager

Artifacts and decisions. Network topology and segmentation diagrams; the centralized-inspection design; AMI hardening pipeline definitions; a patch SLA per environment; and the WAF rule baseline. Key decision: centralized inspection VPC (clean, governable, adds latency/cost) versus distributed firewalls.

Data protection (SEC 7, SEC 8, SEC 9)

What it is. Classifying data (SEC 7), protecting it at rest (SEC 8) and in transit (SEC 9), and keeping people away from it. The unifying goal is to reduce the chance that the wrong eyes ever see sensitive data.

Why it matters. Data is the asset you actually protect. Encryption and classification limit the impact of every other control failing.

How to do it well — classify. You cannot protect uniformly what you don’t understand. Define data classification tiers (e.g., Public / Internal / Confidential / Restricted) and tag resources accordingly. Use Amazon Macie to discover and classify sensitive data (PII, credentials, financial data) in S3 automatically, feeding findings into Security Hub.

How to do it well — at rest. Encrypt everything, and make it the default. AWS KMS is the backbone: use customer-managed keys (CMKs) for sensitive data so you control the key policy, rotation, and grants; the key policy plus IAM is how you enforce separation of duties (the team that uses data isn’t the team that can delete its key). Enable default encryption on S3, EBS, RDS/Aurora, DynamoDB, and snapshots; enforce it preventively with SCPs and Config rules (deny unencrypted creation). For regulatory custody requirements, AWS CloudHSM gives you single-tenant FIPS 140-2/3 Level 3 HSMs. Protect against accidental or malicious deletion with S3 Object Lock, versioning, MFA Delete, and AWS Backup with vault lock (WORM). Reduce direct human access to data entirely with tokenization and query-only access patterns.
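The "deny unencrypted creation" guardrail can be sketched as an SCP covering two of the stores named above. The function name and Sid values are illustrative. The condition keys shown (ec2:Encrypted and rds:StorageEncrypted) should be verified against the current Service Authorization Reference before use, and other services need their own statements.

```python
import json


def deny_unencrypted_storage_scp():
    """SCP sketch: block creation of unencrypted EBS volumes and RDS instances."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyUnencryptedEbsVolumes",
                "Effect": "Deny",
                "Action": "ec2:CreateVolume",
                "Resource": "*",
                "Condition": {"Bool": {"ec2:Encrypted": "false"}},
            },
            {
                "Sid": "DenyUnencryptedRdsInstances",
                "Effect": "Deny",
                "Action": "rds:CreateDBInstance",
                "Resource": "*",
                "Condition": {"Bool": {"rds:StorageEncrypted": "false"}},
            },
        ],
    }


print(json.dumps(deny_unencrypted_storage_scp(), indent=2))
```

A preventive deny like this pairs with account-level default encryption and a detective Config rule, so a gap in one layer is caught by another.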

How to do it well — in transit. Enforce TLS everywhere. Issue and rotate certificates with AWS Certificate Manager (ACM); for internal PKI, use AWS Private CA. Terminate TLS at ALB/CloudFront/API Gateway with modern policies; enforce HTTPS-only via WAF or listener rules; and use VPC endpoints so service traffic stays on the AWS network. Set S3 bucket policies that deny aws:SecureTransport = false.
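The aws:SecureTransport rule mentioned above looks roughly like this as a bucket policy. The bucket name is a placeholder and the function name is hypothetical.

```python
import json


def require_tls_bucket_policy(bucket_name):
    """Bucket policy that denies every S3 request not made over TLS."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                # Cover both bucket-level and object-level operations.
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }


print(json.dumps(require_tls_bucket_policy("example-bucket"), indent=2))
```

Because it is an explicit deny, the statement overrides any allow elsewhere, so plain-HTTP requests fail regardless of the caller's permissions.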

Goal	Control	AWS service
Know what’s sensitive	Automated discovery/classification	Amazon Macie
Encrypt at rest, you hold keys	Customer-managed keys + key policy	AWS KMS (CMK)
Highest custody assurance	Single-tenant FIPS L3 HSM	AWS CloudHSM
Prevent unencrypted resources	Preventive guardrail	SCP / RCP / Config rule
Immutable, recoverable data	WORM backups, object lock	AWS Backup Vault Lock, S3 Object Lock
Encrypt in transit	TLS certs + enforcement	ACM, Private CA, secure-transport policy
Keep people away from data	Tokenization, query-only access	Macie + IAM + least privilege

Artifacts and decisions. A data classification policy and tagging standard; a KMS key hierarchy with documented key policies and rotation; an encryption-by-default enforcement set; a backup and retention (WORM) plan; and a TLS policy. Key decisions: KMS managed rotation versus manual; CloudHSM only where regulation demands it (it’s expensive and operationally heavy).

Incident response (SEC 10)

What it is. Your prepared, rehearsed ability to respond to a security event and recover with minimal impact — SEC 10 (“How do you anticipate, respond to, and recover from incidents?”).

Why it matters. Assume compromise will happen. The difference between a contained incident and a breach headline is preparation: pre-provisioned access, automation, and practice.

How to do it well. Prepare before the incident. Educate the team and define roles (incident commander, investigator, communications). Pre-stage a dedicated security/forensics account with the tooling and a clean environment for analysis. Pre-create least-privilege IR IAM roles so responders aren’t fumbling for access mid-incident, plus a break-glass path. Codify response so it’s fast and consistent: EventBridge rules trigger Systems Manager Automation runbooks or Lambda to isolate a compromised EC2 instance (swap to a quarantine security group, snapshot its EBS volume for forensics, deregister it), revoke sessions, or disable a leaked key. Use GuardDuty findings as the common trigger and Detective for scoping. Keep detailed, immutable logs (the central CloudTrail/Security Lake) as your forensic source of truth. Critically, run game days — simulate a credential leak or a public S3 bucket and execute the playbook end to end, then feed lessons back into automation.
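The instance-isolation runbook can be sketched as a function that takes an EC2 client. It assumes a boto3-style client (for example boto3.client("ec2")) is passed in, so the logic can also be exercised with a stub. The function name and description text are illustrative. Snapshotting before the security group swap is one possible ordering, not a prescribed one.

```python
def contain_instance(ec2, instance_id, quarantine_sg_id):
    """Snapshot attached EBS volumes for forensics, then move the instance to a quarantine SG.

    `ec2` is expected to behave like a boto3 EC2 client.
    Returns the IDs of the snapshots that were started.
    """
    desc = ec2.describe_instances(InstanceIds=[instance_id])
    instance = desc["Reservations"][0]["Instances"][0]

    snapshot_ids = []
    for mapping in instance.get("BlockDeviceMappings", []):
        volume_id = mapping["Ebs"]["VolumeId"]
        snap = ec2.create_snapshot(
            VolumeId=volume_id,
            Description=f"IR forensic snapshot of {instance_id}",
        )
        snapshot_ids.append(snap["SnapshotId"])

    # Replace all security groups with the quarantine group (no inbound/outbound rules).
    ec2.modify_instance_attribute(InstanceId=instance_id, Groups=[quarantine_sg_id])
    return snapshot_ids


# Usage on a real account (requires boto3 and IR-role credentials):
#   import boto3
#   contain_instance(boto3.client("ec2"), "i-0123456789abcdef0", "sg-0123456789abcdef0")
```

Instances with multiple network interfaces would need each interface's groups changed individually, and a production runbook would add tagging, error handling, and an audit record of who triggered it.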

IR phase	Preparation artifact	AWS mechanism
Prepare	Runbooks, roles, forensics account	IAM roles, dedicated account, SSM docs
Detect	Alerting on findings	GuardDuty, Security Hub -> EventBridge
Contain	Isolate instance, revoke creds	SSM Automation, Lambda, quarantine SG
Eradicate/Recover	Rebuild from clean images	Image Builder, AWS Backup, IaC redeploy
Investigate	Forensic capture & analysis	EBS snapshots, Detective, Security Lake
Learn	Post-incident review	Game-day findings -> automation updates

Artifacts and decisions. A documented IR plan mapping severity to actions; named playbooks for the top scenarios (leaked key, exposed bucket, compromised instance, IAM privilege escalation); pre-provisioned IR roles and forensics account; and a game-day cadence (quarterly). Decision: how much to automate containment outright versus require a human approval gate (over-aggressive auto-isolation can cause its own outage).

Application security (SEC 11)

What it is. Building security into how applications are designed, built, and shipped — shifting left so vulnerabilities are caught in the pipeline, not in production. SEC 11 is the newest area in the pillar.

Why it matters. The application layer is where business logic — and its flaws — live. Network and host controls won’t stop a SQL injection or a leaked secret in source. Security must be a property of the SDLC.

How to do it well. Embed controls across the pipeline and make secure the easy path. Train developers and provide paved-road templates with security built in. In the pipeline: scan source with SAST, dependencies with SCA (e.g., GitHub/CodeGuru, third-party scanners), and container images with Amazon Inspector in ECR; block on critical findings as a CI gate. Catch hardcoded secrets before commit with secret scanning, and store real secrets in Secrets Manager. Use Amazon CodeGuru Security (and increasingly Amazon Q Developer’s review capabilities) for ML-assisted code review of vulnerabilities and leaked credentials. Enforce that infrastructure is reviewed too — run IAM Access Analyzer policy validation and cfn-guard / Checkov against IaC in CI. At runtime, protect the deployed app with AWS WAF (managed rule groups for OWASP Top 10, bot control, rate limiting) and authenticate/authorize APIs with Amazon Cognito or a custom authorizer. Validate the whole thing periodically with penetration testing. Manage the build-to-deploy chain for integrity (signed artifacts, provenance) and centralize developer scanning at scale.

SDLC stage	Risk addressed	Tool/service
Code	Injection, insecure patterns	SAST, Amazon CodeGuru Security, Amazon Q
Dependencies	Vulnerable libraries	SCA, Amazon Inspector (Lambda functions)
Secrets	Hardcoded credentials	Pre-commit secret scanning + Secrets Manager
Containers	Vulnerable images	Amazon Inspector (ECR)
Infrastructure as code	Misconfig, over-permission	Access Analyzer validation, cfn-guard/Checkov
Runtime (L7)	OWASP Top 10, bots, scraping	AWS WAF managed rules + Shield
AuthN/Z	Broken access control	Amazon Cognito, custom authorizers

Artifacts and decisions. A secure SDLC standard; the CI gate policy (what severity blocks a merge); a paved-road service template; a dependency/SBOM policy; a pen-test schedule; and a WAF rule baseline for every public endpoint. Decision: how hard to gate (blocking criticals is non-negotiable; blocking every medium will get the gate disabled — tune to keep developer trust).
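The CI gate policy can be expressed in a few lines. In this sketch the finding shape (dicts with "severity" and "type" keys), the type label "leaked-secret", and the function name are all hypothetical. Real scanners emit their own formats, which would need normalizing first.

```python
BLOCKING_SEVERITIES = {"CRITICAL"}       # tune over time; start narrow to keep trust
ALWAYS_BLOCK_TYPES = {"leaked-secret"}   # known-bad findings block at any severity


def gate(findings):
    """Return (passed, blockers) for a list of normalized scanner findings."""
    blockers = [
        f for f in findings
        if f["severity"].upper() in BLOCKING_SEVERITIES
        or f.get("type") in ALWAYS_BLOCK_TYPES
    ]
    return (len(blockers) == 0, blockers)


sample = [
    {"id": "F1", "severity": "medium", "type": "sast"},
    {"id": "F2", "severity": "low", "type": "leaked-secret"},
    {"id": "F3", "severity": "critical", "type": "sca"},
]
passed, blockers = gate(sample)
print(passed, [f["id"] for f in blockers])
```

Keeping the blocking set as reviewed configuration makes tightening the gate a visible, deliberate change rather than a surprise for developers.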

Real-world enterprise scenario

FinPeak Lending is a fictional digital consumer-lending company (~600 engineers, regulated under PCI DSS and regional banking rules) migrating from a sprawl of 40-plus unmanaged AWS accounts into a governed organization ahead of an audit. Their CISO mandates Well-Architected Security alignment in two quarters. Here is what they do for each area.

Security foundations. They deploy AWS Control Tower, restructure into a Security OU (Log Archive + Audit accounts), Workloads OU (prod/stage/dev per the four product lines), Infrastructure OU (shared networking, CI/CD), and Sandbox OU. SCPs deny disabling CloudTrail/GuardDuty/Config, restrict to two Regions (ap-south-1, eu-west-1), and deny root access except for break-glass. An RCP enforces aws:PrincipalOrgID on all resource policies. Accounts are vended via Account Factory for Terraform — a baseline now takes 25 minutes instead of the prior two-day manual checklist.

Identity and access management. They federate Entra ID into IAM Identity Center with SCIM, retire 280 standing IAM users, and define 22 permission sets mapped to job functions. FIDO2 MFA is mandatory. GitHub Actions deploys via OIDC role assumption — they delete 60-plus long-lived CI keys. Access Analyzer runs in CI and flags one external-sharing finding on a marketing bucket within the first week.

Detection. An organization CloudTrail trail lands in the locked Log Archive bucket. GuardDuty and Security Hub run org-wide from the Audit account (delegated admin), with the AWS FSBP and PCI DSS standards enabled; their initial Security Hub score is 71%. Config conformance packs enforce PCI controls. EventBridge routes high/critical findings to PagerDuty and Slack; Security Lake centralizes logs for their Splunk SIEM.

Infrastructure protection. Workloads move into private subnets behind a centralized inspection VPC (Transit Gateway + AWS Network Firewall + DNS Firewall). Public endpoints get AWS WAF (managed OWASP rules) governed by Firewall Manager, plus Shield Advanced on the customer-facing CloudFront distributions. Bastions are gone; all access is via SSM Session Manager. Inspector scans EC2 and ECR continuously; IMDSv2 is enforced fleet-wide with an SCP, backed by a Config rule that flags noncompliant instances.

Data protection. Macie scans S3 and finds unencrypted PII in two legacy buckets on day three. They define four classification tiers, enable default KMS CMK encryption on S3/EBS/RDS/DynamoDB, and add an SCP denying creation of unencrypted resources. Cardholder data sits in a dedicated account with a CloudHSM-backed key and tokenization, so application teams query tokens, not PANs. AWS Backup with vault lock provides WORM recovery; ACM + Private CA enforce TLS end to end.

Incident response. A dedicated forensics account is pre-staged with IR roles and SSM Automation runbooks: a GuardDuty “compromised instance” finding auto-snapshots the EBS volume, moves the instance to a quarantine SG, and pages the on-call IC. They run a quarterly game day; the first one (simulated leaked access key) exposes a missing session-revocation step, which they then automate.

Application security. Every pipeline gains SAST, SCA, secret scanning (pre-commit), CodeGuru Security review, and ECR image scanning; criticals block the merge. IaC is validated with cfn-guard and Access Analyzer in CI. A paved-road service template ships with a least-privilege role, WAF, and Cognito wired in.

Measurable outcome. Within two quarters: the Security Hub score rises from 71% to 94%; standing IAM users drop from 280 to 0 and long-lived CI keys from 60+ to 0; mean time to detect (via GuardDuty -> PagerDuty) falls from days to under 15 minutes; S3, EBS, and RDS storage is 100% encrypted with customer-managed keys; account provisioning drops from ~2 days to 25 minutes; and FinPeak passes its PCI DSS assessment with zero critical findings.

Common pitfalls

Treating account separation as optional. Putting prod and dev in one account to “save effort” collapses the blast radius. Fix: adopt the multi-account landing zone early — it is far cheaper to start segmented than to untangle later.
Long-lived IAM access keys everywhere. Keys in CI, AMIs, and laptops are a leading breach vector. Fix: federate humans through Identity Center, give machines roles, and use OIDC/Roles Anywhere for CI, driving standing keys to zero.
Enabling GuardDuty/Security Hub but routing findings nowhere. Detection without alerting and ownership is theatre. Fix: wire EventBridge to PagerDuty/Slack/ticketing with severity-based routing and a triage runbook.
Encryption “where convenient” instead of by default. Optional encryption guarantees gaps. Fix: enable default encryption on every store and enforce it preventively with SCPs/Config so unencrypted resources cannot be created.
Writing an IR plan but never rehearsing it. A plan no one has executed fails under pressure — missing access, stale runbooks. Fix: run quarterly game days and feed every gap back into automation.
A security gate so strict developers route around it. Blocking every low/medium finding erodes trust and gets the gate disabled. Fix: block only criticals (and known-bad like leaked secrets), make the paved road the easy path, and tighten over time.

What’s next

Part 3 of the AWS Well-Architected Framework series tackles the Reliability pillar — designing for resilience, recovery, and graceful degradation across foundations, workload architecture, change management, and failure management.

Written by Vinod
