AWS Well-Architected: Security — Foundations, IAM, Detection, Infrastructure & Data Protection, Incident Response, and AppSec

Where this fits

The Security pillar is the second of the six pillars in the AWS Well-Architected Framework (after Operational Excellence, and before Reliability, Performance Efficiency, Cost Optimization, and Sustainability). Its design principles are familiar but worth restating because every decision below ladders up to them: implement a strong identity foundation, maintain traceability, apply security at all layers, automate security best practices, protect data in transit and at rest, keep people away from data, and prepare for security events. The pillar decomposes into seven areas — security foundations, identity and access management, detection, infrastructure protection, data protection, incident response, and application security — and the Framework expresses its expectations as numbered best-practice questions (SEC 1 through SEC 11). This article walks each area as you would actually implement it in a multi-account AWS organization, naming the concrete services, artifacts, and trade-offs.

Security foundations (SEC 1)

What it is. Foundations is the operating model that everything else sits on: how you separate workloads across AWS accounts, how you centrally govern those accounts, how you stay aware of threats and compliance obligations, and how you keep your security guardrails as code rather than as tribal knowledge. It maps to SEC 1 (“How do you securely operate your workload?”).

Why it matters. A single AWS account is a single blast radius. The moment you have production data, a CI/CD pipeline, and a sandbox in the same account, a leaked credential or a misconfigured IAM policy threatens all three. Account separation is the cheapest, strongest isolation boundary AWS gives you, and the foundations area is where you decide how to use it.

How to do it well. Use AWS Organizations to create an organizational unit (OU) structure, then govern it. The reference pattern is AWS Control Tower (or a custom-built landing zone), which stands up a multi-account environment with a management account plus a dedicated Log Archive account and a Security Tooling (Audit) account in a Security OU. Apply service control policies (SCPs) at the OU level as preventive guardrails. SCPs set the maximum permissions available in member accounts and cannot be overridden by an account-local administrator (they do not apply to the management account itself). Classic SCPs deny disabling CloudTrail, GuardDuty, or Config; deny leaving the organization; restrict activity to approved Regions; and deny use of the account root user except for the handful of tasks that require it. Newer resource control policies (RCPs) complement SCPs by setting an upper bound on the permissions that resource-based policies can grant (for example, enforcing aws:PrincipalOrgID so an S3 bucket can never be shared outside the org). Declarative policies enforce a desired configuration for a service (such as blocking public AMIs), and that configuration persists even as the service adds new features. Maintain everything as code (Terraform, CloudFormation, or AWS CDK) and stamp new accounts through Account Factory for Terraform (AFT) or Control Tower Account Factory so the baseline is identical every time.
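
To make the two guardrail types concrete, here is a minimal sketch that builds a Region-restriction SCP and an org-perimeter RCP as Python dictionaries ready to serialize to JSON. The organization ID `o-exampleorgid`, the Region list, and the exempted global-service actions are placeholder assumptions. AWS's published Region-deny example exempts a much longer list of global services, so treat this as an illustration of the shape rather than a policy to paste in.

```python
import json

# Placeholder values (assumptions): substitute your own org ID and Regions.
ORG_ID = "o-exampleorgid"
ALLOWED_REGIONS = ["ap-south-1", "eu-west-1"]

# Global services must be exempted or the Region deny breaks IAM,
# Organizations, and similar control-plane calls.
# Illustrative subset only, not the full list AWS documents.
GLOBAL_SERVICE_ACTIONS = [
    "iam:*",
    "organizations:*",
    "sts:*",
    "route53:*",
    "cloudfront:*",
    "support:*",
]


def region_restriction_scp(regions):
    """SCP that denies any non-exempted action outside the approved Regions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyOutsideApprovedRegions",
                "Effect": "Deny",
                "NotAction": GLOBAL_SERVICE_ACTIONS,
                "Resource": "*",
                "Condition": {"StringNotEquals": {"aws:RequestedRegion": regions}},
            }
        ],
    }


def org_perimeter_rcp(org_id):
    """RCP that denies access to S3 resources by principals outside the org.

    AWS service principals are exempted so service-to-service access keeps working.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "EnforceOrgIdentities",
                "Effect": "Deny",
                "Principal": "*",  # RCPs require Principal "*"
                "Action": "s3:*",
                "Resource": "*",
                "Condition": {
                    "StringNotEqualsIfExists": {"aws:PrincipalOrgID": org_id},
                    "BoolIfExists": {"aws:PrincipalIsAWSService": "false"},
                },
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(region_restriction_scp(ALLOWED_REGIONS), indent=2))
    print(json.dumps(org_perimeter_rcp(ORG_ID), indent=2))
```

Generating policies from code like this keeps them reviewable in the same repository as the rest of the landing zone, which is the "guardrails as code" point above.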

Foundational control | AWS service | Scope | What it prevents/provides
Account/OU structure | AWS Organizations, Control Tower | Org-wide | Blast-radius isolation, central billing
Preventive guardrails | SCPs, RCPs, declarative policies | OU / account | Bound max permissions, enforce Region/config
Config baseline as code | AFT, CloudFormation StackSets | Per account | Repeatable, drift-free account provisioning
Central identity broker | IAM Identity Center | Org-wide | Single sign-on, no long-lived IAM users
Threat & advisory intel | AWS Security Hub, Trusted Advisor, AWS Health | Org-wide | Staying aware of vulnerabilities and posture

Artifacts and decisions. A landing-zone design document; the OU hierarchy diagram; the SCP/RCP policy set in version control; an account vending process; a Region-restriction decision; and a documented account baseline (centralized logging, mandatory CloudTrail, default encryption, tagging policy). The key decision is granularity: too few accounts and you lose isolation; too many and IAM Identity Center permission-set sprawl becomes its own problem. A common compromise is one account per workload per environment (prod/stage/dev), grouped into Workload, Sandbox, Infrastructure, and Security OUs.

Identity and access management (SEC 2, SEC 3)

What it is. IAM is how you manage human identities (workforce, partners) and machine identities (workloads, services), and how you grant each the least privilege needed. SEC 2 covers managing identities; SEC 3 covers managing permissions for those identities.

Why it matters. A large share of cloud breaches are identity breaches: a stolen access key, an over-permissive role, a forgotten admin user with no MFA. Getting identity right closes the most-exploited door.

How to do it well — humans. Stop creating IAM users. Federate workforce identity through IAM Identity Center (formerly AWS SSO), connected to your IdP (for example Microsoft Entra ID or Okta) or to its built-in identity store, ideally with users and groups provisioned from the IdP via SCIM. Users are assigned permission sets (which Identity Center provisions as IAM roles in the target accounts) and receive short-lived credentials, so there are no long-term keys to leak. Enforce MFA universally; for the highest assurance, require phishing-resistant FIDO2/WebAuthn. Lock down the root user of every account: hardware MFA, no access keys, and, for member accounts in an organization, centralized root access management so you don't store member-account root credentials at all.

How to do it well — machines. Never embed access keys in code or AMIs. EC2 workloads use IAM roles via instance profiles; EKS pods use IAM Roles for Service Accounts (IRSA) or the newer EKS Pod Identity; Lambda uses an execution role. For workloads outside AWS or in CI/CD, use IAM Roles Anywhere (X.509-based) or OIDC federation (e.g., GitHub Actions assuming a role with no stored secret). For application secrets and credentials that must exist, use AWS Secrets Manager with automatic rotation, not environment variables.
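
As one example of keyless CI/CD, the sketch below builds the trust policy for an IAM role that a single GitHub repository and branch can assume through OIDC federation. The account ID and repository name are hypothetical, and it assumes the GitHub OIDC identity provider (`token.actions.githubusercontent.com`) has already been registered in IAM.

```python
# Hypothetical account and repository; replace with your own.
ACCOUNT_ID = "111122223333"
REPO = "example-org/example-repo"
GITHUB_OIDC_HOST = "token.actions.githubusercontent.com"


def github_oidc_trust_policy(account_id, repo, branch="main"):
    """Trust policy letting one GitHub repo/branch assume a role via OIDC.

    No access key is stored in GitHub: the workflow exchanges its OIDC token
    for short-lived STS credentials through AssumeRoleWithWebIdentity.
    """
    provider_arn = f"arn:aws:iam::{account_id}:oidc-provider/{GITHUB_OIDC_HOST}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Federated": provider_arn},
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        # The token must be minted for STS...
                        f"{GITHUB_OIDC_HOST}:aud": "sts.amazonaws.com",
                        # ...and only for this repository and branch.
                        f"{GITHUB_OIDC_HOST}:sub": f"repo:{repo}:ref:refs/heads/{branch}",
                    }
                },
            }
        ],
    }
```

Pinning the `sub` claim to a repository and branch is the important design choice: without it, any GitHub workflow that can obtain a token for the `sts.amazonaws.com` audience could assume the role. On the workflow side, the job typically needs `permissions: id-token: write` and uses the `aws-actions/configure-aws-credentials` action with `role-to-assume`.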

How to do it well — permissions. Practice least privilege as a lifecycle, not a one-time grant. Start from AWS managed policies, then tighten. Use IAM Access Analyzer to: find resources shared externally, generate fine-grained policies from CloudTrail access history, and validate policies against best practices in CI. Constrain with permissions boundaries (a ceiling on what a delegated admin can grant), session policies, and attribute-based access control (ABAC) using tags so policies scale without per-resource rewrites. Review continuously with last-accessed data to strip unused permissions.
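
A minimal ABAC sketch, assuming a hypothetical `project` tag applied to both principals and resources: one statement lets callers start and stop only instances whose tag matches their own principal tag, and a second denies changing that tag so nobody can pull a resource into their project by retagging it. Because the match is expressed with the `${aws:PrincipalTag/...}` policy variable, the same policy works for every project without per-resource rewrites.

```python
# Hypothetical tag key used for ABAC; pick one and apply it consistently.
TAG_KEY = "project"


def abac_policy(tag_key=TAG_KEY):
    """Identity policy: start/stop only instances whose tag matches the
    caller's principal tag, plus a guard against changing that tag."""
    # Policy variable resolved by IAM at request time (needs Version 2012-10-17).
    principal_tag = "${aws:PrincipalTag/" + tag_key + "}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "StartStopOwnProjectInstances",
                "Effect": "Allow",
                "Action": ["ec2:StartInstances", "ec2:StopInstances"],
                "Resource": "arn:aws:ec2:*:*:instance/*",
                "Condition": {"StringEquals": {f"aws:ResourceTag/{tag_key}": principal_tag}},
            },
            {
                # Without this, a user could move a resource into their project by retagging it.
                "Sid": "DenyChangingTheAbacTag",
                "Effect": "Deny",
                "Action": ["ec2:CreateTags", "ec2:DeleteTags"],
                "Resource": "*",
                "Condition": {"ForAnyValue:StringEquals": {"aws:TagKeys": [tag_key]}},
            },
        ],
    }
```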

Concern | Anti-pattern | Well-Architected pattern
Workforce login | Shared IAM users + passwords | IAM Identity Center + IdP federation + SCIM
MFA | Optional, SMS-based | Mandatory, FIDO2/WebAuthn phishing-resistant
EC2 credentials | Access keys in user-data | Instance profile (IAM role)
CI/CD to AWS | Stored long-lived access key | OIDC/Roles Anywhere, short-lived STS creds
Permission scoping | *:* “to unblock the team” | Access Analyzer-generated least-privilege policy
Delegated admin | Full IAM access | Permissions boundary caps the grantable set

Artifacts and decisions. An identity architecture diagram (IdP -> Identity Center -> permission sets -> accounts); the permission-set catalog mapped to job functions; a secrets-rotation policy; an Access Analyzer finding-triage runbook; and a documented break-glass procedure for the rare case where federation is down.

Detection (SEC 4)

What it is. Detection is your ability to identify a misconfiguration, an unexpected change, or an active threat — and to investigate it. It maps to SEC 4 (“How do you detect and investigate security events?”).

Why it matters. Prevention will eventually fail or be bypassed; detection is what bounds the dwell time of an attacker and proves to auditors that you’d notice. The Framework’s traceability principle lives here.

How to do it well. Build on three layers — logging, analysis, and alerting — and centralize all three in your Security/Log Archive accounts.

Capability | Primary service | What it answers
Who did what, when | CloudTrail (org trail) | API/management audit trail
Did config drift / violate policy | AWS Config + conformance packs | Continuous compliance & change detection
Is there active threat activity | Amazon GuardDuty | Anomalous/malicious behavior
One pane of glass + scoring | AWS Security Hub | Aggregated, normalized posture
Root-cause investigation | Amazon Detective | Entity behavior over time
Centralized log data lake | Amazon Security Lake | Long-term, queryable, OCSF logs

Artifacts and decisions. A logging architecture (sources -> central bucket/lake -> retention/lifecycle); the Config conformance packs you enforce; a GuardDuty/Security Hub delegated-administrator setup (run them org-wide from the Audit account); finding-severity-to-response mappings; and decisions on log retention (regulatory) versus cost. CloudTrail data events and VPC Flow Logs can dominate the bill, so scope them: use advanced event selectors to record data events only for the buckets and functions that matter, and enable flow logs only on the VPCs, subnets, or network interfaces you actually need to analyze.
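
A finding-severity-to-response mapping can start as small as the sketch below: an EventBridge event pattern that matches GuardDuty findings with numeric severity 7 or higher, plus a lookup from severity to a response tier. The severity bands follow GuardDuty's documented Low/Medium/High/Critical ranges as I understand them (check the current GuardDuty documentation); the response tiers are hypothetical placeholders for your own IR plan.

```python
# EventBridge event pattern: GuardDuty findings with severity >= 7 (High or Critical).
HIGH_SEVERITY_GUARDDUTY_PATTERN = {
    "source": ["aws.guardduty"],
    "detail-type": ["GuardDuty Finding"],
    "detail": {"severity": [{"numeric": [">=", 7]}]},
}

# (lower bound, GuardDuty label, hypothetical response tier), highest first.
SEVERITY_BANDS = [
    (9.0, "Critical", "page on-call + auto-contain"),
    (7.0, "High", "page on-call"),
    (4.0, "Medium", "ticket, next business day"),
    (1.0, "Low", "weekly review"),
]


def response_for(severity):
    """Map a GuardDuty numeric severity to (label, response tier)."""
    for floor, label, response in SEVERITY_BANDS:
        if severity >= floor:
            return label, response
    raise ValueError(f"severity out of range: {severity}")
```

Keeping the threshold in the event pattern means low-severity findings never invoke the alerting target at all, while the lookup table documents what each band is supposed to trigger.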

Infrastructure protection (SEC 5, SEC 6)

What it is. Defending your networks (SEC 5) and your compute resources (SEC 6) through defense in depth. This is “security at all layers” applied to the plumbing.

Why it matters. A flat network or an unpatched host turns a single foothold into lateral movement across the estate. Layered controls ensure no single failure is catastrophic.

How to do it well — network. Design VPCs with private subnets for workloads and no direct internet path; use NAT gateways for egress and VPC endpoints (interface and gateway types) so traffic to AWS services stays on the AWS network. Segment with security groups (stateful, attached to network interfaces, inbound traffic denied unless a rule allows it) and network ACLs (stateless, subnet-level) as a second layer. Centralize egress and inspection through AWS Network Firewall (stateful IDS/IPS, domain filtering) and Route 53 Resolver DNS Firewall in an inspection VPC attached to a Transit Gateway. At the edge, AWS WAF protects HTTP(S) endpoints (ALB, CloudFront, API Gateway, AppSync) with managed and custom rules, and AWS Shield Advanced adds enhanced DDoS protection, DDoS cost protection, and access to the Shield Response Team. Govern WAF and Network Firewall rules org-wide with AWS Firewall Manager.
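
Security-group hygiene is easy to check automatically. The sketch below flags rules that expose SSH (22) or RDP (3389) to `0.0.0.0/0` or `::/0`. It works on a dict shaped like one entry of the EC2 `DescribeSecurityGroups` response, so it runs without AWS access. Treating only those two ports as "sensitive" is an assumption for illustration.

```python
# Ports treated as "never open to the internet" in this sketch (SSH, RDP).
SENSITIVE_PORTS = (22, 3389)
WORLD_CIDRS = {"0.0.0.0/0", "::/0"}


def world_open_sensitive_rules(security_group):
    """Return sorted (port, cidr) pairs where a sensitive TCP port is open to the world.

    `security_group` mirrors one entry of the EC2 DescribeSecurityGroups
    response: a dict with an "IpPermissions" list.
    """
    findings = set()
    for perm in security_group.get("IpPermissions", []):
        cidrs = [r["CidrIp"] for r in perm.get("IpRanges", [])]
        cidrs += [r["CidrIpv6"] for r in perm.get("Ipv6Ranges", [])]
        open_cidrs = [c for c in cidrs if c in WORLD_CIDRS]
        if not open_cidrs:
            continue
        protocol = str(perm.get("IpProtocol"))
        if protocol == "-1":
            # "-1" means all protocols and all ports.
            exposed = SENSITIVE_PORTS
        elif protocol in ("tcp", "6"):
            low, high = perm["FromPort"], perm["ToPort"]
            exposed = [p for p in SENSITIVE_PORTS if low <= p <= high]
        else:
            continue  # UDP/ICMP rules don't expose SSH or RDP
        findings.update((port, cidr) for port in exposed for cidr in open_cidrs)
    return sorted(findings)
```

In practice, AWS Config managed rules and Security Hub controls cover this check org-wide; a script like this is mainly useful as a pre-deployment test against IaC output.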

How to do it well — compute. Reduce the attack surface and patch continuously. Harden AMIs with EC2 Image Builder, producing golden, scanned images on a schedule. Manage patching with AWS Systems Manager Patch Manager, and reach hosts through SSM Session Manager, where the SSM Agent on the instance brokers an IAM-authorized shell, so you need no SSH bastions and no inbound port 22. Scan continuously with Amazon Inspector for software vulnerabilities (CVEs) across EC2, ECR container images, and Lambda, plus network reachability for EC2. Enforce IMDSv2 to mitigate SSRF-based credential theft. Prefer immutable infrastructure: rebuild and redeploy rather than patch in place. For serverless and containers the principle is the same: minimal base images, scanned in the pipeline, with the smallest possible execution role.
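
One common way to enforce IMDSv2 org-wide is an SCP along the lines sketched below. The `ec2:MetadataHttpTokens`, `ec2:RoleDelivery`, and `aws:PrincipalARN` condition keys are the ones AWS uses in its own IMDSv2 guidance. The `ec2-imds-admins` role name is a hypothetical exception so someone can still migrate existing instances. Roll out the third statement carefully: it rejects API calls made with role credentials fetched over IMDSv1, which will break any workload still using v1.

```python
# Hypothetical exception: only this role may change metadata options
# (for example, to move existing instances from IMDSv1 to IMDSv2).
IMDS_ADMIN_ROLE_PATTERN = "arn:aws:iam::*:role/ec2-imds-admins"


def require_imdsv2_scp(admin_role_pattern=IMDS_ADMIN_ROLE_PATTERN):
    """SCP enforcing IMDSv2 at launch, restricting later changes, and
    refusing role credentials that were delivered over IMDSv1."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # New instances must launch with HttpTokens=required (IMDSv2 only).
                "Sid": "RequireImdsV2OnLaunch",
                "Effect": "Deny",
                "Action": "ec2:RunInstances",
                "Resource": "arn:aws:ec2:*:*:instance/*",
                "Condition": {"StringNotEquals": {"ec2:MetadataHttpTokens": "required"}},
            },
            {
                # Only the admin role may change metadata options after launch.
                "Sid": "RestrictMetadataOptionChanges",
                "Effect": "Deny",
                "Action": "ec2:ModifyInstanceMetadataOptions",
                "Resource": "*",
                "Condition": {"StringNotLike": {"aws:PrincipalARN": admin_role_pattern}},
            },
            {
                # Deny calls signed with instance-role credentials obtained via IMDSv1.
                "Sid": "DenyImdsV1RoleCredentials",
                "Effect": "Deny",
                "Action": "*",
                "Resource": "*",
                "Condition": {"NumericLessThan": {"ec2:RoleDelivery": "2.0"}},
            },
        ],
    }
```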

Layer | Control | AWS service
Edge / L7 | WAF rules, DDoS mitigation | AWS WAF, AWS Shield Advanced
Network perimeter | IDS/IPS, domain/egress filtering | AWS Network Firewall, DNS Firewall
Segmentation | Stateful + stateless rules | Security groups, network ACLs
Private connectivity | Keep traffic off the internet | VPC endpoints, PrivateLink
Host access | Agent-brokered shell, no inbound SSH | SSM Session Manager
Patch & image hygiene | Golden images, patch baselines | EC2 Image Builder, Patch Manager
Vulnerability scanning | CVE + reachability | Amazon Inspector
Org-wide rule enforcement | Central WAF/SG policy | AWS Firewall Manager

Artifacts and decisions. Network topology and segmentation diagrams; the centralized-inspection design; AMI hardening pipeline definitions; a patch SLA per environment; and the WAF rule baseline. Key decision: centralized inspection VPC (clean, governable, adds latency/cost) versus distributed firewalls.

Data protection (SEC 7, SEC 8, SEC 9)

What it is. Classifying data (SEC 7), protecting it at rest (SEC 8) and in transit (SEC 9), and keeping people away from it. The goal that unifies the pillar is to reduce the chance that the wrong eyes ever see sensitive data.

Why it matters. Data is the asset you actually protect. Encryption and classification limit the impact of every other control failing.

How to do it well — classify. You can't protect data appropriately until you know what it is and where it lives. Define data classification tiers (e.g., Public / Internal / Confidential / Restricted) and tag resources accordingly. Use Amazon Macie to discover and classify sensitive data (PII, credentials, financial data) in S3 automatically, feeding findings into Security Hub.
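
A tagging standard is only useful if it is checked. Below is a minimal sketch of such a check, assuming a hypothetical `DataClassification` tag key whose value must be one of the four example tiers; the tag list uses the `{"Key": ..., "Value": ...}` shape most AWS tagging APIs return.

```python
# Hypothetical tagging standard: every resource carries a
# "DataClassification" tag whose value is one of the four tiers.
TAG_KEY = "DataClassification"
TIERS = ("Public", "Internal", "Confidential", "Restricted")


def classification_problems(resource_arn, tags):
    """Return human-readable problems with a resource's classification tag.

    `tags` is a list of {"Key": ..., "Value": ...} dicts. Tag keys are unique
    per resource, so at most one matching value is expected.
    """
    values = [t["Value"] for t in tags if t["Key"] == TAG_KEY]
    if not values:
        return [f"{resource_arn}: missing {TAG_KEY} tag"]
    if values[0] not in TIERS:
        return [f"{resource_arn}: unknown tier {values[0]!r}"]
    return []
```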

How to do it well — at rest. Encrypt everything, and make it the default. AWS KMS is the backbone: use customer managed keys (CMKs) for sensitive data so you control the key policy, rotation, and grants. The key policy plus IAM is how you enforce separation of duties (the team that uses the data isn't the team that can delete its key). Enable default encryption on S3, EBS, RDS/Aurora, DynamoDB, and snapshots; block unencrypted resource creation preventively with SCPs and detect stragglers with Config rules. For regulatory custody requirements, AWS CloudHSM gives you single-tenant HSMs validated at FIPS 140-2 or FIPS 140-3 Level 3, depending on the HSM type. Protect against accidental or malicious deletion with S3 Object Lock, versioning, MFA Delete, and AWS Backup with Vault Lock (WORM). Reduce direct human access to data further with tokenization and query-only access patterns.
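
Separation of duties in a key policy can be sketched as two statements: an administrator role that can manage and schedule deletion of the key but cannot encrypt or decrypt, and an application role that can use the key but cannot manage it. The role ARNs are hypothetical. AWS's default key policy also grants the account root principal, which delegates access control to IAM; that statement is left out here to keep the split visible, and leaving it out in a real policy means only the listed roles can ever manage the key. The `grants` helper is a rough wildcard check for testing, not a full IAM policy evaluator.

```python
from fnmatch import fnmatchcase

ACCOUNT_ID = "111122223333"                                       # hypothetical
KEY_ADMIN_ROLE = f"arn:aws:iam::{ACCOUNT_ID}:role/kms-key-admins"  # hypothetical
DATA_APP_ROLE = f"arn:aws:iam::{ACCOUNT_ID}:role/payments-app"     # hypothetical

# Management actions: no Encrypt/Decrypt/GenerateDataKey among them.
ADMIN_ACTIONS = [
    "kms:Describe*", "kms:List*", "kms:Get*", "kms:Put*", "kms:Update*",
    "kms:Enable*", "kms:Disable*", "kms:TagResource", "kms:UntagResource",
    "kms:ScheduleKeyDeletion", "kms:CancelKeyDeletion",
]
# Cryptographic use only: no deletion or policy changes.
USAGE_ACTIONS = [
    "kms:Encrypt", "kms:Decrypt", "kms:ReEncrypt*",
    "kms:GenerateDataKey*", "kms:DescribeKey",
]


def separated_key_policy():
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Sid": "KeyAdministration", "Effect": "Allow",
             "Principal": {"AWS": KEY_ADMIN_ROLE},
             "Action": ADMIN_ACTIONS, "Resource": "*"},
            {"Sid": "KeyUsage", "Effect": "Allow",
             "Principal": {"AWS": DATA_APP_ROLE},
             "Action": USAGE_ACTIONS, "Resource": "*"},
        ],
    }


def grants(statement, action):
    """True if an Allow statement's Action patterns match `action`.

    IAM action names are case-insensitive, so both sides are lowercased.
    """
    actions = statement["Action"]
    if isinstance(actions, str):
        actions = [actions]
    return statement["Effect"] == "Allow" and any(
        fnmatchcase(action.lower(), pattern.lower()) for pattern in actions
    )
```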

How to do it well — in transit. Enforce TLS everywhere. Issue and renew certificates with AWS Certificate Manager (ACM); for internal PKI, use AWS Private CA. Terminate TLS at ALB, CloudFront, or API Gateway with modern security policies; enforce HTTPS-only with ALB HTTP-to-HTTPS redirect rules or a CloudFront viewer protocol policy; and use VPC endpoints so service traffic stays on the AWS network. Set S3 bucket policies that deny any request where aws:SecureTransport is false.
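
The secure-transport bucket policy looks roughly like this. The bucket name is a placeholder, and the second statement, which also rejects TLS versions older than 1.2 via the `s3:TlsVersion` condition key, is optional hardening beyond what the paragraph above requires.

```python
def secure_transport_bucket_policy(bucket_name, min_tls=1.2):
    """Bucket policy denying non-TLS requests and (optionally) old TLS versions."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    resources = [bucket_arn, f"{bucket_arn}/*"]  # bucket-level and object-level actions
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Any request not made over TLS is denied, for every principal.
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": resources,
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            },
            {
                # Optional hardening: refuse TLS versions below min_tls.
                "Sid": "DenyOldTls",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": resources,
                "Condition": {"NumericLessThan": {"s3:TlsVersion": min_tls}},
            },
        ],
    }
```

Serialized with `json.dumps`, the result can be applied with `aws s3api put-bucket-policy --bucket <name> --policy file://policy.json`. Listing both the bucket ARN and the `/*` object ARN matters, because bucket-level and object-level actions are authorized against different resources.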

Goal | Control | AWS service
Know what’s sensitive | Automated discovery/classification | Amazon Macie
Encrypt at rest, you hold keys | Customer-managed keys + key policy | AWS KMS (CMK)
Highest custody assurance | Single-tenant FIPS Level 3 HSM | AWS CloudHSM
Prevent unencrypted resources | Preventive guardrail + detective check | SCP / RCP + Config rule
Immutable, recoverable data | WORM backups, object lock | AWS Backup Vault Lock, S3 Object Lock
Encrypt in transit | TLS certs + enforcement | ACM, Private CA, secure-transport policy
Keep people away from data | Tokenization, query-only access | IAM least privilege, KMS key policies

Artifacts and decisions. A data classification policy and tagging standard; a KMS key hierarchy with documented key policies and rotation; an encryption-by-default enforcement set; a backup and retention (WORM) plan; and a TLS policy. Key decisions: KMS automatic key rotation versus manual rotation; CloudHSM only where regulation demands it (it's expensive and operationally heavy).

Incident response (SEC 10)

What it is. Your prepared, rehearsed ability to respond to a security event and recover with minimal impact — SEC 10 (“How do you anticipate, respond to, and recover from incidents?”).

Why it matters. Assume compromise will happen. The difference between a contained incident and a breach headline is preparation: pre-provisioned access, automation, and practice.

How to do it well. Prepare before the incident. Educate the team and define roles (incident commander, investigator, communications). Pre-stage a dedicated security/forensics account with the tooling and a clean environment for analysis. Pre-create least-privilege IR IAM roles so responders aren't fumbling for access mid-incident, plus a break-glass path. Codify response so it's fast and consistent: EventBridge rules trigger Systems Manager Automation runbooks or Lambda functions that isolate a compromised EC2 instance (swap it to a quarantine security group, snapshot its EBS volumes for forensics, deregister it from its load balancer target groups), revoke sessions, or disable a leaked key. Use GuardDuty findings as the common trigger and Detective for scoping. Keep detailed, immutable logs (the central CloudTrail trail and Security Lake) as your forensic source of truth. Critically, run game days: simulate a credential leak or a public S3 bucket, execute the playbook end to end, and feed the lessons back into automation.
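
The containment step could be sketched as a function a Lambda handler calls with a boto3 EC2 client (for example `boto3.client("ec2")`). It snapshots every attached EBS volume first, so evidence is captured before anything changes, then swaps the instance to a quarantine security group and tags it. The `ir-status` tag and the quarantine group ID are hypothetical. Deregistration from load balancers and session revocation are left out to keep it short. The client is passed in rather than created inside, which also makes the function testable with a fake.

```python
from datetime import datetime, timezone


def isolate_instance(ec2, instance_id, quarantine_sg_id):
    """Contain a suspected-compromised instance; return forensic snapshot IDs.

    `ec2` is a boto3 EC2 client. `quarantine_sg_id` is a pre-created security
    group with no rules (or only rules for the forensics tooling).
    """
    reservations = ec2.describe_instances(InstanceIds=[instance_id])["Reservations"]
    instance = reservations[0]["Instances"][0]
    stamp = datetime.now(timezone.utc).isoformat()

    # 1. Capture evidence first: one snapshot per attached EBS volume.
    snapshot_ids = []
    for mapping in instance.get("BlockDeviceMappings", []):
        volume_id = mapping["Ebs"]["VolumeId"]
        snap = ec2.create_snapshot(
            VolumeId=volume_id,
            Description=f"IR forensic capture of {instance_id} at {stamp}",
        )
        snapshot_ids.append(snap["SnapshotId"])

    # 2. Replace the instance's security groups with the quarantine group.
    # This call targets single-interface instances; with several network
    # interfaces, update each via modify_network_interface_attribute.
    # Already-tracked connections may outlive the swap, so pair this with
    # credential/session revocation.
    ec2.modify_instance_attribute(InstanceId=instance_id, Groups=[quarantine_sg_id])

    # 3. Mark the instance so responders and automation can find it.
    ec2.create_tags(
        Resources=[instance_id],
        Tags=[{"Key": "ir-status", "Value": "quarantined"}],
    )
    return snapshot_ids
```

Whether this runs automatically or behind a human approval gate is exactly the trade-off discussed in the artifacts paragraph for this area.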

IR phase | Preparation artifact | AWS mechanism
Prepare | Runbooks, roles, forensics account | IAM roles, dedicated account, SSM docs
Detect | Alerting on findings | GuardDuty, Security Hub -> EventBridge
Contain | Isolate instance, revoke creds | SSM Automation, Lambda, quarantine SG
Eradicate/Recover | Rebuild from clean images | Image Builder, AWS Backup, IaC redeploy
Investigate | Forensic capture & analysis | EBS snapshots, Detective, Security Lake
Learn | Post-incident review | Game-day findings -> automation updates

Artifacts and decisions. A documented IR plan mapping severity to actions; named playbooks for the top scenarios (leaked key, exposed bucket, compromised instance, IAM privilege escalation); pre-provisioned IR roles and forensics account; and a game-day cadence (quarterly). Decision: how much to automate containment outright versus require a human approval gate (over-aggressive auto-isolation can cause its own outage).

Application security (SEC 11)

What it is. Building security into how applications are designed, built, and shipped — shifting left so vulnerabilities are caught in the pipeline, not in production. SEC 11 is the newest area in the pillar.

Why it matters. The application layer is where business logic — and its flaws — live. Network and host controls won’t stop a SQL injection or a leaked secret in source. Security must be a property of the SDLC.

How to do it well. Embed controls across the pipeline and make secure the easy path. Train developers and provide paved-road templates with security built in. In the pipeline, scan source with SAST, dependencies with SCA (e.g., GitHub Dependabot or third-party scanners), and container images with Amazon Inspector in ECR; block on critical findings as a CI gate. Catch hardcoded secrets before commit with secret scanning, and store real secrets in Secrets Manager. Use Amazon CodeGuru Security (and increasingly Amazon Q Developer's code review capabilities) for ML-assisted detection of vulnerabilities and leaked credentials. Review infrastructure too: run IAM Access Analyzer policy validation and cfn-guard or Checkov against IaC in CI. At runtime, protect the deployed app with AWS WAF (managed rule groups covering OWASP Top 10 risks, bot control, rate limiting) and authenticate and authorize APIs with Amazon Cognito or a custom authorizer. Validate the whole thing periodically with penetration testing. Protect the integrity of the build-to-deploy chain (signed artifacts, provenance) and centralize developer scanning at scale.
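
To illustrate what a pre-commit secret scan looks for, here is a deliberately narrow sketch that detects only AWS access key IDs: 20 uppercase alphanumerics starting with `AKIA` (long-term keys) or `ASIA` (temporary STS keys). Real scanners cover many more credential formats and use entropy checks; `main` is a hypothetical hook entry point that returns a non-zero exit code when anything is found.

```python
import re

# AWS access key IDs: "AKIA" (long-term) or "ASIA" (temporary) + 16 chars.
ACCESS_KEY_ID = re.compile(r"\b(?:AKIA|ASIA)[A-Z0-9]{16}\b")


def find_access_key_ids(text):
    """Return (line_number, match) for each access-key-ID-shaped string."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        hits += [(lineno, m.group(0)) for m in ACCESS_KEY_ID.finditer(line)]
    return hits


def main(paths):
    """Scan the given files; return 1 if any candidate key is found, else 0."""
    found = False
    for path in paths:
        with open(path, encoding="utf-8", errors="ignore") as fh:
            for lineno, key in find_access_key_ids(fh.read()):
                # Print only the prefix so the hook doesn't re-leak the key.
                print(f"{path}:{lineno}: possible AWS access key ID {key[:4]}...")
                found = True
    return 1 if found else 0
```

Wired into a hook script as `sys.exit(main(sys.argv[1:]))`, the non-zero exit code is what makes the commit fail.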

SDLC stage | Risk addressed | Tool/service
Code | Injection, insecure patterns | SAST, Amazon CodeGuru Security, Amazon Q Developer
Dependencies | Vulnerable libraries | SCA tools; Amazon Inspector for Lambda
Secrets | Hardcoded credentials | Pre-commit secret scanning + Secrets Manager
Containers | Vulnerable images | Amazon Inspector (ECR)
Infrastructure as code | Misconfiguration, over-permission | Access Analyzer validation, cfn-guard/Checkov
Runtime (L7) | OWASP Top 10, bots, scraping | AWS WAF managed rules + Shield
AuthN/Z | Broken access control | Amazon Cognito, custom authorizers

Artifacts and decisions. A secure SDLC standard; the CI gate policy (what severity blocks a merge); a paved-road service template; a dependency/SBOM policy; a pen-test schedule; and a WAF rule baseline for every public endpoint. Decision: how hard to gate (blocking criticals is non-negotiable; blocking every medium will get the gate disabled — tune to keep developer trust).
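
The gate policy itself can be a few lines of code. This sketch assumes scanner output has already been normalized into hypothetical `{"id", "severity"}` records. It blocks at a configurable severity (critical by default) and honors a list of documented exceptions, which is one way to keep the gate strict without forcing teams to disable it.

```python
from collections import Counter

# Lowest to highest; an unknown severity raises ValueError via .index().
SEVERITY_ORDER = ["INFORMATIONAL", "LOW", "MEDIUM", "HIGH", "CRITICAL"]


def gate(findings, block_at="CRITICAL", suppressed=frozenset()):
    """Decide whether a merge is blocked.

    findings: iterable of dicts with "id" and "severity" keys.
    block_at: lowest severity that blocks the merge.
    suppressed: finding IDs accepted through a documented exception.
    Returns (blocked, counts_by_severity) for the non-suppressed findings.
    """
    threshold = SEVERITY_ORDER.index(block_at)
    active = [f for f in findings if f["id"] not in suppressed]
    counts = Counter(f["severity"] for f in active)
    blocked = any(SEVERITY_ORDER.index(f["severity"]) >= threshold for f in active)
    return blocked, dict(counts)
```

Reporting counts for non-blocking severities, rather than silently passing them, keeps mediums visible to the team without turning every one into a merge blocker.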

Real-world enterprise scenario

FinPeak Lending is a fictional digital consumer-lending company (~600 engineers, regulated under PCI DSS and regional banking rules) migrating from a sprawl of 40-plus unmanaged AWS accounts into a governed organization ahead of an audit. Their CISO mandates Well-Architected Security alignment in two quarters. Here is what they do for each area.

Security foundations. They deploy AWS Control Tower, restructure into a Security OU (Log Archive + Audit accounts), Workloads OU (prod/stage/dev per the four product lines), Infrastructure OU (shared networking, CI/CD), and Sandbox OU. SCPs deny disabling CloudTrail/GuardDuty/Config, restrict to two Regions (ap-south-1, eu-west-1), and deny root access except for break-glass. An RCP enforces aws:PrincipalOrgID on all resource policies. Accounts are vended via Account Factory for Terraform — a baseline now takes 25 minutes instead of the prior two-day manual checklist.

Identity and access management. They federate Entra ID into IAM Identity Center with SCIM, retire 280 standing IAM users, and define 22 permission sets mapped to job functions. FIDO2 MFA is mandatory. GitHub Actions deploys via OIDC role assumption — they delete 60-plus long-lived CI keys. Access Analyzer runs in CI and flags one external-sharing finding on a marketing bucket within the first week.

Detection. An organization CloudTrail trail lands in the locked Log Archive bucket. GuardDuty and Security Hub run org-wide from the Audit account (delegated admin), with the AWS Foundational Security Best Practices (FSBP) and PCI DSS standards enabled; their initial Security Hub score is 71%. Config conformance packs enforce PCI controls. EventBridge routes high/critical findings to PagerDuty and Slack; Security Lake centralizes logs for their Splunk SIEM.

Infrastructure protection. Workloads move into private subnets behind a centralized inspection VPC (Transit Gateway + AWS Network Firewall + DNS Firewall). Public endpoints get AWS WAF (managed OWASP rules) governed by Firewall Manager, plus Shield Advanced on the customer-facing CloudFront distributions. Bastions are gone; all access is via SSM Session Manager. Inspector scans EC2 and ECR continuously. IMDSv2 is enforced fleet-wide: an SCP blocks launching instances without it, and an AWS Config rule flags any remaining IMDSv1 instances.

Data protection. Macie scans S3 and finds unencrypted PII in two legacy buckets on day three. They define four classification tiers, enable default KMS CMK encryption on S3/EBS/RDS/DynamoDB, and add an SCP denying creation of unencrypted resources. Cardholder data sits in a dedicated account with a CloudHSM-backed key and tokenization, so application teams query tokens, not PANs. AWS Backup with vault lock provides WORM recovery; ACM + Private CA enforce TLS end to end.

Incident response. A dedicated forensics account is pre-staged with IR roles and SSM Automation runbooks: a GuardDuty “compromised instance” finding auto-snapshots the EBS volume, moves the instance to a quarantine SG, and pages the on-call IC. They run a quarterly game day; the first one (simulated leaked access key) exposes a missing session-revocation step, which they then automate.

Application security. Every pipeline gains SAST, SCA, secret scanning (pre-commit), CodeGuru Security review, and ECR image scanning; criticals block the merge. IaC is validated with cfn-guard and Access Analyzer in CI. A paved-road service template ships with a least-privilege role, WAF, and Cognito wired in.

Measurable outcome. Within two quarters: Security Hub score rises from 71% to 94%; standing IAM users drop from 280 to 0 and long-lived CI keys from 60+ to 0; mean time to detect (via GuardDuty -> PagerDuty) falls from days to under 15 minutes; 100% of S3/EBS/RDS encrypted with customer-managed keys; account provisioning drops from ~2 days to 25 minutes; and FinPeak passes its PCI DSS assessment with zero critical findings.

What’s next

Part 3 of the AWS Well-Architected Framework series tackles the Reliability pillar — designing for resilience, recovery, and graceful degradation across foundations, workload architecture, change management, and failure management.

Vinod is a Senior Cloud Architect (22+ yrs) — available for Azure / AWS / GCP architecture, landing zones, and migrations.

// part 2 of 6 · AWS Well-Architected Framework