AWS Certification Prep Kit: CLF, SAA, SOA, DVA, SAP & DOP — Checklists, Practice Questions & Tips

Certifications do not make you an architect, but they do two useful things: they force you to fill the gaps you have been quietly working around, and they give a hiring manager a cheap signal that you have breadth. The trouble with most prep is that it teaches you to recognise answers rather than to reason about scenarios — which is exactly what the modern AWS exams refuse to reward. Since the SAA-C03 and the C02-generation exams landed, almost every question is a short scenario: a workload, a constraint or two (cost, latency, operational overhead, compliance), and four plausible designs. You pass by eliminating the three that violate a constraint, not by remembering a definition.

This kit is built for that reality. It covers the whole AWS ladder — the foundational CLF-C02, the three associates SAA-C03 / SOA-C02 / DVA-C02, and the two professionals SAP-C02 / DOP-C02 — plus a touch of the specialties. For each exam you get the domain breakdown with official weightings, a one-page cheat sheet, and a bank of scenario questions with worked answers and an explanation of why each wrong option is wrong, because the distractor analysis is where the real learning lives. There is a dedicated section on the services examiners deliberately confuse you between (SQS vs SNS vs EventBridge, ALB vs NLB, EBS vs EFS vs S3, security group vs NACL), a recommended order, a study-plan template you can copy, and a plain explanation of how the scaled 100–1000 score actually works so you stop panicking about “how many can I get wrong”.

Learning objectives

By the end of this lesson you will be able to:

Choose the right exam and the right order through the AWS certification ladder for your role and experience.
Recite the domains and weightings for CLF-C02, SAA-C03, SOA-C02, DVA-C02, SAP-C02 and DOP-C02 and use them to budget study time.
Decode the question formats (multiple-choice, multiple-response, and the newer ordering/matching/case-study styles) and apply a repeatable elimination technique.
Work scenario questions the way the exam wants — reading for the deciding constraint and using distractor analysis to confirm the answer.
Distinguish the commonly-confused services that decide a large fraction of associate-level questions.
Build a realistic study plan and understand scaled scoring so you walk in calibrated, not anxious.

Prerequisites

You should already have hands-on AWS exposure roughly equal to the earlier lessons in this course: comfort with the global infrastructure and pricing model (AWS Cloud Fundamentals), IAM (IAM Fundamentals), core compute/storage/networking, and ideally the architecting and portfolio lessons (the Architecting Ladder and Portfolio Projects). This lesson is the readiness layer on top of that knowledge — it assumes the concepts and drills the exam. If a service name here is unfamiliar, treat it as a gap to close before booking the test. This is the final study lesson before the Well-Architected capstone.

The AWS certification ladder and how to choose

AWS groups its certifications into four tiers. The ladder is not strictly linear — there are no formal prerequisites any more — but there is a sensible order, and trying to skip rungs usually wastes money on a failed sitting.

Tier	Exam	Code	Questions	Time	Cost (USD)	Who it is for
Foundational	Cloud Practitioner	CLF-C02	65	90 min	100	Anyone new to AWS; non-engineers; sales/PM/finance
Associate	Solutions Architect – Associate	SAA-C03	65	130 min	150	The default engineer/architect cert; broadest value
Associate	SysOps Administrator – Associate	SOA-C02	65	130 min	150	Operations, SRE, on-call; ops-heavy roles
Associate	Developer – Associate	DVA-C02	65	130 min	150	Application developers building on AWS SDKs/serverless
Professional	Solutions Architect – Professional	SAP-C02	75	180 min	300	Senior architects; complex, multi-account, migration scope
Professional	DevOps Engineer – Professional	DOP-C02	75	180 min	300	Senior platform/SRE; CI/CD, IaC, observability at scale
Specialty	Advanced Networking	ANS-C01	65	170 min	300	Network specialists; hybrid, Transit Gateway, Direct Connect
Specialty	Security	SCS-C02	65	170 min	300	Security engineers; the most broadly useful specialty
Specialty	Machine Learning	MLS-C01	65	180 min	300	ML engineers/data scientists (legacy flagship)
Associate	ML Engineer – Associate	MLA-C01	65	130 min	150	Operationalising ML on AWS (the newer, narrower cert)
Associate	Data Engineer – Associate	DEA-C01	65	130 min	150	Pipelines, analytics, Glue/Redshift/Kinesis

A few practical notes. Question and time figures are the published targets. AWS includes unscored items in each exam (15 of the 65 on the foundational, associate and specialty exams, 10 of the 75 on the professional exams), and you will not be told which ones they are, so treat every question as if it counts. Prices are the standard global fee in US dollars and vary by region and currency. The professional exams are a genuine step up in difficulty and reading load: 75 dense scenarios in 180 minutes is about 2.4 minutes per question, with a long stem to parse each time.
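The pacing figures are simple arithmetic and worth checking once for your own exam. A minimal sketch using the question counts and durations from the table above; the names `EXAMS` and `minutes_per_question` are illustrative only.

```python
# Question counts and durations (minutes) copied from the ladder table above.
EXAMS = {
    "CLF-C02": (65, 90),
    "SAA-C03": (65, 130),
    "SOA-C02": (65, 130),
    "DVA-C02": (65, 130),
    "SAP-C02": (75, 180),
    "DOP-C02": (75, 180),
}


def minutes_per_question(questions, minutes):
    """Average time available per question, in minutes."""
    return minutes / questions


for code, (questions, minutes) in EXAMS.items():
    print(f"{code}: {minutes_per_question(questions, minutes):.2f} min/question")
```

The averages hide the spread: short recall questions take well under the average, and that surplus is what pays for the long stems.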

Figure: AWS certification ladder

The diagram above lays the ladder out as a path: start at the foundational rung if you are new, take the associate that matches your job, then climb to the professional in the same column — most people go CLF (optional) → SAA → SAP, or SAA → DOP if their work is platform/CI-CD heavy, and bolt on the Security specialty when their role demands it.

Recommended order

Brand new to cloud, non-engineer, or you want a confidence win: CLF-C02 first. It is genuinely foundational and cheap; engineers with a year of real AWS can usually skip it and start at SAA.
Engineer/architect: SAA-C03 is the highest-leverage single certification in the whole catalogue. Do it first among the associates.
Then specialise by role: add SOA-C02 if you operate systems (it is the only AWS exam that historically included hands-on labs, though AWS has paused those; see the SOA-C02 section below), or DVA-C02 if you build applications. Many people do SAA then one of SOA/DVA.
Professional: SAP-C02 is the natural follow-on to SAA; DOP-C02 pairs naturally with DVA + SOA experience. The combination of DVA + SOA covers a large fraction of the DOP blueprint.
Specialties last, driven by need. SCS-C02 (Security) is the most broadly valuable; ANS-C01 (Advanced Networking) if you live in hybrid connectivity; the data/ML certs if that is your discipline.

Question formats and how the exam is built

Every AWS exam draws from two basic item types, with a handful of newer styles appearing on some exams:

Format	What it is	How to handle it
Multiple choice	One correct answer, three distractors	Read the stem for the deciding constraint, eliminate to one
Multiple response	“Select TWO” / “Select THREE”; every correct option must be selected	Treat each option as an independent true/false; there is no partial credit, so a partly correct selection scores zero
Ordering	Arrange steps into the correct sequence	Anchor the first and last steps you are certain of, fill the middle
Matching	Pair items across two columns	Do the pairs you are sure of first; they constrain the rest
Case study	One scenario, several linked questions	Read the scenario once carefully; constraints carry across questions

There is no penalty for wrong answers — the score is based only on correct ones — so never leave a question blank. Flag-and-review is available; mark anything that takes more than your per-question budget and come back. The exams are scenario-led: a typical SAA or professional stem describes a workload and then asks for the option that is “MOST cost-effective”, “with the LEAST operational overhead”, “MOST highly available”, or “with the FEWEST changes”. Those capitalised qualifiers are the whole question — two options are often both technically correct and only one satisfies the qualifier.

A repeatable technique that works across all of them:

Read the last sentence first to find what is actually being asked and the deciding qualifier (cost / overhead / latency / availability / changes).
Extract the hard constraints from the stem (compliance, RTO/RPO, “no servers to manage”, “existing on-prem”, a specific protocol).
Eliminate options that violate a constraint — usually two fall immediately.
Choose between the survivors using the qualifier, not your personal preference.
Flag and move on if you are over budget; speed on easy questions buys time for hard ones.
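The elimination steps above can be written down as a tiny filter-then-rank routine. This is only a sketch of the reasoning, not an exam tool: the `Option` fields, the constraint checks and the cost figures are all hypothetical values invented for the illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    label: str
    serverless: bool      # would satisfy a "no servers to manage" constraint
    durable: bool         # would satisfy a durability constraint
    monthly_cost: float   # hypothetical figure, used only to rank survivors


def pick(options, constraints, qualifier):
    """Drop options that break any hard constraint, then rank survivors by the qualifier."""
    survivors = [o for o in options if all(check(o) for check in constraints)]
    if not survivors:
        raise ValueError("every option violates a constraint; re-read the stem")
    return min(survivors, key=qualifier)


options = [
    Option("A", serverless=True, durable=True, monthly_cost=40.0),
    Option("B", serverless=False, durable=True, monthly_cost=25.0),
    Option("C", serverless=True, durable=False, monthly_cost=10.0),
    Option("D", serverless=True, durable=True, monthly_cost=90.0),
]
constraints = [lambda o: o.serverless, lambda o: o.durable]

# "MOST cost-effective" becomes: cheapest among the options that survive.
answer = pick(options, constraints, qualifier=lambda o: o.monthly_cost)
print(answer.label)
```

Note that the cheapest option overall (C) never reaches the ranking step, because it fails a hard constraint first. That ordering, constraints before qualifier, is the point of the technique.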

CLF-C02 — Cloud Practitioner

Foundational breadth: cloud value, security and compliance basics, core services, and billing. No deep architecture, no code. The goal is vocabulary and the shape of the platform.

Domain	Weighting
1. Cloud Concepts	24%
2. Security and Compliance	30%
3. Cloud Technology and Services	34%
4. Billing, Pricing, and Support	12%

Checklist: shared-responsibility model (who secures what); the value proposition of cloud (capex→opex, elasticity, agility, global reach); the global infrastructure (Regions, AZs, edge locations); core compute (EC2, Lambda, ECS/EKS at a name level), storage (S3 classes, EBS, EFS), database (RDS, Aurora, DynamoDB), networking (VPC, Route 53, CloudFront); IAM basics (users, groups, roles, MFA, root-account protection); the Well-Architected Framework’s six pillars by name; pricing models (On-Demand, Reserved, Savings Plans, Spot, Free Tier) and what drives cost; Billing tools (Cost Explorer, Budgets, Cost and Usage Report); support plans (Basic, Developer, Business, Enterprise On-Ramp, Enterprise) and what each includes; AWS Organizations and consolidated billing; the Trusted Advisor and Health Dashboard at a concept level.

CLF-C02 cheat sheet

Shared responsibility: AWS secures “of the cloud” (hardware, global infra, managed-service internals); you secure “in the cloud” (data, IAM, OS patching on EC2, encryption choices).
Pricing levers: pay-as-you-go, pay less when you commit (Savings Plans/RIs), pay less by using more (volume tiers). Free Tier = 12-month, always-free, and trials.
Support: Developer = business-hours email; Business = 24/7 + full Trusted Advisor; Enterprise = TAM + 15-min Sev1 SLA.
Pillars: Operational Excellence, Security, Reliability, Performance Efficiency, Cost Optimization, Sustainability.

SAA-C03 — Solutions Architect Associate

The flagship associate. Heavily scenario-based around designing resilient, performant, secure and cost-optimised architectures.

Domain	Weighting
1. Design Secure Architectures	30%
2. Design Resilient Architectures	26%
3. Design High-Performing Architectures	24%
4. Design Cost-Optimized Architectures	20%
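One way to use the weightings is to split a fixed study budget in proportion to them. A minimal sketch, assuming a hypothetical 60-hour budget; the function name `budget_hours` is illustrative, and you would shift hours towards your weaker domains after a first practice test.

```python
# SAA-C03 domain weightings (percent) from the table above.
SAA_C03_WEIGHTS = {
    "Design Secure Architectures": 30,
    "Design Resilient Architectures": 26,
    "Design High-Performing Architectures": 24,
    "Design Cost-Optimized Architectures": 20,
}


def budget_hours(weights, total_hours):
    """Split total_hours across domains in proportion to their exam weighting."""
    total_weight = sum(weights.values())
    return {
        domain: round(total_hours * pct / total_weight, 1)
        for domain, pct in weights.items()
    }


plan = budget_hours(SAA_C03_WEIGHTS, total_hours=60)  # 60 hours is an assumed budget
for domain, hours in plan.items():
    print(f"{hours:>5} h  {domain}")
```

The same function works for any of the domain tables in this lesson; only the dictionary changes.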

Checklist: IAM deep enough to reason about policy evaluation, roles, and cross-account access; S3 (storage classes, lifecycle, replication, encryption, Block Public Access, pre-signed URLs); EC2 + EBS + EFS + the purchasing options; Auto Scaling and ELB (ALB vs NLB vs GWLB); VPC design (subnets, route tables, IGW/NAT, security groups vs NACLs, VPC endpoints, peering, Transit Gateway at a concept level); RDS Multi-AZ vs read replicas, Aurora, DynamoDB (capacity modes, GSIs, global tables, DAX); decoupling with SQS/SNS/EventBridge; serverless (Lambda, API Gateway, Step Functions); CloudFront + Route 53 routing policies; caching (CloudFront, ElastiCache, DAX); encryption with KMS; resilience patterns (Multi-AZ, multi-Region, backup/restore vs pilot light vs warm standby vs active-active); cost tools and the cheapest-that-meets-requirements instinct.

SAA-C03 cheat sheet

Decoupling: SQS = queue (pull, buffering, retries); SNS = pub/sub fan-out (push); EventBridge = event router with filtering and SaaS/AWS event sources. Combine SNS→SQS for durable fan-out.
Load balancers: ALB = HTTP/HTTPS, layer 7, path/host routing; NLB = TCP/UDP/TLS, layer 4, ultra-low latency, static IP; GWLB = inline third-party appliances.
Storage: S3 = object, internet-scale, 11 nines durability; EBS = block, single-AZ, attached to one instance (io2/gp3); EFS = NFS shared file, multi-AZ, many instances.
Resilience tiers: Backup & Restore (cheapest, hours) → Pilot Light → Warm Standby → Multi-site Active-Active (priciest, seconds).
RDS HA: Multi-AZ = synchronous standby for failover (availability); read replicas = asynchronous, for read scaling (and can be cross-Region).
Cost: right-size first, then Savings Plans/RIs for steady state, Spot for fault-tolerant/stateless, S3 lifecycle to colder tiers.

SOA-C02 — SysOps Administrator Associate

Operations focus: deploy, manage, and operate workloads; monitoring, automation, security and compliance, networking, and cost/performance. Historically the only AWS exam with hands-on lab questions; AWS has at times paused the labs, so confirm the current format on the official exam guide before you book.

Domain	Weighting
1. Monitoring, Logging, and Remediation	20%
2. Reliability and Business Continuity	16%
3. Deployment, Provisioning, and Automation	18%
4. Security and Compliance	16%
5. Networking and Content Delivery	18%
6. Cost and Performance Optimization	12%

Checklist: CloudWatch (metrics, custom metrics, alarms, composite alarms, dashboards, Logs, Logs Insights, agent); CloudTrail (management vs data events, organisation trails); AWS Config (rules, conformance packs, remediation); Systems Manager (Parameter Store, Session Manager, Run Command, Patch Manager, State Manager, Automation runbooks); EventBridge for automated remediation; Auto Scaling lifecycle hooks and health checks; backup (AWS Backup, EBS snapshots, RDS automated backups, lifecycle); CloudFormation (stacks, change sets, drift, StackSets, nested stacks); ELB health checks and access logs; VPC operations (flow logs, Reachability Analyzer, route troubleshooting); Trusted Advisor and Cost Explorer; quotas/Service Quotas; encryption operations and certificate management with ACM.

SOA-C02 cheat sheet

Observability triad: CloudWatch (metrics/logs/alarms — what is happening), CloudTrail (API audit — who did what), Config (resource state/compliance — what changed and is it allowed).
Default metrics gaps: memory and disk usage are not default EC2 metrics — install the CloudWatch agent.
Automated remediation: CloudWatch alarm or EventBridge rule → SSM Automation / Lambda. Config rules can auto-remediate via SSM.
Patching at scale: SSM Patch Manager + maintenance windows; no SSH required via Session Manager.
CloudFormation safety: change sets to preview, drift detection to catch console edits, StackSets for multi-account/Region.

DVA-C02 — Developer Associate

For application developers building on AWS — serverless, SDK behaviour, deployment, security from the code’s point of view, and troubleshooting.

Domain	Weighting
1. Development with AWS Services	32%
2. Security	26%
3. Deployment	24%
4. Troubleshooting and Optimization	18%

Checklist: Lambda in depth (handlers, environment variables, layers, versions/aliases, concurrency — reserved vs provisioned, event source mappings, destinations, SnapStart); API Gateway (REST vs HTTP APIs, stages, authorizers, throttling, caching, mapping templates); DynamoDB for developers (queries vs scans, partition-key design, conditional writes, optimistic locking, DynamoDB Streams, TTL, transactions); S3 SDK patterns (multipart upload, pre-signed URLs, event notifications); messaging (SQS visibility timeout, long polling, DLQs; SNS; EventBridge); IAM for code (roles vs keys, STS, least privilege, resource policies); Secrets Manager and Parameter Store; the exponential backoff with jitter retry pattern and idempotency; X-Ray tracing and instrumentation; deployment with SAM, CodeDeploy (in-place vs blue/green, canary/linear), CodePipeline/CodeBuild; caching strategies (write-through vs lazy loading) with ElastiCache/DAX; envelope encryption with KMS.

DVA-C02 cheat sheet

Lambda concurrency: reserved = caps/guarantees a function’s share; provisioned = pre-warmed to kill cold starts; SnapStart = faster cold starts for supported runtimes.
DynamoDB: design the partition key for even access; use Query not Scan; conditional writes + version attribute = optimistic locking; Streams for change capture.
SQS: visibility timeout must exceed processing time or you double-process; long polling reduces empty receives/cost; DLQ after maxReceiveCount.
Retries: SDKs retry idempotent ops with exponential backoff and jitter; make your own writes idempotent.
Deploy strategies: CodeDeploy canary (a % then the rest) vs linear (equal increments) vs all-at-once; Lambda aliases shift traffic between versions.
Caching: lazy loading (cache on miss, can serve stale) vs write-through (write to cache on every DB write, never stale but more writes).
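The retry bullet is easier to remember once you have seen it as code. A minimal sketch of exponential backoff with "full jitter" in plain Python; the AWS SDKs implement their own retry logic, so this shows the pattern rather than what any SDK does internally. The names `jittered_delay` and `call_with_retries` and the default base and cap values are assumptions for the example.

```python
import random
import time


def jittered_delay(attempt, base=0.1, cap=5.0):
    """Full jitter: a random delay between 0 and min(cap, base * 2**attempt) seconds."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))


def call_with_retries(operation, max_attempts=5, sleep=time.sleep):
    """Call operation(); on failure wait a jittered, growing delay and try again.

    The operation must be idempotent, because it may run more than once.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:  # real code should catch only retryable errors
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(jittered_delay(attempt))
```

The jitter matters as much as the exponent: without it, many clients that failed together retry together and hit the service in synchronised waves.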

SAP-C02 — Solutions Architect Professional

The senior architecture exam: complex, multi-account, organisation-scale design; migration and modernisation; cost control and continuous improvement across large estates. Long stems, multiple defensible options, decided by subtle constraints.

Domain	Weighting
1. Design Solutions for Organizational Complexity	26%
2. Design for New Solutions	29%
3. Continuous Improvement for Existing Solutions	25%
4. Accelerate Workload Migration and Modernization	20%

Checklist: multi-account strategy with Organizations, SCPs, Control Tower, landing zones, IAM Identity Center; cross-account networking (Transit Gateway, PrivateLink, Direct Connect, hybrid DNS with Route 53 Resolver); advanced resilience and DR (RTO/RPO trade-offs across the four strategies, multi-Region active-active with DynamoDB global tables and Aurora Global Database, Route 53 failover/latency/geo routing); migration tooling (Application Migration Service/MGN, DMS + SCT, DataSync, Snow family, Migration Hub, the 7 Rs); cost governance at scale (Savings Plans across accounts, consolidated billing, allocation tags, Budgets, anomaly detection); security at scale (GuardDuty, Security Hub, Macie, KMS multi-Region keys, Secrets Manager rotation); decoupling and modernisation (containers vs serverless trade-offs, event-driven, Step Functions); data strategy across analytics services. The skill being tested is judgement under competing constraints, not recall.

SAP-C02 cheat sheet

Org guardrails: SCPs set the maximum permissions (they never grant); Control Tower bootstraps a multi-account landing zone with guardrails; IAM Identity Center for workforce SSO.
Migration 7 Rs: Retire, Retain, Relocate, Rehost (lift-and-shift, MGN), Replatform (tweak, e.g. to RDS), Repurchase (move to SaaS), Refactor (re-architect, most effort/most cloud-native).
DR by RTO/RPO: Backup & Restore (hours) → Pilot Light (tens of minutes) → Warm Standby (minutes) → Active-Active (seconds or near zero, highest cost).
Hybrid connectivity: Direct Connect for consistent throughput/private; VPN for cheap/quick or DX backup; Transit Gateway as the hub; PrivateLink to expose a single service privately.
Global data: DynamoDB global tables (multi-active) and Aurora Global Database (cross-Region read + fast promotion) for low-RTO multi-Region.

DOP-C02 — DevOps Engineer Professional

Senior platform/SRE: CI/CD, infrastructure as code, configuration management, monitoring/logging, incident response, and security automation across the SDLC. It overlaps heavily with the combination of DVA and SOA experience.

Domain	Weighting
1. SDLC Automation	22%
2. Configuration Management and IaC	17%
3. Resilient Cloud Solutions	15%
4. Monitoring and Logging	15%
5. Incident and Event Response	14%
6. Security and Compliance Automation	17%

Checklist: the CodeCatalyst/Code* suite (CodePipeline, CodeBuild, CodeDeploy, CodeArtifact) and integrating third-party CI; deployment strategies in depth (in-place, blue/green with ELB/Route 53, canary/linear for Lambda and ECS, all-at-once) and automated rollback on CloudWatch alarms; CloudFormation mastery (StackSets, nested stacks, change sets, drift, custom resources, hooks) plus CDK and an awareness of Terraform; configuration management with Systems Manager and OpsWorks legacy; resilience automation (Auto Scaling, multi-AZ/Region, AWS Backup, self-healing via EventBridge→SSM/Lambda); observability (CloudWatch metrics/alarms/Logs Insights, X-Ray, synthetics, ServiceLens, centralised logging); incident response (EventBridge patterns, Systems Manager Incident Manager, runbooks, GuardDuty→remediation); security automation (Config rules + auto-remediation, Security Hub, Secrets Manager rotation, IAM Access Analyzer, image scanning). The exam rewards automation that removes humans from the loop.

DOP-C02 cheat sheet

Deployment safety: blue/green to cut over with instant rollback; canary/linear to limit blast radius; wire CloudWatch alarms to auto-rollback in CodeDeploy.
IaC at scale: StackSets for multi-account/Region, change sets to preview, drift detection to catch manual edits, CDK for higher-level constructs.
Self-healing: EventBridge rule on a failure event → SSM Automation runbook or Lambda; Config rule → auto-remediation.
Centralised logging: CloudWatch Logs subscription filters → Kinesis Data Firehose → S3/OpenSearch; cross-account via a logging account.
Secrets: Secrets Manager with rotation Lambdas; never bake credentials into AMIs, code, or CloudFormation parameters in plaintext.

A touch of the specialties

You will not study these from this kit, but a SAP/DOP candidate should recognise where they begin:

Specialty	Code	The one-line scope	The signature services
Advanced Networking	ANS-C01	Hybrid + complex VPC connectivity	Transit Gateway, Direct Connect, Route 53 Resolver, Global Accelerator, Network Firewall
Security	SCS-C02	Detective + preventive + data protection	GuardDuty, Security Hub, Macie, KMS, IAM, WAF/Shield, Detective
Machine Learning	MLS-C01	End-to-end ML lifecycle	SageMaker, data engineering for ML, modelling, ops
ML Engineer – Associate	MLA-C01	Operationalising ML	SageMaker pipelines, deployment, monitoring
Data Engineer – Associate	DEA-C01	Pipelines and analytics	Glue, Redshift, Kinesis, EMR, Lake Formation, Athena

SCS-C02 (Security) is the highest-value addition for most engineers because security questions leak into every other exam. ANS-C01 is worth it if hybrid networking is your day job. The data/ML certs are discipline-specific.

Scenario practice questions with explained answers

This is the core of the kit. Work each one cold: read the stem, decide your answer, then read the explanation. Pay attention to the distractor analysis — being able to say why a wrong option is wrong is the skill the exam tests.

Q1 (SAA-C03) — decoupling and fan-out

A retail application must, on each new order, (a) update inventory, (b) email the customer, and (c) push the event to an analytics pipeline. The three steps must run independently and durably, with each consumer able to retry without affecting the others. Which design is most appropriate?

A. Publish the order to an SNS topic; subscribe three SQS queues; each downstream service polls its own queue. B. Write the order to a single SQS queue that all three services poll. C. Invoke three Lambda functions synchronously from the order service. D. Publish to an SNS topic with three direct Lambda subscriptions.

Answer: A. SNS fan-out into per-consumer SQS queues gives each consumer its own durable buffer, independent retries, and a DLQ — the classic durable fan-out pattern.

Distractor analysis. B is wrong because a single shared queue means each message is consumed once by one poller; the three services would compete for the same messages, not each get a copy. C couples the order service’s latency and availability to all three downstreams and has no durability — a failed downstream fails the order. D fans out but loses the buffer: if a Lambda subscriber throttles or errors past its retries, the message can be lost; SQS between SNS and the consumer is what makes it durable and independently retryable.
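The difference between options A and B is easy to see in a toy model. The sketch below uses in-memory queues only, with no AWS calls; the round-robin hand-out in `shared_queue` is a simplification (real SQS gives a message to whichever poller receives it first), and both function names are invented for the illustration.

```python
from collections import deque


def shared_queue(messages, consumers):
    """Option B: one queue, competing consumers. Each message reaches exactly one of them."""
    queue = deque(messages)
    received = {name: [] for name in consumers}
    turn = 0
    while queue:
        name = consumers[turn % len(consumers)]  # simplified: round-robin pollers
        received[name].append(queue.popleft())
        turn += 1
    return received


def sns_to_sqs_fanout(messages, consumers):
    """Option A: a topic with one queue per subscriber. Every consumer gets its own copy."""
    queues = {name: deque() for name in consumers}
    for message in messages:             # the publish step
        for queue in queues.values():    # the topic copies into each subscribed queue
            queue.append(message)
    return {name: list(queue) for name, queue in queues.items()}


orders = ["order-1", "order-2", "order-3"]
consumers = ["inventory", "email", "analytics"]
print(shared_queue(orders, consumers))
print(sns_to_sqs_fanout(orders, consumers))
```

With the shared queue each service sees only some of the orders; with fan-out each service sees all of them and drains its own queue at its own pace.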

Q2 (SAA-C03) — load balancer choice

A multiplayer game backend needs a load balancer that handles millions of TCP connections at ultra-low latency and must expose a static IP for an allow-list partners maintain. Which load balancer?

A. Application Load Balancer B. Network Load Balancer C. Gateway Load Balancer D. Classic Load Balancer

Answer: B. NLB operates at layer 4 (TCP/UDP/TLS), scales to millions of connections with very low latency, and provides a static IP per AZ (and supports Elastic IPs) — ideal for partner allow-listing.

Distractor analysis. A is layer 7 HTTP/HTTPS; it does not expose a static IP (only a DNS name) and adds latency unsuited to raw TCP gaming traffic. C is for inserting third-party network appliances inline, not for serving application traffic. D is legacy and should not be chosen for new designs.

Q3 (SAA-C03) — shared file storage

Three EC2 instances across two Availability Zones must read and write the same files concurrently with POSIX semantics. Which storage service?

A. Amazon EBS io2 volume attached to all three instances B. Amazon S3 mounted via the SDK C. Amazon EFS D. Instance store

Answer: C. EFS is a managed, multi-AZ NFS file system that many instances can mount and share with POSIX semantics — exactly the requirement.

Distractor analysis. A is wrong: a standard EBS volume attaches to a single instance in a single AZ (Multi-Attach exists only for io1/io2 within one AZ and needs a cluster-aware filesystem — it does not span AZs). B is object storage, not a POSIX filesystem; concurrent read/write file semantics do not apply. D is ephemeral, instance-local, and lost on stop — never shared.

Q4 (SOA-C02) — missing metrics

An operator needs a CloudWatch alarm on memory utilisation of a fleet of EC2 instances but cannot find the metric. What is the correct fix?

A. Enable detailed monitoring on the instances. B. Install and configure the CloudWatch agent to publish a memory metric. C. Raise a support case to enable the metric. D. Use Compute Optimizer instead.

Answer: B. Memory (and disk) are guest-OS metrics that AWS cannot see from the hypervisor; you must install the CloudWatch agent to publish them as custom metrics.

Distractor analysis. A: detailed monitoring only changes EC2 metric granularity from 5-minute to 1-minute; it does not add memory. C: unnecessary; this is a configuration task, not an account flag. D: Compute Optimizer gives right-sizing recommendations, not a real-time alarmable memory metric.
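For reference, the agent is driven by a JSON configuration file. The sketch below builds a minimal Linux-style metrics section in Python and prints it as JSON; the 60-second interval and the choice of the root filesystem are assumptions, and a real deployment would usually generate and distribute this file through Systems Manager rather than by hand.

```python
import json

# Minimal CloudWatch agent metrics section: memory and root-disk usage.
agent_config = {
    "metrics": {
        # Tag each datapoint with the instance it came from.
        "append_dimensions": {"InstanceId": "${aws:InstanceId}"},
        "metrics_collected": {
            "mem": {
                "measurement": ["mem_used_percent"],
                "metrics_collection_interval": 60,  # assumed interval, in seconds
            },
            "disk": {
                "measurement": ["used_percent"],
                "resources": ["/"],                 # assumed mount point
                "metrics_collection_interval": 60,
            },
        },
    }
}

print(json.dumps(agent_config, indent=2))
```

Once the agent publishes `mem_used_percent`, the alarm in the question is an ordinary CloudWatch alarm on a custom metric.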

Q5 (SOA-C02) — who changed it

A security group rule changed unexpectedly and the operator must find who made the change and when. Which service answers this?

A. Amazon CloudWatch Logs B. AWS CloudTrail C. AWS Config D. VPC Flow Logs

Answer: B. CloudTrail records the API call — the principal, the time, the parameters — for the AuthorizeSecurityGroupIngress/Revoke... action. That is the “who did what, when” audit.

Distractor analysis. C: Config tells you the security group’s state changed and can show a before/after configuration item, but the authoritative actor/identity attribution is CloudTrail (Config even references the CloudTrail event). A: CloudWatch Logs holds application/system logs, not the AWS API audit. D: VPC Flow Logs capture network traffic metadata, not control-plane changes. (In practice Config + CloudTrail are used together, but the who is CloudTrail.)

Q6 (DVA-C02) — duplicate processing

A Lambda consumer reading from SQS occasionally processes the same message twice. The processing takes up to 90 seconds. What is the most likely cause and fix?

A. The DLQ is misconfigured; add a DLQ. B. The queue is standard not FIFO; switch to FIFO. C. The visibility timeout is shorter than the processing time; increase it. D. Long polling is disabled; enable it.

Answer: C. If processing (90 s) exceeds the visibility timeout, the message becomes visible again and a second consumer picks it up — classic double-processing. Set the visibility timeout safely above the max processing time (and ideally 6× the function timeout for Lambda event source mappings).

Distractor analysis. A: a DLQ handles poison messages after repeated failures; it does not stop a successfully-processing message from being redelivered early. B: FIFO guarantees ordering and exactly-once processing within the dedup window, but the root cause here is the timeout; switching queue type is a heavier, often unnecessary change and FIFO has throughput limits. D: long polling reduces empty receives and cost; it has nothing to do with redelivery.
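The redelivery arithmetic can be sketched in a few lines. This is a deliberately simplified model, not SQS itself: it assumes a free consumer is always polling, that every consumer needs the same processing time, and that the message reappears each time the timeout expires until the first consumer deletes it. Both function names are illustrative.

```python
import math


def receive_count(processing_seconds, visibility_timeout):
    """How many times one message is handed out before the first consumer finishes.

    Simplified model: the message reappears every visibility_timeout seconds
    until processing completes and the message is deleted.
    """
    if processing_seconds <= 0 or visibility_timeout <= 0:
        raise ValueError("both durations must be positive")
    return math.ceil(processing_seconds / visibility_timeout)


def recommended_visibility_timeout(function_timeout):
    """The six-times-function-timeout rule of thumb quoted in the answer above."""
    return 6 * function_timeout


print(receive_count(90, 30))                 # timeout far too short
print(recommended_visibility_timeout(90))    # seconds
print(receive_count(90, recommended_visibility_timeout(90)))
```

Any result above 1 from `receive_count` means duplicate work, which is why the consumer should still be idempotent even after the timeout is fixed.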

Q7 (DVA-C02) — safe concurrent updates

Two Lambda invocations may update the same DynamoDB item concurrently; the application must prevent a lost update without a separate lock service. Which approach?

A. Enable DynamoDB Streams. B. Use a conditional write with a version attribute (optimistic locking). C. Switch the table to provisioned capacity. D. Use a global secondary index.

Answer: B. Optimistic locking — a version attribute plus a ConditionExpression that the version is unchanged — makes the write fail if another writer got there first, so the loser retries. No external lock needed.

Distractor analysis. A: Streams capture changes for downstream processing; they do not coordinate concurrent writers. C: capacity mode affects throughput/cost, not consistency between writers. D: a GSI is an alternate query path, irrelevant to write conflicts.
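Optimistic locking is easiest to understand by watching a stale write fail. The sketch below is an in-memory stand-in, not DynamoDB: `VersionedTable`, `put_if_version` and `ConditionFailed` are invented for the example and play the roles of a table, a write carrying a ConditionExpression on the version attribute, and the conditional-check failure you would handle in real code.

```python
class ConditionFailed(Exception):
    """Stand-in for a failed conditional write."""


class VersionedTable:
    """In-memory stand-in for a table whose items carry a 'version' attribute."""

    def __init__(self):
        self._items = {}

    def get(self, key):
        return dict(self._items[key])

    def put_if_version(self, key, new_item, expected_version):
        """Write only if the stored version still equals expected_version."""
        current = self._items.get(key)
        current_version = current["version"] if current else None
        if current_version != expected_version:
            raise ConditionFailed(key)
        self._items[key] = {**new_item, "version": (expected_version or 0) + 1}


def add_stock(table, key, amount, max_attempts=3):
    """Read, modify, conditionally write; the loser of a race re-reads and retries."""
    for _ in range(max_attempts):
        item = table.get(key)
        updated = {**item, "stock": item["stock"] + amount}
        try:
            table.put_if_version(key, updated, expected_version=item["version"])
            return
        except ConditionFailed:
            continue
    raise ConditionFailed(key)
```

A writer holding an old copy of the item cannot overwrite a newer one, because its expected version no longer matches. The only cost is a retry for the writer that lost the race.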

Q8 (SAP-C02) — org-wide guardrail

A platform team must guarantee that no account in a production OU can disable CloudTrail, regardless of any IAM permissions an account admin grants. What enforces this?

A. An IAM policy attached to every role in those accounts. B. A Service Control Policy on the production OU denying cloudtrail:StopLogging and cloudtrail:DeleteTrail. C. AWS Config rules detecting the change. D. A permission boundary on each admin user.

Answer: B. An SCP sets the maximum permissions for every principal in the OU; an explicit deny on the CloudTrail stop/delete actions cannot be overridden by any IAM grant inside the account. That is the only option that is preventive and unconditional across the OU.

Distractor analysis. A: per-role IAM policies can be changed or bypassed by an account admin and must be maintained on every principal, so they are not a guarantee. C: Config is detective: it tells you after the fact, it does not prevent the action. D: permission boundaries limit specific principals, not the whole account, and an admin could create principals outside the boundary or alter it; they are not an org-wide guarantee.
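The policy in option B is short enough to read in full. The sketch below writes it as a Python dictionary and adds a heavily simplified evaluator to show why the deny wins. Real policy evaluation has more steps (SCP allow lists, resource policies, boundaries, session policies), so treat `is_allowed` as an illustration of one rule only; the `Sid` value is made up.

```python
import json

# An SCP with an explicit deny on the two CloudTrail actions named in the question.
scp = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ProtectCloudTrail",  # hypothetical statement ID
            "Effect": "Deny",
            "Action": ["cloudtrail:StopLogging", "cloudtrail:DeleteTrail"],
            "Resource": "*",
        }
    ],
}


def is_allowed(action, iam_allowed_actions, scp_document):
    """Simplified rule: an explicit SCP deny wins; otherwise IAM must allow the action."""
    for statement in scp_document["Statement"]:
        if statement["Effect"] == "Deny" and action in statement["Action"]:
            return False
    return action in iam_allowed_actions


# Even an account admin whose IAM policy allows the action is blocked by the SCP.
admin_actions = {"cloudtrail:StopLogging", "cloudtrail:DescribeTrails"}
print(is_allowed("cloudtrail:StopLogging", admin_actions, scp))
print(is_allowed("cloudtrail:DescribeTrails", admin_actions, scp))
print(json.dumps(scp, indent=2))
```

The evaluator also shows the other half of the cheat-sheet line: the SCP never grants anything, so the second call succeeds only because IAM allowed it.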

Q9 (SAP-C02) — multi-Region low RTO

A global write-heavy application needs active-active in two Regions with a recovery point and time measured in seconds for its primary data store. Which data layer?

A. RDS Multi-AZ with a cross-Region read replica. B. DynamoDB global tables. C. Aurora with a cross-Region snapshot copy schedule. D. S3 Cross-Region Replication.

Answer: B. DynamoDB global tables provide multi-active, multi-Region replication with last-writer-wins conflict resolution — writes accepted in every Region, RPO/RTO in seconds. That matches “active-active, write-heavy, seconds”.

Distractor analysis. A: Multi-AZ is single-Region HA; a cross-Region read replica is read-only and promotion is manual, so it is neither active-active nor seconds. C: snapshot copies give an RPO of however often you copy (hours), not seconds, and are restore-based. D: S3 is object storage and asynchronous; it is not the application’s transactional write store. (Aurora Global Database would be the relational answer for low-RTO multi-Region, but the option given is snapshot copy, which is the trap.)

Q10 (DOP-C02) — automatic rollback

A team deploys an ECS service via CodeDeploy blue/green and wants the deployment to automatically roll back if error rates spike during the canary. What wires this up?

A. A manual approval action in CodePipeline. B. A CloudWatch alarm associated with the CodeDeploy deployment group so a breach triggers automatic rollback. C. A Lambda function polling logs after deployment. D. Enabling termination protection on the tasks.

Answer: B. CodeDeploy can be configured with CloudWatch alarms; if an alarm goes into ALARM during deployment, CodeDeploy halts and rolls back automatically — humans stay out of the loop, which is exactly the DevOps-professional instinct.

Distractor analysis. A a manual approval inserts a human and a delay; it does not react to error rates. C a polling Lambda is a fragile reinvention of a built-in feature and runs after the window. D termination protection prevents accidental task termination; it has nothing to do with rollback on metrics.
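
As a rough sketch of what answer B looks like in configuration terms, the snippet below builds the two deployment-group settings involved: an alarm list and an automatic-rollback policy. The alarm name, application name and group name are hypothetical, and the apply function is only an assumption about how you might push the settings with boto3; it is not called here.

```python
import json

def rollback_settings(alarm_name: str) -> dict:
    """Build the deployment-group settings that enable alarm-driven rollback."""
    return {
        "alarmConfiguration": {
            "enabled": True,
            "alarms": [{"name": alarm_name}],
        },
        "autoRollbackConfiguration": {
            "enabled": True,
            "events": ["DEPLOYMENT_FAILURE", "DEPLOYMENT_STOP_ON_ALARM"],
        },
    }

def apply(app: str, group: str, alarm_name: str) -> None:
    """Send the settings to CodeDeploy (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the sketch runs without the SDK
    boto3.client("codedeploy").update_deployment_group(
        applicationName=app,
        currentDeploymentGroupName=group,
        **rollback_settings(alarm_name),
    )

settings = rollback_settings("orders-5xx-rate-high")  # hypothetical alarm name
print(json.dumps(settings, indent=2))
```

Both halves matter: the alarm stops the deployment, and the DEPLOYMENT_STOP_ON_ALARM event is what turns that stop into a rollback.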

Q11 (DOP-C02) — self-healing remediation

When GuardDuty detects an EC2 instance making connections to a known crypto-mining endpoint, the platform must automatically isolate the instance with zero human action. What pattern achieves this?

A. A CloudWatch dashboard with an alarm emailing the on-call. B. An EventBridge rule matching the GuardDuty finding that triggers an SSM Automation runbook (or Lambda) to apply an isolation security group. C. AWS Config with a conformance pack. D. A scheduled Lambda that scans for findings hourly.

Answer: B. GuardDuty emits findings as events; an EventBridge rule on the finding type invokes an SSM Automation runbook / Lambda that swaps the instance into an isolation security group — event-driven, immediate, no human.

Distractor analysis. A emailing on-call is detection plus a human, not automatic remediation. C Config evaluates resource configuration compliance; it does not react to threat findings. D an hourly scan adds up to an hour of dwell time and reinvents the native event integration.
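
To make the event-driven part of answer B concrete, here is a possible event pattern for the rule, plus a deliberately simplified local matcher so you can see what it would and would not catch. The matcher handles only exact values and prefix conditions and is an illustration, not EventBridge's real matching engine; the sample event is trimmed and hypothetical.

```python
import json

# Event pattern for an EventBridge rule: GuardDuty findings whose type
# starts with "CryptoCurrency:EC2/".
PATTERN = {
    "source": ["aws.guardduty"],
    "detail-type": ["GuardDuty Finding"],
    "detail": {"type": [{"prefix": "CryptoCurrency:EC2/"}]},
}

def matches(pattern: dict, event: dict) -> bool:
    """Simplified local matcher: exact values and {"prefix": ...} only."""
    for key, expected in pattern.items():
        actual = event.get(key)
        if isinstance(expected, dict):          # nested pattern: recurse
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
            continue
        ok = False
        for cond in expected:                   # any listed condition may match
            if isinstance(cond, dict) and "prefix" in cond:
                ok = ok or (isinstance(actual, str) and actual.startswith(cond["prefix"]))
            else:
                ok = ok or actual == cond
        if not ok:
            return False
    return True

sample = {  # trimmed, hypothetical event
    "source": "aws.guardduty",
    "detail-type": "GuardDuty Finding",
    "detail": {"type": "CryptoCurrency:EC2/BitcoinTool.B!DNS"},
}
print(json.dumps(PATTERN))
print(matches(PATTERN, sample))
```

The rule's target would then be the SSM Automation runbook or Lambda function that swaps the security group.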

Q12 (CLF-C02) — shared responsibility

Under the AWS shared-responsibility model, which task is the customer’s responsibility?

A. Patching the hypervisor on the EC2 host. B. Configuring security groups and encrypting application data. C. Maintaining the physical security of data centres. D. Replacing failed disks in the storage fleet.

Answer: B. The customer is responsible for security in the cloud — IAM, security group rules, OS patching on EC2, and choosing/managing encryption of their data.

Distractor analysis. A, C and D all fall on AWS’s side of the model, security of the cloud: the hypervisor, physical security, and hardware lifecycle are managed by AWS.

Q13 (SAA-C03) — cost optimisation with a constraint

A nightly batch job runs for two hours, is fully fault-tolerant (checkpoints and resumes), and the team wants the lowest compute cost. Which purchasing option?

A. On-Demand Instances. B. A 3-year Standard Reserved Instance. C. Spot Instances. D. A 1-year Compute Savings Plan.

Answer: C. Spot is the cheapest (up to ~90% off On-Demand) and is appropriate precisely because the workload is interruptible and fault-tolerant — the deciding constraint in the stem.

Distractor analysis. B and D commit you to 1–3 years of baseline usage; for a job that runs two hours a night you would pay for capacity you do not use, so they are not lowest cost here. A On-Demand is more expensive than Spot and brings no benefit for a fault-tolerant job. The fault-tolerance is the signal that Spot is safe.
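
The arithmetic behind that elimination can be sketched in a few lines. Every price and discount below is a hypothetical figure chosen only to show the shape of the comparison; real rates vary by instance type, Region and time.

```python
# All prices and discounts below are hypothetical, for arithmetic only.
ON_DEMAND_HOURLY = 0.40   # USD per hour (assumed)
SPOT_DISCOUNT = 0.70      # assumed average Spot saving vs On-Demand
COMMIT_DISCOUNT = 0.40    # assumed saving from a 1- or 3-year commitment
JOB_HOURS_PER_DAY = 2
DAYS = 30

def monthly_costs() -> dict:
    used_hours = JOB_HOURS_PER_DAY * DAYS   # 60 hours actually used
    all_hours = 24 * DAYS                   # 720 hours covered by a commitment
    return {
        "on_demand": round(ON_DEMAND_HOURLY * used_hours, 2),
        "spot": round(ON_DEMAND_HOURLY * (1 - SPOT_DISCOUNT) * used_hours, 2),
        # A commitment is billed for every hour of the term, used or not.
        "committed": round(ON_DEMAND_HOURLY * (1 - COMMIT_DISCOUNT) * all_hours, 2),
    }

costs = monthly_costs()
for option, usd in sorted(costs.items(), key=lambda kv: kv[1]):
    print(f"{option:>10}: ${usd:.2f}")
```

Under these assumptions the commitment is the most expensive choice by a wide margin, because it pays for 720 hours to cover 60.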

Commonly-confused services — the exam tips

A surprising share of associate-level questions reduce to telling two similar services apart. Burn these distinctions in.

SQS vs SNS vs EventBridge

	SQS	SNS	EventBridge
Model	Queue (point-to-point, pull)	Pub/sub (push, fan-out)	Event bus / router (push, filtered)
Consumers	One consumer per message	Many subscribers, each gets a copy	Many targets via rules
Buffering/retry	Yes — durable buffer, DLQ	Limited; pair with SQS for durability	Retries + DLQ to targets
Filtering	No (consumer filters)	Subscription filter policies (message attributes or body)	Rich content-based pattern matching
Sources	Your producers	Your publishers	AWS services, SaaS partners, custom
Pick when	Decouple + smooth load + retries	Broadcast one message to many	Route/filter events, schedule, integrate SaaS

Tip: “fan-out durably” = SNS → SQS. “Route events from AWS/SaaS with filtering or on a schedule” = EventBridge. “Buffer work between a producer and a worker” = SQS.

ALB vs NLB (vs GWLB)

	ALB	NLB	GWLB
Layer	7 (HTTP/HTTPS)	4 (TCP/UDP/TLS)	3/4 (GENEVE)
Routing	Path, host, header, method	Connection (flow hash)	To/from appliances
Static IP	No (DNS only)	Yes (per-AZ; Elastic IP)	n/a
Latency	Higher (L7 processing)	Very low	Inline appliance
Pick when	Web apps, microservice routing, WebSockets	Extreme performance, TCP/UDP, static IP, TLS passthrough	Insert firewalls/IDS/IPS inline

Tip: static IP or non-HTTP or millions of low-latency connections → NLB. HTTP routing on path/host → ALB. Third-party security appliance inline → GWLB.

EBS vs EFS vs S3

	EBS	EFS	S3
Type	Block	File (NFS, POSIX)	Object
Access	One instance (one AZ)	Many instances, multi-AZ	Internet-scale, many clients
Durability/scope	AZ-scoped volume	Regional, elastic	11 nines, Regional buckets (bucket names are globally unique)
Use	Boot/database volumes	Shared app files, lift-and-shift	Backups, data lake, static assets, large objects

Tip: “attached to one instance / database disk” → EBS. “Several instances share the same files” → EFS. “Objects, web assets, backups, virtually unlimited” → S3.

Security group vs NACL

	Security group	Network ACL
Scope	Instance/ENI level	Subnet level
State	Stateful (return traffic auto-allowed)	Stateless (must allow return explicitly)
Rules	Allow only	Allow and deny
Evaluation	All rules evaluated	Numbered, lowest first, first match wins
Default	Deny inbound, allow outbound	Default allows all; custom denies all until rules added

Tip: need an explicit deny (e.g. block one IP) or subnet-wide control → NACL. Per-instance allow-listing with automatic return traffic → security group. The single most-tested fact: security groups are stateful, NACLs are stateless.
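
The evaluation difference in the table can be sketched as two small functions. This is a simplified model that looks only at source addresses (no ports, protocols or statefulness), and the rule sets are hypothetical.

```python
from ipaddress import ip_address, ip_network

def nacl_allows(rules, src: str) -> bool:
    """NACL: rules checked in ascending number; the first match decides."""
    for _number, cidr, action in sorted(rules):
        if ip_address(src) in ip_network(cidr):
            return action == "allow"
    return False  # the implicit final rule denies everything

def sg_allows(rules, src: str) -> bool:
    """Security group: allow-only rules; any match permits the traffic."""
    return any(ip_address(src) in ip_network(cidr) for cidr in rules)

# Hypothetical rule sets: block one address, allow everyone else.
nacl = [(100, "203.0.113.7/32", "deny"), (200, "0.0.0.0/0", "allow")]
sg = ["0.0.0.0/0"]

print(nacl_allows(nacl, "203.0.113.7"))   # blocked by rule 100
print(nacl_allows(nacl, "198.51.100.9"))  # falls through to rule 200
print(sg_allows(sg, "203.0.113.7"))       # a security group cannot carve out a deny
```

Swapping the two NACL rule numbers would let the blocked address through, which is why rule ordering is a favourite exam detail.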

Hands-on lab — a free, self-marking practice harness

You cannot replicate the real exam, but you can build the habit of timed, scenario-style practice for free. This lab spins up nothing chargeable — it uses the AWS Free Tier only to confirm the service facts behind a few questions, then a tiny local quiz loop to drill the elimination technique.

Step 1 — confirm a fact the exam will test (free, read-only). Verify that a default EC2 instance has no memory metric, which is Q4’s point:

# Lists CloudWatch metrics in the EC2 namespace for your account.
# You will see CPUUtilization, NetworkIn/Out, etc., but no memory metric:
# EC2 cannot see inside the guest OS. With the CloudWatch agent installed,
# mem_used_percent is published under the CWAgent namespace, not AWS/EC2.
aws cloudwatch list-metrics --namespace AWS/EC2 \
  --query "Metrics[].MetricName" --output text | tr '\t' '\n' | sort -u

Expected output: a list including CPUUtilization, NetworkIn, NetworkOut, StatusCheckFailed, and similar — and the conspicuous absence of any memory metric. list-metrics is a read-only call and is free.

Step 2 — confirm security-group statefulness conceptually (free). Describe the default security group and note it has an outbound allow-all and a restrictive inbound — return traffic for allowed inbound is automatic because the group is stateful:

aws ec2 describe-security-groups \
  --filters Name=group-name,Values=default \
  --query "SecurityGroups[0].{In:IpPermissions,Out:IpPermissionsEgress}"

Step 3 — build a local timed quiz loop (no AWS, no cost). Save a few questions as JSON and drill them with a timer so you practise the budget (about 2 minutes each):

cat > /tmp/quiz.json <<'JSON'
[
  {"q":"Durable fan-out to 3 independent consumers?","a":"SNS->SQS per consumer"},
  {"q":"Static IP + millions of low-latency TCP connections?","a":"NLB"},
  {"q":"Several instances share the same POSIX files?","a":"EFS"},
  {"q":"Stop any account in an OU from disabling CloudTrail?","a":"SCP deny"},
  {"q":"Who changed a security group rule, and when?","a":"CloudTrail"}
]
JSON

python3 - <<'PY'
import json, sys, time

# The script itself arrives on stdin (the heredoc), so input() would hit EOF.
# Re-point stdin at the terminal before prompting.
sys.stdin = open("/dev/tty")

with open("/tmp/quiz.json") as f:
    qs = json.load(f)

score = 0
for i, item in enumerate(qs, 1):
    start = time.monotonic()
    print(f"\nQ{i}: {item['q']}")
    input("  (think, then press Enter to reveal) ")
    print(f"  Answer: {item['a']}   [{time.monotonic() - start:.0f}s]")
    if input("  Did you get it right? (y/n) ").strip().lower() == "y":
        score += 1
print(f"\nScore: {score}/{len(qs)}. Aim for under 120s per question.")
PY

Validation: Step 1 should show no memory metric (proving Q4); Step 3 should report your score and per-question time. If any single question took more than ~120 seconds, that is a topic to revise.

Cleanup: there is nothing chargeable to delete — only remove the temp files:

rm -f /tmp/quiz.json

Cost note: every command here is either a read-only API call (list-metrics, describe-security-groups — free) or runs locally. The lab cost is 0. The lesson: build the timed-elimination habit before you pay for the real sitting.

Common mistakes & troubleshooting

Symptom	Likely cause	Fix
You “know the services” but fail practice scenarios	Answering on recognition, not by eliminating against the constraint	Read the last sentence first; find the qualifier (cost/overhead/latency); eliminate
Multiple-response questions score zero despite “mostly right”	Partial credit does not exist — one wrong selection voids the item	Treat each option as independent true/false; only select what you can defend
Running out of time on the professional exams	Spending too long on hard items early	Budget ~2 min (assoc: 130 min / 65 questions) / ~2.4 min (pro: 180 min / 75 questions); flag-and-move; never leave blanks
Confusing two similar services repeatedly (SQS/SNS, ALB/NLB)	Studied features in isolation, not side by side	Drill the comparison tables in this lesson until the distinctions are reflexive
Picking a “correct but not best” option	Ignoring the capitalised qualifier (MOST/LEAST/FEWEST)	Underline the qualifier mentally; choose among technically-correct survivors by it
Over-engineering the answer	Reaching for the most advanced service	Prefer the option that meets the stated requirement with the least overhead/cost
Booking too early and failing	No timed full-length practice at passing standard	Sit timed mocks; only book when consistently above the passing range
Panicking over “how many can I miss”	Misunderstanding scaled scoring	It is scaled 100–1000 with unscored items mixed in (15 on foundational/associate exams, 10 on professional); calibrate on practice %, not raw counts
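
The time budget in the table is simple division, sketched below for one associate and one professional exam. The durations and question counts are taken from the published exam guides as I understand them; check the current guide for your exam before relying on them.

```python
# (duration in minutes, number of questions) for two exams on the ladder
EXAMS = {"SAA-C03": (130, 65), "SAP-C02": (180, 75)}

def minutes_per_question(exam: str) -> float:
    minutes, questions = EXAMS[exam]
    return minutes / questions

for name in EXAMS:
    print(f"{name}: {minutes_per_question(name):.1f} min per question")
```

Any question that runs well past its share is borrowing time from a later one, which is the case for flag-and-move.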

Best practices

Study from the official exam guide first. The PDF for each exam lists the in-scope tasks and services by domain — it is the source of truth for what is testable, and the weightings tell you where to spend hours.
Build before you memorise. The portfolio projects in this course are the fastest route to durable knowledge; you remember a thing you deployed far longer than a thing you read.
Practise the format, timed. Do full-length, timed mocks under exam conditions; the time pressure on the professional exams is itself a skill.
Keep a “confused services” sheet. Every time two services trip you up, write the one-line distinction. The four pairs in this lesson are the usual suspects.
Read the qualifier, eliminate ruthlessly. Most questions have two defensible options; the qualifier and the constraints decide which.
Climb in order. SAA before SAP, DVA/SOA before DOP. The professional exams assume associate-level fluency and will punish gaps.
Schedule the exam to create a deadline. Open-ended study expands to fill infinite time; a booked date focuses it.

Security notes

Certification study is also security study — much of every blueprint is security, and the habits transfer to production:

Least privilege is the default exam-correct answer. When two options differ only by scope of permissions, the narrower one usually wins. Carry that instinct into real IAM work.
Prefer roles over long-lived keys. Questions that offer “store access keys on the instance” versus “attach an IAM role” almost always want the role; the same is true in production.
Preventive beats detective when the question says “guarantee” or “prevent”. SCPs and explicit denies for hard guarantees; Config/GuardDuty for detection. Know which the stem is asking for.
Encrypt by default and manage keys with KMS. Envelope encryption, KMS keys (formerly called CMKs), and “encryption at rest/in transit” appear across every exam, and they are table stakes in real systems.
Do not practise against shared or production accounts. Use a personal sandbox for any hands-on study, with a budget alarm, so a study mistake never touches anything that matters.
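
For the preventive case, a minimal sketch of an SCP that denies CloudTrail tampering is shown below as a Python dict rendered to JSON. The Sid and the exact action list are illustrative choices, not a complete hardening policy; attaching it to an OU is a separate Organizations step not shown here.

```python
import json

# Actions that stop, remove or alter a trail; extend the list to suit your controls.
TAMPER_ACTIONS = [
    "cloudtrail:StopLogging",
    "cloudtrail:DeleteTrail",
    "cloudtrail:UpdateTrail",
]

def deny_scp(actions: list[str]) -> dict:
    """Build an SCP document with a single explicit Deny statement."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyCloudTrailTampering",  # hypothetical statement ID
            "Effect": "Deny",
            "Action": actions,
            "Resource": "*",
        }],
    }

policy = deny_scp(TAMPER_ACTIONS)
print(json.dumps(policy, indent=2))
```

Because the statement is an explicit deny at the organization level, no IAM policy inside a member account can grant the actions back.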

Interview & exam questions

Q: When would you choose SNS→SQS over a direct SNS→Lambda subscription? A: When each consumer needs a durable buffer, independent retries, and a DLQ; SQS decouples consumer availability/throttling from delivery so a slow or failing consumer cannot lose messages.
Q: Security group vs NACL — give the two differences that decide most questions. A: Security groups are stateful (return traffic auto-allowed) and allow-only, applied at the instance/ENI; NACLs are stateless (must allow return explicitly), support deny rules, and apply at the subnet.
Q: A read replica versus Multi-AZ on RDS — what does each give you? A: Multi-AZ is a synchronous standby for availability/failover (not for reads); read replicas are asynchronous, for read scaling and can be cross-Region. They solve different problems and are often used together.
Q: What makes an SQS consumer process the same message twice, and how do you stop it? A: The visibility timeout is shorter than processing time, so the message reappears mid-processing. Raise the visibility timeout above the maximum processing time (for Lambda event source mappings, ~6× the function timeout). Standard queues are also at-least-once, so make the consumer idempotent as well.
Q: How do you guarantee no account in an OU can disable a control, regardless of IAM? A: A Service Control Policy with an explicit deny on the relevant actions; SCPs cap the maximum permissions for every principal in the OU and cannot be overridden by in-account IAM grants.
Q: Which AWS service answers “who changed this resource and when”? A: CloudTrail records the API call with the principal, time, and parameters. Config shows the state change and references the CloudTrail event, but the identity attribution is CloudTrail.
Q: NLB or ALB for a service needing a static IP and TCP at scale? A: NLB — layer 4, ultra-low latency, millions of connections, and a static IP per AZ (plus Elastic IP support). ALB exposes only a DNS name and is layer 7.
Q: How do you achieve automatic rollback on a bad deployment? A: Associate CloudWatch alarms with the CodeDeploy deployment group (or the Lambda/ECS deployment); an alarm breach during the canary/linear window halts and rolls back automatically — no human in the loop.
Q: Lazy loading vs write-through caching — trade-offs? A: Lazy loading populates the cache on a miss (cheap, resilient to cache failure, but can serve stale data and has a cold-cache penalty); write-through writes to the cache on every DB write (data is never stale, but every write costs a cache write and the cache fills with rarely-read data). Often combined with a TTL.
Q: Give the DR strategies in order of RTO/RPO and cost. A: Backup & Restore (cheapest, hours) → Pilot Light (core running, minutes) → Warm Standby (scaled-down full stack, low minutes) → Multi-site Active-Active (full scale in multiple Regions, seconds, most expensive).
Q: EBS, EFS, or S3 for a fleet of app servers that must share the same uploaded files? A: EFS — a shared, multi-AZ NFS filesystem many instances mount with POSIX semantics. EBS is single-instance/single-AZ block; S3 is object storage without filesystem semantics.
Q: How does scaled scoring work and how should it change your strategy? A: Scores are scaled 100–1000 against a fixed passing line per exam (700 for CLF-C02, 720 for the associates, 750 for the professionals), and some items are unscored pilots you cannot identify (15 on the foundational and associate exams, 10 on the professionals). Because you cannot know which count and there is no wrong-answer penalty, you answer every question and calibrate readiness on your practice percentage, not a raw “allowed misses” count.
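
The lazy-loading versus write-through answer above is easier to hold onto after seeing the stale read happen. The sketch below uses two plain dicts as hypothetical stand-ins for a database and a cache; it has no TTL or eviction.

```python
class Store:
    """Toy database plus cache; both are plain dicts (hypothetical stand-ins)."""

    def __init__(self):
        self.db, self.cache = {}, {}
        self.db_reads = 0

    def read_lazy(self, key):
        """Lazy loading: fill the cache only on a miss."""
        if key not in self.cache:
            self.db_reads += 1
            self.cache[key] = self.db[key]
        return self.cache[key]

    def write_db_only(self, key, value):
        """A write that bypasses the cache; lazy readers may now see stale data."""
        self.db[key] = value

    def write_through(self, key, value):
        """Write-through: update the database and the cache together."""
        self.db[key] = value
        self.cache[key] = value

s = Store()
s.write_db_only("price", 10)
first = s.read_lazy("price")   # miss: reads the database, caches 10
s.write_db_only("price", 12)
stale = s.read_lazy("price")   # hit: still the cached 10
s.write_through("price", 15)
fresh = s.read_lazy("price")   # hit: cache was updated by the write
print(first, stale, fresh, s.db_reads)
```

A TTL on cached entries bounds how long the stale value can survive, which is why the two strategies are usually combined.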

Quick check

You must fan out one order event to three consumers, each with its own durable buffer and retries. What do you use?
Which is stateful — a security group or a NACL?
A workload is fully fault-tolerant and you want the cheapest compute. Which purchasing option?
Which service tells you who disabled CloudTrail and when?
The exam score is scaled over what range, and should you ever leave a question blank?

Answers

SNS → SQS with one SQS queue per consumer (durable fan-out). A direct SNS→Lambda fan-out lacks the buffer.
The security group is stateful (return traffic auto-allowed); the NACL is stateless.
Spot Instances — the fault-tolerance is the signal that interruption is acceptable, and Spot is the cheapest.
AWS CloudTrail (the API audit; Config references the same CloudTrail event for attribution).
100–1000, with a fixed pass line and 10 to 15 unscored items depending on the exam. Never leave a question blank: there is no penalty for guessing.

Exercise

Pick the next exam you intend to sit and produce a one-page readiness plan of your own:

Download the official exam guide for your target (e.g. SAA-C03) and copy its domain table with weightings.
Self-score 1–5 per domain on honest current confidence, then multiply each gap by the domain weighting to get a priority score — study the highest-priority gaps first.
Write your own four “confused-services” cards for the pairs you personally muddle (beyond the four in this lesson).
Draft a four-week plan using the template below, ending with two timed full-length mocks.
Book the exam for the end of week four to create the deadline — and only move it if your timed mocks are not yet consistently above the passing range.
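
The priority score in step two can be worked through as follows. The weightings are the SAA-C03 domain percentages from the official exam guide; the confidence scores are made-up self-assessments, and treating the gap as 5 minus confidence is one reasonable reading of the step, not the only one.

```python
# SAA-C03 domain weightings (percent) paired with a 1-5 confidence score.
# The confidence scores are hypothetical self-assessments.
DOMAINS = {
    "Design Secure Architectures": (30, 4),
    "Design Resilient Architectures": (26, 2),
    "Design High-Performing Architectures": (24, 3),
    "Design Cost-Optimized Architectures": (20, 1),
}

def priorities(domains: dict) -> list:
    """Return (domain, score) pairs, highest priority first."""
    scored = [(name, (5 - conf) * weight) for name, (weight, conf) in domains.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)

plan = priorities(DOMAINS)
for name, score in plan:
    print(f"{score:>4}  {name}")
```

In this example the lowest-weighted domain comes out on top because the confidence gap is so large, which is the point of scoring the gap and the weighting together.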

Four-week study-plan template (adapt to your timeline and exam):

Week	Focus	Activity	Output
1	Highest-weighted/lowest-confidence domain	Read exam guide + course lessons; build one small lab	Notes + a deployed mini-project
2	Next two domains	Hands-on for each; start a confused-services sheet	Working examples + the sheet
3	Remaining domains + cross-cutting (security, cost)	Targeted reading; first timed mock	Mock score + error log
4	Weak areas from the mock	Re-drill errors; second timed mock; light review	Consistent pass-range mocks → sit the exam

Certification mapping

This lesson is the readiness layer for the entire AWS ladder: CLF-C02, SAA-C03, SOA-C02, DVA-C02, SAP-C02 and DOP-C02, with pointers into the specialties (ANS-C01, SCS-C02, MLS-C01/MLA-C01, DEA-C01). The domain checklists and weightings map directly to each official exam guide; the practice questions are tagged by exam; the confused-services section targets the associate level where those distinctions decide the most questions; and the scoring/format notes apply to every exam in the catalogue.

Glossary

Scaled score: a normalised result on a fixed 100–1000 range that accounts for slight difficulty differences between exam forms; compared against a fixed passing line, not a raw percentage.
Unscored (pilot) item: a trial question included in the exam to gather statistics; it does not affect your score and is not identified.
Multiple-response item: a question requiring you to select N correct options (“select TWO/THREE”); scored as a unit with no partial credit.
Distractor: an incorrect answer option deliberately designed to look plausible; analysing why it is wrong is the core study skill.
Qualifier: the capitalised word in a stem (MOST/LEAST/FEWEST/BEST) that decides between technically-correct options.
Domain weighting: the published percentage of an exam devoted to a domain; used to budget study time.
SCP (Service Control Policy): an Organizations policy that sets the maximum permissions for principals in an account/OU; it never grants, only constrains.
Optimistic locking: a concurrency technique using a version attribute and a conditional write so a stale update fails and retries — used here as a recurring DVA answer.
DLQ (dead-letter queue): a queue that receives messages a consumer repeatedly fails to process, isolating poison messages for inspection.
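
Of the glossary terms, optimistic locking benefits most from a worked example. The sketch below models it with an in-memory dict; the table, the StaleWrite exception and conditional_update are hypothetical names, and a real DynamoDB implementation would use a conditional write on the version attribute instead.

```python
class StaleWrite(Exception):
    """Raised when the stored version no longer matches the one that was read."""

table = {"order-1": {"status": "NEW", "version": 1}}  # toy in-memory table

def conditional_update(key: str, new_status: str, expected_version: int) -> None:
    """Write only if the stored version still matches what the caller read."""
    item = table[key]
    if item["version"] != expected_version:
        raise StaleWrite(f"expected v{expected_version}, found v{item['version']}")
    table[key] = {"status": new_status, "version": expected_version + 1}

seen = table["order-1"]["version"]            # two writers both read version 1
conditional_update("order-1", "PAID", seen)   # first writer succeeds
try:
    conditional_update("order-1", "CANCELLED", seen)  # second writer is stale
    outcome = "written"
except StaleWrite:
    outcome = "rejected"
print(table["order-1"], outcome)
```

The rejected writer is expected to re-read the item, reapply its change against the new version, and try again.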

Next steps

You now have the checklists, the question-working technique, the confused-services distinctions, and a plan. Turn study into proof by building the real thing: continue to the AWS Capstone — Build a Well-Architected Multi-Account Landing Zone + 3-Tier App, which exercises the SAA/SAP/DOP blueprints end to end. For depth on the topics the questions probe, revisit the AWS Architecting Ladder, Portfolio Projects, IAM Fundamentals, and the troubleshooting playbooks (single-service and multi-service RCA). Book the date, work the plan, and pass.