Certifications do not make you an architect, but they do two useful things: they force you to fill the gaps you have been quietly working around, and they give a hiring manager a cheap signal that you have breadth. The trouble with most prep is that it teaches you to recognise answers rather than to reason about scenarios — which is exactly what the modern AWS exams refuse to reward. Since the SAA-C03 and the C02-generation exams landed, almost every question is a short scenario: a workload, a constraint or two (cost, latency, operational overhead, compliance), and four plausible designs. You pass by eliminating the three that violate a constraint, not by remembering a definition.
This kit is built for that reality. It covers the whole AWS ladder — the foundational CLF-C02, the three associates SAA-C03 / SOA-C02 / DVA-C02, and the two professionals SAP-C02 / DOP-C02 — plus a touch of the specialties. For each exam you get the domain breakdown with official weightings, a one-page cheat sheet, and a bank of scenario questions with worked answers and an explanation of why each wrong option is wrong, because the distractor analysis is where the real learning lives. There is a dedicated section on the services examiners deliberately confuse you between (SQS vs SNS vs EventBridge, ALB vs NLB, EBS vs EFS vs S3, security group vs NACL), a recommended order, a study-plan template you can copy, and a plain explanation of how the scaled 100–1000 score actually works so you stop panicking about “how many can I get wrong”.
Learning objectives
By the end of this lesson you will be able to:
- Choose the right exam and the right order through the AWS certification ladder for your role and experience.
- Recite the domains and weightings for CLF-C02, SAA-C03, SOA-C02, DVA-C02, SAP-C02 and DOP-C02 and use them to budget study time.
- Decode the question formats (multiple-choice, multiple-response, and the newer ordering/matching/case-study styles) and apply a repeatable elimination technique.
- Work scenario questions the way the exam wants — reading for the deciding constraint and using distractor analysis to confirm the answer.
- Distinguish the commonly-confused services that decide a large fraction of associate-level questions.
- Build a realistic study plan and understand scaled scoring so you walk in calibrated, not anxious.
Prerequisites
You should already have hands-on AWS exposure roughly equal to the earlier lessons in this course: comfort with the global infrastructure and pricing model (AWS Cloud Fundamentals), IAM (IAM Fundamentals), core compute/storage/networking, and ideally the architecting and portfolio lessons (the Architecting Ladder and Portfolio Projects). This lesson is the readiness layer on top of that knowledge — it assumes the concepts and drills the exam. If a service name here is unfamiliar, treat it as a gap to close before booking the test. This is the final study lesson before the Well-Architected capstone.
The AWS certification ladder and how to choose
AWS groups its certifications into four tiers. The ladder is not strictly linear — there are no formal prerequisites any more — but there is a sensible order, and trying to skip rungs usually wastes money on a failed sitting.
| Tier | Exam | Code | Questions | Time | Cost (USD) | Who it is for |
|---|---|---|---|---|---|---|
| Foundational | Cloud Practitioner | CLF-C02 | 65 | 90 min | 100 | Anyone new to AWS; non-engineers; sales/PM/finance |
| Associate | Solutions Architect – Associate | SAA-C03 | 65 | 130 min | 150 | The default engineer/architect cert; broadest value |
| Associate | SysOps Administrator – Associate | SOA-C02 | 65 | 130 min | 150 | Operations, SRE, on-call; ops-heavy roles |
| Associate | Developer – Associate | DVA-C02 | 65 | 130 min | 150 | Application developers building on AWS SDKs/serverless |
| Professional | Solutions Architect – Professional | SAP-C02 | 75 | 180 min | 300 | Senior architects; complex, multi-account, migration scope |
| Professional | DevOps Engineer – Professional | DOP-C02 | 75 | 180 min | 300 | Senior platform/SRE; CI/CD, IaC, observability at scale |
| Specialty | Advanced Networking | ANS-C01 | 65 | 170 min | 300 | Network specialists; hybrid, Transit Gateway, Direct Connect |
| Specialty | Security | SCS-C02 | 65 | 170 min | 300 | Security engineers; the most broadly useful specialty |
| Specialty | Machine Learning | MLS-C01 | 65 | 180 min | 300 | ML engineers/data scientists (legacy flagship) |
| Associate | ML Engineer – Associate | MLA-C01 | 65 | 130 min | 150 | Operationalising ML on AWS (the newer, narrower cert) |
| Associate | Data Engineer – Associate | DEA-C01 | 65 | 130 min | 150 | Pipelines, analytics, Glue/Redshift/Kinesis |
A few practical notes. Question and time figures are the published targets. AWS includes unscored items in every exam (15 of the 65 questions on the foundational and associate exams, 10 of the 75 on the professional exams) and does not tell you which they are, so never assume that every question you see counts toward your score. Prices are the standard global fee in US dollars and vary by region and currency. The professional exams are a genuine step up in difficulty and reading load: 75 dense scenarios in 180 minutes is a little under two and a half minutes per question, with a long stem to parse each time.
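The per-question arithmetic is worth doing once for every exam you plan to sit. The following is a minimal sketch that uses the question counts and durations from the table above; the `review_minutes` parameter is an assumption of this example (time you choose to hold back for flagged questions), not an AWS rule.

```python
# (exam code, questions, minutes) copied from the ladder table above
EXAMS = [
    ("CLF-C02", 65, 90),
    ("SAA-C03", 65, 130),
    ("SOA-C02", 65, 130),
    ("DVA-C02", 65, 130),
    ("SAP-C02", 75, 180),
    ("DOP-C02", 75, 180),
]


def seconds_per_question(questions: int, minutes: int, review_minutes: int = 0) -> float:
    """Average seconds available per question after holding back review time."""
    return (minutes - review_minutes) * 60 / questions


for code, questions, minutes in EXAMS:
    budget = seconds_per_question(questions, minutes)
    print(f"{code}: {budget:.0f} s per question")
```

Holding back ten minutes for review on a professional exam, for example, drops the budget from 144 to 136 seconds per question.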
The diagram above lays the ladder out as a path: start at the foundational rung if you are new, take the associate that matches your job, then climb to the professional in the same column — most people go CLF (optional) → SAA → SAP, or SAA → DOP if their work is platform/CI-CD heavy, and bolt on the Security specialty when their role demands it.
Recommended order
- Brand new to cloud, non-engineer, or you want a confidence win: CLF-C02 first. It is genuinely foundational and cheap; engineers with a year of real AWS can usually skip it and start at SAA.
- Engineer/architect: SAA-C03 is the highest-leverage single certification in the whole catalogue. Do it first among the associates.
- Then specialise by role: add SOA-C02 if you operate systems (it is the only AWS exam that historically included hands-on labs, though AWS has paused those; see the SOA-C02 section below), or DVA-C02 if you build applications. Many people do SAA then one of SOA/DVA.
- Professional: SAP-C02 is the natural follow-on to SAA; DOP-C02 pairs naturally with DVA + SOA experience. The combination of DVA + SOA covers a large fraction of the DOP blueprint.
- Specialties last, driven by need. SCS-C02 (Security) is the most broadly valuable; ANS-C01 (Advanced Networking) if you live in hybrid connectivity; the data/ML certs if that is your discipline.
Question formats and how the exam is built
Every AWS exam draws from two basic item types, with a handful of newer styles appearing on some exams:
| Format | What it is | How to handle it |
|---|---|---|
| Multiple choice | One correct answer, three distractors | Read the stem for the deciding constraint, eliminate to one |
| Multiple response | “Select TWO” / “Select THREE”; you must select every correct option to earn credit | Treat each option as an independent true/false; partial selections score zero |
| Ordering | Arrange steps into the correct sequence | Anchor the first and last steps you are certain of, fill the middle |
| Matching | Pair items across two columns | Do the pairs you are sure of first; they constrain the rest |
| Case study | One scenario, several linked questions | Read the scenario once carefully; constraints carry across questions |
There is no penalty for wrong answers — the score is based only on correct ones — so never leave a question blank. Flag-and-review is available; mark anything that takes more than your per-question budget and come back. The exams are scenario-led: a typical SAA or professional stem describes a workload and then asks for the option that is “MOST cost-effective”, “with the LEAST operational overhead”, “MOST highly available”, or “with the FEWEST changes”. Those capitalised qualifiers are the whole question — two options are often both technically correct and only one satisfies the qualifier.
A repeatable technique that works across all of them:
- Read the last sentence first to find what is actually being asked and the deciding qualifier (cost / overhead / latency / availability / changes).
- Extract the hard constraints from the stem (compliance, RTO/RPO, “no servers to manage”, “existing on-prem”, a specific protocol).
- Eliminate options that violate a constraint — usually two fall immediately.
- Choose between the survivors using the qualifier, not your personal preference.
- Flag and move on if you are over budget; speed on easy questions buys time for hard ones.
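The steps above can be written down as a small filter-then-rank routine, which makes the order of operations explicit: hard constraints eliminate, and only then does the qualifier choose. This is an illustrative sketch; the `Option` class, the constraint strings and the scores are all hypothetical, invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Option:
    label: str
    # Hard constraints from the stem that this option breaks
    violates: set[str] = field(default_factory=set)
    # Rough rank per qualifier; lower is better (for example, less overhead)
    scores: dict[str, int] = field(default_factory=dict)


def pick(options, hard_constraints, qualifier):
    """Eliminate options that violate a hard constraint, then rank by the qualifier."""
    survivors = [o for o in options if not (o.violates & hard_constraints)]
    return min(survivors, key=lambda o: o.scores[qualifier]).label


# Hypothetical question: "... with the LEAST operational overhead"
options = [
    Option("A", violates={"no servers to manage"}, scores={"overhead": 2}),
    Option("B", scores={"overhead": 1}),
    Option("C", scores={"overhead": 3}),
    Option("D", violates={"data stays in-Region"}, scores={"overhead": 1}),
]
hard = {"no servers to manage", "data stays in-Region"}
answer = pick(options, hard, "overhead")
print(answer)
```

Note that D ties with B on the qualifier but never reaches the ranking step, which is the usual shape of a tempting distractor.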
CLF-C02 — Cloud Practitioner
Foundational breadth: cloud value, security and compliance basics, core services, and billing. No deep architecture, no code. The goal is vocabulary and the shape of the platform.
| Domain | Weighting |
|---|---|
| 1. Cloud Concepts | 24% |
| 2. Security and Compliance | 30% |
| 3. Cloud Technology and Services | 34% |
| 4. Billing, Pricing, and Support | 12% |
Checklist: shared-responsibility model (who secures what); the value proposition of cloud (capex→opex, elasticity, agility, global reach); the global infrastructure (Regions, AZs, edge locations); core compute (EC2, Lambda, ECS/EKS at a name level), storage (S3 classes, EBS, EFS), database (RDS, Aurora, DynamoDB), networking (VPC, Route 53, CloudFront); IAM basics (users, groups, roles, MFA, root-account protection); the Well-Architected Framework’s six pillars by name; pricing models (On-Demand, Reserved, Savings Plans, Spot, Free Tier) and what drives cost; Billing tools (Cost Explorer, Budgets, Cost and Usage Report); support plans (Basic, Developer, Business, Enterprise On-Ramp, Enterprise) and what each includes; AWS Organizations and consolidated billing; the Trusted Advisor and Health Dashboard at a concept level.
CLF-C02 cheat sheet
- Shared responsibility: AWS secures “of the cloud” (hardware, global infra, managed-service internals); you secure “in the cloud” (data, IAM, OS patching on EC2, encryption choices).
- Pricing levers: pay-as-you-go, pay less when you commit (Savings Plans/RIs), pay less by using more (volume tiers). Free Tier = 12-month, always-free, and trials.
- Support: Developer = business-hours email; Business = 24/7 + full Trusted Advisor; Enterprise = TAM + 15-min Sev1 SLA.
- Pillars: Operational Excellence, Security, Reliability, Performance Efficiency, Cost Optimization, Sustainability.
SAA-C03 — Solutions Architect Associate
The flagship associate. Heavily scenario-based around designing resilient, performant, secure and cost-optimised architectures.
| Domain | Weighting |
|---|---|
| 1. Design Secure Architectures | 30% |
| 2. Design Resilient Architectures | 26% |
| 3. Design High-Performing Architectures | 24% |
| 4. Design Cost-Optimized Architectures | 20% |
Checklist: IAM deep enough to reason about policy evaluation, roles, and cross-account access; S3 (storage classes, lifecycle, replication, encryption, Block Public Access, pre-signed URLs); EC2 + EBS + EFS + the purchasing options; Auto Scaling and ELB (ALB vs NLB vs GWLB); VPC design (subnets, route tables, IGW/NAT, security groups vs NACLs, VPC endpoints, peering, Transit Gateway at a concept level); RDS Multi-AZ vs read replicas, Aurora, DynamoDB (capacity modes, GSIs, global tables, DAX); decoupling with SQS/SNS/EventBridge; serverless (Lambda, API Gateway, Step Functions); CloudFront + Route 53 routing policies; caching (CloudFront, ElastiCache, DAX); encryption with KMS; resilience patterns (Multi-AZ, multi-Region, backup/restore vs pilot light vs warm standby vs active-active); cost tools and the cheapest-that-meets-requirements instinct.
SAA-C03 cheat sheet
- Decoupling: SQS = queue (pull, buffering, retries); SNS = pub/sub fan-out (push); EventBridge = event router with filtering and SaaS/AWS event sources. Combine SNS→SQS for durable fan-out.
- Load balancers: ALB = HTTP/HTTPS, layer 7, path/host routing; NLB = TCP/UDP/TLS, layer 4, ultra-low latency, static IP; GWLB = inline third-party appliances.
- Storage: S3 = object, internet-scale, 11 nines durability; EBS = block, single-AZ, attached to one instance (io2/gp3); EFS = NFS shared file, multi-AZ, many instances.
- Resilience tiers: Backup & Restore (cheapest, hours) → Pilot Light → Warm Standby → Multi-site Active-Active (priciest, seconds).
- RDS HA: Multi-AZ = synchronous standby for failover (availability); read replicas = asynchronous, for read scaling (and can be cross-Region).
- Cost: right-size first, then Savings Plans/RIs for steady state, Spot for fault-tolerant/stateless, S3 lifecycle to colder tiers.
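The SAA-C03 domain weightings listed for this exam can be turned directly into a study-time budget. The sketch below splits a total number of hours in proportion to the weightings; the 60-hour total is an assumption chosen for illustration, not a recommendation from AWS.

```python
# Weightings copied from the SAA-C03 domain table
SAA_C03_WEIGHTS = {
    "Design Secure Architectures": 30,
    "Design Resilient Architectures": 26,
    "Design High-Performing Architectures": 24,
    "Design Cost-Optimized Architectures": 20,
}


def budget_hours(weights, total_hours):
    """Split total_hours across domains in proportion to their weighting."""
    total_weight = sum(weights.values())
    return {
        domain: round(total_hours * weight / total_weight, 1)
        for domain, weight in weights.items()
    }


plan = budget_hours(SAA_C03_WEIGHTS, total_hours=60)
for domain, hours in plan.items():
    print(f"{hours:>5} h  {domain}")
```

The same function works for any of the other exams by swapping in that exam's weighting table; in practice you would then shift hours toward the domains where your practice scores are weakest.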
SOA-C02 — SysOps Administrator Associate
Operations focus: deploy, manage, and operate workloads; monitoring, automation, security and compliance, networking, and cost/performance. Historically the only AWS exam with hands-on lab questions; AWS has at times paused the labs, so confirm the current format on the official exam guide before you book.
| Domain | Weighting |
|---|---|
| 1. Monitoring, Logging, and Remediation | 20% |
| 2. Reliability and Business Continuity | 16% |
| 3. Deployment, Provisioning, and Automation | 18% |
| 4. Security and Compliance | 16% |
| 5. Networking and Content Delivery | 18% |
| 6. Cost and Performance Optimization | 12% |
Checklist: CloudWatch (metrics, custom metrics, alarms, composite alarms, dashboards, Logs, Logs Insights, agent); CloudTrail (management vs data events, organisation trails); AWS Config (rules, conformance packs, remediation); Systems Manager (Parameter Store, Session Manager, Run Command, Patch Manager, State Manager, Automation runbooks); EventBridge for automated remediation; Auto Scaling lifecycle hooks and health checks; backup (AWS Backup, EBS snapshots, RDS automated backups, lifecycle); CloudFormation (stacks, change sets, drift, StackSets, nested stacks); ELB health checks and access logs; VPC operations (flow logs, Reachability Analyzer, route troubleshooting); Trusted Advisor and Cost Explorer; quotas/Service Quotas; encryption operations and certificate management with ACM.
SOA-C02 cheat sheet
- Observability triad: CloudWatch (metrics/logs/alarms — what is happening), CloudTrail (API audit — who did what), Config (resource state/compliance — what changed and is it allowed).
- Default metrics gaps: memory and disk usage are not default EC2 metrics — install the CloudWatch agent.
- Automated remediation: CloudWatch alarm or EventBridge rule → SSM Automation / Lambda. Config rules can auto-remediate via SSM.
- Patching at scale: SSM Patch Manager + maintenance windows; no SSH required via Session Manager.
- CloudFormation safety: change sets to preview, drift detection to catch console edits, StackSets for multi-account/Region.
DVA-C02 — Developer Associate
For application developers building on AWS — serverless, SDK behaviour, deployment, security from the code’s point of view, and troubleshooting.
| Domain | Weighting |
|---|---|
| 1. Development with AWS Services | 32% |
| 2. Security | 26% |
| 3. Deployment | 24% |
| 4. Troubleshooting and Optimization | 18% |
Checklist: Lambda in depth (handlers, environment variables, layers, versions/aliases, concurrency — reserved vs provisioned, event source mappings, destinations, SnapStart); API Gateway (REST vs HTTP APIs, stages, authorizers, throttling, caching, mapping templates); DynamoDB for developers (queries vs scans, partition-key design, conditional writes, optimistic locking, DynamoDB Streams, TTL, transactions); S3 SDK patterns (multipart upload, pre-signed URLs, event notifications); messaging (SQS visibility timeout, long polling, DLQs; SNS; EventBridge); IAM for code (roles vs keys, STS, least privilege, resource policies); Secrets Manager and Parameter Store; the exponential backoff with jitter retry pattern and idempotency; X-Ray tracing and instrumentation; deployment with SAM, CodeDeploy (in-place vs blue/green, canary/linear), CodePipeline/CodeBuild; caching strategies (write-through vs lazy loading) with ElastiCache/DAX; envelope encryption with KMS.
DVA-C02 cheat sheet
- Lambda concurrency: reserved = caps/guarantees a function’s share; provisioned = pre-warmed to kill cold starts; SnapStart = faster cold starts for supported runtimes.
- DynamoDB: design the partition key for even access; use Query not Scan; conditional writes + version attribute = optimistic locking; Streams for change capture.
- SQS: visibility timeout must exceed processing time or you double-process; long polling reduces empty receives/cost; DLQ after maxReceiveCount.
- Retries: SDKs retry idempotent ops with exponential backoff and jitter; make your own writes idempotent.
- Deploy strategies: CodeDeploy canary (a % then the rest) vs linear (equal increments) vs all-at-once; Lambda aliases shift traffic between versions.
- Caching: lazy loading (cache on miss, can serve stale) vs write-through (write to cache on every DB write, never stale but more writes).
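The retry bullet in this cheat sheet is easier to remember once you have seen the pattern written out. Below is a generic sketch of exponential backoff with "full jitter" (a random wait between zero and an exponentially growing ceiling). It is not the AWS SDK's internal implementation; the function names, `base` and `cap` values are assumptions made for the example.

```python
import random
import time


def backoff_delays(retries, base=0.1, cap=5.0, rng=random.random):
    """Full-jitter delays: a random wait up to min(cap, base * 2**attempt) seconds."""
    return [rng() * min(cap, base * 2 ** attempt) for attempt in range(retries)]


def call_with_retries(operation, retries=5, sleep=time.sleep):
    """Try once immediately, then retry up to `retries` times with jittered waits."""
    last_error = None
    for delay in [0.0] + backoff_delays(retries):
        sleep(delay)
        try:
            return operation()
        except Exception as exc:  # a real client would catch only retryable errors
            last_error = exc
    raise last_error


# Hypothetical flaky call: fails twice (say, throttled), then succeeds
calls = {"count": 0}


def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("throttled")
    return "ok"


result = call_with_retries(flaky, sleep=lambda seconds: None)  # skip real sleeping
print(result, calls["count"])
```

Retrying is only safe when repeating the operation has no extra effect, which is why the same bullet pairs backoff with idempotent writes.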
SAP-C02 — Solutions Architect Professional
The senior architecture exam: complex, multi-account, organisation-scale design; migration and modernisation; cost control and continuous improvement across large estates. Long stems, multiple defensible options, decided by subtle constraints.
| Domain | Weighting |
|---|---|
| 1. Design Solutions for Organizational Complexity | 26% |
| 2. Design for New Solutions | 29% |
| 3. Continuous Improvement for Existing Solutions | 25% |
| 4. Accelerate Workload Migration and Modernization | 20% |
Checklist: multi-account strategy with Organizations, SCPs, Control Tower, landing zones, IAM Identity Center; cross-account networking (Transit Gateway, PrivateLink, Direct Connect, hybrid DNS with Route 53 Resolver); advanced resilience and DR (RTO/RPO trade-offs across the four strategies, multi-Region active-active with DynamoDB global tables and Aurora Global Database, Route 53 failover/latency/geo routing); migration tooling (Application Migration Service/MGN, DMS + SCT, DataSync, Snow family, Migration Hub, the 7 Rs); cost governance at scale (Savings Plans across accounts, consolidated billing, allocation tags, Budgets, anomaly detection); security at scale (GuardDuty, Security Hub, Macie, KMS multi-Region keys, Secrets Manager rotation); decoupling and modernisation (containers vs serverless trade-offs, event-driven, Step Functions); data strategy across analytics services. The skill being tested is judgement under competing constraints, not recall.
SAP-C02 cheat sheet
- Org guardrails: SCPs set the maximum permissions (they never grant); Control Tower bootstraps a multi-account landing zone with guardrails; IAM Identity Center for workforce SSO.
- Migration 7 Rs: Retire, Retain, Relocate, Rehost (lift-and-shift, MGN), Replatform (tweak, e.g. to RDS), Repurchase (move to SaaS), Refactor (re-architect, most effort/most cloud-native).
- DR by RTO/RPO: Backup & Restore (hours) → Pilot Light (tens of minutes) → Warm Standby (minutes) → Active-Active (seconds, highest cost).
- Hybrid connectivity: Direct Connect for consistent throughput/private; VPN for cheap/quick or DX backup; Transit Gateway as the hub; PrivateLink to expose a single service privately.
- Global data: DynamoDB global tables (multi-active) and Aurora Global Database (cross-Region read + fast promotion) for low-RTO multi-Region.
DOP-C02 — DevOps Engineer Professional
Senior platform/SRE: CI/CD, infrastructure as code, configuration management, monitoring/logging, incident response, and security automation across the SDLC. It overlaps heavily with the combination of DVA and SOA experience.
| Domain | Weighting |
|---|---|
| 1. SDLC Automation | 22% |
| 2. Configuration Management and IaC | 17% |
| 3. Resilient Cloud Solutions | 15% |
| 4. Monitoring and Logging | 15% |
| 5. Incident and Event Response | 14% |
| 6. Security and Compliance Automation | 17% |
Checklist: the CodeCatalyst/Code* suite (CodePipeline, CodeBuild, CodeDeploy, CodeArtifact) and integrating third-party CI; deployment strategies in depth (in-place, blue/green with ELB/Route 53, canary/linear for Lambda and ECS, all-at-once) and automated rollback on CloudWatch alarms; CloudFormation mastery (StackSets, nested stacks, change sets, drift, custom resources, hooks) plus CDK and an awareness of Terraform; configuration management with Systems Manager and OpsWorks legacy; resilience automation (Auto Scaling, multi-AZ/Region, AWS Backup, self-healing via EventBridge→SSM/Lambda); observability (CloudWatch metrics/alarms/Logs Insights, X-Ray, synthetics, ServiceLens, centralised logging); incident response (EventBridge patterns, Systems Manager Incident Manager, runbooks, GuardDuty→remediation); security automation (Config rules + auto-remediation, Security Hub, Secrets Manager rotation, IAM Access Analyzer, image scanning). The exam rewards automation that removes humans from the loop.
DOP-C02 cheat sheet
- Deployment safety: blue/green to cut over with instant rollback; canary/linear to limit blast radius; wire CloudWatch alarms to auto-rollback in CodeDeploy.
- IaC at scale: StackSets for multi-account/Region, change sets to preview, drift detection to catch manual edits, CDK for higher-level constructs.
- Self-healing: EventBridge rule on a failure event → SSM Automation runbook or Lambda; Config rule → auto-remediation.
- Centralised logging: CloudWatch Logs subscription filters → Kinesis Data Firehose → S3/OpenSearch; cross-account via a logging account.
- Secrets: Secrets Manager with rotation Lambdas; never bake credentials into AMIs, code, or CloudFormation parameters in plaintext.
A touch of the specialties
You will not study these from this kit, but a SAP/DOP candidate should recognise where they begin:
| Specialty | Code | The one-line scope | The signature services |
|---|---|---|---|
| Advanced Networking | ANS-C01 | Hybrid + complex VPC connectivity | Transit Gateway, Direct Connect, Route 53 Resolver, Global Accelerator, Network Firewall |
| Security | SCS-C02 | Detective + preventive + data protection | GuardDuty, Security Hub, Macie, KMS, IAM, WAF/Shield, Detective |
| Machine Learning | MLS-C01 | End-to-end ML lifecycle | SageMaker, data engineering for ML, modelling, ops |
| ML Engineer – Associate | MLA-C01 | Operationalising ML | SageMaker pipelines, deployment, monitoring |
| Data Engineer – Associate | DEA-C01 | Pipelines and analytics | Glue, Redshift, Kinesis, EMR, Lake Formation, Athena |
SCS-C02 (Security) is the highest-value addition for most engineers because security questions leak into every other exam. ANS-C01 is worth it if hybrid networking is your day job. The data/ML certs are discipline-specific.
Scenario practice questions with explained answers
This is the core of the kit. Work each one cold: read the stem, decide your answer, then read the explanation. Pay attention to the distractor analysis — being able to say why a wrong option is wrong is the skill the exam tests.
Q1 (SAA-C03) — decoupling and fan-out
A retail application must, on each new order, (a) update inventory, (b) email the customer, and (c) push the event to an analytics pipeline, each independently and durably, and with each consumer able to retry without affecting the others. Which design is most appropriate?
A. Publish the order to an SNS topic; subscribe three SQS queues; each downstream service polls its own queue.
B. Write the order to a single SQS queue that all three services poll.
C. Invoke three Lambda functions synchronously from the order service.
D. Publish to an SNS topic with three direct Lambda subscriptions.
Answer: A. SNS fan-out into per-consumer SQS queues gives each consumer its own durable buffer, independent retries, and a DLQ — the classic durable fan-out pattern.
Distractor analysis. B is wrong because a single shared queue means each message is consumed once by one poller; the three services would compete for the same messages, not each get a copy. C couples the order service’s latency and availability to all three downstreams and has no durability — a failed downstream fails the order. D fans out but loses the buffer: if a Lambda subscriber throttles or errors past its retries, the message can be lost; SQS between SNS and the consumer is what makes it durable and independently retryable.
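The difference between options A and B in this question is purely about who receives a copy of each message, and a toy in-memory model shows it. This sketch does not call AWS; the round-robin hand-out in `shared_queue` is a simplification (real SQS makes no such ordering promise), and the consumer names are made up for the example.

```python
from collections import deque


def shared_queue(orders, consumers):
    """Option B: one queue, competing consumers. Each message goes to exactly one."""
    queue = deque(orders)
    received = {name: [] for name in consumers}
    turn = 0
    while queue:
        # Round-robin stands in for "whichever poller gets there first"
        received[consumers[turn % len(consumers)]].append(queue.popleft())
        turn += 1
    return received


def fan_out(orders, consumers):
    """Option A: a topic copies every message into one queue per subscriber."""
    queues = {name: deque() for name in consumers}
    for order in orders:
        for queue in queues.values():
            queue.append(order)
    return {name: list(queue) for name, queue in queues.items()}


consumers = ["inventory", "email", "analytics"]
orders = ["order-1", "order-2", "order-3"]

print(shared_queue(orders, consumers))
print(fan_out(orders, consumers))
```

With the shared queue each service sees only a third of the orders; with fan-out every service has its own full backlog to drain and retry at its own pace.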
Q2 (SAA-C03) — load balancer choice
A multiplayer game backend needs a load balancer that handles millions of TCP connections at ultra-low latency and must expose a static IP for an allow-list partners maintain. Which load balancer?
A. Application Load Balancer
B. Network Load Balancer
C. Gateway Load Balancer
D. Classic Load Balancer
Answer: B. NLB operates at layer 4 (TCP/UDP/TLS), scales to millions of connections with very low latency, and provides a static IP per AZ (and supports Elastic IPs) — ideal for partner allow-listing.
Distractor analysis. A is layer 7 HTTP/HTTPS; it does not expose a static IP (only a DNS name) and adds latency unsuited to raw TCP gaming traffic. C is for inserting third-party network appliances inline, not for serving application traffic. D is legacy and should not be chosen for new designs.
Q3 (SAA-C03) — shared file storage
Three EC2 instances across two Availability Zones must read and write the same files concurrently with POSIX semantics. Which storage service?
A. Amazon EBS io2 volume attached to all three instances
B. Amazon S3 mounted via the SDK
C. Amazon EFS
D. Instance store
Answer: C. EFS is a managed, multi-AZ NFS file system that many instances can mount and share with POSIX semantics — exactly the requirement.
Distractor analysis. A is wrong: a standard EBS volume attaches to a single instance in a single AZ (Multi-Attach exists only for io1/io2 within one AZ and needs a cluster-aware filesystem — it does not span AZs). B is object storage, not a POSIX filesystem; concurrent read/write file semantics do not apply. D is ephemeral, instance-local, and lost on stop — never shared.
Q4 (SOA-C02) — missing metrics
An operator needs a CloudWatch alarm on memory utilisation of a fleet of EC2 instances but cannot find the metric. What is the correct fix?
A. Enable detailed monitoring on the instances.
B. Install and configure the CloudWatch agent to publish a memory metric.
C. Raise a support case to enable the metric.
D. Use Compute Optimizer instead.
Answer: B. Memory (and disk) are guest-OS metrics that AWS cannot see from the hypervisor; you must install the CloudWatch agent to publish them as custom metrics.
Distractor analysis. A is wrong because detailed monitoring only changes EC2 metric granularity from 5-minute to 1-minute; it does not add memory. C is unnecessary: this is a configuration task, not an account flag. D is wrong because Compute Optimizer gives right-sizing recommendations, not a real-time alarmable memory metric.
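For reference, the fix in this question comes down to a small agent configuration file. The sketch below builds a minimal CloudWatch agent metrics section as a Python dictionary and prints it as JSON; the choice of measurements and the single `/` mount point are assumptions for illustration, and a real fleet would usually distribute the file through Systems Manager rather than by hand.

```python
import json

# Minimal metrics section for the CloudWatch agent (Linux).
# Memory and disk are collected inside the guest OS, which is why
# they need the agent in the first place.
agent_config = {
    "metrics": {
        # Tag each datapoint with the instance it came from
        "append_dimensions": {"InstanceId": "${aws:InstanceId}"},
        "metrics_collected": {
            "mem": {"measurement": ["mem_used_percent"]},
            "disk": {"measurement": ["used_percent"], "resources": ["/"]},
        },
    }
}

print(json.dumps(agent_config, indent=2))
```

Once the agent publishes `mem_used_percent`, the alarm is created on that custom metric exactly as it would be on a built-in one.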
Q5 (SOA-C02) — who changed it
A security group rule changed unexpectedly and the operator must find who made the change and when. Which service answers this?
A. Amazon CloudWatch Logs
B. AWS CloudTrail
C. AWS Config
D. VPC Flow Logs
Answer: B. CloudTrail records the API call — the principal, the time, the parameters — for the AuthorizeSecurityGroupIngress/Revoke... action. That is the “who did what, when” audit.
Distractor analysis. C is close but not the best answer: Config tells you the security group’s state changed and can show a before/after configuration item, but the authoritative actor attribution is CloudTrail (Config even references the CloudTrail event). A holds application and system logs, not the AWS API audit. D captures network traffic metadata, not control-plane changes. (In practice Config and CloudTrail are used together, but the who is CloudTrail.)
Q6 (DVA-C02) — duplicate processing
A Lambda consumer reading from SQS occasionally processes the same message twice. The processing takes up to 90 seconds. What is the most likely cause and fix?
A. The DLQ is misconfigured; add a DLQ.
B. The queue is standard not FIFO; switch to FIFO.
C. The visibility timeout is shorter than the processing time; increase it.
D. Long polling is disabled; enable it.
Answer: C. If processing (90 s) exceeds the visibility timeout, the message becomes visible again and a second consumer picks it up — classic double-processing. Set the visibility timeout safely above the max processing time (and ideally 6× the function timeout for Lambda event source mappings).
Distractor analysis. A is wrong because a DLQ handles poison messages after repeated failures; it does not stop a message that is still being processed from being redelivered early. B misses the root cause: FIFO guarantees ordering and exactly-once processing within the deduplication window, but the problem here is the timeout, and switching queue type is a heavier, often unnecessary change that also brings throughput limits. D is unrelated: long polling reduces empty receives and cost and has nothing to do with redelivery.
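The timing behind this question can be checked with a few lines of arithmetic. The following is a deliberately simplified model, not SQS itself: it assumes an idle consumer is always ready to pick up the message the moment it becomes visible again, and that the message is deleted when the first consumer finishes.

```python
def receive_times(processing_seconds, visibility_timeout):
    """Times (in seconds) at which one message is handed to a consumer
    before the first consumer finishes and deletes it."""
    times = []
    t = 0
    while t < processing_seconds:
        times.append(t)
        # The message reappears each time the visibility timeout lapses
        t += visibility_timeout
    return times


# 90 s of work with a 30 s timeout: the message is handed out three times
print(receive_times(90, 30))

# Timeout comfortably above the processing time: a single delivery
print(receive_times(90, 120))
```

When the timeout exactly equals the processing time the outcome is a race in practice, which is why the advice is to leave headroom rather than match the two values.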
Q7 (DVA-C02) — safe concurrent updates
Two Lambda invocations may update the same DynamoDB item concurrently; the application must prevent a lost update without a separate lock service. Which approach?
A. Enable DynamoDB Streams.
B. Use a conditional write with a version attribute (optimistic locking).
C. Switch the table to provisioned capacity.
D. Use a global secondary index.
Answer: B. Optimistic locking — a version attribute plus a ConditionExpression that the version is unchanged — makes the write fail if another writer got there first, so the loser retries. No external lock needed.
Distractor analysis. A is wrong because Streams capture changes for downstream processing; they do not coordinate concurrent writers. C is wrong because capacity mode affects throughput and cost, not consistency between writers. D is wrong because a GSI is an alternate query path, irrelevant to write conflicts.
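To make the version check concrete, here is an in-memory sketch of optimistic locking. It imitates the behaviour of a conditional write rather than calling DynamoDB: the `Table` class, `put_if_version` method and `ConditionalCheckFailed` exception are hypothetical stand-ins for a real table, a `ConditionExpression` on the version attribute, and the error DynamoDB returns when that condition fails.

```python
class ConditionalCheckFailed(Exception):
    """Stand-in for the error a failed conditional write would raise."""


class Table:
    """Toy key-value table that supports a version-checked write."""

    def __init__(self):
        self.items = {}

    def get(self, key):
        return dict(self.items[key])  # copy, like reading an item

    def put_if_version(self, key, new_item, expected_version):
        # The write succeeds only if nobody has bumped the version since we read
        if self.items[key]["version"] != expected_version:
            raise ConditionalCheckFailed(key)
        self.items[key] = {**new_item, "version": expected_version + 1}


def add_stock(table, key, delta, attempts=3):
    """Read, modify, conditionally write; re-read and retry on conflict."""
    for _ in range(attempts):
        item = table.get(key)
        updated = {**item, "stock": item["stock"] + delta}
        try:
            table.put_if_version(key, updated, item["version"])
            return
        except ConditionalCheckFailed:
            continue
    raise RuntimeError("gave up after repeated conflicts")


table = Table()
table.items["sku-1"] = {"stock": 10, "version": 1}
add_stock(table, "sku-1", -1)
print(table.items["sku-1"])
```

The losing writer never overwrites the winner's change; it re-reads the fresh item and applies its own delta on top.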
Q8 (SAP-C02) — org-wide guardrail
A platform team must guarantee that no account in a production OU can disable CloudTrail, regardless of any IAM permissions an account admin grants. What enforces this?
A. An IAM policy attached to every role in those accounts.
B. A Service Control Policy on the production OU denying cloudtrail:StopLogging and cloudtrail:DeleteTrail.
C. AWS Config rules detecting the change.
D. A permission boundary on each admin user.
Answer: B. An SCP sets the maximum permissions for every principal in the OU; an explicit deny on the CloudTrail stop/delete actions cannot be overridden by any IAM grant inside the account. That is the only option that is preventive and unconditional across the OU.
Distractor analysis. A is not a guarantee: per-role IAM policies can be changed or bypassed by an account admin and must be maintained on every principal. C is detective: Config tells you after the fact and does not prevent the action. D is too narrow: permission boundaries limit specific principals, not the whole account, and an admin could create principals outside the boundary or alter it.
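A sketch of what such an SCP might look like helps fix the idea that it denies rather than grants. The policy below uses the two actions named in option B; the `Sid` value is made up, and the `is_allowed` function is a toy model of one rule only (an explicit SCP deny beats any IAM allow), not a reimplementation of AWS policy evaluation.

```python
import json

# Illustrative SCP for the production OU
scp = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ProtectCloudTrail",
            "Effect": "Deny",
            "Action": ["cloudtrail:StopLogging", "cloudtrail:DeleteTrail"],
            "Resource": "*",
        }
    ],
}


def is_allowed(action, iam_allows, scp_policy):
    """Toy check: an explicit deny in the SCP wins over any IAM allow."""
    for statement in scp_policy["Statement"]:
        if statement["Effect"] == "Deny" and action in statement["Action"]:
            return False
    return action in iam_allows


# An account admin who has granted themselves broad CloudTrail permissions
admin_allows = {"cloudtrail:StopLogging", "cloudtrail:DescribeTrails"}

print(json.dumps(scp, indent=2))
print(is_allowed("cloudtrail:StopLogging", admin_allows, scp))
print(is_allowed("cloudtrail:DescribeTrails", admin_allows, scp))
```

However generous the IAM grant inside the account, the denied actions stay denied, which is the property the question asks for.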
Q9 (SAP-C02) — multi-Region low RTO
A global write-heavy application needs active-active in two Regions with a recovery point and time measured in seconds for its primary data store. Which data layer?
A. RDS Multi-AZ with a cross-Region read replica.
B. DynamoDB global tables.
C. Aurora with a cross-Region snapshot copy schedule.
D. S3 Cross-Region Replication.
Answer: B. DynamoDB global tables provide multi-active, multi-Region replication with last-writer-wins conflict resolution — writes accepted in every Region, RPO/RTO in seconds. That matches “active-active, write-heavy, seconds”.
Distractor analysis. A is wrong because Multi-AZ is single-Region HA, and a cross-Region read replica is read-only with manual promotion: not active-active and not seconds. C is wrong because snapshot copies give an RPO of however often you copy (hours), not seconds, and recovery is restore-based. D is wrong because S3 is object storage with asynchronous replication; it is not the application’s transactional write store. (Aurora Global Database would be the relational answer for low-RTO multi-Region, but the option given is snapshot copy, which is the trap.)
Q10 (DOP-C02) — automatic rollback
A team deploys an ECS service via CodeDeploy blue/green and wants the deployment to automatically roll back if error rates spike during the canary. What wires this up?
A. A manual approval action in CodePipeline.
B. A CloudWatch alarm associated with the CodeDeploy deployment group so a breach triggers automatic rollback.
C. A Lambda function polling logs after deployment.
D. Enabling termination protection on the tasks.
Answer: B. CodeDeploy can be configured with CloudWatch alarms; if an alarm goes into ALARM during deployment, CodeDeploy halts and rolls back automatically — humans stay out of the loop, which is exactly the DevOps-professional instinct.
Distractor analysis. A: a manual approval inserts a human and a delay; it does not react to error rates. C: a polling Lambda is a fragile reinvention of a built-in feature and runs after the deployment window. D: termination protection prevents accidental task termination; it has nothing to do with rollback on metrics.
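To make option B concrete, the sketch below builds keyword arguments in the shape that boto3's CodeDeploy `update_deployment_group` call accepts. It only constructs and prints the dictionary, so it runs without boto3 or credentials. The application, deployment group, and alarm names are hypothetical.

```python
import json

# Hypothetical resource names; only the structure matters here.
deployment_group_update = {
    "applicationName": "orders-app",
    "currentDeploymentGroupName": "orders-ecs-bluegreen",
    # Alarms CodeDeploy watches while the deployment is in progress.
    "alarmConfiguration": {
        "enabled": True,
        "alarms": [{"name": "orders-5xx-rate-high"}],
    },
    # Roll back automatically on failure or when a watched alarm fires.
    "autoRollbackConfiguration": {
        "enabled": True,
        "events": ["DEPLOYMENT_FAILURE", "DEPLOYMENT_STOP_ON_ALARM"],
    },
}

print(json.dumps(deployment_group_update, indent=2))
```

The two blocks work as a pair: `alarmConfiguration` tells CodeDeploy what to watch, and the `DEPLOYMENT_STOP_ON_ALARM` event tells it to roll back rather than merely stop.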
Q11 (DOP-C02) — self-healing remediation
When GuardDuty detects an EC2 instance making connections to a known crypto-mining endpoint, the platform must automatically isolate the instance with zero human action. What pattern achieves this?
A. A CloudWatch dashboard with an alarm emailing the on-call.
B. An EventBridge rule matching the GuardDuty finding that triggers an SSM Automation runbook (or Lambda) to apply an isolation security group.
C. AWS Config with a conformance pack.
D. A scheduled Lambda that scans for findings hourly.
Answer: B. GuardDuty emits findings as events; an EventBridge rule on the finding type invokes an SSM Automation runbook / Lambda that swaps the instance into an isolation security group — event-driven, immediate, no human.
Distractor analysis. A: emailing on-call is detection plus a human, not automatic remediation. C: Config evaluates resource configuration compliance; it does not react to threat findings. D: an hourly scan adds up to an hour of dwell time and reinvents the native event integration.
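The rule in option B hinges on an event pattern. The sketch below shows one plausible pattern for crypto-mining findings on EC2, together with a deliberately simplified local matcher so the example can run offline. The `matches` function is a stand-in written for this lesson and supports only exact values and `prefix`; it is not how EventBridge itself is implemented, and the sample events are made up.

```python
# Pattern: GuardDuty findings whose type starts with the EC2 crypto-mining family.
event_pattern = {
    "source": ["aws.guardduty"],
    "detail-type": ["GuardDuty Finding"],
    "detail": {"type": [{"prefix": "CryptoCurrency:EC2/"}]},
}


def matches(pattern: dict, event: dict) -> bool:
    """Simplified local matcher: exact values and {'prefix': ...} only."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested object: recurse into the corresponding part of the event.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
            continue
        ok = False
        for candidate in expected:
            if isinstance(candidate, dict) and "prefix" in candidate:
                ok = ok or (isinstance(actual, str) and actual.startswith(candidate["prefix"]))
            else:
                ok = ok or actual == candidate
        if not ok:
            return False
    return True


sample_finding = {
    "source": "aws.guardduty",
    "detail-type": "GuardDuty Finding",
    "detail": {"type": "CryptoCurrency:EC2/BitcoinTool.B!DNS", "severity": 8},
}
unrelated_event = {
    "source": "aws.ec2",
    "detail-type": "EC2 Instance State-change Notification",
    "detail": {"state": "stopped"},
}

print(matches(event_pattern, sample_finding))
print(matches(event_pattern, unrelated_event))
```

Matching on a prefix rather than one exact finding type keeps the rule useful when GuardDuty reports a different variant of the same threat family.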
Q12 (CLF-C02) — shared responsibility
Under the AWS shared-responsibility model, which task is the customer’s responsibility?
A. Patching the hypervisor on the EC2 host.
B. Configuring security groups and encrypting application data.
C. Maintaining the physical security of data centres.
D. Replacing failed disks in the storage fleet.
Answer: B. The customer is responsible for security in the cloud — IAM, security group rules, OS patching on EC2, and choosing/managing encryption of their data.
Distractor analysis. A, C and D are all AWS’s responsibility (security of the cloud): the hypervisor, physical security, and hardware lifecycle are managed by AWS.
Q13 (SAA-C03) — cost optimisation with a constraint
A nightly batch job runs for two hours, is fully fault-tolerant (checkpoints and resumes), and the team wants the lowest compute cost. Which purchasing option?
A. On-Demand Instances.
B. A 3-year Standard Reserved Instance.
C. Spot Instances.
D. A 1-year Compute Savings Plan.
Answer: C. Spot is the cheapest (up to ~90% off On-Demand) and is appropriate precisely because the workload is interruptible and fault-tolerant — the deciding constraint in the stem.
Distractor analysis. B and D: both commit you to 1–3 years of baseline usage; for a job that runs two hours a night you would pay for capacity you do not use, so they are not the lowest cost here. A: On-Demand is more expensive than Spot and brings no benefit for a fault-tolerant job. The fault tolerance is the signal that Spot is safe.
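The reasoning about paying for unused capacity is easier to trust with numbers. The sketch below uses entirely hypothetical prices and discounts, chosen only to show the shape of the arithmetic; real rates vary by instance type, Region, and time.

```python
HOURS_PER_MONTH = 730     # average hours in a month
JOB_HOURS = 2 * 30        # two hours a night for 30 nights

# Hypothetical figures, NOT real AWS prices.
on_demand_rate = 0.10         # USD per instance-hour
spot_discount = 0.70          # Spot assumed 70% below On-Demand
commitment_discount = 0.40    # reservation assumed 40% below On-Demand

# On-Demand and Spot bill only for the hours the job actually runs.
on_demand_cost = on_demand_rate * JOB_HOURS
spot_cost = on_demand_rate * (1 - spot_discount) * JOB_HOURS

# A Reserved Instance is billed for every hour of the term, used or not.
reserved_cost = on_demand_rate * (1 - commitment_discount) * HOURS_PER_MONTH

for label, cost in [("On-Demand", on_demand_cost), ("Spot", spot_cost), ("Reserved", reserved_cost)]:
    print(f"{label:10s} ${cost:.2f} per month")
```

With a job that runs for 60 of the month's 730 hours, the discounted reservation still costs several times the On-Demand bill under these assumptions. The commitment only pays off when utilisation is high.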
Commonly-confused services — the exam tips
A surprising share of associate-level questions reduce to telling two similar services apart. Burn these distinctions in.
SQS vs SNS vs EventBridge
| | SQS | SNS | EventBridge |
|---|---|---|---|
| Model | Queue (point-to-point, pull) | Pub/sub (push, fan-out) | Event bus / router (push, filtered) |
| Consumers | One consumer per message | Many subscribers, each gets a copy | Many targets via rules |
| Buffering/retry | Yes — durable buffer, DLQ | Limited; pair with SQS for durability | Retries + DLQ to targets |
| Filtering | No (consumer filters) | Subscription filter policies (message attributes or body) | Rich content-based pattern matching |
| Sources | Your producers | Your publishers | AWS services, SaaS partners, custom |
| Pick when | Decouple + smooth load + retries | Broadcast one message to many | Route/filter events, schedule, integrate SaaS |
Tip: “fan-out durably” = SNS → SQS. “Route events from AWS/SaaS with filtering or on a schedule” = EventBridge. “Buffer work between a producer and a worker” = SQS.
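The "fan-out durably" tip can be modelled in a few lines. The sketch below is an in-memory toy, with a hypothetical `Topic` class standing in for SNS and one `deque` per subscriber standing in for SQS; it shows why a consumer that is down does not lose the message.

```python
from collections import deque


class Topic:
    """Toy pub/sub topic that copies each message into every subscribed queue."""

    def __init__(self):
        self.queues = {}

    def subscribe(self, name: str) -> deque:
        # Each subscriber gets its own independent buffer.
        self.queues[name] = deque()
        return self.queues[name]

    def publish(self, message: str) -> None:
        for queue in self.queues.values():
            queue.append(message)


topic = Topic()
billing = topic.subscribe("billing")
shipping = topic.subscribe("shipping")
analytics = topic.subscribe("analytics")

topic.publish("order-1001 created")

# Billing and shipping consume immediately; analytics is down and reads nothing yet.
billing_got = billing.popleft()
shipping_got = shipping.popleft()

print(billing_got, "|", shipping_got, "| analytics backlog:", len(analytics))
```

The analytics copy waits in its own queue until that consumer recovers, and the other two consumers are unaffected. A push-only subscription has no such buffer to fall back on.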
ALB vs NLB (vs GWLB)
| | ALB | NLB | GWLB |
|---|---|---|---|
| Layer | 7 (HTTP/HTTPS) | 4 (TCP/UDP/TLS) | 3/4 (GENEVE) |
| Routing | Path, host, header, method | Connection (flow hash) | To/from appliances |
| Static IP | No (DNS only) | Yes (per-AZ; Elastic IP) | n/a |
| Latency | Higher (L7 processing) | Very low | Inline appliance |
| Pick when | Web apps, microservice routing, WebSockets | Extreme performance, TCP/UDP, static IP, TLS passthrough | Insert firewalls/IDS/IPS inline |
Tip: static IP or non-HTTP or millions of low-latency connections → NLB. HTTP routing on path/host → ALB. Third-party security appliance inline → GWLB.
EBS vs EFS vs S3
| | EBS | EFS | S3 |
|---|---|---|---|
| Type | Block | File (NFS, POSIX) | Object |
| Access | One instance (one AZ) | Many instances, multi-AZ | Internet-scale, many clients |
| Durability/scope | AZ-scoped volume | Regional, elastic | 11 nines, global namespace |
| Use | Boot/database volumes | Shared app files, lift-and-shift | Backups, data lake, static assets, large objects |
Tip: “attached to one instance / database disk” → EBS. “Several instances share the same files” → EFS. “Objects, web assets, backups, virtually unlimited” → S3.
Security group vs NACL
| | Security group | Network ACL |
|---|---|---|
| Scope | Instance/ENI level | Subnet level |
| State | Stateful (return traffic auto-allowed) | Stateless (must allow return explicitly) |
| Rules | Allow only | Allow and deny |
| Evaluation | All rules evaluated | Numbered, lowest first, first match wins |
| Default | Deny inbound, allow outbound | Default allows all; custom denies all until rules added |
Tip: need an explicit deny (e.g. block one IP) or subnet-wide control → NACL. Per-instance allow-listing with automatic return traffic → security group. The single most-tested fact: security groups are stateful, NACLs are stateless.
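The evaluation rows in the table above can be sketched as two small functions. This is a simplified model for study purposes: the rule tuples are a made-up representation, only inbound traffic on a single port is considered, and the addresses come from the documentation ranges.

```python
from ipaddress import ip_address, ip_network


def nacl_allows(rules, source_ip: str, port: int) -> bool:
    """Evaluate numbered rules lowest first; the first match decides."""
    for number, cidr, rule_port, action in sorted(rules):
        if ip_address(source_ip) in ip_network(cidr) and port == rule_port:
            return action == "allow"
    return False  # nothing matched: the implicit final deny applies


def security_group_allows(rules, source_ip: str, port: int) -> bool:
    """Allow-only rules: traffic passes if any rule matches; order is irrelevant."""
    return any(
        ip_address(source_ip) in ip_network(cidr) and port == rule_port
        for cidr, rule_port in rules
    )


nacl = [
    (200, "0.0.0.0/0", 443, "allow"),
    (100, "203.0.113.7/32", 443, "deny"),  # block one address ahead of the broad allow
]
security_group = [("0.0.0.0/0", 443)]

print(nacl_allows(nacl, "203.0.113.7", 443))
print(nacl_allows(nacl, "198.51.100.20", 443))
print(security_group_allows(security_group, "203.0.113.7", 443))
```

The deny works only because rule 100 is evaluated before rule 200. The security group has no way to express the same block, since it can only allow.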
Hands-on lab — a free, self-marking practice harness
You cannot replicate the real exam, but you can build the habit of timed, scenario-style practice for free. This lab spins up nothing chargeable — it uses the AWS Free Tier only to confirm the service facts behind a few questions, then a tiny local quiz loop to drill the elimination technique.
Step 1 — confirm a fact the exam will test (free, read-only). Verify that a default EC2 instance has no memory metric, which is Q4’s point:
# Lists CloudWatch metrics in the EC2 namespace for your account.
# You will see CPUUtilization, NetworkIn/Out, etc., but no memory metric.
# Memory is only published once the CloudWatch agent is installed, and the agent
# writes to its own namespace (CWAgent by default), not AWS/EC2. That absence IS the lesson.
aws cloudwatch list-metrics --namespace AWS/EC2 \
--query "Metrics[].MetricName" --output text | tr '\t' '\n' | sort -u
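Expected output: a list including CPUUtilization, NetworkIn, NetworkOut, StatusCheckFailed, and similar, and the conspicuous absence of any memory metric. If the account has had no running instances in the past two weeks the list may be empty, because list-metrics only returns metrics that have reported data recently. list-metrics is a read-only call and is free.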
Step 2 — confirm security-group statefulness conceptually (free). Describe the default security group and note it has an outbound allow-all and a restrictive inbound — return traffic for allowed inbound is automatic because the group is stateful:
aws ec2 describe-security-groups \
--filters Name=group-name,Values=default \
--query "SecurityGroups[0].{In:IpPermissions,Out:IpPermissionsEgress}"
Step 3 — build a local timed quiz loop (no AWS, no cost). Save a few questions as JSON and drill them with a timer so you practise the budget (about 2 minutes each):
cat > /tmp/quiz.json <<'JSON'
[
{"q":"Durable fan-out to 3 independent consumers?","a":"SNS->SQS per consumer"},
{"q":"Static IP + millions of low-latency TCP connections?","a":"NLB"},
{"q":"Several instances share the same POSIX files?","a":"EFS"},
{"q":"Stop any account in an OU from disabling CloudTrail?","a":"SCP deny"},
{"q":"Who changed a security group rule, and when?","a":"CloudTrail"}
]
JSON
cat > /tmp/quiz.py <<'PY'
import json, time

# Load the question bank written by the previous command.
with open("/tmp/quiz.json") as f:
    qs = json.load(f)

score = 0
for i, item in enumerate(qs, 1):
    start = time.time()
    print(f"\nQ{i}: {item['q']}")
    input(" (think, then press Enter to reveal) ")
    # Time is measured from showing the question to revealing the answer.
    print(f" Answer: {item['a']} [{time.time() - start:.0f}s]")
    if input(" Did you get it right? (y/n) ").strip().lower() == "y":
        score += 1
print(f"\nScore: {score}/{len(qs)}. Aim for sub-120s per question.")
PY
# Run from a file rather than piping the script on stdin, so input() can read the keyboard.
python3 /tmp/quiz.py
Validation: Step 1 should show no memory metric (proving Q4); Step 3 should report your score and per-question time. If any single question took more than ~120 seconds, that is a topic to revise.
Cleanup: there is nothing chargeable to delete; only remove the temp files:
rm -f /tmp/quiz.json /tmp/quiz.py
Cost note: every command here is either a read-only API call (list-metrics, describe-security-groups — free) or runs locally. The lab cost is 0. The lesson: build the timed-elimination habit before you pay for the real sitting.
Common mistakes & troubleshooting
| Symptom | Likely cause | Fix |
|---|---|---|
| You “know the services” but fail practice scenarios | Answering on recognition, not by eliminating against the constraint | Read the last sentence first; find the qualifier (cost/overhead/latency); eliminate |
| Multiple-response questions score zero despite “mostly right” | Partial credit does not exist — one wrong selection voids the item | Treat each option as independent true/false; only select what you can defend |
| Running out of time on the professional exams | Spending too long on hard items early | Budget ~2 min (assoc) / ~2.5 min (pro); flag-and-move; never leave blanks |
| Confusing two similar services repeatedly (SQS/SNS, ALB/NLB) | Studied features in isolation, not side by side | Drill the comparison tables in this lesson until the distinctions are reflexive |
| Picking a “correct but not best” option | Ignoring the capitalised qualifier (MOST/LEAST/FEWEST) | Underline the qualifier mentally; choose among technically-correct survivors by it |
| Over-engineering the answer | Reaching for the most advanced service | Prefer the option that meets the stated requirement with the least overhead/cost |
| Booking too early and failing | No timed full-length practice at passing standard | Sit timed mocks; only book when consistently above the passing range |
| Panicking over “how many can I miss” | Misunderstanding scaled scoring | It is scaled 100–1000 with ~15 unscored items; calibrate on practice %, not raw counts |
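The per-question budget in the table follows directly from the exam length. The sketch below uses question counts and durations as published in the exam guides at the time of writing; treat them as assumptions and confirm them against the current guide. The 10-minute review reserve is a hypothetical choice.

```python
# (questions, minutes) per exam; confirm against the current official exam guide.
exams = {
    "SAA-C03": (65, 130),
    "SAP-C02": (75, 180),
}


def minutes_per_question(questions: int, minutes: int, review_reserve: int = 0) -> float:
    """Minutes available per question after holding back time for a final review pass."""
    return (minutes - review_reserve) / questions


for name, (questions, minutes) in exams.items():
    flat = minutes_per_question(questions, minutes)
    with_reserve = minutes_per_question(questions, minutes, review_reserve=10)
    print(f"{name}: {flat:.2f} min flat, {with_reserve:.2f} min with a 10-minute review reserve")
```

Holding back a review reserve tightens the budget on every question, which is one more reason to flag and move on rather than dwell.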
Best practices
- Study from the official exam guide first. The PDF for each exam lists the in-scope tasks and services by domain — it is the source of truth for what is testable, and the weightings tell you where to spend hours.
- Build before you memorise. The portfolio projects in this course are the fastest route to durable knowledge; you remember a thing you deployed far longer than a thing you read.
- Practise the format, timed. Do full-length, timed mocks under exam conditions; the time pressure on the professional exams is itself a skill.
- Keep a “confused services” sheet. Every time two services trip you up, write the one-line distinction. The four pairs in this lesson are the usual suspects.
- Read the qualifier, eliminate ruthlessly. Most questions have two defensible options; the qualifier and the constraints decide which.
- Climb in order. SAA before SAP, DVA/SOA before DOP. The professional exams assume associate-level fluency and will punish gaps.
- Schedule the exam to create a deadline. Open-ended study expands to fill infinite time; a booked date focuses it.
Security notes
Certification study is also security study — much of every blueprint is security, and the habits transfer to production:
- Least privilege is the default exam-correct answer. When two options differ only by scope of permissions, the narrower one usually wins. Carry that instinct into real IAM work.
- Prefer roles over long-lived keys. Questions that offer “store access keys on the instance” versus “attach an IAM role” almost always want the role; the same is true in production.
- Preventive beats detective when the question says “guarantee” or “prevent”. SCPs and explicit denies for hard guarantees; Config/GuardDuty for detection. Know which the stem is asking for.
- Encrypt by default and manage keys with KMS. Envelope encryption, customer managed keys (the older abbreviation CMK still appears in study material), and “encryption at rest/in transit” appear across every exam, and are table stakes in real systems.
- Do not practise against shared or production accounts. Use a personal sandbox for any hands-on study, with a budget alarm, so a study mistake never touches anything that matters.
Interview & exam questions
- Q: When would you choose SNS→SQS over a direct SNS→Lambda subscription? A: When each consumer needs a durable buffer, independent retries, and a DLQ; SQS decouples consumer availability/throttling from delivery so a slow or failing consumer cannot lose messages.
- Q: Security group vs NACL: give the two differences that decide most questions. A: Security groups are stateful (return traffic auto-allowed) and allow-only, applied at the instance/ENI; NACLs are stateless (must allow return explicitly), support deny rules, and apply at the subnet.
- Q: A read replica versus Multi-AZ on RDS: what does each give you? A: Multi-AZ is a synchronous standby for availability/failover (not for reads); read replicas are asynchronous, for read scaling and can be cross-Region. They solve different problems and are often used together.
- Q: What makes an SQS consumer process the same message twice, and how do you stop it? A: The visibility timeout is shorter than processing time, so the message reappears mid-processing. Raise the visibility timeout above the maximum processing time (for Lambda event source mappings, ~6× the function timeout).
- Q: How do you guarantee no account in an OU can disable a control, regardless of IAM? A: A Service Control Policy with an explicit deny on the relevant actions; SCPs cap the maximum permissions for every principal in the OU and cannot be overridden by in-account IAM grants.
- Q: Which AWS service answers “who changed this resource and when”? A: CloudTrail records the API call with the principal, time, and parameters. Config shows the state change and references the CloudTrail event, but the identity attribution is CloudTrail.
- Q: NLB or ALB for a service needing a static IP and TCP at scale? A: NLB. It is layer 4, with ultra-low latency, millions of connections, and a static IP per AZ (plus Elastic IP support). ALB exposes only a DNS name and is layer 7.
- Q: How do you achieve automatic rollback on a bad deployment? A: Associate CloudWatch alarms with the CodeDeploy deployment group (or the Lambda/ECS deployment); an alarm breach during the canary/linear window halts and rolls back automatically, with no human in the loop.
- Q: Lazy loading vs write-through caching: what are the trade-offs? A: Lazy loading populates the cache on a miss (cheap, resilient to cache failure, but can serve stale data and has a cold-cache penalty); write-through writes to the cache on every DB write (data is never stale, but every write costs a cache write and the cache fills with rarely-read data). Often combined with a TTL.
- Q: Give the DR strategies in order of RTO/RPO and cost. A: Backup & Restore (cheapest, hours) → Pilot Light (core running, minutes) → Warm Standby (scaled-down full stack, low minutes) → Multi-site Active-Active (full scale in multiple Regions, seconds, most expensive).
- Q: EBS, EFS, or S3 for a fleet of app servers that must share the same uploaded files? A: EFS, a shared, multi-AZ NFS filesystem that many instances mount with POSIX semantics. EBS is single-instance/single-AZ block; S3 is object storage without filesystem semantics.
- Q: How does scaled scoring work and how should it change your strategy? A: Scores are scaled 100–1000 with a fixed passing line per exam (700 for CLF-C02, 720 for the associate exams, 750 for the professional exams), and each form carries unscored pilot items (15 on the foundational and associate exams, 10 on the professional exams) that you cannot identify. Because you cannot know which items count and there is no wrong-answer penalty, you answer every question and calibrate readiness on your practice percentage, not a raw “allowed misses” count.
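The lazy loading versus write-through answer in the list above is easier to retain after seeing the stale read happen. The sketch below uses two plain dictionaries as stand-ins for the database and the cache; the key names and values are made up, and there is no TTL.

```python
database = {"user:1": "Ada"}   # stand-in for the system of record
cache = {}                     # stand-in for the cache layer


def read_lazy(key):
    """Lazy loading: check the cache, fall back to the database on a miss, then populate."""
    if key in cache:
        return cache[key], "hit"
    value = database[key]
    cache[key] = value
    return value, "miss"


def write_db_only(key, value):
    """A write that bypasses the cache, which is how lazy loading ends up stale."""
    database[key] = value


def write_through(key, value):
    """Write-through: update the database and the cache in the same operation."""
    database[key] = value
    cache[key] = value


first = read_lazy("user:1")          # cold cache
second = read_lazy("user:1")         # now cached
write_db_only("user:1", "Grace")
stale = read_lazy("user:1")          # cache still holds the old value
write_through("user:1", "Linus")
fresh = read_lazy("user:1")          # cache was updated with the write

print(first, second, stale, fresh)
```

A TTL on cached entries bounds how long the stale value can survive, which is why the two strategies are usually combined with one.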
Quick check
- You must fan out one order event to three consumers, each with its own durable buffer and retries. What do you use?
- Which is stateful — a security group or a NACL?
- A workload is fully fault-tolerant and you want the cheapest compute. Which purchasing option?
- Which service tells you who disabled CloudTrail and when?
- The exam score is scaled over what range, and should you ever leave a question blank?
Answers
- SNS → SQS with one SQS queue per consumer (durable fan-out). A direct SNS→Lambda fan-out lacks the buffer.
- The security group is stateful (return traffic auto-allowed); the NACL is stateless.
- Spot Instances — the fault-tolerance is the signal that interruption is acceptable, and Spot is the cheapest.
- AWS CloudTrail (the API audit; Config references the same CloudTrail event for attribution).
- 100–1000, with a fixed pass line and ~15 unscored items. Never leave a question blank — there is no penalty for guessing.
Exercise
Pick the next exam you intend to sit and produce a one-page readiness plan of your own:
- Download the official exam guide for your target (e.g. SAA-C03) and copy its domain table with weightings.
- Self-score 1–5 per domain on honest current confidence, then multiply each gap (5 minus your score) by the domain weighting to get a priority score. Study the highest-priority gaps first.
- Write your own four “confused-services” cards for the pairs you personally muddle (beyond the four in this lesson).
- Draft a four-week plan using the template below, ending with two timed full-length mocks.
- Book the exam for the end of week four to create the deadline — and only move it if your timed mocks are not yet consistently above the passing range.
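The priority score from the second step can be computed in a few lines. The sketch below uses the SAA-C03 domain weightings as published in the exam guide at the time of writing (confirm them against the current guide), hypothetical self-scores, and one reasonable reading of "gap" as 5 minus your confidence.

```python
# SAA-C03 domain weightings in percent; confirm against the current exam guide.
weightings = {
    "Design Secure Architectures": 30,
    "Design Resilient Architectures": 26,
    "Design High-Performing Architectures": 24,
    "Design Cost-Optimized Architectures": 20,
}

# Hypothetical self-scores on the 1-5 confidence scale.
confidence = {
    "Design Secure Architectures": 4,
    "Design Resilient Architectures": 2,
    "Design High-Performing Architectures": 3,
    "Design Cost-Optimized Architectures": 2,
}

MAX_SCORE = 5

# Priority = confidence gap x domain weighting.
priority = {
    domain: (MAX_SCORE - confidence[domain]) * weight
    for domain, weight in weightings.items()
}
ranked = sorted(priority.items(), key=lambda item: item[1], reverse=True)

for domain, score in ranked:
    print(f"{score:3d}  {domain}")
```

Note that the heaviest domain lands last here because confidence in it is already high. Weighting alone is a poor guide; the product of gap and weighting is what tells you where the hours go.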
Four-week study-plan template (adapt to your timeline and exam):
| Week | Focus | Activity | Output |
|---|---|---|---|
| 1 | Highest-weighted/lowest-confidence domain | Read exam guide + course lessons; build one small lab | Notes + a deployed mini-project |
| 2 | Next two domains | Hands-on for each; start a confused-services sheet | Working examples + the sheet |
| 3 | Remaining domains + cross-cutting (security, cost) | Targeted reading; first timed mock | Mock score + error log |
| 4 | Weak areas from the mock | Re-drill errors; second timed mock; light review | Consistent pass-range mocks → sit the exam |
Certification mapping
This lesson is the readiness layer for the entire AWS ladder: CLF-C02, SAA-C03, SOA-C02, DVA-C02, SAP-C02 and DOP-C02, with pointers into the specialties (ANS-C01, SCS-C02, MLS-C01/MLA-C01, DEA-C01). The domain checklists and weightings map directly to each official exam guide; the practice questions are tagged by exam; the confused-services section targets the associate level where those distinctions decide the most questions; and the scoring/format notes apply to every exam in the catalogue.
Glossary
- Scaled score: a normalised result on a fixed 100–1000 range that accounts for slight difficulty differences between exam forms; compared against a fixed passing line, not a raw percentage.
- Unscored (pilot) item: a trial question included in the exam to gather statistics; it does not affect your score and is not identified.
- Multiple-response item: a question requiring you to select N correct options (“select TWO/THREE”); scored as a unit with no partial credit.
- Distractor: an incorrect answer option deliberately designed to look plausible; analysing why it is wrong is the core study skill.
- Qualifier: the capitalised word in a stem (MOST/LEAST/FEWEST/BEST) that decides between technically-correct options.
- Domain weighting: the published percentage of an exam devoted to a domain; used to budget study time.
- SCP (Service Control Policy): an Organizations policy that sets the maximum permissions for principals in an account/OU; it never grants, only constrains.
- Optimistic locking: a concurrency technique using a version attribute and a conditional write so a stale update fails and retries — used here as a recurring DVA answer.
- DLQ (dead-letter queue): a queue that receives messages a consumer repeatedly fails to process, isolating poison messages for inspection.
Next steps
You now have the checklists, the question-working technique, the confused-services distinctions, and a plan. Turn study into proof by building the real thing: continue to the AWS Capstone — Build a Well-Architected Multi-Account Landing Zone + 3-Tier App, which exercises the SAA/SAP/DOP blueprints end to end. For depth on the topics the questions probe, revisit the AWS Architecting Ladder, Portfolio Projects, IAM Fundamentals, and the troubleshooting playbooks (single-service and multi-service RCA). Book the date, work the plan, and pass.