
AWS Messaging Fundamentals: SQS, SNS & EventBridge — When to Use Which

Every system that grows past a single process eventually needs its components to talk to each other without being chained together. The order service should not fall over because the email service is slow. The image uploader should not block while three downstream jobs run. A new analytics consumer should be able to start listening to “order placed” events tomorrow without anyone touching the order service. The technique that makes all of this possible is asynchronous messaging — putting a durable, managed layer between the producer of an event and its consumers so that each can scale, fail, and deploy independently. This is decoupling, and on AWS it is delivered by three services you must know cold: Amazon SQS, Amazon SNS, and Amazon EventBridge.

They are easy to confuse because they all “move messages around”, but they solve genuinely different problems. SQS is a queue: one producer (or many) drops messages in, and consumers pull them out in batches and process them at their own pace. It suits work that must be stored durably and handled by a single consumer (at least once on a standard queue, exactly once only on FIFO). SNS is a pub/sub topic: a producer pushes one message and SNS fans it out to every subscriber at once, so one event reaches many interested parties immediately. EventBridge is an event router: events flow onto a bus, and rules match them with rich content-based filters and route each to the right target. It is the backbone of event-driven architectures, with deep AWS-service and SaaS integration. A fourth service, Amazon Kinesis, handles streaming (ordered, replayable, high-throughput records read by position), and it rounds out the decision.

This lesson is the fundamentals layer that ties the advanced messaging lessons together. We will walk every core setting of SQS, SNS, and EventBridge — the ones an interviewer probes and a certification exam tests — then put them side by side in a decision table, then build the single most important pattern in AWS messaging: SNS-to-SQS fan-out. By the end you will reach for the right service on instinct and be able to justify the choice out loud.

Learning objectives

By the end of this lesson you will be able to:

Prerequisites & where this fits

You need an AWS account, the AWS CLI configured (aws configure), and a working grasp of IAM, because every one of these services is gated by IAM identity policies and (for SNS topics and SQS queues) resource policies, and the fan-out pattern lives or dies on getting a resource policy right. A little familiarity with Lambda helps, because Lambda is the most common consumer of all three, but it is not required to follow the configuration. This is a Serverless lesson in the AWS Zero-to-Hero course, and it is deliberately the foundation beneath the two advanced messaging lessons: once you finish here, the resilient-messaging deep dive (sqs-sns-fan-out-fifo-ordering-dlq-poison-message-handling) takes FIFO, DLQs, and idempotent consumers further, and the event-driven-architecture deep dive (eventbridge-event-driven-architecture-buses-schema-pipes) takes EventBridge buses, schemas, and archive/replay further. After this lesson the course turns to operations with the AWS troubleshooting playbooks (aws-troubleshooting-methodology-ec2-vpc-iam-s3-lambda).

Core concepts: queue, topic, bus, stream

Before the settings, fix the four mental models. The vocabulary recurs throughout and the distinctions are exactly what gets tested.

The single most useful way to keep them straight is the delivery direction and copy count:

A second concept to internalise now is at-least-once delivery. SQS standard queues, SNS, and EventBridge all guarantee a message is delivered at least once — which means a duplicate is always possible (a network blip causes a redelivery). Your consumers must therefore be idempotent: processing the same message twice must produce the same result as processing it once. Only FIFO queues and topics add exactly-once processing within their throughput limits. Design for duplicates from day one; do not bolt idempotency on after an incident.
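A minimal sketch of the idempotency idea, not a production design: the in-memory set stands in for a durable store, and the `id` field is a hypothetical idempotency key chosen by the producer. It shows that a redelivered message leaves the side effect applied only once.

```python
from typing import Callable, Dict, Set


class IdempotentConsumer:
    """Wraps a handler so a redelivered message is applied at most once."""

    def __init__(self, handler: Callable[[Dict], None]) -> None:
        self._handler = handler
        # Assumption: an in-memory set stands in for a durable store
        # (for example a database table keyed by message ID).
        self._seen: Set[str] = set()

    def handle(self, message: Dict) -> bool:
        """Return True if the handler ran, False if the message was a duplicate."""
        key = message["id"]  # hypothetical idempotency key set by the producer
        if key in self._seen:
            return False
        self._handler(message)
        self._seen.add(key)  # record only after the handler succeeds
        return True


shipped = []
consumer = IdempotentConsumer(lambda m: shipped.append(m["orderId"]))
first = consumer.handle({"id": "msg-1", "orderId": "A-1001"})
second = consumer.handle({"id": "msg-1", "orderId": "A-1001"})  # simulated redelivery
print(first, second, shipped)
```

Recording the key only after the handler succeeds means a crash mid-processing leads to a retry rather than a silently skipped message; in a real system the check and the side effect would need to be committed together.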


Part 1 — Amazon SQS (the queue)

Amazon Simple Queue Service (SQS) is a fully managed message queue. Producers send messages; SQS stores them durably across multiple Availability Zones; consumers poll, process, and delete them. There are no servers to run and effectively no capacity to provision — SQS scales to any throughput. Its job is to decouple and buffer: it absorbs spikes, lets a slow consumer catch up, and ensures that work is not lost if a consumer crashes mid-process.

SQS: standard vs FIFO queues

The first and most consequential choice is the queue type, fixed at creation and immutable thereafter.

Property | Standard queue | FIFO queue
Ordering | Best-effort: usually in order, not guaranteed | Strict first-in-first-out within a message group
Delivery | At-least-once (duplicates possible) | Exactly-once processing (dedup within a 5-min window)
Throughput | Nearly unlimited (near-infinite TPS) | 300 msg/s without batching, 3,000 msg/s with batching of 10; high-throughput mode raises this substantially
Name requirement | Any name | Must end in .fifo
Dedup | None | Content-based (SHA-256 of body) or explicit MessageDeduplicationId
Use when | Maximum throughput, order does not matter, consumers are idempotent | Order matters (e.g. per-account transactions) or duplicates are unacceptable

The decision in one line: default to standard for its unlimited scale and only choose FIFO when business correctness demands strict ordering or exactly-once. FIFO buys ordering with throughput limits and a small amount of extra ceremony (group IDs and dedup IDs).
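To make the content-based deduplication row concrete, here is a small sketch that hashes a message body with SHA-256. It illustrates why two sends with an identical body collapse into one; the function name is hypothetical and the digest is not claimed to be byte-for-byte what SQS computes internally.

```python
import hashlib


def content_dedup_id(body: str) -> str:
    """SHA-256 hex digest of the message body (illustrative dedup key)."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


a = content_dedup_id('{"orderId":"A-1001"}')
b = content_dedup_id('{"orderId":"A-1001"}')  # same body, so same key
c = content_dedup_id('{"orderId":"A-1002"}')  # different body, different key
print(a == b, a == c)
```

A practical consequence: if two genuinely distinct messages can share a body (for example two identical "ping" events), content-based dedup would drop the second one within the window, and an explicit MessageDeduplicationId is the safer choice.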

Two FIFO terms you must know:

SQS: visibility timeout (the concept interviewers love)

When a consumer receives a message, SQS does not delete it — it makes the message invisible to other consumers for the visibility timeout, then waits for the consumer to explicitly delete it after successful processing. This is the heart of SQS’s reliability: if the consumer crashes before deleting, the timeout expires, the message reappears, and another consumer picks it up. Nothing is lost.

Setting | What it is | Range / default | When to change | Gotcha
Visibility timeout | How long a received message stays hidden before reappearing | 0 s – 12 h; default 30 s | Set to longer than your worst-case processing time | Too short → message reappears mid-processing and is processed twice; too long → a crashed consumer’s message is stuck invisible for ages

The rule: visibility timeout > maximum processing time, with margin. If processing might take 4 minutes, do not leave it at 30 seconds — set 6 minutes. For variable workloads, a consumer can call ChangeMessageVisibility to extend the timeout (a “heartbeat”) while it is still working, rather than over-provisioning a single large value. When SQS triggers a Lambda consumer, AWS recommends the queue’s visibility timeout be at least 6× the function timeout to avoid premature redelivery during retries.
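The sizing rule can be written down as two small helpers. This is a sketch under stated assumptions: the 1.5 margin factor is an arbitrary illustration (chosen so a 4-minute job yields the 6 minutes used above), and both function names are hypothetical.

```python
MAX_VISIBILITY_TIMEOUT_S = 12 * 60 * 60  # 12 h upper bound for the setting


def visibility_timeout_for_worker(worst_case_s: int, margin: float = 1.5) -> int:
    """Worst-case processing time plus a margin, clamped to the allowed range."""
    return min(int(worst_case_s * margin), MAX_VISIBILITY_TIMEOUT_S)


def visibility_timeout_for_lambda(function_timeout_s: int) -> int:
    """At least six times the Lambda function timeout, clamped to the allowed range."""
    return min(6 * function_timeout_s, MAX_VISIBILITY_TIMEOUT_S)


print(visibility_timeout_for_worker(240))  # a 4-minute job
print(visibility_timeout_for_lambda(30))   # a 30-second function timeout
```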

The processing contract, then, is: receive → process → delete. Forgetting the final delete is the classic bug: the message reappears after the timeout and is processed again forever. (Lambda’s SQS event source mapping deletes successfully processed messages for you on a clean return — but if your handler throws, the whole batch becomes visible again unless you report partial batch failures.)
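As a sketch of the partial-batch-failure contract mentioned above: a Lambda handler for an SQS event source can return a `batchItemFailures` list naming only the message IDs that failed, so the successful ones are deleted and only the failures become visible again. This assumes the event source mapping has `ReportBatchItemFailures` enabled; `process_order` is a hypothetical stand-in for real business logic.

```python
import json
from typing import Any, Dict, List


def process_order(order: Dict[str, Any]) -> None:
    """Hypothetical business logic; raises on a record it cannot handle."""
    if "orderId" not in order:
        raise ValueError("missing orderId")


def handler(event: Dict[str, Any], context: Any = None) -> Dict[str, List[Dict[str, str]]]:
    """Process each SQS record and report only the failed message IDs."""
    failures = []
    for record in event.get("Records", []):
        try:
            process_order(json.loads(record["body"]))
        except Exception:
            # Only this message is retried; the rest of the batch is deleted.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}


sample_event = {"Records": [
    {"messageId": "m-1", "body": '{"orderId": "A-1001"}'},
    {"messageId": "m-2", "body": '{"amount": 250}'},   # missing orderId
    {"messageId": "m-3", "body": "not json"},          # unparseable body
]}
result = handler(sample_event)
print(result)
```

Returning an empty list signals that the whole batch succeeded. Catching the exception per record, rather than letting it escape, is what prevents one bad message from forcing a retry of its healthy neighbours.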

SQS: long polling vs short polling

When a consumer calls ReceiveMessage, how SQS waits matters for both latency and cost.

Mode | Behaviour | Cost / latency | When
Short polling | Returns immediately, sampling a subset of servers; may return empty even when messages exist | Many empty receives → more API calls, higher cost | Almost never the right default
Long polling | Waits up to WaitTimeSeconds (1–20 s) for a message to arrive before returning | Far fewer empty responses, lower cost, lower latency | Always prefer: set ReceiveMessageWaitTimeSeconds to 20

Long polling is the single cheapest win in SQS. Set the queue’s ReceiveMessageWaitTimeSeconds to 20 (the maximum) — or pass WaitTimeSeconds on the receive call — and SQS holds the connection open until a message arrives or the timer expires, instead of hammering the API with empty short-poll responses that you still pay for. There is essentially no downside; turn it on by default.

SQS: dead-letter queues and redrive

A dead-letter queue (DLQ) is an ordinary SQS queue that catches messages a consumer could not process after repeated attempts — a poison message (malformed body, a bug, a permanently failing downstream). Without a DLQ, a poison message would be received, fail, reappear after the visibility timeout, fail again, forever — blocking the queue and burning money.

You attach a DLQ to a source queue via its redrive policy:

Setting | What it is | Notes
deadLetterTargetArn | The DLQ to send exhausted messages to | Must be the same type (FIFO source → FIFO DLQ) and same Region/account
maxReceiveCount | How many times a message may be received before moving to the DLQ | Typically 3–5; counts receives, not retries elapsed

Once messages land in the DLQ you investigate them, fix the cause, and then redrive them back to the source queue (or to a custom destination) using DLQ redrive — no custom script required. Treat the DLQ as a monitored signal: a CloudWatch alarm on ApproximateNumberOfMessagesVisible > 0 on the DLQ tells you something is broken now.
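The redrive behaviour can be illustrated with a toy in-memory model. This is only a simulation of the idea (a message that keeps failing is set aside after `max_receive_count` receives); it does not model timing, batching, or the real SQS API, and all names are hypothetical.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List, Tuple


def drain(messages: Iterable[str], handler: Callable[[str], None],
          max_receive_count: int) -> Tuple[List[str], List[str], Dict[str, int]]:
    """Process messages, moving any that exhaust their receives to a DLQ list."""
    pending = deque(messages)
    done: List[str] = []
    dlq: List[str] = []
    receives: Dict[str, int] = {}
    while pending:
        msg = pending.popleft()
        if receives.get(msg, 0) >= max_receive_count:
            dlq.append(msg)  # receives exhausted: park it instead of retrying
            continue
        receives[msg] = receives.get(msg, 0) + 1
        try:
            handler(msg)
            done.append(msg)     # success: the consumer deletes the message
        except Exception:
            pending.append(msg)  # failure: the message reappears later
    return done, dlq, receives


def flaky_handler(msg: str) -> None:
    if msg == "poison":
        raise RuntimeError("cannot process")


done, dlq, receives = drain(["ok-1", "poison", "ok-2"], flaky_handler, max_receive_count=3)
print(done, dlq, receives["poison"])
```

Without the receive-count check the loop would never terminate for the poison message, which is the "fail, reappear, fail again, forever" behaviour a DLQ exists to stop.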

SQS: the rest of the create-time settings

Setting | What it does | Range / default | Notes / gotcha
Message retention period | How long a message that has not been deleted is kept | 60 s – 14 days; default 4 days | After this, any message still in the queue is silently deleted, so size it to your worst recovery window
Maximum message size | Largest message body | 1 KB – 256 KB; default 256 KB | For larger payloads use the SQS Extended Client (store the body in S3, send a pointer)
Delivery delay | Hide new messages for a fixed time after sending | 0 s – 15 min; default 0 | Queue-wide; per-message DelaySeconds works on standard queues (not FIFO)
Receive message wait time | Long-polling wait | 0–20 s; default 0 | Set to 20 (see above)
Encryption | At-rest encryption | SSE-SQS (default, AWS-managed) or SSE-KMS (your CMK) | KMS gives you key control and audit; charges per API call
Access policy | Resource (queue) policy | (none) | Who may send/receive; required for cross-account and for SNS/EventBridge to send to the queue
High-throughput FIFO | Raises FIFO TPS | Off by default | Opt-in per FIFO queue when 300/3,000 TPS is not enough

A note on delay queues vs visibility timeout: a delay hides a message when it is first sent (useful to defer work); a visibility timeout hides a message after it has been received (to allow processing). They are different controls for different moments — do not conflate them.

SQS in the CLI

# Create a standard queue with long polling, a 5-minute visibility timeout, and SSE-SQS (SQS-managed) encryption
aws sqs create-queue \
  --queue-name orders-work \
  --attributes '{
    "VisibilityTimeout":"300",
    "ReceiveMessageWaitTimeSeconds":"20",
    "MessageRetentionPeriod":"345600",
    "SqsManagedSseEnabled":"true"
  }'

# Attach a dead-letter queue after 5 failed receives
aws sqs set-queue-attributes \
  --queue-url "$QUEUE_URL" \
  --attributes '{"RedrivePolicy":"{\"deadLetterTargetArn\":\"'"$DLQ_ARN"'\",\"maxReceiveCount\":\"5\"}"}'

# Send, receive (long poll), then delete — the receive→process→delete contract
aws sqs send-message --queue-url "$QUEUE_URL" --message-body '{"orderId":"A-1001"}'
aws sqs receive-message --queue-url "$QUEUE_URL" --wait-time-seconds 20 --max-number-of-messages 10
aws sqs delete-message --queue-url "$QUEUE_URL" --receipt-handle "$RECEIPT_HANDLE"

Part 2 — Amazon SNS (the topic)

Amazon Simple Notification Service (SNS) is fully managed publish/subscribe messaging. A publisher sends one message to a topic; SNS immediately pushes a copy to every subscription on that topic. There is no polling and no storage you manage — SNS delivers and (with retries) forgets. Its job is fan-out and notification: one event, many recipients, delivered now.

SNS: topics, subscriptions, and protocols

A topic is the access point publishers send to and subscribers attach to. A subscription is one delivery endpoint with a protocol.

Protocol | Endpoint | Typical use
SQS | A queue ARN | Fan-out: the canonical durable pattern (Part 5)
Lambda | A function ARN | Trigger code on every message
HTTP / HTTPS | A URL | Webhooks into your own or third-party services
Email / Email-JSON | An address | Human notifications (must be confirmed)
SMS | A phone number | Text alerts (no subscription needed for direct publish)
Application | A mobile push endpoint | iOS/Android push via platform endpoints
Kinesis Data Firehose | A delivery-stream ARN | Archive every message to S3/Redshift/OpenSearch

Most subscriptions require confirmation: SNS sends a confirmation request to the endpoint, and only a confirmed subscription receives messages (this prevents you subscribing an endpoint you do not control). SQS and Lambda subscriptions are auto-confirmed when the permissions are in place.

SNS: the fan-out model

Fan-out is SNS’s reason to exist. One Publish call delivers to every subscriber simultaneously and independently — add a new consumer by adding a subscription, with zero change to the publisher. This is the textbook way to let many systems react to one event: an “order placed” message can, in a single publish, kick off the fulfilment queue, the analytics pipeline, and the customer-email Lambda at once.

The crucial production refinement is SNS → SQS rather than SNS → Lambda/HTTP directly. SNS delivery is push with retries but no long-term storage: if a subscribed endpoint is down past the retry policy, that copy is dropped (unless a subscription DLQ catches it). Subscribing SQS queues instead gives every consumer its own durable buffer — the message waits safely until that consumer is ready. We build this in Part 5.

SNS: filter policies (subscription filtering)

By default every subscriber gets every message. A filter policy is a JSON document on a subscription that tells SNS to deliver only messages whose attributes (or, with message-body filtering, whose body fields) match — so each subscriber receives only the subset it cares about, and you avoid fan-out-then-discard waste.

{
  "eventType": ["order_placed", "order_cancelled"],
  "amount":    [{ "numeric": [">=", 100] }],
  "region":    [{ "anything-but": ["test"] }]
}

A message is delivered to that subscription only if all attributes match (AND across keys, OR within a key’s array). Filter policies support exact match, prefix, anything-but, numeric ranges, and existence checks. Set the FilterPolicyScope to MessageAttributes (default) or MessageBody to filter on the payload itself. Filtering at the subscription is far cheaper and cleaner than delivering everywhere and discarding in code.
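The AND-across-keys, OR-within-a-key rule can be sketched as a small matcher. This is a simplified model for building intuition, not SNS's implementation: it covers only exact match, prefix, anything-but, numeric, and exists, and it assumes a key missing from the message fails every condition except `{"exists": false}`.

```python
import operator
from typing import Any, Dict, List

NUMERIC_OPS = {"=": operator.eq, ">": operator.gt, ">=": operator.ge,
               "<": operator.lt, "<=": operator.le}


def condition_matches(cond: Any, present: bool, value: Any) -> bool:
    """Check one entry from a key's condition list against one attribute."""
    if not isinstance(cond, dict):
        return present and value == cond  # exact match
    if "exists" in cond:
        return present == cond["exists"]
    if not present:
        return False
    if "prefix" in cond:
        return isinstance(value, str) and value.startswith(cond["prefix"])
    if "anything-but" in cond:
        return value not in cond["anything-but"]
    if "numeric" in cond:
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return False
        ops = cond["numeric"]  # e.g. [">=", 100, "<", 500]
        return all(NUMERIC_OPS[ops[i]](value, ops[i + 1])
                   for i in range(0, len(ops), 2))
    return False  # operator not modelled in this sketch


def policy_matches(policy: Dict[str, List[Any]], attributes: Dict[str, Any]) -> bool:
    """AND across keys, OR within each key's list of conditions."""
    return all(
        any(condition_matches(c, key in attributes, attributes.get(key)) for c in conds)
        for key, conds in policy.items()
    )


policy = {
    "eventType": ["order_placed", "order_cancelled"],
    "amount": [{"numeric": [">=", 100]}],
    "region": [{"anything-but": ["test"]}],
}
print(policy_matches(policy, {"eventType": "order_placed", "amount": 250, "region": "eu-west-1"}))
print(policy_matches(policy, {"eventType": "order_placed", "amount": 50, "region": "eu-west-1"}))
```

A matcher like this can be handy in unit tests for checking that a policy does what you intend before attaching it to a subscription, with the caveat that the service itself remains the source of truth.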

SNS: standard vs FIFO topics, attributes, retries, encryption

Setting | What it does | Notes / gotcha
Topic type | Standard (high throughput, at-least-once, best-effort order) vs FIFO (strict order, exactly-once, name ends .fifo) | FIFO topics deliver only to SQS queues. FIFO queues preserve ordering and deduplication; standard queues are also supported as subscribers but give up both. Throughput is capped like FIFO queues
Message attributes | Key/value metadata sent alongside the body | Drive filter policies; also used by SQS/Lambda consumers
Message structure | A single string, or a JSON map keyed by protocol | Lets you send different text to email vs SMS vs SQS in one publish
Delivery retry policy | Retry schedule for HTTP/S endpoints (immediate, pre-backoff, backoff, post-backoff phases) | Tune for flaky webhooks; SQS/Lambda use AWS-internal retries
Subscription DLQ | A redrive policy on the subscription | Catches messages SNS fails to deliver to that endpoint; essential for HTTP/Lambda subscribers
Encryption | SSE with an AWS-managed or customer-managed KMS key | For SNS→SQS with KMS, the queue’s key policy must allow the SNS service principal
Message size | Up to 256 KB (same as SQS) | Use the payload-offloading pattern (S3 pointer) for larger
Access policy | Resource policy on the topic | Controls who may publish and subscribe; needed for cross-account and for EventBridge/S3 to publish

SNS in the CLI

# Create a standard topic
aws sns create-topic --name order-events

# Subscribe a queue, then attach a filter policy so it only gets high-value placements
SUB_ARN=$(aws sns subscribe --topic-arn "$TOPIC_ARN" --protocol sqs --notification-endpoint "$QUEUE_ARN" \
  --query SubscriptionArn --output text)
aws sns set-subscription-attributes \
  --subscription-arn "$SUB_ARN" \
  --attribute-name FilterPolicy \
  --attribute-value '{"eventType":["order_placed"],"amount":[{"numeric":[">=",100]}]}'

# Publish with a message attribute that the filter policy reads
aws sns publish \
  --topic-arn "$TOPIC_ARN" \
  --message '{"orderId":"A-1001","amount":250}' \
  --message-attributes '{"eventType":{"DataType":"String","StringValue":"order_placed"},"amount":{"DataType":"Number","StringValue":"250"}}'

Part 3 — Amazon EventBridge (the event router)

Amazon EventBridge is a fully managed event bus and router. Events — from AWS services, your own applications, or SaaS partners — arrive on a bus; rules match them with rich content-based event patterns; and each matching rule routes the event to one or more targets (Lambda, SQS, SNS, Step Functions, API destinations, another bus, and ~20 more). Where SNS broadcasts to subscribers and SQS buffers work, EventBridge routes by content and is the connective tissue of event-driven architectures.

EventBridge: event buses (default, custom, partner)

Bus type | What it is | When to use
Default event bus | Auto-created per account/Region; receives all AWS service events (EC2 state changes, S3 events via the service, CodePipeline, etc.) | Reacting to AWS’s own events
Custom event bus | A bus you create for your application’s domain events | Your events: isolate domains, apply separate policies, route cross-account
Partner event bus | Created when you connect a SaaS partner (e.g. a payments or auth provider) | Ingesting third-party SaaS events natively

A best practice is a custom bus per bounded context (e.g. orders-bus, payments-bus): it keeps your domain events off the noisy default bus, lets you scope IAM and rules per domain, and makes cross-account routing (bus-to-bus) clean.

EventBridge: rules and event patterns

A rule has two parts: a match (when it fires) and targets (where the event goes). The match is either a schedule (see Scheduler below) or an event pattern — a JSON document compared structurally against incoming events. An event looks like this (the envelope is standard; detail is yours):

{
  "source": "com.acme.orders",
  "detail-type": "OrderPlaced",
  "detail": { "amount": 250, "region": "eu-west-1", "tier": "gold" }
}

A pattern matches by example, with operators:

{
  "source": ["com.acme.orders"],
  "detail-type": ["OrderPlaced"],
  "detail": {
    "amount": [{ "numeric": [">=", 100] }],
    "tier":   ["gold", "platinum"],
    "region": [{ "anything-but": ["test-region"] }]
  }
}

Pattern matching supports exact, prefix/suffix, anything-but, numeric ranges, exists, IP-address, and $or operators — far richer than SNS attribute filtering because it matches on the whole event structure, not just flat attributes. Each rule can have up to 5 targets; one bus can hold up to 300 rules (a soft limit you can raise).

Two more rule features matter:

EventBridge: the schema registry

The schema registry discovers and stores the structure of your events. EventBridge can infer schemas from events flowing on a bus, you can register your own (OpenAPI/JSONSchema), and it hosts schemas for all AWS service events. From a schema you generate code bindings for your language, so producers and consumers share a typed contract and evolve events safely. This is what keeps a large event-driven estate from descending into guesswork about “what fields does this event actually have”.

EventBridge: Pipes (point-to-point with filter/enrich/transform)

EventBridge Pipes create a point-to-point integration from a source to a target with optional filtering, enrichment, and transformation in between — no glue code. A pipe is the managed answer to “read from X, optionally filter and enrich each record, write to Y”:

Pipes replace a mountain of “Lambda that reads a stream, filters, calls an API, and forwards” boilerplate with configuration. Reach for a Pipe when the shape is one-source-to-one-target with light processing; reach for a bus + rules when one event must fan out to many targets by content.

EventBridge: Scheduler (cron and one-time schedules)

EventBridge Scheduler is a dedicated, serverless scheduler — the modern replacement for the old “scheduled rule” on a bus. It fires events on a cron or rate expression, or one time at a specific timestamp, to any of the ~270 AWS targets, with time zones, flexible time windows, retries, and a DLQ. It scales to millions of schedules and is the right tool for “run this every night”, “poll that API every 5 minutes”, or “fire this once next Tuesday”. Prefer Scheduler over a scheduled EventBridge rule for new work — it is purpose-built, supports far more targets, and does not consume your bus’s rule budget.

EventBridge in the CLI

# A custom bus for the orders domain
aws events create-event-bus --name orders-bus

# A rule that matches high-value gold/platinum placements and routes to a Lambda
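RULE_ARN=$(aws events put-rule \
  --name high-value-orders \
  --event-bus-name orders-bus \
  --event-pattern '{
    "source": ["com.acme.orders"],
    "detail-type": ["OrderPlaced"],
    "detail": { "amount": [{ "numeric": [">=", 100] }], "tier": ["gold","platinum"] }
  }' \
  --query RuleArn --output text)

# Let EventBridge invoke the function; without this the rule matches but the target never fires
aws lambda add-permission \
  --function-name "$LAMBDA_ARN" \
  --statement-id high-value-orders-invoke \
  --action lambda:InvokeFunction \
  --principal events.amazonaws.com \
  --source-arn "$RULE_ARN"

aws events put-targets \
  --rule high-value-orders --event-bus-name orders-bus \
  --targets "Id=1,Arn=$LAMBDA_ARN"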

# Publish a custom event onto the bus
aws events put-events --entries '[{
  "Source":"com.acme.orders","DetailType":"OrderPlaced","EventBusName":"orders-bus",
  "Detail":"{\"amount\":250,\"tier\":\"gold\",\"region\":\"eu-west-1\"}"
}]'

# A nightly schedule (note: EventBridge Scheduler is a separate API — aws scheduler)
aws scheduler create-schedule \
  --name nightly-rollup \
  --schedule-expression 'cron(0 2 * * ? *)' \
  --schedule-expression-timezone 'Europe/London' \
  --flexible-time-window '{"Mode":"OFF"}' \
  --target "{\"Arn\":\"$LAMBDA_ARN\",\"RoleArn\":\"$SCHEDULER_ROLE_ARN\"}"
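One detail in the put-events call above is easy to miss when quoting by hand: Detail is a JSON-encoded string inside the entry, not a nested object. A small sketch of building an entry programmatically, where `build_entry` is a hypothetical helper and the field names are the ones used in the CLI example:

```python
import json
from typing import Any, Dict


def build_entry(bus: str, source: str, detail_type: str,
                detail: Dict[str, Any]) -> Dict[str, str]:
    """Build one put-events entry; Detail is serialised to a JSON string."""
    return {
        "EventBusName": bus,
        "Source": source,
        "DetailType": detail_type,
        "Detail": json.dumps(detail, separators=(",", ":")),
    }


entry = build_entry(
    "orders-bus", "com.acme.orders", "OrderPlaced",
    {"amount": 250, "tier": "gold", "region": "eu-west-1"},
)
# The serialised list is the shape the --entries argument expects.
print(json.dumps([entry]))
```

The printed JSON could be saved to a file and passed with `--entries file://entries.json`, which avoids the backslash escaping in the inline form.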

Part 4 — Choosing: SQS vs SNS vs EventBridge vs Kinesis

This is the decision a certification exam and an interviewer will both put to you. Read the table, then the heuristics.

Dimension | SQS | SNS | EventBridge | Kinesis Data Streams
Model | Queue (point-to-point) | Pub/sub topic | Event bus / router | Streaming log
Delivery | Pull (consumers poll) | Push to subscribers | Push via rules to targets | Pull by shard position
Consumers per message | One consumer | Many (one copy each) | Many (per matching rule/target) | Many independent readers
Ordering | FIFO option (per group) | FIFO topic option | Best-effort | Strict per shard
Filtering / routing | None (one queue = one stream) | Attribute/body filter policies | Rich content-based patterns | None (consumer filters)
Replay / history | No (delete after read) | No | Archive & replay (feature) | Yes: re-read within retention
Retention | 1 min – 14 days | None (delivers then forgets) | Archive as configured | 24 h – 365 days
AWS/SaaS source integration | Limited (targets) | Some | ~Dozens of native sources | Producers via API/Firehose
Throughput shape | Near-unlimited (standard) | Very high | High | Provisioned/on-demand shards
Pricing | Per request | Per publish + per delivery | Per event published (AWS-service events to default bus are free) | Per shard-hour + payload
Best at | Durable work buffering, decoupling, smoothing spikes | Fan-out / notifications | Event-driven routing & integration | High-rate ordered, replayable data

Heuristics that resolve most cases:

And the combinations matter more than the rivalry: real systems wire these together — EventBridge routes an event to an SQS queue for durable processing; SNS fans out to several SQS queues; a Pipe reads a Kinesis stream, filters, and writes to an EventBridge bus. These are tools in one toolbox, not four contestants.

A quick word on SNS vs EventBridge, the most common confusion: choose SNS for high-throughput, low-latency fan-out to a known set of subscribers (especially SQS/Lambda) and for SMS/email/mobile push; choose EventBridge when you need content-based routing, many AWS-service or SaaS sources, schema discovery, or archive/replay. SNS is the faster, simpler broadcaster; EventBridge is the richer router.


Part 5 — The SNS → SQS fan-out pattern

This is the single most important pattern in AWS messaging, and it appears on every relevant exam. The goal: one event, multiple independent consumers, each with its own durable buffer.

The shape: a publisher sends to one SNS topic; several SQS queues subscribe to it; each queue is drained by its own consumer (a Lambda, an ECS service, a worker fleet). One publish fans out to all queues at once, and because each consumer reads from its own queue, a slow or failing consumer never affects the others — its messages simply wait in its queue. Add a new consumer tomorrow by subscribing a new queue; the publisher never changes.

[Figure: AWS messaging: SQS, SNS, EventBridge]

The diagram shows the three building blocks side by side — SQS pulling work to a single consumer, SNS pushing one publish to many subscribers, and EventBridge routing by event pattern to multiple targets — and then the composite SNS-to-SQS fan-out where one topic feeds several durable queues.

Why SNS → SQS and not SNS → Lambda directly? Because SNS delivers with retries but no durable storage: if a direct Lambda/HTTP subscriber is throttled or down past the retry policy, that copy is lost (absent a subscription DLQ). Inserting an SQS queue per consumer gives each one a durable buffer that absorbs spikes and holds messages until the consumer recovers. You also get per-queue filter policies (each consumer subscribes with its own filter and receives only its subset) and per-queue DLQs for poison messages.

The one configuration that trips everyone up is the queue resource policy. For SNS to deliver into a queue, the queue’s access policy must allow the SNS service principal to sqs:SendMessage, conditioned on the topic ARN:

{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": { "Service": "sns.amazonaws.com" },
    "Action": "sqs:SendMessage",
    "Resource": "arn:aws:sqs:eu-west-1:111122223333:orders-fulfilment",
    "Condition": { "ArnEquals": { "aws:SourceArn": "arn:aws:sns:eu-west-1:111122223333:order-events" } }
  }]
}
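Because the same statement is needed on every subscribed queue, it can help to generate it rather than hand-edit ARNs. A minimal sketch with hypothetical helper names; it also shows that the Policy attribute value is itself a JSON string nested inside the attributes JSON, which is the usual source of quoting mistakes.

```python
import json


def sns_to_sqs_policy(queue_arn: str, topic_arn: str) -> str:
    """Queue policy letting one SNS topic send messages to one queue."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "sns.amazonaws.com"},
            "Action": "sqs:SendMessage",
            "Resource": queue_arn,
            "Condition": {"ArnEquals": {"aws:SourceArn": topic_arn}},
        }],
    }
    return json.dumps(policy)


def policy_attribute(queue_arn: str, topic_arn: str) -> str:
    """Attributes JSON for set-queue-attributes: Policy is a JSON string."""
    return json.dumps({"Policy": sns_to_sqs_policy(queue_arn, topic_arn)})


QUEUE_ARN = "arn:aws:sqs:eu-west-1:111122223333:orders-fulfilment"
TOPIC_ARN = "arn:aws:sns:eu-west-1:111122223333:order-events"
print(policy_attribute(QUEUE_ARN, TOPIC_ARN))
```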

Two more must-dos for production fan-out:

That is the whole pattern: a topic, N queues each with a resource policy (and optional filter policy and DLQ), raw message delivery, and one consumer per queue. The advanced lesson sqs-sns-fan-out-fifo-ordering-dlq-poison-message-handling takes it further with FIFO ordering across the fan-out and idempotent-consumer patterns.

Hands-on lab: build SNS → SQS fan-out and watch it work

You will create one SNS topic, two SQS queues subscribed to it, set the resource policies, publish once, and confirm both queues received the message. Everything here is comfortably inside the AWS Free Tier (SQS: 1M requests/month free; SNS: 1M publishes/month free).

1. Set variables and create the topic and two queues.

ACID=$(aws sts get-caller-identity --query Account --output text)
REGION=$(aws configure get region); REGION=${REGION:-eu-west-1}

TOPIC_ARN=$(aws sns create-topic --name lab-order-events --query TopicArn --output text)

for q in lab-fulfilment lab-analytics; do
  aws sqs create-queue --queue-name "$q" \
    --attributes '{"ReceiveMessageWaitTimeSeconds":"20","VisibilityTimeout":"60"}' >/dev/null
done
FUL_URL=$(aws sqs get-queue-url --queue-name lab-fulfilment --query QueueUrl --output text)
ANA_URL=$(aws sqs get-queue-url --queue-name lab-analytics  --query QueueUrl --output text)
FUL_ARN=$(aws sqs get-queue-attributes --queue-url "$FUL_URL" --attribute-names QueueArn --query Attributes.QueueArn --output text)
ANA_ARN=$(aws sqs get-queue-attributes --queue-url "$ANA_URL" --attribute-names QueueArn --query Attributes.QueueArn --output text)

2. Allow SNS to send to each queue (the resource policy), then subscribe with raw delivery.

for pair in "$FUL_URL|$FUL_ARN" "$ANA_URL|$ANA_ARN"; do
  URL=${pair%%|*}; ARN=${pair##*|}
  POLICY=$(printf '{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Principal":{"Service":"sns.amazonaws.com"},"Action":"sqs:SendMessage","Resource":"%s","Condition":{"ArnEquals":{"aws:SourceArn":"%s"}}}]}' "$ARN" "$TOPIC_ARN")
  aws sqs set-queue-attributes --queue-url "$URL" --attributes "{\"Policy\":$(printf '%s' "$POLICY" | jq -Rs .)}"
  SUB=$(aws sns subscribe --topic-arn "$TOPIC_ARN" --protocol sqs --notification-endpoint "$ARN" --query SubscriptionArn --output text)
  aws sns set-subscription-attributes --subscription-arn "$SUB" --attribute-name RawMessageDelivery --attribute-value true
done

3. Publish ONE message and confirm BOTH queues received it.

aws sns publish --topic-arn "$TOPIC_ARN" \
  --message '{"orderId":"A-1001","amount":250}' \
  --message-attributes '{"eventType":{"DataType":"String","StringValue":"order_placed"}}'

# Each queue should report 1 message — fan-out delivered a copy to both
for URL in "$FUL_URL" "$ANA_URL"; do
  echo "Queue: $URL"
  aws sqs receive-message --queue-url "$URL" --wait-time-seconds 20 --max-number-of-messages 1 \
    --query 'Messages[0].Body' --output text
done

Expected output (validation). Each queue prints the same body — {"orderId":"A-1001","amount":250} — proving the single publish fanned out to both durable queues. Because you enabled raw message delivery, the body is your exact JSON with no SNS envelope. (Without step 2’s resource policy, the subscribe would succeed but no messages would arrive — the classic fan-out failure.)
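If raw message delivery is off, the queue body is an SNS notification envelope whose Message field carries your payload as a string. A consumer that must tolerate both cases could unwrap defensively, as in this sketch; it assumes the published message is JSON, and `extract_payload` is a hypothetical helper.

```python
import json
from typing import Any


def extract_payload(sqs_body: str) -> Any:
    """Return the published JSON payload, with or without the SNS envelope."""
    parsed = json.loads(sqs_body)
    is_envelope = (
        isinstance(parsed, dict)
        and parsed.get("Type") == "Notification"
        and "Message" in parsed
    )
    if is_envelope:
        return json.loads(parsed["Message"])  # payload is a JSON string inside
    return parsed  # raw message delivery: the body is already the payload


raw_body = '{"orderId":"A-1001","amount":250}'
wrapped_body = json.dumps({
    "Type": "Notification",
    "TopicArn": "arn:aws:sns:eu-west-1:111122223333:lab-order-events",
    "Message": raw_body,
})
print(extract_payload(raw_body) == extract_payload(wrapped_body))
```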

4. Cleanup. Remove everything so nothing lingers.

aws sns delete-topic --topic-arn "$TOPIC_ARN"
aws sqs delete-queue --queue-url "$FUL_URL"
aws sqs delete-queue --queue-url "$ANA_URL"

Cost note. Within the Free Tier this lab is free (a handful of SNS publishes and SQS requests against millions of free monthly requests). Even beyond the Free Tier the cost is a fraction of a cent — SQS and SNS are billed per request/publish at roughly $0.40–$0.50 per million. The only way to run up a bill with these services is short-polling a busy queue in a tight loop (millions of empty receives) — which is exactly why long polling is the default recommendation.
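A rough back-of-the-envelope comparison makes the polling point concrete. The sketch assumes an idle queue, ignores the Free Tier, and uses the approximate $0.40 per million requests quoted above; the consumer count and the 0.1 s short-poll interval are illustrative assumptions.

```python
PRICE_PER_MILLION_REQUESTS = 0.40  # assumption: the rough figure quoted above


def monthly_receive_cost(consumers: int, seconds_between_calls: float,
                         days: int = 30) -> float:
    """Cost of ReceiveMessage calls made by idle consumers over a month."""
    calls = consumers * days * 86_400 / seconds_between_calls
    return calls / 1_000_000 * PRICE_PER_MILLION_REQUESTS


short_poll = monthly_receive_cost(consumers=4, seconds_between_calls=0.1)
long_poll = monthly_receive_cost(consumers=4, seconds_between_calls=20)
print(f"short polling: ${short_poll:.2f}  long polling: ${long_poll:.2f}")
```

The absolute numbers are small either way, but the ratio (200× here) is the point: the only thing that changed is how long each receive call waits.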

Common mistakes & troubleshooting

Symptom | Likely cause | Fix
Messages reappear and process twice | Consumer didn’t delete after processing, or visibility timeout < processing time | Always delete on success; set visibility timeout > worst-case processing (≥ 6× for Lambda)
High SQS cost / empty receives | Short polling in a tight loop | Set ReceiveMessageWaitTimeSeconds=20 (long polling)
SNS → SQS fan-out delivers nothing | Queue resource policy doesn’t allow the SNS principal | Add the sqs:SendMessage allow for sns.amazonaws.com conditioned on the topic ARN
Consumer sees an SNS envelope instead of the raw body | RawMessageDelivery not enabled on the subscription | Set RawMessageDelivery=true on each SQS subscription
Encrypted fan-out queue receives nothing | Queue uses SSE-KMS with the AWS-managed key (alias/aws/sqs), whose key policy cannot be edited, or the customer-managed key policy is missing SNS permissions | Use a customer-managed KMS key and allow sns.amazonaws.com kms:GenerateDataKey/Decrypt on the key (SSE-SQS needs no key policy)
A poison message blocks the queue forever | No DLQ / maxReceiveCount set | Attach a DLQ with maxReceiveCount 3–5; alarm on DLQ depth
FIFO throughput far lower than expected | One shared MessageGroupId serialises everything | Use a higher-cardinality group ID; enable high-throughput FIFO
EventBridge target never fires | Event pattern doesn’t match, or target IAM/permission missing | Test the pattern against a sample event; check the target’s resource policy / rule role; add a target DLQ to catch failures
Messages disappear before processing | Retention period elapsed before the message was processed and deleted | Increase MessageRetentionPeriod (max 14 days) and fix the stuck consumer

Best practices

Security notes

Interview & exam questions

1. What problem does messaging solve, in one sentence? It decouples producers from consumers so each can scale, fail, and deploy independently — absorbing spikes and preventing one slow component from taking down others.

2. SQS vs SNS — the core difference? SQS is a queue: pull-based, one consumer processes each message. SNS is a topic: push-based, every subscriber gets its own copy. SQS is for durable work; SNS is for fan-out/notification.

3. Explain the visibility timeout. When a consumer receives a message, SQS hides it for the visibility timeout instead of deleting it; the consumer must explicitly delete it after success. If the consumer crashes, the timeout expires and the message reappears for another consumer. Set the timeout longer than worst-case processing (≥ 6× the Lambda timeout) — too short causes double-processing, too long strands crashed work.
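The receive, hide, delete cycle is easier to reason about with a toy model. The following sketch is an in-memory stand-in with a manually supplied clock, not the SQS API; ToyQueue, ToyMessage, and the integer handles are all hypothetical names invented for the illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class ToyMessage:
    body: str
    visible_at: float = 0.0  # time from which the message may be received again
    receive_count: int = 0


@dataclass
class ToyQueue:
    """In-memory model of visibility-timeout mechanics only."""
    visibility_timeout: float
    messages: Dict[int, ToyMessage] = field(default_factory=dict)
    next_id: int = 0

    def send(self, body: str) -> None:
        self.messages[self.next_id] = ToyMessage(body)
        self.next_id += 1

    def receive(self, now: float) -> Optional[Tuple[int, ToyMessage]]:
        """Return the first visible message and hide it, or None if all are hidden."""
        for handle, msg in self.messages.items():
            if msg.visible_at <= now:
                msg.visible_at = now + self.visibility_timeout
                msg.receive_count += 1
                return handle, msg
        return None

    def delete(self, handle: int) -> None:
        # Only an explicit delete removes the message for good.
        self.messages.pop(handle, None)


queue = ToyQueue(visibility_timeout=30)
queue.send("order-1001")

first = queue.receive(now=0)        # consumer A receives, then crashes without deleting
while_hidden = queue.receive(now=10)  # still inside the timeout: nothing is returned
handle, msg = queue.receive(now=31)   # timeout expired: the same message comes back
queue.delete(handle)                  # consumer B finishes and deletes it

print(while_hidden, msg.receive_count, len(queue.messages))
```

The receive count reaching 2 is the at-least-once behaviour in miniature: the crash cost a redelivery, not a lost message, which is why consumers should be idempotent.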

4. Long polling vs short polling — which and why? Long polling, with WaitTimeSeconds set to up to 20 seconds. The call waits for a message to arrive before returning, drastically cutting empty responses, cost, and latency. Short polling returns immediately after sampling only a subset of servers, so it may come back empty even when messages exist. Long polling is the default recommendation.

5. Standard vs FIFO SQS — when FIFO? FIFO when you need strict ordering (within a message group) or exactly-once processing. The cost is lower throughput (300 messages per second without batching, 3,000 with batching, or more with high-throughput mode) and the need for a MessageGroupId and a deduplication ID. Default to standard otherwise for near-unlimited scale.
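As a minimal sketch of the two FIFO fields, the helper below builds the keyword arguments for a SendMessage call. The function name, the queue URL, and the idea of hashing a canonical JSON body for the deduplication ID are assumptions for illustration; content-based deduplication on the queue would achieve a similar effect without sending an explicit ID.

```python
import hashlib
import json


def fifo_send_params(queue_url: str, customer_id: str, event: dict) -> dict:
    """Build SendMessage arguments for a FIFO queue (hypothetical helper)."""
    # Canonical JSON so the same event always hashes to the same deduplication ID.
    body = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return {
        "QueueUrl": queue_url,
        "MessageBody": body,
        # Ordering is guaranteed only among messages sharing a group ID.
        "MessageGroupId": f"customer-{customer_id}",
        # A SHA-256 hex digest is 64 characters, within the 128-character limit.
        "MessageDeduplicationId": hashlib.sha256(body.encode("utf-8")).hexdigest(),
    }


# Hypothetical queue URL; FIFO queue names must end in ".fifo".
URL = "https://sqs.ap-south-1.amazonaws.com/123456789012/orders.fifo"

a = fifo_send_params(URL, "42", {"order_id": 1001, "status": "placed"})
b = fifo_send_params(URL, "42", {"status": "placed", "order_id": 1001})  # same event, keys reordered
c = fifo_send_params(URL, "77", {"order_id": 2002, "status": "placed"})

print(a["MessageGroupId"], a["MessageDeduplicationId"][:12])
```

Keying the group ID on the customer keeps each customer's messages in order while letting different customers proceed in parallel, which is also what avoids the single-group throughput bottleneck.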

6. What is a dead-letter queue and when does a message land there? A DLQ catches poison messages that fail processing. With a redrive policy (maxReceiveCount of, say, 5), a message that is received that many times without being deleted moves to the DLQ for investigation and later redrive, instead of looping forever and blocking the queue.

7. What is the SNS-to-SQS fan-out pattern, and why insert SQS rather than subscribe Lambda directly? A topic with multiple SQS queues subscribed, each drained by its own consumer. SNS delivery has retries but no durable storage, so a direct Lambda/HTTP subscriber that’s down can lose its copy. SQS gives each consumer a durable buffer, plus per-queue filtering and DLQs. (Don’t forget two pieces of wiring: the queue resource policy that allows the SNS principal, and raw message delivery on each subscription.)
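The queue resource policy mentioned in the answer is short enough to show in full. This is a sketch that builds the policy document as JSON; the function name, the Sid, and the example ARNs are placeholders chosen for illustration.

```python
import json


def fanout_queue_policy(queue_arn: str, topic_arn: str) -> str:
    """Return a queue policy letting one specific SNS topic send to one queue."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowTopicToSend",  # hypothetical statement ID
                "Effect": "Allow",
                "Principal": {"Service": "sns.amazonaws.com"},
                "Action": "sqs:SendMessage",
                "Resource": queue_arn,
                # Without this condition any SNS topic could write to the queue.
                "Condition": {"ArnEquals": {"aws:SourceArn": topic_arn}},
            }
        ],
    }
    return json.dumps(policy)


# Hypothetical ARNs for illustration.
policy_json = fanout_queue_policy(
    queue_arn="arn:aws:sqs:ap-south-1:123456789012:orders-fulfilment",
    topic_arn="arn:aws:sns:ap-south-1:123456789012:order-events",
)
print(policy_json)
```

The string goes into the queue's Policy attribute. Each subscribed queue needs its own copy with its own ARN as the Resource, and raw message delivery is set separately as the RawMessageDelivery attribute on each subscription.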

8. SNS vs EventBridge — when each? SNS for high-throughput, low-latency fan-out to known subscribers (SQS/Lambda) and for SMS/email/mobile push. EventBridge for content-based routing, AWS-service/SaaS event sources, schema discovery, and archive/replay. SNS is the faster broadcaster; EventBridge is the richer router.

9. When would you choose Kinesis over SQS or EventBridge? For a high-rate, ordered, replayable stream read by multiple independent consumers by position — analytics, clickstreams, log/metric pipelines. SQS deletes after read (no replay); EventBridge routes events but isn’t a streaming log. Kinesis retains records (up to 365 days) and preserves order per shard.

10. How do SNS filter policies and EventBridge event patterns differ? SNS filter policies match on a subscription against message attributes (or, with body scope, body fields) — flat key/value matching. EventBridge patterns match the whole event structure with richer operators (numeric, prefix, anything-but, exists, $or, IP) and route to targets. EventBridge filtering is more powerful; SNS filtering is lighter and faster for fan-out.
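To make the comparison concrete, here are the two filter shapes side by side, with a toy matcher that mimics the shared semantics: every key must match, and any one of the listed conditions for a key is enough. The matcher is a simplified illustration, not the matching engine of either service, and the field names (event_type, item_count, com.example.orders) are hypothetical.

```python
import operator

# SNS filter policy: set on a subscription, matched against message attributes.
SNS_FILTER_POLICY = {
    "event_type": ["order_placed"],
    "item_count": [{"numeric": [">=", 5]}],
}

# EventBridge event pattern: set on a rule, matched against the whole event.
EVENT_PATTERN = {
    "source": ["com.example.orders"],
    "detail-type": ["order_placed"],
    "detail": {"item_count": [{"numeric": [">=", 5]}]},
}

_OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt,
        "<=": operator.le, "=": operator.eq}


def _condition_ok(condition, value) -> bool:
    """Check one condition: an exact value or a {"numeric": [...]} comparison."""
    if isinstance(condition, dict) and "numeric" in condition:
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return False
        spec = condition["numeric"]  # e.g. [">=", 5] or [">", 0, "<=", 100]
        return all(_OPS[op](value, bound) for op, bound in zip(spec[0::2], spec[1::2]))
    return condition == value


def matches(pattern: dict, data: dict) -> bool:
    """Toy matcher: keys are ANDed, the conditions listed for a key are ORed."""
    for key, conditions in pattern.items():
        if key not in data:
            return False
        value = data[key]
        if isinstance(conditions, dict):  # nested object such as "detail"
            if not isinstance(value, dict) or not matches(conditions, value):
                return False
        elif not any(_condition_ok(c, value) for c in conditions):
            return False
    return True


event = {"source": "com.example.orders", "detail-type": "order_placed",
         "detail": {"item_count": 7}}
print(matches(EVENT_PATTERN, event))
print(matches(SNS_FILTER_POLICY, {"event_type": "order_placed", "item_count": 2}))
```

The visible difference is where each filter lives and what it sees: the SNS policy is attached to a subscription and, by default, inspects message attributes, while the EventBridge pattern is attached to a rule and walks the event envelope and its nested detail.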

11. What are EventBridge Pipes and when do you use one? A point-to-point integration (source → optional filter → optional enrichment → target) with no glue code. Use a Pipe for one-source-to-one-target processing of SQS/Kinesis/DynamoDB-Streams/MQ/Kafka; use a bus + rules when one event must fan out to many targets by content.

12. EventBridge Scheduler vs a scheduled rule? Scheduler is the purpose-built, serverless scheduler — cron/rate/one-time, time zones, flexible windows, retries, a DLQ, ~270 targets, and millions of schedules — and it doesn’t consume your bus’s rule budget. Prefer it over the legacy scheduled rule for all new cron/one-time work.

Quick check

  1. A consumer received a message but crashed before deleting it. What happens, and which setting governs it?
  2. You need one “user signed up” event to trigger an email Lambda, a welcome-kit fulfilment queue, and an analytics pipeline, each isolated from the others’ failures. Which pattern and services?
  3. Your standard SQS queue is racking up cost from empty ReceiveMessage calls. What is the one-line fix?
  4. You must route events to different targets based on the value of a field inside the event body, and also react to S3 and EC2 service events. Which service?
  5. You need strict per-customer ordering and no duplicate processing. Which queue type and which two message fields?

Answers

  1. After the visibility timeout expires, the message becomes visible again and another consumer receives it — nothing is lost. The visibility timeout governs how long it stays hidden.
  2. SNS → SQS fan-out: publish to one SNS topic; subscribe three SQS queues (one per consumer); drain each with its own consumer. A failure in one consumer leaves the others untouched because each reads from its own durable queue.
  3. Enable long polling — set ReceiveMessageWaitTimeSeconds to 20.
  4. Amazon EventBridge — content-based event patterns for routing, and native ingestion of AWS-service events (S3, EC2, etc.).
  5. A FIFO queue, using MessageGroupId (per customer, for ordering) and MessageDeduplicationId / content-based dedup (for exactly-once).

Exercise

Design the messaging for a small e-commerce order flow and justify each choice in writing:

  1. An order is placed and must (a) be processed for fulfilment exactly once at the warehouse’s pace, (b) trigger a confirmation email immediately, and (c) feed an analytics stream you may want to replay when you change the model. Pick the service for each of (a), (b), (c) and explain why.
  2. Wire (a) and (b) off a single publish using SNS → SQS fan-out. Write the queue resource policy the fulfilment queue needs and explain the aws:SourceArn condition.
  3. Add a DLQ to the fulfilment queue with a sensible maxReceiveCount, and a CloudWatch alarm on its depth. Explain what a non-zero DLQ tells you.
  4. The fulfilment job sometimes takes up to 4 minutes. Choose a visibility timeout and explain the failure modes of setting it too low and too high.
  5. Marketing wants only orders over ₹10,000 to hit a “VIP” queue. Add a filter policy (or EventBridge pattern) that achieves this and explain where the filtering happens and why that’s cheaper than filtering in code.

Write your answers as if defending them in a design review — the reasoning is the point.

Certification mapping

Across all three: know at-least-once vs exactly-once, the visibility-timeout mechanics, the fan-out resource policy, and the four-way decision cold.

Glossary

Next steps

Vinod is a Senior Cloud Architect (22+ yrs) — available for Azure / AWS / GCP architecture, landing zones, and migrations.
