Every system that grows past a single process eventually needs its components to talk to each other without being chained together. The order service should not fall over because the email service is slow. The image uploader should not block while three downstream jobs run. A new analytics consumer should be able to start listening to “order placed” events tomorrow without anyone touching the order service. The technique that makes all of this possible is asynchronous messaging — putting a durable, managed layer between the producer of an event and its consumers so that each can scale, fail, and deploy independently. This is decoupling, and on AWS it is delivered by three services you must know cold: Amazon SQS, Amazon SNS, and Amazon EventBridge.
They are easy to confuse because they all “move messages around”, but they solve genuinely different problems. SQS is a queue: one or many producers drop messages in, and consumers pull them out and process them in batches at their own pace. It suits work that must be done reliably by one worker, with nothing lost. SNS is a pub/sub topic: a producer pushes one message and SNS fans it out to every subscriber at once, so one event reaches many interested parties immediately. EventBridge is an event router: events flow onto a bus, rules match them with rich content-based filters, and each match is routed to the right targets. It is the backbone of event-driven architectures, with deep AWS-service and SaaS integration. A fourth service, Amazon Kinesis, handles streaming (ordered, replayable, high-throughput records read by position), and it rounds out the decision.
This lesson is the fundamentals layer that ties the advanced messaging lessons together. We will walk every core setting of SQS, SNS, and EventBridge — the ones an interviewer probes and a certification exam tests — then put them side by side in a decision table, then build the single most important pattern in AWS messaging: SNS-to-SQS fan-out. By the end you will reach for the right service on instinct and be able to justify the choice out loud.
Learning objectives
By the end of this lesson you will be able to:
- Explain queue vs pub/sub vs event-router vs stream and pick the correct AWS service for a given requirement.
- Configure an SQS queue end to end — standard vs FIFO, visibility timeout, long polling, dead-letter queues, retention, message size, delay, and redrive.
- Configure an SNS topic end to end — subscriptions and protocols, fan-out, filter policies, FIFO topics, message attributes, and delivery retries.
- Configure EventBridge — event buses (default, custom, partner), rules and event patterns, input transformers, the schema registry, Pipes, and Scheduler.
- Compare SQS vs SNS vs EventBridge vs Kinesis across delivery model, ordering, fan-out, filtering, replay, and cost.
- Build and reason about the SNS-to-SQS fan-out pattern, including the topic resource policy and per-queue filtering.
- Apply messaging best practices and security controls (encryption, least-privilege policies, idempotency, DLQs).
Prerequisites & where this fits
You need an AWS account, the AWS CLI configured (aws configure), and a working grasp of IAM. Every one of these services is gated by IAM identity policies, and SNS topics, SQS queues, and EventBridge buses also carry resource policies; the fan-out pattern lives or dies on getting a resource policy right. A little familiarity with Lambda helps, because Lambda is the most common consumer of all three, but it is not required to follow the configuration. This is a Serverless lesson in the AWS Zero-to-Hero course, and it is deliberately the foundation beneath the two advanced messaging lessons: once you finish here, the resilient-messaging deep dive (sqs-sns-fan-out-fifo-ordering-dlq-poison-message-handling) takes FIFO, DLQs, and idempotent consumers further, and the event-driven-architecture deep dive (eventbridge-event-driven-architecture-buses-schema-pipes) takes EventBridge buses, schemas, and archive/replay further. After this lesson the course turns to operations with the AWS troubleshooting playbooks (aws-troubleshooting-methodology-ec2-vpc-iam-s3-lambda).
Core concepts: queue, topic, bus, stream
Before the settings, fix the four mental models. The vocabulary recurs throughout and the distinctions are exactly what gets tested.
- Message — a unit of data moved through the system: a JSON blob, a small payload, an event. It has a body and optional attributes (metadata key/value pairs used for filtering and routing).
- Producer / publisher — the component that creates a message and hands it to AWS.
- Consumer / subscriber — the component that receives and processes a message.
- Queue (point-to-point) — a durable buffer. A producer sends a message; one consumer receives and then deletes it. Each message is processed by exactly one consumer. This is pull-based: consumers poll for work. SQS is the queue.
- Topic (publish/subscribe) — a broadcast channel. A producer publishes one message and every subscriber gets its own copy, delivered immediately. This is push-based. SNS is the topic.
- Event bus / router — a pipe that events flow onto, where rules inspect each event’s content and route matching events to one or more targets. Filtering is rich (match on any field, not just a tag), and there are dozens of native AWS and SaaS sources and targets. EventBridge is the router.
- Stream (log) — an ordered, append-only sequence of records split into shards, retained for a window, where consumers read by position and can re-read (replay) history. Multiple consumers read the same records independently. Kinesis (and Kafka via MSK) is the stream.
The single most useful way to keep them straight is the delivery direction and copy count:
- SQS — pull, one consumer per message. “I have work to do; let me drain it reliably at my own pace.”
- SNS — push, N copies, one per subscriber. “This happened; tell everyone who cares, right now.”
- EventBridge — push via rules, N targets per matching rule, with content filtering. “Route this event to the right places based on what it contains.”
- Kinesis — pull by position, ordered, replayable, many independent readers. “A high-rate stream of records I may want to read again or process in order.”
A second concept to internalise now is at-least-once delivery. SQS standard queues, SNS standard topics, and EventBridge all guarantee a message is delivered at least once, which means a duplicate is always possible (a network blip, for example, can trigger a redelivery). Your consumers must therefore be idempotent: processing the same message twice must produce the same result as processing it once. Only FIFO queues and topics add exactly-once processing, and only within their deduplication window and throughput limits. Design for duplicates from day one; do not bolt idempotency on after an incident.
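Idempotency can be sketched in the lesson's own shell. This is a minimal sketch, assuming each message carries an ID that uniquely identifies the work (the SQS MessageId, or better a business key such as orderId), with a local ledger file standing in for the durable store you would use in production. process_once and LEDGER_FILE are hypothetical names, not part of any AWS tool.

```shell
# Hypothetical idempotent-consumer helper (not an AWS API).
# The ledger file stands in for a durable store such as a DynamoDB table.
LEDGER_FILE="${LEDGER_FILE:-./processed-ids.txt}"

# process_once <message-id> <command> [args...]
# Runs <command> at most once per message ID; duplicates are skipped.
process_once() {
  local msg_id="$1"
  shift
  touch "$LEDGER_FILE"
  # Whole-line, fixed-string match: has this ID already been processed?
  if grep -qxF -- "$msg_id" "$LEDGER_FILE"; then
    echo "skipping duplicate ${msg_id}"
    return 0
  fi
  # Record the ID only after the work succeeds, so a failed attempt
  # is retried when the message is redelivered.
  if "$@"; then
    printf '%s\n' "$msg_id" >> "$LEDGER_FILE"
  else
    return 1
  fi
}
```

The check and the write are two separate steps, so two consumers racing on the same duplicate could both run the work. A real store closes that gap with an atomic conditional write, for example a DynamoDB put conditioned on attribute_not_exists for the key.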
Part 1 — Amazon SQS (the queue)
Amazon Simple Queue Service (SQS) is a fully managed message queue. Producers send messages; SQS stores them durably across multiple Availability Zones; consumers poll, process, and delete them. There are no servers to run and effectively no capacity to provision: standard queues scale to nearly any throughput, and FIFO queues scale within the limits in the table below. Its job is to decouple and buffer: it absorbs spikes, lets a slow consumer catch up, and ensures that work is not lost if a consumer crashes mid-process.
SQS: standard vs FIFO queues
The first and most consequential choice is the queue type, fixed at creation and immutable thereafter.
| | Standard queue | FIFO queue |
|---|---|---|
| Ordering | Best-effort — usually in order, not guaranteed | Strict first-in-first-out within a message group |
| Delivery | At-least-once (duplicates possible) | Exactly-once processing (dedup within a 5-min window) |
| Throughput | Nearly unlimited transactions per second | 300 msg/s without batching, 3,000 msg/s with batching of 10; high-throughput mode raises this substantially |
| Name requirement | Any name | Must end in .fifo |
| Dedup | None | Content-based (SHA-256 of body) or explicit MessageDeduplicationId |
| Use when | Maximum throughput, order does not matter, consumers are idempotent | Order matters (e.g. per-account transactions) or duplicates are unacceptable |
The decision in one line: default to standard for its unlimited scale and only choose FIFO when business correctness demands strict ordering or exactly-once. FIFO buys ordering with throughput limits and a small amount of extra ceremony (group IDs and dedup IDs).
Two FIFO terms you must know:
- MessageGroupId — the ordering boundary. SQS guarantees order within a group, and processes different groups in parallel. Choose a group ID with the right granularity: a per-customer ID such as customer-123 keeps each customer’s events ordered while letting different customers flow concurrently. A single shared group ID serialises everything and throttles you.
- MessageDeduplicationId — the exactly-once key. Two messages with the same dedup ID within the 5-minute deduplication interval are treated as one. Either send it explicitly or enable content-based deduplication to have SQS hash the body for you.
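Both FIFO IDs are plain parameters on aws sqs send-message. This is a minimal sketch, assuming an existing FIFO queue and a producer-generated event ID that stays the same when the producer retries the same send; send_fifo is a hypothetical helper name.

```shell
# Hypothetical helper: send one event to an SQS FIFO queue.
# send_fifo <queue-url> <customer-id> <event-id> <json-body>
send_fifo() {
  local queue_url="$1" customer_id="$2" event_id="$3" body="$4"
  # Group per customer: ordered within a customer, parallel across customers.
  # Dedup on the event ID: a retried send of the same event within the
  # 5-minute window is accepted but not delivered twice.
  aws sqs send-message \
    --queue-url "$queue_url" \
    --message-group-id "customer-${customer_id}" \
    --message-deduplication-id "$event_id" \
    --message-body "$body"
}
```

If the queue has content-based deduplication enabled, you can drop --message-deduplication-id and let SQS hash the body instead; an explicit event ID is the safer choice when retried bodies may differ (for example, a changing timestamp field).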
SQS: visibility timeout (the concept interviewers love)
When a consumer receives a message, SQS does not delete it — it makes the message invisible to other consumers for the visibility timeout, then waits for the consumer to explicitly delete it after successful processing. This is the heart of SQS’s reliability: if the consumer crashes before deleting, the timeout expires, the message reappears, and another consumer picks it up. Nothing is lost.
| Setting | What it is | Range / default | When to change | Gotcha |
|---|---|---|---|---|
| Visibility timeout | How long a received message stays hidden before reappearing | 0 s – 12 h; default 30 s | Set to longer than your worst-case processing time | Too short → message reappears mid-processing and is processed twice; too long → a crashed consumer’s message is stuck invisible for ages |
The rule: visibility timeout > maximum processing time, with margin. If processing might take 4 minutes, do not leave it at 30 seconds — set 6 minutes. For variable workloads, a consumer can call ChangeMessageVisibility to extend the timeout (a “heartbeat”) while it is still working, rather than over-provisioning a single large value. When SQS triggers a Lambda consumer, AWS recommends the queue’s visibility timeout be at least 6× the function timeout to avoid premature redelivery during retries.
The processing contract, then, is: receive → process → delete. Forgetting the final delete is the classic bug: the message reappears after every visibility timeout and is processed again and again until it expires or, if a redrive policy is set, lands in the DLQ. (Lambda’s SQS event source mapping deletes successfully processed messages for you when the handler returns cleanly; if your handler throws, the whole batch becomes visible again after the timeout unless you report partial batch failures.)
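The receive → process → delete contract, plus a visibility heartbeat, can be sketched as two shell functions around the real aws sqs calls. This is a minimal sketch, assuming one message per receive, single-line JSON bodies, and a handler command that exits non-zero on failure; consume_one, extend_visibility, and the handler are hypothetical names.

```shell
# Hypothetical helpers around the SQS receive -> process -> delete contract.

# extend_visibility <queue-url> <receipt-handle> <seconds>
# Heartbeat: keep a long-running message hidden while work continues.
extend_visibility() {
  aws sqs change-message-visibility \
    --queue-url "$1" --receipt-handle "$2" --visibility-timeout "$3"
}

# consume_one <queue-url> <handler>
# Long-polls for one message, passes its body to <handler>, and deletes it
# only if the handler succeeds. On failure nothing is deleted, so the message
# reappears after the visibility timeout (and eventually reaches the DLQ).
consume_one() {
  local queue_url="$1" handler="$2" receipt="" body=""
  # Text output is "<ReceiptHandle><TAB><Body>", or "None" when the queue is empty.
  IFS=$'\t' read -r receipt body < <(aws sqs receive-message \
    --queue-url "$queue_url" \
    --wait-time-seconds 20 \
    --max-number-of-messages 1 \
    --query 'Messages[0].[ReceiptHandle,Body]' \
    --output text)
  if [[ -z "$receipt" || "$receipt" == "None" ]]; then
    return 0  # nothing arrived during this poll
  fi
  if "$handler" "$body"; then
    aws sqs delete-message --queue-url "$queue_url" --receipt-handle "$receipt"
  else
    echo "handler failed; message will reappear after the visibility timeout" >&2
    return 1
  fi
}
```

A long-running handler could call extend_visibility periodically with the same receipt handle instead of relying on one large queue-wide timeout.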
SQS: long polling vs short polling
When a consumer calls ReceiveMessage, how SQS waits matters for both latency and cost.
| Mode | Behaviour | Cost / latency | When |
|---|---|---|---|
| Short polling | Returns immediately, sampling a subset of servers — may return empty even when messages exist | Many empty receives → more API calls, higher cost | Almost never the right default |
| Long polling | Waits up to WaitTimeSeconds (1–20 s) for a message to arrive before returning | Far fewer empty responses, lower cost, lower latency | Always prefer — set ReceiveMessageWaitTimeSeconds to 20 |
Long polling is the single cheapest win in SQS. Set the queue’s ReceiveMessageWaitTimeSeconds to 20 (the maximum) — or pass WaitTimeSeconds on the receive call — and SQS holds the connection open until a message arrives or the timer expires, instead of hammering the API with empty short-poll responses that you still pay for. There is essentially no downside; turn it on by default.
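Turning long polling on for an existing queue is a single attribute change. This is a minimal sketch; enable_long_polling is a hypothetical wrapper, and it uses the CLI's shorthand Key=Value form of --attributes instead of JSON.

```shell
# Hypothetical wrapper: make every ReceiveMessage on this queue long-poll by default.
# enable_long_polling <queue-url> [seconds, 1-20, default 20]
enable_long_polling() {
  local queue_url="$1" seconds="${2:-20}"
  # Long polling needs a wait of 1-20 s; 0 would mean short polling.
  if (( seconds < 1 || seconds > 20 )); then
    echo "ReceiveMessageWaitTimeSeconds must be between 1 and 20" >&2
    return 2
  fi
  aws sqs set-queue-attributes \
    --queue-url "$queue_url" \
    --attributes "ReceiveMessageWaitTimeSeconds=${seconds}"
}
```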
SQS: dead-letter queues and redrive
A dead-letter queue (DLQ) is an ordinary SQS queue that catches messages a consumer could not process after repeated attempts — a poison message (malformed body, a bug, a permanently failing downstream). Without a DLQ, a poison message would be received, fail, reappear after the visibility timeout, fail again, forever — blocking the queue and burning money.
You attach a DLQ to a source queue via its redrive policy:
| Setting | What it is | Notes |
|---|---|---|
| deadLetterTargetArn | The DLQ to send exhausted messages to | Must be the same type as the source (FIFO source → FIFO DLQ) and in the same Region and account |
| maxReceiveCount | How many times a message may be received before it moves to the DLQ | Typically 3–5; it counts receives (including ones that ended in a visibility timeout), not elapsed time |
Once messages land in the DLQ you investigate them, fix the cause, and then redrive them back to the source queue (or to a custom destination) using DLQ redrive — no custom script required. Treat the DLQ as a monitored signal: a CloudWatch alarm on ApproximateNumberOfMessagesVisible > 0 on the DLQ tells you something is broken now.
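Both DLQ chores are available from the CLI. This is a minimal sketch, assuming standard queues: start-message-move-task is the API call for DLQ redrive, and the alarm fires as soon as the DLQ's ApproximateNumberOfMessagesVisible rises above zero. The function names and the "-not-empty" alarm-name suffix are hypothetical choices.

```shell
# Hypothetical helpers for operating a dead-letter queue.

# redrive_dlq <dlq-arn> [destination-arn]
# Without a destination, SQS moves messages back to their original source queue.
redrive_dlq() {
  local args=(--source-arn "$1")
  if [[ -n "${2:-}" ]]; then
    args+=(--destination-arn "$2")
  fi
  aws sqs start-message-move-task "${args[@]}"
}

# alarm_on_dlq <dlq-name> <alarm-topic-arn>
# Alarm as soon as any message is visible in the DLQ.
alarm_on_dlq() {
  aws cloudwatch put-metric-alarm \
    --alarm-name "${1}-not-empty" \
    --namespace AWS/SQS \
    --metric-name ApproximateNumberOfMessagesVisible \
    --dimensions "Name=QueueName,Value=${1}" \
    --statistic Maximum \
    --period 60 \
    --evaluation-periods 1 \
    --threshold 0 \
    --comparison-operator GreaterThanThreshold \
    --alarm-actions "$2"
}
```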
SQS: the rest of the create-time settings
| Setting | What it does | Range / default | Notes / gotcha |
|---|---|---|---|
| Message retention period | How long an undelivered message is kept | 60 s – 14 days; default 4 days | After this, undelivered messages are silently deleted — size it to your worst recovery window |
| Maximum message size | Largest message body | 1 KB – 256 KB; default 256 KB | For larger payloads use the SQS Extended Client (store the body in S3, send a pointer) |
| Delivery delay | Hide new messages for a fixed time after sending | 0 s – 15 min; default 0 | Queue-wide; per-message DelaySeconds works on standard queues (not FIFO) |
| Receive message wait time | Long-polling wait | 0–20 s; default 0 | Set to 20 (see above) |
| Encryption | At-rest encryption | SSE-SQS (default, AWS-managed) or SSE-KMS (your CMK) | KMS gives you key control and audit; charges per API call |
| Access policy | Resource (queue) policy | — | Who may send/receive; required for cross-account and for SNS/EventBridge to send to the queue |
| High-throughput FIFO | Raises FIFO TPS | Off by default | Opt-in per FIFO queue when 300/3,000 TPS is not enough |
A note on delay queues vs visibility timeout: a delay hides a message when it is first sent (useful to defer work); a visibility timeout hides a message after it has been received (to allow processing). They are different controls for different moments — do not conflate them.
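The two delay controls map to two different CLI calls. This is a minimal sketch with hypothetical helper names; the 0–900 second bound is the 15-minute maximum listed for delivery delay.

```shell
# Hypothetical helpers contrasting the two delay controls.

# set_queue_delay <queue-url> <seconds>
# Queue-wide: every new message is hidden for <seconds> after it is sent.
set_queue_delay() {
  aws sqs set-queue-attributes --queue-url "$1" --attributes "DelaySeconds=$2"
}

# send_delayed <queue-url> <body> <seconds>
# One message only (standard queues; FIFO queues use only the queue-wide delay).
send_delayed() {
  local queue_url="$1" body="$2" delay="$3"
  if (( delay < 0 || delay > 900 )); then
    echo "delay must be 0-900 seconds (15 minutes)" >&2
    return 2
  fi
  aws sqs send-message --queue-url "$queue_url" --message-body "$body" --delay-seconds "$delay"
}
```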
SQS in the CLI
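# Create the dead-letter queue first and capture its URL and ARN
DLQ_URL=$(aws sqs create-queue --queue-name orders-work-dlq --query QueueUrl --output text)
DLQ_ARN=$(aws sqs get-queue-attributes --queue-url "$DLQ_URL" --attribute-names QueueArn --query Attributes.QueueArn --output text)
# Create a standard queue with long polling, a 5-minute visibility timeout,
# 4-day retention, and SSE-SQS (SQS-managed) encryption
QUEUE_URL=$(aws sqs create-queue \
  --queue-name orders-work \
  --attributes '{
    "VisibilityTimeout":"300",
    "ReceiveMessageWaitTimeSeconds":"20",
    "MessageRetentionPeriod":"345600",
    "SqsManagedSseEnabled":"true"
  }' \
  --query QueueUrl --output text)
# Attach the dead-letter queue: messages move there after 5 failed receives
aws sqs set-queue-attributes \
  --queue-url "$QUEUE_URL" \
  --attributes '{"RedrivePolicy":"{\"deadLetterTargetArn\":\"'"$DLQ_ARN"'\",\"maxReceiveCount\":\"5\"}"}'
# Send, receive (long poll), then delete: the receive→process→delete contract
aws sqs send-message --queue-url "$QUEUE_URL" --message-body '{"orderId":"A-1001"}'
RECEIPT_HANDLE=$(aws sqs receive-message --queue-url "$QUEUE_URL" --wait-time-seconds 20 \
  --max-number-of-messages 1 --query 'Messages[0].ReceiptHandle' --output text)
# ...process the message, and delete it only once processing has succeeded
aws sqs delete-message --queue-url "$QUEUE_URL" --receipt-handle "$RECEIPT_HANDLE"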
Part 2 — Amazon SNS (the topic)
Amazon Simple Notification Service (SNS) is fully managed publish/subscribe messaging. A publisher sends one message to a topic; SNS immediately pushes a copy to every subscription on that topic. There is no polling and no storage you manage — SNS delivers and (with retries) forgets. Its job is fan-out and notification: one event, many recipients, delivered now.
SNS: topics, subscriptions, and protocols
A topic is the access point publishers send to and subscribers attach to. A subscription is one delivery endpoint with a protocol.
| Protocol | Endpoint | Typical use |
|---|---|---|
| SQS | A queue ARN | Fan-out — the canonical durable pattern (Part 5) |
| Lambda | A function ARN | Trigger code on every message |
| HTTP / HTTPS | A URL | Webhooks into your own or third-party services |
| Email / Email-JSON | An address | Human notifications (must be confirmed) |
| SMS | A phone number | Text alerts (no subscription needed for direct publish) |
| Application | A mobile push endpoint | iOS/Android push via platform endpoints |
| Kinesis Data Firehose | A delivery-stream ARN | Archive every message to S3/Redshift/OpenSearch |
Most subscriptions require confirmation: SNS sends a confirmation request to the endpoint, and only a confirmed subscription receives messages (this prevents you subscribing an endpoint you do not control). SQS and Lambda subscriptions are auto-confirmed when the permissions are in place.
SNS: the fan-out model
Fan-out is SNS’s reason to exist. One Publish call delivers to every subscriber simultaneously and independently — add a new consumer by adding a subscription, with zero change to the publisher. This is the textbook way to let many systems react to one event: an “order placed” message can, in a single publish, kick off the fulfilment queue, the analytics pipeline, and the customer-email Lambda at once.
The crucial production refinement is SNS → SQS rather than SNS → Lambda/HTTP directly. SNS delivery is push with retries but no long-term storage: if a subscribed endpoint is down past the retry policy, that copy is dropped (unless a subscription DLQ catches it). Subscribing SQS queues instead gives every consumer its own durable buffer — the message waits safely until that consumer is ready. We build this in Part 5.
SNS: filter policies (subscription filtering)
By default every subscriber gets every message. A filter policy is a JSON document on a subscription that tells SNS to deliver only messages whose attributes (or, with message-body filtering, whose body fields) match — so each subscriber receives only the subset it cares about, and you avoid fan-out-then-discard waste.
{
"eventType": ["order_placed", "order_cancelled"],
"amount": [{ "numeric": [">=", 100] }],
"region": [{ "anything-but": ["test"] }]
}
A message is delivered to that subscription only if all attributes match (AND across keys, OR within a key’s array). Filter policies support exact match, prefix, anything-but, numeric ranges, and existence checks. Set the FilterPolicyScope to MessageAttributes (default) or MessageBody to filter on the payload itself. Filtering at the subscription is far cheaper and cleaner than delivering everywhere and discarding in code.
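Switching a subscription to payload filtering takes two attribute changes. This is a minimal sketch, assuming JSON message bodies that carry eventType and amount at the top level; filter_on_body is a hypothetical name, and it sets the scope first, then the policy.

```shell
# Hypothetical helper: filter an SNS subscription on the message body
# instead of on message attributes.
# filter_on_body <subscription-arn>
filter_on_body() {
  local sub_arn="$1"
  # 1. Evaluate the filter policy against the JSON payload.
  aws sns set-subscription-attributes \
    --subscription-arn "$sub_arn" \
    --attribute-name FilterPolicyScope \
    --attribute-value MessageBody
  # 2. Deliver only order_placed events with amount >= 100.
  aws sns set-subscription-attributes \
    --subscription-arn "$sub_arn" \
    --attribute-name FilterPolicy \
    --attribute-value '{"eventType":["order_placed"],"amount":[{"numeric":[">=",100]}]}'
}
```

With this policy, a publish whose body is {"eventType":"order_placed","orderId":"A-1001","amount":250} should reach the subscription, while the same event with an amount of 40 should not.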
SNS: standard vs FIFO topics, attributes, retries, encryption
| Setting | What it does | Notes / gotcha |
|---|---|---|
| Topic type | Standard (high throughput, at-least-once, best-effort order) vs FIFO (strict order, exactly-once, name ends .fifo) | FIFO topics deliver only to FIFO SQS queues; throughput is capped like FIFO queues |
| Message attributes | Key/value metadata sent alongside the body | Drive filter policies; also used by SQS/Lambda consumers |
| Message structure | A single string, or a JSON map keyed by protocol | Lets you send different text to email vs SMS vs SQS in one publish |
| Delivery retry policy | Retry schedule for HTTP/S endpoints (immediate, pre-backoff, backoff, post-backoff phases) | Tune for flaky webhooks; SQS/Lambda use AWS-internal retries |
| Subscription DLQ | A redrive policy on the subscription | Catches messages SNS fails to deliver to that endpoint — essential for HTTP/Lambda subscribers |
| Encryption | SSE with an AWS-managed or customer-managed KMS key | For SNS→SQS with KMS, the queue’s key policy must allow the SNS service principal |
| Message size | Up to 256 KB (same as SQS) | Use the payload-offloading pattern (S3 pointer) for larger |
| Access policy | Resource policy on the topic | Controls who may publish and subscribe; needed for cross-account and for EventBridge/S3 to publish |
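A subscription DLQ is set through the RedrivePolicy subscription attribute. This is a minimal sketch, assuming the DLQ is an SQS queue in the same account and Region whose access policy already allows sns.amazonaws.com to send to it (the same kind of policy used for fan-out in Part 5); attach_subscription_dlq is a hypothetical name.

```shell
# Hypothetical helper: give one SNS subscription its own dead-letter queue.
# attach_subscription_dlq <subscription-arn> <dlq-arn>
attach_subscription_dlq() {
  local sub_arn="$1" dlq_arn="$2"
  aws sns set-subscription-attributes \
    --subscription-arn "$sub_arn" \
    --attribute-name RedrivePolicy \
    --attribute-value "{\"deadLetterTargetArn\":\"${dlq_arn}\"}"
}
```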
SNS in the CLI
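# Create a standard topic and capture its ARN
TOPIC_ARN=$(aws sns create-topic --name order-events --query TopicArn --output text)
# Subscribe a queue (its access policy must allow SNS to send; see Part 5),
# then attach a filter policy so it only gets high-value placements
SUB_ARN=$(aws sns subscribe --topic-arn "$TOPIC_ARN" --protocol sqs \
  --notification-endpoint "$QUEUE_ARN" --return-subscription-arn \
  --query SubscriptionArn --output text)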
aws sns set-subscription-attributes \
--subscription-arn "$SUB_ARN" \
--attribute-name FilterPolicy \
--attribute-value '{"eventType":["order_placed"],"amount":[{"numeric":[">=",100]}]}'
# Publish with a message attribute that the filter policy reads
aws sns publish \
--topic-arn "$TOPIC_ARN" \
--message '{"orderId":"A-1001","amount":250}' \
--message-attributes '{"eventType":{"DataType":"String","StringValue":"order_placed"},"amount":{"DataType":"Number","StringValue":"250"}}'
Part 3 — Amazon EventBridge (the event router)
Amazon EventBridge is a fully managed event bus and router. Events — from AWS services, your own applications, or SaaS partners — arrive on a bus; rules match them with rich content-based event patterns; and each matching rule routes the event to one or more targets (Lambda, SQS, SNS, Step Functions, API destinations, another bus, and ~20 more). Where SNS broadcasts to subscribers and SQS buffers work, EventBridge routes by content and is the connective tissue of event-driven architectures.
EventBridge: event buses (default, custom, partner)
| Bus type | What it is | When to use |
|---|---|---|
| Default event bus | Auto-created per account/Region; receives events from AWS services (EC2 state changes, S3 events once EventBridge notifications are enabled on the bucket, CodePipeline, etc.) | Reacting to AWS’s own events |
| Custom event bus | A bus you create for your application’s domain events | Your events — isolate domains, apply separate policies, route cross-account |
| Partner event bus | Created when you connect a SaaS partner (e.g. a payments or auth provider) | Ingesting third-party SaaS events natively |
A best practice is a custom bus per bounded context (e.g. orders-bus, payments-bus): it keeps your domain events off the noisy default bus, lets you scope IAM and rules per domain, and makes cross-account routing (bus-to-bus) clean.
EventBridge: rules and event patterns
A rule has two parts: a match (when it fires) and targets (where the event goes). The match is either a schedule (see Scheduler below) or an event pattern, a JSON document compared structurally against incoming events. An event looks like this (abridged: the real envelope also carries version, id, account, time, region, and resources; detail is yours):
{
"source": "com.acme.orders",
"detail-type": "OrderPlaced",
"detail": { "amount": 250, "region": "eu-west-1", "tier": "gold" }
}
A pattern matches by example, with operators:
{
"source": ["com.acme.orders"],
"detail-type": ["OrderPlaced"],
"detail": {
"amount": [{ "numeric": [">=", 100] }],
"tier": ["gold", "platinum"],
"region": [{ "anything-but": ["test-region"] }]
}
}
Pattern matching supports exact, prefix/suffix, anything-but, numeric ranges, exists, IP-address, and $or operators — far richer than SNS attribute filtering because it matches on the whole event structure, not just flat attributes. Each rule can have up to 5 targets; one bus can hold up to 300 rules (a soft limit you can raise).
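You can check a pattern against a sample event without deploying anything: aws events test-event-pattern evaluates one pattern against one event and returns a Result. This is a minimal sketch; the API requires the envelope fields id, account, source, time, region, resources, and detail-type, so the helper fills them with placeholder values, and matches_pattern is a hypothetical name.

```shell
# Hypothetical helper: does <pattern> match an OrderPlaced event with <detail>?
# matches_pattern <pattern-json> <detail-json>
# Prints the Result field (True/False in the CLI's text output).
matches_pattern() {
  local pattern="$1" detail="$2" event
  # Placeholder envelope: test-event-pattern requires these fields to exist.
  event=$(printf '{"id":"00000000-0000-0000-0000-000000000000","account":"111122223333","source":"com.acme.orders","time":"2024-01-01T00:00:00Z","region":"eu-west-1","resources":[],"detail-type":"OrderPlaced","detail":%s}' "$detail")
  aws events test-event-pattern \
    --event-pattern "$pattern" \
    --event "$event" \
    --query Result --output text
}
```

Running the high-value pattern against detail {"amount":250,"tier":"gold","region":"eu-west-1"} and again with "amount":40 is a quick way to confirm the numeric operator behaves as intended before you attach targets.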
Two more rule features matter:
- Input transformer — reshape the event before it reaches the target (pull a few fields out of detail and pass a custom JSON or string), so the target sees exactly what it expects.
- Target DLQ + retries — each target has its own retry policy (max attempts, max event age) and an optional dead-letter queue for events that cannot be delivered. Configure these: undeliverable events are otherwise dropped.
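Both features live on the target definition passed to aws events put-targets. This is a minimal sketch, assuming an existing rule on a custom bus, a Lambda that accepts a small {amount, tier} object, and an SQS DLQ whose access policy allows events.amazonaws.com to send to it; the helper name, target Id, and retry numbers are illustrative choices.

```shell
# Hypothetical helper: attach a Lambda target that receives a reshaped event,
# retries for up to an hour, then parks undeliverable events in an SQS DLQ.
# add_target_with_dlq <rule> <bus> <lambda-arn> <dlq-arn>
add_target_with_dlq() {
  local rule="$1" bus="$2" lambda_arn="$3" dlq_arn="$4" targets
  # Unquoted heredoc: ${...} expands, while \$ keeps the JSONPath '$' literal.
  targets=$(cat <<EOF
[{
  "Id": "notify-lambda",
  "Arn": "${lambda_arn}",
  "InputTransformer": {
    "InputPathsMap": {"amount": "\$.detail.amount", "tier": "\$.detail.tier"},
    "InputTemplate": "{\"amount\": <amount>, \"tier\": <tier>}"
  },
  "RetryPolicy": {"MaximumRetryAttempts": 4, "MaximumEventAgeInSeconds": 3600},
  "DeadLetterConfig": {"Arn": "${dlq_arn}"}
}]
EOF
)
  aws events put-targets --rule "$rule" --event-bus-name "$bus" --targets "$targets"
}
```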
EventBridge: the schema registry
The schema registry discovers and stores the structure of your events. EventBridge can infer schemas from events flowing on a bus, you can register your own (OpenAPI/JSONSchema), and it hosts schemas for all AWS service events. From a schema you generate code bindings for your language, so producers and consumers share a typed contract and evolve events safely. This is what keeps a large event-driven estate from descending into guesswork about “what fields does this event actually have”.
EventBridge: Pipes (point-to-point with filter/enrich/transform)
EventBridge Pipes create a point-to-point integration from a source to a target with optional filtering, enrichment, and transformation in between — no glue code. A pipe is the managed answer to “read from X, optionally filter and enrich each record, write to Y”:
- Sources: SQS, Kinesis, DynamoDB Streams, Amazon MQ, MSK/self-managed Kafka.
- Filter: drop records you do not care about before you pay to process them.
- Enrichment (optional): call a Lambda, Step Functions, API Gateway, or API destination to augment each record.
- Target: any EventBridge target (Lambda, Step Functions, an event bus, SQS, SNS, etc.).
Pipes replace a mountain of “Lambda that reads a stream, filters, calls an API, and forwards” boilerplate with configuration. Reach for a Pipe when the shape is one-source-to-one-target with light processing; reach for a bus + rules when one event must fan out to many targets by content.
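A pipe is a single create-pipe call. This is a minimal sketch, assuming an SQS source queue with JSON bodies, a custom bus as the target, and an IAM role that Pipes can assume with permission to read the queue and put events on the bus; the pipe name, filter, and helper name are hypothetical, and enrichment is omitted.

```shell
# Hypothetical helper: SQS queue -> (filter) -> EventBridge bus, no glue code.
# create_orders_pipe <queue-arn> <bus-arn> <role-arn>
create_orders_pipe() {
  local queue_arn="$1" bus_arn="$2" role_arn="$3"
  # The filter drops messages whose JSON body has amount < 100
  # before the pipe forwards anything to the bus.
  aws pipes create-pipe \
    --name orders-to-bus \
    --role-arn "$role_arn" \
    --source "$queue_arn" \
    --source-parameters '{
      "SqsQueueParameters": {"BatchSize": 10},
      "FilterCriteria": {"Filters": [
        {"Pattern": "{\"body\": {\"amount\": [{\"numeric\": [\">=\", 100]}]}}"}
      ]}
    }' \
    --target "$bus_arn" \
    --target-parameters '{
      "EventBridgeEventBusParameters": {"Source": "com.acme.orders", "DetailType": "OrderPlaced"}
    }'
}
```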
EventBridge: Scheduler (cron and one-time schedules)
EventBridge Scheduler is a dedicated, serverless scheduler — the modern replacement for the old “scheduled rule” on a bus. It fires events on a cron or rate expression, or one time at a specific timestamp, to any of the ~270 AWS targets, with time zones, flexible time windows, retries, and a DLQ. It scales to millions of schedules and is the right tool for “run this every night”, “poll that API every 5 minutes”, or “fire this once next Tuesday”. Prefer Scheduler over a scheduled EventBridge rule for new work — it is purpose-built, supports far more targets, and does not consume your bus’s rule budget.
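The one-time case uses an at() expression instead of cron. This is a minimal sketch, assuming a target ARN and an IAM role that EventBridge Scheduler can assume to invoke it; schedule_once is a hypothetical helper, and --action-after-completion DELETE asks Scheduler to remove the schedule once it has fired.

```shell
# Hypothetical helper: fire a target exactly once at a local time.
# schedule_once <name> <yyyy-mm-ddThh:mm:ss> <timezone> <target-arn> <role-arn>
schedule_once() {
  local name="$1" when="$2" tz="$3" target_arn="$4" role_arn="$5"
  aws scheduler create-schedule \
    --name "$name" \
    --schedule-expression "at(${when})" \
    --schedule-expression-timezone "$tz" \
    --flexible-time-window '{"Mode":"OFF"}' \
    --action-after-completion DELETE \
    --target "{\"Arn\":\"${target_arn}\",\"RoleArn\":\"${role_arn}\"}"
}
```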
EventBridge in the CLI
# A custom bus for the orders domain
aws events create-event-bus --name orders-bus
# A rule that matches high-value gold/platinum placements and routes to a Lambda
aws events put-rule \
--name high-value-orders \
--event-bus-name orders-bus \
--event-pattern '{
"source": ["com.acme.orders"],
"detail-type": ["OrderPlaced"],
"detail": { "amount": [{ "numeric": [">=", 100] }], "tier": ["gold","platinum"] }
}'
aws events put-targets \
--rule high-value-orders --event-bus-name orders-bus \
--targets "Id=1,Arn=$LAMBDA_ARN"
# Publish a custom event onto the bus
aws events put-events --entries '[{
"Source":"com.acme.orders","DetailType":"OrderPlaced","EventBusName":"orders-bus",
"Detail":"{\"amount\":250,\"tier\":\"gold\",\"region\":\"eu-west-1\"}"
}]'
# A nightly schedule (note: EventBridge Scheduler is a separate API — aws scheduler)
aws scheduler create-schedule \
--name nightly-rollup \
--schedule-expression 'cron(0 2 * * ? *)' \
--schedule-expression-timezone 'Europe/London' \
--flexible-time-window '{"Mode":"OFF"}' \
--target "{\"Arn\":\"$LAMBDA_ARN\",\"RoleArn\":\"$SCHEDULER_ROLE_ARN\"}"
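One step the CLI example above leaves out: a rule can only invoke a Lambda whose resource-based policy allows events.amazonaws.com to call it. This is a minimal sketch, assuming you captured the rule ARN that put-rule prints as RuleArn (for example with --query RuleArn --output text); allow_rule_to_invoke and the statement-ID format are hypothetical choices. Scheduler works differently: it invokes the target with the role passed as RoleArn, so that role needs lambda:InvokeFunction instead.

```shell
# Hypothetical helper: let one EventBridge rule (and only that rule) invoke a function.
# allow_rule_to_invoke <function-name-or-arn> <rule-arn>
allow_rule_to_invoke() {
  local fn="$1" rule_arn="$2"
  # Statement ID derived from the rule name (the last segment of the rule ARN).
  aws lambda add-permission \
    --function-name "$fn" \
    --statement-id "eventbridge-${rule_arn##*/}" \
    --action lambda:InvokeFunction \
    --principal events.amazonaws.com \
    --source-arn "$rule_arn"
}
```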
Part 4 — Choosing: SQS vs SNS vs EventBridge vs Kinesis
This is the decision a certification exam and an interviewer will both put to you. Read the table, then the heuristics.
| Dimension | SQS | SNS | EventBridge | Kinesis Data Streams |
|---|---|---|---|---|
| Model | Queue (point-to-point) | Pub/sub topic | Event bus / router | Streaming log |
| Delivery | Pull (consumers poll) | Push to subscribers | Push via rules to targets | Pull by shard position |
| Consumers per message | One consumer | Many (one copy each) | Many (per matching rule/target) | Many independent readers |
| Ordering | FIFO option (per group) | FIFO topic option | Best-effort | Strict per shard |
| Filtering / routing | None (one queue = one stream) | Attribute/body filter policies | Rich content-based patterns | None (consumer filters) |
| Replay / history | No (delete after read) | No | Archive & replay (feature) | Yes — re-read within retention |
| Retention | 1 min – 14 days | None (delivers then forgets) | Archive as configured | 24 h – 365 days |
| AWS/SaaS source integration | Limited (targets) | Some | ~Dozens of native sources | Producers via API/Firehose |
| Throughput shape | Near-unlimited (standard) | Very high | High | Provisioned/on-demand shards |
| Pricing | Per request | Per publish + per delivery | Per event published (AWS-service events to default bus are free) | Per shard-hour + payload |
| Best at | Durable work buffering, decoupling, smoothing spikes | Fan-out / notifications | Event-driven routing & integration | High-rate ordered, replayable data |
Heuristics that resolve most cases:
- Need work done once, reliably, at the consumer’s pace → SQS. (One worker per message; buffer spikes; retry via visibility timeout; DLQ poison messages.)
- One event, tell many subscribers right now → SNS (and subscribe SQS queues for durability).
- Route events to different places based on their content, or react to AWS-service / SaaS events → EventBridge.
- A high-rate stream you must read in order, fan out to several analytics consumers, and possibly replay → Kinesis.
And the combinations matter more than the rivalry: real systems wire these together — EventBridge routes an event to an SQS queue for durable processing; SNS fans out to several SQS queues; a Pipe reads a Kinesis stream, filters, and writes to an EventBridge bus. These are tools in one toolbox, not four contestants.
A quick word on SNS vs EventBridge, the most common confusion: choose SNS for high-throughput, low-latency fan-out to a known set of subscribers (especially SQS/Lambda) and for SMS/email/mobile push; choose EventBridge when you need content-based routing, many AWS-service or SaaS sources, schema discovery, or archive/replay. SNS is the faster, simpler broadcaster; EventBridge is the richer router.
Part 5 — The SNS → SQS fan-out pattern
This is the single most important pattern in AWS messaging, and it appears on every relevant exam. The goal: one event, multiple independent consumers, each with its own durable buffer.
The shape: a publisher sends to one SNS topic; several SQS queues subscribe to it; each queue is drained by its own consumer (a Lambda, an ECS service, a worker fleet). One publish fans out to all queues at once, and because each consumer reads from its own queue, a slow or failing consumer never affects the others — its messages simply wait in its queue. Add a new consumer tomorrow by subscribing a new queue; the publisher never changes.
The diagram shows the three building blocks side by side — SQS pulling work to a single consumer, SNS pushing one publish to many subscribers, and EventBridge routing by event pattern to multiple targets — and then the composite SNS-to-SQS fan-out where one topic feeds several durable queues.
Why SNS → SQS and not SNS → Lambda directly? Because SNS delivers with retries but no durable storage: if a direct Lambda/HTTP subscriber is throttled or down past the retry policy, that copy is lost (absent a subscription DLQ). Inserting an SQS queue per consumer gives each one a durable buffer that absorbs spikes and holds messages until the consumer recovers. You also get per-queue filter policies (each consumer subscribes with its own filter and receives only its subset) and per-queue DLQs for poison messages.
The one configuration that trips everyone up is the queue resource policy. For SNS to deliver into a queue, the queue’s access policy must allow the SNS service principal to sqs:SendMessage, conditioned on the topic ARN:
{
"Version": "2012-10-17",
"Statement": [{
"Effect": "Allow",
"Principal": { "Service": "sns.amazonaws.com" },
"Action": "sqs:SendMessage",
"Resource": "arn:aws:sqs:eu-west-1:111122223333:orders-fulfilment",
"Condition": { "ArnEquals": { "aws:SourceArn": "arn:aws:sns:eu-west-1:111122223333:order-events" } }
}]
}
Two more must-dos for production fan-out:
- Enable RawMessageDelivery on each SQS subscription. Without it, SNS wraps your payload in its own JSON envelope and your consumer must unwrap it; with raw delivery the body passes through unchanged, so a message fanned out by SNS looks the same to the consumer as one sent directly to the queue.
- If the queues use SSE-KMS, the KMS key policy must allow the SNS service principal to call kms:GenerateDataKey and kms:Decrypt; otherwise delivery silently fails. The AWS managed key for SQS (alias/aws/sqs) will not work here because its key policy cannot be edited, so use a customer managed key. SSE-SQS, the SQS-managed default, involves no KMS key policy and needs no extra step.
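Both must-dos can be scripted. This is a minimal sketch: enable_raw_delivery sets the RawMessageDelivery subscription attribute, and sns_kms_statement prints a key-policy statement granting sns.amazonaws.com kms:GenerateDataKey and kms:Decrypt, for you to merge into the customer managed key's policy. Both function names are hypothetical, and the statement omits any conditions you may want to add to narrow it.

```shell
# Hypothetical helper: pass message bodies through to the queue unwrapped.
# enable_raw_delivery <subscription-arn>
enable_raw_delivery() {
  aws sns set-subscription-attributes \
    --subscription-arn "$1" \
    --attribute-name RawMessageDelivery \
    --attribute-value true
}

# Hypothetical helper: print a KMS key-policy statement that lets SNS
# encrypt messages for queues protected by this customer managed key.
# In a key policy, "Resource": "*" refers to the key the policy is attached to.
sns_kms_statement() {
  cat <<'EOF'
{
  "Sid": "AllowSNSToUseKeyForEncryptedQueues",
  "Effect": "Allow",
  "Principal": {"Service": "sns.amazonaws.com"},
  "Action": ["kms:GenerateDataKey", "kms:Decrypt"],
  "Resource": "*"
}
EOF
}
```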
That is the whole pattern: a topic, N queues each with a resource policy (and optional filter policy and DLQ), raw message delivery, and one consumer per queue. The advanced lesson sqs-sns-fan-out-fifo-ordering-dlq-poison-message-handling takes it further with FIFO ordering across the fan-out and idempotent-consumer patterns.
Hands-on lab: build SNS → SQS fan-out and watch it work
You will create one SNS topic and two SQS queues subscribed to it, set the resource policies, publish once, and confirm that both queues received the message. You need the AWS CLI with credentials configured, plus jq, which step 2 uses to escape the policy JSON. Everything here is comfortably inside the AWS Free Tier (SQS: 1M requests/month free; SNS: 1M publishes/month free).
1. Set variables and create the topic and two queues.
```bash
REGION=$(aws configure get region); REGION=${REGION:-eu-west-1}
export AWS_DEFAULT_REGION="$REGION"   # every later aws call uses this region
ACID=$(aws sts get-caller-identity --query Account --output text)
TOPIC_ARN=$(aws sns create-topic --name lab-order-events --query TopicArn --output text)
for q in lab-fulfilment lab-analytics; do
  aws sqs create-queue --queue-name "$q" \
    --attributes '{"ReceiveMessageWaitTimeSeconds":"20","VisibilityTimeout":"60"}' >/dev/null
done
FUL_URL=$(aws sqs get-queue-url --queue-name lab-fulfilment --query QueueUrl --output text)
ANA_URL=$(aws sqs get-queue-url --queue-name lab-analytics --query QueueUrl --output text)
FUL_ARN=$(aws sqs get-queue-attributes --queue-url "$FUL_URL" --attribute-names QueueArn --query Attributes.QueueArn --output text)
ANA_ARN=$(aws sqs get-queue-attributes --queue-url "$ANA_URL" --attribute-names QueueArn --query Attributes.QueueArn --output text)
```
2. Allow SNS to send to each queue (the resource policy), then subscribe with raw delivery.
```bash
for pair in "$FUL_URL|$FUL_ARN" "$ANA_URL|$ANA_ARN"; do
  URL=${pair%%|*}; ARN=${pair##*|}
  POLICY=$(printf '{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Principal":{"Service":"sns.amazonaws.com"},"Action":"sqs:SendMessage","Resource":"%s","Condition":{"ArnEquals":{"aws:SourceArn":"%s"}}}]}' "$ARN" "$TOPIC_ARN")
  aws sqs set-queue-attributes --queue-url "$URL" --attributes "{\"Policy\":$(printf '%s' "$POLICY" | jq -Rs .)}"
  SUB=$(aws sns subscribe --topic-arn "$TOPIC_ARN" --protocol sqs --notification-endpoint "$ARN" --query SubscriptionArn --output text)
  aws sns set-subscription-attributes --subscription-arn "$SUB" --attribute-name RawMessageDelivery --attribute-value true
done
```
3. Publish ONE message and confirm BOTH queues received it.
```bash
aws sns publish --topic-arn "$TOPIC_ARN" \
  --message '{"orderId":"A-1001","amount":250}' \
  --message-attributes '{"eventType":{"DataType":"String","StringValue":"order_placed"}}'

# Each queue should return 1 message: fan-out delivered a copy to both
for URL in "$FUL_URL" "$ANA_URL"; do
  echo "Queue: $URL"
  aws sqs receive-message --queue-url "$URL" --wait-time-seconds 20 --max-number-of-messages 1 \
    --query 'Messages[0].Body' --output text
done
```
Expected output (validation). Each queue prints the same body — {"orderId":"A-1001","amount":250} — proving the single publish fanned out to both durable queues. Because you enabled raw message delivery, the body is your exact JSON with no SNS envelope. (Without step 2’s resource policy, the subscribe would succeed but no messages would arrive — the classic fan-out failure.)
4. Cleanup. Remove everything so nothing lingers.
```bash
aws sns delete-topic --topic-arn "$TOPIC_ARN"
aws sqs delete-queue --queue-url "$FUL_URL"
aws sqs delete-queue --queue-url "$ANA_URL"
```
Cost note. Within the Free Tier this lab is free: it makes a handful of SNS publishes and SQS requests against millions of free monthly requests. Even beyond the Free Tier the cost is a fraction of a cent, because SQS and SNS are billed per request/publish at roughly $0.40–$0.50 per million. The most common way to run up a bill with these services is short-polling an idle queue in a tight loop, which makes millions of empty receives. That is exactly why long polling is the default recommendation.
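To see why the tight loop matters, here is the arithmetic for a single consumer polling an empty queue. Every number is an assumption:

- $0.40 per million requests (the low end of the range above).
- A 30-day month, with no free tier.
- A short-polling loop that manages 10 calls per second.

```bash
#!/usr/bin/env bash
# Back-of-the-envelope monthly SQS request cost for ONE consumer on an empty
# queue. Assumed inputs, not quoted pricing: $0.40 per million requests,
# a 30-day month, free tier ignored.
set -euo pipefail

MINUTES_PER_MONTH=$((30 * 24 * 60))   # 43,200
MICROCENTS_PER_REQUEST=40             # $0.40 / 1,000,000 requests, in 1e-6 cents

monthly_requests() {  # $1 = ReceiveMessage calls per minute
  echo $(( $1 * MINUTES_PER_MONTH ))
}

monthly_cost() {      # $1 = requests per month; prints dollars, truncated to the cent
  local cents=$(( $1 * MICROCENTS_PER_REQUEST / 1000000 ))
  printf '$%d.%02d\n' $(( cents / 100 )) $(( cents % 100 ))
}

short=$(monthly_requests 600)  # tight loop: assumed 10 empty calls per second
long=$(monthly_requests 3)     # long polling: each call waits the full 20 s
echo "short polling: $short requests, $(monthly_cost "$short")/month"
echo "long polling:  $long requests, $(monthly_cost "$long")/month"
```

The absolute figures are rough. The ratio is the point: on the same idle queue, the long-polling consumer makes 200 times fewer calls.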
Common mistakes & troubleshooting
| Symptom | Likely cause | Fix |
|---|---|---|
| Messages reappear and process twice | Consumer didn’t delete after processing, or visibility timeout < processing time | Always delete on success; set the visibility timeout above worst-case processing time (for a Lambda trigger, at least 6× the function timeout) |
| High SQS cost / empty receives | Short polling in a tight loop | Set ReceiveMessageWaitTimeSeconds=20 (long polling) |
| SNS → SQS fan-out delivers nothing | Queue resource policy doesn’t allow the SNS principal | Add the sqs:SendMessage allow for sns.amazonaws.com conditioned on the topic ARN |
| Consumer sees an SNS envelope instead of the raw body | RawMessageDelivery not enabled on the subscription | Set RawMessageDelivery=true on each SQS subscription |
| Encrypted fan-out queue receives nothing | Queue uses the AWS managed key for SQS (alias/aws/sqs), or the customer managed key’s policy lacks SNS permissions | Switch to a customer managed KMS key and allow sns.amazonaws.com kms:GenerateDataKey and kms:Decrypt in its key policy |
| A poison message blocks the queue forever | No DLQ / maxReceiveCount set | Attach a DLQ with maxReceiveCount 3–5; alarm on DLQ depth |
| FIFO throughput far lower than expected | One shared MessageGroupId serialises everything | Use a higher-cardinality group ID; enable high-throughput FIFO |
| EventBridge target never fires | Event pattern doesn’t match, or target IAM/permission missing | Test the pattern against a sample event; check the target’s resource policy / rule role; add a target DLQ to catch failures |
| Messages disappear before processing | Retention period elapsed while undelivered | Increase MessageRetentionPeriod (max 14 days) and fix the stuck consumer |
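Two of the fixes above come down to computing an attribute value. This is a minimal sketch with hypothetical helper names:

- visibility_timeout derives a timeout from a worst-case processing time. It defaults to the 6× multiplier quoted for Lambda and assumes the 12-hour SQS maximum.
- redrive_policy builds the RedrivePolicy JSON for a DLQ.

The non-Lambda margin of 2× in the examples is an assumed value.

```bash
#!/usr/bin/env bash
# Helpers for two fixes from the troubleshooting table. Hypothetical names.
# Assumptions: 6x is this lesson's Lambda guidance; 43,200 s (12 hours) is
# the SQS maximum visibility timeout.
set -euo pipefail

MAX_VISIBILITY=43200

visibility_timeout() {  # $1 = worst-case seconds (or Lambda timeout), $2 = multiplier (default 6)
  local mult="${2:-6}"
  local t=$(( $1 * mult ))
  if (( t > MAX_VISIBILITY )); then
    t=$MAX_VISIBILITY            # SQS rejects values above the maximum
  fi
  echo "$t"
}

redrive_policy() {      # $1 = DLQ ARN, $2 = maxReceiveCount
  printf '{"deadLetterTargetArn":"%s","maxReceiveCount":"%s"}\n' "$1" "$2"
}

visibility_timeout 30        # Lambda with a 30 s timeout       -> 180
visibility_timeout 900       # Lambda at its 15-minute maximum  -> 5400
visibility_timeout 10800 2   # 3 h worker, assumed 2x margin    -> 21600
visibility_timeout 36000 2   # 10 h worker: 2x exceeds the cap  -> 43200
redrive_policy arn:aws:sqs:eu-west-1:111122223333:orders-fulfilment-dlq 5

# Applying the redrive policy (requires AWS credentials and jq, as in the lab):
# POLICY=$(redrive_policy "$DLQ_ARN" 5)
# aws sqs set-queue-attributes --queue-url "$URL" \
#   --attributes "{\"RedrivePolicy\":$(printf '%s' "$POLICY" | jq -Rs .)}"
```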
Best practices
- Make consumers idempotent. At-least-once delivery means duplicates can happen on standard queues, SNS, and EventBridge. De-duplicate on a business key; never assume exactly-once outside FIFO.
- Always attach a DLQ to SQS queues and to EventBridge targets, and alarm on DLQ depth. A DLQ with messages in it is your earliest signal of a broken consumer or poison data.
- Use long polling everywhere (ReceiveMessageWaitTimeSeconds=20). It is cheaper and returns far fewer false-empty responses. The rare exception is a single thread that must poll several queues in turn.
- Prefer SNS → SQS over SNS → Lambda/HTTP for fan-out, so each consumer gets a durable buffer; enable raw message delivery.
- Filter at the source. SNS filter policies and EventBridge patterns mean each consumer receives only what it needs — cheaper and simpler than delivering everywhere and discarding in code.
- Size the visibility timeout to processing time, and use ChangeMessageVisibility heartbeats for long, variable jobs rather than one giant timeout.
- One custom EventBridge bus per domain, with a schema registry for typed contracts and archive/replay for recovery.
- Offload large payloads (>256 KB) to S3 and send a pointer (SQS/SNS Extended Client).
- Right-size FIFO group IDs for parallelism, and only choose FIFO when ordering or exactly-once is a real requirement.
- Use EventBridge Scheduler (not scheduled rules) for new cron/one-time work; it scales further and supports more targets.
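An idempotent consumer checks a business key before it performs the side effect. The sketch below is a toy. It uses an in-memory bash associative array (bash 4+) in place of the durable store a real consumer needs, and orderId as the hypothetical business key.

```bash
#!/usr/bin/env bash
# Idempotent consumer sketch: remember which order IDs were already handled
# and skip repeats. The associative array stands in for a durable store
# (e.g. a table with a unique key) that would survive restarts.
set -euo pipefail

declare -A PROCESSED=()
SHIPMENTS=0

handle_order() {  # $1 = orderId (the business key)
  local order_id="$1"
  if [[ -n "${PROCESSED[$order_id]:-}" ]]; then
    echo "skip $order_id (duplicate delivery)"
    return 0
  fi
  SHIPMENTS=$(( SHIPMENTS + 1 ))   # the side effect that must not repeat
  PROCESSED[$order_id]=1
  echo "ship $order_id"
}

# At-least-once delivery: A-1001 arrives twice.
for id in A-1001 A-1002 A-1001; do
  handle_order "$id"
done
echo "shipments: $SHIPMENTS"
```

In production, the check and the record should be one atomic step, for example a conditional insert keyed on the order ID. Otherwise two consumers handling the same duplicate could both pass the check.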
Security notes
- Encrypt at rest. Enable SSE on queues and topics — SSE-KMS with a customer-managed key when you need key control, rotation, and CloudTrail auditing. Remember the SNS-to-SQS KMS key-policy requirement above.
- Least-privilege resource policies. Scope queue and topic policies to specific principals, and condition on aws:SourceArn / aws:SourceAccount to prevent the confused-deputy problem (for example, a topic in someone else's account using the SNS service principal to send into your queue).
- Use HTTPS/TLS in transit. Enforce aws:SecureTransport in resource policies to reject plaintext access.
- Separate identity from resource policy. Producers and consumers get IAM permissions to publish/send/receive; cross-account and service-to-service access additionally needs the resource policy on the topic/queue.
- Validate input on consumers. Treat message bodies as untrusted; malformed or malicious payloads should fail safely into the DLQ, not crash the worker or get executed.
- Audit with CloudTrail. Management events (create/subscribe/set-policy) are logged; use them to detect unexpected new subscriptions or policy changes on sensitive topics.
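A common way to enforce TLS is an explicit Deny on requests where aws:SecureTransport is false. Below is a sketch for a queue policy, with a hypothetical helper name and a placeholder ARN. Add the statement alongside the existing Allow statements. Because an explicit Deny overrides any Allow, it applies to every principal, including your own producers and consumers.

```bash
#!/usr/bin/env bash
# Print a queue-policy statement that denies any SQS action made without TLS.
# Hypothetical helper; append the output to the policy's "Statement" array.
set -euo pipefail

deny_insecure_transport() {  # $1 = queue ARN
  printf '{"Sid":"DenyInsecureTransport","Effect":"Deny","Principal":"*","Action":"sqs:*","Resource":"%s","Condition":{"Bool":{"aws:SecureTransport":"false"}}}\n' "$1"
}

deny_insecure_transport arn:aws:sqs:eu-west-1:111122223333:orders-fulfilment
```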
Interview & exam questions
1. What problem does messaging solve, in one sentence? It decouples producers from consumers so each can scale, fail, and deploy independently — absorbing spikes and preventing one slow component from taking down others.
2. SQS vs SNS — the core difference? SQS is a queue: pull-based, one consumer processes each message. SNS is a topic: push-based, every subscriber gets its own copy. SQS is for durable work; SNS is for fan-out/notification.
3. Explain the visibility timeout. When a consumer receives a message, SQS hides it for the visibility timeout instead of deleting it; the consumer must explicitly delete it after success. If the consumer crashes, the timeout expires and the message reappears for another consumer. Set the timeout longer than worst-case processing (for a Lambda trigger, at least 6× the function timeout). Too short causes double-processing; too long delays the retry of crashed work.
4. Long polling vs short polling — which and why? Long polling (WaitTimeSeconds up to 20). It waits for a message before returning, drastically cutting empty responses, cost, and latency. Short polling returns immediately and may be empty even when messages exist. Long polling is the default recommendation.
5. Standard vs FIFO SQS — when FIFO? FIFO when you need strict ordering (within a message group) or exactly-once processing. The cost is lower throughput and the need for MessageGroupId and deduplication. Baseline FIFO throughput is 300 API calls per second per action, or 3,000 messages per second with batching; high-throughput mode raises this considerably. Default to standard otherwise for near-unlimited scale.
6. What is a dead-letter queue and when does a message land there? A DLQ catches poison messages that fail processing. With a redrive policy (maxReceiveCount of, say, 5), a message that is received that many times without being deleted moves to the DLQ for investigation and later redrive, instead of looping forever and blocking the queue.
7. What is the SNS-to-SQS fan-out pattern, and why insert SQS rather than subscribe Lambda directly? A topic with multiple SQS queues subscribed, each drained by its own consumer. SNS delivery has retries but no durable storage, so a direct Lambda/HTTP subscriber that’s down can lose its copy. SQS gives each consumer a durable buffer, plus per-queue filtering and DLQs. (Don’t forget the queue resource policy allowing the SNS principal and raw message delivery.)
8. SNS vs EventBridge — when each? SNS for high-throughput, low-latency fan-out to known subscribers (SQS/Lambda) and for SMS/email/mobile push. EventBridge for content-based routing, AWS-service/SaaS event sources, schema discovery, and archive/replay. SNS is the faster broadcaster; EventBridge is the richer router.
9. When would you choose Kinesis over SQS or EventBridge? For a high-rate, ordered, replayable stream read by multiple independent consumers by position — analytics, clickstreams, log/metric pipelines. SQS deletes after read (no replay); EventBridge routes events but isn’t a streaming log. Kinesis retains records (up to 365 days) and preserves order per shard.
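10. How do SNS filter policies and EventBridge event patterns differ? An SNS filter policy is attached to a subscription and decides only whether that subscriber gets a copy. It matches message attributes by default, or body fields with FilterPolicyScope=MessageBody, and supports operators such as exact, prefix, numeric range, anything-but, and exists. An EventBridge pattern is attached to a rule and matches the whole event structure (source, detail-type, detail). It routes matching events to targets, optionally reshaping them with an input transformer. The operator sets overlap heavily. Choose EventBridge when you need routing and AWS-service/SaaS sources, and SNS filtering for lightweight, high-throughput fan-out.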
11. What are EventBridge Pipes and when do you use one? A point-to-point integration (source → optional filter → optional enrichment → target) with no glue code. Use a Pipe for one-source-to-one-target processing of SQS/Kinesis/DynamoDB-Streams/MQ/Kafka; use a bus + rules when one event must fan out to many targets by content.
12. EventBridge Scheduler vs a scheduled rule? Scheduler is the purpose-built, serverless scheduler — cron/rate/one-time, time zones, flexible windows, retries, a DLQ, ~270 targets, and millions of schedules — and it doesn’t consume your bus’s rule budget. Prefer it over the legacy scheduled rule for all new cron/one-time work.
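Questions 3 and 6 describe one mechanism, and a toy simulation makes it concrete. This is an illustrative model, not the SQS implementation:

- Integer seconds stand in for time.
- The consumer always crashes.
- The message moves to the DLQ on the first receive attempt after it has already been received maxReceiveCount times.

The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Toy model of visibility timeout + redrive for ONE poison message whose
# consumer always crashes before deleting it. Not SQS's implementation.
set -euo pipefail

simulate_poison_message() {  # $1 = visibility timeout (s), $2 = maxReceiveCount
  local visibility="$1" max_receives="$2"
  local now=0 visible_at=0 receives=0
  while true; do
    now=$visible_at                      # next poll that can see the message
    if (( receives >= max_receives )); then
      echo "t=${now}s moved to DLQ after $receives receives"
      return 0
    fi
    receives=$(( receives + 1 ))
    echo "t=${now}s receive #$receives, consumer crashes, message hidden"
    visible_at=$(( now + visibility ))   # reappears only after the timeout
  done
}

simulate_poison_message 30 3
```

With a 30 s timeout and a maxReceiveCount of 3, the message is received at t=0, 30 and 60 s and moved at t=90 s. A longer timeout stretches every gap, which is the "too long delays the retry" trade-off from question 3.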
Quick check
- A consumer received a message but crashed before deleting it. What happens, and which setting governs it?
- You need one “user signed up” event to trigger an email Lambda, a welcome-kit fulfilment queue, and an analytics pipeline, each isolated from the others’ failures. Which pattern and services?
- Your standard SQS queue is racking up cost from empty ReceiveMessage calls. What is the one-line fix?
- You must route events to different targets based on the value of a field inside the event body, and also react to S3 and EC2 service events. Which service?
- You need strict per-customer ordering and no duplicate processing. Which queue type and which two message fields?
Answers
- After the visibility timeout expires, the message becomes visible again and another consumer receives it — nothing is lost. The visibility timeout governs how long it stays hidden.
- SNS → SQS fan-out: publish to one SNS topic; subscribe three SQS queues (one per consumer); drain each with its own consumer. A failure in one consumer leaves the others untouched because each reads from its own durable queue.
- Enable long polling — set ReceiveMessageWaitTimeSeconds to 20.
- Amazon EventBridge — content-based event patterns for routing, and native ingestion of AWS-service events (S3, EC2, etc.).
- A FIFO queue, using MessageGroupId (per customer, for ordering) and MessageDeduplicationId / content-based dedup (for exactly-once).
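Content-based deduplication hashes the message body, so two sends of the same event that carry different timestamps are not deduplicated. The sketch below instead derives the MessageDeduplicationId from a business key. It assumes GNU coreutils' sha256sum and a hypothetical customer:order:event key format. The send-message call stays commented out because it needs a real FIFO queue.

```bash
#!/usr/bin/env bash
# Derive FIFO send parameters: group by customer for ordering, and build the
# deduplication ID from a business key rather than the body. Assumes GNU
# coreutils sha256sum; the key format is hypothetical.
set -euo pipefail

dedup_id() {  # $1 = business key; prints a 64-character hex SHA-256 digest
  printf '%s' "$1" | sha256sum | cut -d' ' -f1
}

CUSTOMER=C-42
ORDER=A-1001
GROUP_ID="$CUSTOMER"                                   # ordering boundary
DEDUP_ID=$(dedup_id "$CUSTOMER:$ORDER:order_placed")   # same event -> same ID
echo "group=$GROUP_ID dedup=$DEDUP_ID"

# Sending (requires AWS credentials and a FIFO queue URL in $FIFO_URL):
# aws sqs send-message --queue-url "$FIFO_URL" \
#   --message-body '{"orderId":"A-1001"}' \
#   --message-group-id "$GROUP_ID" --message-deduplication-id "$DEDUP_ID"
```

A resend with the same key inside the 5-minute deduplication window is accepted but not delivered again, even if the body has changed.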
Exercise
Design the messaging for a small e-commerce order flow and justify each choice in writing:
- An order is placed and must (a) be processed for fulfilment exactly once at the warehouse’s pace, (b) trigger a confirmation email immediately, and (c) feed an analytics stream you may want to replay when you change the model. Pick the service for each of (a), (b), (c) and explain why.
- Wire (a) and (b) off a single publish using SNS → SQS fan-out. Write the queue resource policy the fulfilment queue needs and explain the aws:SourceArn condition.
- Add a DLQ to the fulfilment queue with a sensible maxReceiveCount, and a CloudWatch alarm on its depth. Explain what a non-zero DLQ tells you.
- The fulfilment job sometimes takes up to 4 minutes. Choose a visibility timeout and explain the failure modes of setting it too low and too high.
- Marketing wants only orders over ₹10,000 to hit a “VIP” queue. Add a filter policy (or EventBridge pattern) that achieves this and explain where the filtering happens and why that’s cheaper than filtering in code.
Write your answers as if defending them in a design review — the reasoning is the point.
Certification mapping
- AWS Certified Developer – Associate (DVA-C02) — Heavily tested. SQS visibility timeout, long vs short polling, standard vs FIFO, DLQs and maxReceiveCount; SNS fan-out and filter policies; EventBridge rules/patterns and Pipes; and the SNS-to-SQS pattern with its resource policy all appear directly.
- AWS Certified Solutions Architect – Associate (SAA-C03) — The decision lesson: choosing SQS vs SNS vs EventBridge vs Kinesis for a decoupling requirement, designing fan-out, and selecting the right service for ordering/replay/filtering needs.
- AWS Certified Solutions Architect – Professional (SAP-C02) — Cross-account event routing (custom buses, bus-to-bus), large-scale event-driven design, and resilience patterns (DLQs, replay) at architecture scale.
Across all three: know at-least-once vs exactly-once, the visibility-timeout mechanics, the fan-out resource policy, and the four-way decision cold.
Glossary
- Decoupling — separating producer and consumer through a messaging layer so each scales, fails, and deploys independently.
- At-least-once delivery — a message is delivered one or more times; duplicates are possible (requires idempotent consumers).
- Exactly-once processing — a message is processed precisely once; provided by SQS/SNS FIFO within their limits.
- Visibility timeout — the interval an SQS message is hidden after being received, before it reappears if not deleted.
- Long polling — waiting up to WaitTimeSeconds for a message before returning, reducing empty receives and cost.
- Dead-letter queue (DLQ) — a queue that captures messages that repeatedly fail processing (poison messages).
- Poison message — a message that cannot be processed successfully and would otherwise loop forever.
- MessageGroupId — the FIFO ordering boundary; order is guaranteed within a group, parallel across groups.
- MessageDeduplicationId — the FIFO exactly-once key within the 5-minute deduplication window.
- Fan-out — one published message delivered to many subscribers/queues at once.
- Filter policy — a JSON rule on an SNS subscription that delivers only matching messages.
- Event pattern — EventBridge’s structural JSON match against an event, with rich operators.
- Event bus — the EventBridge channel that events are sent to and that rules listen on (default, custom, or partner).
- Rule / target — an EventBridge match and the destination(s) it routes matching events to.
- EventBridge Pipes — a point-to-point source→filter→enrich→target integration with no glue code.
- EventBridge Scheduler — a serverless cron/rate/one-time scheduler to ~270 AWS targets.
- Raw message delivery — an SNS-to-SQS setting that passes the body through without the SNS envelope.
- Resource policy — a policy on a queue/topic that grants cross-account or service principals access.
Next steps
- Go deeper on resilient queue-and-topic patterns with Resilient Messaging with SQS and SNS: Fan-Out, FIFO Ordering, DLQs, and Poison-Message Handling — FIFO ordering across a fan-out, redrive, and idempotent consumers on Lambda and ECS.
- Go deeper on event-driven architecture with Designing Event-Driven Architectures with Amazon EventBridge: Buses, Rules, Schemas, and Archive/Replay — custom buses, content-based rules, the schema registry, cross-account routing, and archive/replay.
- Then continue the course with AWS Troubleshooting Playbooks: EC2, VPC, IAM, S3 & Lambda — a repeatable mindset and symptom-to-fix tables for the services you have just learned to wire together.