Quick take: A reusable Terraform module (hashicorp/aws ~> 5.0) for AWS SQS that wires up FIFO or standard queues, a dead-letter queue with redrive policy, SSE-KMS or SSE-SQS encryption, and an optional caller-supplied queue policy. New here? Jump to the Quickstart below to deploy it in minutes; read on for how it works and when to reach for it.
Quickstart (copy-paste)
Minimal, runnable configuration — drop this in a .tf file and fill in the "..." placeholders (each required input is commented):
provider "aws" {
region = "us-east-1"
}
module "sqs" {
source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-sqs?ref=v1.0.0"
name = "..." # Base queue name (no `.fifo` suffix; 1-75 chars, alphanu…
}
Then terraform init && terraform apply. Every other input has a sensible default — see Inputs below to override behaviour.
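As a rough sketch of overriding defaults, the variant below replaces the Quickstart module block rather than sitting alongside it. The queue name `demo-jobs` and the two root-level outputs are illustrative assumptions; the input and output names come from the module's own variables.tf and outputs.tf.

```hcl
module "sqs" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-sqs?ref=v1.0.0"

  name = "demo-jobs" # hypothetical queue name

  receive_wait_time_seconds = 20 # long polling instead of the default 0
  max_receive_count         = 3  # move to the DLQ after 3 receives (default 5)
}

# Surface the values that consumers usually need from the root module.
output "queue_url" {
  value = module.sqs.queue_url
}

output "dlq_arn" {
  value = module.sqs.dlq_arn
}
```

After an apply, `terraform output queue_url` should print the URL to hand to an SDK or the CLI.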
What this module is
Amazon SQS (Simple Queue Service) is a fully managed message queue that decouples producers from consumers so that a downstream service can fail, slow down, or scale independently without dropping work. A single aws_sqs_queue resource looks trivial, but a production queue is never just one resource: you almost always also need a dead-letter queue (DLQ) to catch poison messages, a redrive policy that ties the two together, server-side encryption (SSE-KMS or SSE-SQS), and a queue policy that grants only the specific principals that should be allowed to send or receive.
This module wraps all of that into one opinionated, var-driven unit. You pass a name and a few knobs, and it gives you back a correctly-configured primary queue, an optional companion DLQ with a sane maxReceiveCount, encryption-at-rest, and the queue ARN/URL outputs that every consumer Lambda, ECS task, or IAM policy downstream needs. It handles the FIFO-vs-standard quirks (the .fifo suffix, content-based deduplication, throughput limits) so that callers do not have to remember them every time they stand up a new queue.
When to use it
Reach for this module when you want a queue that is safe to run in production rather than a throwaway demo queue:
- Asynchronous work offloading: an API accepts a request, drops a message on the queue, and returns 202 immediately while a worker processes it later.
- Buffering and backpressure: smoothing out spiky traffic so a downstream database or third-party API is not overwhelmed.
- Event fan-out with SNS: SNS publishes to multiple SQS queues, each owned by a different consumer service (the classic “topic-queue” pattern).
- FIFO ordering and exactly-once processing: order-sensitive workflows (payments, inventory) where you need strict ordering within a message group and deduplication.
- Standardising poison-message handling: every team gets a DLQ and redrive policy by default, instead of discovering at 2 a.m. that a malformed message is being retried forever.
If you only need an ephemeral, unencrypted queue for a local experiment, the raw resource is fine. The moment a queue carries real data across team or account boundaries, use the module so encryption, DLQ, and access policy are not forgotten.
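The SNS fan-out case could look roughly like the sketch below. It assumes an existing topic whose ARN arrives through a hypothetical `orders_topic_arn` variable, a hypothetical queue name `orders-events`, and the standard `aws` partition. The queue ARN is assembled from known parts instead of the module's `queue_arn` output so that the policy JSON is already known at plan time, which the module's `count` on `queue_policy_json` appears to need.

```hcl
# Hypothetical input: ARN of an existing SNS topic that may publish.
variable "orders_topic_arn" {
  type = string
}

data "aws_caller_identity" "current" {}
data "aws_region" "current" {}

locals {
  events_queue_name = "orders-events" # hypothetical standard queue name

  # Assumes the "aws" partition; adjust for GovCloud or China regions.
  events_queue_arn = "arn:aws:sqs:${data.aws_region.current.name}:${data.aws_caller_identity.current.account_id}:${local.events_queue_name}"
}

# Resource policy: only this one topic may send to the queue.
data "aws_iam_policy_document" "sns_to_sqs" {
  statement {
    sid       = "AllowOrdersTopicSend"
    effect    = "Allow"
    actions   = ["sqs:SendMessage"]
    resources = [local.events_queue_arn]

    principals {
      type        = "Service"
      identifiers = ["sns.amazonaws.com"]
    }

    condition {
      test     = "ArnEquals"
      variable = "aws:SourceArn"
      values   = [var.orders_topic_arn]
    }
  }
}

module "events_queue" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-sqs?ref=v1.0.0"

  name              = local.events_queue_name
  queue_policy_json = data.aws_iam_policy_document.sns_to_sqs.json
}

# Subscribe the queue to the topic.
resource "aws_sns_topic_subscription" "events" {
  topic_arn = var.orders_topic_arn
  protocol  = "sqs"
  endpoint  = module.events_queue.queue_arn
}
```

Each additional consumer service would get its own module instance and subscription against the same topic.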
Module structure
terraform-module-aws-sqs/
├── versions.tf # provider + Terraform version constraints
├── main.tf # primary queue, DLQ, redrive, queue policy
├── variables.tf # all tunable inputs with validation
└── outputs.tf # ids, ARNs, URLs for the queue and DLQ
versions.tf
terraform {
required_version = ">= 1.5.0"
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 5.0"
}
}
}
main.tf
locals {
# FIFO queues must end in ".fifo"; standard queues must not.
queue_name = var.fifo_queue ? "${var.name}.fifo" : var.name
dlq_name = var.fifo_queue ? "${var.name}-dlq.fifo" : "${var.name}-dlq"
# Pick SSE-KMS when a key is provided, otherwise fall back to SSE-SQS
# (managed) encryption when sse_enabled is true.
use_kms = var.kms_master_key_id != null
}
# Optional dead-letter queue. Created only when create_dlq = true.
resource "aws_sqs_queue" "dlq" {
count = var.create_dlq ? 1 : 0
name = local.dlq_name
fifo_queue = var.fifo_queue
content_based_deduplication = var.fifo_queue ? var.content_based_deduplication : null
# Keep failed messages long enough to investigate (default 14 days).
message_retention_seconds = var.dlq_message_retention_seconds
sqs_managed_sse_enabled = local.use_kms ? null : var.sse_enabled
kms_master_key_id = local.use_kms ? var.kms_master_key_id : null
kms_data_key_reuse_period_seconds = local.use_kms ? var.kms_data_key_reuse_period_seconds : null
tags = merge(var.tags, { "queue-role" = "dead-letter" })
}
# Primary queue.
resource "aws_sqs_queue" "this" {
name = local.queue_name
fifo_queue = var.fifo_queue
content_based_deduplication = var.fifo_queue ? var.content_based_deduplication : null
deduplication_scope = var.fifo_queue ? var.deduplication_scope : null
fifo_throughput_limit = var.fifo_queue ? var.fifo_throughput_limit : null
visibility_timeout_seconds = var.visibility_timeout_seconds
message_retention_seconds = var.message_retention_seconds
delay_seconds = var.delay_seconds
max_message_size = var.max_message_size
receive_wait_time_seconds = var.receive_wait_time_seconds
# Encryption at rest: SSE-KMS when a key is supplied, else SSE-SQS.
sqs_managed_sse_enabled = local.use_kms ? null : var.sse_enabled
kms_master_key_id = local.use_kms ? var.kms_master_key_id : null
kms_data_key_reuse_period_seconds = local.use_kms ? var.kms_data_key_reuse_period_seconds : null
# Wire the DLQ in via a redrive policy when the DLQ exists.
redrive_policy = var.create_dlq ? jsonencode({
deadLetterTargetArn = aws_sqs_queue.dlq[0].arn
maxReceiveCount = var.max_receive_count
}) : null
tags = merge(var.tags, { "queue-role" = "primary" })
}
# Allow the DLQ to accept redriven messages only from this source queue.
resource "aws_sqs_queue_redrive_allow_policy" "dlq" {
count = var.create_dlq ? 1 : 0
queue_url = aws_sqs_queue.dlq[0].id
redrive_allow_policy = jsonencode({
redrivePermission = "byQueue"
sourceQueueArns = [aws_sqs_queue.this.arn]
})
}
# Optional resource-based access policy (e.g. let an SNS topic publish).
resource "aws_sqs_queue_policy" "this" {
count = var.queue_policy_json != null ? 1 : 0
queue_url = aws_sqs_queue.this.id
policy = var.queue_policy_json
}
variables.tf
variable "name" {
description = "Base name of the queue. The module appends '.fifo' automatically for FIFO queues; do not include the suffix yourself."
type = string
validation {
condition = !endswith(var.name, ".fifo")
error_message = "Do not include the '.fifo' suffix in var.name; set fifo_queue = true instead."
}
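validation {
# 75 chars + ".fifo" fits the 80-char SQS name limit for the primary queue.
# The DLQ name adds "-dlq" (or "-dlq.fifo"), so keep names to 71 chars or
# fewer when fifo_queue and create_dlq are both true.
condition = can(regex("^[A-Za-z0-9_-]{1,75}$", var.name))
error_message = "Queue name must be 1-75 chars (leaving room for the '.fifo' suffix) using only alphanumerics, hyphens, and underscores."
}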
}
variable "fifo_queue" {
description = "If true, create FIFO queues (ordered, exactly-once). If false, create standard queues."
type = bool
default = false
}
variable "content_based_deduplication" {
description = "FIFO only. Enable content-based deduplication so SQS derives the dedup ID from the message body."
type = bool
default = false
}
variable "deduplication_scope" {
description = "FIFO only. Scope of deduplication: 'messageGroup' or 'queue'. Required as 'messageGroup' for high throughput FIFO."
type = string
default = "queue"
validation {
condition = contains(["messageGroup", "queue"], var.deduplication_scope)
error_message = "deduplication_scope must be either 'messageGroup' or 'queue'."
}
}
variable "fifo_throughput_limit" {
description = "FIFO only. Throughput quota scope: 'perQueue' or 'perMessageGroupId' (high throughput FIFO)."
type = string
default = "perQueue"
validation {
condition = contains(["perQueue", "perMessageGroupId"], var.fifo_throughput_limit)
error_message = "fifo_throughput_limit must be 'perQueue' or 'perMessageGroupId'."
}
}
variable "visibility_timeout_seconds" {
description = "How long a message stays invisible after a consumer receives it (0-43200). Set higher than your worker's max processing time."
type = number
default = 30
validation {
condition = var.visibility_timeout_seconds >= 0 && var.visibility_timeout_seconds <= 43200
error_message = "visibility_timeout_seconds must be between 0 and 43200 (12 hours)."
}
}
variable "message_retention_seconds" {
description = "How long SQS keeps a message that is not deleted (60-1209600). Default 4 days."
type = number
default = 345600
validation {
condition = var.message_retention_seconds >= 60 && var.message_retention_seconds <= 1209600
error_message = "message_retention_seconds must be between 60 and 1209600 (14 days)."
}
}
variable "delay_seconds" {
description = "Delay before a message becomes available for delivery (0-900)."
type = number
default = 0
validation {
condition = var.delay_seconds >= 0 && var.delay_seconds <= 900
error_message = "delay_seconds must be between 0 and 900 (15 minutes)."
}
}
variable "max_message_size" {
description = "Maximum message size in bytes (1024-262144). Default 256 KiB."
type = number
default = 262144
validation {
condition = var.max_message_size >= 1024 && var.max_message_size <= 262144
error_message = "max_message_size must be between 1024 and 262144 bytes (256 KiB)."
}
}
variable "receive_wait_time_seconds" {
description = "Long-poll wait time (0-20). Set to 20 to reduce empty receives and API cost."
type = number
default = 0
validation {
condition = var.receive_wait_time_seconds >= 0 && var.receive_wait_time_seconds <= 20
error_message = "receive_wait_time_seconds must be between 0 and 20."
}
}
variable "sse_enabled" {
description = "Enable SQS-managed (SSE-SQS) encryption at rest. Ignored when kms_master_key_id is set (SSE-KMS takes over)."
type = bool
default = true
}
variable "kms_master_key_id" {
description = "KMS key id, alias (e.g. 'alias/my-key'), or ARN for SSE-KMS encryption. When set, SSE-KMS is used instead of SSE-SQS."
type = string
default = null
}
variable "kms_data_key_reuse_period_seconds" {
description = "Seconds SQS can reuse a KMS data key before calling KMS again (60-86400). Higher = fewer KMS calls (lower cost)."
type = number
default = 300
validation {
condition = var.kms_data_key_reuse_period_seconds >= 60 && var.kms_data_key_reuse_period_seconds <= 86400
error_message = "kms_data_key_reuse_period_seconds must be between 60 and 86400 (24 hours)."
}
}
variable "create_dlq" {
description = "If true, create a companion dead-letter queue and attach a redrive policy to the primary queue."
type = bool
default = true
}
variable "max_receive_count" {
description = "Number of times a message can be received before being moved to the DLQ."
type = number
default = 5
validation {
condition = var.max_receive_count >= 1 && var.max_receive_count <= 1000
error_message = "max_receive_count must be between 1 and 1000."
}
}
variable "dlq_message_retention_seconds" {
description = "Retention for the dead-letter queue (60-1209600). Default 14 days to maximise investigation time."
type = number
default = 1209600
validation {
condition = var.dlq_message_retention_seconds >= 60 && var.dlq_message_retention_seconds <= 1209600
error_message = "dlq_message_retention_seconds must be between 60 and 1209600 (14 days)."
}
}
variable "queue_policy_json" {
description = "Optional JSON resource policy attached to the primary queue (e.g. to allow an SNS topic to send). Null to skip."
type = string
default = null
}
variable "tags" {
description = "Tags applied to all queues created by this module."
type = map(string)
default = {}
}
outputs.tf
output "queue_id" {
description = "The URL of the primary SQS queue (used by the SDK/CLI as QueueUrl)."
value = aws_sqs_queue.this.id
}
output "queue_url" {
description = "The URL of the primary SQS queue (alias of queue_id, for readability)."
value = aws_sqs_queue.this.url
}
output "queue_arn" {
description = "The ARN of the primary SQS queue (use in IAM policies and SNS subscriptions)."
value = aws_sqs_queue.this.arn
}
output "queue_name" {
description = "The resolved name of the primary queue (including any '.fifo' suffix)."
value = aws_sqs_queue.this.name
}
output "dlq_id" {
description = "The URL of the dead-letter queue, or null when create_dlq is false."
value = try(aws_sqs_queue.dlq[0].id, null)
}
output "dlq_arn" {
description = "The ARN of the dead-letter queue, or null when create_dlq is false."
value = try(aws_sqs_queue.dlq[0].arn, null)
}
How to use it
module "orders_queue" {
source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-sqs?ref=v1.0.0"
name = "orders-processing"
fifo_queue = true
content_based_deduplication = true
deduplication_scope = "messageGroup"
fifo_throughput_limit = "perMessageGroupId"
# Workers can take up to 5 minutes; keep messages invisible for 6.
visibility_timeout_seconds = 360
receive_wait_time_seconds = 20 # long polling
# Encrypt with a customer-managed key.
kms_master_key_id = aws_kms_key.orders.arn
# Poison messages go to the DLQ after 3 failed attempts.
create_dlq = true
max_receive_count = 3
tags = {
Environment = "prod"
Team = "fulfilment"
CostCentre = "ECOM-204"
}
}
# Downstream: grant a consumer Lambda permission to read from the queue
# using the module's queue_arn output.
data "aws_iam_policy_document" "consumer" {
statement {
sid = "ConsumeOrders"
effect = "Allow"
actions = [
"sqs:ReceiveMessage",
"sqs:DeleteMessage",
"sqs:GetQueueAttributes",
]
resources = [module.orders_queue.queue_arn]
}
}
# Downstream: an event source mapping wiring the queue to a Lambda,
# referencing the module's queue_arn output.
resource "aws_lambda_event_source_mapping" "orders" {
event_source_arn = module.orders_queue.queue_arn
function_name = aws_lambda_function.order_worker.arn
batch_size = 10
}
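The `consumer` policy document above is defined but not attached to anything, and because the queue uses a customer-managed key, the consumer also needs `kms:Decrypt` on that key. A minimal sketch of both pieces follows, assuming a hypothetical execution role `aws_iam_role.order_worker` defined elsewhere and a key policy that permits IAM-based grants.

```hcl
# Consumers of an SSE-KMS queue must be able to decrypt with the queue's key.
data "aws_iam_policy_document" "consumer_kms" {
  statement {
    sid       = "DecryptOrderMessages"
    effect    = "Allow"
    actions   = ["kms:Decrypt"]
    resources = [aws_kms_key.orders.arn]
  }
}

# Attach the queue permissions from the "consumer" document above.
resource "aws_iam_role_policy" "consume_orders" {
  name   = "consume-orders"
  role   = aws_iam_role.order_worker.id # hypothetical Lambda execution role
  policy = data.aws_iam_policy_document.consumer.json
}

# Attach the KMS permission as a separate inline policy.
resource "aws_iam_role_policy" "decrypt_orders" {
  name   = "decrypt-orders"
  role   = aws_iam_role.order_worker.id # hypothetical Lambda execution role
  policy = data.aws_iam_policy_document.consumer_kms.json
}
```

Keeping the two policies separate makes it easy to drop the KMS statement for queues that use SSE-SQS instead.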
With Terragrunt
Terragrunt keeps this module DRY across environments — define the backend and provider once in a root config, then a thin terragrunt.hcl per environment supplies only the inputs that differ.
1. Root config — live/terragrunt.hcl (inherited by every module):
remote_state {
backend = "s3"
generate = { path = "backend.tf", if_exists = "overwrite" }
config = {
# ...s3 state bucket/container + key per path...
}
}
2. Module config — live/prod/sqs/terragrunt.hcl:
include "root" {
path = find_in_parent_folders()
}
terraform {
source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-sqs?ref=v1.0.0"
}
inputs = {
name = "..."
}
3. Deploy one environment, or roll out all modules together:
cd live/prod/sqs && terragrunt apply # this module only
cd live/prod && terragrunt run-all apply # every module under live/prod
Why Terragrunt here: the backend and provider live in one place instead of being copy-pasted into every module; inputs is overridden per environment (dev / stage / prod) without forking the module; and run-all orchestrates dependencies across modules. Reach for it once you have more than one environment or more than a handful of modules — for a single stack, the plain Quickstart above is enough.
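To illustrate the per-environment override, a dev counterpart of the prod config might look like the sketch below. The path `live/dev/sqs/terragrunt.hcl`, the queue name, and the chosen values are assumptions for illustration only.

```hcl
# live/dev/sqs/terragrunt.hcl (hypothetical dev counterpart of the prod config)
include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-sqs?ref=v1.0.0"
}

inputs = {
  name = "orders-processing-dev" # hypothetical queue name

  # Dev keeps messages for one day and tolerates more retries.
  message_retention_seconds = 86400
  max_receive_count         = 10

  tags = {
    Environment = "dev"
  }
}
```

Only the `inputs` map differs between environments; the module source and root include stay the same.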
Inputs
| Name | Type | Default | Required | Description |
|---|---|---|---|---|
| name | string | n/a | Yes | Base queue name (no .fifo suffix; 1-75 chars, alphanumerics/-/_). |
| fifo_queue | bool | false | No | Create FIFO (ordered, exactly-once) queues instead of standard. |
| content_based_deduplication | bool | false | No | FIFO only. Derive the dedup ID from the message body. |
| deduplication_scope | string | "queue" | No | FIFO only. messageGroup or queue. |
| fifo_throughput_limit | string | "perQueue" | No | FIFO only. perQueue or perMessageGroupId (high throughput). |
| visibility_timeout_seconds | number | 30 | No | Invisibility window after receive (0-43200). |
| message_retention_seconds | number | 345600 | No | Retention for the primary queue (60-1209600). |
| delay_seconds | number | 0 | No | Delivery delay for new messages (0-900). |
| max_message_size | number | 262144 | No | Max message size in bytes (1024-262144). |
| receive_wait_time_seconds | number | 0 | No | Long-poll wait time (0-20). |
| sse_enabled | bool | true | No | Enable SSE-SQS managed encryption (ignored when kms_master_key_id set). |
| kms_master_key_id | string | null | No | KMS key id/alias/ARN; switches encryption to SSE-KMS. |
| kms_data_key_reuse_period_seconds | number | 300 | No | KMS data-key reuse window (60-86400). |
| create_dlq | bool | true | No | Create a companion DLQ and attach the redrive policy. |
| max_receive_count | number | 5 | No | Receives before a message is moved to the DLQ (1-1000). |
| dlq_message_retention_seconds | number | 1209600 | No | Retention for the DLQ (60-1209600). |
| queue_policy_json | string | null | No | Optional JSON resource policy for the primary queue. |
| tags | map(string) | {} | No | Tags applied to all queues. |
Outputs
| Name | Description |
|---|---|
| queue_id | URL of the primary queue (the SDK/CLI QueueUrl). |
| queue_url | URL of the primary queue (alias of queue_id). |
| queue_arn | ARN of the primary queue, for IAM policies and SNS subscriptions. |
| queue_name | Resolved primary queue name including any .fifo suffix. |
| dlq_id | URL of the dead-letter queue, or null when create_dlq = false. |
| dlq_arn | ARN of the dead-letter queue, or null when create_dlq = false. |
Enterprise scenario
A retail platform’s checkout service publishes every confirmed order to an orders-processing.fifo queue created by this module, with deduplication_scope = "messageGroup" and fifo_throughput_limit = "perMessageGroupId" so that orders for the same customer stay strictly ordered while unrelated customers process in parallel at high throughput. The fulfilment Lambda consumes via an event source mapping; if the warehouse API is down and a message fails three times (max_receive_count = 3), it lands on the auto-provisioned DLQ where it is retained for 14 days. A CloudWatch alarm on the DLQ’s ApproximateNumberOfMessagesVisible pages on-call, and once the warehouse API recovers, operators use SQS redrive to replay the parked orders back onto the primary queue with no data loss.
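The DLQ alarm in this scenario could be sketched as follows. It assumes the `orders_queue` module instance from the usage example with `create_dlq = true`, plus a hypothetical `aws_sns_topic.oncall` that pages the on-call engineer. The module does not output the DLQ name, so the sketch reads it from the last segment of the DLQ ARN.

```hcl
locals {
  # ARN format: arn:aws:sqs:<region>:<account-id>:<queue-name>
  # Requires create_dlq = true; dlq_arn is null otherwise.
  orders_dlq_name = element(split(":", module.orders_queue.dlq_arn), 5)
}

# Page on-call as soon as any message is parked on the DLQ.
resource "aws_cloudwatch_metric_alarm" "orders_dlq_depth" {
  alarm_name        = "orders-processing-dlq-not-empty" # hypothetical name
  alarm_description = "Messages are waiting on the orders dead-letter queue."

  namespace   = "AWS/SQS"
  metric_name = "ApproximateNumberOfMessagesVisible"
  dimensions = {
    QueueName = local.orders_dlq_name
  }

  statistic           = "Maximum"
  period              = 300
  evaluation_periods  = 1
  comparison_operator = "GreaterThanThreshold"
  threshold           = 0
  treat_missing_data  = "notBreaching"

  alarm_actions = [aws_sns_topic.oncall.arn] # hypothetical paging topic
}
```

A threshold of zero treats any parked message as worth a look; a noisier system might prefer a higher threshold or more evaluation periods.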
Best practices
- Always ship a DLQ and tune maxReceiveCount. Leave create_dlq = true and pick a max_receive_count that matches how many transient failures you expect: too low parks recoverable messages in the DLQ prematurely, too high keeps retrying poison messages and burns money. Alarm on DLQ depth, not just on errors.
- Set visibility_timeout_seconds above your worker’s worst-case processing time. If a consumer takes longer than the timeout, SQS redelivers the message and you get duplicate processing. A common rule for Lambda event source mappings is at least 6x your function timeout.
- Prefer SSE-KMS with a customer-managed key for sensitive payloads, and tune kms_data_key_reuse_period_seconds. A longer reuse window (the default is 300 seconds; the maximum is 86400) means fewer KMS API calls and lower cost on high-volume queues, while data stays encrypted at rest. Use plain SSE-SQS only when KMS is overkill.
- Use long polling (receive_wait_time_seconds = 20). It reduces empty ReceiveMessage responses, which lowers request count, cost, and latency compared with short polling; there is rarely a reason to leave it at 0 in production.
- Lock down access with a least-privilege queue policy and IAM. Pass queue_policy_json to allow only the specific SNS topic or account that should send, and grant consumers only ReceiveMessage/DeleteMessage/GetQueueAttributes on the exact queue_arn, never sqs:* on *.
- Name consistently and let the module own the .fifo suffix. Use service-purpose naming (e.g. orders-processing) and set fifo_queue = true rather than hand-typing .fifo; the module derives both the queue and -dlq names so environments stay predictable and the validation guard catches mistakes.
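Several of these practices can be combined in one module call, roughly as sketched below. The queue name, the 60-second worker timeout, and the `alias/thumbnail-jobs` key alias are hypothetical; the alias is assumed to point at an existing customer-managed key.

```hcl
locals {
  # Hypothetical worst-case runtime of the consumer Lambda, in seconds.
  worker_timeout_seconds = 60
}

module "thumbnails_queue" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-sqs?ref=v1.0.0"

  name = "thumbnail-jobs" # service-purpose naming, no .fifo suffix

  # 6 x 60 = 360 seconds, following the 6x rule for Lambda consumers.
  visibility_timeout_seconds = 6 * local.worker_timeout_seconds

  # Long polling to cut empty receives.
  receive_wait_time_seconds = 20

  # SSE-KMS with a longer data-key reuse window than the 300 s default.
  kms_master_key_id                 = "alias/thumbnail-jobs" # hypothetical alias
  kms_data_key_reuse_period_seconds = 3600

  # DLQ stays on (the default); three receives before a message is parked.
  max_receive_count = 3
}
```

Deriving the visibility timeout from a single local keeps the queue and the function timeout in step when one of them changes.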