Terraform Module: AWS SQS Queue — Production-Ready Queues with DLQ, Encryption & Redrive

Quick take: a reusable Terraform module for AWS SQS (hashicorp/aws ~> 5.0) that wires up FIFO or standard queues, a dead-letter queue with a redrive policy, encryption at rest (SSE-KMS or SSE-SQS), and an optional resource-based queue policy. New here? Jump to the Quickstart below to deploy it in minutes; read on for how it works and when to reach for it.

Quickstart (copy-paste)

Minimal, runnable configuration — drop this in a .tf file and fill in the "..." placeholders (each required input is commented):

provider "aws" {
  region = "us-east-1"
}

module "sqs" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-sqs?ref=v1.0.0"

  name = "..."  # Base queue name (no `.fifo` suffix; 1-71 chars, alphanumerics/`-`/`_`)
}

Then terraform init && terraform apply. Every other input has a sensible default — see Inputs below to override behaviour.

What this module is

Amazon SQS (Simple Queue Service) is a fully managed message queue that decouples producers from consumers so that a downstream service can fail, slow down, or scale independently without dropping work. A single aws_sqs_queue resource looks trivial, but a production queue is never just one resource: you almost always also need a dead-letter queue (DLQ) to catch poison messages, a redrive policy that ties the two together, server-side encryption (SSE-KMS or SSE-SQS), and a queue policy that grants access only to the specific principals that should be allowed to send or receive.

This module wraps all of that into one opinionated, var-driven unit. You pass a name and a few knobs, and it gives you back a correctly-configured primary queue, an optional companion DLQ with a sane maxReceiveCount, encryption-at-rest, and the queue ARN/URL outputs that every consumer Lambda, ECS task, or IAM policy downstream needs. It handles the FIFO-vs-standard quirks (the .fifo suffix, content-based deduplication, throughput limits) so that callers do not have to remember them every time they stand up a new queue.

When to use it

Reach for this module when you want a queue that is safe to run in production rather than a throwaway demo queue:

Asynchronous work offloading — an API accepts a request, drops a message on the queue, and returns 202 immediately while a worker processes it later.
Buffering and backpressure — smoothing out spiky traffic so a downstream database or third-party API is not overwhelmed.
Event fan-out with SNS — SNS publishes to multiple SQS queues, each owned by a different consumer service (the classic “topic-queue” pattern).
FIFO ordering and exactly-once processing — order-sensitive workflows (payments, inventory) where you need strict ordering within a message group and deduplication.
Standardising poison-message handling — every team gets a DLQ and redrive policy by default, instead of discovering at 2 a.m. that a malformed message is being retried forever.

If you only need an ephemeral, unencrypted queue for a local experiment, the raw resource is fine. The moment a queue carries real data across team or account boundaries, use the module so encryption, DLQ, and access policy are not forgotten.

Module structure

terraform-module-aws-sqs/
├── versions.tf      # provider + Terraform version constraints
├── main.tf          # primary queue, DLQ, redrive, queue policy
├── variables.tf     # all tunable inputs with validation
└── outputs.tf       # ids, ARNs, URLs for the queue and DLQ

versions.tf

terraform {
  required_version = ">= 1.5.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

main.tf

locals {
  # FIFO queues must end in ".fifo"; standard queues must not.
  queue_name = var.fifo_queue ? "${var.name}.fifo" : var.name
  dlq_name   = var.fifo_queue ? "${var.name}-dlq.fifo" : "${var.name}-dlq"

  # Pick SSE-KMS when a key is provided, otherwise fall back to SSE-SQS
  # (managed) encryption when sse_enabled is true.
  use_kms = var.kms_master_key_id != null
}

# Optional dead-letter queue. Created only when create_dlq = true.
resource "aws_sqs_queue" "dlq" {
  count = var.create_dlq ? 1 : 0

  name                        = local.dlq_name
  fifo_queue                  = var.fifo_queue
  content_based_deduplication = var.fifo_queue ? var.content_based_deduplication : null

  # Keep failed messages long enough to investigate (default 14 days).
  message_retention_seconds = var.dlq_message_retention_seconds

  sqs_managed_sse_enabled           = local.use_kms ? null : var.sse_enabled
  kms_master_key_id                 = local.use_kms ? var.kms_master_key_id : null
  kms_data_key_reuse_period_seconds = local.use_kms ? var.kms_data_key_reuse_period_seconds : null

  tags = merge(var.tags, { "queue-role" = "dead-letter" })
}

# Primary queue.
resource "aws_sqs_queue" "this" {
  name                        = local.queue_name
  fifo_queue                  = var.fifo_queue
  content_based_deduplication = var.fifo_queue ? var.content_based_deduplication : null
  deduplication_scope         = var.fifo_queue ? var.deduplication_scope : null
  fifo_throughput_limit       = var.fifo_queue ? var.fifo_throughput_limit : null

  visibility_timeout_seconds = var.visibility_timeout_seconds
  message_retention_seconds  = var.message_retention_seconds
  delay_seconds              = var.delay_seconds
  max_message_size           = var.max_message_size
  receive_wait_time_seconds  = var.receive_wait_time_seconds

  # Encryption at rest: SSE-KMS when a key is supplied, else SSE-SQS.
  sqs_managed_sse_enabled           = local.use_kms ? null : var.sse_enabled
  kms_master_key_id                 = local.use_kms ? var.kms_master_key_id : null
  kms_data_key_reuse_period_seconds = local.use_kms ? var.kms_data_key_reuse_period_seconds : null

  # Wire the DLQ in via a redrive policy when the DLQ exists.
  redrive_policy = var.create_dlq ? jsonencode({
    deadLetterTargetArn = aws_sqs_queue.dlq[0].arn
    maxReceiveCount     = var.max_receive_count
  }) : null

  tags = merge(var.tags, { "queue-role" = "primary" })
}

# Allow the DLQ to accept redriven messages only from this source queue.
resource "aws_sqs_queue_redrive_allow_policy" "dlq" {
  count = var.create_dlq ? 1 : 0

  queue_url = aws_sqs_queue.dlq[0].id

  redrive_allow_policy = jsonencode({
    redrivePermission = "byQueue"
    sourceQueueArns   = [aws_sqs_queue.this.arn]
  })
}

# Optional resource-based access policy (e.g. let an SNS topic publish).
resource "aws_sqs_queue_policy" "this" {
  count = var.queue_policy_json != null ? 1 : 0

  queue_url = aws_sqs_queue.this.id
  policy    = var.queue_policy_json
}

variables.tf

variable "name" {
  description = "Base name of the queue. The module appends '.fifo' automatically for FIFO queues; do not include the suffix yourself."
  type        = string

  validation {
    condition     = !endswith(var.name, ".fifo")
    error_message = "Do not include the '.fifo' suffix in var.name; set fifo_queue = true instead."
  }

  validation {
    condition     = can(regex("^[A-Za-z0-9_-]{1,71}$", var.name))
    error_message = "Queue name must be 1-71 chars (SQS allows 80, and the module reserves 9 for the '-dlq.fifo' suffix) using only alphanumerics, hyphens, and underscores."
  }
}

variable "fifo_queue" {
  description = "If true, create FIFO queues (ordered, exactly-once). If false, create standard queues."
  type        = bool
  default     = false
}

variable "content_based_deduplication" {
  description = "FIFO only. Enable content-based deduplication so SQS derives the dedup ID from the message body."
  type        = bool
  default     = false
}

variable "deduplication_scope" {
  description = "FIFO only. Scope of deduplication: 'messageGroup' or 'queue'. Required as 'messageGroup' for high throughput FIFO."
  type        = string
  default     = "queue"

  validation {
    condition     = contains(["messageGroup", "queue"], var.deduplication_scope)
    error_message = "deduplication_scope must be either 'messageGroup' or 'queue'."
  }
}

variable "fifo_throughput_limit" {
  description = "FIFO only. Throughput quota scope: 'perQueue' or 'perMessageGroupId' (high throughput FIFO)."
  type        = string
  default     = "perQueue"

  validation {
    condition     = contains(["perQueue", "perMessageGroupId"], var.fifo_throughput_limit)
    error_message = "fifo_throughput_limit must be 'perQueue' or 'perMessageGroupId'."
  }
}

variable "visibility_timeout_seconds" {
  description = "How long a message stays invisible after a consumer receives it (0-43200). Set higher than your worker's max processing time."
  type        = number
  default     = 30

  validation {
    condition     = var.visibility_timeout_seconds >= 0 && var.visibility_timeout_seconds <= 43200
    error_message = "visibility_timeout_seconds must be between 0 and 43200 (12 hours)."
  }
}

variable "message_retention_seconds" {
  description = "How long SQS keeps a message that is not deleted (60-1209600). Default 4 days."
  type        = number
  default     = 345600

  validation {
    condition     = var.message_retention_seconds >= 60 && var.message_retention_seconds <= 1209600
    error_message = "message_retention_seconds must be between 60 and 1209600 (14 days)."
  }
}

variable "delay_seconds" {
  description = "Delay before a message becomes available for delivery (0-900)."
  type        = number
  default     = 0

  validation {
    condition     = var.delay_seconds >= 0 && var.delay_seconds <= 900
    error_message = "delay_seconds must be between 0 and 900 (15 minutes)."
  }
}

variable "max_message_size" {
  description = "Maximum message size in bytes (1024-262144). Default 256 KiB."
  type        = number
  default     = 262144

  validation {
    condition     = var.max_message_size >= 1024 && var.max_message_size <= 262144
    error_message = "max_message_size must be between 1024 and 262144 bytes (256 KiB)."
  }
}

variable "receive_wait_time_seconds" {
  description = "Long-poll wait time (0-20). Set to 20 to reduce empty receives and API cost."
  type        = number
  default     = 0

  validation {
    condition     = var.receive_wait_time_seconds >= 0 && var.receive_wait_time_seconds <= 20
    error_message = "receive_wait_time_seconds must be between 0 and 20."
  }
}

variable "sse_enabled" {
  description = "Enable SQS-managed (SSE-SQS) encryption at rest. Ignored when kms_master_key_id is set (SSE-KMS takes over)."
  type        = bool
  default     = true
}

variable "kms_master_key_id" {
  description = "KMS key id, alias (e.g. 'alias/my-key'), or ARN for SSE-KMS encryption. When set, SSE-KMS is used instead of SSE-SQS."
  type        = string
  default     = null
}

variable "kms_data_key_reuse_period_seconds" {
  description = "Seconds SQS can reuse a KMS data key before calling KMS again (60-86400). Higher = fewer KMS calls (lower cost)."
  type        = number
  default     = 300

  validation {
    condition     = var.kms_data_key_reuse_period_seconds >= 60 && var.kms_data_key_reuse_period_seconds <= 86400
    error_message = "kms_data_key_reuse_period_seconds must be between 60 and 86400 (24 hours)."
  }
}

variable "create_dlq" {
  description = "If true, create a companion dead-letter queue and attach a redrive policy to the primary queue."
  type        = bool
  default     = true
}

variable "max_receive_count" {
  description = "Number of times a message can be received before being moved to the DLQ."
  type        = number
  default     = 5

  validation {
    condition     = var.max_receive_count >= 1 && var.max_receive_count <= 1000
    error_message = "max_receive_count must be between 1 and 1000."
  }
}

variable "dlq_message_retention_seconds" {
  description = "Retention for the dead-letter queue (60-1209600). Default 14 days to maximise investigation time."
  type        = number
  default     = 1209600

  validation {
    condition     = var.dlq_message_retention_seconds >= 60 && var.dlq_message_retention_seconds <= 1209600
    error_message = "dlq_message_retention_seconds must be between 60 and 1209600 (14 days)."
  }
}

variable "queue_policy_json" {
  description = "Optional JSON resource policy attached to the primary queue (e.g. to allow an SNS topic to send). Null to skip."
  type        = string
  default     = null
}

variable "tags" {
  description = "Tags applied to all queues created by this module."
  type        = map(string)
  default     = {}
}
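
The validations above each check a single input, but one SQS rule spans two of them: per the SQS API, fifo_throughput_limit = "perMessageGroupId" is only accepted when deduplication_scope = "messageGroup". On Terraform versions before 1.9 a variable validation block can only reference its own variable, so a precondition is one way to catch the mismatch at plan time. The following is a minimal sketch of such a guard, not part of the module as published; the terraform_data resource and the name fifo_guard are illustrative assumptions.

```hcl
# Hypothetical addition to main.tf (not in the published module).
# terraform_data requires Terraform >= 1.4, which the module's
# ">= 1.5.0" constraint already satisfies.
resource "terraform_data" "fifo_guard" {
  lifecycle {
    precondition {
      # Passes for standard queues, and for FIFO queues unless
      # per-message-group throughput is requested without
      # message-group deduplication scope.
      condition = (
        !var.fifo_queue ||
        var.fifo_throughput_limit != "perMessageGroupId" ||
        var.deduplication_scope == "messageGroup"
      )
      error_message = "fifo_throughput_limit = \"perMessageGroupId\" requires deduplication_scope = \"messageGroup\"."
    }
  }
}
```

With a guard like this, a caller who sets only one of the two high-throughput inputs gets a readable plan-time error instead of a rejected API call during apply.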

outputs.tf

output "queue_id" {
  description = "The URL of the primary SQS queue (used by the SDK/CLI as QueueUrl)."
  value       = aws_sqs_queue.this.id
}

output "queue_url" {
  description = "The URL of the primary SQS queue (alias of queue_id, for readability)."
  value       = aws_sqs_queue.this.url
}

output "queue_arn" {
  description = "The ARN of the primary SQS queue (use in IAM policies and SNS subscriptions)."
  value       = aws_sqs_queue.this.arn
}

output "queue_name" {
  description = "The resolved name of the primary queue (including any '.fifo' suffix)."
  value       = aws_sqs_queue.this.name
}

output "dlq_id" {
  description = "The URL of the dead-letter queue, or null when create_dlq is false."
  value       = try(aws_sqs_queue.dlq[0].id, null)
}

output "dlq_arn" {
  description = "The ARN of the dead-letter queue, or null when create_dlq is false."
  value       = try(aws_sqs_queue.dlq[0].arn, null)
}

How to use it

module "orders_queue" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-sqs?ref=v1.0.0"

  name                        = "orders-processing"
  fifo_queue                  = true
  content_based_deduplication = true
  deduplication_scope         = "messageGroup"
  fifo_throughput_limit       = "perMessageGroupId"

  # Workers can take up to 5 minutes (300s). For Lambda event source
  # mappings, keep messages invisible for at least 6x the function timeout.
  visibility_timeout_seconds = 1800
  receive_wait_time_seconds  = 20 # long polling

  # Encrypt with a customer-managed key.
  kms_master_key_id = aws_kms_key.orders.arn

  # Poison messages go to the DLQ after 3 failed attempts.
  create_dlq        = true
  max_receive_count = 3

  tags = {
    Environment = "prod"
    Team        = "fulfilment"
    CostCentre  = "ECOM-204"
  }
}

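# Downstream: grant a consumer Lambda permission to read from the queue
# using the module's queue_arn output.
data "aws_iam_policy_document" "consumer" {
  statement {
    sid    = "ConsumeOrders"
    effect = "Allow"
    actions = [
      "sqs:ReceiveMessage",
      "sqs:DeleteMessage",
      "sqs:GetQueueAttributes",
    ]
    resources = [module.orders_queue.queue_arn]
  }

  # The queue is encrypted with a customer-managed KMS key, so the
  # consumer also needs permission to decrypt received messages.
  statement {
    sid       = "DecryptOrders"
    effect    = "Allow"
    actions   = ["kms:Decrypt"]
    resources = [aws_kms_key.orders.arn]
  }
}

# Downstream: an event source mapping wiring the queue to a Lambda,
# referencing the module's queue_arn output.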
resource "aws_lambda_event_source_mapping" "orders" {
  event_source_arn = module.orders_queue.queue_arn
  function_name    = aws_lambda_function.order_worker.arn
  batch_size       = 10
}
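
The consumer policy document above only takes effect once it is attached to the Lambda's execution role. A minimal sketch of that attachment follows; aws_iam_role.order_worker and the policy name are assumptions for illustration and would be defined elsewhere in your configuration.

```hcl
# Assumption: aws_iam_role.order_worker is the execution role of
# aws_lambda_function.order_worker, defined elsewhere.
resource "aws_iam_role_policy" "order_worker_consume" {
  name   = "consume-orders-queue" # hypothetical policy name
  role   = aws_iam_role.order_worker.id
  policy = data.aws_iam_policy_document.consumer.json
}
```

An inline role policy keeps the permission scoped to this one worker; a managed policy attachment would work equally well if several roles need the same access.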

With Terragrunt

Terragrunt keeps this module DRY across environments — define the backend and provider once in a root config, then a thin terragrunt.hcl per environment supplies only the inputs that differ.

1. Root config — live/terragrunt.hcl (inherited by every module):

remote_state {
  backend = "s3"
  generate = { path = "backend.tf", if_exists = "overwrite" }
  config = {
    # ...s3 state bucket/container + key per path...
  }
}

2. Module config — live/prod/sqs/terragrunt.hcl:

include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-sqs?ref=v1.0.0"
}

inputs = {
  name = "..."
}

3. Deploy a single module, or roll out every module in an environment together (run either command from the repository root):

(cd live/prod/sqs && terragrunt apply)        # this module only
(cd live/prod && terragrunt run-all apply)    # every module under live/prod

Why Terragrunt here: the backend and provider live in one place instead of being copy-pasted into every module; inputs is overridden per environment (dev / stage / prod) without forking the module; and run-all orchestrates dependencies across modules. Reach for it once you have more than one environment or more than a handful of modules — for a single stack, the plain Quickstart above is enough.
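
To make the per-environment override concrete, here is a sketch of what a dev counterpart to the prod config might look like. The path live/dev/sqs/terragrunt.hcl and every input value are illustrative assumptions; only the source URL and input names come from the module.

```hcl
# live/dev/sqs/terragrunt.hcl (hypothetical path and values)
include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-sqs?ref=v1.0.0"
}

inputs = {
  name                      = "orders-processing-dev"
  message_retention_seconds = 86400 # 1 day is plenty for dev traffic
  max_receive_count         = 2     # fail fast into the DLQ while debugging

  tags = {
    Environment = "dev"
  }
}
```

Everything not listed under inputs falls back to the module defaults, so the file stays limited to what actually differs from other environments.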

Inputs

Name	Type	Default	Required	Description
`name`	`string`	-	Yes	Base queue name (no `.fifo` suffix; 1-71 chars so the `-dlq.fifo` suffix fits within the SQS 80-char limit; alphanumerics/`-`/`_`).
`fifo_queue`	`bool`	`false`	No	Create FIFO (ordered, exactly-once) queues instead of standard.
`content_based_deduplication`	`bool`	`false`	No	FIFO only. Derive the dedup ID from the message body.
`deduplication_scope`	`string`	`"queue"`	No	FIFO only. `messageGroup` or `queue`.
`fifo_throughput_limit`	`string`	`"perQueue"`	No	FIFO only. `perQueue` or `perMessageGroupId` (high throughput).
`visibility_timeout_seconds`	`number`	`30`	No	Invisibility window after receive (0-43200).
`message_retention_seconds`	`number`	`345600`	No	Retention for the primary queue (60-1209600).
`delay_seconds`	`number`	`0`	No	Delivery delay for new messages (0-900).
`max_message_size`	`number`	`262144`	No	Max message size in bytes (1024-262144).
`receive_wait_time_seconds`	`number`	`0`	No	Long-poll wait time (0-20).
`sse_enabled`	`bool`	`true`	No	Enable SSE-SQS managed encryption (ignored when `kms_master_key_id` set).
`kms_master_key_id`	`string`	`null`	No	KMS key id/alias/ARN; switches encryption to SSE-KMS.
`kms_data_key_reuse_period_seconds`	`number`	`300`	No	KMS data-key reuse window (60-86400).
`create_dlq`	`bool`	`true`	No	Create a companion DLQ and attach the redrive policy.
`max_receive_count`	`number`	`5`	No	Receives before a message is moved to the DLQ (1-1000).
`dlq_message_retention_seconds`	`number`	`1209600`	No	Retention for the DLQ (60-1209600).
`queue_policy_json`	`string`	`null`	No	Optional JSON resource policy for the primary queue.
`tags`	`map(string)`	`{}`	No	Tags applied to all queues.

Outputs

Name	Description
`queue_id`	URL of the primary queue (the SDK/CLI `QueueUrl`).
`queue_url`	URL of the primary queue (alias of `queue_id`).
`queue_arn`	ARN of the primary queue, for IAM policies and SNS subscriptions.
`queue_name`	Resolved primary queue name including any `.fifo` suffix.
`dlq_id`	URL of the dead-letter queue, or `null` when `create_dlq = false`.
`dlq_arn`	ARN of the dead-letter queue, or `null` when `create_dlq = false`.
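
Producers that call SendMessage through an SDK need the queue URL rather than the ARN. One way to hand it to them is to publish the queue_url output to SSM Parameter Store, sketched below. The module label orders_queue matches the usage example earlier; the parameter path is a hypothetical name.

```hcl
# Publishes the primary queue URL so producer services can look it up
# at deploy time or start-up.
resource "aws_ssm_parameter" "orders_queue_url" {
  name  = "/orders/queue-url" # hypothetical parameter path
  type  = "String"
  value = module.orders_queue.queue_url
}
```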

Enterprise scenario

A retail platform’s checkout service publishes every confirmed order to an orders-processing.fifo queue created by this module, with deduplication_scope = "messageGroup" and fifo_throughput_limit = "perMessageGroupId" so that orders for the same customer stay strictly ordered while unrelated customers process in parallel at high throughput. The fulfilment Lambda consumes via an event source mapping; if the warehouse API is down and a message fails three times (max_receive_count = 3), it lands on the auto-provisioned DLQ where it is retained for 14 days. A CloudWatch alarm on the DLQ’s ApproximateNumberOfMessagesVisible pages on-call, and once the warehouse API recovers, operators use SQS redrive to replay the parked orders back onto the primary queue with no data loss.
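
The DLQ alarm in this scenario could be sketched as follows. The module does not output the DLQ name, so the example derives it from dlq_arn; aws_sns_topic.oncall and the alarm name are assumptions, and the sketch presumes create_dlq = true (otherwise dlq_arn is null).

```hcl
# Assumption: aws_sns_topic.oncall is an existing topic that pages on-call.
resource "aws_cloudwatch_metric_alarm" "orders_dlq_depth" {
  alarm_name          = "orders-processing-dlq-not-empty" # hypothetical name
  namespace           = "AWS/SQS"
  metric_name         = "ApproximateNumberOfMessagesVisible"
  statistic           = "Maximum"
  period              = 60
  evaluation_periods  = 1
  comparison_operator = "GreaterThanThreshold"
  threshold           = 0
  treat_missing_data  = "notBreaching"

  dimensions = {
    # An SQS ARN has six colon-separated fields; the last one (index 5)
    # is the queue name that CloudWatch expects.
    QueueName = element(split(":", module.orders_queue.dlq_arn), 5)
  }

  alarm_actions = [aws_sns_topic.oncall.arn]
}
```

A threshold of zero pages on the first parked message, which suits an order pipeline; a noisier queue may want a higher threshold or a longer evaluation window.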

Best practices

Always ship a DLQ and tune maxReceiveCount. Leave create_dlq = true and pick a max_receive_count that matches how many transient failures you expect; too low loses recoverable messages, too high keeps retrying poison messages and burns money. Alarm on DLQ depth, not just on errors.
Set visibility_timeout_seconds above your worker’s worst-case processing time. If a consumer takes longer than the timeout, SQS redelivers the message and you get duplicate processing. For Lambda event source mappings, AWS recommends a visibility timeout of at least six times the function timeout.
Prefer SSE-KMS with a customer-managed key for sensitive payloads, and raise kms_data_key_reuse_period_seconds on busy queues. The default is 300s (5 minutes); a longer window (e.g. 3600s, up to the 86400s maximum) cuts KMS API calls and cost on high-volume queues while keeping data encrypted at rest. Use plain SSE-SQS only when KMS is overkill.
Use long polling (receive_wait_time_seconds = 20). It reduces empty ReceiveMessage responses, which lowers request count, cost, and latency compared with short polling — there is rarely a reason to leave it at 0 in production.
Lock down access with a least-privilege queue policy and IAM. Pass queue_policy_json to allow only the specific SNS topic or account that should send, and grant consumers only ReceiveMessage/DeleteMessage/GetQueueAttributes on the exact queue_arn — never sqs:* on *.
Name consistently and let the module own the .fifo suffix. Use service-purpose naming (e.g. orders-processing) and set fifo_queue = true rather than hand-typing .fifo; the module derives both the queue and -dlq names so environments stay predictable and the validation guard catches mistakes.
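
As an illustration of the least-privilege queue policy point, here is a sketch that lets exactly one SNS topic send to a standard queue created by the module. Several things are assumptions: var.events_topic_arn is the ARN of a topic that already exists, the queue name order-events is made up, and the ARN is built for the commercial aws partition. The queue ARN is assembled from known values rather than read from the module output because the module uses queue_policy_json in a count expression, and Terraform may reject a count that depends on a value unknown at plan time.

```hcl
variable "events_topic_arn" {
  description = "ARN of an existing SNS topic allowed to publish (assumed input)."
  type        = string
}

data "aws_caller_identity" "current" {}
data "aws_region" "current" {}

locals {
  queue_name = "order-events" # hypothetical; must match the module's name input

  # Standard queue, so the resolved queue name equals the base name.
  queue_arn = "arn:aws:sqs:${data.aws_region.current.name}:${data.aws_caller_identity.current.account_id}:${local.queue_name}"
}

data "aws_iam_policy_document" "sns_publish" {
  statement {
    sid       = "AllowTopicToSend"
    effect    = "Allow"
    actions   = ["sqs:SendMessage"]
    resources = [local.queue_arn]

    principals {
      type        = "Service"
      identifiers = ["sns.amazonaws.com"]
    }

    # Only messages originating from this one topic are accepted.
    condition {
      test     = "ArnEquals"
      variable = "aws:SourceArn"
      values   = [var.events_topic_arn]
    }
  }
}

module "order_events" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-sqs?ref=v1.0.0"

  name              = local.queue_name
  queue_policy_json = data.aws_iam_policy_document.sns_publish.json
}
```

This sketch relies on the default SSE-SQS encryption. With a customer-managed KMS key, the key policy would also have to let the SNS service principal use the key, which is outside what the module manages.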