
Terraform Module: AWS SQS Queue — Production-Ready Queues with DLQ, Encryption & Redrive

Quick take: A reusable Terraform module (hashicorp/aws ~> 5.0) for AWS SQS that wires up FIFO or standard queues, a dead-letter queue with a redrive policy, SSE-KMS or SSE-SQS encryption, and an optional resource-based access policy. New here? Jump to the Quickstart below to deploy it in minutes, or read on for how it works and when to reach for it.

Quickstart (copy-paste)

Minimal, runnable configuration — drop this in a .tf file and fill in the "..." placeholders (each required input is commented):

provider "aws" {
  region = "us-east-1"
}

module "sqs" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-sqs?ref=v1.0.0"

  name = "..."  # Base queue name (no `.fifo` suffix; 1-75 chars, alphanu…
}

Then terraform init && terraform apply. Every other input has a sensible default — see Inputs below to override behaviour.
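To see what was created, the module's outputs can be surfaced from the root configuration. This is a minimal sketch that assumes the module block is named "sqs" as in the Quickstart; the output names sqs_queue_url and sqs_dlq_arn are illustrative, not part of the module.

```hcl
# Root-level outputs that re-export values from the Quickstart module block.
# The output names here are illustrative; the module itself does not define them.
output "sqs_queue_url" {
  description = "URL of the primary queue created by module.sqs."
  value       = module.sqs.queue_url
}

output "sqs_dlq_arn" {
  description = "ARN of the companion DLQ (null when create_dlq = false)."
  value       = module.sqs.dlq_arn
}
```

After the apply finishes, terraform output sqs_queue_url prints the URL that producers and consumers pass as QueueUrl.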

What this module is

Amazon SQS (Simple Queue Service) is a fully managed message queue that decouples producers from consumers so that a downstream service can fail, slow down, or scale independently without dropping work. A single aws_sqs_queue resource looks trivial, but a production queue is never just one resource: you almost always also need a dead-letter queue (DLQ) to catch poison messages, a redrive policy that ties the two together, server-side encryption (SSE-KMS or SSE-SQS), and a queue policy that grants access only to the principals that should be allowed to send or receive.

This module wraps all of that into one opinionated, var-driven unit. You pass a name and a few knobs, and it gives you back a correctly-configured primary queue, an optional companion DLQ with a sane maxReceiveCount, encryption-at-rest, and the queue ARN/URL outputs that every consumer Lambda, ECS task, or IAM policy downstream needs. It handles the FIFO-vs-standard quirks (the .fifo suffix, content-based deduplication, throughput limits) so that callers do not have to remember them every time they stand up a new queue.

When to use it

Reach for this module when you want a queue that is safe to run in production rather than a throwaway demo queue.

If you only need an ephemeral, unencrypted queue for a local experiment, the raw resource is fine. The moment a queue carries real data across team or account boundaries, use the module so encryption, DLQ, and access policy are not forgotten.

Module structure

terraform-module-aws-sqs/
├── versions.tf      # provider + Terraform version constraints
├── main.tf          # primary queue, DLQ, redrive, queue policy
├── variables.tf     # all tunable inputs with validation
└── outputs.tf       # ids, ARNs, URLs for the queue and DLQ

versions.tf

terraform {
  required_version = ">= 1.5.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

main.tf

locals {
  # FIFO queues must end in ".fifo"; standard queues must not.
  queue_name = var.fifo_queue ? "${var.name}.fifo" : var.name
  dlq_name   = var.fifo_queue ? "${var.name}-dlq.fifo" : "${var.name}-dlq"

  # Pick SSE-KMS when a key is provided, otherwise fall back to SSE-SQS
  # (managed) encryption when sse_enabled is true.
  use_kms = var.kms_master_key_id != null
}

# Optional dead-letter queue. Created only when create_dlq = true.
resource "aws_sqs_queue" "dlq" {
  count = var.create_dlq ? 1 : 0

  name                        = local.dlq_name
  fifo_queue                  = var.fifo_queue
  content_based_deduplication = var.fifo_queue ? var.content_based_deduplication : null

  # Keep failed messages long enough to investigate (default 14 days).
  message_retention_seconds = var.dlq_message_retention_seconds

  sqs_managed_sse_enabled           = local.use_kms ? null : var.sse_enabled
  kms_master_key_id                 = local.use_kms ? var.kms_master_key_id : null
  kms_data_key_reuse_period_seconds = local.use_kms ? var.kms_data_key_reuse_period_seconds : null

  tags = merge(var.tags, { "queue-role" = "dead-letter" })
}

# Primary queue.
resource "aws_sqs_queue" "this" {
  name                        = local.queue_name
  fifo_queue                  = var.fifo_queue
  content_based_deduplication = var.fifo_queue ? var.content_based_deduplication : null
  deduplication_scope         = var.fifo_queue ? var.deduplication_scope : null
  fifo_throughput_limit       = var.fifo_queue ? var.fifo_throughput_limit : null

  visibility_timeout_seconds = var.visibility_timeout_seconds
  message_retention_seconds  = var.message_retention_seconds
  delay_seconds              = var.delay_seconds
  max_message_size           = var.max_message_size
  receive_wait_time_seconds  = var.receive_wait_time_seconds

  # Encryption at rest: SSE-KMS when a key is supplied, else SSE-SQS.
  sqs_managed_sse_enabled           = local.use_kms ? null : var.sse_enabled
  kms_master_key_id                 = local.use_kms ? var.kms_master_key_id : null
  kms_data_key_reuse_period_seconds = local.use_kms ? var.kms_data_key_reuse_period_seconds : null

  # Wire the DLQ in via a redrive policy when the DLQ exists.
  redrive_policy = var.create_dlq ? jsonencode({
    deadLetterTargetArn = aws_sqs_queue.dlq[0].arn
    maxReceiveCount     = var.max_receive_count
  }) : null

  tags = merge(var.tags, { "queue-role" = "primary" })
}

# Allow the DLQ to accept redriven messages only from this source queue.
resource "aws_sqs_queue_redrive_allow_policy" "dlq" {
  count = var.create_dlq ? 1 : 0

  queue_url = aws_sqs_queue.dlq[0].id

  redrive_allow_policy = jsonencode({
    redrivePermission = "byQueue"
    sourceQueueArns   = [aws_sqs_queue.this.arn]
  })
}

# Optional resource-based access policy (e.g. let an SNS topic publish).
resource "aws_sqs_queue_policy" "this" {
  count = var.queue_policy_json != null ? 1 : 0

  queue_url = aws_sqs_queue.this.id
  policy    = var.queue_policy_json
}

variables.tf

variable "name" {
  description = "Base name of the queue. The module appends '.fifo' automatically for FIFO queues; do not include the suffix yourself."
  type        = string

  validation {
    condition     = !endswith(var.name, ".fifo")
    error_message = "Do not include the '.fifo' suffix in var.name; set fifo_queue = true instead."
  }

  validation {
    condition     = can(regex("^[A-Za-z0-9_-]{1,75}$", var.name))
    error_message = "Queue name must be 1-75 chars using only alphanumerics, hyphens, and underscores. SQS caps the final name at 80 chars, so keep it to 71 or fewer when fifo_queue and create_dlq are both true (the '-dlq.fifo' suffix adds 9)."
  }
}

variable "fifo_queue" {
  description = "If true, create FIFO queues (ordered, exactly-once). If false, create standard queues."
  type        = bool
  default     = false
}

variable "content_based_deduplication" {
  description = "FIFO only. Enable content-based deduplication so SQS derives the dedup ID from the message body."
  type        = bool
  default     = false
}

variable "deduplication_scope" {
  description = "FIFO only. Scope of deduplication: 'messageGroup' or 'queue'. Must be 'messageGroup' for high-throughput FIFO."
  type        = string
  default     = "queue"

  validation {
    condition     = contains(["messageGroup", "queue"], var.deduplication_scope)
    error_message = "deduplication_scope must be either 'messageGroup' or 'queue'."
  }
}

variable "fifo_throughput_limit" {
  description = "FIFO only. Throughput quota scope: 'perQueue' or 'perMessageGroupId' (high throughput FIFO)."
  type        = string
  default     = "perQueue"

  validation {
    condition     = contains(["perQueue", "perMessageGroupId"], var.fifo_throughput_limit)
    error_message = "fifo_throughput_limit must be 'perQueue' or 'perMessageGroupId'."
  }
}

variable "visibility_timeout_seconds" {
  description = "How long a message stays invisible after a consumer receives it (0-43200). Set higher than your worker's max processing time."
  type        = number
  default     = 30

  validation {
    condition     = var.visibility_timeout_seconds >= 0 && var.visibility_timeout_seconds <= 43200
    error_message = "visibility_timeout_seconds must be between 0 and 43200 (12 hours)."
  }
}

variable "message_retention_seconds" {
  description = "How long SQS keeps a message that is not deleted (60-1209600). Default 4 days."
  type        = number
  default     = 345600

  validation {
    condition     = var.message_retention_seconds >= 60 && var.message_retention_seconds <= 1209600
    error_message = "message_retention_seconds must be between 60 and 1209600 (14 days)."
  }
}

variable "delay_seconds" {
  description = "Delay before a message becomes available for delivery (0-900)."
  type        = number
  default     = 0

  validation {
    condition     = var.delay_seconds >= 0 && var.delay_seconds <= 900
    error_message = "delay_seconds must be between 0 and 900 (15 minutes)."
  }
}

variable "max_message_size" {
  description = "Maximum message size in bytes (1024-262144). Default 256 KiB."
  type        = number
  default     = 262144

  validation {
    condition     = var.max_message_size >= 1024 && var.max_message_size <= 262144
    error_message = "max_message_size must be between 1024 and 262144 bytes (256 KiB)."
  }
}

variable "receive_wait_time_seconds" {
  description = "Long-poll wait time (0-20). Set to 20 to reduce empty receives and API cost."
  type        = number
  default     = 0

  validation {
    condition     = var.receive_wait_time_seconds >= 0 && var.receive_wait_time_seconds <= 20
    error_message = "receive_wait_time_seconds must be between 0 and 20."
  }
}

variable "sse_enabled" {
  description = "Enable SQS-managed (SSE-SQS) encryption at rest. Ignored when kms_master_key_id is set (SSE-KMS takes over)."
  type        = bool
  default     = true
}

variable "kms_master_key_id" {
  description = "KMS key id, alias (e.g. 'alias/my-key'), or ARN for SSE-KMS encryption. When set, SSE-KMS is used instead of SSE-SQS."
  type        = string
  default     = null
}

variable "kms_data_key_reuse_period_seconds" {
  description = "Seconds SQS can reuse a KMS data key before calling KMS again (60-86400). Higher = fewer KMS calls (lower cost)."
  type        = number
  default     = 300

  validation {
    condition     = var.kms_data_key_reuse_period_seconds >= 60 && var.kms_data_key_reuse_period_seconds <= 86400
    error_message = "kms_data_key_reuse_period_seconds must be between 60 and 86400 (24 hours)."
  }
}

variable "create_dlq" {
  description = "If true, create a companion dead-letter queue and attach a redrive policy to the primary queue."
  type        = bool
  default     = true
}

variable "max_receive_count" {
  description = "Number of times a message can be received before being moved to the DLQ."
  type        = number
  default     = 5

  validation {
    condition     = var.max_receive_count >= 1 && var.max_receive_count <= 1000
    error_message = "max_receive_count must be between 1 and 1000."
  }
}

variable "dlq_message_retention_seconds" {
  description = "Retention for the dead-letter queue (60-1209600). Default 14 days to maximise investigation time."
  type        = number
  default     = 1209600

  validation {
    condition     = var.dlq_message_retention_seconds >= 60 && var.dlq_message_retention_seconds <= 1209600
    error_message = "dlq_message_retention_seconds must be between 60 and 1209600 (14 days)."
  }
}

variable "queue_policy_json" {
  description = "Optional JSON resource policy attached to the primary queue (e.g. to allow an SNS topic to send). Null to skip."
  type        = string
  default     = null
}

variable "tags" {
  description = "Tags applied to all queues created by this module."
  type        = map(string)
  default     = {}
}
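One gap is worth guarding against. The name validation accepts up to 75 characters, but SQS caps queue names at 80 characters including the .fifo suffix, so a long base name combined with the 9-character "-dlq.fifo" suffix can overshoot. A variable validation cannot see fifo_queue on the Terraform versions this module supports (>= 1.5.0), so the sketch below uses a resource precondition instead. It is a standalone illustration, not code from the module: terraform_data.dlq_name_guard is a hypothetical name, and the defaults exist only so the snippet plans on its own.

```hcl
terraform {
  required_version = ">= 1.5.0"
}

variable "name" {
  type    = string
  default = "orders-processing" # illustrative default
}

variable "fifo_queue" {
  type    = bool
  default = true # illustrative default
}

locals {
  # Same naming rule as the module's main.tf.
  dlq_name = var.fifo_queue ? "${var.name}-dlq.fifo" : "${var.name}-dlq"
}

# terraform_data is a built-in, provider-free resource used here only as a
# place to hang the check. Inside the module, the same precondition could sit
# in the lifecycle block of aws_sqs_queue.dlq instead.
resource "terraform_data" "dlq_name_guard" {
  lifecycle {
    precondition {
      condition     = length(local.dlq_name) <= 80
      error_message = "The derived DLQ name exceeds the 80-character SQS limit; shorten var.name."
    }
  }
}
```

With this in place, an over-long name fails at plan time with a readable message rather than surfacing later as an error from the SQS API during apply.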

outputs.tf

output "queue_id" {
  description = "The URL of the primary SQS queue (used by the SDK/CLI as QueueUrl)."
  value       = aws_sqs_queue.this.id
}

output "queue_url" {
  description = "The URL of the primary SQS queue (alias of queue_id, for readability)."
  value       = aws_sqs_queue.this.url
}

output "queue_arn" {
  description = "The ARN of the primary SQS queue (use in IAM policies and SNS subscriptions)."
  value       = aws_sqs_queue.this.arn
}

output "queue_name" {
  description = "The resolved name of the primary queue (including any '.fifo' suffix)."
  value       = aws_sqs_queue.this.name
}

output "dlq_id" {
  description = "The URL of the dead-letter queue, or null when create_dlq is false."
  value       = try(aws_sqs_queue.dlq[0].id, null)
}

output "dlq_arn" {
  description = "The ARN of the dead-letter queue, or null when create_dlq is false."
  value       = try(aws_sqs_queue.dlq[0].arn, null)
}

How to use it

module "orders_queue" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-sqs?ref=v1.0.0"

  name                        = "orders-processing"
  fifo_queue                  = true
  content_based_deduplication = true
  deduplication_scope         = "messageGroup"
  fifo_throughput_limit       = "perMessageGroupId"

  # Workers can take up to 5 minutes; keep messages invisible for 6.
  visibility_timeout_seconds = 360
  receive_wait_time_seconds  = 20 # long polling

  # Encrypt with a customer-managed key.
  kms_master_key_id = aws_kms_key.orders.arn

  # Poison messages go to the DLQ after 3 failed attempts.
  create_dlq        = true
  max_receive_count = 3

  tags = {
    Environment = "prod"
    Team        = "fulfilment"
    CostCentre  = "ECOM-204"
  }
}

# Downstream: grant a consumer Lambda permission to read from the queue
# using the module's queue_arn output.
data "aws_iam_policy_document" "consumer" {
  statement {
    sid    = "ConsumeOrders"
    effect = "Allow"
    actions = [
      "sqs:ReceiveMessage",
      "sqs:DeleteMessage",
      "sqs:GetQueueAttributes",
    ]
    resources = [module.orders_queue.queue_arn]
  }
}

# Downstream: an event source mapping wiring the queue to a Lambda,
# referencing the queue URL/ARN outputs.
resource "aws_lambda_event_source_mapping" "orders" {
  event_source_arn = module.orders_queue.queue_arn
  function_name    = aws_lambda_function.order_worker.arn
  batch_size       = 10
}
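Because the example above sets kms_master_key_id, the queue uses SSE-KMS, and SQS permissions alone are not enough for the consumer: its role also needs kms:Decrypt on the key to read messages. A minimal sketch follows, assuming a hypothetical execution role named aws_iam_role.order_worker (not part of the original example) and the aws_kms_key.orders key referenced above.

```hcl
# Assumption: aws_iam_role.order_worker is the consumer Lambda's execution
# role. It is a hypothetical name, defined elsewhere in your configuration.
data "aws_iam_policy_document" "consumer_kms" {
  statement {
    sid       = "DecryptOrders"
    effect    = "Allow"
    actions   = ["kms:Decrypt"]
    resources = [aws_kms_key.orders.arn]
  }
}

# Attach the KMS permission as an inline policy on the consumer's role.
resource "aws_iam_role_policy" "consumer_kms" {
  name   = "orders-consumer-kms"
  role   = aws_iam_role.order_worker.id
  policy = data.aws_iam_policy_document.consumer_kms.json
}
```

Producers that send to an SSE-KMS queue typically need kms:GenerateDataKey and kms:Decrypt on the same key.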

With Terragrunt

Terragrunt keeps this module DRY across environments — define the backend and provider once in a root config, then a thin terragrunt.hcl per environment supplies only the inputs that differ.

1. Root config live/terragrunt.hcl (inherited by every module):

remote_state {
  backend = "s3"
  generate = { path = "backend.tf", if_exists = "overwrite" }
  config = {
    # ...s3 state bucket/container + key per path...
  }
}

2. Module config live/prod/sqs/terragrunt.hcl:

include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-sqs?ref=v1.0.0"
}

inputs = {
  name = "..."
}

3. Deploy one environment, or roll out all modules together:

# Run either command from the repository root:
cd live/prod/sqs && terragrunt apply       # this module only
cd live/prod && terragrunt run-all apply   # every module under live/prod

Why Terragrunt here: the backend and provider live in one place instead of being copy-pasted into every module; inputs is overridden per environment (dev / stage / prod) without forking the module; and run-all orchestrates dependencies across modules. Reach for it once you have more than one environment or more than a handful of modules — for a single stack, the plain Quickstart above is enough.
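As an illustration of the per-environment override, a dev counterpart to the prod config might look like the sketch below. The path live/dev/sqs/terragrunt.hcl and every input value are assumptions chosen for the example; only the input names come from this module.

```hcl
# live/dev/sqs/terragrunt.hcl (hypothetical path and values)
include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-sqs?ref=v1.0.0"
}

inputs = {
  name = "orders-processing-dev"

  # Dev keeps messages for one day and tolerates more retries before the DLQ.
  message_retention_seconds = 86400
  max_receive_count         = 10

  tags = {
    Environment = "dev"
  }
}
```

Only the inputs block differs between environments; the module source and the root include stay identical.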

Inputs

| Name | Type | Default | Required | Description |
| --- | --- | --- | --- | --- |
| name | string | (none) | Yes | Base queue name (no .fifo suffix; 1-75 chars, alphanumerics/-/_). |
| fifo_queue | bool | false | No | Create FIFO (ordered, exactly-once) queues instead of standard. |
| content_based_deduplication | bool | false | No | FIFO only. Derive the dedup ID from the message body. |
| deduplication_scope | string | "queue" | No | FIFO only. messageGroup or queue. |
| fifo_throughput_limit | string | "perQueue" | No | FIFO only. perQueue or perMessageGroupId (high throughput). |
| visibility_timeout_seconds | number | 30 | No | Invisibility window after receive (0-43200). |
| message_retention_seconds | number | 345600 | No | Retention for the primary queue (60-1209600). |
| delay_seconds | number | 0 | No | Delivery delay for new messages (0-900). |
| max_message_size | number | 262144 | No | Max message size in bytes (1024-262144). |
| receive_wait_time_seconds | number | 0 | No | Long-poll wait time (0-20). |
| sse_enabled | bool | true | No | Enable SSE-SQS managed encryption (ignored when kms_master_key_id is set). |
| kms_master_key_id | string | null | No | KMS key id/alias/ARN; switches encryption to SSE-KMS. |
| kms_data_key_reuse_period_seconds | number | 300 | No | KMS data-key reuse window (60-86400). |
| create_dlq | bool | true | No | Create a companion DLQ and attach the redrive policy. |
| max_receive_count | number | 5 | No | Receives before a message is moved to the DLQ (1-1000). |
| dlq_message_retention_seconds | number | 1209600 | No | Retention for the DLQ (60-1209600). |
| queue_policy_json | string | null | No | Optional JSON resource policy for the primary queue. |
| tags | map(string) | {} | No | Tags applied to all queues. |
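For queue_policy_json, a common case is letting an SNS topic publish to the queue. The sketch below is one way to build that policy, under several assumptions: the topic ARN arrives as a plain string in a hypothetical variable orders_topic_arn, the queue is a standard queue named order-events, and the partition is "aws". Since the module decides whether to create aws_sqs_queue_policy based on whether this input is null, it seems safest to keep the JSON fully known at plan time, so the queue ARN is assembled from known parts rather than read from the module's own queue_arn output.

```hcl
# Hypothetical input: ARN of an existing SNS topic allowed to publish.
variable "orders_topic_arn" {
  type = string
}

data "aws_caller_identity" "current" {}
data "aws_region" "current" {}

locals {
  events_queue_name = "order-events" # illustrative; standard queue, so no suffix

  # Assembled from values known at plan time. Assumes the "aws" partition.
  events_queue_arn = "arn:aws:sqs:${data.aws_region.current.name}:${data.aws_caller_identity.current.account_id}:${local.events_queue_name}"
}

module "events_queue" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-sqs?ref=v1.0.0"

  name = local.events_queue_name

  # Allow only the named topic to send messages to this queue.
  queue_policy_json = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Sid       = "AllowOrdersTopic"
      Effect    = "Allow"
      Principal = { Service = "sns.amazonaws.com" }
      Action    = "sqs:SendMessage"
      Resource  = local.events_queue_arn
      Condition = {
        ArnEquals = { "aws:SourceArn" = var.orders_topic_arn }
      }
    }]
  })
}
```

The aws:SourceArn condition is what keeps the grant narrow: without it, any SNS topic could send to the queue.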

Outputs

| Name | Description |
| --- | --- |
| queue_id | URL of the primary queue (the SDK/CLI QueueUrl). |
| queue_url | URL of the primary queue (alias of queue_id). |
| queue_arn | ARN of the primary queue, for IAM policies and SNS subscriptions. |
| queue_name | Resolved primary queue name including any .fifo suffix. |
| dlq_id | URL of the dead-letter queue, or null when create_dlq = false. |
| dlq_arn | ARN of the dead-letter queue, or null when create_dlq = false. |

Enterprise scenario

A retail platform’s checkout service publishes every confirmed order to an orders-processing.fifo queue created by this module, with deduplication_scope = "messageGroup" and fifo_throughput_limit = "perMessageGroupId" so that orders for the same customer stay strictly ordered while unrelated customers process in parallel at high throughput. The fulfilment Lambda consumes via an event source mapping; if the warehouse API is down and a message fails three times (max_receive_count = 3), it lands on the auto-provisioned DLQ where it is retained for 14 days. A CloudWatch alarm on the DLQ’s ApproximateNumberOfMessagesVisible pages on-call, and once the warehouse API recovers, operators use SQS redrive to replay the parked orders back onto the primary queue with no data loss.
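The DLQ alarm in this scenario could be sketched as follows. This is an illustration rather than part of the module: the variable oncall_topic_arn and the alarm name are hypothetical, and it assumes the module block is called orders_queue with create_dlq = true (the split would fail on a null dlq_arn).

```hcl
# Hypothetical input: SNS topic that pages the on-call engineer.
variable "oncall_topic_arn" {
  type = string
}

resource "aws_cloudwatch_metric_alarm" "orders_dlq_not_empty" {
  alarm_name        = "orders-processing-dlq-not-empty"
  alarm_description = "Messages are parked on the orders DLQ."

  namespace   = "AWS/SQS"
  metric_name = "ApproximateNumberOfMessagesVisible"

  dimensions = {
    # The module outputs the DLQ ARN but not its name; the queue name is the
    # sixth colon-separated segment of the ARN (index 5).
    QueueName = element(split(":", module.orders_queue.dlq_arn), 5)
  }

  # Fire as soon as any message is visible on the DLQ.
  statistic           = "Maximum"
  period              = 60
  evaluation_periods  = 1
  comparison_operator = "GreaterThanThreshold"
  threshold           = 0
  treat_missing_data  = "notBreaching"

  alarm_actions = [var.oncall_topic_arn]
}
```

Treating missing data as not breaching keeps the alarm quiet while the DLQ is idle and reports no datapoints.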

Best practices
