Terraform Module: AWS X-Ray — Codify Trace Sampling, Groups, and KMS Encryption as One Unit

Quick take — A reusable hashicorp/aws ~> 5.0 Terraform module for AWS X-Ray: version-controlled sampling rules with priority/reservoir/rate, filter-expression trace groups, and account-wide KMS encryption config. New here? Jump to the Quickstart below to deploy it in minutes; read on for how it works and when to reach for it.

Quickstart (copy-paste)

Minimal, runnable configuration: drop this in a .tf file and fill in the "..." placeholders. Each required field is commented, and the numbers are example values that pass the module's validations. Rules and groups are passed as lists, even when there is only one of each:

provider "aws" {
  region = "us-east-1"
}

module "xray" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-xray?ref=v1.0.0"

  sampling_rules = [
    {
      rule_name      = "..."  # Unique name of the sampling rule.
      priority       = 9000   # Evaluation order, 1–9999; lower runs first. Must be unique.
      reservoir_size = 1      # Guaranteed traces/sec captured before `fixed_rate` applies.
      fixed_rate     = 0.05   # Sampling rate (0.0–1.0) for matching requests beyond the reservoir.
    },
  ]

  groups = [
    {
      group_name        = "..."  # Unique name of the X-Ray group.
      filter_expression = "..."  # X-Ray filter expression scoping the group (e.g. `http.status >= 500`).
    },
  ]
}

Then run terraform init && terraform apply. Every other input has a sensible default, including each rule's match dimensions, which default to "*". See Inputs below to override behaviour.

What this module is

AWS X-Ray is the distributed-tracing service that stitches together the path a single request takes across your microservices, Lambda functions, queues, and databases — turning a vague “checkout is slow” complaint into a flame graph that says “the 800ms is a synchronous DynamoDB call inside the payments service.” The traces themselves are emitted by the X-Ray SDK or the OpenTelemetry/ADOT collector running next to your code; what you actually configure on the AWS side is the control plane around them: how much of your traffic gets traced (sampling rules), how you slice the resulting traces for dashboards and insights (groups), and whether trace data is encrypted with your own key (encryption config).

That control plane is small but easy to get wrong by hand. X-Ray evaluates sampling rules in ascending priority order (lowest number first), and the first matching rule wins. A single mis-prioritised rule with reservoir_size = 0 and fixed_rate = 0.0 can therefore silently stop tracing an entire service. The default 1-request-per-second reservoir is also rarely what a high-traffic API actually wants. Groups need filter expressions that match how your traffic is actually shaped (service("payments") AND http.status >= 500). An expression that is syntactically valid but mis-scoped produces an empty, useless view. Finally, the X-Ray encryption setting is account- and region-wide singleton state. That is exactly the kind of global config that should live in version control with a clear owner, not be flipped in the console by whoever logged in last.
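To make that failure mode concrete, here is a hypothetical sampling_rules entry. The rule name and the "payments" service are illustrative and not part of the module. The entry passes every validation the module defines. At priority 1, however, it would match all payments requests before any other rule and record none of them:

```hcl
locals {
  # Anti-pattern, shown for review purposes only: a top-priority rule that
  # matches every "payments" request and samples nothing.
  muted_payments_rule = {
    rule_name      = "payments-muted"
    priority       = 1   # evaluated before every other custom rule
    reservoir_size = 0   # no guaranteed traces per second
    fixed_rate     = 0.0 # and no percentage beyond the (empty) reservoir
    service_name   = "payments"
  }
}
```

Range checks cannot tell a deliberate mute from an accident, so a rule like this has to be caught in code review.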

This module wraps aws_xray_sampling_rule together with aws_xray_group and the singleton aws_xray_encryption_config into one var-driven unit. You declare your sampling matrix and your trace groups as data, point at a KMS key, and get back consistent, reviewable tracing configuration with sane production defaults — instead of bespoke, drift-prone X-Ray HCL copied between every service repo.

When to use it

You run a microservices or serverless platform instrumented with the X-Ray SDK or ADOT and want sampling decisions in code, not clicked into the console per service.
You need to trace 100% of requests to a critical service or endpoint while keeping a low baseline rate on other traffic to control cost. This is a classic priority-ordered sampling matrix.
You want named X-Ray groups (per service, per status class) so the Service Map, Insights, and CloudWatch group metrics are sliced the way your teams actually triage.
You need to turn on customer-managed KMS encryption for trace data to satisfy a compliance control, and you want that account-wide singleton owned by Terraform with a clear key policy.
You are standardising observability across many accounts/regions and want one reviewed module instead of duplicated sampling and group definitions.

If you only need the default sampling behaviour (a reservoir of 1 request per second, then 5% of additional requests) and no grouping, you may not need any X-Ray resources at all, because the service ships a built-in Default rule. Reach for this module the moment you want anything beyond that default.
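Once you do adopt the module, a low-risk first rule restates that default as an explicit catch-all, so the sampling floor is visible and reviewable. The following is a minimal sketch that assumes the Quickstart's module source. The rule name is an illustrative choice, and priority 9999 is the last position the module's validation allows:

```hcl
module "xray" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-xray?ref=v1.0.0"

  sampling_rules = [
    {
      # Mirrors the built-in Default: 1 trace/sec guaranteed, then 5% of the rest.
      rule_name      = "catch-all"
      priority       = 9999 # evaluated after every other custom rule
      reservoir_size = 1
      fixed_rate     = 0.05
      # service_name, host, http_method, url_path, etc. default to "*".
    },
  ]
}
```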

Module structure

terraform-module-aws-xray/
├── versions.tf
├── main.tf
├── variables.tf
└── outputs.tf

versions.tf

terraform {
  required_version = ">= 1.5.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

main.tf

locals {
  tags = merge(var.tags, {
    ManagedBy = "terraform"
    Module    = "terraform-module-aws-xray"
  })
}

# ---------------------------------------------------------------------------
# Sampling rules
# Priority-ordered (lower number = evaluated first). Each rule reserves a
# guaranteed number of traces/sec (reservoir_size), then samples the remainder
# at fixed_rate. "*" wildcards match any value for that dimension.
# ---------------------------------------------------------------------------
resource "aws_xray_sampling_rule" "this" {
  for_each = { for r in var.sampling_rules : r.rule_name => r }

  rule_name = each.value.rule_name
  priority  = each.value.priority
  version   = 1

  # Sampling knobs.
  reservoir_size = each.value.reservoir_size
  fixed_rate     = each.value.fixed_rate

  # Matching dimensions — which requests this rule applies to.
  service_name = each.value.service_name
  service_type = each.value.service_type
  host         = each.value.host
  http_method  = each.value.http_method
  url_path     = each.value.url_path
  resource_arn = each.value.resource_arn

  attributes = each.value.attributes

  tags = local.tags
}

# ---------------------------------------------------------------------------
# Trace groups
# A group is a saved filter expression that drives the Service Map view,
# group-scoped CloudWatch metrics, and (optionally) X-Ray Insights.
# ---------------------------------------------------------------------------
resource "aws_xray_group" "this" {
  for_each = { for g in var.groups : g.group_name => g }

  group_name        = each.value.group_name
  filter_expression = each.value.filter_expression

  insights_configuration {
    insights_enabled      = each.value.insights_enabled
    notifications_enabled = each.value.notifications_enabled
  }

  tags = local.tags
}

# ---------------------------------------------------------------------------
# Encryption configuration (account + region singleton)
# Only one of these exists per account/region. Manage it here so the choice
# of X-Ray default encryption (NONE) vs a customer-managed KMS key is version-controlled.
# ---------------------------------------------------------------------------
resource "aws_xray_encryption_config" "this" {
  count = var.manage_encryption_config ? 1 : 0

  type   = var.encryption_kms_key_id != null ? "KMS" : "NONE"
  key_id = var.encryption_kms_key_id
}

variables.tf

variable "sampling_rules" {
  description = <<-EOT
    List of X-Ray sampling rules. Rules are evaluated by ascending priority
    (1 first). Each rule guarantees `reservoir_size` traces/sec, then samples
    additional matching requests at `fixed_rate` (0.0-1.0). Use "*" to match any
    value for a dimension. Define a low-priority catch-all so every request is
    covered by some rule.
  EOT
  type = list(object({
    rule_name      = string
    priority       = number
    reservoir_size = number
    fixed_rate     = number
    service_name   = optional(string, "*")
    service_type   = optional(string, "*")
    host           = optional(string, "*")
    http_method    = optional(string, "*")
    url_path       = optional(string, "*")
    resource_arn   = optional(string, "*")
    attributes     = optional(map(string), null)
  }))
  default = []

  validation {
    condition     = alltrue([for r in var.sampling_rules : r.fixed_rate >= 0 && r.fixed_rate <= 1])
    error_message = "fixed_rate must be between 0.0 and 1.0 for every sampling rule."
  }

  validation {
    condition     = alltrue([for r in var.sampling_rules : r.priority >= 1 && r.priority <= 9999])
    error_message = "priority must be between 1 and 9999 for every sampling rule."
  }

  validation {
    condition     = alltrue([for r in var.sampling_rules : r.reservoir_size >= 0 && floor(r.reservoir_size) == r.reservoir_size])
    error_message = "reservoir_size must be a non-negative integer for every sampling rule."
  }

  validation {
    condition     = length(distinct([for r in var.sampling_rules : r.rule_name])) == length(var.sampling_rules)
    error_message = "sampling rule_name values must be unique."
  }

  validation {
    condition     = length(distinct([for r in var.sampling_rules : r.priority])) == length(var.sampling_rules)
    error_message = "sampling rule priorities must be unique so evaluation order is deterministic."
  }
}

variable "groups" {
  description = <<-EOT
    List of X-Ray groups. Each is a saved filter expression that scopes the
    Service Map, group CloudWatch metrics, and optional Insights. Example
    filter: service("payments") AND http.status >= 500
  EOT
  type = list(object({
    group_name            = string
    filter_expression     = string
    insights_enabled      = optional(bool, false)
    notifications_enabled = optional(bool, false)
  }))
  default = []

  validation {
    condition     = alltrue([for g in var.groups : length(g.filter_expression) > 0])
    error_message = "filter_expression must be a non-empty X-Ray filter expression for every group."
  }

  validation {
    condition     = length(distinct([for g in var.groups : g.group_name])) == length(var.groups)
    error_message = "group_name values must be unique."
  }

  validation {
    # Notifications require Insights to be enabled on the group.
    condition     = alltrue([for g in var.groups : g.notifications_enabled == false || g.insights_enabled == true])
    error_message = "notifications_enabled can only be true when insights_enabled is also true."
  }
}

variable "manage_encryption_config" {
  description = <<-EOT
    Whether this module manages the account+region X-Ray encryption singleton.
    Set true in exactly ONE Terraform configuration per account/region to avoid
    two states fighting over the same global setting.
  EOT
  type    = bool
  default = false
}

variable "encryption_kms_key_id" {
  description = <<-EOT
    KMS key ARN/ID used to encrypt X-Ray trace data when manage_encryption_config
    is true. The key policy must grant the X-Ray service principal
    kms:GenerateDataKey* and kms:Decrypt. Null = AWS-owned key (type NONE).
  EOT
  type    = string
  default = null

  validation {
    condition     = var.encryption_kms_key_id == null || can(regex("^(arn:aws[a-z-]*:kms:|[0-9a-f-]{36}$|alias/)", var.encryption_kms_key_id))
    error_message = "encryption_kms_key_id must be a KMS key ARN, key UUID, alias, or null."
  }
}

variable "tags" {
  description = "Tags applied to sampling rules and groups created by the module."
  type        = map(string)
  default     = {}
}

outputs.tf

output "sampling_rule_arns" {
  description = "Map of rule_name => sampling rule ARN."
  value       = { for k, r in aws_xray_sampling_rule.this : k => r.arn }
}

output "sampling_rule_names" {
  description = "List of sampling rule names managed by this module."
  value       = [for r in aws_xray_sampling_rule.this : r.rule_name]
}

output "group_arns" {
  description = "Map of group_name => X-Ray group ARN."
  value       = { for k, g in aws_xray_group.this : k => g.arn }
}

output "group_names" {
  description = "List of X-Ray group names managed by this module."
  value       = [for g in aws_xray_group.this : g.group_name]
}

output "encryption_type" {
  description = "Active X-Ray encryption type (KMS or NONE), if managed by this module."
  value       = var.manage_encryption_config ? aws_xray_encryption_config.this[0].type : null
}

output "encryption_key_id" {
  description = "KMS key used for X-Ray trace encryption, if managed and set to KMS."
  value       = var.manage_encryption_config ? aws_xray_encryption_config.this[0].key_id : null
}

How to use it

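This example traces a payments platform with three sampling rules:

- A top-priority rule captures 100% of payments-service traffic.
- A separate rule samples POST /checkout traffic at 5% beyond its reservoir.
- A low-priority catch-all keeps a thin baseline across everything else.

Sampling decisions are made when a request starts, before its outcome is known. Errors are therefore isolated with groups rather than rules: two groups slice the Service Map by faults and by the payments service (with Insights on). Trace data is encrypted with a customer-managed KMS key, and a downstream CloudWatch alarm consumes the module’s group ARN to page on a fault spike. The example assumes aws_kms_key.xray and aws_sns_topic.observability are defined elsewhere in the same configuration.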

module "x_ray" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-xray?ref=v1.0.0"

  sampling_rules = [
    {
      rule_name      = "payments-always"
      priority       = 100
      reservoir_size = 1
      fixed_rate     = 1.0 # capture 100% of payments traffic; rules cannot match on errors
      service_name   = "payments"
      http_method    = "*"
      url_path       = "*"
    },
    {
      rule_name      = "checkout-baseline"
      priority       = 200
      reservoir_size = 2
      fixed_rate     = 0.05 # 5% of POST /checkout traffic beyond the reservoir
      service_name   = "*"
      http_method    = "POST"
      url_path       = "/checkout"
    },
    {
      rule_name      = "catch-all"
      priority       = 9000
      reservoir_size = 1
      fixed_rate     = 0.01 # thin 1% baseline everywhere else
    },
  ]

  groups = [
    {
      group_name        = "faults-5xx"
      filter_expression = "fault = true OR http.status >= 500"
    },
    {
      group_name            = "payments-service"
      filter_expression     = "service(\"payments\")"
      insights_enabled      = true
      notifications_enabled = true
    },
  ]

  manage_encryption_config = true
  encryption_kms_key_id    = aws_kms_key.xray.arn

  tags = {
    Environment = "prod"
    Team        = "payments-platform"
  }
}

# Downstream: alarm on a fault spike for the "faults-5xx" group using its ARN.
resource "aws_cloudwatch_metric_alarm" "xray_faults" {
  alarm_name          = "xray-faults-5xx-spike"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 1
  period              = 300
  threshold           = 25
  statistic           = "Average" # FaultRate is a rate, so average it rather than sum it
  namespace           = "AWS/X-Ray"
  metric_name         = "FaultRate"
  treat_missing_data  = "notBreaching"

  dimensions = {
    GroupARN = module.x_ray.group_arns["faults-5xx"]
  }

  alarm_actions = [aws_sns_topic.observability.arn]
}
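The example above references aws_kms_key.xray and aws_sns_topic.observability but does not define them. A minimal sketch of both follows, assuming a single-account setup.

The key policy mirrors the requirement stated in the encryption_kms_key_id description: the X-Ray service principal gets kms:GenerateDataKey* and kms:Decrypt. The principal name xray.amazonaws.com is an assumption here, so confirm the exact statements against the AWS X-Ray encryption documentation before relying on them. The topic name is hypothetical.

```hcl
data "aws_caller_identity" "current" {}
data "aws_partition" "current" {}

# Customer-managed key for X-Ray trace encryption, with yearly rotation enabled.
resource "aws_kms_key" "xray" {
  description         = "X-Ray trace encryption"
  enable_key_rotation = true

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        # Keep the account in control of the key so it can never become unmanageable.
        Sid       = "AccountAdministration"
        Effect    = "Allow"
        Principal = { AWS = "arn:${data.aws_partition.current.partition}:iam::${data.aws_caller_identity.current.account_id}:root" }
        Action    = "kms:*"
        Resource  = "*"
      },
      {
        # Assumption: service principal name; actions as stated in the
        # module's encryption_kms_key_id description.
        Sid       = "AllowXRayUse"
        Effect    = "Allow"
        Principal = { Service = "xray.amazonaws.com" }
        Action    = ["kms:GenerateDataKey*", "kms:Decrypt"]
        Resource  = "*"
      },
    ]
  })
}

# Hypothetical alerting topic used by the alarm's alarm_actions.
resource "aws_sns_topic" "observability" {
  name = "observability-alerts"
}
```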

With Terragrunt

Terragrunt keeps this module DRY across environments — define the backend and provider once in a root config, then a thin terragrunt.hcl per environment supplies only the inputs that differ.

1. Root config — live/terragrunt.hcl (inherited by every module):

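remote_state {
  backend  = "s3"
  generate = { path = "backend.tf", if_exists = "overwrite" }
  config = {
    # ...S3 state bucket + key per path...
  }
}

# Provider defined once for every module; the region is an example value.
generate "provider" {
  path      = "provider.tf"
  if_exists = "overwrite_terragrunt"
  contents  = "provider \"aws\" {\n  region = \"us-east-1\"\n}\n"
}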

2. Module config — live/prod/xray/terragrunt.hcl:

include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-xray?ref=v1.0.0"
}

inputs = {
  sampling_rules = [
    {
      rule_name      = "..."
      priority       = 9000 # example value; must be 1–9999 and unique
      reservoir_size = 1
      fixed_rate     = 0.05
    },
  ]
  groups = [
    {
      group_name        = "..."
      filter_expression = "..."
    },
  ]
}

3. Deploy one environment, or roll out all modules together:

cd live/prod/xray && terragrunt apply        # this module only
cd live/prod && terragrunt run-all apply     # every module under live/prod

Why Terragrunt here:

- The backend and provider live in one place instead of being copy-pasted into every module.
- The inputs block is overridden per environment (dev / stage / prod) without forking the module.
- run-all orchestrates dependencies across modules.

Reach for it once you have more than one environment or more than a handful of modules. For a single stack, the plain Quickstart above is enough.
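To illustrate the per-environment override, a hypothetical live/dev/xray/terragrunt.hcl could pin the same module version but sample more aggressively, since dev traffic is light and cheap to trace. The path, rule name, and rates are assumptions for illustration, not part of the module:

```hcl
include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-xray?ref=v1.0.0"
}

inputs = {
  sampling_rules = [
    {
      # Dev traffic is light: 5 guaranteed traces/sec, then half of the rest.
      rule_name      = "dev-catch-all"
      priority       = 9000
      reservoir_size = 5
      fixed_rate     = 0.5
    },
  ]

  # Leave the account/region encryption singleton to the one stack that owns it.
  manage_encryption_config = false
}
```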

Inputs

Name	Type	Default	Required	Description
`sampling_rules`	`list(object)`	`[]`	No	Priority-ordered sampling rules (`reservoir_size`, `fixed_rate`, and match dimensions). Validated for unique names and priorities, priority 1–9999, a non-negative integer `reservoir_size`, and `fixed_rate` 0.0–1.0.
`groups`	`list(object)`	`[]`	No	X-Ray groups: a `filter_expression` plus optional Insights/notifications toggles.
`manage_encryption_config`	`bool`	`false`	No	Manage the account+region encryption singleton from this config. Enable in only one place per account/region.
`encryption_kms_key_id`	`string`	`null`	No	KMS key ARN/ID/alias for trace encryption; `null` uses the AWS-owned key (type `NONE`). Only takes effect when `manage_encryption_config = true`.
`tags`	`map(string)`	`{}`	No	Tags applied to sampling rules and groups.

Per-rule fields (`sampling_rules[*]`)

Name	Type	Default	Required	Description
`rule_name`	`string`	—	Yes	Unique name of the sampling rule.
`priority`	`number`	—	Yes	Evaluation order, 1–9999; lower runs first. Must be unique.
`reservoir_size`	`number`	—	Yes	Guaranteed traces/sec captured before `fixed_rate` applies.
`fixed_rate`	`number`	—	Yes	Sampling rate (0.0–1.0) for matching requests beyond the reservoir.
`service_name`	`string`	`"*"`	No	Match on the instrumented service name.
`service_type`	`string`	`"*"`	No	Match on origin (e.g. `AWS::Lambda::Function`).
`host`	`string`	`"*"`	No	Match on the `Host` header.
`http_method`	`string`	`"*"`	No	Match on HTTP method.
`url_path`	`string`	`"*"`	No	Match on request path.
`resource_arn`	`string`	`"*"`	No	Match on the ARN of the AWS resource the rule applies to.
`attributes`	`map(string)`	`null`	No	Match on custom segment attributes that are known when the sampling decision is made.
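A rule applies only when a request matches every non-wildcard dimension. The sketch below shows a hypothetical rule (name, priority, and path pattern are illustrative) that stops tracing container health checks. It combines service_type, http_method, and url_path, and uses reservoir_size = 0 with fixed_rate = 0.0 on purpose:

```hcl
locals {
  # Hypothetical sampling_rules entry: every non-"*" field must match.
  ecs_health_checks = {
    rule_name      = "ecs-health-checks"
    priority       = 50  # ahead of broader rules, so it wins for health checks
    reservoir_size = 0   # no guaranteed traces...
    fixed_rate     = 0.0 # ...and no percentage: health checks are not traced
    service_type   = "AWS::ECS::Container"
    http_method    = "GET"
    url_path       = "/health*" # "*" also works as a wildcard inside a pattern
  }
}
```

Pass local.ecs_health_checks as one element of sampling_rules. Its priority must not collide with any other rule's priority. Omitted fields such as host and resource_arn fall back to "*".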

Per-group fields (`groups[*]`)

Name	Type	Default	Required	Description
`group_name`	`string`	—	Yes	Unique name of the X-Ray group.
`filter_expression`	`string`	—	Yes	X-Ray filter expression scoping the group (e.g. `http.status >= 500`).
`insights_enabled`	`bool`	`false`	No	Enable X-Ray Insights anomaly detection for the group.
`notifications_enabled`	`bool`	`false`	No	Send Insights notifications (requires `insights_enabled = true`).
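Groups are not limited to faults. A hypothetical latency group could use the responsetime keyword, which X-Ray filter expressions measure in seconds; the name and the 2-second threshold are illustrative. Enabling notifications here passes the module's validation only because insights_enabled is also true:

```hcl
locals {
  # Hypothetical groups entry: checkout requests slower than 2 seconds.
  checkout_slow_group = {
    group_name            = "checkout-slow"
    filter_expression     = "service(\"checkout\") AND responsetime > 2"
    insights_enabled      = true
    notifications_enabled = true # allowed because insights_enabled is true
  }
}
```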

Outputs

Name	Description
`sampling_rule_arns`	Map of `rule_name` to sampling rule ARN.
`sampling_rule_names`	List of sampling rule names managed by the module.
`group_arns`	Map of `group_name` to X-Ray group ARN.
`group_names`	List of X-Ray group names managed by the module.
`encryption_type`	Active encryption type (`KMS` or `NONE`), if managed here.
`encryption_key_id`	KMS key used for trace encryption, if managed and set to `KMS`.
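The outputs also lend themselves to guardrails. The following sketch assumes the module is instantiated as module "x_ray" with manage_encryption_config = true, as in How to use it. A Terraform check block (available since Terraform 1.5, which the module already requires) can then report a warning during plan and apply whenever the managed encryption type is not KMS:

```hcl
check "xray_traces_use_kms" {
  assert {
    # encryption_type is null when this stack does not manage the singleton.
    condition     = module.x_ray.encryption_type == "KMS"
    error_message = "X-Ray trace encryption is not set to KMS in this account/region."
  }
}
```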

Enterprise scenario

A retail company runs ~40 microservices behind an API Gateway, all instrumented with the ADOT collector. The platform team deploys this module once per region from a shared observability Terraform stack. A sampling matrix captures 10% of checkout and payment traffic and a 2% catch-all everywhere else, keeping the X-Ray bill predictable during Black Friday.

Because sampling decisions are made when a request starts, before its outcome is known, errors are isolated with groups rather than sampling rules. Named groups (faults-5xx, checkout-funnel, per-domain service groups) surface every sampled error trace and drive team-specific Service Map dashboards and Insights.

Setting manage_encryption_config = true with a customer-managed KMS key satisfies the PCI requirement that trace payloads, which can carry request metadata, are encrypted with a key the company controls and can rotate.
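Deploying once per region from a shared stack typically means one module instance per aliased provider. The minimal sketch below assumes two example regions (us-east-1 and eu-west-1) and a hypothetical variable holding one KMS key ARN per region. Each instance owns its own region's encryption singleton, which is safe because that setting is scoped per account and region:

```hcl
provider "aws" {
  alias  = "use1"
  region = "us-east-1"
}

provider "aws" {
  alias  = "euw1"
  region = "eu-west-1"
}

# Hypothetical input: region => customer-managed KMS key ARN for X-Ray.
variable "xray_kms_key_arns" {
  type = map(string)
}

locals {
  # Shared sampling matrix (illustrative values); omitted dimensions default to "*".
  sampling_rules = [
    { rule_name = "checkout", priority = 200, reservoir_size = 2, fixed_rate = 0.10, url_path = "/checkout*" },
    { rule_name = "catch-all", priority = 9000, reservoir_size = 1, fixed_rate = 0.02 },
  ]
}

module "xray_use1" {
  source    = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-xray?ref=v1.0.0"
  providers = { aws = aws.use1 }

  sampling_rules           = local.sampling_rules
  manage_encryption_config = true # this region's singleton is owned here
  encryption_kms_key_id    = var.xray_kms_key_arns["us-east-1"]
}

module "xray_euw1" {
  source    = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-xray?ref=v1.0.0"
  providers = { aws = aws.euw1 }

  sampling_rules           = local.sampling_rules
  manage_encryption_config = true
  encryption_kms_key_id    = var.xray_kms_key_arns["eu-west-1"]
}
```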

Best practices

Always define a low-priority catch-all rule. Sampling rules are evaluated in ascending priority order (lowest number first), and the first match wins. If no custom rule matches, X-Ray falls back to the built-in Default (1 request/sec, then 5% of additional requests). A priority = 9000 catch-all with a small fixed_rate makes the floor explicit and version-controlled instead of relying on an invisible default.
Reserve the reservoir for low-traffic services, lean on fixed_rate for high-traffic ones. reservoir_size guarantees N traces/sec before the percentage kicks in, so a quiet service still produces some traces. On a high-RPS API the reservoir is a rounding error, and fixed_rate is what actually governs both visibility and cost. Tune it deliberately rather than leaving the 5% default.
Trace critical paths fully, sample the happy path. Sampling decisions are made when a request starts, before its outcome is known, so a rule cannot be scoped to faults or errors. Instead, put a top-priority rule (fixed_rate = 1.0) on your most critical service or endpoint and keep the broad baseline low. Then use a group such as fault = true OR http.status >= 500 to isolate the error traces you do capture. You get complete traces where they matter most without paying to record millions of identical healthy requests.
Treat encryption config as a singleton with one owner. aws_xray_encryption_config is account+region global — two Terraform states managing it will ping-pong the setting on every apply. Set manage_encryption_config = true in exactly one stack per account/region, and scope the KMS key policy to grant the X-Ray service principal only kms:GenerateDataKey* and kms:Decrypt.
Write filter expressions that match how teams triage, and name groups accordingly. A group named faults-5xx with fault = true OR http.status >= 500 is immediately greppable and reusable as a CloudWatch dimension; vague names like group1 and overly broad expressions produce noisy, low-signal Service Maps and Insights.
Keep sampling and groups in code, not the console. Because both are easy to flip by hand, drift is the norm without IaC. Managing them through this one module — with unique-name/priority validations catching mistakes at plan time — keeps tracing behaviour consistent across services and reviewable in pull requests.
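To put numbers on the reservoir-versus-fixed_rate advice: for a rule with reservoir N and rate f, a matching stream of R requests/sec yields roughly N + f × (R − N) traces/sec once R exceeds N.

- For a hypothetical search API at 2,000 req/s with reservoir_size = 1 and fixed_rate = 0.01, that is about 1 + 0.01 × 1,999 ≈ 21 traces/s, so the rate dominates.
- For a quiet admin API at 0.2 req/s, the reservoir alone captures essentially every request.

The two hypothetical rules below (service names are illustrative) encode that split:

```hcl
module "xray" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-xray?ref=v1.0.0"

  sampling_rules = [
    {
      # Quiet service: the reservoir guarantees ~1 trace/sec; no extra percentage.
      rule_name      = "admin-api"
      priority       = 300
      reservoir_size = 1
      fixed_rate     = 0.0
      service_name   = "admin-api"
    },
    {
      # Busy service: the reservoir is negligible; fixed_rate sets volume and cost.
      rule_name      = "search-api"
      priority       = 310
      reservoir_size = 1
      fixed_rate     = 0.01
      service_name   = "search-api"
    },
  ]
}
```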

Written by Vinod
