Quick take — A reusable hashicorp/aws ~> 5.0 Terraform module for AWS X-Ray: version-controlled sampling rules with priority/reservoir/rate, filter-expression trace groups, and account-wide KMS encryption config. New here? Jump to the Quickstart below to deploy it in minutes; read on for how it works and when to reach for it.
Quickstart (copy-paste)
Minimal, runnable configuration — drop this in a .tf file and fill in the "..." placeholders (each required input is commented):
provider "aws" {
  region = "us-east-1"
}

module "xray" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-xray?ref=v1.0.0"

  sampling_rules = [
    {
      rule_name      = "..." # Unique name of the sampling rule.
      priority       = 9000  # Evaluation order, 1–9999; lower runs first. Must be unique.
      reservoir_size = 1     # Guaranteed traces/sec captured before `fixed_rate` applies.
      fixed_rate     = 0.01  # Sampling rate (0.0–1.0) for matching requests beyond the reservoir.
    },
  ]

  groups = [
    {
      group_name        = "..." # Unique name of the X-Ray group.
      filter_expression = "..." # X-Ray filter expression scoping the group (e.g. `http.status >= 500`).
    },
  ]
}
Then terraform init && terraform apply. Every other input has a sensible default — see Inputs below to override behaviour.
What this module is
AWS X-Ray is the distributed-tracing service that stitches together the path a single request takes across your microservices, Lambda functions, queues, and databases — turning a vague “checkout is slow” complaint into a flame graph that says “the 800ms is a synchronous DynamoDB call inside the payments service.” The traces themselves are emitted by the X-Ray SDK or the OpenTelemetry/ADOT collector running next to your code; what you actually configure on the AWS side is the control plane around them: how much of your traffic gets traced (sampling rules), how you slice the resulting traces for dashboards and insights (groups), and whether trace data is encrypted with your own key (encryption config).
That control plane is small but easy to get wrong by hand. Sampling rules are priority-ordered and evaluated in ascending order (lowest priority number first, first match wins); a single mis-prioritised rule with reservoir_size = 0 and fixed_rate = 0.0 can silently stop tracing an entire service, and the default 1-request-per-second reservoir is rarely what a high-traffic API actually wants. Groups need valid filter expressions (service("payments") AND http.status >= 500) or they create empty, useless views. And the X-Ray encryption setting is account- and region-wide singleton state: exactly the kind of global config that should live in version control with a clear owner, not be flipped in the console by whoever logged in last.
This module wraps aws_xray_sampling_rule together with aws_xray_group and the singleton aws_xray_encryption_config into one var-driven unit. You declare your sampling matrix and your trace groups as data, point at a KMS key, and get back consistent, reviewable tracing configuration with sane production defaults — instead of bespoke, drift-prone X-Ray HCL copied between every service repo.
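To make the reservoir and fixed-rate interaction concrete, here is a minimal sketch that estimates how many traces per second a single rule records. The traffic figure and the local names are illustrative assumptions, not part of the module, and the formula is an approximation of the behaviour described above (reservoir first, then the percentage on the remainder).

```hcl
# Rough estimate of traces/sec recorded by one sampling rule.
# All values are illustrative; requests_per_second is an assumed load.
locals {
  requests_per_second = 1000
  reservoir_size      = 1    # guaranteed traces/sec
  fixed_rate          = 0.05 # share of the remaining requests that is sampled

  # The reservoir is filled first, capped by the actual traffic.
  reservoir_traces = min(local.requests_per_second, local.reservoir_size)

  # Everything beyond the reservoir is sampled at fixed_rate.
  overflow_traces = max(local.requests_per_second - local.reservoir_size, 0) * local.fixed_rate

  approx_traces_per_second = local.reservoir_traces + local.overflow_traces
}

output "approx_traces_per_second" {
  value = local.approx_traces_per_second
}
```

With these numbers the estimate is 1 + 999 × 0.05, roughly 51 traces per second, which shows why fixed_rate rather than the reservoir dominates cost on a busy API.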
When to use it
- You run a microservices or serverless platform instrumented with the X-Ray SDK or ADOT and want sampling decisions in code, not clicked into the console per service.
- You need to trace 100% of error/slow requests for a critical service while keeping a low baseline rate on healthy traffic to control cost — a classic priority-ordered sampling matrix.
- You want named X-Ray groups (per service, per status class) so the Service Map, Insights, and CloudWatch group metrics are sliced the way your teams actually triage.
- You need to turn on customer-managed KMS encryption for trace data to satisfy a compliance control, and you want that account-wide singleton owned by Terraform with a clear key policy.
- You are standardising observability across many accounts/regions and want one reviewed module instead of duplicated sampling and group definitions.
If you only need the default sampling behaviour (a reservoir of 1 req/sec plus a 5% fixed rate) and no grouping, you may not need any X-Ray resources at all, because the service ships a built-in Default rule. Reach for this module the moment you want anything beyond that default.
Module structure
terraform-module-aws-xray/
├── versions.tf
├── main.tf
├── variables.tf
└── outputs.tf
versions.tf
terraform {
  required_version = ">= 1.5.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}
main.tf
locals {
  tags = merge(var.tags, {
    ManagedBy = "terraform"
    Module    = "terraform-module-aws-xray"
  })
}

# ---------------------------------------------------------------------------
# Sampling rules
# Priority-ordered (lower number = evaluated first). Each rule reserves a
# guaranteed number of traces/sec (reservoir_size), then samples the remainder
# at fixed_rate. "*" wildcards match any value for that dimension.
# ---------------------------------------------------------------------------
resource "aws_xray_sampling_rule" "this" {
  for_each = { for r in var.sampling_rules : r.rule_name => r }

  rule_name = each.value.rule_name
  priority  = each.value.priority
  version   = 1

  # Sampling knobs.
  reservoir_size = each.value.reservoir_size
  fixed_rate     = each.value.fixed_rate

  # Matching dimensions: which requests this rule applies to.
  service_name = each.value.service_name
  service_type = each.value.service_type
  host         = each.value.host
  http_method  = each.value.http_method
  url_path     = each.value.url_path
  resource_arn = each.value.resource_arn
  attributes   = each.value.attributes

  tags = local.tags
}

# ---------------------------------------------------------------------------
# Trace groups
# A group is a saved filter expression that drives the Service Map view,
# group-scoped CloudWatch metrics, and (optionally) X-Ray Insights.
# ---------------------------------------------------------------------------
resource "aws_xray_group" "this" {
  for_each = { for g in var.groups : g.group_name => g }

  group_name        = each.value.group_name
  filter_expression = each.value.filter_expression

  insights_configuration {
    insights_enabled      = each.value.insights_enabled
    notifications_enabled = each.value.notifications_enabled
  }

  tags = local.tags
}

# ---------------------------------------------------------------------------
# Encryption configuration (account + region singleton)
# Only one of these exists per account/region. Manage it here so the choice
# of default encryption (NONE) vs a customer-managed KMS key is
# version-controlled.
# ---------------------------------------------------------------------------
resource "aws_xray_encryption_config" "this" {
  count = var.manage_encryption_config ? 1 : 0

  type   = var.encryption_kms_key_id != null ? "KMS" : "NONE"
  key_id = var.encryption_kms_key_id
}
variables.tf
variable "sampling_rules" {
  description = <<-EOT
    List of X-Ray sampling rules. Rules are evaluated by ascending priority
    (1 first). Each rule guarantees `reservoir_size` traces/sec, then samples
    additional matching requests at `fixed_rate` (0.0-1.0). Use "*" to match any
    value for a dimension. Define a low-priority catch-all so every request is
    covered by some rule.
  EOT
  type = list(object({
    rule_name      = string
    priority       = number
    reservoir_size = number
    fixed_rate     = number
    service_name   = optional(string, "*")
    service_type   = optional(string, "*")
    host           = optional(string, "*")
    http_method    = optional(string, "*")
    url_path       = optional(string, "*")
    resource_arn   = optional(string, "*")
    attributes     = optional(map(string), null)
  }))
  default = []

  validation {
    condition     = alltrue([for r in var.sampling_rules : r.fixed_rate >= 0 && r.fixed_rate <= 1])
    error_message = "fixed_rate must be between 0.0 and 1.0 for every sampling rule."
  }

  validation {
    condition     = alltrue([for r in var.sampling_rules : r.priority >= 1 && r.priority <= 9999])
    error_message = "priority must be between 1 and 9999 for every sampling rule."
  }

  validation {
    condition     = alltrue([for r in var.sampling_rules : r.reservoir_size >= 0 && floor(r.reservoir_size) == r.reservoir_size])
    error_message = "reservoir_size must be a non-negative integer for every sampling rule."
  }

  validation {
    condition     = length(distinct([for r in var.sampling_rules : r.rule_name])) == length(var.sampling_rules)
    error_message = "sampling rule_name values must be unique."
  }

  validation {
    condition     = length(distinct([for r in var.sampling_rules : r.priority])) == length(var.sampling_rules)
    error_message = "sampling rule priorities must be unique so evaluation order is deterministic."
  }
}

variable "groups" {
  description = <<-EOT
    List of X-Ray groups. Each is a saved filter expression that scopes the
    Service Map, group CloudWatch metrics, and optional Insights. Example
    filter: service("payments") AND http.status >= 500
  EOT
  type = list(object({
    group_name            = string
    filter_expression     = string
    insights_enabled      = optional(bool, false)
    notifications_enabled = optional(bool, false)
  }))
  default = []

  validation {
    condition     = alltrue([for g in var.groups : length(g.filter_expression) > 0])
    error_message = "filter_expression must be a non-empty X-Ray filter expression for every group."
  }

  validation {
    condition     = length(distinct([for g in var.groups : g.group_name])) == length(var.groups)
    error_message = "group_name values must be unique."
  }

  validation {
    # Notifications require Insights to be enabled on the group.
    condition     = alltrue([for g in var.groups : g.notifications_enabled == false || g.insights_enabled == true])
    error_message = "notifications_enabled can only be true when insights_enabled is also true."
  }
}

variable "manage_encryption_config" {
  description = <<-EOT
    Whether this module manages the account+region X-Ray encryption singleton.
    Set true in exactly ONE Terraform configuration per account/region to avoid
    two states fighting over the same global setting.
  EOT
  type        = bool
  default     = false
}

variable "encryption_kms_key_id" {
  description = <<-EOT
    KMS key ARN/ID used to encrypt X-Ray trace data when manage_encryption_config
    is true. The key policy must grant the X-Ray service principal
    kms:GenerateDataKey* and kms:Decrypt. Null = AWS-owned key (type NONE).
  EOT
  type        = string
  default     = null

  validation {
    condition     = var.encryption_kms_key_id == null || can(regex("^(arn:aws[a-z-]*:kms:|[0-9a-f-]{36}$|alias/)", var.encryption_kms_key_id))
    error_message = "encryption_kms_key_id must be a KMS key ARN, key UUID, alias, or null."
  }
}

variable "tags" {
  description = "Tags applied to sampling rules and groups created by the module."
  type        = map(string)
  default     = {}
}
outputs.tf
output "sampling_rule_arns" {
  description = "Map of rule_name => sampling rule ARN."
  value       = { for k, r in aws_xray_sampling_rule.this : k => r.arn }
}

output "sampling_rule_names" {
  description = "List of sampling rule names managed by this module."
  value       = [for r in aws_xray_sampling_rule.this : r.rule_name]
}

output "group_arns" {
  description = "Map of group_name => X-Ray group ARN."
  value       = { for k, g in aws_xray_group.this : k => g.arn }
}

output "group_names" {
  description = "List of X-Ray group names managed by this module."
  value       = [for g in aws_xray_group.this : g.group_name]
}

output "encryption_type" {
  description = "Active X-Ray encryption type (KMS or NONE), if managed by this module."
  value       = var.manage_encryption_config ? aws_xray_encryption_config.this[0].type : null
}

output "encryption_key_id" {
  description = "KMS key used for X-Ray trace encryption, if managed and set to KMS."
  value       = var.manage_encryption_config ? aws_xray_encryption_config.this[0].key_id : null
}
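The validation blocks in variables.tf can be exercised without touching AWS. The following is a sketch of a test file for the native `terraform test` framework; the file path and rule names are hypothetical, and it assumes Terraform 1.7 or newer (for `mock_provider`), which is newer than the module's own `>= 1.5.0` constraint.

```hcl
# tests/validation.tftest.hcl (hypothetical path inside the module directory)

# Mock the AWS provider so no credentials or API calls are needed.
mock_provider "aws" {}

# Two rules sharing a priority should be rejected by the uniqueness validation.
run "rejects_duplicate_priorities" {
  command = plan

  variables {
    sampling_rules = [
      { rule_name = "rule-a", priority = 100, reservoir_size = 1, fixed_rate = 0.1 },
      { rule_name = "rule-b", priority = 100, reservoir_size = 1, fixed_rate = 0.1 },
    ]
  }

  # The run passes only if validation on var.sampling_rules fails.
  expect_failures = [var.sampling_rules]
}

# Notifications without Insights should be rejected by the groups validation.
run "rejects_notifications_without_insights" {
  command = plan

  variables {
    groups = [
      {
        group_name            = "example"
        filter_expression     = "fault = true"
        notifications_enabled = true
      },
    ]
  }

  expect_failures = [var.groups]
}
```

Running `terraform test` from the module directory would then report each `run` block as passing when the corresponding validation fires.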
How to use it
This example traces a payments platform. The top-priority rule captures 100% of payments-service traffic, so no error trace from that service is ever dropped (X-Ray decides whether to sample when a request starts, so a rule matches on service, method, and path rather than on the eventual response status). A separate rule samples POST /checkout traffic at 5%, and a low-priority catch-all keeps a thin 1% baseline across everything else. Two groups slice the Service Map by faults and by the payments service (with Insights on), and trace data is encrypted with a customer-managed KMS key. A downstream CloudWatch alarm consumes the module’s group ARN to page on a fault spike.
module "x_ray" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-xray?ref=v1.0.0"

  sampling_rules = [
    {
      rule_name      = "payments-errors-always"
      priority       = 100
      reservoir_size = 1
      fixed_rate     = 1.0 # capture 100% of matching traffic
      service_name   = "payments"
      http_method    = "*"
      url_path       = "*"
    },
    {
      rule_name      = "checkout-baseline"
      priority       = 200
      reservoir_size = 2
      fixed_rate     = 0.05 # 5% of healthy checkout traffic
      service_name   = "*"
      http_method    = "POST"
      url_path       = "/checkout"
    },
    {
      rule_name      = "catch-all"
      priority       = 9000
      reservoir_size = 1
      fixed_rate     = 0.01 # thin 1% baseline everywhere else
    },
  ]

  groups = [
    {
      group_name        = "faults-5xx"
      filter_expression = "fault = true OR http.status >= 500"
    },
    {
      group_name            = "payments-service"
      filter_expression     = "service(\"payments\")"
      insights_enabled      = true
      notifications_enabled = true
    },
  ]

  manage_encryption_config = true
  encryption_kms_key_id    = aws_kms_key.xray.arn

  tags = {
    Environment = "prod"
    Team        = "payments-platform"
  }
}

# Downstream: alarm on a fault spike for the "faults-5xx" group using its ARN.
resource "aws_cloudwatch_metric_alarm" "xray_faults" {
  alarm_name          = "xray-faults-5xx-spike"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 1
  period              = 300
  threshold           = 25
  statistic           = "Sum"
  namespace           = "AWS/X-Ray"
  metric_name         = "FaultRate"
  treat_missing_data  = "notBreaching"

  dimensions = {
    GroupARN = module.x_ray.group_arns["faults-5xx"]
  }

  alarm_actions = [aws_sns_topic.observability.arn]
}
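The example above references `aws_kms_key.xray` and `aws_sns_topic.observability` without defining them. A minimal sketch of what those supporting resources might look like follows; the descriptions, alias, and topic name are illustrative assumptions, and the key policy statements mentioned in the `encryption_kms_key_id` description are not shown here.

```hcl
# Hypothetical supporting resources for the usage example.

# Customer-managed key for trace encryption. aws_kms_key creates a symmetric
# key by default, which is what X-Ray encryption expects.
resource "aws_kms_key" "xray" {
  description         = "Customer-managed key for X-Ray trace encryption"
  enable_key_rotation = true
}

# Friendly alias so the key is easy to find in the console (name is illustrative).
resource "aws_kms_alias" "xray" {
  name          = "alias/xray-traces"
  target_key_id = aws_kms_key.xray.key_id
}

# Topic that receives the fault-spike alarm (name is illustrative).
resource "aws_sns_topic" "observability" {
  name = "observability-alerts"
}
```

Passing the key's ARN rather than the alias keeps the module input unambiguous, although the variable's validation also accepts a key UUID or an `alias/` name.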
With Terragrunt
Terragrunt keeps this module DRY across environments — define the backend and provider once in a root config, then a thin terragrunt.hcl per environment supplies only the inputs that differ.
1. Root config — live/terragrunt.hcl (inherited by every module):
remote_state {
  backend  = "s3"
  generate = { path = "backend.tf", if_exists = "overwrite" }
  config = {
    # ...s3 state bucket/container + key per path...
  }
}
2. Module config — live/prod/xray/terragrunt.hcl:
include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-xray?ref=v1.0.0"
}

inputs = {
  sampling_rules = [
    {
      rule_name      = "..."
      priority       = 9000
      reservoir_size = 1
      fixed_rate     = 0.01
    },
  ]

  groups = [
    {
      group_name        = "..."
      filter_expression = "..."
    },
  ]
}
3. Deploy one environment, or roll out all modules together:
cd live/prod/xray && terragrunt apply # this module
terragrunt run-all apply # run from live/prod to apply every module under it
Why Terragrunt here: the backend and provider live in one place instead of being copy-pasted into every module; inputs is overridden per environment (dev / stage / prod) without forking the module; and run-all orchestrates dependencies across modules. Reach for it once you have more than one environment or more than a handful of modules — for a single stack, the plain Quickstart above is enough.
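As an illustration of per-environment overrides, a dev environment might reuse the same module source with a more generous sampling rate and no groups. This is a sketch only: the `live/dev/xray` path, the rule name, and the chosen rate are assumptions for the example.

```hcl
# live/dev/xray/terragrunt.hcl (hypothetical path)
include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-xray?ref=v1.0.0"
}

inputs = {
  # Dev traffic is low, so sample heavily to make debugging easier.
  sampling_rules = [
    {
      rule_name      = "dev-catch-all" # illustrative name
      priority       = 9000
      reservoir_size = 1
      fixed_rate     = 0.5
    },
  ]

  tags = {
    Environment = "dev"
  }
}
```

Only the inputs differ from the prod file; the backend and module source stay identical, which is the point of the layout.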
Inputs
| Name | Type | Default | Required | Description |
|---|---|---|---|---|
| `sampling_rules` | `list(object)` | `[]` | No | Priority-ordered sampling rules (`reservoir_size`, `fixed_rate`, and match dimensions). Validated for unique names/priorities and rate 0.0–1.0. |
| `groups` | `list(object)` | `[]` | No | X-Ray groups: a `filter_expression` plus optional Insights/notifications toggles. |
| `manage_encryption_config` | `bool` | `false` | No | Manage the account+region encryption singleton from this config. Enable in only one place per account/region. |
| `encryption_kms_key_id` | `string` | `null` | No | KMS key ARN/ID/alias for trace encryption; `null` uses the AWS-owned key (type `NONE`). |
| `tags` | `map(string)` | `{}` | No | Tags applied to sampling rules and groups. |
Per-rule fields (sampling_rules[*])
| Name | Type | Default | Required | Description |
|---|---|---|---|---|
| `rule_name` | `string` | n/a | Yes | Unique name of the sampling rule. |
| `priority` | `number` | n/a | Yes | Evaluation order, 1–9999; lower runs first. Must be unique. |
| `reservoir_size` | `number` | n/a | Yes | Guaranteed traces/sec captured before `fixed_rate` applies. |
| `fixed_rate` | `number` | n/a | Yes | Sampling rate (0.0–1.0) for matching requests beyond the reservoir. |
| `service_name` | `string` | `"*"` | No | Match on the instrumented service name. |
| `service_type` | `string` | `"*"` | No | Match on origin (e.g. `AWS::Lambda::Function`). |
| `host` | `string` | `"*"` | No | Match on the Host header. |
| `http_method` | `string` | `"*"` | No | Match on HTTP method. |
| `url_path` | `string` | `"*"` | No | Match on request path. |
| `resource_arn` | `string` | `"*"` | No | Match on the ARN of the AWS resource the rule applies to. |
| `attributes` | `map(string)` | `null` | No | Match on custom trace attributes (segment annotations). |
Per-group fields (groups[*])
| Name | Type | Default | Required | Description |
|---|---|---|---|---|
| `group_name` | `string` | n/a | Yes | Unique name of the X-Ray group. |
| `filter_expression` | `string` | n/a | Yes | X-Ray filter expression scoping the group (e.g. `http.status >= 500`). |
| `insights_enabled` | `bool` | `false` | No | Enable X-Ray Insights anomaly detection for the group. |
| `notifications_enabled` | `bool` | `false` | No | Send Insights notifications (requires `insights_enabled = true`). |
Outputs
| Name | Description |
|---|---|
| `sampling_rule_arns` | Map of rule_name to sampling rule ARN. |
| `sampling_rule_names` | List of sampling rule names managed by the module. |
| `group_arns` | Map of group_name to X-Ray group ARN. |
| `group_names` | List of X-Ray group names managed by the module. |
| `encryption_type` | Active encryption type (KMS or NONE), if managed here. |
| `encryption_key_id` | KMS key used for trace encryption, if managed and set to KMS. |
Enterprise scenario
A retail company runs ~40 microservices behind an API Gateway, all instrumented with the ADOT collector. The platform team deploys this module once per region from a shared observability Terraform stack: a sampling matrix captures 100% of any request with a fault, 10% of checkout and payment traffic, and a 2% catch-all everywhere else, keeping the X-Ray bill predictable during Black Friday while never dropping an error trace. Named groups (faults-5xx, checkout-funnel, per-domain service groups) drive team-specific Service Map dashboards and Insights, and manage_encryption_config = true with a customer-managed KMS key satisfies the PCI requirement that trace payloads — which can carry request metadata — are encrypted with a key the company controls and can rotate.
Best practices
- Always define a low-priority catch-all rule. Sampling rules are evaluated in ascending priority order (lowest number first) and the first match wins; if no custom rule matches, X-Ray falls back to the built-in `Default` rule (1/sec + 5%). A `priority = 9000` catch-all with a small `fixed_rate` makes the floor explicit and version-controlled instead of relying on an invisible default.
- Reserve the reservoir for low-traffic services, lean on `fixed_rate` for high-traffic ones. `reservoir_size` guarantees N traces/sec before the percentage kicks in, which is ideal so a quiet service still produces some traces. On a high-RPS API, the reservoir is a rounding error and `fixed_rate` is what actually governs both visibility and cost, so tune it deliberately rather than leaving the 5% default.
- Trace 100% of errors, sample the happy path. Put a top-priority rule (`fixed_rate = 1.0`) on your most critical service or endpoints, and keep the broad baseline low. Sampling rules match on request attributes rather than response status, so full coverage of the critical path is what guarantees its error traces are kept, without paying to record millions of identical healthy requests elsewhere.
- Treat encryption config as a singleton with one owner. `aws_xray_encryption_config` is account+region global, so two Terraform states managing it will ping-pong the setting on every apply. Set `manage_encryption_config = true` in exactly one stack per account/region, and scope the KMS key policy to grant the X-Ray service principal only `kms:GenerateDataKey*` and `kms:Decrypt`.
- Write filter expressions that match how teams triage, and name groups accordingly. A group named `faults-5xx` with `fault = true OR http.status >= 500` is immediately greppable and reusable as a CloudWatch dimension; vague names like `group1` and overly broad expressions produce noisy, low-signal Service Maps and Insights.
- Keep sampling and groups in code, not the console. Because both are easy to flip by hand, drift is the norm without IaC. Managing them through this one module, with unique-name/priority validations catching mistakes at `plan` time, keeps tracing behaviour consistent across services and reviewable in pull requests.
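The single-owner rule for encryption can be sketched as two module calls living in separate Terraform configurations. The module labels, the `xray_kms_key_arn` variable, and the rule and group values are hypothetical names chosen for illustration.

```hcl
# --- Platform stack (one per account/region): owns the encryption singleton ---
variable "xray_kms_key_arn" {
  description = "ARN of the customer-managed key for X-Ray (hypothetical input)."
  type        = string
}

module "xray_platform" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-xray?ref=v1.0.0"

  manage_encryption_config = true
  encryption_kms_key_id    = var.xray_kms_key_arn

  # Explicit floor so no request relies on the invisible Default rule.
  sampling_rules = [
    { rule_name = "catch-all", priority = 9000, reservoir_size = 1, fixed_rate = 0.01 },
  ]
}

# --- Service stack (separate state): rules and groups only ---
module "xray_payments" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-xray?ref=v1.0.0"

  # manage_encryption_config defaults to false, so this stack never
  # touches the account-wide encryption setting.
  sampling_rules = [
    {
      rule_name      = "payments-all"
      priority       = 100
      reservoir_size = 1
      fixed_rate     = 1.0
      service_name   = "payments"
    },
  ]

  groups = [
    { group_name = "payments-service", filter_expression = "service(\"payments\")" },
  ]
}
```

Because the module's uniqueness validations only see the rules passed to a single call, priorities and rule names still need to be coordinated across stacks by convention or review.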