Quick take — Provision AWS Athena workgroups with Terraform: enforced S3 result locations, KMS-encrypted output, per-query data-scan limits, CloudWatch metrics, and Athena engine v3 — a reusable hashicorp/aws ~> 5.0 module. New here? Jump to the Quickstart below to deploy it in minutes; read on for how it works and when to reach for it.
Quickstart (copy-paste)
Minimal, runnable configuration — drop this in a .tf file and fill in the "..." placeholders (each required input is commented):
provider "aws" {
region = "us-east-1"
}
module "athena" {
source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-athena?ref=v1.0.0"
name = "..." # Workgroup name; also derives the managed results bucket…
}
Then terraform init && terraform apply. Every other input has a sensible default — see Inputs below to override behaviour.
What this module is
Amazon Athena is a serverless, interactive query service that runs ANSI SQL directly over data sitting in Amazon S3 (and, via connectors, other sources) without you provisioning a single cluster. You pay per terabyte of data scanned, which makes Athena cheap to start with and surprisingly easy to turn into a runaway bill — one analyst running SELECT * against an un-partitioned, uncompressed table can scan terabytes in a single click.
The control plane for all of this is the workgroup. A workgroup is where you pin the Athena engine version, force every query to write results to a specific encrypted S3 location, set a hard ceiling on bytes scanned per query, publish CloudWatch metrics, and decide whether client-side settings can override your server-side configuration. Get the workgroup right and Athena is governed and predictable; leave it as primary with defaults and you have no guardrails at all.
This module wraps aws_athena_workgroup (with an optional managed results bucket, a named query, and a data catalog) so that every team gets a workgroup that is encrypted, cost-capped, metric-emitting, and impossible to silently misconfigure — instead of each project hand-rolling its own and forgetting the bytes_scanned_cutoff_per_query limit.
When to use it
- You want per-team or per-environment workgroups (e.g.
analytics-prod,data-eng-dev) so query costs, results, and access can be attributed and isolated rather than everyone sharingprimary. - You need a hard cost guardrail: a maximum bytes-scanned-per-query so a bad query fails fast instead of scanning your entire data lake.
- You require encrypted query results in a controlled S3 bucket (SSE-KMS or SSE-S3) and want to forbid clients from overriding that with
enforce_workgroup_configuration. - You are standardizing on Athena engine version 3 (Trino-based) and want that pinned in code rather than left to the account default.
- You want CloudWatch metrics on query volume and data scanned for dashboards, budgets, and alerting.
- You do not need this for fully ad-hoc, one-off console exploration where no governance matters — there the default
primaryworkgroup is fine.
Module structure
terraform-module-aws-athena/
├── versions.tf
├── main.tf
├── variables.tf
└── outputs.tf
versions.tf
terraform {
required_version = ">= 1.5.0"
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 5.0"
}
}
}
main.tf
locals {
# If the caller does not supply an output location, and we are creating a
# bucket, default results to s3://<bucket>/<prefix>/. Otherwise use the
# explicit value (which may be null to inherit account-level settings).
managed_bucket_name = var.create_results_bucket ? "${var.name}-athena-results-${data.aws_caller_identity.current.account_id}" : null
results_output_location = coalesce(
var.output_location,
var.create_results_bucket ? "s3://${local.managed_bucket_name}/${trim(var.results_prefix, "/")}/" : null
)
common_tags = merge(
{
"Module" = "terraform-module-aws-athena"
"Workgroup" = var.name
},
var.tags
)
}
data "aws_caller_identity" "current" {}
# ---------------------------------------------------------------------------
# Optional: managed, encrypted S3 bucket for Athena query results
# ---------------------------------------------------------------------------
resource "aws_s3_bucket" "results" {
count = var.create_results_bucket ? 1 : 0
bucket = local.managed_bucket_name
force_destroy = var.results_bucket_force_destroy
tags = local.common_tags
}
resource "aws_s3_bucket_public_access_block" "results" {
count = var.create_results_bucket ? 1 : 0
bucket = aws_s3_bucket.results[0].id
block_public_acls = true
block_public_policy = true
ignore_public_acls = true
restrict_public_buckets = true
}
resource "aws_s3_bucket_server_side_encryption_configuration" "results" {
count = var.create_results_bucket ? 1 : 0
bucket = aws_s3_bucket.results[0].id
rule {
apply_server_side_encryption_by_default {
sse_algorithm = var.kms_key_arn == null ? "AES256" : "aws:kms"
kms_master_key_id = var.kms_key_arn
}
bucket_key_enabled = var.kms_key_arn != null
}
}
resource "aws_s3_bucket_lifecycle_configuration" "results" {
count = var.create_results_bucket && var.results_expiration_days > 0 ? 1 : 0
bucket = aws_s3_bucket.results[0].id
rule {
id = "expire-query-results"
status = "Enabled"
filter {
prefix = "${trim(var.results_prefix, "/")}/"
}
expiration {
days = var.results_expiration_days
}
}
}
# ---------------------------------------------------------------------------
# Athena workgroup
# ---------------------------------------------------------------------------
resource "aws_athena_workgroup" "this" {
name = var.name
description = var.description
state = var.state
# Prevent Terraform from deleting a workgroup that still has named queries.
force_destroy = var.force_destroy
configuration {
enforce_workgroup_configuration = var.enforce_workgroup_configuration
publish_cloudwatch_metrics_enabled = var.publish_cloudwatch_metrics
bytes_scanned_cutoff_per_query = var.bytes_scanned_cutoff_per_query
requester_pays_enabled = var.requester_pays_enabled
engine_version {
selected_engine_version = var.selected_engine_version
}
result_configuration {
output_location = local.results_output_location
expected_bucket_owner = var.expected_bucket_owner
encryption_configuration {
encryption_option = var.kms_key_arn == null ? "SSE_S3" : "SSE_KMS"
kms_key = var.kms_key_arn
}
dynamic "acl_configuration" {
for_each = var.results_bucket_owner_full_control ? [1] : []
content {
s3_acl_option = "BUCKET_OWNER_FULL_CONTROL"
}
}
}
}
tags = local.common_tags
}
# ---------------------------------------------------------------------------
# Optional: a saved (named) query bound to this workgroup
# ---------------------------------------------------------------------------
resource "aws_athena_named_query" "this" {
for_each = var.named_queries
name = each.key
description = lookup(each.value, "description", null)
database = each.value.database
query = each.value.query
workgroup = aws_athena_workgroup.this.id
}
# ---------------------------------------------------------------------------
# Optional: a Glue-backed data catalog usable from this workgroup
# ---------------------------------------------------------------------------
resource "aws_athena_data_catalog" "this" {
count = var.create_glue_data_catalog ? 1 : 0
name = "${var.name}-catalog"
description = "Glue data catalog for the ${var.name} Athena workgroup"
type = "GLUE"
parameters = {
"catalog-id" = data.aws_caller_identity.current.account_id
}
tags = local.common_tags
}
variables.tf
variable "name" {
description = "Name of the Athena workgroup. Used to derive the managed results bucket and catalog names."
type = string
validation {
condition = can(regex("^[A-Za-z0-9._-]{1,128}$", var.name))
error_message = "Workgroup name must be 1-128 chars and only contain letters, numbers, period, underscore, or hyphen."
}
}
variable "description" {
description = "Human-readable description of the workgroup."
type = string
default = "Managed Athena workgroup"
}
variable "state" {
description = "Whether the workgroup is ENABLED or DISABLED. A DISABLED workgroup rejects new queries."
type = string
default = "ENABLED"
validation {
condition = contains(["ENABLED", "DISABLED"], var.state)
error_message = "state must be either ENABLED or DISABLED."
}
}
variable "selected_engine_version" {
description = "Athena engine version to pin. Use \"Athena engine version 3\" for the Trino-based engine, or \"AUTO\" to let AWS choose."
type = string
default = "Athena engine version 3"
}
variable "enforce_workgroup_configuration" {
description = "If true, workgroup settings (result location, encryption, scan limit) override any client-side settings. Strongly recommended."
type = bool
default = true
}
variable "publish_cloudwatch_metrics" {
description = "Publish per-query CloudWatch metrics (data scanned, query state, run time) under the AmazonAthena/<workgroup> namespace."
type = bool
default = true
}
variable "bytes_scanned_cutoff_per_query" {
description = "Hard upper bound on bytes a single query may scan before it is cancelled. Must be >= 10485760 (10 MB) or null to disable. Defaults to 1 TB."
type = number
default = 1099511627776 # 1 TiB
validation {
condition = var.bytes_scanned_cutoff_per_query == null || var.bytes_scanned_cutoff_per_query >= 10485760
error_message = "bytes_scanned_cutoff_per_query must be null or at least 10485760 (10 MB)."
}
}
variable "requester_pays_enabled" {
description = "If true, queries against requester-pays S3 buckets are allowed (the workgroup pays for requests)."
type = bool
default = false
}
variable "force_destroy" {
description = "Allow Terraform to destroy the workgroup even if it contains named queries. Use with care in production."
type = bool
default = false
}
# --- Query results location / encryption ----------------------------------
variable "output_location" {
description = "Explicit S3 URI for query results, e.g. s3://my-bucket/athena/. If null and create_results_bucket is true, a managed location is used."
type = string
default = null
validation {
condition = var.output_location == null || can(regex("^s3://", var.output_location))
error_message = "output_location must be null or an s3:// URI."
}
}
variable "kms_key_arn" {
description = "KMS key ARN to encrypt query results (SSE_KMS). If null, SSE_S3 (AES256) is used instead."
type = string
default = null
}
variable "expected_bucket_owner" {
description = "AWS account ID expected to own the results bucket, as a guard against bucket-sniping. Null to skip the check."
type = string
default = null
}
variable "results_bucket_owner_full_control" {
description = "If true, write results with the BUCKET_OWNER_FULL_CONTROL canned ACL (useful for cross-account result buckets)."
type = bool
default = false
}
# --- Managed results bucket -----------------------------------------------
variable "create_results_bucket" {
description = "Create and manage a dedicated, encrypted S3 bucket for query results."
type = bool
default = true
}
variable "results_prefix" {
description = "Key prefix inside the managed results bucket where query output is written."
type = string
default = "query-results"
}
variable "results_expiration_days" {
description = "Lifecycle expiration (in days) for objects under results_prefix in the managed bucket. Set 0 to disable expiration."
type = number
default = 30
validation {
condition = var.results_expiration_days >= 0
error_message = "results_expiration_days must be >= 0."
}
}
variable "results_bucket_force_destroy" {
description = "Allow Terraform to delete the managed results bucket even when it still contains objects."
type = bool
default = false
}
# --- Optional sub-resources -----------------------------------------------
variable "named_queries" {
description = "Map of saved named queries to create, keyed by query name. Each value needs database and query; description is optional."
type = map(object({
database = string
query = string
description = optional(string)
}))
default = {}
}
variable "create_glue_data_catalog" {
description = "Register a Glue-backed Athena data catalog (<name>-catalog) for this workgroup."
type = bool
default = false
}
variable "tags" {
description = "Additional tags applied to all resources created by this module."
type = map(string)
default = {}
}
outputs.tf
output "workgroup_id" {
description = "The ID (name) of the Athena workgroup."
value = aws_athena_workgroup.this.id
}
output "workgroup_name" {
description = "The name of the Athena workgroup, to pass as WorkGroup in StartQueryExecution / the console."
value = aws_athena_workgroup.this.name
}
output "workgroup_arn" {
description = "The ARN of the Athena workgroup, for use in IAM policy conditions."
value = aws_athena_workgroup.this.arn
}
output "output_location" {
description = "Effective S3 URI where query results are written."
value = local.results_output_location
}
output "results_bucket_name" {
description = "Name of the managed results bucket, or null if one was not created."
value = try(aws_s3_bucket.results[0].bucket, null)
}
output "results_bucket_arn" {
description = "ARN of the managed results bucket, or null if one was not created."
value = try(aws_s3_bucket.results[0].arn, null)
}
output "named_query_ids" {
description = "Map of named-query name to its Athena named-query ID."
value = { for k, q in aws_athena_named_query.this : k => q.id }
}
output "data_catalog_name" {
description = "Name of the Glue data catalog created for this workgroup, or null if none."
value = try(aws_athena_data_catalog.this[0].name, null)
}
How to use it
module "athena" {
source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-athena?ref=v1.0.0"
name = "analytics-prod"
description = "Production analytics workgroup for the BI team"
# Pin the modern engine and enforce server-side config on every query.
selected_engine_version = "Athena engine version 3"
enforce_workgroup_configuration = true
publish_cloudwatch_metrics = true
# Hard cost guardrail: cancel any single query that tries to scan > 500 GB.
bytes_scanned_cutoff_per_query = 536870912000 # 500 GiB
# Managed, KMS-encrypted results bucket with a 14-day retention.
create_results_bucket = true
kms_key_arn = aws_kms_key.athena.arn
results_expiration_days = 14
named_queries = {
daily_active_users = {
database = "analytics"
description = "DAU rollup used by the executive dashboard"
query = "SELECT event_date, COUNT(DISTINCT user_id) AS dau FROM analytics.events GROUP BY event_date ORDER BY event_date;"
}
}
tags = {
Team = "bi"
Environment = "prod"
CostCenter = "data-platform"
}
}
# Downstream: grant a BI role permission to run queries only in this workgroup,
# using the module's ARN output in the IAM policy condition.
data "aws_iam_policy_document" "bi_athena_access" {
statement {
sid = "RunQueriesInWorkgroup"
effect = "Allow"
actions = [
"athena:StartQueryExecution",
"athena:GetQueryExecution",
"athena:GetQueryResults",
"athena:StopQueryExecution",
]
resources = [module.athena.workgroup_arn]
}
}
resource "aws_iam_role_policy" "bi_athena" {
name = "bi-athena-access"
role = aws_iam_role.bi_analyst.id
policy = data.aws_iam_policy_document.bi_athena_access.json
}
With Terragrunt
Terragrunt keeps this module DRY across environments — define the backend and provider once in a root config, then a thin terragrunt.hcl per environment supplies only the inputs that differ.
1. Root config — live/terragrunt.hcl (inherited by every module):
remote_state {
backend = "s3"
generate = { path = "backend.tf", if_exists = "overwrite" }
config = {
# ...s3 state bucket/container + key per path...
}
}
2. Module config — live/prod/athena/terragrunt.hcl:
include "root" {
path = find_in_parent_folders()
}
terraform {
source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-athena?ref=v1.0.0"
}
inputs = {
name = "..."
}
3. Deploy one environment, or roll out all modules together:
cd live/prod/athena && terragrunt apply # this module
terragrunt run-all apply # every module under live/prod
Why Terragrunt here: the backend and provider live in one place instead of being copy-pasted into every module; inputs is overridden per environment (dev / stage / prod) without forking the module; and run-all orchestrates dependencies across modules. Reach for it once you have more than one environment or more than a handful of modules — for a single stack, the plain Quickstart above is enough.
Inputs
| Name | Type | Default | Required | Description |
|---|---|---|---|---|
name |
string |
— | Yes | Workgroup name; also derives the managed results bucket and catalog names. |
description |
string |
"Managed Athena workgroup" |
No | Human-readable workgroup description. |
state |
string |
"ENABLED" |
No | ENABLED or DISABLED; a disabled workgroup rejects new queries. |
selected_engine_version |
string |
"Athena engine version 3" |
No | Pinned Athena engine version (or AUTO). |
enforce_workgroup_configuration |
bool |
true |
No | Force server-side settings to override client-side settings. |
publish_cloudwatch_metrics |
bool |
true |
No | Publish per-query CloudWatch metrics. |
bytes_scanned_cutoff_per_query |
number |
1099511627776 |
No | Max bytes scanned per query before cancellation (null disables; min 10 MB). |
requester_pays_enabled |
bool |
false |
No | Allow queries against requester-pays S3 buckets. |
force_destroy |
bool |
false |
No | Allow destroying the workgroup even with named queries present. |
output_location |
string |
null |
No | Explicit s3:// results URI; if null, a managed location is used. |
kms_key_arn |
string |
null |
No | KMS key ARN for SSE_KMS result encryption; null falls back to SSE_S3. |
expected_bucket_owner |
string |
null |
No | Expected account ID owning the results bucket (anti-sniping guard). |
results_bucket_owner_full_control |
bool |
false |
No | Write results with the BUCKET_OWNER_FULL_CONTROL ACL. |
create_results_bucket |
bool |
true |
No | Create a dedicated, encrypted results bucket. |
results_prefix |
string |
"query-results" |
No | Key prefix for results inside the managed bucket. |
results_expiration_days |
number |
30 |
No | Lifecycle expiration (days) for results objects; 0 disables. |
results_bucket_force_destroy |
bool |
false |
No | Allow deleting the managed bucket even when non-empty. |
named_queries |
map(object) |
{} |
No | Saved named queries (database, query, optional description) keyed by name. |
create_glue_data_catalog |
bool |
false |
No | Register a Glue-backed Athena data catalog for the workgroup. |
tags |
map(string) |
{} |
No | Additional tags applied to all created resources. |
Outputs
| Name | Description |
|---|---|
workgroup_id |
The ID (name) of the Athena workgroup. |
workgroup_name |
The workgroup name to pass as WorkGroup in StartQueryExecution / the console. |
workgroup_arn |
The workgroup ARN, for IAM policy conditions. |
output_location |
Effective S3 URI where query results are written. |
results_bucket_name |
Name of the managed results bucket, or null if not created. |
results_bucket_arn |
ARN of the managed results bucket, or null if not created. |
named_query_ids |
Map of named-query name to its Athena named-query ID. |
data_catalog_name |
Name of the Glue data catalog created, or null if none. |
Enterprise scenario
A retail data platform team runs a central data lake in S3 and serves dozens of analysts across finance, marketing, and supply-chain. They instantiate this module once per business domain (finance-prod, marketing-prod, supplychain-prod), each with its own KMS-encrypted results bucket, a 500 GiB bytes_scanned_cutoff_per_query to stop accidental full-lake scans, and enforce_workgroup_configuration = true so analysts cannot redirect results to ungoverned buckets. CloudWatch metrics from each workgroup feed a per-domain cost dashboard and a budget alarm, so finance can chargeback Athena spend accurately and the platform team catches a misbehaving query before it scans terabytes.
Best practices
- Always set a
bytes_scanned_cutoff_per_query. Athena bills per TB scanned, so a hard cap is your single most effective cost control — pair it with partitioned, columnar (Parquet/ORC) data so well-written queries scan far below the limit. - Keep
enforce_workgroup_configuration = true. This is what guarantees every query lands in your encrypted result location and respects the scan cap; without it, a client SDK call can override your server-side settings entirely. - Encrypt results with a customer-managed KMS key via
kms_key_arn, restrict that key’s policy to the workgroup’s roles, and setresults_expiration_daysso transient result files do not accumulate cost and exposure. - Use one workgroup per team or environment, never shared
primary. Separate workgroups give you per-team CloudWatch metrics, cost attribution, and IAM scoping (resources = [module.athena.workgroup_arn]) that a single shared workgroup cannot. - Pin the engine version (
Athena engine version 3) rather thanAUTOso query behavior and SQL semantics stay deterministic across AWS-driven upgrades, and roll versions intentionally in lower environments first. - Adopt a consistent naming convention (
<domain>-<env>, e.g.finance-prod) and propagateTeam,Environment, andCostCentertags throughvar.tagsso workgroups, results buckets, and catalogs are all attributable in Cost Explorer.