Terraform Module: AWS Athena — governed, cost-capped serverless SQL workgroups

Quick take — Provision AWS Athena workgroups with Terraform: enforced S3 result locations, KMS-encrypted output, per-query data-scan limits, CloudWatch metrics, and Athena engine v3 — a reusable hashicorp/aws ~> 5.0 module. New here? Jump to the Quickstart below to deploy it in minutes; read on for how it works and when to reach for it.

Quickstart (copy-paste)

Minimal, runnable configuration — drop this in a .tf file and fill in the "..." placeholders (each required input is commented):

provider "aws" {
  region = "us-east-1"
}

module "athena" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-athena?ref=v1.0.0"

  name = "..."  # Workgroup name; also derives the managed results bucket…
}

Then run terraform init && terraform apply. Every other input has a sensible default; see Inputs below to override behaviour.

What this module is

Amazon Athena is a serverless, interactive query service that runs ANSI SQL directly over data sitting in Amazon S3 (and, via connectors, other sources) without you provisioning a single cluster. You pay per terabyte of data scanned, which makes Athena cheap to start with and surprisingly easy to turn into a runaway bill — one analyst running SELECT * against an un-partitioned, uncompressed table can scan terabytes in a single click.

The control plane for all of this is the workgroup. A workgroup is where you pin the Athena engine version, force every query to write results to a specific encrypted S3 location, set a hard ceiling on bytes scanned per query, publish CloudWatch metrics, and decide whether client-side settings can override your server-side configuration. Get the workgroup right and Athena is governed and predictable; leave it as primary with defaults and you have no guardrails at all.

This module wraps aws_athena_workgroup (with an optional managed results bucket, named queries, and a data catalog) so that every team gets a workgroup that is encrypted, cost-capped, metric-emitting, and hard to misconfigure silently, instead of each project hand-rolling its own and forgetting the bytes_scanned_cutoff_per_query limit.

When to use it

You want per-team or per-environment workgroups (e.g. analytics-prod, data-eng-dev) so query costs, results, and access can be attributed and isolated rather than everyone sharing primary.
You need a hard cost guardrail: a maximum bytes-scanned-per-query so a bad query fails fast instead of scanning your entire data lake.
You require encrypted query results in a controlled S3 bucket (SSE-KMS or SSE-S3) and want enforce_workgroup_configuration to stop clients from overriding that location and encryption.
You are standardizing on Athena engine version 3 (Trino-based) and want that pinned in code rather than left to the account default.
You want CloudWatch metrics on query volume and data scanned for dashboards, budgets, and alerting.
You do not need this for fully ad-hoc, one-off console exploration where no governance matters — there the default primary workgroup is fine.
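
As a sketch of the per-team pattern described above, the module can be instantiated once per workgroup with for_each. The workgroup names, team labels, and byte caps below are illustrative assumptions, not values prescribed by the module.

```hcl
# Hypothetical per-team settings; names and caps are examples only.
locals {
  workgroups = {
    "analytics-prod" = { team = "bi", environment = "prod", scan_cap_bytes = 536870912000 }      # 500 GiB
    "data-eng-dev"   = { team = "data-eng", environment = "dev", scan_cap_bytes = 53687091200 }  # 50 GiB
  }
}

module "athena" {
  source   = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-athena?ref=v1.0.0"
  for_each = local.workgroups

  # One workgroup (and, by default, one managed results bucket) per map key.
  name                           = each.key
  bytes_scanned_cutoff_per_query = each.value.scan_cap_bytes

  tags = {
    Team        = each.value.team
    Environment = each.value.environment
  }
}

# Map of workgroup name => ARN, handy for scoping per-team IAM policies.
output "workgroup_arns" {
  value = { for name, wg in module.athena : name => wg.workgroup_arn }
}
```

Keying the map by workgroup name keeps each instance's state address stable (module.athena["analytics-prod"]), so adding or removing a team does not disturb the others.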

Module structure

terraform-module-aws-athena/
├── versions.tf
├── main.tf
├── variables.tf
└── outputs.tf

versions.tf

terraform {
  required_version = ">= 1.5.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

main.tf

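locals {
  # Name of the managed results bucket (null when the module does not create
  # one). S3 bucket names must be lowercase and 3-63 characters long, so keep
  # var.name short and lowercase when create_results_bucket is true.
  managed_bucket_name = var.create_results_bucket ? "${var.name}-athena-results-${data.aws_caller_identity.current.account_id}" : null

  # Prefer the caller's explicit output location. Otherwise, if we are creating
  # a bucket, default results to s3://<bucket>/<prefix>/. If neither applies,
  # the value is null and the workgroup sets no result location. A conditional
  # is used rather than coalesce(), which errors when every argument is null.
  results_output_location = var.output_location != null ? var.output_location : (
    var.create_results_bucket ? "s3://${local.managed_bucket_name}/${trim(var.results_prefix, "/")}/" : null
  )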

  common_tags = merge(
    {
      "Module"    = "terraform-module-aws-athena"
      "Workgroup" = var.name
    },
    var.tags
  )
}

data "aws_caller_identity" "current" {}

# ---------------------------------------------------------------------------
# Optional: managed, encrypted S3 bucket for Athena query results
# ---------------------------------------------------------------------------
resource "aws_s3_bucket" "results" {
  count = var.create_results_bucket ? 1 : 0

  bucket        = local.managed_bucket_name
  force_destroy = var.results_bucket_force_destroy

  tags = local.common_tags
}

resource "aws_s3_bucket_public_access_block" "results" {
  count = var.create_results_bucket ? 1 : 0

  bucket = aws_s3_bucket.results[0].id

  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

resource "aws_s3_bucket_server_side_encryption_configuration" "results" {
  count = var.create_results_bucket ? 1 : 0

  bucket = aws_s3_bucket.results[0].id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm     = var.kms_key_arn == null ? "AES256" : "aws:kms"
      kms_master_key_id = var.kms_key_arn
    }
    bucket_key_enabled = var.kms_key_arn != null
  }
}

resource "aws_s3_bucket_lifecycle_configuration" "results" {
  count = var.create_results_bucket && var.results_expiration_days > 0 ? 1 : 0

  bucket = aws_s3_bucket.results[0].id

  rule {
    id     = "expire-query-results"
    status = "Enabled"

    filter {
      prefix = "${trim(var.results_prefix, "/")}/"
    }

    expiration {
      days = var.results_expiration_days
    }
  }
}

# ---------------------------------------------------------------------------
# Athena workgroup
# ---------------------------------------------------------------------------
resource "aws_athena_workgroup" "this" {
  name        = var.name
  description = var.description
  state       = var.state

  # When true, Terraform may delete the workgroup even if it still contains
  # named queries; when false (the default), such a delete fails.
  force_destroy = var.force_destroy

  configuration {
    enforce_workgroup_configuration    = var.enforce_workgroup_configuration
    publish_cloudwatch_metrics_enabled = var.publish_cloudwatch_metrics
    bytes_scanned_cutoff_per_query     = var.bytes_scanned_cutoff_per_query
    requester_pays_enabled             = var.requester_pays_enabled

    engine_version {
      selected_engine_version = var.selected_engine_version
    }

    result_configuration {
      output_location       = local.results_output_location
      expected_bucket_owner = var.expected_bucket_owner

      encryption_configuration {
        encryption_option = var.kms_key_arn == null ? "SSE_S3" : "SSE_KMS"
        kms_key           = var.kms_key_arn
      }

      dynamic "acl_configuration" {
        for_each = var.results_bucket_owner_full_control ? [1] : []
        content {
          s3_acl_option = "BUCKET_OWNER_FULL_CONTROL"
        }
      }
    }
  }

  tags = local.common_tags
}

# ---------------------------------------------------------------------------
# Optional: saved (named) queries bound to this workgroup
# ---------------------------------------------------------------------------
resource "aws_athena_named_query" "this" {
  for_each = var.named_queries

  name        = each.key
  description = each.value.description # null when the optional attribute is omitted
  database    = each.value.database
  query       = each.value.query
  workgroup   = aws_athena_workgroup.this.id
}

# ---------------------------------------------------------------------------
# Optional: a Glue-backed data catalog usable from this workgroup
# ---------------------------------------------------------------------------
resource "aws_athena_data_catalog" "this" {
  count = var.create_glue_data_catalog ? 1 : 0

  name        = "${var.name}-catalog"
  description = "Glue data catalog for the ${var.name} Athena workgroup"
  type        = "GLUE"

  parameters = {
    "catalog-id" = data.aws_caller_identity.current.account_id
  }

  tags = local.common_tags
}

variables.tf

variable "name" {
  description = "Name of the Athena workgroup. Used to derive the managed results bucket and catalog names."
  type        = string

  validation {
    condition     = can(regex("^[A-Za-z0-9._-]{1,128}$", var.name))
    error_message = "Workgroup name must be 1-128 chars and only contain letters, numbers, period, underscore, or hyphen."
  }
}

variable "description" {
  description = "Human-readable description of the workgroup."
  type        = string
  default     = "Managed Athena workgroup"
}

variable "state" {
  description = "Whether the workgroup is ENABLED or DISABLED. A DISABLED workgroup rejects new queries."
  type        = string
  default     = "ENABLED"

  validation {
    condition     = contains(["ENABLED", "DISABLED"], var.state)
    error_message = "state must be either ENABLED or DISABLED."
  }
}

variable "selected_engine_version" {
  description = "Athena engine version to pin. Use \"Athena engine version 3\" for the Trino-based engine, or \"AUTO\" to let AWS choose."
  type        = string
  default     = "Athena engine version 3"
}

variable "enforce_workgroup_configuration" {
  description = "If true, the workgroup's result location and encryption settings override any client-side settings. Strongly recommended."
  type        = bool
  default     = true
}

variable "publish_cloudwatch_metrics" {
  description = "Publish per-query CloudWatch metrics (data scanned, query state, run time) under the AWS/Athena namespace, with the workgroup name as the WorkGroup dimension."
  type        = bool
  default     = true
}

variable "bytes_scanned_cutoff_per_query" {
  description = "Hard upper bound on bytes a single query may scan before it is cancelled. Must be >= 10485760 (10 MiB) or null to disable. Defaults to 1 TiB."
  type        = number
  default     = 1099511627776 # 1 TiB

  validation {
    # A conditional is used because || does not short-circuit on older
    # Terraform releases, so comparing a null value with >= would error.
    condition     = var.bytes_scanned_cutoff_per_query == null ? true : var.bytes_scanned_cutoff_per_query >= 10485760
    error_message = "bytes_scanned_cutoff_per_query must be null or at least 10485760 (10 MiB)."
  }
}

variable "requester_pays_enabled" {
  description = "If true, queries against requester-pays S3 buckets are allowed; the account running the query, not the bucket owner, pays the data access charges."
  type        = bool
  default     = false
}

variable "force_destroy" {
  description = "Allow Terraform to destroy the workgroup even if it contains named queries. Use with care in production."
  type        = bool
  default     = false
}

# --- Query results location / encryption ----------------------------------

variable "output_location" {
  description = "Explicit S3 URI for query results, e.g. s3://my-bucket/athena/. If null and create_results_bucket is true, a managed location is used."
  type        = string
  default     = null

  validation {
    condition     = var.output_location == null || can(regex("^s3://", var.output_location))
    error_message = "output_location must be null or an s3:// URI."
  }
}

variable "kms_key_arn" {
  description = "KMS key ARN to encrypt query results (SSE_KMS). If null, SSE_S3 (AES256) is used instead."
  type        = string
  default     = null
}

variable "expected_bucket_owner" {
  description = "AWS account ID expected to own the results bucket, as a guard against bucket-sniping. Null to skip the check."
  type        = string
  default     = null
}

variable "results_bucket_owner_full_control" {
  description = "If true, write results with the BUCKET_OWNER_FULL_CONTROL canned ACL (useful for cross-account result buckets)."
  type        = bool
  default     = false
}

# --- Managed results bucket -----------------------------------------------

variable "create_results_bucket" {
  description = "Create and manage a dedicated, encrypted S3 bucket for query results."
  type        = bool
  default     = true
}

variable "results_prefix" {
  description = "Key prefix inside the managed results bucket where query output is written."
  type        = string
  default     = "query-results"
}

variable "results_expiration_days" {
  description = "Lifecycle expiration (in days) for objects under results_prefix in the managed bucket. Set 0 to disable expiration."
  type        = number
  default     = 30

  validation {
    condition     = var.results_expiration_days >= 0
    error_message = "results_expiration_days must be >= 0."
  }
}

variable "results_bucket_force_destroy" {
  description = "Allow Terraform to delete the managed results bucket even when it still contains objects."
  type        = bool
  default     = false
}

# --- Optional sub-resources -----------------------------------------------

variable "named_queries" {
  description = "Map of saved named queries to create, keyed by query name. Each value needs database and query; description is optional."
  type = map(object({
    database    = string
    query       = string
    description = optional(string)
  }))
  default = {}
}

variable "create_glue_data_catalog" {
  description = "Register a Glue-backed Athena data catalog (<name>-catalog) for this workgroup."
  type        = bool
  default     = false
}

variable "tags" {
  description = "Additional tags applied to all resources created by this module."
  type        = map(string)
  default     = {}
}
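
Because bytes_scanned_cutoff_per_query is expressed in raw bytes, it can help to derive the value from binary units instead of typing long literals. The following is a small sketch; the local names are hypothetical and not part of the module.

```hcl
locals {
  mib = 1024 * 1024      # 1,048,576 bytes
  gib = 1024 * local.mib # 1,073,741,824 bytes
  tib = 1024 * local.gib # 1,099,511,627,776 bytes

  scan_cap_min     = 10 * local.mib  # 10485760, the smallest value the validation accepts
  scan_cap_default = 1 * local.tib   # 1099511627776, the module default
  scan_cap_prod    = 500 * local.gib # 536870912000
}
```

A caller could then pass bytes_scanned_cutoff_per_query = local.scan_cap_prod, which keeps the intended unit visible in code review.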

outputs.tf

output "workgroup_id" {
  description = "The ID (name) of the Athena workgroup."
  value       = aws_athena_workgroup.this.id
}

output "workgroup_name" {
  description = "The name of the Athena workgroup, to pass as WorkGroup in StartQueryExecution / the console."
  value       = aws_athena_workgroup.this.name
}

output "workgroup_arn" {
  description = "The ARN of the Athena workgroup, for use as the resource in IAM policy statements."
  value       = aws_athena_workgroup.this.arn
}

output "output_location" {
  description = "Effective S3 URI where query results are written."
  value       = local.results_output_location
}

output "results_bucket_name" {
  description = "Name of the managed results bucket, or null if one was not created."
  value       = try(aws_s3_bucket.results[0].bucket, null)
}

output "results_bucket_arn" {
  description = "ARN of the managed results bucket, or null if one was not created."
  value       = try(aws_s3_bucket.results[0].arn, null)
}

output "named_query_ids" {
  description = "Map of named-query name to its Athena named-query ID."
  value       = { for k, q in aws_athena_named_query.this : k => q.id }
}

output "data_catalog_name" {
  description = "Name of the Glue data catalog created for this workgroup, or null if none."
  value       = try(aws_athena_data_catalog.this[0].name, null)
}
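
As one way of consuming these outputs, the sketch below builds an IAM policy document that lets query principals read and write objects in the managed results bucket. It assumes a module instance named athena with create_results_bucket = true (otherwise results_bucket_arn is null); the exact set of S3 actions a given client needs may differ.

```hcl
# Assumes module "athena" was created with create_results_bucket = true.
data "aws_iam_policy_document" "athena_results_access" {
  statement {
    sid       = "LocateResultsBucket"
    effect    = "Allow"
    actions   = ["s3:GetBucketLocation", "s3:ListBucket"]
    resources = [module.athena.results_bucket_arn]
  }

  statement {
    sid       = "ReadWriteQueryResults"
    effect    = "Allow"
    actions   = ["s3:GetObject", "s3:PutObject", "s3:AbortMultipartUpload"]
    resources = ["${module.athena.results_bucket_arn}/*"]
  }
}
```

If results are encrypted with a customer-managed key, the same principals would also need permission to use that key.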

How to use it

module "athena" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-athena?ref=v1.0.0"

  name        = "analytics-prod"
  description = "Production analytics workgroup for the BI team"

  # Pin the modern engine and enforce server-side config on every query.
  selected_engine_version         = "Athena engine version 3"
  enforce_workgroup_configuration = true
  publish_cloudwatch_metrics      = true

  # Hard cost guardrail: cancel any single query that tries to scan > 500 GiB.
  bytes_scanned_cutoff_per_query = 536870912000 # 500 GiB

  # Managed, KMS-encrypted results bucket with a 14-day retention.
  create_results_bucket   = true
  kms_key_arn             = aws_kms_key.athena.arn
  results_expiration_days = 14

  named_queries = {
    daily_active_users = {
      database    = "analytics"
      description = "DAU rollup used by the executive dashboard"
      query       = "SELECT event_date, COUNT(DISTINCT user_id) AS dau FROM analytics.events GROUP BY event_date ORDER BY event_date;"
    }
  }

  tags = {
    Team        = "bi"
    Environment = "prod"
    CostCenter  = "data-platform"
  }
}

# Downstream: grant a BI role permission to run queries only in this workgroup,
# by scoping the statement's resources to the module's ARN output.
data "aws_iam_policy_document" "bi_athena_access" {
  statement {
    sid    = "RunQueriesInWorkgroup"
    effect = "Allow"
    actions = [
      "athena:StartQueryExecution",
      "athena:GetQueryExecution",
      "athena:GetQueryResults",
      "athena:StopQueryExecution",
    ]
    resources = [module.athena.workgroup_arn]
  }
}

resource "aws_iam_role_policy" "bi_athena" {
  name   = "bi-athena-access"
  role   = aws_iam_role.bi_analyst.id
  policy = data.aws_iam_policy_document.bi_athena_access.json
}
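
The example above references aws_kms_key.athena and aws_iam_role.bi_analyst without defining them. A minimal sketch of those supporting resources follows; the role name, the trust policy (same-account principals), and the key settings are assumptions to adapt to your own identity setup.

```hcl
data "aws_caller_identity" "current" {}

# Customer-managed key for encrypting query results (settings are illustrative).
resource "aws_kms_key" "athena" {
  description             = "Encrypts Athena query results for analytics-prod"
  enable_key_rotation     = true
  deletion_window_in_days = 30
}

# Hypothetical trust policy: lets principals in this account assume the role.
data "aws_iam_policy_document" "bi_assume_role" {
  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRole"]

    principals {
      type        = "AWS"
      identifiers = ["arn:aws:iam::${data.aws_caller_identity.current.account_id}:root"]
    }
  }
}

resource "aws_iam_role" "bi_analyst" {
  name               = "bi-analyst"
  assume_role_policy = data.aws_iam_policy_document.bi_assume_role.json
}
```

In practice the role also needs access to the underlying data, the Glue catalog, the results location, and the KMS key before its queries will succeed.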

With Terragrunt

Terragrunt keeps this module DRY across environments — define the backend and provider once in a root config, then a thin terragrunt.hcl per environment supplies only the inputs that differ.

1. Root config — live/terragrunt.hcl (inherited by every module):

remote_state {
  backend = "s3"
  generate = { path = "backend.tf", if_exists = "overwrite" }
  config = {
    # ...S3 state bucket, region, and a state key derived from each module's path...
  }
}

2. Module config — live/prod/athena/terragrunt.hcl:

include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-athena?ref=v1.0.0"
}

inputs = {
  name = "..."
}

3. Deploy one environment, or roll out all modules together:

(cd live/prod/athena && terragrunt apply)      # this module only
(cd live/prod && terragrunt run-all apply)     # every module under live/prod

Why Terragrunt here: the backend and provider live in one place instead of being copy-pasted into every module; inputs is overridden per environment (dev / stage / prod) without forking the module; and run-all orchestrates dependencies across modules. Reach for it once you have more than one environment or more than a handful of modules — for a single stack, the plain Quickstart above is enough.
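
The root config shown in step 1 only declares remote state. To keep the provider in one place as well, a generate block can sit next to it in live/terragrunt.hcl. This is a sketch; the region is an assumption carried over from the Quickstart.

```hcl
# live/terragrunt.hcl (root), alongside the remote_state block.
# Writes provider.tf into every module's working directory at run time.
generate "provider" {
  path      = "provider.tf"
  if_exists = "overwrite_terragrunt"
  contents  = <<EOF
provider "aws" {
  region = "us-east-1"
}
EOF
}
```

With this in place, the per-environment terragrunt.hcl files stay limited to the source and inputs blocks.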

Inputs

Name	Type	Default	Required	Description
`name`	`string`	—	Yes	Workgroup name; also derives the managed results bucket and catalog names.
`description`	`string`	`"Managed Athena workgroup"`	No	Human-readable workgroup description.
`state`	`string`	`"ENABLED"`	No	`ENABLED` or `DISABLED`; a disabled workgroup rejects new queries.
`selected_engine_version`	`string`	`"Athena engine version 3"`	No	Pinned Athena engine version (or `AUTO`).
`enforce_workgroup_configuration`	`bool`	`true`	No	Force server-side settings to override client-side settings.
`publish_cloudwatch_metrics`	`bool`	`true`	No	Publish per-query CloudWatch metrics.
`bytes_scanned_cutoff_per_query`	`number`	`1099511627776`	No	Max bytes scanned per query before cancellation (`null` disables; min 10 MiB; default is 1 TiB).
`requester_pays_enabled`	`bool`	`false`	No	Allow queries against requester-pays S3 buckets.
`force_destroy`	`bool`	`false`	No	Allow destroying the workgroup even with named queries present.
`output_location`	`string`	`null`	No	Explicit `s3://` results URI; if null and `create_results_bucket` is true, a managed location is used.
`kms_key_arn`	`string`	`null`	No	KMS key ARN for `SSE_KMS` result encryption; null falls back to `SSE_S3`.
`expected_bucket_owner`	`string`	`null`	No	Expected account ID owning the results bucket (anti-sniping guard).
`results_bucket_owner_full_control`	`bool`	`false`	No	Write results with the `BUCKET_OWNER_FULL_CONTROL` ACL.
`create_results_bucket`	`bool`	`true`	No	Create a dedicated, encrypted results bucket.
`results_prefix`	`string`	`"query-results"`	No	Key prefix for results inside the managed bucket.
`results_expiration_days`	`number`	`30`	No	Lifecycle expiration (days) for results objects; `0` disables.
`results_bucket_force_destroy`	`bool`	`false`	No	Allow deleting the managed bucket even when non-empty.
`named_queries`	`map(object)`	`{}`	No	Saved named queries (`database`, `query`, optional `description`) keyed by name.
`create_glue_data_catalog`	`bool`	`false`	No	Register a Glue-backed Athena data catalog for the workgroup.
`tags`	`map(string)`	`{}`	No	Additional tags applied to all created resources.
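
The inputs above also cover the case where results go to an existing, centrally owned bucket instead of a managed one. In this sketch the bucket name, account ID, and KMS key ARN are placeholders, not real resources.

```hcl
module "athena_marketing" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-athena?ref=v1.0.0"

  name = "marketing-prod"

  # Skip the managed bucket and point at an existing one (placeholder values).
  create_results_bucket = false
  output_location       = "s3://example-central-athena-results/marketing-prod/"
  expected_bucket_owner = "111122223333"

  # Useful when the bucket lives in a different account than the query runner.
  results_bucket_owner_full_control = true

  # Placeholder ARN of a key the central account has shared with this one.
  kms_key_arn = "arn:aws:kms:us-east-1:111122223333:key/00000000-0000-0000-0000-000000000000"
}
```

In this mode the results_bucket_name and results_bucket_arn outputs are null, and lifecycle expiration on the external bucket is the bucket owner's responsibility.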

Outputs

Name	Description
`workgroup_id`	The ID (name) of the Athena workgroup.
`workgroup_name`	The workgroup name to pass as `WorkGroup` in StartQueryExecution / the console.
`workgroup_arn`	The workgroup ARN, for scoping the resources of IAM policy statements.
`output_location`	Effective S3 URI where query results are written.
`results_bucket_name`	Name of the managed results bucket, or `null` if not created.
`results_bucket_arn`	ARN of the managed results bucket, or `null` if not created.
`named_query_ids`	Map of named-query name to its Athena named-query ID.
`data_catalog_name`	Name of the Glue data catalog created, or `null` if none.

Enterprise scenario

A retail data platform team runs a central data lake in S3 and serves dozens of analysts across finance, marketing, and supply-chain. They instantiate this module once per business domain (finance-prod, marketing-prod, supplychain-prod), each with its own KMS-encrypted results bucket, a 500 GiB bytes_scanned_cutoff_per_query to stop accidental full-lake scans, and enforce_workgroup_configuration = true so analysts cannot redirect results to ungoverned buckets. CloudWatch metrics from each workgroup feed a per-domain cost dashboard and a budget alarm, so finance can charge back Athena spend accurately and the platform team catches a misbehaving query before it scans terabytes.
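
A per-domain alarm of the kind described in this scenario might look like the sketch below. It assumes a module instance named athena_finance, a hypothetical SNS topic, and an arbitrary 5 TiB hourly threshold. The ProcessedBytes metric and the WorkGroup/QueryState/QueryType dimensions reflect the AWS/Athena namespace as commonly documented; confirm them against the metrics your account actually emits before relying on this.

```hcl
# Hypothetical notification target for cost alerts.
resource "aws_sns_topic" "athena_cost_alerts" {
  name = "athena-cost-alerts"
}

resource "aws_cloudwatch_metric_alarm" "finance_scan_volume" {
  alarm_name          = "finance-prod-athena-bytes-scanned"
  alarm_description   = "Successful DML queries in finance-prod scanned more than 5 TiB in one hour"
  namespace           = "AWS/Athena"
  metric_name         = "ProcessedBytes"
  statistic           = "Sum"
  period              = 3600 # seconds
  evaluation_periods  = 1
  comparison_operator = "GreaterThanThreshold"
  threshold           = 5 * 1024 * 1024 * 1024 * 1024 # 5 TiB, illustrative
  treat_missing_data  = "notBreaching"

  # Assumption: metrics are published with this dimension combination.
  dimensions = {
    WorkGroup  = module.athena_finance.workgroup_name
    QueryState = "SUCCEEDED"
    QueryType  = "DML"
  }

  alarm_actions = [aws_sns_topic.athena_cost_alerts.arn]
}
```

The alarm only has data to evaluate when publish_cloudwatch_metrics is true for the workgroup.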

Best practices

Always set a bytes_scanned_cutoff_per_query. Athena bills per TB scanned, so a hard cap is your single most effective cost control — pair it with partitioned, columnar (Parquet/ORC) data so well-written queries scan far below the limit.
Keep enforce_workgroup_configuration = true. This is what guarantees every query lands in your encrypted result location; without it, a client SDK call can supply its own result location and encryption settings in place of the workgroup's.
Encrypt results with a customer-managed KMS key via kms_key_arn, restrict that key’s policy to the workgroup’s roles, and set results_expiration_days so transient result files do not accumulate cost and exposure.
Use one workgroup per team or environment, never shared primary. Separate workgroups give you per-team CloudWatch metrics, cost attribution, and IAM scoping (resources = [module.athena.workgroup_arn]) that a single shared workgroup cannot.
Pin the engine version (Athena engine version 3) rather than AUTO so query behavior and SQL semantics stay deterministic across AWS-driven upgrades, and roll versions intentionally in lower environments first.
Adopt a consistent naming convention (<domain>-<env>, e.g. finance-prod) and propagate Team, Environment, and CostCenter tags through var.tags so workgroups, results buckets, and catalogs are all attributable in Cost Explorer.
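
To back the "never shared primary" practice with a guardrail, one option is an explicit deny on starting queries in the primary workgroup, attached to analyst roles. This is a sketch under the module's hashicorp/aws ~> 5.0 pin; the policy name is hypothetical.

```hcl
data "aws_caller_identity" "current" {}
data "aws_region" "current" {}
data "aws_partition" "current" {}

# Deny query execution in the default "primary" workgroup of this account/region.
data "aws_iam_policy_document" "deny_primary_workgroup" {
  statement {
    sid     = "DenyQueriesInPrimary"
    effect  = "Deny"
    actions = ["athena:StartQueryExecution"]

    resources = [
      "arn:${data.aws_partition.current.partition}:athena:${data.aws_region.current.name}:${data.aws_caller_identity.current.account_id}:workgroup/primary"
    ]
  }
}

resource "aws_iam_policy" "deny_primary_workgroup" {
  name   = "deny-athena-primary-workgroup"
  policy = data.aws_iam_policy_document.deny_primary_workgroup.json
}
```

Attached alongside an allow statement scoped to module.athena.workgroup_arn, this leaves the governed workgroup as the only place those roles can run queries.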