Quick take — A reusable Terraform module that registers S3 locations with AWS Lake Formation and grants fine-grained, auditable database, table, and column permissions to IAM principals — without brittle bucket policies. New here? Jump to the Quickstart below to deploy it in minutes; read on for how it works and when to reach for it.
Quickstart (copy-paste)
Minimal, runnable configuration — drop this in a .tf file and fill in the "..." placeholders (each required input is commented):
provider "aws" {
region = "us-east-1"
}
module "lake_formation" {
source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-lake-formation?ref=v1.0.0"
s3_resource_arn = "..." # ARN of the S3 bucket/prefix to register with Lake Forma…
database_name = "..." # Glue/Lake Formation database the grants target.
}
Then terraform init && terraform apply. Every other input has a sensible default — see Inputs below to override behaviour.
What this module is
AWS Lake Formation is a governance layer that sits on top of an S3-backed data lake and the AWS Glue Data Catalog. Instead of hand-crafting S3 bucket policies and IAM statements for every analyst, ETL job, and BI tool, you register the underlying S3 location with Lake Formation once, then grant database-, table-, and column-level permissions to IAM principals. Athena, Redshift Spectrum, EMR, and Glue all honour those grants, and every access decision is centralized and auditable.
Two resources do the heavy lifting, and they are easy to get subtly wrong:
- aws_lakeformation_resource: registers an S3 path with Lake Formation. You choose whether Lake Formation uses a service-linked role or your own role_arn to vend temporary credentials for that path. Register the wrong prefix, or forget the trailing slash semantics, and downstream grants silently resolve to nothing.
- aws_lakeformation_permissions: the actual grant. It is a tri-state resource: you target exactly one of a database, a table, or a table_with_columns block, and you pass permissions plus optional permissions_with_grant_option. Mixing the wrong principal, or granting SELECT on a database (where the valid verbs are ALTER, CREATE_TABLE, DESCRIBE, and DROP), produces an apply-time error or a no-op grant.
Wrapping this in a module gives every team a single, version-pinned way to onboard a data domain: register the bucket prefix, grant a read role and a write role with consistent, least-privilege permission sets, and emit the catalog IDs that downstream Athena workgroups and Glue jobs reference. The module hides the tri-state quirks and the service-linked-role decision behind validated variables.
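To make the three target shapes concrete, here is a minimal sketch of the raw resource used without the module. The role ARNs, the account ID, and the table and column names are hypothetical placeholders, not values taken from the module.

```hcl
# Hypothetical principals; substitute real IAM role ARNs.
locals {
  analyst_role_arn = "arn:aws:iam::123456789012:role/analyst"
  etl_role_arn     = "arn:aws:iam::123456789012:role/etl"
}

# Shape 1: database block. Only database-level verbs apply here.
resource "aws_lakeformation_permissions" "db_example" {
  principal   = local.analyst_role_arn
  permissions = ["DESCRIBE"]

  database {
    name = "sales_curated"
  }
}

# Shape 2: table block. Whole-table verbs such as SELECT or INSERT.
resource "aws_lakeformation_permissions" "table_example" {
  principal   = local.etl_role_arn
  permissions = ["SELECT", "INSERT"]

  table {
    database_name = "sales_curated"
    name          = "orders"
  }
}

# Shape 3: table_with_columns block. SELECT on an explicit column subset.
resource "aws_lakeformation_permissions" "columns_example" {
  principal   = local.analyst_role_arn
  permissions = ["SELECT"]

  table_with_columns {
    database_name = "sales_curated"
    name          = "orders"
    column_names  = ["order_id", "order_total"] # assumed column names
  }
}
```

Each resource carries exactly one target block; the module's variables exist to pick the right one for you.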
When to use it
- You run an S3 data lake catalogued in AWS Glue and want to retire per-bucket IAM/S3 policies in favour of central, table-level governance.
- Multiple consumers (analysts via Athena, ETL via Glue, BI via Redshift Spectrum) need different slices of the same dataset, e.g. analysts get SELECT on non-PII columns only, while the pipeline role gets full ALTER/INSERT.
- You need an audit trail of who can read which table/column, satisfying data-governance or regulatory review.
- You are standing up a new data domain or lake-house zone (raw / curated / consumption) and want each zone registered and permissioned identically across accounts.
Skip it if your lake has a single trusted consumer and no column-level requirements — plain IAM may be simpler. Also note Lake Formation governs the catalog + S3 credential vending; it does not replace KMS encryption or VPC controls on the bucket itself.
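For the multi-zone case, one way to keep every zone registered identically is to instantiate the module with for_each. This is a sketch under assumptions: the zone-to-database map and the consumption prefix are hypothetical, and only the curated prefix and database name appear elsewhere in this article.

```hcl
locals {
  # Hypothetical mapping of lake-house zone (S3 prefix segment) to Glue database.
  zones = {
    curated     = "sales_curated"
    consumption = "sales_consumption"
  }
}

module "lake_formation_zone" {
  for_each = local.zones

  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-lake-formation?ref=v1.0.0"

  # One registration per zone prefix; grants default to empty and can be
  # layered on per zone as needed.
  s3_resource_arn = "arn:aws:s3:::kloudvin-datalake/${each.key}/sales"
  database_name   = each.value
}
```

Outputs are then addressed per zone, for example module.lake_formation_zone["curated"].registered_arn.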
Module structure
terraform-module-aws-lake-formation/
├── versions.tf
├── main.tf
├── variables.tf
└── outputs.tf
versions.tf
terraform {
required_version = ">= 1.5.0"
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 5.0"
}
}
}
main.tf
locals {
# Lake Formation grant resources require the account that owns the catalog.
catalog_id = coalesce(var.catalog_id, data.aws_caller_identity.current.account_id)
# Normalise principal -> permission-set maps into a flat list we can for_each.
database_grants = [
for principal, perms in var.database_permissions : {
principal = principal
perms = perms
}
]
table_grants = [
for principal, perms in var.table_permissions : {
principal = principal
perms = perms
}
]
# Column-scoped grants: each entry pins a principal to an explicit column list.
column_grants = {
for grant in var.column_permissions :
"${grant.principal}:${grant.table}" => grant
}
}
data "aws_caller_identity" "current" {}
# ---------------------------------------------------------------------------
# Register the S3 location with Lake Formation so it can vend credentials and
# enforce catalog permissions on objects under this prefix.
# ---------------------------------------------------------------------------
resource "aws_lakeformation_resource" "this" {
arn = var.s3_resource_arn
# When registration_role_arn is null, Lake Formation uses its service-linked role.
role_arn = var.registration_role_arn
use_service_linked_role = var.registration_role_arn == null
# Hybrid mode keeps existing IAM/S3 access working alongside LF permissions;
# set false once you are ready to enforce LF-only access.
hybrid_access_enabled = var.hybrid_access_enabled
}
# ---------------------------------------------------------------------------
# Database-level grants (e.g. DESCRIBE, CREATE_TABLE) for catalog discovery.
# ---------------------------------------------------------------------------
resource "aws_lakeformation_permissions" "database" {
for_each = { for g in local.database_grants : g.principal => g }
principal = each.value.principal
permissions = each.value.perms.permissions
permissions_with_grant_option = each.value.perms.grant_options
catalog_id = local.catalog_id
database {
name = var.database_name
catalog_id = local.catalog_id
}
# Ensure the location is registered before we hand out access to it.
depends_on = [aws_lakeformation_resource.this]
}
# ---------------------------------------------------------------------------
# Whole-table grants (SELECT / INSERT / ALTER / DELETE / DROP) for ETL roles
# and consumers that need every column.
# ---------------------------------------------------------------------------
resource "aws_lakeformation_permissions" "table" {
for_each = { for g in local.table_grants : g.principal => g }
principal = each.value.principal
permissions = each.value.perms.permissions
permissions_with_grant_option = each.value.perms.grant_options
catalog_id = local.catalog_id
table {
database_name = var.database_name
name = each.value.perms.table_name
catalog_id = local.catalog_id
}
depends_on = [aws_lakeformation_resource.this]
}
# ---------------------------------------------------------------------------
# Column-level grants: SELECT on an explicit allow-list of columns, used to
# hide PII from analysts while still exposing the rest of the table.
# ---------------------------------------------------------------------------
resource "aws_lakeformation_permissions" "columns" {
for_each = local.column_grants
principal = each.value.principal
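# Column-scoped grants are SELECT-only; column_permissions entries carry no
# permissions attribute, so the verb is fixed here.
permissions = ["SELECT"]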
catalog_id = local.catalog_id
table_with_columns {
database_name = var.database_name
name = each.value.table
catalog_id = local.catalog_id
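column_names = length(each.value.column_names) > 0 ? each.value.column_names : null
excluded_column_names = length(each.value.excluded_column_names) > 0 ? each.value.excluded_column_names : null
# The provider expects wildcard = true whenever excluded_column_names is set.
wildcard = length(each.value.excluded_column_names) > 0 ? true : null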
}
depends_on = [aws_lakeformation_resource.this]
}
variables.tf
variable "s3_resource_arn" {
description = "ARN of the S3 bucket or prefix to register with Lake Formation (e.g. arn:aws:s3:::my-lake/curated)."
type = string
validation {
condition = can(regex("^arn:aws[a-z-]*:s3:::", var.s3_resource_arn))
error_message = "s3_resource_arn must be a valid S3 ARN starting with arn:aws:s3:::."
}
}
variable "registration_role_arn" {
description = "IAM role ARN Lake Formation assumes to vend credentials for the location. Leave null to use the LF service-linked role."
type = string
default = null
validation {
condition = var.registration_role_arn == null || can(regex("^arn:aws[a-z-]*:iam::[0-9]{12}:role/", var.registration_role_arn))
error_message = "registration_role_arn must be null or a valid IAM role ARN."
}
}
variable "hybrid_access_enabled" {
description = "Keep existing IAM/S3 permissions effective alongside Lake Formation grants. Set false to enforce LF-only access."
type = bool
default = true
}
variable "catalog_id" {
description = "Glue Data Catalog account ID owning the database. Defaults to the caller's account."
type = string
default = null
validation {
condition = var.catalog_id == null || can(regex("^[0-9]{12}$", var.catalog_id))
error_message = "catalog_id must be null or a 12-digit AWS account ID."
}
}
variable "database_name" {
description = "Name of the Glue/Lake Formation database these grants apply to."
type = string
}
variable "database_permissions" {
description = "Map of principal ARN => database-level permission set. Valid permissions: ALTER, CREATE_TABLE, DESCRIBE, DROP."
type = map(object({
permissions = list(string)
grant_options = optional(list(string), [])
}))
default = {}
validation {
condition = alltrue([
for p in values(var.database_permissions) : alltrue([
for perm in p.permissions :
contains(["ALTER", "CREATE_TABLE", "DESCRIBE", "DROP"], perm)
])
])
error_message = "database_permissions may only use ALTER, CREATE_TABLE, DESCRIBE, or DROP."
}
}
variable "table_permissions" {
description = "Map of principal ARN => whole-table permission set. Valid permissions: SELECT, INSERT, DELETE, ALTER, DROP, DESCRIBE."
type = map(object({
table_name = string
permissions = list(string)
grant_options = optional(list(string), [])
}))
default = {}
validation {
condition = alltrue([
for p in values(var.table_permissions) : alltrue([
for perm in p.permissions :
contains(["SELECT", "INSERT", "DELETE", "ALTER", "DROP", "DESCRIBE"], perm)
])
])
error_message = "table_permissions may only use SELECT, INSERT, DELETE, ALTER, DROP, or DESCRIBE."
}
}
variable "column_permissions" {
description = "Column-scoped SELECT grants. Provide either column_names (allow-list) or excluded_column_names (deny-list) per entry, not both."
type = list(object({
principal = string
table = string
column_names = optional(list(string), [])
excluded_column_names = optional(list(string), [])
}))
default = []
validation {
condition = alltrue([
for g in var.column_permissions :
(length(g.column_names) > 0) != (length(g.excluded_column_names) > 0)
])
error_message = "Each column_permissions entry must set exactly one of column_names or excluded_column_names."
}
}
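As an illustration of the column_permissions validation above, the following sketch shows values that pass and one that would be rejected. The role ARNs and column names are hypothetical; the local exists only to make the fragment self-contained.

```hcl
locals {
  example_column_permissions = [
    # Deny-list: every column except the two listed.
    {
      principal             = "arn:aws:iam::123456789012:role/analyst"
      table                 = "orders"
      excluded_column_names = ["customer_email", "customer_phone"]
    },
    # Allow-list: only the listed columns.
    {
      principal    = "arn:aws:iam::123456789012:role/finance"
      table        = "orders"
      column_names = ["order_id", "order_total"]
    },
    # Rejected by the validation block: both lists set (an entry with
    # neither list set fails the same check).
    # {
    #   principal             = "arn:aws:iam::123456789012:role/other"
    #   table                 = "orders"
    #   column_names          = ["order_id"]
    #   excluded_column_names = ["customer_email"]
    # },
  ]
}
```

Passing local.example_column_permissions as the module's column_permissions input yields one grant per principal:table pair.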
outputs.tf
output "resource_id" {
description = "ID of the registered Lake Formation resource (the S3 ARN)."
value = aws_lakeformation_resource.this.id
}
output "registered_arn" {
description = "S3 ARN registered with Lake Formation."
value = aws_lakeformation_resource.this.arn
}
output "registration_role_arn" {
description = "IAM role Lake Formation uses to vend credentials for the location (service-linked role when unset)."
value = aws_lakeformation_resource.this.role_arn
}
output "catalog_id" {
description = "Catalog account ID the grants were applied against."
value = local.catalog_id
}
output "database_grant_principals" {
description = "Principals granted database-level permissions."
value = keys(aws_lakeformation_permissions.database)
}
output "table_grant_principals" {
description = "Principals granted whole-table permissions."
value = keys(aws_lakeformation_permissions.table)
}
output "column_grant_keys" {
description = "principal:table keys for column-scoped SELECT grants."
value = keys(aws_lakeformation_permissions.columns)
}
How to use it
module "lake_formation" {
source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-lake-formation?ref=v1.0.0"
s3_resource_arn = "arn:aws:s3:::kloudvin-datalake/curated/sales"
registration_role_arn = aws_iam_role.lf_registration.arn
hybrid_access_enabled = false
database_name = "sales_curated"
# Discovery rights for everyone who touches the database.
database_permissions = {
(aws_iam_role.analyst.arn) = {
permissions = ["DESCRIBE"]
}
(aws_iam_role.etl.arn) = {
permissions = ["DESCRIBE", "CREATE_TABLE", "ALTER"]
}
}
# The ETL role owns the table end to end.
table_permissions = {
(aws_iam_role.etl.arn) = {
table_name = "orders"
permissions = ["SELECT", "INSERT", "ALTER", "DELETE"]
}
}
# Analysts read orders, but never see PII columns.
column_permissions = [
{
principal = aws_iam_role.analyst.arn
table = "orders"
excluded_column_names = ["customer_email", "customer_phone"]
}
]
}
# Downstream: an Athena workgroup for queries over the governed tables. Its
# tags reference the module outputs so the dependency is explicit.
resource "aws_athena_workgroup" "sales" {
name = "sales-analytics"
configuration {
result_configuration {
output_location = "s3://kloudvin-athena-results/sales/"
}
}
tags = {
GovernedResource = module.lake_formation.registered_arn
Catalog = module.lake_formation.catalog_id
}
}
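The example above references aws_iam_role.lf_registration without defining it. A minimal sketch of such a role follows, assuming the data sits under the curated/sales prefix of kloudvin-datalake; the role and policy names are hypothetical, and any KMS permissions the bucket needs are left out.

```hcl
# Trust policy: let the Lake Formation service assume the role.
data "aws_iam_policy_document" "lf_assume" {
  statement {
    actions = ["sts:AssumeRole"]

    principals {
      type        = "Service"
      identifiers = ["lakeformation.amazonaws.com"]
    }
  }
}

resource "aws_iam_role" "lf_registration" {
  name               = "lf-registration-sales" # hypothetical name
  assume_role_policy = data.aws_iam_policy_document.lf_assume.json
}

# Data access scoped to the registered prefix only.
data "aws_iam_policy_document" "lf_data_access" {
  statement {
    actions   = ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"]
    resources = ["arn:aws:s3:::kloudvin-datalake/curated/sales/*"]
  }

  statement {
    actions   = ["s3:ListBucket"]
    resources = ["arn:aws:s3:::kloudvin-datalake"]

    condition {
      test     = "StringLike"
      variable = "s3:prefix"
      values   = ["curated/sales/*"]
    }
  }
}

resource "aws_iam_role_policy" "lf_data_access" {
  name   = "lf-data-access" # hypothetical name
  role   = aws_iam_role.lf_registration.id
  policy = data.aws_iam_policy_document.lf_data_access.json
}
```

One caveat: the module keys its for_each on principal ARNs, and Terraform needs those keys at plan time. If the analyst and ETL roles are created in the same configuration, they may have to exist before the module is planned, or their ARNs be passed as literal strings.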
With Terragrunt
Terragrunt keeps this module DRY across environments — define the backend and provider once in a root config, then a thin terragrunt.hcl per environment supplies only the inputs that differ.
1. Root config — live/terragrunt.hcl (inherited by every module):
remote_state {
backend = "s3"
generate = { path = "backend.tf", if_exists = "overwrite" }
config = {
# ...s3 state bucket/container + key per path...
}
}
2. Module config — live/prod/lake_formation/terragrunt.hcl:
include "root" {
path = find_in_parent_folders()
}
terraform {
source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-lake-formation?ref=v1.0.0"
}
inputs = {
s3_resource_arn = "..."
database_name = "..."
}
3. Deploy one environment, or roll out all modules together:
# Run either command from the repository root.
(cd live/prod/lake_formation && terragrunt apply) # this module
(cd live/prod && terragrunt run-all apply) # every module under live/prod
Why Terragrunt here: the backend and provider live in one place instead of being copy-pasted into every module; inputs is overridden per environment (dev / stage / prod) without forking the module; and run-all orchestrates dependencies across modules. Reach for it once you have more than one environment or more than a handful of modules — for a single stack, the plain Quickstart above is enough.
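The dependency orchestration mentioned above can be sketched with a Terragrunt dependency block. The sibling ../iam module and its analyst_role_arn output are hypothetical and only illustrate the wiring; the "..." placeholders are left for you to fill in.

```hcl
# live/prod/lake_formation/terragrunt.hcl (sketch)
include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-lake-formation?ref=v1.0.0"
}

# Hypothetical sibling module that creates the IAM roles and exports their ARNs.
dependency "iam" {
  config_path = "../iam"
}

inputs = {
  s3_resource_arn = "..."
  database_name   = "..."

  # Parentheses make HCL evaluate the key as an expression.
  database_permissions = {
    (dependency.iam.outputs.analyst_role_arn) = {
      permissions = ["DESCRIBE"]
    }
  }
}
```

With this in place, run-all applies ../iam before this module, and the role ARNs are already known values by the time the grants are planned.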
Inputs
| Name | Type | Default | Required | Description |
|---|---|---|---|---|
| s3_resource_arn | string | (none) | Yes | ARN of the S3 bucket/prefix to register with Lake Formation. |
| registration_role_arn | string | null | No | IAM role LF assumes to vend credentials; null uses the service-linked role. |
| hybrid_access_enabled | bool | true | No | Keep IAM/S3 access effective alongside LF grants; false enforces LF-only. |
| catalog_id | string | null | No | Glue Data Catalog account ID; defaults to the caller’s account. |
| database_name | string | (none) | Yes | Glue/Lake Formation database the grants target. |
| database_permissions | map(object) | {} | No | Principal ARN => database permission set (ALTER, CREATE_TABLE, DESCRIBE, DROP). |
| table_permissions | map(object) | {} | No | Principal ARN => whole-table permission set incl. table_name. |
| column_permissions | list(object) | [] | No | Column-scoped SELECT grants via allow-list or deny-list of columns. |
Outputs
| Name | Description |
|---|---|
| resource_id | ID of the registered Lake Formation resource (the S3 ARN). |
| registered_arn | S3 ARN registered with Lake Formation. |
| registration_role_arn | IAM role LF uses to vend credentials (service-linked role when unset). |
| catalog_id | Catalog account ID the grants were applied against. |
| database_grant_principals | Principals granted database-level permissions. |
| table_grant_principals | Principals granted whole-table permissions. |
| column_grant_keys | principal:table keys for column-scoped SELECT grants. |
Enterprise scenario
A retail analytics platform stores curated sales data in s3://kloudvin-datalake/curated/sales, catalogued in Glue and queried by 40+ analysts through Athena. Finance analysts must aggregate revenue but are barred from seeing customer_email and customer_phone under the company’s PII policy. The data platform team instantiates this module once per data domain: the ETL role gets full table permissions to land nightly batches, while the analyst role gets a column-level SELECT grant that excludes the two PII fields — so a stray SELECT * in Athena simply returns no PII column rather than leaking it, and every grant is recorded centrally for the quarterly access review.
Best practices
- Disable hybrid mode once migrated. Keep hybrid_access_enabled = true only while you transition off bucket policies; then set it to false so Lake Formation is the single source of truth and stale IAM grants can’t bypass column controls.
- Prefer column deny-lists for PII. Use excluded_column_names for sensitive fields so newly added non-PII columns are automatically visible to analysts, instead of an allow-list you must update on every schema change.
- Grant the narrowest verb set. Database principals rarely need more than DESCRIBE; reserve ALTER/CREATE_TABLE for pipeline roles, and never hand permissions_with_grant_option to human users unless they genuinely re-delegate access.
- Register prefixes, not whole buckets. Point s3_resource_arn at the curated/consumption prefix rather than the bucket root, so raw or quarantine zones under the same bucket stay outside this grant’s blast radius.
- Use a dedicated registration role. Supply registration_role_arn with a least-privilege role scoped to the registered prefix and its KMS key, rather than the broad service-linked role, to bound which objects Lake Formation can vend credentials for.
- Name databases by zone and domain. A sales_curated / sales_consumption convention keeps grants legible in audits and prevents accidental cross-zone permissions when the module is reused across the lake-house.
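For the rare case where re-delegation is genuinely required, the grant option can be limited to a single verb. The sketch below assumes a hypothetical data-steward role and account ID; grant_options maps to permissions_with_grant_option inside the module.

```hcl
module "lake_formation" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-lake-formation?ref=v1.0.0"

  s3_resource_arn = "arn:aws:s3:::kloudvin-datalake/curated/sales"
  database_name   = "sales_curated"

  table_permissions = {
    # Hypothetical steward role: may read the table and pass SELECT on to
    # others, but cannot re-delegate DESCRIBE or any write verb.
    "arn:aws:iam::123456789012:role/data-steward" = {
      table_name    = "orders"
      permissions   = ["SELECT", "DESCRIBE"]
      grant_options = ["SELECT"]
    }
  }
}
```

Keeping grant_options a strict subset of permissions makes the delegation boundary easy to spot in an access review.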