Quick take: Provision an Amazon Kendra index with Terraform, including enterprise-edition capacity, KMS server-side encryption, a least-privilege IAM service role, and CloudWatch metrics and logging permissions, all wrapped in a reusable, variable-driven module. New here? Jump to the Quickstart below for a copy-paste configuration (index creation itself can take 30+ minutes on the enterprise edition); read on for how it works and when to reach for it.
Quickstart (copy-paste)
Minimal, runnable configuration — drop this in a .tf file and fill in the "..." placeholders (each required input is commented):
provider "aws" {
region = "us-east-1"
}
module "kendra" {
source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-kendra?ref=v1.0.0"
index_name = "..." # Name of the Kendra index; also derives the IAM role nam…
}
Then terraform init && terraform apply. Every other input has a sensible default — see Inputs below to override behaviour.
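Because edition defaults to ENTERPRISE_EDITION, a first trial may be better served by the developer edition. The following is a minimal sketch of a Quickstart variant, assuming a placeholder index name (kendra-trial is hypothetical) and using only the module's documented edition input and index_id output:

```hcl
provider "aws" {
  region = "us-east-1"
}

module "kendra" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-kendra?ref=v1.0.0"

  index_name = "kendra-trial"      # hypothetical name; pick your own
  edition    = "DEVELOPER_EDITION" # overrides the ENTERPRISE_EDITION default
}

# Print the index ID after apply so it can be handed to the Query API.
output "kendra_index_id" {
  value = module.kendra.index_id
}
```

Run terraform destroy once the trial is over so the index does not keep billing.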
What this module is
Amazon Kendra is a managed, ML-powered enterprise search service. Rather than relying on keyword matching alone, it ranks results by semantic relevance, understands natural-language questions, and surfaces FAQ-style answers and document excerpts across connected repositories (S3, SharePoint, Confluence, RDS, ServiceNow, and more). The first resource you provision is the index (aws_kendra_index): a long-lived, capacity-billed resource that holds your ingested documents and serves the Query API. Data sources, FAQs, and experiences all attach to an index, so getting the index right (edition, encryption, IAM role, capacity units) is the foundation everything else stands on.
Kendra differs from many AWS services in two ways that matter for provisioning: an index is expensive to leave running by accident, and it is slow to create (an ENTERPRISE_EDITION index can take 30+ minutes to provision). That makes click-ops a poor fit: you want the edition, KMS key, role, and capacity codified, reviewed in a pull request, and reproducible across dev/stage/prod. This module wraps aws_kendra_index plus the two things you almost always need alongside it: a scoped IAM service role (so Kendra can write CloudWatch logs and metrics and, optionally, read your S3 buckets and use your KMS key) and server-side encryption with a customer-managed KMS key, all behind a small, validated variable surface.
When to use it
- You are building enterprise/internal search, a help-center “ask a question” box, or a RAG retrieval layer and want managed semantic ranking instead of running OpenSearch + an embedding pipeline yourself.
- You need encryption at rest with your own KMS CMK for compliance (the index stores document content and metadata).
- You want index creation, edition sizing, and the service-role trust/permissions to live in version control and ship through CI, not the console.
- You are deploying the same search stack to multiple environments or accounts and need consistent, parameterised indexes.
- Skip it (or use DEVELOPER_EDITION) if you only need a short-lived proof of concept, and remember to destroy it, because an idle enterprise index still bills per capacity unit per hour.
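For the multi-environment case, one possible shape is a for_each over a map of per-environment settings. This is only a sketch: the kv-search-<env> naming and the capacity values are illustrative assumptions, and many teams would keep each environment in its own state or account rather than in a single configuration.

```hcl
locals {
  # Hypothetical per-environment settings; adjust to your own sizing.
  kendra_environments = {
    dev  = { edition = "DEVELOPER_EDITION", query_capacity_units = 0 }
    prod = { edition = "ENTERPRISE_EDITION", query_capacity_units = 1 }
  }
}

module "kendra" {
  source   = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-kendra?ref=v1.0.0"
  for_each = local.kendra_environments

  index_name           = "kv-search-${each.key}"
  edition              = each.value.edition
  query_capacity_units = each.value.query_capacity_units
}

# Map of environment name to index ID, e.g. for wiring into applications.
output "kendra_index_ids" {
  value = { for env, m in module.kendra : env => m.index_id }
}
```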
Module structure
terraform-module-aws-kendra/
├── versions.tf
├── main.tf
├── variables.tf
└── outputs.tf
versions.tf
terraform {
required_version = ">= 1.5.0"
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 5.0"
}
}
}
main.tf
data "aws_caller_identity" "current" {}
data "aws_partition" "current" {}
data "aws_region" "current" {}
locals {
account_id = data.aws_caller_identity.current.account_id
partition = data.aws_partition.current.partition
region = data.aws_region.current.name
# Kendra requires a service role that it can assume.
role_name = coalesce(var.role_name, "${var.index_name}-kendra-role")
# CloudWatch log group ARN scope Kendra needs for metrics/logging.
log_group_arn_prefix = "arn:${local.partition}:logs:${local.region}:${local.account_id}:log-group:/aws/kendra/*"
tags = merge(
{
"Name" = var.index_name
"ManagedBy" = "terraform"
},
var.tags,
)
}
# ---------------------------------------------------------------------------
# IAM service role assumed by Kendra
# ---------------------------------------------------------------------------
data "aws_iam_policy_document" "assume" {
statement {
effect = "Allow"
actions = ["sts:AssumeRole"]
principals {
type = "Service"
identifiers = ["kendra.amazonaws.com"]
}
# Confused-deputy protection: only this account's Kendra can assume the role.
condition {
test = "StringEquals"
variable = "aws:SourceAccount"
values = [local.account_id]
}
condition {
test = "ArnLike"
variable = "aws:SourceArn"
values = ["arn:${local.partition}:kendra:${local.region}:${local.account_id}:index/*"]
}
}
}
resource "aws_iam_role" "this" {
name = local.role_name
assume_role_policy = data.aws_iam_policy_document.assume.json
permissions_boundary = var.permissions_boundary_arn
tags = local.tags
}
# Permissions the index itself requires: publish CloudWatch metrics + write logs.
data "aws_iam_policy_document" "service" {
statement {
sid = "CloudWatchMetrics"
effect = "Allow"
actions = ["cloudwatch:PutMetricData"]
resources = ["*"]
condition {
test = "StringEquals"
variable = "cloudwatch:namespace"
values = ["AWS/Kendra"]
}
}
statement {
sid = "DescribeLogGroups"
effect = "Allow"
actions = ["logs:DescribeLogGroups"]
resources = ["*"]
}
statement {
sid = "CreateLogGroup"
effect = "Allow"
actions = ["logs:CreateLogGroup"]
resources = [local.log_group_arn_prefix]
}
statement {
sid = "LogStreams"
effect = "Allow"
actions = [
"logs:DescribeLogStreams",
"logs:CreateLogStream",
"logs:PutLogEvents",
]
resources = ["${local.log_group_arn_prefix}:log-stream:*"]
}
# Allow Kendra to use the customer-managed CMK for the index, when supplied.
dynamic "statement" {
for_each = var.kms_key_id != null ? [1] : []
content {
sid = "UseKmsKey"
effect = "Allow"
actions = [
"kms:Decrypt",
"kms:GenerateDataKey",
"kms:DescribeKey",
]
resources = [var.kms_key_arn]
}
}
# Optional: let the index read documents from named S3 buckets (for S3 data sources).
dynamic "statement" {
for_each = length(var.source_s3_bucket_arns) > 0 ? [1] : []
content {
sid = "ReadSourceBuckets"
effect = "Allow"
actions = ["s3:GetObject"]
resources = [for arn in var.source_s3_bucket_arns : "${arn}/*"]
}
}
dynamic "statement" {
for_each = length(var.source_s3_bucket_arns) > 0 ? [1] : []
content {
sid = "ListSourceBuckets"
effect = "Allow"
actions = ["s3:ListBucket"]
resources = var.source_s3_bucket_arns
}
}
}
resource "aws_iam_role_policy" "service" {
name = "${local.role_name}-policy"
role = aws_iam_role.this.id
policy = data.aws_iam_policy_document.service.json
}
# ---------------------------------------------------------------------------
# Kendra index
# ---------------------------------------------------------------------------
resource "aws_kendra_index" "this" {
name = var.index_name
description = var.description
edition = var.edition
role_arn = aws_iam_role.this.arn
dynamic "server_side_encryption_configuration" {
for_each = var.kms_key_id != null ? [1] : []
content {
kms_key_id = var.kms_key_id
}
}
# Capacity units only apply to ENTERPRISE_EDITION; the for_each below omits the block for other editions.
dynamic "capacity_units" {
for_each = var.edition == "ENTERPRISE_EDITION" ? [1] : []
content {
query_capacity = var.query_capacity_units
storage_capacity = var.storage_capacity_units
}
}
# Field mappings for user/group access control (document-level security).
dynamic "user_token_configurations" {
for_each = var.user_group_resolution_enabled ? [1] : []
content {
json_token_type_configuration {
group_attribute_field = var.group_attribute_field
user_name_attribute_field = var.user_name_attribute_field
}
}
}
user_context_policy = var.user_context_policy
tags = local.tags
# The enterprise index is slow to create/update; give it room.
timeouts {
create = "60m"
update = "60m"
delete = "60m"
}
depends_on = [aws_iam_role_policy.service]
}
variables.tf
variable "index_name" {
description = "Name of the Kendra index. Also used to derive the IAM role name and tags."
type = string
validation {
condition = can(regex("^[a-zA-Z0-9][a-zA-Z0-9_-]{0,999}$", var.index_name))
error_message = "index_name must start alphanumeric and contain only letters, numbers, hyphens, or underscores."
}
}
variable "description" {
description = "Human-readable description of the index."
type = string
default = "Managed by Terraform"
}
variable "edition" {
description = "Kendra edition: DEVELOPER_EDITION (PoC) or ENTERPRISE_EDITION (production HA)."
type = string
default = "ENTERPRISE_EDITION"
validation {
condition = contains(["DEVELOPER_EDITION", "ENTERPRISE_EDITION"], var.edition)
error_message = "edition must be DEVELOPER_EDITION or ENTERPRISE_EDITION."
}
}
variable "query_capacity_units" {
description = "Additional query capacity units (each adds ~0.1 queries/sec). Enterprise edition only."
type = number
default = 0
validation {
condition = var.query_capacity_units >= 0 && floor(var.query_capacity_units) == var.query_capacity_units
error_message = "query_capacity_units must be a non-negative integer."
}
}
variable "storage_capacity_units" {
description = "Additional storage capacity units (each adds ~100k documents / 30 GB). Enterprise edition only."
type = number
default = 0
validation {
condition = var.storage_capacity_units >= 0 && floor(var.storage_capacity_units) == var.storage_capacity_units
error_message = "storage_capacity_units must be a non-negative integer."
}
}
variable "kms_key_id" {
description = "Customer-managed KMS key ID/ARN for server-side encryption of the index. Null uses an AWS-owned key."
type = string
default = null
}
variable "kms_key_arn" {
description = "Full ARN of the KMS key, used to scope the index role's KMS permissions (Decrypt, GenerateDataKey, DescribeKey). Required when kms_key_id is set."
type = string
default = null
validation {
condition = var.kms_key_arn == null || can(regex("^arn:aws[a-z-]*:kms:", var.kms_key_arn))
error_message = "kms_key_arn must be a valid KMS key ARN."
}
}
variable "source_s3_bucket_arns" {
description = "S3 bucket ARNs the index service role may read for S3 data sources (read access wired in IAM)."
type = list(string)
default = []
}
variable "user_context_policy" {
description = "Document-level access control mode: ATTRIBUTE_FILTER (all content is searchable by all users unless queries filter on user context) or USER_TOKEN (token-based user and group access control)."
type = string
default = "ATTRIBUTE_FILTER"
validation {
condition = contains(["ATTRIBUTE_FILTER", "USER_TOKEN"], var.user_context_policy)
error_message = "user_context_policy must be ATTRIBUTE_FILTER or USER_TOKEN."
}
}
variable "user_group_resolution_enabled" {
description = "Enable JSON user-token group resolution (maps identity attributes to ACL filtering)."
type = bool
default = false
}
variable "group_attribute_field" {
description = "Token field carrying group membership, used when user_group_resolution_enabled is true."
type = string
default = "groups"
}
variable "user_name_attribute_field" {
description = "Token field carrying the user name, used when user_group_resolution_enabled is true."
type = string
default = "username"
}
variable "role_name" {
description = "Override for the IAM service-role name. Defaults to <index_name>-kendra-role."
type = string
default = null
}
variable "permissions_boundary_arn" {
description = "Optional IAM permissions boundary ARN applied to the Kendra service role."
type = string
default = null
}
variable "tags" {
description = "Additional tags merged onto all created resources."
type = map(string)
default = {}
}
outputs.tf
output "index_id" {
description = "The identifier of the Kendra index (used by data sources, FAQs, and the Query API)."
value = aws_kendra_index.this.id
}
output "index_arn" {
description = "The ARN of the Kendra index."
value = aws_kendra_index.this.arn
}
output "index_name" {
description = "The name of the Kendra index."
value = aws_kendra_index.this.name
}
output "index_edition" {
description = "The edition of the provisioned index."
value = aws_kendra_index.this.edition
}
output "role_arn" {
description = "ARN of the IAM service role Kendra assumes (reuse it when attaching data sources)."
value = aws_iam_role.this.arn
}
output "role_name" {
description = "Name of the IAM service role created for the index."
value = aws_iam_role.this.name
}
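The kms_key_arn description says it is required when kms_key_id is set, but nothing in variables.tf enforces that pairing. One way to surface the mistake with a readable message is a precondition. The sketch below is a hypothetical addition to main.tf, not part of the module as published; it relies on terraform_data, which is available in the Terraform versions the module already requires.

```hcl
# Hypothetical guard: creates nothing in AWS, it only checks the two KMS inputs.
resource "terraform_data" "kms_input_guard" {
  lifecycle {
    precondition {
      condition     = var.kms_key_id == null || var.kms_key_arn != null
      error_message = "kms_key_arn must be set whenever kms_key_id is set."
    }
  }
}
```

Without a guard of this kind, setting only kms_key_id leaves the UseKmsKey statement with a null entry in its resources list.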
How to use it
module "kendra" {
source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-kendra?ref=v1.0.0"
index_name = "kv-knowledge-base"
description = "Semantic search over internal docs and policies"
edition = "ENTERPRISE_EDITION"
# Customer-managed encryption at rest.
kms_key_id = aws_kms_key.kendra.key_id
kms_key_arn = aws_kms_key.kendra.arn
# Let the index read the docs bucket for an S3 data source.
source_s3_bucket_arns = [aws_s3_bucket.docs.arn]
# Modest extra capacity beyond the base enterprise unit.
query_capacity_units = 1
storage_capacity_units = 1
# Document-level security driven by identity tokens.
user_context_policy = "USER_TOKEN"
user_group_resolution_enabled = true
tags = {
Environment = "prod"
Team = "platform"
CostCenter = "search"
}
}
# Downstream: attach an S3 data source to the index using its outputs.
resource "aws_kendra_data_source" "docs" {
index_id = module.kendra.index_id
name = "internal-docs"
type = "S3"
role_arn = module.kendra.role_arn
configuration {
s3_configuration {
bucket_name = aws_s3_bucket.docs.id
}
}
schedule = "cron(0 2 * * ? *)" # nightly re-crawl at 02:00 UTC
}
# Downstream: an SSM parameter so apps can discover the index ID.
resource "aws_ssm_parameter" "kendra_index_id" {
name = "/search/kendra/index-id"
type = "String"
value = module.kendra.index_id
}
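The example above references aws_kms_key.kendra and aws_s3_bucket.docs without defining them. A minimal sketch of those supporting resources follows; the alias, bucket name, and key settings are assumptions for illustration and are not part of the module.

```hcl
# Customer-managed key for the index. Kendra does not support asymmetric
# KMS keys; aws_kms_key creates a symmetric key by default.
resource "aws_kms_key" "kendra" {
  description             = "Key for the kv-knowledge-base Kendra index"
  enable_key_rotation     = true
  deletion_window_in_days = 30
}

resource "aws_kms_alias" "kendra" {
  name          = "alias/kendra-kv-knowledge-base" # hypothetical alias
  target_key_id = aws_kms_key.kendra.key_id
}

# Source bucket for the S3 data source (hypothetical name; must be globally unique).
resource "aws_s3_bucket" "docs" {
  bucket = "kv-internal-docs-example"
}

# Keep the document bucket private.
resource "aws_s3_bucket_public_access_block" "docs" {
  bucket                  = aws_s3_bucket.docs.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}
```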
With Terragrunt
Terragrunt keeps this module DRY across environments — define the backend and provider once in a root config, then a thin terragrunt.hcl per environment supplies only the inputs that differ.
1. Root config — live/terragrunt.hcl (inherited by every module):
remote_state {
backend = "s3"
generate = { path = "backend.tf", if_exists = "overwrite" }
config = {
# ...S3 state bucket + key per path...
}
}
2. Module config — live/prod/kendra/terragrunt.hcl:
include "root" {
path = find_in_parent_folders()
}
terraform {
source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-kendra?ref=v1.0.0"
}
inputs = {
index_name = "..."
}
3. Deploy one environment, or roll out all modules together:
# Both commands assume you start from the repository root.
cd live/prod/kendra && terragrunt apply # this module only
cd live/prod && terragrunt run-all apply # every module under live/prod
Why Terragrunt here: the backend and provider live in one place instead of being copy-pasted into every module; inputs is overridden per environment (dev / stage / prod) without forking the module; and run-all orchestrates dependencies across modules. Reach for it once you have more than one environment or more than a handful of modules — for a single stack, the plain Quickstart above is enough.
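To illustrate the cross-module dependencies mentioned above, a Terragrunt dependency block can feed another module's outputs into this one. The sketch assumes a sibling live/prod/kms module that exposes key_id and key_arn outputs; that module and its output names are hypothetical.

```hcl
# live/prod/kendra/terragrunt.hcl (sketch)
include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-kendra?ref=v1.0.0"
}

# Hypothetical sibling module that creates the KMS key.
dependency "kms" {
  config_path = "../kms"
}

inputs = {
  index_name  = "..."
  kms_key_id  = dependency.kms.outputs.key_id
  kms_key_arn = dependency.kms.outputs.key_arn
}
```

With the dependency declared, run-all apply orders the kms module ahead of kendra.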
Inputs
| Name | Type | Default | Required | Description |
|---|---|---|---|---|
| index_name | string | — | yes | Name of the Kendra index; also derives the IAM role name and tags. |
| description | string | "Managed by Terraform" | no | Human-readable description of the index. |
| edition | string | "ENTERPRISE_EDITION" | no | DEVELOPER_EDITION (PoC) or ENTERPRISE_EDITION (production HA). |
| query_capacity_units | number | 0 | no | Extra query capacity units (enterprise only); each adds ~0.1 QPS. |
| storage_capacity_units | number | 0 | no | Extra storage capacity units (enterprise only); each adds ~100k docs / 30 GB. |
| kms_key_id | string | null | no | Customer-managed KMS key ID/ARN for server-side encryption; null uses an AWS-owned key. |
| kms_key_arn | string | null | no | Full KMS key ARN used to scope the role’s KMS permissions; required when kms_key_id is set. |
| source_s3_bucket_arns | list(string) | [] | no | S3 bucket ARNs the index role may read for S3 data sources. |
| user_context_policy | string | "ATTRIBUTE_FILTER" | no | Document-level access mode: ATTRIBUTE_FILTER or USER_TOKEN. |
| user_group_resolution_enabled | bool | false | no | Enable JSON user-token group resolution for ACL filtering. |
| group_attribute_field | string | "groups" | no | Token field carrying group membership (used when resolution is enabled). |
| user_name_attribute_field | string | "username" | no | Token field carrying the user name (used when resolution is enabled). |
| role_name | string | null | no | Override the IAM service-role name; defaults to <index_name>-kendra-role. |
| permissions_boundary_arn | string | null | no | Optional IAM permissions boundary ARN for the service role. |
| tags | map(string) | {} | no | Additional tags merged onto all created resources. |
Outputs
| Name | Description |
|---|---|
| index_id | Identifier of the Kendra index (used by data sources, FAQs, and the Query API). |
| index_arn | ARN of the Kendra index. |
| index_name | Name of the Kendra index. |
| index_edition | Edition of the provisioned index. |
| role_arn | ARN of the IAM service role Kendra assumes (reuse when attaching data sources). |
| role_name | Name of the IAM service role created for the index. |
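As an illustration of consuming these outputs, a FAQ can attach to the index through index_id and role_arn. This is a sketch, assuming a bucket aws_s3_bucket.docs that is also listed in source_s3_bucket_arns (so the role can read it) and a hypothetical object key faqs/hr.csv:

```hcl
resource "aws_kendra_faq" "hr" {
  index_id    = module.kendra.index_id
  name        = "hr-faq"
  role_arn    = module.kendra.role_arn
  file_format = "CSV"

  s3_path {
    bucket = aws_s3_bucket.docs.id
    key    = "faqs/hr.csv" # hypothetical object key
  }
}
```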
Enterprise scenario
A financial-services firm replaces its sprawling internal wiki search with Kendra-backed semantic search across SharePoint policies, an S3 archive of PDF procedures, and a ServiceNow knowledge base. This module provisions the ENTERPRISE_EDITION index in each region with a customer-managed CMK (so encryption keys stay under the InfoSec team’s control) and USER_TOKEN document-level security, ensuring an analyst only sees passages from documents their AD groups are entitled to. The index ID flows out to SSM Parameter Store, where the firm’s internal “Ask Compliance” chatbot reads it at runtime to call the Query API — no console clicks, every change reviewed in a pull request, and capacity sized via the query_capacity_units variable as adoption grows.
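On the consumer side of this scenario, the chatbot's runtime role needs permission to read the parameter and query the index. A sketch of such a policy, assuming the aws_ssm_parameter.kendra_index_id resource from the usage example and an already existing role named ask-compliance-chatbot (a hypothetical name):

```hcl
data "aws_iam_policy_document" "ask_compliance" {
  statement {
    sid       = "ReadIndexIdParameter"
    effect    = "Allow"
    actions   = ["ssm:GetParameter"]
    resources = [aws_ssm_parameter.kendra_index_id.arn]
  }

  statement {
    sid       = "QueryIndex"
    effect    = "Allow"
    actions   = ["kendra:Query"]
    resources = [module.kendra.index_arn]
  }
}

resource "aws_iam_role_policy" "ask_compliance" {
  name   = "ask-compliance-kendra-query"
  role   = "ask-compliance-chatbot" # hypothetical existing role name
  policy = data.aws_iam_policy_document.ask_compliance.json
}
```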
Best practices
- Always set a customer-managed KMS key (kms_key_id + kms_key_arn) for production indexes: Kendra stores full document content, and the module scopes the role’s kms:Decrypt/GenerateDataKey permissions to that one key rather than *.
- Mind the cost floor: an enterprise index bills per hour the moment it exists, before any capacity add-ons. Use DEVELOPER_EDITION for PoCs, destroy idle stacks, and treat each query_capacity_units/storage_capacity_units bump as a deliberate, reviewed change.
- Enforce document-level security with user_context_policy = "USER_TOKEN" so search respects source-system ACLs. With ATTRIBUTE_FILTER, all indexed content is searchable by every user unless each query supplies user and group filters, so do not rely on it alone for indexes containing sensitive data.
- Keep the trust policy tight: the bundled assume-role policy pins aws:SourceAccount and aws:SourceArn to defeat the confused-deputy problem, so keep those conditions when extending the role.
- Reuse the module’s role_arn for data sources instead of minting a new role per connector, but grant additional connector-specific permissions (e.g., SharePoint secrets) as separate scoped policies rather than widening this one.
- Name and tag consistently: the index name seeds the role name and the Name/ManagedBy tags, so a clear convention like <app>-<env>-kb keeps indexes, roles, and CloudWatch AWS/Kendra metrics easy to correlate and bill back.
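When the module's role is reused for an S3 data source, AWS's documented data-source role permissions also include kendra:BatchPutDocument and kendra:BatchDeleteDocument on the index, which the bundled policy does not grant. A sketch of adding them as a separate scoped policy, in line with the advice above (the policy name is an assumption):

```hcl
# Lets the data source sync job write to and delete from this one index.
data "aws_iam_policy_document" "kendra_s3_sync" {
  statement {
    sid    = "SyncDocumentsToIndex"
    effect = "Allow"
    actions = [
      "kendra:BatchPutDocument",
      "kendra:BatchDeleteDocument",
    ]
    resources = [module.kendra.index_arn]
  }
}

resource "aws_iam_role_policy" "kendra_s3_sync" {
  name   = "kendra-s3-data-source-sync" # hypothetical policy name
  role   = module.kendra.role_name
  policy = data.aws_iam_policy_document.kendra_s3_sync.json
}
```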