Terraform Module: AWS Kendra — a governed enterprise search index in one block

Quick take — Provision an AWS Kendra index with Terraform: enterprise-edition capacity, KMS server-side encryption, a least-privilege IAM service role, and CloudWatch metrics — wrapped in a reusable, var-driven module. New here? Jump to the Quickstart below to deploy it in minutes; read on for how it works and when to reach for it.

Quickstart (copy-paste)

Minimal, runnable configuration — drop this in a .tf file and fill in the "..." placeholders (each required input is commented):

provider "aws" {
  region = "us-east-1"
}

module "kendra" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-kendra?ref=v1.0.0"

  index_name = "..."  # Name of the Kendra index; also derives the IAM role nam…
}

Then terraform init && terraform apply. Every other input has a sensible default — see Inputs below to override behaviour.
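Module outputs are not shown by terraform output unless the root configuration re-exports them. Below is a minimal sketch that assumes the module block is named kendra, as in the Quickstart. The output names kendra_index_id and kendra_role_arn are illustrative and are not part of the module.

```hcl
# Re-export the module's outputs at the root so `terraform output` can show them.
output "kendra_index_id" {
  description = "Kendra index ID (pass this to the Query API or to data sources)."
  value       = module.kendra.index_id
}

output "kendra_role_arn" {
  description = "IAM service role the Kendra index assumes."
  value       = module.kendra.role_arn
}
```

After apply, terraform output -raw kendra_index_id prints the bare ID, which you can paste into a script or a Query API call.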

What this module is

Amazon Kendra is a managed, ML-powered enterprise search service. Instead of keyword matching, it ranks results by semantic relevance, understands natural-language questions, and surfaces FAQ-style answers and document excerpts across connected repositories (S3, SharePoint, Confluence, RDS, ServiceNow, and more). The atom you provision first is the index — aws_kendra_index — a long-lived, capacity-billed resource that holds your ingested documents and serves the Query API. Data sources, FAQs, and experiences all attach to an index, so getting the index right (edition, encryption, IAM role, capacity units) is the foundation everything else stands on.

Kendra is unusual among AWS services because the index is both expensive to leave running by accident and slow to create (an ENTERPRISE_EDITION index can take 30 minutes or more to provision). That makes click-ops a poor fit. You want the edition, KMS key, role, and capacity codified, reviewed in a pull request, and reproducible across dev/stage/prod. This module wraps aws_kendra_index, together with the two things you almost always need alongside it, behind a small, validated variable surface:

- a scoped IAM service role, so Kendra can write CloudWatch logs and metrics and, optionally, read your S3 buckets and use your KMS key;
- server-side encryption via a customer-managed CMK.

When to use it

You are building enterprise/internal search, a help-center “ask a question” box, or a RAG retrieval layer and want managed semantic ranking instead of running OpenSearch + an embedding pipeline yourself.
You need encryption at rest with your own KMS CMK for compliance (the index stores document content and metadata).
You want index creation, edition sizing, and the service-role trust/permissions to live in version control and ship through CI, not the console.
You are deploying the same search stack to multiple environments or accounts and need consistent, parameterised indexes.
Skip it (or use DEVELOPER_EDITION) if you only need a short-lived proof of concept — and remember to destroy it, because an idle enterprise index still bills per capacity unit per hour.

Module structure

terraform-module-aws-kendra/
├── versions.tf
├── main.tf
├── variables.tf
└── outputs.tf

versions.tf

terraform {
  required_version = ">= 1.5.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

main.tf

data "aws_caller_identity" "current" {}
data "aws_partition" "current" {}
data "aws_region" "current" {}

locals {
  account_id = data.aws_caller_identity.current.account_id
  partition  = data.aws_partition.current.partition
  region     = data.aws_region.current.name

  # Kendra requires a service role that it can assume.
  role_name = coalesce(var.role_name, "${var.index_name}-kendra-role")

  # CloudWatch log group ARN scope Kendra needs for metrics/logging.
  log_group_arn_prefix = "arn:${local.partition}:logs:${local.region}:${local.account_id}:log-group:/aws/kendra/*"

  tags = merge(
    {
      "Name"      = var.index_name
      "ManagedBy" = "terraform"
    },
    var.tags,
  )
}

# ---------------------------------------------------------------------------
# IAM service role assumed by Kendra
# ---------------------------------------------------------------------------
data "aws_iam_policy_document" "assume" {
  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRole"]

    principals {
      type        = "Service"
      identifiers = ["kendra.amazonaws.com"]
    }

    # Confused-deputy protection: only Kendra acting for an index in this account and region can assume the role.
    condition {
      test     = "StringEquals"
      variable = "aws:SourceAccount"
      values   = [local.account_id]
    }

    condition {
      test     = "ArnLike"
      variable = "aws:SourceArn"
      values   = ["arn:${local.partition}:kendra:${local.region}:${local.account_id}:index/*"]
    }
  }
}

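resource "aws_iam_role" "this" {
  name                 = local.role_name
  assume_role_policy   = data.aws_iam_policy_document.assume.json
  permissions_boundary = var.permissions_boundary_arn
  tags                 = local.tags

  # IAM role names are capped at 64 characters; fail at plan time rather than at apply.
  lifecycle {
    precondition {
      condition     = length(local.role_name) <= 64
      error_message = "The derived IAM role name exceeds 64 characters; shorten index_name or set role_name."
    }
  }
}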

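# Permissions the index itself requires: publish CloudWatch metrics + write logs.
data "aws_iam_policy_document" "service" {
  # kms_key_arn scopes the KMS statement below, so it must accompany kms_key_id.
  lifecycle {
    precondition {
      condition     = var.kms_key_id == null || var.kms_key_arn != null
      error_message = "kms_key_arn is required when kms_key_id is set."
    }
  }
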
  statement {
    sid       = "CloudWatchMetrics"
    effect    = "Allow"
    actions   = ["cloudwatch:PutMetricData"]
    resources = ["*"]

    condition {
      test     = "StringEquals"
      variable = "cloudwatch:namespace"
      values   = ["AWS/Kendra"]
    }
  }

  statement {
    sid       = "DescribeLogGroups"
    effect    = "Allow"
    actions   = ["logs:DescribeLogGroups"]
    resources = ["*"]
  }

  statement {
    sid       = "CreateLogGroup"
    effect    = "Allow"
    actions   = ["logs:CreateLogGroup"]
    resources = [local.log_group_arn_prefix]
  }

  statement {
    sid    = "LogStreams"
    effect = "Allow"
    actions = [
      "logs:DescribeLogStreams",
      "logs:CreateLogStream",
      "logs:PutLogEvents",
    ]
    resources = ["${local.log_group_arn_prefix}:log-stream:*"]
  }

  # Allow Kendra to use the customer-managed CMK for the index, when supplied.
  dynamic "statement" {
    for_each = var.kms_key_id != null ? [1] : []
    content {
      sid    = "UseKmsKey"
      effect = "Allow"
      actions = [
        "kms:Decrypt",
        "kms:GenerateDataKey",
        "kms:DescribeKey",
      ]
      resources = [var.kms_key_arn]
    }
  }

  # Optional: let the index read documents from named S3 buckets (for S3 data sources).
  dynamic "statement" {
    for_each = length(var.source_s3_bucket_arns) > 0 ? [1] : []
    content {
      sid       = "ReadSourceBuckets"
      effect    = "Allow"
      actions   = ["s3:GetObject"]
      resources = [for arn in var.source_s3_bucket_arns : "${arn}/*"]
    }
  }

  dynamic "statement" {
    for_each = length(var.source_s3_bucket_arns) > 0 ? [1] : []
    content {
      sid       = "ListSourceBuckets"
      effect    = "Allow"
      actions   = ["s3:ListBucket"]
      resources = var.source_s3_bucket_arns
    }
  }
}

resource "aws_iam_role_policy" "service" {
  name   = "${local.role_name}-policy"
  role   = aws_iam_role.this.id
  policy = data.aws_iam_policy_document.service.json
}

# ---------------------------------------------------------------------------
# Kendra index
# ---------------------------------------------------------------------------
resource "aws_kendra_index" "this" {
  name        = var.index_name
  description = var.description
  edition     = var.edition
  role_arn    = aws_iam_role.this.arn

  dynamic "server_side_encryption_configuration" {
    for_each = var.kms_key_id != null ? [1] : []
    content {
      kms_key_id = var.kms_key_id
    }
  }

  # Capacity units only apply to ENTERPRISE_EDITION; the for_each omits the block for other editions.
  dynamic "capacity_units" {
    for_each = var.edition == "ENTERPRISE_EDITION" ? [1] : []
    content {
      query_capacity_units   = var.query_capacity_units
      storage_capacity_units = var.storage_capacity_units
    }
  }

  # JSON user-token parsing: tells Kendra which token fields carry the user name and groups (used with USER_TOKEN).
  dynamic "user_token_configurations" {
    for_each = var.user_group_resolution_enabled ? [1] : []
    content {
      json_token_type_configuration {
        group_attribute_field     = var.group_attribute_field
        user_name_attribute_field = var.user_name_attribute_field
      }
    }
  }

  user_context_policy = var.user_context_policy

  tags = local.tags

  # The enterprise index is slow to create/update; give it room.
  timeouts {
    create = "60m"
    update = "60m"
    delete = "60m"
  }

  depends_on = [aws_iam_role_policy.service]
}
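The capacity_units block is dropped for DEVELOPER_EDITION without any message, so a non-zero capacity input on a developer index is simply ignored. One optional guard is a Terraform 1.5+ check block, which fits this module because versions.tf already requires >= 1.5.0. A check block raises a plan-time warning rather than an error. The sketch below is a hypothetical addition you could append to main.tf; it is not part of the module as published.

```hcl
# Hypothetical addition to main.tf: warn (not fail) when capacity inputs
# would be ignored because the index is not ENTERPRISE_EDITION.
check "capacity_units_require_enterprise" {
  assert {
    condition = (
      var.edition == "ENTERPRISE_EDITION" ||
      (var.query_capacity_units == 0 && var.storage_capacity_units == 0)
    )
    error_message = "query_capacity_units and storage_capacity_units are ignored unless edition = ENTERPRISE_EDITION."
  }
}
```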

variables.tf

variable "index_name" {
  description = "Name of the Kendra index. Also used to derive the IAM role name and tags."
  type        = string

  validation {
    condition     = can(regex("^[a-zA-Z0-9][a-zA-Z0-9_-]{0,999}$", var.index_name))
    error_message = "index_name must start alphanumeric and contain only letters, numbers, hyphens, or underscores."
  }
}

variable "description" {
  description = "Human-readable description of the index."
  type        = string
  default     = "Managed by Terraform"
}

variable "edition" {
  description = "Kendra edition: DEVELOPER_EDITION (PoC) or ENTERPRISE_EDITION (production HA)."
  type        = string
  default     = "ENTERPRISE_EDITION"

  validation {
    condition     = contains(["DEVELOPER_EDITION", "ENTERPRISE_EDITION"], var.edition)
    error_message = "edition must be DEVELOPER_EDITION or ENTERPRISE_EDITION."
  }
}

variable "query_capacity_units" {
  description = "Additional query capacity units (each adds ~0.1 queries/sec). Enterprise edition only."
  type        = number
  default     = 0

  validation {
    condition     = var.query_capacity_units >= 0 && floor(var.query_capacity_units) == var.query_capacity_units
    error_message = "query_capacity_units must be a non-negative integer."
  }
}

variable "storage_capacity_units" {
  description = "Additional storage capacity units (each adds ~100k documents / 30 GB). Enterprise edition only."
  type        = number
  default     = 0

  validation {
    condition     = var.storage_capacity_units >= 0 && floor(var.storage_capacity_units) == var.storage_capacity_units
    error_message = "storage_capacity_units must be a non-negative integer."
  }
}

variable "kms_key_id" {
  description = "Customer-managed KMS key ID/ARN for server-side encryption of the index. Null uses an AWS-owned key."
  type        = string
  default     = null
}

variable "kms_key_arn" {
  description = "Full ARN of the KMS key, used to scope the index role's KMS permissions. Required when kms_key_id is set."
  type        = string
  default     = null

  validation {
    condition     = var.kms_key_arn == null || can(regex("^arn:aws[a-z-]*:kms:", var.kms_key_arn))
    error_message = "kms_key_arn must be a valid KMS key ARN."
  }
}

variable "source_s3_bucket_arns" {
  description = "S3 bucket ARNs the index service role may read for S3 data sources (read access wired in IAM)."
  type        = list(string)
  default     = []
}

variable "user_context_policy" {
  description = "Document-level access control mode: ATTRIBUTE_FILTER (all content searchable; callers filter per query with user/group context) or USER_TOKEN (results filtered by the caller's user token)."
  type        = string
  default     = "ATTRIBUTE_FILTER"

  validation {
    condition     = contains(["ATTRIBUTE_FILTER", "USER_TOKEN"], var.user_context_policy)
    error_message = "user_context_policy must be ATTRIBUTE_FILTER or USER_TOKEN."
  }
}

variable "user_group_resolution_enabled" {
  description = "Configure JSON user-token parsing (user_token_configurations) so Kendra reads the user name and group fields from the token; pair with user_context_policy = USER_TOKEN."
  type        = bool
  default     = false
}

variable "group_attribute_field" {
  description = "Token field carrying group membership, used when user_group_resolution_enabled is true."
  type        = string
  default     = "groups"
}

variable "user_name_attribute_field" {
  description = "Token field carrying the user name, used when user_group_resolution_enabled is true."
  type        = string
  default     = "username"
}

variable "role_name" {
  description = "Override for the IAM service-role name. Defaults to <index_name>-kendra-role."
  type        = string
  default     = null
}

variable "permissions_boundary_arn" {
  description = "Optional IAM permissions boundary ARN applied to the Kendra service role."
  type        = string
  default     = null
}

variable "tags" {
  description = "Additional tags merged onto all created resources."
  type        = map(string)
  default     = {}
}
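Most inputs above are validated, but source_s3_bucket_arns accepts any string. An object ARN or a bare bucket name would therefore only fail later, inside IAM. The sketch below is a stricter declaration you could swap in for the existing one. The regex is an assumption that approximates S3 bucket-naming rules; it is not a complete check.

```hcl
variable "source_s3_bucket_arns" {
  description = "S3 bucket ARNs the index service role may read for S3 data sources (read access wired in IAM)."
  type        = list(string)
  default     = []

  # Each entry must be a bucket ARN (arn:<partition>:s3:::<bucket>), not an object ARN or a bare name.
  validation {
    condition = alltrue([
      for arn in var.source_s3_bucket_arns :
      can(regex("^arn:aws[a-z-]*:s3:::[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$", arn))
    ])
    error_message = "Each source_s3_bucket_arns entry must be an S3 bucket ARN such as arn:aws:s3:::my-docs-bucket."
  }
}
```

An empty list still passes, because alltrue([]) is true.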

outputs.tf

output "index_id" {
  description = "The identifier of the Kendra index (used by data sources, FAQs, and the Query API)."
  value       = aws_kendra_index.this.id
}

output "index_arn" {
  description = "The ARN of the Kendra index."
  value       = aws_kendra_index.this.arn
}

output "index_name" {
  description = "The name of the Kendra index."
  value       = aws_kendra_index.this.name
}

output "index_edition" {
  description = "The edition of the provisioned index."
  value       = aws_kendra_index.this.edition
}

output "role_arn" {
  description = "ARN of the IAM service role Kendra assumes (reuse it when attaching data sources)."
  value       = aws_iam_role.this.arn
}

output "role_name" {
  description = "Name of the IAM service role created for the index."
  value       = aws_iam_role.this.name
}

How to use it

module "kendra" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-kendra?ref=v1.0.0"

  index_name  = "kv-knowledge-base"
  description = "Semantic search over internal docs and policies"
  edition     = "ENTERPRISE_EDITION"

  # Customer-managed encryption at rest.
  kms_key_id  = aws_kms_key.kendra.key_id
  kms_key_arn = aws_kms_key.kendra.arn

  # Let the index read the docs bucket for an S3 data source.
  source_s3_bucket_arns = [aws_s3_bucket.docs.arn]

  # Modest extra capacity beyond the base enterprise unit.
  query_capacity_units   = 1
  storage_capacity_units = 1

  # Document-level security driven by identity tokens.
  user_context_policy           = "USER_TOKEN"
  user_group_resolution_enabled = true

  tags = {
    Environment = "prod"
    Team        = "platform"
    CostCenter  = "search"
  }
}

# Downstream: attach an S3 data source to the index using its outputs.
resource "aws_kendra_data_source" "docs" {
  index_id = module.kendra.index_id
  name     = "internal-docs"
  type     = "S3"
  role_arn = module.kendra.role_arn

  configuration {
    s3_configuration {
      bucket_name = aws_s3_bucket.docs.id
    }
  }

  schedule = "cron(0 2 * * ? *)" # nightly re-crawl at 02:00 UTC
}

# Downstream: an SSM parameter so apps can discover the index ID.
resource "aws_ssm_parameter" "kendra_index_id" {
  name  = "/search/kendra/index-id"
  type  = "String"
  value = module.kendra.index_id
}
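The example above has one gap. The module's role can read the docs bucket, but an S3 data source also pushes documents into the index. Per the Kendra S3 connector IAM guidance, that requires kendra:BatchPutDocument and kendra:BatchDeleteDocument on the index. The minimal sketch below grants them as a separate, index-scoped policy on the module's role. It assumes the module block is named kendra; the policy and data-source names are illustrative.

```hcl
# Separate, index-scoped grant so the S3 data source can write into the index.
data "aws_iam_policy_document" "docs_sync" {
  statement {
    sid       = "SyncDocumentsIntoIndex"
    effect    = "Allow"
    actions   = ["kendra:BatchPutDocument", "kendra:BatchDeleteDocument"]
    resources = [module.kendra.index_arn]
  }
}

resource "aws_iam_role_policy" "docs_sync" {
  name   = "kendra-docs-sync"
  role   = module.kendra.role_name
  policy = data.aws_iam_policy_document.docs_sync.json
}
```

To make sure the grant exists before the first crawl, add depends_on = [aws_iam_role_policy.docs_sync] to aws_kendra_data_source.docs.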

With Terragrunt

Terragrunt keeps this module DRY across environments — define the backend and provider once in a root config, then a thin terragrunt.hcl per environment supplies only the inputs that differ.

1. Root config — live/terragrunt.hcl (inherited by every module):

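remote_state {
  backend  = "s3"
  generate = { path = "backend.tf", if_exists = "overwrite" }
  config = {
    bucket  = "..." # S3 state bucket
    key     = "${path_relative_to_include()}/terraform.tfstate" # one state key per module path
    region  = "..."
    encrypt = true
  }
}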
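The root config above generates only backend.tf. Terragrunt's generate block can write the provider once as well. Below is a minimal sketch to append to live/terragrunt.hcl, assuming every environment deploys to us-east-1 (the Quickstart's region).

```hcl
# Appended to live/terragrunt.hcl: writes provider.tf into every module Terragrunt runs.
generate "provider" {
  path      = "provider.tf"
  if_exists = "overwrite_terragrunt"
  contents  = <<EOF
provider "aws" {
  region = "us-east-1"
}
EOF
}
```

With if_exists = "overwrite_terragrunt", Terragrunt replaces only files it generated itself. If it finds a hand-written provider.tf, it errors out rather than clobbering it.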

2. Module config — live/prod/kendra/terragrunt.hcl:

include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-kendra?ref=v1.0.0"
}

inputs = {
  index_name = "..."
}

3. Deploy one environment, or roll out all modules together:

cd live/prod/kendra && terragrunt apply   # this module
cd live/prod && terragrunt run-all apply  # every module under live/prod

Why Terragrunt here: the backend and provider live in one place instead of being copy-pasted into every module. Each environment (dev / stage / prod) overrides only its inputs block without forking the module, and run-all orchestrates dependencies across modules. Reach for it once you have more than one environment or more than a handful of modules. For a single stack, the plain Quickstart above is enough.
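As a concrete per-environment override, a dev environment might reuse the same source and change only what differs. The sketch below is a hypothetical live/dev/kendra/terragrunt.hcl; the index name and tag values are illustrative.

```hcl
# live/dev/kendra/terragrunt.hcl (hypothetical): same module, cheaper settings.
include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-kendra?ref=v1.0.0"
}

inputs = {
  index_name = "kv-knowledge-base-dev"
  edition    = "DEVELOPER_EDITION" # capacity_units are not sent for this edition

  tags = {
    Environment = "dev"
  }
}
```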

Inputs

Name	Type	Default	Required	Description
index_name	string	—	yes	Name of the Kendra index; also derives the IAM role name and tags.
description	string	"Managed by Terraform"	no	Human-readable description of the index.
edition	string	"ENTERPRISE_EDITION"	no	`DEVELOPER_EDITION` (PoC) or `ENTERPRISE_EDITION` (production HA).
query_capacity_units	number	0	no	Extra query capacity units (enterprise only); each adds ~0.1 QPS.
storage_capacity_units	number	0	no	Extra storage capacity units (enterprise only); each adds ~100k docs / 30 GB.
kms_key_id	string	null	no	Customer-managed KMS key ID/ARN for server-side encryption; null uses an AWS-owned key.
kms_key_arn	string	null	no	Full KMS key ARN used to scope the role’s KMS permissions; required when `kms_key_id` is set.
source_s3_bucket_arns	list(string)	[]	no	S3 bucket ARNs the index role may read for S3 data sources.
user_context_policy	string	"ATTRIBUTE_FILTER"	no	Document-level access mode: `ATTRIBUTE_FILTER` or `USER_TOKEN`.
user_group_resolution_enabled	bool	false	no	Configure JSON user-token parsing (user name and group fields) for `USER_TOKEN` filtering.
group_attribute_field	string	"groups"	no	Token field carrying group membership (used when resolution is enabled).
user_name_attribute_field	string	"username"	no	Token field carrying the user name (used when resolution is enabled).
role_name	string	null	no	Override the IAM service-role name; defaults to `<index_name>-kendra-role`.
permissions_boundary_arn	string	null	no	Optional IAM permissions boundary ARN for the service role.
tags	map(string)	{}	no	Additional tags merged onto all created resources.

Outputs

Name	Description
index_id	Identifier of the Kendra index (used by data sources, FAQs, and the Query API).
index_arn	ARN of the Kendra index.
index_name	Name of the Kendra index.
index_edition	Edition of the provisioned index.
role_arn	ARN of the IAM service role Kendra assumes (reuse when attaching data sources).
role_name	Name of the IAM service role created for the index.

Enterprise scenario

A financial-services firm replaces its sprawling internal wiki search with Kendra-backed semantic search across SharePoint policies, an S3 archive of PDF procedures, and a ServiceNow knowledge base. This module provisions the ENTERPRISE_EDITION index in each region with a customer-managed CMK (so encryption keys stay under the InfoSec team’s control) and USER_TOKEN document-level security, ensuring an analyst only sees passages from documents their AD groups are entitled to. The index ID flows out to SSM Parameter Store, where the firm’s internal “Ask Compliance” chatbot reads it at runtime to call the Query API — no console clicks, every change reviewed in a pull request, and capacity sized via the query_capacity_units variable as adoption grows.
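KMS keys are regional, so "in each region" means one CMK and one module instance per region. The sketch below shows that layout using provider aliases. It assumes two regions where Kendra is available (us-east-1 and us-west-2 are illustrative choices) and uses hypothetical resource and index names. The key policies are left at the AWS default, which is also an assumption; tighten them to InfoSec's requirements.

```hcl
provider "aws" {
  alias  = "use1"
  region = "us-east-1"
}

provider "aws" {
  alias  = "usw2"
  region = "us-west-2"
}

# One customer-managed key per region (KMS keys cannot span regions here).
resource "aws_kms_key" "kendra_use1" {
  provider            = aws.use1
  description         = "Kendra index encryption (us-east-1)"
  enable_key_rotation = true
}

resource "aws_kms_key" "kendra_usw2" {
  provider            = aws.usw2
  description         = "Kendra index encryption (us-west-2)"
  enable_key_rotation = true
}

module "kendra_use1" {
  source    = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-kendra?ref=v1.0.0"
  providers = { aws = aws.use1 }

  index_name                    = "compliance-use1-kb"
  kms_key_id                    = aws_kms_key.kendra_use1.key_id
  kms_key_arn                   = aws_kms_key.kendra_use1.arn
  user_context_policy           = "USER_TOKEN"
  user_group_resolution_enabled = true
}

module "kendra_usw2" {
  source    = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-kendra?ref=v1.0.0"
  providers = { aws = aws.usw2 }

  index_name                    = "compliance-usw2-kb"
  kms_key_id                    = aws_kms_key.kendra_usw2.key_id
  kms_key_arn                   = aws_kms_key.kendra_usw2.arn
  user_context_policy           = "USER_TOKEN"
  user_group_resolution_enabled = true
}
```

Distinct index names per region also keep the derived role names unique. That matters because IAM roles are global to the account.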

Best practices

Always set a customer-managed CMK (kms_key_id + kms_key_arn) for production indexes — Kendra stores full document content, and the module scopes the role’s kms:Decrypt/GenerateDataKey only to that one key rather than *.
Mind the cost floor: an enterprise index bills per hour the moment it exists, before any capacity add-ons. Use DEVELOPER_EDITION for PoCs, destroy idle stacks, and treat each query_capacity_units/storage_capacity_units bump as a deliberate, reviewed change.
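Use user_context_policy = "USER_TOKEN" for indexes containing sensitive data, so results are filtered against source-system ACLs using the caller's identity token. ATTRIBUTE_FILTER leaves every document searchable unless the calling application passes user and group context on each query. Rely on it only when you trust that application to do so.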
Keep the trust policy tight: the bundled assume-role policy pins aws:SourceAccount and aws:SourceArn to defeat the confused-deputy problem — keep those conditions when extending the role.
Reuse the module’s role_arn for data sources instead of minting a new role per connector. Grant connector-specific permissions as separate scoped policies rather than widening this one, for example kendra:BatchPutDocument/BatchDeleteDocument on the index for S3 syncs, or access to SharePoint secrets.
Name and tag consistently: the index name seeds the role name and the Name/ManagedBy tags. A clear convention like <app>-<env>-kb keeps indexes, roles, and CloudWatch AWS/Kendra metrics easy to correlate and bill back. Keep names short, because the derived <index_name>-kendra-role must fit IAM's 64-character role-name limit.