
Terraform Module: AWS Kendra — a governed enterprise search index in one block

Quick take — Provision an AWS Kendra index with Terraform: enterprise-edition capacity, KMS server-side encryption, a least-privilege IAM service role, and CloudWatch metrics — wrapped in a reusable, var-driven module. New here? Jump to the Quickstart below to deploy it in minutes; read on for how it works and when to reach for it.

Quickstart (copy-paste)

Minimal, runnable configuration — drop this in a .tf file and fill in the "..." placeholders (each required input is commented):

provider "aws" {
  region = "us-east-1"
}

module "kendra" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-kendra?ref=v1.0.0"

  index_name = "..."  # Name of the Kendra index; also derives the IAM role nam…
}

Then terraform init && terraform apply. Every other input has a sensible default — see Inputs below to override behaviour.

What this module is

Amazon Kendra is a managed, ML-powered enterprise search service. Instead of keyword matching, it ranks results by semantic relevance, understands natural-language questions, and surfaces FAQ-style answers and document excerpts across connected repositories (S3, SharePoint, Confluence, RDS, ServiceNow, and more). The atom you provision first is the index, aws_kendra_index: a long-lived, capacity-billed resource that holds your ingested documents and serves the Query API. Data sources, FAQs, and experiences all attach to an index, so getting the index right (edition, encryption, IAM role, capacity units) is the foundation everything else stands on.

Kendra is unusual among AWS services because the index is expensive to leave running by accident and slow to create (an ENTERPRISE_EDITION index can take 30+ minutes to provision). That makes click-ops a poor fit: you want the edition, KMS key, role, and capacity codified, reviewed in a pull request, and reproducible across dev/stage/prod. This module wraps aws_kendra_index plus the two things you almost always need alongside it: a scoped IAM service role (so Kendra can write CloudWatch logs and metrics and, optionally, read your S3 buckets and use your KMS key) and server-side encryption with a customer-managed KMS key. Both sit behind a small, validated variable surface.

When to use it

Module structure

terraform-module-aws-kendra/
├── versions.tf
├── main.tf
├── variables.tf
└── outputs.tf

versions.tf

terraform {
  required_version = ">= 1.5.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

main.tf

data "aws_caller_identity" "current" {}
data "aws_partition" "current" {}
data "aws_region" "current" {}

locals {
  account_id = data.aws_caller_identity.current.account_id
  partition  = data.aws_partition.current.partition
  region     = data.aws_region.current.name

  # Kendra requires a service role that it can assume.
  role_name = coalesce(var.role_name, "${var.index_name}-kendra-role")

  # CloudWatch log group ARN scope Kendra needs for metrics/logging.
  log_group_arn_prefix = "arn:${local.partition}:logs:${local.region}:${local.account_id}:log-group:/aws/kendra/*"

  tags = merge(
    {
      "Name"      = var.index_name
      "ManagedBy" = "terraform"
    },
    var.tags,
  )
}

# ---------------------------------------------------------------------------
# IAM service role assumed by Kendra
# ---------------------------------------------------------------------------
data "aws_iam_policy_document" "assume" {
  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRole"]

    principals {
      type        = "Service"
      identifiers = ["kendra.amazonaws.com"]
    }

    # Confused-deputy protection: only this account's Kendra can assume the role.
    condition {
      test     = "StringEquals"
      variable = "aws:SourceAccount"
      values   = [local.account_id]
    }

    condition {
      test     = "ArnLike"
      variable = "aws:SourceArn"
      values   = ["arn:${local.partition}:kendra:${local.region}:${local.account_id}:index/*"]
    }
  }
}

resource "aws_iam_role" "this" {
  name                 = local.role_name
  assume_role_policy   = data.aws_iam_policy_document.assume.json
  permissions_boundary = var.permissions_boundary_arn
  tags                 = local.tags
}

# Permissions the index itself requires: publish CloudWatch metrics + write logs.
data "aws_iam_policy_document" "service" {
  statement {
    sid       = "CloudWatchMetrics"
    effect    = "Allow"
    actions   = ["cloudwatch:PutMetricData"]
    resources = ["*"]

    condition {
      test     = "StringEquals"
      variable = "cloudwatch:namespace"
      values   = ["AWS/Kendra"]
    }
  }

  statement {
    sid       = "DescribeLogGroups"
    effect    = "Allow"
    actions   = ["logs:DescribeLogGroups"]
    resources = ["*"]
  }

  statement {
    sid       = "CreateLogGroup"
    effect    = "Allow"
    actions   = ["logs:CreateLogGroup"]
    resources = [local.log_group_arn_prefix]
  }

  statement {
    sid    = "LogStreams"
    effect = "Allow"
    actions = [
      "logs:DescribeLogStreams",
      "logs:CreateLogStream",
      "logs:PutLogEvents",
    ]
    resources = ["${local.log_group_arn_prefix}:log-stream:*"]
  }

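  # Allow Kendra to use the customer-managed KMS key for the index, when supplied.
  # Keyed on kms_key_arn so the statement never renders with a null resource.
  dynamic "statement" {
    for_each = var.kms_key_arn != null ? [var.kms_key_arn] : []
    content {
      sid    = "UseKmsKey"
      effect = "Allow"
      actions = [
        "kms:Decrypt",
        "kms:GenerateDataKey",
        "kms:DescribeKey",
      ]
      resources = [statement.value]
    }
  }

  # Fail the plan early if encryption is requested without the ARN needed above.
  lifecycle {
    precondition {
      condition     = var.kms_key_id == null || var.kms_key_arn != null
      error_message = "kms_key_arn is required when kms_key_id is set."
    }
  }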

  # Optional: let the index read documents from named S3 buckets (for S3 data sources).
  dynamic "statement" {
    for_each = length(var.source_s3_bucket_arns) > 0 ? [1] : []
    content {
      sid       = "ReadSourceBuckets"
      effect    = "Allow"
      actions   = ["s3:GetObject"]
      resources = [for arn in var.source_s3_bucket_arns : "${arn}/*"]
    }
  }

  dynamic "statement" {
    for_each = length(var.source_s3_bucket_arns) > 0 ? [1] : []
    content {
      sid       = "ListSourceBuckets"
      effect    = "Allow"
      actions   = ["s3:ListBucket"]
      resources = var.source_s3_bucket_arns
    }
  }
}

resource "aws_iam_role_policy" "service" {
  name   = "${local.role_name}-policy"
  role   = aws_iam_role.this.id
  policy = data.aws_iam_policy_document.service.json
}

# ---------------------------------------------------------------------------
# Kendra index
# ---------------------------------------------------------------------------
resource "aws_kendra_index" "this" {
  name        = var.index_name
  description = var.description
  edition     = var.edition
  role_arn    = aws_iam_role.this.arn

  dynamic "server_side_encryption_configuration" {
    for_each = var.kms_key_id != null ? [1] : []
    content {
      kms_key_id = var.kms_key_id
    }
  }

  # Capacity units only apply to ENTERPRISE_EDITION, so the block is emitted only for that edition.
  dynamic "capacity_units" {
    for_each = var.edition == "ENTERPRISE_EDITION" ? [1] : []
    content {
      query_capacity   = var.query_capacity_units
      storage_capacity = var.storage_capacity_units
    }
  }

  # Field mappings for user/group access control (document-level security).
  dynamic "user_token_configurations" {
    for_each = var.user_group_resolution_enabled ? [1] : []
    content {
      json_token_type_configuration {
        group_attribute_field     = var.group_attribute_field
        user_name_attribute_field = var.user_name_attribute_field
      }
    }
  }

  user_context_policy = var.user_context_policy

  tags = local.tags

  # The enterprise index is slow to create/update; give it room.
  timeouts {
    create = "60m"
    update = "60m"
    delete = "60m"
  }

  depends_on = [aws_iam_role_policy.service]
}

variables.tf

variable "index_name" {
  description = "Name of the Kendra index. Also used to derive the IAM role name and tags."
  type        = string

  validation {
    condition     = can(regex("^[a-zA-Z0-9][a-zA-Z0-9_-]{0,999}$", var.index_name))
    error_message = "index_name must start alphanumeric and contain only letters, numbers, hyphens, or underscores."
  }
}

variable "description" {
  description = "Human-readable description of the index."
  type        = string
  default     = "Managed by Terraform"
}

variable "edition" {
  description = "Kendra edition: DEVELOPER_EDITION (PoC) or ENTERPRISE_EDITION (production HA)."
  type        = string
  default     = "ENTERPRISE_EDITION"

  validation {
    condition     = contains(["DEVELOPER_EDITION", "ENTERPRISE_EDITION"], var.edition)
    error_message = "edition must be DEVELOPER_EDITION or ENTERPRISE_EDITION."
  }
}

variable "query_capacity_units" {
  description = "Additional query capacity units (each adds ~0.1 queries/sec). Enterprise edition only."
  type        = number
  default     = 0

  validation {
    condition     = var.query_capacity_units >= 0 && floor(var.query_capacity_units) == var.query_capacity_units
    error_message = "query_capacity_units must be a non-negative integer."
  }
}

variable "storage_capacity_units" {
  description = "Additional storage capacity units (each adds ~100k documents / 30 GB). Enterprise edition only."
  type        = number
  default     = 0

  validation {
    condition     = var.storage_capacity_units >= 0 && floor(var.storage_capacity_units) == var.storage_capacity_units
    error_message = "storage_capacity_units must be a non-negative integer."
  }
}

variable "kms_key_id" {
  description = "Customer-managed KMS key ID/ARN for server-side encryption of the index. Null uses an AWS-owned key."
  type        = string
  default     = null
}

variable "kms_key_arn" {
  description = "Full ARN of the KMS key, used to scope the index role's KMS permissions. Required when kms_key_id is set."
  type        = string
  default     = null

  validation {
    condition     = var.kms_key_arn == null || can(regex("^arn:aws[a-z-]*:kms:", var.kms_key_arn))
    error_message = "kms_key_arn must be a valid KMS key ARN."
  }
}

variable "source_s3_bucket_arns" {
  description = "S3 bucket ARNs the index service role may read for S3 data sources (read access wired in IAM)."
  type        = list(string)
  default     = []
}

variable "user_context_policy" {
  description = "Document-level access control mode: ATTRIBUTE_FILTER or USER_TOKEN (the two values the Kendra API accepts)."
  type        = string
  default     = "ATTRIBUTE_FILTER"

  validation {
    condition     = contains(["ATTRIBUTE_FILTER", "USER_TOKEN"], var.user_context_policy)
    error_message = "user_context_policy must be ATTRIBUTE_FILTER or USER_TOKEN."
  }
}

variable "user_group_resolution_enabled" {
  description = "Enable JSON user-token group resolution (maps identity attributes to ACL filtering)."
  type        = bool
  default     = false
}

variable "group_attribute_field" {
  description = "Token field carrying group membership, used when user_group_resolution_enabled is true."
  type        = string
  default     = "groups"
}

variable "user_name_attribute_field" {
  description = "Token field carrying the user name, used when user_group_resolution_enabled is true."
  type        = string
  default     = "username"
}

variable "role_name" {
  description = "Override for the IAM service-role name. Defaults to <index_name>-kendra-role."
  type        = string
  default     = null
}

variable "permissions_boundary_arn" {
  description = "Optional IAM permissions boundary ARN applied to the Kendra service role."
  type        = string
  default     = null
}

variable "tags" {
  description = "Additional tags merged onto all created resources."
  type        = map(string)
  default     = {}
}

outputs.tf

output "index_id" {
  description = "The identifier of the Kendra index (used by data sources, FAQs, and the Query API)."
  value       = aws_kendra_index.this.id
}

output "index_arn" {
  description = "The ARN of the Kendra index."
  value       = aws_kendra_index.this.arn
}

output "index_name" {
  description = "The name of the Kendra index."
  value       = aws_kendra_index.this.name
}

output "index_edition" {
  description = "The edition of the provisioned index."
  value       = aws_kendra_index.this.edition
}

output "role_arn" {
  description = "ARN of the IAM service role Kendra assumes (reuse it when attaching data sources)."
  value       = aws_iam_role.this.arn
}

output "role_name" {
  description = "Name of the IAM service role created for the index."
  value       = aws_iam_role.this.name
}

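The validation blocks in variables.tf can be exercised without touching AWS. The following is a minimal sketch, assuming Terraform 1.7 or later (newer than the module's >= 1.5.0 floor, because it relies on terraform test and mock_provider) and a hypothetical file at tests/validation.tftest.hcl inside the module directory. The index name is a placeholder.

```hcl
# tests/validation.tftest.hcl (hypothetical path; assumes Terraform >= 1.7)

# Mock the AWS provider so no credentials or real API calls are needed.
mock_provider "aws" {}

run "rejects_unknown_edition" {
  command = plan

  variables {
    index_name = "kv-test-index"    # placeholder name
    edition    = "STANDARD_EDITION" # deliberately not an allowed value
  }

  # The plan is expected to fail on the edition variable's validation rule.
  expect_failures = [var.edition]
}

run "rejects_negative_query_capacity" {
  command = plan

  variables {
    index_name           = "kv-test-index"
    query_capacity_units = -1
  }

  # The plan is expected to fail on the non-negative integer rule.
  expect_failures = [var.query_capacity_units]
}
```

From the module directory, terraform init followed by terraform test should run both cases.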
How to use it

module "kendra" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-kendra?ref=v1.0.0"

  index_name  = "kv-knowledge-base"
  description = "Semantic search over internal docs and policies"
  edition     = "ENTERPRISE_EDITION"

  # Customer-managed encryption at rest.
  kms_key_id  = aws_kms_key.kendra.key_id
  kms_key_arn = aws_kms_key.kendra.arn

  # Let the index read the docs bucket for an S3 data source.
  source_s3_bucket_arns = [aws_s3_bucket.docs.arn]

  # Modest extra capacity beyond the base enterprise unit.
  query_capacity_units   = 1
  storage_capacity_units = 1

  # Document-level security driven by identity tokens.
  user_context_policy           = "USER_TOKEN"
  user_group_resolution_enabled = true

  tags = {
    Environment = "prod"
    Team        = "platform"
    CostCenter  = "search"
  }
}

# Downstream: attach an S3 data source to the index using its outputs.
resource "aws_kendra_data_source" "docs" {
  index_id = module.kendra.index_id
  name     = "internal-docs"
  type     = "S3"
  role_arn = module.kendra.role_arn

  configuration {
    s3_configuration {
      bucket_name = aws_s3_bucket.docs.id
    }
  }

  schedule = "cron(0 2 * * ? *)" # nightly re-crawl at 02:00 UTC
}

# Downstream: an SSM parameter so apps can discover the index ID.
resource "aws_ssm_parameter" "kendra_index_id" {
  name  = "/search/kendra/index-id"
  type  = "String"
  value = module.kendra.index_id
}
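The example above references aws_kms_key.kendra and aws_s3_bucket.docs without defining them. A minimal sketch of those two resources, assuming they live in the same root configuration, could look like this. The bucket name, key description, and key settings are illustrative placeholders, not values taken from the module.

```hcl
# Customer-managed KMS key used for the index's server-side encryption.
resource "aws_kms_key" "kendra" {
  description             = "Encryption key for the Kendra index" # placeholder text
  enable_key_rotation     = true
  deletion_window_in_days = 30
}

# Bucket holding the documents that the S3 data source crawls.
resource "aws_s3_bucket" "docs" {
  bucket = "example-kendra-docs" # hypothetical name; bucket names must be globally unique
}
```

With no explicit key policy, the key gets the default policy, which delegates access decisions to IAM in the owning account. That is what lets the module's role policy grant Kendra use of the key.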

With Terragrunt

Terragrunt keeps this module DRY across environments — define the backend and provider once in a root config, then a thin terragrunt.hcl per environment supplies only the inputs that differ.

1. Root config, live/terragrunt.hcl (inherited by every module):

remote_state {
  backend = "s3"
  generate = { path = "backend.tf", if_exists = "overwrite" }
  config = {
    # ...s3 state bucket/container + key per path...
  }
}
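The root config above shows only the remote state. A provider block can be generated from the same file so that it is also defined once. This is a sketch assuming a single region for every environment; the region and the generated file name are assumptions.

```hcl
# live/terragrunt.hcl (continued): write a provider.tf into every module.
generate "provider" {
  path      = "provider.tf"
  if_exists = "overwrite_terragrunt"
  contents  = <<EOF
provider "aws" {
  region = "us-east-1" # assumed region; adjust per deployment
}
EOF
}
```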

2. Module config, live/prod/kendra/terragrunt.hcl:

include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-kendra?ref=v1.0.0"
}

inputs = {
  index_name = "..."
}

3. Deploy one environment, or roll out all modules together:

(cd live/prod/kendra && terragrunt apply)     # this module only
(cd live/prod && terragrunt run-all apply)    # every module under live/prod

Why Terragrunt here: the backend and provider live in one place instead of being copy-pasted into every module; inputs is overridden per environment (dev / stage / prod) without forking the module; and run-all orchestrates dependencies across modules. Reach for it once you have more than one environment or more than a handful of modules — for a single stack, the plain Quickstart above is enough.
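As an illustration of per-environment overrides, a dev counterpart to the prod config might look like the sketch below. The path live/dev/kendra/terragrunt.hcl and the input values are hypothetical; DEVELOPER_EDITION is chosen on the assumption that dev does not need production capacity.

```hcl
# live/dev/kendra/terragrunt.hcl (hypothetical path)
include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-kendra?ref=v1.0.0"
}

inputs = {
  index_name = "kv-knowledge-base-dev" # placeholder name
  edition    = "DEVELOPER_EDITION"     # PoC edition; capacity units are not applied

  tags = {
    Environment = "dev"
  }
}
```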

Inputs

| Name | Type | Default | Required | Description |
| --- | --- | --- | --- | --- |
| index_name | string | (none) | yes | Name of the Kendra index; also derives the IAM role name and tags. |
| description | string | "Managed by Terraform" | no | Human-readable description of the index. |
| edition | string | "ENTERPRISE_EDITION" | no | DEVELOPER_EDITION (PoC) or ENTERPRISE_EDITION (production HA). |
| query_capacity_units | number | 0 | no | Extra query capacity units (enterprise only); each adds ~0.1 QPS. |
| storage_capacity_units | number | 0 | no | Extra storage capacity units (enterprise only); each adds ~100k docs / 30 GB. |
| kms_key_id | string | null | no | Customer-managed KMS key ID/ARN for server-side encryption; null uses an AWS-owned key. |
| kms_key_arn | string | null | no | Full KMS key ARN used to scope the role's KMS permissions; required when kms_key_id is set. |
| source_s3_bucket_arns | list(string) | [] | no | S3 bucket ARNs the index role may read for S3 data sources. |
| user_context_policy | string | "ATTRIBUTE_FILTER" | no | Document-level access mode: ATTRIBUTE_FILTER or USER_TOKEN. |
| user_group_resolution_enabled | bool | false | no | Enable JSON user-token group resolution for ACL filtering. |
| group_attribute_field | string | "groups" | no | Token field carrying group membership (used when resolution is enabled). |
| user_name_attribute_field | string | "username" | no | Token field carrying the user name (used when resolution is enabled). |
| role_name | string | null | no | Override the IAM service-role name; defaults to <index_name>-kendra-role. |
| permissions_boundary_arn | string | null | no | Optional IAM permissions boundary ARN for the service role. |
| tags | map(string) | {} | no | Additional tags merged onto all created resources. |

Outputs

| Name | Description |
| --- | --- |
| index_id | Identifier of the Kendra index (used by data sources, FAQs, and the Query API). |
| index_arn | ARN of the Kendra index. |
| index_name | Name of the Kendra index. |
| index_edition | Edition of the provisioned index. |
| role_arn | ARN of the IAM service role Kendra assumes (reuse when attaching data sources). |
| role_name | Name of the IAM service role created for the index. |
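One way to consume index_arn is to grant an application permission to query the index. The sketch below assumes a role named aws_iam_role.app that is defined elsewhere (hypothetical, not part of this module); kendra:Query is the IAM action behind the Query API.

```hcl
# Policy allowing queries against this one index only.
data "aws_iam_policy_document" "kendra_query" {
  statement {
    sid       = "QueryKendraIndex"
    effect    = "Allow"
    actions   = ["kendra:Query"]
    resources = [module.kendra.index_arn]
  }
}

# Attach the policy to the application's role.
resource "aws_iam_role_policy" "app_kendra_query" {
  name   = "kendra-query"
  role   = aws_iam_role.app.id # hypothetical application role
  policy = data.aws_iam_policy_document.kendra_query.json
}
```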

Enterprise scenario

A financial-services firm replaces its sprawling internal wiki search with Kendra-backed semantic search across SharePoint policies, an S3 archive of PDF procedures, and a ServiceNow knowledge base. This module provisions the ENTERPRISE_EDITION index in each region with a customer-managed KMS key (so encryption keys stay under the InfoSec team's control) and USER_TOKEN document-level security, ensuring an analyst only sees passages from documents their AD groups are entitled to. The index ID flows out to SSM Parameter Store, where the firm's internal "Ask Compliance" chatbot reads it at runtime to call the Query API. There are no console clicks, every change is reviewed in a pull request, and capacity is sized via the query_capacity_units variable as adoption grows.
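Provisioning "in each region" could be sketched with provider aliases and one module call per region. The regions, alias names, index names, and the per-region KMS keys referenced here (aws_kms_key.kendra_use1, aws_kms_key.kendra_euw1) are assumptions for illustration and would need to be defined with the matching provider alias. Because IAM role names are unique across the whole account rather than per region, each call needs a distinct role name; in this sketch that follows from the distinct index_name values.

```hcl
provider "aws" {
  alias  = "use1"
  region = "us-east-1"
}

provider "aws" {
  alias  = "euw1"
  region = "eu-west-1"
}

module "kendra_use1" {
  source    = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-kendra?ref=v1.0.0"
  providers = { aws = aws.use1 }

  index_name                    = "compliance-search-use1" # placeholder; also derives the role name
  user_context_policy           = "USER_TOKEN"
  user_group_resolution_enabled = true
  kms_key_id                    = aws_kms_key.kendra_use1.key_id # hypothetical regional key
  kms_key_arn                   = aws_kms_key.kendra_use1.arn
}

module "kendra_euw1" {
  source    = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-kendra?ref=v1.0.0"
  providers = { aws = aws.euw1 }

  index_name                    = "compliance-search-euw1" # placeholder; also derives the role name
  user_context_policy           = "USER_TOKEN"
  user_group_resolution_enabled = true
  kms_key_id                    = aws_kms_key.kendra_euw1.key_id # hypothetical regional key
  kms_key_arn                   = aws_kms_key.kendra_euw1.arn
}
```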

Best practices

