Terraform Module: Azure AI Search — private, identity-bound search clusters in one call

Quick take — Reusable Terraform module for azurerm_search_service on azurerm ~> 4.0: tuned SKU, replica/partition capacity, RBAC-only auth, system-assigned identity, and a locked-down private endpoint. New here? Jump to the Quickstart below to deploy it in minutes; read on for how it works and when to reach for it.

Quickstart (copy-paste)

Minimal, runnable configuration — drop this in a .tf file and fill in the "..." placeholders (each required input is commented):

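provider "azurerm" {
  features {}
  # azurerm 4.x requires a subscription: set subscription_id here
  # or export ARM_SUBSCRIPTION_ID before running Terraform.
}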

module "search_service" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-search-service?ref=v1.0.0"

  name                = "..."  # Globally unique service name; 2-60 lowercase alphanumer…
  resource_group_name = "..."  # Resource group for the service and private endpoint.
  location            = "..."  # Azure region (e.g. `centralindia`).
}

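Then run terraform init && terraform apply. Every other input has a sensible default; see Inputs below to override behaviour. Note that the defaults disable public network access and create no private endpoint, so set private_endpoint_subnet_id (or enable public access with allowed_ips) before clients need to reach the service.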

What this module is

Azure AI Search (formerly Cognitive Search) is a managed search-as-a-service: you push documents into an index and it gives you full-text search, faceting, vector search, and semantic ranking over them through a query API. It’s the retrieval engine behind site search, product catalogs, and, increasingly, the R in RAG, where it stores embeddings and serves the nearest-neighbour chunks your LLM grounds on.

The trouble is that a production-grade search service is never just azurerm_search_service { sku = "standard" }. You need replicas sized for query throughput and SLA, partitions sized for index size, an identity so indexers can reach Blob/SQL/Cosmos without keys, RBAC-only authentication so nobody is passing admin keys around, and a private endpoint so the service never answers on its public endpoint. Wrapping all of that in a module means every search service across your estate is born compliant, with the same naming, the same network posture, and the same diagnostic wiring, instead of each team rediscovering the hard way that sku is immutable.

When to use it

You’re standing up RAG / vector search backends and want the embedding store on a private endpoint, reachable only from your AKS or App Service VNet.
You run multiple search services (dev/test/prod, or one per product line) and want identical capacity, auth, and network rules from a single definition.
You need an indexer to pull from Blob, ADLS, SQL, or Cosmos using a managed identity instead of connection strings or admin keys.
You must satisfy a control that says “no public network access” and “local/key auth disabled” on data services — this module makes both the default.
You want the 3-replica / SLA-eligible topology encoded once, so nobody ships a single-replica “prod” service that has no read SLA.

Module structure

terraform-module-azure-search-service/
├── versions.tf      # provider + Terraform version pins
├── main.tf          # azurerm_search_service + private endpoint + diagnostics
├── variables.tf     # var-driven inputs with validation
└── outputs.tf       # id, name, endpoint, identity principal, key attrs

versions.tf

terraform {
  required_version = ">= 1.5.0"

  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 4.0"
    }
  }
}

main.tf

locals {
  # AI Search only supports 1, 2, 3, 4, 6, or 12 partitions.
  valid_partition_counts = [1, 2, 3, 4, 6, 12]

  # True for standard and above. Free and basic are capped on replicas
  # (free runs a single one, basic at most 3); the precondition below
  # enforces the cap.
  is_billable_tier = !contains(["free", "basic"], lower(var.sku))
}

resource "azurerm_search_service" "this" {
  name                = var.name
  resource_group_name = var.resource_group_name
  location            = var.location
  sku                 = lower(var.sku) # validation accepts any casing; the provider expects lowercase

  replica_count   = var.replica_count
  partition_count = var.partition_count

  # Hosting mode "highDensity" is only valid for the standard3 SKU and lets a
  # single service hold up to 1,000 indexes. Anything else must be "default".
  hosting_mode = var.hosting_mode

  # Security posture: turn the public endpoint off and force AAD/RBAC.
  public_network_access_enabled = var.public_network_access_enabled

  # "false" disables admin/query API keys entirely (RBAC only).
  # "true" allows both keys and RBAC. We default to RBAC-only.
  local_authentication_enabled = var.local_authentication_enabled

  # Only configurable when local auth is enabled. Setting it makes the service
  # accept both API keys and Azure AD (RBAC) tokens, and selects the HTTP
  # response returned to requests that fail authentication.
  authentication_failure_mode = var.local_authentication_enabled ? var.authentication_failure_mode : null

  # Restrict which source IPs may hit the (still-public) endpoint. Left unset
  # when public access is disabled, but kept for services that must stay public.
  # allowed_ips is a set-of-strings argument on this resource, not a nested block.
  allowed_ips = var.public_network_access_enabled ? toset(var.allowed_ips) : null

  # A system-assigned identity lets indexers and skillsets reach data sources
  # and Azure OpenAI without keys. User-assigned can be layered in via var.
  identity {
    type         = var.identity_type
    identity_ids = var.identity_type == "UserAssigned" || var.identity_type == "SystemAssigned, UserAssigned" ? var.identity_ids : null
  }

  tags = var.tags

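  lifecycle {
    precondition {
      condition     = contains(local.valid_partition_counts, var.partition_count)
      error_message = "partition_count must be one of 1, 2, 3, 4, 6, or 12."
    }
    precondition {
      condition     = local.is_billable_tier || var.replica_count <= 3
      error_message = "free and basic SKUs support at most 3 replicas; scale up the SKU for more."
    }
    precondition {
      condition     = var.hosting_mode == "default" || lower(var.sku) == "standard3"
      error_message = "hosting_mode highDensity is only valid with the standard3 SKU."
    }
    precondition {
      condition     = !strcontains(var.identity_type, "UserAssigned") || try(length(var.identity_ids), 0) > 0
      error_message = "identity_ids must be set when identity_type includes UserAssigned."
    }
  }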
}

# Private endpoint: the recommended way to reach the service from a VNet with
# no public exposure. Created only when a subnet is supplied.
resource "azurerm_private_endpoint" "this" {
  count = var.private_endpoint_subnet_id != null ? 1 : 0

  name                = "pe-${var.name}"
  resource_group_name = var.resource_group_name
  location            = var.location
  subnet_id           = var.private_endpoint_subnet_id

  private_service_connection {
    name                           = "psc-${var.name}"
    private_connection_resource_id = azurerm_search_service.this.id
    subresource_names              = ["searchService"]
    is_manual_connection           = false
  }

  dynamic "private_dns_zone_group" {
    for_each = var.private_dns_zone_ids != null ? [1] : []
    content {
      name                 = "search-dns"
      private_dns_zone_ids = var.private_dns_zone_ids
    }
  }

  tags = var.tags
}

# Optional diagnostic settings -> Log Analytics for query/indexer telemetry.
resource "azurerm_monitor_diagnostic_setting" "this" {
  count = var.log_analytics_workspace_id != null ? 1 : 0

  name                       = "diag-${var.name}"
  target_resource_id         = azurerm_search_service.this.id
  log_analytics_workspace_id = var.log_analytics_workspace_id

  enabled_log {
    category = "OperationLogs"
  }

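  # enabled_metric needs a recent azurerm 4.x release; older 4.x releases
  # only accept the (since-deprecated) metric block here.
  enabled_metric {
    category = "AllMetrics"
  }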
}

variables.tf

variable "name" {
  description = "Name of the Azure AI Search service. Globally unique, 2-60 chars, lowercase letters/digits/hyphens, no leading/trailing hyphen."
  type        = string

  validation {
    condition     = can(regex("^[a-z0-9][a-z0-9-]{0,58}[a-z0-9]$", var.name))
    error_message = "name must be 2-60 chars, lowercase alphanumerics and hyphens only, and may not start or end with a hyphen."
  }
}

variable "resource_group_name" {
  description = "Resource group that will hold the search service and its private endpoint."
  type        = string
}

variable "location" {
  description = "Azure region (e.g. southeastasia, centralindia)."
  type        = string
}

variable "sku" {
  description = "Pricing tier: free, basic, standard, standard2, standard3, storage_optimized_l1, or storage_optimized_l2. Immutable after creation."
  type        = string
  default     = "standard"

  validation {
    condition = contains([
      "free", "basic", "standard", "standard2", "standard3",
      "storage_optimized_l1", "storage_optimized_l2"
    ], lower(var.sku))
    error_message = "sku must be one of free, basic, standard, standard2, standard3, storage_optimized_l1, storage_optimized_l2."
  }
}

variable "replica_count" {
  description = "Number of replicas. >=2 is required for read (query) SLA, >=3 for read+write SLA. Basic caps at 3; free runs a single replica."
  type        = number
  default     = 3

  validation {
    condition     = var.replica_count >= 1 && var.replica_count <= 12
    error_message = "replica_count must be between 1 and 12."
  }
}

variable "partition_count" {
  description = "Number of partitions (index storage/scale). Allowed values: 1, 2, 3, 4, 6, 12. Immutable on free/basic SKUs."
  type        = number
  default     = 1
}

variable "hosting_mode" {
  description = "default, or highDensity (standard3 SKU only) for up to 1,000 indexes per service."
  type        = string
  default     = "default"

  validation {
    condition     = contains(["default", "highDensity"], var.hosting_mode)
    error_message = "hosting_mode must be either default or highDensity."
  }
}

variable "public_network_access_enabled" {
  description = "Allow the public endpoint. Set false when fronting the service with a private endpoint."
  type        = bool
  default     = false
}

variable "local_authentication_enabled" {
  description = "Allow admin/query API keys. false = RBAC (Azure AD) authentication only, which is the recommended posture."
  type        = bool
  default     = false
}

variable "authentication_failure_mode" {
  description = "HTTP response returned to requests that fail authentication when local auth is enabled: http401WithBearerChallenge or http403. Ignored when local auth is disabled."
  type        = string
  default     = "http403"

  validation {
    condition     = contains(["http401WithBearerChallenge", "http403"], var.authentication_failure_mode)
    error_message = "authentication_failure_mode must be http401WithBearerChallenge or http403."
  }
}

variable "identity_type" {
  description = "Managed identity for the service: SystemAssigned, UserAssigned, or 'SystemAssigned, UserAssigned'."
  type        = string
  default     = "SystemAssigned"

  validation {
    condition     = contains(["SystemAssigned", "UserAssigned", "SystemAssigned, UserAssigned"], var.identity_type)
    error_message = "identity_type must be SystemAssigned, UserAssigned, or 'SystemAssigned, UserAssigned'."
  }
}

variable "identity_ids" {
  description = "User-assigned identity resource IDs. Required when identity_type includes UserAssigned."
  type        = list(string)
  default     = []
}

variable "allowed_ips" {
  description = "IPv4 addresses or CIDR ranges permitted to reach the public endpoint. Only applied when public_network_access_enabled is true."
  type        = list(string)
  default     = []
}

variable "private_endpoint_subnet_id" {
  description = "Subnet ID for the private endpoint. When null, no private endpoint is created."
  type        = string
  default     = null
}

variable "private_dns_zone_ids" {
  description = "Private DNS zone IDs (privatelink.search.windows.net) to bind the private endpoint A record. Null skips DNS integration."
  type        = list(string)
  default     = null
}

variable "log_analytics_workspace_id" {
  description = "Log Analytics workspace ID for diagnostic settings. When null, no diagnostics are configured."
  type        = string
  default     = null
}

variable "tags" {
  description = "Tags applied to the search service and private endpoint."
  type        = map(string)
  default     = {}
}
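
The validation rules above can be exercised without touching Azure. The following is a minimal sketch using Terraform's native test framework, under two assumptions that are not part of the original module: a Terraform CLI new enough to support mock_provider (1.7 or later, which is newer than the module's own >= 1.5.0 pin), and a hypothetical file path tests/validation.tftest.hcl inside the module directory.

```hcl
# tests/validation.tftest.hcl (hypothetical path, not part of the original module)

# Mock the provider so the run needs no Azure credentials and makes no API calls.
mock_provider "azurerm" {}

run "rejects_name_with_invalid_characters" {
  command = plan

  variables {
    name                = "Bad_Name" # uppercase and underscore are not allowed
    resource_group_name = "rg-test"
    location            = "centralindia"
  }

  # The plan is expected to fail on the name variable's validation rule.
  expect_failures = [var.name]
}

run "rejects_unknown_sku" {
  command = plan

  variables {
    name                = "kv-search-test"
    resource_group_name = "rg-test"
    location            = "centralindia"
    sku                 = "premium" # not in the allowed list
  }

  # The plan is expected to fail on the sku variable's validation rule.
  expect_failures = [var.sku]
}
```

Run it with terraform test from the module directory; each run block passes only if the named variable's validation rejects the input.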

outputs.tf

output "id" {
  description = "Resource ID of the Azure AI Search service."
  value       = azurerm_search_service.this.id
}

output "name" {
  description = "Name of the search service."
  value       = azurerm_search_service.this.name
}

output "endpoint" {
  description = "HTTPS query/management endpoint for the service."
  value       = "https://${azurerm_search_service.this.name}.search.windows.net"
}

output "identity_principal_id" {
  description = "Principal ID of the system-assigned identity, for granting it data-source RBAC roles. Null when no system identity is enabled."
  value       = try(azurerm_search_service.this.identity[0].principal_id, null)
}

output "primary_key" {
  description = "Primary admin API key. Empty when local authentication is disabled (RBAC-only)."
  value       = azurerm_search_service.this.primary_key
  sensitive   = true
}

output "query_keys" {
  description = "List of read-only query keys (key + name). Empty when local authentication is disabled."
  value       = azurerm_search_service.this.query_keys
  sensitive   = true
}

output "private_endpoint_ip" {
  description = "Private IP allocated to the search service private endpoint, or null when no private endpoint was created."
  value       = try(azurerm_private_endpoint.this[0].private_service_connection[0].private_ip_address, null)
}

How to use it

module "ai_search" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-search-service?ref=v1.0.0"

  name                = "kv-rag-search-prod"
  resource_group_name = azurerm_resource_group.platform.name
  location            = "centralindia"

  # Standard tier, sized for SLA: 3 replicas (read+write SLA), 2 partitions.
  sku             = "standard"
  replica_count   = 3
  partition_count = 2

  # Locked down: no public endpoint, RBAC-only (no admin/query keys).
  public_network_access_enabled = false
  local_authentication_enabled  = false

  # System identity so the indexer can pull from Blob and call Azure OpenAI.
  identity_type = "SystemAssigned"

  # Private endpoint into the app subnet, with DNS integration.
  private_endpoint_subnet_id = azurerm_subnet.app.id
  private_dns_zone_ids       = [azurerm_private_dns_zone.search.id]

  log_analytics_workspace_id = azurerm_log_analytics_workspace.platform.id

  tags = {
    environment = "prod"
    workload    = "rag"
    owner       = "platform-team"
  }
}

# Downstream: grant the search service's managed identity read access to blobs
# in the storage account holding the documents to be indexed, using the module's output.
resource "azurerm_role_assignment" "search_reads_blob" {
  scope                = azurerm_storage_account.documents.id
  role_definition_name = "Storage Blob Data Reader"
  principal_id         = module.ai_search.identity_principal_id
}

# Downstream: expose the endpoint URL so apps inside the VNet (e.g. an
# App Service) can call the service through the private endpoint.
output "search_endpoint" {
  value = module.ai_search.endpoint
}
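
The example above references azurerm_private_dns_zone.search without defining it. As a rough sketch of the supporting resources it assumes, the zone and its VNet link could look like the following. The resource names, the link name, and azurerm_virtual_network.app are illustrative assumptions, not something the module creates.

```hcl
# Hypothetical supporting resources for the usage example; adjust names to your estate.

# The private DNS zone that private endpoints for AI Search register into.
resource "azurerm_private_dns_zone" "search" {
  name                = "privatelink.search.windows.net"
  resource_group_name = azurerm_resource_group.platform.name
}

# Link the zone to the application VNet (assumed to exist as
# azurerm_virtual_network.app) so workloads there resolve the service
# hostname to the private endpoint IP.
resource "azurerm_private_dns_zone_virtual_network_link" "search" {
  name                  = "search-dns-link"
  resource_group_name   = azurerm_resource_group.platform.name
  private_dns_zone_name = azurerm_private_dns_zone.search.name
  virtual_network_id    = azurerm_virtual_network.app.id
  registration_enabled  = false
}
```

Without the VNet link, the zone group on the private endpoint still writes the A record, but clients in the VNet would keep resolving the public address.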

With Terragrunt

Terragrunt keeps this module DRY across environments — define the backend and provider once in a root config, then a thin terragrunt.hcl per environment supplies only the inputs that differ.

1. Root config — live/terragrunt.hcl (inherited by every module):

remote_state {
  backend = "azurerm"
  generate = { path = "backend.tf", if_exists = "overwrite" }
  config = {
    # ...azurerm state storage account/container + key per path...
  }
}

2. Module config — live/prod/search_service/terragrunt.hcl:

include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-search-service?ref=v1.0.0"
}

inputs = {
  name                = "..."
  resource_group_name = "..."
  location            = "..."
}

3. Deploy one environment, or roll out all modules together:

(cd live/prod/search_service && terragrunt apply)   # this module only
(cd live/prod && terragrunt run-all apply)          # every module under live/prod

Why Terragrunt here: the backend and provider live in one place instead of being copy-pasted into every module; inputs is overridden per environment (dev / stage / prod) without forking the module; and run-all orchestrates dependencies across modules. Reach for it once you have more than one environment or more than a handful of modules — for a single stack, the plain Quickstart above is enough.
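
The root config shown in step 1 only covers remote_state. One way to keep the provider in the same place is a Terragrunt generate block; the following is a sketch under the assumption that the subscription is supplied through the ARM_SUBSCRIPTION_ID environment variable. The file name provider.tf is an arbitrary choice.

```hcl
# live/terragrunt.hcl (addition to the root config; a sketch, not the author's file)
generate "provider" {
  path      = "provider.tf"
  if_exists = "overwrite_terragrunt"
  contents  = <<EOF
provider "azurerm" {
  features {}
  # Assumption: the subscription comes from ARM_SUBSCRIPTION_ID in the
  # environment; set subscription_id here instead if you prefer it explicit.
}
EOF
}
```

Terragrunt writes provider.tf into each module's working directory before calling Terraform, so the module itself stays free of provider configuration.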

Inputs

Name	Type	Default	Required	Description
`name`	`string`	—	Yes	Globally unique service name; 2-60 lowercase alphanumerics/hyphens, no leading/trailing hyphen.
`resource_group_name`	`string`	—	Yes	Resource group for the service and private endpoint.
`location`	`string`	—	Yes	Azure region (e.g. `centralindia`).
`sku`	`string`	`"standard"`	No	Tier: free, basic, standard, standard2, standard3, storage_optimized_l1/l2. Immutable.
`replica_count`	`number`	`3`	No	Replicas; >=2 for read SLA, >=3 for read+write SLA (1-12).
`partition_count`	`number`	`1`	No	Partitions for index scale; must be 1, 2, 3, 4, 6, or 12.
`hosting_mode`	`string`	`"default"`	No	`default` or `highDensity` (standard3 only).
`public_network_access_enabled`	`bool`	`false`	No	Allow the public endpoint; set false when using a private endpoint.
`local_authentication_enabled`	`bool`	`false`	No	Allow admin/query keys; false = RBAC-only.
`authentication_failure_mode`	`string`	`"http403"`	No	Response for requests that fail authentication when local auth is on: `http401WithBearerChallenge` or `http403`.
`identity_type`	`string`	`"SystemAssigned"`	No	`SystemAssigned`, `UserAssigned`, or `SystemAssigned, UserAssigned`.
`identity_ids`	`list(string)`	`[]`	No	User-assigned identity IDs; required when identity_type includes UserAssigned.
`allowed_ips`	`list(string)`	`[]`	No	IPv4/CIDR allowlist for the public endpoint (applied only when public access is on).
`private_endpoint_subnet_id`	`string`	`null`	No	Subnet for the private endpoint; null skips creation.
`private_dns_zone_ids`	`list(string)`	`null`	No	`privatelink.search.windows.net` DNS zone IDs for the endpoint A record.
`log_analytics_workspace_id`	`string`	`null`	No	Log Analytics workspace for diagnostics; null skips.
`tags`	`map(string)`	`{}`	No	Tags applied to all created resources.
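
As a sketch of overriding these defaults, a small dev service that stays on the public endpoint but only answers a fixed IP range might look like this. The module label, service name, resource group reference, and IP range (a documentation-only range) are illustrative assumptions.

```hcl
module "ai_search_dev" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-search-service?ref=v1.0.0"

  name                = "kv-rag-search-dev"               # hypothetical name
  resource_group_name = azurerm_resource_group.dev.name   # hypothetical resource group
  location            = "centralindia"

  # Small footprint for dev: basic tier, single replica (no SLA).
  sku             = "basic"
  replica_count   = 1
  partition_count = 1

  # Public endpoint, restricted to an office range (example range only).
  public_network_access_enabled = true
  allowed_ips                   = ["203.0.113.0/24"]

  # Keys allowed alongside RBAC for local tooling; failed auth gets a 401 challenge.
  local_authentication_enabled = true
  authentication_failure_mode  = "http401WithBearerChallenge"

  tags = {
    environment = "dev"
  }
}
```

No private_endpoint_subnet_id is passed here, so the module skips the private endpoint entirely.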

Outputs

Name	Description
`id`	Resource ID of the search service.
`name`	Name of the search service.
`endpoint`	HTTPS endpoint URL (`https://<name>.search.windows.net`).
`identity_principal_id`	System-assigned identity principal ID for granting data-source RBAC; null if disabled.
`primary_key`	Primary admin API key (sensitive); empty under RBAC-only auth.
`query_keys`	Read-only query keys (sensitive); empty under RBAC-only auth.
`private_endpoint_ip`	Private IP of the search private endpoint; null if none created.

Enterprise scenario

A retail bank builds an internal “policy copilot” that lets relationship managers ask plain-language questions over thousands of product and compliance PDFs. The platform team uses this module to deploy one standard AI Search service per environment with public_network_access_enabled = false and local_authentication_enabled = false, so the only path to the index is a private endpoint from the AKS cluster running the RAG API, and the only credential is an Azure AD token. The module’s identity_principal_id output is fed straight into role assignments granting the search service’s managed identity Storage Blob Data Reader on the document store and Cognitive Services OpenAI User on the embedding model — meaning no admin key, query key, or storage connection string exists anywhere in the pipeline. When indexing volume grew, they bumped partition_count from 2 to 4 and replica_count to 4 in one PR, keeping the read+write SLA intact.
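
The role wiring in this scenario could be sketched as follows. The Blob assignment already appears under How to use it, so only the other two are shown. azurerm_cognitive_account.openai and azurerm_user_assigned_identity.rag_api are hypothetical resources standing in for the bank's Azure OpenAI account and the RAG API's workload identity; the role names are Azure built-in roles.

```hcl
# Search service identity -> may call the embedding model on the
# (hypothetical) Azure OpenAI account.
resource "azurerm_role_assignment" "search_calls_openai" {
  scope                = azurerm_cognitive_account.openai.id
  role_definition_name = "Cognitive Services OpenAI User"
  principal_id         = module.ai_search.identity_principal_id
}

# RAG API workload identity (hypothetical) -> may query indexes on the
# search service. With local auth disabled, this token-based path is the
# only way the API can read from the index.
resource "azurerm_role_assignment" "api_queries_search" {
  scope                = module.ai_search.id
  role_definition_name = "Search Index Data Reader"
  principal_id         = azurerm_user_assigned_identity.rag_api.principal_id
}
```

Both assignments hang off module outputs (identity_principal_id and id), so they follow the service if it is ever rebuilt.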

Best practices

Run at least 3 replicas in production. Azure only grants the read SLA at 2+ replicas and the read-and-write (indexing) SLA at 3+; a single-replica service has no SLA, so don’t let a one-replica deployment masquerade as prod. Scale partitions (1/2/3/4/6/12) for index size and replicas for query throughput — they’re independent dials.
Default to RBAC-only and private endpoints. Keep local_authentication_enabled = false so no admin/query keys exist to leak, and public_network_access_enabled = false with a private endpoint into the app VNet. Bind the privatelink.search.windows.net zone so clients resolve to the private IP automatically.
Use the system-assigned identity for every data source. Grant it Storage Blob Data Reader, SQL db_datareader, or Cognitive Services OpenAI User instead of embedding connection strings or keys in indexer/skillset definitions — it removes an entire class of secret from the pipeline.
Pick the SKU deliberately: it’s immutable. Replicas and partitions scale in place, but the tier does not; changing sku forces a destroy-and-recreate and you re-index from scratch. Partition scaling is also restricted on free/basic. Right-size on day one, and for very large but cold indexes consider the storage-optimized L1/L2 tiers rather than over-provisioning standard.
Name deliberately and consistently. Service names are globally unique and become the https://<name>.search.windows.net host, so encode product + environment (e.g. kv-rag-search-prod) and keep them DNS-clean; you can’t rename later without rebuilding.
Wire diagnostics from the start. Stream OperationLogs and AllMetrics to Log Analytics via log_analytics_workspace_id so you can see throttling (HTTP 503/429), query latency, and indexer failures before users do, and alert on search-latency and throttled-query metrics.
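
As one possible shape for the throttling alert mentioned in the last practice, the sketch below raises an alert when the average share of throttled queries stays above a threshold. The alert name, threshold, evaluation windows, and azurerm_monitor_action_group.platform are illustrative assumptions to tune for your workload.

```hcl
resource "azurerm_monitor_metric_alert" "search_throttling" {
  name                = "alert-search-throttled-queries" # hypothetical name
  resource_group_name = azurerm_resource_group.platform.name
  scopes              = [module.ai_search.id]
  description         = "Search queries are being throttled."
  severity            = 2
  frequency           = "PT5M"  # evaluate every 5 minutes
  window_size         = "PT15M" # over a 15-minute window

  criteria {
    metric_namespace = "Microsoft.Search/searchServices"
    metric_name      = "ThrottledSearchQueriesPercentage"
    aggregation      = "Average"
    operator         = "GreaterThan"
    threshold        = 5 # percent; illustrative value
  }

  action {
    # Hypothetical action group that notifies the platform team.
    action_group_id = azurerm_monitor_action_group.platform.id
  }
}
```

A second alert on the SearchLatency metric follows the same pattern with a latency threshold in seconds.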