Terraform Module: Azure OpenAI with governed model deployments and private networking

Quick take — A reusable hashicorp/azurerm module for Azure OpenAI: provision the Cognitive account, pin model deployments with quota, lock it down with a private endpoint and customer-managed keys, and emit the endpoint for downstream apps. New here? Jump to the Quickstart below to deploy it in minutes; read on for how it works and when to reach for it.

Quickstart (copy-paste)

Minimal, runnable configuration — drop this in a .tf file and fill in the "..." placeholders (each required input is commented):

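provider "azurerm" {
  features {}
  # azurerm 4.x also needs a subscription: set subscription_id here
  # or export ARM_SUBSCRIPTION_ID before running Terraform.
}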

module "openai" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-openai?ref=v1.0.0"

  name                = "..."  # Account name; 2-64 chars of letters, numbers, hyphens.
  resource_group_name = "..."  # Resource group for the account and private endpoint.
  location            = "..."  # Region with Azure OpenAI + your model availability.
}

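Then run terraform init && terraform apply. Every other input has a default; see Inputs below to override behaviour. The defaults are deliberately locked down: public network access and api-key auth are off, no private endpoint is created until private_endpoint_subnet_id is set, and model_deployments is empty. Supply those two inputs before pointing an application at the endpoint.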

What this module is

Azure OpenAI gives you the OpenAI model family — GPT-4o, GPT-4.1, text-embedding-3-large, o3 reasoning models — behind an Azure-native control plane, so you inherit Azure RBAC, private networking, customer-managed keys, regional data residency, and Microsoft’s enterprise data-handling commitments instead of calling api.openai.com directly. In Terraform there is no dedicated azurerm_openai_account; an Azure OpenAI resource is just an azurerm_cognitive_account with kind = "OpenAI", and every model you want to call is a separate azurerm_cognitive_deployment child resource that carries its own SKU and capacity (tokens-per-minute quota).
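
To make that parent/child relationship concrete, here is a minimal sketch of the two raw resources the module wraps. The resource labels, account name, resource group, region, and capacity are illustrative placeholders, not values taken from the module.

```hcl
# Illustrative only: every name and value below is an assumption.
resource "azurerm_cognitive_account" "example" {
  name                  = "example-openai"
  resource_group_name   = "example-rg"
  location              = "eastus2"
  kind                  = "OpenAI" # this is what makes it an Azure OpenAI resource
  sku_name              = "S0"
  custom_subdomain_name = "example-openai"
}

# Each callable model is a child deployment with its own SKU and capacity.
resource "azurerm_cognitive_deployment" "chat" {
  name                 = "gpt-4o"
  cognitive_account_id = azurerm_cognitive_account.example.id

  model {
    format  = "OpenAI"
    name    = "gpt-4o"
    version = "2024-11-20"
  }

  sku {
    name     = "GlobalStandard"
    capacity = 30 # 30K tokens per minute
  }
}
```

Applications address the deployment by its name ("gpt-4o" here), not by the underlying model name, which is why the module keys its deployments map by deployment name.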

Wrapping this in a module matters because a correct production deployment is never a single resource. You almost always need: the account with custom_subdomain_name set (mandatory for Entra ID token auth and private endpoints), public_network_access_enabled = false, a system-assigned identity, a private endpoint into your spoke VNet, and a map of model deployments where capacity is the thing that actually causes 429s in production. Hand-rolling that per workload leads to drift — one team sets local_auth_enabled = true, another forgets the subdomain and breaks Private Link. This module makes the secure, quota-aware shape the default and exposes the few things that genuinely vary (location, SKU, which models, which subnet).

When to use it

Module structure

terraform-module-azure-openai/
├── versions.tf      # provider pins
├── main.tf          # cognitive_account (OpenAI) + deployments + private endpoint + CMK
├── variables.tf     # var-driven inputs with validation
└── outputs.tf       # id, endpoint, deployment names, identity principal

# versions.tf
terraform {
  required_version = ">= 1.6.0"

  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 4.0"
    }
  }
}

# main.tf

locals {
  # Azure OpenAI requires a globally-unique custom subdomain for Entra ID auth
  # and Private Link. Default it from the account name if not supplied.
  custom_subdomain = coalesce(var.custom_subdomain_name, var.name)

  enable_private_endpoint = var.public_network_access_enabled == false && var.private_endpoint_subnet_id != null
}

resource "azurerm_cognitive_account" "openai" {
  name                  = var.name
  resource_group_name   = var.resource_group_name
  location              = var.location
  kind                  = "OpenAI"
  sku_name              = var.sku_name
  custom_subdomain_name = local.custom_subdomain

  # Networking posture
  public_network_access_enabled      = var.public_network_access_enabled
  outbound_network_access_restricted = var.outbound_network_access_restricted

  # Prefer Entra ID (Azure AD) tokens; disabling local auth blocks api-key access.
  local_auth_enabled = var.local_auth_enabled

  dynamic "identity" {
    for_each = var.identity_type == null ? [] : [1]
    content {
      type         = var.identity_type
      identity_ids = var.identity_type == "UserAssigned" || var.identity_type == "SystemAssigned, UserAssigned" ? var.user_assigned_identity_ids : null
    }
  }

  # Optional IP/VNet allow-list applied when not fully private.
  dynamic "network_acls" {
    for_each = length(var.network_acls_ip_rules) > 0 || length(var.network_acls_virtual_network_subnet_ids) > 0 ? [1] : []
    content {
      default_action = var.network_acls_default_action
      ip_rules       = var.network_acls_ip_rules

      dynamic "virtual_network_rules" {
        for_each = var.network_acls_virtual_network_subnet_ids
        content {
          subnet_id = virtual_network_rules.value
        }
      }
    }
  }

  # Customer-managed key encryption (requires a User/SystemAssigned identity with
  # Key Vault Crypto permissions on the referenced key).
  dynamic "customer_managed_key" {
    for_each = var.customer_managed_key_id == null ? [] : [1]
    content {
      key_vault_key_id   = var.customer_managed_key_id
      identity_client_id = var.cmk_identity_client_id
    }
  }

  tags = var.tags
}

# One azurerm_cognitive_deployment per model. For standard SKUs, capacity is quota
# in units of 1,000 tokens per minute; for provisioned SKUs it is a PTU count.
resource "azurerm_cognitive_deployment" "this" {
  for_each = var.model_deployments

  name                 = each.key
  cognitive_account_id = azurerm_cognitive_account.openai.id

  # Content-filter (Responsible AI) policy and automatic model-version upgrade behaviour.
  rai_policy_name        = each.value.rai_policy_name
  version_upgrade_option = each.value.version_upgrade_option

  model {
    format  = "OpenAI"
    name    = each.value.model_name
    version = each.value.model_version
  }

  sku {
    name     = each.value.scale_type
    capacity = each.value.capacity
  }
}

# Private Link: created only when going fully private with a subnet supplied.
resource "azurerm_private_endpoint" "openai" {
  count = local.enable_private_endpoint ? 1 : 0

  name                = "${var.name}-pe"
  location            = var.location
  resource_group_name = var.resource_group_name
  subnet_id           = var.private_endpoint_subnet_id

  private_service_connection {
    name                           = "${var.name}-psc"
    private_connection_resource_id = azurerm_cognitive_account.openai.id
    subresource_names              = ["account"]
    is_manual_connection           = false
  }

  dynamic "private_dns_zone_group" {
    for_each = var.private_dns_zone_ids == null ? [] : [1]
    content {
      name                 = "default"
      private_dns_zone_ids = var.private_dns_zone_ids
    }
  }

  tags = var.tags
}

# variables.tf

variable "name" {
  description = "Name of the Azure OpenAI (Cognitive) account. Letters, numbers and hyphens; 2-64 chars."
  type        = string

  validation {
    condition     = can(regex("^[a-zA-Z0-9-]{2,64}$", var.name))
    error_message = "name must be 2-64 characters of letters, numbers, or hyphens."
  }
}

variable "resource_group_name" {
  description = "Resource group that will contain the account and private endpoint."
  type        = string
}

variable "location" {
  description = "Azure region. Must be a region where Azure OpenAI and your chosen models are available (e.g. eastus2, swedencentral)."
  type        = string
}

variable "sku_name" {
  description = "Account SKU. S0 is the standard pay-as-you-go tier for Azure OpenAI."
  type        = string
  default     = "S0"

  validation {
    condition     = contains(["S0"], var.sku_name)
    error_message = "Azure OpenAI accounts currently support the S0 SKU."
  }
}

variable "custom_subdomain_name" {
  description = "Globally-unique custom subdomain. Required for Entra ID auth and Private Link. Defaults to var.name when null."
  type        = string
  default     = null
}

variable "public_network_access_enabled" {
  description = "Whether the account is reachable over the public internet. Set false for Private Link deployments."
  type        = bool
  default     = false
}

variable "outbound_network_access_restricted" {
  description = "Restrict outbound calls (e.g. for 'On Your Data') to approved FQDNs."
  type        = bool
  default     = false
}

variable "local_auth_enabled" {
  description = "Allow api-key (local) auth in addition to Entra ID tokens. Disable to enforce keyless RBAC access."
  type        = bool
  default     = false
}

variable "identity_type" {
  description = "Managed identity type: SystemAssigned, UserAssigned, 'SystemAssigned, UserAssigned', or null."
  type        = string
  default     = "SystemAssigned"

  validation {
    condition     = var.identity_type == null || contains(["SystemAssigned", "UserAssigned", "SystemAssigned, UserAssigned"], var.identity_type)
    error_message = "identity_type must be SystemAssigned, UserAssigned, 'SystemAssigned, UserAssigned', or null."
  }
}

variable "user_assigned_identity_ids" {
  description = "User-assigned identity resource IDs (required when identity_type includes UserAssigned)."
  type        = list(string)
  default     = []
}

variable "model_deployments" {
  description = "Map of model deployments keyed by deployment name. capacity is TPM in thousands (e.g. 30 = 30K TPM)."
  type = map(object({
    model_name             = string
    model_version          = string
    scale_type             = optional(string, "Standard")
    capacity               = optional(number, 30)
    rai_policy_name        = optional(string)
    version_upgrade_option = optional(string, "OnceNewDefaultVersionAvailable")
  }))
  default = {}

  validation {
    condition = alltrue([
      for d in values(var.model_deployments) :
      contains(["Standard", "GlobalStandard", "DataZoneStandard", "ProvisionedManaged", "GlobalProvisionedManaged"], d.scale_type)
    ])
    error_message = "scale_type must be one of Standard, GlobalStandard, DataZoneStandard, ProvisionedManaged, GlobalProvisionedManaged."
  }

  validation {
    condition     = alltrue([for d in values(var.model_deployments) : d.capacity >= 1])
    error_message = "Each deployment capacity must be >= 1 (thousand TPM / PTU units)."
  }
}

variable "network_acls_default_action" {
  description = "Default action for the network ACL when IP/VNet rules are supplied (Allow or Deny)."
  type        = string
  default     = "Deny"

  validation {
    condition     = contains(["Allow", "Deny"], var.network_acls_default_action)
    error_message = "network_acls_default_action must be Allow or Deny."
  }
}

variable "network_acls_ip_rules" {
  description = "List of public IPs/CIDRs allowed when the account is not fully private."
  type        = list(string)
  default     = []
}

variable "network_acls_virtual_network_subnet_ids" {
  description = "Subnet IDs allowed via service endpoints (alternative to a private endpoint)."
  type        = list(string)
  default     = []
}

variable "private_endpoint_subnet_id" {
  description = "Subnet ID for the private endpoint. When set with public access disabled, a private endpoint is created."
  type        = string
  default     = null
}

variable "private_dns_zone_ids" {
  description = "Private DNS zone IDs (typically privatelink.openai.azure.com) to register the private endpoint A record."
  type        = list(string)
  default     = null
}

variable "customer_managed_key_id" {
  description = "Key Vault key ID for customer-managed encryption. Null uses Microsoft-managed keys."
  type        = string
  default     = null
}

variable "cmk_identity_client_id" {
  description = "Client ID of the identity used to access the CMK (required when customer_managed_key_id is set with a user-assigned identity)."
  type        = string
  default     = null
}

variable "tags" {
  description = "Tags applied to all resources."
  type        = map(string)
  default     = {}
}

# outputs.tf

output "id" {
  description = "Resource ID of the Azure OpenAI (Cognitive) account."
  value       = azurerm_cognitive_account.openai.id
}

output "name" {
  description = "Name of the Azure OpenAI account."
  value       = azurerm_cognitive_account.openai.name
}

output "endpoint" {
  description = "Base endpoint URL (https://<subdomain>.openai.azure.com/) used by the OpenAI SDK / REST calls."
  value       = azurerm_cognitive_account.openai.endpoint
}

output "custom_subdomain_name" {
  description = "The custom subdomain in use (drives the endpoint host and Private Link DNS record)."
  value       = azurerm_cognitive_account.openai.custom_subdomain_name
}

output "primary_access_key" {
  description = "Primary api-key. Empty when local_auth_enabled = false; prefer Entra ID tokens."
  value       = azurerm_cognitive_account.openai.primary_access_key
  sensitive   = true
}

output "identity_principal_id" {
  description = "Principal ID of the system-assigned identity, for granting it RBAC (e.g. Key Vault access)."
  value       = try(azurerm_cognitive_account.openai.identity[0].principal_id, null)
}

output "deployment_names" {
  description = "Map of deployment key => deployment name, for passing model/deployment IDs to applications."
  value       = { for k, d in azurerm_cognitive_deployment.this : k => d.name }
}

output "private_endpoint_ip" {
  description = "Private IP assigned to the account's private endpoint, if one was created."
  value       = try(azurerm_private_endpoint.openai[0].private_service_connection[0].private_ip_address, null)
}
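
The description of user_assigned_identity_ids calls it required when identity_type includes UserAssigned, but nothing in the listing enforces that. One way to fail the plan early is a precondition; the sketch below is a hypothetical addition to main.tf, not part of the published module. It uses terraform_data purely as a carrier for the check, and strcontains needs Terraform 1.5 or later, which the module's required_version already satisfies.

```hcl
# Hypothetical guard: the resource label "identity_guard" is an assumption.
resource "terraform_data" "identity_guard" {
  lifecycle {
    precondition {
      # Passes when no user-assigned identity is requested, or when at least
      # one identity ID has been supplied.
      condition = (
        !strcontains(coalesce(var.identity_type, "None"), "UserAssigned") ||
        length(var.user_assigned_identity_ids) > 0
      )
      error_message = "user_assigned_identity_ids must be non-empty when identity_type includes UserAssigned."
    }
  }
}
```

The same precondition could equally sit in a lifecycle block on azurerm_cognitive_account.openai; a separate resource just keeps the sketch self-contained.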

How to use it

module "azure_openai" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-openai?ref=v1.0.0"

  name                = "kv-openai-prod-eus2"
  resource_group_name = azurerm_resource_group.ai.name
  location            = "eastus2"
  sku_name            = "S0"

  # Keyless, fully private posture
  local_auth_enabled            = false
  public_network_access_enabled = false
  identity_type                 = "SystemAssigned"

  private_endpoint_subnet_id = azurerm_subnet.privatelink.id
  private_dns_zone_ids       = [azurerm_private_dns_zone.openai.id]

  model_deployments = {
    "gpt-4o" = {
      model_name    = "gpt-4o"
      model_version = "2024-11-20"
      scale_type    = "GlobalStandard"
      capacity      = 50 # 50K TPM
    }
    "text-embedding-3-large" = {
      model_name    = "text-embedding-3-large"
      model_version = "1"
      scale_type    = "Standard"
      capacity      = 120 # 120K TPM for bulk embedding jobs
    }
  }

  tags = {
    workload    = "rag-assistant"
    environment = "prod"
    owner       = "platform-ai"
  }
}

# Downstream: feed the endpoint + deployment name into the app's settings,
# and grant the app's identity the data-plane role for keyless calls.
resource "azurerm_linux_web_app" "assistant" {
  name                = "kv-rag-assistant-prod"
  resource_group_name = azurerm_resource_group.ai.name
  location            = "eastus2"
  service_plan_id     = azurerm_service_plan.app.id

  site_config {}

  app_settings = {
    "AZURE_OPENAI_ENDPOINT"         = module.azure_openai.endpoint
    "AZURE_OPENAI_CHAT_DEPLOYMENT"  = module.azure_openai.deployment_names["gpt-4o"]
    "AZURE_OPENAI_EMBED_DEPLOYMENT" = module.azure_openai.deployment_names["text-embedding-3-large"]
  }

  identity {
    type = "SystemAssigned"
  }
}

resource "azurerm_role_assignment" "app_can_call_openai" {
  scope                = module.azure_openai.id
  role_definition_name = "Cognitive Services OpenAI User"
  principal_id         = azurerm_linux_web_app.assistant.identity[0].principal_id
}
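
The example above references azurerm_private_dns_zone.openai without defining it. A minimal sketch of that zone and its VNet link follows, assuming the zone lives in the same resource group; azurerm_virtual_network.spoke and the link name are hypothetical and stand in for whatever network hosts the private endpoint subnet.

```hcl
# Private DNS zone that resolves <subdomain>.openai.azure.com to the private endpoint IP.
resource "azurerm_private_dns_zone" "openai" {
  name                = "privatelink.openai.azure.com"
  resource_group_name = azurerm_resource_group.ai.name
}

# Link the zone to the VNet so workloads inside it resolve the private address.
resource "azurerm_private_dns_zone_virtual_network_link" "openai" {
  name                  = "openai-dns-link" # hypothetical name
  resource_group_name   = azurerm_resource_group.ai.name
  private_dns_zone_name = azurerm_private_dns_zone.openai.name
  virtual_network_id    = azurerm_virtual_network.spoke.id # hypothetical VNet
}
```

In hub-and-spoke landing zones this zone is often owned centrally; in that case pass its existing ID to private_dns_zone_ids instead of creating a new one.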

With Terragrunt

Terragrunt keeps this module DRY across environments — define the backend and provider once in a root config, then a thin terragrunt.hcl per environment supplies only the inputs that differ.

1. Root config: live/terragrunt.hcl (inherited by every module):

remote_state {
  backend  = "azurerm"
  generate = { path = "backend.tf", if_exists = "overwrite" }
  config = {
    # ...azurerm storage account/container + key per path...
  }
}

2. Module config: live/prod/openai/terragrunt.hcl:

include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-openai?ref=v1.0.0"
}

inputs = {
  name                = "..."
  resource_group_name = "..."
  location            = "..."
}

3. Deploy one environment, or roll out all modules together:

# Run either command from the repository root.
cd live/prod/openai && terragrunt apply    # this module only
cd live/prod && terragrunt run-all apply   # every module under live/prod

Why Terragrunt here: the backend and provider live in one place instead of being copy-pasted into every module; inputs is overridden per environment (dev / stage / prod) without forking the module; and run-all orchestrates dependencies across modules. Reach for it once you have more than one environment or more than a handful of modules — for a single stack, the plain Quickstart above is enough.
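
The root config shown in step 1 covers only the backend. A sketch of how the provider could be generated from the same file is below; it assumes the subscription ID reaches the provider through the ARM_SUBSCRIPTION_ID environment variable, and the block label "provider" and the file name provider.tf are arbitrary choices.

```hcl
# live/terragrunt.hcl (sketch): write provider.tf into every module's working directory.
generate "provider" {
  path      = "provider.tf"
  if_exists = "overwrite_terragrunt"
  contents  = <<EOF
provider "azurerm" {
  features {}
}
EOF
}
```

With this in place, the per-environment terragrunt.hcl files stay limited to the source and inputs blocks.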

Inputs

| Name | Type | Default | Required | Description |
| --- | --- | --- | --- | --- |
| name | string | (none) | yes | Account name; 2-64 chars of letters, numbers, hyphens. |
| resource_group_name | string | (none) | yes | Resource group for the account and private endpoint. |
| location | string | (none) | yes | Region with Azure OpenAI + your model availability. |
| sku_name | string | "S0" | no | Account SKU (S0). |
| custom_subdomain_name | string | null | no | Globally-unique subdomain; required for Entra ID + Private Link. Defaults to name. |
| public_network_access_enabled | bool | false | no | Public internet reachability; false for Private Link. |
| outbound_network_access_restricted | bool | false | no | Restrict outbound (e.g. "On Your Data") to approved FQDNs. |
| local_auth_enabled | bool | false | no | Allow api-key auth alongside Entra ID; false enforces keyless RBAC. |
| identity_type | string | "SystemAssigned" | no | Managed identity type, or null. |
| user_assigned_identity_ids | list(string) | [] | no | UAMI IDs when identity_type includes UserAssigned. |
| model_deployments | map(object) | {} | no | Model deployments keyed by name; capacity is TPM in thousands. |
| network_acls_default_action | string | "Deny" | no | Default ACL action when IP/VNet rules are set. |
| network_acls_ip_rules | list(string) | [] | no | Allowed public IPs/CIDRs when not fully private. |
| network_acls_virtual_network_subnet_ids | list(string) | [] | no | Subnet IDs allowed via service endpoints. |
| private_endpoint_subnet_id | string | null | no | Subnet for the private endpoint (with public access off). |
| private_dns_zone_ids | list(string) | null | no | Private DNS zones (privatelink.openai.azure.com) for the PE record. |
| customer_managed_key_id | string | null | no | Key Vault key ID for CMK encryption; null = Microsoft-managed. |
| cmk_identity_client_id | string | null | no | Client ID of the identity accessing the CMK. |
| tags | map(string) | {} | no | Tags applied to all resources. |
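
Because most fields of model_deployments are optional, an entry can be as short as a model name and version. The fragment below is a sketch of two entries for use inside the module block; the deployment keys "embeddings" and "chat" are made-up names, and NoAutoUpgrade is one of the upgrade options the provider accepts.

```hcl
model_deployments = {
  # Relies on the defaults: Standard SKU, capacity 30 (30K TPM),
  # version_upgrade_option = "OnceNewDefaultVersionAvailable".
  "embeddings" = {
    model_name    = "text-embedding-3-large"
    model_version = "1"
  }

  # Overrides every optional field except rai_policy_name and pins the version.
  "chat" = {
    model_name             = "gpt-4o"
    model_version          = "2024-11-20"
    scale_type             = "GlobalStandard"
    capacity               = 50
    version_upgrade_option = "NoAutoUpgrade"
  }
}
```

Pinning the version trades automatic upgrades for reproducibility, so the version then has to be bumped in a PR before the pinned one is retired.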

Outputs

| Name | Description |
| --- | --- |
| id | Resource ID of the Azure OpenAI account. |
| name | Account name. |
| endpoint | Base endpoint URL (https://<subdomain>.openai.azure.com/) for SDK/REST calls. |
| custom_subdomain_name | Subdomain driving the endpoint host and Private Link DNS. |
| primary_access_key | Primary api-key (sensitive; empty when local auth is disabled). |
| identity_principal_id | System-assigned identity principal ID for RBAC grants. |
| deployment_names | Map of deployment key => deployment name for app configuration. |
| private_endpoint_ip | Private IP of the account's private endpoint, if created. |
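
As an example of consuming identity_principal_id, the sketch below grants the account's system-assigned identity access to a Key Vault holding a customer-managed key. It assumes the vault uses Azure RBAC authorization rather than access policies, and azurerm_key_vault.cmk is a hypothetical resource defined elsewhere in the calling configuration.

```hcl
# Let the Azure OpenAI account's identity wrap/unwrap with keys in the vault.
resource "azurerm_role_assignment" "openai_can_use_cmk" {
  scope                = azurerm_key_vault.cmk.id # hypothetical Key Vault
  role_definition_name = "Key Vault Crypto Service Encryption User"
  principal_id         = module.azure_openai.identity_principal_id
}
```

Note the ordering problem this implies for a system-assigned identity: the principal only exists after the account is created, so the key cannot be attached in the same pass that creates the account unless a user-assigned identity is used instead.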

Enterprise scenario

A financial-services firm runs a regulated RAG copilot that may never send customer data to public model endpoints. The platform team instantiates this module once per region (eastus2 and swedencentral for EU data residency), each with public_network_access_enabled = false, a private endpoint into the shared services VNet, and local_auth_enabled = false so every call must carry an Entra ID token tied to a workload identity. They pin a gpt-4o GlobalStandard deployment at 50K TPM and a text-embedding-3-large deployment at 120K TPM, and the audit team gets exactly what they need: the model, version, capacity, and encryption posture all live in a reviewed Terraform PR rather than in someone’s portal session.
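
One way to express the two-region rollout is a single module block with for_each. This is a sketch under stated assumptions: the local map, the resource group names, and var.pe_subnet_ids are hypothetical, and DNS zone wiring is omitted for brevity.

```hcl
variable "pe_subnet_ids" {
  description = "Hypothetical map of region key => private endpoint subnet ID."
  type        = map(string)
}

locals {
  # Placeholder per-region settings; adjust to the real landing zone.
  openai_regions = {
    eus2 = { location = "eastus2", resource_group_name = "rg-ai-prod-eus2" }
    swc  = { location = "swedencentral", resource_group_name = "rg-ai-prod-swc" }
  }
}

module "azure_openai" {
  for_each = local.openai_regions
  source   = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-openai?ref=v1.0.0"

  name                = "kv-openai-prod-${each.key}"
  resource_group_name = each.value.resource_group_name
  location            = each.value.location

  # Keyless and fully private in every region.
  public_network_access_enabled = false
  local_auth_enabled            = false
  private_endpoint_subnet_id    = var.pe_subnet_ids[each.key]

  model_deployments = {
    "gpt-4o" = {
      model_name    = "gpt-4o"
      model_version = "2024-11-20"
      scale_type    = "GlobalStandard"
      capacity      = 50
    }
    "text-embedding-3-large" = {
      model_name    = "text-embedding-3-large"
      model_version = "1"
      capacity      = 120
    }
  }
}
```

Outputs then become maps keyed by region, for example module.azure_openai["swc"].endpoint.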

Best practices
