Terraform Module: Azure OpenAI - governed model deployments with private networking

Quick take — A reusable hashicorp/azurerm module for Azure OpenAI: provision the Cognitive account, pin model deployments with quota, lock it down with a private endpoint and customer-managed keys, and emit the endpoint for downstream apps. New here? Jump to the Quickstart below to deploy it in minutes; read on for how it works and when to reach for it.

Quickstart (copy-paste)

Minimal, runnable configuration — drop this in a .tf file and fill in the "..." placeholders (each required input is commented):

provider "azurerm" {
  features {}
  subscription_id = "..."  # Required by azurerm 4.x; alternatively export ARM_SUBSCRIPTION_ID.
}

module "openai" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-openai?ref=v1.0.0"

  name                = "..."  # Account name; 2-64 chars of letters, numbers, hyphens.
  resource_group_name = "..."  # Resource group for the account and private endpoint.
  location            = "..."  # Region with Azure OpenAI + your model availability.
}

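Then run terraform init && terraform apply. Every other input has a default; see Inputs below to override behaviour. The defaults are deliberately locked down: public network access is off, no private endpoint is created until you supply private_endpoint_subnet_id, and model_deployments is empty, so set those before an application needs to call the account.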

What this module is

Azure OpenAI gives you the OpenAI model family — GPT-4o, GPT-4.1, text-embedding-3-large, o3 reasoning models — behind an Azure-native control plane, so you inherit Azure RBAC, private networking, customer-managed keys, regional data residency, and Microsoft’s enterprise data-handling commitments instead of calling api.openai.com directly. In Terraform there is no dedicated azurerm_openai_account; an Azure OpenAI resource is just an azurerm_cognitive_account with kind = "OpenAI", and every model you want to call is a separate azurerm_cognitive_deployment child resource that carries its own SKU and capacity (tokens-per-minute quota).

Wrapping this in a module matters because a correct production deployment is never a single resource. You almost always need: the account with custom_subdomain_name set (mandatory for Entra ID token auth and private endpoints), public_network_access_enabled = false, a system-assigned identity, a private endpoint into your spoke VNet, and a map of model deployments where capacity is the thing that actually causes 429s in production. Hand-rolling that per workload leads to drift — one team sets local_auth_enabled = true, another forgets the subdomain and breaks Private Link. This module makes the secure, quota-aware shape the default and exposes the few things that genuinely vary (location, SKU, which models, which subnet).

When to use it

You are giving applications GPT-4o / GPT-4.1 / embeddings access and need keyless Entra ID auth plus a stable endpoint URL injected into app settings or Key Vault.
You must keep inference traffic off the public internet: a private networking posture with a private endpoint and public_network_access_enabled = false is a hard requirement.
You manage per-model TPM quota deliberately (e.g. 30K TPM for a chat model, 120K TPM for embeddings) and want capacity expressed as code, not clicked in the portal.
You run multiple environments or regions and want identical, policy-compliant Azure OpenAI accounts from one reviewed module.
Skip it for throwaway prototypes where one S0 account with default public access is fine — the private-endpoint plumbing is overhead you don’t need yet.

Module structure

terraform-module-azure-openai/
├── versions.tf      # provider pins
├── main.tf          # cognitive_account (OpenAI) + deployments + private endpoint + CMK
├── variables.tf     # var-driven inputs with validation
└── outputs.tf       # id, endpoint, deployment names, identity principal

# versions.tf
terraform {
  required_version = ">= 1.6.0"

  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 4.0"
    }
  }
}

# main.tf

locals {
  # Azure OpenAI requires a globally-unique custom subdomain for Entra ID auth
  # and Private Link. Default it from the account name if not supplied.
  custom_subdomain = coalesce(var.custom_subdomain_name, var.name)

  enable_private_endpoint = var.public_network_access_enabled == false && var.private_endpoint_subnet_id != null
}

resource "azurerm_cognitive_account" "openai" {
  name                  = var.name
  resource_group_name   = var.resource_group_name
  location              = var.location
  kind                  = "OpenAI"
  sku_name              = var.sku_name
  custom_subdomain_name = local.custom_subdomain

  # Networking posture
  public_network_access_enabled      = var.public_network_access_enabled
  outbound_network_access_restricted = var.outbound_network_access_restricted

  # Prefer Entra ID (Azure AD) tokens; disabling local auth blocks api-key access.
  local_auth_enabled = var.local_auth_enabled

  dynamic "identity" {
    for_each = var.identity_type == null ? [] : [1]
    content {
      type         = var.identity_type
      identity_ids = var.identity_type == "UserAssigned" || var.identity_type == "SystemAssigned, UserAssigned" ? var.user_assigned_identity_ids : null
    }
  }

  # Optional IP/VNet allow-list applied when not fully private.
  dynamic "network_acls" {
    for_each = length(var.network_acls_ip_rules) > 0 || length(var.network_acls_virtual_network_subnet_ids) > 0 ? [1] : []
    content {
      default_action = var.network_acls_default_action
      ip_rules       = var.network_acls_ip_rules

      dynamic "virtual_network_rules" {
        for_each = var.network_acls_virtual_network_subnet_ids
        content {
          subnet_id = virtual_network_rules.value
        }
      }
    }
  }

  # Customer-managed key encryption (requires a User/SystemAssigned identity with
  # Key Vault Crypto permissions on the referenced key).
  dynamic "customer_managed_key" {
    for_each = var.customer_managed_key_id == null ? [] : [1]
    content {
      key_vault_key_id   = var.customer_managed_key_id
      identity_client_id = var.cmk_identity_client_id
    }
  }

  tags = var.tags
}

# One azurerm_cognitive_deployment per model. capacity == TPM/1000 quota units.
resource "azurerm_cognitive_deployment" "this" {
  for_each = var.model_deployments

  name                 = each.key
  cognitive_account_id = azurerm_cognitive_account.openai.id

  # Content-filter (Responsible AI) policy and model-version upgrade behaviour.
  # Neither setting affects quota; throttling is governed by sku.capacity below.
  rai_policy_name        = each.value.rai_policy_name
  version_upgrade_option = each.value.version_upgrade_option

  model {
    format  = "OpenAI"
    name    = each.value.model_name
    version = each.value.model_version
  }

  sku {
    name     = each.value.scale_type
    capacity = each.value.capacity
  }
}

# Private Link: created only when going fully private with a subnet supplied.
resource "azurerm_private_endpoint" "openai" {
  count = local.enable_private_endpoint ? 1 : 0

  name                = "${var.name}-pe"
  location            = var.location
  resource_group_name = var.resource_group_name
  subnet_id           = var.private_endpoint_subnet_id

  private_service_connection {
    name                           = "${var.name}-psc"
    private_connection_resource_id = azurerm_cognitive_account.openai.id
    subresource_names              = ["account"]
    is_manual_connection           = false
  }

  dynamic "private_dns_zone_group" {
    for_each = var.private_dns_zone_ids == null ? [] : [1]
    content {
      name                 = "default"
      private_dns_zone_ids = var.private_dns_zone_ids
    }
  }

  tags = var.tags
}
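
One combination the resources above do not guard against: with public_network_access_enabled = false and no private_endpoint_subnet_id, the account is created but the module gives clients no private endpoint to reach it through. A minimal sketch of a guard, assuming it is acceptable to add a check block to main.tf (check blocks need Terraform 1.5 or later, which the required_version pin already satisfies). The block name is hypothetical and this is not part of the module as published:

```hcl
# Hypothetical addition to main.tf; not part of the published module.
# A failing check is reported as a warning and does not block the run, which
# suits cases where the private endpoint is managed outside this module.
check "account_is_reachable" {
  assert {
    condition     = var.public_network_access_enabled || var.private_endpoint_subnet_id != null
    error_message = "Public access is disabled and no private_endpoint_subnet_id was supplied, so this module creates no path for clients to reach the account."
  }
}
```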

# variables.tf

variable "name" {
  description = "Name of the Azure OpenAI (Cognitive) account. Letters, numbers and hyphens; 2-64 chars."
  type        = string

  validation {
    condition     = can(regex("^[a-zA-Z0-9-]{2,64}$", var.name))
    error_message = "name must be 2-64 characters of letters, numbers, or hyphens."
  }
}

variable "resource_group_name" {
  description = "Resource group that will contain the account and private endpoint."
  type        = string
}

variable "location" {
  description = "Azure region. Must be a region where Azure OpenAI and your chosen models are available (e.g. eastus2, swedencentral)."
  type        = string
}

variable "sku_name" {
  description = "Account SKU. S0 is the standard pay-as-you-go tier for Azure OpenAI."
  type        = string
  default     = "S0"

  validation {
    condition     = contains(["S0"], var.sku_name)
    error_message = "Azure OpenAI accounts currently support the S0 SKU."
  }
}

variable "custom_subdomain_name" {
  description = "Globally-unique custom subdomain. Required for Entra ID auth and Private Link. Defaults to var.name when null."
  type        = string
  default     = null
}

variable "public_network_access_enabled" {
  description = "Whether the account is reachable over the public internet. Set false for Private Link deployments."
  type        = bool
  default     = false
}

variable "outbound_network_access_restricted" {
  description = "Restrict outbound calls (e.g. for 'On Your Data') to approved FQDNs."
  type        = bool
  default     = false
}

variable "local_auth_enabled" {
  description = "Allow api-key (local) auth in addition to Entra ID tokens. Disable to enforce keyless RBAC access."
  type        = bool
  default     = false
}

variable "identity_type" {
  description = "Managed identity type: SystemAssigned, UserAssigned, 'SystemAssigned, UserAssigned', or null."
  type        = string
  default     = "SystemAssigned"

  validation {
    condition     = var.identity_type == null || contains(["SystemAssigned", "UserAssigned", "SystemAssigned, UserAssigned"], var.identity_type)
    error_message = "identity_type must be SystemAssigned, UserAssigned, 'SystemAssigned, UserAssigned', or null."
  }
}

variable "user_assigned_identity_ids" {
  description = "User-assigned identity resource IDs (required when identity_type includes UserAssigned)."
  type        = list(string)
  default     = []
}

variable "model_deployments" {
  description = "Map of model deployments keyed by deployment name. capacity is TPM in thousands (e.g. 30 = 30K TPM)."
  type = map(object({
    model_name             = string
    model_version          = string
    scale_type             = optional(string, "Standard")
    capacity               = optional(number, 30)
    rai_policy_name        = optional(string)
    version_upgrade_option = optional(string, "OnceNewDefaultVersionAvailable")
  }))
  default = {}

  validation {
    condition = alltrue([
      for d in values(var.model_deployments) :
      contains(["Standard", "GlobalStandard", "DataZoneStandard", "ProvisionedManaged", "GlobalProvisionedManaged"], d.scale_type)
    ])
    error_message = "scale_type must be one of Standard, GlobalStandard, DataZoneStandard, ProvisionedManaged, GlobalProvisionedManaged."
  }

  validation {
    condition     = alltrue([for d in values(var.model_deployments) : d.capacity >= 1])
    error_message = "Each deployment capacity must be >= 1 (thousand TPM / PTU units)."
  }
}

variable "network_acls_default_action" {
  description = "Default action for the network ACL when IP/VNet rules are supplied (Allow or Deny)."
  type        = string
  default     = "Deny"

  validation {
    condition     = contains(["Allow", "Deny"], var.network_acls_default_action)
    error_message = "network_acls_default_action must be Allow or Deny."
  }
}

variable "network_acls_ip_rules" {
  description = "List of public IPs/CIDRs allowed when the account is not fully private."
  type        = list(string)
  default     = []
}

variable "network_acls_virtual_network_subnet_ids" {
  description = "Subnet IDs allowed via service endpoints (alternative to a private endpoint)."
  type        = list(string)
  default     = []
}

variable "private_endpoint_subnet_id" {
  description = "Subnet ID for the private endpoint. When set with public access disabled, a private endpoint is created."
  type        = string
  default     = null
}

variable "private_dns_zone_ids" {
  description = "Private DNS zone IDs (typically privatelink.openai.azure.com) to register the private endpoint A record."
  type        = list(string)
  default     = null
}

variable "customer_managed_key_id" {
  description = "Key Vault key ID for customer-managed encryption. Null uses Microsoft-managed keys."
  type        = string
  default     = null
}

variable "cmk_identity_client_id" {
  description = "Client ID of the identity used to access the CMK (required when customer_managed_key_id is set with a user-assigned identity)."
  type        = string
  default     = null
}

variable "tags" {
  description = "Tags applied to all resources."
  type        = map(string)
  default     = {}
}
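
Because model_deployments declares its attributes with optional() and defaults, only model_name and model_version are required per entry. A short sketch of a value for that input, written as a tfvars-style assignment. The deployment keys are illustrative, and the model versions should be checked against what is available in your region:

```hcl
# Illustrative value for the model_deployments input.
model_deployments = {
  # Relies on the declared defaults: scale_type = "Standard",
  # capacity = 30 (30K TPM), version_upgrade_option = "OnceNewDefaultVersionAvailable".
  "chat" = {
    model_name    = "gpt-4o"
    model_version = "2024-11-20"
  }

  # Overrides the optional attributes explicitly.
  "embeddings" = {
    model_name             = "text-embedding-3-large"
    model_version          = "1"
    scale_type             = "Standard"
    capacity               = 120 # 120K TPM
    version_upgrade_option = "NoAutoUpgrade"
  }
}
```

The map key becomes the deployment name, so it is also the key to use when reading the deployment_names output.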

# outputs.tf

output "id" {
  description = "Resource ID of the Azure OpenAI (Cognitive) account."
  value       = azurerm_cognitive_account.openai.id
}

output "name" {
  description = "Name of the Azure OpenAI account."
  value       = azurerm_cognitive_account.openai.name
}

output "endpoint" {
  description = "Base endpoint URL (https://<subdomain>.openai.azure.com/) used by the OpenAI SDK / REST calls."
  value       = azurerm_cognitive_account.openai.endpoint
}

output "custom_subdomain_name" {
  description = "The custom subdomain in use (drives the endpoint host and Private Link DNS record)."
  value       = azurerm_cognitive_account.openai.custom_subdomain_name
}

output "primary_access_key" {
  description = "Primary api-key. Empty when local_auth_enabled = false; prefer Entra ID tokens."
  value       = azurerm_cognitive_account.openai.primary_access_key
  sensitive   = true
}

output "identity_principal_id" {
  description = "Principal ID of the system-assigned identity, for granting it RBAC (e.g. Key Vault access)."
  value       = try(azurerm_cognitive_account.openai.identity[0].principal_id, null)
}

output "deployment_names" {
  description = "Map of deployment key => deployment name, for passing model/deployment IDs to applications."
  value       = { for k, d in azurerm_cognitive_deployment.this : k => d.name }
}

output "private_endpoint_ip" {
  description = "Private IP assigned to the account's private endpoint, if one was created."
  value       = try(azurerm_private_endpoint.openai[0].private_service_connection[0].private_ip_address, null)
}
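
The endpoint output is meant for downstream configuration. As a sketch of the Key Vault route, assuming the module is instantiated as module.azure_openai and that a vault already exists in the calling configuration as azurerm_key_vault.app (both names are assumptions, not something the module provides):

```hcl
# Publish the endpoint as a Key Vault secret so applications can read it at start-up.
# azurerm_key_vault.app is a hypothetical, pre-existing vault in the caller's configuration.
resource "azurerm_key_vault_secret" "openai_endpoint" {
  name         = "azure-openai-endpoint"
  value        = module.azure_openai.endpoint
  key_vault_id = azurerm_key_vault.app.id
}
```

The endpoint URL is not sensitive in itself; storing it in Key Vault simply keeps application configuration in one place alongside real secrets.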

How to use it

module "azure_openai" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-openai?ref=v1.0.0"

  name                = "kv-openai-prod-eus2"
  resource_group_name = azurerm_resource_group.ai.name
  location            = "eastus2"
  sku_name            = "S0"

  # Keyless, fully private posture
  local_auth_enabled            = false
  public_network_access_enabled = false
  identity_type                 = "SystemAssigned"

  private_endpoint_subnet_id = azurerm_subnet.privatelink.id
  private_dns_zone_ids       = [azurerm_private_dns_zone.openai.id]

  model_deployments = {
    "gpt-4o" = {
      model_name    = "gpt-4o"
      model_version = "2024-11-20"
      scale_type    = "GlobalStandard"
      capacity      = 50 # 50K TPM
    }
    "text-embedding-3-large" = {
      model_name    = "text-embedding-3-large"
      model_version = "1"
      scale_type    = "Standard"
      capacity      = 120 # 120K TPM for bulk embedding jobs
    }
  }

  tags = {
    workload    = "rag-assistant"
    environment = "prod"
    owner       = "platform-ai"
  }
}

# Downstream: feed the endpoint + deployment name into the app's settings,
# and grant the app's identity the data-plane role for keyless calls.
resource "azurerm_linux_web_app" "assistant" {
  name                = "kv-rag-assistant-prod"
  resource_group_name = azurerm_resource_group.ai.name
  location            = "eastus2"
  service_plan_id     = azurerm_service_plan.app.id

  site_config {}

  app_settings = {
    "AZURE_OPENAI_ENDPOINT"         = module.azure_openai.endpoint
    "AZURE_OPENAI_CHAT_DEPLOYMENT"  = module.azure_openai.deployment_names["gpt-4o"]
    "AZURE_OPENAI_EMBED_DEPLOYMENT" = module.azure_openai.deployment_names["text-embedding-3-large"]
  }

  identity {
    type = "SystemAssigned"
  }
}

resource "azurerm_role_assignment" "app_can_call_openai" {
  scope                = module.azure_openai.id
  role_definition_name = "Cognitive Services OpenAI User"
  principal_id         = azurerm_linux_web_app.assistant.identity[0].principal_id
}
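
The example above references azurerm_subnet.privatelink and azurerm_private_dns_zone.openai without defining them. A minimal sketch of those prerequisites, assuming a spoke VNet already exists as azurerm_virtual_network.spoke in the same resource group (a hypothetical name) and that the address prefix shown is free in that VNet:

```hcl
# Subnet that hosts the private endpoint. The prefix is illustrative.
resource "azurerm_subnet" "privatelink" {
  name                 = "snet-privatelink"
  resource_group_name  = azurerm_resource_group.ai.name
  virtual_network_name = azurerm_virtual_network.spoke.name
  address_prefixes     = ["10.0.1.0/24"]
}

# Private DNS zone that resolves <subdomain>.openai.azure.com to the private endpoint IP.
resource "azurerm_private_dns_zone" "openai" {
  name                = "privatelink.openai.azure.com"
  resource_group_name = azurerm_resource_group.ai.name
}

# Link the zone to the VNet whose clients call the account.
resource "azurerm_private_dns_zone_virtual_network_link" "openai" {
  name                  = "openai-privatelink"
  resource_group_name   = azurerm_resource_group.ai.name
  private_dns_zone_name = azurerm_private_dns_zone.openai.name
  virtual_network_id    = azurerm_virtual_network.spoke.id
  registration_enabled  = false
}
```

In hub-and-spoke estates the zone usually lives in a central connectivity subscription; in that case pass its ID into private_dns_zone_ids rather than creating it per workload.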

With Terragrunt

Terragrunt keeps this module DRY across environments — define the backend and provider once in a root config, then a thin terragrunt.hcl per environment supplies only the inputs that differ.

1. Root config — live/terragrunt.hcl (inherited by every module):

remote_state {
  backend  = "azurerm"
  generate = { path = "backend.tf", if_exists = "overwrite" }
  config = {
    # ...azurerm storage account/container + state key per path...
  }
}

2. Module config — live/prod/openai/terragrunt.hcl:

include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-openai?ref=v1.0.0"
}

inputs = {
  name                = "..."
  resource_group_name = "..."
  location            = "..."
}

3. Deploy one environment, or roll out all modules together:

(cd live/prod/openai && terragrunt apply)    # this module only
(cd live/prod && terragrunt run-all apply)   # every module under live/prod

Why Terragrunt here: the backend and provider live in one place instead of being copy-pasted into every module; inputs is overridden per environment (dev / stage / prod) without forking the module; and run-all orchestrates dependencies across modules. Reach for it once you have more than one environment or more than a handful of modules — for a single stack, the plain Quickstart above is enough.
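
The root config shown in step 1 covers only remote_state; the provider can be generated the same way. A sketch of that half, assuming the azurerm 4.x provider picks up the subscription from the ARM_SUBSCRIPTION_ID environment variable (the file name provider.tf is an arbitrary choice):

```hcl
# live/terragrunt.hcl (continued): write provider.tf into every module directory.
generate "provider" {
  path      = "provider.tf"
  if_exists = "overwrite_terragrunt"
  contents  = <<EOF
provider "azurerm" {
  features {}
  # subscription_id is read from ARM_SUBSCRIPTION_ID in this sketch.
}
EOF
}
```

With if_exists = "overwrite_terragrunt", Terragrunt only replaces files it generated itself, which avoids clobbering a hand-written provider.tf in a module directory.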

Inputs

Name	Type	Default	Required	Description
name	string	—	yes	Account name; 2-64 chars of letters, numbers, hyphens.
resource_group_name	string	—	yes	Resource group for the account and private endpoint.
location	string	—	yes	Region with Azure OpenAI + your model availability.
sku_name	string	"S0"	no	Account SKU (S0).
custom_subdomain_name	string	null	no	Globally-unique subdomain; required for Entra ID + Private Link. Defaults to name.
public_network_access_enabled	bool	false	no	Public internet reachability; false for Private Link.
outbound_network_access_restricted	bool	false	no	Restrict outbound (e.g. “On Your Data”) to approved FQDNs.
local_auth_enabled	bool	false	no	Allow api-key auth alongside Entra ID; false enforces keyless RBAC.
identity_type	string	"SystemAssigned"	no	Managed identity type, or null.
user_assigned_identity_ids	list(string)	[]	no	UAMI IDs when identity_type includes UserAssigned.
model_deployments	map(object)	{}	no	Model deployments keyed by name; capacity is TPM in thousands.
network_acls_default_action	string	"Deny"	no	Default ACL action when IP/VNet rules are set.
network_acls_ip_rules	list(string)	[]	no	Allowed public IPs/CIDRs when not fully private.
network_acls_virtual_network_subnet_ids	list(string)	[]	no	Subnet IDs allowed via service endpoints.
private_endpoint_subnet_id	string	null	no	Subnet for the private endpoint (with public access off).
private_dns_zone_ids	list(string)	null	no	Private DNS zones (privatelink.openai.azure.com) for the PE record.
customer_managed_key_id	string	null	no	Key Vault key ID for CMK encryption; null = Microsoft-managed.
cmk_identity_client_id	string	null	no	Client ID of the identity accessing the CMK.
tags	map(string)	{}	no	Tags applied to all resources.

Outputs

Name	Description
id	Resource ID of the Azure OpenAI account.
name	Account name.
endpoint	Base endpoint URL (https://<subdomain>.openai.azure.com/) for SDK/REST calls.
custom_subdomain_name	Subdomain driving the endpoint host and Private Link DNS.
primary_access_key	Primary api-key (sensitive; empty when local auth is disabled).
identity_principal_id	System-assigned identity principal ID for RBAC grants.
deployment_names	Map of deployment key => deployment name for app configuration.
private_endpoint_ip	Private IP of the account’s private endpoint, if created.

Enterprise scenario

A financial-services firm runs a regulated RAG copilot that must never send customer data to public model endpoints. The platform team instantiates this module once per region (eastus2, plus swedencentral for EU data residency), each with public_network_access_enabled = false, a private endpoint into the shared services VNet, and local_auth_enabled = false so every call must carry an Entra ID token tied to a workload identity. They pin a gpt-4o GlobalStandard deployment at 50K TPM and a text-embedding-3-large deployment at 120K TPM, and the audit team gets exactly what they need: the model, version, capacity, and encryption posture all live in a reviewed Terraform PR rather than in someone’s portal session.
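
Instantiating the module once per region can be expressed with for_each on the module block. A sketch under several assumptions: the regions map, the resource group naming scheme, and a per-region azurerm_subnet.privatelink keyed like the map are all hypothetical and would have to exist in the calling configuration, and model availability needs checking for each region:

```hcl
locals {
  # Hypothetical region map: short code => Azure region.
  regions = {
    eus2 = "eastus2"
    swc  = "swedencentral"
  }
}

module "openai_by_region" {
  source   = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-openai?ref=v1.0.0"
  for_each = local.regions

  name                = "kv-openai-prod-${each.key}"
  resource_group_name = "rg-ai-prod-${each.key}" # assumed naming scheme
  location            = each.value

  local_auth_enabled            = false
  public_network_access_enabled = false

  # Assumes one private-link subnet per region, keyed like local.regions.
  private_endpoint_subnet_id = azurerm_subnet.privatelink[each.key].id
  private_dns_zone_ids       = [azurerm_private_dns_zone.openai.id]

  model_deployments = {
    "gpt-4o" = {
      model_name    = "gpt-4o"
      model_version = "2024-11-20"
      scale_type    = "GlobalStandard"
      capacity      = 50 # 50K TPM
    }
    "text-embedding-3-large" = {
      model_name    = "text-embedding-3-large"
      model_version = "1"
      capacity      = 120 # 120K TPM
    }
  }
}
```

Each instance is then addressed by its key, for example module.openai_by_region["swc"].endpoint.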

Best practices

Go keyless. Set local_auth_enabled = false and grant the app’s identity Cognitive Services OpenAI User on the account id output; this eliminates long-lived api keys and routes access through Entra ID and conditional-access policies.
Pin model name and version, and control upgrades. Treat gpt-4o/2024-11-20 like a dependency. Use version_upgrade_option = "NoAutoUpgrade" for models where output drift would break evals, and let only non-critical deployments auto-upgrade.
Size capacity for the real bottleneck: TPM. A common production failure on Azure OpenAI is an HTTP 429 from exhausted tokens-per-minute quota rather than a service outage. Set capacity per deployment from observed load, prefer GlobalStandard for higher default quotas where your data-residency rules allow inference to be processed outside the account's region, and reserve PTUs (ProvisionedManaged) for latency-sensitive paths.
Make it private and resolve DNS correctly. With public_network_access_enabled = false, always wire private_endpoint_subnet_id plus a privatelink.openai.azure.com zone in private_dns_zone_ids; without the DNS zone group, clients still resolve the public address, so their requests never reach the private endpoint and are rejected because public access is disabled.
Encrypt with a customer-managed key for regulated data. Supply customer_managed_key_id from a Key Vault with purge protection, grant the account identity Key Vault Crypto permissions, and you control key rotation and revocation independent of Microsoft.
Name and tag for region/quota traceability. Encode region and environment in name (e.g. kv-openai-prod-eus2) and tag workload/owner, so quota requests, cost reports, and the per-region account map stay legible as you scale to multiple deployments.
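
Of the practices above, the customer-managed key has the most moving parts. A sketch of one way to wire it with a user-assigned identity, assuming a Key Vault that uses Azure RBAC authorization and an existing key (azurerm_key_vault.cmk and azurerm_key_vault_key.openai are hypothetical names). A user-assigned identity is chosen here so the role assignment can exist before the account is created:

```hcl
resource "azurerm_user_assigned_identity" "openai_cmk" {
  name                = "id-openai-cmk"
  resource_group_name = azurerm_resource_group.ai.name
  location            = azurerm_resource_group.ai.location
}

# Lets the identity wrap and unwrap with keys in the vault (RBAC-enabled vault assumed).
resource "azurerm_role_assignment" "cmk_crypto" {
  scope                = azurerm_key_vault.cmk.id
  role_definition_name = "Key Vault Crypto Service Encryption User"
  principal_id         = azurerm_user_assigned_identity.openai_cmk.principal_id
}

module "azure_openai_cmk" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-openai?ref=v1.0.0"

  name                = "kv-openai-prod-eus2"
  resource_group_name = azurerm_resource_group.ai.name
  location            = "eastus2"

  identity_type              = "UserAssigned"
  user_assigned_identity_ids = [azurerm_user_assigned_identity.openai_cmk.id]
  customer_managed_key_id    = azurerm_key_vault_key.openai.id
  cmk_identity_client_id     = azurerm_user_assigned_identity.openai_cmk.client_id

  # Make sure the identity can use the key before the account tries to.
  depends_on = [azurerm_role_assignment.cmk_crypto]
}
```

With identity_type = "UserAssigned" there is no system-assigned identity, so the identity_principal_id output would be null in this configuration.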


Written by Vinod
