Quick take — A reusable hashicorp/azurerm module for Azure OpenAI: provision the Cognitive account, pin model deployments with quota, lock it down with a private endpoint and customer-managed keys, and emit the endpoint for downstream apps. New here? Jump to the Quickstart below to deploy it in minutes; read on for how it works and when to reach for it.
Quickstart (copy-paste)
Minimal, runnable configuration — drop this in a .tf file and fill in the "..." placeholders (each required input is commented):
provider "azurerm" {
features {}
subscription_id = "..." # Required by azurerm 4.x (or export ARM_SUBSCRIPTION_ID instead).
}
module "openai" {
source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-openai?ref=v1.0.0"
name = "..." # Account name; 2-64 chars of letters, numbers, hyphens.
resource_group_name = "..." # Resource group for the account and private endpoint.
location = "..." # Region with Azure OpenAI + your model availability.
}
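Then run terraform init && terraform apply. Every other input has a default, but know what the defaults produce: public network access is off, and no private endpoint or model deployment is created until you set private_endpoint_subnet_id and model_deployments, so nothing can call the account yet. See Inputs below to override behaviour.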
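For a sandbox you can call right away, one option is to turn public access on behind an IP allow-list and add a single small deployment. This is a sketch, not a recommended production shape: it assumes gpt-4o-mini version 2024-07-18 has GlobalStandard quota in your subscription and region, and the module, account and resource-group names plus the 203.0.113.10 address (a documentation-range IP) are placeholders to replace.

```hcl
# Hypothetical lab variant: reachable over the internet, but only from one IP.
module "openai_lab" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-openai?ref=v1.0.0"

  name                = "kv-openai-lab-eus2" # placeholder
  resource_group_name = "rg-openai-lab"      # placeholder; must already exist
  location            = "eastus2"

  # Public endpoint with a deny-by-default ACL that admits a single address.
  public_network_access_enabled = true
  network_acls_ip_rules         = ["203.0.113.10"]

  model_deployments = {
    "gpt-4o-mini" = {
      model_name    = "gpt-4o-mini"
      model_version = "2024-07-18"
      scale_type    = "GlobalStandard"
      capacity      = 10 # 10K TPM
    }
  }
}
```

local_auth_enabled still defaults to false here, so callers need an Entra ID token and the Cognitive Services OpenAI User role rather than an api-key.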
What this module is
Azure OpenAI gives you the OpenAI model family — GPT-4o, GPT-4.1, text-embedding-3-large, o3 reasoning models — behind an Azure-native control plane, so you inherit Azure RBAC, private networking, customer-managed keys, regional data residency, and Microsoft’s enterprise data-handling commitments instead of calling api.openai.com directly. In Terraform there is no dedicated azurerm_openai_account; an Azure OpenAI resource is just an azurerm_cognitive_account with kind = "OpenAI", and every model you want to call is a separate azurerm_cognitive_deployment child resource that carries its own SKU and capacity (tokens-per-minute quota).
Wrapping this in a module matters because a correct production deployment is never a single resource. You almost always need: the account with custom_subdomain_name set (mandatory for Entra ID token auth and private endpoints), public_network_access_enabled = false, a system-assigned identity, a private endpoint into your spoke VNet, and a map of model deployments where capacity is the thing that actually causes 429s in production. Hand-rolling that per workload leads to drift — one team sets local_auth_enabled = true, another forgets the subdomain and breaks Private Link. This module makes the secure, quota-aware shape the default and exposes the few things that genuinely vary (location, SKU, which models, which subnet).
When to use it
- You are giving applications GPT-4o / GPT-4.1 / embeddings access and need keyless Entra ID auth plus a stable endpoint URL injected into app settings or Key Vault.
- You must keep inference traffic off the public internet: a private-networking posture with a private endpoint and public_network_access_enabled = false is a hard requirement.
- You manage per-model TPM quota deliberately (e.g. 30K TPM for a chat model, 120K TPM for embeddings) and want capacity expressed as code, not clicked in the portal.
- You run multiple environments or regions and want identical, policy-compliant Azure OpenAI accounts from one reviewed module.
- Skip it for throwaway prototypes where one S0 account with default public access is fine; the private-endpoint plumbing is overhead you don’t need yet.
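Capacity as code can go one step further: derive it from load figures instead of hard-coding a number. The sketch below assumes hypothetical peak-load values in locals and relies on Standard-family capacity being counted in units of 1,000 TPM. A value above the quota your subscription holds in that region should still be rejected by Azure when the deployment is created.

```hcl
locals {
  # Hypothetical peak-load figures; replace them with numbers from your own telemetry.
  chat_peak_requests_per_minute = 120
  chat_avg_tokens_per_request   = 1500 # prompt + completion tokens
  chat_headroom                 = 1.25 # 25% buffer above the observed peak

  # Capacity units are 1K TPM: 120 * 1500 * 1.25 / 1000 = 225, i.e. 225K TPM.
  chat_capacity = ceil(local.chat_peak_requests_per_minute * local.chat_avg_tokens_per_request * local.chat_headroom / 1000)
}

module "openai" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-openai?ref=v1.0.0"

  name                = "..."
  resource_group_name = "..."
  location            = "..."

  model_deployments = {
    "gpt-4o" = {
      model_name             = "gpt-4o"
      model_version          = "2024-11-20"
      scale_type             = "GlobalStandard"
      capacity               = local.chat_capacity
      version_upgrade_option = "NoAutoUpgrade" # pin the version so evals don't drift
    }
  }
}
```

Keeping the inputs in locals (or variables per environment) makes a quota change a one-line diff with its justification visible in review.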
Module structure
terraform-module-azure-openai/
├── versions.tf # provider pins
├── main.tf # cognitive_account (OpenAI) + deployments + private endpoint + CMK
├── variables.tf # var-driven inputs with validation
└── outputs.tf # id, endpoint, deployment names, identity principal
# versions.tf
terraform {
required_version = ">= 1.6.0"
required_providers {
azurerm = {
source = "hashicorp/azurerm"
version = "~> 4.0"
}
}
}
# main.tf
locals {
# Azure OpenAI requires a globally-unique custom subdomain for Entra ID auth
# and Private Link. Default it from the account name if not supplied.
custom_subdomain = coalesce(var.custom_subdomain_name, var.name)
enable_private_endpoint = var.public_network_access_enabled == false && var.private_endpoint_subnet_id != null
}
resource "azurerm_cognitive_account" "openai" {
name = var.name
resource_group_name = var.resource_group_name
location = var.location
kind = "OpenAI"
sku_name = var.sku_name
custom_subdomain_name = local.custom_subdomain
# Networking posture
public_network_access_enabled = var.public_network_access_enabled
outbound_network_access_restricted = var.outbound_network_access_restricted
# Prefer Entra ID (Azure AD) tokens; disabling local auth blocks api-key access.
local_auth_enabled = var.local_auth_enabled
dynamic "identity" {
for_each = var.identity_type == null ? [] : [1]
content {
type = var.identity_type
identity_ids = var.identity_type == "UserAssigned" || var.identity_type == "SystemAssigned, UserAssigned" ? var.user_assigned_identity_ids : null
}
}
# Optional IP/VNet allow-list applied when not fully private.
dynamic "network_acls" {
for_each = length(var.network_acls_ip_rules) > 0 || length(var.network_acls_virtual_network_subnet_ids) > 0 ? [1] : []
content {
default_action = var.network_acls_default_action
ip_rules = var.network_acls_ip_rules
dynamic "virtual_network_rules" {
for_each = var.network_acls_virtual_network_subnet_ids
content {
subnet_id = virtual_network_rules.value
}
}
}
}
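# Customer-managed key encryption. The identity must already hold wrap/unwrap
# rights on the key when this block is applied, so in practice pass a pre-granted
# user-assigned identity via cmk_identity_client_id (a system-assigned identity
# does not exist until the account itself has been created).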
dynamic "customer_managed_key" {
for_each = var.customer_managed_key_id == null ? [] : [1]
content {
key_vault_key_id = var.customer_managed_key_id
identity_client_id = var.cmk_identity_client_id
}
}
tags = var.tags
}
# One azurerm_cognitive_deployment per model. capacity is in units of 1K TPM for Standard-family SKUs, or PTUs for provisioned SKUs.
resource "azurerm_cognitive_deployment" "this" {
for_each = var.model_deployments
name = each.key
cognitive_account_id = azurerm_cognitive_account.openai.id
# Optional Responsible AI (content filter) policy; null keeps the default policy.
rai_policy_name = each.value.rai_policy_name
version_upgrade_option = each.value.version_upgrade_option
model {
format = "OpenAI"
name = each.value.model_name
version = each.value.model_version
}
sku {
name = each.value.scale_type
capacity = each.value.capacity
}
}
# Private Link: created only when going fully private with a subnet supplied.
resource "azurerm_private_endpoint" "openai" {
count = local.enable_private_endpoint ? 1 : 0
name = "${var.name}-pe"
location = var.location
resource_group_name = var.resource_group_name
subnet_id = var.private_endpoint_subnet_id
private_service_connection {
name = "${var.name}-psc"
private_connection_resource_id = azurerm_cognitive_account.openai.id
subresource_names = ["account"]
is_manual_connection = false
}
dynamic "private_dns_zone_group" {
for_each = var.private_dns_zone_ids == null ? [] : [1]
content {
name = "default"
private_dns_zone_ids = var.private_dns_zone_ids
}
}
tags = var.tags
}
# variables.tf
variable "name" {
description = "Name of the Azure OpenAI (Cognitive) account. Letters, numbers and hyphens; 2-64 chars. Lowercase is recommended because it also becomes the default custom subdomain."
type = string
validation {
condition = can(regex("^[a-zA-Z0-9-]{2,64}$", var.name))
error_message = "name must be 2-64 characters of letters, numbers, or hyphens."
}
}
variable "resource_group_name" {
description = "Resource group that will contain the account and private endpoint."
type = string
}
variable "location" {
description = "Azure region. Must be a region where Azure OpenAI and your chosen models are available (e.g. eastus2, swedencentral)."
type = string
}
variable "sku_name" {
description = "Account SKU. S0 is the standard pay-as-you-go tier for Azure OpenAI."
type = string
default = "S0"
validation {
condition = contains(["S0"], var.sku_name)
error_message = "Azure OpenAI accounts currently support the S0 SKU."
}
}
variable "custom_subdomain_name" {
description = "Globally-unique custom subdomain. Required for Entra ID auth and Private Link. Defaults to var.name when null."
type = string
default = null
}
variable "public_network_access_enabled" {
description = "Whether the account is reachable over the public internet. Set false for Private Link deployments."
type = bool
default = false
}
variable "outbound_network_access_restricted" {
description = "Restrict outbound calls (e.g. for 'On Your Data') to approved FQDNs."
type = bool
default = false
}
variable "local_auth_enabled" {
description = "Allow api-key (local) auth in addition to Entra ID tokens. Disable to enforce keyless RBAC access."
type = bool
default = false
}
variable "identity_type" {
description = "Managed identity type: SystemAssigned, UserAssigned, 'SystemAssigned, UserAssigned', or null."
type = string
default = "SystemAssigned"
validation {
condition = var.identity_type == null || contains(["SystemAssigned", "UserAssigned", "SystemAssigned, UserAssigned"], var.identity_type)
error_message = "identity_type must be SystemAssigned, UserAssigned, 'SystemAssigned, UserAssigned', or null."
}
}
variable "user_assigned_identity_ids" {
description = "User-assigned identity resource IDs (required when identity_type includes UserAssigned)."
type = list(string)
default = []
}
variable "model_deployments" {
description = "Map of model deployments keyed by deployment name. capacity is TPM in thousands (e.g. 30 = 30K TPM) for Standard-family SKUs, or PTUs for provisioned SKUs."
type = map(object({
model_name = string
model_version = string
scale_type = optional(string, "Standard")
capacity = optional(number, 30)
rai_policy_name = optional(string)
version_upgrade_option = optional(string, "OnceNewDefaultVersionAvailable")
}))
default = {}
validation {
condition = alltrue([
for d in values(var.model_deployments) :
contains(["Standard", "GlobalStandard", "DataZoneStandard", "ProvisionedManaged", "GlobalProvisionedManaged"], d.scale_type)
])
error_message = "scale_type must be one of Standard, GlobalStandard, DataZoneStandard, ProvisionedManaged, GlobalProvisionedManaged."
}
validation {
condition = alltrue([for d in values(var.model_deployments) : d.capacity >= 1])
error_message = "Each deployment capacity must be >= 1 (thousand TPM / PTU units)."
}
}
variable "network_acls_default_action" {
description = "Default action for the network ACL when IP/VNet rules are supplied (Allow or Deny)."
type = string
default = "Deny"
validation {
condition = contains(["Allow", "Deny"], var.network_acls_default_action)
error_message = "network_acls_default_action must be Allow or Deny."
}
}
variable "network_acls_ip_rules" {
description = "List of public IPs/CIDRs allowed when the account is not fully private."
type = list(string)
default = []
}
variable "network_acls_virtual_network_subnet_ids" {
description = "Subnet IDs allowed via service endpoints (alternative to a private endpoint)."
type = list(string)
default = []
}
variable "private_endpoint_subnet_id" {
description = "Subnet ID for the private endpoint. When set with public access disabled, a private endpoint is created."
type = string
default = null
}
variable "private_dns_zone_ids" {
description = "Private DNS zone IDs (typically privatelink.openai.azure.com) to register the private endpoint A record."
type = list(string)
default = null
}
variable "customer_managed_key_id" {
description = "Key Vault key ID for customer-managed encryption. Null uses Microsoft-managed keys."
type = string
default = null
}
variable "cmk_identity_client_id" {
description = "Client ID of the identity used to access the CMK (required when customer_managed_key_id is set with a user-assigned identity)."
type = string
default = null
}
variable "tags" {
description = "Tags applied to all resources."
type = map(string)
default = {}
}
# outputs.tf
output "id" {
description = "Resource ID of the Azure OpenAI (Cognitive) account."
value = azurerm_cognitive_account.openai.id
}
output "name" {
description = "Name of the Azure OpenAI account."
value = azurerm_cognitive_account.openai.name
}
output "endpoint" {
description = "Base endpoint URL (https://<subdomain>.openai.azure.com/) used by the OpenAI SDK / REST calls."
value = azurerm_cognitive_account.openai.endpoint
}
output "custom_subdomain_name" {
description = "The custom subdomain in use (drives the endpoint host and Private Link DNS record)."
value = azurerm_cognitive_account.openai.custom_subdomain_name
}
output "primary_access_key" {
description = "Primary api-key. Empty when local_auth_enabled = false; prefer Entra ID tokens."
value = azurerm_cognitive_account.openai.primary_access_key
sensitive = true
}
output "identity_principal_id" {
description = "Principal ID of the system-assigned identity, for granting it RBAC (e.g. Key Vault access)."
value = try(azurerm_cognitive_account.openai.identity[0].principal_id, null)
}
output "deployment_names" {
description = "Map of deployment key => deployment name, for passing model/deployment IDs to applications."
value = { for k, d in azurerm_cognitive_deployment.this : k => d.name }
}
output "private_endpoint_ip" {
description = "Private IP assigned to the account's private endpoint, if one was created."
value = try(azurerm_private_endpoint.openai[0].private_service_connection[0].private_ip_address, null)
}
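Because versions.tf already requires Terraform 1.6+, the module's secure defaults can be pinned with the native terraform test framework. Below is a sketch of a hypothetical tests/defaults.tftest.hcl. It assumes Terraform 1.7 or later for mock_provider, which lets plans run without Azure credentials, and the subnet ID is a made-up but well-formed placeholder.

```hcl
# tests/defaults.tftest.hcl (hypothetical path); run `terraform test` from the module root.
# mock_provider needs Terraform >= 1.7 and avoids real Azure API calls.
mock_provider "azurerm" {}

# Required inputs shared by every run block below.
variables {
  name                = "kv-openai-test"
  resource_group_name = "rg-openai-test"
  location            = "eastus2"
}

run "private_and_keyless_by_default" {
  command = plan

  variables {
    private_endpoint_subnet_id = "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/rg-net/providers/Microsoft.Network/virtualNetworks/vnet-spoke/subnets/snet-privatelink"
  }

  assert {
    condition     = azurerm_cognitive_account.openai.local_auth_enabled == false
    error_message = "api-key (local) auth should be disabled by default."
  }

  assert {
    condition     = azurerm_cognitive_account.openai.public_network_access_enabled == false
    error_message = "Public network access should be disabled by default."
  }

  assert {
    condition     = azurerm_cognitive_account.openai.custom_subdomain_name == "kv-openai-test"
    error_message = "custom_subdomain_name should default to var.name."
  }

  assert {
    condition     = length(azurerm_private_endpoint.openai) == 1
    error_message = "A private endpoint should be planned when a subnet is supplied."
  }
}

run "no_private_endpoint_without_subnet" {
  command = plan

  assert {
    condition     = length(azurerm_private_endpoint.openai) == 0
    error_message = "No private endpoint should be planned without a subnet."
  }
}

run "rejects_unknown_scale_type" {
  command = plan

  variables {
    model_deployments = {
      chat = {
        model_name    = "gpt-4o"
        model_version = "2024-11-20"
        scale_type    = "Premium" # deliberately outside the allowed list
      }
    }
  }

  # The run passes only if the scale_type validation rule rejects the input.
  expect_failures = [var.model_deployments]
}
```

Tests like these catch the drift the module exists to prevent, such as a refactor that silently flips local_auth_enabled back on.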
How to use it
module "azure_openai" {
source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-openai?ref=v1.0.0"
name = "kv-openai-prod-eus2"
resource_group_name = azurerm_resource_group.ai.name
location = "eastus2"
sku_name = "S0"
# Keyless, fully private posture
local_auth_enabled = false
public_network_access_enabled = false
identity_type = "SystemAssigned"
private_endpoint_subnet_id = azurerm_subnet.privatelink.id
private_dns_zone_ids = [azurerm_private_dns_zone.openai.id]
model_deployments = {
"gpt-4o" = {
model_name = "gpt-4o"
model_version = "2024-11-20"
scale_type = "GlobalStandard"
capacity = 50 # 50K TPM
}
"text-embedding-3-large" = {
model_name = "text-embedding-3-large"
model_version = "1"
scale_type = "Standard"
capacity = 120 # 120K TPM for bulk embedding jobs
}
}
tags = {
workload = "rag-assistant"
environment = "prod"
owner = "platform-ai"
}
}
# Downstream: feed the endpoint + deployment name into the app's settings,
# and grant the app's identity the data-plane role for keyless calls.
resource "azurerm_linux_web_app" "assistant" {
name = "kv-rag-assistant-prod"
resource_group_name = azurerm_resource_group.ai.name
location = "eastus2"
service_plan_id = azurerm_service_plan.app.id
site_config {}
app_settings = {
"AZURE_OPENAI_ENDPOINT" = module.azure_openai.endpoint
"AZURE_OPENAI_CHAT_DEPLOYMENT" = module.azure_openai.deployment_names["gpt-4o"]
"AZURE_OPENAI_EMBED_DEPLOYMENT" = module.azure_openai.deployment_names["text-embedding-3-large"]
}
identity {
type = "SystemAssigned"
}
}
resource "azurerm_role_assignment" "app_can_call_openai" {
scope = module.azure_openai.id
role_definition_name = "Cognitive Services OpenAI User"
principal_id = azurerm_linux_web_app.assistant.identity[0].principal_id
}
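The example above assumes azurerm_subnet.privatelink and azurerm_private_dns_zone.openai already exist. Here is a sketch of both, plus the VNet link without which the zone is never consulted. azurerm_virtual_network.spoke, the resource names and the 10.20.1.0/24 range are hypothetical. Because public access is disabled, the web app also needs regional VNet integration (virtual_network_subnet_id on azurerm_linux_web_app) into a VNet linked to this zone before its calls reach the private endpoint.

```hcl
# Hypothetical names and address range; assumes azurerm_virtual_network.spoke exists.
resource "azurerm_subnet" "privatelink" {
  name                 = "snet-privatelink"
  resource_group_name  = azurerm_resource_group.ai.name
  virtual_network_name = azurerm_virtual_network.spoke.name
  address_prefixes     = ["10.20.1.0/24"]
}

# Zone that the private endpoint's DNS zone group writes the A record into.
resource "azurerm_private_dns_zone" "openai" {
  name                = "privatelink.openai.azure.com"
  resource_group_name = azurerm_resource_group.ai.name
}

# Without this link, workloads in the VNet never consult the private zone
# and keep resolving the account's public address.
resource "azurerm_private_dns_zone_virtual_network_link" "openai" {
  name                  = "openai-spoke-link"
  resource_group_name   = azurerm_resource_group.ai.name
  private_dns_zone_name = azurerm_private_dns_zone.openai.name
  virtual_network_id    = azurerm_virtual_network.spoke.id
  registration_enabled  = false
}
```

In hub-and-spoke estates the zone usually lives once in the hub and is linked to each spoke, rather than being created per workload.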
With Terragrunt
Terragrunt keeps this module DRY across environments — define the backend and provider once in a root config, then a thin terragrunt.hcl per environment supplies only the inputs that differ.
1. Root config — live/terragrunt.hcl (inherited by every module):
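remote_state {
backend = "azurerm"
generate = { path = "backend.tf", if_exists = "overwrite" }
config = {
# ...azurerm state storage account/container + key per path...
}
}
# Provider generated once for every module (azurerm 4.x reads ARM_SUBSCRIPTION_ID from the environment).
generate "provider" {
path = "provider.tf"
if_exists = "overwrite_terragrunt"
contents = "provider \"azurerm\" {\n  features {}\n}\n"
}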
2. Module config — live/prod/openai/terragrunt.hcl:
include "root" {
path = find_in_parent_folders()
}
terraform {
source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-openai?ref=v1.0.0"
}
inputs = {
name = "..."
resource_group_name = "..."
location = "..."
}
3. Deploy one environment, or roll out all modules together:
cd live/prod/openai && terragrunt apply # this module
cd live/prod && terragrunt run-all apply # every module under live/prod
Why Terragrunt here: the backend and provider live in one place instead of being copy-pasted into every module; the inputs block is overridden per environment (dev / stage / prod) without forking the module; and run-all orchestrates dependencies across modules. Reach for it once you have more than one environment or more than a handful of modules; for a single stack, the plain Quickstart above is enough.
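A per-environment override might look like the following sketch of a hypothetical live/dev/openai/terragrunt.hcl. It assumes a sibling ../network module that exposes a privatelink_subnet_id output. That output name, the mock subnet ID and the account and resource-group names are all illustrative. The dependency block is also what lets run-all order the network module before this one.

```hcl
# live/dev/openai/terragrunt.hcl (hypothetical dev environment)
include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-openai?ref=v1.0.0"
}

# Hypothetical sibling module; run-all applies it first because of this block.
dependency "network" {
  config_path = "../network"

  # Placeholder used only until the network module has real outputs.
  mock_outputs = {
    privatelink_subnet_id = "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/rg-net-dev/providers/Microsoft.Network/virtualNetworks/vnet-dev/subnets/snet-privatelink"
  }
  mock_outputs_allowed_terraform_commands = ["validate", "plan"]
}

inputs = {
  name                       = "kv-openai-dev-eus2"
  resource_group_name        = "rg-openai-dev"
  location                   = "eastus2"
  private_endpoint_subnet_id = dependency.network.outputs.privatelink_subnet_id

  model_deployments = {
    "gpt-4o" = {
      model_name    = "gpt-4o"
      model_version = "2024-11-20"
      scale_type    = "GlobalStandard"
      capacity      = 10 # smaller dev quota than prod
    }
  }
}
```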
Inputs
| Name | Type | Default | Required | Description |
|---|---|---|---|---|
| name | string | — | yes | Account name; 2-64 chars of letters, numbers, hyphens. |
| resource_group_name | string | — | yes | Resource group for the account and private endpoint. |
| location | string | — | yes | Region with Azure OpenAI + your model availability. |
| sku_name | string | "S0" | no | Account SKU (S0). |
| custom_subdomain_name | string | null | no | Globally-unique subdomain; required for Entra ID + Private Link. Defaults to name. |
| public_network_access_enabled | bool | false | no | Public internet reachability; false for Private Link. |
| outbound_network_access_restricted | bool | false | no | Restrict outbound (e.g. "On Your Data") to approved FQDNs. |
| local_auth_enabled | bool | false | no | Allow api-key auth alongside Entra ID; false enforces keyless RBAC. |
| identity_type | string | "SystemAssigned" | no | Managed identity type, or null. |
| user_assigned_identity_ids | list(string) | [] | no | UAMI IDs when identity_type includes UserAssigned. |
| model_deployments | map(object) | {} | no | Model deployments keyed by name; capacity is TPM in thousands. |
| network_acls_default_action | string | "Deny" | no | Default ACL action when IP/VNet rules are set. |
| network_acls_ip_rules | list(string) | [] | no | Allowed public IPs/CIDRs when not fully private. |
| network_acls_virtual_network_subnet_ids | list(string) | [] | no | Subnet IDs allowed via service endpoints. |
| private_endpoint_subnet_id | string | null | no | Subnet for the private endpoint (with public access off). |
| private_dns_zone_ids | list(string) | null | no | Private DNS zones (privatelink.openai.azure.com) for the PE record. |
| customer_managed_key_id | string | null | no | Key Vault key ID for CMK encryption; null = Microsoft-managed. |
| cmk_identity_client_id | string | null | no | Client ID of the identity accessing the CMK. |
| tags | map(string) | {} | no | Tags applied to all resources. |
Outputs
| Name | Description |
|---|---|
| id | Resource ID of the Azure OpenAI account. |
| name | Account name. |
| endpoint | Base endpoint URL (https://<subdomain>.openai.azure.com/) for SDK/REST calls. |
| custom_subdomain_name | Subdomain driving the endpoint host and Private Link DNS. |
| primary_access_key | Primary api-key (sensitive; empty when local auth is disabled). |
| identity_principal_id | System-assigned identity principal ID for RBAC grants. |
| deployment_names | Map of deployment key => deployment name for app configuration. |
| private_endpoint_ip | Private IP of the account’s private endpoint, if created. |
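When apps read configuration from Key Vault rather than plain app settings, the endpoint output can be published as a secret. This sketch assumes an existing RBAC-enabled vault referenced as azurerm_key_vault.app (hypothetical), that the identity running Terraform may write secrets, and the module instance name azure_openai from the usage example. The local builds the standard App Service @Microsoft.KeyVault(SecretUri=...) reference.

```hcl
# Publish the endpoint so apps can resolve it from Key Vault.
resource "azurerm_key_vault_secret" "openai_endpoint" {
  name         = "azure-openai-endpoint"
  value        = module.azure_openai.endpoint
  key_vault_id = azurerm_key_vault.app.id # hypothetical existing vault
  content_type = "text/plain"
}

locals {
  # Use this in app_settings instead of the raw URL; App Service resolves it
  # with the app's managed identity, which needs the "Key Vault Secrets User" role.
  openai_endpoint_setting = "@Microsoft.KeyVault(SecretUri=${azurerm_key_vault_secret.openai_endpoint.versionless_id})"
}
```

The versionless ID means a rotated value is picked up without editing app settings.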
Enterprise scenario
A financial-services firm runs a regulated RAG copilot that may never send customer data to public model endpoints. The platform team instantiates this module once per region (eastus2, plus swedencentral for EU data residency), each with public_network_access_enabled = false, a private endpoint into the shared services VNet, and local_auth_enabled = false so every call must carry an Entra ID token tied to a workload identity. They pin a gpt-4o deployment at 50K TPM (GlobalStandard in eastus2, DataZoneStandard in swedencentral so EU inference stays inside the EU data zone) and a text-embedding-3-large deployment at 120K TPM, and the audit team gets exactly what they need: the model, version, capacity, and encryption posture all live in a reviewed Terraform PR rather than in someone’s portal session.
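A per-region rollout like this maps naturally onto module for_each. This is a sketch under stated assumptions. The region keys, the resource-group naming and the two input variables are hypothetical. It also assumes the gpt-4o deployment in swedencentral uses DataZoneStandard, because Global deployments may process prompts in any Azure region while Data Zone deployments keep processing inside the EU or US zone. Check model and SKU availability per region before relying on it.

```hcl
variable "privatelink_subnet_ids" {
  description = "Hypothetical: private endpoint subnet ID per region key (eus2, swc)."
  type        = map(string)
}

variable "openai_private_dns_zone_id" {
  description = "Hypothetical: ID of the shared privatelink.openai.azure.com zone."
  type        = string
}

locals {
  # One entry per region; the key feeds both the account name and the subnet lookup.
  openai_regions = {
    eus2 = { location = "eastus2", chat_scale_type = "GlobalStandard" }
    swc  = { location = "swedencentral", chat_scale_type = "DataZoneStandard" }
  }
}

module "azure_openai_regional" {
  source   = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-openai?ref=v1.0.0"
  for_each = local.openai_regions

  name                = "kv-openai-prod-${each.key}"
  resource_group_name = "rg-openai-prod-${each.key}" # assumed to exist
  location            = each.value.location

  local_auth_enabled            = false
  public_network_access_enabled = false
  private_endpoint_subnet_id    = var.privatelink_subnet_ids[each.key]
  private_dns_zone_ids          = [var.openai_private_dns_zone_id]

  model_deployments = {
    "gpt-4o" = {
      model_name    = "gpt-4o"
      model_version = "2024-11-20"
      scale_type    = each.value.chat_scale_type
      capacity      = 50 # 50K TPM
    }
    "text-embedding-3-large" = {
      model_name    = "text-embedding-3-large"
      model_version = "1"
      scale_type    = "Standard"
      capacity      = 120 # 120K TPM
    }
  }
}
```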
Best practices
- Go keyless. Set local_auth_enabled = false and grant the app’s identity Cognitive Services OpenAI User on the account id output; this eliminates long-lived api keys and routes access through Entra ID and Conditional Access policies.
- Pin model name and version, and control upgrades. Treat gpt-4o/2024-11-20 like a dependency. Use version_upgrade_option = "NoAutoUpgrade" for models where output drift would break evals, and let only non-critical deployments auto-upgrade.
- Size capacity for the real bottleneck: TPM. Most production incidents on Azure OpenAI are HTTP 429s from exhausted tokens-per-minute quota, not outages. Set capacity per deployment from observed load, prefer GlobalStandard for higher default quotas where global processing is acceptable (DataZoneStandard keeps inference inside the US or EU data zone), and reserve PTUs (ProvisionedManaged) for latency-sensitive paths.
- Make it private and resolve DNS correctly. With public_network_access_enabled = false, always wire private_endpoint_subnet_id plus a privatelink.openai.azure.com zone in private_dns_zone_ids, and link that zone to the client VNets; without it, clients resolve the public address and their calls are rejected because public access is disabled.
- Encrypt with a customer-managed key for regulated data. Supply customer_managed_key_id from a Key Vault with purge protection and grant the account’s identity wrap/unwrap rights on the key (for example the Key Vault Crypto Service Encryption User role); you then control key rotation and revocation independently of Microsoft.
- Name and tag for region/quota traceability. Encode region and environment in name (e.g. kv-openai-prod-eus2) and tag workload/owner, so quota requests, cost reports, and the per-region account map stay legible as you scale to multiple deployments.
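The CMK advice translates into roughly the following wiring. This is a sketch with hypothetical names throughout. It assumes azurerm_resource_group.ai exists, along with an RBAC-enabled, purge-protected vault referenced as azurerm_key_vault.cmk, and that the identity running Terraform can create keys (for example Key Vault Crypto Officer). A user-assigned identity is used so the grant can exist before the account is created. Role assignments can take a few minutes to propagate, so the first apply may need a retry.

```hcl
resource "azurerm_user_assigned_identity" "openai_cmk" {
  name                = "id-openai-cmk-prod-eus2"
  resource_group_name = azurerm_resource_group.ai.name
  location            = "eastus2"
}

resource "azurerm_key_vault_key" "openai" {
  name         = "openai-cmk"
  key_vault_id = azurerm_key_vault.cmk.id
  key_type     = "RSA"
  key_size     = 2048
  key_opts     = ["wrapKey", "unwrapKey"]
}

# Lets the identity read, wrap and unwrap keys in the vault.
resource "azurerm_role_assignment" "openai_cmk" {
  scope                = azurerm_key_vault.cmk.id
  role_definition_name = "Key Vault Crypto Service Encryption User"
  principal_id         = azurerm_user_assigned_identity.openai_cmk.principal_id
}

module "azure_openai_cmk" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-openai?ref=v1.0.0"

  name                = "kv-openai-prod-cmk-eus2"
  resource_group_name = azurerm_resource_group.ai.name
  location            = "eastus2"

  identity_type              = "UserAssigned"
  user_assigned_identity_ids = [azurerm_user_assigned_identity.openai_cmk.id]
  customer_managed_key_id    = azurerm_key_vault_key.openai.id
  cmk_identity_client_id     = azurerm_user_assigned_identity.openai_cmk.client_id

  # Ensure the key grant exists before the account tries to use the key.
  depends_on = [azurerm_role_assignment.openai_cmk]
}
```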