Terraform Module: Azure Custom Role Definition — least-privilege RBAC roles as versioned code

Quick take — Build reusable Azure custom RBAC roles with Terraform and azurerm_role_definition: scoped Actions/DataActions, AssignableScopes, NotActions, and a stable role ID output for downstream assignments. New here? Jump to the Quickstart below to deploy it in minutes; read on for how it works and when to reach for it.

Quickstart (copy-paste)

Minimal, runnable configuration — drop this in a .tf file and fill in the "..." placeholders (each required input is commented):

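provider "azurerm" {
  features {}
  # azurerm 4.x requires a subscription: set subscription_id here
  # or export ARM_SUBSCRIPTION_ID before running Terraform.
}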

module "custom_role_definition" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-custom-role-definition?ref=v1.0.0"

  name              = "..."           # Display name of the custom role; unique within the tena…
  description       = "..."           # Purpose and least-privilege rationale, shown in the por…
  assignable_scopes = ["...", "..."]  # Subscription / resource group / management group IDs wh…
}

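Then run terraform init && terraform apply. Every other input has a default, but actions and data_actions default to empty lists, so set at least one of them for the role to grant anything. See Inputs below to override behaviour.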

What this module is

Azure RBAC ships with hundreds of built-in roles, but they are coarse. Contributor lets a principal do almost anything except manage access; Reader lets it view resources but change nothing. Real teams sit in the gap: a deployment service principal that may restart App Services and read Key Vault metadata but must never delete a resource group, or an on-call engineer who can reboot VMs but cannot touch networking. A custom role definition is how you express exactly that: an explicit allow-list of control-plane Actions (and optional data-plane DataActions), minus whatever NotActions subtracts from that list, assignable only at the management groups, subscriptions, or resource groups named in AssignableScopes. NotActions is an exclusion, not a deny: a principal can still obtain an excluded operation through a different role assignment.

In Terraform the resource is azurerm_role_definition. On its own it is a handful of fiddly, security-sensitive fields where a single wildcard typo (Microsoft.Authorization/*/write) can hand out privilege escalation. Wrapping it in a module gives you one reviewed place to define the role, input validation that rejects a missing scope, a guardrail that flags an empty action list, and a stable role_definition_resource_id output that every azurerm_role_assignment in your estate can consume by reference instead of a copy-pasted GUID. The role is defined once, versioned via Git tags, and rolled out identically across dev, test, and prod.
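One way to catch that class of typo before it reaches Azure is a validation rule on the actions input. The sketch below is an assumption, not part of the module as published: the blocked patterns (a bare * and any write, delete, or wildcard operation under Microsoft.Authorization) are a hypothetical policy you would tune to your own estate.

```hcl
# Hypothetical hardening of the module's actions variable (not in the published module).
variable "actions" {
  description = "Control-plane operations the role allows."
  type        = list(string)
  default     = []

  validation {
    # Reject a bare "*" and any Microsoft.Authorization operation that ends in
    # write, delete, or a wildcard, since those can grant access management.
    condition = alltrue([
      for a in var.actions :
      !can(regex("(?i)^(\\*|microsoft\\.authorization/.*(write|delete|\\*))$", a))
    ])
    error_message = "actions must not include \"*\" or Microsoft.Authorization write, delete, or wildcard operations."
  }
}
```

With this rule in place, Microsoft.Authorization/roleAssignments/read would still pass, while Microsoft.Authorization/*/write would fail at plan time.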

When to use it

Module structure

terraform-module-azure-custom-role-definition/
├── versions.tf
├── main.tf
├── variables.tf
└── outputs.tf

versions.tf

terraform {
  required_version = ">= 1.5.0"

  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 4.0"
    }
  }
}

main.tf

# A custom RBAC role is scoped to one or more AssignableScopes. The first
# scope listed is used as the resource's `scope`, which is where Azure stores
# the role definition object itself.
locals {
  primary_scope = var.assignable_scopes[0]

  # Pass NotActions through unchanged: an empty list is valid and subtracts nothing.
  not_actions      = var.not_actions
  not_data_actions = var.not_data_actions
}

resource "azurerm_role_definition" "this" {
  # Pinning the GUID makes the role definition stable across re-creates and
  # lets assignments reference it deterministically. When null, Azure
  # generates one (recommended unless you are importing an existing role).
  role_definition_id = var.role_definition_id

  name        = var.name
  scope       = local.primary_scope
  description = var.description

  permissions {
    actions          = var.actions
    not_actions      = local.not_actions
    data_actions     = var.data_actions
    not_data_actions = local.not_data_actions
  }

  assignable_scopes = var.assignable_scopes
}

variables.tf

variable "name" {
  description = "Display name of the custom role. Must be unique within the tenant."
  type        = string

  validation {
    condition     = length(trimspace(var.name)) > 0 && length(var.name) <= 512
    error_message = "name must be non-empty and 512 characters or fewer."
  }
}

variable "description" {
  description = "Human-readable description shown in the portal and `az role definition` output. State the intent and least-privilege rationale."
  type        = string

  validation {
    condition     = length(trimspace(var.description)) > 0
    error_message = "description is required so reviewers understand the role's purpose."
  }
}

variable "assignable_scopes" {
  description = "Scopes at which the role can be assigned (subscription, resource group, or management group IDs). The first entry is where the definition is stored."
  type        = list(string)

  validation {
    condition     = length(var.assignable_scopes) > 0
    error_message = "At least one assignable scope is required."
  }

  validation {
    condition = alltrue([
      for s in var.assignable_scopes :
      can(regex("^/(subscriptions|providers/Microsoft\\.Management/managementGroups)/", s))
    ])
    error_message = "Each assignable scope must be a subscription, resource group, or management group resource ID."
  }
}

variable "actions" {
  description = "Control-plane operations the role allows, e.g. Microsoft.Compute/virtualMachines/restart/action."
  type        = list(string)
  default     = []
}

variable "not_actions" {
  description = "Control-plane operations subtracted from `actions`. This is an exclusion, not an explicit deny. Useful to allow a wildcard then carve out destructive operations."
  type        = list(string)
  default     = []
}

variable "data_actions" {
  description = "Data-plane operations the role allows, e.g. Microsoft.KeyVault/vaults/secrets/getSecret/action. Leave empty for control-plane-only roles."
  type        = list(string)
  default     = []
}

variable "not_data_actions" {
  description = "Data-plane operations subtracted from `data_actions`."
  type        = list(string)
  default     = []
}

variable "role_definition_id" {
  description = "Optional fixed GUID for the role definition. Leave null to let Azure generate one; set it when importing or pinning an existing role across tenants."
  type        = string
  default     = null

  validation {
    condition = (
      var.role_definition_id == null ||
      can(regex("^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$", coalesce(var.role_definition_id, "")))
    )
    error_message = "role_definition_id must be a valid GUID or null."
  }
}

variable "require_actions" {
  description = "Guardrail: when true, the role is expected to grant at least one control-plane or data-plane action (flags a role that grants nothing, created by mistake)."
  type        = bool
  default     = true

  validation {
    condition     = can(tobool(var.require_actions))
    error_message = "require_actions must be a boolean."
  }
}

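The combined “must have at least one action” guardrail is expressed as a check block, shown below, so it can read multiple variables at once. A failed check assertion surfaces as a warning during plan and apply; it does not stop the run.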

check "role_grants_something" {
  assert {
    condition = (
      !var.require_actions ||
      length(var.actions) > 0 ||
      length(var.data_actions) > 0
    )
    error_message = "Role grants no actions. Set actions/data_actions, or set require_actions = false to allow an intentionally empty role."
  }
}
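A check block reports a failed assertion as a warning rather than an error, so a team that needs the run to fail on an empty role could move the same condition into a lifecycle precondition on the resource. The following is a sketch of that variant, assuming the variables defined above; it would replace the resource block in main.tf rather than sit alongside it.

```hcl
resource "azurerm_role_definition" "this" {
  role_definition_id = var.role_definition_id
  name               = var.name
  scope              = var.assignable_scopes[0]
  description        = var.description
  assignable_scopes  = var.assignable_scopes

  permissions {
    actions          = var.actions
    not_actions      = var.not_actions
    data_actions     = var.data_actions
    not_data_actions = var.not_data_actions
  }

  lifecycle {
    # Unlike a check block, a failed precondition is an error and stops the run.
    precondition {
      condition = (
        !var.require_actions ||
        length(var.actions) > 0 ||
        length(var.data_actions) > 0
      )
      error_message = "Role grants no actions. Set actions/data_actions, or set require_actions = false."
    }
  }
}
```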

outputs.tf

output "role_definition_resource_id" {
  description = "The Azure resource ID of the role definition. Pass this to azurerm_role_assignment.role_definition_id."
  value       = azurerm_role_definition.this.role_definition_resource_id
}

output "role_definition_id" {
  description = "The GUID of the role definition (the bare definition ID, without the scope prefix)."
  value       = azurerm_role_definition.this.role_definition_id
}

output "name" {
  description = "Display name of the custom role."
  value       = azurerm_role_definition.this.name
}

output "assignable_scopes" {
  description = "Scopes at which the role can be assigned."
  value       = azurerm_role_definition.this.assignable_scopes
}

How to use it

This example defines a tightly scoped operator role and then assigns it to an on-call group using the module’s output. The role may start, restart, and read VMs; it excludes VM deletion through not_actions and grants no network write operations at all.

data "azurerm_subscription" "current" {}

# Requires the hashicorp/azuread provider in the calling configuration.
data "azuread_group" "oncall" {
  display_name = "vm-oncall-operators"
}

module "custom_role_definition" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-custom-role-definition?ref=v1.0.0"

  name        = "KloudVin VM Operator"
  description = "Start/restart and read VMs for on-call; no delete, no network writes. Least-privilege per runbook RB-204."

  assignable_scopes = [
    data.azurerm_subscription.current.id,
  ]

  actions = [
    "Microsoft.Compute/virtualMachines/read",
    "Microsoft.Compute/virtualMachines/start/action",
    "Microsoft.Compute/virtualMachines/restart/action",
    "Microsoft.Compute/virtualMachines/instanceView/read",
    "Microsoft.Insights/alertRules/read",
    "Microsoft.Resources/subscriptions/resourceGroups/read",
  ]

  not_actions = [
    "Microsoft.Compute/virtualMachines/delete",
  ]
}

# Downstream: consume role_definition_resource_id to bind the role to the group.
resource "azurerm_role_assignment" "oncall_vm_operator" {
  scope              = data.azurerm_subscription.current.id
  role_definition_id = module.custom_role_definition.role_definition_resource_id
  principal_id       = data.azuread_group.oncall.object_id

  description = "On-call operators receive the custom VM Operator role."
}
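not_actions does the most work when actions contains a wildcard. As a sketch (the module label, role name, and description here are hypothetical, and it reuses data.azurerm_subscription.current from the example above), the same operator role could allow every VM operation and then subtract the ones that create, modify, or remove a VM:

```hcl
module "vm_operator_wildcard" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-custom-role-definition?ref=v1.0.0"

  name        = "KloudVin VM Operator (wildcard)" # hypothetical name
  description = "All VM operations except write and delete."

  assignable_scopes = [data.azurerm_subscription.current.id]

  # Allow everything under virtualMachines...
  actions = ["Microsoft.Compute/virtualMachines/*"]

  # ...then subtract the operations that create, modify, or remove a VM.
  not_actions = [
    "Microsoft.Compute/virtualMachines/write",
    "Microsoft.Compute/virtualMachines/delete",
  ]
}
```

The trade-off is that a wildcard also picks up any operation the resource provider adds later, so the explicit list in the main example is the tighter of the two.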

With Terragrunt

Terragrunt keeps this module DRY across environments — define the backend and provider once in a root config, then a thin terragrunt.hcl per environment supplies only the inputs that differ.

1. Root config: live/terragrunt.hcl (inherited by every module):

remote_state {
  backend = "azurerm"
  generate = { path = "backend.tf", if_exists = "overwrite" }
  config = {
    # ...azurerm state bucket/container + key per path...
  }
}

2. Module config: live/prod/custom_role_definition/terragrunt.hcl:

include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-custom-role-definition?ref=v1.0.0"
}

inputs = {
  name = "..."
  description = "..."
  assignable_scopes = ["...", "..."]
}

3. Deploy one environment, or roll out all modules together:

# From the repository root, this module only:
cd live/prod/custom_role_definition && terragrunt apply
# Or, from live/prod, every module under it:
terragrunt run-all apply

Why Terragrunt here: the backend and provider live in one place instead of being copy-pasted into every module; inputs is overridden per environment (dev / stage / prod) without forking the module; and run-all orchestrates dependencies across modules. Reach for it once you have more than one environment or more than a handful of modules — for a single stack, the plain Quickstart above is enough.
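The root config in step 1 only covers the backend. A generate block is one way to supply the provider from the same file; this is a sketch, and the file name provider.tf and the reliance on the ARM_SUBSCRIPTION_ID environment variable are assumptions.

```hcl
# live/terragrunt.hcl (addition): write a provider.tf into every module's working directory.
generate "provider" {
  path      = "provider.tf"
  if_exists = "overwrite_terragrunt"
  contents  = <<EOF
provider "azurerm" {
  features {}
  # Subscription is read from ARM_SUBSCRIPTION_ID in this sketch.
}
EOF
}
```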

Inputs

| Name | Type | Default | Required | Description |
| --- | --- | --- | --- | --- |
| name | string | n/a | Yes | Display name of the custom role; unique within the tenant (≤ 512 chars). |
| description | string | n/a | Yes | Purpose and least-privilege rationale, shown in the portal and CLI. |
| assignable_scopes | list(string) | n/a | Yes | Subscription / resource group / management group IDs where the role may be assigned. First entry stores the definition. |
| actions | list(string) | [] | No | Allowed control-plane operations (e.g. .../restart/action). |
| not_actions | list(string) | [] | No | Control-plane operations subtracted from actions. |
| data_actions | list(string) | [] | No | Allowed data-plane operations (e.g. Key Vault getSecret/action). |
| not_data_actions | list(string) | [] | No | Data-plane operations subtracted from data_actions. |
| role_definition_id | string | null | No | Fixed GUID for the definition; null lets Azure generate one. Set when importing/pinning. |
| require_actions | bool | true | No | Guardrail that flags a role granting no actions (reported as a check warning); set false for an intentionally empty role. |
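For the import case that role_definition_id mentions, Terraform 1.5 and later accept an import block in the calling configuration. This is a sketch under assumptions: the zeroed and repeated-digit GUIDs are placeholders, and the ID format (role definition resource ID, a pipe, then the scope) follows the import syntax documented for azurerm_role_definition, which is worth confirming against the provider version you pin.

```hcl
# Placeholder GUIDs: replace with the real subscription and role definition IDs.
import {
  to = module.custom_role_definition.azurerm_role_definition.this
  id = "/subscriptions/00000000-0000-0000-0000-000000000000/providers/Microsoft.Authorization/roleDefinitions/11111111-1111-1111-1111-111111111111|/subscriptions/00000000-0000-0000-0000-000000000000"
}
```

When importing this way, set the module's role_definition_id input to the same role GUID so the plan does not propose replacing the definition.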

Outputs

| Name | Description |
| --- | --- |
| role_definition_resource_id | Full Azure resource ID of the role definition; feed to azurerm_role_assignment.role_definition_id. |
| role_definition_id | Bare GUID of the role definition (no scope prefix). |
| name | Display name of the custom role. |
| assignable_scopes | Scopes at which the role can be assigned. |

Enterprise scenario

A managed-services provider runs 40 customer subscriptions under one management group and must give its NOC team uniform break-fix rights without ever exposing access management or data deletion. They define a single KloudVin Platform Operator role from this module with assignable_scopes pinned to the management group, then call the module once and fan out azurerm_role_assignment resources per subscription using role_definition_resource_id. When a new Azure resource provider needs to be supported, one PR adds the action, bumps the module tag to v1.1.0, and the change rolls out everywhere on the next terraform apply — with the full diff captured in code review for the customer’s audit evidence.
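The fan-out described above could look roughly like the following. Everything not named in the scenario is an assumption: the three variables are hypothetical inputs, the action list is a stand-in borrowed from the VM operator example, and whether a management-group-scoped role ID plans cleanly at subscription scope should be verified with terraform plan on the provider version in use.

```hcl
# Hypothetical inputs to the calling configuration.
variable "management_group_id" {
  description = "Resource ID of the management group holding the customer subscriptions."
  type        = string
}

variable "customer_subscription_ids" {
  description = "Bare subscription GUIDs that should receive the assignment."
  type        = set(string)
}

variable "noc_group_object_id" {
  description = "Object ID of the NOC team's directory group."
  type        = string
}

# One role definition, stored at and assignable under the management group.
module "platform_operator_role" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-custom-role-definition?ref=v1.0.0"

  name              = "KloudVin Platform Operator"
  description       = "Uniform break-fix rights for the NOC; no access management."
  assignable_scopes = [var.management_group_id]

  # Stand-in actions for illustration only.
  actions = [
    "Microsoft.Compute/virtualMachines/read",
    "Microsoft.Compute/virtualMachines/restart/action",
  ]
}

# One assignment per customer subscription, all pointing at the same definition.
resource "azurerm_role_assignment" "noc" {
  for_each = var.customer_subscription_ids

  scope              = "/subscriptions/${each.value}"
  role_definition_id = module.platform_operator_role.role_definition_resource_id
  principal_id       = var.noc_group_object_id
}
```

Using a set with for_each keys each assignment by subscription ID, so adding or removing a customer changes only that one assignment in the plan.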

Best practices

Vinod is a Senior Cloud Architect (22+ yrs) — available for Azure / AWS / GCP architecture, landing zones, and migrations.