Quick take — Build reusable Azure custom RBAC roles with Terraform and azurerm_role_definition: scoped Actions/DataActions, AssignableScopes, NotActions, and a stable role ID output for downstream assignments. New here? Jump to the Quickstart below to deploy it in minutes; read on for how it works and when to reach for it.
Quickstart (copy-paste)
Minimal, runnable configuration — drop this in a .tf file and fill in the "..." placeholders (each required input is commented):
```hcl
provider "azurerm" {
  features {}
}

module "custom_role_definition" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-custom-role-definition?ref=v1.0.0"

  name              = "..."          # Display name of the custom role; unique within the tena…
  description       = "..."          # Purpose and least-privilege rationale, shown in the por…
  assignable_scopes = ["...", "..."] # Subscription / resource group / management group IDs wh…
  actions           = ["..."]        # Control-plane operations the role grants (required unless data_actions is set)
}
```
Then terraform init && terraform apply. Every other input has a sensible default — see Inputs below to override behaviour.
What this module is
Azure RBAC ships with hundreds of built-in roles, but they are coarse. Contributor lets a principal do almost anything except manage access; Reader lets it look but change nothing. Real teams sit in the gap: a deployment service principal that may restart App Services and read Key Vault metadata but must never delete a resource group, or an on-call engineer who can reboot VMs but cannot touch networking. A custom role definition is how you express exactly that. It is an explicit allow-list of control-plane Actions (and optional data-plane DataActions), minus NotActions exclusions, constrained to the management groups, subscriptions, or resource groups named in AssignableScopes. NotActions only subtracts from this role's own Actions. It is not a deny rule, so another role assignment can still grant the excluded operations.
In Terraform the resource is azurerm_role_definition. On its own it is a handful of fiddly, security-sensitive fields where a single wildcard typo (Microsoft.Authorization/*/write) can open a path to privilege escalation. Wrapping it in a module gives you three things:
- one reviewed place to define the role;
- input validation that rejects an empty action list or a missing scope;
- a stable role_definition_resource_id output that every azurerm_role_assignment in your estate can consume by reference instead of a copy-pasted GUID.

The role is defined once, versioned via Git tags, and rolled out identically across dev, test, and prod.
When to use it
- A built-in role grants too much (you want Virtual Machine Contributor minus the ability to delete VMs) or too little (Reader plus the one restart action a runbook needs).
- You need the same role in many subscriptions and want a single source of truth rather than drift between hand-clicked roles in each subscription.
- A pipeline / service principal needs a narrow, auditable permission set and you want that set reviewed in a pull request, not edited live in the portal.
- You are enforcing least privilege for a landing zone and want roles pinned in AssignableScopes to a management group so they cannot be assigned where they don’t belong.
- Skip a custom role when a built-in role already fits. Custom roles count against the 5,000-per-tenant limit and add maintenance burden.
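When a built-in role already fits, you can assign it directly by name and skip the module entirely. Below is a minimal sketch that assigns the built-in Reader role at subscription scope. The variable reader_principal_id and the resource name builtin_reader are hypothetical.

```hcl
provider "azurerm" {
  features {}
}

data "azurerm_subscription" "current" {}

# Hypothetical input: object ID of the principal that needs read-only access.
variable "reader_principal_id" {
  type = string
}

# No custom role needed: reference the built-in role by its display name.
resource "azurerm_role_assignment" "builtin_reader" {
  scope                = data.azurerm_subscription.current.id
  role_definition_name = "Reader"
  principal_id         = var.reader_principal_id
}
```

This approach needs no role definition lifecycle and does not count toward the tenant's custom-role limit.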
Module structure
```
terraform-module-azure-custom-role-definition/
├── versions.tf
├── main.tf
├── variables.tf
└── outputs.tf
```
versions.tf
```hcl
terraform {
  required_version = ">= 1.5.0"

  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 4.0"
    }
  }
}
```
main.tf
```hcl
# A custom RBAC role is scoped to one or more AssignableScopes. The first
# scope listed is used as the resource's `scope`, which is where Azure stores
# the role definition object itself.
locals {
  primary_scope = var.assignable_scopes[0]

  # Passed through unchanged: an empty list is valid and means "exclude nothing".
  not_actions      = var.not_actions
  not_data_actions = var.not_data_actions
}

resource "azurerm_role_definition" "this" {
  # Pinning the GUID makes the role definition stable across re-creates and
  # lets assignments reference it deterministically. When null, the provider
  # generates one (recommended unless you are importing an existing role).
  role_definition_id = var.role_definition_id
  name               = var.name
  scope              = local.primary_scope
  description        = var.description

  permissions {
    actions          = var.actions
    not_actions      = local.not_actions
    data_actions     = var.data_actions
    not_data_actions = local.not_data_actions
  }

  assignable_scopes = var.assignable_scopes
}
```
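The comment above mentions importing an existing role. The module's required_version (>= 1.5.0) already allows Terraform import blocks. Below is a sketch of adopting a hand-made role with one.

Assumptions to check before use:
- All GUIDs are placeholders.
- The import ID uses the `<role definition resource ID>|<scope>` format documented for azurerm_role_definition. Confirm it against the provider docs for your version.

```hcl
# Root module. All GUIDs below are placeholders for illustration.
import {
  # Import into the resource that lives inside the module instance.
  to = module.custom_role_definition.azurerm_role_definition.this
  id = "/subscriptions/00000000-0000-0000-0000-000000000000/providers/Microsoft.Authorization/roleDefinitions/11111111-1111-1111-1111-111111111111|/subscriptions/00000000-0000-0000-0000-000000000000"
}

module "custom_role_definition" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-custom-role-definition?ref=v1.0.0"

  # Pin the existing GUID so Terraform adopts the role instead of replacing it.
  role_definition_id = "11111111-1111-1111-1111-111111111111"

  # Mirror the existing role's settings to avoid an update on the first plan.
  name              = "KloudVin VM Operator"
  description       = "Start/restart and read VMs for on-call; no delete, no network writes."
  assignable_scopes = ["/subscriptions/00000000-0000-0000-0000-000000000000"]
  actions = [
    "Microsoft.Compute/virtualMachines/read",
    "Microsoft.Compute/virtualMachines/restart/action",
  ]
}
```

Run terraform plan to review the import before applying. After the apply succeeds, you can delete the import block.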
variables.tf
```hcl
variable "name" {
  description = "Display name of the custom role. Must be unique within the tenant."
  type        = string

  validation {
    condition     = length(trimspace(var.name)) > 0 && length(var.name) <= 512
    error_message = "name must be non-empty and 512 characters or fewer."
  }
}

variable "description" {
  description = "Human-readable description shown in the portal and `az role definition` output. State the intent and least-privilege rationale."
  type        = string

  validation {
    condition     = length(trimspace(var.description)) > 0
    error_message = "description is required so reviewers understand the role's purpose."
  }
}

variable "assignable_scopes" {
  description = "Scopes at which the role can be assigned (subscription, resource group, or management group IDs). The first entry is where the definition is stored."
  type        = list(string)

  validation {
    condition     = length(var.assignable_scopes) > 0
    error_message = "At least one assignable scope is required."
  }

  validation {
    # Resource group IDs start with /subscriptions/, so they match the first branch.
    condition = alltrue([
      for s in var.assignable_scopes :
      can(regex("^/(subscriptions|providers/Microsoft\\.Management/managementGroups)/", s))
    ])
    error_message = "Each assignable scope must be a subscription, resource group, or management group resource ID."
  }
}

variable "actions" {
  description = "Control-plane operations the role allows, e.g. Microsoft.Compute/virtualMachines/restart/action."
  type        = list(string)
  default     = []
}

variable "not_actions" {
  description = "Control-plane operations subtracted from `actions`. This is an exclusion, not a deny: another role assignment can still grant them. Useful to allow a wildcard then carve out destructive operations."
  type        = list(string)
  default     = []
}

variable "data_actions" {
  description = "Data-plane operations the role allows, e.g. Microsoft.KeyVault/vaults/secrets/getSecret/action. Leave empty for control-plane-only roles."
  type        = list(string)
  default     = []
}

variable "not_data_actions" {
  description = "Data-plane operations subtracted from `data_actions`."
  type        = list(string)
  default     = []
}

variable "role_definition_id" {
  description = "Optional fixed GUID for the role definition. Leave null to let the provider generate one; set it when importing or pinning an existing role across tenants."
  type        = string
  default     = null

  validation {
    condition = (
      var.role_definition_id == null ||
      can(regex("^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$", coalesce(var.role_definition_id, "")))
    )
    error_message = "role_definition_id must be a valid GUID or null."
  }
}

# `type = bool` already rejects non-boolean values, so no extra validation is needed.
variable "require_actions" {
  description = "Guardrail: when true, the role must grant at least one control-plane or data-plane action (rejects a role that grants nothing, created by mistake)."
  type        = bool
  default     = true
}
```
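The combined “must have at least one action” guardrail reads several variables at once. It is therefore enforced as a lifecycle precondition placed inside resource "azurerm_role_definition" "this" in main.tf.

A check block could read the same variables, but a failed check assertion is only reported as a warning. It does not stop the plan or apply.

```hcl
# Place inside resource "azurerm_role_definition" "this" in main.tf.
lifecycle {
  precondition {
    condition = (
      !var.require_actions ||
      length(var.actions) > 0 ||
      length(var.data_actions) > 0
    )
    error_message = "Role grants no actions. Set actions/data_actions, or set require_actions = false to allow an intentionally empty role."
  }
}
```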
outputs.tf
```hcl
output "role_definition_resource_id" {
  description = "The Azure resource ID of the role definition. Pass this to azurerm_role_assignment.role_definition_id."
  value       = azurerm_role_definition.this.role_definition_resource_id
}

output "role_definition_id" {
  description = "The GUID of the role definition (the bare definition ID, without the scope prefix)."
  value       = azurerm_role_definition.this.role_definition_id
}

output "name" {
  description = "Display name of the custom role."
  value       = azurerm_role_definition.this.name
}

output "assignable_scopes" {
  description = "Scopes at which the role can be assigned."
  value       = azurerm_role_definition.this.assignable_scopes
}
```
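Within one configuration, you consume the ID as module.custom_role_definition.role_definition_resource_id. Across separately managed stacks, one option is terraform_remote_state. This is an assumption on my part; the module itself does not prescribe it.

The sketch below assumes two stacks:
- Stack A calls the module and re-exports the ID from its root under a hypothetical output named vm_operator_role_id.
- Stack B reads that output.

The backend values and principal_id are placeholders.

```hcl
# Stack A (calls the module) re-exports the ID from its root module:
#
#   output "vm_operator_role_id" {
#     value = module.custom_role_definition.role_definition_resource_id
#   }
#
# Stack B (below) reads that output from Stack A's state.
provider "azurerm" {
  features {}
}

data "azurerm_subscription" "current" {}

variable "principal_id" {
  description = "Object ID of the principal to assign (hypothetical input)."
  type        = string
}

# Placeholder backend settings; point these at Stack A's state file.
data "terraform_remote_state" "roles" {
  backend = "azurerm"
  config = {
    resource_group_name  = "rg-tfstate"
    storage_account_name = "sttfstate"
    container_name       = "tfstate"
    key                  = "roles.tfstate"
  }
}

resource "azurerm_role_assignment" "from_roles_stack" {
  scope              = data.azurerm_subscription.current.id
  role_definition_id = data.terraform_remote_state.roles.outputs.vm_operator_role_id
  principal_id       = var.principal_id
}
```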
How to use it
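This example defines a tightly scoped operator role that may read, start, and restart VMs, then assigns it to an on-call group using the module’s output. Because actions is an explicit allow-list, the role cannot delete VMs or write network configuration. The not_actions entry only guards against a wildcard being added to actions later.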
```hcl
# The azuread_group data source requires the hashicorp/azuread provider
# alongside azurerm.
data "azurerm_subscription" "current" {}

data "azuread_group" "oncall" {
  display_name = "vm-oncall-operators"
}

module "custom_role_definition" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-custom-role-definition?ref=v1.0.0"

  name        = "KloudVin VM Operator"
  description = "Start/restart and read VMs for on-call; no delete, no network writes. Least-privilege per runbook RB-204."
  assignable_scopes = [
    data.azurerm_subscription.current.id,
  ]

  actions = [
    "Microsoft.Compute/virtualMachines/read",
    "Microsoft.Compute/virtualMachines/start/action",
    "Microsoft.Compute/virtualMachines/restart/action",
    "Microsoft.Compute/virtualMachines/instanceView/read",
    "Microsoft.Insights/alertRules/read",
    "Microsoft.Resources/subscriptions/resourceGroups/read",
  ]

  # Belt and braces: delete is already absent from the explicit allow-list,
  # so this exclusion only matters if actions later gains a wildcard.
  not_actions = [
    "Microsoft.Compute/virtualMachines/delete",
  ]
}

# Downstream: consume role_definition_resource_id to bind the role to the group.
resource "azurerm_role_assignment" "oncall_vm_operator" {
  scope              = data.azurerm_subscription.current.id
  role_definition_id = module.custom_role_definition.role_definition_resource_id
  principal_id       = data.azuread_group.oncall.object_id
  description        = "On-call operators receive the custom VM Operator role."
}
```
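The same module covers data-plane roles. The sketch below reuses data.azurerm_subscription.current from the example above.

- The module instance name, role name, and description are hypothetical.
- The two DataActions are Key Vault secret operations.
- Because data_actions is non-empty, the require_actions guardrail passes with actions left at its default.
- These permissions only take effect on vaults that use the Azure RBAC permission model rather than access policies.

```hcl
module "kv_secret_reader_role" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-custom-role-definition?ref=v1.0.0"

  name        = "KloudVin Key Vault Secret Reader"
  description = "Read secret values and metadata for app runtime identities; no vault management."
  assignable_scopes = [
    data.azurerm_subscription.current.id,
  ]

  # Data plane only: no control-plane actions are granted.
  data_actions = [
    "Microsoft.KeyVault/vaults/secrets/getSecret/action",
    "Microsoft.KeyVault/vaults/secrets/readMetadata/action",
  ]
}
```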
With Terragrunt
Terragrunt keeps this module DRY across environments — define the backend and provider once in a root config, then a thin terragrunt.hcl per environment supplies only the inputs that differ.
1. Root config — live/terragrunt.hcl (inherited by every module):
```hcl
remote_state {
  backend  = "azurerm"
  generate = { path = "backend.tf", if_exists = "overwrite" }
  config = {
    # ...azurerm storage account / container + key per path...
  }
}
```
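To keep the provider in the root config as well, Terragrunt's generate block can write a provider.tf into every module. Below is a minimal sketch, assuming it goes in the same live/terragrunt.hcl and that the default azurerm features block is sufficient.

```hcl
# live/terragrunt.hcl (continued): write provider.tf into every module.
generate "provider" {
  path      = "provider.tf"
  if_exists = "overwrite_terragrunt"
  contents  = <<EOF
provider "azurerm" {
  features {}
}
EOF
}
```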
2. Module config — live/prod/custom_role_definition/terragrunt.hcl:
```hcl
include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-custom-role-definition?ref=v1.0.0"
}

inputs = {
  name              = "..."
  description       = "..."
  assignable_scopes = ["...", "..."]
  actions           = ["..."]
}
```
3. Deploy one environment, or roll out all modules together:
```sh
# Deploy only this module:
cd live/prod/custom_role_definition && terragrunt apply
# Or, from live/prod, deploy every module beneath it:
cd live/prod && terragrunt run-all apply
```
Why Terragrunt here:
- the backend and provider live in one place instead of being copy-pasted into every module;
- the inputs block is overridden per environment (dev / stage / prod) without forking the module;
- run-all orchestrates dependencies across modules.

Reach for it once you have more than one environment or more than a handful of modules. For a single stack, the plain Quickstart above is enough.
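A downstream Terragrunt module can consume the role ID through a dependency block. run-all then applies the role definition before the assignment.

The sketch below uses these assumptions:
- A hypothetical assignment module lives at live/prod/vm_operator_assignment.
- That module wraps azurerm_role_assignment behind a hypothetical local source path.
- The GUIDs are placeholders.

```hcl
# live/prod/vm_operator_assignment/terragrunt.hcl (hypothetical module)
include "root" {
  path = find_in_parent_folders()
}

terraform {
  # Hypothetical role-assignment module; not part of this repository.
  source = "../../../modules/role-assignment"
}

dependency "role" {
  config_path = "../custom_role_definition"

  # Placeholder output so plan/validate can run before the role exists.
  mock_outputs = {
    role_definition_resource_id = "/subscriptions/00000000-0000-0000-0000-000000000000/providers/Microsoft.Authorization/roleDefinitions/00000000-0000-0000-0000-000000000000"
  }
  mock_outputs_allowed_terraform_commands = ["plan", "validate"]
}

inputs = {
  role_definition_id = dependency.role.outputs.role_definition_resource_id
  principal_id       = "00000000-0000-0000-0000-000000000000" # placeholder object ID
}
```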
Inputs
| Name | Type | Default | Required | Description |
|---|---|---|---|---|
| name | string | n/a | Yes | Display name of the custom role; unique within the tenant (≤ 512 chars). |
| description | string | n/a | Yes | Purpose and least-privilege rationale, shown in the portal and CLI. |
| assignable_scopes | list(string) | n/a | Yes | Subscription / resource group / management group IDs where the role may be assigned. First entry stores the definition. |
| actions | list(string) | [] | No | Allowed control-plane operations (e.g. .../restart/action). |
| not_actions | list(string) | [] | No | Control-plane operations subtracted from actions. |
| data_actions | list(string) | [] | No | Allowed data-plane operations (e.g. Key Vault getSecret/action). |
| not_data_actions | list(string) | [] | No | Data-plane operations subtracted from data_actions. |
| role_definition_id | string | null | No | Fixed GUID for the definition; null lets the provider generate one. Set when importing/pinning. |
| require_actions | bool | true | No | Guardrail that rejects a role granting no actions; set false for an intentionally empty role. |
Outputs
| Name | Description |
|---|---|
| role_definition_resource_id | Full Azure resource ID of the role definition; feed to azurerm_role_assignment.role_definition_id. |
| role_definition_id | Bare GUID of the role definition (no scope prefix). |
| name | Display name of the custom role. |
| assignable_scopes | Scopes at which the role can be assigned. |
Enterprise scenario
A managed-services provider runs 40 customer subscriptions under one management group and must give its NOC team uniform break-fix rights without ever exposing access management or data deletion. They define a single KloudVin Platform Operator role from this module with assignable_scopes pinned to the management group, then call the module once and fan out azurerm_role_assignment resources per subscription using role_definition_resource_id. When a new Azure resource provider needs to be supported, one PR adds the action, bumps the module tag to v1.1.0, and the change rolls out everywhere on the next terraform apply — with the full diff captured in code review for the customer’s audit evidence.
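The scenario's "define once, fan out per subscription" pattern might look like the sketch below.

Assumptions:
- The management group name mg-customers, the group noc-operators, and the action list are all hypothetical.
- The data source's subscription_ids attribute is assumed to return bare GUIDs for subscriptions directly under that group.
- The azuread_group lookup needs the hashicorp/azuread provider.

```hcl
provider "azurerm" {
  features {}
}

# Hypothetical management group that holds the customer subscriptions.
data "azurerm_management_group" "customers" {
  name = "mg-customers"
}

# Hypothetical Entra ID group for the NOC team.
data "azuread_group" "noc" {
  display_name = "noc-operators"
}

# Define the role once, stored at and assignable under the management group.
# Bump the ref (for example to v1.1.0) when the role's action list changes.
module "platform_operator_role" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-custom-role-definition?ref=v1.0.0"

  name              = "KloudVin Platform Operator"
  description       = "Uniform NOC break-fix rights across customer subscriptions; no access management, no data deletion."
  assignable_scopes = [data.azurerm_management_group.customers.id]

  actions = [
    "Microsoft.Resources/subscriptions/resourceGroups/read",
    "Microsoft.Compute/virtualMachines/read",
    "Microsoft.Compute/virtualMachines/start/action",
    "Microsoft.Compute/virtualMachines/restart/action",
  ]
}

# One assignment per subscription directly under the management group.
resource "azurerm_role_assignment" "noc" {
  for_each = toset(data.azurerm_management_group.customers.subscription_ids)

  scope              = "/subscriptions/${each.value}"
  role_definition_id = module.platform_operator_role.role_definition_resource_id
  principal_id       = data.azuread_group.noc.object_id
}
```

Adding a customer subscription under the management group then produces one new assignment on the next apply. The role definition itself is left unchanged.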
Best practices
- Enumerate, don’t wildcard. Prefer explicit Actions over Microsoft.X/*. If you must wildcard, immediately carve out destructive and authorization operations in not_actions. Never grant Microsoft.Authorization/*/Write or .../elevateAccess/action in a custom role.
- Pin AssignableScopes to the narrowest container. Scope to a management group or single subscription, not the tenant root, so the role physically cannot be assigned where it doesn’t belong. This caps the blast radius of any future misassignment.
- Keep control-plane and data-plane separate. DataActions (read a blob, get a secret) and Actions (manage the vault) are different planes. Only add DataActions when the principal genuinely needs to touch data, and review them with extra scrutiny.
- Make the role ID stable. Let downstream assignments consume role_definition_resource_id rather than hardcoding a GUID. Only set role_definition_id explicitly when you need the same GUID across tenants. Recreating a role under a new GUID means every assignment pointing at the old one must be removed and re-created, because Azure will not delete a custom role that is still assigned.
- Name and describe for auditors. Use a consistent prefix (KloudVin <Resource> <Verb>) and put the least-privilege rationale plus a runbook/ticket reference in description. This is the first thing a security reviewer reads in az role definition list.
- Respect the limits and clean up. A tenant allows 5,000 custom roles. Retire unused ones and reuse a built-in role whenever it already matches, since every custom role is a permission surface you now own and must maintain.
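The "never grant Microsoft.Authorization writes or elevateAccess/action" rule can also be enforced mechanically. The sketch below is a hardened version of the actions variable.

- The extra validation is a hypothetical guardrail, not part of v1.0.0.
- It rejects "*" and any Microsoft.Authorization action ending in a wildcard, /write, /delete, or /action, case-insensitively.
- It does not catch broader patterns such as */write.

```hcl
variable "actions" {
  description = "Control-plane operations the role allows, e.g. Microsoft.Compute/virtualMachines/restart/action."
  type        = list(string)
  default     = []

  # Hypothetical guardrail: reject "*" and Microsoft.Authorization actions that
  # end in a wildcard, /write, /delete, or /action (covers elevateAccess/action).
  # (?i) makes the match case-insensitive; read-only entries such as
  # Microsoft.Authorization/*/read still pass.
  validation {
    condition = alltrue([
      for a in var.actions :
      !can(regex("(?i)^(\\*|Microsoft\\.Authorization/.*(\\*|/write|/delete|/action))$", a))
    ])
    error_message = "actions must not include \"*\" or Microsoft.Authorization write, delete, action, or wildcard operations."
  }
}
```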