Terraform Module: Azure Data Share — governed snapshot sharing in one block

Quick take — A reusable hashicorp/azurerm ~> 4.0 module for Azure Data Share: provisions a managed-identity share account and a snapshot/in-place share with a built-in sync schedule, ready for cross-tenant data collaboration. New here? Jump to the Quickstart below to deploy it in minutes; read on for how it works and when to reach for it.

Quickstart (copy-paste)

Minimal, runnable configuration — drop this in a .tf file and fill in the "..." placeholders (each required input is commented):

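provider "azurerm" {
  features {}
  subscription_id = "..."  # Required by azurerm 4.x; alternatively omit it and export ARM_SUBSCRIPTION_ID.
}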

module "data_share" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-data-share?ref=v1.0.0"

  account_name        = "..."  # Name of the Data Share account (3-90 chars, alphanumeri…
  resource_group_name = "..."  # Resource group that will hold the account.
  location            = "..."  # Azure region where Data Share is available.
  share_name          = "..."  # Name of the share inside the account.
}

Then run terraform init && terraform apply. Every other input has a sensible default; see Inputs below to override behaviour.

What this module is

Azure Data Share lets a provider organisation share data — typically Azure Storage blobs/files or Azure Data Explorer/SQL datasets — with one or more consumer organisations, without ever handing over storage keys, SAS tokens or VPN access. The provider creates a share account (which carries a system-assigned managed identity), defines a share (either snapshot-based copies or in-place references), attaches datasets, and optionally pins a synchronization schedule so consumers always receive fresh data. Consumers accept an invitation and map the incoming data into their own subscription. It is the Azure-native answer to “email me a CSV every night” and to fragile blob-SAS hand-offs.

Wrapping it in a Terraform module matters because the moving parts are easy to get subtly wrong by hand: the share account’s managed identity must be granted Storage Blob Data Reader on the source storage before datasets attach, the kind of the share (CopyBased vs InPlace) is immutable, and a Scheduled synchronization needs both a valid RFC-3339 start time and a recurrence that Azure accepts. This module fixes those defaults, validates the inputs that commonly break apply, and exposes the share account’s principal ID so the consumer-side RBAC and the provider-side dataset wiring can be driven downstream from a single, version-pinned source of truth.

When to use it

Skip it if a single internal team just needs blob access — a plain RBAC role assignment on the storage account is simpler and cheaper than standing up a share.
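For comparison, the plain RBAC alternative mentioned above can be sketched in a few lines. This is a minimal illustration, not part of the module: the variable analytics_team_object_id is hypothetical, and azurerm_storage_account.sales is assumed to be declared elsewhere in your configuration.

```hcl
# Hypothetical input: object ID of the directory group for the internal team.
variable "analytics_team_object_id" {
  type        = string
  description = "Object ID of the group that needs read access to the blobs."
}

# Direct read access on the storage account; no share account, share or
# invitation involved. azurerm_storage_account.sales is assumed to exist.
resource "azurerm_role_assignment" "team_blob_reader" {
  scope                = azurerm_storage_account.sales.id
  role_definition_name = "Storage Blob Data Reader"
  principal_id         = var.analytics_team_object_id
}
```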

Module structure

terraform-module-azure-data-share/
├── versions.tf      # provider + Terraform version pins
├── main.tf          # data_share_account + data_share (+ optional snapshot schedule and role assignment)
├── variables.tf     # var-driven inputs with validation
└── outputs.tf       # account id, share id, managed identity principal id

versions.tf

terraform {
  required_version = ">= 1.5.0"

  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 4.0"
    }
  }
}

main.tf

resource "azurerm_data_share_account" "this" {
  name                = var.account_name
  location            = var.location
  resource_group_name = var.resource_group_name

  identity {
    type = "SystemAssigned"
  }

  tags = var.tags
}

resource "azurerm_data_share" "this" {
  name       = var.share_name
  account_id = azurerm_data_share_account.this.id
  kind       = var.share_kind

  description = var.share_description
  terms       = var.share_terms

  # A snapshot schedule only applies to CopyBased shares. For InPlace shares
  # the block is omitted entirely (consumers read the live source).
  dynamic "snapshot_schedule" {
    for_each = var.share_kind == "CopyBased" && var.snapshot_schedule != null ? [var.snapshot_schedule] : []

    content {
      name       = snapshot_schedule.value.name
      recurrence = snapshot_schedule.value.recurrence
      start_time = snapshot_schedule.value.start_time
    }
  }
}

# Grant the share account's managed identity read access on the source
# storage account so it can enumerate and snapshot the data to be shared.
# Optional: skip it when the grant is managed elsewhere (e.g. PIM / a
# central RBAC module) by setting source_storage_account_id = null.
resource "azurerm_role_assignment" "share_reader" {
  count = var.source_storage_account_id == null ? 0 : 1

  scope                = var.source_storage_account_id
  role_definition_name = "Storage Blob Data Reader"
  principal_id         = azurerm_data_share_account.this.identity[0].principal_id
}

variables.tf

variable "account_name" {
  type        = string
  description = "Name of the Data Share account (3-90 chars, alphanumeric and hyphens)."

  validation {
    condition     = can(regex("^[a-zA-Z0-9][a-zA-Z0-9-]{1,88}[a-zA-Z0-9]$", var.account_name))
    error_message = "account_name must be 3-90 characters, alphanumeric or hyphen, and start/end alphanumeric."
  }
}

variable "resource_group_name" {
  type        = string
  description = "Resource group that will hold the Data Share account."
}

variable "location" {
  type        = string
  description = "Azure region for the Data Share account (must be a region where Data Share is available, e.g. eastus, westeurope, southeastasia)."
}

variable "share_name" {
  type        = string
  description = "Name of the share inside the account."

  validation {
    condition     = can(regex("^[a-zA-Z0-9][a-zA-Z0-9-]{0,89}$", var.share_name))
    error_message = "share_name must be 1-90 characters, alphanumeric or hyphen, and start alphanumeric."
  }
}

variable "share_kind" {
  type        = string
  default     = "CopyBased"
  description = "Share type: CopyBased (snapshot copies) or InPlace (live reference). Immutable after creation."

  validation {
    condition     = contains(["CopyBased", "InPlace"], var.share_kind)
    error_message = "share_kind must be either 'CopyBased' or 'InPlace'."
  }
}

variable "share_description" {
  type        = string
  default     = null
  description = "Human-readable description shown to consumers in the share invitation."
}

variable "share_terms" {
  type        = string
  default     = null
  description = "Terms of use the consumer must accept before receiving data (e.g. internal data-handling policy reference)."
}

variable "snapshot_schedule" {
  type = object({
    name       = string
    recurrence = string
    start_time = string
  })
  default     = null
  description = "Optional automatic snapshot schedule (CopyBased only). recurrence is 'Hour' or 'Day'; start_time is an RFC-3339 UTC timestamp."

  validation {
    condition = var.snapshot_schedule == null ? true : (
      contains(["Hour", "Day"], var.snapshot_schedule.recurrence) &&
      can(formatdate("YYYY-MM-DD'T'hh:mm:ss'Z'", var.snapshot_schedule.start_time))
    )
    error_message = "snapshot_schedule.recurrence must be 'Hour' or 'Day' and start_time must be an RFC-3339 UTC timestamp (e.g. 2026-06-10T00:00:00Z)."
  }
}

variable "source_storage_account_id" {
  type        = string
  default     = null
  description = "Resource ID of the source storage account to grant the share identity 'Storage Blob Data Reader' on. Set to null to manage this RBAC grant elsewhere."
}

variable "tags" {
  type        = map(string)
  default     = {}
  description = "Tags applied to the Data Share account."
}
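The validation rules above can be exercised without touching Azure by using Terraform's native test framework. The following is a sketch under stated assumptions: the file path tests/validation.tftest.hcl and all test values are hypothetical, and mock_provider requires Terraform 1.7 or later, which is newer than the module's >= 1.5.0 pin.

```hcl
# tests/validation.tftest.hcl (hypothetical path inside the module directory)
# Assumes Terraform >= 1.7 so the provider can be mocked and no credentials are needed.
mock_provider "azurerm" {}

# Baseline inputs shared by every run block; all values are illustrative.
variables {
  account_name        = "ds-test-account"
  resource_group_name = "rg-test"
  location            = "westeurope"
  share_name          = "testshare"
}

# An unsupported share kind should be rejected by the share_kind validation.
run "rejects_unknown_share_kind" {
  command = plan

  variables {
    share_kind = "Streaming"
  }

  expect_failures = [var.share_kind]
}

# A recurrence other than Hour or Day should be rejected by the
# snapshot_schedule validation.
run "rejects_unsupported_recurrence" {
  command = plan

  variables {
    snapshot_schedule = {
      name       = "weekly"
      recurrence = "Week"
      start_time = "2026-06-10T02:00:00Z"
    }
  }

  expect_failures = [var.snapshot_schedule]
}
```

Running terraform test from the module directory should execute both run blocks; each one passes only if the named variable's validation fails as expected.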

outputs.tf

output "account_id" {
  description = "Resource ID of the Data Share account."
  value       = azurerm_data_share_account.this.id
}

output "account_name" {
  description = "Name of the Data Share account."
  value       = azurerm_data_share_account.this.name
}

output "identity_principal_id" {
  description = "Object (principal) ID of the share account's system-assigned managed identity — use it for consumer-side or source-storage RBAC."
  value       = azurerm_data_share_account.this.identity[0].principal_id
}

output "identity_tenant_id" {
  description = "Tenant ID of the share account's system-assigned managed identity."
  value       = azurerm_data_share_account.this.identity[0].tenant_id
}

output "share_id" {
  description = "Resource ID of the share."
  value       = azurerm_data_share.this.id
}

output "share_name" {
  description = "Name of the share."
  value       = azurerm_data_share.this.name
}
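As an illustration of how these outputs might be consumed, the sketch below grants the share identity read access on a second source account from the calling configuration. It assumes the module is instantiated as module "data_share" (as in the Quickstart) and that azurerm_storage_account.archive exists; that storage account is hypothetical.

```hcl
# Sketch: an extra grant driven by the module's identity_principal_id output.
# azurerm_storage_account.archive is a hypothetical second source account.
resource "azurerm_role_assignment" "share_reader_archive" {
  scope                = azurerm_storage_account.archive.id
  role_definition_name = "Storage Blob Data Reader"
  principal_id         = module.data_share.identity_principal_id
}
```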

How to use it

module "data_share" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-data-share?ref=v1.0.0"

  account_name        = "ds-sales-prod"
  resource_group_name = azurerm_resource_group.data.name
  location            = azurerm_resource_group.data.location

  share_name        = "nightly-sales-extract"
  share_kind        = "CopyBased"
  share_description = "Daily sales extract shared with the EMEA partner tenant."
  share_terms       = "Subject to KloudVin internal data-handling policy DH-014."

  # Drive snapshots automatically at 02:00 UTC every day.
  snapshot_schedule = {
    name       = "daily-0200"
    recurrence = "Day"
    start_time = "2026-06-10T02:00:00Z"
  }

  # Let the module grant the share identity read access on the source storage.
  source_storage_account_id = azurerm_storage_account.sales.id

  tags = {
    environment = "prod"
    dataproduct = "sales"
  }
}

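# Subscription of the current provider context, used by the dataset below.
data "azurerm_subscription" "current" {}

# Downstream: attach a single-blob (file) dataset to the share created above,
# referencing the module's share_id output.
resource "azurerm_data_share_dataset_blob_storage" "sales_extract" {
  name           = "sales-extract-2026"
  data_share_id  = module.data_share.share_id
  container_name = "exports"
  file_path      = "sales/2026/extract.parquet"

  storage_account {
    name                = azurerm_storage_account.sales.name
    resource_group_name = azurerm_storage_account.sales.resource_group_name
    subscription_id     = data.azurerm_subscription.current.subscription_id
  }

  # Wait for the whole module, including its Storage Blob Data Reader grant,
  # so the dataset is not attached before the share identity can read the source.
  depends_on = [module.data_share]
}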
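If the partner should receive a whole folder rather than one file, the same resource type accepts folder_path in place of file_path. The variant below is a sketch: the resource label, dataset name and folder path are hypothetical, and it reads the subscription ID from azurerm_client_config so that it stands on its own.

```hcl
# Sketch only: names and paths here are illustrative, not part of the module.
data "azurerm_client_config" "current" {}

resource "azurerm_data_share_dataset_blob_storage" "sales_folder" {
  name           = "sales-folder-2026"
  data_share_id  = module.data_share.share_id
  container_name = "exports"
  folder_path    = "sales/2026" # share the folder instead of a single blob

  storage_account {
    name                = azurerm_storage_account.sales.name
    resource_group_name = azurerm_storage_account.sales.resource_group_name
    subscription_id     = data.azurerm_client_config.current.subscription_id
  }

  # Ensure the module's role assignment exists before the dataset attaches.
  depends_on = [module.data_share]
}
```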

With Terragrunt

Terragrunt keeps this module DRY across environments — define the backend and provider once in a root config, then a thin terragrunt.hcl per environment supplies only the inputs that differ.

1. Root config, live/terragrunt.hcl (inherited by every module):

remote_state {
  backend = "azurerm"
  generate = { path = "backend.tf", if_exists = "overwrite" }
  config = {
    # ...azurerm state storage account/container + key per path...
  }
}

2. Module config, live/prod/data_share/terragrunt.hcl:

include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-data-share?ref=v1.0.0"
}

inputs = {
  account_name        = "..."
  resource_group_name = "..."
  location            = "..."
  share_name          = "..."
}

3. Deploy one environment, or roll out all modules together:

cd live/prod/data_share && terragrunt apply   # this module only
cd .. && terragrunt run-all apply             # from live/prod: every module under live/prod

Why Terragrunt here: the backend and provider live in one place instead of being copy-pasted into every module; inputs is overridden per environment (dev / stage / prod) without forking the module; and run-all orchestrates dependencies across modules. Reach for it once you have more than one environment or more than a handful of modules — for a single stack, the plain Quickstart above is enough.
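The root config shown in step 1 covers only the backend. The provider half could be generated the same way; the block below is a sketch of one possible addition to live/terragrunt.hcl, assuming the subscription is supplied through the ARM_SUBSCRIPTION_ID environment variable rather than hard-coded.

```hcl
# Possible addition to live/terragrunt.hcl (sketch, not the author's config).
# Terragrunt writes provider.tf into each module's working directory.
generate "provider" {
  path      = "provider.tf"
  if_exists = "overwrite_terragrunt"
  contents  = <<EOF
provider "azurerm" {
  features {}
  # subscription_id is taken from ARM_SUBSCRIPTION_ID in this sketch.
}
EOF
}
```

With this in place, each per-environment terragrunt.hcl stays limited to its source and inputs.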

Inputs

| Name | Type | Default | Required | Description |
|---|---|---|---|---|
| account_name | string | (none) | Yes | Name of the Data Share account (3-90 chars, alphanumeric/hyphen). |
| resource_group_name | string | (none) | Yes | Resource group that will hold the account. |
| location | string | (none) | Yes | Azure region where Data Share is available. |
| share_name | string | (none) | Yes | Name of the share inside the account. |
| share_kind | string | "CopyBased" | No | CopyBased (snapshots) or InPlace (live). Immutable. |
| share_description | string | null | No | Description shown to consumers in the invitation. |
| share_terms | string | null | No | Terms the consumer must accept before receiving data. |
| snapshot_schedule | object({ name, recurrence, start_time }) | null | No | Auto-snapshot schedule (CopyBased only); recurrence Hour/Day, RFC-3339 UTC start. |
| source_storage_account_id | string | null | No | Source storage account ID to grant the share identity Storage Blob Data Reader; null to manage RBAC elsewhere. |
| tags | map(string) | {} | No | Tags applied to the account. |

Outputs

| Name | Description |
|---|---|
| account_id | Resource ID of the Data Share account. |
| account_name | Name of the Data Share account. |
| identity_principal_id | Object ID of the account’s system-assigned managed identity (for downstream RBAC). |
| identity_tenant_id | Tenant ID of the account’s managed identity. |
| share_id | Resource ID of the share. |
| share_name | Name of the share. |

Enterprise scenario

A retail group’s central analytics team publishes a nightly Parquet sales extract from its sales storage account to three regional franchise partners, each in a separate Azure AD tenant. The platform team stamps ds-sales-prod from this module per environment, lets the module grant the share’s managed identity Storage Blob Data Reader on the source storage, and pins a Day snapshot schedule at 02:00 UTC so every partner receives fresh data before business hours — with no SAS tokens, no key rotation, and a Git-auditable record of exactly which dataset is shared on what cadence.

Best practices
