Terraform Module: Azure Data Share — governed snapshot sharing in one block

Quick take: a reusable module for hashicorp/azurerm ~> 4.0 that provisions an Azure Data Share account with a system-assigned managed identity and a CopyBased (snapshot) or InPlace share, plus an optional snapshot schedule for CopyBased shares, ready for cross-tenant data collaboration. New here? Jump to the Quickstart below to deploy it in minutes; read on for how it works and when to reach for it.

Quickstart (copy-paste)

Minimal, runnable configuration — drop this in a .tf file and fill in the "..." placeholders (each required input is commented):

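provider "azurerm" {
  features {}
  # azurerm 4.x also needs a subscription ID: set the ARM_SUBSCRIPTION_ID
  # environment variable or add subscription_id = "..." here.
}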

module "data_share" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-data-share?ref=v1.0.0"

  account_name        = "..."  # Name of the Data Share account (3-90 chars, alphanumeri…
  resource_group_name = "..."  # Resource group that will hold the account.
  location            = "..."  # Azure region where Data Share is available.
  share_name          = "..."  # Name of the share inside the account.
}

Then terraform init && terraform apply. Every other input has a sensible default — see Inputs below to override behaviour.

What this module is

Azure Data Share lets a provider organisation share data — typically Azure Storage blobs/files or Azure Data Explorer/SQL datasets — with one or more consumer organisations, without ever handing over storage keys, SAS tokens or VPN access. The provider creates a share account (which carries a system-assigned managed identity), defines a share (either snapshot-based copies or in-place references), attaches datasets, and optionally pins a synchronization schedule so consumers always receive fresh data. Consumers accept an invitation and map the incoming data into their own subscription. It is the Azure-native answer to “email me a CSV every night” and to fragile blob-SAS hand-offs.

Wrapping it in a Terraform module matters because the moving parts are easy to get subtly wrong by hand: the share account’s managed identity must be granted Storage Blob Data Reader on the source storage before datasets attach, the kind of the share (CopyBased vs InPlace) is immutable, and a Scheduled synchronization needs both a valid RFC-3339 start time and a recurrence that Azure accepts. This module fixes those defaults, validates the inputs that commonly break apply, and exposes the share account’s principal ID so the consumer-side RBAC and the provider-side dataset wiring can be driven downstream from a single, version-pinned source of truth.

When to use it

You publish a recurring dataset (sales extracts, telemetry, reference data) to partners, subsidiaries or another business unit in a different Azure AD tenant and want zero credential sharing.
You need an auditable, infrastructure-as-code record of who shares what, on what cadence — not a click-ops share configured in the portal.
You are standardising a data-mesh / data-product pattern where each product team stamps out an identical share account + snapshot share per environment.
You want the share’s managed identity captured as a Terraform output so consumer-subscription azurerm_role_assignment resources (or the source-storage grant) can reference it without copy-pasting an object ID.

Skip it if a single internal team just needs blob access — a plain RBAC role assignment on the storage account is simpler and cheaper than standing up a share.

Module structure

terraform-module-azure-data-share/
├── versions.tf      # provider + Terraform version pins
├── main.tf          # data_share_account + data_share (+ optional snapshot schedule and source-storage role assignment)
├── variables.tf     # var-driven inputs with validation
└── outputs.tf       # account id, share id, managed identity principal id

versions.tf

terraform {
  required_version = ">= 1.5.0"

  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 4.0"
    }
  }
}

main.tf

resource "azurerm_data_share_account" "this" {
  name                = var.account_name
  location            = var.location
  resource_group_name = var.resource_group_name

  identity {
    type = "SystemAssigned"
  }

  tags = var.tags
}

resource "azurerm_data_share" "this" {
  name       = var.share_name
  account_id = azurerm_data_share_account.this.id
  kind       = var.share_kind

  description = var.share_description
  terms       = var.share_terms

  # A snapshot schedule only applies to CopyBased shares. For InPlace shares
  # the block is omitted entirely (consumers read the live source).
  dynamic "snapshot_schedule" {
    for_each = var.share_kind == "CopyBased" && var.snapshot_schedule != null ? [var.snapshot_schedule] : []

    content {
      name       = snapshot_schedule.value.name
      recurrence = snapshot_schedule.value.recurrence
      start_time = snapshot_schedule.value.start_time
    }
  }
}

# Grant the share account's managed identity read access on the source
# storage account so it can enumerate and snapshot the data to be shared.
# Optional: skip it when the grant is managed elsewhere (e.g. PIM / a
# central RBAC module) by setting source_storage_account_id = null.
resource "azurerm_role_assignment" "share_reader" {
  count = var.source_storage_account_id == null ? 0 : 1

  scope                = var.source_storage_account_id
  role_definition_name = "Storage Blob Data Reader"
  principal_id         = azurerm_data_share_account.this.identity[0].principal_id
}
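
The dynamic block above silently drops a schedule when share_kind is InPlace. One way to surface that mismatch is a check block, available from Terraform 1.5, which matches the version pin in versions.tf. This is a minimal sketch and not part of the module as published; the block name is hypothetical.

```hcl
# Sketch only: reports a warning (it does not block apply) when a schedule
# is supplied for an InPlace share, where main.tf would otherwise ignore it.
check "snapshot_schedule_requires_copy_based" {
  assert {
    condition     = var.snapshot_schedule == null || var.share_kind == "CopyBased"
    error_message = "snapshot_schedule is ignored unless share_kind is 'CopyBased'."
  }
}
```

A check block was chosen over a variable validation here because validations that reference a second variable need a newer Terraform release than the module's >= 1.5.0 pin.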

variables.tf

variable "account_name" {
  type        = string
  description = "Name of the Data Share account (3-90 chars, alphanumeric and hyphens)."

  validation {
    condition     = can(regex("^[a-zA-Z0-9][a-zA-Z0-9-]{1,88}[a-zA-Z0-9]$", var.account_name))
    error_message = "account_name must be 3-90 characters, alphanumeric or hyphen, and start/end alphanumeric."
  }
}

variable "resource_group_name" {
  type        = string
  description = "Resource group that will hold the Data Share account."
}

variable "location" {
  type        = string
  description = "Azure region for the Data Share account (must be a region where Data Share is available, e.g. eastus, westeurope, southeastasia)."
}

variable "share_name" {
  type        = string
  description = "Name of the share inside the account."

  validation {
    condition     = can(regex("^[a-zA-Z0-9][a-zA-Z0-9-]{0,89}$", var.share_name))
    error_message = "share_name must be 1-90 characters, alphanumeric or hyphen, and start alphanumeric."
  }
}

variable "share_kind" {
  type        = string
  default     = "CopyBased"
  description = "Share type: CopyBased (snapshot copies) or InPlace (live reference). Immutable after creation."

  validation {
    condition     = contains(["CopyBased", "InPlace"], var.share_kind)
    error_message = "share_kind must be either 'CopyBased' or 'InPlace'."
  }
}

variable "share_description" {
  type        = string
  default     = null
  description = "Human-readable description shown to consumers in the share invitation."
}

variable "share_terms" {
  type        = string
  default     = null
  description = "Terms of use the consumer must accept before receiving data (e.g. internal data-handling policy reference)."
}

variable "snapshot_schedule" {
  type = object({
    name       = string
    recurrence = string
    start_time = string
  })
  default     = null
  description = "Optional automatic snapshot schedule (CopyBased only). recurrence is 'Hour' or 'Day'; start_time is an RFC-3339 UTC timestamp."

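  validation {
    # formatdate() fails on anything that is not a valid RFC 3339 timestamp;
    # the endswith() check additionally rejects non-UTC offsets such as +02:00.
    condition = var.snapshot_schedule == null ? true : (
      contains(["Hour", "Day"], var.snapshot_schedule.recurrence) &&
      can(formatdate("YYYY-MM-DD'T'hh:mm:ss'Z'", var.snapshot_schedule.start_time)) &&
      endswith(var.snapshot_schedule.start_time, "Z")
    )
    error_message = "snapshot_schedule.recurrence must be 'Hour' or 'Day' and start_time must be an RFC-3339 UTC timestamp (e.g. 2026-06-10T00:00:00Z)."
  }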
}

variable "source_storage_account_id" {
  type        = string
  default     = null
  description = "Resource ID of the source storage account to grant the share identity 'Storage Blob Data Reader' on. Set to null to manage this RBAC grant elsewhere."
}

variable "tags" {
  type        = map(string)
  default     = {}
  description = "Tags applied to the Data Share account."
}

outputs.tf

output "account_id" {
  description = "Resource ID of the Data Share account."
  value       = azurerm_data_share_account.this.id
}

output "account_name" {
  description = "Name of the Data Share account."
  value       = azurerm_data_share_account.this.name
}

output "identity_principal_id" {
  description = "Object (principal) ID of the share account's system-assigned managed identity — use it for consumer-side or source-storage RBAC."
  value       = azurerm_data_share_account.this.identity[0].principal_id
}

output "identity_tenant_id" {
  description = "Tenant ID of the share account's system-assigned managed identity."
  value       = azurerm_data_share_account.this.identity[0].tenant_id
}

output "share_id" {
  description = "Resource ID of the share."
  value       = azurerm_data_share.this.id
}

output "share_name" {
  description = "Name of the share."
  value       = azurerm_data_share.this.name
}
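
As a sketch of how identity_principal_id might be consumed when source_storage_account_id is left null, the calling configuration can create the grant itself. The module label data_share and the azurerm_storage_account.sales reference are assumptions borrowed from the usage example in this article; adjust them to your own configuration.

```hcl
# Sketch only: grant the share identity read access from the calling
# configuration instead of inside the module.
# Assumes module "data_share" was called with source_storage_account_id = null
# and that azurerm_storage_account.sales exists in this configuration.
resource "azurerm_role_assignment" "share_reader_external" {
  scope                = azurerm_storage_account.sales.id
  role_definition_name = "Storage Blob Data Reader"
  principal_id         = module.data_share.identity_principal_id
}
```

Any dataset resource attached to the share should then depend on this role assignment, since the grant has to exist before datasets attach.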

How to use it

module "data_share" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-data-share?ref=v1.0.0"

  account_name        = "ds-sales-prod"
  resource_group_name = azurerm_resource_group.data.name
  location            = azurerm_resource_group.data.location

  share_name        = "nightly-sales-extract"
  share_kind        = "CopyBased"
  share_description = "Daily sales extract shared with the EMEA partner tenant."
  share_terms       = "Subject to KloudVin internal data-handling policy DH-014."

  # Drive snapshots automatically at 02:00 UTC every day.
  snapshot_schedule = {
    name       = "daily-0200"
    recurrence = "Day"
    start_time = "2026-06-10T02:00:00Z"
  }

  # Let the module grant the share identity read access on the source storage.
  source_storage_account_id = azurerm_storage_account.sales.id

  tags = {
    environment = "prod"
    dataproduct = "sales"
  }
}

# Downstream: attach a single-file blob dataset to the share created above,
# referencing the module's share_id output.
data "azurerm_subscription" "current" {}

resource "azurerm_data_share_dataset_blob_storage" "sales_extract" {
  name           = "sales-extract-2026"
  data_share_id  = module.data_share.share_id
  container_name = "exports"
  file_path      = "sales/2026/extract.parquet"

  storage_account {
    name                = azurerm_storage_account.sales.name
    resource_group_name = azurerm_storage_account.sales.resource_group_name
    subscription_id     = data.azurerm_subscription.current.subscription_id
  }

  # Wait for the module's role assignment, not just the share, before attaching.
  depends_on = [module.data_share]
}

With Terragrunt

Terragrunt keeps this module DRY across environments — define the backend and provider once in a root config, then a thin terragrunt.hcl per environment supplies only the inputs that differ.

1. Root config — live/terragrunt.hcl (inherited by every module):

remote_state {
  backend = "azurerm"
  generate = { path = "backend.tf", if_exists = "overwrite" }
  config = {
    # ...azurerm state bucket/container + key per path...
  }
}
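
The root config above covers only the backend. Because Terragrunt runs the module source directly, the provider also has to come from somewhere; a generate block in the same root file is one common way to do that. This is a minimal sketch, and the file name provider.tf is an arbitrary choice.

```hcl
# live/terragrunt.hcl (continued). Sketch only: writes provider.tf into each
# module's working directory so the azurerm provider is configured once.
generate "provider" {
  path      = "provider.tf"
  if_exists = "overwrite_terragrunt"
  contents  = <<EOF
provider "azurerm" {
  features {}
  # azurerm 4.x needs a subscription ID: set ARM_SUBSCRIPTION_ID in the
  # environment or add subscription_id here.
}
EOF
}
```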

2. Module config — live/prod/data_share/terragrunt.hcl:

include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-data-share?ref=v1.0.0"
}

inputs = {
  account_name        = "..."
  resource_group_name = "..."
  location            = "..."
  share_name          = "..."
}

3. Deploy one environment, or roll out all modules together:

# Both commands are run from the repository root.
(cd live/prod/data_share && terragrunt apply)   # this module only
(cd live/prod && terragrunt run-all apply)      # every module under live/prod

Why Terragrunt here: the backend and provider live in one place instead of being copy-pasted into every module; inputs is overridden per environment (dev / stage / prod) without forking the module; and run-all orchestrates dependencies across modules. Reach for it once you have more than one environment or more than a handful of modules — for a single stack, the plain Quickstart above is enough.
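
The cross-module ordering mentioned above is usually expressed with a dependency block. The sketch below assumes a hypothetical sibling unit at ../storage that exposes an output named storage_account_id; neither exists in this article, so treat both names as placeholders.

```hcl
# live/prod/data_share/terragrunt.hcl (sketch).
# Assumes a hypothetical sibling unit at ../storage with an output
# named storage_account_id.
dependency "storage" {
  config_path = "../storage"
}

inputs = {
  # ...the required inputs shown earlier...
  source_storage_account_id = dependency.storage.outputs.storage_account_id
}
```

With this in place, run-all applies the storage unit first and passes its output into this module, so the role assignment targets a storage account that already exists.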

Inputs

Name	Type	Default	Required	Description
`account_name`	`string`	—	Yes	Name of the Data Share account (3-90 chars, alphanumeric/hyphen).
`resource_group_name`	`string`	—	Yes	Resource group that will hold the account.
`location`	`string`	—	Yes	Azure region where Data Share is available.
`share_name`	`string`	—	Yes	Name of the share inside the account.
`share_kind`	`string`	`"CopyBased"`	No	`CopyBased` (snapshots) or `InPlace` (live). Immutable.
`share_description`	`string`	`null`	No	Description shown to consumers in the invitation.
`share_terms`	`string`	`null`	No	Terms the consumer must accept before receiving data.
`snapshot_schedule`	`object({ name, recurrence, start_time })`	`null`	No	Auto-snapshot schedule (CopyBased only); recurrence `Hour`/`Day`, RFC-3339 UTC start.
`source_storage_account_id`	`string`	`null`	No	Source storage account ID to grant the share identity `Storage Blob Data Reader`; `null` to manage RBAC elsewhere.
`tags`	`map(string)`	`{}`	No	Tags applied to the account.

Outputs

Name	Description
`account_id`	Resource ID of the Data Share account.
`account_name`	Name of the Data Share account.
`identity_principal_id`	Object ID of the account’s system-assigned managed identity (for downstream RBAC).
`identity_tenant_id`	Tenant ID of the account’s managed identity.
`share_id`	Resource ID of the share.
`share_name`	Name of the share.

Enterprise scenario

A retail group’s central analytics team publishes a nightly Parquet sales extract from its sales storage account to three regional franchise partners, each in a separate Azure AD tenant. The platform team stamps ds-sales-prod from this module per environment, lets the module grant the share’s managed identity Storage Blob Data Reader on the source storage, and pins a Day snapshot schedule at 02:00 UTC so every partner receives fresh data before business hours — with no SAS tokens, no key rotation, and a Git-auditable record of exactly which dataset is shared on what cadence.

Best practices

Never share storage keys or SAS — let the managed identity do the reading. Grant the share account’s identity_principal_id exactly Storage Blob Data Reader (least privilege) on only the source storage account; avoid broad Contributor or subscription-level scopes.
Choose kind deliberately, up front, because it is immutable. CopyBased snapshots incur egress and storage transactions on every sync, so for large or rarely-changing datasets consider InPlace (live reference) to cut cost, provided the source type supports in-place sharing (Azure Data Explorer, for example); switching later forces a destroy/recreate of the share.
Right-size the snapshot cadence for cost. Hour recurrence triggers up to 24 times as many snapshot runs as Day, and snapshot and egress charges grow with it; schedule only as often as the consumer actually needs, and drop the schedule entirely for ad-hoc one-time shares.
Pin start_time in UTC and in the future. Azure rejects a Scheduled synchronization whose start time is in the past; always use an RFC-3339 Z timestamp (the validation here enforces the format) and check it against the current UTC time before applying.
Name for discoverability and tag for ownership. Use a consistent ds-<dataproduct>-<env> account convention and a self-describing share_name, and tag with dataproduct/environment so finance can attribute the (otherwise easy-to-miss) snapshot and egress spend.
Co-locate the share account with the provider data. Keep the Data Share account in the same region as the source storage to avoid cross-region snapshot egress and to satisfy data-residency constraints for the shared dataset.
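
For the start_time practice above, one possible way to avoid hard-coding a date that eventually falls into the past is to derive it from the hashicorp/time provider. This is an untested sketch: the time provider is not a dependency of the module, and the resource and local names are hypothetical.

```hcl
terraform {
  required_providers {
    time = {
      source = "hashicorp/time"
    }
  }
}

# Recorded once when the resource is created and then kept in state,
# so the value does not change on every plan the way timestamp() would.
resource "time_offset" "schedule_start" {
  offset_days = 1
}

locals {
  # Keep the date one day after creation and pin the time of day to 02:00 UTC.
  snapshot_start_time = formatdate("YYYY-MM-DD'T02:00:00Z'", time_offset.schedule_start.rfc3339)
}
```

The local can then be passed as start_time inside the snapshot_schedule input, for example start_time = local.snapshot_start_time. Because the value is fixed at first apply, it only guarantees a future start time for the initial creation of the schedule.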