Quick take: a reusable Terraform module (hashicorp/azurerm ~> 4.0) for Azure Data Share. It provisions a share account with a system-assigned managed identity and a snapshot-based (CopyBased) or in-place (InPlace) share with an optional snapshot schedule, ready for cross-tenant data collaboration. New here? Jump to the Quickstart below to deploy it in minutes; read on for how it works and when to reach for it.
Quickstart (copy-paste)
Minimal, runnable configuration — drop this in a .tf file and fill in the "..." placeholders (each required input is commented):
provider "azurerm" {
features {}
# azurerm 4.x needs a subscription ID: set subscription_id here or export ARM_SUBSCRIPTION_ID.
}
module "data_share" {
source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-data-share?ref=v1.0.0"
account_name = "..." # Name of the Data Share account (3-90 chars, alphanumeri…
resource_group_name = "..." # Resource group that will hold the account.
location = "..." # Azure region where Data Share is available.
share_name = "..." # Name of the share inside the account.
}
Then terraform init && terraform apply. Every other input has a sensible default — see Inputs below to override behaviour.
What this module is
Azure Data Share lets a provider organisation share data — typically Azure Storage blobs/files or Azure Data Explorer/SQL datasets — with one or more consumer organisations, without ever handing over storage keys, SAS tokens or VPN access. The provider creates a share account (which carries a system-assigned managed identity), defines a share (either snapshot-based copies or in-place references), attaches datasets, and optionally pins a synchronization schedule so consumers always receive fresh data. Consumers accept an invitation and map the incoming data into their own subscription. It is the Azure-native answer to “email me a CSV every night” and to fragile blob-SAS hand-offs.
Wrapping it in a Terraform module matters because the moving parts are easy to get subtly wrong by hand: the share account’s managed identity must be granted Storage Blob Data Reader on the source storage before datasets attach, the kind of the share (CopyBased vs InPlace) is immutable, and a Scheduled synchronization needs both a valid RFC-3339 start time and a recurrence that Azure accepts. This module fixes those defaults, validates the inputs that commonly break apply, and exposes the share account’s principal ID so the consumer-side RBAC and the provider-side dataset wiring can be driven downstream from a single, version-pinned source of truth.
When to use it
- You publish a recurring dataset (sales extracts, telemetry, reference data) to partners, subsidiaries or another business unit in a different Azure AD tenant and want zero credential sharing.
- You need an auditable, infrastructure-as-code record of who shares what, on what cadence — not a click-ops share configured in the portal.
- You are standardising a data-mesh / data-product pattern where each product team stamps out an identical share account + snapshot share per environment.
- You want the share’s managed identity captured as a Terraform output so consumer-subscription azurerm_role_assignments (or the source-storage grant) can reference it without copy-pasting an object ID.
Skip it if a single internal team just needs blob access — a plain RBAC role assignment on the storage account is simpler and cheaper than standing up a share.
Module structure
terraform-module-azure-data-share/
├── versions.tf # provider + Terraform version pins
├── main.tf # data_share_account + data_share (+ optional snapshot schedule and role assignment)
├── variables.tf # var-driven inputs with validation
└── outputs.tf # account id, share id, managed identity principal id
versions.tf
terraform {
required_version = ">= 1.5.0"
required_providers {
azurerm = {
source = "hashicorp/azurerm"
version = "~> 4.0"
}
}
}
main.tf
resource "azurerm_data_share_account" "this" {
name = var.account_name
location = var.location
resource_group_name = var.resource_group_name
identity {
type = "SystemAssigned"
}
tags = var.tags
}
resource "azurerm_data_share" "this" {
name = var.share_name
account_id = azurerm_data_share_account.this.id
kind = var.share_kind
description = var.share_description
terms = var.share_terms
# A snapshot schedule only applies to CopyBased shares. For InPlace shares
# the block is omitted entirely (consumers read the live source).
dynamic "snapshot_schedule" {
for_each = var.share_kind == "CopyBased" && var.snapshot_schedule != null ? [var.snapshot_schedule] : []
content {
name = snapshot_schedule.value.name
recurrence = snapshot_schedule.value.recurrence
start_time = snapshot_schedule.value.start_time
}
}
}
# Grant the share account's managed identity read access on the source
# storage account so it can enumerate and snapshot the data to be shared.
# Optional: skip it when the grant is managed elsewhere (e.g. PIM / a
# central RBAC module) by setting source_storage_account_id = null.
resource "azurerm_role_assignment" "share_reader" {
count = var.source_storage_account_id == null ? 0 : 1
scope = var.source_storage_account_id
role_definition_name = "Storage Blob Data Reader"
principal_id = azurerm_data_share_account.this.identity[0].principal_id
}
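One gap in the dynamic block above: a snapshot_schedule passed together with share_kind = "InPlace" is dropped without any message. A minimal sketch of one way to surface that mismatch, assuming the lifecycle precondition feature available since Terraform 1.2 (the module already requires 1.5.0). This is not part of the published module; the remaining arguments are elided and the error wording is illustrative:

```hcl
# Sketch only: the lifecycle block below would be added to the existing
# azurerm_data_share.this resource in main.tf. Other arguments
# (description, terms, the dynamic snapshot_schedule block) stay as they are.
resource "azurerm_data_share" "this" {
  name       = var.share_name
  account_id = azurerm_data_share_account.this.id
  kind       = var.share_kind

  lifecycle {
    # Fail at plan time instead of silently ignoring the schedule.
    precondition {
      condition     = var.share_kind == "CopyBased" || var.snapshot_schedule == null
      error_message = "snapshot_schedule is only supported when share_kind is \"CopyBased\"."
    }
  }
}
```

A precondition is used here rather than a variable validation because cross-variable references inside validation blocks need a newer Terraform release than the 1.5.0 floor pinned in versions.tf.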
variables.tf
variable "account_name" {
type = string
description = "Name of the Data Share account (3-90 chars, alphanumeric and hyphens)."
validation {
condition = can(regex("^[a-zA-Z0-9][a-zA-Z0-9-]{1,88}[a-zA-Z0-9]$", var.account_name))
error_message = "account_name must be 3-90 characters, alphanumeric or hyphen, and start/end alphanumeric."
}
}
variable "resource_group_name" {
type = string
description = "Resource group that will hold the Data Share account."
}
variable "location" {
type = string
description = "Azure region for the Data Share account (must be a region where Data Share is available, e.g. eastus, westeurope, southeastasia)."
}
variable "share_name" {
type = string
description = "Name of the share inside the account."
validation {
condition = can(regex("^[a-zA-Z0-9][a-zA-Z0-9-]{0,89}$", var.share_name))
error_message = "share_name must be 1-90 characters, alphanumeric or hyphen, and start alphanumeric."
}
}
variable "share_kind" {
type = string
default = "CopyBased"
description = "Share type: CopyBased (snapshot copies) or InPlace (live reference). Immutable after creation."
validation {
condition = contains(["CopyBased", "InPlace"], var.share_kind)
error_message = "share_kind must be either 'CopyBased' or 'InPlace'."
}
}
variable "share_description" {
type = string
default = null
description = "Human-readable description shown to consumers in the share invitation."
}
variable "share_terms" {
type = string
default = null
description = "Terms of use the consumer must accept before receiving data (e.g. internal data-handling policy reference)."
}
variable "snapshot_schedule" {
type = object({
name = string
recurrence = string
start_time = string
})
default = null
description = "Optional automatic snapshot schedule (CopyBased only). recurrence is 'Hour' or 'Day'; start_time is an RFC-3339 UTC timestamp."
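validation {
# formatdate accepts any RFC-3339 offset, so the trailing "Z" is checked separately to enforce UTC.
condition = var.snapshot_schedule == null ? true : (
contains(["Hour", "Day"], var.snapshot_schedule.recurrence) &&
can(formatdate("YYYY-MM-DD'T'hh:mm:ss'Z'", var.snapshot_schedule.start_time)) &&
endswith(var.snapshot_schedule.start_time, "Z")
)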
error_message = "snapshot_schedule.recurrence must be 'Hour' or 'Day' and start_time must be an RFC-3339 UTC timestamp (e.g. 2026-06-10T00:00:00Z)."
}
}
variable "source_storage_account_id" {
type = string
default = null
description = "Resource ID of the source storage account to grant the share identity 'Storage Blob Data Reader' on. Set to null to manage this RBAC grant elsewhere."
}
variable "tags" {
type = map(string)
default = {}
description = "Tags applied to the Data Share account."
}
outputs.tf
output "account_id" {
description = "Resource ID of the Data Share account."
value = azurerm_data_share_account.this.id
}
output "account_name" {
description = "Name of the Data Share account."
value = azurerm_data_share_account.this.name
}
output "identity_principal_id" {
description = "Object (principal) ID of the share account's system-assigned managed identity — use it for consumer-side or source-storage RBAC."
value = azurerm_data_share_account.this.identity[0].principal_id
}
output "identity_tenant_id" {
description = "Tenant ID of the share account's system-assigned managed identity."
value = azurerm_data_share_account.this.identity[0].tenant_id
}
output "share_id" {
description = "Resource ID of the share."
value = azurerm_data_share.this.id
}
output "share_name" {
description = "Name of the share."
value = azurerm_data_share.this.name
}
How to use it
module "data_share" {
source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-data-share?ref=v1.0.0"
account_name = "ds-sales-prod"
resource_group_name = azurerm_resource_group.data.name
location = azurerm_resource_group.data.location
share_name = "nightly-sales-extract"
share_kind = "CopyBased"
share_description = "Daily sales extract shared with the EMEA partner tenant."
share_terms = "Subject to KloudVin internal data-handling policy DH-014."
# Drive snapshots automatically at 02:00 UTC every day.
snapshot_schedule = {
name = "daily-0200"
recurrence = "Day"
start_time = "2026-06-10T02:00:00Z"
}
# Let the module grant the share identity read access on the source storage.
source_storage_account_id = azurerm_storage_account.sales.id
tags = {
environment = "prod"
dataproduct = "sales"
}
}
# Downstream: attach a single blob (file) dataset to the share created above,
# referencing the module's share_id output.
# The subscription lookup used by the storage_account block below.
data "azurerm_subscription" "current" {}
resource "azurerm_data_share_dataset_blob_storage" "sales_extract" {
name = "sales-extract-2026"
data_share_id = module.data_share.share_id
container_name = "exports"
storage_account {
name = azurerm_storage_account.sales.name
resource_group_name = azurerm_storage_account.sales.resource_group_name
subscription_id = data.azurerm_subscription.current.subscription_id
}
file_path = "sales/2026/extract.parquet"
# Wait for the whole module, including its role assignment, before attaching the dataset.
depends_on = [module.data_share]
}
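The identity_principal_id output can also drive grants that the module does not create itself, such as read access on a second source account. A minimal sketch, assuming a hypothetical azurerm_storage_account.reference declared elsewhere in the same configuration (that resource name is illustrative, not part of the module):

```hcl
# Sketch only: azurerm_storage_account.reference is a hypothetical second
# source account; only module.data_share comes from the example above.
resource "azurerm_role_assignment" "reference_data_reader" {
  scope                = azurerm_storage_account.reference.id
  role_definition_name = "Storage Blob Data Reader"

  # Object ID of the share account's system-assigned identity,
  # taken from the module output rather than copy-pasted.
  principal_id = module.data_share.identity_principal_id
}
```

Keeping the grant at storage-account scope mirrors what the module does for source_storage_account_id, so the share identity gets read access only where it is needed.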
With Terragrunt
Terragrunt keeps this module DRY across environments — define the backend and provider once in a root config, then a thin terragrunt.hcl per environment supplies only the inputs that differ.
1. Root config — live/terragrunt.hcl (inherited by every module):
remote_state {
backend = "azurerm"
generate = { path = "backend.tf", if_exists = "overwrite" }
config = {
# ...azurerm state bucket/container + key per path...
}
}
2. Module config — live/prod/data_share/terragrunt.hcl:
include "root" {
path = find_in_parent_folders()
}
terraform {
source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-data-share?ref=v1.0.0"
}
inputs = {
account_name = "..."
resource_group_name = "..."
location = "..."
share_name = "..."
}
3. Deploy one environment, or roll out all modules together:
cd live/prod/data_share && terragrunt apply # this module only
cd live/prod && terragrunt run-all apply # every module under live/prod (both commands start from the directory that contains live/)
Why Terragrunt here: the backend and provider live in one place instead of being copy-pasted into every module; inputs is overridden per environment (dev / stage / prod) without forking the module; and run-all orchestrates dependencies across modules. Reach for it once you have more than one environment or more than a handful of modules — for a single stack, the plain Quickstart above is enough.
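To make the dependency orchestration concrete, here is a sketch of the module config pulling its resource group from a sibling unit instead of hard-coding it. The ../resource_group path and its name and location outputs are assumptions for illustration; adjust them to whatever your own layout exposes:

```hcl
# live/prod/data_share/terragrunt.hcl (sketch)
include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-data-share?ref=v1.0.0"
}

# Hypothetical sibling unit that creates the resource group and exposes
# "name" and "location" outputs.
dependency "resource_group" {
  config_path = "../resource_group"

  # Placeholder values so validate/plan work before the dependency is applied.
  mock_outputs = {
    name     = "rg-mock"
    location = "westeurope"
  }
  mock_outputs_allowed_terraform_commands = ["validate", "plan"]
}

inputs = {
  account_name        = "..."
  resource_group_name = dependency.resource_group.outputs.name
  location            = dependency.resource_group.outputs.location
  share_name          = "..."
}
```

With the dependency declared, run-all applies the resource group unit before this one, and the mock outputs are only used for the commands listed.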
Inputs
| Name | Type | Default | Required | Description |
|---|---|---|---|---|
| account_name | string | n/a | Yes | Name of the Data Share account (3-90 chars, alphanumeric/hyphen). |
| resource_group_name | string | n/a | Yes | Resource group that will hold the account. |
| location | string | n/a | Yes | Azure region where Data Share is available. |
| share_name | string | n/a | Yes | Name of the share inside the account. |
| share_kind | string | "CopyBased" | No | CopyBased (snapshots) or InPlace (live). Immutable. |
| share_description | string | null | No | Description shown to consumers in the invitation. |
| share_terms | string | null | No | Terms the consumer must accept before receiving data. |
| snapshot_schedule | object({ name, recurrence, start_time }) | null | No | Auto-snapshot schedule (CopyBased only); recurrence Hour/Day, RFC-3339 UTC start. |
| source_storage_account_id | string | null | No | Source storage account ID to grant the share identity Storage Blob Data Reader; null to manage RBAC elsewhere. |
| tags | map(string) | {} | No | Tags applied to the account. |
Outputs
| Name | Description |
|---|---|
| account_id | Resource ID of the Data Share account. |
| account_name | Name of the Data Share account. |
| identity_principal_id | Object ID of the account’s system-assigned managed identity (for downstream RBAC). |
| identity_tenant_id | Tenant ID of the account’s managed identity. |
| share_id | Resource ID of the share. |
| share_name | Name of the share. |
Enterprise scenario
A retail group’s central analytics team publishes a nightly Parquet sales extract from its sales storage account to three regional franchise partners, each in a separate Azure AD tenant. The platform team stamps ds-sales-prod from this module per environment, lets the module grant the share’s managed identity Storage Blob Data Reader on the source storage, and pins a Day snapshot schedule at 02:00 UTC so every partner receives fresh data before business hours — with no SAS tokens, no key rotation, and a Git-auditable record of exactly which dataset is shared on what cadence.
Best practices
- Never share storage keys or SAS; let the managed identity do the reading. Grant the share account’s identity_principal_id exactly Storage Blob Data Reader (least privilege) on only the source storage account; avoid broad Contributor or subscription-level scopes.
- Choose kind deliberately, up front: it is immutable. CopyBased snapshots incur egress and storage transactions on every sync, so for large or rarely-changing datasets prefer InPlace (live reference) where the source type supports it, to cut cost; switching later forces a destroy/recreate of the share.
- Right-size the snapshot cadence for cost. Hour recurrence runs up to 24 times as many snapshots as Day, with snapshot and egress charges to match; schedule only as often as the consumer actually needs, and drop the schedule entirely for ad-hoc one-time shares.
- Pin start_time in UTC and in the future. Azure rejects a Scheduled synchronization whose start time is in the past; always use an RFC-3339 Z timestamp (the validation here checks the format, not whether the time is still in the future) and verify it against the current date before applying.
- Name for discoverability and tag for ownership. Use a consistent ds-<dataproduct>-<env> account convention and a self-describing share_name, and tag with dataproduct/environment so finance can attribute the (otherwise easy-to-miss) snapshot and egress spend.
- Region-co-locate provider data and the share account. Keep the Data Share account in the same region as the source storage to avoid cross-region snapshot egress and to satisfy data-residency constraints for the shared dataset.
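For the start_time practice above, a hard-coded timestamp eventually ends up in the past when a new environment is stamped out later. One possible sketch derives it at first apply instead. It assumes the hashicorp/time provider, which the module itself does not use; the resource and local names are illustrative, and the "..." inputs are the same placeholders as in the Quickstart:

```hcl
terraform {
  required_providers {
    # Assumption: the time provider is added to the calling configuration.
    time = {
      source = "hashicorp/time"
    }
  }
}

# Computed once at first apply and then kept in state, so the value does
# not change on later plans (unlike calling timestamp() directly).
resource "time_offset" "first_snapshot" {
  offset_days = 1
}

locals {
  # 02:00 UTC on the day after the first apply, as an RFC-3339 "Z" timestamp.
  snapshot_start_time = formatdate("YYYY-MM-DD'T'02:00:00'Z'", time_offset.first_snapshot.rfc3339)
}

module "data_share" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-data-share?ref=v1.0.0"

  account_name        = "..."
  resource_group_name = "..."
  location            = "..."
  share_name          = "..."

  snapshot_schedule = {
    name       = "daily-0200"
    recurrence = "Day"
    start_time = local.snapshot_start_time
  }
}
```

Because the offset is a full day and the time of day is then fixed at 02:00, the result lands roughly 2 to 24 hours after the first apply. The value is unknown during the first plan, so the module's format validation would only be evaluated once it is known; treat that behaviour as something to confirm in your own Terraform version.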