Azure Arc-Enabled Kubernetes: GitOps, Policy, and Fleet Governance for Hybrid Clusters

A platform team running EKS in one account, GKE in another, and three on-prem clusters in a colo does not have a Kubernetes problem. It has a governance problem: no single place to assert “every cluster runs this GitOps config, denies privileged pods, ships logs to one workspace, and is reachable for debugging without poking inbound holes in five firewalls.” Azure Arc-enabled Kubernetes projects any conformant cluster into Azure Resource Manager as a Microsoft.Kubernetes/connectedClusters resource, so the same management-group hierarchy, Azure Policy assignments, and RBAC you already use for native Azure resources now reach the cluster.

This walkthrough onboards a non-Azure cluster, then layers the four controls that actually matter at fleet scale: Flux v2 GitOps for desired-state config, Azure Policy (Gatekeeper) for admission guardrails, cluster connect for kubectl without inbound firewall changes, and Container Insights plus workload identity for observability and secretless Key Vault access. Throughout, I assume you have cluster-admin on the target cluster and Owner (or sufficient RBAC) on the Azure side.

Arc projects the cluster; it does not run it. The control plane, scheduler, and your nodes stay exactly where they are. Arc adds a set of agents that maintain an outbound connection to Azure and reconcile ARM intent into the cluster. If Azure is unreachable, the cluster keeps serving traffic.

1. Agent architecture, connectivity, and outbound requirements

az connectedk8s connect installs a Helm release into the azure-arc namespace. The agents are all-outbound by design - there is no inbound listener Azure dials into. The ones you will care about:

Agent	Role
`clusterconnect-agent`	Reverse proxy that brokers the cluster connect channel (kubectl from anywhere)
`kube-aad-proxy`	Performs Microsoft Entra auth on incoming cluster-connect requests, then impersonates the user against `kube-apiserver`
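`config-agent`	Watches the cluster for source-control configuration resources and reports their compliance state (Flux v2 `fluxConfigurations` are reconciled by `fluxconfig-agent`, which the Flux extension installs)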
`extension-manager`	Installs and lifecycles cluster extensions (Flux, Policy, Monitor, Key Vault)
`clusteridentityoperator`	Maintains the cluster’s managed identity certificate (MSI) used to auth to Azure
`resource-sync-agent`	Syncs cluster inventory back to the ARM resource

Every agent talks outbound over HTTPS on port 443, with websocket upgrades on some channels. The non-obvious requirement is *.servicebus.windows.net with websockets enabled on your proxy/firewall: cluster connect rides Azure Relay over that endpoint, and a Layer-7 proxy that blocks websocket upgrades will let onboarding succeed but break kubectl-over-Arc later. Other required FQDNs include management.azure.com, login.microsoftonline.com, *.dp.kubernetesconfiguration.azure.com, mcr.microsoft.com (agent images), and guestnotificationservice.azure.com. The Service Bus wildcard stands for region-specific endpoints; to allowlist concrete names instead, fetch them per region with:

# Region-specific allowlist to replace the *.servicebus.windows.net wildcard
curl "https://guestnotificationservice.azure.com/urls/allowlist?api-version=2020-01-01&location=eastus"
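
Before onboarding, it can be worth checking your firewall's allow rules against those FQDNs. The Python sketch below is a hypothetical preflight helper, not an Arc tool: the concrete hostnames in `REQUIRED_HOSTS` are placeholders (the regional data-plane and Service Bus prefixes are invented; use the allowlist API output for real values), and it assumes a rule like `*.servicebus.windows.net` matches any subdomain depth but not the bare domain, which is common but varies by firewall product. It cannot tell you whether the proxy permits websocket upgrades; only an end-to-end `az connectedk8s proxy` test proves that.

```python
# Hypothetical egress preflight: do the firewall's allow rules cover Arc's endpoints?
# Assumed wildcard semantics: "*.suffix" matches any subdomain of suffix at any
# depth, but not the bare suffix. Check your firewall's documentation.


def rule_matches(rule: str, host: str) -> bool:
    """True if a single allow rule (exact FQDN or "*." wildcard) covers host."""
    rule, host = rule.lower().rstrip("."), host.lower().rstrip(".")
    if rule.startswith("*."):
        return host.endswith(rule[1:])  # rule[1:] keeps the leading "."
    return host == rule


def uncovered(hosts: list[str], rules: list[str]) -> list[str]:
    """Hosts that no allow rule covers, in input order."""
    return [h for h in hosts if not any(rule_matches(r, h) for r in rules)]


# Placeholder hostnames for the endpoints named in the text; the regional
# "eastus" and Service Bus "example-relay" prefixes are invented.
REQUIRED_HOSTS = [
    "management.azure.com",
    "login.microsoftonline.com",
    "eastus.dp.kubernetesconfiguration.azure.com",
    "mcr.microsoft.com",
    "guestnotificationservice.azure.com",
    "example-relay.servicebus.windows.net",
]

if __name__ == "__main__":
    firewall_rules = [
        "management.azure.com",
        "login.microsoftonline.com",
        "*.dp.kubernetesconfiguration.azure.com",
        "mcr.microsoft.com",
        "guestnotificationservice.azure.com",
    ]  # no Service Bus rule yet
    print(uncovered(REQUIRED_HOSTS, firewall_rules))  # ['example-relay.servicebus.windows.net']
```

Keeping the check as plain data (hosts and rules) makes it easy to diff against whatever your firewall team exports.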

There is no “Azure-initiated inbound” connectivity mode for Arc Kubernetes - it is outbound-only, which is precisely why it fits locked-down on-prem and multi-cloud egress postures. If you sit behind a proxy, pass it at connect time rather than relying on env vars alone (covered below).

2. Onboard an on-prem or EKS/GKE cluster

Point your kubeconfig at the target cluster (kubectl config use-context my-eks), then prep the Azure side: add the connectedk8s CLI extension and register the resource providers (once per subscription):

az extension add --name connectedk8s

az provider register --namespace Microsoft.Kubernetes
az provider register --namespace Microsoft.KubernetesConfiguration
az provider register --namespace Microsoft.ExtendedLocation

# Registration can take ~10 min; gate on it
az provider show -n Microsoft.Kubernetes --query registrationState -o tsv
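
If you script onboarding, the "gate on it" step can be a polling loop. A minimal sketch, assuming `az` is on the PATH and already logged in, and that `Registered` is the terminal state to wait for. `wait_until_registered` and `az_registration_state` are hypothetical helper names, and the state reader is injectable so the loop can be exercised without Azure.

```python
import subprocess
import time
from typing import Callable


def az_registration_state(namespace: str) -> str:
    """Run the az query shown above; returns e.g. 'Registering' or 'Registered'."""
    result = subprocess.run(
        ["az", "provider", "show", "-n", namespace,
         "--query", "registrationState", "-o", "tsv"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def wait_until_registered(
    namespaces: list[str],
    get_state: Callable[[str], str] = az_registration_state,
    timeout_s: float = 900.0,
    interval_s: float = 15.0,
) -> None:
    """Block until every namespace reports 'Registered', or raise TimeoutError."""
    deadline = time.monotonic() + timeout_s
    pending = list(namespaces)
    while True:
        # Only re-query providers that have not finished yet.
        pending = [ns for ns in pending if get_state(ns) != "Registered"]
        if not pending:
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still not registered: {pending}")
        time.sleep(interval_s)
```

For example, `wait_until_registered(["Microsoft.Kubernetes", "Microsoft.KubernetesConfiguration", "Microsoft.ExtendedLocation"])` blocks until all three providers from the commands above report `Registered`, or raises after 15 minutes (an arbitrary default).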

Create a resource group to hold the connected-cluster resources, then connect:

export RESOURCE_GROUP=rg-arc-fleet
export LOCATION=eastus
export CLUSTER_NAME=eks-prod-use1

az group create --name $RESOURCE_GROUP --location $LOCATION -o table

# Uses the CURRENT kubeconfig context to deploy the Arc agents
az connectedk8s connect \
  --name $CLUSTER_NAME \
  --resource-group $RESOURCE_GROUP \
  --location $LOCATION

connect installs its own Helm v3 into ~/.azure (it never touches a Helm you already have) and deploys the agents. If the cluster egresses through a proxy, do not rely on HTTP_PROXY alone: pass the proxy settings to connect so the in-cluster agents inherit them. Always include the cluster’s service CIDR in --proxy-skip-range, or in-cluster service-to-service calls will be wrongly sent to the proxy:

az connectedk8s connect \
  --name $CLUSTER_NAME \
  --resource-group $RESOURCE_GROUP \
  --location $LOCATION \
  --proxy-https https://proxy.corp.local:8080 \
  --proxy-http  http://proxy.corp.local:8080 \
  --proxy-skip-range 10.0.0.0/16,kubernetes.default.svc,.svc.cluster.local,.svc \
  --proxy-cert /etc/ssl/certs/corp-root.crt

--proxy-cert is only for injecting a trusted root the proxy presents; it is not required just to use a proxy. The three flags most environments actually need are --proxy-http, --proxy-https, and --proxy-skip-range.
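
To see why the service CIDR matters, here is a simplified model of how a proxy-aware client decides whether to bypass the proxy. It is loosely based on Go's documented NO_PROXY rules; the Arc agents' exact matching logic is not shown in this article, so treat the details as an assumption: a CIDR entry matches IP literals inside the range, a bare domain matches itself and its subdomains, and a leading-dot entry matches subdomains only. `bypasses_proxy` is a hypothetical helper, and 10.0.12.7 is an invented ClusterIP.

```python
import ipaddress


def bypasses_proxy(host: str, skip_range: str) -> bool:
    """Would a request to `host` skip the proxy, given a --proxy-skip-range value?

    Simplified NO_PROXY-style matching (ports and "*" are ignored):
      - CIDR or IP entries match IP-literal hosts inside that range
      - "name.example" matches itself and any subdomain
      - ".example" matches subdomains only
    """
    host = host.lower().rstrip(".")
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        ip = None  # a DNS name, not an IP literal

    for entry in (e.strip().lower() for e in skip_range.split(",")):
        if not entry:
            continue
        if ip is not None:
            try:
                if ip in ipaddress.ip_network(entry, strict=False):
                    return True
            except ValueError:
                pass  # domain entries never match IP literals
        elif entry.startswith("."):
            if host.endswith(entry):
                return True
        elif host == entry or host.endswith("." + entry):
            return True
    return False


SKIP = "10.0.0.0/16,kubernetes.default.svc,.svc.cluster.local,.svc"

if __name__ == "__main__":
    for h in ["10.0.12.7", "my-api.prod.svc.cluster.local", "management.azure.com"]:
        print(h, bypasses_proxy(h, SKIP))
```

With the skip range from the command above, the ClusterIP is dialled directly; drop the 10.0.0.0/16 entry and the same address falls through to the proxy, which is exactly the misrouting described earlier. DNS-name entries alone cannot save you, because many clients connect to service IPs directly.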

3. Configure Flux v2 GitOps via the Arc extension

Arc’s GitOps is Flux v2 delivered as the microsoft.flux cluster extension (it installs fluxconfig-agent and fluxconfig-controller alongside the upstream source/kustomize/helm controllers). You rarely install the extension by hand: creating your first Flux configuration with the CLI installs it automatically. Register the configuration with az k8s-configuration flux create, scoped at the cluster level, with one or more Kustomizations:

# Needs the k8s-configuration CLI extension
az extension add --name k8s-configuration

az k8s-configuration flux create \
  --name fleet-baseline \
  --cluster-name $CLUSTER_NAME \
  --resource-group $RESOURCE_GROUP \
  --cluster-type connectedClusters \
  --namespace cluster-config \
  --scope cluster \
  --url https://github.com/acme-platform/fleet-gitops \
  --branch main \
  --kustomization name=infra path=./infrastructure prune=true \
  --kustomization name=apps  path=./apps/prod prune=true dependsOn=["infra"]

The mechanics worth internalising:

--scope cluster lets the Kustomizations create cluster-scoped objects (CRDs, namespaces, ClusterRoles). Use --scope namespace for tenant-confined configs that may only touch their own namespace.
prune=true is non-negotiable for real GitOps: delete a manifest from Git and Flux garbage-collects the object from the cluster. Without it, Git stops being the source of truth.
dependsOn orders reconciliation - apps waits for infra to go Ready, so your ingress controller and CRDs land before the workloads that need them.
The same command works against AKS by passing --cluster-type managedClusters. That symmetry is the whole point: one Git repo, one CLI, identical config across Arc and AKS.
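
The dependsOn and prune semantics can be modelled in a few lines. This is a sketch of the ordering and garbage-collection rules described above, not Flux's implementation: Flux reconciles Kustomizations as independent objects and holds a dependent until its dependencies report Ready, whereas here Python's `graphlib` computes a static order (and raises `CycleError` where Flux would simply leave both sides waiting). Object identifiers like `Deployment/prod/web` are illustrative.

```python
from graphlib import CycleError, TopologicalSorter


def reconcile_order(depends_on: dict[str, list[str]]) -> list[str]:
    """Order Kustomizations so each one follows everything in its dependsOn list.

    Raises graphlib.CycleError if the dependencies form a cycle.
    """
    return list(TopologicalSorter(depends_on).static_order())


def objects_to_prune(last_applied: set[str], rendered_from_git: set[str]) -> set[str]:
    """With prune=true: objects applied last time that Git no longer renders."""
    return last_applied - rendered_from_git


if __name__ == "__main__":
    # Mirrors the two --kustomization flags above: apps dependsOn infra.
    print(reconcile_order({"infra": [], "apps": ["infra"]}))  # ['infra', 'apps']
    try:
        reconcile_order({"a": ["b"], "b": ["a"]})
    except CycleError as err:
        print("cycle detected:", err)
    # A Service deleted from Git is garbage-collected on the next reconcile.
    print(objects_to_prune({"Deployment/prod/web", "Service/prod/web"},
                           {"Deployment/prod/web"}))  # {'Service/prod/web'}
```

The prune helper also shows why prune=false drifts: without the set difference, objects removed from Git simply stay in the cluster.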

Reading a public Git repo needs no credentials or managed identity: the source controller pulls it directly. For private repos, pass --https-user/--https-key (a PAT) or SSH key material (--ssh-private-key-file); for Azure-hosted sources with workload identity, see the federation model in section 7.

4. Apply Azure Policy (Gatekeeper) at fleet scope

Azure Policy for Kubernetes extends Gatekeeper v3 (the OPA admission webhook) so you can author guardrails once in ARM and enforce them as in-cluster admission decisions across the fleet. Install the extension per cluster, then assign initiatives at a scope that covers many clusters.

az provider register --namespace Microsoft.PolicyInsights

az k8s-extension create \
  --cluster-type connectedClusters \
  --cluster-name $CLUSTER_NAME \
  --resource-group $RESOURCE_GROUP \
  --extension-type Microsoft.PolicyInsights \
  --name azurepolicy

Now assign a built-in initiative. The Pod Security baseline standards for Linux workloads initiative (a8640138-9b0a-4a28-b8cb-1666c838647d) bundles the deny rules most teams want - no privileged containers, no host namespaces, no hostPath, drop dangerous capabilities. Assign it at a management group so it lands on every connected cluster underneath, and exclude the system namespaces (otherwise you will block Arc’s own agents):

az policy assignment create \
  --name "psp-baseline-fleet" \
  --display-name "Pod Security baseline - Arc fleet" \
  --policy-set-definition "a8640138-9b0a-4a28-b8cb-1666c838647d" \
  --scope "/providers/Microsoft.Management/managementGroups/mg-arc-prod" \
  --params '{
    "effect": { "value": "deny" },
    "excludedNamespaces": { "value": ["kube-system","gatekeeper-system","azure-arc"] }
  }'

Two operational realities to respect:

Roll out in audit before deny. Set effect to audit, watch the compliance results in Azure Policy for a week, fix the violators, then flip to deny. Flipping straight to deny on a brownfield cluster will reject existing Deployments on their next rollout and page you at 02:00.
Constraints are pulled, not instant. The add-on syncs assignments roughly every 15 minutes and writes Gatekeeper Constraint objects whose names start with azurepolicy-. Inspect them in-cluster with kubectl get constrainttemplates and kubectl get constraints.
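
To make the effect and exclusion semantics concrete, here is a toy model of one rule from the baseline initiative ("no privileged containers"). It is a deliberate simplification: the real decision is made by Gatekeeper evaluating the synced constraint's Rego, and audit findings come from Gatekeeper's periodic audit rather than from the admission call. `privileged_rule` and `Decision` are hypothetical names; the excluded namespaces are the ones from the assignment above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    admitted: bool   # does the API server accept the pod?
    violates: bool   # does the pod break the rule?


def privileged_rule(namespace: str, privileged: bool, effect: str,
                    excluded_namespaces: set[str]) -> Decision:
    """Toy model of one baseline rule: "no privileged containers"."""
    if effect not in {"audit", "deny", "disabled"}:
        raise ValueError(f"unknown effect: {effect!r}")
    if effect == "disabled" or namespace in excluded_namespaces or not privileged:
        return Decision(admitted=True, violates=False)
    if effect == "deny":
        return Decision(admitted=False, violates=True)   # rejected at admission
    return Decision(admitted=True, violates=True)        # audit: runs, but reported


EXCLUDED = {"kube-system", "gatekeeper-system", "azure-arc"}

if __name__ == "__main__":
    for ns, effect in [("prod", "audit"), ("prod", "deny"), ("azure-arc", "deny")]:
        print(ns, effect, privileged_rule(ns, True, effect, EXCLUDED))
```

The audit row is the whole argument for audit-before-deny: the same violators are identified, but nothing is rejected while you fix them.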

For org-specific rules beyond the built-ins (e.g. “all images must come from acme.azurecr.io”), author a custom constraint template + Rego and ship it as a custom policy definition - same assignment model, same fleet scope.

5. Cluster connect: kubectl without inbound firewall changes

This is the feature that wins over on-prem teams. The clusterconnect-agent holds an outbound channel open; az connectedk8s proxy uses your Azure token to open a local proxy and writes a kubeconfig that targets it. No inbound port, no VPN, no bastion.

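First grant access. With Azure RBAC enabled on the connected cluster (az connectedk8s enable-features -n $CLUSTER_NAME -g $RESOURCE_GROUP --features azure-rbac), assign the user or group a built-in role at the cluster scope; no ClusterRoleBinding is required: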

ARM_ID=$(az connectedk8s show -n $CLUSTER_NAME -g $RESOURCE_GROUP --query id -o tsv)
AAD_ID=$(az ad signed-in-user show --query id -o tsv)

# "Cluster User Role" grants the cluster-connect channel; "Viewer/Writer" grants in-cluster RBAC
az role assignment create --role "Azure Arc Enabled Kubernetes Cluster User Role" --assignee $AAD_ID --scope $ARM_ID
az role assignment create --role "Azure Arc Kubernetes Viewer" --assignee $AAD_ID --scope $ARM_ID

Then open the proxy (it blocks the shell) and run kubectl from a second shell:

# Shell 1 - opens the proxy, blocks
az connectedk8s proxy -n $CLUSTER_NAME -g $RESOURCE_GROUP

# Shell 2 - normal kubectl, routed over the Arc channel
kubectl get pods -A

If you prefer native Kubernetes RBAC over Azure RBAC, create a service account, bind it with a Role or ClusterRole, and pass its token to the proxy command with --token $TOKEN. Either way, traffic flows your client → Azure Relay → clusterconnect-agent → kube-aad-proxy → kube-apiserver; for Entra identities, kube-aad-proxy authenticates the caller and impersonates them against the API server. The impersonation step is what lets Azure role assignments govern in-cluster access: assign Azure Arc Kubernetes Viewer fleet-wide (for example at a management group) and you get read-only kubectl on every cluster at once.
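
If Kubernetes user impersonation is new to you: a trusted proxy authenticates the caller, then forwards the request with its own credentials plus `Impersonate-*` headers, and kube-apiserver authorizes the request as the impersonated user. The sketch below builds those standard Kubernetes headers. It illustrates the general mechanism, not kube-aad-proxy's code, and the user and group values are invented; the proxy's own identity must be allowed the `impersonate` verb on users and groups for the API server to honor them.

```python
def impersonation_headers(user: str, groups: list[str]) -> list[tuple[str, str]]:
    """Kubernetes impersonation headers a trusted proxy attaches to a forwarded request.

    Returned as (name, value) pairs because Impersonate-Group may repeat.
    """
    if not user:
        raise ValueError("impersonation needs a user name")
    return [("Impersonate-User", user)] + [("Impersonate-Group", g) for g in groups]


if __name__ == "__main__":
    # Hypothetical caller; the real identity comes from the Entra token.
    for name, value in impersonation_headers("alice@contoso.com", ["sre-oncall"]):
        print(f"{name}: {value}")
```

You can exercise the same mechanism yourself: `kubectl get pods --as=alice@contoso.com --as-group=sre-oncall` sends these headers, provided your own identity is allowed to impersonate.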

6. Enable Azure Monitor Container Insights

Ship stdout/stderr logs, inventory, and container metrics from every Arc cluster into one Log Analytics workspace via the Microsoft.AzureMonitor.Containers extension. Use managed identity auth (amalogs.useAADAuth=true) so there is no workspace key sitting in the cluster:

WORKSPACE_ID="/subscriptions/<sub>/resourceGroups/rg-observability/providers/Microsoft.OperationalInsights/workspaces/law-fleet"

az k8s-extension create \
  --name azuremonitor-containers \
  --cluster-name $CLUSTER_NAME \
  --resource-group $RESOURCE_GROUP \
  --cluster-type connectedClusters \
  --extension-type Microsoft.AzureMonitor.Containers \
  --configuration-settings \
      logAnalyticsWorkspaceResourceID=$WORKSPACE_ID \
      amalogs.useAADAuth=true

The extension deploys the ama-logs DaemonSet (every node) and the ama-logs-rs ReplicaSet (cluster-level) into kube-system. To control ingestion cost on chatty clusters, scope collection to specific namespaces with dataCollectionSettings at install time, in place of the command above:

az k8s-extension create \
  --name azuremonitor-containers \
  --cluster-name $CLUSTER_NAME \
  --resource-group $RESOURCE_GROUP \
  --cluster-type connectedClusters \
  --extension-type Microsoft.AzureMonitor.Containers \
  --configuration-settings \
      logAnalyticsWorkspaceResourceID=$WORKSPACE_ID \
      amalogs.useAADAuth=true \
      dataCollectionSettings='{"interval":"1m","namespaceFilteringMode":"Include","namespaces":["prod","ingress"],"enableContainerLogV2":true}'
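
Hand-escaping that JSON inside shell quotes is easy to get wrong, and the filtering semantics are worth pinning down. A minimal sketch, assuming the three namespaceFilteringMode values are Off, Include, and Exclude (my reading of the Container Insights data-collection settings) and modelling only namespace filtering, not stream selection or the collection interval. `data_collection_settings` and `is_collected` are hypothetical helpers.

```python
import json

MODES = ("Off", "Include", "Exclude")  # assumed namespaceFilteringMode values


def data_collection_settings(namespaces: list[str], mode: str = "Include",
                             interval: str = "1m") -> str:
    """Serialize a compact dataCollectionSettings value like the one above."""
    if mode not in MODES:
        raise ValueError(f"namespaceFilteringMode must be one of {MODES}")
    return json.dumps(
        {
            "interval": interval,
            "namespaceFilteringMode": mode,
            "namespaces": list(namespaces),
            "enableContainerLogV2": True,
        },
        separators=(",", ":"),  # no spaces, matching the CLI example
    )


def is_collected(namespace: str, settings_json: str) -> bool:
    """Would logs from `namespace` be collected under these settings?"""
    settings = json.loads(settings_json)
    mode = settings.get("namespaceFilteringMode", "Off")
    listed = namespace in settings.get("namespaces", [])
    if mode == "Include":
        return listed
    if mode == "Exclude":
        return not listed
    return True  # "Off": no namespace filtering


if __name__ == "__main__":
    s = data_collection_settings(["prod", "ingress"])
    print(s)
    print(is_collected("prod", s), is_collected("kube-system", s))  # True False
```

The first helper reproduces the exact JSON string used in the command above, so per-cluster variants can be generated from a list of namespaces instead of edited by hand.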

Once data lands, query the whole fleet from one workspace. Container logs carry the cluster identity, so a single KQL query slices across every onboarded cluster:

ContainerLogV2
| where TimeGenerated > ago(1h)
| where LogLevel in ("error","critical")
| summarize Errors = count() by Computer, ContainerName, _ResourceId
| sort by Errors desc

Note the migration: the legacy Helm-chart onboarding for the Container Insights agent is retired. On Arc, install via the Microsoft.AzureMonitor.Containers extension - that is the supported path and the one that participates in extension lifecycle/upgrades.

7. Workload identity and Key Vault secret access

Static secrets in manifests are the failure mode Arc lets you finally kill. The Azure Key Vault Secrets Provider extension (Microsoft.AzureKeyVaultSecretsProvider) installs the Secrets Store CSI Driver plus the Azure provider, so pods mount Key Vault secrets as files on tmpfs with no credential in the cluster:

az k8s-extension create \
  --cluster-name $CLUSTER_NAME \
  --resource-group $RESOURCE_GROUP \
  --cluster-type connectedClusters \
  --extension-type Microsoft.AzureKeyVaultSecretsProvider \
  --name akvsecretsprovider \
  --configuration-settings \
      secrets-store-csi-driver.enableSecretRotation=true \
      secrets-store-csi-driver.rotationPollInterval=2m \
      secrets-store-csi-driver.syncSecret.enabled=true

For the auth itself, federate a user-assigned managed identity (UAMI) to a Kubernetes service account (workload identity) so the CSI provider exchanges the pod’s projected service account token for an Entra token, with no client secret anywhere. A SecretProviderClass ties that identity to the vault; pods opt in by running under the federated service account and mounting the class:

apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
  name: app-kv
  namespace: prod
spec:
  provider: azure
  parameters:
    clientID: "<USER_ASSIGNED_CLIENT_ID>"   # the federated UAMI
    keyvaultName: "kv-acme-prod"
    tenantId: "<TENANT_ID>"
    objects: |
      array:
        - |
          objectName: db-connection-string
          objectType: secret
  # Optional: project mounted secrets into a native K8s Secret for env vars
  secretObjects:
    - secretName: app-db
      type: Opaque
      data:
        - objectName: db-connection-string
          key: DB_CONN

Grant the UAMI Key Vault Secrets User on the vault via Azure RBAC, federate it to the service account’s OIDC subject, then any pod using that service account and mounting this SecretProviderClass reads the secret. Because the access is scoped per-service-account, you get least privilege and clean per-app audit instead of one node-wide credential.
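
The federation only works if three values line up exactly: the cluster's OIDC issuer URL, the service account subject in the form system:serviceaccount:<namespace>:<name>, and the audience api://AzureADTokenExchange. Those are the issuer, subject, and audience you supply when creating the federated credential on the UAMI (for example with az identity federated-credential create). The sketch models that exact-match check; it is not Entra's implementation, and `app-sa` plus the issuer URL are placeholders.

```python
WORKLOAD_IDENTITY_AUDIENCE = "api://AzureADTokenExchange"


def federated_subject(namespace: str, service_account: str) -> str:
    """Subject claim Kubernetes puts in a service account token."""
    return f"system:serviceaccount:{namespace}:{service_account}"


def federation_matches(claims: dict, issuer: str, namespace: str,
                       service_account: str) -> bool:
    """Exact-match check of iss, sub, and aud, modelling the token exchange."""
    aud = claims.get("aud", [])
    audiences = [aud] if isinstance(aud, str) else list(aud)
    return (
        claims.get("iss") == issuer
        and claims.get("sub") == federated_subject(namespace, service_account)
        and WORKLOAD_IDENTITY_AUDIENCE in audiences
    )


if __name__ == "__main__":
    issuer = "https://oidc.example.invalid/arc-cluster"  # placeholder issuer URL
    token_claims = {
        "iss": issuer,
        "sub": federated_subject("prod", "app-sa"),       # hypothetical service account
        "aud": [WORKLOAD_IDENTITY_AUDIENCE],
    }
    print(federation_matches(token_claims, issuer, "prod", "app-sa"))     # True
    print(federation_matches(token_claims, issuer, "staging", "app-sa"))  # False
```

The second call is the least-privilege property in miniature: the same identity is unusable from a different namespace or service account.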

Rotation caveat: enableSecretRotation=true refreshes the mounted file on the poll interval. Apps that read the file each request pick up new values automatically; apps that load secrets once at boot, or consume the synced Secret as env vars, still need a restart to see a rotated value. Env vars are snapshotted at pod start - the kernel cannot rewrite a running process’s environment.
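
The difference between the two kinds of app is easy to demonstrate. In this sketch a temporary directory stands in for the CSI volume mount (the real mount path is whatever your pod spec sets), and rewriting the file simulates a rotation by the driver; how the driver actually updates files on disk is not modelled. The class names are hypothetical.

```python
import tempfile
from pathlib import Path


class ReadOnceAtBoot:
    """Caches the secret when constructed: a rotated file is never seen."""

    def __init__(self, path: Path):
        self._value = path.read_text()

    def secret(self) -> str:
        return self._value


class ReadPerRequest:
    """Re-reads the mounted file on every call: rotation is picked up."""

    def __init__(self, path: Path):
        self._path = path

    def secret(self) -> str:
        return self._path.read_text()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        mount = Path(d) / "db-connection-string"  # stands in for the CSI mount
        mount.write_text("v1")
        boot, live = ReadOnceAtBoot(mount), ReadPerRequest(mount)
        mount.write_text("v2")                     # simulate a rotation
        print(boot.secret(), live.secret())        # v1 v2
```

Reading per request costs a small file read each time; if that matters, caching with a short TTL aligned to rotationPollInterval is a reasonable middle ground.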

8. Scale governance across many clusters

Onboarding one cluster is a demo. Governing forty is the job. Three primitives make Arc fleet-ready:

Management groups carry policy and RBAC. Place subscriptions (and therefore their connected clusters) under a management-group hierarchy and assign Policy initiatives + Arc Kubernetes roles at the MG level. A new cluster onboarded into any child subscription inherits the baseline the moment it appears - you do not touch it cluster-by-cluster.

Tags drive targeting and chargeback. Tag connected clusters with environment, owner, and data-classification, then write policy assignments that key off tags or build Azure Resource Graph queries for fleet inventory:

// Every Arc cluster, its agent version, and connectivity health
resources
| where type == "microsoft.kubernetes/connectedclusters"
| project name, location,
          distribution = tostring(properties.distribution),
          k8sVersion   = tostring(properties.kubernetesVersion),
          connectivity = tostring(properties.connectivityStatus),
          agentVersion = tostring(properties.agentVersion),
          env          = tostring(tags.environment)
| order by connectivity asc   // sort keys must be scalar, hence tostring()
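
Resource Graph rows are easy to turn into an action list. Assuming you run the query above with something like `az graph query -q "<query>" -o json` and read the rows from the `data` array of its output (check the envelope against your CLI version), a small helper can list clusters that are not `Connected`, optionally filtered by the `env` tag column, and show agent-version spread. The helper names and the sample rows are illustrative.

```python
from collections import Counter
from typing import Optional


def not_connected(rows: list[dict], env: Optional[str] = None) -> list[str]:
    """Names of clusters whose connectivity is anything other than 'Connected'."""
    return sorted(
        r["name"]
        for r in rows
        if r.get("connectivity") != "Connected" and (env is None or r.get("env") == env)
    )


def agent_version_spread(rows: list[dict]) -> Counter:
    """How many clusters run each agent version, to spot upgrade drift."""
    return Counter(r.get("agentVersion") for r in rows)


if __name__ == "__main__":
    rows = [  # invented sample rows using the projected column names
        {"name": "store-001", "connectivity": "Connected", "agentVersion": "1.0.0", "env": "prod"},
        {"name": "store-002", "connectivity": "Offline", "agentVersion": "0.9.0", "env": "prod"},
        {"name": "gke-loyalty", "connectivity": "Expired", "agentVersion": "1.0.0", "env": "dev"},
    ]
    print(not_connected(rows))              # ['gke-loyalty', 'store-002']
    print(not_connected(rows, env="prod"))  # ['store-002']
    print(agent_version_spread(rows))
```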

GitOps is the fleet rollout mechanism. Because the same az k8s-configuration flux create works across every connected cluster, codify it. The Bicep below declares the Flux config as ARM intent; pair it with a Policy assignment that requires this config (a DeployIfNotExists policy deploys it wherever it is missing), and newly onboarded clusters bootstrap their baseline on their own:

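param clusterName string
resource connectedCluster 'Microsoft.Kubernetes/connectedClusters@2021-10-01' existing = {
  name: clusterName   // the Microsoft.Kubernetes/connectedClusters resource
}

// Declared explicitly here; the CLI installs the Flux extension implicitly
resource flux 'Microsoft.KubernetesConfiguration/extensions@2023-05-01' = {
  name: 'flux'
  scope: connectedCluster
  properties: {
    extensionType: 'microsoft.flux'
    autoUpgradeMinorVersion: true
  }
}

resource fluxBaseline 'Microsoft.KubernetesConfiguration/fluxConfigurations@2023-05-01' = {
  name: 'fleet-baseline'
  scope: connectedCluster
  dependsOn: [ flux ]
  properties: {
    scope: 'cluster'
    namespace: 'cluster-config'
    sourceKind: 'GitRepository'
    gitRepository: {
      url: 'https://github.com/acme-platform/fleet-gitops'
      repositoryRef: { branch: 'main' }
    }
    kustomizations: {
      infra: { path: './infrastructure', prune: true }
      apps:  { path: './apps/prod', prune: true, dependsOn: ['infra'] }
    }
  }
}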

The end state: a cluster joins the fleet, ARM applies the inherited Policy initiative (admission guardrails), the Flux config (desired state), the Monitor extension (telemetry), and the role assignments (kubectl access) - all without a human SSHing into the cluster.

Verify

Confirm each layer landed before you call a cluster governed:

# Agents healthy and connected
az connectedk8s show -n $CLUSTER_NAME -g $RESOURCE_GROUP --query connectivityStatus -o tsv   # -> Connected
kubectl get pods -n azure-arc     # all Running

# Flux reconciled and Kustomizations Ready
az k8s-configuration flux show --name fleet-baseline -g $RESOURCE_GROUP \
  --cluster-name $CLUSTER_NAME --cluster-type connectedClusters \
  --query "statuses[].complianceState" -o tsv

# Policy add-on present and constraints synced
kubectl get constraints
az k8s-extension show --name azurepolicy -g $RESOURCE_GROUP \
  --cluster-name $CLUSTER_NAME --cluster-type connectedClusters --query provisioningState -o tsv

# Cluster connect works end to end (run proxy in another shell first)
kubectl get nodes

# Monitor agent shipping
kubectl get ds ama-logs -n kube-system

A fast negative test for Policy: kubectl run pwn --image=nginx --privileged=true -n prod should be denied by the Gatekeeper webhook once the baseline initiative is in deny mode. If it succeeds, your assignment scope or namespace exclusions are wrong, or the constraints have not synced yet.

Enterprise scenario

A retail platform team ran 28 store-edge clusters (k3s on ruggedised hardware, one per regional distribution center) plus a GKE cluster for their loyalty service. Security mandated two things the existing setup could not deliver: a centrally enforced ban on privileged containers, and break-glass kubectl access for the on-call SRE without opening inbound ports on store networks - the stores sat behind carrier-grade NAT with no public ingress and a websocket-stripping Layer-7 proxy.

The constraint that bit them first was the proxy. Onboarding succeeded, Flux reconciled, Policy enforced - but az connectedk8s proxy hung, because cluster connect rides Azure Relay over *.servicebus.windows.net and the proxy silently dropped the websocket upgrade. The fix was an allow-rule for the resolved, regional Service Bus endpoints with websockets explicitly permitted, expanded from the wildcard via the guest-notification allowlist API:

# Run per store region; feed results into the proxy allowlist with websockets enabled
for region in eastus westus2 centralus; do
  curl -s "https://guestnotificationservice.azure.com/urls/allowlist?api-version=2020-01-01&location=$region"
done

With egress fixed, they assigned the Pod Security baseline initiative at the mg-retail-edge management group, in audit first. The audit results surfaced exactly the violator they expected: a legacy label-printer DaemonSet that ran privileged to access /dev. They refactored it to use a specific device plugin, then flipped the initiative to deny. New store clusters now onboard via a pipeline that runs az connectedk8s connect, and inherit the deny policy and the Flux baseline automatically from the management group, with zero per-store configuration. On-call SREs hold Azure Arc Enabled Kubernetes Cluster User Role at the MG scope, giving them az connectedk8s proxy into any store on earth without a single inbound firewall rule. The 28 store clusters went from “28 snowflakes” to “one policy, one Git repo, one identity boundary” in under a sprint.

Written by Vinod
