Production Site-to-Site VPN to Azure: Active-Active Gateways with BGP

A Site-to-Site VPN to Azure that survives a tunnel drop without a human in the loop is not a checkbox — it is an active-active gateway, two tunnels, and BGP doing the route arithmetic. This guide builds that end to end with Terraform, hardens the crypto so you are not running 2014 defaults, and shows exactly how to prove failover with a real reconvergence test.

Static routing vs BGP: why dynamic routing is non-negotiable for HA

A static-route S2S connection carries a hard-coded list of on-prem prefixes on the local network gateway. It works for a single tunnel. It falls apart the moment you want high availability, because static routes have no concept of liveness. If the active tunnel dies, Azure keeps the static route pointing at a dead path until something — a person, a script — intervenes. There is no automatic next-best path.

BGP changes the model entirely. Instead of declaring prefixes statically, both sides advertise their routes over the tunnel and withdraw them when the session drops. With an active-active gateway you get two independent IPsec tunnels (one per gateway instance), each carrying its own BGP session. When a tunnel flaps, its BGP session times out, those routes are withdrawn, and traffic shifts to the surviving tunnel’s advertised paths automatically. No route edit, no portal click.

Concern                | Static routing                       | BGP
Failover               | Manual / scripted                    | Automatic on session loss
On-prem prefix changes | Edit local network gateway, re-apply | Advertised dynamically, no Azure change
Active-active gateway  | Limited value                        | The whole point
Transit / multi-site   | Painful                              | Native path selection

Rule of thumb: if you have an active-active gateway and are not running BGP, you have paid for two instances and one of them is doing nothing useful for failover. The two halves only become an HA pair when BGP can withdraw the dead path.
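To make the liveness argument concrete, here is a toy Python model of the two behaviours. It is only a sketch: the function and tunnel names are made up for illustration, and it models nothing beyond "static paths stay installed, BGP paths disappear with their session".

```python
def static_next_hops(configured_tunnels, live_tunnels):
    """Static routing ignores liveness: the configured path stays installed."""
    return list(configured_tunnels)


def bgp_next_hops(advertising_tunnels, live_tunnels):
    """A route learned over a tunnel is withdrawn when that tunnel's session drops."""
    return [t for t in advertising_tunnels if t in live_tunnels]


def blackholed(next_hops, live_tunnels):
    """Next hops that are still installed but point at a dead tunnel."""
    return [t for t in next_hops if t not in live_tunnels]


if __name__ == "__main__":
    live = {"tunnel2"}  # hypothetical state: tunnel1 has just failed
    static = static_next_hops(["tunnel1"], live)
    bgp = bgp_next_hops(["tunnel1", "tunnel2"], live)
    print("static:", static, "blackholed:", blackholed(static, live))
    print("bgp:   ", bgp, "blackholed:", blackholed(bgp, live))
```

In the static case the only installed path is the dead one; in the BGP case the dead path is gone and the surviving tunnel is the only next hop left.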

Topology: active-active gateway, dual tunnels, and ASN/APIPA planning

An active-active gateway gets two public IPs and two BGP peer addresses, one per instance. Your on-prem device builds two tunnels, one to each Azure public IP, and forms a BGP session over each. The result is two independent tunnels, each carrying its own BGP session.

on-prem VPN device (ASN 65010)
   |  tunnel 1  -> Azure GW instance 0 (PIP-1, BGP peer A)
   |  tunnel 2  -> Azure GW instance 1 (PIP-2, BGP peer B)
Azure active-active VNet gateway (ASN 65515)

Two planning decisions to nail before you touch any resource:

ASN selection. Azure VPN gateways default to ASN 65515. Your on-prem side needs a different ASN. Use a private ASN — the 16-bit private range is 64512–65534, or the 32-bit private range 4200000000–4294967294. Azure also reserves a handful of ASNs you cannot use on-prem (notably 65515, 65517, 65518, 65519, 65520). Pick something clean like 65010 for on-prem and leave Azure on 65515 unless you have a reason to change it.
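The ASN rules above are easy to check before anything is deployed. The following Python sketch encodes only the ranges and reserved values quoted in this section; the function name is hypothetical, and the reserved set is the "notably" list from the text, so treat it as possibly incomplete. A public ASN you own would be flagged by the range check even though it can be legitimate.

```python
# Values taken from the paragraph above; the reserved list may not be exhaustive.
AZURE_RESERVED_ASNS = {65515, 65517, 65518, 65519, 65520}
PRIVATE_16_BIT = range(64512, 65535)            # 64512-65534 inclusive
PRIVATE_32_BIT = range(4200000000, 4294967295)  # 4200000000-4294967294 inclusive


def check_onprem_asn(asn, azure_asn=65515):
    """Return a list of problems with a proposed on-prem ASN (empty if none)."""
    problems = []
    if asn == azure_asn:
        problems.append("matches the Azure gateway ASN")
    if asn in AZURE_RESERVED_ASNS:
        problems.append("reserved by Azure")
    if asn not in PRIVATE_16_BIT and asn not in PRIVATE_32_BIT:
        problems.append("outside the private ASN ranges")
    return problems


if __name__ == "__main__":
    for candidate in (65010, 65515, 64000):
        print(candidate, check_onprem_asn(candidate) or "ok")
```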

BGP peering addresses (APIPA). By default Azure derives the BGP peer IP from the GatewaySubnet range, which is fine if your on-prem device peers from a routable address. But many devices, including effectively all AWS/GCP-style endpoints and a lot of appliance configs, require APIPA (link-local 169.254.x.x) BGP addresses for S2S. If so, you must use Azure’s reserved APIPA range, 169.254.21.0 to 169.254.22.255, which you cannot expand. For an active-active gateway you assign one APIPA address per instance.

A workable APIPA plan:

Endpoint                    | BGP address
Azure GW instance 0         | 169.254.21.1
Azure GW instance 1         | 169.254.21.5
On-prem peer (for tunnel 1) | 169.254.21.2
On-prem peer (for tunnel 2) | 169.254.21.6

Keep each tunnel’s pair in its own little subnet mentally (.1/.2, .5/.6). Mismatched APIPA peers are the single most common reason a BGP session refuses to come up while the IPsec SA shows connected.
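Because mismatched peers are so common, it can help to lint the APIPA plan before applying anything. This Python sketch checks the plan from the table using only the standard library. The function name is hypothetical, and the "/30 per tunnel" check is an assumption that encodes this guide's own .1/.2 and .5/.6 convention; it is not an Azure requirement.

```python
import ipaddress

# Azure's reserved APIPA range, as quoted in this section.
AZURE_APIPA_FIRST = ipaddress.ip_address("169.254.21.0")
AZURE_APIPA_LAST = ipaddress.ip_address("169.254.22.255")
LINK_LOCAL = ipaddress.ip_network("169.254.0.0/16")


def check_tunnel_pair(azure_ip, onprem_ip):
    """Return a list of problems with one tunnel's BGP address pair."""
    azure = ipaddress.ip_address(azure_ip)
    onprem = ipaddress.ip_address(onprem_ip)
    problems = []
    if not (AZURE_APIPA_FIRST <= azure <= AZURE_APIPA_LAST):
        problems.append(f"{azure} is outside Azure's APIPA range")
    if onprem not in LINK_LOCAL:
        problems.append(f"{onprem} is not a link-local address")
    if azure == onprem:
        problems.append("both ends use the same address")
    # Assumption: this guide keeps each tunnel's pair inside one /30.
    azure_net = ipaddress.ip_interface(f"{azure}/30").network
    onprem_net = ipaddress.ip_interface(f"{onprem}/30").network
    if azure_net != onprem_net:
        problems.append(f"{azure} and {onprem} are not in the same /30")
    return problems


if __name__ == "__main__":
    plan = {
        "tunnel1": ("169.254.21.1", "169.254.21.2"),
        "tunnel2": ("169.254.21.5", "169.254.21.6"),
    }
    for name, (azure_ip, onprem_ip) in plan.items():
        print(name, check_tunnel_pair(azure_ip, onprem_ip) or "ok")
```

Swapping the on-prem peers between tunnels (for example pairing 169.254.21.5 with 169.254.21.2) is reported as a /30 mismatch, which is exactly the asymmetry that leaves IPsec connected while BGP stays down.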

Step 1 - Deploy the active-active VPN gateway and local network gateway

Active-active requires a GatewaySubnet (Azure mandates this exact name) of at least /27, two public IPs, and an SKU that supports active-active and BGP. VpnGw1 and above qualify; the legacy Basic SKU does not support BGP at all. Use a zone-redundant generation 2 SKU like VpnGw2AZ for production.

resource "azurerm_subnet" "gateway" {
  name                 = "GatewaySubnet" # name is mandatory and case-sensitive
  resource_group_name  = azurerm_resource_group.hub.name
  virtual_network_name = azurerm_virtual_network.hub.name
  address_prefixes     = ["10.0.255.0/27"]
}

resource "azurerm_public_ip" "vpngw" {
  count               = 2
  name                = "pip-vpngw-${count.index}"
  resource_group_name = azurerm_resource_group.hub.name
  location            = azurerm_resource_group.hub.location
  allocation_method   = "Static"
  sku                 = "Standard"
  zones               = ["1", "2", "3"]
}

Now the gateway. The two ip_configuration blocks plus active_active = true are what make it an HA pair. bgp_enabled = true turns on dynamic routing, and the bgp_settings block assigns the per-instance APIPA addresses via peering_addresses, each tied to its ip_configuration by name.

resource "azurerm_virtual_network_gateway" "this" {
  name                = "vpngw-hub-prod"
  resource_group_name = azurerm_resource_group.hub.name
  location            = azurerm_resource_group.hub.location

  type     = "Vpn"
  vpn_type = "RouteBased"
  sku      = "VpnGw2AZ"

  active_active = true
  bgp_enabled   = true

  ip_configuration {
    name                          = "vnetGatewayConfig0"
    public_ip_address_id          = azurerm_public_ip.vpngw[0].id
    private_ip_address_allocation = "Dynamic"
    subnet_id                     = azurerm_subnet.gateway.id
  }

  ip_configuration {
    name                          = "vnetGatewayConfig1"
    public_ip_address_id          = azurerm_public_ip.vpngw[1].id
    private_ip_address_allocation = "Dynamic"
    subnet_id                     = azurerm_subnet.gateway.id
  }

  bgp_settings {
    asn = 65515 # Azure side ASN (default)

    peering_addresses {
      ip_configuration_name = "vnetGatewayConfig0"
      apipa_addresses       = ["169.254.21.1"]
    }
    peering_addresses {
      ip_configuration_name = "vnetGatewayConfig1"
      apipa_addresses       = ["169.254.21.5"]
    }
  }
}

Gateway creation takes 30-45 minutes. Plan for it. Run this early and let it bake while you stage the rest of the config. The provider will sit on the apply the whole time.

The local network gateway represents your on-prem side. With BGP, you do not list on-prem prefixes here; you set the on-prem ASN and BGP peer address and let routing do the rest. With APIPA peering, the bgp_settings block on the local network gateway carries the on-prem APIPA peer IP.

resource "azurerm_local_network_gateway" "onprem" {
  name                = "lng-onprem-dc1"
  resource_group_name = azurerm_resource_group.hub.name
  location            = azurerm_resource_group.hub.location

  # Public IP of the on-prem VPN device (the outside interface)
  gateway_address = "203.0.113.10"

  # With BGP + APIPA you do not enumerate prefixes here; advertise via BGP.
  # address_space is omitted intentionally for a pure-BGP design.

  bgp_settings {
    asn                 = 65010           # on-prem ASN, must differ from Azure
    bgp_peering_address = "169.254.21.2"  # on-prem APIPA peer for tunnel 1
  }
}

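For a fully redundant on-prem edge with two devices you would create a second local network gateway (e.g. lng-onprem-dc2) with the second device’s public IP and 169.254.21.6. For a single on-prem device terminating both tunnels, the Azure-side APIPA pairing is expressed on the connection in Step 3 via custom_bgp_addresses. Note that the tunnel 2 connection there references azurerm_local_network_gateway.onprem_t2, a second local network gateway that is not shown; it is assumed to be defined like the one above, with 169.254.21.6 as its BGP peering address.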

Step 2 - Configure BGP peering and advertising on-prem prefixes

There is nothing extra to “turn on” for advertising once BGP is enabled — Azure automatically advertises the VNet address space (and, in hub-spoke with useRemoteGateways, peered spoke ranges) to your on-prem peer. Your job is to control what on-prem advertises back.

A clean production stance:

You will validate the learned and advertised routes in the Verify section using az network vnet-gateway list-learned-routes and list-advertised-routes. There is no Azure-side knob to add custom advertised prefixes on a stock VPN gateway beyond what the VNet/peering topology defines, so control the advertisement story on the on-prem side and through your peering design.

If you need to advertise a summary route to on-prem that does not match a VNet range (a common need when fronting Azure with custom aggregates), that is a Route Server / NVA pattern, not a stock VPN-gateway feature. Do not expect the VPN gateway to synthesize arbitrary aggregates.

Step 3 - Harden the connection with a custom IPsec/IKE policy (no defaults)

By default Azure negotiates from a broad list of IKE/IPsec proposals that still includes weak options (DES, SHA1, DH Group 2). For anything production or regulated, pin a single strong policy on the connection with an ipsec_policy block. When you specify one, Azure stops offering the default set and proposes only what you declare — so the on-prem device must match it exactly.

A strong, broadly interoperable AES-GCM policy:

resource "azurerm_virtual_network_gateway_connection" "tunnel1" {
  name                = "cn-onprem-dc1"
  resource_group_name = azurerm_resource_group.hub.name
  location            = azurerm_resource_group.hub.location

  type                       = "IPsec"
  virtual_network_gateway_id = azurerm_virtual_network_gateway.this.id
  local_network_gateway_id   = azurerm_local_network_gateway.onprem.id

  connection_protocol = "IKEv2"
  bgp_enabled         = true
  shared_key          = var.vpn_shared_key # pull from Key Vault, never hardcode

  dpd_timeout_seconds = 45

  ipsec_policy {
    # IKE Phase 1
    ike_encryption = "GCMAES256"
    ike_integrity  = "GCMAES256" # with GCMAES encryption, integrity must match
    dh_group       = "DHGroup14"

    # IPsec Phase 2 (ESP)
    ipsec_encryption = "GCMAES256"
    ipsec_integrity  = "GCMAES256"
    pfs_group        = "PFS14" # enable Perfect Forward Secrecy

    sa_lifetime = 3600       # seconds; rekey hourly
    sa_datasize = 102400000  # KB
  }
}

A few accuracy points that bite people:

For the active-active second tunnel terminating on the same on-prem device, create a second connection. The custom_bgp_addresses block lets you bind which Azure-side APIPA address this connection uses for its BGP session — required when one local network gateway pairs with the second gateway instance.

resource "azurerm_virtual_network_gateway_connection" "tunnel2" {
  name                       = "cn-onprem-dc1-t2"
  resource_group_name        = azurerm_resource_group.hub.name
  location                   = azurerm_resource_group.hub.location
  type                       = "IPsec"
  virtual_network_gateway_id = azurerm_virtual_network_gateway.this.id
  local_network_gateway_id   = azurerm_local_network_gateway.onprem_t2.id
  connection_protocol        = "IKEv2"
  bgp_enabled                = true
  shared_key                 = var.vpn_shared_key
  dpd_timeout_seconds        = 45

  # custom_bgp_addresses is per gateway instance: primary belongs to the first
  # ip_configuration (instance 0), secondary to the second (instance 1)
  custom_bgp_addresses {
    primary   = "169.254.21.1"
    secondary = "169.254.21.5"
  }

  ipsec_policy {
    ike_encryption   = "GCMAES256"
    ike_integrity    = "GCMAES256"
    dh_group         = "DHGroup14"
    ipsec_encryption = "GCMAES256"
    ipsec_integrity  = "GCMAES256"
    pfs_group        = "PFS14"
    sa_lifetime      = 3600
    sa_datasize      = 102400000
  }
}

Step 4 - On-prem device config: tunnel, BGP, and dead-peer detection

The on-prem side must mirror everything: two tunnels (one per Azure PIP), the exact crypto policy, APIPA BGP peers, and DPD. Vendors differ in syntax, but the values are fixed by what you set in Azure. Here is a representative Cisco IOS-style configuration for one tunnel — replicate it for the second to the other PIP.

! Phase 1 (IKEv2) - must match the Azure ipsec_policy exactly
crypto ikev2 proposal AZURE-P1
 encryption aes-gcm-256
 prf sha256
 group 14
!
! Phase 2 (ESP) - AES-GCM-256, PFS group 14
crypto ipsec transform-set AZURE-P2 esp-gcm 256
 mode tunnel
crypto ipsec profile AZURE-PROFILE
 set transform-set AZURE-P2
 set pfs group14
 set security-association lifetime seconds 3600
!
! Tunnel to Azure gateway instance 0 (PIP-1)
interface Tunnel1
 ip address 169.254.21.2 255.255.255.252
 tunnel source GigabitEthernet0/0
 tunnel mode ipsec ipv4
 tunnel destination <AZURE_PIP_1>
 tunnel protection ipsec profile AZURE-PROFILE
!
! Dead Peer Detection - detect a dead Azure peer quickly
crypto ikev2 dpd 10 3 periodic
!
! BGP: peer to the Azure instance-0 APIPA address, eBGP
router bgp 65010
 bgp log-neighbor-changes
 neighbor 169.254.21.1 remote-as 65515
 neighbor 169.254.21.1 ebgp-multihop 8
 neighbor 169.254.21.1 update-source Tunnel1
 !
 address-family ipv4
  network 192.168.10.0 mask 255.255.255.0
  neighbor 169.254.21.1 activate
 exit-address-family

Three on-prem details that matter:

Verify

Prove both the IPsec layer and the BGP layer independently. A connected tunnel with a dead BGP session is a silent failure waiting to bite you on failover.

Check connection status and that both tunnels are up:

# Both connections should report Connected
az network vpn-connection show \
  -g rg-hub-prod -n cn-onprem-dc1 \
  --query "{name:name, status:connectionStatus, ingress:ingressBytesTransferred, egress:egressBytesTransferred}" -o table

az network vpn-connection show \
  -g rg-hub-prod -n cn-onprem-dc1-t2 \
  --query "{name:name, status:connectionStatus}" -o table

Confirm BGP peers are established on the gateway (this is the real HA check):

az network vnet-gateway list-bgp-peer-status \
  -g rg-hub-prod -n vpngw-hub-prod \
  --query "value[].{peer:neighbor, state:state, asn:asn, routes:routesReceived}" -o table

You want state = Connected for both on-prem APIPA peers and a non-zero routesReceived. Then inspect what each side is exchanging:

# Routes Azure has LEARNED from on-prem (expect your summarized corp prefixes)
az network vnet-gateway list-learned-routes \
  -g rg-hub-prod -n vpngw-hub-prod \
  --query "value[].{network:network, nextHop:nextHop, asPath:asPath, source:sourcePeer}" -o table

# Routes Azure is ADVERTISING to a specific on-prem peer
az network vnet-gateway list-advertised-routes \
  -g rg-hub-prod -n vpngw-hub-prod --peer 169.254.21.2 \
  --query "value[].{network:network, nextHop:nextHop}" -o table

On the on-prem device, confirm the mirror image:

show crypto ikev2 sa          ! Phase 1 up on both tunnels
show crypto ipsec sa          ! Phase 2 SAs installed, encrypting/decrypting
show ip bgp summary           ! both neighbors in Established, prefixes received
show ip route bgp             ! Azure VNet/spoke prefixes learned via BGP

A healthy deployment shows: both connections Connected, both BGP peers Connected/Established, on-prem corp prefixes in list-learned-routes, Azure VNet ranges in the on-prem BGP table, and live ingress/egress byte counters on both connections.
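The BGP part of that health definition can be turned into a scripted check. The Python sketch below consumes the JSON from az network vnet-gateway list-bgp-peer-status (run with -o json instead of the table query) and uses only the neighbor, state, and routesReceived fields already referenced above. The function name and the sample payload are made up for illustration. It assumes a peer is healthy if at least one reported row for it is Connected with routes, since an active-active gateway may report a neighbor more than once.

```python
import json


def unhealthy_peers(peer_status, expected_peers):
    """Map each expected peer that fails the health check to a reason."""
    rows = peer_status.get("value", [])
    problems = {}
    for peer in expected_peers:
        matches = [r for r in rows if r.get("neighbor") == peer]
        connected = [r for r in matches if r.get("state") == "Connected"]
        if not matches:
            problems[peer] = "peer not reported by the gateway"
        elif not connected:
            problems[peer] = "no session in Connected state"
        elif not any(int(r.get("routesReceived", 0)) > 0 for r in connected):
            problems[peer] = "session up but no routes received"
    return problems


# Made-up sample shaped like the CLI output; not captured from a real gateway.
SAMPLE = json.loads("""
{"value": [
  {"neighbor": "169.254.21.2", "state": "Connected", "asn": 65010, "routesReceived": 3},
  {"neighbor": "169.254.21.6", "state": "Connecting", "asn": 65010, "routesReceived": 0}
]}
""")

if __name__ == "__main__":
    print(unhealthy_peers(SAMPLE, ["169.254.21.2", "169.254.21.6"]))
```

An empty result means both on-prem peers are established and sending routes. Anything else is worth failing a pipeline on before you rely on failover.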

Testing failover: simulating tunnel loss and observing reconvergence

Do not trust HA you have not broken on purpose. The cleanest non-destructive test is to drop one tunnel and watch BGP reconverge while a continuous ping keeps running.

  1. Start a continuous ping from an on-prem host to an Azure VM (private IP) and leave it running in another window.
  2. Tear down tunnel 1 from the on-prem side only — for example clear crypto session on that tunnel interface, or shut the tunnel interface (interface Tunnel1 then shutdown). This avoids touching Azure and mimics a real path failure.
  3. Watch the ping. With BGP + DPD tuned, you should see a small number of dropped packets (single-digit, depending on DPD timers and BGP hold time) and then traffic resumes over tunnel 2.
  4. Confirm the reconvergence on the Azure side:
# The downed peer should drop to a non-Connected state, the other stays up
az network vnet-gateway list-bgp-peer-status \
  -g rg-hub-prod -n vpngw-hub-prod \
  --query "value[].{peer:neighbor, state:state, routes:routesReceived}" -o table

# Learned routes should now show next-hop via the surviving peer only
az network vnet-gateway list-learned-routes \
  -g rg-hub-prod -n vpngw-hub-prod \
  --query "value[?network=='192.168.10.0/24'].{net:network, nextHop:nextHop}" -o table
  5. Bring tunnel 1 back (no shutdown), confirm the BGP session re-establishes, and verify the route reappears via both peers.
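To put a number on step 3 rather than eyeballing the ping window, you can measure the longest run of lost echo replies. This Python sketch assumes you have already extracted the ICMP sequence numbers that were answered (how you parse your ping output is platform-specific and not shown); the function name is hypothetical.

```python
def longest_loss_run(received_seqs, first_seq, last_seq):
    """Longest run of consecutive missing sequence numbers in [first_seq, last_seq]."""
    received = set(received_seqs)
    longest = current = 0
    for seq in range(first_seq, last_seq + 1):
        if seq in received:
            current = 0
        else:
            current += 1
            longest = max(longest, current)
    return longest


if __name__ == "__main__":
    # Hypothetical capture: replies 4 through 8 were lost during failover.
    answered = [1, 2, 3, 9, 10]
    lost = longest_loss_run(answered, first_seq=1, last_seq=10)
    print(f"longest outage: {lost} probes")
```

With a one-second ping interval, the result approximates the outage in seconds, which gives you a value to compare between timer settings.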

Failover speed is governed by your slowest detection timer — DPD on the IPsec layer and BGP hold time (default 180s, derived from a 60s keepalive) on the routing layer. If failover feels slow, BGP timers are usually the culprit. Lowering the keepalive/hold timers tightens reconvergence but increases sensitivity to jitter; tune deliberately, do not just slam them to the minimum.
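When tuning timers, remember that the hold time is negotiated per session: under RFC 4271 each side proposes a value and the session runs on the smaller one, so lowering it on the on-prem side alone can be enough. The Python sketch below shows that arithmetic. The function names are hypothetical, and the one-third keepalive is the common convention rather than something every platform enforces.

```python
def negotiated_hold_time(local_hold, peer_hold):
    """Hold time a BGP session actually uses: the smaller of the two proposals."""
    hold = min(local_hold, peer_hold)
    # RFC 4271: the hold time must be zero (keepalives disabled) or at least 3 seconds.
    if hold != 0 and hold < 3:
        raise ValueError("hold time must be 0 or at least 3 seconds")
    return hold


def typical_keepalive(hold):
    """Conventional keepalive interval: one third of the hold time."""
    return hold // 3


if __name__ == "__main__":
    # Peer proposing the 180s default mentioned above, on-prem proposing 15s.
    hold = negotiated_hold_time(local_hold=15, peer_hold=180)
    print(f"hold={hold}s keepalive={typical_keepalive(hold)}s")
```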

Enterprise scenario

A payments platform ran active-active VPN gateways to two on-prem datacenters and reported “random” 30-90 second outages on failover, far worse than the single-digit packet loss they tested at launch. Their tunnels were healthy; the culprit was BGP path selection. On-prem advertised 192.168.0.0/16 over both tunnels with identical attributes, so Azure load-balanced (ECMP) across instance 0 and instance 1. When instance 0’s tunnel dropped, traffic on that path blackholed until the BGP hold timer (180s default, but they had tuned it to ~90s) expired and withdrew the route. DPD was tearing down IPsec fast, but BGP had not yet pulled the path.

The fix was twofold. First, they let DPD-driven IPsec teardown trigger faster BGP convergence by tightening the on-prem BGP timers to a 5s keepalive / 15s hold instead of relying on the default, accepting the jitter tradeoff on a clean MPLS underlay:

router bgp 65010
 neighbor 169.254.21.1 timers 5 15
 neighbor 169.254.21.5 timers 5 15

Second — the real win — they switched from symmetric advertisement to deterministic primary/backup using AS-path prepending on the secondary tunnel, so steady-state traffic was not ECMP-split across both instances and only one path had to converge on failure. They also added BFD-style fast detection where the on-prem platform supported it. Post-change failover dropped to under 3 seconds of loss. The lesson: active-active plus ECMP is throughput, not low-RTO HA. If you need deterministic sub-5s failover, make one path primary and let BGP timers, not just DPD, drive reconvergence.
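The difference between symmetric advertisement and prepending comes down to one comparison. The Python sketch below is a deliberately reduced model: it looks only at AS-path length and ignores every other BGP best-path attribute, and the function name and tunnel labels are made up for illustration.

```python
def best_paths(paths):
    """Return the tunnels whose advertised AS path is shortest.

    `paths` maps a tunnel name to the AS path (a list of ASNs) received over it.
    More than one result means the prefix is ECMP-split across those tunnels.
    """
    if not paths:
        return []
    shortest = min(len(as_path) for as_path in paths.values())
    return sorted(t for t, as_path in paths.items() if len(as_path) == shortest)


if __name__ == "__main__":
    symmetric = {"tunnel1": [65010], "tunnel2": [65010]}
    prepended = {"tunnel1": [65010], "tunnel2": [65010, 65010, 65010]}
    after_failure = {"tunnel2": [65010, 65010, 65010]}  # tunnel1's route withdrawn

    print("symmetric:    ", best_paths(symmetric))
    print("prepended:    ", best_paths(prepended))
    print("after failure:", best_paths(after_failure))
```

With identical paths both tunnels are selected, which is the ECMP split from the scenario. With the secondary prepended only tunnel 1 carries traffic in steady state, and tunnel 2 takes over once tunnel 1's route is withdrawn.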

Failover and hardening checklist

Throughput tuning, SKU sizing, and diagnosing tunnel flaps

SKU sizing drives throughput and tunnel count. The VpnGw* SKUs scale aggregate throughput and BGP scale roughly with the tier — VpnGw1 through VpnGw5, with the AZ variants adding zone redundancy. Size on aggregate throughput across all tunnels, not per-tunnel, and remember a single IPsec tunnel will not saturate a large gateway because per-tunnel throughput is capped well below the SKU aggregate. If you need more than one tunnel’s worth of bandwidth to a single site, that is a multi-tunnel or ExpressRoute conversation, not a bigger-VPN-SKU conversation.

Diagnose flaps from the logs, not the portal. Enable diagnostic settings on the gateway and stream the tunnel and route categories to Log Analytics:

az monitor diagnostic-settings create \
  --name vpngw-diag \
  --resource $(az network vnet-gateway show -g rg-hub-prod -n vpngw-hub-prod --query id -o tsv) \
  --workspace $(az monitor log-analytics workspace show -g rg-hub-prod -n law-hub --query id -o tsv) \
  --logs '[{"category":"TunnelDiagnosticLog","enabled":true},
           {"category":"RouteDiagnosticLog","enabled":true},
           {"category":"IKEDiagnosticLog","enabled":true},
           {"category":"GatewayDiagnosticLog","enabled":true}]'

Then a tunnel that keeps flapping shows its connect/disconnect events with a reason:

AzureDiagnostics
| where Category == "TunnelDiagnosticLog"
| where TimeGenerated > ago(6h)
| project TimeGenerated, status_s, stateChangeReason_s, remoteIP_s
| order by TimeGenerated desc

A repeating connect/disconnect with stateChangeReason pointing at a policy or peer mismatch is almost always a crypto or APIPA asymmetry between the two sides. IKEDiagnosticLog will show the failed Phase 1 negotiation directly.

Pitfalls that cause production outages

Build it active-active, pin a single hardened crypto policy on both ends, peer BGP over both tunnels, and then break a tunnel on purpose and watch the ping recover. Hybrid connectivity that has survived a deliberate failure is the only kind you should put production traffic on.
