api.contoso.com should resolve to 10.20.4.10 for a workload inside the VPC and to 203.0.113.10 for a customer on the public internet. Same name, two answers, decided entirely by where the query originated. That is split-horizon DNS, and it is the cleanest way to keep an internal-only path (no NAT, no internet egress, lower latency, traffic that never leaves your network) while still presenting a normal public hostname to the outside world.
The trap is that “two answers for one name” is one config mistake away from “two sources of truth for one name,” and that is split-brain. The internal copy drifts, an internal-only record accidentally appears in the public zone, or the public answer points at an IP whose TLS certificate has no matching SAN. This article builds split-horizon correctly on three stacks — generic overlapping zones, AWS Route 53, and on-prem BIND/Windows — and then proves with dig from every network position that no record leaks across the horizon.
1. Why split-horizon exists, and the TLS SAN constraint
The point of split-horizon is path control without renaming. You want internal clients to take the private path and external clients the public path, but you cannot give them different hostnames — the TLS certificate, the connection string baked into a hundred services, and the customer’s bookmark all reference one FQDN.
That single-name requirement collides with TLS in a way people discover the hard way. A certificate’s validity is checked against its Subject Alternative Name list, not the IP it resolved to. So both the private 10.20.4.10 and the public 203.0.113.10 must terminate TLS with a certificate whose SAN includes api.contoso.com. If the internal endpoint serves a different cert (say a *.internal.contoso.com wildcard or a self-signed cert), the client name still says api.contoso.com, the SAN does not match, and you get a hostname-mismatch error that looks nothing like a DNS problem.
Rule of thumb: split-horizon splits the address, never the name and never the certificate identity. Every IP a horizon hands out for an FQDN must be reachable over TLS with a cert that lists that exact FQDN in its SAN. If you cannot put both IPs behind the same cert identity, you do not have a split-horizon problem — you have two different services and should use two different names.
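You can test this rule before cutover by checking each horizon's certificate against the name clients will send. The sketch below is illustrative, not a full validator. `san_covers` is a hypothetical helper that applies simplified RFC 6125-style matching: an exact name, or one left-most `*` label that covers exactly one label. That is enough to show why a `*.internal.contoso.com` certificate cannot serve `api.contoso.com`. `probe` uses the standard library `ssl` module to connect to a specific horizon IP while sending and verifying the public FQDN, which is what a real client does.

```python
import socket
import ssl


def san_covers(hostname: str, sans: list[str]) -> bool:
    """Simplified SAN check: exact match, or a left-most '*' covering one label."""
    host_labels = hostname.lower().rstrip(".").split(".")
    for san in sans:
        san_labels = san.lower().rstrip(".").split(".")
        if san_labels == host_labels:
            return True
        # Wildcard only in the left-most label, and it spans exactly one label.
        if (
            san_labels[0] == "*"
            and len(san_labels) == len(host_labels)
            and san_labels[1:] == host_labels[1:]
        ):
            return True
    return False


def probe(ip: str, hostname: str, port: int = 443, timeout: float = 5.0) -> None:
    """Connect to one horizon's IP, but send SNI and verify the cert as `hostname`.

    Raises ssl.SSLCertVerificationError if the served cert does not cover hostname.
    """
    ctx = ssl.create_default_context()  # verifies chain and hostname by default
    with socket.create_connection((ip, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=hostname) as tls:
            print(f"{ip}: certificate valid for {hostname} ({tls.version()})")
```

Run `probe("10.20.4.10", "api.contoso.com")` from inside the network and `probe("203.0.113.10", "api.contoso.com")` from outside. Both calls should succeed. If either horizon serves the wrong certificate, the failure shows up as a TLS verification error, not a DNS error.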
There is a corollary for HTTP routing. The Host header and SNI carry api.contoso.com on both paths, so your internal load balancer and your public load balancer must each have a listener or rule that matches that host. A split-horizon design with mismatched listener host rules fails the same way a SAN mismatch does.
2. The split-brain failure mode
Split-brain is what split-horizon degrades into when the two halves are maintained independently. Three concrete failures:
Divergent records. Someone repoints the public api.contoso.com to a new IP during a migration and forgets the internal zone. Internal clients keep hitting the dead old IP for days. Nothing alerts, because both records resolve fine in isolation — they just disagree.
Stale internal copies. The internal zone is a hand-maintained file. The public side moves to a new CDN or load balancer monthly via automation; the internal A record was set once in 2023 and never touched. The horizons silently desynchronize over time.
Accidental public exposure. This is the dangerous one. An engineer adds vault.contoso.com -> 10.20.9.5 to what they think is the internal-only zone, but the change lands in the public zone (or the public zone is authoritative for the whole apex and inherits it). Now the private IP of an internal secrets service is published on the internet. It is not reachable, but it is enumerable — you have just handed an attacker your internal addressing scheme and a target list.
The defense is structural, not procedural: never let a human decide which zone a record lands in, and never let a record exist in exactly one horizon by accident. Both come from infrastructure-as-code, covered in section 6.
3. Pattern A — overlapping public and private zones with provider precedence
The most portable pattern: define two zones of the same name, one private and one public, and rely on the provider’s precedence rule that a private/internal zone wins for clients that can see it.
In Azure, you create a public DNS zone contoso.com (delegated from your registrar, authoritative on the internet) and a Private DNS Zone of the same name contoso.com, linked to the VNets that should get internal answers.
# Public zone — authoritative on the internet
az network dns zone create \
--resource-group rg-dns-public \
--name contoso.com
az network dns record-set a add-record \
--resource-group rg-dns-public \
--zone-name contoso.com \
--record-set-name api \
--ipv4-address 203.0.113.10
# Private zone — same name, internal answers only
az network private-dns zone create \
--resource-group rg-dns-private \
--name contoso.com
az network private-dns record-set a add-record \
--resource-group rg-dns-private \
--zone-name contoso.com \
--record-set-name api \
--ipv4-address 10.20.4.10
# Link the private zone to the VNet that should see internal answers
az network private-dns link vnet create \
--resource-group rg-dns-private \
--zone-name contoso.com \
--name link-app-vnet \
--virtual-network app-vnet \
--registration-enabled false
The precedence rule that makes this work: a VNet using Azure-provided DNS (168.63.129.16) consults any Private DNS Zone linked to it before going to the public internet. So a VM in app-vnet resolving api.contoso.com gets 10.20.4.10 from the linked private zone. A host anywhere else resolves the public delegation and gets 203.0.113.10. Two zones, one name, the link membership decides the horizon.
The critical discipline with Pattern A: the private zone must contain a record for every name that internal clients query, including names that only live publicly. Because the private zone, once linked, shadows the entire contoso.com apex for those VNets, an internal client asking for www.contoso.com will get NXDOMAIN if www exists only in the public zone. Linking a private zone of the apex name silently makes that zone authoritative for the whole apex from the VNet’s perspective.
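This is the single most common Pattern A outage. You link contoso.com privately for two internal records, and suddenly internal clients can no longer resolve www or any other name that exists only in the public zone. Either give the private zone a narrower name so it does not shadow the apex (for example internal.contoso.com for names that are internal-only anyway), or mirror every public record into the private zone (section 6 makes this automatic).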
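The precedence rule and its apex-shadow side effect can be modeled in a few lines. This is a toy model under stated assumptions, not either provider's resolver. It assumes that a private zone linked to the client's network, whose name is a suffix of the query, answers authoritatively, that the most specific zone name wins, and that there is no fallback to the public zone. The network names (`app-vnet`, `internet`) are hypothetical.

```python
from dataclasses import dataclass, field

NXDOMAIN = "NXDOMAIN"


@dataclass
class Zone:
    name: str
    records: dict[str, str]  # FQDN -> IPv4 address
    linked_networks: set[str] = field(default_factory=set)  # empty for public zones


def in_zone(fqdn: str, zone_name: str) -> bool:
    return fqdn == zone_name or fqdn.endswith("." + zone_name)


def resolve(fqdn: str, network: str, private: list[Zone], public: list[Zone]) -> str:
    # 1. Private zones linked to the client's network win; most specific name first.
    linked = [z for z in private if network in z.linked_networks and in_zone(fqdn, z.name)]
    if linked:
        zone = max(linked, key=lambda z: len(z.name))
        # Shadowing: a missing name is NXDOMAIN, with no fallback to public.
        return zone.records.get(fqdn, NXDOMAIN)
    # 2. Everyone else gets the public delegation.
    pub = [z for z in public if in_zone(fqdn, z.name)]
    if pub:
        return max(pub, key=lambda z: len(z.name)).records.get(fqdn, NXDOMAIN)
    return NXDOMAIN
```

In this model, `www.contoso.com` resolves for `internet` but returns NXDOMAIN for `app-vnet` until the private zone carries a `www` record. That is exactly the Pattern A outage described above.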
4. Pattern B — Route 53 private/public hosted zone pairs
Route 53 has first-class split-horizon support: a private hosted zone and a public hosted zone can carry the identical name, and a VPC associated with the private zone gets the private answer.
# Public hosted zone — internet-facing, delegated from registrar
resource "aws_route53_zone" "public" {
name = "contoso.com"
}
# Private hosted zone — same name, associated to the app VPC
resource "aws_route53_zone" "private" {
name = "contoso.com"
vpc {
vpc_id = aws_vpc.app.id
}
}
resource "aws_route53_record" "api_public" {
zone_id = aws_route53_zone.public.zone_id
name = "api.contoso.com"
type = "A"
ttl = 60
records = ["203.0.113.10"]
}
resource "aws_route53_record" "api_private" {
zone_id = aws_route53_zone.private.zone_id
name = "api.contoso.com"
type = "A"
ttl = 60
records = ["10.20.4.10"]
}
For this to resolve, the VPC must have enableDnsSupport and enableDnsHostnames set, and queries must hit the Amazon-provided resolver at the VPC base +2 address (the .2 of the VPC CIDR, e.g. 10.20.0.2). When a VPC is associated with a private hosted zone whose name matches a public zone, the resolver returns the private zone’s record for that name to clients in that VPC and the public record to everyone else. Same precedence model as Azure: association membership selects the horizon.
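The base +2 rule and the two VPC attributes lend themselves to a pre-flight check. The sketch below assumes IPv4 and the primary VPC CIDR. `dns_attribute_problems` is a hypothetical helper that takes a plain `{attribute_name: bool}` dict. You might populate that dict from `aws ec2 describe-vpc-attribute --attribute enableDnsSupport` (and again for `enableDnsHostnames`), but the dict shape is this sketch's own, not the CLI's output format.

```python
import ipaddress


def amazon_dns_address(vpc_cidr: str) -> str:
    """Amazon-provided DNS listens on the primary VPC CIDR's base address + 2."""
    net = ipaddress.ip_network(vpc_cidr, strict=True)
    return str(net.network_address + 2)


def dns_attribute_problems(attrs: dict) -> list:
    """Hypothetical pre-flight check: both attributes must be true for
    private hosted zones to resolve inside the VPC."""
    required = ("enableDnsSupport", "enableDnsHostnames")
    return [f"{name} is not enabled" for name in required if not attrs.get(name, False)]
```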
For on-prem clients that need the private answer (VPN/Direct Connect users who should take the internal path), point their resolution at Route 53 through a Resolver inbound endpoint plus a conditional forwarder on the on-prem DNS servers:
resource "aws_route53_resolver_endpoint" "inbound" {
name = "r53-inbound"
direction = "INBOUND"
security_group_ids = [aws_security_group.resolver.id]
ip_address {
subnet_id = aws_subnet.resolver_a.id
}
ip_address {
subnet_id = aws_subnet.resolver_b.id
}
}
On-prem DNS then conditionally forwards contoso.com to the inbound endpoint IPs. Because the inbound endpoint resolves as if it were a VPC client, it returns the private-zone answers — which is exactly what you want for internal VPN users. The reverse case (VPN users who must take the public path) is a client-placement decision covered in section 7; do not solve it by forwarding to Route 53.
5. Pattern C — BIND views and Windows DNS policies on-prem
For authoritative on-prem servers, the split lives inside one server using views (BIND) or DNS policies (Windows Server). A view selects a different zone file based on the client’s source subnet via an ACL.
BIND named.conf:
acl "internal-nets" {
10.0.0.0/8;
172.16.0.0/12;
192.168.0.0/16;
};
view "internal" {
match-clients { internal-nets; };
recursion yes;
zone "contoso.com" {
type master;
file "/etc/bind/db.contoso.com.internal";
};
};
view "external" {
match-clients { any; }; // everything that did not match "internal"
recursion no;
zone "contoso.com" {
type master;
file "/etc/bind/db.contoso.com.external";
};
};
Two non-negotiable rules for BIND views. First, order matters: match-clients is evaluated top-down, so the restrictive internal view must come before the catch-all external view, or every client matches external first. Second, once you define any view, every zone must live inside a view — BIND refuses to mix view and non-view zones in one config.
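The first-match rule can be illustrated with a small simulation. It is a model, not BIND itself. It assumes IPv4 clients and treats `any` as `0.0.0.0/0`. The view list mirrors the named.conf above, and reversing it shows how a misordered catch-all swallows internal clients.

```python
import ipaddress

# (view name, match-clients networks) in named.conf order; "any" modeled as 0.0.0.0/0.
VIEWS = [
    ("internal", ["10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16"]),
    ("external", ["0.0.0.0/0"]),
]


def select_view(client_ip: str, views=VIEWS) -> str:
    """Top-down evaluation: the first view whose match-clients contains the source IP wins."""
    addr = ipaddress.ip_address(client_ip)
    for name, networks in views:
        if any(addr in ipaddress.ip_network(n) for n in networks):
            return name
    raise LookupError(f"no view matches {client_ip}")
```

With the correct order, `10.20.4.99` lands in `internal`. With `list(reversed(VIEWS))`, the same client lands in `external` and receives the public answer.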
The internal zone file carries the private A record; the external file carries the public one. Critically, the external file is a strict subset — it must not contain any internal-only hostnames:
; db.contoso.com.external — ONLY records the public should see
$TTL 300
@ IN SOA ns1.contoso.com. hostmaster.contoso.com. ( 2026060801 3600 600 604800 300 )
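@ IN NS ns1.contoso.com.
ns1 IN A 203.0.113.53 ; illustrative address: BIND will not load a zone whose in-zone NS target has no A/AAAA record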
api IN A 203.0.113.10
www IN A 203.0.113.20
; NOTE: no "vault", no "jenkins", no RFC1918 addresses here, ever.
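A comment in the zone file is not enforcement, so a CI lint over the rendered external file is a cheap backstop. The sketch below makes several assumptions. `INTERNAL_ONLY` is a hypothetical denylist you would maintain alongside the zone data. The regex only understands simple one-line `name [ttl] [IN] A address` records. RFC1918 ranges are listed explicitly because Python's `ipaddress.is_private` also flags documentation ranges such as 203.0.113.0/24, which would falsely flag this article's public IPs.

```python
import ipaddress
import re

INTERNAL_ONLY = {"vault", "jenkins"}  # hypothetical denylist of internal-only labels
RFC1918 = [ipaddress.ip_network(n) for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]
# owner, optional TTL, optional class, then type A (not AAAA) and the address
A_RECORD = re.compile(r"^(\S+)\s+(?:\d+\s+)?(?:IN\s+)?A\s+(\S+)", re.IGNORECASE)


def lint_external(zone_text: str) -> list:
    """Flag denylisted names and RFC1918 addresses in a public zone file."""
    problems = []
    for lineno, raw in enumerate(zone_text.splitlines(), start=1):
        line = raw.split(";", 1)[0].rstrip()  # strip zone-file comments
        match = A_RECORD.match(line)
        if not match:
            continue
        owner, value = match.group(1), match.group(2)
        if owner.lower().split(".")[0] in INTERNAL_ONLY:
            problems.append(f"line {lineno}: internal-only name {owner!r}")
        if any(ipaddress.ip_address(value) in net for net in RFC1918):
            problems.append(f"line {lineno}: private address {value} for {owner!r}")
    return problems
```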
Windows Server achieves the same with a client subnet plus a query-resolution policy:
# Define the internal subnet
Add-DnsServerClientSubnet -Name "InternalSubnet" `
-IPv4Subnet "10.0.0.0/8","172.16.0.0/12","192.168.0.0/16"
# Two zone scopes under the same zone
Add-DnsServerZoneScope -ZoneName "contoso.com" -Name "InternalScope"
Add-DnsServerZoneScope -ZoneName "contoso.com" -Name "ExternalScope"
# Records into each scope
Add-DnsServerResourceRecord -ZoneName "contoso.com" -ZoneScope "InternalScope" `
-A -Name "api" -IPv4Address "10.20.4.10"
Add-DnsServerResourceRecord -ZoneName "contoso.com" -ZoneScope "ExternalScope" `
-A -Name "api" -IPv4Address "203.0.113.10"
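# Policy 1: internal subnet -> internal scope
Add-DnsServerQueryResolutionPolicy -Name "SplitInternal" -Action ALLOW `
-ClientSubnet "EQ,InternalSubnet" -ZoneScope "InternalScope,1" `
-ZoneName "contoso.com"
# Policy 2: everyone else -> external scope. Without it, unmatched queries
# fall through to the zone's default scope, which holds no "api" record.
Add-DnsServerQueryResolutionPolicy -Name "SplitExternal" -Action ALLOW `
-ClientSubnet "NE,InternalSubnet" -ZoneScope "ExternalScope,1" `
-ZoneName "contoso.com"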
6. Keep the two halves in sync with IaC
Everything above fails the same way if the two horizons are edited independently. The fix is to make the record set the unit of declaration and emit it into both horizons from one source. In Terraform, drive both zones from one map so a record cannot end up in only one horizon by accident:
variable "split_records" {
description = "One declaration -> two horizons. private_ip null: internal clients get the public IP. public_ip null: internal-only, never published."
type = map(object({
public_ip = optional(string)
private_ip = optional(string)
}))
default = {
api = { public_ip = "203.0.113.10", private_ip = "10.20.4.10" }
www = { public_ip = "203.0.113.20" } # public-only, no private answer
vault = { private_ip = "10.20.9.5" } # internal-only, never published
}
}
# Public records only for keys that have a public IP, so internal-only
# names cannot reach the public zone.
resource "aws_route53_record" "public" {
for_each = { for k, v in var.split_records : k => v if v.public_ip != null }
zone_id = aws_route53_zone.public.zone_id
name = "${each.key}.contoso.com"
type = "A"
ttl = 60
records = [each.value.public_ip]
}
# Private record emitted for EVERY key; for public-only names it points at the
# public IP so the apex shadow (section 3) never returns NXDOMAIN to internal
# clients. coalesce() fails the plan if a key has neither IP.
resource "aws_route53_record" "private" {
for_each = var.split_records
zone_id = aws_route53_zone.private.zone_id
name = "${each.key}.contoso.com"
type = "A"
ttl = 60
records = [coalesce(each.value.private_ip, each.value.public_ip)]
}
The coalesce is the load-bearing line. A name with no internal-specific IP still gets a private record — pointing at its public IP — so the private zone is a complete superset of the public one and never NXDOMAINs an internal client for a public-only name. Internal-only names simply never appear in the public resource. The horizons cannot drift because there is exactly one place that defines both.
For on-prem BIND, the equivalent discipline is to generate both zone files from one templated data source (Ansible, dnscontrol, or a small render step in CI) rather than editing db.*.internal and db.*.external by hand. dnscontrol is purpose-built for this: it models records once and pushes to multiple providers/zones, with a diff you review before apply.
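A "small render step in CI" can be as simple as the sketch below. It is a hypothetical renderer, not dnscontrol or an Ansible template, and it applies the same semantics as the Terraform map. A `public_ip` of `None` means internal-only. A `private_ip` of `None` means internal clients get the public IP. The `ns1` address is illustrative. Both zone files come from one dict, so they cannot drift.

```python
SPLIT_RECORDS = {
    "api": {"public_ip": "203.0.113.10", "private_ip": "10.20.4.10"},
    "www": {"public_ip": "203.0.113.20", "private_ip": None},   # public-only
    "vault": {"public_ip": None, "private_ip": "10.20.9.5"},    # internal-only
    "ns1": {"public_ip": "203.0.113.53", "private_ip": None},   # illustrative NS address
}

HEADER = """$TTL 300
@ IN SOA ns1.contoso.com. hostmaster.contoso.com. ( {serial} 3600 600 604800 300 )
@ IN NS ns1.contoso.com.
"""


def render(records: dict, serial: int) -> tuple:
    """Return (internal_zone_text, external_zone_text) from one record map."""
    internal, external = [], []
    for name in sorted(records):
        pub, priv = records[name]["public_ip"], records[name]["private_ip"]
        if pub is None and priv is None:
            raise ValueError(f"{name}: no address in either horizon")
        # Internal zone is a superset: every name, private IP preferred.
        internal.append(f"{name} IN A {priv or pub}")
        # External zone never receives internal-only names.
        if pub is not None:
            external.append(f"{name} IN A {pub}")
    head = HEADER.format(serial=serial)
    return head + "\n".join(internal) + "\n", head + "\n".join(external) + "\n"
```

The CI job writes the two strings to `db.contoso.com.internal` and `db.contoso.com.external` and checks the diff. Nobody edits either file by hand.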
7. Client placement and the VPN straddle problem
Split-horizon is only as good as the resolver each client is configured to use. The selection happens at one of two layers:
- Network membership (Pattern A/B): the client is in a VNet/VPC linked to the private zone, so the platform resolver hands it the internal answer automatically. Nothing on the client changes.
- Resolver targeting (Pattern C / on-prem): the client’s configured DNS server, plus that server’s view ACL, decides the horizon. Get a client onto the wrong resolver and it sees the wrong horizon regardless of where it physically sits.
The hard case is VPN clients that straddle both horizons. A laptop on full-tunnel VPN has an internal source IP and should get internal answers: point its DNS at the internal resolver and the view ACL matches its tunnel IP. A split-tunnel VPN client is the genuinely ambiguous one, because some traffic goes through the tunnel and some goes directly to the internet. If it resolves api.contoso.com to the private 10.20.4.10 and its route table sends 10.20.4.10 out the tunnel, all is well. If the OS resolver instead picks the physical adapter's public DNS server, it gets 203.0.113.10 and takes the internet path. Which one happens is decided by the DNS interface metric and DNS suffix scoping, not by your zones.
The clean answer is explicit DNS suffix scoping on the VPN profile: bind contoso.com resolution to the tunnel’s resolver so the OS never races adapters for your domain.
# Windows VPN: force contoso.com to resolve via the tunnel's internal resolver
Add-VpnConnectionTriggerDnsConfiguration -ConnectionName "Corp-VPN" `
-DnsSuffix "contoso.com" `
-DnsIPAddress "10.20.0.2" -PassThru
This makes the straddle deterministic: anything under contoso.com resolves through the tunnel resolver (internal horizon, internal path), everything else uses the local adapter. Decide per-domain which horizon VPN users belong to and encode it in the profile — never leave it to OS adapter-ordering heuristics.
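The suffix binding above can be approximated as a longest-suffix lookup. The sketch below models the intent of per-suffix DNS binding. It is not Windows' actual name-resolution-policy algorithm. The adapter default `192.168.1.1` (a home router) is a hypothetical value, and the rule keys are assumed to be lowercase with no surrounding dots.

```python
def pick_resolver(qname: str, suffix_rules: dict, default: str) -> str:
    """Longest matching DNS-suffix rule wins; anything else uses the adapter default.

    A suffix matches only on a label boundary, so "notcontoso.com" does not
    match a "contoso.com" rule.
    """
    name = qname.lower().rstrip(".")
    matches = [s for s in suffix_rules if name == s or name.endswith("." + s)]
    if not matches:
        return default
    return suffix_rules[max(matches, key=len)]
```

With `{"contoso.com": "10.20.0.2"}`, every name under contoso.com goes to the tunnel resolver regardless of adapter ordering, and everything else goes to the local adapter's server.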
Verify: prove no record leaks across the horizon
Split-horizon is only correct if you test it from every network position. Query each resolver and authoritative server explicitly with dig and assert the expected IP. Do not trust a single nslookup from your laptop.
# 1. From INSIDE the VPC/VNet — expect the PRIVATE IP
# (run on a VM in the linked network)
dig +short api.contoso.com
# expect: 10.20.4.10
# 2. From the PUBLIC internet — expect the PUBLIC IP
# Query a public resolver explicitly to bypass any local override
dig +short api.contoso.com @1.1.1.1
# expect: 203.0.113.10
# 3. Query the on-prem authoritative server AS an internal client
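dig api.contoso.com @ns1.contoso.com -b 10.20.4.99
# expect: ANSWER 10.20.4.10
# -b sets the query's source IP to match the internal view ACL; it only works
# if 10.20.4.99 is an address actually configured on the host running dig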
# 4. LEAK TEST — an internal-only name must NOT exist publicly
dig +short vault.contoso.com @1.1.1.1
# expect: EMPTY / NXDOMAIN (any answer here = internal record leaked)
# 5. APEX-SHADOW TEST — a public-only name must still resolve internally
dig +short www.contoso.com # run from inside the VPC/VNet
# expect: 203.0.113.20 (NXDOMAIN here = private zone shadowed the apex)
Tests 4 and 5 are the ones that catch real bugs. Test 4 fails when an internal record leaked into the public zone — the most dangerous misconfiguration. Test 5 fails when a private apex zone is shadowing public-only names (section 3). Build them into CI as a post-apply assertion so a bad terraform apply is caught before customers are.
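A minimal CI wrapper for these checks might look like the sketch below. It assumes `dig` is on the runner's PATH and that each runner lists only the cases that match its own network position. The case list shown is for a public-internet runner. `evaluate` takes the lookup function as a parameter, so the pass/fail logic can be tested without network access.

```python
import subprocess
from typing import Callable, List, Optional, Tuple

NXDOMAIN = "NXDOMAIN"

# (name, resolver to query or None for the system default, expected answer)
CASES: List[Tuple[str, Optional[str], str]] = [
    ("api.contoso.com", "1.1.1.1", "203.0.113.10"),  # test 2: public horizon
    ("vault.contoso.com", "1.1.1.1", NXDOMAIN),      # test 4: leak test
]


def dig_short(name: str, server: Optional[str]) -> List[str]:
    """Non-empty lines of `dig +short`; an empty list means no answer.

    dig exits 0 on NXDOMAIN; a non-zero exit (e.g. no reply) raises here.
    """
    cmd = ["dig", "+short", name] + ([f"@{server}"] if server else [])
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=15, check=True)
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


def evaluate(cases, lookup: Callable[[str, Optional[str]], List[str]]) -> List[str]:
    """Run every case through `lookup` and return human-readable failures."""
    failures = []
    for name, server, expected in cases:
        answers = lookup(name, server)
        ok = (not answers) if expected == NXDOMAIN else (expected in answers)
        if not ok:
            failures.append(
                f"{name} @{server or 'default'}: got {answers or 'no answer'}, want {expected}"
            )
    return failures


def main() -> int:
    failures = evaluate(CASES, dig_short)
    print("\n".join(failures) if failures else "all horizon checks passed")
    return 1 if failures else 0
```

Call `sys.exit(main())` from the post-apply job so any leak or shadowed name fails the pipeline.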
Validation matrix
| Query position | Resolver targeted | Name | Expected answer |
|---|---|---|---|
| Inside VPC/VNet | Platform resolver (.2 / 168.63.129.16) | api.contoso.com | 10.20.4.10 |
| Public internet | 1.1.1.1 / 8.8.8.8 | api.contoso.com | 203.0.113.10 |
| On-prem, internal src IP | On-prem authoritative | api.contoso.com | 10.20.4.10 |
| Full-tunnel VPN | Tunnel internal resolver | api.contoso.com | 10.20.4.10 |
| Public internet | 1.1.1.1 | vault.contoso.com | NXDOMAIN (leak test) |
| Inside VPC/VNet | Platform resolver | www.contoso.com | 203.0.113.20 (apex-shadow test) |
Enterprise scenario
A payments platform team ran pay.acme.com split-horizon on AWS: a public Route 53 zone pointing at an internet-facing ALB (203.0.113.40), and a private hosted zone of the same name pointing at an internal ALB (10.50.2.40) so that service-to-service calls inside the VPC stayed off the internet for PCI scope reasons. It worked for a year.
Then they onboarded a partner over a VPC peering connection. The partner’s VPC needed to call pay.acme.com and take the internal path. The team associated the private hosted zone with the partner VPC — and the partner immediately started getting 10.50.2.40, which they could not route, because the private hosted zone returned the internal ALB IP but there was no peering route for 10.50.2.0/24 from the partner side. Worse, when the team rolled the internal ALB to a new IP during a maintenance window, they updated only the private record their own automation knew about; a second, hand-created private record for the partner’s edge case still pointed at the old internal ALB. Classic split-brain — two private answers for one name, one of them stale.
The fix had two parts. First, they collapsed to a single source of truth: one Terraform map drove the public record, the internal private record, and the partner association, so the IP existed in exactly one declaration. Second — the real lesson — they stopped trying to serve the partner the raw RFC1918 internal IP. The partner sat in a different routing domain, so an internal A record was meaningless to them. Instead they fronted the internal path with a Route 53 Resolver inbound endpoint in a shared-services VPC that the partner could route to, and had the partner forward pay.acme.com there. The inbound endpoint resolved against the private zone and returned the internal answer — but now the answer was reachable, because the resolver IP lived in a routable shared subnet.
# Single declaration -> public + private, no hand-edited partner record
locals {
pay = { public = "203.0.113.40", private = "10.50.2.40" }
}
resource "aws_route53_record" "pay_public" {
zone_id = aws_route53_zone.public.zone_id
name = "pay.acme.com"
type = "A"
ttl = 30
records = [local.pay.public]
}
resource "aws_route53_record" "pay_private" {
zone_id = aws_route53_zone.private.zone_id
name = "pay.acme.com"
type = "A"
ttl = 30
records = [local.pay.private] # one place; partner forwards to the inbound endpoint
}
The principle that came out of the retro: split-horizon answers must be routable from wherever they are served. Handing a peered or VPN client a private IP it cannot route is not split-horizon working as designed — it is split-brain with extra steps. Serve the internal horizon through a resolver the client can actually reach, and let exactly one IaC declaration own the record.
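The retro's principle can be checked mechanically by comparing what a horizon serves against the routes the client actually holds. The sketch below makes two assumptions. `client_routes` should list only private-network routes (VPC CIDR, peering, VPN, transit), because an internet default route cannot deliver RFC1918 traffic. The partner CIDR `10.60.0.0/16` is a hypothetical value.

```python
import ipaddress


def unroutable(answers: list, client_routes: list) -> list:
    """Return the served answers that fall outside every route the client holds."""
    networks = [ipaddress.ip_network(r) for r in client_routes]
    return [a for a in answers
            if not any(ipaddress.ip_address(a) in net for net in networks)]
```

In the scenario, `unroutable(["10.50.2.40"], ["10.60.0.0/16"])` returns the internal ALB IP. That is the split-brain-with-extra-steps signal: the answer is served to a client that has no path to it.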