DNSSEC End to End: Signing Public Zones and Enforcing Validation on Hybrid Resolvers

DNSSEC is the only widely deployed mechanism that lets a resolver prove a DNS answer came from the zone’s owner and was not tampered with in flight. Without it, a recursive resolver takes whatever the network hands it: an on-path attacker, a poisoned cache, or a misconfigured forwarder can substitute an A record, and your TLS client connects to the wrong IP before the certificate is ever checked. Signing covers only half the problem; a signed zone that nobody validates is decoration. This guide does both halves: build the key hierarchy and signatures on the authority side, publish the chain of trust up to the registrar, then enable validating resolution on cloud and on-prem resolvers so forged answers are dropped before an app sees them.

The hard parts of DNSSEC are not the signing commands. They are the rollovers and the failure modes. A lapsed signature or a stale DS record does not degrade gracefully; it returns SERVFAIL to every validating client and blackholes the zone. So we spend real time on rollover choreography and failure drills.

1. How a resolver validates: the chain of trust

Validation walks a chain of cryptographic delegations from the root down to the record you asked for. Four record types carry it:

Record	Lives in	Role
`DNSKEY`	The zone	Public keys. A ZSK (Zone Signing Key) and a KSK (Key Signing Key).
`RRSIG`	The zone	A signature over an RRset, made with a private key. Carries inception/expiration.
`DS`	The parent zone	A hash of the child’s KSK DNSKEY. This is the link across the delegation.
`NSEC` / `NSEC3`	The zone	Authenticated denial of existence - proves a name does not exist.

The walk for www.example.com:

.            DNSKEY (trust anchor, baked into the resolver)
  -> DS for com   (signed by root's ZSK, verified by root RRSIG)
com          DNSKEY
  -> DS for example.com   (signed by com's ZSK)
example.com  DNSKEY (KSK signs the DNSKEY set; ZSK signs everything else)
  -> RRSIG over the www A record   (made by the ZSK)

Each step verifies the next link’s signature using a key whose hash the parent vouched for. The root key is the anchor every validating resolver ships with. Break any link - a DS that does not match the published KSK, an RRSIG past its expiration - and the resolver returns SERVFAIL, not a forged answer and not the real one. That fail-closed behavior is the entire security value, and also the operational risk.

NSEC and NSEC3 handle the “this name does not exist” case. You cannot sign a record that is not there, so the zone proves a gap. NSEC lists the next existing name in canonical order, which lets anyone walk the zone to enumerate every name; NSEC3 hashes names so the zone is not trivially enumerable.

2. Sign a public zone: KSK vs ZSK, algorithm, NSEC3

Split your keys. The KSK signs only the DNSKEY RRset and is the key the parent’s DS points at. The ZSK signs every other RRset. The split exists so you can roll the workhorse ZSK frequently without touching the registrar, and roll the KSK rarely - the KSK roll being the one that requires a parent DS update.

Pick the algorithm deliberately. ECDSA P-256 (algorithm 13) is the modern default: 256-bit keys and far smaller signatures than RSA, which keeps responses under the UDP fragmentation pain point and shrinks amplification. RSASHA256 (algorithm 8) is the conservative interoperable fallback; use it only if a resolver in your path genuinely lacks ECDSA support (rare in 2026). Do not use algorithm 7 (NSEC3RSASHA1) - SHA-1 is deprecated for DNSSEC.

If you run a hosted zone on Route 53 or Azure DNS, the provider manages keys and signing for you - you enable DNSSEC and it generates a KSK (and handles ZSK internally), so the manual dnssec-keygen flow below is for self-hosted BIND. Route 53 uses ECDSA P-256; you still own the DS publication step at the registrar either way.

For a self-hosted BIND zone, generate the keys with ECDSA P-256:

# ZSK (zone signing key) - ZONE flag, signs the data
dnssec-keygen -a ECDSAP256SHA256 -n ZONE example.com

# KSK (key signing key) - KSK flag, signs the DNSKEY set, parent points here
dnssec-keygen -a ECDSAP256SHA256 -f KSK -n ZONE example.com

Modern BIND (9.16+) makes manual signing largely obsolete with dnssec-policy, which automates signing and rollovers inside named. The recommended path is to let BIND maintain it:

# named.conf
dnssec-policy "ecdsa-default" {
    keys {
        ksk lifetime P365D algorithm ecdsap256sha256;
        zsk lifetime P90D  algorithm ecdsap256sha256;
    };
    nsec3param iterations 0 optout no salt-length 0;
};

zone "example.com" {
    type primary;
    file "/var/named/example.com.zone";
    dnssec-policy "ecdsa-default";
    inline-signing yes;
};

A few non-obvious choices in that policy:

iterations 0 for NSEC3. RFC 9276 is explicit: zero extra iterations. More iterations add resolver CPU cost for negligible security benefit and invite DoS. Ignore the old “10 iterations” advice.
salt-length 0. The NSEC3 salt provides no meaningful protection against precomputation for a public zone and complicates rollovers. Empty salt is the current guidance.
optout no. Opt-out lets you skip signing insecure delegations - only useful at TLD scale with many unsigned children. For a normal zone it weakens denial-of-existence proofs; leave it off.

If you must sign offline/manually instead, the signer is dnssec-signzone:

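dnssec-signzone -3 - -H 0 -N INCREMENT -o example.com -t example.com.zone
# -3 - : NSEC3 with empty salt;  -H 0 : zero extra iterations;  -N INCREMENT : bump the SOA serial
# No -A here: that flag sets NSEC3 opt-out, which the policy above leaves off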

3. Publish the DS record and prove the chain

After the zone is signed, the parent (your registrar / TLD) must publish a DS record that hashes your KSK. Until that DS exists and matches, validating resolvers treat the zone as unsigned (insecure), not broken - so signing without publishing the DS gives you zero protection. Generate the DS from the KSK:

# From the active KSK's public key file, emit the DS (SHA-256, digest type 2)
dnssec-dsfromkey -2 Kexample.com.+013+12345.key

That prints a line like:

example.com. IN DS 12345 13 2 49FD9... (hex digest)
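To make the fields in that line less opaque, here is a minimal Python sketch of what goes into it, assuming the RFC 4034 definitions: the key tag is a checksum over the DNSKEY RDATA, and a digest-type-2 DS is SHA-256 over the owner name in wire form followed by that RDATA. The function names are illustrative, and the key material is a zero-filled placeholder, not a real key. Use dnssec-dsfromkey for anything you actually publish.

```python
import hashlib
import struct


def name_to_wire(name: str) -> bytes:
    """Canonical (lowercase) DNS wire form: length-prefixed labels plus a root byte."""
    wire = b""
    for label in name.lower().rstrip(".").split("."):
        if label:
            raw = label.encode("ascii")
            wire += bytes([len(raw)]) + raw
    return wire + b"\x00"


def dnskey_rdata(flags: int, protocol: int, algorithm: int, public_key: bytes) -> bytes:
    """DNSKEY RDATA: 16-bit flags, 8-bit protocol, 8-bit algorithm, then the key."""
    return struct.pack("!HBB", flags, protocol, algorithm) + public_key


def key_tag(rdata: bytes) -> int:
    """RFC 4034 Appendix B checksum over the DNSKEY RDATA."""
    acc = 0
    for i, byte in enumerate(rdata):
        acc += byte if i & 1 else byte << 8
    acc += (acc >> 16) & 0xFFFF
    return acc & 0xFFFF


def ds_line(owner: str, rdata: bytes) -> str:
    """Build a digest-type-2 (SHA-256) DS line for the given DNSKEY RDATA."""
    digest = hashlib.sha256(name_to_wire(owner) + rdata).hexdigest().upper()
    algorithm = rdata[3]  # fourth RDATA byte is the algorithm number
    return f"{owner} IN DS {key_tag(rdata)} {algorithm} 2 {digest}"


# Placeholder key material (64 zero bytes), NOT a real ECDSA public key.
placeholder_ksk = dnskey_rdata(flags=257, protocol=3, algorithm=13, public_key=bytes(64))
print(ds_line("example.com.", placeholder_ksk))
```

The point of the sketch is the dependency: the digest covers both the owner name and the exact key bytes, so any change to the KSK produces a different DS, and the parent's copy stops matching.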

Hand that DS to the registrar. Most have a “DNSSEC / DS records” panel; some accept the full DNSKEY and compute the DS themselves. Critical: publish the DS only after the matching DNSKEY is live and propagated, and never remove a DS while clients may still be validating against that key. A DS that points at a key the zone no longer serves equals instant SERVFAIL.

Verify the live chain top to bottom with delv (the validating lookup tool that ships with BIND). Unlike dig +dnssec, delv actually performs validation against the root anchor and tells you the verdict:

delv @8.8.8.8 www.example.com A +rtrace
# Look for the verdict line:
#   ; fully validated

; fully validated means the whole chain checked out. ; unsigned answer means no secure delegation exists (DS missing or insecure). Anything else is a break to chase. To inspect the zone’s own consistency before it goes live, use dnssec-verify:

dnssec-verify -o example.com example.com.zone.signed

It confirms every RRset has a valid RRSIG, the NSEC3 chain is complete, and signatures are within their validity window.

4. Automate rollovers without a validation gap

Keys must rotate. The danger is a window where a resolver has cached old material but the zone now serves only new - a self-inflicted SERVFAIL. The two roll types use different choreography.

ZSK rollover - pre-publish. The ZSK is not referenced by the parent DS, so you never touch the registrar:

Publish the new ZSK in the DNSKEY set alongside the old one. Sign nothing with it yet.
Wait at least the DNSKEY TTL so every resolver has both keys cached.
Switch signing to the new ZSK (re-sign the zone). Resolvers already hold both keys, so new signatures validate against the new key while signatures still sitting in caches validate against the old one.
After the largest TTL in the zone has passed, so every old-key signature has aged out of caches, remove the old ZSK from the DNSKEY set.
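The waits in those steps are plain arithmetic on TTLs. A minimal sketch, assuming the wait after the switch is bounded by the largest TTL in the zone (the longest an old signature can sit in a cache) and adding a propagation allowance for secondaries to pick up each change. The function name, the parameter names, and the one-hour default are illustrative, not BIND settings.

```python
from datetime import datetime, timedelta, timezone


def zsk_prepublish_schedule(
    publish_at: datetime,
    dnskey_ttl: timedelta,
    max_zone_ttl: timedelta,
    propagation: timedelta = timedelta(hours=1),  # assumed allowance for secondaries
) -> dict:
    """Earliest safe time for each step of a pre-publish ZSK roll."""
    # Step 2: every cache that held the old DNSKEY set has refetched it.
    sign_at = publish_at + propagation + dnskey_ttl
    # Step 4: every signature made by the old ZSK has expired from caches.
    remove_at = sign_at + propagation + max_zone_ttl
    return {
        "publish_new_zsk": publish_at,
        "sign_with_new_zsk": sign_at,
        "remove_old_zsk": remove_at,
    }


start = datetime(2026, 1, 1, 0, 0, tzinfo=timezone.utc)
schedule = zsk_prepublish_schedule(start, timedelta(hours=1), timedelta(days=1))
for step, when in schedule.items():
    print(f"{step:20s} {when.isoformat()}")
```

With a one-hour DNSKEY TTL and a one-day maximum zone TTL, the whole roll takes a little over a day. Long TTLs stretch it proportionally, which is worth knowing before an emergency roll.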

KSK rollover - double-KSK plus double-DS (RFC 6781 calls this combined form Double-RRset). The KSK is what the DS points at, so the parent must learn the new key before you retire the old one:

Generate the new KSK and add its DNSKEY to the zone. Both KSKs now sign the DNSKEY set.
Publish the new KSK’s DS at the registrar so the parent now has two DS records. Either chain validates.
Wait for the old DS’s TTL to expire from caches so no resolver is pinned to only the old DS.
Remove the old DS at the registrar, then remove the old KSK from the zone.
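Why that order is safe can be checked with a toy model. The sketch below is a simplification under stated assumptions: keys are just labels, "validates" means some DS at the parent matches some KSK in the zone, and each step waits long enough that a resolver holds at worst the previous version of either RRset. It models ordering only, not cryptography.

```python
from itertools import product

OLD, NEW = "old-ksk", "new-ksk"


def validates(ds_set: frozenset, ksk_set: frozenset) -> bool:
    """The delegation is secure if some DS at the parent matches some KSK in the zone."""
    return bool(ds_set & ksk_set)


def rollover_is_safe(steps: list) -> bool:
    """Each step is (DS set at parent, KSK set in zone).

    During a transition a resolver may hold the previous or the current
    version of either RRset in cache, so every mix must still validate.
    """
    for (prev_ds, prev_ksk), (cur_ds, cur_ksk) in zip(steps, steps[1:]):
        for ds_set, ksk_set in product((prev_ds, cur_ds), (prev_ksk, cur_ksk)):
            if not validates(ds_set, ksk_set):
                return False
    return True


# The choreography from the steps above.
staged = [
    (frozenset({OLD}), frozenset({OLD})),            # before the roll
    (frozenset({OLD}), frozenset({OLD, NEW})),       # new KSK added to the zone
    (frozenset({OLD, NEW}), frozenset({OLD, NEW})),  # new DS published at the parent
    (frozenset({NEW}), frozenset({OLD, NEW})),       # old DS removed
    (frozenset({NEW}), frozenset({NEW})),            # old KSK removed
]

# A naive roll that swaps key and DS in one move.
naive = [
    (frozenset({OLD}), frozenset({OLD})),
    (frozenset({NEW}), frozenset({NEW})),
]

print("staged roll safe:", rollover_is_safe(staged))
print("naive swap safe: ", rollover_is_safe(naive))
```

The naive swap fails because a resolver holding the cached old DS meets a DNSKEY set containing only the new key, which is exactly the stale-DS outage described in step 3.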

With dnssec-policy, named runs both rolls automatically from the lifetime values - but it cannot publish the KSK’s DS at the registrar. That handoff is the one manual step (or an API call). BIND signals readiness; you watch key state and act on the parent:

# Show every key's rollover state and the timing of each phase
rndc dnssec -status example.com

# When BIND reports the new KSK needs its DS submitted, emit the new DS:
dnssec-dsfromkey -2 /var/named/keys/Kexample.com.+013+54321.key
# ...publish it at the registrar, then tell BIND the parent has it:
rndc dnssec -checkds -key 54321 published example.com
# and once the old DS is gone from the parent:
rndc dnssec -checkds -key 12345 withdrawn example.com

The single most common DNSSEC outage is an automated CDS/CDNSKEY pipeline (or a registrar’s auto-DNSSEC) rotating a key while the registrar’s DS update lags or silently fails. Treat the DS-at-parent step as a monitored, alerting workflow - not fire-and-forget. RFC 7344 CDS/CDNSKEY records let the parent poll for changes, but only some registrars honor them; verify yours does before relying on it.

5. Enable validating resolution: Route 53, Azure, on-prem

Signing protects clients of other resolvers. To protect your apps, your resolvers must validate. This is the half most teams skip.

Amazon Route 53 Resolver. Validation is configured at the resolver level via a Resolver DNSSEC validation config, attached to a VPC. With the AWS CLI:

aws route53resolver update-resolver-dnssec-config \
  --resource-id vpc-0abc123 \
  --validation ENABLE

With this enabled, the Route 53 Resolver validates DNSSEC for queries from that VPC and returns SERVFAIL for responses that fail validation. (Signing a Route 53 hosted zone is a separate action - aws route53 enable-hosted-zone-dnssec plus a KMS-backed KSK - covered in step 2.)

Azure DNS Private Resolver. It sits in front of Azure-provided DNS, and there is no per-zone validation toggle to set; any validation applies on its recursive path to the forwarded/public names it resolves. Treat that as something to prove rather than assume, and check current Azure documentation for your resolver path. To sign an Azure Public DNS zone, enable DNSSEC on the zone (a managed capability) and publish the DS at your registrar as in step 3. Confirm validation from a workload behind the inbound endpoint:

# From a VM whose VNet uses the Private Resolver inbound endpoint:
dig +dnssec @<inbound-endpoint-ip> www.cloudflare.com A
# AD flag set in the header == the resolver validated the answer

The AD (Authenticated Data) flag in the response header is the wire-level proof of validation. No AD on a signed name means your resolver is not validating, even if upstream is signed.
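For a probe that does not depend on parsing dig output, the same check can be done on the wire. A minimal sketch, assuming standard DNS header layout (RFC 1035) and the EDNS DO bit (RFC 3225, RFC 6891): it builds a query that asks for DNSSEC records and reads the AD bit and RCODE from a reply header. The function names and the 1232-byte buffer size are illustrative choices.

```python
import struct

AD_BIT = 0x0020      # Authenticated Data bit in the header flags word
RCODE_MASK = 0x000F  # low four bits of the flags word


def build_query(name: str, qtype: int = 1, query_id: int = 0x1234) -> bytes:
    """DNS query with RD set and an EDNS OPT record carrying the DO bit."""
    # Header: id, flags (RD), 1 question, 0 answers, 0 authority, 1 additional (OPT).
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 1)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", qtype, 1)  # class IN
    # OPT RR: root name, type 41, class = UDP payload size, TTL with DO bit, no RDATA.
    opt = b"\x00" + struct.pack("!HHIH", 41, 1232, 0x00008000, 0)
    return header + question + opt


def parse_flags(response: bytes) -> dict:
    """Extract the AD bit and RCODE from the first four bytes of a DNS reply."""
    _query_id, flags = struct.unpack("!HH", response[:4])
    return {"ad": bool(flags & AD_BIT), "rcode": flags & RCODE_MASK}


# Hand-built reply headers for illustration: QR+RD+RA+AD with NOERROR, then SERVFAIL.
validated = struct.pack("!HHHHHH", 0x1234, 0x81A0, 1, 1, 0, 1)
servfail = struct.pack("!HHHHHH", 0x1234, 0x8182, 1, 0, 0, 1)
print(parse_flags(validated))
print(parse_flags(servfail))
```

To run it against a real resolver, send build_query("www.cloudflare.com") over a UDP socket to port 53 of the resolver and pass the reply to parse_flags. RCODE 2 is SERVFAIL.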

On-prem BIND. Validation is on by default in modern BIND, but make it explicit:

# named.conf - recursive resolver
options {
    recursion yes;
    dnssec-validation auto;   # built-in, RFC 5011-maintained root trust anchor
};

dnssec-validation auto tracks root KSK rollovers automatically. Avoid dnssec-validation yes with a hand-pasted static anchor unless you have a process to update it - a stale root anchor is a resolver-wide outage waiting for the next root roll, because every signed zone fails validation at once.

On-prem Windows Server DNS. Install the root trust anchor so the service can validate the chain, then require DNSSEC per namespace via a Name Resolution Policy Table (NRPT) rule:

# Install the root zone trust anchor (root KSK) on the DNS server:
Add-DnsServerTrustAnchor -Root

The validation requirement is pushed as a Group Policy NRPT rule (“Require DNSSEC validation in name and address data”) for the namespaces you care about; the trust anchor above is what lets the DNS service actually verify the signatures.

6. Negative caching, NSEC3 walking, and serve-stale

Three operational behaviors interact with DNSSEC in ways that bite later.

Negative-answer caching. Signed “does not exist” (NSEC3) proofs are cached like positive answers, bounded by the SOA minimum TTL (the last SOA field, RFC 2308). Too high and a newly added name is invisible for hours; too low and you lose the caching benefit. A 300-900s minimum is a sane public-zone default.
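The bound itself is simple to compute. A small sketch, assuming the RFC 2308 rule that a negative answer is cached for the lesser of the SOA record's own TTL and its minimum field; the optional resolver cap stands in for a resolver-side limit such as BIND's max-ncache-ttl, and the function name is illustrative.

```python
from typing import Optional


def negative_ttl(soa_ttl: int, soa_minimum: int, resolver_cap: Optional[int] = None) -> int:
    """Seconds a resolver may cache a 'does not exist' answer for this zone."""
    ttl = min(soa_ttl, soa_minimum)
    if resolver_cap is not None:
        ttl = min(ttl, resolver_cap)  # resolver-side ceiling, if one is configured
    return ttl


# A name added right after a miss stays invisible for up to this long.
print(negative_ttl(soa_ttl=3600, soa_minimum=900))
print(negative_ttl(soa_ttl=86400, soa_minimum=86400, resolver_cap=10800))
```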

NSEC3 walking. NSEC3 hashes names, but offline dictionary attacks (nsec3walker/nsec3map-class tools) still recover many names. It raises the cost of enumeration; it does not make a zone secret. Do not put confidential names in a public signed zone assuming NSEC3 hides them.
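The reason the dictionary attack works is that the hash is deterministic and public. A minimal sketch of the RFC 5155 hash (SHA-1 over the wire-format name plus salt, base32hex encoded) and an offline guess-and-compare loop, assuming the empty-salt, zero-iteration parameters from section 2. The candidate names and the function names are made up for illustration; this is not how any particular tool is implemented.

```python
import base64
import hashlib

# Map the standard base32 alphabet onto base32hex, which NSEC3 owner names use.
_TO_B32HEX = str.maketrans(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567", "0123456789ABCDEFGHIJKLMNOPQRSTUV"
)


def nsec3_hash(name: str, salt: bytes = b"", iterations: int = 0) -> str:
    """RFC 5155 hashed owner label for a domain name."""
    wire = b""
    for label in name.lower().rstrip(".").split("."):
        if label:
            wire += bytes([len(label)]) + label.encode("ascii")
    wire += b"\x00"
    digest = hashlib.sha1(wire + salt).digest()
    for _ in range(iterations):  # extra rounds; zero under current guidance
        digest = hashlib.sha1(digest + salt).digest()
    return base64.b32encode(digest).decode("ascii").translate(_TO_B32HEX).lower()


def dictionary_attack(observed_hashes: set, zone: str, wordlist: list) -> dict:
    """Return {hash: name} for every guessed name whose hash appears in the zone."""
    found = {}
    for word in wordlist:
        candidate = f"{word}.{zone}"
        hashed = nsec3_hash(candidate)
        if hashed in observed_hashes:
            found[hashed] = candidate
    return found


# Hashes an attacker would collect from NSEC3 records in negative answers.
observed = {nsec3_hash("vpn.example.com"), nsec3_hash("x7-internal-build.example.com")}
print(dictionary_attack(observed, "example.com", ["www", "vpn", "mail"]))
```

The guessable name falls immediately; the unguessable one survives only because it was not in the wordlist, which is a property of the name, not of NSEC3.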

Serve-stale. Returning expired cached data when the authority is unreachable can also mean serving an expired RRSIG, which a strict validator rejects. Keep the stale window short and lean on signature-expiry monitoring (step 8) rather than stale data to mask a signing lapse.

7. Failure drills

DNSSEC fails closed. Rehearse the three failures so the on-call recognizes them instantly.

Broken DS (parent points at the wrong key). The most common outage. The zone is signed and internally valid, but the registrar’s DS hashes a KSK the zone no longer serves. Every validating resolver SERVFAILs; non-validating ones are fine - hence the classic “works from my phone, not from the office.”

# Validators fail:
delv @1.1.1.1 www.example.com A      # -> resolution failed: SERVFAIL / no valid signature
# The actual answer is fine to a non-validator:
dig +cd @1.1.1.1 www.example.com A   # +cd = checking disabled -> returns the record

+cd returning the record while normal lookups SERVFAIL is the fingerprint of a DNSSEC break, not a server-down event.
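That fingerprint is mechanical enough to encode in a runbook script. A minimal sketch, assuming you already have the response codes from a normal lookup and from a +cd lookup of the same name; the function name and the wording of the verdicts are illustrative.

```python
def diagnose(rcode_normal: str, rcode_cd: str) -> str:
    """Classify a failure from the rcodes of a normal and a checking-disabled lookup."""
    normal, cd = rcode_normal.upper(), rcode_cd.upper()
    if normal == "NOERROR":
        return "resolving normally"
    if normal == "SERVFAIL" and cd == "NOERROR":
        # Data is reachable; only the validator rejects it.
        return "DNSSEC validation failure: check DS at parent and RRSIG expiry"
    if normal == "SERVFAIL" and cd == "SERVFAIL":
        # Fails even with validation off, so it is not a signature problem.
        return "authority or network failure, not DNSSEC"
    if normal == "NXDOMAIN":
        return "name does not exist"
    return f"unclassified: normal={normal}, cd={cd}"


print(diagnose("SERVFAIL", "NOERROR"))
print(diagnose("SERVFAIL", "SERVFAIL"))
```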

Expired RRSIG. If re-signing stalls (a crashed cron, a dnssec-policy that lost access to its keys) the RRSIGs lapse and every validator SERVFAILs at expiry - a delayed-action outage that fires when the last signature ages out, often hours after the real failure.

# Show the RRSIG expiration so you can see how close you are:
dig +dnssec www.example.com A | grep RRSIG
# RRSIG RDATA fields 5/6 (after type covered, algorithm, labels, original TTL)
# are expiration/inception as YYYYMMDDHHMMSS (UTC)
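Reading those timestamps by eye gets old quickly. A minimal sketch that pulls the expiration and inception out of one RRSIG line in dig's presentation format and reports the time remaining; the sample line is fabricated (placeholder signature, made-up dates), and the function name is illustrative.

```python
from datetime import datetime, timedelta, timezone

_STAMP = "%Y%m%d%H%M%S"  # RRSIG timestamp presentation format, always UTC


def rrsig_window(line: str) -> tuple:
    """Return (inception, expiration) as aware datetimes from one RRSIG line."""
    fields = line.split()
    i = fields.index("RRSIG")
    # After RRSIG: type covered, algorithm, labels, original TTL, expiration, inception.
    expiration = datetime.strptime(fields[i + 5], _STAMP).replace(tzinfo=timezone.utc)
    inception = datetime.strptime(fields[i + 6], _STAMP).replace(tzinfo=timezone.utc)
    return inception, expiration


def time_to_expiry(line: str, now: datetime) -> timedelta:
    """How long until the signature on this line stops validating."""
    _inception, expiration = rrsig_window(line)
    return expiration - now


# Fabricated sample line; the trailing base64 is a placeholder, not a signature.
sample = (
    "www.example.com. 300 IN RRSIG A 13 3 300 "
    "20260301000000 20260215000000 12345 example.com. c2lnbmF0dXJl"
)
now = datetime(2026, 2, 26, 0, 0, tzinfo=timezone.utc)
print("time left:", time_to_expiry(sample, now))
```

In real use, pass datetime.now(timezone.utc) as now and take the minimum across every RRSIG in the zone, since the outage starts when the first one lapses.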

SERVFAIL as the client sees it. Applications get no “DNSSEC error” - just a generic resolution failure: getaddrinfo returns EAI_AGAIN/EAI_FAIL, browsers show DNS_PROBE_FINISHED_NXDOMAIN-style errors, libraries time out. The cause is invisible above the resolver, which is why the +cd test is what separates a validation failure from an ordinary outage in seconds.

Verify

Run these against a freshly signed zone before you call it done:

# 1. Zone signs cleanly and the NSEC3 chain is complete:
dnssec-verify -o example.com example.com.zone.signed

# 2. The full chain validates from the root anchor:
delv www.example.com A +rtrace        # expect: ; fully validated

# 3. The parent DS matches the live KSK (no stale DS):
dig +dnssec example.com DS @<parent-ns>          # DS at parent
dig +noall +answer example.com DNSKEY | dnssec-dsfromkey -2 -f - example.com   # DS for each live KSK; digests must match

# 4. Your own resolvers actually validate (AD flag present):
dig +dnssec www.cloudflare.com A | grep -E 'flags:.* ad'   # 'ad' in flags == validated

# 5. A known-broken signed test domain SERVFAILs through your resolver (proves validation is on, not bypassed):
dig dnssec-failed.org A | grep status   # expect: status: SERVFAIL (validation working)

Item 5 is the one teams forget: a resolver that is not validating will happily return the deliberately broken dnssec-failed.org record. Use dig here, not delv: delv validates locally, so it fails on this name no matter what your resolver does. If the dig lookup succeeds, your validation is off.

8. Monitor expirations and key state

The worst failure mode is silent: a signature lapse or DS mismatch nobody catches until clients SERVFAIL. Alert on time-to-expiry, not on the outage. Export RRSIG expiry to Prometheus from a DNSSEC-aware exporter or a small custom probe (the metric name dnssec_zone_rrsig_expiry_timestamp_seconds used here is illustrative; substitute whatever your exporter exposes) and alert when the soonest expiry is inside a comfortable re-sign margin:

# Prometheus alert: fire well before any signature actually expires
groups:
- name: dnssec
  rules:
  - alert: DnssecRrsigExpiringSoon
    # alert when the earliest RRSIG in the zone expires within 3 days
    expr: (min by (zone) (dnssec_zone_rrsig_expiry_timestamp_seconds) - time()) < 259200
    for: 15m
    labels: { severity: critical }
    annotations:
      summary: "RRSIG for {{ $labels.zone }} expires in under 3 days"

If your validating resolvers are in Azure and stream query logs to Log Analytics, watch SERVFAIL rate - a validation failure shows up as a spike for an otherwise healthy name:

// Azure DNS Private Resolver query logs: SERVFAIL rate by queried name
DnsResolverQueryLogs
| where TimeGenerated > ago(1h)
| where ResponseCode == "SERVFAIL"
| summarize servfail = count() by QueryName = tostring(QueryName), bin(TimeGenerated, 5m)
| where servfail > 10
| order by TimeGenerated desc

And track DS-at-parent agreement on a schedule: compute the DS from your live KSK and compare it to what the parent publishes. Any drift is a pending outage.
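A scheduled comparison can be a few lines. A minimal sketch, assuming you feed it DS lines from the parent (for example from dig) and DS lines computed from the live DNSKEY set (for example from dnssec-dsfromkey); the verdict labels and function names are illustrative, and the digests in the demo are fabricated.

```python
def parse_ds(line: str) -> tuple:
    """Parse '<owner> [ttl] IN DS <tag> <alg> <digest-type> <digest...>' into a tuple."""
    fields = line.split()
    i = fields.index("DS")
    tag, alg, digest_type = (int(x) for x in fields[i + 1:i + 4])
    digest = "".join(fields[i + 4:]).upper()  # dig may split the digest with spaces
    return (tag, alg, digest_type, digest)


def ds_drift(parent_lines: list, computed_lines: list) -> dict:
    """Compare the DS set at the parent with the DS set derived from live KSKs."""
    parent = {parse_ds(line) for line in parent_lines}
    computed = {parse_ds(line) for line in computed_lines}
    matched = parent & computed
    if not parent:
        verdict = "insecure"  # no DS at parent: zone is treated as unsigned
    elif not matched:
        verdict = "broken"    # parent vouches only for keys the zone does not serve
    elif parent == computed:
        verdict = "ok"
    else:
        verdict = "drift"     # validates now, but a roll is in flight or needs cleanup
    return {
        "verdict": verdict,
        "stale_at_parent": sorted(parent - computed),
        "missing_at_parent": sorted(computed - parent),
    }


# Fabricated digests for illustration only.
live = ["example.com. IN DS 54321 13 2 " + "BB" * 32]
stale_parent = ["example.com. 3600 IN DS 12345 13 2 " + "AA" * 32]
print(ds_drift(stale_parent, live)["verdict"])
```

Alert on "broken" immediately and on "drift" that persists longer than a rollover should take; a brief "drift" during a KSK roll is expected.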

Enterprise scenario

A platform team running a public .com zone on self-hosted BIND for a regulated SaaS turned on DNSSEC and validation across their estate. Months later, every internal service that resolved through the on-prem corporate resolvers went dark for one of their own signed subdomains - but only internally; external customers were unaffected. The constraint that caused it: their corporate resolvers were configured with conditional forwarders that pointed at an internal DNS appliance which stripped DNSSEC records (RRSIG/DNSKEY) from responses to “simplify” answers. To the validating BIND resolvers downstream, a signed zone arriving with no RRSIGs is indistinguishable from tampering, so they SERVFAILed - fail-closed, exactly as designed. Customers used public validating resolvers (which got intact records) and saw nothing.

The root cause was a middlebox that did not preserve the DNSSEC RRsets or handle the DO bit / large EDNS correctly. The fix had two parts: make the forwarding path DNSSEC-transparent, and validate in one place rather than at every hop. They pointed the corporate resolvers straight at a DNSSEC-preserving upstream:

# Corporate BIND resolver: forward to a DNSSEC-transparent upstream, keep validating locally
options {
    dnssec-validation auto;          # validate here, once
};
zone "internal.example.com" {
    type forward;
    forward only;
    forwarders { 10.20.0.53; };      # upstream that preserves RRSIG/DNSKEY (DO bit honored)
};

The durable lesson: anything on the resolution path that does not preserve DNSSEC records breaks validation downstream. Old DNS proxies, some load balancers’ DNS modules, and “DNS firewall” appliances that filter record types are the usual suspects. Validate in as few places as possible, and make every forwarder in front of those validators DNSSEC-transparent with a large enough EDNS buffer to carry the signatures.

Checklist

DNSSEC pays off only when both ends are honest: a signed zone and a resolver that refuses unsigned-or-tampered answers. The commands are the easy part; the rest is rollover discipline and monitoring so a key never lapses silently - because when it does, DNSSEC does exactly what you told it to and takes the whole zone down. Build the chain, prove it with delv and the AD flag, rehearse the SERVFAIL drills, and alert on expiry long before the cliff.


Written by Vinod
