Accurate Hybrid Time Sync: chrony on Linux and w32time in Active Directory

Time is the one dependency every distributed system shares and almost nobody owns. It is invisible until a domain controller drifts four minutes, Kerberos starts rejecting tickets with KRB_AP_ERR_SKEW, and half your help desk tickets become “I can’t log in” with no obvious cause. The clock is infrastructure, and in a hybrid Windows/Linux estate it has two stacks that disagree about almost everything except the wire protocol: chrony on Linux and w32time in Active Directory.

This is the playbook I use to build one authoritative time hierarchy across both. The design rule is simple and non-negotiable: there is exactly one root, every node has a defined parent, and nothing free-runs. Everything below is real, current configuration for chrony 4.x and Windows Server 2019/2022/2025. Commands that change state are marked; everything else is read-only.

1. Why time accuracy actually matters

Three systems break in three different ways when clocks diverge, and knowing which failure mode you are looking at saves hours.

Kerberos. The KDC stamps tickets and pre-auth data with wall-clock time. The default tolerance, set by the Maximum tolerance for computer clock synchronization policy (MaxClockSkew), is 5 minutes. Cross that and authentication fails outright. The error is KRB_AP_ERR_SKEW / “Clock skew too great”, and it is binary - 4m59s works, 5m01s does not.

Logs and forensics. Correlating a SIEM timeline across a Linux web tier and a Windows app tier requires their timestamps to mean the same instant. A 30-second skew turns “the breach started here” into a guess, and it quietly breaks ordering in anything that merges event streams.

TLS and tokens. Certificate notBefore/notAfter, JWT exp/nbf, and OAuth flows are all wall-clock assertions. A client whose clock is far ahead will reject a freshly issued certificate as “not yet valid”; one far behind accepts an expired token.

Kerberos cares about relative skew between two hosts, not absolute accuracy. Two DCs that are both 3 minutes fast still authenticate each other fine - until one of them syncs to truth and the gap opens. That is why a single shared root matters more than any single host being “correct.”
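To make the relative-skew point concrete, here is a minimal Python sketch. The helper name within_kerberos_tolerance and the sample clock readings are illustrative assumptions; only the 5-minute default comes from the text above, and the comparison is a simplified model of what a KDC does.

```python
from datetime import datetime, timedelta, timezone

# Default Kerberos tolerance described above (5 minutes).
MAX_CLOCK_SKEW = timedelta(minutes=5)


def within_kerberos_tolerance(clock_a: datetime, clock_b: datetime,
                              tolerance: timedelta = MAX_CLOCK_SKEW) -> bool:
    """Hypothetical helper: True if two hosts' clocks differ by <= tolerance.

    Only the difference between the two clocks matters, not how far
    either one is from true time.
    """
    return abs(clock_a - clock_b) <= tolerance


true_time = datetime(2025, 1, 1, 12, 0, 0, tzinfo=timezone.utc)

# Both DCs are 3 minutes fast: relative skew is zero, so they authenticate.
dc1 = true_time + timedelta(minutes=3)
dc2 = true_time + timedelta(minutes=3)
print(within_kerberos_tolerance(dc1, dc2))                # True

# One host is 5m01s away from a peer that tracks true time: rejected.
dc1_drifted = true_time + timedelta(minutes=5, seconds=1)
print(within_kerberos_tolerance(dc1_drifted, true_time))  # False
```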

2. Design the stratum hierarchy first

NTP is a tree. Stratum 0 is the reference (an atomic clock or GNSS receiver); stratum 1 is a server directly attached to one; each hop down adds a stratum number. Lower is better, but only as a measure of distance from the reference, not of accuracy you can necessarily trust.

A clean hybrid design looks like this:

              [ External: 2-4 sources, e.g. pool / GNSS / GPS appliance ]
                                   |
        +--------------------------+--------------------------+
        |                                                     |
  [ Linux NTP servers ]  <----- peer ----->  [ AD PDC emulator (forest root) ]
   (2x, stratum 3-4)                              (authoritative, NT5DS off)
        |                                                     |
  Linux clients (chrony)                          Domain members (w32time NT5DS)

Design decisions that matter:

For upstream sources, in priority order: a dedicated GNSS/GPS appliance on your own network (best, no internet dependency), your cloud provider’s link-local NTP (169.254.169.123 on AWS, metadata.google.internal on GCP, the Hyper-V host clock on Azure), then the public NTP pool, such as 2.pool.ntp.org, as a fallback. Never use a single public IP as your only source.
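The "one root, every node has a parent" rule can be checked mechanically before anything is deployed. The sketch below is a simplified model, not a real inventory tool: it assumes each node has a single parent (real chrony and w32time nodes usually have several sources), and the host names, the compute_strata helper, and the root stratum of 2 are all hypothetical.

```python
from typing import Dict, Optional


def compute_strata(parents: Dict[str, Optional[str]],
                   root_stratum: int) -> Dict[str, int]:
    """Validate a time hierarchy and return each node's stratum.

    parents maps node -> parent, with None marking the single root.
    Raises ValueError on zero or multiple roots, undefined parents, or loops.
    """
    roots = [node for node, parent in parents.items() if parent is None]
    if len(roots) != 1:
        raise ValueError(f"expected exactly one root, found {len(roots)}: {roots}")
    strata = {roots[0]: root_stratum}

    def resolve(node, path=()):
        if node in strata:
            return strata[node]
        if node in path:
            raise ValueError(f"sync loop involving {node}")
        parent = parents[node]
        if parent not in parents:
            raise ValueError(f"{node} points at undefined parent {parent}")
        # Each hop away from the root adds one to the stratum.
        strata[node] = resolve(parent, path + (node,)) + 1
        return strata[node]

    for node in parents:
        resolve(node)
    return strata


# Hypothetical inventory: one upstream root, assumed to be stratum 2.
hierarchy = {
    "upstream": None,
    "timesrv01": "upstream",
    "timesrv02": "upstream",
    "pdc01": "upstream",
    "web01": "timesrv01",
    "member01": "pdc01",
}
strata = compute_strata(hierarchy, root_stratum=2)
print(strata["timesrv01"], strata["web01"])  # 3 4
```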

3. Configure the chrony servers

These are your Linux time authorities - the two boxes everything else on the Linux side will trust. Install chrony (dnf install chrony on RHEL family, apt install chrony on Debian/Ubuntu) and replace /etc/chrony/chrony.conf (Debian) or /etc/chrony.conf (RHEL):

# /etc/chrony.conf on the Linux NTP servers (timesrv01/02)

# Upstream sources. iburst speeds initial sync; the pool directive
# expands to multiple servers and keeps a working set healthy.
pool 2.pool.ntp.org iburst maxsources 4
server gps-appliance.corp.example.com iburst prefer

# Peer the two internal servers so they agree if upstream is lost.
peer timesrv02.corp.example.com

# Record the rate at which the system clock gains/loses time so chrony
# can compensate immediately on restart instead of relearning drift.
driftfile /var/lib/chrony/drift

# Step the clock (instead of slewing) only on the first 3 updates and
# only if the offset is larger than 1 second. Safe for boot, avoids
# large jumps under a running workload.
makestep 1.0 3

# Keep the hardware RTC aligned to system time.
rtcsync

# Serve time to your internal networks (NTP is UDP/123). Default-deny:
# only these subnets may query.
allow 10.0.0.0/8
allow 192.168.0.0/16

# Optionally allow this server to answer even before it is itself
# synchronised, at a high (bad) stratum, so an isolated site still has
# a local reference. Use with care.
local stratum 10

makestep 1.0 3 is the line people get wrong. It says: if the offset exceeds 1.0 second, step (jump) the clock rather than slewing it, but only for the first 3 clock updates after start. After that, chrony only ever slews, so a long-running process never sees time go backwards mid-flight. Stepping at boot is fine; stepping a live database server is how you corrupt things.
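The gate described above can be written as a tiny decision function. This is a simplified Python model of the makestep behaviour, not chrony's actual code; the function name correction_mode and the updates_so_far counter are illustrative assumptions.

```python
def correction_mode(offset_seconds: float, updates_so_far: int,
                    threshold: float = 1.0, limit: int = 3) -> str:
    """Model of 'makestep <threshold> <limit>': return 'step' or 'slew'.

    updates_so_far is the number of clock updates already applied since
    the daemon started (0 for the very first update).
    """
    if updates_so_far < limit and abs(offset_seconds) > threshold:
        return "step"
    return "slew"


print(correction_mode(42.0, updates_so_far=0))  # step: big offset at boot
print(correction_mode(0.2, updates_so_far=0))   # slew: under the threshold
print(correction_mode(42.0, updates_so_far=3))  # slew: past the first 3 updates
```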

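Apply and confirm the daemon is healthy:

# State-changing
sudo systemctl enable --now chronyd
sudo systemctl restart chronyd

# Read-only: the service is active and has selected a source
systemctl is-active chronyd
chronyc sources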

4. Configure the chrony clients

Clients are nearly identical but trust your internal servers, not the internet, and do not serve time:

# /etc/chrony.conf on Linux clients

server timesrv01.corp.example.com iburst
server timesrv02.corp.example.com iburst

driftfile /var/lib/chrony/drift
makestep 1.0 3
rtcsync

# No allow lines: this host answers nobody.

If you manage clients with config management, render this from a template so the server list is a single variable. An Ansible snippet that ships the file and bounces the service safely:

- name: Deploy chrony client config
  ansible.builtin.template:
    src: chrony.conf.j2
    dest: /etc/chrony.conf
    owner: root
    group: root
    mode: "0644"
  notify: restart chronyd

# handlers/main.yml
- name: restart chronyd
  ansible.builtin.service:
    name: chronyd
    state: restarted
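For readers who do not use Ansible, the "server list as a single variable" idea looks roughly like this in plain Python. The function name render_chrony_client_conf is hypothetical, and the output simply mirrors the client config shown above; it is a sketch of what the template does, not a replacement for it.

```python
from typing import List


def render_chrony_client_conf(servers: List[str]) -> str:
    """Render a client chrony.conf from one list of internal time servers."""
    if not servers:
        raise ValueError("at least one internal time server is required")
    lines = ["# Rendered chrony client config (illustrative)"]
    lines += [f"server {host} iburst" for host in servers]
    lines += [
        "",
        "driftfile /var/lib/chrony/drift",
        "makestep 1.0 3",
        "rtcsync",
        "",
        "# No allow lines: this host answers nobody.",
    ]
    return "\n".join(lines) + "\n"


conf = render_chrony_client_conf(
    ["timesrv01.corp.example.com", "timesrv02.corp.example.com"]
)
print(conf)
```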

5. Make the PDC emulator the forest time root

This is the linchpin. The PDC emulator of the forest root domain is the one machine in Active Directory that should sync from an external source via NTP; everything else syncs down the domain hierarchy (w32time type NT5DS) and must be left alone.

First, find the PDC emulator so you configure the right box:

# Read-only
netdom query fsmo
# or, explicitly for the forest root domain
Get-ADDomain -Identity (Get-ADForest).RootDomain | Select-Object PDCEmulator

On that DC, configure it as the authoritative external-facing source. The two registry-level details that people get wrong are AnnounceFlags and the 0x8 flag on each NTP peer:

# State-changing. Run on the forest-root PDC emulator, elevated.

# Set source type to NTP (not NT5DS) and list 2-4 external peers.
# 0x8 = client mode: force standard NTP client requests, the only mode
# most upstream servers answer.
w32tm /config /manualpeerlist:"time-a-g.nist.gov,0x8 time-b-g.nist.gov,0x8 2.pool.ntp.org,0x8" /syncfromflags:manual /reliable:yes /update

# Announce this host as a reliable time source for the domain.
# 0x5 = 0x1 (always time server) + 0x4 (reliable time source).
reg add "HKLM\SYSTEM\CurrentControlSet\Services\W32Time\Config" /v AnnounceFlags /t REG_DWORD /d 5 /f

# Restart and force a resync.
Restart-Service w32time
w32tm /resync /rediscover
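AnnounceFlags is a bitmask, which is why 5 and 10 mean such different things. The Python sketch below decodes a value into its documented bits; the decode_announce_flags helper is a hypothetical name used only for illustration.

```python
# Bit meanings for the W32Time AnnounceFlags registry value.
ANNOUNCE_FLAGS = {
    0x1: "always time server",
    0x2: "automatic time server",
    0x4: "always reliable time server",
    0x8: "automatic reliable time server",
}


def decode_announce_flags(value: int) -> list:
    """Return the names of the bits set in an AnnounceFlags value."""
    return [name for bit, name in sorted(ANNOUNCE_FLAGS.items()) if value & bit]


print(decode_announce_flags(5))   # value forced on the PDC emulator
print(decode_announce_flags(10))  # default on a domain controller
```

Read this way, 5 pins the PDC as an unconditional, reliable source, while the default of 10 leaves both decisions to the service.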

Notes that save you a callback:

6. Tune w32time polling and member behaviour

Domain members need no per-host config - the default NT5DS type points them at the domain hierarchy automatically. The tuning that matters is on the PDC, where the out-of-the-box polling is far too lazy for accurate time.

The relevant w32time registry values live under HKLM\SYSTEM\CurrentControlSet\Services\W32Time and use log base-2 seconds for the poll intervals:

Value                Location                  Default                  Meaning
MinPollInterval      \Config                   6 (64 s)                 Floor on poll interval, as log2 seconds
MaxPollInterval      \Config                   10 (1024 s)              Ceiling on poll interval, as log2 seconds
SpecialPollInterval  \TimeProviders\NtpClient  3600 (1 hr, in seconds)  Fixed interval used when the peer has the 0x1 SpecialInterval flag
AnnounceFlags        \Config                   10 (0xA) on a DC         Set to 5 to force “always + reliable” on the PDC
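The log2 encoding in the table is easy to misread, so here is a small Python sketch of the conversion. The helper names are illustrative, not part of any Windows tooling.

```python
import math


def poll_seconds(log2_value: int) -> int:
    """Convert a MinPollInterval/MaxPollInterval value to seconds."""
    return 2 ** log2_value


def bracketing_exponents(seconds: float) -> tuple:
    """Return the log2 values just below and just above a target interval."""
    exact = math.log2(seconds)
    return math.floor(exact), math.ceil(exact)


print(poll_seconds(6), poll_seconds(10))  # 64 1024
print(bracketing_exponents(900))          # (9, 10)
```

A 15-minute (900 s) interval falls between 2^9 = 512 s and 2^10 = 1024 s, so it cannot be expressed with the log2 values at all; that is the gap SpecialPollInterval, which takes raw seconds, fills.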

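To poll the upstream every 15 minutes instead of hourly, set the poll bounds explicitly and add a special interval. On a DC, 6 and 10 are already the defaults, so writing them only guards against earlier changes. Note the unit difference - poll intervals are log2 seconds, SpecialPollInterval is raw seconds: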

# State-changing, on the PDC emulator.
$cfg = "HKLM\SYSTEM\CurrentControlSet\Services\W32Time\Config"
$ntp = "HKLM\SYSTEM\CurrentControlSet\Services\W32Time\TimeProviders\NtpClient"

# 2^6 = 64s floor, 2^10 = 1024s ceiling.
reg add $cfg /v MinPollInterval /t REG_DWORD /d 6 /f
reg add $cfg /v MaxPollInterval /t REG_DWORD /d 10 /f

# 900s = 15 min fixed poll. Requires the 0x1 flag on the peer; combine
# with 0x8, i.e. list peers as "host,0x9" if you want SpecialInterval.
reg add $ntp /v SpecialPollInterval /t REG_DWORD /d 900 /f

# Restart-Service works in Windows PowerShell 5.1, where "&&" does not.
Restart-Service w32time

The flags are additive bitmasks. 0x8 is “client mode”, 0x1 is “use SpecialPollInterval”. To get both, list the peer as host,0x9. This is the single most common source of “my polling changes are ignored” tickets: SpecialPollInterval only takes effect when the peer carries the 0x1 flag.
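Composing the peer flags by hand is where typos creep in, so a sketch like the following can generate the /manualpeerlist string. The flag values are the documented w32time peer flags; the helper names peer_entry and manual_peer_list are hypothetical.

```python
# Documented w32time manual-peer flags.
PEER_FLAGS = {
    "SpecialInterval": 0x1,
    "UseAsFallbackOnly": 0x2,
    "SymmetricActive": 0x4,
    "Client": 0x8,
}


def peer_entry(host: str, *flags: str) -> str:
    """Return 'host,0xN' with the named flags OR-ed together."""
    value = 0
    for flag in flags:
        value |= PEER_FLAGS[flag]
    return f"{host},0x{value:x}"


def manual_peer_list(hosts, *flags: str) -> str:
    """Build the space-separated string passed to /manualpeerlist."""
    return " ".join(peer_entry(host, *flags) for host in hosts)


print(peer_entry("time-a-g.nist.gov", "Client"))                     # time-a-g.nist.gov,0x8
print(peer_entry("time-a-g.nist.gov", "Client", "SpecialInterval"))  # time-a-g.nist.gov,0x9
```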

For environments that genuinely need sub-second accuracy (financial timestamping, regulated trading), Windows Server 2016+ ships a high-accuracy mode you enable via the Windows Time Service Group Policy ADMX, lowering MinPollInterval/MaxPollInterval and clock update frequency. Most estates do not need it; chasing it on VMs with noisy host clocks usually makes things worse.

7. Cross the Windows/Linux boundary safely

The two stacks meet at the PDC and your Linux servers. Two ways to wire them, and a firewall rule that applies to both.

Option A - shared upstream (recommended). Point both the PDC manual peer list and the chrony pool/server lines at the same external sources. Each stack stays authoritative for its own clients but converges on identical truth. No cross-trust, nothing to break during a forest migration.

Option B - peer the PDC and chrony. Have your Linux clients (or even Windows members via GPO) treat the PDC as one NTP source alongside chrony servers. Useful when an isolated site has a DC but no other reliable source. A Linux box can sync from the PDC directly:

# On a Linux host that should trust the AD PDC for time.
server pdc01.corp.example.com iburst

Whichever you choose, NTP is UDP port 123 in both directions. On the Linux servers, open it explicitly - here with nftables and firewalld:

# nftables: allow inbound NTP from internal ranges only. Assumes an
# existing "inet filter" table with an "input" chain.
sudo nft add rule inet filter input ip saddr '{ 10.0.0.0/8, 192.168.0.0/16 }' udp dport 123 accept

# Or firewalld, the simpler path on RHEL family.
sudo firewall-cmd --permanent --add-service=ntp
sudo firewall-cmd --reload

Avoid routing NTP through source-NAT or other stateful devices that add variable delay if you can. NTP assumes the outbound and return paths take equal time, so any asymmetry shows up directly as offset error (half the difference between the two directions).
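The effect of path asymmetry falls straight out of the standard NTP arithmetic on the four timestamps of one exchange. The Python sketch below uses made-up delays and two clocks that agree perfectly, so any non-zero offset it reports is pure measurement error.

```python
def ntp_offset_and_delay(t1: float, t2: float, t3: float, t4: float):
    """Standard NTP calculation.

    t1: client send time      (client clock)
    t2: server receive time   (server clock)
    t3: server transmit time  (server clock)
    t4: client receive time   (client clock)
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


# Clocks agree exactly; the server takes 1 ms to reply.
# Symmetric path: 10 ms out, 10 ms back.
sym_offset, sym_delay = ntp_offset_and_delay(0.000, 0.010, 0.011, 0.021)

# Asymmetric path: 10 ms out, 50 ms back.
asym_offset, asym_delay = ntp_offset_and_delay(0.000, 0.010, 0.011, 0.061)

print(f"symmetric:  offset {sym_offset * 1000:.1f} ms, delay {sym_delay * 1000:.1f} ms")
print(f"asymmetric: offset {asym_offset * 1000:.1f} ms, delay {asym_delay * 1000:.1f} ms")
```

In the asymmetric case the computed offset is about -20 ms even though the clocks are identical: half of the 40 ms difference between the two directions.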

Verify

Never trust that time is right because the service is running. Measure offset on both stacks.

On Linux, chronyc gives you the truth in two commands:

# Are sources reachable, and which one is selected? The '*' marks the
# currently synced source; '+' candidates; '?' unreachable.
chronyc sources -v

# System clock state: offset from true time, frequency error (drift in
# ppm), and skew. 'System time' should be a few ms or better.
chronyc tracking

A healthy chronyc tracking looks like this - note the sub-millisecond System time offset and the small, stable frequency:

Reference ID    : 0A000005 (timesrv01.corp.example.com)
Stratum         : 4
System time     : 0.000312453 seconds slow of NTP time
Last offset     : -0.000115 seconds
RMS offset      : 0.000204 seconds
Frequency       : 12.317 ppm fast
Skew            : 0.043 ppm
Leap status     : Normal
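If you want to feed these numbers into monitoring, the human-readable output is simple to parse. The Python sketch below works on the sample text above; the helper names are hypothetical, and the sign convention (positive means the local clock is fast) is an assumption of this sketch rather than something chronyc defines.

```python
import re

SAMPLE_TRACKING = """\
Reference ID    : 0A000005 (timesrv01.corp.example.com)
Stratum         : 4
System time     : 0.000312453 seconds slow of NTP time
Last offset     : -0.000115 seconds
RMS offset      : 0.000204 seconds
Frequency       : 12.317 ppm fast
Skew            : 0.043 ppm
Leap status     : Normal
"""


def parse_tracking(text: str) -> dict:
    """Split 'Key : value' lines from chronyc tracking into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def system_offset_seconds(fields: dict) -> float:
    """Signed 'System time' offset: positive = fast, negative = slow."""
    match = re.match(r"([0-9.]+) seconds (fast|slow) of NTP time",
                     fields["System time"])
    if not match:
        raise ValueError("unrecognised 'System time' line")
    magnitude = float(match.group(1))
    return magnitude if match.group(2) == "fast" else -magnitude


fields = parse_tracking(SAMPLE_TRACKING)
print(fields["Leap status"], system_offset_seconds(fields))
```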

On Windows, confirm the PDC’s source and then chart live offset against it from any member:

# What is this host syncing from, and how good is it? /verbose adds
# the last successful sync time and poll interval.
w32tm /query /status /verbose

# Confirm the configured peer list and flags actually took.
w32tm /query /configuration

# Walk every DC in the domain and report offset + stratum in one shot.
w32tm /monitor

# Live offset between this host and the PDC over 5 samples, numbers only.
# This is the closest analogue to chronyc tracking.
w32tm /stripchart /computer:pdc01.corp.example.com /dataonly /samples:5

The cross-stack acceptance test: run w32tm /stripchart from a Windows member against a Linux chrony server (and vice versa with chronyc), and confirm the offset is well under a second. If both stacks agree to within a few tens of milliseconds, your hierarchy is sound.
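Because both stacks answer ordinary NTP client requests, one neutral probe can measure either side. The following is a minimal SNTP sketch in stdlib Python, assuming IPv4, UDP/123 reachability, and a server that answers basic client-mode packets; it is a rough cross-check, not a substitute for chronyc or w32tm. The function names are hypothetical.

```python
import socket
import struct
import time

NTP_EPOCH_OFFSET = 2208988800  # seconds between 1900-01-01 and 1970-01-01


def build_request() -> bytes:
    """48-byte SNTP request: LI=0, version 4, mode 3 (client)."""
    return b"\x23" + 47 * b"\0"


def _timestamp(data: bytes, position: int) -> float:
    """Decode a 64-bit NTP timestamp into Unix seconds."""
    seconds, fraction = struct.unpack("!II", data[position:position + 8])
    return seconds - NTP_EPOCH_OFFSET + fraction / 2 ** 32


def offset_from_reply(reply: bytes, t1: float, t4: float) -> float:
    """Server-minus-local offset in seconds from one request/response pair."""
    if len(reply) < 48:
        raise ValueError("short NTP reply")
    t2 = _timestamp(reply, 32)  # server receive timestamp
    t3 = _timestamp(reply, 40)  # server transmit timestamp
    return ((t2 - t1) + (t3 - t4)) / 2


def query_offset(host: str, timeout: float = 2.0) -> float:
    """Send one request to host:123 and return the measured offset."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        t1 = time.time()
        sock.sendto(build_request(), (host, 123))
        reply, _ = sock.recvfrom(512)
        t4 = time.time()
    return offset_from_reply(reply, t1, t4)


# Usage (needs network access), run from the same probe host:
#   query_offset("timesrv01.corp.example.com")
#   query_offset("pdc01.corp.example.com")
```

Running both queries from one host and comparing the two results gives the disagreement between the stacks without depending on either vendor's tooling.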

8. Diagnose skew and Kerberos failures

When it goes wrong, work top-down: root first, then the failing leaf.

KRB_AP_ERR_SKEW / “Clock skew too great”. This is almost always a single host that fell off the hierarchy, not a forest-wide problem. Measure its offset against the PDC:

w32tm /stripchart /computer:pdc01.corp.example.com /dataonly /samples:5

If the offset is over ~300 seconds, you have found it. On a Windows member, force it back into the hierarchy:

# State-changing: reset the member to domain-hierarchy time and resync.
w32tm /config /syncfromflags:domhier /update
Restart-Service w32time
w32tm /resync /rediscover

On a Linux host hitting clock-skew against AD (common with SSSD/Kerberos joins), confirm chrony is actually tracking and force a step if it is wildly off:

chronyc tracking          # read-only: is it synced at all?
sudo chronyc makestep     # state-changing: step the clock to truth NOW

chronyc makestep is the emergency override - it tells a running chronyd to step immediately regardless of the makestep config gate. Use it to recover a host that drifted past Kerberos tolerance; do not script it to run periodically.

The PDC itself is wrong. If w32tm /monitor shows every DC skewed by the same amount, the root is the problem, not the leaves. Check the PDC’s upstream:

w32tm /query /status /verbose
# Look at 'Source:' - if it says 'Local CMOS Clock' or 'Free-running
# System Clock', the manual peer list is not being used. Re-apply the
# config from section 5 and confirm /syncfromflags:manual stuck.

A PDC reporting Source: Free-running System Clock means it never reached an upstream - usually UDP/123 blocked outbound at the firewall, or peers listed without the ,0x8 flag.

Persistent drift that returns after a step. A host that re-skews hours after every correction has a hardware or virtualization problem, not a config one. On Linux, chronyc tracking showing a large, climbing Frequency (tens of ppm and growing) points at a bad oscillator or, on a VM, a host whose clock chronyd is fighting. Confirm the hypervisor time-sync integration is off on any guest that runs NTP, and that you are not double-disciplining the clock.
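It helps to translate a frequency error in ppm into wall-clock terms. The Python sketch below does that arithmetic for a clock that is left completely uncorrected; the function names are illustrative and the 300-second figure is the Kerberos default discussed earlier.

```python
SECONDS_PER_DAY = 86_400


def drift_seconds_per_day(ppm: float) -> float:
    """Seconds gained or lost per day by an uncorrected clock."""
    return ppm * 1e-6 * SECONDS_PER_DAY


def days_until_skew(ppm: float, tolerance_seconds: float = 300.0) -> float:
    """Days until an uncorrected clock drifts past the tolerance."""
    if ppm == 0:
        return float("inf")
    return tolerance_seconds / abs(drift_seconds_per_day(ppm))


for ppm in (12.317, 100.0, 1000.0):
    print(f"{ppm:8.3f} ppm -> {drift_seconds_per_day(ppm):7.2f} s/day, "
          f"{days_until_skew(ppm):7.1f} days to 300 s")
```

At roughly 12 ppm a free-running clock takes most of a year to cross the Kerberos limit; a host that gets there within days is drifting at hundreds or thousands of ppm, which points at hardware or the hypervisor rather than configuration.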

Enterprise scenario

A platform team running a 4,000-host estate - a Windows AD forest plus a large RHEL fleet under SSSD - hit intermittent KRB_AP_ERR_SKEW failures on Linux logins that no one could reproduce on demand. The forest-root PDC was correctly configured with an external manual peer list, every Windows member was healthy, and w32tm /monitor was clean across all DCs. The failures only ever hit Linux hosts, and only some of them.

The constraint: the RHEL fleet had been built from a golden image that shipped /etc/chrony.conf pointing at 2.pool.ntp.org directly - the public internet, not the internal hierarchy. Most of the time that was fine. But hosts in a restricted PCI network segment had outbound UDP/123 to the internet blocked by the segment firewall. With no reachable source, chrony fell back to free-running. Those hosts drifted a few minutes a day until they crossed the 5-minute Kerberos tolerance, logins broke, someone rebooted the box, chrony stepped the clock at boot (makestep 1.0 3), and the symptom vanished - until it drifted again. Classic Heisenbug: the act of investigating (a reboot) “fixed” it.

The fix was to repoint every Linux host at the internal chrony servers, which were reachable from every segment, and prove the source was actually selected rather than assuming it:

# Golden-image chrony.conf, internal sources only.
server timesrv01.corp.example.com iburst
server timesrv02.corp.example.com iburst
makestep 1.0 3
rtcsync

# Fleet-wide validation that fails loudly if a host is NOT synced,
# including when chronyc prints nothing because chronyd is down.
chronyc tracking | awk -F: -v host="$(hostname)" '
  /Leap status/ {seen = 1; gsub(/ /, "", $2); if ($2 != "Normal") bad = 1}
  END {if (!seen || bad) {print "UNSYNCED: " host; exit 1}}'

That Leap status: Normal check became a pre-deployment gate and a monitoring probe. The deeper lesson, and the reason this took two weeks to find: never let a golden image hard-code an external time source. Every host points at the internal hierarchy; the internal hierarchy is the only thing that talks to the outside world; and you verify the selected source, not just that the daemon is running.
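Verifying the selected source, not just sync state, can be sketched as follows. The sample output is illustrative rather than captured from a real host, the helper names are hypothetical, and chronyc may truncate long host names in this view, so a production probe would more likely compare IP addresses from chronyc -n sources.

```python
from typing import Optional

# Illustrative 'chronyc sources' output (not captured from a real host).
SAMPLE_SOURCES = """\
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
^* timesrv01.corp.example.com    3   6   377    23   +120us[ +150us] +/-   15ms
^+ timesrv02.corp.example.com    3   6   377    24    -80us[  -60us] +/-   17ms
"""

INTERNAL_SERVERS = {"timesrv01.corp.example.com", "timesrv02.corp.example.com"}


def selected_source(sources_output: str) -> Optional[str]:
    """Return the name of the source marked '*' (currently selected), if any."""
    for line in sources_output.splitlines():
        # Column 1 is the mode (^ server, = peer, # refclock), column 2 the state.
        if len(line) > 2 and line[0] in "^=#" and line[1] == "*":
            return line.split()[1]
    return None


def is_on_internal_hierarchy(sources_output: str, allowed: set) -> bool:
    """True only if a source is selected and it is one of the allowed hosts."""
    name = selected_source(sources_output)
    return name is not None and name in allowed


print(selected_source(SAMPLE_SOURCES))
print(is_on_internal_hierarchy(SAMPLE_SOURCES, INTERNAL_SERVERS))
```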
