we run two active directory domain controllers. one on a beefy server, one on a little ryzen mini PC. graylog kept flagging event 142 on the mini PC — “the time service has stopped advertising as a time source” — roughly every hour, then event 139 a few minutes later when it resynced and started advertising again.
the domain still had the other DC as a reliable time source, so nothing was actually breaking. but in AD, time matters. kerberos authentication has a 5-minute skew tolerance by default. a domain controller that can’t keep accurate time is a domain controller that might start rejecting perfectly valid tickets.
the investigation
both DCs had the same NTP group policy: sync to cloudflare every 1024 seconds (~17 minutes). so why was one drifting and the other rock solid?
| | labdc1 | labdc2 |
|---|---|---|
| poll interval | 10 (1024s) | 8 (256s) |
| events | zero | cycling every ~hour |
that poll interval difference was the clue. the first number is a power-of-two exponent, so 10 means 2^10 = 1024s and 8 means 2^8 = 256s. windows time service uses an adaptive algorithm: when it notices the clock is drifting fast, it polls more aggressively. labdc2 had already dropped itself from 1024s to 256s, polling every 4 minutes instead of 17. the algorithm was trying to compensate, but the initial interval from group policy was still 1024s after every restart. so every reboot, it’d start slow, drift, lose sync, get flagged, then gradually speed up its polling until it stabilized, only to reset on the next reboot.
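here's a rough sketch of reading that value programmatically. it assumes the numbers above came from `w32tm /query /status`, which prints a line like `Poll Interval: 10 (1024s)`. the function names are made up for illustration.

```python
import re

def poll_seconds(exponent: int) -> int:
    """The poll interval is reported as a power-of-two exponent."""
    return 2 ** exponent

# Matches a status line such as "Poll Interval: 10 (1024s)".
POLL_LINE = re.compile(r"Poll Interval:\s*(\d+)\s*\((\d+)s\)")

def parse_poll_interval(status_text: str) -> tuple[int, int]:
    """Return (exponent, seconds) found in w32tm status output."""
    match = POLL_LINE.search(status_text)
    if match is None:
        raise ValueError("no 'Poll Interval' line found")
    return int(match.group(1)), int(match.group(2))

if __name__ == "__main__":
    # Values taken from the comparison above.
    for name, exponent in [("labdc1", 10), ("labdc2", 8)]:
        secs = poll_seconds(exponent)
        print(f"{name}: 2^{exponent} = {secs}s (~{secs / 60:.1f} min)")
```

feeding it the saved status output from each DC makes the difference easy to spot without eyeballing two terminals.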
the cause
it’s the hardware. cheaper mini PCs use less accurate crystal oscillators for their real-time clocks. the beefy server has a better oscillator that barely drifts between NTP syncs. the mini PC drifts enough in 17 minutes that it crosses the threshold where windows stops advertising it as a time source.
this is one of those things that makes total sense in retrospect. you buy a mini PC for a domain controller because it’s quiet, low power, and “good enough.” and it is good enough — for everything except keeping accurate time without help.
the fix
two GPO changes:
| setting | before | after |
|---|---|---|
| SpecialPollInterval | 1024s (~17 min) | 256s (~4 min) |
| MaxAllowedPhaseOffset | 300s (default) | 600s |
the first change matches what the adaptive algorithm was already doing — polling every 4 minutes. but now it starts there instead of spending the first hour after a reboot catching up. the second raises the threshold for when it stops advertising, giving it more headroom.
gpupdate /force
w32tm /resync /rediscover
applied to both DCs, forced a resync, and watched. labdc2 stabilized immediately. no more event 142 cycling.
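if you want something more concrete than "watched", one option is to export the time-service events and measure how long each unsynced window lasted. this is a minimal sketch: the `(timestamp, event_id)` tuple format and the sample timestamps are hypothetical, not real log data from these DCs.

```python
from datetime import datetime, timedelta

STOPPED, RESUMED = 142, 139  # event IDs from the post

def unsynced_windows(events):
    """Pair each 142 with the next 139 and return the gap durations."""
    windows = []
    stopped_at = None
    for timestamp, event_id in sorted(events):
        if event_id == STOPPED and stopped_at is None:
            stopped_at = timestamp
        elif event_id == RESUMED and stopped_at is not None:
            windows.append(timestamp - stopped_at)
            stopped_at = None
    return windows

# Hypothetical sample data for illustration only.
sample = [
    (datetime(2024, 1, 1, 10, 0), 142),
    (datetime(2024, 1, 1, 10, 4), 139),
    (datetime(2024, 1, 1, 11, 1), 142),
    (datetime(2024, 1, 1, 11, 6), 139),
]

if __name__ == "__main__":
    for gap in unsynced_windows(sample):
        print(f"not advertising for {gap}")
```

an empty result over a day or two of events is the "no more cycling" signal.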
the lesson
if you’re running domain controllers on mini PCs or low-power hardware, don’t trust the default NTP polling interval. your clock is probably drifting faster than a server-grade machine, and the default 1024-second interval gives it too much room to wander. tighten it to 256 or even 128 seconds. the NTP traffic is negligible and your kerberos tickets will thank you.
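to put a number on "negligible", here's a back-of-the-envelope sketch. it assumes one NTP server, one request and one reply per poll, and counts only the standard 48-byte NTP payload (no UDP/IP headers, no extension fields).

```python
NTP_PAYLOAD_BYTES = 48   # standard NTP packet without extension fields
SECONDS_PER_DAY = 86_400

def daily_ntp_bytes(interval_s: int) -> float:
    """Payload bytes per day for one request plus one reply per poll."""
    polls_per_day = SECONDS_PER_DAY / interval_s
    return polls_per_day * NTP_PAYLOAD_BYTES * 2

if __name__ == "__main__":
    for interval_s in (1024, 256, 128):
        kib = daily_ntp_bytes(interval_s) / 1024
        print(f"{interval_s}s interval: ~{kib:.1f} KiB/day")
```

even at 128 seconds that's tens of kilobytes a day under these assumptions, far less than a single web page load.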