cheap clocks and kerberos: why your mini PC domain controller keeps losing time

we run two active directory domain controllers. one on a beefy server, one on a little ryzen mini PC. graylog kept flagging event 142 on the mini PC — “the time service has stopped advertising as a time source” — roughly every hour, then event 139 a few minutes later when it resynced and started advertising again.

the domain still had the other DC as a reliable time source, so nothing was actually breaking. but in AD, time matters. kerberos authentication has a 5-minute skew tolerance by default. a domain controller that can’t keep accurate time is a domain controller that might start rejecting perfectly valid tickets.
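to make the skew rule concrete, here is a minimal sketch of the comparison, assuming the default 5-minute tolerance mentioned above. the function name and the timestamps are made up for illustration; this is not how windows implements the check internally.

```python
from datetime import datetime, timedelta, timezone

# default kerberos clock-skew tolerance from the text: 5 minutes
MAX_SKEW = timedelta(minutes=5)

def within_kerberos_skew(client_time, dc_time, max_skew=MAX_SKEW):
    """Return True if the two clocks differ by no more than max_skew."""
    return abs(client_time - dc_time) <= max_skew

# made-up reference timestamp, purely for illustration
dc_now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)

print(within_kerberos_skew(dc_now + timedelta(minutes=4), dc_now))  # True
print(within_kerberos_skew(dc_now - timedelta(minutes=6), dc_now))  # False
```

the check is symmetric: a clock that runs 6 minutes slow is just as much of a problem as one that runs 6 minutes fast.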

the investigation

both DCs had the same NTP group policy: sync to cloudflare every 1024 seconds (~17 minutes). so why was one drifting and the other rock solid?

            labdc1          labdc2
Poll Int    10 (1024s)      8 (256s)
Events      zero            cycling every ~hour

that poll interval difference was the clue. windows time service uses an adaptive algorithm — when it notices the clock is drifting fast, it polls more aggressively. labdc2 had already dropped itself from 1024s to 256s, polling every 4 minutes instead of 17. the algorithm was trying to compensate, but the initial interval from group policy was still 1024s after every restart. so every reboot, it’d start slow, drift, lose sync, get flagged, then gradually speed up its polling until it stabilized — only to reset on the next reboot.
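the poll interval is reported as a power-of-two exponent, which is why 10 means 1024s and 8 means 256s. a small sketch of that conversion, assuming the status line looks like the entries in the table above (the sample strings are modeled on that table, not captured output, and the helper names are hypothetical):

```python
import re

def poll_seconds(exponent):
    """NTP-style poll intervals are a power-of-two exponent, in seconds."""
    return 2 ** exponent

def parse_poll_interval(line):
    """Extract the exponent from a line like 'Poll Interval: 10 (1024s)'."""
    match = re.search(r"Poll Interval:\s*(\d+)\s*\((\d+)s\)", line)
    if match is None:
        raise ValueError(f"no poll interval found in: {line!r}")
    exponent, seconds = int(match.group(1)), int(match.group(2))
    if poll_seconds(exponent) != seconds:
        raise ValueError(f"exponent {exponent} does not match {seconds}s")
    return exponent

# sample lines modeled on the table above, not captured output
samples = {
    "labdc1": "Poll Interval: 10 (1024s)",
    "labdc2": "Poll Interval: 8 (256s)",
}

for host, line in samples.items():
    seconds = poll_seconds(parse_poll_interval(line))
    print(f"{host}: polls every {seconds}s (~{seconds / 60:.1f} min)")
```

dropping the exponent by 2 means polling four times as often, which is exactly the gap between the two DCs.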

the cause

it’s the hardware. cheaper mini PCs tend to use less accurate crystal oscillators to drive their clocks. the beefy server has a better oscillator that barely drifts between NTP syncs. the mini PC drifts enough in 17 minutes that it crosses the threshold where windows stops advertising it as a time source.
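a rough way to see why the sync interval matters: with a constant drift rate, the offset that builds up between syncs scales linearly with the interval. the drift rates below are hypothetical round numbers, not measurements from either DC, and real oscillators also wander with temperature, so treat this as back-of-the-envelope arithmetic only.

```python
def offset_ms(drift_ppm, seconds_between_syncs):
    """Offset accumulated between syncs, in ms, for a constant drift rate."""
    # 1 ppm = 1 microsecond of error per second of elapsed time
    return drift_ppm * seconds_between_syncs / 1000

# hypothetical drift rates, NOT measured on labdc1 or labdc2
for label, ppm in (("good oscillator", 5), ("cheap oscillator", 100)):
    for interval in (1024, 256):
        print(f"{label} ({ppm} ppm), {interval}s between syncs: "
              f"{offset_ms(ppm, interval):.1f} ms")
```

whatever the real drift rate is, going from 1024s to 256s cuts the accumulated offset between syncs to a quarter.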

this is one of those things that makes total sense in retrospect. you buy a mini PC for a domain controller because it’s quiet, low power, and “good enough.” and it is good enough — for everything except keeping accurate time without help.

the fix

two GPO changes:

setting                    before             after
`SpecialPollInterval`      1024s (~17 min)    256s (~4 min)
`MaxAllowedPhaseOffset`    300s (default)     600s

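the first change matches what the adaptive algorithm was already doing: polling every 4 minutes. but now it starts there instead of spending the first hour after a reboot catching up. the second raises the offset the time service will tolerate before it takes drastic action (microsoft documents `MaxAllowedPhaseOffset` as the largest offset it corrects by slewing the clock rather than stepping it), giving it more headroom before it stops advertising.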

gpupdate /force
w32tm /resync /rediscover

applied to both DCs, forced a resync, and watched. labdc2 stabilized immediately. no more event 142 cycling.

the lesson

if you’re running domain controllers on mini PCs or low-power hardware, don’t trust the default NTP polling interval. your clock is probably drifting faster than a server-grade machine, and the default 1024-second interval gives it too much room to wander. tighten it to 256 or even 128 seconds. the NTP traffic is negligible and your kerberos tickets will thank you.
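to put a number on “negligible”, here is a rough estimate of daily NTP traffic per DC. it assumes one request and one response per poll, a basic 48-byte NTP packet with no extension fields, and IPv4 + UDP headers only (no ethernet framing). windows may send more than one sample per poll, so read this as an order-of-magnitude figure.

```python
SECONDS_PER_DAY = 86_400
NTP_PAYLOAD_BYTES = 48       # basic NTP packet, no extension fields
UDP_IPV4_HEADER_BYTES = 28   # 8-byte UDP header + 20-byte IPv4 header

def daily_ntp_bytes(poll_interval_s):
    """Approximate bytes per day, assuming one request + one response per poll."""
    polls_per_day = SECONDS_PER_DAY / poll_interval_s
    bytes_per_poll = 2 * (NTP_PAYLOAD_BYTES + UDP_IPV4_HEADER_BYTES)
    return polls_per_day * bytes_per_poll

for interval in (1024, 256, 128):
    kib = daily_ntp_bytes(interval) / 1024
    print(f"{interval}s interval: ~{kib:.0f} KiB/day")
```

even at 128 seconds this works out to roughly a hundred kilobytes a day under these assumptions.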

≽^•⩊•^≼
