Stubby and NTP sync race condition issue?

jamesmacwhite · October 5, 2021, 11:11am

Just in relation to stubby with dnsmasq: https://openwrt.org/docs/guide-user/services/dns/dot_dnsmasq_stubby

I came across a race condition between NTP and DNS resolution. NTP couldn't sync to the configured OpenWrt pool servers ([0-3].openwrt.pool.ntp.org) because DNS resolution was broken: dnsmasq uses stubby as its nameserver, and stubby wasn't resolving. In a chicken-and-egg scenario, stubby was failing because the system time was wrong, so the TLS connections to the configured DoT servers couldn't be established.

Is it worth adding a note about sending NTP pool DNS requests over plain DNS, e.g. to 8.8.8.8 or 1.1.1.1?

Example for dnsmasq:

list server '/0.openwrt.pool.ntp.org/1.openwrt.pool.ntp.org/2.openwrt.pool.ntp.org/3.openwrt.pool.ntp.org/1.1.1.1'

I must admit I haven't encountered this before, but it seems that, depending on timing, DoT can break NTP: if the system time is inaccurate and the NTP pool names are resolved through stubby rather than in the clear, they never resolve.

It doesn't look like stubby has a fallback resolver, unlike DNSCrypt.

jamesmacwhite · October 5, 2021, 11:19am

It appears there is an opportunistic mode that supports this scenario, which might be a better configuration, at the cost of reduced privacy if TLS DNS requests fail.

https://dnsprivacy.org/dns_privacy_daemon_-_stubby/configuring_stubby/

config stubby 'global'
       list dns_transport 'GETDNS_TRANSPORT_TLS'
       list dns_transport 'GETDNS_TRANSPORT_UDP'
       list dns_transport 'GETDNS_TRANSPORT_TCP'
       option tls_authentication '0'

vgaetera · October 5, 2021, 12:35pm

This is a common problem for all DNS encryption methods, not limited to Stubby.
Split-DNS is configured by default and known as one of the most efficient solutions.

An alternative way is also possible and documented here:

I'm afraid that the opportunistic mode can compromise security/privacy.
DNS encryption is pointless if it can be easily blocked and silently disabled.

jamesmacwhite · October 5, 2021, 12:50pm

Thanks for your reply. Yes, I do see the point: opportunistic mode basically negates DNS privacy, since it can bypass TLS DNS connections at any time and you wouldn't really know.

So the recommended approach is basically to force NTP to use plain DNS. I guess the original DNS override would also suffice, depending on the NTP pool you use.

dlakelan · October 5, 2021, 12:54pm

What about putting at least one NTP server as a bare IP address?

vgaetera · October 5, 2021, 12:57pm

It doesn't sound reliable in the long term.
https://en.wikipedia.org/wiki/Murphy's_law

jamesmacwhite · October 5, 2021, 12:59pm

@vgaetera I just tried out the script you linked to and I can see it will add an IPv4 and IPv6 DNS override for openwrt.pool.ntp.org in a similar fashion to the original post.

However, I noticed openwrt.pool.ntp.org doesn't resolve. I confirmed this with several DNS resolvers; uk.pool.ntp.org does resolve, though, and I know the 0-3 servers under openwrt.pool.ntp.org exist. That's why I explicitly listed the 0-3 servers, as openwrt.pool.ntp.org itself doesn't seem to resolve to anything.

Is this intended?

jamesmacwhite · October 5, 2021, 1:02pm

I guess hardcoded IP addresses usually lead to trouble down the line: maybe not today, tomorrow, or in a month, but they'll probably come back to bite someone eventually. We have DNS for a reason, so removing it entirely is probably a step too far. I guess we can live with DNS requests for NTP going over plain DNS without privacy.

vgaetera · October 5, 2021, 1:08pm

Yep, that's intentional as dnsmasq forwarding includes subdomains.
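
To illustrate why a single forwarding entry for openwrt.pool.ntp.org also covers the 0-3 names, here is a rough shell sketch of the suffix-matching rule. It is not dnsmasq's actual code, and the function name matches_domain is made up for this example.

```shell
#!/bin/sh
# Sketch of the matching rule only: a query matches a forwarding domain
# if it is that domain itself or any subdomain of it.
# matches_domain is a hypothetical helper, not part of dnsmasq.
matches_domain() {
    # $1 = queried name, $2 = domain from the server entry
    case "$1" in
        "$2"|*."$2") return 0 ;;   # exact match or a subdomain
        *) return 1 ;;             # anything else uses the default upstream
    esac
}

# 0.openwrt.pool.ntp.org is a subdomain of openwrt.pool.ntp.org
if matches_domain 0.openwrt.pool.ntp.org openwrt.pool.ntp.org; then
    echo "forwarded to the plain DNS server"
fi
```

By the same rule, an entry for pool.ntp.org would also catch country pools such as uk.pool.ntp.org, while an entry for openwrt.pool.ntp.org would not.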

dlakelan · October 5, 2021, 2:16pm

Note that I said at least one hard-coded NTP server, alongside at least 3 resolved via DNS. That ensures you can reach at least one NTP server if DNS is down (unless that service goes away), which should get the clock close enough that DNS no longer fails because of this issue.

jamesmacwhite · October 5, 2021, 2:30pm

It's certainly an option, but there's no assurance that a hard-coded IP will keep working as an NTP server forever. Without a DNS lookup behind it, you have no way of knowing it has become invalid unless you check the logs. Hard-coding an NTP server by IP seems drastic when you can instead force plain DNS for any NTP lookup.

I don't think the issue is DNS itself. It's more that encrypted DNS is a problem in an edge case for services like NTP at startup: it doesn't always happen, but in some cases it creates the catch-22 scenario:

NTP doesn't work because encrypted DNS isn't working.
Encrypted DNS doesn't work because the system time is incorrect (NTP couldn't update it), so a valid TLS session can't be established, since certificate validation is time-sensitive.

Forcing plain DNS for NTP removes the main issue, which is encrypted DNS rather than DNS itself, but that's my take.
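
The time dependence behind the catch-22 can be sketched in a few lines of shell. This is only an illustration of the validity-window check that TLS libraries perform internally; the function name cert_valid_at and the validity window are made up and not taken from any real certificate.

```shell
#!/bin/sh
# Illustration only: a certificate is accepted when the local clock falls
# inside its validity window. cert_valid_at is a hypothetical helper.
cert_valid_at() {
    now=$1          # local clock, Unix seconds
    not_before=$2   # start of the validity window
    not_after=$3    # end of the validity window
    [ "$now" -ge "$not_before" ] && [ "$now" -le "$not_after" ]
}

# Made-up validity window for the example
NOT_BEFORE=1630454400   # 2021-09-01 00:00:00 UTC
NOT_AFTER=1638316800    # 2021-12-01 00:00:00 UTC

# A clock far in the past (here: 0) fails the check, so DoT never comes up
if cert_valid_at 0 "$NOT_BEFORE" "$NOT_AFTER"; then
    echo "certificate accepted"
else
    echo "certificate rejected: clock outside validity window"
fi
```

Until NTP corrects the clock, every handshake fails this way, and NTP can't correct the clock while its hostnames only resolve through the encrypted path.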

You can certainly set one NTP server to an IP, but you'll still potentially hit the issue if the host picks a DNS name that can't be resolved. By forcing at least every pool.ntp.org domain to plain DNS, you can configure any NTP pool server and it will still work, thanks to dnsmasq forwarding covering subdomains, as pointed out by @vgaetera.

dlakelan · October 5, 2021, 7:21pm

What I actually do is use 3 or 4 public stratum 1 servers in my neighborhood on my router, and everyone inside my LAN uses my router's IPv6 ULA address, hard-coded. I've never had this particular issue come up, so everything is DNS names. One way to deal with it might be to use /etc/hosts to provide local resolution of one name.

EricLuehrsen · March 13, 2022, 4:21am

NTP has a hotplug script directory. Start without TLS (or TLS without time). NTP calls all scripts in its hotplug directory when it updates. Add a script to restart the server. Also leave a time stamp file in /var/... so that restarts from /etc/init.d/... turn it on immediately. Otherwise it may be a while before NTP calls hotplug again. You can see examples for dnsmasq and unbound. It leaves your DNS exposed for maybe 10s, then cache flush and you are on your way.
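
A minimal sketch of the hotplug half of this idea, adapted to stubby. It assumes that scripts in /etc/hotplug.d/ntp/ are invoked with ACTION=stratum once the clock is synced; check the shipped dnsmasq and unbound scripts for the actual values. The stamp file path, the STAMP and RESTART_CMD variables, and the on_ntp_event function are hypothetical and not taken from any shipped script. The init-script half (starting with TLS straight away when the stamp exists) is not shown.

```shell
#!/bin/sh
# Hypothetical hotplug script, e.g. placed under /etc/hotplug.d/ntp/.
# Assumption: ntpd's hotplug call exports ACTION, with "stratum" meaning
# the clock has been synced.
STAMP="${STAMP:-/var/state/stubby.time}"                   # hypothetical path
RESTART_CMD="${RESTART_CMD:-/etc/init.d/stubby restart}"   # assumed init script

on_ntp_event() {
    # $1 = hotplug ACTION; ignore everything except the sync event
    [ "$1" = "stratum" ] || return 0
    # Stamp already present: the service was restarted on an earlier sync
    [ -f "$STAMP" ] && return 0
    # Leave a time stamp so later restarts know the clock is trustworthy
    date > "$STAMP"
    # Restart so the resolver comes back up with a valid clock
    $RESTART_CMD
}

on_ntp_event "${ACTION:-}"
```

The stamp file keeps the restart to a single occurrence per boot, which matches the point above about not waiting for NTP to call hotplug again.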