NTP Service not setting time - running manually works

Having an odd situation. Despite the ntpd daemon clearly running on the 5 devices I've just deployed, none of them have the time set correctly. I only realised it today when I started getting "certificate not valid yet" errors on SSL connections.

On all the devices ntpd is running, as defined by the system service, but the clock is never set correctly, and this persists after a reboot or a service restart. However, as soon as ntpd is run manually from the console, the time corrects instantly. OpenWrt 23.05.5. There is zero log output from ntpd on any of the devices, and the servers are reachable.

Worked example (unedited console log):

root@GVH-AIRBRIDGE-VH:~# date
Wed Oct  9 13:01:35 UTC 2024
root@GVH-AIRBRIDGE-VH:~# ps | grep ntpd
 1703 root      2048 S    {ntpd} /sbin/ujail -t 5 -n ntpd -U ntp -G ntp -C /etc/capabilities/ntpd.json -c -u -r /bin/ubus -r /usr/bin/env -r /usr/bin/jshn -r /usr/sbin/ntpd-hotplug -r /usr/share/libubox/jshn.
 1745 ntp       1096 S    /usr/sbin/ntpd -n -N -S /usr/sbin/ntpd-hotplug -p 0.openwrt.pool.ntp.org -p 1.openwrt.pool.ntp.org -p 2.openwrt.pool.ntp.org -p 3.openwrt.pool.ntp.org
 2461 root      1096 S    grep ntpd
root@GVH-AIRBRIDGE-VH:~# date
Wed Oct  9 13:01:38 UTC 2024
root@GVH-AIRBRIDGE-VH:~# ntpd -dqp 0.openwrt.pool.ntp.org
root@GVH-AIRBRIDGE-VH:~# date
Thu Oct 10 07:47:13 UTC 2024
root@GVH-AIRBRIDGE-VH:~# logread | grep ntpd
root@GVH-AIRBRIDGE-VH:~#

/etc/config/system

config system 'system'
        option hostname 'GVH-AIRBRIDGE-VH'
        option log_size '64'
        option timezone 'UTC'
        option ttylogin '0'
        option urandom_seed '0'

config timeserver 'ntp'
        option enabled '1'
        option enable_server '0'
        list server '0.openwrt.pool.ntp.org'
        list server '1.openwrt.pool.ntp.org'
        list server '2.openwrt.pool.ntp.org'
        list server '3.openwrt.pool.ntp.org'

What's going on?

Another capture; this device has been running for 12 hours and never got a time correction.

root@GVH-AP-RR-1:~# date
Thu Oct 10 06:12:05 UTC 2024
root@GVH-AP-RR-1:~# uptime
 06:12:11 up 12:42,  load average: 0.01, 0.00, 0.00
root@GVH-AP-RR-1:~# logread | grep ntpd
root@GVH-AP-RR-1:~# ps | grep ntpd
 1828 root      2048 S    {ntpd} /sbin/ujail -t 5 -n ntpd -U ntp -G ntp -C /etc/capabilities/ntpd.json -c -u -r /bin/u
 1875 ntp       1096 S    /usr/sbin/ntpd -n -N -S /usr/sbin/ntpd-hotplug -p 0.openwrt.pool.ntp.org -p 1.openwrt.pool.n
 9505 root      1096 S    grep ntpd
root@GVH-AP-RR-1:~# date
Thu Oct 10 06:12:42 UTC 2024
root@GVH-AP-RR-1:~# ntpd -dqp  0.openwrt.pool.ntp.org
root@GVH-AP-RR-1:~# date
Thu Oct 10 07:53:43 UTC 2024
root@GVH-AP-RR-1:~# uptime
 07:53:52 up 12:43,  load average: 0.00, 0.00, 0.00
root@GVH-AP-RR-1:~#

ISTR that the trigger for a sync is a hotplug event, by default fired when the WAN port comes up and has connectivity.

[EDIT] Ah no, that's the ntpclient trigger.
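
Still, you can see what ntpd itself hooks into by inspecting its hotplug helper and the hotplug directories (the paths below are taken from your ps output and a stock 23.05 install):

cat /usr/sbin/ntpd-hotplug
ls /etc/hotplug.d/ntp/ /etc/hotplug.d/iface/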

Obviously your firewall is ok, because a manual run syncs fine. Can you try this?

https://openwrt.org/docs/guide-user/services/ntp/client-server#troubleshooting

Which ntp service are you running? I seem to recall that the non-busybox version had a hotplug script that errored out, stopping all further hotplug scripts and resulting in NTP issues, but that was several years ago.

There might be something to this: these devices are configured as dumb APs, so there is no WAN interface (only a bridged LAN), which might mean time updates are never triggered.

Any ideas where to look to confirm this?

Whatever is out of the box. I assume busybox, no additional ntp packages installed.

root@GVH-AP-RR-1:/# opkg list-installed | grep ntp
root@GVH-AP-RR-1:/#

Nope... still nothing. This is very weird.

root@GVH-AP-VH-1:~# cat /etc/hotplug.d/ntp/20-ntpd-logger
#!/bin/sh
[ $ACTION = "step" ]    && logger -t ntpd Time set, stratum=$stratum interval=$poll_interval offset=$offset
[ $ACTION = "stratum" ] && logger -t ntpd Stratum change, stratum=$stratum interval=$poll_interval offset=$offset
root@GVH-AP-VH-1:~# reboot
root@GVH-AP-VH-1:~# Connection to gvh-ap-vh-1.lan closed by remote host.
Connection to gvh-ap-vh-1.lan closed.
seb@jumpbox:~$ ssh root@gvh-ap-vh-1.lan


BusyBox v1.36.1 (2024-09-28 16:33:21 UTC) built-in shell (ash)

  _______                     ________        __
 |       |.-----.-----.-----.|  |  |  |.----.|  |_
 |   -   ||  _  |  -__|     ||  |  |  ||   _||   _|
 |_______||   __|_____|__|__||________||__|  |____|
          |__| W I R E L E S S   F R E E D O M
 -----------------------------------------------------
 OpenWrt 23.05.5, r24106-10cc5fcd00
 -----------------------------------------------------
=== WARNING! =====================================
There is no root password defined on this device!
Use the "passwd" command to set up a new password
in order to prevent unauthorized SSH logins.
--------------------------------------------------
root@GVH-AP-VH-1:~# date
Thu Oct 10 10:34:28 UTC 2024
root@GVH-AP-VH-1:~# logread | grep ntp
Thu Oct 10 10:33:52 2024 kern.info kernel: [    0.001271] Mountpoint-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
root@GVH-AP-VH-1:~# ps | grep ntp
 1724 root      2048 S    {ntpd} /sbin/ujail -t 5 -n ntpd -U ntp -G ntp -C /etc/capabilities/ntpd.json -c -u -r /bin/ubus -r /usr/bin/env -r /usr/bin/jshn -r /usr/sbin/ntpd-hotplug -r /usr/share/libubox/jshn.
 1750 ntp       1096 S    /usr/sbin/ntpd -n -N -S /usr/sbin/ntpd-hotplug -p 0.openwrt.pool.ntp.org -p 1.openwrt.pool.ntp.org -p 2.openwrt.pool.ntp.org -p 3.openwrt.pool.ntp.org
 2202 root      1096 S    grep ntp
root@GVH-AP-VH-1:~# uptime
 10:35:09 up 1 min,  load average: 0.45, 0.23, 0.08
root@GVH-AP-VH-1:~# date
Thu Oct 10 10:35:12 UTC 2024
root@GVH-AP-VH-1:~# logread | grep ntp
Thu Oct 10 10:33:52 2024 kern.info kernel: [    0.001271] Mountpoint-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
root@GVH-AP-VH-1:~# ping 0.openwrt.pool.ntp.org
PING 0.openwrt.pool.ntp.org (5.39.80.51): 56 data bytes
64 bytes from 5.39.80.51: seq=0 ttl=55 time=13.788 ms
64 bytes from 5.39.80.51: seq=1 ttl=55 time=16.706 ms
^C
--- 0.openwrt.pool.ntp.org ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 13.788/15.247/16.706 ms
root@GVH-AP-VH-1:~# logread | grep ntp
Thu Oct 10 10:33:52 2024 kern.info kernel: [    0.001271] Mountpoint-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
root@GVH-AP-VH-1:~# ntpd -dqp  0.openwrt.pool.ntp.org
root@GVH-AP-VH-1:~# date
Thu Oct 10 11:16:29 UTC 2024
root@GVH-AP-VH-1:~# uptime
 11:16:32 up 3 min,  load average: 0.13, 0.17, 0.08
root@GVH-AP-VH-1:~# logread | grep ntp
Thu Oct 10 10:33:52 2024 kern.info kernel: [    0.001271] Mountpoint-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
root@GVH-AP-VH-1:~#

Can you add

[ $ACTION = "step" ]    && logger -t ntpd Time set, stratum=$stratum interval=$poll_interval offset=$offset
[ $ACTION = "stratum" ] && logger -t ntpd Stratum change, stratum=$stratum interval=$poll_interval offset=$offset
[ $ACTION = "periodic" ] && logger -t ntpd Periodic run, stratum=$stratum interval=$poll_interval offset=$offset
[ $ACTION = "unsync" ] && logger -t ntpd Sync failure, stratum=$stratum interval=$poll_interval offset=$offset

to your hotplug script? That might help unearth some more actions.

I can try, but I think the problem is deeper than that. I ran tcpdump on port 123 for about an hour and not a single packet from ntpd was captured.

root@GVH-AP-VH-1:~# uptime
 11:26:27 up 13 min,  load average: 0.24, 0.10, 0.07
root@GVH-AP-VH-1:~# ps | grep ntp
 1724 root      2048 S    {ntpd} /sbin/ujail -t 5 -n ntpd -U ntp -G ntp -C /etc/capabilities/ntpd.json -c -u -r /bin/ubus -r /usr/bin/env -r /usr/bin/jshn -r /usr/sbin/ntpd-hotplug -r /usr/share/libubox/jshn.
 1750 ntp       1096 S    /usr/sbin/ntpd -n -N -S /usr/sbin/ntpd-hotplug -p 0.openwrt.pool.ntp.org -p 1.openwrt.pool.ntp.org -p 2.openwrt.pool.ntp.org -p 3.openwrt.pool.ntp.org
 2494 root      1096 R    grep ntp
root@GVH-AP-VH-1:~# tcpdump -evni any udp port 123
tcpdump: data link type LINUX_SLL2
tcpdump: listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
^C
0 packets captured
3 packets received by filter
0 packets dropped by kernel
root@GVH-AP-VH-1:~# uptime
 12:27:43 up  1:14,  load average: 0.00, 0.01, 0.00
root@GVH-AP-VH-1:~# ps | grep ntp
 1724 root      2048 S    {ntpd} /sbin/ujail -t 5 -n ntpd -U ntp -G ntp -C /etc/capabilities/ntpd.json -c -u -r /bin/ubus -r /usr/bin/env -r /usr/bin/jshn -r /usr/sbin/ntpd-hotplug -r /usr/share/libubox/jshn.
 1750 ntp       1096 S    /usr/sbin/ntpd -n -N -S /usr/sbin/ntpd-hotplug -p 0.openwrt.pool.ntp.org -p 1.openwrt.pool.ntp.org -p 2.openwrt.pool.ntp.org -p 3.openwrt.pool.ntp.org
 3101 root      1096 R    grep ntp
root@GVH-AP-VH-1:~#

When I manually run ntpd from another console, the traffic is captured as expected.

Manual ntpd run:

root@GVH-AP-VH-1:~# tcpdump -evni any udp port 123
tcpdump: data link type LINUX_SLL2
tcpdump: listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
12:31:35.261462 eth0  Out ifindex 2 [MAC] ethertype IPv4 (0x0800), length 96: (tos 0x48, ttl 64, id 53209, offset 0, flags [DF], proto UDP (17), length 76)
    10.70.70.153.59828 > 131.111.8.60.123: NTPv4, Client, length 48
        Leap indicator:  (0), Stratum 0 (unspecified), poll 0 (1s), precision 0
        Root Delay: 0.000000, Root dispersion: 0.000000, Reference-ID: (unspec)
          Reference Timestamp:  0.000000000
          Originator Timestamp: 0.000000000
          Receive Timestamp:    0.000000000
          Transmit Timestamp:   2450472225.970160211 (1977-08-26T22:43:45Z)
            Originator - Receive Timestamp:  0.000000000
            Originator - Transmit Timestamp: 2450472225.970160211 (1977-08-26T22:43:45Z)
12:31:35.278499 lan   In  ifindex 3 [MAC] ethertype IPv4 (0x0800), length 96: (tos 0x0, ttl 49, id 48866, offset 0, flags [DF], proto UDP (17), length 76)
    131.111.8.60.123 > 10.70.70.153.59828: NTPv4, Server, length 48
        Leap indicator:  (0), Stratum 2 (secondary reference), poll 0 (1s), precision -26
        Root Delay: 0.000549, Root dispersion: 0.000640, Reference-ID: 0xc5cd3407
          Reference Timestamp:  3937551814.320691908 (2024-10-10T12:23:34Z)
          Originator Timestamp: 2450472225.970160211 (1977-08-26T22:43:45Z)
          Receive Timestamp:    3937552295.325844175 (2024-10-10T12:31:35Z)
          Transmit Timestamp:   3937552295.325869511 (2024-10-10T12:31:35Z)
            Originator - Receive Timestamp:  +1487080069.355683964
            Originator - Transmit Timestamp: +1487080069.355709299
12:31:35.278512 br-lan In  ifindex 6 [MAC] ethertype IPv4 (0x0800), length 96: (tos 0x0, ttl 49, id 48866, offset 0, flags [DF], proto UDP (17), length 76)
    131.111.8.60.123 > 10.70.70.153.59828: NTPv4, Server, length 48
        Leap indicator:  (0), Stratum 2 (secondary reference), poll 0 (1s), precision -26
        Root Delay: 0.000549, Root dispersion: 0.000640, Reference-ID: 0xc5cd3407
          Reference Timestamp:  3937551814.320691908 (2024-10-10T12:23:34Z)
          Originator Timestamp: 2450472225.970160211 (1977-08-26T22:43:45Z)
          Receive Timestamp:    3937552295.325844175 (2024-10-10T12:31:35Z)
          Transmit Timestamp:   3937552295.325869511 (2024-10-10T12:31:35Z)
            Originator - Receive Timestamp:  +1487080069.355683964
            Originator - Transmit Timestamp: +1487080069.355709299
^C
10 packets captured
13 packets received by filter
0 packets dropped by kernel

I just replaced the BusyBox ntpd with the ntpd package from opkg - it is now working as expected.

EDIT: looks like that was a red herring; the restart of the service was what fixed it.

I'm smelling DNS here. Is the NTP service started before name resolution is available?
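
You can compare the boot ordering via the rc.d symlinks (service names and start numbers may differ per build):

ls /etc/rc.d/ | grep -Ei 'dnsmasq|sysntpd'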

In fact, can you deliberately set the clock wrong (via whichever route) and then do service ntpd restart and see what happens? (With the busybox version)
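
A minimal version of that test, assuming the busybox daemon (its init script is sysntpd):

date --set="2024-10-01 00:00:00"   # deliberately skew the clock
service sysntpd restart
sleep 10; date                     # should have stepped back to the real time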

Why? Do you think ntpd caches the IPs for the servers at startup?

Trying now...

Yes, that worked: restarting the service on a live system updates the time correctly; rebooting does not. As you suggest, it's possibly a startup timing issue, though I'm surprised it doesn't log an error when it can't resolve any of the named servers.

I will put a cron job in to restart ntpd every night to avoid future problems, but I feel this needs to be fixed in OpenWrt, as it's not possible to guarantee DNS is available when ntpd starts, especially when the upstream link is wireless.
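
Something like this in /etc/crontabs/root should do as the workaround (sysntpd being the busybox service; pick whatever hour suits):

# restart busybox ntpd nightly in case it started before DNS was up
0 3 * * * /etc/init.d/sysntpd restart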

Perhaps a hotplug trigger to restart ntpd on interface changes?

For APs, I added a hotplug script to restart ntp after the LAN interface comes up.

$ cat /etc/hotplug.d/iface/99-sysntpd 
[ ifup = "$ACTION" ] && [ lan = "$INTERFACE" ] && service sysntpd restart
logger -t ntp-hotplug "Restarted ntp after lan up."

Great idea. I'm also going to test whether having the full ntpd installed copes with reboots; possibly the busybox version lacks any backoff/retry logic to save on size.

This does indeed work; the full ntpd can handle the situation.
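
For reference, the swap was roughly this (I also disabled sysntpd so the two daemons don't fight, which may or may not be strictly necessary):

opkg update
opkg install ntpd
/etc/init.d/sysntpd disable
/etc/init.d/ntpd enable

The reboot test below confirms it: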

root@GVH-AP-VH-1:~# /etc/init.d/ntpd stop
root@GVH-AP-VH-1:~# date
Thu Oct 10 13:03:59 UTC 2024
root@GVH-AP-VH-1:~# date --set="2024-10-2 18:00:00"
Wed Oct  2 18:00:00 UTC 2024
root@GVH-AP-VH-1:~# date
Wed Oct  2 18:00:05 UTC 2024
root@GVH-AP-VH-1:~# reboot
root@GVH-AP-VH-1:~# Connection to gvh-ap-vh-1.lan closed by remote host.
Connection to gvh-ap-vh-1.lan closed.
seb@jumpbox:~$ ssh root@gvh-ap-vh-1.lan


BusyBox v1.36.1 (2024-09-28 16:33:21 UTC) built-in shell (ash)

  _______                     ________        __
 |       |.-----.-----.-----.|  |  |  |.----.|  |_
 |   -   ||  _  |  -__|     ||  |  |  ||   _||   _|
 |_______||   __|_____|__|__||________||__|  |____|
          |__| W I R E L E S S   F R E E D O M
 -----------------------------------------------------
 OpenWrt 23.05.5, r24106-10cc5fcd00
 -----------------------------------------------------
=== WARNING! =====================================
There is no root password defined on this device!
Use the "passwd" command to set up a new password
in order to prevent unauthorized SSH logins.
--------------------------------------------------
root@GVH-AP-VH-1:~# date
Thu Oct 10 13:05:25 UTC 2024
root@GVH-AP-VH-1:~#

It's always DNS!

:)
Glad you got it sorted.

"Conversation" starts here: NTP issues on a dumb AP configuration - #13 by Cthulhu88

What I am using right now for my dumb AP and my dumb switch: have my main router host an NTP server (the main router doesn't exhibit the problem), then have all the dumb devices pull NTP data from the main router via its raw IPv4 address, 192.168.0.1.
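
In UCI terms that's just the same timeserver section as earlier in the thread, pointed at the raw IP, e.g.:

config timeserver 'ntp'
        option enabled '1'
        option enable_server '0'
        list server '192.168.0.1'

plus option enable_server '1' on the router itself so it answers queries.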

Also, something that I haven't mentioned in that post chain is that the problem is very likely caused by a bad /etc/resolv.conf, not necessarily a failure to resolve names.
See my main router setup:

dnsmasq writes a local resolv.conf at its service start:

search intranet
nameserver 127.0.0.1
nameserver ::1

dnsmasq forwards global internet requests to dnsproxy:

server=127.0.0.1#5353
server=::1#5353

Because dnsproxy spams my syslog with a flood of network-unreachable errors until both wan and wan6 are up, I only start dnsproxy via the following hotplug script:

INTERFACES="wan wan6"
STATE="/var/run/dnsproxy"

# Exit unless the event concerns one of the tracked interfaces.
check_interface() {
    local iface

    for iface in $INTERFACES; do
        [ $INTERFACE = $iface ] && return
    done

    exit 0
}

# Succeed only once a state file exists for every tracked interface.
all_up() {
    local iface

    for iface in $INTERFACES; do
        [ ! -f "$STATE/$iface" ] && return 1
    done

    return 0
}

# Per-interface state files live under $STATE; the flock serialises
# concurrent hotplug events.
case $ACTION in
    ifup)
        check_interface && mkdir -p "$STATE" && {
            flock -x 1000 && touch "$STATE/$INTERFACE" && all_up && {
                service dnsproxy start
            }
        } 1000>"$STATE/.lock"
        ;;
    ifdown)
        check_interface && mkdir -p "$STATE" && {
            flock -x 1000 && rm -f "$STATE/$INTERFACE" && {
                service dnsproxy stop
            }
        } 1000>"$STATE/.lock"
        ;;
esac

exit 0

This leaves DNS unresolvable for about 10 seconds, but sysntpd doesn't deadlock and works correctly once DNS service becomes available.

Not sure if intentional, but your script will log "Restarted ntp after lan up." every time it's executed, not just when the conditions above are met.

Good point. It was quick and dirty, but it generally only runs once anyway on an AP.
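
For anyone copying it later, moving the logger inside the condition fixes that:

[ ifup = "$ACTION" ] && [ lan = "$INTERFACE" ] && {
    service sysntpd restart
    logger -t ntp-hotplug "Restarted ntp after lan up."
}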
