Collectd rrdtool data corruption (warning due to system time change)

Since some time with every reboot I am getting a lot errors like that:

Oct 26 00:35:42 router Sat Oct 26 00:00:54 2019 daemon.err collectd[4662]: rrdtool plugin: rrd_update_r failed: /mnt/sdb1/tmp/rrd/router/conntrack/conntrack-max.rrd: illegal attempt to update using time 1572040854 when last update time is 1572042817 (minimum one second step)

Is there any way to get rid of those? Preferably without wiping historical data.

You are trying to update with a time in the past.
Get the correct time first, and only then update the rrd.

1 Like

That is now normal at router startup since the change a few weeks ago, as collectd errors are now shown more clearly in the system log. The error has been there earlier, but is now more visible.

Like tmomas indicated, the reason lies in the system clock changes at startup. Routers do not have realtime clocks, so time starts wrongly.

Collectd starts before there is ntp time from internet, and then after ntp had adjusted system time, notices the time change at the next data update and complains.

I tried patching that complaint away from collectd sources but did not quickly figure a reasonable way to disable just that initial complaint.

Might it make sense to delay start of collectd than?

Delay to when?

There is no guarantee that there will be connectivity and NTP time update. The current time might be right or there may be no change.

It might be delayed some arbitrary amount like 20 seconds at boot time, but there is no guarantee that it would be enough, especially with pppoe and vpn style connectivity.

Was thinking about delaying until NTP will kick in however I can't see ntpstat nor have an idea how to query ntpd status.

All these are extra packages.
You can verify that you have the correct time with date
Otherwise you could start ntpd with -n -d to be verbose and in foreground.

I think that you are going astray in your thoughts.

We are not talking about manually verifying if the router has the correct time. We are talking about a possible ntp time change event triggering a delayed service start at boot time (if there even is a ntp event).

1 Like

ntpd release hotplug events that could be used. Those event contain the stratum info.

I use this in my own build to output ntp events into system log:

root@router1:~# cat /etc/hotplug.d/ntp/20-ntpd-logger
#!/bin/sh
[ $ACTION = "step" ]    && logger -t ntpd Time set, stratum=$stratum interval=$poll_interval offset=$offset
[ $ACTION = "stratum" ] && logger -t ntpd Stratum change, stratum=$stratum interval=$poll_interval offset=$offset

That shows the ntp events clearly, both at the boot time and later:

root@router1:~# logread | grep ntpd
Sun Oct 27 12:12:50 2019 user.notice ntpd: Time set, stratum=16 interval=32 offset=263.995011
Sun Oct 27 12:12:56 2019 user.notice ntpd: Stratum change, stratum=3 interval=1 offset=0.000494

This below is what happens at reboot. Collectd starts first, and a bit later ntpd gets the right time and adjusts the system time, which collectd then notices at the update cycle.

root@router1:~# logread | grep -E "collectd|ntp"
Sun Oct 27 12:08:23 2019 daemon.err collectd[4235]: configfile: stat (/etc/collectd/conf.d) failed: No such file or directory
Sun Oct 27 12:08:23 2019 daemon.err collectd[4235]: plugin_load: plugin "entropy" successfully loaded.
Sun Oct 27 12:08:23 2019 daemon.err collectd[4235]: plugin_load: plugin "conntrack" successfully loaded.
...
Sun Oct 27 12:08:23 2019 daemon.err collectd[4235]: plugin_load: plugin "thermal" successfully loaded.
Sun Oct 27 12:08:23 2019 daemon.err collectd[4235]: cpufreq plugin: Found 2 CPUs
Sun Oct 27 12:08:23 2019 daemon.err collectd[4235]: Initialization complete, entering read-loop.
Sun Oct 27 12:08:24 2019 daemon.err collectd[4235]: rrdtool plugin: rrd_update_r failed: /tmp/rrd/router1/load/load.rrd: opening '/tmp/rrd/router1/load/loa
d.rrd': No such file or directory
Sun Oct 27 12:08:24 2019 daemon.err collectd[4235]: rrdtool plugin: rrd_update_r failed: /tmp/rrd/router1/thermal-thermal_zone4/temperature.rrd: expected 1
 data source readings (got 0) from /tmp/rrd/router1/thermal-thermal_zone4/temperature.rrd:...
Sun Oct 27 12:08:24 2019 daemon.err collectd[4235]: rrdtool plugin: rrd_update_r failed: /tmp/rrd/router1/thermal-thermal_zone0/temperature.rrd: opening '/
tmp/rrd/router1/thermal-thermal_zone0/temperature.rrd': No such file or directory
Sun Oct 27 12:12:50 2019 user.notice ntpd: Time set, stratum=16 interval=32 offset=263.995011
Sun Oct 27 12:12:56 2019 user.notice ntpd: Stratum change, stratum=3 interval=1 offset=0.000494
Sun Oct 27 12:13:17 2019 daemon.err collectd[4235]: Not sleeping because the next interval is 234.000 seconds in the past!
Sun Oct 27 12:13:18 2019 daemon.err collectd[4235]: uc_update: Value too old: name = router1/thermal-thermal_zone2/temperature; value time = 1572171198.521
; last cache update = 1572171198.528;
Sun Oct 27 12:13:18 2019 daemon.err collectd[4235]: Filter subsystem: Built-in target `write': Dispatching value to all write plugins failed with status -1
.
Sun Oct 27 12:13:18 2019 daemon.err collectd[4235]: uc_update: Value too old: name = router1/thermal-thermal_zone0/temperature; value time = 1572171198.522
; last cache update = 1572171198.528;
Sun Oct 27 12:13:18 2019 daemon.err collectd[4235]: rrdtool plugin: cu_rrd_create_file: File "/tmp/rrd/router1/thermal-thermal_zone0/temperature.rrd" is al
ready being created.
Sun Oct 27 12:13:18 2019 daemon.err collectd[4235]: rrdtool plugin: cu_rrd_create_file (/tmp/rrd/router1/thermal-thermal_zone0/temperature.rrd) failed.
Sun Oct 27 12:13:18 2019 daemon.err collectd[4235]: Filter subsystem: Built-in target `write': Dispatching value to all write plugins failed with status -1
.
Sun Oct 27 12:13:18 2019 daemon.err collectd[4235]: Filter subsystem: Built-in target `write': Some write plugin is back to normal operation. `write' succe
eded.
Sun Oct 27 12:13:18 2019 daemon.err collectd[4235]: uc_update: Value too old: name = router1/thermal-thermal_zone10/temperature; value time = 1572171198.52
1; last cache update = 1572171198.528;
Sun Oct 27 12:13:18 2019 daemon.err collectd[4235]: Filter subsystem: Built-in target `write': Dispatching value to all write plugins failed with status -1
.
Sun Oct 27 12:13:18 2019 daemon.err collectd[4235]: Filter subsystem: Built-in target `write': Some write plugin is back to normal operation. `write' succe
eded.
Sun Oct 27 12:13:18 2019 daemon.err collectd[4235]: rrdtool plugin: rrd_update_r failed: /tmp/rrd/router1/thermal-thermal_zone4/temperature.rrd: illegal at
tempt to update using time 1572171199 when last update time is 1572171199 (minimum one second step)
Sun Oct 27 12:13:18 2019 daemon.err collectd[4235]: rrdtool plugin: rrd_update_r failed: /tmp/rrd/router1/thermal-cooling_device2/gauge.rrd: illegal attemp
t to update using time 1572171199 when last update time is 1572171199 (minimum one second step)

Ps.
I think that we are not talking about actual "data corruption" here. Just about collectd noticing the system time change and handling it in its own way.

1 Like

Yeah, I realized already that rrdtool db should be ok. And thank you a lot for your hints. Will have something to work on for a few long autumn evenings :wink:

In /etc/init.d/collectd file just after line start_service() {
I have added:

        while [ ! -f "/var/state/dnsmasqsec" ]; do sleep 1; done

EDIT: actually it doesn't work as intended. Will continue playing with collectd or will ignore those errors.

I believe I found the workaround to get rid of those error messages. With previous attempt existance of file created after NTP sync was checked prior to execution of NTP script so in the result neither collectd nor ntp was started.
For it to work START variable for collectd init script should be higher (>98) than the one for NTP script.
/etc/init.d/collectd changed in line 4 and addition of line 342:

#!/bin/sh /etc/rc.common
# Copyright (C) 2006-2016 OpenWrt.org

START=99
STOP=10
/cut/
start_service() {
	[ ! -z "$(uci get system.ntp.server)" ] && while [ ! -f "/var/state/dnsmasqsec" ]; do sleep 1; done
	process_config

	procd_open_instance
	procd_set_param command /usr/sbin/collectd
	procd_append_param command -C "$COLLECTD_CONF"
	procd_append_param command -f # don't daemonize
	procd_set_param nice "$NICEPRIO"
	procd_set_param stderr 1
	procd_set_param respawn
	procd_close_instance
}
/cut/

And this is the result from syslog:

Oct 28 12:08:33 router ntpd: Time set, stratum=16 interval=32 offset=100.057514
Oct 28 12:09:06 router ntpd: Stratum change, stratum=3 interval=32 offset=-0.000095
Oct 28 12:09:07 router collectd[6647]: configfile: stat (/etc/collectd/conf.d) failed: No such file or directory
Oct 28 12:09:07 router collectd[6647]: plugin_load: plugin "entropy" successfully loaded.
Oct 28 12:09:07 router collectd[6647]: plugin_load: plugin "conntrack" successfully loaded.
Oct 28 12:09:07 router collectd[6647]: plugin_load: plugin "iwinfo" successfully loaded.
Oct 28 12:09:07 router collectd[6647]: plugin_load: plugin "ping" successfully loaded.
Oct 28 12:09:07 router collectd[6647]: plugin_load: plugin "disk" successfully loaded.
Oct 28 12:09:07 router collectd[6647]: plugin_load: plugin "memory" successfully loaded.
Oct 28 12:09:07 router collectd[6647]: plugin_load: plugin "cpu" successfully loaded.
Oct 28 12:09:07 router collectd[6647]: plugin_load: plugin "load" successfully loaded.
Oct 28 12:09:07 router collectd[6647]: plugin_load: plugin "rrdtool" successfully loaded.
Oct 28 12:09:07 router collectd[6647]: rrdtool plugin: RRASingle = true: creating only AVERAGE RRAs
Oct 28 12:09:07 router collectd[6647]: plugin_load: plugin "interface" successfully loaded.
Oct 28 12:09:07 router collectd[6647]: plugin_load: plugin "cpufreq" successfully loaded.
Oct 28 12:09:07 router collectd[6647]: plugin_load: plugin "uptime" successfully loaded.
Oct 28 12:09:07 router collectd[6647]: plugin_load: plugin "thermal" successfully loaded.
Oct 28 12:09:07 router collectd[6647]: cpufreq plugin: Found 2 CPUs
Oct 28 12:09:07 router collectd[6647]: Initialization complete, entering read-loop.
Oct 28 12:11:46 router ntpd: Stratum change, stratum=4 interval=32 offset=0.002022
Oct 28 12:11:48 router ntpd: Stratum change, stratum=3 interval=32 offset=-0.000704

Before editing init script has to be disabled:

/etc/init.d/collectd disable

and once done re-enabled:

/etc/init.d/collectd enable
2 Likes

This topic was automatically closed 10 days after the last reply. New replies are no longer allowed.