[Solved] No luck with traffic monitoring - nlbwmon, nvstat, luci-app-statistics

Hey everyone,

I'm a fairly new user to OpenWRT and now LEDE. I got a Linksys WRT1200AC router and installed LEDE on it, which is mostly running great. However, one of the things I'm missing from my Asus Router before is network/traffic statistics.

So I installed luci-app-statistics. Which is almost what I want feature-wise and seems to work fine for some time after installation (a few hours). I'm writing the traffic data to an sdcard that is mounted at boot time.
After some time though, none of my graphs contain data. Sometimes only my tun0 (OpenVPN interface) graph shows some data (so its not like nothing works). But not eth1, nor cpu, nor thermal, etc. All are empty. A reboot does not change this.
Uninstalling the package and its dependencies fixes it for a few hours.
I'm wondering if it has to do with my OpenVPN interface in some way, maybe everything stops working when the the tun0 interface goes down because of a disconnect?

I also tried luci-app-nlbwmon. Very limited feature set, but it would meet my needs. Also configured to save data files on an sdcard mounted at boot. Works great - until I need to reboot. Then it will not re-read old data and start counting from zero (even though the data file is still present on the sdcard).

I also tried nvstat. This is not really all that suited to my needs, but it mostly works. However, when I rebooted today, a new date appeared in my graphs: "09/10/23". No idea where that comes from or how to get rid of it. Since it's "more recent" than the current date, it shows up in most graphs - and I don't know how to remove it.

If anyone has a clue as to what I am doing wrong, I'm open for ideas or hints. As far as I can tell, my data storage is working fine, files are there and timestamps are regularly updated. LEDE version is 17.01.4.

Thanks!

edit: now that I wrote this down, I'm starting to think that maybe that ominous date of "09/10/23" is playing a role with all my problems here? Maybe that's why luci-app-statistics refuses to show data at some point, because that date gets inserted into the statistics somewhere? Where could I start looking for this?

Have you looked at YAMon (http://usage-monitoring.com)? It is a different traffic monitoring solution.

Let me know if you have any questions about it... I care for and feed YAMon.

Al

Thanks a lot for the reply! I really appreciate when developers are active in the community. Honestly I haven't cared to check out yamon yet because it is not contained in the official repository (I wanted to fiddle with non-standard stuff as little as possible), and the aesthetics are a bit offputting to me (I realize that this is a petty complaint when it comes to things like this).
Ideally I wanted to have something that I can configure and access through luci that also matches the overall design. But if I don't get any of the above solutions to work properly, I might as well just give yamon a try.

Can you provide the output of "nlbw -c list" and "ls -lh" of your database directory? Also mind to send me the data files via private message if you still happen to have them around?

Did you check that the location where the files are placed is actually mounted? Maybe the files are placed in internal flash instead which runs full? Did you check "df -h" for free space and "dmesg" output for errors?

Ji jow,

thanks for the reply. Here's the output:

Blockquote
root@WRT1200AC:~# nlbw -c list
2018-01-01
root@WRT1200AC:~# ls -lh /mnt/data/nlbwmon/
-rw-r-----    1 root     root        3.6K Jan  9 08:42 20180101.db
root@WRT1200AC:~# df -h
Filesystem                Size      Used Available Use% Mounted on
/dev/root              2.8M      2.8M         0 100% /rom
tmpfs                   250.8M      1.4M    249.4M   1% /tmp
/dev/sda1                 4.7G     65.1M      4.4G   1% /overlay
overlayfs:/overlay        4.7G     65.1M      4.4G   1% /
tmpfs                   512.0K         0    512.0K   0% /dev
/dev/sda2                 9.7G     37.5M      9.2G   0% /mnt/data

.

root@WRT1200AC:~# vi /etc/config/nlbwmon

root@WRT1200AC:~# cat /etc/config/nlbwmon

config nlbwmon
            option database_interval '1'
            option database_limit '10000'
            option protocol_database '/usr/share/nlbwmon/protocols'
            option database_compress '0'
            option database_generations '24'
            option refresh_interval '5m'
            option commit_interval '10m'
            list local_network '192.168.178.0/24'
            list local_network '192.168.10.0/24'
            list local_network 'lan'
            list local_network 'wlan0'
            option database_directory '/mnt/data/nlbwmon'

I'll try sending you the (current) datafile. If I reboot the router, this file will not be read again and counts will start from zero.

Best regards,
Hendrik

Thanks for the feedback. Can you also test nlbw -c show -t 2018-01-01 and see if it yields an error or shows the proper data?

That's working fine, I'm seeing the same data there as in luci (everything beginning from last boot). Problems will only start when rebooting, and the old data will not be re-read (I think). That's really my only issue. Everything is working fine, except that every reboot zeros all data again.

Did I make any obvious stupid mistake when configuring nlbwmon?

Also I do not think that the issue is with storage itself. I'm storing the data for luci-app-statistics on the same mount point, and that data gets successfully re-read on reboot and is not lost, same goes for the data of nvstat.

One more thing: maybethere is some weird date (see my comment to a date far in the future in nvstat) added to the log when rebooting and that causes the file to not be read again? Do you know what would happen if a dataset with a date far in the future would be added to the DB file?

No, your configuration appears fine. Can you check if an /etc/init.d/nlbwmon stop; /etc/init.d/nlbwmon start results in the same, counter-resetting behavior?

I suspect the nlbwmon startup races with your partition mounting... first the daemon starts, finds no databases, initializes new ones; then the partition is eventually mounted and nlbwmon overwrites the preexisting database file with the freshly initialized one. - I think I'll add a workaround for this to the package...

It should be easy to test this assumption by moving the database directory to a persistent location on the internal overlay, e.g. /usr/lib/nlbwmon/, then rebooting the device and see if the counters are still reset.

If the assumption turns out to be correct you can try the following in the meanwhile to work around the mount race condition:

  • Disable automatic startup with /etc/init.d/nlbwmon disable
  • Create a script /etc/hotplug.d/block/99-start-nlbwmon with the following contents:

--

#!/bin/sh

sleep 5
if [ "$ACTION" = add ] && grep -sq /mnt/data /proc/mounts; then
    /etc/init.d/nlbwmon start
fi

That was one of my suspicions also. As per your advice, I set the storage to /usr/lib/nlbwmon, and restarted the service. The datafile was created.
Then I rebooted the router, and interestingly enough, immediately after the reboot no data file was there - and the counter showed zero again.
I'm guessing that that's the same thing happening when I'm storing the datafile on the sdcard.

That's really odd. Do you have any more ideas what to do?

Interesting... so the file was present before the reboot and absent after reboot? This sounds odd, will need to see if I can reproduce it on my end as well.

Exactly. There before the reboot, gone after the reboot. Maybe that is a general issue with my setup? I'll check with a random file that I'll create and then reboot shortly after to see if it is still there.
"Reboot" meaning a proper reboot through the command line or luci of course, not a hard poweroff.

Thanks for all your effort!

Just to rule out any misbehavior in nlbwmon, you can also add something like ls -lh /usr/lib/nlbwmon | logger -t debug to /etc/init.d/nlbwmon, procedure start_service, then check logread after reboot to see if the file is already missing when the init script starts. If not, there might be something wrong with nlbwmon itself.

Also carefuly review dmesg and look for overlay or ubifs related errors.

Will do and report back.

I just confirmed the behaviour again.
I also created an empty file through "touch" in both /mnt/data/nlbwmon and /usr/lib/nlbwmon and rebooted, the file is still there after the reboot in both directories.

I do see one ubi related error in dmesg 8 seconds into boot:

[ 8.283229] UBIFS error (pid: 783): cannot open "ubi1:syscfg", error -19

No idea if that's relevant. I also added the "ls" to the init script - which points mo the date issue, interestingly enough. That's the first time I see it in another place than vnstat:

Sun Sep 10 07:43:32 2023 user.notice debug: -rw-r----- 1 root root 2.6K Jan 9 2018 20180101.db

So my hardware time seems completely off at boot time. Maybe that's the issue after all? Any idea how to fix this to see if it helps?

This is the point at boot where the date gets switched to the correct date (I'm guessing through ntp since the network interface comes up, although that should happen way before nlbwmon gets started, nlbwmon is S60, network is S20, and sysntpd is S39):

Sun Sep 10 07:43:37 2023 daemon.notice netifd: Network device 'wlan1' link is up
Sun Sep 10 07:43:37 2023 daemon.notice netifd: Interface 'wlan0' is enabled
Sun Sep 10 07:43:37 2023 daemon.notice netifd: Interface 'wlan0' is setting up now
Sun Sep 10 07:43:37 2023 daemon.notice netifd: Interface 'wlan0' is now up
Sun Sep 10 07:43:37 2023 daemon.info dnsmasq[2875]: reading /tmp/resolv.conf.auto
Sun Sep 10 07:43:37 2023 daemon.info dnsmasq[2875]: using nameserver 209.222.18.218#53
Sun Sep 10 07:43:37 2023 daemon.info dnsmasq[2875]: using nameserver 209.222.18.222#53
Sun Sep 10 07:43:37 2023 daemon.info dnsmasq[2875]: using local addresses only for domain lan
Sun Sep 10 07:43:37 2023 daemon.info dnsmasq[2875]: using nameserver 209.222.18.222#53
Sun Sep 10 07:43:37 2023 daemon.info dnsmasq[2875]: using nameserver 209.222.18.218#53
Sun Sep 10 07:43:37 2023 daemon.info dnsmasq[2875]: using nameserver 209.222.18.222#53
Sun Sep 10 07:43:37 2023 daemon.info dnsmasq[2875]: using nameserver 209.222.18.218#53
Sun Sep 10 07:43:37 2023 daemon.notice netifd: Interface 'wlan0' has link connectivity
Sun Sep 10 07:43:37 2023 daemon.notice procd: /etc/rc.d/S96led: setting up led wlan24
Sun Sep 10 07:43:37 2023 user.notice vpn-policy-routing [3104]: Creating table 'wan/eth1/178.24.162.254' [✓]
Sun Sep 10 07:43:38 2023 user.notice firewall: Reloading firewall due to ifup of wlan0 (wlan1)
Sun Sep 10 07:43:38 2023 daemon.info procd: - init complete -
Sun Sep 10 07:43:38 2023 daemon.notice vnstatd[3467]: vnStat daemon 1.12 started. (uid:0 gid:0)
Sun Sep 10 07:43:38 2023 daemon.notice vnstatd[3467]: Monitoring: eth1 (1000 Mbit) tun0 (1000 Mbit)
Sun Sep 10 07:43:38 2023 user.notice vpn-policy-routing [3104]: Creating table 'PIA_VPN/tun0/10.32.10.5' [✓]
Sun Sep 10 07:43:38 2023 user.notice vpn-policy-routing [3104]: Routing 'vpn subnet' via PIA_VPN [✓]
Sun Sep 10 07:43:38 2023 user.notice vpn-policy-routing [3104]: Routing 'vpn wlan0' via PIA_VPN [✓]
Sun Sep 10 07:43:38 2023 user.notice vpn-policy-routing [3104]: Routing 'speedtest' via PIA_VPN [✓]
Sun Sep 10 07:43:38 2023 user.notice vpn-policy-routing [3104]: service started on wan/eth1/178.24.162.254 PIA_VPN/tun0/10.32.10.5 [✓]
Sun Sep 10 07:43:38 2023 daemon.info odhcpd[1265]: Using a RA lifetime of 0 seconds on br-lan
Tue Jan  9 10:04:35 2018 daemon.info dnsmasq[2875]: read /etc/hosts - 4 addresses

Ah it starts to make sense :slight_smile: nlbwmon is configured to keep at most 24 generations (24 months) of data, but september 2023 is way more than 24 months in the future so the old file gets purged because it is considered stale.

The bogus time is introduced by /etc/init.d/fixtime which sets the initial pre-NTP system time to the latest modification time of all files found in /etc.

Please run ls -lhR /etc/ | grep 2023 and perform a touch on every found file to set its modtime to the current RTC adjusted time.

Already checked on this in the meantime. :slight_smile:

I found the "fixtime" script and copied/modified it to see which file is causing the issue. In my case, it's

root@WRT1200AC:/mnt/data/stuff# sh newest
/etc/uhttpd.key
root@WRT1200AC:/mnt/data/stuff# ls -lh /etc/uhttpd.key
-rw-r--r-- 1 root root 1.7K Sep 10 2023 /etc/uhttpd.key
root@WRT1200AC:/mnt/data/stuff#

So I did

root@WRT1200AC:/mnt/data/stuff# touch /etc/uhttpd.key
root@WRT1200AC:/mnt/data/stuff# ls -lh /etc/uhttpd.key
-rw-r--r-- 1 root root 1.7K Jan 9 10:23 /etc/uhttpd.key
root@WRT1200AC:/mnt/data/stuff#

Then I generated some traffic.

Unfortunately, no success yet. But I think I know why: on boot, my sdcard gets mounted as root (I ran out of space), so when booting, the ubi file system is still mounted and the timestamp probably pulled from there. Do you know what device I have to mount to access the ubi "/" partition?

Success! I rebooted without the sdcard to have access to the internal file system (didn't know how to mount it manually), and "touch"ed both uhttp* files in /etc. Now it works!

edit: I tried mounting ubi, but I guess I need the correct parameters (in my case it's /dev/ubi0_1).

I was hoping to fix my issues with luci-app-statistics along the way, but unfortunately those stats still get cut off after a reboot and don't get repopulated. Ah well, 2 out of 3 ain't so bad. :slight_smile:

Reached my post limit now, just in time.

edit2: false alarm regarding luci-app-statistics, it works fine.

edit3 since it's my only way of posting right now: do you have a paypal donation option? I'd be glad to contribute. :slight_smile:

Ah great. I think to get access to the internal overlay without having to remove the SD card you need to mount the right /dev/ubiblockX_Y device.

You can also look at bandwidthd or darkstat for monitoring: both are in LEDE.