OpenWrt 18.06.2 mt7621-mikrotik_rbm33g Build & netdata 1.10.0 package: mtk_soc_eth 1e100000 eth0 link up and down and latency

I am randomly seeing the following in logread, along with bursts of latency:

Tue Mar 12 13:34:34 2019 kern.info kernel: [67508.608247] mtk_soc_eth 1e100000.ethernet eth0: port 0 link down
Tue Mar 12 13:34:37 2019 kern.info kernel: [67511.399512] mtk_soc_eth 1e100000.ethernet eth0: port 0 link up
Tue Mar 12 13:36:41 2019 kern.info kernel: [67636.117419] mtk_soc_eth 1e100000.ethernet eth0: port 0 link down
Tue Mar 12 13:36:44 2019 kern.info kernel: [67638.914500] mtk_soc_eth 1e100000.ethernet eth0: port 0 link up
Tue Mar 12 13:37:05 2019 kern.info kernel: [67660.082141] mtk_soc_eth 1e100000.ethernet eth0: port 0 link down
Tue Mar 12 13:37:08 2019 kern.info kernel: [67662.837129] mtk_soc_eth 1e100000.ethernet eth0: port 0 link up
Tue Mar 12 13:37:14 2019 kern.info kernel: [67668.242163] mtk_soc_eth 1e100000.ethernet eth0: port 0 link down
Tue Mar 12 13:37:16 2019 kern.info kernel: [67671.025557] mtk_soc_eth 1e100000.ethernet eth0: port 0 link up
Tue Mar 12 13:37:20 2019 kern.info kernel: [67674.362291] mtk_soc_eth 1e100000.ethernet eth0: port 0 link down
Tue Mar 12 13:37:23 2019 kern.info kernel: [67677.210627] mtk_soc_eth 1e100000.ethernet eth0: port 0 link up
Tue Mar 12 13:37:36 2019 kern.info kernel: [67690.682753] mtk_soc_eth 1e100000.ethernet eth0: port 0 link down
Tue Mar 12 13:37:39 2019 kern.info kernel: [67693.505558] mtk_soc_eth 1e100000.ethernet eth0: port 0 link up
Tue Mar 12 13:37:41 2019 kern.info kernel: [67695.783240] mtk_soc_eth 1e100000.ethernet eth0: port 0 link down
Tue Mar 12 13:37:44 2019 kern.info kernel: [67698.554572] mtk_soc_eth 1e100000.ethernet eth0: port 0 link up
Tue Mar 12 13:48:35 2019 kern.info kernel: [68349.621119] mtk_soc_eth 1e100000.ethernet eth0: port 0 link down
Tue Mar 12 13:48:38 2019 kern.info kernel: [68352.402163] mtk_soc_eth 1e100000.ethernet eth0: port 0 link up
Tue Mar 12 13:58:15 2019 kern.info kernel: [68929.508578] mtk_soc_eth 1e100000.ethernet eth0: port 0 link down
Tue Mar 12 13:58:18 2019 kern.info kernel: [68932.304579] mtk_soc_eth 1e100000.ethernet eth0: port 0 link up
Tue Mar 12 14:00:52 2019 kern.info kernel: [69086.592178] mtk_soc_eth 1e100000.ethernet eth0: port 0 link down
Tue Mar 12 14:00:55 2019 kern.info kernel: [69089.399656] mtk_soc_eth 1e100000.ethernet eth0: port 0 link up

dmesg... looks like it's netdata again!

[17377.697300] do_page_fault(): sending SIGSEGV to netdata for invalid write access to 77e21d2c
[17377.705767] epc = 77fa1e20 in libc.so[77f2c000+92000]
[17377.711236] ra  = 0043c6fd in netdata[400000+95000]
[19093.387541] do_page_fault(): sending SIGSEGV to netdata for invalid write access to 77d92d2c
[19093.396058] epc = 77f12e20 in libc.so[77e9d000+92000]
[19093.401154] ra  = 0043c6fd in netdata[400000+95000]
[65179.884133] mtk_soc_eth 1e100000.ethernet eth0: port 0 link down
[65182.649342] mtk_soc_eth 1e100000.ethernet eth0: port 0 link up
[65273.726004] mtk_soc_eth 1e100000.ethernet eth0: port 0 link down
[65276.525859] mtk_soc_eth 1e100000.ethernet eth0: port 0 link up
[67508.608247] mtk_soc_eth 1e100000.ethernet eth0: port 0 link down
[67511.399512] mtk_soc_eth 1e100000.ethernet eth0: port 0 link up
[67636.117419] mtk_soc_eth 1e100000.ethernet eth0: port 0 link down
[67638.914500] mtk_soc_eth 1e100000.ethernet eth0: port 0 link up
[67660.082141] mtk_soc_eth 1e100000.ethernet eth0: port 0 link down
[67662.837129] mtk_soc_eth 1e100000.ethernet eth0: port 0 link up
[67668.242163] mtk_soc_eth 1e100000.ethernet eth0: port 0 link down
[67671.025557] mtk_soc_eth 1e100000.ethernet eth0: port 0 link up
[67674.362291] mtk_soc_eth 1e100000.ethernet eth0: port 0 link down
[67677.210627] mtk_soc_eth 1e100000.ethernet eth0: port 0 link up
[67690.682753] mtk_soc_eth 1e100000.ethernet eth0: port 0 link down
[67693.505558] mtk_soc_eth 1e100000.ethernet eth0: port 0 link up
[67695.783240] mtk_soc_eth 1e100000.ethernet eth0: port 0 link down
[67698.554572] mtk_soc_eth 1e100000.ethernet eth0: port 0 link up
[68349.621119] mtk_soc_eth 1e100000.ethernet eth0: port 0 link down
[68352.402163] mtk_soc_eth 1e100000.ethernet eth0: port 0 link up
[68929.508578] mtk_soc_eth 1e100000.ethernet eth0: port 0 link down
[68932.304579] mtk_soc_eth 1e100000.ethernet eth0: port 0 link up
[69086.592178] mtk_soc_eth 1e100000.ethernet eth0: port 0 link down
[69089.399656] mtk_soc_eth 1e100000.ethernet eth0: port 0 link up
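
To put numbers on the flapping, the kernel timestamps in the dmesg output above can be paired up. What follows is a minimal Python sketch, not something from the original report: the function name link_outages and the two-event sample are illustrative assumptions. It pairs each "link down" with the next "link up" on the same interface and port, and reports how long the link stayed down.

```python
import re

# Matches kernel lines such as:
# [67508.608247] mtk_soc_eth 1e100000.ethernet eth0: port 0 link down
LINK_RE = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+mtk_soc_eth\s+\S+\s+(\S+): port (\d+) link (up|down)"
)

def link_outages(lines):
    """Return (down_timestamp, duration) pairs, in seconds since boot."""
    pending = {}   # (iface, port) -> timestamp of the last unmatched "link down"
    outages = []
    for line in lines:
        m = LINK_RE.search(line)
        if not m:
            continue  # ignore unrelated kernel messages
        ts = float(m.group(1))
        key = (m.group(2), m.group(3))
        if m.group(4) == "down":
            pending[key] = ts
        elif key in pending:
            down_ts = pending.pop(key)
            outages.append((down_ts, ts - down_ts))
    return outages

# Two events copied from the dmesg output above.
sample = """\
[67508.608247] mtk_soc_eth 1e100000.ethernet eth0: port 0 link down
[67511.399512] mtk_soc_eth 1e100000.ethernet eth0: port 0 link up
[67636.117419] mtk_soc_eth 1e100000.ethernet eth0: port 0 link down
[67638.914500] mtk_soc_eth 1e100000.ethernet eth0: port 0 link up
""".splitlines()

for down_ts, duration in link_outages(sample):
    print(f"down at {down_ts:.1f}s for {duration:.2f}s")
```

The same regular expression also works on logread lines, because it searches for the bracketed kernel timestamp anywhere in the line. On the events listed above, every outage comes out at roughly 2.8 seconds.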

Do you even know what netdata does?

Your device is unstable for whatever reason and it's not caused by netdata. It could be anything from hw design quirk(s) to kernel/driver issues.

If you have another device available, try to replicate the issues you're seeing (that is the best way to troubleshoot); otherwise, try using the master branch. Just make sure you can recover your device in case it fails to boot.

Yes, I know what netdata does. What makes you think it's not related to netdata? The so-called stable package in the repo is still version 1.10; it's old and has been broken, throwing these faults since 18.06.1, at least on the OpenWrt firmware for the RBM33G. Is it the libc.so module?

I'm working on a build environment to be able to compile netdata myself for the RBM33G. If it's not the netdata package, then what is it? Whatever it is, it shouldn't be in the stable branch. I've yet to have a smoke test good enough to give me faith in a production environment. It's not the hardware: I've loaded 15 of these boards and all show the same behavior. I've also removed the package and haven't seen this error since. I thought I'd try the package on the new 18.06.2; same result, it looks like.

It is unlikely to be the standard C library itself, but rather something calling it with "bad" input. The message "invalid write access to 77e21d2c" suggests a flaw in netdata, potentially in the way it manages memory.
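
One way to check whether the reported faults are the same bug each time is to convert the absolute addresses in the kernel messages into offsets inside each mapping, since libc.so is loaded at a different base on each run. Below is a small Python sketch; the helper name fault_offsets is made up for illustration, and it assumes the name[base+size] notation in the log gives the load base and mapping size in hex.

```python
import re

# Matches e.g. "epc = 77fa1e20 in libc.so[77f2c000+92000]"
ADDR_RE = re.compile(
    r"(epc|ra)\s*=\s*([0-9a-f]+) in (\S+)\[([0-9a-f]+)\+([0-9a-f]+)\]"
)

def fault_offsets(lines):
    """Return (register, module, offset) tuples; offset is relative to the module base."""
    result = []
    for line in lines:
        m = ADDR_RE.search(line)
        if m:
            reg, addr, module, base, _size = m.groups()
            result.append((reg, module, int(addr, 16) - int(base, 16)))
    return result

# The epc/ra lines of the two faults from the dmesg output earlier in the thread.
sample = [
    "[17377.705767] epc = 77fa1e20 in libc.so[77f2c000+92000]",
    "[17377.711236] ra  = 0043c6fd in netdata[400000+95000]",
    "[19093.396058] epc = 77f12e20 in libc.so[77e9d000+92000]",
    "[19093.401154] ra  = 0043c6fd in netdata[400000+95000]",
]

for reg, module, offset in fault_offsets(sample):
    print(f"{reg}: {module}+{offset:#x}")
```

For both faults this gives libc.so+0x75e20 and netdata+0x3c6fd, i.e. the same location in libc reached with the same return address in netdata, which fits the "bad input from the caller" reading. Assuming a netdata build with debug symbols were available, the address 0x43c6fd could then be looked up with a tool such as addr2line.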

"Stable" applies to OpenWrt itself. Packages that are not part of OpenWrt itself vary in quality and depend almost entirely on the original sources being correct.

Because why would netdata make your ethernet interface flap?
As far as stable goes, it's leaning more towards Debian "stable" (http://howfuckedismydistro.com/debian/) :slight_smile:
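
That question can be tested roughly against the dmesg output earlier in the thread by measuring how long after a netdata SIGSEGV each "link down" occurs. The following Python sketch is an illustration only; the function name gaps_to_previous_segfault is hypothetical, and the sample holds just the two faults and the first flap from that output.

```python
import re

# Captures the seconds-since-boot timestamp and the rest of the kernel message.
TS_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+(.*)")

def gaps_to_previous_segfault(lines):
    """For each 'link down', return (timestamp, seconds since the latest
    netdata SIGSEGV), with None when no SIGSEGV has been seen yet."""
    last_fault = None
    gaps = []
    for line in lines:
        m = TS_RE.search(line)
        if not m:
            continue
        ts, msg = float(m.group(1)), m.group(2).rstrip()
        if "sending SIGSEGV to netdata" in msg:
            last_fault = ts
        elif msg.endswith("link down"):
            gaps.append((ts, None if last_fault is None else ts - last_fault))
    return gaps

sample = [
    "[17377.697300] do_page_fault(): sending SIGSEGV to netdata for invalid write access to 77e21d2c",
    "[19093.387541] do_page_fault(): sending SIGSEGV to netdata for invalid write access to 77d92d2c",
    "[65179.884133] mtk_soc_eth 1e100000.ethernet eth0: port 0 link down",
    "[65182.649342] mtk_soc_eth 1e100000.ethernet eth0: port 0 link up",
]

for ts, gap in gaps_to_previous_segfault(sample):
    since = ("no earlier SIGSEGV" if gap is None
             else f"{gap / 3600:.1f} h after the last netdata SIGSEGV")
    print(f"link down at {ts:.0f}s: {since}")
```

In that dmesg output the first flap comes more than twelve hours after the most recent logged netdata fault. That does not prove the two are unrelated, since the log does not show whether netdata was running at the time, but it shows no direct timing link between the crashes and the flaps.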

I've mentioned this before, but rolling branches/snapshots would probably be a better idea given the flow of development in general. It's not my call, though.

I found that hostapd is still missing this in the config: option wpa_group_rekey '86400'. Without it, rolling deauths eventually begin to show up (still). The line should be in every interface in /etc/config/network as well. I'm building a list of edits and check-ins and hope to get to them soon.

Hmmm, /etc/config/wireless perhaps?

Don't you find it strange that there are thousands of routers running OpenWrt stably without that line?

Jeff, correct: /etc/config/wireless.

It just happened again, this time when running:

/etc/init.d/netdata stop
/etc/init.d/netdata disable

Wed Mar 13 00:18:39 2019 daemon.info netdata[2059]: SIGNAL: Received SIGTERM. Cleaning up to exit...
Wed Mar 13 00:18:39 2019 daemon.info netdata[2059]: EXIT: netdata prepares to exit with code 0...
Wed Mar 13 00:18:39 2019 daemon.info netdata[2059]: EXIT: cleaning up the database...
Wed Mar 13 00:18:39 2019 daemon.info netdata[2059]: Cleaning up database [1 hosts(s)]...
Wed Mar 13 00:18:39 2019 daemon.info netdata[2059]: Cleaning up database of host 'edge-sbl.cyfr.tel'...
Wed Mar 13 00:18:39 2019 daemon.info netdata[2059]: EXIT: stopping master threads...
Wed Mar 13 00:18:39 2019 daemon.info netdata[2059]: EXIT: Stopping master thread: PLUGIN[proc]
Wed Mar 13 00:18:39 2019 daemon.info netdata[2059]: EXIT: Stopping master thread: PLUGIN[diskspace]
Wed Mar 13 00:18:39 2019 daemon.info netdata[2059]: cleaning up...
Wed Mar 13 00:18:39 2019 daemon.info netdata[2059]: EXIT: Stopping master thread: PLUGIN[cgroup]
Wed Mar 13 00:18:39 2019 daemon.info netdata[2059]: thread with task id 2485 finished
Wed Mar 13 00:18:39 2019 daemon.info netdata[2059]: cleaning up...
Wed Mar 13 00:18:39 2019 daemon.info netdata[2059]: EXIT: Stopping master thread: PLUGIN[idlejitter]
Wed Mar 13 00:18:39 2019 daemon.info netdata[2059]: thread with task id 2486 finished
Wed Mar 13 00:18:39 2019 daemon.info netdata[2059]: EXIT: Stopping master thread: HEALTH
Wed Mar 13 00:18:39 2019 daemon.info netdata[2059]: cleaning up...
Wed Mar 13 00:18:39 2019 daemon.info netdata[2059]: EXIT: Stopping master thread: PLUGINSD
Wed Mar 13 00:18:39 2019 daemon.info netdata[2059]: cleaning up...
Wed Mar 13 00:18:39 2019 daemon.info netdata[2059]: cleaning up...
Wed Mar 13 00:18:39 2019 daemon.info netdata[2059]: thread with task id 2491 finished
Wed Mar 13 00:18:39 2019 daemon.info netdata[2059]: thread with task id 2487 finished
Wed Mar 13 00:18:39 2019 daemon.info netdata[2059]: cleaning up...
Wed Mar 13 00:18:39 2019 daemon.info netdata[2059]: cleanup completed.
Wed Mar 13 00:18:39 2019 daemon.info netdata[2059]: thread with task id 2489 finished
Wed Mar 13 00:18:39 2019 daemon.info netdata[2059]: thread with task id 2492 finished
Wed Mar 13 00:18:39 2019 daemon.info netdata[2059]: EXIT: Stopping master thread: STATSD
Wed Mar 13 00:18:39 2019 kern.info kernel: [  967.006344] do_page_fault(): sending SIGSEGV to netdata for invalid write access to 77e06d2c
Wed Mar 13 00:18:39 2019 kern.info kernel: [  967.014816] epc = 77f86e20 in libc.so[77f11000+92000]
Wed Mar 13 00:18:39 2019 kern.info kernel: [  967.019864] ra  = 0043c6fd in netdata[400000+95000]