I observed the same spontaneous restart last November. The workaround mentioned in the bug report was to use the performance governor, but I still use ondemand and instead set the min freq to 800 MHz in rc.local.
I have not experienced any restarts from the same issue since changing the min freq, and the number of frequency changes has dropped drastically.
It would be good to understand whether 600 MHz can also be used, just to reduce power consumption even more. (Very old tests were not valid, as the voltage was never actually changed and ethernet EEE was disabled.)
Anyway, I think I will propose a startup script that tweaks the governor, as removing the 300 opp freq from the dts seems too aggressive and would remove user choice.
(I wonder if the nss build crashes were caused by the CPU scaling and not the nss core scaling.)
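For anyone wanting to try the same min freq change, here is a minimal sketch of what such an rc.local entry could look like. It assumes the standard Linux cpufreq sysfs layout (`scaling_min_freq`, values in kHz); the function name `set_min_freq` and its optional second argument are made up for illustration and are not taken from the bug report.

```shell
# Raise the cpufreq floor on every core.
# $1: minimum frequency in kHz (800000 = 800 MHz)
# $2: optional sysfs root, only useful for dry runs against a scratch directory
set_min_freq() {
    khz="$1"
    root="${2:-/sys/devices/system/cpu}"
    for f in "$root"/cpu[0-9]*/cpufreq/scaling_min_freq; do
        # Skip cores that have no writable cpufreq node
        [ -w "$f" ] || continue
        echo "$khz" > "$f"
    done
}
```

In /etc/rc.local this would be called as `set_min_freq 800000` before the final `exit 0`. The governor itself is left untouched, so ondemand keeps scaling between the new floor and the max freq.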
Thanks for the idea.
So far I have done the tuning by patching the kernel before compilation, as I saw /etc/rc.local as too vulnerable. But your idea about /etc/init.d/something is great. It makes the frequency scaling more easily configurable by the user, but is still not as vulnerable as a config setting or user script.
This seems to work OK for me on the R7800 (combined with the normal symlink in /etc/rc.d).
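As a rough illustration of the /etc/init.d approach, a script along these lines might do it. This is only a sketch, not the script actually used on the R7800: the file name, the `START` priority, and the governor and min freq values are all assumptions, and the `CPU_SYSFS` override exists only so it can be dry-run outside a router.

```shell
# Sketch of /etc/init.d/cpufreq-tune (hypothetical file name).
# On OpenWrt the first line of the real file must be: #!/bin/sh /etc/rc.common

START=99  # assumed priority: run late in boot, once the cpufreq driver is up

# Assumed values, adjust to taste
GOVERNOR="ondemand"
MIN_FREQ_KHZ="800000"

# Overridable so the script can be tried against a scratch directory
CPU_SYSFS="${CPU_SYSFS:-/sys/devices/system/cpu}"

start() {
    for policy in "$CPU_SYSFS"/cpu[0-9]*/cpufreq; do
        # Skip the unexpanded glob when no core exposes cpufreq
        [ -d "$policy" ] || continue
        echo "$GOVERNOR" > "$policy/scaling_governor"
        echo "$MIN_FREQ_KHZ" > "$policy/scaling_min_freq"
    done
}
```

Running `/etc/init.d/cpufreq-tune enable` should create the symlink in /etc/rc.d, so the settings are reapplied on every boot.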
Joined the bandwagon and compiled Ansuel ipq806x-5.10 on top of the latest master today. TFTP factory img + manual config, no sysupgrade.
Just for the sake of it, I ran a dslreports speed test, and it is still quite a bit behind current swconfig@master, where I get 600-700Mbps both up/down. No SQM or SW offloading used. The test was run with the performance governor, and the /etc/config/network config is the standard "factory" one.
Sirq peaked at 50%, so I assume only CPU-0 was being loaded.
Otherwise the logs are clean so far and it seems to be working OK.
br-lan seems to be going up and down pretty frequently:
Sat Mar 27 20:38:11 2021 kern.info kernel: [ 5831.161832] qca8k 37000000.mdio-mii:10 lan4: Link is Down
Sat Mar 27 20:38:11 2021 kern.info kernel: [ 5831.161946] br-lan: port 4(lan4) entered disabled state
Sat Mar 27 20:38:14 2021 kern.info kernel: [ 5834.283224] qca8k 37000000.mdio-mii:10 lan4: Link is Up - 1Gbps/Full - flow control rx/tx
Sat Mar 27 20:38:14 2021 kern.info kernel: [ 5834.283292] br-lan: port 4(lan4) entered blocking state
Sat Mar 27 20:38:14 2021 kern.info kernel: [ 5834.290473] br-lan: port 4(lan4) entered forwarding state
Sat Mar 27 20:38:14 2021 daemon.notice netifd: Network device 'lan4' link is up
Based on the earlier reported issue with lan ports going down/up, I disabled eee with ethtool --set-eee lan4 eee off.
Let's see if it helps.
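To put a number on "pretty frequently", the link-down events can be counted per port from the log. A small sketch, assuming the qca8k message format shown in the log excerpt above; the function name `count_link_down` is made up for illustration.

```shell
# Count "Link is Down" events per port from kernel log lines on stdin.
count_link_down() {
    awk '/Link is Down/ {
        # The port name is the field just before "Link"; strip the trailing colon.
        for (i = 2; i <= NF; i++) {
            if ($i == "Link") {
                port = $(i - 1)
                sub(/:$/, "", port)
                n[port]++
                break
            }
        }
    }
    END { for (p in n) print p, n[p] }'
}
```

On the router this would be fed from the system log, e.g. `logread | count_link_down`, which prints one line per port with its number of drops.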
Just a quick update, I got an image to boot and run:
[ 0.000000] Booting Linux on physical CPU 0x0
[ 0.000000] Linux version 5.10.26 (sn@DESKTOP-TPH49OG) (arm-openwrt-linux-muslgnueabi-gcc (OpenWrt GCC 8.4.0 r16313+21-851dadc257b7) 8.4.0, GNU ld (GNU Binutils) 2.34) #0 SMP Fri Mar 26 20:43:29 2021
[ 0.000000] CPU: ARMv7 Processor [512f04d0] revision 0 (ARMv7), cr=10c5787d
[ 0.000000] CPU: div instructions available: patching division code
[ 0.000000] CPU: PIPT / VIPT nonaliasing data cache, PIPT instruction cache
[ 0.000000] OF: fdt: Machine model: Netgear Nighthawk X4 R7500v2
After a few attempts at configuring DSA (one of which made my device fail to finish booting; it's fragile to mistakes), I did get my AP-only, 2-VLAN network up, but it was not stable with respect to ipv4. I don't know why yet, as my testing window closed when my family started asking for internet (I did not get a chance to disable eee as described above).
I can post a gist with a full dmesg, regulator_summary, plus some other system info (temperatures and cpuidle seem to be working) if you like. I'll see if I can test again tomorrow and try disabling eee to keep the network stable for longer.
Note I still get the build error I mentioned above. The first time through after make clean/distclean no images are generated, but after running /usr/bin/make -j 1 IGNORE_ERRORS=m V=s immediately after the build failure, images are generated (which I used for the test), although the same compile error still comes up.
I don't know why this happens yet; it looks like the command that generates qcom-ipq8064-g10.dtb silently fails.
That port dropping may be related to the same eee issue @rog reported earlier.
I've now disabled eee for all the lan & wan ports.
So I need to wait and see whether I still get those port drops.
I tried again, this time turning off eee on lan1, lan2, lan3, lan4, and wan.
So far the network and device are stable, so I can let this go for at least today. I'll post a link to the dmesg, a regulator_summary, /etc/config/network (for dsa) plus some other configuration output in a bit.
I tried to set stp and igmp snooping via luci for the one bridge defined and lost connectivity to the device. I had to use my serial cable to access the device. These options had not been enabled, but I am able to set them and restart the network through the command line without issue. I'll report this in the luci DSA config thread as well and post logs.
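For reference, turning eee off on several ports comes down to a loop over the ethtool command quoted earlier in the thread. A minimal sketch, assuming the DSA port names lan1-lan4 and wan; the `disable_eee` function and the `ETHTOOL` override are hypothetical helpers added only so the loop can be dry-run.

```shell
# ETHTOOL can be pointed at a stub for a dry run; defaults to the real binary.
ETHTOOL="${ETHTOOL:-ethtool}"

# Disable eee on every interface given as an argument.
disable_eee() {
    for ifc in "$@"; do
        "$ETHTOOL" --set-eee "$ifc" eee off \
            || echo "failed to disable eee on $ifc" >&2
    done
}
```

Usage would be `disable_eee lan1 lan2 lan3 lan4 wan`, and `ethtool --show-eee lan4` can be used afterwards to check the result on a port. As far as I know the setting does not survive a reboot, so it would have to be reapplied from a startup script.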
I'm in and out today, so please be patient for the logs...
This gist has the dmesg (buried in the "logcat" file), regulator_summary (at the end of the "logcat" file), /etc/config/network, and an excerpt from the /etc/config/system that I'm using.
Note that the dmesg and other info in logcat are out of sync with the /etc/config/network config file, as stp and igmp snooping are currently enabled. I've done a few reboots just to test some things, and for now I'm leaving eee disabled.
No issues after just under 24 hrs up on 5.10.26 with dsa. I did re-enable eee on all interfaces for roughly the latter half of that time and did not experience issues; hence it's not clear to me that eee was/is my issue.
Due to my family's need for reliable service on weekdays, I've gone back to 5.4 for now but I'll see if I can't test again next weekend.