But the 384k step still has a visible impact in flent testing.
This is from flent. I tested the ondemand governor with the default 384k step enabled, then with 600k set as the scaling_min_freq, and finally with the performance governor. There is not much difference between performance and the 600k min, but the 384k min leaves a lot of bandwidth unused. SQM is used.
The original firmware sets the min freq to 800MHz... So your solution would be to remove the 300MHz OPP or set the min freq in the governor. I think it's cleaner to set the min freq with a startup script by default, so that anyone who wants to keep the low setting can change that.
Also, nice graph. I wonder if OpenWrt could ship a custom startup script for this (a boot init.d script, like the one used on mvebu to reset the boot counter?).
That is actually a nice idea (at least for my own build). I tried a script setting scaling_min_freq in /etc/rc.local (but that got lost in some reset).
However, I am more worried about R7800 users in general, so a global startup modification might be viable.
Flent is a pretty cool tool for network performance measurement. It is a bit more comprehensive than a casual speed test.
Those three test runs were run from the same PC within a 10-minute window, so the only real change was in the CPU frequency scaling settings. So the impact of the scaling is quite visible.
I observed the same spontaneous restart last November, and while the workaround mentioned in the bug report was to use the performance governor, I still use ondemand and instead set the min freq to 800MHz in rc.local.
I have not experienced any restarts from that issue since changing the min freq, and the number of frequency changes has dropped drastically.
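For anyone who wants to check this on their own device, here is a rough sketch that prints the current governor, the min frequency and the transition counter from sysfs. It assumes the kernel was built with CONFIG_CPU_FREQ_STAT (otherwise stats/total_trans does not exist and is simply skipped). The function name show_freq_stats is made up for illustration.

```shell
#!/bin/sh
# Print a few cpufreq values for one CPU.
# Default path is cpu0; pass another cpufreq directory as the first argument.
show_freq_stats() {
	dir="${1:-/sys/devices/system/cpu/cpu0/cpufreq}"
	for name in scaling_governor scaling_min_freq scaling_cur_freq stats/total_trans; do
		# Only print files that exist and are readable on this kernel.
		if [ -r "$dir/$name" ]; then
			printf '%s: %s\n' "$name" "$(cat "$dir/$name")"
		fi
	done
	return 0
}

show_freq_stats
```

Running it once before and once after a speed test and comparing stats/total_trans gives a rough idea of how often the governor switched frequency in between.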
It would be good to understand whether 600MHz can also be used, just to reduce power consumption even more. (The very old tests were not valid, as the voltage was never actually changed and Ethernet EEE was disabled.)
Anyway, I think I will propose a startup script that tweaks the governor, as removing the 300MHz OPP from the dts seems too aggressive and would remove user choice.
(I wonder if the NSS build crashes were caused by the CPU scaling and not the NSS core scaling.)
Highly possible as it is very difficult to get logs for spontaneous restarts.
But I would not tie the CPU frequency change bug report to the NSS build, as it was reported by a non-NSS build user.
Thanks for the idea.
I have so far done the tuning by patching kernel before compilation, as I saw /etc/rc.local as too vulnerable. But your idea about /etc/init.d/something is great. It makes the frequency scaling more easily configurable by the user, but is still not as vulnerable as a config setting or user script.
This seems to work ok for me on the R7800 (combined with the normal symlink in /etc/rc.d).
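For readers who want to try the init.d approach, a minimal sketch could look like the following. This is not the script from the post above. The file name /etc/init.d/cpufreq-min, the START value and the two variables are assumptions for illustration, and 800000 kHz is just the 800MHz floor mentioned earlier in the thread.

```shell
#!/bin/sh /etc/rc.common
# Hypothetical /etc/init.d/cpufreq-min: raise the cpufreq floor at boot.

START=15

# Assumed defaults; scaling_min_freq is expressed in kHz.
MIN_FREQ_KHZ="${MIN_FREQ_KHZ:-800000}"
CPU_SYSFS="${CPU_SYSFS:-/sys/devices/system/cpu}"

start() {
	for f in "$CPU_SYSFS"/cpu[0-9]*/cpufreq/scaling_min_freq; do
		# Skip when the glob matched nothing or the file is not writable.
		if [ -w "$f" ]; then
			echo "$MIN_FREQ_KHZ" > "$f"
		fi
	done
	return 0
}
```

After making the file executable, running it with the enable argument (/etc/init.d/cpufreq-min enable) should create the symlink in /etc/rc.d. Anyone who prefers the lower floor can change MIN_FREQ_KHZ or simply disable the script.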
Joined the bandwagon and compiled Ansuel ipq806x-5.10 on top of the latest master today. TFTP factory img + manual config, no sysupgrade.
Just for the sake of it, I ran a dslreports speed test, and it is still quite a bit behind current swconfig@master, where I get 600-700Mbps both up and down. No SQM or SW offloading used. The test was run with the performance governor, and the /etc/config/network config is the standard "factory" one.
Sirq peaked at 50%, so I assume it is loading only CPU 0.
Otherwise the logs are clean so far and seems to be working ok.
Edit:
br-lan ports seem to be going up and down pretty frequently:
Sat Mar 27 20:38:11 2021 kern.info kernel: [ 5831.161832] qca8k 37000000.mdio-mii:10 lan4: Link is Down
Sat Mar 27 20:38:11 2021 kern.info kernel: [ 5831.161946] br-lan: port 4(lan4) entered disabled state
Sat Mar 27 20:38:14 2021 kern.info kernel: [ 5834.283224] qca8k 37000000.mdio-mii:10 lan4: Link is Up - 1Gbps/Full - flow control rx/tx
Sat Mar 27 20:38:14 2021 kern.info kernel: [ 5834.283292] br-lan: port 4(lan4) entered blocking state
Sat Mar 27 20:38:14 2021 kern.info kernel: [ 5834.290473] br-lan: port 4(lan4) entered forwarding state
Sat Mar 27 20:38:14 2021 daemon.notice netifd: Network device 'lan4' link is up
Edit2:
Based on the earlier reported issue with LAN ports going down/up, I disabled EEE with ethtool --set-eee lan4 eee off.
Let's see if it helps.
Just a quick update, I got an image to boot and run:
[ 0.000000] Booting Linux on physical CPU 0x0
[ 0.000000] Linux version 5.10.26 (sn@DESKTOP-TPH49OG) (arm-openwrt-linux-muslgnueabi-gcc (OpenWrt GCC 8.4.0 r16313+21-851dadc257b7) 8.4.0, GNU ld (GNU Binutils) 2.34) #0 SMP Fri Mar 26 20:43:29 2021
[ 0.000000] CPU: ARMv7 Processor [512f04d0] revision 0 (ARMv7), cr=10c5787d
[ 0.000000] CPU: div instructions available: patching division code
[ 0.000000] CPU: PIPT / VIPT nonaliasing data cache, PIPT instruction cache
[ 0.000000] OF: fdt: Machine model: Netgear Nighthawk X4 R7500v2
After a few attempts at configuring DSA (one of which made my device fail to finish booting, so it is fragile to mistakes), I did get my AP-only, 2-VLAN network up, but it was not stable with regard to IPv4. I don't know why yet, as my testing window closed when my family started asking for internet (I did not get a chance to disable EEE as described above).
I can post a gist with a full dmesg, regulator_summary, plus some other system info (temperatures and cpuidle seem to be working) if you like. I'll see if I can test again tomorrow and try disabling EEE to keep the network stable for longer.
Note that I still get the build error I mentioned above. The first time through after make clean/distclean, no images are generated, but after running /usr/bin/make -j 1 IGNORE_ERRORS=m V=s immediately after the build failure, images are generated (which I used for the test), although the same compile error still comes up.
I don't know why this happens yet; it looks like the command that generates qcom-ipq8064-g10.dtb silently fails.
Edit:
That port dropping may be related to the same EEE issue @rog reported earlier.
I've now disabled EEE for all the LAN and WAN ports.
So I need to wait and see whether I still get those port drops.
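A small sketch of how EEE could be turned off on every port in one go, reusing the ethtool --set-eee command quoted earlier for lan4. The port list matches the DSA port names mentioned in this thread and may differ on other devices. The function name disable_eee and the PORTS/ETHTOOL variables are assumptions for illustration.

```shell
#!/bin/sh
# Port names as used in this thread; adjust for your device.
PORTS="${PORTS:-lan1 lan2 lan3 lan4 wan}"
# Lets you point at a different ethtool binary if needed.
ETHTOOL="${ETHTOOL:-ethtool}"

disable_eee() {
	for port in $PORTS; do
		# Keep going if a port does not exist or does not support EEE.
		"$ETHTOOL" --set-eee "$port" eee off ||
			echo "could not disable EEE on $port" >&2
	done
	return 0
}
```

Calling disable_eee from a startup script applies it at every boot. The result can be checked per port with ethtool --show-eee lan4.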
I tried again; however, this time turning off eee on lan1, lan2, lan3, lan4, and wan.
So far the network and device are stable, so I can let this run for at least today. I'll post a link to the dmesg, a regulator_summary, /etc/config/network (for DSA) plus some other configuration output in a bit.
I tried to set STP and IGMP snooping via LuCI for the one bridge defined and lost connectivity to the device. I had to use my serial cable to access the device. These options were not enabled, but I am able to set them and restart the network through the command line without issue. I'll report this in the LuCI DSA config thread as well and post logs.
I'm in and out today, so please be patient for the logs...
Anyway, I'm splitting the 5.10 PR from DSA, since DSA is still not ready and requires more testing. In theory this should make the merge of the 5.10 commits quicker, and fewer patches will be needed to test DSA.