[PR] ipq806x: kernel 5.10 bump code propose

That is actually a nice idea (at least for my own build). I tried having a script set scaling_min_freq in /etc/rc.local (but that got lost in some reset).

However, I am more worried about R7800 users in general, so a global startup modification might be viable.

Flent is a pretty cool tool for network performance measurement :wink: It is a bit more comprehensive than a casual speed test.

Those three test runs were done from the same PC within a 10-minute window, so the only real change was in the CPU frequency scaling settings. The impact of the scaling is quite visible.

+1 vote for setting the min freq higher (up to 800MHz), but from a different perspective.

At least late last year there were issues causing spontaneous restarts that pointed to Krait CPU freq changes.

https://bugs.openwrt.org/index.php?do=details&task_id=3099

I observed the same spontaneous restarts last November, and while the workaround mentioned in the bug report was to use the performance governor, I still use ondemand and instead set the min freq to 800MHz in rc.local.

I have not experienced any restarts due to the same issue after changing the min freq, and the number of freq changes has dropped drastically since.
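
For anyone who wants to put a number on "freq changes", the cpufreq statistics in sysfs expose a transition counter per policy. A minimal sketch, assuming the kernel was built with cpufreq statistics so that a stats/total_trans file exists (it may not in every build); the function name show_freq_transitions and the CPUFREQ_ROOT variable are made up here for illustration:

```shell
# Hypothetical helper: print the number of frequency transitions per policy.
# CPUFREQ_ROOT is overridable so the function can be tried on a scratch tree.
CPUFREQ_ROOT="${CPUFREQ_ROOT:-/sys/devices/system/cpu/cpufreq}"

show_freq_transitions() {
  for policy in "$CPUFREQ_ROOT"/policy*; do
    stats="$policy/stats/total_trans"
    # Skip policies without statistics (kernel option not enabled).
    [ -r "$stats" ] || continue
    printf '%s %s\n' "${policy##*/}" "$(cat "$stats")"
  done
}
```

Running show_freq_transitions before and after a min freq change, over comparable periods, should give a rough comparison.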

It would be good to understand if 600MHz can also be used, just to reduce power consumption even more. (Very old tests were not valid, as the voltage was never actually changed and Ethernet EEE was disabled.)

Anyway, I think I will propose a startup script that tweaks the governor, as removing the 300 OPP freq from the dts seems too aggressive and would remove user choice.

(I wonder if the NSS build crashes were caused by the CPU scaling and not the NSS core scaling.)

Highly possible, as it is very difficult to get logs for spontaneous restarts.
But I would not tie the CPU freq changes bug report to the NSS build, as it was reported by a non-NSS build user.

It could be that the problem shows up more often there, since using the NSS core stresses the chip and the regulator more and draws more power...

Thanks for the idea.
I have so far done the tuning by patching the kernel before compilation, as I saw /etc/rc.local as too vulnerable. But your idea about /etc/init.d/something is great. It makes the frequency scaling more easily configurable by the user, but is still not as vulnerable as a config setting or user script.

This seems to work OK for me on the R7800 (combined with the normal symlink in /etc/rc.d).

root@router1:~# cat /etc/init.d/cpufreq
#!/bin/sh /etc/rc.common

START=15

boot() {
  # Select 'ondemand'=scaling or 'performance'=always max freq
  echo ondemand > /sys/devices/system/cpu/cpufreq/policy0/scaling_governor
  echo ondemand > /sys/devices/system/cpu/cpufreq/policy1/scaling_governor

  # Effective only with ondemand
  echo 600000 > /sys/devices/system/cpu/cpufreq/policy0/scaling_min_freq
  echo 600000 > /sys/devices/system/cpu/cpufreq/policy1/scaling_min_freq
  echo 10 > /sys/devices/system/cpu/cpufreq/ondemand/sampling_down_factor
  echo 50 > /sys/devices/system/cpu/cpufreq/ondemand/up_threshold
}
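
The same governor and min freq settings could also be applied in a loop over every policy directory, which avoids hard-coding policy0 and policy1. This is only a sketch: apply_cpufreq and CPUFREQ_ROOT are names invented here (the variable mainly exists so the function can be tried against a scratch directory), and the values are the ones from the script above.

```shell
# Hypothetical helper, not part of OpenWrt: apply one governor and one
# minimum frequency to every cpufreq policy found under CPUFREQ_ROOT.
CPUFREQ_ROOT="${CPUFREQ_ROOT:-/sys/devices/system/cpu/cpufreq}"

apply_cpufreq() {
  governor="$1"   # e.g. ondemand or performance
  min_freq="$2"   # in kHz, e.g. 600000

  for policy in "$CPUFREQ_ROOT"/policy*; do
    # Skip the unexpanded glob when no policy directory exists.
    [ -d "$policy" ] || continue
    echo "$governor" > "$policy/scaling_governor"
    echo "$min_freq" > "$policy/scaling_min_freq"
  done
}
```

Inside boot() this would be called as apply_cpufreq ondemand 600000, followed by the two ondemand tunables.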

Joined the bandwagon and compiled Ansuel's ipq806x-5.10 on top of the latest master today. TFTP factory img + manual config, no sysupgrade.

Just for the sake of it, I ran the dslreports speed test, and it is still quite a bit behind current swconfig@master, where I get 600-700Mbps both up and down. No SQM or SW offloading used. The test was run with the performance governor, and the /etc/config/network config is the standard "factory" one.

Sirq peaked at 50%, so I assume only CPU-0 is being loaded.

Otherwise the logs are clean so far and it seems to be working OK.

Edit:
br-lan seems to be going up and down pretty frequently:

Sat Mar 27 20:38:11 2021 kern.info kernel: [ 5831.161832] qca8k 37000000.mdio-mii:10 lan4: Link is Down
Sat Mar 27 20:38:11 2021 kern.info kernel: [ 5831.161946] br-lan: port 4(lan4) entered disabled state
Sat Mar 27 20:38:14 2021 kern.info kernel: [ 5834.283224] qca8k 37000000.mdio-mii:10 lan4: Link is Up - 1Gbps/Full - flow control rx/tx
Sat Mar 27 20:38:14 2021 kern.info kernel: [ 5834.283292] br-lan: port 4(lan4) entered blocking state
Sat Mar 27 20:38:14 2021 kern.info kernel: [ 5834.290473] br-lan: port 4(lan4) entered forwarding state
Sat Mar 27 20:38:14 2021 daemon.notice netifd: Network device 'lan4' link is up
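
To see how often each port actually drops, the "Link is Down" lines can be counted per port. A rough sketch, assuming log lines in the format shown above (port name with a trailing colon right before "Link is Down"); count_link_down is a name made up for this example:

```shell
# Hypothetical helper: read kernel log lines on stdin and print
# "<port> <count>" for every port that logged "Link is Down".
count_link_down() {
  awk '/Link is Down/ {
         for (i = 2; i <= NF; i++)
           if ($i == "Link") {
             port = $(i - 1)        # field before "Link", e.g. "lan4:"
             sub(/:$/, "", port)    # drop the trailing colon
             n[port]++
           }
       }
       END { for (p in n) print p, n[p] }' | sort
}
```

On the router this would be fed from the system log, for example logread | count_link_down.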

Edit2:
Based on the earlier reported issue with LAN ports going down/up, I disabled EEE with ethtool --set-eee lan4 eee off.
Let's see if it helps.

DSA only uses one CPU for now.

Just a quick update, I got an image to boot and run:

[    0.000000] Booting Linux on physical CPU 0x0
[    0.000000] Linux version 5.10.26 (sn@DESKTOP-TPH49OG) (arm-openwrt-linux-muslgnueabi-gcc (OpenWrt GCC 8.4.0 r16313+21-851dadc257b7) 8.4.0, GNU ld (GNU Binutils) 2.34) #0 SMP Fri Mar 26 20:43:29 2021
[    0.000000] CPU: ARMv7 Processor [512f04d0] revision 0 (ARMv7), cr=10c5787d
[    0.000000] CPU: div instructions available: patching division code
[    0.000000] CPU: PIPT / VIPT nonaliasing data cache, PIPT instruction cache
[    0.000000] OF: fdt: Machine model: Netgear Nighthawk X4 R7500v2

After a few attempts at configuring DSA (one of which made my device fail to finish booting; it's fragile to mistakes) I did get my AP-only, 2-VLAN network up, but it was not stable with respect to IPv4. I don't know why yet, as my testing window closed when my family started asking for internet (I did not get a chance to disable EEE as described above).

I can post a gist with a full dmesg, regulator_summary, plus some other system info (temperatures and cpuidle seem to be working) if you like. I'll see if I can test again tomorrow and try disabling EEE to keep the network stable for longer.

Note I still get the build error I mentioned above. The first time through after make clean/distclean no images are generated, but after running /usr/bin/make -j 1 IGNORE_ERRORS=m V=s immediately after the build failure, images are generated (which I used for the test), although the same compile error still comes up.

I don't know why this happens yet; it looks like the command that generates qcom-ipq8064-g10.dtb silently fails.

HTH

Are you sure the swconfig results are not with SW offload enabled?
Anyway, I wonder if using only one CPU is the cause of the port dropping.

Just changed back to today's master build to run the same dslreports test:

root@R7800:~# ubus call system board
{
	"kernel": "5.4.106",
	"hostname": "R7800",
	"system": "ARMv7 Processor rev 0 (v7l)",
	"model": "Netgear Nighthawk X4S R7800",
	"board_name": "netgear,r7800",
	"release": {
		"distribution": "OpenWrt",
		"version": "SNAPSHOT",
		"revision": "r16345-d71424a085",
		"target": "ipq806x/generic",
		"description": "OpenWrt SNAPSHOT r16345-d71424a085"
	}
}
root@R7800:~#

Dslreports run with the standard ondemand governor and min freq 800MHz:

root@R7800:~# cat /sys/devices/system/cpu/cpufreq/policy0/scaling_governor
ondemand
root@R7800:~# cat /sys/devices/system/cpu/cpufreq/policy1/scaling_governor
ondemand
root@R7800:~# cat /sys/devices/system/cpu/cpufreq/policy0/scaling_min_freq
800000
root@R7800:~# cat /sys/devices/system/cpu/cpufreq/policy1/scaling_min_freq
800000

And at least I've not ticked the box for SW offloading.

Edit:
That port dropping may be related to the same EEE issue @rog reported earlier.
I've now disabled EEE for all the LAN & WAN ports.
So I need to wait and see if I get those port drops anymore.

I tried again; however, this time turning off eee on lan1, lan2, lan3, lan4, and wan.
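
Disabling EEE port by port can be wrapped in a small loop. A sketch, assuming the DSA port names from this post (lan1 to lan4 and wan) and the same ethtool --set-eee invocation used earlier in the thread; disable_eee and the ETHTOOL variable are names invented for this example:

```shell
# Hypothetical helper: turn EEE off on each port given as an argument.
# ETHTOOL can be overridden, e.g. to point at a stub for a dry run.
ETHTOOL="${ETHTOOL:-ethtool}"

disable_eee() {
  rc=0
  for port in "$@"; do
    if ! "$ETHTOOL" --set-eee "$port" eee off; then
      echo "could not disable EEE on $port" >&2
      rc=1   # remember the failure but keep going with the other ports
    fi
  done
  return "$rc"
}
```

It would be called as disable_eee lan1 lan2 lan3 lan4 wan. Settings made with ethtool are runtime-only, so the call would have to be repeated after each boot, for example from an init script.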

So far the network and device are stable, so I can let this run for at least today. I'll post a link to the dmesg, a regulator_summary, /etc/config/network (for DSA) plus some other configuration output in a bit.

I tried to set STP and IGMP snooping via LuCI for the one bridge defined and lost connectivity to the device. I had to use my serial cable to access the device. These options were not enabled, but I am able to set them and restart the network through the command line without issue. I'll report this in the LuCI DSA config thread as well and post logs.

I'm in and out today, so please be patient for the logs...

Anyway, I'm splitting the 5.10 PR from DSA, since DSA is still not ready and requires more testing. In theory this should make merging the 5.10 commits quicker, and fewer patches will be needed to test DSA.

So, the kernel 5.10 PR is now only about the kernel version bump?
Sounds easier to get accepted.

Yes, I created another draft PR with the DSA changes.

I've encountered the lost connectivity as well; it's because of LuCI.

I think LuCI somehow does this to the network config:

uci del network.lan.type

It only happens after you change the settings of LAN interface at least once.

So it looks like eee was the problem?

I'm not 100% sure yet, as I need to re-enable it and let it fail. I hope I'll get a chance to do that later today.

This gist has the dmesg (buried in the "logcat" file), regulator_summary (at the end of the "logcat" file), /etc/config/network, and an excerpt from the /etc/config/system that I'm using.

Note the dmesg and other info in logcat are out of sync with the /etc/config/network config file, as STP and IGMP snooping are currently enabled. I've done a few reboots just to test some things, and for now I'm leaving EEE disabled.

HTH

No issues after just under 24 hrs of uptime on 5.10.26 with DSA. I did re-enable EEE on all interfaces for roughly the latter half of that time and did not experience issues; hence it's not clear to me that EEE was/is my issue.

Due to my family's need for reliable service on weekdays, I've gone back to 5.4 for now but I'll see if I can't test again next weekend.