Netgear R7800 exploration (IPQ8065, QCA9984)

Lucky you!

I switched off 802.11r to see if it's better. It's not really better: the STA no longer gets disconnected, but it is unable to reach the internet for a while.
But I noticed this just after I changed the setting:

Mon Jul  6 15:42:30 2020 daemon.notice hostapd: wlan0: interface state DFS->ENABLED
Mon Jul  6 15:42:30 2020 daemon.notice hostapd: wlan0: AP-ENABLED
Mon Jul  6 15:42:30 2020 kern.info kernel: [71416.682067] IPv6: ADDRCONF(NETDEV_CHANGE): wlan0-2: link becomes ready
Mon Jul  6 15:42:30 2020 daemon.notice netifd: Network device 'wlan0' link is up
Mon Jul  6 15:42:30 2020 daemon.notice netifd: Network device 'wlan0-1' link is up
Mon Jul  6 15:42:30 2020 daemon.notice netifd: Network device 'wlan0-2' link is up
Mon Jul  6 15:44:14 2020 daemon.err hostapd: nl80211: NL80211_ATTR_STA_VLAN (addr=xx:xx:xx:xx:xx:xx ifname=wlan1-2 vlan_id=0) failed: -2 (No such file or directory)

Maybe that's the root cause of my issues.
Looks a bit like: https://bugs.openwrt.org/index.php?do=details&task_id=2286
Maybe there is just a bug with VLANs, but then I have an issue either way...
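
A quick way to see how often that error shows up, and on which interface, is to filter the system log for the failing nl80211 call. This is only a sketch: the function name `sta_vlan_failures` is made up for illustration, the pattern is copied from the log line quoted above, and `logread` is assumed to be the stock OpenWrt log reader.

```shell
# Hypothetical helper: read syslog lines on stdin and print
# "<ifname> <vlan_id> <errno>" for every failed NL80211_ATTR_STA_VLAN call.
sta_vlan_failures() {
    sed -n 's/.*NL80211_ATTR_STA_VLAN (addr=[^ ]* ifname=\([^ ]*\) vlan_id=\([0-9]*\)) failed: \(-[0-9]*\).*/\1 \2 \3/p'
}

# On the router (assumes logread is available):
#   logread | sta_vlan_failures | sort | uniq -c
```

If the same interface keeps appearing, that would at least narrow it down to one virtual AP.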

Several users have reported issues with virtual interfaces. Turn off the virtual ones and run just one SSID per radio frequency. For simplicity I would use a different SSID per radio frequency and see what your results are.

Seems like I'm not the only one to get that error... @ParanoidZoid got it as well...

I found a mistake in the LAN bridging which may have caused some of the issues I had. I am now running the latest master with 802.11k and 802.11v enabled, but without 802.11r (I guess something is not implemented right with VLANs). The result is that roaming is now really smooth, so I'm going to leave it like this for now (until the VLAN 802.11r issues get cleared up).

Thanks everybody for the help.

Ramon

I'm planning to push tsens support for ipq806x upstream and I'm testing some things... I noticed that there is an upstream driver that is almost identical to ours... So I'm asking here, @hnyman: how can I check whether the values output by this driver are correct?

Here is some temperature output:

root@OpenWrt:/# cat /sys/devices/virtual/thermal/*/temp
29897
29900
30022
29899
29903
29899
30021
30022
30022
30020
30021
root@OpenWrt:/# cat /sys/devices/virtual/thermal/thermal_zone0/temp
29897
root@OpenWrt:/# cat /sys/devices/virtual/thermal/*/temp
29897
29900
30022
29899
29903
29899
30021
30022
30022
30020
30021

Do these look right, or completely wrong?

If I remember right, @dissent1 backported/modified the upstream driver for us.

Values should be in millidegrees Celsius, so 30000 = 30 °C, which sounds like a cold device right after boot. Normal values should be around 50000-55000 for a router in light use.
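
To make the raw numbers easier to eyeball, here is a small sketch that prints each thermal zone in degrees. The `mc_to_c` name is made up for illustration, and it assumes non-negative readings (which is what a running router should give).

```shell
# Hypothetical helper: convert a millidegree-Celsius reading to degrees Celsius.
# Integer math only, so it needs neither awk nor bc. Assumes a non-negative value.
mc_to_c() {
    printf '%d.%03d\n' "$(($1 / 1000))" "$(($1 % 1000))"
}

# Print every readable thermal zone on this system (does nothing if none exist).
for f in /sys/devices/virtual/thermal/thermal_zone*/temp; do
    raw=$(cat "$f" 2>/dev/null) || continue
    case $raw in
        ''|*[!0-9]*) continue ;;   # skip empty or non-numeric (e.g. negative) readings
    esac
    zone=${f%/temp}
    printf '%s: %s C\n' "${zone##*/}" "$(mc_to_c "$raw")"
done
```

With the first reading quoted above, `mc_to_c 29897` prints `29.897`.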

Please check the discussion in the thread below and the PRs linked there (for pointers on how our driver is related to the upstream one)


Thx for the data.
I notice that the (I don't know why) unused 8960 driver in the upstream kernel is more or less a 1:1 copy of our driver and differs only in some parts that are not supported upstream... (the configurable trip point)

The output I got was from using this unused driver...
I'm looking for any reason not to use this driver and to propose our custom one instead.
From what I can see, I think the 8960 driver and the ipq806x one use the same tsens version, or actually they are just the same driver and one of them made it to the upstream kernel.

It would not be anything new for QCA SoCs of the same generation to share a lot of stuff, especially power and thermal management, basic peripherals etc.
Sometimes it's identical and sometimes there are minor changes.

From the thread I can read that msm8960, ipq8064 and apq8064 are all twins... So I think I will follow this path and just propose ipq8064 support using the msm8960 driver (and also add some code so we can remove the hack we use to make the driver use the gcc regs).

Any experience with the schedutil governor?

The link below lists a lot of hw gotchas to make the ondemand governor work; I doubt that schedutil was as extensively tested by QCA.

https://source.codeaurora.org/quic/qsdk/oss/system/openwrt/tree/target/linux/ipq806x/base-files/etc/init.d/powerctl?h=NHSS.QSDK.6.1.r1

Interesting values. Maybe my router doesn't lock up anymore because I too use a min freq of 800 MHz.
They are using 1 s for sampling_rate; mine is 20 ms, maybe too aggressive?

Agreed, those are too aggressive: it makes sense to just use the performance governor and forget about it. But the main point there is the frequencies: the hw does not take some of the combinations well. schedutil does not have as many knobs to turn, so it might not work well, but I never tested it.
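
If you do go the performance-governor route, it is a single write per core. A small sketch, assuming the standard cpufreq sysfs layout; the `set_governor` name and its optional second argument (a base directory, handy for dry runs) are made up for illustration.

```shell
# Hypothetical helper: write a governor name to every core's scaling_governor.
# $1 = governor name, $2 = sysfs base directory (defaults to the real one).
set_governor() {
    gov=$1
    base=${2:-/sys/devices/system/cpu}
    for f in "$base"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -w "$f" ] || continue      # skip cores without cpufreq or without permission
        echo "$gov" > "$f"
    done
}

# On the router, as root:
#   set_governor performance
#   cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
```

Putting the call in /etc/rc.local would make it stick across reboots.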

I’ve been trying this combo for a little while. It is a little more aggressive, with irqbalance and global packet steering enabled, averaging 97% of the time at 800 MHz with light home usage... the interrupts seem pretty balanced (if it doesn’t make sense or is stupid, I’m open to feedback).


root@OpenWrt:~# uname -a
Linux OpenWrt 5.4.50 #0 SMP Thu Jul 9 11:01:20 2020 armv7l GNU/Linux

root@OpenWrt:~# cat /etc/rc.local
# Put your custom commands here that should be executed once
# the system init finished. By default this file does nothing.

echo 800000 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq
echo 800000 > /sys/devices/system/cpu/cpu1/cpufreq/scaling_min_freq
echo 20 > /sys/devices/system/cpu/cpufreq/ondemand/up_threshold
echo 60 > /sys/devices/system/cpu/cpufreq/ondemand/sampling_down_factor
echo 1000000 > /sys/devices/system/cpu/cpufreq/ondemand/sampling_rate

echo 1 > /sys/devices/virtual/net/br-lan/queues/rx-0/rps_cpus
echo 0 > /sys/devices/virtual/net/eth0.2/queues/rx-0/rps_cpus
echo 1 > /sys/devices/virtual/net/eth1.1/queues/rx-0/rps_cpus
echo 0 > /sys/devices/virtual/net/ifb4eth0.2/queues/rx-0/rps_cpus
echo 1 > /sys/devices/virtual/net/lo/queues/rx-0/rps_cpus

exit 0
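
One way to get a figure like the 97% at 800 MHz: the cpufreq stats file `time_in_state` lists each frequency in kHz next to the time spent at it since boot. A rough sketch, assuming the kernel was built with cpufreq statistics so that the file exists; `freq_residency` is a made-up name.

```shell
# Hypothetical helper: read "freq_khz time" pairs on stdin (the time_in_state
# format) and print the share of time spent at each frequency, in input order.
freq_residency() {
    awk '
        { freq[NR] = $1; spent[NR] = $2; total += $2 }
        END {
            if (total == 0) exit 1
            for (i = 1; i <= NR; i++)
                printf "%d kHz: %.1f%%\n", freq[i], 100 * spent[i] / total
        }'
}

# Only runs where the stats file is present.
stats=/sys/devices/system/cpu/cpu0/cpufreq/stats/time_in_state
if [ -r "$stats" ]; then
    freq_residency < "$stats" || echo "no samples yet"
fi
```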


root@OpenWrt:~# cat /proc/interrupts
           CPU0       CPU1       
 16:    3852601    2178341     GIC-0  18 Edge      gp_timer
 18:       1452       6737     GIC-0  51 Edge      qcom_rpm_ack
 19:          0          0     GIC-0  53 Edge      qcom_rpm_err
 20:          0          0     GIC-0  54 Edge      qcom_rpm_wakeup
 26:          0          0     GIC-0 241 Level     ahci[29000000.sata]
 27:          0          0     GIC-0 210 Edge      tsens_interrupt
 30:     188226       3681     GIC-0 202 Level     adm_dma
 31:   11681556   11504074     GIC-0 255 Level     eth0
 32:   12039598   17363567     GIC-0 258 Level     eth1
 33:          0          0     GIC-0 130 Level     bam_dma
 34:          0          0     GIC-0 128 Level     bam_dma
 36:          0          0   PCI-MSI   0 Edge      aerdrv
 38:          0          0   PCI-MSI 134217728 Edge      aerdrv
 39:         13          0     GIC-0 184 Level     msm_serial0
 40:          2          0   msmgpio   6 Edge      keys
 41:          2          0   msmgpio  54 Edge      keys
 42:          2          0   msmgpio  65 Edge      keys
 43:          0          0     GIC-0 142 Level     xhci-hcd:usb1
 44:          0          0     GIC-0 237 Level     xhci-hcd:usb3
 45:         38          0   PCI-MSI 524288 Edge      ath10k_pci
 46:         32          0   PCI-MSI 134742016 Edge      ath10k_pci
IPI0:          0          0  CPU wakeup interrupts
IPI1:          0          0  Timer broadcast interrupts
IPI2:     252551     787520  Rescheduling interrupts
IPI3:    5776514    3873354  Function call interrupts
IPI4:          0          0  CPU stop interrupts
IPI5:      91678     158242  IRQ work interrupts
IPI6:          0          0  completion interrupts

How are you not getting any IRQ 45 & 46? Not using wifi?
From mine:

31:          4    3841293     GIC-0 255 Level     eth0
32:        237   52518910     GIC-0 258 Level     eth1
...
45:   37160470          0   PCI-MSI 524288 Edge      ath10k_pci
46:   16937144          0   PCI-MSI 134742016 Edge      ath10k_pci

I chose to just move IRQ 31/32 to CPU1; I am not running irqbalance.
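
Moving an IRQ by hand means writing a hex CPU bitmask to `/proc/irq/<n>/smp_affinity` (bit 0 = CPU0, bit 1 = CPU1). A minimal sketch, with a made-up helper `cpus_to_mask` to build the mask; the IRQ numbers are the eth0/eth1 ones from the dumps in this thread and may differ on other builds.

```shell
# Hypothetical helper: turn a list of CPU indexes into the hex bitmask
# format used by smp_affinity (and by rps_cpus / xps_cpus).
cpus_to_mask() {
    mask=0
    for cpu in "$@"; do
        mask=$((mask | (1 << cpu)))
    done
    printf '%x\n' "$mask"
}

# On the router, as root, with irqbalance stopped so it does not undo this:
#   cpus_to_mask 1 > /proc/irq/31/smp_affinity   # eth0 -> CPU1
#   cpus_to_mask 1 > /proc/irq/32/smp_affinity   # eth1 -> CPU1
```

`cpus_to_mask 1` prints `2` and `cpus_to_mask 0 1` prints `3`.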

That was my main router (wifi turned off). Not sure whether splitting by IRQ or the way I did it is better. This is what it looks like on an R7800 AP with the same settings (the goal was to split the work: wifi from other functions):


root@OpenWrt:~# cat /proc/interrupts
           CPU0       CPU1       
 16:   10475230    3262708     GIC-0  18 Edge      gp_timer
 18:        185        296     GIC-0  51 Edge      qcom_rpm_ack
 19:          0          0     GIC-0  53 Edge      qcom_rpm_err
 20:          0          0     GIC-0  54 Edge      qcom_rpm_wakeup
 26:          0          0     GIC-0 241 Level     ahci[29000000.sata]
 27:          0          0     GIC-0 210 Edge      tsens_interrupt
 30:     187232         56     GIC-0 202 Level     adm_dma
 31:         30      42940     GIC-0 255 Level     eth0
 32:         99    5316328     GIC-0 258 Level     eth1
 33:          0          0     GIC-0 130 Level     bam_dma
 34:          0          0     GIC-0 128 Level     bam_dma
 36:          0          0   PCI-MSI   0 Edge      aerdrv
 38:          0          0   PCI-MSI 134217728 Edge      aerdrv
 39:         13          0     GIC-0 184 Level     msm_serial0
 40:          2          0   msmgpio   6 Edge      keys
 41:          2          0   msmgpio  54 Edge      keys
 42:          2          0   msmgpio  65 Edge      keys
 43:          0          0     GIC-0 142 Level     xhci-hcd:usb1
 44:          0          0     GIC-0 237 Level     xhci-hcd:usb3
 45:   15412016          0   PCI-MSI 524288 Edge      ath10k_pci
 46:   33585658          0   PCI-MSI 134742016 Edge      ath10k_pci
IPI0:          0          0  CPU wakeup interrupts
IPI1:          0          0  Timer broadcast interrupts
IPI2:     155188    2828194  Rescheduling interrupts
IPI3:     920797    6022548  Function call interrupts
IPI4:          0          0  CPU stop interrupts
IPI5:     136970     120890  IRQ work interrupts
IPI6:          0          0  completion interrupts
Err:          0
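
To compare dumps like these without squinting at eight-digit counters, the per-CPU share of each IRQ can be computed with awk. A sketch only: `irq_share` is a made-up name, and it assumes exactly two CPU columns, as on the dual-core IPQ8065.

```shell
# Hypothetical helper: read /proc/interrupts-style lines on stdin and print,
# for each numbered IRQ with a non-zero count, its name and per-CPU share.
# Assumes exactly two CPU columns; IPI and Err lines are skipped.
irq_share() {
    awk '
        $1 ~ /^[0-9]+:$/ && ($2 + $3) > 0 {
            total = $2 + $3
            printf "%s %s cpu0=%.1f%% cpu1=%.1f%%\n", $1, $NF, 100 * $2 / total, 100 * $3 / total
        }'
}

# On the router:
#   irq_share < /proc/interrupts
```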

I don't believe ash will process this right but I know bash will:

find /sys/devices/ -name xps_cpus | while read -r f; do echo 1 > "$f"; done
find /sys/devices/ -name rps_cpus | while read -r f; do echo 2 > "$f"; done

That covers all devices in a generic way.

Have you tested this transmission vs receiving split compared to your settings:


devices/virtual/net/br-lan/queues/rx-0/rps_cpus = 3
devices/virtual/net/eth0.2/queues/rx-0/rps_cpus = 3
devices/virtual/net/eth1.1/queues/rx-0/rps_cpus = 3
devices/virtual/net/ifb4eth0.2/queues/rx-0/rps_cpus = 3
devices/virtual/net/lo/queues/rx-0/rps_cpus = 3
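
For reference, the value in `rps_cpus` (and `xps_cpus`) is a hex CPU bitmask: 1 means CPU0 only, 2 means CPU1 only, 3 means both cores, and 0 turns RPS off for that queue. A small decoder sketch; `mask_to_cpus` is a made-up name.

```shell
# Hypothetical helper: decode a hex CPU bitmask (as read from rps_cpus,
# xps_cpus or smp_affinity) into a space-separated list of CPU indexes.
mask_to_cpus() {
    mask=$((0x$1))
    cpu=0
    out=
    while [ "$mask" -ne 0 ]; do
        if [ $((mask & 1)) -eq 1 ]; then
            out="${out:+$out }$cpu"
        fi
        mask=$((mask >> 1))
        cpu=$((cpu + 1))
    done
    printf '%s\n' "${out:-none}"
}

# On the router:
#   mask_to_cpus "$(cat /sys/devices/virtual/net/br-lan/queues/rx-0/rps_cpus)"
```

`mask_to_cpus 3` prints `0 1`, so the listing above steers every queue to both cores.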

Anybody having problems with setting the MTU on eth0? This used to work but now seems not to.

@Ansuel I guess the DSA patches never made it into master. Do you have a patch that changes the R7800 to use DSA instead of swconfig?