Netgear R7800 exploration (IPQ8065, QCA9984)

I could, but this thing is sitting as the main router at a family member's house, so I don't really feel like I should mess around too much with it (remotely). I've currently put it on v17.01 + the newer wifi firmware. My main focus is long-term stability.

I do have full access to it via a site-to-site VPN, though.

Well, I actually do not face any issues with the updated wireless driver, so it would have been useful to know whether it helps with that syslog spam, since even the previous firmware update mostly fixed it.

I've just bought an R7800 router and have been running the latest LEDE snapshot for 2 days with great results!
Wireless performance is really good, as good as the stock firmware, sometimes even better. I was able to run an nperf.com test at 300 Mbps/250 Mbps with my 11ac (80 MHz) laptop.

Regarding wired performance, it seems stuck at around 800-850 Mbps in a regular NAT scenario. (My fiber WAN connection is 1 Gbps/250 Mbps, and during the night I'm able to get a 980 Mbps/250 Mbps speedtest with a PC connected directly to the ONT.) I know that hardware NAT is currently not implemented, but with a dual-core 1.7 GHz CPU I was expecting the full 1 Gbps bandwidth...

In this thread I can see many discussions about CPU clock variation: could you tell me how to monitor/check the CPU clock?

The easiest way is to install the LuCI statistics package plus the collectd-mod-cpufreq package, which enables CPU frequency monitoring. (You might also install collectd-mod-thermal to see the CPU temperatures.)
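For reference, a minimal sketch of the install from the command line (assuming the usual LEDE package names for the LuCI statistics app):

opkg update
opkg install luci-app-statistics collectd-mod-cpufreq collectd-mod-thermal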

Or you can look at the current value from the console:

root@LEDE:~# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq
384000
root@LEDE:~# cat /sys/devices/system/cpu/cpu1/cpufreq/scaling_cur_freq
384000

When there is more load, it might look like this:

root@LEDE:~# cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_cur_freq
1725000
384000
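
If you want to follow it live from the console, a simple poll loop works too (a rough sketch; adjust the interval as you like):

# print both cores' frequency once per second; Ctrl-C to stop
while true; do
  cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_cur_freq
  echo "---"
  sleep 1
done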

Thanks a lot! This is what I get under heavy load (750 Mbps):

root@LEDE:~# cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_cur_freq
1725000
1000000

and:

CPU: 0% usr 6% sys 0% nic 18% idle 0% io 0% irq 75% sirq

I guess the CPU could be the bottleneck (75% of the time is spent in softirq)...

Could you show a cat /proc/interrupts? It might be that too many services are running on the first CPU, and switching e.g. the wireless interfaces to the second CPU would improve overall performance.

Here's the output:

root@LEDE:~# cat /proc/interrupts
           CPU0       CPU1
 16:    3282539    2952357       GIC  18 Edge      gp_timer
 18:         33          0       GIC  51 Edge      qcom_rpm_ack
 19:          0          0       GIC  53 Edge      qcom_rpm_err
 20:          0          0       GIC  54 Edge      qcom_rpm_wakeup
 26:          0          0       GIC 241 Edge      29000000.sata
 27:    3432167          0       GIC  67 Edge      qcom-pcie-msi
 28:    9562915          0       GIC  89 Edge      qcom-pcie-msi
 29:     183179          0       GIC 202 Edge      adm_dma
 30:    7136044          0       GIC 255 Level     eth0
 31:    9659778          0       GIC 258 Level     eth1
 32:          0          0       GIC 130 Level     bam_dma
 33:          0          0       GIC 128 Level     bam_dma
 40:          2          0   msmgpio   6 Edge      gpio-keys
 88:          2          0   msmgpio  54 Edge      gpio-keys
 99:          2          0   msmgpio  65 Edge      gpio-keys
103:          0          0   PCI-MSI   0 Edge      aerdrv
104:    3432167          0   PCI-MSI   1 Edge      ath10k_pci
136:          0          0   PCI-MSI   0 Edge      aerdrv
137:    9562915          0   PCI-MSI   1 Edge      ath10k_pci
169:         11          0       GIC 184 Level     msm_serial0
170:          2          0       GIC 187 Level     1a280000.spi
171:         67          0       GIC 142 Level     xhci-hcd:usb1
172:          0          0       GIC 237 Level     xhci-hcd:usb3
IPI0:          0          0  CPU wakeup interrupts
IPI1:          0          0  Timer broadcast interrupts
IPI2:     149144    5975393  Rescheduling interrupts
IPI3:          0          0  Function call interrupts
IPI4:      72281    3751366  Single function call interrupts
IPI5:          0          0  CPU stop interrupts
IPI6:          1          0  IRQ work interrupts
IPI7:          0          0  completion interrupts
Err:          0

How can I possibly move services to the second CPU?

[edit] Oops, wrong values, these should be the correct ones:
[edit2] Hmm, just testing here and I can't move wireless for some reason, but I can move eth0 and eth1:

echo 2 >/proc/irq/30/smp_affinity
echo 2 >/proc/irq/31/smp_affinity

That should move eth0 and eth1 over to the second CPU while keeping wifi on the original one. You'll have to run it after each reboot.

Let me know how this works out, I'm really curious whether it makes a difference.

Thanks @johnnysl for your help.
At first sight it looks OK: fewer multicast TV macro-blocks when running high-rate wireless transfers (QoS is my next fight).
I'll perform a wired speedtest later and keep in touch.
Just wondering:

cat /proc/irq/default_smp_affinity
3

Why 3?

It's a bitmask:
1 = core 0
2 = core 1
4 = core 2, etc.

3 = both core 0 and core 1

Exactly, it translates to binary:
1 = 01
2 = 10
3 = 11

So 3 means the IRQ could be shared across both CPUs, but in practice it just grabs the first one.
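
As a concrete example (the value written is a hexadecimal bitmask; IRQ 31 here is just the eth1 line from the output above):

echo 2 > /proc/irq/31/smp_affinity   # binary 10 -> core 1 only
echo 3 > /proc/irq/31/smp_affinity   # binary 11 -> both cores allowed
cat /proc/irq/31/smp_affinity        # check the current mask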

@hnyman: any idea why the CPU affinity of the wifi interfaces cannot be changed? (This is not an issue on the WRT1900AC.)

Thanks for that clarification!
I've noticed another strange behavior: the R7800 has higher latency than my previous WNDR3800. The ping to any remote destination is 0.6 ms higher, and it also takes longer to answer ping requests, as you can see on my SmokePing chart:

Do you think it's related to the SoC architecture or something like that?

I must admit that switching eth0 over to the 2nd core makes aria downloads way faster. Probably this is because USB and WAN had been sharing the same first core.

I haven't looked at the IPQ806x target kernel options to see if there is anything about multi-core balancing, but maybe we need to pay attention to those.

I would have assumed that Linux somehow distributes the workload across the cores, but apparently not, at least not automatically and optimally.

As far as I have found out, IRQ routing defaults to putting all hardware IRQs onto core0. The only way to balance the load is to use the irqbalance package, which is absent in LEDE.

The simplest approach might be to add the affinity-setting commands to /etc/rc.local, but I will look into irqbalance: https://github.com/Irqbalance/irqbalance
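
For example, something like this near the end of /etc/rc.local (before its final "exit 0") should re-apply the affinity on every boot; IRQ numbers 30/31 are the eth0/eth1 lines from the /proc/interrupts output above and may differ between builds:

# /etc/rc.local - move eth0 and eth1 interrupts to the second core at boot
echo 2 > /proc/irq/30/smp_affinity
echo 2 > /proc/irq/31/smp_affinity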

EDIT: you noticed irqbalance while I was writing this.

I was able to get irqbalance compiled. It distributed interrupts rather nicely.

Makefile https://gist.github.com/hnyman/c01b7d1d5e00cc9eea89f6eb4e7e0f27#file-irqbalance-makefile
Additional patch (patches/100-disable-ui-compilation.patch):
https://gist.github.com/hnyman/c01b7d1d5e00cc9eea89f6eb4e7e0f27#file-patch-100-disable-ui-compilation-patch

I made a rather rough first version: I disabled practically all optional functionality, opted to use the external glib2 (which adds a large package dependency) and disabled compilation of the UI, as that gave some errors on the first build attempts.

Below is what happened. I ran "one-shot IRQ balancing", and after a while you can see that 27/qcom-pcie-msi, 31/eth1 and 104/ath10k_pci got moved, so both wired and WLAN interrupts got split across the cores.

root@LEDE:/tmp# irqbalance --oneshot
...
root@LEDE:/tmp# cat /proc/interrupts
           CPU0       CPU1
 16:    1038754     541144       GIC  18 Edge      gp_timer
 18:         33          0       GIC  51 Edge      qcom_rpm_ack
 19:          0          0       GIC  53 Edge      qcom_rpm_err
 20:          0          0       GIC  54 Edge      qcom_rpm_wakeup
 26:          0          0       GIC 241 Edge      29000000.sata
 27:     689539      11246       GIC  67 Edge      qcom-pcie-msi
 28:     770553          0       GIC  89 Edge      qcom-pcie-msi
 29:     182217          0       GIC 202 Edge      adm_dma
 30:     329046          0       GIC 255 Level     eth0
 31:     535468      23525       GIC 258 Level     eth1
 32:          0          0       GIC 130 Level     bam_dma
 33:          0          0       GIC 128 Level     bam_dma
 40:          2          0   msmgpio   6 Edge      gpio-keys
 88:          2          0   msmgpio  54 Edge      gpio-keys
 99:          2          0   msmgpio  65 Edge      gpio-keys
103:          0          0   PCI-MSI   0 Edge      aerdrv
104:     689539      11246   PCI-MSI   1 Edge      ath10k_pci
136:          0          0   PCI-MSI   0 Edge      aerdrv
137:     770553          0   PCI-MSI   1 Edge      ath10k_pci
169:         14          0       GIC 184 Level     msm_serial0
170:          2          0       GIC 187 Level     1a280000.spi
171:          0          0       GIC 142 Level     xhci-hcd:usb1
172:        591          0       GIC 237 Level     xhci-hcd:usb3
IPI0:          0          0  CPU wakeup interrupts
IPI1:          0          0  Timer broadcast interrupts
IPI2:      64319      94950  Rescheduling interrupts
IPI3:          0          0  Function call interrupts
IPI4:      16372     411777  Single function call interrupts
IPI5:          0          0  CPU stop interrupts
IPI6:          1          2  IRQ work interrupts
IPI7:          0          0  completion interrupts

IRQ affinity distribution:

root@LEDE:/tmp# cat /proc/irq/16/smp_affinity
3
root@LEDE:/tmp# cat /proc/irq/18/smp_affinity
1
root@LEDE:/tmp# cat /proc/irq/19/smp_affinity
1
root@LEDE:/tmp# cat /proc/irq/20/smp_affinity
2
root@LEDE:/tmp# cat /proc/irq/26/smp_affinity
1
root@LEDE:/tmp# cat /proc/irq/27/smp_affinity
2
root@LEDE:/tmp# cat /proc/irq/28/smp_affinity
1
root@LEDE:/tmp# cat /proc/irq/29/smp_affinity
2
root@LEDE:/tmp# cat /proc/irq/30/smp_affinity
1
root@LEDE:/tmp# cat /proc/irq/31/smp_affinity
2
root@LEDE:/tmp# cat /proc/irq/32/smp_affinity
1
root@LEDE:/tmp# cat /proc/irq/33/smp_affinity
2
root@LEDE:/tmp# cat /proc/irq/40/smp_affinity
3
root@LEDE:/tmp# cat /proc/irq/88/smp_affinity
3
root@LEDE:/tmp# cat /proc/irq/99/smp_affinity
3
root@LEDE:/tmp# cat /proc/irq/103/smp_affinity
3
root@LEDE:/tmp# cat /proc/irq/104/smp_affinity
3
root@LEDE:/tmp# cat /proc/irq/136/smp_affinity
3
root@LEDE:/tmp# cat /proc/irq/137/smp_affinity
3
root@LEDE:/tmp# cat /proc/irq/169/smp_affinity
2
root@LEDE:/tmp# cat /proc/irq/170/smp_affinity
1
root@LEDE:/tmp# cat /proc/irq/171/smp_affinity
2
root@LEDE:/tmp# cat /proc/irq/172/smp_affinity
1

I pushed irqbalance to the packages repo:
https://github.com/openwrt/packages/commit/c5913bd12d73c4c781b79c16246a8e9c8d236b8f

EDIT:
It would be great if somebody could figure out how to avoid the glib2 dependency. It sounds crazy to pull in a 900 kB library just to get 3-4 list functions :frowning:

When I set the configure options in the Makefile to disable external glib2, I got errors during compilation. I haven't looked closer into them yet, but decided to use the external glib2 so that the buildbot could compile the first versions...

@hnyman - just make it use its bundled glib-local which is merely a tiny stub adding a few linked list functions.

I suppose it's better to exclude the wifi IRQs from affinity changes, because sometimes the radio doesn't broadcast after a reboot if they have been rebalanced.

PS: I've put irqbalance into startup.
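
A rough sketch of one way to do that, assuming the package installs the binary to /usr/sbin and does not yet ship an init script, is a line in /etc/rc.local:

# start irqbalance as a daemon at boot (sketch; path and options may differ in your build)
/usr/sbin/irqbalance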

Update:
Wed Feb 8 02:42:05 2017 user.notice : IRQ 27 was BANNED.
Wed Feb 8 02:42:05 2017 daemon.warn irqbalance: WARNING: MSI interrupts found in /proc/interrupts
Wed Feb 8 02:42:05 2017 daemon.warn irqbalance: But none found in sysfs, you need to update your kernel
Wed Feb 8 02:42:05 2017 daemon.warn irqbalance: Until then, IRQs will be improperly classified

Update: adm_dma should be excluded as well.
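
A rough sketch of how that could look with irqbalance's --banirq option (IRQs 29, 104 and 137 are adm_dma and the two ath10k radios in the output above; this assumes the option is enabled in this build):

# keep adm_dma and the wifi IRQs out of the balancing
irqbalance --banirq=29 --banirq=104 --banirq=137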

Actually, it seems the most reliable way is to put eth0 and eth1 onto core1 manually; irqbalance leads to numerous issues :confused: