Netgear R7800 exploration (IPQ8065, QCA9984)

Hello everyone!

Been using the OpenWrt stable branch on a Netgear R7800 for some time now. Huge improvements over the stock firmware! Very thankful that this project is alive and strong.

I am thinking about trying out the master branch now. I am an absolute novice in Linux and networking, but I know how to log in over SSH, install LuCI, and I understand the possible risks mentioned in the FAQs (small time window for package installation, risk of bricking the device, etc.). I would like to clarify the current state of R7800 performance, as it is not easy to piece the information together from all the different R7800 topics. I am sorry about such a long post, but I am hoping we can clear up some questions and create a single post with the adjustments needed for best R7800 performance as of today. Here are my questions.

  1. As I understand it, the master branch recently moved to kernel 4.19 (before that, only community builds used 4.19)?

  2. There was recently a fix for software offloading on ipq806x in the master branch (I did not notice it being broken, as I use SQM anyway, but I am curious)?

  3. I thought I read in the "R7800 cache scaling issue" topic that the bug impacting performance with the ondemand governor is fixed (so that ondemand with adjusted threshold values should always give computational power on par with the performance governor) and that the changes are merged into the master branch, but I cannot find any such post anymore. Could somebody clarify whether the fix is in master or still to be tested in community builds?

  • Curious if on master I could move from this:
    echo performance > /sys/devices/system/cpu/cpufreq/policy0/scaling_governor
    echo performance > /sys/devices/system/cpu/cpufreq/policy1/scaling_governor
    echo 800000 > /sys/devices/system/cpu/cpufreq/policy0/scaling_max_freq
    echo 800000 > /sys/devices/system/cpu/cpufreq/policy1/scaling_max_freq
    sleep 1
    echo 1725000 > /sys/devices/system/cpu/cpufreq/policy0/scaling_max_freq
    echo 1725000 > /sys/devices/system/cpu/cpufreq/policy1/scaling_max_freq

  • to this:
    echo 35 > /sys/devices/system/cpu/cpufreq/ondemand/up_threshold
    echo 10 > /sys/devices/system/cpu/cpufreq/ondemand/sampling_down_factor
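
  • For reference, the currently active governor and frequency can be read back from the same sysfs tree (assuming the policy0/policy1 layout used above; frequencies are in kHz):
    cat /sys/devices/system/cpu/cpufreq/policy0/scaling_governor
    cat /sys/devices/system/cpu/cpufreq/policy0/scaling_cur_freq
    cat /sys/devices/system/cpu/cpufreq/policy1/scaling_governor
    cat /sys/devices/system/cpu/cpufreq/policy1/scaling_cur_freq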

  4. To improve performance, there are two recommendations: a) use irqbalance, or b) set IRQ affinities manually in rc.local. Personally I would prefer irqbalance, as it seems like a more dynamic process than statically pinning each IRQ to a specific CPU. But I also read on the forum that some IRQs have to be assigned manually, otherwise the change does not take effect and those interrupts always land on the same CPU core. Could somebody shed some light on this? The current config in rc.local on my router (taken from Kong's build) is:
    # wifi0 - 5GHz
    echo 2 > /proc/irq/28/smp_affinity
    # wifi1 - 2GHz
    echo 1 > /proc/irq/29/smp_affinity
    # eth0 - WAN
    echo 1 > /proc/irq/31/smp_affinity
    # eth1 - LAN
    echo 2 > /proc/irq/32/smp_affinity
    # USB1 & USB2
    echo 2 > /proc/irq/105/smp_affinity
    echo 2 > /proc/irq/106/smp_affinity
    plus running irqbalance on top. Honestly, though, I do not know whether any of this makes sense or whether it all works together.
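
    Before pinning anything, it probably makes sense to check the actual IRQ numbers on the running kernel first, since they can change between releases. Something like this should show the relevant lines (the eth/ath10k/usb patterns are just my guess at what to match on this device):
    # list the interrupt lines for the NICs, wifi and USB controllers
    grep -E 'eth|ath10k|usb' /proc/interrupts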

  5. I read that the IRQ assignments have changed in kernel 4.19, so this static assignment is likely wrong now? Has somebody made a new one for 4.19, or should I just drop it and use irqbalance?

  6. Are the following lines still important, or have fixes been made so that they can be dropped? It was mentioned that the ethtool values set here are not even the optimal ones (they should be set by the driver, without using ethtool) and that there are some bugs preventing the correct ones from being set, but I did not find more info in the original thread.

swconfig dev switch0 set ar8xxx_mib_poll_interval 0

ethtool -C eth0 tx-usecs 0
ethtool -C eth1 tx-usecs 0
ethtool -C eth0 rx-usecs 31
ethtool -C eth1 rx-usecs 31
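
In case it helps: whether these coalescing settings actually stick can be verified by reading them back with ethtool's lowercase -c option, e.g.:

# show the current interrupt coalescing parameters
ethtool -c eth0
ethtool -c eth1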

  7. Several people mention that these commands (taken from Kong's build) improve throughput a lot (writing them here just so all the info is gathered in one place):

for file in /sys/class/net/*
do
    echo 3 > "$file/queues/rx-0/rps_cpus"
    echo 3 > "$file/queues/tx-0/xps_cpus"
done
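
For context (as far as I understand it; corrections welcome): the value written is a CPU bitmask, so 3 (binary 11) lets both cores of the IPQ8065 handle the packet steering for that queue. Whether it took effect can be checked by reading the file back:

# should print 3 after the loop above has run
cat /sys/class/net/eth0/queues/rx-0/rps_cpus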

  8. Information on the forum suggests that Kong's build also had the following (this is way above my understanding, aside from the obvious fact that two processes are assigned the lowest possible priority):

# There is no need for collectd to run above nice == 19
if [ ! grep "NICEPRIO=19" /etc/init.d/collectd ]; then
sed -i 's/^NICEPRIO.*/NICEPRIO=19/g' /etc/init.d/collectd
# Restart does not pick up the above change right away
(sleep 300 ; /etc/init.d/collectd stop; sleep 15; /etc/init.d/collectd start) &
fi

# There is no need for uhttpd to run above nice == 19
if [ ! grep "nice -n 19" /etc/init.d/uhttpd ]; then
sed -i "s/procd_set_param command/procd_set_param command nice -n 19/g" /etc/init.d/uhttpd
# Restart does not pick up the above change right away
(sleep 300 ; /etc/init.d/uhttpd stop; sleep 15; /etc/init.d/uhttpd start) &
fi

  9. If there are other customizations to improve performance, please be so kind as to share them. This is all I could find going through the different topics. I am just curious what the most up-to-date information on all of this is.

Where do these IRQ 28 and 29 come from? I don't see them if I cat /proc/interrupts

I took the config directly from that post without even checking or confirming the assignments, so I do not know the basis on which these IRQs were chosen.

The benefit is that IRQs from the SoC ethernet adapters primarily get processed by the second CPU core, instead of exclusively on the first CPU core.

try cat /proc/interrupts

A good strategy would be to try it out yourself and not just discuss it in theory.

Which performance specifically do you want to improve?

If you already get line-speed routing through your setup, you are satisfied with the measured latency, and the CPU has idle cycles left under your full application load, then there is nothing to gain, except maybe having fun playing with parameters.

If you e.g. want big improvements in VPN encryption performance, you probably need an x86-based device.

These parameter tweaks are more like small optimizations.

Software offloading helped with latency and improved throughput for me. If you are not running SQM, it is worthwhile to flip on.
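
For anyone who prefers the CLI over LuCI, this should be the equivalent of ticking "Software flow offloading" under Network → Firewall (worth double-checking against your own build):

# enable software flow offloading in the firewall defaults section
uci set firewall.@defaults[0].flow_offloading='1'
uci commit firewall
/etc/init.d/firewall restart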

Running hnyman's stable 12 December build (kernel 4.14). Ondemand seems to be working as intended. With no CPU tweaks I was getting 650 Mbps WAN throughput. Ondemand's default threshold is 95% usage before it raises the CPU frequency. I was able to get ondemand to kick in earlier (at 20% usage) and raise the frequency, getting full gigabit WAN speed with this:

echo 800000 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq
echo 800000 > /sys/devices/system/cpu/cpu1/cpufreq/scaling_min_freq
echo 20 > /sys/devices/system/cpu/cpufreq/ondemand/up_threshold
echo 10 > /sys/devices/system/cpu/cpufreq/ondemand/sampling_down_factor

Makes sense to divide up the load. I'm getting line rates without the IRQ tweaks, and I heard a couple of things are changing with 4.19, so I've held off on tweaks here. I don't think they'll improve performance much unless you are running a heavy load.

I didn't find any improvement with this tweak (kernel 4.14).

Are you sure this is working as it should? I ran the command in the shell and got:

ash: NICEPRIO=19: unknown operand

Try removing the square brackets.

The strange part about this advice is why one would pick such a complex way to edit one line in a text file.

The line in /etc/init.d/collectd could be edited before the build in the package sources (the proper way for your own builds), or it could be edited on a live router (if you have just installed collectd via opkg).

Using a script with grep and sed for a one-time change sounds complicated.

Ps. And naturally that collectd nice adjustment has no specific connection to R7800.
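
For example, on a live router a one-off edit like this should be enough (assuming the stock collectd init script with its NICEPRIO variable), instead of re-checking it on every boot:

# bump collectd to the lowest scheduling priority and restart it once
sed -i 's/^NICEPRIO=.*/NICEPRIO=19/' /etc/init.d/collectd
/etc/init.d/collectd restart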

I sort of get it, if you want it set on boot and don't build your own images (but update often, e.g. following master builds). Otherwise you would need to edit the file manually every time you update the image.
I would've used renice though, but it seems it doesn't exist by default.

Yes... thanks, also the script for uhttpd is missing ` signs in the grep line.

EDIT:
Btw, as wired pointed out, IRQs 28 & 29 don't exist, so that particular tweak won't work at all.
It's IRQs 45 and 46 for wifi on this router, and for some reason you can't set smp_affinity on those. Might be a driver issue (?): you can only change affinity on GIC IRQs, and ath10k_pci is using PCI-MSI IRQs.

It all still works in 19.07

What does? Setting the affinity for wifi? 4.19.88 below.

$ uname -a
Linux nighthawk 4.19.88 #0 SMP Fri Dec 20 09:10:22 2019 armv7l GNU/Linux
$ cat /proc/interrupts
---snip---
 45:   37202094          0   PCI-MSI 524288 Edge      ath10k_pci
 46:   15512040          0   PCI-MSI 134742016 Edge      ath10k_pci
---snip---

$ echo 2 > /proc/irq/45/smp_affinity
ash: write error: Invalid argument

As I said, it all works in 19.07, which is 4.14.

Odd... I wonder what's changed then; that seems like a minor flaw in 4.19, unless it's done for a reason.

@locojohn @perceival @RainGater

Can you guys please confirm that in order to get that performance improvement for 4.14 you put this in the local startup:
for FILE in /sys/class/net/*; do echo 3 > $FILE"/queues/rx-0/rps_cpus"; echo 3 > $FILE"/queues/tx-0/xps_cpus"; done

for 4.19 this:
for FILE in /sys/class/net/*; do [ -f "$FILE/queues/rx-0/rps_cpus" ] && echo 3 > $FILE"/queues/rx-0/rps_cpus"; [ -f "$FILE/queues/tx-0/xps_cpus" ] && echo 3 > $FILE"/queues/tx-0/xps_cpus"; done

thank you

See that post: Build for Netgear R7800
It's universal and is going to work with both kernels.
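
The gist of making it kernel-agnostic (roughly sketched here; see the linked post for the exact script) is to just skip queue files that do not exist:

# write the CPU mask only to queues that are actually present and writable
for q in /sys/class/net/*/queues/rx-0/rps_cpus /sys/class/net/*/queues/tx-0/xps_cpus; do
    [ -w "$q" ] && echo 3 > "$q"
done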

The L2 fix and new cpufreq driver are approved and waiting to be merged

Seems to be wrong; I also do not see them on k4.19 master builds. Instead, wifi seems to be on 45 and 46, as mentioned in the posts below. But changing the assignment for 45 and 46 is not allowed for some reason. I do not know why 28 and 29 were in Kong's builds (I only recently started using the CLI and learning what does what; before, it was just copy from the forum, paste into LuCI, so I do not have more info).

For me this is currently mostly playing and exploring, as it seems the R7800 is more or less able to handle my not-so-fast line speed even with SQM cake on. However, I have not yet compared whether irqbalance gave any noticeable improvement. Lately the router throughput seems better to me, but that may as well just be improvements gained from moving to the master branch (?). I will compare thoroughly one day (this or next week) and then give feedback. So far, though, I might have to agree that these are actually only small tweaks and improvements and do not change much.

What would you say, does it make sense to keep these in rc.local? I'm wondering whether there really is any measurable improvement from just setting a lower priority for two processes?

Those processes are competing for CPU with dnsmasq, stubby, hostapd, and the like, so it does make sense to lower their priorities.