R7800 performance

Can you share your tweaks? I am using hnyman's build and I am wondering if those will work.

This sounds like the Linux kernel initially starting with the performance governor and then reverting to the ondemand governor shortly afterwards (1-2 minutes?) as it settles into normal operation, which is apparently what happens.
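As a quick check, you can see which governor each core is currently using (same sysfs paths as used later in this thread):

cat /sys/devices/system/cpu/cpufreq/policy0/scaling_governor
cat /sys/devices/system/cpu/cpufreq/policy1/scaling_governor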

You have said that you've experimented with the governor settings; however, I'd suggest you look at these again, because there are a couple of things that are easily overlooked:

  • there's a separate governor for each CPU core, so you need to set each one separately
  • unless the commands to set the governor are saved in a place like /etc/rc.local, changes made via SSH are lost when you reboot

The simplest approach is just to set the governor to performance for each core, which then means each core runs at maximum speed. If you want to keep the CPU cores running at low speeds when there is little activity, then @ddwrt_refugee's settings here are your best bet.
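As a minimal sketch (assuming the usual sysfs layout on this dual-core target), the following could go at the end of /etc/rc.local so it survives reboots:

# set every core to the performance governor
for g in /sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor; do
	echo performance > "$g"
done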

You probably also need to look into moving some interrupt handling off the first core (cpu0) as well.
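As a rough illustration (the IRQ number below is only an example; the numbers differ between kernels and builds, so check /proc/interrupts on your own device first):

grep eth /proc/interrupts           # discover which IRQs belong to the ethernet devices
echo 2 > /proc/irq/32/smp_affinity  # example: pin that IRQ to cpu1 (bitmask: 1 = cpu0, 2 = cpu1)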

These settings definitely affect network throughput once you're handling more than 100-150 Mbps of traffic with any form of QoS.

LOL, 100-150 Mbps with the current official OpenWrt build? Seriously, you need to fix your builds. A 1.7 GHz ARM core can easily do 1 Gbps NAT in software. In fact I have a custom ipq8x board here with a custom fixed OpenWrt kernel that does this :-)

Feel free to bring in your knowledge. It'll be appreciated.

I believe the issue here is that traffic shaping (often the core of a QoS system) is computationally expensive (much more so than NAT/masquerading and PPPoE en-/decapsulation)... And unlike essential steps like NAT and PPPoE, hardware vendors have so far ignored traffic shaping as an attractive function for their offload engines.
(On the R7800 there are serious interactions between frequency scaling and the demands of in-kernel traffic shapers, which are also discussed somewhere in this thread, I believe.)

I can try this tonight, but I'm skeptical. First of all, there is basically no load: the CPU is 85% idle when I'm maxing out the NAT performance. Second, I think I already did this. The rc.local point seems somewhat irrelevant: if I can make performance fast once, then I will obviously put the right settings into rc.local to survive reboots. But my problem is that I can't find any combination of commands that makes performance good in the first place. Now I'm having trouble even getting fast performance after a reboot, so I'm wondering if that was a red herring.

I believe I don't have any QoS enabled, but is there something I should do to make absolutely sure?

Please post the output of tc -s qdisc

I had to install it with opkg install tc, but here's what I get:

# tc -s qdisc
qdisc noqueue 0: dev lo root refcnt 2 
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
qdisc mq 0: dev eth0 root 
 Sent 11356829394 bytes 79637525 pkt (dropped 0, overlimits 0 requeues 1787717) 
 backlog 0b 0p requeues 1787717
qdisc fq_codel 0: dev eth0 parent :1 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 4Mb ecn 
 Sent 11356829394 bytes 79637525 pkt (dropped 0, overlimits 0 requeues 1787717) 
 backlog 0b 0p requeues 1787717
  maxpacket 31794 drop_overlimit 0 new_flow_count 550500 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc mq 0: dev eth1 root 
 Sent 227233744009 bytes 166480158 pkt (dropped 0, overlimits 0 requeues 5062992) 
 backlog 0b 0p requeues 5062992
qdisc fq_codel 0: dev eth1 parent :1 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 4Mb ecn 
 Sent 227233744009 bytes 166480158 pkt (dropped 0, overlimits 0 requeues 5062992) 
 backlog 0b 0p requeues 5062992
  maxpacket 12112 drop_overlimit 0 new_flow_count 1185940 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc noqueue 0: dev br-lan root refcnt 2 
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
qdisc noqueue 0: dev wlan1 root refcnt 2 
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
qdisc noqueue 0: dev wlan0 root refcnt 2 
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
qdisc noqueue 0: dev wlan0.sta1 root refcnt 2 
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0

As of approximately a week ago or less, I noticed that a dslreports speedtest maxes out both CPU cores (CPU0 is handling the 5 GHz radio and CPU1 is handling eth0). I have a 50M/10M VDSL connection, and htop -d 1 reports both cores at 100%, almost all of it softirq (sirq) processing.
I wish there were an explanation for this.

UPDATE: I do use SQM.

Thanks, no sign of any traffic shaper, so most likely no interference from QoS.

At 50/10 your R7800 should be comfortably traffic shaping without maxing out the CPU(s), so this looks like a regression of some sort.

Have you previously run 18.06.2 or 18.06.1 on either of your R7800s? If you haven't, it might be worth trying either version to see whether there might in fact be a regression with 18.06.4 on the R7800 as @moeller0 wonders...

Take a look here. The plain vanilla 19.07/R7800 build with that script, software offload enabled, and no SQM can easily do a single-stream 700..800 Mbps NAT-ed (LAN <--> WAN, iperf3), as per my test just a couple of days ago. Wired, not wireless. hnyman's build will do just fine, as that build does not make any code changes.
If that does not help and nothing else is hogging the CPU (can you post a screenshot of top -d 1 or htop -d 1?), then it is either a faulty router, bad wiring, or one of the computers is not fast enough...

UPDATE: the tx-usecs changes get lost after every change via LuCI, so a reboot is required to get those back. Can be done via hotplug.d, but I never cared to optimize that part.
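A minimal hotplug sketch (the filename and the ifup check are my assumptions; the original poster never implemented this) that re-applies the coalescing settings whenever an interface comes up:

# /etc/hotplug.d/iface/99-txusecs  (hypothetical filename)
[ "$ACTION" = "ifup" ] || exit 0
ethtool -C eth0 tx-usecs 0 rx-usecs 31 2>/dev/null
ethtool -C eth1 tx-usecs 0 rx-usecs 31 2>/dev/null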

Yeah, and it was caused by me. So, a false alarm.


I've tried to put together performance-related tweaks:

  1. On-demand governor
# https://forum.openwrt.org/t/netgear-r7800-exploration-ipq8065-qca9984/285/1659                                                 
# https://forum.openwrt.org/t/netgear-r7800-exploration-ipq8065-qca9984/285/1661
#echo 800000 > /sys/devices/system/cpu/cpufreq/policy0/scaling_min_freq
#echo 800000 > /sys/devices/system/cpu/cpufreq/policy1/scaling_min_freq
#sleep 1
echo ondemand > /sys/devices/system/cpu/cpufreq/policy0/scaling_governor
echo ondemand > /sys/devices/system/cpu/cpufreq/policy1/scaling_governor
# https://forum.lede-project.org/t/r7800-performance/15780/5
#echo 35 > /sys/devices/system/cpu/cpufreq/ondemand/up_threshold        
#echo 10 > /sys/devices/system/cpu/cpufreq/ondemand/sampling_down_factor
  2. Performance governor
# https://forum.openwrt.org/t/netgear-r7800-exploration-ipq8065-qca9984/285/1442
echo performance > /sys/devices/system/cpu/cpufreq/policy0/scaling_governor
echo performance > /sys/devices/system/cpu/cpufreq/policy1/scaling_governor
echo 800000 > /sys/devices/system/cpu/cpufreq/policy0/scaling_max_freq
echo 800000 > /sys/devices/system/cpu/cpufreq/policy1/scaling_max_freq
sleep 1                                                                
echo 1725000 > /sys/devices/system/cpu/cpufreq/policy0/scaling_max_freq
echo 1725000 > /sys/devices/system/cpu/cpufreq/policy1/scaling_max_freq
  3. KONG changes
# https://www.desipro.de/openwrt/sources/startup
# https://forum.openwrt.org/t/r7800-cache-scaling-issue/44187/20
echo 800000 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq
echo 800000 > /sys/devices/system/cpu/cpu1/cpufreq/scaling_min_freq
echo 20 > /sys/devices/system/cpu/cpufreq/ondemand/up_threshold

#utilize both cpu cores for network processing
for file in /sys/class/net/*
do
	echo 3 > "$file/queues/rx-0/rps_cpus"
	echo 3 > "$file/queues/tx-0/xps_cpus"
done

Common

# https://gist.github.com/fantom-x/629fac1e82639979ae7fa02cb3c6d0b4

echo 1 > /proc/sys/net/ipv6/conf/all/disable_ipv6
echo 1 > /proc/sys/net/ipv6/conf/default/disable_ipv6
echo 0 > /proc/sys/net/ipv6/conf/all/forwarding
echo 0 > /proc/sys/net/ipv6/conf/default/forwarding
/etc/init.d/odhcpd disable                           
/etc/init.d/odhcpd stop   

	# Disable SNMP MIBs on the switch
	swconfig dev switch0 set ar8xxx_mib_poll_interval 0

	# wifi0 - 5GHz                    
	echo 2 > /proc/irq/28/smp_affinity
	# wifi1 - 2GHz                    
	echo 1 > /proc/irq/29/smp_affinity
	# eth0 - WAN                      
	echo 1 > /proc/irq/31/smp_affinity
	# eth1 - LAN                      
	echo 2 > /proc/irq/32/smp_affinity

	# USB1 & USB2
	echo 2 > /proc/irq/105/smp_affinity
	echo 2 > /proc/irq/106/smp_affinity

	# These get lost after making any change via LuCI, so a reboot is required after every change
	ethtool -C eth0 tx-usecs 0
	ethtool -C eth1 tx-usecs 0 
	ethtool -C eth0 rx-usecs 31
	ethtool -C eth1 rx-usecs 31

	/etc/init.d/uhttpd restart                              
  
# There is no need for collectd to run above nice == 19
if ! grep -q "NICEPRIO=19" /etc/init.d/collectd; then
  sed -i 's/^NICEPRIO.*/NICEPRIO=19/g' /etc/init.d/collectd
  # Restart does not pick up the above change right away
  (sleep 300 ; /etc/init.d/collectd stop; sleep 15; /etc/init.d/collectd start) &
fi

# There is no need for uhttpd to run above nice == 19
if ! grep -q "nice -n 19" /etc/init.d/uhttpd; then
  sed -i "s/procd_set_param command/procd_set_param command nice -n 19/g" /etc/init.d/uhttpd
  # Restart does not pick up the above change right away
  (sleep 300 ; /etc/init.d/uhttpd stop; sleep 15; /etc/init.d/uhttpd start) &
fi

So far, the option with the governor configured for performance seems to be on par with the KONG changes (based on a qualitative assessment). Does it make sense to combine those two? Any other ideas?

Are these performance tweaks included in newer firmware, or do we still need to apply the changes manually?

AFAIK those aren't included. Feel free to test them.
Please note that, due to different IRQ mapping, some won't work with the 4.19 kernel.
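One way to sidestep the hard-coded numbers is to look the IRQs up by device name at boot; a hedged sketch (assuming the device name appears in the last column of /proc/interrupts, as it does on this target):

for dev in eth0 eth1; do
	irq=$(awk -v d="$dev" '$NF == d { sub(":", "", $1); print $1; exit }' /proc/interrupts)
	[ -n "$irq" ] && echo 3 > "/proc/irq/$irq/smp_affinity"   # 3 = either core may service it
done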

I just upgraded my home ISP (Frontier FiOS) from 100 Mbps up / 100 Mbps down to 1 Gbps up and 1 Gbps down. It appears that my R7800 running "OpenWrt 18.06.4 r7808-ef686b7292" is not able to handle such high speeds, as a typical speedtest.net run tops out at about 500 Mbps up and 500 Mbps down regardless of QoS state (for the most part). I have only tested wired CAT6 connections, as my file and web servers are on wired CAT6 connections.

Is this typical of this router? Are there any different firmware versions / tweaks that would help me get closer to 1 Gbps?

Did you try the offload feature on the firewall page?
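For reference, software flow offloading can also be enabled from the shell; a minimal sketch using the standard firewall UCI option on 18.06+ (equivalent to ticking the box on the firewall page):

uci set firewall.@defaults[0].flow_offloading='1'
uci commit firewall
/etc/init.d/firewall restart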

The ZyXEL Armor Z2/NBG6817 is basically the same hardware, so the same applies here:

You'll only get there using the OEM firmware (which can offload large parts of the routing in hardware (well, proprietary firmware) to the NSS/NPU cores).

If you're dealing with this kind of WAN throughput, you're looking at mvebu or x86_64 instead; the closer you get towards 1 GBit/s (and beyond), the more strongly the pendulum swings towards x86_64.

I ended up putting the below into my rc.local file and that did help a lot. Now I get about 500 Mbps up/down, still a far cry from 1 Gbps. I also enabled software offload, but I'm not sure that helped much. Either way, the most I can get from a basic speed test is about 500 Mbps up and down.

echo 35 > /sys/devices/system/cpu/cpufreq/ondemand/up_threshold
echo 10 > /sys/devices/system/cpu/cpufreq/ondemand/sampling_down_factor

echo performance > /sys/devices/system/cpu/cpufreq/policy0/scaling_governor
echo performance > /sys/devices/system/cpu/cpufreq/policy1/scaling_governor

echo 800000 > /sys/devices/system/cpu/cpufreq/policy0/scaling_max_freq
echo 800000 > /sys/devices/system/cpu/cpufreq/policy1/scaling_max_freq
sleep 1
echo 1725000 > /sys/devices/system/cpu/cpufreq/policy0/scaling_max_freq
echo 1725000 > /sys/devices/system/cpu/cpufreq/policy1/scaling_max_freq

ethtool -C eth0 tx-usecs 0
ethtool -C eth1 tx-usecs 0
ethtool -C eth0 rx-usecs 31
ethtool -C eth1 rx-usecs 31
