Netgear R7800 exploration (IPQ8065, QCA9984)

BTW I have irqbalance installed in both units, r6334 and r3498. I'll try to see if irqbalance is a factor to take into account, but the first test with r3498 yielded the same results.

Just want to add a data point -

I have the "newer" version of r7800 - the one with numbers etched on the antenna.

I tested with 4 concurrent ping sessions for ~15mins, and experienced no latency spike issues.

My r7800 is used in wireless client mode with masquerade and most of the firewall disabled.

I am using this build:
[ 0.000000] Linux version 4.9.82 (perus@ub1710) (gcc version 5.5.0 (OpenWrt GCC 5.5.0 r5953-d58c8f4029) ) #0 SMP Sat Mar 3 08:02:51 2018

@hnyman I am using your build
motd reports
OpenWrt SNAPSHOT, r6365-45fdb12258

dmesg 2nd line reports (gcc version 5.5.0 (OpenWrt GCC 5.5.0 r5953-d58c8f4029) ) #0 SMP Sat Mar 3 08:02:51 2018.

Could that be the difference? Mine is used as the main router with NAT and firewall enabled.

I think I need to run more tests and try to isolate ...

I think that @dissent1 got some weird results with irqbalance, so it might make sense to either adjust IRQs manually, or to run irqbalance in "oneshot mode" with option "-o", where irqbalance runs once and then exits.

I tried irqbalance and it made no difference. The only thing that did help to some degree was to isolate cpus: one for network interrupts and the other one for everything else.

Yep, with or without irqbalance the results are the same for me. One curious thing though: if I ping the WAN interface from and to my public IP, there are no spikes, but if I ping internally, between my wired LAN clients, the spikes become noticeable. It can go from 0.8ms to 43ms.

About irqbalance and the oneshot option: nice point, @hnyman. I've noticed that eth0 is assigned to core1 and eth1 to core2, wifi0 to core2 and wifi1 to core1. If I remember correctly, dissent pointed out that both eth interfaces should be assigned to the same core ... but I will test with this config to see if there is any noticeable performance increase, given the cache misses that @dissent1 pointed out could occur if irqbalance was always "on".

BTW if anybody wants to assign the IRQs manually, I've adapted @dissent1's script to the R7800 board. It can be executed after irqbalance --oneshot if the IRQs are not balanced as expected.

#!/bin/sh /etc/rc.common
# First start irqbalance with the --oneshot option
# Try to manually balance both eth and wifi0 to core2 if they are not balanced correctly
# Startup command for openwrt/lede
# /usr/sbin/irqbalance --oneshot --debug > /var/log/irqbalance.log

START=99

set_irq_affinity() {
	local name="$1"
	local val="$2"
	local irq

	# Look up the IRQ number for the given interface in /proc/interrupts.
	case "$name" in
	wifi0)
		irq=$(grep -E -m1 'qcom-pcie-msi' /proc/interrupts | cut -d: -f1 | tail -n1 | tr -d ' ')
		;;
	wifi1)
		irq=$(grep -E -m2 'qcom-pcie-msi' /proc/interrupts | cut -d: -f1 | tail -n1 | tr -d ' ')
		;;
	eth0)
		irq=$(grep -E -m3 'eth0' /proc/interrupts | cut -d: -f1 | tail -n1 | tr -d ' ')
		;;
	eth1)
		irq=$(grep -E -m3 'eth1' /proc/interrupts | cut -d: -f1 | tail -n1 | tr -d ' ')
		;;
	*)
		irq=$(grep -m 1 "$name" /proc/interrupts | cut -d: -f1 | sed 's, *,,')
		;;
	esac

	# Skip the write if no matching IRQ was found.
	[ -n "$irq" ] || { echo "$name irq not found."; return 1; }
	echo "$val" > "/proc/irq/$irq/smp_affinity"
}

start() {
	. /lib/functions.sh

	# Affinity mask 2 selects the second core (CPU1)
	set_irq_affinity eth0 2
	set_irq_affinity eth1 2
	set_irq_affinity wifi0 2
}
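For anyone wondering why the script passes 2: smp_affinity takes a hexadecimal CPU bitmask, where bit N enables CPU N. Below is a minimal sketch of that mapping, assuming "core2" in this thread is what the kernel calls CPU1. Both function names are made up for illustration, and the optional second argument of show_irq_affinity only exists so it can be tried against a copy of /proc/irq.

```shell
#!/bin/sh
# Convert a zero-based CPU index into the hex bitmask that
# /proc/irq/<n>/smp_affinity expects (bit N set = CPU N allowed).
cpu_to_mask() {
	printf '%x\n' $((1 << $1))
}

# Hypothetical helper: print the current affinity mask of one IRQ.
# $1 is the IRQ number, $2 an optional alternative to /proc/irq.
show_irq_affinity() {
	cat "${2:-/proc/irq}/$1/smp_affinity"
}
```

So cpu_to_mask 1 gives the 2 used in the script, and cpu_to_mask 0 gives 1. Reading the mask back with show_irq_affinity after running the script is a quick way to confirm the write was accepted.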

@luarane

Trying to compile your 4.14 branch for the C2600, it's failing to create the factory image with "os-image partition too big (more than 2097152 bytes): Undefined error: 0"

Any ideas?

Relevant output: https://pastebin.com/9gG97ins

Do you want to try my recipe (Netgear R7800 exploration (IPQ8065, QCA9984) - #903 by fantom-x)? It has helped me drop the spikes from 100ms down to around 20ms.


@dissent1's idea to move the network IRQs to CPU1 was the right one, but it is not complete without making sure that nothing else can hog CPU1 and thereby delay network interrupt processing. Isolating CPU1 for the exclusive use of eth0, eth1, and wifi0, and also making collectd, nlbwmon, and uhttpd run nicer (with nice -n 19), makes things so much better. I have been testing the latency for the last several hours during the daily peak usage and below is what I am getting now (my best ping is around 11ms and I ignore everything below 20ms). Not perfect, but much more usable. I have not tried VoIP yet, but the online games got better (compared to 50..100 ms pings several times a minute before the change).

2018-03-08 18:59:14 PING 8.8.8.8 (8.8.8.8): 56 data bytes
2018-03-08 19:01:03 64 bytes from 8.8.8.8: icmp_seq=108 ttl=60 time=20.031 ms
2018-03-08 19:01:45 64 bytes from 8.8.8.8: icmp_seq=150 ttl=60 time=21.197 ms
2018-03-08 19:03:21 64 bytes from 8.8.8.8: icmp_seq=246 ttl=60 time=21.607 ms
2018-03-08 19:04:22 64 bytes from 8.8.8.8: icmp_seq=307 ttl=60 time=21.162 ms
2018-03-08 19:05:43 64 bytes from 8.8.8.8: icmp_seq=388 ttl=60 time=21.669 ms
2018-03-08 19:05:53 64 bytes from 8.8.8.8: icmp_seq=398 ttl=60 time=20.957 ms
2018-03-08 19:06:51 64 bytes from 8.8.8.8: icmp_seq=456 ttl=60 time=20.281 ms
2018-03-08 19:07:23 64 bytes from 8.8.8.8: icmp_seq=488 ttl=60 time=21.706 ms
2018-03-08 19:11:42 64 bytes from 8.8.8.8: icmp_seq=746 ttl=60 time=21.359 ms
2018-03-08 19:13:08 64 bytes from 8.8.8.8: icmp_seq=832 ttl=60 time=23.206 ms
2018-03-08 19:13:09 64 bytes from 8.8.8.8: icmp_seq=833 ttl=60 time=20.857 ms
2018-03-08 19:13:36 64 bytes from 8.8.8.8: icmp_seq=860 ttl=60 time=21.929 ms
2018-03-08 19:15:08 64 bytes from 8.8.8.8: icmp_seq=952 ttl=60 time=25.217 ms
2018-03-08 19:15:35 64 bytes from 8.8.8.8: icmp_seq=979 ttl=60 time=22.741 ms
2018-03-08 19:15:55 
2018-03-08 19:15:55 --- 8.8.8.8 ping statistics ---
2018-03-08 19:15:55 1000 packets transmitted, 1000 packets received, 0.0% packet loss
2018-03-08 19:15:55 round-trip min/avg/max/stddev = 10.824/11.858/25.217/1.462 ms
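
The log above looks like ping output that was timestamped and filtered down to replies of 20 ms or more. How it was actually produced isn't stated, so the following is only a sketch of one way to get similar output. filter_ping is a made-up name, and the timestamp is taken when the line is read, which can lag slightly if ping buffers its output.

```shell
#!/bin/sh
# Prefix each line of ping output with a timestamp and drop replies that are
# faster than a threshold in ms (default 20). Header and summary lines have
# no "time=" field and pass through unchanged.
filter_ping() {
	limit="${1:-20}"
	while IFS= read -r line; do
		case "$line" in
		*time=*)
			ms=${line##*time=}   # e.g. "21.197 ms"
			ms=${ms%% *}         # e.g. "21.197"
			# sh arithmetic is integer-only, so let awk compare the floats.
			awk -v t="$ms" -v l="$limit" 'BEGIN { exit !(t >= l) }' || continue
			;;
		esac
		printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$line"
	done
}
```

Something like ping -c 1000 8.8.8.8 | filter_ping 20 would then show only the slow replies plus the final statistics.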

Will try this new approach and rebalance IRQs with a manual script.

It's the same problem I've observed for the other boards (d7800, r7500, r7500v2, r7800, vr2600v).
I missed yours because the KERNEL_SIZE variable is not declared for your board, and the tplink-safeloader utility is doing the size check and throwing the error (it's in your log).

Let's hope all those problems will go away once the target split is done (ipq40xx and ipq806x), which should reduce the size of the ipq806x kernel.

btw, last night I rebased my ipq806x-k4.14 branch against OpenWRT master, and it is now running on my Asus RT-AC58U, but I think the size problem remains for the other boards.

I don't know if there is a timeline for the split of the target, or if it will be split before the branch of the 18.0x release, maybe @mkresin or @blogic could answer that.

i am planning to push the ipq40xx split patch next week. this will move ipq40xx to v4.14
i have no plans to invest any time into ipq806x in the near future and it will remain on v4.9 for the time being.


Hi, rookie here. My ISP uses VLAN 500, and I'm currently using just one VLAN (VLAN 500) with CPU(0), CPU(1) and LAN 1 2 3 4 all untagged and WAN tagged. I found my ping time is about 2ms better than with the original 2 VLANs, which put CPU(1) to LAN 4 (all untagged) in VLAN 1 and CPU(0) untagged with WAN tagged in VLAN 500.

Will it cause any problems? I also wonder why it was split into 2 VLANs in the first place.

@fantom-x
Good job on solving the latency issue!
Would you mind sharing your "recipe" on how to get the latency-spikes down?
So us noobs can benefit as well? :blush:

Well, the key is to compile your own firmware with the custom kernel boot parameter isolcpus=1. That takes CPU1 out and the scheduler no longer uses it. Then use the script a few posts above to move the network IRQs to CPU1. I also lower the priority of things like collectd, nlbwmon, uhttpd, etc., which do not have any business running with default priority.
In the end CPU1 is used exclusively for the network interrupts while CPU0 is running everything else. I have not seen either overloaded yet.
The most difficult step is to compile your firmware. I can share mine if that helps.


Thank you for your summary! I know how to create my own image, thanks to @hnyman and @escalade. Also, I know the function of the set_cpu_affinity script.
Unfortunately, setting a custom kernel boot parameter and lowering the priority of collectd, nlbwmon, uhttpd is completely new to me. Can you please provide some details on how to do this?

That will be complicated if you're new to building firmware. The first thing I'd try is to start with this thread

hnyman made it easy to compile your own image but you also need to change a parameter (isolcpus) with make menuconfig command

To change the priority you need to use the nice command. I advise to change the service scripts in /etc/init.d/uhttpd , collectd, and nlbwmon

By memory:

make kernel_menuconfig

Then look for Boot Parameters and then there is an item to set kernel parameters. Type isolcpus=1 there and rebuild.

Once you install your image, run cat /proc/cmdline to see this new parameter.
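
If you'd rather script that check than eyeball it, here is a small sketch. isolated_cpus is a made-up name, and the optional file argument is only there so it can be tried on a saved copy of the command line instead of /proc/cmdline.

```shell
#!/bin/sh
# Print the value of isolcpus= from a kernel command line and return 0,
# or return 1 if the parameter is absent.
# $1 is optional and defaults to /proc/cmdline.
isolated_cpus() {
	# Unquoted on purpose: split the command line into words.
	for arg in $(cat "${1:-/proc/cmdline}"); do
		case "$arg" in
		isolcpus=*)
			echo "${arg#isolcpus=}"
			return 0
			;;
		esac
	done
	return 1
}
```

On a build with the parameter set as described, isolated_cpus should print 1; on a stock image something like isolated_cpus || echo "isolcpus not set" reports that it is missing.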

Deal with IRQ’s now.
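
To see whether the IRQs actually landed where you wanted, the per-CPU counters in /proc/interrupts can be compared. This is a rough sketch that assumes the two-column (CPU0, CPU1) layout of the R7800; irq_busiest_cpu is a made-up name. The counters are cumulative since boot, so an IRQ that was moved only recently may still show the old core as busiest.

```shell
#!/bin/sh
# For every /proc/interrupts line matching a pattern, print the IRQ number,
# the device name, and which of the two cores has handled more interrupts.
# $1 is an awk regex, $2 an optional alternative to /proc/interrupts.
irq_busiest_cpu() {
	awk -v pat="$1" '
		$0 ~ pat && $1 ~ /^[0-9]+:$/ {
			irq = $1
			sub(":", "", irq)
			cpu = ($3 > $2) ? "CPU1" : "CPU0"
			print irq, $NF, cpu
		}' "${2:-/proc/interrupts}"
}
```

For example, irq_busiest_cpu 'eth|qcom-pcie-msi' prints one line per network IRQ.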

Then you need to edit some startup files under /etc/init.d/ (collectd, nlbwmon, uhttpd) by adding nice -n 19 to the line that starts the processes. If you get in trouble there, I will post more details in a few hours once I get to my computer.
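
The edit can also be scripted. The sketch below assumes the init script starts its process with a single "procd_set_param command ..." line, which may not hold for every package, so check the output before replacing anything. add_nice is a made-up name; it writes the modified script to stdout and leaves the original untouched.

```shell
#!/bin/sh
# Prefix the command on a "procd_set_param command" line with "nice -n 19".
# Lines that already contain "nice -n" are left alone, so running it twice
# does not stack the prefix. $1 is the init script to read.
add_nice() {
	sed '/procd_set_param command/{/nice -n/!s/procd_set_param command /procd_set_param command nice -n 19 /;}' "$1"
}
```

A cautious way to use it: add_nice /etc/init.d/collectd > /tmp/collectd.new, compare the two files, and only then copy the new one back.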

Great, thanks for the detailed explanation! With this extra info I will manage to create a functioning build.

Some more details, as promised, are below. The procedure is manual, so extra attention is warranted. Any screw-ups are not my fault.

  1. Start with @hnyman's build and only continue once you can build and deploy it.

  2. Add a custom kernel boot parameter either via make kernel_menuconfig / Boot options / Default kernel command string : isolcpus=1 or by modifying this config file by adding one line:

grep isolcpus target/linux/ipq806x/config-4.9 
CONFIG_CMDLINE="isolcpus=1"
  3. Build and deploy the image, then check that the new config is active:

cat /proc/cmdline
isolcpus=1

  4. Move wifi0, eth0, and eth1 to CPU1 and verify that it actually worked. I leave wifi1 on CPU0 and I do not care much about the 2.4GHz clients. Verify that the numbers in the CPU1 column are increasing.
cat /proc/interrupts | egrep "eth|qcom-pcie-msi|CPU0"
           CPU0       CPU1       
 97:       8296   57872293     GIC-0  67 Edge      qcom-pcie-msi
 98:   16322413          0     GIC-0  89 Edge      qcom-pcie-msi
100:       1069   26137176     GIC-0 255 Level     eth0
101:        511    9101453     GIC-0 258 Level     eth1
  5. Add nice -n 19 to the services that should be running in the background. They will be running on CPU0, but they have absolutely no business running with default priority.
grep nice /etc/init.d/*
/etc/init.d/collectd:	procd_set_param command nice -n 19 /usr/sbin/collectd -f
/etc/init.d/nlbwmon:	procd_set_param command nice -n 19 "$PROG"
/etc/init.d/uhttpd:	procd_set_param command nice -n 19 "$UHTTPD_BIN" -f
  6. Restart these services or just reboot. Verify that the change took effect (look for SN, where N means nice, or use htop to check that they are running nicer):
ps -w | egrep "collectd|nlbwmon|uhttpd"
 1652 root      3256 SN   /usr/sbin/uhttpd
 1892 root      4104 SN   /usr/sbin/collectd -f
 2039 root      1460 SN   /usr/sbin/nlbwmon
  7. Stop all services that you do not use. Here is what I do, but you may have use for some of them.
/etc/init.d/etherwake disable
/etc/init.d/etherwake stop
/etc/init.d/miniupnpd disable
/etc/init.d/miniupnpd stop
/etc/init.d/odhcpd disable
/etc/init.d/odhcpd stop
/etc/init.d/vsftpd disable
/etc/init.d/vsftpd stop
  8. Reboot just in case

  9. Share your results. I for one am curious if this works for others.

  10. For the super adventurous among us, run the following lines and add them to /etc/rc.local. This CPU takes 100 us to switch frequencies, which is quite long.

echo "performance" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
echo "performance" > /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor
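
The two echo lines can also be written as a loop, which avoids hard-coding the CPU numbers. This is only a sketch: set_governor is a made-up name, and the optional second argument (an alternative sysfs root) is an assumption added so the function can be tried outside the router.

```shell
#!/bin/sh
# Set the same cpufreq governor on every CPU that exposes one.
# $1 is the governor name, $2 an optional alternative to
# /sys/devices/system/cpu.
set_governor() {
	gov="$1"
	root="${2:-/sys/devices/system/cpu}"
	for f in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
		# Skip CPUs without a writable governor file.
		[ -w "$f" ] || continue
		echo "$gov" > "$f"
	done
}
```

Calling set_governor performance from /etc/rc.local should have the same effect as the two lines above on a dual-core board.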
