I am trying to wring maximum WiFi performance from my fleet of Archer C7 routers (configured as dumb AP's) running OpenWrt 19.07.1 r1091. I am using 80MHz wide channels @ 5GHz and clients are connected by highest speed, using proper 866.7 Mbit/s, 80MHz, VHT-MCS 9, VHT-NSS 2, Short GI.
However, I am never able to pass 330Mbit in iperf3 (measured to wired PC. I tried both directions, multiple streams etc.)
When I run top in router itself I can see that my sirq is hovering around 99% during transfer. Is my WiFi throughput capped by CPU or is there something else I can do to gain maximal performance (I am fully aware that it will be less than PHY 866Mbit)?
P.S. I have just replaced ath10k-firmware-qca988x-ct with ath10k-firmware-qca988x and throughput rose from ~320Mbit to ~370Mbit with sirq still pegged @ 99%.
Are you running iperf on the Archer C7 itself, or is the Archer C7 sitting between two computers? The second setup better represents reality, because in real life the Archer C7 is not generating traffic, it is forwarding it.
This is LAN-only bandwidth measured between an iPhone 8+ running the iperf3 server and an Ethernet-connected PC running the iperf3 client (I tested the other way around as well). There is no NAT/routing involved; the C7 only acts as a dumb AP. speedtest gives roughly the same speed. I have a Gbit connection and a wired x86 router, so it is basically 802.11ac that is the bottleneck here (first world problem, I know).
I upgraded from ancient netgear r6250’s for APs about 6 months ago (3x3 first gen 802.11ac) to r7800s when they were on sale. Gained on average 100-200mbps max wifi throughput and 200mbps wired line speed for gig wan. On sale it was worth it for me.
Wired line speed (using my C7 as combined edge-switch/AP) is close to 1Gbit, so that works for me. It is just 802.11ac that is pegging sirq so I cannot fully utilize 80MHz channel. Otherwise, everything works fine
OK, I will try to overclock one of my C7's to 1GHz and test throughput. For science!
BTW, looking at benchmarks on smallnetbuilder, it seems that the actual useful data rate of the majority of home routers on 802.11ac @ 866Mbit PHY tops out around 400Mbit, so I guess it is not bad after all.
First of all, set the performance governor. You should be able to do this. The ondemand one will create latency as the cpu needs to spool up to max. This script will let you set the governor or show you how to set it, whatever you choose. Unless you're using bash the script will not work as it uses a bash-specific regular expression comparison.
#!/bin/bash
CPUS=$(grep -c ^processor /proc/cpuinfo)
AVAILABLE_GOVERNORS=$(cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors)
show_governor()
{
    echo "available governors: $AVAILABLE_GOVERNORS"
    for i in $(seq 0 $(expr $CPUS - 1)); do
        echo "cpu$i: $(cat /sys/devices/system/cpu/cpu$i/cpufreq/scaling_governor)"
    done
}
case "$1" in
    "")
        show_governor
        exit 0;;
    *)
        ;;
esac
if [[ "$AVAILABLE_GOVERNORS" =~ "$1" ]]; then
    for i in $(seq 0 $(expr $CPUS - 1)); do
        echo $1 > /sys/devices/system/cpu/cpu$i/cpufreq/scaling_governor
    done
else
    echo "$(basename $0): unknown governor $1"
    echo "supply valid governor type or no arguments to show current available and current governors"
    exit 1
fi
Make sure you get the iperf direction right. To simulate download with the iperf server on the lan side and the client on the wlan side, the client needs the -R flag, otherwise the client sends data to the server (upload effectively) and then you're getting contention on the medium with the other clients. With the -R flag the server will send data to the wireless client and it's not contending for the medium with anything else, since the openwrt server itself schedules the download.
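As a rough sketch of the two directions described above: this assumes the iperf3 server runs on the wired PC and the client on the wireless device. The address 192.168.1.10, the 30 second duration and the helper name iperf_cmd are placeholders of mine, not something from this thread. The helper only prints the command line so you can check it before running it.

```shell
#!/bin/sh
# Hypothetical helper: prints the iperf3 client command for a given direction.
# "download" = server (wired) sends to client (wireless), which needs -R.
# "upload"   = client (wireless) sends to server (wired), the iperf3 default.
iperf_cmd() {
    direction=$1
    server=$2
    case "$direction" in
        download) echo "iperf3 -c $server -R -t 30" ;;
        upload)   echo "iperf3 -c $server -t 30" ;;
        *)        echo "usage: iperf_cmd download|upload <server-ip>" >&2; return 1 ;;
    esac
}

# On the wired PC:        iperf3 -s
# On the wireless client: run the command printed below.
iperf_cmd download 192.168.1.10
```

Running both directions one after the other and comparing the results is a quick way to see whether the limit is the same for TX and RX.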
The sirqs are almost certainly being generated by network TX or RX. You can check with:
cat /proc/softirqs
Do it before and after an iperf run, copy/paste the two outputs into an Excel sheet using the text import paste function and check the difference to see where they're being generated. Depending on the direction of your iperf transfer, most of the softirqs will likely be in either NET_TX or NET_RX.
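If you would rather not use a spreadsheet, the same before/after comparison can be sketched with awk, which should also be available on a stock OpenWrt busybox. The function name softirq_diff and the /tmp file names are made up for illustration. It sums each softirq type across all CPUs in two saved snapshots and prints how much each type grew.

```shell
#!/bin/sh
# Sketch: per-type softirq increase between two saved /proc/softirqs snapshots.
# softirq_diff is an illustrative name, not an existing tool.
softirq_diff() {
    awk '
        FNR == 1 { file++; next }   # first line of each snapshot is the CPU header
        {
            name = $1
            sub(":", "", name)      # "NET_RX:" becomes "NET_RX"
            total = 0
            for (i = 2; i <= NF; i++) total += $i   # sum over all CPU columns
            if (file == 1) before[name] = total
            else printf "%s %.0f\n", name, total - before[name]
        }
    ' "$1" "$2"
}

# Usage on the router:
#   cat /proc/softirqs > /tmp/before
#   (run the iperf3 test)
#   cat /proc/softirqs > /tmp/after
#   softirq_diff /tmp/before /tmp/after
```

The types with the largest increase are the ones eating the sirq time during the transfer.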
Once you've got the iperf direction flow right and you've checked the softirqs, you may get some performance enhancement if you tune the smp_affinity of the irqs on your network adapters. For example, if it is the NET_RX ones doing the interrupts, put half of the network adapter hardware irqs on one core and half on the other core to spread the load a bit - the kernel will likely do the softirq on the same core as the initial hardware irq for that flow.
You can see the hardware irqs in /proc/interrupts and you pin them to a core by echoing a hex mask to /proc/irq/<irq>/smp_affinity, where in a two core system the binary mask 11 or hex 3 means use both cores, 1 means use core 0 and 2 means use core 1. On an untuned system, you'll likely see most network card irqs cluster on core 0.
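To make the mask arithmetic concrete, here is a small sketch. cpu_mask and pin_irq are names I made up. cpu_mask turns a list of zero-based core numbers into the hex mask, and pin_irq writes that mask for an IRQ number that you have to look up in /proc/interrupts yourself (it needs root, and it is only defined here, not called).

```shell
#!/bin/sh
# cpu_mask: hex affinity mask for a list of zero-based core numbers.
cpu_mask() {
    mask=0
    for core in "$@"; do
        mask=$(( mask | (1 << core) ))   # set the bit for this core
    done
    printf '%x\n' "$mask"
}

# pin_irq: write the mask for the given cores to an IRQ's smp_affinity file.
# Hypothetical helper; the IRQ number must come from /proc/interrupts.
pin_irq() {
    irq=$1
    shift
    cpu_mask "$@" > "/proc/irq/$irq/smp_affinity"
}

cpu_mask 0      # prints 1 (core 0 only)
cpu_mask 1      # prints 2 (core 1 only)
cpu_mask 0 1    # prints 3 (both cores)
```

For example, pin_irq <irq> 1 would move that interrupt to core 1, with <irq> replaced by the number from /proc/interrupts.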
Are you using sqm scripts or some kind of qos? If so, this will definitely generate a whole lot more softirqs and you should turn it off.
EDIT: you're unlikely, even in a cpu-unbounded state, to get wireless performance much above 500 Mbit/s on a 2x2 MIMO client, no matter whether you use 80 MHz channels or not....
Most of my sirqs are NET_RX, so I am likely fully CPU-bound. No sqm is used, this is just a dumb Access Point.
Regarding tuning of core affinity: this is Archer C7. It is powered by single-core QCA9558 @ 720 MHz so there is not much to tune. There is no "scaling_governor" in /sys/devices/system/cpu/cpu0/cpufreq/ as there is only one core.
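For anyone else trying a governor script on similar hardware, a quick check like the one below shows whether there is anything to set before you start. This is a minimal sketch. has_cpufreq is a made-up name, and its optional argument only exists so the check can be pointed at a different directory for testing.

```shell
#!/bin/sh
# has_cpufreq: succeed only if cpu0 exposes a readable scaling_governor file.
# The optional argument overrides the sysfs CPU directory (handy for testing).
has_cpufreq() {
    cpu_dir=${1:-/sys/devices/system/cpu}
    [ -r "$cpu_dir/cpu0/cpufreq/scaling_governor" ]
}

if has_cpufreq; then
    echo "governor: $(cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor)"
else
    echo "no cpufreq node on this device, so there is no governor to set"
fi
echo "cores: $(grep -c ^processor /proc/cpuinfo)"
```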
OK, I have just overclocked it to 1GHz and it hit 511 Mbit/sec in iperf3 at roughly 90% sirq. So it seems that around 920MHz overclock is a sweet spot where CPU is no longer limiting factor (at least for 2x2 866 PHY ac). Everything above that will not make WiFi go faster.
That is 62MB/sec true wireless transfer rate on a 7-year-old router costing $40. Not bad.
Also, make sure to configure DDR CAS to 5 or router refuses to boot. I had it up to 1GHz w/o any issues but took it down to 920MHz in order to protect the CPU.
Next step is to find true 3x3 802.11ac client so I can check whether 1200Mbit PHY is also capped by CPU. If not, I would not need WiFi 6 for a long long time
Very interesting... I have had a C7 as my main router/wifi, now it's doing AP duty with an x86 router box. It used to be that the cpu idle time ran out due to SQM load at around 100-140mbit; that was my issue on a 300mbit link. I thought I could do better than that w/o SQM, but was limited by my basic ISP speed, which would cap out at 300-350. The router easily handled that over the wire, but seemed to run into a wall at 270-280mbit on 5ghz. I'm now seeing that again: idle 0%, sirq 95% over the 5ghz radio. BUT... I didn't see it at first since I was running at 40mhz, not 80mhz! Set to 40mhz bandwidth, I see a speed of 240-250mbit, but still 20% idle and 80% sirq or better. Hmmm...
The interesting note is that others and myself have been seeing the wifi stop, usually the 2.4ghz radio, under high traffic. It's been hard to figure out; it seems dependent on heavy traffic, and also somewhat dependent on the ath10k (5ghz) firmware/driver, though that didn't seem to make sense. Wondering if it's some kind of driver lockup provoked by running out of system resources? Have to pass this idea along...
Yes, I experienced it too. I tried everything, and my current conclusion is that Archer C7 2.4GHz is broken in OpenWrt (due to a bug in the driver blob?) and should not be used. I tried swapping out the ath10k drivers and it did not help (quite plausible, as ath10k is the 5GHz driver).
Yes, using stock clock you are CPU-bound ~330MBit with 80MHz channel on 5GHz. Actual maximum seems to be ~520Mbit (using WPA2 PSK CCMP, maybe more with less encryption).
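To put the numbers from this thread side by side, here is a small sketch that expresses measured throughput as a percentage of the 866.7 Mbit/s PHY rate. The helper name efficiency is made up; the input figures are the ones reported above.

```shell
#!/bin/sh
# efficiency: measured throughput as a percentage of the PHY rate (both in Mbit/s).
efficiency() {
    awk -v tput="$1" -v phy="$2" 'BEGIN { printf "%.0f\n", 100 * tput / phy }'
}

efficiency 330 866.7   # stock 720 MHz clock, prints 38
efficiency 511 866.7   # overclocked to 1 GHz, prints 59
```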