Netgear R7800 exploration (IPQ8065, QCA9984)

Aggressive ondemand settings are similar to the performance governor.

The performance governor keeps the CPU at max speed all the time, so it suits a scenario where you don't want your router to skip a beat.

I've run both. Lately I've been running the performance governor.

I think about it the other way: why not use the perf governor all the time, what is the harm? In my experience it adds about a 2°C temperature increase, which is really just noise. Some will say it uses more power, but that is also noise in the grand scheme of how much power a household uses (there may be use cases where low power use is a requirement, but I don't think those are common).

In the past there were issues with ondemand switching frequencies at low freq but I think those are fixed now. That's why you'd see people recommend using 800MHz as the lowest freq with ondemand.

I experimented with ondemand, schedutil and performance governor and chose to always use performance for ... max performance.
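
For anyone who wants to try the same, here is a minimal sketch of switching governors from the shell. It assumes the usual Linux cpufreq sysfs layout (one policyN directory per core under /sys/devices/system/cpu/cpufreq); set_governor and set_min_freq are hypothetical helper names, not part of OpenWrt. The sysfs root is passed in so the functions can be tried against a scratch directory first.

```shell
#!/bin/sh
# Hypothetical helpers: write a cpufreq setting to every policy directory
# found under the given sysfs root (normally /sys).

set_governor() {
    sysfs_root="$1"   # normally /sys
    governor="$2"     # e.g. performance, ondemand, schedutil
    for policy in "$sysfs_root"/devices/system/cpu/cpufreq/policy*; do
        # Skip if the glob matched nothing or the attribute is missing.
        [ -e "$policy/scaling_governor" ] || continue
        echo "$governor" > "$policy/scaling_governor"
    done
}

set_min_freq() {
    sysfs_root="$1"   # normally /sys
    khz="$2"          # cpufreq works in kHz, so 800MHz is 800000
    for policy in "$sysfs_root"/devices/system/cpu/cpufreq/policy*; do
        [ -e "$policy/scaling_min_freq" ] || continue
        echo "$khz" > "$policy/scaling_min_freq"
    done
}
```

On the router, something like `set_governor /sys performance` (for example from /etc/rc.local) would apply it at boot, and `set_min_freq /sys 800000` would set the 800MHz floor that people recommend for ondemand.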

This is for people building for R7800 by themselves:

Looks like commit fa731838 in master causes trouble for R7800.

A PC does not get a DHCP address over the wired connection, but wifi works ok and the PC gets an IP there.
Nothing obvious seems to be amiss.
The router seems to boot ok, ifconfig shows sensible stuff at first glance, all settings were restored, etc.

I reverted that commit for my master-r16678-cde31976e3-20210507 build, which made the R7800 work again.

EDIT:
fixed with

A bit of a rant, but still. Made a build using the current 21.02 branch and ath10k-ct-htt, no SQM or other bells and whistles.

Default performance of both NAT and WiFi is just horrible: one CPU core is at 100% while the other sits idle. I had to manually:

  1. install and enable irqbalance
  2. enable packet steering
  3. enable NAT software offload
  4. ramp up ondemand governor settings.

Only then could NAT performance barely saturate my 500Mbit/s ISP link.
WiFi reaches around 550-600Mbit/sec of TCP iperf3 on 867 PHY rate.
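
Steps 1 to 3 above can be sketched with UCI roughly like this. It assumes the option names as I understand them on 21.02 (irqbalance.irqbalance.enabled, network.globals.packet_steering, and flow_offloading in the firewall defaults section); check them against your own /etc/config files. enable_core_balancing is a hypothetical helper name, and the ondemand tuning from step 4 is left out because no specific values are given here.

```shell
#!/bin/sh
# Hypothetical helper: flip the three UCI switches and commit them.
# Assumes the irqbalance package is already installed so that its
# /etc/config/irqbalance file exists.
enable_core_balancing() {
    # 1. let irqbalance spread hardware IRQs across both cores
    uci set "irqbalance.irqbalance.enabled=1" &&
    # 2. packet steering: spread network receive processing over the cores
    uci set "network.globals.packet_steering=1" &&
    # 3. software flow offloading for NAT
    uci set "firewall.@defaults[0].flow_offloading=1" &&
    # write all pending changes to /etc/config
    uci commit
}
```

Install the package first (`opkg update && opkg install irqbalance`), run the function, then reboot or restart the affected services so the settings take effect.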

I mean, it's 2021. IRQ issues on multi-core OpenWrt platforms have been known for years, as well as (partial) solutions to them. Why, oh why can't these mitigations be enabled by default, so they reach the majority of OpenWrt users and give them at least average performance instead of a horrible one?

Even then, without SQM the R7800 should run circles around this and NAT a gigabit line without breaking a sweat, at 50% load or so.

The little MIPS CPU on the TP-Link WDR4300 could software-fastpath NAT at 700Mbit/s in OpenWrt years and years ago, so where is the bottleneck now?

Eagerly awaiting your patches, quick, I'm holding my breath.

--
The performance limits and peculiarities of ipq806x vs NSS/NPU offloading are well documented by now, please search for them.
Furthermore, don't compare apples to kiwis - hardware performance and out-of-tree offloading are not a fair comparison. Without SFE, the tl-wdr4300 barely achieves 150 MBit/s, while ipq8065 does 500 MBit/s, at its limits, but still.

Software offloading won’t get you to gig NAT. Try out a NSS build. Right out of the box the CPU barely breaks a sweat with gig NAT. :sunglasses:

Short answer: Qualcomm.

Long answer: There are still some patches that can improve performance sitting in Qualcomm's codeaurora repo, but they are now focusing more on ath11k.

Stock firmware can handle a gigabit line because it relies on the NSS cores to handle NAT and WiFi.

Unfortunately the NSS cores stuff is not upstreamable. Stuff like ath10k's encap/decap offload is upstreamable: they did send it upstream but abandoned it after a few revisions.

Actually, encap/decap offload is now merged upstream since ath11k uses it... only ath10k lacks support for it (the mac driver does support it... really minimal changes need to be done)... but more important is... DOES THE FIRMWARE SUPPORT THESE FEATURES?

Also, the original firmware doesn't have a shortcut for nss offload... only part of it, so what really limits wifi perf is the fact that the main cores are used for both wifi processing and packet processing...

@ACwifidude if you have some time to waste, could you consider doing some tests with wifi encap/decap offload to see whether it actually makes a difference to wifi perf?

I’m running @castiel652 encap offloading patch (the ct patch is not working currently). I have both ath10k and ath10k-ct builds.

I’ve seen a small performance improvement when they are working. If you have other offloading patches I’ll gladly try them out.

All the patches I’m running:

Can you measure the difference without any NSS-related patches? I suspect the ct build won't work, as the firmware is not compiled with that feature enabled.

Upstream ath11k only has encap offload.
They did send the ath11k and mac80211 decap offload patches, but those didn't get accepted.
Felix later made another implementation of mac80211 decap offload, which got merged.

I think they made encap/decap offload in the driver instead of the firmware.
I have a more up-to-date patch (encap offload only) in my repo if you want to try it.
Tested without the NSS cores, I can see a small improvement in an iPerf3 test.

The ct driver and firmware don't work with encap offload because of the modifications Ben made to the driver.

(just a reminder: the main problem of this target is that qcom doesn't try to merge patches upstream... or removes part of them as the series gets approved. So we can't really complain that this target has performance problems when we keep using qcom workarounds and never try to push a solution... As @slh said, if you complain, provide a patch, or at least some details about the problem and any solution you actually found, so someone else can propose or improve it)

One example...
the nss series can't be pushed upstream, BUT all the tweaks and the missing code for the clk and so on can, and maintaining fewer patches is always good.
Currently, for example, the 5.10 patches we have in openwrt master have all the needed changes EXCEPT the nss clk scaling and handling, which honestly looks like a hack and should be in a dedicated driver.
(when I have some time I will try to address that and the wifi offload, but for now I'm busy pushing qca8k patches)

Do you happen to have that comparison of master vs your offload branch handy or could you replicate it quickly? :grin:

I can build a copy but it’ll take a minute.

@Ansuel - comments from the upstream work claim up to 20% improvement. Personally I’ve seen more like up to 5-10%. I’d love to see if this type of offloading could be permanently merged down the road.

If done right, ath10k can't ignore a patch that improves perf by 5-10% without major changes...
The critical part is doing this in the most standard and clean way.

I think some good benchmarks showing a real difference would also help to get some attention from the reviewers.

With all due respect, that's just not true. The R7800 most certainly can push 900+ Mbit/s even in pure software mode, without any out-of-OpenWrt-tree drivers and patches, and without any extra compiler optimizations.
This being an exploration thread, here are the iperf3 results, made a few minutes ago.
Pumping 5 TCP threads through NAT. Freshly built current 21.02 branch, nothing else:

iperf3 -c 1gbit_server_behind_nat -t 60 -P 5 -R

latest 21.02 branch, 2021-05-10
===============

# all off
[SUM]   0.00-60.00  sec  2.75 GBytes   394 Mbits/sec                  receiver

# flow_offloading
[SUM]   0.00-60.00  sec  3.67 GBytes   525 Mbits/sec                  receiver

# packet_steering
[SUM]   0.00-60.00  sec  3.95 GBytes   565 Mbits/sec                  receiver

# irqbalance
[SUM]   0.00-60.00  sec  5.40 GBytes   773 Mbits/sec                  receiver

# packet_steering + irqbalance
[SUM]   0.00-60.00  sec  5.17 GBytes   740 Mbits/sec                  receiver

# packet_steering + flow_offloading
[SUM]   0.00-60.00  sec  5.42 GBytes   776 Mbits/sec                  receiver

# irqbalance + flow_offloading
[SUM]   0.00-60.00  sec  5.90 GBytes   845 Mbits/sec                  receiver

# irqbalance + flow_offloading + packet_steering
[SUM]   0.00-60.00  sec  6.46 GBytes   924 Mbits/sec                  receiver
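
If you want to replicate the table above, a small helper like this pulls the receiver rate out of iperf3's summary line. sum_mbits is a hypothetical name; it assumes the plain-text [SUM] line format shown in these results, with the rate reported in Mbits/sec.

```shell
#!/bin/sh
# Hypothetical helper: read iperf3 text output on stdin and print the
# receiver-side aggregate rate (the number in front of "Mbits/sec").
sum_mbits() {
    awk '/\[SUM\]/ && /receiver/ {
        for (i = 2; i <= NF; i++)
            if ($i == "Mbits/sec") print $(i - 1)
    }'
}
```

Usage would be along the lines of `iperf3 -c 1gbit_server_behind_nat -t 60 -P 5 -R | sum_mbits`, run once per configuration.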

The current default 21.02 configuration gives users an extremely poor experience, with no load distribution between cores at all. The majority of users won't bother to investigate; they will complain and switch to something else.

Right, I will gladly make a PR that installs and enables irqbalance for the ipq806x and ipq40xx targets. But it won't get accepted, right?

Don't have the result right now and I am working on something else.

I know that synthetic iperf3 benchmarks can achieve those results, but they don't really agree with real-world usage.

Disclaimer: I own and use an nbg6817, which in performance terms is close to identical to the r7800. I have irqbalance enabled and have also tested (and currently use) software flow-offloading, but I do see the nbg6817 getting close to the ceiling (hitting 90-100% core load at times) on my 430/220 MBit/s ftth connection (plain ethernet/dhcp, no PPPoE, no SQM). It still copes fine with this speed (as long as I stay away from SQM), and I concur with what others with faster internet connections have confirmed: it can do around 500 MBit/s without flow-offloading and maybe up to 650-660 MBit/s with flow-offloading, which matches my observations. I do not agree that it could do 1 GBit/s line-speed (~931 MBit/s) without NSS offloading, not in the field with NAT, firewalling, routing and concurrent traffic patterns to different targets, and even less once WLAN enters the picture[0] (which would be part of its expected duties).

--
[0] by that I mean WLAN active and used, not benchmarking wirelessly over the air.

In the spirit of exploration, I built and tested the current 19.07 branch. A shorter test, but still.

iperf3 -c 1gbit_server_behind_nat -t 60 -P 5 -R

latest 19.07 branch, 2021-05-10
===============

# all off
[SUM]   0.00-60.00  sec  4.47 GBytes   640 Mbits/sec                  receiver

# flow offloading
[SUM]   0.00-60.00  sec  6.28 GBytes   899 Mbits/sec                  receiver

So going from 19.07 to 21.02 there is something like a 40% NAT performance drop in the default OpenWrt configuration, on a more or less flagship router platform. That's what people see. Good for PR, right?

Digging a bit deeper, I'd say the most probable suspect is this commit: https://git.openwrt.org/?p=openwrt/openwrt.git;a=commit;h=d3868f15f876507db54afacdef22a7059011a54e
Packet steering was enabled by default in 19.07 and then disabled without providing a sensible default alternative, even though the commit message says "hey, we should probably use irqbalance instead".
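
To put numbers on the drop, here is the arithmetic on the results above as a small sketch (pct_drop is a hypothetical helper; the inputs are the 19.07 and 21.02 receiver rates from the two test runs).

```shell
#!/bin/sh
# Hypothetical helper: percentage drop going from an old rate to a new rate.
pct_drop() {
    awk -v old="$1" -v new="$2" \
        'BEGIN { printf "%.1f\n", (old - new) / old * 100 }'
}

pct_drop 640 394   # all off, 19.07 vs 21.02: prints 38.4
pct_drop 899 525   # flow offloading, 19.07 vs 21.02: prints 41.6
```

So "like 40%" holds for both the plain and the flow-offloading case.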

@Ansuel

Tested using an iPhone 7 (80MHz, 2x2) as the client and a hardwired computer as the server. No NSS. I made two ath10k builds (with and without encap offloading) and used @hnyman's build for the -ct build. All builds had matching driver and firmware (no mixing). Performance governor, irqbalance enabled, and global packet steering enabled. iperf3 settings = 5 streams & 30 seconds. Using @castiel652's encap offloading patch for ath10k. Encap offloading looks good!

ath10k with encap offloading:
[SUM]   0.00-30.01  sec  1.63 GBytes   468 Mbits/sec                  receiver
[SUM]   0.00-30.01  sec  1.50 GBytes   428 Mbits/sec  1717             sender

ath10k
[SUM]   0.00-30.01  sec  1.60 GBytes   459 Mbits/sec                  receiver
[SUM]   0.00-30.01  sec  1.14 GBytes   326 Mbits/sec  699             sender

ath10k-ct
[SUM]   0.00-30.01  sec  1.53 GBytes   437 Mbits/sec                  receiver
[SUM]   0.00-30.01  sec  1.21 GBytes   347 Mbits/sec  763             sender

Is the ath10k driver matched with the ath10k firmware or the ct variant?