Netgear R7800 exploration (IPQ8065, QCA9984)

Software offloading won’t get you to gig NAT. Try out an NSS build. Right out of the box the CPU barely breaks a sweat with gig NAT. :sunglasses:

Short answer: Qualcomm.

Long answer: Some patches that could improve performance are still sitting in Qualcomm's codeaurora repo, but Qualcomm is now focusing more on ath11k.

Stock firmware can handle a gigabit line because it relies on the NSS cores to handle NAT and WiFi.

Unfortunately the NSS core code is not upstreamable. Things like ath10k's encap/decap offload are upstreamable: Qualcomm did send it upstream, but abandoned it after a few revisions.

Actually, the encap/decap offload is now merged upstream, as ath11k uses it... it's just ath10k that lacks support for it (the mac driver already supports it, so only really minimal changes are needed)... but the more important question is: DOES THE FIRMWARE SUPPORT THESE FEATURES?

Also, the original firmware doesn't have a shortcut for NSS offload... only part of it, so what really limits wifi perf is the fact that the main cores are used for both wifi processing and packet processing...

@ACwifidude if you have some time to waste, could you run some tests with wifi encap/decap offload and check whether it actually makes a difference to wifi perf?

I’m running @castiel652's encap offloading patch (the ct patch is not working currently). I have both ath10k and ath10k-ct builds.

I’ve seen a small performance improvement when they are working. If you have other offloading patches I’ll gladly try them out.

All the patches I’m running:

Can you measure the difference without any NSS-related patches? I suspect the ct build won't work, as the firmware is not compiled with that feature enabled.

Upstream ath11k only has encap offload.
They did send the ath11k and mac80211 decap offload patches, but they were not accepted.
Felix later made another implementation of mac80211 decap offload which got merged.

I think they implemented encap/decap offload in the driver instead of the firmware.
I have a more up-to-date patch (encap offload only) in my repo if you want to try it.
Tested without the NSS cores, I can see a small improvement in an iperf3 test.

The ct driver and firmware don't work with encap offload because of the modifications Ben made to the driver.

(Just a reminder: the main problem of this target is that qcom doesn't try to merge patches upstream... or drops part of them while the series is getting approved. So we can't really complain that this target has performance problems if we keep using qcom workarounds and never try to push a solution... As @slh said, file a complaint and provide a patch, or at least some details about the problem and whether you actually found a solution, so someone else can propose or improve on it.)

One example...
The NSS series can't be pushed upstream, BUT all the tweaks and the missing clk code can be, and maintaining fewer patches is always good.
For example, the 5.10 patches we currently have in openwrt master contain all the needed changes EXCEPT the NSS clk scaling and handling, which honestly looks like a hack and should be in a dedicated driver.
(When I have some time I will try to address that and the wifi offload, but for now I'm busy pushing qca8k patches.)

Do you happen to have that comparison of master vs your offload branch handy or could you replicate it quickly? :grin:

I can build a copy but it’ll take a minute.

@Ansuel - comments from the upstream work claim up to 20% improvement. Personally I’ve seen more like up to 5-10%. I’d love to see if this type of offloading could be permanently merged down the road.

If done right, the ath10k maintainers can't ignore a patch that improves perf by 5-10% without major changes...
The critical part is doing this in the most standard and clean way.

I think some good benchmarks showing a real difference would also help to get some attention from the reviewers.

With all due respect, that's just not true. The R7800 most certainly can push 900+ Mbit/s even in pure software mode, without any out-of-openwrt-tree drivers and patches, and without any extra compiler optimizations.
This being an exploration thread, here are the iperf3 results, taken a few minutes ago.
Pumping 5 parallel TCP streams through NAT. Freshly built current 21.02 branch, nothing else:

iperf3 -c 1gbit_server_behind_nat -t 60 -P 5 -R

latest 21.02 branch, 2021-05-10
===============

# all off
[SUM]   0.00-60.00  sec  2.75 GBytes   394 Mbits/sec                  receiver

# flow_offloading
[SUM]   0.00-60.00  sec  3.67 GBytes   525 Mbits/sec                  receiver

# packet_steering
[SUM]   0.00-60.00  sec  3.95 GBytes   565 Mbits/sec                  receiver

# irqbalance
[SUM]   0.00-60.00  sec  5.40 GBytes   773 Mbits/sec                  receiver

# packet_steering + irqbalance
[SUM]   0.00-60.00  sec  5.17 GBytes   740 Mbits/sec                  receiver

# packet_steering + flow_offloading
[SUM]   0.00-60.00  sec  5.42 GBytes   776 Mbits/sec                  receiver

# irqbalance + flow_offloading
[SUM]   0.00-60.00  sec  5.90 GBytes   845 Mbits/sec                  receiver

# irqbalance + flow_offloading + packet_steering
[SUM]   0.00-60.00  sec  6.46 GBytes   924 Mbits/sec                  receiver

The current default 21.02 configuration gives users an extremely poor experience: no load distribution between cores at all. The majority of users won't bother to investigate; they will complain and switch to something else.

Right, I will gladly make a PR that installs and enables irqbalance for the ipq806x and ipq40xx targets. But it won't get accepted, right?
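
To put the 21.02 numbers above in relative terms, here is a small sketch that pulls the bitrate out of each receiver-side [SUM] line and expresses it as a gain over the "all off" run. The lines are copied from the post; the names (RESULTS_21_02, mbits_per_sec, gains_over_baseline) are made up for illustration, and it assumes every line reports its rate in Mbits/sec.

```python
import re

# Receiver-side [SUM] lines copied from the 21.02 results in this thread,
# keyed by the configuration label used in the post.
RESULTS_21_02 = {
    "all off": "[SUM]   0.00-60.00  sec  2.75 GBytes   394 Mbits/sec  receiver",
    "flow_offloading": "[SUM]   0.00-60.00  sec  3.67 GBytes   525 Mbits/sec  receiver",
    "packet_steering": "[SUM]   0.00-60.00  sec  3.95 GBytes   565 Mbits/sec  receiver",
    "irqbalance": "[SUM]   0.00-60.00  sec  5.40 GBytes   773 Mbits/sec  receiver",
    "packet_steering + irqbalance": "[SUM]   0.00-60.00  sec  5.17 GBytes   740 Mbits/sec  receiver",
    "packet_steering + flow_offloading": "[SUM]   0.00-60.00  sec  5.42 GBytes   776 Mbits/sec  receiver",
    "irqbalance + flow_offloading": "[SUM]   0.00-60.00  sec  5.90 GBytes   845 Mbits/sec  receiver",
    "irqbalance + flow_offloading + packet_steering": "[SUM]   0.00-60.00  sec  6.46 GBytes   924 Mbits/sec  receiver",
}

# Matches the bitrate column of an iperf3 summary line, e.g. "394 Mbits/sec".
BITRATE_RE = re.compile(r"(\d+(?:\.\d+)?)\s+Mbits/sec")


def mbits_per_sec(line):
    """Return the Mbits/sec value from one iperf3 summary line."""
    match = BITRATE_RE.search(line)
    if match is None:
        raise ValueError(f"no Mbits/sec bitrate found in: {line!r}")
    return float(match.group(1))


def gains_over_baseline(results, baseline="all off"):
    """Percentage gain of every configuration relative to the baseline run."""
    base = mbits_per_sec(results[baseline])
    return {
        label: (mbits_per_sec(line) / base - 1.0) * 100.0
        for label, line in results.items()
    }


if __name__ == "__main__":
    for label, gain in gains_over_baseline(RESULTS_21_02).items():
        rate = mbits_per_sec(RESULTS_21_02[label])
        print(f"{label:<48} {rate:6.0f} Mbits/sec  {gain:+6.1f}%")
```

By this arithmetic irqbalance alone is roughly +96% over "all off" and all three options together roughly +135%. These are single 60-second runs, so treat the percentages as indicative only.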

Don't have the result right now and I am working on something else.

I know that synthetic iperf3 benchmarks can achieve those results, but they don't really agree with real-world usage.

Disclaimer: I own and use an nbg6817, which in performance terms is close to identical to the r7800. I have irqbalance enabled and have also tested (and currently use) software flow-offloading, but I do see the nbg6817 getting close to the ceiling (hitting 90-100% core load at times) on my 430/220 MBit/s ftth connection (plain ethernet/ dhcp, no PPPoE, no SQM). It still copes fine with this speed (as long as I stay away from SQM), and I concur with the gospel, confirmed by others with faster internet connections, that it can do around 500 MBit/s without flow-offloading and maybe up to 650-660 MBit/s with flow-offloading, as that matches my observations. I do not agree that it could do 1 GBit/s line-speed (~931 MBit/s) without NSS offloading, not in the field with NAT, firewalling, routing and concurrent traffic patterns to different targets, and even less once WLAN enters the picture[0] (which would be part of its expected duties).

--
[0] by that I mean WLAN active and used, not benchmarking wirelessly over the air.

In the spirit of exploration, I built and tested the current 19.07 branch. A shorter test, but still.

iperf3 -c 1gbit_server_behind_nat -t 60 -P 5 -R

latest 19.07 branch, 2021-05-10
===============

# all off
[SUM]   0.00-60.00  sec  4.47 GBytes   640 Mbits/sec                  receiver

# flow offloading
[SUM]   0.00-60.00  sec  6.28 GBytes   899 Mbits/sec                  receiver

So going from 19.07 to 21.02 there is something like a 40% NAT performance drop in the default OpenWrt configuration, on a more or less flagship router platform. That's what people see. Good for PR, right?

Digging a bit deeper, I'd say the most probable suspect is this commit: https://git.openwrt.org/?p=openwrt/openwrt.git;a=commit;h=d3868f15f876507db54afacdef22a7059011a54e
Packet steering was enabled by default in 19.07 and then disabled without providing a sensible default alternative, even though the commit message says "hey, we should probably use irqbalance instead".
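
As a quick check of the "like 40%" figure, the drop can be worked out from the receiver-side numbers in the 19.07 and 21.02 posts above. This is only a sketch: the dictionary and function names are invented for illustration, and it simply compares the two configurations that were tested on both branches.

```python
# Receiver-side Mbits/sec figures copied from the two posts in this thread.
NAT_MBITS = {
    "19.07": {"all off": 640, "flow offloading": 899},
    "21.02": {"all off": 394, "flow offloading": 525},
}


def percent_drop(old, new):
    """Relative drop from old to new, in percent of the old value."""
    return (old - new) / old * 100.0


def drops_between(results, old_release, new_release):
    """Percent drop per configuration when moving between two releases."""
    return {
        config: percent_drop(results[old_release][config], results[new_release][config])
        for config in results[old_release]
    }


if __name__ == "__main__":
    for config, drop in drops_between(NAT_MBITS, "19.07", "21.02").items():
        print(f"{config}: {drop:.1f}% lower on 21.02")
```

That works out to about 38% with everything off and about 42% with flow offloading, so "like 40%" is a fair summary of these particular runs.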

@Ansuel

Using an iPhone 7 (80 MHz, 2x2) and a hardwired computer as the server. No NSS. Made two ath10k builds (with encap offloading and without) and used @hnyman's build for the -ct build. All builds had matching driver and firmware (no mixing). Performance governor, irqbalance enabled, and global packet steering enabled. iperf3 settings = 5 streams & 30 seconds. Using @castiel652's encap offloading patch for ath10k. Encap offloading looks good!

ath10k with encap offloading:
[SUM]   0.00-30.01  sec  1.63 GBytes   468 Mbits/sec                  receiver
[SUM]   0.00-30.01  sec  1.50 GBytes   428 Mbits/sec  1717             sender

ath10k
[SUM]   0.00-30.01  sec  1.60 GBytes   459 Mbits/sec                  receiver
[SUM]   0.00-30.01  sec  1.14 GBytes   326 Mbits/sec  699             sender

ath10k-ct
[SUM]   0.00-30.01  sec  1.53 GBytes   437 Mbits/sec                  receiver
[SUM]   0.00-30.01  sec  1.21 GBytes   347 Mbits/sec  763             sender

Is the ath10k driver matched with the ath10k firmware or the ct variant?

Matched driver and firmware for each run. All runs use the latest encap offloading patch.

Here is what it looks like with matched ath10k + NSS (I think I have the old ath10k encap offloading patch on my NSS build so after I rebuild with the newer patch I might get more):

[SUM]   0.00-30.01  sec  1.65 GBytes   473 Mbits/sec                  receiver
[SUM]   0.00-30.01  sec  1.81 GBytes   517 Mbits/sec  784             sender


Funny how the NSS handling gives more or less 100 Mbit/s of extra bandwidth.
Did we ever try whether NSS core + wifi offload works (I mean without the custom gmac driver)?

Wifi uses the virtual interface feature, so I wonder if it actually works without the gmac offload part.

(I think NSS firmware + NSS core + wifi patch could be part of openwrt if they improve the wifi speed by 100 Mbit/s.)

Haven't tried that before. Let me know if you want to simplify or improve the patches to make it more friendly for mainstream integration. I'll gladly tweak and test.

The NSS driver is dependent on the NSS gmac driver. Splitting them looks interesting. Difficult or doable?

Actually nope... the NSS core (the drv package) can be loaded without the gmac package... what we need to understand is whether it can work without it (whether the gmac part is mandatory, or the NSS firmware can offload, for example, only wifi packets).

On the kmod side... they can be loaded separately; in fact you load nss-drv first and then nss-gmac without a problem.

A quick test would be to remove the gmac driver and check whether the wifi offload still works (but you need to enable the old gmac binding to make the master driver work).