Ipq806x NSS build (Netgear R7800 / TP-Link C2600 / Linksys EA8500)

All good. I'm on AEST. Enjoy dinner; your name looks very Spanish to me, so I guess it's dinner time there. :wink:

It doesn't look right to me, sadly. However, in this test scenario, where the AX210 connects to the WiFi as a client, it shouldn't matter, correct? It should be relying on the fq_codel implementation in the R7800.

On the rrul test:
A) Total potential bandwidth is halved (but the buffering not controlled by fq_codel stays the same). Both sides have buffering.

B) The solo download test from the R7800 was pretty good; perhaps it will get better with those patches.

Make sure you're using the 5.15-qsdk11-new-krait-cc branch.

It incorporates @amteza's patch.

@pattagghiu @asvio, if you're still getting PPPoE compile errors, can you open an issue on my nss-packages repo? I'd like to keep this thread focused on issues related to the functioning and stability of the NSS drivers.

I'd like you to run:
make target/clean

and then capture verbose output of the error with the following:
make package/{qca-nss-drv,qca-nss-clients,qca-nss-ecm}/{clean,compile} V=sc -j$(nproc)

My compilation has just failed with these errors:

ERROR: modpost: "nss_crypto_pm_notify_register" [/home/R7800-qosmio-5.15-qsdk11-new-krait-cc/build_dir/target-arm_cortex-a15+neon-vfpv4_musl_eabi/linux-ipq806x_generic/qca-nss-crypto-2021-03-20-2271a3a/v1.0/src/qca-nss-crypto.ko] undefined!
ERROR: modpost: "nss_crypto_notify_register" [/home/R7800-qosmio-5.15-qsdk11-new-krait-cc/build_dir/target-arm_cortex-a15+neon-vfpv4_musl_eabi/linux-ipq806x_generic/qca-nss-crypto-2021-03-20-2271a3a/v1.0/src/qca-nss-crypto.ko] undefined!
ERROR: modpost: "nss_crypto_pm_notify_unregister" [/home/R7800-qosmio-5.15-qsdk11-new-krait-cc/build_dir/target-arm_cortex-a15+neon-vfpv4_musl_eabi/linux-ipq806x_generic/qca-nss-crypto-2021-03-20-2271a3a/v1.0/src/qca-nss-crypto.ko] undefined!
ERROR: modpost: "nss_crypto_tx_msg" [/home/R7800-qosmio-5.15-qsdk11-new-krait-cc/build_dir/target-arm_cortex-a15+neon-vfpv4_musl_eabi/linux-ipq806x_generic/qca-nss-crypto-2021-03-20-2271a3a/v1.0/src/qca-nss-crypto.ko] undefined!
ERROR: modpost: "nss_crypto_data_register" [/home/R7800-qosmio-5.15-qsdk11-new-krait-cc/build_dir/target-arm_cortex-a15+neon-vfpv4_musl_eabi/linux-ipq806x_generic/qca-nss-crypto-2021-03-20-2271a3a/v1.0/src/qca-nss-crypto.ko] undefined!
ERROR: modpost: "nss_crypto_tx_buf" [/home/R7800-qosmio-5.15-qsdk11-new-krait-cc/build_dir/target-arm_cortex-a15+neon-vfpv4_musl_eabi/linux-ipq806x_generic/qca-nss-crypto-2021-03-20-2271a3a/v1.0/src/qca-nss-crypto.ko] undefined!
make[5]: *** [scripts/Makefile.modpost:133: /home/R7800-qosmio-5.15-qsdk11-new-krait-cc/build_dir/target-arm_cortex-a15+neon-vfpv4_musl_eabi/linux-ipq806x_generic/qca-nss-crypto-2021-03-20-2271a3a/Module.symvers] Error 1
make[5]: *** Deleting file '/home/R7800-qosmio-5.15-qsdk11-new-krait-cc/build_dir/target-arm_cortex-a15+neon-vfpv4_musl_eabi/linux-ipq806x_generic/qca-nss-crypto-2021-03-20-2271a3a/Module.symvers'
make[4]: *** [Makefile:1813: modules] Error 2
make[4]: Leaving directory '/home/R7800-qosmio-5.15-qsdk11-new-krait-cc/build_dir/target-arm_cortex-a15+neon-vfpv4_musl_eabi/linux-ipq806x_generic/linux-5.15.68'
make[3]: *** [Makefile:79: /home/R7800-qosmio-5.15-qsdk11-new-krait-cc/build_dir/target-arm_cortex-a15+neon-vfpv4_musl_eabi/linux-ipq806x_generic/qca-nss-crypto-2021-03-20-2271a3a/.built] Error 2
make[3]: Leaving directory '/home/R7800-qosmio-5.15-qsdk11-new-krait-cc/feeds/nss/qca-nss-crypto'
time: package/feeds/nss/qca-nss-crypto/compile#5.40#1.19#10.66
    ERROR: package/feeds/nss/qca-nss-crypto failed to build.
make[2]: *** [package/Makefile:116: package/feeds/nss/qca-nss-crypto/compile] Error 1
make[2]: Leaving directory '/home/R7800-qosmio-5.15-qsdk11-new-krait-cc'
make[1]: *** [package/Makefile:110: /home/R7800-qosmio-5.15-qsdk11-new-krait-cc/staging_dir/target-arm_cortex-a15+neon-vfpv4_musl_eabi/stamp/.package_compile] Error 2
make[1]: Leaving directory '/home/R7800-qosmio-5.15-qsdk11-new-krait-cc'
make: *** [/home/R7800-qosmio-5.15-qsdk11-new-krait-cc/include/toplevel.mk:231: world] Error 2

I saw that you applied the patches under package/kernel/mac80211. They should be applied under target/linux instead: they must live inside build_dir/target-*/linux-*/linux-*/patches/generic, so add them to your quilt series there.


@qosmio

I had no build problems with qca-nss-drv-pppoe enabled, using your latest commits yesterday as well as your OpenWrt master merge into 5.15-qsdk11-new-krait-cc today.

Unfortunately, even with this new 5.15-based build, I have still encountered random reboots due to RCU stalling, similar to what we observed with the NSS 22.03 and NSS master builds. Not everyone reported it, but at least four of us (Tishipp, Mpilon, D43m0n and I) saw it on the console or in remote syslog while running NSS 22.03 or master. The symptom of this RCU stalling is that the router seems to hang for about 30 seconds (WiFi also goes down) before it spontaneously reboots. 30 seconds is the default watchdog timeout in OpenWrt, so the reboot was most likely triggered by the watchdog timeout, which is why there was no ramoops crash dump.

Mpilon also reported a spontaneous reboot after 7 hours with his router running your 5.15-based build. Even though he did not get any syslog, I'm quite sure it was caused by the same RCU stalling issue. Mpilon and I are the two particularly unlucky ones who have run into this RCU stalling issue quite a few times lately.

Anyway, I have disabled irqbalance and am running the same 5.15 build again. Based on my previous encounters with recent NSS 22.03 and master builds, disabling irqbalance seemed to mitigate the RCU stall reboots to a large extent. As for clamping the Krait cores to a specific frequency, I intentionally did not do that.


@sppmaster

Running make clean && make download && make -j1 V=s fixes the build errors for me most of the time, unless some feeds, patches or commits are actually broken.



We are on the right track.


After the strange data I obtained, I have set up a native Linux machine (a somewhat old Q9550) connected by cable to serve as the test server. The laptop I am using as the client is the same, but from now on I will run the tests in the native Kubuntu I have installed.
The previous latency issues stemmed from using a virtual machine as the server, even though it was running on an i7-12700K with 16 threads assigned.

Here are the new tests.
WiFi 5, 160 MHz:

WiFi 5, 80 MHz:

WiFi 5, 20 MHz:

Yes sir, much better! Can you redo it, adding the -l 300 -s 0.5 parameters to the flent rrul_be test? And let's see how it behaves after running for a little while.

Is this with aql_txq_limit set at 2000 2000 and our suggested patches?

Yes, it is. The last test I did yesterday already included it. I haven't rebooted the router; it has been up for 22 hours so far.

I would need the full command; I'm a complete rookie at these things.

Apologies, here you go:

flent rrul_be --verbose -t 'rrul_be Kubuntu v22.03-ath10k-fixes WLAN ecn-off tx_burst-2 ts2time-8ms napi-pool-8 aql-2000ms' -H ubuntu-server -p all  --figure-width=12.80 --figure-height=9.60 -l 300 -s .05 -o rrul_be-Kubuntu-v22.03-ath10k-fixes-WLAN-ECN-of_tx_burst-2_ts2time-8ms-napi_poll-8-aql-2000ms-noaql.png

Based on your graphs, I assume you are running netserver and irtt on your ubuntu-server.

Why was your throughput so low in all these tests? Only around 30 Mbps upload / download, even with WiFi 5 at 160 MHz?

@vochong, WiFi is not a full-duplex connection; it is half-duplex, so you need to add both directions together to get the total throughput in a simultaneous download and upload test. So it looks pretty decent.

I was writing the post below when I saw yours:

"In previous tests I noticed that the speed was too low, in my opinion. I have done additional tests and realized that the switch the new server connects to had a problem and was only running at 100 Mbit/s."

I need to run new tests.

@amteza
@asvio

The R7800 is running a 5.15 image based on Qosmio's latest commits from today (which may include the incorrectly applied patches you mentioned). The netperf server is running on the same R7800.

The flent client is running on a very old Dell laptop (E7470) with 2x2 WiFi 5.

# cat /sys/kernel/debug/ieee80211/*/aql_txq_limit
AC	AQL limit low	AQL limit high
VO	2000		2000
VI	2000		2000
BE	2000		2000
BK	2000		2000

AC	AQL limit low	AQL limit high
VO	2000		2000
VI	2000		2000
BE	2000		2000
BK	2000		2000
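For anyone wanting to reproduce or double-check the "2000 2000" AQL setting, here is a minimal shell sketch, assuming debugfs is mounted at /sys/kernel/debug, the script runs as root, and mac80211's aql_txq_limit file accepts one "ac low high" triple per write (AC index 0=VO, 1=VI, 2=BE, 3=BK). The helper names set_aql and check_aql and the AQL_ROOT override are hypothetical, made up for this example; check_aql only parses the table format shown in the output above.

```shell
#!/bin/sh
# Sketch only: set and verify mac80211 AQL limits on every radio.
# Assumptions: debugfs mounted at /sys/kernel/debug, run as root.
# AQL_ROOT is a hypothetical override, e.g. for a dry run in a scratch dir.
AQL_ROOT=${AQL_ROOT:-/sys/kernel/debug/ieee80211}

# set_aql LOW HIGH
# Writes "<ac> <low> <high>" once per access category on each phy.
# mac80211 parses a single triple per write, so each AC gets its own echo.
# Limits are airtime values (assumed to be in microseconds).
set_aql() {
  for f in "$AQL_ROOT"/phy*/aql_txq_limit; do
    [ -e "$f" ] || continue   # glob matched nothing: no radios found
    for ac in 0 1 2 3; do     # 0=VO 1=VI 2=BE 3=BK
      echo "$ac $1 $2" > "$f"
    done
  done
}

# check_aql FILE LOW HIGH
# Succeeds only if FILE (the table printed by "cat aql_txq_limit")
# contains at least one AC row and every AC row shows LOW and HIGH.
check_aql() {
  awk -v lo="$2" -v hi="$3" '
    $1 ~ /^(VO|VI|BE|BK)$/ { n++; if ($2 != lo || $3 != hi) bad++ }
    END { exit (n == 0 || bad > 0) }
  ' "$1"
}
```

For example, set_aql 2000 2000 followed by check_aql /sys/kernel/debug/ieee80211/phy0/aql_txq_limit 2000 2000 && echo ok should confirm the first radio. Separate writes per AC are used because a single multi-line write would, under the stated assumption, only apply the first triple.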

Stray thoughts about some of these test results:

  1. You may want to first run these tests with 22.03 as a baseline for each of your test setups, and then run the same tests with 5.15, sundry patches ...

I think assumptions about what's OK or not are being made based on performance in general, not on how your particular test system is working.

  2. I recall reading (in the latency thread for developers) that most Intel WiFi adapters have unexplained stalls or stuttering, and the issue is in their firmware, so we can't fix it.

I had one such M.2 adapter and saw my WiFi fall on its face sometimes. The suggestion was to go with a MediaTek MT7921, as the 7922AX is difficult to source.

So is the 7921... I found one pulled from an HP laptop on eBay here in the US; the rest looked like knockoffs.

Anyway, I suggest you test your setups with known-good released code, then compare with this latest stuff.
M.

EDIT: here's the discussion about the MediaTek and Intel issues.



data

iperf3 after the test:

asvio@MSI-GS72-6QE:~$ date
vie 23 sep 2022 08:39:56 CEST
asvio@MSI-GS72-6QE:~$ iperf3 -c 192.168.1.208 -f M -i 60 -t 10 -P 1
Connecting to host 192.168.1.208, port 5201
[  5] local 192.168.1.10 port 37436 connected to 192.168.1.208 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-10.00  sec   733 MBytes  73.2 MBytes/sec    0   3.17 MBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   733 MBytes  73.2 MBytes/sec    0             sender
[  5]   0.00-10.00  sec   732 MBytes  73.2 MBytes/sec                  receiver

iperf Done.

Wow, I'm completely puzzled by your new graph. It makes no sense that it changed this much from your previous one. Was someone else using the network? What changed? It should be more in line with your previous one or @vochong's.