Ipq806x NSS build (Netgear R7800 / TP-Link C2600 / Linksys EA8500)

Hello,
I tried starting from a clean 22.03 install, recreating all my configs with firewall4, and I can confirm that NSS is not working. I also have a pppoe connection, so maybe you are right and it depends on that.
I'll probably try with master next.


Hello,
I'm trying to build master (with some added packages).
I get this error, which I can't make sense of :slight_smile:, while building gdb.
Any ideas?
I also think one of my packages (probably adblock) is forcing iptables to be selected.
Thanks

It didn't even last 24 hours on 22.03 before it rebooted. It's a shame, but it must be what the colleagues say: that inaccessible NSS kernel driver has to be the cause of all our ills.

I'm not sure how to troubleshoot the pppoe issue (I don't have pppoe). I'm open to any suggestions for fixing the patch.

For reassurance regarding all the reported crashes: my router and two APs had 13 days of uptime each with the last master build (the timer reset because I updated). I run the default CPU settings, don't use any extra packages, and don't use any extra NSS features other than the hardware acceleration for NAT / wifi.

The kernel driver for NSS is open source; only the firmware is closed source. That is almost the same situation as with the wireless driver, with the exception that Candela Tech has access to the firmware code in order to produce the ct firmware.

Mine has never crashed, but I only use the NSS router as a secondary unit, so I can't really fix anything if I can't easily reproduce the crashes. The same applies to pppoe: I don't use pppoe on my NSS unit.


New firmware version 10.4-3.9.0.2-00157 is included in the latest master for ath10k.
Is this commit included in master? https://git.openwrt.org/?p=openwrt/staging/nbd.git;a=commit;h=a01864235ecddbaa56b0c998c0ccb3b1925a2d16

He rebased from the master @Date: Sat Jul 23 19:24:53 2022 -0500, so the actual commit 9f1d6223289b5571ddc77c0e5327ab51137199d9 was in the build. The link you mentioned was only the staging commit.

commit 9f1d6223289b5571ddc77c0e5327ab51137199d9
Author: Felix Fietkau <nbd@nbd.name>
Date:   Wed Jul 13 07:52:04 2022 +0200

    mac80211: fix AQL issue with multicast traffic
    
    Exclude multicast from pending AQL budget
    
    Signed-off-by: Felix Fietkau <nbd@nbd.name>

diff --git a/package/kernel/mac80211/patches/subsys/339-mac80211-exclude-multicast-packets-from-AQL-pending-.patch b/package/kernel/mac80211/patches/subsys/339-mac80211-exclude-multicast-packets-from-AQL-pending-.patch
new file mode 100644
index 0000000000..43c3e75d65
--- /dev/null
+++ b/package/kernel/mac80211/patches/subsys/339-mac80211-exclude-multicast-packets-from-AQL-pending-.patch
@@ -0,0 +1,30 @@
+From: Felix Fietkau <nbd@nbd.name>
+Date: Wed, 13 Jul 2022 07:32:26 +0200
+Subject: [PATCH] mac80211: exclude multicast packets from AQL pending airtime
+
+In AP mode, multicast traffic is handled very differently from normal traffic,
+especially if at least one client is in powersave mode.
+This means that multicast packets can be buffered a lot longer than normal
+unicast packets, and can eat up the AQL budget very quickly because of the low
+data rate.
+Along with the recent change to maintain a global PHY AQL limit, this can lead
+to significant latency spikes for unicast traffic.
+
+Since queueing multicast to hardware is currently not constrained by AQL limits
+anyway, let's just exclude it from the AQL pending airtime calculation entirely.
+
+Fixes: 8e4bac067105 ("wifi: mac80211: add a per-PHY AQL limit to improve fairness")
+Signed-off-by: Felix Fietkau <nbd@nbd.name>
+---
+
+--- a/net/mac80211/tx.c
++++ b/net/mac80211/tx.c
+@@ -3792,7 +3792,7 @@ begin:
+ encap_out:
+       IEEE80211_SKB_CB(skb)->control.vif = vif;
+ 
+-      if (vif &&
++      if (tx.sta &&
+           wiphy_ext_feature_isset(local->hw.wiphy, NL80211_EXT_FEATURE_AQL)) {
+               bool ampdu = txq->ac != IEEE80211_AC_VO;
+               u32 airtime;

Dmesg looks much cleaner these days after these ATF/RRS and multicast fixes. Before that, it was an unassailable potpourri of mac80211 and ath10k failure log spam :slight_smile:


Thanks to ACwifidude for the suggestion to remove CONFIG_ALL_KMODS=y from the diffconfig file. Before that, I struggled, searching Google for workarounds for seemingly endless build failures in libraries and packages. With that setting removed, the build worked well except for one final hiccup after the most recent feeds update, involving the wolfssl library. If anyone has a similar wolfssl build failure, add the option "--enable-fastmath" to CONFIGURE_ARGS in the wolfssl Makefile:

package/libs/wolfssl/Makefile
CONFIGURE_ARGS += \
	--enable-fastmath \

Also, for people using lots of kmod packages, please keep in mind that without "CONFIG_ALL_KMODS=y", only about 170 kmod packages will be generated, instead of the full list of about 920 of them. If you need a kmod package that is not available, you must build your own image and include it in your diffconfig.
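
As a rough sketch of the diffconfig edit described above, the following shell function drops the CONFIG_ALL_KMODS=y line and appends any kmod symbols you still need. The helper name prune_all_kmods and the symbol CONFIG_PACKAGE_kmod-example are hypothetical; only CONFIG_ALL_KMODS=y comes from this thread.

```shell
#!/bin/sh
# Sketch only: edit a diffconfig in place.
# prune_all_kmods is a hypothetical helper, not part of the OpenWrt build system.
prune_all_kmods() {
    cfg="$1"; shift
    # Drop the "build every kmod" switch; keep every other line untouched.
    grep -v '^CONFIG_ALL_KMODS=y$' "$cfg" > "$cfg.tmp" || true
    mv "$cfg.tmp" "$cfg"
    # Append each explicitly requested symbol, but only once.
    for sym in "$@"; do
        grep -qxF "$sym=y" "$cfg" || echo "$sym=y" >> "$cfg"
    done
}
```

Typical use would be something like prune_all_kmods diffconfig CONFIG_PACKAGE_kmod-example, followed by the usual step of copying the diffconfig to .config and running make defconfig.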


I didn't use git to put that patch in the correct directory, so it won't be part of the build. I'm still trying to understand how this build system works. The files themselves are present in the build root, since I assumed that after issuing a make command the directories are traversed and anything found in them gets included. But now I think I need to use git to get patches included?

I think I have found a way to do it in this post:

git remote add 22.03 https://github.com/openwrt/openwrt.git
git fetch 22.03
git cherry-pick <hash>
git cherry-pick …
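
To make the sequence above concrete, here is a minimal, self-contained sketch that runs the same remote add / fetch / cherry-pick steps against two throwaway repositories created in a temporary directory. The directory names, the branch name my-build, the commit messages and the git_q wrapper are all made up for the demo; in the thread the remote is the OpenWrt GitHub repository and the hash is a real upstream commit.

```shell
#!/bin/sh
# Sketch only: cherry-pick one commit from a second repository into a local branch.
work=$(mktemp -d)
# Supply a throwaway identity so commits work on a machine without git config.
git_q() { git -c user.name=demo -c user.email=demo@example.invalid "$@"; }

# "upstream" stands in for the repository that holds the fix.
git init -q "$work/upstream"
cd "$work/upstream"
echo base > file.txt
git_q add file.txt
git_q commit -q -m "base"
echo fix > fix.txt
git_q add fix.txt
git_q commit -q -m "the fix we want"
fix_hash=$(git rev-parse HEAD)

# "local" stands in for your build tree, on its own branch.
git init -q "$work/local"
cd "$work/local"
echo base > file.txt
git_q add file.txt
git_q commit -q -m "local base"
git_q checkout -q -b my-build

# Same three steps as in the post: add the remote, fetch it, pick the commit.
git remote add upstream "$work/upstream"
git fetch -q upstream
git_q cherry-pick "$fix_hash" >/dev/null
picked_subject=$(git log -1 --format=%s)
echo "$picked_subject"
```

The cherry-pick lands on whichever branch is checked out at that moment, so it is worth confirming the current branch with git status before picking.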

Kong's build includes two additional pppoe packages. I haven't tested his build yet, but could their absence be the reason that NSS doesn't accelerate pppoe? I'll try Kong's build next to see if it works.

rp-pppoe-common - 3.14-3
rp-pppoe-sniff - 3.14-3

@ACwifidude
You may want to remove bcp38 from the NSS master snapshot and 22.03 branches as this package still fully relies on iptables to configure the firewall behind the scene. With bcp38 disabled or enabled, you can try to manually "/etc/init.d/firewall restart" and you will see lots of warnings/errors due to bcp38. You may want to commit another "iptables further changes 2" :slight_smile: by removing "CONFIG_PACKAGE_luci-app-bcp38=y" from the diffconfig for the NSS master snapshot and 22.03.

With OpenWrt routers being used mostly as home routers with IP masquerade enabled (NAT overload in Cisco terminology) and a LAN with private IP addresses, I don't think bcp38 will help much, as any spoofed source IP address would always be changed to the public WAN IP address of the router.
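
Before editing the diffconfig, it can help to see which matching symbols are actually enabled. The sketch below lists enabled entries whose names contain a given string; the helper name list_enabled is hypothetical, and "bcp38" is simply the package discussed above.

```shell
#!/bin/sh
# Sketch only: list enabled diffconfig symbols whose names contain a string.
# list_enabled is a hypothetical helper name.
list_enabled() {
    cfg="$1"; pattern="$2"
    # Keep lines of the form CONFIG_...=y, then filter on the literal pattern.
    grep -E '^CONFIG_[^=]*=y$' "$cfg" | grep -F -- "$pattern" || true
}
```

For example, list_enabled diffconfig bcp38 would print CONFIG_PACKAGE_luci-app-bcp38=y if that line is still present, and nothing once it has been removed.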


@D43m0n: Once you fetch the remote repository, make sure that you switch to your 22.03 RC1 branch (git checkout branch-name) prior to issuing "git cherry-pick ec9f82fa18c7c8deb4875152d7907855d186f4c6" for the recent multicast AQL fix.

Thanks, I basically did that yesterday and let my NAS build overnight. I followed what @ACwifidude mentioned earlier this year by adding a remote git repo, fetching it, and then cherry picking commits.

I wasn't sure if that's correct after I read your reply just now, so I went on the Internet and found that what I did was at least one way to do it. You described another way to do it so that's great, I learned something again today :nerd_face:

This new build I created based on 22.03-RC1 but with the specific WiFi patches from Felix has been flashed to both R7800's 3 hours ago. The version number is different this time so I think I did it the correct way. So far WiFi feels snappier, but it could very well be my wishful thinking that wants it to be snappier :sweat_smile:

I have run a few bufferbloat tests on waveform's tool via WiFi and that seems promising. I have enabled NSS fq-codel on the router and left it disabled on the dumb AP. When both my wife and I work from home, we are connected to the dumb AP wired with our laptops and our iPhones are connected to it via WiFi. The next few days we will be our own guinea pigs for at least stability without reboots. So far the bufferbloat tests via WiFi seem promising because I get the same excellent results as I got with the 22.03-RC5 build with the WiFi fixes included.

In fact, it seems bufferbloat test results are a bit better via WiFi than via a wired connection :face_with_raised_eyebrow:

In a few test results, I got straight A results that are good enough for online gaming via WiFi. Now I don't see that often on my wired connection. For now I've got my fingers crossed :crossed_fingers:


Hope you can keep the wife happy :slight_smile:

Some good news from me too: soon after ACwifidude released his latest master snapshot last Saturday, I did a git pull and built another private image. So far it has run perfectly well without any random reboot for almost 4 days (I did not enable NSS fq_codel). I now prefer building from the master snapshot because it includes the latest board-2.bin and the latest mainline ath10k firmware version 157, so I don't have to add it as a custom file during the build process. The mainline ath10k firmware in the 22.03 branch is still the old version 131 from several years ago.


If anyone has a similar wolfssl build failure, add this setting "--enable-fastmath" in the wolfssl Makefile

package/libs/wolfssl/Makefile
CONFIGURE_ARGS += \
	--enable-fastmath \

There seems to be a discussion about using fastmath vs heapmath. This pending upstream patch fixed the issue for me, though: wolfssl: fix math library build #10317

How's the news after almost 5 days? Have you been tempted to enable NSS fq_codel yet to see what it does?

After roughly 2 days here I can say that this 22.03-RC1 build, with the WiFi fixes and NSS fq_codel enabled, is still running smoothly :crossed_fingers:

I must say I am tempted to build an image based on master too. I've done some buffer bloat tests via wired and wifi connections. Wifi seems to manage that better. But, when I use a different browser when testing buffer bloat via wired connection, I get great results too. If someone is running a Mac, try waveform's buffer bloat tool with Safari and Firefox.

Hello,
I can confirm that master does not accelerate pppoe either.
So I tried adding the two packages you reported.
I added the first (rp-pppoe-common), but could not find the second: when I search for the package in menuconfig, it says rp-pppoe-sniff is in network->dialup, but it is not there.
So I tried building with just the first package added (and nothing more), and it still does not accelerate.
I think we need help on this, at least to work out where to look to understand what's happening. @ACwifidude ? @quarky ? @KONG ?
Thanks :slight_smile:

In the logs I can only see this, but it does not seem related:

Sun Jul 24 00:25:08 2022 daemon.notice procd: /etc/rc.d/S19qca-nss-ecm: Failed to find shortcut-fe. Maybe it is a built in module ?
Sun Jul 24 00:25:08 2022 daemon.notice procd: /etc/rc.d/S19qca-nss-ecm: Failed to find shortcut-fe-ipv6. Maybe it is a built in module ?
Sun Jul 24 00:25:08 2022 daemon.notice procd: /etc/rc.d/S19qca-nss-ecm: Failed to find shortcut-fe-drv. Maybe it is a built in module ?
Sun Jul 24 00:25:08 2022 daemon.notice procd: /etc/rc.d/S19qca-nss-ecm: net.bridge.bridge-nf-call-ip6tables = 1
Sun Jul 24 00:25:08 2022 kern.info kernel: [   32.348221] ECM init
Sun Jul 24 00:25:08 2022 kern.info kernel: [   32.348274] ECM database jhash random seed: 0x4e1ecd94
Sun Jul 24 00:25:08 2022 kern.info kernel: [   32.350521] ECM init complete
Sun Jul 24 00:25:08 2022 daemon.notice procd: /etc/rc.d/S19qca-nss-ecm: net.bridge.bridge-nf-call-iptables = 1
Sun Jul 24 00:25:08 2022 daemon.notice procd: /etc/rc.d/S19qca-nss-ecm: dev.nss.general.redirect = 1

I've also noticed that my old config (working with 21) says

option device 'eth0'

whereas now with master I need

option device 'eth0.835'

I assume this has no impact at all, but I want to give all the info I can see.
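
For readers unsure where that option lives, the sketch below prints a UCI-style stanza for a PPPoE wan interface on a VLAN sub-interface. Only the device value eth0.835 comes from this post; the section name 'wan', the placeholder credentials and the helper name print_pppoe_wan are assumptions for illustration.

```shell
#!/bin/sh
# Sketch only: print a UCI-style stanza for PPPoE over a VLAN sub-interface.
# The section name 'wan' and the credentials are placeholders, not real values.
print_pppoe_wan() {
    dev="$1"
    cat <<EOF
config interface 'wan'
	option proto 'pppoe'
	option device '$dev'
	option username 'USERNAME'
	option password 'PASSWORD'
EOF
}

# eth0.835 means VLAN ID 835 tagged on top of eth0.
print_pppoe_wan 'eth0.835'
```

The output is meant to be compared against the wan section of /etc/config/network rather than pasted in blindly, since other options may already be set there.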


I doubt that rp-pppoe-common is required for NSS acceleration. I also need to set up a pppoe connection but don't have this particular package. I've got a fiber connection. I also need to select the correct VLAN to get the internet connection via pppoe working. NSS acceleration works here (on 22.03-RC1 and on 22.03-RC5). Although it's not stable on 22.03-RC5. I'm building a master image right now, takes a while and I need to find a time slot when I can flash it.

Have you tried Kong's 22.03 build, which has those packages, to see if it accelerates pppoe? That is the one test I still have left to do.