Ipq806x NSS build (Netgear R7800 / TP-Link C2600 / Linksys EA8500)

How exactly do we apply that patch? Do we have to edit a file before compiling?

I must say that with the work that has collectively been done so far by all involved it already "feels" pretty darn good. Love the commitment by the way.

Are there particular points of measurement you're interested in? Or would you settle for our praises and high spirits?

Which NSS build is currently more stable, version 22 or version 21?

@pattagghiu , @D43m0n

Correct, you can simply add my NSS repo to your feeds.conf. My repository has a copy of it already. You can also use the base r7800-diffconfig config as a starting point.

And then just the usual prepping commands:

cp r7800-diffconfig .config
make menuconfig

NAPI_POLL_WAIT to 8, from its present 64

Complete shot in the dark, but is it the following? I was unable to find any variable like that in the backports package or in any existing patches.

--- a/drivers/net/wireless/ath/ath10k/core.h
+++ b/drivers/net/wireless/ath/ath10k/core.h
@@ -67,7 +67,7 @@
 #define ATH10K_KEEPALIVE_MAX_UNRESPONSIVE 3900

 /* NAPI poll budget */
-#define ATH10K_NAPI_BUDGET      64
+#define ATH10K_NAPI_BUDGET      8

I was unaware that ath10k was not inheriting NAPI_POLL_WEIGHT! Good find! Try that!

Explains a lot. Our original work on the ath10k in 2016 had no NAPI support in it at all. IMHO it simply can't interrupt often enough for it to matter; all NAPI does is bulk up operations for far, far too long... but it got added anyway by someone else much later, for cargo cult reasons.

So there's a separate patch for NAPI_POLL_WEIGHT, which will service the ethernet driver more often, @amteza?

I too am very happy and relieved that all of us, working together, have made this release of OpenWrt the best ever. Now what? :slight_smile:

Abstractly, my goal, oft expressed, is to make wifi capable of FPS gaming, and one day cloud gaming, even with several other heavy users on the link. We've got new benchmarks now, like Apple's "responsiveness" test and iperf's new "bounceback" test, that can show the minimum possible latency a given wifi link can achieve, which was about 21000 RPM, or 350 RPS, with no load. Until a few patches for the mt76 went by on the other thread, we were seeing 350 RPM under load in many scenarios, such as mesh, and ath10k is still doing poorly here.

In an ideal pre-wifi7 environment, getting 100 RPS under load seems achievable, if we rip ever more latencies out.
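
To put those RPM and RPS figures in milliseconds, here is a rough conversion sketch. It assumes RPM simply means round trips completed per minute, issued back to back, so the average round-trip time is 60 s divided by the score. The function names are made up for illustration and are not part of any benchmark tool.

```python
def rpm_to_rps(rpm: float) -> float:
    """Convert round trips per minute to round trips per second."""
    return rpm / 60.0


def round_trip_ms(rpm: float) -> float:
    """Average time per round trip in milliseconds for a given RPM score.

    Assumption: round trips are issued back to back, so 60 000 ms / RPM.
    """
    return 60_000.0 / rpm


if __name__ == "__main__":
    cases = [
        ("no load", 21000),            # figure quoted in this post
        ("under load", 350),           # figure quoted in this post
        ("100 RPS target", 100 * 60),  # the goal mentioned above
    ]
    for label, rpm in cases:
        print(f"{label}: {rpm} RPM = {rpm_to_rps(rpm):.2f} RPS, "
              f"~{round_trip_ms(rpm):.2f} ms per round trip")
```

By this arithmetic the unloaded link is at roughly 3 ms per round trip, the loaded one at roughly 171 ms, and the 100 RPS target corresponds to 10 ms.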

As for wanting to try a few patches on the NSS build, you have plenty of cpu left over due to all the offloads, which makes it easier to isolate wtf the ath10k is doing so badly. Maybe.

Here you go:

--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2440,7 +2440,7 @@ static inline void *netdev_priv(const st
 /* Default NAPI poll() weight
  * Device drivers are strongly advised to not use bigger value
  */
-#define NAPI_POLL_WEIGHT 64
+#define NAPI_POLL_WEIGHT 8

 /**
  *	netif_napi_add - initialize a NAPI context

It might be interesting going with this one too:

--- a/net/mac80211/sta_info.c
+++ b/net/mac80211/sta_info.c
@@ -514,7 +514,7 @@ struct sta_info *sta_info_alloc(struct i
 	sta->sta.max_rc_amsdu_len = IEEE80211_MAX_MPDU_LEN_HT_BA;

 	sta->cparams.ce_threshold = CODEL_DISABLED_THRESHOLD;
-	sta->cparams.target = MS2TIME(20);
+	sta->cparams.target = MS2TIME(8);
 	sta->cparams.interval = MS2TIME(100);
 	sta->cparams.ecn = true;

@@ -2548,15 +2548,9 @@ static void sta_update_codel_params(stru
 	if (!sta->sdata->local->ops->wake_tx_queue)
 		return;

-	if (thr && thr < STA_SLOW_THRESHOLD * sta->local->num_sta) {
-		sta->cparams.target = MS2TIME(50);
-		sta->cparams.interval = MS2TIME(300);
-		sta->cparams.ecn = false;
-	} else {
-		sta->cparams.target = MS2TIME(20);
-		sta->cparams.interval = MS2TIME(100);
-		sta->cparams.ecn = true;
-	}
+	sta->cparams.target = MS2TIME(8);
+	sta->cparams.interval = MS2TIME(100);
+	sta->cparams.ecn = true;
 }

 void ieee80211_sta_set_expected_throughput(struct ieee80211_sta *pubsta,
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -1564,7 +1564,7 @@ int ieee80211_txq_setup_flows(struct iee

 	codel_params_init(&local->cparams);
 	local->cparams.interval = MS2TIME(100);
-	local->cparams.target = MS2TIME(20);
+	local->cparams.target = MS2TIME(8);
 	local->cparams.ecn = true;

 	local->cvars = kcalloc(fq->flows_cnt, sizeof(local->cvars[0]),
--- a/include/net/cfg80211.h
+++ b/include/net/cfg80211.h
@@ -2842,11 +2842,11 @@ enum wiphy_params_flags {
 #define IEEE80211_DEFAULT_AIRTIME_WEIGHT	256

 /* The per TXQ device queue limit in airtime */
-#define IEEE80211_DEFAULT_AQL_TXQ_LIMIT_L	5000
-#define IEEE80211_DEFAULT_AQL_TXQ_LIMIT_H	12000
+#define IEEE80211_DEFAULT_AQL_TXQ_LIMIT_L	2000
+#define IEEE80211_DEFAULT_AQL_TXQ_LIMIT_H	4000

 /* The per interface airtime threshold to switch to lower queue limit */
-#define IEEE80211_AQL_THRESHOLD			24000
+#define IEEE80211_AQL_THRESHOLD			8000

 /**
  * struct cfg80211_pmksa - PMK Security Association

If you use the performance CPU governor on 22.03, or clamp the CPU frequency so your router always runs at the same frequency, there's really no difference. There have been some WiFi improvements to 22.03 recently, so give that a shot. But remember to lock the CPU frequency to a specific value.

Here's an update of recent commits for 22.03.

And here's one for master.

Specifically some mac80211 fixes have been committed.

Current 22.03 and master with kernel 5.10 stay stable for over a week when the CPU is clamped to one specific frequency. At the moment I'm testing kernel 5.15 with patches from @Ansuel for stability, with my R7800's CPUs free to roam between 600MHz and 1725MHz; this build is without NSS. After a week of testing and reporting back, I'll give it a go with @qosmio 's work on kernel 5.15 with NSS. Hopefully Ansuel's PR will be accepted into master by then, otherwise I'll add that to my private build. If someone could get the patches from @amteza quoted above (the NAPI_POLL_WEIGHT change and the mac80211 CoDel/AQL changes) into a GitHub repository, I'll add those too.

@dtaht @qosmio, my understanding is that NAPI_BUDGET is used to control the rx path. I don't think it controls the tx path, which is what is causing all the latency. The NAPI budget, if my understanding is correct, controls how many packets to clear from the device's receive queue and push into the Linux stack. For bursty received traffic, a higher NAPI_BUDGET value will help.

Happy to be corrected tho, if my understanding is wrong.
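
As a toy model of what I mean (not kernel code, just an illustration): each poll round hands at most `budget` packets from the receive queue to the stack, so the budget decides how many rounds a burst needs. The function name and the burst size of 256 are made up for the example.

```python
from collections import deque


def drain_with_budget(rx_queue: deque, budget: int) -> list:
    """Drain rx_queue in poll rounds of at most `budget` packets each.

    Returns the number of packets handled in each round. This is a
    simplified model: real NAPI also re-enables interrupts when a round
    handles fewer packets than the budget.
    """
    if budget <= 0:
        raise ValueError("budget must be positive")
    batches = []
    while rx_queue:
        handled = 0
        while rx_queue and handled < budget:
            rx_queue.popleft()  # hand one packet up to the stack
            handled += 1
        batches.append(handled)
    return batches


if __name__ == "__main__":
    burst = 256  # hypothetical burst size, for illustration only
    for budget in (64, 8):
        rounds = drain_with_budget(deque(range(burst)), budget)
        print(f"budget={budget}: {len(rounds)} poll rounds for {burst} packets")
```

In this model a budget of 64 clears the burst in 4 rounds and a budget of 8 needs 32, which is why I'd expect the larger value to suit bursty receive traffic.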

I have been toying with the AQL low/high and threshold limits recently to see if that helps with latency (on my R7800, running 21.02 with the 5.4.211 kernel and NSS acceleration), although my test case is how often I see slow responses when playing online games ... haha. I think the existing AQL limits are set too high, as those are per queue, and each associated station can have up to 16 queues assigned. If each queue takes up to 5ms of airtime, this will cause havoc with latency.

Testing 500us/1000us low/high limits per queue, and 2000us for threshold. Doesn't look like it has much benefit tho. Maybe have to go lower.
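
The back-of-the-envelope arithmetic behind that worry, as a small sketch. It assumes the worst case where all 16 queues of a station are backlogged at once, each up to its per-queue low limit, and it ignores the threshold that switches between the low and high limits. The function name is made up for illustration.

```python
def worst_case_pending_airtime_us(per_queue_limit_us: int,
                                  queues_per_station: int = 16,
                                  stations: int = 1) -> int:
    """Upper bound on queued airtime if every queue is filled to its limit.

    Assumption: all queues are backlogged simultaneously, which is a
    pessimistic case rather than typical behaviour.
    """
    return per_queue_limit_us * queues_per_station * stations


if __name__ == "__main__":
    for label, limit_us in [("default low limit", 5000),
                            ("low limit under test", 500)]:
        total_us = worst_case_pending_airtime_us(limit_us)
        print(f"{label}: {limit_us} us x 16 queues = {total_us} us "
              f"({total_us / 1000:.0f} ms)")
```

With the default 5000us low limit that bound is 80 ms for a single station; with 500us it drops to 8 ms.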

I will do it, give me a few hours, unless you are in a hurry. Then, I will upload them to mine.

So basically I can clone your repository directly and build?
It would be nice to also add @Ansuel 's latest patch for CPU scaling; have you added it already?
I'll give it a try, I just need to add my packages to .config :slight_smile:

Not in a hurry, I'm testing the CPU frequency patches first. I want to do that for at least a week to get some sense of the effect on stability. My network has a few dumb APs and always a few IoT devices connected. Some phone home (Philips Hue, Somfy) and some send frequent MQTT messages over WiFi. Some clients regularly send backups to cloud destinations. I also have IPTV, so multicast is sent over my network as well. Enough devices, I think, to keep my R7800 "busy", so if this is stable without unexpected reboots for a week then I'll report to @Ansuel and move on to testing more improvements :nerd_face: :+1:

Question for @qosmio : I see the package CONFIG_PACKAGE_MAC80211_NSS_SUPPORT=y is not present in your code. Are we missing something, or is that correct? I think this is the package that enables NSS acceleration on wifi... am I wrong?
Thanks!

Edit to add: I also get this warning when building:

WARNING: Makefile 'package/feeds/nss/qca-nss-clients/Makefile' has a dependency on 'kmod-qca-ssdk-nohnat', which does not exist

Second edit:
I also get a build error:

cp -fpR feeds/nss/qca-nss-drv/files/nss-firmware/nss_fw_version.h /home/massi/rutto/qosmio/openwrt-ip806x/build_dir/target-arm_cortex-a15+neon-vfpv4_musl_eabi/linux-ipq806x_generic/qca-nss-drv-2020-03-20-3cfb9f43/exports/nss_fw_version.h
cp: cannot stat 'feeds/nss/qca-nss-drv/files/nss-firmware/nss_fw_version.h': No such file or directory
make[3]: *** [Makefile:180: /home/massi/rutto/qosmio/openwrt-ip806x/build_dir/target-arm_cortex-a15+neon-vfpv4_musl_eabi/linux-ipq806x_generic/qca-nss-drv-2020-03-20-3cfb9f43/.configured_68b329da9893e34099c7d8ad5cb9c940] Error 1
make[3]: Leaving directory '/home/massi/rutto/qosmio/openwrt-ip806x/feeds/nss/qca-nss-drv'
time: package/feeds/nss/qca-nss-drv/compile#0.13#0.00#0.12
    ERROR: package/feeds/nss/qca-nss-drv failed to build.
make[2]: *** [package/Makefile:116: package/feeds/nss/qca-nss-drv/compile] Error 1
make[2]: Leaving directory '/home/massi/rutto/qosmio/openwrt-ip806x'
make[1]: *** [package/Makefile:110: /home/massi/rutto/qosmio/openwrt-ip806x/staging_dir/target-arm_cortex-a15+neon-vfpv4_musl_eabi/stamp/.package_compile] Error 2
make[1]: Leaving directory '/home/massi/rutto/qosmio/openwrt-ip806x'
make: *** [/home/massi/rutto/qosmio/openwrt-ip806x/include/toplevel.mk:231: world] Error 2

Did I forget something?

This is very informative, as always! Thank you.

@amteza , Thank you! I’ve made the changes to incorporate these into my build as well. So far, so good.

@pattagghiu , are you using the “11.2-K5.15” branch of the NSS packages? There is a line in the Makefile for qca-nss-drv that copies that header file into the proper folder prior to running the build. Can you verify it’s there?

Also, as for the mac80211 NSS option, you can enable that. I did a very barebones config from my own “.config” and must’ve forgotten it. I also have @Ansuel’s latest Krait-CC and @amteza’s NAPI polling patches under the branch 5.15-qsdk11-new-krait-cc.

EDIT: @pattagghiu , I just found the bug. I was using the old path for the nss packages in my build which is 'package/qca' vs 'feeds/nss'. Pushed the changes now.

--- a/qca-nss-drv/Makefile
+++ b/qca-nss-drv/Makefile
@@ -156,7 +156,7 @@ ifeq ($(CONFIG_TARGET_BOARD), "ipq806x")
 endif

 define Build/Configure
-       $(CP) $(SOURCE)/files/nss-firmware/nss_fw_version.h $(PKG_BUILD_DIR)/exports/nss_fw_version.h
+       $(CP) $(TOPDIR)/$(SOURCE)/files/nss-firmware/nss_fw_version.h $(PKG_BUILD_DIR)/exports/nss_fw_version.h

Latency is the sum of tx + rx + overhead. The specific needle I was trying to budge was the gross disparity between upload and download performance on the rrul test here: AQL and the ath10k is *lovely* - #901 by amteza - which we didn't have in 2016. By spending less time processing smaller reads, I had hoped to make more time for bigger writes.

Makes sense. I hope this does not result in the device dropping received packets tho, due to a limited receive buffer.

Thanks man!
I'm gonna try a new build, but I'm not sure about this package. My point is not that the package is disabled; my point is that I can't find the quoted package in menuconfig!
This is from @ACwifidude 's master and is missing in your repo:
