Ipq806x NSS build (Netgear R7800 / TP-Link C2600 / Linksys EA8500)

Can we get ramoops on 21.02 as well?
I just got a spontaneous reboot after more than 3 weeks of uptime :frowning: It would be nice to see what happened (and maybe help debug it).
Thanks

Would the -htt firmware be more appropriate for this build? The OP states a desire for better fast roaming capability and that is what the -htt firmware offers.

And I got this in my log today, any idea? :slight_smile:

Mon Feb 28 11:18:49 2022 daemon.notice hostapd: wlan1: AP-STA-POLL-OK d2:c4:d3:33:0a:9b
Mon Feb 28 11:21:28 2022 kern.err kernel: [ 4847.146809] wlan1: NSS TX failed with error: NSS_TX_FAILURE_TOO_SHORT
Mon Feb 28 11:24:04 2022 daemon.notice hostapd: wlan1: AP-STA-POLL-OK d2:c4:d3:33:0a:9b
Mon Feb 28 11:26:27 2022 kern.err kernel: [ 5146.050736] wlan1: NSS TX failed with error: NSS_TX_FAILURE_TOO_SHORT
Mon Feb 28 11:29:10 2022 daemon.notice hostapd: wlan1: AP-STA-POLL-OK d2:c4:d3:33:0a:9b
Mon Feb 28 11:31:26 2022 kern.err kernel: [ 5444.952767] wlan1: NSS TX failed with error: NSS_TX_FAILURE_TOO_SHORT
Mon Feb 28 11:34:14 2022 daemon.notice hostapd: wlan1: AP-STA-POLL-OK d2:c4:d3:33:0a:9b
Mon Feb 28 11:36:25 2022 kern.err kernel: [ 5743.853564] wlan1: NSS TX failed with error: NSS_TX_FAILURE_TOO_SHORT
Mon Feb 28 11:39:21 2022 daemon.notice hostapd: wlan1: AP-STA-POLL-OK d2:c4:d3:33:0a:9b
Mon Feb 28 11:41:24 2022 kern.err kernel: [ 6042.763137] wlan1: NSS TX failed with error: NSS_TX_FAILURE_TOO_SHORT
Mon Feb 28 11:44:23 2022 daemon.notice hostapd: wlan1: AP-STA-POLL-OK d2:c4:d3:33:0a:9b
Mon Feb 28 11:46:23 2022 kern.err kernel: [ 6341.671297] wlan1: NSS TX failed with error: NSS_TX_FAILURE_TOO_SHORT
Mon Feb 28 11:49:24 2022 daemon.notice hostapd: wlan1: AP-STA-POLL-OK d2:c4:d3:33:0a:9b
Mon Feb 28 11:51:22 2022 kern.err kernel: [ 6640.577974] wlan1: NSS TX failed with error: NSS_TX_FAILURE_TOO_SHORT
Mon Feb 28 11:54:31 2022 daemon.notice hostapd: wlan1: AP-STA-POLL-OK d2:c4:d3:33:0a:9b
Mon Feb 28 11:56:21 2022 kern.err kernel: [ 6939.482403] wlan1: NSS TX failed with error: NSS_TX_FAILURE_TOO_SHORT
Mon Feb 28 11:59:43 2022 daemon.notice hostapd: wlan1: AP-STA-POLL-OK d2:c4:d3:33:0a:9b
Mon Feb 28 12:01:20 2022 kern.err kernel: [ 7238.386407] wlan1: NSS TX failed with error: NSS_TX_FAILURE_TOO_SHORT
Mon Feb 28 12:05:00 2022 daemon.notice hostapd: wlan1: AP-STA-POLL-OK d2:c4:d3:33:0a:9b
Mon Feb 28 12:06:19 2022 kern.err kernel: [ 7537.288983] wlan1: NSS TX failed with error: NSS_TX_FAILURE_TOO_SHORT
Mon Feb 28 12:10:10 2022 daemon.notice hostapd: wlan1: AP-STA-POLL-OK d2:c4:d3:33:0a:9b
Mon Feb 28 12:11:17 2022 kern.err kernel: [ 7836.190181] wlan1: NSS TX failed with error: NSS_TX_FAILURE_TOO_SHORT
Mon Feb 28 12:12:13 2022 daemon.info hostapd: wlan1: STA d2:c4:d3:33:0a:9b IEEE 802.11: disconnected due to excessive missing ACKs
Mon Feb 28 12:12:13 2022 daemon.notice hostapd: wlan1: AP-STA-DISCONNECTED d2:c4:d3:33:0a:9b
Mon Feb 28 12:12:43 2022 daemon.info hostapd: wlan1: STA d2:c4:d3:33:0a:9b IEEE 802.11: deauthenticated due to inactivity (timer DEAUTH/REMOVE)
Mon Feb 28 12:16:16 2022 kern.err kernel: [ 8135.089937] wlan1: NSS TX failed with error: NSS_TX_FAILURE_TOO_SHORT
Mon Feb 28 12:21:15 2022 kern.err kernel: [ 8433.998358] wlan1: NSS TX failed with error: NSS_TX_FAILURE_TOO_SHORT
Mon Feb 28 12:26:14 2022 kern.err kernel: [ 8732.905440] wlan1: NSS TX failed with error: NSS_TX_FAILURE_TOO_SHORT
Mon Feb 28 12:31:13 2022 kern.err kernel: [ 9031.811075] wlan1: NSS TX failed with error: NSS_TX_FAILURE_TOO_SHORT
Mon Feb 28 12:36:12 2022 kern.err kernel: [ 9330.714397] wlan1: NSS TX failed with error: NSS_TX_FAILURE_TOO_SHORT
Mon Feb 28 12:41:11 2022 kern.err kernel: [ 9629.617288] wlan1: NSS TX failed with error: NSS_TX_FAILURE_TOO_SHORT
Mon Feb 28 12:46:10 2022 kern.err kernel: [ 9928.518783] wlan1: NSS TX failed with error: NSS_TX_FAILURE_TOO_SHORT
Mon Feb 28 12:51:09 2022 kern.err kernel: [10227.418863] wlan1: NSS TX failed with error: NSS_TX_FAILURE_TOO_SHORT
Mon Feb 28 12:56:08 2022 kern.err kernel: [10526.327571] wlan1: NSS TX failed with error: NSS_TX_FAILURE_TOO_SHORT
Mon Feb 28 13:01:07 2022 kern.err kernel: [10825.234885] wlan1: NSS TX failed with error: NSS_TX_FAILURE_TOO_SHORT
Mon Feb 28 13:06:05 2022 kern.err kernel: [11124.139424] wlan1: NSS TX failed with error: NSS_TX_FAILURE_TOO_SHORT
Mon Feb 28 13:11:04 2022 kern.err kernel: [11423.044012] wlan1: NSS TX failed with error: NSS_TX_FAILURE_TOO_SHORT
Mon Feb 28 13:16:03 2022 kern.err kernel: [11721.947189] wlan1: NSS TX failed with error: NSS_TX_FAILURE_TOO_SHORT
Mon Feb 28 13:21:02 2022 kern.err kernel: [12020.848892] wlan1: NSS TX failed with error: NSS_TX_FAILURE_TOO_SHORT
Mon Feb 28 13:26:01 2022 kern.err kernel: [12319.748979] wlan1: NSS TX failed with error: NSS_TX_FAILURE_TOO_SHORT
Mon Feb 28 13:31:00 2022 kern.err kernel: [12618.657954] wlan1: NSS TX failed with error: NSS_TX_FAILURE_TOO_SHORT
Mon Feb 28 13:35:59 2022 kern.err kernel: [12917.565569] wlan1: NSS TX failed with error: NSS_TX_FAILURE_TOO_SHORT
Mon Feb 28 13:40:58 2022 kern.err kernel: [13216.471777] wlan1: NSS TX failed with error: NSS_TX_FAILURE_TOO_SHORT
Mon Feb 28 13:45:57 2022 kern.err kernel: [13515.376550] wlan1: NSS TX failed with error: NSS_TX_FAILURE_TOO_SHORT
Mon Feb 28 13:50:56 2022 kern.err kernel: [13814.280017] wlan1: NSS TX failed with error: NSS_TX_FAILURE_TOO_SHORT
Mon Feb 28 13:55:55 2022 kern.err kernel: [14113.182118] wlan1: NSS TX failed with error: NSS_TX_FAILURE_TOO_SHORT
Mon Feb 28 14:00:53 2022 kern.err kernel: [14412.082769] wlan1: NSS TX failed with error: NSS_TX_FAILURE_TOO_SHORT

The issue seems to come from having mesh nodes in the network. Those nodes send zero-length frames, which spam the log (not an actual problem, just log spam). There is a patch in the NSS drivers thread that I can add in a future build. :sunglasses:
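Until such a patch is in a build, the noise can be hidden when reading the log. A minimal sketch, assuming the log text is piped in on stdin (on the router that would typically come from `logread`); the function names `filter_nss_spam` and `count_nss_spam` are made up for illustration:

```shell
# Drop the harmless "too short" NSS TX errors from syslog text on stdin
# and pass every other line through unchanged.
filter_nss_spam() {
    grep -v 'NSS TX failed with error: NSS_TX_FAILURE_TOO_SHORT'
}

# Count how many of those error lines the log text on stdin contains.
count_nss_spam() {
    grep -c 'NSS TX failed with error: NSS_TX_FAILURE_TOO_SHORT'
}

# Typical use on the router:
#   logread | filter_nss_spam
#   logread | count_nss_spam
```

This only changes what is displayed; the kernel still emits the messages until the driver itself is patched.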

Thanks, yes, that would be great.

Or, if you give me the link to the patch, I could try it.

Haven’t tried it. Looks promising. :sunglasses:

There is something in the image that causes a heavy LAN ethernet throughput problem with the latest NSS release. I have tested in an R7800 and the LAN performance drops to about 200 Mbps when using the stable NSS release from about 960 Mbps when using the official latest stable release.

Any idea on how to solve the issue?

The problem with WSL is the PATH. In WSL the inherited PATH includes Windows directories with spaces in their names, and the make process does not accept a PATH containing spaces. There is an easy workaround: call make -j5 with a clean PATH:

PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin make -j5

WSL is wonderful for development. VSCode works like a charm.
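Instead of hard-coding the PATH, the entries containing spaces can be filtered out on the fly. A minimal sketch; `clean_path` is a hypothetical helper name, and it assumes that entries with spaces are the only problem:

```shell
# Print the colon-separated path list given as $1 with every entry
# that contains a space removed; the order of the rest is preserved.
clean_path() {
    printf '%s\n' "$1" | tr ':' '\n' | grep -v ' ' | paste -sd ':' -
}

# Typical use inside the build tree:
#   PATH="$(clean_path "$PATH")" make -j5
```

The variable assignment in front of make applies only to that one command, so the interactive shell keeps its full PATH.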

Not tracking any big changes. Anything unique in your setup?

Not really. I am playing with this R7800 unit before I decide whether to use it or discard it. I have two important issues:

  1. I don't really see hardware offloading working here: ksoftirqd/0 takes almost the whole CPU regardless of the configuration I apply, and it is as loaded with the NSS build as it is with the official image. However, that is noticeable only when the data flows from the terminal to the router (uploading). And this leads to the second critical issue.
  2. In the opposite direction (downloading), which is the common case, the performance is very low, between 300 and 400 Mbps over wireless. When uploading I get wireless speeds beyond 1 Gbps; I have actually seen 1.33 Gbps using the latest stable non-NSS image. For these tests I am running iperf3 -s on the router and testing from wireless terminals in sender mode and also in reverse mode (the more realistic scenario).
    The point I want to raise is that we can get quite good throughput when uploading from the terminal to the router, beyond 1.25 Gbps. But when downloading from the router (or another wired computer) to a wireless unit, the throughput is about half of that.
    I am using a laptop capable of connecting at 1733 Mbps on a 160 MHz channel to test the speed. Any thoughts on this behavior?

Don’t use the router as the iperf server (an all-in-one router should never be the iperf server or client at these speeds: it is CPU intensive and will give false data).

Test wireless using a wired LAN computer as the iperf server, wireless client as the iperf client with the router between the two.

Yes, I know iperf3 is quite CPU intensive when receiving data, and not that much when sending it.
In any case, I tested with a wireless computer A (160 MHz channel, 1733 Mbps link rate) and a wired computer B connected to a LAN ethernet port. Running iperf3 -s on computer B and the command iperf3 -c IP_B -n 5G -P 10 -V -R on the client, I got this:

  1. When the client is the R7800 I get 936 Mbits/sec. This is BW from B(wired) -> R7800. The CPU is almost 100% in the router, it is the expected situation.
  2. When the client is computer A I get 543 Mbits/sec. This is BW from B(wired) -> R7800 -> A(wireless). In this case the htop looks like this:

    This is my rc.local:
# Put your custom commands here that should be executed once
# the system init finished. By default this file does nothing.

/usr/sbin/ethtool -C eth0 tx-usecs 0
/usr/sbin/ethtool -C eth1 tx-usecs 0
/usr/sbin/ethtool -C eth0 rx-usecs 31
/usr/sbin/ethtool -C eth1 rx-usecs 31

echo 3 > /proc/irq/30/smp_affinity
echo 3 > /proc/irq/32/smp_affinity
echo 3 > /proc/irq/36/smp_affinity
echo 3 > /proc/irq/69/smp_affinity

echo min_power > /sys/devices/platform/soc/29000000.sata/ata1/host0/scsi_host/host0/link_power_management_policy
echo 800000 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq
echo 800000 > /sys/devices/system/cpu/cpu1/cpufreq/scaling_min_freq
echo 35 > /sys/devices/system/cpu/cpufreq/ondemand/up_threshold
echo 10 > /sys/devices/system/cpu/cpufreq/ondemand/sampling_down_factor

for FILE in /sys/class/net/*/queues/[rt]x-0/[rx]ps_cpus; do
   [ -w "$FILE" ] && echo 3 > "$FILE" 2>/dev/null
done

modprobe nss-ifb
ip link set up nssifb
# Shape ingress traffic to 900 Mbit with chained NSSFQ_CODEL
tc qdisc add dev nssifb root handle 1: nsstbl rate 900Mbit burst 1Mb
tc qdisc add dev nssifb parent 1: handle 10: nssfq_codel limit 10240 flows 1024 quantum 1514 target 5ms interval>
# Shape egress traffic to 900 Mbit with chained NSSFQ_CODEL
tc qdisc add dev eth0 root handle 1: nsstbl rate 900Mbit burst 1Mb
tc qdisc add dev eth0 parent 1: handle 10: nssfq_codel limit 10240 flows 1024 quantum 1514 target 5ms interval 1>

uci set irqbalance.irqbalance.enabled=1
uci set network.globals.packet_steering=1
uci commit

exit 0
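For readers wondering about the value 3 written to smp_affinity and to the rps_cpus/xps_cpus files in the script above: it is a hexadecimal CPU bitmask, with bit 0 standing for CPU0 and bit 1 for CPU1, so 3 allows both cores. A small sketch of how such a mask can be computed; `cpu_mask` is a hypothetical helper, not part of the build:

```shell
# Print the hexadecimal bitmask that selects the CPU indexes given as
# arguments, in the format expected by /proc/irq/*/smp_affinity.
cpu_mask() {
    mask=0
    for cpu in "$@"; do
        mask=$(( mask | (1 << cpu) ))
    done
    printf '%x\n' "$mask"
}

cpu_mask 0 1   # prints 3 (both cores)
cpu_mask 1     # prints 2 (second core only)
```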

I am showing results after a few minutes of normal light operation. Doing the test after booting returns the expected result of 911 Mbits/sec. This is BW from B(wired) -> R7800 -> A(wireless) after booting. And this is the htop in this case:

Question: Why this difference when doing exactly the same test?
I am asking you since this is, by far, the best performing build I've tested in this router. Thanks a lot.

I wonder if your router is reverting to 80 MHz after a while (e.g. after picking up DFS radar or other interference). 543 Mbps is 80 MHz 2x2 speed.
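One way to check for that is to look at the channel width the radio reports right after boot and again once the throughput has dropped. A rough sketch, assuming `iw` is installed on the router and that its `iw dev <iface> info` output contains a `width: NN MHz` field; `channel_width` is a hypothetical helper and `wlan0` is only a placeholder for the 5 GHz interface name:

```shell
# Extract the channel width from `iw dev <iface> info`-style text on stdin.
# Prints e.g. "80 MHz", or nothing if no width field is present.
channel_width() {
    sed -n 's/.*width: \([0-9]* MHz\).*/\1/p'
}

# Typical use on the router (interface name is an assumption):
#   iw dev wlan0 info | channel_width
```

If the value changes from 160 MHz to 80 MHz between the two runs, that would explain the halved throughput.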

@ACwifidude There's something wrong with the switch config on your stable builds for the EA8500. I flashed the latest build and I don't get an IP address, and setting a static IP on my computer to try to connect to the router doesn't work either.

Something about the switch config changed a while ago, Kong had to fix it on his builds back around Oct/Nov.

Suck. @RadioOperator This is what the ea8500 portion of the patch looks like now, any suggestions?


--- b/arch/arm/boot/dts/qcom-ipq8064-ea8500.dts
+++ a/arch/arm/boot/dts/qcom-ipq8064-ea8500.dts
@@ -107,15 +107,15 @@	
 };

 &gmac1 {
-	qcom,phy_mdio_addr = <4>;
-	qcom,poll_required = <1>;
-	qcom,rgmii_delay = <0>;
+	qcom,phy-mdio-addr = <0>;
+	qcom,poll-required = <0>;
+	qcom,rgmii-delay = <1>;
	qcom,emulation = <0>;
 };

 /* LAN */
 &gmac2 {
-	qcom,phy_mdio_addr = <0>;	/* none */
+	qcom,phy-mdio-addr = <4>;
	qcom,poll_required = <0>;	/* no polling */
	qcom,rgmii_delay = <0>;
	qcom,emulation = <0>;

Entire patch:

Hi @ACwifidude, yes, I'm using the config on the kernel5.10-nss-qsdk11.0 branch.
It seems your patches did not cover the kernel5.10-nss-qsdk11.0 branch.

I still wonder why so few EA8500 users report this issue; maybe there are two hardware variants on the market. If your patches are applied, the other users will lose their LAN connections. Please note that my EA8500 is the Hong Kong version.

Mine is not the Hong Kong version, but I see the same thing.

Only people who have done a factory reset will have switch issues. If you upgrade and don't do a reset of settings the switch works fine.

One quirk of my Hong Kong version: OpenWrt cannot be installed from any Linksys factory firmware, only over UART.

This is new to me; I did not know that. It seems unreasonable.