Ipq806x NSS build (Netgear R7800 / TP-Link C2600 / Linksys EA8500)

Posted a new master build with the new mac80211 commits. Wi-Fi performance and latency look good!

Thanks a lot ACwifidude. You rock!

I did some throughput/latency plots comparing the previous AQL/VTBS with the new AQL/RR (Felix's latest commits) here:

Hello, I am trying to set up NSS fq_codel in /etc/rc.local:

## Setup NSSFQ_CODEL
/sbin/modprobe nss-ifb
/sbin/ip link set up nssifb

## Shape ingress traffic to 900 Mbit with chained NSSFQ_CODEL
/sbin/tc qdisc add dev nssifb root handle 1: nsstbl rate 900Mbit burst 1Mb
/sbin/tc qdisc add dev nssifb parent 1: handle 10: nssfq_codel limit 10240 flows 1024 quantum 1514 target 5ms interval 100ms set_default

## Shape egress traffic to 900 Mbit with chained NSSFQ_CODEL
/sbin/tc qdisc add dev eth0 root handle 1: nsstbl rate 900Mbit burst 1Mb
/sbin/tc qdisc add dev eth0 parent 1: handle 10: nssfq_codel limit 10240 flows 1024 quantum 1514 target 5ms interval 100ms set_default

However I am getting the following system log messages:

Mon Jun 27 00:21:46 2022 daemon.notice procd: /etc/rc.d/S95done: Unknown qdisc "nsstbl", hence option "rate" is unparsable
Mon Jun 27 00:21:46 2022 daemon.notice procd: /etc/rc.d/S95done: Unknown qdisc "nssfq_codel", hence option "limit" is unparsable
Mon Jun 27 00:21:46 2022 daemon.notice procd: /etc/rc.d/S95done: Unknown qdisc "nsstbl", hence option "rate" is unparsable
Mon Jun 27 00:21:46 2022 daemon.notice procd: /etc/rc.d/S95done: Unknown qdisc "nssfq_codel", hence option "limit" is unparsable

OpenWrt Version (Master NSS with ath10k non-ct):

OpenWrt SNAPSHOT r19916+18-326e109f24 / LuCI Master git-22.167.28356-8effea5
ISP: AT&T Fiber, Symmetrical Gigabit

Are the "tc" commands different from what I have in /etc/rc.local?
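
For what it's worth, the "Unknown qdisc ..., hence option ... is unparsable" text is printed by the tc userspace tool when it has no parser for that qdisc name, so the log above points at the tc binary on the image rather than at the command syntax. A minimal sketch for pulling the affected qdisc names out of the log; the function name unknown_qdiscs is made up for illustration, and piping logread into it assumes the messages are still in the log buffer:

```shell
# Read log text on stdin and print each qdisc name that tc
# reported as unknown, one per line, without duplicates.
# (unknown_qdiscs is a hypothetical helper, not part of any build.)
unknown_qdiscs() {
    sed -n 's/.*Unknown qdisc "\([^"]*\)".*/\1/p' | sort -u
}

# Usage on the router (assumption: messages are still in the log):
#   logread | unknown_qdiscs
```

If the list contains nsstbl and nssfq_codel, the tc that ran from rc.local did not know the NSS qdiscs at that point.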

How are Felix's commits working out? Do they improve the Wi-Fi?

Honestly, I can't say for sure that the Wi-Fi is better than it was with VTBS around 15 days ago (before the patches that broke the WLAN). But at least now, with the latest commits, the WLAN is stable and performs really well, in my case anyway.

@ACwifidude
After the latest master firmware update I see this in status->firewall

I got an unexpected crash after 3 days running the latest master. The crash dump was documented here:

Hi,

I have been using the 5.10 kernel build (the OpenWrt 22.03 (Stable) + NSS Hardware Offloading download) and I seem to be having issues getting full speed. My connection is PPPoE with BT in the UK, 980 Mbps down and 120 Mbps up. I would like to add that with the OpenWrt 21.02 (Stable) + NSS Hardware Offloading download there are no issues. I've set up fq_codel for NSS as per the instructions, with the performance governor and irqbalance, but I don't use packet steering. The loss of speed only seems to affect the 5.10 kernel builds. Is PPPoE offloading broken in these builds? Is anyone else having the same issue? I've reverted to the 21.02 branch for now and everything is working as expected.

@ACwifidude - out of interest, will there be any new builds for the 21.02 branch with the new Wi-Fi patches? And are there any other reports of PPPoE offloading issues (assuming this is the issue)?

Thanks

I've just installed an updated 21.02 and there's something wrong.
Now and then (I think when some device leaves Wi-Fi coverage) all connections simply lock up, both wired and wireless.
After a few minutes, everything starts up again.
It doesn't seem to be @quarky's issue, since this also affects wired connections. Possibly the router itself is busy doing something else; I'll try to check CPU usage next time it happens.

Does master work for you? Additionally which type of device?

Sorry, R7800. I will try the latest master and see if that's OK.

Hi Quarky,

Since switching to the schedutil governor, my R7800 has crashed twice. The first crash did not save any ramoops, but the second crash gave me the following ramoops dump, which shows something related to NSS. Could you please take a look? I used the latest master snapshot build from ACwifidude.

I have switched back to the ondemand governor for now, since I had never seen this NSS-related crash prior to switching to the schedutil governor.

<1>[48528.076300] NSS core 0 signal COREDUMP COMPLETE 4000
<1>[48528.076338] 
<1>[48528.076338] fd47b999: Starting NSS-FW logbuffer dump for core 0
<1>[48528.080421] fd47b999: Warn: trap[813]: Trap on CHIP ID 00050000
<1>[48528.087796] fd47b999: Warn: trap[620]: Trapped: TRAP_TD(00000004) DCAPT(3C000080)
<1>[48528.093361] fd47b999: Warn: trap[645]: Trapped: Thread: 2, reason: 00000020, PC: 4002F30C, previous PC: 4002F308
<1>[48528.101073] fd47b999: Warn: trap[594]: A0_3: 4AC96ED0 402301C0 3F020D88 4AC96ED2
<3>[48528.104389] wlan0: NSS TX failed with error: NSS_TX_FAILURE_NOT_READY
<1>[48528.111316] fd47b999: Warn: trap[594]: A4_7: 4AC96ED2 40052304 3F020D88 3F00AEF0
<1>[48528.111326] fd47b999: Warn: trap[599]: D0_3: 00000026 00000009 00000006 4AC96EC0
<1>[48528.111334] fd47b999: Warn: trap[599]: D4_7: 00060000 00000026 4368E0CC 4368E0B4
<1>[48528.111342] fd47b999: Warn: trap[599]: D8_11: 4368E0B8 4368E0BC 4C08867C 00000000
<1>[48528.111356] fd47b999: Warn: trap[599]: D12_15: 00000000 00000000 00D84001 00003C00
<1>[48528.154617] fd47b999: Warn: trap[649]: Thread_2 has non-recoverable trap
<1>[48528.165281] NSS core 1 signal COREDUMP COMPLETE 4000
<1>[48528.169143] 
<1>[48528.169143] 7f68f8b9: Starting NSS-FW logbuffer dump for core 1
<0>[48528.173840] Kernel panic - not syncing: NSS FW coredump: bringing system down
<2>[48528.181215] CPU1: stopping
<4>[48528.188233] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.10.120 #0
<4>[48528.190833] Hardware name: Generic DT based system
<4>[48528.197017] [<c030e46c>] (unwind_backtrace) from [<c030a204>] (show_stack+0x14/0x20)
<4>[48528.201701] [<c030a204>] (show_stack) from [<c0632ea8>] (dump_stack+0x94/0xa8)
<4>[48528.209597] [<c0632ea8>] (dump_stack) from [<c030d190>] (do_handle_IPI+0x140/0x184)
<4>[48528.216627] [<c030d190>] (do_handle_IPI) from [<c030d1f0>] (ipi_handler+0x1c/0x2c)
<4>[48528.224178] [<c030d1f0>] (ipi_handler) from [<c037184c>] (__handle_domain_irq+0x90/0xf4)
<4>[48528.231821] [<c037184c>] (__handle_domain_irq) from [<c064c154>] (gic_handle_irq+0x90/0xb8)
<4>[48528.240068] [<c064c154>] (gic_handle_irq) from [<c0300b8c>] (__irq_svc+0x6c/0x90)
<4>[48528.248130] Exception stack(0xc146df18 to 0xc146df60)
<4>[48528.255768] df00:                                                       00000000 00002c22
<4>[48528.260822] df20: 1cd58000 dd99fd80 00000000 d8cba8a0 c1c69040 00000000 dd99f030 00002c22
<4>[48528.268980] df40: 00000000 00002c22 0e22a980 c146df68 c07bd41c c07bd43c 60000013 ffffffff
<4>[48528.277137] [<c0300b8c>] (__irq_svc) from [<c07bd43c>] (cpuidle_enter_state+0x180/0x380)
<4>[48528.285292] [<c07bd43c>] (cpuidle_enter_state) from [<c07bd68c>] (cpuidle_enter+0x3c/0x5c)
<4>[48528.293450] [<c07bd68c>] (cpuidle_enter) from [<c034e678>] (do_idle+0x208/0x2a4)
<4>[48528.301522] [<c034e678>] (do_idle) from [<c034e9d0>] (cpu_startup_entry+0x1c/0x20)
<4>[48528.309072] [<c034e9d0>] (cpu_startup_entry) from [<423015ac>] (0x423015ac)
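
When comparing several dumps like the one above, it can help to reduce each to the trapped thread, reason code and PC. A rough sketch, assuming the "Trapped: Thread: ..., reason: ..., PC: ..." line format shown in this dump; nss_trap_summary is a made-up helper name, and the /sys/fs/pstore location in the usage comment is an assumption about where the ramoops files appear after a reboot:

```shell
# Read a crash dump on stdin and print one short line per
# "Trapped: Thread:" entry found in the NSS firmware log buffer.
# (nss_trap_summary is a hypothetical helper for illustration.)
nss_trap_summary() {
    sed -n 's/.*Trapped: Thread: \([0-9]*\), reason: \([0-9A-Fa-f]*\), PC: \([0-9A-Fa-f]*\),.*/thread=\1 reason=\2 pc=\3/p'
}

# Usage on the router (assumption: pstore holds the ramoops dump):
#   cat /sys/fs/pstore/* | nss_trap_summary
```

This only extracts what the log already contains; without the NSS firmware source the reason code and PC cannot be mapped to a cause.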

Hi @vochong,

What you have encountered is an NSS core dump (and also likely the NSS firmware lockup ... see below), which in turn makes the NSS driver trigger a kernel panic. Unfortunately we cannot do anything about this. It's odd that you only see this with the schedutil governor. I have two ipq8065 routers (Netgear R7800 and Askey RT4230W Rev 6) running my custom 21.02 NSS builds and using the schedutil governor. So far, I'm seeing good uptime (> 30 days) on my routers.

I do encounter the occasional reboot due to (I think) the NSS firmware locking up, which likely results in the watchdog mechanism rebooting the router. Such reboots will not leave any ramoops logs. It is rare for me, though, if the NSS core clock is forced to its max frequency (800 MHz for ipq8065).

Unfortunately we do not have the source to the NSS firmware to fix such issues, so we just have to live with it.

@ACwifidude: upgraded to your latest master build, and speed seems good now. Thanks.

@quarky

Thanks a lot for the clear explanations. I had also encountered random crashes without ramoops dumps when previously using the ondemand governor. Your watchdog explanation for them makes total sense.

Regarding the frequency of failures, perhaps QSDK 11.2 used in your build may work a bit better than the QSDK 10.0 used in ACwifidude's builds.

Speaking of random crashes, I've found that every time I plug a network cable into the PC's network port, the R7800 reboots. I don't do that often, so it could easily look like a random crash.
It's running the latest master build. Can anyone else try unplugging and replugging a LAN cable to see if the router reboots?

I would think this is unlikely. I was previously using QSDK 6 NSS firmware running on the LEDE 17.01 branch. I remember lengthy router uptimes then as well, as long as I set the NSS core to 800 MHz. I do occasionally see a random reboot, but it's not common.

echo ondemand > /sys/devices/system/cpu/cpufreq/policy0/scaling_governor
echo ondemand > /sys/devices/system/cpu/cpufreq/policy1/scaling_governor
echo 1725000 > /sys/devices/system/cpu/cpufreq/policy0/scaling_max_freq
echo 1725000 > /sys/devices/system/cpu/cpufreq/policy1/scaling_max_freq
echo 800000 > /sys/devices/system/cpu/cpufreq/policy0/scaling_min_freq
echo 800000 > /sys/devices/system/cpu/cpufreq/policy1/scaling_min_freq

So is this still the best option, or is it better to use the performance governor?

These are the defaults for the r7800 so you can skip these lines. My build sets the min to 600000 to avoid the 384000 upscaling issue. You can set the min to 800000 if you desire and adjust the up threshold for when it upscales (20-35 is popular with 800000 as the min).

This is the build default:


echo 600000 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq
echo 600000 > /sys/devices/system/cpu/cpu1/cpufreq/scaling_min_freq
echo 25 > /sys/devices/system/cpu/cpufreq/ondemand/up_threshold
echo 10 > /sys/devices/system/cpu/cpufreq/ondemand/sampling_down_factor
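
To see which values are actually in effect before adding any of these lines to a startup script, the standard cpufreq sysfs files can simply be read back. A small sketch; show_cpufreq is a made-up helper name, and the optional directory argument exists only so it can be tried against a copy of the tree:

```shell
# Print governor and min/max frequency for every cpufreq policy.
# With no argument it reads the live sysfs tree on the router.
# (show_cpufreq is a hypothetical helper for illustration.)
show_cpufreq() {
    base="${1:-/sys/devices/system/cpu/cpufreq}"
    for p in "$base"/policy*; do
        # Skip the unexpanded glob when no policy directory exists.
        [ -d "$p" ] || continue
        printf '%s: governor=%s min=%s max=%s\n' \
            "${p##*/}" \
            "$(cat "$p/scaling_governor")" \
            "$(cat "$p/scaling_min_freq")" \
            "$(cat "$p/scaling_max_freq")"
    done
}

# Usage on the router:
#   show_cpufreq
```

If the output already shows ondemand with the expected min and max, the matching echo lines are redundant.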

Thanks, friend. So I don't need to add those values either, since they are the defaults, right? Thank you.

What sort of throughput do you get on this build? I'm still only getting about 450 Mbps down.