Ipq806x NSS build (Netgear R7800 / TP-Link C2600 / Linksys EA8500)

Felix has talked with Toke (creator of the VTBA scheduler) and decided to drop the VTBA scheduler, reverting to the round-robin scheduler plus some additional improvements:

The virtual time scheduler code has a number of issues:
- queues slowed down by hardware/firmware powersave handling were not properly handled.
- on ath10k in push-pull mode, tx queues that the driver tries to pull from were starved, causing excessive latency
- delay between tx enqueue and reported airtime use were causing excessively bursty tx behavior

The bursty behavior may also be present on the round-robin scheduler, but there it is much easier to fix without introducing additional regressions

Signed-off-by: Felix Fietkau <nbd@nbd.name>

Quarky's finding seems to have been corroborated. Hope OpenWrt Wifi will become reliable again!

Can confirm, my WLAN is completely broken after a few hours of uptime with the current kernel 5.10 build.
Is there a way to download one of the older builds so I can at least revert until it's fixed? Unfortunately, I don't have a backup of the image I used before. :frowning:

You can use the 21.02.x image.

I have several backups. If you want, I can share the 20220519 master build, which works OK for me. It periodically logs the warning below, but the WLAN at least keeps working:
kern.warn kernel: [276542.060762] ath10k_pci 0000:01:00.0: failed to lookup txq for peer_id 83 tid 0
Felix has published updated info.

Yeah, of course, that would be very nice of you. Thanks in advance!

Hi ACwifidude,

I rebased your master branch onto the latest commits and built it, but the build failed. With -j4 V=sc it failed early on, so I switched to -j1 V=sc, but it kept failing while building qca-nss-drv. The errors are as follows:

                 from /home/python/Venv/OpenWrt/openwrt/build_dir/target-arm_cortex-a15+neon-vfpv4_musl_eabi/linux-ipq806x_generic/qca-nss-drv-809a00de/nss_tx_rx_common.h:25,
                 from /home/python/Venv/OpenWrt/openwrt/build_dir/target-arm_cortex-a15+neon-vfpv4_musl_eabi/linux-ipq806x_generic/qca-nss-drv-809a00de/nss_bridge.c:17:
/home/python/Venv/OpenWrt/openwrt/build_dir/target-arm_cortex-a15+neon-vfpv4_musl_eabi/linux-ipq806x_generic/qca-nss-drv-809a00de/nss_core.h: In function 'nss_core_dma_cache_maint':
/home/python/Venv/OpenWrt/openwrt/build_dir/target-arm_cortex-a15+neon-vfpv4_musl_eabi/linux-ipq806x_generic/qca-nss-drv-809a00de/nss_core.h:120:17: error: implicit declaration of function 'dmac_inv_range'; did you mean 'outer_inv_range'? [-Werror=implicit-function-declaration]
  120 |                 dmac_inv_range(start, start + size);
      |                 ^~~~~~~~~~~~~~
      |                 outer_inv_range
/home/python/Venv/OpenWrt/openwrt/build_dir/target-arm_cortex-a15+neon-vfpv4_musl_eabi/linux-ipq806x_generic/qca-nss-drv-809a00de/nss_core.h:123:17: error: implicit declaration of function 'dmac_clean_range'; did you mean 'dmac_flush_range'? [-Werror=implicit-function-declaration]
  123 |                 dmac_clean_range(start, start + size);
      |                 ^~~~~~~~~~~~~~~~
      |                 dmac_flush_range
cc1: some warnings being treated as errors
make[5]: *** [scripts/Makefile.build:280: /home/python/Venv/OpenWrt/openwrt/build_dir/target-arm_cortex-a15+neon-vfpv4_musl_eabi/linux-ipq806x_generic/qca-nss-drv-809a00de/nss_bridge.o] Error 1
make[4]: *** [Makefile:1822: /home/python/Venv/OpenWrt/openwrt/build_dir/target-arm_cortex-a15+neon-vfpv4_musl_eabi/linux-ipq806x_generic/qca-nss-drv-809a00de] Error 2
make[4]: Leaving directory '/home/python/Venv/OpenWrt/openwrt/build_dir/target-arm_cortex-a15+neon-vfpv4_musl_eabi/linux-ipq806x_generic/linux-5.10.120'
make[3]: *** [Makefile:127: /home/python/Venv/OpenWrt/openwrt/build_dir/target-arm_cortex-a15+neon-vfpv4_musl_eabi/linux-ipq806x_generic/qca-nss-drv-809a00de/.built] Error 2
make[3]: Leaving directory '/home/python/Venv/OpenWrt/openwrt/package/qca/qca-nss-drv'
time: package/qca/qca-nss-drv/compile#7.54#1.44#8.32
    ERROR: package/qca/qca-nss-drv failed to build.
make[2]: *** [package/Makefile:116: package/qca/qca-nss-drv/compile] Error 1
make[2]: Leaving directory '/home/python/Venv/OpenWrt/openwrt'
make[1]: *** [package/Makefile:110: /home/python/Venv/OpenWrt/openwrt/staging_dir/target-arm_cortex-a15+neon-vfpv4_musl_eabi/stamp/.package_compile] Error 2
make[1]: Leaving directory '/home/python/Venv/OpenWrt/openwrt'
make: *** [/home/python/Venv/OpenWrt/openwrt/include/toplevel.mk:230: world] Error 2

Usually it is a patch that conflicts with an update to master.

Try

make -j4 V=s 2>&1 | tee build.log

There will be a hunk that fails to apply somewhere.
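A minimal sketch for digging through that log, assuming it was captured with the `tee` command above: the hypothetical `find_build_failures` helper greps for GNU patch's "Hunk #N FAILED" lines, gcc's ": error: " lines, and OpenWrt's "ERROR: package/... failed to build" summary, so the first rejected hunk usually shows up before the compiler errors it causes.

```shell
# find_build_failures [LOG]
# Hypothetical helper: list the lines that usually explain a failed
# OpenWrt build, with their line numbers. LOG defaults to build.log.
find_build_failures() {
    log=${1:-build.log}
    # Patterns matched:
    #   GNU patch/quilt:  "Hunk #1 FAILED at 120."
    #   gcc:              "nss_core.h:120:17: error: ..."
    #   OpenWrt summary:  "ERROR: package/... failed to build."
    grep -nE 'Hunk #[0-9]+ FAILED|: error: |ERROR: package/.* failed to build' "$log" |
        head -n 20
}
```

Run it as `find_build_failures build.log`; the patch named just above the first "Hunk ... FAILED" line in the full log is the one to refresh or drop.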

Hi ACwifidude,

Felix committed his fix for mac80211 several hours ago; please kick off a new master build. I trust a build from your machine more than one from mine, plus you said you always test it first for all of us :slight_smile:

Quarky must be chuckling at his Eureka moment :slight_smile:

mac80211: add airtime fairness improvements

This reverts the airtime scheduler back from the virtual-time based scheduler
to the deficit round robin scheduler implementation.
This reduces burstiness and improves fairness by improving interaction with AQL.

https://git.openwrt.org/?p=openwrt/openwrt.git;a=commit;h=6d49a25804d78d639e08a67c86b26991ce6485d8
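For anyone wondering what "deficit round robin" means here: each backlogged queue earns a fixed quantum of credit per round and may send packets as long as their cost fits in its accumulated credit; leftover credit carries over, and a queue that empties loses it. The sketch below is a toy illustration in shell/awk, not mac80211's code: the kernel scheduler charges stations for measured airtime and applies per-station weights, while this hypothetical `drr_order` function charges packet bytes to keep it short.

```shell
# drr_order QUANTUM QUEUE...
# Toy deficit round robin (hypothetical helper). Each QUEUE is a
# comma-separated list of packet costs (bytes here; mac80211 uses airtime).
# Prints "queue:cost" for each packet in dequeue order.
# QUANTUM must be a positive integer.
drr_order() {
    quantum=$1
    shift
    printf '%s\n' "$@" | awk -v q="$quantum" '
    {
        # Load queue NR: n[NR] packets, head[NR] = index of next packet.
        n[NR] = split($0, pkts, ",")
        for (i = 1; i <= n[NR]; i++) cost[NR, i] = pkts[i] + 0
        head[NR] = 1
        deficit[NR] = 0
        remaining += n[NR]
    }
    END {
        while (remaining > 0) {
            for (k = 1; k <= NR; k++) {
                if (head[k] > n[k]) continue        # idle queue earns no credit
                deficit[k] += q                     # one quantum per round
                while (head[k] <= n[k] && cost[k, head[k]] <= deficit[k]) {
                    deficit[k] -= cost[k, head[k]]
                    print k ":" cost[k, head[k]]
                    head[k]++
                    remaining--
                }
                if (head[k] > n[k]) deficit[k] = 0  # emptied queue drops leftover credit
            }
        }
    }'
}
```

For example, `drr_order 1000 1500,1500 300,300,300` prints `2:300` three times and then `1:1500` twice: the queue of small packets fits all three into its first quantum, while the 1500-byte queue has to bank credit across rounds before each send.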

Thanks a lot!

@ACwifidude
Felix has just pushed one more new commit (besides the one above) to master.
Erase and rewind time again.

Posted a new master build with the new mac80211 commits. Wifi performance and latency look good!

Thanks a lot ACwifidude. You rock!

I did some throughput/latency plots comparing the previous AQL/VTBA with the new AQL/RR (Felix' latest commits) here:

Hello, I am trying to set up NSS fq_codel in /etc/rc.local:

## Setup NSSFQ_CODEL
/sbin/modprobe nss-ifb
/sbin/ip link set up nssifb

## Shape ingress traffic to 900 Mbit with chained NSSFQ_CODEL
/sbin/tc qdisc add dev nssifb root handle 1: nsstbl rate 900Mbit burst 1Mb
/sbin/tc qdisc add dev nssifb parent 1: handle 10: nssfq_codel limit 10240 flows 1024 quantum 1514 target 5ms interval 100ms set_default

## Shape egress traffic to 900 Mbit with chained NSSFQ_CODEL
/sbin/tc qdisc add dev eth0 root handle 1: nsstbl rate 900Mbit burst 1Mb
/sbin/tc qdisc add dev eth0 parent 1: handle 10: nssfq_codel limit 10240 flows 1024 quantum 1514 target 5ms interval 100ms set_default

However, I am getting the following system log messages:

Mon Jun 27 00:21:46 2022 daemon.notice procd: /etc/rc.d/S95done: Unknown qdisc "nsstbl", hence option "rate" is unparsable
Mon Jun 27 00:21:46 2022 daemon.notice procd: /etc/rc.d/S95done: Unknown qdisc "nssfq_codel", hence option "limit" is unparsable
Mon Jun 27 00:21:46 2022 daemon.notice procd: /etc/rc.d/S95done: Unknown qdisc "nsstbl", hence option "rate" is unparsable
Mon Jun 27 00:21:46 2022 daemon.notice procd: /etc/rc.d/S95done: Unknown qdisc "nssfq_codel", hence option "limit" is unparsable

OpenWrt Version (Master NSS with ath10k non-ct):

OpenWrt SNAPSHOT r19916+18-326e109f24 / LuCI Master git-22.167.28356-8effea5
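On the "Unknown qdisc" messages: tc prints this when it does not know the named qdisc and therefore cannot parse its options, so the tc binary in use does not appear to understand nsstbl/nssfq_codel, and every later command then fails the same way. A small sketch that at least stops at the first failure and avoids repeating the parameters: the hypothetical `shape_dev` helper wraps the same two tc commands from the rc.local above (parameters copied unchanged), and `TC` can be pointed at `echo` for a dry run. It does not add NSS qdisc support to tc.

```shell
# shape_dev DEV RATE
# Hypothetical helper: attach an nsstbl shaper with a chained nssfq_codel
# to DEV, using the parameters from the rc.local above. Returns non-zero
# on the first failing command. Set TC=echo to print instead of running.
shape_dev() {
    dev=$1
    rate=$2
    tc_bin=${TC:-/sbin/tc}    # left unquoted below so TC=echo works
    $tc_bin qdisc add dev "$dev" root handle 1: nsstbl rate "$rate" burst 1Mb ||
        return 1
    $tc_bin qdisc add dev "$dev" parent 1: handle 10: nssfq_codel \
        limit 10240 flows 1024 quantum 1514 target 5ms interval 100ms set_default ||
        return 1
}

# Example use in /etc/rc.local (ingress via nssifb, egress on eth0):
# /sbin/modprobe nss-ifb && /sbin/ip link set up nssifb &&
#     shape_dev nssifb 900Mbit && shape_dev eth0 900Mbit ||
#     logger -t rc.local "NSS fq_codel setup failed"
```

Running `TC=echo shape_dev eth0 900Mbit` prints the two tc command lines without touching the system, which makes it easy to compare them against what a working NSS build expects.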

How are Felix's commits working out? Do they improve the Wi-Fi?

Honestly, I can't say that the Wi-Fi is better than what we had around 15 days ago (with VTBS), before the patches that broke the WLAN. But at least now, with the latest commits, the WLAN is stable and performs really well. At least that is the case for me.
@ACwifidude
After the latest master firmware update, I see this in Status -> Firewall:

I got an unexpected crash after 3 days of running the latest master. The crash dump is documented here:

Hi,

I have been using the 5.10 kernel build (OpenWrt 22.03 (Stable) + NSS Hardware Offloading Download) and I seem to be having trouble getting full speeds. My connection is BT PPPoE in the UK, 980 Mbps down and 120 Mbps up. I should add that with the OpenWrt 21.02 (Stable) + NSS Hardware Offloading Download there are no issues. I've set up FQ Codel for NSS as per the instructions, with the performance governor and irqbalance, but I don't use packet steering. The loss of speed only seems to affect the 5.10 kernel builds. Is PPPoE offloading broken in these builds? Is anyone else having the same issues? I've reverted to the 21.02 branch for now and everything is working as expected.

@ACwifidude - out of interest, will there be any new builds for the 21.02 branch with the new Wi-Fi patches? And are there any other reports of PPPoE offloading issues (assuming this is the issue)?

Thanks

I've just installed an updated 21.02 build and something is wrong.
Every now and then (I think when some device leaves Wi-Fi coverage) all connections simply lock up, both wired and wireless.
After a few minutes, everything starts working again.
It doesn't seem to be @quarky's issue, since it also affects wired connections; possibly the router itself is busy doing something else. I'll try to check CPU usage next time it happens.

Does master work for you? Also, which device do you have?

Sorry, it's an R7800. I'll try the latest master and see if that's OK.

Hi Quarky,

Since switching to the schedutil governor, my R7800 has crashed twice. The first crash did not save any ramoops data, but the second produced the following ramoops dump, which shows something related to NSS. Could you please take a look? I used the latest master snapshot build from ACwifidude.

I have switched back to the ondemand governor for now, since I had never seen this NSS-related crash prior to switching to the schedutil governor.

<1>[48528.076300] NSS core 0 signal COREDUMP COMPLETE 4000
<1>[48528.076338] 
<1>[48528.076338] fd47b999: Starting NSS-FW logbuffer dump for core 0
<1>[48528.080421] fd47b999: Warn: trap[813]: Trap on CHIP ID 00050000
<1>[48528.087796] fd47b999: Warn: trap[620]: Trapped: TRAP_TD(00000004) DCAPT(3C000080)
<1>[48528.093361] fd47b999: Warn: trap[645]: Trapped: Thread: 2, reason: 00000020, PC: 4002F30C, previous PC: 4002F308
<1>[48528.101073] fd47b999: Warn: trap[594]: A0_3: 4AC96ED0 402301C0 3F020D88 4AC96ED2
<3>[48528.104389] wlan0: NSS TX failed with error: NSS_TX_FAILURE_NOT_READY
<1>[48528.111316] fd47b999: Warn: trap[594]: A4_7: 4AC96ED2 40052304 3F020D88 3F00AEF0
<1>[48528.111326] fd47b999: Warn: trap[599]: D0_3: 00000026 00000009 00000006 4AC96EC0
<1>[48528.111334] fd47b999: Warn: trap[599]: D4_7: 00060000 00000026 4368E0CC 4368E0B4
<1>[48528.111342] fd47b999: Warn: trap[599]: D8_11: 4368E0B8 4368E0BC 4C08867C 00000000
<1>[48528.111356] fd47b999: Warn: trap[599]: D12_15: 00000000 00000000 00D84001 00003C00
<1>[48528.154617] fd47b999: Warn: trap[649]: Thread_2 has non-recoverable trap
<1>[48528.165281] NSS core 1 signal COREDUMP COMPLETE 4000
<1>[48528.169143] 
<1>[48528.169143] 7f68f8b9: Starting NSS-FW logbuffer dump for core 1
<0>[48528.173840] Kernel panic - not syncing: NSS FW coredump: bringing system down
<2>[48528.181215] CPU1: stopping
<4>[48528.188233] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.10.120 #0
<4>[48528.190833] Hardware name: Generic DT based system
<4>[48528.197017] [<c030e46c>] (unwind_backtrace) from [<c030a204>] (show_stack+0x14/0x20)
<4>[48528.201701] [<c030a204>] (show_stack) from [<c0632ea8>] (dump_stack+0x94/0xa8)
<4>[48528.209597] [<c0632ea8>] (dump_stack) from [<c030d190>] (do_handle_IPI+0x140/0x184)
<4>[48528.216627] [<c030d190>] (do_handle_IPI) from [<c030d1f0>] (ipi_handler+0x1c/0x2c)
<4>[48528.224178] [<c030d1f0>] (ipi_handler) from [<c037184c>] (__handle_domain_irq+0x90/0xf4)
<4>[48528.231821] [<c037184c>] (__handle_domain_irq) from [<c064c154>] (gic_handle_irq+0x90/0xb8)
<4>[48528.240068] [<c064c154>] (gic_handle_irq) from [<c0300b8c>] (__irq_svc+0x6c/0x90)
<4>[48528.248130] Exception stack(0xc146df18 to 0xc146df60)
<4>[48528.255768] df00:                                                       00000000 00002c22
<4>[48528.260822] df20: 1cd58000 dd99fd80 00000000 d8cba8a0 c1c69040 00000000 dd99f030 00002c22
<4>[48528.268980] df40: 00000000 00002c22 0e22a980 c146df68 c07bd41c c07bd43c 60000013 ffffffff
<4>[48528.277137] [<c0300b8c>] (__irq_svc) from [<c07bd43c>] (cpuidle_enter_state+0x180/0x380)
<4>[48528.285292] [<c07bd43c>] (cpuidle_enter_state) from [<c07bd68c>] (cpuidle_enter+0x3c/0x5c)
<4>[48528.293450] [<c07bd68c>] (cpuidle_enter) from [<c034e678>] (do_idle+0x208/0x2a4)
<4>[48528.301522] [<c034e678>] (do_idle) from [<c034e9d0>] (cpu_startup_entry+0x1c/0x20)
<4>[48528.309072] [<c034e9d0>] (cpu_startup_entry) from [<423015ac>] (0x423015ac)
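For anyone wanting to make the same switch back to ondemand by hand: the cpufreq governor is selected per policy through the standard sysfs files `scaling_governor` and `scaling_available_governors`. A minimal sketch, assuming the usual `/sys/devices/system/cpu` layout; the `set_governor` name and the `CPUFREQ_ROOT` override (there only so it can be tried on a scratch directory) are hypothetical.

```shell
# set_governor GOV
# Hypothetical helper: select cpufreq governor GOV on every CPU that
# exposes a cpufreq directory. Refuses governors the kernel does not offer.
set_governor() {
    gov=$1
    root=${CPUFREQ_ROOT:-/sys/devices/system/cpu}
    for f in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -e "$f" ] || continue    # glob matched nothing
        avail=$(cat "${f%scaling_governor}scaling_available_governors" 2>/dev/null)
        case " $avail " in
            *" $gov "*) ;;
            *) echo "set_governor: '$gov' not offered for $f" >&2; return 1 ;;
        esac
        echo "$gov" > "$f" || return 1
    done
}
```

Run as root, `set_governor ondemand` applies the change until the next reboot; to keep it, the same call could go in /etc/rc.local.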