Qualcommax NSS Build

@Crect @qosmio

Just a small FYI... 22 stas, zero bitrate data in the iw output... wifi down + wifi up, and it's there...

root@OPENWRT-UPSTAIRS:~# iw dev phy1-ap1 station dump | grep Station | wc -l;
22
root@OPENWRT-UPSTAIRS:~# iw dev phy1-ap1 station dump | grep bitrate | wc -l;
0
root@OPENWRT-UPSTAIRS:~# wifi down ; sleep 10 ; wifi up ; sleep 30 ;
root@OPENWRT-UPSTAIRS:~# iw dev phy1-ap1 station dump | grep Station | wc -l;
22
root@OPENWRT-UPSTAIRS:~# iw dev phy1-ap1 station dump | grep bitrate | wc -l;
44
root@OPENWRT-UPSTAIRS:~#
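Each healthy station shows both a tx bitrate and an rx bitrate line in the station dump, which is why 22 stations give 44 bitrate lines after the restart. Rather than counting lines, here is a minimal sketch that lists the stations missing either one. It assumes iw's usual "Station <MAC> (on <iface>)" header format, and stations_missing_bitrate is a hypothetical name:

```shell
#!/bin/bash
# Hypothetical helper: read `iw dev <iface> station dump` output on stdin and
# print the MAC of every station that lacks a tx or rx bitrate line.
stations_missing_bitrate() {
    awk '
        # Emit the previous station if it was missing either bitrate line.
        function report() {
            if (mac != "" && !(tx && rx)) print mac
        }
        /^Station /              { report(); mac = $2; tx = 0; rx = 0; next }
        /^[ \t]*tx bitrate:/     { tx = 1 }
        /^[ \t]*rx bitrate:/     { rx = 1 }
        END                      { report() }
    '
}
```

Usage: `iw dev phy1-ap1 station dump | stations_missing_bitrate`. Empty output means every station has both rates.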

Yeah, I tried that before. It fixed the caps and rates, and somehow it also made the inactive timer of that device work for a short while, so it could stay associated for more than 5 minutes, but it quickly went back to being broken.

I only switched over to NSS fairly recently, so my experience with what worked before and what didn't is rather limited, but could these issues be related to your rebase from a few days ago, @qosmio?

Set your max_inactivity to 86400.
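For reference, in OpenWrt max_inactivity is a per-interface (wifi-iface) option in /etc/config/wireless, given in seconds and handed to hostapd as ap_max_inactivity (hostapd's default is 300), so 86400 is one day. A minimal fragment showing only that option; the section name default_radio1 is an assumption, so use your own wifi-iface section and apply with wifi reload:

```
config wifi-iface 'default_radio1'
	# Station inactivity limit in seconds (hostapd ap_max_inactivity)
	option max_inactivity '86400'
```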

BTW, for me... the stations don't actually get kicked...

hostapd sends out a poll request, which the station answers without a problem, and the counter gets reset to 0.

Yeah, I'll try increasing the time. I also see some poll responses in the log, but not from the ESP devices. It might be that their WiFi stack just doesn't support that.

This is going to sound stupid, but compile a build with all the qca-nss-* bloat...

I have a feeling that when qca_nss_drv is large, there's some sort of timing issue that doesn't get triggered, and that's why I'm not seeing this now.

Here are the modules I'm loading:

qca_nss_drv           909312 14 ecm,ath11k,mac80211,qca_nss_tunipip6,qca_nss_tun6rd,qca_nss_vxlanmgr,qca_nss_pptp,qca_nss_pppoe,qca_nss_map_t,qca_nss_lag_mgr,qca_nss_l2tpv2,qca_nss_gre,qca_nss_bridge_mgr,qca_nss_vlan
qca_nss_gre            24576  0
qca_nss_l2tpv2         16384  0
qca_nss_lag_mgr        12288  0
qca_nss_map_t          16384  0
qca_nss_pppoe          12288  0
qca_nss_pptp           12288  0
qca_nss_tun6rd         12288  0
qca_nss_tunipip6       20480  0
qca_nss_vlan           24576  1 qca_nss_bridge_mgr
qca_nss_vxlanmgr       24576  0
qca_ssdk             1118208  4 qca_nss_bridge_mgr,qca_nss_vlan,qca_nss_drv,qca_nss_dp

Like I said, I wanted to get rid of 90% of that, since I don't use PPTP, MAP-T, LAG or most of the rest.

But as soon as I fire up a build that excludes all of that, the issue comes back.

Edit: if you do decide to try this, include the qca-nss-drv options as well... I have a ton of those enabled that I never actually use.

I'm hitting high throughput, so it doesn't appear to affect performance :smiley:

It's just weird that it used to be absolutely fine until Saturday when I uploaded the latest build with no other changes.

The different NSS DRV features are independent of each other and shouldn't be enabled explicitly. You can clean them out and let the individual client packages (qca-nss-drv-*) enable what they need.

Qualcomm's nss-hosts, which the Makefiles for each package in my repo are based on, don't provide any knobs to enable/disable features. They just enable everything by default, which in turn enables a bunch of unnecessary kernel features... All of that was dependency hell and increased the kernel size for no reason.

You shouldn't even need to explicitly enable anything anymore. The base config already selects the appropriate NSS modules. The only extras I choose are:

# Additional NSS packages (VLAN, Multicast Snooping)
CONFIG_PACKAGE_kmod-qca-nss-drv-vlan-mgr=y
CONFIG_PACKAGE_kmod-qca-mcs=y

# NSS SQM Traffic Shaping
CONFIG_PACKAGE_sqm-scripts-nss=y

These will select what they need after a make oldconfig.
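If you'd rather script those additions than edit .config by hand, here is a minimal sketch. set_config is a hypothetical helper, not part of the OpenWrt build system; it sets a symbol idempotently by first dropping any existing "=..." or "is not set" line for it:

```shell
#!/bin/bash
# Hypothetical helper: set SYMBOL=VALUE in a Kconfig-style .config file,
# dropping any existing "SYMBOL=..." or "# SYMBOL is not set" line first so
# the file never ends up with duplicate or conflicting entries.
set_config() {
    local file=$1 sym=$2 val=$3
    [ -f "$file" ] || return 1
    local tmp="$file.tmp.$$"
    # grep -v exits 1 when it prints nothing; that is not an error here.
    grep -v -e "^$sym=" -e "^# $sym is not set\$" "$file" > "$tmp" || true
    printf '%s=%s\n' "$sym" "$val" >> "$tmp"
    mv "$tmp" "$file"
}
```

For example, run `set_config .config CONFIG_PACKAGE_kmod-qca-mcs y` for each of the three symbols above, then make oldconfig so Kconfig pulls in the dependencies.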

I have the following loaded with NSS FW 11.4 and am not seeing any of the TX/RX info or station inactivity issues on the MX5300, MX4300, or DL-WRX36.

qca_nss_bridge_mgr     32768  0
qca_nss_dp             61440  1 qca_nss_drv
qca_nss_drv          1191936  7 ecm,ath11k,mac80211,qca_nss_wifi_meshmgr,qca_nss_bridge_mgr,qca_nss_vlan
qca_nss_vlan           32768  1 qca_nss_bridge_mgr
qca_nss_wifi_meshmgr   36864  1 ath11k
qca_ssdk             1118208  4 qca_nss_bridge_mgr,qca_nss_vlan,qca_nss_drv,qca_nss_dp
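To compare which NSS-related modules two builds actually load (like the two lsmod excerpts in this thread), here is a small sketch that pulls the qca_* names out of saved lsmod output and diffs them. The function names are hypothetical and only the first column is inspected:

```shell
#!/bin/bash
# Hypothetical helper: read lsmod-style text on stdin and print the sorted,
# de-duplicated names of qca_* modules (first column only).
qca_modules() {
    awk '$1 ~ /^qca_/ { print $1 }' | LC_ALL=C sort -u
}

# Print modules present in only one of two saved lsmod dumps:
# no indent = only in the first file, tab-indented = only in the second.
diff_qca_modules() {
    LC_ALL=C comm -3 <(qca_modules < "$1") <(qca_modules < "$2")
}
```

Usage: save `lsmod > good.txt` on one build and `lsmod > bad.txt` on the other, then run `diff_qca_modules good.txt bad.txt`.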

NSS DRV final generated output

CONFIG_NSS_DRV_BRIDGE_ENABLE=y
# CONFIG_NSS_DRV_CAPWAP_ENABLE is not set
# CONFIG_NSS_DRV_C2C_ENABLE is not set
# CONFIG_NSS_DRV_CLMAP_ENABLE is not set
# CONFIG_NSS_DRV_CRYPTO_ENABLE is not set
# CONFIG_NSS_DRV_DTLS_ENABLE is not set
# CONFIG_NSS_DRV_GRE_ENABLE is not set
CONFIG_NSS_DRV_IGS_ENABLE=y
# CONFIG_NSS_DRV_IPSEC_ENABLE is not set
# CONFIG_NSS_DRV_IPV4_REASM_ENABLE is not set
CONFIG_NSS_DRV_IPV6_ENABLE=y
# CONFIG_NSS_DRV_IPV6_REASM_ENABLE is not set
# CONFIG_NSS_DRV_L2TP_ENABLE is not set
# CONFIG_NSS_DRV_LAG_ENABLE is not set
# CONFIG_NSS_DRV_MAPT_ENABLE is not set
# CONFIG_NSS_DRV_MATCH_ENABLE is not set
# CONFIG_NSS_DRV_MIRROR_ENABLE is not set
# CONFIG_NSS_DRV_OAM_ENABLE is not set
# CONFIG_NSS_DRV_PORTID_ENABLE is not set
# CONFIG_NSS_DRV_LSO_RX_ENABLE is not set
# CONFIG_NSS_DRV_PPPOE_ENABLE is not set
# CONFIG_NSS_DRV_PPTP_ENABLE is not set
# CONFIG_NSS_DRV_PVXLAN_ENABLE is not set
# CONFIG_NSS_DRV_QRFS_ENABLE is not set
# CONFIG_NSS_DRV_QVPN_ENABLE is not set
# CONFIG_NSS_DRV_RMNET_ENABLE is not set
CONFIG_NSS_DRV_SHAPER_ENABLE=y
# CONFIG_NSS_DRV_SJACK_ENABLE is not set
# CONFIG_NSS_DRV_TLS_ENABLE is not set
# CONFIG_NSS_DRV_TRUSTSEC_ENABLE is not set
# CONFIG_NSS_DRV_UDP_ST_ENABLE is not set
# CONFIG_NSS_DRV_TSTAMP_ENABLE is not set
# CONFIG_NSS_DRV_TUN6RD_ENABLE is not set
# CONFIG_NSS_DRV_TUNIPIP6_ENABLE is not set
CONFIG_NSS_DRV_VIRT_IF_ENABLE=y
CONFIG_NSS_DRV_VLAN_ENABLE=y
# CONFIG_NSS_DRV_VXLAN_ENABLE is not set
CONFIG_NSS_DRV_WIFIOFFLOAD_ENABLE=y
CONFIG_NSS_DRV_WIFI_EXT_VDEV_ENABLE=y
CONFIG_NSS_DRV_WIFI_MESH_ENABLE=y
# CONFIG_NSS_DRV_WIFI_LEGACY_ENABLE is not set

My suggestion is to start clean: use the ./scripts/env script to version-control your current .config and files, and then run:

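perl -i -ne '
  # Drop NSS_DRV options and kmod package selections...
  if (/^(?:CONFIG_NSS_DRV|CONFIG_PACKAGE_kmod.*=)/
      # ...except the USB, WireGuard, ramoops, pstore and filesystem kmods.
      && !/^CONFIG_PACKAGE_kmod-(?:usb|wireguard|ramoops|pstore|fs).*=(?:y|m)/) {
    next;
  }
  # Keep every other line unchanged.
  print;
' .config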

I deliberately kept a few kernel modules that aren't selected by default but that you may have enabled (USB, WireGuard, ramoops, pstore, filesystems). That should mean fewer questions asked during make oldconfig. Also, do a factory install rather than a sysupgrade.
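Before running the one-liner, you can preview exactly which lines it would drop without touching .config. A rough grep equivalent, as a sketch; preview_pruned_lines is a hypothetical name and its patterns mirror the perl regexes:

```shell
#!/bin/bash
# Hypothetical helper: print the .config lines the perl one-liner would delete
# (NSS_DRV options and kmod selections, minus the whitelisted kmods),
# without modifying the file.
preview_pruned_lines() {
    grep -E '^(CONFIG_NSS_DRV|CONFIG_PACKAGE_kmod.*=)' "$1" |
        grep -Ev '^CONFIG_PACKAGE_kmod-(usb|wireguard|ramoops|pstore|fs).*=(y|m)'
}
```

Usage: `preview_pruned_lines .config | less`. Commented-out "# ... is not set" lines are not matched, exactly as with the perl version.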

FWIW, after replacing Tasmota with ESPHome, my issues went away. My devices also no longer "disappear" from my network, which used to require a hard reset. I know it doesn't exactly address the issue at hand.

I would also test with NSS offload simply disabled, to rule out other parts of the mac80211 patches, since not all of them are specific to offloading.

I don't have a RAX120, but you can extract it from the factory firmware. And it's really not wise to ask me in an open forum how to raise the power levels above the permitted limit. Find the calibrated power settings within the calibration file (it's a big array of values) and modify those values. The unit is 0.25 dB, so increasing by 4 dB, for instance, means you need to add 16 (4 / 0.25) to each value.

I can provide you the original header file, but it will not make you happy, trust me.

But I can tell you I have the same sporadic problem. Only a reboot helps, and then it works without issues. It's also not just the rate: the RSSI is identical to the noise, so SNR = 0. That's the effect I see, no matter which firmware I'm using.

Yep, it's the same for me.

I wrote about those (obviously not merely cosmetic) issues some time ago. The noise level reported on 5 GHz (only) is -72 dBm. There have been lots of commits since then; some things have been improved/fixed and some haven't.
A simple workaround (though it doesn't always work) is to run Channel Analysis from LuCI; then I can see the actual noise level on 5 GHz (around -110 dBm in my environment), at least for a while before it goes back to -72 dBm readings.
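To watch the noise reading from the shell instead of LuCI, here is a sketch that extracts the noise floor of the in-use channel from `iw dev <iface> survey dump` and computes SNR as signal minus noise (both in dBm). The function names are hypothetical, and it assumes iw's usual "frequency: ... [in use]" and "noise: ... dBm" lines. With illustrative numbers, a client at -70 dBm against the stuck -72 dBm reading gives an SNR of only 2 dB, versus 40 dB against a real -110 dBm floor. This is how an RSSI equal to the noise ends up as SNR = 0.

```shell
#!/bin/bash
# Hypothetical helper: read `iw dev <iface> survey dump` on stdin and print
# the noise value (dBm) reported for the channel marked "[in use]".
inuse_noise() {
    awk '
        /frequency:/       { inuse = ($0 ~ /\[in use\]/) }
        inuse && /noise:/  { print $2; exit }
    '
}

# SNR in dB is simply signal (dBm) minus noise floor (dBm).
snr_db() {
    echo $(( $1 - $2 ))
}
```

Usage, with a hypothetical interface name: `snr_db -70 "$(iw dev phy1-ap0 survey dump | inuse_noise)"`. If no channel is marked in use, inuse_noise prints nothing and the arithmetic fails, so check its output first.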

Scanning, survey etc. all work for me, but the signal is very weak, and on the receive side I see the same shit. I always thought it had something to do with failed calibrations etc., but since it doesn't happen on non-NSS builds, it seems to have something to do with NSS. Consider that the NSS firmware has full system-wide access, even to the wireless firmware. Currently I'm testing loading the Wi-Fi driver at a later time, after NSS and LAN networking have been initialized (this seems to be the way QCA handles module loading in the original firmware).

@BrainSlayer @qosmio

So, using your idea, I was able to make the stats come up on every boot... DD-WRT probably does this a little differently, but essentially I've delayed loading ath11k_ahb until rc.local...

So once I hit rc.local, ath11k IS loaded but ath11k_ahb is not... the radios are not yet up, but LAN and all the bridging, as well as NSS, are.

Then, at that point, I modprobe ath11k_ahb and run wifi up... and the Wi-Fi stack comes up, and the RX/TX bitrates are always there.

mickey mouse? hell yes.

does it work? hell yes :smiley:

This method leaves the OpenWrt Wi-Fi subsystem fairly untouched... later wifi down / wifi reload will all work as normal.


Exactly what I did:

  • rename /etc/modules.d/ath11k to 99-ath11k
  • rm /etc/modules.d/ath11k-ahb (or comment out the module inside it)
  • add the rc.local bits, modprobe ath11k_ahb && sleep 5 && wifi up
  • ath11k_ahb takes a moment to load, so the sleep needs to be there

This reminded me to add another recent thing I've found/experienced (read below, after the prelude). It may be important and looks to me somehow related to what you and @anom3 just wrote.
CC @qosmio
After almost every NSS firmware update, my wired LAN clients simply cannot get IPs via DHCP. WLAN clients get IPs normally but have no Internet. In that case my only solution was to reboot the router (with the power button), and it usually starts completely OK after that: all clients get IPs and connect to the WAN.
I wrote about this issue in the past too. None of this happens on non-NSS builds, with the caveat that I've only rarely used non-NSS builds, and only for comparison.

Next are my newer findings.
Recently I've changed what I do after every NSS firmware update: I connect to the router with my smartphone. In LuCI I can see that it's not only the wired LAN clients that don't get IPs; the WAN (Internet) connection isn't established either. In this case the only thing I could do was reboot the router from my smartphone via LuCI (or via the power button, as before), and after it booted a second time, everything was OK again: WAN and LAN both worked.

Now comes the latest. Probably important.
I decided to add some functionality to my router and added the wifitoggle package (it includes an advanced script) to my build, so I can use the WPS button to turn Wi-Fi on and off.
It was the only change (addition) to my builds in a long time.
After I configured the wifitoggle settings, all networks (WAN, LAN) and clients now come up immediately after the first reboot following every reflash (I've done more than 20), without any issue.
I can only guess that all of this is related to the start order of the services/modules, and that this order is somehow changed by the wifitoggle script.

I have seen something like this on IPQ5018 (which has a built-in QCA8337 switch) when I use the qca8k DSA tag. With normal handling I have never seen it.

Yes, DD-WRT works very differently, since there are no shell scripts involved; it's all written in C. But good that you tested it already. I'm still compiling; it's a long-term bug I've been hunting for 2 months. And yes, I also added exactly the same 5-second sleep already, since I was running into the same problem with the asynchronous init of some driver things.

8 reboots and the RX/TX bitrate stats come up every time. I'm going to call this fixed.

I'll roll a build now with all the bloat removed; usually, in my case, this would cause the "inactive time" issue to come up.

I'll see if delaying the loading of ath11k_ahb until after LAN+NSS fixes this as well.

BTW, you can shorten the sleep by looping and checking whether phy0/phy1 have become available in sysfs: loop with a sleep of 1 second, at most 5 iterations, and exit the loop as soon as phy0/phy1 are present.
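The loop described above can be sketched as a reusable function. wait_for_paths is a hypothetical name; it polls once per second up to a maximum number of tries and reports success or timeout through its exit status:

```shell
#!/bin/bash
# Hypothetical helper: check every given path once per second, up to
# max_tries times. Returns 0 as soon as all paths exist, 1 on timeout.
wait_for_paths() {
    local max_tries=$1
    shift
    local i p ok
    for (( i = 1; i <= max_tries; i++ )); do
        ok=1
        for p in "$@"; do
            # -e follows symlinks, so it also works for /sys/class/ieee80211/phyN
            [ -e "$p" ] || { ok=0; break; }
        done
        (( ok )) && return 0
        # No point sleeping after the final check.
        (( i < max_tries )) && sleep 1
    done
    return 1
}
```

Usage: `wait_for_paths 5 /sys/class/ieee80211/phy0 /sys/class/ieee80211/phy1; wifi up`. Returning a status rather than silently continuing lets the caller log a timeout if the PHYs never show up.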

this is infuriating :rage:

All I did was load up the VM with the build that DOES NOT exhibit the inactive time issue, remove all the qca-nss-* bloat I don't use, disable the qca driver options I don't use, and recompile, and the inactive time issue is back:

root@OPENWRT-UPSTAIRS:~# iw dev phy1-ap1 station dump | grep inact
        inactive time:  84150 ms
        inactive time:  84450 ms
        inactive time:  78710 ms
        inactive time:  80550 ms
        inactive time:  79750 ms
        inactive time:  78680 ms
        inactive time:  78520 ms
        inactive time:  24190 ms
        inactive time:  68450 ms
        inactive time:  77610 ms
        inactive time:  76750 ms
        inactive time:  76200 ms
        inactive time:  75540 ms
        inactive time:  9990 ms
        inactive time:  7590 ms
        inactive time:  3990 ms
        inactive time:  390 ms
        inactive time:  20190 ms
        inactive time:  21090 ms
        inactive time:  990 ms
        inactive time:  11190 ms

Now I will load up my saved VM state with the exact same build, just with the bloat included, flash that image, and it will be fixed.

Edit: of course, after flashing the exact same build with just the bloat added, the inactive time issue is gone :rage:

root@OPENWRT-UPSTAIRS:~# iw dev phy1-ap1 station dump | grep inact
        inactive time:  3610 ms
        inactive time:  1510 ms
        inactive time:  610 ms
        inactive time:  1510 ms
        inactive time:  10 ms
        inactive time:  2410 ms
        inactive time:  610 ms
        inactive time:  11210 ms
        inactive time:  5610 ms
        inactive time:  210 ms
        inactive time:  1110 ms
        inactive time:  1410 ms
        inactive time:  210 ms
        inactive time:  510 ms
        inactive time:  1710 ms
        inactive time:  2310 ms
        inactive time:  2810 ms
        inactive time:  110 ms
        inactive time:  110 ms

Here is my wifiinit.sh script for those who are interested...

Note: it's got the bits that make the module loading changes as well... Be warned, there's a reboot in there... It's very unlikely, but you may get stuck in a reboot loop if the mv or rm commands fail and keep failing. For me, no problems. Worst case, you should be able to boot into failsafe and remove it.

Also, I am checking /sys/class/ieee80211/phy* to see if the interfaces are up... I am not 100% sure this is the correct place to do so (@BrainSlayer?), but it seemed like a good place :smiley:

Last but not least, you need bash... It will probably work with whatever slimmed-down sh OpenWrt ships with by default, but I always include the real bash.

Save it as a script somewhere and call it at the top of your rc.local.

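#!/bin/bash
# One-time setup: rename the ath11k autoload file to 99-ath11k and stop
# ath11k_ahb from being autoloaded at boot; reboot once if anything changed.
reboot=false
if [ -e /etc/modules.d/ath11k ]; then
        mv /etc/modules.d/ath11k /etc/modules.d/99-ath11k
        reboot=true
fi
if [ -e /etc/modules.d/ath11k-ahb ]; then
        rm /etc/modules.d/ath11k-ahb
        reboot=true
fi
if $reboot; then
        reboot
        exit
else
        # LAN, bridging and NSS are up by now; load the AHB radio driver.
        modprobe ath11k_ahb
        # Wait up to ~5 s for both PHYs to register in sysfs.
        # (Plain word list instead of {1..5} so busybox sh can run it too.)
        for i in 1 2 3 4 5; do
                if [ -L /sys/class/ieee80211/phy0 ] && [ -L /sys/class/ieee80211/phy1 ]; then
                        break
                fi
                sleep 1
        done
        wifi up
fi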