Qualcommax NSS Build

looks okay. consider i hate shell scripts. i'm a c and assembler guy

regarding your changes. i mean you removed bloat so timing was reduced. so ath11k is loaded earlier than expected again. my approach is different. i initialize lan and everything first. and i load ath11k when i initialize wifi and not at modules init which is just a static sequence at startup

the reason i left ath11k alone is that, from what i remember, when i removed ath11k loading some time ago while experimenting, the boot process hung.

will try now.

i was reviewing the nss driver code now a little bit. the race condition we have is simple. the nss driver loads the firmware and as the last step in driver probe it resets the nss core to boot it. but it's not waiting until booting has finished or something like that. it simply exits driver probe and you are fucked if you do something too early

that sounds like the real way to fix it.

im able to delay loading of ath11k as well btw... so now i will try again with both ath11k* modules delayed.

again for those who are interested, here is an updated version of my wifiinit.sh script, this one delays loading of ath11k as well... not just the ahb.

looking at dmesg this will also delay loading of everything else wifi related, mac80211 and cfg80211 as well.

#!/bin/bash
reboot=false
if [ -e /etc/modules.d/ath11k ]; then
        rm /etc/modules.d/ath11k
        reboot=true
fi
if [ -e /etc/modules.d/ath11k-ahb ]; then
        rm /etc/modules.d/ath11k-ahb
        reboot=true
fi
if $reboot; then
        reboot
        exit
else
        modprobe ath11k nss_offload=1 frame_mode=2
        modprobe ath11k_ahb
        for i in {1..5}; do
                if [ -L /sys/class/ieee80211/phy0 ] && [ -L /sys/class/ieee80211/phy1 ]; then
                        break
                fi
                sleep 1
        done
        wifi up
fi
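
a side note on the wait loop in the script above: it runs wifi up after 5 seconds whether or not both phys showed up. below is a minimal sketch of the same polling loop as a function that reports a timeout instead. it sticks to plain sh, so it should also run under busybox ash. wait_for_phys and PHY_DIR are made-up names for illustration, not something from the original script.

```sh
#!/bin/sh
# sketch only: wait_for_phys and PHY_DIR are hypothetical names,
# they are not part of the original wifiinit.sh

# directory where the phys show up (same path the script above checks)
PHY_DIR="${PHY_DIR:-/sys/class/ieee80211}"

# wait_for_phys TIMEOUT_SECONDS PHY...
# returns 0 as soon as every named phy exists,
# returns 1 if TIMEOUT_SECONDS pass and at least one is still missing
wait_for_phys() {
    timeout="$1"
    shift
    waited=0
    while :; do
        missing=0
        for phy in "$@"; do
            # the original script tests -L (symlink); -e also accepts
            # a plain directory, which makes the function easy to test
            [ -e "$PHY_DIR/$phy" ] || missing=1
        done
        [ "$missing" -eq 0 ] && return 0
        [ "$waited" -ge "$timeout" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
}

# possible use in place of the for loop:
#   if wait_for_phys 5 phy0 phy1; then
#       wifi up
#   else
#       echo "phy0/phy1 did not appear in time" >&2
#   fi
```

whether you still want to call wifi up after a timeout is a judgment call; the sketch only makes the timeout visible.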

try the following patch for qca-nss-drv (it adds core boot wait)

Index: nss_core.c
===================================================================
--- nss_core.c  (revision 57802)
+++ nss_core.c  (working copy)
@@ -2208,7 +2208,16 @@ static inline void nss_core_handle_tx_unblocked(st
         */
        nss_hal_disable_interrupt(nss_ctx, nss_ctx->int_ctx[0].shift_factor, NSS_N2H_INTR_TX_UNBLOCKED);
 }
+static int nss_bootstate = 0;

+void nss_bootwait(void)
+{
+       int dead = 10*10;
+       while(!nss_bootstate && dead-- > 0)
+       {
+               msleep(100);
+       }
+}
 /*
  * nss_core_handle_cause_nonqueue()
  *     Handle non-queue interrupt causes (e.g. empty buffer SOS, Tx unblocked)
@@ -2276,6 +2285,7 @@ static void nss_core_handle_cause_nonqueue(struct
                }
 #endif
 #endif
+               nss_bootstate = 1;
        }

 #if defined(NSS_DRV_EDMA_LITE_ENABLE)
Index: nss_hal/nss_hal.c
===================================================================
--- nss_hal/nss_hal.c   (revision 57802)
+++ nss_hal/nss_hal.c   (working copy)
@@ -868,6 +868,7 @@ int nss_hal_probe(struct platform_device *nss_dev)
        }

        nss_info("%px: All resources initialized and nss core%d has been brought out of reset", nss_ctx, nss_dev->id);
+       nss_bootwait();
        goto out;

 err_register_irq:
Index: nss_core.h
===================================================================
--- nss_core.h  (revision 57802)
+++ nss_core.h  (working copy)
@@ -988,6 +988,8 @@ void nss_core_update_max_ipv4_conn(int conn);
 void nss_core_update_max_ipv6_conn(int conn);
 void nss_core_update_qos_mem_size(int size);
 int nss_core_get_qos_mem_size(void);
+void nss_bootwait(void);
+
 extern void nss_core_register_subsys_dp(struct nss_ctx_instance *nss_ctx, uint32_t if_num,
                                        nss_phys_if_rx_callback_t cb,
                                        nss_phys_if_rx_ext_data_callback_t ext_cb,

@qosmio check this

huge thanks!

give me about 15 minutes only because this build is just about spun up (archiving now) so might as well try it and see what happens :slight_smile:

the dead counter should be 10*10 (100 x 100 ms = 10 seconds). you dont want to wait 100 seconds if boot fails

@BrainSlayer you were correct btw... of course im going to try the patch now but:

delaying BOTH ath11k and ath11k_ahb on a build with NO bloat works!

im not f*cking crazy after all! :stuck_out_tongue_winking_eye:

root@OPENWRT-UPSTAIRS:~# lsmod | grep qca ; iw dev phy1-ap1 station dump | grep inact
qca_mcs                53248  1 ecm
qca_nss_dp             49152  1 qca_nss_drv
qca_nss_drv           909312  3 ath11k,mac80211,ecm
qca_ssdk             1118208  2 qca_nss_drv,qca_nss_dp
        inactive time:  410 ms
        inactive time:  3780 ms
        inactive time:  14700 ms
        inactive time:  3650 ms
        inactive time:  3410 ms
        inactive time:  4680 ms
        inactive time:  7610 ms
        inactive time:  2660 ms
        inactive time:  2710 ms
        inactive time:  6570 ms
        inactive time:  7240 ms
        inactive time:  7040 ms
        inactive time:  1640 ms
        inactive time:  1700 ms
        inactive time:  11280 ms
        inactive time:  430 ms
        inactive time:  6450 ms
        inactive time:  610 ms
        inactive time:  410 ms
        inactive time:  120 ms
        inactive time:  490 ms
        inactive time:  0 ms

consider that i edited the patch i posted 3 times now

understood.

if the patch works you dont need to care about delays anymore.

edit : never mind, just a whitespace issue.

ill have it up in 10 mins :smiley:

edit 2 : ok its spooling the build up now... ill disable my delay code and flash, will let u know in 5 mins.

@BrainSlayer

didnt work :frowning: both the inactive time and lack of bitrate popped back up.

no biggy, for now i will re-implement my rc.local based delay.

thank you very much regardless.

all that being said, probably not bad to wait for the cores to be up before moving forward :wink:

root@OPENWRT-UPSTAIRS:~# iw dev phy1-ap1 station dump  | grep 'inact\|bitrate'
        inactive time:  63930 ms
        inactive time:  63980 ms
        inactive time:  70850 ms
        inactive time:  70840 ms
        inactive time:  70000 ms
        inactive time:  69730 ms
        inactive time:  55010 ms
        inactive time:  7640 ms
        inactive time:  69100 ms
        inactive time:  66900 ms
        inactive time:  66930 ms
        inactive time:  67730 ms
        inactive time:  66740 ms
        inactive time:  1540 ms
        inactive time:  65260 ms
        inactive time:  29740 ms
        inactive time:  3640 ms
        inactive time:  63670 ms
        inactive time:  340 ms
        inactive time:  940 ms
        inactive time:  9340 ms

( @qosmio )

for the time being i will consider both the inactive time as well as the lack of bitrate in iw dev station dump fixed.

its a bit of a mickey mouse way to do it, im sure as @BrainSlayer says there is just a race condition somewhere... but... without a doubt, the below works.

i just slimmed down my build even more... like you said, i removed all the qca-nss-* modules i do not use and i disabled everything i had MANUALLY set in the drv options... so its all makefile based.

zero issues.

here is my wifiinit.sh script, if i were to hand this to the public id probably add a 60 second sleep before the reboot so people could ssh in quickly to remove it if need be for any reason... but for my private consumption this works... i get a couple of reboots when i sysupgrade btw... probably because the /etc restore doesnt have time to finish :stuck_out_tongue:

regardless, here is my script:

#!/bin/bash
reboot=false
if [ -e /etc/modules.d/ath11k ]; then
        rm /etc/modules.d/ath11k
        reboot=true
fi
if [ -e /etc/modules.d/ath11k-ahb ]; then
        rm /etc/modules.d/ath11k-ahb
        reboot=true
fi
if $reboot; then
        reboot
        exit
else
        modprobe ath11k nss_offload=1 frame_mode=2
        modprobe ath11k_ahb
        for i in {1..5}; do
                if [ -L /sys/class/ieee80211/phy0 ] && [ -L /sys/class/ieee80211/phy1 ]; then
                        break
                fi
                sleep 1
        done
        wifi up
fi
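
about the 60 second sleep before the reboot mentioned with this script: a rough sketch of how that grace period could look, again in plain sh. GRACE, STOP_FILE, REBOOT_CMD and reboot_after_grace are made-up names for illustration. the idea is that someone who sshs in during the grace period can create the stop file to cancel the reboot.

```sh
#!/bin/sh
# sketch only: all names below are hypothetical, not from the original script

GRACE="${GRACE:-60}"                          # seconds to wait before rebooting
STOP_FILE="${STOP_FILE:-/tmp/wifiinit.stop}"  # create this file to cancel
REBOOT_CMD="${REBOOT_CMD:-reboot}"            # overridable so it can be dry-run

# reboot_after_grace
# waits GRACE seconds, checking once per second for STOP_FILE.
# returns 1 without rebooting if STOP_FILE exists, otherwise runs REBOOT_CMD.
reboot_after_grace() {
    left="$GRACE"
    while [ "$left" -gt 0 ]; do
        [ -e "$STOP_FILE" ] && return 1
        sleep 1
        left=$((left - 1))
    done
    [ -e "$STOP_FILE" ] && return 1
    $REBOOT_CMD
}

# possible use in place of the bare "reboot" call:
#   if $reboot; then
#       reboot_after_grace
#       exit
#   fi
```

to try it without actually rebooting, something like GRACE=3 REBOOT_CMD="echo would reboot" before calling the function should do.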

and here is my dump after many reboots:

root@OPENWRT-UPSTAIRS:~# iw dev phy1-ap1 station dump | grep 'inact' ; iw dev phy1-ap1 station dump | grep 'bitrate' ;
        inactive time:  1210 ms
        inactive time:  11270 ms
        inactive time:  2380 ms
        inactive time:  15730 ms
        inactive time:  230 ms
        inactive time:  12690 ms
        inactive time:  2330 ms
        inactive time:  2070 ms
        inactive time:  3300 ms
        inactive time:  2070 ms
        inactive time:  1680 ms
        inactive time:  3400 ms
        inactive time:  9980 ms
        inactive time:  2500 ms
        inactive time:  2070 ms
        inactive time:  5050 ms
        inactive time:  160 ms
        inactive time:  610 ms
        inactive time:  980 ms
        inactive time:  990 ms
        inactive time:  0 ms
        inactive time:  1140 ms
        tx bitrate:     28.9 MBit/s MCS 3 short GI
        rx bitrate:     39.0 MBit/s MCS 4
        tx bitrate:     6.0 MBit/s
        rx bitrate:     26.0 MBit/s MCS 3
        tx bitrate:     54.0 MBit/s
        rx bitrate:     48.0 MBit/s
        tx bitrate:     54.0 MBit/s
        rx bitrate:     48.0 MBit/s
        tx bitrate:     54.0 MBit/s
        rx bitrate:     48.0 MBit/s
        tx bitrate:     72.2 MBit/s MCS 7 short GI
        rx bitrate:     72.2 MBit/s MCS 7 short GI
        tx bitrate:     72.2 MBit/s MCS 7 short GI
        rx bitrate:     65.0 MBit/s MCS 6 short GI
        tx bitrate:     57.8 MBit/s MCS 5 short GI
        rx bitrate:     39.0 MBit/s MCS 4
        tx bitrate:     6.0 MBit/s
        rx bitrate:     26.0 MBit/s MCS 3
        tx bitrate:     65.0 MBit/s MCS 6 short GI
        rx bitrate:     39.0 MBit/s MCS 4
        tx bitrate:     6.0 MBit/s
        rx bitrate:     26.0 MBit/s MCS 3
        tx bitrate:     6.0 MBit/s
        rx bitrate:     65.0 MBit/s MCS 7
        tx bitrate:     6.0 MBit/s
        rx bitrate:     48.0 MBit/s
        tx bitrate:     72.2 MBit/s MCS 7 short GI
        rx bitrate:     58.5 MBit/s MCS 6
        tx bitrate:     65.0 MBit/s MCS 6 short GI
        rx bitrate:     72.2 MBit/s MCS 7 short GI
        tx bitrate:     58.5 MBit/s MCS 6
        rx bitrate:     52.0 MBit/s MCS 5
        tx bitrate:     72.2 MBit/s MCS 7 short GI
        rx bitrate:     11.0 MBit/s
        tx bitrate:     58.5 MBit/s MCS 6
        rx bitrate:     48.0 MBit/s
        tx bitrate:     72.2 MBit/s MCS 7 short GI
        rx bitrate:     72.2 MBit/s MCS 7 short GI
        tx bitrate:     54.0 MBit/s
        rx bitrate:     48.0 MBit/s
        tx bitrate:     72.2 MBit/s MCS 7 short GI
        rx bitrate:     72.2 MBit/s MCS 7 short GI
        tx bitrate:     28.9 MBit/s MCS 3 short GI
        rx bitrate:     19.5 MBit/s MCS 2

i would guess that the reason some people are seeing this and others are not is simply because those who do not see these issues have a heavier startup. 2 of my aps where i do most of my flashing are dumb aps.... theres really nothing going on here except for a docker container which i manually start later down the boot process... so the boots are fairly quick, and ath11k (and mac80211 and cfg80211 etc etc) all get processed very quickly... theres no wireguard here, no adblock dnsmasq firewalling even... its as raw as it gets.

i'm not giving up that fast. expect a new patch soon

today will be my last evening i can play... well.. a little tomorrow... saturday our real vacation starts.

ill be more than happy to spin a couple builds up if you come up with something :slight_smile:

all that being said, at least we know somewhat where the issue lies.

btw, i just implemented the same delay script on my 2nd dumbap and works 100%.

for the next few hours im off, time to pick the kiddo up.

root@OPENWRT-SALON:~# iw dev phy1-ap1 station dump | grep inact ; iw dev phy1-ap1 station dump | grep bitrate
        inactive time:  14940 ms
        inactive time:  25700 ms
        inactive time:  2210 ms
        inactive time:  2420 ms
        inactive time:  680 ms
        inactive time:  1810 ms
        inactive time:  2590 ms
        inactive time:  910 ms
        inactive time:  22630 ms
        inactive time:  10 ms
        inactive time:  2610 ms
        inactive time:  16230 ms
        inactive time:  19300 ms
        inactive time:  520 ms
        inactive time:  5680 ms
        inactive time:  4570 ms
        inactive time:  6240 ms
        inactive time:  5990 ms
        inactive time:  8570 ms
        inactive time:  1460 ms
        inactive time:  0 ms
        inactive time:  3490 ms
        inactive time:  1140 ms
        inactive time:  530 ms
        tx bitrate:     6.0 MBit/s
        rx bitrate:     48.0 MBit/s
        tx bitrate:     6.0 MBit/s
        rx bitrate:     48.0 MBit/s
        tx bitrate:     65.0 MBit/s MCS 7
        rx bitrate:     52.0 MBit/s MCS 5
        tx bitrate:     54.0 MBit/s
        rx bitrate:     48.0 MBit/s
        tx bitrate:     72.2 MBit/s MCS 7 short GI
        rx bitrate:     52.0 MBit/s MCS 5
        tx bitrate:     43.3 MBit/s MCS 4 short GI
        rx bitrate:     72.2 MBit/s MCS 7 short GI
        tx bitrate:     54.0 MBit/s
        rx bitrate:     48.0 MBit/s
        tx bitrate:     6.0 MBit/s
        rx bitrate:     39.0 MBit/s MCS 4
        tx bitrate:     6.0 MBit/s
        rx bitrate:     48.0 MBit/s
        tx bitrate:     54.0 MBit/s
        rx bitrate:     48.0 MBit/s
        tx bitrate:     6.0 MBit/s
        rx bitrate:     39.0 MBit/s MCS 4
        tx bitrate:     65.0 MBit/s MCS 6 short GI
        rx bitrate:     52.0 MBit/s MCS 5
        tx bitrate:     6.0 MBit/s
        rx bitrate:     48.0 MBit/s
        tx bitrate:     72.2 MBit/s MCS 7 short GI
        rx bitrate:     48.0 MBit/s
        tx bitrate:     6.0 MBit/s
        rx bitrate:     48.0 MBit/s
        tx bitrate:     54.0 MBit/s
        rx bitrate:     48.0 MBit/s
        tx bitrate:     6.0 MBit/s
        rx bitrate:     48.0 MBit/s
        tx bitrate:     6.0 MBit/s
        rx bitrate:     48.0 MBit/s
        tx bitrate:     65.0 MBit/s MCS 7
        rx bitrate:     72.2 MBit/s MCS 7 short GI
        tx bitrate:     6.0 MBit/s
        rx bitrate:     48.0 MBit/s
        tx bitrate:     65.0 MBit/s MCS 6 short GI
        rx bitrate:     72.2 MBit/s MCS 7 short GI
        tx bitrate:     6.0 MBit/s
        rx bitrate:     11.0 MBit/s
        tx bitrate:     65.0 MBit/s MCS 6 short GI
        rx bitrate:     52.0 MBit/s MCS 5
        tx bitrate:     57.8 MBit/s MCS 5 short GI
        rx bitrate:     58.5 MBit/s MCS 6

i saw it all the time because my c code is way faster with loading and everything. so it's easier to trigger

my build(s) must be right on the edge...

because as soon as i would do anything that speeds up module loading it would trigger it.

thats why i was so confused, because:

  1. i remove the qca-* bloat... it happens.
  2. ok so i revert back... now instead, i enable some of the gcc optims people recommend... it happens.
  3. etc etc etc

edit : now heres food for thought... and this is way beyond my depth... but why does this manifest itself in missing station stats?

what i should have tested is how it behaves when a station disconnects and reconnects... eg: is this an issue only for stations which connect within the "first little while"...
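
one way to start poking at that reconnect question: list the stations whose station dump entry has no tx bitrate line, then have a client disconnect and reconnect and run it again. a rough sketch, assuming the usual iw output where each entry starts with a "Station <mac> (on <ifname>)" line followed by indented fields. stations_without_bitrate is a made-up helper name.

```sh
#!/bin/sh
# sketch only: stations_without_bitrate is a hypothetical helper

# reads "iw dev <ifname> station dump" output on stdin and prints the mac
# of every station whose entry contains no "tx bitrate:" line
stations_without_bitrate() {
    awk '
        # print the previous station if it never showed a tx bitrate
        function report() { if (mac != "" && !seen) print mac }
        $1 == "Station" { report(); mac = $2; seen = 0; next }
        /tx bitrate:/   { seen = 1 }
        END             { report() }
    '
}

# possible use:
#   iw dev phy1-ap1 station dump | stations_without_bitrate
```

if a mac drops off this list right after that client reconnects, that would suggest only stations that associated early are affected. that is a guess to test, not a conclusion.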