I'm not sure how peers correlate with stations, but I remember seeing that ipq8074 has num_peers = 512 and qcn9074 has it set to 128. This is defined in core.c; the number of vdevs is also defined there.

Unrelated: I just saw a patch that adds dynamic VLAN support to ath11k (it is still being worked on).

:see_no_evil:

Of course, ath10k vs ath11k, sorry for the noise!

kirdes found that sysupgrade was aborting and rebooting before the flash rewrite due to hostapd not stopping. As a workaround, it's a simple matter to force-teardown WiFi before running sysupgrade if using the command line: "wifi down; killall -9 wpad; killall -9 hostapd; sysupgrade blahblah.bin".

Chaining commands like this works even if connected over WiFi (client or WDS), whereas doing similar via the GUI would require a proper fix in sysupgrade or maybe some workaround baked into the AX3600 /lib/upgrade/platform.sh script.
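For anyone who wants to script this, here is a minimal sketch of the same command chain as a wrapper function. The names run, safe_sysupgrade and DRY_RUN are made up for this example. It defaults to only printing the commands, so nothing gets flashed by accident.

```shell
#!/bin/sh
# Sketch only: tear down WiFi, then hand over to sysupgrade.
# run, safe_sysupgrade and DRY_RUN are names invented for this example.

DRY_RUN="${DRY_RUN:-1}"   # default: print the commands instead of running them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

safe_sysupgrade() {
    image="$1"
    [ -n "$image" ] || { echo "usage: safe_sysupgrade <image>" >&2; return 1; }

    run wifi down
    # killall returns non-zero when no process matched; that is fine here
    run killall -9 wpad || true
    run killall -9 hostapd || true
    run sysupgrade "$image"
}
```

With DRY_RUN left at 1, safe_sysupgrade /tmp/firmware.bin just lists the four commands in order. Set DRY_RUN=0 before calling it to actually run them.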

Steps #14 and #15 in the standard sysupgrade process sound like they attempt a similar teardown, but the list of potential deficiencies points to the "[OpenWrt-Devel] Sysupgrade and Failed to kill all processes" thread. So this is not an AX3600-specific problem, but it affects the AX3600 more.

I haven't looked into this further, but can confirm a 100% success rate killing wifi before sysupgrade and ~30% without.

To me it still makes no sense why killing it fails only on AX3600

I agree!

Perhaps the increased CPU speed/core count is making things run faster and exposing a sensitivity. That OpenWrt-Devel post says the kill loop runs 10 times "as fast as the shell can loop" (i.e. without throttling). If any of the WiFi processes has a wall clock delay in shutting down gracefully, then the AX3600 could be hitting an inherent sysupgrade problem more often than single core 600MHz class MIPS devices did.
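To make the throttling idea concrete, here is a rough sketch of a bounded wait that pauses between polls instead of spinning. This is not the actual sysupgrade code; wait_until_gone and its arguments are hypothetical, and the probe command is whatever you use to check that the process is still alive.

```shell
#!/bin/sh
# Sketch only: poll until a process is gone, pausing between attempts.
# wait_until_gone and its arguments are hypothetical.

# $1 = command that succeeds while the process is still alive
# $2 = maximum number of polls
# $3 = seconds to sleep between polls (default 1)
wait_until_gone() {
    probe="$1"
    tries="$2"
    delay="${3:-1}"

    while [ "$tries" -gt 0 ]; do
        # eval lets the caller pass a probe with arguments, e.g. "pidof hostapd"
        if ! eval "$probe" >/dev/null 2>&1; then
            return 0    # probe failed, so the process has exited
        fi
        tries=$((tries - 1))
        sleep "$delay"
    done
    return 1            # still running after all polls
}
```

Something like wait_until_gone "pidof hostapd" 10 1 would give hostapd up to roughly ten seconds of wall clock time to go away, no matter how fast the CPU loops.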

I have a 100% success rate issuing just "service wpad stop" before the sysupgrade (instead of those brutish kill -9 xxx).

If that were the case, then the AX9000 and others using the much faster IPQ8072A would always trigger the issue, but they never do.

Maybe we can try this. What can we add here to test it, and where? I need a lot of tries to do the sysupgrade, usually between 3 and 9, so I can test on my two AX3600s and confirm whether it works for sure. I have not tried "service wpad stop" before.

If it works, at least we have an ugly workaround that can be tested and verified until we find the real solution.

To me, this seems the best place:

But I don't know about implications.

Perhaps a use case or setup dependency? It seems there are enough people who see no sysupgrade problems on AX3600 also. For me: all three radios active, WDS links, the commanding client (GUI or ssh) being connected via WiFi. Perhaps wifi teardown is slower when these are true. Makes no obvious sense - as you say.

vit0r: I'll change to "service wpad stop" and see how things go.

Same setup here. Isn't wlan0 (ath10k) particular to the AX3600?

If it works, it may be worth branching it out into a "xiaomi,ax3600)" case, as the other models are not reporting this issue.
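A board-specific branch could look roughly like the sketch below. board_needs_wifi_stop is a hypothetical helper, not something that exists in OpenWrt, and whether stopping WiFi at that point actually helps is a separate question.

```shell
#!/bin/sh
# Sketch only: decide per board whether WiFi should be stopped before flashing.
# board_needs_wifi_stop is a hypothetical helper, not part of OpenWrt.

# $1 = board name string, e.g. "xiaomi,ax3600"
board_needs_wifi_stop() {
    case "$1" in
    xiaomi,ax3600)
        return 0 ;;   # the only board reporting the problem so far
    *)
        return 1 ;;
    esac
}
```

The caller would pass in the board name and only run the WiFi stop when the function returns success, so the other boards are left untouched.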

I'm talking about a local test. If it works, robi can turn it into good, real code :wink:

EDIT: first test failed. I modified the local /lib/upgrade/platform.sh, adding "service wpad stop" just before the nand_do_upgrade "$1" call, and it didn't work. After the restart I'm still on the same version.

I started the same thought process as you, but didn't have time to go further. Lazily applying the runtime workaround since. Sorry.

It's an ugly hack to stop WiFi specifically in a platform-specific helper, and it has to be placed at the appropriate phase: not too early, so that WiFi is not killed if (e.g.) input image verification or another early-stage item fails, and not too late, so that sysupgrade doesn't abort before your hack is called.

If you have the time to experiment, maybe adding a 'sleep 1' throttle in the /lib/upgrade/stage2 kill loop or finding an appropriate/safe place to add an explicit wifi/wpad/hostapd shutdown in sysupgrade before the kill loops would be the only actionable fix.

EDIT: in /lib/upgrade/stage2, maybe add "service wpad stop" at line 155 and increase the sleep 4 on line 158. A 'sleep 1' in the kill_remaining() loop would be painful.

I'd guess that platform_do_upgrade isn't actually being called during a failed update: that's step #19 in the sysupgrade Wiki document, but it's probably stopping after the step #14 or #15 stop/kills.

Tested on my two AX3600s using "service wpad stop; sysupgrade -k xxxxxxxxxx" and it worked perfectly on the first try. Until now, each firmware update was a nightmare involving, sometimes, 7 or 8 tries.

@robimarko would it be possible to bring your repo up to date with openwrt? I want to test the compile/build with GCC 12.1. Thank you.

For those who have the same problem, Software flow offloading was not enabled.
After enabling it, I now have full RX/TX speed :+1:

Maybe we can add some echo messages to a file, in the different shell scripts of the sysupgrade process, to see where it stops? Or is the filesystem not available for writing at that point?
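A minimal sketch of such a marker helper, assuming a writable location is still available at that stage. TRACE_FILE and trace_step are hypothetical names, and the default path is only a placeholder.

```shell
#!/bin/sh
# Sketch only: append timestamped markers to a file to see how far a script got.
# TRACE_FILE and trace_step are hypothetical names; the default path is a placeholder.

TRACE_FILE="${TRACE_FILE:-/tmp/sysupgrade-trace.log}"

trace_step() {
    # date +%s prints seconds since the epoch; $$ is the PID of the calling shell
    printf '%s [%s] %s\n' "$(date +%s)" "$$" "$*" >> "$TRACE_FILE"
}
```

Calls like trace_step "before kill loop" would be sprinkled through the scripts. Keep in mind that /tmp is RAM-backed on OpenWrt, so the file does not survive the reboot that follows a failed attempt. TRACE_FILE would need to point somewhere persistent, if such a location is still writable at that point.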

I doubt it's gonna happen today, maybe tomorrow.

I've been using GCC 12 for at least 3 weeks with no problems, though I didn't test with all the packages.

@robimarko I decided to ignore all the complexity of a good solution (impossible without changing the firmware to make it provide the required data in the msdu and the other 2 parts)

and went with a simple implementation...

root@No-Lag-Router:~# ubus call hostapd.wlan1 wnm_disassoc_imminent '{"addr":"28:C2:1F:xx:xx:xx"}'
root@No-Lag-Router:~# ubus call hostapd.wlan2 wnm_disassoc_imminent '{"addr":"28:C2:1F:xx:xx:xx"}'
root@No-Lag-Router:~# ubus call hostapd.wlan1 wnm_disassoc_imminent '{"addr":"28:C2:1F:xx:xx:xx"}'
root@No-Lag-Router:~# ubus call hostapd.wlan2 wnm_disassoc_imminent '{"addr":"28:C2:1F:xx:xx:xx"}'
root@No-Lag-Router:~# ubus call hostapd.wlan1 wnm_disassoc_imminent '{"addr":"28:C2:1F:xx:xx:xx"}'

This is the output

Fri Jun  3 15:30:18 2022 daemon.info hostapd: wlan2: STA 28:c2:1f:xx:xx:xx IEEE 802.11: associated (aid 2)
Fri Jun  3 15:30:18 2022 daemon.notice hostapd: wlan1: Prune association for 28:c2:1f:xx:xx:xx
Fri Jun  3 15:30:18 2022 daemon.notice hostapd: wlan1: AP-STA-DISCONNECTED 28:c2:1f:xx:xx:xx
Fri Jun  3 15:30:18 2022 daemon.notice hostapd: wlan2: AP-STA-CONNECTED 28:c2:1f:xx:xx:xx
Fri Jun  3 15:30:18 2022 daemon.info hostapd: wlan2: STA 28:c2:1f:xx:xx:xx RADIUS: starting accounting session 47C93DE71F4CD488
Fri Jun  3 15:30:18 2022 daemon.info hostapd: wlan2: STA 28:c2:1f:xx:xx:xx WPA: pairwise key handshake completed (RSN)
Fri Jun  3 15:30:18 2022 daemon.notice hostapd: wlan2: EAPOL-4WAY-HS-COMPLETED 28:c2:1f:xx:xx:xx
Fri Jun  3 15:30:30 2022 daemon.info hostapd: wlan1: STA 28:c2:1f:xx:xx:xx IEEE 802.11: authenticated
Fri Jun  3 15:30:30 2022 daemon.info hostapd: wlan1: STA 28:c2:1f:xx:xx:xx IEEE 802.11: associated (aid 2)
Fri Jun  3 15:30:30 2022 daemon.notice hostapd: wlan2: Prune association for 28:c2:1f:xx:xx:xx
Fri Jun  3 15:30:30 2022 daemon.notice hostapd: wlan2: AP-STA-DISCONNECTED 28:c2:1f:xx:xx:xx
Fri Jun  3 15:30:30 2022 daemon.notice hostapd: wlan1: AP-STA-CONNECTED 28:c2:1f:xx:xx:xx
Fri Jun  3 15:30:30 2022 daemon.info hostapd: wlan1: STA 28:c2:1f:xx:xx:xx RADIUS: starting accounting session 7732A99DA1FBB8C0
Fri Jun  3 15:30:30 2022 daemon.info hostapd: wlan1: STA 28:c2:1f:xx:xx:xx WPA: pairwise key handshake completed (RSN)
Fri Jun  3 15:30:30 2022 daemon.notice hostapd: wlan1: EAPOL-4WAY-HS-COMPLETED 28:c2:1f:xx:xx:xx
Fri Jun  3 15:30:37 2022 daemon.info hostapd: wlan2: STA 28:c2:1f:xx:xx:xx IEEE 802.11: associated (aid 2)
Fri Jun  3 15:30:37 2022 daemon.notice hostapd: wlan1: Prune association for 28:c2:1f:xx:xx:xx
Fri Jun  3 15:30:37 2022 daemon.notice hostapd: wlan1: AP-STA-DISCONNECTED 28:c2:1f:xx:xx:xx
Fri Jun  3 15:30:37 2022 daemon.notice hostapd: wlan2: AP-STA-CONNECTED 28:c2:1f:xx:xx:xx
Fri Jun  3 15:30:37 2022 daemon.info hostapd: wlan2: STA 28:c2:1f:xx:xx:xx RADIUS: starting accounting session 47C93DE71F4CD488
Fri Jun  3 15:30:37 2022 daemon.info hostapd: wlan2: STA 28:c2:1f:xx:xx:xx WPA: pairwise key handshake completed (RSN)
Fri Jun  3 15:30:37 2022 daemon.notice hostapd: wlan2: EAPOL-4WAY-HS-COMPLETED 28:c2:1f:xx:xx:xx
Fri Jun  3 15:30:42 2022 daemon.info hostapd: wlan1: STA 28:c2:1f:xx:xx:xx IEEE 802.11: authenticated
Fri Jun  3 15:30:42 2022 daemon.info hostapd: wlan1: STA 28:c2:1f:xx:xx:xx IEEE 802.11: associated (aid 2)
Fri Jun  3 15:30:42 2022 daemon.notice hostapd: wlan2: Prune association for 28:c2:1f:xx:xx:xx
Fri Jun  3 15:30:42 2022 daemon.notice hostapd: wlan2: AP-STA-DISCONNECTED 28:c2:1f:xx:xx:xx
Fri Jun  3 15:30:42 2022 daemon.notice hostapd: wlan1: AP-STA-CONNECTED 28:c2:1f:xx:xx:xx
Fri Jun  3 15:30:42 2022 daemon.info hostapd: wlan1: STA 28:c2:1f:xx:xx:xx RADIUS: starting accounting session 7732A99DA1FBB8C0
Fri Jun  3 15:30:42 2022 daemon.info hostapd: wlan1: STA 28:c2:1f:xx:xx:xx WPA: pairwise key handshake completed (RSN)
Fri Jun  3 15:30:42 2022 daemon.notice hostapd: wlan1: EAPOL-4WAY-HS-COMPLETED 28:c2:1f:xx:xx:xx
Fri Jun  3 15:30:45 2022 daemon.notice hostapd: wlan1: AP-STA-DISCONNECTED 28:c2:1f:xx:xx:xx
Fri Jun  3 15:30:45 2022 daemon.info hostapd: wlan1: STA 28:c2:1f:xx:xx:xx IEEE 802.11: disassociated due to inactivity
Fri Jun  3 15:30:45 2022 daemon.info hostapd: wlan2: STA 28:c2:1f:xx:xx:xx IEEE 802.11: associated (aid 2)
Fri Jun  3 15:30:45 2022 daemon.notice hostapd: wlan1: Prune association for 28:c2:1f:xx:xx:xx
Fri Jun  3 15:30:45 2022 daemon.notice hostapd: wlan2: AP-STA-CONNECTED 28:c2:1f:xx:xx:xx
Fri Jun  3 15:30:45 2022 daemon.info hostapd: wlan2: STA 28:c2:1f:xx:xx:xx RADIUS: starting accounting session 47C93DE71F4CD488
Fri Jun  3 15:30:45 2022 daemon.info hostapd: wlan2: STA 28:c2:1f:xx:xx:xx WPA: pairwise key handshake completed (RSN)
Fri Jun  3 15:30:45 2022 daemon.notice hostapd: wlan2: EAPOL-4WAY-HS-COMPLETED 28:c2:1f:xx:xx:xx
Fri Jun  3 15:31:15 2022 daemon.info hostapd: wlan1: STA 28:c2:1f:xx:xx:xx IEEE 802.11: deauthenticated due to inactivity (timer DEAUTH/REMOVE)

I think the problem is fixed... now I just need to spam the code with comments and produce a good RFC patch (and post it here if someone wants to test).

(the phone correctly transitions to the other band with the ubus call, just checked with the LuCI web UI)
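If you want to check the band hopping from the log instead of the web UI, a small filter like this sketch can list where and when a station connected. list_connects is a made-up name, and the awk field positions assume exactly the log line layout shown above.

```shell
#!/bin/sh
# Sketch only: list which interface a station connected to, and when,
# from hostapd log lines in the layout shown above.
# list_connects is a made-up name; field positions are tied to that layout.

# Reads log lines on stdin; $1 = station MAC (lower case, as logged)
list_connects() {
    awk -v mac="$1" '
        $9 == "AP-STA-CONNECTED" && $10 == mac {
            iface = $8
            sub(/:$/, "", iface)   # "wlan2:" -> "wlan2"
            print $4, iface
        }'
}
```

Feeding it the log above (for example via logread | list_connects 28:c2:1f:xx:xx:xx) gives five lines alternating wlan2, wlan1, wlan2, wlan1, wlan2, one per ubus call.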
