Ath11k possible firmware bug - traffic interruptions when a client disconnects from WLAN

I use a QNAP QHora-301W (Qualcomm, ath11k) as my main router.
It currently runs my own build with NSS offloading support, compiled from @bitthief's repo.

What I see is this:

  1. I run an iperf3 server on a PC (Windows 11) wired to the router's LAN1 port.
  2. I run an iperf3 client on a laptop (Windows 11) with a MediaTek WiFi 6 adapter, connected to the 5 GHz WLAN (80 MHz).
  3. I have a smartphone (Android 13) with a Qualcomm WiFi 6 chipset connected to the same 5 GHz WLAN.
  4. During the iperf3 session, if I turn off WiFi on the phone (by tapping the WiFi icon), traffic on the laptop is interrupted for several seconds; iperf3 reports zeroes for those intervals.

This is reproducible for me every time I disconnect the client from the WiFi.
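
To make the stalls easy to spot in a long run, the iperf3 client output can be filtered for intervals that report zero throughput. This is only a minimal sketch, assuming iperf3's default human-readable output and a POSIX shell with awk (for example a Linux client; a Windows client would need an equivalent). The function name zero_intervals is made up for illustration.

```shell
# Print every interval for which iperf3 reported 0.00 bits/sec.
# Reads iperf3's default text output on stdin; the name is hypothetical.
zero_intervals() {
    awk '
        # The leading space avoids matching rates such as "10.00 bits/sec";
        # summary lines (sender/receiver) are skipped.
        / 0\.00 bits\/sec/ && !/sender|receiver/ {
            if (match($0, /[0-9.]+-[0-9.]+/))
                print "stalled: " substr($0, RSTART, RLENGTH) " sec"
        }
    '
}
```

Usage would be something like `iperf3 -c <server-ip> -t 60 | zero_intervals`, with the server address filled in for your own network.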

Edit: An even simpler test is to ping the router IP from any wireless client while disconnecting another one, and watch for ping delay or loss.
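
For the ping variant, missing replies can be picked out automatically from the sequence numbers. A rough sketch, assuming a client whose ping prints sequence numbers (iputils prints `icmp_seq=N`, BusyBox prints `seq=N`; Windows ping does not print them at all). The function name find_ping_gaps is hypothetical.

```shell
# Report gaps in the sequence numbers of ping replies read from stdin.
# Matches both iputils ("icmp_seq=7") and BusyBox ("seq=7") output.
find_ping_gaps() {
    awk '
        match($0, /seq=[0-9]+/) {
            # Skip the 4 characters of "seq=" and keep the digits.
            seq = substr($0, RSTART + 4, RLENGTH - 4) + 0
            if (seen && seq > last + 1)
                printf "lost %d replies after seq=%d\n", seq - last - 1, last
            last = seq
            seen = 1
        }
    '
}
```

Run it as `ping <router-ip> | find_ping_gaps` on one wireless client, then disconnect another client and see whether a gap is reported.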

What firmware version are you using? You can check with:

dmesg | grep WLAN.HK

It was just updated in snapshot to WLAN.HK.2.9.0.1-01862.

What OpenWrt version is your build based on?

I'm currently on that one, WLAN.HK.2.9.0.1-01862.
I've also tried the older ath11k firmware WLAN.HK.2.9.0.1-01837; the behavior is exactly the same, except that I get better iperf3 speed test results with WLAN.HK.2.9.0.1-01837.

root@QNAP:~# uname -a
Linux QNAP 6.1.42 #0 SMP PREEMPT Fri Jul 28 17:07:17 2023 aarch64 GNU/Linux

My goal here is to get more users to try the same scenario, so we can quickly see whether this is a widespread bug or something specific to my setup.

I remember that @quarky saw a bug in the past with ath10k during client disassociation/disconnection from the WLAN.

Maybe @Ansuel, other developers, and more experienced users can comment here.

The setup to check this with iperf3 is really simple.
Anyone who can try it, please test and tell us if it's OK or not.

My DL-WRX36 runs my own compiled 23.05:

{
        "kernel": "5.15.120",
        "hostname": "DL-WRX36",
        "system": "ARMv8 Processor rev 4",
        "model": "Dynalink DL-WRX36",
        "board_name": "dynalink,dl-wrx36",
        "rootfs_type": "squashfs",
        "release": {
                "distribution": "OpenWrt",
                "version": "23.05-SNAPSHOT",
                "revision": "r23313-017827e205",
                "target": "ipq807x/generic",
                "description": "OpenWrt 23.05-SNAPSHOT r23313-017827e205"
        }
}
root@DL-WRX36:~# dmesg | grep WLAN.HK
[   13.354655] ath11k c000000.wifi: fw_version 0x290a84a5 fw_build_timestamp 2023-06-21 21:36 fw_build_id WLAN.HK.2.9.0.1-01837-QCAHKSWPL_SILICONZ-1

Wireless testing on channel 36, 80 MHz:

Wired desktop PC (Windows 10) running iperf3 -s
Laptop (Windows 10) with Intel AX200
Samsung S20 FE

When I run iperf3 on the laptop and disable WiFi on the phone, nothing happens; iperf3 keeps running at the same speed.

When I run iperf3 on the phone (Magic iPerf) and disable WiFi on the laptop, iperf3 drops to 0 for a couple of seconds!

Hi @sppmaster,
I am using an NBG7815, also with NSS offloading firmware based on the bitthief repo. However, I am using firmware WLAN.HK.2.9.0.1-01385, since both WLAN.HK.2.9.0.1-01837 and WLAN.HK.2.9.0.1-01862 give me disconnection problems with my AX210 and AX200 clients.

I do not have the problem you describe with version 01385, and it is also the one that seems to give me the best performance. In any case, the NBG7815 is very capricious with the ath11k firmware, especially when the blockd package is included (the blockd problem also exists in 23.05-rc2 and the latest main/master snapshot).

On the other hand, in my case, connecting a WiFi 5 client to the same network makes the WiFi 6 clients much more stable and avoids their disconnections (don't ask why, but the firmware's OFDMA implementation probably has something to do with it).

I am testing WLAN.HK.2.9.0.1-01862 with a Mi 11 Lite 5G NE as client and I don't see any disconnects. At 160 MHz with 4 iperf3 threads I can reach a stable 1.02 Gbit/s DL on my AX6.

Please read the first several posts carefully. This is not about spontaneous disconnects of any WLAN client.
The data traffic is interrupted when another WLAN client (not the one running the iperf3 test) disconnects from the network.
During your iperf3 test, manually disconnect one of your other phones from the network and check whether the traffic is interrupted at that moment.
Tomorrow I will try more scenarios.

Switching a client from SSID A to B crashes my ath11k driver on the AX3600. See: Xiaomi AX3600: ath11k firmware crash - qcom-q6v5-wcss-pil cd00000.q6v5_wcss: fatal error received: (post #14 by Catfriend1)

@sppmaster I have not repeated your tests, but I did have issues when connecting mediatek mt7621e Linux WiFi 6 clients to my Dynalink WRX36 AP. The clients connect fine and have a working WiFi connection, but they no longer respond to pings from other devices. I don't know whether it is an ath11k or an mt7621 issue, but they don't seem to work nicely together. In the end it was annoying enough that I switched back to my Meraki MR42 AP (WiFi 5) for now, and everything works again.

I repeated the same test while the other phone (Mi8 Lite, on the same radio in VHT mode) connected and disconnected several times. I even switched iperf3 to a single thread so that multiple threads would not mask a short interruption. There was no drop in the DL traffic.

An even simpler test is to just ping the router IP from any wireless client while disconnecting another one from the WiFi, and watch for ping loss.

There is no need to run iperf3 at all or to complicate things further.

More people have confirmed this behavior: loss of connection (no data traffic) for a few seconds.
@robimarko, @Ansuel, @quarky.

Not reproducible on my QNAP 301w with an official build.

Maybe it's related to your unsupported NSS build.

Please check again with an official build.

Whatever it is, I've run some more tests. It is repeatable: no matter which client disconnects from the WLAN, the traffic is disrupted. Both ping and iperf3 show this. I also see lower speed with the latest firmware, but that could just be measurement error.
Maybe it depends on the clients used; as the other users' feedback shows, the results differ slightly.

I've just returned to firmware WLAN.HK.2.9.0.1-01385-QCAHKSWPL_SILICONZ-1.
I've copied it to /lib/firmware/IPQ8074 and rebooted.
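
For anyone repeating the swap, it may be worth keeping a copy of the files being replaced. The following is only a sketch of the copy step under stated assumptions: the function name and the directories passed to it in the usage line are hypothetical, and only /lib/firmware/IPQ8074 comes from this thread.

```shell
# Copy a firmware set into place, keeping a copy of the files it replaces.
# All three arguments are directories; the function name is hypothetical.
swap_firmware() {
    src=$1 dst=$2 backup=$3
    if [ ! -d "$src" ] || [ ! -d "$dst" ]; then
        echo "swap_firmware: source or destination directory missing" >&2
        return 1
    fi
    mkdir -p "$backup" || return 1
    # Save the currently installed files first so the swap can be undone.
    cp -a "$dst"/. "$backup"/ || return 1
    cp -a "$src"/. "$dst"/
}
```

For example: `swap_firmware /tmp/fw-01385 /lib/firmware/IPQ8074 /root/fw-backup && reboot`, where the first and last paths are placeholders. Files that exist only in the old set are left in place, so check the directory contents if the two sets differ.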

With it I see nothing similar. I disconnected and reconnected numerous times with no issues.
For now I don't see a need to flash the official build, but I may try it later.
Awaiting feedback from more users.
This probably deserves a report on the ath11k bug report page.

I can compile with 1385 again, but where did you obtain that firmware? It would save me compiling :)

Never mind, I can extract it from an older build or download it :)

Please, anyone who participates in the testing, report your OpenWrt build: NSS or official.
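
To keep the reports comparable, the firmware build id can be pulled out of the kernel log in one go. A small sketch, assuming the ath11k log line format shown earlier in this thread (`... fw_build_id WLAN.HK...`); the function name fw_build_id is made up here.

```shell
# Print the firmware build id from the last "fw_build_id" line on stdin.
fw_build_id() {
    sed -n 's/.*fw_build_id \([^ ]*\).*/\1/p' | tail -n 1
}
```

On the router this would be `dmesg | fw_build_id`, posted together with the output of `ubus call system board`.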

@egc Do you use a build with NSS offloading?

No, not an offloading build; it is a 23.05 snapshot that I compiled myself:

root@DL-WRX36:~# ubus call system board
{
        "kernel": "5.15.120",
        "hostname": "DL-WRX36",
        "system": "ARMv8 Processor rev 4",
        "model": "Dynalink DL-WRX36",
        "board_name": "dynalink,dl-wrx36",
        "rootfs_type": "squashfs",
        "release": {
                "distribution": "OpenWrt",
                "version": "23.05-SNAPSHOT",
                "revision": "r23313-017827e205",
                "target": "ipq807x/generic",
                "description": "OpenWrt 23.05-SNAPSHOT r23313-017827e205"
        }
}

I went back to 1385 and there was no interruption of iperf3 when a client disconnected:

root@DL-WRX36:~# dmesg | grep WLAN
[   13.633443] ath11k c000000.wifi: fw_version 0x290c84a5 fw_build_timestamp 2023-03-25 07:34 fw_build_id WLAN.HK.2.9.0.1-01385-QCAHKSWPL_SILICONZ-1

@kirdes Any comments on the above post? Any other suggestions on what else to check to debug this?

After the recent update of the hostapd package I gave ath11k 01862 another chance, and I have verified that I no longer have the disconnection problems I suffered before.

Now I see a different problem in the hostapd package that prevents radio0 from initializing at router boot and forces me to restart radio0 to get it working. The following message appears in the system log:

Wed Aug  2 10:01:31 2023 daemon.notice netifd: radio0 (3075): Command failed: ubus call hostapd config_set { "phy": "phy0", "config":"/var/run/hostapd-phy0.conf", "prev_config": "/var/run/hostapd-phy0.conf.prev"} (Invalid argument)
Wed Aug  2 10:01:31 2023 daemon.notice netifd: radio0 (3075): Usage: ubus [<options>] <command> [arguments...]
Wed Aug  2 10:01:31 2023 daemon.notice netifd: radio0 (3075): Options:
Wed Aug  2 10:01:31 2023 daemon.notice netifd: radio0 (3075):  -s <socket>:		Set the unix domain socket to connect to
Wed Aug  2 10:01:31 2023 daemon.notice netifd: radio0 (3075):  -t <timeout>:		Set the timeout (in seconds) for a command to complete
Wed Aug  2 10:01:31 2023 daemon.notice netifd: radio0 (3075):  -S:			Use simplified output (for scripts)
Wed Aug  2 10:01:31 2023 daemon.notice netifd: radio0 (3075):  -v:			More verbose output
Wed Aug  2 10:01:31 2023 daemon.notice netifd: radio0 (3075):  -m <type>:		(for monitor): include a specific message type
Wed Aug  2 10:01:31 2023 daemon.notice netifd: radio0 (3075): 			(can be used more than once)
Wed Aug  2 10:01:31 2023 daemon.notice netifd: radio0 (3075):  -M <r|t>		(for monitor): only capture received or transmitted traffic
Wed Aug  2 10:01:31 2023 daemon.notice netifd: radio0 (3075):
Wed Aug  2 10:01:31 2023 daemon.notice netifd: radio0 (3075): Commands:
Wed Aug  2 10:01:31 2023 daemon.notice netifd: radio0 (3075):  - list [<path>]			List objects
Wed Aug  2 10:01:31 2023 daemon.notice netifd: radio0 (3075):  - call <path> <method> [<message>]	Call an object method
Wed Aug  2 10:01:31 2023 daemon.notice netifd: radio0 (3075):  - subscribe <path> [<path>...]	Subscribe to object(s) notifications
Wed Aug  2 10:01:31 2023 daemon.notice netifd: radio0 (3075):  - listen [<path>...]			Listen for events
Wed Aug  2 10:01:31 2023 daemon.notice netifd: radio0 (3075):  - send <type> [<message>]		Send an event
Wed Aug  2 10:01:31 2023 daemon.notice netifd: radio0 (3075):  - wait_for <object> [<object>...]	Wait for multiple objects to appear on ubus
Wed Aug  2 10:01:31 2023 daemon.notice netifd: radio0 (3075):  - monitor				Monitor ubus traffic
Wed Aug  2 10:01:31 2023 daemon.notice netifd: radio0 (3075):
Wed Aug  2 10:01:31 2023 daemon.notice netifd: radio0 (3075): Device setup failed: HOSTAPD_START_FAILED
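
To check after a boot whether this failure happened, the system log can be searched for the final line of that message. A minimal sketch, assuming the netifd log format shown above; the function name failed_radios is hypothetical.

```shell
# List the radios for which netifd logged HOSTAPD_START_FAILED.
# Reads syslog text on stdin and prints each affected radio once.
failed_radios() {
    sed -n 's/.*netifd: \(radio[0-9][0-9]*\) .*Device setup failed: HOSTAPD_START_FAILED.*/\1/p' | sort -u
}
```

On the router, `logread | failed_radios` would print `radio0` in the case above; the radio can then be restarted by hand, for example with `wifi up radio0` if your build's wifi script accepts a device name.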

EDIT: I tried both builds, official and NSS offloading.

My system. I compiled it myself.

root@NBG7815:/# ubus call system board
{
        "kernel": "6.1.42",
        "hostname": "NBG7815",
        "system": "ARMv8 Processor rev 4",
        "model": "Zyxel NBG7815",
        "board_name": "zyxel,nbg7815",
        "rootfs_type": "squashfs",
        "release": {
                "distribution": "OpenWrt",
                "version": "SNAPSHOT",
                "revision": "r23669+48-4a4e0c636f",
                "target": "qualcommax/ipq807x",
                "description": "OpenWrt SNAPSHOT r23669+48-4a4e0c636f"
        }
}

Have you checked this commit that fixes this issue? I don't think it is related to the issues found with ath11k, though.