Ath11k possible firmware bug - traffic interruptions when a client disconnects from WLAN

Can you tell us what your test setup is? I tested once more, extensively, for more than 10 minutes, running iperf3 tests one after another on a laptop and a smartphone while disconnecting two other devices from the WLAN. I couldn't reproduce the issue with WLAN.HK.2.9.0.1-01385-QCAHKSWPL_SILICONZ-1.
Or at least it doesn't happen every time, as it does with both later versions of the firmware.
Other users have reported that they don't see similar behaviour with the latest firmware version.
There are probably other, less obvious factors that contribute to the problem too.

Default radio settings, channel 149 (HE80), psk2. Nothing fancy.
I have around 10 clients connected, mostly Apple devices (Macs and iPhones).

The test: ping the router from a connected Mac client while toggling WiFi off on an iPhone.
Yes, it does not happen every time, but toggle WiFi enough and it will happen.

I have checked logs, nothing interesting. Ran hostapd with debug logging - also nothing of interest.


Also, just to be clear, I don't see such extreme interruptions as you reported. An interruption is usually around 1 second long or less. I see pings running normally (under 10 ms), then on a client disconnect one ping takes anywhere from 300 ms to a timeout.
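
For anyone who wants to pick those spikes out of a long ping log, here is a minimal sketch. It assumes the usual `time=… ms` field in each reply line; the function name `ping_spikes` and the default threshold are made up for illustration.

```shell
# Print ping replies whose round-trip time is at or above a threshold.
# $1: threshold in milliseconds (default 300, the low end reported above).
# Assumes each reply line carries a "time=<number>" field.
ping_spikes() {
    awk -v limit="${1:-300}" '
        match($0, /time=[0-9.]+/) {
            # Skip the 5 characters of "time=" and convert the rest to a number.
            ms = substr($0, RSTART + 5, RLENGTH - 5) + 0
            if (ms >= limit) print
        }
    '
}
```

Something like `ping <router-ip> | tee ping.log` on the client and then `ping_spikes 300 < ping.log` should list only the slow replies. Timeouts have no `time=` field, so they are not matched here.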

Oh, not sure if I mentioned it: it's a wrx36 device.


I am now using 1835 and it has run fine for a week without crashes or apparent interruptions
(DL-WRX36, 23.05), but I only have 2 modern Samsung phones, a Samsung tablet and an LG TV using the wireless (5GHz), so my network is lightly taxed.


I downgraded firmware on my wrx36 to WLAN.HK.2.7.0.1-01744-QCAHKSWPL_SILICONZ-1. It is very very hard to repro the issue with this older firmware.

From a practical perspective this is definitely better than the 2.9.0.1 firmware. Reproducing the issue on 2.9.0.1 is very easy, just by toggling WiFi on/off on an iPhone a couple of times.

2.7.0.1 also fixes broadcast/multicast bug for me.


As a reference to the issues in this thread and a workaround from this post I've tried the following.

I've just returned to the latest (at the moment) ath11k firmware, WLAN.HK.2.9.0.1-01862.
Rebooted the router and tested once again without the multi to unicast option. Confirmed the loss of both ping and iperf3 traffic.

I then turned on the multi to unicast option, only for the 5G radio.
In the wireless config it is option multicast_to_unicast_all '1'
I tested once again with the multi to unicast option turned on.
There is no more loss of either ping or iperf3 traffic when a client disconnects from the 5G WLAN.
There is a long discussion about this issue.
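
For those who prefer the command line over LuCI, a rough sketch of setting that option with uci. The section name `default_radio0` is only an assumption (check `uci show wireless` for the section that belongs to your 5G radio), and the function name and the `UCI` override are made up here so the commands can be dry-run.

```shell
# Enable multicast-to-unicast conversion on one wifi-iface section.
# $1: UCI section name. "default_radio0" is an assumption, not taken from
#     this thread; verify it with `uci show wireless` first.
# Set UCI=echo to print the commands instead of running them.
enable_mc2uc() {
    iface="${1:-default_radio0}"
    ${UCI:-uci} set "wireless.${iface}.multicast_to_unicast_all=1" &&
    ${UCI:-uci} commit wireless
}
```

`UCI=echo enable_mc2uc default_radio0` only prints what would be run. After a real run, `wifi reload` should apply the change.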

I suggested in my earlier post that these issues were connected. That is now confirmed, I think.

@egc, @asvio, @Catfriend1, @Pow maybe you can try the workaround of enabling the multi to unicast option in the wireless settings.

Maybe developers can find the real cause and resolve this.
@robimarko, @Ansuel, @nbd, @kirdes, @hnyman, @quarky


Hey @sppmaster
Since commit 549e710fc I've not had any more disconnection problems or network errors.
I think it's essentially due to the new implementation of the hostapd package but I'm not 100% sure.

I have done some tests enabling and disabling multi_to_unicast and in my case (nbg7815) I have not found any difference in performance and/or ping errors.


Since commit 549e710fc I've not had any more disconnection problems or network errors.

This.

I don't know what fixed this (hostapd upgrade, kernel upgrade or kernel patches), but with the latest snapshot (r23763-46ed38adeb) I'm not seeing this issue anymore. At least it is not as easily reproducible as before.

I am also not seeing the IPv6 connectivity issue that previously required the multicast_to_unicast_all workaround or downgrading the firmware to 2.7.0.1. It is too early to call the multicast bug fixed, as the group rekey interval for CCMP is a day, so I want to keep running this for an extended time to be certain.

One thing I noticed with the latest snapshot is that there are no ath11k errors in dmesg. Previously, running wifi would always produce these errors, something about flushing the ring; I can't recall the exact message.

Of course just as I posted this and ran wifi command a couple of times I got the dreaded error messages:

[ 4205.867960] ath11k c000000.wifi: failed to flush transmit queue, data pkts pending 1
[ 4210.987999] ath11k c000000.wifi: failed to flush transmit queue, data pkts pending 1

After that the traffic interruption bug was easy to repro just like before. Damn.
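
A quick way to check whether a router has hit this state is to count those messages in the kernel log. A minimal sketch; the function name `count_flush_failures` is made up, and the pattern is just the message text quoted above.

```shell
# Count ath11k "failed to flush transmit queue" messages in a kernel log
# read from stdin. Note that grep -c still prints 0 when nothing matches,
# but then exits with status 1.
count_flush_failures() {
    grep -c 'ath11k.*failed to flush transmit queue'
}
```

On the router this would be used as `dmesg | count_flush_failures`. Going by the report above, a non-zero count seems to coincide with the interruptions becoming easy to reproduce.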


I've flashed a freshly compiled build with the latest commits (kernel 6.1.46).
For me the issue is still there and absolutely repeatable when multi to unicast is not checked.
The "workaround" option multicast_to_unicast_all '1' only masks the issue temporarily; at some point it makes no difference at all.
I'll continue to monitor how it changes as time goes by.

Update - reading this post.

I can confirm this behaviour too with today's snapshot, and obviously multi to unicast is not even a workaround for the above issues.

Just an update. I have been running OpenWrt SNAPSHOT, r23763-46ed38adeb on wrx36 with ath11k firmware downgraded to WLAN.HK.2.7.0.1-01744-QCAHKSWPL_SILICONZ-1 for two months (uptime 56 days) without problems.

Dual radio - channel 149 HE80 with sae-mixed encryption, channel 6 HE20 with psk2+ccmp encryption.


Where can I get the firmware version?

I'm now testing OpenWrt SNAPSHOT r24124-518923178c with WLAN.HK.2.7.0.1-01744-QCAHKSWPL_SILICONZ-1, starting today. We'll see :slight_smile:

WLAN.HK.2.7.0.1-01744-QCAHKSWPL_SILICONZ-1

Which directory do I need to put the fw files in?

Edit: For the radio firmware, simply replace the content of /lib/firmware/IPQ8074 with e.g. the content of the 1835 directory.

See https://github.com/egc112/OpenWRT-egc-add-on/tree/main/DL-WRX36 for instructions.
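
As a rough sketch of that replacement step with a backup kept around: the function name `swap_radio_fw` and the `.bak` backup location are made up for illustration, and the linked instructions remain the authoritative source. It assumes the firmware directory contains only plain files and that there is enough free space for the backup copy.

```shell
# Replace the radio firmware files, keeping a copy of the old ones.
# $1: directory with the new firmware files (e.g. an unpacked "1835" directory)
# $2: target directory (default: /lib/firmware/IPQ8074, as named above)
swap_radio_fw() {
    src="$1"
    dst="${2:-/lib/firmware/IPQ8074}"
    bak="${dst}.bak"    # backup location is an assumption, pick your own
    [ -d "$src" ] && [ -d "$dst" ] || { echo "missing directory" >&2; return 1; }
    [ -e "$bak" ] && { echo "backup already exists: $bak" >&2; return 1; }
    cp -a "$dst" "$bak" || return 1
    # Remove the old files first so nothing stale is left behind, then copy.
    rm -f "$dst"/* && cp "$src"/* "$dst"/
}
```

A reboot afterwards should make the radio load the new files. If WiFi does not come up, copying the files back from the backup directory restores the previous firmware.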


Ok, let's go :slight_smile:

Oh thanks... my wifi doesn't come up anymore.

To others: do NOT put 2.7.0.1-01744 into an r24111 build. It won't work.

Oct 14 13:43:00 WifiAP-02-WZAX kernel: [   17.985292] qcom-q6v5-wcss-pil cd00000.q6v5_wcss: fatal error received: 
Oct 14 13:43:00 WifiAP-02-WZAX kernel: [   17.985292] QC Image Version: QC_IMAGE_VERSION_STRING=WLAN.HK.2.7.0.1-01744-QCAHKSWPL_SILICONZ-1
Oct 14 13:43:00 WifiAP-02-WZAX kernel: [   17.985292] Image Variant : IMAGE_VARIANT_STRING=8074.wlanfw.eval_v2Q
Oct 14 13:43:00 WifiAP-02-WZAX kernel: [   17.985292] 
Oct 14 13:43:00 WifiAP-02-WZAX kernel: [   17.985292]     :Excep  :0 Exception detectedparam0 :zero, param1 :zero, param2 :zero.
Oct 14 13:43:00 WifiAP-02-WZAX kernel: [   17.985292] Thread ID      : 0x00000069  Thread name    : WLAN RT0  Process ID     : 0
Oct 14 13:43:00 WifiAP-02-WZAX kernel: [   17.985292] Register:
Oct 14 13:43:00 WifiAP-02-WZAX kernel: [   17.985292] SP : 0x4bfacdc0
Oct 14 13:43:00 WifiAP-02-WZAX kernel: [   17.985292] FP : 0x4bfacdd8
Oct 14 13:43:00 WifiAP-02-WZAX kernel: [   17.985292] PC : 0x4b18d338
Oct 14 13:43:00 WifiAP-02-WZAX kernel: [   17.985292] SSR : 0x00000001
Oct 14 13:43:00 WifiAP-02-WZAX kernel: [   17.985292] BADVA : 0x009c9d7e
Oct 14 13:43:00 WifiAP-02-WZAX kernel: [   17.985292] LR : 0x4b18d2b8
Oct 14 13:43:00 WifiAP-02-WZAX kernel: [   17.985292] 
Oct 14 13:43:00 WifiAP-02-WZAX kernel: [   17.985292] Stack Dump
Oct 14 13:43:00 WifiAP-02-WZAX kernel: [   17.985292] from : 0x4bfacdc0
Oct 14 13:43:00 WifiAP-02-WZAX kernel: [   17.985292] to   : 0x4bfad400
Oct 14 13:43:00 WifiAP-02-WZAX kernel: [   17.985292] 
Oct 14 13:43:00 WifiAP-02-WZAX kernel: [   18.031944] remoteproc remoteproc0: crash detected in cd00000.q6v5_wcss: type fatal error
Oct 14 13:43:00 WifiAP-02-WZAX kernel: [   18.054132] remoteproc remoteproc0: handling crash #1 in cd00000.q6v5_wcss
Oct 14 13:43:00 WifiAP-02-WZAX kernel: [   18.062217] remoteproc remoteproc0: recovering cd00000.q6v5_wcss
Oct 14 13:43:00 WifiAP-02-WZAX kernel: [   18.095005] remoteproc remoteproc0: stopped remote processor cd00000.q6v5_wcss

UPDATE: ath11k c000000.wifi: fw_version 0x290a84a5 fw_build_timestamp 2023-06-21 21:36 fw_build_id WLAN.HK.2.9.0.1-01837-QCAHKSWPL_SILICONZ-1

This works on r24111
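
To see which firmware build the driver actually loaded, the `fw_build_id` field of that log line can be pulled out of dmesg. A small sketch; the function name `fw_build_id` is made up, and it relies only on the line format shown above.

```shell
# Extract the most recent ath11k firmware build id from a kernel log on stdin.
# Relies on the "fw_build_id <id>" field in the driver's log line.
fw_build_id() {
    sed -n 's/.*fw_build_id \([^ ]*\).*/\1/p' | tail -n 1
}
```

Usage on the router would be `dmesg | fw_build_id`.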

That is the one I am using.

Not saying it is the best but works for me :slight_smile:


Nope. With WLAN.HK.2.7 and OpenWrt SNAPSHOT r24124-518923178c, when I toggle my iPhone WiFi on/off a few times I get this:

Request timeout for icmp_seq 25
Request timeout for icmp_seq 26
Request timeout for icmp_seq 27
Request timeout for icmp_seq 28
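
To compare firmware versions on more than a gut feeling, the timeouts in a saved ping log can be summarized. A minimal sketch, assuming ping output in the format shown above (`Request timeout for icmp_seq N` for losses, `... bytes from ...` for replies); the function name `ping_timeout_summary` is made up.

```shell
# Summarize timeouts in ping output read from stdin.
# Prints the total number of timed-out probes and the longest
# run of consecutive timeouts.
ping_timeout_summary() {
    awk '
        /^Request timeout for icmp_seq/ {
            total++; run++
            if (run > longest) longest = run
            next
        }
        /bytes from/ { run = 0 }    # a reply ends the current run
        END { printf "timeouts=%d longest_run=%d\n", total, longest }
    '
}
```

Run `ping <router-ip> | tee ping.log` on the client during the toggle test, then `ping_timeout_summary < ping.log`. With one probe per second, the longest run gives a rough length of the interruption in seconds.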

Well, I think I solved the broadcast issue I had with my airprinter, by not enabling ieee80211k ("Enables Radio Resource Measurement (802.11k) support").
I dunno why enabling ieee80211k blocks my airprinter from being discovered. I have to do some more testing on this issue. :face_with_monocle:

Edit: Nope... ieee80211k had nothing to do with it.
