Wireless instability on ath10k radios after upgrade to 21.02.1

I'm still seeing odd behavior. DFS only means dynamic frequency selection, not "drop frequency, full stop" ;-) I am also using a channel where DFS is mandatory. I have no problems with my LineageOS devices, and OpenWrt never changes the channel, but as soon as I start some old stock Android devices, some "sort of DFS" (or a bug / configuration issue) kicks in and the wireless interface gets disabled (possibly only temporarily).

Netgear R7800, 21.02.1. Guest network, "Enable key reinstallation (KRACK) countermeasures" enabled, "Disassociate On Low Acknowledgement" disabled, 40 MHz channel width, AC mode. Coverage cell density: not sure whether it was set to Very High.

opkg list-installed | grep ath10k
ath10k-board-qca9984 - 20201118-3
ath10k-firmware-qca9984-ct - 2020-11-08-1
kmod-ath10k-ct - 5.4.154+2021-09-22-e6a7d5b5-1

System log:

Wed Dec 22 17:27:41 2021 daemon.notice hostapd: wlan0-1: AP-STA-DISCONNECTED CLIENT1
Wed Dec 22 17:37:50 2021 daemon.notice netifd: wan (1313): udhcpc: sending renew to PROVIDER
Wed Dec 22 17:45:20 2021 daemon.notice netifd: wan (1313): udhcpc: sending renew to PROVIDER
Wed Dec 22 17:46:25 2021 daemon.notice hostapd: wlan0-1: AP-STA-DISCONNECTED CLIENT2
Wed Dec 22 17:46:25 2021 daemon.info hostapd: wlan0-1: STA CLIENT2 IEEE 802.11: disassociated due to inactivity
Wed Dec 22 17:46:26 2021 daemon.info hostapd: wlan0-1: STA CLIENT2 IEEE 802.11: deauthenticated due to inactivity (timer DEAUTH/REMOVE)
Wed Dec 22 17:49:05 2021 daemon.notice netifd: wan (1313): udhcpc: sending renew to PROVIDER
Wed Dec 22 17:50:57 2021 daemon.notice netifd: wan (1313): udhcpc: sending renew to PROVIDER
Wed Dec 22 17:51:53 2021 daemon.notice netifd: wan (1313): udhcpc: sending renew to 0.0.0.0
Wed Dec 22 17:51:53 2021 daemon.notice netifd: wan (1313): udhcpc: lease of IP obtained, lease time 7200
Wed Dec 22 18:09:28 2021 daemon.notice hostapd: wlan0-1: STA CLIENT2 IEEE 802.11: did not acknowledge authentication response
Wed Dec 22 18:09:47 2021 daemon.info hostapd: wlan0-1: STA CLIENT2 IEEE 802.11: authenticated
Wed Dec 22 18:09:47 2021 daemon.info hostapd: wlan0-1: STA CLIENT2 IEEE 802.11: associated (aid 2)
Wed Dec 22 18:09:47 2021 daemon.notice hostapd: wlan0-1: AP-STA-CONNECTED CLIENT2
Wed Dec 22 18:09:47 2021 daemon.info hostapd: wlan0-1: STA CLIENT2 WPA: pairwise key handshake completed (RSN)
Wed Dec 22 18:17:49 2021 daemon.notice hostapd: wlan0: DFS-RADAR-DETECTED freq=5600 ht_enabled=1 chan_offset=-1 chan_width=2 cf1=5590 cf2=0
Wed Dec 22 18:17:49 2021 daemon.notice hostapd: dfs_downgrade_bandwidth: no DFS channels left, waiting for NOP to finish
Wed Dec 22 18:17:49 2021 daemon.notice hostapd: wlan0: AP-DISABLED
Wed Dec 22 18:17:49 2021 daemon.notice hostapd: wlan0: AP-STA-DISCONNECTED CLIENT3
Wed Dec 22 18:17:49 2021 daemon.notice hostapd: wlan0-1: AP-STA-DISCONNECTED CLIENT2
Wed Dec 22 18:17:49 2021 daemon.notice hostapd: wlan0-1: AP-STA-DISCONNECTED CLIENT4
Wed Dec 22 18:17:49 2021 kern.warn kernel: [40956.334834] ath10k_pci 0000:01:00.0: No VIF found for vdev 1
Wed Dec 22 18:17:49 2021 kern.warn kernel: [40956.334881] ath10k_pci 0000:01:00.0: received addba event for invalid vdev_id: 1
Wed Dec 22 18:17:49 2021 kern.warn kernel: [40956.339606] ath10k_pci 0000:01:00.0: No VIF found for vdev 1
Wed Dec 22 18:17:49 2021 kern.warn kernel: [40956.347027] ath10k_pci 0000:01:00.0: received addba event for invalid vdev_id: 1
Wed Dec 22 18:17:49 2021 kern.warn kernel: [40956.353777] ath10k_pci 0000:01:00.0: No VIF found for vdev 1
Wed Dec 22 18:17:49 2021 kern.warn kernel: [40956.360092] ath10k_pci 0000:01:00.0: received addba event for invalid vdev_id: 1
Wed Dec 22 18:17:49 2021 kern.warn kernel: [40956.365684] ath10k_pci 0000:01:00.0: No VIF found for vdev 1
Wed Dec 22 18:17:49 2021 kern.warn kernel: [40956.373007] ath10k_pci 0000:01:00.0: received addba event for invalid vdev_id: 1
Wed Dec 22 18:17:49 2021 kern.warn kernel: [40956.379167] ath10k_pci 0000:01:00.0: No VIF found for vdev 1
Wed Dec 22 18:17:49 2021 kern.warn kernel: [40956.386084] ath10k_pci 0000:01:00.0: no vif for vdev_id 1 found
Wed Dec 22 18:17:49 2021 kern.warn kernel: [40956.392551] ath10k_pci 0000:01:00.0: peer-unmap-event: unknown peer id 1
Wed Dec 22 18:17:49 2021 daemon.notice netifd: Network device 'wlan0-1' link is down
Wed Dec 22 18:17:49 2021 daemon.notice netifd: Interface 'guest5g' has link connectivity loss
Wed Dec 22 18:17:49 2021 kern.warn kernel: [40956.397527] ath10k_pci 0000:01:00.0: peer-unmap-event: unknown peer id 1
Wed Dec 22 18:17:49 2021 daemon.notice netifd: Interface 'guest5g' is now down
Wed Dec 22 18:17:49 2021 daemon.notice netifd: Interface 'guest5g' is disabled
Wed Dec 22 18:17:49 2021 daemon.warn odhcpd[1426]: No network(s) available on guest5g
Wed Dec 22 18:17:49 2021 daemon.err odhcpd[1426]: setsockopt(IPV6_ADD_MEMBERSHIP): No such device
Wed Dec 22 18:17:49 2021 daemon.err odhcpd[1426]: setsockopt(SO_BINDTODEVICE): No such device
Wed Dec 22 18:17:49 2021 daemon.err odhcpd[1426]: setsockopt(SO_BINDTODEVICE): No such device
Wed Dec 22 18:17:49 2021 daemon.notice hostapd: nl80211: deinit ifname=wlan0 disabled_11b_rates=0
Wed Dec 22 18:17:49 2021 kern.info kernel: [40956.524321] device wlan0 left promiscuous mode
Wed Dec 22 18:17:49 2021 kern.info kernel: [40956.524424] br-lan: port 2(wlan0) entered disabled state
Wed Dec 22 18:17:49 2021 kern.info kernel: [40956.586620] ath10k_pci 0000:01:00.0: mac flush null vif, drop 0 queues 0xffff
Wed Dec 22 18:17:49 2021 kern.warn kernel: [40956.587518] ath10k_pci 0000:01:00.0: peer-unmap-event: unknown peer id 0
Wed Dec 22 18:17:49 2021 kern.warn kernel: [40956.592851] ath10k_pci 0000:01:00.0: peer-unmap-event: unknown peer id 0
Wed Dec 22 18:17:49 2021 daemon.notice netifd: Network device 'wlan0' link is down
Wed Dec 22 18:17:49 2021 daemon.notice hostapd: wlan0: interface state ENABLED->DISABLED
Wed Dec 22 18:17:55 2021 kern.warn kernel: [40962.611194] ath10k_pci 0000:01:00.0: 10.4 wmi init: vdevs: 16  peers: 48  tid: 96
Wed Dec 22 18:17:55 2021 kern.warn kernel: [40962.611220] ath10k_pci 0000:01:00.0: msdu-desc: 2500  skid: 32
Wed Dec 22 18:17:55 2021 kern.info kernel: [40962.693264] ath10k_pci 0000:01:00.0: wmi print 'P 48/48 V 16 K 144 PH 176 T 186  msdu-desc: 2500  sw-crypt: 0 ct-sta: 0'
Wed Dec 22 18:17:55 2021 kern.info kernel: [40962.694114] ath10k_pci 0000:01:00.0: wmi print 'free: 84920 iram: 13156 sram: 11224'
Wed Dec 22 18:17:56 2021 daemon.err odhcpd[1426]: Failed to send to ff02::1%guest5g@wlan0-1 (Bad file descriptor)
Wed Dec 22 18:17:56 2021 kern.info kernel: [40963.077855] ath10k_pci 0000:01:00.0: rts threshold -1
Wed Dec 22 18:17:56 2021 kern.warn kernel: [40963.083063] ath10k_pci 0000:01:00.0: Firmware lacks feature flag indicating a retry limit of > 2 is OK, requested limit: 4
Wed Dec 22 18:17:56 2021 kern.info kernel: [40963.088320] br-lan: port 2(wlan0) entered blocking state
Wed Dec 22 18:17:56 2021 kern.info kernel: [40963.093004] br-lan: port 2(wlan0) entered disabled state
Wed Dec 22 18:17:56 2021 kern.info kernel: [40963.098894] device wlan0 entered promiscuous mode
Wed Dec 22 18:17:56 2021 daemon.notice hostapd: wlan0: interface state DISABLED->COUNTRY_UPDATE
Wed Dec 22 18:17:56 2021 daemon.notice hostapd: wlan0: interface state COUNTRY_UPDATE->HT_SCAN
Wed Dec 22 18:17:56 2021 daemon.err hostapd: could not get valid channel
Wed Dec 22 18:17:56 2021 daemon.notice hostapd: wlan0: interface state HT_SCAN->DFS
Wed Dec 22 18:26:55 2021 daemon.err odhcpd[1426]: Failed to send to ff02::1%guest5g@wlan0-1 (Bad file descriptor)
Wed Dec 22 18:31:20 2021 daemon.err uhttpd[2086]: luci: accepted login on / for root from 192.168.1.145
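
To make the DFS-RADAR-DETECTED line in the log above easier to read, here is a minimal Python sketch that splits the event into its key=value fields and converts the frequencies to 5 GHz channel numbers. The function names are made up for illustration and are not part of hostapd; the conversion assumes the usual 5 GHz mapping of channel = (MHz - 5000) / 5.

```python
import re

def parse_dfs_event(line):
    """Return the key=value fields of a hostapd DFS event line as a dict of ints."""
    # Only look at the text after the event name, so the timestamp is ignored.
    _, _, tail = line.partition("DFS-RADAR-DETECTED")
    return {key: int(value) for key, value in re.findall(r"(\w+)=(-?\d+)", tail)}

def channel_5ghz(freq_mhz):
    """Map a 5 GHz centre frequency in MHz to its channel number."""
    return (freq_mhz - 5000) // 5

line = ("Wed Dec 22 18:17:49 2021 daemon.notice hostapd: wlan0: "
        "DFS-RADAR-DETECTED freq=5600 ht_enabled=1 chan_offset=-1 "
        "chan_width=2 cf1=5590 cf2=0")
event = parse_dfs_event(line)
print(channel_5ghz(event["freq"]))  # 120, the primary channel
print(channel_5ghz(event["cf1"]))   # 118, the centre of the 40 MHz pair
```

So the radar hit was reported on channel 120, with the secondary channel below it (chan_offset=-1).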

Kernel log:

[ 2185.875994] ath10k_pci 0000:01:00.0: Invalid peer id 1 or peer stats buffer, peer: be4781d6  sta: 00000000
[ 2288.859972] ath10k_pci 0000:01:00.0: Invalid VHT mcs 15 peer stats
[40956.334834] ath10k_pci 0000:01:00.0: No VIF found for vdev 1
[40956.334881] ath10k_pci 0000:01:00.0: received addba event for invalid vdev_id: 1
[40956.339606] ath10k_pci 0000:01:00.0: No VIF found for vdev 1
[40956.347027] ath10k_pci 0000:01:00.0: received addba event for invalid vdev_id: 1
[40956.353777] ath10k_pci 0000:01:00.0: No VIF found for vdev 1
[40956.360092] ath10k_pci 0000:01:00.0: received addba event for invalid vdev_id: 1
[40956.365684] ath10k_pci 0000:01:00.0: No VIF found for vdev 1
[40956.373007] ath10k_pci 0000:01:00.0: received addba event for invalid vdev_id: 1
[40956.379167] ath10k_pci 0000:01:00.0: No VIF found for vdev 1
[40956.386084] ath10k_pci 0000:01:00.0: no vif for vdev_id 1 found
[40956.392551] ath10k_pci 0000:01:00.0: peer-unmap-event: unknown peer id 1
[40956.397527] ath10k_pci 0000:01:00.0: peer-unmap-event: unknown peer id 1
[40956.524321] device wlan0 left promiscuous mode
[40956.524424] br-lan: port 2(wlan0) entered disabled state
[40956.586620] ath10k_pci 0000:01:00.0: mac flush null vif, drop 0 queues 0xffff
[40956.587518] ath10k_pci 0000:01:00.0: peer-unmap-event: unknown peer id 0
[40956.592851] ath10k_pci 0000:01:00.0: peer-unmap-event: unknown peer id 0
[40962.611194] ath10k_pci 0000:01:00.0: 10.4 wmi init: vdevs: 16  peers: 48  tid: 96
[40962.611220] ath10k_pci 0000:01:00.0: msdu-desc: 2500  skid: 32
[40962.693264] ath10k_pci 0000:01:00.0: wmi print 'P 48/48 V 16 K 144 PH 176 T 186  msdu-desc: 2500  sw-crypt: 0 ct-sta: 0'
[40962.694114] ath10k_pci 0000:01:00.0: wmi print 'free: 84920 iram: 13156 sram: 11224'
[40963.077855] ath10k_pci 0000:01:00.0: rts threshold -1
[40963.083063] ath10k_pci 0000:01:00.0: Firmware lacks feature flag indicating a retry limit of > 2 is OK, requested limit: 4
[40963.088320] br-lan: port 2(wlan0) entered blocking state
[40963.093004] br-lan: port 2(wlan0) entered disabled state
[40963.098894] device wlan0 entered promiscuous mode
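
With this many repeated warnings, it can help to collapse the kernel log into one line per distinct message. The following is a rough Python sketch of that idea; the function name and the regular expression are my own and only cover lines shaped like the ath10k_pci ones above.

```python
import re
from collections import Counter

def summarize_kernel_log(text):
    """Count ath10k_pci messages, ignoring the [seconds.micros] prefix."""
    counts = Counter()
    for line in text.splitlines():
        match = re.match(r"\[\s*\d+\.\d+\]\s+ath10k_pci \S+ (.*)", line)
        if match:
            counts[match.group(1)] += 1
    return counts

sample = """\
[40956.334834] ath10k_pci 0000:01:00.0: No VIF found for vdev 1
[40956.334881] ath10k_pci 0000:01:00.0: received addba event for invalid vdev_id: 1
[40956.339606] ath10k_pci 0000:01:00.0: No VIF found for vdev 1
[40956.524321] device wlan0 left promiscuous mode
"""
for message, count in summarize_kernel_log(sample).most_common():
    print(count, message)
```

On the router, the output of dmesg could be fed in instead of the sample string.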

Addendum: About 30 minutes later the radio starts again on the very same channel where it first "detected radar" and went into NOP. But that doesn't work for long. It seems related to the virtual AP that was created, and perhaps to the number of stations connected. Not many in my case: two seem to be no problem, but with three or four the trouble starts.
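
The "about 30 minutes" fits the DFS non-occupancy period (NOP), which is typically 30 minutes after a radar detection. As a small sketch under that assumption, this estimates when the channel could be tried again, using the radar timestamp from the system log above. The constant and the function name are illustrative and not taken from hostapd.

```python
from datetime import datetime, timedelta

# Assumption: a 30-minute non-occupancy period; check your regulatory domain.
NOP = timedelta(minutes=30)

def nop_end(radar_timestamp, fmt="%a %b %d %H:%M:%S %Y"):
    """Return the earliest time the channel may be used again after a radar hit."""
    return datetime.strptime(radar_timestamp, fmt) + NOP

print(nop_end("Wed Dec 22 18:17:49 2021"))  # 2021-12-22 18:47:49
```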

daemon.warn odhcpd[1453]: No DHCP range available on guest5g

I'm also having wireless instability after upgrading to 21.02.1.

Device: Linksys WRT3200ACM

This happens whenever I add a second SSID to a radio:

wireless.radio0=wifi-device
wireless.radio0.type='mac80211'
wireless.radio0.channel='36'
wireless.radio0.hwmode='11a'
wireless.radio0.path='soc/soc:pcie/pci0000:00/0000:00:01.0/0000:01:00.0'
wireless.radio0.htmode='VHT80'
wireless.radio0.country='US'
wireless.radio0.cell_density='0'
wireless.default_radio0=wifi-iface
wireless.default_radio0.device='radio0'
wireless.default_radio0.network='lan'
wireless.default_radio0.mode='ap'
wireless.default_radio0.macaddr='24:f5:a2:c6:11:82'
wireless.default_radio0.ssid='goaway'
wireless.default_radio0.encryption='sae-mixed'
wireless.default_radio0.key='<REDACTED>'

This works just fine. But as soon as I add my office WLAN to the same radio, things go all sideways:

wireless.wifinet3=wifi-iface
wireless.wifinet3.device='radio0'
wireless.wifinet3.mode='ap'
wireless.wifinet3.ssid='NotYours'
wireless.wifinet3.encryption='sae-mixed'
wireless.wifinet3.key='<ALSO REDACTED>'
wireless.wifinet3.network='office'
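
To see at a glance which SSIDs share a radio, the uci show output can be grouped by device. This is only a sketch: the function name is hypothetical, and the parser handles just the simple section=type and section.option='value' lines shown above.

```python
import re
from collections import defaultdict

def ifaces_per_radio(uci_show_text):
    """Group wifi-iface section names by the radio they are attached to."""
    types, devices = {}, {}
    for line in uci_show_text.splitlines():
        match = re.match(r"wireless\.(\w+)(?:\.(\w+))?='?([^']*)'?$", line.strip())
        if not match:
            continue
        section, option, value = match.groups()
        if option is None:
            types[section] = value      # e.g. wireless.wifinet3=wifi-iface
        elif option == "device":
            devices[section] = value    # e.g. wireless.wifinet3.device='radio0'
    grouped = defaultdict(list)
    for section, radio in devices.items():
        if types.get(section) == "wifi-iface":
            grouped[radio].append(section)
    return dict(grouped)

sample = """\
wireless.radio0=wifi-device
wireless.default_radio0=wifi-iface
wireless.default_radio0.device='radio0'
wireless.wifinet3=wifi-iface
wireless.wifinet3.device='radio0'
"""
print(ifaces_per_radio(sample))
```

Any radio that ends up with more than one entry is running multiple virtual APs, which is the situation that triggers the problem here.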

This is what it looks like in LuCI. The radio is actually NOT a Generic 802.11bg, despite LuCI's confusion:

And removing it doesn't help -- the radio remains "Disabled".

(tried to add another screenshot here, but I only just opened up my Forums account and it restricts us newbies to just one screenshot I guess)

After rebooting -- as suggested by @khimaros -- the router comes back fine, but the original (goaway) WLAN isn't broadcasting and LuCI is pretty much unresponsive.

I'll do some more testing with the other radios to see if they produce undesirable results, but I have to flip back to 19.07 to get the network back and get ready for some company coming over later.

@anon89577378 thank you for the tip. i've moved my 5GHz to another channel outside of the DFS range, however, the problem has also been happening (actually, more frequently) on the 2.4GHz radio, so i don't think this is the only cause.

this doesn't appear to be an ath10k radio, so the issue may be separate.

No it's not an Atheros radio -- for the WRT3200ACM, it's a Marvell radio I think. It might not be directly related, but it could also point to a common problem. Maybe a code maintainer will be able to take a look.

I just looked at the list of open bugs and I do see that there's a report on WPA3 on my device. I'll try only WPA2 the next time around, but it might be a day or two before I can get back to it.

For the WRT3200ACM on 21.02.1, it is likely that at least this bug is causing various trouble... (already fixed in the 21.02 branch, but not in the old 21.02.1 release)

Thanks @hnyman. I'll keep an eye out for 21.02.2 and give it another whirl.

Channel 2 overlaps with other channels on the 2.4 GHz band.

Select either channel 1, 6, or 11.
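
The reasoning behind 1, 6, and 11 can be checked with a little arithmetic. This sketch assumes the classic 22 MHz channel width and the standard 2.4 GHz centre frequencies (2407 + 5 x channel MHz for channels 1 to 13); the function names are just for illustration.

```python
def center_mhz(channel):
    """Centre frequency of a 2.4 GHz channel (valid for channels 1-13)."""
    return 2407 + 5 * channel

def overlaps(a, b, width_mhz=22):
    """True if two channels of the given width share spectrum."""
    return abs(center_mhz(a) - center_mhz(b)) < width_mhz

# Which of the usual three channels does channel 2 collide with?
print([ch for ch in (1, 6, 11) if overlaps(2, ch)])                    # [1, 6]
# Do 1, 6 and 11 collide with each other?
print(any(overlaps(a, b) for a, b in [(1, 6), (6, 11), (1, 11)]))      # False
```

A radio on channel 2 therefore interferes with neighbours on both channel 1 and channel 6, while 1, 6, and 11 stay clear of each other.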

i've switched to channel 1 and i'm still having this issue.

the problem i'm experiencing most consistently is that the device is still associated (i can see it in wireless section, excellent signal strength), it has a dhcp lease, but 100% packet loss to the router.

cycling the wifi on the client solves it, but sometimes it fails again as little as 5 minutes later. this is happening to several devices on the network.

absolutely nothing of interest in dmesg or the system logs. the routes look right. i've completely disabled the firewall on the client.

one of the clients is an Intel Comet Lake PCH-LP CNVi WiFi (iwlwifi)

Try changing your channel width from 40 to 20.

changing channel width did not help. i also notice that this is not happening to all devices at the same time. individual devices will independently stop being able to ping the router.

Couple of things to look at...

What settings are you using for wireless security?

Not all devices are WPA3 capable.

Any legacy devices?

Some may need 802.11b to work.

the settings are all the same as they were on 19.07.8. this is happening with modern devices like a Gen 8 Lenovo Thinkpad X1 Carbon (with the aforementioned iwlwifi card).

When you upgraded to 21.02.1, did you keep your 19.07.8 configs, or re-configure from scratch?

There were several changes associated with the move from swconfig to DSA.

From the 21.02.0 release announcement...

The following targets are using a switch managed with DSA in OpenWrt 21.02:

    ath79 (only TP-Link TL-WR941ND)
    bcm4908
    gemini
    kirkwood
    mediatek (most boards)
    mvebu
    octeon
    ramips (mt7621 subtarget only)
    realtek

The Turris Omnia target is mvebu.

Which means your configs were not upgradable from 19.07.8 to 21.02.1.

looking back at my old configuration (i have biweekly backups dating back to 2021-05): at the least, /etc/config/wireless from my earliest backup is completely identical to what it was prior to the changes we made in this thread.

i have a machine that just entered this broken state again. it is still associated with the router. it has a DHCP lease. it cannot ping the router by IP address. the router still has it in the list of active wireless clients.

A couple of minor changes to wireless...cell density is one.

Network and other configs have changed for DSA.

I use a file diff program to compare the backup tar.gz files.

Use the 19.07.8 configs as a reference, reset, and re-configure from scratch.

First post, 21.02.0 release announcement...

something interesting. on the router, looking at the assoclist, the RX packet count continues to increase, but the TX packet count does not:

# iwinfo wlan1 assoclist
...
<HWADDR>  -47 dBm / -95 dBm (SNR 48)  370 ms ago
	RX: 6.0 MBit/s                                152197 Pkts.
	TX: 130.0 MBit/s, MCS 15, 20MHz               124533 Pkts.
	expected throughput: 46.3 MBit/s

i'm not sure if i'm looking at the right /sys values on the client, but it seems like there is movement on /sys/class/net/wlp0s20f3/statistics/tx_packets but not rx_packets.

so it seems like both devices believe they are sending packets to the other device, but neither of them is receiving any. yet somehow they remain associated.
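
One way to take the guesswork out of the /sys values is to snapshot both counters twice and see which one stays flat. A minimal Python sketch for the client side follows; the function names are hypothetical, and it relies only on the rx_packets and tx_packets files under /sys/class/net/<iface>/statistics/.

```python
import os

def read_counters(iface, root="/sys/class/net"):
    """Read rx_packets and tx_packets for an interface from sysfs."""
    stats = os.path.join(root, iface, "statistics")
    counters = {}
    for name in ("rx_packets", "tx_packets"):
        with open(os.path.join(stats, name)) as handle:
            counters[name] = int(handle.read().strip())
    return counters

def stalled_directions(before, after):
    """Return the counters that did not move between two snapshots."""
    return [name for name in before if after[name] == before[name]]

# Illustrative numbers only, not measured values.
before = {"rx_packets": 1000, "tx_packets": 2000}
after = {"rx_packets": 1000, "tx_packets": 2050}
print(stalled_directions(before, after))  # ['rx_packets']
```

In practice one would call read_counters("wlp0s20f3") twice, a few seconds apart, while a ping to the router is running, and pass both results to stalled_directions.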

in terms of wireless config, here is the old:

config wifi-device 'radio1'
	option type 'mac80211'
	option hwmode '11g'
	option path 'soc/soc:pcie/pci0000:00/0000:00:01.0/0000:01:00.0'
	option country 'US'
	option htmode 'HT40'
	option channel '2'

and here is the new:

config wifi-device 'radio1'
	option type 'mac80211'
	option hwmode '11g'
	option path 'soc/soc:pcie/pci0000:00/0000:00:01.0/0000:01:00.0'
	option country 'US'
	option cell_density '0'
	option channel '1'
	option htmode 'HT20'

the (enabled) wifi-iface sections are identical.
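
For comparing two versions of a config section like the old and new radio1 blocks above, a diff shows the changed options directly. Here is a short sketch using Python's difflib; the helper name is mine, and leading whitespace is stripped so that tab differences don't show up as changes.

```python
import difflib

def config_diff(old_text, new_text):
    """Return only the added (+) and removed (-) lines between two configs."""
    old_lines = [line.strip() for line in old_text.splitlines()]
    new_lines = [line.strip() for line in new_text.splitlines()]
    diff = difflib.unified_diff(old_lines, new_lines, lineterm="", n=0)
    return [line for line in diff
            if line.startswith(("+", "-"))
            and not line.startswith(("+++", "---"))]

old = """\
config wifi-device 'radio1'
    option type 'mac80211'
    option hwmode '11g'
    option country 'US'
    option htmode 'HT40'
    option channel '2'
"""
new = """\
config wifi-device 'radio1'
    option type 'mac80211'
    option hwmode '11g'
    option country 'US'
    option cell_density '0'
    option channel '1'
    option htmode 'HT20'
"""
for line in config_diff(old, new):
    print(line)
```

The path option is left out of the sample strings for brevity; it is identical in both versions.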

reading through the release notes, should DSA really be impacting wireless? it seems like this would only affect ethernet ports?

i'm using WPA2-PSK for all of my networks.

i'm reading through https://openwrt.org/docs/guide-developer/debugging#wireless and have increased the radio log level in the hopes that will help me troubleshoot.

approximately 24 hours ago i ran the following:

# uci set wireless.radio1.log_level=1
# uci commit wireless
# wifi up

since then no issues. however, i suspect the issue starts to happen after some period of time.

i'll report here if it happens again and will try to grab a snapshot of the management frames or at least more verbose log output.

finally had another failure today

there is nothing in the log at the time the issue started.

when i reassociate, it looks like any normal reassociation:

Fri Jan 21 03:53:12 2022 daemon.notice hostapd: wlan1: AP-STA-DISCONNECTED <HWADDR>
Fri Jan 21 03:53:12 2022 daemon.debug hostapd: wlan1: STA <HWADDR> WPA: event 3 notification
Fri Jan 21 03:53:12 2022 daemon.debug hostapd: wlan1: STA <HWADDR> IEEE 802.1X: unauthorizing port
Fri Jan 21 03:53:12 2022 daemon.debug hostapd: wlan1: STA <HWADDR> IEEE 802.11: deauthenticated
Fri Jan 21 03:53:12 2022 daemon.debug hostapd: wlan1: STA <HWADDR> MLME: MLME-DEAUTHENTICATE.indication(<HWADDR>, 3)
Fri Jan 21 03:53:12 2022 daemon.debug hostapd: wlan1: STA <HWADDR> MLME: MLME-DELETEKEYS.request(<HWADDR>)
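
The number at the end of the MLME-DEAUTHENTICATE.indication line is the IEEE 802.11 reason code. The sketch below pulls it out of a log line and looks it up in a deliberately short table; the table covers only a few well-known codes, and the function name is invented for this example.

```python
import re

# A few IEEE 802.11 reason codes; the standard defines many more.
REASON_CODES = {
    1: "unspecified reason",
    2: "previous authentication no longer valid",
    3: "sending station is leaving (or has left) the IBSS or ESS",
    4: "disassociated due to inactivity",
}

def deauth_reason(line):
    """Return (code, meaning) from an MLME-DEAUTHENTICATE.indication log line."""
    match = re.search(r"MLME-DEAUTHENTICATE\.indication\((.+),\s*(\d+)\)", line)
    if match is None:
        return None
    code = int(match.group(2))
    return code, REASON_CODES.get(code, "not in this short table")

line = ("Fri Jan 21 03:53:12 2022 daemon.debug hostapd: wlan1: STA <HWADDR> "
        "MLME: MLME-DEAUTHENTICATE.indication(<HWADDR>, 3)")
print(deauth_reason(line))
```

Reason 3 means the station announced it was leaving, which is consistent with the disconnect being triggered by cycling wifi on the client rather than by the AP.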