mt798x-wmac 18000000.wifi: Message xxxxxxxx (seq 5) timeout

Hi,

I'm using OpenWrt 23.05 on my new Zyxel NWA50AX PRO.
Any idea what causes the timeout messages? They cause wifi clients to lose their connection.

Sat Oct 21 07:35:49 2023 daemon.notice hostapd: phy1-ap0: BEACON-REQ-TX-STATUS da:ef:5e:a8:df:a4 2 ack=1
Sat Oct 21 07:35:49 2023 daemon.notice hostapd: phy0-ap0: BEACON-REQ-TX-STATUS 76:f1:11:e8:b7:2b 27 ack=1
Sat Oct 21 07:35:49 2023 daemon.notice hostapd: phy1-ap0: BEACON-RESP-RX da:ef:5e:a8:df:a4 2 00 802c00000000000000006400008400f44d5c5a7ec7009a7bf00301d868a973d5070000006400311000064275636b656e01088c129824b048606c03012c0712444520240417340414640b1a90010095070d301c0100000fac040100000fac040300000fac02000fac04000fac06cc000b050100050000460572000000003603fdea013b0280002d1aef0917ffff0000000000000000000001000000000000000000003d162c0504000000000000000000000000000000000000007f0a04100a0a010001400040451102e7070a14160e2b000000000000000000bf0cf6798933faff0000faff0020c005012a00fcffc304022e2e2e
Sat Oct 21 07:35:49 2023 daemon.notice hostapd: phy0-ap0: BEACON-RESP-RX 76:f1:11:e8:b7:2b 27 00 802c00000000000000006400007c00f44d5c5a7ec70097f1540001de475073d5070000006400311000064275636b656e01088c129824b048606c03012c0504010200000712444520240417340414640b1a90010095070d301c0100000fac040100000fac040300000fac02000fac04000fac06cc000b050100050000460572000000003603fdea013b0280002d1aef0917ffff0000000000000000000001000000000000000000003d162c0504000000000000000000000000000000000000007f0a04100a0a010001400040451102e7070a14160e2b000000000000000000bf0cf6798933faff0000faff0020c005012a00fcffc304022e2e2e
Sat Oct 21 07:36:01 2023 daemon.warn odhcpd[1693]: No default route present, overriding ra_lifetime!
Sat Oct 21 07:36:05 2023 daemon.warn odhcpd[1693]: No default route present, overriding ra_lifetime!
Sat Oct 21 07:36:10 2023 kern.err kernel: [33667.431778] mt798x-wmac 18000000.wifi: Message 00005aed (seq 5) timeout
Sat Oct 21 07:36:31 2023 kern.err kernel: [33687.906575] mt798x-wmac 18000000.wifi: Message 000026ed (seq 6) timeout
Sat Oct 21 07:36:31 2023 daemon.notice hostapd: phy0-ap0: BEACON-REQ-TX-STATUS 76:f1:11:e8:b7:2b 28 ack=0
Sat Oct 21 07:36:31 2023 daemon.notice hostapd: phy1-ap0: BEACON-REQ-TX-STATUS da:ef:5e:a8:df:a4 3 ack=0
Sat Oct 21 07:36:34 2023 daemon.notice hostapd: phy1-ap0: BEACON-REQ-TX-STATUS da:ef:5e:a8:df:a4 4 ack=0
Sat Oct 21 07:36:34 2023 daemon.notice hostapd: phy0-ap0: BEACON-REQ-TX-STATUS 76:f1:11:e8:b7:2b 29 ack=0
Sat Oct 21 07:36:38 2023 daemon.notice hostapd: phy1-ap0: BEACON-REQ-TX-STATUS da:ef:5e:a8:df:a4 5 ack=0

My wireless config:


config wifi-device 'radio0'
	option type 'mac80211'
	option path 'platform/18000000.wifi'
	option channel '6'
	option band '2g'
	option htmode 'HE20'
	option cell_density '0'
	option country 'DE'

config wifi-iface 'default_radio0'
	option device 'radio0'
	option network 'lan'
	option mode 'ap'
	option ssid 'Bucken'
	option encryption 'psk2+ccmp'
	option multicast_to_unicast_all '1'
	option key 'xxxx'
	option ieee80211r '1'
	option mobility_domain 'fdea'
	option reassociation_deadline '20000'
	option ft_over_ds '1'
	option ft_psk_generate_local '1'
	option ieee80211w '2'
	option wpa_disable_eapol_key_retries '1'
	option ieee80211k '1'
	option time_advertisement '2'
	option wnm_sleep_mode '1'
	option wnm_sleep_mode_no_keys '1'
	option bss_transition '1'
	option proxy_arp '1'

config wifi-device 'radio1'
	option type 'mac80211'
	option path 'platform/18000000.wifi+1'
	option channel '44'
	option band '5g'
	option htmode 'HE80'
	option cell_density '0'
	option country 'DE'

config wifi-iface 'default_radio1'
	option device 'radio1'
	option network 'lan'
	option mode 'ap'
	option ssid 'Bucken'
	option encryption 'psk2+ccmp'
	option multicast_to_unicast_all '1'
	option key 'xxxx'
	option ieee80211r '1'
	option mobility_domain 'fdea'
	option reassociation_deadline '20000'
	option ft_over_ds '1'
	option ft_psk_generate_local '1'
	option ieee80211w '2'
	option wpa_disable_eapol_key_retries '1'
	option ieee80211k '1'
	option time_advertisement '2'
	option wnm_sleep_mode '1'
	option wnm_sleep_mode_no_keys '1'
	option bss_transition '1'
	option proxy_arp '1'

I have the exact same problem with a Beryl AX (GL.iNet GL-MT3000) on 23.05.02.

Mon Nov 27 21:57:53 2023 kern.err kernel: [32821.198806] mt798x-wmac 18000000.wifi: Message 000026ed (seq 12) timeout
Mon Nov 27 21:58:13 2023 kern.err kernel: [32841.673752] mt798x-wmac 18000000.wifi: Message 00005aed (seq 13) timeout
Mon Nov 27 21:58:34 2023 kern.err kernel: [32862.148726] mt798x-wmac 18000000.wifi: Message 000026ed (seq 14) timeout

Wireless config:

config wifi-device 'radio0'
	option type 'mac80211'
	option path 'platform/18000000.wifi'
	option channel '1'
	option band '2g'
	option htmode 'HE20'
	option cell_density '0'
	option country 'DE'
	option txpower '15'

config wifi-device 'radio1'
	option type 'mac80211'
	option path 'platform/18000000.wifi+1'
	option channel '100'
	option band '5g'
	option htmode 'HE80'
	option country 'DE'
	option cell_density '0'
	option txpower '24'

config wifi-iface 'default_radio0'
	option device 'radio0'
	option network 'lan'
	option mode 'ap'
	option ssid 'Onion'
	option encryption 'psk2'
	option key 'xxxx'
	option dtim_period '3'
	option ieee80211r '1'
	option ft_over_ds '0'
	option ft_psk_generate_local '1'
	option mobility_domain '1a2b'
	option ifname 'wlan0'

config wifi-iface 'default_radio1'
	option device 'radio1'
	option network 'lan'
	option mode 'ap'
	option ssid 'Onion'
	option encryption 'psk2'
	option key 'xxxx'
	option dtim_period '3'
	option ieee80211r '1'
	option ft_over_ds '0'
	option ft_psk_generate_local '1'
	option mobility_domain '1a2b'
	option ifname 'wlan1'

config wifi-vlan
	option name 'vlan1'
	option network 'lan'
	option vid '1'

config wifi-station
	option key 'xxxx'
	option vid '1'

config wifi-vlan
	option name 'vlan20'
	option network 'gast'
	option vid '20'

config wifi-station
	option key 'xxxx'
	option vid '20'

Have you tried setting these parameters?

option max_inactivity '86400'
option disassoc_low_ack '0'
option wpa_group_rekey '86400'

I see you also use roaming, so try increasing your reassociation_deadline:
option reassociation_deadline '20000'

Thx for the suggestion. I tried those parameters.

Today only one timeout:

Tue Nov 28 08:49:01 2023 kern.err kernel: [71879.734357] mt798x-wmac 18000000.wifi: Message 000026ed (seq 1) timeout

In this case, no client disconnection for now.

If I stress the AP with iperf3 on WiFi connected devices I can reproduce the messages (25 connected devices in total):

Wed Nov 29 16:32:23 2023 kern.err kernel: [ 2802.378439] mt798x-wmac 18000000.wifi: Message 00005aed (seq 5) timeout
Wed Nov 29 16:32:44 2023 kern.err kernel: [ 2822.853228] mt798x-wmac 18000000.wifi: Message 000026ed (seq 6) timeout
Wed Nov 29 16:32:44 2023 kern.err kernel: [ 2822.859863] mt798x-wmac 18000000.wifi: Message 000026ed (seq 7) timeout
Wed Nov 29 16:32:55 2023 kern.err kernel: [ 2834.311039] mt798x-wmac 18000000.wifi: Message 00005aed (seq 4) timeout
Wed Nov 29 16:33:14 2023 kern.err kernel: [ 2853.095811] mt798x-wmac 18000000.wifi: Message 00005aed (seq 4) timeout
Wed Nov 29 16:33:32 2023 kern.err kernel: [ 2871.141260] mt798x-wmac 18000000.wifi: Message 00005aed (seq 2) timeout
Wed Nov 29 16:33:44 2023 kern.err kernel: [ 2882.977593] mt798x-wmac 18000000.wifi: Message 00005aed (seq 2) timeout
Wed Nov 29 16:34:20 2023 kern.err kernel: [ 2918.927495] mt798x-wmac 18000000.wifi: Message 00005aed (seq 4) timeout
Wed Nov 29 16:34:21 2023 kern.err kernel: [ 2920.357468] mt798x-wmac 18000000.wifi: Message 00005aed (seq 3) timeout
Wed Nov 29 16:35:27 2023 kern.err kernel: [ 2986.499628] mt798x-wmac 18000000.wifi: Message 00005aed (seq 14) timeout
Wed Nov 29 16:36:12 2023 kern.err kernel: [ 3030.927934] mt798x-wmac 18000000.wifi: Message 00005aed (seq 14) timeout
Wed Nov 29 16:36:27 2023 kern.err kernel: [ 3045.723855] mt798x-wmac 18000000.wifi: Message 00005aed (seq 9) timeout
...

Clients are disconnected if the message occurs multiple times, as above.
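
To see which message IDs dominate during a stress run like this, the timeout lines can be tallied per ID. This is only a minimal Python sketch, not something from the driver or from OpenWrt: the function name `count_timeouts` is made up for illustration, and the regular expression assumes the exact kernel line format pasted above.

```python
import re
from collections import Counter

# Assumes the line format seen in this thread:
#   mt798x-wmac 18000000.wifi: Message 00005aed (seq 5) timeout
TIMEOUT_RE = re.compile(
    r"mt798x-wmac \S+: Message ([0-9a-f]{8}) \(seq (\d+)\) timeout"
)


def count_timeouts(log_text):
    """Return a Counter mapping message ID -> number of timeout lines."""
    counts = Counter()
    for line in log_text.splitlines():
        match = TIMEOUT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Three lines copied from the log above.
sample = """\
Wed Nov 29 16:32:23 2023 kern.err kernel: [ 2802.378439] mt798x-wmac 18000000.wifi: Message 00005aed (seq 5) timeout
Wed Nov 29 16:32:44 2023 kern.err kernel: [ 2822.853228] mt798x-wmac 18000000.wifi: Message 000026ed (seq 6) timeout
Wed Nov 29 16:32:44 2023 kern.err kernel: [ 2822.859863] mt798x-wmac 18000000.wifi: Message 000026ed (seq 7) timeout
"""

for message_id, count in count_timeouts(sample).most_common():
    print(message_id, count)
```

Saving the output of logread to a file and passing its contents to the function instead of `sample` would give the totals for a whole run.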

Something "load" related. I don't know.

Too bad, but this way I can't use OpenWrt productively :frowning: .

@retuor and @phk

I just stumbled upon this thread as I was experiencing the same on my GL-MT6000 (also mt798x-wmac) devices. I know exactly the change I made that led to this.

I will note that 18000000.wifi is the 2.4GHz radio in the /etc/config/wireless file. I noticed the same timeout issue as soon as I added option multicast_to_unicast_all '1' to several of the wifi-iface definitions that I have for my 5 SSIDs, specifically the three wifi-iface sections for my 2.4GHz SSIDs.

Once I removed option multicast_to_unicast_all '1' from all my 2.4GHz SSIDs, the issue stopped.

FWIW, I am not seeing this timeout occur with the same option multicast_to_unicast_all '1' setting on my 5GHz SSIDs.
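
For anyone who wants to check quickly which of their SSIDs have this option set, here is a rough Python sketch that scans the text of /etc/config/wireless. It is a simplified reader written for illustration, not the real UCI parser: the function name `ifaces_with_option` is hypothetical, it only handles named wifi-iface sections, and it assumes the option value contains no spaces.

```python
def ifaces_with_option(config_text, option, value):
    """Return names of wifi-iface sections where `option` equals `value`."""
    matches = []
    section = None  # name of the wifi-iface section being read, if any
    for raw in config_text.splitlines():
        parts = raw.split()
        if not parts:
            continue
        if parts[0] == "config":
            # e.g. config wifi-iface 'default_radio0'
            # Other section types and unnamed sections are skipped.
            if len(parts) >= 3 and parts[1] == "wifi-iface":
                section = parts[2].strip("'\"")
            else:
                section = None
        elif parts[0] == "option" and section is not None and len(parts) >= 3:
            if parts[1] == option and parts[2].strip("'\"") == value:
                matches.append(section)
    return matches


# Shortened example in the style of the configs posted in this thread.
sample = """\
config wifi-device 'radio0'
    option band '2g'

config wifi-iface 'default_radio0'
    option device 'radio0'
    option ssid 'Bucken'
    option multicast_to_unicast_all '1'

config wifi-iface 'default_radio1'
    option device 'radio1'
    option ssid 'Bucken'
"""

print(ifaces_with_option(sample, "multicast_to_unicast_all", "1"))
```

With the shortened sample this prints only default_radio0, since that is the only section carrying the option.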

Curious if either/both of you are still dealing with this and if you can confirm if this finding holds true for you as well.


Update #1:
I may have been premature in my statements above. There appears to be more to this and it may not be limited to 2.4GHz. Going to do more testing to try to narrow it down better.

Update #2:
I was incorrect in my assessment that it was limited to just 2.4GHz. Further testing in my case showed that it affects both bands, potentially even more so on 5GHz than I had originally believed. However, I am quite confident that this setting itself (multicast_to_unicast_all '1') is problematic with this driver/device. I'm wondering now if anyone has reported this as a bug already.

Thanks for the hint.
The 000026ed and 00005aed errors are gone after disabling the multicast_to_unicast_all option.
Device: Xiaomi Redmi AX6000

For me this multicast_to_unicast feature is very important and I don't want to disable it.
Is there really a bug in the driver or in OpenWrt?

Not sure, but it would be great if we could check and see if a bug report exists, or if one of us could raise one in the mt76 repo.

@phk and @Rising_Sun Just opened a new issue here: https://github.com/openwrt/mt76/issues/866

Please add any additional details there that you feel might help. Thanks!

@phk && @Rising_Sun I'm trying @anon58727419's patch from here:


Update:
Even with this patch, I still experienced crashes with multicast_to_unicast_all enabled.

@phk / @Rising_Sun:

Check out @xize's post. I'm testing this out now!


Update:
Seems to still be exhibiting issues even with the updated firmware:

[  743.168242] mt798x-wmac 18000000.wifi: Message 00005aed (seq 2) timeout
[ 1297.661339] mt798x-wmac 18000000.wifi: Message 00005aed (seq 13) timeout
[ 1842.429689] mt798x-wmac 18000000.wifi: Message 000026ed (seq 15) timeout
[ 1862.888429] mt798x-wmac 18000000.wifi: Message 00005aed (seq 1) timeout
[ 1883.346054] mt798x-wmac 18000000.wifi: Message 000800c4 (seq 2) timeout
[ 1903.804285] mt798x-wmac 18000000.wifi: Message 000026ed (seq 3) timeout
[ 1924.263044] mt798x-wmac 18000000.wifi: Message 00005aed (seq 4) timeout
[ 1944.720388] mt798x-wmac 18000000.wifi: Message 000026ed (seq 5) timeout
[ 1965.179029] mt798x-wmac 18000000.wifi: Message 00005aed (seq 6) timeout
[ 1985.637032] mt798x-wmac 18000000.wifi: Message 000026ed (seq 7) timeout
[ 2006.094603] mt798x-wmac 18000000.wifi: Message 00005aed (seq 8) timeout
[ 2026.553350] mt798x-wmac 18000000.wifi: Message 000026ed (seq 9) timeout
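
The spacing between these timeouts may be worth a look as well. Below is a small Python sketch that extracts the kernel timestamps and prints the gap between consecutive timeout lines. The helper name `timeout_gaps` is made up for illustration and the regular expression assumes the dmesg format shown above; what a regular gap would mean for the driver or firmware is not something this thread establishes.

```python
import re

# Captures the kernel timestamp of each timeout line, e.g. "[ 1842.429689]".
STAMP_RE = re.compile(
    r"\[\s*(\d+\.\d+)\] mt798x-wmac \S+: Message [0-9a-f]{8} \(seq \d+\) timeout"
)


def timeout_gaps(log_text):
    """Return the seconds elapsed between consecutive timeout lines."""
    stamps = [float(m.group(1)) for m in STAMP_RE.finditer(log_text)]
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]


# Three consecutive lines copied from the log above.
sample = """\
[ 1842.429689] mt798x-wmac 18000000.wifi: Message 000026ed (seq 15) timeout
[ 1862.888429] mt798x-wmac 18000000.wifi: Message 00005aed (seq 1) timeout
[ 1883.346054] mt798x-wmac 18000000.wifi: Message 000800c4 (seq 2) timeout
"""

for gap in timeout_gaps(sample):
    print(f"{gap:.2f} s")
```

For the three sample lines both gaps come out at roughly 20.46 seconds.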

Update 2:
I'm still facing this issue. Not sure if anyone has made any headway in nailing down the root cause as of yet.

@nbd Is this one you happen to have on your radar? Thanks!

I'm also experiencing crashes with multi-psk on MT7981.

  • when connecting to the network using the main WPA2 PSK, everything is stable.
  • when connecting to the network using any of the secondary PSKs, everything works until a station starts sending traffic to the AP
  • then the chip hangs, and a combination of 00005aed and 000026ed timeouts happens

When using an OpenWrt snapshot without any patches to the mt76 driver, the chip completely restarts on its own and the wifi network reappears in a couple of seconds. All clients, including ones connected via the main PSK, get disconnected.

Then I tried the rany2/openwrt@18cc739 patch: the 0x5a messages stop appearing, but the chip still hangs, and the driver shows a 0x26 timeout and restarts.

I then tried building the rany2/openwrt fork. Since it applies a bunch of patches, the chip manages to recover from a hang without disconnecting clients, but the log shows the following:

[  447.275349] mt798x-wmac 18000000.wifi: send message 000130ed timeout, try again(1).
[  447.283349] mt798x-wmac 18000000.wifi: 
[  447.283349] phy0 L1 SER recovery completed.
[  447.821897] mt798x-wmac 18000000.wifi: phy0 SER recovery state: 0x00000004
[  447.828811] mt798x-wmac 18000000.wifi: 
[  447.828811] phy0 L1 SER recovery start.
[  447.837695] mt798x-wmac 18000000.wifi: phy0 SER recovery state: 0x00000008
[  447.854270] mt798x-wmac 18000000.wifi: phy0 SER recovery state: 0x00000010
[  447.861219] mt798x-wmac 18000000.wifi: phy0 SER recovery state: 0x00000020
[  447.868360] mt798x-wmac 18000000.wifi: 
[  447.868360] phy0 L1 SER recovery completed.

I'm assuming that 0x000130ed is message type 0x30 MCU_EXT_CMD_GET_TX_STAT.
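
The way the IDs are being read here (0x5a, 0x26, 0x30) amounts to taking the second-lowest byte of the logged value. A minimal Python sketch of that reading follows. The field names and the function `split_message_id` are made up for illustration, and the byte layout is an assumption inferred from the IDs in this thread, not something checked against the mt76 source.

```python
def split_message_id(message_hex):
    """Split a logged message ID into the byte fields assumed in this thread."""
    value = int(message_hex, 16)
    return {
        "low_byte": value & 0xFF,          # 0xed in most IDs logged here
        "type_byte": (value >> 8) & 0xFF,  # the 0x5a / 0x26 / 0x30 part
        "upper_bits": value >> 16,         # whatever sits above the two bytes
    }


for message_id in ("00005aed", "000026ed", "000130ed"):
    fields = split_message_id(message_id)
    print(message_id, hex(fields["type_byte"]))
# prints:
# 00005aed 0x5a
# 000026ed 0x26
# 000130ed 0x30
```

Mapping the type byte to a command name such as MCU_EXT_CMD_GET_TX_STAT would still need to be confirmed against the driver headers.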

The same setup works on MT7613, MT7612, MT7615, MT7603, and the client-optimized MT7921k (n, ac, ax), and it appears not to hang on MT7975 (Asus RT-AX53U) even though that uses the same mt7915e module.

I'm therefore assuming that this is a firmware bug. I tried all five firmware versions published on mtk-feeds and the behaviour is similar with all of them, but the crashes don't happen as often with the latest firmware.

If possible, can someone explain to me what the difference is between stations connected to the main AP interface and ones connected to an AP_VLAN interface? The keys are different, but why would that cause a crash?