[FIXED] 21.02.0-rc2 WRT3200ACM wireless is broken

Update: This seems to be fixed by simply disabling 802.11w support.
Network->Wireless->Edit->Wireless Security->802.11w Management Frame Protection = Disabled
It used to be disabled by default in older versions of OpenWrt.
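
For anyone who prefers SSH over LuCI, here is a rough sketch of the same change through UCI. The function name is made up, and the section names `default_radio0` / `default_radio1` are only an assumption about a fresh default config, so check your own names first.

```shell
# Sketch only: disable 802.11w (Management Frame Protection) on the given
# wifi-iface sections. "default_radio0"/"default_radio1" are assumed names;
# list yours with: uci show wireless | grep wifi-iface
disable_pmf() {
    for section in "$@"; do
        # ieee80211w: 0 = disabled, 1 = optional, 2 = required
        uci set "wireless.$section.ieee80211w=0" || return 1
    done
    uci commit wireless
}

# Usage on the router:
#   disable_pmf default_radio0 default_radio1 && wifi reload
```

This should be equivalent to setting the LuCI dropdown to Disabled on each interface.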

I'm upgrading a WRT3200ACM from 18.06.x which worked fine. For the upgrade I wiped the config and started from scratch.

Now I've tried both 21.02.0-rc2 and the latest 21.02-snapshot and the WiFi is broken in both. Wireless devices can connect but no traffic flows. Then after about 5 minutes all LuCI connections hang, SSH connections hang intermittently, and the WiFi stops working completely.

I've tried with and without each of 2.4GHz and 5GHz, and with both WPA2 PSK and WPA2/3 PSK. None of it fixes anything. The only thing that keeps the router stable is to completely disable all wireless interfaces.

I see these errors and other similar "timed out" errors from ieee80211, on both phy0 and phy1:

[  371.099225] ieee80211 phy1: cmd 0x9122=UpdateEncryption timed out
[  371.105360] ieee80211 phy1: return code: 0x1122
[  371.109909] ieee80211 phy1: timeout: 0x1122
[  371.114118] wlan1: failed to remove key (0, 2c:aa:x:x:x:x) from hardware (-5)
[  371.177345] ieee80211 phy1: MACREG_REG_INT_CODE: 0x0000
[  391.202038] ieee80211 phy1: cmd 0x9111=SetNewStation timed out
[  391.207899] ieee80211 phy1: return code: 0x1111
[  391.212459] ieee80211 phy1: timeout: 0x1111
[  391.216904] ieee80211 phy1: MACREG_REG_INT_CODE: 0x0000
[  411.229221] ieee80211 phy1: cmd 0x801d=MEMAddrAccess timed out
[  411.235082] ieee80211 phy1: return code: 0x001d
[  411.239641] ieee80211 phy1: timeout: 0x001d
[  411.243844] ieee80211 phy1: MACREG_REG_INT_CODE: 0x0000
[  431.256765] ieee80211 phy1: cmd 0x801d=MEMAddrAccess timed out
[  431.262626] ieee80211 phy1: return code: 0x001d
[  431.267183] ieee80211 phy1: timeout: 0x001d
[  431.271385] ieee80211 phy1: MACREG_REG_INT_CODE: 0x0000
[  451.285632] ieee80211 phy1: cmd 0x801d=MEMAddrAccess timed out
[  451.291493] ieee80211 phy1: return code: 0x001d
[  451.296052] ieee80211 phy1: timeout: 0x001d
[  451.302452] ieee80211 phy1: MACREG_REG_INT_CODE: 0x0000
[  471.327332] ieee80211 phy1: cmd 0x9122=UpdateEncryption timed out
[  471.333463] ieee80211 phy1: return code: 0x1122
[  471.338012] ieee80211 phy1: timeout: 0x1122
[  471.342220] wlan1: failed to remove key (0, 2c:aa:x:x:x:x) from hardware (-5)
[  471.376791] ieee80211 phy1: MACREG_REG_INT_CODE: 0x0000
[  491.390280] ieee80211 phy1: cmd 0x9111=SetNewStation timed out
[  491.396142] ieee80211 phy1: return code: 0x1111
[  491.400700] ieee80211 phy1: timeout: 0x1111
[  491.433843] ieee80211 phy1: MACREG_REG_INT_CODE: 0x0000
[  511.436915] ieee80211 phy1: cmd 0x801d=MEMAddrAccess timed out
[  511.442775] ieee80211 phy1: return code: 0x001d
[  511.447341] ieee80211 phy1: timeout: 0x001d
[  511.451552] ieee80211 phy1: MACREG_REG_INT_CODE: 0x0000
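
To see at a glance which firmware commands time out and how often, a small tally over the kernel log can help. This is just a sketch: the function name is made up, and the pattern assumes nothing beyond the line format shown in the log above.

```shell
# Count "timed out" firmware commands per phy from kernel log lines on stdin.
# Prints one line per (phy, command) pair: "<phy> <command> <count>".
summarize_timeouts() {
    sed -n 's/.*ieee80211 \(phy[0-9]*\): cmd 0x[0-9a-fA-F]*=\([A-Za-z0-9_]*\) timed out.*/\1 \2/p' \
        | LC_ALL=C sort \
        | uniq -c \
        | awk '{ print $2, $3, $1 }'
}

# Usage on the router:
#   dmesg | summarize_timeouts
```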

Just before these errors appear the router web interface becomes unresponsive, SSH connections will intermittently hang, and the WiFi is completely dead.

I couldn't find anyone else reporting problems except in older versions with mwlwifi issues.

Here is my mwlwifi info:

driver name: mwlwifi
chip type: 88W8964
hw version: 7
driver version: 10.3.8.0-20181210
firmware version: 0x0903020c
power table loaded from dts: no
firmware region code: 0x10
mac address: 60:38:x:x:x:x
2g: disable
5g: enable
antenna: 4 4
irq number: 74
ap macid support: 0000ffff
sta macid support: 00010000
macid used: 00000000
radio: disable
iobase0: 94fee30c
iobase1: 17f898bd
tx limit: 1024
rx limit: 16384

Any hints?

I just tried 19.07.7 and it seems to work OK (using my 18.06 config). Very slight difference in the mwlwifi firmware version:

driver name: mwlwifi
chip type: 88W8964
hw version: 7
driver version: 10.3.8.0-20181210
firmware version: 0x09030206
...
irq number: 49

I'll have to let it run for a while to see if it's really stable, but so far it's doing way better than 21.02.

Is there a firmware even newer than 0x0903020c that might fix 21.02? Where is this firmware stored? I would like to try swapping out the 21.02 firmware for the one in 19.07 to see if that fixes 21.02.

There are many things that could affect WRT3200ACM stability / behaviour.

If this is a software regression, fixing it will require at least finding the OpenWrt commit that caused it.

Possible reasons:

RF environment change

Repositioning the router, moving your laptop, or a new neighbour's WiFi.

Config change

Change of channel number or channel width (20 / 40 / 80 MHz).

mwlwifi driver update

Checking OpenWrt's package diff:

git diff origin/openwrt-19.07..origin/openwrt-21.02 package/kernel/mwlwifi/Makefile

shows a bump from 2019-03-02 (31d9386079b9) to 2020-02-06 (0eda0e774a87). Comparing those commits in @Kaloz's repo shows:

  1. Irrelevant struct wiphy_vendor_command change
  2. 88W8964.bin firmware update

It's worth trying the old firmware (simply put the old one in /lib/firmware/mwlwifi/).
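
A cautious way to do that swap is to keep a copy of the shipped blob first. The sketch below is only an illustration: the function name and the `.orig` backup suffix are made up, and it assumes the WRT3200ACM blob is 88W8964.bin in that directory.

```shell
# Replace the installed 88W8964.bin with another blob, keeping the original.
#   $1 = path to the firmware file to install
#   $2 = firmware directory (defaults to /lib/firmware/mwlwifi)
swap_mwlwifi_fw() {
    new_fw="$1"
    target="${2:-/lib/firmware/mwlwifi}/88W8964.bin"

    [ -f "$new_fw" ] || { echo "no such file: $new_fw" >&2; return 1; }

    # Back up only once, so a second swap never overwrites the shipped blob.
    [ -e "$target.orig" ] || cp "$target" "$target.orig" || return 1

    cp "$new_fw" "$target"
}

# Usage on the router (then reboot, assuming the driver reads the blob at load):
#   swap_mwlwifi_fw /tmp/88W8964.bin
```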

hostapd update

A change in the way hostapd interacts with cfg80211 could potentially affect mwlwifi. As pointed out by Johannes, mwlwifi also does some tricky stuff with IEs.

Kernel update

Some change in the net subsystem, or in cfg80211 in particular, could affect mwlwifi too.

@patrikx3: are you 100% sure it's the switch to kernel 5.3 that caused this regression? That would point to a rather unlikely source of the problem: the generic net subsystem. For cfg80211 we use the backports package.
Did you explicitly test the commit switching mvebu to kernel 5.3? Did you try the commit before it and verify that it works correctly? Did you test both commits long enough to be 100% sure of your results?

We really need very accurate information and careful testing to debug this issue.

Not 100% sure, just a thought.
But since v21 the WiFi is not usable anymore,
whereas from LEDE to OpenWrt v19, with the correct settings, it had no issues at all.

There were over 7000 changes between 17.01.0 and 19.07.0.

> git log --oneline v17.01.0..v19.07.0 | wc -l
7868

We really need proper debugging to get this problem fixed.
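
For scale: bisecting halves the candidate range with every build, so even a range of this size needs only about a dozen build-and-test rounds. A quick sketch of that arithmetic follows. The function name is illustrative, this is the plain halving bound rather than git's own estimate, and commits that have to be skipped add extra rounds.

```shell
# Worst-case number of halving steps needed to isolate one commit out of N.
bisect_steps() {
    n="$1"
    steps=0
    while [ "$n" -gt 1 ]; do
        n=$(( (n + 1) / 2 ))   # keep the larger half, rounding up
        steps=$(( steps + 1 ))
    done
    echo "$steps"
}

# Usage:
#   bisect_steps 7868
```

For the 7868 commits counted above this comes out at 13 rounds.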

Since I am currently still experiencing this same issue with the latest 21.02 RCs on WRT3200 with clean installs, I wanted to share some of my findings. Hopefully we can figure out the source.

  • It is not specific to kernels above 5.3, because Davidc502's latest build with kernel 5.4.41 has worked great and been stable for about a year now.

  • Also, at least in my case, it does not seem to be WiFi specific, because the network is going in and out on wired devices too. They randomly lose internet connectivity for a few seconds. This issue seems to affect both wired and wireless.

  • The commit(s) that caused this issue, which I have not pinpointed yet, seem to have introduced the same network issue on both kernels 5.4 and 5.10 at the same time, because users in the Divested-WRT community build thread started complaining about it around the release of RC1. Before that, both kernels 5.4 and 5.10 had been running great.

  • I checked recent mvebu-specific commits and could not pinpoint the source of the issue. I think the cause is outside of mwlwifi and mvebu, yet it somehow triggers the problem for them.

Anyway, I hope to dig into this some more when I get some time on the weekend. I just wanted to share what I know so far.

I can report that connections were randomly lost on v21.02rc1 on my Linksys WRT1900 ACS v2 (not testing rc2 until a fix is confirmed). There are many other reports of Linksys devices having issues in the rc1 and rc2 forum threads.

Let me know if there's anything I can do to help debugging but I'm a novice so I'll need specific instructions.

I can confirm that the problem does not seem to be the mwlwifi firmware blob. I put the "working" firmware 88W8964.bin from 19.07 (0x09030206) on to a 21.02 install and the problem still exists. Then I put the newer firmware from 21.02 (0x0903020c) on to a 19.07 install and everything works fine. In each case I confirmed that I was running whichever firmware version by looking at /sys/kernel/debug/ieee80211/phy0/mwlwifi/info -> firmware version.

So all the firmware blobs work in 19.07 and none fix 21.02.
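
For anyone repeating this test, the running firmware version can be pulled out of the debugfs info file with a one-liner. A minimal sketch, with a made-up function name; the path is the one mentioned above and the pattern assumes the `firmware version:` line format from the info dumps earlier in the thread.

```shell
# Print the firmware version reported by mwlwifi.
#   $1 = info file (defaults to the phy0 debugfs path)
mwl_fw_version() {
    info="${1:-/sys/kernel/debug/ieee80211/phy0/mwlwifi/info}"
    sed -n 's/^firmware version: *//p' "$info"
}

# Usage on the router:
#   mwl_fw_version
#   mwl_fw_version /sys/kernel/debug/ieee80211/phy1/mwlwifi/info
```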

In a way this is good news because it means the closed-source blob is likely not the problem. Now to find where the problem actually is. :confused:

Wired connections do experience problems but if you disable all wireless then the wired connections work fine. So I believe it's something to do with the wireless.

This is interesting information. I didn’t even think to try this.

Debugging is beyond my understanding. But if some of us can reproduce the issue and share debug logs, hopefully there are some more experienced users who can parse and understand the logs better.

mwlwifi has some debugging capabilities: https://github.com/kaloz/mwlwifi#monitor-interface-for-debug

Would that be beneficial in this circumstance?

Or is there other debugging within OpenWrt that would obtain better details on this issue?

Do not use these community builds as reference points to assume that OpenWrt would have been fine at the corresponding kernel version, as these builds contain quite a few (potentially) relevant changes for mvebu and mwlwifi (e.g. I seem to remember something about tx ampdu being disabled in a custom patch). You really need to test/bisect vanilla OpenWrt here to get meaningful results.

Disclaimer: I don't own any mvebu/mwlwifi hardware (and never did), so I can't aid in debugging.

I am simply building with this mwlwifi Makefile:

#
# Copyright (C) 2014-2016 OpenWrt.org
#
# This is free software, licensed under the GNU General Public License v2.
# See /LICENSE for more information.
#

include $(TOPDIR)/rules.mk

PKG_NAME:=mwlwifi
PKG_RELEASE=1

PKG_LICENSE:=ISC
PKG_LICENSE_FILES:=

PKG_SOURCE_URL:=https://github.com/kaloz/mwlwifi
PKG_SOURCE_PROTO:=git
PKG_SOURCE_DATE:=2019-03-02
PKG_SOURCE_VERSION:=31d9386079b91cc699658c19294e139b62b512bc
PKG_MIRROR_HASH:=7bdd05765d8215a9c293cdcb028d63a04c9e55b337eaac9e8d3659bd86218321

PKG_MAINTAINER:=Imre Kaloz <kaloz@openwrt.org>
PKG_BUILD_PARALLEL:=1

include $(INCLUDE_DIR)/kernel.mk
include $(INCLUDE_DIR)/package.mk

define KernelPackage/mwlwifi
  SUBMENU:=Wireless Drivers
  TITLE:=Marvell 88W8864/88W8897/88W8964/88W8997 wireless driver
  DEPENDS:=+kmod-mac80211 +@DRIVER_11N_SUPPORT +@DRIVER_11AC_SUPPORT +@DRIVER_11W_SUPPORT @PCI_SUPPORT @TARGET_mvebu
  FILES:=$(PKG_BUILD_DIR)/mwlwifi.ko
  AUTOLOAD:=$(call AutoLoad,50,mwlwifi)
endef

NOSTDINC_FLAGS = \
	-I$(PKG_BUILD_DIR) \
	-I$(STAGING_DIR)/usr/include/mac80211-backport/uapi \
	-I$(STAGING_DIR)/usr/include/mac80211-backport \
	-I$(STAGING_DIR)/usr/include/mac80211/uapi \
	-I$(STAGING_DIR)/usr/include/mac80211 \
	-include backport/backport.h

define Build/Compile
	+$(MAKE) $(PKG_JOBS) -C "$(LINUX_DIR)" \
		$(KERNEL_MAKE_FLAGS) \
		SUBDIRS="$(PKG_BUILD_DIR)" \
		NOSTDINC_FLAGS="$(NOSTDINC_FLAGS)" \
		modules
endef

define Package/mwlwifi-firmware-default
  SECTION:=firmware
  CATEGORY:=Firmware
  TITLE:=Marvell $(1) firmware
  DEPENDS:=+kmod-mwlwifi @TARGET_mvebu
endef

define Package/mwlwifi-firmware/install
	$(INSTALL_DIR) $(1)/lib/firmware
	$(INSTALL_DIR) $(1)/lib/firmware/mwlwifi
	$(CP) $(PKG_BUILD_DIR)/bin/firmware/$(2) $(1)/lib/firmware/mwlwifi/
	$(CP) $(PKG_BUILD_DIR)/bin/firmware/Marvell_license.txt $(1)/lib/firmware/mwlwifi/$(2).Marvell_license.txt
endef

define Package/mwlwifi-firmware-88w8864
$(call Package/mwlwifi-firmware-default,88W8864)
endef

define Package/mwlwifi-firmware-88w8864/install
	$(call Package/mwlwifi-firmware/install,$(1),88W8864.bin)
endef

define Package/mwlwifi-firmware-88w8897
$(call Package/mwlwifi-firmware-default,88W8897)
endef

define Package/mwlwifi-firmware-88w8897/install
	$(call Package/mwlwifi-firmware/install,$(1),88W8897.bin)
endef

define Package/mwlwifi-firmware-88w8964
$(call Package/mwlwifi-firmware-default,88W8964)
endef

define Package/mwlwifi-firmware-88w8964/install
	$(call Package/mwlwifi-firmware/install,$(1),88W8964.bin)
endef

define Package/mwlwifi-firmware-88w8997
$(call Package/mwlwifi-firmware-default,88W8997)
endef

define Package/mwlwifi-firmware-88w8997/install
	$(call Package/mwlwifi-firmware/install,$(1),88W8997.bin)
endef

$(eval $(call KernelPackage,mwlwifi))
$(eval $(call BuildPackage,mwlwifi-firmware-88w8864))
$(eval $(call BuildPackage,mwlwifi-firmware-88w8897))
$(eval $(call BuildPackage,mwlwifi-firmware-88w8964))
$(eval $(call BuildPackage,mwlwifi-firmware-88w8997))

This is the last working Makefile, from the latest v19 tag.

Even if we get more info, processing and debugging it is a time-consuming task. Take that info about the wired network becoming stable after disabling wireless: one could try to analyze network status, packets in flight, and firewall state, and maybe find out what's going on. Still, it's a complex and time-consuming task even for an experienced developer.

@WildByDesign: it would be nice if you could test wired network stability with wireless disabled. That may be an important hint once we find regression source.
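
One low-effort way to run that test is to leave a ping loop going from a wired client while wireless is disabled on the router, and log only the drops. This is a sketch under assumptions: the function name is made up, the target address is whatever you normally reach through the router, and `-W 1` is taken to be a one-second reply timeout as on Linux and BusyBox ping.

```shell
# Ping a host once per second and print a timestamp for every lost reply.
#   $1 = host to ping, $2 = number of probes (default 60)
watch_link() {
    host="$1"
    count="${2:-60}"
    i=0
    while [ "$i" -lt "$count" ]; do
        if ! ping -c 1 -W 1 "$host" >/dev/null 2>&1; then
            echo "$(date '+%H:%M:%S') lost $host"
        fi
        i=$(( i + 1 ))
        sleep 1
    done
}

# Usage from a wired client, e.g. for one hour:
#   watch_link 192.168.1.1 3600
```

An empty output after a long run would suggest the wired side is stable on its own.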

Still, finding the first change that broke network stability is the best way to proceed.
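
For those who have not done this before, a bisect of vanilla OpenWrt could start roughly like this. It is only a sketch: the function name is made up, and the default endpoints are assumptions based on this thread (19.07.7 reported working, 21.02.0-rc2 reported broken), using the release tags `v19.07.7` and `v21.02.0-rc2`.

```shell
# Start a bisect inside a clone of openwrt.git.
#   $1 = known-good ref (default v19.07.7)
#   $2 = known-bad ref  (default v21.02.0-rc2)
start_bisect() {
    git bisect start "${2:-v21.02.0-rc2}" "${1:-v19.07.7}"
}

# After each build is flashed and tested long enough, report the result:
#   git bisect good     # this build was stable
#   git bisect bad      # this build showed the problem
#   git bisect skip     # this commit could not be built or booted
# When finished:
#   git bisect reset
```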

I am building this version instead of what is in v21:
https://git.openwrt.org/?p=openwrt/openwrt.git;a=tree;f=package/kernel/mwlwifi;hb=d5ae5658730a82312a20e68220f92f611b11d094

This may or may not be related, but I just noticed that the issues with mvebu/mwlwifi started within days of the implementation of mitigations for FragAttacks. I didn’t follow details on that and therefore don’t recall if those are enabled by default or not. But it is within a couple of days from implementation to when complaints started. Just wanted to mention this before I dig into the other stuff.

@rmilecki Absolutely, I can test the stability of the wired network when all wireless is disabled. I will follow up on this as soon as I get a chance. Likely once the family goes to sleep so that I don’t cause too many distractions.

For me the wired network is perfect; only the WiFi is bad. Over a wired connection I can use RDP, and videos do not get stuck either.

I contacted @davidc502. Maybe if I build the version of mwlwifi that he has, which works on kernel 5.4, it could work on v21 with modified mwlwifi patches and Makefile.

Fixes for FragAttacks were backported in commit 025bd93f36c9 ("mac80211: backport upstream fixes for FragAttacks").

Please build OpenWrt commits:

  1. 025bd93f36c9 ("mac80211: backport upstream fixes for FragAttacks")
  2. 5a9608102b3c ("build: kernel2minor: work around path length limit") (one commit before the FragAttacks fixes)

and let us know how they work on your WRT3200.

I think WiFi is perfect on the Linksys WRT1900ACS. It looks like only the WRT32X and WRT3200ACM are behaving badly.