Users needed to test Wi-Fi stability on Linksys WRT3200ACM & WRT32X on OpenWrt 21.02

I can confirm I'm having the same issue with a Samsung Note 10+ and a WRT1900AC, running OpenWrt 21.02.0.
The phone is connected until I start doing something, like watching YouTube. It then loses connection (Wi-Fi icon with an exclamation mark). If I don't do anything, it sits there for about 5 minutes, then it gets a connection again. I can also manually disconnect and reconnect to fix it, but it's super annoying doing this several times per hour.

Logs don't say much, except for these lines, of which there are plenty, also for other devices:

```
Tue Oct  5 14:01:31 2021 kern.debug kernel: [428683.276860] ieee80211 phy1: Mac80211 start BA d6:ff:e1:bd:65:ba
Tue Oct  5 14:01:31 2021 kern.debug kernel: [428683.333366] ieee80211 phy1: Mac80211 start BA d6:ff:e1:bd:65:ba
Tue Oct  5 14:01:31 2021 kern.debug kernel: [428683.388916] ieee80211 phy1: Mac80211 start BA d6:ff:e1:bd:65:ba
Tue Oct  5 14:01:31 2021 kern.debug kernel: [428683.448859] ieee80211 phy1: Mac80211 start BA d6:ff:e1:bd:65:ba
Tue Oct  5 14:01:43 2021 kern.debug kernel: [428695.333484] ieee80211 phy1: Mac80211 start BA d6:ff:e1:bd:65:ba
```

There is no new entry when the phone finally gets internet again.
I was led to believe these lines are not problematic.
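Since these lines also show up for other devices, one quick check is to tally them per client and see whether the affected phone stands out. A minimal sketch, assuming the message format shown above (the MAC address is the last field) and OpenWrt's standard `logread`; the function name is made up for illustration:

```shell
# Tally "Mac80211 start BA" kernel messages per station.
# Reads log text on stdin and assumes the MAC address is the last field,
# as in the log excerpt above. Busiest stations are listed first.
count_ba_by_station() {
  grep 'Mac80211 start BA' |
    awk '{ print $NF }' |
    sort |
    uniq -c |
    sort -rn
}

# On the router:
#   logread | count_ba_by_station
```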

I have tried disabling "Allow AP mode to disconnect STAs based on low ACK condition", but that doesn't improve things.

Should I use sysupgrade or factory when going back to 19.07? I tried factory, but the device didn't come online; luckily it has dual boot.

Use factory, but make sure that you uncheck the "keep settings" option.
(You will likely need to use the "force" option to get past the image check.)

Factory is needed because of the kernel partition size change (for the WRT1900AC "Mamba").
The DSA change makes old configs incompatible, which is why "keep settings" must be unchecked.
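For anyone flashing from an SSH shell instead of LuCI, sysupgrade's `-n` flag corresponds to unchecking "keep settings" and `-F` to the "force" option. This is a hedged sketch, not a tested recovery procedure: the helper name and image path are placeholders, and the following post reports a WRT1900AC that did not come back from a factory flash, so confirm which image your device needs first.

```shell
# Hypothetical helper: flash an image from the shell without keeping settings.
#   -n  do not save the current configuration ("keep settings" unchecked)
#   -F  flash even if the image check fails ("force"); use with care
flash_clean() {
  image=$1
  if [ ! -f "$image" ]; then
    echo "image not found: $image" >&2
    return 1
  fi
  sysupgrade -n -F "$image"
}

# Usage on the router (the path is a placeholder):
#   flash_clean /tmp/openwrt-image.bin
```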

Sorry, I had removed my question in the meantime. I was asking whether I should use factory or sysupgrade to downgrade back to 19.07. Factory failed; the WRT1900AC never came back alive. So I flashed with sysupgrade, and that worked perfectly. Finally, stable Wi-Fi again.

I've spent days looking through past commits within the timeframe of these wireless issues but I have been unsuccessful.

My most recent theory was that it may have had something to do with the mitigations for FragAttacks (mac80211: backport upstream fixes for FragAttacks · openwrt/openwrt@025bd93 (github.com)), partly because of the timing, but also because it touches a variety of things in mac80211, such as A-MSDU frames.

However, after some more time looking into it, I realized that 21.02.0 RC1 was compiled 2-3 weeks prior to the FragAttack mitigations commit. So unfortunately that theory did not pan out.

I also compared many commits between the master and 21.02 branches within specific time frames, looking for commits that were relevant and that hit both branches around the same time.

I could not find any other relevant commits. I'm wondering if it could be within a kernel bump that went into both 5.4 and 5.10 kernels. But this goes far beyond my skills and understanding.

Have you tried this?

Have we confirmed that the issue happened in 21.02 RC1? I personally noticed it with RC3/4, but I don't know when other reports started, i.e. before or after RC1.

Yes, unfortunately it was in RC1. I personally experienced it in RC1 and there were a handful of other users who first experienced it in that release.

I am frustrated that I don't have the skills to dig much deeper into this. Also, I find it interesting that some users say that there is no log entry when the devices reconnect.

There is also: https://github.com/kaloz/mwlwifi#monitor-interface-for-debug

But that is something that I am not very familiar with on Linux.
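For reference, a monitor interface can generally be created with the standard `iw` tool and captured with `tcpdump`. This is a sketch using generic commands, not necessarily the exact steps in the mwlwifi README; it assumes the radio is `phy1` (as in the log lines earlier in the thread), that the name `mon0` is free, and that `tcpdump` is installed. The helper names are hypothetical.

```shell
# Add a monitor-mode interface on a radio and bring it up.
start_monitor() {
  phy=${1:-phy1}
  mon=${2:-mon0}
  iw phy "$phy" interface add "$mon" type monitor &&
    ip link set "$mon" up
}

# Write captured frames to a pcap file for later analysis (e.g. in Wireshark).
capture_monitor() {
  mon=${1:-mon0}
  out=${2:-/tmp/mon.pcap}
  tcpdump -i "$mon" -w "$out"
}

# Remove the monitor interface again.
stop_monitor() {
  mon=${1:-mon0}
  iw dev "$mon" del
}
```

On a router, /tmp lives in RAM, so keep captures short or point the output at external storage.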


Thanks for confirming. I'll look into the commit range between David's build and RC1. That should give me an idea of how broad a range we're dealing with.

Then I can start bisecting and create some test builds.


You're welcome. And thank you for taking the time to do this more thorough work. I appreciate it. If there's anything I can do or test, let me know.

Okay, by analyzing the commits between Davidc502's last build and 21.02 RC1, it looks like we have a range of 2704 commits.

Specifically:

  • David's last build (r13342) - e35e40ad824eab9d51cdd690fb747e576e01412f
  • 21.02's RC1 (r16046) - 59980f7aaf585e65f87730b2f77d55662c362f22

I calculated 2704 commits using the following command:

```
git log e35e40ad824eab9d51cdd690fb747e576e01412f..59980f7aaf585e65f87730b2f77d55662c362f22 --pretty=oneline | wc -l
```

Which returned 2704.

If my math is correct, it will take ~11 (maybe 12) builds to bisect properly. We're doing a binary search, so it's basically the number of times we divide the range by 2 to get to ~1 commit.
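The estimate checks out: each build's good/bad verdict roughly halves the remaining range, so the worst case is the number of halvings needed to get from 2704 commits down to one. A small shell sketch of that arithmetic (the function name is made up), with the corresponding `git bisect` commands (known-bad commit first, then known-good) shown as comments:

```shell
# Worst-case number of test builds for a bisection over N candidate commits:
# each good/bad verdict keeps at most the larger half of the range.
bisect_steps() {
  n=$1
  steps=0
  while [ "$n" -gt 1 ]; do
    n=$(( (n + 1) / 2 ))
    steps=$(( steps + 1 ))
  done
  echo "$steps"
}

bisect_steps 2704   # prints 12, since 2^11 = 2048 < 2704 <= 4096 = 2^12

# git can track the search itself (known-bad commit first, then known-good):
#   git bisect start 59980f7aaf585e65f87730b2f77d55662c362f22 e35e40ad824eab9d51cdd690fb747e576e01412f
#   (build and flash the checked-out commit, test it, then record the result)
#   git bisect good      # or: git bisect bad
#   git bisect reset     # when done
```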

I'm going to try making a test build with the commit used in David's last build to confirm that the issue does not show up there. (I really hope it doesn't, otherwise I'll be at a loss).

Once I can confirm that build works, then we can start bisecting properly.

Edit: I ran into the first roadblock. When attempting to build the image, I'm seeing a lot of errors like the following:

```
Build dependency: Please install the GNU C Compiler (gcc) 4.8 or later
Build dependency: Please install the GNU C++ Compiler (g++) 4.8 or later

Prerequisite check failed. Use FORCE=1 to override.
make: *** [/home/user/src/openwrt-bisect/include/toplevel.mk:174: staging_dir/host/.prereq-build] Error 1
```

I suspect this is because the compiler on this machine is too new, specifically Arch Linux with GCC 11.1.0.
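One way to sanity-check that suspicion: compared as version numbers, 11.1.0 is well past the 4.8 minimum the message asks for, so the failure probably comes from the prerequisite check not recognizing the newer version string rather than from the compiler actually being too old. That is an assumption, not verified against the check at that commit. A sketch of a numeric comparison, assuming GNU `sort -V` is available (the function name is hypothetical):

```shell
# Succeed if version $1 is at least version $2. GNU sort -V compares dotted
# numbers field by field, so 11.1.0 sorts after 4.8.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

if have=$(gcc -dumpversion 2>/dev/null); then
  if version_ge "$have" 4.8; then
    echo "gcc $have meets the 4.8 minimum"
  else
    echo "gcc $have is older than 4.8"
  fi
else
  echo "gcc not found"
fi
```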

Any tips from the community? Might be worth spinning up a virtual machine with a version of Ubuntu that was out around this time.


@adworacz Unfortunately I do not have experience with OpenWrt build environments.

However, I have been trying to think of other possibilities in the meantime while digging into the commit history for mwlwifi. If we cannot currently find the change within OpenWrt that triggers the Wi-Fi dropouts on mwlwifi, perhaps we can try some manual changes to the mwlwifi code as a temporary workaround.

I recall BrainSlayer stating that the timeout issues mean that the wireless chipset has crashed. Reference: https://github.com/kaloz/mwlwifi/issues/389#issuecomment-850980368

So I assume that when the chipset crashes, it takes a minute or so before it comes back up.

mwlwifi seems to have a history of timeout problems (Search · timeout (github.com)), with the timeout value being changed over the course of several commits. I believe this commit sets the current timeout value: Timeout prevent · kaloz/mwlwifi@107aa01 (github.com)

Also, possibly manual changes to AMSDU_FW_MAX_SIZE.
Reference: mwlwifi: Increase A-MSDU to 7.9K to increase interoperability with In… · eduperez/mwlwifi_LEDE@93c3d8d (github.com)
Reference: https://downloads.linksys.com/downloads/releasenotes/WRT32X_Customer_Release_Notes_1.0.180404.58.txt

mwlwifi has been quite flaky and brittle with A-MSDU over its history. The full 7.9K may not be necessary to fix the issue; a value somewhere in between might be enough.

I wish that it was easier or possible to get more detailed debug messages from the device itself to determine what exactly happens the moment before the issue occurs so that we could figure out what is triggering it.

I actually had a bit of a breakthrough a few minutes ago.

By using a virtual machine (Lubuntu 21.04 Hirsute Hippo), I think I can compile the same commit as Davidc502's last build. I have to wrestle with some hard drive sizing issues first, but things were looking a LOT better than when I was building on my Arch Linux host.

I'll try getting a working VM setup in the next few days to ensure I can build properly, but I saw definite progress today.

I'm glad that you have found a likely resolution to your build environment for building from those old commits.

I found something interesting today that we might be able to test. I know that a bunch of users tested setting the 802.11w feature to Disabled (instead of Optional) as a possible workaround for these Wi-Fi cutouts. A couple of users stated that it helped them, but the majority stated that it did not resolve their Wi-Fi cutouts.

I found a commit today suggesting that users who set 802.11w to Disabled via LuCI may actually have been removing the option line from the wireless config file, so the drivers may have been defaulting to Optional. This commit landed 17 days ago and therefore would not be included in 21.02 builds yet.

Commit: luci-mod-network: fix disabling 11w MFP for WPA3 · openwrt/luci@0b49ed4 (github.com)

Reference: Option "802.11w Management Frame Protection" in OpenWrt 21.02 on Linksys EA4500 (kirkwood) · Issue #5431 · openwrt/luci (github.com)

We could potentially test by manually adding option ieee80211w '0' to the wireless config in the relevant section(s). This is just a theory at the moment.
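From the command line, the option can be set and read back in one go, which also answers whether the value actually saved. A sketch, assuming the wifi-iface sections are named `default_radio0` and `default_radio1` (the usual defaults, but check yours with `uci show wireless`); the helper name is hypothetical:

```shell
# Explicitly write ieee80211w=0 (802.11w disabled) into each given wifi-iface
# section, commit, and print what UCI now holds for each section.
disable_mfp() {
  for section in "$@"; do
    uci set "wireless.${section}.ieee80211w=0"
  done
  uci commit wireless
  for section in "$@"; do
    # -q: stay quiet (and fail) if the option is absent from the config
    value=$(uci -q get "wireless.${section}.ieee80211w") || value="(not set, so the default applies)"
    echo "${section}: ieee80211w=${value}"
  done
}
```

Usage on the router: `disable_mfp default_radio0 default_radio1`, then apply the change with `wifi reload`.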

I recall that when I tested the suggestion to set 802.11w to Disabled, I set it via LuCI and the wireless cutouts continued. However, I did not look into the wireless config file at the time to verify whether the setting was saved correctly. I personally won't be able to test this until the weekend, though.

So unfortunately option ieee80211w '0' was not helpful in the end. I did a clean install of 21.02.0 on my WRT3200ACM and manually added that option for both radios. Rebooted router, etc.

I had great success for an hour or more with just my laptop. Once I connected a couple of iPhones, that is when everything became a disaster fast. The wifi cutouts were frequent and painful. They all showed internet access, but would not navigate anywhere during those cutouts. As many users said, toggling the wifi on/off would temporarily fix it. But not for long.

I am back at square one again with the WRT3200ACM and the 21.02.x series. Thanks to the ability to reboot into the other partition, I am back on David's last build, and things are stable there for now. I will keep the 21.02.0 partition for testing when there are more ideas.

It seems that, for the most part, iPhones and iPads are the biggest trigger for this issue. Unfortunately, I have 4 iPhones and 3 iPads on my network. I also have a few wireless security cameras, but they don't seem to trigger the issue.

EDIT: I also wanted to note that I did not have any Timeout related lines in my syslog or kernel log. As a matter of fact, there was nothing in the logs that was relevant to the issue during the time frames that the issues were occurring. So frustrating having zero details.

It might be interesting to debug network traffic on the client device to get an idea of what is happening, but I don't know whether that is possible on iOS devices. It doesn't seem to happen to my Windows laptop.


Thanks for the update, @WildByDesign. TBH, I'm not terribly surprised. I suspected the 802.11w feature had more of an impact on WPA3 than anything else.

I've got builds working with my VM setup. My next step is to do a rough diff against David's build output to see what settings he used (it looks like he used the testing kernel, 5.4, at the very least), and then I'll produce a test build from there. I'm not sure I'll produce an exact replica of his build, as he included a lot of packages that I don't personally use, but I can add things that testers might need.
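For that rough diff, comparing only the explicitly set `CONFIG_` lines of two config files is usually enough to spot choices like the testing kernel. A sketch, assuming both configs are available as plain `.config`-style files (official builds publish theirs as `config.buildinfo`; whether David's builds include one is an assumption). The helper name is hypothetical; the OpenWrt tree also ships `scripts/diffconfig.sh` for reducing a `.config` to its non-default options.

```shell
# Show CONFIG_ lines present in only one of two .config-style files.
# Lines only in the first file are prefixed "- ", only in the second "+ ".
# "# CONFIG_... is not set" lines are ignored; only explicit settings count.
config_diff() {
  a=$(mktemp) b=$(mktemp)
  grep '^CONFIG_' "$1" | sort > "$a"
  grep '^CONFIG_' "$2" | sort > "$b"
  comm -23 "$a" "$b" | sed 's/^/- /'
  comm -13 "$a" "$b" | sed 's/^/+ /'
  rm -f "$a" "$b"
}

# Usage: config_diff config.buildinfo .config
```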

From there we'll start testing step by step.

OpenWrt on the WRT3200ACM is so disappointing for a device that claims to be a first-class OpenWrt supporter.
I just upgraded to 21.02 yesterday and had the same Wi-Fi problems. I ended up downgrading to 19.07. I am just wondering: how can you release a major version with such an issue? I wasted nearly 4 hours investigating the problem, searching for configuration issues, problems while flashing, etc.
Currently I am considering replacing my router's Wi-Fi functionality with dedicated hardware, as problems between Linksys and OpenWrt never seem to get solved. -.-

Because everyone has been waiting for you to address the shortcomings of the proprietary, closed source upstream BLOB.


@NotAnExpert_yet , like what anomeome is saying, Linksys has dropped support for these and hasn't made important parts open source, so, if those parts happen to be the source of the problem, it can make it much more difficult to fix. My only pushback on this situation is that key devs here must not be testing this series of Linksys routers very thoroughly at this point; that, or they just didn't want this issue to hold up release. I do get that open source relies on volunteers to test and fix things. Is this series of Linksys routers just not used by very many devs here anymore?

I had some success last night while following another theory for a temporary workaround. Since I am not a developer, I cannot fix code or understand debugging and such.

I realized that these constant, random wifi cutouts with 21.02.x series seem to only affect WRT3200ACM (Rango) and WRT32X (Venom), both of which use 88W8964 chipset while the older WRT series routers use different chipsets.

Over the years of following mwlwifi development, I have actually collected (and kept) a rather large amount of 88W8964 firmware blobs. So I figured, why not compile a list of the firmware blobs that I have and test a bunch of them to figure out if any of them do not contain the issue that is causing wifi cutouts.

I have a device (iPhone Xr) in which I can reproduce the wifi cutouts consistently and always within a 10 minute period of time.

```
Firmware    Result
9.3.2.12    fail
9.3.2.6     fail
9.3.2.5     untested
9.3.2.4     untested
9.3.2.3     untested
9.3.2.2     untested
9.3.2.1     untested
9.3.1.2     currently testing; so far, so good
9.3.0.8     untested
9.3.0.7     untested
9.3.0.6     untested
```

I have also been checking details of some of these firmware blobs in Linksys' official release notes for the WRT3200ACM and WRT32X.

Link: https://downloads.linksys.com/support/assets/releasenotes/WRT3200ACM_Customer_Release_Notes_1.0.8.199531.txt
Link: https://downloads.linksys.com/downloads/releasenotes/WRT32X_Customer_Release_Notes_1.0.180404.58.txt

My plan was to jump back several firmware blob versions, find a firmware blob that does not contain the issue, and work my way back closer to the newest possible firmware blob that does not cause wifi cutouts.
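The walk-forward plan can also be run as a binary search over the same list: always test the untested version halfway between the newest known-good and the oldest known-failing one. Treating 9.3.1.2 as good and 9.3.2.6 as failing, five versions remain in between, so at most three more test sessions would pin down the boundary. A sketch, assuming that version-number order matches release order and that GNU `sort -V` is available; the helper name is hypothetical:

```shell
# Suggest the next firmware to test.
# Arguments: newest version known to work, oldest version known to fail,
# then every candidate version (any order). Prints the untested version
# halfway between the two bounds, or nothing if they are adjacent.
next_firmware() {
  good=$1 bad=$2
  shift 2
  printf '%s\n' "$@" | sort -V | awk -v g="$good" -v b="$bad" '
    $0 == g { inside = 1; next }   # start after the known-good version
    $0 == b { inside = 0 }         # stop at the known-failing version
    inside  { between[++n] = $0 }
    END { if (n > 0) print between[int((n + 1) / 2)] }'
}
```

With the list above, `next_firmware 9.3.1.2 9.3.2.6` plus all eleven versions suggests 9.3.2.3; a pass then makes it the new good bound, a failure the new failing bound, for the next call.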

My very first jump back in versions, 9.3.1.2, seems to already be a success for not being able to reproduce the wifi cutouts. I put in about 3 hours of stress testing that firmware. Rebooted the iPhone Xr a bunch of times, added in my Ultrabook that has an Intel wifi chipset that is known to be problematic with mwlwifi. I had video streaming on both constantly.

Zero wifi cutouts. Zero. Fingers crossed.

Tonight I think that I will test 9.3.2.2 and see where it goes from there.

The reality is, the further back we go, the more likely we are to introduce other issues. So I want to find the most recent firmware blob that does not trigger the wifi cutouts.

In some cases, we may have to trade one issue for another and decide for ourselves which issue(s) we can live with and which we cannot. Constant, random wifi cutouts that require toggling wifi on/off to resolve is definitely not something that I can live with. But I also don't want to be stuck on older OpenWrt releases.
