Users needed to test Wi-Fi stability on Linksys WRT3200ACM & WRT32X on OpenWrt 21.02

Oh, right - yeah - a tasmota update some time back made it good

Don't give up quite yet. I still hold out hope that if it worked in 19.07, then someone will figure out how to get it to work with a patch to 21.02.

I was thinking about this recently.

Has anyone attempted a bisect between David's last build (which can be found here) and 21.02 to find the offending commit where things broke?

I'm not averse to doing this myself, but I wanted to ask in case anyone has narrowed it down to some range of commits?

@adworacz That is a great idea. I do not have the skills to do the bisecting, but I do recall the timeline of events quite clearly from the forum itself. Hopefully pinpointing these timeframes may be beneficial for anyone capable of bisecting.

May 14:

First comments on the Divested-WRT thread regarding wifi cutouts; there were also reports in the 21.02 RC1 thread in the same time frame. That means it hit the 5.4 and 5.10 kernels at the same time, because Divested-WRT was on 5.10 at the time, if I recall correctly.

Divested-WRT: No-nonsense hardened builds for Linksys WRT series - Community Builds, Projects & Packages - OpenWrt Forum

Divested-WRT: No-nonsense hardened builds for Linksys WRT series - Community Builds, Projects & Packages - OpenWrt Forum

Those were just the first two, but of course many more were to follow.

Two days prior, 20210512-00 release of Divested came out.

According to the changelog (https://divested.dev/unofficial-openwrt-builds/mvebu-linksys/CHANGELOG.txt):

20210512-00
- update to 6713fe030fca32fc3d5ad9761f3b2f96501aedd6
- [upstream] mitigations for https://www.fragattacks.com

Unless those users were quick to pick up 20210514-00:

20210514-00
- update to 7fea9d9f5dd282a7049d77cc6b75e0a703ead26c
- [upstream] update to kernel 5.10.37 (security and bug fixes)

Regarding 21.02.0-rc1, we know that was released on 26 April 2021 officially. I will dig into the 21.02.0-rc1 thread later for relevant comments.

I found this comment from April 11th interesting:
Divested-WRT: No-nonsense hardened builds for Linksys WRT series - Community Builds, Projects & Packages - OpenWrt Forum

A significant DSA roaming fix went into 21.02 branch yesterday for the MV88 switches in all mvebu routers (along with the new wireguard). It's in the new snapshots and will presumably be in this next build here. I wonder if that'll help some of these less common network setups people have here using external switches. Some really nice polishing is happening on 21.02 branch and it's starting to look like 21.02 will be a very solid release once its done.

My main purpose right now is narrowing down the timeline based simply on comments from this forum. With the timeline, hopefully it can help narrow down which commit may have caused these issues or assist in bisecting if necessary.

I will dig in some more later when I have more time.

@mmortal03 Good call on not giving up on this yet. Your post gave me determination and energy to do some more research into this issue.

I don't know if this commit is the cause or not, but I'm putting it here so that I don't lose it. It's the commit for kernel: DSA roaming fix for Marvell mv88e6xxx. Link: kernel: DSA roaming fix for Marvell mv88e6xxx · openwrt/openwrt@f1158fb (github.com)

Also, I have 21.02-SNAPSHOT downloads from 2021-03-06 and 2021-03-17 that I never ended up testing. I ended up waiting until RC1. However, those builds may narrow things down further since they may have been built before the commit was introduced that caused the wifi cutout issues.

The only problem is that I don't know if I will have time to test those builds out before next weekend to determine if I can reproduce the issue.

Thanks for the deep dive @WildByDesign!

What little information I can dig up from David's builds, his last build included changes from May 17th, 2020. If the timestamps are to be believed, he posted his last build r13342 on May 23rd, 2020.

So that may be our start date that we can bisect forward from. Ideally we identify when these issues first started cropping up, which WildByDesign has already started to do. That will give us a potential end date to bisect backwards from.

Once we have that range, it should be just good ol' binary search to nail that down.

I can provide test builds as necessary.

Edit: I believe this is David's last post with the announcement for r13342: Davidc502- wrt1200ac wrt1900acx wrt3200acm wrt32x builds - #4925 by davidc502

Edit2: Saving this for later, but this will simplify building specific hashes, and keeping feeds in sync:

I can confirm I'm having the same issue with a Samsung Note 10+ and a WRT1900AC, running OpenWrt 21.02.0.
The phone is connected until I start doing something, like watching YouTube. It then loses connection (wifi icon with exclamation mark). If I don't do anything, it sits there for about 5 minutes, then it gets a connection again. I can manually disconnect and reconnect to fix it as well, but it's super annoying doing this several times per hour.

Logs don't say much, except for these lines, of which there are plenty, also for other devices:

Tue Oct  5 14:01:31 2021 kern.debug kernel: [428683.276860] ieee80211 phy1: Mac80211 start BA d6:ff:e1:bd:65:ba
Tue Oct  5 14:01:31 2021 kern.debug kernel: [428683.333366] ieee80211 phy1: Mac80211 start BA d6:ff:e1:bd:65:ba
Tue Oct  5 14:01:31 2021 kern.debug kernel: [428683.388916] ieee80211 phy1: Mac80211 start BA d6:ff:e1:bd:65:ba
Tue Oct  5 14:01:31 2021 kern.debug kernel: [428683.448859] ieee80211 phy1: Mac80211 start BA d6:ff:e1:bd:65:ba
Tue Oct  5 14:01:43 2021 kern.debug kernel: [428695.333484] ieee80211 phy1: Mac80211 start BA d6:ff:e1:bd:65:ba

There is no new entry when the phone finally gets internet again.
I was led to believe these lines are not problematic.
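
Since these lines show up for several devices, a per-client tally might make it easier to see whether the affected phone stands out. This is only a minimal sketch, assuming the log format quoted above (MAC address as the last field); `count_start_ba` is a hypothetical helper name, and on the router the input would come from `logread`.

```shell
# Tally "Mac80211 start BA" kernel messages per client MAC address.
# Assumption: the MAC is the last whitespace-separated field, as in the excerpt above.
count_start_ba() {
  awk '/Mac80211 start BA/ { n[$NF]++ } END { for (mac in n) print mac, n[mac] }'
}

# Example with two of the lines quoted above.
# On the router this would be: logread | count_start_ba
count_start_ba <<'EOF'
Tue Oct  5 14:01:31 2021 kern.debug kernel: [428683.276860] ieee80211 phy1: Mac80211 start BA d6:ff:e1:bd:65:ba
Tue Oct  5 14:01:43 2021 kern.debug kernel: [428695.333484] ieee80211 phy1: Mac80211 start BA d6:ff:e1:bd:65:ba
EOF
```

Each output line is a MAC address followed by its message count, so a client that triggers far more of these than the others would be easy to spot.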

I have tried disabling "Allow AP mode to disconnect STAs based on low ACK condition", but that doesn't improve things.

Should I use sysupgrade or factory when going back to 19.07? I tried factory, but the device didn't come online. Luckily, it has dual boot.

Use factory, but make sure that you uncheck the "keep settings" option.
(You likely need to use the "force" option to overcome image check)

Factory is needed for the kernel partition size change (for WRT1900AC "Mamba").
The config-incompatible DSA change causes the need for not "keep settings".

Sorry, I had removed my question in the meantime. I was asking whether I should use factory or sysupgrade to downgrade back to 19.07. Factory failed, the wrt1900ac never came back alive. So I had to flash with sysupgrade and that worked perfectly. Finally stable wifi again.

I've spent days looking through past commits within the timeframe of these wireless issues but I have been unsuccessful.

My most recent theory was that it may have had something to do with the mitigations for FragAttacks (mac80211: backport upstream fixes for FragAttacks · openwrt/openwrt@025bd93 (github.com)), in particular because of timing but also because it touches a variety of things in mac80211 such as A-MSDU frames and more.

However, after some more time looking into it, I realized that 21.02.0 RC1 was compiled 2-3 weeks prior to the FragAttack mitigations commit. So unfortunately that theory did not pan out.

I was also comparing many commits between master and 21.02 branches within specific timeframes and trying to find commits which were relevant but also commits that hit both branches around the same time frames.

I could not find any other relevant commits. I'm wondering if it could be within a kernel bump that went into both 5.4 and 5.10 kernels. But this goes far beyond my skills and understanding.

Have you tried this?

Have we confirmed that the issue happened in 21.02 RC1? I personally noticed it with RC3/4, but I don't know when other reports started, i.e. after or before RC1.

Yes, unfortunately it was in RC1. I personally experienced it in RC1 and there were a handful of other users who first experienced it in that release.

I am frustrated that I don’t have the skills to dig much deeper into this. Also, I find it interesting that some users say that there is no log entry when the devices reconnect.

There is also: https://github.com/kaloz/mwlwifi#monitor-interface-for-debug

But that is something that I am not very familiar with on Linux.

Thanks for confirming. I’ll look into the commit range between David’s build and RC1. That should give me an idea on how broad of a range we’re dealing with.

Then I can start bisecting and create some test builds.

You're welcome. And thank you for taking your time to do this more thorough work involved in this. I appreciate your time. If there's anything I can do or test, let me know.

Okay, by analyzing the commits between Davidc502's last build, and 21.02 RC1, it looks like we have a range of 2704 commits.

Specifically:

  • David's last build (r13342) - e35e40ad824eab9d51cdd690fb747e576e01412f
  • 21.02's RC1 (r16046) - 59980f7aaf585e65f87730b2f77d55662c362f22

I calculated 2704 commits using the following command:

git log e35e40ad824eab9d51cdd690fb747e576e01412f..59980f7aaf585e65f87730b2f77d55662c362f22 --pretty=oneline | wc -l

Which returned 2704.

If my math is correct, it will take ~11 (maybe 12) builds to bisect properly. We're doing a binary search, so it's basically the number of times we divide the range by 2 to get down to ~1 commit, and log2(2704) ≈ 11.4.
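
For anyone who wants to check that estimate, here is a rough sketch of the arithmetic. `bisect_steps` is a hypothetical helper, and it counts the worst case of a plain binary search; git's own step estimate during a real bisect may differ slightly.

```shell
# Worst-case number of test builds for a binary search over n candidate commits:
# halve the range (rounding up) until a single commit is left.
bisect_steps() {
  n=$1
  steps=0
  while [ "$n" -gt 1 ]; do
    n=$(( (n + 1) / 2 ))
    steps=$(( steps + 1 ))
  done
  echo "$steps"
}

bisect_steps 2704   # prints 12
```

So 12 builds is the upper bound for this range, with 11 being possible if the halving falls favorably.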

I'm going to try making a test build with the commit used in David's last build to confirm that the issue does not show up there. (I really hope it doesn't, otherwise I'll be at a loss).

Once I can confirm that build works, then we can start bisecting properly.
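
For anyone following along who has not used git bisect before, here is a toy sketch of the workflow in a throwaway repository. Everything in it is made up for illustration (16 commits, a `status` file, breakage introduced at commit 11, the `demo_bisect` name); it is not the OpenWrt tree, and it uses `git bisect run` only because the toy "test" can be scripted.

```shell
# Toy demonstration of git bisect in a temporary repository.
# Assumptions: the commit count, the "status" file and the breakage at
# commit 11 are invented for this example.
demo_bisect() (
  set -e
  repo=$(mktemp -d)
  cd "$repo"
  git init -q .
  git config user.email "demo@example.invalid"
  git config user.name "Bisect Demo"
  git config commit.gpgsign false

  i=1
  while [ "$i" -le 16 ]; do
    # Commits 1-10 are "stable", commits 11-16 have "dropouts".
    if [ "$i" -ge 11 ]; then echo dropouts > status; else echo stable > status; fi
    echo "$i" > counter
    git add status counter
    git commit -q -m "commit $i"
    i=$(( i + 1 ))
  done

  good=$(git rev-list --max-parents=0 HEAD)   # root commit, known good
  git bisect start HEAD "$good" >/dev/null    # bad revision first, then good
  # Exit status 0 marks a revision good, 1 marks it bad.
  git bisect run grep -q stable status >/dev/null
  git log -1 --format=%s refs/bisect/bad      # subject of the first bad commit
  cd /
  rm -rf "$repo"
)

demo_bisect   # prints "commit 11"
```

In the OpenWrt tree the equivalent start would presumably be `git bisect start 59980f7aaf585e65f87730b2f77d55662c362f22 e35e40ad824eab9d51cdd690fb747e576e01412f` (assuming the r13342 commit is an ancestor of RC1), with each step marked by hand using `git bisect good` or `git bisect bad` after flashing and testing a build, since the wifi cutouts cannot be checked by a script.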

Edit: Ran into first roadblock. When attempting to build the image, I'm seeing a lot of errors like the following:


Build dependency: Please install the GNU C Compiler (gcc) 4.8 or later
Build dependency: Please install the GNU C++ Compiler (g++) 4.8 or later

Prerequisite check failed. Use FORCE=1 to override.
make: *** [/home/user/src/openwrt-bisect/include/toplevel.mk:174: staging_dir/host/.prereq-build] Error 1

I suspect this is because my compiler setup is too new on this machine, specifically Arch Linux using GCC 11.1.0.

Any tips from the community? Might be worth spinning up a virtual machine with a version of Ubuntu that was out around this time.

@adworacz Unfortunately I do not have experience with OpenWrt build environments.

However, I have been trying to think of other possibilities in the meantime while digging into the commit history for mwlwifi. If we cannot find the issue within OpenWrt at this current moment that is triggering the wifi dropouts on mwlwifi, possibly we can try some manual changes in mwlwifi code as a temporary means for avoiding this issue.

I recall BrainSlayer stating that the timeout issues mean that the wireless chipset has crashed. Reference: https://github.com/kaloz/mwlwifi/issues/389#issuecomment-850980368

So I assume that when the chipset crashes, it takes a minute or so before it comes back up.

mwlwifi seems to have a history of timeouts (Search · timeout (github.com)), in which the timeout value was changed over the course of several commits. I believe this commit refers to the current timeout value: Timeout prevent · kaloz/mwlwifi@107aa01 (github.com)

Also, possibly manual changes to AMSDU_FW_MAX_SIZE.
Reference: mwlwifi: Increase A-MSDU to 7.9K to increase interoperability with In… · eduperez/mwlwifi_LEDE@93c3d8d (github.com)
Reference: https://downloads.linksys.com/downloads/releasenotes/WRT32X_Customer_Release_Notes_1.0.180404.58.txt

mwlwifi has been quite flaky and brittle with AMSDU over its history. Maybe that number of 7.9K might not specifically be necessary to fix the issue, but possibly somewhere in between.

I wish that it was easier or possible to get more detailed debug messages from the device itself to determine what exactly happens the moment before the issue occurs so that we could figure out what is triggering it.

I actually had a bit of a breakthrough a few minutes ago.

By using a virtual machine (Lubuntu 21.04 Hirsute Hippo), I think I can compile the same commit as Davidc502's last build. I have to wrestle with some hard drive sizing issues first, but things were looking a LOT better than when I was building on my host Arch Linux machine.

I'll try getting a working VM setup in the next few days to ensure I can properly build, but I saw definite progress today.

I'm glad that you have found a likely resolution to your build environment for building from those old commits.

I found something interesting today that we might be able to test out. I know that a bunch of users were testing out disabling 802.11w feature (instead of Optional) as a possible workaround to these wifi cutouts. A couple of users stated that it helped them, but the majority of users stated that it failed to resolve their wifi cutouts.

So I found a commit today that suggests that users setting 802.11w specifically to Disabled via LuCI may have actually been removing the option line from the wireless config file, and therefore drivers may have been defaulting to Optional. This commit landed 17 days ago and therefore would not be included in 21.02 builds yet.

Commit: luci-mod-network: fix disabling 11w MFP for WPA3 · openwrt/luci@0b49ed4 (github.com)

Reference: Option "802.11w Management Frame Protection" in OpenWrt 21.02 on Linksys EA4500 (kirkwood) · Issue #5431 · openwrt/luci (github.com)

We could potentially test by manually adding option ieee80211w '0' to the wireless config in the relevant section(s). This is just a theory at the moment.

I recall that when I tested the suggestion to set 802.11w to Disabled, I set it via LuCI and the wireless cutouts still continued. However, I did not look into the wireless config file at the time to verify whether the setting was saved correctly. I personally won't be able to test this until the weekend though.
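
To check whether the setting actually made it into the config file, something like the sketch below might help. It assumes the usual `/etc/config/wireless` layout; `explicit_11w` is a hypothetical helper, and the sample section (name, SSID, encryption) is invented purely to have something to run against.

```shell
# Print every explicit ieee80211w value found in a wireless config file,
# or "unset" if no such option line exists (the case where LuCI dropped it).
explicit_11w() {
  awk '$1 == "option" && $2 == "ieee80211w" { gsub("\047", "", $3); print $3; found = 1 }
       END { if (!found) print "unset" }' "$1"
}

# Hypothetical sample of a wifi-iface section with 802.11w explicitly disabled.
sample=$(mktemp)
cat > "$sample" <<'EOF'
config wifi-iface 'default_radio0'
	option device 'radio0'
	option mode 'ap'
	option ssid 'OpenWrt'
	option encryption 'sae-mixed'
	option ieee80211w '0'
EOF

explicit_11w "$sample"   # prints 0
```

On the router itself this would be run as `explicit_11w /etc/config/wireless`. An output of "unset" after choosing Disabled in LuCI would be consistent with the behavior described in the LuCI issue.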