Users needed to test Wi-Fi stability on Linksys WRT3200ACM & WRT32X on OpenWrt 21.02

A change in upstream mac80211 causing problems for the out-of-tree mwlwifi doesn't necessarily imply that there'd be an actual bug in mainline linux, it could just as well mean that mwlwifi hasn't been properly adapted to a recent kernel change.

Not sure about WildByDesign specifically, but I know others here have tested it and still had the dropouts: Users needed to test Wi-Fi stability on Linksys WRT3200ACM & WRT32X on OpenWrt 21.02.0-rc4 - #299 by alex77

Yup, that was exactly my intention, based on what you found in your testing. I'll create a new build (likely r13922) that contains only the mac80211 5.7 upgrade, so we can see exactly which commit exhibits the issue.

I'm pretty much done for tonight, so I'll look at creating it sometime tomorrow morning most likely.

Edit: r13922 is up! https://openwrt.austindw.com/linksys-wrt/wifi-hang-bisect/r13922/

This build specifically targets this commit:

commit d1100c76b33ff68c6db0f5fa31a26532bdbb15c4 (HEAD)
Author: Hauke Mehrtens <hauke@hauke-m.de>
Date:   Sun Jun 28 14:57:06 2020 +0200

    mac80211: Update to version 5.7.5-1

    The b43 and b43legacy driver now support DRIVER_11W_SUPPORT.

    Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>

@adworacz I just finished testing build r13922 (with mac80211 5.7.5-1 update) and it is a good build. Zero wireless disconnects. Very fast and smooth connection, actually.

I was really worried that build r13922 (with mac80211 5.7.5-1 update) might turn things upside down for our testing and show up with the issue too. But thankfully it did not.

The issue is definitely mwlwifi combined with mac80211 5.8 and 5.10. But the fact that mwlwifi is out-of-tree and currently unmaintained leaves us all in a bad position.

mwlwifi with mac80211 5.7.x is fantastic. But OpenWrt 21.02.x has moved on past mac80211 5.7.x. Unfortunately, I don't know if there is a way to build OpenWrt 21.02.x with mac80211 5.7.x specifically for WRT3200ACM/WRT32X.

You are 100% right with this.

I just wish that it was easier to track down exactly what is happening when it fails.

@hnyman Yes, I have reproduced the issue on 21.02.1 and also on a Master branch build about a week ago.

It is strange how this issue quite literally affects only certain mobile clients. In particular, most of the affected devices are Apple iPhones and iPads. Only a couple of users mentioned Android devices.

Yet, as far as I know, almost all laptops and desktops avoid this issue entirely.

It can still help a lot in potentially identifying the issue.

Another test would be checking 50f456b46cbae27ed13badfe7b2976cd01b67a57 with kernel 5.4 manually selected and mac80211 downgraded to the last known-good "with mac80211 5.7.5-1 update" - with a little luck that compiles (and works).

Just a shot from the hip: 6cd536fe62ef58d7c4eac2da07ab0ed7fd19010d "cfg80211: change internal management frame registration API" might be worth checking in regards to corresponding changes in mwlwifi.

That is a nice idea.
Especially as the commit itself touches Marvell's in-kernel-tree mwifiex driver, and offers an example.

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/net/wireless/marvell/mwifiex?id=6cd536fe62ef58d7c4eac2da07ab0ed7fd19010d

A similar change probably needs to be made in mwlwifi, too.

(PS: That might also explain why WPA3 and optional management frame protection have been a no-go with the WRT3200ACM.)
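For anyone wondering what that API change actually changes: as I read commit 6cd536fe62ef, cfg80211 stopped calling a per-frame-type register/unregister hook and instead hands over a `struct mgmt_frame_regs` whose `interface_stypes` field is a bitmap of every management-frame subtype userspace currently wants, one bit per subtype (`BIT(stype >> 4)`). The sketch below is only a userspace model of that bookkeeping, not kernel code. The `IEEE80211_STYPE_*` values are copied from `linux/ieee80211.h`; `stype_bit`, `stypes_mask`, `stypes_changed` and `stypes_wants` are hypothetical helpers. Since mwlwifi is a mac80211 driver, it would feel this change indirectly through mac80211 rather than implement the cfg80211 callback itself; check the actual commit before relying on any of this.

```c
#include <stddef.h>
#include <stdint.h>

/* Management frame subtypes, values as in linux/ieee80211.h.
 * The subtype occupies bits 4..7 of the frame control field. */
#define IEEE80211_STYPE_PROBE_REQ 0x0040
#define IEEE80211_STYPE_AUTH      0x00B0
#define IEEE80211_STYPE_ACTION    0x00D0

/* One bit per management subtype, i.e. BIT(stype >> 4). */
uint32_t stype_bit(uint16_t stype)
{
	return UINT32_C(1) << (stype >> 4);
}

/* Hypothetical helper: build the whole registration bitmap at once,
 * the way the newer API describes the current state in one update. */
uint32_t stypes_mask(const uint16_t *stypes, size_t n)
{
	uint32_t mask = 0;

	for (size_t i = 0; i < n; i++)
		mask |= stype_bit(stypes[i]);
	return mask;
}

/* Hypothetical helper: which subtypes changed between two updates. */
uint32_t stypes_changed(uint32_t old_mask, uint32_t new_mask)
{
	return old_mask ^ new_mask;
}

/* Hypothetical helper: does a bitmap ask for a given subtype? */
int stypes_wants(uint32_t mask, uint16_t stype)
{
	return (mask & stype_bit(stype)) != 0;
}
```

As far as I can tell, the mwifiex part of the commit follows this pattern: it compares the incoming bitmap with the one it last programmed and only reconfigures the firmware when the two differ.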

This thread is focused on the mwlwifi driver, which is not used by your device; you will probably get more support if you open a separate thread.

Unfortunately, the current mwlwifi driver has no interface to the cfg80211 API. My cursory analysis is that a complete driver rewrite would be necessary to properly upstream it.

Someone may be able to hack it, but I have neither the knowledge nor the documentation to even start.

Judging by the review comments on the attempts to upstream it years ago, the driver itself is a hack.

Yeah, I think that they put lots of the functionality into the closed-source "firmware blob" and only exposed a smaller part of the functionality in the open driver.

All in all, the mwlwifi driver (and thus the mvebu routers, as far as Wi-Fi is concerned) has been a dead duck since 2018, when driver development stopped. And in the whole period from early 2015 onward, there was pretty much just one developer doing driver fixes in the git repo. Somehow they first thought to launch this "popular" open-source series, but then did not want to provide enough resources to keep it viable.

I'm beginning to remember why I stayed on stock firmware on this router. Anyway, I remember some really old advice for OpenWrt on this router: lock the channel to 36 or 40. On 36 I seem to have almost no issues. I didn't remember this until a reboot pushed my channel to 100, and then I started having connection problems with my phone. Locked at 36 now.

This was eventually resolved in 19.x series. Had to do with DFS if I remember correctly.

The DFS issue was fixed, but the Wi-Fi problems have persisted. I can't speak for all the 19.07 builds, but as for the two 21.02 releases so far, there are Wi-Fi problems with handheld devices.

Thanks for the confirmation on this, @WildByDesign.

This means that right now, our current testing indicates the offending commit was ed2015c38617ed6624471e77f27fbb0c58c8c660:

commit ed2015c38617ed6624471e77f27fbb0c58c8c660 (HEAD)
Author: Hauke Mehrtens <hauke@hauke-m.de>
Date:   Sat Jun 20 23:11:17 2020 +0200

    mac80211: Update to version 5.8-rc2-1

    The following patches:
    * 972-ath10k_fix-crash-due-to-wrong-handling-of-peer_bw_rxnss_override-parameter.patch
    * 973-ath10k_fix-band_center_freq-handling-for-VHT160-in-recent-firmwares.patch
    are replaced by this commit in the upstream kernel:
    * 3db24065c2c8 ("ath10k: enable VHT160 and VHT80+80 modes")

    The following patches were applied upstream:
    * 001-rt2800-enable-MFP-support-unconditionally.patch
    * 090-wireless-Use-linux-stddef.h-instead-of-stddef.h.patch

    The rtw88 driver is now split into multiple kernel modules, just put it
    all into one OpenWrt kernel package.

    rtl8812au-ct was patched to compile against the mac80211 from kernel
    5.8, but not runtime tested.

    Add a patch which fixes ath10k on IPQ40XX, this patch was send upstream
    and fixes a crash when loading ath10k on this SoC.

    Tested-by: Stefan Lippers-Hollmann <s.l-h@gmx.de> [ipq40xx/ map-ac2200]
    Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>

In the name of good science, I'm going to flash my own WRT3200ACM with both 13923 and 13922 to confirm that the former is BAD and the latter is GOOD. I'll likely sit on 13922 for a week to realllly make sure it's rock solid.
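Confirming the two neighbouring builds like this is the final step of a bisection. For anyone following along, here is a small sketch of the search being done by hand, assuming the usual bisection premise that every revision after the first bad one is also bad. The oracle `bad_from` is a synthetic stand-in for "flash the build and use it for a few days"; all names here are hypothetical.

```c
#include <stdbool.h>

/* Oracle: true if build rREV shows the wireless cutouts. In practice this
 * is a person flashing the build and waiting; here it is a callback. */
typedef bool (*is_bad_fn)(int rev, void *ctx);

/* Synthetic oracle for trying the search: every revision >= *ctx is bad. */
bool bad_from(int rev, void *ctx)
{
	return rev >= *(const int *)ctx;
}

/* Binary search for the first bad revision. Preconditions: `good` is known
 * good, `bad` is known bad, and good < bad. Returns the first bad revision. */
int first_bad_revision(int good, int bad, is_bad_fn is_bad, void *ctx)
{
	while (bad - good > 1) {
		int mid = good + (bad - good) / 2;

		if (is_bad(mid, ctx))
			bad = mid;
		else
			good = mid;
	}
	return bad;
}

/* Worst-case number of builds to test, i.e. ceil(log2(bad - good)). */
int bisect_steps(int good, int bad)
{
	int steps = 0;

	for (int span = bad - good; span > 1; span = (span + 1) / 2)
		steps++;
	return steps;
}
```

For example, narrowing a range of 50 revisions down to one commit takes at most 6 test builds, and once the known-good and known-bad builds are adjacent (as r13922 and r13923 are), no further builds are needed.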

But I have little trouble believing the 5.8 mac80211 upgrade was the issue. I'm not personally sure what to do next. I might try @slh's suggestion to see if we can get lucky to produce a pretty recent build with an older mac80211:

Another test would be checking 50f456b46cbae27ed13badfe7b2976cd01b67a57 with kernel 5.4 manually selected and mac80211 downgraded to the last known-good "with mac80211 5.7.5-1 update" - with a little luck that compiles (and works).

I'll see about making my flashes tonight, will keep the thread posted on that and how making a new build with @slh's suggestion goes.

Your comments are always short and concise, yet valuable like gold. I appreciate it.

Let's assume for a moment that this suggestion works well. Would it make sense at some point to create a patch for the 21.02.x series to downgrade mvebu (or mwlwifi) to kernel 5.4?

That wouldn't be possible.

mac80211 is a generic wifi subsystem for (almost) all drivers, not just mwlwifi. Downgrading it to the mac80211 5.7.5-1 state would therefore affect all other wireless chipsets as well (with a huge regression potential, apart from ath10k-ct and mt76 hard-depending on the v5.10-era mac80211); keep in mind that there are even mvebu devices with non-mwlwifi wireless (Turris Omnia, ath10k-ct). This wouldn't reasonably be possible even for 21.02, and for master it's out of the question.

Finding what caused the regression in mwlwifi can only help by giving ideas about what might need fixing in mwlwifi; mac80211 and the rest of the kernel are 'fine' (problems with out-of-tree drivers need to be fixed in those drivers; upstream Linux doesn't care: either get it into mainline, or keep working on your own).

I have just had some interesting success. I decided to play around with AMSDU again. The result seems to hold on all bad (affected) builds, including the 21.02.x series.

Band    AMSDU     Result
5GHz    Enabled   Fail
5GHz    Disabled  Fail
2.4GHz  Enabled   Fail
2.4GHz  Disabled  Success

Fail = Wireless cutouts
Success = No wireless cutouts

Basically, the 5GHz band exhibits the wireless cutouts whether AMSDU is enabled or disabled. The 2.4GHz band works, but only when AMSDU is disabled.

I am done testing for the day. I spent a few hours learning as much as I could about hostapd. Tomorrow, I plan on testing hostapd_options with various settings related to management frames and frame sizes. I also plan to do more testing with option vht_max_mpdu and its several choices.
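For reference when testing `vht_max_mpdu`: my understanding (an assumption; check `mac80211.sh` on your own build) is that OpenWrt's hostapd setup only adds a `[MAX-MPDU-7991]` or `[MAX-MPDU-11454]` flag to `vht_capab` when both the hardware's advertised maximum (the low two bits of its VHT capabilities) and the configured `vht_max_mpdu` allow it. The effective value is then the largest of 3895, 7991 and 11454 that neither limit rules out. A sketch of that selection; the function names are hypothetical and the logic is my paraphrase, not OpenWrt's code.

```c
#include <stdint.h>

/* Bits 0..1 of the VHT capabilities info word encode the max MPDU length:
 * 0 = 3895, 1 = 7991, 2 = 11454 bytes (3 is reserved). */
#define VHT_CAP_MAX_MPDU_MASK 0x3u

/* Largest MPDU length allowed by both the hardware and vht_max_mpdu.
 * Assumed to mirror OpenWrt's selection logic; not copied from it. */
int effective_max_mpdu(uint32_t hw_vht_cap, int vht_max_mpdu)
{
	uint32_t hw = hw_vht_cap & VHT_CAP_MAX_MPDU_MASK;
	int len = 3895; /* baseline, needs no vht_capab flag */

	if (hw >= 1 && vht_max_mpdu >= 7991)
		len = 7991;
	if (hw >= 2 && vht_max_mpdu >= 11454)
		len = 11454;
	return len;
}

/* hostapd vht_capab flag for a chosen length ("" means the 3895 default). */
const char *max_mpdu_flag(int len)
{
	switch (len) {
	case 11454:
		return "[MAX-MPDU-11454]";
	case 7991:
		return "[MAX-MPDU-7991]";
	default:
		return "";
	}
}
```

Under this assumption, on hardware that advertises 11454, setting `vht_max_mpdu` to 7991 would yield `[MAX-MPDU-7991]`, and 3895 would drop the flag entirely.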

I have some ideas planned to test but I am too tired right now. I will see what I can do tomorrow.

One interesting thing that I noticed in the mwlwifi codebase is:

	if (priv->chip_type == MWL8964) {
		band->vht_cap.cap |= IEEE80211_VHT_CAP_MAX_MPDU_LENGTH_11454;
		band->vht_cap.cap |= IEEE80211_VHT_CAP_SHORT_GI_160;
		band->vht_cap.cap |= IEEE80211_VHT_CAP_SUPP_CHAN_WIDTH_160MHZ;
	} else
		band->vht_cap.cap |= IEEE80211_VHT_CAP_MAX_MPDU_LENGTH_3895;

The way I understand it, the WRT3200ACM/WRT32X (the MWL8964 branch) advertises a maximum MPDU length of 11454 bytes, while the WRT1900AC/ACS and WRT1200 advertise a maximum MPDU length of 3895 bytes.
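To make the bit-level meaning concrete: the two low bits of the VHT capabilities word encode the maximum MPDU length (0 = 3895, 1 = 7991, 2 = 11454 bytes), and the other two flags in the quoted branch add short guard interval and channel width support for 160 MHz. The sketch below re-creates that branch in userspace with constant values copied from `linux/ieee80211.h`; the `mwl_chip` enum and both functions are hypothetical stand-ins, not mwlwifi's own definitions.

```c
#include <stdint.h>

/* VHT capability bits; values as defined in linux/ieee80211.h. */
#define IEEE80211_VHT_CAP_MAX_MPDU_LENGTH_3895   0x00000000u
#define IEEE80211_VHT_CAP_MAX_MPDU_LENGTH_7991   0x00000001u
#define IEEE80211_VHT_CAP_MAX_MPDU_LENGTH_11454  0x00000002u
#define IEEE80211_VHT_CAP_MAX_MPDU_MASK          0x00000003u
#define IEEE80211_VHT_CAP_SUPP_CHAN_WIDTH_160MHZ 0x00000004u
#define IEEE80211_VHT_CAP_SHORT_GI_160           0x00000040u

/* Hypothetical stand-in for mwlwifi's chip type (not the driver's enum). */
enum mwl_chip { MWL8864, MWL8964 };

/* Re-creates the quoted branch: only MWL8964 advertises 11454 and 160 MHz. */
uint32_t vht_cap_for_chip(enum mwl_chip chip)
{
	uint32_t cap = 0;

	if (chip == MWL8964) {
		cap |= IEEE80211_VHT_CAP_MAX_MPDU_LENGTH_11454;
		cap |= IEEE80211_VHT_CAP_SHORT_GI_160;
		cap |= IEEE80211_VHT_CAP_SUPP_CHAN_WIDTH_160MHZ;
	} else {
		cap |= IEEE80211_VHT_CAP_MAX_MPDU_LENGTH_3895; /* adds no bits */
	}
	return cap;
}

/* Decodes the 2-bit max MPDU length field to bytes; -1 for reserved (3). */
int max_mpdu_bytes(uint32_t cap)
{
	switch (cap & IEEE80211_VHT_CAP_MAX_MPDU_MASK) {
	case IEEE80211_VHT_CAP_MAX_MPDU_LENGTH_3895:
		return 3895;
	case IEEE80211_VHT_CAP_MAX_MPDU_LENGTH_7991:
		return 7991;
	case IEEE80211_VHT_CAP_MAX_MPDU_LENGTH_11454:
		return 11454;
	default:
		return -1;
	}
}
```

One detail this makes visible: `IEEE80211_VHT_CAP_MAX_MPDU_LENGTH_3895` is zero, so the `else` branch sets no bits at all and simply leaves the field at its default value.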

Also, I noticed through the mwlwifi commit history that there were a few times in which the max MPDU value was decreased due to issues, and increased at other times throughout development of the driver.
