I'll try it eventually. As you can imagine, my entire family gets a bit testy when the internet is down or choppy for a couple of hours. It doesn't help that the symptoms are intermittent. I wish there were a more deterministic way to check for a problem.
Won't cause trouble, but the package should come from the same source as the running image.
kmod-ikconfig: the purpose of this module is to show the actual .config used to build the kernel. The .config file is different, so for the package to fulfill its purpose, it would need to match the source of the running image. That means if you're running a stock image, then you'd need to install the stock kmod-ikconfig; if you're running an image that was built using any of the recipes used here so far, you have to use that recipe's kmod-ikconfig, or /proc/config.gz will show a different configuration than the running kernel.
kmod-macremapper: I'm just not sure about this one, but I would bet it causes no trouble. The differences are restricted to 2 identical blocks of strings being in different positions. If I sort the output of strings then they match 100%.
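The sorted-strings comparison described for kmod-macremapper can be sketched roughly as follows. This is only an illustration: it assumes the binutils strings tool is available on the machine doing the comparison, the helper name same_strings is made up, and the paths in the usage line are placeholders rather than the real file names.

```shell
# same_strings FILE_A FILE_B
# Succeeds (exit 0) when both files contain the same printable strings,
# regardless of the order in which they appear in each file.
same_strings() {
    a=$(strings "$1" | sort)
    b=$(strings "$2" | sort)
    [ "$a" = "$b" ]
}

# Hypothetical usage; the paths are placeholders:
# same_strings official/macremapper.ko custom/macremapper.ko && echo "same strings"
```

A match here only says the two modules embed the same strings; it does not prove the code is identical.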
Will probably cause trouble: wireless packages that appear to show different symbol names, probably due to the difference in mac80211, even though their version number did not change compared to the official package. Since they are wireless drivers that depend on mac80211, use only the custom packages:
kmod-mwlwifi
kmod-rtl8812au-ct
Packages that have changed version because of the downgrade of mac80211, ath10k-ct, and mt76. These should only be installed using the custom packages:
kmod-adm8211
kmod-ar5523
kmod-ath10k-ct-smallbuffers
kmod-ath10k-ct
kmod-ath10k
kmod-ath5k
kmod-ath6kl-sdio
kmod-ath6kl-usb
kmod-ath6kl
kmod-ath9k-common
kmod-ath9k-htc
kmod-ath9k
kmod-ath
kmod-b43
kmod-b43legacy
kmod-batman-adv
kmod-brcmfmac
kmod-brcmsmac
kmod-brcmutil
kmod-carl9170
kmod-cfg80211
kmod-hermes-pci
kmod-hermes-plx
kmod-hermes
kmod-ipw2100
kmod-ipw2200
kmod-iwl-legacy
kmod-iwl3945
kmod-iwl4965
kmod-iwlwifi
kmod-lib80211
kmod-libertas-sdio
kmod-libertas-spi
kmod-libertas-usb
kmod-libipw
kmod-mac80211-hwsim
kmod-mac80211
kmod-mt76-connac
kmod-mt76-core
kmod-mt76-usb
kmod-mt7601u
kmod-mt7603
kmod-mt7615-common
kmod-mt7615-firmware
kmod-mt7615e
kmod-mt7663-firmware-ap
kmod-mt7663-firmware-sta
kmod-mt7663-usb-sdio
kmod-mt7663s
kmod-mt7663u
kmod-mt76
kmod-mt76x0-common
kmod-mt76x02-common
kmod-mt76x02-usb
kmod-mt76x0e
kmod-mt76x0u
kmod-mt76x2-common
kmod-mt76x2
kmod-mt76x2u
kmod-mt7915e
kmod-mt7921e
kmod-mwifiex-pcie
kmod-mwifiex-sdio
kmod-mwl8k
kmod-owl-loader
kmod-p54-common
kmod-p54-pci
kmod-p54-usb
kmod-rsi91x-sdio
kmod-rsi91x-usb
kmod-rsi91x
kmod-rt2400-pci
kmod-rt2500-pci
kmod-rt2500-usb
kmod-rt2800-lib
kmod-rt2800-mmio
kmod-rt2800-pci
kmod-rt2800-usb
kmod-rt2x00-lib
kmod-rt2x00-mmio
kmod-rt2x00-pci
kmod-rt2x00-usb
kmod-rt61-pci
kmod-rt73-usb
kmod-rtl8180
kmod-rtl8187
kmod-rtl8192c-common
kmod-rtl8192ce
kmod-rtl8192cu
kmod-rtl8192de
kmod-rtl8192se
kmod-rtl8723bs
kmod-rtl8821ae
kmod-rtl8xxxu
kmod-rtlwifi-btcoexist
kmod-rtlwifi-pci
kmod-rtlwifi-usb
kmod-rtlwifi
kmod-rtw88
kmod-wil6210
kmod-wl12xx
kmod-wl18xx
kmod-wlcore
kmod-zd1211rw
Edited to make it clear that packages in "4" above are incompatible.
.. and again to remove kmod-thermal from the list, as it is an empty package not present in the custom build.
I can also confirm that running stock OpenWrt 21.02.1 on my WRT3200ACM and downgrading mac80211 (and related packages) from 5.10.x to the 5.7.5 packages from @cotequeiroz is successful and working as expected.
For anyone that tries this, make sure that you have the firmware blob package (mwlwifi-firmware-88w8964_2020-02-06-a2fd00bb-2_arm_cortex-a9_vfpv3-d16.ipk) installed before you reboot your router. Also keep in mind that the firmware blob would be different for WRT1200AC, WRT1900AC, and WRT1900ACS users.
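A quick check along these lines can confirm that a firmware package is still installed before rebooting. It is a sketch only: it assumes the "name - version" line format printed by opkg list-installed, and has_mwlwifi_firmware is a made-up helper name.

```shell
# has_mwlwifi_firmware: read `opkg list-installed` output on stdin and
# succeed only if some mwlwifi-firmware-* package is listed.
has_mwlwifi_firmware() {
    grep -q '^mwlwifi-firmware-'
}

# Usage on the router (not run here):
# opkg list-installed | has_mwlwifi_firmware || echo "firmware blob missing, do not reboot yet"
```

The pattern matches any mwlwifi-firmware-* package, so it does not tell you whether the installed blob is the right one for your model.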
Why is the effort now focussed on how to use a recent build and downgrade components?
Shouldn't it now be focussed on bisecting the mac80211 update and finding the breaking change upstream?
I know that's a much bigger task and a lot more complicated, but it may result in something that can be patched specifically instead of this massive downgrade effort. It could also turn out that the change isn't workable because of the closed firmware, but you won't know until you try.
Because we're trying to find a sustainable method of propagating the fix to all users. Our earlier custom images were incompatible with upstream repos, which was pretty limiting to most users. Creating custom images with custom repos is possible, but it's always more work than just creating compatible packages that can be installed on stock images.
@cotequeiroz seems to have cracked that nut wide open, and I thank them massively for it. This allows for a much more sustainable propagation of the fix to all users.
@WildByDesign - we will need to double check this. It's not just the different firmware blobs; there are potentially different processors as well, which would change the build requirements and mean the packages aren't apples-to-apples compatible with the WRT3200ACM/WRT32X packages. To be frank, I haven't double checked the device tree just yet, so they might be compatible; we just need to be sure.
I can likely host these packages if need be. I can't imagine the bandwidth requirements would be that high, considering it's really only 4 small packages as of now.
I don't have access to any WRT1200/1900, so I won't be able to tell if the mac80211 downgrade is necessary or not. What I can say is that the kmod-mwlwifi package is the same across all of the mvebu/cortexa9 family, although--luckily for the rest of the pack--only the WRT series uses it.
Like @WildByDesign pointed out, the firmware packages are NOT the same. WRT3* series use mwlwifi-firmware-88w8964, while WRT1* series need mwlwifi-firmware-88w8864.
Like I said, I don't have the devices, but in theory, one can test if it fixes anything by installing the following packages (basically the same as wrt32*, except that mwifiex-sdio is not installed) from https://drive.google.com/drive/folders/1qOFN0bt2XQTGeC-9LhWfOLqf93wOsiv3?usp=sharing:
kmod-mwlwifi_5.4.154+2020-02-06-a2fd00bb-2_arm_cortex-a9_vfpv3-d16.ipk (this one needs to be installed from the link above, even though the version number is the same as the official package!)
My method was successful. However, based on dependencies, I lost the mwlwifi-firmware-88w8964_* firmware blob package along the way and that is why I had to manually install that. It's likely that my router would have crashed if I rebooted without first noticing that firmware blob missing.
So would it be smarter (and more foolproof) not to uninstall anything manually and simply use the --force-reinstall flag as part of the opkg install command?
I like the script approach, although I always get nervous with direct pipes into a shell, but that's a different issue.
I can put a git repo together on either Github or Sourcehut that keeps this all in one place, including the install script, a README, checksums, and likely the Dockerfile and build scripts, just to keep it all together. Actually, github might be able to do the hosting for us if we just commit the packages directly. Granted, if we wanted to do an official opkg repo, that might require separate hosting, which I can still provide.
That way the community can keep it updated via pull requests or fork as necessary if anyone needs to.
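For the checksum part of that idea, something like the following could work. It is a sketch only: the sha256sums file name and both helper names are assumptions, and it relies on sha256sum supporting -c, which the coreutils version does.

```shell
# write_checksums DIR: record SHA-256 sums of all .ipk files in DIR
# into DIR/sha256sums.
write_checksums() {
    ( cd "$1" && sha256sum ./*.ipk > sha256sums )
}

# verify_checksums DIR: succeed only if every listed file still matches
# the sum recorded in DIR/sha256sums.
verify_checksums() {
    ( cd "$1" && sha256sum -c sha256sums >/dev/null 2>&1 )
}

# Hypothetical usage; "packages" is a placeholder directory:
# write_checksums packages      # maintainer side, before committing
# verify_checksums packages     # user side, before installing
```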
Thanks to the recent advancements here with being able to downgrade the mac80211 kernel modules, this will make testing for the root cause of the issue much easier. We can simply upgrade the kmods back to 5.10.x to trigger the issue for testing purposes, then downgrade back to 5.7.5 kmods after testing to return to a stable wireless network. All within the same stable OpenWrt build. This is really beneficial.
The 5.7.5 kmods will serve as a temporary fix until the root cause can be found.
Here's a summary of what I see going on. Sorry for being too technical.
A mac80211 (generic wireless interface) package upgrade triggered the problem with mwlwifi (the WRT3* wireless driver, which is tightly coupled to a closed-source, binary-distributed firmware, mwlwifi-firmware-88w8964, where most of the work is actually done). It's probably the interaction between those two that caused the problem. We can't do much to even look at what the mwlwifi firmware is doing, so we need to explore mac80211's changes to figure out why wireless is not working as expected.
Composing what I pictured above took a lot of effort already, thanks to the good people in this thread.
Bisecting mac80211 is not as straightforward. It is built as a backport of more recent drivers to an older kernel. In the working case, it is a backport of Linux 5.7.5 wireless drivers to a 5.4 Linux kernel. I don't envision the opposite case working as is: using 5.7.5 drivers with a newer 5.10 kernel.
I don't understand exactly how the backport repository can be bisected, but I'll give it a try. Another option would be to take a look at the commits between 5.7.5 and 5.8-rc2, and try to patch them out of the backported code, which is a lot of work.
Meanwhile, we can use the downgraded drivers with the 21.02 releases until we find a real solution.
If one really wants to take a crack at master, a recipe for it would be to bring back kernel 5.4 and apply the same steps used in 21.02. It will probably not work forever, though.
src/linux/linux-stable $ git log --no-merges v5.7.5..v5.8-rc2 --oneline -- net/mac80211 | nl
1 a7f7f6248d97 treewide: replace '---help---' in Kconfig files with 'help'
2 59d4bfc1e2c0 net: fix wiki website url mac80211 and wireless files
3 523f3ec030aa mac80211: initialize return flags in HE 6 GHz operation parsing
4 07c12d618f06 mac80211: set short_slot for 6 GHz band
5 6fcb56ce0f90 mac80211: Consider 6 GHz band when handling power constraint
6 93382a0d119b mac80211: accept aggregation sessions on 6 GHz
7 2ad2274c58ee mac80211: Add HE 6GHz capabilities element to probe request
8 1bb9a8a4c81d mac80211: use HE 6 GHz band capability and pass it to the driver
9 3b3ec3d52e8f mac80211: check the correct bit for EMA AP
10 57fa5e85d53c mac80211: determine chandef from HE 6 GHz operation
11 2a333a0db24e mac80211: avoid using ext NSS high BW if not supported
12 607ca9ea3462 mac80211: do not allow HT/VHT IEs in 6 GHz mesh mode
13 d1b7524b3ea1 mac80211: build HE operation with 6 GHz oper information
14 24a2042cb22f mac80211: add HE 6 GHz Band Capability element
15 a6cf28e05f0b mac80211: add HE 6 GHz Band Capabilities into parse extension
16 a7528198add8 mac80211: support control port TX status reporting
17 c11299243370 mac80211: fix HT-Control field reception for management frames
18 1ea02224afc2 mac80211: allow SA-QUERY processing in userspace
19 dca9ca2d588b nl80211: add ability to report TX status for control port TX
20 3c23215ba8c7 mac80211: Replace zero-length array with flexible-array
21 2032f3b2f943 nl80211: support scan frequencies in KHz
22 e76fede8bf7c cfg80211: add KHz variants of frame RX API
23 60c2ef0ef07f mac80211: fix variable names in TID config methods
24 429ff87bcac7 docs: networking: convert mac80211-injection.txt to ReST
25 60689de46c7f mac80211: fix memory overlap due to variable length param
26 08fad438bed0 mac80211: TX legacy rate control for Beacon frames
27 b6b5c42e3bab mac80211: fix two missing documentation entries
28 3b23c184f72a mac80211: add freq_offset to RX status
29 b6011960f392 mac80211: handle channel frequency offset
30 dba25b04c611 mac80211: minstrel_ht_assign_best_tp_rates: remove redundant test
31 302ff8b7a2b0 mac80211: Fail association when AP has no legacy rates
32 0c197f16f7bc mac80211: agg-tx: add an option to defer ADDBA transmit
33 31d8bb4e07f8 mac80211: agg-tx: refactor sending addba
34 4826e721103a mac80211: Skip entries with HE membership selector
35 a4055e74a2ff mac80211: Don't destroy auth data in case of anti-clogging
36 d46b4ab870fa mac80211: add twt_protected flag to the bss_conf structure
37 9166cc49767a mac80211: implement Operating Mode Notification extended NSS support
38 873b1cf61105 mac80211: Process multicast RX registration for Action frames
39 6cd536fe62ef cfg80211: change internal management frame registration API
40 9eaf183af741 mac80211: Report beacon protection failures to user space
41 90e8f58dfc04 mac80211: fix drv_config_iface_filter() behaviour
42 1db364c88695 mac80211: mlme: remove duplicate AID bookkeeping
Commits 1, 2, and 27 are cosmetic, which leaves us with 39 commits to look at. Not too bad, actually.
I think this should be the next step. I suggest creating 3 builds, something like:
1 - 15
1 - 31
1 - 42
Then go through the usual testing process again. After that we can try to focus on specific commits.
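The commit set for each of those three builds could be pulled out of the numbered listing with a small filter like this one. It is a sketch: build_set is a made-up helper, and it hard-codes the three commits called cosmetic above (1, 2 and 27).

```shell
# build_set LAST: read the numbered listing (as printed by `git log ... | nl`)
# on stdin and print the hashes of commits 1..LAST, skipping the cosmetic
# commits 1, 2 and 27.
build_set() {
    awk -v last="$1" '
        $1 ~ /^[0-9]+$/ && $1 <= last && $1 != 1 && $1 != 2 && $1 != 27 { print $2 }
    '
}

# Usage from the kernel tree (not run here):
# git log --no-merges v5.7.5..v5.8-rc2 --oneline -- net/mac80211 | nl | build_set 15
```

Keep in mind that git log prints the newest commit first, so number 1 is the most recent change. Whether 1 - 15 is the slice actually intended for the first build is worth double-checking before applying anything.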
btw, I have been following this thread for a long time. Really great work. Unfortunately I am not a good candidate to help with testing: I don't have Apple devices, I had some problems with a Samsung Galaxy S20, and I recently moved the 5 GHz clients to a different AP.
I do have both a WRT32X and a WRT1900ACS, so I can at least confirm that the same package works on both of them.
I have been trying to follow along but was just looking for a bit of clarification. With your method, one could install 21.02.1 stable from the main site and then downgrade the noted packages using those provided on your drive (do these work for both 3200acm and 32x?). Would I then theoretically be able to install packages that rely on kernel modules that were breaking the modified build from this thread due to the mismatch, such as sqm, stubby, etc. or should I just build an image from the repo and include the module dependent packages right off the bat?
You should be able to just install the regular packages. The exceptions are wireless drivers--I do not expect many people will need to install mwlwifi and other wireless drivers at the same time, but it can happen. I have not included any other wireless drivers in my drive, so a custom build will be necessary.
All of this is theory. Reports of things that do and do not work will help to confirm my expectations.
I think that this is a great idea to help narrow it down. I have zero experience with building OpenWrt, so I don't know how easy or difficult it would be to build these 3 builds.
However, if somebody can provide these 3 builds, I am absolutely willing to test them on my WRT3200ACM. I am able to reproduce the issue consistently on my iPhone XR so it should be an easy process to narrow down as long as someone can build based on these commits.