Dumb AP with Linksys E8450 / Belkin RT3200: client occasionally associates but has no connectivity

@nkef, @_FailSafe and @darksky any progress made?

If it's relevant, I see tons of these on all my APs:

root@OpenWrt:~# iw event -f -t
1655117455.861487: wlan0-1 (phy #0): unknown event 60
1655117532.870205: wlan0-1 (phy #0): unknown event 60
1655117532.883033: wlan0-1 (phy #0): unknown event 60
1655117536.831729: wlan0-1 (phy #0): unknown event 60
1655117536.841423: wlan0-1 (phy #0): unknown event 60
1655117540.837648: wlan0-1 (phy #0): unknown event 60
1655117540.926989: wlan0-1 (phy #0): unknown event 60
1655117540.938121: wlan0-1 (phy #0): unknown event 60

But the issue remains: the iPhone loses connectivity until I manually disconnect and reconnect WiFi. Could it be that the iPhone cannot recover from a WiFi sleep?

An earlier 22.03 snapshot fixed the issue for me, and my wife got mad at me for upgrading the snapshot because the issue came back.

I was actually just thinking about this over the weekend. For an unrelated reason, I switched back to using usteer (from DAWN) a couple of weeks ago. This weekend I also realized my Apple device disconnect issue has not been occurring. Now, I'm not here blaming the issue on DAWN, but I am pointing out something that looks like a correlation.

FWIW, here are some details:

root@AP-Office:~# cat /etc/os-release
NAME="OpenWrt"
VERSION="SNAPSHOT"
ID="openwrt"
ID_LIKE="lede openwrt"
PRETTY_NAME="OpenWrt SNAPSHOT"
VERSION_ID="snapshot"
HOME_URL="https://openwrt.org/"
BUG_URL="https://bugs.openwrt.org/"
SUPPORT_URL="https://forum.openwrt.org/"
BUILD_ID="r19792-f03b20837b"
OPENWRT_BOARD="mediatek/mt7622"
OPENWRT_ARCH="aarch64_cortex-a53"
OPENWRT_TAINTS="no-all busybox"
OPENWRT_DEVICE_MANUFACTURER="OpenWrt"
OPENWRT_DEVICE_MANUFACTURER_URL="https://openwrt.org/"
OPENWRT_DEVICE_PRODUCT="Generic"
OPENWRT_DEVICE_REVISION="v0"
OPENWRT_RELEASE="OpenWrt SNAPSHOT r19792-f03b20837b"

root@AP-Office:~# dmesg | grep -i firmware
[    0.000000] psci: PSCIv1.1 detected in firmware.
[   14.838418] mt7622-wmac 18000000.wmac: N9 Firmware Version: 2.0, Build Time: 20200131180931
[   15.036309] mt7915e 0000:01:00.0: WM Firmware Version: ____000000, Build Time: 20211222184052
[   15.095022] mt7915e 0000:01:00.0: WA Firmware Version: DEV_000000, Build Time: 20211222184111

Ah - could it be that this issue is resolved in master snapshot but not 22.03 snapshot? For me it was fixed in an earlier 22.03 snapshot than the present 22.03 snapshot (r19424-3b90edaff9), so perhaps there was a regression.

If you're not seeing this issue anymore on master snapshot perhaps I should try that too.

By the way, how does your issue manifest exactly? At our end, my wife randomly picks up her iPhone, tries to use the internet, and finds there is no connectivity. So she disconnects and reconnects to WiFi manually, which restores it. Is that what you grappled with too?

The problem with these WiFi issues is that there seems to be the potential for lots of different issues all at the same time.

I don't see this issue at all on my Pixel 3a or laptop.

I switched back over to DAWN for the moment to see if the issue returns. Will report back in a day or two, or sooner if the issue reoccurs.

Do you see this type of verbiage in your logs when the issue is occurring?
daemon.info dawn: Client <your client MAC>: Kicking due to low active data transfer: RX rate 6.000000 below 6 limit
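
For anyone who wants to check quickly, here is a minimal sketch of a filter for that kind of line. The pattern is taken only from the sample line above, so other DAWN versions may word the message differently; the function name `dawn_kicks` is made up for illustration.

```shell
# Sketch: print only the DAWN "kicking" lines from log text read on stdin.
# The pattern is an assumption based on the single sample line quoted above.
dawn_kicks() {
    grep -E 'dawn: Client .*: Kicking due to'
}

# On the AP (assumes logread is available, as on stock OpenWrt):
#   logread | dawn_kicks
```

If nothing is printed while the problem is happening, DAWN kicking the client is probably not the cause.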

I'm not using dawn at the moment. I just had a look at the dawn documentation and it looks quite clever (e.g. sharing info across APs to determine the best current AP).

Ah! Okay, so it sounds like we can cross DAWN off the list of suspects at this point. Perhaps the fix for this issue was in one of the many mt76 commits that have landed in snapshot as of late.

I am still using the remedy script. Occasional disconnects and reconnects still occur, but I have not checked whether it is the same issue. I am on OpenWrt 22.03.0-rc1.

Have you tried playing around with max_inactivity? See here, and especially:

Adding to this, as I spent the last 2 days debugging why my phone was not doing "fast" (<2m) A->B->A AP transitions (WPA-PSK): some clients do not disassociate/deauth from the old station even after associating to the new one (my OnePlus 5 Android 10.0 phone seems to do this).
My understanding was that at least in FT-over-DS via communication to the old AP, this case should have been covered by hostapd, but it's not.
Going A->B works perfectly, but since the client is never deauth/disassociated from A after the transition, going back to A before max_inactivity kicks in on AP A means the client will not reassociate to A, because A expects the old key and Client uses a new one I suppose (and it's still registered in the kernel, you can observe the infamous "Could not set STA to kernel driver" error until the client decides to re-do the entire handshake properly).
Lowering max_inactivity to 15 seconds allowed me to mostly work around this issue.
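
If you want to try the same workaround, this is a rough sketch that only prints the uci commands, so you can review them before running anything. The section names (`default_radio0`, `default_radio1`) are assumptions; check /etc/config/wireless for the real ones on your AP. The helper name `max_inactivity_cmds` is made up for illustration.

```shell
# Sketch: print (not run) the uci commands that set max_inactivity on the
# given wifi-iface sections. Section names are assumptions, not known values.
max_inactivity_cmds() {
    secs="$1"; shift
    for iface in "$@"; do
        printf "uci set wireless.%s.max_inactivity='%s'\n" "$iface" "$secs"
    done
    # Persist the change and reload the wireless configuration.
    printf 'uci commit wireless\nwifi reload\n'
}

max_inactivity_cmds 15 default_radio0 default_radio1
```

Once the printed commands look right, they can be piped into `sh` on the AP or pasted by hand.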

@_FailSafe I was also planning to use DAWN. Can you provide the configuration you are using?

Isn't dawn a kind of a hack vs the standards though? Is there not a way to achieve the same functionality without using it? Like with one or more of the 802.11.x features?

This is probably not the right thread to debate the worthiness of dawn in great depth. But, I think the thing to remember with dawn is that part of its usefulness is that it implements 802.11k. Therefore, it offers data to wifi clients about neighboring APs, but allows clients to make their own decisions about if/when to roam as a result of the 802.11k data.

But I'm not an expert on all that. I honestly don't see much advantage of dawn over usteer right now. FWIW, usteer feels more lightweight (in a good way) and also supports IPv6.

I have implemented this setting update previously (using 20000 for the value) and have been running with it for a while. But it did not seem to have a bearing on the disconnects I was seeing with Apple devices.

I think the majority (or at least some) of the E8450 2G issues mentioned here were due to a TX queue that was getting stuck. @nbd pushed updated firmware to mt76 this morning that should resolve this. If you are still having issues it's worth giving this a shot.

Thank you for the heads up. Let's hope it lands in OpenWrt 22.03 soon.

Pushed the fix out to master and 22.03.

Have you heard of any regression with the updated firmware? I refreshed my RT3200 build up to commit 24eee4b244 yesterday and ended up in a spot where I think a radio was crashing repeatedly.

I went back to my prior build (aae3a8a254) and confirmed everything was back to normal again. I upgraded to a build at 04545c4325 and confirmed all is well. Both of these builds have N9 Firmware Version: 2.0, Build Time: 20200131180931.

The behavior I was seeing with the new firmware was this:

For more context, 192.168.45.5 is the next hop (my OpenWrt gateway) beyond all my RT3200 APs.

Below is a redacted copy of my wireless config. This is an identical config on all three of my RT3200 APs, except for channel numbers. All three APs on the same build (24eee4b244), including the updated firmware, seemed to exhibit this behavior with all clients. Unfortunately, I was not in a great spot to do a lot of deep troubleshooting in logs to confirm the exact cause.

Wireless Config:

config wifi-device 'radio0'
	option type 'mac80211'
	option path 'platform/18000000.wmac'
	option band '2g'
	option htmode 'HT20'
	option country 'US'
	option cell_density '0'
	option log_level '1'
	option channel '1'
	option txpower '20'

config wifi-device 'radio1'
	option type 'mac80211'
	option path '1a143000.pcie/pci0000:00/0000:00:00.0/0000:01:00.0'
	option band '5g'
	option country 'US'
	option cell_density '0'
	option htmode 'HE80'
	option he_bss_color '1'
	option log_level '1'
	option channel '44'
	option txpower '11'

config wifi-iface 'default_radio0'
	option device 'radio0'
	option network 'lan'
	option mode 'ap'
	option key '<redacted>'
	option dtim_period '3'
	option ieee80211r '1'
	option ft_over_ds '0'
	option ft_psk_generate_local '1'
	option reassociation_deadline '20000'
	option ieee80211k '1'
	option ieee80211v '1'
	option bss_transition '1'
	option encryption 'psk2+ccmp'
	option ieee80211w '1'
	option max_inactivity '15'
	option ssid '<redacted>'
	option mbo '1'

config wifi-iface 'default_radio1'
	option device 'radio1'
	option network 'lan'
	option mode 'ap'
	option key '<redacted>'
	option dtim_period '3'
	option ssid '<redacted>'
	option ieee80211r '1'
	option ft_over_ds '0'
	option ft_psk_generate_local '1'
	option reassociation_deadline '20000'
	option ieee80211k '1'
	option ieee80211v '1'
	option bss_transition '1'
	option encryption 'psk2+ccmp'
	option ieee80211w '1'
	option max_inactivity '15'
	option mbo '1'

config wifi-iface 'wifinet2'
	option device 'radio1'
	option mode 'ap'
	option ssid '<redacted>'
	option key '<redacted>'
	option network 'GUEST'
	option dtim_period '3'
	option ieee80211r '1'
	option ft_over_ds '0'
	option ft_psk_generate_local '1'
	option reassociation_deadline '20000'
	option ieee80211k '1'
	option ieee80211v '1'
	option bss_transition '1'
	option encryption 'psk2+ccmp'
	option ieee80211w '1'
	option max_inactivity '15'
	option mbo '1'

config wifi-iface 'wifinet4'
	option device 'radio0'
	option mode 'ap'
	option ssid '<redacted>'
	option key '<redacted>'
	option dtim_period '3'
	option ieee80211r '1'
	option ft_over_ds '0'
	option ft_psk_generate_local '1'
	option network 'GUEST'
	option reassociation_deadline '20000'
	option ieee80211k '1'
	option ieee80211v '1'
	option bss_transition '1'
	option encryption 'psk2+ccmp'
	option ieee80211w '1'
	option max_inactivity '15'
	option mbo '1'

config wifi-iface 'wifinet5'
	option device 'radio0'
	option mode 'ap'
	option ssid '<redacted>'
	option key '<redacted>'
	option dtim_period '3'
	option ieee80211r '1'
	option ft_over_ds '0'
	option ft_psk_generate_local '1'
	option network 'IOT'
	option reassociation_deadline '20000'
	option ieee80211k '1'
	option ieee80211v '1'
	option bss_transition '1'
	option encryption 'psk2+ccmp'
	option ieee80211w '1'
	option max_inactivity '15'
	option mbo '1'

Please do this for debugging, before triggering the problems:
for f in /sys/kernel/debug/ieee80211/*/mt76/fw_debug; do echo 1 > $f; done
Afterwards, reproduce the problem and capture a log of the kernel messages that show up.
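
In case it helps others collecting the same data, here is a minimal sketch that wraps the one-liner above and shows one way to save the kernel log afterwards. The debugfs path comes from the command above; the optional base-directory argument, the function name `enable_fw_debug`, and the output file name are assumptions added for illustration.

```shell
# Sketch: turn on mt76 firmware debug for every phy that exposes fw_debug.
# The first argument (base dir) is an assumption added so the loop can be
# pointed elsewhere; by default it uses the debugfs path from the post above.
enable_fw_debug() {
    base="${1:-/sys/kernel/debug/ieee80211}"
    for f in "$base"/*/mt76/fw_debug; do
        [ -e "$f" ] || continue   # skip when the glob matched nothing
        echo 1 > "$f"
    done
}

# On the AP, as root:
#   enable_fw_debug
#   ... reproduce the problem ...
#   dmesg > /tmp/mt76-fw-debug.log   # file name is just an example
```

Quoting "$f" and skipping missing files only makes the loop a bit more defensive; the effect on the AP should be the same as the original command.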

Captured these from where the issue just occurred:

Log:

Issue seen from client:

Please test this patch: https://termbin.com/9jba
It has the firmware update, but not the commit after it.
