Hi, maybe someone has a hint for my problem.
My setup: OpnSense firewall acting as DHCP server and router
1 switch (old HP model)
2 APs with OpenWRT 21.02.1 (Linksys WRT3200ACM and Linksys WRT1900ACS)
My issue: my mobile sometimes loses its WLAN connection and can't reconnect.
I think it happens (not every time but often) when switching from one AP to the other.
I think this has been happening since OpenWRT version 21, but that could be a coincidence.
When the connection is lost, the Wi-Fi bar shows full strength, but the mobile can't reach the gateway (tested with a simple ping command in a terminal emulator on Android).
just waiting -> no success
Workaround: disable and enable Wi-Fi -> all working again
Tried to set a fixed IP on mobile, didn't help.
Can provide additional logs if needed.
Where should I start looking for the issue? firewall/router, switch or APs?
Are these devices running in a dumb AP mode? Does this happen on one of your APs every time?
It appears you have VLANs running to enable multiple networks. Is it safe to assume that you have set up these VLANs on your main router? Have you verified that the switch is configured properly with trunk ports for each of the APs and to the router? Does the problem happen on both networks or just one of them?
We'll probably want to see the contents of /etc/config/network for the problematic AP(s).
radio0 VHT40 channel 48
radio1 HT20 channel 6 <--- ok
Your 5 GHz radios are trampling each other. It doesn't appear that radio2 is associated with a network, but the channels (36 and 38) will cause major issues. Set everything to VHT40 and spread your channels out properly.
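As a rough sketch of what that could look like from the CLI: this assumes radio0 is the 5 GHz radio on both APs (as in the excerpt above), and channels 36 and 44 are picked purely as examples of two non-overlapping 40 MHz blocks. Check what your regulatory domain allows before copying it.

```shell
# On the first AP: 5 GHz radio on channel 36, 40 MHz wide (occupies 36+40).
# "radio0" is an assumption; check /etc/config/wireless for the real section name.
uci set wireless.radio0.htmode='VHT40'
uci set wireless.radio0.channel='36'
uci commit wireless
wifi reload

# On the second AP: same width, different block (channel 44 occupies 44+48).
# uci set wireless.radio0.htmode='VHT40'
# uci set wireless.radio0.channel='44'
# uci commit wireless
# wifi reload
```

The 2.4 GHz radio (HT20 on channel 6) was already marked as fine, so it is left alone here.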
Especially during debugging, it would be better to disable (remove the kernel module) the mwifiex based third radio, it's barely usable anyways (1x1, basically no antenna) and can mess up the region code massively.
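For reference, a minimal sketch of how the module could be removed. The module name mwifiex_sdio and the package name kmod-mwifiex-sdio are assumptions based on the stock WRT3200ACM image, so verify them with lsmod and opkg first.

```shell
# See which mwifiex modules are actually loaded (nothing listed = nothing to remove)
lsmod | grep mwifiex

# Unload the driver for the current session; it comes back after a reboot
rmmod mwifiex_sdio

# Remove the package so the module is no longer loaded at boot
# (package name is an assumption; confirm with: opkg list-installed | grep mwifiex)
opkg remove kmod-mwifiex-sdio
```

A later sysupgrade to a stock image would likely bring the package back, so this may need repeating after upgrades.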
There was an error affecting mwlwifi in 21.02.0 and 21.02.1, which will be fixed in the as-of-yet unreleased 21.02.2 - ideally switch to a 21.02- or master snapshot for the time being (until 21.02.2).
Thanks for all your questions, I'll try to answer them.
My APs are running in "dumb" AP mode.
It happens often, but not always. Maybe I have to test further to check if I can force the error. As of now, I haven't done extensive field-testing.
VLANs: yeah, there are VLANs and they are set up (router and switch). Everything works.
Happens on both networks? I can't answer that today; the 2nd WLAN is a guest WLAN with restrictions and I normally don't use it.
radios: radio2 has option disabled '1', so it is not in use. That shouldn't be a problem, should it?
remove the kernel module: any quick hint how to do that?
my network config from both APs
First AP (192.168.1.249):

config interface 'loopback'
    option device 'lo'
    option proto 'static'
    option ipaddr '127.0.0.1'
    option netmask '255.0.0.0'

config globals 'globals'
    option ula_prefix 'fd0f:97c2:5baa::/48'

config device
    option name 'br-lan'
    option type 'bridge'
    list ports 'lan1'
    list ports 'lan2'
    list ports 'lan3'
    list ports 'lan4'

config interface 'lan'
    option device 'br-lan.1'
    option proto 'static'
    option ipaddr '192.168.1.249'
    option netmask '255.255.255.0'
    option gateway '192.168.1.254'
    option broadcast '192.168.1.255'
    list dns '192.168.10.8'
    list dns '192.168.10.9'
    list dns_search 'zzz'

config device
    option name 'wan'
    option macaddr '32:23:03:de:13:70'

config interface 'wan'
    option device 'wan'
    option proto 'dhcp'
    option auto '0'

config interface 'wan6'
    option device 'wan'
    option proto 'dhcpv6'
    option auto '0'
    option reqaddress 'try'
    option reqprefix 'auto'

config interface 'INTERNAL'
    option proto 'none'
    option device 'br-lan.10'
    option force_link '1'

config interface 'GUEST'
    option proto 'none'
    option device 'br-lan.9'
    option force_link '1'

config device
    option type '8021q'
    option ifname 'br-lan'
    option vid '1'
    option name 'br-lan.1'

config device
    option type '8021q'
    option ifname 'br-lan'
    option vid '9'
    option name 'br-lan.9'

config device
    option type '8021q'
    option ifname 'br-lan'
    option vid '10'
    option name 'br-lan.10'

config bridge-vlan
    option device 'br-lan'
    option vlan '1'
    list ports 'lan1:u*'

config bridge-vlan
    option device 'br-lan'
    option vlan '9'
    list ports 'lan1:t'

config bridge-vlan
    option device 'br-lan'
    option vlan '10'
    list ports 'lan1:t'

Second AP (192.168.1.250):

config interface 'loopback'
    option device 'lo'
    option proto 'static'
    option ipaddr '127.0.0.1'
    option netmask '255.0.0.0'

config globals 'globals'
    option ula_prefix 'fd81:e2a4:489f::/48'

config device
    option name 'br-lan'
    option type 'bridge'
    list ports 'lan1'
    list ports 'lan2'
    list ports 'lan3'
    list ports 'lan4'

config interface 'lan'
    option device 'br-lan.1'
    option proto 'static'
    option ipaddr '192.168.1.250'
    option netmask '255.255.255.0'
    option gateway '192.168.1.254'
    option broadcast '192.168.1.255'
    list dns '192.168.10.8'
    list dns '192.168.10.9'
    list dns_search 'zzz'

config device
    option name 'wan'
    option macaddr '5a:ef:68:0c:f2:43'

config interface 'wan'
    option device 'wan'
    option proto 'dhcp'
    option auto '0'

config interface 'wan6'
    option device 'wan'
    option proto 'dhcpv6'
    option auto '0'
    option reqaddress 'try'
    option reqprefix 'auto'

config device
    option type '8021q'
    option ifname 'br-lan'
    option vid '1'
    option name 'br-lan.1'

config bridge-vlan
    option device 'br-lan'
    option vlan '1'
    list ports 'lan1:u*'

config device
    option type '8021q'
    option ifname 'br-lan'
    option vid '9'
    option name 'br-lan.9'

config device
    option type '8021q'
    option ifname 'br-lan'
    option vid '10'
    option name 'br-lan.10'

config bridge-vlan
    option device 'br-lan'
    option vlan '9'
    list ports 'lan1:t'

config bridge-vlan
    option device 'br-lan'
    option vlan '10'
    list ports 'lan1:t'

config interface 'INTERNAL'
    option proto 'none'
    option device 'br-lan.10'
    option force_link '1'

config interface 'GUEST'
    option proto 'none'
    option device 'br-lan.9'
    option force_link '1'

config device
    option name 'lan1'

Can you point me in the right direction? Which log should I check? According to the phone there is a connection (Wi-Fi symbol), but the terminal emulator shows that I'm unable to ping the gateway. I'm not sure if it is possible to dig through the logs on a non-rooted Android.
AP logs?
Possible logs in OpnSense?
My networking know-how is basic. I currently don't understand why Wi-Fi is "working" and the device has an IP but is not able to ping anything. As stated before: Wi-Fi OFF -> ON, and everything works again...
I'm currently guessing that it is a problem with my phone, not the infrastructure, but maybe I'm wrong...
I'd start with the AP logs. You can see them in the GUI under Status > System Logs, or you can get them on the CLI by using logread. Since the logs have timestamps, you can hopefully find entries that correspond to the loss of connectivity you are experiencing.
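Something along these lines on the AP (over SSH) should narrow the output down to the wireless-related entries. The grep patterns are just a suggestion, not an exhaustive filter.

```shell
# Show only hostapd and wireless driver messages logged so far
logread | grep -e hostapd -e ieee80211

# Follow new log entries live while you try to reproduce the problem (Ctrl-C to stop)
logread -f
```

Note the time on the phone when the connection drops so you can match it against the log timestamps.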
This is obviously an important thing to figure out. If this phone is the only one experiencing the issue, it could well be something wrong with that device. However, if you can reproduce the problem on other devices (laptops, other phones, tablets, etc.), then it could well be infrastructure.
I'm still trying to reproduce it, currently working without any issue (with no change at all ;-))
I'll report back if I can grab a log file when the issue happens
Today it happened again. I tried to leave the bad state "as it is" but after I moved back to the other AP it worked again.
I think the relevant log entries are these
Mon Jan 24 19:46:14 2022 daemon.info hostapd: wlan0: STA 39:c2:1f:2b:64:44 IEEE 802.11: authenticated
Mon Jan 24 19:46:14 2022 daemon.notice hostapd: wlan0: STA-OPMODE-N_SS-CHANGED 39:c2:1f:2b:64:44 2
Mon Jan 24 19:46:14 2022 daemon.info hostapd: wlan0: STA 39:c2:1f:2b:64:44 IEEE 802.11: associated (aid 1)
Mon Jan 24 19:46:15 2022 daemon.notice hostapd: wlan0: AP-STA-CONNECTED 39:c2:1f:2b:64:44
Mon Jan 24 19:46:15 2022 daemon.info hostapd: wlan0: STA 39:c2:1f:2b:64:44 WPA: pairwise key handshake completed (RSN)
Mon Jan 24 19:48:05 2022 kern.debug kernel: [51474.533740] ieee80211 phy0: Mac80211 start BA 39:c2:1f:2b:64:44
Mon Jan 24 19:57:21 2022 kern.debug kernel: [52030.685775] ieee80211 phy1: Mac80211 start BA d4:11:a3:d9:08:b7
Mon Jan 24 20:38:28 2022 daemon.notice hostapd: wlan0: AP-STA-DISCONNECTED 39:c2:1f:2b:64:44
Mon Jan 24 20:38:28 2022 daemon.info hostapd: wlan0: STA 39:c2:1f:2b:64:44 IEEE 802.11: disassociated due to inactivity
Mon Jan 24 20:38:48 2022 kern.err kernel: [54517.841915] ieee80211 phy0: cmd 0x9122=UpdateEncryption timed out
Mon Jan 24 20:38:48 2022 kern.err kernel: [54517.848037] ieee80211 phy0: return code: 0x1122
Mon Jan 24 20:38:48 2022 kern.err kernel: [54517.852598] ieee80211 phy0: timeout: 0x1122
Mon Jan 24 20:38:48 2022 kern.err kernel: [54517.856798] wlan0: failed to remove key (0, 39:c2:1f:2b:64:44) from hardware (-5)
Mon Jan 24 20:38:48 2022 daemon.notice hostapd: nl80211: nl80211_recv_beacons->nl_recvmsgs failed: -5
Mon Jan 24 20:38:48 2022 daemon.info hostapd: wlan0: STA 39:c2:1f:2b:64:44 IEEE 802.11: deauthenticated due to inactivity (timer DEAUTH/REMOVE)
39:c2:1f:2b:64:44 is the phone's MAC address. The line "wlan0: failed to remove key (0, 39:c2:1f:2b:64:44) from hardware (-5)" seems to be an error message.
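To pull just those entries out of a longer log, a small filter like the one below might help. It is only a sketch: sta_events is a made-up helper name, and treating every kern.err line as relevant is an assumption that may produce noise on a busier system.

```shell
# Hypothetical helper: read a log on stdin and print only the lines that
# mention the given station MAC or that were logged at kernel error level.
sta_events() {
    mac="$1"
    grep -i -e "$mac" -e 'kern\.err'
}

# Example use on the AP:
#   logread | sta_events 39:c2:1f:2b:64:44
```

This keeps the hostapd events for the phone next to the driver errors (such as the UpdateEncryption timeout), so the order of events is easier to see.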
The AP is a Linksys WRT1900ACS; this is mentioned here:
Concerning my specific issue with the WRT1900ACS2 5GHz network losing WLAN access, I did ultimately disable scheduled reboots and switch to a snapshot build. I cannot remember if switching to snapshot was necessary to ameliorate this issue or something I did to fix another. Either way, I no longer have this specific problem now.
@psherman: Thanks anyway for your time, it seems that this is a bug of some kind.
@User34: I have some sort of scheduled reboot, since I power off my APs during the night ;-), but I don't think that leaving them running overnight would solve this issue. Maybe the snapshot build is worth a try...
I would look at this thread as having many of the answers you may be looking for. Conclusion: update to latest snapshot build (sometime after Nov 25th 2021) and use channel 36.
@User34 Thank you for pointing me in that direction... I've updated both APs to the latest snapshot (a brief moment of shock, as LuCI isn't included with snapshots ;-)) and both are now running yesterday's snapshot...
I'll try to force the issue, but as mentioned before it happened randomly and infrequently.
WiFi on these devices is terrible, has lots of problems like the one you mention, as well as others... Random dropouts, difficulty with ipv6 stuff, just generally buggy. The drivers are abandoned... Your best bet is move to a different chipset.
yeah, heard about that... at the time I bought them they were the only ones with good support from OpenWRT... currently I have no money to replace them, so I have to live with it...
I think you'll have good luck with the master snapshot build concerning wifi performance. So far I haven't noticed any issues with my master build update yesterday. It's come a long way in the last 6 months. Sorry I didn't think to warn you about luci being a package you'll need to install. I have a running list of packages that I install via SSH for any new builds/upgrades.
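For anyone else landing here after flashing a snapshot, getting the web interface back is roughly this, run over SSH on the AP (it needs working internet access from the AP itself):

```shell
# Refresh the package lists, then install the LuCI web interface
opkg update
opkg install luci
```

Any other packages from your own list can be appended to the same opkg install line.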