WiFi 5 breaking down every few hours/days | WRT1900ACS

Hey,

So I've been starting to use OpenWRT when I bought a WRT1900ACS at the beginning of last year.
There's an annoying problem though: Every now and then (after a few hours to a few days) the WiFi 5 networks become unusable. By unusable I mean that no devices can connect anymore.
For example an Android Phone cycles through "Connecting ..." and in about a second back to "Saved", and immediately back to "Connecting ...".
When I restart the WiFi 5 (through LuCi), all devices can connect again. Regular 802.11bgn works perfectly stable though.

Currently I am running OpenWRT 19.07.0, but the issue already persists since I started using OpenWRT with 18.06.2.

Any hints on where to start digging?
I am also happy to provide any information that is required.

Thank you very much!

I had the same issue with a different router.

I set up a cron job to ping a known wifi client every 20 minutes, and if ping failed, the router restarts wifi.

Issue solved.

If you have no permanent wifi client connected, like a wireless bridge, then you can set a cron job to restart wifi every morning at 6, or every 12 hours, for example. Keeps the issues away.

Do you need help with the script?

2 Likes

Well, that's a way to sail around the issue. I'd prefer to have a "real" solution, but I will keep yours in mind. Thanks!

Also, sometimes the WiFi breaks within the same day, so I'd have to restart it more regularly.

Thanks also for your offer on supporting me with creating the cron job! I'm fit enough with shellscripting to work it out on my own if I decide to walk that path.

Android is reported as problematic. 19.07.0??, but no update on mwlwifi so probably no much will change on that front.

When issue pops up again check/copy/safe the log files before restarting wlan. In LuCi you will find system and kernel logs under status; same can be done in cli by issuing "logread" and "dmesg" commands whatever you prefer.

No problem. Then I'd try erasing flash and writing a new firmware, if issue persists, I'd set a cron job to ping a known client every 15 minutes.

But feel free to experiment, but do write your final solution here. I am interested of what you will come up with.

Well, it doesn't only happen with android devices. I also have Dell and Lenovo Notebooks here that can't connect anymore.

I didn't quite get the point of your "19.07.0??". I am aware that it's not the latest version, but I just noticed that there are service releases out there. I'll update to 19.07.2 on the weekend. I don't really expect it to resolve the issues as they are present since 18.06.2 for me and I suspect some kind of configuration issue.

The issue is present right now. When I try to connect to a WiFi 5 network, absolutely nothing is visible through either logread or dmesg.

When I connect to the 802.11bgn network, there is some chatter in the logs (which to me looks like an ordinary connection procedure logging):

logread:

Wed Mar 11 21:15:45 2020 daemon.info hostapd: wlan1: STA 94:65:2d:xx:xx:xx IEEE 802.11: authenticated
Wed Mar 11 21:15:45 2020 daemon.info hostapd: wlan1: STA 94:65:2d:xx:xx:xx IEEE 802.11: associated (aid 1)
Wed Mar 11 21:15:45 2020 daemon.notice hostapd: wlan1: STA-OPMODE-SMPS-MODE-CHANGED 94:65:2d:xx:xx:xx static
Wed Mar 11 21:15:45 2020 daemon.notice hostapd: wlan1: AP-STA-CONNECTED 94:65:2d:xx:xx:xx
Wed Mar 11 21:15:45 2020 daemon.info hostapd: wlan1: STA 94:65:2d:xx:xx:xx WPA: pairwise key handshake completed (RSN)
Wed Mar 11 21:15:45 2020 daemon.info dnsmasq-dhcp[4088]: DHCPDISCOVER(br-lan) 94:65:2d:xx:xx:xx
Wed Mar 11 21:15:45 2020 daemon.info dnsmasq-dhcp[4088]: DHCPOFFER(br-lan) 10.0.10.189 94:65:2d:xx:xx:xx
Wed Mar 11 21:15:45 2020 daemon.info dnsmasq-dhcp[4088]: DHCPREQUEST(br-lan) 10.0.10.189 94:65:2d:xx:xx:xx
Wed Mar 11 21:15:45 2020 daemon.info dnsmasq-dhcp[4088]: DHCPACK(br-lan) 10.0.10.189 94:65:2d:xx:xx:xx ninja
Wed Mar 11 21:15:46 2020 kern.debug kernel: [4422137.140063] ieee80211 phy1: Mac80211 start BA 94:65:2d:xx:xx:xx

dmesg:
[4422137.140063] ieee80211 phy1: Mac80211 start BA 94:65:2d:xx:xx:xx

Is there anything specific we're looking for?

The kernel log indicates anomalies (exception/overload/crash etc) generated by the radio drivers. The system log usually indicates the events generated by the userspace daemons (e.g. hostapd/wpad etc) such as STA authenticated/associated/disconnected etc. Weird is that in your case you have wlan1 reporting events but, if I understood well, no events reported by wlan0? I have no experience with Marvel hardware (in my case I use qca/ath9/ath10 hardware). Do you see all configured wlan interfaces when you issue command "iwinfo" when you experience problems? Try also "iw wlan0 scan" to see if you receive other wireless APs when you experience connectivity problems.

Correct.

Yup, all here:

root@otemachi:~# iwinfo
wlan0     ESSID: "wlan0"
          Access Point: 26:F5:A2:xx:xx:xx
          Mode: Master  Channel: 36 (5.180 GHz)
          Tx-Power: 23 dBm  Link Quality: unknown/70
          Signal: unknown  Noise: -89 dBm
          Bit Rate: unknown
          Encryption: WPA2 PSK (CCMP)
          Type: nl80211  HW Mode(s): 802.11nac
          Hardware: 11AB:2A55 11AB:0000 [Marvell 88W8864]
          TX power offset: none
          Frequency offset: none
          Supports VAPs: yes  PHY name: phy0

wlan0-1   ESSID: "wlan0-1"
          Access Point: 00:25:9C:xx:xx:xx
          Mode: Master  Channel: 36 (5.180 GHz)
          Tx-Power: 23 dBm  Link Quality: unknown/70
          Signal: unknown  Noise: -90 dBm
          Bit Rate: unknown
          Encryption: WPA2 PSK (CCMP)
          Type: nl80211  HW Mode(s): 802.11nac
          Hardware: 11AB:2A55 11AB:0000 [Marvell 88W8864]
          TX power offset: none
          Frequency offset: none
          Supports VAPs: yes  PHY name: phy0

wlan1     ESSID: "wlan1"
          Access Point: 26:F5:A2:xx:xx:xx
          Mode: Master  Channel: 1 (2.412 GHz)
          Tx-Power: 20 dBm  Link Quality: 70/70
          Signal: -39 dBm  Noise: -82 dBm
          Bit Rate: 144.4 MBit/s
          Encryption: WPA2 PSK (CCMP)
          Type: nl80211  HW Mode(s): 802.11bgn
          Hardware: 11AB:2A55 11AB:0000 [Marvell 88W8864]
          TX power offset: none
          Frequency offset: none
          Supports VAPs: yes  PHY name: phy1

wlan1-1   ESSID: "wlan1-1"
          Access Point: 00:25:9C:xx:xx:xx
          Mode: Master  Channel: 1 (2.412 GHz)
          Tx-Power: 20 dBm  Link Quality: 52/70
          Signal: -58 dBm  Noise: -87 dBm
          Bit Rate: 144.4 MBit/s
          Encryption: WPA2 PSK (CCMP)
          Type: nl80211  HW Mode(s): 802.11bgn
          Hardware: 11AB:2A55 11AB:0000 [Marvell 88W8864]
          TX power offset: none
          Frequency offset: none
          Supports VAPs: yes  PHY name: phy1

root@otemachi:~#

Ummm... Yes, there are other APs being listed. For example:

root@otemachi:~# iw wlan0 scan
[...]
BSS a4:71:74:ef:60:f5(on wlan0)
        TSF: 778216038570 usec (9d, 00:10:16)
        freq: 5180
        beacon interval: 100 TUs
        capability: ESS (0x1111)
        signal: -94.00 dBm
        last seen: 3800 ms ago
        SSID: A1-EF60EC-5G
        RSN:     * Version: 1
                 * Group cipher: CCMP
                 * Pairwise ciphers: CCMP
                 * Authentication suites: PSK
                 * Capabilities: 16-PTKSA-RC 1-GTKSA-RC (0x000c)
[...]

I also browsed the logs a bit more.
The last lines for wlan0 in dmesg are as follows (no output in logread).

[3955802.467569] ieee80211 phy0: Mac80211 start BA e4:b3:18:db:2a:0e
[3956119.860101] ieee80211 phy0: radar detected by firmware
[3956120.400209] ieee80211 phy0: channel switch is done
[3956120.405210] ieee80211 phy0: change: 0x60

Whilst running iw wlan0 scan, the following has been printed to the logs:
logread:

ed Mar 11 21:37:56 2020 kern.debug kernel: [4423467.040270] ieee80211 phy0: change: 0x40
Wed Mar 11 21:37:56 2020 kern.debug kernel: [4423467.180244] ieee80211 phy0: change: 0x40
Wed Mar 11 21:37:56 2020 kern.debug kernel: [4423467.320238] ieee80211 phy0: change: 0x40
Wed Mar 11 21:37:56 2020 kern.debug kernel: [4423467.460230] ieee80211 phy0: change: 0x40
Wed Mar 11 21:37:56 2020 kern.debug kernel: [4423467.610237] ieee80211 phy0: change: 0x60
Wed Mar 11 21:37:57 2020 kern.debug kernel: [4423467.790233] ieee80211 phy0: change: 0x40
Wed Mar 11 21:37:57 2020 kern.debug kernel: [4423467.980232] ieee80211 phy0: change: 0x40
Wed Mar 11 21:37:57 2020 kern.debug kernel: [4423468.170230] ieee80211 phy0: change: 0x40
Wed Mar 11 21:37:57 2020 kern.debug kernel: [4423468.350228] ieee80211 phy0: change: 0x60
Wed Mar 11 21:37:57 2020 kern.debug kernel: [4423468.530226] ieee80211 phy0: change: 0x40
Wed Mar 11 21:37:58 2020 kern.debug kernel: [4423468.710225] ieee80211 phy0: change: 0x40
Wed Mar 11 21:37:58 2020 kern.debug kernel: [4423468.890224] ieee80211 phy0: change: 0x40
Wed Mar 11 21:37:58 2020 kern.debug kernel: [4423469.070231] ieee80211 phy0: change: 0x40
Wed Mar 11 21:37:58 2020 kern.debug kernel: [4423469.250219] ieee80211 phy0: change: 0x40
Wed Mar 11 21:37:58 2020 kern.debug kernel: [4423469.430218] ieee80211 phy0: change: 0x40
Wed Mar 11 21:37:58 2020 kern.debug kernel: [4423469.610223] ieee80211 phy0: change: 0x40
Wed Mar 11 21:37:59 2020 kern.debug kernel: [4423469.790226] ieee80211 phy0: change: 0x40
Wed Mar 11 21:37:59 2020 kern.debug kernel: [4423469.980223] ieee80211 phy0: change: 0x40
Wed Mar 11 21:37:59 2020 kern.debug kernel: [4423470.160212] ieee80211 phy0: change: 0x40
Wed Mar 11 21:37:59 2020 kern.debug kernel: [4423470.340211] ieee80211 phy0: change: 0x60
Wed Mar 11 21:37:59 2020 kern.debug kernel: [4423470.480210] ieee80211 phy0: change: 0x40
Wed Mar 11 21:37:59 2020 kern.debug kernel: [4423470.630210] ieee80211 phy0: change: 0x40
Wed Mar 11 21:38:00 2020 kern.debug kernel: [4423470.770212] ieee80211 phy0: change: 0x40
Wed Mar 11 21:38:00 2020 kern.debug kernel: [4423470.910216] ieee80211 phy0: change: 0x62

dmesg:

[4423467.040270] ieee80211 phy0: change: 0x40
[4423467.180244] ieee80211 phy0: change: 0x40
[4423467.320238] ieee80211 phy0: change: 0x40
[4423467.460230] ieee80211 phy0: change: 0x40
[4423467.610237] ieee80211 phy0: change: 0x60
[4423467.790233] ieee80211 phy0: change: 0x40
[4423467.980232] ieee80211 phy0: change: 0x40
[4423468.170230] ieee80211 phy0: change: 0x40
[4423468.350228] ieee80211 phy0: change: 0x60
[4423468.530226] ieee80211 phy0: change: 0x40
[4423468.710225] ieee80211 phy0: change: 0x40
[4423468.890224] ieee80211 phy0: change: 0x40
[4423469.070231] ieee80211 phy0: change: 0x40
[4423469.250219] ieee80211 phy0: change: 0x40
[4423469.430218] ieee80211 phy0: change: 0x40
[4423469.610223] ieee80211 phy0: change: 0x40
[4423469.790226] ieee80211 phy0: change: 0x40
[4423469.980223] ieee80211 phy0: change: 0x40
[4423470.160212] ieee80211 phy0: change: 0x40
[4423470.340211] ieee80211 phy0: change: 0x60
[4423470.480210] ieee80211 phy0: change: 0x40
[4423470.630210] ieee80211 phy0: change: 0x40
[4423470.770212] ieee80211 phy0: change: 0x40
[4423470.910216] ieee80211 phy0: change: 0x62

... but I've got no idea what those change lines mean ... :thinking:

It looks like wlan0 does receive but it looks like the radio driver for phy0 is reporting problems, I have no idea what they mean but when this event is repeated several times per second it does not look ok. When this happens does the radio keeps sending beacons (i.e. is the SSID visible on your clients)? Also remarkable is the DFS reporting radars meaning that the radio was set on a DFS channel (ch52 or higher) and switched to another channel upon detection. Could it be that exactly on that moment the issues started? You could try to set the radio to a non-DFS channel (ch48 or lower) and see if the issue pops up again.

EDIT: sorry I misunderstood the events.. those logs generated "phy0 change" few times/second during the scan are most probably normal. The radio is continuously switching channels looking for activity on other frequencies when it does not need to receive on it's "home" channel.

1 Like

Thanks a lot for your hints. Now that I've read what DFS is all about and what radar detection has to do with WiFi, I think that you actually might be on to something.

Yes, it was sending beacons on channel 100-112.

I checked my configuration again... My WiFi 5 configuration was set to auto channel with 80 MHz channel width.

I now set the configuration to 40 MHz channel width and fixed the channels to 36-40. I'll give it a try for a few days and see if it's stable now.
Somehow I dislike fixed channels though (although that might be completely absurd). Maybe the problem was that the firmware couldn't find a "good enough" 80 MHz slot to switch to? I think I might try auto mode with 40 or even 20 MHz channel width if the current settings seem to turn out stable.

What do you think about that?

Indeed, my perception as well, when auto-searching "free" channels it will base the search on basic settings (such as bandwidth) and it is evident the availability of VHT80 channels is far less compared to VHT40 and HT20 channels. This means that it will not try to decrease the selected bandwidth upon unsuccessful search. Now regarding the aspect "auto": it is based upon an algorithm (build into hostapd) searching for the "less occupied" channels but I never saw evidence that, once the channel has been set, that same algorithm keeps checking from time to time the entire spectrum looking for better channels unless re-initialisation (reboot of entire system, restart the radio due to config changes, etc). The only real dynamic aspect is the DFS functionality (only on 5G-DFS channels and build into the firmware): when a DFS channel has been chosen by the "auto" algorithm it will start the DFS algorithm: listen 60" to make sure there is no radar detected before start using the channel (DFS-CAC-START - DFS-CAC-COMPLETED) and, once active in production, listen continuously for radar activity in between the users traffic. Upon detection it is this same DFS functionality stopping immediately using this channel and start searching for another channel.

auto defaults to the lowest available channel.

Sooo, after six days my resume is:
It really looks like DFS is the culprit here. Being fixed to channels 36-40 on 40MHz channel bandwidth, the WiFi networks seem to be stable.

Now the problem is that the non-DFS channels 36-48 are kind of crowded already. According to Wikipedia the channels 50-144 are DFS only, while any channels above are SRD channels limited to 25mW...

Would you have any hints on where to start looking to get a working DFS implementation? Is the DFS logic implemented in hostapd too, or is that actually a firmware feature?

For now I'm testing auto with 40MHz channel width. Thanks for clarifying the meaning of auto btw!

DFS pattern detection is completely within the firmware, driver and hostapd only react to its findings.

Also looks like it is overriding the fixed freq. setting also. Even if I bound it to specific freq. it init. a jump sequence.

Thu Mar 19 21:48:52 2020 daemon.notice hostapd: wlan0: DFS-RADAR-DETECTED freq=5260 ht_enabled=1 chan_offset=1 chan_width=2 cf1=5270 cf2=0
Thu Mar 19 21:48:52 2020 daemon.notice hostapd: wlan0: DFS-NEW-CHANNEL freq=5180 chan=36 sec_chan=1
Thu Mar 19 21:48:52 2020 kern.info kernel: [ 2217.333497] ieee80211 phy0: radar detected by firmware
Thu Mar 19 21:48:52 2020 daemon.info hostapd: wlan0: IEEE 802.11 driver starting channel switch: freq=5180, ht=1, vht_ch=0x0, offset=1, width=2 (40 MHz), cf1=5190, cf2=0

Can anyone confirm that is how it works? For me it normally starts of channel 100 and works is way down to the lowest I can use (36 in Europe).

I cannot confirm for sure, but from what I've read in the last days (and I did quite some research) the behaviour you are describing is exactly what is mandated by regulations.
It doesn't matter if you got to a DFS channel by auto mode (which only selects your best channel on startup) or if you told your AP to start broadcasting on a specific channel. If it's on a DFS channel it has to scan for radars and get out of the way if it detects one.

But yeah, the term fixed is kind of misleading here. Better see it as initial channel on startup, as fixed only applies if you choose a non-DFS channel.

hth.

1 Like

Pm'ed
Thanks

I will reply to your PM here...

To set up a cron job to restart wifi @ 6 in the moring every day:

Log in Luci
System -> Scheduled Tasks

And enter this line:

0 6 * * * wifi

If you want to do it every 6 hours, then put this line:

* */6 * * * wifi

Or if every 12 hours, then this line:

* */12 * * * wifi

Click on Save

You are done.

1 Like