Investigate unstable wireless

Hello,

This is what I have:

  • TP-Link Archer C7 v4 connected to the modem (WAN)
  • OpenWrt 19.07.1 (I've been using OpenWrt since at least version 18)
  • a PC wired to the router
  • at least 18 devices connected to the wifi all the time (smart home devices from 3-4 different brands) plus 1-3 occasional wifi clients (phone, laptop, etc.). Most (lights, outlets) are 2.4 GHz and around 5 (speaker, phone, laptop) are 5 GHz. They are all very close to the router (same room, open floor)
  • internet speed shows as very fast in different speed tests (at least 50 Mb/s down and 10 Mb/s up)

This is what is happening:

  • wired connection to my PC always worked well, no lag playing games, watching videos, surfing, etc.
  • wireless:
    • never noticed any issue using my phone (I mean internet use over wifi; pages never hang while loading)
  • smart home devices randomly (no specific time of day or day of the week) have issues: music streams stop playing unexpectedly, and lights show up as disconnected in the native app (or don't respond to commands). Importantly, when this happens the affected client is still connected to the router (clients never lose their connection to the router/OpenWrt). Internet access is confirmed working when this happens (checked from the PC and the phone). I've been struggling with this issue for many months.
  • as a hint, I noticed a few times that when the smart speaker is about to stop, it cuts the music briefly and then stops (like a video buffering). This leads me to believe that the smart speaker has trouble keeping the connection.

I assumed my wireless might have issues keeping a good and stable connection.
This is what I already attempted (and nothing solved the issues fully):

  • I've played with a lot of settings already (more below).
  • I already did the basic stuff (reboot, reinstall, hard reset, etc.)
  • used Cloudflare and Google DNS (in separate attempts)
  • disabled WMM
  • disabled low ACK disassociation
  • disabled short preamble
  • set a better channel according to a wifi survey at home (tried a few channels, actually)
  • tried different DHCP lease times (as short as 5 minutes, just to test; currently 7 days)
  • set static IP (but issue remains)

Any ideas what else I can try (other settings not included above) to fix or at least investigate what is going on? Taking the simplest example, I don't understand why the smart speaker stops playing abruptly even though internet connection is running OK (e.g. I'm playing a game with no lag or navigating via browser without any hangs).

One day everything works perfectly. The next day, it works in the afternoon, then at night a few lights do not respond. Without any change in the settings, the following day everything works perfectly again. It's very random. Sometimes repeating the command works, sometimes not (forcing me to give up or unplug the device from power).

Yes, it could be the services provided by the companies, but I'd like to investigate if I can do anything as a few things lead me to believe my wireless might not be stable.

Thanks for any help!

  • Are you living in an area with dense population or some rural area without many wifi? Some wifi might be hidden, so you'd have to do some proper wireless scan to find the less occupied frequency to use.
  • Do disconnections happen the same time for all affected devices or only for 1-2 at a time?

I would run a continuous ping from the wired PC to the most important devices as well as to the internet, and compare the results at the moment you experience an outage.

  • If you can ping the device but not the internet, you have an internet connectivity problem.
  • If you cannot ping the device but the internet works, then the problem lies with the wifi.
  • If both pings work, you need to verify connectivity with the IoT servers. This may not be so easy, as you'll have to run a tcpdump packet capture and check the destination addresses of the packets coming from the IoT devices.
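A minimal sketch of that comparison, assuming a Linux machine on the wired side (or the router's own BusyBox shell) where `ping -W` is a timeout in seconds. The addresses 192.168.1.50 and 1.1.1.1 are hypothetical placeholders, not values from this thread. It writes one timestamped line per round, so you can look up what both pings said at the moment an outage happens.

```shell
#!/bin/sh
# Hypothetical placeholder addresses: a smart device on the LAN and a
# public host. Override them via the environment.
DEVICE_IP="${DEVICE_IP:-192.168.1.50}"
INET_IP="${INET_IP:-1.1.1.1}"

# Map two ping exit codes (0 = got a reply) to the diagnosis above.
classify() {
    dev=$1 inet=$2
    if [ "$dev" -eq 0 ] && [ "$inet" -eq 0 ]; then
        echo "both OK: check connectivity to the IoT servers"
    elif [ "$dev" -eq 0 ]; then
        echo "internet problem"
    elif [ "$inet" -eq 0 ]; then
        echo "wifi problem"
    else
        # Not covered in the thread; assumption: look at the wired side first.
        echo "both down: check the wired side/router"
    fi
}

# One ping per target per round (about once a second, longer when a
# ping times out), with a timestamp on every line.
monitor() {
    while true; do
        ping -c 1 -W 1 "$DEVICE_IP" >/dev/null 2>&1; dev_rc=$?
        ping -c 1 -W 1 "$INET_IP" >/dev/null 2>&1; inet_rc=$?
        echo "$(date '+%Y-%m-%d %H:%M:%S') $(classify "$dev_rc" "$inet_rc")"
        sleep 1
    done
}

# Only start the loop when called as: sh pingwatch.sh run
if [ "${1:-}" = "run" ]; then
    monitor
fi
```

For example, `DEVICE_IP=192.168.1.50 sh pingwatch.sh run > ping.log` (the file name is hypothetical), then check the log around the time a light stopped responding.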

Are you living in an area with dense population or some rural area without many wifi? Some wifi might be hidden, so you'd have to do some proper wireless scan to find the less occupied frequency to use.

Dense, many wireless networks around. Note that both 2.4 and 5 GHz wifi suffer from these issues. Based on what I already tried, any other suggestions that would help my network work better in a dense environment?

Do disconnections happen the same time for all affected devices or only for 1-2 at a time?

Not at the same time. It might be just one light that doesn't respond, sometimes three, sometimes just the speaker stops playing abruptly.

...ping...

Ping works. Devices are always connected to router, and Internet is always working (testing from PC and phone).

Not really, especially when all devices are close to the antenna without obstacles.

Try the last case with pinging devices and server.

Try decreasing the channel bandwidth (80/40/20 MHz); it usually helps in very dense areas. I would also consider DFS (weather radar detection) triggering a frequency change on 5G, which would cause interruptions; try setting the channel to 48 or lower (DFS-free). What is your measured RF noise level? In my case it is around -95 dBm on 2G and -105 dBm on 5G, in a relatively sparse environment.
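To check whether a given 5 GHz channel is one of the DFS channels, here is a small helper. It assumes the common FCC/ETSI DFS ranges, channels 52 to 64 and 100 to 144; the real list depends on your regulatory domain (country code), so treat it as a rough guide only.

```shell
# Succeeds (exit status 0) if a 5 GHz channel falls in a DFS range.
# Assumption: U-NII-2A (52-64) and U-NII-2C (100-144) as in FCC/ETSI;
# your regulatory domain decides the actual list.
is_dfs() {
    c=$1
    if [ "$c" -ge 52 ] && [ "$c" -le 64 ]; then return 0; fi
    if [ "$c" -ge 100 ] && [ "$c" -le 144 ]; then return 0; fi
    return 1
}

for ch in 36 48 52 100 149; do
    if is_dfs "$ch"; then echo "channel $ch: DFS"; else echo "channel $ch: no DFS"; fi
done
```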

I was having similar issues for a while, and I think it was due to my neighbours' many routers constantly choosing the same channel as mine: I would switch the channel and it would work well for a day or two, then I'd need to switch it again. I am using a different router, though.
I made two changes: switched to auto channel selection for both bands, and set the beacon intervals to prime numbers to reduce collisions (https://www.dslreports.com/forum/r14241618-Theory-on-Optimizing-a-802-11-Beacon).
If you pick a 20 MHz 5 GHz channel, you might not have to exclude the DFS channels, but I have a couple of smart devices that are super slow on DFS channels, so I excluded those. No one has complained since.

5GHz
        option channels '36 40 44 48 149 153 157 161 165'
        option beacon_int '101'
        option channel 'auto'
        option htmode 'VHT20'
        option legacy_rates '0'
        option short_preamble '1'
2.4GHz
        option channels '1 6 11'
        option beacon_int '103'
        option channel 'auto'
        option htmode 'HT20'
        option legacy_rates '0'
        option short_preamble '1'
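The arithmetic behind the prime beacon intervals above can be sketched as follows. Beacon intervals are in time units (1 TU = 1024 µs); two APs whose beacons start in phase come back into the same relative phase only after the least common multiple of their intervals. This shows only the arithmetic the linked theory relies on; it does not show that real beacons actually collide this way.

```shell
# Greatest common divisor (Euclid's algorithm).
gcd() {
    a=$1 b=$2
    while [ "$b" -ne 0 ]; do
        t=$((a % b)); a=$b; b=$t
    done
    echo "$a"
}

# Least common multiple: after this many TU, two beacon trains with these
# intervals are back at the same relative phase.
lcm() {
    echo $(( $1 / $(gcd "$1" "$2") * $2 ))
}

# 1 TU = 1024 microseconds.
for pair in "100 100" "101 103"; do
    set -- $pair
    tu=$(lcm "$1" "$2")
    echo "beacon_int $1 vs $2: same phase every $tu TU (~$((tu * 1024 / 1000)) ms)"
done
```

With two APs at the default 100 TU the phase repeats on every beacon; with 101 and 103 TU it repeats only every 10403 TU, roughly 10.65 s.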

UPDATE: another thing to try is the non-CT firmware. The default (CT) does not work for me.

Like the others have said, check which channels your neighbours are using, either with an app (e.g. WiFi Explorer on macOS) or with:

iwinfo wlan0 scan
iwinfo wlan1 scan

Set your router to channels that are less busy. If you haven't tried DFS channels (52 to 144 on 5 GHz), they're probably worth trying if no one or few people use them.
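To make the scan output easier to read, this sketch counts how many networks report each channel, quietest first. It assumes each cell in the `iwinfo ... scan` output carries a line like `Mode: Master  Channel: 6`. It only counts exact channel numbers, so it ignores overlap between neighbouring 2.4 GHz channels, and channels with no networks do not appear at all.

```shell
# Count networks per channel from `iwinfo <dev> scan` output on stdin.
# Assumption: each cell has a "Channel: <n>" field. Output lines are
# "<count> <channel>", least busy channel first.
count_channels() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "Channel:") print $(i + 1) }' |
        sort -n | uniq -c | sort -n
}

# Usage on the router (interface names as in the commands above):
#   iwinfo wlan0 scan | count_channels
#   iwinfo wlan1 scan | count_channels
```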

What light system do you have?
I have Philips Hue lights that were disconnecting at random; allowing legacy 802.11b rates in the 2.4 GHz wireless settings fixed the issue. I don't have the same router as yours, but it's worth trying again.

Interesting side effect: in my case, iwinfo wlan scan does not work on 5G channels with DFS enabled (ath79 running CT-HTT firmware). When I change to a non-DFS channel it works like a charm!

True! Same for me, I never realised. The scan function in LuCI (Network -> Wireless) is the equivalent of the scan command line above, but I found the LuCI 5 GHz scan works only one time out of two on non-DFS channels, so it's better to use the command line.

When the radio is running in DFS mode it can't scan, because the receiver must stay on the channel all the time to check for radar. So change to a non-DFS channel temporarily to conduct scans.


Understood. If you use LuCI, does the 5 GHz scan work reliably for you?

Try to decrease channel bandwidth (80/40/20 MHz)

Yes, all 20 MHz already

try to set frequency to channel 48 or less

It's channel 36 already

What is your measured RF noise level?

Tx-Power: 23 dBm
Signal: -38 dBm | Noise: -101 dBm

switched to auto channel selection

Honestly, I think auto channel selection might cause more issues, since the devices will eventually have to switch channels. In my case, I believe the channel should stay constant (the best possible one).

set beacon intervals to be prime numbers to reduce collisions

Thank you, I'm trying this now, even though I don't think beacons collide like that just because both intervals are 100 ms: this is the interval, not a specific time at which they are sent. Increasing them might help a bit in my case (not much, though), just because the clients are always the same and discovering/switching APs doesn't need to be fast. Since I increased the beacon interval, I also set DTIM to 1 instead of the default 2.
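For reference, the time between DTIM beacons is beacon interval × DTIM period × 1024 µs, and power-saving clients generally wake at DTIM beacons to pick up buffered broadcast/multicast traffic. The values below (the default 100 TU with DTIM 2, and an illustrative 103 TU with DTIM 1) are examples, not the exact settings used here; in OpenWrt the DTIM setting is the `dtim_period` option.

```shell
# Time between DTIM beacons in microseconds.
# $1 = beacon interval in TU (1 TU = 1024 us), $2 = DTIM period in beacons.
dtim_us() {
    echo $(( $1 * $2 * 1024 ))
}

for cfg in "100 2" "103 1"; do
    set -- $cfg
    us=$(dtim_us "$1" "$2")
    echo "beacon_int=$1 dtim_period=$2: DTIM every $((us / 1000)) ms"
done
```

So a slightly longer beacon interval combined with DTIM 1 roughly halves the DTIM interval compared with the defaults (about 105 ms instead of 204 ms).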

non-ct firmware

What is a CT (and non-CT) firmware?

Ok, a -38 dBm signal against a -101 dBm noise level gives a signal-to-noise ratio of 63 dB, which is a very strong and comfortable signal. Since you have set your 5G radio to ch36, DFS is not active. I would decrease the tx power by at least 3 dB (try 20 dBm) or even more, as this might also increase stability.

CT firmware is an alternative ath10k firmware (the firmware that has to be loaded into the Qualcomm Atheros radio chip, which is based on SDR technology) developed by Candela Technologies (candelatech.com). Since CT firmware is still under development, it has many more features than the original Qualcomm Atheros firmware (it looks like Qualcomm has stopped active development on the ath10k platform). To make use of all its features, it's best to load the CT firmware (in my case ath10k-firmware-qca988x-ct-htt) together with the CT kernel driver (in my case kmod-ath10k-ct). In my case, the Qualcomm Atheros firmware and driver are named ath10k-firmware-qca988x and kmod-ath10k respectively. If you already run the CT version, I would suggest trying the original Qualcomm software.
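The SNR and tx-power figures above are simple dB arithmetic: SNR is signal minus noise (both in dBm), and P(mW) = 10^(dBm/10), so a 3 dB cut roughly halves the transmit power. A sketch, assuming the `Signal: ... dBm ... Noise: ... dBm` text format quoted earlier in the thread; the interface name in the usage comment is an assumption.

```shell
# Signal-to-noise ratio in dB from text containing "Signal: <n> dBm" and
# "Noise: <n> dBm" (assumed iwinfo/LuCI-style format; the fields may be
# on the same line or on different lines).
snr_from_info() {
    awk '{
        for (i = 1; i < NF; i++) {
            if ($i == "Signal:") s = $(i + 1)
            if ($i == "Noise:")  n = $(i + 1)
        }
    }
    END { if (s != "" && n != "") print s - n }'
}

# Convert a power level in dBm to milliwatts: P(mW) = 10^(dBm/10).
dbm_to_mw() {
    awk -v d="$1" 'BEGIN { printf "%.0f\n", 10 ^ (d / 10) }'
}

# The numbers from this thread:
echo 'Signal: -38 dBm | Noise: -101 dBm' | snr_from_info   # prints 63
dbm_to_mw 23   # prints 200 (current tx power, mW)
dbm_to_mw 20   # prints 100 (after a 3 dB reduction)

# On the router (interface name is an assumption):
#   iwinfo wlan0 info | snr_from_info
```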
Have you checked your logs when you experience problems? In the GUI you can check the system and kernel logs (under Status). The same can be done from the CLI with the commands "logread" and "dmesg" respectively.
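To narrow the logs down to wireless events around an outage, a small filter helps. The keyword list is only a guess at what is relevant (hostapd, the ath9k/ath10k drivers, deauth/disassoc messages); adjust it to what your logs actually show. `logread -f` follows the system log live, which is handy while waiting for the next dropout.

```shell
# Keep only lines that look wireless-related.
# Assumption: these keywords cover the interesting messages.
wifi_log_filter() {
    grep -i -E 'hostapd|ath9k|ath10k|deauth|disassoc'
}

# Usage on the router:
#   logread | wifi_log_filter      # system log
#   dmesg | wifi_log_filter        # kernel log
#   logread -f | wifi_log_filter   # follow live while waiting for an outage
```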