I have a pretty simple network: a Netgear WNDR3700v4 router plugged into a fibre modem, then a WN604 device as a WDS AP to extend range while also providing cabled access for a PC.
At various times the wifi performance drops for one or many devices. Running wifi down && wifi up on the main router usually fixes this.
It's been like this for about three releases / builds over 18 months - I don't have the specific versions I used previously to hand, but it is currently running Linux WNDR3700v4-A 4.14.209 #0 Sun Dec 6 07:31:03 2020 mips GNU/Linux, installed via openwrt-ath79-nand-netgear_wndr3700-v4-squashfs-sysupgrade.bin a couple of weeks ago.
Is there a known problem that I can work around or try to help fix? FS#2856 is the closest I can find in the ticketing system, but it's not quite the same.
When it happens I can't see high bandwidth consumption on any interface, and any device in the network that was doing that would presumably try again once the network was back up - but performance remains good for hours / days after being "refreshed". CPU and memory consumption also appear to be low.
To refine my question, I'm assuming packets are getting delayed as they come on or off the air, or maybe as they traverse interfaces in the bridge / routes. Buffers full, some operation timing out, etc.
Are there any counters I should inspect, logging I can set up on packets per second passing through a network filter / layer, etc?
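One low-effort place to start, assuming the standard Linux per-interface counters under /sys/class/net are present on the router: sample the packet, drop and error counters for each interface in the bridge and turn two samples into a rate. This is only a sketch; the helper names read_counters and rate are made up here, and wlan0 in the usage comment is just an example interface name.

```shell
#!/bin/sh
# Root of the per-interface counters. Standard on Linux; overridable for testing.
NET_ROOT="${NET_ROOT:-/sys/class/net}"

# Print "<iface> <counter> <value>" for a handful of counters of one interface.
read_counters() {
    iface="$1"
    for c in rx_packets tx_packets rx_dropped tx_dropped rx_errors tx_errors; do
        f="$NET_ROOT/$iface/statistics/$c"
        if [ -r "$f" ]; then
            echo "$iface $c $(cat "$f")"
        fi
    done
}

# Events per second between two samples: rate <first> <second> <seconds apart>
rate() {
    echo $(( ($2 - $1) / $3 ))
}

# Example use (interface name is an assumption):
#   a=$(cat /sys/class/net/wlan0/statistics/rx_packets)
#   sleep 10
#   b=$(cat /sys/class/net/wlan0/statistics/rx_packets)
#   rate "$a" "$b" 10
```

Comparing the drop and error counters from a "good" period against a sluggish one should at least show which interface, if any, is discarding packets.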
Woke up this morning to find no wifi in the house. Pressing the hardware button that does wifi down && wifi up brought it back. This is possibly a different issue since, as I mention above, it's usually just a slowdown rather than a complete failure.
I took a look at logread to see what hostapd is doing. Up to midnight-ish, while people are up and about in the house, I see regular (de)association and (de)auth messages for various devices. If I filter all of those out I see two less common messages:
00:30:13 daemon.notice hostapd: wlan0: WDS-STA-INTERFACE-REMOVED ifname=wlan0.sta1 sta_addr=<downstream AP MAC>
01:54:06 daemon.notice hostapd: wlan1: STA-OPMODE-MAX-BW-CHANGED <TP-Link USB WiFi MAC> 20
01:54:11 daemon.notice hostapd: wlan1: STA-OPMODE-MAX-BW-CHANGED <TP-Link USB WiFi MAC> 40
And after that there is no hostapd activity at all until I press that button. I'll see if there are any known problems with a client device changing bandwidth parameters (i.e. the STA-OPMODE-MAX-BW-CHANGED event).
I'm assuming the WDS-related message is not important for now, but will keep an eye on it.
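For anyone wanting to repeat the filtering, something along these lines should do it. It is a rough sketch: the patterns are my assumption about what routine hostapd (de)association and (de)auth lines look like, and filter_hostapd is just a name picked for the example.

```shell
#!/bin/sh
# Keep hostapd lines only, then drop routine (de)association / (de)auth chatter.
# "associated" also matches "disassociated", and "authenticated" also matches
# "deauthenticated". Patterns are assumptions; adjust them to your own log.
filter_hostapd() {
    grep 'hostapd' | grep -v -i -E 'associated|authenticated|AP-STA-(DIS)?CONNECTED'
}

# Example use on the router:
#   logread | filter_hostapd
```

Whatever is left over is the less common stuff, such as the WDS-STA-INTERFACE-REMOVED and STA-OPMODE-MAX-BW-CHANGED events above.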
One other oddity of that TP-Link USB wifi device is that a few hours earlier it sent a burst (actually 8) of these at 5 minute intervals (which no other device does), then stopped as suddenly as it started:
daemon.notice hostapd: wlan0: AP-STA-POLL-OK <TP-Link USB WiFi MAC>
I'll keep an eye on that device to see if it shows up in other ways when the wifi is increasingly sluggish.
WNDR3700v4 is Atheros based.
If the problem is on 2.4GHz and that radio is using ath9k, then disable ANI and see if the problem happens again.
I'm not really familiar with which wireless drivers the WNDR3700v4 is using, but I kinda believe it's ath9k.
(No clue if there is a similar command for ath10k.)
The phy0 in the command above might be phy1, because you have 2 wireless modules there, one for 2.4GHz and one for 5GHz. If ath9k is used for both modules, you might have to run that command for both phy0 and phy1 if the problem is happening on both 2.4GHz and 5GHz. It's not clear to me whether the issue is on both 2.4GHz and 5GHz or only on one of them.
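To avoid guessing between phy0 and phy1, a small loop can write to every ath9k radio it finds. This is a sketch that assumes the ath9k debugfs interface is available (the driver built with debugfs support and debugfs mounted at /sys/kernel/debug), with ANI controlled through the ani file under each phy; set_ani and DEBUGFS_ROOT are names invented for the example.

```shell
#!/bin/sh
# Where mac80211 exposes per-phy debugfs entries. Assumed default location;
# override DEBUGFS_ROOT if debugfs is mounted somewhere else.
DEBUGFS_ROOT="${DEBUGFS_ROOT:-/sys/kernel/debug/ieee80211}"

# set_ani 0 disables ANI, set_ani 1 enables it again, on every ath9k phy found.
set_ani() {
    value="$1"
    for ani in "$DEBUGFS_ROOT"/phy*/ath9k/ani; do
        # Skip when nothing matched (no ath9k radio, or debugfs not available).
        [ -e "$ani" ] || continue
        echo "$value" > "$ani"
        echo "wrote $value to $ani"
    done
}

set_ani 0
```

If nothing is printed, no ath9k ani file was found, which probably means the radio uses a different driver or the debugfs support is not built in. The setting is lost on reboot, so it would have to be re-applied from a startup script once testing shows it helps.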
If the router reboots ANI will NOT be disabled. You need to add this line to
to make it permanent, but only do that after you tested and the problem went away with ANI disabled.
Disabling ANI can have some side effects, like lower range coverage, or big changes in the radio environment making the wireless connection unstable (packet loss), but this last one shouldn't really happen often.
In some particular environments, for some reason, ANI does more harm than good. So if you have issues with ANI enabled, you can give it a go with ANI disabled. (In some cases it can work for years with ANI enabled, and then something changes and you have to disable it, or else wireless becomes unstable after some hours/days for no apparent reason.) Why is this the case? You would have to analyze the radio spectrum to answer that, and I don't really think it's worth the money for the tools if you only need them to answer this one question. Also, if the answer turns out to be a device that is not under your control, there is nothing you can do (you might figure out who the owner is, but getting them to change it won't be an easy task).
I may have stumbled across the reason for the intermittent performance issues that I opened the thread with. While looking at something else I realised:
My main router had WDS enabled on the 2.4GHz radio SSID, but not the 5GHz one. I then set both to Access Point (WDS).
When I enabled WDS on the main AP's 5GHz SSID I started to see syslog messages at each end about MAC addresses / packets being sent back to the originating device on the other band. This led to me enabling STP on the lan interface of each AP.
This loop got me thinking about why I had the remote / extender AP configured to join the main router on both the 2.4GHz and 5GHz radios. I changed this to use just the 5GHz one (so deleted the extender's STA on 2.4GHz).
So I now have the main router offering AP WDS on both bands, and the repeater connected on just the 5GHz band as STA while also offering AP WDS on both bands. Plus STP enabled on each device.
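For reference, the same two settings can be applied from the command line with uci rather than through LuCI. This is a sketch under assumptions: the bridge interface section is taken to be called lan, and the wifi-iface section name passed to enable_wds (default_radio1 in the usage comment) is only an example, so check `uci show wireless` for the real names. The helper function names are made up.

```shell
#!/bin/sh
# Enable STP on the bridge behind the 'lan' interface.
# Assumes the interface section is named 'lan'.
enable_lan_stp() {
    uci set network.lan.stp='1'
    uci commit network
}

# Turn on WDS for one wifi-iface section (works for mode 'ap' and mode 'sta').
# $1 is the section name, e.g. default_radio1 (an assumption, not a given).
enable_wds() {
    uci set "wireless.$1.wds=1"
    uci commit wireless
}

# Example use on each device, followed by a network / wifi restart:
#   enable_lan_stp
#   enable_wds default_radio1
```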
My initial impression is that everything is more stable, although time will tell. I thought I'd add this while the changes are fresh in my mind though.
Quick note to say I didn't make any further changes and stability / performance remains good after untangling the WDS settings. From trying to make things better previously I do still have an historic daily reboot in crontab, but am going to delete that today as well.