Problem
Firstly, wired devices are unaffected. The problem only affects devices connected to my 5 GHz WiFi (seen on two different iPhones so far): they eventually get really sluggish to load data when browsing in Safari or Chrome, or when using the Facebook app. When working properly, there is minimal lag when, for example, googling and waiting for the hit set to return, or simply scrolling through the feed in the Facebook app.
When I experience this "sluggish to load data" effect, I can wait 5-20+ seconds to see the hit set after hitting "go" on google search. Or I can see the facebook app trying to load more data but eventually timing out.
Partial Solution
I can usually fix the problem by executing /etc/init.d/network restart on the OW router (R7800), but eventually, the behavior returns and devices seem really slow again. As well, the restart is not foolproof with regard to fixing the problem.
Help to track down cause
Happy to post any config files/logs.
Here is the output from dmesg. I do see some errors there. Around 76471 seconds, there is a warning relating to backports-4.19.120-1/net/wireless/util.c, which might be related or might be a red herring. As expected, I see that warning echoed in the output of logread as well.
...
Sat May 23 11:28:45 2020 kern.warn kernel: [76471.050020] ------------[ cut here ]------------
Sat May 23 11:28:45 2020 kern.warn kernel: [76471.050076] WARNING: CPU: 0 PID: 11750 at backports-4.19.120-1/net/wireless/util.c:1147 0xbf309d88 [cfg80211@bf305000+0x37000]
Sat May 23 11:28:45 2020 kern.warn kernel: [76471.053720] invalid rate bw=0, mcs=15, nss=4
Sat May 23 11:28:45 2020 kern.warn kernel: [76471.065027] Modules linked in: pppoe ppp_async ath10k_pci ath10k_core ath pppox ppp_generic nf_conntrack_ipv6 mac80211 iptable_nat ipt_REJECT ipt_MASQUERADE cfg80211 xt_time xt_tcpudp xt_tcpmss xt_statistic xt_state xt_recent xt_nat xt_multiport xt_mark xt_mac xt_limit xt_length xt_hl xt_helper xt_ecn xt_dscp xt_conntrack xt_connmark xt_connlimit xt_connbytes xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_HL xt_FLOWOFFLOAD xt_DSCP xt_CT xt_CLASSIFY slhc nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_conntrack_ipv4 nf_nat_ipv4 nf_nat nf_log_ipv4 nf_flow_table_hw nf_flow_table nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_rtcache iptable_raw iptable_mangle iptable_filter ipt_ECN ip_tables crc_ccitt compat sch_cake nf_conntrack sch_tbf sch_ingress sch_htb sch_hfsc em_u32 cls_u32 cls_tcindex cls_route
Sat May 23 11:28:45 2020 kern.warn kernel: [76471.117800] cls_matchall cls_fw cls_flow cls_basic act_skbedit act_mirred ledtrig_usbport nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables ip6t_REJECT x_tables nf_reject_ipv6 ifb usb_storage f2fs crc32_generic leds_gpio xhci_plat_hcd xhci_pci xhci_hcd dwc3 dwc3_of_simple ohci_platform ohci_hcd phy_qcom_dwc3 ahci ehci_platform sd_mod ahci_platform libahci_platform libahci libata scsi_mod ehci_hcd gpio_button_hotplug
Sat May 23 11:28:45 2020 kern.warn kernel: [76471.156318] CPU: 0 PID: 11750 Comm: hostapd Not tainted 4.14.180 #0
Sat May 23 11:28:45 2020 kern.warn kernel: [76471.178450] Hardware name: Generic DT based system
Sat May 23 11:28:45 2020 kern.warn kernel: [76471.184708] Function entered at [<c030f1c4>] from [<c030b390>]
Sat May 23 11:28:45 2020 kern.warn kernel: [76471.189563] Function entered at [<c030b390>] from [<c07c09c4>]
Sat May 23 11:28:45 2020 kern.warn kernel: [76471.195380] Function entered at [<c07c09c4>] from [<c031f878>]
Sat May 23 11:28:45 2020 kern.warn kernel: [76471.201195] Function entered at [<c031f878>] from [<c031f8d8>]
Sat May 23 11:28:45 2020 kern.warn kernel: [76471.207012] Function entered at [<c031f8d8>] from [<bf309d88>]
Sat May 23 11:28:45 2020 kern.warn kernel: [76471.212884] Function entered at [<bf309d88>] from [<bf31783c>]
Sat May 23 11:28:45 2020 kern.warn kernel: [76471.218645] Function entered at [<bf31783c>] from [<bf324654>]
Sat May 23 11:28:45 2020 kern.warn kernel: [76471.224461] Function entered at [<bf324654>] from [<bf32504c>]
Sat May 23 11:28:45 2020 kern.warn kernel: [76471.230275] Function entered at [<bf32504c>] from [<c06e1a1c>]
Sat May 23 11:28:45 2020 kern.warn kernel: [76471.236093] Function entered at [<c06e1a1c>] from [<c06e0238>]
Sat May 23 11:28:45 2020 kern.warn kernel: [76471.241906] Function entered at [<c06e0238>] from [<c06e0a10>]
Sat May 23 11:28:45 2020 kern.warn kernel: [76471.247720] Function entered at [<c06e0a10>] from [<c06dfa48>]
Sat May 23 11:28:45 2020 kern.warn kernel: [76471.253537] Function entered at [<c06dfa48>] from [<c06dfe64>]
Sat May 23 11:28:45 2020 kern.warn kernel: [76471.259354] Function entered at [<c06dfe64>] from [<c0689d5c>]
Sat May 23 11:28:45 2020 kern.warn kernel: [76471.265171] Function entered at [<c0689d5c>] from [<c068a5c4>]
Sat May 23 11:28:45 2020 kern.warn kernel: [76471.270984] Function entered at [<c068a5c4>] from [<c0307b60>]
Sat May 23 11:28:45 2020 kern.warn kernel: [76471.276894] ---[ end trace 2f58026363a19bec ]---
...
I restarted the network as shown around 141151 seconds which gave the output there.
I should note that I think the kernel WARN in dmesg above is unrelated. I am experiencing the problem now on my iPhones and I do not see that kernel WARN in dmesg at all.
To fix the problem, I actually rebooted the router about 4 hours ago. Here is dmesg.
I noticed that right after the reboot, the iPhones again experienced the lag I described above. The only line that seems out of place in dmesg is this one, which I do not understand:
[ 72.608110] ath10k_pci 0000:01:00.0: Invalid peer id 1 or peer stats buffer, peer: dc211200 sta: (null)
When I googled that error, I found this bug report... is it related to my problem?
Note - I waited a few hours but still, the laggy web browsing was present on the iphones. I just ran /etc/init.d/network restart and the iphones (wifi) are working as expected with very fast responses. Here is dmesg again after I restarted it (around 5328 seconds).
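One way to make these before/after dmesg comparisons less tedious is to filter the log down to the wireless-related lines. This is only a minimal sketch: the function name wifi_kernel_events and the pattern list are my own assumptions, picked from the messages quoted in this thread, and they may miss other relevant messages.

```shell
#!/bin/sh
# Sketch: filter kernel log text on stdin down to wireless-related lines.
# The patterns are assumptions taken from messages quoted in this thread;
# extend them as needed.
wifi_kernel_events() {
    # Keep ath10k/cfg80211 messages and kernel WARNs, then drop the long
    # "Modules linked in" lines, which match only because they list module names.
    grep -E 'ath10k|cfg80211|invalid rate|WARNING:' | grep -v 'Modules linked in'
}

# Usage on the router (not executed here):
#   dmesg   | wifi_kernel_events
#   logread | wifi_kernel_events
```

Saving the filtered output to a file before and after a network restart and comparing the two with diff should show whether any new driver message lines up with the lag.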
It'd be interesting to compare with SQM on and off.
If the tests look fine consider testing your modem too since it is a puma 7 chipset (Some people are reporting lag): http://www.dslreports.com/tools/puma6
set your 5 ghz radio to 80mhz width. Might as well get full speed to the AP
turning off your mac filter - mac filters give a false sense of security and cause issues with iOS devices and wifi. Apple products spoof their MAC address when communicating with access points ever since iOS 8 - so a good part of your struggle is probably trying to authenticate to an access point that is demanding the true MAC. I don't agree with all the recommendations in their article but the MAC filter section makes sense:
Last thing not wifi related - your non SQM wired test was really odd. Nearly 1 second of lag (D buffer bloat!) is quite a bit and would produce noticeable lag in loading things. Try the puma6 test and see if your modem is acting up.
Yes, the shitty Intel chipset you mentioned in that $250 modem is horrible! Thanks for your help to uncover this unrelated problem! I will return this modem and get a different one.
That said, I am still perplexed by the initial problem I posted relating to the cause of laggy wifi on some devices.
A functioning modem and getting rid of mac filtering should give you a solid base. iOS devices hate mac filtering - a quick google search and you'll find how big an issue it's been since iOS 8. It's a good thing, why advertise your real MAC address?
Two additional things to consider:
1. You have two SSIDs per frequency. When you have virtual interfaces they act like an additional access point on that channel and you create more management / overhead packets to prevent collisions. The more SSIDs you have on a channel, the bigger the performance degradation. It usually isn't much for two SSIDs on a single channel but depending on neighbors, number of clients, etc, it can add up (more lag). Consider consolidating SSIDs on each frequency if appropriate. Name the 2.4ghz and 5ghz SSIDs differently (see point #2).
2. There are different preferences out there for which frequency to connect to. Personally on my iOS devices I have them forget the 2.4ghz SSID and only have them remember one 5ghz SSID. The speed on 2.4ghz is terrible and more appropriate for smart devices, 5ghz is so much quicker for iOS devices. Jumping between frequencies and between virtual interfaces can cause issues with roaming wifi devices. Every jump introduces lag.
Let's see what the same testing looks like with all fully functioning equipment. You should be able to max out your ISP connection and have a solid bufferbloat grade.
We are pretty isolated from neighbors so interference from other radios isn't a concern. I only have 3-4 clients at any given time connected and my 2.4 GHz radio on the R7800 is disabled.
I have a new modem now and it is unaffected by puma 6 test
Did you mean with the puma 6 test? It is much better now/hardly any reds in the test results. I believe I am still affected by the original issue I posted though.
I'd set your transmission power to a set number. Mid 20s usually works well. 30 should be the max. A set number will give more consistent results than the default.
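For anyone who prefers the command line to LuCI, the fixed transmit power (and the 80 MHz width suggested earlier in the thread) can be set through uci. The sketch below only prints the commands so they can be reviewed first. The function name radio_settings_cmds is made up for illustration, and radio0, VHT80 and 23 are placeholder assumptions: check /etc/config/wireless for the name of your 5 GHz radio and your regulatory limits.

```shell
#!/bin/sh
# Sketch: print (not run) the uci commands that pin channel width and
# transmit power on one radio, so they can be reviewed before applying.
# The radio name is an assumption; check /etc/config/wireless for yours.
radio_settings_cmds() {
    radio=$1   # e.g. radio0 (assumed to be the 5 GHz radio)
    width=$2   # e.g. VHT80 for 80 MHz on an 802.11ac radio
    power=$3   # transmit power in dBm, e.g. 23

    printf "uci set wireless.%s.htmode='%s'\n" "$radio" "$width"
    printf "uci set wireless.%s.txpower='%s'\n" "$radio" "$power"
    printf 'uci commit wireless\n'
    printf 'wifi reload\n'
}

# Usage on the router (not executed here):
#   radio_settings_cmds radio0 VHT80 23        # review the commands
#   radio_settings_cmds radio0 VHT80 23 | sh   # apply them
```

Printing first and piping to sh second keeps a typo in the radio name from silently creating a bogus config entry.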
What types of wireless clients are having issues - all of them or only some? Zero issues with wired clients now that the modem is fixed? What does the DSLReports speed test look like (less lag?).
I have super simple SQM settings. Never really found great benefit from advanced settings - anything unique with your SQM setup?
According to LuCI, the "maximum transmit power" is "driver default", which is 23 dBm, the maximum in the pulldown. I can force a setting of "23 dBm (199 mW)" from the pulldown... is there something I can inspect to see what the current value is? (The wording says maximum, but the driver can reduce it.)
EDIT: Here is how to see the actual power I believe.
# iwinfo|grep Tx
Tx-Power: 23 dBm Link Quality: 38/70
Tx-Power: 23 dBm Link Quality: 54/70
Tx-Power: 0 dBm Link Quality: unknown/70
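The grep above drops the interface names, so it is hard to tell which radio each line belongs to. Here is a rough sketch that keeps the interface name and also converts dBm to mW (mW = 10^(dBm/10)). It assumes the usual iwinfo layout, where each block starts with the interface name in the first column and the Tx-Power line is indented. The function name tx_power_report is my own.

```shell
#!/bin/sh
# Sketch: read `iwinfo` output on stdin and print one line per interface
# with its Tx power in dBm and in mW. Assumes interface names start in
# column 1 and detail lines are indented with spaces.
tx_power_report() {
    awk '
        /^[^ ]/ { iface = $1 }            # unindented line: new interface block
        /Tx-Power:/ {
            for (i = 1; i < NF; i++) {
                # Only report numeric values; "unknown" is skipped.
                if ($i == "Tx-Power:" && $(i + 1) ~ /^-?[0-9]+$/) {
                    dbm = $(i + 1)
                    mw = exp(log(10) * dbm / 10)   # 10^(dBm/10)
                    printf "%s %s dBm %.1f mW\n", iface, dbm, mw
                }
            }
        }
    '
}

# Usage on the router (not executed here):
#   iwinfo | tx_power_report
```

For 23 dBm this works out to about 199.5 mW, which matches the "199 mW" shown in the LuCI pulldown.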
I see it on the following devices based on pinging said device from the OW router:
*iPhone 7
*iPhone 8 Plus
*iPad Air 2
*Macbook Air
For whatever reason, I do not see it on the following:
*Lenovo P1
*Raspberry Pi4B (connected via wireless)
*iPhone 11
Lag only with a few clients: just the older iPhones, iPad, and MacBook. Other devices connected at the same time the iPhones are lagging do not show the lag. This can be measured qualitatively by watching a google hit set take 5-20+ seconds to appear, or directly by pinging the device from the OW router and seeing really long ping times and packet loss.
The first symptom is slow web browsing on the device when this problem is occurring. I can confirm it by pinging that device so I know it is not asleep.
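To put numbers on "really long ping times and packet loss", the ping summary can be boiled down to one line per run and logged over time. This is a sketch under assumptions: ping_summary is a name I made up, the parsing assumes the usual "N% packet loss" and "min/avg/max = a/b/c ms" summary lines printed by BusyBox and iputils ping, and the client address in the usage comment is hypothetical.

```shell
#!/bin/sh
# Sketch: read `ping` output on stdin and print packet loss and average
# round-trip time on a single line. Prints "?" for anything not found
# (for example, there is no round-trip line at 100% loss).
ping_summary() {
    awk '
        /packet loss/ {
            for (i = 1; i <= NF; i++)
                if ($i ~ /%$/) loss = $i          # field such as "25%"
        }
        /min\/avg\/max/ {
            split($0, halves, "= ")               # text after "= " holds the numbers
            split(halves[2], rtt, "/")
            avg = rtt[2]                          # second value is the average
        }
        END {
            printf "loss=%s avg_ms=%s\n", (loss == "" ? "?" : loss), (avg == "" ? "?" : avg)
        }
    '
}

# Usage on the router (not executed here; the address is hypothetical):
#   ping -c 20 192.168.1.123 | ping_summary
```

Running this against a lagging iPhone and a healthy client at the same moment gives a direct side-by-side comparison.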
Any unique network configuration for these, unique DNS, or anything else changed from the default openwrt config?
Are you running adblock or any other additional programs that could be messing with how particular websites function?
You could try doing a "reset network settings" on one of these older Apple devices and see if there are gremlins in the remembered client network settings.
You could update your router to the latest hnyman master build (CT or ath10k - you have two different wifi driver options, I'd try both options).
Lastly - you could select not to keep your configuration so that you start with the default settings if you think there might be a bug from prior configuration changes.
Beyond the above posts on simplifying/tweaking the configuration settings, trying a different wifi driver, and considering resetting the clients +/- the router to default configuration - I'm out of any additional tips or tricks. Hope one of the above helps!
My wifi settings for 5ghz (for comparison if anything is different):
Running pihole on a RPi but it is not serving DHCP, it is just blocking ads
I tried the reset network config but it did not help
I tried running the latest hnyman build but experienced the same issues with it (did not switch drivers, just used what came out-of-the-box, which I believe is CT).
I might do a factory reset and rebuild my configuration from scratch just to rule it out.
The only non-standard thing I have is physical port #4 is VLANed to be on the guest zone I created. Beyond that, I have a pretty standard "LAN" and "Guest" zone setup.
I did a factory reset and set up a simple 5 GHz SSID. I did not experience the lag initially but it did occur eventually. I thought it was due to another client joining, but it seems inconsistent.
I have switched to the ath10k (not ath10k-ct) and have been using it for 5 days now without any issue.
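For anyone landing here with the same symptoms, it may help to confirm which driver variant is actually installed before and after switching. A small sketch, assuming the kernel module packages are named kmod-ath10k-ct and kmod-ath10k as listed by opkg (the function name ath10k_variant is mine):

```shell
#!/bin/sh
# Sketch: read `opkg list-installed` output ("name - version" per line)
# on stdin and report which ath10k driver package is present.
ath10k_variant() {
    awk '
        $1 ~ /^kmod-ath10k-ct/ { ct = 1 }        # Candela Technologies variant
        $1 == "kmod-ath10k"    { mainline = 1 }  # mainline ath10k driver
        END {
            if (ct) print "ath10k-ct"
            else if (mainline) print "ath10k"
            else print "none"
        }
    '
}

# Usage on the router (not executed here):
#   opkg list-installed | ath10k_variant
```

If both packages were somehow installed, this reports ath10k-ct, so treat the output as a quick check and not a full audit.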