High latency/pings with VLAN interfaces and wifi

I have a pfSense router that sends a VLAN trunk to my access point, a Netgear R7800. I installed the OpenWrt 19.07.4 image on the R7800, used the "Switch" tab to bring in my VLANs, and added the interfaces and wireless configurations following various tutorials. I set OpenWrt up as a WAP, or 'dumb' access point, and everything is basically working: I connect, I have internet, my devices get separate IPs (DHCP through pfSense), and so forth.

But my pings and latency are terrible to any wifi device. If I ping from my desktop wired to the r7800 I get pings less than a millisecond. But here are my pings when I ping a wireless device:

64 bytes from 10.11.12.151: icmp_seq=18 ttl=63 time=1458 ms
64 bytes from 10.11.12.151: icmp_seq=19 ttl=63 time=449 ms
64 bytes from 10.11.12.151: icmp_seq=20 ttl=63 time=1151 ms
64 bytes from 10.11.12.151: icmp_seq=21 ttl=63 time=125 ms
64 bytes from 10.11.12.151: icmp_seq=22 ttl=63 time=963 ms
64 bytes from 10.11.12.151: icmp_seq=23 ttl=63 time=11.9 ms
64 bytes from 10.11.12.151: icmp_seq=24 ttl=63 time=570 ms
64 bytes from 10.11.12.151: icmp_seq=25 ttl=63 time=1209 ms
64 bytes from 10.11.12.151: icmp_seq=26 ttl=63 time=199 ms
64 bytes from 10.11.12.151: icmp_seq=27 ttl=63 time=834 ms
64 bytes from 10.11.12.151: icmp_seq=28 ttl=63 time=15.1 ms
64 bytes from 10.11.12.151: icmp_seq=29 ttl=63 time=472 ms
64 bytes from 10.11.12.151: icmp_seq=30 ttl=63 time=1107 ms
64 bytes from 10.11.12.151: icmp_seq=31 ttl=63 time=4402 ms
64 bytes from 10.11.12.151: icmp_seq=32 ttl=63 time=1556 ms
64 bytes from 10.11.12.151: icmp_seq=33 ttl=63 time=530 ms
64 bytes from 10.11.12.151: icmp_seq=34 ttl=63 time=1000 ms
64 bytes from 10.11.12.151: icmp_seq=35 ttl=63 time=8.89 ms
64 bytes from 10.11.12.151: icmp_seq=36 ttl=63 time=596 ms
64 bytes from 10.11.12.151: icmp_seq=37 ttl=63 time=1438 ms
64 bytes from 10.11.12.151: icmp_seq=38 ttl=63 time=425 ms
64 bytes from 10.11.12.151: icmp_seq=39 ttl=63 time=154 ms
64 bytes from 10.11.12.151: icmp_seq=40 ttl=63 time=884 ms
64 bytes from 10.11.12.151: icmp_seq=41 ttl=63 time=66.0 ms
64 bytes from 10.11.12.151: icmp_seq=42 ttl=63 time=354 ms
64 bytes from 10.11.12.151: icmp_seq=44 ttl=63 time=1747 ms
64 bytes from 10.11.12.151: icmp_seq=45 ttl=63 time=734 ms
64 bytes from 10.11.12.151: icmp_seq=46 ttl=63 time=1597 ms

When I reboot the router, I get a few minutes of pings sub 100ms. But after maybe ~10 min or so things start to get bad again. Maybe it's not even 10 minutes. But as you can see, I get a LOT of pings well over a second. I never had this issue prior to openwrt, and everything on my network is 'normal'.
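For anyone who wants to put numbers on "a LOT of pings well over a second", here is a minimal sketch that summarises ping output. The function name `ping_summary` and the 1000 ms default threshold are made up for illustration; it only assumes the `time=... ms` field that both the Linux and Windows ping outputs in this thread contain.

```shell
#!/bin/sh
# Summarise ping output read from stdin: reply count, min/avg/max RTT,
# and how many replies were slower than a threshold (default 1000 ms).
# ping_summary is a hypothetical helper, not part of OpenWrt or ping.
ping_summary() {
    awk -v limit="${1:-1000}" '
        # Matches Linux ("time=12.3 ms") and Windows ("time=12ms") replies.
        match($0, /time[=<][0-9.]+/) {
            rtt = substr($0, RSTART + 5, RLENGTH - 5) + 0
            if (n == 0 || rtt < min) min = rtt
            if (n == 0 || rtt > max) max = rtt
            sum += rtt; n++
            if (rtt > limit) slow++
        }
        END {
            if (n == 0) { print "no replies found"; exit }
            printf "replies=%d min=%.1f avg=%.1f max=%.1f over_%sms=%d\n", n, min, sum / n, max, limit, slow
        }'
}
```

Something like `ping -c 50 10.11.12.151 | ping_summary` run once right after a reboot and again ten minutes later would make the before/after difference easy to compare.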

This is a fresh install and I'm not doing anything heavy on the network. Right now I only have 2 devices (phone/laptop) connected for testing and they aren't doing anything heavy. I get this on all channels and on both the 2.4 GHz and 5 GHz bands. Once things DO connect though, my speeds are perfect. I can basically saturate my ISP speeds. It just sometimes takes a long time to start the connection.

I don't see anything obvious in the logs, but I've included the kernel and system logs here and here.

What can I do to troubleshoot this? I'm really enjoying OpenWrt otherwise.

@nertskull, welcome to the community!

  • What if you ping a wireless device on the same VLAN/LAN?
  • Is your WiFi channel congested?
    • Are there other neighbors also using the same or an overlapping channel?
  • More importantly, do your wireless devices experience actual latency to the Internet (e.g. high results on speedtest.net)?

Just as bad. This is from my laptop to my phone. Same VLAN/SSID.


Reply from 10.11.12.151: bytes=32 time=734ms   TTL=64
Reply from 10.11.12.151: bytes=32 time=2078ms  TTL=64
Reply from 10.11.12.151: bytes=32 time=538ms   TTL=64
Reply from 10.11.12.151: bytes=32 time=960ms   TTL=64

I doubt it. I live in a suburban area, not high density. I only see 1 other neighbor's SSID. I only have 2 devices connected to anything right now while testing. I've tried multiple channels and it happens on both the 2.4 GHz and 5 GHz bands. I'm not sure how to directly test congestion though to know for sure. But there really aren't a lot of people around me. Plus, a week ago before I installed OpenWrt I never had any of these issues.

Yeah. It's definitely noticeable when I browse the web. Pages that used to open pretty much immediately will sit on a white page for a bit, but then once data starts coming in everything seems to load immediately.

On the speed test websites sometimes I get pings in like the 30-60ms range. But then sometimes I get much more. I just did 4 tests and I got 36ms, 309ms, 1008ms, 14ms. Kind of all over the place, like on my LAN.

If your device has ath10k-ct drivers, try using the non CT...or vice versa.

I believe @DjiPi and @darksky had/have this model and worked with a similar issue before. Maybe they can shed some light.

Issues for me are detailed here/currently unresolved with ct drivers/firmware: https://github.com/greearb/ath10k-ct/issues/139

Use the non-ct drivers/firmware as @lleachii recommended. I have been for a few months and have been very pleased with the results.

So this is all way new to me. But I read around a lot and think I figured out how to do this after reading this thread.

I tried doing what was in that thread, but found it was talking about qca99x0 but I had qca9984 installed.

So I did the following

opkg update
opkg remove ath10k-firmware-qca9984-ct kmod-ath10k-ct
opkg install ath10k-firmware-qca9984 kmod-ath10k
reboot

I did not do the firmware-5.bin stuff that thread talked about, because I wasn't sure if I should and it was a different file.

After reboot I still had internet. But....Still the same problem with pings and latency and browsing. I'm still getting pings of >1000ms. Both from within the same VLAN and without.

So, I thought maybe that firmware file is important. So I found a corresponding one based on that other thread. So I did this

cd /lib/firmware/ath10k/QCA9984/hw1.0/
mv firmware-5.bin firmware-5.bin.bak
wget -O firmware-5.bin https://github.com/kvalo/ath10k-firmware/raw/master/QCA9984/hw1.0/3.9.0.2/firmware-5.bin_10.4-3.9.0.2-00131 --no-check-certificate
reboot

I don't know for sure that's the best one to use, but it is the most recently updated from that repository.

Anyway, with that, I'm still getting pings of >1000ms and poor latency.

Anything else I can try? Or something I've done wrong above in my really remedial understanding of what I'm doing with drivers and firmware?

I did mv the .bak firmware back to the main one, because I felt it was even worse with that new firmware.

The firmware that ships with the packages you installed should be fine/no need to use the absolutely latest one.

I do not know why you're experiencing these issues, but it would seem that you ruled out the ath10k-ct driver as a cause. Just as a sanity check, what is the output of

dmesg|grep firmware

For me, the issue was not with the latency but with connection stability and some random Wi-Fi quirks. My kids have a tendency to react promptly when their games get disconnected on devices, so I know for sure that the CT driver was a problem for me.

And my wireless network is pretty optimal because all gaming rigs are on WiFi (which is not a good idea, but I'm cable fishing lazy and it's working pretty awesome anyway). Channels are crowded here in the suburb.

I've tested ping from wired to wireless (Windows machine to Linux), going through a dumb 16-port switch, and I get pings at 3-4 ms.

Maybe your wireless device is going into power saving mode / economy mode after 10 minutes or so? After a few minutes, my wireless Linux laptop (with TLP) seems to do that, and the first ping seems to wake it from "sleepiness", but it's nowhere near as bad as yours (80-90 ms then).

Stating probably the obvious, I suggest conducting isolated tests using a clean simple install, no devices attached other than a wired and wireless one, to help you narrow down the possibilities. Then move on one step at a time toward your desired final configuration.

Edit: Forgot to mention I'm still on OpenWrt 19.07.2 with non CT drivers

So I don't know what this means, but I don't think these errors were here before. It does seem I get errors, though. Here is the dmesg output

root@OpenWrt:~# dmesg | grep firmware
[    2.432498] qcom_rpm 108000.rpm: RPM firmware 3.0.16777364
[   11.759229] ath10k_pci 0000:01:00.0: Direct firmware load for ath10k/QCA9984/hw1.0/firmware-6.bin failed with error -2
[   11.877497] firmware ath10k!QCA9984!hw1.0!firmware-6.bin: firmware_loading_store: map pages failed
[   12.057409] ath10k_pci 0000:01:00.0: firmware ver 10.4-3.9.0.2-00021 api 5 features no-p2p,mfp,peer-flow-ctrl,btcoex-param,allows-mesh-bcast,no-ps crc32 9626782c
[   20.594195] ath10k_pci 0001:01:00.0: Direct firmware load for ath10k/QCA9984/hw1.0/firmware-6.bin failed with error -2
[   20.710636] firmware ath10k!QCA9984!hw1.0!firmware-6.bin: firmware_loading_store: map pages failed
[   20.731499] ath10k_pci 0001:01:00.0: firmware ver 10.4-3.9.0.2-00021 api 5 features no-p2p,mfp,peer-flow-ctrl,btcoex-param,allows-mesh-bcast,no-ps crc32 9626782c

Does that mean the firmware isn't actually loading? It seems to be looking for firmware-6.bin. But that wasn't there to begin with. It was firmware-5.bin. Should I rename -5 to -6?

I doubt this, because the laptop (win10) and the phone(android) are both on and fully connected anytime I try to ping them. Plus it happens the entire time. Not just high pings at the beginning of the ping.

Yeah, this is what I'm trying to do. Right now it's a fresh install of OpenWrt. And I only have my phone and laptop connected to try and ping. And most of the time my laptop is suspended and I'm just trying the phone. I guess I'm running out of ideas.

I even spent an entire day trying dd-wrt even though I'd prefer openwrt. But that also had issues with VLANs and the r7800. At least openwrt gets proper vlan setup. It's just this latency issue. I'd like to not scrap the r7800, it's been good otherwise, but I may have to find something else here eventually.

Anyway, back to the errors from dmesg. Seems like something is not right. Is there an actual firmware-6.bin I should use? There is none listed here, only a -5.

That is a pretty old version... perhaps it is what is shipped with 19.whatever.youre.running? I am using the latest git. Perhaps you could try a snapshot from @hnyman, see: Build for Netgear R7800

If you are wanting to avoid the ct driver, use the one he calls master/"old" mainline ath10k.

It should be:

 # dmesg|grep firmware
[    2.789240] qcom_rpm 108000.rpm: RPM firmware 3.0.16777364
[   22.024567] ath10k_pci 0000:01:00.0: firmware ver 10.4-3.9.0.2-00070 api 5 features no-p2p,mfp,peer-flow-ctrl,btcoex-param,allows-mesh-bcast,no-ps,peer-fixed-rate crc32 873782fb
[   29.092968] ath10k_pci 0001:01:00.0: firmware ver 10.4-3.9.0.2-00070 api 5 features no-p2p,mfp,peer-flow-ctrl,btcoex-param,allows-mesh-bcast,no-ps,peer-fixed-rate crc32 873782fb

If you want to, you could download the absolute latest firmware from here and just replace it...

cp firmware-5.bin_10.4-3.9.0.2-00131 /lib/firmware/ath10k/QCA9984/hw1.0/firmware-5.bin

Then just reboot.

No, this is normal behaviour, the firmware-6.bin error is normal. I don't recall it ATM, but there is a thread about that with more explanation (edit: that thread, response from slh). You don't have to rename it, you've done it correctly, the firmware loaded.

So as soon as you configure VLANs this behaviour occurs, or it's present already with vanilla setup?

Nope, you are using the correct firmware and error is normal. Either it loads firmware-6 or firmware-5, depends on the board.

Edit: Could this be related to the MTU size? I'm no MTU/VLAN expert, but VLAN tag must add to the total frame size, no? Default MTU 1500 becomes 1518, then 1522 with VLAN tag possibly. Maybe this old bug report makes more sense than my thoughts:
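To make that arithmetic concrete, here is a small sketch. `frame_size` is a made-up helper; it only assumes the standard Ethernet overheads of 14 bytes of header, 4 bytes of FCS, and 4 bytes per 802.1Q tag.

```shell
#!/bin/sh
# frame_size MTU [TAGS]: Ethernet frame size in bytes for a given payload
# MTU, counting the 14-byte header, 4-byte FCS and 4 bytes per 802.1Q tag
# (preamble and inter-frame gap are not counted).
# frame_size is a hypothetical helper for illustration only.
frame_size() {
    mtu=$1
    tags=${2:-0}
    echo $(( mtu + 14 + 4 + 4 * tags ))
}

frame_size 1500 0   # untagged frame: 1518
frame_size 1500 1   # one VLAN tag:   1522
frame_size 1496 1   # smaller MTU with one tag: back to 1518
```

So if something in the path only copes with 1518-byte frames, dropping the MTU by 4 on the tagged interfaces would bring tagged frames back under that size. Whether that is actually what happens on the R7800 is a guess on my part.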

Hello all,

I'm curious whether this worked for @nertskull, but for me this did the trick.

I found this thread because I had a very similar problem.
I separated my home network into 3 VLANs: LAN (untagged), IOT (tagged) and Guest (tagged).
The vlans are connecting 2 APs, both netgear R7800s.
On each AP, there are 3 wifi networks, connected to the vlans.
I had terrible latency when pinging from the LAN network to a device in the IOT network.
Ping times ranged from 2 to 5 seconds. With Wireshark I saw occasional retransmissions.
I had already figured out that when I connected the IOT wifi to the LAN interface, the problem was gone.

Now I have set the MTU size to 1496 on the VLAN (bridge) interfaces, and the latency and responsiveness are OK again.
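For reference, a rough sketch of how such a change could be made from the command line. The post above does not say how the MTU was set, so this assumes the `mtu` option of the interface sections in `/etc/config/network`; the interface names `iot` and `guest` and the helper `vlan_mtu_commands` are placeholders. The function only prints the commands so they can be reviewed before being run on the router.

```shell
#!/bin/sh
# Print (not run) the uci commands that would set an MTU on the given
# logical interfaces, then commit and restart networking.
# vlan_mtu_commands and the interface names are hypothetical; substitute
# the section names from your own /etc/config/network.
vlan_mtu_commands() {
    mtu=$1
    shift
    for iface in "$@"; do
        echo "uci set network.${iface}.mtu='${mtu}'"
    done
    echo "uci commit network"
    echo "/etc/init.d/network restart"
}

vlan_mtu_commands 1496 iot guest
```

Once the printed lines look right, they can be pasted into an SSH session on the AP. Clients may need to reconnect afterwards.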

Thanks!

Bert Haverkamp
