Bufferbloat when wifi is the bottleneck

Or you make sure that the driver never takes more than X ms worth of data at the current transmission speed and creates back pressure once full, so packets mostly "mature" inside fq_codel's own queues and the sojourn time measured at dequeue covers most of each packet's (estimated) time before actual transmission. Or something similar...
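A toy model of the idea above, as a sketch only: the driver queue tracks the estimated airtime of what it already holds and refuses more once a budget is used up, so the excess waits upstream in the qdisc. `DriverQueue`, `airtime_us`, the 5 ms budget and the fixed 100 Mbit/s rate are hypothetical illustrations (not the actual mac80211/AQL code), and a real driver would keep updating its rate estimate as the link changes.

```python
from collections import deque


def airtime_us(nbytes: int, rate_mbps: float) -> float:
    """Estimated airtime in microseconds (ignores preambles, ACKs, retries)."""
    return nbytes * 8 / rate_mbps  # bits / (Mbit/s) == microseconds


class DriverQueue:
    """Hypothetical driver queue limited by estimated airtime, not packet count."""

    def __init__(self, budget_ms: float, rate_mbps: float) -> None:
        self.budget_us = budget_ms * 1000
        self.rate_mbps = rate_mbps  # assumed fixed here for simplicity
        self.queued_us = 0.0
        self.pkts = deque()

    def try_enqueue(self, nbytes: int) -> bool:
        # Back pressure: once the budget is used up, the packet stays in the
        # qdisc (e.g. fq_codel), where its sojourn time keeps being measured.
        if self.queued_us >= self.budget_us:
            return False
        self.pkts.append(nbytes)
        self.queued_us += airtime_us(nbytes, self.rate_mbps)
        return True

    def transmit_one(self) -> int:
        # The radio sends the oldest packet, freeing its share of the budget.
        nbytes = self.pkts.popleft()
        self.queued_us -= airtime_us(nbytes, self.rate_mbps)
        return nbytes


# 5 ms budget at 100 Mbit/s: each 1500-byte packet is 120 us of airtime.
q = DriverQueue(budget_ms=5, rate_mbps=100)
accepted = 0
while q.try_enqueue(1500):
    accepted += 1
print(accepted)  # 42 packets, i.e. just over 5 ms of queued airtime
```

Because the limit is expressed in time rather than packets, the same budget automatically holds fewer bytes when the link slows down.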

Hey Dan... which TP-Link APs are you using, or rather, what is the chipset?

Don't know if it will help, but on my ath10k C7s I experimented with lowering the TX queue limit... and saw improvements in latency.
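To see why a smaller TX queue limit can lower latency: the delay a full queue adds is roughly its size divided by the rate it drains at. A minimal sketch with illustrative numbers (not measurements from this thread), ignoring Wi-Fi overhead; `queue_delay_ms` is a hypothetical helper.

```python
def queue_delay_ms(queued_bytes: int, rate_mbps: float) -> float:
    """Delay a new packet sees behind queued_bytes drained at rate_mbps."""
    return queued_bytes * 8 / (rate_mbps * 1000)  # bits / (bits per ms)


for limit_bytes in (1_000_000, 250_000):
    for rate in (300, 100, 25):
        delay = queue_delay_ms(limit_bytes, rate)
        print(f"{limit_bytes:>9} B at {rate:>3} Mbit/s -> {delay:6.1f} ms")
```

The same byte limit costs four times more delay at 25 Mbit/s than at 100 Mbit/s, which is one reason limits expressed in airtime track a changing Wi-Fi rate better than fixed byte or packet limits.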

Now, I had more latency on the download and very little on the upload before the change, and maybe you all are running a different chipset, but it would be an easy experiment to try.

Here's a link to (hopefully) the part of the thread where it was mentioned and my trying it out: https://forum.openwrt.org/t/aql-and-the-ath10k-is-lovely/59002/925

I tried it on the main house AP and have basically left it that way for the past 3 weeks, with no issues.

Edit: I should say my testing was with a Win10 client. In quick tests with a few other devices I had handy, my Android phone showed +35-45 ms on download and +0 ms on upload. Another Win10 laptop showed +4 ms/+0 ms on download/upload, but with a much slower download speed. Interesting that you are seeing most of the latency on the upload side.

I'm using the EAP225 with the commercial firmware, running their controller. I've got them at 3 separate, geographically distributed locations connected to each other via VPN, so having a single administration interface is pretty handy.

I'll look to see if I can adjust the TX queue limit in their software.

Incidentally, after setting the AP's port speed to 250 Mbps and running a test video conference, we got outstanding results compared to before. Her video was clear and her audio was not garbled; previously her audio had been much harder to understand, and freezes, lags, or smeared video were common. This isn't just down to the latency fix: I made several other changes, including putting both 2.4 and 5 GHz on the same SSID and mounting her access point in a central location.

I'm going to try putting her on a video conference and having her run a speed test on a second device at the same time. The EAP225's built-in software allows turning on airtime fairness, and the CAKE qdisc on the router should be handling per-internal-IP fairness etc. So hopefully it stays stable even under multi-device load.

Off the top of my head, I think they use the same Qualcomm chips as the A7/C7, or very similar ath9k/ath10k radios.

I've heard that the Omada interface/environment/whatever is pretty nice. You'd probably have to give that up? I don't know if that particular TX queue adjustment exists outside OpenWrt and the AQL additions. Dave? Anyone?

I don't think so; you'll need OpenWrt and AQL. I'm running the very same set of patches on a different AP (mt76), and I'm very happy with them. The test below is from an Android phone on the other side of the house, with three drywalls in between, roughly 15 m away.

https://www.waveform.com/tools/bufferbloat?test-id=c3e04382-dddc-45ae-9ae0-78fd5134cdb0

The most confusing thing for me is how some of my / her devices have large delays in the upload direction. Once the packet hits the AP it's smooth sailing along gigabit ethernet to a Cake qdisc that allows 3x the speed of the AP connection, so the only bottleneck is from device to AP. In the upload direction, that means the packets are queueing on the phone/laptop.
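The reasoning above can be sketched as a rule of thumb: along the path a packet travels, the queue builds at the sender of the slowest hop. The rates below are illustrative (250 Mbit/s is the AP port limit and 750 Mbit/s reflects "3x the AP connection" from this thread; the 100 Mbit/s Wi-Fi rate is hypothetical), and `queue_holder` is a made-up helper that ignores real-world details like policing vs. queueing at a switch.

```python
from typing import List, Tuple

Hop = Tuple[str, str, float]  # (sender, receiver, rate in Mbit/s)


def queue_holder(path: List[Hop]) -> str:
    """Return the device whose buffer fills: the sender on the slowest hop."""
    sender, _receiver, _rate = min(path, key=lambda hop: hop[2])
    return sender


# Hypothetical upload path: the Wi-Fi hop is slower than everything downstream.
upload = [
    ("phone", "AP", 100.0),    # Wi-Fi link, rate varies with signal
    ("AP", "router", 250.0),   # AP port limited to 250 Mbit/s
    ("router", "ISP", 750.0),  # Cake shaper set well above the AP link
]

# Same idea for download: the slow Wi-Fi hop now starts at the AP.
download = [
    ("router", "AP", 250.0),
    ("AP", "phone", 100.0),
]

print(queue_holder(upload))    # phone
print(queue_holder(download))  # AP
```

In the download direction the rule points at the AP's Wi-Fi queue, where fq_codel/AQL on the AP can act; in the upload direction it points at the client, which the AP cannot manage.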

And yet you didn't experience upload queues.

Do you have a faster internet connection than your data suggests? If the bottleneck is your ISP connection, then your numbers make sense, but if you have more than, say, 200/100 Mbps, then I wonder why you see no upload bloat.

An interesting test would be to run pings between a laptop and a wired computer on your LAN, and then in another terminal run an iperf test between those two. See how the AP behaves when it's definitely the bottleneck.
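One way to summarize that experiment, assuming you save the ping round-trip times (in ms) from an idle period and from the period while iperf runs (for example `iperf3 -c <wired-host>`, adding `-R` to test the reverse direction): report the increase in median RTT, similar in spirit to the "+X ms" figures quoted in this thread. The function name and sample values are hypothetical.

```python
from statistics import median
from typing import Sequence


def latency_increase_ms(idle_rtts: Sequence[float], loaded_rtts: Sequence[float]) -> float:
    """Median RTT under load minus median idle RTT, in milliseconds."""
    # Medians keep a single outlier ping from dominating the result.
    return median(loaded_rtts) - median(idle_rtts)


idle = [16.0, 17.0, 15.0, 16.0]
loaded = [33.0, 35.0, 34.0, 120.0]
print(latency_increase_ms(idle, loaded))  # 18.5
```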

So there is always the chance that power save, channel scans, or Apple extras like AirDrop scanning interfere with uplink transmit slots?

Power save, channel scans, etc. should affect download and upload similarly. But they don't: the bloat is mainly in the upload direction since I limited the port speed.

Here's a test about 3 m from the access point on my Android phone.

And here's one from the same place on a Kindle Fire.

Totally different behavior.

I'm going to try a test from the cell phone from farther away, should trigger upload bloat... watch this space:

OK, here is the Android phone a couple of rooms away, with walls etc. in between.

Substantially worse, but ONLY on upload

The difference is apparently that the upload in the first test hit the port speed limit and was trimmed down by the switch. When the Wi-Fi rate doesn't exceed the port speed on upload, the Android phone bloats up to 200 ms or more.
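A back-of-the-envelope check on what 200 ms of upload delay implies: a queue drained at the Wi-Fi upload rate has to hold roughly rate × delay bytes to add that much latency. The rates below are illustrative (about 100 Mbit/s is the rough test speed mentioned in this thread), `implied_buffer_bytes` is a hypothetical helper, and Wi-Fi overhead is ignored.

```python
def implied_buffer_bytes(extra_delay_ms: float, rate_mbps: float) -> float:
    """Bytes that must be queued ahead of a packet to delay it by extra_delay_ms."""
    bytes_per_second = rate_mbps * 1e6 / 8
    return bytes_per_second * extra_delay_ms / 1000


for rate in (100, 30):
    print(f"200 ms at {rate} Mbit/s ~ {implied_buffer_bytes(200, rate):,.0f} bytes queued")
```

Megabytes of standing queue on the sending side would be consistent with the buffering-in-the-phone explanation discussed below, though this arithmetic alone can't say which layer of the phone holds it.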

My Internet connection is 1000/50, shaped to 800/42 with Cake. So, yes, the bottleneck is the AP. If I move closer to the AP, speed goes up to ≈400 Mbit/s or so, as expected for a dumb Wi-Fi 5 AP. I can't test anything today (I'm not feeling well and don't have the energy), but I will in the next few days.

Looks to me like your upload direction was limited by Cake, not by the Wi-Fi.

No, Wi-Fi 5 will never achieve 800 Mbit/s; Wi-Fi was the limit. If I connect an Ethernet cable to my router, it downloads at close to the full 800 Mbit/s.

My network is ISP <==> RPi4 <==> dumb AP <== WiFi 5 ==> testing device

Right, but in your test the device was uploading at 42 Mbps and had zero bufferbloat. That's probably because the upload bottleneck was in the router/Cake, with your ~42 Mbps limit.

The download speed was ~121 Mbps, so the upload should probably have been able to reach a similar rate, in any case considerably higher than 42 Mbps, so the upload bottleneck was likely Cake.
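The inference above as a small sketch, assuming (as the argument does) that the Wi-Fi link's upload capacity is roughly comparable to the measured download rate. Both helpers and the 10% tolerance are hypothetical; the 42 and ~121 Mbit/s figures are the ones from the test being discussed.

```python
def likely_upload_bottleneck(shaper_up_mbps: float, wifi_up_estimate_mbps: float) -> str:
    """Name whichever limit is lower; that is where the upload queue would build."""
    if shaper_up_mbps < wifi_up_estimate_mbps:
        return "Cake shaper on the router"
    return "Wi-Fi link (queue builds on the client)"


def near_rate(measured_mbps: float, limit_mbps: float, tolerance: float = 0.1) -> bool:
    """True if a measured rate is within `tolerance` (a fraction) of a limit."""
    return measured_mbps >= limit_mbps * (1 - tolerance)


# ~121 Mbit/s down over Wi-Fi, 42 Mbit/s Cake upload limit, 42 Mbit/s measured up.
print(likely_upload_bottleneck(42, 121))
print(near_rate(42, 42))
```

A measured upload sitting right at the shaper rate, with no added latency, points at Cake; moving far enough away that the Wi-Fi estimate drops below 42 Mbit/s flips the answer, which is what the suggested edge-of-range test would probe.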

Feel better, and thanks for the data! If you do decide to do some tests, try going outside or somewhere at the edge of your range, where you're getting, say, 25 Mbps download, and see what happens to your upload.

Oh, yes, you are right, misunderstood your comment. I'll test, and I'll come back to this thread.

Mmh, when you do that at the edge of the range with a device that has a poor antenna, maybe you'll experience a retransmission avalanche, eating up nominal bandwidth without actually delivering useful data?
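A simplified model of that effect, assuming each transmission attempt fails independently with probability p and is retried until it succeeds: a frame then needs 1/(1-p) attempts on average, so useful throughput falls to rate × (1-p). This ignores rate control (which in practice also lowers the PHY rate), backoff, and retry limits; the helper names are hypothetical.

```python
def expected_transmissions(loss_prob: float) -> float:
    """Mean attempts per delivered frame if each attempt fails independently."""
    if not 0 <= loss_prob < 1:
        raise ValueError("loss_prob must be in [0, 1)")
    return 1 / (1 - loss_prob)


def goodput_mbps(phy_rate_mbps: float, loss_prob: float) -> float:
    """Useful throughput when every failed attempt still costs its full airtime."""
    return phy_rate_mbps / expected_transmissions(loss_prob)


for p in (0.0, 0.5, 0.75):
    print(f"loss {p:.2f}: {expected_transmissions(p):.1f} tries/frame, "
          f"{goodput_mbps(100, p):.1f} of 100 Mbit/s useful")
```

Each retry also adds its own airtime to the delay of everything queued behind it, so heavy retransmission hurts latency as well as throughput.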

My test is ~100 Mbps, so it's not at all at the edge of the range. I doubt I have major retransmission issues... it's just buffering in the phone.

For @amteza, trying to get below 42 Mbps on upload might cause that problem, but if you find the right balance, around 20 Mbps or so, it probably wouldn't be a major issue. If you're getting 6, it's likely to be what you describe.

The sad thing about buffering in the phone is it's 100% un-fixable for a normal user (unrooted).

Yep, and in my case, I have a "1Gb" cable connection that is actually 940/35 Mbit, and my x86 router runs Cake at 800/30 Mbit. So I also have a Cake limit on the upload, and the AP limit (seems to be about 280-ish at best for an A7/C7 and my clients' 5 GHz radios) on the download.

That, of course, doesn't explain my Android phone's (Samsung Galaxy 9) behavior... It gets pretty good latency on 5 GHz, but on 2.4 GHz it's another story. I had thought I had wild problems on 5, but it turned out they were on 2.4, so, further investigation time!

Quick phone recheck...
5 GHz/ath10k
Galaxy 9, close in: 200-220/29.5 Mbit, idle 16 ms, loaded +18/+0 ms (huh?)
Galaxy 9, medium far: 100-120/29.5 Mbit, idle 18 ms, loaded +5/+0 ms (huh?)
Galaxy 9, rather far: 28.5/29.5 Mbit, idle 19 ms, loaded +10/+1 ms (random high ping events happening)

Oddly good, compared to your results, Dan. Which phone is it?

2.4 GHz/ath9k
Galaxy 9, close in: 51.2/20.3 Mbit, idle 50 ms, loaded +31/+9 ms
Galaxy 9, medium far: 21.3/16.4 Mbit, idle 58 ms, loaded +35/+13 ms
Galaxy 9, rather far: 13.5/29.5 Mbit, idle 73 ms, loaded +30/+121 ms (loads of high ping events)

Really should redo these: the phone was at >10% charge and showing a battery saving mode icon, and who knows what the Wi-Fi does under those circumstances. So take them with a grain of salt. That said, I don't seem to be seeing the upload issue at all on ath10k... though I didn't get it to drop below the Cake-controlled speed. I also run a higher, non-standard basic data rate (12-24 Mbit/s) on the Wi-Fi radios, to help with airtime.
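Why a higher basic rate saves airtime: frames sent at the basic rate (beacons, and typically broadcast/multicast and some management traffic) occupy the air for roughly their size divided by that rate. A sketch with a hypothetical 300-byte frame that counts payload bits only, ignoring preamble, headers and interframe spacing (which add a further fixed cost per frame).

```python
def frame_airtime_us(frame_bytes: int, rate_mbps: float) -> float:
    """Payload airtime only; real frames also pay a fixed preamble/header cost."""
    return frame_bytes * 8 / rate_mbps  # bits / (Mbit/s) == microseconds


for rate in (1, 6, 12, 24):
    print(f"{rate:>2} Mbit/s: {frame_airtime_us(300, rate):7.1f} us")
```

The trade-off is range: clients at the edge of coverage may no longer decode frames sent at a 12-24 Mbit/s basic rate.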

But on the ath9k radio... I see high latency issues here AND on the desktops. Feels like something crept in at some point; I'm pretty sure it was better, and faster, some revisions back...

Agree. Most (Android) phones I've used are unrooted. Update policies follow a "short-term support" principle. There are still many devices out there whose operating system could not be updated for lack of memory, even if the vendor wanted to. It's getting better, though, as newer devices now have more memory than the OS needs. Even when there is enough memory, many vendors refrain from updating altogether and stay on old Linux kernels, backporting only the bare minimum of security fixes. Not sure if Apple or other OSes handle that any better, but I doubt it.

I am not sure if fq_codel and the like could run as a dedicated program or if root would be necessary. I have personally used the Android SDK a few times to remove some unneeded bloatware from my phone, so it's somehow possible to get (semi-?)root access via developer mode even on an unrooted phone.

Unfortunately, as I see it, there are indeed only a few options that would (hopefully) give you better Wi-Fi, apart from the typical stuff like staying close to the router or avoiding channel interference:

1. Update the OS to the latest software (including Linux kernel, drivers, firmware, etc.). Sadly, this may be impossible.
2. Root the OS, if your vendor supports it, and fix the problems.

One of the main attractions of Linux was, and always will be, the possibility for users to fix bugs themselves. It's a choice. Buy devices with closed-source and/or rooted software, get unlucky, and receive crappy OEM software; then the only thing you can do is buy new stuff and hope you're luckier next time...

Mine is a Moto One 5G Ace, but we were seeing similar results on a 2014 Mac laptop and even worse on a Kindle Fire. It seems to be an issue when the wired connection is substantially faster than the wireless one. Hard to make that happen if your basic rate is about the same as your Cake shaper.