Bufferbloat when wifi is the bottleneck

I've set my sister up with a new x86 router and a TP-Link Omada AP. The router is set up with Cake on both upload and download. She has gigabit fiber and only uses wifi, as she owns an iPhone and a Mac laptop and nothing that even has an Ethernet port.

With wifi being a variable-speed medium, it's clearly the bottleneck. Waveform tests show decent speeds of around 100-350 Mbps depending on location, but latencies that can go up into the 100 ms range.
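As a rough sanity check on those numbers (assuming, hypothetically, that all of the added latency comes from a single FIFO draining at the measured rate), a 100 ms delay at 100-350 Mbps corresponds to roughly 1.25-4.4 MB sitting in some buffer:

```python
def standing_queue_bytes(rate_bps: float, delay_ms: float) -> float:
    """Bytes queued ahead of a packet that waits delay_ms behind a FIFO draining at rate_bps."""
    # rate (bits/s) * delay (ms) / 1000 -> bits; / 8 -> bytes
    return rate_bps * delay_ms / 8000.0


if __name__ == "__main__":
    for rate_mbps in (100, 250, 350):
        queued = standing_queue_bytes(rate_mbps * 1e6, 100)
        print(f"{rate_mbps} Mbps for 100 ms: {queued / 1e6:.2f} MB queued")
```

That is far more than the few tens of kilobytes needed to keep a link of this speed busy for a couple of milliseconds.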

I'm planning to add some DSCP tagging in the firewall, which should help with real-time UDP such as video chats, but does anyone else have thoughts about handling a LAN where variable-speed wifi is reliably the bottleneck?

I have cake/piece of cake on my WAN egress and fq_codel on my LAN and that works for me.

Did you tune cake with ethernet first?

You could try to use an AP running OpenWrt and make sure it supports the recent changes aimed at reducing latency under load.

Short of that, set the traffic shapers to 100 Mbps so that the proprietary AP rarely sees enough traffic to enter its queueing regime.
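The reasoning behind shaping below the wifi rate can be sketched with a crude fluid model of a chain of FIFO hops. This is an assumption-laden toy: it ignores how TCP backs off in response to cake's drops/marks, so it only shows where excess traffic would pile up, not how large the queues get in practice. The rates are illustrative, not measurements:

```python
from typing import List


def queue_growth_bps(offered_bps: float, hop_rates_bps: List[float]) -> List[float]:
    """Fluid model of a chain of FIFO hops.

    Returns, per hop, how fast its queue would grow (bits/s) if senders kept
    offering offered_bps and nothing reacted to drops or marks.
    """
    growth = []
    arrival = offered_bps
    for service in hop_rates_bps:
        growth.append(max(0.0, arrival - service))
        arrival = min(arrival, service)  # a hop never forwards faster than it serves
    return growth


if __name__ == "__main__":
    # Hypothetical chain: router shaper first, then the wifi hop (~300 Mbps).
    print(queue_growth_bps(1000e6, [700e6, 300e6]))  # excess also piles up at the AP
    print(queue_growth_bps(1000e6, [100e6, 300e6]))  # only the shaper's queue grows
```

With the shaper below the wifi rate, all of the excess shows up at the shaper, where cake can manage it, and none at the AP.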

In theory you could also go for a device that uses the newer, so far untapped bands (some allow crazy 320 MHz "channels") to move the bottleneck back to the router; in practice this will probably fail, as existing devices are unlikely to play along...

On WiFi, DSCP marking is a mixed blessing, since the marks typically get mapped onto WMM access categories (ACs). The higher ACs have higher overhead and hence reduce the overall achievable goodput, potentially resulting in even longer delays for the lower ACs (okay-ish for AC_BK, less so for AC_BE). Also, one misbehaving station sending too much in AC_VO will reduce the ability of all other devices, including the AP, to acquire airtime slots... So I would only use AC_VO/AC_VI for rate-limited traffic; however, cake does not hard-limit its higher-priority tiers, so it is not a good shaper to use with AC_VI/AC_VO (in spite of its best intentions). My take on this is simply that WMM is not well designed as a real stand-alone solution. It is more of a building block, to be combined with appropriate admission control, that allows pretty nice things without (too many) side effects; without additional measures, the side effects can be pretty nasty.
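To see which DSCP marks end up in which access category, here is a minimal sketch of the simple precedence-based mapping (802.11 user priority = top three bits of the DSCP) followed by the standard UP-to-AC table. Treat the DSCP-to-UP step as an assumption: RFC 8325, newer Linux kernels, and vendor firmware (Apple's or TP-Link's included) may map things differently.

```python
# Standard 802.11 (802.1D) user priority (0-7) to WMM access category table.
UP_TO_AC = {
    1: "AC_BK", 2: "AC_BK",
    0: "AC_BE", 3: "AC_BE",
    4: "AC_VI", 5: "AC_VI",
    6: "AC_VO", 7: "AC_VO",
}


def dscp_to_ac_precedence(dscp: int) -> str:
    """Map a 6-bit DSCP to a WMM AC via the simple precedence scheme.

    Assumption: UP = top three DSCP bits (the old IP-precedence field).
    Real stacks may follow RFC 8325 or vendor-specific tables instead.
    """
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in 0..63")
    return UP_TO_AC[dscp >> 3]


if __name__ == "__main__":
    for name, dscp in [("CS0", 0), ("CS1", 8), ("AF41", 34), ("EF", 46), ("CS6", 48)]:
        print(f"{name:5} (DSCP {dscp:2}) -> {dscp_to_ac_precedence(dscp)}")
```

Under this scheme EF (46) lands in AC_VI rather than AC_VO, while anything marked CS6/CS7 goes to AC_VO, the class where a single over-eager sender hurts everybody's airtime.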

One of the weirdest bits is that it shows ~100 ms delay on upload, but on upload the queue builds on the Mac laptop, right? As soon as it sends a packet to the AP, it has a clear pipe. And yes, I've verified that the router itself can run a speed test with essentially zero latency increase (cake is throttling her gigabit down to 700 Mbps or something).

On upload the station needs to acquire an airtime slot; in a crowded environment with a number of AC_VI and AC_VO users around, airtime slots are hard to get, so packets can accumulate in the station's upload queue as well.*

BTW, how did you measure the directionality of the delay? Does this mean RTT increases during upload tests, or did you use e.g. irtt and see that during upload specifically the remote_server -> client OWD increases?
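Separating upload from download delay does not require synchronized clocks if you only look at how each direction's delay changes during the test. A minimal sketch, assuming a hypothetical per-packet record of four timestamps in milliseconds (this is not irtt's actual output format) and negligible clock drift over the test:

```python
from typing import List, Sequence, Tuple

# Hypothetical record, all in ms: (client_send, server_recv, server_send, client_recv).
# Client and server clocks are NOT assumed to be synchronized.
Sample = Tuple[int, int, int, int]


def owd_increase(samples: Sequence[Sample]) -> Tuple[List[int], List[int]]:
    """Per-direction one-way-delay increase relative to the lowest observed value.

    A constant clock offset adds to every upstream reading and subtracts from
    every downstream reading, so it cancels when each direction's minimum is
    subtracted.
    """
    up = [sr - cs for cs, sr, _ss, _cr in samples]
    down = [cr - ss for _cs, _sr, ss, cr in samples]
    up_min, down_min = min(up), min(down)
    return [u - up_min for u in up], [d - down_min for d in down]


if __name__ == "__main__":
    # Server clock runs 5 s ahead; the third packet hits a 100 ms upstream queue.
    demo = [(0, 5010, 5011, 21), (100, 5110, 5111, 121), (200, 5310, 5311, 321)]
    print(owd_increase(demo))
```

If only the upstream list grows during an upload test, the queue is on the sending side of the path (the client or its airtime access), not in the download direction.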

The planned "solution" by the WiFi guys seems to be to increase the available WiFi bandwidth (with WiFi 7/8) by so much that the bottleneck moves back to the router; I am not sure how well this is going to work outside of WiFi deployments designed and planned by experts. For the "let's put the WiFi router wherever it is convenient" crowd this might be too optimistic, and it will not solve the issue with existing WiFi gear either.

*) Some Linux wifi drivers keep some statistics about airtime usage by the different ACs, but not all do. I would guess that air captures would show these as well.

Yes, she's a nontechnical user, so we are just using things like the Waveform bufferbloat test.

If the Mac is building 100 ms queues at 11 pm on 5 GHz, a few meters from the AP, with no one else using her wifi, there isn't a lot I can do, right?

The AP is on a managed switch, I could limit that port to 150Mbps each way, and I could give her a second AP on a different port.

Yes, that is the usual way; just knowing you, I had to ask :wink: (I recently used irtt to detect loss direction, so I was just curious*).

Have her click the WiFi icon with alt/option held; at least on older macOS versions you see both a "Create Diagnostic Report..." and an "Open Wireless Diagnostic..." option. Have her create a report and send it your way (I assume that among siblings she will trust you enough to pass this on; I have little idea how much sensitive data these reports contain). You might be able to glean something from it.
Or tell her how to get an air capture of a waveform test from her mac and send this your way.
But this is going to be tricky to diagnose from afar.

EDIT: I just tested it: the second option includes the first, so "Open Wireless Diagnostic..." is enough. After some minutes it ends by saving the report and giving some advice on the anomalies it detected, like a crowded wifi environment and a sub-optimal channel choice...

BTW, on Android I use VREM's WiFi Analyser to see how many other APs are around and whether their frequency bands overlap; no idea how to do that on iOS (the only iOS device I ever owned was a 2010 iPod touch, which does not run modern iOS versions at all).

That seems worth trying to get immediate improvements... For reference, we run a family of five's traffic over a 116/37 link shaped to 105/36 Mbps, and so far everybody is A-OK with that capacity, but we have never experienced faster access speeds, so our experience might not apply to your sister.

*) My ISP apparently has a faulty gateway, which I get assigned to every 7th time or so, that drops around 1% of packets, load or no load. I decided to investigate that a bit before reporting it to the ISP, which has worked out well so far: my report was detailed enough that it was not simply brushed aside by the hotline staff but passed on to the network experts; whether that results in a fix is a different question :wink:

Side note: that question made me look at my local WiFi environment, which has considerably deteriorated since I last looked: 20 APs in the 2.4 GHz range (albeit nicely using the three recommended channels for North America... even though in the EU we could actually use four for less interference; but I am not complaining, some years ago it was pretty much randomly distributed).
I use 2.4 GHz only as a fallback, so I do not care much, but I see similar issues in the 5 GHz band: too many 80 MHz APs on too few channels. Let's see whether I can move my stuff into the underused upper frequencies (in the past, false-positive DFS results made my AP evacuate these bands and drop down randomly into the lowest 5 GHz sub-band).

Side-side note: OpenWrt has a pretty decent Status -> Channel Analysis page in LuCI that makes the band usage of nearby APs easily visible.

I have pointed out that OS X's implementation of the CoDel portion of fq_codel is broken. Recently they also added GSO to their stack, which makes it worse.

So uploads suck increasingly on the mac.

I tried limiting the AP port to 250 Mbps, and then, when she was fairly close to the AP, she got an A+ rating and a 2-3 ms latency increase on the Waveform test. So that's going to be pretty good. We will be testing some other situations, but for the moment cake on the router at 700 Mbps and 250 Mbps on the switch port seems to be OK.

It's a TL-SG108E switch, low end but still very useful.

This is a test from my phone. I've got a similar situation to her, gigabit fiber and Omada access points. This is substantially better in download direction since I limited the switch port to 250Mbps. There's not much to do about the upload, since that queue must be in the phone itself (Android Moto One Ace).

Results look even worse on my Kindle Fire, which only hits 50-60 Mbps and sees a 150-300 ms latency increase.

Wifi sucks; I'm glad my interactive streams are all wired. But this issue with wifi devices is nontrivial for people who don't wire in their stuff.

So I think this is a case where running OpenWrt on access points with ath9k, ath10k or mt76 WiFi chips might help; there was a lot of effort in getting these to acceptable bufferbloat levels. Sure, not like 1-2 ms, more like 20 ms, but given that WiFi aggregates can last up to around 4 ms, 1-2 ms seems hard to achieve reliably over WiFi, at least with concurrent competing traffic from other stations.
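The 4 ms figure translates into bytes and waiting time roughly as follows. This is a back-of-the-envelope sketch that ignores preamble, ACK and contention overhead, and the "each competitor wins one full-length aggregate first" wait model is an assumption for illustration only:

```python
def aggregate_bytes(phy_rate_bps: float, airtime_ms: float) -> float:
    """Payload bytes a given PHY rate can carry in airtime_ms (no PHY/MAC overhead)."""
    return phy_rate_bps * airtime_ms / 8000.0


def worst_case_wait_ms(competing_stations: int, airtime_ms: float = 4.0) -> float:
    """Wait if each competing station transmits one full aggregate before us (toy model)."""
    return competing_stations * airtime_ms


if __name__ == "__main__":
    print(aggregate_bytes(300e6, 4.0))  # bytes in a 4 ms aggregate at 300 Mbps
    print(worst_case_wait_ms(3))        # three busy neighbours -> ms of waiting
```

Even a single competing 4 ms aggregate already exceeds a 1-2 ms latency budget, which is why ~20 ms is a more realistic target under contention.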

Yeah, connecting the client device (phone/laptop) via cable to an access point (AP) and letting the AP connect to the wifi of your main router could be one workaround.

The more I read in this forum, the more cases I see of people that seem to have problems with Apple wifi. :confused:

Is the AP going to have any real effect on upload buffering? I'm assuming crazy long upload buffers are actually in the client device itself right?

The most concerning direction is upload since that's what affects other people's ability to understand what you're saying or see your image clearly. And the AP always has better antennas and more power so download is usually faster.

Of course, during a video conference from a phone, hopefully the phone doesn't do too much other traffic; still, you can't control when something will decide that it's really important to sync some pictures or whatever.

These are all wired APs, and the clients are Ethernet-free devices (phones or laptops). I did send my sister a link to a UE300 USB Ethernet adapter so she can wire up if it's important enough, though.

Sure, but these might fill up because the station does not get access to airtime, e.g. because the AP sends too much, or because traffic in ACs above BE gets priority access to airtime... WiFi, with its "cooperative" airtime access "scheduler" and the same channel used for upstream and downstream, is not really intuitive.

These antennas work in both directions ;), but yes, typically the end-point to AP direction is more problematic, as far as I can tell.

I agree, smartphones are badly controlled "environments" and often have these kinds of "automatic update when on WiFi" settings that make a ton of sense on metered contracts but might interfere with solid video conferencing. Then again, such automatic updates are not necessarily easier to suppress on "full" OSes either.

For testing that would be a great idea, also to give her a backstop in case of really important video conferences. When I tried the UE300 on macOS 12.6 (the last for my device) it worked OK, but I saw more delay spikes than I would have hoped for, albeit under saturating load conditions, so not typical for video conferencing.

These example tests are all late at night in a quiescent environment. I think phones just have shitty queue management. I wonder do Android phones even stick something like fq_codel on their wifi interfaces or is it just pfifo_fast?

@dtaht did you do any investigation on Android?

For WiFi (as other variable rate links) the scheduler/AQM needs to be integrated into the WiFi stack.
Looking at my older Android phone (using tc under Termux) I only see pfifo_fast, but then again fq_codel at that point would not really be much better, as it sits too high up in the stack.

Is the driver hardware sucking down packets faster than it's sending them out? Can't wifi use something like BQL to force information upwards to the qdisc level?

Yes, but the driver needs to actually do the work and feed that information back into the scheduler/AQM to get reasonable sojourn times. WiFi drivers themselves need somewhat deeper buffers under their own control (to do proper aggregation), so things are slightly more complicated than for Ethernet. But IIRC (and @dtaht and @tohojo know far more about this than I do) ath10k has grown AQL, airtime queue limits, in analogy to Ethernet's byte queue limits (even for Ethernet, managing the queues in service-time equivalents would be a better match, but for uniform-rate link technologies like Ethernet converting from one to the other is trivial).
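The core idea of limiting queued airtime rather than queued bytes can be sketched as a toy per-station limiter. This is only loosely in the spirit of AQL and is not mac80211's code: the airtime estimate ignores overhead and retries, and `limit_ns` is a hypothetical budget, not a real kernel default.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque


def airtime_ns(length_bytes: int, phy_rate_mbps: int) -> int:
    """Rough airtime of one packet: bits / (Mbit/s) gives microseconds; *1000 -> ns."""
    return length_bytes * 8 * 1000 // phy_rate_mbps


@dataclass
class AirtimeLimiter:
    """Toy per-station cap on airtime handed down to the driver/hardware.

    Packets that do not fit stay in the upper queue, where an AQM such as
    fq_codel can still see (and act on) their sojourn time.
    """
    limit_ns: int
    pending_ns: int = 0
    in_flight: Deque[int] = field(default_factory=deque)

    def try_push(self, length_bytes: int, phy_rate_mbps: int) -> bool:
        cost = airtime_ns(length_bytes, phy_rate_mbps)
        # Always admit one packet when nothing is in flight, so slow stations still progress.
        if self.in_flight and self.pending_ns + cost > self.limit_ns:
            return False
        self.pending_ns += cost
        self.in_flight.append(cost)
        return True

    def tx_complete(self) -> None:
        """Call when the oldest handed-down packet has left the radio."""
        self.pending_ns -= self.in_flight.popleft()
```

With a 1 ms budget, a station at 300 Mbps can have 25 full-size (1500-byte) packets handed down, but a station at 30 Mbps only 2: the same time budget means far fewer bytes for a slow station, which is the point of counting airtime instead of bytes.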
I have zero insight into what Apple does there, except that on macOS there actually is an fq_codel on the WiFi interfaces; however, if I recall correctly, this will only work as intended when used via Apple APIs, not via the Unix socket API:

123-1234567:~ user$ ### OSX get queue information for interface en0: (shows WMM AC queues for wifi)
123-1234567:~ user$ sudo netstat -I en0 -qq
Password:
en0:
     [ sched:  FQ_CODEL  qlength:    0/128 ]
     [ pkts:      10686  bytes:    1591189  dropped pkts:      1 bytes:    190 ]
=====================================================
     [ pri: VO (1)	srv_cl: 0x400180	quantum: 605	drr_max: 8 ]
     [ queued pkts: 0	bytes: 0 ]
     [ dequeued pkts: 468	bytes: 116650 ]
     [ budget: 0	target qdelay: 10.00 msec	update interval:100.00 msec ]
     [ flow control: 0	feedback: 0	stalls: 0	failed: 0 	overwhelming: 0 ]
     [ drop overflow: 0	early: 0	memfail: 0	duprexmt:0 ]
     [ flows total: 0	new: 0	old: 0 ]
     [ throttle on: 0	off: 0	drop: 0 ]
     [ compressible pkts: 0 compressed pkts: 0]
=====================================================
     [ pri: VI (2)	srv_cl: 0x380100	quantum: 3028	drr_max: 6 ]
     [ queued pkts: 0	bytes: 0 ]
     [ dequeued pkts: 34	bytes: 3652 ]
     [ budget: 0	target qdelay: 10.00 msec	update interval:100.00 msec ]
     [ flow control: 0	feedback: 0	stalls: 0	failed: 0 	overwhelming: 0 ]
     [ drop overflow: 0	early: 0	memfail: 0	duprexmt:0 ]
     [ flows total: 0	new: 0	old: 0 ]
     [ throttle on: 0	off: 0	drop: 0 ]
     [ compressible pkts: 0 compressed pkts: 0]
=====================================================
     [ pri: BE (7)	srv_cl: 0x0	quantum: 1514	drr_max: 4 ]
     [ queued pkts: 0	bytes: 0 ]
     [ dequeued pkts: 9007	bytes: 1100465 ]
     [ budget: 0	target qdelay: 10.00 msec	update interval:100.00 msec ]
     [ flow control: 0	feedback: 0	stalls: 0	failed: 0 	overwhelming: 0 ]
     [ drop overflow: 0	early: 0	memfail: 0	duprexmt:0 ]
     [ flows total: 0	new: 0	old: 0 ]
     [ throttle on: 0	off: 0	drop: 0 ]
     [ compressible pkts: 0 compressed pkts: 0]
=====================================================
     [ pri: BK (8)	srv_cl: 0x100080	quantum: 1514	drr_max: 2 ]
     [ queued pkts: 0	bytes: 0 ]
     [ dequeued pkts: 1177	bytes: 370422 ]
     [ budget: 0	target qdelay: 10.00 msec	update interval:100.00 msec ]
     [ flow control: 0	feedback: 0	stalls: 0	failed: 0 	overwhelming: 0 ]
     [ drop overflow: 0	early: 0	memfail: 0	duprexmt:0 ]
     [ flows total: 0	new: 0	old: 0 ]
     [ throttle on: 0	off: 0	drop: 0 ]
     [ compressible pkts: 23 compressed pkts: 3]

(I just enabled WiFi to get this example; at home I typically use this Mac via a USB3 gigabit Ethernet dongle*, which is why there is nothing useful in the statistics.)

*) A Realtek-based model from Plugable with a 3-port USB3 hub integrated into it; this is certainly inferior to my Thunderbolt 2 gigabit Ethernet dongle, but quite convenient for keeping a few USB devices ready for use.
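If you want to compare these per-AC counters over time (say, before and after a Waveform test), a small parser helps. A minimal sketch, assuming the text layout of the `netstat -I en0 -qq` sample output above; Apple does not document this format, so the regular expressions are guesses that may need adjusting on other macOS releases:

```python
import re
from typing import Dict, Optional

AC_RE = re.compile(r"\[ pri: (\w+) \(\d+\)")
DEQ_RE = re.compile(r"\[ dequeued pkts: (\d+)\s+bytes: (\d+) \]")
TARGET_RE = re.compile(r"target qdelay: ([\d.]+) msec")


def parse_netstat_qq(text: str) -> Dict[str, Dict[str, float]]:
    """Extract per-AC counters from macOS `netstat -I <if> -qq` output."""
    stats: Dict[str, Dict[str, float]] = {}
    current: Optional[str] = None
    for line in text.splitlines():
        m = AC_RE.search(line)
        if m:
            current = m.group(1)  # e.g. "VO", "VI", "BE", "BK"
            stats[current] = {}
            continue
        if current is None:
            continue  # interface-wide header lines before the first AC block
        m = DEQ_RE.search(line)
        if m:
            stats[current]["dequeued_pkts"] = int(m.group(1))
            stats[current]["dequeued_bytes"] = int(m.group(2))
            continue
        m = TARGET_RE.search(line)
        if m:
            stats[current]["target_qdelay_ms"] = float(m.group(1))
    return stats
```

Feed it the saved text of two runs and diff the `dequeued_pkts` values per AC to see which access categories a test actually used.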

Gotcha. Yeah. I can certainly understand the difficulty with varying speed media. Ideally you'd report something like time duration since last report, number of bytes sent, and time spent idle... something like fq_codel could calculate a smoothed averaged bandwidth and estimated sojourn time from that and dequeue an appropriate number of bytes... as well as drop/mark packets which are estimated to be too far back in the queue.
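That idea can be sketched roughly as follows. Everything here is hypothetical: the report fields mirror the description above rather than any real driver interface, the EWMA weight is arbitrary, the 5 ms target is borrowed from fq_codel's usual default, and the drop decision is a plain threshold on predicted sojourn time rather than CoDel's actual control law (which only reacts when delay stays above target for a whole interval):

```python
from dataclasses import dataclass


@dataclass
class LinkEstimator:
    """Hypothetical rate/sojourn estimator fed by periodic driver reports."""

    alpha: float = 0.125   # EWMA weight for new samples (arbitrary choice)
    rate_bps: float = 0.0  # smoothed rate while the link is busy; 0 = no estimate yet

    def report(self, interval_s: float, bytes_sent: int, idle_s: float) -> None:
        """Fold in one report: time since last report, bytes sent, time spent idle."""
        busy_s = interval_s - idle_s
        if busy_s <= 0:
            return  # link was idle the whole time: says nothing about its capacity
        # Divide by busy time, not the whole interval, so idle periods
        # do not make the link look slower than it is.
        sample_bps = bytes_sent * 8 / busy_s
        if self.rate_bps == 0.0:
            self.rate_bps = sample_bps
        else:
            self.rate_bps += self.alpha * (sample_bps - self.rate_bps)

    def sojourn_s(self, bytes_ahead: int) -> float:
        """Predicted wait for a packet with bytes_ahead queued in front of it."""
        if self.rate_bps <= 0.0:
            return 0.0  # no estimate yet, so do not penalize anything
        return bytes_ahead * 8 / self.rate_bps

    def should_drop(self, bytes_ahead: int, target_s: float = 0.005) -> bool:
        """Drop/mark if the packet is predicted to sit too far back in the queue."""
        return self.sojourn_s(bytes_ahead) > target_s
```

For example, after a report of 3.125 MB sent in a 0.5 s interval with 0.25 s idle, the estimate is 100 Mbps, so a packet with 125 KB ahead of it is predicted to wait about 10 ms and would be dropped or marked.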