[bufferbloat] Does Airtime FQ replace the need to shape per client?

mpenning · February 24, 2024, 4:01pm

Hello,

This is my first post anywhere on openwrt.org. I apologize in advance if this is a dumb question or already discussed somewhere, but my background is commercial wifi from Aironet and Meraki... I'm curious whether OpenWrt and the Airtime FQ improvements discussed in atc17-hoiland-jorgensen (page 144, Algorithm 3) solve the problem described in the next paragraph.

We all know that QoS to address bufferbloat is similar for wifi and wired networks in some ways, but the devil is in the details. If one takes the standard Hierarchical QoS (HQOS) techniques for a wired network and tries to shape / schedule / queue manually per wifi client, the result is a solution full of duct tape. Specifically, wifi clients can change their 802.11 PHY transmit rate at will, and can do so incessantly (not merely when they roam). Furthermore, a wifi client can associate and disassociate as conditions warrant. Superficially, from my reading, tc CAKE in bandwidth mode per wifi client could solve the problem if I constructed this kind of duct-tape pseudocode:

# begin HQOS pseudocode
while wifi_clients_are_associated():
    for client in all_wifi_clients():
        phy_rate = client.phy_rate_80211  # current 802.11 PHY TX rate
        ip_addr = client.ip_address

        # Assume this function really exists: (re)build a CAKE
        # bandwidth-mode shaper for this client at its current PHY rate
        build_tc_cake_bw_shaper(ip_addr, phy_rate)

    # Assume this function really exists: tear down the shapers
    # of clients that have disassociated
    clean_up_for_disassociated_clients(all_wifi_clients())

The great thing about CAKE is that it is a simple shaper / scheduler / AQM wrapped up in one box... but this looks hideous, because this per-wifi-client management should fundamentally be done in the Linux kernel by the wifi driver (e.g. ath9k). The driver has direct access to the list of clients and their PHY rates... so I'd rather not run the pseudocode above.

Now to my question...

atc17-hoiland-jorgensen seems to say that the Airtime FQ implementation magically makes many wifi client latency issues under load disappear. Normally this would be a classic per-client HQOS shape / schedule / queue problem, but Airtime FQ doesn't seem to include shaping (much less per-client HQOS shaping) in the algorithm.

Can I safely ignore the need for per-client HQOS with Airtime FQ? If so, why don't we need shaping?

Again, this is my first question. Apologies for the length, and if I've somehow missed a big piece somewhere. Thank you.

richb-hanover-priv · February 25, 2024, 2:20pm

Welcome to the OpenWrt Forum. And this is a great question. Thanks.

I'm not an expert here: other experts read this list, so they can chime in and I'll know more, too.

The way I think about both CAKE/fq_codel and Airtime FQ is that these both provide enormous improvements in latency as-is, out of the box. They are so good that frequently no further tuning is necessary, freeing you to read novels, go fishing, or engage in other productive endeavors.

There are additional tweaks (that I haven't indulged in - my needs fit into the prior category) that can improve performance. But that's for someone else to describe. Thanks again.

mpenning · February 26, 2024, 2:09pm

The latency reduction from Airtime FQ is why I ask, but I want to be clear just in case...

Based on my research so far:
- I don't know whether Airtime FQ and the OpenWrt Airtime Fairness feature are the same thing.
- Shaping and Airtime FQ are different functions with different purposes.

Are OpenWrt Airtime Fairness and the Airtime FQ scheduler from the USENIX paper the same thing?

I would use CAKE bandwidth-mode shaping to build a child HQOS queue with a transmit rate a bit below the 802.11 transmit rate of each AP-to-client adjacency. HQOS is typically used to build a parent queue that (in this case) holds all the 802.11 per-client queues.

If one does NOT shape, then there is no HQOS to be performed AFAICT. This is the crux of the question in my mind. Is the need for HQOS and per-client shaping removed if one uses the Airtime FQ algorithm described in the USENIX article that I referenced?

I'm hoping someone can chime in on this specific point.

richb-hanover-priv · February 26, 2024, 2:44pm

I've told you about 99% of what I know for certain.

But I'm cc'ing @tohojo, who "wrote the book" (literally) and got his PhD on this topic. You've already seen his paper from the USENIX conference. I'm also cc'ing Dave Täht (@dtaht), who's been spearheading the Bufferbloat crusade for what seems like hundreds of years.

I look to them to see whether we already have a description for "mitigating latency in OpenWrt":

- I know fq_codel/CAKE are easily configured in all modern OpenWrt builds.
- I know the low-latency/ATF fixes are only available for certain Wi-Fi chips. How would a purchaser know which chipsets to favor?
- Are the low-latency/ATF improvements in Wi-Fi good enough as-is to address @mpenning's questions above?

Thanks.

Update: Edited questions slightly.

Lynx · February 26, 2024, 2:49pm

This seems pretty interesting. So is the point here that one could have a super highly tuned cake setup at the router, which works very well, only for it to end up entirely ruined and clobbered by buggy, ugly WiFi implementations?

richb-hanover-priv · February 26, 2024, 2:56pm

Yup. That's why you see all the advice on forums and boards proclaiming, "USE ETHERNET! YOU CANNOT USE WI-FI FOR GAMING!"

Figure 2 of Toke's paper (cited above) shows Wi-Fi latency "without our solution" ranging from 50 msec to many, many hundreds of msec. Using these techniques, it ranges from 10 to 100 msec.

Lynx · February 26, 2024, 3:04pm

Super interesting.

Are things this bad by default in recent versions of OpenWrt, say with the RT3200 Wi-Fi hardware (MediaTek MT7622BV, MediaTek MT7915E) in AX mode?

zekica · February 26, 2024, 3:05pm

My understanding is that:

- The Make WiFi Fast project is what you are talking about. It mentions the USENIX paper.
- The Linux mac80211 Wi-Fi stack, used by ath9k, ath10k, ath11k, ath12k?, mt76, brcmsmac, rt2x00, rtl8xxxu, rtw88, rtw89 and possibly others, was reworked to support:
  - per-station TX queues that are automatically managed using fq_codel, to keep the latency below target
  - AQL: not sure exactly what this is used for
  - airtime fairness: to maximize performance by not letting stations with low rates use more than their fair share of airtime
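To make the "performance anomaly" behind the airtime fairness item concrete, here is a rough worked example in Python. It is an idealised model (assumptions: every station is always backlogged, MAC overhead and aggregation are ignored) comparing equal-bytes sharing with equal-airtime sharing; the 300 and 10 Mbit/s rates are purely illustrative.

```python
# Idealised model of the 802.11 "performance anomaly": every station is
# always backlogged, and MAC overhead and aggregation are ignored.

def equal_bytes_throughput(rates_mbps):
    """Every station gets the same throughput T, and the airtime fractions
    T / r_i must add up to 1, so T = 1 / sum(1 / r_i)."""
    t = 1.0 / sum(1.0 / r for r in rates_mbps)
    return [t] * len(rates_mbps)


def equal_airtime_throughput(rates_mbps):
    """Every one of N stations gets 1/N of the airtime, so station i
    gets r_i / N."""
    n = len(rates_mbps)
    return [r / n for r in rates_mbps]


if __name__ == "__main__":
    rates = [300.0, 10.0]  # illustrative PHY rates in Mbit/s
    for name, model in (("equal bytes", equal_bytes_throughput),
                        ("equal airtime", equal_airtime_throughput)):
        per_station = model(rates)
        print(f"{name}: {[round(x, 1) for x in per_station]} "
              f"(total {sum(per_station):.1f} Mbit/s)")
```

With those rates, equal-bytes sharing gives each station about 9.7 Mbit/s (about 19 Mbit/s in total), while equal airtime gives 150 and 5 Mbit/s (155 Mbit/s in total). The slow station drops from about 9.7 to 5 Mbit/s, but the fast one is no longer dragged down to the slow station's pace.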

dlakelan · February 26, 2024, 3:06pm

In order to limit airtime, you shape. So shaping must be occurring in the airtime fairness code. Basically, in each timeslice the code lets modrate * airtimelimit worth of data through for each client that has bytes queued (very roughly).
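That rough rule (data per timeslice is about modulation rate times airtime limit) can be written out as a tiny Python sketch. The station names, rates, backlogs and the 1000 us timeslice are illustrative assumptions, and preambles, ACKs and retries are ignored.

```python
def bytes_per_timeslice(phy_rate_mbps, airtime_us):
    """Bytes a station can send in airtime_us microseconds at phy_rate_mbps.
    Mbit/s multiplied by microseconds gives bits directly."""
    bits = phy_rate_mbps * airtime_us
    return int(bits // 8)


def per_slice_budgets(stations, airtime_us):
    """stations maps name -> (phy_rate_mbps, backlog_bytes).
    Only stations with queued data get a budget, capped at their backlog."""
    return {
        name: min(backlog, bytes_per_timeslice(rate, airtime_us))
        for name, (rate, backlog) in stations.items()
        if backlog > 0
    }


if __name__ == "__main__":
    stations = {"fast": (300.0, 1_000_000), "slow": (6.5, 500), "idle": (100.0, 0)}
    print(per_slice_budgets(stations, airtime_us=1000))
```

The point is that equal airtime translates into very unequal byte counts: the same 1 ms is worth 37,500 bytes to a 300 Mbit/s station and about 812 bytes to a 6.5 Mbit/s one.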

zekica · February 26, 2024, 3:06pm

They shouldn't be, as the solution is already implemented in Linux's mac80211 stack. Only drivers not using it should still suffer from bad bufferbloat.

Lynx · February 26, 2024, 3:08pm

Is there an easy way to tell without sending out latency probes from individual clients? Any sort of stats generated that we can query? Like a WiFi equivalent of 'tc -s'?

mpenning · February 26, 2024, 3:08pm

Your assumption that the AP implementation and client must be in conflict is an open question.

In the case of low negotiated AP-to-client transmit rates (such as <= 100 Mbps), Airtime FQ is rather important to avoid bufferbloat. The typically large buffer on the AP side, compared to the slow AP-to-client transmit rate, means that latency can become a problem. FYI, I have literally seen client-to-AP ping RTTs of 40 seconds in pathological cases. It's this kind of problem that I'm addressing.
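The arithmetic behind that observation is simply backlog divided by drain rate. A minimal Python sketch, where the 1000-packet backlog of 1500-byte packets is an illustrative assumption rather than any particular driver's buffer size:

```python
def queueing_delay_ms(backlog_packets, packet_bytes, drain_rate_mbps):
    """Time for a packet arriving behind the backlog to reach the head of
    the queue. 1 Mbit/s drains 1000 bits per millisecond."""
    bits = backlog_packets * packet_bytes * 8
    return bits / (drain_rate_mbps * 1000.0)


if __name__ == "__main__":
    for rate in (100.0, 10.0, 1.0):  # AP-to-client rate in Mbit/s
        print(f"{rate:>5} Mbit/s: {queueing_delay_ms(1000, 1500, rate):.0f} ms")
```

The same 1000-packet backlog costs 120 ms at 100 Mbit/s but 12 s at 1 Mbit/s, which is how RTTs in the tens of seconds can arise when a large buffer drains at a low rate.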

If a client is buggy and doesn't queue well, that's a red herring in this discussion. Those fixes are completely different...

Lynx · February 26, 2024, 3:10pm

Me too - I get it. Around my house I saw pretty big latency variation just walking around when I checked with my iPhone 15. Whereas I've kept wired latency pretty constant. So I'm very curious about all of this. I'm dubious it's simply solved as suggested.

zekica · February 26, 2024, 3:11pm

I don't know if there are any; iw phy1-ap0 station dump doesn't show any details about per-station buffers.

tohojo · February 26, 2024, 3:40pm

Yes, that's the idea. Shaping is really a workaround for bad queue management at the real bottleneck. The airtime FQ algorithm sits at the real bottleneck (in the WiFi driver on the AP), managing the queue directly.

@zekica summed up the different components of this system well, above:

Per-station TX queues: this enforces fairness between flows going to the same station, and applies the "short flow optimisation" that is also in FQ-CoDel and CAKE to ensure that sparse traffic gets relative priority.

AQL: this is to keep the lower-level queues in the WiFi firmware from bloating. It's the same mechanism that Byte Queue Limits (BQL) provides for Ethernet, but measured in the airtime domain (by forecasting transmission time from the current rate), hence the 'A' instead of the 'B'.

This is only needed for hardware that has significant queueing in the firmware; but that's everything except ath9k, so in practice this is pretty important.
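As a rough illustration of the AQL idea, the toy Python model below tracks estimated airtime that has been handed to the hardware but not yet completed, per station and for the whole interface (tohojo mentions both kinds of limit later in the thread). The class name, the limit values and the check-before-dequeue policy are assumptions for illustration; this is not the mac80211 code.

```python
class AirtimeQueueLimiter:
    """Toy AQL-style limiter: tracks estimated airtime handed to the
    hardware but not yet completed, per station and for the interface."""

    def __init__(self, station_limit_us, global_limit_us):
        self.station_limit_us = station_limit_us
        self.global_limit_us = global_limit_us
        self.pending_us = {}  # station -> outstanding estimated airtime
        self.total_pending_us = 0.0

    @staticmethod
    def estimate_airtime_us(nbytes, phy_rate_mbps):
        # Forecast from the current rate: bits / (Mbit/s) = microseconds.
        return nbytes * 8 / phy_rate_mbps

    def may_dequeue(self, station):
        # Only require being below the limits right now, so the last
        # packet handed over may overshoot them slightly.
        return (self.pending_us.get(station, 0.0) < self.station_limit_us
                and self.total_pending_us < self.global_limit_us)

    def on_enqueue_to_hw(self, station, nbytes, phy_rate_mbps):
        est = self.estimate_airtime_us(nbytes, phy_rate_mbps)
        self.pending_us[station] = self.pending_us.get(station, 0.0) + est
        self.total_pending_us += est
        return est  # remembered per packet, released on TX completion

    def on_tx_complete(self, station, est_us):
        self.pending_us[station] -= est_us
        self.total_pending_us -= est_us


if __name__ == "__main__":
    aql = AirtimeQueueLimiter(station_limit_us=5000, global_limit_us=24000)
    sent = 0
    # 1500-byte packets at 12 Mbit/s are forecast at 1000 us each.
    while aql.may_dequeue("slow-sta"):
        aql.on_enqueue_to_hw("slow-sta", 1500, 12.0)
        sent += 1
    print(sent, "packets in flight for slow-sta")
```

At a low rate only a handful of packets fit under the airtime limit, so the rest stay in the mac80211 queues where FQ-CoDel and the airtime scheduler can still act on them, instead of bloating the firmware queues.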

Airtime fairness: this does (deficit) round-robin scheduling of all active stations to ensure they each use only their fair share of the available airtime. This prevents a low-rate station from using up all the airtime (the so-called "performance anomaly"). Airtime usage is accounted after the fact (on TX completion) in the airtime domain, using the best available source of airtime information: if the hardware provides actual airtime usage information, that is used; otherwise, the usage is estimated from the rate information.
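A toy Python sketch of deficit round-robin in the airtime domain, with airtime charged after transmission as described above. The 300 us quantum and the per-packet airtimes are illustrative assumptions; the real mac80211 scheduler also has per-station weights and per-access-category deficits, which are left out here.

```python
from collections import deque


class AirtimeDRR:
    """Toy deficit round-robin over stations, in microseconds of airtime."""

    def __init__(self, quantum_us=300):  # illustrative quantum
        self.quantum_us = quantum_us
        self.active = deque()  # stations that currently have queued packets
        self.deficit = {}

    def activate(self, station):
        self.deficit.setdefault(station, 0)
        if station not in self.active:
            self.active.append(station)

    def next_station(self):
        """Pick the station allowed to transmit next (None if idle)."""
        while self.active:
            sta = self.active[0]
            if self.deficit[sta] >= 0:
                return sta
            # Out of credit: top up and send it to the back of the round.
            self.deficit[sta] += self.quantum_us
            self.active.rotate(-1)
        return None

    def charge(self, station, airtime_us):
        """Account airtime after TX completion (measured or estimated)."""
        self.deficit[station] -= airtime_us


def simulate(per_packet_airtime_us, n_tx, quantum_us=300):
    """Always-backlogged stations; returns total airtime used per station."""
    sched = AirtimeDRR(quantum_us)
    for sta in per_packet_airtime_us:
        sched.activate(sta)
    used = {sta: 0 for sta in per_packet_airtime_us}
    for _ in range(n_tx):
        sta = sched.next_station()
        sched.charge(sta, per_packet_airtime_us[sta])
        used[sta] += per_packet_airtime_us[sta]
    return used


if __name__ == "__main__":
    # A fast station (100 us per packet) and a slow one (1000 us per packet).
    print(simulate({"fast": 100, "slow": 1000}, n_tx=2000))
```

Even though the slow station needs ten times more airtime per packet, both end up with roughly the same total airtime; the fast station simply sends about ten times more packets.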

tohojo · February 26, 2024, 4:06pm

Oh, and the detailed stats live in debugfs:

root@wap:~# cat /sys/kernel/debug/ieee80211/phy0/netdev:phy0-ap2/stations/30:ae:a4:ff:38:c8/aqm
target 19999us interval 99999us ecn yes
tid ac backlog-bytes backlog-packets new-flows drops marks overlimit collisions tx-bytes tx-packets flags
0 2 0 0 34399 0 0 0 1 2974062 34399 0x6(RUN AMPDU NO-AMSDU)
1 3 0 0 0 0 0 0 0 0 0 0x0(RUN)
2 3 0 0 0 0 0 0 0 0 0 0x0(RUN)
3 2 0 0 0 0 0 0 0 0 0 0x0(RUN)
4 1 0 0 0 0 0 0 0 0 0 0x0(RUN)
5 1 0 0 0 0 0 0 0 0 0 0x0(RUN)
6 0 0 0 0 0 0 0 0 0 0 0x0(RUN)
7 0 0 0 283 0 0 0 0 101335 283 0x0(RUN)
8 2 0 0 0 0 0 0 0 0 0 0x0(RUN)
9 3 0 0 0 0 0 0 0 0 0 0x0(RUN)
10 3 0 0 0 0 0 0 0 0 0 0x0(RUN)
11 2 0 0 0 0 0 0 0 0 0 0x0(RUN)
12 1 0 0 0 0 0 0 0 0 0 0x0(RUN)
13 1 0 0 0 0 0 0 0 0 0 0x0(RUN)
14 0 0 0 0 0 0 0 0 0 0 0x0(RUN)
15 0 0 0 0 0 0 0 0 0 0 0x0(RUN)
root@wap:~# cat /sys/kernel/debug/ieee80211/phy0/netdev:phy0-ap2/stations/30:ae:a4:ff:38:c8/airtime
RX: 10164702 us
TX: 1824932 us
Weight: 256
Deficit: VO: 1444 us VI: 256 us BE: 1760 us BK: 256 us
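Following up on the stats question, the per-station aqm file is easy to post-process. A minimal Python sketch that parses it into one dict per TID; the column layout is taken from the sample output in this post and may differ between kernel versions (assumption), debugfs must be mounted and readable (usually as root), and the script name and path placeholders in the usage comment are hypothetical.

```python
import sys


def parse_aqm(text):
    """Parse a per-station debugfs 'aqm' file.

    Returns (params, rows): params holds the settings from the first line
    (e.g. target, interval, ecn), rows is one dict per TID.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    words = lines[0].split()  # "target 19999us interval 99999us ecn yes"
    params = dict(zip(words[0::2], words[1::2]))
    cols = lines[1].split()
    rows = []
    for line in lines[2:]:
        # The last column (flags) may itself contain spaces.
        fields = line.split(maxsplit=len(cols) - 1)
        row = dict(zip(cols, fields))
        for col in cols[:-1]:
            row[col] = int(row[col])
        rows.append(row)
    return params, rows


if __name__ == "__main__":
    # Usage: python3 aqm_stats.py /sys/kernel/debug/ieee80211/<phy>/<netdev>/stations/<mac>/aqm
    with open(sys.argv[1]) as f:
        params, rows = parse_aqm(f.read())
    print("params:", params)
    print("backlog bytes:", sum(r["backlog-bytes"] for r in rows))
    print("drops:", sum(r["drops"] for r in rows),
          "marks:", sum(r["marks"] for r in rows))
```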

mpenning · February 26, 2024, 4:21pm

Maybe my original post wasn't clear enough... My per-client CAKE shaping was proposed for the wifi AP radio, and thus it would be used to build a per-client buffer that keeps one wifi client from over-consuming a buffer otherwise shared between all clients. As an example, assume we only have one wifi radio queue, with an elephant flow going to one client and a bunch of mice flows going to all other clients. Without per-client shaping, the elephant flow's many packets can take too much of the buffer from the mice-flow clients (and thus drive up latency for all of them).
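The head-of-line effect described here can be shown with a small Python sketch: the same packets are served either from one shared FIFO or from per-station queues visited round-robin, one packet per station per turn. This is a deliberate simplification (plain packet round-robin rather than airtime DRR, no CoDel), and the 500 us per elephant packet is an illustrative assumption.

```python
from collections import deque


def fifo_wait_us(queue, target):
    """Time until target's first packet starts in a single shared FIFO.
    queue is a list of (station, airtime_us) in arrival order."""
    waited = 0
    for sta, airtime in queue:
        if sta == target:
            return waited
        waited += airtime
    return None


def round_robin_wait_us(queue, target):
    """Same packets, but in per-station queues served round-robin."""
    per_sta = {}
    for sta, airtime in queue:
        per_sta.setdefault(sta, deque()).append(airtime)
    order = deque(per_sta)  # stations in order of first arrival
    waited = 0
    while order:
        sta = order.popleft()
        if sta == target:
            return waited
        waited += per_sta[sta].popleft()
        if per_sta[sta]:
            order.append(sta)  # still backlogged: back of the round
    return None


if __name__ == "__main__":
    queue = [("elephant", 500)] * 100 + [("mouse", 100)]
    print("shared FIFO:", fifo_wait_us(queue, "mouse"), "us")
    print("per-station:", round_robin_wait_us(queue, "mouse"), "us")
```

With 100 elephant packets ahead of it, the mouse packet waits 50 ms in the shared FIFO but only one elephant packet (0.5 ms) with per-station queues.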

If AQL is done per-client, Airtime Fairness + AQL is exactly what I'm looking for.

tohojo · February 26, 2024, 4:41pm

That's the "per-station TX queues" in the list above. Each station gets its own instance of FQ-CoDel, basically, and the airtime fairness scheduler does round-robin scheduling between each station.

Yes, there's both a per-station AQL limit and an additional global limit on the total outstanding airtime queued for the whole interface.

richb-hanover-priv · February 27, 2024, 2:14am

7 posts were merged into an existing topic: How OpenWrt Vanquishes Bufferbloat

richb-hanover-priv · February 27, 2024, 2:21am

I have split this topic.

Let's continue the discussion here of @mpenning's question about whether Airtime FQ and other Wi-Fi driver features solve his problem.

The new How OpenWrt Vanquishes Bufferbloat topic should be used to comment on all the latency-killing mechanisms in OpenWrt. Thanks.