I built OpenWrt HEAD last night, after the final patches for AQL and ath10k-ct landed. Oh, what a difference in bufferbloat-related performance! Where, on this purposely terrible link (12 Mbit/s or so), delays would spike past 4 seconds and things like the Babel routing protocol would fail under any background load, it now holds steady well below 60 ms with totally solid throughput. Before:
Hmm? This is the native fq_codel implementation for ath10k, no SQM required. You can see the related drop and mark stats in the "aqm" files in the directory trees below:
find /sys/kernel/debug/iee*/phy* -name aqm
ATF regulates airtime between stations, but it needs short queues to work properly. AQL keeps the firmware from overbuffering, and more important still is the fq_codel algorithm on top of that, which keeps queues short and intermixes packets better. This was the last major piece of the make-wifi-fast effort before it ran out of time, volunteers, and money.
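A rough way to poke at these layers from a shell (a sketch assuming a recent mac80211 build with the AQL patches; phy0, wlan0 and the exact debugfs file names may differ on your kernel):

# per-phy AQL knobs (airtime queue limits)
cat /sys/kernel/debug/ieee80211/phy0/aql_txq_limit
cat /sys/kernel/debug/ieee80211/phy0/aql_threshold
# per-phy airtime scheduling (ATF) flags
cat /sys/kernel/debug/ieee80211/phy0/airtime_flags
# per-station fq_codel stats (backlog, drops, marks)
cat /sys/kernel/debug/ieee80211/phy0/netdev:wlan0/stations/*/aqm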
When the Bufferbloat team first started using these tools, it took me a while to understand what these "RRUL charts" were showing, exactly how bad the first plot was, and how it compared to the second (good) one. Let me give more detail about what's going on. (There's lots more detail about RRUL charts at bufferbloat.net.)
These charts plot two things: ping/latency between the endpoints, and the data rates of each of several data connections.
Each test starts with five seconds of idle, then 60 seconds of full-rate transfer for the connections, followed by another five seconds of idle, for a total test time of 70 seconds.
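For anyone who wants to reproduce a run like this: the charts come from flent's rrul test. A sketch of an invocation, where netperf.example.org stands in for whatever netperf server you have access to:

# 60-second RRUL run against a netperf server, saving a summary plot
flent rrul -l 60 -H netperf.example.org -t "ath10k with AQL" -p all_scaled -o rrul.png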
The first chart shows bad performance: latency like this would make the link basically unusable.
The ping times (green) start low for the first 5 seconds, then ramp up to over 4000 msec (4 seconds) within 20 seconds. They plummet again at 40 seconds, then spike up again. At 65 seconds, the data transfer stops, so by the end, ping latency is low again.
But worse than that... The plots of the individual data connections show wildly chaotic rates. After the initial five-second interval, most of the transfers stall or run very slowly. One connection, the blue plot, ultimately manages to use about half of the 12 Mbps link.
Anyone trying to use this network would think that "the internet was broken."
The second plot shows far better control, with far better performance, even when hammering the Wi-Fi link with multiple full-rate transfers.
Pings (green plot again) vary between 20 and 60 msec.
Each of the data connections hums along around 4 Mbps. This "good sharing" means that each connection gets a fair share of the bandwidth.
Can I get this just by building a snapshot, or is there more code that I would have to patch in? Thanks. Shout out to dtaht, I have been following your work for years now. Some good stuff.
The snapshots have this now. (Or go ahead, build one.)
ath9k and mt76 have had the fq_codel and ATF stuff for a long time. However, we needed to invent "AQL" in order to make things work well with the ath10k. (The LWN link pointed to our first successful version, but it went through a few iterations on the way to mainline.)
It is my hope that this new API can be applied easily to other drivers such as the Intel AX200 and older iwl chips, and perhaps even Marvell and Broadcom, one day. I'm spinning up a project to tackle the AX200 in particular...
It's a bit more complicated than just adding wiphy_ext_feature_set(ar->hw->wiphy, NL80211_EXT_FEATURE_AQL); to the driver, but...
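For the curious, a rough way to see which in-tree drivers already opt in (run from a Linux kernel source tree; the list will vary by kernel version):

# which drivers advertise AQL support to mac80211?
grep -rl "NL80211_EXT_FEATURE_AQL" drivers/net/wireless/
# the "more complicated" part: drivers also have to report per-frame
# airtime back to mac80211, e.g. via ieee80211_sta_register_airtime()
grep -rl "ieee80211_sta_register_airtime" drivers/net/wireless/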
But along the way we had all kinds of trouble with the ath10k firmware itself, which made it difficult to test. I hope we've got it right for all the ath10k hardware now. (The -ct firmware has come a long way, h/t to everyone that's worked on it... but there's so much hardware that uses it, so many variants of that chipset out there, and so many routers today exhibiting the kind of broken behavior I pointed to above.)
So far I've tested the Ubiquiti UAP Mesh and Mesh Pro, and am going to do up an Archer C7 v2 as soon as I make a few other long-desired tweaks to the algorithm (a lowered codel target on 5 GHz, some other stuff that I tested long ago)... I'm setting up my old testbed... so I'd love it if more people checked "their stuff".
I sure as f**k hope the ath11k folks are paying attention. I have not reviewed their code. I keep hoping someone from there will send me a couple of chips (and a contract).
Wi-Fi 6 brings new challenges. Ideally, the firmware itself would do per-station scheduling and expose an API for it to Linux. That's the only sane way I can think of to deal with MU (multi-user) operation, without having hardware to play with.
@dtaht what does your /sys/kernel/debug/iee*/phy*/aqm look like?
I see:
root@turris:~# cat /sys/kernel/debug/ieee80211/phy0/aqm
access name value
R fq_flows_cnt 4096
R fq_backlog 0
R fq_overlimit 0
R fq_overmemory 0
R fq_collisions 0
R fq_memory_usage 0
RW fq_memory_limit 16777216
RW fq_limit 8192
But I have a hunch that this OpenWrt 19.07.2-based Turris Omnia does not even have the AQL patches installed. How should this look?
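One way I could probably check (assuming the upstream mac80211 debugfs names; phy0 may differ here) is to look for the AQL knobs next to that aqm file:

# the AQL patches add files such as aql_txq_limit and aql_threshold;
# if these are missing, the patches most likely aren't in this build
ls /sys/kernel/debug/ieee80211/phy0/aql_*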
Yes, I can confirm that the AQL patches are not included in OpenWrt 19.07. You might want to ask @nbd, who committed them to OpenWrt master, whether he would consider backporting them to OpenWrt 19.07.
@nbd given the more variable timeframe for OpenWrt 20, is there a chance of getting the two AQL patches (for ath10k and ath10k-ct) into 19.07 somehow? These are truly nice improvements that Dave showed in the first post, so I am eager to get this into the hands of our users (myself included) soonish.