A couple of notes on how ATF (is supposed to) work:
- It will account airtime from transmissions in both directions to each station, but it can only throttle traffic in the AP->client direction.
- It is only active for drivers that opt in to it by setting the `EXT_FEATURE` flag you mentioned above; just running `iw phy` should tell you whether it's enabled (look for `AIRTIME_FAIRNESS` in the "supported extended features" list at the end of the output for each phy).
- For ath10k in particular, there's an odd interaction with the firmware scheduler: in some cases, newer ath10k chipsets switch to 'pull/push mode', where the firmware has its own notion of which stations to schedule when. Unfortunately, since this is in firmware, I don't have a lot of insight into how it actually works, but it may be what's causing the issues with the virtual-time-based scheduler. I had an email exchange with someone who noticed this recently; I'll quote my reply to them below.
(Note that the following only applies to ath10k's pull/push mode, and the context was slightly different, but it may be relevant here anyway):
So the main change with the virtual-time scheduler that's relevant here (I think) is that `ieee80211_txq_may_transmit()` is now better at saying no. The assumption for this mode of operation (which is only used by ath10k, BTW) is that the firmware will cycle through all the scheduled stations, ask the system whether each of them is allowed to send (through `ieee80211_txq_may_transmit()`), and, if the answer is no, move on to the next one.

Now, the "move on to the next one" bit is central here: if the firmware just keeps asking about the same station (or a subset of all scheduled stations), it will keep getting a "no" as long as there's another station that is "behind" (in terms of fairness), until that other station catches up. And if the firmware never asks about that other station, it never gets to transmit, so it never catches up. With the old code, by contrast, the round-robin scheduler could artificially advance the other stations, allowing them to transmit (but effectively disabling the fairness).
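A simplified model of this interaction might look like the following. It assumes the check grants a station permission only when its virtual time is not ahead of any other active station's; the real `ieee80211_txq_may_transmit()` logic may differ in detail, and `sim_may_transmit()` / `sim_fw_pull()` are hypothetical stand-ins for the mac80211 check and the firmware's polling loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified virtual-time scheduler; not mac80211 code. */
struct sim_sta {
    uint64_t v_t; /* virtual time: airtime used (weight 1 assumed) */
    bool active;  /* has data queued at the mac80211 level */
};

/* Smallest virtual time among active stations (UINT64_MAX if none). */
static uint64_t min_active_vt(const struct sim_sta *stas, size_t n)
{
    uint64_t min = UINT64_MAX;
    for (size_t i = 0; i < n; i++)
        if (stas[i].active && stas[i].v_t < min)
            min = stas[i].v_t;
    return min;
}

/* Stand-in for the may_transmit check: say no if any active station is
 * "behind" station i in virtual time. */
static bool sim_may_transmit(const struct sim_sta *stas, size_t n, size_t i)
{
    assert(i < n);
    return stas[i].active && stas[i].v_t <= min_active_vt(stas, n);
}

/* Firmware-style pull: ask about each candidate in order and let the first
 * permitted one send, charging it `airtime`. Returns the index that sent,
 * or -1 if every candidate was refused. */
static int sim_fw_pull(struct sim_sta *stas, size_t n,
                       const size_t *cands, size_t ncands, uint64_t airtime)
{
    for (size_t k = 0; k < ncands; k++) {
        size_t i = cands[k];
        if (sim_may_transmit(stas, n, i)) {
            stas[i].v_t += airtime;
            return (int)i;
        }
        /* answer was "no": move on to the next candidate */
    }
    return -1;
}
```

With two stations at virtual times 100 and 40, a loop that only ever asks about station 0 gets -1 on every call, because nothing in the check advances station 1. A loop that also asks about station 1 lets it send until it has caught up, after which station 0 is permitted again.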
The latency spikes you describe sound like they could be caused by this throttling mechanism. Now, that throttling is obviously not supposed to happen; we're only supposed to enforce fairness between active stations, so if a station does not have any data outstanding (at the mac80211 level), it should be removed from the schedule, and the other stations should be allowed to continue.
So the question is what goes wrong here; my guess is that it's one of the following:
- The firmware has its own notion of which subset of stations it wants to schedule, so it gets deadlocked with the fairness mechanism as explained above (I don't have a lot of insight into how the push/pull mechanism of ath10k is actually supposed to work).
- There's a bug somewhere, so stations with no outstanding packets don't get de-scheduled properly (and so remain part of the rotation, blocking progress).
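The second failure mode can be sketched with the same kind of toy model. It assumes a station should leave the rotation as soon as its mac80211 queue drains; the `descheduling_bug` flag is a hypothetical switch that models the suspected bug where it stays scheduled. None of these names come from mac80211.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy model; not mac80211 code. */
struct sim_sta {
    uint64_t v_t;      /* virtual time */
    unsigned int qlen; /* frames queued at the mac80211 level */
    bool scheduled;    /* currently part of the fairness rotation */
};

/* Say no if any *scheduled* station is behind station i. */
static bool sim_may_transmit(const struct sim_sta *stas, size_t n, size_t i)
{
    assert(i < n);
    if (!stas[i].scheduled)
        return false;
    for (size_t j = 0; j < n; j++)
        if (stas[j].scheduled && stas[j].v_t < stas[i].v_t)
            return false;
    return true;
}

/* Dequeue one frame from station i and charge its airtime. When the queue
 * drains, the station should leave the rotation; with descheduling_bug set
 * it stays scheduled even though it has nothing left to send. */
static void sim_dequeue(struct sim_sta *stas, size_t n, size_t i,
                        uint64_t airtime, bool descheduling_bug)
{
    assert(i < n && stas[i].qlen > 0);
    stas[i].qlen--;
    stas[i].v_t += airtime;
    if (stas[i].qlen == 0 && !descheduling_bug)
        stas[i].scheduled = false;
}
```

If station 0 (virtual time 10, one frame) drains while station 1 sits at 50 with frames queued, correct de-scheduling lets station 1 continue. With the bug, station 0 stays in the rotation at a lower virtual time with an empty queue, so every request for station 1 is refused, which would look like exactly the kind of stall described above.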