GL.iNet GL-MT6000 - AQL and WiFi Latency

I am not having the best of results after a while. On my MT6000, after some hours I get persistent wireless slowdowns.

Dumping the stats shows this:

cat /sys/kernel/debug/ieee80211/phy*/aql*
:
1
AC     AQL pending
VO     0 us
VI     0 us
BE     4960 us     <---- this has become stuck, does not change across multiple dumps
BK     4 us
BC/MC  0 us
total  4964 us
24000
AC      AQL limit low   AQL limit high
VO      5000            5000
VI      5000            5000
BE      5000            5000
BK      5000            5000
BC/MC   50000
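
If your build also exposes the per-station aql file in debugfs (I'm assuming the path below; it can differ between kernel versions), the pending airtime can also be broken down per station to see which client is holding it:

# show per-station AQL limits and pending airtime
grep . /sys/kernel/debug/ieee80211/phy*/netdev:*/stations/*/aql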

I have an 802.11s BATMAN + AP setup on SNAPSHOT, r26740.

I believe the tight AQL limits in my environment are exposing something.

I will revert to AQL defaults to monitor behaviour.
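
For reference, putting them back should just be a matter of echoing the stock mac80211 values into the same debugfs files (5000/12000 per AC and 24000 for the threshold, if I have the defaults right):

for phy in /sys/kernel/debug/ieee80211/phy*; do
  for ac in 0 1 2 3; do                 # AC 0-3 = VO/VI/BE/BK
    echo "$ac 5000 12000" > "$phy/aql_txq_limit"
  done
  echo 24000 > "$phy/aql_threshold"
done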

3 Likes

Only the 2.4 GHz radio is affected, right?

A few questions for you...

Are you running @pesa1234's build?

What firmware are you running? (dmesg | grep Firmware)

1 Like

Clean SNAPSHOT and its old MTK firmware, not pesa's build in my case.

[   12.112457] mt798x-wmac 18000000.wifi: WM Firmware Version: ____000000, Build Time: 20221012174725
[   12.196474] mt798x-wmac 18000000.wifi: WA Firmware Version: DEV_000000, Build Time: 20221012174937

Had to keep the MT6000 with aql_low:2500 & aql_high:12000 for now.

How come the TUF AX6000 is performing significantly better? It's no secret that I'm a TUF AX6000 fan-boy, but now I'm actually curious. Compare the equivalent graph for the MT6000 and it's a BIG difference in performance, or am I missing something?

I’m trying to follow here, but I’m a bit lost at the moment. Are you referring to this unit? https://openwrt.org/toh/asus/tuf-ax6000

Secondly, which graphs specifically are you comparing where you're seeing the significant performance difference?

1 Like

@Gingernut's graph a few posts above

TUF Latency

I think this needs to be revisited. I'm not at all trying to put down the TUF AX6000 or TUF AX4000, but the graph @Gingernut posted doesn't seem to me to indicate a significant performance increase over the MT6000.

If you look at some of the more recent graphs I posted in another MT6000 thread for @pesa1234's build, we have been working hard to tune some AQL settings and achieved some pretty respectable results:

The graph that @Gingernut posted here seems to indicate some significant latency struggles especially in the combined test (third test on the graph). Those latency spikes have largely been smoothed out with tuning in @pesa1234's build.

4 Likes

Thanks for the explanation!

For comparison - my Crusader results using a TP-Link EAP615 (ramips/mt7621) with an OpenWrt snapshot from Jun 23 - 5 GHz @ 80 MHz

AQL set to 5000/12000

AQL set to 1500/5000

Looks like using 1500/5000 results in a latency improvement of >25%.

edit: these results are from a test with 1 associated station - later today I will retest with >3 associated stations

edit2: Retest with 5 associated stations

CH48 80MHz AQL 1500/5000 11:42

CH100 80MHz AQL 1500/5000 11:46

CH100 80MHz AQL 5000/12000 11:49

CH48 80MHz AQL 5000/12000 11:51

3 Likes

@dtaht Any thoughts on the testing that @ed8 did here?

Thought #1 - I should read the OpenWrt forums more often.
Thought #2 - what a difference a clean channel (100?) makes, and also how hardware retries muck with things
Thought #3 - I wish Crusader's graphs were as directly comparable as flent's. In particular I care that the AQM gets tested, and this test does not do that (well, the slope of the Crusader staggered-start test sort of does). A packet capture can help too, with the RTT plot from Wireshark (tcptrace -G, and xplot.org as well)

But to me the lowered AQL values seem to indicate goodness across the board.

Regrettably I don't have the mental energy to understand everything else that is under test. If you care (as I do) that a gaming wifi station do better with multiple stations present, flent's rtt_fair test is good, and anything consistently less than 20 ms is good.
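
For reference, a rough rtt_fair run looks like this (the server hostnames are placeholders - point it at two or more netperf servers you control):

# 60-second rtt_fair run against two netperf servers, then browse the plots
flent rtt_fair -l 60 -H server-a.example -H server-b.example -t "mt6000-aql-test"
flent-gui rtt_fair-*.flent.gz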

1 Like

Using 2500/5000 again now, but on an updated SNAPSHOT, r26912, for the MT6000. It looks better and stable now, and nothing seems to be getting stuck after 16 hours.

1 Like

That's awesome news! Out of curiosity, have you tried running @pesa1234's build?

3 Likes

Yeah, I've been keeping the low value the same and experimenting with larger high settings. Currently running AQL at 2500/8000 for a little more peak throughput, with nice results. Running the latest snapshot on 6.6.38 with various tweaks.

1 Like

I have settled on AQL high of 8000 too :slight_smile:

1 Like

You know @phinn, I have enjoyed this thread and it turns out one can go beyond just the AQL hi/lo limits.

I see cases in my environment where the wireless AQM hits its memory limit and stalls for short bursts, which hurts wireless bufferbloat. The live stats can be dumped further with:

grep . /sys/kernel/debug/ieee80211/phy*/aqm

You will see fq_overmemory and fq_overlimit counters.
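
A quick, crude way to see whether those counters are still climbing (field names as the aqm file prints them on my build; adjust the interval to taste):

# re-dump the backlog and drop counters every 5 seconds
watch -n 5 "grep -E 'fq_(backlog|overmemory|overlimit|memory_usage)' /sys/kernel/debug/ieee80211/phy*/aqm"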

Given what a beast this MT6000 is, I have now settled on a slightly more complex override, such as this:

# Tune AQL and FQ-CoDel on every wireless phy
for phy in $(find /sys/kernel/debug/ieee80211 -maxdepth 1 -name 'phy[0-9]'); do
  l=1500   # AQL limit low, in microseconds of airtime
  h=8000   # AQL limit high, in microseconds of airtime

  # Per-AC limits, format "<AC> <low> <high>" (AC 0-3 = VO/VI/BE/BK)
  echo 0 $l $h > "$phy/aql_txq_limit"
  echo 1 $l $h > "$phy/aql_txq_limit"
  echo 2 $l $h > "$phy/aql_txq_limit"
  echo 3 $l $h > "$phy/aql_txq_limit"
  echo 12000   > "$phy/aql_threshold"

  # Increase the FQ-CoDel buffer
  echo "fq_limit 10240"           > "$phy/aqm"
  echo "fq_quantum 256"           > "$phy/aqm"
  echo "fq_memory_limit 33554432" > "$phy/aqm"
done

And so far it performs better for my use case. Sharing in case anyone is curious about tuning the FQ-CoDel params for their environment too.
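
To keep it across reboots I just call it late in boot. One crude option (assuming you saved the loop as, say, /root/aql-tune.sh, and that the radios are up within the delay) is /etc/rc.local:

# in /etc/rc.local, before the final 'exit 0'
# (the 30 s delay is a guess; adjust for your boot time)
(sleep 30; sh /root/aql-tune.sh) &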

10 Likes

Are those FQ-CoDel params only applicable when running SQM? Pardon my ignorance.

1 Like

They are for the wireless AQM specifically; check this commit for more info about what it does.

:
This patch implements an Airtime-based Queue Limit (AQL) to make CoDel work
effectively with wireless drivers that utilized firmware/hardware
offloading. AQL allows each txq to release just enough packets to the lower
layer to form 1-2 large aggregations to keep hardware fully utilized and
retains the rest of the frames in mac80211 layer to be controlled by the
CoDel algorithm.
:

This is independent of any SQM you run on the WAN, and it helps enormously with wireless bufferbloat.

5 Likes

Nice, link added to the MT6000 doc page. It's interesting to see the original TXQ limits and threshold are essentially unchanged since the initial commit. Of course they are sensible values for throughput, so not knocking it, but we can clearly do a little better on latency.

Btw, your script is quite good, well done. :+1:

1 Like