AQL and the ath10k is *lovely*

It's really nice to see good things about OpenWrt after all the bashing it has got over that damn opkg bug!

Yay Dave! (and hardworking crew)

Been waiting for this for a long time. I hopefully will be one of your early C7V2 testers, once I screw up enough courage to try a bleeding edge snapshot on the single family AP, and have a window of time to make the switch...

So with yesterday's trunk build (in my case on a UniFi AC Pro and an Archer C7 v2), is this patch included?

E.g., is the following enough to prove it's enabled?

find /sys/kernel/debug/iee*/phy* -name aqm

and

cat /sys/kernel/debug/ieee80211/phy0/aqm
access name value
R fq_flows_cnt 4096
R fq_backlog 0
R fq_overlimit 0
R fq_overmemory 0
R fq_collisions 1
R fq_memory_usage 0
RW fq_memory_limit 16777216
RW fq_limit 8192
RW fq_quantum 300

Does this implement a kind of Airtime Fairness (or memory buffering?) on both 2.4GHz and 5GHz?
And does the change itself come from the patch being incorporated in e.g. kmod-ath10k, kmod-ath10k-ct and kmod-ath10k-smallbuffers, or somewhere else (e.g. firmware-ath10k-firmware-any)?

Sorry if those are silly questions :wink:
Thank you.

Not AQM, it would be:

cat /sys/kernel/debug/ieee80211/phy0/netdev:wlan0/stations/*/aql

And only for ath10k for now.

Nitpick - '4.19 master' isn't a thing. Either 'master' or '4.19 branch', where the branch has split off FROM master and hence is (in theory) more stable/less active from the continuing development work on master.

There are fairly continuous builds based on master (wherever it happens to be), called snapshots that contain the latest (b)leading edge work BUT they're not even guaranteed to boot or be available for long.

root@Archer_D7:~# cat /sys/kernel/debug/ieee80211/phy0/netdev:wlan0/stations/*/aql
Q depth: VO: 0 us VI: 0 us BE: 0 us BK: 0 us
Q limit[low/high]: VO: 5000/12000 VI: 5000/12000 BE: 5000/12000 BK: 5000/12000

ath10k-ct also

I'm running 3 batman-adv meshes on 13 Archer [AC]7 v[245] routers, and I'd love to switch to ath10k-ct and get the benefits dtaht describes. I have yet to be able to do that, although I'm running all 13 units on a recent OpenWRT snapshot I've had to build with the "classic" ath10k driver. There must be something I don't know. Anyone care to share a working config, such as an appropriately-redacted tarball of /overlay ?

The benefit of AQL should be quite a lot; however, the flow_dissector.c code for batman did not arrive until Linux 4.16 at the earliest, so the ATF code will help too, but the full "fq" portion of fq_codel will not, just codel (unless it was backported elsewhere, or you are running 4.16 or later).

I'd love to see batman results!

The relevant commit was:

commit 5b0890a97204627d75a333fc30f29f737e2bfad6
Author: Sven Eckelmann <sven.eckelmann@openmesh.com>
Date: Thu Dec 21 10:17:42 2017 +0100

flow_dissector: Parse batman-adv unicast headers
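
A rough way to see which side of that 4.16 line a node sits on, sketched in plain sh. It assumes `uname -r` prints a dotted version such as 4.19.108, and the helper name `kver_ge` is made up for this example. It cannot detect a backport, so an "older" answer only means the dissector change is not there from upstream.

```shell
#!/bin/sh
# kver_ge VERSION MAJOR MINOR
# Succeeds if VERSION (e.g. "4.19.108") is at least MAJOR.MINOR.
# Hypothetical helper, not part of OpenWrt.
kver_ge() {
    major=${1%%.*}              # text before the first dot
    rest=${1#*.}                # text after the first dot
    minor=${rest%%.*}           # second component
    minor=${minor%%[!0-9]*}     # drop suffixes such as "-rc1"
    [ "$major" -gt "$2" ] || { [ "$major" -eq "$2" ] && [ "$minor" -ge "$3" ]; }
}

if kver_ge "$(uname -r)" 4 16; then
    echo "kernel is 4.16 or later: upstream has the batman-adv flow dissector"
else
    echo "kernel is older than 4.16: only a backport would provide it"
fi
```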

I thought this was backported to the "classic" ath10k driver already?

+1 - compiled the latest snapshot, I'm on a TP-Link Archer C7 v2 / 4.19.108 and:

cat: can't open '/sys/kernel/debug/ieee80211/phy0/netdev:wlan0/stations/*/aql': No such file or directory

When checking the build log, I see the patch is applied, so where is the problem?

Applying ./patches/205-ath10k-Add-NL80211_EXT_FEATURE_AQL-flag.patch using plaintext:
patching file ath10k-5.4/mac.c

There must be a client connected to the 5GHz/ath10k controlled phy. The command also assumes that phy0/wlan0 is the 5GHz/ath10k interface.

I'm running this on an Archer A7v5 (equivalent hardware to the Archer C7v5, with just a different flash layout).
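
If you are not sure which phy or interface is the ath10k one, a loop over every phy avoids guessing. This is only a sketch: `list_aql` is a name invented here, and it assumes debugfs is mounted at the usual /sys/kernel/debug. It also says so explicitly when nothing matches, instead of the bare "No such file or directory" from cat.

```shell
#!/bin/sh
# list_aql [ROOT]
# Print every per-station aql file under ROOT (default: the mac80211
# debugfs directory). Hypothetical helper for illustration.
list_aql() {
    root="${1:-/sys/kernel/debug/ieee80211}"
    found=0
    for f in "$root"/phy*/netdev:*/stations/*/aql; do
        [ -f "$f" ] || continue     # unexpanded glob means no match
        found=1
        echo "== $f"
        cat "$f"
    done
    [ "$found" -eq 1 ] || echo "no aql files: no station associated, or the driver has no AQL support"
}

list_aql "$@"
```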

Thank you, you were right, first client connected and it's ok now.

cat /sys/kernel/debug/ieee80211/phy0/netdev:wlan0/stations/*/aql

Q depth: VO: 0 us VI: 0 us BE: 0 us BK: 0 us
Q limit[low/high]: VO: 5000/12000 VI: 5000/12000 BE: 5000/12000 BK: 5000/12000
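
To follow one access category while a test is running, the "Q depth" line can be picked apart with awk. A small sketch, assuming the output format shown above stays as it is; `aql_depth` is a name invented for this example.

```shell
#!/bin/sh
# aql_depth AC
# Read aql output on stdin and print the queue depth (in us) for one
# access category: VO, VI, BE or BK. Hypothetical helper.
aql_depth() {
    awk -v ac="$1:" '/^Q depth:/ {
        for (i = 1; i < NF; i++)
            if ($i == ac) print $(i + 1)
    }'
}
```

For example, `cat /sys/kernel/debug/ieee80211/phy0/netdev:wlan0/stations/*/aql | aql_depth BE` prints one number per connected station.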

I'd love to know how to provide them to you; please feel free to suggest
relevant readings.

What I can tell you for sure is that throughput on our meshes varies
wildly, constantly and fairly nonsensically, I assume because of
poorly-managed bufferbloat. Moreover, unless nothing else is going on
on the mesh, Google Voice telephony is pretty frustrating for users, I
assume because its protocol simply drops hordes of packets that have
been delayed too long.

(I think the "classic" ath10k driver causes other annoying events
probably unrelated to bufferbloat. The Candela Technologies version
(ath10k-ct) is described as having fixed certain problems that plausibly
explain at least some of these annoying events, so I'm eager to try
ath10k-ct whenever I learn how to make it work.)

Where have you found information on the 20 release time line? I'm not even sure which kernel it'll use. It looks like 4.19 instead of 5.4 to produce a build quicker?

I'm very excited to jump in and help test aql too.

These details are all on the openwrt-adm mailing list.
http://lists.infradead.org/pipermail/openwrt-adm/2020-February/001316.html and here is the link where it all started: http://lists.infradead.org/pipermail/openwrt-adm/2020-January/001253.html

I had basically given up on the ath10k before now. I look forward to hearing your results with batman. The test I use is flent.org's rrul_be test primarily, but you can get more information about the characteristics of your existing and under test links with a test like:

flent -H some_server_on_the_other_side -t sometitle --te=upload_streams=4 --socket-stats tcp_nup

While flent is available for most linuxes nowadays, the most current version can be found on github. The client depends on python3, the server only needs netperf (but irtt is helpful also).
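
Since the client needs both flent and a netperf binary in PATH, a tiny wrapper can check for them before starting a run. Only a sketch: `need` and `run_rrul_be` are names made up here, and the host and title are whatever you pass in.

```shell
#!/bin/sh
# need CMD...
# Fail with a message if any listed command is not in PATH.
need() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || {
            echo "missing: $cmd" >&2
            return 1
        }
    done
}

# run_rrul_be HOST TITLE
# Run the rrul_be test against a host that has netserver running.
run_rrul_be() {
    need flent netperf || return 1
    flent rrul_be -H "$1" -t "$2"
}
```

Usage would be something like `run_rrul_be some_server_on_the_other_side sometitle`.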

Looking into Flent. Fascinating and very promising. I keep seeing
dtaht's name on relevant papers, along with that of Toke
Høiland-Jørgensen. I must say that the graphics in flent dot org slash
flent-the-flexible-network-tester dot pdf are exceedingly compelling
with respect to fq-codel. Wow.

I think I've made Flent begin to work now. First come some notes
about installation, in case they save anybody any time, then my question.

I'm running Debian Buster (current "stable") on my personal workstation
(laptop). It has flent:

# apt install flent

but Debian Buster evidently doesn't offer netperf, even in "non-free".
Go figure. At Debian's no-such-command advice, I installed netperfmeter
but it didn't help:

# flent rrul -H rpc160
Started Flent 1.2.2 using Python 3.7.3.
Starting rrul test. Expected run time: 70 seconds.
ERROR: Runner TCP upload BE failed check: No netperf binary found in PATH.

I should have followed dtaht's advice to begin with. Now I did so,
having read the wry non-Gnu-license statement attached to the relevant
github repository:

# apt install iperf texinfo irtt

# git clone https://github.com/HewlettPackard/netperf.git
# cd netperf
# ./autogen.sh
# ./configure --enable-demo
# make
# make install

(The make failed without the texinfo package)

On the routers that are serving as batman-adv nodes, I installed:

# opkg install netperf flent-tools

...which seemed to provide and daemonize netserver as well.

Then I got this:

# flent rrul -H rpc160
Started Flent 1.2.2 using Python 3.7.3.
Starting rrul test. Expected run time: 70 seconds.
WARNING: Program exited non-zero.
Runner class: NetperfDemoRunner
Command: /usr/local/bin/netperf -P 0 -v 0 -D -0.20 -4 -Y EF,EF -H rpc160 
-p 12865 -t UDP_RR -l 70 -F /dev/urandom    -- -e 2  -H rpc160 -k 
THROUGHPUT,LOCAL_CONG_CONTROL,REMOTE_CONG_CONT\
ROL,TRANSPORT_MSS,LOCAL_TRANSPORT_RETRANS,REMOTE_TRANSPORT_RETRANS,LOCAL_SOCKET_TOS,REMOTE_SOCKET_TOS,DIRECTION,ELAPSED_TIME,PROTOCOL,LOCAL_SEND_SIZE,LOCAL_RECV_SIZE,REMOTE_SEND_SIZE,RE\
MOTE_RECV_SIZE
Return code: 255
Stdout: establish control: are you sure there is a netserver listening 
on rpc160 at port 12865?
establish_control could not establish the control connection from 
0.0.0.0 port 0 address family AF_INET to rpc160 port 12865 address 
family AF_INET

...which was a bit baffling, at least for a moment there, because

# ssh rpc160
@rpc160:/root# ps w | grep netserver
  6307 root      1104 S    /usr/bin/netserver
  6428 root      1216 S    grep netserver
@rpc160:/root# netstat -nap | grep 12865
tcp        0      0 :::12865                :::* LISTEN      6307/netserver

...but it turned out to be a firewall-hole-punching issue. To
/etc/config/firewall I added:

## netserver
config rule
         option name             netserver-flent
         option src              wan
         option proto            udp
         option dest_port        12865
         option target           ACCEPT
         option family           ipv4

...restarted the firewall, and it seemed to work:

# flent rrul -H rpc160
Started Flent 1.2.2 using Python 3.7.3.
Starting rrul test. Expected run time: 70 seconds.
Data file written to ./rrul-2020-04-06T142915.863247.flent.gz.
Summary of rrul test run at 2020-04-06 18:29:15.863247:

                              avg       median          # data pts
  Ping (ms) ICMP   :      6105.98      4307.50 ms              114
  Ping (ms) UDP BE :       134.91         1.72 ms               71
  Ping (ms) UDP BK :        60.59         2.07 ms               51
  Ping (ms) UDP EF :        86.49       146.41 ms              169
  Ping (ms) avg    :      1596.99       379.45 ms              312
  TCP download BE  :         0.11         0.11 Mbits/s          76
  TCP download BK  :         0.03         0.05 Mbits/s          34
  TCP download CS5 :        31.53        31.49 Mbits/s         291
  TCP download EF  :        30.95        31.00 Mbits/s         289
  TCP download avg :        15.65        21.13 Mbits/s         300
  TCP download sum :        62.62        62.72 Mbits/s         300
  TCP totals       :        63.72        62.96 Mbits/s         302
  TCP upload BE    :         0.02         0.03 Mbits/s           2
  TCP upload BK    :         0.01         0.01 Mbits/s           2
  TCP upload CS5   :         0.53         0.35 Mbits/s          25
  TCP upload EF    :         0.54         0.35 Mbits/s          25
  TCP upload avg   :         0.28         0.34 Mbits/s         196
  TCP upload sum   :         1.10         0.34 Mbits/s         196

This was a testbed mesh consisting of 2 nodes both running in the same
room, running

# uname -a
Linux rpc160 4.19.108 #0 Sat Apr 4 10:28:18 2020 mips GNU/Linux

which was compiled from OpenWRT trunk two days ago.

But I have two batman-adv meshes in production (see rosepark dot us hash
map). I believe they have serious bufferbloat problems. Dave, you want
batman-adv data. What would you like me to do now?

Steve Newcomb

hoo, boy is that miserable!

the rrul_be test should be mildly better because it only fills the best effort hw queue.

yer gonna love aql. even then I'd suggest totally disabling wifi 802.11e either with the qos map or remarking most packets to diffserv 0 or both.

I like to look at the output flent.gz files. The flent-gui lets you do comparison plots in particular.
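
If you just want a quick look without the GUI: the .flent.gz data files are gzip-compressed JSON, so the start of one can be read from a shell. A minimal sketch; `flent_peek` is a name invented here and the 400-byte default is arbitrary.

```shell
#!/bin/sh
# flent_peek FILE [BYTES]
# Print the first BYTES (default 400) of the decompressed data file.
# Hypothetical helper for illustration.
flent_peek() {
    gzip -dc "$1" | head -c "${2:-400}"
    echo
}
```

For example, `flent_peek rrul-2020-04-06T142915.863247.flent.gz` shows the beginning of the JSON for the run posted above.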

Have just tried the latest snapshot on two Archer C7 v2 access points. The change is indeed lovely! Latency during a dslreports speedtest has come down from 100-200ms to around 25ms on the ath10k radio.

http://www.dslreports.com/speedtest/61820776

Thank you to all who made this happen.
