AQL and the ath10k is *lovely*

Ok. sure. I know how building OpenWrt/Quilt works. But two questions - how does this patch work when:

  1. You're putting it in /target/linux/generic/patches-5.4, but the mac80211 subsystem that OpenWrt uses is in /package/kernel/mac80211/ and it has its own existing patch folder under subsys or ath or w/e. The WiFi stack that you applied the patch to doesn't even get used - unless something has changed between master and the 19.07 branch that I'm on re: the switch to kernel 5.4. Was backports deprecated, so it's not being used anymore?
  2. The patches for ath10k-ct driver should exist in package/kernel/ath10k-ct/patches. Yet, you're applying the patch to ath10k classic.

Somehow, though, you're still getting better results with the patch, even though with OpenWrt's default config the WiFi stack you're applying the patches to doesn't even get used.

Which make/model would be best? I'll seriously send it :wink:

+1 for Netgear R7800 if you can snag one :wink:

I did not mean to come across as insulting by including the build/Quilt "basics". I was hoping that might help others who might want to try the patch.

But back to your questions/points, you are correct in that it doesn't seem to add up regarding the location of the patch and the resulting build. Let me try to move the patch to package/kernel/ath10k-ct/patches and check the result.

Obviously, I will update my original post accordingly if needed.

EDIT
So I recreated the patch in package/kernel/ath10k-ct/patches and recompiled. It did build and I flashed the image to my R7800. However, I am now seeing this kernel warning repeated constantly in the logs:
ath10k_pci 0001:01:00.0: failed to increase tx pending count: -16, dropping

Clients are unable to connect, so I had to revert back to my previous image. At this point I am not sure what to make of this whole thing. Based on my testing, it is apparent something changed right at the time I originally tested the dql patch. But based on your experience, which I'm sure is far beyond mine, the dql patch would not have been in play.

Again, not sure what to make of it all at the moment. The data points to a change, but apparently it was not dql--and no idea what else at this point.

EDIT 2
I removed the dql patch again and checked out htt_tx.c here /openwrt/build_dir/target-arm_cortex-a15+neon-vfpv4_musl_eabi/linux-ipq806x_generic/ath10k-ct-smallbuffers/ath10k-ct-2020-04-29-3637be6f/ath10k-5.4. I do see this:

void ath10k_htt_tx_dec_pending(struct ath10k_htt *htt)
{
        lockdep_assert_held(&htt->tx_lock);

        htt->num_pending_tx--;
        if ((htt->num_pending_tx <= (htt->max_num_pending_tx / 4)) && htt->needs_unlock) {
                htt->needs_unlock = false;
                ath10k_mac_tx_unlock(htt->ar, ATH10K_TX_PAUSE_Q_FULL);
        }
}

That is the change Ben introduced here: https://marc.info/?l=linux-wireless&m=158808593808807&w=2

I'm thinking now the results I shared from testing were instead due to Ben's change for the "Restart xmit queues below low-water mark." It fits with the timing. If that is the answer, I certainly apologize for attributing the data to the wrong change. Certainly unintentional on my part. :-1:

I may pick up 2! I thought the Nighthawk series was stuck with Broadcom :face_vomiting:

Thankfully the R7800 has an awesome QCA9984 instead of BCM! There is also a Zyxel 'equivalent' to the R7800 that I haven't tried, but would love to if I can find one for cheap at some point. Here's a thread about it: Netgear R7800 vs Zyxel NBG6817, is it worth the extra money?

R7800s are popular on eBay right now, but can be had for around $125 these days.

Due to @ParanoidZoid's great observations, I am now convinced the latest results I sent you are due to Ben's low-water mark change. I jump into more details in Edit 1 and 2 here: AQL and the ath10k is *lovely*

That said, I do acknowledge your desire to see the watermarks replaced with a bql implementation. However, I am curious as to the results I have seen in testing. Ben's htt->max_num_pending_tx / 4 with the ath10k-ct's 2048 tx buffers means this low-watermark would be 512. With me running the ath10k-ct-smallbuffers at 512 tx buffers from the start, this low-watermark for me would be 128.

It seems to me this would, at least in theory, result in even better/more consistent latency (at the potential cost of throughput) on the ath10k-ct-smallbuffers driver than on the ath10k-ct driver. Or am I crazy? (I can handle honesty :wink: )
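
To put numbers on that, here is a minimal sketch of the arithmetic. It only mirrors the `max_num_pending_tx / 4` check quoted from htt_tx.c above; the function name is made up for illustration, and the buffer counts are the 2048 and 512 mentioned in this post.

```python
def low_water_mark(max_num_pending_tx: int) -> int:
    """Queue depth at or below which tx gets unlocked again.

    Mirrors the integer division in the quoted check:
        num_pending_tx <= max_num_pending_tx / 4
    """
    return max_num_pending_tx // 4


# Buffer counts as stated in this post, not read from a running driver.
for name, buffers in (("ath10k-ct", 2048), ("ath10k-ct-smallbuffers", 512)):
    print(f"{name}: {buffers} tx buffers -> low-water mark {low_water_mark(buffers)}")
```

Note that this counts frames, not bytes or airtime, so the same mark can represent very different amounts of queued data.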

can you see if the r7800 has bql on the ethernet driver?

cat /sys/class/net/eth0/queues/tx-0/byte_queue_limits/limit

It's not directly relevant to this discussion. Just curious. Same goes for what ethernet driver it is, but I don't know how to find that out from sysfs.

you are utterly correct that 25% of a smaller tx ring results in less latency and jitter than 25% of a larger one, at a possible cost in throughput. and you are not crazy. :slight_smile:

However, packets range in size from 64 bytes to 64k bytes (with gso), and bytes = time, on ethernet.

on wifi, it's airtime, but on 802.11ac (not n), bytes = time is a decent proxy, better than the tx ring watermarks. Figuring out airtime is what AQL is sorta supposed to do, but in this first version, aql_threshold is a fixed value and shouldn't be.
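
As a rough illustration of "bytes = time" on ethernet, the sketch below converts a queued byte count into the time needed to serialize it at a given line rate. It is a back-of-the-envelope calculation only: it ignores preamble and inter-frame gap, the function name is invented here, and the 1514-byte full-size frame is an assumed example rather than a figure from this thread.

```python
def drain_time_us(queued_bytes: int, link_mbit: float) -> float:
    """Microseconds needed to serialize queued_bytes at link_mbit Mbit/s.

    1 Mbit/s is one bit per microsecond, so bits / (Mbit/s) gives microseconds.
    Framing overhead (preamble, inter-frame gap) is deliberately ignored.
    """
    return queued_bytes * 8 / link_mbit


# 64 bytes and 64k bytes are the extremes named above; 1514 is an assumed
# full-size Ethernet frame added for comparison.
for size in (64, 1514, 64 * 1024):
    print(
        f"{size:>6} bytes: {drain_time_us(size, 1000):8.2f} us at 1 Gbit, "
        f"{drain_time_us(size, 100):8.2f} us at 100 Mbit"
    )
```

The same number of ring slots can therefore hold about a thousand times more "time" when the slots are filled with 64k GSO packets instead of 64-byte ones, which is why a byte-based limit tracks latency better than a packet-count watermark.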

Aye--it does appear to...

root@OpenWrt:~# cat /sys/class/net/eth0/queues/tx-0/byte_queue_limits/limit
6072

As for the driver, not sure if this is the right location or not:

root@OpenWrt:~# cat /sys/class/net/eth0/device/driver/37200000.ethernet/modalias
of:NethernetTnetworkCqcom,ipq806x-gmac

wow. is that running at a gbit? that's a reasonable 100mbit value....

It is a gigabit link, yes. I checked the limit again a couple minutes later and it was down to 3100-3700 range. What were you expecting/hoping to see?

you got flow offloads on that thing? under load, with GRO, at 1GigE, it is typically much larger, 128k or more. Hit it with a load and measure. :slight_smile:

It slowly gets smaller when there is no load.

I kicked off a 90 second bi-directional iperf3 between my MacBook and a server that is also gigabit connected (through a Netgear switch to my R7800). The Netgear switch has tx/rx flow control enabled.

During that iperf3 run, I never saw the limit value on the R7800 go above 6123:

root@OpenWrt:~# cat /sys/class/net/eth0/queues/tx-0/byte_queue_limits/limit
6123
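
Since the limit moves around under load, a single `cat` can miss the peak. A small sampler along these lines could record it over the length of a test run. This is only a sketch: the helper names are made up, it assumes Python is available on the box doing the sampling (a stock OpenWrt image may not have it), and the path is simply the one used in the `cat` command above.

```python
import time
from pathlib import Path
from typing import List


def read_bql_limit(path: str) -> int:
    """Return the current BQL limit in bytes from a byte_queue_limits/limit file."""
    return int(Path(path).read_text().strip())


def sample_bql_limit(path: str, samples: int, interval_s: float = 1.0) -> List[int]:
    """Read the limit `samples` times, sleeping `interval_s` between reads."""
    values = []
    for i in range(samples):
        values.append(read_bql_limit(path))
        if i < samples - 1:
            time.sleep(interval_s)
    return values


def summarize(values: List[int]) -> str:
    """One-line min/max/last summary of the sampled limits."""
    return f"min={min(values)} max={max(values)} last={values[-1]}"
```

For a 90 second iperf3 run, something like `summarize(sample_bql_limit("/sys/class/net/eth0/queues/tx-0/byte_queue_limits/limit", samples=90))` started alongside the test would cover the whole run; swap in whichever interface and tx queue actually carries the traffic.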

Here's more info for you on eth0:

root@OpenWrt:~# ethtool -k eth0
Features for eth0:
rx-checksumming: on
tx-checksumming: on
	tx-checksum-ipv4: on
	tx-checksum-ip-generic: off [fixed]
	tx-checksum-ipv6: on
	tx-checksum-fcoe-crc: off [fixed]
	tx-checksum-sctp: off [fixed]
scatter-gather: on
	tx-scatter-gather: on
	tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: off
	tx-tcp-segmentation: off [fixed]
	tx-tcp-ecn-segmentation: off [fixed]
	tx-tcp-mangleid-segmentation: off [fixed]
	tx-tcp6-segmentation: off [fixed]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: on [fixed]
tx-vlan-offload: off [fixed]
ntuple-filters: off [fixed]
receive-hashing: off [fixed]
highdma: on [fixed]
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-gre-csum-segmentation: off [fixed]
tx-ipxip4-segmentation: off [fixed]
tx-ipxip6-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
tx-udp_tnl-csum-segmentation: off [fixed]
tx-gso-partial: off [fixed]
tx-sctp-segmentation: off [fixed]
tx-esp-segmentation: off [fixed]
tx-udp-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: on [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
hw-tc-offload: off [fixed]
esp-hw-offload: off [fixed]
esp-tx-csum-hw-offload: off [fixed]
rx-udp_tunnel-port-offload: off [fixed]
tls-hw-tx-offload: off [fixed]
tls-hw-rx-offload: off [fixed]
rx-gro-hw: off [fixed]
tls-hw-record: off [fixed]

this is so totally OT, but that's pretty nice. What did fq_codel do on that device? (tc -s qdisc show dev eth0)

(y'all are convincing me I should go get one of these puppies) bqlmon is an interesting tool btw, don't know if it's in openwrt.

and to be clear the test I was trying to do was through the router, not through the switch.

lan -> wan

not lan to lan. anyway... I got distracted there. I'm going to have some time to delve into the wifi part of this this weekend probably....

I should have been checking eth1 instead of eth0. My primary SSID is tagged to eth1--it didn't dawn on me earlier that I should have been watching it.

Running the iperf3 again via WiFi resulted in eth1 bql limit values of 100k - 250k. Sounds like more believable values compared to what you were seeing.

Just for grins, I also connected my Mac to the R7800 via ethernet (gig) and ran the same iperf3 back to my server.

Iperf3 result:

$ iperf3 -c 192.168.XX.5 -i1 -t90 --bidir
Connecting to host 192.168.XX.5, port 5201
[  5] local 192.168.XX.150 port 53856 connected to 192.168.XX.5 port 5201
[  7] local 192.168.XX.150 port 53857 connected to 192.168.XX.5 port 5201
[ ID][Role] Interval           Transfer     Bitrate
[  5][TX-C]   0.00-1.00   sec  73.6 MBytes   618 Mbits/sec
[  7][RX-C]   0.00-1.00   sec  76.5 MBytes   642 Mbits/sec
[  5][TX-C]   1.00-2.00   sec   112 MBytes   942 Mbits/sec
[  7][RX-C]   1.00-2.00   sec  36.8 MBytes   308 Mbits/sec
[  5][TX-C]   2.00-3.00   sec   112 MBytes   938 Mbits/sec
[  7][RX-C]   2.00-3.00   sec  28.3 MBytes   238 Mbits/sec
[  5][TX-C]   3.00-4.00   sec   112 MBytes   940 Mbits/sec
[  7][RX-C]   3.00-4.00   sec  31.5 MBytes   265 Mbits/sec
[  5][TX-C]   4.00-5.00   sec   112 MBytes   939 Mbits/sec
[  7][RX-C]   4.00-5.00   sec  33.3 MBytes   280 Mbits/sec
[  5][TX-C]   5.00-6.00   sec   112 MBytes   939 Mbits/sec
[  7][RX-C]   5.00-6.00   sec  35.4 MBytes   297 Mbits/sec
[  5][TX-C]   6.00-7.00   sec   112 MBytes   939 Mbits/sec
[  7][RX-C]   6.00-7.00   sec  37.3 MBytes   313 Mbits/sec
[  5][TX-C]   7.00-8.00   sec   112 MBytes   938 Mbits/sec
[  7][RX-C]   7.00-8.00   sec  39.3 MBytes   329 Mbits/sec
[  5][TX-C]   8.00-9.00   sec   112 MBytes   939 Mbits/sec
[  7][RX-C]   8.00-9.00   sec  41.1 MBytes   344 Mbits/sec
[  5][TX-C]   9.00-10.00  sec   112 MBytes   938 Mbits/sec
[  7][RX-C]   9.00-10.00  sec  43.0 MBytes   361 Mbits/sec
[  5][TX-C]  10.00-11.00  sec   112 MBytes   938 Mbits/sec
[  7][RX-C]  10.00-11.00  sec  44.9 MBytes   376 Mbits/sec
[  5][TX-C]  11.00-12.00  sec   107 MBytes   898 Mbits/sec
[  7][RX-C]  11.00-12.00  sec  51.3 MBytes   430 Mbits/sec
[  5][TX-C]  12.00-13.00  sec   110 MBytes   923 Mbits/sec
[  7][RX-C]  12.00-13.00  sec  50.8 MBytes   426 Mbits/sec
[  5][TX-C]  13.00-14.00  sec   111 MBytes   931 Mbits/sec
[  7][RX-C]  13.00-14.00  sec  53.2 MBytes   446 Mbits/sec
[  5][TX-C]  14.00-15.00  sec   110 MBytes   923 Mbits/sec
[  7][RX-C]  14.00-15.00  sec  44.6 MBytes   374 Mbits/sec
[  5][TX-C]  15.00-16.00  sec   111 MBytes   932 Mbits/sec
[  7][RX-C]  15.00-16.00  sec  44.5 MBytes   374 Mbits/sec
[  5][TX-C]  16.00-17.00  sec   110 MBytes   926 Mbits/sec
[  7][RX-C]  16.00-17.00  sec  34.2 MBytes   287 Mbits/sec
[  5][TX-C]  17.00-18.00  sec   112 MBytes   938 Mbits/sec
[  7][RX-C]  17.00-18.00  sec  28.7 MBytes   241 Mbits/sec
[  5][TX-C]  18.00-19.00  sec   112 MBytes   937 Mbits/sec
[  7][RX-C]  18.00-19.00  sec  33.0 MBytes   277 Mbits/sec
[  5][TX-C]  19.00-20.00  sec   111 MBytes   934 Mbits/sec
[  7][RX-C]  19.00-20.00  sec  37.2 MBytes   312 Mbits/sec
[  5][TX-C]  20.00-21.00  sec   111 MBytes   934 Mbits/sec
[  7][RX-C]  20.00-21.00  sec  41.1 MBytes   345 Mbits/sec
[  5][TX-C]  21.00-22.00  sec   110 MBytes   926 Mbits/sec
[  7][RX-C]  21.00-22.00  sec  45.0 MBytes   378 Mbits/sec
[  5][TX-C]  22.00-23.00  sec   111 MBytes   928 Mbits/sec
[  7][RX-C]  22.00-23.00  sec  47.8 MBytes   401 Mbits/sec
[  5][TX-C]  23.00-24.00  sec   111 MBytes   935 Mbits/sec
[  7][RX-C]  23.00-24.00  sec  50.8 MBytes   426 Mbits/sec
[  5][TX-C]  24.00-25.00  sec   111 MBytes   933 Mbits/sec
[  7][RX-C]  24.00-25.00  sec  52.6 MBytes   441 Mbits/sec
[  5][TX-C]  25.00-26.00  sec   111 MBytes   935 Mbits/sec
[  7][RX-C]  25.00-26.00  sec  53.1 MBytes   445 Mbits/sec
[  5][TX-C]  26.00-27.00  sec   112 MBytes   938 Mbits/sec
[  7][RX-C]  26.00-27.00  sec  52.3 MBytes   439 Mbits/sec
[  5][TX-C]  27.00-28.00  sec   111 MBytes   935 Mbits/sec
[  7][RX-C]  27.00-28.00  sec  51.0 MBytes   428 Mbits/sec
[  5][TX-C]  28.00-29.00  sec   111 MBytes   935 Mbits/sec
[  7][RX-C]  28.00-29.00  sec  48.7 MBytes   408 Mbits/sec
[  5][TX-C]  29.00-30.00  sec   112 MBytes   936 Mbits/sec
[  7][RX-C]  29.00-30.00  sec  45.4 MBytes   381 Mbits/sec
[  5][TX-C]  30.00-31.00  sec   112 MBytes   941 Mbits/sec
[  7][RX-C]  30.00-31.00  sec  46.1 MBytes   386 Mbits/sec
[  5][TX-C]  31.00-32.00  sec   110 MBytes   922 Mbits/sec
[  7][RX-C]  31.00-32.00  sec  31.4 MBytes   263 Mbits/sec
[  5][TX-C]  32.00-33.00  sec   112 MBytes   939 Mbits/sec
[  7][RX-C]  32.00-33.00  sec  27.5 MBytes   231 Mbits/sec
[  5][TX-C]  33.00-34.00  sec   112 MBytes   939 Mbits/sec
[  7][RX-C]  33.00-34.00  sec  29.2 MBytes   245 Mbits/sec
[  5][TX-C]  34.00-35.00  sec   112 MBytes   938 Mbits/sec
[  7][RX-C]  34.00-35.00  sec  31.0 MBytes   260 Mbits/sec
[  5][TX-C]  35.00-36.00  sec   112 MBytes   940 Mbits/sec
[  7][RX-C]  35.00-36.00  sec  32.5 MBytes   273 Mbits/sec
[  5][TX-C]  36.00-37.00  sec   109 MBytes   914 Mbits/sec
[  7][RX-C]  36.00-37.00  sec  37.6 MBytes   315 Mbits/sec
[  5][TX-C]  37.00-38.00  sec   112 MBytes   939 Mbits/sec
[  7][RX-C]  37.00-38.00  sec  36.7 MBytes   308 Mbits/sec
[  5][TX-C]  38.00-39.00  sec   112 MBytes   939 Mbits/sec
[  7][RX-C]  38.00-39.00  sec  38.6 MBytes   324 Mbits/sec
[  5][TX-C]  39.00-40.00  sec   112 MBytes   938 Mbits/sec
[  7][RX-C]  39.00-40.00  sec  40.2 MBytes   337 Mbits/sec
[  5][TX-C]  40.00-41.00  sec   108 MBytes   905 Mbits/sec
[  7][RX-C]  40.00-41.00  sec  38.0 MBytes   319 Mbits/sec
[  5][TX-C]  41.00-42.00  sec   112 MBytes   943 Mbits/sec
[  7][RX-C]  41.00-42.00  sec  22.1 MBytes   185 Mbits/sec
[  5][TX-C]  42.00-43.00  sec   111 MBytes   935 Mbits/sec
[  7][RX-C]  42.00-43.00  sec  23.3 MBytes   195 Mbits/sec
[  5][TX-C]  43.00-44.00  sec   112 MBytes   940 Mbits/sec
[  7][RX-C]  43.00-44.00  sec  25.7 MBytes   216 Mbits/sec
[  5][TX-C]  44.00-45.00  sec   112 MBytes   939 Mbits/sec
[  7][RX-C]  44.00-45.00  sec  26.1 MBytes   219 Mbits/sec
[  5][TX-C]  45.00-46.00  sec   112 MBytes   940 Mbits/sec
[  7][RX-C]  45.00-46.00  sec  28.5 MBytes   239 Mbits/sec
[  5][TX-C]  46.00-47.00  sec   112 MBytes   937 Mbits/sec
[  7][RX-C]  46.00-47.00  sec  30.9 MBytes   259 Mbits/sec
[  5][TX-C]  47.00-48.00  sec   110 MBytes   924 Mbits/sec
[  7][RX-C]  47.00-48.00  sec  18.5 MBytes   156 Mbits/sec
[  5][TX-C]  48.00-49.00  sec   112 MBytes   940 Mbits/sec
[  7][RX-C]  48.00-49.00  sec  18.5 MBytes   155 Mbits/sec
[  5][TX-C]  49.00-50.00  sec   112 MBytes   940 Mbits/sec
[  7][RX-C]  49.00-50.00  sec  20.1 MBytes   168 Mbits/sec
[  5][TX-C]  50.00-51.00  sec   112 MBytes   940 Mbits/sec
[  7][RX-C]  50.00-51.00  sec  20.7 MBytes   174 Mbits/sec
[  5][TX-C]  51.00-52.00  sec   112 MBytes   940 Mbits/sec
[  7][RX-C]  51.00-52.00  sec  22.1 MBytes   186 Mbits/sec
[  5][TX-C]  52.00-53.00  sec   112 MBytes   940 Mbits/sec
[  7][RX-C]  52.00-53.00  sec  25.7 MBytes   215 Mbits/sec
[  5][TX-C]  53.00-54.00  sec   112 MBytes   940 Mbits/sec
[  7][RX-C]  53.00-54.00  sec  27.4 MBytes   230 Mbits/sec
[  5][TX-C]  54.00-55.00  sec   112 MBytes   939 Mbits/sec
[  7][RX-C]  54.00-55.00  sec  29.4 MBytes   247 Mbits/sec
[  5][TX-C]  55.00-56.00  sec   112 MBytes   939 Mbits/sec
[  7][RX-C]  55.00-56.00  sec  31.4 MBytes   263 Mbits/sec
[  5][TX-C]  56.00-57.00  sec   112 MBytes   939 Mbits/sec
[  7][RX-C]  56.00-57.00  sec  33.1 MBytes   277 Mbits/sec
[  5][TX-C]  57.00-58.00  sec   112 MBytes   939 Mbits/sec
[  7][RX-C]  57.00-58.00  sec  34.8 MBytes   292 Mbits/sec
[  5][TX-C]  58.00-59.00  sec   112 MBytes   939 Mbits/sec
[  7][RX-C]  58.00-59.00  sec  36.8 MBytes   309 Mbits/sec
[  5][TX-C]  59.00-60.00  sec   112 MBytes   939 Mbits/sec
[  7][RX-C]  59.00-60.00  sec  38.7 MBytes   325 Mbits/sec
[  5][TX-C]  60.00-61.00  sec   112 MBytes   939 Mbits/sec
[  7][RX-C]  60.00-61.00  sec  40.7 MBytes   341 Mbits/sec
[  5][TX-C]  61.00-62.00  sec   109 MBytes   912 Mbits/sec
[  7][RX-C]  61.00-62.00  sec  29.7 MBytes   249 Mbits/sec
[  5][TX-C]  62.00-63.00  sec   112 MBytes   941 Mbits/sec
[  7][RX-C]  62.00-63.00  sec  22.1 MBytes   186 Mbits/sec
[  5][TX-C]  63.00-64.00  sec   112 MBytes   939 Mbits/sec
[  7][RX-C]  63.00-64.00  sec  23.9 MBytes   201 Mbits/sec
[  5][TX-C]  64.00-65.00  sec   107 MBytes   900 Mbits/sec
[  7][RX-C]  64.00-65.00  sec  22.9 MBytes   192 Mbits/sec
[  5][TX-C]  65.00-66.00  sec   111 MBytes   930 Mbits/sec
[  7][RX-C]  65.00-66.00  sec  27.5 MBytes   231 Mbits/sec
[  5][TX-C]  66.00-67.00  sec   112 MBytes   939 Mbits/sec
[  7][RX-C]  66.00-67.00  sec  29.3 MBytes   246 Mbits/sec
[  5][TX-C]  67.00-68.00  sec   112 MBytes   939 Mbits/sec
[  7][RX-C]  67.00-68.00  sec  30.7 MBytes   257 Mbits/sec
[  5][TX-C]  68.00-69.00  sec   112 MBytes   939 Mbits/sec
[  7][RX-C]  68.00-69.00  sec  32.9 MBytes   276 Mbits/sec
[  5][TX-C]  69.00-70.00  sec   111 MBytes   931 Mbits/sec
[  7][RX-C]  69.00-70.00  sec  34.9 MBytes   292 Mbits/sec
[  5][TX-C]  70.00-71.00  sec   112 MBytes   940 Mbits/sec
[  7][RX-C]  70.00-71.00  sec  36.8 MBytes   309 Mbits/sec
[  5][TX-C]  71.00-72.00  sec   112 MBytes   939 Mbits/sec
[  7][RX-C]  71.00-72.00  sec  38.0 MBytes   319 Mbits/sec
[  5][TX-C]  72.00-73.00  sec   112 MBytes   936 Mbits/sec
[  7][RX-C]  72.00-73.00  sec  38.9 MBytes   327 Mbits/sec
[  5][TX-C]  73.00-74.00  sec   111 MBytes   927 Mbits/sec
[  7][RX-C]  73.00-74.00  sec  42.5 MBytes   356 Mbits/sec
[  5][TX-C]  74.00-75.00  sec   110 MBytes   926 Mbits/sec
[  7][RX-C]  74.00-75.00  sec  41.8 MBytes   351 Mbits/sec
[  5][TX-C]  75.00-76.00  sec   109 MBytes   911 Mbits/sec
[  7][RX-C]  75.00-76.00  sec  39.7 MBytes   333 Mbits/sec
[  5][TX-C]  76.00-77.00  sec   111 MBytes   929 Mbits/sec
[  7][RX-C]  76.00-77.00  sec  42.7 MBytes   358 Mbits/sec
[  5][TX-C]  77.00-78.00  sec  99.4 MBytes   834 Mbits/sec
[  7][RX-C]  77.00-78.00  sec  36.7 MBytes   308 Mbits/sec
[  5][TX-C]  78.00-79.00  sec   109 MBytes   916 Mbits/sec
[  7][RX-C]  78.00-79.00  sec  50.0 MBytes   420 Mbits/sec
[  5][TX-C]  79.00-80.00  sec   111 MBytes   928 Mbits/sec
[  7][RX-C]  79.00-80.00  sec  47.4 MBytes   398 Mbits/sec
[  5][TX-C]  80.00-81.00  sec   110 MBytes   926 Mbits/sec
[  7][RX-C]  80.00-81.00  sec  46.6 MBytes   391 Mbits/sec
[  5][TX-C]  81.00-82.00  sec   110 MBytes   926 Mbits/sec
[  7][RX-C]  81.00-82.00  sec  46.4 MBytes   389 Mbits/sec
[  5][TX-C]  82.00-83.00  sec   106 MBytes   889 Mbits/sec
[  7][RX-C]  82.00-83.00  sec  43.0 MBytes   361 Mbits/sec
[  5][TX-C]  83.00-84.00  sec   106 MBytes   889 Mbits/sec
[  7][RX-C]  83.00-84.00  sec  41.8 MBytes   351 Mbits/sec
[  5][TX-C]  84.00-85.00  sec   112 MBytes   939 Mbits/sec
[  7][RX-C]  84.00-85.00  sec  21.6 MBytes   181 Mbits/sec
[  5][TX-C]  85.00-86.00  sec   112 MBytes   941 Mbits/sec
[  7][RX-C]  85.00-86.00  sec  22.8 MBytes   192 Mbits/sec
[  5][TX-C]  86.00-87.00  sec   112 MBytes   939 Mbits/sec
[  7][RX-C]  86.00-87.00  sec  24.6 MBytes   206 Mbits/sec
[  5][TX-C]  87.00-88.00  sec   112 MBytes   939 Mbits/sec
[  7][RX-C]  87.00-88.00  sec  26.1 MBytes   219 Mbits/sec
[  5][TX-C]  88.00-89.00  sec   112 MBytes   939 Mbits/sec
[  7][RX-C]  88.00-89.00  sec  27.1 MBytes   227 Mbits/sec
[  5][TX-C]  89.00-90.00  sec   112 MBytes   939 Mbits/sec
[  7][RX-C]  89.00-90.00  sec  30.7 MBytes   257 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID][Role] Interval           Transfer     Bitrate         Retr
[  5][TX-C]   0.00-90.00  sec  9.72 GBytes   928 Mbits/sec                  sender
[  5][TX-C]   0.00-90.01  sec  9.72 GBytes   928 Mbits/sec                  receiver
[  7][RX-C]   0.00-90.00  sec  3.18 GBytes   304 Mbits/sec   26             sender
[  7][RX-C]   0.00-90.01  sec  3.18 GBytes   303 Mbits/sec                  receiver

iperf Done.
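
For anyone who wants to compare runs without eyeballing 180 lines, the per-second rates can be pulled out of the text output with a short script. This is a sketch, not a general iperf3 parser: the regex is written against the `--bidir` line format shown above, the function name is invented, and the embedded sample is just the first two seconds of this run plus one summary line.

```python
import re
from collections import defaultdict

# Matches lines such as:
# [  5][TX-C]   0.00-1.00   sec  73.6 MBytes   618 Mbits/sec
LINE = re.compile(
    r"\[\s*\d+\]\[(?P<role>[A-Z]+-[A-Z])\]\s+"
    r"(?P<start>[\d.]+)-(?P<end>[\d.]+)\s+sec\s+"
    r"[\d.]+\s+\w+\s+(?P<rate>[\d.]+)\s+Mbits/sec"
)


def per_role_rates(text, max_interval_s=1.5):
    """Collect per-interval bitrates (Mbit/s) keyed by role, e.g. TX-C / RX-C.

    Lines covering more than max_interval_s seconds are skipped so that the
    0.00-90.00 summary rows are not mixed in with the per-second samples.
    """
    rates = defaultdict(list)
    for m in LINE.finditer(text):
        if float(m["end"]) - float(m["start"]) <= max_interval_s:
            rates[m["role"]].append(float(m["rate"]))
    return dict(rates)


SAMPLE = """\
[ ID][Role] Interval           Transfer     Bitrate
[  5][TX-C]   0.00-1.00   sec  73.6 MBytes   618 Mbits/sec
[  7][RX-C]   0.00-1.00   sec  76.5 MBytes   642 Mbits/sec
[  5][TX-C]   1.00-2.00   sec   112 MBytes   942 Mbits/sec
[  7][RX-C]   1.00-2.00   sec  36.8 MBytes   308 Mbits/sec
[  5][TX-C]   0.00-90.00  sec  9.72 GBytes   928 Mbits/sec                  sender
"""

for role, values in per_role_rates(SAMPLE).items():
    mean = sum(values) / len(values)
    print(f"{role}: n={len(values)} min={min(values)} max={max(values)} mean={mean:.1f}")
```

Feeding it the full output (for example, saved to a file and read in) gives the min/max spread per direction, which makes the swings on the RX side easier to compare between qdisc settings.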

Still way off topic! I'd be rather curious what cake does as a default qdisc on that hw without the shaper turned on. In its full blown glory, it's pretty cpu intensive, but with gro splitting it should be able to get down to about 40kb on bql and thus lower latency. On the other hand you can't push 2Gbit (bidir) on this hw through fq_codel at present, so you are hitting a limit somewhere in the rx path; ironically enough, the rx ring might be too small. I'm no fan of gro... particularly when done in software, I'd just as soon rip it out of the fast path. You can turn it off with ethtool. That said.... merely

Anyway, if you are bored, try this:

sysctl -w net.core.default_qdisc=cake
tc qdisc replace dev eth0 root pfifo
tc qdisc replace dev eth1 root pfifo

and rerun that test, as sort of a speed test of the simplest algo we have.

then try
tc qdisc del dev eth0 root
tc qdisc del dev eth1 root

(this should make cake be the default qdisc, check with tc -s qdisc show)

rerun the iperf test
tc -s qdisc show > cake_default.log

then

tc qdisc replace dev eth0 root cake besteffort flows
tc qdisc replace dev eth1 root cake besteffort flows

your iperf3 test
tc -s qdisc show > cake_less.txt

cat the bql value at the end.

@_FailSafe @dtaht

I still don't fully comprehend how DQL or the low-water mark helps reduce latency when we've already got AQL & ATF. I can see the improvements, but I don't know how it works or what its mechanism is with regard to the rest of the ath10k stack. Can anyone shed light on this?

Which make/model would be best? I'll seriously send it :wink:

Thank you for the offer, and I really do appreciate the sentiment. However, my problem is not so much lack of hardware as lack of time to set up a proper testbed and run tests. I am planning to try to resurrect my old testbed, but I only have remote access so there are some limits to what I can do there. Otherwise I do have a lot of empty shelf space these days, but setting up a stack of routers there requires a bit more time investment...