Thanks to @ParanoidZoid's great observations, I am now convinced the latest results I sent you are due to Ben's low-water mark change. I go into more detail in Edits 1 and 2 here: AQL and the ath10k is *lovely*
That said, I do acknowledge your desire to see the watermarks replaced with a BQL implementation. Still, I am curious about the results I have seen in testing. Ben's htt->max_num_pending_tx / 4 with ath10k-ct's 2048 tx buffers means the low-watermark would be 512. Since I have been running ath10k-ct-smallbuffers at 512 tx buffers from the start, the low-watermark for me would be 128.
It seems to me this would, at least in theory, result in even better/more consistent latency (at the potential cost of throughput) on the ath10k-ct-smallbuffers driver than on the ath10k-ct driver. Or am I crazy? (I can handle honesty.)
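For anyone following along, the arithmetic behind those two watermark numbers (assuming Ben's htt->max_num_pending_tx / 4 heuristic, as described above) works out like this:

```python
# Low-watermark per Ben's htt->max_num_pending_tx / 4 heuristic.
# Buffer counts are the ones discussed above for the two driver builds.
def low_watermark(max_num_pending_tx: int) -> int:
    return max_num_pending_tx // 4

# ath10k-ct with 2048 tx buffers
print(low_watermark(2048))  # 512

# ath10k-ct-smallbuffers with 512 tx buffers
print(low_watermark(512))   # 128
```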
It's not directly relevant to this discussion. Just curious. The same goes for which ethernet driver it is, but I don't know how to find that out from sysfs.
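(For what it's worth, the ethernet driver name can be pulled from sysfs or ethtool; interface name here is an assumption, adjust to taste:)

```shell
# Kernel driver bound to the interface (eth0 assumed)
readlink /sys/class/net/eth0/device/driver

# Or via ethtool, which also shows driver version and bus info
ethtool -i eth0
```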
You are utterly correct that 25% of a smaller tx ring results in less latency and jitter than 25% of a larger one, at a possible cost in throughput. And you are not crazy.
However, packets range in size from 64 bytes to 64k bytes (with GSO), and bytes = time on ethernet.
On wifi it's airtime, but on 802.11ac (not n), bytes = time is a decent proxy, better than the tx ring watermarks. Figuring out airtime is what AQL is sort of supposed to do, but in this first version aql_threshold is a fixed value, and it shouldn't be.
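To put rough numbers on "bytes = time": serialization delay is just frame size over line rate, so the 64-byte-to-64k packet-size spread alone spans three orders of magnitude in time on the wire. A quick sketch (sizes and rates are illustrative, not from the thread):

```python
def serialization_us(size_bytes: int, rate_mbps: float) -> float:
    """Time a single frame occupies the link, in microseconds.
    bits / (Mbit/s) comes out directly in microseconds."""
    return size_bytes * 8 / rate_mbps

# On gigabit ethernet:
print(serialization_us(64, 1000))     # ~0.5 us for a minimum-size frame
print(serialization_us(65536, 1000))  # ~524 us for a 64k GSO super-packet
```

The same byte count is a much worse proxy on wifi, where the effective rate per station varies wildly, which is why estimating actual airtime (AQL's job) is the better long-term answer.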
I kicked off a 90-second bi-directional iperf3 between my MacBook and a server that is also gigabit connected (through a Netgear switch to my R7800). The Netgear switch has tx/rx flow control enabled.
During that iperf3 run, I never saw the limit value on the R7800 go above 6123:
root@OpenWrt:~# ethtool -k eth0
Features for eth0:
rx-checksumming: on
tx-checksumming: on
tx-checksum-ipv4: on
tx-checksum-ip-generic: off [fixed]
tx-checksum-ipv6: on
tx-checksum-fcoe-crc: off [fixed]
tx-checksum-sctp: off [fixed]
scatter-gather: on
tx-scatter-gather: on
tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: off
tx-tcp-segmentation: off [fixed]
tx-tcp-ecn-segmentation: off [fixed]
tx-tcp-mangleid-segmentation: off [fixed]
tx-tcp6-segmentation: off [fixed]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: on [fixed]
tx-vlan-offload: off [fixed]
ntuple-filters: off [fixed]
receive-hashing: off [fixed]
highdma: on [fixed]
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-gre-csum-segmentation: off [fixed]
tx-ipxip4-segmentation: off [fixed]
tx-ipxip6-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
tx-udp_tnl-csum-segmentation: off [fixed]
tx-gso-partial: off [fixed]
tx-sctp-segmentation: off [fixed]
tx-esp-segmentation: off [fixed]
tx-udp-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: on [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
hw-tc-offload: off [fixed]
esp-hw-offload: off [fixed]
esp-tx-csum-hw-offload: off [fixed]
rx-udp_tunnel-port-offload: off [fixed]
tls-hw-tx-offload: off [fixed]
tls-hw-rx-offload: off [fixed]
rx-gro-hw: off [fixed]
tls-hw-record: off [fixed]
I should have been checking eth1 instead of eth0. My primary SSID is tagged to eth1--it didn't dawn on me earlier that I should have been watching it.
Running the iperf3 again via WiFi resulted in eth1 BQL limit values of 100k - 250k. Those sound like more believable values compared to what you were seeing.
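For reference, the BQL limit values I was watching live in sysfs, one per tx queue (interface name assumed; single-queue devices will just have tx-0):

```shell
# Current BQL limit, in bytes, for each tx queue on eth1
for q in /sys/class/net/eth1/queues/tx-*/byte_queue_limits/limit; do
    echo "$q: $(cat "$q")"
done
```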
Just for grins, I also connected my Mac to the R7800 via ethernet (gig) and ran the same iperf3 back to my server.
Still way off topic! I'd be rather curious what cake does as a default qdisc on that hw without the shaper turned on. In its full-blown glory it's pretty cpu intensive, but with gro splitting it should be able to get down to about 40kb on bql and thus lower latency. On the other hand, you can't push 2Gbit (bidir) on this hw through fq_codel at present, so you are hitting a limit somewhere in the rx path; ironically enough, the rx ring might be too small. I'm no fan of gro... particularly when done in software, I'd just as soon rip it out of the fast path. You can turn it off with ethtool. That said...
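(The ethtool knob I mean, if you want to try a run with GRO off; interface name is an assumption:)

```shell
# Disable generic receive offload on the interface under test
ethtool -K eth1 gro off

# Verify it took effect
ethtool -k eth1 | grep generic-receive-offload
```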
Anyway, if you are bored, try this:
sysctl -w net.core.default_qdisc=cake
tc qdisc replace dev eth0 root pfifo
tc qdisc replace dev eth1 root pfifo
and rerun that test, as sort of a speed test of the simplest algo we have.
then try
tc qdisc del dev eth0 root
tc qdisc del dev eth1 root
(this should make cake be the default qdisc, check with tc -s qdisc show)
rerun the iperf test
tc -s qdisc show > cake_default.log
then
tc qdisc replace dev eth0 root cake besteffort flows
tc qdisc replace dev eth1 root cake besteffort flows
I still don't fully comprehend how DQL or the low-water mark helps reduce latency when we've already got AQL & ATF. I can see the improvements, but I don't understand the mechanism with regard to the rest of the ath10k stack. Can anyone shed light on this?
Which make/model would be best? I'll seriously send it.
Thank you for the offer, and I really do appreciate the sentiment. However, my problem is not so much lack of hardware as lack of time to set up a proper testbed and run tests. I am planning to try to resurrect my old testbed, but I only have remote access, so there are some limits to what I can do there. Otherwise I do have a lot of empty shelf space these days, but setting up a stack of routers there requires a bit more time investment...
I still don't fully comprehend how DQL or the low-water mark helps reduce latency when we've already got AQL & ATF. I can see the improvements, but I don't understand the mechanism with regard to the rest of the ath10k stack. Can anyone shed light on this?
Astute observation, and this is actually the reason I don't think spending more time on DQL for ath10k is the right thing to do. Making things airtime-based is clearly the right thing to do, so I'd rather spend the effort tweaking AQL. The main thing that is missing there (I think) is a global limit on the whole interface; AQL only has a per-station throttle. Ben's original patch (with the low-watermark tweak) was because he was running tests with a lot of stations (as in, hundreds), which has not been a case that has seen a lot of focus for AQL. So it ought to be possible to improve this case with the existing infrastructure...
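A rough sketch of the per-station vs. whole-interface distinction, in toy Python (the real AQL lives in mac80211 and tracks estimated airtime per station and per access category; the interface-wide limit here is the hypothetical missing piece, and all numbers are made up):

```python
class AqlSketch:
    """Toy model: per-station airtime throttle (what AQL has today)
    plus a hypothetical whole-interface cap. Units: microseconds of
    estimated in-flight airtime."""

    def __init__(self, station_limit_us=5000, interface_limit_us=24000):
        self.station_limit_us = station_limit_us
        self.interface_limit_us = interface_limit_us  # not in mainline AQL
        self.pending = {}       # station -> airtime currently in flight
        self.total_pending = 0  # sum across all stations

    def may_transmit(self, station):
        if self.pending.get(station, 0) >= self.station_limit_us:
            return False  # per-station throttle (exists today)
        if self.total_pending >= self.interface_limit_us:
            return False  # global throttle (the missing piece)
        return True

    def on_tx(self, station, airtime_us):
        self.pending[station] = self.pending.get(station, 0) + airtime_us
        self.total_pending += airtime_us

    def on_tx_complete(self, station, airtime_us):
        self.pending[station] -= airtime_us
        self.total_pending -= airtime_us
```

The point of the sketch: with hundreds of stations each sitting just under its own per-station limit, the *sum* of queued airtime can still balloon; only an interface-wide cap bounds that sum, which is why the many-stations case motivated the low-watermark workaround.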
I'd like openwrt to ship what we have so far. It's an enormous improvement and more users need it. We can keep sorting out better approaches as we have time.
I don't care about 100 users, I care about, oh, a max of 32. For openwrt.
Please consider >32 no longer unreasonable. Maybe not >32 high bandwidth users, but definitely >32 users.
These days vacuums, thermostats, smoke detectors, fridges, doorbells, TVs, chromecasts, speakers, game consoles, and cars in the driveway all want on wifi. Having a smart speaker that can control colour-changing smart bulbs and smart plugs, all with their own wifi connections, is popular with kids these days.
Not to mention occasionally hosting big multi-family events like Christmas, where the kids run off into little groups to have tiktok watching parties while others want to stream 4k netflix. Ideally openwrt can support all this. I'm more worried that 64 isn't enough than 32, at least in a non-covid-19 world.
If I have any one end goal with this work in making the ath10k sing and dance, it's to finally be able to test the l4s vs sce concepts on real hardware, on wifi. I really lack time and braincells for hard-core kernel development. I'm mostly just a theorist, and although I LOVE hacking on code, I have to spend too much time at layers 8 and 9 of the stack these days to focus on it. So if you know anyone that can lend a hand to this effort over here:
Perhaps we can make progress on adding similar features to the wifi implementation and analyzing them.
thx. There's a ton of other stuff worth doing in wifi as well... probably more important than this. I keep thinking that importing an optional ack filter based on cake's ack-filter would help, and I also think the wifi implementation needs to adopt the drop-batching stuff that is already in the qdisc....