AQL and the ath10k is *lovely*

Is SQM on or off on the router? You are bottlenecked at the ISP link, not the wifi, in both tests.

Idle latency being poor on wifi is normal, due to the impact of powersave (usually a minimum of ~100ms).

The gross disparity between the first BE flow and the other flows is puzzling. The four flows should have converged long before they did, and whatever happened at T+250 is stranger still. What client is driving the test?

You actually should not be seeing 0 bufferbloat on this test, but somewhere between 5-20ms.

1 Like

Ah, this explains the idle latency, thanks!

SQM on the router is on. It's an APU2 that handles the PPPoE connection. That's also the node running netserver for flent, so the flent plot should show the wifi speed. The flent client is a Thinkpad P14s with a Realtek RTL8852AE card.

Netserver should be able to drive the network harder than that. An rrul test on APU2-class hardware should be able to crack 500Mbit over ethernet. (Low latency at the cost of this much bandwidth seems odd.)
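For a baseline, a plain rrul run against it over ethernet would look something like this (APU is a placeholder hostname, per the /etc/hosts trick later in the thread; the 60s length is arbitrary):

flent -H APU -l 60 --step-size=.05 -t ethernet-baseline rrul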

Are you using BBR?

No, I don't use BBR. I have to check whether Fedora (installed on my laptop) sets any fancy networking options, but I doubt it.
Here is a flent plot of a run via ethernet:

This also shows a big difference between flows. But since this doesn't seem to be related to wifi, should I open another thread? This one is already huge...

1 Like

Heh. Just having baseline performance figures for this hardware on this thread would also be good, so no need to fork it. You've now shown that this device could - if it was working right - drive the wifi to saturation well past 40Mbit, indeed well past 200Mbit, and it isn't, due to yet another problem we have not found. On another bug thread here people are reporting problems with "ax" mode; try "ac"?

On the ethernet front...

My guess is that the APU2 has 4 hardware queues and you don't have irqbalance installed. (A "tc -s qdisc show dev the_lan_network_device" would show mq + 4 instances of fq_codel.) In this test two flows landed in one hardware queue; another ended up well mapped to the right cpu, the other less so. "tc -s qdisc show" on your fedora box will probably also show mq + fq_codel.
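For example (with eth0 standing in for whatever the LAN device actually is):

tc -s qdisc show dev eth0

On a 4-queue NIC this should print an mq root plus four fq_codel children (parent :1 through :4), each with its own sent/drop counters; wildly uneven per-queue byte counts are the tell-tale of the flow-to-queue imbalance described above.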

A test of the lan ethernet device with just fq_codel on it ("tc qdisc replace dev the_lan_device root fq_codel") will probably show the downloads achieving parity with each other, but not a full gbit.

The icmp-induced latency looks to be about right (it's usually ~500us); the induced udp latency of > 5ms is surprisingly high. I'd suspect TSO/GRO. Trying cake on the lan interface (without mq, but with the split-gso option) would probably cut that (due to cutting BQL size). There are numerous other subsystems in play, like TSQ.

Trying 4 instances of cake with split-gso on mq would also be interesting.
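A sketch of both variants, assuming eth0 and the usual mq layout where the hardware queues show up as classes :1 through :4:

# one cake instance replacing mq entirely
tc qdisc replace dev eth0 root cake besteffort split-gso

# or one cake per hardware queue, keeping the mq root
for i in 1 2 3 4; do tc qdisc replace dev eth0 parent :$i cake besteffort split-gso; done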

The world keeps bulking up things on us. So much of linux's development happens on really high end boxes, and some recent modifications, like running more of the stack on rx, looked good on machines with large caches but I suspect hurt everything else.

What does openwrt use for PREEMPT_RT and clock ticks these days?
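One way to check, if the kernel was built with CONFIG_IKCONFIG_PROC (not every OpenWrt build enables it):

zcat /proc/config.gz | grep -E 'PREEMPT|CONFIG_HZ='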

1 Like

These may be interesting:
https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=6fcc06205c15bf1bb90896efdf5967028c154aba
and
https://lwn.net/Articles/883713/

I applied this patch:

--- a/drivers/net/wireless/ath/ath10k/mac.c
+++ b/drivers/net/wireless/ath/ath10k/mac.c
@@ -4764,7 +4764,6 @@ static void ath10k_mac_op_wake_tx_queue(struct ieee80211_hw *hw,
 					struct ieee80211_txq *txq)
 {
 	struct ath10k *ar = hw->priv;
-	int ret;
 	u8 ac;
 
 	ath10k_htt_tx_txq_update(hw, txq);
@@ -4777,11 +4776,9 @@ static void ath10k_mac_op_wake_tx_queue(struct ieee80211_hw *hw,
 	if (!txq)
 		goto out;
 
-	while (ath10k_mac_tx_can_push(hw, txq)) {
-		ret = ath10k_mac_tx_push_txq(hw, txq);
-		if (ret < 0)
-			break;
-	}
+	if (ath10k_mac_tx_can_push(hw, txq))
+		ath10k_mac_tx_push_txq(hw, txq);
+
 	ieee80211_return_txq(hw, txq, false);
 	ath10k_htt_tx_txq_update(hw, txq);

and maybe it is better: I have not seen the "ath10k_ahb a000000.wifi: failed to lookup txq for peer_id X tid X" bug in the log for about 24h of use (max 7 clients, with the internet connection under maximum load). The connection in the test room 10 meters away is better (now about 15-20Mbit; without the patch it was about 2Mbit and sometimes 12-15)...

How long will the tests stay good? I don't know yet :)

We won't be ready for BIG TCP for a few more years. Certainly I can see the core Google devs not testing anything on weak hw.

2 Likes

mumbai.starlink.taht.net

2 Likes

You are about the only person I know who can offer reasonably nearby RTT/OWD reflectors, no matter who asks or from where ;) Just waiting for mcmurdo.starlink.taht.net to come online ;)

2 Likes

Sorry, I should have done my homework...
irqbalance was installed but not enabled (something I regularly forget on new OpenWrt installs). With irqbalance enabled and just cake on eth1+2, latency indeed looks great:
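For anyone else who hits this: the OpenWrt package ships disabled, and something like the following turns it on (the uci section name matches the current package, but may differ on older releases):

opkg install irqbalance
uci set irqbalance.irqbalance.enabled='1'
uci commit irqbalance
/etc/init.d/irqbalance enable
/etc/init.d/irqbalance start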

So back to the EAP245v3. It only supports 802.11ac; the previous plot was ac as well. It's still slow, but at least it looks less chaotic.

I don't know what happened after 250s; I had to reconnect, but dmesg on the AP didn't reveal anything. This was right after a reboot. I'll test again after a few hours/days of uptime, since wifi tends to get worse after a few days.

2 Likes

As it has been many months of effort for all concerned here, I am in search of long-term stability first, regaining performance later. How many other simultaneous bugs do we still have? mt76 still has a range problem at 2.4ghz, there are other ath10k chipsets to try (are you using -ct, stock, or -ct-smallbuffers?), nobody's tested the ath9k lately, there's qosify vs wifi of any sort, the "iphone leaving the network" problem, and I forget what else.

More https://en.wikipedia.org/wiki/Six_Sigma

Less agile.

If we can get down to where all that stuff goes away... my original goal this quarter was to find more ways of making wifi more capable of twitch and cloud gaming! There's a bunch of easy stuff along that road that I'd have hoped to have tried by now. But setting the bar right now at "not crashing" and "not doing weird stuff over long intervals", and getting confirmation from everyone testing that those are the results we're getting across more chipsets, will make for a MUCH better next openwrt release, which I'd settle for!

Anyway, your wifi result is very good. The latency inflation you see is typical of the codel settings (target 20ms) and the number of txops (2 each way, max 5.7ms) we currently use. (I would like to reduce these in the future.) It does seem to be quite a lot less than what ac is capable of, and the upload especially seems low. I don't know anything about the driver for that chipset.

One thing to explore is to figure out whether it is the rate controller or some form of starvation that is limiting your bandwidth. (From your results above it "feels" like the rate controller.) Doing an aircap during a short test can pull out what the advertised rates and typical transmissions actually are; google for "aircap tshark". I don't remember enough about how to get at ath10k rate controller stats from the chip, or even if that's possible. @nbd? @tohojo?
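A minimal sketch of such a capture, assuming a spare machine whose wifi card (wlan0 here) supports monitor mode, parked on the channel under test (36 is a placeholder):

ip link set wlan0 down
iw dev wlan0 set type monitor
ip link set wlan0 up
iw dev wlan0 set channel 36
tshark -i wlan0 -a duration:30 -T fields -e wlan.ta -e radiotap.datarate

The radiotap headers carry the actual per-frame rates; legacy rates show up in radiotap.datarate, while HT/VHT frames report theirs in the MCS/VHT radiotap fields instead.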

Another simpler test is to just test downloads and uploads separately.

flent -H APU -t the_test_conditions --step-size=.05 --te=download_streams=4 --socket-stats tcp_ndown

flent -H APU -t the_test_conditions --step-size=.05 --te=upload_streams=4 --socket-stats tcp_nup

Another is setting the range of rates manually.
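On the linux side that can be done with iw; a sketch, with wlan0 and the 2-stream MCS 0-7 cap purely as examples:

iw dev wlan0 set bitrates vht-mcs-5 1:0-7 2:0-7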

Also, if -l 300 is reliably cutting off after 250s, that is either a flent bug or a wifi bug! I can think of a multiplicity of causes for that - a dhcp renew failure, an overflow bug somewhere in the DRR implementation. If it's repeatable, that would be weirdly comforting. If it were tied to the rate being achieved (say we got twice the throughput by testing tcp_ndown, and it crashed in half the time), that would also be "better".

-l 600, maybe. I'd be really happy if folk ran the rrul_be, rrul, and rtt_fair tests for days and days at a time (I think the upper limit is about 2000s per test). Six Sigma... one of the nice things about fq_codel-derived solutions is that - when it's working right - you hardly notice when something else is saturating the system.
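A soak loop along those lines might look like this (APU again as a placeholder; rtt_fair also wants its usual list of target hosts, so it is left out of the sketch):

while true; do
  for t in rrul_be rrul; do flent -H APU -l 2000 -t soak-$t $t; done
done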

4 Likes

Going back to your excellent cake result, showing this hardware is suitable as a simpler test driver for wifi also...

x86 hardware, and perhaps some new stuff like the R5S, are the first things on which I might recommend running cake (nat, split-gso) natively, instead of defaulting to fq_codel. Can you post the output of "tc -s qdisc show > /tmp/rrul_be_cake.stats" halfway through running this test?

Capturing the bql stats in the middle of a test run would also be useful:

cat /sys/class/net/eth*/queues/tx-*/byte_queue_limits/limit > /tmp/bql.stats
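grep does the same job while keeping track of which queue each number came from:

grep . /sys/class/net/eth*/queues/tx-*/byte_queue_limits/limit > /tmp/bql.stats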

Not to rant, but the 1.5ms observed on this better test is around 25x more than what is actually feasible for forwarding behavior under this workload. Each full-size packet consumes about 13us on the wire (there are 4, so 52us), the acks about 1us, the pings about the same.

Not happy with the periodic spike above 1.5ms either! I don't know where that is coming from. PREEMPT_RT might help.

In a more ideal universe we'd be finding ways to tune NAPI down, have DPDK as an option, and offloads for more stuff, and we'd be shooting for sub-250us induced latency that behaves consistently like a "fluid model". But a "mere" 1.5ms latency inflation at 1gbit is remarkably good for current hw. Getting to sub-250us would make for a "tighter" interface to wifi, which cannot do much better than that.

4 Likes

PS: you get wider, more detailed plots if you use symbolic names like APU rather than 192.168.1.1. Stick this in your /etc/hosts file:

192.168.1.1 APU

I also tend to do the same thing for the whatever.starlink.taht.net cloud, and/or add starlink.taht.net to my /etc/resolv.conf file
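The resolv.conf part is just a search-domain line, so that bare names like mumbai resolve:

search starlink.taht.net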

3 Likes

Regarding the sub-thread about flent on macos, the ever-amazing @tohojo just fixed some annoying issues that existed between recent flent and macos monterey/python3.9.

5 Likes

Your enthusiasm for this topic is truly contagious - thank you so much for your insights and for dedicating so much time to the users here!

Here are the wifi up-/download results, I'll add the cake stats tomorrow.

Summary of tcp_ndown test run from 2022-07-05 20:58:56.825608
  Title: 'QCA9990 802.11nac'

                             avg       median          # data pts
 Ping (ms) ICMP   :        21.30        20.60 ms             1399
 TCP download avg :        22.05          N/A Mbits/s        1399
 TCP download sum :        88.20          N/A Mbits/s        1399
 TCP download::1  :        20.05        21.30 Mbits/s        1399
 TCP download::2  :        21.22        21.65 Mbits/s        1399
 TCP download::3  :        23.44        22.86 Mbits/s        1399
 TCP download::4  :        23.49        22.94 Mbits/s        1399
Summary of tcp_nup test run from 2022-07-05 21:00:28.855437
  Title: 'QCA9990 802.11nac'

                                             avg       median          # data pts
 Ping (ms) ICMP                   :         3.77         2.96 ms             1317
 TCP upload avg                   :        13.99          N/A Mbits/s        1399
 TCP upload sum                   :        55.98          N/A Mbits/s        1399
 TCP upload::1                    :        14.03        14.63 Mbits/s        1399
 TCP upload::1::tcp_cwnd          :         6.36         6.00                 963
 TCP upload::1::tcp_delivery_rate :        14.17        12.72                 963
 TCP upload::1::tcp_pacing_rate   :        22.19        21.22                 963
 TCP upload::1::tcp_rtt           :         4.37         4.30                 959
 TCP upload::1::tcp_rtt_var       :         1.04         0.88                 959
 TCP upload::2                    :        14.00        14.22 Mbits/s        1399
 TCP upload::2::tcp_cwnd          :         6.26         6.00                 963
 TCP upload::2::tcp_delivery_rate :        14.25        13.30                 963
 TCP upload::2::tcp_pacing_rate   :        22.09        20.91                 963
 TCP upload::2::tcp_rtt           :         4.36         4.29                 961
 TCP upload::2::tcp_rtt_var       :         1.04         0.90                 961
 TCP upload::3                    :        14.26        14.62 Mbits/s        1399
 TCP upload::3::tcp_cwnd          :         6.46         6.00                 963
 TCP upload::3::tcp_delivery_rate :        14.46        13.40                 963
 TCP upload::3::tcp_pacing_rate   :        22.75        21.56                 963
 TCP upload::3::tcp_rtt           :         4.35         4.28                 959
 TCP upload::3::tcp_rtt_var       :         1.03         0.89                 959
 TCP upload::4                    :        13.69        14.22 Mbits/s        1399
 TCP upload::4::tcp_cwnd          :         6.28         6.00                 963
 TCP upload::4::tcp_delivery_rate :        14.07        13.07                 963
 TCP upload::4::tcp_pacing_rate   :        22.19        21.00                 963
 TCP upload::4::tcp_rtt           :         4.35         4.28                 961
 TCP upload::4::tcp_rtt_var       :         1.04         0.90                 961
3 Likes

What is your HT setting in /etc/config/wireless?

grep -i HT /etc/config/wireless
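On a typical ac setup that should turn up something like (VHT80 being the usual 80MHz-wide setting):

	option htmode 'VHT80'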

Speed in the test room is still fine (15-24Mbps on speedtest.net and 30-35Mbps with iperf; pings are about 25-28ms, but sometimes jump) and there have been no visible bugs in about 3 days of uptime (2 days and 18 hours, to be exact). I will apply another patch for testing in the evening.

1 Like

It's 'VHT80'.
I'm using the default ath10k-ct driver; I hadn't mentioned that yet.

Oh, good, I thought people had given up on -ct. -ct has multiple other benefits, notably adhoc mode, and until these past few months it seemed superior in every way to the stock driver.

1 Like

@dtaht +1. I just want to call this out as an incredibly important goal. It doesn't matter how fast the link speeds are or how low the latency is: if the code ever does weird stuff, then whatever change you've introduced doesn't matter. Thanks.

3 Likes