AQL and the ath10k is *lovely*

Follow-up question. I gather that attention was recently afforded to multicast data. What about traffic like this:

root@OpenWrt:~# tcpdump -vpni br-guest
tcpdump: listening on br-guest, link-type EN10MB (Ethernet), capture size 262144 bytes
22:06:08.043450 IP (tos 0x0, ttl 255, id 53539, offset 0, flags [none], proto UDP (17), length 203)
    192.168.2.192.49154 > 255.255.255.255.6666: UDP, length 175
22:06:08.514987 IP (tos 0x0, ttl 255, id 53529, offset 0, flags [none], proto UDP (17), length 203)
    192.168.2.191.49154 > 255.255.255.255.6666: UDP, length 175
22:06:08.529170 IP (tos 0x0, ttl 255, id 53525, offset 0, flags [none], proto UDP (17), length 203)
    192.168.2.156.49154 > 255.255.255.255.6666: UDP, length 175
22:06:09.398553 IP (tos 0x0, ttl 255, id 34152, offset 0, flags [none], proto UDP (17), length 216)
    192.168.2.109.49154 > 255.255.255.255.6667: UDP, length 188
22:06:09.846746 IP (tos 0x0, ttl 255, id 34157, offset 0, flags [none], proto UDP (17), length 216)
    192.168.2.200.49154 > 255.255.255.255.6667: UDP, length 188
22:06:10.085035 IP (tos 0x0, ttl 255, id 34150, offset 0, flags [none], proto UDP (17), length 216)
    192.168.2.134.49154 > 255.255.255.255.6667: UDP, length 188
22:06:10.085365 IP (tos 0x0, ttl 255, id 34150, offset 0, flags [none], proto UDP (17), length 216)
    192.168.2.134.49154 > 255.255.255.255.6667: UDP, length 188
22:06:10.269147 IP (tos 0x0, ttl 255, id 52923, offset 0, flags [none], proto UDP (17), length 216)
    192.168.2.179.49155 > 255.255.255.255.6667: UDP, length 188
22:06:10.527005 IP (tos 0x0, ttl 255, id 53528, offset 0, flags [none], proto UDP (17), length 203)
    192.168.2.119.49154 > 255.255.255.255.6666: UDP, length 175
22:06:10.564785 IP (tos 0x0, ttl 255, id 34171, offset 0, flags [none], proto UDP (17), length 216)
    192.168.2.128.49154 > 255.255.255.255.6667: UDP, length 188
22:06:11.043718 IP (tos 0x0, ttl 255, id 53540, offset 0, flags [none], proto UDP (17), length 203)
    192.168.2.192.49154 > 255.255.255.255.6666: UDP, length 175
22:06:11.513351 IP (tos 0x0, ttl 255, id 53530, offset 0, flags [none], proto UDP (17), length 203)
    192.168.2.191.49154 > 255.255.255.255.6666: UDP, length 175
22:06:11.530135 IP (tos 0x0, ttl 255, id 53526, offset 0, flags [none], proto UDP (17), length 203)
    192.168.2.156.49154 > 255.255.255.255.6666: UDP, length 175
22:06:12.074089 IP (tos 0x0, ttl 255, id 47449, offset 0, flags [none], proto UDP (17), length 216)
    192.168.2.118.49154 > 255.255.255.255.6667: UDP, length 188
22:06:12.102683 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.2.179 tell 192.168.2.179, length 28
22:06:12.374825 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.2.109 tell 192.168.2.109, length 28
22:06:12.776482 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.2.134 tell 192.168.2.134, length 28
22:06:12.903406 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.2.200 tell 192.168.2.200, length 28
22:06:13.520926 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.2.128 tell 192.168.2.128, length 28
22:06:13.529649 IP (tos 0x0, ttl 255, id 53529, offset 0, flags [none], proto UDP (17), length 203)
    192.168.2.119.49154 > 255.255.255.255.6666: UDP, length 175
22:06:14.042985 IP (tos 0x0, ttl 255, id 53541, offset 0, flags [none], proto UDP (17), length 203)
    192.168.2.192.49154 > 255.255.255.255.6666: UDP, length 175
22:06:14.395631 IP (tos 0x0, ttl 255, id 34153, offset 0, flags [none], proto UDP (17), length 216)
    192.168.2.109.49154 > 255.255.255.255.6667: UDP, length 188
22:06:14.486113 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.2.118 tell 192.168.2.118, length 28
22:06:14.514464 IP (tos 0x0, ttl 255, id 53531, offset 0, flags [none], proto UDP (17), length 203)
    192.168.2.191.49154 > 255.255.255.255.6666: UDP, length 175
22:06:14.524313 IP (tos 0x0, ttl 255, id 53527, offset 0, flags [none], proto UDP (17), length 203)
    192.168.2.156.49154 > 255.255.255.255.6666: UDP, length 175
22:06:14.846485 IP (tos 0x0, ttl 255, id 34158, offset 0, flags [none], proto UDP (17), length 216)
    192.168.2.200.49154 > 255.255.255.255.6667: UDP, length 188
22:06:15.085166 IP (tos 0x0, ttl 255, id 34151, offset 0, flags [none], proto UDP (17), length 216)
    192.168.2.134.49154 > 255.255.255.255.6667: UDP, length 188
22:06:15.270983 IP (tos 0x0, ttl 255, id 52924, offset 0, flags [none], proto UDP (17), length 216)
    192.168.2.179.49155 > 255.255.255.255.6667: UDP, length 188
22:06:15.564898 IP (tos 0x0, ttl 255, id 34172, offset 0, flags [none], proto UDP (17), length 216)
    192.168.2.128.49154 > 255.255.255.255.6667: UDP, length 188
22:06:16.527908 IP (tos 0x0, ttl 255, id 53530, offset 0, flags [none], proto UDP (17), length 203)
    192.168.2.119.49154 > 255.255.255.255.6666: UDP, length 175
22:06:17.043297 IP (tos 0x0, ttl 255, id 53542, offset 0, flags [none], proto UDP (17), length 203)
    192.168.2.192.49154 > 255.255.255.255.6666: UDP, length 175
22:06:17.073189 IP (tos 0x0, ttl 255, id 47450, offset 0, flags [none], proto UDP (17), length 216)
    192.168.2.118.49154 > 255.255.255.255.6667: UDP, length 188
22:06:17.514675 IP (tos 0x0, ttl 255, id 53532, offset 0, flags [none], proto UDP (17), length 203)
    192.168.2.191.49154 > 255.255.255.255.6666: UDP, length 175
22:06:17.528499 IP (tos 0x0, ttl 255, id 53528, offset 0, flags [none], proto UDP (17), length 203)
    192.168.2.156.49154 > 255.255.255.255.6666: UDP, length 175
22:06:19.399187 IP (tos 0x0, ttl 255, id 34154, offset 0, flags [none], proto UDP (17), length 216)
    192.168.2.109.49154 > 255.255.255.255.6667: UDP, length 188
22:06:19.528582 IP (tos 0x0, ttl 255, id 53531, offset 0, flags [none], proto UDP (17), length 203)
    192.168.2.119.49154 > 255.255.255.255.6666: UDP, length 175
22:06:19.848940 IP (tos 0x0, ttl 255, id 34159, offset 0, flags [none], proto UDP (17), length 216)
    192.168.2.200.49154 > 255.255.255.255.6667: UDP, length 188
22:06:20.048174 IP (tos 0x0, ttl 255, id 53543, offset 0, flags [none], proto UDP (17), length 203)
    192.168.2.192.49154 > 255.255.255.255.6666: UDP, length 175
22:06:20.085719 IP (tos 0x0, ttl 255, id 34152, offset 0, flags [none], proto UDP (17), length 216)
    192.168.2.134.49154 > 255.255.255.255.6667: UDP, length 188
22:06:20.269806 IP (tos 0x0, ttl 255, id 52925, offset 0, flags [none], proto UDP (17), length 216)
    192.168.2.179.49155 > 255.255.255.255.6667: UDP, length 188
22:06:20.510184 IP (tos 0x0, ttl 255, id 53533, offset 0, flags [none], proto UDP (17), length 203)
    192.168.2.191.49154 > 255.255.255.255.6666: UDP, length 175
22:06:20.525346 IP (tos 0x0, ttl 255, id 53529, offset 0, flags [none], proto UDP (17), length 203)
    192.168.2.156.49154 > 255.255.255.255.6666: UDP, length 175
22:06:20.562477 IP (tos 0x0, ttl 255, id 34173, offset 0, flags [none], proto UDP (17), length 216)
    192.168.2.128.49154 > 255.255.255.255.6667: UDP, length 188
22:06:22.073323 IP (tos 0x0, ttl 255, id 47451, offset 0, flags [none], proto UDP (17), length 216)
    192.168.2.118.49154 > 255.255.255.255.6667: UDP, length 188
22:06:22.099811 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.2.179 tell 192.168.2.179, length 28
22:06:22.375276 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.2.109 tell 192.168.2.109, length 28
22:06:22.523823 IP (tos 0x0, ttl 255, id 53532, offset 0, flags [none], proto UDP (17), length 203)
    192.168.2.119.49154 > 255.255.255.255.6666: UDP, length 175
22:06:22.771964 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.2.134 tell 192.168.2.134, length 28
22:06:22.905432 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.2.200 tell 192.168.2.200, length 28
22:06:23.043737 IP (tos 0x0, ttl 255, id 53544, offset 0, flags [none], proto UDP (17), length 203)
    192.168.2.192.49154 > 255.255.255.255.6666: UDP, length 175

Could such IoT device traffic be the cause of lag on 2.4 GHz?

Last I looked, ssh sets the TOS imm bit for interactive traffic by itself, so from a Linux client it will typically land in the VO queue.

Yes, multicast is a hidden bane of many wifi networks. It's almost always turned off in large public networks. We've found ways to elide common multicast things like ARP and ND, but mDNS and a few other protocols use it heavily. I routinely increase the multicast rate to 12 Mbit or higher nowadays. That unfortunately limits range, and a righter answer would have been (were it baked into the standard) to probe for the right rate, or to use the lowest observed rate achieved across all the stations on the network. In the standard was also the concept of rate limiting multicast so it would only eat, say, 20 ms of airtime per 100 ms, max. (To be clear: in the first case I was talking about encoding and broadcasting multicast at the 12 Mbit rate; in the second, limiting the number of packets sent to under 20 ms of airtime per 100 ms.)
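To put rough numbers on the multicast-rate point, here is a minimal sketch. It assumes a frame's airtime is just a fixed PHY preamble plus its bits divided by the rate. The 203-byte IP length comes from the capture earlier in the thread; the 1 Mbit/s baseline, the 36 bytes of layer-2 overhead and the 192 us / 20 us preamble values are my assumptions, and contention, DTIM buffering and encryption overhead are ignored. The function names are made up for illustration.

```python
def frame_airtime_us(ip_len_bytes, rate_mbps, preamble_us, l2_overhead_bytes=36):
    """Rough airtime of one frame in microseconds.

    Assumption: airtime = fixed preamble + (payload + L2 overhead) bits / rate.
    Dividing bits by Mbit/s gives microseconds directly.
    """
    bits = (ip_len_bytes + l2_overhead_bytes) * 8
    return preamble_us + bits / rate_mbps


def max_frames_in_budget(airtime_us, budget_ms=20.0):
    """How many such frames fit into an airtime budget (e.g. 20 ms per 100 ms)."""
    return int(budget_ms * 1000 // airtime_us)


# Assumed baseline: 1 Mbit/s with a long DSSS preamble (192 us).
slow_us = frame_airtime_us(203, 1.0, preamble_us=192.0)
# Raised multicast rate: 12 Mbit/s with an OFDM preamble (20 us, assumed).
fast_us = frame_airtime_us(203, 12.0, preamble_us=20.0)

print(f"1 Mbit/s:  {slow_us:.0f} us per frame, "
      f"{max_frames_in_budget(slow_us)} frames per 20 ms budget")
print(f"12 Mbit/s: {fast_us:.0f} us per frame, "
      f"{max_frames_in_budget(fast_us)} frames per 20 ms budget")
```

Under these assumptions the same broadcast costs more than ten times less airtime at 12 Mbit/s, which is the whole argument for raising the multicast rate where range allows it.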

The important detail that has cropped up in this string of tests over the last few weeks is the damage being done by location services. I had no idea, and would like a comparison plot with them on and off.

I had a quick look in my local network (captured via wireshark on my mac):
SSH traffic using port 22:
macOS sends: AF21 (DSCP 18); TOS 0x48 -> TOS immediate bit set
Ubuntu 22 LTS sends: DSCP 4; TOS 0x10 -> TOS immediate bit not set
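For anyone wanting to decode such TOS bytes themselves: the upper six bits are the DSCP and the lower two bits are the ECN field. A small sketch (the function name `split_tos` is just for illustration):

```python
# ECN codepoints in the two low-order bits of the TOS / traffic class byte.
ECN_NAMES = {0: "Not-ECT", 1: "ECT(1)", 2: "ECT(0)", 3: "CE"}


def split_tos(tos):
    """Split a TOS byte into (decimal DSCP, ECN codepoint name)."""
    if not 0 <= tos <= 0xFF:
        raise ValueError("TOS must fit in one byte")
    dscp = tos >> 2   # upper 6 bits
    ecn = tos & 0x3   # lower 2 bits
    return dscp, ECN_NAMES[ecn]


for tos in (0x48, 0x10):
    dscp, ecn = split_tos(tos)
    print(f"TOS 0x{tos:02x} -> DSCP {dscp}, {ecn}")
```

So 0x48 is DSCP 18 (AF21) and 0x10 is DSCP 4, both without ECN.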

But AF21 maps into AC_BK under the default WMM mapping rules, while DSCP 4 at least stays in AC_BE:

UP      DSCP   AC  PHBs(decDSCP)
Range0  0-7    BE  CS0(0)
Range1  8-15   BK  CS1(8), AF11(10), AF12(12), AF13(14)
Range2  16-23  BK  CS2(16), AF21(18), AF22(20), AF23(22)
Range3  24-31  BE  CS3(24), AF31(26), AF32(28), AF33(30)
Range4  32-39  VI  CS4(32), AF41(34), AF42(36), AF43(38)
Range5  40-47  VI  CS5(40), VA(44), EF(46)
Range6  48-55  VO  CS6(48)
Range7  56-63  VO  CS7(56)
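The default table above boils down to "take the top three bits of the DSCP as the user priority (UP), then map UP to an access category". A minimal sketch of that rule, with names of my own choosing:

```python
# UP -> access category, as listed in the default table above.
UP_TO_AC = {0: "BE", 1: "BK", 2: "BK", 3: "BE", 4: "VI", 5: "VI", 6: "VO", 7: "VO"}


def default_wmm_ac(dscp):
    """Default mapping: UP is the top 3 bits of the 6-bit DSCP."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in 0..63")
    up = dscp >> 3
    return up, UP_TO_AC[up]


for name, dscp in (("AF21", 18), ("DSCP 4", 4), ("EF", 46)):
    up, ac = default_wmm_ac(dscp)
    print(f"{name}: UP {up} -> AC_{ac}")
```

This reproduces the observation in the post: AF21 (18) ends up in AC_BK while DSCP 4 stays in AC_BE.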

Recent OpenWrt mapping rules keep both in AC_BE:
UP      DSCP   AC     PHBs(decDSCP)
Ex0     BE     BE(0)  BE/CS0(0)
Range0  2-16   BE     CS1(8)**, AF11(10), AF12(12), AF13(14), CS2(16)
Range1  1-1    BK     LE(1)
Range2  -
Range3  18-22  BE     AF21(18), AF22(20), AF23(22)
Range4  24-38  VI     CS3(24), AF31(26), AF32(28), AF33(30), CS4(32), AF41(34), AF42(36), AF43(38)
Range5  40-40  VI     CS5(40)
Range6  44-46  VO     VA(44), EF(46)
Range7  48-56  VO     CS6(48), CS7(56)
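The OpenWrt table is no longer a simple bit shift but an exception list plus one DSCP range per UP. A sketch of such a lookup, using only the ranges listed above. I am reading the Ex0 row as "DSCP 0 goes to UP 0", and I return None for DSCPs the table does not cover, since the post does not say what happens to them; both are assumptions.

```python
# Exceptions are checked first: DSCP -> UP (assumed reading of the Ex0 row).
EXCEPTIONS = {0: 0}

# One inclusive DSCP range per UP, as listed above; UP 2 has no range.
RANGES = {
    0: (2, 16),
    1: (1, 1),
    2: None,
    3: (18, 22),
    4: (24, 38),
    5: (40, 40),
    6: (44, 46),
    7: (48, 56),
}

UP_TO_AC = {0: "BE", 1: "BK", 2: "BK", 3: "BE", 4: "VI", 5: "VI", 6: "VO", 7: "VO"}


def qos_map_lookup(dscp):
    """Return (UP, AC) for a DSCP, or None if the table above does not cover it."""
    if dscp in EXCEPTIONS:
        up = EXCEPTIONS[dscp]
        return up, UP_TO_AC[up]
    for up, dscp_range in RANGES.items():
        if dscp_range is None:
            continue
        low, high = dscp_range
        if low <= dscp <= high:
            return up, UP_TO_AC[up]
    return None


for name, dscp in (("AF21", 18), ("DSCP 4", 4), ("EF", 46), ("LE", 1)):
    print(name, "->", qos_map_lookup(dscp))
```

With these ranges both SSH markings seen above (AF21 and DSCP 4) land in AC_BE, and EF moves up to AC_VO.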

Testing mosh showed CS0 in both directions....

Will need to do a similar test for a VoIP call...

mosh used to be ECN-enabled on some OSes, over IPv4 at least. It was originally marked AF42 until, ironically, a machine on MIT's campus was claimed to eventually block that marking (after many seconds).

Yep, that is what I see: ECT(0) in both directions (between macOS Monterey and Ubuntu 22 LTS, both x86_64), while SSH is Not-ECT.

This is the only quick example I can provide today: a ping to my router while clicking the WiFi icon, which triggers a WiFi scan. I timestamped ICMP requests for your viewing pleasure. :slight_smile:

macOS does periodic WiFi scans, so this is what's expected each time.

@dtaht Location services tests, WLAN to WAN.

Location services on:

Location services off:

PS. I need some help compiling a sysupgrade for a RE450v2. How can I reduce the size of it?

Not really the right thread for that. Ripping out unnecessary packages - or putting in all the packages you need so they are properly compressed as part of the image, is my first thought.

Thx for showing what location services "looks like" against the cosmic background bufferbloat radiation. It looks to me as though your wifi is not saturated in either direction, but we do see things get pretty fuzzy while that is going on. Open question is what that looks like on ath10k, 9k...

Otherwise things are working pretty good. It's difficult to recall all the different bugs we've encountered along the way, and it's my hope we get more testers in general soon. Going out and pursuing the bug-filers of so many "wifi is flaky" bugs, perhaps, or just waiting for the next release....

Ref: AQL and the ath10k is *lovely* - #737 by ka2107

@dtaht

5 GHz Band, AP and Client in different room with brick wall between them

Belkin RT3200 Router running "OpenWrt 22.03-SNAPSHOT r19575-506432a783 / LuCI openwrt-22.03 branch git-22.204.42822-9a18337" and Netperf netserver + IRTT server

Flent command: flent tcp_nup --test-parameter=upload_streams=1 -l 60 -H <Router_IP> -v

macOS 12.5 Monterey - All Location Services Disabled

Flent Data File: mac_to_router_5ghz_tcp_1up_Location_disabled.flent

% /System/Library/PrivateFrameworks/Apple80211.framework/Versions/Current/Resources/airport -I
     agrCtlRSSI: -69
     agrExtRSSI: 0
    agrCtlNoise: -101
    agrExtNoise: 0
          state: running
        op mode: station
     lastTxRate: 52
        maxRate: 144
lastAssocStatus: 0
    802.11 auth: open
      link auth: wpa2-psk
          BSSID:
           SSID: XXXX_5GHz
            MCS: 3
  guardInterval: 800
            NSS: 2
        channel: 36
% networkQuality -v
==== SUMMARY ====
Upload capacity: 21.584 Mbps
Download capacity: 36.202 Mbps
Upload flows: 16
Download flows: 12
Responsiveness: Low (169 RPM)
Base RTT: 39
Start: 24/Jul/2022, 05:28:23
End: 24/Jul/2022, 05:28:38
OS Version: Version 12.5 (Build 21G72)

macOS 12.5 Monterey - 'Find My Mac' and 'Network & Wireless' Location Services Enabled

Flent Data File: mac_to_router_5ghz_tcp_1up_Location_enabled.flent

% /System/Library/PrivateFrameworks/Apple80211.framework/Versions/Current/Resources/airport -I
     agrCtlRSSI: -70
     agrExtRSSI: 0
    agrCtlNoise: -100
    agrExtNoise: 0
          state: running
        op mode: station
     lastTxRate: 52
        maxRate: 144
lastAssocStatus: 0
    802.11 auth: open
      link auth: wpa2-psk
          BSSID:
           SSID: XXXX_5GHz
            MCS: 3
  guardInterval: 800
            NSS: 2
        channel: 36
% networkQuality -v
==== SUMMARY ====
Upload capacity: 16.488 Mbps
Download capacity: 34.957 Mbps
Upload flows: 12
Download flows: 12
Responsiveness: Low (171 RPM)
Base RTT: 39
Start: 24/Jul/2022, 07:01:23
End: 24/Jul/2022, 07:01:34
OS Version: Version 12.5 (Build 21G72)
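To compare the two networkQuality summaries above without eyeballing them, a small parsing sketch may help. It only pulls out the two capacity lines and reports the relative change; the helper names are mine, and a single run per setting is of course not enough to blame location services on its own.

```python
import re

SUMMARY_OFF = """==== SUMMARY ====
Upload capacity: 21.584 Mbps
Download capacity: 36.202 Mbps
Responsiveness: Low (169 RPM)
"""

SUMMARY_ON = """==== SUMMARY ====
Upload capacity: 16.488 Mbps
Download capacity: 34.957 Mbps
Responsiveness: Low (171 RPM)
"""

CAPACITY_RE = re.compile(r"^(Upload|Download) capacity: ([\d.]+) Mbps$", re.MULTILINE)


def parse_capacity(summary):
    """Return {'upload': Mbps, 'download': Mbps} from a networkQuality summary."""
    return {direction.lower(): float(value)
            for direction, value in CAPACITY_RE.findall(summary)}


def percent_change(before, after):
    """Relative change per direction, in percent (negative means slower)."""
    return {key: (after[key] - before[key]) / before[key] * 100 for key in before}


off = parse_capacity(SUMMARY_OFF)
on = parse_capacity(SUMMARY_ON)
for direction, change in percent_change(off, on).items():
    print(f"{direction}: {off[direction]} -> {on[direction]} Mbps ({change:+.1f}%)")
```

For these two runs the upload capacity drops by roughly a quarter with location services enabled, while download and RPM barely move.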

Ping is too high when the upload direction is fully loaded. Is there anything the AP can do to reduce latency when the client-to-AP stream is loaded?

@ka2107 mt7915 driver issue

Can you please elaborate more? Is the issue described anywhere (forum post, bug report in GitHub etc.)?

I theorize that perhaps the up/down problem might be related to txpower - and perhaps some of the problems those on this other thread experienced are related to the string of bugs on this thread. They also asked what version of openwrt the fixes landed in, and I've lost track...

One potentially useful thing to try is a long duration flent rrul test, stepping down the power every few seconds, to watch what happens.
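A possible way to script that experiment, as a dry-run sketch only: it builds the list of `iw` commands and the matching flent length but does not execute anything. The interface name `wlan0`, the 20 dBm starting point, the 3 dB step and the 10 second interval are all placeholders I picked, not values from this thread.

```python
def txpower_schedule(iface, start_dbm, stop_dbm, step_db, interval_s):
    """Return (offset_seconds, argv) pairs that lower txpower step by step."""
    if step_db <= 0 or interval_s <= 0:
        raise ValueError("step_db and interval_s must be positive")
    schedule = []
    offset = 0
    dbm = start_dbm
    while dbm >= stop_dbm:
        # iw takes the fixed power in mBm (1 dBm == 100 mBm).
        argv = ["iw", "dev", iface, "set", "txpower", "fixed", str(dbm * 100)]
        schedule.append((offset, argv))
        offset += interval_s
        dbm -= step_db
    return schedule


INTERVAL_S = 10  # assumed step interval
steps = txpower_schedule("wlan0", start_dbm=20, stop_dbm=2, step_db=3,
                         interval_s=INTERVAL_S)

# Make the rrul run long enough to cover the last power step as well.
test_length_s = steps[-1][0] + INTERVAL_S
flent_argv = ["flent", "rrul", "-l", str(test_length_s), "-H", "<Router_IP>"]

print(" ".join(flent_argv))
for offset, argv in steps:
    print(f"t+{offset:>3}s: {' '.join(argv)}")
```

The printed commands would be run on the AP (the `iw` ones) and on the client (the flent one) by whatever means is convenient; lining the offsets up against the flent plot then shows where latency or throughput starts to fall apart.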

Something is still not as it should be. After applying @nbd's patches (330-336, 337, 338 and 339), the speed, ping and network responsiveness are good for many users, but in my opinion still not what they should be. Why, for example, are iperf results always higher than results from websites such as speedtest.net, on average by up to 50%, even though the link (wan, wwan ...) is capable of higher speeds, and regardless of signal strength, noise, distance, obstacles, number of clients ...? Of course, we are talking about wireless connectivity, because everything is fine over cable. But whatever... it must have always been a negative feature of every version of OpenWrt.

Random errors, which have been flooding the log for the last five and a half days, in practice cause a random client to be temporarily disconnected from the station for several tens of seconds; it helps to wait or sometimes to restart the connection on the client (Android).

[153461.316027] ath10k_ahb a000000.wifi: Invalid peer id 139 peer stats buffer
[155486.431578] ath10k_ahb a000000.wifi: Invalid peer id 142 peer stats buffer
[232917.816463] ath10k_ahb a000000.wifi: Invalid peer id 272 peer stats buffer
[233326.526645] ath10k_ahb a000000.wifi: Invalid peer id 273 peer stats buffer
[247746.452125] ath10k_ahb a000000.wifi: Invalid peer id 285 peer stats buffer
[386441.138656] ath10k_ahb a000000.wifi: Invalid peer id 359 peer stats buffer
[432035.903251] ath10k_ahb a000000.wifi: Invalid peer id 394 peer stats buffer

I am seeing the same issue as well. I see a ton of:

ath10k_pci 0000:01:00.0: received unexpected tx_fetch_ind event: in push mode

and

ath10k_pci 0000:01:00.0: Invalid peer id 61 peer stats buffer

and

ath10k_pci 0000:01:00.0: failed to lookup txq for peer_id 351 tid 0

I am seeing clients randomly get disconnected or traffic stopping completely on clients for a few seconds (I am not sure exactly how many seconds). I am testing primarily with macOS clients - macbook pro, m1, 2x2, macOS 12.4
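When comparing builds it can help to count these messages rather than scroll through the log. A minimal sketch that tallies the three ath10k message types quoted in this thread; the category names are made up, and the sample lines are copied from the posts above.

```python
import re
from collections import Counter

# One pattern per message type quoted in this thread.
PATTERNS = {
    "invalid_peer_stats": re.compile(r"Invalid peer id (\d+) peer stats buffer"),
    "unexpected_tx_fetch_ind": re.compile(
        r"received unexpected tx_fetch_ind event: in push mode"),
    "txq_lookup_failed": re.compile(r"failed to lookup txq for peer_id (\d+) tid (\d+)"),
}


def tally_ath10k_errors(lines):
    """Count known ath10k error messages in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        if "ath10k" not in line:
            continue
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
                break
    return counts


SAMPLE_LOG = [
    "[153461.316027] ath10k_ahb a000000.wifi: Invalid peer id 139 peer stats buffer",
    "[155486.431578] ath10k_ahb a000000.wifi: Invalid peer id 142 peer stats buffer",
    "ath10k_pci 0000:01:00.0: received unexpected tx_fetch_ind event: in push mode",
    "ath10k_pci 0000:01:00.0: Invalid peer id 61 peer stats buffer",
    "ath10k_pci 0000:01:00.0: failed to lookup txq for peer_id 351 tid 0",
]

counts = tally_ath10k_errors(SAMPLE_LOG)
for name, count in sorted(counts.items()):
    print(f"{name}: {count}")
```

Feeding it the full kernel log before and after a patch gives a quick "fewer or not" answer per message type.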

The patch from post may help with the first and maybe the last of these errors.

I am now testing the patch below, and it is also free of errors 1 and 3.

--- a/drivers/net/wireless/ath/ath10k/mac.c
+++ b/drivers/net/wireless/ath/ath10k/mac.c
@@ -4764,29 +4764,31 @@
 					struct ieee80211_txq *txq)
 {
 	struct ath10k *ar = hw->priv;
-	int ret;
-	u8 ac;
+	u8 ac = txq->ac;
+	int ret = 0;
 
 	ath10k_htt_tx_txq_update(hw, txq);
 	if (ar->htt.tx_q_state.mode != HTT_TX_MODE_SWITCH_PUSH)
 		return;
 
-	ac = txq->ac;
 	ieee80211_txq_schedule_start(hw, ac);
 	txq = ieee80211_next_txq(hw, ac);
+	ieee80211_txq_schedule_end(hw, ac);
 	if (!txq)
-		goto out;
+		return;
 
 	while (ath10k_mac_tx_can_push(hw, txq)) {
 		ret = ath10k_mac_tx_push_txq(hw, txq);
 		if (ret < 0)
 			break;
 	}
-	ieee80211_return_txq(hw, txq, false);
-	ath10k_htt_tx_txq_update(hw, txq);
-out:
-	ieee80211_txq_schedule_end(hw, ac);
-}
+	if (ret == -EBUSY) {
+		ieee80211_txq_schedule_start(hw, ac);
+		ieee80211_return_txq(hw, txq, false);
+		ieee80211_txq_schedule_end(hw, ac);
+	}
+	ath10k_htt_tx_txq_update(hw, txq);
+}
 
 /* Must not be called with conf_mutex held as workers can use that also. */
 void ath10k_drain_tx(struct ath10k *ar)

Does this patch migrate ath10k txq scheduling to mac80211 and so help with airtime fairness? I am still catching up on the ath10k code base.

thanks!

I don't know whether it helps or not; practical tests suggest it is unlikely to make things worse than they already are. The patch is probably from 2015 or 2018, found somewhere on the internet, and I don't think it was ever included in any official branch. The errors still appear in the log, but there are fewer of them.
In kmod-ath10k-ct it seems to be structured differently again.

I am confused as to which patches and firmware you are testing against?

see: Ipq806x NSS build (Netgear R7800 / TP-Link C2600 / Linksys EA8500) - #2329 by vochong

I am running openwrt 22.03 tip with all of nbd's atf/aql patches with mainline ath10k