AQL and the ath10k is *lovely*

Yup, that's the only change I made to my R7800 with the latest pull from 21.02. AQL is enabled.

Edit: In addition to my NSS additions that is.

I think I may have stumbled across why the new virtual time-based airtime scheduler is causing problems for ath10k routers. I think the issue can be traced to one function (ath10k_mac_op_wake_tx_queue) in the ath10k driver, which tries to schedule packets to wireless clients.

From what I understand, mac80211 calls this function to make a transmit queue (txq) active in the firmware so that it can start transmitting the packets in that queue. Instead of activating the txq that mac80211 asked for, the function looks up another txq (via ieee80211_next_txq) and activates that one. It also updates the firmware with accounting information about the txq's transmit duration, and it does so both for the txq that mac80211 passed in and for the new txq that it found.

So the net effect is that the firmware may try to send packets for a txq that mac80211 did not intend, or that has nothing to send, while the incorrect accounting artificially starves the txq that does have data to transmit. This probably will not affect routers with a small number of clients. It will probably be more pronounced with more clients or as uptime grows. That may explain why the WiFi interface needed to be restarted every once in a while.

I think the following patch should solve the problem.

--- a/drivers/net/wireless/ath/ath10k/mac.c
+++ b/drivers/net/wireless/ath/ath10k/mac.c
@@ -4601,17 +4601,15 @@ static void ath10k_mac_op_wake_tx_queue(
 {
 	struct ath10k *ar = hw->priv;
 	int ret;
-	u8 ac;
 
-	ath10k_htt_tx_txq_update(hw, txq);
 	if (ar->htt.tx_q_state.mode != HTT_TX_MODE_SWITCH_PUSH)
 		return;
 
-	ac = txq->ac;
-	ieee80211_txq_schedule_start(hw, ac);
-	txq = ieee80211_next_txq(hw, ac);
 	if (!txq)
-		goto out;
+		return;
+
+	if (!ieee80211_txq_may_transmit(hw, txq))
+		return;
 
 	while (ath10k_mac_tx_can_push(hw, txq)) {
 		ret = ath10k_mac_tx_push_txq(hw, txq);
@@ -4620,8 +4618,6 @@ static void ath10k_mac_op_wake_tx_queue(
 	}
 	ieee80211_return_txq(hw, txq, false);
 	ath10k_htt_tx_txq_update(hw, txq);
-out:
-	ieee80211_txq_schedule_end(hw, ac);
 }
 
 /* Must not be called with conf_mutex held as workers can use that also. */

My R7800 is currently not available for testing. Is anyone willing to test the patch? @KONG, fancy a try, as you seem to have a setup that could simulate this problem?

I can give it a test this evening (in ~5 hours or so) on my R7800

@quarky I am using a Belkin RT3200, which uses MediaTek mt76 WiFi, on OpenWrt 22.03-rc1. Do you know if mt76 is also affected by the AQL virtual time-based airtime scheduler issue, similar to ath10k?

To all who have Wi-Fi issues: have you checked the WLAN statistics graphs? Are there any unusual records for signal quality drops, PHY rates, or associated clients at the time you experienced the issues?

My test with the Linksys E8450 suggests it's not affected.

Just to follow up on my last post, I did create a build with ath10k and this patch applied, and it built without issues and flashed fine onto my R7800. I was experiencing issues, however, with the build in that maybe 5 or so of my IoT devices were not connecting. I don't know if the root cause of that was the patch, or if it was just switching to ath10k - as I usually just use ath10k-ct. Long story short though is that I don't think I'll be of much use to you in testing this, as I've already switched back to my "-ct" build.

No worries. Appreciate you taking the time to help test the patch.

Can't we test this on ath10k-ct too?

On an ea6350v3 with kmod-ath10k-ct, 3.6.140 firmware, on a snapshot build (kernel 5.10.111), it does not work :slight_smile: ... probably.

Looks like the ct driver has the same issue. The patch should apply cleanly to the ct driver as well.

Applies without errors on latest master.

Are you trying on ct driver?

Yeah sorry, should have stated that.

On Sat, 14 May 2022 at 9:02, quarky via OpenWrt Forum <mail@forum.openwrt.org> wrote:

Should your patch also be placed in /package/kernel/mac80211/patches/ath10k for -ct testing?

No. It should be placed in the ath10k-ct package, not in mac80211 package.

Edit: the patch file will have to be modified (location of file to patch) tho to follow the ath10k-ct file structure.
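For anyone adapting the patch, here is a rough sketch of rewriting the file paths in the diff headers with sed. The function name `retarget_patch` and its two arguments are made up for this example, and the target directory in the usage comment is only an assumption about the ath10k-ct layout, so check your own tree before using it. Prefixes containing regex metacharacters would need escaping.

```shell
#!/bin/sh
# Sketch: rewrite the ---/+++ file paths of a unified diff read from stdin.
# retarget_patch and both arguments are hypothetical names for illustration.
retarget_patch() {
    old="$1"   # path prefix used in the original patch
    new="$2"   # path prefix expected by the target package
    sed -e "s|^--- a/$old/|--- a/$new/|" \
        -e "s|^+++ b/$old/|+++ b/$new/|"
}

# Usage (the target directory name is an assumption; check your ath10k-ct tree):
#   retarget_patch drivers/net/wireless/ath/ath10k ath10k-5.15 < orig.patch > ct.patch
```

Only the two header lines of each file section are touched; the hunks themselves are passed through unchanged.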

As I also have the problem, I am testing your patch on an R7800 with the normal ath10k driver (the ct driver does not work for me without bugs).

Before the patch, I had to disable the airtime scheduling by setting airtime_flags to 0; otherwise I got low throughput and latency spikes after some time.

echo 0 > /sys/kernel/debug/ieee80211/phy[01]/airtime_flags
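Depending on the shell, a glob like phy[01] in a redirect target may not expand to both files, so a loop is the safer way to write the same thing. A minimal sketch, assuming debugfs is mounted at the path quoted above; the function name `set_airtime_flags` and its optional second argument are made up for this example:

```shell
#!/bin/sh
# Sketch: write one value to airtime_flags on every phy.
# The function name and the optional root argument are hypothetical;
# the default path is the debugfs location quoted in the post above.
set_airtime_flags() {
    value="$1"
    root="${2:-/sys/kernel/debug/ieee80211}"
    for f in "$root"/phy*/airtime_flags; do
        # If the glob matched nothing, the pattern is left as-is; skip it.
        [ -e "$f" ] || continue
        echo "$value" > "$f"
        echo "$f = $(cat "$f")"
    done
}

# Usage on the router (needs root):
#   set_airtime_flags 0
```

The setting does not persist across a reboot or a WiFi restart, so it has to be reapplied afterwards.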

I will see if it helps.

It seems it does not work. After some time I get latency spikes again. Also, if I leave the house with my phone, the WiFi in the house stops working for some time.

How can I debug it? I can send logs to a local syslog server for further debugging.

Thanks for taking the time to test. Looks like it's back to the drawing board.

I'm quite certain the patch is still OK tho, as the original ath10k code doesn't make sense.

I'll try to free up my R7800 and do more experiments with it. I'm very much interested in finding the root cause.

@quarky Not sure if you are aware of the bufferbloat.net make-wifi-fast mailing list, but this was brought up by @dtaht over there.

https://lists.bufferbloat.net/pipermail/make-wifi-fast/2022-May/003358.html

Disclaimer: I am just a reader, not a developer. I have no clue about how to fix this issue.