AQL and the ath10k is *lovely*

It seems it does not work. After some time I get latency spikes. Also, if I leave the house with my phone, the Wi-Fi in the house stops working for some time.

How can I debug it? I can send logs to a local syslog server for further debugging.

Thanks for taking the time to test. Looks like it's back to the drawing board.

I'm quite certain the patch is still OK tho, as the original ath10k code doesn't make sense.

I'll try to free up my R7800 and do more experiments with it. I'm very much interested in finding the root cause.


@quarky Not sure if you are aware of the bufferbloat.net make-wifi-fast mailing list, but this was brought up by @dtaht over there.

https://lists.bufferbloat.net/pipermail/make-wifi-fast/2022-May/003358.html

Disclaimer: I am just a reader, not a developer. I have no clue about how to fix this issue.

Thank you for the info. Related question: do you know if AQL is enabled/active on the Belkin RT3200's 5 GHz band when it runs in 802.11ax mode (HE20, HE40, HE80) and is connected to a Wi-Fi 6 client/station? The only info I found talks about ath11k APs (stock version) not containing any AQL fixes. I found no info about mt76 802.11ax APs.


From what I understand of the source code, AQL is managed in mac80211, so it will be enabled in all routers using mac80211, which the Belkin RT3200 is also using. So my understanding is that yes, RT3200 is also running with AQL, regardless of whether it is G/N/AC/AX or the channel width.


The reply in that link suggests that the push/pull mode will be broken, but that doesn't seem to be the case from what @celle1234 has tested? It looks like the behaviour is unchanged with or without the patch?

I noticed another difference between the old and new airtime scheduler tho. In the new scheduler, the ieee80211_return_txq() method will remove a txq from its scheduled queues if the txq becomes empty, while in the old method it just leaves it in the queued data structure. I do not see anywhere in mac80211 where the removed queues are put back into the rb_tree, unless the client goes into power save mode and back. I've probably missed it because I'm not familiar enough with mac80211's flows.

I would imagine that over time the ath10k driver may have buffered frames to transmit, while mac80211 does not have the corresponding txq scheduled for transmission in its rb_tree?

Maybe the simple solution is to just force it back into the rb_tree by setting the force flag to true for ath10k when the method is called?

Is anyone familiar enough with mac80211 to comment on this difference in behaviour?
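
To make the concern above a bit more concrete, here is a toy model in Python. It is only a sketch of the behaviour as described in this post, not of the real mac80211 code: the class, the `has_frames` argument and the station name are all made up for illustration, and the real `ieee80211_return_txq()` may well behave differently.

```python
# Toy model only: this is NOT mac80211 code. It mimics the behaviour
# described in the post above, with hypothetical names throughout.

class ToyScheduler:
    def __init__(self, remove_when_empty):
        # True models the new (virtual time-based) scheduler as described
        # above; False models the old scheduler's behaviour.
        self.remove_when_empty = remove_when_empty
        self.scheduled = set()  # stand-in for the rb_tree / queue list

    def schedule(self, txq):
        self.scheduled.add(txq)

    def return_txq(self, txq, has_frames, force=False):
        """Loose analogue of ieee80211_return_txq() as described above."""
        if has_frames or force:
            self.scheduled.add(txq)
        elif self.remove_when_empty:
            self.scheduled.discard(txq)
        # Otherwise (old behaviour): leave the txq where it is.


def still_scheduled(remove_when_empty, force):
    """Return True if an empty txq is still scheduled after being returned."""
    sched = ToyScheduler(remove_when_empty)
    sched.schedule("sta1")
    # Assumption: mac80211's queue for sta1 is empty at this point, while
    # the driver/firmware may still hold buffered frames for it.
    sched.return_txq("sta1", has_frames=False, force=force)
    return "sta1" in sched.scheduled


if __name__ == "__main__":
    print("old scheduler:        ", still_scheduled(False, force=False))
    print("new scheduler:        ", still_scheduled(True, force=False))
    print("new scheduler + force:", still_scheduled(True, force=True))
```

In this model the empty queue drops out of the schedule only in the "new scheduler" case without the force flag, which is the situation the question above is about. Whether the real code ever re-adds the queue by another path is exactly what the model cannot answer.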


Not sure if this commit applies: https://github.com/openwrt/openwrt/commit/96012227e578a0d8dcfa86823db97345e98e2c8f


I'll try to backport this to the 21.02 tree and see if it resolves the high latency spikes I see on my R7800.


For me the patch makes things worse: almost no throughput (< 1 Mbit/s), and unstable.


My R7800 running the patch on the virtual time-based airtime scheduler seems to be working fine tho. YouTube and mobile games seem to be OK. It's still early days tho, as my R7800 has been up for only 15 hours. So far, no complaints ... yet.

Btw, my R7800 is running a 21.02 branch that I forked for my ipq806x with NSS acceleration, and it is running an older version of the ath10k firmware. The patch seems to apply OK to the 5.10-mac80211 backports, but I needed to download the linux/minmax.h include file from the 5.10 tree and adjust the patch lines.

Are you running yours on the master branch? Master uses a new ath10k firmware. Wonder if firmware makes any difference.


Hello, because QoS does not seem to work with PPPoE on NSS, I use mainline OpenWrt on the 22.03 branch. There I downloaded the 330-.. and 331-.. patch files for the mac80211 subsystem from master and built it. For less than a second the speed seems to be OK, but then it is terrible. Websites load very slowly and YouTube videos do not play. Disabling aql_enable or airtime_flags does not help either. Now I am back on 22.03 with airtime_flags set back to 0.

This is interesting then.

22.03 uses the same mac80211 backports version (5.15.33) as master, but uses an older version of the ath10k firmware.

21.02 uses an older version of mac80211 backports (5.10.110), and I think uses the same version of the ath10k firmware as 22.03.

Wonder if the issue is with code version or environment, or a combination of both.

With latest @ACwifidude NSS master build and mac80211 patch I have the usual good Wi-Fi performance (signal and speeds both 2.4 and 5GHz). Hope that it resolves latency issues that some people have.

Edit - after two days of use the 5 GHz Wi-Fi completely died and I had to revert to the older master version.


To clarify: “usual WiFi speed”; as in “good performance” or “bad performance”?


Sorry for the short wording. I meant that I see the usual very good Wi-Fi performance that I have always had with the ath10k drivers.
I'll add that I never experienced any latency issues (the ones other people have) prior to the latest mac80211 patch.

Edit - the above is no longer valid with latest patches so just ignore it.


Are we talking CT firmware or mainline firmware? Does the driver also matter, CT vs mainline?

Trying to build a table of WTF is working and what isn't, is on my mind.

Nailing this set of bugs to the wall and closing it out is proving to be almost as difficult as this one was:

And I DON'T want to spend (our) summer vacation on it.


On a master snapshot (kernel 5.15.38) without any patches I got a transfer rate of about 18-20 Mbps at 10 meters in a rather crowded environment (two floors, two ceilings, a partition wall ...), with ping at about 30 +/- 5 ms. For comparison, I get about 35 Mbps over cable, and right next to the device the Wi-Fi is not much better than at 10 meters on the smartphone. After applying the fix, the speed is similar (maybe a bit higher, because it sometimes reaches 24 Mbps), but the ping is about 5 ms better on average. Note that I also applied another fix (https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.15.y&id=4bcc2ab96fce497ab9c7863dd051bdcf0a5018bf). It seems better than before the fixes, but it needs more time to judge, and on an LTE connection it is like reading tea leaves :sweat_smile:.

ath10k firmware ver 10.4-3.6-00140 and kmod-ath10k-ct. Tested on the 2.4 GHz band. Linksys EA6350v3 (ipq4019).

Bad news. The patch didn't resolve the high latency issue I'm facing with the new VTBA scheduler. I'm reverting my build back to the round-robin scheduler.

OK... After a cursory test, it appears that the fixes improve neither bandwidth nor access times. After applying the patch, strange messages appeared in the log ("ath10k_ahb a000000.wifi: received unexpected tx_fetch_ind event: in push mode" and "ath10k_ahb a000000.wifi: DFS region 0x0 not supported, will trigger radar for every pulse") for both 2.4 GHz and 5 GHz. I tried various firmware versions but it did not help. After the messages appeared, the link degraded, and as a consequence there was no normal access after a few hours (throughput fell to as little as 1/10 of what it was right after restarting the device, which ruled out normal use; ping was fairly normal).

After restoring the previous state without the fixes: no errors, normal transfers and normal ping (for Internet over LTE, that is :grin:). Also, what I wrote earlier about the fixes may have been premature, although they may work on some devices. On the EA6350v3 they unfortunately do not work.
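
For anyone collecting logs on a syslog server, a small Python sketch like the one below could help spot when these two messages start showing up. The message fragments are copied from the lines quoted above; the function name and the short labels are made up for this example, and it assumes the log is available as plain text lines.

```python
import re
from collections import Counter

# Message fragments copied from the log lines quoted above; the short
# keys are made-up labels for this sketch.
PATTERNS = {
    "tx_fetch_ind_in_push_mode": re.compile(
        r"received unexpected tx_fetch_ind event: in push mode"),
    "dfs_region_unsupported": re.compile(
        r"DFS region 0x0 not supported"),
}


def count_ath10k_warnings(lines):
    """Count how many lines contain each of the known message fragments."""
    counts = Counter()
    for line in lines:
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[label] += 1
    return counts
```

It takes any iterable of strings, so an open file object for a saved log works as well as a list. It only counts occurrences; it says nothing about what causes the messages.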
