AQL and the ath10k is *lovely*

I do not know the exact details, but as far as I know it should. Probably best to wait for a confirmation from someone that does know the details, though :slight_smile:

From what I can understand of the ATF code, the flags control the use of TX and RX airtime accounting. If set, the TX/RX time will be added to the ATF airtime consumption of a client transmit queue, which will then be used to determine if the client's transmit queue has used up its share of transmit time. So clearing the flags means the transmit queue will always be eligible for transmit, as the code will think that no airtime has been consumed.
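A minimal sketch of how a flag value could map to the two accounting modes. It assumes bit 0 enables TX accounting and bit 1 enables RX accounting, which is my reading of the mac80211 source and is not confirmed in this thread. `decode_airtime_flags` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: decode an airtime_flags value into TX/RX accounting state.
# Assumption: bit 0 = TX airtime accounting, bit 1 = RX airtime accounting.
decode_airtime_flags() {
    flags=$1
    tx=off
    rx=off
    if [ $((flags & 1)) -ne 0 ]; then tx=on; fi
    if [ $((flags & 2)) -ne 0 ]; then rx=on; fi
    echo "tx=$tx rx=$rx"
}

decode_airtime_flags 3   # both accounting modes enabled
decode_airtime_flags 0   # nothing is charged to any transmit queue
```

With both bits cleared, no airtime is ever charged, which matches the "always eligible for transmit" behaviour described above.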

But because the new VTAF queue data structure is organised in a red-black tree structure instead of a circular queue buffer (i.e. round-robin), I think the behaviour will be different. Only way is to test it out I suppose.
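To illustrate why an ordered structure can behave differently from round-robin, here is a toy model, not the kernel code. It always picks the station with the least accumulated airtime, which is roughly what looking up the lowest key in an ordered tree gives you. The station names and per-frame airtime costs are made up for illustration.

```shell
#!/bin/sh
# Toy model only: station names and per-frame airtime costs are made up.

# Pick the station with the least accumulated airtime.
# stdin: lines of "name airtime_us"
pick_lowest() {
    sort -k2,2n | head -n 1 | cut -d' ' -f1
}

# Run $1 scheduling rounds with a fast and a slow client and print the order.
simulate() {
    rounds=$1
    fast=0
    slow=0
    order=""
    i=0
    while [ "$i" -lt "$rounds" ]; do
        next=$(printf 'fast %s\nslow %s\n' "$fast" "$slow" | pick_lowest)
        case "$next" in
            fast) fast=$((fast + 100)) ;;  # assumed 100 us of airtime per frame
            slow) slow=$((slow + 400)) ;;  # assumed 400 us of airtime per frame
        esac
        order="$order${order:+ }$next"
        i=$((i + 1))
    done
    echo "$order"
}

simulate 5
```

A strict round-robin would alternate the two clients. In this model the slow client is picked less often because each of its frames costs more airtime.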


Already did above and in another thread (my posts can be a bit verbose).

EDIT: this command

cat /sys/kernel/debug/ieee80211/phy0/ath10k/wmi_services | grep -E '(AIRTIME|PEER_STATS)'

is useful if you start digging into different devices/drivers.
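If the device has more than one ath10k radio, the same check can be looped over every phy. This is only a sketch: `list_airtime_services` is a hypothetical helper name, and the optional base-directory argument exists purely to make it easy to try against a copy of the files.

```shell
#!/bin/sh
# Hypothetical helper: run the grep above for every phy that has an ath10k
# wmi_services file. $1 optionally overrides the debugfs base directory.
list_airtime_services() {
    base="${1:-/sys/kernel/debug/ieee80211}"
    for f in "$base"/phy*/ath10k/wmi_services; do
        [ -r "$f" ] || continue          # skip phys without this file
        phy=${f#"$base"/}                # strip the base directory
        phy=${phy%%/*}                   # keep only "phyN"
        echo "== $phy"
        grep -E '(AIRTIME|PEER_STATS)' "$f" || true
    done
}

# Usage on the router:
#   list_airtime_services
```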

If anyone is interested, skim through this thread starting at Dec 21. There are some details there about how ATF is different between devices (and drivers), how to disable it if you compile your own driver (but nbd already showed you how to do that above - see also this post in this thread one to two years ago...), how to get ATF on the r7500v2 with the ath10k-ct and how to enable/disable ATF via the ath10k-ct fwcfg api (but you will have to adapt that patch for the r7800 and compile your own driver). This thread is not a guide - just me trying to figure out why wifi sucks.

FWIW AQL/ATF can make wifi better - but it is a bit bugged atm.

As mentioned above, /sys/kernel/debug/ieee80211/phy0/airtime_flags apparently does not disable it or revert it to the round-robin scheduler, nor does playing with the ATF settings in hostapd (currently). I hope this changes in the future as well.


From yesterday's tests on master (kernel 5.15.45) with the latest patches from @nbd added, I can say that it is better: it no longer throws errors in the log with more than 5 active clients, and thus does not block traffic. However, the downside I've noticed over the past few days is mediocre throughput at long distances (in my case a room with heavy signal attenuation, 10 meters from the device).

There is another way to increase or decrease the range: the board.bin files used for calibration along with the router's firmware.bin. Depending on their settings, these board.bin files can degrade or improve not only the transfer rate, but also the access times and the maximum range. Unfortunately I haven't found a way to edit those files; I have only found a dozen or so ready-made ones, which differ in the above properties.

As for the tests from the first paragraph: firmware and kmod without -ct, and the newest standard board.bin file. At 1-2 meters I get transfers of 38-42Mbps and a ping of 23-27, probably the best so far on my ea6350v3. Over a cable the ping is 14-17 and transfers reach 45Mbps. At 10 meters it is already worse: the ping is 25 to 32 (which is understandable), but transfers drop to 5-8Mbps. The range has also decreased dramatically.

Repeating the same tests after replacing the board.bin file with the one I used previously (a prepared one, found on the web) improves the maximum range by up to 100%, and thus the speed at long distances, but worsens the speed by up to 50% at short distances. On the previous stable release (21.02), using the same board.bin file, the results from far away were comparable with those from nearby, i.e. about 20-25Mbps, although the pings differed. I believe that AQL is not solely, or even mostly, to blame for the poor wifi results.


r7500v2 (QCA9980),
ath10k-ct with htt firmware,
master at commit 24e27bec9 (jun 21),
plus your commit 8c042341e4b (all patches)

Builds, loads, and runs fine so far. I've only tested a few minutes but I'll let this one run unless there is another change to test later.

I cannot reproduce @sjpacket's throughput reduction observations upon devices disconnecting or reconnecting either with or without your latest commit. Throughput on a device doing netperf/iperf will change drastically when other clients transmit data; however, throughput always recovers for me if the data transfer from other clients is transient.

I also do not have the latency symptoms others report with QCA9984 either with or without your latest commit - not surprising given my device does not support ATF and the VTBS (without modification at least).

HTH


Not sure if this can help to get you started (but I suspect you might have already seen that post). What to change in those files is likely the real challenge. Good luck.


As indicated in my most recent response to nbd above, I cannot reproduce your observations either with or without nbd's latest commit.

One thing to look out for is that throughput will change if other clients transmit data. If possible make sure the other devices are not checking for updates or otherwise communicating upon connecting when observing *perf output. Throughput should recover if the drop in *perf is due to other clients (transiently) using the network.

Otherwise, welcome to the club of "I have a rare device and it behaves differently than almost everyone else."

Have fun - I do.

EDIT: if you are testing with kernel 5.15, try 5.10. Last time I tried kernel 5.15, I had wifi issues.


For now I have disabled AQL and ATF, and I restart my WiFi radios at 5am every day, so maybe that fixes the WiFi issues.

Disabled AQL and ATF:
(/etc/rc.local or LuCI → System → Startup → Local Startup tab)

# Put your custom commands here that should be executed once
# the system init finished. By default this file does nothing.

# Disable Airtime Queue Limits (AQL) (default value was 1)
echo 0 > /sys/kernel/debug/ieee80211/phy0/aql_enable
echo 0 > /sys/kernel/debug/ieee80211/phy1/aql_enable

# Disable Airtime Fairness (ATF) (default value was 3)
echo 0 > /sys/kernel/debug/ieee80211/phy0/airtime_flags
echo 0 > /sys/kernel/debug/ieee80211/phy1/airtime_flags

exit 0

Restart wireless radios:
(/etc/crontabs/root or LuCI → System → Scheduled Tasks)

# Restart wireless radios at 5am every day
0 5 * * * wifi down && wifi up

Reboot the router and then check:

cat /sys/kernel/debug/ieee80211/phy0/aql_enable
cat /sys/kernel/debug/ieee80211/phy1/aql_enable
cat /sys/kernel/debug/ieee80211/phy0/airtime_flags
cat /sys/kernel/debug/ieee80211/phy1/airtime_flags

Output:

root@OpenWrt:~# cat /sys/kernel/debug/ieee80211/phy0/aql_enable
0
root@OpenWrt:~# cat /sys/kernel/debug/ieee80211/phy1/aql_enable
0
root@OpenWrt:~# cat /sys/kernel/debug/ieee80211/phy0/airtime_flags
root@OpenWrt:~# cat /sys/kernel/debug/ieee80211/phy1/airtime_flags
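
The per-radio lines above could also be written as a loop, so that the same rc.local works on devices with a different number of radios. This is a sketch under that assumption. `set_debugfs_knob` is a hypothetical helper name, and the optional third argument is only there so the function can be tried against a scratch directory.

```shell
#!/bin/sh
# Hypothetical helper: write a value to the same debugfs knob on every phy.
# $1 = knob name (aql_enable or airtime_flags)
# $2 = value to write
# $3 = debugfs base directory (optional; defaults to the path used above)
set_debugfs_knob() {
    knob=$1
    value=$2
    base="${3:-/sys/kernel/debug/ieee80211}"
    for f in "$base"/phy*/"$knob"; do
        [ -w "$f" ] || continue          # skip phys that lack this knob
        echo "$value" > "$f"
        echo "$f set to $value"
    done
}

# Usage in /etc/rc.local, before "exit 0":
#   set_debugfs_knob aql_enable 0
#   set_debugfs_knob airtime_flags 0
```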

I don't have many issues with throughput either (using r7800), especially since my switch from ath10k-ct to ath10k.

One thing to keep in mind is that WiFi is always a half-duplex communication technology (even with MU-MIMO support in 11ac Wave 2 and 11ax), so data transmissions from other clients will reduce the throughput you measure, especially if they flow in the opposite direction to the transfer on your measuring client. For the same reason, it makes sense not to try to measure bidirectional throughput over WiFi. Wireless technologies like 4G/5G (the FDD variety) can handle full-duplex transmissions because uplink and downlink can transmit concurrently on two different frequency bands.

@72105

Please don't restart your WiFi every day. Keep it running without restarting because some problems may only arise after several days. We also don't want perfection in anything because perfection does not exist (in the general sense of Heisenberg’s uncertainty principle), we just want things to work reasonably well and reliably for extended periods of time :slight_smile:


@quarky:

From what I can understand of the ATF code, the flags control the use of TX and RX airtime accounting. If set, the TX/RX time will be added to the ATF airtime consumption of a client transmit queue, which will then be used to determine if the client's transmit queue has used up its share of transmit time. So clearing the flags means the transmit queue will always be eligible for transmit, as the code will think that no airtime has been consumed.

This looks like a Quick-and-Dirty way to eliminate that high latency problem. All-you-can-eat buffet à l'américaine :slight_smile:

Well, an all-you-can-eat buffet will cause high latency for latency-sensitive traffic like VoIP or online gaming. For example, downloading a large software update file will drown out VoIP traffic. ATF is supposed to solve this problem, but we'll have to work together to iron out the bugs.


@quarky, It was more of a joke, but theoretically a possible Q&D workaround for my problem.

My main problem is extremely high latency for associated clients when some clients leave the house. My guess is that some corner-case logic may have depleted the associated clients of their transmit queues and rendered them unable to send/receive anything for a few minutes (aka high latency). Your original explanation mentioned that disabling the ATF flags would make the client transmit queues always available for transmit, so theoretically it would remedy my high latency problem in a Q&D way :-). As a matter of fact, so far it seems to help that way after disabling the ATF flags (I kept AQL enabled).

Ah ... apologies. Didn't get the joke.

I think this is a good work-around, since AQL will limit the airtime as well.

Unfortunately, errors still pop up (we'll see if the network becomes useless)...

[50208.914881] ath10k_ahb a000000.wifi: received unexpected tx_fetch_ind event: in push mode
[50208.921160] ath10k_ahb a000000.wifi: received unexpected tx_fetch_ind event: in push mode
[50208.929182] ath10k_ahb a000000.wifi: received unexpected tx_fetch_ind event: in push mode
[50208.937336] ath10k_ahb a000000.wifi: received unexpected tx_fetch_ind event: in push mode
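
To see whether these messages keep accumulating, one option is to count them in the kernel log from time to time. A small sketch follows. `count_tx_fetch_ind` is a hypothetical helper name, and the match string is copied from the log lines above.

```shell
#!/bin/sh
# Hypothetical helper: count "unexpected tx_fetch_ind" messages in log text.
# Reads the log on stdin and prints the number of matching lines.
count_tx_fetch_ind() {
    # grep -c exits non-zero when the count is 0, so mask that status
    grep -c 'received unexpected tx_fetch_ind event' || true
}

# Usage on the router:
#   dmesg | count_tx_fetch_ind
```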

I'm joining in the fun and have applied @nbd's experimental patch to both 21.02 and master branches.

My R7800 is running the 21.02 build while my Linksys E8450 is running the master build. Both have the latest pull from the respective branches.

I must say so far things are looking good after a couple of hours of uptime. No latency issue that I can observe, so I'm optimistic.

Let's see how it goes this round.

I'm a little encouraged. I would like more folks to post the output of
cat /sys/kernel/debug/iee*/phy*/netdev:*/aqm

I have fears that this codepath is not being updated with drops and marks correctly.
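
For anyone collecting this over time, a sketch that pulls just the drops and marks columns out of that output. It assumes each aqm file starts with a header line beginning with `ac` and that the columns are ordered ac, backlog-bytes, backlog-packets, new-flows, drops, marks, and so on. `aqm_drops_marks` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: extract the drops and marks counters from aqm output.
# stdin: contents of one or more aqm files.
# Assumed column order: ac backlog-bytes backlog-packets new-flows drops marks ...
aqm_drops_marks() {
    awk '
        $1 == "ac" { next }    # skip header lines
        NF >= 6    { printf "ac=%s drops=%s marks=%s\n", $1, $5, $6 }
    '
}

# Usage on the router:
#   cat /sys/kernel/debug/iee*/phy*/netdev:*/aqm | aqm_drops_marks
```

Running it twice a few minutes apart under load should show whether either counter moves at all.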


I applied the patches to 22.03 and so far it looks good, but I need more long-term testing.


Please...

/$ cat /sys/kernel/debug/ieee80211/phy0/netdev:wlan0/aqm
ac backlog-bytes backlog-packets new-flows drops marks overlimit collisions tx-bytes tx-packets
2 0 0 13507 0 0 0 1 2174129 13533

and 10 minutes after :wink:

/$ cat /sys/kernel/debug/ieee80211/phy0/netdev:wlan0/aqm
ac backlog-bytes backlog-packets new-flows drops marks overlimit collisions tx-bytes tx-packets
2 0 0 17994 0 0 0 1 2848428 18123
/$ cat /sys/kernel/debug/ieee80211/phy0/netdev:wlan0/aqm
ac backlog-bytes backlog-packets new-flows drops marks overlimit collisions tx-bytes tx-packets
2 0 0 17998 0 0 0 1 2849344 18127

master on 5.15.45, + patches, 2.4GHz (phy0), 5GHz disabled

There are still bugs, but wifi remains usable (i.e. 10 hours after the bugs became visible in the log). There are now 5 wifi clients (2.4GHz): 1 two-stream and 4 one-stream. Wireless is good at close range (1-5 meters, on the same floor, ping 2) and not good at a distance (10 meters), but it works stably.

r7500v2 # cat /sys/kernel/debug/iee*/phy*/netdev:*/aqm
ac backlog-bytes backlog-packets new-flows drops marks overlimit collisions tx-bytes tx-packets
2 0 0 86768 0 0 0 0 21823552 86771
ac backlog-bytes backlog-packets new-flows drops marks overlimit collisions tx-bytes tx-packets
2 0 0 86690 0 0 0 0 21806307 86700
r7500v2 # uptime
 16:20:49 up 1 day,  8:33,  load average: 0.00, 0.01, 0.00

Build as described above.

EDIT: if it helps, a few minutes later

r7500v2 # cat /sys/kernel/debug/iee*/phy*/netdev:*/aqm
ac backlog-bytes backlog-packets new-flows drops marks overlimit collisions tx-bytes tx-packets
2 0 0 87688 0 0 0 0 22095101 87691
ac backlog-bytes backlog-packets new-flows drops marks overlimit collisions tx-bytes tx-packets
2 0 0 87610 0 0 0 0 22077856 87620
r7500v2 # uptime
 16:34:44 up 1 day,  8:47,  load average: 0.08, 0.03, 0.00