R7500v2 (ath10k 9980) netperf observations, firmware/board-2.bin files, and other wifi issues

Enable ATF for ath10k-ct via the fwcfg API; git patch here.

usage (after building and using the ath10k-ct driver patched above):

r7500v2 # cat /sys/kernel/debug/ieee80211/*/ath10k/firmware_info | grep fwcfg:
fwcfg:     fwcfg-pci-0000:01:00.0.txt
fwcfg:     fwcfg-pci-0001:01:00.0.txt

and note the filenames after fwcfg: (they may differ from my case above)

To enable ATF for both radios (you can do either individually if desired):

echo "enable_atf = 1" >/lib/firmware/ath10k/fwcfg-pci-0000:01:00.0.txt
echo "enable_atf = 1" >/lib/firmware/ath10k/fwcfg-pci-0001:01:00.0.txt

This patch only enables ATF. Given quarky's experience, it might be useful to extend the patch above to disable ATF on devices that automatically detect and enable it. That should be straightforward: add an additional "disable_atf" fwcfg entry.

EDIT: tested and currently running on my r7500v2.

I have to confess, it is not clear to me why you need:

--- a/ath10k-5.15/txrx.c
+++ b/ath10k-5.15/txrx.c
@@ -5,6 +5,8 @@
  * Copyright (c) 2018, The Linux Foundation. All rights reserved.
  */
 
+#include <net/mac80211.h>
+
 #include "core.h"
 #include "txrx.h"
 #include "htt.h"
@@ -168,6 +170,8 @@ int ath10k_txrx_tx_unref(struct ath10k_h
 	struct sk_buff *msdu;
 	u8 flags;
 	bool tx_failed = false;
+	u32 duration = 0;
+	int len = 0;
 
 	ath10k_dbg(ar, ATH10K_DBG_HTT,
 		   "htt tx completion msdu_id %u status %d\n",
@@ -286,6 +290,14 @@ int ath10k_txrx_tx_unref(struct ath10k_h
 		ar->ok_tx_rate_status = true;
 		ath10k_set_tx_rate_status(ar, &info->status.rates[0], tx_done);
 
+		len = msdu->len;
+		duration = ieee80211_calc_tx_airtime(htt->ar->hw, info, len);
+		rcu_read_lock();
+		if (txq && txq->sta && duration)
+			ieee80211_sta_register_airtime(txq->sta, txq->tid,
+						       duration, 0);
+		rcu_read_unlock();
+

I assume you are extending something based off ath9k. I guess I'll need to look at the mac80211 ATF implementation to understand why this is needed for the r7500v2 when ATF already works for the r7800 (i.e. I don't think the mac80211 logic tries to communicate with the firmware).

If you're willing, I'd appreciate it if you could explain your reasoning.

I'll see what I can do about measuring cpu usage on the AP during an AP->client netperf (plus irtt) - I know it is possible.

Thanks again for sticking with this.

Because QCA9980 doesn't support peer stats and ct firmware doesn't report airtime.
And yes, it's based on older chips that also calculate airtime in the driver.

I am not sure about QCA9984 tho.

Ok, this might be redundant as there is already:

        rcu_read_lock();
        if (txq && txq->sta && skb_cb->airtime_est)
                ieee80211_sta_register_airtime(txq->sta, txq->tid,
                                               skb_cb->airtime_est, 0);
        rcu_read_unlock();

in
./build_dir/target-arm_cortex-a15+neon-vfpv4_musl_eabi/linux-ipq806x_generic/ath10k-ct-regular/ath10k-ct-2021-11-28-dc350bbf/ath10k-5.15/txrx.c (starting at line 217)

This obviously depends on skb_cb->airtime_est, which appears to be set in ath10k-5.15/mac.c (via calls to ath10k_mac_update_airtime at lines 4693, 5192, & 5521). I'll see if I can't check whether there is a value for airtime_est (on the r7500v2) with a debug line. I suspect this is where the r7800 reports its airtime (i.e. greearb's comment about removing airtime calculation from the ct firmware might also apply to the r7800 or any other ath10k device (see EDIT0 note below); hence, the ath10k-ct driver will report airtime regardless of device and, as long as NL80211_EXT_FEATURE_AIRTIME_FAIRNESS is set, the mac80211 code will take advantage of whatever the driver reports).

It might be a week or two before I can test, so please be patient.

EDIT0: Based on the reports from r7800 users above, for the ath10k firmware (non-ct) WMI_SERVICE_REPORT_AIRTIME is enabled (see here). I think this means the non-ct firmware supplies the airtime.

For the ct firmware, WMI_SERVICE_REPORT_AIRTIME is not enabled (see here). I think this means the ct firmware does not supply the airtime estimate. In this case, the ath10k-ct driver code in ath10k_mac_update_airtime calculates/estimates airtime via

        spin_lock_bh(&ar->data_lock);
        arsta = (struct ath10k_sta *)txq->sta->drv_priv;

        pktlen = skb->len + 38; /* Assume MAC header 30, SNAP 8 for most case */
        if (arsta->last_tx_bitrate) {
                /* airtime in us, last_tx_bitrate in 100kbps */
                airtime = (pktlen * 8 * (1000 / 100))
                                / arsta->last_tx_bitrate;
                /* overhead for media access time and IFS */
                airtime += IEEE80211_ATF_OVERHEAD_IFS;
        } else {
                /* This is mostly for throttle excessive BC/MC frames, and the
                 * airtime/rate doesn't need be exact. Airtime of BC/MC frames
                 * in 2G get some discount, which helps prevent very low rate
                 * frames from being blocked for too long.
                 */
                airtime = (pktlen * 8 * (1000 / 100)) / 60; /* 6M */
                airtime += IEEE80211_ATF_OVERHEAD;
        }
        spin_unlock_bh(&ar->data_lock);

I did take a look at ieee80211_calc_tx_airtime, but the airtime calculation is difficult to follow compared to that in ath10k_mac_update_airtime. As far as I can tell without a test, they will likely be different, but I'll hazard a guess they're not different enough to change how ATF functions (still, I think it will be interesting to compare them).
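
For my own reference, here is a minimal sketch (in Python, to match the analysis scripts later in this thread) of the driver estimate above. The overhead constants are my assumption for the driver's IEEE80211_ATF_OVERHEAD (100 us) and IEEE80211_ATF_OVERHEAD_IFS (16 us) definitions, so treat the numbers as illustrative only:

ATF_OVERHEAD = 100     # us, media access + IFS slack (assumed constant)
ATF_OVERHEAD_IFS = 16  # us, IFS only (assumed constant)

def airtime_est_us(skb_len, last_tx_bitrate=0):
    """Mirror the driver estimate: skb_len in bytes, last_tx_bitrate
    in units of 100 kbps (0 if unknown)."""
    pktlen = skb_len + 38  # assume MAC header 30 + SNAP 8
    if last_tx_bitrate:
        # bits / (rate in 100 kbps units) * 10 -> microseconds
        return (pktlen * 8 * (1000 // 100)) // last_tx_bitrate + ATF_OVERHEAD_IFS
    # rate unknown: assume 6 Mbit/s (60 x 100 kbps), mostly for BC/MC frames
    return (pktlen * 8 * (1000 // 100)) // 60 + ATF_OVERHEAD

# e.g. a 1500-byte frame at 866.7 Mbit/s (8667 x 100 kbps): 14 + 16 = 30 us
print(airtime_est_us(1500, 8667))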

Note the ath10k-ct driver is supposed to be compatible with the non ct firmware. So for the ath10k-ct driver/non ct firmware combination, the ath10k-ct driver defaults to the firmware provided airtime and not the value calculated in ath10k_mac_update_airtime (corrected based on castiel652's next post below).

Based on the code, it wouldn't have a correct value on the R7500v2 without WMI_SERVICE_PEER_STATS

Correct

requires WMI_SERVICE_PEER_STATS

They are different but both should work just fine.

For non-ct firmware it won't do the calculation in ath10k_mac_update_airtime; it will just return 0 because of the check below:

	if (test_bit(WMI_SERVICE_REPORT_AIRTIME, ar->wmi.svc_map))
		return airtime;

Instead, it will report the airtime from the firmware in htt_rx.c.

I'm trying to follow the code for how last_tx_bitrate gets calculated. I'm still hopeful that it is currently estimated correctly in the ath10k-ct driver even without WMI_SERVICE_PEER_STATS set; however, you might be right.

In other words, I understand that the QCA9980 does not get its peer stats from the firmware, but I interpreted greearb's comment as meaning the relevant peer stats are still available because they are calculated in the driver as opposed to pulled from the firmware. Perhaps I am misinterpreting the statement.

Even if it turns out to be valid, I'd be interested in trying both airtime calculations.

Thank you for pointing out the correct location.

I may have a chance to test today. To make it easier for me to test and keep the AP in use, I'll be using your patch with a few modifications.

  1. I'll be using my fwcfg mechanism to set NL80211_EXT_FEATURE_AIRTIME_FAIRNESS
  2. I'd like to add a second "test_aql" fwcfg entry to turn your aql modification on and off - this is mostly a convenience mod so I don't have to replace the ath10k driver to test with and without the change; however, it will add an "if statement" before your aql code, which might degrade its CPU benefit. If you think this matters, please let me know and I'll try a version without the "if statement".
  3. If airtime_est is inaccurate, I'm wondering if I shouldn't comment out:
	rcu_read_lock();
	if (txq && txq->sta && skb_cb->airtime_est)
		ieee80211_sta_register_airtime(txq->sta, txq->tid,
					       skb_cb->airtime_est, 0);
	rcu_read_unlock();

starting at line 217 in txrx.c.

EDIT (after testing a version leaving the code in point 3. above untouched)

+ * The driver can either call this function synchronously for every packet or
+ * aggregate, or asynchronously as airtime usage information becomes available.
+ * TX and RX airtime can be reported together, or separately by setting one of
+ * them to 0.

I think this means I should try if (!ar->request_enable_atf) { around the other calls to ieee80211_sta_register_airtime in the ath10k driver.
REF

@castiel652

The modified patch I'm using is here. It builds, loads, and so far runs fine.

I did have an issue yesterday in which the build I tried lost all wireless and wired connectivity - I think this is the result of user error, probably related to a .config configuration error, and not at all related to your patch.

I'm currently running with

r7500v2 # cat /lib/firmware/ath10k/fwcfg-pci-000*
enable_atf = 1
test_aql = 1
enable_atf = 1
test_aql = 1

and so far it's fine.

netperf -t tcp_maerts -l 60 -D 1s -H <server ip>

using two clients looks good (just by casual visual inspection I'd say better than the netperf tcp_maerts test without your patch but with ATF "enabled") - but my network is not quiet, so I'll be trying some tests over the next several days.

I also tried mpstat -P ALL 2 30 on the r7500v2 during tests with and without test_aql = 1; however, casual visual inspection is not reliable here - too much noise but probably close.

Thank you again for posting the patches - I think you're onto something.

@castiel652

I did some additional testing today. In brief, it seems to do no harm; however, you may want someone with greater knowledge and ability than me for testing and future development (at least in the short term).

I'm pretty sure I will not be able to show any cpu usage difference from using ieee80211_tx_status_ext - there is just too much "noise" (by visual inspection using mpstat during netperf out to 2 to 3 clients). I observed both the best cpu usage (~70% idle) and the worst cpu usage (~20% idle) when trying the _ext variant under similar conditions. Obviously something else confounded the test. In a future adaptation of your patch, I'll dispense with the test_aql fwcfg feature and just use what you propose. That said, I will try the flent-tools package to measure the AP cpu usage for testing with flent and try a little harder to eliminate external factors.

I have less prior history using netperf to stream from the AP out to clients than streaming client to AP (the netperf server is not on the AP). During the "casual" testing today, I saw some odd behavior. For a three client test, I set the ATF weight of a "fast" client to 10 and the weights of the two slower clients (both about the same phy rate) to 512. The netperf of the fast client (weighted to 10) popped up to 180 Mbps for a 10-15 s stretch of the 60 second netperf, when it otherwise behaved as expected and stayed at ~30 Mbps.
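
As a sanity check on what I expected from those weights, a rough sketch (assuming mac80211's ATF serves stations in proportion to their configured airtime weights; the client names are placeholders):

# Expected airtime shares under proportional weighting (my assumption about
# how the mac80211 ATF weights translate to shares).
weights = {"fast": 10, "slow1": 512, "slow2": 512}
total = sum(weights.values())
for sta, w in weights.items():
    print("%s: %.1f%% of airtime" % (sta, 100.0 * w / total))
# fast: ~1.0%, slow1/slow2: ~49.5% each - so the fast client briefly
# hitting 180 Mbps is not what the weights suggest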

I also had a hard time distinguishing between tests with "enable_atf = 1" versus not set in "2 client" netperf tests. Tests I did with irtt all looked pretty good (all RTTs under 80 ms); however, this was the case regardless of how I set "enable_atf". Both of these (casual visual) observations are inconsistent with my past recollection from testing streaming AP to clients.

The short of it is, I'll need to spend some time working out how to demonstrate a difference with and without your modification. That means flent and likely a significant amount of time to work out how to use flent properly. I also want to understand the function of the driver/ATF algo better so that I can convince myself what's there is behaving as intended - again a significant amount of time.

I will probably do both flent and educate myself on the driver/mac80211 function related to ATF/AQL, just not quickly.

If you have ideas about testing or additional changes that might help you develop this, please do let me know.

Sorry I've been busy with military service lately.

Try looking at it the other way: less CPU load on one task means throughput would be higher under the same CPU load.

Unfortunately I am out of ideas for now.

No worries, it may be slow going for me as well so feel free to post as your availability and interest dictate.

I'll keep looking at cpu.

FWIW some of the "odd" behavior I mentioned above came from a mac client (one of the fastest available to me). After a little more playing around, I'm starting to think this may be os or client specific so I'm going to limit testing with this device. I may have to use it for AP cpu testing just on its own (to avoid ssh).

Quick update:

As the AP is in use, I continue to be limited by testing opportunities and lack of confidence that my "casual" test method gives useful results. I'm still working on using flent.

I am currently trying the following variant (here) of @castiel652's patch. It works but I don't (yet) observe a significant change with this variant compared to the prior version.

I won't be able to do this for a few days, but here is a plan to measure AP cpu usage both with and without ieee80211_tx_status_ext and the mac80211 airtime calculation.

1 client, 5GHz channel 36 (little to no wifi interference or network activity at the time I plan to test), direct line of sight to the AP less than 5 m away (let me know if you think I should test with the client in a more challenging spot - like 15-20 m away with no direct line of sight).

EDIT: Based on your knowledge of ieee80211_tx_status_ext, is 1 client the right choice or do I need at least 2 for this function call to have any impact on cpu usage?

Using my adaptation of your patch, on the client run:

flent tcp_download -l 60 -H <netserver_host> --test-parameter=cpu_stats_hosts=root@r7500v2

10 times with "alternate_atf = 1" and 10 times without "alternate_atf" in the fwcfg file. I'll do consecutive runs for each fwcfg setting, but it'd probably be better to randomize the fwcfg setting over the 20 observations.
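
If I do randomize, a minimal sketch of generating the run order (assuming 10 runs per setting; the fixed seed just makes the order reproducible):

import random

runs = ["watf"] * 10 + ["woatf"] * 10
random.seed(1)  # arbitrary fixed seed for a reproducible order
random.shuffle(runs)
print(runs)     # set the fwcfg entry (and reboot) per item before each run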

The un-gzipped flent output is just a json file and easy to load into python. After doing 25 tcp_download flent runs when my network was not necessarily "quiet," here is a preliminary analysis (the python script used to generate this output is below):

cpu ave w/atf: 0.414268
cpu stdev w/atf: 0.039519
tcp ave w/atf: 210.894066
tcp stdev w/atf: 10.390132
cpu ave wo/atf: 0.415269
cpu stdev wo/atf: 0.044348
tcp ave wo/atf: 210.655311
tcp stdev wo/atf: 16.266658

len(tests)
Out[2]: 22

ttest_cpu
Out[3]: Ttest_indResult(statistic=array([0.90528009]), pvalue=array([0.36535477]))

ttest_cpu_sgAve
Out[4]: Ttest_indResult(statistic=array([0.24003843]), pvalue=array([0.8127436]))

ttest_tcp
Out[5]: Ttest_indResult(statistic=array([-0.69607884]), pvalue=array([0.486405]))

ttest_tcp_sgAve
Out[6]: Ttest_indResult(statistic=array([-0.13789265]), pvalue=array([0.89170411]))

df
Out[7]: 
    level         tcp       cpu
0    watf  209.404479  0.408377
1    watf  213.290590  0.405416
2    watf  214.246389  0.411845
3    watf  206.591146  0.408260
4    watf  207.402708  0.419394
5    watf  212.714722  0.422297
6    watf  207.910208  0.416022
7    watf  213.656979  0.432385
8    watf  214.281319  0.410290
9    watf  214.478993  0.407563
10   watf  205.857187  0.414953
11  woatf  209.192917  0.410254
12  woatf  214.141111  0.423969
13  woatf  213.274653  0.405573
14  woatf  201.422049  0.404262
15  woatf  213.191354  0.433456
16  woatf  214.195208  0.408574
17  woatf  214.421806  0.433945
18  woatf  214.508021  0.414500
19  woatf  202.843275  0.404710
20  woatf  210.656042  0.404367
21  woatf  209.334861  0.424435

cpu_ancova
Out[8]: 
     Source        SS  DF         F     p-unc       np2
0     level  0.000008   1  0.088611  0.769181  0.004642
1       tcp  0.000233   1  2.512106  0.129478  0.116776
2  Residual  0.001761  19       NaN       NaN       NaN

The average cpu results vary with average tcp download. I've introduced the requirement that the tcp download be greater than a (client specific) value based on past experience to eliminate what I consider to be outliers. After introducing this mean tcp download threshold filter, I rejected 3 of 25 observations.

I have not forgotten your suggestion above about looking at cpu load in conjunction with tcp download. After a bit of reading, I'd like to try an analysis of covariance (ANCOVA) on the data. Average cpu is the response variable, tcp download is the covariate, and the treatment factor ("level") has two levels (with and without setting alternate_atf in the fwcfg file). I've done a fair amount of hypothesis testing on means and some DOEs, but never an ANCOVA, so this will be fun for me.

The output above indicates that one could not conclude the cpu load is different at the 0.05 significance level (i.e. all the "p" values are greater than 0.05). I have not yet done diagnostics on the data to validate the assumptions behind the t-tests or the ancova. Based on prior experience, I expect the diagnostics for the t-tests are ok. I'm less certain about the ancova analysis, as this is the first time I've done one and I'm using an unfamiliar python package for the calculation.
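
When I get to the diagnostics, this is a minimal sketch of what I have in mind - Shapiro-Wilk for normality of each group and Levene for equality of variances, reusing the subgroup-average arrays from the script below:

from scipy.stats import shapiro, levene

for name, sample in [("cpu w/atf", cpu_watf_sgAve),
                     ("cpu wo/atf", cpu_woatf_sgAve)]:
    stat, p = shapiro(sample)
    print("%s Shapiro-Wilk p=%.3f" % (name, p))  # p > 0.05: no evidence against normality

stat, p = levene(cpu_watf_sgAve, cpu_woatf_sgAve)
print("Levene p=%.3f" % p)  # p > 0.05: no evidence of unequal variances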

I'm sure the stats could be more rigorous (i.e. some experimentation around choosing a rational "subgroup size," randomization, etc.), but this is as far as I'm going to go for now.

The python analysis script used to generate the output above:

import glob
import gzip
import json
import numpy as np
from scipy.stats import ttest_ind
import pandas as pd
from pingouin import ancova

flentFiles = glob.glob("*.flent.gz")
tests = []
for file in flentFiles:
    with gzip.open(file,'r') as gzf:
        tests.append(json.loads(gzf.read().decode('ascii')))

cpu_watf = []
cpu_woatf = []
tcp_watf = []
tcp_woatf = []
cpu_watf_sgAve = []
cpu_woatf_sgAve = []
tcp_watf_sgAve = []
tcp_woatf_sgAve = []
offset = 1 # seconds, time duration to ignore cpu_stats data around the
           # netperf start and finish times
tcp_threshold = 200 # mbps, threshold value for mean tcp download value below
                    # which the test is rejected for inclusion for analysis
for test in tests:
    tcp_mean = np.array([val for val in
                         test['results']['TCP download']
                         if val is not None]).mean()
    if tcp_mean > tcp_threshold:
        test['metadata']['netperf_start'] = test['raw_values']['TCP download'][0]['t']
        test['metadata']['netperf_finish'] = test['raw_values']['TCP download'][-1]['t']
        """
        test['results']['cpu_baseline'] = \
        [val['load'] for val in test['raw_values']['cpu_stats_root@r7500v2']
         if val['t'] < test['metadata']['netperf_start'] - offset or
         val['t'] > test['metadata']['netperf_finish'] + offset]
        """
    
        cpu_obs = [val['load'] for val in
                   test['raw_values']['cpu_stats_root@r7500v2']
                   if val['t'] > test['metadata']['netperf_start'] +
                   offset and val['t'] < test['metadata']['netperf_finish'] -
                   offset]
        test['results']['cpu_load'] = cpu_obs

        tcp_obs = [val['val'] for val in
                   test['raw_values']['TCP download']
                   if val['t'] > test['metadata']['netperf_start'] +
                   offset and
                   val['t'] < test['metadata']['netperf_finish'] -
                   offset]
        test['results']['tcp_download'] = tcp_obs

        if "_watf" in test['metadata']['TITLE']:
            cpu_watf.extend(cpu_obs)
            tcp_watf.extend(tcp_obs)
            cpu_watf_sgAve.append(np.array(cpu_obs).mean())
            tcp_watf_sgAve.append(np.array(tcp_obs).mean())
        
        if "_woatf" in test['metadata']['TITLE']:
            cpu_woatf.extend(cpu_obs)
            tcp_woatf.extend(tcp_obs)
            cpu_woatf_sgAve.append(np.array(cpu_obs).mean())
            tcp_woatf_sgAve.append(np.array(tcp_obs).mean())
    else:
        print("Rejected test file: %s\n" % test['metadata']['DATA_FILENAME'])
        print("tcp_mean: %s < tcp_threshold: %s\n" % (tcp_mean,tcp_threshold))
        
cpu_watf =  np.array(cpu_watf)
cpu_woatf =  np.array(cpu_woatf)
tcp_watf =  np.array(tcp_watf)
tcp_woatf =  np.array(tcp_woatf)
cpu_watf_sgAve =  np.array(cpu_watf_sgAve)
cpu_woatf_sgAve =  np.array(cpu_woatf_sgAve)
tcp_watf_sgAve =  np.array(tcp_watf_sgAve)
tcp_woatf_sgAve =  np.array(tcp_woatf_sgAve)

print("cpu ave w/atf: %f" % cpu_watf.mean())
print("cpu stdev w/atf: %f" % cpu_watf.std())
print("tcp ave w/atf: %f" % tcp_watf.mean())
print("tcp stdev w/atf: %f" % tcp_watf.std())
print("cpu ave wo/atf: %f" % cpu_woatf.mean())
print("cpu stdev wo/atf: %f" % cpu_woatf.std())
print("tcp ave wo/atf: %f" % tcp_woatf.mean())
print("tcp stdev wo/atf: %f" % tcp_woatf.std())

ttest_cpu = ttest_ind(cpu_woatf.reshape(cpu_woatf.size,1),
                      cpu_watf.reshape(cpu_watf.size,1))
ttest_tcp = ttest_ind(tcp_woatf.reshape(tcp_woatf.size,1),
                      tcp_watf.reshape(tcp_watf.size,1))
ttest_cpu_sgAve = ttest_ind(cpu_woatf_sgAve.reshape(cpu_woatf_sgAve.size,1),
                            cpu_watf_sgAve.reshape(cpu_watf_sgAve.size,1))
ttest_tcp_sgAve = ttest_ind(tcp_woatf_sgAve.reshape(tcp_woatf_sgAve.size,1),
                            tcp_watf_sgAve.reshape(tcp_watf_sgAve.size,1))


df = pd.DataFrame({'level': np.repeat(['watf','woatf'],11),
'tcp': np.concatenate((tcp_watf_sgAve,tcp_woatf_sgAve)),
'cpu': np.concatenate((cpu_watf_sgAve,cpu_woatf_sgAve))})

cpu_ancova = ancova(data=df,dv='cpu',covar='tcp',between='level')

@moeller0

I've set up my three wired "lan" servers in prep for a flent rtt_fair test.

However, I'm troubleshooting a wifi issue atm. The short of it is, this will take me time.

EDIT: sudo iwconfig <wifi_if> power off fixed it. :blush:

A first pass rtt_fair test:

Starting Flent 2.0.1 using Python 3.10.2.
Starting rtt_fair test. Expected run time: 70 seconds.
Data file written to ./rtt_fair-2022-06-27T105149.121652.flent.gz

Summary of rtt_fair test run from 2022-06-27 14:51:49.121652

                                      avg       median          # data pts
 Ping (ms) ICMP1 L0       :        34.87        37.65 ms              350
 Ping (ms) ICMP2 L2     :        81.86        84.70 ms              350
 Ping (ms) ICMP3 L1//1  :       132.07       135.00 ms              350
 Ping (ms) ICMP4 L1//2  :       132.86       135.00 ms              350
 Ping (ms) avg             :        95.42          N/A ms              350
 TCP download BE1 L0      :        60.22        57.19 Mbits/s         350
 TCP download BE2 L2    :        33.93        36.54 Mbits/s         350
 TCP download BK1 L1//1 :        33.46        35.98 Mbits/s         350
 TCP download BK2 L1//2 :        35.29        35.36 Mbits/s         350
 TCP download avg          :        40.73          N/A Mbits/s         350
 TCP download fairness     :         0.93          N/A Mbits/s         350
 TCP download sum          :       162.90          N/A Mbits/s         350
 TCP upload BE1 L0        :        40.77        40.00 Mbits/s         350
 TCP upload BE2 L2      :        12.34        12.08 Mbits/s         350
 TCP upload BK1 L1//1   :         9.46         8.96 Mbits/s         350
 TCP upload BK2 L1//2   :         9.16         9.59 Mbits/s         350
 TCP upload avg            :        17.93          N/A Mbits/s         350
 TCP upload fairness       :         0.65          N/A Mbits/s         350
 TCP upload sum            :        71.73          N/A Mbits/s         350

I won't be able to load an image with the latest nbd patch for a bit. I hope it will be interesting to test with the latest mac80211 patches and:

  • AQL (default no ATF for the r7500v2),
  • AQL "disabled", and
  • AQL + ATF (my adaptation of castiel652's patch above).

flent rtt_fair4be, 1 wifi 5G client to 3 lan servers (lan host 2 below used twice). Two of the lan servers have netem qdiscs as follows (a sanity check on the resulting RTTs appears after the results below):
lan host 1

(p3102.ob) [3] $ tc qdisc show dev ifb0
qdisc netem 8002: root refcnt 2 limit 1000 delay 23.0ms  3.0ms
(p3102.ob) [4] $ tc qdisc show dev enp0s25
qdisc netem 8001: root refcnt 2 limit 1000 delay 23.0ms  3.0ms
qdisc ingress ffff: parent ffff:fff1 ---------------- 

lan host 2

[8] $ tc qdisc show dev ifb0
qdisc netem 8002: root refcnt 2 limit 1000 delay 48.0ms  6.0ms
[9] $ tc qdisc show dev eno1
qdisc netem 8001: root refcnt 2 limit 1000 delay 48.0ms  6.0ms
qdisc ingress ffff: parent ffff:fff1 ---------------- 
  • AQL (default r7500v2, 6 observations):
Average ping: 97.6 ms +/- 1 ms
TCP Download fairness: 0.93 +/- 0.02
  • AQL "disabled" with
    echo 0 > /sys/kernel/debug/ieee80211/phy0/aql_enable
    (4 observations):
Average ping: 129.8 ms +/- 5.5 ms
TCP Download fairness: 0.81 +/- 0.06
  • AQL + ATF (my adaptation of castiel652's patch) via
    echo "alternate_atf = 1" >/lib/firmware/ath10k/fwcfg-pci-0000:01:00.0.txt

followed by rebooting the AP and verification:

r7500v2 # iw phy phy0 info | grep -E '(AQL|AIRTIME)'
                * [ AIRTIME_FAIRNESS ]: airtime fairness scheduling
                * [ AQL ]: Airtime Queue Limits (AQL)

After 4 observations:

Average ping: 97.1 ms +/- 1.3 ms
TCP Download fairness: 0.96 +/- 0.01
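
As an aside, the ping averages are consistent with the netem setup above: each host's delay applies on both the egress interface and the ifb0 ingress mirror, so roughly 2x the configured delay lands on the RTT. A quick sanity check (baseline ~35 ms taken from the undelayed host in the first-pass summary):

base_rtt = 35.0  # ms, observed RTT to the host with no netem delay
for host, delay_ms in [("lan host 1", 23.0), ("lan host 2", 48.0)]:
    print("%s expected RTT: ~%.0f ms" % (host, base_rtt + 2 * delay_ms))
# lan host 1: ~81 ms (measured ~82), lan host 2: ~131 ms (measured ~132)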

I'm not sure what impact the change from ATF using the virtual time-based scheduler (VTBS) to an improved round robin scheduler (commit 6d49a25 and commit 7bf5233) has with respect to castiel652's patch to enable ATF on my device.

Based on the flent rtt_fair4be "TCP download fairness" parameter, download fairness moved "in the direction of goodness" with ATF enabled; however, I will need more data (perhaps also a greater difference in netem delay times?) to show that. Also I'm using the host with the longest delay time twice for the flent tests (to get 4 hosts total as required by the rtt_fair test), so I wonder if this might obfuscate the fairness results.

I'm going to spend a little time looking at the flent results in more detail, then try increasing the netem delay times (larger differences between the three hosts).

This post details my attempt to evaluate enabling ATF on the r7500v2 via my adaptation of castiel652's patch with the current openwrt ATF implementation in mac80211.

Since discovering earlier in this thread that the r7500v2 ath10k wifi firmware & drivers do not support ATF and then subsequently evaluating castiel652's patch to enable ATF, there have been a number of changes to the mac80211 code around ATF (this post and below is a good summary of the issue and resolution - but posts about this issue can be found in various forum threads). Below, I've attempted to use the flent rtt_fair4be test to evaluate the r7500v2 as an AP with and without ATF enabled via the patch.

Since this post is long, I'll report my conclusion at this point. I do not plan to use my adaptation of castiel652's patch going forward. At best, it seems to make little difference. At worst, it may be hampering fairness as measured by the flent rtt_fair4be "fairness" metric.


Here are the details of the test and some selected results:

At this time, master has moved on to kernel 5.15 but most of my prior testing has been on kernel 5.10 (5.10 is very stable for me). Hence I'm using a custom 22.03 "snapshot" build with my adaptation of castiel652's patch applied to kmod-ath10k-ct - 5.10.149+2022-05-13-f808496f-2 driver (firmware is ath10k-firmware-qca99x0-ct-htt - 2020-11-08-1). mac80211 is kmod-mac80211 - 5.10.149+5.15.74-1-1.

r7500v2 # uname -a
Linux r7500v2 5.10.149 #0 SMP Tue Nov 1 12:20:10 2022 armv7l GNU/Linux

For testing ATF, 4 wifi clients were configured as netperf "servers" and a lan device was used to run the flent rtt_fair4be test to these clients. The r7500v2 is configured as an AP only, 5 GHz wifi, channel 36, (up to) 80 MHz width, AQL enabled for all tests. There are no other wifi clients connected (including 2.4 GHz) and 5 GHz channel 36 has little to no interference from other wifi networks in my area.

On the r7500v2, I edited the vht_capab config entry in /var/run/hostapd-phy0.conf to add [BF-ANTENNA-2][BF-ANTENNA-3][SOUNDING-DIMENSION-2][SOUNDING-DIMENSION-3] and remove [BF-ANTENNA-4][SOUNDING-DIMENSION-4] followed by kill -HUP $(pidof hostapd) and then reconnecting the 4 wifi clients. The reason for changing the hostapd config is detailed here. I don't think this should make a difference, but this is consistent with prior tests and I think [BF-ANTENNA-4][SOUNDING-DIMENSION-4] is not supported by the r7500v2.

I tested two different location arrangements of the 4 wifi clients. One is "ideal": all 4 clients 2-3 meters from the AP with a clear line of sight. One is "actual" or "non-ideal" (i.e. how wifi clients are actually distributed in my household): 2 clients 10-15 meters from the AP with no line of sight (blocked by wood-frame, drywall-covered walls), and 2 clients 2-5 meters from the AP with a clear line of sight. One (fast) client is a macbook air with an updated os. I did run sudo ifconfig awdl0 down on the mac. The other three clients are older model laptops with intel wifi cards running an updated ubuntu 20.04.5 lts os.

For tests using ATF, on the r7500v2 I ran

echo "alternate_atf = 1" >/lib/firmware/ath10k/fwcfg-pci-0000:01:00.0.txt
echo "alternate_atf = 1" >/lib/firmware/ath10k/fwcfg-pci-0001:01:00.0.txt

rebooted the AP, updated the hostapd config as mentioned above, and connected the wifi clients.

For tests without ATF, on the r7500v2 I removed the fwcfg-*.txt files, rebooted the AP, updated the hostapd config as mentioned above, and connected the wifi clients.

On an AP lan device, I used the following flent command:

flent rtt_fair4be -H <wifi_host_1> -H <wifi_host_2> -H <wifi_host_3> -H <wifi_host4> --test-parameter=cpu_stats_hosts=root@r7500v2 -t woatf >> sum-20221108.txt

adjusting the -t <test-name> for tests with ATF (watf) and without ATF (woatf) for the ideal arrangement. I used watf-1 woatf-1 for the non-ideal arrangement.

I ran 10 sequential 70 second flent tests for each configuration and recorded the "TCP download fairness" and "TCP upload fairness" metrics from the sum-20221108.txt file: 40 tests total. The wifi clients were not moved between the with- and without-ATF tests for a given spatial arrangement.

The following show the python code (with data), analysis, and output.

import numpy as np
from scipy.stats import ttest_ind

# rtt_fair4be*.watf.flent.gz = ideal_watf_5_up_fairness
# rtt_fair4be*.woatf.flent.gz = ideal_woatf_5_up_fairness
# rtt_fair4be*.watf-1.flent.gz = nonideal_watf_5_up_fairness
# rtt_fair4be*.woatf-1.flent.gz = nonideal_woatf_5_up_fairness
ideal_woatf_5_down_fairness = [0.65, 0.77, 0.74, 0.83, 0.6, 0.69, 0.57, 0.88, 0.84, 0.87]
ideal_woatf_5_up_fairness = [0.82, 0.84, 0.75, 0.71, 0.88, 0.9, 0.86, 0.8, 0.92, 0.74]
ideal_watf_5_down_fairness = [0.8, 0.78, 0.72, 0.74, 0.86, 0.84, 0.6, 0.66, 0.61, 0.67]
ideal_watf_5_up_fairness = [0.75, 0.78, 0.64, 0.55, 0.64, 0.5, 0.71, 0.82, 0.81, 0.87]
ideal_woatf_5_down_fairness =  np.array(ideal_woatf_5_down_fairness)
ideal_woatf_5_up_fairness =  np.array(ideal_woatf_5_up_fairness)
ideal_watf_5_down_fairness =  np.array(ideal_watf_5_down_fairness)
ideal_watf_5_up_fairness =  np.array(ideal_watf_5_up_fairness)

print("ideal woatf 5GHz down fairness ave: %f" % ideal_woatf_5_down_fairness.mean())
print("ideal woatf 5GHz down fairness stdev: %f" % ideal_woatf_5_down_fairness.std())
print("ideal woatf 5GHz up fairness ave: %f" % ideal_woatf_5_up_fairness.mean())
print("ideal woatf 5GHz up fairness stdev: %f" % ideal_woatf_5_up_fairness.std())

print("ideal watf 5GHz down fairness ave: %f" % ideal_watf_5_down_fairness.mean())
print("ideal watf 5GHz down fairness stdev: %f" % ideal_watf_5_down_fairness.std())
print("ideal watf 5GHz up fairness ave: %f" % ideal_watf_5_up_fairness.mean())
print("ideal watf 5GHz up fairness stdev: %f" % ideal_watf_5_up_fairness.std())

ttest_ideal_w_wo_atf_down = ttest_ind(ideal_woatf_5_down_fairness.reshape(ideal_woatf_5_down_fairness.size,1),
                                ideal_watf_5_down_fairness.reshape(ideal_watf_5_down_fairness.size,1))
ttest_ideal_w_wo_atf_up = ttest_ind(ideal_woatf_5_up_fairness.reshape(ideal_woatf_5_up_fairness.size,1),
                              ideal_watf_5_up_fairness.reshape(ideal_watf_5_up_fairness.size,1))

nonideal_watf_5_down_fairness = [0.52, 0.43, 0.62, 0.76, 0.47, 0.44, 0.53, 0.5, 0.49, 0.48]
nonideal_watf_5_up_fairness = [0.65, 0.54, 0.64, 0.51, 0.49, 0.65, 0.72, 0.58, 0.51, 0.56]
nonideal_woatf_5_down_fairness = [0.62, 0.67, 0.65, 0.73, 0.72, 0.73, 0.69, 0.65, 0.67, 0.59]
nonideal_woatf_5_up_fairness = [0.61, 0.63, 0.63, 0.74, 0.68, 0.72, 0.74, 0.65, 0.43, 0.46]

nonideal_woatf_5_down_fairness =  np.array(nonideal_woatf_5_down_fairness)
nonideal_woatf_5_up_fairness =  np.array(nonideal_woatf_5_up_fairness)
nonideal_watf_5_down_fairness =  np.array(nonideal_watf_5_down_fairness)
nonideal_watf_5_up_fairness =  np.array(nonideal_watf_5_up_fairness)

print("nonideal woatf 5GHz down fairness ave: %f" % nonideal_woatf_5_down_fairness.mean())
print("nonideal woatf 5GHz down fairness stdev: %f" % nonideal_woatf_5_down_fairness.std())
print("nonideal woatf 5GHz up fairness ave: %f" % nonideal_woatf_5_up_fairness.mean())
print("nonideal woatf 5GHz up fairness stdev: %f" % nonideal_woatf_5_up_fairness.std())

print("nonideal watf 5GHz down fairness ave: %f" % nonideal_watf_5_down_fairness.mean())
print("nonideal watf 5GHz down fairness stdev: %f" % nonideal_watf_5_down_fairness.std())
print("nonideal watf 5GHz up fairness ave: %f" % nonideal_watf_5_up_fairness.mean())
print("nonideal watf 5GHz up fairness stdev: %f" % nonideal_watf_5_up_fairness.std())

ttest_nonideal_w_wo_atf_down = ttest_ind(nonideal_woatf_5_down_fairness.reshape(nonideal_woatf_5_down_fairness.size,1),
                                         nonideal_watf_5_down_fairness.reshape(nonideal_watf_5_down_fairness.size,1))
ttest_nonideal_w_wo_atf_up = ttest_ind(nonideal_woatf_5_up_fairness.reshape(nonideal_woatf_5_up_fairness.size,1),
                                       nonideal_watf_5_up_fairness.reshape(nonideal_watf_5_up_fairness.size,1))

Output

%run analysis.py
ideal woatf 5GHz down fairness ave: 0.744000
ideal woatf 5GHz down fairness stdev: 0.106977
ideal woatf 5GHz up fairness ave: 0.822000
ideal woatf 5GHz up fairness stdev: 0.067646
ideal watf 5GHz down fairness ave: 0.728000
ideal watf 5GHz down fairness stdev: 0.087384
ideal watf 5GHz up fairness ave: 0.707000
ideal watf 5GHz up fairness stdev: 0.115590
nonideal woatf 5GHz down fairness ave: 0.672000
nonideal woatf 5GHz down fairness stdev: 0.044452
nonideal woatf 5GHz up fairness ave: 0.629000
nonideal woatf 5GHz up fairness stdev: 0.102220
nonideal watf 5GHz down fairness ave: 0.524000
nonideal watf 5GHz down fairness stdev: 0.093509
nonideal watf 5GHz up fairness ave: 0.585000
nonideal watf 5GHz up fairness stdev: 0.072560

ttest_nonideal_w_wo_atf_up
Out[6]: Ttest_indResult(statistic=array([1.0530053]), pvalue=array([0.30626963]))

ttest_nonideal_w_wo_atf_down
Out[7]: Ttest_indResult(statistic=array([4.28830412]), pvalue=array([0.00044242]))

ttest_ideal_w_wo_atf_down
Out[8]: Ttest_indResult(statistic=array([0.34749779]), pvalue=array([0.7322476]))

ttest_ideal_w_wo_atf_up
Out[9]: Ttest_indResult(statistic=array([2.57599011]), pvalue=array([0.01903518]))

Given the test configuration (lan device running flent rtt_fair4be out to 4 wifi hosts running netserver), I believe the average "TCP upload fairness" metric should evaluate if enabling ATF on the r7500v2 (at least the way as described in this thread) is beneficial or not. It's my understanding that a "fairness" result close to 1 is considered "fair" and values less than 1 are less fair.
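
For what it's worth, I believe flent's fairness number is Jain's fairness index. A minimal sketch of the calculation, checked against the first-pass rtt_fair summary earlier in the thread:

import numpy as np

def jains_fairness(throughputs):
    """Jain's index: 1.0 when all flows are equal, approaching 1/n
    as a single flow dominates the other n-1."""
    x = np.asarray(throughputs, dtype=float)
    return x.sum() ** 2 / (len(x) * (x ** 2).sum())

# the four TCP download averages from the first-pass rtt_fair run above
print(jains_fairness([60.22, 33.93, 33.46, 35.29]))  # ~0.93, matching flent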

Using a t-test to check whether the average "TCP upload fairness" values with and without ATF enabled are statistically different, for the ideal case the p value is less than 0.05, indicating the averages differ at the 95% confidence level. Given the average "fairness" is lower with ATF enabled, this is not the result I hoped for.

For the non-ideal case, the average "TCP upload fairness" is also lower with ATF enabled (again not what I hoped for), but the difference is not statistically significant at the 95% confidence level.

It has been suggested that wifi rate control in the r7500v2 wifi firmware/hardware might be responsible for the observations that started this thread (more about that in my next post). On the r7500v2 (from the lan device) during the flent testing, I periodically ran:

printf "woatf-1: " >> /tmp/sum-rap-20221108.txt && date >> /tmp/sum-rap-20221108.txt && iw dev wlan0 station dump | grep "tx bitrate" >> /tmp/sum-rap-20221108.txt

I did not observe changes in MCS values that might indicate such issues (like I have in other testing); however, the macbook client did change during the tests (this client was moved to a non-line-of-sight location for the non-ideal test arrangement). Some MCS output excerpts:

r7500v2 # cat /tmp/sum-rap-20221108.txt
woatf: Tue Nov  8 10:13:01 EST 2022
        tx bitrate:     300.0 MBit/s MCS 15 40MHz short GI
        tx bitrate:     866.7 MBit/s VHT-MCS 9 80MHz short GI VHT-NSS 2
        tx bitrate:     300.0 MBit/s MCS 15 40MHz short GI
        tx bitrate:     300.0 MBit/s MCS 15 40MHz short GI
woatf: Tue Nov  8 10:19:20 EST 2022
        tx bitrate:     300.0 MBit/s MCS 15 40MHz short GI
        tx bitrate:     866.7 MBit/s VHT-MCS 9 80MHz short GI VHT-NSS 2
        tx bitrate:     300.0 MBit/s MCS 15 40MHz short GI
        tx bitrate:     300.0 MBit/s MCS 15 40MHz short GI
watf: Tue Nov  8 10:59:00 EST 2022
        tx bitrate:     300.0 MBit/s MCS 15 40MHz short GI
        tx bitrate:     866.7 MBit/s VHT-MCS 9 80MHz short GI VHT-NSS 2
        tx bitrate:     300.0 MBit/s MCS 15 40MHz short GI
        tx bitrate:     300.0 MBit/s MCS 15 40MHz short GI
watf: Tue Nov  8 11:06:30 EST 2022
        tx bitrate:     300.0 MBit/s MCS 15 40MHz short GI
        tx bitrate:     866.7 MBit/s VHT-MCS 9 80MHz short GI VHT-NSS 2
        tx bitrate:     270.0 MBit/s MCS 14 40MHz short GI
        tx bitrate:     300.0 MBit/s MCS 15 40MHz short GI
watf: Tue Nov  8 11:06:39 EST 2022
        tx bitrate:     300.0 MBit/s MCS 15 40MHz short GI
        tx bitrate:     866.7 MBit/s VHT-MCS 9 80MHz short GI VHT-NSS 2
        tx bitrate:     300.0 MBit/s MCS 15 40MHz short GI
        tx bitrate:     300.0 MBit/s MCS 15 40MHz short GI
watf-1: Tue Nov  8 11:18:55 EST 2022
        tx bitrate:     300.0 MBit/s MCS 15 40MHz short GI
        tx bitrate:     650.0 MBit/s VHT-MCS 7 80MHz short GI VHT-NSS 2
        tx bitrate:     240.0 MBit/s MCS 13 40MHz short GI
        tx bitrate:     270.0 MBit/s MCS 14 40MHz short GI
watf-1: Tue Nov  8 11:20:37 EST 2022
        tx bitrate:     300.0 MBit/s MCS 15 40MHz short GI
        tx bitrate:     650.0 MBit/s VHT-MCS 7 80MHz short GI VHT-NSS 2
        tx bitrate:     240.0 MBit/s MCS 13 40MHz short GI
        tx bitrate:     240.0 MBit/s MCS 13 40MHz short GI
watf-1: Tue Nov  8 11:30:16 EST 2022
        tx bitrate:     300.0 MBit/s MCS 15 40MHz short GI
        tx bitrate:     360.0 MBit/s VHT-MCS 8 40MHz short GI VHT-NSS 2
        tx bitrate:     240.0 MBit/s MCS 13 40MHz short GI
        tx bitrate:     270.0 MBit/s MCS 14 40MHz short GI
watf-1: Tue Nov  8 11:30:24 EST 2022
        tx bitrate:     300.0 MBit/s MCS 15 40MHz short GI
        tx bitrate:     360.0 MBit/s VHT-MCS 8 40MHz short GI VHT-NSS 2
        tx bitrate:     270.0 MBit/s MCS 14 40MHz short GI
        tx bitrate:     270.0 MBit/s MCS 14 40MHz short GI
woatf-1: Tue Nov  8 11:50:50 EST 2022
        tx bitrate:     240.0 MBit/s MCS 13 40MHz short GI
        tx bitrate:     585.1 MBit/s VHT-MCS 6 80MHz short GI VHT-NSS 2
        tx bitrate:     243.0 MBit/s MCS 14 40MHz
        tx bitrate:     300.0 MBit/s MCS 15 40MHz short GI
woatf-1: Tue Nov  8 11:52:17 EST 2022
        tx bitrate:     216.0 MBit/s MCS 13 40MHz
        tx bitrate:     300.0 MBit/s VHT-MCS 7 40MHz short GI VHT-NSS 2
        tx bitrate:     240.0 MBit/s MCS 13 40MHz short GI
        tx bitrate:     300.0 MBit/s MCS 15 40MHz short GI
woatf-1: Tue Nov  8 11:52:41 EST 2022
        tx bitrate:     240.0 MBit/s MCS 13 40MHz short GI
        tx bitrate:     300.0 MBit/s VHT-MCS 7 40MHz short GI VHT-NSS 2
        tx bitrate:     270.0 MBit/s MCS 14 40MHz short GI
        tx bitrate:     300.0 MBit/s MCS 15 40MHz short GI
woatf-1: Tue Nov  8 11:55:24 EST 2022
        tx bitrate:     240.0 MBit/s MCS 13 40MHz short GI
        tx bitrate:     270.0 MBit/s VHT-MCS 6 40MHz short GI VHT-NSS 2
        tx bitrate:     243.0 MBit/s MCS 14 40MHz
        tx bitrate:     300.0 MBit/s MCS 15 40MHz short GI
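
Rather than eyeballing excerpts like these, a small sketch for tallying the logged "tx bitrate" values per test label (parsing the sum-rap-20221108.txt format above, assuming the file has been copied off the router):

import re
from collections import Counter, defaultdict

rates = defaultdict(Counter)
label = None
with open("sum-rap-20221108.txt") as f:
    for line in f:
        m = re.match(r"(\w[\w-]*): ", line)  # e.g. "watf-1: Tue Nov ..."
        if m:
            label = m.group(1)
        elif "tx bitrate:" in line and label:
            mbit = line.split("tx bitrate:")[1].strip().split(" MBit/s")[0]
            rates[label][float(mbit)] += 1

for label in rates:
    print(label, sorted(rates[label].items()))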

I have to look over the flent results in detail to see what might be going on in the non-ideal case. Even if wifi rate control is affecting the non-ideal flent results, ATF is still of little use to me since it doesn't help in either the "ideal" or the "real world" test arrangement.

sigh...

Blaming wifi rate control is easy to do but hard (possibly impossible) to show or fix. More about that below. I did have hope for ATF - it seems I'm better off without it.

This post is not the solution; however, wifi rate control is likely the best place to look next to explain the multi wifi client netperf observations (which I can still reproduce) that started this thread. I've reported this example showing the correlation between wifi rate control and poor multi client netperf results.

From the ath10k mailing list the following quotes (Dave Taht and Ben Greear) are why I do not plan to look any further at wifi rate control:

Dave Taht
"So far as I know "rate control" is one of the major handwaves in the
standard and differs from vendor to vendor and 802.11 standard to
another.

...

but actually choosing a MCS rate, number of retries, formatting an
aggregate... and other factors, are... different.

I've tried to point at the first and best paper that shows how
minstrel works here:

http://blog.cerowrt.org/post/minstrel/

(and minstrel's known problems) but that's not what ath10k uses.

I would welcome full documentation (or code!) on how ath10k choses
its rates. Or any other wifi chipset's blob, frankly."

and

Ben Greear
"At least the sensing and stuff is probably baked into the hardware.
What I notice is that the firmware passes a series of rates to the
hardware to use for each packet, and it includes different bandwidth rates."

"You could instrument the kernel to save a histogram of
tx-rates or something similar, perhaps. 'iw dev wlan0 station dump' and
similar will now show expected tx rate, but that is an average or maybe
the last tx rate reported, so it does not give a detailed view of
what is happening.

You can also sniff the air and see the rates in that manner (on any
firmware, including stock)."

So it seems not only is wifi rate control implemented in the ath10k firmware, parts of it may be baked into the hardware. Even if one can show wifi rate control is responsible and convince the ath10k-ct dev to help with the firmware, if the issue goes back to hardware, you're done. I see little value in going further (with ath10k or qualcomm devices for that matter) just to find out.

The ath10k-ct patches used in the flent testing above (adapted from castiel652's original patch):

--- a/ath10k-5.15/core.c
+++ b/ath10k-5.15/core.c
@@ -1488,6 +1488,12 @@ start_again:
 				ar->fwcfg.flags |= ATH10K_FWCFG_BMISS_VDEVS;
 			}
 		}
+		else if (strcasecmp(filename, "alternate_atf") == 0) {
+			if (kstrtol(val, 0, &t) == 0) {
+				ar->fwcfg.alternate_atf = t;
+				ar->fwcfg.flags |= ATH10K_FWCFG_ALTERNATE_ATF;
+			}
+		}
 		else if (strcasecmp(filename, "max_amsdus") == 0) {
 			if (kstrtol(val, 0, &t) == 0) {
 				ar->fwcfg.max_amsdus = t;
@@ -3245,6 +3251,7 @@ static int ath10k_core_init_firmware_fea
 
 	ar->request_ct_sta = ath10k_modparam_ct_sta;
 	ar->request_nohwcrypt = ath10k_modparam_nohwcrypt;
+	ar->request_alternate_atf = ath10k_modparam_alternate_atf;
 	ar->request_nobeamform_mu = ath10k_modparam_nobeamform_mu;
 	ar->request_nobeamform_su = ath10k_modparam_nobeamform_su;
 	ar->num_ratectrl_objs = ath10k_modparam_target_num_rate_ctrl_objs_ct;
@@ -3261,6 +3268,8 @@ static int ath10k_core_init_firmware_fea
 		ar->request_nohwcrypt = ar->fwcfg.nohwcrypt;
 	if (ar->fwcfg.flags & ATH10K_FWCFG_CT_STA)
 		ar->request_ct_sta = ar->fwcfg.ct_sta_mode;
+	if (ar->fwcfg.flags & ATH10K_FWCFG_ALTERNATE_ATF)
+		ar->request_alternate_atf = ar->fwcfg.alternate_atf;
 	if (ar->fwcfg.flags & ATH10K_FWCFG_NOBEAMFORM_MU)
 		ar->request_nobeamform_mu = ar->fwcfg.nobeamform_mu;
 	if (ar->fwcfg.flags & ATH10K_FWCFG_NOBEAMFORM_SU)
--- a/ath10k-5.15/core.h
+++ b/ath10k-5.15/core.h
@@ -1357,6 +1357,7 @@ struct ath10k {
 #define ATH10K_FWCFG_CT_STA         (1<<16)
 #define ATH10K_FWCFG_ALLOW_ALL_MCS  (1<<17)
 #define ATH10K_FWCFG_DMA_BURST      (1<<18)
+#define ATH10K_FWCFG_ALTERNATE_ATF  (1<<19)
 
 		u32 flags; /* let us know which fields have been set */
 		char calname[100];
@@ -1368,6 +1369,7 @@ struct ath10k {
 		u32 stations;
 		u32 peers;
 		u32 nohwcrypt;
+		u32 alternate_atf;
 		u32 ct_sta_mode;
 		u32 nobeamform_mu;
 		u32 nobeamform_su;
@@ -1494,6 +1496,7 @@ struct ath10k {
 	int num_tids;
 	bool request_ct_sta;    /* desired setting */
 	bool request_nohwcrypt; /* desired setting */
+	bool request_alternate_atf;
 	bool request_nobeamform_mu;
 	bool request_nobeamform_su;
 	u32 num_ratectrl_objs;
--- a/ath10k-5.15/mac.c
+++ b/ath10k-5.15/mac.c
@@ -312,6 +312,11 @@ int ath10k_modparam_ct_sta;
 module_param_named(ct_sta, ath10k_modparam_ct_sta, int, 0444);
 MODULE_PARM_DESC(ct_sta, "Use CT-STA mode, a bit like proxy-STA");
 
+int ath10k_modparam_alternate_atf;
+module_param_named(alternate_atf, ath10k_modparam_alternate_atf, int, 0444);
+MODULE_PARM_DESC(alternate_atf,
+		 "Override WMI firmware checks and use an alternate airtime fairness");
+
 int ath10k_modparam_nobeamform_mu;
 module_param_named(nobeamform_mu, ath10k_modparam_nobeamform_mu, int, 0444);
 MODULE_PARM_DESC(nobeamform_mu, "Disable TX/RX MU Beamforming capabilities");
@@ -11303,7 +11308,8 @@ int ath10k_mac_register(struct ath10k *a
 				      NL80211_EXT_FEATURE_ACK_SIGNAL_SUPPORT);
 
 	if (ath10k_peer_stats_enabled(ar) ||
-	    test_bit(WMI_SERVICE_REPORT_AIRTIME, ar->wmi.svc_map))
+	    test_bit(WMI_SERVICE_REPORT_AIRTIME, ar->wmi.svc_map) ||
+	    ar->request_alternate_atf)
 		wiphy_ext_feature_set(ar->hw->wiphy,
 				      NL80211_EXT_FEATURE_AIRTIME_FAIRNESS);
 
--- a/ath10k-5.15/mac.h
+++ b/ath10k-5.15/mac.h
@@ -17,6 +17,7 @@ enum wmi_tlv_tx_pause_action;
 
 extern int ath10k_modparam_ct_sta;
 extern int ath10k_modparam_nohwcrypt;
+extern int ath10k_modparam_alternate_atf;
 extern int ath10k_modparam_nobeamform_mu;
 extern int ath10k_modparam_nobeamform_su;
 extern int ath10k_modparam_target_num_vdevs_ct;
--- a/ath10k-5.15/txrx.c
+++ b/ath10k-5.15/txrx.c
@@ -168,7 +168,12 @@ int ath10k_txrx_tx_unref(struct ath10k_h
 	struct sk_buff *msdu;
 	u8 flags;
 	bool tx_failed = false;
-
+	struct ieee80211_tx_status status = {
+	    .sta = NULL,
+	};
+	u32 tx_airtime = 0;
+	int len = 0;
+	
 	ath10k_dbg(ar, ATH10K_DBG_HTT,
 		   "htt tx completion msdu_id %u status %d\n",
 		   tx_done->msdu_id, tx_done->status);
@@ -214,11 +219,13 @@ int ath10k_txrx_tx_unref(struct ath10k_h
 		wake_up(&htt->empty_tx_wq);
 	spin_unlock_bh(&htt->tx_lock);
 
-	rcu_read_lock();
-	if (txq && txq->sta && skb_cb->airtime_est)
-		ieee80211_sta_register_airtime(txq->sta, txq->tid,
-					       skb_cb->airtime_est, 0);
-	rcu_read_unlock();
+	if (!ar->request_alternate_atf) {
+	  rcu_read_lock();
+	  if (txq && txq->sta && skb_cb->airtime_est)
+	    ieee80211_sta_register_airtime(txq->sta, txq->tid,
+					   skb_cb->airtime_est, 0);
+	  rcu_read_unlock();
+	}
 
 	if (ar->bus_param.dev_type != ATH10K_DEV_TYPE_HL)
 		dma_unmap_single(dev, skb_cb->paddr, msdu->len, DMA_TO_DEVICE);
@@ -317,7 +324,21 @@ int ath10k_txrx_tx_unref(struct ath10k_h
 	}
 #endif
 
-	ieee80211_tx_status(htt->ar->hw, msdu);
+	if (ar->request_alternate_atf) {
+	  rcu_read_lock();
+	  len = msdu->len;
+	  tx_airtime = ieee80211_calc_tx_airtime(htt->ar->hw, info, len);
+	  if (txq && txq->sta) {
+	    ieee80211_sta_register_airtime(txq->sta, txq->tid, tx_airtime, 0);
+	    status.sta = txq->sta;
+	  }
+	  status.skb = msdu;
+	  status.info = info;
+	  rcu_read_unlock();
+	  ieee80211_tx_status_ext(htt->ar->hw, &status);
+	} else {
+	  ieee80211_tx_status(htt->ar->hw, msdu);
+	}
 	/* we do not own the msdu anymore */
 
 	return 0;
--- a/ath10k-5.15/htt_rx.c
+++ b/ath10k-5.15/htt_rx.c
@@ -3206,7 +3206,9 @@ do_generic:
 						IEEE80211_QOS_CTL_TID_MASK;
 		tx_duration = __le32_to_cpu(ppdu_dur->tx_duration);
 
-		ieee80211_sta_register_airtime(peer->sta, tid, tx_duration, 0);
+		if (!ar->request_alternate_atf) {
+		  ieee80211_sta_register_airtime(peer->sta, tid, tx_duration, 0);
+		}
 
 		spin_unlock_bh(&ar->data_lock);
 		rcu_read_unlock();

thx for the exhaustive testing!!!

... but *.flent.gz files and graphs might still be revealing.
