on osx:
sudo netstat -l your_device -qq
and I'm calling it a day. New theory: the rpi (openwrt.lan) has something rate limiting the performance of its stack.
Before calling it a day, see below. May I ask what you expect to find in the output of that command?
netstat output (too long to paste directly)
Going to enjoy Father's day for a little while.
heh. I gave you the wrong command. -qq -I (that's an I, not an L) should show the native fq_codel stats on the osx box. and even that might be the wrong command... gimme a sec
That makes more sense now; I should have realised. Here you go:
@reaper$ ➜ ~ sudo netstat -I en0 -qq
en0:
[ sched: FQ_CODEL qlength: 0/128 ]
[ pkts: 124486499 bytes: 170983491083 dropped pkts: 44041 bytes: 65229549 ]
=====================================================
[ pri: VO (1) srv_cl: 0x400180 quantum: 605 drr_max: 8 ]
[ queued pkts: 0 bytes: 0 ]
[ dequeued pkts: 33377 bytes: 6044211 ]
[ budget: 0 target qdelay: 10.00 msec update interval:100.00 msec ]
[ flow control: 0 feedback: 0 stalls: 0 failed: 0 overwhelming: 0 ]
[ drop overflow: 0 early: 0 memfail: 0 duprexmt:0 ]
[ flows total: 0 new: 0 old: 0 ]
[ throttle on: 0 off: 0 drop: 0 ]
[ compressible pkts: 0 compressed pkts: 0]
=====================================================
[ pri: VI (2) srv_cl: 0x380100 quantum: 3028 drr_max: 6 ]
[ queued pkts: 0 bytes: 0 ]
[ dequeued pkts: 5842684 bytes: 8182467304 ]
[ budget: 0 target qdelay: 10.00 msec update interval:100.00 msec ]
[ flow control: 355 feedback: 355 stalls: 12 failed: 0 overwhelming: 0 ]
[ drop overflow: 0 early: 0 memfail: 0 duprexmt:0 ]
[ flows total: 0 new: 0 old: 0 ]
[ throttle on: 0 off: 0 drop: 0 ]
[ compressible pkts: 0 compressed pkts: 0]
=====================================================
[ pri: BE (7) srv_cl: 0x0 quantum: 1514 drr_max: 4 ]
[ queued pkts: 0 bytes: 0 ]
[ dequeued pkts: 118254467 bytes: 162519175698 ]
[ budget: 0 target qdelay: 10.00 msec update interval:100.00 msec ]
[ flow control: 9243 feedback: 9243 stalls: 40 failed: 0 overwhelming: 0 ]
[ drop overflow: 0 early: 43041 memfail: 0 duprexmt:0 ]
[ flows total: 0 new: 0 old: 0 ]
[ throttle on: 0 off: 0 drop: 0 ]
[ compressible pkts: 0 compressed pkts: 0]
=====================================================
[ pri: BK (8) srv_cl: 0x100080 quantum: 1514 drr_max: 2 ]
[ queued pkts: 0 bytes: 0 ]
[ dequeued pkts: 355971 bytes: 275803870 ]
[ budget: 0 target qdelay: 10.00 msec update interval:100.00 msec ]
[ flow control: 3 feedback: 3 stalls: 0 failed: 0 overwhelming: 0 ]
[ drop overflow: 0 early: 0 memfail: 0 duprexmt:0 ]
[ flows total: 0 new: 0 old: 0 ]
[ throttle on: 0 off: 0 drop: 0 ]
[ compressible pkts: 83604 compressed pkts: 34500]
Enjoy your Father's Day. It is Labor Day here... and we have labored mightily.
Did you patch the kernel or the mac80211 package?
With wmm_ac_be_txop_limit=32, what do you see on the osx box?
this was a good result from aug 4:
a tcpdump of 2 flows up would be useful also.
I patched package/kernel/mac80211
Update: 1:38 AEDT - I just realised my mistake; I'm recompiling and leaving it ready for re-testing early tomorrow morning.
Yes, Sir, see below:
But today I didn't use the parameters below; I only modified txop=32 and used burst=0.
tx_queue_data2_aifs=1
tx_queue_data2_cwmin=7
tx_queue_data2_cwmax=15
tx_queue_data2_burst=3.0
I will do it tomorrow.
Another round.
WMM test parameters:
tx_queue_data2_aifs=3
tx_queue_data2_cwmin=15
tx_queue_data2_cwmax=63
tx_queue_data2_burst=0
wmm_ac_be_txop_limit=0
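(Aside: the above are stock hostapd.conf tunables. One hedged way to confirm they actually took effect on the AP, assuming OpenWrt's usual generated-config path, which may differ by version:

grep -E 'tx_queue_data2|wmm_ac_be_txop' /var/run/hostapd-phy*.conf
)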
AQL test parameters:
root@nanohd-downstairs:~# cat /sys/kernel/debug/ieee80211/phy1/aql_txq_limit
AC AQL limit low AQL limit high
VO 5000 12000
VI 5000 12000
BE 5000 12000
BK 5000 12000
root@nanohd-downstairs:~# cat /sys/kernel/debug/ieee80211/phy1/aql_threshold
24000
root@nanohd-downstairs:~# cat /sys/kernel/debug/ieee80211/phy1/aql_enable
1
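For reference, a sketch of how those limits can be pushed back in at runtime. This assumes mac80211's debugfs write format of "<AC> <low> <high>", with ACs numbered 0=VO, 1=VI, 2=BE, 3=BK to mirror the table above; verify against your kernel before trusting it:

# hypothetical helper: restore 5000/12000 low/high AQL limits for every AC
for ac in 0 1 2 3
do
    echo "$ac 5000 12000" > /sys/kernel/debug/ieee80211/phy1/aql_txq_limit
done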
flent rrul_be test:
flent tcpn[up,down] 2-threaded test:
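For anyone reproducing these, a hedged reconstruction of the flent invocations; the netperf server hostname is a placeholder, and flent's actual test names are rrul_be, tcp_nup and tcp_ndown:

flent rrul_be -H netperf-server.lan
flent tcp_nup --test-parameter upload_streams=2 -H netperf-server.lan
flent tcp_ndown --test-parameter download_streams=2 -H netperf-server.lan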
yuck.
w/aql disabled?
Sadly, nope. You can see this in the parameters list I posted before the graphs.
Where were we in july?
"At least it doesn't crash."
And, the last one for today.
WMM test parameters:
tx_queue_data2_aifs=3
tx_queue_data2_cwmin=15
tx_queue_data2_cwmax=63
tx_queue_data2_burst=0
wmm_ac_be_txop_limit=0
AQL test parameters:
root@nanohd-downstairs:~# cat /sys/kernel/debug/ieee80211/phy1/aql_txq_limit
AC AQL limit low AQL limit high
VO 2000 2000
VI 2000 2000
BE 2000 2000
BK 2000 2000
root@nanohd-downstairs:~# cat /sys/kernel/debug/ieee80211/phy1/aql_threshold
24000
root@nanohd-downstairs:~# cat /sys/kernel/debug/ieee80211/phy1/aql_enable
1
Please note that aql_threshold is effectively nullified because the low and high aql_txq_limit values are identical.
flent rrul_be
300 s test graph:
flent tcpnup
test with 1, 2, 4, 8 and 16 threads ping cdf graph:
flent tcpndown
test with 1, 2, 4, 8 and 16 threads ping cdf graph:
And as usual, click here to download everything, including a tcpdump of the 2-threaded up and down captures.
I think we are onto something, but why is downloading so good and uploading so bad?! Can you think of anything @dtaht?
We aren't winning the election often enough. Advertising a txop of 2ms or less in the beacon might help. Nice to see progress (without y'all's help and interest I'd have given up multiple times on these fronts. Actually, I DO give up periodically, in the hope that a zen-like moment will yield inspiration, or that someone else will have the inspiration).
I still need to look at these packet captures closely.
A key thing to remember is we got about 1/3 ms RTT with no load. What we achieve now is pretty miserable compared to that.
Fiddling with NAPI_POLL_WEIGHT=16 might pull some latency out of the ethernet interfaces.
On the osx upload side, there are no drops, essentially. My guess is all the latency is coming from a fixed length queue there, and when they run out, they push back on the stack.
Two ways to test that: I can't find a way to lock the mcs rate in osx with a few searches of the web, but narrowing the channel width at the AP to HT20 might be revealing. Going from VHT80 to HT20 would probably quadruple the latency observed on the osx upload if that theory is correct.
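On OpenWrt that is a one-line change; a sketch, assuming the 5 GHz radio is radio1 (check /etc/config/wireless for the real name):

uci set wireless.radio1.htmode='HT20'    # was VHT80
uci commit wireless
wifi    # reload the wireless config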
Polling the netstat -qq -I the_interface and dumping that somewhere during that upload test might be revealing also.
for i in `seq 1 60`
do
    netstat -qq -I the_interface
    sleep 1
done > fq_codel_osx.log
I will try this.
By the way, I just did a new quick test on Waveform and with Apple's tool, letting WLAN be the choking point... quite an improvement on these two tests; previously the results were +20-40 ms and < 1000 RPM.
==== SUMMARY ====
Upload capacity: 34.504 Mbps
Download capacity: 418.589 Mbps
Upload flows: 12
Download flows: 12
Responsiveness: High (3157 RPM)
Base RTT: 11
Start: 6/9/2022, 5:34:57 am
End: 6/9/2022, 5:35:07 am
OS Version: Version 12.5.1 (Build 21G83)
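That summary format matches macOS 12's built-in responsiveness CLI, so presumably it was produced by running, with no flags:

networkQuality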
Let me run the tests before starting my daily fight against the Gravity. Stay tuned.
Here you go. I ran two sets of tests, upload and download, from 1 to 16 threads, with the AP in HT20 mode. In parallel, I ran a version of your script to capture the fq_codel stats.
You can download all the files by clicking here.
Note that I didn't see the same kind of "contention" between upload and download. I think it might be that we are not hitting the download limit, so latency does not increase. Another of my naïve queries, I reckon.
I will patch this next. Test to come tomorrow, though, Dave.
Update: I changed my mind; I will fight against Gravity in the arvo. See below.
WMM test parameters:
tx_queue_data2_aifs=3
tx_queue_data2_cwmin=15
tx_queue_data2_cwmax=63
tx_queue_data2_burst=0
wmm_ac_be_txop_limit=0
AQL test parameters:
root@nanohd-downstairs:~# cat /sys/kernel/debug/ieee80211/phy1/aql_txq_limit
AC AQL limit low AQL limit high
VO 2000 2000
VI 2000 2000
BE 2000 2000
BK 2000 2000
root@nanohd-downstairs:~# cat /sys/kernel/debug/ieee80211/phy1/aql_threshold
24000
root@nanohd-downstairs:~# cat /sys/kernel/debug/ieee80211/phy1/aql_enable
1
Kernel patches:
MS2TIME(8)
NAPI_POLL_WEIGHT=16
flent rrul_be
300 s test graph:
flent tcpnup
test with 1, 2, 4, 8 and 16 threads ping cdf (median ≈70 ms) graph:
flent tcpndown
test with 1, 2, 4, 8 and 16 threads ping cdf (median ≈8 ms) graph:
Click me to download flent data and tcpdump capture.
Note that I didn't see any kernel warning; hence the driver is not calling netif_napi_add()
with a weight value higher than the one defined (16).
if (weight > NAPI_POLL_WEIGHT)
        netdev_err_once(dev, "%s() called with weight %d\n",
                        __func__, weight);
The napi poll weight affects the whole system; you should see an increase in context switches due to it. It's been 64 for two decades, and I've always felt it was too high for modern (especially arm) multicore systems. It's better to do less work, more often, to have a more fluid experience.
Since your test result was identical, it's not clear if it was actually applied. A printk in init from the mt76 ethernet driver, printing out what it's set to, would validate that it changed.
For all I know, 8 is closer to a good number on arm.
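A cheap way to check the "more context switches" prediction on the router itself, using only /proc/stat (the 10-second window is arbitrary):

# compare context switches per second before/after changing NAPI_POLL_WEIGHT
a=$(awk '/^ctxt/ {print $2}' /proc/stat)
sleep 10
b=$(awk '/^ctxt/ {print $2}' /proc/stat)
echo "$(( (b - a) / 10 )) context switches/sec"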
Very happy with the download result (we were at VHT80(?) 430Mbit though; what were we getting before? 80Mbit mt76 vs 120 OSX or so...). To me, this ratio we've had since the beginning of this thread STILL points, maybe, to an ampdu sizing or mcs-rate problem more than anything else, since we now have plenty of cpu left over.
FQ is working just fine and dandy with these reduced AQL values. I still don't get why aql needs to be enabled at all... (could you do another test with aql disabled with HT20?)
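For that test, disabling AQL should just be a write to the same debugfs knob read above (it does not persist across reboots):

echo 0 > /sys/kernel/debug/ieee80211/phy1/aql_enable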
If you don't mind, I'd like to stay at HT20 for a while - makes the packet captures smaller! I ended up with 9GB on swap on the last cap....
To make 'em bigger... to be able to actually look at what's on the wireless... sigh... do you have a usb stick available for the router? Does a wifi monitoring interface work on this chipset for tcpdump?
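A sketch of what that capture setup could look like on the AP, assuming mt76 allows a monitor vif alongside the AP vif and that the USB stick is mounted at /mnt/usb (both assumptions):

iw phy phy1 interface add mon0 type monitor    # monitor vif on the 5 GHz phy
ip link set mon0 up
tcpdump -i mon0 -s 200 -w /mnt/usb/air.pcap    # small snaplen keeps the file manageable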
I just swapped some email with @sultannaml - working on the latest mt79 chipset - who told me:
Other mt76 changes I made AP-side include making mt7915 construct larger A-MSDUs
when A-MSDUs are hardware offloaded (which it has been for over a year now), and
working around a weird firmware bug where mt7915 couldn't transmit frames with
MCS 10-11 when using 160 MHz bandwidth with 2 spatial streams over DFS spectrum.
I also made a handful of other changes to mt76 to fix mt7922 bugs and
performance issues, such as how mt7922 would never TX at 160 MHz bandwidth to my
mt7915 AP out of the box — TX would be limited to 80 MHz — despite working just
fine with a Broadcom AP.
[1] https://github.com/openwrt/openwrt/commit/f338f76a66a50d201ae57c98852aa9c74e9e278a
[2] https://github.com/kerneltoast/kernel_x86_laptop/commit/ca89780690f7492c2d357e0ed2213a1d027341ae
You went from VHT80 or 160 down to 20?
Can I have a quick pointer to it, please?
I read those patches. I've moved from VHT80 to HT20 here. My devices are MT7621 CPUs with MT7615e chipsets.