AQL and the ath10k is *lovely*

Is it possible to replace the files on the router responsible for cpu support (e.g. changing clocks, overclocking) and usb (changing the frequency) without uploading the entire image again?

My problems with the connection speed may be related not so much to AQL or DSA as to the board-2.bin file. Switching to the newest standard one (shipped with OpenWrt) gives better speeds up close (by as much as 1/3), but the range decreases drastically and does not let me go beyond one floor. On the other hand, the modified one (only for ea6350v3) gives good range (up to 200 meters from the router), but the speed up close is 1/3 lower, which I don't care much about anyway. Strange, because on stable releases before 22.03 and on master with 5.10 it was fine despite the modified board file: the near and far speeds were 30 and 15 Mbps respectively. Maybe the 5.15 kernel changed how board-2.bin files are read or used compared to earlier OpenWrt versions?
USB in the router could also be to blame (errors

[6313.592704] usb usb2-port1: Cannot enable. Maybe the USB cable is bad?
[ 6359.172602] usb usb2-port1: Cannot enable. Maybe the USB cable is bad?

), but this does not affect the speed or availability of the wired connection or the stability of the wireless. I would rather bet on a conflict between the USB frequency and the 2.4GHz wifi frequency (although that supposedly happens only if the port runs as USB 3.0 and not, as currently, 2.0).

As for the AQL tests and the latest patches (all 8 patches): there is still an error in the log (and, interestingly, a completely random one, regardless of the load or the number of clients), but it no longer has any visible effects as it did before. Without patches 334 and 335, the pings are higher and the near/far speeds are similar to those with all patches applied.

EDIT: I changed the board file to the old optimized one (from the 19.07 notengobattery images), and...
20-22Mbps/25ms in the test room (previously 1.5-2Mbps/28-30ms) // tests without patches 334 and 335.
We'll see how long it keeps working fine.

What is the clock bug? The min 800MHz frequency?
FWIW, I always run my R7800 with the performance governor at max 1.7GHz, minimal temperature increase 2-3C, barely gets over 60C in the summer. I used to have an Asus router that ran at 80C for many years, I think 60C is fine. Why do we bother with freq scaling for a device that's always plugged in, no battery life concerns like a phone :man_shrugging:

Excerpts from simultaneous netperf runs (AP -> wifi clients) using the nmba and "59" clients (59 being the one that did 0.07 mbps BE upload in the tests above). Both clients are in the same location as in the test above. MCS values are as reported on the AP (I checked several times during the ~5 min netperfs and they did not change).

59:

Interim result:   90.16 10^6bits/s over 1.001 seconds ending at 1656515800.668
Interim result:   88.43 10^6bits/s over 1.019 seconds ending at 1656515801.687
Interim result:   87.33 10^6bits/s over 1.013 seconds ending at 1656515802.700
Interim result:   88.50 10^6bits/s over 1.000 seconds ending at 1656515803.700
Interim result:   89.57 10^6bits/s over 1.005 seconds ending at 1656515804.705
Interim result:   88.42 10^6bits/s over 1.013 seconds ending at 1656515805.717
Interim result:   89.54 10^6bits/s over 1.005 seconds ending at 1656515806.722
Interim result:   89.47 10^6bits/s over 1.001 seconds ending at 1656515807.723
Interim result:   88.54 10^6bits/s over 1.013 seconds ending at 1656515808.736
Interim result:   87.97 10^6bits/s over 1.008 seconds ending at 1656515809.744
Interim result:   89.95 10^6bits/s over 1.003 seconds ending at 1656515810.747

nmba:

Interim result:   13.06 10^6bits/s over 2.077 seconds ending at 1656515800.050
Interim result:   13.27 10^6bits/s over 1.002 seconds ending at 1656515801.053
Interim result:   18.14 10^6bits/s over 1.002 seconds ending at 1656515802.054
Interim result:   16.41 10^6bits/s over 1.100 seconds ending at 1656515803.154
Interim result:   14.09 10^6bits/s over 1.163 seconds ending at 1656515804.317
Interim result:   14.41 10^6bits/s over 1.015 seconds ending at 1656515805.332
Interim result:   13.51 10^6bits/s over 1.066 seconds ending at 1656515806.398
Interim result:   12.35 10^6bits/s over 1.090 seconds ending at 1656515807.488
Interim result:   14.68 10^6bits/s over 1.015 seconds ending at 1656515808.503
Interim result:   19.06 10^6bits/s over 1.013 seconds ending at 1656515809.516
Interim result:   17.03 10^6bits/s over 1.121 seconds ending at 1656515810.637

I did not make a cut and paste error, nor did I mislabel the netperf results.
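For anyone who wants to compare runs without eyeballing the interim lines, here is a minimal sketch that averages them. It assumes the exact "Interim result:" line format shown above; the helper name `summarize` is made up for illustration, and only the first three lines of each client are pasted in to keep it short.

```python
import re

# Matches netperf interim lines in the format pasted above, e.g.
#   Interim result:   90.16 10^6bits/s over 1.001 seconds ending at ...
INTERIM = re.compile(
    r"Interim result:\s+([\d.]+)\s+10\^6bits/s over\s+([\d.]+)\s+seconds"
)

def summarize(text):
    """Return (time-weighted mean, min, max) throughput in Mbit/s."""
    samples = [(float(rate), float(dur)) for rate, dur in INTERIM.findall(text)]
    if not samples:
        raise ValueError("no interim results found")
    total_mbit = sum(rate * dur for rate, dur in samples)
    total_time = sum(dur for _, dur in samples)
    rates = [rate for rate, _ in samples]
    return total_mbit / total_time, min(rates), max(rates)

# First three interim lines per client, copied from the results above.
client_59 = """
Interim result:   90.16 10^6bits/s over 1.001 seconds ending at 1656515800.668
Interim result:   88.43 10^6bits/s over 1.019 seconds ending at 1656515801.687
Interim result:   87.33 10^6bits/s over 1.013 seconds ending at 1656515802.700
"""
client_nmba = """
Interim result:   13.06 10^6bits/s over 2.077 seconds ending at 1656515800.050
Interim result:   13.27 10^6bits/s over 1.002 seconds ending at 1656515801.053
Interim result:   18.14 10^6bits/s over 1.002 seconds ending at 1656515802.054
"""

avg_59, min_59, max_59 = summarize(client_59)
avg_nmba, min_nmba, max_nmba = summarize(client_nmba)
print(f"59:   mean {avg_59:.1f} Mbit/s (min {min_59}, max {max_59})")
print(f"nmba: mean {avg_nmba:.1f} Mbit/s (min {min_nmba}, max {max_nmba})")
```

The mean is weighted by interval length because the interim intervals are not all exactly one second (the first nmba sample covers about two).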

MCS from r7500v2 AP

r7500v2 # iw dev wlan0 station dump | grep "tx bitrate"
        nmba tx bitrate:     360.0 MBit/s VHT-MCS 8 40MHz short GI VHT-NSS 2
        59 tx bitrate:     180.0 MBit/s MCS 12 40MHz short GI

Both 59 and nmba are about 10 m from the AP, with no clear line of sight, and about 3 m from each other. There are gaps in the (wood frame and drywall) walls obstructing their line of sight.

I started the 59 wifi client netperf first (this used to matter in the past; atm I can start either client first and get reproducible results). A single client netperf to nmba at this location can achieve 300+ mbps. When I start the nmba netperf first, I'll see 200-300+ mbps until I start the 59 netperf, at which point it drops to 10-20 mbps.

I'll bet that if I put tbf's on the wifi clients and limit their throughput to about 30 mbps and then repeat the "reversed" flent rtt_fair test, I'll get more meaningful results.

I wish the pre-packaged netperf binaries had the "-w" and "-b" options included by default. It would make using flent for this kind of testing a lot easier than playing with qdisc's.

EDIT 0: I added the 56 wifi client back in and tried simultaneous netperfs. 56 is also in the same location as in the original reverse rtt_fair test above (~1 m from AP, clear line of sight).

In words: I started the 56 client netperf first, then added nmba and 59. Throughput on the 56 client went from ~165 mbps (before starting nmba and 59) to a complete stop.

MCS 56 client stream on its own:

r7500v2 # iw dev wlan0 station dump | grep "tx bitrate"
        nmba tx bitrate:     300.0 MBit/s VHT-MCS 7 40MHz short GI VHT-NSS 2
        59 tx bitrate:     180.0 MBit/s MCS 12 40MHz short GI
        56 tx bitrate:     270.0 MBit/s MCS 15 40MHz

MCS all three clients streaming:

r7500v2 # iw dev wlan0 station dump | grep "tx bitrate"
        nmba tx bitrate:     360.0 MBit/s VHT-MCS 8 40MHz short GI VHT-NSS 2
        59 tx bitrate:     150.0 MBit/s MCS 7 40MHz short GI
        56 tx bitrate:     30.0 MBit/s MCS 8 40MHz short GI

So something like this probably did happen during the reverse rtt_fair test. The only way I've been able to avoid it is to limit the total throughput to something the AP can handle (about 100 mbps by my estimation with this configuration).

ty for the suggestion to look at MCS.

EDIT 1: I turned the mac (nmba) off and added a third ubuntu wifi client (call it 135).

3 client (56, 59, & 135) simultaneous netperf:

56: ~30 mbps; tx bitrate:     150.0 MBit/s MCS 7
59: ~60 mbps; tx bitrate:     270.0 MBit/s MCS 14
135: ~70 mbps; tx bitrate:     300.0 MBit/s MCS 15

So ATF works for me? Or ATF works only if sans apple?
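One rough way to put a number on that question: divide each client's throughput by its PHY rate and compare the ratios. This is only a sketch under a crude assumption (throughput / tx bitrate as a stand-in for airtime share, ignoring protocol overhead, aggregation and retries), and `airtime_proxy` is a made-up helper name. The three-client figures are the ones listed above; the two-client figures (~89 and ~15 mbps) are rounded from the earlier interim results.

```python
def airtime_proxy(throughput_mbps, phy_rate_mbps):
    """Crude stand-in for airtime share: goodput divided by PHY rate."""
    return throughput_mbps / phy_rate_mbps

def spread(shares):
    """Ratio between the largest and smallest share (1.0 = perfectly even)."""
    return max(shares.values()) / min(shares.values())

# EDIT 1 run: three Linux clients, (throughput mbps, tx bitrate MBit/s).
linux_only = {"56": (30, 150.0), "59": (60, 270.0), "135": (70, 300.0)}

# Earlier two-client run with the mac; throughputs rounded from the
# interim results, tx bitrates from the station dump.
with_mac = {"59": (89, 180.0), "nmba": (15, 360.0)}

for label, run in (("linux only", linux_only), ("with mac", with_mac)):
    shares = {name: airtime_proxy(t, r) for name, (t, r) in run.items()}
    for name, share in shares.items():
        print(f"{label:10s} {name:5s} share ~ {share:.2f}")
    print(f"{label:10s} spread = {spread(shares):.1f}x")

linux_spread = spread({n: airtime_proxy(t, r) for n, (t, r) in linux_only.items()})
mac_spread = spread({n: airtime_proxy(t, r) for n, (t, r) in with_mac.items()})
```

With these numbers the three Linux clients land within about 20% of each other, which is what roughly equal airtime would look like, while the run with the mac is off by more than a factor of ten.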

I probably will be able to during the weekend. Here I'm using imagebuilder for it.

In my case the difference is not so dramatic, WiFi vs cable is ≈4-6 ms vs ≈0.2-1 ms in normal conditions. Under load if you check my Waveform it increases only by 5-10 ms under normal conditions on WiFi.

@quarky

never mind

I'll leave an edited version of this post up in the event it helps someone else.

For non apple devices, be sure to disable the wifi powersave feature (which I have been doing for my reported results above).

On ubuntu:

sudo iwconfig <wifi_if> power off

Apparently this is not possible on apple devices. After a little googling and reading others' experiences with ping, I will not use an apple device for testing. I can't say if what I observed above using a mac results from this or from some other feature mac/broadcom have hidden in their software or hardware.

It does look like QCA9980 with ath10k-ct driver firmware does not support wifi power save.

r7500v2 # cat /sys/kernel/debug/ieee80211/phy*/netdev:wlan*/stations/*/peer_ps_state
2
2
...

the output 2 indicates disabled
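If there are many stations, the same check can be tallied instead of read line by line. A small sketch, assuming the debugfs layout from the command above; the function name `peer_ps_states` is hypothetical, and the meaning of each value (2 = disabled) is taken from this post, not from driver documentation.

```python
import glob
import os
from collections import Counter

# Same location the cat command above reads from (needs debugfs mounted
# and root on the AP).
DEBUGFS = "/sys/kernel/debug/ieee80211"

def peer_ps_states(base=DEBUGFS):
    """Count how many stations report each peer_ps_state value."""
    pattern = os.path.join(
        base, "phy*", "netdev:wlan*", "stations", "*", "peer_ps_state"
    )
    counts = Counter()
    for path in glob.glob(pattern):
        with open(path) as f:
            counts[f.read().strip()] += 1
    return counts

# Example on the AP:
#   print(peer_ps_states())
```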

non-ct ath10k may not support it in the future as well:

Something that @Ansuel is working on, I believe. IIRC, it has something to do with the L2 and CPU clocks not being in sync when switching from/to 384MHz, which (somehow) corrupts the cache.

I set my R7800 to a min of 800MHz CPU clock and let it scale using the schedutil governor. Seems stable for my R7800 so far.

Funny, I've just done exactly what you did before reading your message. I switched to schedutil governor and min_freq = 800 MHz.

Before that, my CPU governor was ondemand and min_freq = 800 MHz, and I've kept getting random annoying crashes at the least expected moments.
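For reference, a minimal sketch of what "schedutil with min_freq = 800 MHz" means in terms of the standard cpufreq sysfs files. It assumes the usual per-CPU layout under /sys/devices/system/cpu and that the schedutil governor is built into the image; `set_cpufreq` is a made-up helper, and on the router itself the same writes are normally done with echo from a startup script.

```python
import glob
import os

SYSFS_CPU = "/sys/devices/system/cpu"

def set_cpufreq(governor, min_khz, base=SYSFS_CPU):
    """Write governor and minimum frequency (in kHz) for every CPU under base.

    Returns the list of cpufreq directories that were updated.
    """
    touched = []
    for d in sorted(glob.glob(os.path.join(base, "cpu[0-9]*", "cpufreq"))):
        with open(os.path.join(d, "scaling_governor"), "w") as f:
            f.write(governor)
        # sysfs expects kHz, so 800 MHz is written as 800000.
        with open(os.path.join(d, "scaling_min_freq"), "w") as f:
            f.write(str(min_khz))
        touched.append(d)
    return touched

# Example (as root on the device):
#   set_cpufreq("schedutil", 800_000)
```

The settings do not persist across reboots, so whatever applies them has to run at every boot.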

In Intel-based machines whose CPU is not capable of supporting the Intel P-state (older CPUs), most Linux distros use the default governor "schedutil".

Intel's own Linux distro "Clear Linux" uses the governor "Performance" as the default, so its benchmark numbers always look good. Intel cheater :slight_smile:

Hi Felix,

Do you have any plan to backport the changes to 21.02 branch as well? Hopefully the next 21.02.4 will be reliable again in terms of WIFI.

Thanks a lot!

I have not seen the issue occur after the mt76 update. I am currently running

OpenWrt 22.03-SNAPSHOT r19482-2b8021d614 / LuCI openwrt-22.03 branch git-22.167.28394-8a4486a

I also did not see this issue with

OpenWrt 22.03-SNAPSHOT r19455-f608779f92 / LuCI openwrt-22.03 branch git-22.167.28394-8a4486a

and

OpenWrt SNAPSHOT r19873-a703f9ed0b / LuCI Master git-22.167.28356-8effea5

However one time (not sure which OpenWrt version I was running at that time), I thought the issue occurred again but it turned out to be due to something else.

Websites couldn't load while I was still connected to WiFi on my Pixel 6, but I was able to log into my RT3200 router from my phone (which means I was not disconnected from WiFi) using the local IPv4 address.

It turned out to be some issue with "Private DNS" (DNS over TLS) feature in my Pixel 6 (Android 12, Build SQ3A.220605.009.B1). I have disabled "Private DNS" in my phone for now and there have been no issues since then.

My /etc/config/wireless is still the same as in "802.11r Fast Transition how to understand that FT works? - #105 by ka2107" (I only changed option he_bss_color from '128' to '8' when moving from MASTER-SNAPSHOT to 22.03-SNAPSHOT).

Hi Felix @nbd, thank you for your patches. They have been working very well for me on the Belkin RT3200.

One question though, do these changes affect ath9k in any way? I have a TP-Link Archer A7 v5 in a remote location (in another country), currently running 22.03.0-rc4. Will it improve WiFi latency on the 2.4 GHz band (HT20, 802.11n only) if I flash the latest 22.03-SNAPSHOT on the Archer A7?

I would be comforted if everyone could re-demonstrate a rrul_be result like this, over 300 seconds, on all the wifi chipsets openwrt supports, in whatever the final patchset looks like.

Nuke it from orbit. It's the only way to be sure.

Hi Dave @dtaht, while I would love to run the RRUL test and provide you the results from different clients and APs, unfortunately Flent only runs on Linux. I have an Arch Linux installation running on a laptop with an Intel AC 9560 WiFi card, and I am able to run Flent on it. I will try to run the test on it and provide the results to you. However, most of the time my laptop is wired over Ethernet (Intel I219-V, or Realtek based USB) to my RT3200 (ISP: Comcast Xfinity, DOCSIS 3.1, Arris S33 Modem, 50/10 Mbps) and I almost never connect it over WiFi. Also, this will still be a result from only a single client and a single AP/Router.

I tried setting up Flent on a 2017 Macbook Air (Router/AP: Belkin RT3200, ISP: AirTel, Country: India, GPON Fiber 40/40 Mbps, PPPoE) running macOS 12.4 Monterey but I was not able to set it up due to python dependencies issue. It may be my own lack of understanding since I am not familiar with macOS. Flent also does not run on Windows.

For this remote location, I can provide Waveform results and maybe newer speedtest.net results with loaded latency. But those tests would be over VNC which I am not sure how it will affect the results. It would be nice to be able to run Flent on those systems though.

I also have a Netgear R7800 (ISP: AT&T Fiber; Symmetric Gigabit; AT&T 5268ac in IP Passthrough mode) and TP-Link Archer A7 v5 (ISP: BSNL, Country: India, EPON Fiber 60/60 Mbps, PPPoE). However these 2 devices I can only control from WAN side and I do not have any client devices I control on which I can VNC into and run any WiFi tests.

Yeah, nah, not gonna happen to shiny and useful. I will redo with a Linux box.

The entropy level in master on 5.15.45 does not update... it stays low. Maybe this is the reason for the poor wlan and transfer performance (especially at a distance), the jumping pings and the log errors, which still pop up the same despite applying all recent fixes. It probably has to do with the problem in this topic as well.
On stable 19.07 I had > 3500 (and varying); now I have a constant low of 256, and despite turning on rng-tools or haveged this level does not change at all.

root@OpenWrt:~# /etc/init.d/haveged restart
root@OpenWrt:~# cat /proc/sys/kernel/random/entropy_avail
256
root@OpenWrt:~# /etc/init.d/haveged status
running
root@OpenWrt:~# cat /proc/sys/kernel/random/entropy_avail
256
root@OpenWrt:~#

Replacing board-2.bin files with ones that have different calibrations does not help much. Sometimes the results jump up, but usually they are very poor (the speed at a distance, but the pings can also jump like a monkey on a tree).

EDIT: OK, the changed entropy values are a 'new feature' and not a bug. But does this really have no impact on other system elements, such as transfers or pings?

EDIT2:
In the logs I see a few errors:

Fri Jul  1 11:51:29 2022 daemon.err haveged[6070]: haveged: command socket is listening at fd 3
Fri Jul  1 11:51:29 2022 daemon.info haveged[6070]: haveged starting up
Fri Jul  1 11:51:30 2022 daemon.err haveged[6070]: haveged: ver: 1.9.18; arch: generic; vend: ; build: (gcc 11.3.0 CV); collect: 128K
Fri Jul  1 11:51:30 2022 daemon.err haveged[6070]: haveged: cpu: (); data: 32K (P); inst: 32K (P); idx: 19/40; sz: 32744/67304
Fri Jul  1 11:51:30 2022 daemon.err haveged[6070]: haveged: fills: 0, generated: 0
Fri Jul  1 11:51:30 2022 daemon.err haveged[6070]: haveged: Stopping due to signal 15
Fri Jul  1 11:51:30 2022 daemon.err haveged[6070]:
Fri Jul  1 11:51:30 2022 daemon.err haveged[6070]: fills: 1, generated: 512 K bytes, RNDADDENTROPY: 256
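A quick way to tell "low" from "full" is to read entropy_avail next to poolsize. As far as I know, the reworked kernel RNG (merged upstream in 5.18 and, I believe, backported to the 5.15 stable series) uses a 256-bit pool, so a constant 256 would mean the pool is full rather than starved, while older kernels report against a 4096-bit pool. A minimal sketch under that assumption; `entropy_status` is a made-up helper name.

```python
import os

PROC_RANDOM = "/proc/sys/kernel/random"

def entropy_status(base=PROC_RANDOM):
    """Return (entropy_avail, poolsize, pool_is_full), all sizes in bits."""
    def read_int(name):
        with open(os.path.join(base, name)) as f:
            return int(f.read().strip())

    avail = read_int("entropy_avail")
    pool = read_int("poolsize")
    return avail, pool, avail >= pool

# Example on the router:
#   avail, pool, full = entropy_status()
#   print(f"{avail}/{pool} bits, full={full}")
```

If poolsize also reads 256, adding haveged or rng-tools cannot raise the number any further, which would match the unchanged readings above.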

I am sorry that getting flent up and running on OSX has become so darn hard. Neither @tohojo or I have ready access to an OSX box. Could you file a bug here: https://github.com/tohojo/flent/issues with the errors you get on trying to get it built?

I tend to be concerned about a lack of entropy also. How is performance without encryption?

Does WPA3 vs WPA2 play a role in having enough entropy available to sustain a high demand on encryption? I’m not an expert on these matters but I can imagine that lack of entropy (throughput) might have an impact on the amount of packets that can be transmitted depending on the WPA version?

In master on 5.15.45...

root@OpenWrt:~# cat /proc/sys/kernel/random/entropy_avail
256
root@OpenWrt:~# rngtest -c 1000 </dev/random
rngtest 6.15
Copyright (c) 2004 by Henrique de Moraes Holschuh
This is free software; see the source for copying conditions.  There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

rngtest: starting FIPS tests...
rngtest: bits received from input: 20000032
rngtest: FIPS 140-2 successes: 999
rngtest: FIPS 140-2 failures: 1
rngtest: FIPS 140-2(2001-10-10) Monobit: 1
rngtest: FIPS 140-2(2001-10-10) Poker: 0
rngtest: FIPS 140-2(2001-10-10) Runs: 0
rngtest: FIPS 140-2(2001-10-10) Long run: 0
rngtest: FIPS 140-2(2001-10-10) Continuous run: 0
rngtest: input channel speed: (min=97.813; avg=220.686; max=224.394)Mibits/s
rngtest: FIPS tests speed: (min=25.671; avg=35.239; max=35.785)Mibits/s
rngtest: Program run time: 630506 microseconds
root@OpenWrt:~# cat /proc/sys/kernel/random/entropy_avail
256
root@OpenWrt:~#

In master on 5.10.111...

root@OpenWrt:~# cat /proc/sys/kernel/random/entropy_avail
3666
root@OpenWrt:~# rngtest -c 1000 </dev/random
rngtest 6.15
Copyright (c) 2004 by Henrique de Moraes Holschuh
This is free software; see the source for copying conditions.  There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

rngtest: starting FIPS tests...
rngtest: bits received from input: 20000032
rngtest: FIPS 140-2 successes: 1000
rngtest: FIPS 140-2 failures: 0
rngtest: FIPS 140-2(2001-10-10) Monobit: 0
rngtest: FIPS 140-2(2001-10-10) Poker: 0
rngtest: FIPS 140-2(2001-10-10) Runs: 0
rngtest: FIPS 140-2(2001-10-10) Long run: 0
rngtest: FIPS 140-2(2001-10-10) Continuous run: 0
rngtest: input channel speed: (min=35.321; avg=192.159; max=214.309)Mibits/s
rngtest: FIPS tests speed: (min=7.322; avg=32.132; max=35.718)Mibits/s
rngtest: Program run time: 695997 microseconds
root@OpenWrt:~# cat /proc/sys/kernel/random/entropy_avail
3666
root@OpenWrt:~#

On stable 5.4 I did not run the test, but the entropy is 3700-3900...

this looks helpful elsewhere and in the longer run. https://bristot.me/operating-system-noise-in-the-linux-kernel/
