R7800 performance

Not DD-WRT: this is Kong's OpenWrt trunk build with kernel 5.4. The only thing I changed was enabling the Packet Steering option. H 100 + 160 MHz + US, everything else is pretty much on default.

So... I redid tests with packet steering and irqbalance

With 19.07.3: activating irqbalance changes bandwidth from 550 Mbit/s down / 350-400 Mbit/s up (without) to 400 Mbit/s down / 450-500 Mbit/s up (with).

With master snapshot: I can't get the channel working at 160 MHz anymore, and the rate drops to 360 Mbit/s. Consequently my bandwidth is about 300 Mbit/s down / 300 Mbit/s up without irqbalance and 600 Mbit/s down / 450-500 Mbit/s up with irqbalance.

I've tried several master builds (kong, hnyman) and they all give the same numbers. I can't get the same PHY rate as with 19.07.3.

Edit: forgot to mention that with 19.07.3 the limiting factor was CPU usage. Without irqbalance I had one CPU at 100% and the other at 60%; with irqbalance only one CPU was loaded (at 100%) and the other was idle.
With master, both CPUs are around 70-80%, so the limiting factor seems to be the PHY rate.
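
For anyone who wants to compare per-CPU load the same way, here is a rough sketch that works from two snapshots of /proc/stat. The function name cpu_busy_pct is made up for this post, and "busy" is assumed to mean everything except the idle and iowait columns.

```shell
# cpu_busy_pct BEFORE AFTER
# Prints "cpuN percent" for each CPU, based on two saved copies of /proc/stat.
# Assumption: busy time = total time minus the idle and iowait columns.
cpu_busy_pct() {
    awk '
        /^cpu[0-9]+ / {
            total = 0
            # Fields 2..9 are user, nice, system, idle, iowait, irq, softirq, steal.
            # The guest columns are skipped because they are already part of user/nice.
            for (i = 2; i <= 9 && i <= NF; i++) total += $i
            idle = $5 + $6
            if (NR == FNR) {            # first file: remember the baseline
                t0[$1] = total
                i0[$1] = idle
            } else {                    # second file: print the delta
                dt = total - t0[$1]
                di = idle - i0[$1]
                if (dt > 0) printf "%s %.0f\n", $1, 100 * (dt - di) / dt
            }
        }
    ' "$1" "$2"
}
```

On the router, something like "cat /proc/stat > /tmp/s0; sleep 5; cat /proc/stat > /tmp/s1; cpu_busy_pct /tmp/s0 /tmp/s1" while the speed test is running should show whether one core is pinned.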

Are you testing with the ath10k or ath10k-ct driver/firmware? If you're using ath10k-ct, would you mind downloading the latest ath10k-ct beta firmware and retrying your tests with a snapshot build?

To do so:

wget https://www.candelatech.com/downloads/ath10k-9984-10-4b/ath10k-fw-beta/firmware-5-ct-htt-mgt-community.bin
[ -f /lib/firmware/ath10k/QCA9984/hw1.0/firmware-5.bin ] && mv firmware-5-ct-htt-mgt-community.bin /lib/firmware/ath10k/QCA9984/hw1.0/firmware-5.bin
[ -f /lib/firmware/ath10k/QCA9984/hw1.0/ct-firmware-5.bin ] && mv firmware-5-ct-htt-mgt-community.bin /lib/firmware/ath10k/QCA9984/hw1.0/ct-firmware-5.bin

Would love to see if it performs better or worse for you.
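
If you would rather keep the stock firmware around so you can go back, the same swap can be sketched as a small function that copies instead of moves. The name install_ct_fw and the .bak suffix are just assumptions for this example, not anything the ath10k-ct packages provide.

```shell
# install_ct_fw NEW_FW FW_DIR
# Replaces whichever of the two firmware files already exists in FW_DIR.
# A one-time backup is kept next to it with a .bak suffix (assumed name).
install_ct_fw() {
    new_fw=$1
    fw_dir=$2
    for name in firmware-5.bin ct-firmware-5.bin; do
        if [ -f "$fw_dir/$name" ]; then
            # Only back up once, so a second run does not overwrite the original.
            [ -f "$fw_dir/$name.bak" ] || cp "$fw_dir/$name" "$fw_dir/$name.bak" || return 1
            cp "$new_fw" "$fw_dir/$name" || return 1
            echo "replaced $fw_dir/$name"
        fi
    done
}
```

Usage would be "install_ct_fw firmware-5-ct-htt-mgt-community.bin /lib/firmware/ath10k/QCA9984/hw1.0", followed by a reboot so the new firmware gets loaded. Keep in mind the backup takes up extra flash space.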

I was using the default one with snapshot builds, which is the latest CT driver, non htt-mgt. I've just tried the latest beta, with or without htt-mgt, it does not change the numbers regarding transfer rates.

Thanks for reporting back. I run my own extremely stripped-down snapshot builds on my R7800, as its only use in my setup is as a "dumb" AP. However, I have been using the ath10k-ct-smallbuffers driver with the ath10k-ct htt-mgt firmware. In iperf3 tests I was seeing roughly 50-100 Mbps better throughput in both directions with the beta firmware.

Not exactly relevant, but as I don't have a r7800 to test with....

fyi -

Wonder if our binaries are schedutil aware

You could use the performance governor, locked at max all the time, and monitor the temps for a bit; if it's not running very hot, that's good enough. People ask about power consumption, but in the grand scheme of things in a house it doesn't really matter.
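
A quick way to keep an eye on the temps from a shell, as a sketch: the kernel's thermal zones report millidegrees Celsius under /sys/class/thermal. The function name print_temps is made up here, and how many zones show up depends on the build.

```shell
# print_temps [THERMAL_ROOT]
# Prints "zone degrees_C" for every thermal zone that can be read.
# The sysfs value is in millidegrees Celsius, so it is divided by 1000.
print_temps() {
    root=${1:-/sys/class/thermal}
    for zone in "$root"/thermal_zone*; do
        [ -r "$zone/temp" ] || continue
        milli=$(cat "$zone/temp")
        echo "${zone##*/} $((milli / 1000))"
    done
}
```

Running "print_temps" every few minutes during a long transfer should be enough to see whether the performance governor pushes the temperature up noticeably.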

Did you change /sys/devices/system/cpu/cpufreq/schedutil/rate_limit_us from the 10000 default? Android phones seem to set it to 0.

how to tell?

no it stays at 10000

I don't think schedutil works. Do you see the frequency change at all? Mine has been flat at 800MHz for the past couple hours (that's what I set the min freq to). With ondemand it goes up once in a while. I can make it spike briefly when I log in to the UI or ssh in.
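
To check what the governor is doing at a given moment, a sketch like this prints the active governor plus the current and minimum frequency for each cpufreq policy. show_cpufreq is a made-up name; the scaling_* files are the standard cpufreq sysfs entries.

```shell
# show_cpufreq [CPUFREQ_ROOT]
# Prints one line per cpufreq policy with governor, current and minimum frequency (kHz).
show_cpufreq() {
    root=${1:-/sys/devices/system/cpu/cpufreq}
    for pol in "$root"/policy*; do
        [ -d "$pol" ] || continue
        gov=$(cat "$pol/scaling_governor")
        cur=$(cat "$pol/scaling_cur_freq")
        min=$(cat "$pol/scaling_min_freq")
        echo "${pol##*/}: governor=$gov cur=$cur min=$min"
    done
}
```

Calling it in a loop ("while true; do show_cpufreq; sleep 1; done") while generating load gives a crude view of whether the frequency ever leaves the minimum, though short spikes can easily fall between samples.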

Setting the min frequency to 800 MHz gives the best performance at minimal CPU frequency with schedutil when running speedtest-netperf.sh -H netperf-west.bufferbloat.net -p 1.1.1.1 --sequential -t 20 and measuring downstream throughput with SQM enabled on the router.
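
For reference, the governor and the minimum frequency can both be set through sysfs. A minimal sketch, assuming the usual policy* layout (set_cpufreq is a made-up helper name, and the values are in kHz):

```shell
# set_cpufreq GOVERNOR MIN_KHZ [CPUFREQ_ROOT]
# Writes the governor and the minimum frequency to every cpufreq policy.
# Needs root on a real system. The setting is lost on reboot.
set_cpufreq() {
    governor=$1
    min_khz=$2
    root=${3:-/sys/devices/system/cpu/cpufreq}
    for pol in "$root"/policy*; do
        [ -d "$pol" ] || continue
        echo "$governor" > "$pol/scaling_governor" || return 1
        echo "$min_khz" > "$pol/scaling_min_freq" || return 1
    done
}
```

So "set_cpufreq schedutil 800000" would match the setup described above. To make it stick across reboots it would have to go into a startup script such as /etc/rc.local.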

I tried varying the min frequency then running the speed test. Here is my result:

Min freq (MHz)    Downstream (Mbps)
384               184
384               181
600               191
600               189
800               200
800               197
1000              201
1000              198
1400              199
1400              200
1725              200
1725              201

You can see that 800 MHz seems to be the sweet spot: maximal throughput at the lowest minimum CPU frequency. I am away from my laptop now or I would show a nice little plot of these two.
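
One way to make the "sweet spot" less of an eyeball judgment is to average the runs per frequency and pick the lowest frequency whose mean is within some tolerance of the best mean. This is only a sketch: the sweet_spot name and the 2% default tolerance are assumptions, not something measured.

```shell
# sweet_spot [TOLERANCE_PERCENT]
# Reads "freq_mhz mbps" lines on stdin (other lines are ignored).
# Prints the lowest frequency whose mean throughput is within
# TOLERANCE_PERCENT (default 2, an arbitrary choice) of the best mean.
sweet_spot() {
    awk -v tol="${1:-2}" '
        NF == 2 && $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9.]+$/ {
            sum[$1] += $2
            n[$1]++
        }
        END {
            best = 0
            for (f in sum) {
                m = sum[f] / n[f]
                if (m > best) best = m
            }
            pick = ""
            for (f in sum) {
                m = sum[f] / n[f]
                if (m >= best * (1 - tol / 100) && (pick == "" || f + 0 < pick + 0)) pick = f
            }
            if (pick != "") print pick
        }
    '
}
```

Fed the twelve runs from the table, this prints 800 with the default tolerance. With a tolerance of 0 it prints 1725, since that row has the highest mean by a hair, which shows how much the answer depends on what you count as noise.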

As to the discussion around power consumption, I used a Kill-a-Watt P3 to measure it on my R7800 at various CPU speeds and with various governors. I found that the CPU frequency did not affect power consumption, at least not by an amount my Kill-a-Watt could detect. Note that it does not report mW, just whole W without a decimal (i.e. 6 or 7 or 25 etc.).

Idle power consumption was 6 W irrespective of frequency (I tried each one from 384 MHz up to 1725 MHz). When the CPU was loaded by the SQM script, with a wired client pulling all of my downstream bandwidth, it bounced between 6 and 7 W.

maybe try setting rate_limit_us to 0

yep. 800000

You can reset stats by writing to:
echo 1 > /sys/devices/system/cpu/cpufreq/policy0/stats/reset
echo 1 > /sys/devices/system/cpu/cpufreq/policy1/stats/reset

Let it run for a while then you can look at:
cat /sys/devices/system/cpu/cpufreq/policy*/stats/time_in_state

I'm more worried about temps and cooking the router if it runs very hot than what it costs to power it. Mine seems to stay under 60C in the summer so it's OK, even if it goes up another 10C. I have an Asus router that ran at >80C for a few years and it's still functional.

It looks like it does move between frequencies:

# cat /sys/devices/system/cpu/cpufreq/policy*/stats/time_in_state
384000 0
600000 0
800000 97069
1000000 100
1400000 61
1725000 82
384000 0
600000 0
800000 97064
1000000 90
1400000 54
1725000 66

I guess it doesn't stay in the higher frequencies long enough for collectd to catch it, I normally look at the graphs in the UI.
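
The raw counters are easier to read as percentages. A small sketch for a single policy's time_in_state file (time_in_state_pct is a made-up name; as far as I know the second column is in 10 ms units, but only the ratios matter here):

```shell
# time_in_state_pct FILE
# FILE is one policy's stats/time_in_state ("freq_khz time" per line).
# Prints the share of time spent at each frequency.
time_in_state_pct() {
    awk '
        { freq[NR] = $1; t[NR] = $2; total += $2 }
        END {
            if (total == 0) exit
            for (i = 1; i <= NR; i++)
                printf "%d MHz %.2f%%\n", freq[i] / 1000, 100 * t[i] / total
        }
    ' "$1"
}
```

Run it per policy, e.g. "time_in_state_pct /sys/devices/system/cpu/cpufreq/policy0/stats/time_in_state". For the first set of numbers above it works out to roughly 99.75% of the time at 800 MHz.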

Update: It still doesn't look right to me, after a few hours, there's very little time spent above 800MHz. Either this governor is very good vs ondemand and doesn't think it needs to scale up or it doesn't work well. Given how tricky it is to get these governors working, I suspect there may be work required to make schedutil functional.

@wired - I think you want a measure that is more functional in nature rather than time spent at a particular frequency. For example, what is the most CPU intensive task you do with your hardware that can also be directly quantitated?

For me it's just running a speed test with SQM enabled. My connection is 225 Mbps down and that is enough to nearly saturate the CPU.

As I showed above, running the speed test a few times gives a decent average and shows a saturable effect with respect to the measured value. Above, I compared different lower thresholds with schedutil, but I also compared schedutil to ondemand to performance and found them to be really close, perhaps within experimental error:

schedutil     197
schedutil     196
schedutil     200
ondemand      200
ondemand      200
ondemand      200
performance   195
performance   199
performance   198
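
To put a number on "within experimental error", here is a sketch that prints the mean and the min/max per governor from lines like the ones above. spread_by_key is a made-up name, and three runs per governor is too few for real statistics, so treat it as a sanity check only.

```shell
# spread_by_key
# Reads "label value" lines on stdin and prints, in first-seen order,
# the mean plus the lowest and highest value for each label.
spread_by_key() {
    awk '
        NF == 2 {
            k = $1
            v = $2 + 0
            if (!(k in n)) { order[++c] = k; min[k] = v; max[k] = v }
            if (v < min[k]) min[k] = v
            if (v > max[k]) max[k] = v
            sum[k] += v
            n[k]++
        }
        END {
            for (i = 1; i <= c; i++) {
                k = order[i]
                printf "%s mean=%.1f min=%d max=%d\n", k, sum[k] / n[k], min[k], max[k]
            }
        }
    '
}
```

With the nine runs above, the ranges of all three governors overlap (196-200, 200-200 and 195-199 Mbps), which fits the reading that the differences are noise.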

I am not a fan of synthetic benchmarks, because they tend not to represent any real-world workload, but perhaps you could select one and use it as a more sensitive measure of how effectively the different governors scale the CPU.

Ended up reverting back to ondemand as it's a bit more stable for me. schedutil is decent and has potential.