Netgear R7800 exploration (IPQ8065, QCA9984)

thanks.

I monitored the CPU temperature with the performance and ondemand governors (with optimized settings); without high load, the CPU temperature is practically the same. (Linksys EA8500, 1.4 GHz)

I ran a few tests with different CPU / L2 cache / memory frequencies.
For small amounts of data the difference is zero or small, but in other tests I saw a
dramatic difference.

(Linksys EA8500, IPQ8064 1.4 GHz)
For example, compression with XZ (LZMA2, 8 MB dictionary, compression level 6, 95 MB of memory used):

performance governor,
using the RAM disk in /tmp

tar -cf - /usr/ > /tmp/test.tar
ls -la /tmp/test.tar
-rw-r--r-- 1 root root 18684928 Jan 13 23:31 /tmp/test.tar

top
PID PPID USER STAT VSZ %VSZ %CPU COMMAND
12928 12927 root R 96220 20% 50% xz -6 -c test.tar
ps | grep xz
12927 root 1064 S time xz -6 -c test.tar
12928 root 96220 R xz -6 -c test.tar

Test:
time xz -6 -c test.tar > test.tar.xz

echo 384000 > /sys/devices/system/cpu/cpufreq/policy0/scaling_max_freq
echo 384000 > /sys/devices/system/cpu/cpufreq/policy1/scaling_max_freq
cat /sys/devices/system/cpu/cpu*/cpufreq/cpuinfo_cur_freq
384000
384000

real 2m 20.71s
real 2m 21.17s
This is the CPU at 384 MHz; the L2 cache / memory frequencies are always the same (low).

echo 600000 > /sys/devices/system/cpu/cpufreq/policy0/scaling_max_freq
echo 600000 > /sys/devices/system/cpu/cpufreq/policy1/scaling_max_freq
cat /sys/devices/system/cpu/cpu*/cpufreq/cpuinfo_cur_freq
600000
600000

real 1m 44.50s (user 1m 43.84s sys 0m 0.58s)
real 1m 44.47s
This is the CPU at 600 MHz; the L2 cache / memory frequencies are always the same.

echo 800000 > /sys/devices/system/cpu/cpufreq/policy0/scaling_max_freq
echo 800000 > /sys/devices/system/cpu/cpufreq/policy1/scaling_max_freq
cat /sys/devices/system/cpu/cpu*/cpufreq/cpuinfo_cur_freq
800000
800000

real 1m 27.01s
real 1m 26.96s
This is the CPU at 800 MHz; the L2 cache / memory frequencies are always the same.

echo 1000000 > /sys/devices/system/cpu/cpufreq/policy0/scaling_max_freq
echo 1000000 > /sys/devices/system/cpu/cpufreq/policy1/scaling_max_freq
cat /sys/devices/system/cpu/cpu*/cpufreq/cpuinfo_cur_freq
1000000
1000000

real 0m 59.55s
real 0m 59.53s
This is the CPU at 1000 MHz with high (for 1000 MHz) L2 cache / memory frequencies.

echo 1200000 > /sys/devices/system/cpu/cpufreq/policy0/scaling_max_freq
echo 1200000 > /sys/devices/system/cpu/cpufreq/policy1/scaling_max_freq
cat /sys/devices/system/cpu/cpu*/cpufreq/cpuinfo_cur_freq
1200000
1200000

real 1m 8.98s
real 1m 9.02s
This is the CPU at 1200 MHz with low (for 1200 MHz) L2 cache / memory frequencies.

echo 1400000 > /sys/devices/system/cpu/cpufreq/policy0/scaling_max_freq
echo 1400000 > /sys/devices/system/cpu/cpufreq/policy1/scaling_max_freq
cat /sys/devices/system/cpu/cpu*/cpufreq/cpuinfo_cur_freq
1400000
1400000

real 1m 3.71s
real 1m 3.59s
This is the CPU at 1400 MHz with low (for 1400 MHz) L2 cache / memory frequencies.

echo 800000 > /sys/devices/system/cpu/cpufreq/policy0/scaling_max_freq
echo 800000 > /sys/devices/system/cpu/cpufreq/policy1/scaling_max_freq
echo 1400000 > /sys/devices/system/cpu/cpufreq/policy0/scaling_max_freq
echo 1400000 > /sys/devices/system/cpu/cpufreq/policy1/scaling_max_freq
cat /sys/devices/system/cpu/cpu*/cpufreq/cpuinfo_cur_freq
1400000
1400000

real 0m 45.12s
real 0m 45.19s
This is the CPU at 1400 MHz with high (for 1400 MHz) L2 cache / memory frequencies.

echo 800000 > /sys/devices/system/cpu/cpufreq/policy0/scaling_max_freq
echo 800000 > /sys/devices/system/cpu/cpufreq/policy1/scaling_max_freq
echo 1000000 > /sys/devices/system/cpu/cpufreq/policy0/scaling_max_freq
echo 1000000 > /sys/devices/system/cpu/cpufreq/policy1/scaling_max_freq
cat /sys/devices/system/cpu/cpu*/cpufreq/cpuinfo_cur_freq
1000000
1000000

real 0m 59.56s
This is the CPU at 1000 MHz with high (for 1000 MHz) L2 cache / memory frequencies.

echo 1200000 > /sys/devices/system/cpu/cpufreq/policy0/scaling_max_freq
echo 1200000 > /sys/devices/system/cpu/cpufreq/policy1/scaling_max_freq
echo 1000000 > /sys/devices/system/cpu/cpufreq/policy0/scaling_max_freq
echo 1000000 > /sys/devices/system/cpu/cpufreq/policy1/scaling_max_freq
cat /sys/devices/system/cpu/cpu*/cpufreq/cpuinfo_cur_freq
1000000
1000000

real 1m 16.21s
This is the CPU at 1000 MHz with low L2 cache / memory frequencies.
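
The manual steps above could be wrapped in a small helper. This is only a sketch: `CPUFREQ_ROOT`, `set_max_freq` and `run_xz_series` are names made up for illustration, and it assumes /tmp/test.tar already exists. Since the timings above suggest the L2 cache / memory clocks depend on which frequency was set previously, the order of the steps is left to the caller.

```shell
#!/bin/sh
# Cap every cpufreq policy at the given frequency (in kHz).
# CPUFREQ_ROOT is a hypothetical override so the function can be tried
# against a scratch directory instead of the real sysfs tree.
set_max_freq() {
    freq_khz="$1"
    root="${CPUFREQ_ROOT:-/sys/devices/system/cpu/cpufreq}"
    for policy in "$root"/policy*; do
        [ -f "$policy/scaling_max_freq" ] || continue
        echo "$freq_khz" > "$policy/scaling_max_freq"
    done
}

# Apply each frequency in the order given and time one xz run after each step.
# Relies on the shell (or BusyBox) providing `time`.
run_xz_series() {
    for f in "$@"; do
        set_max_freq "$f"
        echo "== scaling_max_freq=$f kHz =="
        ( cd /tmp && time xz -6 -c test.tar > test.tar.xz )
    done
}
```

For example, `run_xz_series 800000 1400000` would repeat the "800 MHz first, then 1400 MHz" sequence, and `run_xz_series 1200000 1000000` the "1200 MHz first, then 1000 MHz" one.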

(I haven't had time to look at OpenWrt/R7800 stuff in almost a month.)

It moves and forces the network interfaces' receive and transmit queues to the 2nd CPU core in the default config (iirc). That is also the reason why the IRQ balancing attempts commonly used on the R7800 aren't as effective as they should be: those attempts do not move all the CPU work caused by a given IRQ.

I also took another look at the script, and it seems there is a config option to disable it. It might be better to use that than to remove or edit the script itself :slight_smile:

The interrupt is first handled by the 1st core, but part of the processing it triggers is forced onto the 2nd core.
Example: the interrupt first waits for the 1st core to wake up, and then fully processing the received network packet also requires waking up the 2nd core. Under continuous heavy traffic this doesn't matter because both cores are already active, but for e.g. web browsing it adds latency. (Well, there might also be some cache locality issues adding latency in continuous traffic too.) Disabling the script allows the kernel to use both cores for the queues (iirc), and processing seems to stay on the already active core. I did not do thorough testing or test heavy loads.

So, if we run irqbalance, should we disable the script or not?

So, this is the option (in /etc/config/network) you are talking about, and it does indeed disable the script. I will give it a shot.

config globals 'globals'
option default_ps '0'
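
One way to see what the script did to an interface is to read the per-queue CPU masks. The sketch below assumes the script works by writing RPS masks, which this thread doesn't spell out; `show_rps_masks` and `NET_ROOT` are hypothetical names used only for illustration.

```shell
#!/bin/sh
# Print the RPS CPU mask of each receive queue of an interface.
# NET_ROOT is a hypothetical override for trying this outside the router;
# the default is the real sysfs location.
show_rps_masks() {
    iface="$1"
    root="${NET_ROOT:-/sys/class/net}"
    for q in "$root/$iface"/queues/rx-*; do
        [ -f "$q/rps_cpus" ] || continue
        # Output format: "<queue name> <hex cpu mask>"
        echo "${q##*/} $(cat "$q/rps_cpus")"
    done
}
```

Running `show_rps_masks eth0` before and after changing default_ps (and restarting the network) should show whether the masks actually changed. The option itself can also be set with `uci set network.globals.default_ps='0'` followed by `uci commit network`.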

see nbg6817-openwrt-rebooting-constantly
and FS#2026

Thanks Paul. I disabled flow-offloading as suggested by @slh. Whether or not this has anything to do with it I can't say, but my uptime record is now 6 days. I'm running hnyman's master build in the non -ct variant.

If you can help me out with the most appropriate way to capture logs, I could append them to your bug-report.

Br,
Martin

FYI, new official fw again, 5 days old.

I apologise in advance for my question, which may have been answered before, but how exactly do you incorporate this new firmware into your custom build?

Just modify the Makefile... for the R7800 it's just 2 lines.

If you're unsure, you can always replace /lib/firmware/ath10k/QCA9984/hw1.0/firmware-5.bin at runtime (e.g. via scp - and reboot afterwards).

Thank you, but wifi stopped working after update. Together with firmware-5.bin I also updated https://github.com/kvalo/ath10k-firmware/blob/master/QCA9984/hw1.0/board-2.bin – perhaps, that is why?

You usually don't need to change board-2.bin (although doing so shouldn't break anything), but you must make sure to download the files in their binary representation (raw), which might not be very obvious in most git web interfaces (alternatively, you can clone the whole git repo and copy the files out).

Thank you very much for your help, slh. I used:

wget -q https://github.com/kvalo/ath10k-firmware/blob/master/QCA9984/hw1.0/3.9.0.2/firmware-5.bin_10.4-3.9.0.2-00018 -O /lib/firmware/ath10k/QCA9984/hw1.0/firmware-5.bin

Maybe that's not the right way to do it?

Indeed, if you look into your downloaded file, you'll notice that you actually downloaded an HTML file rather than the raw firmware.

https://github.com/kvalo/ath10k-firmware/raw/master/QCA9984/hw1.0/3.9.0.2/firmware-5.bin_10.4-3.9.0.2-00018
(also make sure to restore your board-2.bin)
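
A quick sanity check before copying a download into /lib/firmware could look like the sketch below. `looks_like_html` is a made-up helper name, and the check is only a heuristic: it catches the "saved a web page" case and says nothing about whether the firmware itself is valid.

```shell
#!/bin/sh
# Return 0 if the start of the file looks like an HTML page
# rather than a binary blob, non-zero otherwise.
looks_like_html() {
    head -c 512 "$1" | tr 'A-Z' 'a-z' | grep -q -e '<!doctype html' -e '<html'
}

# Possible usage, downloading to /tmp first:
#   wget -q https://github.com/kvalo/ath10k-firmware/raw/master/QCA9984/hw1.0/3.9.0.2/firmware-5.bin_10.4-3.9.0.2-00018 -O /tmp/firmware-5.bin
#   if looks_like_html /tmp/firmware-5.bin; then
#       echo "got an HTML page, not the raw firmware" >&2
#   else
#       cp /tmp/firmware-5.bin /lib/firmware/ath10k/QCA9984/hw1.0/firmware-5.bin
#   fi
```

Downloading to /tmp first keeps the working firmware in place until the new file has been checked.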

slh: thanks a lot again! I am feeling dumb. :blush: May you have a great day!

Regarding the firmware, don't they provide a changelog somewhere?

Yes. Look here.

stock

root@OpenWrt:/_hostsidescripts# /usr/bin/openssl speed md5 sha1 sha256 sha512 des des-ede3 aes-128-cbc aes-192-cbc aes-256-cbc rsa2048 dsa2048 | tee /tmp/sslspeed-host2
Doing md5 for 3s on 16 size blocks: 1812280 md5's in 3.00s
Doing md5 for 3s on 8192 size blocks: 67460 md5's in 3.00s
Doing sha1 for 3s on 16 size blocks: 1907420 sha1's in 3.00s
Doing sha1 for 3s on 8192 size blocks: 95161 sha1's in 3.00s
Doing sha256 for 3s on 16 size blocks: 3877410 sha256's in 3.00s
Doing sha256 for 3s on 8192 size blocks: 52507 sha256's in 3.00s
Doing sha512 for 3s on 16 size blocks: 1488761 sha512's in 3.00s
Doing sha512 for 3s on 8192 size blocks: 27853 sha512's in 3.00s
Doing des cbc for 3s on 16 size blocks: 4007564 des cbc's in 3.00s
Doing des cbc for 3s on 8192 size blocks: 8249 des cbc's in 3.00s
Doing des ede3 for 3s on 16 size blocks: 1634475 des ede3's in 3.00s

deb-ch'd

root@OpenWrt:/_hostsidescripts# /armhf-debchrootA20/usr/bin/openssl speed md5 sha1 sha256 sha512 des des-ede3 aes-128-cbc aes-192-cbc aes-256-cbc rsa2048 dsa2048 | tee /tmp/sslspeed
Doing md5 for 3s on 16 size blocks: 5873353 md5's in 3.00s
Doing md5 for 3s on 8192 size blocks: 77649 md5's in 3.00s
Doing sha1 for 3s on 16 size blocks: 5928682 sha1's in 3.00s
Doing sha1 for 3s on 8192 size blocks: 94366 sha1's in 3.00s
Doing sha256 for 3s on 16 size blocks: 4090449 sha256's in 3.00s
Doing sha256 for 3s on 8192 size blocks: 51030 sha256's in 3.00s
Doing sha512 for 3s on 16 size blocks: 1504988 sha512's in 3.00s
Doing sha512 for 3s on 8192 size blocks: 27866 sha512's in 3.00s
Doing des cbc for 3s on 16 size blocks: 5571570 des cbc's in 3.00s
Doing des cbc for 3s on 8192 size blocks: 11881 des cbc's in 3.00s
Doing des ede3 for 3s on 16 size blocks: 2136144 des ede3's in 

debian of my existence :wink:

Nice, these are the speeds from the latest master:

Doing md5 for 3s on 16 size blocks: 1771947 md5's in 2.97s
Doing md5 for 3s on 64 size blocks: 1586763 md5's in 3.00s
Doing sha1 for 3s on 16 size blocks: 1900937 sha1's in 2.97s
Doing sha1 for 3s on 64 size blocks: 1859566 sha1's in 2.94s
Doing sha1 for 3s on 256 size blocks: 1295828 sha1's in 2.98s
Doing sha256 for 3s on 16 size blocks: 3906005 sha256's in 3.00s
Doing sha256 for 3s on 64 size blocks: 2501577 sha256's in 2.99s
Doing sha512 for 3s on 16 size blocks: 1459206 sha512's in 2.97s

etc... so slightly faster but could be my cpufreq settings as well.
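
To compare runs like these more easily, the "Doing ..." progress lines can be turned into throughput figures (count × block size ÷ elapsed time). A minimal sketch, with `speed_to_kbps` as a made-up helper name; it reports thousands of bytes per second and skips lines where the elapsed time is missing.

```shell
#!/bin/sh
# Read "Doing <alg> for 3s on <n> size blocks: <count> <alg>'s in <t>s" lines
# on stdin and print "<alg> <n> bytes: <throughput> kB/s" for each of them.
speed_to_kbps() {
    awk '/^Doing .* size blocks:/ {
        # Algorithm name is everything between "Doing " and " for "
        alg = $0; sub(/^Doing /, "", alg); sub(/ for .*/, "", alg)
        # Block size is the word before "size", count the word after "blocks:"
        for (i = 1; i <= NF; i++) {
            if ($i == "size") size = $(i - 1)
            if ($i == "blocks:") count = $(i + 1)
        }
        # Elapsed time is the last word with its trailing "s" removed
        secs = $NF; sub(/s+$/, "", secs); secs = secs + 0
        if (secs > 0)
            printf "%s %s bytes: %.2f kB/s\n", alg, size, count * size / secs / 1000
    }'
}
```

With the files saved by `tee` above, `speed_to_kbps < /tmp/sslspeed-host2` and `speed_to_kbps < /tmp/sslspeed` give two lists that can be compared line by line.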

cannesahs' findings are interesting, but before anyone starts using his settings, please note that they can have negative effects.

The ondemand governor (the default on OpenWrt) does appear to be too conservative for the R7800. However, if you use 'performance', the CPU is statically set to the maximum frequency. While power usage at maximum frequency may not be significant, I am not sure any of us can guarantee that the unit's thermal dissipation can keep the CPU cool enough at max speed. Instead, I would recommend lowering the 'ondemand' governor's 'up_threshold' from the default of 95 to 40. I've load tested at 40 with good results.
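
Applying the lower up_threshold could be sketched as below. Where the tunable lives depends on the kernel (a single global ondemand directory, or one per policy), so the helper tries both; `set_up_threshold` and `CPUFREQ_ROOT` are hypothetical names, and the paths are an assumption rather than something confirmed for the R7800 in this thread.

```shell
#!/bin/sh
# Write the given up_threshold to every ondemand tunable found.
# Returns non-zero if no tunable exists (e.g. ondemand is not the active governor).
# CPUFREQ_ROOT is a hypothetical override for trying this on a scratch directory.
set_up_threshold() {
    value="$1"
    root="${CPUFREQ_ROOT:-/sys/devices/system/cpu/cpufreq}"
    status=1
    for f in "$root"/ondemand/up_threshold "$root"/policy*/ondemand/up_threshold; do
        [ -f "$f" ] || continue
        echo "$value" > "$f"
        status=0
    done
    return "$status"
}
```

Calling `set_up_threshold 40` from a startup script such as /etc/rc.local would reapply the setting after each reboot.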

802.11ac - 80 MHz benchmark:
ondemand default at 95% - 400 Mbit up/down
ondemand at 40% - 500 Mbit up/down
performance - 500 Mbit up/down

Regarding his changing of interrupt coalesce settings (tx-usecs/rx-usecs) we need to be careful and benchmark these changes not only for speed but for CPU usage. Changing them to '0' may lead to high CPU usage. I haven't tested these new settings yet and I plan to, soon.
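
For the CPU usage side of such a benchmark, one rough option is to sample the aggregate "cpu" line of /proc/stat before and after a test run and compute the busy share in between. A sketch, with `cpu_busy_pct` as a made-up helper name; it counts idle and iowait time as "not busy".

```shell
#!/bin/sh
# Print the busy CPU percentage between two aggregate "cpu" lines
# taken from /proc/stat (first argument: earlier sample, second: later sample).
cpu_busy_pct() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            # Sum user..steal (fields 2-9); guest time is already part of user
            total = 0
            for (i = 2; i <= NF && i <= 9; i++) total += $i
            idle = $5 + $6   # idle + iowait
        }
        NR == 1 { t0 = total; i0 = idle }
        NR == 2 {
            dt = total - t0; di = idle - i0
            if (dt > 0) printf "%.1f\n", 100 * (dt - di) / dt
        }'
}

# Possible usage around a throughput test:
#   before=$(grep '^cpu ' /proc/stat)
#   ... run the benchmark ...
#   after=$(grep '^cpu ' /proc/stat)
#   cpu_busy_pct "$before" "$after"
```

Comparing this figure at the same throughput with the default and the changed tx-usecs/rx-usecs values would show whether the extra speed is paid for with noticeably more CPU time.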
