Netgear R7800 exploration (IPQ8065, QCA9984)

I apologise in advance if this question has been answered before, but how exactly do you incorporate this new firmware in your custom build?

Just modify the Makefile; for the R7800 it's only 2 lines.

If you're unsure, you can always replace /lib/firmware/ath10k/QCA9984/hw1.0/firmware-5.bin at runtime (e.g. via scp - and reboot afterwards).

Thank you, but wifi stopped working after the update. Together with firmware-5.bin I also updated https://github.com/kvalo/ath10k-firmware/blob/master/QCA9984/hw1.0/board-2.bin; perhaps that is why?

You usually don't need to change board-2.bin (although doing so shouldn't break anything), but you must make sure to download the files in their binary (raw) representation, which is not very obvious in most git web interfaces. Alternatively, you can clone the whole git repo and copy the files out.

Thank you very much for your help, slh. I used:

wget -q https://github.com/kvalo/ath10k-firmware/blob/master/QCA9984/hw1.0/3.9.0.2/firmware-5.bin_10.4-3.9.0.2-00018 -O /lib/firmware/ath10k/QCA9984/hw1.0/firmware-5.bin

Maybe that's not the right way to do it?

Indeed, if you look into your downloaded file, you'll notice that you actually downloaded an HTML page rather than the raw firmware. Use the raw URL instead:

https://github.com/kvalo/ath10k-firmware/raw/master/QCA9984/hw1.0/3.9.0.2/firmware-5.bin_10.4-3.9.0.2-00018
(also make sure to restore your board-2.bin)
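
A quick sanity check along these lines can catch the HTML-instead-of-binary mistake before rebooting. This is only a minimal sketch: `looks_like_html` is a hypothetical helper name, /tmp/firmware-5.bin is an assumed scratch location, and the check merely looks for an HTML marker in the first 512 bytes of the file, so it does not prove the firmware itself is valid.

```shell
# Hypothetical helper: returns 0 if the file starts like an HTML page.
looks_like_html() {
    # Read the first 512 bytes, drop NUL bytes, and look for an HTML marker.
    head -c 512 "$1" | tr -d '\000' | LC_ALL=C grep -qiE '<!doctype html|<html'
}

# Example use on the router (URL and target path as given in the posts above,
# /tmp/firmware-5.bin is an assumed scratch location):
#   wget -q -O /tmp/firmware-5.bin "https://github.com/kvalo/ath10k-firmware/raw/master/QCA9984/hw1.0/3.9.0.2/firmware-5.bin_10.4-3.9.0.2-00018"
#   if looks_like_html /tmp/firmware-5.bin; then
#       echo "got an HTML page, not the firmware; not installing" >&2
#   else
#       cp /tmp/firmware-5.bin /lib/firmware/ath10k/QCA9984/hw1.0/firmware-5.bin
#   fi
```

Downloading to a scratch file first means a bad download never overwrites the working firmware.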

slh: thanks a lot again! I am feeling dumb. :blush: May you have a great day!

Regarding the firmware, they don't provide a changelog somewhere?

Yes. Look here.

stock

root@OpenWrt:/_hostsidescripts# /usr/bin/openssl speed md5 sha1 sha256 sha512 des des-ede3 aes-128-cbc aes-192-cbc aes-256-cbc rsa2048 dsa2048 | tee /tmp/sslspeed-host2
Doing md5 for 3s on 16 size blocks: 1812280 md5's in 3.00s
Doing md5 for 3s on 8192 size blocks: 67460 md5's in 3.00s
Doing sha1 for 3s on 16 size blocks: 1907420 sha1's in 3.00s
Doing sha1 for 3s on 8192 size blocks: 95161 sha1's in 3.00s
Doing sha256 for 3s on 16 size blocks: 3877410 sha256's in 3.00s
Doing sha256 for 3s on 8192 size blocks: 52507 sha256's in 3.00s
Doing sha512 for 3s on 16 size blocks: 1488761 sha512's in 3.00s
Doing sha512 for 3s on 8192 size blocks: 27853 sha512's in 3.00s
Doing des cbc for 3s on 16 size blocks: 4007564 des cbc's in 3.00s
Doing des cbc for 3s on 8192 size blocks: 8249 des cbc's in 3.00s
Doing des ede3 for 3s on 16 size blocks: 1634475 des ede3's in 3.00s

deb-ch'd

root@OpenWrt:/_hostsidescripts# /armhf-debchrootA20/usr/bin/openssl speed md5 sha1 sha256 sha512 des des-ede3 aes-128-cbc aes-192-cbc aes-256-cbc rsa2048 dsa2048 | tee /tmp/sslspeed
Doing md5 for 3s on 16 size blocks: 5873353 md5's in 3.00s
Doing md5 for 3s on 8192 size blocks: 77649 md5's in 3.00s
Doing sha1 for 3s on 16 size blocks: 5928682 sha1's in 3.00s
Doing sha1 for 3s on 8192 size blocks: 94366 sha1's in 3.00s
Doing sha256 for 3s on 16 size blocks: 4090449 sha256's in 3.00s
Doing sha256 for 3s on 8192 size blocks: 51030 sha256's in 3.00s
Doing sha512 for 3s on 16 size blocks: 1504988 sha512's in 3.00s
Doing sha512 for 3s on 8192 size blocks: 27866 sha512's in 3.00s
Doing des cbc for 3s on 16 size blocks: 5571570 des cbc's in 3.00s
Doing des cbc for 3s on 8192 size blocks: 11881 des cbc's in 3.00s
Doing des ede3 for 3s on 16 size blocks: 2136144 des ede3's in 

debian of my existence :wink:

Nice, these are the speeds from latest master:

Doing md5 for 3s on 16 size blocks: 1771947 md5's in 2.97s
Doing md5 for 3s on 64 size blocks: 1586763 md5's in 3.00s
Doing sha1 for 3s on 16 size blocks: 1900937 sha1's in 2.97s
Doing sha1 for 3s on 64 size blocks: 1859566 sha1's in 2.94s
Doing sha1 for 3s on 256 size blocks: 1295828 sha1's in 2.98s
Doing sha256 for 3s on 16 size blocks: 3906005 sha256's in 3.00s
Doing sha256 for 3s on 64 size blocks: 2501577 sha256's in 2.99s
Doing sha512 for 3s on 16 size blocks: 1459206 sha512's in 2.97s

etc... so slightly faster but could be my cpufreq settings as well.
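
To compare runs like the ones above on equal footing, each "Doing ..." line can be converted to bytes per second (count × block size ÷ elapsed seconds). The following is a small awk sketch, assuming the line format shown in the pasted output; `speed_to_bps` is a hypothetical name, and truncated lines without a time are skipped.

```shell
# Hypothetical helper: read `openssl speed` progress lines on stdin and print
# "<algorithm> <block size> <bytes per second>" for each complete line.
speed_to_bps() {
    awk '
    /^Doing / && /size blocks:/ {
        # The algorithm name is everything between "Doing" and "for"
        # (it can be two words, e.g. "des cbc").
        name = ""
        for (i = 2; i <= NF; i++) {
            if ($i == "for") break
            name = (name == "" ? $i : name " " $i)
        }
        # Locate "on <size> size blocks: <count>".
        size = 0; count = 0
        for (i = 1; i <= NF - 4; i++) {
            if ($i == "on" && $(i + 2) == "size") {
                size = $(i + 1); count = $(i + 4)
            }
        }
        # The last field is the elapsed time, e.g. "3.00s".
        secs = $NF; sub(/s+$/, "", secs)
        if (secs + 0 > 0)
            printf "%s %d %.0f\n", name, size, count * size / secs
    }'
}

# Example: speed_to_bps < /tmp/sslspeed
```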

cannesahs' findings are interesting, but before anyone starts using his settings, please note that they can have negative effects.

The ondemand governor (the default on OpenWrt) does appear to be too conservative for the R7800. However, if you use 'performance', the CPU is statically set to the maximum frequency. While power usage at maximum frequency may not be significant, I am not sure any of us can guarantee that the unit's thermal dissipation can keep the CPU cool enough at max speed. Instead, I would recommend lowering the ondemand governor's 'up_threshold' from the default of 95 to 40. I've load tested at 40 with good results.

802.11ac - 80 MHz benchmark:
ondemand default at 95% - 400 Mbit up/down
ondemand at 40% - 500 Mbit up/down
performance - 500 Mbit up/down
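
For anyone who wants to try the lower threshold, a minimal sketch follows. The sysfs directory used as the default is an assumption (the location of the ondemand tunables can differ between kernels and targets), `set_up_threshold` is a hypothetical helper name, and the 1..100 range check is a guess at what the kernel accepts. A value written this way does not survive a reboot, so it would have to be reapplied at startup, for example from /etc/rc.local.

```shell
# Hypothetical helper: set the ondemand governor's up_threshold.
#   $1 = threshold in percent
#   $2 = directory holding the ondemand tunables (assumed default below)
set_up_threshold() {
    threshold=$1
    dir=${2:-/sys/devices/system/cpu/cpufreq/ondemand}

    # Accept only plain integers in the range 1..100.
    case $threshold in
        ''|*[!0-9]*) echo "threshold must be an integer" >&2; return 1 ;;
    esac
    if [ "$threshold" -lt 1 ] || [ "$threshold" -gt 100 ]; then
        echo "threshold must be between 1 and 100" >&2
        return 1
    fi

    # Refuse to create a new file: the tunable must already exist.
    if [ ! -w "$dir/up_threshold" ]; then
        echo "no writable $dir/up_threshold (is ondemand active?)" >&2
        return 1
    fi
    echo "$threshold" > "$dir/up_threshold"
}

# Example: set_up_threshold 40
```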

Regarding his changes to the interrupt coalesce settings (tx-usecs/rx-usecs), we need to be careful and benchmark these changes not only for speed but also for CPU usage. Changing them to '0' may lead to high CPU usage. I haven't tested these new settings yet, but I plan to soon.

Temperature in "performance" mode without load is the same as in "ondemand", perhaps about 1 deg. C higher.

All IPQ806x routers on OpenWrt 18.06.x have a problem with "ondemand", because the kernel incorrectly manages the frequencies of the IPQ806x CPU/cache/memory; see the earlier messages about this (including mine).
In "performance" mode we can work around this problem with hard-coded startup settings for the CPU; in "ondemand" we can't (it must be fixed in the kernel).
The performance penalty (same CPU frequency, but different cache/memory frequencies) ranges from 0-1% up to 40% on memory-intensive tasks (see my tests).

Hmm.
Why would the CPU load be high?
I'll wait for your tests.

Why can't someone provide a patch for it if the root cause is known?

I performed benchmarks using Wi-Fi only, and only through the switch ports (eth1), from an Ethernet host to a Wi-Fi client.

Default coalesce: ~32% CPU load
tx-usecs "0": ~35% CPU load
plus rx-usecs "31": ~35% CPU load

There seems to be an off-by-one bug in either ethtool or the kernel regarding rx-usecs: you set "32", but in reality it sets 31 (as read back with ethtool -c).

I also checked bandwidth and latency on wifi, and neither improved nor worsened.

My caution on changing these settings is that they change the interrupt polling of the ports. More polling = higher CPU usage. In any case there doesn't seem to be any problem changing these settings, but they don't provide any benefit so I don't see a need to change the default.
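
For anyone repeating this test, reading a value back after setting it makes the discrepancy described above easy to spot. A minimal sketch, assuming `ethtool -c` prints one "name: value" pair per line; `coalesce_value` is a hypothetical helper name and eth1 is the interface from the benchmark above.

```shell
# Hypothetical helper: print the value of one coalesce parameter, given the
# output of `ethtool -c <iface>` on stdin.
#   $1 = parameter name, e.g. rx-usecs
coalesce_value() {
    # Match the whole first field so rx-usecs does not also hit rx-usecs-irq.
    awk -v key="$1:" '$1 == key { print $2; exit }'
}

# Example on the router:
#   ethtool -C eth1 rx-usecs 32
#   ethtool -c eth1 | coalesce_value rx-usecs   # the post above saw 31 here
```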

ath10k-firmware: update Candela Tech firmware images
there is a new ath10k-ct firmware

Double or nothing :wink:
( just a tad of overhead )

# time openssl speed -multi 2 md5 sha1 sha256 sha512 des des-ede3 aes-128-cbc aes-192-cbc aes-256-cbc rsa2048 dsa2048

Forked child 0
Forked child 1

+DT:md5:3:64
+R:4718262:md5:3.000000
+DT:md5:3:64
+R:3371684:md5:3.000000
+DT:md5:3:256
+R:3357017:md5:3.000000
+DT:md5:3:256
[.....]
Got: +H:16:64:256:1024:8192:16384 from 0
Got: +F:3:md5:25164064.00:71616362.67:124668160.00:166843050.67:172927658.67:170775893.33 from 0
Got: +F2:2:2048:123.676324:5755.844156 from 0
Got: +F3:2:2048:441.000000:556.400000 from 0
+R4:5792:2048:10.00
Got: +H:16:64:256:1024:8192:16384 from 1
Got: +F:3:md5:26089482.67:71929258.67:124210005.33:168057514.67:179314688.00:174549674.67 from 1
Got: +F2:2:2048:122.700000:5782.500000 from 1
Got: +F3:2:2048:472.500000:579.200000 from 1

options:bn(64,32) rc4(char) des(long) aes(partial) blowfish(ptr) 
compiler: arm-openwrt-linux-muslgnueabi-gcc -fPIC -pthread -Wa,--noexecstack -Wall -O3 -Os -pipe -mcpu=cortex-a15 -mfpu=neon-vfpv4 -fno-caller-saves -fno-plt -fhonour-copts -Wno-error=unused-but-set-variable -Wno-error=unused-result -mfloat-abi=hard -Wformat -Werror=format-security -fstack-protector -D_FORTIFY_SOURCE=1 -Wl,-z,now -Wl,-z,relro -fpic -ffunction-sections -fdata-sections -marm -znow -zrelro -DOPENSSL_USE_NODELETE -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DAES_ASM -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPOLY1305_ASM -DNDEBUG -DOPENSSL_PREFER_CHACHA_OVER_GCM -DOPENSSL_SMALL_FOOTPRINT
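
To turn the per-child "+F" lines above into a combined figure, the values from both children can be added up per block size. This is a sketch under the assumption that each "+F" line carries one throughput value (bytes per second) for each block size listed in the "+H" line, in the same order; `sum_multi` is a hypothetical helper name, and the "+F2"/"+F3" (RSA/DSA) lines are ignored.

```shell
# Hypothetical helper: read `openssl speed -multi N` output on stdin and print
# "<algorithm> <block size> <summed throughput>" across all children.
sum_multi() {
    awk -F: '
    {
        # Strip the "Got: " prefix and " from <child>" suffix if present;
        # changing $0 makes awk split the fields again.
        sub(/^Got: /, "")
        sub(/ from [0-9]+$/, "")
    }
    $1 == "+H" {
        # Header: block sizes, in the same order as the +F values.
        n = NF - 1
        for (i = 2; i <= NF; i++) size[i - 1] = $i
    }
    $1 == "+F" {
        # +F:<index>:<algorithm>:<value per block size>...
        alg = $3
        seen[alg] = 1
        for (i = 4; i <= NF; i++) total[alg, i - 3] += $i
    }
    END {
        for (alg in seen)
            for (k = 1; k <= n; k++)
                printf "%s %s %.2f\n", alg, size[k], total[alg, k]
    }'
}

# Example: openssl speed -multi 2 md5 2>&1 | sum_multi
```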

Hey folks! New to the crew, but not to this chipset. github.com/robcore/machinex has my (currently broken and long abandoned) kernel project for the Galaxy S4's ipq8064 variant. Same Krait CPU with 4 cores and way more scaling frequencies than this model. That said, I know this sucker inside and out and am pretty decent in C.

What I'm curious about is potentially reducing the netgear partition in a stock build and giving that space to rootfs. So far... No luck. The changes I've attempted in stock's nand_partitions.c don't seem to be accepted by u-boot. Does anyone have an idea as to what I can do to trick it? I have a serial connection setup thanks to hnyman's perfect guide, so I can see what's happening in the logs. I don't seem to be making any headway though. Any ideas?

Oh also why do stock images load at address 40908000 and 41508000 while openwrt loads at 42008000?

That has already been done for 18.06.0 and later, via DTS changes (hardcoded partition locations/sizes, not using qcom-smem for the r7800; see the nbg6817 for an example that uses qcom-smem instead). See Netgear R7800 exploration (IPQ8065, QCA9984) for details.

There is still plenty of margin for further optimization (in particular around the cpufreq code), but it's a solid arch and device.

Edit: OpenWrt is based on the upstream kernel (currently 4.14-lts (4.14.104), but 4.19-lts (4.19.26) is under development), diverging as little from mainline as reasonably possible (and ideally with an expectation to get the necessary pieces mainline as well), so there are probably quite some differences relative to the 3.4.113 based android kernel used for Qualcomm's APQ8064AB SOC.

Thanks mate!
I should have mentioned that I'm more familiar with OpenWrt than I let on. The 8064ab kernel is surprisingly similar to the 8065, but phones have different design needs and, unfortunately, bootloaders.

I actually answered my own question over the weekend by browsing the OEM uboot source. In the case of stock firmware, the loading addresses I cited tell uboot that the image is stock Netgear and to use the legacy/Netgear-specific setup. The dt addresses don't have to account for weird old offsets, and uboot is designed to recognize the address, header, and dt instructions and to skip the legacy functions if an upstream image is detected.

Thank you, though! Eventually I'd love to trick this system to give a little more space to rootfs, but for now I suppose I'm stuck with alternative measures like extroot.