[How-to guide] Installing OpenWrt on Qnap QHora-322

The board has three (!) Inside Secure crypto engines.
(I think one in the main processor and two more in the CPs; in any case, it showed several.)
But there's no reason to use them: I've never seen them be faster than the ARMv8 Crypto Extensions.

Quite the contrary, all the tests I've done and seen so far have been significantly slower...

Edit: I posted my results with ARMv8 CE there, not fast enough?

Device “PUZZLE-M902”


It’s more about offloading the CPUs so they can do something else instead.

Most crypto offloads I have seen are significantly slower than CPUs with crypto extensions (x86 AES-NI or ARM Crypto Extensions).

I got the device today. Upgraded to 25.12 following the great guide by @JuliusZet.

Managed to reach 9.4Gbps with mtu=1500 talking to another aqr113 (or is it aqc113).

My other rtl8127 nic did not perform as well for some reason.

Increasing to mtu=9000 did not improve the above results (not that I mind), but it reduced the CPU load slightly.

I will now investigate the PPPoE performance and crypto offload.

The question is how you tested it.

iperf3 between the router and PC? Well, then the router only has to send or receive (if you're not testing with --bidir).

If you want to route 10Gbit via the router, then two ports are used on the router, one receives and one sends, which is a bit more computationally intensive.
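For reference, a minimal sketch of the iperf3 client invocations being discussed here. The helper name iperf3_client_cmd and the address 192.0.2.10 are placeholders of mine, not something from this thread, and the function only prints the command so it can be checked before running it.

```shell
# Print (do not run) the iperf3 client command for a given test mode.
# "server" is the address of the machine running "iperf3 -s".
iperf3_client_cmd() {
    server="$1"
    mode="${2:-forward}"
    case "$mode" in
        bidir)   echo "iperf3 -c $server --bidir" ;;  # both directions at once
        reverse) echo "iperf3 -c $server -R" ;;       # server sends, client receives
        *)       echo "iperf3 -c $server" ;;          # client sends, server receives
    esac
}
```

For example, iperf3_client_cmd 192.0.2.10 bidir prints the bidirectional command. For a routed test, the server would sit on a host behind one router port and the client on a host behind another, so the router has to receive and send at the same time.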


Thank you @loaNga0m, I only measured router <-> client iperf3; I should have measured some forwarding/switching traffic.

Does anybody know why we are booting the kernel with CPU idling disabled (cpuidle.off=1)? I think this disables the CPU C-states. Is it because of a hardware bug in this board, or is it just for lowering (network) latencies?
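A quick way to confirm what the running kernel was actually booted with, as a minimal sketch. The function name cpuidle_cmdline_state is made up here, and the optional path argument exists only so the function can be tried on a sample file instead of /proc/cmdline.

```shell
# Report the cpuidle.off setting found on the kernel command line.
cpuidle_cmdline_state() {
    cmdline_file="${1:-/proc/cmdline}"
    # Split the command line into one token per line and keep the last cpuidle.off=N token.
    state="$(tr ' ' '\n' < "$cmdline_file" | grep '^cpuidle\.off=' | tail -n 1)"
    case "$state" in
        cpuidle.off=1) echo "cpuidle disabled (cpuidle.off=1)" ;;
        cpuidle.off=0) echo "cpuidle enabled (cpuidle.off=0)" ;;
        *)             echo "cpuidle.off not set" ;;
    esac
}
```

Calling cpuidle_cmdline_state with no argument reads the live command line.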

No idea.

But experience has shown that this does not offer any major advantages for ARM CPUs.
The energy savings are only a few mW, and there are latency issues and packet loss when the CPU is clocked down significantly.
Some CPUs have bugs and cannot switch from the lowest to the highest frequency.
Sometimes cores freeze and no longer switch frequencies.
Some routers no longer boot reliably, etc.

In short, I don't use it on routers that support it because it only causes problems anyway.
And the power savings can hardly be measured with normal multimeters.

I think you are referring to cpufreq aka dvfs; I was referring to C-states aka cpuidle. But I think you are right about the power consumption (in these a72 arm cores at least).

I was looking at the System Block Diagram https://openwrt.org/_media/media/iei/puzzle-m902_block_diagram.png?cache= and it seems that one 10gig port is connected directly to cpu whereas the other two 10gig go through the “chipsets”.

Question: Would it make sense to have my WAN 10gig port be the cpu-direct port? If yes, how would I even find out which of the 3 is the cpu-direct one?

The main processor consists of a multi-chip module.

It comprises an AP807 application processor and a CP115 southbridge.
Each of the co-processors also contains a CP115 southbridge.

So, 1x AP807 + 3 CP115

The PHYs are connected to the southbridges.
1x AQR113 + 2x AQR112R per southbridge.

I don't think it makes any difference which ports you use.
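If you still want to see where each port hangs, here is a minimal sketch that lists every network interface together with the resolved path of its parent device in sysfs. The function name list_netdev_parents is hypothetical, and it is an assumption on my part that the resolved path on this board names the CP (for example a cp0 or cp1 component); I have not verified that on the device.

```shell
# Print "<interface> <resolved parent device path>" for every interface that has one.
# net_root is a parameter only so the function can be tried on a scratch directory.
list_netdev_parents() {
    net_root="${1:-/sys/class/net}"
    for dev in "$net_root"/*; do
        # Virtual interfaces (lo, bridges) have no "device" link; skip them.
        [ -e "$dev/device" ] || continue
        echo "${dev##*/} $(readlink -f "$dev/device")"
    done
}
```

Run without arguments on the router, it prints one line per physical interface.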


@loaNga0m Actually, I checked the device and it turns out that cpufreq is enabled by default, with the ondemand governor. At idle, the CPUs sit at 550 MHz. If somebody wants them to run at the maximum frequency, do:

echo performance > /sys/devices/system/cpu/cpufreq/policy0/scaling_governor 
echo performance > /sys/devices/system/cpu/cpufreq/policy2/scaling_governor 

or maybe disable CONFIG_ARM_ARMADA_8K_CPUFREQ at kernel compile time https://cateee.net/lkddb/web-lkddb/ARM_ARMADA_8K_CPUFREQ.html
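The two echo commands can also be written as a loop over all policy directories, as a minimal sketch. The function name set_cpufreq_governor is made up for illustration, and the optional second argument exists only so the function can be tried on a scratch directory rather than on real sysfs.

```shell
# Write one governor name into every cpufreq policy and print what each policy reports back.
set_cpufreq_governor() {
    governor="$1"
    cpufreq_root="${2:-/sys/devices/system/cpu/cpufreq}"
    for policy in "$cpufreq_root"/policy*; do
        # Skip the unexpanded glob when no policy directories exist.
        [ -d "$policy" ] || continue
        echo "$governor" > "$policy/scaling_governor"
        echo "${policy##*/}: $(cat "$policy/scaling_governor")"
    done
}
```

Usage would be set_cpufreq_governor performance. The setting does not survive a reboot; I assume a call from /etc/rc.local would be one way to reapply it at boot, but I have not tested that on this device.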


Yes, thank you, I am already aware of that.

I am building my own OpenWrt kernel, and under the default menuconfig section ‘Accelerated Cryptographic Algorithms for CPU (arm64)’, I see that nothing from the ARMv8 Crypto Extensions is selected; for example, there is no https://cateee.net/lkddb/web-lkddb/CRYPTO_AES_ARM64_CE_BLK.html

Do these have to be enabled? Looking at /proc/crypto I only see generic implementations and nothing specific to NEON or armv8 ce.
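A minimal sketch for checking /proc/crypto in one go. The function name list_accelerated_crypto_drivers is hypothetical, and matching on "-ce" and "-neon" in the driver name is an assumption based on how the mainline arm64 drivers are usually named (cbc-aes-ce, sha256-ce, ghash-neon and so on), so treat an empty result as a hint rather than proof.

```shell
# Print driver names from /proc/crypto that look like ARMv8 CE or NEON implementations.
list_accelerated_crypto_drivers() {
    crypto_file="${1:-/proc/crypto}"
    # Each entry has a "driver : <name>" line; keep names containing "-ce" or "-neon".
    awk -F': *' '/^driver/ { print $2 }' "$crypto_file" | grep -E -- '-(ce|neon)' | sort -u
}
```

If it prints nothing, only generic (or other non-CE) implementations are registered in the kernel.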

I'm no expert, but ARMv8 CE are CPU instructions that can also be used in user space.

You can check whether ARMv8 CE is supported by looking at the features in cat /proc/cpuinfo.
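As a minimal sketch of that check: on arm64 the relevant flags in the Features line are aes, pmull, sha1 and sha2. The function name cpu_crypto_features is made up here, and the optional path argument exists only so it can be tried on a sample file.

```shell
# Report which ARMv8 Crypto Extension flags appear in the CPU feature list.
cpu_crypto_features() {
    cpuinfo_file="${1:-/proc/cpuinfo}"
    # Take the first "Features" line and split it into one flag per line.
    features="$(grep -m 1 '^Features' "$cpuinfo_file" | cut -d: -f2 | tr ' ' '\n')"
    for flag in aes pmull sha1 sha2; do
        # -x matches whole lines only, so "sha2" does not match some longer flag.
        if echo "$features" | grep -qx "$flag"; then
            echo "$flag: yes"
        else
            echo "$flag: no"
        fi
    done
}
```

Running cpu_crypto_features without arguments prints one yes/no line per flag for the first CPU listed.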

On the other hand, OpenSSL has integrated ARMv8 CE support and runtime detection, so no additional kernel modules are necessary.

The kernel modules are for things that don't use OpenSSL. I'm not sure exactly which: drive encryption or something like that.


I tried to boot with cpuidle.off=0 and found no difference in power consumption.

The device is hovering between 35 and 47 W with no interfaces plugged in. I am surprised; I was expecting less when nothing is connected.

According to the manual, the device consumes a maximum of 36W.
The power supply is also only specified with an output power of 36W.

I can hardly believe that the box draws anywhere near 50W when idle.
But I'm too lazy to fetch my power meter from the basement...

So my power meter shows ~20W (19W-23W) with 3 active ports WAN and LAN ....

So your meter is almost certainly showing nonsense if you measure ~50W when idle without active ports.

Sorry, I made a mistake in my measuring setup. It now shows 16-20 W when idling.

The good news is that I can reach 8.7 Gbps when forwarding iperf3 traffic with mtu=1500. One core is sitting at 100%. mtu=9000 helps a lot with both the CPU load and the bandwidth (9.9 Gbps).

I am now running a custom OpenWrt build with PAGE_SIZE=64k and VA_BITS=48, and I can reach 9.4 Gbps forwarding with mtu=1500 and minimal CPU load. Also self-iperf3:

root@OpenWrt:~# iperf3 -D -s && iperf3 -c 127.0.0.1
Connecting to host 127.0.0.1, port 5201
[ 5] local 127.0.0.1 port 55758 connected to 127.0.0.1 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 2.95 GBytes 25.3 Gbits/sec 0 1.06 MBytes
[ 5] 1.00-2.00 sec 2.99 GBytes 25.7 Gbits/sec 0 1.06 MBytes
[ 5] 2.00-3.00 sec 3.03 GBytes 26.1 Gbits/sec 0 1.06 MBytes
[ 5] 3.00-4.00 sec 2.99 GBytes 25.7 Gbits/sec 0 1.06 MBytes
[ 5] 4.00-5.00 sec 3.05 GBytes 26.2 Gbits/sec 0 1.06 MBytes
[ 5] 5.00-6.00 sec 2.85 GBytes 24.5 Gbits/sec 0 1.06 MBytes
[ 5] 6.00-7.00 sec 2.98 GBytes 25.6 Gbits/sec 0 1.06 MBytes
[ 5] 7.00-8.00 sec 3.01 GBytes 25.8 Gbits/sec 0 1.06 MBytes
[ 5] 8.00-9.00 sec 3.05 GBytes 26.2 Gbits/sec 0 1.06 MBytes
[ 5] 9.00-10.00 sec 2.88 GBytes 24.7 Gbits/sec 0 1.06 MBytes


[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 30.4 GBytes 26.1 Gbits/sec 0 sender
[ 5] 0.00-10.00 sec 30.4 GBytes 26.1 Gbits/sec receiver

iperf Done.

Does anybody know if there is a reason why the sysupgrade for the iEi Puzzle-M902 (and therefore also for the Qnap QHora-322) does not use A/B flashing? All the partitions are there already (see this comment here) and it is also possible to edit the bootcmd from within OpenWrt.

As a proof of concept I have written myself a little shell script that downloads, verifies, extracts and flashes an OpenWrt image to the non-active partitions.

The only two prerequisites are that you need to

  1. install the “losetup” package

  2. create the file /etc/fw_env.config with the following content, so that the commands "fw_printenv" and "fw_setenv" know where to find the bootloader’s environment variables

    /dev/mtdblock3 0x00000 0x10000 0x10000
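The second prerequisite can be scripted, as a minimal sketch. The function name write_fw_env_config is hypothetical; the values are exactly the ones given above (device, offset, environment size, flash sector size), and the optional path argument exists only so the function can be tried on a scratch file first.

```shell
# Write the fw_env.config line quoted in the post and show the resulting file.
write_fw_env_config() {
    config_file="${1:-/etc/fw_env.config}"
    # Columns: device, offset, environment size, flash sector size.
    echo "/dev/mtdblock3 0x00000 0x10000 0x10000" > "$config_file"
    cat "$config_file"
}
```

After calling write_fw_env_config, running fw_printenv should list the bootloader variables if the values match the device. For the first prerequisite, I assume the install command is apk add losetup on 25.12 and opkg install losetup on older releases.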
    

Here is the script. Use at your own risk! https://pastebin.com/BbFZeyg0

Please let me know what you think.


This is great, I will give the script a try next time I am flashing.