NanoPi R4S-RK3399 is a great new OpenWrt device

What is the size of the firmware you sysupgraded, and how much RAM does your R4S have?

Please explain why you ask this, as I fail to understand the reason.

The firmware is around 128 MB. The R4S is the 4 GB RAM model, and my SD card is 32 GB.

I used the code from @1715173329's PR to build firmware against the latest trunk, and it works very well.
I also tried setting up NIC bonding with the two onboard ports; the iperf3 speed test reaches > 1.8 Gbps. The r8168 driver that @BotoX noted above is about 200-300 Mbps faster than the in-kernel r8169 driver.

iperf3 -V -i 5 -t 30 -c 192.168.16.11
iperf 3.9
Linux NanopiR4S 5.4.99 #0 SMP PREEMPT Fri Feb 19 01:25:49 2021 aarch64
Control connection MSS 1448
Time: Fri, 19 Feb 2021 12:03:52 UTC
Connecting to host 192.168.16.11, port 5201
      Cookie: qylwd6hdwtjsutpso2b3rsny2i72xtgavsgs
      TCP MSS: 1448 (default)
[  5] local 192.168.16.1 port 53742 connected to 192.168.16.11 port 5201
Starting Test: protocol: TCP, 1 streams, 131072 byte blocks, omitting 0 seconds, 30 second test, tos 0
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-5.00   sec  1.06 GBytes  1.82 Gbits/sec  2159    390 KBytes
[  5]   5.00-10.00  sec  1.07 GBytes  1.85 Gbits/sec  978    382 KBytes
[  5]  10.00-15.00  sec  1.05 GBytes  1.80 Gbits/sec  247    379 KBytes
[  5]  15.00-20.00  sec  1.09 GBytes  1.87 Gbits/sec  165    410 KBytes
[  5]  20.00-25.00  sec  1.05 GBytes  1.80 Gbits/sec   73    376 KBytes
[  5]  25.00-30.00  sec  1.09 GBytes  1.87 Gbits/sec   60    379 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
Test Complete. Summary Results:
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-30.00  sec  6.41 GBytes  1.84 Gbits/sec  3682             sender
[  5]   0.00-30.00  sec  6.41 GBytes  1.84 Gbits/sec                  receiver
CPU Utilization: local/sender 17.0% (0.2%u/16.8%s), remote/receiver 19.6% (1.9%u/17.7%s)
snd_tcp_congestion bbr
rcv_tcp_congestion bbr

Can someone please share the output from cryptsetup benchmark?

Here is the result from my board, overclocked to 2 GHz with the CPU governor set to performance:

cryptsetup benchmark
# Tests are approximate using memory only (no storage IO).
PBKDF2-sha1       207721 iterations per second for 256-bit key
PBKDF2-sha256     331408 iterations per second for 256-bit key
PBKDF2-sha512     279173 iterations per second for 256-bit key
PBKDF2-ripemd160     N/A
PBKDF2-whirlpool  164870 iterations per second for 256-bit key
argon2i       4 iterations, 296217 memory, 4 parallel threads (CPUs) for 256-bit key (requested 2000 ms time)
argon2id      4 iterations, 358570 memory, 4 parallel threads (CPUs) for 256-bit key (requested 2000 ms time)
#     Algorithm |       Key |      Encryption |      Decryption
        aes-cbc        128b        84.4 MiB/s        86.6 MiB/s
    serpent-cbc        128b        47.5 MiB/s        47.5 MiB/s
    twofish-cbc        128b        77.2 MiB/s        74.0 MiB/s
        aes-cbc        256b        65.6 MiB/s        66.7 MiB/s
    serpent-cbc        256b        48.0 MiB/s        47.4 MiB/s
    twofish-cbc        256b        77.8 MiB/s        73.9 MiB/s
        aes-xts        256b        90.5 MiB/s        95.0 MiB/s
    serpent-xts        256b        48.4 MiB/s        52.0 MiB/s
    twofish-xts        256b        82.9 MiB/s        86.1 MiB/s
        aes-xts        512b        71.8 MiB/s        71.4 MiB/s
    serpent-xts        512b        51.6 MiB/s        52.0 MiB/s
    twofish-xts        512b        87.8 MiB/s        86.0 MiB/s

Not too shabby, but obviously without crypto acceleration. Thanks!

Thanks for your amazing work :love_you_gesture:

I just set up a NanoPi R4S as an 'in place replacement' for the 'internet box' provided by my ISP (for a 10G-EPON fiber connection).

I'm brand new to OpenWrt, so I still need to learn lots of things.
I generated my image based on the fork available on github: 1715173329 / openwrt-official

Just FYI: I face an issue with the wan interface not acquiring an IPv6 address from the ISP. As soon as I set the wan interface into promiscuous mode, it works well.
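
For anyone wanting to try the same workaround, here is a minimal sketch of switching an interface into promiscuous mode with ip link. The function name wan_promisc_on and the interface name in the usage note are assumptions for illustration; the setting is not persistent and is lost on reboot or when the interface is recreated.

```shell
#!/bin/sh
# Sketch: put one interface into promiscuous mode (workaround only).
# The interface name is passed in; nothing here is specific to the R4S.
wan_promisc_on() {
    wan_if="$1"
    if [ -z "$wan_if" ]; then
        echo "usage: wan_promisc_on <interface>" >&2
        return 1
    fi
    ip link set dev "$wan_if" promisc on
}
```

Example call, assuming the WAN port is named eth0 on your build: wan_promisc_on eth0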

I posted more details, context, logs, etc... here:

Note that I am using the stock r8169 driver, not the r8168-8.048.03 Realtek kernel module yet (for the simple reason that I don't yet know how to build my image with this driver :sweat_smile:).

Again: many thanks for your work and for sharing it :+1:

Why is the optimization target for the RK3399 cortex-a73.cortex-a53 rather than cortex-a72.cortex-a53?

@nouknouk: "I face an issue with the wan interface not acquiring an IPv6 from the ISP. But as soon as I set the wan interface into promiscuous mode, it's working well."

I ran some more tests on the VLAN support issue with the WAN interface:

  • same issue with a rebuilt FriendlyWrt image (based on kernel 5.10)
  • same issue with a rebuilt OpenWrt image with the r8168-8.048.03 Realtek kernel module

The performance / load difference between r8168 and r8169 may result from the fact that r8168 has interrupt coalescing enabled by default (the drawback is increased latency), while r8169 does not.
To deal with this you can either use ethtool to enable IRQ coalescing with r8169, or, better and easier (from kernel 5.10):
echo 20000 > /sys/class/net/<if>/gro_flush_timeout
echo 1 > /sys/class/net/<if>/napi_defer_hard_irqs

Enabling TSO may also provide a benefit. It's disabled by default because of hardware bugs on some chip versions.
ethtool -K <if> sg on tso on
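
The two sysfs writes above can be wrapped in a small helper so they are applied to one named interface at a time. This is only a sketch: the function name tune_r8169 and the SYSFS_NET override (which lets you try it against a scratch directory) are assumptions for illustration, and the values are the ones quoted above.

```shell
#!/bin/sh
# Sketch: apply the NAPI/GRO settings quoted above to a single interface.
# SYSFS_NET is a hypothetical override; by default the real sysfs is used.
tune_r8169() {
    iface="$1"
    base="${SYSFS_NET:-/sys/class/net}/$iface"
    if [ -z "$iface" ] || [ ! -d "$base" ]; then
        echo "no such interface: $iface" >&2
        return 1
    fi
    # Let GRO hold packets for up to 20000 ns before flushing.
    echo 20000 > "$base/gro_flush_timeout" || return 1
    # Defer re-enabling the hard IRQ for one NAPI poll round.
    echo 1 > "$base/napi_defer_hard_irqs" || return 1
}
```

Example call, assuming the r8169 port is eth1 on your build: tune_r8169 eth1. The settings do not survive a reboot, so they would need to be reapplied from a startup script.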

In general r8169 has the more modern design and a much smaller memory footprint. On the other hand r8168 has a lot of undocumented magic that may help to work around problematic board / BIOS / network chip version combinations.

The results on my R4S seem promising. It is overclocked to 2.2 GHz/1.8 GHz and uses the following kernel config:

CONFIG_CRYPTO_DEV_ROCKCHIP=y
CONFIG_HW_RANDOM_ROCKCHIP=y

[root@R4S:/tmp/downloads]$ cryptsetup benchmark
# Tests are approximate using memory only (no storage IO).
PBKDF2-sha1       227951 iterations per second for 256-bit key
PBKDF2-sha256     442064 iterations per second for 256-bit key
PBKDF2-sha512     382134 iterations per second for 256-bit key
PBKDF2-ripemd160     N/A
PBKDF2-whirlpool  160627 iterations per second for 256-bit key
argon2i       4 iterations, 298677 memory, 4 parallel threads (CPUs) for 256-bit key (requested 2000 ms time)
argon2id      4 iterations, 320365 memory, 4 parallel threads (CPUs) for 256-bit key (requested 2000 ms time)
#     Algorithm |       Key |      Encryption |      Decryption
        aes-cbc        128b       653.1 MiB/s       907.7 MiB/s
    serpent-cbc        128b        51.2 MiB/s        55.5 MiB/s
    twofish-cbc        128b        82.5 MiB/s        90.4 MiB/s
        aes-cbc        256b       558.5 MiB/s       800.9 MiB/s
    serpent-cbc        256b        51.4 MiB/s        55.7 MiB/s
    twofish-cbc        256b        83.3 MiB/s        89.5 MiB/s
        aes-xts        256b       736.0 MiB/s       734.9 MiB/s
    serpent-xts        256b        56.2 MiB/s        56.6 MiB/s
    twofish-xts        256b        95.8 MiB/s        93.6 MiB/s
        aes-xts        512b       661.7 MiB/s       669.0 MiB/s
    serpent-xts        512b        56.4 MiB/s        56.4 MiB/s
    twofish-xts        512b        95.7 MiB/s        93.5 MiB/s

The OpenSSL result does not seem right here, though.

[root@R4S:/tmp/downloads]$ openssl engine -t -c
(dynamic) Dynamic engine loading support
     [ unavailable ]
(devcrypto) /dev/crypto engine
     [ available ]

And the speed test:

[root@R4S:/tmp/downloads]$ time openssl speed -evp aes-128-cbc
Doing aes-128-cbc for 3s on 16 size blocks: 50776931 aes-128-cbc's in 2.96s
Doing aes-128-cbc for 3s on 64 size blocks: 34280079 aes-128-cbc's in 2.99s
Doing aes-128-cbc for 3s on 256 size blocks: 14864233 aes-128-cbc's in 2.97s
Doing aes-128-cbc for 3s on 1024 size blocks: 4466057 aes-128-cbc's in 2.96s
Doing aes-128-cbc for 3s on 8192 size blocks: 599587 aes-128-cbc's in 2.96s
Doing aes-128-cbc for 3s on 16384 size blocks: 300172 aes-128-cbc's in 2.97s
OpenSSL 1.1.1k  25 Mar 2021
built on: Mon Mar 29 20:26:27 2021 UTC
options:bn(64,64) rc4(char) des(int) aes(partial) blowfish(ptr) 
compiler: aarch64-openwrt-linux-musl-gcc -fPIC -pthread -Wa,--noexecstack -Wall -O3 -pipe -march=armv8-a+crypto+crc -mabi=lp64 -fno-caller-saves -fno-plt -fhonour-copts -Wno-error=unused-but-set-variable -Wno-error=unused-result -Wformat -Werror=format-security -fstack-protector -D_FORTIFY_SOURCE=1 -Wl,-z,now -Wl,-z,relro -O3 -fPIC -ffunction-sections -fdata-sections -znow -zrelro -DOPENSSL_USE_NODELETE -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_BN_ASM_MONT -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DVPAES_ASM -DECP_NISTZ256_ASM -DPOLY1305_ASM -DNDEBUG -DOPENSSL_PREFER_CHACHA_OVER_GCM
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-128-cbc     274469.90k   733754.20k  1281226.82k  1545014.31k  1659397.54k  1655898.33k

real	0m18.039s
user	0m17.818s
sys	0m0.098s
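
As a sanity check on the table above, each figure can be recomputed from the raw counts as operations × block size / elapsed seconds, in units of 1000 bytes per second. A minimal sketch follows; the helper name openssl_rate_k is made up for illustration.

```shell
#!/bin/sh
# Sketch: recompute an "openssl speed" table entry from its raw counters.
# Arguments: operation count, block size in bytes, elapsed seconds.
# Output is in the same unit as the table (1000s of bytes per second).
openssl_rate_k() {
    awk -v n="$1" -v size="$2" -v secs="$3" \
        'BEGIN { printf "%.2f\n", n * size / secs / 1000 }'
}

# 8192-byte row from the run above: 599587 operations in 2.96 s.
openssl_rate_k 599587 8192 2.96   # prints 1659397.54
```

This matches the 1659397.54k reported for 8192-byte blocks, i.e. about 1.66 GB/s. Note also that the time output shows almost all of the 18 s spent in user time; that would be consistent with the AES work running in user space on the CPU rather than being offloaded through /dev/crypto, although this is an inference and not something the output states directly.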

I see router + Samba + encrypted storage all in one. Thanks @greekstreet!

I just received the 1GB version and would like to run some benchmarks. I have iperf3 running with -s on my laptop, and when I run it in client mode on the device I get really slow speeds. I'm connected via a short Ethernet cable directly to the device. When I run it vice versa (server on the device and laptop in client mode), I get ~950/450. I have SQM disabled.

I'm using this image: https://github.com/quintus-lab/NanoPi-R4S-OpenWRT

 root@OpenWrt:~# iperf3 -c 192.168.1.143 -f M
Connecting to host 192.168.1.143, port 5201
[  5] local 192.168.1.1 port 58832 connected to 192.168.1.143 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  49.1 MBytes  49.1 MBytes/sec    0    282 KBytes       
[  5]   1.00-2.00   sec  48.4 MBytes  48.4 MBytes/sec    0    294 KBytes       
[  5]   2.00-3.00   sec  48.1 MBytes  48.1 MBytes/sec    0    297 KBytes       
[  5]   3.00-4.00   sec  48.3 MBytes  48.3 MBytes/sec    0    288 KBytes       
[  5]   4.00-5.00   sec  48.6 MBytes  48.6 MBytes/sec    0    282 KBytes       
[  5]   5.00-6.00   sec  48.1 MBytes  48.1 MBytes/sec    0    277 KBytes       
[  5]   6.00-7.00   sec  48.4 MBytes  48.4 MBytes/sec    0    245 KBytes       
[  5]   7.00-8.00   sec  48.3 MBytes  48.3 MBytes/sec    0    279 KBytes       
[  5]   8.00-9.00   sec  47.9 MBytes  47.9 MBytes/sec    0    274 KBytes       
[  5]   9.00-10.00  sec  48.7 MBytes  48.7 MBytes/sec    0    282 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   484 MBytes  48.4 MBytes/sec    0             sender
[  5]   0.00-10.00  sec   482 MBytes  48.2 MBytes/sec                  receiver       
root@OpenWrt:~# iperf3 -c 192.168.1.143 -f M -R
Connecting to host 192.168.1.143, port 5201
Reverse mode, remote host 192.168.1.143 is sending
[  5] local 192.168.1.1 port 58836 connected to 192.168.1.143 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec   112 MBytes   112 MBytes/sec                  
[  5]   1.00-2.00   sec   113 MBytes   113 MBytes/sec                  
[  5]   2.00-3.00   sec   113 MBytes   113 MBytes/sec                  
[  5]   3.00-4.00   sec   113 MBytes   113 MBytes/sec                  
[  5]   4.00-5.00   sec   113 MBytes   113 MBytes/sec                  
[  5]   5.00-6.00   sec   113 MBytes   113 MBytes/sec                  
[  5]   6.00-7.00   sec   113 MBytes   113 MBytes/sec                  
[  5]   7.00-8.00   sec   113 MBytes   113 MBytes/sec                  
[  5]   8.00-9.00   sec   113 MBytes   113 MBytes/sec                  
[  5]   9.00-10.00  sec   113 MBytes   113 MBytes/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.00  sec  1.10 GBytes   113 MBytes/sec                  sender
[  5]   0.00-10.00  sec  1.10 GBytes   113 MBytes/sec                  receiver

Am I doing this wrong? I'm extremely new to OpenWrt and networking in general. Any help would be appreciated!

EDIT: I just realized this is benchmarking on the device itself which is probably limited by the SD card? How should I properly run these benchmarks?
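
For comparison with the Mbits/sec figures elsewhere in this thread: the runs above used -f M, so the rates are printed in MBytes/sec. Here is a minimal conversion sketch, assuming iperf3 counts 1 MByte as 1024*1024 bytes and 1 Mbit as 1000*1000 bits. The helper name mbytes_to_mbits is made up for illustration.

```shell
#!/bin/sh
# Sketch: convert an iperf3 "-f M" rate (MBytes/sec) to Mbits/sec.
# Assumes 1 MByte = 1024*1024 bytes and 1 Mbit = 1000*1000 bits.
mbytes_to_mbits() {
    awk -v r="$1" 'BEGIN { printf "%.0f\n", r * 1048576 * 8 / 1000000 }'
}

mbytes_to_mbits 113    # reverse direction above, prints 948
mbytes_to_mbits 48.4   # forward direction above, prints 406
```

So the reverse run is close to gigabit line rate, while the forward run (device sending) is at roughly 406 Mbits/sec.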

@Fauks ideally you should use two PCs with your device (the R4S) between them, or one PC with two interfaces. The idea is to send from one PC and receive on the other, with the traffic passing through the device.

That said, your measurements look quite low. The repository where you got the image has some benchmarks with different iperf3 settings; maybe try those?
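
The two-PC setup described above could be driven like this. It is a rough sketch, assuming one PC on the WAN side runs iperf3 -s and this function is run on a PC on the LAN side, so the R4S only routes the traffic. The function name iperf_through_router and the address in the usage note are placeholders.

```shell
#!/bin/sh
# Sketch: run iperf3 in both directions against a server that sits on the
# other side of the router. Start "iperf3 -s" on that server first.
# Arguments: server address, optional test duration in seconds (default 30).
iperf_through_router() {
    server="$1"
    secs="${2:-30}"
    if [ -z "$server" ]; then
        echo "usage: iperf_through_router <server> [seconds]" >&2
        return 1
    fi
    # Forward: this PC sends, the server receives.
    iperf3 -c "$server" -t "$secs" || return 1
    # Reverse (-R): the server sends, this PC receives.
    iperf3 -c "$server" -t "$secs" -R
}
```

Example call, with a documentation-range address standing in for the WAN-side PC: iperf_through_router 192.0.2.10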

@xiaobo Thanks for the images, I took the last one and noticed that the wan and lan port are inverted?

I'd like to try to build my own image, if someone has any documentation about the process, I'll appreciate it.

I also noticed that the MRs for this device were closed, but the changes (unless I misread them) appear to be already present in the mainline kernel and U-Boot. Is that the case?

I've been messing around with a few different images, although I've settled down with "ImmortalWrt" - Slim.img from: https://github.com/klever1988/nanopi-openwrt which has been awesome.

I think I figured out what my problem was: I had the link layer adaptation (LLA) overhead set to 22. I'm on a network with another router locally, so I was not sure what I needed. After disabling it completely, I now get ~950 Mbps up or down. This is with SQM enabled using cake (piece_of_cake.qos). I have the server running on the device and the client running on my computer; I'm not able to test using another machine here, but I will once I finish wiring my home.

Using CPUFreq, I have the two main cores locked at 1200 MHz and the four smaller cores locked at ~800 MHz, with the schedutil scaling governor for both. With anything less, speeds start to drop, although that doesn't seem bad considering the cores can clock much higher.
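
For reference, a cap like this can also be set from the shell through the cpufreq sysfs files. This is a minimal sketch, not the exact method used above: the function name set_cpufreq_policy and the CPUFREQ_ROOT override are made up for illustration, and the policy names in the usage note are assumptions to verify on your own board.

```shell
#!/bin/sh
# Sketch: set the governor and a maximum frequency for one cpufreq policy.
# Arguments: policy directory name, governor name, max frequency in kHz.
# CPUFREQ_ROOT is a hypothetical override for trying this off-device.
set_cpufreq_policy() {
    policy="$1"
    governor="$2"
    max_khz="$3"
    dir="${CPUFREQ_ROOT:-/sys/devices/system/cpu/cpufreq}/$policy"
    if [ -z "$policy" ] || [ ! -d "$dir" ]; then
        echo "missing cpufreq policy: $policy" >&2
        return 1
    fi
    echo "$governor" > "$dir/scaling_governor" || return 1
    echo "$max_khz" > "$dir/scaling_max_freq" || return 1
}
```

Example calls, assuming policy4 covers the two big cores and policy0 the four small ones: set_cpufreq_policy policy4 schedutil 1200000 and set_cpufreq_policy policy0 schedutil 816000. The kernel only accepts frequencies the driver supports, so it is worth checking which steps are actually available before picking values.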

Am I doing this right? Here is my output:

root@DCGateway:~# tc -d qdisc
qdisc noqueue 0: dev lo root refcnt 2 
qdisc mq 0: dev eth0 root 
qdisc fq_codel 0: dev eth0 parent :1 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 4Mb ecn drop_batch 64 
qdisc cake 8007: dev eth1 root refcnt 2 bandwidth 1Gbit besteffort triple-isolate nonat nowash no-ack-filter split-gso rtt 100ms raw overhead 0 
qdisc ingress ffff: dev eth1 parent ffff:fff1 ---------------- 
qdisc fq_codel 0: dev ztks5427oj root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 4Mb ecn drop_batch 64 
qdisc noqueue 0: dev br-lan root refcnt 2 
qdisc cake 8008: dev ifb4eth1 root refcnt 2 bandwidth 1Gbit besteffort triple-isolate nonat wash no-ack-filter split-gso rtt 100ms raw overhead 0
c:\iperf>iperf3 -c 10.10.10.1
Connecting to host 10.10.10.1, port 5201
[  5] local 10.10.10.143 port 64789 connected to 10.10.10.1 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec   110 MBytes   920 Mbits/sec
[  5]   1.00-2.00   sec   113 MBytes   949 Mbits/sec
[  5]   2.00-3.00   sec   113 MBytes   949 Mbits/sec
[  5]   3.00-4.00   sec   113 MBytes   949 Mbits/sec
[  5]   4.00-5.00   sec   113 MBytes   949 Mbits/sec
[  5]   5.00-6.00   sec   113 MBytes   949 Mbits/sec
[  5]   6.00-7.00   sec   113 MBytes   949 Mbits/sec
[  5]   7.00-8.00   sec   113 MBytes   949 Mbits/sec
[  5]   8.00-9.00   sec   113 MBytes   949 Mbits/sec
[  5]   9.00-10.00  sec   114 MBytes   958 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.00  sec  1.10 GBytes   947 Mbits/sec                  sender
[  5]   0.00-10.06  sec  1.10 GBytes   941 Mbits/sec                  receiver

iperf Done.

c:\iperf>iperf3 -c 10.10.10.1 -R
Connecting to host 10.10.10.1, port 5201
Reverse mode, remote host 10.10.10.1 is sending
[  5] local 10.10.10.143 port 64795 connected to 10.10.10.1 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec   111 MBytes   929 Mbits/sec
[  5]   1.00-2.00   sec   112 MBytes   936 Mbits/sec
[  5]   2.00-3.00   sec   111 MBytes   930 Mbits/sec
[  5]   3.00-4.00   sec   111 MBytes   935 Mbits/sec
[  5]   4.00-5.00   sec   110 MBytes   925 Mbits/sec
[  5]   5.00-6.00   sec   111 MBytes   928 Mbits/sec
[  5]   6.00-7.00   sec   110 MBytes   924 Mbits/sec
[  5]   7.00-8.00   sec   110 MBytes   926 Mbits/sec
[  5]   8.00-9.00   sec   111 MBytes   928 Mbits/sec
[  5]   9.00-10.00  sec   111 MBytes   933 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.05  sec  1.09 GBytes   927 Mbits/sec    0             sender
[  5]   0.00-10.00  sec  1.08 GBytes   929 Mbits/sec                  receiver

iperf Done.

I can't seem to get the LAN interface working with the 5.10 kernel. Could someone share the kernel configuration for it? (I think it's the one that goes via the PCIe bus.)

Neither dmesg nor lspci shows anything about it.

Could someone do a benchmark of SQM at 1 Gbit/s bidirectional, WITHOUT an accompanying WAN -> LAN workload?

This is for an L2 transparent SQM bridge (https://apenwarr.ca/log/?m=201808#openwrt), which in my previous testing requires less CPU than a full-on router.

I have made an OpenWrt 21.02 build for the R2S / R4S from vanilla OpenWrt + Rockchip patches from ImmortalWrt + the r8168 driver.
If you want to give it a try: https://github.com/anaelorlinski/OpenWRT-Rockchip/releases
