NanoPi R4S-RK3399 is a great new OpenWrt device

Try from a fresh clone and ignore the whitespace warnings.

git clone https://github.com/openwrt/openwrt.git
cd openwrt
openwrt$ git status
On branch master
Your branch is up to date with 'origin/master'.

nothing to commit, working tree clean
openwrt$ git diff
openwrt$ wget https://patch-diff.githubusercontent.com/raw/mj22226/openwrt/pull/2.patch
--2020-12-15 05:31:15--  https://patch-diff.githubusercontent.com/raw/mj22226/openwrt/pull/2.patch
Resolving patch-diff.githubusercontent.com (patch-diff.githubusercontent.com)... 140.82.112.4
Connecting to patch-diff.githubusercontent.com (patch-diff.githubusercontent.com)|140.82.112.4|:443... connected.
HTTP request sent, awaiting response... 200 OK
Cookie coming from patch-diff.githubusercontent.com attempted to set domain to github.com
Length: unspecified [text/plain]
Saving to: ‘2.patch’

2.patch                              [ <=>                                                              ]  26.94K  --.-KB/s    in 0.05s   

2020-12-15 05:31:15 (492 KB/s) - ‘2.patch’ saved [27589]

openwrt$ git am 2.patch
Applying: uboot-rockchip: add support for NanoPi R4S
.git/rebase-apply/patch:61: space before tab in indent.
        rk3399-nanopi-m4.dtb \
.git/rebase-apply/patch:62: space before tab in indent.
        rk3399-nanopi-m4-2gb.dtb \
.git/rebase-apply/patch:63: space before tab in indent.
        rk3399-nanopi-neo4.dtb \
.git/rebase-apply/patch:65: space before tab in indent.
        rk3399-orangepi.dtb \
.git/rebase-apply/patch:66: space before tab in indent.
        rk3399-pinebook-pro.dtb \
warning: squelched 2 whitespace errors
warning: 7 lines add whitespace errors.
Applying: rockchip: add support for NanoPi R4S
Applying: rockchip: add NanoPi R4S DTS
Applying: rockchip: add bootscript for NanoPi R4S
Applying: uboot-rockchip: add NanoPi R4S 1Gb DDR3
.git/rebase-apply/patch:61: space before tab in indent.
        rk3399-nanopi-m4-2gb.dtb \
.git/rebase-apply/patch:62: space before tab in indent.
        rk3399-nanopi-neo4.dtb \
.git/rebase-apply/patch:63: space before tab in indent.
        rk3399-nanopi-r4s.dtb \
.git/rebase-apply/patch:65: space before tab in indent.
        rk3399-orangepi.dtb \
.git/rebase-apply/patch:66: space before tab in indent.
        rk3399-pinebook-pro.dtb \
warning: squelched 1 whitespace error
warning: 6 lines add whitespace errors.
Applying: rockchip: add NanoPi R4S 1Gb DDR3
Applying: rockchip: fix LEDs states for NanoPi R4S
Applying: rockchip: update base-files
openwrt$ git log

Ahh OK, I'm running it through LEDE instead of OpenWrt, which I think is causing problems.

Could anyone tell me how to make U-Boot compatible with both DDR3 and LPDDR4 memory?
For now I can only offer a dirty workaround.

See below:

Possibly this may work, I just need someone to confirm it.
https://github.com/mj22226/openwrt/commit/f08e1ef908cd12f7172de6765176825e599ad2f8

Hello everyone, I've submitted a PR to upstream:

However, I would like to keep it marked as "Draft" until the U-Boot issue is resolved.

Considering its price, I'd prefer an x86 soft router.

Thanks for all of your work, this little board has a lot of power and it's really neat! Mine arrived last week and I've been playing with unofficial builds here and there. They're promising, but I haven't replaced anything yet. I'll be waiting for an official release from here. Again, all of your work is appreciated, looking forward to it.

I've posted a few benchmarks to the GitHub PR: https://github.com/openwrt/openwrt/pull/3701#issuecomment-756947314

I recently received my NanoPi R4S and was able to get it running quickly with the code here.

I've made some small changes though to get the most out of it:

  • Use Google OP1 overclock to 2.0GHz
    • edit arch/arm64/boot/dts/rockchip/rk3399-nanopi4.dtsi
    • #include "rk3399-op1-opp.dtsi" instead of #include "rk3399-opp.dtsi"
  • optimize with -O3 instead of -Os
  • optimize for RK3399: -march=armv8-a+crypto+crc -mcpu=cortex-a73.cortex-a53+crypto+crc -mtune=cortex-a73.cortex-a53

Can hopefully be done better/cleaner than this:

diff --git a/include/target.mk b/include/target.mk
index edc6a146de..e7217c6181 100644
--- a/include/target.mk
+++ b/include/target.mk
@@ -194,7 +194,7 @@ LINUX_RECONF_DIFF = $(SCRIPT_DIR)/kconfig.pl - '>' $(call __linux_confcmd,$(filt
 ifeq ($(DUMP),1)
   BuildTarget=$(BuildTargets/DumpCurrent)
 
-  CPU_CFLAGS = -Os -pipe
+  CPU_CFLAGS = -O3 -pipe
   ifneq ($(findstring mips,$(ARCH)),)
     ifneq ($(findstring mips64,$(ARCH)),)
       CPU_TYPE ?= mips64
@@ -235,7 +235,7 @@ ifeq ($(DUMP),1)
   endif
   ifeq ($(ARCH),aarch64)
     CPU_TYPE ?= generic
-    CPU_CFLAGS_generic = -mcpu=generic
+    CPU_CFLAGS_generic = -march=armv8-a+crypto+crc -mcpu=cortex-a73.cortex-a53+crypto+crc -mtune=cortex-a73.cortex-a53
     CPU_CFLAGS_cortex-a53 = -mcpu=cortex-a53
   endif
   ifeq ($(ARCH),arc)
diff --git a/target/linux/rockchip/patches-5.4/202-rockchip-rk3399-Overclock-and-Undervolt-from-Google-OP1.patch b/target/linux/rockchip/patches-5.4/202-rockchip-rk3399-Overclock-and-Undervolt-from-Google-OP1.patch
new file mode 100644
index 0000000000..d0fc1d1a0f
--- /dev/null
+++ b/target/linux/rockchip/patches-5.4/202-rockchip-rk3399-Overclock-and-Undervolt-from-Google-OP1.patch
@@ -0,0 +1,13 @@
+Index: linux-5.4.86/arch/arm64/boot/dts/rockchip/rk3399-nanopi4.dtsi
+===================================================================
+--- linux-5.4.86.orig/arch/arm64/boot/dts/rockchip/rk3399-nanopi4.dtsi
++++ linux-5.4.86/arch/arm64/boot/dts/rockchip/rk3399-nanopi4.dtsi
+@@ -14,7 +14,7 @@
+ /dts-v1/;
+ #include <dt-bindings/input/linux-event-codes.h>
+ #include "rk3399.dtsi"
+-#include "rk3399-opp.dtsi"
++#include "rk3399-op1-opp.dtsi"
+ 
+ / {
+ 	chosen {

Not sure whether OpenWrt would even consider this, though.
I also didn't directly benchmark its impact.

Now for some testing:

FriendlyWRT 5.10.2 1.8GHz

# echo 10 > /proc/irq/35/smp_affinity
# echo 20 > /proc/irq/90/smp_affinity
# echo performance | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
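
The values written to smp_affinity above are hexadecimal CPU bitmasks: bit N selects cpuN, so 10 means cpu4 and 20 means cpu5, the two big cores. A minimal sketch of how such a mask can be derived; `cpu_mask` is a hypothetical helper name, and the IRQ numbers 35 and 90 are specific to this board and build and will likely differ elsewhere.

```shell
#!/bin/sh
# cpu_mask N: print the hexadecimal smp_affinity mask that selects only cpuN.
# Hypothetical helper for illustration; bit N of the mask corresponds to cpuN.
cpu_mask() {
    printf '%x\n' "$((1 << $1))"
}

cpu_mask 4   # prints 10
cpu_mask 5   # prints 20
```

With that helper, `echo "$(cpu_mask 4)" > /proc/irq/35/smp_affinity` would be equivalent to the first command above.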

Idle power usage:
Conservative = 3.1-3.2W | Performance = 3.2-3.3W

LAN INTERFACE TX

root@FriendlyWrt:~# iperf3 -c HOST

[  5]   0.00-10.00  sec  1.10 GBytes   943 Mbits/sec    0             sender
[  5]   0.00-10.00  sec  1.10 GBytes   941 Mbits/sec                  receiver

CPU 5: 90%
Power usage: 5.6 Watt

LAN INTERFACE RX

root@FriendlyWrt:~# iperf3 -c HOST -R

[  5]   0.00-10.00  sec  1.10 GBytes   944 Mbits/sec    0             sender
[  5]   0.00-10.00  sec  1.10 GBytes   941 Mbits/sec                  receiver

CPU 4: 80% from iperf3 | CPU 5: 100%
Power usage: 6.8 Watt

LAN INTERFACE BIDIR

root@FriendlyWrt:~# iperf3 -c HOST --bidir

[  5][TX-C]   0.00-10.00  sec  1.07 GBytes   919 Mbits/sec    0             sender
[  5][TX-C]   0.00-10.00  sec  1.07 GBytes   918 Mbits/sec                  receiver
[  7][RX-C]   0.00-10.00  sec  1.09 GBytes   940 Mbits/sec    0             sender
[  7][RX-C]   0.00-10.00  sec  1.09 GBytes   937 Mbits/sec                  receiver

CPU 4: 70% (mostly iperf) | CPU 5: 100%
Power usage: 7.0 Watt


WAN INTERFACE TX

root@FriendlyWrt:~# iperf3 -c HOST

[  5]   0.00-10.00  sec  1.10 GBytes   944 Mbits/sec    0             sender
[  5]   0.00-10.01  sec  1.10 GBytes   941 Mbits/sec                  receiver

CPU 4: 50%
Power usage: 5.3 Watt

WAN INTERFACE RX

root@FriendlyWrt:~# iperf3 -c HOST -R

[  5]   0.00-10.00  sec  1.10 GBytes   942 Mbits/sec    0             sender
[  5]   0.00-10.00  sec  1.10 GBytes   941 Mbits/sec                  receiver

CPU 4: 28%
Power usage: 5.2 Watt

WAN INTERFACE BIDIR

$ iperf3 -c HOST --bidir

[  5][TX-C]   0.00-10.00  sec  1.08 GBytes   932 Mbits/sec    0             sender
[  5][TX-C]   0.00-10.00  sec  1.08 GBytes   929 Mbits/sec                  receiver
[  7][RX-C]   0.00-10.00  sec  1.07 GBytes   921 Mbits/sec    1             sender
[  7][RX-C]   0.00-10.00  sec  1.07 GBytes   918 Mbits/sec                  receiver

CPU 4: 90%
Power usage: 6.1 Watt


ROUTING LAN -> WAN

$ iperf3 -c HOST

[  5]   0.00-10.00  sec  1.10 GBytes   942 Mbits/sec    0             sender
[  5]   0.00-10.00  sec  1.09 GBytes   939 Mbits/sec                  receiver

CPU 4: 80% | CPU 5: 80%
Power usage: 7.3 Watt

ROUTING LAN <-> WAN BIDIR

$ iperf3 -c HOST --bidir

[  5][TX-C]   0.00-10.00  sec  1.08 GBytes   928 Mbits/sec    0             sender
[  5][TX-C]   0.00-10.02  sec  1.08 GBytes   925 Mbits/sec                  receiver
[  7][RX-C]   0.00-10.00  sec   491 MBytes   412 Mbits/sec  150             sender
[  7][RX-C]   0.00-10.02  sec   487 MBytes   408 Mbits/sec                  receiver

CPU 4: 80% | CPU 5: 100%
Power usage: 8.2 Watt
Results are very inconsistent and can be far worse.
SMP affinity seems broken in this firmware.


OpenWRT 5.4.86 2GHz

# echo 10 > /proc/irq/35/smp_affinity
# echo 20 > /proc/irq/90/smp_affinity
# echo performance | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

Idle power usage: 3.1W

r8169 kernel driver

LAN INTERFACE TX

root@OpenWRT:~# iperf3 -c HOST

[  5]   0.00-10.00  sec  1.10 GBytes   941 Mbits/sec    0             sender
[  5]   0.00-10.00  sec  1.09 GBytes   939 Mbits/sec                  receiver

CPU 5: 83%
Power usage: 5.7 Watt

LAN INTERFACE RX

root@OpenWRT:~# iperf3 -c HOST -R

[  5]   0.00-10.00  sec  1.10 GBytes   941 Mbits/sec    0             sender
[  5]   0.00-10.00  sec  1.09 GBytes   939 Mbits/sec                  receiver

CPU 4: 60% from iperf3 | CPU 5: 96%
Power usage: 7.4 Watt

LAN INTERFACE BIDIR

root@OpenWRT:~# iperf3 -c HOST --bidir

[  5][TX-C]   0.00-10.00  sec  1.03 GBytes   882 Mbits/sec    0             sender
[  5][TX-C]   0.00-10.00  sec  1.02 GBytes   880 Mbits/sec                  receiver
[  7][RX-C]   0.00-10.00  sec  1.08 GBytes   927 Mbits/sec    0             sender
[  7][RX-C]   0.00-10.00  sec  1.08 GBytes   924 Mbits/sec                  receiver

CPU 4: 70% (mostly iperf) | CPU 5: 100%
Power usage: 7.6 Watt


Realtek r8168-8.048.03 kernel module

LAN INTERFACE TX

root@OpenWRT:~# iperf3 -c HOST

[  5]   0.00-10.00  sec  1.10 GBytes   941 Mbits/sec    0             sender
[  5]   0.00-10.00  sec  1.09 GBytes   940 Mbits/sec                  receiver

CPU 5: 20%
Power usage: 4.6 Watt

LAN INTERFACE RX

root@OpenWRT:~# iperf3 -c HOST -R

[  5]   0.00-10.00  sec  1.10 GBytes   943 Mbits/sec    0             sender
[  5]   0.00-10.00  sec  1.10 GBytes   941 Mbits/sec                  receiver

CPU 4: 15% from iperf3 | CPU 5: 30%
Power usage: 5.2 Watt

LAN INTERFACE BIDIR

root@OpenWRT:~# iperf3 -c HOST --bidir

[  5][TX-C]   0.00-10.00  sec  1.07 GBytes   918 Mbits/sec    0             sender
[  5][TX-C]   0.00-10.00  sec  1.07 GBytes   916 Mbits/sec                  receiver
[  7][RX-C]   0.00-10.00  sec  1.10 GBytes   941 Mbits/sec    0             sender
[  7][RX-C]   0.00-10.00  sec  1.09 GBytes   937 Mbits/sec                  receiver

CPU 4: 20% from iperf3 | CPU 5: 30%
Power usage: 5.4 Watt


Using the Realtek r8168-8.048.03 kernel module for the next tests because it's much better than the in-kernel r8169:

(You'll also see that the WAN NIC or its driver (eth0, SoC-integrated, st_gmac, MDIO, RGMII, RTL8211E?) is crap compared to the PCIe R8111H LAN NIC (eth1) with r8168.)

WAN INTERFACE TX

root@FriendlyWrt:~# iperf3 -c HOST

[  5]   0.00-10.00  sec  1.10 GBytes   944 Mbits/sec    0             sender
[  5]   0.00-10.00  sec  1.10 GBytes   941 Mbits/sec                  receiver

CPU 4: 86%
Power usage: 5.5 Watt

WAN INTERFACE RX

root@FriendlyWrt:~# iperf3 -c HOST -R

[  5]   0.00-10.00  sec  1.10 GBytes   943 Mbits/sec    0             sender
[  5]   0.00-10.00  sec  1.10 GBytes   941 Mbits/sec                  receiver

CPU 4: 96% | CPU 5: 55% from iperf3
Power usage: 6.8 Watt

WAN INTERFACE BIDIR

root@FriendlyWrt:~# iperf3 -c HOST --bidir

[  5][TX-C]   0.00-10.00  sec  1.08 GBytes   925 Mbits/sec    0             sender
[  5][TX-C]   0.00-10.00  sec  1.08 GBytes   923 Mbits/sec                  receiver
[  7][RX-C]   0.00-10.00  sec  1.07 GBytes   920 Mbits/sec    0             sender
[  7][RX-C]   0.00-10.00  sec  1.07 GBytes   918 Mbits/sec                  receiver

CPU 4: 100% | CPU 5: 30% from iperf3
Power usage: 6.3 Watt


ROUTING WAN -> LAN

$ iperf3 -c HOST

[  5]   0.00-10.00  sec  1.10 GBytes   943 Mbits/sec    0             sender
[  5]   0.00-10.00  sec  1.09 GBytes   940 Mbits/sec                  receiver

CPU 4: 90% | CPU 5: 30%
Power usage: 6.6 Watt

ROUTING LAN -> WAN

$ iperf3 -c HOST -R

[  5]   0.00-10.00  sec  1.10 GBytes   943 Mbits/sec    0             sender
[  5]   0.00-10.00  sec  1.10 GBytes   941 Mbits/sec                  receiver

CPU 4: 80% | CPU 5: 20%
Power usage: 6.6 Watt

ROUTING LAN <-> WAN BIDIR

$ iperf3 -c HOST --bidir

[  5][TX-C]   0.00-10.00  sec  1.08 GBytes   931 Mbits/sec   27             sender
[  5][TX-C]   0.00-10.00  sec  1.08 GBytes   929 Mbits/sec                  receiver
[  7][RX-C]   0.00-10.00  sec  1007 MBytes   845 Mbits/sec  194             sender
[  7][RX-C]   0.00-10.00  sec  1003 MBytes   842 Mbits/sec                  receiver

CPU 4: 100% | CPU 5: 45%
Power usage: 7.1 Watt


ROUTING WAN -> LAN with SQM 1 000 000 kbit/s

$ iperf3 -c HOST

[  5]   0.00-10.00  sec  1.10 GBytes   943 Mbits/sec    0             sender
[  5]   0.00-10.00  sec  1.10 GBytes   941 Mbits/sec                  receiver

CPU 4: 97% | CPU 5: 88%
Power usage: 7.8 Watt
+0ms RTT on LAN

ROUTING LAN -> WAN with SQM 1 000 000 kbit/s

$ iperf3 -c HOST -R

[  5]   0.00-10.00  sec  1.10 GBytes   941 Mbits/sec    0             sender
[  5]   0.00-10.00  sec  1.09 GBytes   939 Mbits/sec                  receiver

CPU 4: 100% | CPU 5: 20%
Power usage: 6.8 Watt
+7-8ms RTT on LAN

ROUTING LAN <-> WAN BIDIR with SQM 1 000 000 kbit/s

$ iperf3 -c HOST --bidir

[  5][TX-C]   0.00-10.00  sec   974 MBytes   817 Mbits/sec    2             sender
[  5][TX-C]   0.00-10.00  sec   971 MBytes   814 Mbits/sec                  receiver
[  7][RX-C]   0.00-10.00  sec  1004 MBytes   843 Mbits/sec    3             sender
[  7][RX-C]   0.00-10.00  sec  1001 MBytes   839 Mbits/sec                  receiver

CPU 4: 100% | CPU 5: 100%
Power usage: 7.9 Watt
+10ms RTT on LAN

ROUTING LAN <-> WAN BIDIR with SQM 800 000 kbit/s

$ iperf3 -c HOST --bidir

[  5][TX-C]   0.00-10.00  sec   911 MBytes   764 Mbits/sec    6             sender
[  5][TX-C]   0.00-10.00  sec   907 MBytes   761 Mbits/sec                  receiver
[  7][RX-C]   0.00-10.00  sec   902 MBytes   757 Mbits/sec    7             sender
[  7][RX-C]   0.00-10.00  sec   900 MBytes   755 Mbits/sec                  receiver

CPU 4: 99% | CPU 5: 100%
Power usage: 7.9 Watt
+0.5-1ms RTT on LAN

ROUTING LAN <-> WAN BIDIR 4 (total 8) PARALLEL with SQM 800 000 kbit/s

$ iperf3 -c HOST --bidir -P 4

[SUM][TX-C]   0.00-10.00  sec   844 MBytes   708 Mbits/sec   68             sender
[SUM][TX-C]   0.00-10.00  sec   840 MBytes   705 Mbits/sec                  receiver
[SUM][RX-C]   0.00-10.00  sec   902 MBytes   757 Mbits/sec   61             sender
[SUM][RX-C]   0.00-10.00  sec   897 MBytes   753 Mbits/sec                  receiver

CPU 4: 100% | CPU 5: 100%
Power usage: 7.9 Watt
+0.5-1ms RTT on LAN


Conclusion

  • Definitely use 2.0GHz: the device runs stable and cool with it and needs the extra oomph to handle gigabit.
    Max recorded temperature was 49°C with the included metal case (a solid aluminium block with a thermal pad and fins).

  • I didn't benchmark the CFLAGS, but adding them shouldn't hurt.
    Installing software from the repo that is built for generic ARMv8 works fine nonetheless.

  • Forget about the in-tree r8169 kernel module. Use the Realtek r8168 one.
    https://github.com/BROBIRD/openwrt-r8168

  • Investigate the performance issues of the WAN NIC.
    It seems to perform somewhat better on the FriendlyWrt Linux 5.10 kernel.
    Is there a technical explanation for it performing so much worse than the PCIe NIC?

Helpful debug stuff

perf top -C 4 while iperf over WAN interface is running: (CPU4 86%)


echo $((-$(awk '$11 == "eth0" { print $6 }' /proc/interrupts)+$(sleep 1 ; awk '$11 == "eth0" { print $6 }' /proc/interrupts))) irq/s
60800 irq/s

perf top -C 5 -F 50000 while iperf over LAN interface is running: (CPU5 20%)


echo $((-$(awk '$11 == "eth1" { print $7 }' /proc/interrupts)+$(sleep 1 ; awk '$11 == "eth1" { print $7 }' /proc/interrupts))) irq/s
9004 irq/s

So eth0 (WAN) causes about 7x as many interrupts as eth1 (LAN): 60800 vs. 9004 irq/s.
That does explain the roughly 4x higher CPU usage (86% vs. 20%), but why does it do that?
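
The one-liners above each read a single per-CPU column of /proc/interrupts ($6 is cpu4, $7 is cpu5), so they only count interrupts on the core the IRQ is pinned to. Below is a sketch of a variant that sums all CPU columns for a device, which should keep working if the affinity changes. The function names `irq_count` and `irq_rate` are made up for illustration, and the code assumes the device name is the last field on its line, as it is in the output used above.

```shell
#!/bin/sh
# irq_count DEV FILE: total interrupt count for DEV, summed over all CPUs.
# FILE must be in /proc/interrupts format (header line with one column per CPU).
irq_count() {
    awk -v dev="$1" '
        NR == 1 { ncpu = NF; next }        # header: CPU0 CPU1 ...
        $NF == dev {                       # device name is the last field
            for (i = 2; i <= ncpu + 1; i++) total += $i
        }
        END { printf "%.0f\n", total }
    ' "$2"
}

# irq_rate DEV: sample /proc/interrupts one second apart and print the delta.
irq_rate() {
    before=$(irq_count "$1" /proc/interrupts)
    sleep 1
    after=$(irq_count "$1" /proc/interrupts)
    echo "$((after - before)) irq/s"
}
```

Running `irq_rate eth0` and `irq_rate eth1` during an iperf3 run would then give figures comparable to the 60800 and 9004 irq/s above.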

Edit: Okay, so the WAN NIC (eth0) is not from Realtek, only the PHY is.
The WAN NIC is some STMicroelectronics GMAC crap.
The driver can be found in drivers/net/ethernet/stmicro/stmmac

Edit2: Figured out how to tune the WAN NIC (eth0) to be less of a CPU hog.

# ethtool -C eth0 rx-usecs 1000 rx-frames 25
# ethtool -C eth0 tx-usecs 100 tx-frames 25

Reduces CPU load:

  • WAN TX from 86% to 57%
  • WAN RX from 96% to 40%
  • WAN BIDIR from 100% to 90%

But performance under full load didn't actually improve :(
So this change was basically useless.
Let's see what Linux 5.10 brings to the table in the future.
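
For a feel of what the coalescing settings above ask for, here is a back-of-the-envelope sketch. It assumes the NIC raises one interrupt after `rx-frames` packets or after `rx-usecs` microseconds, whichever comes first, and it assumes full-size 1500-byte frames; it ignores NAPI polling and whatever the stmmac driver really does internally, so treat the result as a rough estimate only. `coalesced_irq_rate` is a hypothetical helper name.

```shell
#!/bin/sh
# coalesced_irq_rate MBIT FRAME_BYTES FRAMES USECS
# Rough estimate of interrupts per second with coalescing enabled.
# Assumption: one interrupt per FRAMES packets or per USECS timeout,
# whichever fires first, and never more than one interrupt per packet.
coalesced_irq_rate() {
    pps=$(( $1 * 1000000 / ($2 * 8) ))   # packets per second at MBIT Mbit/s
    by_frames=$(( pps / $3 ))            # rate if the frame counter fires first
    by_timer=$(( 1000000 / $4 ))         # rate if the timer fires first
    rate=$by_frames
    [ "$by_timer" -gt "$rate" ] && rate=$by_timer
    [ "$rate" -gt "$pps" ] && rate=$pps
    echo "$rate"
}

# RX settings from above at the measured 941 Mbit/s
coalesced_irq_rate 941 1500 25 1000
```

Under these assumptions the RX settings work out to roughly 3,100 interrupts per second at line rate.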

@BotoX

Have you tried disabling these?

CONFIG_SLUB_DEBUG
CONFIG_PROC_PAGE_MONITOR

Found on

Both options are disabled:
[ ] Enable /proc slab debug info
[ ] Enable /proc page monitoring

in Kconfig of kernel

CONFIG_SLUB_DEBUG=y
# CONFIG_SLUB_DEBUG_ON is not set
CONFIG_PROC_PAGE_MONITOR=y

I don't think those are the issue here, judging by the perf top output.
I just checked my diffconfig too, nothing special:

CONFIG_TARGET_rockchip=y
CONFIG_TARGET_rockchip_armv8=y
CONFIG_TARGET_rockchip_armv8_DEVICE_friendlyarm_nanopi-r4s=y
CONFIG_DEVEL=y
CONFIG_BUSYBOX_CUSTOM=y
CONFIG_DROPBEAR_ECC=y
CONFIG_DROPBEAR_ECC_FULL=y
CONFIG_GETDNS_ENABLE_STUB_ONLY=y
CONFIG_KERNEL_ARM_PMU=y
CONFIG_KERNEL_PERF_EVENTS=y
CONFIG_KERNEL_PROFILING=y
CONFIG_LIBCURL_COOKIES=y
CONFIG_LIBCURL_FILE=y
CONFIG_LIBCURL_FTP=y
CONFIG_LIBCURL_HTTP=y
CONFIG_LIBCURL_NO_SMB="!"
CONFIG_LIBCURL_PROXY=y
CONFIG_LIBCURL_WOLFSSL=y
CONFIG_NGINX_HEADERS_MORE=y
CONFIG_NGINX_HTTP_ACCESS=y
CONFIG_NGINX_HTTP_AUTH_BASIC=y
CONFIG_NGINX_HTTP_AUTOINDEX=y
CONFIG_NGINX_HTTP_BROWSER=y
CONFIG_NGINX_HTTP_CACHE=y
CONFIG_NGINX_HTTP_CHARSET=y
CONFIG_NGINX_HTTP_EMPTY_GIF=y
CONFIG_NGINX_HTTP_FASTCGI=y
CONFIG_NGINX_HTTP_GEO=y
CONFIG_NGINX_HTTP_GZIP=y
CONFIG_NGINX_HTTP_LIMIT_CONN=y
CONFIG_NGINX_HTTP_LIMIT_REQ=y
CONFIG_NGINX_HTTP_MAP=y
CONFIG_NGINX_HTTP_MEMCACHED=y
CONFIG_NGINX_HTTP_PROXY=y
CONFIG_NGINX_HTTP_REFERER=y
CONFIG_NGINX_HTTP_REWRITE=y
CONFIG_NGINX_HTTP_SCGI=y
CONFIG_NGINX_HTTP_SPLIT_CLIENTS=y
CONFIG_NGINX_HTTP_SSI=y
CONFIG_NGINX_HTTP_UPSTREAM_HASH=y
CONFIG_NGINX_HTTP_UPSTREAM_IP_HASH=y
CONFIG_NGINX_HTTP_UPSTREAM_KEEPALIVE=y
CONFIG_NGINX_HTTP_UPSTREAM_LEAST_CONN=y
CONFIG_NGINX_HTTP_USERID=y
CONFIG_NGINX_HTTP_UWSGI=y
CONFIG_NGINX_HTTP_V2=y
CONFIG_NGINX_PCRE=y
CONFIG_NGINX_UBUS=y
CONFIG_OPENSSL_ENGINE=y
CONFIG_OPENSSL_WITH_ASM=y
CONFIG_OPENSSL_WITH_CHACHA_POLY1305=y
CONFIG_OPENSSL_WITH_CMS=y
CONFIG_OPENSSL_WITH_DEPRECATED=y
CONFIG_OPENSSL_WITH_ERROR_MESSAGES=y
CONFIG_OPENSSL_WITH_PSK=y
CONFIG_OPENSSL_WITH_SRP=y
CONFIG_OPENSSL_WITH_TLS13=y
CONFIG_PACKAGE_6in4=y
CONFIG_PACKAGE_bcp38=y
CONFIG_PACKAGE_cgi-io=y
CONFIG_PACKAGE_collectd=y
CONFIG_PACKAGE_collectd-mod-conntrack=y
CONFIG_PACKAGE_collectd-mod-cpu=y
CONFIG_PACKAGE_collectd-mod-dns=y
CONFIG_PACKAGE_collectd-mod-exec=y
CONFIG_PACKAGE_collectd-mod-interface=y
CONFIG_PACKAGE_collectd-mod-irq=y
CONFIG_PACKAGE_collectd-mod-iwinfo=y
CONFIG_PACKAGE_collectd-mod-load=y
CONFIG_PACKAGE_collectd-mod-memory=y
CONFIG_PACKAGE_collectd-mod-network=y
CONFIG_PACKAGE_collectd-mod-ping=y
CONFIG_PACKAGE_collectd-mod-rrdtool=y
CONFIG_PACKAGE_collectd-mod-sqm=y
CONFIG_PACKAGE_collectd-mod-tcpconns=y
CONFIG_PACKAGE_collectd-mod-thermal=y
CONFIG_PACKAGE_curl=y
CONFIG_PACKAGE_ddns-scripts=y
CONFIG_PACKAGE_ddns-scripts-cloudflare=y
CONFIG_PACKAGE_ddns-scripts-services=y
CONFIG_PACKAGE_getdns=y
CONFIG_PACKAGE_htop=y
CONFIG_PACKAGE_ip-tiny=y
CONFIG_PACKAGE_ipset=y
CONFIG_PACKAGE_iptables-mod-conntrack-extra=y
CONFIG_PACKAGE_iptables-mod-ipopt=y
CONFIG_PACKAGE_kmod-ifb=y
CONFIG_PACKAGE_kmod-ipt-conntrack-extra=y
CONFIG_PACKAGE_kmod-ipt-ipopt=y
CONFIG_PACKAGE_kmod-ipt-ipset=y
CONFIG_PACKAGE_kmod-ipt-raw=y
CONFIG_PACKAGE_kmod-iptunnel=y
CONFIG_PACKAGE_kmod-iptunnel4=y
CONFIG_PACKAGE_kmod-ledtrig-default-on=y
CONFIG_PACKAGE_kmod-ledtrig-heartbeat=y
CONFIG_PACKAGE_kmod-ledtrig-netdev=y
CONFIG_PACKAGE_kmod-ledtrig-timer=y
CONFIG_PACKAGE_kmod-macvlan=y
CONFIG_PACKAGE_kmod-nfnetlink=y
CONFIG_PACKAGE_kmod-sched-cake=y
CONFIG_PACKAGE_kmod-sched-core=y
CONFIG_PACKAGE_kmod-sit=y
CONFIG_PACKAGE_kmod-tun=y
CONFIG_PACKAGE_kmod-udptunnel4=y
CONFIG_PACKAGE_kmod-udptunnel6=y
CONFIG_PACKAGE_kmod-wireguard=y
CONFIG_PACKAGE_libbfd=y
CONFIG_PACKAGE_libbz2=y
CONFIG_PACKAGE_libcap=y
CONFIG_PACKAGE_libctf=y
CONFIG_PACKAGE_libcurl=y
CONFIG_PACKAGE_libdw=y
CONFIG_PACKAGE_libelf=y
CONFIG_PACKAGE_libipset=y
CONFIG_PACKAGE_libiwinfo=y
CONFIG_PACKAGE_libiwinfo-lua=y
CONFIG_PACKAGE_libltdl=y
CONFIG_PACKAGE_liblua=y
CONFIG_PACKAGE_liblucihttp=y
CONFIG_PACKAGE_liblucihttp-lua=y
CONFIG_PACKAGE_libminiupnpc=y
CONFIG_PACKAGE_libmnl=y
CONFIG_PACKAGE_libnatpmp=y
CONFIG_PACKAGE_libncurses=y
CONFIG_PACKAGE_libopcodes=y
CONFIG_PACKAGE_libopenssl=y
CONFIG_PACKAGE_libopenssl-conf=y
CONFIG_PACKAGE_liboping=y
CONFIG_PACKAGE_libpcap=y
CONFIG_PACKAGE_libpcre=y
CONFIG_PACKAGE_librrd1=y
CONFIG_PACKAGE_libstdcpp=y
CONFIG_PACKAGE_libubus-lua=y
CONFIG_PACKAGE_libuci-lua=y
CONFIG_PACKAGE_libyaml=y
CONFIG_PACKAGE_lua=y
CONFIG_PACKAGE_luci-app-bcp38=y
CONFIG_PACKAGE_luci-app-ddns=y
CONFIG_PACKAGE_luci-app-firewall=y
CONFIG_PACKAGE_luci-app-mwan3=y
CONFIG_PACKAGE_luci-app-opkg=y
CONFIG_PACKAGE_luci-app-sqm=y
CONFIG_PACKAGE_luci-app-statistics=y
CONFIG_PACKAGE_luci-app-wireguard=y
CONFIG_PACKAGE_luci-base=y
CONFIG_PACKAGE_luci-compat=y
CONFIG_PACKAGE_luci-lib-base=y
CONFIG_PACKAGE_luci-lib-ip=y
CONFIG_PACKAGE_luci-lib-ipkg=y
CONFIG_PACKAGE_luci-lib-jsonc=y
CONFIG_PACKAGE_luci-lib-nixio=y
CONFIG_PACKAGE_luci-mod-admin-full=y
CONFIG_PACKAGE_luci-mod-network=y
CONFIG_PACKAGE_luci-mod-status=y
CONFIG_PACKAGE_luci-mod-system=y
CONFIG_PACKAGE_luci-nginx=y
CONFIG_PACKAGE_luci-proto-ipv6=y
CONFIG_PACKAGE_luci-proto-ppp=y
CONFIG_PACKAGE_luci-proto-wireguard=y
CONFIG_PACKAGE_luci-theme-bootstrap=y
CONFIG_PACKAGE_mwan3=y
CONFIG_PACKAGE_nano=y
CONFIG_PACKAGE_nginx-mod-luci=y
CONFIG_PACKAGE_nginx-ssl=y
CONFIG_PACKAGE_nginx-ssl-util=y
CONFIG_PACKAGE_nginx-util=y
CONFIG_PACKAGE_objdump=y
CONFIG_PACKAGE_openssl-util=y
CONFIG_PACKAGE_perf=y
CONFIG_PACKAGE_rpcd=y
CONFIG_PACKAGE_rpcd-mod-file=y
CONFIG_PACKAGE_rpcd-mod-iwinfo=y
CONFIG_PACKAGE_rpcd-mod-luci=y
CONFIG_PACKAGE_rpcd-mod-rrdns=y
CONFIG_PACKAGE_rrdtool1=y
CONFIG_PACKAGE_sqm-scripts=y
CONFIG_PACKAGE_stubby=y
CONFIG_PACKAGE_tc=y
CONFIG_PACKAGE_terminfo=y
CONFIG_PACKAGE_uwsgi=y
CONFIG_PACKAGE_uwsgi-cgi-plugin=y
CONFIG_PACKAGE_uwsgi-luci-support=y
CONFIG_PACKAGE_uwsgi-syslog-plugin=y
CONFIG_PACKAGE_wireguard=y
CONFIG_PACKAGE_wireguard-tools=y
CONFIG_PACKAGE_zerotier=y
CONFIG_PACKAGE_zlib=y
# CONFIG_TARGET_IMAGES_GZIP is not set
CONFIG_TARGET_OPTIMIZATION="-O3 -pipe -march=armv8-a+crypto+crc -mcpu=cortex-a73.cortex-a53+crypto+crc -mtune=cortex-a73.cortex-a53"
CONFIG_TARGET_OPTIONS=y
# CONFIG_TARGET_ROOTFS_SQUASHFS is not set
CONFIG_TARGET_ROOTFS_TARGZ=y
# CONFIG_NGINX_NAXSI is not set

Thanks for sharing. Perhaps a bit off topic, but I really need to apply the same overclock to my Pine64 ROCKPro64. Right now it's stuck at about 1400 MHz, and SQM maxes out at about 600 Mbit/s using an Intel I350-T4 PCIe card.

Would you mind sharing very quickly how you generated the diff file, and how to apply it?

Find that file and change the line saying:
#include "rk3399-opp.dtsi"
to
#include "rk3399-op1-opp.dtsi"
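
A sketch of doing that edit from the shell rather than by hand. `use_op1_opp` is a hypothetical helper, and the file to pass is whichever board .dtsi includes "rk3399-opp.dtsi"; for the NanoPi R4S that was arch/arm64/boot/dts/rockchip/rk3399-nanopi4.dtsi, while the right file for other boards is not named in this thread.

```shell
#!/bin/sh
# use_op1_opp FILE: switch an RK3399 board .dtsi from the default OPP table
# to the OP1 one by rewriting its #include line in place.
# Hypothetical helper; all other lines are left untouched.
use_op1_opp() {
    tmp=$(mktemp) || return 1
    sed 's/^#include "rk3399-opp\.dtsi"/#include "rk3399-op1-opp.dtsi"/' "$1" > "$tmp" &&
        cat "$tmp" > "$1"
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

To keep the change across kernel rebuilds, the diff posted earlier shows the same edit stored as a patch file under target/linux/rockchip/patches-5.4/.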

However, uhh, how high are your temps?
Default is 1.8GHz for me, and OP1 increases that to 2.0GHz.
Make sure you are using the two big cores, cpu4 and cpu5 (cores 5 and 6 when counting from 1), for the Ethernet IRQs instead of the slower cores cpu0-cpu3.
Verify with htop that this is the case under load (htop can also show the core frequency).
Try the performance governor as well; there is basically no power usage penalty.

And about the diff file: you don't have to make one if you edit the file in the build dir.
But of course a patch is cleaner.

Also, I won't be able to do any tests for a while as I'm now using the NanoPi as my main router.
Upgrading OpenWrt on ext4/F2FS is a pain in the arse, too. Why can't I just opkg upgrade from my build folder and have it install the new kernel?

Thank you @BotoX ! I set it up like this:

  • Installed netperf on a local server
  • Changed IP range for the device to avoid conflict w/ my current network
  • Connected the device using the WAN port
  • And ran speedtest-netperf.sh -H [IP of netperf server]

What I find confusing is the CPU config: I seem to have 4 fast cores and 2 slow cores, where you had 2 fast cores and 4 slow. Of course they're different devices, but I would assume the CPU config would be similar.

Excerpt from netperf-speedtest:

 CPU Load: [in % busy (avg +/- std dev) @ avg frequency, 57 samples]
     cpu0: 100.0 +/-  0.0  @ 1416 MHz
     cpu1:  14.6 +/-  4.3  @ 1416 MHz
     cpu2:  13.5 +/-  5.0  @ 1416 MHz
     cpu3:  13.7 +/-  4.0  @ 1416 MHz
     cpu4:  38.8 +/- 12.8  @ 1122 MHz
     cpu5:  40.8 +/-  5.9  @ 1152 MHz

Complete speedtest-netperf output:

root@OpenWrt:~# speedtest-netperf.sh -H 192.168.1.178
2021-01-23 21:45:17 Starting speedtest for 60 seconds per transfer session.
Measure speed to 192.168.1.178 (IPv4) while pinging gstatic.com.
Download and upload sessions are sequential, each with 5 simultaneous streams.
............................................................
 Download: 577.51 Mbps
  Latency: [in msec, 61 pings, 0.00% packet loss]
      Min:   8.286
    10pct:   8.852
   Median:   9.180
      Avg:   9.186
    90pct:   9.497
      Max:   9.790
 CPU Load: [in % busy (avg +/- std dev) @ avg frequency, 57 samples]
     cpu0: 100.0 +/-  0.0  @ 1416 MHz
     cpu1:  14.6 +/-  4.3  @ 1416 MHz
     cpu2:  13.5 +/-  5.0  @ 1416 MHz
     cpu3:  13.7 +/-  4.0  @ 1416 MHz
     cpu4:  38.8 +/- 12.8  @ 1122 MHz
     cpu5:  40.8 +/-  5.9  @ 1152 MHz
 Overhead: [in % used of total CPU available]
  netperf:  21.0
.............................................................
   Upload: 848.47 Mbps
  Latency: [in msec, 61 pings, 0.00% packet loss]
      Min:   8.864
    10pct:   9.883
   Median:  10.498
      Avg:  10.487
    90pct:  11.028
      Max:  12.059
 CPU Load: [in % busy (avg +/- std dev) @ avg frequency, 57 samples]
     cpu0: 100.0 +/-  0.0  @ 1416 MHz
     cpu1:   2.8 +/-  1.4  @ 1416 MHz
     cpu2:   0.8 +/-  1.2  @ 1416 MHz
     cpu3:   0.2 +/-  0.4  @ 1416 MHz
     cpu4:  10.1 +/-  4.0  @ 1098 MHz
     cpu5:  10.2 +/-  3.2  @ 1193 MHz
 Overhead: [in % used of total CPU available]
  netperf:   2.9

Both devices have the same SoC, the RK3399.
If you didn't set your CPU governor to performance, your two fast cores can obviously clock lower than their maximum.
cpu4 and cpu5 are the fast cores on your board too; however, you are not using them, so they stay in lower frequency states.

Set the governor to performance.
Find out how to move the network IRQs to the big cores.
Read this blog post: https://www.stupid-projects.com/nanopi-r4s-benchmarks-with-networking-optimizations/
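
A rough sketch of the "move the network IRQs" step, for anyone who doesn't want to look up IRQ numbers by hand. Both function names are hypothetical, and the code assumes the interface name is the last field of its /proc/interrupts line. Multi-queue NICs often register several IRQs under names like eth1-TxRx-0, so the name to pass depends on your hardware.

```shell
#!/bin/sh
# irq_for_dev DEV [FILE]: print the IRQ number(s) whose /proc/interrupts line
# ends in DEV. FILE defaults to /proc/interrupts.
irq_for_dev() {
    awk -v dev="$1" '$NF == dev { sub(/:$/, "", $1); print $1 }' \
        "${2:-/proc/interrupts}"
}

# pin_dev_to_cpu DEV CPU: route all of DEV's interrupts to cpuCPU only.
# Needs root; the mask is a hex bitmask with bit CPU set.
pin_dev_to_cpu() {
    mask=$(printf '%x' "$((1 << $2))")
    for irq in $(irq_for_dev "$1"); do
        echo "$mask" > "/proc/irq/$irq/smp_affinity"
    done
}
```

On the NanoPi R4S setup from earlier in the thread this would be `pin_dev_to_cpu eth0 4` and `pin_dev_to_cpu eth1 5`, together with the `echo performance | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor` command shown before.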

Thank you for the guide. The performance seems good. I can get gigabit network speed. BTW, have you tried AES hardware acceleration? I tried but it doesn't work.

Anyone tried sysupgrade? I tried but it did nothing. System was rebooted but new firmware was not flashed.

Yes, it doesn't work in either kernelspace or userspace, as Rockchip has never opened the source of its cryptography offloading driver.
However, you can try with Go, as it has its own way of implementing hardware acceleration.

Sysupgrade is functional.

Thank you but I think it’s beyond my ability.

So weird. I flashed ext4 instead of squashfs. I don’t know if it’s related.