NanoPi R4S-RK3399 is a great new OpenWrt device

Yep, I was also surprised to see sub-gigabit throughput with CAKE SQM on the R4S. I was not running concurrent tests; it was download only (iperf3 -R from the client).

Could CAKE be running on the LITTLE cores instead of the big ones?

Below is the htop output I captured during the download test. You can see that core 4 is at 100% load; I'm not sure, but I believe this is a LITTLE core.

EDIT: cores 4 and 5 actually seem to be the big cores, so the above theory does not seem to be correct.

------------------------------------------------------
Download - SQM (cake / piece_of_cake.qos)
------------------------------------------------------

    0[                                                          0.0%]   3[|                                                          0.7%]
    1[||                                                        1.3%]   4[|||||||||||||||||||||||||||||||||||||||||||||||||||||||||100.0%]
    2[                                                          0.0%]   5[|||||||||||||||||||||                                     32.9%]
  Mem[||||                                               50.4M/3.78G] Tasks: 19, 0 thr, 87 kthr; 2 running
  Swp[                                                         0K/0K] Load average: 0.08 0.05 0.01
                                                                      Uptime: 01:24:11

I've been wrong before - many times - but I recall 4 and 5 were the big cores. The big cores run at a faster clock, so if you check the box in htop setup to add CPU frequency that should settle the issue.
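
As an alternative to the htop setting, the maximum frequency of each core can be read from sysfs. This is only a sketch: it assumes the usual cpufreq sysfs layout (`cpuinfo_max_freq` under each `cpuN/cpufreq` directory), and the function name and its optional root-directory argument are made up for illustration.

```shell
#!/bin/sh
# Print each CPU's maximum frequency so big and LITTLE cores can be told apart.
# The optional argument (a sysfs CPU root) is a hypothetical convenience for testing.
list_cpu_max_freq() {
    root="${1:-/sys/devices/system/cpu}"
    for dir in "$root"/cpu[0-9]*; do
        f="$dir/cpufreq/cpuinfo_max_freq"
        [ -r "$f" ] || continue          # skip CPUs without cpufreq data
        khz=$(cat "$f")
        echo "${dir##*/}: $((khz / 1000)) MHz"
    done
}

list_cpu_max_freq
```

The cores that report the higher maximum frequency should be the big ones.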

Yep, you are right. While you responded I was just testing this. Confirmed that cores 4 and 5 are the big ones.

So core 4 (big) reaching 100% load could in fact be what was limiting SQM performance. I had software flow offloading enabled; that was the only thing I had changed. I will repeat the test without it to see if it makes any difference.

I've repeated the tests, and something weird is going on.

With R4S+22.03.5 I was able to replicate my previous results consistently. Just in case, I also disabled Software Flow Offloading; the results were the same (download ~730Mbps with cake/piece_of_cake).

But then I decided to try R4S+23.05.0. This is when things got weird. SQM seems to land randomly on either LITTLE or big cores.

With 23.05.0, when a big core was used, I got the same results as with 22.03.5 (download ~730Mbps with cake/piece_of_cake).

However, when SQM was assigned to a LITTLE core, the speed dropped to only ~540Mbps.

I really don't understand what is going on...

R4S+OpenWrt 23.05.0+CAKE:

Test 1 (SQM selected a big core, download with SQM/CAKE ~730Mbps)

Test 2 (SQM selected a LITTLE core, download with SQM/CAKE ~540Mbps)

I have found the best SQM results when the LITTLE cores handle the queues and the big cores handle the eth IRQs.

# Set CPU0-3 to handle the eth0 queues
echo f > /sys/class/net/eth0/queues/rx-0/rps_cpus
# Set CPU0-3 to handle the eth1 queues
echo f > /sys/class/net/eth1/queues/rx-0/rps_cpus
# Set CPU4 to handle eth0 affinity
echo 10 > `dirname /proc/irq/*/eth0`/smp_affinity
# Set CPU5 to handle eth1 affinity
echo 20 > `dirname /proc/irq/*/eth1`/smp_affinity
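
The values written to rps_cpus and smp_affinity in this thread are hexadecimal CPU bitmasks, where bit N selects CPU N. A small helper can make them easier to read; the function name `cpu_mask` is made up for illustration.

```shell
#!/bin/sh
# Build the hexadecimal CPU bitmask expected by smp_affinity and rps_cpus.
# Bit N of the mask selects CPU N.
cpu_mask() {
    mask=0
    for cpu in "$@"; do
        mask=$((mask | (1 << cpu)))
    done
    printf '%x\n' "$mask"
}

cpu_mask 0 1 2 3   # prints f  (CPU0-3, the LITTLE cores on the R4S)
cpu_mask 4         # prints 10 (CPU4)
cpu_mask 5         # prints 20 (CPU5)
```

By the same rule, a mask of 30 selects CPU4 and CPU5 together.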

For me the affinity was already set to big cores for the eth devices by default.

I've had the best results with

	set_interface_core 4 "eth0"
	set_interface_core 8 "eth1"
	#spread the queues to cpu 4 & 5 
	find /sys/class/net/eth*/queues/[rt]x-[01]/[rx]ps_cpus -exec sh -c '[ -w {} ] && echo 30 > {} 2>/dev/null' \;

or

	set_interface_core 4 "eth0"
	set_interface_core 8 "eth1"
	echo -n 10 > /sys/class/net/eth0/queues/rx-0/rps_cpus
	echo -n 20 > /sys/class/net/eth1/queues/rx-0/rps_cpus
	;;

I'm using PPPoE.

With 22.03.5 this made things worse for SQM. From ~730Mbps with CAKE/piece_of_cake it went down to ~460Mbps:

~ $ iperf3 -c 192.168.1.151 -R
Connecting to host 192.168.1.151, port 5201
Reverse mode, remote host 192.168.1.151 is sending
[  5] local 192.168.2.177 port 59401 connected to 192.168.1.151 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  52.0 MBytes   436 Mbits/sec
[  5]   1.00-2.00   sec  55.2 MBytes   464 Mbits/sec
[  5]   2.00-3.00   sec  56.4 MBytes   473 Mbits/sec
[  5]   3.00-4.00   sec  54.2 MBytes   455 Mbits/sec
[  5]   4.00-5.00   sec  55.8 MBytes   467 Mbits/sec
[  5]   5.00-6.00   sec  55.1 MBytes   462 Mbits/sec
[  5]   6.00-7.00   sec  56.4 MBytes   473 Mbits/sec
[  5]   7.00-8.00   sec  52.5 MBytes   441 Mbits/sec
[  5]   8.00-9.00   sec  56.4 MBytes   472 Mbits/sec
[  5]   9.00-10.00  sec  53.2 MBytes   447 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   550 MBytes   461 Mbits/sec    0             sender
[  5]   0.00-10.00  sec   547 MBytes   459 Mbits/sec                  receiver

Packet steering? irqbalance? I'm using packet steering and irqbalance with the defaults except time interval reduced to 7s on a very recent (5.15.137 kernel) snapshot.

My half-gig ISP service isn't capable of stressing my R4S, even less so on a download test maxed out at 333 Mbps over longish-range WiFi; however, FWIW, I maybe see a little more load distribution across the cores than in your tests.

Pre test CPU utilization was ~3%. With some really dumb estimates, if CAKE/layer_cake is on core 2, then 1.4/0.816 x (1-0.03)/(0.574) x 333 is around 965 Mbps, which I find hard to believe on a 1.4 GHz A53 core LOL...and if it is on the 1.8 GHz A72 core 4 then 1.8/0.6 x (1-0.03)/(0.549) x 333 >> full Gig.
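
The estimate above can be reproduced with a few lines of shell. This sketch reads the poster's terms as (max clock / observed clock) x (available load / observed load) x measured throughput; that interpretation, the function name, and the argument order are assumptions, and the linear scaling is as rough as the poster says it is.

```shell
#!/bin/sh
# Scale a measured throughput to an estimated single-core ceiling.
# Arguments: max clock (GHz), observed clock (GHz), baseline load,
#            observed load, measured throughput (Mbps).
estimate_ceiling() {
    awk -v fmax="$1" -v fobs="$2" -v base="$3" -v load="$4" -v mbps="$5" \
        'BEGIN { printf "%.0f\n", (fmax / fobs) * ((1 - base) / load) * mbps }'
}

estimate_ceiling 1.4 0.816 0.03 0.574 333   # CAKE on core 2: prints 965
estimate_ceiling 1.8 0.6   0.03 0.549 333   # CAKE on core 4: prints 1765
```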

IDK dsouza, I'm out of ideas and I doubt I helped you much, but that's what I'm seeing.

Did you do it for both eth0 and eth1? One eth device should be on CPU4 and the other on CPU5.
How was it set by default? You can check with

cat `dirname /proc/irq/*/eth0`/smp_affinity
cat `dirname /proc/irq/*/eth1`/smp_affinity

Packet steering? irqbalance? I'm using packet steering and irqbalance with the defaults except time interval reduced to 7s on a very recent (5.15.137 kernel) snapshot.

Yep, I've tried packet steering (enabled and disabled), and I've also installed irqbalance and it did not make any difference (with everything default).

Note that the setup I'm using is a test environment. Luckily my ISP does not have bufferbloat issues (fiber 700/350 D/U), so my R4S is running fine without SQM. My initial intention was to measure R4S SQM performance against the R5S so I could advise some folks who need SQM (usually cable modem/DOCSIS) and were deciding between the R4S and R5S.

IDK dsouza, I'm out of ideas and I doubt I helped you much, but that's what I'm seeing.

I really appreciate your insights, thank you! For now I'm done with these tests. In case anyone does more tests with the R4S+SQM in the future, the data I've collected can be used as a reference (hopefully to confirm that the R4S can indeed reach 1Gbps with CAKE/piece_of_cake).

Thanks!

Hmm, I'm on 22.03.5 (r20134-5f15225c1e) and also did some testing, as I hadn't used SQM on download with this version before. With piece_of_cake the LITTLE cores get saturated already at ~60-70MB/s. The eth0 and eth1 IRQs are set to CPU 4 and 5, and CPU usage for them is OK.
With fq_codel and simple, simplest, or simplest_tbf it works fine.

The previous OpenWrt version I had was 21.02 r16307-4b212b1306, and it handled 1Gbit with piece_of_cake just fine on those slower cores.

Thank you. Your observations seem to confirm the results I've measured in my tests.

BTW, I've noticed that with 23.05.0, enabling "Software flow offloading" does improve SQM CAKE to up to ~900Mbps (despite the warning in LuCI saying it is not fully compatible with SQM). However, with 23.05.0 there is still the issue that SQM may be assigned to LITTLE cores, causing the download speed to drop by half.

I wonder whether there was some patch in 22.03 that set the SQM CPU affinity to the big cores, and whether that patch was removed from 23.05, causing this regression. Anyway, just a theory.


# CPU performance
find /sys/devices/system/cpu/cpufreq/ -name scaling_governor | while read -r GOVERNOR ; do echo performance > "$GOVERNOR" ; done

Have you tried setting the cpufreq to performance?
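
To confirm which governor is actually active afterwards, each cpufreq policy can be read back. This is a minimal sketch assuming the standard `policy*` directories under /sys/devices/system/cpu/cpufreq; the function name and its optional root-directory argument are made up for illustration.

```shell
#!/bin/sh
# Show the active governor for every cpufreq policy.
# The optional argument (a cpufreq root) is a hypothetical convenience for testing.
show_governors() {
    root="${1:-/sys/devices/system/cpu/cpufreq}"
    for policy in "$root"/policy*; do
        [ -r "$policy/scaling_governor" ] || continue   # skip if not present
        echo "${policy##*/}: $(cat "$policy/scaling_governor")"
    done
}

show_governors
```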

SFO breaks CAKE; for full gigabit you need to overclock the cores on the NanoPi R4S.

TL;DR: I've seen absolutely no issues with either 22.03 or 23.05, with SQM on 500/20 cable internet.

I am running two custom builds based on the anaelorlinski repo (22.03 and 23.05). My changes involve:
- overclocking big/LITTLE to 2.0GHz/1.6GHz,
- setting the governor minimum to 816 MHz for all cores,
- pinning eth0 IRQ and rx queues to big cores,
- pinning eth1 interrupts and rx queues to LITTLE cores via hotplug.d script,
- all my preferred packages.

My production (family) router is running 22.03, while my dev router is running 23.05.

I have asymmetric cable (500/20) with SQM cake/piece_of_cake enabled. My CPU loads stay under 100% and rarely jump to max freq. My iperf3 tests are ~940 up, ~940 down, with per-core CPU loads below 100% and frequency below max. I haven't performed simultaneous tx/rx.

No one in my house has had issues with 22.03--wife, five kids schooling at home, plus my remote work office. I put 22.03 into production a bit over a month ago, replacing 21.02.

With my personal devices I connect mainly to the 23.05 Dev router via vlans+SSIDs and smart switches; I've had no issues at all.

Both dev and prod are configured more or less identically, including an LXC for pihole and an LXC for Omada controller.

Have you got any info on the 3 custom tweaks that you have made? I'm interested in trying these :+1:

For overclocking, I used info previously posted on this forum topic. Search for 'overclock' or '2.0GHz', etc. The overclock must be compiled in. A dtsi file can be edited to point to a different operating point dtsi (but that only gets you to 1.5GHz on the little cores), or a patch on 'arch/arm64/boot/dts/rockchip/rk3399-opp.dtsi' can be used. I added a patch to my local clone of anaelorlinski's repo. I placed the patch in

openwrt-22.03/patches/target/linux/rockchip/patches-5.10/
openwrt-23.05/patches/target/linux/rockchip/patches-5.15/

I added the governor settings in rc.local

find /sys/devices/system/cpu/cpufreq/ -name rate_limit_us | while read RATEEVAL ; do echo 1000 > $RATEEVAL ; done # was 10000
find /sys/devices/system/cpu/cpufreq/ -name scaling_min_freq | while read MINFREQ ; do echo 816000 > $MINFREQ ; done # was 408000

For the CPU affinity I modified /etc/hotplug.d/net/40-net-smp-affinity

friendlyarm,nanopi-r4s)
        set_interface_core 10 "eth0"
        echo 20 > /sys/class/net/eth0/queues/rx-0/rps_cpus
        set_interface_core 01 "eth1"
        echo 02 > /sys/class/net/eth1/queues/rx-0/rps_cpus
        ;;
esac

Cheers for that. I may struggle to build my own image, but I will certainly try the other 2 tweaks.

Hello guys! I am begging for your advice :pray:. I am not highly experienced with OpenWrt.

All my attempts to expand the rootfs have failed. I have no idea what to try next. Can someone help?

I ran a sysupgrade on top of FriendlyWrt using the squashfs 23.05.0 image from https://downloads.openwrt.org/releases/23.05.0/targets/rockchip/armv8/

Initially I had this:

root@FriendlyWrt:~# lsblk -o +FSTYPE,UUID; losetup -l
NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
                                               FSTYPE   UUID
loop0         7:0    0 100.8M  0 loop /overlay f2fs     460d5ffe-7d86-11ee-9410-dd1eda0fa7b4
mmcblk1     179:0    0    15G  0 disk                   
├─mmcblk1p1 179:1    0    16M  0 part          ext4     84173db5-fa99-e35a-95c6-28613cc79ea9
└─mmcblk1p2 179:2    0   104M  0 part /rom     squashfs 
NAME       SIZELIMIT  OFFSET AUTOCLEAR RO BACK-FILE  DIO LOG-SEC
/dev/loop0         0 3342336         1  0 /mmcblk1p2   0     512

I followed @anaelorlinski's instructions here: https://github.com/anaelorlinski/OpenWrt-NanoPi-R2S-R4S-Builds/blob/main/docs/resize-f2fs.md.

I resized the mmcblk1p2 partition, rebooted, and went on to the next step.

My first concern came during the fsck.f2fs check, when I saw 100 MB total instead of the full free space of the SD card.

root@FriendlyWrt:~# echo ${OFFS} ${LOOP} ${ROOT}
3342336 /dev/loop1 /dev/mmcblk1p2

root@FriendlyWrt:~# losetup -o ${OFFS} ${LOOP} ${ROOT}
root@FriendlyWrt:~# fsck.f2fs -f ${LOOP}
.....
Info: total FS sectors = 206464 (100 MB)
Info: CKPT version = 2d07486d
Info: checkpoint state = 44 :  crc compacted_summary sudden-power-off

[FSCK] Max image size: 30 MB, Free space: 70 MB
.....

Anyway, I ran the resize:

root@FriendlyWrt:~# resize.f2fs ${LOOP}

After the reboot the R4S did not start.

I tried all over again, with the same result.

Then I decided to try placing the overlay on a separate partition. I created a new partition with fdisk and formatted it with an ext4 file system.

root@FriendlyWrt:~# fdisk -l /dev/mmcblk1
Disk /dev/mmcblk1: 15 GiB, 16106127360 bytes, 31457280 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x5452574f

Device         Boot  Start      End  Sectors  Size Id Type
/dev/mmcblk1p1 *     65536    98303    32768   16M 83 Linux
/dev/mmcblk1p2      131072   344063   212992  104M 83 Linux
/dev/mmcblk1p3      344064 31457279 31113216 14.8G 83 Linux

root@FriendlyWrt:~# blkid
/dev/loop0: LABEL="rootfs_data" UUID="d012ff86-614c-11e2-a6cc-67864aa44e8c" BLOCK_SIZE="4096" TYPE="f2fs"
/dev/mmcblk1p3: UUID="9f64749b-f521-4e42-a797-81f4da618b00" BLOCK_SIZE="4096" TYPE="ext4" PARTUUID="5452574f-03"
/dev/mmcblk1p1: LABEL="kernel" UUID="84173db5-fa99-e35a-95c6-28613cc79ea9" BLOCK_SIZE="4096" TYPE="ext4" PARTUUID="5452574f-01"
/dev/mmcblk1p2: BLOCK_SIZE="262144" TYPE="squashfs" PARTUUID="5452574f-02"

I mounted it and copied the overlay content:

root@FriendlyWrt:~# mount /dev/mmcblk1p3 /mnt
root@FriendlyWrt:~# cp -a -f /overlay/. /mnt
root@FriendlyWrt:~#  ls -la /mnt
drwxr-xr-x    5 root     root          4096 Oct 10 05:45 .
drwxr-xr-x    1 root     root          3488 Nov  8 01:19 ..
lrwxrwxrwx    1 root     root             1 Nov  8 13:28 .fs_state -> 2
drwx------    2 root     root         16384 Nov  8 01:54 lost+found
drwxr-xr-x   10 root     root          4096 Nov  8 01:19 upper
drwxr-xr-x    3 root     root          4096 Nov  8 05:24 work

I added a mount section for the overlay in /etc/config/fstab and also changed delay_root to 15:

config global
	option anon_swap '0'
	option anon_mount '0'
	option auto_swap '1'
	option auto_mount '1'
	option delay_root '15'
	option check_fs '0'

config 'mount'
        option target '/overlay'
        option uuid '9f64749b-f521-4e42-a797-81f4da618b00'
        option enabled '1'

I rebooted... and it had no effect:

root@FriendlyWrt:~# df -hT
Filesystem           Type            Size      Used Available Use% Mounted on
/dev/root            squashfs        3.3M      3.3M         0 100% /rom
tmpfs                tmpfs           1.9G     84.0K      1.9G   0% /tmp
/dev/loop0           f2fs           98.8M     51.7M     47.1M  52% /overlay
overlayfs:/overlay   overlay        98.8M     51.7M     47.1M  52% /
tmpfs                tmpfs         512.0K         0    512.0K   0% /dev

mount output

root@FriendlyWrt:~# mount
/dev/root on /rom type squashfs (ro,relatime,errors=continue)
proc on /proc type proc (rw,nosuid,nodev,noexec,noatime)
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,noatime)
cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate)
tmpfs on /tmp type tmpfs (rw,nosuid,nodev,noatime)
/dev/loop0 on /overlay type f2fs (rw,lazytime,noatime,background_gc=on,nodiscard,no_heap,user_xattr,inline_xattr,inline_data,inline_dentry,flush_merge,extent_cache,mode=adaptive,active_logs=6,alloc_mode=reuse,checkpoint_merge,fsync_mode=posix,discard_unit=block)
overlayfs:/overlay on / type overlay (rw,noatime,lowerdir=/,upperdir=/overlay/upper,workdir=/overlay/work)
tmpfs on /dev type tmpfs (rw,nosuid,noexec,noatime,size=512k,mode=755)
devpts on /dev/pts type devpts (rw,nosuid,noexec,noatime,mode=600,ptmxmode=000)
debugfs on /sys/kernel/debug type debugfs (rw,noatime)
bpffs on /sys/fs/bpf type bpf (rw,nosuid,nodev,noexec,noatime,mode=700)

logread output

root@FriendlyWrt:~# logread | sed -n -e "/ preinit /,/ init /p"
Wed Nov  8 14:04:06 2023 user.info kernel: [    1.282114] init: - preinit -
Wed Nov  8 14:04:06 2023 kern.notice kernel: [    1.607171] random: jshn: uninitialized urandom read (4 bytes read)
Wed Nov  8 14:04:06 2023 kern.notice kernel: [    1.646052] random: jshn: uninitialized urandom read (4 bytes read)
Wed Nov  8 14:04:06 2023 kern.notice kernel: [    1.686436] random: jshn: uninitialized urandom read (4 bytes read)
Wed Nov  8 14:04:06 2023 kern.warn kernel: [    1.766271] dw-apb-uart ff1a0000.serial: forbid DMA for kernel console
Wed Nov  8 14:04:06 2023 kern.info kernel: [    3.824685] loop0: detected capacity change from 0 to 212992
Wed Nov  8 14:04:06 2023 kern.info kernel: [    3.907702] loop0: detected capacity change from 212992 to 206464
Wed Nov  8 14:04:06 2023 kern.notice kernel: [    4.211595] F2FS-fs (loop0): Mounted with checkpoint version = 69fe4bfb
Wed Nov  8 14:04:06 2023 user.info kernel: [    4.213196] mount_root: switching to f2fs overlay
Wed Nov  8 14:04:06 2023 kern.info kernel: [    4.278851] EXT4-fs (mmcblk1p1): mounted filesystem without journal. Opts: (null). Quota mode: none.
Wed Nov  8 14:04:06 2023 user.warn kernel: [    4.292818] urandom-seed: Seeding with /etc/urandom.seed
Wed Nov  8 14:04:06 2023 user.info kernel: [    4.332863] procd: - early -
Wed Nov  8 14:04:06 2023 user.info kernel: [    4.333208] procd: - watchdog -
Wed Nov  8 14:04:06 2023 user.info kernel: [    4.885980] procd: - watchdog -
Wed Nov  8 14:04:06 2023 user.info kernel: [    4.887093] procd: - ubus -
Wed Nov  8 14:04:06 2023 kern.notice kernel: [    4.915903] random: ubusd: uninitialized urandom read (4 bytes read)
Wed Nov  8 14:04:06 2023 kern.notice kernel: [    4.939692] random: ubusd: uninitialized urandom read (4 bytes read)
Wed Nov  8 14:04:06 2023 kern.notice kernel: [    4.940632] random: ubusd: uninitialized urandom read (4 bytes read)
Wed Nov  8 14:04:06 2023 user.info kernel: [    4.942572] procd: - init -

Also, PREINIT is not present in /etc/rc.local:

root@FriendlyWrt:~# cat /etc/rc.local
# Put your custom commands here that should be executed once
# the system init finished. By default this file does nothing.

exit 0

I suggest using ImageBuilder (not to be confused with compiling from source; ImageBuilder just assembles flashable images from binary packages) to build an image with a resized rootfs.
After entering the ImageBuilder directory, edit the .config file and change CONFIG_TARGET_ROOTFS_PARTSIZE (size in MB).
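
The .config edit can also be scripted. This is a minimal sketch: the function name, its arguments, and the 1024 MB value are made up for illustration, and it assumes the option appears in .config as a plain `CONFIG_TARGET_ROOTFS_PARTSIZE=<number>` line.

```shell
#!/bin/sh
# Set CONFIG_TARGET_ROOTFS_PARTSIZE (in MB) in an ImageBuilder .config file.
# Function name and arguments are illustrative, not part of ImageBuilder.
set_rootfs_partsize() {
    config="$1"
    size_mb="$2"
    if grep -q '^CONFIG_TARGET_ROOTFS_PARTSIZE=' "$config"; then
        # Replace the existing value via a temporary file
        sed "s/^CONFIG_TARGET_ROOTFS_PARTSIZE=.*/CONFIG_TARGET_ROOTFS_PARTSIZE=$size_mb/" \
            "$config" > "$config.tmp" && mv "$config.tmp" "$config"
    else
        # Option not present yet: append it
        echo "CONFIG_TARGET_ROOTFS_PARTSIZE=$size_mb" >> "$config"
    fi
}

# Example: only acts when run inside a directory that has a .config
if [ -f .config ]; then
    set_rootfs_partsize .config 1024
fi
```

After that, the image is built with `make image PROFILE=<profile>`; the profile name for the R4S should be checked with `make info` rather than assumed.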
