High latency on NanoPi R4S

pedroluizsro · August 25, 2022, 7:54pm

Hello,

I'm running some tests on my network and I've noticed that latency is higher than I'd like with my NanoPi R4S running OpenWrt.

Ping times on the internal wired network are always above 1 ms, which I find strange because with the R4S removed from the network they are below 1 ms.

Even the ping to localhost is high, considering it never leaves the device:

root@openwrt:~# ping localhost
PING localhost (::1): 56 data bytes
64 bytes from ::1: seq=0 ttl=64 time=0.491 ms
64 bytes from ::1: seq=1 ttl=64 time=0.580 ms
64 bytes from ::1: seq=2 ttl=64 time=0.449 ms
64 bytes from ::1: seq=3 ttl=64 time=0.446 ms
64 bytes from ::1: seq=4 ttl=64 time=0.448 ms
64 bytes from ::1: seq=5 ttl=64 time=0.401 ms

For comparison, ping localhost on my computer:

pedro@pop-os:~$ ping localhost
PING localhost(localhost (::1)) 56 data bytes
64 bytes from localhost (::1): icmp_seq=1 ttl=64 time=0.055 ms
64 bytes from localhost (::1): icmp_seq=2 ttl=64 time=0.047 ms
64 bytes from localhost (::1): icmp_seq=3 ttl=64 time=0.052 ms
64 bytes from localhost (::1): icmp_seq=4 ttl=64 time=0.049 ms

When I ping the next WAN hop, the ping exceeds 1ms:

root@openwrt:~# ping 192.168.0.1
PING 192.168.0.1 (192.168.0.1): 56 data bytes
64 bytes from 192.168.0.1: seq=0 ttl=64 time=1.438 ms
64 bytes from 192.168.0.1: seq=1 ttl=64 time=1.199 ms
64 bytes from 192.168.0.1: seq=2 ttl=64 time=1.206 ms
64 bytes from 192.168.0.1: seq=3 ttl=64 time=1.213 ms
64 bytes from 192.168.0.1: seq=4 ttl=64 time=1.297 ms

Is there any way to improve the latency? I use Cat6a cables, and this does not happen when I test directly against my other TP-Link router.

I look forward to everyone's help.
Thanks.

CopperCassette · August 25, 2022, 11:03pm

That does seem rather high for an ethernet interface.

I'd be suspicious that the R4S might be doing something to save power, like perhaps keeping some peripheral asleep and only checking it every 1ms or so.

Have you tried stressing the networking slightly (e.g. by transferring a file to the R4S at 50 Mbit/s) and running the ping at the same time?

EDIT: It looks like the two physical network ports are implemented in different ways:

Ethernet: one Native Gigabit Ethernet, and one PCIe Gigabit Ethernet

Have you tried to see if the problem only exists on one physical port?

rhester72 · August 25, 2022, 11:15pm

I hadn't noticed this before, but can confirm it as well. It doesn't matter what internal interface you ping (localhost, WAN, LAN, or even VPN)...there is about 3x the expected response latency. Very odd.

rhester72 · August 25, 2022, 11:29pm

The more I look at this, the more mystified I become.

Forcing IPv4 helps slightly, but only by the expected margin, so that's no real surprise.

Interrupts crossing cores isn't a factor because localhost doesn't use interrupts.

Forcing an outbound interface to ensure that you aren't internally routing between interfaces makes no difference.

The software bridge doesn't seem to be a contributing factor, because eth0 suffers as much as br-lan/eth1.

I'm truly lost.

CopperCassette · August 26, 2022, 3:51am

It doesn't matter what internal interface you ping (localhost,

Wait what? So if you type either of these commands whilst on the actual device:

$ ping 127.0.0.1
$ ping localhost

... then you see 1msec?

Is this thing running hot and clock skipping? Does it have a heatsink?

For reference this is what I see across a few of my devices (just tested now):

rtt min/avg/max/mdev   = 0.122/0.161/0.220/0.025 ms  OrangePi R1 Plus (non-openwrt, custom Debian-based, would not recommend this device)
rtt min/avg/max/mdev   = 0.023/0.041/0.065/0.008 ms  My desktop x86 computer (non-openwrt)
round-trip min/avg/max = 0.119/0.131/0.210 ms        PCengines APU2 (openwrt 21.02)
round-trip min/avg/max = 0.287/0.297/0.386 ms        Ubiquiti UniFi AP AC Lite (openwrt 21.02)

To be honest, some of those figures are slower than I expected, but I guess they make sense on low-clocked, slow hardware. I still would not expect the RK3399-powered NanoPi R4S to be slower than my MIPS-based UniFi AP AC Lite.
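
For anyone wanting to compare devices the same way, here is a minimal sketch that pulls the average RTT out of a ping summary line. The function name `ping_avg` is made up for illustration; it only assumes the two summary formats shown above (iputils `rtt min/avg/max/mdev` and BusyBox `round-trip min/avg/max`).

```shell
# Hypothetical helper: read ping output on stdin and print the average RTT in ms.
# Works with both summary formats quoted above:
#   rtt min/avg/max/mdev = a/b/c/d ms     (iputils)
#   round-trip min/avg/max = a/b/c ms     (BusyBox)
ping_avg() {
    awk -F'=' '/min\/avg\/max/ {
        gsub(/^[ \t]+/, "", $2)   # strip blanks before the numbers
        split($2, parts, "/")     # parts[1]=min, parts[2]=avg, parts[3]=max
        print parts[2]
    }'
}
```

Usage would be something like `ping -c 10 -q 127.0.0.1 | ping_avg`, run once on each device with the same packet count.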

EDIT: Maybe you have a lot more firewall and networking rules for the kernel to parse before packets come back? Perhaps try backing up your config and trying again with a default/simple network config?

choppyc · August 26, 2022, 9:27am

@pedroluizsro What Firmware are you currently using?

rhester72 · August 26, 2022, 3:22pm

Not over 1msec. This is 127.0.0.1:

64 bytes from 127.0.0.1: seq=0 ttl=64 time=0.216 ms
64 bytes from 127.0.0.1: seq=1 ttl=64 time=0.327 ms
64 bytes from 127.0.0.1: seq=2 ttl=64 time=0.327 ms

It's not running hot or clock skipping; the load average is 0.00. The case of the device is a passive heatsink.

In comparison, I'm seeing this on a RPI 4B:

64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.082 ms
64 bytes from 127.0.0.1: icmp_seq=2 ttl=64 time=0.086 ms
64 bytes from 127.0.0.1: icmp_seq=3 ttl=64 time=0.086 ms

The firewall is pretty standard stuff with 5 forwards and 7 rules via nftables, so I hardly think that's a contributor (but I cannot temporarily disable it to test because this is an active router). Nothing particularly strange about the network either - very basic WAN, LAN, DMZ and VPN (Wireguard) networks defined.

rhester72 · August 26, 2022, 3:27pm

22.03.0-rc4

pedroluizsro · August 26, 2022, 4:57pm

Thanks for the answer

Both for localhost and for 127.0.0.1 the ping is too high for an internal network.

PING localhost (127.0.0.1): 56 data bytes
64 bytes from 127.0.0.1: seq=0 ttl=64 time=0.658 ms
64 bytes from 127.0.0.1: seq=1 ttl=64 time=0.417 ms
64 bytes from 127.0.0.1: seq=2 ttl=64 time=0.414 ms
64 bytes from 127.0.0.1: seq=3 ttl=64 time=0.416 ms

PING 127.0.0.1 (127.0.0.1): 56 data bytes
64 bytes from 127.0.0.1: seq=0 ttl=64 time=0.517 ms
64 bytes from 127.0.0.1: seq=1 ttl=64 time=0.416 ms
64 bytes from 127.0.0.1: seq=2 ttl=64 time=0.418 ms
64 bytes from 127.0.0.1: seq=3 ttl=64 time=0.417 ms

Regarding firewall rules, there is nothing very elaborate; I only have a rule for forwarding to a Wireguard VPN.

For DNS I use NextDNS, which I believe has no bearing on this either.

It gets worse when I ping another host on the network with the traffic passing through the router: the ping goes up to 2 ms.

I believe the hardware is powerful enough; it easily handles SQM on a 400 Mbps connection.

Looking forward to more help from the community.
Thanks.

choppyc · August 26, 2022, 5:43pm

what firmware are you using?

pedroluizsro · August 26, 2022, 5:48pm

It currently shows the following information:

OpenWrt 21.02.1 r16325-88151b8303

rhester72 · August 26, 2022, 5:51pm

How? I didn't think support for the R4S was even introduced until 22.03.

choppyc · August 26, 2022, 5:54pm

ImmortalWrt 21.02.1 r19441+1-f8624db86c / LuCI openwrt-21.02 branch git-22.173.71791-cf096f8

I'm currently using this build until 22.03 is official.

@pedroluizsro I'm guessing that's a FriendlyWrt build; you may have to use their forums for support.

Have you tried 22.03rc6? Does it exhibit the same behaviour?

pedroluizsro · August 26, 2022, 8:19pm

I'll wait and test it as soon as it's out of beta.

However, I made one change here and latency has already improved. CopperCassette mentioned power saving in the comments, so I switched the CPU frequency governor to performance:

echo performance | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

After this change it improved: the ping to localhost is now no more than 0.3 ms, and to the WAN hop it is below 1 ms.

root@openwrt:~# ping 127.0.0.1
PING 127.0.0.1 (127.0.0.1): 56 data bytes
64 bytes from 127.0.0.1: seq=0 ttl=64 time=0.302 ms
64 bytes from 127.0.0.1: seq=1 ttl=64 time=0.236 ms
64 bytes from 127.0.0.1: seq=2 ttl=64 time=0.237 ms
64 bytes from 127.0.0.1: seq=3 ttl=64 time=0.248 ms
64 bytes from 127.0.0.1: seq=4 ttl=64 time=0.238 ms

root@openwrt:~# ping 192.168.0.1
PING 192.168.0.1 (192.168.0.1): 56 data bytes
64 bytes from 192.168.0.1: seq=0 ttl=64 time=0.942 ms
64 bytes from 192.168.0.1: seq=1 ttl=64 time=0.813 ms
64 bytes from 192.168.0.1: seq=2 ttl=64 time=0.886 ms
64 bytes from 192.168.0.1: seq=3 ttl=64 time=0.883 ms
64 bytes from 192.168.0.1: seq=4 ttl=64 time=0.808 ms
64 bytes from 192.168.0.1: seq=5 ttl=64 time=0.804 ms

But I'm still not satisfied. I believe this can go further; I would like to reduce the 0.5 ms for WAN.

I will continue to ask the forum staff for help. Thanks.

rhester72 · August 26, 2022, 9:06pm

Good call.

This, in /etc/rc.local, brought me to levels I'd expect:

# Set every cpufreq policy to the performance governor at boot
find /sys/devices/system/cpu/cpufreq/ -name scaling_governor | while read -r GOVERNOR; do
  echo performance > "$GOVERNOR"
done
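
A slightly more defensive variant of the loop above, as a sketch only: it switches a policy just when the kernel lists `performance` as available for it, then prints what each policy ended up with. The function name `set_performance` and its optional root-directory argument are illustrative assumptions; the `policy*` directories and the `scaling_available_governors` file are the usual Linux cpufreq sysfs layout.

```shell
# Hypothetical helper: set the performance governor on every cpufreq policy
# found under the given sysfs root (defaults to the real location).
set_performance() {
    root="${1:-/sys/devices/system/cpu/cpufreq}"
    for policy in "$root"/policy*; do
        # Skip unmatched globs and policies we cannot write to.
        [ -w "$policy/scaling_governor" ] || continue
        # Only switch if this policy advertises the performance governor.
        if grep -qw performance "$policy/scaling_available_governors" 2>/dev/null; then
            echo performance > "$policy/scaling_governor"
        fi
        # Report the governor now in effect, e.g. "policy0: performance".
        echo "${policy##*/}: $(cat "$policy/scaling_governor")"
    done
}
```

Calling `set_performance` with no argument from /etc/rc.local would act on the real sysfs tree; passing a directory makes it easy to try against a copy first.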

pedroluizsro · August 26, 2022, 11:18pm

I don't know if it has happened to you, but my router keeps restarting randomly by itself. It works for a few hours and then restarts.

rhester72 · August 26, 2022, 11:55pm

Too new of a change to know yet, but I took a look at temps and they seemed literally the same with both performance and schedutil.
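
For reference, one way to check temperatures from the shell, as a rough sketch: it reads the standard Linux thermal sysfs files, which report millidegrees Celsius. The function name `show_temps` and its optional root argument are illustrative assumptions, and the number of zones present depends on the board and kernel.

```shell
# Hypothetical helper: print each thermal zone's temperature in whole degrees C.
show_temps() {
    root="${1:-/sys/class/thermal}"
    for zone in "$root"/thermal_zone*; do
        # Skip unmatched globs and unreadable zones.
        [ -r "$zone/temp" ] || continue
        milli=$(cat "$zone/temp")
        # sysfs reports millidegrees Celsius; integer-divide down to degrees.
        echo "${zone##*/}: $((milli / 1000)) C"
    done
}
```

Running `show_temps` once under each governor, after the same idle period, gives a like-for-like comparison.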

To be fair, my router is a router - that's it. It doesn't run any apps or services, so my load average and duty cycle tends to be very, very low.

pedroluizsro · August 27, 2022, 10:18pm

For me it really is restarting on its own without leaving any logs, especially under heavy network load, for example during a connection test.

I'm going to try an improvement on my NanoPi: I'll install the version 22 firmware as recommended, but I'm going to wait for a new SD card with an A2 rating that is faster than the one I currently use. I'll see if there is any improvement after this replacement.

rhester72 · August 28, 2022, 12:25am

I imagine the issue is FriendlyWRT itself - would strongly advise migration to OpenWRT proper.

faser · August 28, 2022, 3:06am

Do you have proper cooling for your R4S?