Performance issue (CPU bottleneck on ksoftirqd/0) on current OpenWrt snapshot (Raspberry Pi4)

I saw this high usage when I first tested RPi4 using an Amazon basics USB device which is why I switched to the TP-Link UE300. I had assumed it was just lousy drivers or chip design since using the UE300 solved it. I hope your next delivery is quick!


Hi there !

I received my working rtl8153 (TP-Link UE300) adapter.

Unfortunately it didn't solve the issue: as with the ax88179, the CPU is the bottleneck (25% out of 100% is sirq, a mix of ksoftirqd and kworker), and the Pi can no longer route 1000 Mbps of traffic.

With kernel 5.4, from WAN to LAN (downloading from the Internet to my computer over TCP/IP, be it speedtest, torrent, HTTPS, iperf, etc.), bandwidth maxes out at 600 Mbps before the CPU saturates, and at ~800 Mbps upstream.

(I'm using the internal NIC as LAN and USB NIC as WAN).

With kernel 5.10 things improved but aren't solved (730 Mbps download before CPU bottleneck, and 1000 Mbps upload with 24% sirq : the 1 Gbps upload routing just fits)
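A note on reading the sirq figure: if the 25% is the all-core average on the 4-core Pi4, it is the same as one core spending 100% of its time in softirq, which would explain why throughput stops scaling there. Here is a rough sketch for checking this per core from two saved copies of /proc/stat. The function name `softirq_share` and the temporary file names are made up for illustration; the field positions follow the standard /proc/stat layout.

```shell
#!/bin/sh
# softirq_share BEFORE AFTER
# Prints, for the aggregate "cpu" line and each "cpuN" line, the percentage of
# time spent in softirq between two saved copies of /proc/stat.
# Fields on a cpu line: name user nice system idle iowait irq softirq steal ...
softirq_share() {
    awk '
        # Only lines beginning with "cpu" carry the time counters.
        $1 ~ /^cpu/ {
            total = 0
            for (i = 2; i <= 9; i++) total += $i    # user .. steal
            if (FNR == NR) {
                # First file: remember the starting counters.
                t0[$1] = total
                s0[$1] = $8
            } else if (($1 in t0) && total > t0[$1]) {
                # Second file: print the softirq share of the elapsed time.
                printf "%s %.1f\n", $1, 100 * ($8 - s0[$1]) / (total - t0[$1])
            }
        }
    ' "$1" "$2"
}

# Usage during a speed test (file names are arbitrary):
#   cat /proc/stat > /tmp/stat.a; sleep 5; cat /proc/stat > /tmp/stat.b
#   softirq_share /tmp/stat.a /tmp/stat.b
```

If one cpuN line shows close to 100 while the aggregate line shows about 25, a single core is doing all the packet work.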

Download information screenshot with kernel 5.4 :

Upload information screenshot with kernel 5.4

Download information screenshot with kernel 5.10 :

Upload information screenshot with kernel 5.10

As I'm using the same installation as the one I used for the ax88179, both drivers are installed (and may be loaded?).
When I configured it last time, I installed the following packages:

opkg update

opkg install luci openssh-sftp-server kmod-usb-net-asix-ax88179 kmod-usb-net-rtl8152 kmod-rt2800-usb bwm-ng iperf iperf3 tcpdump openvpn-openssl

So I'll try to do the installation again, but installing the packages one by one. I'll begin with luci and kmod-usb-net-rtl8152 and see if the problem persists.

  • If so, I may post my image and/or config files to compare with EnfermeraSexy (as he isn't encountering the same issue).
  • If not, I'll install the other packages and see if/when performance drops.

I'll keep you informed as soon as this test is complete!
See you later

this will smash your sirq's at high rates...


So I just installed the latest OpenWrt snapshot:
OpenWrt SNAPSHOT r15695-8286f3a3d3 / LuCI Master git-21.029.67968-af1f961

After the first start I temporarily placed eth0 as DHCP client (by editing /etc/config/dhcp and network)

opkg update
opkg install luci kmod-usb-net-rtl8152

Then reconfigured eth0 as before in "dhcp" and "network" files, and rebooted.
WAN and WAN6 (DHCP and DHCPv6) are created on eth1 (the rtl8153 in this case)

Same problem with the rtl8153 as with the ax88179.

My question is: @EnfermeraSexy, did you do something that is supposed to avoid this performance regression, so that you can route 1000 Mbps?

I can share this install's rootfs if necessary, but given how little I configured on it (described above: almost nothing), I'm not sure it would be very interesting to see. But if someone believes it's useful, I can share it.

I never said routing. I said that if I run iperf through the adapter, my CPU usage is near 0%. Routing is a different matter.


Thanks so now everything is clear :smile: (no problem of course that's fine, I should have used more clear words in my questions before).

The TP-Link dongle is cheap, so replacing the ax88179 Edimax dongle with an rtl8153 TP-Link one cost me almost nothing. And a result is still a result ^^ even if it's not the expected one.

Interesting (even if bad news) to know that this regression issue is affecting OpenWRT Pi4 users with home 1Gbps fiber, no matter if they are using rtl8153 or ax88179.

Unfortunately the newer kernels only partially correct the issue. So here I'm at the end of what I can understand and debug. I'm confident someone will be able, some day, to restore the Pi4's performance on new kernels, but personally it's beyond my scope.

If the issue affects both the rtl8153 and the ax88179 on arm64 on bcm2711, I guess they have a common cause. But when I guessed the same about the integrated NIC and the USB NIC I was wrong (things improved with newer kernels for the integrated NIC but not for the USB one), so I probably won't make too many bets on this subject :smile:

Try putting irqbalance on the device. It's 4 core so perhaps you split the WAN and LAN across different cores and this is the solution. I believe in my testing I was using it.
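To illustrate what splitting WAN and LAN across cores means by hand (irqbalance does this automatically), here is a dry-run sketch that only prints the commands for pinning an interrupt to one core via /proc/irq/N/smp_affinity. The function name `irq_affinity_plan` is made up, and the device names `eth0` and `xhci_hcd` are assumptions; check your own /proc/interrupts. As far as I know a USB NIC has no interrupt line of its own, so its traffic shows up under the USB host controller.

```shell
#!/bin/sh
# irq_affinity_plan FILE NAME CPU
# FILE is /proc/interrupts (or a saved copy), NAME is a device name to look
# for, CPU is the core index to pin to. Nothing is written: the function only
# prints the echo commands so they can be reviewed first.
irq_affinity_plan() {
    file=$1
    name=$2
    cpu=$3
    # smp_affinity takes a hex bitmask: CPU 0 -> 1, CPU 1 -> 2, CPU 2 -> 4, CPU 3 -> 8
    mask=$(printf '%x' $((1 << cpu)))
    awk -v name="$name" -v mask="$mask" '
        # Interrupt lines start with "<number>:"; keep those mentioning NAME.
        $1 ~ /^[0-9]+:$/ && index($0, name) {
            irq = $1
            sub(/:$/, "", irq)
            printf "echo %s > /proc/irq/%s/smp_affinity\n", mask, irq
        }
    ' "$file"
}

# Usage (device names are assumptions, adjust to what /proc/interrupts shows):
#   irq_affinity_plan /proc/interrupts eth0 1
#   irq_affinity_plan /proc/interrupts xhci_hcd 2
```

Printing rather than writing is deliberate: some interrupt controllers refuse affinity changes, so it is worth trying one line at a time.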


Sounds interesting!

On the old version I'm using there was almost no CPU usage at 1000 Mbps routing (both ways) so even if it was on 1 single core it was fine.

On this new version, if the CPU usage no longer fits on 1 core, a workaround may indeed be to use the others. I'll try to learn about that this evening (I should leave my computers for a few hours now ^^)

Also look at the cpufreq setup, perhaps the processor speed is being kept low, maybe some defaults changed.

I know it's a low speed, but that's the CPU usage that I get using SQM too. Do you use software offload?

Following up on my cpufreq message, on my Pi which is running raspbian for routing, I did:

cd /sys/devices/system/cpu/cpu0/cpufreq
watch -n 1 cat scaling_cur_freq

and found that at idle it was 600MHz but during a speed test it rapidly rose to 1500MHz. Check to see that this behavior is what you're seeing as well... On my Pi I have

$ cat scaling_governor
ondemand
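The check above only looks at cpu0. A small sketch that prints governor and current frequency for every core, in case they differ, follows. The function name `show_cpufreq` is made up, and the optional BASE argument exists only so the function can be tried against a copy of the sysfs files.

```shell
#!/bin/sh
# show_cpufreq [BASE]
# Prints "cpuN governor cur_kHz" for every core that exposes cpufreq.
# BASE defaults to the real sysfs tree.
show_cpufreq() {
    base=${1:-/sys/devices/system/cpu}
    for dir in "$base"/cpu[0-9]*/cpufreq; do
        # Skip cores without cpufreq (also covers the case where the glob matched nothing).
        [ -r "$dir/scaling_cur_freq" ] || continue
        cpu=${dir%/cpufreq}     # strip the trailing /cpufreq
        cpu=${cpu##*/}          # keep only "cpuN"
        printf '%s %s %s\n' "$cpu" "$(cat "$dir/scaling_governor")" "$(cat "$dir/scaling_cur_freq")"
    done
}

# Usage during a speed test:
#   while true; do show_cpufreq; sleep 1; done
```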

I think raspbian has these set at boot if I remember correctly:

echo 600000 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq
echo 600000 > /sys/devices/system/cpu/cpu1/cpufreq/scaling_min_freq
echo 600000 > /sys/devices/system/cpu/cpu2/cpufreq/scaling_min_freq
echo 600000 > /sys/devices/system/cpu/cpu3/cpufreq/scaling_min_freq
echo 50 > /sys/devices/system/cpu/cpufreq/ondemand/up_threshold
echo 50 > /sys/devices/system/cpu/cpufreq/ondemand/sampling_down_factor
echo 100000 > /sys/devices/system/cpu/cpufreq/ondemand/sampling_rate

On OpenWRT you can paste them to /etc/rc.local
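If you prefer something a bit more defensive than pasting the raw lines, here is a sketch of the same settings wrapped in a function that bails out when the ondemand directory is missing and reads the values back afterwards. The function name `apply_ondemand_tuning` and the BASE argument are made up for illustration; the values are the ones listed above.

```shell
#!/bin/sh
# apply_ondemand_tuning [BASE]
# Writes the Raspbian-like ondemand settings and prints what the kernel
# reports afterwards. BASE defaults to the real sysfs tree.
apply_ondemand_tuning() {
    base=${1:-/sys/devices/system/cpu}
    od="$base/cpufreq/ondemand"
    if [ ! -d "$od" ]; then
        echo "no ondemand directory under $base/cpufreq, nothing changed" >&2
        return 1
    fi
    # Minimum frequency for every core that exposes cpufreq.
    for f in "$base"/cpu[0-9]*/cpufreq/scaling_min_freq; do
        if [ -w "$f" ]; then echo 600000 > "$f"; fi
    done
    echo 50 > "$od/up_threshold"
    echo 50 > "$od/sampling_down_factor"
    echo 100000 > "$od/sampling_rate"
    # Read the values back so a silently rejected write is visible.
    for name in up_threshold sampling_down_factor sampling_rate; do
        printf '%s=%s\n' "$name" "$(cat "$od/$name")"
    done
}

# Usage from /etc/rc.local (before the final "exit 0"):
#   apply_ondemand_tuning
```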

Hi, I'm back (using OpenWrt SNAPSHOT r15696-17fa01bb79 / LuCI Master git-21.029.67968-af1f961 as a new snapshot openwrt-bcm27xx-bcm2711-rpi-4-ext4-factory.img.gz has been made available in the afternoon)

So CPU frequency is not the problem:
While doing nothing:

cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_cur_freq 
600000

cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq 
600000

During bottleneck:

cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_cur_freq 
1499999

cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq 
1500000

The governor is preset to "ondemand":

cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor 
ondemand

irqbalance is doing something, but not enough to compensate for the regression:

TP-Link UE300 - rtl8153 without / with irqbalance running:
Up: 800 Mbps without / 1000 Mbps with
Down: 600 Mbps without / 800 Mbps with
sirq, capped at 25% without irqbalance, seems capped at 33% with irqbalance running

Edimax EU-4306 - ax88179 without / with irqbalance running:
Up: 600 Mbps without / 900 Mbps with
Down: 430 Mbps without / 670 Mbps with
sirq, capped at 25% without irqbalance, seems capped at 50% with irqbalance running

@EnfermeraSexy I guess I'm not using any of these (unless they are activated by default). Maybe I should try to find those features and enable/disable them to see if performance improves.

It's not enabled by default; you will find it in the firewall tab.

You should really set these if you want to compare performance to Raspbian

        echo 50 > /sys/devices/system/cpu/cpufreq/ondemand/up_threshold
        echo 100000 > /sys/devices/system/cpu/cpufreq/ondemand/sampling_rate
        echo 50 > /sys/devices/system/cpu/cpufreq/ondemand/sampling_down_factor

Results on:

Raspbian with kernel rpi-5.4.83-v7l+ armv7l
Raspbian with kernel rpi-5.4.83-v8+ aarch64
Raspbian with kernel rpi-5.10.11-v8+ aarch64 (they switched to the new LTS a few days ago)

cat /sys/devices/system/cpu/cpufreq/ondemand/up_threshold
50

cat /sys/devices/system/cpu/cpufreq/ondemand/sampling_rate
100000

cat /sys/devices/system/cpu/cpufreq/ondemand/sampling_down_factor
50

On my old OpenWrt Snapshot (not having the performance issue)
OpenWrt SNAPSHOT, r11631-deb835849a - kernel 4.19.86 #0 SMP Sat Dec 7 23:21:36 2019 aarch64

cat /sys/devices/system/cpu/cpufreq/ondemand/up_threshold
95

cat /sys/devices/system/cpu/cpufreq/ondemand/sampling_rate
20000

cat /sys/devices/system/cpu/cpufreq/ondemand/sampling_down_factor
1

On the latest one, OpenWrt SNAPSHOT, r15696-17fa01bb79 - kernel 5.4.95 #0 SMP Sat Feb 6 16:13:33 2021 aarch64

root@Pi4-OpenWrt:~# cat /sys/devices/system/cpu/cpufreq/ondemand/up_threshold
95

root@Pi4-OpenWrt:~# cat /sys/devices/system/cpu/cpufreq/ondemand/sampling_rate
10000

root@Pi4-OpenWrt:~# cat /sys/devices/system/cpu/cpufreq/ondemand/sampling_down_factor
1

I tried to change the values (and check back if they were applied) using

echo 50 > /sys/devices/system/cpu/cpufreq/ondemand/up_threshold && echo 100000 > /sys/devices/system/cpu/cpufreq/ondemand/sampling_rate && echo 50 > /sys/devices/system/cpu/cpufreq/ondemand/sampling_down_factor

But it had no effect.

@EnfermeraSexy
For software flow offloading: once enabled, it helps the download rate through the router a little (it goes from ~450 Mbps to ~550 Mbps).
However, it slightly reduces the upload rate (from ~600 Mbps to ~540 Mbps).

Anyway, I believe it's unrelated to what caused the regression, as I'm not using those features (on either the new or the old version).

Finding the regression is (probably) more a matter of testing kernel versions to see exactly when the problem was introduced.
People who are good at programming and at optimizing driver performance would probably be even faster at seeing where and why the CPU is used so much more than before.

I'm thinking of 2 other leads to find out when/why the regression occurred:
When I tried kernel rpi-4.19.y I was unable to get USB 3 working on my Pi4.
Where can we find the source code of the 4.19.86 kernel which was used in OpenWrt SNAPSHOT, r11631-deb835849a? Was it the rpi-4.19.86 one? If so, I'm wondering why I cannot get USB3 working on it.

For doing those tests, I'm using a modern Pi4 with 8GB RAM, which already ran new versions of raspbian : I believe the VIA VL805 USB3 chip is often updated (eeprom update), while my old Pi4 (one of the very 1st ones, 2GB RAM) is still running with old VL805 firmware, as I didn't run any Raspbian on it for at least 1 year. I believe I need to run the new OpenWRT on the old Pi4 just to see if something changed because of hardware or eeprom.

According to:
https://www.raspberrypi.org/documentation/hardware/raspberrypi/booteeprom.md :
"vl805.bin The VLI805 USB firmware EEPROM image - ignored on 1.4 board revision which does not have a dedicated VLI EEPROM"

It's not impossible that these changes are triggering the trouble, after all.
The "boot" folder also contains things that may make a difference when running a kernel that worked fine before.

So I tested running

  • Old OpenWrt version on the New RPi4 board (which is rev 1.4, 8GB RAM)
  • New OpenWrt version on the Old RPi4 board (revision unknown, one of the first available boards, 2GB RAM).

For the Old OpenWrt version on the new board, the boot files (start4.elf and fixup4.dat) had to be updated in order to boot on the new board (blinking green LED and a clear message when I connected a screen). Then the system started, but the USB ports don't work on new boards with kernels < 5.4, so I had no USB. This confirms that old kernels can't get USB working on new boards (although they can on old boards). It may be a change in the kernel itself, or in its associated device trees for bcm2711.

For the New OpenWrt version, it behaves the same way on both old and new RPi 4 boards. So it is clear that the hardware revision / VL805 firmware upgrades aren't triggering the issue.

To figure out where between the 4.19 and 5.4 kernels the regression in USB3 NIC CPU usage occurred, I'll have to do the tests on the old board! I'm running out of time for now but I'll keep you informed as soon as I have new test results.


I have a similar setup as you and I'm getting fluctuating download speeds too with my 1gig FTTH with Bell.

I have the Rpi4b with the TP-LINK UE300 as my eth1 (WAN) and the onboard nic as my eth0 (LAN)

The results you see above are with today's SNAPSHOT that I compiled myself with Routing/NAT Offloading OFF.

I'm interested to see what type of results you'll be getting with your tests. Let me know if I can be of any assistance in any way. I may be limited on certain tests as this is running as my primary router and my kids will rip me a new one if the internet goes down for more than 5 minutes lol

edit:
My hardware:

  • TP-Link MC220L Gigabit Media Converter (to convert the fibre to eth)
  • TP-LINK TL-SG105E (v1) 5-Port Gigabit Easy Smart Switch (VLAN tagging, 1 port to the HH3K, 1 port to the Rpi4b)
  • TP-Link 16 Port Gigabit Ethernet Network Switch (TL-SG116) (my LAN)
  • TP-Link Foldable USB 3.0 to 10/100/1000 Gigabit Ethernet Network Adapter (UE300) (used with the Rpi4b for eth1)

Currently my tp-link TL-SG105E (v1) smart switch is setup this way:

  • Port 1: Ethernet from Fibre converter (1gig FTTH).
  • Port 2: Bell HH3000 (TV), VLAN 35/36 tagged
  • Port 3: Rpi4b (TP-Link UE300), PVID 35 tagged
  • Port 4: management to LAN switch
  • Port 5: unused

I ran a second test with a different Bell server in the Toronto area and got better results:

I didn't change any of my settings. Weird Wild Stuff :thinking:

Most speed test sites aren't really capable of providing you with the bandwidth needed to test a gigabit fiber while also testing all the other people who are simultaneously running tests. It's not that surprising really.

To test kernel regressions, you really need a controlled test-bench with two devices running iperf3.
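As a sketch of what such a test-bench run could look like: one machine on the WAN side runs the iperf3 server, one on the LAN side runs the client, and the Pi routes between them. The helper below only prints the three commands. The function name `iperf3_plan` and the example address are made up; `-s`, `-c`, `-t` and `-R` are standard iperf3 options.

```shell
#!/bin/sh
# iperf3_plan SERVER_IP [SECONDS]
# Prints the commands for one test in each direction through the router.
# SERVER_IP is the address of the machine on the far side of the router.
iperf3_plan() {
    server=$1
    secs=${2:-30}
    echo "server: iperf3 -s"
    echo "client: iperf3 -c $server -t $secs"       # client sends to server
    echo "client: iperf3 -c $server -t $secs -R"    # reverse: server sends to client
}

# Usage (192.0.2.10 is a placeholder address):
#   iperf3_plan 192.0.2.10 30
```

Running both directions from the same client with `-R` keeps the setup identical between the upload and download measurements, so only the kernel under test changes between runs.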