Performance issue (CPU bottleneck on ksoftirqd/0) on current OpenWrt snapshot (Raspberry Pi4)

Hi all,

I'm here because I was going to upgrade OpenWrt on my Raspberry Pi 4 after a long time on the previous version. But because of a huge drop in bandwidth performance, I'll wait a little bit :wink:

Previous one is OpenWrt SNAPSHOT r11631-deb835849a / LuCI Master git-19.339.73009-ea6d0d2 with kernel 4.19.86

New one is OpenWrt SNAPSHOT r15616-1b1bb6bf19 / LuCI Master git-21.020.56896-af422b1 with kernel 5.4.92

Since I have 1 Gbps fiber at home, I ran some speed tests with the new version and I'm capped at 400 Mbps instead of the 981 Mbps I got on the previous version.

So I compared the two. On the previous version, a ~1000 Mbps iperf test (iperf -s on OpenWrt, iperf -c from the local network) didn't use much CPU (apart from iperf itself, no process was above 0% CPU usage). But now the process "ksoftirqd/0" tops out at 25% and uses much more CPU than iperf itself.

Do you know why this changed? Is it possible to correct this performance regression?
Thank you in advance

could be iperf / latest kernel ... possibly something else...

let me know if you need some previous master versions... happy to upload a few for you to try...

perf is great for things like this but i'm not sure that it's still available / on this target...

Thanks. Before bisecting the versions between OpenWrt SNAPSHOT r11631-deb835849a and OpenWrt SNAPSHOT r15616-1b1bb6bf19, there may be some easy information to gather from what is already running (let's hope it's enough!).

iperf is not part of the problem: just having a LAN device download from the Internet (be it speedtest.net, fast.com, torrents or plain fast HTTP) is enough to make the new OpenWrt struggle when it sits in between as the router. Even with 980 Mbps going through it as a router, the previous version showed no visible CPU usage in top (or just 1 or 2 percent).

I used iperf to see whether the insane CPU usage appears when using only the integrated NIC or only my USB3 GbE NIC, but the insane ksoftirqd/0 usage appears on both of them.

It seems that the bottleneck can only be in the network stack, or in something common to both NIC drivers.


The bottleneck begins as soon as ksoftirqd/0 reaches 25% (I guess that means one fully loaded core out of 4, since it is single-threaded). This is all "sirq", which I guess means "soft IRQ", but people like you will be much better than me at understanding what in OpenWrt could be using that much "sirq" CPU time.
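
The "25% means one core out of 4" guess can be checked per core. What follows is only a minimal Python sketch, assuming the standard /proc/stat layout in which the seventh value on each cpuN line is the softirq time. The two snapshots are made-up numbers for illustration, and the function names are hypothetical.

```python
import time

# Field order on each cpuN line of /proc/stat:
# user nice system idle iowait irq softirq steal guest guest_nice
SOFTIRQ = 6

def parse_proc_stat(text):
    """Return {"cpu0": [jiffies, ...], ...} for the per-core lines only."""
    cores = {}
    for line in text.splitlines():
        parts = line.split()
        # Keep cpu0, cpu1, ... and skip the aggregate "cpu" line.
        if parts and parts[0].startswith("cpu") and parts[0] != "cpu":
            # Use the first 8 fields; guest time is already counted in user.
            cores[parts[0]] = [int(v) for v in parts[1:9]]
    return cores

def softirq_share(before, after):
    """Percent of each core's time spent in softirq between two snapshots."""
    shares = {}
    for core, new in after.items():
        deltas = [n - o for n, o in zip(new, before[core])]
        total = sum(deltas)
        shares[core] = 100.0 * deltas[SOFTIRQ] / total if total else 0.0
    return shares

def sample_live(interval=1.0):
    """Take two real snapshots on a Linux box (not called by the demo below)."""
    with open("/proc/stat") as f:
        before = parse_proc_stat(f.read())
    time.sleep(interval)
    with open("/proc/stat") as f:
        after = parse_proc_stat(f.read())
    return softirq_share(before, after)

# Made-up snapshots: core 0 spends the whole interval in softirq, the rest idle.
BEFORE = """cpu  400 0 400 3200 0 0 0 0 0 0
cpu0 100 0 100 800 0 0 0 0 0 0
cpu1 100 0 100 800 0 0 0 0 0 0
cpu2 100 0 100 800 0 0 0 0 0 0
cpu3 100 0 100 800 0 0 0 0 0 0
"""
AFTER = """cpu  400 0 400 3500 0 0 100 0 0 0
cpu0 100 0 100 800 0 0 100 0 0 0
cpu1 100 0 100 900 0 0 0 0 0 0
cpu2 100 0 100 900 0 0 0 0 0 0
cpu3 100 0 100 900 0 0 0 0 0 0
"""

if __name__ == "__main__":
    shares = softirq_share(parse_proc_stat(BEFORE), parse_proc_stat(AFTER))
    for core, pct in sorted(shares.items()):
        print(f"{core}: {pct:.0f}% softirq")
    # Averaged over all cores, which is how some top versions report it.
    print(f"machine-wide: {sum(shares.values()) / len(shares):.0f}%")
```

With the made-up numbers, core 0 comes out at 100% softirq and the machine-wide average at 25%, which is the pattern a single saturated core would produce on a 4-core Pi 4. On a real router, calling sample_live() during a transfer would give the actual per-core figures.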

Thank you in advance for your suggestions!

well random observation... may or may not be related... around the time of dnsmasq-logspam release... my internet went down for half a day... noticed a lot of netifd||procd page faults... could be related... ( probably a false positive tho' )

also... probably prudent to mention what USB-NIC driver is in use...

edit: 'mpstat' et.al. may also be useful here...

Maybe the latest master has an updated USB NIC driver. Which one do you have?

Hi, I'm using the latest snapshot.
It's not related to the USB NIC because the issue also exists on the integrated NIC: it's a more general problem.

Abnormal CPU usage is seen on:

  • WAN <-> LAN traffic (normal internet routing use)
  • WAN <-> router traffic (iperf between router and a WAN server)
  • LAN <-> router traffic (iperf between router and a LAN server)

I may try to see whether the Raspberry Pi kernel (or even better, the kernel used in this OpenWrt version) and its related modules have the same issue when run on Raspbian, for example. If so, the problem is on the Raspberry Pi kernel side. If not, it's likely to be somewhere in the OpenWrt network stack settings, but that's reaching the limits of my knowledge... for now :wink:

OK, the scary ksoftirqd/0 CPU usage is also present in the current Raspbian version with Raspberry Pi kernel 5.4.83-v7l+.

It's a problem from outside OpenWrt! I'll try to discuss this tough regression on the Raspberry Pi forums or the kernel GitHub.

Sorry for having taken a little bit of your time, I hope this may help someone googling the same issue!

Thanks for calling it out though. Would love it if you come back and let us know what you find, and/or file an OpenWrt bug report describing the fix/workaround/updated kernel version needed etc.

Hi, yes of course I'll keep you informed here as soon as I have something that makes this issue disappear.

I tried building 5.4.92 for my x86_64 PC to check whether this was a general "5.4 mainline Linux problem", but it doesn't seem to be (at least not on x86_64).

Tomorrow I'll build the 4.19, 5.4 and 5.10 branches of the Raspberry Pi kernel to see which ones have the issue (I believe 4.19 won't, at least 4.19.86 didn't) and then probably open an issue on their GitHub. I may also check 4.20, 5.0, 5.1, 5.2 and 5.3 to see when the problem appeared (with a 3700X, kernel builds are fast enough).

Hi, here are the results of my tests.

Some details:
I ran them on a Pi 4 using only the integrated NIC and Raspbian, building kernels with the bcm2711_defconfig file provided in their source code. I ran them for both 64 bits and 32 bits (we are interested in 64 bits, but the differences between 32-bit and 64-bit results are worth knowing to avoid confusion). I compared those kernels using iperf TCP and UDP tests at a rate of ~1000 Mbps (with the Pi 4 receiving).

For 64 bits:
Using the raspberrypi kernel GitHub versions, it seems all rpi-4.19.y versions are affected, with ksoftirqd using 100% of 1 core (at least on the versions I tested: rpi-4.19.50, rpi-4.19.93, rpi-4.19.127).
Same issue up to the current LTS rpi-5.4.83, rpi-5.5.19, and rpi-5.6.19.

The problem for 64 bits disappears starting from rpi-5.7.19 (ksoftirqd uses 10% of 1 core with 1 Gbps TCP, 0% with UDP).
In rpi-5.10.10 (the next LTS) it's even better: 0% CPU usage from ksoftirqd in both the TCP and UDP 1 Gbps tests.

Conclusion for 64 bits:
On a friend's 1 Gbps Internet connection, OpenWrt SNAPSHOT r12215-9c19c35d1e / LuCI Master git-20.038.38813-faabe98, Kernel Version 4.19.101 doesn't have the issue.
This means the OpenWrt kernels for the Raspberry Pi 4, versions 4.19.86 (mine) and 4.19.101 (my friend's), didn't have the issue, while the rpi kernels 4.19.y around those versions did. Maybe OpenWrt didn't use the same kernel, or had a patch applied on top of it (which was lost in the switch to 5.4)? I'll try to find those kernels for more testing on Raspbian.

For 32 bits:
The problem didn't exist at all in the rpi-4.19.y versions (0% CPU usage from ksoftirqd for both TCP and UDP on rpi-4.19.50, rpi-4.19.93, rpi-4.19.127).
But it appeared starting with 4.20.y (at least rpi-4.20.17) and partially disappeared starting with rpi-5.6.y (tested rpi-5.6.19), whereas on 64 bits the problem only disappears with rpi-5.7.y.

"Partially" because on 32 bits, with current kernel versions, ksoftirqd still uses 30% of 1 core in the 1 Gbps TCP test (while it used no CPU on 4.19.y).

PS:
If someone would like to run those kernels for their own tests without having to rebuild, here are my builds (modules, dtb, overlays, and the kernel file itself of course):
https://pix-server-sorel.pixconfig.fr/Manual/32-bits-rpi-bcm2711-built-kernels.tar.gz
https://pix-server-sorel.pixconfig.fr/Manual/64-bits-rpi-bcm2711-built-kernels.tar.gz

I'll search Google a little more to see what changed in rpi-5.6.y and rpi-5.7.y (and later) regarding network performance. So far I haven't found anything interesting, apart from someone who ran into the issue, didn't solve it, but worked around it just enough to get his 4 Gbps on a Compute Module... so nothing about how this issue actually got fixed.

I just tested an rpi-5.10.y kernel in my OpenWrt install.

Conclusion:
It completely solved the CPU usage issue with the integrated NIC (0% ksoftirqd/0, 0% sirq while receiving or sending ~990 Mbps between the router and a LAN device).

But it didn't solve the issue with the USB3 ax88179 NIC :sob: I should have seen it coming ^^

Anyway now we know it's in the kernel and its modules.

More details:
Even with the 5.10.10 kernel, when routing through both NICs I'm stuck at ~460 Mbps downstream and exactly 600 Mbps upstream (with ksoftirqd/0 hitting 25% in both cases). The upstream result shows only a tiny improvement over 5.4.92 (which was capped at 560 Mbps) and is still very far from the results obtained with the older OpenWrt kernels 4.19.86 or 4.19.101 (they route ~980 Mbps downstream or upstream without the CPU becoming a bottleneck).
Transferring over the USB NIC alone makes ksoftirqd/0 hit 25% at around 800 Mbps (up or down).

I guess someone who is good at performance tuning and drivers would be able to locate and correct this performance drop, and/or to find out how this has been corrected on rpi-5.7.y compared to rpi-5.6.y, but it's unfortunately beyond my abilities...

In the coming days, I may test the different kernel versions against the USB3 ax88179 NIC and try to open an issue on the Raspberry Pi kernel GitHub if I'm able to spot when the issue appeared with it.

Edit:
I found a way to gather information about interrupt usage. What generates so many interrupts on CPU0 when transferring over the USB3 NIC is "BRCM STB PCIe MSI 524288 Edge xhci_hcd":

root@Pi4-OpenWrt:~# cat /proc/interrupts 
           CPU0       CPU1       CPU2       CPU3       
  3:      62334       7093       5125       3030     GICv2  30 Level     arch_timer
 11:        800          0          0          0     GICv2  65 Level     fe00b880.mailbox
 14:          2          0          0          0     GICv2 153 Level     uart-pl011
 15:          0          0          0          0     GICv2 112 Level     bcm2708_fb DMA
 17:         45          0          0          0     GICv2 114 Level     DMA IRQ
 24:          1          0          0          0     GICv2  66 Level     VCHIQ doorbell
 25:       6452          0          0          0     GICv2 158 Level     mmc1, mmc0
 31:     224765          0          0          0     GICv2 189 Level     eth0
 32:       1904          0          0          0     GICv2 190 Level     eth0
 38:          0          0          0          0     GICv2 175 Level     PCIe PME, aerdrv
 39:    2678266          0          0          0  BRCM STB PCIe MSI 524288 Edge      xhci_hcd
IPI0:      1899       5696      43575      30967       Rescheduling interrupts
IPI1:       208        340        104        251       Function call interrupts
IPI2:         0          0          0          0       CPU stop interrupts
IPI3:         0          0          0          0       CPU stop (for crash dump) interrupts
IPI4:         0          0          0          0       Timer broadcast interrupts
IPI5:      5032       1331       7589      10575       IRQ work interrupts
IPI6:         0          0          0          0       CPU wake-up interrupts
Err:          0
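
The dump above can also be ranked automatically. The following is only a sketch in Python, assuming the usual /proc/interrupts layout (a header naming the CPUs, then one row per interrupt with one counter per CPU). The sample is a few rows copied from the output above, and the function names are made up for this example.

```python
# A few rows copied from the /proc/interrupts output in this post.
SAMPLE = """\
           CPU0       CPU1       CPU2       CPU3
  3:      62334       7093       5125       3030     GICv2  30 Level     arch_timer
 31:     224765          0          0          0     GICv2 189 Level     eth0
 32:       1904          0          0          0     GICv2 190 Level     eth0
 39:    2678266          0          0          0  BRCM STB PCIe MSI 524288 Edge      xhci_hcd
IPI0:      1899       5696      43575      30967       Rescheduling interrupts
Err:          0
"""

def parse_interrupts(text):
    """Return a list of (label, per_cpu_counts, description) tuples."""
    lines = text.splitlines()
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    rows = []
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        counts = []
        for tok in fields[:ncpu]:
            if not tok.isdigit():
                break
            counts.append(int(tok))
        if len(counts) != ncpu:
            continue  # e.g. the "Err:" row carries a single total
        rows.append((label.strip(), counts, " ".join(fields[ncpu:])))
    return rows

def busiest(rows, cpu=0):
    """Return the row with the highest counter on the given CPU."""
    return max(rows, key=lambda row: row[1][cpu])

if __name__ == "__main__":
    rows = parse_interrupts(SAMPLE)
    label, counts, desc = busiest(rows, cpu=0)
    total_cpu0 = sum(row[1][0] for row in rows)
    share = 100.0 * counts[0] / total_cpu0
    print(f"IRQ {label} ({desc}): {counts[0]} on CPU0, {share:.1f}% of listed rows")
```

On the router itself, the text could come from open("/proc/interrupts").read() instead of SAMPLE. Since the counters are cumulative since boot, comparing two readings taken before and after a transfer would show more clearly which source is responsible.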

I've got an RTL8153 and mine is working fine. Just 1% sirq at gigabit.
I can't run the tests you did because my network is shared with others, but you did a really good job testing everything.
Seems like the problem is only with ax88179

I think that NIC is just low quality, it has always required a lot more CPU than the UE300 with the Realtek chip... just replace it :wink:

Mmmhh :thinking:

I ordered an rtl8153-based USB3 1 Gbps NIC, both to see whether it avoids this performance regression and because it will clearly be useful for me to have one.

But unless something changed in the ax88179 driver, it's strange that the same problem doesn't occur on both the rtl8153 and the ax88179 (I'm going to test this soon, but EnfermeraSexy, have you tested this on the latest snapshot? On previous snapshots, both the ax88179 and the rtl8153 were running perfectly fine* at 1 Gbps). That's why I wondered whether this performance regression was located in the PCIe or USB bus handling of the rpi kernel.

*By "perfectly" I mean more than an entire year of awesome stability, no packet loss, 1000 Mbps + 1000 Mbps full duplex with no trouble... the quality hasn't been low, it has been perfect.

If the regression is only in the ax88179 driver (which would then probably not be Raspberry Pi related), that would be useful information for later, when I have more time to dig into the issue (or if anyone does before I do).

By the way, even if buying new stuff is acceptable for testing, trashing and replacing good working hardware to solve failed software work/regressions should absolutely never be seen as the normal solution :wink:. Too often, if not almost always, this is what is advised. Even if of course for old/rare/unsupported/scary old hardware this is sometimes the only possible way.

Luckily we don't trash our Raspberry Pi 4 each time a regression or bug occurs on it :sweat_smile:

I'll keep you informed as soon as I have tested this and got enough new information
See you later!

there are at least 5-10 users/threads around the forum that would contradict that statement...

your own statements also conflict in this regard...

Oh.. so I guess it depends on the manufacturer who put the chip on the board (mine has an "Edimax" logo on it).
So I hope the rtl8153 one I ordered isn't as bad :grin: I'll get the answer soon!
EDIT: I had enough time to modify my order, and ordered the TP-Link one (UE300) suggested by dlakelan, so that my rtl8153 device is more likely to be fine too when it arrives.

As for the USB NIC vs the integrated NIC, they sit on 2 different buses (unless I'm wrong, the only thing on the RPi4's PCIe bus is the USB3 controller). This is why I thought the initial issue (affecting both) had a common cause. But the fact that the integrated NIC issue alone is solved on recent kernels tends to prove I wasn't as right as I thought :upside_down_face:

yes... that may seem to play a role... as several of the reports have been under stress/load... aka... pcb/voltage/thermal related

although doesn't exclude pure driver / driver non-constants...

Yes, the one from yesterday, and it's not giving me any problems.

Cool, many thanks for this feedback. It narrows down where/why the regression occurred.

After the weekend I'll do some tests on x86_64, and maybe with mainline kernels vs rpi kernels on the RPi4, and then compare both drivers to spot differences. This will be one of my first times reading driver code, although I'm good at C/C++ and µcontroller programming, so I guess I may manage to understand what the code does and what changed.

Hi there, here's some news!

I recently tested mainline kernel 5.11.0-rc5 on my Raspberry Pi 4 and it's affected by the high CPU usage from ksoftirqd/0 (100% out of 400%, or 25% out of 100%, depending on the "top" version) when using TCP/IP over the ax88179.

This afternoon I tested mainline kernels 5.4.94, 5.10.12 and 5.11.0-rc6 on an Intel Core 2 Quad computer equipped with a PCIe VIA VL805 USB3 card (the same chip as on the Raspberry Pi 4) and there is no visible usage from ksoftirqd/0 (or less than 1%; as the Q9400 CPU isn't 100 times more powerful than the bcm2711, I believe this means it's simply not affected at all).

So the conclusion of this thread is probably going to be the following: the high CPU usage with the integrated NIC has been solved in newer kernels, and it wasn't causing as much of a performance problem as the ax88179 regression anyway. That regression seems to come from the mainline kernel (but only affects arm64, or maybe just the bcm2711) and isn't solved in recent kernel versions.


I could perhaps test a few earlier mainline kernel versions on the Pi 4, but I guess I won't be able to run the bcm2711 SoC with mainline kernels that are too old (and I'm a little too overwhelmed for now to test all of them :smile:)

By the way, my TP-Link UE300 / rtl8153 dongle arrived. But lucky as I am, it got squished under some truck(s) before being delivered here ^^ Another one is supposed to arrive next Friday.


Heartbreaking isn't it :sob: (but fortunately I got a refund)
I hope the couriers won't run over my second order :smile:
