Hi, here are the results of my tests.
Some details:
I ran them on a Pi 4 using only the integrated NIC and Raspbian, building kernels with the bcm2711_defconfig file provided in their source code. I did this for both 64 bits and 32 bits (we are interested in 64 bits, but the differences between 32-bit and 64-bit results are worth knowing to avoid confusion). I compared those kernels using iperf TCP and UDP tests at a rate of ~1000 Mbps (with the Pi 4 receiving).
For 64 bits:
Using the raspberrypi kernel GitHub versions, it seems all rpi-4.19.y versions are affected by 1 core at 100% usage from ksoftirqd (at least the ones I tested: rpi-4.19.50, rpi-4.19.93, rpi-4.19.127).
Same issue up to the current LTS rpi-5.4.83, and also on rpi-5.5.19 and rpi-5.6.19.
The problem for 64 bits disappears starting from rpi-5.7.19 (1 core at 10% usage from ksoftirqd at 1 Gbps TCP, 0% with UDP).
In rpi-5.10.10 (next LTS) it's even better: 0% CPU usage from ksoftirqd in both the TCP and UDP 1 Gbps tests.
Conclusion for 64 bits:
On a friend's 1 Gbps Internet connection, OpenWrt SNAPSHOT r12215-9c19c35d1e / LuCI Master git-20.038.38813-faabe98, Kernel Version 4.19.101 doesn't have the issue.
This means the OpenWrt kernels for Raspberry Pi 4, versions 4.19.86 (mine) and 4.19.101 (my friend's), didn't have the issue, while the rpi 4.19.y kernels around those versions did: maybe OpenWrt didn't use the same kernel, or had a patch applied on top of it (which was lost in the switch to 5.4)? I'll try to find those kernels for more testing on Raspbian.
For 32 bits:
The problem didn't exist at all in the rpi-4.19.y versions (0% CPU usage from ksoftirqd for both TCP and UDP on rpi-4.19.50, rpi-4.19.93, rpi-4.19.127).
But it appeared starting with the 4.20.y versions (at least rpi-4.20.17) and partially disappeared starting with rpi-5.6.y (tested rpi-5.6.19), whereas on 64 bits the problem only disappears with rpi-5.7.y.
I say "partially" because on 32 bits, with current kernel versions, ksoftirqd still uses 30% of 1 core during the 1 Gbps TCP test (while it used no CPU on 4.19.y).
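For anyone who wants to reproduce the CPU figures above without watching top by hand, here is a minimal Python sketch that samples /proc/stat twice and prints the share of time each core spent in softirq. This is not what I used for the numbers above, only one possible way to get comparable ones. The function names (parse_proc_stat, softirq_percent, sample) and the 5 second interval are my own choices, and the per-core softirq time from /proc/stat is only a proxy for the ksoftirqd usage shown by top, so the two may not match exactly.

```python
import time


def parse_proc_stat(text):
    """Return {cpu_name: (softirq_jiffies, total_jiffies)} for each per-core line."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        # Keep per-core lines ("cpu0", "cpu1", ...) and skip the aggregate "cpu" line.
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        values = [int(v) for v in fields[1:]]
        # Field order: user nice system idle iowait irq softirq steal guest guest_nice
        softirq = values[6]
        # guest and guest_nice are already counted in user and nice, so stop at steal.
        total = sum(values[:8])
        stats[fields[0]] = (softirq, total)
    return stats


def softirq_percent(before, after):
    """Percentage of elapsed time spent in softirq per core between two snapshots."""
    result = {}
    for cpu, (softirq_after, total_after) in after.items():
        if cpu not in before:
            continue
        softirq_before, total_before = before[cpu]
        elapsed = total_after - total_before
        if elapsed > 0:
            result[cpu] = 100.0 * (softirq_after - softirq_before) / elapsed
        else:
            result[cpu] = 0.0
    return result


def sample(interval=5.0, path="/proc/stat"):
    """Take two snapshots `interval` seconds apart and return per-core softirq %."""
    with open(path) as f:
        before = parse_proc_stat(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_proc_stat(f.read())
    return softirq_percent(before, after)
```

Run it on the Pi 4 while the iperf test is in progress, for example by calling sample() and printing the returned dictionary. A core stuck near 100% would correspond to the affected kernels, and values near 0% to the fixed ones.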
PS:
If someone would like to run those kernels for their own tests without having to rebuild, here are my builds (modules, dtb, overlays, and of course the kernel file itself):
https://pix-server-sorel.pixconfig.fr/Manual/32-bits-rpi-bcm2711-built-kernels.tar.gz
https://pix-server-sorel.pixconfig.fr/Manual/64-bits-rpi-bcm2711-built-kernels.tar.gz
I'll search a little more on Google to see what changed in rpi-5.6.y and rpi-5.7.y (and later) regarding network performance. So far I haven't found anything interesting, apart from someone who encountered the issue and didn't solve it, but worked around it just enough to get his 4 Gbps on a Compute Module... so nothing useful about how this issue actually got fixed.