Limited ethernet performance on APU2

Hi,
I have two APU2s, configured as access points, connected to 1 Gbit ethernet.
Testing with iperf3 shows 900 Mbits/sec when sending data from the APU2, which is fair enough.
But when receiving, it can only achieve 300 Mbits/sec.

It should be noted that when sending, CPU usage goes to 100% on a single core (the box has 4 cores), but when receiving, it never goes over 30% on any single core.
There is a similar thread but no solution there either: APU2c4 with 19.07 poor ethernet performance

Running a fresh OpenWrt 22.03.0, but the boxes had the same issue with earlier releases, too.
Any hints?
Thanks,
shpokas

I ran more or less the same tests last year on two APU2 boards with clients behind them (i.e. iperf3 was launched on two computers that were connected through two APU2 OpenWrt boxes), like this:

PC --- APU2 (firewall, nat, vpn) --- APU2 (firewall, nat, vpn) --- PC

My test was meant to measure VPN performance rather than ethernet performance, but I measured ethernet throughput as well.

My results (from memory) are that ethernet performance is fine at 970+ Mbit/s with low CPU use (when iperf3 is NOT running on the APU box itself), OpenVPN performance is about 300 Mbit/s with one CPU core maxed out (OpenVPN is not multicore), and WireGuard performance is about 700 Mbit/s.

I know my results are not what you are asking for, but I'd say that for a real-life scenario, where traffic is not generated or terminated on the firewall itself, the APU2 works fine.

I'd also suppose that the issue you are seeing is something related to how iperf3 works and its use of a single core.
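
One quick way to probe that theory (just a sketch; <server-ip> stands for your iperf3 server, and note that iperf3 releases before 3.16 still drive all parallel streams from a single thread) would be to compare a single stream against several parallel ones in both directions:

iperf3 -c <server-ip> -P 4
iperf3 -c <server-ip> -P 4 -R

If the receive direction scales up with -P while a single stream stays around 300 Mbit/s, that would point at a per-stream or per-core limit rather than the hardware.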

Hi,
thanks for the reply!

But why would the command
iperf3 -c 10.26.0.1 -R
work differently from
iperf3 -c 10.26.0.1

The first one instructs the iperf3 server to send the data (so the APU2 receives); the second one sends data from the APU2 to the iperf3 server.
And I don't have anything in between; both the iperf3 server and the APU2 are on the same network segment, 10.26.0.0/24, with the firewall disabled on the APU2.

Results:

iperf3 -c 10.26.0.1 -R
...
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   360 MBytes   302 Mbits/sec  1152             sender
[  5]   0.00-10.00  sec   360 MBytes   302 Mbits/sec                  receiver

iperf3 -c 10.26.0.1
...
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1.01 GBytes   871 Mbits/sec   12             sender
[  5]   0.00-10.00  sec  1.01 GBytes   869 Mbits/sec                  receiver

I don't know for sure, but maybe it has something to do with the number of interrupts per second? Something that is not directly related to CPU time gets maxed out and becomes a bottleneck.
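
I suppose one way to check that would be to watch how fast the NIC's interrupt counters move in /proc/interrupts while the test runs, something like this (eth0 is just a placeholder for the actual interface name):

watch -n 1 "grep eth0 /proc/interrupts"

or, without watch, take two samples a few seconds apart and compare the counts.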

Look at the second row from the top of the "top" command output, where the totals are shown. What are the totals while running the tests? (Run a long test and allow about 5 seconds after starting it for the values to stabilize before reading them.)
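
If the live view is hard to catch at the right moment, you can also grab just those rows from a second SSH session mid-test, assuming your busybox build includes top's batch mode:

top -b -n 1 | head -n 2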

Mem: 122216K used, 1789764K free, 1536K shrd, 14572K buff, 23544K cached
CPU:   0% usr   0% sys   0% nic  95% idle   0% io   0% irq   3% sirq

I'd expect some high values on irq or sirq.

One way to quickly assess the load is to calculate 100 - %idle, as some tools do not report irq/sirq.
However, with a multicore router, busybox top's aggregate load numbers (combined for all CPUs) are not that helpful; htop (an installable package should be available) reports load for each CPU individually and can be configured to also show sirq.
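
If installing htop is not an option, a rough per-core picture can also be pulled straight from /proc/stat. The snippet below is only a sketch (it sums the first seven counters per CPU: user, nice, system, idle, iowait, irq, softirq) and should work with busybox awk:

grep '^cpu[0-9]' /proc/stat > /tmp/stat1; sleep 1; grep '^cpu[0-9]' /proc/stat > /tmp/stat2
awk 'NR==FNR { for (i=2; i<=8; i++) tot[$1] += $i; idle[$1] = $5; next }
     { t = 0; for (i=2; i<=8; i++) t += $i;
       dt = t - tot[$1]; di = $5 - idle[$1];
       printf "%s: %.0f%% busy\n", $1, 100 * (dt - di) / dt }' /tmp/stat1 /tmp/stat2

Anything close to 100% busy on one core during the test would be the smoking gun.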

Doesn't seem that much to me.

85% idle on a four-core CPU can mean anything from each core being loaded at 15% to a single core being loaded at 60%. Even that would not be alarming, but it illustrates that top's aggregate single load percentage is not the best tool to detect/diagnose issues that might be caused by overload of individual CPUs.

No, it's definitely not the IRQ or SIRQ. I have run out of ideas.

I already said in the first post that individual, single-core usage never goes over 30% when receiving data.
I would love to see 100% usage on a single core, because that is what gives 900 Mbits/sec when iperf data is sent to the server.

htop screenshots (images not included here): one with iperf sending, one with iperf receiving.

Well, what are the alternatives to iperf3? It seems this tool dominates the testing space, but I have already run into a bug with an old version of iperf3 that was distributed with Debian 9.

Iperf 2.3.8-rc1 and bufferbloat testing - #4 by frollic might be useful

TL;DR: try iperf instead of iperf3


Same result, unfortunately.

root@ap1:~# iperf -c 10.26.0.1
------------------------------------------------------------
Client connecting to 10.26.0.1, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  1] local 10.26.0.2 port 58958 connected with 10.26.0.1 port 5001
[ ID] Interval       Transfer     Bandwidth
[  1] 0.00-10.05 sec  1.07 GBytes   913 Mbits/sec
root@ap1:~# iperf -c 10.26.0.1 -R
------------------------------------------------------------
Client connecting to 10.26.0.1, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  1] local 10.26.0.2 port 35238 connected with 10.26.0.1 port 5001 (reverse)
[ ID] Interval       Transfer     Bandwidth
[ *1] 0.00-10.01 sec   381 MBytes   320 Mbits/sec

Are you running iperf on the APU itself? That is a meaningful test, albeit not a test of the APU's capabilities as a router; for that you are better off moving the iperf server and client to different computers (just make sure these devices are not directly connected by a switch).
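
For reference, the kind of layout I mean (addresses and host names are only placeholders):

PC1 (iperf3 -s) --- APU2 (routing/firewall) --- PC2 (iperf3 -c <PC1-address>)

so the traffic is forwarded by the APU2 instead of being generated or terminated on it.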

Yes, you did; that is why I was surprised to see plain top output with one aggregate CPU line.
For htop you can and should enable detailed reporting for the CPU bars so you can see sirq immediately (this can be done somewhere in the configuration). I expect this not to change the picture, but it will give a better feel for where the CPU spends its time.
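
If it helps, the toggle I have in mind should be (from memory, so treat it as approximate): F2 to open htop's setup screen, then "Display options", then enable "Detailed CPU time (System/IO-Wait/Hard-IRQ/Soft-IRQ/Steal/Guest)". The CPU bars then show sirq as its own segment.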

I booted a SystemRescue CD and got somewhat opposite results.
Now sending is slower than receiving, which runs at almost full wire speed.
I also made sure the interface used is the same in both OpenWrt and SystemRescue.
So it is clearly something in the OS, but it is not clear what exactly.

[root@sysrescue ~]# iperf3 -c 10.26.0.1
...
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   714 MBytes   599 Mbits/sec    0             sender
[  5]   0.00-10.00  sec   712 MBytes   597 Mbits/sec                  receiver

[root@sysrescue ~]# iperf3 -c 10.26.0.1 -R
...
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1.09 GBytes   940 Mbits/sec   19             sender
[  5]   0.00-10.00  sec  1.09 GBytes   939 Mbits/sec                  receiver

Well, I found what the problem is here: a BIOS setting caused this issue.
"PCIe power management features" - when enabled, it causes a significant throughput reduction, not only for wired but for wireless transmission as well.
Not sure if this is a bug or a feature, but - DO NOT enable it!
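
For anyone who wants to cross-check what the OS sees, two quick looks (only a sketch; lspci comes from the pciutils package on OpenWrt, and the sysfs file is only present if the kernel was built with ASPM support):

cat /sys/module/pcie_aspm/parameters/policy
lspci -vv | grep -i aspm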


How did you find the solution? I would never have thought of looking in the BIOS.

I actually started by going back to old BIOSes, because I had run out of ideas, but at the same time I sort of remembered having seen better throughput before.
The old BIOSes did not show the problem, so I was determined to find which BIOS introduced it, until I found that it's not actually the BIOS itself but a setting in it :slight_smile:
It should be noted that for the APU2 a BIOS upgrade resets customizations to their default values; that helped a little, too :wink:


And, finally, it's been known all along...
https://github.com/pcengines/sortbootorder

  • u PCIe power management features - enables/disables PCI Express power management features like ASPM, Common Clock, Clock Power Management (if supported by PCI Express endpoints). Enabling this option will reduce the power consumption at the cost of a performance drop of Ethernet controllers and WiFi cards

Oh yes!

How did you get into the APU2 BIOS via minicom? I tried disabling macros in minicom and using F10, but minicom does not seem to pass the F10 through to serial, or coreboot isn't responding to it.

My hardware is an APU2, but the BIOS says during boot:

PC Engines apu1                                                                 
coreboot build 20183108                                                         
BIOS version v4.8.0.3