Upload speed on 19.07.3

Yes, it is PPPoE with VLAN tagging and a media converter. The reason I gave up on my ISP-provided router was that every time I tried to configure it, it would die and not allow me to log in again. So, I have a big brick sitting on a shelf.

The only question left is why it happens on the upload and not on the download. It is the same PPPoE connection.

It is and it is not. A different amount of effort can be required to create a packet for sending vs. processing a packet on receive. The test I suggested would help. Otherwise, you should open another topic to understand why sending is more expensive than receiving.


Thank you fantom-x.

If I can add my two cents here, it's clear from the top output that the CPU is being maxed out by softirqs.

As far as I can see from the specs on this device and your top output, it's a dual-core CPU. So what might be happening is that on a download both cores process the interrupts fairly equally, while on an upload one core processes a lot more interrupts than the other. The softirqs should be handled by whatever core handles the actual hardware interrupt.

The top output suggests this may be the case, since I note 45% sirq on core 0 and 25% on core 1 in the upload screenshot of top you posted.

So, cat /proc/interrupts just before doing a download and just after. Subtract one from the other for each IRQ/core for the network card interrupts to see how they're distributed across both cores.

Then do the same on an upload.
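A rough sketch of that comparison (the file names and the grep pattern are just examples; match whatever your NIC's lines are called in /proc/interrupts):

    # snapshot the per-core interrupt counts around a transfer
    cat /proc/interrupts > /tmp/irq_before
    # ... run the download (or upload) speed test here ...
    cat /proc/interrupts > /tmp/irq_after
    # compare the counts on the network IRQ lines
    diff /tmp/irq_before /tmp/irq_after | grep -i eth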

I surmise you may find that on the upload one core is doing a lot more work, which would make your upload CPU-bound.

If this is the case then you should (assuming it's supported by the kernel) be able to assign the interrupt affinity of the TX and RX IRQs manually to each core to fix the problem.
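A rough sketch of what I mean, assuming the kernel exposes smp_affinity for the NIC's IRQs; the IRQ numbers 40 and 41 are invented, read the real ones from /proc/interrupts first:

    # the value is a CPU bitmask: 0x1 = core 0, 0x2 = core 1
    echo 1 > /proc/irq/40/smp_affinity   # pin the (hypothetical) TX IRQ to core 0
    echo 2 > /proc/irq/41/smp_affinity   # pin the (hypothetical) RX IRQ to core 1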

This is pure speculation, but it's worth checking.

The other thing that may help here is to post a full listing of all processes running on the router; the top output you have posted doesn't show everything.

This is wrong: do not look at the individual processes' CPU utilization. Look at the second line of the summary, which shows 3% idle CPU: I do not see how that could be made any more even.

Also misleading: softirqs are monitored via cat /proc/softirqs.
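For example, assuming BusyBox watch is available:

    # the NET_RX / NET_TX rows show how network softirq work is split per core
    watch -n 1 cat /proc/softirqs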

You're misreading what I said.

The total interrupts generated can't be changed, but their distribution across the cores CAN be changed.

Softirqs are monitored through /proc/softirqs, but one cannot change their affinity. So look at the way the hard IRQs are distributed and make sure they are distributed evenly.

One CAN change the affinity of the hard irqs, which would indirectly affect where the softirqs are processed.

In fact, for my 8-core router it is necessary to assign hardware IRQs manually to prevent a single core from being maxed out.

You'll note that I said "The softirqs should be handled by whatever core handles the actual hardware interrupt." I didn't state that the /proc/interrupts counters are the softirqs. They're obviously not.

No, I am not.

Yes, one can do that: find /sys/devices/platform/ -name "?ps_cpus". You can manage softirq affinity there. It is more difficult to tune properly, though.
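For example (eth0 and the mask are assumptions; those ?ps_cpus files are the per-queue rps_cpus/xps_cpus knobs, also reachable under /sys/class/net):

    # allow receive softirq processing for eth0 queue 0 on both cores (mask 0x3)
    echo 3 > /sys/class/net/eth0/queues/rx-0/rps_cpus
    # same idea for transmit, where the driver exposes xps_cpus
    echo 3 > /sys/class/net/eth0/queues/tx-0/xps_cpus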

This is not always true: both kinds of IRQs can be managed separately, and I get the best performance from my router by assigning hard interrupts to specific cores AND allowing soft IRQs to run on any core: this gave me the most even CPU utilization and the best throughput. By default, the soft IRQs do indeed run on the same core that services the hard IRQs, but there is no requirement for that, nor is it the best configuration for all use cases.
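On a two-core device like the OP's, that combination would look roughly like this (same invented IRQ numbers and interface name as in the earlier sketches):

    # hard IRQs pinned: TX on core 0, RX on core 1
    echo 1 > /proc/irq/40/smp_affinity
    echo 2 > /proc/irq/41/smp_affinity
    # soft IRQ (RPS) processing free to run on either core
    echo 3 > /sys/class/net/eth0/queues/rx-0/rps_cpus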

What router is that? I am jealous....

Yep, that's precisely what I do and what I was suggesting, although I obviously didn't express it clearly enough. The softirqs, if they can run on any core, should more or less follow the distribution of the hardware IRQs.

So I was suggesting to monitor /proc/interrupts to see how the hardware IRQs are actually being distributed on both upload and download, as I'm guessing that on upload one core may be doing a lot more work.

If that's the case, then fixing that while allowing the softirqs to run on any core should distribute the load more evenly.

https://www.supermicro.com/en/products/motherboard/A2SDi-8C-HLN4F

16GB RAM and 250GB SSD. It's a bit of overkill, I know...

The total idle CPU is 3%: why do you think that only one core is used, then?

The OP noted that during a speed test the CPU is not maxed out all the time, only for a "brief 1-2 seconds", so I was not taking (perhaps erroneously) the screenshots of top showing 3%/5% idle as indicative of the sustained load during the speed test, just the peak load or the load at the moment the screenshot was taken.

If the overall CPU idle percentage is in the low single digits only briefly, as he claimed, then it's possible that at other times one core is capped out while the other runs at lower utilization. Over time, that would limit the upload speed because of the capped-out core.

There's a lot we don't know based on the limited information he's supplied, so most of this is just pure guesswork. Maybe the OP can make a video of the top output during an upload that could give more information.

See the video of the speed test:

Video

I did some research and found this article and the corresponding script. Should it be placed in Startup -> Local Startup, followed by a reboot?

You do not need that script any more. I fixed the irqbalance configuration/script a while ago, so just install it and enable it in /etc/config/irqbalance, followed by reload_config.
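Roughly like this (the section name is the package default, as far as I recall):

    opkg update
    opkg install irqbalance
    # turn it on in /etc/config/irqbalance, then apply
    uci set irqbalance.irqbalance.enabled='1'
    uci commit irqbalance
    reload_config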

I wonder why I can't get 900 down on almost identical hardware? OpenWRT 19.07 and bcm53xx target + flow offloading.

I have installed irqbalance:

[screenshot]

There is some improvement in speed, but it is marginal, and the difference in CPU use is as below:

[screenshot]

I have set the option interval: first to 10, then 5, and lastly to 1.
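For reference, the relevant part of /etc/config/irqbalance ends up looking like this (section name assumed from the package default):

    config irqbalance 'irqbalance'
        option enabled '1'
        # seconds between re-evaluations of the IRQ distribution
        option interval '1'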

I guess I will have to live with this until I change the router or change the architecture of my home network.

You already had an even CPU load distribution, so you should not need irqbalance. If you keep running it, do not decrease the interval too much, because irqbalance will use more and more CPU itself. Plus you will be invalidating CPU caches when the IRQs constantly jump between cores.

I changed it back and then disabled irqbalance; if there is a difference one way or the other, it is not significant.

I think the workaround I found might help: OpenWRT 19.07 and bcm53xx target + flow offloading