Irqbalance on x86 worth it?

Hi,

I just read about irqbalance. I have a 500Mbit fibre connection (actually 1Gbit, throttled down to 500Mbit by the provider in the media converter). I use a fanless 2-core x86 Atom box to run OpenWrt as my main (wired only) router. When I run a bandwidth test I get around 450-ish Mbit/s and my sirq hovers around 44% (according to top).
I plan to upgrade to 1Gbit and wonder whether my current setup will be able to support it.

So my questions are:

  1. Is irqbalance doing any good on an x86 build, and should I install it?
  2. Is sirq affected by irqbalance at all, or is it hw irq only?
  3. Does "50% sirq" on a 2-core system mean that my CPU is now maxed out, or is there still headroom?

Thanks!

This is load during 450Mbit download:



Install and run htop while doing your speedtest; that will show you whether one core is maxed out or both cores are running at half capacity. As for irqbalance, it's honestly something you just have to try: it can improve the situation on some systems but does nothing on others.
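If you don't have htop yet, it's in the standard package feed (a quick sketch, assuming your router has opkg access):

# install htop and run it while the speedtest is going
opkg update
opkg install htop
htop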


Thanks. Done.
Does 82% on one core @ 0.5Gbit mean I only have headroom for a maximum of ~0.6Gbit, or will it be able to "split" the burden with the other core once I upgrade to 1Gbit WAN? If not, can I make it use the other core with irqbalance?

This is load @0.5Gbit/sec:

That does look like you are running into a CPU bottleneck. You can try rerunning the test with irqbalance running to see if the situation improves. Better yet, if you have some time to spare, you can determine your maximum speed without having to guess and without needing access to a 1Gbit internet connection :slight_smile:

Connect a PC to your router's WAN port. Set a static IP in a private range on that PC, and use the same subnet on the WAN interface of the router. Now run iperf3 in server mode on this PC. Next, connect a second PC via one of the LAN ports as normal, and run iperf3 in client mode with the first PC's IP as the target. Make sure you leave stuff like the firewall, masquerading, etc. enabled on the router, since that also takes up CPU cycles and will be in use on your real internet connection. Make sure you test iperf3 both in normal mode and in reverse mode (-R switch) to cover both directions. Also try playing around with more than one stream to simulate multiple connections taking up bandwidth.

Now iperf3 will tell you how much traffic your router can handle WAN <=> LAN while doing NAT and running a firewall, without needing access to a 1Gbit internet connection :slight_smile:
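As a rough sketch of the commands involved (the 192.168.50.x subnet for the temporary WAN link is just an example; any private range that differs from your LAN will do):

# on the PC cabled to the WAN port (e.g. static 192.168.50.2/24, router WAN set to 192.168.50.1/24)
iperf3 -s

# on the PC on the LAN side: upload direction (client -> server through the router)
iperf3 -c 192.168.50.2

# download direction (-R makes the server send), then again with 4 parallel streams
iperf3 -c 192.168.50.2 -R
iperf3 -c 192.168.50.2 -R -P 4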


Good idea, I will try it! :+1:

I will be a little underwhelmed if a two-core x86 @ 1.6GHz cannot handle routing/NAT @ 1Gbit at wire speed :frowning:

I forgot to mention: make sure LAN and WAN are both on different subnets.

You should definitely run irqbalance to test.
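On OpenWrt it ships disabled out of the box, so roughly the following should get it going (a sketch, assuming the stock package and its /etc/config/irqbalance):

opkg update
opkg install irqbalance
# flip the 'enabled' option and (re)start the daemon
uci set irqbalance.irqbalance.enabled='1'
uci commit irqbalance
/etc/init.d/irqbalance restart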

Are you running SQM or other QoS?

The Atom series processors are pretty lightweight. You don't mention which exact processor you have.

I have an 8-core C3758 and I've done a fair bit of investigation into network tuning.

  1. No. It simply redistributes the IRQ affinity across CPUs periodically based on historical usage; it's not really balancing per se.
  2. No, only hw irqs (see the snippet after this list for how to check where they currently land).
  3. 50% sirq means 50% of your total CPU time is being spent servicing softirqs.
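To see where the hardware IRQs are currently landing (and which IRQ numbers belong to which NIC queue), check the counters before and after a speedtest; the IRQ number below is just a placeholder:

# per-CPU interrupt counters, one row per IRQ source
cat /proc/interrupts
# current CPU affinity bitmask for a given IRQ (e.g. IRQ 27)
cat /proc/irq/27/smp_affinity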

htop, in the setup screen, can be configured to show detailed CPU stats; it shows sirqs in magenta.

This really depends on what you're doing. If you add packet capture applications such as snort, softflowd, etc., then your sirq usage will go up considerably. Using SQM cake also causes usage to jump a lot.

If you have one or two of these, it's probably best to pin them to different cpu cores with taskset.
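For example (process names and core choices here are purely illustrative):

# pin an already-running snort to CPU 1 (affinity mask 0x2 = second core)
taskset -p 2 "$(pidof snort)"
# start softflowd pinned to CPU 0 (mask 0x1 = first core)
taskset 1 /usr/sbin/softflowd -i eth0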

I have had much better results manually tuning interrupt affinity on my 8-core system. With a two-core system you have far fewer choices available. I'd probably put tx queues on one core and rx queues on the other, so that on a download one core processes rx for the WAN and the other core processes rx for the LAN, and the inverse for tx.

EDIT: how you do it is situation dependent; there's no right answer. For example, if you're running snort on your LAN interface you might want to pin snort to a CPU core and both the tx and rx IRQs for the LAN interface to that same core so that processing happens locally, while the other core handles the WAN interface (and possibly any interrupts generated by an SQM instance you might have running on the WAN interface).

This is a pretty useful background article on the network stack


Nope. No need to, as it is a very stable, low-latency fibre connection. Just plain vanilla NAT, guest WLAN and OpenVPN (which was not in use during the test). I do have 30 or so traffic rules in my firewall though, most of the type "do not let the device with MAC xxx go out on the Internet" or "let the device with IP xxx be accessible from network xxx".

Thanks for the great reply. I do not run anything terribly CPU intensive (except OpenVPN, but it is not used frequently). The only things I can think of are the 30 traffic rules (of a simple type, like "do not let IP xxx go out on the Internet") and Yamon (which is monitoring Internet usage).

I will do an end-to-end iperf3 test over a wired connection and check again. If it is able to route @ 1Gbit with some % to spare, I am fine :slight_smile:

You'll definitely benefit from assigning the smp_affinity manually. If you don't, the IRQs tend to cluster on cpu0.
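A minimal sketch of doing that by hand on a two-core box (the IRQ numbers are hypothetical; read the real ones from /proc/interrupts first):

# find the IRQ numbers used by the NIC queues
grep eth /proc/interrupts
# smp_affinity is a CPU bitmask: 1 = cpu0, 2 = cpu1
echo 2 > /proc/irq/27/smp_affinity    # e.g. move the WAN NIC's IRQ to cpu1
echo 1 > /proc/irq/28/smp_affinity    # e.g. keep the LAN NIC's IRQ on cpu0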

My understanding is that this is older behaviour from several years ago. It doesn't do anything periodically, but rather makes a one-time attempt to distribute the IRQs across all CPUs.

50% of all CPU cycles; but obviously, from the htop result, one CPU is used almost completely.

Convert these to ipsets and you can probably drop this down to about 3 or 4 rules.
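Something along these lines in /etc/config/firewall, with one ipset holding the blocked hosts and a single rule referencing it (names and addresses here are made up):

config ipset
    option name 'no_internet'
    option match 'src_ip'
    list entry '192.168.1.50'
    list entry '192.168.1.51'

config rule
    option name 'Block-no_internet-hosts'
    option src 'lan'
    option dest 'wan'
    option ipset 'no_internet'
    option target 'REJECT'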

The default init script behaviour still seems to be to run it as a daemon with a 10s refresh interval rather than as a one-shot:

#!/bin/sh /etc/rc.common

START=90
STOP=10

USE_PROCD=1

service_triggers()
{
    procd_add_reload_trigger "irqbalance"
}

start_service() {
    local enabled
    config_load 'irqbalance'
    config_get_bool enabled irqbalance enabled 0
    [ "$enabled" -gt 0 ] || return 0

    # 10 is the default
    config_get interval irqbalance interval 10

    # A list of IRQ's to ignore
    banirq=""
    handle_banirq_value()
    {
        banirq="$banirq -i $1"
    }
    config_list_foreach irqbalance banirq handle_banirq_value

    procd_open_instance "irqbalance"
    # $banirq is left unquoted so each "-i N" expands as a separate argument
    procd_set_param command /usr/sbin/irqbalance -f -t "$interval" $banirq
    procd_set_param respawn
    procd_close_instance
}
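For reference, the /etc/config/irqbalance file that this script parses would look roughly like this (the interval and banned IRQ numbers are just example values):

config irqbalance 'irqbalance'
    option enabled '1'
    option interval '10'
    list banirq '36'
    list banirq '38'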

Curious as to why dnsmasq is chewing 30% of your cpu in this image. Are you doing some kind of dns benchmark or running something that's making many dns queries?

OK guys, I have now run iperf3 LAN <-> WAN (in both directions) over a cabled connection. My fears were somewhat unfounded, as I got around 940Mbit/s both ways using 1-8 simultaneous streams. When I used 32 streams the bandwidth went down to 700Mbit/s-ish (but I am unsure whether it was the router or my PC that was the limiting factor).

htop @ 940Mbit using 4 streams:

So my conclusion is that there is plenty of headroom in this x86 router (using a Celeron(R) CPU N3050 @ 1.60GHz) for 1Gbit speeds. (OpenVPN is CPU-limited to ~170Mbit using 256-bit crypto, but I plan to replace it with WireGuard in the future.)

There's headroom, as long as you're not using a packet inspection app. Remember that a single stream will be processed by one irq and therefore one CPU core.

Add a pcap-type capture mechanism and you're going to see a whole lot more softirqs, and that 85% usage will likely hit 100% instantly. If you want to see what I mean, try running something like softflowd on one of the interfaces and test again.
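If you want to try it, softflowd is in the package feed; something like this starts it exporting flows from the WAN interface (the interface name and collector address are just examples):

opkg install softflowd
# -i: interface to capture on, -n: NetFlow collector host:port, -v: NetFlow version
softflowd -i eth1 -n 192.168.1.100:9995 -v 5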

Also try running multiple iperf3 client sessions in parallel against different ports, e.g.

iperf3 -c 192.168.1.1 -P 8 -p 5201
iperf3 -c 192.168.1.1 -P 8 -p 5202

(with a matching iperf3 -s -p 5201 / iperf3 -s -p 5202 server instance listening on each port).

Check the PassMark chart for CPU performance; single-thread performance is what matters most. But even on an Atom D525 I can route 940Mbps both ways without problems, given good NICs. I have a thin client with a Celeron 847 and an Intel i340-T4, and with PPPoE I can do NAT @ 940Mbps without problems; those NICs have 4 tx and 4 rx queues and that does the work.