Irqbalance not working?

I have installed the "irqbalance" package on a Linksys WRT3200ACM running OpenWrt 18.06. This package only contains a binary and no configuration files, so I added a line to "/etc/rc.local" to launch it (with no additional parameters), and upon reboot "ps" shows the program is running.

However, when I look into "/proc/interrupts", this is what I see:

           CPU0       CPU1       
 17:          0          0     GIC-0  27 Edge      gt
 18:   31286472   45819630     GIC-0  29 Edge      twd
 19:          0          0      MPIC   5 Level     armada_370_xp_per_cpu_tick
 21:   22209654          0     GIC-0  34 Level     mv64xxx_i2c
 22:         21          0     GIC-0  44 Level     ttyS0
 36:   15572107          0      MPIC   8 Level     eth1
 37:    2380424          0      MPIC  12 Level     eth0
 38:       8759          0     GIC-0  50 Level     ehci_hcd:usb1
 39:          0          0     GIC-0  51 Level     f1090000.crypto
 40:          0          0     GIC-0  52 Level     f1090000.crypto
 41:          0          0     GIC-0  58 Level     ahci-mvebu[f10a8000.sata]
 42:      40796          0     GIC-0 116 Level     f10d0000.flash
 43:     211416          0     GIC-0  57 Level     mmc0
 44:     453943          0     GIC-0  49 Level     xhci-hcd:usb2
 45:          2          0     GIC-0  54 Level     f1060800.xor
 46:          2          0     GIC-0  97 Level     f1060900.xor
 47:          0          0  f1018100.gpio  24 Edge      gpio-keys
 48:          0          0  f1018100.gpio  29 Edge      gpio-keys
 49:       1430   22168991     GIC-0  61 Level     mwlwifi
 50:   24566424          0     GIC-0  65 Level     mwlwifi
IPI0:          0          1  CPU wakeup interrupts
IPI1:          0          0  Timer broadcast interrupts
IPI2:    6069090    6516981  Rescheduling interrupts
IPI3:      27303    7712031  Function call interrupts
IPI4:          0          0  CPU stop interrupts
IPI5:          0          0  IRQ work interrupts
IPI6:          0          0  completion interrupts
Err:          0

A couple of interrupts seem to be balanced, and one runs mostly on CPU1, but most of them are served exclusively by CPU0. Is this normal behaviour, or should the interrupts be more balanced? Can anything be done to improve the situation, or should I consider this "good enough"?
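
One way to make the imbalance easier to see is to list only the IRQs whose counts all landed on a single CPU. This is a minimal sketch, not part of irqbalance: the function name `single_cpu_irqs` is made up for illustration, and it assumes the usual /proc/interrupts layout (a header with one column per CPU, followed by numbered IRQ lines).

```shell
#!/bin/sh
# Hypothetical helper: read /proc/interrupts-style text on stdin and print
# the IRQs that were only ever served by one CPU.
single_cpu_irqs() {
    awk '
        NR == 1 { ncpu = NF; next }          # header: one field per CPU column
        $1 ~ /^[0-9]+:$/ {                   # numbered IRQ lines only (skips IPI/Err)
            total = 0; busy = 0
            for (i = 2; i <= ncpu + 1; i++) {
                total += $i
                if ($i > 0) busy++
            }
            if (total > 0 && busy == 1) {    # active, but on exactly one CPU
                irq = $1; sub(/:$/, "", irq)
                printf "%s %s %d\n", irq, $NF, total
            }
        }
    '
}
```

On the router it would be run as `single_cpu_irqs < /proc/interrupts`. Each output line holds the IRQ number, the last field of the line (usually the device name) and the total count.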

The package does ship a config file in master, but you are right, not in 18.06:

# cat /etc/config/irqbalance
config irqbalance 'irqbalance'
        option enabled '1'

For comparison, on ipq8065/nbg6817 (master, kernel 4.19):

# cat /proc/interrupts 
           CPU0       CPU1       
 16:    4855869   10819277     GIC-0  18 Edge      gp_timer
 18:         33          0     GIC-0  51 Edge      qcom_rpm_ack
 19:          0          0     GIC-0  53 Edge      qcom_rpm_err
 20:          0          0     GIC-0  54 Edge      qcom_rpm_wakeup
 29:          0          0     GIC-0 202 Level     adm_dma
 30:   29055904    1445708     GIC-0 255 Level     eth0
 31:   42474368     940190     GIC-0 258 Level     eth1
 32:       8903        628     GIC-0 130 Level     bam_dma
 33:          0          0     GIC-0 128 Level     bam_dma
 34:      31042       2157     GIC-0 136 Level     mmci-pl18x (cmd)
 36:          0          0   PCI-MSI   0 Edge      aerdrv
 38:          0          0   PCI-MSI 134217728 Edge      aerdrv
 39:          7          0     GIC-0 184 Level     msm_serial0
 40:     386087          0     GIC-0 187 Level     1a280000.spi
 41:          1          0   msmgpio  53 Edge      keys
 42:          2          0   msmgpio  54 Edge      keys
 43:          2          0   msmgpio  65 Edge      keys
 44:          0          0     GIC-0 142 Level     xhci-hcd:usb1
 45:     182350      12826     GIC-0 237 Level     xhci-hcd:usb3
 46:    8891000          0   PCI-MSI 524288 Edge      ath10k_pci
 47:   16305129          0   PCI-MSI 134742016 Edge      ath10k_pci
IPI0:          0          0  CPU wakeup interrupts
IPI1:          0          0  Timer broadcast interrupts
IPI2:    1328451    4837561  Rescheduling interrupts
IPI3:       1110   17065108  Function call interrupts
IPI4:          0          0  CPU stop interrupts
IPI5:    5436588    7683214  IRQ work interrupts
IPI6:          0          0  completion interrupts
Err:          0

I've checked the package sources in "master". It does contain an init script and a config file, but they do nothing other than conditionally launch the program with the "-f" (foreground) parameter, so I do not see why mine should not work as well.

It used to work also there.

When I created the irqbalance package definition, I only defined the binary tool itself, not the surrounding service-launch scripts, which were added later. The binary itself should work, too.

(But I haven't checked with 18.06 for some time.)

It seems to work also on 18.06 with ipq806x/R7800:

 OpenWrt 18.06-SNAPSHOT, r7865-5880dd48d5
 -----------------------------------------------------
root@router1:~# cat /proc/interrupts
           CPU0       CPU1
 16:      12580      19697     GIC-0  18 Edge      gp_timer
 18:         33          0     GIC-0  51 Edge      qcom_rpm_ack
 19:          0          0     GIC-0  53 Edge      qcom_rpm_err
 20:          0          0     GIC-0  54 Edge      qcom_rpm_wakeup
 26:          0          0     GIC-0 241 Edge      ahci[29000000.sata]
 27:          0          0     GIC-0 210 Edge      tsens_interrupt
 28:      14001       3484     GIC-0  67 Edge      qcom-pcie-msi
 29:      13821          0     GIC-0  89 Edge      qcom-pcie-msi
 30:     177071          0     GIC-0 202 Edge      adm_dma
 31:       4773        299     GIC-0 255 Level     eth0
 32:       6495        261     GIC-0 258 Level     eth1
 33:          0          0     GIC-0 130 Level     bam_dma
 34:          0          0     GIC-0 128 Level     bam_dma
 35:          0          0   PCI-MSI   0 Edge      aerdrv
 36:      14001       3484   PCI-MSI   1 Edge      ath10k_pci
 68:          0          0   PCI-MSI   0 Edge      aerdrv
 69:      13821          0   PCI-MSI   1 Edge      ath10k_pci
101:         10          0     GIC-0 184 Level     msm_serial0
102:          2          0   msmgpio   6 Edge      gpio-keys
103:          2          0   msmgpio  54 Edge      gpio-keys
104:          2          0   msmgpio  65 Edge      gpio-keys
105:          0          0     GIC-0 142 Level     xhci-hcd:usb1
106:          0          0     GIC-0 237 Level     xhci-hcd:usb3
IPI0:          0          0  CPU wakeup interrupts
IPI1:          0          0  Timer broadcast interrupts
IPI2:      17699      19082  Rescheduling interrupts
IPI3:         48       9401  Function call interrupts
IPI4:          0          0  CPU stop interrupts
IPI5:       5698      10682  IRQ work interrupts
IPI6:          0          0  completion interrupts
Err:          0

But it seems to converge IRQs back to core0 if there is only light wifi traffic.

"Works" on IPQ4019, though I suspect that its algorithms are tuned for 8/12/32-core devices running as servers.

I'd probably hand-assign the IRQs across the cores if performance was a big issue for me. Two radios, two MACs, seems like one-per-core might be worth exploring.

Could you elaborate on how to assign interrupts to CPUs, please?

I believe setting the processor affinity mask would accomplish locking a given IRQ to one or more cores.

See the related discussion, e.g. here:

In short, write an affinity mask to each IRQ one by one, like:
echo 2 >/proc/irq/30/smp_affinity

Read them with:

cat /proc/irq/31/smp_affinity
2

The value is a hexadecimal bitmask:
1 = core 0
2 = core 1
4 = core 2, and so on.

3 = both core 0 and 1
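
The bitmask values above can also be computed rather than worked out by hand. This is a small sketch; the function name `cores_to_mask` is hypothetical, and it takes zero-based core numbers as arguments.

```shell
#!/bin/sh
# Hypothetical helper: turn a list of zero-based core numbers into the
# hexadecimal bitmask that smp_affinity expects.
cores_to_mask() {
    mask=0
    for core in "$@"; do
        mask=$(( mask | (1 << core) ))   # set the bit for this core
    done
    printf '%x\n' "$mask"
}
```

For example, `cores_to_mask 1 > /proc/irq/30/smp_affinity` (as root) would write the same value as the `echo 2` command above, and `cores_to_mask 0 1` yields the mask for both cores.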

Ok, looks like "irqbalance" is definitely not balancing interrupts:

# cat /proc/interrupts 
           CPU0       CPU1       
 17:          0          0     GIC-0  27 Edge      gt
 18:   42204708   62110861     GIC-0  29 Edge      twd
 19:          0          0      MPIC   5 Level     armada_370_xp_per_cpu_tick
 21:   30980104          0     GIC-0  34 Level     mv64xxx_i2c
 22:         21          0     GIC-0  44 Level     ttyS0
 36:   21427660          0      MPIC   8 Level     eth1
 37:    3239131          0      MPIC  12 Level     eth0
 38:       8759          0     GIC-0  50 Level     ehci_hcd:usb1
 39:          0          0     GIC-0  51 Level     f1090000.crypto
 40:          0          0     GIC-0  52 Level     f1090000.crypto
 41:          0          0     GIC-0  58 Level     ahci-mvebu[f10a8000.sata]
 42:      49564          0     GIC-0 116 Level     f10d0000.flash
 43:     271060          0     GIC-0  57 Level     mmc0
 44:     693583          0     GIC-0  49 Level     xhci-hcd:usb2
 45:          2          0     GIC-0  54 Level     f1060800.xor
 46:          2          0     GIC-0  97 Level     f1060900.xor
 47:          0          0  f1018100.gpio  24 Edge      gpio-keys
 48:          0          0  f1018100.gpio  29 Edge      gpio-keys
 49:       1430   32679330     GIC-0  61 Level     mwlwifi
 50:   33147418          0     GIC-0  65 Level     mwlwifi
IPI0:          0          1  CPU wakeup interrupts
IPI1:          0          0  Timer broadcast interrupts
IPI2:    8441851    8953602  Rescheduling interrupts
IPI3:      41218   10614183  Function call interrupts
IPI4:          0          0  CPU stop interrupts
IPI5:          0          0  IRQ work interrupts
IPI6:          0          0  completion interrupts
Err:          0
# cat /proc/irq/*/smp_affinity
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
2
1
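
The bare `cat` output above does not say which IRQ each mask belongs to. A loop that prints the IRQ number next to its mask makes that easier to read. This is only a sketch; the function name `show_affinity` and its optional directory argument are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print "IRQ mask" pairs. The optional argument is the
# directory to scan (defaults to /proc/irq), so it can be tried on a copy.
show_affinity() {
    root=${1:-/proc/irq}
    for f in "$root"/*/smp_affinity; do
        [ -r "$f" ] || continue
        irq=${f%/smp_affinity}      # strip the file name
        irq=${irq##*/}              # keep only the IRQ number
        printf '%s %s\n' "$irq" "$(cat "$f")"
    done
}
```

Calling `show_affinity` without arguments reads the live values from /proc/irq.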

So, I tried to launch "irqbalance" with debug options enabled:

# irqbalance -df
This machine seems not NUMA capable.
Isolated CPUs: 00000000
Adaptive-ticks CPUs: 00000000
Package 0:  numa_node is -1 cpu mask is 00000003 (load 0)
        Cache domain 0:  numa_node is -1 cpu mask is 00000002  (load 0) 
                CPU number 1  numa_node is -1 (load 0)
        Cache domain 1:  numa_node is -1 cpu mask is 00000001  (load 0) 
                CPU number 0  numa_node is -1 (load 0)
Adding IRQ 50 to database
Adding IRQ 49 to database
Adding IRQ 17 to database
Adding IRQ 18 to database
Adding IRQ 19 to database
Adding IRQ 21 to database
Adding IRQ 22 to database
Adding IRQ 36 to database
Adding IRQ 37 to database
Adding IRQ 38 to database
Adding IRQ 39 to database
Adding IRQ 40 to database
Adding IRQ 41 to database
Adding IRQ 42 to database
Adding IRQ 43 to database
Adding IRQ 44 to database
Adding IRQ 45 to database
Adding IRQ 46 to database
Adding IRQ 47 to database
Adding IRQ 48 to database
NUMA NODE NUMBER: -1
LOCAL CPU MASK: ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff




-----------------------------------------------------------------------------
Package 0:  numa_node is -1 cpu mask is 00000003 (load 0)
        Cache domain 0:  numa_node is -1 cpu mask is 00000002  (load 0) 
                CPU number 1  numa_node is -1 (load 0)
                  Interrupt 49 node_num is -1 (ethernet/3069092192:0) 
        Cache domain 1:  numa_node is -1 cpu mask is 00000001  (load 0) 
                CPU number 0  numa_node is -1 (load 0)
                  Interrupt 50 node_num is -1 (ethernet/3069092192:0) 
  Interrupt 48 node_num is -1 (other/3069092192:0) 
  Interrupt 47 node_num is -1 (other/3069285056:0) 
  Interrupt 46 node_num is -1 (other/3069285056:0) 
  Interrupt 45 node_num is -1 (other/3069285056:0) 
  Interrupt 44 node_num is -1 (other/3069285056:0) 
  Interrupt 43 node_num is -1 (other/3069285056:0) 
  Interrupt 42 node_num is -1 (other/3069285056:0) 
  Interrupt 41 node_num is -1 (other/3069285056:0) 
  Interrupt 40 node_num is -1 (other/3069285056:0) 
  Interrupt 39 node_num is -1 (other/3069285056:0) 
  Interrupt 38 node_num is -1 (other/3069285056:0) 
  Interrupt 37 node_num is -1 (other/3069285056:0) 
  Interrupt 36 node_num is -1 (other/3069285056:0) 
  Interrupt 22 node_num is -1 (other/3069285056:0) 
  Interrupt 21 node_num is -1 (other/3069285056:0) 
  Interrupt 17 node_num is -1 (other/3069285056:0) 
  Interrupt 19 node_num is -1 (other/3069285056:0) 
  Interrupt 18 node_num is -1 (other/3069285056:0) 



-----------------------------------------------------------------------------
Package 0:  numa_node is -1 cpu mask is 00000003 (load 460000000)
        Cache domain 0:  numa_node is -1 cpu mask is 00000002  (load 300000000) 
                CPU number 1  numa_node is -1 (load 300000000)
                  Interrupt 49 node_num is -1 (ethernet/3069092192:99501050) 
        Cache domain 1:  numa_node is -1 cpu mask is 00000001  (load 160000000) 
                CPU number 0  numa_node is -1 (load 160000000)
                  Interrupt 50 node_num is -1 (ethernet/3069092192:73573328) 
  Interrupt 48 node_num is -1 (other/3069092192:1) 
  Interrupt 47 node_num is -1 (other/3069285056:1) 
  Interrupt 46 node_num is -1 (other/3069285056:1) 
  Interrupt 45 node_num is -1 (other/3069285056:1) 
  Interrupt 44 node_num is -1 (other/3069285056:687126) 
  Interrupt 43 node_num is -1 (other/3069285056:1) 
  Interrupt 42 node_num is -1 (other/3069285056:1) 
  Interrupt 41 node_num is -1 (other/3069285056:1) 
  Interrupt 40 node_num is -1 (other/3069285056:1) 
  Interrupt 39 node_num is -1 (other/3069285056:1) 
  Interrupt 38 node_num is -1 (other/3069285056:1) 
  Interrupt 37 node_num is -1 (other/3069285056:291508) 
  Interrupt 36 node_num is -1 (other/3069285056:80143878) 
  Interrupt 22 node_num is -1 (other/3069285056:1) 
  Interrupt 21 node_num is -1 (other/3069285056:62049560) 
  Interrupt 17 node_num is -1 (other/3069285056:1) 
  Interrupt 19 node_num is -1 (other/3069285056:1) 
  Interrupt 18 node_num is -1 (other/3069285056:131657506)

Does this mean that only interrupts 49 and 50 are being considered for balancing?

If you try to manually change the CPU to IRQ mapping, it does not work for Armada 38x.
I think only the Wi-Fi IRQs can be overridden/manually assigned.
Armada has 2 CPU Ports but almost all Ethernet interrupts are handled on one CPU.
I don't know why.
Also both CPU/Ethernet Ports have 8 hardware queues.
With tc you can view the traffic distribution over the queues. With the "tx queue workaround" patch applied, all packets hit one queue.
When the patch is removed, more packets hit the other queues, but the majority still hits one queue.
There are two features I came across.
RSS (Receive Side Scaling) and RPS (Receive Packet Steering).
RSS doesn't seem to be supported by mvneta.
At least it is not configurable through ethtool.
RPS, however, runs in software. I only spent a short time with it because it didn't make much of a difference, but maybe you want to check it out. RSS and RPS only affect RX packets, I guess.
For TX it is XPS.
By default on my device RPS is set to one CPU only. (CPU 2)
And XPS alternates for each queue.
First Queue maps to CPU 2, second Queue to CPU 1, third queue to CPU 2 and so on.
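
The RPS and XPS settings described above are exposed per queue in sysfs, as `rps_cpus` and `xps_cpus` files under `/sys/class/net/<dev>/queues/`. Here is a rough sketch for inspecting and changing them. The function names `show_steering` and `set_rps` are hypothetical, the optional base-directory argument exists only so the helpers can be tried on a copy, and the masks use the same hexadecimal format as smp_affinity.

```shell
#!/bin/sh
# Hypothetical helper: print the RPS and XPS CPU masks of every queue of a
# network device. Second argument: sysfs net directory (default /sys/class/net).
show_steering() {
    dev=$1
    base=${2:-/sys/class/net}
    for f in "$base/$dev"/queues/rx-*/rps_cpus "$base/$dev"/queues/tx-*/xps_cpus; do
        [ -r "$f" ] || continue
        q=${f%/*}                   # .../queues/rx-0
        printf '%s %s %s\n' "${q##*/}" "${f##*/}" "$(cat "$f")"
    done
}

# Hypothetical helper: write one hex mask (e.g. 3 = both cores) to the RPS
# setting of every receive queue of a device.
set_rps() {
    dev=$1
    mask=$2
    base=${3:-/sys/class/net}
    for f in "$base/$dev"/queues/rx-*/rps_cpus; do
        if [ -w "$f" ]; then
            echo "$mask" > "$f"
        fi
    done
}
```

For example, `show_steering eth0` lists the current masks, and `set_rps eth0 3` (as root) would allow both cores to process received packets on eth0. Whether that helps on mvneta is a separate question.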

This is the patch in question.

Removing this patch should not cause issues unless you run NAS or a VPN. It also lets eth0 and eth1 alternate between CPUs more - but not balanced per se.

I've also used irqbalance and found that the ethernet ports can't be balanced on mvneta. CPU is assigned to ports in the dts or somewhere else somehow.

Of note: Mamba used to have a network interrupt balancer.

EDIT:
Results from the removal of "300-mvneta-tx-queue-workaround.patch"
mwlwifi manually moved to CPU1

Router:

Dumb AP:

That is the RPS feature I talked about.
This patch maps eth0 to CPU 2 and eth1 to CPU 1.
With DSA, I don't know how useful this is.
Using a mask of 3 should allow the usage of both CPUs.

And I guess, CPU Port != CPU.

I still don't have a firm grasp on reading DTS, maybe you can explain this better?

Specifically these parts:

&eth0 {
	status = "okay";
	phy-mode = "rgmii-id";
	buffer-manager = <&bm>;
	bm,pool-long = <0>;
	bm,pool-short = <1>;
	fixed-link {
		speed = <1000>;
		full-duplex;
	};
};

&eth2 {
	status = "okay";
	phy-mode = "sgmii";
	buffer-manager = <&bm>;
	bm,pool-long = <2>;
	bm,pool-short = <3>;
	fixed-link {
		speed = <1000>;
		full-duplex;
	};
};

and

&mdio {
	status = "okay";

	switch@0 {
		compatible = "marvell,mv88e6085";
		#address-cells = <1>;
		#size-cells = <0>;
		reg = <0>;

		ports {
			#address-cells = <1>;
			#size-cells = <0>;

			port@0 {
				reg = <0>;
				label = "lan4";
			};

			port@1 {
				reg = <1>;
				label = "lan3";
			};

			port@2 {
				reg = <2>;
				label = "lan2";
			};

			port@3 {
				reg = <3>;
				label = "lan1";
			};

			port@4 {
				reg = <4>;
				label = "wan";
			};

			port@5 {
				reg = <5>;
				label = "cpu";
				ethernet = <&eth2>;

				fixed-link {
					speed = <1000>;
					full-duplex;
				};
			};
		};
	};
};

Me neither :joy:
Take a look here:

& references a labelled node that was defined in an included DTS file, so that its properties can be overridden or extended. (I guess?)
Here the parameters of eth0 and eth2 are configured.
A switch is also defined with 5 Ports and one CPU Port that is mapped to eth2.

It seems that the RSS/RPS/XPS/hardware-queue approach doesn't work.
So I switched to replacing mq on eth0/eth1 directly with fq_codel.
I was not sure whether fq_codel is multi-queue aware, but there was a relevant commit in the 4.19.74 kernel:
(It doesn't actually matter, though, because (almost) all interrupts are handled by one CPU.)

net: sched: fix reordering issues

[ Upstream commit b88dd52c62bb5c5d58f0963287f41fd084352c57 ]

Whenever MQ is not used on a multiqueue device, we experience
serious reordering problems. Bisection found the cited
commit.

The issue can be described this way :

- A single qdisc hierarchy is shared by all transmit queues.
  (eg : tc qdisc replace dev eth0 root fq_codel)

- When/if try_bulk_dequeue_skb_slow() dequeues a packet targetting
  a different transmit queue than the one used to build a packet train,
  we stop building the current list and save the 'bad' skb (P1) in a
  special queue. (bad_txq)

- When dequeue_skb() calls qdisc_dequeue_skb_bad_txq() and finds this
  skb (P1), it checks if the associated transmit queues is still in frozen
  state. If the queue is still blocked (by BQL or NIC tx ring full),
  we leave the skb in bad_txq and return NULL.

- dequeue_skb() calls q->dequeue() to get another packet (P2)

  The other packet can target the problematic queue (that we found
  in frozen state for the bad_txq packet), but another cpu just ran
  TX completion and made room in the txq that is now ready to accept
  new packets.

- Packet P2 is sent while P1 is still held in bad_txq, P1 might be sent
  at next round. In practice P2 is the lead of a big packet train
  (P2,P3,P4 ...) filling the BQL budget and delaying P1 by many packets :/

To solve this problem, we have to block the dequeue process as long
as the first packet in bad_txq can not be sent. Reordering issues
disappear and no side effects have been seen.

A single qdisc hierarchy is shared by all transmit queues.
(eg : tc qdisc replace dev eth0 root fq_codel)

Does this indicate that fq_codel is multi-queue aware?

Because (almost) all Ethernet interrupts hit CPU 0/1 anyway,
maybe it's better to move mwlwifi to CPU 1/2.
(By default its interrupts are also on CPU 0/1.)

There is also the multiq scheduler that simply round robins packets through the queues.
So in theory, setting RPS/XPS to an alternating scheme (CPU1,CPU2,CPU2 and so on)
should distribute the interrupts equally across the CPU cores but of course that doesn't work either.

Also, why are most interrupts handled by the GIC (Generic Interrupt Controller) while eth0/1 are handled by the MPIC?
What is MPIC? I can't find any info on this.
Edit:
Never mind, it is the MultiProcessor Interrupt Controller. Hmm...
https://www.kernel.org/doc/Documentation/devicetree/bindings/interrupt-controller/marvell%2Carmada-370-xp-mpic.txt

Optional properties:

  • interrupts: If defined, then it indicates that this MPIC is
    connected as a slave to another interrupt controller. This is
    typically the case on Armada 375 and Armada 38x, where the MPIC is
    connected as a slave to the Cortex-A9 GIC. The provided interrupt
    indicate to which GIC interrupt the MPIC output is connected.

Also, mvneta doesn't allow changing the tx-usecs/tx-frames settings via ethtool,
and it generates an interrupt per TX packet.
I guess that at higher bandwidths this can cause a bit of CPU overhead?

Thank you guys for all the information. It looks like the current situation is "as good as it gets", unless I am willing to fiddle with patches.

Many thanks!
