It seems to me the driver is not written to use more than one core and would have to be rewritten. For example, the default Ubiquiti image ships its own Ethernet driver as a kernel module; you can see it in lsmod (just my thought).
Irqbalance works for me; that is, it relocated the interrupts for my radios to CPU1 and CPU3 (the default is everything on CPU0). It doesn't do anything for the Ethernet, but I can manually relocate that interrupt to a different (single) core.
Doesn't the driver need to support multiple queues in order to assign multiple interrupts (one per queue) to different cores? That's what I noticed when I did a quick Google search on the subject.
I enabled CPU HOTPLUG and I can bring the separate cores online and offline, but it's not enabled by default. I'm not noticing any performance difference. Any idea how to benchmark that?
I did a benchmark using an Ethernet tester. It generates traffic against a loopback; each port is connected to a different interface (not in a bridge). How did you enable CPU hotplug? I can run the benchmark.
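If a dedicated Ethernet tester isn't at hand, a common software alternative is iperf3 between two hosts on either side of the router (assuming iperf3 can be installed on both ends; the address below is just a placeholder):

```shell
# On the machine behind one port:
iperf3 -s

# On the machine behind the other port (replace the placeholder address
# with the server's real IP); 30 seconds, 4 parallel streams:
iperf3 -c 192.168.1.2 -t 30 -P 4
```

Running it once per affinity setting, with otherwise identical conditions, should show whether moving the IRQ actually changes throughput.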
CPU hotplug can be enabled from the kernel menuconfig. But other than being able to take cores offline and bring them back online, I see no difference.
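For reference, once CONFIG_HOTPLUG_CPU is enabled, cores can be toggled at runtime through the standard sysfs interface (the CPU number here is just an example; it needs root):

```shell
# Take CPU2 offline
echo 0 > /sys/devices/system/cpu/cpu2/online
# Show the remaining online CPUs, e.g. "0-1,3"
cat /sys/devices/system/cpu/online
# Bring CPU2 back online
echo 1 > /sys/devices/system/cpu/cpu2/online
```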
I think manually moving the interrupt to e.g. CPU2 might make a difference, since "only" Ethernet would use that core while the main userland continues on CPU0. But even including the fast-path patches, I don't think any measurable performance difference will show up.
Getting a multi-core-aware or multi-queue driver might help a lot in balancing the whole system. I'm not sure the hardware needs to support that feature as well.
On older Linux kernels, switching on CPU HOTPLUG made the kernel configure the APIC correctly. It shouldn't make a difference on newer kernels, but since we are running on embedded (limited) devices, I thought maybe it would, given that it was disabled by default (most likely because most SoCs are still single-core, so it doesn't make sense to have it enabled).
Which leaves it as a driver "issue". If someone could make it multi-queue, with each queue getting its own interrupt, it might make a difference. Still, it might be a hardware limitation (I'm not an expert on this).
You can try something like:
echo "2" > /proc/irq/10/smp_affinity
But given your original /proc/interrupts list, you only have Ethernet as the main interrupt source, with no wifi or anything else that would benefit from being moved to a different CPU/core. Ethernet performance will benefit more from HW-NAT or something like the fast-path patches.
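Putting the pieces together, a minimal sketch of moving one IRQ and checking that it moved, assuming the Ethernet interrupt really is IRQ 10 as in the example above:

```shell
# Show the current affinity mask of IRQ 10 (hex bitmask, bit n = CPU n)
cat /proc/irq/10/smp_affinity

# Pin IRQ 10 to CPU1 (mask 2 = binary 0010); needs root
echo 2 > /proc/irq/10/smp_affinity

# The per-CPU counters for that IRQ should now increase in CPU1's column
grep ' 10:' /proc/interrupts
```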
I'm not sure if the SQM scripts can run multi-core, or at least be offloaded to a different core. That might improve performance.
I have the same router, and I also tried to balance the IRQs, but no luck.
I'm also confused about the way the CPUs get numbered when I use cat /proc/interrupts.
The MT7621A has only 2 cores, but it shows 4 CPUs (2 threads per core, I guess). So my guess is that balancing between the two threads of a single core is not that interesting, and balancing across the 2 cores is good enough. But how do I know whether core 1 is threads 0 and 1, or 0 and 2, or whatever?
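One way to answer that without the datasheet is to ask the kernel directly; Linux exposes each virtual CPU's physical core and sibling threads through the standard sysfs topology files:

```shell
# For each virtual CPU, print its physical core id and which
# virtual CPUs share that core (its hardware-thread siblings)
for c in /sys/devices/system/cpu/cpu[0-9]*; do
    echo "${c##*/}: core_id=$(cat "$c/topology/core_id") siblings=$(cat "$c/topology/thread_siblings_list")"
done
```

Two CPUs listed with the same core_id (and in each other's siblings list) are the two threads of one physical core, so you'd spread the heavy IRQs across different core_ids.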
Furthermore:
if I try echo "4" >/proc/irq/10/smp_affinity, the IRQs go to CPU2 in cat /proc/interrupts.
echo "3" >/proc/irq/10/smp_affinity goes to CPU0
echo "2" >/proc/irq/10/smp_affinity goes to CPU1
echo "1" >/proc/irq/10/smp_affinity goes to CPU0
echo "0" >/proc/irq/10/smp_affinity goes to CPU0
echo "8" >/proc/irq/10/smp_affinity goes to CPU3
So it looks like it is a hex bitmask of 4 binary digits, where bit n selects CPU/thread n. When more than one bit is set (e.g. 3 = CPU0 + CPU1), the kernel delivers the interrupt to just one CPU from the allowed set, and a mask of 0 is invalid, so the write fails and the previous affinity (CPU0 here) stays in effect.
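That matches how smp_affinity works in general. The mask for pinning an IRQ to a single CPU can be computed with plain shell arithmetic (nothing board-specific here):

```shell
# Print the hex smp_affinity mask that pins an IRQ to a single CPU:
# bit n of the mask corresponds to CPU n.
mask_for_cpu() {
    printf '%x\n' $((1 << $1))
}

mask_for_cpu 0   # -> 1 (CPU0)
mask_for_cpu 1   # -> 2 (CPU1)
mask_for_cpu 2   # -> 4 (CPU2)
mask_for_cpu 3   # -> 8 (CPU3)
```

So `echo "$(mask_for_cpu 3)" > /proc/irq/10/smp_affinity` would pin IRQ 10 to CPU3, consistent with the observations above.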
I didn't study the MT7621 datasheet, so I don't know the exact number of cores, but it has more than one physical core; hardware multithreading should be the answer. I tested distributing the interrupts across two virtual cores of the same physical core, and that scenario makes no sense. But if you distribute the IRQs between physical cores, throughput should be better.
I checked today and it is clear to me that it has 2 cores and each core can handle 2 threads, so it appears as 4 virtual CPUs to the OS. In my case I only have many IRQs on the Ethernet and one of the radios (the 2.4 GHz one is disabled because it doesn't work and frequently makes the router reboot), so I think it makes sense to distribute the most demanding or frequent IRQs across the 2 cores, and maybe the smaller ones onto the second thread of each core.