x86_64 NAT performance

Background:
I have a 250 Mbps connection but am currently only able to get about 100 Mbps via speedtest using my WD 750n running OpenWrt 15.05. I am assuming that NAT is my bottleneck (I have tried a few different servers and times of day; it's pretty consistent). I am hesitant to upgrade the software on it because that might slow it down further, so I was experimenting with OpenWrt as a guest via KVM on a server. It's an Ivy Bridge dual-socket server with 4 Intel GbE ports. I have hardware virtualization on and used PCIe redirection for two of the NIC ports.
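Roughly what the passthrough setup looks like as a bare qemu invocation (a sketch only; the PCI addresses, image name, and sizes are placeholders, not my actual values):

```shell
# Illustrative: hand two host NIC ports to the guest via VFIO passthrough.
# PCI addresses, memory size, and image name are placeholders.
qemu-system-x86_64 -enable-kvm -m 1024 -smp 8 \
  -device vfio-pci,host=0000:03:00.0 \
  -device vfio-pci,host=0000:03:00.1 \
  -drive file=openwrt-x86-64-combined-ext4.img,format=raw
```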

Question:
I set up a stock build of 18.06 and configured one of the ports as WAN and the other as LAN; when connected to a laptop I was getting ~55 Mbps down. I know there is some overhead in a VM, but with PCIe redirection, 8 CPU cores, and plenty of RAM I expected better performance. I might try a test with it running on bare metal, but are there any ideas why I am seeing such poor performance?

That seems very slow for an x86_64. A relatively low-powered "AMD Embedded G series GX-412TC, 1 GHz quad Jaguar core with 64 bit and AES-NI support, 32K data + 32K instruction cache per core, shared 2MB L2 cache" handles 300 Mbps without breaking a sweat. An Archer C7 v2, 720 MHz MIPS, single core, can handle 300 Mbps NAT (perhaps more, at least without fancy SQM).

Do you have irqbalance running? Are there any hints in top or equivalent as to where resources are running thin?
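One way to see where interrupt load lands is `/proc/interrupts`: each row is an IRQ source with one count column per CPU, so summing the columns ranks the hottest sources. A sketch against a canned two-CPU sample (the IRQ numbers, counts, and `eth0`/`eth1` names are fabricated; the awk field positions assume exactly two CPU columns):

```shell
# Two fabricated /proc/interrupts lines for a 2-CPU box; on a real system
# you would read /proc/interrupts itself.
sample='24:  1000   200  PCI-MSI  eth0
25:   300  5000  PCI-MSI  eth1'

# Sum the two per-CPU count columns per IRQ to rank the hottest sources.
echo "$sample" | awk '{print $NF, $2 + $3}'
# eth0 1200
# eth1 5300
```

In `top`, time spent servicing these shows up under `si` (softirq); `mpstat -P ALL 1` breaks it out per core.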

Thanks for the sanity check; it looks like it was a misbehaving USB NIC. I retested with a desktop using a motherboard NIC and was seeing ~280 Mbps. I'm pretty psyched: I can use extra ports on an already running server to get much better internet performance. I wasn't sure OpenWrt as a KVM guest was going to work well, but now that I've figured out my issue it seems great.

AFAIK irqbalance is rather counterproductive for routing workloads, as it tends to shift interrupts concerning the same data (packets) onto different CPUs, which causes cache misses.

Have you tested this at all? I do run irqbalance and didn't explicitly test before/after.

What @fuller mentions makes sense; I know that you usually want to pin such threads to one specific core.
I doubt you'll notice any difference at gigabit speeds on x86, however.

I do think irqbalance seems helpful for offloading things like disk handling or Wi-Fi radio handling to separate cores on boxes that have multiple functions, such as all-in-one routers and/or NAS duty.

I see it as a hacky solution to something that the kernel should handle on its own but that's just me... :wink:


Yeah, I agree with that, but until the kernel does a good job something is needed when you serve a lot of interrupts, and routers serve a lot of interrupts.

Irqbalance identifies the highest-volume interrupt sources and isolates each to a single unique CPU, so that load is spread as much as possible over the entire processor set while minimizing cache-miss rates for the IRQ handlers.
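The same isolation can be done by hand: each IRQ exposes an affinity bitmask under `/proc/irq/<n>/smp_affinity`, where the mask is `1 << cpu` in hex. A sketch that only computes the mask; the actual write needs root and a real IRQ number, so it's left as a comment (IRQ 24 is a placeholder):

```shell
# Compute the hex affinity bitmask that pins an IRQ to a single CPU:
# mask = 1 << cpu.
cpu=1
mask=$(printf '%x' $((1 << cpu)))
echo "$mask"
# 2  (binary 10 = CPU1 only)

# On a real router, as root (IRQ number is a placeholder):
#   echo $mask > /proc/irq/24/smp_affinity
```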

:+1:

I knew I read about this for FreeBSD and here we go:


If anyone has a similar document for Linux, please link it. IRQ pinning doesn't seem to do much on FreeBSD, at least on a rather beefy x86 system.

OK, I guess they "fixed" it then :slight_smile:

I must admit it's been some time since I touched it, but my impression is still that irqbalance was introduced to solve the problem of unpredictable runtimes and IRQ counts with lots of userspace threads (hosting),
i.e. optimizing for dynamic workloads.