Single Thread Performance vs Multiple Cores

Quick processor showdown...

In the Red Corner:
Intel i3 4160T
Speed: 3.10 GHz
Cores: 2

In the Blue Corner:
Intel i5 4590T
Speed: 2.00 GHz (base)
Cores: 4

All other specifications for these chips are virtually identical. Both have 35W TDPs.

My biggest processor hogs are OpenVPN and SQM. Will OpenWrt benefit more from additional cores (threads) or from the faster base frequency?

With OpenVPN in one thread and SQM in another, there isn't much processor power left for anything else. Although SQM is single-threaded, I think that is per interface, so if you have simultaneous uploads and downloads, additional threads are very useful. I'd also expect the i5 to do more per cycle than the i3 thanks to its larger cache and the like.

What's the single-core PassMark score for each?

According to the specification comparison on Intel's product comparison page, the main differences relevant to OpenWrt seem to be:

2 cores + SMT vs. 4 cores
3.1 GHz fixed vs. 2.0 GHz base, up to 3.0 GHz with Turbo Boost
3 MB cache vs. 6 MB cache
no VT-d vs. VT-d
ECC vs. no ECC

For pure single-thread performance, the i3 might have a slight advantage, as the i5 only runs at up to 3 GHz.
That holds as long as the load is 'light' enough to fit into the i3's smaller cache.

On a more heavily loaded system, the i5's larger cache and its four real cores will probably outperform the i3.

If you want to use virtualisation, VT-d might be quite nice to have (e.g. for passing a NIC through to a VM).

If you want to use ECC memory, you have to take the i3 (not even an i7 such as the 4765T supports ECC).

PassMark...
i3 4160T: Multi-Thread: 4473 Single-Thread: 1804
i5 4590T: Multi-Thread: 5627 Single-Thread: 1687

It feels like the i5 doesn't gain a great deal from its two extra cores. I wonder whether PassMark runs at the Turbo Boost speed.
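To put numbers on that impression, here is a small Python sketch that computes the ratios from the PassMark scores posted above (copied, not re-measured): the i5's multi-thread score comes out at roughly 1.26x the i3's, while the i3's single-thread score is roughly 1.07x the i5's.

```python
# PassMark scores as posted in this thread (not re-measured).
scores = {
    "i3-4160T": {"multi": 4473, "single": 1804},
    "i5-4590T": {"multi": 5627, "single": 1687},
}

def ratio(a: str, b: str, kind: str) -> float:
    """How many times higher chip a's score is than chip b's for one test kind."""
    return scores[a][kind] / scores[b][kind]

multi_gain = ratio("i5-4590T", "i3-4160T", "multi")    # i5 ahead in multi-thread
single_gain = ratio("i3-4160T", "i5-4590T", "single")  # i3 ahead in single-thread

print(f"i5 multi-thread advantage:  {multi_gain:.2f}x")
print(f"i3 single-thread advantage: {single_gain:.2f}x")
```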

Yeah, it's one of the rare i3s that supports ECC (it was actually the processor from my old NAS, which recently got an upgrade). However, my OpenWrt motherboard doesn't support ECC, so it's a moot point.

I guess the i5 is probably the right choice. I assume the OpenWrt kernel supports Turbo Boost?

In the 19.07 branch as well as in master, CONFIG_X86_INTEL_PSTATE is set to y in the generic subtarget's kernel config, so I think the turbo states work out of the box.
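One way to confirm this on the running box is the intel_pstate sysfs interface: /sys/devices/system/cpu/intel_pstate/no_turbo reads 0 when turbo is allowed and 1 when it is disabled. A plain `cat` of that file does the job on the router itself. The Python sketch below is only an illustration, assumes the intel_pstate driver is in use, and uses hypothetical function names. It tells you whether turbo is permitted, not what frequency the CPU actually reaches.

```python
from pathlib import Path
from typing import Optional

def turbo_allowed(no_turbo_text: str) -> bool:
    """intel_pstate's no_turbo file holds "0" (turbo allowed) or "1" (turbo off)."""
    return no_turbo_text.strip() == "0"

def read_turbo_allowed(sysfs_root: str = "/sys") -> Optional[bool]:
    """Return True/False from intel_pstate, or None if that interface is absent
    (e.g. a different cpufreq driver, or not running on Linux)."""
    no_turbo = Path(sysfs_root) / "devices/system/cpu/intel_pstate/no_turbo"
    if not no_turbo.is_file():
        return None
    return turbo_allowed(no_turbo.read_text())

if __name__ == "__main__":
    state = read_turbo_allowed()
    if state is None:
        print("intel_pstate no_turbo not found")
    elif state:
        print("turbo allowed")
    else:
        print("turbo disabled")
```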

The i5 can clock up to 3 GHz and possibly as low as 2 GHz, depending on a variety of factors. Judging by the PassMark single-thread results you posted, the i5 ran close to its maximum frequency, at roughly 2.9 GHz.
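The ~2.9 GHz estimate can be reproduced with simple arithmetic, assuming both Haswell chips do the same single-thread work per clock and the i3 ran at its fixed 3.1 GHz: divide the i3's score by its clock to get points per GHz, then ask what clock the i5's score implies. This is a rough sketch; the i5's larger cache could skew the result slightly.

```python
def implied_clock_ghz(score: float, ref_score: float, ref_clock_ghz: float) -> float:
    """Clock a CPU would need to reach `score`, assuming it does the same
    work per clock as the reference CPU (same microarchitecture)."""
    points_per_ghz = ref_score / ref_clock_ghz
    return score / points_per_ghz

# Reference: i3-4160T at its fixed 3.1 GHz, single-thread score 1804.
i5_clock = implied_clock_ghz(1687, ref_score=1804, ref_clock_ghz=3.1)
print(f"implied i5-4590T clock: {i5_clock:.2f} GHz")  # prints 2.90 GHz
```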

Given that both CPUs also have an integrated GPU, which your router probably will not exercise, you might have a bit more thermal headroom for running the i5 at all-core turbo frequencies, so I would probably try the quad-core i5...

Hard to say; both are beneficial, but with both SQM and VPN I would assume the quad core gives you more oomph...

The i3 has Hyper-Threading (i.e. each physical core shows up as two logical CPUs in the system).

For CPU-intensive jobs it's better to disable HT on it; otherwise you risk your applications landing on two different "virtual cores" that are in fact a single hardware core, wasting half of your CPU processing power.
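To see which logical CPUs are HT siblings, Linux lists each CPU's siblings in /sys/devices/system/cpu/cpuN/topology/thread_siblings_list (e.g. "0,2"). Below is a minimal Python sketch that groups logical CPUs into physical cores from those lists. The function names are hypothetical, and the actual numbering depends on the kernel and BIOS.

```python
from pathlib import Path

def parse_cpulist(text: str) -> list[int]:
    """Expand a kernel cpulist such as "0,2" or "0-1,4" into a list of CPU ids."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus

def physical_cores(sibling_lists: dict[int, str]) -> list[tuple[int, ...]]:
    """Group logical CPUs into physical cores; siblings of one core share a list."""
    cores = {tuple(sorted(parse_cpulist(s))) for s in sibling_lists.values()}
    return sorted(cores)

def read_sibling_lists(sysfs_root: str = "/sys") -> dict[int, str]:
    """Read thread_siblings_list for every cpuN directory (empty off Linux)."""
    base = Path(sysfs_root) / "devices/system/cpu"
    result = {}
    for f in base.glob("cpu[0-9]*/topology/thread_siblings_list"):
        cpu_id = int(f.parent.parent.name[3:])  # "cpu3" -> 3
        result[cpu_id] = f.read_text()
    return result

if __name__ == "__main__":
    for core in physical_cores(read_sibling_lists()):
        print("physical core -> logical CPUs", core)
```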

Note that Intel's Turbo Boost is not a permanent high-frequency state unless the motherboard works around that limitation (many do; it's either a BIOS option or always on). Also, the highest turbo frequencies are only reached when just one or two cores are busy, so yes, it can clock up to 3 GHz, but it is effectively a dual-core or even single-core chip while doing so. So what's the point of an i5 then?

Also note that Intel's TDP figure is specified at base frequency, without Turbo Boost, so it may very well be that the i5 running at turbo speeds isn't a 35 W part anymore.

AFAIK OpenVPN is not really multi-core, but most of the heavy lifting is done by the AES-NI crypto instructions that both processors have. That's the part that will make the biggest difference, as can be seen in the general OpenVPN performance graph in the wiki: https://openwrt.org/docs/guide-user/services/vpn/openvpn/performance
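On x86 Linux, AES-NI support shows up as the "aes" flag in /proc/cpuinfo. Here is a small Python sketch that checks for it (the function names are hypothetical). Keep in mind that OpenVPN only benefits if the negotiated cipher is AES-based and the underlying crypto library actually uses AES-NI.

```python
from pathlib import Path

def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Collect the feature flags from the first 'flags' line of x86 /proc/cpuinfo."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()

def has_aes_ni(cpuinfo_text: str) -> bool:
    # On x86 Linux, AES-NI support is reported as the "aes" flag.
    return "aes" in cpu_flags(cpuinfo_text)

if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    if path.exists():
        print("AES-NI available:", has_aes_ni(path.read_text()))
```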

IMHO I'd go with the i3, as it has the higher base frequency and no "turbo" shenanigans, and try disabling Hyper-Threading in the BIOS to see if it makes any difference.

I wouldn't quite go along with this statement. Yes, there are security reasons to disable HT (Meltdown, Spectre and the various related issues), but from a purely performance point of view HT is almost always an advantage. Not quite as much as a real dedicated core, but almost always more than ~30% (in some cases significantly more). It's certainly possible to envision a usage scenario that 'optimizes' against HT (inducing many cache misses but few context switches), but that's a highly artificial scenario.

On Intel hardware, HT is considered unfixable with regard to Meltdown/Spectre-related issues, so it does make sense to disable it on security-sensitive devices, but it still provides a noticeable performance boost in almost all cases. AMD CPUs do not share these issues, at least not to the same extent.

I meant a situation where the applications you are running aren't properly multithreaded and/or there aren't many other processes running on the system.

If your application is heavily multithreaded it's fine, as its threads will be spread across all the available virtual cores. But if you have a single thread, or just two, and the system places them on the two "virtual cores" of a single hardware core, it's a problem.

Core pinning (manually assigning a process to a specific hardware core) exists for exactly these situations.
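Here is a minimal Python sketch of that idea. It assumes the physical-core grouping is already known (for example from each CPU's thread_siblings_list), keeps one logical CPU per physical core, and restricts a process to those CPUs with os.sched_setaffinity, which is Linux-only. The helper names and the example core layout are hypothetical.

```python
import os

def one_thread_per_core(cores: list[tuple[int, ...]]) -> set[int]:
    """Pick the lowest-numbered logical CPU of each physical core, so pinned
    work never lands on two HT siblings of the same core."""
    return {min(core) for core in cores if core}

def pin(pid: int, cpus: set[int]) -> None:
    """Restrict process `pid` (0 = the calling process) to the given logical CPUs.
    Linux-only: os.sched_setaffinity is not available on every platform."""
    if not hasattr(os, "sched_setaffinity"):
        raise OSError("CPU affinity is not supported on this platform")
    os.sched_setaffinity(pid, cpus)

if __name__ == "__main__":
    # Hypothetical i3 layout with HT: cpu0/cpu2 share one core, cpu1/cpu3 the other.
    cores = [(0, 2), (1, 3)]
    print("would pin to", sorted(one_thread_per_core(cores)))
```

Calling pin() with the OpenVPN process's PID and the result of one_thread_per_core() would keep that process off sibling threads. Whether this actually helps is something to measure rather than assume.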

Those Meltdown/Spectre-type issues are mostly irrelevant for a firewall/router that is not running VMs or untrusted software.

Yes, that is correct if you use the device purely as a router with few dynamic loads (e.g. no multiple VPN connections terminated on the router, no file-server tasks, which shouldn't be considered for a router in the first place). But x86_64 in particular lends itself to 'overloading' (outside of commercial environments, where dedicated servers are easier to justify).

For this comparison and your stated workload, I'd strongly argue for the i3 (higher base frequency).

Thanks for all the input. In the absence of any consensus either way, I decided to run some very unscientific tests. I forgot to mention that the router also runs a single VM (a Debian 10/nginx DMZ reverse proxy), but that does not consume many resources compared to OpenVPN/SQM.

The outcome was that the i5 showed significantly lower average core usage when routing at line speed (500 Mbps internet connection) than the i3 with Hyper-Threading enabled. The i5 averaged about 10-15% core usage, while the i3 was in the 25-30% range. I did not have the patience to go back and test the i3 with HT disabled.

Now, you could (probably correctly) argue that the difference is almost entirely consistent with the i3 running 4 threads on 2 physical cores, which doubles the usage of each physical core. However, my theory is that if the maximum workload is nowhere near the limit of a single core (and hence never hits the i5's frequency limitation), the extra physical core capacity will be the greater benefit in the long run.
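The 'consistent with doubling' argument can be checked with simple arithmetic. This sketch assumes the total routing work is identical on both machines and treats the percentages as average usage per physical core: 10-15% on each of the i5's 4 cores is 0.4-0.6 core-equivalents, and spreading that over the i3's 2 physical cores gives 20-30% per core, close to the 25-30% observed.

```python
def rescale_usage(pct_per_core: float, cores_from: int, cores_to: int) -> float:
    """Average per-core usage after spreading the same total work over a different
    number of physical cores (assumes the total work itself does not change)."""
    total_core_equivalents = pct_per_core / 100 * cores_from
    return 100 * total_core_equivalents / cores_to

# Observed on the i5 (4 physical cores): 10-15 % average core usage.
low, high = (rescale_usage(p, cores_from=4, cores_to=2) for p in (10, 15))
print(f"expected on 2 physical cores: {low:.0f}-{high:.0f} %")  # vs. 25-30 % observed on the i3
```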

I also pondered that, since routing is a bus-bound activity rather than a purely on-core compute activity, Hyper-Threading is unlikely to improve this type of workload: it is constrained by contention for bus access, and two virtual cores cannot access the physical bus simultaneously.

Either way, I went with the i5 and I'm happy with the decision!