OpenWRT and x86 AVX2/AVX512

Hi there,

I've been looking around but couldn't find anything specific and/or current.

Can OpenWRT be compiled with let's say AVX512/x86-64-v4 support? I.e. -march=x86-64-v4 or -C target-cpu=x86-64-v4 for rust.

Is there any way to enable this? I have modern x86 hardware and would like to make the most of it. There's also this which enables it for the kernel (I get I'd need to apply this patch manually but it's not the main point anyway)

So, is there a way to reliably enable higher CPU levels like v3/v4 instead of just baseline? Almost any PC released in the last 15 years supports v2, and v3 is also not uncommon (v4 is rarer, and AVX512 would probably not provide as much benefit, but still). We have older legacy builds for 586 CPUs but no build for modern CPUs specifically.

Any ideas?

There have been attempts:

But the conclusion is that it's too much load on the build infrastructure for limited benefit. The future of the x86 subtargets pops up every now and then: https://lists.openwrt.org/pipermail/openwrt-devel/2026-April/044779.html

I see. I could probably implement that in my build since it's pretty simple.

However, are there any ways to modify the rust or go build flags? The most performance-hungry packages I'm using are written in Go and Rust. The CFLAGS won't do much for them.

Wouldn't it be possible to add the targets without automatic compilation? And in the wiki, a note should be included informing that the specialized target is available for manual compilation.

If you're building from source anyway, you can pass the argument as suggested: https://github.com/openwrt/openwrt/pull/16363#issuecomment-2352328823

Well, it seems that there is no need for any build system modifications as in:
Advanced configuration options (for developers) -> Kernel extra CFLAGS
You can pass whatever additional KCFLAGS you want to the kernel and thus pass your desired optimisations

You're right that this could be documented better, feel free to do so :slight_smile:

This is only for the kernel. Not the packages/OS

Everything would be possible, but you underestimate the human cost of maintaining it. This needs attention for every kernel update - and we see with the 32-bit subtargets that no one really wants to do (basically) the same job three times.

For routing-specific tasks, AVX2/ AVX512 doesn't do anything for you, but you can do your own optimized builds easily.

Fair points. I was just throwing ideas on the table... Even Ubuntu only recently started offering a v3 repo.

…and I still call bull.... on that, for several reasons.

  • there haven't really been benchmarks for this, the range of 'improvement' varies widely (so I'm not all that convinced about the -lack of- numbers, keep in mind, things which really profit from this, like ffmpeg, can already provide optimized flavours without having to build everything for different architectures)
  • the impact varies widely between multimedia like workloads (audio/ video codecs, image manipulation, etc.) and basic OS tools (kernel, libc, hostapd, all the routing tools, cli tools, etc.), which don't use AVX/ AVX2/ AVX512 at all and can't profit from enabling in the slightest
  • considering the different expected use cases, you can't really compare the findings for a general purpose distribution like Ubuntu/ Fedora/ OpenSuSE and effectively an embedded routing OS

So what would be needed first, is an actual comparison with hard numbers what the impact for this would be

  • for routing (throughput numbers)
    I expect this to be close to zero
  • for encryption heavy use cases (e.g. VPN, wireguard, OpenVPN, strongswan)
    there may be some impact, but I don't think it really makes a difference (but please, prove me wrong)
  • for things like file serving (with- or without fs encryption, luks and friends)
    for the basic file serving the impact should be more or less zero, with luks maybe a little (like above for VPN uses)
  • there is (almost) no X11/ wayland or media playback available on OpenWrt to talk about
    so the aspects with the largest expectable impact don't really exist -for OpenWrt- at all
  • there are no games at all
    another topic that could profit
  • there are no (graphical) web browsers (firefox/ chromium/ konqueror/ …) packaged for OpenWrt at all
  • there are very few more complex tools using rust, golang, python, ruby, perl, lua, etc. even less among the typically installed ones
    so at least right now, any improvements don't really matter for OpenWrt
  • qemu/ kvm is a quite advanced, niche topic for OpenWrt
    anything is possible here, the impact may be noticeable - or not matter. given that within the VM, you will typically see more optimized (general purpose linux like-) binaries, what OpenWrt itself does matters very little
  • containerization, like lxc, podman, etc.
    more commonly found on OpenWrt, but still not a typical use case - how big the impact would be, again varies massively, depending on what you want to run inside your container - and again, you would see non-OpenWrt (more optimized) binaries inside the container, so even if those would profit from AVX*, they probably don't depend on what OpenWrt's base system provides all that much
  • …?
    please provide further examples

Most of the distributions you raise as examples above (Fedora/ Ubuntu) make these changes in order to re-define the ISA baseline, the hard cut-off for no-longer wanted 'old' systems. Neither of those want to continue supporting x86-64-v1 in the long term (or at all). For OpenWrt the situation differs quite a bit, as older systems still remain quite viable (RAM size basically doesn't matter, there are no web browsers to support, no media players, no image manipulation tools). Power consumption (depending on your location, electricity prices per kWh might matter a lot - or -within reason- not at all) and routing throughput are the only things that really matter here, but AVX* support doesn't, at all.

Keep in mind, right now we already have four different ISA levels for x86_64, but at least I haven't seen a comprehensive breakdown how much these optimizations actually gain you

  • for different use cases
    again, base system (kernel, libc, routing stuff, cli stuff, even more advanced things like nextcloud or AGH) vs multimedia/ video/ audio/ graphics/ AI processing (which matters on general purpose linux, but not -at all- for OpenWrt)
  • between the different ISA levels
  • and which CPUs actually fall into which category
    just as (some) examples from my zoo of devices, without thinking twice, can you tell me which is which?
    • AMD64 3500+, s939
    • AMD64 X2 4200+, s939
    • AMD64 X2 4600+, s754
    • Intel Atom 330
    • Intel core2duo Q9550
    • Intel Pentium G850
    • Intel Celeron 1037u
    • Intel Pentium g2020
    • Intel core i7-2600k
    • Intel core i7-3700
    • Intel core i5-4430
    • Intel core i5-6400
    • Intel core i5-7400
    • AMD GX-212JC
    • Intel j4105
    • Intel j4125
    • AMD Ryzen Embedded R1505G
    • insert various AMD AM4 generations here
    • Intel N100
    • Intel core i7-13700

It's complicated, way more nuanced than expected - telling which CPUs belong in which category is hard - and the existing benchmarks for general purpose distributions simply don't apply to the things OpenWrt is primarily used for. Quite a few of the examples above can still be very sensible devices for OpenWrt, so excluding support for them would be premature - and exploding the number of x86_64 ISA levels for OpenWrt even further has a real cost (in terms of buildbot load and human attention). The big desktop distros want to kill off support for old systems, and the use cases those are typically used for might make that sound more tempting, but OpenWrt will never run firefox/ chromium anyways.

EDIT: if you think the example CPU list above is just me being mean: that is a big question actual users will have, should this be implemented - and it's not at all easy to answer.

EDIT2: really, it all boils down to a very simple question, how much of a speed gain have you actually seen, on which CPU, a) for 'typical' routing tasks, b) for encryption heavy tasks (VPN, encrypted fs), c) for your hypothetical rust tools, d) for your imaginary use cases involving virtualization/ containerization
Please, give me some before- and after numbers. I don't expect a comprehensive benchmark, just two pairs of hard (and reproducible) numbers (normal routing throughput, before-after - your AVX heavy use cases, before-after).
But please, don't just parrot the (not at all comprehensive) numbers from Fedora/ Ubuntu/ OpenSuSE, they simply don't apply to the things OpenWrt is typically used for - or expect change for the sake of it.
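A before/after comparison like the one asked for above could be reduced to something as small as the following sketch. It assumes you have a handful of repeated throughput samples (higher is better, e.g. Mbit/s from whatever tool you trust) for the baseline build and for the optimized build; the function names are hypothetical and the statistics are deliberately crude.

```python
# Sketch: compare repeated throughput samples from two builds.
# Assumes "higher is better" metrics such as Mbit/s.
from statistics import median


def relative_gain(baseline, optimized):
    """Percent change of the median from baseline to optimized."""
    if not baseline or not optimized:
        raise ValueError("need at least one sample per build")
    base = median(baseline)
    return (median(optimized) - base) / base * 100.0


def spread(samples):
    """Run-to-run noise as (max - min) relative to the median, in percent."""
    return (max(samples) - min(samples)) / median(samples) * 100.0
```

If `relative_gain` comes out smaller than the `spread` of either sample set, the difference is within run-to-run noise and probably not worth arguing about.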

EDIT3: I really don't want to discourage you, but I'm also very interested in some cold hard facts how much better things are on the other side, before expecting the project to invest real work for no(?) return. So please, give some easily reproducible test case to verify the impact.

Disclaimer: Not an OpenWrt developer, not in the decisive chain, nor wearing any hats here, nor having a horse in the race. The above are just my personal opinions and experiences. You don't have to convince me, but I do have an opinion (and I have been on the side of hardware support being pulled from under your feet before, it's not nice - so I really want to know about the practically attainable gains).

I'm absolutely not asking for this to be supported or even really implemented upstream. This was more of a "how could this possibly be done" question.

I don't know about other use cases but for me the core of it was VPN performance since most encryption standards can make use of SIMD.

The rust tools weren't hypothetical. I use Podman, a few of its dependencies are written in rust.

If I asked for something, it would be that you can easily modify CFLAGS/RUSTFLAGS/GOFLAGS in menuconfig. Currently it only seems to be possible for CFLAGS(?), and even that is pretty confusing. Then again, somebody who would do that can probably handle editing a Makefile.
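For reference, the toolchain-level knobs for the three languages are not spelled the same way, which is part of the confusion. The sketch below just builds the matching settings for a given level: `-march=x86-64-vN` for GCC/Clang, `-C target-cpu=x86-64-vN` for rustc, and the `GOAMD64` environment variable for Go (the ISA level there is not selected through GOFLAGS). The helper name is made up, and whether OpenWrt's package Makefiles pass these through unchanged is an assumption that has to be checked in your own tree.

```python
# Sketch: per-toolchain settings for one x86-64 psABI level.
# Assumption: the build system forwards these to gcc/rustc/go untouched,
# which is NOT verified here for OpenWrt's rust/golang packaging.


def isa_env(level):
    """Return compiler settings for x86-64 level 1..4 as a dict."""
    if level not in (1, 2, 3, 4):
        raise ValueError("x86-64 level must be 1, 2, 3 or 4")
    # Level 1 is the plain baseline, which has no "-v1" suffix for -march.
    march = "x86-64" if level == 1 else "x86-64-v%d" % level
    return {
        "CFLAGS": "-march=%s" % march,
        "RUSTFLAGS": "-C target-cpu=%s" % march,
        "GOAMD64": "v%d" % level,
    }
```

For example, `isa_env(3)` yields `-march=x86-64-v3`, `-C target-cpu=x86-64-v3` and `GOAMD64=v3`. Binaries built this way will crash with an illegal instruction on CPUs below that level, so this only makes sense for images you build for your own hardware.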

If other users want to ask for that, they can.

The following is just my personal opinion BUT: yes there are currently four x86 ISA levels. But 3 of those are old x86 CPUs in the release span of ~5 years from MMX to Geode. x86-64 has been around for >20 years and at least x86-64-v3 has been an immense jump (v2 arguably too, but v3 is the big jump really) and now covers >10 years as well. If there's any discussion about which ISA levels are worth it to maintain (yes I know there's always a human component, but I'm talking purely technical here, also OpenWRT regards legacy support highly) then a newer x86 64 bit ISA seems more important than 3 different legacy ISAs.

That doesn't necessarily mean that there'd be a tangible performance difference between AVX being available/enabled or not. Really, that's all the point about asking about some numbers.

I didn't imply there would be. As I said, primarily my usecase was VPN. The rust tooling was an afterthought.

Still, I've found out where in the rust package it can be applied and am creating a patch for my source tree. I'll see if I get some numbers out of it, but they will probably be negligible. It's more of a "why not" moment. I do this for fun. If I just wanted something that just works I'd be using Debian or Rocky :stuck_out_tongue: