Optimized NAS-Latency-Memory build for the Linksys EA6350v3 (civic)

Note: This software is cutting edge and should not be used in production. Overclocking will not make your SoC burst into flames, and it should run smoothly if you don't live in hell (or in Spain during summer). Beware, though, that it might introduce unknown stability problems. Maybe there is a reason why the OEM didn't commit to it... maybe they just didn't want to give you free performance...

Note: This build has a patched VLAN which works as expected under LuCI.

Warning: The full notes are available inside the latest release (fc44a-v0.20). You will be forced to read them, as the actual download link is all the way at the bottom.

Features:

  • Built from master (or trunk for SVN guys!)
  • Built using GCC 9 (latest stable).
    • -O3 optimization, globally enabled. The most aggressive optimization.
  • Fully preemptive, tickless @ 500 Hz kernel for minimum latency.
  • GNU bash as the default shell and nano with syntax highlights, for a more pleasing administration experience.
  • Minimum clock frequency to 200 MHz with schedutil governor to avoid lag spikes.
  • All operating points enabled and CPU clock latency improved to match OEM value. Along with schedutil, the CPU scaling should be optimal.
  • Enabled the latest Linux elevators for optimized NAS loads.
    • mq-kyber by default for non-rotational multiqueue.
    • mq-bfq by default for rotational multiqueue and UAS.
    • noop for single queue devices (i.e. ubiblock and the internal flash).
  • Enabled the most optimized and reliable Linux filesystems (ext4, f2fs and xfs) and support for exfat. ntfs can be manually installed from the packages that are distributed with the flashable ROM.
  • Enabled extroot for the OEM partition for an extra of 42 MB to install packages.
  • A sensible collection of packages and kernel modules preinstalled and fully compatible with OpenWrt's packages (install from LuCI). Kernel packages are not compatible.
  • Ultra Kernel Samepage Merging (UKSM), zswap (lz4hc with Sony's z3fold) and zram (zstd) for better memory usage. This allows users to run large adblock and ban-ip lists, samba4, minidlna and LuCI at the same time, without swapping to a slow disk and without dropping the expensive SQUASHFS blocks (which are slow to read and hungry to decompress).
  • A custom set of scripts for:
    • Reverting back to stock (one of the most useful features).
    • Testing calibration files for the wireless driver to optimize the wireless performance under hardware variations.
    • Using the OEM partition for data and packages for an extra of 42 MB of storage... for free!
    • Configuring the firmware, things such as using zstd for zram and using 192.168.50.1 to avoid network collisions.
    • Fine tuning the memory subsystem and the priorities, niceness, OOM scores and scheduling policies of the system processes.
  • Working VLANs: the VLAN configuration in LuCI works as expected in this build.
  • unbound installed and preconfigured, instead of dnsmasq, for a fast (caching), secure (DNSSEC) and private (DNS over TLS) DNS resolver (currently, working as a forwarder to quad9. See https://quad9.net)
  • Overclocking: the OEM experimented with overclocking to 820 MHz but didn't commit to it. Two versions are available: stock and overclocked.
    • Note that overclocking will not increase the performance; it is in fact a technique for reducing latency.
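
To see which elevator a given disk actually ended up with, the kernel lists the available schedulers in /sys/block/<dev>/queue/scheduler and puts the active one in square brackets. The following is only a minimal sketch: `active_elevator` and `list_elevators` are made-up helper names, and the scheduler names printed depend on the kernel (the list above calls them mq-kyber, mq-bfq and noop).

```shell
#!/bin/sh
# Hypothetical helper: print the active I/O scheduler from a line such as
# "mq-deadline [kyber] bfq none" (the format of queue/scheduler in sysfs).
active_elevator() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Hypothetical helper: print "device: scheduler" for every block device
# found under a sysfs root. The root is a parameter (default /sys) so the
# function can be tried against a scratch copy first.
list_elevators() {
    root="${1:-/sys}"
    for f in "$root"/block/*/queue/scheduler; do
        [ -r "$f" ] || continue
        dev="${f#"$root"/block/}"
        dev="${dev%%/*}"
        echo "$dev: $(active_elevator "$(cat "$f")")"
    done
}
```

Running `list_elevators` with no argument on the router should print one line per block device, which makes it easy to confirm that a USB disk and the internal flash got different schedulers.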

Enable the "Watch" button in GitHub to keep up with the latest improvements!


Known issues and feedback here.


Great work @NoTengoBattery

Due to a patch in the 4.19.73 kernel, I'm announcing that this device now supports hardware acceleration for OpenSSL/OpenVPN.

The kernel patch is not the only thing needed. Kernel drivers are needed, and some configuration hacking is also needed. Currently, the hardware acceleration is fully supported in this build and no extra configuration is needed (however, you can disable the acceleration in /etc/init.d/bootz).

Please: never enable hardware acceleration for SSH, because it simply does not work. Also, the hardware acceleration is disabled in Failsafe Mode, as the configuration needed to enable it in the applications is done during bootz's first boot (which does not apply to Failsafe Mode).

The applications running with hardware acceleration are:
v0.16+: openvpn, hostapd, uhttpd and unbound.

Please don't do this,


https://wiki.gentoo.org/wiki/GCC_optimization#-O

There is no real reason to avoid it. This software is just for convenience and not for real use, as with any snapshot of OpenWrt itself.
Therefore, thanks for your suggestion (it sounds like an order), but I will just ignore it as long as no fatal bugs are found. The software is fully usable and no major bugs have been found due to the use of -O3 during the whole development (more than half a year).
That said, I see no benefit from removing it, just as I see no benefit from having it.


Also: that was true in the old days of GCC 4.x (I know it, I cooked ROMs for Android). Since version 6.x, GCC handles everything well because Linus put pressure on the GCC team. You know how he is... https://www.youtube.com/watch?v=_36yNWw_07g

Working with VLAN

Regarding the problem with using VLANs on this device, I think I've come up with a solution. More testing and feedback are needed.

You can read the details here: IPQ40xx Switch Config "Strangeness"

And provide feedback either in this thread or in the GitHub repo (I prefer the latter).

Inside the OEM firmware, I found this in /etc/init.d/perf-test.sh:

}
setnonkrait_perf() {
    #Quad core CPU for IPQ40xx
    echo "performance" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor 
    echo "performance" > /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor 
    echo "performance" > /sys/devices/system/cpu/cpu2/cpufreq/scaling_governor 
    echo "performance" > /sys/devices/system/cpu/cpu3/cpufreq/scaling_governor 
    echo "710000" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq
    echo "710000" > /sys/devices/system/cpu/cpu1/cpufreq/scaling_min_freq
    echo "710000" > /sys/devices/system/cpu/cpu2/cpufreq/scaling_min_freq
    echo "710000" > /sys/devices/system/cpu/cpu3/cpufreq/scaling_min_freq
}

product=`cat /etc/product`
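
For reference, the OEM excerpt above can be written as a single loop over the CPUs. This is only a sketch of an equivalent, not something taken from the OEM firmware: `set_cpufreq` is a made-up name, and the sysfs root is a parameter so the function can be dry-run against a scratch directory.

```shell
#!/bin/sh
# Hypothetical compact equivalent of the OEM function: one loop over every
# CPU instead of eight near-identical echo lines.
#   $1 = governor, $2 = minimum frequency in kHz, $3 = sysfs root (default /sys)
set_cpufreq() {
    governor="$1"
    min_khz="$2"
    root="${3:-/sys}"
    for dir in "$root"/devices/system/cpu/cpu[0-9]*/cpufreq; do
        [ -d "$dir" ] || continue
        echo "$governor" > "$dir/scaling_governor"
        echo "$min_khz" > "$dir/scaling_min_freq"
    done
}
```

With that in place, `set_cpufreq performance 710000` writes the same values as the OEM function, and any other governor or minimum frequency can be tried by changing the two arguments.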

And after testing, I found these results when the device uses the "power saving" frequency:

root@EA6350v3:~# for i in /sys/devices/system/cpu/cpufreq/policy0/scaling_{min,max}_freq; do echo 48000 > $i; echo $i: $(cat $i); done
/sys/devices/system/cpu/cpufreq/policy0/scaling_min_freq: 48000
/sys/devices/system/cpu/cpufreq/policy0/scaling_max_freq: 48000
root@EA6350v3:~# ping localhost
PING localhost (127.0.0.1): 56 data bytes
64 bytes from 127.0.0.1: seq=0 ttl=64 time=4.100 ms
64 bytes from 127.0.0.1: seq=1 ttl=64 time=3.278 ms
64 bytes from 127.0.0.1: seq=2 ttl=64 time=3.050 ms
64 bytes from 127.0.0.1: seq=3 ttl=64 time=2.749 ms
64 bytes from 127.0.0.1: seq=4 ttl=64 time=4.296 ms
64 bytes from 127.0.0.1: seq=5 ttl=64 time=3.492 ms
64 bytes from 127.0.0.1: seq=6 ttl=64 time=2.934 ms
64 bytes from 127.0.0.1: seq=7 ttl=64 time=3.324 ms
64 bytes from 127.0.0.1: seq=8 ttl=64 time=3.385 ms
64 bytes from 127.0.0.1: seq=9 ttl=64 time=7.103 ms
^C
--- localhost ping statistics ---
10 packets transmitted, 10 packets received, 0% packet loss
round-trip min/avg/max = 2.749/3.771/7.103 ms

And these are the results when the device uses the "performance" frequency:

root@EA6350v3:~# for i in /sys/devices/system/cpu/cpufreq/policy0/scaling_{min,max}_freq; do echo 716000 > $i; echo $i: $(cat $i); done
/sys/devices/system/cpu/cpufreq/policy0/scaling_min_freq: 716000
/sys/devices/system/cpu/cpufreq/policy0/scaling_max_freq: 716000
root@EA6350v3:~# ping localhost
PING localhost (127.0.0.1): 56 data bytes
64 bytes from 127.0.0.1: seq=0 ttl=64 time=0.466 ms
64 bytes from 127.0.0.1: seq=1 ttl=64 time=0.295 ms
64 bytes from 127.0.0.1: seq=2 ttl=64 time=0.276 ms
64 bytes from 127.0.0.1: seq=3 ttl=64 time=0.277 ms
64 bytes from 127.0.0.1: seq=4 ttl=64 time=0.321 ms
64 bytes from 127.0.0.1: seq=5 ttl=64 time=0.281 ms
64 bytes from 127.0.0.1: seq=6 ttl=64 time=0.300 ms
64 bytes from 127.0.0.1: seq=7 ttl=64 time=0.291 ms
64 bytes from 127.0.0.1: seq=8 ttl=64 time=0.298 ms
64 bytes from 127.0.0.1: seq=9 ttl=64 time=0.305 ms
^C
--- localhost ping statistics ---
10 packets transmitted, 10 packets received, 0% packet loss
round-trip min/avg/max = 0.276/0.311/0.466 ms

Just by doing this, the latency of the ping to localhost dropped by more than 10 times. This is an important setting: the governor used by default cannot deal with this because "ping localhost" (or any ping from any client) does not load the CPU enough to make it raise the frequency. This results in ping spikes even with SQM, which are noticeable with a low speed (but low latency) WAN. Therefore, I will be changing the governor to schedutil instead of the performance governor used by the OEM, and I will change the minimum CPU frequency to 200000, so it can save more power and run cooler than the OEM firmware while still providing performance.
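
To put a number on the difference, the two summary lines can be compared directly. The helpers below are a small sketch with made-up names (`avg_rtt`, `rtt_ratio`); they assume the busybox-style summary format shown in the transcripts above.

```shell
#!/bin/sh
# Hypothetical helper: extract the average from a busybox-style summary line,
# e.g. "round-trip min/avg/max = 2.749/3.771/7.103 ms".
avg_rtt() {
    printf '%s\n' "$1" | awk -F'[ =/]+' '/min\/avg\/max/ { print $(NF-2) }'
}

# Hypothetical helper: how many times larger the first average is than the second.
rtt_ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

slow=$(avg_rtt 'round-trip min/avg/max = 2.749/3.771/7.103 ms')
fast=$(avg_rtt 'round-trip min/avg/max = 0.276/0.311/0.466 ms')
rtt_ratio "$slow" "$fast"
```

With the averages from the two runs above (3.771 ms and 0.311 ms), the ratio works out to roughly 12.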


Also, I've introduced some "optimizations" in the kernel to try to reduce latency, especially useful when using SQM (or other software stacks that run in the kernel most of the time, such as OpenVPN).

Of course, this is not magic and the device just can't cope with some loads (e.g. 30MBPS max. for OpenVPN even when hardware acceleration is enabled).


Ok, I don't mean to ask a dumb/noob question but is this ready to go for hosting an external HDD as an FTP server/media server?

I'd like to wipe my EA6350v3 and load this but I don't want to lose the linksys partition and I'm afraid of bricking the router as well!

Thank you

No, it doesn't contain minidlna, ftp server or such and no ntfs support (which I wouldn't recommend using anyway).

Ok so I would need to download FTP server and minidlna.

Downloading those while in Luci still overwrites the other partition, correct?

And do you know if the bin made by NoTengoBattery is any better than the wiki?

Being a custom firmware it's unlikely to work with the official repo but I guess you could try...

You are totally wrong. It works with all packages from OpenWrt, except kernel packages.
And even if that were the case, the ROM includes all installable ipks. I'm not new at this, dear. I know what I'm doing.

@sassriverrat the router already has full USB support, but the official OpenWrt image does not have the SCSI driver installed (needed for most forms of USB storage). As for my distro, it has full support for USB storage, but not all filesystems are supported out of the box.

Regarding minidlna, it is a huge package and the device only has an effective total of 30 MB of ROM, so it probably won't install and you will need to use extroot.

Also: regarding reverting back to Linksys, I've developed and @bill888 has documented a way of reverting back without problems or known risks.


The firmware contains samba if you are interested, and I may build a version with minidlna and any filesystem preinstalled. But you will still need to use a swap device because the device only has 256 MB of RAM.

While optimization is a good thing in most cases you will run into situations where it causes issues.

You're not applying it to "just" the kernel, you're applying it to everything except the few packages that strip the optimization flags, and this will bite you in the end for sure. Issues like "O3 causes segmentation fault, O2 works" are not uncommon and are one of many reasons why the majority of distributions don't use it by default. This blog post also touches on why it's not a great idea to apply it blindly. https://developers.redhat.com/blog/2018/03/21/compiler-and-linker-flags-gcc/

As for packages the same rule applies as with any other firmware built using master, it will get desynced and it's usually just a matter of days before you'll get a library (package) mismatch. RH also mentions that O3 might not be fully ABI compatible.

Regarding minidlna, it'll work just fine with that amount of space and leave a lot free for other software.

You are still speaking about x86, which has many ABIs. ARM uses the AAPCS ABI, which is standardized and respected by GCC (in particular the GNU AAPCS). In fact, if that were true, the handwritten assembly in OpenSSL wouldn't work, and that's not the case.

In x86 you should care about the ABI because macOS uses a custom sysv ABI, Microsoft has its own PE ABI and Linux uses the gnu-sysv ABI.

The kernel ABI is stable and standardized so no problems when talking to the kernel using either sysfs, procfs or syscalls including sysctl.

Also: clang is ABI incompatible with GCC on Windows, but works just fine on macOS, Linux x86 and Linux ARM. The only ABI risk on Linux ARM comes from mixing programs built with hard and soft float. That does not make Clang buggy, and it does not make O3 buggy in GCC either.

This build uses the same ABI and float ABI as OpenWrt, and therefore all OpenWrt packages work. I'm still not seeing a problem or a valid reason.

And let me ask you: did you ever try using this build? You don't trust it? Build one yourself with O3 and let's see if it breaks something. Till then, I'll kindly ask you to stop being noisy.

Thanks for your interest.

In fact, dear @diizzy, Linksys itself builds its firmware using O3. That's why I included it in my configuration, and following the OEM's practices I've built a fully preemptive kernel.
They even overclocked the SoC, and yet there are no burnt-out devices or reports of random crashes using the OEM firmware so far.

Ok....so how is it that linksys is able to put a dlna server in their OS? Just curious. I haven't gotten a very definitive answer in either direction in terms of what is better for the long term (I can learn a UI and Luci wasn't too bad...although I still like dd-wrt!)

The firmware gets compressed during the build; it's a read-only SQUASHFS image. Anything you install separately goes into a read-write UBIFS partition that is "appended" to the firmware image. Together they fill the 30 MB, but the read-write part is fully uncompressed. So you won't get all of these 30 MB: if the image weighs around 20 MB, then you will only get 10 MB.
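
The arithmetic is simple enough to write down. This is only an illustration with a made-up helper name (`overlay_left_mb`); the 30 and 20 are the round figures from the paragraph above, not measured values.

```shell
#!/bin/sh
# Hypothetical helper: space left for the read-write overlay once the
# compressed image has taken its share of the firmware partition.
#   $1 = total size in MB, $2 = image size in MB
overlay_left_mb() {
    total_mb="$1"
    image_mb="$2"
    echo $(( total_mb - image_mb ))
}

overlay_left_mb 30 20
```

On a running device, `df -h /overlay` reports the real figure for the read-write part.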

Also, the OEM uses another partition, called syscfg, which is 42 MB long and fully empty. We could use it, but it will cause some "problems" as OpenWrt itself is not designed to use that space.
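
Partition sizes can be read from /proc/mtd, where the size column is hexadecimal. The sketch below converts one such line to MiB; `mtd_size_mb` is a made-up helper, and the sample line (mtd number and erase size included) is invented for illustration, so check the real values with `grep syscfg /proc/mtd` on the device.

```shell
#!/bin/sh
# Hypothetical helper: convert the hexadecimal size column of a /proc/mtd
# line (format: mtdN: <size> <erasesize> "<name>") to whole MiB.
mtd_size_mb() {
    size_hex=$(printf '%s\n' "$1" | awk '{ print $2 }')
    echo $(( 0x$size_hex / 1048576 ))
}

# Invented sample line: 0x02a00000 bytes is exactly 42 MiB.
mtd_size_mb 'mtd14: 02a00000 00020000 "syscfg"'
```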


If you want, I can build one with DLNA installed within the compressed firmware.

@NoTengoBattery

What do you recommend? I'm very open ears.

As I said, my intentions are as follows:

  1. 2.4GHz radio will be on, no wan connection (so no internet access) and only local access with a USB external hdd attached that's full of movies.

  2. 5GHz radio will be on, wan connection (internet access) AND have access as well to the USB external HDD.

  3. External is currently formatted NTFS and is 4tb in size but I can change format. Ideally something windows recognizes as the movies are most likely to be added (well new movies added) by plugging into a laptop and loading them on.

Most critical- users on 2.4 radio can only get onto the HDD and cannot get the internet connection. 5ghz has internet access. I'm working on a satellite connection for the wan- the ethernet connection to the modem is obviously good but from the modem to internet is shaky....so if that were to somehow cause instability in the router, that's an issue. (the linksys doesn't always like it)

I almost just wish I had two routers...it would make the internet setup easier, but that's how I found OpenWRT!

Well, let me help a bit with some recommendations:

  1. For setting WLAN as you want you'll probably need to do heavy modifications to the LAN interface. I think you will need to add a second LAN just for local access (non routed LAN). Anyway, with 2.4GHz the maximum bandwidth of 300 Mbps will be only 37.5 MB/s to the disk.
    1.1 I do recommend asking about it in the https://forum.openwrt.org/c/general section because it has less to do with the firmware (the Linux OS running behind the scenes) you use and more to do with how you use OpenWrt itself.
  2. I would not recommend NTFS for a NAS. It's very CPU inefficient and will cap the bandwidth of the disk to 10 MB/s if you are lucky, while pushing the CPU to 100% (I've already tried). I would recommend using XFS (a lightweight, low overhead filesystem for Linux), and for reading/writing to the disk from Windows/macOS clients you can use Samba (Network Shares).
  3. For DLNA to work, especially if you have huge collections, you will need physical swap for the first run, as minidlna indexes all files at once in RAM. The device has only 256 MB of RAM, so this may invoke the oom_reaper, which will kill either Samba or minidlna.
    3.1 By default, OpenWrt stores the DLNA database inside the RAM's tmpfs, which will only make the RAM problem worse. I recommend storing the DLNA database on real storage that will survive reboots (thus avoiding recreating large databases every time you reboot).
    3.2 The default OpenWrt will look like it has crashed (no network traffic at all) while indexing, but this build is optimized to avoid this, using things I've learnt from the Linksys firmware. So I would recommend using this build if OpenWrt does not seem to be fluent enough.
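
The bandwidth figure in point 1 is just a bits-to-bytes conversion. A small sketch follows, with made-up helper names (`mbps_to_mbytes`, `per_client`); the even split between clients is an idealized assumption, and 300 Mbps is the nominal link rate, so real throughput will be lower.

```shell
#!/bin/sh
# Hypothetical helper: megabits per second -> megabytes per second
# (8 bits per byte).
mbps_to_mbytes() {
    awk -v mbps="$1" 'BEGIN { printf "%.1f\n", mbps / 8 }'
}

# Hypothetical helper: even split of a shared rate between N clients that
# are all transferring at the same time.
per_client() {
    awk -v rate="$1" -v n="$2" 'BEGIN { printf "%.2f\n", rate / n }'
}

mbps_to_mbytes 300      # nominal 2.4GHz link rate from point 1
per_client 37.5 10      # ten clients streaming at once
```

The first call gives the 37.5 MB/s mentioned in point 1; the second shows what is left per client when ten of them share that rate evenly.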

It's totally doable; the hard part is getting those details right. If you follow my recommendations, I'll have saved you a lot of headaches.

Note: current builds do not include DLNA, but I can build one ready-to-go with DLNA (I'll be using it myself, other users can just disable it if they don't use it).

hmmm... all sounds good thus far.

So to ask further questions-

  1. By modifications, can these be done inside of Luci or whatever your interface is? I was able to setup the system I wanted (just without the external storage because I couldn't get the HDD recognized) in terms of 2.4 and 5ghz access on the router that I bricked. If you mean modifying by using command line, I'll definitely have to seek help what commands I'll need....
    That bandwidth is 37.5 across all devices, right (meaning 10 devices could split it down to 3.7 MB/s each....right)? I guess it doesn't really matter, that's all I have to deal with.

  2. XFS- is Samba a program? I need something that Android and iOS users can use as well.....thus I was running the FTP server using the linksys OS.

  3. Yes, my collection is about 2tb total right now, but always growing....
    So in regards to 3.1- can this all be done using JUST the router and external HDD? Due to mounting and equipment restraints, I don't have the ability to put anything else in....at least not at the moment. There may be the ability in the future to put a dedicated PC to use for the media server but not right now.