I know this thread is a couple of months old, but I spent a lot of time figuring out how to get the most out of the Dynalink WRX36 hardware, so I'll give a few pointers (if nothing else, for anyone who finds this thread in the future).
COMPILE FLAGS: for the target and kernel optimization flags, I use
-O3 -mcpu=cortex-a53+crc+crypto+rdma
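This turns on all (well, most) of GCC's optimizations, enables the "performance-oriented but not enabled by default" Arm CPU extensions, and tells GCC to tailor the code specifically for the Cortex-A53 without caring whether it runs on anything else. (One caveat: +rdma is an Armv8.1 extension and the Cortex-A53 is an Armv8.0 core, so it is worth checking that asimdrdm appears in /proc/cpuinfo before relying on it.)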
Note: don't use -march=... or -mtune=... in place of -mcpu. They only serve to make the compiled code more "general", and may cause GCC to avoid some higher-performance instructions in favor of compatibility with more CPUs. If you only care whether it runs on the Dynalink WRX36, just use -mcpu.
KERNEL TWEAKS: I modify my kernel config in many ways, but here are a few that (I think) have the biggest impact:
- making the kernel fully preemptible
- setting the interrupt timer to 1000 Hz
- enabling some of the hardware offload capabilities in the "Networking" configs
- adding all the compatible hardware-accelerated crypto routines (everything except the ones labeled "arm 8.2 crypto")
Warning: make kernel_menuconfig probably doesn't work the way you think (or want) it to. See this thread for details, but basically, if you just run
make kernel_menuconfig; make
your kernel is going to end up missing a bunch of stuff. At minimum, run
make kernel_menuconfig target/linux/clean prepare; make
Even better, use the script from the thread linked above.
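Because it is easy to end up with a kernel that silently lost some of these tweaks, it can help to check the generated kernel .config after the build. A minimal sketch follows; the helper name check_kconfig is hypothetical, and the symbol list is an assumption that maps the tweaks above onto mainline arm64 Kconfig names (CONFIG_PREEMPT for the preemptible kernel, CONFIG_HZ_1000 for the 1000 Hz timer, and a few of the arm64 Crypto Extensions drivers). Adjust the list to match what you actually enabled.

```shell
#!/bin/sh
# Hypothetical helper: report which wanted kernel symbols are NOT set to =y
# in a generated kernel .config. The symbol names are assumptions based on
# mainline arm64 Kconfig; edit the list to match your own tweaks.
WANTED_SYMS="CONFIG_PREEMPT CONFIG_HZ_1000 CONFIG_CRYPTO_AES_ARM64_CE CONFIG_CRYPTO_SHA2_ARM64_CE CONFIG_CRYPTO_GHASH_ARM64_CE"

check_kconfig() {
  cfg="$1"
  if [ ! -f "$cfg" ]; then
    echo "no such file: $cfg" >&2
    return 2
  fi
  status=0
  for sym in $WANTED_SYMS; do
    # -x: the whole line must match; -F: treat the pattern as a fixed string
    if ! grep -qxF "${sym}=y" "$cfg"; then
      echo "missing: $sym"
      status=1
    fi
  done
  return "$status"
}
```

Point it at the kernel .config under build_dir (something like build_dir/target-*/linux-*/linux-*/.config; the exact path depends on your target and kernel version). It prints one "missing:" line per symbol that did not survive and returns non-zero if anything is missing.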
NSS BUILDS: This isn't a well-known fact, but the Cortex-A53 cores aren't the only processors that can handle network data in the Dynalink WRX36... there is also a secondary "NSS" (Network Subsystem) network processing unit (NPU). Mainline OpenWrt doesn't really use it (yet), but thanks to the hard work of a few people it is quickly becoming possible to enable it.
See this thread for details.
It is still in "beta", but I just built a custom image with NSS enabled and almost everything works (I'm still troubleshooting ksmbd and unbound). CPU usage when maxing out my gigabit connection has been significantly reduced.
For example: previously, running something like speedtest-netperf.sh on my gigabit connection would take up ~40% of the overall CPU time across the 4 cores, with CPU0 constantly at >90% utilization. On the new NSS build it has dropped to ~13% overall utilization and <30% CPU0 utilization... a nearly 70% reduction in CPU usage for the same task!
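To show where the "nearly 70%" figure comes from, here is the percent-reduction arithmetic as a tiny shell sketch, using only the numbers quoted above. The helper name reduction is just for illustration.

```shell
#!/bin/sh
# Percent reduction in CPU usage: (before - after) * 100 / before
reduction() {
  awk -v b="$1" -v a="$2" 'BEGIN { printf "%.1f\n", (b - a) * 100 / b }'
}
reduction 40 13   # overall utilization: ~40% -> ~13%, prints 67.5
reduction 90 30   # CPU0: >90% -> <30%, so 66.7 is a lower bound
```

Both the overall figure (67.5%) and the CPU0 lower bound (66.7%) land just under 70%.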
When I get the last remaining kinks ironed out, I'll probably post a public release of the build. If you want to try it yourself, here are the .config and kernel .config files I'm using. Clone this OpenWrt fork.
Hope this info is of use to you (or at least useful to someone out there).