Brand new to this and a complete n00b. Started messing around with my Zyxel NBG6817. I got OpenWrt installed and set up with LuCI and a few other basics.
I'm having a few issues getting max performance. I have 1 GBit/s up and down, confirmed at the modem.
I configured a few things including Software Flow Offloading, Hardware Flow Offloading, Irqbalance, and set the CPU governor to performance. All based on other recommendations I found.
I'm still seeing my up and down closer to 100 MBit/s (up to about 200 MBit/s).
Am I missing a setting or simple setup or is this a result of not being able to use the built in HW optimizations?
For using OpenWrt, the ipq806x target isn't quite capable to route at 1 GBit/s linespeed - it should do more than 100 MBit/s (depending on your expectations for latencies probably between 350-400 MBit/s up to ~600-650 MBit/s, with software flow-offloading enabled), but 1 GBit/s is a bit above its abilities (you would need mvebu or x86_64 for that, perhaps ipq8074 in the future). Especially with 1+ GBit/s fibre connections, you're quickly leaving the prosumer space and are fairly far into enterprise territory and converging towards x86_64.
Given that the two 1.7 GHz KRAIT300 derived ARMv7 cores can't keep up with routing at 1 GBit/s linespeed, the vendor has added two 800 MHz little-endian ubicom32 derived NSS/ NPU cores for offloading most of the networking (NAT, routing, PPP, even higher level protocols like IPsec) into hardware (well, a firmware blob running on these NSS/ NPU cores). With the help of this semi-closed NSS subsystem (free, but non-mainline, driver with a closed/ proprietary firmware running on the NSS cores), the various OEM firmwares can go beyond the performance limits of the ARMv7 SOC and actually achieve routing at 1 GBit/s line speed (at least for the protocols supported by the NSS firmware). OpenWrt currently does not take advantage of these (this would be an opportunity for a hardware flow-offloading driver, but writing this won't be for the faint of heart), leaving them dormant and using the ARMv7 cores exclusively, which means you won't achieve the same throughput as the OEM firmware for your device.
Software flow offloading: yes, at least for kernel 4.14.x (which is the current default in master and the openwrt-19.07 branch, while openwrt-18.06 was still on kernel 4.9.x); it's not available in earlier kernels and currently broken in newer ones (for all targets).
Hardware flow offloading: currently not available for any target but mt7621, so not for your device.
yes
Yes, this and setting the lower clockspeed boundary to 800 MHz (instead of 384 MHz) will help.
Yes.
That said, in a very synthetic local benchmark (iperf3), doing routing and NAT between two local servers (one on WAN, one on LAN) with 1 GBit/s ethernet and DHCP, my nbg6817 (using kernel 4.19.x and only irqbalance && setting the lower clockspeed boundary to 800 MHz && the up_threshold of the ondemand scheduler to 20) doesn't fare that badly either:
But that's just that, a synthetic benchmark which doesn't take real world complications (PPPoE?, latencies, SQM, etc.) into account, so I really don't want to spread this as gospel. If you do want a conservative estimate for ipq8065's practical WAN-to-LAN throughput under OpenWrt, it will be somewhere between 350-400 MBit/s. Chances for NSS/ NPU support[0] for OpenWrt in the future are non-zero, but very low.
--
[0] Don't consider NSS/ NPU support to be a magic bullet, while it does provide a significant speedup, it can only offload protocols supported by the proprietary NSS firmware - and to the extent of its abilities. Bugs inside the NSS firmware would be 'unfixable', things like SQM simply aren't supported (meaning it would fall back to mere software routing again in the best case, or break spectacularly if the NSS firmware doesn't let go voluntarily).
@slh I realize I won't get to the high end, just wanted to make sure I was configured correctly to get the max performance I can. I'll disable the HW offloading since it is not supported.
As for setting the lower boundary and up threshold what is the command to do that? I couldn't find it, but likely just don't know where to look.
irqbalance depends a lot on the version of OpenWrt you're running, as it only pretty recently gained a procd initscript. If you are on a recent snapshot version, you can simply enable it in /etc/config/irqbalance.
For older versions you need to start it manually (or add it to /etc/rc.local).
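As a rough sketch, assuming the package ships its default config file with a single section named `irqbalance` and an `enabled` option (check your own file, the layout may differ between snapshots), the enabled state would look like this:

```
config irqbalance 'irqbalance'
	option enabled '1'
```

After editing the file, starting the service through its initscript (`/etc/init.d/irqbalance start`) or rebooting should pick up the change.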
The lower clockspeed boundary can be found in /sys/devices/system/cpu/cpu*/cpufreq/scaling_min_freq (for each core individually); you can change the value (in kHz) by writing to that file.
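A minimal sketch of how that could be done for all cores in one go. The function name `set_min_freq` and the `CPU_ROOT` override are made up for illustration (the override only exists so the function can be tried against a scratch directory); on the router it falls back to the real sysfs path.

```shell
# Sketch: write a new minimum frequency (in kHz) to every core's cpufreq node.
# CPU_ROOT is a hypothetical override for testing; it defaults to sysfs.
set_min_freq() {
    khz="$1"
    root="${CPU_ROOT:-/sys/devices/system/cpu}"
    for f in "$root"/cpu[0-9]*/cpufreq/scaling_min_freq; do
        # Skip cores without a writable cpufreq node (or an unmatched glob).
        [ -w "$f" ] || continue
        echo "$khz" > "$f"
    done
}
```

On the device this would be called as `set_min_freq 800000` (800 MHz, expressed in kHz).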
Check /sys/devices/system/cpu/cpu1/cpufreq/scaling_available_frequencies for valid frequencies (which differ between ipq8064 and ipq8065, the nbg6817 is an ipq8065 device; the ondemand scheduler defaults to the lowest frequency as base clockspeed).
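If you want to verify a value before writing it, something along these lines should work. This is only a sketch: `freq_is_valid` and the `CPU_ROOT` override are hypothetical names, and the core defaults to `cpu0` unless you pass another one.

```shell
# Sketch: return 0 if the given frequency (kHz) is listed as available
# for the given core, 1 if it is not, 2 if the list cannot be read.
# CPU_ROOT is a hypothetical override for testing; it defaults to sysfs.
freq_is_valid() {
    khz="$1"
    cpu="${2:-cpu0}"
    root="${CPU_ROOT:-/sys/devices/system/cpu}"
    list="$root/$cpu/cpufreq/scaling_available_frequencies"
    [ -r "$list" ] || return 2
    # The file holds a whitespace-separated list of frequencies in kHz.
    for f in $(cat "$list"); do
        if [ "$f" = "$khz" ]; then
            return 0
        fi
    done
    return 1
}
```

For example, `freq_is_valid 800000 cpu1 && echo ok` would only print `ok` if 800 MHz is in the list for that core.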
These tweaks are for the (default) ondemand scheduler; aside from irqbalance, none of these (or their equivalents) should be necessary for the performance scheduler, as that keeps the clockspeed always at 100% (1.725 GHz).
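For the up_threshold of the ondemand scheduler, here is a hedged sketch. Where the tunable lives depends on the kernel and cpufreq driver (either one global `ondemand` directory or one per core), so this simply tries both locations. `set_up_threshold` and `CPU_ROOT` are hypothetical names, not something OpenWrt provides.

```shell
# Sketch: set the ondemand up_threshold (in percent) wherever it is exposed.
# Returns 0 if at least one node was written, 1 otherwise.
# CPU_ROOT is a hypothetical override for testing; it defaults to sysfs.
set_up_threshold() {
    pct="$1"
    root="${CPU_ROOT:-/sys/devices/system/cpu}"
    status=1
    for f in "$root"/cpufreq/ondemand/up_threshold \
             "$root"/cpu[0-9]*/cpufreq/ondemand/up_threshold; do
        # Skip locations that do not exist on this kernel/driver combination.
        [ -w "$f" ] || continue
        echo "$pct" > "$f"
        status=0
    done
    return "$status"
}
```

On the router, `set_up_threshold 20` would match the value mentioned earlier in this thread; a non-zero return would suggest that the ondemand scheduler isn't active.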
All of these settings only affect the running system and don't survive a reboot (you can add them to /etc/rc.local, above the exit 0 line), but I'd suggest to keep it non-permanent for the time being.
In my experience those settings are everything needed, as the ondemand scheduler is rather slow to ramp up clockspeed under load (especially for bursty load, which is rather common for normal internet usage at home). Without them, you'd rather see the throughput vary between ~650-840 MBit/s during an iperf3 run.
But again, I can only test those speeds in synthetic benchmarks using artificial loads, as my WAN speed is (currently) limited to VDSL2 @100/40 MBit/s, which the nbg6817 can take care of easily.
Thanks @slh. I'm on a snapshot from today, so I have the latest. I have everything configured. Will run some more benchmarks, but it's looking better (up in the 600 MBit/s range).
i also have a patch set that fixes the L2 cache scaling on ipq806x: R7800 cache scaling issue. i need to tidy it up to get it merged. i am not sure if it will have much effect on NAT though.