Bandwidth Issues on Zyxel NBG6817

Hi All,

Brand new to this and a complete n00b. Started messing around with my Zyxel NBG6817. I got OpenWrt installed and set up with LuCI and a few other basics.

I'm having a few issues getting max performance. I have 1 Gbit/s up and down, confirmed at the modem.

I configured a few things, including software flow offloading, hardware flow offloading, irqbalance, and set the CPU governor to performance. All based on other recommendations I found.

I'm still seeing my up and down closer to 100 Mbit/s (up to about 200 Mbit/s).

Am I missing a setting or simple setup, or is this a result of not being able to use the built-in HW optimizations?

Any help would be greatly appreciated.

Under OpenWrt, the ipq806x target isn't quite capable of routing at 1 GBit/s line speed - it should do more than 100 MBit/s (depending on your latency expectations, probably between 350-400 MBit/s and ~600-650 MBit/s with software flow offloading enabled), but 1 GBit/s is a bit above its abilities (you would need mvebu or x86_64 for that, perhaps ipq8074 in the future). Especially with 1+ GBit/s fibre connections, you're quickly leaving the prosumer space, moving fairly far into enterprise territory and converging towards x86_64.

Given that the two 1.7 GHz KRAIT300-derived ARMv7 cores can't keep up with routing at 1 GBit/s line speed, the vendor has added two 800 MHz little-endian ubicom32-derived NSS/ NPU cores for offloading most of the networking (NAT, routing, PPP, even higher-level protocols like IPsec) into hardware (well, into a firmware blob running on these NSS/ NPU cores). With the help of this semi-closed NSS subsystem (a free, but non-mainline, driver with a closed/ proprietary firmware running on the NSS cores), the various OEM firmwares can go beyond the performance limits of the ARMv7 SOC and actually achieve routing at 1 GBit/s line speed (at least for the protocols supported by the NSS firmware). OpenWrt currently does not leverage these cores (this would be an opportunity for a hardware flow-offloading driver, but writing one won't be for the faint of heart), leaving them dormant and using the ARMv7 cores exclusively, which means you won't achieve the same throughput as the OEM firmware for your device.

Yes, at least for kernel 4.14.x (the current default in master and the openwrt-19.07 branch, while openwrt-18.06 was still on kernel 4.9.x); it's not available in earlier kernels and currently broken in newer ones (for all targets).

Currently not available for any target but mt7621, so not for your device.

Yes.

Yes, this and setting the lower clockspeed boundary to 800 MHz (instead of 384 MHz) will help.

Yes.


That said, in a very synthetic local benchmark (iperf3), doing routing and NAT between two local servers (one on WAN, one on LAN) with 1 GBit/s ethernet and DHCP, my nbg6817 (using kernel 4.19.x and only irqbalance && setting the lower clockspeed boundary to 800 MHz && the up_threshold of the ondemand governor to 20) doesn't fare that badly either:

[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   111 MBytes   928 Mbits/sec    0    551 KBytes
[  5]   1.00-2.00   sec   112 MBytes   936 Mbits/sec    0    578 KBytes
[  5]   2.00-3.00   sec   102 MBytes   854 Mbits/sec    2    731 KBytes
[  5]   3.00-4.00   sec   110 MBytes   923 Mbits/sec    0    761 KBytes
[  5]   4.00-5.00   sec   111 MBytes   933 Mbits/sec    0    761 KBytes
[  5]   5.00-6.00   sec   110 MBytes   923 Mbits/sec    0    854 KBytes
[  5]   6.00-7.00   sec   101 MBytes   849 Mbits/sec    0    880 KBytes
[  5]   7.00-8.00   sec   109 MBytes   912 Mbits/sec    0    885 KBytes
[  5]   8.00-9.00   sec   111 MBytes   933 Mbits/sec    0    887 KBytes
[  5]   9.00-10.00  sec   111 MBytes   933 Mbits/sec    0    887 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1.06 GBytes   913 Mbits/sec    2             sender
[  5]   0.00-10.00  sec  1.06 GBytes   910 Mbits/sec                  receiver
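
(For reference, output like the above comes from an ordinary iperf3 run between the two hosts; the exact invocation isn't part of this post, so take the following as an assumed minimal example with a placeholder address:)

# on the server sitting on the WAN side
iperf3 -s
# on the LAN client, routed/NATed through the nbg6817 (192.0.2.10 is a placeholder)
iperf3 -c 192.0.2.10 -t 10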

But that's just that: a synthetic benchmark which doesn't take real-world complications (PPPoE?, latencies, SQM, etc.) into account, so I really don't want to spread this as gospel. If you want a conservative estimate of ipq8065's practical WAN-to-LAN throughput under OpenWrt, it will be somewhere between 350-400 MBit/s. Chances of NSS/ NPU support[0] for OpenWrt in the future are non-zero, but very low.

--
[0] Don't consider NSS/ NPU support to be a magic bullet: while it does provide a significant speedup, it can only offload protocols supported by the proprietary NSS firmware - and only to the extent of its abilities. Bugs inside the NSS firmware would be 'unfixable', and things like SQM simply aren't supported (meaning it would fall back to mere software routing in the best case, or break spectacularly if the NSS firmware doesn't let go voluntarily).


@slh I realize I won't get to the high end, just wanted to make sure I was configured correctly to get the max performance I can. I'll disable the HW offloading since it is not supported.

As for setting the lower boundary and up_threshold, what is the command to do that? I couldn't find it, but I likely just don't know where to look.

irqbalance depends a lot on the version of OpenWrt you're running, as it only pretty recently gained a procd initscript. If you are on a recent snapshot version, you can simply toggle it to enabled in /etc/config/irqbalance:

# cat /etc/config/irqbalance 
config irqbalance 'irqbalance'
        option enabled '1'
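
The same thing from the shell, as a sketch (assuming the packaged procd initscript and the section name shown above):

uci set irqbalance.irqbalance.enabled='1'
uci commit irqbalance
/etc/init.d/irqbalance restart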

For older versions you need to start it manually (or add it to /etc/rc.local).

The lower clockspeed boundary can be found in /sys/devices/system/cpu/cpu*/cpufreq/scaling_min_freq (for each core individually); you can change the value (in kHz):

echo 800000 >/sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq
echo 800000 >/sys/devices/system/cpu/cpu1/cpufreq/scaling_min_freq

Check /sys/devices/system/cpu/cpu1/cpufreq/scaling_available_frequencies for valid frequencies (these differ between ipq8064 and ipq8065; the nbg6817 is an ipq8065 device, and the ondemand governor defaults to the lowest frequency as its base clockspeed):

# cat /sys/devices/system/cpu/cpu1/cpufreq/scaling_available_frequencies 
384000 600000 800000 1000000 1400000 1725000

Tweaking the ondemand governor's settings works via /sys/devices/system/cpu/cpufreq/ondemand/up_threshold, e.g.:

echo 20 >/sys/devices/system/cpu/cpufreq/ondemand/up_threshold

These tweaks are for the (default) ondemand governor; aside from irqbalance, none of these (or their equivalents) should be necessary for the performance governor, as that keeps the clockspeed at 100% (1.725 GHz) at all times.
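
If you want to switch governors at runtime for comparison, that also works via sysfs (a sketch; this assumes the performance governor is built into your image):

echo performance >/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
echo performance >/sys/devices/system/cpu/cpu1/cpufreq/scaling_governor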

All of these settings only affect the running system and don't survive a reboot (you can add them to /etc/rc.local, above the exit 0 line), but I'd suggest keeping them non-permanent for the time being.
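
If you later decide to make them permanent anyway, the /etc/rc.local additions would just be the commands from above, e.g. (a sketch, using the values already discussed):

# added above the existing 'exit 0' line in /etc/rc.local
echo 800000 >/sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq
echo 800000 >/sys/devices/system/cpu/cpu1/cpufreq/scaling_min_freq
echo 20 >/sys/devices/system/cpu/cpufreq/ondemand/up_threshold
exit 0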

In my experience those settings are all that's needed, as the ondemand governor is rather slow to ramp up the clockspeed under load (especially for bursty loads, which are rather common for normal internet usage at home). Without them, you'd instead see the throughput vary between ~650-840 MBit/s during an iperf3 run.

But again, I can only test those speeds in synthetic benchmarks using artificial loads, as my WAN speed is (currently) limited to VDSL2 @100/40 MBit/s, which the nbg6817 can take care of easily.


Thanks @slh. I'm on a snapshot from today, so I have the latest. I have everything configured. I'll run some more benchmarks, but it's looking better already (up in the 600 Mbit/s range).


IMO you will also want

for file in /sys/class/net/*
do
        echo 3 > "$file/queues/rx-0/rps_cpus"
        echo 3 > "$file/queues/tx-0/xps_cpus"
done

in rc.local, or better yet in a custom hotplug.d script (see the sketch below), as it gets lost e.g. every time a change is made in LuCI. AFAIK this enables RPS and XPS: https://www.kernel.org/doc/Documentation/networking/scaling.txt
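
A minimal sketch of such a hotplug script (the path /etc/hotplug.d/iface/99-rps-xps and reacting only on ifup are my assumptions, pick whatever suits you):

# /etc/hotplug.d/iface/99-rps-xps
# re-apply the RPS/XPS CPU masks whenever an interface comes up
[ "$ACTION" = "ifup" ] || exit 0

for file in /sys/class/net/*
do
        [ -w "$file/queues/rx-0/rps_cpus" ] && echo 3 > "$file/queues/rx-0/rps_cpus"
        [ -w "$file/queues/tx-0/xps_cpus" ] && echo 3 > "$file/queues/tx-0/xps_cpus"
done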

I also have a patch set that fixes the L2 cache scaling on ipq806x: R7800 cache scaling issue. I need to tidy it up to get it merged. I am not sure if it will have much effect on NAT though.


Thanks, I will try that.

In my experience, those settings don't make a difference at all.

In my experience it makes the softirqs more evenly distributed across the two CPUs when using SQM.
