Ubiquiti EdgeRouter X, Loading OpenWrt and performance numbers


#22

I added a small note to my original post saying that all settings are default/unchanged. Everything is running stock 18.06.1, r7258-5eb055306f. Please check out the section "NEED MORE SPEED!! Enabling hardware flow control:" and let me know if I need to make more refinements.


#23

I benchmarked my DIR-860L by hooking up one computer to WAN and another to LAN. iperf gave me 550-600 Mbit/s with SQM on, and it could even be pushed to 700 Mbit/s with qos-simplest. Cake was roughly 550 Mbit/s. His numbers seem to match mine.


#24

Honestly, it seems shocking: it's an 880 MHz MIPS core, and 1200 MHz ARM cores reportedly struggle to do 300-400 Mbps with SQM.


#25

CPU performance and I/O performance are sadly not the same thing, nor is the level of optimization or hardware offloading the same across architectures or even SoCs.

MIPS is basically a traditional workstation architecture (SGI), 'just' scaled down (for power consumption, cost, etc.) to router needs and then forgotten for far too long. It has inherited good I/O performance from that heritage.

ARM, on the other hand, comes from the other end of the spectrum: low power and low performance (more firmware than operating system). It was actively improved to gain performance, mostly with a focus on mobile (phone) usage so far, a task where I/O performance doesn't matter that much and where features tend to get offloaded into dedicated IP blocks (which don't necessarily have FOSS driver support).


#26

SQM isn't I/O though; I'd imagine it's raw math operations and memory accesses. Certainly it tends to run out of CPU cycles at high packet rates rather than becoming I/O-wait bound.
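
To put rough numbers on the "runs out of CPU cycles" point, here is a back-of-the-envelope sketch of the per-packet cycle budget. It assumes a single 880 MHz core does all the work, every packet is a full 1500 bytes, and the target rate is 950 Mbps; all three are assumptions for illustration, and real traffic with smaller packets would make the budget tighter.

```sh
#!/bin/sh
# Rough per-packet CPU budget. All inputs are illustrative assumptions.
cpu_hz=880000000      # 880 MHz core clock, as quoted earlier in the thread
rate_bps=950000000    # assumed near-gigabit target rate
pkt_bytes=1500        # assumed full-size packets, framing overhead ignored

# Packets per second at that rate, then cycles available for each packet.
pps=$(( rate_bps / (pkt_bytes * 8) ))
cycles=$(( cpu_hz / pps ))

echo "packets_per_second=$pps"
echo "cycles_per_packet=$cycles"
```

Everything the kernel does for a packet (receive, routing, NAT, shaping, transmit) has to fit inside that budget, which is why a shaper can hit a CPU ceiling long before the link is full.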


#27

Sorry that it took me a while to figure out how to set up SQM.

TL;DR:
At 16 threads, SQM seems to max out around 200 Mbps with minimal bufferbloat.
At 16 threads, the stock configuration seems to max out around 550 Mbps with minimal bufferbloat.
At 16 threads, the hardware acceleration configuration seems to max out around 650 Mbps with bad bufferbloat. (DSL Reports)
Speedtest.net puts hardware acceleration at 900 Mbps+.

Here is the info and tests you were looking for:

16 threads http://www.dslreports.com/speedtest/44441321
24 threads http://www.dslreports.com/speedtest/44441245
root@OpenWrt:/etc# /etc/init.d/sqm stop
root@OpenWrt:/etc# /etc/init.d/sqm start
SQM: Starting SQM script: simple.qos on eth0.201, in: 950000 Kbps, out: 950000 Kbps
SQM: simple.qos was started on eth0.201 successfully

root@OpenWrt:/etc# cat /etc/config/sqm
config queue 'eth0'
        option enabled '1'
        option qdisc 'fq_codel'
        option script 'simple.qos'
        option qdisc_advanced '0'
        option linklayer 'none'
        option interface 'eth0.201'
        option download '950000'
        option upload '950000'
        option debug_logging '1'
        option verbosity '5'

16-threads http://www.dslreports.com/speedtest/44441443
root@OpenWrt:/etc# cat /etc/config/sqm
config queue 'eth0'
        option enabled '1'
        option script 'simple.qos'
        option qdisc_advanced '0'
        option linklayer 'none'
        option interface 'eth0.201'
        option download '950000'
        option upload '950000'
        option verbosity '5'
        option debug_logging '0'
        option qdisc 'cake'

16-threads http://www.dslreports.com/speedtest/44441521
root@OpenWrt:/etc# cat /etc/config/sqm
config queue 'eth0'
        option enabled '1'
        option qdisc_advanced '0'
        option linklayer 'none'
        option interface 'eth0.201'
        option download '950000'
        option upload '950000'
        option verbosity '5'
        option debug_logging '0'
        option qdisc 'cake'
        option script 'layer_cake.qos'

16-threads http://www.dslreports.com/speedtest/44441560
root@OpenWrt:/etc# cat /etc/config/sqm

config queue 'eth0'
        option enabled '1'
        option qdisc_advanced '0'
        option linklayer 'none'
        option interface 'eth0.201'
        option download '950000'
        option upload '950000'
        option verbosity '5'
        option debug_logging '0'
        option qdisc 'cake'
        option script 'simplest.qos'
OMG BAD!!!!
16-threads http://www.dslreports.com/speedtest/44441599
root@OpenWrt:/etc# cat /etc/config/sqm

config queue 'eth0'
        option enabled '1'
        option qdisc_advanced '0'
        option linklayer 'none'
        option interface 'eth0.201'
        option download '950000'
        option upload '950000'
        option verbosity '5'
        option debug_logging '0'
        option qdisc 'cake'
        option script 'simplest_tbf.qos'

16-threads http://www.dslreports.com/speedtest/44441633
root@OpenWrt:/etc# cat /etc/config/sqm

config queue 'eth0'
        option enabled '1'
        option qdisc_advanced '0'
        option linklayer 'none'
        option interface 'eth0.201'
        option download '950000'
        option upload '950000'
        option verbosity '5'
        option debug_logging '0'
        option qdisc 'cake'
        option script 'piece_of_cake.qos'

Now running the "stock" OpenWrt without any SQM

16-threads, NO hardware acceleration, NO SQM
http://www.dslreports.com/speedtest/44441698
16-threads, WITH hardware acceleration, NO SQM
http://www.dslreports.com/speedtest/44441763

My takeaway? If your internet connection is 600 Mbps or less, just run the stock configuration and don't bother with hardware acceleration.

I would love to see someone, with more skills than I, start up a performance tuning thread for the ERX.


#28

Awesome, that was extremely thorough information, thanks. It confirms my gut instinct that cake was going to manage only something like 200 Mbps, which it did. Here's the clicky-link to the piece_of_cake result: http://www.dslreports.com/speedtest/44441633

Since it was set to 950 Mbps, cake is being throttled by its own ability to do calculations on the packets. If you set it to something like 200 Mbps, it will have even better bufferbloat performance. For someone doing VoIP or games, the variation between 20 and 70 ms ping times is going to be noticeable in terms of variable hitreg or garbled, glitchy audio.
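
A minimal sketch of changing the rate from the shell, assuming the queue section is named 'eth0' as in the config pasted above. The helper name `sqm_rate_cmds` is hypothetical; it only prints the `uci` commands so they can be reviewed before anything is run on the router.

```sh
#!/bin/sh
# Hypothetical helper: print the commands that would set both shaper rates
# (in kbit/s) on the 'eth0' queue section of /etc/config/sqm.
# Nothing is changed here; the output is meant to be reviewed first.
sqm_rate_cmds() {
    rate_kbit="$1"
    echo "uci set sqm.eth0.download='$rate_kbit'"
    echo "uci set sqm.eth0.upload='$rate_kbit'"
    echo "uci commit sqm"
    echo "/etc/init.d/sqm restart"
}

# Example: the ~200 Mbps setting suggested above.
sqm_rate_cmds 200000
```

On the router, `sqm_rate_cmds 200000 | sh` would apply it. Checking `tc -s qdisc show dev eth0.201` afterwards should show whether the new rate is actually in effect.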


#29

These numbers make a lot more sense. Thanks for running through all of these permutations. I would update the original post so there is no confusion though.

I'm not sure your final conclusion about the stock configuration is correct, though. My thoughts:

  • If your connection is 200 Mbps or less, SQM is probably still a good idea (less variability in latency), since the CPU can handle it.
  • If your connection is in the 200-600 Mbps range, I'm not sure your test is enough for those users. You are not saturating your connection, so the buffers/queues on the router are not full. That may keep bloat low in this scenario, but users with speeds in this range should perform their own tests.
  • For the higher speeds, I think this is not an issue with HW acceleration per se. You are simply getting closer to saturating your connection, which inherently results in more saturated queues (and therefore bloat). Since HW accel is not currently compatible with SQM, SQM can't save you there.
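
As a rough illustration of why fuller queues mean more bloat: the added delay is just the queued bytes divided by the link rate. The figures below are assumptions picked for illustration (50 ms of added latency, i.e. the 20 to 70 ms spread mentioned earlier, on a 600 Mbps link) and are not derived from the test results.

```sh
#!/bin/sh
# Standing queue implied by a given amount of added latency.
# Both inputs are assumptions chosen for illustration.
rate_bps=600000000   # assumed 600 Mbps bottleneck rate
added_ms=50          # assumed extra latency under load (70 ms - 20 ms)

# bytes = (bits per second / 8) * seconds; divide by 1000 first to keep
# the intermediate values small.
queue_bytes=$(( rate_bps / 8 / 1000 * added_ms ))
queue_pkts=$(( queue_bytes / 1500 ))   # expressed as full-size 1500-byte packets

echo "queued_bytes=$queue_bytes"
echo "queued_packets=$queue_pkts"
```

A shaper like cake or fq_codel works by keeping that standing queue short, which only helps if the shaper itself is the bottleneck.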

I'm curious what CPUs are fast enough to handle near gigabit WAN connections with SQM (cake or fq_codel).


#30

x86 Celerons and up do it. I don't think anything less than an x86 can reliably do shaping at 500 Mbps while keeping queueing delay low enough for VoIP.


#31

Any particular model? I feel like the "Celeron" label has been applied to so many completely different CPUs over the years. Any nice compact/embedded/fanless packages with 2-5 Ethernet ports?

I'm also curious if the PC Engines APU2 boards would be fast enough. Quad core AMD Jaguar @ 1 GHz. Currently about $125 with 3 ethernet ports, 2GB RAM, and a case. Cheaper than a high-end consumer router.


#32

Pretty sure @jeff uses the APU2 boards, and yes, they'll do it or at least get close to 1 Gbps. If I were looking for x86 boards, I'd look for AES-NI; anything with it is going to be fast enough for 1 Gbps shaping.


#33

Again, thank you all for the great feedback. Here are a few more tests, thanks to Drawz and Dlakelan. Just playing with the Download/Upload options.

16-threads, ~10% CPU idle, http://www.dslreports.com/speedtest/44442429
root@OpenWrt:/etc# cat /etc/config/sqm

config queue 'eth0'
        option qdisc_advanced '0'
        option linklayer 'none'
        option interface 'eth0.201'
        option verbosity '5'
        option debug_logging '0'
        option qdisc 'cake'
        option script 'piece_of_cake.qos'
        option enabled '1'
        option download '500000'
        option upload '500000'

16-threads, ~10% CPU idle, http://www.dslreports.com/speedtest/44442489

root@OpenWrt:/etc# cat /etc/config/sqm

config queue 'eth0'
        option qdisc_advanced '0'
        option linklayer 'none'
        option interface 'eth0.201'
        option verbosity '5'
        option debug_logging '0'
        option qdisc 'cake'
        option script 'piece_of_cake.qos'
        option enabled '1'
        option download '250000'
        option upload '250000'

16-threads, ~10% CPU idle.
http://www.dslreports.com/speedtest/44442516
http://www.dslreports.com/speedtest/44442540
http://www.dslreports.com/speedtest/44442568

root@OpenWrt:/etc# cat /etc/config/sqm

config queue 'eth0'
        option qdisc_advanced '0'
        option linklayer 'none'
        option interface 'eth0.201'
        option verbosity '5'
        option debug_logging '0'
        option qdisc 'cake'
        option script 'piece_of_cake.qos'
        option enabled '1'
        option download '100000'
        option upload '100000'

#34

Thanks! APU2 does AES-NI, so I may give that a go.


#35

Obviously it's not the AES-NI that helps with shaping itself, but rather the fact that the device is recent enough to have it. AES-NI does in fact help with VPN of course.


#36

Hmm, it's a little strange that all those tests had similar speeds. Are you sure you're disabling hw offload and saving/reloading the sqm instance? You're getting something like 500 Mbps even when you set 100 Mbps speeds.


#37

Ah I was wondering about that. Makes sense. Thanks again!


#38

Double-checked that hardware acceleration is off. Checked my configs. All looked good. Same results. Then gave it a reboot. BIG difference:
100Mbps http://www.dslreports.com/speedtest/44442792
200Mbps http://www.dslreports.com/speedtest/44442865
250Mbps http://www.dslreports.com/speedtest/44442834


#39

Awesome, that confirms, about 200Mbps with proper bufferbloat control!


#40

Also updated the need for speed section of my original post.


#41

Going to toss out a hypothesis: "Hardware flow offloading" (HFO) is better than SQM, in all cases, on the ERX.

My logic. SQM maxes out at 200Mbps. If we compare HFO @ 200Mbps, I would argue buffer bloat is as good, or better, than what SQM can offer.

Problem: I don't know how to test that one. I can hard-limit my connection down to 100 Mbps, but I would like to test this at all levels and see where bufferbloat starts to become an issue. Can anyone test this, or give me a "how to" on rate limiting my incoming WAN connection?