SQM - BB vs LEDE - major diff in performance

This is about a pretty big performance difference between SQM on the older Barrier Breaker (BB) build and LEDE 17.01.4. Both share the exact same SQM config file.

On a fast line (150 Mbps), there is a pretty serious cut to top speed while QoS is on with the latest LEDE/SQM scripts.

I'm hoping @moeller0 can give me some feedback based on these metrics.

The hardware is a TP-Link WDR3600, using the SQM scripts with simple.qos.

On BB, the QoS-shaped speed and line speed are pretty close: with QoS off we see 150 Mbps, and with QoS on we get into the mid 140s.

Now on to the latest LEDE build from gwlim:
https://github.com/gwlim/Fast-Path-LEDE-OpenWRT (we also tried the normal build, same results).

With SQM off:
6 netperf streams on download to the same netperf server: 141 Mbps
sirq 24%

With SQM on:
6 netperf streams on download to the same netperf server: 69 Mbps
sirq 30%

6 netperf streams on download to 6 different netperf servers: 68 Mbps
sirq 31%

Since moeller0 asks for these, here are the qdisc details and the SQM config (from the LEDE run):

cat /etc/config/sqm

config queue 'eth1'
option upload '10000'
option qdisc 'fq_codel'
option script 'simple.qos'
option qdisc_advanced '0'
option linklayer 'none'
option interface 'eth0'
option download '150000'
option debug_logging '0'
option verbosity '5'
option enabled '1'

tc -d qdisc

qdisc noqueue 0: dev lo root refcnt 2
qdisc htb 1: dev eth0 root refcnt 2 r2q 10 default 12 direct_packets_stat 0 ver 3.17 direct_qlen 1000
qdisc fq_codel 110: dev eth0 parent 1:11 limit 1001p flows 1024 quantum 300 target 5.0ms interval 100.0ms ecn
qdisc fq_codel 120: dev eth0 parent 1:12 limit 1001p flows 1024 quantum 300 target 5.0ms interval 100.0ms ecn
qdisc fq_codel 130: dev eth0 parent 1:13 limit 1001p flows 1024 quantum 300 target 5.0ms interval 100.0ms ecn
qdisc ingress ffff: dev eth0 parent ffff:fff1 ----------------
qdisc noqueue 0: dev br-lan root refcnt 2
qdisc noqueue 0: dev eth0.1 root refcnt 2
qdisc noqueue 0: dev eth0.2 root refcnt 2
qdisc noqueue 0: dev wlan0 root refcnt 2
qdisc noqueue 0: dev wlan1 root refcnt 2
qdisc htb 1: dev ifb4eth0 root refcnt 2 r2q 10 default 10 direct_packets_stat 0 ver 3.17 direct_qlen 32
qdisc fq_codel 110: dev ifb4eth0 parent 1:10 limit 1001p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn

tc -s qdisc

qdisc noqueue 0: dev lo root refcnt 2
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
qdisc htb 1: dev eth0 root refcnt 2 r2q 10 default 12 direct_packets_stat 0 direct_qlen 1000
Sent 8509129 bytes 106664 pkt (dropped 0, overlimits 1720 requeues 0)
backlog 0b 0p requeues 0
qdisc fq_codel 110: dev eth0 parent 1:11 limit 1001p flows 1024 quantum 300 target 5.0ms interval 100.0ms ecn
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
new_flows_len 0 old_flows_len 0
qdisc fq_codel 120: dev eth0 parent 1:12 limit 1001p flows 1024 quantum 300 target 5.0ms interval 100.0ms ecn
Sent 8509129 bytes 106664 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
maxpacket 1514 drop_overlimit 0 new_flow_count 28333 ecn_mark 0
new_flows_len 0 old_flows_len 2
qdisc fq_codel 130: dev eth0 parent 1:13 limit 1001p flows 1024 quantum 300 target 5.0ms interval 100.0ms ecn
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
new_flows_len 0 old_flows_len 0
qdisc ingress ffff: dev eth0 parent ffff:fff1 ----------------
Sent 281859908 bytes 190609 pkt (dropped 1, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
qdisc noqueue 0: dev br-lan root refcnt 2
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
qdisc noqueue 0: dev eth0.1 root refcnt 2
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
qdisc noqueue 0: dev eth0.2 root refcnt 2
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
qdisc noqueue 0: dev wlan0 root refcnt 2
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
qdisc noqueue 0: dev wlan1 root refcnt 2
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
qdisc htb 1: dev ifb4eth0 root refcnt 2 r2q 10 default 10 direct_packets_stat 0 direct_qlen 32
Sent 283312692 bytes 189806 pkt (dropped 803, overlimits 151278 requeues 0)
backlog 0b 0p requeues 0
qdisc fq_codel 110: dev ifb4eth0 parent 1:10 limit 1001p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
Sent 283312692 bytes 189806 pkt (dropped 803, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
maxpacket 1514 drop_overlimit 448 new_flow_count 12513 ecn_mark 0
new_flows_len 0 old_flows_len 2

Thanks in advance for any help and insights.

Hi there, have you tried setting up SQM from scratch? There might have been a syntax change in the file. Rename your sqm file to sqm.back and then try with the stock file. Also, did you keep saved settings from the BB build you were on? If you did, one of the config files might be wrong, throwing off your tests.

This seems to be an outlier; typically on that hardware sqm's shapers top out at around 60 to 70 Mbps (combined uplink and downlink). simple.qos/HTB/fq_codel will keep the latency low in that condition, while piece_of_cake.qos/cake will allow the latency under load to increase a bit more, regaining a bit more bandwidth, but no matter what, we were never able to reliably shape at 140 Mbps on that hardware. Sorry if this is not the answer you are looking for.

These sirq numbers look really low; how did you measure? (I recommend logging into the router during a speedtest, running "top -d 1", and keeping an eye on the idle and sirq numbers; whenever idle gets close to zero, your router is out of CPU cycles...)
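For the curious, the numbers behind top's idle/sirq columns can be sketched by reading /proc/stat directly (a minimal sketch assuming the standard Linux /proc/stat field order; top computes percentages by diffing two such samples one second apart, which this sketch omits):

```shell
#!/bin/sh
# Read the aggregate "cpu" line from /proc/stat: the fields are cumulative
# ticks for user, nice, system, idle, iowait, irq, softirq (plus newer fields
# such as steal, which land in $rest).
read -r cpu user nice system idle iowait irq softirq rest < /proc/stat
total=$((user + nice + system + idle + iowait + irq + softirq))
echo "idle ticks: $idle / $total, softirq ticks: $softirq / $total"
```

These counters are cumulative since boot, which is why top samples twice and reports the difference as a percentage.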

Thanks for the other information; you are right that I really want to see that, but I cannot spot any issues with these files at all.

Best Regards

On stock BB builds that is true, but we use the GWLIM-optimized BB build and it has great throughput. So those stats are achievable on that hardware/software combo.

Those were indeed measured over an SSH console running top. Again, they could be low due to the optimized build.

Thanks for reviewing the posted config and stats. We did a lot of reading and reviewing before posting, but my friend and I are puzzled and wondered if there was a significant change in SQM (or HTB) that would explain this.

My friend also has a C7v2, so he tested that on his 250 Mbps line. Will post the results of that in a bit, but it shows the same performance delta with QoS on.

So here are the results with a C7v2 on a 250Mbps cable line.

Interestingly, the C7v2 gets more throughput than the 3600 with SQM off but less throughput than the 3600 with SQM on.

30 streams for 20 seconds to load line

239 Mbps Total with QoS off

/etc/config/sqm

config queue 'eth1'
option qdisc 'fq_codel'
option linklayer 'none'
option qdisc_advanced '1'
option squash_dscp '1'
option squash_ingress '1'
option ingress_ecn 'ECN'
option egress_ecn 'NOECN'
option etarget 'auto'
option script 'simple.qos'
option interface 'eth0'
option upload '10000'
option enabled '1'
option download '239000'

stop/start SQM

tc -s qdisc

qdisc noqueue 0: dev lo root refcnt 2
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
qdisc htb 1: dev eth0 root refcnt 2 r2q 10 default 12 direct_packets_stat 0 direct_qlen 1000
Sent 1074 bytes 12 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
qdisc fq_codel 110: dev eth0 parent 1:11 limit 1001p flows 1024 quantum 300 target 5.0ms interval 100.0ms
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
new_flows_len 0 old_flows_len 0
qdisc fq_codel 120: dev eth0 parent 1:12 limit 1001p flows 1024 quantum 300 target 5.0ms interval 100.0ms
Sent 1074 bytes 12 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
maxpacket 159 drop_overlimit 0 new_flow_count 8 ecn_mark 0
new_flows_len 0 old_flows_len 1
qdisc fq_codel 130: dev eth0 parent 1:13 limit 1001p flows 1024 quantum 300 target 5.0ms interval 100.0ms
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
new_flows_len 0 old_flows_len 0
qdisc ingress ffff: dev eth0 parent ffff:fff1 ----------------
Sent 656 bytes 8 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
qdisc fq_codel 0: dev eth1 root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
Sent 1390126 bytes 3154 pkt (dropped 0, overlimits 0 requeues 3)
backlog 0b 0p requeues 3
maxpacket 542 drop_overlimit 0 new_flow_count 7 ecn_mark 0
new_flows_len 0 old_flows_len 0
qdisc noqueue 0: dev br-lan root refcnt 2
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
qdisc noqueue 0: dev wlan1 root refcnt 2
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
qdisc htb 1: dev ifb4eth0 root refcnt 2 r2q 10 default 10 direct_packets_stat 0 direct_qlen 32
Sent 878 bytes 10 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
qdisc fq_codel 110: dev ifb4eth0 parent 1:10 limit 1001p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
Sent 878 bytes 10 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
maxpacket 145 drop_overlimit 0 new_flow_count 7 ecn_mark 0
new_flows_len 1 old_flows_len 0

30 streams for 20 seconds to load line

98 Mbps Total with QoS on

Post SQM test results

tc -s qdisc

qdisc noqueue 0: dev lo root refcnt 2
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
qdisc htb 1: dev eth0 root refcnt 2 r2q 10 default 12 direct_packets_stat 0 direct_qlen 1000
Sent 6939184 bytes 97579 pkt (dropped 0, overlimits 289 requeues 0)
backlog 0b 0p requeues 0
qdisc fq_codel 110: dev eth0 parent 1:11 limit 1001p flows 1024 quantum 300 target 5.0ms interval 100.0ms
Sent 1865 bytes 19 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
maxpacket 157 drop_overlimit 0 new_flow_count 19 ecn_mark 0
new_flows_len 1 old_flows_len 0
qdisc fq_codel 120: dev eth0 parent 1:12 limit 1001p flows 1024 quantum 300 target 5.0ms interval 100.0ms
Sent 6937319 bytes 97560 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
maxpacket 1514 drop_overlimit 0 new_flow_count 55729 ecn_mark 0
new_flows_len 0 old_flows_len 1
qdisc fq_codel 130: dev eth0 parent 1:13 limit 1001p flows 1024 quantum 300 target 5.0ms interval 100.0ms
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
new_flows_len 0 old_flows_len 0
qdisc ingress ffff: dev eth0 parent ffff:fff1 ----------------
Sent 261763953 bytes 175045 pkt (dropped 1, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
qdisc fq_codel 0: dev eth1 root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
Sent 1526595 bytes 3439 pkt (dropped 0, overlimits 0 requeues 3)
backlog 0b 0p requeues 3
maxpacket 542 drop_overlimit 0 new_flow_count 7 ecn_mark 0
new_flows_len 0 old_flows_len 0
qdisc noqueue 0: dev br-lan root refcnt 2
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
qdisc noqueue 0: dev wlan1 root refcnt 2
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
qdisc htb 1: dev ifb4eth0 root refcnt 2 r2q 10 default 10 direct_packets_stat 0 direct_qlen 32
Sent 261672753 bytes 173367 pkt (dropped 1680, overlimits 122368 requeues 0)
backlog 0b 0p requeues 0
qdisc fq_codel 110: dev ifb4eth0 parent 1:10 limit 1001p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
Sent 261672753 bytes 173367 pkt (dropped 1680, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
maxpacket 1514 drop_overlimit 647 new_flow_count 36204 ecn_mark 0
new_flows_len 0 old_flows_len 2

tc -d qdisc

qdisc noqueue 0: dev lo root refcnt 2
qdisc htb 1: dev eth0 root refcnt 2 r2q 10 default 12 direct_packets_stat 0 ver 3.17 direct_qlen 1000
qdisc fq_codel 110: dev eth0 parent 1:11 limit 1001p flows 1024 quantum 300 target 5.0ms interval 100.0ms
qdisc fq_codel 120: dev eth0 parent 1:12 limit 1001p flows 1024 quantum 300 target 5.0ms interval 100.0ms
qdisc fq_codel 130: dev eth0 parent 1:13 limit 1001p flows 1024 quantum 300 target 5.0ms interval 100.0ms
qdisc ingress ffff: dev eth0 parent ffff:fff1 ----------------
qdisc fq_codel 0: dev eth1 root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
qdisc noqueue 0: dev br-lan root refcnt 2
qdisc noqueue 0: dev wlan1 root refcnt 2
qdisc htb 1: dev ifb4eth0 root refcnt 2 r2q 10 default 10 direct_packets_stat 0 ver 3.17 direct_qlen 32
qdisc fq_codel 110: dev ifb4eth0 parent 1:10 limit 1001p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn

So it looks like there is a shelf limit on throughput with QoS on in LEDE, and we can't load the line with QoS set to line speed on high-speed lines. If it were due to processing power, you would think the C7v2 would get more speed than the 3600. So it looks like SQM is doing something to prevent loading the line on high-speed connections. If I drop QoS below 100 Mbps, the netperf load tests get within 10% of the QoS setting, as expected. As I set it over 100 Mbps, I get very little upside in actual throughput.

So, setting QoS to 50 Mbps download, I get 45 Mbps throughput.

Going to a QoS of 100 Mbps download, I get 85 Mbps throughput.

From there I get very little improvement as I increase QoS.

A QoS of 125 Mbps only improves to 89 Mbps throughput.

A QoS of 200 Mbps only improves to 94 Mbps throughput.
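Tabulating the achieved-versus-configured ratios from the numbers above makes the plateau obvious (a quick illustrative calculation over the reported figures, not a new measurement):

```shell
#!/bin/sh
# Shaping efficiency at each configured download rate: the ratio collapses
# past ~100 Mbps, which is roughly what hitting a hard ceiling looks like.
out=$(awk 'BEGIN {
  split("50 100 125 200", set)
  split("45 85 89 94", got)
  for (i = 1; i <= 4; i++)
    printf "%3d Mbps set -> %2d Mbps achieved (%2.0f%%)\n",
           set[i], got[i], 100 * got[i] / set[i]
}')
echo "$out"
```

The efficiency drops from 90% at 50 Mbps to under 50% at 200 Mbps, i.e. the achieved rate flattens out rather than tracking the configured rate.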

It does not look like it runs out of CPU or sirq capacity either; something limits the flow above a certain point. Could it be because we are running the netperf process on the router itself and SQM limits locally sourced traffic?

Okay, fair enough; now if gwlim's LEDE builds do not carry the same patches, that could be a reason for the difference. I also note that the Qualcomm fast-path is mutually exclusive with SQM.

They could be, or could not be; the relevant number is idle. It is just that typically most of the percentage missing from idle is soaked up by sirq...

Not as far as I can tell; sure, LEDE sports a rather more modern kernel than BB, but the shaping capability of normal builds seems to be rather constant over time.

Which has more or less the same CPU as your WDR3600, just at 720 instead of 560 MHz, so I would naively expect it to hit a tad more than 60-70 (unless it also suffers from too-low memory bandwidth).

This seems like a good candidate for the difference between with and without shaper. Perhaps if no qdisc is on, this optimized build automatically uses the fast-path stuff?

I've been harping on it a lot recently, but if you have a 100-250 Mbit connection you should run an x86 box as your router and let the LEDE device be a really nice AP.

Okay, that seems to be in agreement with the CPU frequencies, no? (720/560 * 69 = 88.7) So far as expected, even though not as desired.
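The arithmetic behind that estimate, spelled out (this assumes throughput scales linearly with clock frequency, which is naive but works as a first-order sanity check):

```shell
#!/bin/sh
# WDR3600 at 560 MHz shaped ~69 Mbps; scale linearly to the C7v2's 720 MHz.
est=$(awk 'BEGIN { printf "%.1f", 720 / 560 * 69 }')
echo "predicted: $est Mbps"
```

The predicted ~89 Mbps lands close to the 98 Mbps the C7v2 actually achieved, which is why a CPU ceiling is a plausible explanation.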

Well once you run out of CPU cycles, simple.qos/HTB/fq_codel will simply stop giving you more bandwidth...

Here I disagree; the C7v2's performance of 98 Mbps seems to be in accordance with the WDR3600 result.

That is a bit awkwardly put, but yes, traffic shaping comes at a relatively high CPU cost and will show up the limits of CPUs.

As stated above, the selected QoS script (simple.qos using HTB) and qdisc (fq_codel) will absolutely honor your latency requests; if you try layer_cake.qos/cake you should see that you get a bit more bandwidth out of your link (but at increased latency under load), though this is probably not going to help much...

One question that is always important is at what OSI layer you are measuring the throughput. The shaper shapes gross rates; it will, in your case, assume 14 bytes of per-packet overhead (which on your DOCSIS link is wrong, you should account for 18 bytes, but that is not your current issue, so please ignore it for now).
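As a concrete illustration of the layer question (illustrative arithmetic only, using the 14-byte overhead assumption named above and a standard 1500-byte MTU with 40 bytes of TCP/IPv4 headers):

```shell
#!/bin/sh
# A shaper set to 150 Mbit/s gross releases 1514 on-wire bytes per full-size
# packet, of which only 1460 are TCP payload, so the goodput a speedtest sees
# is lower than the configured gross rate.
goodput=$(awk 'BEGIN { printf "%.1f", 150 * 1460 / (1500 + 14) }')
echo "TCP goodput at 150 Mbit/s gross: $goodput Mbps"
```

That ~145 Mbps is in line with the "mid 140s" the OP measured on BB with the shaper set to 150000, i.e. the BB result was already about as good as a correctly working shaper can deliver.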

So HTB has a few toggles which allow making it a bit less expensive (at a latency cost), and we try to adjust those automatically with increasing bandwidth (as the latency cost of increased batching decreases with bandwidth).
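A rough sketch of why batching gets relatively cheaper as rates rise (illustrative arithmetic; the 1-ms tick is an assumed value for illustration, not sqm-scripts' actual auto-tuning logic):

```shell
#!/bin/sh
# Bytes a shaper may release per 1-ms timer tick at various rates: the
# per-wakeup CPU cost is roughly fixed, so at higher rates it is amortized
# over more bytes, while the latency cost of a 1-ms batch stays the same.
for rate_mbit in 10 50 150; do
  bytes_per_ms=$(awk -v r="$rate_mbit" 'BEGIN { printf "%d", r * 1e6 / 8 / 1000 }')
  echo "$rate_mbit Mbit/s -> $bytes_per_ms bytes per 1-ms burst"
done
```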

I am happy to interpret this as a data point once you confirm that in the "top -d 1" test idle stayed well above 0% (say, never dropped below 10%, to allow for the imprecise sampling).

Oops, yes, these processes will require CPU cycles as well, so I would advise against that unless you really want to benchmark your router's capability to also serve data in addition to its bread-and-butter duty of "just" routing.

Best Regards

Sure, but at that speed I guess a number of the more recent ARM-based routers will also be totally sufficient (these routers often use far less electricity than an x86, but since I only ever look at full desktop x86, I might have an unrealistically high expectation of x86 energy requirements).

There are a large number of "network appliances" out of China these days that use on the order of 10-15 watts, are compact, have no moving parts, and will blow even the ARM routers out of the water. Of course they have no WiFi, but they also have no gotchas. You can't brick them, they will do all the routing and shaping you ever want, they have 2-4 gigabytes of RAM and 20-60 GB of SSD storage, and they will still have CPU cycles left for things like an NFS server. Plus, even with a simple AP to provide WiFi, they tend to cost about the same as high-end routers.

A few years back, you'd be looking at a full x86 mini tower or something; it would draw 60 to 150 watts and cost you $600. Today that's not the case, and with consumer bandwidth increasingly large and the cost of x86 plummeting, the case for the consumer router (as opposed to an x86 appliance router plus a consumer AP) is quickly disappearing.

At the low end of cost for example there's this:

and a simple AP:

compared to say an ARM based router like: https://www.amazon.com/Wireless-StreamBoost-Beamforming-Antennas-NBG6817/dp/B01I4223HS

The price might be a little higher, but the performance of the router appliance, the kernel support for the hardware, the un-brickability, and not having to deal with factory lock-outs and so forth is probably worth the ~$50 or whatever for most people who have 200 Mbit connections.

Since an AP isn't necessarily a security-critical thing, you may even be fine sticking with the factory firmware; plus, the mounting options are better, so you can probably get better signal.

All this tells me that, in my opinion, LEDE should focus on getting itself onto more APs, and also on some kind of software for controlling multiple APs from a single configuration.

EDIT: another thing to consider is the consolidation option. People who get 200 Mbit+ connections probably want a NAS or media server or something like that, so buying a slightly higher-horsepower x86 router and combining the two functions will give better performance, cost, and electricity savings than buying an ARM router and a separate NAS box.

As a data point, my closet has a J1900-based mini-ITX router/NAS with a USB enclosure running 4 spinning drives, plus a 24-port smart switch and the customer router for my AT&T fiber connection, all on a UPS that reports 72 watts total. That costs me about $75/year in electricity. Each drive is rated between 8 and 15 watts, so about 32 watts of that 72 is probably the drives. The router box itself is likely around 10 to 20 watts.

The Zyxel Armor power supply is rated 12 volts 3.5 A on WikiDevi and is probably over-provisioned, but still we're talking 20 watts or something.

This also looks quite interesting. I haven't used it, but it's a good price point for a combined router/NAS; it has 2 NICs (no idea if they're Intel) and the specs say it supports Ubuntu, which probably also means Debian.

Ah, yes, it performs a bit better than just the freq change would suggest, but agreed, it's not a huge step up.

I believe you are correct: the LEDE version of the custom build will use fast-path if no qdisc is set on a given interface. But the old BB version was pre-fast-path, IIRC.
We also tested the 17.01.4 normal build, and it had the same QoS-on results on the C7.

I agree CPU is quite possibly the limiting factor here, but the huge delta relative to the BB-based tests on the same model hardware made us suspect something else.
Idle did not raise alarms, but we were focused on sirq, so we will re-test and capture the exact values.

Understood. To validate, we ran DSLReports tests from a wired PC and the results correlated with the local netperf tests, so it does not seem the extra in-router process was the limiting factor.

I appreciate all the feedback on this topic.

Very interesting stuff; pricing and form factors have indeed improved a lot. Thanks for sharing that.

The AP linked, and others in that family, do make great APs, partly because, unlike Ubiquiti, they use standard PoE and one can directly manage them by logging into their local web UI; no need for a central console if all you have is one or two APs on factory firmware. I have two myself and like them.

Cool, and up to what speed can you shape with QoS?

I seem to recall that @gwlim was a proponent of overclocking router CPUs (in a sane fashion, including stability tests); maybe the BB build you used was running the CPU at more than 560 MHz?

So, I initially was distracted by the fact that idle goes down while the others go up under load, so my casual observations did not single out idle as special until I started to pay attention to the details and realized that 0% idle actually does translate into "the router has no CPU cycles to spare". That in itself is not so bad, except that "the router has far fewer CPU cycles available than it desires" has the exact same 0% idle phenotype.

Ah, that is valuable information. In my limited tests I think I saw some effect from running netperf on the router itself, but I did not research it any deeper after realizing that this was not testing what I intended to test, so thanks for the additional data point here.

I am somewhat sorry that I will not be able to really help; maybe https://forum.openwrt.org/t/overclocking-router-devices/1298 has some pointers on how to overclock your WDR3600, as that might give you just enough additional CPU cycles to make SQM work at your bandwidth.

Best Regards

So I mostly agree with you, and many thanks for the information about recent prices and power consumption of x86, but I have some discomfort with the consolidation argument. My gut feeling is that it probably is a good idea to have the entry point to one's own network not do too much important stuff in addition to its core duties; I simply think that not putting all my eggs into one internet-exposed basket is just being cautious (especially since I tend to opt for convenience over strict security quite often, so I am not ready to vouch for the security of my main router).

I have 3 NICs bonded into a smart switch, and I shape a gigabit fiber with a custom HFSC setup and fq_codel. It's a little hard to measure, because most speed tests can't measure a gigabit connection, and also because I use a Squid proxy and an IPv6-only LAN with Tayga for NAT64... but I regularly get 500 Mbit+ from DSLReports with minimal bufferbloat. Because of the proxy, the user CPU usage can be high. If it were just routing and not doing NAT64 translation, it would shape the full gigabit fine (it did that before I changed things for various reasons).

My concern isn't really raw speed (the gigabit is kind of fake anyway; it depends on how much the neighborhood is really using), but I really want reasonably high speeds while remaining absolutely perfect with up to 3 or 4 VoIP calls at once, even while multiple devices are streaming video or I'm downloading packages to upgrade my Debian laptop or whatnot. So far, so good.

This is a good point. I do have some consolidation, but I also run additional firewalls on my laptop, desktop, and HP printer, I set randomly generated passwords on all my management interfaces, and I keep offsite backups of all my files in a safe deposit box that I update every 3 months or so, plus an onsite backup I update weekly. My feeling is that treating the router as the security moat that none shall pass is probably not the best security posture.

To each their own, of course, but there's a good argument to be made that x86 boxes will continue to get cheaper, or stay similarly priced with more power, so even unconsolidated it starts to make good sense.

Ah, sorry, I should have said "all my eggs", as I was really only talking about my personal comfort, knowing the security compromises I accept for convenience (since my file server offers fewer services, it is easier to keep the number of compromises low :wink: ). I fully trust that there are folks, you included, who are very much on top of keeping things secure.

And I fully subscribe to that idea; I would add that typically a moat was followed by a wall, so security is certainly a bit like an onion.

I guess that will also show in off-the-shelf routers. For example, I fully expect Lantiq SoCs to be x86-based any time soon (unless they already ship these); now, granted, it will be Atoms and not the really beefy x86 cores, but still they should run circles around, say, the OP's almost 10-year-old 560 MHz MIPS CPU...

Just throwing my 2c in the ring: this test should be repeated with a CC build.
From what we've observed at Gargoyle, there has been a drop in routing speed under normal operation between the large releases (i.e. kernel jumps). The biggest one was BB->CC, when a conntrack caching option was dropped (and maybe patched back in? Can't remember).

In my experience this data point will land somewhere between the two existing tests.