Need help with configuring SQM

I'm trying to configure SQM to best utilise my internet connection. I'm running UPC 120/10Mb in Poland; real speeds are 117/9.8Mb. It's a cable connection, with the modem/router set up as a bridge and a WR1043NDv4 connected to it.

Would you suggest any changes to the config I'm using now?

config queue 'eth1'
	option enabled '1'
	option interface 'eth0.2'
	option download '111150'
	option upload '9500'
	option debug_logging '0'
	option verbosity '5'
	option qdisc 'cake'
	option script 'layer_cake.qos'
	option qdisc_advanced '1'
	option squash_dscp '1'
	option squash_ingress '1'
	option ingress_ecn 'ECN'
	option egress_ecn 'NOECN'
	option qdisc_really_really_advanced '1'
	option iqdisc_opts 'nat dual-dsthost diffserv4 rtt 300ms'
	option eqdisc_opts 'nat dual-srchost diffserv4 rtt 300ms'
	option linklayer 'ethernet'
	option overhead '28'
	option linklayer_advanced '1'
	option tcMTU '2047'
	option tcTSIZE '128'
	option tcMPU '0'
	option linklayer_adaptation_mechanism 'cake'

I have a few questions though:

  1. What are the differences between diffserv4 and diffserv8? Is one more CPU intensive than the other?

  2. I read somewhere here that for cable connections it's best to set the rtt to 200ms or 300ms. I have to say that with 300ms I see better speeds. Values lower than 100ms (which I think is the default) cause speed degradation, while going higher than 300ms doesn't seem to have any further impact on the connection. My question is: how do I actually measure this?

  3. Without these:

     option linklayer_advanced '1'
     option tcMTU '2047'
     option tcTSIZE '128'
     option tcMPU '0'
     option linklayer_adaptation_mechanism 'cake'

     these two options are not even applied:

     option linklayer 'ethernet'
     option overhead '28'

     Is this normal or a bug?

  4. As you can see, tcMTU, tcTSIZE and tcMPU are at their default values. Should I adjust them in any way?

  5. For linklayer_adaptation_mechanism I chose cake, which seems to be the best choice for everything, but are the other options any better? I haven't really tested this one.

Result of tc -s qdisc:

root@WR1043NDv4_LEDE:~# tc -s qdisc
qdisc noqueue 0: dev lo root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc fq_codel 0: dev eth0 root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
 Sent 2268783573 bytes 3413521 pkt (dropped 451, overlimits 0 requeues 376)
 backlog 0b 0p requeues 376
  maxpacket 1514 drop_overlimit 0 new_flow_count 7499 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc noqueue 0: dev br-lan root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc noqueue 0: dev eth0.1 root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc cake 8007: dev eth0.2 root refcnt 2 bandwidth 9500Kbit diffserv4 dual-srchost nat rtt 300.0ms noatm overhead 28 via-ethernet
 Sent 1945239803 bytes 1968211 pkt (dropped 27072, overlimits 3451678 requeues 0)
 backlog 37328b 25p requeues 0
 memory used: 591616b of 4Mb
 capacity estimate: 9500Kbit
                 Bulk   Best Effort      Video       Voice
  thresh     593744bit    9500Kbit    4750Kbit    2375Kbit
  target        30.7ms      15.0ms      15.0ms      15.0ms
  interval     315.7ms     300.0ms     300.0ms     300.0ms
  pk_delay         0us      81.0ms       3.0ms       990us
  av_delay         0us      25.9ms       812us       514us
  sp_delay         0us       1.1ms       217us        21us
  pkts               0     1991764         895        2649
  bytes              0  1983461781      782398      344012
  way_inds           0      154852           0          11
  way_miss           0       95266         133         411
  way_cols           0           0           0           0
  drops              0       27072           0           0
  marks              0           0           0           0
  sp_flows           0           5           0           0
  bk_flows           0          10           0           0
  un_flows           0           0           0           0
  max_len            0        1514        1294         479

qdisc ingress ffff: dev eth0.2 parent ffff:fff1 ----------------
 Sent 1229111738 bytes 2117996 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc noqueue 0: dev wlan0 root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc cake 8008: dev ifb4eth0.2 root refcnt 2 bandwidth 111150Kbit diffserv4 dual-dsthost nat wash rtt 300.0ms noatm overhead 28 via-ethernet
 Sent 1258275636 bytes 2117669 pkt (dropped 327, overlimits 744420 requeues 0)
 backlog 0b 0p requeues 0
 memory used: 1299520b of 15140Kb
 capacity estimate: 111150Kbit
                 Bulk   Best Effort      Video       Voice
  thresh      6946Kbit  111150Kbit   55575Kbit   27787Kbit
  target        15.0ms      15.0ms      15.0ms      15.0ms
  interval     300.0ms     300.0ms     300.0ms     300.0ms
  pk_delay        88us        29us        27us         9us
  av_delay         3us        13us        13us         4us
  sp_delay         3us         5us         7us         2us
  pkts              51     2077548        2070       38327
  bytes           5249  1256182981      275663     2299775
  way_inds           0      131431           0           0
  way_miss          19      126520         536           3
  way_cols           0           0           0           0
  drops              0         327           0           0
  marks              0           0           0           0
  sp_flows           0           1           0           0
  bk_flows           0           1           0           0
  un_flows           0           0           0           0
  max_len          402        1514        1169         185

qdisc fq_codel 0: dev tun0 root refcnt 2 limit 10240p flows 1024 quantum 1500 target 5.0ms interval 100.0ms ecn
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0

Best Regards

Well, diffserv8 has 8 different priority tiers, diffserv4 only 4 (guess how many diffserv3 offers :wink: ), and besteffort has only one tier. I venture the guess that if you need to ask, you probably want diffserv3/4 at best (these put BK1 into the background tier and typical VoIP markings into the "express" tier). If you do not actively use DSCP markings and you have CPU issues, you might be able to save a few CPU cycles by going to besteffort; otherwise there is no need to bother.
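
For illustration, these tier schemes are just keywords on cake's tc command line; here is a rough sketch of the kind of egress command layer_cake.qos ends up issuing (sqm-scripts assembles the real invocation, the rates and options here are simply copied from your config above):

# sketch only: roughly the egress shaper with four tiers (diffserv4)
tc qdisc replace dev eth0.2 root cake bandwidth 9500kbit diffserv4 dual-srchost nat rtt 300ms overhead 28

# the same shaper with a single tier, if you never set DSCP markings
tc qdisc replace dev eth0.2 root cake bandwidth 9500kbit besteffort dual-srchost nat rtt 300ms overhead 28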

This value basically defines the time window your shaper will give the endpoints to react to shaping signals (drops or ECN marks): the larger it is, the more bandwidth will be utilized, at the cost of additional latency under load. The rule of thumb is to set the RTT to the RTT you typically encounter to the servers you work with most. In that light, unless your servers are far away network-wise, 300ms seems a bit high.

[quote="r43k3n, post:1, topic:2388"]
3. Without these:

option linklayer_advanced '1'
option tcMTU '2047'
option tcTSIZE '128'
option tcMPU '0'
option linklayer_adaptation_mechanism 'cake'

those options are not even working. Is this normal or a bug?
[/quote]

Well, for cake only the last one is operational; tcMTU and tcTSIZE have no meaning, and tcMPU will at some point become relevant for cake as well... But as the GUI notes, you typically do not need to change those.

Nope, not at the moment. For simple.qos with fq_codel I would recommend setting tcMPU to 64 (this might become the new default at some point).

Well, htb_private only works for HTB, tc_stab works for both, and cake only works for cake. For your setup cake seems the best fit, as it will automatically try to account for any header overhead the kernel adds on its own (for simple.qos with fq_codel you would probably need to set the overhead to 28 - 14 = 14).
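
If you ever want to try that combination, here is a minimal sketch of the UCI changes (reusing the 'eth1' section name from your config; treat this as an illustration, not a recommendation to switch):

# sketch: switch the existing section to simple.qos/fq_codel with tc_stab,
# overhead 14 = 28 - 14 (the kernel already accounts for the 14 byte ethernet
# header) and tcMPU 64 as suggested above
uci set sqm.eth1.script='simple.qos'
uci set sqm.eth1.qdisc='fq_codel'
uci set sqm.eth1.linklayer_adaptation_mechanism='tc_stab'
uci set sqm.eth1.overhead='14'
uci set sqm.eth1.tcMPU='64'
# clear the cake-specific per-qdisc options, they do not apply to fq_codel
uci delete sqm.eth1.iqdisc_opts
uci delete sqm.eth1.eqdisc_opts
uci commit sqm
/etc/init.d/sqm restart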

I hope this helps....

English is not my native language and I'm having trouble understanding what you wrote. Can you rephrase the quoted section, please?

I had tcMPU set to 64 and I had serious bufferbloat issues over WiFi. I reverted it to 0 and I'm testing now. If the problem persists I will update this thread.

Yes, this helps a lot. Thank you very much.

Let me try: use ping to measure the round trip time (RTT), i.e. the time it takes for a packet from your computer to reach the remote server and for the response packet from that server to reach your computer again. Repeat this for the servers you typically connect to and look at the ping/ICMP RTTs you get (and don't be amazed if you see much shorter RTTs to some than expected; that is the service CDNs (content delivery networks) supply). Ideally you use this value as the basis for selecting the RTT to use. It turns out that 100ms is a pretty decent compromise that works for many people, assuming that your packets do not always need to cross long distances. If you are based in Europe and access data in California, I would assume 200-300ms to be a better value. But please note that experience indicates the RTT value only needs to be in the right order of magnitude, not exactly precise. The general trade-off is that higher RTTs give higher bandwidth utilization at the cost of increased latency under load (or rather longer settling times).
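
For example, a quick sketch for checking several hosts in one go (the host names below are only placeholders; substitute the servers you actually use most):

# placeholder hosts: replace with the servers you really connect to
for host in www.example.com cdn.example.net; do
    echo "=== $host ==="
    ping -c 10 "$host" | tail -n 2    # keep only the summary lines with min/avg/max
done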
If this is still too unintelligible, may I respectfully point you to Google Translate?

This should not cause bufferbloat; effectively it will just better account for the real size that small packets require. That said, WiFi is a completely different kettle of fish that is not well controlled by sqm-scripts: the variable-bandwidth aspect of WiFi makes a fixed shaper like SQM a bad match. But your router should be covered by the make-wifi-fast improvements that are part of LEDE builds...

Best Regards

Like I said, 300ms seems to work best for me. I measured the pings with SQM turned off and bufferbloat in full effect, and I got pings of 200ms-350ms, with 250ms-300ms on average. That's why I set the rtt to 300ms, and also because I read somewhere here that for cable internet connections it's a wise choice. With SQM enabled I got 20ms-something, but the performance degradation was significant when using 20ms as the rtt value.

My English is not so bad that I need to use Google Translate (I would actually say it's at a decent level), but there are some situations where I have trouble understanding the meaning behind some sentences; that's why I asked you to rephrase the quoted section. I hope this wasn't too much trouble.

That's what I thought, but right now it seems to have fixed the issue for me. I'm not 100% sure, but unless someone complains again about high pings or bufferbloat I will consider setting tcMPU to 0 as the solution. At first I thought there was a problem with the rtt being set too high, but like I said I didn't see any difference when using values lower than 100ms.

The ethernet ports are working great, no issues there. WiFi is the only interface where I noticed the bufferbloat problem. Like I said before, I will keep an eye on it.

I think it does. The WiFi performance seems to be good, but I live in a 12-storey high-rise with multiple 2.4GHz WiFi networks around me, so I guess this might contribute to the bufferbloat effect.

Best Regards

Well, you should measure the RTT for SQM while your network is not loaded. Most cable networks are actually quite well connected, so the default RTT of 100ms should work well. You can easily test this though: let's assume you often connect to www.ucla.edu (to take an address in Los Angeles, California that is quite far away); simply run (on your router):
ping -c 10 www.ucla.edu

root@router:~# ping -c 10 www.ucla.edu
PING www.ucla.edu (2607:f010:2e8:228::ff:fe00:152): 56 data bytes
64 bytes from 2607:f010:2e8:228::ff:fe00:152: seq=0 ttl=52 time=190.560 ms
64 bytes from 2607:f010:2e8:228::ff:fe00:152: seq=1 ttl=52 time=200.078 ms
64 bytes from 2607:f010:2e8:228::ff:fe00:152: seq=2 ttl=52 time=189.315 ms
64 bytes from 2607:f010:2e8:228::ff:fe00:152: seq=3 ttl=52 time=188.061 ms
64 bytes from 2607:f010:2e8:228::ff:fe00:152: seq=4 ttl=52 time=188.003 ms
64 bytes from 2607:f010:2e8:228::ff:fe00:152: seq=5 ttl=52 time=187.971 ms
64 bytes from 2607:f010:2e8:228::ff:fe00:152: seq=6 ttl=52 time=189.390 ms
64 bytes from 2607:f010:2e8:228::ff:fe00:152: seq=7 ttl=52 time=194.109 ms
64 bytes from 2607:f010:2e8:228::ff:fe00:152: seq=8 ttl=52 time=271.057 ms
64 bytes from 2607:f010:2e8:228::ff:fe00:152: seq=9 ttl=52 time=255.538 ms

--- www.ucla.edu ping statistics ---
10 packets transmitted, 10 packets received, 0% packet loss
round-trip min/avg/max = 187.971/205.408/271.057 ms

This gives an effective RTT of roughly 200ms (as noted above, the RTT does not need to be precise, just in the right ballpark), so if you often connected to that system and also transferred lots of data, you could set cake's RTT parameter to 200 or 250.

To see which path your packets take, you can run:
traceroute www.ucla.edu

If you have enough room you could also install mtr on the router (opkg update ; opkg install mtr) and then run:
mtr www.ucla.edu
This will give you a continuously updated version of what traceroute shows, which additionally remembers the max and min values per intermediary host. The relevant exercise is figuring out where most of your traffic goes and what the minimal RTT to that target is.
Again, setting the RTT too high will increase latency under load (aka bufferbloat) while increasing bandwidth utilisation. The default 100ms often works well; if in your case 300 works better than 100, by all means select 300, but that alone is not an indicator to jump straight to 300 unless you also tested intermediate values. Put differently, I would recommend setting the RTT rather on the low side than the high side, otherwise you will see more bufferbloat. Or rather, just leave it at 100ms...
Setting the RTT to 300ms based on vague recollections of recommendations found somewhere seems less satisfactory than simply trying a few values and selecting one based on actually experienced behaviour, no?
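
If you want to make that trial a bit more systematic, something along these lines could work (a sketch only, reusing the 'eth1' section name from your config; after every restart run a bufferbloat/speed test and note latency under load and throughput):

# sketch: step through a few cake rtt values, keeping the rest of the options
for rtt in 50ms 100ms 150ms 200ms 300ms; do
    uci set sqm.eth1.iqdisc_opts="nat dual-dsthost diffserv4 rtt $rtt"
    uci set sqm.eth1.eqdisc_opts="nat dual-srchost diffserv4 rtt $rtt"
    uci commit sqm
    /etc/init.d/sqm restart
    echo "rtt=$rtt active, run your test now and press enter to continue"
    read dummy
done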

The Google Translate remark was not about you being any trouble; it is simply that I tend to think in complicated ways and to write that way as well, meaning that if I could have said it more simply, I would have tried...

Well, at least try to repeat the experiment a few times; if tcMPU 64 makes things worse with layer_cake or piece_of_cake and option linklayer_adaptation_mechanism 'cake', I will eat a broomstick. As I mentioned, in that case tcMPU is not used by sqm-scripts at all. If you made that observation with option linklayer_adaptation_mechanism 'tc_stab', however, there can be a side effect. But also in that case it would be great if you could try to confirm that this parameter is responsible for the degraded performance. WiFi is variable enough that it is hard to figure out the root causes of degradations even with repeated measurements, so basing decisions on single observations might be a bit optimistic. Now, you probably did repeated measurements; I just wanted to document this idea for other readers.

Well, as @dlang wrote (in another thread), sqm-scripts really only tackles bufferbloat control on fixed-bandwidth links, so it is quite expected that SQM does not do wonders for WiFi. Interestingly, the key bufferbloat developers have since moved on to improving WiFi...

Best Regards