Ultimate SQM settings: Layer_cake + DSCP marks (New Script!)

The memory used figure represents the peak queue length, not necessarily the current one. Also, to say that cake is consuming memory is not quite correct: the kernel manages the actual packet buffers and hence the memory required for them. Cake's job (or any other qdisc's) is to arrange those buffers into a line, i.e. a queue.

High queue lengths suggest a traffic flow or flows that are not responding to the usual 'congestion indicated' signals of either ECN or packet drop. As the overall queue length gets to 'memlimit', cake simply starts throwing away packets in an attempt to get the queue length back under control. This is actually the BLUE part of the combined CoDel/BLUE algorithm called COBALT.

You are unlikely to improve the situation by increasing memlimit; there may be more benefit in decreasing it, but overall I'd investigate exactly what traffic is going across the link before playing too much with parameters. Also check CPU usage.
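
For reference, the relevant figures and the limit can be inspected and overridden at runtime with tc. This is just a sketch, assuming cake is the root qdisc; the interface name and value are placeholders:

  # show per-qdisc stats, including the 'memory used: X of Y' line
  tc -s qdisc show dev eth0
  # override cake's automatically sized buffer limit (placeholder value)
  tc qdisc change dev eth0 root cake memlimit 4mb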

I have seen some remote backup clients play badly with ECN enabled: CAKE signals congestion, but the other end doesn't pass the message back, so the local backup client just keeps sending more and more. CAKE classifies those as 'unresponsive flows', and when memlimit is hit, BLUE kicks in and kills packets in that flow with prejudice! In my case, despite the high queue lengths, CAKE kept the other flows nicely isolated and everything was responsive.


@ldir Thanks for the info. BTW it's an x86 box with an E6400 in it; CPU usage is almost non-existent.

When time permits I'll see what type of traffic it is. It's a student house, so who knows; the majority of them move out soon, so I may never know. I'll wait and see how many complaints I get. If it's only one of them, CAKE may be doing its job properly with the DSCP marks and keeping the internet working well for everyone else.

The usage in those screenshots represents just under 48 hours, if I remember correctly.

Yes, the E6400 CPU should be fine :rofl: I note your incoming (ISP downstream) rate is 40Mbit. Is that your ISP's rate or a bit below it?

The connection I pay for is 40Mbit, but it's over-provisioned by about 4Mbit, so 44 in total. I've already lowered the shaper to 38Mbit. It's been good so far; there isn't the same amount of traffic as when the problem was reported to me.
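
For reference, that change amounts to something like the following in /etc/config/sqm. This is only a sketch, reusing the option names from the configs posted later in this thread; the interface name is a placeholder:

config queue 'wan'
	option interface 'eth0'
	# ingress shaper set a little below the ~44Mbit the ISP actually delivers
	option download '38000'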

Try to use 100% of your upload bandwidth, then see if there are any complaints.

With 100% upload there will be complaints for sure. It's looking more like a cable network problem right now; the cable modem has gone offline or been reset twice by the students living there in the last 12 hours.

The hard part now is being able to VPN in and see the logs and signal levels on the modem before the modem is reset.


@hisham2630 Has anything been changed since this thread was posted? Please keep us updated in case we missed anything.

I'm also keeping an eye on your https://github.com/hisham2630/Ultimate-SQM-settings-Layer_cake-DSCP-marks-New-Script

Hello, can anyone help me with setting this up on a WDR4300? I am stuck on updating the SQM file.

^^^ I can't help with custom sqm files because I have not written a script for the tins; it is too complicated, I just separate the flows by DSCP. Right now I have the issue where I can't set the default DSCP for an interface to bulk so that I can mark just the connections I want to mark and go that route. I looked into ipset, but I really don't want to make custom tins.

Will this script work without editing any keys or do we need to modify it depending on the router?

Hello, I'm not sure if the stats below are normal. Tins 2 and 4 register some packets on the LAN interface (eth0.1) but are empty on the WAN interface (eth0.2). Is this even possible?

qdisc noqueue 0: dev lo root refcnt 2 
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
qdisc fq_codel 0: dev eth0 root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 4Mb ecn 
 Sent 68893220 bytes 97046 pkt (dropped 0, overlimits 0 requeues 1) 
 backlog 0b 0p requeues 1
  maxpacket 1514 drop_overlimit 0 new_flow_count 22 ecn_mark 0
  new_flows_len 1 old_flows_len 6
qdisc noqueue 0: dev br-lan root refcnt 2 
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
qdisc cake 800b: dev eth0.1 root refcnt 2 bandwidth 43671Kbit diffserv8 dual-dsthost nat nowash ingress no-ack-filter split-gso rtt 100.0ms noatm overhead 18 mpu 64 
 Sent 65229545 bytes 60026 pkt (dropped 198, overlimits 54263 requeues 0) 
 backlog 0b 0p requeues 0
 memory used: 523776b of 4Mb
 capacity estimate: 43671Kbit
 min/max network layer size:           28 /    1500
 min/max overhead-adjusted size:       64 /    1518
 average network hdr offset:           14

                  Tin 0        Tin 1        Tin 2        Tin 3        Tin 4        Tin 5        Tin 6        Tin 7
  thresh      43671Kbit    38212Kbit    33435Kbit    29256Kbit    25599Kbit    22399Kbit    19599Kbit    17149Kbit
  target          5.0ms        5.0ms        5.0ms        5.0ms        5.0ms        5.0ms        5.0ms        5.0ms
  interval      100.0ms      100.0ms      100.0ms      100.0ms      100.0ms      100.0ms      100.0ms      100.0ms
  pk_delay        1.2ms          0us        118us        5.3ms        694us          0us        241us         82us
  av_delay        466us          0us         33us        1.5ms        335us          0us         52us          5us
  sp_delay         21us          0us         18us         24us         14us          0us         27us          5us
  backlog            0b           0b           0b           0b           0b           0b           0b           0b
  pkts            33608            0         1472        19564         4397            0         1122           61
  bytes        45809492            0      1254989     12998648      5322431            0       139592         2706
  way_inds            0            0            0           61            0            0            0            0
  way_miss           19            0           26          543           32            0          918            4
  way_cols            0            0            0            0            0            0            0            0
  drops             180            0            0           15            3            0            0            0
  marks               0            0            0            0            0            0            0            0
  ack_drop            0            0            0            0            0            0            0            0
  sp_flows            1            0            1            2            1            0            5            0
  bk_flows            0            0            0            0            0            0            0            0
  un_flows            0            0            0            0            0            0            0            0
  max_len          1514            0         1514         1514         1514            0          352           90
  quantum          1332         1166         1020          892          781          683          598          523

qdisc cake 800e: dev eth0.2 root refcnt 2 bandwidth 45885Kbit diffserv8 dual-srchost nat nowash no-ack-filter split-gso rtt 100.0ms noatm overhead 44 mpu 64 
 Sent 3659524 bytes 36987 pkt (dropped 0, overlimits 301 requeues 0) 
 backlog 0b 0p requeues 0
 memory used: 7936b of 4Mb
 capacity estimate: 45885Kbit
 min/max network layer size:           28 /    1500
 min/max overhead-adjusted size:       72 /    1544
 average network hdr offset:           14

                  Tin 0        Tin 1        Tin 2        Tin 3        Tin 4        Tin 5        Tin 6        Tin 7
  thresh      45885Kbit    40149Kbit    35130Kbit    30739Kbit    26896Kbit    23534Kbit    20592Kbit    18018Kbit
  target          5.0ms        5.0ms        5.0ms        5.0ms        5.0ms        5.0ms        5.0ms        5.0ms
  interval      100.0ms      100.0ms      100.0ms      100.0ms      100.0ms      100.0ms      100.0ms      100.0ms
  pk_delay         70us          0us          0us         69us          0us          0us         67us         32us
  av_delay         18us          0us          0us         19us          0us          0us         30us          7us
  sp_delay         14us          0us          0us         14us          0us          0us         23us          7us
  backlog            0b           0b           0b           0b           0b           0b           0b           0b
  pkts            19819            0            0        15950            0            0         1120           98
  bytes         1511762            0            0      2048313            0            0        94085         5364
  way_inds            0            0            0          174            0            0            2            0
  way_miss           23            0            0          602            0            0          682           27
  way_cols            0            0            0            0            0            0            0            0
  drops               0            0            0            0            0            0            0            0
  marks               0            0            0            0            0            0            0            0
  ack_drop            0            0            0            0            0            0            0            0
  sp_flows            0            0            0            1            0            0            5            0
  bk_flows            0            0            0            1            0            0            0            0
  un_flows            0            0            0            0            0            0            0            0
  max_len          1514            0            0         1514            0            0          380           90
  quantum          1400         1225         1072          938          820          718          628          549

If you would like a visual way to view the SQM queues, check this out: SQM Reporting?

Can you help me understand this better? Isn't piece_of_cake.qos essentially "besteffort" using a single tin (Tin 0)? Wouldn't the diffserv4 option actually alter the underlying script so it uses layer_cake.qos instead, or am I missing something?

Also, how would wash vs. nowash come into play in this scenario? Since nowash is the default, wouldn't it be recommended to place wash in the eqdisc_opts for both the WAN and LAN queues?

FWIW, here is my current sqm config file and I am very open to any opportunities for improving it:

config queue 'wan'
	option ingress_ecn 'ECN'
	option interface 'eth0'
	option debug_logging '0'
	option verbosity '5'
	option qdisc_advanced '1'
	option squash_dscp '0'
	option squash_ingress '0'
	option qdisc_really_really_advanced '1'
	option linklayer 'none'
	option enabled '1'
	option script 'layer_cake.qos'
	option egress_ecn 'ECN'
	option download '0'
	option upload '24500'
	option qdisc 'fq_codel'
	option eqdisc_opts 'diffserv4 docsis dual-srchost nat ack-filter'

config queue 'lan'
	option ingress_ecn 'ECN'
	option interface 'eth1'
	option debug_logging '0'
	option verbosity '5'
	option qdisc_advanced '1'
	option squash_dscp '0'
	option squash_ingress '0'
	option qdisc_really_really_advanced '1'
	option linklayer 'none'
	option enabled '1'
	option egress_ecn 'ECN'
	option download '0'
	option upload '450000'
	option qdisc 'fq_codel'
	option eqdisc_opts 'diffserv4 ethernet dual-dsthost nat ingress'
	option script 'layer_cake.qos'

config queue 'GUEST'
        option ingress_ecn 'ECN'
        option interface 'eth1.9'
        option debug_logging '0'
        option verbosity '5'
        option qdisc_advanced '1'
        option squash_dscp '0'
        option squash_ingress '0'
        option qdisc_really_really_advanced '1'
        option linklayer 'none'
        option enabled '1'
        option egress_ecn 'ECN'
        option download '0'
        option upload '450000'
        option qdisc 'fq_codel'
        option eqdisc_opts 'diffserv4 ethernet ether-vlan dual-dsthost nat ingress'
        option script 'layer_cake.qos'

config queue 'IOT'
        option ingress_ecn 'ECN'
        option interface 'eth1.99'
        option debug_logging '0'
        option verbosity '5'
        option qdisc_advanced '1'
        option squash_dscp '0'
        option squash_ingress '0'
        option qdisc_really_really_advanced '1'
        option linklayer 'none'
        option enabled '1'
        option egress_ecn 'ECN'
        option download '0'
        option upload '450000'
        option qdisc 'fq_codel'
        option eqdisc_opts 'diffserv4 ethernet ether-vlan dual-dsthost nat ingress'
        option script 'layer_cake.qos'

IIRC, layer_cake.qos gets its default diffserv mode from defaults.sh, and that happens to be diffserv3. Sure, you can also use layer_cake here, but it will not make much difference.

That really depends on whether you consider your internal DSCPs worth hiding from your ISP; in that case adding wash should be fine. For ingress, adding wash might be more relevant and a better default than nowash.

I would rather use the link layer tab to configure 18 bytes of overhead and an mpu of 64 bytes explicitly, but that is a matter of taste, as the docsis keyword results in the same outcome.
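
For illustration, the explicit form would look something like this. It is only a sketch, swapping the docsis keyword for cake's own overhead/mpu keywords in the advanced option string rather than using the link layer tab:

  # equivalent to 'docsis': 18 bytes overhead, 64 byte minimum packet size, no ATM compensation
  option eqdisc_opts 'diffserv4 overhead 18 mpu 64 noatm dual-srchost nat ack-filter'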

And what is the logic behind 3 individual instances of ingress shapers all set to option upload '450000'? Depending on the routing, you might try to sink 450000 * 3 = 1350000 Kbps, or 1.35 Gbps, of traffic, or the IOT and GUEST shapers might rarely trigger since lan already makes sure that internet traffic stays below 450000 Kbps. Or are you trying to limit IOT and GUEST's access to LAN? (I assume from their names that IOT and GUEST would be isolated from LAN.)


That makes sense--thanks for clarifying!

Makes sense as well. I'm not concerned about my ISP's interpretation of my internal DSCPs. They're going to do whatever they are going to do anyway, regardless of my wishes, I'm sure. But based on my reading and understanding, it does seem that having wash on the ingress would be desirable.

Oh dear... the logic. We had to go there. :stuck_out_tongue:

I originally started out with a basic SQM setup (single queue on my WAN interface) and it's been working great as far as I could tell. You probably will recall some of my posts on other threads about my SQM performance and I had it tuned quite nicely. But, I was very bothered this weekend by the fact that I would see almost no activity in my BK tin, tons in BE, some in VI, and tiny amounts in VO. I started trying to investigate and ended up finding this thread. After reading hundreds of posts between this thread and the previous "Ultimate SQM settings" (>700 posts), I decided to give some of this a try and ended up going down a rabbit hole trying to solve for something that maybe doesn't matter in my case at the end of the day...

Back to the details--apart from my tinkering with a NanoPi R2S, my main router is a dedicated OpenWrt x86_64 VM. My AP is another standalone R7800 running OpenWrt, so it's outside this equation. I have my primary LAN VLAN, as well as a VLAN for guest (eth1.9) and another for IoT (eth1.99).

I was at the point last evening where I had SQM on my WAN (eth0) and LAN (eth1) as well as several iptables rules and dnsmasq ipset settings, mostly the way @hisham2630 had them with a few tweaks+additions. I am seeing a much better balance of usage between the four diffserv tins, overall. But, I started to wonder if the eth1 (LAN) queue was enough for all traffic heading into that physical interface or if it was literally only shaping for eth1, but not for eth1.9 and eth1.99. So I added queues for those VLANs as well.

Unfortunately, after >800 posts between this thread and the previous one, there is a fair amount of conflicting information, mixed with various "noise" and a lot of very specialized one-off use-cases thrown in. So I probably ended up with some very goofy "logic", as evidenced by the SQM config I shared.

I am NOT trying to limit bandwidth to my guest and IoT subnets. Ultimately I just wanted them to be included in the good prioritization that SQM does on the whole. Having said that, perhaps this is all just over-the-top in my use-case and I should go back to a single SQM queue on the WAN. Your thoughts?

So a shaper on eth1 should shape all VLANs on eth1 as well as non-tagged packets, but a shaper on eth1.99 will only act on packets with VLAN tag 99. That should be relatively easy to confirm: just set the eth1 shaper to a really low value and run speedtests from all your subnets. But if you have enough CPU to spare, running the same packet through two shapers should work as well :wink:
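
A minimal sketch of that test, assuming cake is the root qdisc on eth1 and using placeholder rates:

  # temporarily drop the eth1 shaper to something clearly visible in a speedtest
  tc qdisc change dev eth1 root cake bandwidth 10Mbit
  # run speedtests from the LAN, guest and IoT subnets, then restore the real rate
  tc qdisc change dev eth1 root cake bandwidth 450Mbit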


Spot on, Sir! After I wrote my last reply to you, I started rethinking the whole thing. Funny what a night of sleep and writing out one's thoughts can do to help bring clarity to a situation.

I actually started peeling this thing back after some testing. The counters in my eth1 tins would increase at the same rates as the counters in my eth1.9 and eth1.99 tins. During a heavy speed test, this was very obvious using my netdata-chart-sqm chart. This conclusion is very logical, again, after a night of sleep and "talking it out" with you here.

I went back down to a single queue for eth1 and a single queue for eth0. While I do have CPU to spare on my x86_64 box, my goal is to eventually get things worked out to run fully from my NanoPi R2S, and I'm sure having four SQM queues on it would make it beg for mercy. I'm not even sure it will like two SQM queues.

But anyway, in the interest of keeping SQM straightforward (which is why I have loved ditching traditional QoS), I think this is good for now.

Going back to this wash vs nowash notion for a moment, though... something else I thought about is that given @hisham2630's dscp script already does this:

... this means that with eth1 having its own queue now, DSCP from the upstream ISP was getting washed anyway, regardless of the CAKE default of nowash. So adding the wash option is not going to break anything; it's just redundant in this case. Is that how you would interpret it as well?

For anyone who's been testing and tinkering with all this for a while, I would love to have some feedback on a PR I submitted against @hisham2630's project.

The intent of this PR was to combine the IPv4 and IPv6 DSCP scripts into a single script that acts on both iptables and ip6tables in dual-stack environments, or just iptables in IPv4-only environments. The driver for this is ease of administration when adding multiple rules. In a dual-stack environment, the new script creates both the IPv4 and IPv6 rules from a single rule definition instead of separate iptmark and ipt6mark calls for each rule.

This would also allow a relatively seamless migration of the current configuration during a transition from IPv4 to dual-stack.

Anyway, PR link for anyone interested in checking it out: https://github.com/hisham2630/Ultimate-SQM-settings-Layer_cake-DSCP-marks-New-Script/pull/5 (specific commit for refactor: here)
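
To illustrate the idea only (this is a simplified sketch, not the actual code in the PR; the wrapper body is an assumption based on the iptmark/ipt6mark and dscp_mark names used earlier in the thread):

  # apply one rule definition to iptables and, when available, ip6tables as well
  iptmark() {
      iptables -t mangle -A dscp_mark "$@"
      # only add the IPv6 twin if ip6tables is installed; rules with IPv4-only
      # matches (e.g. -d a.b.c.d/n) would still need family-specific handling
      command -v ip6tables >/dev/null && ip6tables -t mangle -A dscp_mark "$@"
  }

  # single definition, both address families:
  iptmark -p udp -m multiport --ports 3478:3481 -j DSCP --set-dscp-class CS6 -m comment --comment "MS Teams UDP"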


My thinking on wash/nowash:

egress: If the ISP is going to ignore my DSCPs, and they probably should, then why should I waste CPU in clearing them?

ingress: Incoming (from ISP) marks are mostly irrelevant for me anyway, since I re-mark with my own mapping, courtesy of act_ctinfo, for the benefit of CAKE's tins. Now whether CAKE should then wash those restored markings is a debate to be had, especially where WiFi is involved.
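
For context, a minimal sketch of an act_ctinfo ingress restore. It assumes the DSCP is stored in the low six bits of the conntrack mark with bit 7 as the 'valid' flag, and uses a placeholder ifb4eth0 device and placeholder rates; it is not necessarily the exact setup referred to above:

  # redirect WAN ingress through an IFB, restoring DSCP from the conntrack mark
  # (mask 63 = DSCP bits, statemask 128 = "DSCP stored" flag)
  ip link add name ifb4eth0 type ifb
  ip link set ifb4eth0 up
  tc qdisc add dev eth0 handle ffff: ingress
  tc filter add dev eth0 parent ffff: protocol all matchall \
      action ctinfo dscp 63 128 action mirred egress redirect dev ifb4eth0
  tc qdisc add dev ifb4eth0 root cake bandwidth 40Mbit diffserv4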


Is there any measurable impact when creating a port-based DSCP marking rule as opposed to an IP-based DSCP marking rule?

For example, given the following, is there a performance impact between these three options?

  1. iptmark -p udp -m multiport --ports 3478:3481 -j DSCP --set-dscp-class CS6 -m comment --comment "MS Teams UDP"
  2. iptmark -p udp -d 13.107.64.0/18,52.112.0.0/14,52.120.0.0/14 -j DSCP --set-dscp-class CS6 -m comment --comment "MS Teams UDP"
  3. iptmark -p udp -m multiport --ports 3478:3481 -d 13.107.64.0/18,52.112.0.0/14,52.120.0.0/14 -j DSCP --set-dscp-class CS6 -m comment --comment "MS Teams UDP"

I realize the 2nd and 3rd options create three rules each in the dscp_mark chain (one rule per subnet). In several cases such as this MS Teams use-case, I would prefer to prioritize traffic based on IP/subnet + port. But if the resulting rules create a performance hit for packets traversing the dscp_mark chain, I would be willing to reconsider.

Update
Pondering this some more... by adding one or more destination subnets, this would negate the marking of ingress traffic on the same port--yes? So it's probably a bad idea to add any IP qualifications to these rules. Maybe I'm overthinking it.
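
For the multi-subnet variants above, one way to keep the dscp_mark chain down to a single rule would be an ipset match. This is only a sketch, not something from this thread's script, and the set name is arbitrary:

  # collapse the three MS Teams subnets into one set, matched by a single rule
  ipset create ms_teams hash:net
  ipset add ms_teams 13.107.64.0/18
  ipset add ms_teams 52.112.0.0/14
  ipset add ms_teams 52.120.0.0/14

  iptmark -p udp -m multiport --ports 3478:3481 -m set --match-set ms_teams dst -j DSCP --set-dscp-class CS6 -m comment --comment "MS Teams UDP"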