SQM 3-tier layer_cake with br interfaces and iptables packet marking?

I'm currently using piece_of_cake on WAN, but I have 4 separate VLANs set up on my WRT1200AC, each bridging one network jack to its own WiFi. I'd like to slap EF on OCSERV and on my VoIP box connected in the DMZ, while demoting some other stuff to CS1, but I'm unsure about the best approach.

Does it make sense to move to layer_cake (3-tier) and then use iptables to mark packets based upon source MAC, source IP, protocol and so on? OCSERV already has packet marking built in, so that one's easy, but I don't see any built-in way to "hard limit" other attached hosts.

The following website talks about using tc classes to do the marking, so maybe I need to take a deep dive into this, but I wonder whether it will work alongside layer_cake? Link: http://combo.cc/articles/openwrt-upload-quota-and-bandwidth-limiting.html

Also, I'd like a separate QoS instance on each of my four VLANs. Each VLAN contains a br, connecting one network jack to its associated WLAN. Do I need to configure veth pairs so that I'll have a place to hang the per-VLAN QoS? Link: http://models.street-artists.org/2017/12/11/inbound-qos-with-virtual-ethernet-and-policy-routing/

A nudge in the right direction would be very much appreciated.

The approach sounds right in principle. The layer_cake script in SQM needs the packets to already carry DSCP markings in order to classify them optimally, so it sounds great that at least some of your devices/programs do set the DSCP in advance.

Regarding cake, I think that instead of assigning a tc class or iptables "mark", you might stick to assigning DSCP tags to incoming packets from selected devices.

@moeller0 might be the best guy here to validate your thoughts.

To my knowledge, there is no clear guide on this in the OpenWrt/LEDE wikis. You need to write the iptables mangle commands yourself.
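
As a sketch, that could look something like this in /etc/firewall.user (the address and MAC below are purely illustrative placeholders):

$ iptables -t mangle -A FORWARD -p udp -s 192.168.2.10 -j DSCP --set-dscp-class EF     # e.g. a VoIP box in the DMZ
$ iptables -t mangle -A FORWARD -m mac --mac-source 00:11:22:33:44:55 -j DSCP --set-dscp-class CS1   # demote a noisy host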

The details of the DSCP diffserv traffic classes that cake knows can be found starting at the cake source below:

Note that a varying number of DSCP codes are used by the diffserv8, diffserv4, diffserv3 and diffserv_llt modes in cake. Check each mode in the source.

This is indeed looking most elegant @hnyman, especially given my limited knowledge:

And so in the spirit of this theme, I've left layer_cake (4-tier) to do what it does best on my WAN interface, but I've also installed simple.qos on each of my br interfaces. layer_cake takes care of the normal DSCP stuff, while simple.qos enforces hard limits on specific VLANs and on specially tagged packets. For instance, I could clearly use the "Experimental" aka "Local Use" ToS codes (xxxxxx11) along with a slightly hacked simple.qos to implement bandwidth-strangling caps based upon stateful usage parameters, analogously to the website linked in my previous post.
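
For the record, the wiring is just two SQM instances in /etc/config/sqm; set up via uci it looks roughly like this (the section names, interfaces and rates are from my own setup and purely illustrative):

$ uci set sqm.wan=queue
$ uci set sqm.wan.enabled='1'
$ uci set sqm.wan.interface='eth1'
$ uci set sqm.wan.download='85000'
$ uci set sqm.wan.upload='9000'
$ uci set sqm.wan.qdisc='cake'
$ uci set sqm.wan.script='layer_cake.qos'
$ uci set sqm.dmz=queue
$ uci set sqm.dmz.enabled='1'
$ uci set sqm.dmz.interface='br-dmz'
$ uci set sqm.dmz.download='50000'
$ uci set sqm.dmz.upload='50000'
$ uci set sqm.dmz.qdisc='fq_codel'
$ uci set sqm.dmz.script='simple.qos'
$ uci commit sqm && /etc/init.d/sqm restart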

The thing that's fuzzing me out is what specific technique to use for assigning ToS codes to packets coming from the LANs into the router, and where best to assign the codes. Are there other options besides standard mangling? It seems that I really will need to bridge each of my br interfaces to a veth pair, just to have some place to do the tagging and QoS at exactly the VLAN level. A veth would also allow me to install simple.qos onto the WAN-facing member veth1, so that input and output would be the "right way around", which I presume may have some benefits given how the cake code is written with regard to asymmetries in WAN bandwidth and the default options for nat dual-dsthost diffserv4 dscp-squash etc. Being better aligned with how the SQM and cake code is written should reduce the risk of breakage when new versions of these packages are released.

ODDLY ENOUGH:
$ opkg install kmod-veth           # veth.ko comes from the kmod-veth package
$ sudo modprobe veth
$ lsmod | grep veth
$ ip link add ve-lf type veth peer name ve-wf   # PEBKAC! DOES NOT WORK
$ ip link add type veth                         # WORKS FINE
Anyone know where I can find the CORRECT man page for this thing?
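
In the meantime, here is a sketch of the two-step create-then-rename route I'm planning to use instead (the interface and bridge names are just placeholders for my eventual per-VLAN setup):

$ ip link add type veth                    # creates an anonymous pair, veth0/veth1
$ ip link set veth0 name ve-lf             # rename while the links are still down
$ ip link set veth1 name ve-wf
$ ip link set ve-lf up
$ ip link set ve-wf up
$ brctl addif br-lan ve-lf                 # hang the LAN-facing end off the existing bridge

Actually pushing traffic through the pair still needs the policy-routing bits from @dlakelan's post; this just gives SQM an interface to sit on.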

On the topic of which DSCP tags to massage onto packets prior to ingest by cake:layer_cake, the readings seem to indicate that these 4 should be used so that interoperability with external software and equipment is maximized:
. Expedited Forwarding (46)
. CS3 (24)
. Best Effort CS0 (0)
. Scavenger CS1 (8)
It seems to me that CS0 is the same as untagged, so I wonder why bother to tag those packets in the first place?

And for outright bandwidth throttling via cake:simple.qos based upon stateful parameters I'd use the Local Use ToS values:
. ToS 10000011 = gentle restraint
. ToS 01000011 = choke hold
. ToS 00100011 = strangle hold
. Best Effort CS0 (0) = unrestricted
. All others = unrestricted
The idea being that bona-fide DSCP processing will take place one hop downstream, at the WAN interface.
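
By way of illustration, the "stateful usage" trigger could be as crude as a connbytes match that re-marks a flow once it has moved a lot of data; the subnet, byte threshold and local-use DSCP value (0x23 here) are all just placeholders, not a worked-out policy:

$ iptables -t mangle -A FORWARD -s 192.168.4.0/24 -m connbytes --connbytes 500000000: --connbytes-dir both --connbytes-mode bytes -j DSCP --set-dscp 0x23     # gentle restraint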

Although the cascaded approach is very piecemeal, I'd like to propose that it's easy to understand and easy to cobble together without having to spend weeks learning all the nitty-gritty of iptables and the entire tc subsystem. It also leaves the layer_cake code alone and unmodified, so I don't have to re-apply my hacks every time the good cake folks come up with a new improved opkg for it, seeing that it's under heavy development.

My #!/bin/sh uci router config script is already at 1000 lines, so the less super-tweaking I have to do to finish this gremlin the better. Rubber bands and glue are gReAt so long as it works and is maintainable ;o) Having simple.qos cascaded with layer_cake makes my router into a simpleton robot serial paquet killer zomby! It could become a popular hack ^^

So in an ideal world applications would allow the user to set the DSCPs, but since the world is not ideal we need work-arounds. Again, these work-arounds would IMHO best be implemented directly on the hosts terminating the relevant traffic. But this unfortunately is rather hard in reality (e.g. Windows claims to be able to do this on a per-application basis, but seems to require a domain controller).
The next best approach is to do this marking/re-mapping somewhere central (which has the advantage of there being only one place to do the configuration in), but it sort of loses information about the application (one might want EF marking for a VPN over port 80 but keep normal http traffic at normal priority, so IP addresses and port numbers really are not sufficient to identify the underlying application unambiguously).
So for your WAN traffic you could set up re-marking by using iptables on all of the SoC's inward-facing interfaces (LAN, WLAN, ...); the shaper on the WAN interface will then see the desired DSCP markings. But for ingress this is somewhat tricky: partly due to the use of IFBs, sqm's ingress traffic will hit the qdisc layer before the iptables layer, so re-mapping at that point is going to be too late. Using a veth pair is a potential work-around for that problem (but I assume that this is also not going to be free, so there will be additional CPU cost, plus this probably needs to be done after NAT, so that the more important internal addresses are available for filtering).

Why exactly? So what kind of fairness is your goal here?
If you want to go the veth route following @dlakelan's blog post is a great idea.

Not really, I believe that this will make selecting the keywords somewhat more intuitive, but functionally it should not matter (otherwise report a bug :wink: )

static int cake_config_diffserv4(struct Qdisc *sch)
{
/*  Further pruned list of traffic classes for four-class system:
 *
 *          Latency Sensitive  (CS7, CS6, EF, VA, CS5, CS4)
 *          Streaming Media    (AF4x, AF3x, CS3, AF2x, TOS4, CS2, TOS1)
 *          Best Effort        (CS0, AF1x, TOS2, and those not specified)
 *          Background Traffic (CS1)
 *
 *              Total 4 traffic classes.
 */
	...

No, cake does not tag these packets at all; CS0 is basically the default marking for non-diffserv-conscious applications, so cake's goal here is to always treat CS0 as best effort by default. The ToS octet or diffserv field is part of the IP header, so all IP headers carry these fields; the question is only how to treat each of the markings...

But DSCPs are only going to be guaranteed inside a DSCP domain, so inside your own network you are free to use any of the 63 values for any purpose you see fit. Trying to use some of the more common mappings will help with those few applications that actually DSCP-mark by default, and with a lot of luck your ISP and transit might leave these untouched, so that your flows' other endpoints might also get your DSCP signal...

I think it really should not matter much whether you start your modifications from /usr/lib/sqm/simple.qos or from /usr/lib/sqm/layer_cake.qos; in the end both are just script snippets that are called from sqm-scripts...

I think the first question should be what kind of fairness you want, and the second question, for me, would be whether any of the modes cake offers gets you close enough to the desired behavior that you can avoid having to roll your own.

I think @moeller0 is on the right track here. What do you want the system to do for you? For me, for example, I wanted absolute priority to be given to voice, so that it goes first each and every time a packet comes in... so most recently I rolled my own based on HFSC. But for most people the existing layer_cake stuff is probably fine, so you mainly want to put the right DSCP on the packets.

from my understanding, cake already does this through basic fairness.
the thing hfsc (or htb for that matter) does is delay sending other stuff to "leave room" in case a voice packet appears (several voice packets could arrive in the time it takes to send one big data packet)
so ...

it is probably fine for you too :wink:

The thing that HFSC does which HTB and layer_cake don't do is decouple latency and bandwidth into two separate controls. Furthermore it offers hard-ish real-time guarantees on latency. So, by putting my voice packets into a real-time queue capable of using hundreds of megabits for up to 5 ms, I ensure that even with potentially tens of simultaneously queued voice packets (say for 5 or 10 simultaneous phone calls) the voice queue will drain completely within 1/4 of the inter-packet generation time (20 ms) before any other traffic goes at all. Typically this is NOT what HTB does, as it guarantees a certain minimum bandwidth to each of its classes. I want complete starvation of everything until the voice queue is drained (or 5 ms passes). It works GREAT for voice.

Then, within the link-sharing classes that regular traffic goes into, again I can control latency and bandwidth separately. So for example when a video stream starts it gets tens of megabits for 100 ms to fill buffers, and will make other classes wait... The problem with cake's basic per-user fairness is that I don't want fairness; I want certain things to get sent quickly and other things to wait, and I know which things, and I know the latency performance that I want each class to have. HFSC does that. It lets you control latency precisely and mathematically.
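
Very roughly, the shape of it is something like this (the device, rates and the EF-matching filter are purely illustrative, not my actual script):

$ tc qdisc add dev eth0 root handle 1: hfsc default 20
$ tc class add dev eth0 parent 1: classid 1:1 hfsc ls m2 900mbit ul m2 900mbit
$ tc class add dev eth0 parent 1:1 classid 1:10 hfsc rt m1 100mbit d 5ms m2 1mbit      # voice: big burst allowed for 5ms, tiny long-term rate
$ tc class add dev eth0 parent 1:1 classid 1:20 hfsc ls m2 450mbit                     # default link-sharing class
$ tc class add dev eth0 parent 1:1 classid 1:30 hfsc ls m1 40mbit d 100ms m2 10mbit    # streaming: fast start to fill buffers, then settle
$ tc filter add dev eth0 parent 1: protocol ip prio 1 u32 match ip dsfield 0xb8 0xfc flowid 1:10

The point is just the rt vs ls split: the rt curve on 1:10 is honored ahead of all link-sharing classes, which is what gives the voice queue its drain guarantee.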

Furthermore I use an IPv6-only LAN (except for a few legacy devices) and I don't know if cake's per-IP fairness does anything at all for IPv6, but I definitely don't want it doing per-IP fairness between my voice ATA and some legacy Android-based video-streaming device, for example.

Let's put it this way, until I understood HFSC and set it up, which was just in the last few months, even a custom HTB system didn't give me the voice performance I wanted. Now: flawless voice.

But in the end all shapers suffer from head-of-line blocking, that is, they offer no packet preemption; so if your bandwidth is too low, even HFSC will not guarantee real-time delivery (but I agree that in this extreme case one is screwed anyway).

It should (it is actually easier for IPv6 than IPv4, since cake does not need to de-NAT IPv6 packets; cake assumes whoever uses a port-remapping NAT system for IPv6 is entitled to keep all the pieces :wink: ). So if this does not work, please report a bug.
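
(For reference, the per-host fairness in question comes from the dual-srchost / dual-dsthost keywords; on the WAN egress it would look something like the line below, with the device and rate being placeholders. With sqm-scripts you would normally pass these keywords via the advanced qdisc option strings rather than calling tc by hand.)

$ tc qdisc replace dev eth1 root cake bandwidth 9mbit nat dual-srchost diffserv3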

Jepp, that could be disastrous, but with a fast enough network relatively sparse VoIP should still get enough bandwidth (VoIP typically consumes less than 100 Kbps per concurrent flow, so it is tiny).

That is a bit odd. I have been on VoIP since 2013 and (after a few self-inflicted wounds with a misconfigured firewall) I have glitch-free telephony even on a lowly 50/10 link (heck, even on 6/1 and 16/2.5 VoIP simply worked like a charm), but that was always with sqm/simple.qos and no WiFi on the VoIP path...
On the other hand I am not very picky, so this could also be a case of me having much lower expectations :wink:

Best Regards

this is exactly what i meant and i think they do
are you sure?

Not quite sure what you mean here. Do you mean that once a packet is put onto the NIC you can't pull it back? That is true, but given a known serialization delay you can include this in your HFSC calculation. With gigabit NICs the serialization delay is something like 1500 bytes / (1 gigabit/s) = 0.012 ms, so not important for controlling latencies at the 5 ms scale. With WiFi you might have more of an issue; still, with WMM, forcing the VO queue by using CS6 rather than EF for the DSCP also helps.

Yeah, I have WiFi in the voice path some of the time, and generally my wired connection is better... but also I am picky. I don't like the effect of even 1% packet loss in VoIP. This may be in part because I use the Opus codec on my Android phone, and although it's pretty good at packet loss concealment, it is also really good at compression, and with compression a more important piece of the audio is lost with each packet... I also can pretty easily have 3 or 4 simultaneous voice calls going.

Android and flaky WiFi may have more to do with this than core network requirements; as I say, my wired ATA is generally better. By tagging my voice CS6 and forcing it into the VO WMM queue, and forcing it to go ahead of all other packets via HFSC and priority queues in my switches, I may better compensate for Android WiFi flakiness.

Other issues I have include that all my voice traffic goes through a VPS running Asterisk. The lack of real-time scheduling on the VPS combined with the opus codec may induce substantially more jitter than if you are say going from an ATA direct to a voip provider via ulaw/alaw encoding. So any amount of jitter reduction in the network allows you to tolerate that much more jitter in the transcoding. I do this transcoding because I do frequently call direct between "internal" lines and then I get great opus 16khz audio without transcoding... but when I place outbound calls to the public telephone system the transcoding will be required. Of course the ATA doesn't use opus, and so that might be part of why it works better.

In general, I think VOIP is going to benefit a LOT from transition to ipv6, so it can't happen fast enough for me. One aspect I strongly suspect is going on is internal carrier grade NAT at the ISPs causing problems, and another thing I personally suspect is that providers like ATT internally screw with competitors voice packets. I can't prove it, but I have my suspicions based on experience.

EDIT: also, the worst performance I had was when I was on a DOCSIS cable network and was using the provider's cable modem. We've seen all kinds of weirdness reported here related to DOCSIS (remember the "pipe cleaner flow" ?). So it could be that your DSL line provides substantially better performance than DOCSIS. Finally, having a really high bandwidth connection could provide its own challenges. As you say 100kbit/s is not much bandwidth compared to say 600 megabits of downloading something from google drive... and with HTB guaranteeing say a few tens of megabits to my voice and guaranteeing hundreds of megabits to my general surfing... since it doesn't have packet by packet preemption... it could have substantially more jitter.

HFSC lets me put voice into real-time classes, they take total priority over all link-sharing classes, and so even if someone is in the middle of a 600 megabit download from google drive... all my voice packets go first onto the wire as soon as they arrive in the real time queue. HTB doesn't do that.

As far as I can tell, no Linux qdisc will be able to preempt a packet; the shaper qdisc basically meters its outflow to the NIC, and I assume that the actual transfer from qdisc to NIC is really fast; the qdisc then waits until the appropriate packet delivery time at the configured bandwidth before handing more packets to the NIC. I do not believe the qdisc has an easy way to tell the NIC to cancel the packet currently in transfer, so I see no way a shaper qdisc could preempt packets in flight. I note that VDSL2's PTM supposedly allows packet preemption (which also requires storing/queueing the preempted packet somewhere until it can be transferred or its transmission can be resumed).

Now, I would be amazed if ethernet NICs would allow the same preemption, but I do not want to rule it out.

Yes, exactly, so people with say a 6/0.5 Mbps link will still easily run into latency and jitter issues even when using HFSC. But HFSC is here not worse than any other shaper :wink:

Fair enough, I guess using a dect phone for the "mobile" part instead of wifi, plus living in a small apartment and being quite generous about quality, I am not the right person to talk to about optimizing VoIP :wink:

I always thought one could configure HTB for strict precedence as well, not that I ever wanted to do that, so it might not be possible...

I think it can have strict precedence, but only at a granularity that is coarser than you'd like. The descriptions of HFSC I've seen suggest that HTB performs about the same as HFSC link-sharing when you set only a single rate for HFSC (you can set two rates for each HFSC class, the "initial rate" and the "long-term rate", and you also set a duration for the initial rate. The initial rate determines how fast your queue drains.)

The thing you can do with HFSC is that you can also have RT queues, again with two rates, an initial rate and a long-term rate, as well as a duration for the initial rate.

What this basically does is let HFSC know which things it can block or slow down, which things it can speed up, and by how much. It's very good and flexible, but difficult to wrap your head around without a good description, which I did finally find: a blog post that links to the tutorials I used to wrap my head around it: http://models.street-artists.org/2018/01/16/understanding-hfsc-in-linux-qos/

word!
so glad we now have >30Mbps and cake, so i can forget about these horrific htb/hfsc setups we have been doing for the last 20 years :smiley:

yea, complex shaping sounds nice, but please don't re-invent wondershaper :wink:

Right, at 0.5 Mbit/s a 1500 byte packet will take 24 ms to serialize, so basically you can't do VoIP on that effectively while also having other traffic, because you need to send a packet every 20 ms. If you want, say, 1500 bytes / 5 ms, that is 2.4 Mbit/s, so around 3 Mbit/s is the minimum uplink speed for reasonable-quality VoIP shared with other activities. This is true even though you need only 100 kbit/s for the VoIP packets themselves; it's just that head-of-line stalling needs to be small, like 5 ms for a full-MTU packet, so you can have a hope of controlling jitter.

EDIT: combine the fact that my DOCSIS connection had about 3 Mbit/s uplink speed with the fact that DOCSIS does all kinds of weirdness to combine packets and share the uplink, thereby inducing latency and jitter, and it explains why my phone never worked well on my DOCSIS connection... Specifically, I could hear people fine, but other people complained that my voice would cut in and out and sound garbled, hence the uplink being the issue.

Further edit: I suppose you might improve voice quality a lot on low-bandwidth uplinks by setting an artificially low MTU on the WAN, something like say 700 bytes, and forcing things to fragment, thereby reducing head-of-line stalling. Sure, you'd get poorer throughput, but you'd be knocking jitter down to something manageable.

FURTHER FURTHER EDIT: the minimum MTU for IPv6 is 1280, which wouldn't help much with the jitter/latency, so there is no possibility of doing the 700-byte MTU thing with IPv6. In general it's probably best to just leave the MTU alone and not do any kind of VoIP on less than a 3000 kbps uplink.

Well, you can always use MSS clamping to at least try to push the TCP packet size down... I agree you get less usable goodput and more relative overhead, but for latency and jitter that still might be a decent trade-off... especially if upgrading the bandwidth to something nicer is not an option...
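
Concretely, that clamping would be the classic SYN-mangling rule, e.g. (the 640-byte MSS and the pppoe-wan name are placeholders for whatever fits the actual link):

$ iptables -t mangle -A FORWARD -o pppoe-wan -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --set-mss 640
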
The VoIP packets themselves are typically much smaller anyway... (I just note that in my experience with simple.qos I typically set quantum to 300, specifically to make small-packet flows mix more nicely with large-packet flows, but I digress :wink: )