SQM Logic approach for Internet Bandwidth management

I don’t know much about the nft approach to this so cannot answer with authority. I have two observations/questions:

  1. Since nft allows you to run packet mangling rules on ingress the current sqm-scripts method of creating an IFB interface (on which a cake instance is applied) has been dropped. This leads to the observation that packet classification rules will need to be run on every ingress and egress packet.

  2. On the ingress path, are the packets pre or post NAT? If pre-NAT then any classification rules that are based on internal IP addresses aren’t going to work.

The evolution (in the sense that evolution is a series of successful mistakes) of the existing IFB/tc act_ctinfo/iptables/set-dscpmark combines a series of limitations & workarounds into something that’s bigger than the sum of its parts. I’m sure that nftables can be used but I do wonder if in the rush to do so other babies have been thrown out with the bathwater of iptables.

Keep in mind a) I’m not an expert and b) I came into this based on the existing framework that sqm-scripts provided, i.e. IFBs. I understand some of those ‘old skool’ limitations; I do not understand what ‘freedoms’ are offered by the ‘new hotness’ of nftables. So here is what is done under the ‘old skool’ and why:

Ingress:

Packet comes in on wan interface. We’d like to apply shaping to it for flow fairness purposes but cannot (limitation 1): we can only shape on an egress path (we can police on an ingress path but that’s functionally limited).

Workaround 1) Redirect ingress packets from wan to an IFB interface (ifb4wan). Apply CAKE shaper to egress of the ifb4wan interface, thus we now have a CAKE shaped ingress path. Because CAKE is clever it can look into the conntrack/nat table and apply host fairness across internal hosts.

This all happens before iptables gets a handle on it, thus mangle rules to play with DSCPs aren’t run yet. So this is limitation 2) How to apply DSCPs to packets such that CAKE sees them for classification purposes and preferably post NAT so that internal addresses are used for that decision.

Workaround 2) We can’t directly. But the conntrack table is something we can access at that point so we could store a desired DSCP into that table (and set a bit flag in it to say that we’ve done so) and restore that DSCP to the packet before CAKE gets to see it. So re-write the IFB redirect to 1) use act_ctinfo to restore any stored DSCP and 2) redirect to the IFB interface, so now CAKE sees DSCPs and can do its own lookup for NAT interface fairness purposes. Job done for ingress at least.

wan ingress -> act_ctinfo (restore DSCP) -> ifb4wan -> CAKE (nat lookup) shaper -> us!
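
To make the store/restore idea a bit more concrete, here is a rough Python model of the bit packing only. It is not the kernel code and not what sqm-scripts runs. The mask values (DSCP in the top 6 bits of the 32-bit mark, one flag bit to say "a DSCP has been stored") and the function names are assumptions chosen for illustration.

```python
# Illustrative model only: the mask layout below is an assumption.
DSCP_MASK = 0xFC000000   # assumed: top 6 bits of the mark hold the DSCP
STATE_MASK = 0x01000000  # assumed: flag bit, set once a DSCP has been stored
# Position of the lowest set bit in DSCP_MASK (26 for the mask above).
DSCP_SHIFT = (DSCP_MASK & -DSCP_MASK).bit_length() - 1


def store_dscp(mark: int, dscp: int) -> int:
    """Model of the egress side: save a DSCP into the connection mark."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP is a 6-bit value")
    mark &= ~DSCP_MASK & 0xFFFFFFFF          # clear any previously stored DSCP
    return mark | (dscp << DSCP_SHIFT) | STATE_MASK


def restore_dscp(mark: int, packet_dscp: int) -> int:
    """Model of the ingress side: return the DSCP CAKE should see."""
    if mark & STATE_MASK:                    # only restore if something was stored
        return (mark & DSCP_MASK) >> DSCP_SHIFT
    return packet_dscp                       # otherwise leave the packet alone
```

For example, `restore_dscp(store_dscp(0, 46), 0)` gives back 46, while a connection whose mark has no flag set keeps whatever DSCP the packet arrived with.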

Egress:

Shaping on wan egress is easier since we can just apply CAKE to the interface and it naturally goes on the egress side of wan. Cake has NAT/conntrack smarts to lookup internal hosts for internal host fairness, all is sweetness and light.

Since packets are still in iptables domain (unlike ingress), if we want packets to be classified in a certain way all we need do is write some rules in iptables’ mangle table to change the DSCP and egress CAKE will notice and do the right thing. If at the end of that mangling we then had some tool to store that DSCP into the firewall mark then we can let act_ctinfo in on the story and ingress CAKE will do the right thing too. That tool is iptables connmark --set-dscpmark. And at that point the job is done, however it’s a bit inefficient since every packet has to go through the mangle tables/routine and we update the conntrack firewall mark every time too (even though mostly it won’t change).

I sort of created a ‘DSCP offload engine’. The easiest tweak is to only go through the classification rules if we haven’t stored a DSCP into the firewall mark yet. That also requires an instance of ‘act_ctinfo’ on the egress interface (before egress cake) where it effectively restores what was decided for the connection by the first run through the mangle rules.
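
A minimal Python sketch of that "classify once, then restore" logic, again as a model rather than the real rules. The mask values and the `egress_dscp` / `classify` names are assumptions; `classify` stands in for whatever the mangle rules would decide.

```python
from typing import Callable, Tuple

# Illustrative model only: the mask layout below is an assumption.
DSCP_MASK = 0xFC000000   # assumed: top 6 bits of the mark hold the DSCP
STATE_MASK = 0x01000000  # assumed: flag bit, set once a DSCP has been stored
DSCP_SHIFT = 26          # position of the lowest set bit of DSCP_MASK


def egress_dscp(mark: int, classify: Callable[[], int]) -> Tuple[int, int]:
    """Return (new_mark, dscp) for one egress packet of a connection."""
    if mark & STATE_MASK:
        # Already decided for this connection: skip the rules, just restore.
        return mark, (mark & DSCP_MASK) >> DSCP_SHIFT
    # First packet(s) of the connection: run the classification rules once...
    dscp = classify()
    # ...and remember the outcome in the mark for every later packet.
    mark = (mark & ~DSCP_MASK & 0xFFFFFFFF) | (dscp << DSCP_SHIFT) | STATE_MASK
    return mark, dscp
```

In this model the expensive part (`classify`) runs only while the flag is unset; every later packet takes the short restore path.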

For my own purposes I wanted to implement an automatic de-prioritisation, i.e. if a best-effort connection transfers sufficient traffic, consider and treat it like a long-term download/upload. That required a second flag stored in the firewall mark and some more slightly convoluted iptables rules but the principle is there.
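
The de-prioritisation principle can be sketched the same way. Everything specific here is hypothetical: the position of the second flag bit, the 10 MiB threshold, and the choice of CS1 as the "bulk" DSCP are assumptions for the sake of an example, not the values from my actual rules.

```python
# Illustrative model only: every constant below is an assumption.
DSCP_MASK = 0xFC000000   # assumed: top 6 bits of the mark hold the DSCP
STATE_MASK = 0x01000000  # assumed: flag bit, "a DSCP has been stored"
BULK_FLAG = 0x02000000   # hypothetical second flag, "already de-prioritised"
DSCP_SHIFT = 26          # position of the lowest set bit of DSCP_MASK
CS0, CS1 = 0, 8          # best effort, and the assumed bulk class
BULK_THRESHOLD_BYTES = 10 * 1024 * 1024  # hypothetical threshold


def maybe_deprioritise(mark: int, conn_bytes: int) -> int:
    """Demote a heavy best-effort connection; leave everything else alone."""
    stored = bool(mark & STATE_MASK)
    dscp = (mark & DSCP_MASK) >> DSCP_SHIFT
    already_bulk = bool(mark & BULK_FLAG)
    if (stored and dscp == CS0 and not already_bulk
            and conn_bytes > BULK_THRESHOLD_BYTES):
        # Rewrite the stored DSCP and set the second flag so this is done once.
        mark = (mark & ~DSCP_MASK & 0xFFFFFFFF) | (CS1 << DSCP_SHIFT) | BULK_FLAG
    return mark
```

A connection that was classified as something other than best effort never matches, so its stored DSCP is left untouched however much it transfers.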

iptables mangle -> unset: go through dscp setting rules -> store with --set-dscpmark -> act_ctinfo -> cake -> egress wan
                -> set: act_ctinfo -> cake -> egress wan

Kevin
