CAKE w/ DSCPs - cake-qos-simple

Thanks @plater. I'm not on @dave14305's level; hardly anyone is. @dave14305, any chance you could elaborate a little on why the new version doesn't work and why the old version does, for the benefit of @plater, me and other interested readers? I'd also be interested to know if you see any other way to implement the ct approach whilst still covering this edge case, since it seems that's still 5-10% faster according to @plater's testing (at least the version that doesn't work!).

The new version relies on ct state, which changes during the course of the connection’s lifetime. I read this article so that I appear smarter (I think @ldir posted this somewhere a long time back):

https://thermalcircle.de/doku.php?id=blog:linux:connection_tracking_3_state_and_examples

For the current implementation to work, the connection state has to be “new” or “untracked” when it’s being sent out the wan interface. It’s designed around marking traffic on egress to wan and restoring the DSCP on ingress from wan. This covers most scenarios just fine.

But a port forward from the Internet to our local LAN is “backwards” with regard to our design. The initial packet comes in on the wan interface, with a ct state of “new”, and gets forwarded to an internal IP. Since it is coming IN on the wan interface, it’s not captured by our rule watching for the outbound interface (oifname $ul_if ct state new…).

When the internal LAN client replies to the new connection, it will get routed out the wan (upload) interface, but as soon as the conntrack system sees a reply to the original “new” packet, it will change the ct state to “established”. So by the time the packet reaches the oifname $ul_if ct state new… rule, the outbound interface matches, but the ct state does not.
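
To make the two cases concrete, here is a toy model in shell. It is only a sketch: `rule_matches` and `check` are hypothetical helpers that mimic the match `oifname wan ct state new,untracked`, and the interface names `wan` and `br-lan` are assumptions. Nothing here talks to nftables or conntrack.

```shell
#!/bin/sh
# Toy model of the match "oifname wan ct state new,untracked".
# rule_matches is a hypothetical helper, not part of nftables.
# $1 = outbound interface of the packet, $2 = ct state of the packet.
rule_matches() {
    [ "$1" = "wan" ] || return 1
    case "$2" in
        new|untracked) return 0 ;;
        *) return 1 ;;
    esac
}

# Print whether a packet with the given oif and ct state would be classified.
# $1 = label, $2 = outbound interface, $3 = ct state.
check() {
    if rule_matches "$2" "$3"; then
        echo "$1: classified"
    else
        echo "$1: skipped"
    fi
}

# Connection opened from the LAN: the first packet leaves via wan while "new".
check "LAN-initiated, first packet" wan new
# Port forward: the first packet arrives on wan and leaves via br-lan.
check "port forward, first packet" br-lan new
# Port forward: the reply leaves via wan, but the state is already "established".
check "port forward, reply" wan established
```

In this model only the LAN-initiated connection is classified; neither packet of the port-forwarded connection ever satisfies both conditions at once.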

Don’t believe everything you read on the Internet… :grinning:

I don’t share the desire to hang on to the ct approach since it already demonstrates fragility. The conditional bit in act_ctinfo lets you keep track of whether the connection has already been dealt with.

I can’t really see how such a subtle change in a rule would contribute to a noticeable performance difference. He may see a difference, but probably not because of the rule itself.

Oh boy. I must apologize. I retested with the idea that the performance difference I was seeing could in part be due to the fact that the DSCPs are now being altered when they previously weren’t, and not so much because the lack of ct hurt performance. I hadn’t considered that, sorry.

I re-ran the test with the port closed for a more equal comparison. I don’t exactly know how to isolate this for a properly scientific result, but the difference is negligible. I honestly can’t see much, if any, difference in performance now. I guess that makes the decision easier.

Splendid explanation. Thank you.

So if there’s no performance difference, isn’t a good way forward just to switch to:

oifname wan ct state new,untracked,established goto classify-and-store-dscp

What would be the effect of adding the ct direction reply here?

Wait no. I'm sorry, I tested

oifname wan ct state new,untracked goto classify-and-store-dscp

vs

oifname wan goto classify-and-store-dscp

There's little to no difference between the two.

Edit: I retested with

oifname wan ct state new,untracked,established goto classify-and-store-dscp

Unless something went wrong with my restore or I had two rules enabled at once last night (possible) I'm not seeing too much difference between the three now. At this point maybe someone else can confirm?

By adding established, you now process (nearly) every outbound packet, even if it’s already been marked and saved. Plus you have the tc filter on egress already restoring the DSCP from conntrack. So it’s extra, duplicative work, mangling packets many times.

OK so it seems like the special bit approach is still the best.

Absent any objections, I’ll revert to:

And @plater thanks for raising all of this. It’s been super interesting. And I’ve definitely learnt a couple of things.

Remember you can now combine the old way into:

ct mark set ip dscp or 128
ct mark set ip6 dscp or 128
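
The arithmetic behind `or 128` can be sketched in shell. This assumes the DSCP sits in the low six bits of the conntrack mark and that 128 is the conditional bit, which is what the rules above suggest; the variable names are illustrative only.

```shell
#!/bin/sh
# Sketch of the mark arithmetic only; nothing here touches conntrack.
dscp=32                      # CS4, as an example value
mark=$(( dscp | 128 ))       # what "ct mark set ip dscp or 128" would store
echo "mark=$mark"            # prints mark=160

# The conditional bit tells us whether a DSCP has already been stored,
# which also distinguishes a stored CS0 (mark 128) from "never stored" (mark 0).
if [ $(( mark & 128 )) -ne 0 ]; then
    echo "stored dscp=$(( mark & 63 ))"   # prints stored dscp=32
fi
```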

Edit: but there is a bug in nftables 1.0.8 that makes it display wrongly. So keep it as separate lines until 1.0.9.

Display wrongly but still work fine? If so, I’m inclined to go with it!

Just display.

Does this look OK?

@plater does it work for you too?

Yes. I’ve been running with these exact changes for a couple of days. Everything works great.

Thank you and @dave14305 for putting up with me! :grin:

On the contrary, this has been a very enjoyable and helpful exchange. Thank you for helping improve cake-qos-simple.

@moeller0 you might be interested in reading from here if you haven't already. It turned out that using 'ct state new, untracked' on packets directed to wan, to decide whether to classify tracked connections (classification upon connection creation rather than classification of every packet), breaks for connections opened in the other direction (from outside to inside rather than from inside to outside). The solution is to go back to using a conditional bit on the conntrack.

@dave14305 in writing the above summary for @moeller0 it just dawned on me that another solution might have been instead to additionally include iifname wan ct state new, untracked - would that have worked too? That is, classify not only egress packets, but also ingress packets associated with new connections. Though perhaps that's less efficient because it means evaluating not only all egress packets but also all ingress packets.

Sure, but you would still be left wondering if any other ct states might cause issues.

Hello, in this script, when are we supposed to see the TOS, or is it only for the download part? Thanks.

# First check correct flows and DSCPs correctly set by your LAN client on upload
tcpdump -i wan -vv        # always 0x00 ???
# Second check correct flows and corresponding DSCPs are getting set by router on download
tcpdump -i ifb-wan -vv    # example cs4 0x80 ?

With tcpdump you can use "(ip[1]!=0)" to filter on non-zero TOS values.

Here is an example:

root@OpenWrt-1:~# tcpdump -i ifb-wan -v "(ip[1]!=0)"
tcpdump: listening on ifb-wan, link-type EN10MB (Ethernet), snapshot length 262144 bytes
09:57:03.365488 IP (tos 0x80, ttl 54, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    1.1.1.3.853 > 192.168.0.2.39254: Flags [S.], cksum 0x9c63 (correct), seq 1450840744, ack 1678205163, win 65160, options [mss 1320,sackOK,TS val 3645466303 ecr 1224120091,nop,wscale 13], length 0

This means a packet from Cloudflare's family DNS service was sent to my LAN with tos 0x80, i.e. cs4 (voice).

Here is the correspondence between TOS and DSCP:

https://www.tucny.com/Home/dscp-tos
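
The conversion itself is just a shift: the DSCP is the upper six bits of the TOS byte and the lower two bits are ECN. A minimal sketch in shell, where `tos_to_dscp` is a hypothetical helper name rather than an existing tool:

```shell
#!/bin/sh
# tos_to_dscp: drop the two low ECN bits of the TOS byte to get the DSCP.
# $1 = TOS value as printed by tcpdump, e.g. 0x80.
tos_to_dscp() {
    echo $(( ($1 & 0xfc) >> 2 ))
}

tos_to_dscp 0x80   # prints 32, i.e. CS4
tos_to_dscp 0xb8   # prints 46, i.e. EF
```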

OK, thanks. But if you do tcpdump -i wan -vv, do you also see the correspondence, or is it normal to have 0x00?

That's normal with the default settings in cake-qos-simple, owing to the 'wash' setting in cake's upload options. See here:

cake_ul_options="diffserv4 triple-isolate nat wash no-ack-filter noatm overhead 0"

https://man7.org/linux/man-pages/man8/tc-cake.8.html

With 'wash':

root@OpenWrt-1:~# tcpdump -i wan -v port 853
10:02:01.469332 IP (tos 0x0, ttl 65, id 5791, offset 0, flags [DF], proto TCP (6), length 52)
    192.168.0.2.59426 > 1.0.0.3.853: Flags [.], cksum 0xc1d3 (incorrect -> 0xfe3a), ack 3837, win 1002, options [nop,nop,TS val 2053242796 ecr 555599014], length 0

With 'nowash':

root@OpenWrt-1:~# tcpdump -i wan -v port 853
10:03:32.399027 IP (tos 0x80, ttl 65, id 32670, offset 0, flags [DF], proto TCP (6), length 52)
    192.168.0.2.54100 > 1.1.1.3.853: Flags [.], cksum 0xc2d4 (incorrect -> 0x834f), ack 493, win 1002, options [nop,nop,TS val 1224509170 ecr 3023558283], length 0

Here is my tcpdump invocation that will show any packet that is not CS0/best effort:

tcpdump -i pppoe-wan -v -n '(ip and (ip[1] & 0xfc) >> 2 != 0)' or '(ip6 and (ip6[0:2] & 0xfc0) >> 6  != 0)'

Replace pppoe-wan with the desired interface, and != 0 with the test you are after. Note that the main difference from @Lynx's version is in masking out the ECN bits and in adding IPv6. Also note that for IPv6 tcpdump reports class instead of tos, but both contain the DSCP bitfield.

For completeness, the following operates on the 2-bit ECN bitfield:

tcpdump -i pppoe-wan -v -n '(ip6 and (ip6[0:2] & 0x30) >> 4  != 0)' or '(ip and (ip[1] & 0x3) != 0)' # NOT Not-ECT

It will report all ECT(0), ECT(1) and CE marked packets. Replace != 0 to test for specific values, e.g. for IPv6:

tcpdump -i pppoe-wan -v -n 'ip6 and (ip6[0:2] & 0x30) >> 4  == 0' # Not-ECT
tcpdump -i pppoe-wan -v -n 'ip6 and (ip6[0:2] & 0x30) >> 4  == 1' # ECT(1)
tcpdump -i pppoe-wan -v -n 'ip6 and (ip6[0:2] & 0x30) >> 4  == 2' # ECT(0)
tcpdump -i pppoe-wan -v -n 'ip6 and (ip6[0:2] & 0x30) >> 4  == 3' # CE
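
The IPv6 masks used in these filters can be sanity-checked with plain shell arithmetic. The first two bytes of an IPv6 header hold the 4-bit version, the 8-bit traffic class and the top 4 bits of the flow label, so the DSCP sits under mask 0xfc0 and the ECN bits under mask 0x30. The value of `first16` below is a made-up example, not captured traffic:

```shell
#!/bin/sh
# first16 is a made-up example value for the first two bytes of an IPv6 header:
# version 6, traffic class 0x82 (DSCP 32 plus ECN 2), flow label starting with 0.
first16=0x6820

echo "dscp=$(( ($first16 & 0xfc0) >> 6 ))"   # prints dscp=32
echo "ecn=$(( ($first16 & 0x30) >> 4 ))"     # prints ecn=2
```

An ECN value of 2 corresponds to ECT(0) in the list above.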