CAKE w/ DSCPs - cake-qos-simple

This feels like a clean and elegant separation between implementation and policy, and it reuses the existing policy/rule definition method. IMHO that fits well with both the 'simple' in the project's name and the general Unix philosophy of doing a few things well per binary/project and allowing multiple binaries/projects to be combined.

Also, though this is subjective, I would hope that this approach makes folks think twice about each rule, as it needs to be added manually... (well, one can obviously copy and paste sections from /etc/config/firewall)


What does this mean? You can put multiple ports and ranges in a single rule by separating them with a space (e.g. 3478-3497 16384-16387 16393-16402).

One extra thing I added to my own setup is a check inserted into the firewall4 mangle table that skips the rules if the connection is already marked.

# cat /usr/share/nftables.d/chain-pre/mangle_postrouting/00-ctinfo-skip-marked.nft 
oifname $wan_devices ct mark & 64 != 0 return

The $wan_devices variable is created by firewall4.

@dave14305 does that work in terms of preventing LuCI firewall rules from overwriting?

Shouldn't rules overwrite LAN-set DSCPs?


@moeller0 does this look OK:

printf "\nSetting up CAKE on interface: '${ul_if}' with bandwidth: '${cake_ul_rate_Mbps}Mbit/s' and options: '${cake_ul_options}'.\n"
tc qdisc add dev "${ul_if}" root handle 1: cake bandwidth "${cake_ul_rate_Mbps}Mbit" ${cake_ul_options}

printf "\nSetting up tc filter to restore DSCPs from conntrack on egress packets on interface '${ul_if}'.\n"
tc filter add dev "${ul_if}" parent 1: protocol all matchall action ctinfo dscp 63 continue

if [[ -n "${overwrite_ul_ect_1_val}" ]]
then
        printf "\nSetting up filters to overwrite upload ECT(1) values with decimal value: '${overwrite_ul_ect_1_val}'.\n"
        tc filter add dev "${ul_if}" parent 1: protocol ip u32 match ip dsfield 1 0x3 action pedit ex munge ip dsfield set "${overwrite_ul_ect_1_val}" retain 0x3 pipe csum ip4h
        tc filter add dev "${ul_if}" parent 1: protocol ipv6 u32 match ip6 priority 1 0x3 action pedit ex munge ip6 traffic_class set "${overwrite_ul_ect_1_val}" retain 0x3
fi

if [[ -n "${overwrite_ul_ect_0_val}" ]]
then
        printf "\nSetting up filters to overwrite upload ECT(0) values with decimal value: '${overwrite_ul_ect_0_val}'.\n"
        tc filter add dev "${ul_if}" parent 1: protocol ip u32 match ip dsfield 2 0x3 action pedit ex munge ip dsfield set "${overwrite_ul_ect_0_val}" retain 0x3 pipe csum ip4h
        tc filter add dev "${ul_if}" parent 1: protocol ipv6 u32 match ip6 priority 2 0x3 action pedit ex munge ip6 traffic_class set "${overwrite_ul_ect_0_val}" retain 0x3

fi

printf "\nSetting up CAKE on interface: '${dl_if}' with bandwidth: '${cake_dl_rate_Mbps}Mbit/s' and options: '${cake_dl_options}'.\n"
tc qdisc add dev "${dl_if}" root handle 1: cake bandwidth "${cake_dl_rate_Mbps}Mbit" ${cake_dl_options}

if [[ -n "${overwrite_dl_ect_1_val}" ]]
then
        printf "\nSetting up filters to overwrite download ECT(1) values with decimal value: '${overwrite_dl_ect_1_val}'.\n"
        tc filter add dev "${dl_if}" parent 1: protocol ip u32 match ip dsfield 1 0x3 action pedit ex munge ip dsfield set "${overwrite_dl_ect_1_val}" retain 0x3 pipe csum ip4h
        tc filter add dev "${dl_if}" parent 1: protocol ipv6 u32 match ip6 priority 1 0x3 action pedit ex munge ip6 traffic_class set "${overwrite_dl_ect_1_val}" retain 0x3
fi

if [[ -n "${overwrite_dl_ect_0_val}" ]]
then
        printf "\nSetting up filters to overwrite download ECT(0) values with decimal value: '${overwrite_dl_ect_0_val}'.\n"
        tc filter add dev "${dl_if}" parent 1: protocol ip u32 match ip dsfield 2 0x3 action pedit ex munge ip dsfield set "${overwrite_dl_ect_0_val}" retain 0x3 pipe csum ip4h
        tc filter add dev "${dl_if}" parent 1: protocol ipv6 u32 match ip6 priority 2 0x3 action pedit ex munge ip6 traffic_class set "${overwrite_dl_ect_0_val}" retain 0x3
fi

I can spot no issue, but I have not tested that either... :wink:
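For anyone following along, here is a rough sketch of what those pedit filters are meant to do to the TOS / traffic-class byte, using plain shell arithmetic instead of tc. The helper names (overwrite_ecn, is_ect1, is_ect0) are made up for illustration, and this assumes "set VAL retain 0x3" rewrites only the two ECN bits and leaves the six DSCP bits alone, which is the intent of the script above.

```shell
#!/bin/sh
# Hypothetical helpers (not part of cake-qos-simple): emulate the effect of
# "pedit ... set VAL retain 0x3" on a single TOS / traffic-class byte.

# ECN codepoints in the low two bits:
#   0 = Not-ECT, 1 = ECT(1), 2 = ECT(0), 3 = CE
is_ect1() { [ $(( $1 & 0x3 )) -eq 1 ]; }
is_ect0() { [ $(( $1 & 0x3 )) -eq 2 ]; }

# Replace only the ECN bits (mask 0x3); keep the DSCP bits (mask 0xfc).
overwrite_ecn() {
    tos=$1   # original byte, 0-255
    val=$2   # new ECN codepoint, 0-3
    echo $(( (tos & 0xfc) | (val & 0x3) ))
}

tos=185                      # 0xb9 = DSCP 46 (EF) with ECT(1)
if is_ect1 "$tos"; then
    overwrite_ecn "$tos" 0   # prints 184: EF kept, ECN cleared to Not-ECT
fi
```

So with overwrite_ul_ect_1_val=0, an EF packet carrying ECT(1) leaves as EF with Not-ECT, and packets that are not ECT(1) are untouched by that filter.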


No, but since I still use the ctinfo set bits, I can avoid re-evaluating every packet in firewall4. Since that bit was removed in this script, it would need a different method.


So how to implement all this now?

Should I have the service script just call the nft calls for:

oifname wan ct state new,untracked goto store-dscp-in-conntrack

        chain store-dscp-in-conntrack {

                meta nfproto ipv4 ct mark set (@nh,8,8 & 252) >> 2
                meta nfproto ipv6 ct mark set (@nh,0,16 & 4032) >> 6
        }

And the user sets nftables rules in LuCI for classification, and we add:

# cat /usr/share/nftables.d/chain-pre/mangle_postrouting/00-ctinfo-skip-marked.nft 
oifname $wan_devices ct mark & 64 != 0 return

to give just one classification per connection? Will that line above prevent the LuCI rules from classifying what has already been classified?

No, because there's no extra bit being set in ct mark that indicates it is already set. You can't use ct mark > 0 because that would not catch things set with CS0. Maybe you can duplicate the ct state check for new and untracked.
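To make the CS0 problem concrete, here is a small shell sketch (arithmetic only, no nft involved). The function names are invented for illustration; the flag value 64 is the bit used in the skip rule quoted earlier. It compares a "mark > 0" test against a dedicated flag-bit test for three DSCPs.

```shell
#!/bin/sh
# Illustration only: why "ct mark > 0" cannot tell "classified as CS0"
# apart from "never classified", and how an extra flag bit fixes that.
FLAG=64   # first bit above the 6-bit DSCP field

store_plain()   { echo $(( $1 & 63 )); }            # mark = DSCP only
store_flagged() { echo $(( ($1 & 63) | FLAG )); }   # mark = DSCP | 64

seen_by_nonzero() { [ "$1" -gt 0 ]; }               # "ct mark > 0"
seen_by_flag()    { [ $(( $1 & FLAG )) -ne 0 ]; }   # "ct mark & 64 != 0"

# DSCP decimal values: CS0 = 0, CS1 = 8, CS6 = 48
for dscp in 0 8 48; do
    plain=$(store_plain "$dscp")
    flagged=$(store_flagged "$dscp")
    if seen_by_nonzero "$plain"; then a=yes; else a=no; fi
    if seen_by_flag "$flagged"; then b=yes; else b=no; fi
    echo "dscp=$dscp plain_mark=$plain nonzero_test=$a flagged_mark=$flagged flag_test=$b"
done
```

The first line of output is the interesting one: for CS0 the plain mark is 0, so the nonzero test says "not classified yet", while the flagged mark is 64 and the flag test still works.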

Ah, so the disadvantage of LuCI setting DSCPs is that it has to work on EVERY packet?

There is surely a clever solution somehow.

Can we just have nftables apply a specific, unusual mark for ct state new,untracked and have LuCI match on that specific, unusual mark for all its markings?

I would say this is the right thing to do anyway, as otherwise applications cannot change the DSCP of an already established connection. I would really like to see measurements of how many CPU cycles are saved by essentially caching the decision for the lifetime of a connection.

That's what you used to do with OR'ing the 128 with the ct mark.

Worth matching on such a special mark (representing ct state new, untracked) or like @moeller0 suggests not worth it and problematic with LAN clients wanting to change?

I must admit I don't understand the return in your file above. Could you elaborate on that a little?

All of this does seem to slightly undermine switching to LuCI for classification rather than having it done in cake-qos-simple. But I can't deny that setting up DSCP rules in a graphical interface is just so much nicer than fiddling about with files.

Maybe the LuCI firewall could be expanded to add 'ct state' matching?

It says: for egress traffic on wan, if I previously set the ct mark conditional bit with 64 (you used to use 128), don't bother running this packet through the rest of the mangle_postrouting chain.

Originally we checked a similar condition before running the classify chain (the inverse, if not set, do the classification):

Side note: WebRTC over UDP can use different DSCPs inside a single UDP 5-tuple (what cake considers to be a flow), e.g. when multiplexing multiple RTP sub-flows over the same UDP connection. In that situation the whole store-and-restore-from-conntrack approach gets murky fast. For example, if a low-priority data stream marked with the LE DSCP is multiplexed with normal-priority video/audio (e.g. AF21), then either the first of these seen by the firewall sets the DSCP for the lifetime of the connection (with the caching approach), or the DSCP for the reverse direction changes a lot. Neither of these sounds all that great; let's hope that most implementations use independent UDP/TCP/... connections for each RTP flow...

What I wonder is whether adaptive applications exist that, e.g., start with CS0 and later reclassify themselves as LE, in which case (a relatively slow change of DSCP) tracking these changes in the conntrack database makes some sense...

WebRTC is used e.g. by browser based video/audio conferencing so is not a totally wild use-case...


Please forgive my obtuseness here.

But isn't the intention with this to prevent the LuCI-based DSCP rules from kicking in (once the classification has already occurred)? That's why I don't understand:

Or do you prevent running the packet through the rest of the mangle_postrouting chain to prevent something else from kicking in (saving to conntrack)? But if the latter, why not just use:

oifname wan ct state new,untracked goto store-dscp-in-conntrack

        chain store-dscp-in-conntrack {

                meta nfproto ipv4 ct mark set (@nh,8,8 & 252) >> 2
                meta nfproto ipv6 ct mark set (@nh,0,16 & 4032) >> 6
        }
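As an aside, the two raw-payload expressions in store-dscp-in-conntrack are just bit-twiddling to pull the 6-bit DSCP out of the IP header. A sketch of the same arithmetic in shell, with made-up function names, in case it helps anyone reading along:

```shell
#!/bin/sh
# Illustration only: the arithmetic behind
#   (@nh,8,8 & 252) >> 2     (IPv4)
#   (@nh,0,16 & 4032) >> 6   (IPv6)

# IPv4: @nh,8,8 is the second header byte (TOS).
# DSCP is its top six bits: mask 252 (0xfc), shift right by 2.
dscp_from_ipv4_tos() { echo $(( ($1 & 252) >> 2 )); }

# IPv6: @nh,0,16 is the first 16 header bits:
#   4 bits version | 8 bits traffic class | top 4 bits of flow label
# DSCP is the top six bits of the traffic class: mask 4032 (0x0fc0), shift 6.
dscp_from_ipv6_word() { echo $(( ($1 & 4032) >> 6 )); }

dscp_from_ipv4_tos 184      # TOS 0xb8 (EF, Not-ECT)          -> 46
dscp_from_ipv4_tos 185      # TOS 0xb9 (EF, ECT(1))           -> 46
dscp_from_ipv6_word 27520   # 0x6b80: version 6, class 0xb8   -> 46
dscp_from_ipv6_word 24576   # 0x6000: version 6, class 0x00   -> 0
```

In both cases the ECN bits are masked off, so the value written to ct mark is the bare DSCP (0-63).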

If the dscp is already saved to conntrack, I don't need to evaluate the firewall4 rules anymore.

Here's what it looks like for me right now with my current implementation (running nftables 1.0.8 and only using diffserv3):

        chain mangle_postrouting {
                type filter hook postrouting priority mangle; policy accept;
                oifname "wan" ct mark & 0x00000040 != 0x00000000 return
                oifname "wan" ip dscp cs0 ip daddr @bulk4 counter ip dscp set cs1 comment "!fw4: Bulk DSCP IPv4"
                oifname "wan" ip6 dscp cs0 ip6 daddr @bulk6 counter ip6 dscp set cs1 comment "!fw4: Bulk DSCP IPv6"
                oifname "wan" ip dscp cs0 ip daddr @voice4 counter ip dscp set cs6 comment "!fw4: Voice DSCP IPv4"
                oifname "wan" ip6 dscp cs0 ip6 daddr @voice6 counter ip6 dscp set cs6 comment "!fw4: Voice DSCP IPv6"
                meta nfproto ipv4 oifname "wan" udp dport 53 counter ip dscp set cs6 comment "!fw4: DNS DSCP"
                meta nfproto ipv6 oifname "wan" udp dport 53 counter ip6 dscp set cs6 comment "!fw4: DNS DSCP"
                meta nfproto ipv4 oifname "wan" tcp dport 53 counter ip dscp set cs6 comment "!fw4: DNS DSCP"
                meta nfproto ipv6 oifname "wan" tcp dport 53 counter ip6 dscp set cs6 comment "!fw4: DNS DSCP"
                meta nfproto ipv4 oifname "wan" udp dport 123 counter ip dscp set cs6 comment "!fw4: NTP DSCP"
                meta nfproto ipv6 oifname "wan" udp dport 123 counter ip6 dscp set cs6 comment "!fw4: NTP DSCP"
        }
}
table inet sqm_ctinfo_cake {
        chain sqm_ctinfo_postrouting {
                type filter hook postrouting priority mangle + 1; policy accept;
                oifname "wan" ct mark & 0x00000040 == 0x00000000 jump sqm_store_dscp
        }

        chain sqm_store_dscp {
                ct mark set ip dscp | 0x40 counter
                ct mark set ip6 dscp | 0x40 counter
        }
}
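A quick way to read the marks this ruleset produces, e.g. when eyeballing conntrack entries: the sketch below decodes a ct mark into "stored DSCP" or "not stored yet", assuming the layout used above (low six bits = DSCP, bit 0x40 = "already stored" flag). decode_mark is a hypothetical helper, not part of any package.

```shell
#!/bin/sh
# Illustration only: interpret a ct mark written by
#   ct mark set ip dscp | 0x40
decode_mark() {
    mark=$(( $1 ))   # accepts decimal or 0x-prefixed hex
    if [ $(( mark & 0x40 )) -ne 0 ]; then
        echo "stored dscp=$(( mark & 0x3f ))"
    else
        echo "not stored"
    fi
}

decode_mark 0x48   # CS1 (8)  | 0x40 -> stored dscp=8
decode_mark 0x70   # CS6 (48) | 0x40 -> stored dscp=48
decode_mark 0x40   # CS0 (0)  | 0x40 -> stored dscp=0
decode_mark 0      # never processed -> not stored
```

The low six bits are what a ctinfo restore with mask 63 would put back on the packet, and the 0x40 bit is what the return rule at the top of mangle_postrouting tests.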

Ah, I see.

Is there a good way to populate those sets? @bulk4, etc. I see how classifying based on:

  • destination protocol + port; and
  • destination IP address,

would cover almost everything.

I use dnsmasq 2.89 ipsets in LuCI to populate the sets. Mostly for the bulk sets. Not much in the voice except my WiFi calling. This is from /etc/config/dhcp:

config ipset
        list name 'bulk4'
        list name 'bulk6'
        list domain 'backblaze.com'
        list domain 'backblazeb2.com'
        list domain 'ms-acdc.office.com'
        list domain 'windowsupdate.com'
        list domain 'update.microsoft.com'
        list domain 'onedrive.com'
        list domain '1drv.ms'
        list domain '1drv.com'
        list domain 'sharepoint.com'

config ipset
        list name 'voice4'
        list name 'voice6'
        list domain 'epc.att.net'

Sweet! Does that work from 22.03.05 onwards or later? On 22.03.05 I see:

root@OpenWrt-1:~# dnsmasq --version
Dnsmasq version 2.86  Copyright (c) 2000-2021 Simon Kelley

I'm still undecided about moving all the classification to the OpenWrt firewall. It's really hard to figure out how best to integrate this in the most harmonious and user-friendly way.

With future nftables we can do almost everything in nftables, even including mirroring packets to the IFBs. In that situation it still seems to make sense to me to have cake-qos-simple generate/load its own separate .nft file using a separate table that can be deleted.

I'm tempted just to expand the existing default .nft file and config:

gen_nft_rules()
{
        load_config

        printf "Generating new default nft.rules file for cake-qos-simple.\n"

        mkdir -p "${PREFIX}"

        cat > "${PREFIX}/nft.rules.tmp" <<-EOT
        # cake-qos-simple nftables rules

        # This nft script:
        # 1) classifies DSCPs (to supplement or replace those set by LAN clients); and
        # 2) stores DSCPs in conntracks for restoration using tc action ctinfo dscp 63

        table inet cake-qos-simple
        delete table inet cake-qos-simple

        ${nft_rules_vars}

        table inet cake-qos-simple {

                chain hook-postrouting {

                        type filter hook postrouting priority mangle + 1

                        #  classify any new, untracked connections on WAN
                        oifname ${ul_if} ct state new,untracked goto classify-and-store-dscp
                }

                chain classify-and-store-dscp {

                        jump classify-dscp
                        jump store-dscp-in-conntrack
                }

                chain classify-dscp {

                        meta l4proto . th dport vmap @rules_proto_dport

                        # IoT devices (uncomment to use)
                        ether saddr \$BULK_MACS goto dscp_set_bulk

                }

                map rules_proto_dport {
                        type inet_proto . inet_service : verdict
                        elements = \$PROTO_DPORT_DSCP_MAP
                }

                # designate packet for cake tin: bulk
                chain dscp_set_bulk {
                        ip dscp set cs1
                        ip6 dscp set cs1
                }

                # designate packet for cake tin: besteffort
                chain dscp_set_besteffort {
                        ip dscp set cs0
                        ip6 dscp set cs0
                }

                # designate packet for cake tin: video
                chain dscp_set_video {
                        ip dscp set cs2
                        ip6 dscp set cs2
                }

                # designate packet for cake tin: voice
                chain dscp_set_voice {
                        ip dscp set cs4
                        ip6 dscp set cs4
                }

                chain store-dscp-in-conntrack {

                        meta nfproto ipv4 ct mark set (@nh,8,8 & 252) >> 2
                        meta nfproto ipv6 ct mark set (@nh,0,16 & 4032) >> 6
                }
        }
        EOT

        if [[ -f "${PREFIX}/nft.rules" ]]
        then
                printf "Warning: nftables rules file ${PREFIX}/nft.rules already exists.\n"
                printf "Saving new nftables rules file as: '${PREFIX}/nft.rules.new'.\n"
                mv "${PREFIX}/nft.rules.tmp" "${PREFIX}/nft.rules.new"
        else
                printf "Saving new nftables rules file as: '${PREFIX}/nft.rules'.\n"
                mv "${PREFIX}/nft.rules.tmp" "${PREFIX}/nft.rules"
        fi
}

gen_config()
{
        printf "Generating new default config for cake-qos-simple.\n"

        mkdir -p "${PREFIX}"

        cat > "${PREFIX}/config.tmp" <<-EOT
        # cake-qos-simple configuration options

        ul_if=wan # upload interface
        dl_if=""  # download interface override (normally left blank and IFB derived for \$ul_if ingress)

        cake_ul_rate_Mbps=20  # cake upload rate in Mbit/s
        cake_dl_rate_Mbps=20 # cake download rate in Mbit/s

        cake_ul_options="diffserv4 triple-isolate nat wash ack-filter noatm overhead 0"
        cake_dl_options="diffserv4 triple-isolate nat nowash ingress no-ack-filter noatm overhead 0"

        overwrite_ul_ect_0_val=0 # overwrite upload ECT(0) values with decimal value (e.g. 0, 1, 2, 3), else "" to disable
        overwrite_ul_ect_1_val=0 # overwrite upload ECT(1) values with decimal value (e.g. 0, 1, 2, 3), else "" to disable
        overwrite_dl_ect_0_val=0 # overwrite download ECT(0) values with decimal value (e.g. 0, 1, 2, 3), else "" to disable
        overwrite_dl_ect_1_val=0 # overwrite download ECT(1) values with decimal value (e.g. 0, 1, 2, 3), else "" to disable

        # the following nftables variables will be used to generate a default nft.rules file

        nft_rules_vars="# ### START OF CUSTOMISABLE NFT VARS SECTION (DO NOT DELETE THIS LINE) ###

        # correspondence between protocol, destination port and DSCPs
        # the format is:
        # 'protocol' . 'destination port' : goto dscp_set_bulk OR dscp_set_besteffort OR dscp_set_video OR dscp_set_voice
        define PROTO_DPORT_DSCP_MAP = {
                tcp . 53 : goto dscp_set_voice,  # DNS
                udp . 53 : goto dscp_set_voice,  # DNS
                tcp . 853 : goto dscp_set_voice, # DNS-over-TLS
                udp . 853 : goto dscp_set_voice, # DNS-over-TLS
                udp . 123 : goto dscp_set_voice  # NTP
        }

        # local MAC addresses to set to bulk (e.g. IoT devices)
        # replace MAC address below with comma separated entries
        define BULK_MACS = {
                02:00:00:00:00:00
        }

        # ### END OF CUSTOMISABLE NFT VARS SECTION (DO NOT DELETE THIS LINE) ###"
        EOT

        if [[ -f "${PREFIX}/config" ]]
        then
                printf "WARNING: config file ${PREFIX}/config already exists.\n"
                printf "Saving new config file as: '${PREFIX}/config.new'.\n"
                mv "${PREFIX}/config.tmp" "${PREFIX}/config.new"
        else
                printf "Saving new config file as: '${PREFIX}/config'.\n"
                mv "${PREFIX}/config.tmp" "${PREFIX}/config"
        fi
}

to try to incorporate destination IP address handling.

It'll be in 23.05. It worked in 21.02 with original ipsets, but 22.03 was hampered by dnsmasq 2.86 not supporting nftables.
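If you want to check a given box before relying on dnsmasq-populated nft sets, something along these lines could work. It is only a sketch: the function names are made up, and the 2.87 threshold is my assumption about when dnsmasq gained nftables set support (the thread only establishes that 2.86 lacks it and 2.89 works).

```shell
#!/bin/sh
# Illustration only: parse a "dnsmasq --version" banner and compare the
# version against an assumed minimum for nftables set support.

# Extract "2.86" from "Dnsmasq version 2.86  Copyright (c) ..."
version_from_banner() {
    echo "$1" | sed -n 's/^Dnsmasq version \([0-9][0-9.]*\).*/\1/p'
}

# Assumption: nftset support starts at 2.87.
dnsmasq_has_nftset() {
    ver=$1
    major=${ver%%.*}
    minor=${ver#*.}
    minor=${minor%%[!0-9]*}
    [ "$major" -gt 2 ] || { [ "$major" -eq 2 ] && [ "$minor" -ge 87 ]; }
}

banner="Dnsmasq version 2.86  Copyright (c) 2000-2021 Simon Kelley"
ver=$(version_from_banner "$banner")
if dnsmasq_has_nftset "$ver"; then
    echo "$ver: nftset available"
else
    echo "$ver: nftset not available"
fi
```

On a live router the banner would come from the first line of dnsmasq --version rather than a hard-coded string.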