NFtables and QoS in 2021

That is exactly what I am looking for; moreover, this script also marks incoming packets (something the other scripts lack).

Something I didn't understand is how to use "ctinfo_5layercake.qos". I know you need patched SQM files to use diffserv5, but I don't see a description of where to find or download them.

You will need to patch the sch_cake.c file in the kernel and then build your own blob.
However, think hard about what you want to use that additional priority tier for. It is not that 5 is automatically better than 4... As a rule of thumb, try to up-prioritize as little as possible (as I keep repeating, trying to prioritize all packets ends up prioritizing none; in a sense, prioritization is a zero-sum game).

+1; be judicious and sparse when prioritizing.

Not true: qosify can mark incoming packets, as can elan's and dlakelan's scripts if configured appropriately. ldir's approach is elegant in that it works automatically for IPv4...


Hey @moeller0, how about just setting the tins without DSCP marks, using the tc overrides?

So the tins have indices starting with 1, right?

No, I think 0, but zero is the best-effort tin, which for some reason needs to be zero (probably because in besteffort mode there is only a single tin, and due to indexing the first one needs to be zero).
I have not tried that and wonder how that works with skb->hash, but maybe both can be manipulated...

The nice thing about using DSCPs to steer packets into tins is that there are more ways and places to set DSCPs than skb->priority (endpoints can set the desired DSCP directly, while skb->priority manipulation needs to run on the router).
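A minimal sketch of the priority-override idea, done from nftables rather than a tc filter. It rests on our reading of cake_select_tin() in sch_cake.c, so please check it against your kernel: the override only applies when the major number of skb->priority equals the CAKE qdisc's own handle, the minor number counts tins from 1 (not 0), and in diffserv3/diffserv4 the tins are numbered from lowest to highest priority, so 1 is Bulk and 2 is Best Effort. The handle 1:, the device eth1, the table name and the two port examples are assumptions for illustration; the table is written standalone for loading with nft -f, not for /etc/nftables.d/.

```nft
# Sketch only. Assumes CAKE sits at the root of eth1 with an explicit handle:
#   tc qdisc replace dev eth1 root handle 1: cake diffserv4
# skb->priority 1:N then selects CAKE tin N (counting from 1, in CAKE's order):
#   diffserv4: 1:1 Bulk, 1:2 Best Effort, 1:3 Video, 1:4 Voice
table inet cake_tin_override {
    chain postrouting {
        type filter hook postrouting priority 0; policy accept;
        # hypothetical example: NTP towards the internet goes to the Voice tin
        oifname "eth1" udp dport 123 meta priority set 1:4
        # hypothetical example: rsync towards the internet goes to the Bulk tin
        oifname "eth1" tcp dport 873 meta priority set 1:1
    }
}
```

The per-tin packet counters in tc -s qdisc show dev eth1 are a quick way to confirm which tin the packets actually land in.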


Remember that nftables can also replace some tc filter functions.

https://wiki.nftables.org/wiki-nftables/index.php/Classification_to_tc_structure_example

See here for @ldir’s diffserv5 patch:

But that's the problem: it isn't a generic solution for all of us, and sometimes we don't have the time or skills to maintain it. English isn't my native language, but remembering the trouble I had when I wrote my nftables rules with verdict maps, here is a (big) snippet showing how to use verdict maps to set DSCP:

Caveat: you need to edit files by hand, so this is not accessible from the GUI.

This file contains the nftables postrouting chain, so you either need to add the meta oifname... rule to your existing postrouting chain or include the file.

file /etc/nftables.d/postrouting.nft

include "/etc/nftables.d/sets_postrouting.nft"

chain postrouting {
    type filter hook postrouting priority 0 ; policy accept;
    meta oifname vmap @ifname_QoS
}

You need to modify the interfaces to match yours

file /etc/nftables.d/sets_postrouting.nft

include "/etc/nftables.d/QoS.nft"

map ifname_QoS {
    type ifname : verdict ;
    elements = {
        eth1 : jump select_priority,
        br-wifi : jump select_priority,
        eth0 : jump select_priority
    }
}

The priority_bulk, priority_voice and priority_video chains are similar to priority_best-effort, only changing cs0 to cs6 (voice), cs3 (video) or cs1 (bulk).

The "meta priority set" statements send the packets to the desired qdisc/class. On my WAN I use 4 classes (qdiscs), so you can delete all but one of them and point that one at your qdisc.

file /etc/nftables.d/QoS.nft

include "/etc/nftables.d/sets_ports_QoS.nft"

# https://wiki.nftables.org/wiki-nftables/index.php/Setting_packet_metainformation

chain priority_best-effort {
    ip dscp set cs0
    ip6 dscp set cs0
    # https://wiki.nftables.org/wiki-nftables/index.php/Classification_to_tc_structure_example
    # priority set 1:0x2 ; sends packet to htb 1:2 ; big packets upload
    meta length > 192 meta priority set "1:0x2" accept
    # priority set 1:0x6 ; sends packet to htb 1:6 ; control download tcp packets
    meta l4proto tcp meta priority set "1:0x6" accept
    # priority set 1:0x8 ; sends packet to htb 1:8 ; control download udp packets
    meta priority set "1:0x8" accept
    # jump log_qos_ipv4
}

chain select_priority {
    # DSCP 27 is set by one of my apps outside the router, so I know this traffic is unimportant and send it to bulk
    ip dscp 27 jump priority_bulk
    ip6 dscp 27 jump priority_bulk
    # here the priority is selected by port
    # to match on IP addresses, modify the following two statements and the port_priority verdict map, or add new statements with a new verdict map before them
    # https://wiki.nftables.org/wiki-nftables/index.php/Verdict_Maps_(vmaps)
    meta l4proto . th dport vmap @port_priority
    meta l4proto . th sport vmap @port_priority
    ip protocol icmp jump priority_best-effort
    meta l4proto ipv6-icmp jump priority_best-effort
    meta l4proto ospf jump priority_voice
    # Any packet not processed above will go to bulk
    jump priority_bulk
}
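For the "to match on IP addresses" comment in select_priority, a minimal sketch of an address-based verdict map, assuming IPv4 LAN hosts; the map name addr_priority and all addresses are hypothetical. Because this postrouting chain runs at priority 0, before fw4's srcnat/masquerade chain (priority srcnat, i.e. 100), ip saddr should still hold the LAN address of upload traffic; for packets leaving towards the LAN (eth0, br-wifi) the LAN host is the destination instead.

```nft
# Hypothetical addresses for illustration; replace with hosts on your LAN.
map addr_priority {
    type ipv4_addr : verdict ;
    flags interval;
    elements = {
        192.168.1.50 : jump priority_bulk,     # e.g. a backup server
        192.168.1.64/28 : jump priority_video  # e.g. a block of streaming clients
    }
}

# Then, in select_priority, add these two lines before the port lookups:
#   ip saddr vmap @addr_priority   # upload: LAN host is the source (not yet masqueraded)
#   ip daddr vmap @addr_priority   # download: LAN host is the destination
```

Placed before the port lookups, the address match wins over the port match, since every priority_* chain ends in accept.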

file /etc/nftables.d/sets_ports_QoS.nft

# Don't set priority bulk here, 
# all packets not listed here will be marked bulk by default

map port_priority {
    type inet_proto . inet_service : verdict ;
    flags interval;
    elements = {
        tcp . 22 : jump priority_ssh,
        tcp . 53 : jump priority_voice,
        udp . 53 : jump priority_voice,
        udp . 67 : jump priority_voice,
        udp . 68 : jump priority_voice,
        tcp . 80 : jump priority_best-effort, # Hypertext Transfer Protocol (HTTP)
        udp . 80 : jump priority_best-effort, # HTTP/3 uses QUIC (note: QUIC normally runs on udp 443)
        tcp . 19305-19308 : jump priority_video, # Google Talk, DUO, Hangouts
        tcp . 25565 : jump priority_video, # Gaming Minecraft
        udp . 25565 : jump priority_video, # Gaming Minecraft
        udp . 51871 : jump priority_ssh,
        udp . 51872 : jump priority_ssh
    }
}

The workflow for each packet:

  • the packet's output interface is looked up in the ifname_QoS map
  • if the interface is listed, jump to select_priority
  • there, each packet is checked for one attribute at a time and sent to priority_(best-effort, bulk, video, voice);
    for destination or source ports, the port_priority map picks the priority_(best-effort, bulk, video, voice) chain by port
  • priority_(best-effort, bulk, video, voice) sets the DSCP for IPv4 and IPv6, then checks the packet length and enqueues the packet in an HTB class; you can skip the length check and enqueue everything in a single qdisc
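As a sketch of the simpler variant from the last step (no length check, everything in one qdisc), here are the three chains the post describes but does not show. It assumes a single CAKE qdisc in diffserv4 mode on the WAN, so the DSCP alone picks the tin and no meta priority is needed; the DSCP values follow the post (cs1 bulk, cs3 video, cs6 voice). priority_best-effort shrinks the same way with cs0. priority_ssh, which port_priority also references, is not shown in the post and would need a chain of its own.

```nft
# Single-qdisc variant: no length check and no "meta priority set";
# assumes one CAKE (diffserv4) qdisc on the WAN selects the tin from the DSCP.
chain priority_bulk {
    ip dscp set cs1
    ip6 dscp set cs1
    accept
}

chain priority_video {
    ip dscp set cs3
    ip6 dscp set cs3
    accept
}

chain priority_voice {
    ip dscp set cs6
    ip6 dscp set cs6
    accept
}
```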

I don't have any performance statistics, but it works for me on a 250/100 connection with a NanoPi R4S. Maybe it can serve as an idea for using named vmaps; the snippet above is a solution for the QoS of my network, not a generic one.

Gabrielo


@gabrielo thank you very much for taking the time to write this out - it is very helpful indeed for everyone reading this thread.

What's the significance of this being done in prerouting vs. postrouting? Does that change when the DSCP marks are applied? A challenge I face is getting the DSCP marks applied so that ingress packets on 'br-lan' and 'br-guest' (upload data), which are combined/mirrored to an IFB, already carry the DSCP marks when CAKE sees them. If that is not possible, I think I need to use CAKE's tc filter tin override feature or find another solution.

Maybe I'm confused by your terms ingress/upload data, so I will try to explain; again, English is not my native language:

On my NanoPi I swapped the Ethernet ports: eth0 is my internal network and eth1 is my WAN (I did this for legacy reasons; it really doesn't matter).

In the snippet above you have the following:

eth1 : jump select_priority,

This matches all packets going from the internal network to the internet.

And with the line:

meta priority set "1:0x2" accept

nftables puts the packet in HTB class 1:2, but you can point it at a qdisc instead if that is what you have on eth1 (WAN).

My HTB class contains a CAKE qdisc, which uses the DSCP in the packet.

So the fragments above set the DSCP on all packets going:

my lan            internet   wan
192.168.0.0/16 -> 0.0.0.0/0  eth1

So for traffic going out via the WAN you don't need any IFB to set the DSCP and enqueue to a qdisc with the snippet. Better copy it, add the missing bulk, video and voice chains, and try the code.

About prerouting vs. postrouting timing, I really don't know. I think the DSCP marks only work in the postrouting chain; I didn't try prerouting because I had no need to, since the postrouting chain satisfies my requirements.


Excited to try nftables DSCP marking, I tried putting this file into /etc/nftables.d/20-try.nft:

chain tagin {
    ## mangle priority, retag anything coming from LAN, you
    ## might want to do the same for anything coming from WAN
    ## (assumed to be eth1)

    type filter hook ingress device br-lan priority -149; policy accept;

    ip dscp set cs3 ## convert all to cs3 first, this is my base DSCP tag rather than cs0
    ip6 dscp set cs3

    # tag ntp packets very high priority
    ip protocol udp udp sport ntp ip dscp set cs6
    ip6 nexthdr udp udp sport ntp ip6 dscp set cs6

    ## icmp/icmpv6 gets high priority but you might not want
    ## this!  it does let you find out what the round trip time
    ## for high priority packets is by just using ping though
    ip protocol icmp ip dscp set cs5
    ip6 nexthdr icmpv6 ip6 dscp set cs5

    ## game traffic on ip and ipv6
    udp dport {7000-9000, 27000-27200} ip dscp set cs5
    udp sport {7000-9000, 27000-27200} ip dscp set cs5

    ip6 nexthdr udp udp dport {7000-9000, 27000-27200} ip6 dscp set cs5
    ip6 nexthdr udp udp sport {7000-9000, 27000-27200} ip6 dscp set cs5

    # I have a custom shaper with different classes 1:10, 1:20
    # are realtime, 1:30 is high priority nonrealtime, 1:40 is
    # normal, 1:50 is nfs fileserver bulk, 1:60 is very low
    # priority, if you use cake, you can remove this whole thing
    # and just use layer cake, it will use the DSCP on its own

    meta priority set 1:40 ## default

    ip dscp {ef,cs6} meta priority set 1:10
    ip dscp {cs5} meta priority set 1:20
    ip dscp {af41, af42, af43} meta priority set 1:30
    ip dscp {cs2} meta priority set 1:50
    ip dscp {cs1} meta priority set 1:60

    ip6 dscp {ef,cs6} meta priority set 1:10
    ip6 dscp {cs5} meta priority set 1:20
    ip6 dscp {af41, af42, af43} meta priority set 1:30
    ip6 dscp {cs2} meta priority set 1:50
    ip6 dscp {cs1} meta priority set 1:60
}

This is taken from: QoS and nftables ... some findings to share

N.B. I removed the outer table definition and kept just the 'chain' definition; is that correct? Otherwise I got an error about an unexpected table. Also, I set the line:

    	  type filter hook ingress device br-lan priority -149; policy accept;

But this doesn't seem to do anything, even though the firewall restarts without any errors. Namely, if I check the 'tos' field using tcpdump, I see no changes to the packets.

Am I perhaps missing certain packages for this ingress prerouting to work?

What am I missing? @dlakelan?


The ingress hook chain needs to be in a table of the netdev family instead of inet. You will need to configure a “ruleset-append” include in /etc/config/firewall to add a table to the existing ruleset.


Turns out this isn’t exactly true: the inet family supports the ingress hook since kernel 5.10, but I'm not sure it behaves the same as netdev.

http://git.netfilter.org/nftables/commit/?id=701e5dee5f53a131cd46d761f40db4c74ce3d33c
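A minimal sketch of the netdev-table approach, assuming the file is loaded at ruleset level (for example through the “ruleset-append” include in /etc/config/firewall mentioned above) rather than dropped into /etc/nftables.d/, whose files fw4 includes inside table inet fw4 (hence the "unexpected table" error). The table name dscp_tag is made up and the rules are a subset of the tagin chain above. One caveat, as far as we understand the receive path: the tc ingress hook, where traffic is usually redirected to an IFB, runs before the netfilter ingress hook, so marks set here are probably not seen by CAKE on an IFB fed from the same interface, while qdiscs on the interface the packet leaves through do see them.

```nft
# Hypothetical standalone file, loaded at ruleset level (not inside table inet fw4).
table netdev dscp_tag {
    chain tagin {
        # netdev ingress hook: bound to a single device, runs before prerouting
        type filter hook ingress device "br-lan" priority -149; policy accept;

        ip dscp set cs3                      # base tag for everything from the LAN
        ip6 dscp set cs3
        ip protocol icmp ip dscp set cs5     # ping gets the high-priority tag
        ip6 nexthdr icmpv6 ip6 dscp set cs5
    }
}
```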


Do you know what packages I need? Do I need kmod-nft-netdev?

Seems reasonable. Won’t hurt to install it anyway.

Any clue, then, why just creating the file as I did above didn't work? I also tried installing that module, but still no effect.

I wonder if I am missing another package needed for this special ingress hook. Or maybe I need to reboot after installing that package?

I will also try adding in the table in the way you suggested to see if that works.

If you are using CAKE qdiscs, make sure when testing that they are created with the nowash option, to preserve the DSCP,

because the wash option clears the DSCP field of every packet after placing it in its tin, so tcpdump cannot see any DSCP marks.


No effect on what? Be specific. Post output. Add some counters to your rules and see if they are being hit.

OK, I have created the file 20-try.nft in /etc/nftables.d/ with the same chain tagin content as in my post above.

then:

/etc/init.d/firewall restart

then:

tcpdump -i br-lan -vv

and this gives output:

tcpdump: listening on br-lan, link-type EN10MB (Ethernet), capture size 262144 bytes
16:41:04.968162 IP (tos 0x0, ttl 128, id 40940, offset 0, flags [none], proto ICMP (1), length 60)
    XXX.lan > one.one.one.one: ICMP echo request, id 1, seq 83, length 40
16:41:05.011274 IP (tos 0x0, ttl 58, id 1754, offset 0, flags [none], proto ICMP (1), length 60)
    one.one.one.one > XXX.lan: ICMP echo reply, id 1, seq 83, length 40

So, as you can see, the tos value is not being set on ingress as per:

          ip protocol icmp ip dscp set cs5
          ip6 nexthdr icmpv6 ip6 dscp set cs5

and in the context of the ingress hook:


          type filter hook ingress device br-lan priority -149; policy accept;

What am I missing?

Try running tcpdump on your IFB interface, or check the CAKE stats on that interface. I don’t know whether tcpdump captures br-lan traffic before or after the ingress hook in an inet table.

Did you add any counters to the marking rules? Just add the keyword counter at the end of some of your statements. Then check them with nft list chain inet fw4 tagin.
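For example, the two ICMP rules from tagin with the counter keyword appended, as suggested. After a few pings, nft list chain inet fw4 tagin should show each rule with counter packets N bytes M; if the counts stay at zero, the chain is probably not seeing the traffic at all, which points at the hook rather than at the rules themselves.

```nft
# Fragment for the tagin chain: same rules as above, with a counter appended
ip protocol icmp ip dscp set cs5 counter
ip6 nexthdr icmpv6 ip6 dscp set cs5 counter
```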


@Lynx

Very good work! I have tried to adapt elan's script to add a VPN interface, like this:

## Go to "Network -> Interfaces" and write the name of your "WAN" interface.
WAN="wan"
VPN="tun0" ## added by me
## Add veth devices and OpenVPN
ip link set veth0 up
ip link set veth1 up
ip link set veth1 promisc on
ip link set veth1 master br-lan
## Remove every old priority-100 rule (there are now two: WAN and VPN),
## otherwise each rerun of the script leaves a duplicate rule behind
while ip rule del priority 100 > /dev/null 2>&1; do :; done
ip route del table 100 > /dev/null 2>&1
ip route add default dev veth0 table 100
ip rule add iif $WAN priority 100 table 100
ip rule add iif $VPN priority 100 table 100 ## added by me
## Same for IPv6; deleting the rule/route again after the WAN rule
## is added would throw away the IPv6 WAN rule
while ip -6 rule del priority 100 > /dev/null 2>&1; do :; done
ip -6 route del table 100 > /dev/null 2>&1
ip -6 route add default dev veth0 table 100
ip -6 rule add iif $WAN priority 100 table 100
ip -6 rule add iif $VPN priority 100 table 100 ## added by me
## Delete the old qdiscs created by the script
tc qdisc del dev veth0 root > /dev/null 2>&1
tc qdisc del dev $WAN root > /dev/null 2>&1
tc qdisc del dev $VPN root > /dev/null 2>&1 ## added by me
## Inbound / Ingress
## Note: a root qdisc on $VPN shapes packets leaving the router through the
## tunnel (upload); downloads arriving on $VPN are routed to veth0 by the
## "iif $VPN" rule above and meet the qdisc on veth0 instead
if [ "$BANDWIDTH_DOWN" != "" ]; then
    tc qdisc add dev $VPN root cake $BANDWIDTH_DOWN_CAKE $AUTORATE_INGRESS_CAKE $PRIORITY_QUEUE_INGRESS $HOST_ISOLATION_INGRESS $NAT_INGRESS $WASH_INGRESS $INGRESS_MODE $RTT $COMMON_LINK_PRESETS $ETHER_VLAN_KEYWORD $LINK_COMPENSATION $OVERHEAD $MPU $EXTRA_PARAMETERS_INGRESS
fi
## Cleanup of the qdiscs (e.g. when stopping the shaper); if this runs right
## after the block above, it removes the $VPN qdisc that was just added
    tc qdisc del dev veth0 root > /dev/null 2>&1
    tc qdisc del dev $WAN root > /dev/null 2>&1
    tc qdisc del dev $VPN root > /dev/null 2>&1 ## added by me

Tell me if you think this is a good idea.

For the moment, with CyberGhost VPN I have no bufferbloat in upload, but always in download. I have only just started experimenting with the VPN using this script :wink:

My bufferbloat without VPN: https://www.waveform.com/tools/bufferbloat?test-id=03760703-49e5-4e03-9b21-c6fec47c6550

With VPN but without the added settings: https://www.waveform.com/tools/bufferbloat?test-id=e7956a3d-f2c6-4b25-9998-39bf5be562d1

With VPN and my newly added settings :stuck_out_tongue: https://www.waveform.com/tools/bufferbloat?test-id=6c67837a-54ff-4946-a186-bd444fbeb315