nftables and QoS in 2021

I think they should work together fine, but the tagging that qosify does will be overwritten by nftables, so I recommend only doing tagging in nftables that doesn't interfere.

Since qosify is an eBPF filter applied by tc, wouldn’t it come after nftables, at least on egress? On ingress, it would be interesting to understand which gets processed last.

Yes, I was thinking of ingress, since a bunch of our tagging is on ingress at the moment. I'm really not sure how the ordering works on ingress; I'm sure there's a diagram that would help us understand it better.

Hotplug is definitely needed, because after settings 19 it is still not loading on restart...

Ask and ye shall receive... I added a 13-nfthotplug script to the README, along with a few lines describing how to install it.

I stumbled upon this post from last year, and I wonder if there's a working example of such a map for DSCP to connmark to mimic --set-dscp?

Admittedly, I'm not running nftables at the moment and am in a network quiet period as our holiday guests arrive tonight. :slight_smile: Otherwise I'd give it a go. But I think Daniel's post linked above shows the flexibility of nftables and how you can approach a problem from many angles.

Yes, we could do this, but not at ingress, because conntrack hasn't seen the packet yet. In prerouting or postrouting, though, you could do it no problem.

I'll add some commented out skeleton for you to fill in and try out.

OK @dave14305, I pushed some commented-out code that's a skeleton for connmark/DSCP set and restore. What you'll want to do is write in the marks you want to include, and uncomment the two lines that do the mark set and mark restore. It's up to you to debug any syntax errors etc., but if you get it working, send a pull request and I'll incorporate it.

Hi Daniel, I find your rules skeleton interesting. I tried to reproduce my usual iptables rules in nftables; could you help me convert these rules?

iptables -t mangle -A POSTROUTING -p udp --dst 192.168.2.160 -j DSCP --sport 30000:45000 --dport 3074 --set-dscp-class CS5 -m comment --comment "Dopam-IT_1987-UDP-1-CALL-OF-DUTY" 
 
iptables -t mangle -A POSTROUTING -p udp --src 192.168.2.160 -j DSCP --sport 3074 --dport 30000:45000 --set-dscp-class CS5 -m comment --comment "Dopam-IT_1987-UDP-2-CALL-OF-DUTY" 

thanks

To convert iptables rules, you can get hints by using iptables-translate on Linux:

dlakelan@tintin:~$ iptables-translate -t mangle -A POSTROUTING -p udp --dst 192.168.2.160 -j DSCP --sport 30000:45000 --dport 3074 --set-dscp-class CS5 -m comment --comment "Dopam-IT_1987-UDP-1-CALL-OF-DUTY" 
nft add rule ip mangle POSTROUTING ip daddr 192.168.2.160 udp sport 30000-45000 udp dport 3074 counter ip dscp set 0x28 comment \"Dopam-IT_1987-UDP-1-CALL-OF-DUTY\"

dlakelan@tintin:~$ iptables-translate -t mangle -A POSTROUTING -p udp --src 192.168.2.160 -j DSCP --sport 3074 --dport 30000:45000 --set-dscp-class CS5 -m comment --comment "Dopam-IT_1987-UDP-2-CALL-OF-DUTY" 
nft add rule ip mangle POSTROUTING ip saddr 192.168.2.160 udp sport 3074 udp dport 30000-45000 counter ip dscp set 0x28 comment \"Dopam-IT_1987-UDP-2-CALL-OF-DUTY\"

Of course, you'll need to put it in a different place for my script, but at least it gives you the syntax.
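
As a quick sanity check on the translated rules: `0x28` is the 6-bit DSCP value for CS5, not the full TOS byte. Here is a minimal sketch of the arithmetic, assuming the usual class-selector layout (CSn = 8 × n). The helper names are made up for illustration.

```python
def class_selector(n: int) -> int:
    """DSCP value of class selector CSn (n in 0..7)."""
    if not 0 <= n <= 7:
        raise ValueError("class selector index must be 0..7")
    return n << 3


def dscp_to_tos(dscp: int) -> int:
    """DSCP occupies the upper six bits of the IPv4 TOS byte."""
    return (dscp & 0x3F) << 2


cs5 = class_selector(5)
print(hex(cs5))               # 0x28, the value in the translated rule
print(hex(dscp_to_tos(cs5)))  # 0xa0, the same class seen as a TOS byte
```

So a packet capture that reports `tos 0xa0` and an nft rule that sets `0x28` are describing the same marking.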

I have the basics working to map the DSCP into the ct mark based on the map. I had to switch the interface match from iifname to oifname since we are in POSTROUTING. So far it works, and I've implemented act_ctinfo on ingress with tc for now.

# nft list table inet cttags
table inet cttags {
        map dscpct {
                typeof ip dscp : ct mark
                elements = { cs1 : 0x00000008,
                             ef : 0x0000002e,
                             cs6 : 0x00000030 }
        }

        map ctdscp {
                typeof ct mark : ip dscp
                elements = { 0x00000008 : cs1, 0x0000002e : ef, 0x00000030 : cs6 }
        }

        chain cttags {
                type filter hook postrouting priority filter; policy accept;
                ip saddr 192.168.1.118 ip dscp set cs6 counter packets 45476 bytes 18565321
                oifname "eth1" ct mark set (@nh,8,8 & 252) >> 2 map @dscpct counter packets 45614 bytes 18580579
                oifname "eth1" ct mark set ct mark << 26 | 0x02000000
        }
}
ipv4     2 tcp      6 7431 ESTABLISHED src=192.168.1.118 dst=139.59.210.197 sport=59045 dport=443 packets=298 bytes=80920 src=139.59.210.197 dst=68.49.203.129 sport=443 dport=59045 packets=289 bytes=79931 [ASSURED] mark=3254779904 zone=0 use=2
# tc -s filter show dev eth1 parent ffff:
filter protocol all pref 49152 matchall chain 0
filter protocol all pref 49152 matchall chain 0 handle 0x1
  not_in_hw (rule hit 120669)
        action order 1: ctinfo zone 0 pipe
         index 1 ref 1 bind 1 dscp 0xfc000000 0x02000000 installed 700 sec used 0 sec firstused 700 sec DSCP set 119371 error 0 CPMARK set 0
        Action statistics:
        Sent 409687705 bytes 302536 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0

        action order 2: mirred (Egress Redirect to device ifb4eth1) stolen
        index 1 ref 1 bind 1 installed 700 sec used 0 sec firstused 700 sec
        Action statistics:
        Sent 411131438 bytes 303522 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0

My test was just to set any traffic from my laptop to CS6 and it worked on both egress (via nftables marks) and ingress (by tc act_ctinfo).
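
The conntrack entry above can be checked by hand. This is a rough sketch, assuming the masks shown in the tc output (`0xfc000000` for the DSCP bits, `0x02000000` for the flag bit); the function names are only illustrative.

```python
from typing import Tuple

DSCP_MASK = 0xFC000000   # upper six bits of the 32-bit mark
STATE_MASK = 0x02000000  # flag bit OR-ed in by the second nft rule
DSCP_SHIFT = 26


def decode_mark(mark: int) -> Tuple[int, bool]:
    """Return (dscp, flag_set) for a 32-bit conntrack mark."""
    dscp = (mark & DSCP_MASK) >> DSCP_SHIFT
    return dscp, bool(mark & STATE_MASK)


def encode_mark(dscp: int) -> int:
    """Mirror of 'ct mark set ct mark << 26 | 0x02000000'."""
    return ((dscp << DSCP_SHIFT) | STATE_MASK) & 0xFFFFFFFF


print(decode_mark(3254779904))  # (48, True): 48 == 0x30 == CS6
print(hex(encode_mark(0x30)))   # 0xc2000000 == 3254779904
```

The mark in the conntrack line decodes to CS6 with the flag bit set, which matches the test rule that tagged the laptop's traffic.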

Once this is working reliably, I'll see what prevents us from doing the same on ingress. I was getting errors with this line so I commented it out and went with act_ctinfo:

/etc/nftables.conf:218:47-65: Error: Binary operation (<<) is undefined for map expressions
            iifname $wan ct mark != 0x55 ip dscp set ct mark map @ctdscp
                                         ~~~~~~~     ^^^^^^^^^^^^^^^^^^^

Good work @dave14305. Can you post the script here? I will try debugging the nftables settings with you.
Thanks

If I understand correctly, on ingress conntrack hasn't seen the packet and therefore ct matches aren't available.

The best and most reliable way to shape incoming traffic from the WAN with prioritization is to route it out a LAN interface with a cake instance (or other priority aware qdisc) on that egress interface. If you're sending to multiple LANs it might be necessary to TBF the WAN ingress to control the overall bandwidth as well.

Note: act_ctinfo is also an option in OpenWrt, but it is specific to OpenWrt because, as I understand it, it breaks the network layering and Linux upstream will never accept it.

This is why I've been very happy using @ldir ctinfo_layercake script with act_ctinfo. We only have to set DSCP on egress, save the DSCP to the connmark, and let act_ctinfo restore it on ingress. So my initial hope is to replicate all the iptables egress chains with nftables.
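
To make that flow concrete, here is a toy model of the round trip. It is not the real kernel code: the dictionary stands in for conntrack, NAT is ignored, and the constants assume the `0xfc000000` / `0x02000000` masks used earlier in the thread.

```python
DSCP_SHIFT = 26
DSCP_MASK = 0xFC000000
STATE_MASK = 0x02000000

conntrack = {}  # hypothetical stand-in: flow key -> 32-bit mark


def flow_key(src, dst):
    # Same key for both directions of a connection.
    return frozenset((src, dst))


def egress(src, dst, dscp):
    """Egress: remember the DSCP chosen by the tagging rules."""
    mark = ((dscp << DSCP_SHIFT) | STATE_MASK) & 0xFFFFFFFF
    conntrack[flow_key(src, dst)] = mark


def ingress(src, dst, dscp_on_wire):
    """Ingress: restore the stored DSCP if the flag bit is set."""
    mark = conntrack.get(flow_key(src, dst), 0)
    if mark & STATE_MASK:
        return (mark & DSCP_MASK) >> DSCP_SHIFT
    return dscp_on_wire


egress("192.168.1.118:59045", "139.59.210.197:443", 0x30)
# Assume the reply arrives from the WAN with its DSCP reset to 0.
print(ingress("139.59.210.197:443", "192.168.1.118:59045", 0))  # 48
```

The point of the design is that classification rules only need to exist on the egress side; the ingress side just copies back whatever was stored for the connection.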

If I put CAKE on the LAN, then I have to manage the tracking rules twice (at least in the iptables world), but it is an option for me since my RPi only has one LAN port.

I understand that act_ctinfo is already upstream, but the set-dscp iptables extension was rejected.

Oh NICE!

As for managing tracking rules twice, I don't think that's needed in nftables. One single place in postrouting would be enough. It's also easy to move things from mark to DSCP in nftables, as you showed, so a special save-dscpmark extension is unnecessary.

So it sounds like we're close to a general purpose tagging Utopia.

This is the cttags table with which I have replaced the one from the original conf file. My only disappointment is in waiting for dnsmasq 2.87 to be published (Simon Kelley is MIA again, relocating to a new country apparently). So I've added sets and rules to eventually be populated by dnsmasq, but they do nothing today with dnsmasq 2.86.

And to be clear, I'm running snapshot with nftables 1.0.0, so if something doesn't work for you, it's probably due to the version differences.

table inet cttags {

	map dscpct {
		typeof ip dscp : ct mark
			elements = {
				cs0 : 0x00,
				cs1 : 0x08,
				cs2 : 0x10,
				cs3 : 0x18,
				cs4 : 0x20,
				cs5 : 0x28,
				cs6 : 0x30,
				cs7 : 0x38,
				be : 0x00,
				af11 : 0x0a,
				af12 : 0x0c,
				af13 : 0x0e,
				af21 : 0x12,
				af22 : 0x14,
				af23 : 0x16,
				af31 : 0x1a,
				af32 : 0x1c,
				af33 : 0x1e,
				af41 : 0x22,
				af42 : 0x24,
				af43 : 0x26,
				ef : 0x2e
			}
	}

	set bulk4 {
		type ipv4_addr
		counter
		comment "Bulk IPv4"
	}

	set bulk6 {
		type ipv6_addr
		counter
		comment "Bulk IPv6"
	}

	set besteffort4 {
		type ipv4_addr
		counter
		comment "BE IPv4"
	}

	set besteffort6 {
		type ipv6_addr
		counter
		comment "BE IPv6"
	}

	set video4 {
		type ipv4_addr
		counter
		comment "Video IPv4"
	}

	set video6 {
		type ipv6_addr
		counter
		comment "Video IPv6"
	}

	set voice4 {
		type ipv4_addr
		counter
		comment "Voice IPv4"
	}

	set voice6 {
		type ipv6_addr
		counter
		comment "Voice IPv6"
	}

	define facetime_ports = { 3478-3497, 16384-16387, 16393-16402 }
	define zoom_ports = { 8801-8810 }

	chain cttags {
		type filter hook postrouting priority 0; policy accept;

		# match sets (populated externally by dnsmasq, et al)
		ip daddr @bulk4 ip dscp set cs1 comment "bulk4 to CS1"
		ip6 daddr @bulk6 ip6 dscp set cs1 comment "bulk6 to CS1"
		ip daddr @besteffort4 ip dscp set cs0 comment "besteffort4 to CS0"
		ip6 daddr @besteffort6 ip6 dscp set cs0 comment "besteffort6 to CS0"
		ip daddr @video4 ip dscp set af41 comment "video4 to AF41"
		ip6 daddr @video6 ip6 dscp set af41 comment "video6 to AF41"
		ip daddr @voice4 ip dscp set cs6 comment "voice4 to CS6"
		ip6 daddr @voice6 ip6 dscp set cs6 comment "voice6 to CS6"

		# individual IP or port rules
		ip daddr 17.0.0.0/8 tcp dport { 993, 5223 } ip dscp set cs0 comment "Apple Mail and APNS CS0"
		udp sport $facetime_ports udp dport $facetime_ports ip dscp set af41 comment "Facetime AF41"
		udp dport $zoom_ports ip dscp set cs3 comment "Zoom CS3"
		udp sport 4500 udp dport 4500 ip dscp set cs6 comment "WiFi Calling CS6"

		# Convert the current DSCP value to an equivalent conntrack mark using the map
		# Then save it in the high bits of the mark for restoration with act_ctinfo
		oifname $wan ct mark set ip dscp map @dscpct counter
		oifname $wan ct mark set ct mark lshift 26 or 0x2000000
	}
}
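
The values in `dscpct` are simply the standard DSCP code points, so the map can be regenerated rather than typed by hand. Here is a small sketch, assuming the usual formulas (CSn = 8n, AFxy = 8x + 2y, EF = 46). The printed format only approximates an nft `elements` list, and `dscp_table` is a made-up helper name.

```python
def dscp_table() -> dict:
    """Build name -> DSCP value for the CS, AF and EF classes."""
    table = {f"cs{n}": 8 * n for n in range(8)}
    for x in range(1, 5):        # AF class 1..4
        for y in range(1, 4):    # drop precedence 1..3
            table[f"af{x}{y}"] = 8 * x + 2 * y
    table["ef"] = 46
    return table


for name, value in sorted(dscp_table().items(), key=lambda kv: kv[1]):
    print(f"{name} : 0x{value:02x},")
```

`be` is left out here because it is the same code point as `cs0` (both are 0).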

OK, I will try it now @dave14305 and keep you informed. Thanks for everything.

My result for the moment:

The end of my final nft list ruleset output:

root@OpenWrt:/etc/hotplug.d/net# /etc/init.d/nftables restart
restart
root@OpenWrt:/etc/hotplug.d/net#

}
table inet cttags {
        map dscpct {
                typeof ip dscp : ct mark
                elements = { cs0 : 0x00000000,
                             cs1 : 0x00000008,
                             af11 : 0x0000000a,
                             af12 : 0x0000000c,
                             af13 : 0x0000000e,
                             cs2 : 0x00000010,
                             af21 : 0x00000012,
                             af22 : 0x00000014,
                             af23 : 0x00000016,
                             cs3 : 0x00000018,
                             af31 : 0x0000001a,
                             af32 : 0x0000001c,
                             af33 : 0x0000001e,
                             cs4 : 0x00000020,
                             af41 : 0x00000022,
                             af42 : 0x00000024,
                             af43 : 0x00000026,
                             cs5 : 0x00000028,
                             ef : 0x0000002e,
                             cs6 : 0x00000030,
                             cs7 : 0x00000038 }
        }

        set bulk4 {
                type ipv4_addr
                counter
                comment "Bulk IPv4"
        }

        set bulk6 {
                type ipv6_addr
                counter
                comment "Bulk IPv6"
        }

        set besteffort4 {
                type ipv4_addr
                counter
                comment "BE IPv4"
        }

        set besteffort6 {
                type ipv6_addr
                counter
                comment "BE IPv6"
        }

        set video4 {
                type ipv4_addr
                counter
                comment "Video IPv4"
        }

        set video6 {
                type ipv6_addr
                counter
                comment "Video IPv6"
        }

        set voice4 {
                type ipv4_addr
                counter
                comment "Voice IPv4"
        }

        set voice6 {
                type ipv6_addr
                counter
                comment "Voice IPv6"
        }

        chain cttags {
                type filter hook postrouting priority filter; policy accept;
                ip daddr @bulk4 ip dscp set cs1 comment "bulk4 to CS1"
                ip6 daddr @bulk6 ip6 dscp set cs1 comment "bulk6 to CS1"
                ip daddr @besteffort4 ip dscp set cs0 comment "besteffort4 to CS0"
                ip6 daddr @besteffort6 ip6 dscp set cs0 comment "besteffort6 to CS0"
                ip daddr @video4 ip dscp set af41 comment "video4 to AF41"
                ip6 daddr @video6 ip6 dscp set af41 comment "video6 to AF41"
                ip daddr @voice4 ip dscp set cs6 comment "voice4 to CS6"
                ip6 daddr @voice6 ip6 dscp set cs6 comment "voice6 to CS6"
                ip daddr 17.0.0.0/8 tcp dport { 993, 5223 } ip dscp set cs0 comment "Apple Mail and APNS CS0"
                udp sport { 3478-3497, 16384-16387, 16393-16402 } udp dport { 3478-3497, 16384-16387, 16393-16402 } ip dscp set af41 comment "Facetime AF41"
                udp dport 8801-8810 ip dscp set cs3 comment "Zoom CS3"
                udp sport 4500 udp dport 4500 ip dscp set cs6 comment "WiFi Calling CS6"
                udp sport { 3074, 3659, 10000-20000, 30000-45000 } udp dport { 3074, 3659, 10000-20000, 30000-45000 } ip dscp set cs5 comment "ps5 CS5"
                oifname "wan" ct mark set (@nh,8,8 & 252) >> 2 map @dscpct counter packets 534 bytes 101479
                oifname "wan" ct mark set ct mark << 26 | 0x02000000
        }
}
root@OpenWrt:~#
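
In this listing nft prints the DSCP lookup in its raw form: `@nh,8,8` reads 8 bits starting 8 bits into the network header (the IPv4 TOS byte), `& 252` keeps the upper six bits and `>> 2` aligns them. Here is a sketch of the same extraction on a hand-built IPv4 header; the header bytes and function names are invented for illustration.

```python
def raw_payload(header: bytes, bit_offset: int, bit_len: int) -> int:
    """Read bit_len bits at bit_offset from the start of the header."""
    total_bits = len(header) * 8
    as_int = int.from_bytes(header, "big")
    shift = total_bits - bit_offset - bit_len
    return (as_int >> shift) & ((1 << bit_len) - 1)


def ipv4_dscp(header: bytes) -> int:
    """Equivalent of (@nh,8,8 & 252) >> 2."""
    return (raw_payload(header, 8, 8) & 252) >> 2


# Minimal 20-byte IPv4 header: version/IHL 0x45, TOS 0xc0, rest zeroed.
header = bytes([0x45, 0xC0]) + bytes(18)
print(ipv4_dscp(header))  # 48, i.e. CS6
```

This offset is specific to IPv4; IPv6 keeps its traffic class at a different position in the header.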

I already found my mistakes in writing nft rules. All my port rules are missing the ip protocol or ip6 nexthdr expressions, so they don't work as expected.

Below is my updated table, but I've adapted it more to mimic ldir's script. So it's different than my previous post, but you can see the updates to the port rules in cttags.

table inet cttags {

	map dscpct {
		typeof ip dscp : ct mark
			elements = {
				cs0 : 0x00,
				cs1 : 0x08,
				cs2 : 0x10,
				cs3 : 0x18,
				cs4 : 0x20,
				cs5 : 0x28,
				cs6 : 0x30,
				cs7 : 0x38,
				be : 0x00,
				af11 : 0x0a,
				af12 : 0x0c,
				af13 : 0x0e,
				af21 : 0x12,
				af22 : 0x14,
				af23 : 0x16,
				af31 : 0x1a,
				af32 : 0x1c,
				af33 : 0x1e,
				af41 : 0x22,
				af42 : 0x24,
				af43 : 0x26,
				ef : 0x2e
			}
	}

	set bulk4 {
		type ipv4_addr
		counter
		comment "Bulk IPv4"
	}

	set bulk6 {
		type ipv6_addr
		counter
		comment "Bulk IPv6"
	}

	set besteffort4 {
		type ipv4_addr
		counter
		comment "BE IPv4"
	}

	set besteffort6 {
		type ipv6_addr
		counter
		comment "BE IPv6"
	}

	set video4 {
		type ipv4_addr
		counter
		comment "Video IPv4"
	}

	set video6 {
		type ipv6_addr
		counter
		comment "Video IPv6"
	}

	set voice4 {
		type ipv4_addr
		counter
		comment "Voice IPv4"
	}

	set voice6 {
		type ipv6_addr
		counter
		comment "Voice IPv6"
	}

	define facetime_ports = { 3478-3497, 16384-16387, 16393-16402 }
	define zoom_ports = { 8801-8810 }

	chain in_dscp {
		type filter hook postrouting priority 0; policy accept;

		oifname $wan ct mark and 0x1c00000 == 0 jump qos_sqm
	}

	chain qos_sqm {
		ct mark and 0x2000000 == 0 counter goto cttags
	}

	chain qos_sqm_remap {
		# Add rules to modify non-zero DSCP incoming from LAN

		# Convert the current DSCP value to an equivalent conntrack mark using the map
		# Then save it in the high bits of the mark for restoration with act_ctinfo
		ct mark set ip dscp map @dscpct counter
		ct mark set ct mark lshift 26 or 0x3000000
	}

	chain cttags {
		# meta nftrace set 1
		ip dscp != 0 counter goto qos_sqm_remap
		ip6 dscp != 0 counter goto qos_sqm_remap

		# match sets (populated externally by dnsmasq, et al)
		ip daddr @bulk4 ip dscp set cs1 comment "bulk4 to CS1"
		ip6 daddr @bulk6 ip6 dscp set cs1 comment "bulk6 to CS1"
		ip daddr @besteffort4 ct mark set 0x1000000 comment "besteffort4 to CS0"
		ip6 daddr @besteffort6 ct mark set 0x1000000 comment "besteffort6 to CS0"
		ip daddr @video4 ip dscp set af41 comment "video4 to AF41"
		ip6 daddr @video6 ip6 dscp set af41 comment "video6 to AF41"
		ip daddr @voice4 ip dscp set cs6 comment "voice4 to CS6"
		ip6 daddr @voice6 ip6 dscp set cs6 comment "voice6 to CS6"

		# individual IP or port rules
		ip daddr 17.0.0.0/8 tcp dport { 993, 5223 } ip dscp set cs0 counter comment "Apple Mail and APNS CS0"
		ip protocol udp udp sport $facetime_ports udp dport $facetime_ports ip dscp set af41 counter comment "Facetime AF41"
		ip6 nexthdr udp udp sport $facetime_ports udp dport $facetime_ports ip6 dscp set af41 counter comment "Facetime AF41"
		ip protocol udp udp dport $zoom_ports ip dscp set cs3 counter comment "Zoom CS3"
		ip6 nexthdr udp udp dport $zoom_ports ip6 dscp set cs3 counter comment "Zoom CS3"
		ip protocol udp udp sport 4500 udp dport 4500 ip dscp set cs6 counter comment "WiFi Calling CS6"
		ip6 nexthdr udp udp sport 4500 udp dport 4500 ip6 dscp set cs6 counter comment "WiFi Calling CS6"
		ip protocol tcp tcp dport { 6020-6030 } ip dscp set cs1 log counter comment "Comcast Speedtest CS1"
		ip6 nexthdr tcp tcp dport { 6020-6030 } ip6 dscp set cs1 log counter comment "Comcast Speedtest CS1"

		# Convert the current DSCP value to an equivalent conntrack mark using the map
		# Then save it in the high bits of the mark for restoration with act_ctinfo
		ct mark set ip dscp map @dscpct counter
		ct mark set ip6 dscp map @dscpct counter
		ct mark set ct mark lshift 26 or 0x2000000
	}
}

But I wouldn't follow my examples yet. I'm confusing myself as I learn the syntax. I'm not sure what is right at the moment.
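
When reading chains like these, it can help to see which bits each mask touches. The sketch below only lists bit positions for the masks in the table above, plus the `0xfc000000` DSCP mask from the earlier tc output. It makes no claim about what ldir's script uses each bit for.

```python
def bits_set(mask: int) -> list:
    """Bit positions (0 = least significant) set in a 32-bit mask."""
    return [bit for bit in range(32) if mask & (1 << bit)]


for mask in (0x1C00000, 0x1000000, 0x2000000, 0x3000000, 0xFC000000):
    print(f"0x{mask:08x} -> bits {bits_set(mask)}")
```

Since the mark is 32 bits wide, `lshift 26` pushes everything above the low six bits out of the mark, so only the DSCP value from the map survives the shift before the flag bits are OR-ed in.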

I modified it a little bit and, wow, my game is fantastic.
My tc -s qdisc output:

root@OpenWrt:~# tc -s qdisc
qdisc noqueue 0: dev lo root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc fq_codel 0: dev eth0 root refcnt 2 limit 10240p flows 1024 quantum 1518 target 5ms interval 100ms memory_limit 4Mb ecn drop_batch 64
 Sent 2534615536 bytes 3515039 pkt (dropped 0, overlimits 0 requeues 11)
 backlog 0b 0p requeues 11
  maxpacket 10556 drop_overlimit 0 new_flow_count 71 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc noqueue 0: dev lan1 root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc noqueue 0: dev lan2 root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc noqueue 0: dev lan3 root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc noqueue 0: dev lan4 root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc cake 8017: dev wan root refcnt 2 bandwidth 16Mbit diffserv4 triple-isolate nonat nowash no-ack-filter split-gso rtt 100ms noatm overhead 44
 Sent 50935585 bytes 111516 pkt (dropped 926, overlimits 67337 requeues 0)
 backlog 0b 0p requeues 0
 memory used: 166482b of 4Mb
 capacity estimate: 16Mbit
 min/max network layer size:           28 /    1490
 min/max overhead-adjusted size:       72 /    1534
 average network hdr offset:           14

                   Bulk  Best Effort        Video        Voice
  thresh          1Mbit       16Mbit        8Mbit        4Mbit
  target         18.2ms          5ms          5ms          5ms
  interval        113ms        100ms        100ms        100ms
  pk_delay          0us       2.09ms          0us         48us
  av_delay          0us        210us          0us          3us
  sp_delay          0us          2us          0us          2us
  backlog            0b           0b           0b           0b
  pkts                0        71379            0        41063
  bytes               0     44424884            0      7901956
  way_inds            0           15            0           20
  way_miss            0          396            0          185
  way_cols            0            0            0            0
  drops               0          925            0            1
  marks               0            0            0            0
  ack_drop            0            0            0            0
  sp_flows            0            1            0            1
  bk_flows            0            1            0            0
  un_flows            0            0            0            0
  max_len             0         6016            0         1306
  quantum           300          488          300          300

qdisc cake 801a: dev br-lan root refcnt 2 bandwidth 56Mbit diffserv4 triple-isolate nonat nowash no-ack-filter split-gso rtt 100ms noatm overhead 44
 Sent 118175959 bytes 113944 pkt (dropped 169, overlimits 141151 requeues 0)
 backlog 0b 0p requeues 0
 memory used: 231920b of 4Mb
 capacity estimate: 56Mbit
 min/max network layer size:           28 /    1500
 min/max overhead-adjusted size:       72 /    1544
 average network hdr offset:           14

                   Bulk  Best Effort        Video        Voice
  thresh       3500Kbit       56Mbit       28Mbit       14Mbit
  target         5.19ms          5ms          5ms          5ms
  interval        100ms        100ms        100ms        100ms
  pk_delay          0us        732us         43us        426us
  av_delay          0us        573us          3us         20us
  sp_delay          0us          2us          2us          6us
  backlog            0b           0b           0b           0b
  pkts                0       101301          163        12649
  bytes               0    111813277        44198      6564180
  way_inds            0           11            0            0
  way_miss            0          336            1           52
  way_cols            0            0            0            0
  drops               0          169            0            0
  marks               0            0            0            0
  ack_drop            0            0            0            0
  sp_flows            0            1            1            1
  bk_flows            0            1            0            0
  un_flows            0            0            0            0
  max_len             0         7814         2052         1314
  quantum           300         1514          854          427

root@OpenWrt:~#

define ps5_ports = { 3074, 3659, 10000-20000, 30000-45000 }

ip protocol udp udp sport $ps5_ports udp dport $ps5_ports ip dscp set cs5 counter comment "ps5 cs5"
ip6 nexthdr udp udp sport $ps5_ports udp dport $ps5_ports ip6 dscp set cs5 counter comment "ps5 CS5"

You should be able to just match on UDP, like:

udp dport 123

In inet family tables, that will match both IPv6 and IPv4 UDP.

ip protocol udp udp dport 123

is not needed, and will match only IPv4 in inet tables.
