NFtables and QoS in 2021

Windows but I think it's not doing anything.

OK let me try that. Do you know off hand an easy way to try that?

You created a chain in table inet fw4, right? Instead of that, create a new table and add a chain with rules in there, something like:

nft add table netdev tagging
nft add chain netdev tagging tagin ...

Can I put this inside /etc/nftables.d/?

I agree, but it explicitly says it will not do anything:

-v TOS         Type Of Service (IPv4-only. This setting has been deprecated
                   and has no effect on the type of service field in the IP
                   Header).

So Windows stopped letting users set the DSCP on pings some years ago; however, you can still use the trick I posted earlier (in this thread?) to make Windows apply DSCPs to ICMP packets...


Argh - any tips for how to enter this? Seems really awkward:

root@OpenWrt:~# nft 'add chain netdev retag tagin { type filter hook ingress device "br-lan" priority -149 ; policy accept; }'
Error: Could not process rule: Not supported
add chain netdev retag tagin { type filter hook ingress device "br-lan" priority -149 ; policy accept; }

I just want to try this out:

table netdev retag {
chain tagin {
          ## mangle priority, retag anything coming from LAN, you
          ## might want to do the same for anything coming from WAN
          ## (assumed to be eth1)

          type filter hook ingress device "br-lan" priority -149; policy accept;

          ip dscp set cs3 ## convert all to cs3 first, this is my base DSCP tag rather than cs0

          ## icmp/icmpv6 gets high priority but you might not want
          ## this!  it does let you find out what the round trip time
          ## for high priority packets is by just using ping though
          ip protocol icmp ip dscp set cs5 counter log flags ip options
    }
}

Amongst other things, I tried adding lines in /etc/config/firewall to load the file, but that also failed. Unexpected table or something like that.

So what do I need to do to add this table?

Either run the nft add commands manually or set up the include in fw4 like I mentioned a couple of days ago.

If you have syntax troubles, post your commands and the output.

nft add table netdev tagging
nft add chain netdev tagging tagin \{ type filter hook ingress device "br-lan" priority -149 \; policy accept\; \}
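
If the shell quoting gets awkward, one alternative is to keep the ruleset in nft file syntax and feed it to nft on standard input. A minimal sketch, assuming the same `tagging`/`tagin` names as the commands above; the helper name `print_tagging_ruleset` is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: prints the table in nft file syntax, so braces
# and semicolons do not need escaping on the command line.
print_tagging_ruleset() {
    cat <<'EOF'
table netdev tagging {
        chain tagin {
                type filter hook ingress device "br-lan" priority -149; policy accept;
        }
}
EOF
}

# On the router (needs root and netdev support in the kernel):
#   print_tagging_ruleset | nft -f -
```

Because the here-document delimiter is quoted, the shell passes the text through untouched, which sidesteps the escaping problem entirely.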

Cheers. What does this tell us:

root@OpenWrt:~# nft list chain netdev retag tagin
table netdev retag {
        chain tagin {
                type filter hook ingress device "br-lan" priority -149; policy accept;
                ip dscp set cs3
                ip protocol icmp ip dscp set cs5 counter packets 10 bytes 812
        }
}
root@OpenWrt:~# tcpdump -i br-lan icmp -vv
tcpdump: listening on br-lan, link-type EN10MB (Ethernet), capture size 262144 bytes
01:10:48.901000 IP (tos 0x0, ttl 50, id 60476, offset 0, flags [DF], proto ICMP (1), length 84)
    Pixel-3a.lan > OpenWrt.lan: ICMP echo request, id 920, seq 1, length 64
01:10:48.901291 IP (tos 0xa0, ttl 64, id 37593, offset 0, flags [none], proto ICMP (1), length 84)
    OpenWrt.lan > Pixel-3a.lan: ICMP echo reply, id 920, seq 1, length 64
01:10:48.914588 IP (tos 0x0, ttl 50, id 60477, offset 0, flags [DF], proto ICMP (1), length 84)
    Pixel-3a.lan > OpenWrt.lan: ICMP echo request, id 921, seq 1, length 64
01:10:48.914789 IP (tos 0xa0, ttl 64, id 37594, offset 0, flags [none], proto ICMP (1), length 84)
    OpenWrt.lan > Pixel-3a.lan: ICMP echo reply, id 921, seq 1, length 64
01:10:48.927910 IP (tos 0x0, ttl 50, id 60481, offset 0, flags [DF], proto ICMP (1), length 84)
    Pixel-3a.lan > OpenWrt.lan: ICMP echo request, id 922, seq 1, length 64
01:10:48.928154 IP (tos 0xa0, ttl 64, id 37595, offset 0, flags [none], proto ICMP (1), length 84)
    OpenWrt.lan > Pixel-3a.lan: ICMP echo reply, id 922, seq 1, length 64

Why is 0xa0 only getting set on packets from 192.168.1.1 to my LAN client, and not from the LAN client to 192.168.1.1?

And then not at all between the LAN client and 1.1.1.1:

root@OpenWrt:~# tcpdump -i br-lan icmp -vv
tcpdump: listening on br-lan, link-type EN10MB (Ethernet), capture size 262144 bytes
01:14:29.878499 IP (tos 0x0, ttl 50, id 59058, offset 0, flags [DF], proto ICMP (1), length 84)
    Pixel-3a.lan > one.one.one.one: ICMP echo request, id 923, seq 1, length 64
01:14:29.923761 IP (tos 0x0, ttl 58, id 48165, offset 0, flags [none], proto ICMP (1), length 84)
    one.one.one.one > Pixel-3a.lan: ICMP echo reply, id 923, seq 1, length 64
01:14:29.941141 IP (tos 0x0, ttl 50, id 59076, offset 0, flags [DF], proto ICMP (1), length 84)
    Pixel-3a.lan > one.one.one.one: ICMP echo request, id 924, seq 1, length 64
01:14:29.987768 IP (tos 0x0, ttl 58, id 20397, offset 0, flags [none], proto ICMP (1), length 84)
    one.one.one.one > Pixel-3a.lan: ICMP echo reply, id 924, seq 1, length 64
01:14:30.002406 IP (tos 0x0, ttl 50, id 59092, offset 0, flags [DF], proto ICMP (1), length 84)
    Pixel-3a.lan > one.one.one.one: ICMP echo request, id 925, seq 1, length 64
01:14:30.047774 IP (tos 0x0, ttl 58, id 59053, offset 0, flags [none], proto ICMP (1), length 84)
    one.one.one.one > Pixel-3a.lan: ICMP echo reply, id 925, seq 1, length 64
^C
6 packets captured
6 packets received by filter
0 packets dropped by kernel
root@OpenWrt:~# tcpdump -i ifb-ul icmp -vv
tcpdump: listening on ifb-ul, link-type EN10MB (Ethernet), capture size 262144 bytes
01:15:51.755535 IP (tos 0x0, ttl 50, id 13883, offset 0, flags [DF], proto ICMP (1), length 84)
    Pixel-3a.lan > one.one.one.one: ICMP echo request, id 926, seq 1, length 64
01:15:51.820113 IP (tos 0x0, ttl 50, id 13886, offset 0, flags [DF], proto ICMP (1), length 84)
    Pixel-3a.lan > one.one.one.one: ICMP echo request, id 927, seq 1, length 64
01:15:51.880870 IP (tos 0x0, ttl 50, id 13901, offset 0, flags [DF], proto ICMP (1), length 84)
    Pixel-3a.lan > one.one.one.one: ICMP echo request, id 928, seq 1, length 64
^C
3 packets captured
3 packets received by filter
0 packets dropped by kernel
root@OpenWrt:~# tcpdump -i ifb-dl icmp -vv
tcpdump: listening on ifb-dl, link-type EN10MB (Ethernet), capture size 262144 bytes
01:16:15.532679 IP (tos 0x0, ttl 58, id 24397, offset 0, flags [none], proto ICMP (1), length 84)
    one.one.one.one > Pixel-3a.lan: ICMP echo reply, id 929, seq 1, length 64
01:16:15.592694 IP (tos 0x0, ttl 58, id 61999, offset 0, flags [none], proto ICMP (1), length 84)
    one.one.one.one > Pixel-3a.lan: ICMP echo reply, id 930, seq 1, length 64
01:16:15.653695 IP (tos 0x0, ttl 58, id 59832, offset 0, flags [none], proto ICMP (1), length 84)
    one.one.one.one > Pixel-3a.lan: ICMP echo reply, id 931, seq 1, length 64

All packets entering the system are processed by this hook. It is invoked after the network taps (ie. tcpdump), right after tc ingress and before layer 3 protocol handlers, it can be used for early filtering and policing.

tcpdump happens before netdev ingress.

Ref: http://git.netfilter.org/nftables/tree/doc/nft.txt
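
Since tcpdump on the same interface sees packets before the hook rewrites them, the rule counter is a more direct way to check whether a rule is matching. A small sketch, assuming the `counter packets N bytes M` format shown in the `nft list chain` output earlier in the thread; `counter_packets` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: reads `nft list chain` output on standard input
# and prints the packet count of every counter it finds, one per line.
counter_packets() {
    sed -n 's/.*counter packets \([0-9][0-9]*\) bytes.*/\1/p'
}

# Example using the rule line quoted earlier in the thread:
echo 'ip protocol icmp ip dscp set cs5 counter packets 10 bytes 812' | counter_packets

# On the router:
#   nft list chain netdev retag tagin | counter_packets
```

Running it before and after a ping shows whether the packets reached the chain at all.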


So I checked and if I ping OpenWrt from my phone the nft counter increases. If I ping 1.1.1.1 from my phone it does not. What's the reason for that?

Update: the tc ingress mirror redirect blocks it.

I disabled the IFB and the tc mirror redirect, and now the counters increase when pinging 1.1.1.1.

Without the IFBs on my VPN interface I see 0xa0 set both ways between the LAN client and 1.1.1.1. I presume that reflectors send back the same DSCP in the response as in the request?

So my problem then is that tc ingress + redirect to ifb on br-lan occurs before the netdev ingress hook on br-lan.

If I'm right, then nftables can't be used to set DSCP marks for an IFB interface, because tc ingress happens before the ingress hook: by the time the packet is at the IFB, the nftables hook has already lost it.

Does my analysis seem correct?

Don't suppose nftables could instead set DSCP based on the egress on the ifb following the tc mirror?

I guess at a fundamental level the issue is can nftables act somewhere: 1) before the tc ingress and mirror to IFB; or 2) after the tc ingress and mirror to IFB but before CAKE sees it to act on DSCP?

Seems like this may not be possible and then the only option is to use CAKE tin overrides using tc calls.


This is verified by the setup below, and yes, tcpdump does step in before the netdev or inet tables, so it is not useful for verifying tagging:

wan=eth0
lan=br-lan
wan - owrt - lan client
client$ ping -c 1 1.1.1.1

# watching the wan interface no sign of tagging
$ tcpdump -i eth0 -v -n icmp
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
11:58:33.425488 IP (tos 0x0, ttl 63, id 58438, offset 0, flags [DF], proto ICMP (1), length 84)
    192.168.80.128 > 1.1.1.1: ICMP echo request, id 3, seq 1, length 64
11:58:33.429412 IP (tos 0x0, ttl 128, id 65500, offset 0, flags [none], proto ICMP (1), length 84)
    1.1.1.1 > 192.168.80.128: ICMP echo reply, id 3, seq 1, length 64

# watching the lan interface though, the icmp reply, a.k.a. the wan ingress traffic, is marked
$ tcpdump -v -n -i br-lan icmp
tcpdump: listening on br-lan, link-type EN10MB (Ethernet), capture size 262144 bytes
12:01:35.725132 IP (tos 0x0, ttl 64, id 150, offset 0, flags [DF], proto ICMP (1), length 84)
    10.0.0.227 > 1.1.1.1: ICMP echo request, id 4, seq 1, length 64
12:01:35.829493 IP (tos 0xa0, ttl 127, id 65505, offset 0, flags [none], proto ICMP (1), length 84)
    1.1.1.1 > 10.0.0.227: ICMP echo reply, id 4, seq 1, length 64

# as expected with this nft config: 
table netdev retag {
        chain tagin {
                type filter hook ingress device "eth0" priority -149; policy accept;
                ip dscp set cs3
                ip6 dscp set cs3
                ip protocol udp udp sport 123 ip dscp set cs6
                ip6 nexthdr udp udp sport 123 ip6 dscp set cs6
                ip protocol icmp ip dscp set cs5 counter packets 4 bytes 336
                ip6 nexthdr ipv6-icmp ip6 dscp set cs5
        }
}

cs5 dscp = 0xa0 tos
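
The DSCP occupies the upper six bits of the TOS byte, so the value tcpdump prints is the DSCP shifted left by two bits. A quick sketch of that arithmetic; `dscp_to_tos` is a hypothetical helper name, and the decimal values are the standard class selector code points (cs3 = 24, cs5 = 40, cs6 = 48):

```shell
#!/bin/sh
# Hypothetical helper: convert a decimal DSCP value to the TOS byte
# as tcpdump displays it (DSCP sits in the top six bits, so TOS = DSCP << 2).
dscp_to_tos() {
    printf '0x%x\n' "$(( $1 << 2 ))"
}

dscp_to_tos 24   # cs3 -> 0x60
dscp_to_tos 40   # cs5 -> 0xa0
dscp_to_tos 48   # cs6 -> 0xc0
```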

Since these IFB games happen along the path and not on the wan interface, there might be a conceptually later interface/device on which to capture the packets with the changes already applied, no?


Oh, sorry if this was not clear: yes, that was my idea too, to check whether the tagging shows up on the LAN interface rather than on the WAN interface. Running tcpdump in parallel on the WAN and LAN interfaces, I can see this happening, i.e. tagging is done but is not visible in the WAN tcpdump session, only in the LAN tcpdump session.


@grrr2 do you agree with my post above that it seems that it is not possible using nftables to mark packets such that the marks appear on an IFB interface in time for CAKE to see them?

I create an IFB that takes combined ingress traffic from br-lan and br-guest and I want to use nftables to mark DSCPs on packets before they get intercepted by CAKE on the IFB. I do this because I have a VPN and so need to apply CAKE before the upstream encryption.

It seems that the nftables ingress hook occurs after tc ingress and this is problematic for my aim because it means the packets are redirected to the IFB before they get picked up by the nftables ingress hook.

Put another way, is it possible to use nftables to mark DSCPs such that CAKE on IFB will see them, as follows:

I tried ingress hook on 'br-lan' (I gather multiple devices can be specified for ingress hook so would be possible to have ingress hook for both br-lan and br-guest) but that did not see the packets in time to mark them because tc ingress happens BEFORE the nftables ingress hook. So that option failed, and I believe that is because tc ingress redirect pulls the packets away before the nftables ingress hook kicks in. So the nftables hook never even gets the packets and the nftables counters don't even increase.

If I understand http://git.netfilter.org/nftables/tree/doc/nft.txt correctly:

NETDEV ADDRESS FAMILY
~~~~~~~~~~~~~~~~~~~~
The Netdev address family handles packets from the device ingress and egress
path. This family allows you to filter packets of any ethertype such as ARP,
VLAN 802.1q, VLAN 802.1ad (Q-in-Q) as well as IPv4 and IPv6 packets.

.Netdev address family hooks
[options="header"]
|=================
|Hook | Description
|ingress |
All packets entering the system are processed by this hook. It is invoked after
the network taps (ie. *tcpdump*), right after *tc* ingress and before layer 3
protocol handlers, it can be used for early filtering and policing.
|egress |
All packets leaving the system are processed by this hook. It is invoked after
layer 3 protocol handlers and before *tc* egress. It can be used for late
filtering and policing.

Then it looks like you are right. As I understand it, netdev is the closest to the hardware layer in nft, and it still kicks in after network taps and tc ingress according to the above.


Great find, this puts an end to the idea of being able to eventually use nftables for DSCP marking on the ingress side... Too bad, since with IPv6 getting more prominent (and IPv4 becoming less popular) the problem of not being able to peek into internal IP addresses would eventually diminish. Alas, it does not look like this is going to work any time soon...


Nope, because nftables can also send packets to the IFB, so we just don't use the tc mirred action and use nftables instead.

It looks like the command in nftables is fwd from an ingress hook chain in a netdev table.

Docs on this in the nft wiki are basically non-existent unfortunately.


Am I close?

root@OpenWrt:~# cat 20-try.nft
table netdev cake {
        chain catch {
                type filter hook ingress device "br-lan" priority -149; policy accept;
                ip daddr != 192.168.1.0/24 jump process_and_forward
        }
        chain process_and_forward {
                ip dscp set cs3
                ip protocol icmp ip dscp set cs5 counter
                ether type ip fwd to "ifb-ul"
        }
}
root@OpenWrt:~# nft -f 20-try.nft
20-try.nft:9:17-31: Error: Could not process rule: No such file or directory
                ether type ip fwd to "ifb-ul"
                              ^^^^^^^^^^^^^^^

First update: it needs the 'kmod-nft-netdev' package.
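
A small sketch for checking whether that package is already present before loading the ruleset. It assumes the usual `name - version` line format of `opkg list-installed`; `pkg_installed` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: reads `opkg list-installed` output on standard
# input ("name - version" per line) and succeeds if package $1 is listed.
pkg_installed() {
    grep -q -- "^$1 - "
}

# On the router:
#   opkg list-installed | pkg_installed kmod-nft-netdev \
#       || opkg install kmod-nft-netdev
```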

Second update: sweetness:

root@OpenWrt:~# tcpdump -i ifb-ul -vv host 1.1.1.1
tcpdump: listening on ifb-ul, link-type EN10MB (Ethernet), capture size 262144 bytes
19:15:08.358978 IP (tos 0xa0, ttl 128, id 13029, offset 0, flags [none], proto ICMP (1), length 60)
    xx.lan > one.one.one.one: ICMP echo request, id 1, seq 1, length 40
19:15:09.376526 IP (tos 0xa0, ttl 128, id 13030, offset 0, flags [none], proto ICMP (1), length 60)
    xx.lan > one.one.one.one: ICMP echo request, id 1, seq 2, length 40
^C

@dlakelan it works...


Interesting, will have a look. This is not super urgent, since IPv4 is still with us and NAT66 is thankfully not the norm. However, to be useful for ingress DSCP setting, nftables either needs a table that maps IPv6 addresses to a more stable identifier*, or people need to use longer-term stable interface identifiers...

*) I would have said use the MAC address, but a number of devices have started randomizing that, and honestly I would rather have some inconvenience in my home network than sacrifice the good that MAC randomization brings in non-friendly environments like coffee-shop WiFi.


@dlakelan and @dave14305 so very close!

My upload portion (ingress from br-lan and br-guest) works now, but not yet the download portion (egress from br-lan and br-guest).

nftables supports some funky syntax:

table netdev cake {

        chain capture-ul {
                type filter hook ingress devices = { br-lan, br-guest } priority -149; policy accept;
                ip daddr != { 192.168.1.0/24, 192.168.2.0/24 } jump process-ul
        }

        chain process-ul {
                ip dscp set cs3
                ip protocol icmp ip dscp set cs5 counter
                ether type ip fwd to "ifb-ul"
        }

        chain capture-dl {
                type filter hook egress devices = { br-lan, br-guest } priority -149; policy accept;
                ip saddr != { 192.168.1.0/24, 192.168.2.0/24 } jump process-dl
        }

        chain process-dl {
                ip dscp set cs3
                ip protocol icmp ip dscp set cs5 counter
                ether type ip fwd to "ifb-dl"
        }

}
root@OpenWrt:~# nft -f 20-try.nft
20-try.nft:14:8-17: Error: Could not process rule: Not supported
        chain capture-dl {
              ^^^^^^^^^^
20-try.nft:14:8-17: Error: Could not process rule: No such file or directory
        chain capture-dl {
              ^^^^^^^^^^

Sadly it looks like the egress hook is not supported before Linux kernel 5.16.
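
A rough way to check this up front is to compare the running kernel release against 5.16 before trying to load an egress chain. This is only a sketch: `supports_netdev_egress` is a hypothetical helper name, and it looks at the version number alone, so it cannot tell whether a particular build has the feature enabled or backported.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the kernel release in $1
# (e.g. "5.10.176", as printed by `uname -r`) is 5.16 or newer.
supports_netdev_egress() {
    major=${1%%.*}              # text before the first dot
    rest=${1#*.}                # text after the first dot
    minor=${rest%%[!0-9]*}      # leading digits of what follows
    [ "$major" -gt 5 ] || { [ "$major" -eq 5 ] && [ "${minor:-0}" -ge 16 ]; }
}

if supports_netdev_egress "$(uname -r)"; then
    echo "kernel version is new enough for the netdev egress hook"
else
    echo "kernel version predates the netdev egress hook"
fi
```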

Otherwise the script above seems like an elegant and simple way to process packets and forward to IFBs for CAKE. I hope we do not have to wait until OpenWrt takes on kernel 5.16+ though.

Any ideas? I could always use the conventional 'tc mirred' for egress. Seems a shame to have to adopt a hybrid approach though.
