NFtables and QoS in 2021

Nope, because nftables can also send packets to the IFB, so we simply skip the tc mirred action and use nftables instead.

It looks like the command in nftables is fwd from an ingress hook chain in a netdev table.

Docs on this in the nft wiki are basically non-existent unfortunately.

Am I close?

root@OpenWrt:~# cat 20-try.nft
table netdev cake {
        chain catch {
                type filter hook ingress device "br-lan" priority -149; policy accept;
                ip daddr != 192.168.1.0/24 jump process_and_forward
        }
        chain process_and_forward {
                ip dscp set cs3
                ip protocol icmp ip dscp set cs5 counter
                ether type ip fwd to "ifb-ul"
        }
}
root@OpenWrt:~# nft -f 20-try.nft
20-try.nft:9:17-31: Error: Could not process rule: No such file or directory
                ether type ip fwd to "ifb-ul"
                              ^^^^^^^^^^^^^^^

First update: this needs the 'kmod-nft-netdev' package.

Second update: sweetness:

root@OpenWrt:~# tcpdump -i ifb-ul -vv host 1.1.1.1
tcpdump: listening on ifb-ul, link-type EN10MB (Ethernet), capture size 262144 bytes
19:15:08.358978 IP (tos 0xa0, ttl 128, id 13029, offset 0, flags [none], proto ICMP (1), length 60)
    xx.lan > one.one.one.one: ICMP echo request, id 1, seq 1, length 40
19:15:09.376526 IP (tos 0xa0, ttl 128, id 13030, offset 0, flags [none], proto ICMP (1), length 60)
    xx.lan > one.one.one.one: ICMP echo request, id 1, seq 2, length 40
^C

@dlakelan it works...
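For anyone reading the capture above: tcpdump prints the whole TOS byte, and the DSCP is its upper six bits. A minimal sketch of the conversion (the helper name `tos_to_dscp` is just made up for illustration):

```shell
# Convert a TOS byte (as printed by tcpdump) to its DSCP code point.
# DSCP occupies the upper six bits, so shift out the two ECN bits.
tos_to_dscp() {
    echo $(( $1 >> 2 ))
}

tos_to_dscp 0xa0   # prints 40, the decimal code point for CS5
```

So 'tos 0xa0' on the ICMP echo requests is consistent with the 'ip protocol icmp ip dscp set cs5' rule having been applied before the packets reached ifb-ul.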

Interesting, will have a look. This is not super urgent, since IPv4 is still with us and NAT66 is thankfully not the norm. However, to be useful for ingress DSCP setting, nftables either needs a table that maps IPv6 addresses to a more stable identifier*, or people need to use longer-term stable interface identifiers...

*) I would have said use the MAC address, but a number of devices have started randomizing that, and honestly I would rather have some inconvenience in my home network than sacrifice the good that MAC randomization brings in non-friendly environments like coffee-shop WiFi.

@dlakelan and @dave14305 so very close!

My upload portion (ingress from br-lan and br-guest) works now, but not yet the download portion (egress from br-lan and br-guest).

nftables supports some funky syntax:

table netdev cake {

        chain capture-ul {
                type filter hook ingress devices = { br-lan, br-guest } priority -149; policy accept;
                ip daddr != { 192.168.1.0/24, 192.168.2.0/24 } jump process-ul
        }

        chain process-ul {
                ip dscp set cs3
                ip protocol icmp ip dscp set cs5 counter
                ether type ip fwd to "ifb-ul"
        }

        chain capture-dl {
                type filter hook egress devices = { br-lan, br-guest } priority -149; policy accept;
                ip saddr != { 192.168.1.0/24, 192.168.2.0/24 } jump process-dl
        }

        chain process-dl {
                ip dscp set cs3
                ip protocol icmp ip dscp set cs5 counter
                ether type ip fwd to "ifb-dl"
        }

}
root@OpenWrt:~# nft -f 20-try.nft
20-try.nft:14:8-17: Error: Could not process rule: Not supported
        chain capture-dl {
              ^^^^^^^^^^
20-try.nft:14:8-17: Error: Could not process rule: No such file or directory
        chain capture-dl {
              ^^^^^^^^^^

Sadly it looks like the egress hook is not supported until Linux kernel 5.16.

Otherwise the script above seems like an elegant and simple way to process packets and forward to IFBs for CAKE. I hope we do not have to wait until OpenWrt takes on kernel 5.16+ though.

Any ideas? I could always use the conventional tc mirred action for egress. Seems a shame to have to adopt a hybrid approach though.
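A script that wants to pick between the nftables egress hook and the tc fallback could check the kernel version first. This is only a rough sketch: the function name `kernel_supports_egress` is made up, and a version comparison is just a heuristic, since a distribution could backport the hook to an older kernel or build without it.

```shell
# Return 0 if the given kernel release (default: the running kernel)
# is 5.16 or newer, i.e. new enough for the netdev egress hook.
kernel_supports_egress() {
    release="${1:-$(uname -r)}"
    major="${release%%.*}"        # text before the first dot
    rest="${release#*.}"          # text after the first dot
    minor="${rest%%[!0-9]*}"      # leading digits of the remainder
    [ "$major" -gt 5 ] || { [ "$major" -eq 5 ] && [ "${minor:-0}" -ge 16 ]; }
}

if kernel_supports_egress; then
    echo "egress hook should be available"
else
    echo "fall back to tc mirred for egress"
fi
```

On the 5.10 kernel mentioned in this thread it would take the fallback branch.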

Remind me why you want the egress hook?
For tagging in the downstream direction, just process the packets in a regular forward hook chain. Oh wait, I see you want it to go through an IFB. I guess you want to DSCP tag it before it hits the IFB?

You can ingress tag on the WAN, but you won't get decrypted packets that way. I guess the WireGuard interface probably doesn't have an ingress hook?

Exactly - I want one interface for CAKE for upload and one interface for CAKE for download notwithstanding having WAN/WireGuard and br-lan/br-guest.

Presently I achieve this by setting up two IFB interfaces as follows:

That is, I create 'ifb-ul' by mirroring the ingress from br-lan and br-guest and I create 'ifb-dl' by mirroring the egress from br-lan and br-guest. I also take care to skip out the LAN-LAN traffic.

It does, but it's layer 3:

And so mirroring/forwarding from the WireGuard interface won't work properly.

Therefore instead I create my IFB for download traffic by taking the egress from br-lan and br-guest.

The tc-based solution for my dual ifb approach looks like this:

    tc qdisc add dev br-lan handle ffff: ingress
    tc qdisc add dev br-guest handle ffff: ingress
    tc qdisc add dev br-lan handle 1: root prio
    tc qdisc add dev br-guest handle 1: root prio

    # capture upload (ingress) on br-lan and br-guest
    tc filter add dev br-lan parent ffff: protocol ip prio 1 u32 match ip dst 192.168.1.0/24 action pass
    tc filter add dev br-lan parent ffff: protocol ip prio 2 matchall action mirred egress redirect dev ifb-ul
    tc filter add dev br-guest parent ffff: protocol ip prio 1 u32 match ip dst 192.168.2.0/24 action pass
    tc filter add dev br-guest parent ffff: protocol ip prio 2 matchall action mirred egress redirect dev ifb-ul

    # capture download (egress) on br-lan and br-guest
    tc filter add dev br-lan parent 1: protocol ip prio 1 u32 match ip src 192.168.1.0/24 action pass
    tc filter add dev br-lan parent 1: protocol ip prio 2 matchall action mirred egress redirect dev ifb-dl
    tc filter add dev br-guest parent 1: protocol ip prio 1 u32 match ip src 192.168.2.0/24 action pass
    tc filter add dev br-guest parent 1: protocol ip prio 2 matchall action mirred egress redirect dev ifb-dl

I think the equivalent in nftables with DSCP marking support looks like this:

table netdev cake {

        chain capture-ul {
                type filter hook ingress devices = { br-lan, br-guest } priority -149; policy accept;
                ip daddr != { 192.168.1.0/24, 192.168.2.0/24 } jump process-ul
        }

        chain process-ul {
                ip dscp set cs3
                ip protocol icmp ip dscp set cs5 counter
                ether type ip fwd to "ifb-ul"
        }

        chain capture-dl {
                type filter hook egress devices = { br-lan, br-guest } priority -149; policy accept;
                ip saddr != { 192.168.1.0/24, 192.168.2.0/24 } jump process-dl
        }

        chain process-dl {
                ip dscp set cs3
                ip protocol icmp ip dscp set cs5 counter
                ether type ip fwd to "ifb-dl"
        }

}

I love how simple and elegant this is and how it facilitates DSCP tagging prior to forwarding to the IFBs.

But sadly, whilst the upload part works in respect of ingress from br-lan/br-guest, the download part does not yet work because the egress hook isn't supported until Linux kernel 5.16 - see here:

https://lwn.net/Articles/876497/

And here:

So unless somehow this egress hook functionality is backported such that we can use it in our 5.10 kernel in OpenWrt, this solution won't work until way into the future.

Does this actually matter for nftables packet processing, or is it just a problem when you try to look at the packets in tcpdump?

Can someone explain the role of ICMP to me, and what it affects? I ask because my ISP marks it as CS6 by default...

So I understand that as @tohojo stated:

there are no headers, and so in forwarding the packets to an IFB you end up with something like:

11:02:12.674405 40:00:6c:06:86:f4 (oui Unknown) > 45:00:00:57:d1:22 (oui Unknown), ethertype Unknown (0x3470), length 87:
        0x0000:  7813 0a05 0002 01bb ce0b 96af 9437 464f  x............7FO
        0x0010:  8843 5018 07fb 2b58 0000 1703 0300 2a00  .CP...+X......*.
        0x0020:  0000 0000 0000 b390 2546 fbc5 aeef 7af5  ........%F....z.
        0x0030:  5c6d 549d bf3a da05 825c 5a16 8dd9 0905  \mT..:...\Z.....
        0x0040:  6a67 24dd 40ee 822a aa                   jg$.@..*.

and I understand that this means that CAKE does not see the proper flow information?

Perhaps DSCP marks could be applied in nftables using a VPN ingress hook, with the marks preserved outside, or presumably more naturally by using a postrouting hook. But this doesn't help in terms of getting the single interface that combines the mixture of flows from WAN/VPN needed for CAKE, right?

In 'nftables' 'fwd to' only works in the context of netdev, and forwarding from the WireGuard interface apparently does not work given the lack of the headers. So ingress on the vpn interface won't work, and although 'egress' from 'br-lan'/'br-guest' will work, that's not supported yet.

Is it actually a problem for CAKE, or is it just a problem that tcpdump doesn't know how to display such packets?

I'm taking @tohojo's word for it here. Doesn't CAKE need to see what the source and destination of the packets are to actually provide flow fairness and so on?

Well, it should see the layer 3 source and destination; it just doesn't see the Ethernet source and destination, which I don't think should matter. However, CAKE has to know what this packet is, that it's a bare layer 3 packet, and therefore know where to look inside the packet for the various fields it wants to look at, and I don't know whether it's smart enough to do that.

@tohojo could you perhaps expand on the aspect discussed above? Would having CAKE work on an IFB that combines flows from WAN with layer 3 flows from the WireGuard interface break CAKE?

So @dlakelan if you are right then ingress hooks on WAN and VPN would presumably(?) work for DSCP marking and then forwarding to IFB. I was actually using this for some time using tc (without DSCP marking), and it was only when I looked at the 'tcpdump' stuff that it dawned on me that something might not be right.

If CAKE can actually work with layer 3 packets in IFB taken from VPN (mixed with packets taken from WAN) something like this in nftables might then work(?):

table netdev cake {

        chain capture-ul {
                type filter hook ingress devices = { br-lan, br-guest } priority -149; policy accept;
                ip daddr != { 192.168.1.0/24, 192.168.2.0/24 } jump process-ul
        }

        chain process-ul {
                ip dscp set cs3
                ip protocol icmp ip dscp set cs5 counter
                ether type ip fwd to "ifb-ul"
        }

        chain capture-dl {
                type filter hook ingress devices = { wan, vpn } priority -149; policy accept;
                ip saddr != { $wg_endpoint } jump process-dl
        }

        chain process-dl {
                ip dscp set cs3
                ip protocol icmp ip dscp set cs5 counter
                ether type ip fwd to "ifb-dl"
        }
}

BTW @dlakelan if this approach would work, would there be a good way to feed in:

wg_endpoint=$(wg show | awk '{if($1 == "endpoint:"){split($2,a,":"); print a[1]}}')

into the nftables script? I guess I'd just have a service file or hotplug script run in the background and launch the script with 'nft -f ..'?
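One possible way to do this, sketched under a couple of assumptions: the installed nft supports the '-D' / '--define' option (only usable together with '-f'), and the ruleset refers to the value as $wg_endpoint as in the script above. The function names are made up, and unlike the one-liner above this version stops at the first peer's endpoint. Splitting on ':' also only works for IPv4 endpoints.

```shell
#!/bin/sh
# Extract the first peer endpoint address from `wg show` output on stdin.
# Assumes an IPv4 endpoint of the form address:port.
extract_endpoint() {
    awk '$1 == "endpoint:" { split($2, a, ":"); print a[1]; exit }'
}

# Hypothetical wrapper: load the ruleset file given as $1 with the
# endpoint address made available to it as $wg_endpoint.
load_cake_rules() {
    ep="$(wg show | extract_endpoint)"
    [ -n "$ep" ] || { echo "no WireGuard endpoint found" >&2; return 1; }
    nft -D wg_endpoint="$ep" -f "$1"
}

# Usage on the router, e.g. from a hotplug or init script:
#   load_cake_rules /root/20-try.nft
```

If '-D' is not available, generating a 'define wg_endpoint = ...' line at the top of the ruleset file before loading it should achieve the same thing.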

Yes, the issue is whether this is just a problem with tcpdump being confused by the packet. If everybody else is happy then it's no big deal. Maybe you could set it back up and then look at the capture on your LAN side and see if the packets coming from the VPN wind up with the correct DSCP as they leave the LAN.

All ICMP packets? I see CS6 on ICMP packets generated by my ISP's internal hops, but not on ICMP responses from, say, 9.9.9.9.

So this test nftables script:

root@OpenWrt:~# cat test.nft
table netdev cake {
        chain capture-dl {
                type filter hook ingress device vpn priority -149; policy accept;
                jump process-dl
        }

        chain process-dl {
                ip dscp set cs3
                ip protocol icmp ip dscp set cs5 counter
                ether type ip fwd to "ifb-test"
        }
}

results in nothing showing up on ifb-test. How would I correctly 'fwd to'?

First Update: removing 'ether type ip' and just retaining 'fwd to' works (as in I see garbled text using 'tcpdump -i ifb-test'), but how do I know whether DSCP marks are retained or not?

Second Update: ah, I see on 'br-lan' that DSCP is preserved:

17:32:03.874033 IP (tos 0xa0, ttl 58, id 26627, offset 0, flags [none], proto ICMP (1), length 60)
    one.one.one.one > XX.lan: ICMP echo reply, id 1, seq 60, length 40

OK, so @dlakelan you are right: it works from a DSCP marking perspective. So the question for @tohojo is whether CAKE can work properly with an IFB that has a mixture of layer 3 flows from the VPN and other flows from the WAN.
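A guess at why the 'ether type ip' match had to go: the WireGuard interface is layer 3, so there is presumably no Ethernet header for that expression to read. If a protocol match is still wanted, 'meta protocol ip' might do the job, since it does not depend on a link-layer header. This is an untested sketch; the function name and the table name 'cake_test' are made up, while 'vpn' and 'ifb-test' are the names from the test script above.

```shell
# Write a test ruleset for a layer 3 interface to the path given as $1.
# 'meta protocol ip' is used instead of 'ether type ip' (assumption:
# it matches on a WireGuard interface, which has no Ethernet header).
write_vpn_test_rules() {
    cat > "$1" <<'EOF'
table netdev cake_test {
        chain capture-dl {
                type filter hook ingress device "vpn" priority -149; policy accept;
                meta protocol ip jump process-dl
        }

        chain process-dl {
                ip dscp set cs3
                ip protocol icmp ip dscp set cs5 counter
                fwd to "ifb-test"
        }
}
EOF
}

# Usage on the router:
#   write_vpn_test_rules /tmp/test.nft
#   nft -c -f /tmp/test.nft   # syntax check only
#   nft -f /tmp/test.nft
```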

Pinging between a remote VPN client and a LAN client, and checking on the LAN side, should reveal it, I guess.

Run some traffic through your VPN marked for a high-tier tin, and then see what CAKE does with it.

Would that be conclusive though?

Well, it would certainly be suggestive, and fairly strongly so.
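A rough sketch of how that check could be scripted, assuming CAKE's statistics output has a 'pkts' row with one column per tin (the helper name `cake_tin_pkts` is made up):

```shell
# Print the per-tin packet counters from `tc -s qdisc show` output on stdin.
# Assumes CAKE's stats include a row starting with "pkts", one column per tin.
cake_tin_pkts() {
    awk '$1 == "pkts" { for (i = 2; i <= NF; i++) printf "%s%s", $i, (i < NF ? " " : "\n") }'
}

# Usage on the router (ifb-dl is the download IFB from the posts above):
#   tc -s qdisc show dev ifb-dl | cake_tin_pkts
```

Run it before and after pushing marked traffic through the VPN. If the counter for the tin that the mark maps to goes up, CAKE is reading the DSCP from the layer 3 packets; if everything lands in the default tin, it probably is not.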