NFtables and QoS in 2021

Remind me why you want the egress hook?
For tagging in the downstream direction, just process the packets in the regular forward hook chain. Oh wait, I see you want it to go through an IFB; I guess you want to DSCP-tag it before it hits the IFB?

You can ingress tag on the WAN, but you won't get decrypted packets that way. I guess the WireGuard interface probably doesn't have an ingress?

Exactly - I want one interface for CAKE for upload and one interface for CAKE for download notwithstanding having WAN/WireGuard and br-lan/br-guest.

Presently I achieve this by setting up two IFB interfaces as follows:

That is, I create 'ifb-ul' by mirroring the ingress from br-lan and br-guest, and I create 'ifb-dl' by mirroring the egress from br-lan and br-guest. I also take care to skip the LAN-to-LAN traffic.

It does, but it's layer 3:

And so mirroring/forwarding from the WireGuard interface won't work properly.

Therefore instead I create my IFB for download traffic by taking the egress from br-lan and br-guest.

The tc-based solution for my dual ifb approach looks like this:

    tc qdisc add dev br-lan handle ffff: ingress
    tc qdisc add dev br-guest handle ffff: ingress
    tc qdisc add dev br-lan handle 1: root prio
    tc qdisc add dev br-guest handle 1: root prio

    # capture upload (ingress) on br-lan and br-guest
    tc filter add dev br-lan parent ffff: protocol ip prio 1 u32 match ip dst 192.168.1.0/24 action pass
    tc filter add dev br-lan parent ffff: protocol ip prio 2 matchall action mirred egress redirect dev ifb-ul
    tc filter add dev br-guest parent ffff: protocol ip prio 1 u32 match ip dst 192.168.2.0/24 action pass
    tc filter add dev br-guest parent ffff: protocol ip prio 2 matchall action mirred egress redirect dev ifb-ul

    # capture download (egress) on br-lan and br-guest
    tc filter add dev br-lan parent 1: protocol ip prio 1 u32 match ip src 192.168.1.0/24 action pass
    tc filter add dev br-lan parent 1: protocol ip prio 2 matchall action mirred egress redirect dev ifb-dl
    tc filter add dev br-guest parent 1: protocol ip prio 1 u32 match ip src 192.168.2.0/24 action pass
    tc filter add dev br-guest parent 1: protocol ip prio 2 matchall action mirred egress redirect dev ifb-dl

I think the equivalent in nftables with DSCP marking support looks like this:

table netdev cake {

        chain capture-ul {
                type filter hook ingress devices = { br-lan, br-guest } priority -149; policy accept;
                ip daddr != { 192.168.1.0/24, 192.168.2.0/24 } jump process-ul
        }

        chain process-ul {
                ip dscp set cs3
                ip protocol icmp ip dscp set cs5 counter
                ether type ip fwd to "ifb-ul"
        }

        chain capture-dl {
                type filter hook egress devices = { br-lan, br-guest } priority -149; policy accept;
                ip saddr != { 192.168.1.0/24, 192.168.2.0/24 } jump process-dl
        }

        chain process-dl {
                ip dscp set cs3
                ip protocol icmp ip dscp set cs5 counter
                ether type ip fwd to "ifb-dl"
        }

}

I love how simple and elegant this is and how it facilitates DSCP tagging prior to forwarding to the IFBs.

But sadly, whilst the upload part works in respect of ingress from br-lan/br-guest, the download part does not yet work because the egress hook is only supported from Linux kernel 5.16 onwards - see here:

https://lwn.net/Articles/876497/

And here:

So unless somehow this egress hook functionality is backported such that we can use it in our 5.10 kernel in OpenWrt, this solution won't work until way into the future.

Does this actually matter for nftables packet processing, or is it just a problem when you try to look at the packets in tcpdump?

Can someone explain the role of ICMP to me, and which value it influences? I ask because by default my ISP marks it as CS6.

So I understand that as @tohojo stated:

there are no headers, and so in forwarding the packets to an IFB you end up with something like:

11:02:12.674405 40:00:6c:06:86:f4 (oui Unknown) > 45:00:00:57:d1:22 (oui Unknown), ethertype Unknown (0x3470), length 87:
        0x0000:  7813 0a05 0002 01bb ce0b 96af 9437 464f  x............7FO
        0x0010:  8843 5018 07fb 2b58 0000 1703 0300 2a00  .CP...+X......*.
        0x0020:  0000 0000 0000 b390 2546 fbc5 aeef 7af5  ........%F....z.
        0x0030:  5c6d 549d bf3a da05 825c 5a16 8dd9 0905  \mT..:...\Z.....
        0x0040:  6a67 24dd 40ee 822a aa                   jg$.@..*.

and I understand that this means that CAKE does not see the proper flow information?

Perhaps DSCP marks could be applied in nftables using a VPN ingress hook that is preserved outside, or presumably more naturally by using a postrouting hook, but this doesn't help in terms of getting the single interface that combines the mixture of flows from WAN/VPN needed for CAKE, right?

In 'nftables' 'fwd to' only works in the context of netdev, and forwarding from the WireGuard interface apparently does not work given the lack of the headers. So ingress on the vpn interface won't work, and although 'egress' from 'br-lan'/'br-guest' will work, that's not supported yet.

Is it actually a problem for CAKE, or is it just that tcpdump doesn't know how to display such packets?

I'm taking @tohojo's word for it here. Doesn't CAKE need to see what the source and destination of the packets are to actually provide flow fairness et al?

Well, it should see the layer 3 source and destination; it just doesn't see the Ethernet source and destination, which I don't think should matter. However, CAKE has to know that this is a bare layer 3 packet and therefore where to look inside it for the various fields it wants, and I don't know whether it's smart enough to do that.

@tohojo could you expand on this aspect discussed above here perhaps? Would having CAKE work on an IFB carrying a combination of flows from WAN and layer 3 flows from the WireGuard interface break CAKE?

So @dlakelan if you are right then ingress hooks on WAN and VPN would presumably(?) work for DSCP marking and then forwarding to IFB. I was actually using this for some time using tc (without DSCP marking), and it was only when I looked at the 'tcpdump' stuff that it dawned on me that something might not be right.

If CAKE can actually work with layer 3 packets in IFB taken from VPN (mixed with packets taken from WAN) something like this in nftables might then work(?):

table netdev cake {

        chain capture-ul {
                type filter hook ingress devices = { br-lan, br-guest } priority -149; policy accept;
                ip daddr != { 192.168.1.0/24, 192.168.2.0/24 } jump process-ul
        }

        chain process-ul {
                ip dscp set cs3
                ip protocol icmp ip dscp set cs5 counter
                ether type ip fwd to "ifb-ul"
        }

        chain capture-dl {
                type filter hook ingress devices = { wan, vpn } priority -149; policy accept;
                ip saddr != { $wg_endpoint } jump process-dl
        }

        chain process-dl {
                ip dscp set cs3
                ip protocol icmp ip dscp set cs5 counter
                ether type ip fwd to "ifb-dl"
        }
}

BTW @dlakelan if this approach would work, would there be a good way to feed in:

wg_endpoint=$(wg show | awk '{if($1 == "endpoint:"){split($2,a,":"); print a[1]}}')

into the nftables script? I guess I'd just have a service file or hotplug script run in the background and launch the script with 'nft -f ..'?
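
One way this could be sketched, assuming the ruleset lives in a file and the endpoint is IPv4: prepend an nftables `define` line to the ruleset and pipe the result to `nft -f -`. The function names and the `/etc/cake.nft` path below are hypothetical, and the `exit` in the awk program (take only the first peer) is my own addition.

```shell
#!/bin/sh
# Sketch only: function names and file paths are hypothetical.

# Read `wg show` output on stdin and print the first peer's endpoint address.
# IPv4 only: an IPv6 endpoint like [2001:db8::1]:51820 needs different splitting.
extract_wg_endpoint() {
    awk '$1 == "endpoint:" { split($2, a, ":"); print a[1]; exit }'
}

# Print a "define wg_endpoint = <addr>" line followed by the ruleset file in $2,
# so that $wg_endpoint resolves when nft parses the combined text.
render_ruleset() {
    printf 'define wg_endpoint = %s\n' "$1"
    cat "$2"
}

# Intended use on the router (needs root, wg and nft, so left commented out):
#   ep=$(wg show | extract_wg_endpoint)
#   render_ruleset "$ep" /etc/cake.nft | nft -f -
```

A hotplug or service script could call the last two lines whenever the tunnel comes up, so the definition is regenerated if the endpoint changes.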

Yes, the issue is whether this is just a problem with tcpdump being confused by the packet. If everything else is happy then it's no big deal. Maybe you could set it back up and then look at the capture on your LAN side to see if the packets coming from the VPN wind up with the correct DSCP as they leave the LAN.

All ICMP packets? I see CS6 on ICMP packets generated by my ISP's internal hops, but not on ICMP responses from, say, 9.9.9.9.

So this test nftables script:

root@OpenWrt:~# cat test.nft
table netdev cake {
        chain capture-dl {
                type filter hook ingress device vpn priority -149; policy accept;
                jump process-dl
        }

        chain process-dl {
                ip dscp set cs3
                ip protocol icmp ip dscp set cs5 counter
                ether type ip fwd to "ifb-test"
        }
}

results in nothing showing up on ifb-test. How would I correctly 'fwd to'?

First Update: removing 'ether type ip' and just retaining 'fwd to' works (as in I see garbled text using 'tcpdump -i ifb-test'), but how do I know whether DSCP marks are retained or not?

Second Update: ah, I see on 'br-lan' that DSCP is preserved:

17:32:03.874033 IP (tos 0xa0, ttl 58, id 26627, offset 0, flags [none], proto ICMP (1), length 60)
    one.one.one.one > XX.lan: ICMP echo reply, id 1, seq 60, length 40
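
For anyone reading the capture above: the tos value tcpdump prints is the whole former TOS byte, and DSCP is its upper six bits, so shifting right by two gives the code point. A minimal sketch (the helper name is hypothetical):

```shell
#!/bin/sh
# Convert a TOS byte as printed by tcpdump (e.g. 0xa0) to its DSCP value.
# DSCP occupies the top six bits; the low two bits are ECN.
tos_to_dscp() {
    echo $(( $1 >> 2 ))
}

tos_to_dscp 0xa0   # prints 40, i.e. CS5
```

So tos 0xa0 on the ICMP echo reply is DSCP 40 (CS5), which matches the ICMP rule in process-dl.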

OK so @dlakelan, you are right: it works from a DSCP marking perspective. So the question for @tohojo is whether CAKE can properly work with an IFB that has a mixture of layer 3 flows from the VPN and other layer flows from WAN.

A ping between a remote VPN client and a LAN client, checked on the LAN side, should reveal it, I guess.

Run some traffic through your VPN marked for a high-tier tin and then see what CAKE does with it.

Would that be conclusive though?

Well, it would certainly be suggestive, and fairly strongly so.

Well blow me down it works:

qdisc cake 8018: dev ifb-test root refcnt 2 bandwidth 25Mbit diffserv4 triple-isolate nonat nowash ingress no-ack-filter split-gso rtt 100ms noatm overhead 92
 Sent 2057581 bytes 1785 pkt (dropped 9, overlimits 3061 requeues 0)
 backlog 0b 0p requeues 0
 memory used: 333312b of 4Mb
 capacity estimate: 25Mbit
 min/max network layer size:           40 /    1420
 min/max overhead-adjusted size:      132 /    1512
 average network hdr offset:            0

                   Bulk  Best Effort        Video        Voice
  thresh       1562Kbit       25Mbit    12500Kbit     6250Kbit
  target         11.6ms          5ms          5ms          5ms
  interval        107ms        100ms        100ms        100ms
  pk_delay          0us          0us       13.1ms         12us
  av_delay          0us          0us       12.1ms          0us
  sp_delay          0us          0us         37us          0us
  backlog            0b           0b           0b           0b
  pkts                0            0         1790            4
  bytes               0            0      2068843          240
  way_inds            0            0            0            0
  way_miss            0            0           39            1
  way_cols            0            0            0            0
  drops               0            0            9            0
  marks               0            0            0            0
  ack_drop            0            0            0            0
  sp_flows            0            0            1            0
  bk_flows            0            0            1            0
  un_flows            0            0            0            0
  max_len             0            0         4260           60
  quantum           300          762          381          300
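
To pull just the per-tin packet counts out of stats like these, something along these lines might do. It is a sketch that assumes diffserv4 (four tins, in the column order shown above); the function name is hypothetical.

```shell
#!/bin/sh
# Read `tc -s qdisc show dev <ifb>` output on stdin and print the packet
# count of each CAKE tin. Assumes diffserv4 column order:
# Bulk, Best Effort, Video, Voice.
cake_tin_pkts() {
    awk '$1 == "pkts" { print "bulk=" $2, "best_effort=" $3, "video=" $4, "voice=" $5 }'
}

# Intended use on the router (left commented out, needs tc and the IFB):
#   tc -s qdisc show dev ifb-test | cake_tin_pkts
```

Fed the stats above, this reports 1790 packets in Video and 4 in Voice, with nothing in Bulk or Best Effort.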

OK then all I need to know is how to feed in:

wg_endpoint=$(wg show | awk '{if($1 == "endpoint:"){split($2,a,":"); print a[1]}}')

to this script:

table netdev cake {

        chain capture-ul {
                type filter hook ingress devices = { br-lan, br-guest } priority -149; policy accept;
                ip daddr != { 192.168.1.0/24, 192.168.2.0/24 } jump process-ul
        }

        chain process-ul {
                ip dscp set cs3
                ip protocol icmp ip dscp set cs5 counter
                ether type ip fwd to "ifb-ul"
        }

        chain capture-dl {
                type filter hook ingress devices = { wan, vpn } priority -149; policy accept;
                ip saddr != { $wg_endpoint } jump process-dl
        }

        chain process-dl {
                ip dscp set cs3
                ip protocol icmp ip dscp set cs5 counter
                fwd to "ifb-dl"
        }
}

@gabrielo would the mapping technique you mentioned a la:

map ifname_QoS {
    type ifname : verdict ;
    elements = {
        br-lan: jump select_priority,
        br-guest : jump select_priority
    }
}

offer a substitute way to SKIP over the packets from my WireGuard peer on WAN?

Congratulations, you have now experienced the developer's worst nightmare: a bug in the debugging tool :sweat_smile:

Can you think of a cute way to skip over WireGuard peer traffic on WAN so I don't duplicate traffic with the traffic captured from VPN?

Otherwise I somehow need to feed in my $wg_endpoint source, which is a bit awkward.

        chain capture-dl {
                type filter hook ingress devices = { wan, vpn } priority -149; policy accept;
                ip saddr != { $wg_endpoint } jump process-dl
        }
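
One possible alternative, offered as an untested sketch: instead of matching the peer's address, skip the encapsulated WireGuard packets on wan by the local UDP listen port. This assumes the tunnel uses a fixed listen port (51820 below is only an example; `wg show` reports the real one as "listening port") and that `iifname` and `udp dport` matches behave in the netdev ingress hook as they do elsewhere. The helper name is hypothetical.

```shell
#!/bin/sh
# Print a capture-dl chain that returns early for WireGuard's own UDP
# packets arriving on wan, so only the decrypted copies seen on vpn are
# forwarded to the IFB. $1 is the local WireGuard listen port (an assumption).
render_capture_dl() {
    port="$1"
    cat <<EOF
        chain capture-dl {
                type filter hook ingress devices = { wan, vpn } priority -149; policy accept;
                iifname "wan" udp dport $port return
                jump process-dl
        }
EOF
}

render_capture_dl 51820
```

The `iifname "wan"` guard is there so that inner traffic on vpn that happens to use the same port is not skipped; a `return` from the base chain falls through to the accept policy.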

Do you have dynamic or static WireGuard peers?