NFtables and QoS in 2021

Admittedly it's static, but I'd like a generic solution here that won't break if my peer goes down and I fail over to a secondary peer. Also, will that nftables rule break if the vpn goes down? Maybe I just need to delete/add the rule in hotplug?

If your peer is down it will not generate traffic. What I meant was: is the list of peers static or not?

Ah - I see what you mean. You mean I can just feed in:

ip saddr != { $wg_endpoint1, $wg_endpoint2,... }

That would be the easy way. If you really do need it to be dynamic, you can set up a named set and add elements to it at run time.
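
A minimal sketch of the named-set variant, assuming the `netdev cake` table and `capture-dl` chain posted further down in this thread. The set name `wg_endpoints` and the address 203.0.113.10 are placeholders, not anything from the actual setup.

```nft
table netdev cake {
        # named set: starts empty and can be changed without touching the rule
        set wg_endpoints {
                type ipv4_addr
        }

        chain capture-dl {
                type filter hook ingress devices = { wan, vpn } priority -149; policy accept;
                # skip packets coming from a WireGuard endpoint, process everything else
                ip saddr != @wg_endpoints jump process-dl
        }

        # process-dl as in the full ruleset, omitted here
}
```

Elements can then be added or removed at run time with `nft add element netdev cake wg_endpoints { 203.0.113.10 }` and `nft delete element netdev cake wg_endpoints { 203.0.113.10 }`. While the set is empty the rule simply matches every IPv4 packet.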

Yes I see like here: https://wiki.nftables.org/wiki-nftables/index.php/Sets#Named_sets

I'd still really like @tohojo to confirm whether this approach will work properly.

Also, in the future this will be easier once the nftables egress hook becomes available, since then I can just do it all from the LAN interfaces without worrying about WireGuard peer IP duplication. That would provide a very generic solution.

But anyway thanks for your input here @dlakelan. Seems promising.

And @gabrielo please chip in if you see a cute way to skip over WireGuard peer traffic on WAN to avoid duplication with the traffic pulled from vpn other than just specifying the WireGuard src peer IPs.

The WireGuard traffic seen on wan uses a single UDP port, so you can make a rule that matches that port with an accept statement to finish processing at that point:

# ingress hook: match the input interface; use sport instead if 51999 is the remote peer's port
iifname "wan" udp dport 51999 accept
jump process-dl

Or you can use a verdict map keyed on the interface name (something like ifname_QoS), with wan jumping to a chain containing the two statements above and vpn jumping directly to process-dl.
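
The map idea could look roughly like this. It is only a sketch: the chain name `wan-dl` is made up, 51999 is the example port from above, and whether to match `dport` or `sport` depends on which side of the tunnel uses that port. The `meta protocol ip` match keeps the behaviour of the posted ruleset, where only IPv4 reaches process-dl.

```nft
chain capture-dl {
        type filter hook ingress devices = { wan, vpn } priority -149; policy accept;
        # verdict map keyed on the input interface, IPv4 only
        meta protocol ip iifname vmap { "wan" : jump wan-dl, "vpn" : jump process-dl }
}

chain wan-dl {
        # encrypted WireGuard packets: stop here, they are handled after decryption on vpn
        udp dport 51999 accept
        jump process-dl
}
```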

@dlakelan, @dave14305, @_FailSafe what is the status regarding the connection tracking that you looked into above (IIUC setting DSCP on download based on connection track mark associated with upload?).

Can it be used in ingress hook at the moment?

Is there a working example you can provide?

yes, question is how often you want to create this (unnamed) set ...

But as @gabrielo said, your wg traffic goes via a known UDP port. Or I guess the allowed IPs you use are something systematic (e.g. all peers using a private IP range as tunnel IP), or worst case all peer addresses are known beforehand. So basically you can create a static config.

but if for any reason you have very complex wg network and really want/need lots of rules then you'll need for example a periodic wg show all dump and update the rules accordingly. which you can do easily in two ways:

  • you can add/remove nft rules on the fly via nft add/delete rule command
  • or if you create a named set similarly you can nft add/delete element on the fly
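
As a rough sketch of the second option: the helper below turns `wg show all endpoints` output into a list of IPv4 addresses and reloads a named set in one `nft -f` transaction. The set name `wg_endpoints` in table `netdev cake` is an assumption (it has to be declared in the ruleset first), as are the function names. IPv6 endpoints and peers without an endpoint are simply skipped here.

```shell
#!/bin/sh

# Read `wg show all endpoints` output on stdin (interface, public key,
# endpoint per line) and print a comma-separated list of unique IPv4
# endpoint addresses.
wg_endpoint_ips() {
        awk '
                # last field is "ip:port"; "(none)" and "[ipv6]:port" do not match
                $NF ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+:[0-9]+$/ {
                        split($NF, a, ":")
                        if (!seen[a[1]]++) {
                                out = out sep a[1]
                                sep = ", "
                        }
                }
                END { if (out != "") print out }
        '
}

# Replace the contents of the (hypothetical) set wg_endpoints atomically.
sync_wg_set() {
        ips=$(wg show all endpoints | wg_endpoint_ips)
        {
                echo "flush set netdev cake wg_endpoints"
                if [ -n "$ips" ]; then
                        echo "add element netdev cake wg_endpoints { $ips }"
                fi
        } | nft -f -
}
```

Calling `sync_wg_set` periodically from cron, or from a hotplug script when the vpn interface changes state, would keep the set current.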

by the way could you please upload all your config including tc and nft to see the whole stuff in one place? am a bit lost what has been the final working config.

and congratulations for all the tests and effort to succeed at the end! now we just need a good package to put the pieces together :wink:

it's just this here:

table netdev cake {

        chain capture-ul {
                type filter hook ingress devices = { br-lan, br-guest } priority -149; policy accept;
                ip daddr != { 192.168.1.0/24, 192.168.2.0/24 } jump process-ul
        }

        chain process-ul {
                ip dscp set cs3
                ip protocol icmp ip dscp set cs5 counter
                ether type ip fwd to "ifb-ul"
        }

        chain capture-dl {
                type filter hook ingress devices = { wan, vpn } priority -149; policy accept;
                ip saddr != { $wg_endpoint } jump process-dl
        }

        chain process-dl {
                ip dscp set cs3
                ip protocol icmp ip dscp set cs5 counter
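                # plain fwd: only IPv4 reaches this chain, and vpn (WireGuard) packets carry no Ethernet header to match on
                fwd to "ifb-dl"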
        }
}

This allows setting DSCPs and forwarding to IFBs for upload and download despite multiple interfaces (e.g. WAN/VPN/br-lan/br-guest).
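
Note that `$wg_endpoint` is an nftables variable, so the file that holds this table needs a matching `define` above it. A minimal sketch, with a documentation address standing in for the real peer and a hypothetical file name:

```nft
# /etc/cake.nft (hypothetical path); load with: nft -f /etc/cake.nft
# 203.0.113.10 is a placeholder for the WireGuard peer's public address
define wg_endpoint = 203.0.113.10

table netdev cake {
        # chains exactly as posted above
}
```

The devices named in the hooks and in the `fwd` statements (ifb-ul, ifb-dl) most likely have to exist before the file is loaded, so the IFBs should be created first.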

For me this was a good introduction to nftables.

I'm curious to see now if I can leverage the connection tracking ingress restoration as per the discussion earlier on in this thread. Namely I'd quite like Windows 11 to set DSCP marks in applications for outbound traffic and then for those DSCP marks to get applied on my ingress, but I'm not sure if that's possible at the point of the ingress hook. I'd also like to see a working example of that restoration.

with tc commands too please :slight_smile:

Well I just borrowed from my tc approach. The latter is presumably fine for the vast majority of use cases. Whereas intellectually this DSCP stuff is very interesting, I'm dubious about how much real benefit messing about with DSCPs actually provides. I mean, on my LTE connection cake w/ besteffort makes a colossal difference, but benefit from DSCPs? I'm not so sure.

But something like:

	# ifb interface for handling ingress on WAN (and VPN interface if wg show reports endpoint)
	ip link add name ifb-ul type ifb
	ip link add name ifb-dl type ifb
	ip link set ifb-ul up
	ip link set ifb-dl up

	# apply CAKE on the ifbs
	tc qdisc add dev ifb-ul root cake bandwidth 30Mbit besteffort triple-isolate nonat nowash no-ack-filter split-gso rtt 100ms noatm overhead 92
	tc qdisc add dev ifb-dl root cake bandwidth 25Mbit besteffort triple-isolate nonat nowash ingress no-ack-filter split-gso rtt 100ms noatm overhead 92

And so this nftables approach is potentially better than the pure tc approach because it allows DSCP setting prior to forwarding to the IFBs. That said, I do like having the egress hook in tc.

I don't think so, I think ingress is well before conntrack has seen the packet, by design.

However you can still do DSCP changes during routing which is after conntrack, and then you can benefit in other places, for example WMM or in a smart switch or if you want another cake doing something like throttling a guest network or whatever.
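
One way to read this: the forward hook runs after conntrack, so a rule there can store the DSCP chosen by a LAN client in the connection mark on upload and reapply it to the replies. A minimal sketch for a single class, assuming the LAN bridge is `br-lan`; the table name `dscptrack` and the mark value 0x28 are made up, and the rule overwrites the whole connection mark, which would clash with anything else that uses it.

```nft
table inet dscptrack {
        chain forward {
                type filter hook forward priority -150; policy accept;

                # upload from LAN: remember that this connection asked for cs5
                iifname "br-lan" ip dscp cs5 ct mark set 0x28

                # download towards LAN: reapply the class to replies of that connection
                oifname "br-lan" ct mark 0x28 ip dscp set cs5
        }
}
```

Because this runs after the ingress hook, cake on ifb-dl would not see the restored marks. Only things further downstream (WMM, a smart switch, another qdisc) would benefit.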

Ah crap. So that'd be another benefit of the egress hook since at that point this would be available?

If you have control of the Windows 11 machine, I mean if you trust the DSCP marks coming from it, create a rule with its IP address (via accept) and skip the nftables rules so the Windows DSCP marks are preserved.
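
In the ruleset posted above, a plain `accept` in capture-ul would also skip the `fwd` to ifb-ul, so that machine's upload would bypass cake entirely. A variant that keeps the forwarding and only skips the re-marking might look like this; the address is a placeholder for the trusted host.

```nft
chain process-ul {
        # 192.168.1.50 stands in for the trusted Windows 11 machine
        ip saddr != 192.168.1.50 ip dscp set cs3
        ip saddr != 192.168.1.50 ip protocol icmp ip dscp set cs5 counter
        ether type ip fwd to "ifb-ul"
}
```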

FWIW this is also why I never bothered with all the diffserv tinkering. If you have very low bandwidth (say, a couple of megabits), and use applications that tend to continuously use lots of bandwidth that you want to throttle in favour of other traffic (say, bittorrent or cloud backups or something like that), the diffserv-based marking can make sense. But if you don't do any of these, the flow queueing will generally be enough to get you excellent performance on its own.

As for your other question:

I think it might be fine with that, actually? sch_cake itself uses the skb network_header offset and the kernel flow dissector to retrieve the DSCP marking and distinguish flows (respectively), and both of these should deal gracefully with a packet with no MAC header. The simple way to test this is to just look at the cake stats and see if they look sane (i.e., packets are hitting the tins you expect and flow counters are going up). Which it looks like you've done, so yeah, I guess it's working? :slight_smile:

Thanks so much for your input here @tohojo.

And otherwise gents, thanks for your patience. This has been fun.

Time for me to get back to some patent opposition work for a bit.

We could use your persistence and tenacity to push the netfilter devs to accept these patches for nftables:

https://lore.kernel.org/netfilter-devel/20220404121410.188509-1-jeremy@azazel.net/

These would make it possible to save DSCP to conntrack marks more easily than we can today.

Yes, but also I've had some success at internal network bottlenecks, for example APs, or switches. I have one network segment on the far side of a power line modem, and the switches on either side are configured to allow 40Mbps across that segment, and they obey DSCP markings, so stuff like the IP cameras don't stutter even if I copy a file across that segment etc.

Right, that makes sense, certainly. My point above was mostly related to general-purpose internet traffic, where special-purpose use cases like yours are rarer. Also, if you could just install sch_cake in host fairness mode on those switches, your IP cameras would be fine unless you had so many clients transmitting simultaneously that you would run out of bandwidth entirely (e.g., if you have a 40Mbps link and each video stream is 1 Mbps, you'd be fine with host fairness until you exceed 40 simultaneous hosts transmitting).

Also, I'm not denying that it is possible to optimise traffic some with finely tuned diffserv markings. I'm just saying that it's not worth the effort (to me). I.e., just turning on sch_cake and configuring the bandwidth gets you 90% of the benefit compared to an unmanaged link, and spending hours tweaking things to get those few extra % towards some theoretical optimum is not worth it :slight_smile:

However, using diffserv-based prioritisation to work around devices in the network that lack the smarts to do better (as you're doing) is certainly viable. To me, that is the justification for diffserv in general, although thankfully I don't have any bottlenecks in my own network that need this kind of tuning :slight_smile:

There's no question that the people who have experienced the most value from DSCP are those with pretty tight internet bandwidth. One enormous thread here was about helping a Greek man play realtime games on an asymmetric 16/0.75 Mbps DSL line.

Still, nftables makes tagging things with DSCP a lot easier than it used to be, and it's possible to get a lot of value from it, specifically for making video conferencing completely stutter free even if other traffic on the network gets intense. The copious meetings and teaching of classes my wife did during the height of the pandemic never had any issues. My friend's wife almost lost her job teaching med students virtually because the students couldn't actually understand anything she was saying or doing. Cake with diffserv4 is responsible for her keeping that job. Many thanks to everyone who has worked on latency issues in OpenWrt and the Linux kernel and online here.
