NFtables and QoS in 2021

Ah, sorry for being ambiguous: I was wondering under what name and path I should save the file with the nftables code.

Ah sorry, yes: that's just any file with a name ending in ".nft" placed in /etc/nftables.d/, since that is where you can place additional chains. Then run service firewall restart. I will update my GitHub repository with that file and update the readme.
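
For anyone following along, a drop-in file could look roughly like the sketch below. It assumes that fw4 pulls files from /etc/nftables.d/ into its own table inet fw4 block, so the file holds bare chains rather than a complete table. The file name, the chain name and the SIP port match are made-up placeholders, not taken from any repository.

```nft
# /etc/nftables.d/20-dscp-example.nft (hypothetical file name)
# Assumption: fw4 includes this file inside "table inet fw4 { ... }",
# so only chains (no table block) belong here.

chain dscp_example {
        type filter hook postrouting priority mangle; policy accept;

        # Placeholder rule: mark SIP signalling (UDP port 5060) as CS5.
        # IPv4 and IPv6 need separate statements in an inet table.
        meta nfproto ipv4 udp dport 5060 ip dscp set cs5
        meta nfproto ipv6 udp dport 5060 ip6 dscp set cs5
}
```

After service firewall restart, running nft list chain inet fw4 dscp_example should show the chain if it was picked up.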

I am trying to work out whether I and most other users would want to allow guests on the guest network to specify certain DSCPs. I think devices specifying DSCPs is pretty rare and normally has to be explicit. I can either allow it or force all guest traffic into the besteffort tin. Any thoughts on that?
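
If forcing everything to best effort turns out to be the preferred route, a rule along these lines might be enough. This is only a sketch: it assumes the chain is loaded inside fw4's inet table and that the guest bridge is called br-guest, and the chain name is a placeholder.

```nft
# Assumes fw4's inet table and a guest bridge named br-guest.
chain guest_besteffort {
        type filter hook prerouting priority mangle; policy accept;

        # Reset whatever DSCP a guest device sent to CS0 (best effort).
        iifname "br-guest" meta nfproto ipv4 ip dscp set cs0
        iifname "br-guest" meta nfproto ipv6 ip6 dscp set cs0
}
```

Allowing a short list of DSCPs instead would mean matching on the existing ip dscp value first and only resetting values outside that list.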

@ldir I just had a random thought. Would it be possible to restore DSCPs from conntrack for encrypted WireGuard packets at wan egress (and then use CAKE's wash option to clear them after tinning) before release into the wild? We already know that CAKE can use skb->hash preservation across WireGuard to identify individual flows. Although WireGuard strips the DSCPs during encapsulation, I wondered whether Linux still tracks the flows such that conntrack could be exploited to restore the DSCPs there?

I still need to fully wrap my head around conntrack to determine whether this makes sense. I am confused about whether we edit the conntrack of a packet or what Linux stores in respect of flows (the stamp or the stamper). Or perhaps we edit the stamp so that the stamper gets altered. If the latter, then the points at which the stamper is updated from altered stamps seem important.

Hey guys, thought I'd post my own setup that I've been working with against the new fw4 release.

It may or may not be of use, but feel free to pinch ideas :slight_smile:

I'm categorising packets in the input and postrouting 'chains' with a mix of static and dynamic rules; the latter do a pretty good job of downgrading threaded HTTP (Steam etc.) and P2P traffic.

DSCP marking on the ingress is done using the ctinfo dscp restore function (and therefore the rules are used in conjunction with a tweaked layer cake setup script).

Packets in postrouting (output/forwarded) use nftables to set their DSCP mark as I'm marking LAN bound packets with a more WMM friendly value (think EF to CS6 etc).

Repo: https://github.com/jeverley/nft-dscpclassify
Nft script: https://github.com/jeverley/nft-dscpclassify/blob/main/etc/nftables.d/11-dscpclassify.nft
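
As a rough illustration of the LAN-bound re-marking described above (this is not taken from the linked script), the rules below re-mark EF as CS6 on the way out to the LAN. The bridge name br-lan and the chain name are assumptions, and the chain is assumed to live in fw4's inet table.

```nft
# Simplified illustration only, not the linked script.
# Assumes the LAN bridge is br-lan and the chain is inside fw4's inet table.
chain lan_wmm_remark {
        type filter hook postrouting priority mangle; policy accept;

        # Packets leaving towards the LAN that carry EF are re-marked CS6.
        oifname "br-lan" ip dscp ef ip dscp set cs6
        oifname "br-lan" ip6 dscp ef ip6 dscp set cs6
}
```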

I know that people get in a fuss because EF by default maps to AC_VI instead of AC_VO, but it is not clear that this matters much, arguably most WiFi traffic should use AC_BE anyway...

@yelreve that is some serious nftables coding! For those that are a little slow on the uptake like myself please could you expand on what your script does? You indicate that you use DSCP restoration on ingress (from wan?) and that you also adjust the DSCP mark on the way to lan? So it is a sort of corrected DSCP restoration?

Why don't you just classify for both directions rather than the DSCP restoration using something like this:

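table netdev cake {

        # Upload: packets arriving from the LAN-side bridges
        chain capture-ul {
                type filter hook ingress devices = { br-lan, br-guest } priority -149; policy accept;
                # Only packets destined outside the two local subnets are processed
                ip daddr != { 192.168.1.0/24, 192.168.2.0/24 } jump process-ul
        }

        chain process-ul {
                ip dscp set cs3                             # default mark
                ip protocol icmp ip dscp set cs5 counter    # ICMP gets CS5
                ether type ip fwd to "ifb-ul"               # hand off to the upload IFB
        }

        # Download: packets arriving from the WAN and VPN interfaces
        chain capture-dl {
                type filter hook ingress devices = { wan, vpn } priority -149; policy accept;
                # $wg_endpoint must be defined elsewhere (e.g. with a define statement)
                ip saddr != { $wg_endpoint } jump process-dl
        }

        chain process-dl {
                ip dscp set cs3                             # default mark
                ip protocol icmp ip dscp set cs5 counter    # ICMP gets CS5
                ether type ip fwd to "ifb-dl"               # hand off to the download IFB
        }
}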

Is it because you want LAN clients to be able to influence the DSCP?

Sorry if I'm missing the obvious. But it's not so easy to grasp from your description or the code. Well at least for me anyway.

You don't need to remap EF and VA from AC_VI to AC_VO if you use OpenWrt on your WiFi devices; they have been mapped correctly for some time now. It might be interesting to add define le = 1 to your list, along with a new dscp_set_le for this type of traffic. CS1 is better reserved for backups, and you can mark your dynamically deprioritised traffic as LE.
Colour me impressed; I like your approach. It mimics @ldir's iptables scripts in nftables.
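
A minimal sketch of what that LE suggestion could look like, assuming dscp_set_le would be a regular chain inside fw4's inet table. The variable is called lephb here rather than le purely as a precaution, in case the short name collides with an nft keyword; that is an assumption, not something verified against the script.

```nft
# Hypothetical names; DSCP 1 is the LE codepoint from RFC 8622.
define lephb = 1

# Regular (hook-less) chain that other rules can reach with "goto dscp_set_le".
chain dscp_set_le {
        meta nfproto ipv4 ip dscp set $lephb
        meta nfproto ipv6 ip6 dscp set $lephb
}
```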

Could you possibly give me a brief summary of what this script does and why? I don't get the use of conntrack restore when in nftables you can just set DSCPs for both directions anyway. Or is the intention here to have LAN clients set DSCP and then correct it in the script? Or mixture of LAN clients setting DSCP and DSCP set in scripts?

I ask because, absent this facility of having LAN clients set DSCPs, I don't yet see the point in DSCP restore from conntrack.

I suspect it's so that it happens before queueing if you're using an IFB

But you don't need it: with nftables you can mark in both directions and then fwd to IFBs, like this:

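table netdev cake {

        # Upload: packets arriving from the LAN-side bridges
        chain capture-ul {
                type filter hook ingress devices = { br-lan, br-guest } priority -149; policy accept;
                # Only packets destined outside the two local subnets are processed
                ip daddr != { 192.168.1.0/24, 192.168.2.0/24 } jump process-ul
        }

        chain process-ul {
                ip dscp set cs3                             # default mark
                ip protocol icmp ip dscp set cs5 counter    # ICMP gets CS5
                ether type ip fwd to "ifb-ul"               # hand off to the upload IFB
        }

        # Download: packets arriving from the WAN and VPN interfaces
        chain capture-dl {
                type filter hook ingress devices = { wan, vpn } priority -149; policy accept;
                # $wg_endpoint must be defined elsewhere (e.g. with a define statement)
                ip saddr != { $wg_endpoint } jump process-dl
        }

        chain process-dl {
                ip dscp set cs3                             # default mark
                ip protocol icmp ip dscp set cs5 counter    # ICMP gets CS5
                ether type ip fwd to "ifb-dl"               # hand off to the download IFB
        }
}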

The only reason I see for it at the moment is to allow LAN clients to set DSCPs for upload and then restore for download.

Yeah, agreed, I'm not sure what the point of it is in this particular instance.

Looking forward to better understanding the thinking here.

It DSCP-marks any packet matching a specific rule and stores the mark in conntrack. The point of storing the DSCP in conntrack is to restore it automatically when the reply comes back from the WAN into your LAN. The main goal is proper WMM classification for the WLAN (LAN) network. I'm oversimplifying, but I hope it makes sense.

The advantage is that in cases like a macOS computer that does not support per-app DSCP marking, you can do the marking in your router based on addresses, ports, or both.

Sorry no I don't follow. I use restoration to allow me to set DSCPs on upload using LAN clients and then restore on download.

Else why don't you just set DSCPs in both directions in script?

I suspect it's just a copy/translation of what @ldir did with iptables. Am I correct @yelreve?

Not having looked where his shapers live, but the copy DSCP approach works nicely with SQM ingress via an IFB on the wan interface, where internal addresses are not available, and where nftables rules would be limited.

However, it is far from clear that mapping EF into AC_VO is a good idea in the first place. IETF RFC 8325, which proposed a new mapping between PHBs/DSCPs and WiFi access categories, was apparently based not on empirical evidence but on theoretical considerations. The argument goes something like this: EF is intended for VoIP traffic and AC_VO has 'voice' in its name, hence EF should be mapped into AC_VO. The problem with this approach is that EF has by default been mapped to AC_VI at least since the WMM mappings were ratified in 802.11e in 2005. RFC 8325 cites no data indicating that VoIP traffic actually suffered from this treatment, and offers no evidence that voice traffic actually performs better in AC_VO.
In short, RFC 8325 is a rather theoretical affair that brings more order and logic to DSCP-to-AC mappings but fails to assess whether, and how, the proposed changes actually improve performance in the field. (To be generous, improvement or no change would both be fine, but it also fails to confirm that the proposed changes do no harm.) I say this in spite of liking RFC 8325: it explains the issues at hand pretty clearly and, IMHO, quite readably. Alas, it fails to offer compelling reasons why the proposed changes should be implemented in the field.

The goal behind LE is, IMHO, to relieve CS1 of its duty as the background-traffic identifier. There is still gear out there that gives CS1 higher priority than CS0, which is bad, so it is best to stop using CS1 as a background marker altogether. Inside your home network that only matters if you operate devices (like cheap switches) that behave that way.

I guess with X priority tiers one really only needs X different DSCPs, however I like your implied idea to use different DSCPs per tier for labeling different reasons (actually different rules) for steering a packet into a tier, which can help a lot with debugging/diagnosing things.

The readme on the GitHub page reads:

An nftables ruleset for OpenWrt's firewall4 for dynamically setting DSCP packet marks (this only works in OpenWrt 22.03 and above). This should be used in conjunction with layer-cake SQM queue with ctinfo configured to restore DSCP on the device ingress. The dscpclassify rules use the last 8 bits of the conntrack mark (0x000000ff).

implying a traditional sqm set-up with an IFB on the wan interface where nftables will not work as expected (at the very least because internal IP addresses and ports are not yet resolved).

Given that there is very little rationale for using different priority tiers/DSCPs on the two legs of a network connection, I see no real added utility in implementing a dedicated set of re-marking rules for ingress. @ldir really hit it out of the park, IMHO, when he came up with this idea: not because it allows the most elaborate QoS hierarchies imaginable, but because it reduces the required complexity by roughly a factor of 2 without introducing significant constraints for typical use-cases (and unusual use-cases are not forced to use that method and can still go wild).

Ah, got it. Pity the egress hook is not available yet to work with br-lan instead.

I actually consider @ldir's design to be the best way forward as it will allow both setting DSCPs directly from end-points as well as over-riding those/defining new rules on the router by simply using the firewall GUI... (in my mental model prioritization should only be used sparingly so adding a handful of rules via the GUI seems to be user-friendly enough, but this approach will become cumbersome once the number of rules switches from few to many).

Shaper on br-lan will also affect traffic between WiFi and LAN (unless you specifically exempt that) so not the best place IMHO.

Of course this only matters if internal addresses are actually used in setting DSCPs and not just port numbers.

Ah yes and I like mixture of internal end points and router setting DSCPs. I just didn't know if that was the thinking underlying @yelreve's approach.