Nftables custom QoS, round 2

Simon said 'patches welcome' and has been killed in the rush...ah, no, my mistake, no nftables patches at all. As is often the case with open source, you want it, you code it, unless you can persuade $corp to do it and share for you.

This is a rant and I'm quite cross but I need to get it off of my chest.

nf'f**king'tables.

The status from my 'QoS' perspective is:

  1. dnsmasq doesn't support nftables' named sets, so any 'dynamically populated IP address'/port combinations easily achieved with iptables/ipsets are currently off the table. AFAIK there's no 'transition path'. nftables-literate people need to provide patches to dnsmasq.

  2. AFAIUI the promise of being able to mangle DSCP on ingress in nftables is only partially fulfilled, in that the hook point is pre-NAT, so any classifications based on internal IP addresses are still not possible. Leading to...

  3. Workarounds for the lack of easy ingress classification, in the form of act_ctinfo and storing DSCPs into firewall connmarks, are currently impossible in nftables but easily achieved with iptables (and a straightforward patch).

What really pisses me off is that 'act_ctinfo' and 'CONNMARK --set-dscpmark' are solving real world problems TODAY in OpenWrt, and '--set-dscpmark' in 'old'n'busted' iptables wasn't accepted upstream because there isn't an implementation for the new hotness of nftables. The 'new hotness' is actually stopping development of 'old'n'busted' even though 'new hotness' doesn't have the support mechanism for something that 'old'n'busted' can do with ease.

Storing stuff into connmarks from nftables apparently requires a parser 're-write' which quite frankly is beyond my C and failure to attend a computer science degree. I was hopeful at one point: Jeremy Sowden looked at '--set-dscpmark' and thought "seems simple enough" before being sent down the rabbit hole of parser changes and not seen since.

I want to like nftables. iptables is clunky, and I suspect 'ctinfo_4/5layercake.qos' could be written in a much nicer way under nftables, but AFAICT some key functionality is missing in nftables to do so... and that's leaving aside OpenWrt's 'fw3'.

Sigh. And breathe.

Let's try to make lemonade here. How about we try to get your nifty script into sqm-scripts proper while OpenWrt is still using iptables based fw3?
I think we already prepared a few things in sqm-scripts (like the automatic iptables unrolling at sqm stop) but we might still need a few more (like allowing custom tear-down routines)...

I don't think it's as bad as all that. I am working on a grant to massively improve COVID testing availability so I don't have lots of spare time for networking issues, but I think it's straightforward to look at egress connections that have high-priority DSCP, plunk the IP addresses and ports into a set, and on ingress match source IPs and set the DSCP. It doesn't require any special act_ctinfo.

Or am I being obtuse?

That sounds like essentially duplicating already existing conntrack information into a set, something the iptables CONNMARK solution handled more elegantly by simply reusing the mark field of kernel connection tracking entries.

Besides the obvious downsides like increased memory usage, there are also issues with different timeouts of set entries vs. ct entries, unrelated connections reusing the same ports, etc.
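
For readers who haven't looked at the connmark approach, here is a rough Python model of the idea as I understand it: the DSCP is tucked into a masked slice of the 32-bit mark, and one extra flag bit records that a value has been stored. This is not kernel code, and the two mask values are illustrative assumptions rather than anything mandated by act_ctinfo or '--set-dscpmark'.

```python
# Illustrative bit layout only (an assumption); real setups pick their own masks.
DSCP_MASK = 0xFC000000   # six contiguous bits that hold the DSCP value
STATE_MASK = 0x01000000  # one flag bit meaning "a DSCP has been stored"


def mask_shift(mask: int) -> int:
    """Return the bit position of the lowest set bit in mask."""
    return (mask & -mask).bit_length() - 1


def store_dscp(connmark: int, dscp: int) -> int:
    """Save a 6-bit DSCP into the mark, leaving all unrelated bits untouched."""
    if not 0 <= dscp <= 0x3F:
        raise ValueError("DSCP must fit in 6 bits")
    cleared = connmark & ~(DSCP_MASK | STATE_MASK) & 0xFFFFFFFF
    return cleared | (dscp << mask_shift(DSCP_MASK)) | STATE_MASK


def restore_dscp(connmark: int):
    """Return the stored DSCP, or None if nothing was stored for this connection."""
    if not connmark & STATE_MASK:
        return None
    return (connmark & DSCP_MASK) >> mask_shift(DSCP_MASK)


# 46 is EF; 0xAB stands in for bits some other rule already set in the mark.
mark = store_dscp(0x000000AB, 46)
print(hex(mark), restore_dscp(mark))  # prints: 0xb90000ab 46
```

In this model only the masked bits are written on egress and read back on ingress, so the low bits of the mark survive the round trip.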

It seems to me like connmark is a very limited thing: it can attach a 32-bit integer to a connection, but only ONE integer, and it can only be used by one thing at a time... like if you want to use it for copying the DSCP field, you can't use it to, say, mark packets going to a known set of servers or coming from low-priority machines or that have been validated by a captive portal or whatever else you might want to do with them. For example, I put all UDP streams into a higher priority tin and then when they go over a certain pps I connmark them to permanently downgrade their priority. This automatically captures a LOT of interactive RTP traffic without much work.
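
To make the UDP example concrete, here is a toy Python model of the logic: every flow starts in the high-priority tin, and the first time it exceeds a packets-per-second limit it is demoted for good. The class name, the tin labels, and the limit of 200 pps are all made up for illustration (the post doesn't give a number), and the `demoted` set merely stands in for a per-connection mark.

```python
PPS_LIMIT = 200  # hypothetical threshold; not a value taken from the post


class UdpClassifier:
    """Start UDP flows in the high tin; demote them permanently once they get busy."""

    def __init__(self, pps_limit: int = PPS_LIMIT):
        self.pps_limit = pps_limit
        self.window = {}      # flow -> (second, packets counted in that second)
        self.demoted = set()  # stands in for a sticky per-connection mark

    def classify(self, flow, now: float) -> str:
        """Count one packet for `flow` at time `now` and return its tin."""
        if flow in self.demoted:
            return "low"
        second = int(now)
        start, count = self.window.get(flow, (second, 0))
        if start != second:
            # A new one-second window begins; the old count is discarded.
            start, count = second, 0
        count += 1
        self.window[flow] = (start, count)
        if count > self.pps_limit:
            self.demoted.add(flow)
            return "low"
        return "high"


clf = UdpClassifier(pps_limit=3)
voip = ("192.0.2.10", 5004)
print([clf.classify(voip, t) for t in (0.0, 0.1, 0.2, 0.3, 5.0)])
```

With a limit of 3, the fourth packet inside the same second trips the demotion, and the flow stays in the low tin even after it goes quiet.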

I don't really see that much in the way of downsides to the nftables approach. It basically comes down to the difference between having composable primitives that are general purpose, and fast but limited special purpose things. In general I think composable primitives win out in the long run, but for certain people with machines that have limited resources the special purpose thing can be better at the moment.

For the most part, the nftables sets are lightweight; I mean, maybe if you've got a million simultaneous connections you'll run into memory problems on small routers. My RPi4 has 4GB of RAM and cost less than most enthusiast all-in-wonder routers, so I guess I don't care about that myself. If you want to play around with this level of network nerdery I think it's worthwhile to invest in somewhat higher performance hardware than, say, the low-end GL.iNet travel routers or the like.

Also, I think the best way to handle ingress is to convert it to egress on the LAN side interface... First do simple total bandwidth throttling on the ingress side, and then do reprioritization on the egress side. This could be as simple as a TBF on the IFB associated with the WAN interface, and then a DRR on the egress, or even a Cake on egress if you've got the CPU.

I feel like in some way my dnsmasq/ipsets post kicked a hornet's nest here. Perhaps I am misreading the correlation, too. It certainly was not my intention to stir things up, if that was related.

Regardless, my interest in nftables is in no way meant to be a jab at anyone’s blood, sweat, and tears that have gone into making iptables + SQM the awesome combination it is currently. I would wager most would agree with that sentiment. Speaking for myself, I am a tinkerer and find enjoyment in fiddling with new, often "bleeding edge" stuff. Yeah, I want my internet connection to work well, and it does, but seeing if I can make it run even better is a motivation that has turned into somewhat of a hobby of mine.

Back to the matter at hand, I certainly do not want to see betterment of iptables taking a hit for the sake of nftables. Iptables is out there, all over the place, and it should continue to get attention. I am sorry to hear of the stalled development because of 'new hotness'. But, it is the nature of IT for "things" to iterate and [hopefully] improve. We all know that does not happen by sticking to one platform/tool/ecosystem forever, no matter how good it is today. Otherwise, we would all still be using "operating systems" like GEOS and trying to reach outside our own four walls with 300 baud modems. Yes, those things worked at the time, but thankfully tinkerers around the world decided not to stay with "good enough" forever.

Please know I am not trying to lecture anyone here. I am far from the smartest person in this virtual room. On that point, I will admit it right here and now... I have a hard time understanding the intricacies of iptables. Heck, I have a hard time even understanding the basics of it at times. I mean no offense to anyone reading this, but iptables syntax seems complicated. I work in IT and can develop in multiple languages. But, for some dang reason, iptables confounds me at times. I can look at nftables syntax now after only about a week of reading up on it and I get it. It just makes sense to me. Maybe not to everyone else, but that leads me to my next point.

Options. I would like to see nftables get to the point of being equivalent to iptables, whether that's next month or years from now, and I'm sure it will. But I think it's important for there to be options. For those that get iptables, use the heck out of it. For those who just cannot wrap their heads around all of it, having another workable alternative might be a better fit. I am not proposing advancement of nftables be at the cost of iptables, but I would like to see people continue to tinker with nftables and figure out what needs to happen to make it feature-equivalent to iptables.

Wrapping this up... @ldir the work you've done to get the ctinfo_4/5layercake is flat-out awesome. I am using the ctinfo_5layercake configuration now and it is the finest I have ever seen my internet connection operate at. The improvement in responsiveness is great. I have the utmost respect for you, and many others here, and I am 100% behind @moeller0's suggestion to "make lemonade" :slight_smile: At the end of the day, I do hope those others like @dlakelan will continue pursuing ways to help make nftables more well-rounded to bring parity to the amazingness that SQM + iptables is today.

Sincerely hoping for no hard feelings here. :+1:

P.S. You might find it "interesting" to note that NAT modules appear to be missing for nftables in the latest kernel 5.4 builds at the moment: [kernel 5.4.x | nft] NAT not working due to missing kmods :smirk:

Obviously different strokes for different folks. I find the syntax of nftables script much much better than shell scripts full of iptables commands. I admit that nftables has a few edge cases where it doesn't currently offer certain things that iptables has, but let's face it, iptables took over from ipchains in like 2000 or so. It's had 20 years of being the PRIMARY firewall to build up features.

I've moved to 100% nftables firewall. I find that it does what I want, including things like quotas for my kids to watch YouTube, and DSCP tagging in a smart way. For the most part I do QoS by putting qdiscs on Egress of WAN and on Egress of each VLAN rather than using the ingress / IFB methodology. When you do that some of these ingress limitations go away.

The big issue with that is that for people with slow connections, the sum of the VLAN egress speeds could exceed the WAN ingress speed, and that would create uncontrolled bottlenecks somewhere else. One strategy is to simply put a TBF on the IFB to limit the total bandwidth, and then use the egress on VLAN technique for more sophisticated prioritization.

Ok, for those who are still interested in this topic, can I get a show of hands of who wants to try this out, and what the goals are which aren't met already by the existing example? It'd help me focus my efforts here, since I'm doing a lot of other projects at the moment and don't have time to sit down and think through examples.

I'm up for it.

What are the requirements for your system? What do you aim to achieve, which kinds of traffic need marking, how many subnets do you have, etc.?

Extra! Extra! Read all about it!
Looks like someone finally posted a patch to start doing this: http://lists.thekelleys.org.uk/pipermail/dnsmasq-discuss/2020q3/014212.html

It's kind of stupid simple at the moment, which makes it a so-so test case. It is an LTE connection that would elevate Lync/Skype for Business/Teams to high priority.

Single subnet.

Easiest way to do this is setting up DSCP marking on the client side. Documentation from Microsoft on how to do this is here: https://docs.microsoft.com/en-us/microsoftteams/qos-in-teams-clients. Note that if you're not running a domain, you can just run gpedit.msc locally and set up the policies that way. Worked for me on Win7 & Win10. Note that this only handles the outbound streams, but that is generally where the bottleneck is...

May be true for Windows, but DSCP is hard-coded in the Teams client for Mac (last I checked). Not sure what @atiensivu is using, but I’m on Mac myself.

Also, what happens if your client DSCP marks get remapped due to WMM? Wouldn’t it be ideal to “correct” them on the egress at the router?

On a separate note, I’m currently stuck at this issue: [kernel 5.4.x | nft] NAT not working due to missing kmods - #9 by _FailSafe

As far as I can tell, all WiFi WMM does is map DSCPs to its own 8 UPs and finally to its 4 access categories (ACs); it should not change the packet's DSCP marks.
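
A small Python sketch of that two-step mapping, in case it helps. The UP-to-AC table is the standard WMM grouping; the DSCP-to-UP step assumes the common default of simply taking the top three bits of the DSCP, which individual drivers and APs may override, so treat that part as an assumption.

```python
# 802.11 user priority (UP) -> WMM access category (AC).
UP_TO_AC = {
    1: "AC_BK", 2: "AC_BK",  # background
    0: "AC_BE", 3: "AC_BE",  # best effort
    4: "AC_VI", 5: "AC_VI",  # video
    6: "AC_VO", 7: "AC_VO",  # voice
}


def dscp_to_up(dscp: int) -> int:
    """Assumed default: the UP is the top three bits of the 6-bit DSCP."""
    if not 0 <= dscp <= 0x3F:
        raise ValueError("DSCP must fit in 6 bits")
    return dscp >> 3


def wmm_class(dscp: int):
    """Return (UP, AC) for a DSCP; the DSCP value itself is only read, never rewritten."""
    up = dscp_to_up(dscp)
    return up, UP_TO_AC[up]


for name, dscp in [("CS0", 0), ("CS1", 8), ("CS4", 32), ("CS5", 40), ("EF", 46), ("CS6", 48)]:
    print(name, dscp, wmm_class(dscp))
```

Under this assumed mapping, CS4, CS5 and EF all land in the video category, and only CS6/CS7 reach voice.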

Ah, that makes sense. So the only potential issue with client DSCP marking would be if a client (e.g. a BitTorrent client) sets, or gets set manually to, a greedy mark for CS4 or CS5 (or some other high priority class) and causes airtime issues at the wireless stack?

If that’s the case, that could/would happen today anyway. My WMM point is irrelevant to this topic, then. :slightly_smiling_face:

Looking at the progress @ldir has had and the issues arising, I too have hit a wall.

https://lore.kernel.org/netfilter-devel/20191209214208.852229-1-jeremy@azazel.net/t/

From what I can gather they are adding modifications to that kernel and it looks like programming. Way beyond me and it is also not documented for nftables. The problem I have is I can do some iptables and I just built an nftables firewall. Connmark, conntrack, and act restores I am unfamiliar with even though I have been using iptables for a while. I also can't code the "shaper" with tc commands for priority tins and flow ids, resulting in not being able to assign marked packets into a priority tin on ingress even if I could mark it. I am sure I can pull some relevant information from iptables connmark and ctinfo docs but aside from that translating into nftables is proving to be a problem. nftables is looking to have problems of its own with the connmark method as opposed to the netdev method. So far, after about a year with iptables and a week with nftables I have coded secure functioning rulesets but the QoS remains broken.

At the present moment nftables has solved one thing for sure, it has made packet flow consistent on ingress on some games but unchanged in others. Quake, PUBG, Valorant, Warzone all improved tenfold. Overwatch has had no change. Most importantly Fortnite has not improved and not only has it not improved it has gotten worse like the problem isolated itself and is now a pronounced, blatant issue. Frame freeze in gaming. I am certain the problem lies in ingress. I don't see a problem with the code or the hardware. It's either dropped packets or I am not configuring the link correctly. In theory it should be netdev, forget. In reality it might involve drawing from ct/tc marking. I do not believe it has anything to do with packet rate and packet size limiting and I do not believe it has anything to do with priority and tin assignment. 2 years of work and still at square 1.....

Same issue, frame freeze, only now 5x worse.... As you can see in the video: the main video is a popular streamer, the picture in picture is me. The minor issue is I will frame freeze constantly throughout the game and worse when loading into the game in the battle arena or whatever. The major issue is when that player shoots, builds, edits the action and result is instant. With my iptables and nftables not only is there obvious freeze framing but there is action:delay:freezeframe:delay:result.

The action is the same as if I were to create a screen recording with OBS of Microsoft Paint. Say I drew out the letters of the word red.

If I go back to that recording and play it - it would be a 1:1 recording of what actually happened.

Now let's say I have all the limitations I have now. I go back and open that same video. Just after I begin drawing out the word a few seconds in I pause, then play. Then I slow the video down by 0.75 briefly then back to normal speed then pause then play then slow down briefly. The video becomes longer and the pause and delay more pronounced. Now when I look at games it has become clear it really is all I see but it feels like I am looking at the matrix. I'm not really sure what I see or what to do about it.

layer_cake.qos on the 2 interfaces. Not sure why there is a third "qdisc cake 8023" with 1 tin (last pic).

Simon Kelley just merged nftables set support for dnsmasq.

https://thekelleys.org.uk/gitweb/?p=dnsmasq.git;a=commitdiff;h=47aefca5e405b4b6627ef952fdc42e61b1baa770

That's exciting! It looks like compiling the nftset option into dnsmasq is not yet an option for OpenWrt, though. Anybody tried?