QoS and nftables ... some findings to share

Hi all, every so often there are discussions on the forum about how to do QoS at a level a little higher than the set-it-and-forget-it approach that SQM offers (which is a great service, thanks @moeller0 and others involved in SQM!). I've been doing custom QoS for several years now and have figured out some stuff I thought I'd share so we can discuss it here.

Warning 1

I don't actually use OpenWrt for my router, only for my access points. So some of this will require some help from you to debug how it works on OpenWrt.

Warning 2

This is for advanced users only, and requires REPLACING YOUR FIREWALL ENTIRELY. Make sure you UNDERSTAND and take responsibility for your own security and operations. Test this on a spare router or something. I offer this with NO WARRANTY OF ANY KIND as educational material only.

First off...

QoS is hard to do right... so we want to take advantage of the best available tools. And the best available tool for managing packets is nftables. nftables does work on OpenWrt, but it completely replaces the firewall and you can't manipulate the firewall through LuCI anymore. So, first things first, you need to get nftables working and forget about having iptables anymore. You also need to get a basic nftables firewall set up that you can customize.

To get started putting nftables on your OpenWrt router, I suggest this:
https://openwrt.org/docs/guide-user/firewall/misc/nftables

But remember, I am actually routing on Debian, so it comes with nftables installed, so hopefully someone who is interested in this topic can post some info on their experiences with setting that up.

Second

You will need an nftables firewall. Here's the bare bones, which assumes your WAN is on eth1 and LAN is on eth0. This firewall lets anything out but only allows things in if they are related to outgoing traffic. It masquerades IPv4 on the WAN, and it limits what the router itself will accept. If you have a br-lan bridge you will probably need to change eth0 here to br-lan, but I'm not sure what nftables thinks is the "in interface" when it's a bridge... so you'll have to test.

I believe this is a safe base for a firewall, but you should check it and test it yourself. This is based on editing down a much more complicated firewall that I use, so it might have bugs or typos.

# A simple stateful firewall with some packet tagging,
# based originally on nftables archlinux wiki
# https://wiki.archlinux.org/index.php/nftables

## this assumes eth0 is LAN and eth1 is WAN, modify as needed

flush ruleset

## change these

define wan = eth1
define lan = eth0

table inet filter {
	chain input {
		type filter hook input priority 0; policy drop;

		# established/related connections
		ct state established,related accept

		# loopback interface
		iifname lo accept

		## icmpv6 is a critical part of the protocol, we just
		## accept everything, you can look into making this
		## more restrictive but be careful
		ip6 nexthdr icmpv6 accept

		# we are more restrictive for ipv4 icmp
		ip protocol icmp icmp type { destination-unreachable, router-solicitation, router-advertisement, time-exceeded, parameter-problem } accept

		ip protocol igmp accept

		ip protocol icmp meta iifname $lan accept

		## ntp protocol accept from LAN
		udp dport ntp iifname $lan accept

		## DHCP accept
		iifname $lan ip protocol udp udp sport bootpc udp dport bootps log prefix "FIREWALL ACCEPT DHCP: " accept

		## DHCPv6 accept from LAN
		iifname $lan udp sport dhcpv6-client udp dport dhcpv6-server accept

		## allow dhcpv6 from router to ISP
		iifname $wan udp sport dhcpv6-server udp dport dhcpv6-client accept

		# SSH (port 22), limited to 10 new connections per minute
		# per source address. You might prefer not to allow this
		# from WAN on OpenWrt, in which case you should also add
		# an iifname $lan match at the front so we're only
		# allowing from LAN
		
		ct state new tcp dport ssh meter ssh-meter4 {ip saddr limit rate 10/minute burst 15 packets} accept
		ct state new ip6 nexthdr tcp tcp dport ssh meter ssh-meter6 {ip6 saddr limit rate 10/minute burst 15 packets} accept 

		## allow access to LUCI from LAN
		iifname $lan tcp dport {http,https} accept

		## DNS for main LAN, we limit the rates allowed from each LAN host to reduce chance of denial of service
		iifname $lan udp dport domain meter dommeter4 { ip saddr limit rate 240/minute burst 240 packets} accept
		iifname $lan udp dport domain meter dommeter6 { ip6 saddr limit rate 240/minute burst 240 packets} accept

		iifname $lan tcp dport domain meter dommeter4tcp { ip saddr limit rate 240/minute burst 240 packets} accept
		iifname $lan tcp dport domain meter dommeter6tcp { ip6 saddr limit rate 240/minute burst 240 packets} accept

		## allow remote syslog input? you might want this, or remove this
		
		# iifname $lan udp dport 514 accept

		counter log prefix "FIREWALL INPUT DROP: " drop
	}

	chain forward {
	    type filter hook forward priority 0; policy drop;

	    ct state established,related accept

	    iifname lo accept
	    iifname $lan oifname $wan accept ## allow LAN to forward to WAN

	    counter log prefix "FIREWALL FAIL FORWARDING: " drop
	}
}

## masquerading for ipv4 output on WAN
table ip masq {
	chain masqout {
	    type nat hook postrouting priority 0; policy accept;
	    oifname $wan masquerade

	}

	## this empty chain is required to make the kernel do the unmasquerading
	chain masqin {
	    type nat hook prerouting priority 0; policy accept;

	}
	
}
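On the br-lan question raised before the config: my understanding is that for routed traffic the inet hooks see the bridge device itself as the input interface, so the bridge name is what `iifname` should match. Here is a minimal sketch for picking the value to put in `define lan`. The function name `pick_lan_ifname` and its arguments are hypothetical, and the assumption about bridges should be tested on your own device.

```shell
#!/bin/sh
# Hypothetical helper: choose the interface name to use in "define lan = ...".
# Assumption (please test): when the LAN is bridged, routed traffic shows up
# on the bridge device, so the bridge name is what iifname needs to match.
#   $1 = physical LAN port, e.g. eth0
#   $2 = bridge name (default br-lan)
#   $3 = sysfs net directory (default /sys/class/net, overridable for testing)
pick_lan_ifname() {
    port=$1
    bridge=${2:-br-lan}
    sysnet=${3:-/sys/class/net}
    # A bridge device exposes a "bridge" subdirectory in sysfs.
    if [ -d "$sysnet/$bridge/bridge" ]; then
        echo "$bridge"
    else
        echo "$port"
    fi
}
```

On the router you would run `pick_lan_ifname eth0` and put the result in the `define lan` line at the top of the config.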

Once you have got this working and debugged... you can also add special tables for traffic control, including DSCP tagging and policing. For example, I add a table a lot like this one to my firewall. You can simply plop in additional tables; there is no restriction like in iptables, where you have to hook into the FORWARD chain or whatever.


table netdev retag {
	chain tagin {
		## mangle priority, retag anything coming from LAN, you
		## might want to do the same for anything coming from WAN
		## (assumed to be eth1)

		type filter hook ingress device eth0 priority -149; policy accept;

		ip dscp set cs3 ## convert all to cs3 first, this is my base DSCP tag rather than cs0
		ip6 dscp set cs3

		# tag ntp packets very high priority
		ip protocol udp udp sport ntp ip dscp set cs6
		ip6 nexthdr udp udp sport ntp ip6 dscp set cs6

		## icmp/icmpv6 gets high priority but you might not want
		## this!  it does let you find out what the round trip time
		## for high priority packets is by just using ping though
		ip protocol icmp ip dscp set cs5
		ip6 nexthdr icmpv6 ip6 dscp set cs5

		## game traffic on ip and ipv6
		udp dport {7000-9000, 27000-27200} ip dscp set cs5
		udp sport {7000-9000, 27000-27200} ip dscp set cs5

		ip6 nexthdr udp udp dport {7000-9000, 27000-27200} ip6 dscp set cs5
		ip6 nexthdr udp udp sport {7000-9000, 27000-27200} ip6 dscp set cs5

		# I have a custom shaper with different classes 1:10, 1:20
		# are realtime, 1:30 is high priority nonrealtime, 1:40 is
		# normal, 1:50 is nfs fileserver bulk, 1:60 is very low
		# priority, if you use cake, you can remove this whole thing
		# and just use layer cake, it will use the DSCP on its own

		meta priority set 1:40 ## default

		ip dscp {ef,cs6} meta priority set 1:10
		ip dscp {cs5} meta priority set 1:20
		ip dscp {af41, af42, af43} meta priority set 1:30
		ip dscp {cs2} meta priority set 1:50
		ip dscp {cs1} meta priority set 1:60

		ip6 dscp {ef,cs6} meta priority set 1:10
		ip6 dscp {cs5} meta priority set 1:20
		ip6 dscp {af41, af42, af43} meta priority set 1:30
		ip6 dscp {cs2} meta priority set 1:50
		ip6 dscp {cs1} meta priority set 1:60
	}
}



This is to get you started. You could add your own tagging here, for example rules that detect bulk traffic and tag it cs1, and you might not want to use my classification scheme: CS3 as the base priority level might not be a good idea on your network. But the point here is that this is custom, so YOU make the decisions.
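If you want to check your own variations against the scheme in the retag table, here is a small sketch of the same mapping as shell lookups, plus a tcpdump filter to see whether packets really carry a given tag. The function names are hypothetical; the DSCP numbers are the standard code points, and the class IDs are the ones from my shaper above, not anything cake or SQM defines.

```shell
#!/bin/sh
# Standard DSCP code point for the class names used in the retag table.
dscp_value() {
    case $1 in
        cs0) echo 0 ;;
        cs1) echo 8 ;;
        cs2) echo 16 ;;
        cs3) echo 24 ;;
        cs5) echo 40 ;;
        cs6) echo 48 ;;
        ef) echo 46 ;;
        af41) echo 34 ;;
        af42) echo 36 ;;
        af43) echo 38 ;;
        *) return 1 ;;
    esac
}

# Shaper class that the retag table assigns to a DSCP name
# (1:40 is the default for everything not listed).
dscp_to_class() {
    case $1 in
        ef|cs6) echo 1:10 ;;
        cs5) echo 1:20 ;;
        af41|af42|af43) echo 1:30 ;;
        cs2) echo 1:50 ;;
        cs1) echo 1:60 ;;
        *) echo 1:40 ;;
    esac
}

# tcpdump filter matching IPv4 packets that carry the given DSCP name.
# DSCP is the upper six bits of the second IPv4 header byte, hence value * 4.
dscp_tcpdump_filter() {
    v=$(dscp_value "$1") || return 1
    echo "ip and (ip[1] & 0xfc) == $((v * 4))"
}
```

For example, `tcpdump -ni eth1 "$(dscp_tcpdump_filter cs5)"` on the WAN side should show the game and ICMP traffic if the tagging is working (eth1 being the assumed WAN device).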

Because this tag table is hooked to "ingress" at extremely high priority, it gets run before queueing in the IFB. That means you could have one table to tag inbound packets from eth1 and one to tag packets from eth0, then use SQM on the WAN, and the IFB packets will be queued according to the DSCP tags (I believe... please test!)

HOPE THIS GETS YOU STARTED... and would be happy to discuss here for other QoS nerds.

If anyone tries this out and reports bugs in getting stuff going, I will update those scripts in the first post so we can be sure to have a working base config.

Once you've got this going, I can offer some more sophisticated rules for DSCP tagging, particularly for automatically tagging sparse UDP streams like game traffic based on the traffic characteristics itself (packet size and rate etc) as well as down-prioritizing bulk traffic...

But I figure I'll leave that until someone shows some interest and has confirmed they have got this working on some actual OpenWrt device! so have at it!

Ok, so I had one taker: @LeTran seemed interested in getting this going. To start with, we need to get nftables installed. The device is acting as a pure bufferbloat appliance with a bridge between eth0.2 and eth1.3, so there is no firewalling going on anyway.

This is similar to @mindwolf with his bump in the wire application, though there I think he's doing routing not bridging.

My Wrt-ac3200 is a pure bridge, so updating the package lists is not working. I can't install nftables without internet. What is your suggestion?

gave it a shot (sourcing the conf fails badly, it appears to have parsing/hook issues). nftables might be a bit buggy I think... (needs input from someone with developer-level access/interest, jeff?, to confirm the underlying systems are OK)

as in the link above... entering the rules via the CLI yields better results but is kinda impractical.

To load the conf you run nft -f /etc/nftables.conf. You can't source it, it's not a shell script.

If you get errors doing that, copy and paste them here!
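A check-then-load wrapper might look like the sketch below. It relies only on `nft -c -f` (parse and validate without applying) and `nft -f` (apply); the function name `load_ruleset` is hypothetical.

```shell
#!/bin/sh
# Validate a ruleset file first, and load it only if the check passes.
#   $1 = ruleset file (default /etc/nftables.conf)
load_ruleset() {
    conf=${1:-/etc/nftables.conf}
    # -c checks the file without changing the running ruleset.
    if nft -c -f "$conf"; then
        nft -f "$conf"
    else
        echo "ruleset check failed, nothing loaded: $conf" >&2
        return 1
    fi
}
```

Since the config starts with `flush ruleset`, checking first is a cheap way to find syntax errors before you try to replace the running firewall.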

yeah, that's what I meant... load would have been a better word...

ditched the vm... but something like... for starters

iif
^^^
http
^^^

nft -c -f ./samplefile.conf

works for me on my Debian machine, so I'm guessing you maybe need some nftables stuff installed that isn't installed yet. I do know that @jeff mentioned using it, and obviously the wiki author got it working, so I think nftables works on OpenWrt under some kind of circumstances. I just don't know what those circs are, and I don't have a spare to test it on unfortunately :frowning:

A spare device should not be necessary since OpenWrt can be run in an x86 VM (32 or 64 bit).

Yeah, that's a good point, and I don't have enough experience with virtual machines so I should probably try that out just to say "hey I know how that stuff works :slight_smile: "

@mpa I Downloaded and installed x86_64 image for very latest 19.07.0 and did opkg update but there is no nftables package? Is this just a lag in the build system?

nevermind... selecting macvtap in virt-manager config didn't result in the virtual machine having any network interfaces at all. What's the "proper" way to give openwrt a network interface?

see I knew there was a reason I didn't futz with VMs much... sigh

OK, getting a little off topic but I managed to set up two ethernets as macvtap but now... openwrt doesn't come with nano installed! yikes I have to actually use vi!

all right. emergency over, I've got nano and nftables installed! whew :sweat_smile: :upside_down_face:

Ok I'll start with a new post here...

I installed a virtual machine... configed stuff... added nano so I could adjust things... and then did the following:

service firewall disable

to disable the standard firewall.

then created a directory for modules we don't use:
mkdir /etc/modules.d.unused
and moved the following things from modules.d to the unused directory:

42-ip6tables
ipt-conntrack
ipt-core
ipt-nat
ipt-offload

I then installed the packages:
opkg install nftables kmod-nf-conntrack kmod-nf-conntrack6 kmod-nf-nat kmod-nf-reject kmod-nf-reject6 kmod-nfnetlink kmod-nft-core kmod-nft-nat kmod-nft-netdev libnftnl

reboot, and you have an openwrt with no iptables firewall
at this point
nft -f /etc/nftables.conf works if you put my script into /etc/nftables.conf

so hopefully that gets someone up and running...
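The module-moving step can be sketched as a small function that is safe to run more than once. The name `stash_iptables_modules` and the optional root-prefix argument are hypothetical (the prefix is only there so you can try it in a scratch directory first).

```shell
#!/bin/sh
# Move the iptables module autoload files out of /etc/modules.d so they are
# not loaded at boot.
#   $1 = optional root prefix (empty on a real router)
stash_iptables_modules() {
    root=${1:-}
    src="$root/etc/modules.d"
    dst="$root/etc/modules.d.unused"
    mkdir -p "$dst" || return 1
    for f in "$src"/ipt* "$src"/42-ip6tables; do
        # Skip patterns that matched nothing (already moved, or not installed).
        [ -e "$f" ] || continue
        mv "$f" "$dst"/ || return 1
    done
    return 0
}
```

On the router you would call `stash_iptables_modules` with no argument, after `service firewall disable` and before the `opkg install` and reboot steps described above.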

Some stupid questions XD

This is a script or not?
Does "created a directory for modules" mean I have to create a file named "modules.d.unused" in /etc/? What is mkdir?

Is this the script in the file "modules.d.unused"?

mkdir is the command to make a directory. you can type this command right at the command line. The three commands are:

mkdir /etc/modules.d.unused
mv /etc/modules.d/ipt* /etc/modules.d.unused
mv /etc/modules.d/42-ip6tables /etc/modules.d.unused

Copy and paste those three lines, one at a time to the command line on the router.

Done running those 3 commands.

The asterisk * means it will automatically expand ipt* to ipt-conntrack, ipt-core, ipt-nat in this command
"mv /etc/modules.d/ipt* /etc/modules.d.unused", right?

yes that's right.

Now I think it makes sense to go to LuCI, go to System > Startup and scroll to the bottom. Before the line exit 0, put
/usr/sbin/nft -f /etc/nftables.conf

and copy the script from above to /etc/nftables.conf
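If you'd rather do this from the command line, here is a sketch that inserts the same line before `exit 0`. It assumes the LuCI startup box edits /etc/rc.local, which I believe is the case but please check on your build; the function name `add_nft_to_startup` is hypothetical.

```shell
#!/bin/sh
# Insert the nft load command before the first line that is exactly "exit 0"
# in an rc.local-style file, unless the command is already there.
#   $1 = file to edit (default /etc/rc.local)
add_nft_to_startup() {
    rc=${1:-/etc/rc.local}
    line='/usr/sbin/nft -f /etc/nftables.conf'
    # Nothing to do if the exact line is already present.
    grep -qxF "$line" "$rc" && return 0
    tmp="$rc.tmp.$$"
    awk -v line="$line" '
        /^exit 0[ \t]*$/ && !added { print line; added = 1 }
        { print }
        END { if (!added) print line }
    ' "$rc" > "$tmp" && cat "$tmp" > "$rc" && rm -f "$tmp"
}
```

Writing the result back with `cat` rather than `mv` keeps the original file's permissions. If the file has no `exit 0` line, the command is appended at the end.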

I will have to leave it for that today, and come back tomorrow.

you meant this script /usr/sbin/nft -f /etc/nftables.conf ?

it's weird that I can't find nftables.conf in /etc/ of Winscp.
Pretty sure i installed packages with your script
opkg install nftables kmod-nf-conntrack kmod-nf-conntrack6 kmod-nf-nat kmod-nf-reject kmod-nf-reject6 kmod-nfnetlink kmod-nft-core kmod-nft-nat libnftnl nftables

yes, you will have to scp it over there. First copy and paste the script from the first post into a file. then scp that file from your computer to the router.

also I will edit the file to make LAN and WAN into variables, so it's easy to adjust to the particular hardware people use... hold on.

Ok, I changed that script. It uses $lan and $wan now to describe the interfaces, so at the top of the script you can plug in the values your router needs, like eth0.1 or pppoe-wan or anything like that.

It uses device names.

You meant I have to create a file in /etc/ and name it nftables.conf
after that i have to copy and paste this code

# A simple stateful firewall with some packet tagging,
# based originally on nftables archlinux wiki
# https://wiki.archlinux.org/index.php/nftables

## this assumes eth0 is LAN and eth1 is WAN, modify as needed

flush ruleset

## change these

define wan = eth1
define lan = eth0

table inet filter {
	chain input {
		type filter hook input priority 0; policy drop;

		# established/related connections
		ct state established,related accept

		# loopback interface
		iifname lo accept

		## icmpv6 is a critical part of the protocol, we just
		## accept everything, you can look into making this
		## more restrictive but be careful
		ip6 nexthdr icmpv6 accept

		# we are more restrictive for ipv4 icmp
		ip protocol icmp icmp type { destination-unreachable, router-solicitation, router-advertisement, time-exceeded, parameter-problem } accept

		ip protocol igmp accept

		ip protocol icmp meta iifname $lan accept

		## ntp protocol accept from LAN
		udp dport ntp iifname $lan accept

		## DHCP accept
		iifname $lan ip protocol udp udp sport bootpc udp dport bootps log prefix "FIREWALL ACCEPT DHCP: " accept

		## DHCPv6 accept from LAN
		iifname $lan udp sport dhcpv6-client udp dport dhcpv6-server accept

		## allow dhcpv6 from router to ISP
		iifname $wan udp sport dhcpv6-server udp dport dhcpv6-client accept

		# SSH (port 22), limited to 10 new connections per minute
		# per source address. You might prefer not to allow this
		# from WAN on OpenWrt, in which case you should also add
		# an iifname $lan match at the front so we're only
		# allowing from LAN
		
		ct state new tcp dport ssh meter ssh-meter4 {ip saddr limit rate 10/minute burst 15 packets} accept
		ct state new ip6 nexthdr tcp tcp dport ssh meter ssh-meter6 {ip6 saddr limit rate 10/minute burst 15 packets} accept 

		## allow access to LUCI from LAN
		iifname $lan tcp dport {http,https} accept

		## DNS for main LAN, we limit the rates allowed from each LAN host to reduce chance of denial of service
		iifname $lan udp dport domain meter dommeter4 { ip saddr limit rate 240/minute burst 240 packets} accept
		iifname $lan udp dport domain meter dommeter6 { ip6 saddr limit rate 240/minute burst 240 packets} accept

		iifname $lan tcp dport domain meter dommeter4tcp { ip saddr limit rate 240/minute burst 240 packets} accept
		iifname $lan tcp dport domain meter dommeter6tcp { ip6 saddr limit rate 240/minute burst 240 packets} accept

		## allow remote syslog input? you might want this, or remove this
		
		# iifname $lan udp dport 514 accept

		counter log prefix "FIREWALL INPUT DROP: " drop
	}

	chain forward {
	    type filter hook forward priority 0; policy drop;

	    ct state established,related accept

	    iifname lo accept
	    iifname $lan oifname $wan accept ## allow LAN to forward to WAN

	    counter log prefix "FIREWALL FAIL FORWARDING: " drop
	}
}

## masquerading for ipv4 output on WAN
table ip masq {
	chain masqout {
	    type nat hook postrouting priority 0; policy accept;
	    oifname $wan masquerade

	}

	## this empty chain is required to make the kernel do the unmasquerading
	chain masqin {
	    type nat hook prerouting priority 0; policy accept;

	}
	
}

and this

table netdev retag {
    chain tagin {
    	  ## mangle priority, retag anything coming from LAN, you
    	  ## might want to do the same for anything coming from WAN
    	  ## (assumed to be eth1)
	  
    	  type filter hook ingress device eth0 priority -149; policy accept;

	  ip dscp set cs3 ## convert all to cs3 first, this is my base DSCP tag rather than cs0
	  ip6 dscp set cs3

	  # tag ntp packets very high priority
	  ip protocol udp udp sport ntp ip dscp set cs6
	  ip6 nexthdr udp udp sport ntp ip6 dscp set cs6


	  ## icmp/icmpv6 gets high priority but you might not want
	  ## this!  it does let you find out what the round trip time
	  ## for high priority packets is by just using ping though
	  ip protocol icmp ip dscp set cs5
	  ip6 nexthdr icmpv6 ip6 dscp set cs5 

	  ## game traffic on ip and ipv6
	  udp dport {7000-9000, 27000-27200} ip dscp set cs5
	  udp sport {7000-9000, 27000-27200} ip dscp set cs5

	  ip6 nexthdr udp udp dport {7000-9000, 27000-27200} ip6 dscp set cs5
	  ip6 nexthdr udp udp sport {7000-9000, 27000-27200} ip6 dscp set cs5

	  # I have a custom shaper with different classes 1:10, 1:20
	  # are realtime, 1:30 is high priority nonrealtime, 1:40 is
	  # normal, 1:50 is nfs fileserver bulk, 1:60 is very low
	  # priority, if you use cake, you can remove this whole thing
	  # and just use layer cake, it will use the DSCP on its own
	  

	  meta priority set 1:40 ## default

	  ip dscp {ef,cs6} meta priority set 1:10
	  ip dscp {cs5} meta priority set 1:20
	  ip dscp {af41, af42, af43} meta priority set 1:30
	  ip dscp {cs2} meta priority set 1:50
	  ip dscp {cs1} meta priority set 1:60

	  ip6 dscp {ef,cs6} meta priority set 1:10
	  ip6 dscp {cs5} meta priority set 1:20
	  ip6 dscp {af41, af42, af43} meta priority set 1:30
	  ip6 dscp {cs2} meta priority set 1:50
	  ip6 dscp {cs1} meta priority set 1:60

    }
}

into this nftables.conf that i just created?

That could work, but you'd have to open the file in an editor on the router. You could also copy and paste the text into a file on your computer with notepad or something, and then just use scp or filezilla or something similar to copy the file over to the router.
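From a computer with scp and ssh, the copy step could be sketched like this. The default address root@192.168.1.1 is an assumption (the usual OpenWrt default), and the function name `push_ruleset` is hypothetical.

```shell
#!/bin/sh
# Copy a local ruleset file to the router, then ask nft on the router to
# check it without loading it.
#   $1 = local file
#   $2 = ssh target (assumed default: root@192.168.1.1)
push_ruleset() {
    file=$1
    host=${2:-root@192.168.1.1}
    scp "$file" "$host:/etc/nftables.conf" || return 1
    # -c validates only; the running ruleset is left untouched.
    ssh "$host" nft -c -f /etc/nftables.conf
}
```

For example, `push_ruleset ./nftables.conf` copies the file and prints any parse errors, which you can then paste here.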

so

define wan = eth1
define lan = eth0

change into

define wan = eth0.3
define lan = eth1.2
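That edit can also be scripted. Here is a minimal sketch with sed, assuming the two `define` lines are written exactly as in the script above; the function name `set_ifaces` is hypothetical.

```shell
#!/bin/sh
# Rewrite the "define wan = ..." and "define lan = ..." lines of a ruleset.
#   $1 = ruleset file, $2 = WAN device name, $3 = LAN device name
# Device names containing "|" or "&" would confuse sed; normal names are fine.
set_ifaces() {
    conf=$1
    wan=$2
    lan=$3
    tmp="$conf.tmp.$$"
    sed -e "s|^define wan = .*|define wan = $wan|" \
        -e "s|^define lan = .*|define lan = $lan|" \
        "$conf" > "$tmp" && cat "$tmp" > "$conf" && rm -f "$tmp"
}
```

For example, `set_ifaces /etc/nftables.conf eth0.3 eth1.2`. Note that the retag table names eth0 directly in its `hook ingress device` line, so that one still needs changing by hand.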