NFtables and QoS in 2021

gabrielo · August 20, 2022, 9:59pm

If you are using cake qdiscs, make sure when testing that they are created with the nowash option so the DSCP is preserved,

because the wash option clears the DSCP field of every packet after placing it in its designated tin, so tcpdump cannot see any DSCP marks downstream of cake.
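A quick way to check this before testing is to look for the nowash keyword in what tc reports for the qdisc. The helper below is only a sketch: the function name is made up, and it assumes the usual one-line `tc qdisc show` output in which cake prints either `wash` or `nowash`.

```sh
#!/bin/sh
# Hypothetical helper: succeed if a line of `tc qdisc show` output
# describes a cake qdisc that keeps DSCP marks (nowash).
cake_preserves_dscp() {
    case " $1 " in
        *" nowash "*) return 0 ;;   # DSCP is left untouched
        *)            return 1 ;;   # wash is set, or this is not cake output
    esac
}
```

Typical use on the router would be `cake_preserves_dscp "$(tc qdisc show dev ifb-ul)" && echo "DSCP preserved"`, where ifb-ul stands for whichever interface carries the cake instance.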

dave14305 · August 21, 2022, 1:18am

No effect on what? Be specific. Post output. Add some counters to your rules and see if they are being hit.

Lynx · August 21, 2022, 3:42pm

OK I have created the file 20-try.nft in /etc/nftables.d/ with content:

chain tagin {
          ## mangle priority, retag anything coming from LAN, you
          ## might want to do the same for anything coming from WAN
          ## (assumed to be eth1)

          type filter hook ingress device br-lan priority -149; policy accept;

          ip dscp set cs3 ## convert all to cs3 first, this is my base DSCP tag rather than cs0
          ip6 dscp set cs3

          # tag ntp packets very high priority
          ip protocol udp udp sport ntp ip dscp set cs6
          ip6 nexthdr udp udp sport ntp ip6 dscp set cs6


          ## icmp/icmpv6 gets high priority but you might not want
          ## this!  it does let you find out what the round trip time
          ## for high priority packets is by just using ping though
          ip protocol icmp ip dscp set cs5
          ip6 nexthdr icmpv6 ip6 dscp set cs5

          ## game traffic on ip and ipv6
          udp dport {7000-9000, 27000-27200} ip dscp set cs5
          udp sport {7000-9000, 27000-27200} ip dscp set cs5

          ip6 nexthdr udp udp dport {7000-9000, 27000-27200} ip6 dscp set cs5
          ip6 nexthdr udp udp sport {7000-9000, 27000-27200} ip6 dscp set cs5

          # I have a custom shaper with different classes 1:10, 1:20
          # are realtime, 1:30 is high priority nonrealtime, 1:40 is
          # normal, 1:50 is nfs fileserver bulk, 1:60 is very low
          # priority, if you use cake, you can remove this whole thing
          # and just use layer cake, it will use the DSCP on its own


          meta priority set 1:40 ## default

          ip dscp {ef,cs6} meta priority set 1:10
          ip dscp {cs5} meta priority set 1:20
          ip dscp {af41, af42, af43} meta priority set 1:30
          ip dscp {cs2} meta priority set 1:50
          ip dscp {cs1} meta priority set 1:60

          ip6 dscp {ef,cs6} meta priority set 1:10
          ip6 dscp {cs5} meta priority set 1:20
          ip6 dscp {af41, af42, af43} meta priority set 1:30
          ip6 dscp {cs2} meta priority set 1:50
          ip6 dscp {cs1} meta priority set 1:60

    }

then:

/etc/init.d/firewall restart

then:

tcpdump -i br-lan -vv

and this gives output:

tcpdump: listening on br-lan, link-type EN10MB (Ethernet), capture size 262144 bytes
16:41:04.968162 IP (tos 0x0, ttl 128, id 40940, offset 0, flags [none], proto ICMP (1), length 60)
    XXX.lan > one.one.one.one: ICMP echo request, id 1, seq 83, length 40
16:41:05.011274 IP (tos 0x0, ttl 58, id 1754, offset 0, flags [none], proto ICMP (1), length 60)
    one.one.one.one > XXX.lan: ICMP echo reply, id 1, seq 83, length 40

So as you can see the ingress tos value is not getting set as per:

          ip protocol icmp ip dscp set cs5
          ip6 nexthdr icmpv6 ip6 dscp set cs5

and in the context of the ingress hook:


          type filter hook ingress device br-lan priority -149; policy accept;

What am I missing?
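For reference when reading the tcpdump output: the tos value is the whole 8-bit field, with the DSCP in the upper six bits, so a packet that really carried cs5 would show tos 0xa0 rather than 0x0. A small sketch of the conversion (the function name is made up, and only the DSCP names used in this thread are covered):

```sh
#!/bin/sh
# Hypothetical helper: print the tos byte tcpdump would show for a DSCP name,
# assuming the two ECN bits are zero.
dscp_to_tos() {
    case "$1" in
        cs0) d=0 ;;   cs1) d=8 ;;   cs2) d=16 ;;  cs3) d=24 ;;
        cs4) d=32 ;;  cs5) d=40 ;;  cs6) d=48 ;;  cs7) d=56 ;;
        af41) d=34 ;; af42) d=36 ;; af43) d=38 ;; ef) d=46 ;;
        *) echo "unknown DSCP name: $1" >&2; return 1 ;;
    esac
    # DSCP occupies the top six bits of the tos byte
    printf '0x%x\n' $(( d << 2 ))
}
```

So `dscp_to_tos cs5` gives the value to look for on the ICMP packets, and `dscp_to_tos cs3` the value expected on everything else under the rules above.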

dave14305 · August 21, 2022, 5:41pm

Try checking the tcpdump on your ifb interface, or check the cake stats on that interface. I don’t know if tcpdump will capture br-lan before or after the ingress hook in an inet table.

Did you add any counters to the marking rules? Just add the keyword counter at the end of some of your statements. Then check them with nft list chain inet fw4 tagin
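If typing the keyword onto every rule by hand is tedious, something like the following could do it in one pass. This is a rough sketch, not a tested tool: the function name is made up, it only touches lines containing a `dscp set` or `meta priority set` statement, and it assumes any `#` on such a line starts a trailing comment.

```sh
#!/bin/sh
# Hypothetical helper: append "counter" to every nft rule line that sets a
# DSCP or a priority. Reads a chain body on stdin, writes it to stdout.
add_counters() {
    awk '
    /^[ \t]*#/ { print; next }                 # leave pure comment lines alone
    /(dscp|meta priority) set / && !/ counter/ {
        pos = index($0, "#")                   # start of a trailing comment, if any
        if (pos > 0) {
            rule = substr($0, 1, pos - 1)
            comment = substr($0, pos)
            sub(/[ \t]+$/, "", rule)
            print rule " counter " comment
        } else {
            sub(/[ \t]+$/, "")
            print $0 " counter"
        }
        next
    }
    { print }
    '
}
```

For example, `add_counters < /etc/nftables.d/20-try.nft` prints the modified chain so it can be reviewed before replacing the file and restarting the firewall.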

Dopam-IT_1987 · August 21, 2022, 6:10pm

@Lynx

Very good work. I have tried to make a special version of the elan script that adds a VPN interface,

like this:

## Go to "Network -> Interfaces" and write the name of your "WAN" interface.
WAN="wan"
VPN="tun0" ### added by me
## Add veth devices and OpenVPN
ip link set veth0 up
ip link set veth1 up
ip link set veth1 promisc on
ip link set veth1 master br-lan
ip rule del priority 100 > /dev/null 2>&1
ip route del table 100 > /dev/null 2>&1
ip route add default dev veth0 table 100
ip rule add iif $WAN priority 100 table 100
ip -6 rule del priority 100 > /dev/null 2>&1
ip -6 route del table 100 > /dev/null 2>&1
ip -6 route add default dev veth0 table 100
ip -6 rule add iif $WAN priority 100 table 100
ip rule add iif $VPN priority 100 table 100 ## added by me
ip -6 rule del priority 100 > /dev/null 2>&1
ip -6 route del table 100 > /dev/null 2>&1
ip -6 route add default dev veth0 table 100
ip -6 rule add iif $VPN priority 100 table 100
## Delete the old qdiscs created by the script
tc qdisc del dev veth0 root > /dev/null 2>&1
tc qdisc del dev $WAN root > /dev/null 2>&1
tc qdisc del dev $VPN root > /dev/null 2>&1 ## added by me, as is everything marked above
## Inbound / Ingress
if [ "$BANDWIDTH_DOWN" != "" ]; then
    tc qdisc add dev $VPN root cake $BANDWIDTH_DOWN_CAKE $AUTORATE_INGRESS_CAKE $PRIORITY_QUEUE_INGRESS $HOST_ISOLATION_INGRESS $NAT_INGRESS $WASH_INGRESS $INGRESS_MODE $RTT $COMMON_LINK_PRESETS $ETHER_VLAN_KEYWORD $LINK_COMPENSATION $OVERHEAD $MPU $EXTRA_PARAMETERS_INGRESS
fi
## Delete the old qdiscs created by the script
    tc qdisc del dev veth0 root > /dev/null 2>&1
    tc qdisc del dev $WAN root > /dev/null 2>&1
    tc qdisc del dev $VPN root > /dev/null 2>&1 ## added by me

Tell me if you think this is a good idea.

For the moment, with CyberGhost VPN I have no bufferbloat on upload but still some on download. I have only just started experimenting with the VPN using this script.

My bufferbloat without VPN: https://www.waveform.com/tools/bufferbloat?test-id=03760703-49e5-4e03-9b21-c6fec47c6550

With VPN but without the added settings: https://www.waveform.com/tools/bufferbloat?test-id=e7956a3d-f2c6-4b25-9998-39bf5be562d1

With VPN and my new added settings: https://www.waveform.com/tools/bufferbloat?test-id=6c67837a-54ff-4946-a186-bd444fbeb315

Lynx · August 21, 2022, 7:51pm

Your knowledge of all this obscure networking stuff is rather intimidating! Your suggestions seem to be helping push the needle. Here are my findings.

root@OpenWrt:~# tcpdump -i ifb-ul -vv host 1.1.1.1
tcpdump: listening on ifb-ul, link-type EN10MB (Ethernet), capture size 262144 bytes
20:47:09.022506 IP (tos 0x0, ttl 128, id 40952, offset 0, flags [none], proto ICMP (1), length 60)
    XXXX.lan > one.one.one.one: ICMP echo request, id 1, seq 101, length 40

root@OpenWrt:~# tcpdump -i vpn -vv host 1.1.1.1
tcpdump: listening on vpn, link-type RAW (Raw IP), capture size 262144 bytes
20:47:58.913343 IP (tos 0x0, ttl 127, id 40956, offset 0, flags [none], proto ICMP (1), length 60)
    YY.YY > one.one.one.one: ICMP echo request, id 1, seq 105, length 40

Without counters:

root@OpenWrt:~# nft list chain inet fw4 tagin
table inet fw4 {
        chain tagin {
                type filter hook ingress device "br-lan" priority mangle + 1; policy accept;
                ip dscp set cs3
                ip6 dscp set cs3
                ip protocol udp udp sport 123 ip dscp set cs6
                ip6 nexthdr udp udp sport 123 ip6 dscp set cs6
                ip protocol icmp ip dscp set cs5
                ip6 nexthdr ipv6-icmp ip6 dscp set cs5
                udp dport { 7000-9000, 27000-27200 } ip dscp set cs5
                udp sport { 7000-9000, 27000-27200 } ip dscp set cs5
                ip6 nexthdr udp udp dport { 7000-9000, 27000-27200 } ip6 dscp set cs5
                ip6 nexthdr udp udp sport { 7000-9000, 27000-27200 } ip6 dscp set cs5
                meta priority set 1:40
                ip dscp { ef, cs6 } meta priority set 1:10
                ip dscp cs5 meta priority set 1:20
                ip dscp { af41, af42, af43 } meta priority set 1:30
                ip dscp cs2 meta priority set 1:50
                ip dscp cs1 meta priority set 1:60
                ip6 dscp { ef, cs6 } meta priority set 1:10
                ip6 dscp cs5 meta priority set 1:20
                ip6 dscp { af41, af42, af43 } meta priority set 1:30
                ip6 dscp cs2 meta priority set 1:50
                ip6 dscp cs1 meta priority set 1:60
        }
}

With counter on icmp:

root@OpenWrt:/etc/nftables.d# nft list chain inet fw4 tagin
table inet fw4 {
        chain tagin {
                type filter hook ingress device "br-lan" priority mangle + 1; policy accept;
                ip dscp set cs3
                ip6 dscp set cs3
                ip protocol udp udp sport 123 ip dscp set cs6
                ip6 nexthdr udp udp sport 123 ip6 dscp set cs6
                ip protocol icmp ip dscp set cs5 counter packets 0 bytes 0
                ip6 nexthdr ipv6-icmp ip6 dscp set cs5 counter packets 2 bytes 144
                udp dport { 7000-9000, 27000-27200 } ip dscp set cs5
                udp sport { 7000-9000, 27000-27200 } ip dscp set cs5
                ip6 nexthdr udp udp dport { 7000-9000, 27000-27200 } ip6 dscp set cs5
                ip6 nexthdr udp udp sport { 7000-9000, 27000-27200 } ip6 dscp set cs5
                meta priority set 1:40
                ip dscp { ef, cs6 } meta priority set 1:10
                ip dscp cs5 meta priority set 1:20
                ip dscp { af41, af42, af43 } meta priority set 1:30
                ip dscp cs2 meta priority set 1:50
                ip dscp cs1 meta priority set 1:60
                ip6 dscp { ef, cs6 } meta priority set 1:10
                ip6 dscp cs5 meta priority set 1:20
                ip6 dscp { af41, af42, af43 } meta priority set 1:30
                ip6 dscp cs2 meta priority set 1:50
                ip6 dscp cs1 meta priority set 1:60
        }
}

So packets '0' indicates a problem I think? Any idea what's wrong? Am I missing certain required packages? This is with a very recent 22.03 snapshot on RT3200 but I may well not have certain required kmod? packages.

dlakelan · August 21, 2022, 9:19pm

I'm not sure if a bridge has an ingress. It may be that the individual devices that are slaved to the bridge are the ones with the ingress?

In any case, you probably don't want ingress on br-lan, you can just manipulate the DSCP during normal inet table processing no?

dave14305 · August 21, 2022, 9:39pm

He mirrors (with tc) ingress on br-lan and br-guest to an ifb (where cake is instantiated) to control upload before being sent to a Wireguard tunnel or not (via PBR).

Dopam-IT_1987 · August 21, 2022, 9:39pm

Good evening. nftables is fantastic, my games are going really well.

Congratulations again to the OpenWrt team.

[Screenshot 2022-08-21 at 23.34.22]

I can't always copy from PuTTY, but @moeller0 @dlakelan, do you think this output is correct?

VPN + script gives me very good results.

[Screenshot 2022-08-21 at 23.37.00]

Lynx · August 21, 2022, 10:26pm

Exactly this. Here is my script:

start() {

	# ifb interface for handling ingress on WAN (and VPN interface if wg show reports endpoint)
	ip link add name ifb-ul type ifb
	ip link add name ifb-dl type ifb
	ip link set ifb-ul up
	ip link set ifb-dl up

	tc qdisc add dev br-lan handle ffff: ingress
	tc qdisc add dev br-guest handle ffff: ingress
	tc qdisc add dev br-lan handle 1: root prio
	tc qdisc add dev br-guest handle 1: root prio

	# capture upload (ingress) on br-lan and br-guest
	tc filter add dev br-lan parent ffff: protocol ip prio 1 u32 match ip dst 192.168.1.0/24 action pass
	tc filter add dev br-lan parent ffff: prio 2 matchall action mirred egress redirect dev ifb-ul
	tc filter add dev br-guest parent ffff: protocol ip prio 1 u32 match ip dst 192.168.2.0/24 action pass
	tc filter add dev br-guest parent ffff: prio 2 matchall action mirred egress redirect dev ifb-ul

	# capture download (egress) on br-lan and br-guest
	tc filter add dev br-lan parent 1: protocol ip prio 1 u32 match ip src 192.168.1.0/24 action pass
	tc filter add dev br-lan parent 1: prio 2 matchall action mirred egress redirect dev ifb-dl
	tc filter add dev br-guest parent 1: protocol ip prio 1 u32 match ip src 192.168.2.0/24 action pass
	tc filter add dev br-guest parent 1: prio 2 matchall action mirred egress redirect dev ifb-dl

	# apply CAKE on the ifbs
	tc qdisc add dev ifb-dl root cake bandwidth 30Mbit diffserv4 triple-isolate nonat nowash no-ack-filter split-gso rtt 100ms noatm overhead 92
	tc qdisc add dev ifb-ul root cake bandwidth 25Mbit diffserv4 triple-isolate nonat nowash ingress no-ack-filter split-gso rtt 100ms noatm overhead 92
}

Works a treat!

So I want to apply DSCP marks via nftables such that the cake on ifb-ul (containing tc mirred packets from br-lan and br-guest) will see them.

Why isn't my chain working though? I imagine I am lacking an OpenWrt package or so?

dlakelan · August 21, 2022, 10:30pm

hmmm... yes tricky.

I think it might be necessary to hook into every underlying physical device's ingress. br-lan and br-guest are virtual devices, and I don't think they have an ingress.

I'd suggest to create the tagging chain separately, and then create an ingress hook for each physical device, and just jump to the tag chain.
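Roughly, that layout could be generated like this. It is only a sketch under assumptions: the function name, the chain names (dscptag, ingress_*) and the device names in the usage line are made up, the tagging chain holds just two of the rules from the chain above as a stand-in, and the output is meant to go in a file under /etc/nftables.d/ so that it ends up in the same table as the original chain.

```sh
#!/bin/sh
# Hypothetical generator: print one shared tagging chain plus one ingress
# base chain per device, each of which just jumps to the tagging chain.
gen_ingress_hooks() {
    cat <<'EOF'
chain dscptag {
    ip protocol icmp ip dscp set cs5 counter
    ip6 nexthdr icmpv6 ip6 dscp set cs5 counter
}
EOF
    for dev in "$@"; do
        # build a conservative chain name from the device name
        name=$(printf '%s' "$dev" | tr -c 'A-Za-z0-9_' '_')
        printf 'chain ingress_%s {\n' "$name"
        printf '    type filter hook ingress device "%s" priority -149; policy accept;\n' "$dev"
        printf '    jump dscptag\n'
        printf '}\n'
    done
}
```

Something like `gen_ingress_hooks lan1 wlan0 > /etc/nftables.d/20-try.nft` followed by a firewall restart would then attach the same tagging rules to each member device; substitute the real port and wireless interface names of the bridge.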

Lynx · August 21, 2022, 10:31pm

Oh man, really? But I have WiFi and other stuff going into br-lan and br-guest. I really can't hook ingress on br-lan? That seems a real shame. Are you sure it's not just that I'm missing a package for the ingress hook to work? I saw a bunch of kmod packages relating to nft. So I wonder if I'm just missing a package or three.

Also if what you say is true then how come I can mirror from their ingress?


	tc qdisc add dev br-lan handle ffff: ingress
	tc qdisc add dev br-guest handle ffff: ingress

	# capture upload (ingress) on br-lan and br-guest
	tc filter add dev br-lan parent ffff: protocol ip prio 1 u32 match ip dst 192.168.1.0/24 action pass
	tc filter add dev br-lan parent ffff: prio 2 matchall action mirred egress redirect dev ifb-ul
	tc filter add dev br-guest parent ffff: protocol ip prio 1 u32 match ip dst 192.168.2.0/24 action pass
	tc filter add dev br-guest parent ffff: prio 2 matchall action mirred egress redirect dev ifb-ul

That works fine. Tested and all is good. I see all the upload packets and cake works a treat.

If I can tc mirror from br-lan and br-guest ingress then don't they have ingress that I can apply the nftables ingress hook to classify dscp marks on prior to the mirroring?

Another option is just tc filtering on the IFBs to force cake tins. But I'd like to rule out nftables first. Since it's nicer to work with port ranges and stuff.

dave14305 · August 21, 2022, 10:40pm

Maybe your traffic is being mirred away before the inet ingress chain is invoked.

Lynx · August 21, 2022, 10:41pm

Don't think so because then I'd expect to see it on the VPN interface but didn't see it there either.

Also, I thought @dlakelan was suggesting in the other thread that these fancy ingress hooks act before the mirroring?

dave14305 · August 21, 2022, 10:42pm

Add counters to every line and run more tests.

Lynx · August 21, 2022, 10:43pm

OK, will do. If I'm missing a package like kmod-xxx, would I see some kind of screaming in the log file?

dave14305 · August 21, 2022, 10:44pm

The rules would not be accepted if you were missing a kmod. You’d get some file not found errors from netlink.

Lynx · August 21, 2022, 10:45pm

OK, and do you think just a chain definition inside /etc/nftables.d/ ought to work? I mean, you can see my rules in the output listing:

root@OpenWrt:/etc/nftables.d# nft list chain inet fw4 tagin
table inet fw4 {
        chain tagin {
                type filter hook ingress device "br-lan" priority mangle + 1; policy accept;
                ip dscp set cs3
                ip6 dscp set cs3
                ip protocol udp udp sport 123 ip dscp set cs6
                ip6 nexthdr udp udp sport 123 ip6 dscp set cs6
                ip protocol icmp ip dscp set cs5 counter packets 0 bytes 0
                ip6 nexthdr ipv6-icmp ip6 dscp set cs5 counter packets 2 bytes 144
                udp dport { 7000-9000, 27000-27200 } ip dscp set cs5
                udp sport { 7000-9000, 27000-27200 } ip dscp set cs5
                ip6 nexthdr udp udp dport { 7000-9000, 27000-27200 } ip6 dscp set cs5
                ip6 nexthdr udp udp sport { 7000-9000, 27000-27200 } ip6 dscp set cs5
                meta priority set 1:40
                ip dscp { ef, cs6 } meta priority set 1:10
                ip dscp cs5 meta priority set 1:20
                ip dscp { af41, af42, af43 } meta priority set 1:30
                ip dscp cs2 meta priority set 1:50
                ip dscp cs1 meta priority set 1:60
                ip6 dscp { ef, cs6 } meta priority set 1:10
                ip6 dscp cs5 meta priority set 1:20
                ip6 dscp { af41, af42, af43 } meta priority set 1:30
                ip6 dscp cs2 meta priority set 1:50
                ip6 dscp cs1 meta priority set 1:60
        }
}

Looks OK to me? But packet counter stays at zero?

I get what @dlakelan is saying br-lan has no ingress but then seems weird I can tc mirror from its ingress.

Oh wait just saw that ipv6 captured some packets....

ip6 nexthdr ipv6-icmp ip6 dscp set cs5 counter packets 2 bytes 144

So why didn't the tos values show up on br-lan or VPN? I'm confused...

Lynx:

root@OpenWrt:~# tcpdump -i ifb-ul -vv host 1.1.1.1
tcpdump: listening on ifb-ul, link-type EN10MB (Ethernet), capture size 262144 bytes
20:47:09.022506 IP (tos 0x0, ttl 128, id 40952, offset 0, flags [none], proto ICMP (1), length 60)
    XXXX.lan > one.one.one.one: ICMP echo request, id 1, seq 101, length 40

root@OpenWrt:~# tcpdump -i vpn -vv host 1.1.1.1
tcpdump: listening on vpn, link-type RAW (Raw IP), capture size 262144 bytes
20:47:58.913343 IP (tos 0x0, ttl 127, id 40956, offset 0, flags [none], proto ICMP (1), length 60)
    YY.YY > one.one.one.one: ICMP echo request, id 1, seq 105, length 40

But at least I see some counter action now...

gabrielo · August 21, 2022, 10:54pm

Care to try one crazy experiment? Maybe it works.

Change these lines:

type filter hook ingress device "br-lan" priority mangle + 1; policy accept;

tc qdisc add dev ifb-ul root cake bandwidth 25Mbit diffserv4 triple-isolate nonat nowash ingress no-ack-filter split-gso rtt 100ms noatm overhead 92

to these:

type filter hook postrouting priority 0 ; policy accept;

tc qdisc add dev ifb-ul root cake bandwidth 25Mbit diffserv4 triple-isolate nonat nowash no-ack-filter split-gso rtt 100ms noatm overhead 92

Edit: Removed the device from the postrouting hook because it isn't accepted there.

dave14305 · August 22, 2022, 12:46am

Lynx:

meta priority set 1:40
ip dscp { ef, cs6 } meta priority set 1:10
ip dscp cs5 meta priority set 1:20
ip dscp { af41, af42, af43 } meta priority set 1:30
ip dscp cs2 meta priority set 1:50
ip dscp cs1 meta priority set 1:60
ip6 dscp { ef, cs6 } meta priority set 1:10
ip6 dscp cs5 meta priority set 1:20
ip6 dscp { af41, af42, af43 } meta priority set 1:30
ip6 dscp cs2 meta priority set 1:50
ip6 dscp cs1 meta priority set 1:60

What do you suppose these statements are doing? Is this related to the prio qdisc? Are all those classes (10-60) valid? What happens when the traffic is destined for CAKE instead?
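As background for those questions: `meta priority set` writes the packet's priority field, and a classful qdisc reads that field as a tc class ID, with the major number in the upper 16 bits and the minor in the lower 16, both written in hexadecimal. The sketch below (the function name is made up) shows the number that a handle such as 1:40 turns into. A prio qdisc created without a bands option has only three bands, 1:1 to 1:3.

```sh
#!/bin/sh
# Hypothetical helper: print the 32-bit priority value that corresponds to a
# tc class ID written as major:minor (both parts hexadecimal).
classid_to_prio() {
    major=${1%%:*}
    minor=${1##*:}
    printf '0x%08x\n' $(( (0x$major << 16) | 0x$minor ))
}
```

`classid_to_prio 1:40` prints 0x00010040, so "40" here is hex 0x40 within the qdisc whose handle is 1:, not decimal class forty.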