Ultimate SQM settings: Layer_cake + DSCP marks

BTW, if you try nftables, have a look at the new nft-qos package, which promises per-IP/subnet throttling via a LuCI interface. That might be a good starting point for a DSCP remarker app if, instead of using the rate limiter, one optionally rewrites the DSCPs. Note that I have not yet tested how this behaves with regard to latency under load/bufferbloat, since I have not managed to get it installed (I would need to compile the firmware from source and lack the disk space to do so).

@dlakelan good luck with configuring nft.
But what is the real benefit of nft: simplicity, fewer rules?

But will this force you to use tc? In that case, can't we use the SQM diffserv classes?!
@moeller0 but how would we tag packets with nft qos?
Are there any classes or prioritization?
Does this mean we will lose SQM's bufferbloat control, or should we make SQM work with nft instead of iptables?
For DSCP, will we use something like this?

ip forward ip dscp set 42

Also, are there any attempts to base the OpenWrt firewall on nft instead of iptables, or any new package
like luci-app-nftfirewall?

I run a Debian router with fireqos. Converting to nftables made my config much simpler and easier to understand, and I suspect faster too: fireqos had hundreds of rules in tens of chains, while my nftables config has around 20 lines of code :joy:

When I tried to use iptables to tag DSCP in my prewan namespace, my speed dropped from 750 megabits to about a megabit and became erratic. That motivated me to look into it.

I run OpenWrt on my access points, which have lighter duty.

Nftables lets you select classes by using the priority action, so it can be used instead of tc filters, which is a big benefit because tc is terrible to use. I think it means that while tagging DSCP you can also select classes.

Too funny, that's much smaller and will take less space from disk.

Haha, really strange. Use nft to tag packets with DSCP.

It will make things easier. I think this will allow SQM to run on nft instead of tc!
tc is not well documented and hard to understand; I still can't understand what priomap does or how to configure it.

Hi guys! Everyone still nicely tweaking QoS and all i see :stuck_out_tongue:

I have a question. I'm running a conservative 15/15 bandwidth setting right now, as my 4G occasionally drops below 20 Mbit. But sometimes I just want to go to the max of about 40 Mbit for a quick download. I was thinking of using a shortcut on my desktop that uses ssh or something to send a command to my router.

I assume I have to use a command to reset the bandwidth on the veth0 SQM instance and restart SQM, but I'm not sure how to do it. Would I need to use the tc command? The other solution would be to swap the config file and then restart through init.d, but I'm not sure what the best approach is.

Thanks :slight_smile:

Another weird thing is that when I set it less conservatively, under full load the WAN connection sometimes disconnects for a second or two. The only thing I found on the web is about LCP echo starvation on PPPoE DSL connections, but I don't have such a connection. I wonder if it's because of link layer adaptation, perhaps?

Ahh the smell of fresh QoS in the morning... :wink:

So, coming back from some nftables tweaking: I've changed my firewall from fireqos using iptables to a custom nftables firewall. This seems to have improved my throughput a little, in that with the same bandwidth settings the CPU idle drops only to, say, 25% instead of 5-10%. In the prewan namespace, where I've shoved the inbound ethernet VLAN device and the veth0 into a bridge, here is what I am loading into nftables to do tagging. (IP/IPv6 addresses and ethernet device names are obscured because they're not important to the overall point.)

table netdev retag {
    chain tagin {
          ## mangle priority
          type filter hook ingress device <MYEthDevHere> priority -149; policy accept;

          ip dscp set cs2 ## convert all to cs2 first
          ip6 dscp set cs2
          
          ## VOIP servers UDP traffic
          ip saddr {x.x.x.x, y.y.y.y} ip protocol udp ip dscp set cs6
          ip daddr {x.x.x.x, y.y.y.y} ip protocol udp ip dscp set cs6
          
          ip6 saddr {aaaa:aaaa::aaaa, bbbb:bbbb::bbbb} ip6 nexthdr udp ip6 dscp set cs6
          ip6 daddr {aaaa:aaaa::aaaa, bbbb:bbbb::bbbb} ip6 nexthdr udp ip6 dscp set cs6

          ## icmp/icmpv6 matches game traffic priority
          ip protocol icmp ip dscp set cs5
          ip6 nexthdr icmpv6 ip6 dscp set cs5 

          ## game traffic on ip and ipv6
          udp dport {7000-9000, 27000-27200} ip dscp set cs5
          udp sport {7000-9000, 27000-27200} ip dscp set cs5

          ip6 nexthdr udp udp dport {7000-9000, 27000-27200} ip6 dscp set cs5
          ip6 nexthdr udp udp sport {7000-9000, 27000-27200} ip6 dscp set cs5
          
          ## Large transfers lasting over 3 seconds at 100 Mbps (37.5
          ## MBytes) should get deprioritized. This doesn't work here
          ## because conntrack isn't available at ingress; we do it
          ## later in the main namespace.
          
#         ip protocol tcp ct bytes ge 37500000 ip dscp set cs1
#         ip6 nexthdr tcp ct bytes ge 37500000 ip6 dscp set cs1

    }
}

Conntracking isn't available at the netdev/ingress hook so I commented out the last two rules. I do use a rule like that in the main namespace where I'm doing regular forwarding/routing. It will further delay the long running transfers. It occurs to me I should do that cs1 tagging before the VOIP specific tagging because some calls might last long enough to be down-prioritized, so I just changed that in the main namespace.
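For anyone wanting to pick a different threshold for the `ct bytes ge` rule, here is a minimal Python sketch of the arithmetic behind the 37500000 figure. The helper name `bytes_for_duration` is made up for illustration; it just converts a rate in megabits per second and a duration in seconds into a byte count.

```python
def bytes_for_duration(rate_mbps, seconds):
    """Bytes moved when transferring at rate_mbps (megabits/s) for `seconds`."""
    bits = rate_mbps * 1_000_000 * seconds  # megabits/s -> total bits
    return int(bits // 8)                   # 8 bits per byte


# 3 seconds at 100 Mbps, the case used in the commented-out rules above
threshold = bytes_for_duration(100, 3)
print(threshold)
```

Plugging in your own line rate and the transfer duration you consider "bulk" gives the number to put after `ct bytes ge`.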

So, the upshot is that the system of shoving the inbound device into a prewan namespace and bridging it to a veth works well. I get very good results, with an A+ rating at 700+ megabits using a j1900 motherboard as router. Before doing this I was getting an A rating, and had occasional latency spikes under high load. In part this is due to the fact that I run a proxy on the router so packets coming into the proxy were un-queued, and in part due to the LAN download being set higher than the WAN because of bonded interfaces and the fact that the box has other functions that I didn't want to throttle.

Now, as for @Emtee and variable speeds. There is no great way to automatically handle variable speeds from outside the qdisc. It's definitely possible for the qdisc itself to have an algorithm that "learns" the bandwidth and constantly probes for higher available bandwidth until it hits latency problems and then backs off... but it requires a level of sophistication that would cause a lot of computational overhead and basically isn't currently available in any qdisc. The biggest issue is how does the qdisc measure round-trip latency increase directly? It's not like packets arrive with a timestamp included so you have to do some kind of indirect inference, like looking at the time between when tcp segments come in your interface vs when acks go back out that interface... or something like that, and it's not obviously easy. It's conceivable someone could create some kind of latency monitoring conntrack module which could then provide current latency estimates to sophisticated qdiscs... but yeah for the moment no.

So, how can you take advantage of your bandwidth more fully? It's a big drop from 40 to 15 Mbps! I think your best bet for simplicity is to swap between two config files and then just restart SQM. Create two config files that you like, one for normal use and one for fast downloads, then copy whichever one you want into the /etc/config/ directory and restart SQM. It's quick and dirty, but it's easy.
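A rough sketch of that config swap, driven from the desktop over ssh, could look like the following Python. Everything specific here is an assumption for illustration: the router address `root@192.168.1.1`, the profile paths under `/etc/sqm-profiles/`, and the helper names. It also assumes key-based ssh login and the stock `/etc/init.d/sqm` init script.

```python
import shlex
import subprocess

ROUTER = "root@192.168.1.1"  # assumption: your router's ssh login
PROFILES = {                 # assumption: hypothetical saved copies of /etc/config/sqm
    "normal": "/etc/sqm-profiles/sqm.normal",
    "fast": "/etc/sqm-profiles/sqm.fast",
}


def build_ssh_command(profile):
    """Return the ssh argv that copies a saved profile into place and restarts SQM."""
    source = PROFILES[profile]  # raises KeyError for an unknown profile name
    remote = f"cp {shlex.quote(source)} /etc/config/sqm && /etc/init.d/sqm restart"
    return ["ssh", ROUTER, remote]


def switch_profile(profile):
    """Run the swap on the router; raises CalledProcessError if ssh or the restart fails."""
    subprocess.run(build_ssh_command(profile), check=True)
```

A desktop shortcut would then only need to call `switch_profile("fast")` before a big download and `switch_profile("normal")` afterwards.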

Yeah, I think for my latency-sensitive gaming needs it would be a bad idea to try to 'compensate' with a weird algorithm. I will probably use the config swap method for the time being.

One thing I'm still unsure of with my iptables DSCP rules: I'm using diffserv4 now, and layer_cake should have 4 tins, but from EF to CS0 to CS6, how would I know which values go into which tin? Or does Cake change this adaptively?

I'm currently using EF for highest priority, CS6 for right below that, CS3 for mid-priority, CS1 for bulk and CS0 for undefined/best effort.

Other than that I just hope the prioritization works 'right' at this point.

Here is the relevant section from sch_cake.c:

static int cake_config_diffserv4(struct Qdisc *sch)
{
/*  Further pruned list of traffic classes for four-class system:
 *                                   
 *          Latency Sensitive  (CS7, CS6, EF, VA, CS5, CS4)
 *          Streaming Media    (AF4x, AF3x, CS3, AF2x, TOS4, CS2, TOS1)
 *          Best Effort        (CS0, AF1x, TOS2, and those not specified)
 *          Background Traffic (CS1)
 *
 *              Total 4 traffic classes.
 */

And here is the mapping for the 64 diffserv code points:

static const u8 diffserv4[] = {
        0, 2, 0, 0, 2, 0, 0, 0,
        1, 0, 0, 0, 0, 0, 0, 0,
        2, 0, 2, 0, 2, 0, 2, 0,
        2, 0, 2, 0, 2, 0, 2, 0,
        3, 0, 2, 0, 2, 0, 2, 0,
        3, 0, 0, 0, 3, 0, 3, 0,
        3, 0, 0, 0, 0, 0, 0, 0,
        3, 0, 0, 0, 0, 0, 0, 0,
};
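To read that table without counting columns by hand, here is a small Python sketch that indexes it by DSCP value. The pairing of table values with tin names (0 = Best Effort, 1 = Background, 2 = Streaming Media, 3 = Latency Sensitive) is inferred from the comment block above, for example CS1 (8) is the only entry holding 1; the helper `tin_for` and the name tables are made up for illustration.

```python
# The diffserv4 table quoted above, indexed by the 6-bit DSCP value (0..63)
DIFFSERV4 = [
    0, 2, 0, 0, 2, 0, 0, 0,
    1, 0, 0, 0, 0, 0, 0, 0,
    2, 0, 2, 0, 2, 0, 2, 0,
    2, 0, 2, 0, 2, 0, 2, 0,
    3, 0, 2, 0, 2, 0, 2, 0,
    3, 0, 0, 0, 3, 0, 3, 0,
    3, 0, 0, 0, 0, 0, 0, 0,
    3, 0, 0, 0, 0, 0, 0, 0,
]

# Assumption: value-to-name pairing inferred from the comment in sch_cake.c
TIN_NAMES = {0: "Best Effort", 1: "Background Traffic",
             2: "Streaming Media", 3: "Latency Sensitive"}

# Standard DSCP code point values
DSCP = {"CS0": 0, "CS1": 8, "CS2": 16, "CS3": 24, "CS4": 32, "AF41": 34,
        "CS5": 40, "EF": 46, "CS6": 48, "CS7": 56}


def tin_for(name):
    """Return the diffserv4 tin name for a DSCP name such as 'EF' or 'CS3'."""
    return TIN_NAMES[DIFFSERV4[DSCP[name]]]


for name in ("EF", "CS6", "CS3", "CS0", "CS1"):
    print(f"{name:>4} -> {tin_for(name)}")
```

For the five marks mentioned above, EF and CS6 land in the same Latency Sensitive tin, CS3 in Streaming Media, CS0 in Best Effort and CS1 in Background, so within cake there is no priority difference between EF and CS6.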

So based on @moeller0's post, and the fact that it maps well to both WiFi WMM queues and the queues on cheap TP-Link managed switches, you should probably switch to CS6 for highest priority, CS3 for mid priority, CS0 for normal priority, and CS1 for bulk. By default, Linux WiFi drivers put EF into the mid-priority VIDEO queue rather than the high-priority VOICE queue, so I don't use it even though it's the "standard" for voice. (In my opinion this is a bug in the default behavior of the Linux driver. Apparently it can be adjusted in hostapd, but the config is not available in OpenWrt at the moment.)

All those AFxx codepoints are fine-details that no one actually is currently using.

@moeller0 I am currently using CS2 as my "best effort" priority, and retag everything on ingress from my router (and output from my desktop machine) because of those Tp-Link switches. Since I'm not using cake that's ok for me, but I wonder if cake shouldn't switch to having CS0 and CS2 in the same best effort tin.

I guess the issue is that there is an old 3bit priority scheme used in some VLAN handling equipment that does weird mapping (see page 40 of http://profesores.elo.utfsm.cl/~agv/elo309/doc/802.1D-1998.pdf and compare with the more recent recommendations in https://en.wikipedia.org/wiki/IEEE_P802.1p)

The recommendations how to interpret the VLAN priority codes changed over time. IMHO we should stick to the most recent version (if at all ;)*). Also I believe that Jonathan's justifications for the placements come from tags seen on real world traffic or seen as recommendation in RFCs.
Now I note that there is a proposal for a new background code point (out of the CS0 space, so no priority inversion when traffic hits non-compliant hops). I also note that in the standard WMM mapping CS2 becomes background.

*) I think I get your point though: even though the WMM and VLAN priority schemes might be considered unfortunate, they still exist in real life and it might make sense to take them into account (like the WMM/EF anomaly you mention, also seen in https://www.bintec-elmeg.com/portal/downloadcenter/dateien/workshops/current_en/ws_wlan_html_en_HTML/vowlan_infra_qos_wmm.html). This is a mess :wink:

Oh thanks, very helpful guys! d-^;^-b

I have WMM off on my wifi interface, and everything except my static IP gets its DSCP marks wiped, but I will adhere to these rules anyway and not use EF :stuck_out_tongue:

Probably more laziness than anything. I might change the setup and not do this; I'm just afraid that if something on my iPhone or MacBook could supersede my high-priority gaming, or even slightly affect its UDP data stream, I will go nuts! :stuck_out_tongue:

My iPhone jumps between ISP and wifi anyway depending on range, and my MacBook rarely does anything other than local remoting.

One thing that might be a consideration is my ISP and how it would handle CS6 vs EF. It's T-Mobile 4G in the Netherlands, and I have no clue whether they ignore it or not, especially since 4G is used for voice as well.

No doubt.

Looking at these two sites:

and
https://wireless.wiki.kernel.org/en/developers/documentation/mac80211/queues

I see CS2 = 16, which after multiplying by 4 for the ECN bits, maps to TOS 64, which maps to UP 2 which maps to Background queue!!! jeezus that's right and it's clearly stupid.

But note what happens if you use something like CS3, Cake treats it as "Streaming Media" and WMM treats it as Best Effort!!!

For actual streaming media (YouTube etc) I'm using AF41, which at least gets consistency in cake and WMM
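The chain of conversions above (DSCP, then TOS byte, then 802.1d user priority, then WMM queue) can be replayed in a few lines of Python. This is only a sketch of the default behaviour as described in this thread, where the user priority is the top three bits of the TOS byte; a hostapd qos_map or a different kernel version may map differently. The function name `wmm_queue` is made up for illustration.

```python
# 802.1d user priority (UP) -> WMM access category
UP_TO_AC = {0: "BE", 1: "BK", 2: "BK", 3: "BE",
            4: "VI", 5: "VI", 6: "VO", 7: "VO"}


def wmm_queue(dscp):
    """Return (tos, up, access_category) for a 6-bit DSCP value.

    Assumes the default mapping discussed in the thread: UP = top 3 bits of TOS.
    """
    tos = dscp << 2   # DSCP sits above the two ECN bits, i.e. multiply by 4
    up = tos >> 5     # user priority is the top three bits of the TOS byte
    return tos, up, UP_TO_AC[up]


for name, dscp in (("CS2", 16), ("CS3", 24), ("AF41", 34), ("EF", 46), ("CS6", 48)):
    tos, up, ac = wmm_queue(dscp)
    print(f"{name:>4}: TOS {tos:3d} -> UP {up} -> {ac}")
```

Under that assumption CS2 ends up in BK (background), CS3 in BE (best effort), AF41 and EF in VI (video), and CS6 in VO (voice), which matches the observations in this thread.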

As you say it's a mess. When do we get access to that hostapd priority map config in OpenWrt?

My favorite scheme would be:

CS1 = bulk
CS0, CS2 = Best Effort
CS3,CS4,CS5 = Streaming Media
EF,CS6,CS7 = Voice

and treat other values by just masking off the lower 3 bits. It's very comprehensible.
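As a sketch of how simple that scheme would be, here is one reading of it in Python: EF is special-cased into Voice, and every other code point is classified by its upper three bits (the class selector). This is only an illustration of the proposal in this post, not how cake or any driver behaves today, and the function name `proposed_class` is made up.

```python
EF = 46  # Expedited Forwarding, explicitly listed under Voice in the scheme


def proposed_class(dscp):
    """Classify a 6-bit DSCP value according to the proposed four-class scheme."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in 0..63")
    if dscp == EF:
        return "Voice"
    selector = dscp >> 3          # mask off the lower 3 bits: CS0..CS7 -> 0..7
    if selector == 1:
        return "Bulk"             # CS1
    if selector in (0, 2):
        return "Best Effort"      # CS0, CS2
    if selector in (3, 4, 5):
        return "Streaming Media"  # CS3, CS4, CS5
    return "Voice"                # CS6, CS7


print(proposed_class(16))  # CS2
print(proposed_class(34))  # AF41 shares the upper bits of CS4
```

One consequence of pure masking is that the AFxy code points follow their class selector, so AF1x would land in Bulk alongside CS1 and AF4x in Streaming Media alongside CS4.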

I would definitely wipe DSCP on ingress from the internet. This is actually kind of how DSCP is supposed to work: it's not an end-to-end specification. There are "diffserv domains", and the traffic is "supposed" to be reclassified at the boundaries; at least that's basically what the RFC was suggesting.

Also on ingress from your wifi, I'd wipe any DSCPs and then maybe just maybe add back ones you can identify as important (like maybe wifi calling or VOIP service packets get CS6). There's no way to avoid the Client -> AP being affected by the DSCP but you can at least affect AP -> client and AP -> internet

And then of course, set your high priority for UDP game packets = CS6... and off you go, this should protect your game stuff properly.

For me it's crappy VOIP calls that I'm really unhappy about, and it's amazing how much better it is with proper QoS, my voice calls are crystal clear these days. Note also that I got bad problems with VOIP even with a supposed gigabit fiber connection. Part of it is that my network can get busy, I run NFS home directories, and if a big computation is running on my desktop it can be slamming data into files at full gigabit speeds for seconds at a time, so I need QoS not just on my WAN but also on my LAN!

I decided to dig up the old thread on qos_map to see if anything has changed in the interim with the version 18 series: Using DSCP for QoS

EDIT:

Hey all, I discovered something interesting....

Setting up tagging and by default tagging everything cs2 seems to have been the cause of my erratic behavior. If I tag cs2 my ATT connection seems to down-prioritize or throttle my traffic (possibly my ACKs?), not sure what, if I tag cs3 as my "normal" it seems to be ok. Basically changing that one line made things go from highly erratic to clean as a whistle. So I've switched to CS3 as my "default" tag.

Also, the symptom was a much slower start on download that still reached a moderately reasonable level, but on upload a complete lack of bandwidth, eventually dying out almost altogether. That is why I think it has to do with ATT equipment interpreting the outgoing DSCPs I send.

More information: tc is dead, long live nftables... Here are rules I use to classify into my multi-tier HFSC shaper:

	  meta priority set 1:40 ## default

	  ip dscp {ef,cs6} meta priority set 1:10
	  ip dscp {cs5} meta priority set 1:20
	  ip dscp {af41, af42, af43} meta priority set 1:30
	  ip dscp {cs1} meta priority set 1:50

	  ip6 dscp {ef,cs6} meta priority set 1:10
	  ip6 dscp {cs5} meta priority set 1:20
	  ip6 dscp {af41, af42, af43} meta priority set 1:30
	  ip6 dscp {cs1} meta priority set 1:50

This replaces about 40 lines of more complicated tc filter commands that look like:

 /sbin/tc filter add dev ${DEV} parent 1:0 protocol ip prio 60 u32 match ip tos 0x88 0xfc flowid 1:30
 /sbin/tc filter add dev ${DEV} parent 1:0 protocol ipv6 prio 61 u32 match ip6 priority 0x88 0xfc flowid 1:30 

some of which included using tc to match tcp / udp ports instead of DSCP values.
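For anyone decoding those old u32 filters, the `0x88 0xfc` pair is just a DSCP value shifted into the TOS byte plus a mask that ignores the two ECN bits. A small Python sketch of the conversion, with a made-up helper name `u32_tos_match`:

```python
def u32_tos_match(dscp):
    """Return the 'value mask' pair a tc u32 'match ip tos' needs for a DSCP value."""
    value = dscp << 2             # place the DSCP above the two ECN bits
    return f"0x{value:02x} 0xfc"  # mask 0xfc compares the DSCP bits only


print(u32_tos_match(34))  # AF41, the code point matched by the filters above
```

So the two tc lines quoted above are the u32 equivalent of matching AF41 and sending it to class 1:30, which the nftables version expresses by name.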

All I need now is tc to set up the hfsc shaper itself, which is much more understandable than the u32 filter syntax.

I think nftables provides the ideal comprehensive method for QoS aficionados (gamers, VoIP users, guest network providers etc.) who want more control than the very good defaults that piece_of_cake offers.

@dlakelan nice work man! I was busy trying to figure out why I can't unlock my phone's bootloader!
It's really much easier to set up and understand QoS with nft.
So basically my bridge idea is working well for you?! I think it could also be used by those who have
a different rate for IPv6, as some people get their IPv6 connection via DHCPv6, 6in4, etc.,
by setting up another veth pair, bridging one end with the WAN and using the other end to get a connection!
The question is:
1. When will nft be used natively on OpenWrt instead of iptables?

I think we still need tc to set up the shaper, unless nft comes with an alternative solution.
It would be nice to see packet inspection or L7 matching in nft!
*I figured out that my ISP is zeroing DSCP on inbound; on my old ISP, when I was watching YouTube I saw an AF41 tag, and
AF31 on a file host site!
@Emtee
Nice to see that you are happy with your QoS. For testing purposes, change your download speed in SQM to 40000,
then run a download and play a game, and see how the lag is!

Yes, when I put it in its own namespace; I think that's key. Then veth1 looks like a WAN connection to an ISP that has a clue :grinning:

Did you use an external power supply to power this motherboard, and how much does the power supply cost?!
Also, did you buy a case for the MB, or did you just place it inside a cardboard box?
Also, I could use one Ethernet port as WAN and the second one as LAN to connect it to a wifi AP.

This is a good question. Since nftables has its own special syntax for rules and that syntax is excellent, it seems like trying to shoehorn that into a UCI based thing would be problematic. I mean, it could be done, but would not give the advantages of the nice syntax.

In any case, I'm going to look into building an image for my test device (an old TP-Link device) that has no iptables, no OpenWrt firewall and no luci-app-firewall, and does include tc and nftables with a default config for both. If I can make that work it will be my new AP image. I'm still not quite sure how to make it start nftables at the appropriate time during boot.

Yes, it's in a small case. It's this one, with power supply; I've had it for years:
https://www.amazon.com/dp/B00L5OT1QY/ref=cm_sw_r_cp_apa_i_rK9dCbAHYFMZH

I use an ASRock Rack J1900D2Y, but I'd recommend something newer if you plan to buy a router: something with a quad-core 3000-series Celeron with AES-NI. Lots of stuff on AliExpress...

I plan to use it as a router!
But is it necessary to use a display monitor to install and configure the OS? I don't have one.
I saw that the TP-Link Archer C7 AC1750 costs about $95 in my country, so it's better to buy a motherboard or mini PC
with a modern Intel CPU!
I think this one is good, but it has only one LAN port: https://www.scan.co.uk/products/asrock-j4005m-integrated-intel-celeron-j4005-ddr4-sata-iii-intel-uhd-graphics-600-gbe-microatx
But if I want to search on AliExpress, what should I type?