Even I got confused by my own statement lmao. Each priority tier gets its own virtual clock. At dequeue time, if the shaper isn't enabled, cake does a simple round robin over the tins and dequeues packets. If the shaper is enabled, the algorithm picks the highest-priority tin which both has queued traffic and whose schedule is due, if one exists. Inside a tin it then does DRR based on each flow's deficit to ensure fairness, and finally the CoDel AQM logic runs.

So what I meant to say is that if packets' DSCP values are marked strategically, diffserv8 should give the optimal result: more tins, more dedicated flows, and less delay for the important packets. The problem is that most users will end up setting the wrong DSCP and mess up cake's dynamic flow management (like I did lol). So I think it's better to just put everything on best effort and let the DRR calculation figure out which packet should dequeue first; diffserv3 should be enough for unmarked traffic. What I learned from luci-sqm is that for unmarked traffic the code doesn't put bulk traffic into the Bulk tin lol. Everything stays in Best Effort, and that was really inefficient. Good that qosmate is using a newer version of the cake scheduler which manages flows dynamically.
Why? The only reason for diffserv8 is if you have 8 (or more than 4) different priority tiers you want/need to use. If you follow the advice to up-prioritise sparingly, I would guess you rarely need more than 3 tiers, one below the default, one default tier, and one above...
QoSmate uses the same cake qdisc as SQM does... so I am still confused?
You're right that three tins are a prudent choice. With 8 tins, cake will prefer latency-sensitive traffic first if and only if packets are put into the respective tins. However, accurately assigning DSCP values to individual packets to take advantage of diffserv8, particularly with diverse traffic types, is complex and can inadvertently disrupt overall network performance. Assigning a few gaming UDP packets to higher-priority tins makes sense, but granular DSCP assignments based on ports and IP addresses feel like a waste of time and endless testing, considering the overhead and potential negative impact on network dynamics. For what? To gain a few milliseconds... In short, theoretically diffserv8 should work better than diffserv3/4, but practically it just messes up the flow coordination. A minimum of 3 tins makes sense for unmarked traffic.
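Just to make the two choices concrete, this is roughly what they look like on a plain cake root qdisc (interface name and bandwidth are placeholders here, not what qosmate actually installs):

# 3 tins (Bulk / Best Effort / Voice), selected purely by DSCP
tc qdisc replace dev eth1 root cake bandwidth 40mbit diffserv3 nat

# 1 tin, DSCP ignored, only per-flow DRR + CoDel decide the order
tc qdisc replace dev eth1 root cake bandwidth 40mbit besteffort nat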
No, I reset my router to test plain luci-sqm, both piece_of_cake and layer_cake. When I chose layer_cake, I observed that bulk traffic doesn't go to the Bulk tin after a while. So I don't think it's the same sch_cake code.
Oh, I can guarantee it is the same code in the cake qdisc; the difference is that qosmate comes with (quite clever) means to actually assign different DSCP values, while sqm-scripts does nothing in that regard. This is not intended to diss qosmate, just to clarify that its magic is not in having a different cake qdisc.
Hmm, I'll re-test again. Btw, I noticed that someone on the network can easily promote their traffic from the best-effort tin to a higher tin simply by creating multiple connections through IDM.
Example: if "Boost Low-Volume TCP Traffic" is enabled, then TCP connections with fewer than 150 packets per second get AF42. With 16 or more IDM connections, each connection runs at around 50-120 packets/second, effectively getting higher priority. Also, cake's bulk-triggering code won't kick in to put the whole transfer into the Bulk tin, because the load is spread over lots of connections with minimal per-flow rates.
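This is not qosmate's actual rule, just a rough sketch of what such a boost looks like in nftables; a connection's cumulative packet count (ct original packets) stands in crudely for a per-second rate, the chain name assumes OpenWrt's fw4 table, and only IPv4 is shown for brevity:

# crude sketch: up-mark TCP connections that have sent fewer than 150 packets so far
nft add rule inet fw4 mangle_forward meta l4proto tcp ct original packets lt 150 ip dscp set af42 counter

With a download manager opening 16+ parallel connections, every single connection stays below a threshold like that, which is exactly the loophole described above.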
Well in this case they live inside your home.
@AlanDias17 you have a good eye, you can make PRs yourself
https://github.com/hudra0/qosmate/pull/25
Because that does not exist... cake does not move flows between its priority tins.
Inside each tin, selected by the DSCP, cake will give new and sparse flows a gentle boost.
Please have a look at:
https://www.bufferbloat.net/projects/codel/wiki/CakeTechnical/
and
(cake's sparse flow boosting is quite similar to fq_codel's)
If you want to demote bulk transfers to the bulk priority tin, you will need to change the DSCP accordingly, but that requires using a classifier outside of cake to decide when to mark a flow as "bulk"; IIUC qosmate would use an nftables rule for that...
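A minimal sketch of such an outside-of-cake classifier (the port range is just a hypothetical torrent client, the fw4 chain name may differ on other setups, and IPv4 only):

# re-mark traffic of a known bulk application to CS1, which cake's diffserv modes file under the Bulk tin
nft add rule inet fw4 mangle_forward tcp dport 6881-6889 ip dscp set cs1 counter
nft add rule inet fw4 mangle_forward tcp sport 6881-6889 ip dscp set cs1 counter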
Yes, this is a known failure mode for flow queueing: if you can use many flows in aggregate, you will get more than your fair share of capacity.
Cake's dual and triple isolation modes, however, put a limit on this, reducing the ensuing unfairness, e.g. to the host running the application that uses many flows; or, if that application is a torrent client you run on purpose, you can steer it into the bulk priority tin.
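For completeness, the per-host fairness modes are selected like this (rates and interface names are only examples, not what qosmate installs): dual-srchost on the WAN egress enforces fairness per internal source host, dual-dsthost on the ingress ifb per internal destination host, before per-flow fairness is applied within each host.

# upload: fair share per internal host first, then per flow within each host
tc qdisc replace dev eth1 root cake bandwidth 40mbit diffserv3 dual-srchost nat

# download, on the ifb mirroring eth1: same idea in the other direction
tc qdisc replace dev ifb-eth1 root cake bandwidth 200mbit diffserv3 dual-dsthost nat ingress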
It is possible to limit the packet rate and bandwidth of what gets up-marked into diffserv4's upper shelf. It has to be "soft", like ACK limiting, so it doesn't turn into a 1-second sawblade graph.
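An nftables limit statement is one way to keep that soft: packets within the budget get up-marked, anything beyond it simply falls through with its default DSCP instead of being dropped (port, rate and chain are illustrative only):

# up-mark at most ~600 packets/s of this traffic to EF; the excess keeps its default DSCP
nft add rule inet fw4 mangle_forward udp dport 3074 limit rate 600/second burst 100 packets ip dscp set ef counter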
You mean the bulk rotation, right? This snippet:
else if (flow->set == CAKE_SET_SPARSE_WAIT) {
        struct cake_host *srchost = &b->hosts[flow->srchost];
        struct cake_host *dsthost = &b->hosts[flow->dsthost];

        /* this flow was empty, accounted as a sparse flow, but actually
         * in the bulk rotation.
         */
        flow->set = CAKE_SET_BULK;
        b->sparse_flow_count--;
        b->bulk_flow_count++;
Idk, I've tried assigning DSCP based on packet rate and whatnot. The difference in results still isn't satisfactory...
Btw, thanks for those attachments with the in-depth analysis. I'll need time to go through them in detail though.
You're right, but practically speaking I think it's just a band-aid. If we want dramatic results, then it should be done within sch_cake itself. For example, I've been thinking of using ML and stumbled upon an old post from 2013. We could dynamically adjust flow queue sizes and scheduling priorities based on flow duration and packet size using machine-learning-based classifiers, but that would take huge amounts of memory and space lol. What if we could first set the quantum based on MTU size and then dynamically fine-tune the quantum for each flow based on burstiness, latency sensitivity or whatever, using dynamic DRR?
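As far as I can tell, cake computes its per-flow quantum internally from the shaped rate and doesn't expose it, so the closest user-facing knob today is fq_codel's static quantum; the non-dynamic version of the idea looks roughly like this (values are just examples):

# a smaller quantum favours small packets (VoIP, game traffic, ACKs) over full-MTU packets in the DRR rotation
tc qdisc replace dev eth1 root fq_codel limit 1024 quantum 300 target 5ms interval 100ms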
Yes.
Here is the thing: these are heuristics and will likely produce both false-positive and false-negative classifications, so getting this right is challenging, especially if one does not fully know the expected behaviour of the different flows. IMHO it is best to use both up- and down-prioritisation sparingly anyway; less opportunity to get the heuristics wrong.
Hello all, thank you very much @Hudra for the project continuing @dlakelan's work. Also, thank you to @brada4 for your contributions and @choppyc for your testing and reporting. This is really an amazing project! I do have a question about the RED qdisc, as I would like to give that a shot as well and report my findings. When I select RED as the qdisc I get this:
root@OpenWrt:~# service qosmate status
==== qosmate Status ====
qosmate service is enabled.
Segmentation fault
Traffic shaping is active on the egress interface (eth1).
Segmentation fault
Traffic shaping is active on the ingress interface (ifb-eth1).
==== Overall Status ====
Segmentation fault
qosmate is currently active and managing traffic shaping.
==== Current Settings ====
Upload rate: 900000 kbps
Download rate: 900000 kbps
Game traffic upload: 50000 kbps
Game traffic download: 50000 kbps
Queue discipline: red (for game traffic in HFSC)
==== Package Status ====
All required packages are installed.
==== Detailed Technical Information ====
Traffic Control (tc) Queues:
qdisc noqueue 0: dev lo root refcnt 2
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
qdisc mq 0: dev eth0 root
Sent 24536066995 bytes 19595106 pkt (dropped 0, overlimits 0 requeues 31693)
backlog 0b 0p requeues 31693
qdisc fq_codel 0: dev eth0 parent :8 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
Sent 3906169828 bytes 2968436 pkt (dropped 0, overlimits 0 requeues 4792)
backlog 0b 0p requeues 4792
maxpacket 6056 drop_overlimit 0 new_flow_count 1980 ecn_mark 0
new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: dev eth0 parent :7 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
Sent 4426173103 bytes 3464655 pkt (dropped 0, overlimits 0 requeues 6163)
backlog 0b 0p requeues 6163
maxpacket 4542 drop_overlimit 0 new_flow_count 2787 ecn_mark 0
new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: dev eth0 parent :6 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
Sent 2944629517 bytes 2274790 pkt (dropped 0, overlimits 0 requeues 3407)
backlog 0b 0p requeues 3407
maxpacket 3028 drop_overlimit 0 new_flow_count 1623 ecn_mark 0
new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: dev eth0 parent :5 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
Sent 3172693257 bytes 2610085 pkt (dropped 0, overlimits 0 requeues 4826)
backlog 0b 0p requeues 4826
maxpacket 1514 drop_overlimit 0 new_flow_count 2227 ecn_mark 0
new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: dev eth0 parent :4 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
Sent 3942973 bytes 28270 pkt (dropped 0, overlimits 0 requeues 2)
backlog 0b 0p requeues 2
maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: dev eth0 parent :3 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
Sent 2115464511 bytes 1797570 pkt (dropped 0, overlimits 0 requeues 2758)
backlog 0b 0p requeues 2758
maxpacket 1514 drop_overlimit 0 new_flow_count 1207 ecn_mark 0
new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: dev eth0 parent :2 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
Sent 2190009507 bytes 1795208 pkt (dropped 0, overlimits 0 requeues 3797)
backlog 0b 0p requeues 3797
maxpacket 1514 drop_overlimit 0 new_flow_count 1774 ecn_mark 0
new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: dev eth0 parent :1 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
Sent 5776984299 bytes 4656092 pkt (dropped 0, overlimits 0 requeues 5948)
backlog 0b 0p requeues 5948
maxpacket 6056 drop_overlimit 0 new_flow_count 2999 ecn_mark 0
new_flows_len 0 old_flows_len 0
qdisc hfsc 1: dev eth1 root refcnt 65 default 13
Sent 2028212253 bytes 3095464 pkt (dropped 921, overlimits 1329105 requeues 2816)
backlog 0b 0p requeues 2816
qdisc cake 8053: dev eth1 parent 1:14 bandwidth unlimited besteffort triple-isolate nat nowash ack-filter split-gso rtt 100ms raw overhead 0
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
memory used: 0b of 15140Kb
capacity estimate: 0bit
min/max network layer size: 65535 / 0
min/max overhead-adjusted size: 65535 / 0
average network hdr offset: 0
Tin 0
thresh 0bit
target 5ms
interval 100ms
pk_delay 0us
av_delay 0us
sp_delay 0us
backlog 0b
pkts 0
bytes 0
way_inds 0
way_miss 0
way_cols 0
drops 0
marks 0
ack_drop 0
sp_flows 0
bk_flows 0
un_flows 0
max_len 0
quantum 1514
Segmentation fault
How can I tell that RED is indeed running?
tc -s qdisc
root@OpenWrt:~# tc -s qdisc
qdisc noqueue 0: dev lo root refcnt 2
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
qdisc mq 0: dev eth0 root
Sent 24943428575 bytes 19953423 pkt (dropped 0, overlimits 0 requeues 32162)
backlog 0b 0p requeues 32162
qdisc fq_codel 0: dev eth0 parent :8 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
Sent 3965115982 bytes 3012812 pkt (dropped 0, overlimits 0 requeues 4926)
backlog 0b 0p requeues 4926
maxpacket 6056 drop_overlimit 0 new_flow_count 2052 ecn_mark 0
new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: dev eth0 parent :7 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
Sent 4430997029 bytes 3475206 pkt (dropped 0, overlimits 0 requeues 6191)
backlog 0b 0p requeues 6191
maxpacket 4542 drop_overlimit 0 new_flow_count 2795 ecn_mark 0
new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: dev eth0 parent :6 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
Sent 3022117253 bytes 2329566 pkt (dropped 0, overlimits 0 requeues 3432)
backlog 0b 0p requeues 3432
maxpacket 3028 drop_overlimit 0 new_flow_count 1635 ecn_mark 0
new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: dev eth0 parent :5 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
Sent 3243544616 bytes 2671232 pkt (dropped 0, overlimits 0 requeues 4909)
backlog 0b 0p requeues 4909
maxpacket 1514 drop_overlimit 0 new_flow_count 2274 ecn_mark 0
new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: dev eth0 parent :4 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
Sent 3987154 bytes 28647 pkt (dropped 0, overlimits 0 requeues 2)
backlog 0b 0p requeues 2
maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: dev eth0 parent :3 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
Sent 2130422749 bytes 1808780 pkt (dropped 0, overlimits 0 requeues 2820)
backlog 0b 0p requeues 2820
maxpacket 1514 drop_overlimit 0 new_flow_count 1237 ecn_mark 0
new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: dev eth0 parent :2 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
Sent 2197695124 bytes 1804288 pkt (dropped 0, overlimits 0 requeues 3838)
backlog 0b 0p requeues 3838
maxpacket 1514 drop_overlimit 0 new_flow_count 1791 ecn_mark 0
new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: dev eth0 parent :1 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
Sent 5949548668 bytes 4822892 pkt (dropped 0, overlimits 0 requeues 6044)
backlog 0b 0p requeues 6044
maxpacket 6056 drop_overlimit 0 new_flow_count 3103 ecn_mark 0
new_flows_len 0 old_flows_len 0
qdisc hfsc 1: dev eth1 root refcnt 65 default 13
Sent 2363688851 bytes 3615061 pkt (dropped 978, overlimits 1570074 requeues 3056)
backlog 0b 0p requeues 3056
qdisc cake 8053: dev eth1 parent 1:14 bandwidth unlimited besteffort triple-isolate nat nowash ack-filter split-gso rtt 100ms raw overhead 0
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
memory used: 0b of 15140Kb
capacity estimate: 0bit
min/max network layer size: 65535 / 0
min/max overhead-adjusted size: 65535 / 0
average network hdr offset: 0
Tin 0
thresh 0bit
target 5ms
interval 100ms
pk_delay 0us
av_delay 0us
sp_delay 0us
backlog 0b
pkts 0
bytes 0
way_inds 0
way_miss 0
way_cols 0
drops 0
marks 0
ack_drop 0
sp_flows 0
bk_flows 0
un_flows 0
max_len 0
quantum 1514
Segmentation fault
That's what I get
That's the red qdisc stats crashing on OpenWrt 23.
It will not crash in 24.10 rc.
Ahhh ok, so I would need to run a snapshot build to see the stats, I take it?
But does that mean that RED is still running in the background?
Yes, it means exactly that. And it likely got little to no testing because of this.
The firewall rules are also listed back with errors.
If you have more than one router, it will not hurt to test out the next release:
https://mirror-03.infra.openwrt.org/releases/24.10-SNAPSHOT/
Alright, thank you for explaining it to me, and I will try the update when I have a chance.
Also, do you know what would be a good Max Delay number for the RED qdisc?
I would say 16 ms is a good starting point for RED. Just experiment and settle on whatever feels best to you.
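As a sanity check, you can translate that delay into queue bytes at the game rate from your status output (50000 kbps):

# kbit/s * ms / 8 = bytes of standing queue that 16 ms corresponds to
echo $(( 50000 * 16 / 8 ))   # -> 100000 bytes, roughly 66 full-size 1514-byte packets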