SQM - help interpreting RRUL graphs and difference between diffserv4 and diffserv8

dl12345 · March 12, 2021, 4:23pm

I've been experimenting with diffserv4 vs diffserv8 profiles for SQM.

I'm not sure I'm interpreting these graphs correctly, but it looks like I am getting a worse result with a diffserv8 profile than I do with a diffserv4 profile in some respects, and a better one in others.

I repeated these tests multiple times and they give the same results. I'd be inclined to go with the diffserv8 profile, except the variability gives me pause.

I'm experimenting with a private server on the WAN side of my router as a couple of runs of flent to my VM got me a bandwidth usage of 60GB in a pretty short space of time...I'm sure if I do that much more my ISP will disconnect me!

The things I'm not quite grasping here are the following:

Download bandwidth looks quite a bit higher on the diffserv8 profile, but much more variable
Upload bandwidth is higher on the diffserv8 profile for all classes of service
Average latency looks better on diffserv8
No idea why there is such a large difference on the latency chart on BK between the two profiles or why it just does not even appear on the upload or download graph of the diffserv8 profile
No idea why the CS5 profile is so spiky on the diffserv8 profile

Frankly, I'm not entirely sure how the diffserv8 test is reporting 300Mbps per diffserv class on a 1GbE link that I've capped with SQM to 900Mbps either (I get 940Mbps throughput on this link without SQM). Given that there are 4 diffserv classes in the test, that would add up to 1.2Gbps, which is just not physically possible...

Any pointers on how to correctly interpret these results would be appreciated.

diffserv4

diffserv8

My SQM is not squashing DSCP.

ICMP and DNS have class EF set, TCP ACK/SYN have class CS3 set. The machine I'm running flent on is my DNS server and it explicitly marks outgoing DNS and ICMP with DSCP class EF. Apart from my Zoom client, there is no other DSCP marking going on in my internal network before it hits the OpenWrt router.

The switch is a managed switch that is set up to do QoS on DSCP, mapping to 8 hardware queues using Cisco default queue mappings. It doesn't make any difference to the rrul charts whether I disable the QoS on the switch or not.

moeller0 · March 12, 2021, 8:23pm

This smells and looks a bit like on diffserv8 BK was dropped completely, so the averaging only works on three flows: 3 * 300 ~ 900, while on diffserv4 4 * 225 = 900. The same seems true for the upload as well: 14 * 3 = 42, while 10.5 * 4 = 42. Interestingly, the BK flow in diffserv8 shows minimal latency, as if there was no competing TCP traffic in that tier...
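The arithmetic behind that guess can be checked quickly. This is only a minimal sketch in Python, assuming the plotted per-class value is simply the shaped total divided evenly among the flows that actually carried traffic; the function name is illustrative, and the upload total of 42 is taken from the numbers above and assumed to be in Mbps.

```python
def per_flow_average(total_mbps: float, active_flows: int) -> float:
    """Even share of a shaped total among the flows that actually carry traffic."""
    if active_flows <= 0:
        raise ValueError("active_flows must be positive")
    return total_mbps / active_flows


# Download direction, shaper set to 900 Mbps.
print(per_flow_average(900, 4))  # all four flows active -> 225.0
print(per_flow_average(900, 3))  # BK flow missing       -> 300.0

# Upload direction, roughly 42 Mbps in total (assumed unit).
print(per_flow_average(42, 4))   # all four flows active -> 10.5
print(per_flow_average(42, 3))   # BK flow missing       -> 14.0
```

If the model holds, 300 per class is what three flows sharing 900 would show, so the graphs do not imply 1.2Gbps of real throughput.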

That is mostly autoscaling: the diffserv4 plot has a range of 0-4, the diffserv8 plot a range of 0.5-1.75. But note that the orange BK flow behaves oddly.

Maybe repeat the experiments a few times and post links to the flent data files?

Here is how I start flent nowadays on a Linux host:

date ; ping -c 10 netperf-eu.bufferbloat.net ; ./run-flent --ipv4 -l 300 -H netperf-eu.bufferbloat.net rrul_var --remote-metadata=root@192.168.42.1 --te=cpu_stats_hosts=root@192.168.42.1 --socket-stats --test-parameter bidir_streams=8 --test-parameter markings=CS0,CS1,CS2,CS3,CS4,CS5,CS6,CS7 -D . -t IPv4_2_netperf-eu.bufferbloat.net

192.168.42.1 is my router running SQM/cake with flent-tools installed (for CPU usage data via --te=cpu_stats_hosts, and for tc stats via --remote-metadata; both need passwordless logins configured from your Linux client to the router).
--socket-stats will, on a Linux client, collect socket statistics, which can be quite revealing.
rrul_var is the test I use; it allows easy configuration of the number and DSCPs of the individual flows, for example:
--test-parameter bidir_streams=8 : 8 simultaneous flows in each direction
--test-parameter markings=CS0,CS1,CS2,CS3,CS4,CS5,CS6,CS7 : marked with these specific dscps (this will only work like this for the named DSCPs)
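For repeated runs with different markings, the same invocation can be assembled programmatically. The following is a rough sketch in Python, not part of flent itself: `build_flent_command` and its parameters are hypothetical names, and it only uses the flags from the command line above. It derives bidir_streams from the number of markings so the two cannot drift apart.

```python
import shlex


def build_flent_command(server, router, markings, length_s=300, flent="./run-flent"):
    """Assemble the rrul_var invocation shown above as an argument list.

    Hypothetical helper: it only reuses flags from the original command.
    """
    return [
        flent, "--ipv4", "-l", str(length_s), "-H", server, "rrul_var",
        f"--remote-metadata={router}",        # tc stats from the router
        f"--te=cpu_stats_hosts={router}",     # CPU usage from the router
        "--socket-stats",
        "--test-parameter", f"bidir_streams={len(markings)}",
        "--test-parameter", "markings=" + ",".join(markings),
        "-D", ".", "-t", f"IPv4_2_{server}",
    ]


markings = [f"CS{i}" for i in range(8)]  # CS0 .. CS7, one flow per marking
cmd = build_flent_command("netperf-eu.bufferbloat.net", "root@192.168.42.1", markings)
print(shlex.join(cmd))  # paste into a shell, or pass cmd to subprocess.run
```

The printed line leaves out the leading date and ping steps from the original command; those are independent of flent and can be run separately.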

dl12345 · March 12, 2021, 9:41pm

Thanks @moeller0. I'll get to that over the weekend and post some links to the data files

dl12345 · March 14, 2021, 9:14pm

I think it's a bit pointless right now to upload any flent data files as my diffserv8 is clearly not functional.

Firstly, it locks up and causes other sessions to remote hosts outside the firewall to be terminated during a flent run, an obvious sign that it's not working as intended.

And on ingress and egress, it never uses Tin 0 and Tin 4 - zero traffic - despite doing an 8-stream test with CS0 --> CS7, so theoretically it should be using all tins for the test, unless I'm mistaken about that?

I'm getting flent test results all over the place, with little repeatability, like the following

Notwithstanding this, a simple switch to a diffserv4 profile and it works perfectly.

So I think I'll just keep with diffserv4. Having a little more granularity for custom dscp tagging would be nice, but diffserv4 works well enough that I don't need to spend the cycles working out what's wrong with diffserv8.

A little extra granularity probably won't buy me much incremental improvement anyway, even if I were to get diffserv8 working properly...

moeller0 · March 14, 2021, 9:32pm

In all honesty, I believe only diffserv3 and diffserv4 have actually been field-tested enough to make a simple bug in cake unlikely... Few people use diffserv8, so there might simply be something wrong in cake...

moeller0 · March 14, 2021, 9:34pm

With this I agree: it is quite easy to go overboard in designing QoS schemes, and that is not always the best idea. Diffserv4 offers one default tier, one lower than default, and two faster than default; IMHO that should suffice for quite a number of use cases without ending up with tiers that each have too little capacity... So most simple QoS schemes should be mappable to diffserv4.