SQM Reporting?

I've finished integrating collectd-mod-sqm with luci-app-statistics and opened a PR on the LuCI GitHub for merging into the OpenWrt-19.07 branch:


One note: as mentioned in the PR comments, you will need to run touch /usr/lib/collectd/sqm.so to make it all work. @hnyman, any chance you could add that to the collectd-mod-sqm package?

I've been experimenting with the data collection portion, feeding it into InfluxDB.

I'm noticing some periods with SQM backlog (the green spikes) when the WAN link is nowhere near saturated. It might be an anomaly in my InfluxDB config, but I thought I'd ask whether anyone else has noticed something similar. (No impact from this that I can tell; I only noticed it once I started monitoring.)

These can occur when the CPU is loaded. Traffic shaping is relatively CPU-intensive and latency-sensitive: if the shaper gets enough CPU on average, but not in a timely enough fashion, queueing spikes can and will happen. In the past it seemed that HTB+fq_codel did not show latency spikes but rather throughput drops, while cake keeps throughput decent at the cost of some latency spikes.
Another observation is that traffic shaping does not play nicely with CPU frequency scaling and power-saving features. My hypothesis is that these increase latency enough to cause transient queue spikes without producing clear signs of CPU overload, and hence do not trigger CPU frequency upscaling quickly enough. All of this is subjective hypothesis, so I could be out to lunch here...
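To make the timing argument concrete, here is a toy queue model in Python. It is an illustration under simplified assumptions, not how cake or HTB work internally: traffic arrives at 80% of the shaped rate, and the only thing that changes is how often the shaper gets scheduled on the CPU. The function name and parameters are hypothetical.

```python
"""Toy queue model: a shaper that gets enough CPU on average, but only in
bursts, still builds transient backlog. This is NOT how cake or HTB work
internally; it only illustrates the timing argument."""


def simulate(arrival_per_tick, link_per_tick, run_every, ticks):
    """Return the backlog (in packets) at the end of each tick.

    arrival_per_tick: packets offered per tick (kept below link capacity)
    link_per_tick:    packets the shaped link may send per tick
    run_every:        the shaper is only scheduled on the CPU every
                      `run_every` ticks; when it runs it may send all the
                      link time that elapsed since its last run
    """
    backlog = 0
    credit = 0
    history = []
    for tick in range(ticks):
        backlog += arrival_per_tick
        credit += link_per_tick  # link time elapses whether we run or not
        if tick % run_every == 0:
            sent = min(backlog, credit)
            backlog -= sent
            credit = 0  # idle link time cannot be banked beyond this run
        history.append(backlog)
    return history


if __name__ == "__main__":
    # Same average load (80% of the link), different scheduling timeliness.
    print("timely shaper, peak backlog:", max(simulate(8, 10, 1, 50)))
    print("bursty shaper, peak backlog:", max(simulate(8, 10, 5, 50)))
```

The timely shaper never holds a backlog (peak 0), while the one that only runs every fifth tick peaks at 32 packets before draining, even though the link is never more than 80% utilised on average.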

Has anyone considered creating a chart to show SQM data via Netdata? Here's a reference for creating a collector: https://learn.netdata.cloud/docs/agent/collectors/quickstart
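For context on what such a collector has to do: Netdata external plugins report data by printing plain-text commands on stdout. Below is a minimal Python sketch of that protocol for per-tin SQM backlog, assuming the CHART / DIMENSION / BEGIN / SET / END commands described in Netdata's external plugin docs. The chart id sqm.<iface>_backlog, the priority 90000 and the helper names are hypothetical, and this is not necessarily how netdata-chart-sqm is structured. A real plugin would also loop forever, sending one BEGIN/SET/END block per update interval.

```python
"""Sketch of Netdata's external-plugin text protocol (CHART / DIMENSION /
BEGIN / SET / END) for per-tin SQM backlog. Chart ids, titles and the
priority value are hypothetical; this is not the netdata-chart-sqm code."""
import sys


def sanitize(name):
    """Make an interface or tin name safe to use as a Netdata id."""
    return name.lower().replace(" ", "_").replace(".", "_")


def define_chart(iface, tins):
    """Lines that declare one backlog chart with a dimension per tin."""
    chart = f"sqm.{sanitize(iface)}_backlog"
    # CHART type.id name title units family context charttype priority update_every
    lines = [
        f'CHART {chart} "" "SQM backlog per tin on {iface}" "bytes" '
        f'"{iface}" "sqm.backlog" line 90000 1'
    ]
    # DIMENSION id name algorithm multiplier divisor
    lines += [f'DIMENSION {sanitize(t)} "{t}" absolute 1 1' for t in tins]
    return lines


def update_chart(iface, backlog_by_tin):
    """Lines that push one sample (bytes of backlog per tin)."""
    chart = f"sqm.{sanitize(iface)}_backlog"
    lines = [f"BEGIN {chart}"]
    lines += [f"SET {sanitize(t)} = {int(v)}" for t, v in backlog_by_tin.items()]
    lines.append("END")
    return lines


if __name__ == "__main__":
    tins = ["Bulk", "Best Effort", "Video", "Voice"]
    sys.stdout.write("\n".join(define_chart("eth0.2", tins)) + "\n")
    sample = {"Bulk": 0, "Best Effort": 1514, "Video": 0, "Voice": 0}
    sys.stdout.write("\n".join(update_chart("eth0.2", sample)) + "\n")
    sys.stdout.flush()
```

Sanitizing ids matters because tin names such as "Best Effort" contain spaces and VLAN interfaces such as eth0.2 contain dots; the human-readable name is kept as the quoted dimension name.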

UPDATE
I had an initial go at this and have something working so far. Still a WIP, but coming along:

Making more headway on the Netdata chart for SQM. I may need some expert advice on the units of some of the values returned by tc. I will search back through this thread to see whether those questions have already been answered.
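One way to pin the units down is to normalise every tc token to a single base unit before charting. This is a minimal sketch assuming the suffixes I believe tc's text output uses for cake (rates in bit/Kbit/Mbit/Gbit with decimal steps, sizes in b/Kb/Mb meaning bytes with 1024 steps, times in us/ms/s); please check them against tc -s qdisc show on your own build. Unknown suffixes raise an error rather than being guessed.

```python
"""Convert tc's human-readable cake values into base units.

Assumed suffixes (check against `tc -s qdisc show` on your own build):
  rates  bit, Kbit, Mbit, Gbit  -> bits per second (x1000 steps)
  sizes  b, Kb, Mb              -> bytes (x1024 steps)
  times  us, ms, s              -> microseconds
"""
import re

RATE = {"bit": 1, "Kbit": 1000, "Mbit": 1000**2, "Gbit": 1000**3}
SIZE = {"b": 1, "Kb": 1024, "Mb": 1024**2}
TIME = {"us": 1, "ms": 1000, "s": 1000**2}

TOKEN = re.compile(r"([0-9]*\.?[0-9]+)([A-Za-z]*)")


def to_base_unit(token):
    """Return (value, unit) for a tc token such as '20Mbit' or '5.0ms'."""
    match = TOKEN.fullmatch(token)
    if match is None:
        raise ValueError(f"unrecognised tc value: {token!r}")
    number, suffix = float(match.group(1)), match.group(2)
    for table, unit in ((RATE, "bit/s"), (SIZE, "bytes"), (TIME, "us")):
        if suffix in table:
            return number * table[suffix], unit
    if suffix == "":
        return number, "count"  # plain counters: drops, marks, flows, ...
    raise ValueError(f"unknown suffix {suffix!r} in {token!r}")
```

For example, to_base_unit("1250Kbit") returns (1250000.0, 'bit/s') and to_base_unit("14.5ms") returns (14500.0, 'us'), so every delay dimension can share one unit on the chart.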

Anyway, here are some screenshots of where I'm at with diffserv4, which is what I use. They cover two speed tests I ran:

Testing compatibility with diffserv3 and diffserv8:


For anyone not familiar with Netdata, the "broken lines" in my screen captures are due to my restarting the netdata service while developing. They won't appear under normal usage. :slight_smile:
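Supporting diffserv3, diffserv4 and diffserv8 mostly comes down to not hard-coding the number of tins. A minimal Python sketch of that idea: it takes the tin count from the data rows of cake's per-tin table and returns one dict per tin. It assumes tin names in the header row are separated by at least two spaces (falling back to positional names if not), and the sample in the test is an illustrative excerpt shaped like cake's table, not captured output. Values are left as raw tc strings.

```python
"""Parse cake's per-tin table (from `tc -s qdisc show`) into one dict per
tin, taking the tin count from the data rows so diffserv3/4/8 all work."""
import re


def parse_tin_table(text):
    lines = [line for line in text.splitlines() if line.strip()]
    if len(lines) < 2:
        return []
    header, rows = lines[0], lines[1:]
    columns = {}
    tins = None
    for row in rows:
        key, *values = row.split()
        if tins is None:
            tins = len(values)  # first data row fixes the tin count
        if len(values) == tins:
            columns[key] = values
    # Assumption: tin names in the header are separated by 2+ spaces
    # ("Best Effort" contains one); fall back to positional names if not.
    names = re.split(r"\s{2,}", header.strip())
    if len(names) != tins:
        names = [f"tin{i}" for i in range(tins)]
    return [
        {"name": names[i], **{key: vals[i] for key, vals in columns.items()}}
        for i in range(tins)
    ]
```

Because nothing assumes four columns, a diffserv3 table yields three dicts and a diffserv8 table yields eight.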

For anyone wanting to test the SQM chart for Netdata, please see https://github.com/Fail-Safe/netdata-chart-sqm. The code behind it is adapted from sqm_collectd.sh (credit to @ldir).

Consider this "beta" for now. I am sure some of the units need tweaking, but any assistance in validating them is very welcome. Please open issues within the project for bugs/improvements so as not to clutter up this thread.

If this drums up enough interest, I will look at possibly making this into a package. Open to feedback around that, too.

UPDATE 6/29/20 - 9:00PM EDT
Just pushed a commit to fix some bugs with the units. They should be correct now and match what is seen in collectd. Please report any issues to the project. :slight_smile:

This looks very interesting, thanks for sharing!

Did you get a chance to try it out? If so, is it working okay for you?

Pushed another update to the Netdata chart; it now supports multiple SQM interfaces. Each SQM queue is queried and handled individually, so queues can have different settings and are handled accordingly. For example, one queue could be layer_cake @ diffserv4 and another piece_of_cake @ besteffort.

Repo: https://github.com/Fail-Safe/netdata-chart-sqm
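Handling each queue individually starts with discovering which interfaces carry a cake qdisc and in which mode. A minimal Python sketch, assuming each qdisc line from tc qdisc show has the form "qdisc cake <handle> dev <iface> ..." and contains one of the mode keywords listed in MODES; the function names are hypothetical. A diffserv5 build like the one mentioned later in this thread would show up as "unknown" unless its keyword is added.

```python
"""Find every cake qdisc and its diffserv mode from `tc qdisc show`,
so each SQM queue can be charted on its own."""
import subprocess

# Assumed set of cake priority-mode keywords as printed by tc.
MODES = {"besteffort", "diffserv3", "diffserv4", "diffserv8", "precedence"}


def find_cake_queues(qdisc_text):
    """Map device name -> cake mode for every cake qdisc in the listing."""
    queues = {}
    for line in qdisc_text.splitlines():
        fields = line.split()
        if fields[:2] != ["qdisc", "cake"] or "dev" not in fields:
            continue  # skip non-cake qdiscs (fq_codel, ingress, noqueue, ...)
        i = fields.index("dev")
        if i + 1 >= len(fields):
            continue
        mode = next((f for f in fields if f in MODES), "unknown")
        queues[fields[i + 1]] = mode
    return queues


def read_qdiscs():
    """Run tc on the router (requires the tc binary and Python 3.7+)."""
    return subprocess.run(["tc", "qdisc", "show"], capture_output=True,
                          text=True, check=True).stdout
```

On the router, find_cake_queues(read_qdiscs()) would return something like {"eth0.2": "diffserv4", "ifb4eth0.2": "besteffort"}, and each entry can then be queried and charted with its own tin layout.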

No, I haven't tried it yet. At the moment my SQM priority is DSCP tagging for ingress, but I haven't got it working yet...

Ah nice, I just tackled the same thing this past weekend. Still tweaking it a bit, though. Interestingly enough, my Netdata chart helped me reason through some of the struggles I had along the way. :slight_smile:

I'm trying to get it done via ctfino, and I expect the Netdata SQM chart could really help when verifying my setup. I can't wait to test it. :slight_smile:

What is ctfino?

From what I understand so far, kmod-sched-ctinfo is needed to make DSCP work on the ingress side (at least for my pppoe-wan setup).
I'm trying to get @ldir's ctinfo4layer_cake script working (actually a simpler version of it), as nftables is way too complicated for my level of understanding.
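As I understand the ctinfo approach: on egress, the DSCP of outgoing packets is saved into the connection's conntrack mark; on ingress, the ctinfo tc action (kmod-sched-ctinfo) copies it back into the DSCP field of the reply packets before cake classifies them into tins. A minimal Python sketch of only the bit arithmetic follows. It is not kernel code, and the mask values 0xFC000000 (the six DSCP bits) and 0x01000000 (a "DSCP has been stored" flag) are assumptions chosen for illustration, not necessarily what @ldir's script uses.

```python
"""Bit arithmetic behind restoring DSCP from the conntrack mark on ingress.
Models the idea only (not kernel code); both masks are assumptions."""

DSCP_MASK = 0xFC000000   # assumed: 6 contiguous mark bits holding the DSCP
STATE_MASK = 0x01000000  # assumed: flag meaning "a DSCP has been stored"


def lowest_set_bit(mask):
    """Bit position of the lowest 1 bit (the shift applied to the DSCP)."""
    return (mask & -mask).bit_length() - 1


def store_dscp(mark, dscp):
    """Egress: save a 6-bit DSCP into the 32-bit mark, keeping other bits."""
    shift = lowest_set_bit(DSCP_MASK)
    mark &= ~(DSCP_MASK | STATE_MASK) & 0xFFFFFFFF
    return mark | ((dscp << shift) & DSCP_MASK) | STATE_MASK


def restore_dscp(mark):
    """Ingress: the DSCP to write into the packet, or None if none stored."""
    if not mark & STATE_MASK:
        return None
    return (mark & DSCP_MASK) >> lowest_set_bit(DSCP_MASK)


if __name__ == "__main__":
    mark = store_dscp(0x0000ABCD, 46)  # 46 = EF, typically voice
    print(hex(mark), restore_dscp(mark))
```

Running it prints 0xb900abcd 46: the lower mark bits used by other rules survive, and the state flag lets the ingress side skip connections that never had a DSCP stored.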

@ldir This is very interesting! Is "diffserv5" a typo, though?

No. I have a patched CAKE with a 5-tin diffserv implementation: Least Effort, BulK, Best Effort, VIdeo, VOice. Basically diffserv4 with an extra really-low-priority tin.

Ah, that makes sense! Is that patch readily available or under wraps for now? :slight_smile:

@ldir
Can we still use your script with diffserv4?

Yes, you can. I use it myself with diffserv4.

I am not seeing anywhere near the same amount of traffic passing through the QOS_MARK_* and QOS_CAKE_* chains as I saw with my prior SQM setup. Something doesn't feel right there. @amteza, are you seeing the same?

FWIW, I managed to patch and build the diffserv5 kmod-sched-cake and a diffserv5-capable tc, but with diffserv5 I only end up with three tins for some reason. I realize this is very much a WIP by @ldir, so I'm not looking for support or a fix right now; I'm just sharing my observations. I'm sure it will all get ironed out into a smooth-running machine when the time is right. :slight_smile: