SQM Reporting?

And for anyone who does attempt the InfluxDB & Grafana route, an important note: make sure you copy the collectd types.db FROM OpenWrt to your InfluxDB host. There's a setting in influxdb.conf to configure the path to the file (check the InfluxDB docs for details).

Finally :slight_smile: Thank you @hnyman

I have today backported the SQM data collection part for collectd to 19.07, but the graph presentation in LuCI has not been implemented in 19.07.

Thanks for doing this! I've taken a stab at adding the configuration to LuCI for OpenWrt 19.07:

/usr/lib/lua/luci/statistics/plugins/sqm.lua
-- /usr/lib/lua/luci/statistics/plugins/sqm.lua
return {
        legend = {
                { },
                { },
                { "Interfaces" }
        },
        label = _("SQM"),
        category = "network"
}
/usr/lib/lua/luci/model/cbi/luci_statistics/sqm.lua
-- /usr/lib/lua/luci/model/cbi/luci_statistics/sqm.lua
-- Copyright 2011 Jo-Philipp Wich <jow@openwrt.org>
-- Copyright 2020 Joseph Nahmias <joe@nahmias.net>
-- based on /usr/lib/lua/luci/model/cbi/luci_statistics/iwinfo.lua
-- Licensed to the public under the Apache License 2.0.

-- To enable this to show in luci, in addition to placing this file in the
-- appropriate place, you will need to run:
--    touch /usr/lib/collectd/sqm.so

-- In order for the configuration from luci to correctly make it to
-- collectd.conf, you will need to patch stat-genconfig as follows:
--    patch -Nui support_collectd_sqm.patch

-- Debugging:
-- uci set luci_statistics.collectd_sqm=collectd_sqm
-- uci set luci_statistics.collectd_sqm.enable=1
-- uci add_list luci_statistics.collectd_sqm.Interfaces='eth0'
-- uci add_list luci_statistics.collectd_sqm.Interfaces='eth1'
-- uci show luci_statistics.collectd_sqm

local m, s, o

m = Map("luci_statistics",
	translate("SQM Plugin Configuration"),
	translate("The sqm plugin collects statistics about smart queue management QoS."))

s = m:section(NamedSection, "collectd_sqm", "luci_statistics")

o = s:option(Flag, "enable", translate("Enable this plugin"))
o.default = 1

o = s:option(DynamicList, "Interfaces", translate("Monitor interfaces"))
o.template = "cbi/network_ifacelist"
o.widget   = "checkbox"
o.nocreate = true
o:depends("enable", 1)

return m
support_collectd_sqm.patch
--- /usr/bin/stat-genconfig     2020-06-16 01:35:21.000000000 +0000
+++ /usr/bin/stat-genconfig     2020-06-16 01:59:56.000000000 +0000
@@ -42,6 +42,8 @@ function section( plugin )

                if type( plugins[plugin] ) == "function" then
                        params = plugins[plugin]( config )
+               elseif type( plugins[plugin] ) == "nil" then
+                       return
                else
                        params = config_generic( config, plugins[plugin][1], plugins[plugin][2], plugins[plugin][3], plugin == "collectd" )
                end
@@ -98,10 +100,20 @@ function config_exec( c )
        local str = ""

        for s in pairs(sections) do
-               for key, type in pairs({ Exec="collectd_exec_input", NotificationExec="collectd_exec_notify" }) do
-                       if sections[s][".type"] == type then
+               for plugin, key in pairs({ collectd_exec_input="Exec", collectd_exec_notify="NotificationExec", collectd_sqm="Exec" }) do
+                       if sections[s][".type"] == plugin then

-                               cmd = sections[s].cmdline
+                               if plugin == "collectd_sqm" then
+                                       if sections[s]["enable"] == "1" then
+                                               --local sqm = uci:get_all( "sqm" )
+                                               local ifcs = table.concat(sections[s]["Interfaces"] or {}, " ")
+                                               if ifcs ~= "" then
+                                                       cmd = "/usr/libexec/collectd/sqm_collectd.sh " .. ifcs
+                                               end
+                                       end
+                               else    -- exec_input or exec_notify
+                                       cmd = sections[s].cmdline
+                               end

                                if cmd then
                                        cmd   = cmd:gsub("^%s+", ""):gsub("%s+$", "")
@@ -292,6 +304,9 @@ for filename in nixio.fs.dir(plugin_dir)
        local name = filename:gsub("%.lua", "")
        if (name == "exec") then
                plugins[name] = config_exec
+       elseif (name == 'sqm') then
+               -- sqm uses the Exec plugin
+               plugins[name] = nil
        elseif (name == "iptables") then
                plugins[name] = config_iptables
        elseif (name == "curl") then
@@ -303,6 +318,16 @@ for filename in nixio.fs.dir(plugin_dir)
        end
 end

+if sections.collectd_sqm and sections.collectd_sqm.enable == "1" and sections.collectd_exec and sections.collectd_exec.enable == "0" then
+       -- blank out all disabled exec_input & exec_notify
+       for k,v in pairs(sections) do
+               if v[".type"] == "collectd_exec_input" or v[".type"] == "collectd_exec_notify" then
+                       v.cmdline = nil
+               end
+       end
+       -- enable Exec plugin for SQM
+       sections.collectd_exec.enable = "1"
+end

 preprocess = {
        RRATimespans = function(val)

Now to work on generating / displaying the graphs from the data collected in the RRDs...
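
The key step in the config_exec change above is turning the collectd_sqm section into a single Exec command line. Below is a minimal standalone sketch of just that step, with guards for a missing section or an empty interface list. sqm_exec_cmd is a hypothetical helper name, and the section table is assumed to look like what uci:get_all() returns (option values as strings, list options as tables); it is not the code in the patch or the PR.

```lua
-- Hypothetical helper: build the collectd Exec command for the SQM collector
-- from a collectd_sqm uci section (assumed shape: as returned by uci:get_all()).
local SQM_SCRIPT = "/usr/libexec/collectd/sqm_collectd.sh"

local function sqm_exec_cmd(section)
	if section == nil or section.enable ~= "1" then
		return nil -- section missing or plugin disabled
	end

	local ifcs = section.Interfaces
	if type(ifcs) == "string" then
		ifcs = { ifcs } -- tolerate a single option instead of a list
	elseif type(ifcs) ~= "table" then
		return nil -- no interfaces configured
	end

	if #ifcs == 0 then
		return nil
	end

	-- the collector script takes the interface names as its arguments
	return SQM_SCRIPT .. " " .. table.concat(ifcs, " ")
end
```

For example, a section with enable = "1" and Interfaces = { "eth0", "eth1" } yields "/usr/libexec/collectd/sqm_collectd.sh eth0 eth1", while a disabled section or one without interfaces yields nil, so no Exec entry would be generated for it.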

I've completed integrating collectd-mod-sqm with luci-app-statistics and created a PR in the LuCI GitHub repository for merging into the OpenWrt-19.07 branch:


One note, as mentioned in the PR comments, you will need to run touch /usr/lib/collectd/sqm.so to make it all work. @hnyman - any chance you can add that to the collectd-mod-sqm package?

I've been experimenting with the data collection portion, feeding into influxdb.

I'm noticing some periods with SQM backlog (green spikes) when the WAN link is nowhere near saturated. It might be an anomaly in my InfluxDB config, but I thought I'd ask whether anyone else has noticed something similar. (No impact that I can tell; I only noticed it since I started monitoring this.)

These can occur when the CPU is loaded: traffic shaping is relatively CPU-intensive and latency-sensitive, so if the shaper gets enough CPU on average but not in a timely enough fashion, queueing spikes can and will happen. In the past it seemed that HTB+fq_codel did not show latency spikes but rather throughput drops, while cake keeps throughput decent at the cost of some latency spikes.
Another observation is that traffic shaping does not play nicely with CPU frequency scaling and power-saving features. My hypothesis is that these increase latency enough to cause transient queue spikes without causing clear signs of CPU overload, and hence do not trigger CPU frequency upscaling sufficiently well. These are all subjective hypotheses, so I could be out to lunch here...
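
A back-of-the-envelope sketch of why a short stall can show up as a backlog spike even on an unsaturated link: whatever arrives while the shaper is not running has to wait in the queue. This is a simplified illustration under a stated assumption (traffic keeps arriving at a steady rate during the stall); the helper name and the numbers are made up, not measurements from this thread.

```lua
-- Back-of-the-envelope estimate, not a model of any real scheduler.
-- Assumption: traffic keeps arriving at arrival_bps (bits/s) while the
-- shaper is not scheduled for stall_ms; the queue is then sent at shaped_bps.
local function stall_backlog(arrival_bps, shaped_bps, stall_ms)
	-- bytes that pile up while the shaper is not running
	local backlog_bytes = arrival_bps * (stall_ms / 1000) / 8
	-- time needed to send that backlog at the shaped rate
	local drain_ms = backlog_bytes * 8 / shaped_bps * 1000
	return backlog_bytes, drain_ms
end
```

For example, 50 Mbit/s of traffic on a link shaped to 100 Mbit/s, with a 5 ms stall, leaves about 31 KB queued, which takes roughly 2.5 ms to send at the shaped rate: a visible backlog blip at only half load.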

Has anyone considered creating a chart to show SQM data via Netdata? Here's a reference for creating a collector: https://learn.netdata.cloud/docs/agent/collectors/quickstart

UPDATE
I had an initial go at this and have something working so far. Still a WIP, but coming along:

Making more headway on the Netdata chart for SQM. I may need to get some expert advice on the units for some of the values that are returned from tc. I will search back through this thread to see if those questions are answered already.
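
On the units question: the "Sent" line of tc -s qdisc show reports cumulative counters (bytes and packets since the qdisc was created), so a chart has to difference successive samples to get a rate. A minimal Lua sketch that pulls those counters out of one such line; parse_sent is a hypothetical name, and the sample line in the usage note is made up for illustration.

```lua
-- Parse the cumulative counters from a `tc -s qdisc` "Sent" line, e.g.
--   Sent <bytes> bytes <pkts> pkt (dropped <n>, overlimits <n> requeues <n>)
-- Returns nil if the line does not have that shape.
local function parse_sent(line)
	local bytes, pkts, dropped, overlimits, requeues = line:match(
		"Sent (%d+) bytes (%d+) pkt %(dropped (%d+), overlimits (%d+) requeues (%d+)%)")
	if not bytes then
		return nil
	end
	return {
		bytes = tonumber(bytes), -- bytes, cumulative since the qdisc was created
		packets = tonumber(pkts), -- packets, cumulative
		dropped = tonumber(dropped), -- packets dropped, cumulative
		overlimits = tonumber(overlimits),
		requeues = tonumber(requeues),
	}
end
```

Given " Sent 123456 bytes 789 pkt (dropped 3, overlimits 42 requeues 0) ", this returns bytes = 123456, packets = 789, dropped = 3. Two samples taken t seconds apart then give a rate of (bytes2 - bytes1) * 8 / t bits per second.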

Anyway, here are some screenshots of where I am at with diffserv4, which I use. I ran two speed tests represented in these:

Testing compatibility with diffserv3 and diffserv8:


For anyone not familiar with Netdata, the "broken lines" in my screen captures are due to me restarting the netdata service while developing. They won't be visible under normal usage. :slight_smile:

For anyone wanting to test the SQM chart for Netdata, please refer to here: https://github.com/Fail-Safe/netdata-chart-sqm. The code behind this is adapted from sqm_collectd.sh (credit to @ldir).

Consider this "beta" for now. I am sure some of the units need tweaking, but any assistance in validating them is very welcome. Please open issues within the project for bugs/improvements so as not to clutter up this thread.

If this drums up enough interest, I will look at possibly making this into a package. Open to feedback around that, too.

UPDATE 6/29/20 - 9:00PM EDT
Just pushed a commit to fix some bugs with the units. They should be correct now and match what is seen in collectd. Please report any issues to the project. :slight_smile:
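
For anyone curious how such a chart talks to Netdata: external collectors print plain-text CHART/DIMENSION definitions once, then BEGIN/SET/END blocks on every update. With the incremental algorithm, Netdata takes the per-second difference of a cumulative counter and applies the multiplier/divisor, so a byte counter with multiplier 8 and divisor 1000 is shown in kilobits/s. A minimal Lua sketch of emitting those lines; the chart id, family, context and units below are assumptions for illustration, not what netdata-chart-sqm actually uses.

```lua
-- Hypothetical chart layout: one chart per interface, one dimension per tin.
local function chart_definition(chart_id, title, dims)
	local out = {
		-- CHART type.id name title units family context charttype
		string.format('CHART %s "" "%s" "kilobits/s" "sqm" "sqm.bytes" line', chart_id, title),
	}
	for _, d in ipairs(dims) do
		-- incremental: per-second difference of a cumulative counter;
		-- bytes * 8 / 1000 = kilobits
		out[#out + 1] = string.format('DIMENSION %s "%s" incremental 8 1000', d.id, d.name)
	end
	return table.concat(out, "\n")
end

-- One update block carrying the raw cumulative counters for each dimension.
local function chart_update(chart_id, values)
	local out = { "BEGIN " .. chart_id }
	for _, v in ipairs(values) do
		out[#out + 1] = string.format("SET %s = %d", v.id, v.value)
	end
	out[#out + 1] = "END"
	return table.concat(out, "\n")
end
```

A collector would print the definition once at startup and an update block on every interval, writing both to stdout.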

This looks very interesting, thanks for sharing!

Did you get a chance to try it out? If so, is it working okay for you?

Pushed another update to this Netdata chart to support multiple SQM interfaces. Each SQM queue is queried and handled individually, so the queues can have different settings and are handled accordingly, e.g. one queue could be layer_cake @ diffserv4 and another piece_of_cake @ besteffort.
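
Handling each queue individually mostly comes down to detecting each cake instance's diffserv mode, since that decides how many tins (and hence chart dimensions) it has. A minimal sketch that reads the header line tc qdisc show prints for each qdisc; queue_layout is a hypothetical helper, the sample lines in the usage note are made up, and falling back to diffserv3 (cake's default mode) when no mode keyword is present is an assumption.

```lua
-- Number of tins per cake diffserv mode.
local TINS = {
	besteffort = 1,
	diffserv3 = 3,
	diffserv4 = 4,
	diffserv8 = 8,
	precedence = 8,
}

-- Parse one `tc qdisc show` header line.
-- Returns device name and tin count for cake qdiscs, nil for anything else.
local function queue_layout(line)
	local kind, dev = line:match("^qdisc (%S+) %S+ dev (%S+)")
	if kind ~= "cake" then
		return nil
	end
	for word in line:gmatch("%S+") do
		if TINS[word] then
			return dev, TINS[word]
		end
	end
	return dev, 3 -- assumption: no mode keyword means cake's default, diffserv3
end
```

So "qdisc cake 8019: dev eth0 root refcnt 2 bandwidth 20Mbit diffserv4 ..." maps to eth0 with 4 tins, while a besteffort queue on ifb4eth0 maps to 1 tin, and each can get its own chart.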

Repo: https://github.com/Fail-Safe/netdata-chart-sqm

No, I haven't tried it yet. At the moment my priority with SQM is DSCP tagging for ingress but I haven't got it done...

Ah nice, I just tackled the same thing this past weekend. Still tweaking it a bit, though. Interestingly enough, my Netdata chart helped me reason through some struggles I was having in the process. :slight_smile:

I'm trying to get it done via ctfino and I guess that the netdata sqm chart could really help when verifying my setup. I can't wait to test it. :slight_smile:

What is ctfino?

From what I understand so far, kmod-sched-ctinfo is needed to make DSCP work on the ingress side (at least for my pppoe-wan setup).
I'm trying to get @ldir's ctinfo4layer_cake script to work (actually a simpler version of it), as nftables is way too complicated for my level of understanding.
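
As I understand the ctinfo approach, the egress side stores each connection's DSCP in its conntrack mark, and act_ctinfo (from kmod-sched-ctinfo) copies it back into the DSCP field of ingress packets before they reach cake. A minimal Lua sketch of that bit packing, assuming the mark layout selected by "action ctinfo dscp 0xfc000000 0x01000000" (DSCP in the top 6 bits, bit 24 as a "DSCP stored" flag); the helper names are hypothetical and other scripts may use a different mask.

```lua
-- Assumed mark layout: DSCP in bits 26-31 (mask 0xfc000000),
-- bit 24 (0x01000000) set once a DSCP has been stored for the connection.
local DSCP_SHIFT = 0x4000000 -- 2^26
local DSCP_VALID = 0x1000000 -- 2^24

-- Hypothetical helper: conntrack mark value for a given DSCP.
local function mark_from_dscp(dscp)
	assert(dscp >= 0 and dscp < 64, "DSCP is a 6-bit value")
	return dscp * DSCP_SHIFT + DSCP_VALID
end

-- Hypothetical helper: DSCP recovered from a mark, or nil if none stored.
-- Plain arithmetic instead of bit operators so it also runs on Lua 5.1.
local function dscp_from_mark(mark)
	if math.floor(mark / DSCP_VALID) % 2 == 0 then
		return nil -- flag bit not set
	end
	return math.floor(mark / DSCP_SHIFT) % 64
end
```

For example, EF (DSCP 46) is stored as 0xB9000000, and reading that mark back yields 46 even if unrelated low bits of the mark are in use.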

@ldir This is very interesting! Is "diffserv5" a typo, though?

No. I have a patched CAKE that has a 5-tin diffserv implementation: Least Effort, BulK, Best Effort, VIdeo, VOice. Basically diffserv4 with an extra really-low-priority tin.

Ah, that makes sense! Is that patch readily available or under-wraps for now? :slight_smile: