SQM Reporting?

Have you noticed that the rrd based chart tool allows arithmetic?
(with Reverse Polish Notation)

Here is a multiplication by 100 in order to get values ranging 0.00-1.00 shown as percentage of 0-100%:

transform_rpn: "100,*"

Dividing by 1000 would make microseconds to milliseconds, right?
transform_rpn: "1000,/"
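
For anyone unfamiliar with RPN, here is a minimal sketch of how such a fragment is applied to a single value. The apply_rpn helper is hypothetical and is not how rrdtool evaluates expressions. It handles only plain numbers and the four basic operators, and it assumes the data value is pushed onto the stack before the fragment runs.

```sh
# Apply an RPN fragment such as "100,*" to one input value.
# Illustration only: supports numbers and + - * /, not rrdtool's full operator set.
apply_rpn() {
    value="$1"
    fragment="$2"
    # Prepend the value so it is the first item pushed on the stack.
    printf '%s,%s\n' "$value" "$fragment" | awk -F, '
    {
        sp = 0
        for (i = 1; i <= NF; i++) {
            t = $i
            if (t == "+" || t == "-" || t == "*" || t == "/") {
                # Pop the two operands: b was pushed last, a before it.
                b = stack[sp--]; a = stack[sp--]
                if (t == "+") r = a + b
                else if (t == "-") r = a - b
                else if (t == "*") r = a * b
                else r = a / b
                stack[++sp] = r
            } else {
                stack[++sp] = t + 0
            }
        }
        print stack[sp]
    }'
}
```

With this, apply_rpn 0.25 "100,*" prints 25, and apply_rpn 361 "1000,/" prints 0.361 (361us expressed in ms).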


Brilliant! Yes! - Abandons build and un-does horrificness in ash :joy:

Thank you for that hint. Really, really helps.

lovely, thx. in terms of scaling factors, trying to plot ack-drops on the same graph as drops and marks is kind of hard:

                   Bulk  Best Effort        Voice
  thresh      562496bit        9Mbit     2250Kbit
  target         32.3ms        5.0ms        8.1ms
  interval      127.3ms      100.0ms      103.1ms
  pk_delay        6.3ms        361us        889us
  av_delay        2.3ms         81us         56us
  sp_delay         66us          5us          4us
  backlog            0b           0b           0b
  pkts          1204032    658967544      3582442
  bytes      1099157792 134340865468    794561466
  way_inds        11698     42652027       196178
  way_miss        73600     11241940       214416
  way_cols            0            0            0
  drops              90       125152            6
  marks             132          180            7
  ack_drop            0     65000503          439
  sp_flows            1            4            1
  bk_flows            0            1            0
  un_flows            0            0            0
  max_len         15605        29234         4542
  quantum           300          300          300

Even if I scaled those down by a factor of, say, 20 to roughly account for the size of an ACK relative to the other dropped packets, it's still a pretty amazing number of ack-drops in my case.

And I guess the ecn advocates can always hope the ecn mark goes up.

Do you think that the stat is right? ack_drops are about 1/10 of the total number of packets.

but only ~1/150th the total number of bytes.
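
A quick way to check ratios like this per tin, as a rough sketch: ack_drop_share is a made-up helper, not part of any package. It assumes the input is the per-tin table in the format pasted above (a row label followed by one column per tin) and prints ack_drop as a percentage of pkts for each column.

```sh
# Print ack_drop as a percentage of pkts for each tin column.
# Reads the per-tin statistics rows on stdin; rows other than
# "pkts" and "ack_drop" (including the header) are ignored.
ack_drop_share() {
    awk '
    $1 == "pkts"     { for (i = 2; i <= NF; i++) pkts[i] = $i; n = NF }
    $1 == "ack_drop" { for (i = 2; i <= NF; i++) acks[i] = $i }
    END {
        for (i = 2; i <= n; i++)
            printf "tin%d %.1f%%\n", i - 1, (pkts[i] > 0 ? 100 * acks[i] / pkts[i] : 0)
    }'
}
```

Feeding it the pkts and ack_drop rows from the table should give a Best Effort share close to the 1/10 mentioned above.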

Yes, I'm certain the ack-drop stat is right. It could, actually, end up higher on this 10x1 connection, if the algorithm for dropping acks was improved a bit. (as best as I recall it didn't do a virtual drop from drr) You wanna see it go really high, try something with a 50x1 down/up ratio....

I pushed an update to master for rrdtool1, collectd & luci-app-statistics to include a script-based collector for cake & mq qdiscs, if anyone is interested.

Pretty graphs for all (well especially for cake)


How do I install that? Sorry for the noob question...

Installation is possible on the master branch (but not on 19.07.x). The steps are:

  • Install collectd-mod-sqm, which also installs collectd-mod-exec.
  • Install the updated version of collectd itself (5.11.0-3), so that the init script has the correct logic.
  • Additionally, install the updated rrdtool1 (1.0.50-3), so that the log scale is displayed more nicely.
  • Naturally you also need an up-to-date LuCI statistics app, so that the graph definitions are there: luci-app-statistics_git-20.126.39721-9c4f345 or newer.

The SQM monitor plugin (just a script) is actually executed by the collectd exec plugin (real collectd plugin).

Enable the exec plugin in the luci_statistics config, and add an "exec" section that runs the script /usr/libexec/collectd/sqm_collectd.sh with the interface list as user nobody. You can do that in the exec plugin config.

NOTE: the collectd part has been backported to the 19.07 branch, so the data collection will work also there, but the LuCI graphs have not been backported, so the visualisation of that data needs to be done elsewhere.

See below:

root@router1:~# tail /etc/config/luci_statistics

config statistics 'collectd_exec'
        option enable '1'

config collectd_exec_input
        option cmduser 'nobody'
        option cmdgroup 'nogroup'
        option cmdline '/usr/libexec/collectd/sqm_collectd.sh eth0.2'

And you should get two tabs:


I added a note about "install updated version of collectd itself (5.11.0-3)".
I bumped the collectd version to make the SQM supporting version clear.

(There was a collectd version bump during the PR time, so Kevin's original bump did not actually happen.)

EDIT:
And I added a note about also needing up-to-date LuCI statistics.

root@OpenWrt:~# opkg install collectd-mod-sqm
Unknown package 'collectd-mod-sqm'.
Collected errors:
 * opkg_install_cmd: Cannot install package collectd-mod-sqm.

Buildbot has not yet built it.
Wait 1-3 days, depending on the random order of the package architecture builds.

Or build new collectd by yourself.


Hmm, I guess I fouled up something:

Wed May  6 22:22:48 2020 daemon.err collectd[2403]: exec plugin: exec_read_one: error = sh: -want0: out of range
Wed May  6 22:22:48 2020 daemon.err collectd[2403]: exec plugin: exec_read_one: error = /usr/libexec/collectd/sqm_collectd.sh: eval: line 1: osppppoe-want0=1: not found
Wed May  6 22:22:48 2020 daemon.err collectd[2403]: exec plugin: exec_read_one: error = sh: -want1: out of range
Wed May  6 22:22:48 2020 daemon.err collectd[2403]: exec plugin: exec_read_one: error = /usr/libexec/collectd/sqm_collectd.sh: eval: line 1: osppppoe-want1=2506: not found
Wed May  6 22:22:48 2020 daemon.err collectd[2403]: exec plugin: exec_read_one: error = sh: -want2: out of range
Wed May  6 22:22:48 2020 daemon.err collectd[2403]: exec plugin: exec_read_one: error = /usr/libexec/collectd/sqm_collectd.sh: eval: line 1: osppppoe-want2=2: not found

From cat /etc/config/luci_statistics

[...]
config statistics 'collectd_exec'
        option enable '1'

config collectd_exec_input
        option cmduser 'nobody'
        option cmdgroup 'nogroup'
        option cmdline '/usr/libexec/collectd/sqm_collectd.sh pppoe-wan'
[...]

Might be related to the hyphen in pppoe-wan...

Following @hnyman's lead from above I added the last line:

	ifc="$1"
	ifr="${ifc//./_}"
	ifr="${ifr//-/_}"

That is, I added a second substitution to the script that replaces - with _, and now it runs.

@ldir, I have no patch, but maybe you can add this by hand?

Would this work for you?

ifr="${ifc//[.-]/_}"

(not quite sure about the syntax)


According to the bash manuals the syntax looks fine.

I tried it and it does indeed work with this single line.

I note that [-.] might be safer for further additions, as - denotes a character range unless at the end or beginning of a pattern, and I assume new members to the set of to be substituted taboo characters would naturally be added to the end...
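
In case someone needs this in a shell without the ${var//pattern/replacement} expansion, here is a sketch of the same substitution done with tr instead. This is a swapped-in technique, not what the script does, and sanitize_ifname is a hypothetical name.

```sh
# Replace every "." and "-" in an interface name with "_",
# so the result can be used as part of a shell variable name.
# The "-" is placed last in the set so tr treats it literally
# rather than as a range operator.
sanitize_ifname() {
    printf '%s\n' "$1" | tr '.-' '__'
}

# Example: derive the variable-safe name from the first argument.
ifc="${1:-pppoe-wan}"
ifr="$(sanitize_ifname "$ifc")"
```

With the default above, ifr ends up as pppoe_wan, and eth0.2 would become eth0_2.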


Now I notice:

Wed May  6 23:23:12 2020 daemon.err collectd[2403]: Sleeping only 2s because the next interval is 1737.746 seconds in the past!
Wed May  6 23:23:14 2020 daemon.err collectd[2403]: rrdtool plugin: rrd_update_r failed: /tmp/rrd/nacktmulle/sqmcake-ifb4pppoe-wan/qdisct_latencyus-BK.rrd: opening '/tmp/rrd/nacktmulle/sqmcake-ifb4pppoe-wan/qdisct_latencyus-BK.rrd': No such file or directory

This shows up after a reboot, yet the plots display nicely (albeit starting from scratch, as I do not use persistent storage for the databases). I assume this is just cosmetic...

Yep, it is cosmetic. Normal startup messages, partially related to the nonexistent real-time clock.

Hopefully works also with non-bash shells like busybox ash.

As I said, I tested it under OpenWrt with the default BusyBox/ash shell. The ash manpage is pretty mum about this though...

I realized that I can add multiple lines to the collectd exec configuration to monitor multiple interfaces. Tip of the hat to @ldir, thanks a lot!
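
For the multi-interface case, here is a small sketch that prints one exec stanza per interface in the same form as the config shown earlier in the thread. It assumes that "multiple lines" means one collectd_exec_input section per interface; print_sqm_exec_sections is a hypothetical helper, and the output still has to be merged into /etc/config/luci_statistics by hand.

```sh
# Print one collectd_exec_input stanza per interface given as an argument.
# User, group and script path are copied from the config shown in the thread.
print_sqm_exec_sections() {
    for ifc in "$@"; do
        printf "config collectd_exec_input\n"
        printf "\toption cmduser 'nobody'\n"
        printf "\toption cmdgroup 'nogroup'\n"
        printf "\toption cmdline '/usr/libexec/collectd/sqm_collectd.sh %s'\n\n" "$ifc"
    done
}
```

For example, print_sqm_exec_sections eth0.2 pppoe-wan prints two stanzas, one for each interface.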