SQM Reporting?

I've successfully installed and configured luci-app-sqm, along with its related packages, using the default recommendations and instructions at this page:

https://openwrt.org/docs/guide-user/network/traffic-shaping/sqm

My testing using dslreports.com/speedtest and the speedtest-netperf package in OpenWrt shows that SQM is having the desired effect, namely reducing latency during periods of peak network utilization.

That being said, I am curious about how exactly SQM is "shaping" my traffic - for example, which operations are being throttled and which are being given priority. Does anyone have any tips on where to get this data or how to generate reports on this sort of thing, either in real time or historically?

Using "tc".
For example: tc -s -d qdisc show

Depending on the qdisc in use, you can get different statistics.
Like this:

root@router1:~# tc -s -d qdisc show
qdisc noqueue 0: dev lo root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc mq 0: dev eth0 root
 Sent 169397185 bytes 674650 pkt (dropped 0, overlimits 0 requeues 30)
 backlog 0b 0p requeues 30
qdisc fq_codel 0: dev eth0 parent :1 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 4Mb ecn
 Sent 169397185 bytes 674650 pkt (dropped 0, overlimits 0 requeues 30)
 backlog 0b 0p requeues 30
  maxpacket 1514 drop_overlimit 0 new_flow_count 18 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc mq 0: dev eth1 root
 Sent 1353781125 bytes 1455605 pkt (dropped 0, overlimits 0 requeues 449)
 backlog 0b 0p requeues 449
qdisc fq_codel 0: dev eth1 parent :1 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 4Mb ecn
 Sent 1353781125 bytes 1455605 pkt (dropped 0, overlimits 0 requeues 449)
 backlog 0b 0p requeues 449
  maxpacket 1514 drop_overlimit 0 new_flow_count 456 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc noqueue 0: dev br-lan root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc noqueue 0: dev eth1.1 root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc cake 800b: dev eth0.2 root refcnt 2 bandwidth 17Mbit besteffort triple-isolate nonat nowash no-ack-filter split-gso rtt 100.0ms raw overhead 0
 Sent 100248881 bytes 473261 pkt (dropped 6998, overlimits 268591 requeues 0)
 backlog 0b 0p requeues 0
 memory used: 184512b of 4Mb
 capacity estimate: 17Mbit
 min/max network layer size:           42 /    1514
 min/max overhead-adjusted size:       42 /    1514
 average network hdr offset:           14

                  Tin 0
  thresh         17Mbit
  target          5.0ms
  interval      100.0ms
  pk_delay        339us
  av_delay         20us
  sp_delay          4us
  backlog            0b
  pkts           480259
  bytes       105882719
  way_inds            0
  way_miss          141
  way_cols            0
  drops            6998
  marks               0
  ack_drop            0
  sp_flows            4
  bk_flows            1
  un_flows            0
  max_len         17032
  quantum           518

That was from "piece of cake", a single-tin version of cake.

Here cake has dropped about 7000 packets from my upload traffic in order to keep the response time reasonable. (The total number of packets was 480k, so roughly 1.5% of packets were dropped.)

Similar stats can be seen for most qdiscs, but the presentation varies.
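That drop percentage can be computed directly from the two counters above. A minimal sketch (the numbers are the ones from the cake output above; note that the tin's "pkts" counter counts all packets handled, while the top-line "Sent ... pkt" excludes the dropped ones):

```python
# Drop ratio from the cake stats above: drops / pkts.
# pkts (480259) = sent packets (473261) + drops (6998),
# so it is the right denominator for "fraction of traffic dropped".
drops = 6998
pkts = 480259  # all packets handled by the tin, including dropped ones

drop_pct = 100.0 * drops / pkts
print(f"{drop_pct:.2f}% of packets dropped")  # prints "1.46% of packets dropped"
```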

It tries to be fair to all flows (source/destination pairs) and hence does not rely on complex rulesets, which is its main strength (smartness): it de-prioritizes flows/transfers that would saturate the line (like long-running uploads/downloads) and prioritizes things that don't (interactive services).

It does not care about types of traffic per se (though you can configure it to), because classification is hard and error-prone (think VPNs, encryption, and applications that lie about their traffic class). Consequently it cannot really give statistics in the sense of which website/service was throttled, because it does not know or care.

I'd love to visualize some of that info in the ui, to give a quick overview about the shaping stats. Can someone explain how to extract meaningful info out of the tc output?


It depends hugely on the qdisc in use.

For most qdiscs we need to parse text-based output; for cake there is a table (for layer_cake, a multi-column table with one column per tin in use).

I think that the dropped/total number of packets is maybe the most useful metric, telling us how much of the traffic is effectively shaped. Collected every 30 seconds, perhaps?
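Since the drop and packet counters are cumulative, a 30-second collector would diff two consecutive samples. A sketch, assuming the top-level `drops` and `packets` field names that `tc -j -s qdisc show` emits (the sample dicts here are hypothetical; in practice you'd parse the real JSON):

```python
# Sketch: turn two cumulative counter samples into a per-interval drop ratio.
# The dicts mimic the per-qdisc objects from `tc -j -s qdisc show`.

def drop_rate(prev, cur):
    """Fraction of packets dropped between two samples."""
    d = cur["drops"] - prev["drops"]
    p = cur["packets"] - prev["packets"]
    return d / p if p else 0.0

# Hypothetical consecutive samples, 30 s apart:
prev = {"packets": 674650, "drops": 100}
cur = {"packets": 675650, "drops": 120}
print(f"{100 * drop_rate(prev, cur):.1f}% dropped in this interval")  # 2.0%
```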

qdisc specialists like @tohojo @moeller0 @dtaht and @ldir might have valuable input on getting tc stats of SQM actions.

oh boy, can open, worms all over the place. I can advise on some aspects of cake but not much else. The good news is that tc can output in json format which probably makes things easier in terms of not scraping screen output.

'tc -d qdisc show' will show basic config for all qdiscs on all interfaces and isn't particularly interesting/helpful
'tc -d qdisc show dev eth0' will do the same but for just the specified interface.
Adding a '-j' flag returns the same output in JSON format.

The more interesting stuff is in the '-s' statistics output as @hnyman posted above. In terms of cake output, cake can be viewed as an overall shaper with traffic classified into groups called Tins. A cake instance can have 1, 3, 4 or 8 tins. A 4 tin configured cake looks like:

qdisc cake 8081: root refcnt 9 bandwidth 19900Kbit diffserv4 dual-srchost nat nowash ack-filter split-gso rtt 100.0ms ptm overhead 26 mpu 72 
 Sent 100593447 bytes 224203 pkt (dropped 7430, overlimits 326315 requeues 64) 
 backlog 0b 0p requeues 64
 memory used: 237416b of 4Mb
 capacity estimate: 19900Kbit
 min/max network layer size:           28 /    1500
 min/max overhead-adjusted size:       74 /    1550
 average network hdr offset:           14

                   Bulk  Best Effort        Video        Voice
  thresh       1243Kbit    19900Kbit     9950Kbit     4975Kbit
  target         14.6ms        5.0ms        5.0ms        5.0ms
  interval      109.6ms      100.0ms      100.0ms      100.0ms
  pk_delay        764us        2.5ms         43us         72us
  av_delay        117us        279us         29us         16us
  sp_delay          6us         23us         21us         16us
  backlog            0b           0b           0b           0b
  pkts            45927       184129         1260          317
  bytes        15150132     87582439        79573        16302
  way_inds          360          754            0            0
  way_miss          257         1312          216            2
  way_cols            0            0            0            0
  drops              33         1145            0            0
  marks               0            0            0            0
  ack_drop         4936         1316            0            0
  sp_flows            1            0            1            1
  bk_flows            0            1            0            0
  un_flows            0            0            0            0
  max_len         11245         7570          603           54
  quantum           300          607          303          300

qdisc ingress ffff: parent ffff:fff1 ---------------- 
 Sent 234180120 bytes 245510 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0

For the 3- and 4-tin instances of cake, the tins are given a descriptive name; in the other configurations they're just called 'Tin n'. Tins on the left have lower priority than tins on the right. Tin selection is (usually) determined by the packet's DSCP value. In terms of the stats values reported, here's what they mean AFAIUI.

thresh: defines how much bandwidth is consumed in this tin before it switches to a lower priority. In the above example, voice is guaranteed about 5Mbit of bandwidth. If video needed up to 10Mbit and voice was consuming more than 5Mbit, video would start stealing bandwidth from voice until the bandwidth minimums were reached. They're soft limits in that anyone can have all of the bandwidth as long as no one with higher priority needs it.
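The thresh row in the diffserv4 table above appears to follow fixed fractions of the shaper rate (Bulk 1/16, Best Effort 1/1, Video 1/2, Voice 1/4). A sketch reproducing that row under that assumption:

```python
# Reproduce the thresh row of the diffserv4 table above, assuming cake
# assigns each tin a fixed fraction of the shaper bandwidth
# (Bulk 1/16, Best Effort 1/1, Video 1/2, Voice 1/4).
bandwidth_kbit = 19900
ratios = {"Bulk": 1 / 16, "Best Effort": 1, "Video": 1 / 2, "Voice": 1 / 4}

thresh = {tin: int(bandwidth_kbit * r) for tin, r in ratios.items()}
print(thresh)
# {'Bulk': 1243, 'Best Effort': 19900, 'Video': 9950, 'Voice': 4975}
```

That matches the 1243Kbit / 19900Kbit / 9950Kbit / 4975Kbit values shown in the table.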

target: the 'ideal' target delay that we'll tolerate for an individual flow, i.e. how old stuff can be in the queue before we'll consider taking action, like shooting packets to tell people to slow down.

interval: I need to check this!

pk_delay: peak packet delay, i.e. how long the oldest packet in the queue hung around before we got to dequeue it.
av_delay: the average delay of the packets in the queue that we dequeued.
sp_delay: the delay in the queue for sparse packets. Oh boy: packet flows that send on a continuous basis are regarded as 'bulk' flows (e.g. an FTP transfer). Flows that send only occasionally are regarded as 'sparse' (think interactive ssh). So this tells you how delayed the 'interactive' packets have been.
NB: All the above delay stats are EWMA averages so they can lag slightly or if there's no packet in a tin it can appear to stall.

backlog: number of bytes in the queue, waiting to be sent when the shaper says there's space/time to be able to send.
pkts(c): number of packets that have flowed through this tin
bytes(c): number of bytes that have flowed through this tin
way_inds(c), way_miss(c), way_cols(c): Each packet flow is ideally put into an individual queue; these are almost like cache stats and show how successful we were in achieving that. Mostly uninteresting.
drops(c): number of packets we dropped as part of our queue control mechanism
marks(c): number of packets we ECN marked (on ECN capable flows) in preference to dropping them.
ack_drop(c): if ack-filtering is enabled, the number of unnecessary ACK-only packets that were dropped.
sp_flows: the number of sparse packet flows in this tin
bk_flows: the number of bulk packet flows in this tin
un_flows: the number of unresponsive packet flows in this tin. If a flow doesn't respond to codel style 'slow down' signalling in a normal manner then it is considered unresponsive and is handled by the 'blue' aqm instead.
max_len: the largest packet we've seen in the queue.
quantum: granularity in bytes of how much we can de-queue in our queues and release to the shaper.

Most of the figures are an instantaneous snapshot or 'gauge' of the current state; I've indicated with a (c) where the values accumulate and thus need two samples over time to produce a 'rate'.

In terms of key performance display I think:

Traffic rate (bytes - prev_bytes)
backlog
pk, av, sp delays (vs target)
drops rate
marks rate
sparse, bulk, unresponsive flows
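The gauge-vs-counter distinction above maps directly onto a display routine: read the gauges as-is and diff the (c) counters between samples. A sketch, using the field names from the `tc -j -s` JSON for cake; the sample values here are made up for illustration:

```python
# Sketch: derive the display metrics listed above from two tin samples
# taken INTERVAL seconds apart. Gauge fields are read directly; the
# cumulative (c) fields are differenced to get rates.

INTERVAL = 30  # seconds between samples

def tin_metrics(prev, cur, interval=INTERVAL):
    return {
        "rate_bps": 8 * (cur["sent_bytes"] - prev["sent_bytes"]) / interval,
        "backlog_b": cur["backlog_bytes"],                  # gauge
        "peak_delay_us": cur["peak_delay_us"],              # gauge (EWMA)
        "drops_per_s": (cur["drops"] - prev["drops"]) / interval,
        "marks_per_s": (cur["ecn_mark"] - prev["ecn_mark"]) / interval,
        "flows": (cur["sparse_flows"], cur["bulk_flows"],
                  cur["unresponsive_flows"]),               # gauges
    }

# Hypothetical samples 30 s apart:
prev = {"sent_bytes": 0, "drops": 0, "ecn_mark": 0}
cur = {"sent_bytes": 3_750_000, "backlog_bytes": 0, "peak_delay_us": 339,
       "drops": 60, "ecn_mark": 3, "sparse_flows": 4, "bulk_flows": 1,
       "unresponsive_flows": 0}

m = tin_metrics(prev, cur)
print(f"{m['rate_bps'] / 1e6:.1f} Mbit/s, {m['drops_per_s']:.0f} drops/s")
# prints "1.0 Mbit/s, 2 drops/s"
```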


One problem with all these stats is that they are pretty cryptic to the casual user.

Most people have no idea what these qdiscs used by SQM actually do, and how they do the magic.

E.g. is having drops OK? How many?
If I have too many drops, what can I do on my LAN clients' side (e.g. decrease upload speed limits on streaming/torrenting clients, etc.)?

The interpretation may also depend on the use case: heavy upload / streaming / torrenting or just casual emails and office remote work, etc.

So, although visualising QoS stats sounds interesting, I am not sure we could design any reasonable logic to provide the user with a useful interpretation.

(And cake is pretty much the only qdisc that provides more details. fq_codel and the others give a rather short key-stats overview in 2-3 lines.)


Even worse, the individual fq "buckets" come and go IIRC; that is, their number and "identity" are not stable over time (cake would have the same issue, except it does not even attempt to report the individual flow instances, as far as I remember).

That said, for cake we could just try adding the output of tc -s qdisc show dev $DEV to a new tab in the SQM LuCI GUI?

That would be a great feature to add to be able to get a quick visual on the GUI. :sunglasses:

Yes, you make very good points. A qdisc drops packets to make things go faster/smoother/fairer, which in a world that still thinks 'dropping a packet = bad' is quite some concept to get around :slight_smile:

Cake has a lot of stats output mainly as a result of testing it and wanting to get a dynamic view of what was going on underneath.

For me, personally, a graph view showing how much traffic is going across each tin would be interesting as it gives hints on DSCP classification effectiveness. Unresponsive flows are worrying from the point of view of 'why don't they respond to signalling'.

A page where I could graph 2 cake instances side-by-side (ingress/egress) would be useful/interesting though I agree they have about as much import/use as the flashenblinkenlights on the majority of kit. :grinning:

A dynamically updated 'tc -s qdisc dev foo' output page would be a really useful start.


Hmm... might it be possible to do something with collectd a la the luci-app-statistics package?

I recently started using this, and it does a nice job of gathering sys info and graphing it over time.

I would very much like a tool to display this information. Either something to show all of cake's stats, or merely bandwidth, marks and drops (separately or stacked). Or both.

At one point I had convinced a shell script and a custom SNMP/MRTG setup to let me graph marks/drops/throughput, but that was many years ago. Is collectd the new hotness?

Don't know much about it, just enjoying using the statistics app. There of course might be something better out there. It's easy enough to test-drive it and take a look at how it works, though.

I liked this first try, but don't know anything about munin:

That would quite fit the bill...

You can add custom commands to the LuCI GUI by editing /etc/config/luci, e.g.:

config command            
        option command 'tc -s qdisc show dev eth0'
        option name 'qdisc stats egress'
                      
config command                       
        option command 'tc -s qdisc show dev ifb4eth0'
        option name 'qdisc stats ingress'

If someone is going to take a stab at parsing, please use the json format instead of parsing the human-readable output!

tc -js qdisc show dev <iface>

Mmmh, I need to use tc -j -s qdisc show dev <iface> to get the detailed statistics (OpenWrt master-r12856-ae06a650d6 from a week ago)...
-js gives output identical to -j, i.e. without the detailed statistics:

root@router:~# tc -js qdisc show dev pppoe-wan
[{"kind":"cake","handle":"807d:","root":true,"refcnt":2,"options":{"bandwidth":3875000,"diffserv":"diffserv3","flowmode":"dual-srchost","nat":true,"wash":false,"ingress":false,"ack-filter":"disabled","split_gso":true,"rtt":100000,"raw":false,"atm":"noatm","overhead":34,"mpu":68,"fwmark":"0"}},{"kind":"ingress","handle":"ffff:","parent":"ffff:fff1","options":{}}]
root@router:~# tc -j qdisc show dev pppoe-wan
[{"kind":"cake","handle":"807d:","root":true,"refcnt":2,"options":{"bandwidth":3875000,"diffserv":"diffserv3","flowmode":"dual-srchost","nat":true,"wash":false,"ingress":false,"ack-filter":"disabled","split_gso":true,"rtt":100000,"raw":false,"atm":"noatm","overhead":34,"mpu":68,"fwmark":"0"}},{"kind":"ingress","handle":"ffff:","parent":"ffff:fff1","options":{}}]
root@router:~# tc -j -s qdisc show dev pppoe-wan
[{"kind":"cake","handle":"807d:","root":true,"refcnt":2,"options":{"bandwidth":3875000,"diffserv":"diffserv3","flowmode":"dual-srchost","nat":true,"wash":false,"ingress":false,"ack-filter":"disabled","split_gso":true,"rtt":100000,"raw":false,"atm":"noatm","overhead":34,"mpu":68,"fwmark":"0"},"bytes":250453892,"packets":1244562,"drops":212,"overlimits":190349,"requeues":0,"backlog":0,"qlen":0,"memory_used":220224,"memory_limit":4194304,"capacity_estimate":3875000,"min_network_size":28,"max_network_size":1492,"min_adj_size":68,"max_adj_size":1526,"avg_hdr_offset":0,"tins":[{"threshold_rate":242187,"sent_bytes":25052,"backlog_bytes":0,"target_us":9401,"interval_us":104401,"peak_delay_us":26,"avg_delay_us":18,"base_delay_us":15,"sent_packets":625,"way_indirect_hits":0,"way_misses":295,"way_collisions":0,"drops":0,"ecn_mark":0,"ack_drops":0,"sparse_flows":0,"bulk_flows":0,"unresponsive_flows":0,"max_pkt_len":60,"flow_quantum":300},{"threshold_rate":3875000,"sent_bytes":250572324,"backlog_bytes":0,"target_us":5000,"interval_us":100000,"peak_delay_us":73,"avg_delay_us":21,"base_delay_us":16,"sent_packets":1242943,"way_indirect_hits":34511,"way_misses":33401,"way_collisions":0,"drops":212,"ecn_mark":8,"ack_drops":0,"sparse_flows":1,"bulk_flows":1,"unresponsive_flows":0,"max_pkt_len":1492,"flow_quantum":946},{"threshold_rate":968750,"sent_bytes":168410,"backlog_bytes":0,"target_us":5000,"interval_us":100000,"peak_delay_us":150,"avg_delay_us":24,"base_delay_us":17,"sent_packets":1206,"way_indirect_hits":0,"way_misses":90,"way_collisions":0,"drops":0,"ecn_mark":0,"ack_drops":0,"sparse_flows":0,"bulk_flows":0,"unresponsive_flows":0,"max_pkt_len":576,"flow_quantum":300}]},{"kind":"ingress","handle":"ffff:","parent":"ffff:fff1","options":{},"bytes":5413180647,"packets":4006766,"drops":0,"overlimits":0,"requeues":0,"backlog":0,"qlen":0}]
root@router:~#

Best Regards
Sebastian
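Building on that JSON output, a short parsing sketch. The sample string is abridged from the `tc -j -s` output above; in practice you would feed the full output of `tc -j -s qdisc show dev <iface>` to json.loads() instead:

```python
import json

# Sketch: pull the interesting per-tin counters out of the JSON that
# `tc -j -s qdisc show dev <iface>` produces for a cake qdisc.
# The sample below is abridged from the real output shown above.
sample = '''[{"kind":"cake","handle":"807d:","options":{"diffserv":"diffserv3"},
 "bytes":250453892,"packets":1244562,"drops":212,
 "tins":[{"threshold_rate":242187,"sent_packets":625,"drops":0,"ecn_mark":0},
         {"threshold_rate":3875000,"sent_packets":1242943,"drops":212,"ecn_mark":8},
         {"threshold_rate":968750,"sent_packets":1206,"drops":0,"ecn_mark":0}]}]'''

for qdisc in json.loads(sample):
    if qdisc["kind"] != "cake":
        continue  # skip e.g. the ingress qdisc object
    for i, tin in enumerate(qdisc["tins"]):
        print(f"tin {i}: {tin['sent_packets']} pkts, "
              f"{tin['drops']} drops, {tin['ecn_mark']} marks")
```

No screen-scraping needed: the tin table arrives as a list of dicts with stable key names.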

Ugh, right, -j -s is what I meant; iproute2 has a fairly arcane command line parser, so it doesn't recognise combining switches in a single argument, unfortunately :frowning:

A plug-in for "collectd" would be... the icing on the cake!
(lame pun, I know)
