Hey @rb1, sure, I can share the script. I'm at work right now and just realized that my WireGuard tunnel to my server is down, so I'll have to share it later.
Meanwhile, shouldn't we split this discussion into a new thread, or continue it in the SQM Reporting thread, where it fits better?
EDIT: @rb1 In case you want to start on something without waiting for my script, here are some details:
The script reads the JSON output of tc -s -j qdisc dev ...., then uses jq to save each metric into a small file named after the metric. So on each run it updates the files bytes, bandwidth, drops and so on. I set it to run every 10 seconds as a systemd service triggered by a timer. Warning: there is one jq execution per metric, so the script isn't exactly resource-friendly;
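Here is a minimal Python sketch of the same idea. It parses the JSON once instead of running jq once per metric, and writes each metric to a file named after it. It assumes the top-level key names listed below match what your tc version emits (check with tc -s -j qdisc show dev <dev>). The function names and the metric list are hypothetical, not the actual script described above.

```python
import json
import os
import subprocess
import tempfile

# Top-level stats keys that tc's JSON output typically carries for a qdisc.
# This list is an assumption: adjust it to the keys your tc version emits.
DEFAULT_METRICS = ("bytes", "packets", "drops", "overlimits", "backlog", "qlen")


def read_tc_json(dev: str) -> str:
    """Return the raw JSON printed by `tc -s -j qdisc show dev <dev>`."""
    result = subprocess.run(
        ["tc", "-s", "-j", "qdisc", "show", "dev", dev],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def write_metric_files(tc_json: str, outdir: str, kind: str = "cake",
                       metrics=DEFAULT_METRICS) -> dict:
    """Parse the tc JSON once and write one small file per metric.

    Each file is named after the metric and holds a single value, which is
    what a `cat`-based SNMP OID would read. Files are replaced atomically,
    so a reader never sees a half-written value.
    """
    qdiscs = json.loads(tc_json)
    qdisc = next((q for q in qdiscs if q.get("kind") == kind), None)
    if qdisc is None:
        raise ValueError(f"no {kind} qdisc found in tc output")
    written = {}
    for name in metrics:
        if name not in qdisc:
            continue  # skip keys this tc version did not report
        value = qdisc[name]
        fd, tmp = tempfile.mkstemp(dir=outdir)
        with os.fdopen(fd, "w") as f:
            f.write(f"{value}\n")
        os.chmod(tmp, 0o644)  # mkstemp creates 0600; let other readers cat it
        os.replace(tmp, os.path.join(outdir, name))
        written[name] = value
    return written
```

A systemd timer could then call write_metric_files(read_tc_json("ifb-wan"), "/some/dir") every 10 seconds; the device name and directory here are placeholders.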
I created some extended SNMP OIDs, one for each metric. Each OID's value is obtained simply by running cat on the corresponding metric file;
In collectd, I use the SNMP plugin, configured with all the metric OIDs.
By the way, there are two servers (x86-64) involved here: the router (where the script and SNMP run) and the Grafana/Prometheus/collectd server.
After setting all this up, I noticed the router's CPU usage go up from around 1% to 8%, mainly due to the script running every 10 seconds, as well as SNMP being polled (by collectd) every 10 seconds.
This is a messy setup, I admit, but as I said before, I reused the script + SNMP processes that were already running as collectors for my now-dead Cacti monitoring. It can certainly be improved and simplified. But since I have spare resources on the router, I kinda relaxed (well... maybe I should look more closely at the energy bill...)
Hi @gadolf, thanks for taking the time to explain in detail how you get the data into Prometheus. Very interesting. When I get some time, I'll see if I can set up something similar in Grafana.
Thanks for pointing me towards the SQM Reporting thread. Yeah, I guess this could be split into a different thread; it is a bit off-topic. I haven't explored this forum much beyond this topic.
One more question, just curious about your cake setup: it looks like you have 4 tins: Bulk, Best Effort, Video and Voice. I'm happy with the built-in piece_of_cake.qos and its single tin, but if I use layer_cake.qos or simple.qos, the tins I get are Bulk, Best Effort and Voice. What do you use to get the Video tin?
gustavo@srv:/etc/sqm$ cat enp2s0.iface.conf
# Default SQM config; the variables defined here will be applied to all
# interfaces. To override values for a particular interface, copy this file to
# <dev>.iface.conf (e.g., "eth0.iface.conf" for eth0).
# When using ifupdown, the interface config file needs to exist for sqm-scripts
# to be activated on that interface. However, these defaults are still applied,
# so the interface config can be empty.
# Uplink and Downlink values are in kbps
# SQM recipe to use. For more information, see /usr/lib/sqm/*.help
# Optional/advanced config
# ECN ingress resp. egress. Values are ECN or NOECN.
# Extra qdisc options ingress resp. egress
IQDISC_OPTS="diffserv4 nat dual-dsthost"
#EQDISC_OPTS="diffserv4 nat dual-srchost ack-filter"
EQDISC_OPTS="diffserv4 nat dual-srchost"
# CoDel target
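For reference on the tins question: the IQDISC_OPTS/EQDISC_OPTS lines above pass diffserv4 to cake, which gives the four tins Bulk, Best Effort, Video and Voice. Cake's default mode, diffserv3, gives Bulk, Best Effort and Voice, which would explain what you see with layer_cake.qos. Below is a small Python sketch that reports which tins a given options string yields. It assumes simple KEY="value" lines like the config above and is not a full shell parser. The function names are hypothetical.

```python
import shlex

# Tin names per cake diffserv mode (see tc-cake(8)); diffserv3 is cake's default.
CAKE_TINS = {
    "diffserv3": ["Bulk", "Best Effort", "Voice"],
    "diffserv4": ["Bulk", "Best Effort", "Video", "Voice"],
}


def parse_sqm_conf(text: str) -> dict:
    """Collect KEY="value" assignments, skipping blank and commented lines."""
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, raw = line.partition("=")
        conf[key.strip()] = " ".join(shlex.split(raw))  # strips the quotes
    return conf


def cake_tins(qdisc_opts: str, default: str = "diffserv3") -> list:
    """Return the tin names implied by a cake options string."""
    mode = default
    for word in qdisc_opts.split():
        if word in CAKE_TINS:
            mode = word  # the last diffserv keyword wins
    return CAKE_TINS[mode]
```

With the config above, cake_tins(parse_sqm_conf(text)["IQDISC_OPTS"]) yields the four diffserv4 tins, while an options string without a diffserv keyword falls back to the three diffserv3 tins.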
As for the whole Grafana/Prometheus/collectd setup, I was thinking that if you use a single machine, there's no need to bring SNMP into the equation; you can just use collectd's exec plugin, as proposed in the thread I pointed to before.
I could probably eliminate SNMP from my setup as well. All I would have to do is install collectd on the router and point Prometheus on the monitoring machine at the collectd URL. Maybe I'll do that soon.
Please remember: all this setup runs on Debian machines, not OpenWrt.
Ok thanks. SQM and cake is something I want to learn more about.
I'm thinking about writing another Python Prometheus exporter, starting from the JSON values used in your first script. I played around with collectd a long time ago (I know there is the collectd-mod-sqm package), but had some trouble getting data from it into Prometheus with the collectd exporter. I believe there is another route, collectd to InfluxDB and then Grafana, but I haven't tested that yet. It looks like the SQM reporting thread has some nice Grafana plots using collectd and InfluxDB.
Can't tell; I typically do a separate git clone/pull on a separate machine, look at the scripts and copy them manually to my router, so I never tried the installation script. But maybe open a new issue on the GitHub page, so you get the developer's attention?
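As a possible starting point for such an exporter, here is a sketch that uses only the Python standard library (no prometheus_client) and renders per-tin cake counters in the Prometheus text exposition format. The per-tin keys (sent_bytes, drops, ecn_mark, avg_delay_us) are assumed from tc's cake JSON output, and the metric and function names are made up for illustration.

```python
import json
from http.server import BaseHTTPRequestHandler

# tc JSON key per cake tin -> (Prometheus metric name, metric type).
# The keys are assumed from tc's cake JSON output; verify against your tc.
TIN_METRICS = {
    "sent_bytes": ("cake_tin_sent_bytes_total", "counter"),
    "drops": ("cake_tin_drops_total", "counter"),
    "ecn_mark": ("cake_tin_ecn_marks_total", "counter"),
    "avg_delay_us": ("cake_tin_avg_delay_microseconds", "gauge"),
}


def render_cake_metrics(tc_json: str, dev: str) -> str:
    """Render per-tin cake stats in the Prometheus text exposition format."""
    qdiscs = json.loads(tc_json)
    cake = next((q for q in qdiscs if q.get("kind") == "cake"), None)
    if cake is None:
        return ""
    lines = []
    for key, (name, mtype) in TIN_METRICS.items():
        samples = [(i, tin[key]) for i, tin in enumerate(cake.get("tins", []))
                   if key in tin]
        if not samples:
            continue  # this tc version did not report the key
        lines.append(f"# TYPE {name} {mtype}")
        for i, value in samples:
            lines.append(f'{name}{{dev="{dev}",tin="{i}"}} {value}')
    return "\n".join(lines) + "\n" if lines else ""


def make_handler(get_tc_json, dev: str):
    """Build an HTTP handler that re-reads the tc stats on every scrape."""
    class CakeMetricsHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            body = render_cake_metrics(get_tc_json(), dev).encode()
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; version=0.0.4")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
    return CakeMetricsHandler
```

To serve it, pass the handler to http.server.HTTPServer(("", 9101), make_handler(get_json, "ifb-wan")) and call serve_forever(). Here get_json is any function returning the output of tc -s -j qdisc show dev ifb-wan, and port 9101 is an arbitrary choice. Tins are labelled by index because their names depend on the diffserv mode.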
That is because Linux will only instantiate a shaping qdisc (like cake) on egress (which for WAN only covers the upload traffic). So to attach qdiscs to ingress traffic, we need to somehow turn ingress traffic into egress traffic. As far as the kernel is concerned, ingress and egress only have meaning relative to a given interface, so internet download traffic is ingress for WAN, but egress for LAN.
There are three more or less common methods to achieve that:
A) use the kernel's Intermediate Functional Block device (IFB): ingress traffic on the WAN interface is redirected to an IFB interface, where it appears as egress traffic that a qdisc can shape (this is what sqm-scripts does)
B) create a veth pair and a matching routing table to redirect ingress traffic over that pair and then attach the qdisc to the egress half of that pair
C) simply instantiate the ingress shaper on a LAN interface (but that only works for wired-only setups where the router itself generates next to no traffic)
What all three have in common is that, from the kernel's view, there will be two different interfaces: one for egress, one for ingress. So "overly complex" might be right, but "rare" seems incorrect, at least for users of ingress traffic shaping/AQM.
I would recommend you install and configure sqm-scripts/luci-app-sqm first (opkg update; opkg install luci-app-sqm), then look at:
for the configuration, and once you have a working SQM installation, deal with cake-autorate.
Thanks for the information, guys. Apologies; it might be worth adding some notes about this. I had CAKE installed and configured, but didn't have it enabled for that interface, so "tc qdisc ls" didn't return much. I guess it's not really an "interface" in the traditional OpenWrt sense (it's more like a qdisc interface), hence my confusion. I'll try again soon and see how it goes.
cake-autorate will use whatever config file is placed in the running directory /root/cake-autorate at the time the script is launched. If you've either run the setup script or manually downloaded the files from the master branch, then you've got the most up-to-date version. I should look into having the setup script write the latest commit identifier into a version file.
I've still not actually released 2.0.0. Maybe it's time now, as everything seems super stable again! Versioning is not my favourite aspect of working on cake-autorate.
Once you have it running I'd encourage you to obtain a log file showing a couple of speed tests and upload it for us to take a look and make a plot and verify all looks in order.
It struck me as worthwhile to support unusual cake instantiations in the tc change calls, i.e. setups in which the cake qdisc is not placed at root, but at a specific parent band instead.
Here is the output from 'tc qdisc ls' on my router when using the new capability of 'cake-qos-simple' to overwrite the ECN bits with '0' on upload and download:
root@OpenWrt-1:~# tc qdisc ls
qdisc noqueue 0: dev lo root refcnt 2
qdisc fq_codel 0: dev eth0 root refcnt 2 limit 10240p flows 1024 quantum 1518 target 5ms interval 100ms memory_limit 4Mb ecn drop_batch 64
qdisc noqueue 0: dev lan1 root refcnt 2
qdisc noqueue 0: dev lan2 root refcnt 2
qdisc noqueue 0: dev lan3 root refcnt 2
qdisc noqueue 0: dev lan4 root refcnt 2
qdisc prio 1: dev wan root refcnt 2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc cake 808f: dev wan parent 1:1 bandwidth 20Mbit diffserv4 triple-isolate nat wash ack-filter split-gso rtt 100ms noatm overhead 0
qdisc ingress ffff: dev wan parent ffff:fff1 ----------------
qdisc noqueue 0: dev br-guest root refcnt 2
qdisc noqueue 0: dev br-lan root refcnt 2
qdisc noqueue 0: dev wlan1 root refcnt 2
qdisc noqueue 0: dev wlan0 root refcnt 2
qdisc noqueue 0: dev wlan0-1 root refcnt 2
qdisc noqueue 0: dev wlan0.sta8 root refcnt 2
qdisc noqueue 0: dev wlan0.sta9 root refcnt 2
qdisc noqueue 0: dev wlan1.sta1 root refcnt 2
qdisc noqueue 0: dev wlan1.sta2 root refcnt 2
qdisc cake 8090: dev ifb-wan root refcnt 2 bandwidth 20Mbit diffserv4 triple-isolate nat nowash ingress no-ack-filter split-gso rtt 100ms noatm overhead 0
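In the output above, the cake instance on wan hangs off parent 1:1 of a prio qdisc, while the one on ifb-wan sits at root, so a rate update has to target the right attachment point. Below is a hypothetical Python sketch, not cake-autorate's code (which is bash). It finds where cake is attached on a device in tc qdisc ls output and builds the matching tc qdisc change command.

```python
import re

# Matches e.g. "qdisc cake 808f: dev wan parent 1:1 ..." and
# "qdisc cake 8090: dev ifb-wan root refcnt 2 ...".
CAKE_LINE = re.compile(r"^qdisc cake (\S+) dev (\S+) (root|parent (\S+))")


def cake_attachment(tc_ls_output: str, dev: str):
    """Return "root", a parent class id such as "1:1", or None if no cake."""
    for line in tc_ls_output.splitlines():
        m = CAKE_LINE.match(line.strip())
        if m and m.group(2) == dev:
            return m.group(4) or "root"  # group 4 is set only for "parent X"
    return None


def cake_change_cmd(dev: str, attachment: str, rate_kbit: int) -> list:
    """Build a `tc qdisc change` call addressed at root or at a parent band."""
    where = ["root"] if attachment == "root" else ["parent", attachment]
    return ["tc", "qdisc", "change", "dev", dev, *where,
            "cake", "bandwidth", f"{rate_kbit}kbit"]
```

For the wan line above this yields tc qdisc change dev wan parent 1:1 cake bandwidth ...; for ifb-wan it yields tc qdisc change dev ifb-wan root cake bandwidth ....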
Sure, odd things can happen... e.g. a small ISP might have a direct path to an internet exchange that typically has a delay of X ms, but in prime time this link might be constantly overloaded, so some connections/customers get routed via a secondary path of different length, resulting in a different but stable delay to the same targets. If the path difference happens inside, say, an MPLS network, this would be really hard for end customers to diagnose...
This is, however, a hypothetical answer; I have no idea what your ISP is "cooking" there.
This version restructures the bash code for improved robustness, stability and performance (@lynxthecat and @rany2).
Employ FIFOs for passing not only data, but also instructions, between the major processes, obviating costly reliance on temporary files. A side effect of this is that now /var/run/cake-autorate is mostly empty during runs (@lynxthecat).
Significantly reduced CPU consumption - cake-autorate can now run successfully on older routers (@lynxthecat and @rany2).
Introduce support for one way delays (OWDs) using the 'tsping' binary developed by Lochnair. This works with ICMP type 13 (timestamp) requests to ascertain the delay in each direction (i.e. OWDs) (@lynxthecat).
Many changes to help catch and handle or expose unusual error conditions (@rany2).
Introduce a more user-friendly config format: defaults.sh holds the defaults, and config.X.sh holds the basics (interface names, whether to adjust the shaper rates, and the min, base and max shaper rates) plus any overrides of the defaults defined in defaults.sh (@rany2).
More intelligent check for another running instance (@rany2).
Make log file exports more user-friendly by automatically generating an export script and a log reset script for each running cake-autorate instance inside /var/run/cake-autorate/*/ (@lynxthecat).
Add config file validation that checks all config file entries against those provided in defaults.sh (@rany2).
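Here is how ICMP timestamps yield one-way delays. A timestamp reply carries the originate (T1), receive (T2) and transmit (T3) times, each in milliseconds since midnight UT per RFC 792, and the sender records the arrival time (T4). The uplink OWD is then T2 - T1 and the downlink OWD is T4 - T3. Below is a Python sketch of that arithmetic with hypothetical function names; it is not tsping's or cake-autorate's code. The OWDs absorb any clock offset between the two hosts, so changes over time are more meaningful than absolute values.

```python
MS_PER_DAY = 86_400_000  # ICMP timestamps count ms since midnight UT (RFC 792)


def _ms_diff(later: int, earlier: int) -> int:
    """Signed difference later - earlier, tolerating the midnight wrap."""
    d = (later - earlier) % MS_PER_DAY
    if d > MS_PER_DAY // 2:
        d -= MS_PER_DAY  # treat as negative (e.g. remote clock is behind)
    return d


def one_way_delays(originate: int, receive: int, transmit: int, arrival: int):
    """Return (uplink_owd_ms, downlink_owd_ms, rtt_ms).

    originate: our send time (T1); receive/transmit: the target's
    timestamps (T2, T3); arrival: when the reply reached us (T4).
    The RTT excludes the target's processing time (T3 - T2).
    """
    up = _ms_diff(receive, originate)
    down = _ms_diff(arrival, transmit)
    rtt = _ms_diff(arrival, originate) - _ms_diff(transmit, receive)
    return up, down, rtt
```

For example, T1=1000, T2=1012, T3=1013 and T4=1030 give 12 ms up, 17 ms down and a 29 ms RTT. A bloated upload then shows up as a growing first value, while the second stays flat.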