Building an ubus interface for ethtool statistics

Hello everybody,

I really want to use one of the "new" Realtek RTL838x switches in a production environment (specifically the ZyXEL GS1900-8HP).

One of the missing pieces is a way to collect statistics about the switching ports.

Fortunately, per-port metrics from the switching chip are accessible via ethtool -S (because the switching chip uses DSA).

Because I am using Prometheus as my monitoring stack, I've built a simple collector for prometheus-node-exporter-lua that calls ethtool -S for every port, parses the output and exposes the statistics.

This collector exposes not only the driver-specific metrics for the RTL838x switching chips, but also every other metric that ethtool -S reports, for any interface/device.
It is therefore a generic solution that could be upstreamed and would hopefully be useful to more people.
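
For reference, the parsing step boils down to something like the following. This is a minimal sketch in Python rather than the collector's actual Lua code. The function names and the metric name node_ethtool_stat are made up for illustration, and it assumes the usual ethtool -S layout of a header line followed by indented "name: value" lines.

```python
def parse_ethtool_stats(output):
    """Turn the text printed by `ethtool -S <dev>` into a dict of counters."""
    stats = {}
    for line in output.splitlines():
        # Split on the LAST colon, since some drivers put colons in names.
        name, sep, value = line.rpartition(":")
        if not sep:
            continue
        try:
            stats[name.strip()] = int(value.strip())
        except ValueError:
            # Header line ("NIC statistics:") or a non-numeric value: skip.
            continue
    return stats


def format_metrics(device, stats):
    """Render counters as Prometheus text lines (hypothetical metric name)."""
    return [
        'node_ethtool_stat{device="%s",name="%s"} %d' % (device, name, value)
        for name, value in sorted(stats.items())
    ]


if __name__ == "__main__":
    sample = "NIC statistics:\n     rx_packets: 1234\n     tx_packets: 5678\n"
    for line in format_metrics("lan1", parse_ethtool_stats(sample)):
        print(line)
```

Label values are not escaped here, so a real collector would need to handle quotes and backslashes in counter names.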

But the way this collector is implemented gives me the heebie-jeebies.

So I am searching for an elegant solution to implement this feature (gathering ethtool -S statistics and exposing them via prometheus-node-exporter-lua).

This is my current idea:

  • extend netifd's ubus interface to also expose per-interface ethtool statistics (netifd already exposes some device statistics that, to my knowledge, are used by LuCI, other prometheus-node-exporter-lua collectors and so on)
  • issue the ethtool ioctls directly from netifd, instead of continuing the horrific practice of calling ethtool -S for every link
  • write a simple Lua collector that calls the ubus interface of netifd and exposes the metrics via Prometheus
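
To give an idea of what "issuing the ioctls directly" involves, here is a rough sketch of the three SIOCETHTOOL requests needed to read the counters. It is written in Python for readability; inside netifd this would of course be C. The constants are the ones from linux/ethtool.h and linux/sockios.h, while the function names are my own. It is Linux-only and only a sketch of the request sequence, not something I would ship.

```python
import array
import fcntl
import socket
import struct

SIOCETHTOOL = 0x8946         # <linux/sockios.h>
ETHTOOL_GSTRINGS = 0x1B      # <linux/ethtool.h>: fetch counter names
ETHTOOL_GSTATS = 0x1D        # fetch counter values
ETHTOOL_GSSET_INFO = 0x37    # fetch the size of a string set
ETH_SS_STATS = 1             # the string set holding statistic names
ETH_GSTRING_LEN = 32         # every name is a fixed 32-byte field


def _ethtool(sock, ifname, buf):
    """Run one SIOCETHTOOL request; the kernel fills `buf` in place."""
    addr, _ = buf.buffer_info()
    # struct ifreq: 16-byte interface name followed by a pointer to the data.
    ifreq = struct.pack("16sP", ifname.encode(), addr)
    fcntl.ioctl(sock.fileno(), SIOCETHTOOL, ifreq)


def decode_names(raw, count):
    """Split a blob of NUL-padded 32-byte fields into a list of names."""
    names = []
    for i in range(count):
        field = raw[i * ETH_GSTRING_LEN:(i + 1) * ETH_GSTRING_LEN]
        names.append(field.split(b"\0", 1)[0].decode("ascii", "replace"))
    return names


def read_stats(ifname):
    """Return {counter name: value} for one netdev, or {} if it has none."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # 1. Ask how many counters the driver exposes
        #    (struct ethtool_sset_info: cmd, reserved, sset_mask, data[0]).
        buf = array.array("B", struct.pack(
            "=IIQI", ETHTOOL_GSSET_INFO, 0, 1 << ETH_SS_STATS, 0))
        _ethtool(sock, ifname, buf)
        _, _, mask, count = struct.unpack("=IIQI", buf.tobytes())
        if not mask or count == 0:
            return {}

        # 2. Fetch the names (struct ethtool_gstrings: cmd, string_set, len).
        buf = array.array("B", struct.pack(
            "=III", ETHTOOL_GSTRINGS, ETH_SS_STATS, count)
            + bytes(count * ETH_GSTRING_LEN))
        _ethtool(sock, ifname, buf)
        names = decode_names(buf.tobytes()[12:], count)

        # 3. Fetch the values (struct ethtool_stats: cmd, n_stats, u64[]).
        buf = array.array("B", struct.pack(
            "=II", ETHTOOL_GSTATS, count) + bytes(count * 8))
        _ethtool(sock, ifname, buf)
        values = struct.unpack_from("=%dQ" % count, buf.tobytes(), 8)
        return dict(zip(names, values))
```

Calling read_stats("lan1") on a device with DSA ports should then give the same counters that ethtool -S lan1 prints, assuming the interface exists.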

This would solve my current problem and also build a way for other programs to easily gather ethtool metrics.

What do you think?

Thanks in advance! :slight_smile:

Extending the netifd ubus implementation might be a viable way, but you need to be aware that it only covers Linux netdevs which are managed by netifd. So it wouldn't be a truly generic solution: it would only expose statistics for ports which are somehow covered by the network uci configuration, which might be surprising behavior for some users.

You could consider writing an rpcd plugin to expose stats for all netdevs, unrelated to netifd's own state.

In my opinion, the ethtool kernel API is horrible and weird (you have to e.g. fetch string lists, use them to locate the offsets of "features", which you then look up in blocks of variable bit arrays, etc.), so scraping ethtool output, while not nice, might not be that bad after all.

In any case here's some minimal ethtool API code I added to netifd in order to expose the TC HW offload capability, maybe it helps to get you started:
https://git.openwrt.org/?p=project/netifd.git;a=commitdiff;h=fd4c9e17c8f22b866c1bf386c580074e3e678910;hp=3d76f2e2a1aa5c791aefce5c3458134e6e16400b

Thank you for your answer @jow :slight_smile:

So what you are saying is that there are netdevs which are not configured/managed by netifd?

I didn't know that there is a plugin interface for rpcd. And it is also possible to execute arbitrary binaries/scripts. Love it! This seems like the "right thing to do".
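
As far as I understand the executable plugin convention, rpcd runs the plugin with "list" to discover its methods and with "call <method>" (arguments as JSON on stdin) to invoke one. Here is a minimal sketch of that dispatch, again in Python purely for illustration; on a real device this would more likely be a shell script or a small C program. The method name "stats", its "name" argument and the collect() placeholder are all assumptions of mine.

```python
import json
import sys

# Method signatures announced to rpcd: "stats" takes one string argument.
METHODS = {"stats": {"name": "str"}}


def collect(name):
    """Placeholder: a real plugin would gather the ethtool counters here."""
    return {}


def handle(argv, stdin_text, collect=collect):
    """Return the JSON reply for one plugin invocation."""
    if len(argv) >= 2 and argv[1] == "list":
        return json.dumps(METHODS)
    if len(argv) >= 3 and argv[1] == "call" and argv[2] == "stats":
        args = json.loads(stdin_text or "{}")
        name = args.get("name", "")
        return json.dumps({"name": name, "statistics": collect(name)})
    return json.dumps({"error": "unsupported"})


if __name__ == "__main__":
    # Only read stdin for "call", where rpcd supplies the arguments.
    is_call = len(sys.argv) >= 2 and sys.argv[1] == "call"
    print(handle(sys.argv, sys.stdin.read() if is_call else ""))
```

Keeping the dispatch in handle() makes it possible to exercise the plugin without rpcd by passing argv and stdin contents directly.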

My only "real" concern is that it is super inefficient to spawn a process for every interface I want to get statistics for. But writing shitty C code that interacts directly with the kernel may be a bit worse :smiley:

Regardless of which path I choose for actually gathering the statistics, there is no way to gather them all at once, right? So no matter what I do, there is always a delay between querying the first and the last interface, correct?

Which means that in theory my monitoring data is not 100% accurate. This shouldn't be a concern for my use case, I just wanted to point that out.

Is listing every entry in /sys/class/net an appropriate way to get a list of all netdevs?
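
What I have in mind is roughly the following (a sketch in Python; the function name is made up). One caveat I am aware of: /sys/class/net can also contain plain files such as bonding_masters when the bonding module is loaded, so the sketch keeps only entries that resolve to directories.

```python
import os


def list_netdevs(sysfs_net="/sys/class/net"):
    """Return the sorted names of all netdevs found under sysfs."""
    return sorted(
        entry
        for entry in os.listdir(sysfs_net)
        # Netdevs are symlinks to directories; skip regular files.
        if os.path.isdir(os.path.join(sysfs_net, entry))
    )
```

The path is a parameter only so the function can be pointed at a test directory.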

I'll start writing an rpcd plugin now :slight_smile:

Since I'm more used to SNMP for switch monitoring, my approach was to extend mini_snmpd with ethtool statistics support. This was merged a while ago: https://github.com/troglobit/mini-snmpd/pull/23

and I've been using it on my GS1900-10HP switches since then.

The plan was to extend the official OpenWrt packaging once this ended up in a mini_snmpd release. But that seems to take a while. I should probably just do it.

One problem with mini_snmpd though, is that it will only support 8 interfaces by default. And this is a build-time limit, which has a significant memory impact whether you need more ports or not.