Collectd network plugin getaddrinfo failed System error

Hello,

I want to get metrics from my router and ap and send them to an influxdb instance.
I installed luci-app-statistics which installed collectd and some plugins. All plugins work as expected on the router and ap.
When I configure the network output plugin and enter the influxdb host (myserver.lan) and port (25826), the influxdb instance receives no data. After some troubleshooting, I can see the following in the logs:

daemon.err collectd[12369]: configfile: stat (/etc/collectd/conf.d) failed: No such file or directory
daemon.err collectd[12369]: plugin_load: plugin "iwinfo" successfully loaded.
daemon.err collectd[12369]: plugin_load: plugin "memory" successfully loaded.
daemon.err collectd[12369]: plugin_load: plugin "cpu" successfully loaded.
daemon.err collectd[12369]: plugin_load: plugin "rrdtool" successfully loaded.
daemon.err collectd[12369]: rrdtool plugin: RRASingle = true: creating only AVERAGE RRAs
daemon.err collectd[12369]: plugin_load: plugin "interface" successfully loaded.
daemon.err collectd[12369]: plugin_load: plugin "network" successfully loaded.
daemon.err collectd[12369]: plugin_load: plugin "load" successfully loaded.
daemon.err collectd[12369]: Initialization complete, entering read-loop.
daemon.err collectd[12369]: network plugin: getaddrinfo (myserver.lan, 25826) failed: System error

It seems that collectd has a problem resolving the DNS name myserver.lan.
I then entered the server's IP address to bypass DNS resolution, but I get the same error:

daemon.err collectd[12831]: configfile: stat (/etc/collectd/conf.d) failed: No such file or directory
daemon.err collectd[12831]: plugin_load: plugin "iwinfo" successfully loaded.
daemon.err collectd[12831]: plugin_load: plugin "memory" successfully loaded.
daemon.err collectd[12831]: plugin_load: plugin "cpu" successfully loaded.
daemon.err collectd[12831]: plugin_load: plugin "rrdtool" successfully loaded.
daemon.err collectd[12831]: rrdtool plugin: RRASingle = true: creating only AVERAGE RRAs
daemon.err collectd[12831]: plugin_load: plugin "interface" successfully loaded.
daemon.err collectd[12831]: plugin_load: plugin "network" successfully loaded.
daemon.err collectd[12831]: plugin_load: plugin "load" successfully loaded.
daemon.err collectd[12831]: Initialization complete, entering read-loop.
daemon.err collectd[12831]: network plugin: getaddrinfo (192.168.1.10, 25826) failed: System error

On the router and on the ap, I can resolve the name myserver.lan and everything works as expected.

This is happening on a Linksys WRT1900ACS with OpenWrt 19.07.2 and on a TP-Link Archer C7 v5 with OpenWrt 19.07.7

Has anyone had the same issue? Is anyone using the network output plugin with success?

Thanks.

How often does this happen, occasionally or always?
Does the issue persist if you reboot the device and/or restart collectd?

Collect the output and post it to pastebin.com redacting the private parts:

ubus call system board; \
opkg list-installed | grep -e ^collectd -e ^luci-app-statistics; \
uci show collectd; uci show luci_statistics; uci show dhcp; \
head -v -n -0 /etc/collectd.conf; \
head -v -n -0 /etc/resolv.* /tmp/resolv.* /tmp/resolv.*/*; \
nslookup example.org; nslookup myserver.lan

Hello @vgaetera

First of all thanks for your answer.
Sorry for not mentioning it in my first post, but sending the collected data to the influxdb instance has never worked: I always get that getaddrinfo error, whether I use the hostname or the IP address. Before posting, I also rebooted the router and restarted collectd and luci_statistics several times while testing, and it never worked.

You can find the output of the requested commands here: https://pastebin.com/mPdAffRb

This might be related to some differences in the getaddrinfo implementation:

Try to create a static DHCPv6 lease for this host as well.
Make sure the respective hostname is resolved to both IPv4 and IPv6 ULA.

IPv6 is currently disabled on my entire network, both on the router and on the hosts (and I would rather not enable it, at least for the moment), so reconfiguring my entire network just to make the collectd network plugin work is not really an option.
On the other hand, I defined an IPv6 domain for my influxdb host (just to see if it would solve the problem). After checking that the host resolves over both IPv4 and IPv6, I restarted collectd, but sadly I still get the same error whether I use the domain name or the IPv4 address.
I am no expert (obviously) and my understanding of this area is quite limited, but it seems a little weird that collectd calls getaddrinfo when I enter an IPv4 address for the remote influxdb host. And why does that call fail?
Just to make sure my network is not causing the issue, I installed and configured a Debian VM with collectd, and the database in influxdb is correctly populated. (I understand what you linked in your previous post about the getaddrinfo differences between glibc and musl, but in this case I think the implementation in OpenWrt is faulty compared to the one in Debian.)

Mar  8 16:31:03 debian collectd[11302]: plugin_load: plugin "syslog" successfully loaded.
Mar  8 16:31:03 debian collectd[11302]: plugin_load: plugin "battery" successfully loaded.
Mar  8 16:31:03 debian collectd[11302]: plugin_load: plugin "cpu" successfully loaded.
Mar  8 16:31:03 debian collectd[11302]: plugin_load: plugin "df" successfully loaded.
Mar  8 16:31:03 debian collectd[11302]: plugin_load: plugin "disk" successfully loaded.
Mar  8 16:31:03 debian collectd[11302]: plugin_load: plugin "entropy" successfully loaded.
Mar  8 16:31:03 debian collectd[11302]: plugin_load: plugin "interface" successfully loaded.
Mar  8 16:31:03 debian collectd[11302]: plugin_load: plugin "irq" successfully loaded.
Mar  8 16:31:03 debian collectd[11302]: plugin_load: plugin "load" successfully loaded.
Mar  8 16:31:03 debian collectd[11302]: plugin_load: plugin "memory" successfully loaded.
Mar  8 16:31:03 debian collectd[11302]: plugin_load: plugin "processes" successfully loaded.
Mar  8 16:31:03 debian collectd[11302]: plugin_load: plugin "rrdtool" successfully loaded.
Mar  8 16:31:03 debian collectd[11302]: plugin_load: plugin "swap" successfully loaded.
Mar  8 16:31:03 debian collectd[11302]: plugin_load: plugin "users" successfully loaded.
Mar  8 16:31:03 debian collectd[11302]: plugin_load: plugin "network" successfully loaded.
Mar  8 16:31:03 debian collectd[11302]: Systemd detected, trying to signal readyness.
Mar  8 16:31:03 debian collectd[11302]: Initialization complete, entering read-loop.

Do you have any other ideas for troubleshooting this besides enabling IPv6 on my network? Even if that turned out to be necessary and solved the problem, it should not be the accepted solution, as in some cases IPv6 could be a no-go.

Thanks again for the time spent helping me.

One option: try something like tcprelay from a local socket. Otherwise, maybe run collectd in the foreground under strace (following its threads) and file a bug report.
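
A rough sketch of the strace idea, in case it helps with a bug report. It assumes the strace package can be installed with opkg on this build, that the running collectd instance has been stopped first, and that the config lives at /etc/collectd.conf (the path used in the diagnostic commands earlier in the thread). The function name is made up for illustration.

```shell
# Hypothetical helper: run collectd in the foreground under strace so the
# socket-related syscalls made around the getaddrinfo call become visible.
# Assumptions: strace is installed (opkg install strace) and no other
# collectd instance is running.
trace_collectd_network() {
    conf="${1:-/etc/collectd.conf}"
    if [ ! -r "$conf" ]; then
        echo "config not readable: $conf" >&2
        return 1
    fi
    if ! command -v strace >/dev/null 2>&1; then
        echo "strace not found" >&2
        return 1
    fi
    # strace: -f follows threads/children, -e trace=network limits the
    # output to socket(), connect(), sendto() and related calls.
    # collectd: -f stays in the foreground, -C selects the config file.
    strace -f -e trace=network collectd -f -C "$conf"
}
```

Calling trace_collectd_network with no argument uses the default path; stop it with Ctrl-C once the getaddrinfo error shows up, and attach the lines just before the error to the report.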

I guess this is an easy way to reproduce the issue:
https://openwrt.org/docs/guide-user/perf_and_log/statistic.collectd#ping_check

Then customize the hosts:

uci set luci_statistics.collectd_ping.Hosts="openwrt.org"
uci commit luci_statistics
/etc/init.d/luci_statistics restart

And check the logs:

# logread -l 3 -e collectd
Mon Mar  8 21:05:42 2021 daemon.err collectd[30196]: Initialization complete, entering read-loop.
Mon Mar  8 21:05:43 2021 daemon.err collectd[30196]: ping_sendto: Permission denied
Mon Mar  8 21:05:43 2021 daemon.err collectd[30196]: ping plugin: ping_send failed: Permission denied

# ping -q -c 1 openwrt.org
PING openwrt.org (139.59.209.225): 56 data bytes

--- openwrt.org ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 90.241/90.241/90.241 ms

# ping6 -q -c 1 openwrt.org
PING openwrt.org (2a03:b0c0:3:d0::1af1:1): 56 data bytes
ping6: sendto: Permission denied

Looks like getaddrinfo prefers IPv6, so it fails when we have no IPv6 connectivity.

@hnyman, any idea?

Well, regarding your reproduction example: the ping plugin of collectd does support specifying the address family in its config, so "ping" itself could be configured to avoid the error...
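
For anyone wanting to try that on the ping example, a minimal sketch follows. collectd's ping plugin has an AddressFamily option (any, ipv4 or ipv6); the assumption here is that luci_statistics exposes it under the same name in the collectd_ping section, so check the output of uci show luci_statistics on your build first.

```shell
# Assumption: the option is named AddressFamily in the collectd_ping
# section of luci_statistics; verify with `uci show luci_statistics`.
uci set luci_statistics.collectd_ping.AddressFamily="ipv4"
uci commit luci_statistics
/etc/init.d/luci_statistics restart
```

This only affects the ping plugin, so it is useful for confirming the diagnosis rather than for fixing the network output.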

But I think that the network plugin has no such support.

All upstream examples are actually based on specifying the IP address directly, instead of a hostname.

And the upstream source evaluates the address with ai_family = AF_UNSPEC:

Yep, specifying the address family to IPv4 explicitly works.
But it still fails with implicit settings and no IPv6 connectivity.
So, musl/getaddrinfo appears to be the cause of the issue:

# ldd $(type -p collectd)
	/lib/ld-musl-x86_64.so.1 (0x7f98f292d000)
	libz.so.1 => /usr/lib/libz.so.1 (0x7f98f2919000)
	liblua.so.5.1.5 => /usr/lib/liblua.so.5.1.5 (0x7f98f28ef000)
	libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x7f98f28db000)
	libc.so => /lib/ld-musl-x86_64.so.1 (0x7f98f292d000)

@Malakai, you should file a ticket if you want it to be fixed:

Not sure whether anything would change in musl upstream, as this problem was already discussed in 2018 and the current behaviour was settled on. See the discussion thread: https://www.openwall.com/lists/musl/2018/07/11/1

(Not sure whether reporting a musl bug/feature in the OpenWrt bug tracker would have any impact.)

There's always a chance to find a better solution if someone reviews the old code.
On the other hand, the issue may never be fixed if it isn't reported.

There're quite a few musl-specific changes and patches:

Anyway, we have traced the issue and suggested possible workarounds.
So, it's up to the OP to decide how to proceed.

Thanks everyone for troubleshooting this.
I will most probably try to open a bug report (I have never done so before, but there is a first time for everything; I am not sure I will be able to explain the problem correctly, but I will try).
Centrally collecting metrics will have to wait until the bug is fixed (which, if it happens at all, does not seem likely any time soon) or until I finally start using IPv6.

Note that the issue is still relevant even when you do not disable IPv6 specifically.
I tested OpenWrt 19.07.7 x86_64 as a KVM/QEMU guest with mostly default configuration.
So, IPv6 is enabled but IPv6 connectivity is missing and the issue persists.
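
A quick way to see whether a device is in that state (IPv6 enabled, but no route out) is to look for an IPv6 default route. This is only a rough indicator, not a real connectivity test, and the helper name is made up for illustration.

```shell
# Hypothetical helper: reads the output of `ip -6 route show default`
# on stdin and succeeds only if at least one default route is listed.
has_default_route() {
    grep -q '^default '
}

# Usage on the device:
#   if ip -6 route show default | has_default_route; then
#       echo "IPv6 default route present"
#   else
#       echo "no IPv6 default route"
#   fi
```

If this reports no default route while the hostname still resolves to an IPv6 address, the setup matches the one where the error was reproduced here.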

I opened the bug report: https://bugs.openwrt.org/index.php?do=details&task_id=3675
If someone thinks that I should have added other information or maybe done something differently, I am all ears.

Thanks.

I think it is likely the same issue that has plagued opkg downloads when there is IPv6 addressing/DNS but no IPv6 connectivity.

I'm also having issues with the output. Where can I find the collectd logs? I added log-mod, but the log file is never created at /var/log/collects.log

@hnyman any idea?