DocMAX
January 3, 2025, 11:41pm
1
Is the collectd DNS plugin working for you guys? It has never worked for me; it is always stuck here.
Barebones configuration works here.
Please post the output of: uci show luci_statistics | grep luci_statistics.collectd_dns
DocMAX
January 4, 2025, 6:01am
3
root@router:~# uci show luci_statistics | grep luci_statistics.collectd_dns
luci_statistics.collectd_dns=statistics
luci_statistics.collectd_dns.enable='1'
luci_statistics.collectd_dns.Interfaces='br-lan'
luci_statistics.collectd_dns.IgnoreSources='127.0.0.1'
LGTM. Provide some details please: ubus call system board
Do you see any syslog anomalies? logread -e collectd
DocMAX
January 5, 2025, 2:42am
5
RuralRoots:
ubus call system board
root@router:~# ubus call system board
{
"kernel": "6.6.68",
"hostname": "router",
"system": "ARMv7 Processor rev 5 (v7l)",
"model": "AVM FRITZ!Box 7530",
"board_name": "avm,fritzbox-7530",
"rootfs_type": "squashfs",
"release": {
"distribution": "OpenWrt",
"version": "SNAPSHOT",
"revision": "r28478-f0df6e3a4a",
"target": "ipq40xx/generic",
"description": "OpenWrt SNAPSHOT r28478-f0df6e3a4a",
"builddate": "1735823339"
}
}
collectd.err.log has nothing about DNS. Oh, and there is data available:
root@router:/mnt/router/usb/collectd/rrd/router/dns# ls
dns_octets.rrd dns_opcode-Query.rrd dns_qtype-#65.rrd dns_qtype-DNSKEY.rrd dns_qtype-NAPTR.rrd dns_qtype-SOA.rrd dns_qtype-WKS.rrd dns_rcode-RCode0.rrd
dns_opcode-Iquery.rrd dns_opcode-Status.rrd dns_qtype-A.rrd dns_qtype-DS.rrd dns_qtype-NS.rrd dns_qtype-SRV.rrd dns_rcode-FORMERR.rrd dns_rcode-REFUSED.rrd
dns_opcode-Opcode6.rrd dns_qtype-#0.rrd dns_qtype-AAAA.rrd dns_qtype-HINFO.rrd dns_qtype-NULL.rrd dns_qtype-TXT.rrd dns_rcode-NOTIMPL.rrd dns_rcode-SERVFAIL.rrd
dns_opcode-Opcode7.rrd dns_qtype-#64.rrd dns_qtype-CNAME.rrd dns_qtype-MX.rrd dns_qtype-PTR.rrd dns_qtype-URI.rrd dns_rcode-NXDOMAIN.rrd dns_rcode-YXDOMAIN.rrd
I even deleted the data in case it was corrupted. collectd created new data, but the GUI still can't read it.
hotfur
January 5, 2025, 4:46am
6
Do you really need to run on snapshot? Try on 23.05.5 if possible.
The fact that collectd isn't crashing and the DNS plugin is producing RRD data leads me to suspect that one of the DNS data sets it records (query types, response codes, or opcode types) coming from one of your bridged interfaces could be buggy.
It would be interesting to see whether collecting DNS data from each member interface of the bridge in turn, instead of from br-lan as a whole, would produce visible graph data.
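A rough sketch of that per-interface test, in case it helps. It assumes the bridge is br-lan, that its member ports show up under /sys/class/net/br-lan/brif, and that restarting /etc/init.d/luci_statistics is enough to regenerate the collectd config. The function names dns_iface_cmds and bridge_members are made up for illustration. It only prints the commands, so they can be reviewed before anything is run:

```shell
#!/bin/sh
# Sketch only, not a tested procedure. The helper names are hypothetical.

# Print the commands that would point the DNS plugin at a single interface.
# The uci key is the one shown in the "uci show" output earlier in the thread.
dns_iface_cmds() {
    iface="$1"
    printf "uci set luci_statistics.collectd_dns.Interfaces='%s'\n" "$iface"
    printf 'uci commit luci_statistics\n'
    # Assumption: this init script regenerates the config and restarts collectd.
    printf '/etc/init.d/luci_statistics restart\n'
}

# List the member ports of a bridge (default br-lan) from sysfs.
# Prints nothing if the bridge does not exist.
bridge_members() {
    bridge="${1:-br-lan}"
    ls "/sys/class/net/$bridge/brif" 2>/dev/null
}
```

On the router, something like `for i in $(bridge_members br-lan); do dns_iface_cmds "$i"; done` would list the commands for each member. Run one set, wait a few collection intervals, check the graphs, then move on to the next interface.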
hnyman
January 9, 2025, 6:51am
8
It has never worked ok in the openwrt context.
Even if you manage to configure it right, it will likely crash.
There is likely a threading race somewhere in the collectd core, the DNS plugin, or pcap, which causes a crash.
(and as there is currently no real upstream development of collectd, the problem will not disappear)
See
I've enabled collectd on the router for data collection but once I've enabled and started the service, it seems to crash some time afterwards (sometimes it's hours after starting, other times it's minutes after starting the service).
This is the message logged when collectd exits:
Sat Sep 1 13:25:32 2018 daemon.info collectd[16359]: dns plugin: pcap_loop exited with status -1.
It's always the DNS plugin that causes this. Once the dns plugin exits, the rest of my collectd config stops working …
I debugged it extensively in
opened 12:54PM - 16 Aug 15 UTC
closed 05:12PM - 05 Apr 18 UTC
Bug
Pending contributor action
I have built identical builds for Openwrt router (ar71xx / mips32r2 / WNDR3700) with Linux kernel 3.8.x and 4.1.x, and collectd 5.5.0 segfaults quickly with Linux 4.1.x if the dns plugin is enabled. If DNS plugin is disabled, collectd works otherwise normally also with 4.1.x. The plugin works ok with Linux 3.8.x.
Segfault:
```
[ 104.435977]
[ 104.435977] do_page_fault(): sending SIGSEGV to collectd for invalid write access to 76db3ae8
[ 104.444585] epc = 76f6ef30 in dns.so[76f6e000+14000]
[ 104.449597] ra = 0040f084 in collectd[400000+30000]
[ 104.454602]
```
I have opened an issue at the Openwrt bug tracker ( https://github.com/openwrt/packages/issues/1660 ), but I am also opening this issue here as this sounds more like a collectd problem due to some change in Linux.
Also with the current collectd 5.12
opened 05:53PM - 26 Dec 20 UTC
* Version of collectd: `5.12`
* Operating system / distribution: `OpenWrt 19.07.4 r11208-ce6496d796`
* Kernel version (if applicable): `4.14.195 #0 SMP x86/64 Linux`
## Actual behavior
Given enough time running, the DNS plugin will segfault and crash, shutting down collectd.
```
Sat Dec 26 14:14:48 2020 daemon.err collectd[17396]: dns plugin: pcap_loop exited with status -1.
Sat Dec 26 14:15:13 2020 kern.info kernel: [103791.357292] reader#0[17408]: segfault at 7f4142f1a758 ip 00007f414313519a sp 00007f4142f1a760 error 6 in dns.so[7f4143135000+2000]
Sat Dec 26 14:15:13 2020 daemon.info procd: Instance collectd::instance1 s in a crash loop 6 crashes, 271 seconds since last crash
```
I have opened https://github.com/openwrt/packages/issues/14339, but the maintainer suggested this should be reported upstream.
This behavior has been going on for a long time even in previous versions on OpenWRT.
The OpenWRT package maintainer tried to help debug it in the past but that did not appear to go anywhere and the issue was closed. See https://github.com/collectd/collectd/issues/1227; the issue seems to be related to the DNS plugin using libpcap.
Issue is not platform specific (I have known of this issue for years on x64) and the maintainer was troubleshooting this on ar71xx / mips32r2 architectures.
The current workaround suggested for OpenWRT is to [disable the plugin](https://forum.openwrt.org/t/collectd-dns-plugin-crashes/20386/2) as its crashing causes collectd to exit.
## Steps to reproduce
* Install and enable collectd on OpenWRT.
* Wait 10 - 90 mins
* View syslog to confirm collectd has crashed
While I could provide debug logs, I am not half as competent as @hnyman and his attempts in https://github.com/collectd/collectd/issues/1227
Perhaps OpenWRT could be added to Collectd's test suite as this issue is easily reproducible?
And here is the same, more recently, in our packages repo issues:
opened 09:32AM - 26 Dec 20 UTC
closed 05:07AM - 14 Jan 21 UTC
@hnyman / `19.07.4 r11208-ce6496d796 on x86/64`
The `collectd-mod-dns` plugin runs fine for a while and then segfaults.
```
Sat Dec 26 14:14:48 2020 daemon.err collectd[17396]: dns plugin: pcap_loop exited with status -1.
Sat Dec 26 14:15:13 2020 kern.info kernel: [103791.357292] reader#0[17408]: segfault at 7f4142f1a758 ip 00007f414313519a sp 00007f4142f1a760 error 6 in dns.so[7f4143135000+2000]
Sat Dec 26 14:15:13 2020 daemon.info procd: Instance collectd::instance1 s in a crash loop 6 crashes, 271 seconds since last crash
```
I recall this happening even on `18.x`, I suppose the next step is to take debug logs, but I'm unsure if this is the correct place or if this should be reported upstream?
hnyman:
It has never worked ok in the openwrt context.
Even if you manage to configure it right, it will likely crash.
There is likely a threading race somewhere in the collectd core, the DNS plugin, or pcap, which causes a crash.
Yes! That was my inference as well. When I added the DNS plugin, most times the collectd instance would start up fine, but other times the whole instance would crash after entering the read loop, with no changes to any plugins. I also noted that, almost without fail, the DNS data would stop recording if I modified any other plugin and hit save/apply, while all the other plugins continued to run without crashing the collectd instance. I got into the habit of checking syslog for a collectd crash on boot; invariably, stopping and starting collectd/luci-statistics would restore the collection of statistics.
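For anyone who wants to script that syslog check, here is a minimal sketch. It assumes the crash always leaves one of the two signatures quoted earlier in this thread (the pcap_loop exit message or a segfault in dns.so). The function name dns_plugin_crashed is made up for illustration:

```shell
#!/bin/sh
# Sketch only. Reads syslog text on stdin and exits 0 if it contains either
# crash signature quoted in this thread, non-zero otherwise.
dns_plugin_crashed() {
    grep -q -e 'dns plugin: pcap_loop exited' \
            -e 'segfault at .* in dns\.so'
}

# Hypothetical usage on the router (not executed here):
#   logread | dns_plugin_crashed && /etc/init.d/luci_statistics restart
```

The unfiltered logread is deliberate: the kernel segfault line names dns.so rather than collectd, so `logread -e collectd` would miss it.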
Unfortunately I never saw your debug analysis. It was well before I started using OpenWrt.
Given that DNS stats are really non-essential in the scheme of things, maybe the DNS plugin should be removed from the repo?