Collectd-mod-ping incorrectly reporting packet loss on IPv6

Hi all,

I'm experiencing an issue where collectd-mod-ping reports packet loss every 4 minutes or so when pinging an IPv6 address (in this test 2606:4700:4700::1111, one.one.one.one). See the attached screenshots for an example.

When I run collectd-mod-ping on a dumb access point on the same network, also running OpenWrt, it reports 0% packet loss. So the issue is neither the ISP nor the WAN connection.

Pinging directly from the router shell did show a minor packet loss of around 2 packets per 1000, which is negligible (but maybe relevant?). Since pinging from the dumb AP (or any other device on the network) shows 0% packet loss, the issue seems to lie with collectd-mod-ping on the router or with the router itself.

Has anyone run into something similar, or have any ideas on what might be causing this?

My router is a NanoPi R6S and the access point I used for this comparison is a Xiaomi AX6S, both running OpenWrt 25.12.2. Ping interval is 1s in every test. My router's /etc/config/network is here.

Thanks!

collectd-mod-ping on the router:

collectd-mod-ping on the access point:

ping from router:

root@router:~# ping -c 1000 -O 2606:4700:4700::1111
PING 2606:4700:4700::1111 (2606:4700:4700::1111) 56 data bytes
64 bytes from 2606:4700:4700::1111: icmp_seq=1 ttl=60 time=0.893 ms
64 bytes from 2606:4700:4700::1111: icmp_seq=2 ttl=60 time=0.708 ms
64 bytes from 2606:4700:4700::1111: icmp_seq=3 ttl=60 time=0.522 ms
64 bytes from 2606:4700:4700::1111: icmp_seq=4 ttl=60 time=0.737 ms
64 bytes from 2606:4700:4700::1111: icmp_seq=5 ttl=60 time=0.727 ms
(...)
64 bytes from 2606:4700:4700::1111: icmp_seq=995 ttl=60 time=0.835 ms
64 bytes from 2606:4700:4700::1111: icmp_seq=996 ttl=60 time=0.918 ms
64 bytes from 2606:4700:4700::1111: icmp_seq=997 ttl=60 time=0.985 ms
64 bytes from 2606:4700:4700::1111: icmp_seq=998 ttl=60 time=0.741 ms
64 bytes from 2606:4700:4700::1111: icmp_seq=999 ttl=60 time=0.796 ms
64 bytes from 2606:4700:4700::1111: icmp_seq=1000 ttl=60 time=0.899 ms

--- 2606:4700:4700::1111 ping statistics ---
1000 packets transmitted, 998 received, 0.2% packet loss, time 1030160ms
rtt min/avg/max/mdev = 0.474/0.851/40.204/1.307 ms
root@router:~#

traceroute one.one.one.one

root@opi5:~# traceroute one.one.one.one -I
traceroute to one.one.one.one (2606:4700:4700::1111), 30 hops max, 80 byte packets
1 2804:14d:xxx:xxx::1 (2804:14d:xxx:xxx::1) 0.448 ms 0.383 ms *
2 2804:14d:xxx::1 (2804:14d:xxx::1) 2.803 ms * *
3 2804:14d:xxx::411 (2804:14d:xxx::411) 1.316 ms 1.297 ms 1.743 ms
4 2804:14d:4c00:1600:d::2 (2804:14d:4c00:1600:d::2) 1.413 ms 1.703 ms 1.612 ms
5 2804:14d:4cf9:9::16 (2804:14d:4cf9:9::16) 2.548 ms 2.530 ms 2.512 ms
6 one.one.one.one (2606:4700:4700::1111) 1.061 ms * *
root@opi5:~#

The really strange thing is that if I start an IPv6 ping in parallel via SSH on the router, collectd-mod-ping stops reporting packet loss, even when the two are pinging different IPv6 targets. I have no idea why this is happening.

These are completely unrelated things, except that both are running on the router using the IPv6 stack.

Just my 2 cents: high-frequency ICMP tests are not necessarily accurate. A host in the path, or even the endpoint, can simply ignore an ICMP echo request. You can observe this on high-end devices that route at line speed, but whose small CPU (the one that generates the responses) has other things to do and drops the low-priority task of generating and sending the echo reply.
Even 3 or 5 consecutive packets not coming back is not necessarily an actual issue.
The best strategy is to test multiple endpoints once per minute, with the probes spaced something like 5 to 25 seconds apart.
For functional tests, query the service directly: ask a remote DNS server for an actual answer to decide whether it is reachable and working.
Sorry for not being helpful with your specific collectd issue.
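A minimal sketch of the multi-endpoint approach described in this reply, assuming a POSIX shell with `ping` and `nslookup` available (as on OpenWrt). The endpoint list, the 15 s spacing and the function names (`wan_status`, `probe_icmp`, `dns_ok`) are illustrative assumptions, not anything from the thread. The WAN is only reported as down when every endpoint fails, so a single ignored echo request is not counted as an outage.

```shell
#!/bin/sh
# Sketch: probe several endpoints a few seconds apart and only report the
# WAN as "down" if all of them fail. Endpoints and spacing are illustrative.

ENDPOINTS="2606:4700:4700::1111 2001:4860:4860::8888"
SPACING=15   # seconds between probes (the reply suggests 5 to 25 s)

# Print "up" if at least one exit status is 0 (success), otherwise "down".
wan_status() {
    for rc in "$@"; do
        if [ "$rc" -eq 0 ]; then
            echo up
            return 0
        fi
    done
    echo down
    return 1
}

# Send one echo request per endpoint, waiting SPACING seconds between them.
probe_icmp() {
    results=""
    for ep in $ENDPOINTS; do
        ping -c 1 -W 2 "$ep" >/dev/null 2>&1
        results="$results $?"
        sleep "$SPACING"
    done
    # $results is intentionally unquoted so each status becomes one argument.
    wan_status $results
}

# Functional check: ask the DNS server for a real answer instead of pinging it.
# Assumes nslookup exits non-zero when the lookup fails.
dns_ok() {  # usage: dns_ok SERVER NAME
    nslookup "$2" "$1" >/dev/null 2>&1
}

# Example, e.g. from a once-per-minute cron job:
#   probe_icmp; dns_ok 2606:4700:4700::1111 openwrt.org && echo "DNS ok"
```

Keeping the decision logic in `wan_status` separate from the probing makes it easy to change the policy (for example, "down" only if a majority fails) without touching the network calls.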

Yep, I tested with different endpoints and from different source hosts.

Since there is no packet loss from devices on the LAN, this is not a serious issue.

The impact is only on my collectd ping graph, which I use to monitor my WAN connection, and at the moment it is producing some incorrect information.

Well, it seems that the internet routers are not the cause after all.

I have a site-to-site tunnel via WireGuard connecting two OpenWrt routers’ LANs, one at home (192.168.1.1/24) and another at a cottage (192.168.2.1/24).

This WireGuard tunnel is established via IPv6, and the IPv4 routing between the two LANs is done via this tunnel.

The router at home (router.home) is 192.168.1.1, and the router at the cottage (routercc.home) is 192.168.2.1.

I just configured the same access point (apesc.home) at home to continuously ping routercc (192.168.2.1) at the other end of the tunnel. To my surprise, it shows the same packet-loss behavior as collectd-mod-ping running on the router (since WireGuard runs over IPv6). As I understand it, the WireGuard tunnel operates over UDP and these packets are encapsulated and encrypted within it, so the theory that some internet routers in the path are discarding ICMPv6 packets does not hold:

root@apesc:~# traceroute routercc.home
traceroute to routercc.home (192.168.2.1), 30 hops max, 46 byte packets
 1  router.home (192.168.1.1)  0.253 ms  0.301 ms  0.262 ms
 2  routercc.home (192.168.2.1)  7.341 ms  3.850 ms  3.678 ms
root@apesc:~#

This is reinforced by the fact that pinging other IPv6 addresses from the home network results in zero packet loss. The issue seems to be related to IPv6 connections originating from router.home itself. Anything traversing the firewall/nftables (LAN <-> WAN) seems to work just fine.

The next test will be downgrading my router to 24.10.x to see if the issue remains.

I noticed that my ISP is dropping one or two IPv6 packets every 5m15s (±1 s).
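One way to capture a log like the timestamped ping output further down is a small filter in the same style as the `ip monitor` loop used later in this thread. This is a sketch: the function name `stamp` and the log path are made up, and it assumes the `ping` in use accepts `-O` (as in the router ping at the top of the thread) so that missing replies show up as "no answer yet" lines.

```shell
#!/bin/sh
# Prefix every line read on stdin with the local date and time, giving lines
# such as "2026-05-23 16:48:39 64 bytes from ...".
stamp() {
    while IFS= read -r line; do
        printf '%s %s\n' "$(date '+%F %T')" "$line"
    done
}

# Example (ping -O prints "no answer yet for icmp_seq=N" for missing replies):
#   ping -O 2001:4860:4860::8888 | stamp | tee /tmp/ping6.log
```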

The hypothesis from AI (Claude) is that this issue is caused by "NDP cache expiry":

(...) NDP cache expiry on the ISP router is confirmed as the cause. But the root cause driving those expiries is likely ISP router misconfiguration — either too-short NDP timeouts or neighbor table exhaustion from a large shared /64. Either way, it's entirely on ISP's side. (...)

See the packet capture below. Since I do not trust an AI diagnosis, maybe an IPv6 networking expert here could confirm (or refute) this root cause.

Thanks!

Update: I disabled WAN6 "Request IPv6-address", so now it only has a SLAAC IPv6 address:

uci set network.wan6.reqaddress='none'
uci commit network && ifup wan6

As I understand it, this should invalidate the NDP cache expiry hypothesis, but I am not sure, since the problem still remains.

I ran this command with help from Gemini:

root@router:~# ip -6 monitor neigh | while read -r line; do case "$line" in *eth1*) echo "$(date '+[%H:%M:%S]') $line";; esac; done

[16:43:53] 2804:xxxx:xxxx::1 dev eth1 lladdr xx:xx:xx:1a:bb:7f router STALE
[16:48:46] 2804:xxxx:xxxx::1 dev eth1 lladdr xx:xx:xx:1a:bb:7f router PROBE
[16:48:46] 2804:xxxx:xxxx::1 dev eth1 lladdr xx:xx:xx:1a:bb:7f router REACHABLE
[16:49:15] 2804:xxxx:xxxx::1 dev eth1 lladdr xx:xx:xx:1a:bb:7f router STALE

Here 2804:xxxx:xxxx::1 is my ISP's IPv6 gateway. A packet was lost at 16:48:42 (icmp_seq=290). This happens every 5m15s:

2026-05-23 16:48:39 64 bytes from 2001:4860:4860::8888: icmp_seq=288 ttl=114 time=20.1 ms
2026-05-23 16:48:40 64 bytes from 2001:4860:4860::8888: icmp_seq=289 ttl=114 time=20.3 ms
2026-05-23 16:48:42 no answer yet for icmp_seq=290
2026-05-23 16:48:42 64 bytes from 2001:4860:4860::8888: icmp_seq=291 ttl=114 time=20.3 ms
2026-05-23 16:48:43 64 bytes from 2001:4860:4860::8888: icmp_seq=292 ttl=114 time=20.3 ms
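To check the period over a longer log, a short awk filter can print the time between consecutive loss events. A sketch, assuming the `YYYY-MM-DD HH:MM:SS` prefix shown above; the 10 s grouping threshold, the `loss_intervals` name and the log path are arbitrary choices. A 5m15s period would show up as values near 315.

```shell
#!/bin/sh
# Read a timestamped ping log on stdin and print the number of seconds
# between the starts of consecutive loss events. A loss less than 10 s after
# the previous one is treated as part of the same event. Only the HH:MM:SS
# field is used, so gaps must be shorter than 24 h.
loss_intervals() {
    awk '
    /no answer yet for icmp_seq=/ {
        split($2, t, ":")
        now = t[1] * 3600 + t[2] * 60 + t[3]
        if (have) {
            gap = now - prev
            if (gap < 0) gap += 86400        # log crossed midnight
            if (gap >= 10) {                 # a new loss event
                d = now - start
                if (d < 0) d += 86400
                print d
                start = now
            }
        } else {
            start = now
        }
        prev = now
        have = 1
    }'
}

# Example:
#   loss_intervals < /tmp/ping6.log
```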

According to Gemini, this observation "confirms the packet drop is directly caused by the IPv6 Neighbor Unreachability Detection (NUD) state machine transitioning out of the STALE state."

Below is Gemini's analysis:

  1. The gateway 2804:xxxx:xxxx::1 transitions to STALE. It remains in this state because no traffic is destined directly to the gateway's link-local or global address (transit traffic does not refresh the REACHABLE state).
  2. Packet icmp_seq=290 is routed to the gateway. This triggers the kernel to shift the neighbor state from STALE to DELAY. The kernel forwards the packet, but because the ISP edge router's neighbor cache for your router has also expired, the ISP router drops the packet while it waits for NUD resolution.
  3. The Linux delay_first_probe_time (default 5 seconds) expires. The state shifts to PROBE. Your router sends a Neighbor Solicitation (NS).
  4. The gateway replies with a Neighbor Advertisement (NA). The state shifts to REACHABLE.
  5. The base_reachable_time_ms (default 30 seconds) expires. The state returns to STALE. The cycle repeats.
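The timers quoted in this analysis can be read per interface from `/proc/sys/net/ipv6/neigh/<iface>/`. A sketch for checking them on the router, assuming `eth1` is the WAN device (as in the `ip monitor` output above); the function names are hypothetical. Note that Linux does not use `base_reachable_time_ms` as a fixed timeout: following RFC 4861 it picks a random reachable time between 0.5× and 1.5× that value, so with the 30000 ms default an entry stays REACHABLE for roughly 15 to 45 s. The 29 s between REACHABLE at 16:48:46 and STALE at 16:49:15 fits that range.

```shell
#!/bin/sh
# Print one IPv6 neighbour-discovery parameter for an interface, or "n/a"
# if the sysctl file does not exist (e.g. unknown interface).
nud_param() {  # usage: nud_param IFACE NAME
    f="/proc/sys/net/ipv6/neigh/$1/$2"
    if [ -r "$f" ]; then cat "$f"; else echo "n/a"; fi
}

# Show the timers mentioned in the analysis (default interface: eth1).
# base_reachable_time_ms is in milliseconds, the other two in seconds.
show_nud_timers() {
    iface="${1:-eth1}"
    for p in base_reachable_time_ms delay_first_probe_time gc_stale_time; do
        printf '%-24s %s\n' "$p" "$(nud_param "$iface" "$p")"
    done
}

# Example:
#   show_nud_timers eth1
```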

BTW, I have a WireGuard LAN-to-LAN tunnel (between two OpenWrt routers) running over IPv6, so this packet loss is real: it also causes packet loss inside the tunnel. Let me check Gemini's solution...

Short feedback: thanks for the detailed analysis. At first glance it looks correct, or at least plausible.

My first shot from the hip: maybe send unsolicited NDP messages to the upstream router?

Simply pinging the IPv6 upstream router once per minute seems to have solved the issue.

I added a single IPv6 ping, running every minute, to the crontab:

* * * * * ping -c 1 2804:xxxx:xxxx::1 >/dev/null 2>&1

This has been running for just over 20 minutes, and no packet loss has been detected so far:

EDIT: this solution only works with network.wan6.reqaddress='none', i.e. when WAN6 gets only a SLAAC IPv6 address (plus PD). If WAN6 also gets an IPv6 address from the ISP via DHCPv6, the problem remains.
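The crontab entry above hard-codes the gateway address. A hypothetical variant looks the gateway up in the neighbor table instead: it takes the first non-link-local entry flagged `router` on the WAN device and pings it once. This is only a sketch: `eth1`, the script path and the function names are assumptions, it does nothing if the entry is missing, and the `reqaddress='none'` caveat above still applies.

```shell
#!/bin/sh
# Hypothetical keepalive for the upstream router: look the gateway up in the
# neighbour table instead of hard-coding it, then send it one echo request.
WAN_DEV="eth1"   # assumed WAN device, as in the ip monitor output above

# Read "ip -6 neigh show" output on stdin and print the first neighbour that
# is flagged "router" and does not start with fe80: (treated as link-local).
pick_gateway() {
    awk '/ router / && $1 !~ /^fe80:/ { print $1; exit }'
}

keepalive() {
    gw=$(ip -6 neigh show dev "$WAN_DEV" | pick_gateway)
    # If the entry is missing there is nothing to ping; do nothing.
    [ -n "$gw" ] || return 0
    ping -c 1 -W 2 "$gw" >/dev/null 2>&1
}

# Only ping when called as "script run", e.g. from cron.
case "$1" in
    run) keepalive ;;
esac
```

Saved as, say, `/root/ndp-keepalive.sh` (a made-up path) and made executable, it would be scheduled with `* * * * * /root/ndp-keepalive.sh run` in place of the hard-coded entry.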

In the end, that's the same thing in different words :sweat_smile:

What happens in the end is that you ensure your neighbor entry is present in their neighbor table.

Thanks again for your detailed updates on your issue!