IPv6 connectivity issues since 22.03.4

Hardware: Turris Omnia 2020 (CZ11NIC23)
Software: OpenWrt 22.03.4 and later

Hey there!

I'm trying to find the cause of an issue I've been having since v22.03.4 on my Turris Omnia, and I'd greatly appreciate it if you could share your thoughts on the matter.
My setup is quite simple:

  • one router between WAN and LAN
  • prefix delegation (DHCPv6 + SLAAC)
  • all LAN devices connected to the router

After some time, IPv6 connectivity between LAN and WAN stops working, while LAN-to-LAN traffic keeps working perfectly (so basically, it fails whenever routing is involved).

I found that flushing the IPv6 neighbor cache on the router with ip -6 neigh flush all triggers the issue, so I'm guessing the "after some time" part is just these entries expiring. In that state, pinging the router's LLA, ULA or GUA addresses from a host does not work, and neither does pinging a host from the router.
The strange part, however, is that right after rebooting the router, or sometimes seemingly for no reason, it starts working properly again. I'm unable to reproduce that consistently.
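One way to watch the "entries expiring" hypothesis play out is to sample `ip -6 neigh show` on the router periodically and look at the state column. A minimal sketch, assuming iproute2's default one-entry-per-line format where the NUD state is the last field; the sample below is made up for illustration (hypothetical placeholder MAC, not captured from the router), and the output could be copied to any machine with Python if the router has none.

```python
from collections import Counter

# NUD states meaning address resolution has not (yet) succeeded.
UNRESOLVED = {"INCOMPLETE", "FAILED"}

def neighbor_states(output: str) -> dict[str, str]:
    """Map each neighbor address to its NUD state.

    Assumes iproute2's default format, where the state (REACHABLE, STALE,
    DELAY, PROBE, INCOMPLETE, FAILED, ...) is the last field of each line.
    """
    states: dict[str, str] = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) >= 2:  # skip blank or truncated lines
            states[fields[0]] = fields[-1]
    return states

def unresolved(states: dict[str, str]) -> list[str]:
    """Neighbors the kernel is still trying, or has failed, to resolve."""
    return sorted(addr for addr, state in states.items() if state in UNRESOLVED)

# Hypothetical sample output, made up for illustration only.
SAMPLE = """\
fe80::7ff:dae3:88d8:a866 dev br-lan lladdr 02:00:00:00:00:01 STALE
2a02:842a:8b:4601::f8f8 dev br-lan  FAILED
2a02:842a:8b:4601::a dev br-lan  INCOMPLETE
"""

if __name__ == "__main__":
    states = neighbor_states(SAMPLE)
    print(Counter(states.values()))
    print(unresolved(states))
```

Run right after ip -6 neigh flush all, this would show whether the LAN hosts get stuck in INCOMPLETE/FAILED instead of going back to REACHABLE.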

Looking at tcpdump traces on the router, it seems to be sending neighbor solicitations (NS) without getting any response. But running tcpdump on the corresponding host shows no solicitations arriving whatsoever. I do not filter ICMP at all, so I don't think my firewall rules are to blame here.
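For context: when the router resolves a neighbor's link-layer address, the NS goes to the target's solicited-node multicast group (RFC 4291), which is why destinations like ff02::1:ff00:f8f8 show up in the router trace. On Ethernet that group maps to a 33:33:… multicast MAC (RFC 2464). A small sketch deriving both from the target address, to check which frames the host should be receiving; the function names are just illustrative.

```python
import ipaddress

# ff02::1:ff00:0/104, the solicited-node multicast prefix (RFC 4291).
SOLICITED_NODE_BASE = int(ipaddress.IPv6Address("ff02::1:ff00:0"))

def solicited_node(addr: str) -> str:
    """Solicited-node group for a unicast address: the /104 prefix
    followed by the address's low-order 24 bits."""
    low24 = int(ipaddress.IPv6Address(addr)) & 0xFFFFFF
    return str(ipaddress.IPv6Address(SOLICITED_NODE_BASE | low24))

def ethernet_multicast_mac(group: str) -> str:
    """Ethernet MAC for an IPv6 multicast group: 33:33 followed by the
    group's low-order 32 bits (RFC 2464)."""
    low32 = int(ipaddress.IPv6Address(group)) & 0xFFFFFFFF
    return "33:33:" + ":".join(f"{(low32 >> shift) & 0xFF:02x}" for shift in (24, 16, 8, 0))

if __name__ == "__main__":
    for target in ("2a02:842a:8b:4601::f8f8", "2a02:842a:8b:4601::a"):
        group = solicited_node(target)
        print(target, "->", group, "->", ethernet_multicast_mac(group))
```

So for the host ending in f8f8, the solicitations from the router would arrive as frames addressed to 33:33:ff:00:f8:f8; if those never show up in the host's capture, the multicast NS is being lost somewhere between the two.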

tcpdump trace on router
20:42:31.352100 IP6 2a02:842a:8b:4601::f8f8 > 2620:fe::fe: ICMP6, echo request, seq 1, length 64
20:42:31.363429 IP6 fe80::da58:d7ff:fe01:af50 > ff02::1:ff00:f8f8: ICMP6, neighbor solicitation, who has 2a02:842a:8b:4601::f8f8, length 32
20:42:32.245464 IP6 fe80::da58:d7ff:fe01:af50 > ff02::1:ff00:a: ICMP6, neighbor solicitation, who has 2a02:842a:8b:4601::a, length 32
20:42:32.357347 IP6 2a02:842a:8b:4601::f8f8 > 2620:fe::fe: ICMP6, echo request, seq 2, length 64
20:42:32.405462 IP6 fe80::da58:d7ff:fe01:af50 > ff02::1:ff00:f8f8: ICMP6, neighbor solicitation, who has 2a02:842a:8b:4601::f8f8, length 32
20:42:33.370616 IP6 2a02:842a:8b:4601::f8f8 > 2620:fe::fe: ICMP6, echo request, seq 3, length 64
20:42:33.404242 IP6 fe80::da58:d7ff:fe01:af50 > ff02::1:ff00:a: ICMP6, neighbor solicitation, who has 2a02:842a:8b:4601::a, length 32
20:42:33.445463 IP6 fe80::da58:d7ff:fe01:af50 > ff02::1:ff00:f8f8: ICMP6, neighbor solicitation, who has 2a02:842a:8b:4601::f8f8, length 32
20:42:34.383866 IP6 2a02:842a:8b:4601::f8f8 > 2620:fe::fe: ICMP6, echo request, seq 4, length 64
20:42:34.405464 IP6 fe80::da58:d7ff:fe01:af50 > ff02::1:ff00:a: ICMP6, neighbor solicitation, who has 2a02:842a:8b:4601::a, length 32
20:42:35.252661 IP6 fe80::da58:d7ff:fe01:af50 > ff02::1:ff00:f8f8: ICMP6, neighbor solicitation, who has 2a02:842a:8b:4601::f8f8, length 32
20:42:35.397121 IP6 2a02:842a:8b:4601::f8f8 > 2620:fe::fe: ICMP6, echo request, seq 5, length 64
20:42:35.445464 IP6 fe80::da58:d7ff:fe01:af50 > ff02::1:ff00:a: ICMP6, neighbor solicitation, who has 2a02:842a:8b:4601::a, length 32
20:42:35.445482 IP6 fe80::da58:d7ff:fe01:af50 > 2a02:842a:8b:4601:ad73:4794:caa4:8b4c: ICMP6, neighbor solicitation, who has 2a02:842a:8b:4601:ad73:4794:caa4:8b4c, length 32
20:42:35.505944 IP6 2a02:842a:8b:4601:ad73:4794:caa4:8b4c > fe80::da58:d7ff:fe01:af50: ICMP6, neighbor advertisement, tgt is 2a02:842a:8b:4601:ad73:4794:caa4:8b4c, length 24
tcpdump trace on host 'f8f8'
20:44:18.758707 IP6 2a02:842a:8b:4601::f8f8 > 2620:fe::fe: ICMP6, echo request, id 46, seq 107, length 64
20:44:19.772036 IP6 2a02:842a:8b:4601::f8f8 > 2620:fe::fe: ICMP6, echo request, id 46, seq 108, length 64
20:44:20.785374 IP6 2a02:842a:8b:4601::f8f8 > 2620:fe::fe: ICMP6, echo request, id 46, seq 109, length 64
20:44:21.795402 IP6 2a02:842a:8b:4601::f8f8 > 2620:fe::fe: ICMP6, echo request, id 46, seq 110, length 64
20:44:22.808711 IP6 2a02:842a:8b:4601::f8f8 > 2620:fe::fe: ICMP6, echo request, id 46, seq 111, length 64
20:44:23.101114 IP6 fe80::7ff:dae3:88d8:a866 > fe80::da58:d7ff:fe01:af50: ICMP6, neighbor solicitation, who has fe80::da58:d7ff:fe01:af50, length 32
20:44:23.825369 IP6 2a02:842a:8b:4601::f8f8 > 2620:fe::fe: ICMP6, echo request, id 46, seq 112, length 64
20:44:24.838702 IP6 2a02:842a:8b:4601::f8f8 > 2620:fe::fe: ICMP6, echo request, id 46, seq 113, length 64
20:44:25.852040 IP6 2a02:842a:8b:4601::f8f8 > 2620:fe::fe: ICMP6, echo request, id 46, seq 114, length 64
20:44:26.862033 IP6 2a02:842a:8b:4601::f8f8 > 2620:fe::fe: ICMP6, echo request, id 46, seq 115, length 64
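Pairing solicitations with advertisements by eye gets tedious in longer captures. A rough sketch that does it for tcpdump's text output; it assumes the exact one-line format shown in these traces and matches NAs to NSs by target address only, ignoring timing. On the router excerpt below, the multicast solicitations for ::f8f8 and ::a get no advertisement, while the single unicast solicitation for ...:8b4c is answered.

```python
import re

# tcpdump's one-line rendering of NDP messages, as in the router trace.
NS_RE = re.compile(r"IP6 (\S+) > (\S+): ICMP6, neighbor solicitation, who has (\S+),")
NA_RE = re.compile(r"ICMP6, neighbor advertisement, tgt is (\S+),")

def ndp_summary(trace: str) -> dict[str, dict]:
    """Per target address: number of NS seen, whether they were sent to a
    multicast (ff..) or unicast destination, and whether any NA appeared."""
    summary: dict[str, dict] = {}

    def entry(target: str) -> dict:
        return summary.setdefault(target, {"ns": 0, "dst": set(), "answered": False})

    for line in trace.splitlines():
        if m := NS_RE.search(line):
            _src, dst, target = m.groups()
            e = entry(target)
            e["ns"] += 1
            e["dst"].add("multicast" if dst.lower().startswith("ff") else "unicast")
        elif m := NA_RE.search(line):
            entry(m.group(1))["answered"] = True
    return summary

# Excerpt copied from the router trace.
TRACE = """\
20:42:31.363429 IP6 fe80::da58:d7ff:fe01:af50 > ff02::1:ff00:f8f8: ICMP6, neighbor solicitation, who has 2a02:842a:8b:4601::f8f8, length 32
20:42:32.245464 IP6 fe80::da58:d7ff:fe01:af50 > ff02::1:ff00:a: ICMP6, neighbor solicitation, who has 2a02:842a:8b:4601::a, length 32
20:42:32.405462 IP6 fe80::da58:d7ff:fe01:af50 > ff02::1:ff00:f8f8: ICMP6, neighbor solicitation, who has 2a02:842a:8b:4601::f8f8, length 32
20:42:35.445482 IP6 fe80::da58:d7ff:fe01:af50 > 2a02:842a:8b:4601:ad73:4794:caa4:8b4c: ICMP6, neighbor solicitation, who has 2a02:842a:8b:4601:ad73:4794:caa4:8b4c, length 32
20:42:35.505944 IP6 2a02:842a:8b:4601:ad73:4794:caa4:8b4c > fe80::da58:d7ff:fe01:af50: ICMP6, neighbor advertisement, tgt is 2a02:842a:8b:4601:ad73:4794:caa4:8b4c, length 24
"""

if __name__ == "__main__":
    for target, e in ndp_summary(TRACE).items():
        print(target, e["ns"], sorted(e["dst"]), "answered" if e["answered"] else "no reply")
```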

Looking at the diff of sysctl dumps between 22.03.3 and 22.03.4, I don't see anything notable.

sysctl diff
fs.dentry-state = 8052	5523	45	0	979	0     |	fs.dentry-state = 6227	3780	45	0	941	0
fs.inode-nr = 7069	0				      |	fs.inode-nr = 5283	0
fs.inode-state = 7069	0	0	0	0	0     |	fs.inode-state = 5283	0	0	0	0	0
kernel.osrelease = 5.10.161				      |	kernel.oops_limit = 10000
							      >	kernel.osrelease = 5.10.176
kernel.random.boot_id = 4e13c211-8bfd-4de6-994b-43a8f024a04d  |	kernel.random.boot_id = 01b13c08-c5f9-48fe-a649-92020c7c5830
kernel.random.uuid = 643b80ae-4774-4483-adeb-8e51f8b397bc     |	kernel.random.uuid = d67d0f7e-ddf7-453f-b13b-c03682bb38e9
kernel.version = #0 SMP Tue Jan 3 00:24:21 2023		      |	kernel.version = #0 SMP Sun Apr 9 12:27:46 2023
							      >	kernel.warn_limit = 0
net.ipv4.tcp_fastopen_key = ad5ff674-82b3e993-74de6adb-6f1788 |	net.ipv4.tcp_fastopen_key = 9f4de282-77e4dd8c-b608bf28-b5bab5
net.ipv6.conf.eth0.mtu = 1500				      |	net.ipv6.conf.eth0.mtu = 1508
net.netfilter.nf_conntrack_count = 90			      |	net.netfilter.nf_conntrack_count = 242
net.netfilter.nf_conntrack_tcp_no_window_check = 1	      <
vm.user_reserve_kbytes = 64342				      |	vm.user_reserve_kbytes = 64341
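Most lines in a diff like this are counters, per-boot random values or build stamps. A sketch that diffs two `sysctl -a` dumps while skipping such keys, so only settings that actually changed remain; the ignore list is an assumption based on the lines above, not an exhaustive list.

```python
# Keys that change on every boot, with load, or per build (assumed from the diff above).
VOLATILE_PREFIXES = (
    "fs.dentry-state", "fs.inode-", "kernel.random.", "kernel.version",
    "kernel.osrelease", "net.ipv4.tcp_fastopen_key",
    "net.netfilter.nf_conntrack_count", "vm.user_reserve_kbytes",
)

def parse_sysctl(dump: str) -> dict[str, str]:
    """Parse `sysctl -a` output ("key = value" per line) into a dict."""
    settings: dict[str, str] = {}
    for line in dump.splitlines():
        key, sep, value = line.partition(" = ")
        if sep:
            settings[key.strip()] = " ".join(value.split())  # normalize tabs/spaces
    return settings

def diff_sysctl(old: str, new: str, ignore: tuple[str, ...] = VOLATILE_PREFIXES):
    """Return (key, old_value, new_value) for differing keys; None means absent."""
    a, b = parse_sysctl(old), parse_sysctl(new)
    return [
        (key, a.get(key), b.get(key))
        for key in sorted(a.keys() | b.keys())
        if not key.startswith(ignore) and a.get(key) != b.get(key)
    ]

if __name__ == "__main__":
    # A few lines taken from the 22.03.3 / 22.03.4 diff above.
    OLD = ("net.ipv6.conf.eth0.mtu = 1500\nnet.netfilter.nf_conntrack_count = 90\n"
           "net.netfilter.nf_conntrack_tcp_no_window_check = 1\n")
    NEW = ("kernel.warn_limit = 0\nnet.ipv6.conf.eth0.mtu = 1508\n"
           "net.netfilter.nf_conntrack_count = 242\n")
    for change in diff_sysctl(OLD, NEW):
        print(change)
```

With the full dumps, what remains are entries such as the eth0 MTU and the new kernel limits, i.e. the lines worth a second look.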

I also tried:

  • flashing the previous archive for 22.03.3 (that fixes the issue)
  • flashing a freshly built archive for 22.03.3 (that fixes the issue)
  • upgrading to 22.03.5 (does not solve the issue)

I'm not running the official OpenWrt release for mvebu, but a custom build made with the image builder.

image builder configuration
PROFILE="cznic_turris-omnia"
PACKAGES="luci luci-theme-material nft-qos luci-app-nft-qos adblock luci-app-adblock tinyproxy luci-app-tinyproxy wireguard-tools luci-app-wireguard -ip-tiny ip-full -dnsmasq dnsmasq-full -odhcpd -odhcpd-ipv6only tcpdump ethtool iperf3 lsof htop vim-full"
FILES=""
BIN_DIR=""
EXTRA_IMAGE_NAME=""
DISABLED_SERVICES="tinyproxy"
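In an Image Builder PACKAGES list, a name prefixed with `-` removes that package from the profile's defaults. A small sketch that splits the list above into additions and removals, to make the differences from the official image explicit; `split_packages` is just an illustrative name.

```python
# PACKAGES value copied from the image builder configuration above.
PACKAGES = ("luci luci-theme-material nft-qos luci-app-nft-qos adblock luci-app-adblock "
            "tinyproxy luci-app-tinyproxy wireguard-tools luci-app-wireguard -ip-tiny ip-full "
            "-dnsmasq dnsmasq-full -odhcpd -odhcpd-ipv6only tcpdump ethtool iperf3 lsof htop vim-full")

def split_packages(packages: str) -> tuple[list[str], list[str]]:
    """Split a PACKAGES string into packages to install and default
    packages to drop (marked with a leading '-')."""
    added: list[str] = []
    removed: list[str] = []
    for name in packages.split():
        if name.startswith("-"):
            removed.append(name[1:])
        else:
            added.append(name)
    return added, removed

if __name__ == "__main__":
    added, removed = split_packages(PACKAGES)
    print("removed:", removed)
```

Here the removals are ip-tiny, dnsmasq, odhcpd and odhcpd-ipv6only, so whatever serves router advertisements and DHCPv6 on the LAN is presumably dnsmasq-full rather than odhcpd, which is worth keeping in mind when comparing against the official build.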

I suspect either a kernel bug or a new filtering behavior I missed in the changelog, but neither seems very likely. I'll try a clean install with the official build when I get the chance.

Thanks!


Hello!!

From your description, your issue looks exactly like the one I'm experiencing on my Linksys 1900ACSv2. My router's platform is mvebu/cortexa9, the same platform family as your device.

My issue looks like this: after a few hours of router uptime, communication on the wireless network breaks down, and only for IPv6 traffic. A tcpdump trace shows that the ICMPv6 NDP packets the router sends to the wireless devices get no reply. For wired devices, IPv6 traffic continues without any issues.

There are some differences between your case and mine, though. First, on my router I use version 21.02.5 (the latest 22.03 is not available for my device because of Linux kernel issues with the switch, see mvebu: cortexa9: disable devices using broken mv88e6176 switch). Second, in my case running ip -6 neigh flush all doesn't resolve the issue. The best workaround I've found so far is restarting the wireless interface; once I do, IPv6 communication with wireless devices works again.

I haven't researched this IPv6 issue thoroughly yet, but from the looks of it, I would guess it's due to a Linux kernel bug, possibly in the wireless driver.

One further comment: before running version 21.02.5, I ran v19.07.10, and the router could stay up for weeks with no issues. So clearly the issue is due to some change in the Linux kernel between the two versions.

Let's see if someone wishes to add more to this discussion.

Best wishes to all.

This is very intriguing feedback, thanks for sharing. I am in fact familiar with the mv88e6176 switch issue you mentioned since I'm experiencing this exact behavior with v22.03.3.

What's puzzling me is that:

  • In your case, wired devices don't seem to be affected. That would indeed point to an issue with the wireless stack, but in my case I haven't noticed anything unusual with wireless devices (apart from frequent and unwanted AP-STA-DISCONNECTED events, but I suppose that's another bug altogether).
  • I thought I could blame v22.03.4, but you're experiencing the same issue on an older major version, so finding the cause is gonna be tough. And that's assuming the symptoms we share have the same cause.

Trying to pinpoint more differences between our setups: are you using the default 21.02.5 OpenWrt build, or a custom image like I do? Also, do you do anything special when setting up your LAN bridge? In my case, I just have one br-lan with all switch ports and wireless adapters on it, but I wonder if splitting them across multiple interfaces would yield a different result.

It seems we'll have to wait for a more up-to-date kernel for fixes anyway, but this bug is so random it got me curious.

I'm running the default 21.02.5 build from the OpenWrt archive, with the default switch configuration that comes with that build. IIRC, the LAN bridge doesn't have any special tweaks either. Network-wise, the only real change I've made is disabling the 2.4 GHz wireless (I only use the 5 GHz band), but I don't think this could trigger the bug.

I'm not experiencing any AP-STA-DISCONNECTED events. On all the OpenWrt images I've used on my Linksys, the wireless connection has always been flawless.

Do you have any experience debugging kernel issues? The usual method is to run git bisect on the kernel source to find the commit that introduces the issue, which would point to the likely cause of the bug. I have some background debugging Linux kernel issues, but as I need my device for Internet access, it might take me some time to do it.
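git bisect automates a binary search between a known-good and a known-bad revision: each test halves the remaining range, so N candidate commits need roughly log2(N) tests. That matters here, since each "test" means building an image, flashing it and waiting for the bug (or, in the first poster's case, flushing the neighbor cache to trigger it). The idea without git, as a sketch; `first_bad`, `is_bad` and the commit list are hypothetical.

```python
from typing import Callable, Sequence

def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Binary search for the first bad commit.

    Assumes commits are ordered oldest to newest, the last one is known to be
    bad, and the predicate is monotonic (once bad, every later commit is bad),
    which is the same assumption git bisect relies on.
    """
    lo, hi = 0, len(commits) - 1  # the first bad commit lies in commits[lo..hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid      # mid is bad: the first bad commit is mid or earlier
        else:
            lo = mid + 1  # mid is good: the first bad commit comes after it
    return commits[lo]

if __name__ == "__main__":
    # Hypothetical history of 16 commits where c11 introduced the bug.
    history = [f"c{i}" for i in range(16)]
    tested: list[str] = []

    def is_bad(commit: str) -> bool:
        tested.append(commit)  # stands in for "build, flash, wait for the bug"
        return int(commit[1:]) >= 11

    print(first_bad(history, is_bad), "found after", len(tested), "tests")  # c11 found after 4 tests
```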

Perhaps our most practical solution would be to disable IPv6 temporarily and wait for the next stable OpenWrt release. That's supposed to resolve the switch issue and could also fix this little IPv6 bug.

Regards.

I have a similar problem, but I don't know if it's the same as yours. IPv6 stops working after some time, at least for some apps/devices, but, for example, ping -6 www.google.com still works.
In your case, does ping still work, or is connectivity completely lost?

No, ping does not work in my case: once I have flushed the neighbor table, all IPv6 connectivity is lost. Sometimes it fixes itself, but I haven't been able to reproduce that consistently...

I've never done kernel debugging, but I might give it a try; it could be fun. I also need the device for Internet access, so that'll have to wait a bit.
I'd hate to have to disable IPv6 though, since it's somehow faster than IPv4 with my ISP (CG-NAT related, I guess?). Staying on 22.03.3 will do for me: the switch bug is manageable, but this one is more troublesome.

Hi!

In my case, ping (IPv6) from the wireless devices to the router or to the Internet doesn't work. This is to be expected, since the router sends NDP packets to the wireless devices but it gets no replies. So it's not possible for ping to work, as there are no return packets.

Yeah, CGNAT can be an issue with some providers. It could also be due to traffic shaping.

Maybe a VPN could be of help to you. Some of them have IPv6 servers, so you could tunnel IPv4 traffic over IPv6, circumventing the low IPv4 speeds on your ISP.