Clients lose IPv6 connection after some time

Bld · July 7, 2020, 5:04pm

Hi,

I am running into a problem that, at first glance, seems similar to "Clients lose IPv6 connection after a few hours", but I then realized it's different. I did some research but still have no clue...

I am running the stable release OpenWrt 19.07.3 r11063-85e04e9f46.

Symptoms: my clients magically lose IPv6 connectivity after working fine for some time. When it happens to one client, the other clients continue to work. When this happens:

ping6 ipv6.google.com on broken client => 100% packet loss.
ping6 ipv6.google.com on router works fine.
ping6 <another_working_client_ip> on broken client works fine.
ping6 <broken_client_ip> on the working client works fine. (But see the behavior below first - it might just be because I pinged the working client from the broken client first.)
ping6 ipv6.google.com on broken client => still 100% packet loss.
ping6 <broken_client_ip> on router => 100% packet loss.
Now, if I ping6 <router_ip> from the broken client, it works. And after this, the broken client's IPv6 connectivity magically comes back.

My configuration is quite standard - DHCPv6 disabled and just RA. My ISP's native IPv6 has some funny behavior so I am using 6in4 with HE TunnelBroker.

# cat /etc/config/dhcp

config dhcp 'lan'
	option interface 'lan'
	option start '100'
	option limit '150'
	option leasetime '12h'
	option ra 'server'

# cat /etc/config/network

config interface 'lan'
	option type 'bridge'
	option ifname 'eth0.1'
	option proto 'static'
	option netmask '255.255.255.0'
	option ipaddr '192.168.0.1'
	option ip6assign '64'

config interface 'wan6'
	option ifname 'eth0.2'
	option proto '6in4'
	option username '<redacted>'
	option peeraddr '<redacted>'
	option ip6addr '2001:aaaa:bbbb:cccc::2/64'  # HE local endpoint address
	option tunnelid '<redacted>'
	option password '<redacted>'
	list ip6prefix '2001:dddd:eeee::/48'        # HE routed prefix

And the routes on the router (I believe they are the same in the working and non-working states):

# ip -6 route

default from 2001:aaaa:bbbb:cccc::/64 dev 6in4-wan6  metric 1024
default from 2001:dddd:eeee::/48 dev 6in4-wan6  metric 1024
2001:aaaa:bbbb:cccc::/64 dev 6in4-wan6  metric 256
2001:dddd:eeee::/64 dev br-lan  metric 1024
unreachable 2001:dddd:eeee::/48 dev lo  metric 2147483647  error -148
fd3c:ce92:51fb::/64 dev br-lan  metric 1024
unreachable fd3c:ce92:51fb::/48 dev lo  metric 2147483647  error -148
fe80::/64 dev eth0  metric 256
fe80::/64 dev eth0.2  metric 256
fe80::/64 dev br-lan  metric 256
fe80::/64 dev wlan1  metric 256
fe80::/64 dev wlan0  metric 256
fe80::/64 dev 6in4-wan6  metric 256
anycast 2001:aaaa:bbbb:cccc:: dev 6in4-wan6  metric 0
anycast 2001:dddd:eeee:: dev br-lan  metric 0
anycast fd3c:ce92:51fb:: dev br-lan  metric 0
anycast fe80:: dev eth0  metric 0
anycast fe80:: dev eth0.2  metric 0
anycast fe80:: dev br-lan  metric 0
anycast fe80:: dev wlan1  metric 0
anycast fe80:: dev wlan0  metric 0
anycast fe80:: dev 6in4-wan6  metric 0
ff00::/8 dev eth0  metric 256
ff00::/8 dev br-lan  metric 256
ff00::/8 dev eth0.2  metric 256
ff00::/8 dev wlan1  metric 256
ff00::/8 dev wlan0  metric 256
ff00::/8 dev 6in4-wan6  metric 256

All these look perfectly normal to me. Am I missing something here? Any hints on what I should do to debug this?

Thanks a lot!

vuhuy · July 7, 2020, 9:18pm

I think it's related to the link-layer address. Do you see a difference in the output of ip -6 neigh between the working and non-working states on your client?

Bld · July 7, 2020, 9:55pm

Thanks for helping!

I thought about this too (because pinging the router magically solves it). But packets between a good and a bad client travel correctly (yet after this the router still cannot ping the bad client), so I just assumed that the neighbor entries were correct.

(But I have to admit that I never seriously read how IPv6 NDP works and am basically reusing my understanding of ARP... Apologies if I have some terrible misunderstanding.)

On router:

Bad:

<client_ip> dev br-lan  used 8/130/7 probes 6 FAILED

Good:

<client_ip> dev br-lan lladdr <client_mac> ref 1 used 1/24/3 probes 1 DELAY

On client:

Bad: it doesn't have an entry for router.

Good:

<router_ip> dev en0 lladdr <router_mac> REACHABLE

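To compare the two states side by side, a small filter over the ip -6 neigh output can help. This is only a sketch: neigh_summary is a made-up helper name, and it assumes the NUD state is the last field on each line, as in the output above.

```shell
# Summarize `ip -6 neigh` output read from stdin.
# Prints one line per entry: address, NUD state, link-layer address ("-" if none).
neigh_summary() {
    awk 'NF {
        lladdr = "-"
        for (i = 1; i < NF; i++)
            if ($i == "lladdr") lladdr = $(i + 1)
        print $1, $NF, lladdr
    }'
}
```

On the router it could be run as ip -6 neigh show dev br-lan | neigh_summary; a FAILED entry with "-" in the last column matches the bad case shown above.
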
vuhuy · July 7, 2020, 10:34pm

Yeah, that's the problem right there: the client has no link-layer address entry for the router. Unfortunately, I can't explain why that entry is missing. Do you manually configure the clients for IPv6, or does IPv6 autoconfiguration do the job for you? I assume it's the latter, so let's also assume your router is correctly advertising its link-layer address, since other clients are working (you can confirm this by monitoring the ICMPv6 headers sent in the RAs with sudo tcpdump -i yourinterfacename ip6 -vv).
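
A narrower capture that shows only Router Advertisements might look like the sketch below. capture_ra is a hypothetical helper name, and the filter assumes the ICMPv6 header directly follows the fixed 40-byte IPv6 header (no extension headers), where type 134 is a Router Advertisement. With -vv, tcpdump should print the source link-address option if the router includes one.

```shell
# Capture only Router Advertisements on the given interface (usually needs root).
# $1: interface name, e.g. en0 on the client or br-lan on the router.
capture_ra() {
    # ip6[40] is the ICMPv6 type byte when no extension headers are present;
    # 134 is the Router Advertisement type.
    tcpdump -i "$1" -n -vv 'icmp6 and ip6[40] == 134'
}
```

For example: capture_ra en0, then wait for the next periodic RA.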

If my assumptions are correct, we should focus on the client. Maybe check the IPv6 kernel attributes (sysctl net.ipv6.conf.youradaptername, focusing on the accept_ra* attributes). They are unlikely to be the cause in this context, but checking won't hurt.
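
On a Linux client, the same values can also be read straight from procfs. A rough sketch, where show_accept_ra is a made-up helper and the optional second argument exists only so the function can be pointed at a different directory:

```shell
# Print every accept_ra* setting of one interface as "name = value".
# $1: interface name
# $2: optional base directory (defaults to the Linux procfs location)
show_accept_ra() {
    iface=$1
    base=${2:-/proc/sys/net/ipv6/conf}
    for f in "$base/$iface"/accept_ra*; do
        # Skip the unexpanded pattern if nothing matched.
        [ -r "$f" ] || continue
        printf '%s = %s\n' "${f##*/}" "$(cat "$f")"
    done
}
```

Usage would be something like show_accept_ra wlan0. This does not apply to macOS, which has no /proc.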

Lastly, I had a similar problem with Ubuntu 18.04 in the past, where it would lose its IPv6 connection after a while. Other clients, like Ubuntu 20.04 and Windows, worked flawlessly. The configuration was identical (netplan & kernel attributes). I couldn't figure it out, so I just updated the affected boxes to 20.04.

Posting your client configuration would definitely help here: how you configured them, the kernel parameters, which OS and such (Ubuntu basically ignores the network kernel parameters set by sysctl). Do all clients lose connection after a while, or is it a single box / specific OS?

I should note that I don't have much experience with 6in4 tunnels, should the problem be related to that.

lleachii · July 8, 2020, 6:21pm

Also add:

option ip6class 'wan6'

Bld · July 8, 2020, 6:39pm

Yes I'm relying on auto configuration.

I have observed this on 2 OSX devices (one 10.14 and another 10.15), as well as an Android 10 phone. I have devices running Windows and Ubuntu too, and haven't observed this on them yet.

(As this happens randomly, I haven't had a chance to get to the non-working state yet. Guess I really need to run Wireshark on the OSX devices to figure out what's going on...)

Edit: just realized that all devices having issues are connected through Wi-Fi, while the wired ones seem to be more stable. Just switched one Ubuntu box to use Wi-Fi. (I don't know whether I'm hoping for it to break or not...)

Bld · July 8, 2020, 6:41pm

This would disable the ULA prefix on lan, I think? But I don't use it anyway. Let me give it a try. Thanks for the suggestion!

lleachii · July 8, 2020, 6:54pm

Then:

option ip6class 'wan6 local'

vuhuy · July 8, 2020, 7:54pm

Mmm okay, this is getting somewhere... Android and MacOSX both have very weird IPv6 behavior that doesn't follow the RFCs. Speaking of Android, I've seen cases where it wouldn't accept ULA addresses for DNS; maybe it's the same for neighbor solicitation packets. Anyway, behavior differs from device to device and even across Android versions. For MacOSX, I recently encountered one device that didn't accept neighbor solicitation packets with ULA addresses and only accepted link-local addresses. I don't know the OSX version, but I guess it was an up-to-date box in a non-OpenWRT environment, and I encountered it around the end of 2019.

If I recall correctly, disabling ULA prefixes in OpenWRT makes odhcpd send link-local addresses in response to neighbor solicitation. Maybe you could give that a try, and can you confirm that the advertised address is not a link-local address (you've masked your ip -6 neigh output)? You'll lose the ULA address functionality, but at least you'll have tackled the problem.

vuhuy · July 8, 2020, 8:03pm

Well, if this disables ULA prefixes, this should also work, as long as link-local addresses are sent for neighbor discovery.

// edit: so either option ip6class 'wan6' or comment out the option ula_prefix in /etc/config/network on your OpenWRT device. Whatever works to make odhcpd send link-local addresses.
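
Expressed as uci commands on the router, the two options could look like this. Treat it as a sketch: the function names are made up, and it assumes the default section names network.lan and network.globals.

```shell
# Option 1: only hand out prefixes that come from wan6 on the LAN.
lan_use_wan6_prefix_only() {
    uci set network.lan.ip6class='wan6'
    uci commit network
}

# Option 2: remove the ULA prefix entirely.
remove_ula_prefix() {
    uci delete network.globals.ula_prefix
    uci commit network
}
```

Call one of them, then reload the network (for example with /etc/init.d/network reload) so the change takes effect.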

Bld · July 13, 2020, 5:22am

Yeah Android is... (sigh) It doesn't even care about DHCPv6, at least it was the case when I last checked.

Yes the neighbor addresses I pasted above are all with prefix of wan6.

After applying the changes, I think I have not observed this on the OSX devices anymore.

The Android device probably has another issue now... On 5GHz, it loses its IPv6 connection randomly, but this doesn't happen on 2.4GHz. (And pinging the router from there doesn't solve it.) This doesn't make a lot of sense to me, so I am temporarily forcing the Android device to use 2.4GHz to verify whether it's actually related. Probably it's just mt76xx behaving weirdly, since OpenWrt's support for it is not very stable yet.

vuhuy · July 13, 2020, 8:29am

Yes that is correct, Android doesn't do DHCPv6, but you probably don't need DHCPv6 in most home situations.

I run OpenWRT with DHCPv6 PD to get my unique prefix, and clients on the LAN side only use stateless autoconfiguration (SLAAC). That is: RA service set to server mode, DHCPv6 service disabled, and NDP proxy disabled. The announced DNS server is the GUA address of my router, with a script that updates it if my IPv6 prefix changes (my ISP gives me a dynamic IPv6 prefix).
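
As uci commands, that LAN setup could be sketched roughly as follows. lan_slaac_only is a made-up name, the section name dhcp.lan is assumed to be the default one, and the DNS address is passed in by the caller because it depends on your prefix.

```shell
# Configure the LAN for SLAAC only and announce one DNS server in the RAs.
# $1: the router's GUA address to announce as DNS server
lan_slaac_only() {
    uci set dhcp.lan.ra='server'        # send Router Advertisements
    uci set dhcp.lan.dhcpv6='disabled'  # no DHCPv6 service
    uci set dhcp.lan.ndp='disabled'     # no NDP proxy
    uci add_list dhcp.lan.dns="$1"      # DNS server announced to clients
    uci commit dhcp
}
```

For example, lan_slaac_only 2001:db8:1::1 (a documentation-range placeholder), followed by /etc/init.d/odhcpd restart.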

SLAAC addresses don't change, so you don't need DHCPv6 for static addresses. The only downside is that you can't set an easy-to-remember suffix for IPv6 devices, but IPv6 is meant to be used with DNS anyway, instead of relying on IPv6 addresses.

I'm not sure anymore, but I know I also had problems with Android dropping IPv6 connections; it was very unstable. Announcing a DNS server with a GUA address seemed to be the key to a stable, working IPv6 connection on Android.

With this configuration (along with disabling ULA prefixes) I got a stable, working IPv6 connection on all network devices (Windows, Linux, Mac OS, iOS and Android). I don't know about the stability of MT devices; it would be unfortunate if that's what is causing IPv6 instability on Android for you.

cvmiller · July 13, 2020, 3:14pm

Sounds like the config has gone sideways somehow. The default config of OpenWrt supports IPv6 quite well.

I would suggest writing down your HE tunnel information and resetting the router to defaults. Then add only the HE tunnel config (and the required packages for 6in4), and see if that gives you IPv6 stability.

knacky · July 13, 2020, 3:22pm

It's 2020, and Android still doesn't support DHCPv6. You can thank one single person at Google for this.

I'm not implying Apple is any better. iOS is its own shitshow. But at least it supports DHCPv6.

vuhuy · July 13, 2020, 4:51pm

Nope, his config didn't go sideways. As @knacky said, Android and Apple devices still have some weird quirks with their IPv6 implementations. This has nothing to do with OpenWRT and still has to be accounted for regardless of the router manufacturer and use of a tunnel broker.

cvmiller · July 13, 2020, 8:02pm

Could be. I haven't had any problems like this with my Apple and Android devices, and I ran an HE tunnel for 9 years (I now have a native IPv6 connection).