IPv6 default gateway not being given out to clients?

aleks-mariusz · July 29, 2024, 8:59am

So i don't know at what point, but this weekend i noticed my ipv6 connectivity on my laptop seems to have broken. I'm not being given a default ipv6 gateway (even after releasing/renewing the dhcp and rebooting the macbook).

My ISP is def IPv6 compatible (as it used to work) and on my macbook, i used to get 19/20 score on ipv6-test but now it totally flunks that bit and only shows ipv4 connectivity.

I verified i have a default gateway on the router

root@router-main:/etc/config# ip -6 r sh | grep ^def
default from 2a02:8011:aaaa:bbbb::/64 via fe80::827f:f8ff:fe74:b8e7 dev pppoe-wan proto static metric 512 pref medium
default from 2a02:8012:aaaa::/48 via fe80::827f:f8ff:fe74:b8e7 dev pppoe-wan proto static metric 512 pref medium

and there i can ping6 google.com, but from my macbook, i seem to be missing the gateway.

I also see these messages on the router from logread:

Mon Jul 29 08:52:00 2024 daemon.warn odhcpd[2888]: No default route present, overriding ra_lifetime to 0!
Mon Jul 29 08:52:01 2024 daemon.warn odhcpd[2888]: No default route present, overriding ra_lifetime to 0!
Mon Jul 29 08:52:04 2024 daemon.warn odhcpd[2888]: No default route present, overriding ra_lifetime to 0!
Mon Jul 29 08:52:08 2024 daemon.warn odhcpd[2888]: No default route present, overriding ra_lifetime to 0!

Why does odhcpd think i have no default route, when the route table clearly has a default route ?

trendy · July 29, 2024, 9:43am

Please run the following commands (copy-paste the whole block) and paste the output here, using the "Preformatted text </> " button:

Remember to redact passwords, MAC addresses and any public IP addresses you may have

ubus call system board; \
uci export network; \
uci export dhcp; uci export firewall; \
ip -6 addr ; ip -6 ro li tab all ; ip -6 ru; \
ifstatus wan6; ifstatus lan

Please adapt the interfaces for the last 2 commands if they are not named like this.
For the masking part of the IPv6 address, please make it consistent all over the console output.

aleks-mariusz · July 29, 2024, 7:30pm

I've not done anything but waited and my macbook now has an ipv6 default gateway.. i'll wait to see if/when it happens again, then i'll post the output of those diagnostic commands.

Curiously, I still see these messages regularly:

Mon Jul 29 19:20:38 2024 daemon.warn odhcpd[2888]: No default route present, overriding ra_lifetime to 0!
Mon Jul 29 19:20:39 2024 daemon.warn odhcpd[2888]: No default route present, overriding ra_lifetime to 0!
Mon Jul 29 19:20:42 2024 daemon.warn odhcpd[2888]: No default route present, overriding ra_lifetime to 0!
Mon Jul 29 19:20:46 2024 daemon.warn odhcpd[2888]: No default route present, overriding ra_lifetime to 0!

trendy · July 29, 2024, 7:56pm

You can always announce the default route to clients regardless of whether a default route is present on the router.
https://openwrt.org/docs/guide-user/network/ipv6/ipv6_extras#announcing_ipv6_default_route

aleks-mariusz · September 6, 2024, 9:55am

Thanks for the suggestions, although I was hoping to understand the underlying reason this breaks sometimes, rather than just work-around the issue.

So I did a bit more digging on this, and think i have uncovered a potential cause (and a possible idea on a proper solution).

What I noticed accompanying this issue is a kernel message on my router such as:

Thu Sep  5 11:31:49 2024 kern.info kernel: [57500.710792] IPv6: br-lan.2220: IPv6 duplicate address 2xxx:yyyy:zzzz:2220:: used by 52:54:xx:yy:zz:c6 detected!

At this point, since the kernel message happens fairly regularly (every ~10 minutes), i thought i'll just wait and do a tcpdump to see what's happening at the network level:

11:31:49.668555 60:38:xx:yy:zz:10 > 33:33:ff:00:00:00, ethertype IPv6 (0x86dd), length 86: :: > ff02::1:ff00:0: ICMP6, neighbor solicitation, who has 2xxx:yyyy:zzzz:2220::, length 32
11:31:49.769601 52:54:xx:yy:zz:c6 > 33:33:00:00:00:01, ethertype IPv6 (0x86dd), length 86: fe80::a8b4:7a51:21a4:a9ba > ff02::1: ICMP6, neighbor advertisement, tgt is 2xxx:yyyy:zzzz:2220::, length 32
Thu Sep  5 11:31:49 2024 kern.info kernel: [57500.710792] IPv6: br-lan.2220: IPv6 duplicate address 2xxx:yyyy:zzzz:2220:: used by 52:54:xx:yy:zz:c6 detected!

The IPv6 address referenced is my vlan-interface on br-lan for vlan 2220.. It looks like my router is checking if it's in use by asking who has it, expecting no response, but instead for some reason, another system on the network is responding advertising itself strangely. The culprit mac address is from a kvm virtual-machine i have on my NAS on the same network. It happens to be this is the same VLAN that i am not getting an IPv6 assignment on, so I thought this is probably not just a coincidence and could be the cause?
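
For anyone reading the capture above: the destination of the router's probe is not arbitrary. A duplicate address detection probe goes from `::` to the solicited-node multicast group of the address being tested, and that group maps to a `33:33:...` Ethernet address. Below is a minimal sketch of that mapping using only Python's standard library. The `2001:db8:...` address is a documentation-prefix stand-in for the redacted one, and the function names are made up for illustration.

```python
import ipaddress


def solicited_node_multicast(addr: str) -> ipaddress.IPv6Address:
    """Return the solicited-node group ff02::1:ffXX:XXXX for an IPv6 address."""
    low24 = int(ipaddress.IPv6Address(addr)) & 0xFFFFFF  # last 24 bits of the target
    base = int(ipaddress.IPv6Address("ff02::1:ff00:0"))
    return ipaddress.IPv6Address(base | low24)


def multicast_mac(group: ipaddress.IPv6Address) -> str:
    """Map an IPv6 multicast group to its Ethernet address (33:33 + low 32 bits)."""
    low32 = int(group) & 0xFFFFFFFF
    octets = [0x33, 0x33] + list(low32.to_bytes(4, "big"))
    return ":".join(f"{o:02x}" for o in octets)


# Stand-in for the redacted router address (interface ID of all zeros).
target = "2001:db8:0:2220::"
group = solicited_node_multicast(target)
print(group, multicast_mac(group))
```

For an address whose interface ID is all zeros this gives `ff02::1:ff00:0` and `33:33:ff:00:00:00`, which matches the first line of the tcpdump.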

This VM is the Home Assistant VM I recently set up (something I've been trying to play with but so far haven't gotten much done with), and when I shut this VM off, IPv6 assignments start to work.

An IP conflict, I hear you say. So of course, before I shut it down, I logged into the VM and checked all address assignments. No such address appears there, so I am not sure why this VM is attempting to use this address. Back on the router, I checked my /etc/config/dhcp and the config host section for this MAC address: this host's entry has option hostid '00000011', and indeed the IPv6 address it's getting is 2xxx:yyyy:zzzz:2220::11 (which is probably the last IPv6 address assignment before the rest of the LAN stops getting IPv6 once this issue manifests).

I did some research and came across this post describing a somewhat obscure cause, though I don't fully understand why things would be configured this way.

The Proxmox virtualization host on which the RouterOS virtual machine runs has IGMP snooping turned on by default, which cut these packets.

The solution to disable IGMP snooping on a specific bridge is:
echo 1 > /sys/devices/virtual/net/bridge/bridge/multicast_querier
echo 0 > /sys/devices/virtual/net/bridge/bridge/multicast_snooping

Where bridge is the name of the bridge that we use to communicate with the virtual machine.

I have no idea why my bridge would be configured with snooping, but as per this guidance, I tweaked the bridge settings via sysctl on my hypervisor host (CentOS, not Proxmox), which is what runs the VM, then rebooted the VM. It seems to be resolved now, no more conflict.
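
To confirm what a bridge is currently set to before and after a change like this, the two values can simply be read back. This is a rough sketch, assuming the sysfs layout from the quoted post (`/sys/devices/virtual/net/<bridge>/bridge/...`). The function name and the `root` parameter are made up for illustration, and `br0` in the note below is just an example bridge name.

```python
from pathlib import Path


def bridge_multicast_settings(bridge: str, root: str = "/sys/devices/virtual/net") -> dict:
    """Read the multicast snooping/querier flags of a Linux bridge from sysfs.

    `root` is only a parameter so the function can be pointed at a copy of the
    tree; on a real host the default should be the right place to look.
    """
    base = Path(root) / bridge / "bridge"
    settings = {}
    for name in ("multicast_snooping", "multicast_querier"):
        # Each file holds a single integer, 0 (off) or 1 (on).
        settings[name] = int((base / name).read_text().strip())
    return settings
```

Calling `bridge_multicast_settings("br0")` on the hypervisor after applying the two `echo` commands from the quoted post should return `{"multicast_snooping": 0, "multicast_querier": 1}`. A missing bridge raises `FileNotFoundError`.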

I'll post if this breaks again, but this may have been the cause of IPv6 assignments stopping.

mk24 · September 6, 2024, 1:46pm

IGMP is strictly a v4 protocol so I don't think it would have anything to do with this. IPv6 implements multicasting completely differently.
Make sure that all your LANs have unique prefixes (out of the /48 that the ISP routes to you) and that there are no other RA/DHCP servers active on the network.

aleks-mariusz · September 6, 2024, 2:06pm

I think the OP of the referenced post (who i was quoting) used the term IGMP incorrectly, which I agree is for IPv4 but this sysctl seems to toggle both IGMP and MLD (the IPv6 equivalent). It seems to be related to multicast snooping though, as there seem to be several different posts online describing the same phenomenon. That last thing makes me suspect, maybe i'm missing something on the OpenWRT side to permit the default snooping to work.

aleks-mariusz · September 16, 2024, 11:49am

So the issue has returned, despite the settings being set on my KVM host's bridge br0 - but i have at least narrowed it down to the exact series of steps that cause the issue to reproduce:

If the router comes up before the suspect VM boots, IPv6 continues to work fine.
If I reboot the router (as I often do, at least 2-3 times a month, to keep up to date with the master branch) while the VM stays online, then when the router boots up it can no longer use its configured VLAN IPv6 address for some reason, because the VM starts responding to the router's initial neighbor-solicitation queries (standard for an IPv6 host coming online: first it asks whether anyone else has that address before assigning it to itself). And since it can't use the GUA IPv6 address I've assigned it, it doesn't advertise itself as a default router but continues to send RAs while throwing the No default route present, overriding ra_lifetime to 0! errors.

So it's that series of events that seems to send my home-assistant VM into some kind of condition where it tries to respond on behalf of my router's VLAN ipv6 address. Rebooting the VM means the router can then assign itself (without obstruction) the address correctly and IPv6 starts to work.

I need to do some more investigation as to what it is about this particular VM. It usually gets a different IPv6 address, and knows that the default IPv6 route is via that router-address... but for some reason after the router goes down, it starts responding on its behalf when the router boots back up. I've never seen anything like this.

aleks-mariusz · September 16, 2024, 5:05pm

I ended up opening a GitHub issue under the Home Assistant OS issue tracker. Hopefully someone there can explain this behaviour.. the more eyes the better, right?

aleks-mariusz · September 16, 2024, 5:16pm

ahh crap this is happening with a regular Ubuntu 22.04 VM on the same KVM host while the HA vm is shutoff entirely - so i know it's NOT home-assistant related..

I still think it has to do with the bridge and mld snooping or something, but i am at a loss what/how it needs to be configured or if this is something that can be fixed on the OpenWRT side of things.

I have a very detailed and analyzed tcpdump attached to that issue in case anyone has any issues, but i'm at a loss if this is even an OpenWRT issue as well at this point :-/

aleks-mariusz · September 18, 2024, 12:29pm

Ok so I have come to realize the reason for this. It's because i chose the '::' address for the router's static IPv6 assignment in the first place

Solution: I changed my 'ip6ifaceid' from the neat-looking '::' to anything else (I chose '::ffff') and the issue stopped. I had only chosen '::' originally because it looked neat, without realizing the "ticking time-bomb" that I had introduced once I had VMs up on the network (and it could probably happen with any Linux host really, VM or not).
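
For reference, the all-zeros interface ID inside a subnet prefix is what RFC 4291 (section 2.6.1) defines as the Subnet-Router anycast address. A quick way to sanity-check an interface ID before putting it on a router is sketched below with Python's standard library. It only handles the suffix form of `ip6ifaceid` (such as `::1`), the function names are made up, and the `2001:db8:...` prefix stands in for the real one.

```python
import ipaddress


def router_address(prefix: str, ifaceid: str) -> ipaddress.IPv6Address:
    """Combine a prefix with a suffix-style interface ID such as '::1'."""
    net = ipaddress.IPv6Network(prefix)
    suffix = int(ipaddress.IPv6Address(ifaceid)) & int(net.hostmask)
    return ipaddress.IPv6Address(int(net.network_address) | suffix)


def is_subnet_router_anycast(prefix: str, ifaceid: str) -> bool:
    """True if the resulting address is the prefix followed by all-zero host bits."""
    net = ipaddress.IPv6Network(prefix)
    return router_address(prefix, ifaceid) == net.network_address


# Documentation prefix used as a stand-in for the redacted LAN prefix.
lan = "2001:db8:0:2220::/64"
for candidate in ("::", "::1", "::ffff"):
    print(candidate, router_address(lan, candidate), is_subnet_router_anycast(lan, candidate))
```

Only `::` lands on the anycast address here. `::1` and `::ffff` both produce ordinary unicast addresses within the prefix.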

It seems this [2xxx:aaaa:bbbb::] is possibly a "reserved" anycast address, and for whatever reason both of my VMs have this address in their routing table regardless (even after the fix):

user@vmname:~$ ip -6 route show table local| grep anycast | grep 2a02
anycast 2a02:aaaa:bbbb:2220:: dev ens2 proto kernel metric 0 pref medium

...at least until my router goes down, after which it stays on only the one VM that started impersonating the router's address.

I have no idea what mechanism triggers this automated behaviour only once the VM goes down, so if anyone has any idea, i'd be game to understand the underlying behavioural-cause, but at least I have a fix for now.

_bernd · September 18, 2024, 12:38pm

Sure? I thought that <prefix>:: is a valid address, too.
Maybe it's an implementation bug?

aleks-mariusz · September 18, 2024, 12:47pm

A valid address, yes, but not for unicast it seems.. it seems it's a bad idea to use it for a layer 3 interface of a router that's doing SLAAC/DHCPv6 for a LAN segment. It technically worked fine for me until I rebooted the router, which was the trigger, and at least in my experience, using :: is a ticking time-bomb.

I wish i could chalk this up to an implementation bug, but in what/where? the linux-kernel on two different VMs running different distros? it seems it is also defined in this RFC.. what triggers this odd behaviour though, is beyond me.

I've updated the OpenWRT docs with this information to make sure others don't make the same mistake i made.

mk24 · September 18, 2024, 12:57pm

Unlike IPv4, the first and last IPs within an IPv6 subnet are not special or reserved. An ifaceid of all zeros (::) or all ones (::ffff:ffff:ffff:ffff) should be treated the same as any other 64 bit number.
By convention ::1 is used for the router. It is the default if ip6ifaceid is not specified. Though ::0 (equal to ::) also works. I don't see any difference in the RA packets between those two.

aleks-mariusz · September 18, 2024, 1:12pm

ehhhh.. "works".. until it doesn't, and then you spend weeks trying to debug this weirdness where other Linux hosts on your network "take over" the address when your router goes down. I am honestly surprised I seem to be the only one who has experienced this? No one else thought an address ending in only :: looked tidier?

The default value LuCI suggests for this field is even '::1'.. I thought I was being clever by just using ::, then seeing it "worked" (at least at first), unaware of the eventual impact if the router went down.

I still suggest people don't use :: on an interface that does SLAAC/DHCPv6, as I was advised. Plus there's this RFC that may not use the word "reserved" but uses "predefined", which to me suggests it should be avoided unless you actually intend to run (and have correctly configured) anycast on your network. Which I think few of us are doing (at least intentionally).

~~I'm clueless still on why the default behaviour of linux VMs on my network "take over" this address when the router goes down though. But I did very few modifications to a default install~~ on two different distros (Ubuntu and HAOS) and witnessed this, so it's how it comes "out of the box" for handling of these predefined anycast addresses on the same network.

edit: ok, after having read this article, i now realize it might be because both of the VMs are configured with IPv6 forwarding on (net.ipv6.conf.all.forwarding=1). One intentionally, and one seems to be something out-of-the-box i hadn't realized was set this way.. In summary - seems the combination of IPv6-forwarding enabled on other hosts on the network, along with the use of the anycast address on the primary router i think is what leads to this behaviour
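
To check other hosts on the LAN for this condition, one option is to look for the router's address among the `anycast` entries of the local routing table, using the same command shown earlier in this thread. The sketch below only parses text that has already been captured. The function names are made up, and the idea that a host with forwarding enabled adds such an entry for each of its prefixes is my understanding of Linux behaviour, not something confirmed in this thread.

```python
import ipaddress
from typing import List


def anycast_addresses(route_output: str) -> List[str]:
    """Extract addresses from 'anycast ...' lines of `ip -6 route show table local`."""
    addrs = []
    for line in route_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "anycast":
            addrs.append(fields[1])
    return addrs


def claims_router_address(route_output: str, router_addr: str) -> bool:
    """True if the host lists the router's own address as an anycast entry."""
    wanted = ipaddress.IPv6Address(router_addr)
    return any(ipaddress.IPv6Address(a) == wanted for a in anycast_addresses(route_output))


# Line taken from the VM output posted earlier in this thread.
sample = "anycast 2a02:aaaa:bbbb:2220:: dev ens2 proto kernel metric 0 pref medium\n"
print(anycast_addresses(sample))
print(claims_router_address(sample, "2a02:aaaa:bbbb:2220::"))
print(claims_router_address(sample, "2a02:aaaa:bbbb:2220::ffff"))
```

With the router on `::` the check reports a clash. With the router moved to `::ffff` it does not, even though the anycast entry is still present on the VM.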

_bernd · September 18, 2024, 5:10pm

Do you have radv or dnsmasq running on the kvm node?

aleks-mariusz · September 19, 2024, 1:13pm

I believe, every hypervisor host out of the box that runs libvirtd/KVM would usually have dnsmasq running (unless you customize your libvirt networks), it's used for the default virbr0 bridge so hosts can come on their own dedicated network for VMs.

Unless you were asking about the VM guests themselves? Neither of my VM guests had either of those daemons to my knowledge (I could be wrong about the second one, as I didn't create the HAOS distribution). One of these VMs I had set up to provide access to a network that the VM can VPN to (it doesn't need me to run either dnsmasq or radv, though I did have to enable IPv6 forwarding). The other VM is home-assistant-os (HAOS), where they inform me IPv6 is enabled because the Thread protocol runs over IPv6 and the Home Assistant system should allow network-wide access to those IoT devices, since it acts as the "border router" into the Thread network (but I don't have any Thread devices, it just comes out of the box with IPv6 forwarding enabled). I don't know for sure, but I don't believe dnsmasq or radv is running there either.

system · September 29, 2024, 1:14pm

This topic was automatically closed 10 days after the last reply. New replies are no longer allowed.