[Solved] MAC address causing DHCP weirdness with Xfinity/Comcast WAN DHCP

Hi folks, just a PSA here in the hopes that it saves someone the weeks of effort I went through trying to figure this out:

TL;DR: My ISP did "upgrades" which included refusing DHCP to any device with the U/L bit set in its MAC address. By default, my OpenWrt OS changed my WAN MAC from the device's original hardware 14:xx:xx:xx:xx:xx address to a "locally administered" 16:xx:xx:xx:xx:xx address, and caused very weird DHCP behavior until I figured it out. Is there a reason this bit is set on the WAN MAC address by default on a fresh OpenWrt installation? Is it normal/expected for my ISP to refuse service to such an address? What's the best fix here in folks' opinion?

Long version:

I have a Linksys WRT1200AC that I originally loaded with 22.03.5, then successfully upgraded to 23.05.0 with no issues. About a month ago, my ISP (Xfinity/Comcast in northern New England) sent out an email that we should expect outages as they "perform upgrades."

The outage happened, so I turned off my router and modem (Netgear CM600) for a while to wait it out. When I turned them back on (the modem first, waiting for all the LEDs, and then the router, as usual; this order will become important), I could not connect to the internet through the router. The weird part is that if I connected my laptop directly to the modem via Ethernet cable, it connected to the internet with no issues.

So I hooked the router back up to the modem, power cycled them as before, and logged in to LuCI, only to find that my router's WAN had no IP address assigned... I installed tcpdump to take a closer look and found that my router was sending DHCP Discover packets, but none were being answered by the modem/ISP. I then hooked the laptop up to the modem again and used Wireshark to confirm that, as expected, the whole DHCP exchange worked perfectly there, so I figured there must be something wrong with the router's DHCP Discover packets.

<tangent> Now, where this gets really weird is that, by some miracle, if I turn the router on before the modem, everything just works, and I get my router WAN IP assigned via DHCP just fine. :exploding_head: </tangent>

Well anyway, this reduced the urgency of finding a fix, but I couldn't let it go. After weeks of tweaking the router's DHCP settings to match the laptop's and getting nowhere, I replayed a bunch of raw DHCP packets from my laptop and started tweaking bits to see what would get my modem to send back a DHCP Offer. As the title suggests, it wasn't the contents of the DHCP Discover at all: the seventh most significant bit (the U/L bit, 0x02 in the first octet) of my router's MAC address was set, and my ISP was refusing to continue the DHCP negotiation until I manually unset it.
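For anyone who wants to check their own addresses, here is a minimal Python sketch of what the U/L bit looks like in practice. The function names and the example MACs are made up for illustration (the real addresses in this thread are redacted), and this is not OpenWrt code.

```python
UL_BIT = 0x02  # U/L bit: second-least-significant bit of the first octet


def parse_mac(mac: str) -> bytes:
    """Turn 'aa:bb:cc:dd:ee:ff' into six raw bytes."""
    octets = bytes(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError(f"expected 6 octets, got {len(octets)}")
    return octets


def format_mac(octets: bytes) -> str:
    """Turn six raw bytes back into colon-separated lowercase hex."""
    return ":".join(f"{b:02x}" for b in octets)


def is_locally_administered(mac: str) -> bool:
    """True if the U/L bit is set, i.e. the address is 'locally administered'."""
    return bool(parse_mac(mac)[0] & UL_BIT)


def clear_ul_bit(mac: str) -> str:
    """Return the same address with the U/L bit cleared."""
    octets = bytearray(parse_mac(mac))
    octets[0] &= ~UL_BIT & 0xFF
    return format_mac(bytes(octets))


# Hypothetical addresses shaped like the ones in this post.
print(is_locally_administered("14:00:00:00:00:01"))  # hardware-style address
print(is_locally_administered("16:00:00:00:00:01"))  # same address with U/L set
print(clear_ul_bit("16:00:00:00:00:01"))
```

The only difference between a 14:... and a 16:... first octet is that single 0x02 bit, which is why the two addresses look so similar at a glance.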

<tangent-explanation> But what about the case when I turned the router on before the modem? Why was it working then? Well, I honestly have no idea, but after packet-sniffing that scenario, it seems to have something to do with the modem... This modem, like many others, lets you go to 192.168.100.1 in your browser to manage it via HTTP. To make that possible, if you are connected to it as it boots, it will answer a DHCP negotiation by offering an address of 192.168.100.10 to the host connected to it, so that the modem can be reached by that host's browser. Anyhow, for some reason, when this modem DHCP negotiation succeeds, the ISP DHCP negotiation also succeeds, and everything just works. Maybe the modem passes on its MAC address to the ISP in this case to get a DHCP IP assigned? Who knows. </tangent-explanation>

The more important question to me at this point is, what's the best fix here? Should OpenWrt not be setting the U/L bit on the WAN MAC address by default? Does anyone know where or why this happens? Or should I call my ISP and see if I can get an explanation from them, and convince them to respond to DHCP negotiations from "locally administered" MAC addresses?

Thanks in advance for any insight, and sorry for the long read!

EDIT: It seems that the issue wasn't the U/L bit, but instead the fact that the WAN MAC was at all different than the LAN MACs, see @psherman's post below.

I suspect that the issue is with the built in switch in your router. Chances are that it starts up as an unmanaged switch on all ports, including the wan port. This will cause dhcp requests to “leak” from devices on the lan to the cable modem. The cable modem, in turn, learns the mac address of one of the lan devices and then refuses to provide an address to the router itself when the dhcp client starts running on the router’s wan port.

This is an issue with the boot loader on a number of router devices. Once openwrt (or really any router firmware) has completed booting and has reconfigured the switch as a managed switch with the wan separated from the other ports, the dhcp requests no longer leak from the lan.

By allowing your router to fully boot before booting or connecting the cable modem, it ensures that the switch is properly configured before the cable modem is up/connected and therefore the router’s MAC address is learned as it requests a dhcp address.
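To make the boot-order effect described above easier to picture, here is a toy Python model. It assumes the modem simply locks onto the first client MAC it sees and refuses everyone else, which is a simplification for illustration and not documented modem behavior; the class, the function, and the MAC addresses are all hypothetical.

```python
class ToyModem:
    """Toy model: the modem serves only the first client MAC it ever sees."""

    def __init__(self):
        self.learned_mac = None

    def see_frame(self, src_mac: str) -> None:
        # Learn the first source MAC and ignore later ones.
        if self.learned_mac is None:
            self.learned_mac = src_mac

    def dhcp_discover(self, src_mac: str) -> bool:
        """Return True if this toy modem would answer the Discover."""
        self.see_frame(src_mac)
        return src_mac == self.learned_mac


def boot(modem_up_first: bool, wan_mac: str, leaked_mac: str) -> bool:
    """Simulate one power-up and report whether the WAN gets a lease."""
    modem = ToyModem()
    if modem_up_first:
        # Bootloader phase: the switch is still unmanaged, so a frame
        # with some other MAC leaks out of the WAN port to the modem.
        modem.see_frame(leaked_mac)
    # If the router boots first, the modem is still off during the
    # bootloader phase and never sees the leaked frame.
    return modem.dhcp_discover(wan_mac)


WAN_MAC = "16:00:00:00:00:01"     # made-up WAN address
LEAKED_MAC = "14:00:00:00:00:01"  # made-up address leaked during early boot

print(boot(modem_up_first=True, wan_mac=WAN_MAC, leaked_mac=LEAKED_MAC))
print(boot(modem_up_first=False, wan_mac=WAN_MAC, leaked_mac=LEAKED_MAC))
```

In this model the modem-first boot fails and the router-first boot succeeds, matching the symptoms reported in the first post. It also shows why the failure goes away whenever the leaked MAC and the WAN MAC happen to be identical.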

The rest of your analysis with the U/L bit may well be a red herring when taken in context with the above.

So, this could well be possible, but it seems unlikely: no active devices were connected to the LAN while the router booted (there were two LAN cables connected to computers that were off), and the WLAN takes quite a while to actually come up.

Also, it doesn't explain why unsetting the U/L bit magically makes this issue go away, and why this issue started occurring only after my ISP upgrade. I will test this by removing all LAN cables, and disabling all wireless, and setting U/L bit back on the WAN MAC address, and giving the router ample time to negotiate a DHCP connection, then connecting my laptop via ethernet on the LAN to examine the situation and report back.

It could also happen if the router has multiple MACs on the ethernet subsystem -- one for the wan, one for the lan. If these get bridged temporarily, it could cause the cable modem to learn the wrong MAC.

I've switched away from cable, but IIRC, you can see the learned MAC in the status page when you visit 192.168.100.1.

Now that, I think, is way more likely... It still doesn't explain why it only started happening after the ISP upgrades (maybe the DHCP negotiation happens faster or something), but it could definitely be possible. I will also test that by trying a different MAC address, with the U/L bit still unset, and see if that's what's happening. And I'll check the modem too.

"upgrades"... It's Comcast... need we even speculate :stuck_out_tongue_closed_eyes:

The factory MAC on the sticker is supposed to be the basis for default MACs, with the LSB changed for various interfaces. The factory MAC is always somewhere in the flash, but manufacturers store it in various different ways, and every model port of OpenWrt needs code to find it. There may be some models where it doesn't work and you need to manually install it with an option macaddr in the config.
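As a rough illustration of deriving per-interface addresses from a base MAC by changing the low-order bits, here is a small Python sketch. The `mac_add` name, the offsets, and the base address are assumptions made up for this example; it is not the code OpenWrt actually runs for any particular model.

```python
def mac_add(mac: str, offset: int) -> str:
    """Add an integer offset to a MAC address, wrapping at 48 bits."""
    value = int(mac.replace(":", ""), 16)
    value = (value + offset) % (1 << 48)
    raw = f"{value:012x}"
    return ":".join(raw[i:i + 2] for i in range(0, 12, 2))


# Hypothetical factory MAC standing in for the one on the sticker.
base = "14:00:00:00:00:10"

print(mac_add(base, 0))  # e.g. used for the LAN ports
print(mac_add(base, 1))  # e.g. used for the first radio
print(mac_add(base, 2))  # e.g. used for the second radio
```

Addresses derived this way keep the U/L bit of the base address, unlike the WAN address discussed in this thread, which differs from the factory MAC in the first octet.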

It could also happen if the router has multiple MACs on the ethernet subsystem -- one for the wan, one for the lan. If these get bridged temporarily, it could cause the cable modem to learn the wrong MAC.

Ok, so after some further investigation, this seems to be what actually happened... I changed my WAN MAC address to a 14:xx:xx:xx:xx:yy address that was different from the factory MAC on my LAN ports, and the issue still occurred even though the U/L bit wasn't set. I wasn't able to confirm on the modem itself because it apparently doesn't show the learned MAC address on its status page, but it definitely wasn't the U/L bit itself; a difference in any bit would have triggered this issue. I still have to ask why the U/L bit gets set on the WAN MAC, though. @mk24, the LSB only seems to get changed for wireless interfaces; the 7th MSB is what's getting set on the WAN MAC, not sure where or why though.

If a DHCP request from another MAC "leaks" through, the cable system will remember that and refuse to serve a different MAC for a period of time, possibly until the modem has been powered off.

The MAC that you use on the WAN port must be globally unique, at least within the extent of the cable system. If another customer has set the same MAC as yours, there will be a race condition and only one will be served.

There is a lot that goes on inside a cable modem that is not visible to the customer and really not documented either. The 192.168.100 IPs that are temporarily assigned should have a very short lease time, so once connectivity over cable is established, the public IP can be assigned with a renewal.

I use a UPS because any way you slice it, rebooting a router tends to be disruptive.

Yeah.... I'm not surprised, but glad we could identify the root cause.
Some routers exhibit this behavior, others do not -- it comes down to the way the bootloader initializes the switch chip in the early boot process.

Although this doesn't necessarily 'solve' the issue, it does explain it (solving it requires a different bootloader). As such...

If your problem is solved, please consider marking this topic as [Solved]. See How to mark a topic as [Solved] for a short how-to.
Thanks! :slight_smile:

So, I don't think that a bootloader fix is quite necessary to "fix" the problem... Like I was mentioning earlier, is it really necessary for the default OpenWrt setup to set the U/L bit on the WAN MAC? Could we just have it use the hardware MAC, which is also assigned to all the LAN ports? If so, does anyone know where that occurs?

It is the bootloader that would need to be modified because the problem occurs very early in the boot process.

OpenWrt can't solve the issue because it hasn't even started at the time that the problem is happening. The bootloader is solely in control for the early boot phase.

You could always override the MAC address in the OpenWrt configuration such that it is the same as the MAC address that was learned by the modem when the bootloader bridged the ports together, but the problem is that this may not be deterministic (especially if you have multiple ethernet devices connected to the router).

The only fully reliable way to fix the problem is to replace the bootloader with one that doesn't bridge the ports. There are other kludge fixes that could be attempted, but only modifying the bootloader will actually solve the root cause.

I agree that a bootloader fix is necessary to fully address the root cause, but if OpenWrt uses the hardware MAC by default, and it's the same as the MAC that's being used during the early boot phase, then this problem is less likely to noticeably manifest in the wild, isn't it?

Yes and no. Sure, you can set the openwrt MAC address to be the same as the hardware mac (it actually is by default). The problem occurs when one of two things happens:

  • there are two MAC addresses in the router, and the early phase boot process exposes both to the upstream due to bridging. This can, in some situations, become a race condition or coin toss as to which one the cable modem learns.
  • if there are other devices also connected to the switch, their MAC addresses may also be leaked to the modem, and if that happens, the learned mac may be something not related to the MAC addresses held by the router.

Therefore, this situation is rife with non-deterministic outcomes due to the implementation of the boot loader.
