Switch active before it's configured - avoidable?

CuteRoute · April 28, 2019, 8:49am

Running OpenWRT 18.06.2; I guess this is hardware independent.

A very aggressive DHCP client brought to my attention the fact that the LAN switch is working in "dumb" mode before it's actually configured.

In my case, the LAN ports of the "managed AP" OpenWRT device are supposed to be mapped to VLANs. Interestingly, my central router supports tagged and untagged operation on the same port. Actually, it even forces me to have a non-VLAN IP address for each port. (A DHCP server is of course not required there, but I currently run one for a certain reason.) To make a long story short, that particular DHCP client manages to complete its DHCP exchange before the OpenWRT "managed AP" configures its switch for trunk operation. The DHCP client then gets a non-VLAN IP address assigned from the central router.
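To illustrate the tagged/untagged distinction described above, here is a minimal Python sketch that checks whether a raw Ethernet frame carries an 802.1Q tag. The function name `vlan_of` is made up for this example. The point is only that a frame forwarded by the switch in "dumb" mode arrives at the central router without a tag, so it lands on the non-VLAN interface of that port.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q VLAN tag


def vlan_of(frame: bytes):
    """Return the VLAN ID of an Ethernet frame, or None if it is untagged."""
    if len(frame) < 14:
        raise ValueError("frame shorter than an Ethernet header")
    # Bytes 0-5: destination MAC, 6-11: source MAC, 12-13: EtherType or TPID.
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != TPID_8021Q:
        return None  # untagged: handled by the port's non-VLAN interface
    if len(frame) < 18:
        raise ValueError("truncated 802.1Q tag")
    # The tag control field follows the TPID; its low 12 bits are the VLAN ID.
    (tci,) = struct.unpack_from("!H", frame, 14)
    return tci & 0x0FFF


if __name__ == "__main__":
    macs = b"\xff" * 6 + b"\x02" * 6
    untagged = macs + struct.pack("!H", 0x0800) + b"payload"
    tagged = macs + struct.pack("!HH", TPID_8021Q, 10) + struct.pack("!H", 0x0800) + b"payload"
    print(vlan_of(untagged))  # None
    print(vlan_of(tagged))    # 10
```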

From observing LEDs I guess switch hardware just behaves like that. And admittedly, this is a very special situation which I would not even justify. However, there might be other cases where this behaviour is undesirable. So is there any way to disable the switch at boot / reset time until it's configured for operation?

jeff · April 28, 2019, 9:12am

These home-router chips usually come up in a configuration appropriate for OEM WAN/LAN use and are “leaky” as a result until OpenWrt is running and re-configures them. They’re also generally reset on config change, but that is typically much less than a second so most never see the ill effects.

I don’t think there’s a good solution as I don’t think one can keep the phys powered down at boot. You might be able to do something with a custom boot loader, as the problem starts at power on. I typically use the WAN port as my VLAN trunk to help mitigate this somewhat.

CuteRoute · April 28, 2019, 9:47am

Thank you! Same here, btw. However, without me knowing the HW details, it looks as if it's "not separate enough" (HW or SW wise) in my case.

But yes, it means there's HW where it's not relevant (unless one configures more than one trunk or so)...

bobafetthotmail · April 29, 2019, 12:01pm

The bootloader initializes the switch and can set it up to do whatever by writing numbers raw to the switch chip registers (which is what drivers also do later when OpenWrt takes over the control of the switch).

That's how some devices are not leaky on the "WAN" port, as the bootloader is initializing the switch to keep the WAN port separate from the LAN ports (as the WAN port is just a port of a switch).

This is possible if you are an OEM making your own devices, but for everyone else this may require very esoteric information (switch configuration registers) and would require crafting a bootloader (at least a secondary bootloader) to make this change before booting OpenWrt.

CuteRoute · April 29, 2019, 1:08pm

Before that, the switch hardware will probably have a reset state (or at least a hopefully well-defined initial power-on state, or the state which was configured before a soft reset). No nitpicking intended, but either I want to keep the switch from passing untagged packets upstream or not.

But yes, from the OpenWRT perspective, it's as it is. And I'm very grateful that I have two consumer devices which run a mature OpenWRT and solve a problem for me which I did not choose to have.

And yes, I can live with that behaviour. It's just something to keep in mind, and I did not expect this.

My central router has a maximum bandwidth which applies to the four ports of its configurable LAN switch, while its WAN port has an individual bandwidth limitation. Which makes me think that not all WAN ports are "just a port of a switch"?

jeff · April 29, 2019, 1:12pm

You can look at a typical switch's data sheet, such as https://cdn.datasheetspdf.com/pdf-down/A/R/8/AR8327_Atheros.pdf

There are "interesting" connections between the switch and the CPU of the SoC on some devices, such as the IPQ40xx and some mvebu devices that additionally tag the packets with the port on which they were received, but, in the main, they're just switches.

bobafetthotmail · April 29, 2019, 4:37pm

It depends. External switch chips usually initialize as a "dumb" switch (they are also used on their own in "dumb" switch products), although there is no guarantee they will.

In many modern devices it's all integrated in the SoC, and they may actually need some magic wooo wooo from the bootloader to actually wake up and turn on at all. This is not commonly investigated.

I do know about this because I've read threads of people trying to develop a new bootloader for a Kirkwood-based Linksys router (the end goal was running Debian ARM on it, like is possible with other Kirkwood-based devices). And yes, the switch was not powering up properly without uboot intervention. I didn't stay around long enough to see if they figured it out in their new uboot.

The board init phase is pretty quick, 1-2 seconds at most. If the bootloader is setting the switch to isolate one of the ports, it should not leak.

The actual boot of OpenWrt is the most likely time it can leak as it can take 5-10 seconds to get to OpenWrt system and actually apply your settings (on some devices more because the bootloader sits there and does stuff for a while before booting).
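As a rough way to reason about that window, here is a small Python sketch that lists when a DHCP client would send its DISCOVER messages and which of them fall inside the leaky period. All numbers are assumptions for illustration: the 4-second first retry doubling up to 64 seconds loosely follows the retransmission guidance in RFC 2131, and an aggressive client may well retry faster. The function names are invented for this example.

```python
def discover_times(link_up, first_delay=0.0, interval=4.0, max_interval=64.0, attempts=5):
    """Times (in seconds) at which a client sends DHCPDISCOVER after link-up.

    The retry interval doubles after every attempt, capped at max_interval.
    """
    times = []
    t = link_up + first_delay
    for _ in range(attempts):
        times.append(t)
        t += interval
        interval = min(interval * 2, max_interval)
    return times


def leaked(times, window_start, window_end):
    """Return the send times that fall while the switch is still unconfigured."""
    return [t for t in times if window_start <= t < window_end]


if __name__ == "__main__":
    # Assumed scenario: link comes up 1 s after power-on, and the switch
    # stays in "dumb" mode for the first 10 s.
    times = discover_times(link_up=1.0)
    print(times)                     # [1.0, 5.0, 13.0, 29.0, 61.0]
    print(leaked(times, 0.0, 10.0))  # [1.0, 5.0]
```

With these assumed values, two requests go out before the switch is configured, and one answered request is enough for the client to end up with the wrong lease.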

Yeah, that's for older devices with no integrated switch (or even with no VLAN-capable switch), and also high-end managed switches/routers not supported by OpenWrt (Cisco stuff for example). Does not change the end result of WAN port being best choice if you want to avoid early leaks.

In this case the main CPU SoC has 2 internal "ethernet ports", one is connected to an internal 5-port switch (on a separate chip) to create the LAN ports, and the other is used as WAN port and exposed as physical port.

In this case it's highly likely that the LAN ports are controlled by a switch that is "dumb" by default and not touched by the bootloader, so it will leak until OpenWrt takes over.

In this case, the WAN port is not on the switch so it won't leak anything.

slh · April 29, 2019, 9:22pm

For many consumer routers, the default power-on state is bridging all ports together as a dumb switch (even if you brick your bootloader/ firmware, most devices will revert to dumb switch operations). The bootloader is the first stage to untangle that mess (and usually fast enough to avoid noticeable leaks) and configure port isolation until the final switch configuration gets applied by the OS.
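The difference between the two states can be modelled with a per-port forwarding table. The Python sketch below is a toy model only: the port numbering (port 0 as the CPU port) and the function names are assumptions, and it does not reflect the register layout of any particular switch chip.

```python
def dumb_switch(ports):
    """Power-on default: every port may forward to every other port."""
    return {p: set(ports) - {p} for p in ports}


def isolated(ports, cpu):
    """Port isolation: external ports may only talk to the CPU port."""
    return {p: (set(ports) - {cpu} if p == cpu else {cpu}) for p in ports}


def can_reach(table, src, dst):
    """True if a frame entering on src may be forwarded out of dst."""
    return dst in table[src]


if __name__ == "__main__":
    ports = [0, 1, 2, 3, 4]  # assumed: 0 = CPU, 1-3 = clients, 4 = uplink trunk
    dumb = dumb_switch(ports)
    iso = isolated(ports, cpu=0)
    print(can_reach(dumb, 1, 4))  # True: the client's DHCP request leaks upstream
    print(can_reach(iso, 1, 4))   # False: only the CPU sees the request
```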

CuteRoute · May 2, 2019, 11:32am

Ah yes I should have assumed that.

I'm more worried about the DHCP client getting an address which is later not routed. I don't want to have to remember ugly details like this in my private network. My workaround will be to disable the untagged DHCP I guess.

CuteRoute · May 2, 2019, 11:37am

Looks like it. I guess the rationale is that this is the expected operation and the consumer is supposed to be happy if the switch is functional as early as possible in the power cycle. Back in the good old days :), I doubt we would have done it like that. We'd have looked at the switch and said, it's configurable, we don't know the final configuration yet, so let's initialise it to a safe state to which everyone can agree.

Lucky1 · May 2, 2019, 12:23pm

I can't remember how at the moment, but on a DIR-645 I remember being able to bring all the ports down and reset them, so that all the local computers re-requested new IPs.
I used this when it was a slave access point and switch: if it detected that the DHCP server was not available, it turned off wifi, and when one was found, it turned wifi back on and reset the local ports so the local computers would ask for new IPs.
It meant I could reset the main router and it would reset everything downstream as well.
It was an older version of OpenWrt, but maybe you can do something similar.
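The original script is not shown in this thread, so the following is only a sketch of the decision logic described above, written in Python. The three callbacks (`dhcp_server_alive`, `set_wifi`, `bounce_lan_ports`) are hypothetical placeholders; how to probe the DHCP server, toggle wifi or reset the ports depends on the device and is not covered here.

```python
class UplinkWatchdog:
    """Turn wifi off while the DHCP server is gone; on recovery, turn wifi
    back on and bounce the LAN ports so wired clients request new leases."""

    def __init__(self, dhcp_server_alive, set_wifi, bounce_lan_ports):
        self._alive = dhcp_server_alive   # hypothetical: returns True/False
        self._set_wifi = set_wifi         # hypothetical: takes True/False
        self._bounce = bounce_lan_ports   # hypothetical: link down, then up
        self._was_up = None               # unknown until the first poll

    def poll(self):
        """Check the server once and act only when its state has changed."""
        up = bool(self._alive())
        if up == self._was_up:
            return up
        if up:
            # Also runs on the first poll if the server is already reachable.
            self._set_wifi(True)
            self._bounce()
        else:
            self._set_wifi(False)
        self._was_up = up
        return up


if __name__ == "__main__":
    state = {"alive": False}
    wd = UplinkWatchdog(
        dhcp_server_alive=lambda: state["alive"],
        set_wifi=lambda on: print("wifi", "on" if on else "off"),
        bounce_lan_ports=lambda: print("bouncing LAN ports"),
    )
    wd.poll()               # wifi off
    state["alive"] = True
    wd.poll()               # wifi on, bouncing LAN ports
```

In practice `poll()` would be called periodically, for example once every few seconds from a loop or a scheduled job.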

bobafetthotmail · May 2, 2019, 1:24pm

This DHCP client is in another device, right?

If that's the case, the DHCP client is getting an address because it's asking for an address while the switch is operating in "dumb" mode because the bootloader has set it like that, and OpenWrt has not booted up yet.

The rationale is that these are consumer devices, not designed to be used with advanced features like VLANs in a complex network.
So they never took precautions to make their device act sane in these environments.

slh · May 2, 2019, 11:38pm

Sadly this is very likely to coincide for devices connected directly to the router's LAN ports, as a switch reset is usually part of the reboot, which will invoke a link powerdown/ link-up event on the LAN ports (with the client reacting accordingly, by sending a new DHCP request).

CuteRoute · May 3, 2019, 12:42pm

Absolutely. That's the reason why my workaround will be to disable DHCP for the untagged interface "under" the trunk. (My central router forces me to have that active.) Next, I hope the switch setup will be a more or less atomic operation.

In my view, implementing a subcomponent with hardwired assumptions about its later use, when avoiding these assumptions would have led to no additional costs, is simply not such a great idea. So I was trying to find a rationale from the implementation team's perspective.

But yes, the rationale from an observer's perspective which you provide sounds almost like a safe bet.

CuteRoute · May 3, 2019, 12:46pm

Yes, no matter how short the time frame, this one device is really eager to send its DHCP request. But once I disable the DHCP server in the central router for the untagged interface, it should keep trying and only succeed once the switch is configured.

Lucky1 · May 3, 2019, 11:39pm

I think you may find the boot loader sets up the switch for a possible generic recovery mode,
but with extra work it could be tailored to accept recovery on a single port.

CuteRoute · May 19, 2019, 2:27pm

Interesting. Was that a script you made?

CuteRoute · May 19, 2019, 2:46pm

...which is not possible with AVM FB4040 and LINKSYS EA6350, and maybe also any other IPQ4018 based device?

Also, the boot sequence of the EA6350 is really very special in this regard. While it indicates control over its LAN ports by nicely switching on one port LED after the other, during boot with OpenWRT it resets the LAN switch three times: once at power on, then once again before OpenWRT takes control. No idea what this is supposed to be good for; it will just result in more pointless DHCP requests.

jeff · May 19, 2019, 3:31pm

I think you'll find the first is the boot loader (which prepares it for network checks and the possibility of TFTP or the like) and the second is driver load by the kernel (which may need network access for root mount).

CuteRoute · May 19, 2019, 3:37pm

Which in turn means, root mount over network is impossible over VLANs?