RaspberryPi 4b, OpenWRT 24.10.1, and eth0 hang

jmccabe06 · June 24, 2025, 7:40pm

Hello!

I've got a RaspberryPi 4Br1.2. I've had OpenWRT 23.x running on it without problem. Several months ago I upgraded to OpenWRT 24.10.1.

The network configuration of this device is:
eth0 -> br-lan
eth1 (usb) -> wan
eth2 (usb) -> wanb

mwan3 manages traffic against wan and wanb. Other than the problems with ipsets, no complaints here.

Since moving to 24.10.1 I've sporadically had all LAN connectivity fail. I've had to power-cycle the RPi to get it back online.

To troubleshoot the problem I've added eth3 and added this device to br-lan. When connectivity fails I can unplug the patch cable from eth0 (on the RPi or the switch) and br-lan picks up the connection to eth3 for the LAN interface and allows me to log into the RPi. If I then reconnect the patch cable, br-lan switches back to using eth0 for LAN and I lose connectivity. Unplug the patch cable - I am able to connect again via eth3.

After trawling through the logs I find that every time this has happened I am seeing a flurry of these errors:

Tue Jun 24 11:18:48 2025 kern.crit kernel: [312676.902251] bcmgenet fd580000.ethernet eth0: NETDEV WATCHDOG: CPU: 0: transmit queue 0 timed out 2010 ms
Tue Jun 24 11:18:50 2025 kern.crit kernel: [312678.912278] bcmgenet fd580000.ethernet eth0: NETDEV WATCHDOG: CPU: 0: transmit queue 0 timed out 2010 ms
Tue Jun 24 11:18:52 2025 kern.crit kernel: [312680.922288] bcmgenet fd580000.ethernet eth0: NETDEV WATCHDOG: CPU: 0: transmit queue 0 timed out 2010 ms
Tue Jun 24 11:18:54 2025 kern.crit kernel: [312682.932318] bcmgenet fd580000.ethernet eth0: NETDEV WATCHDOG: CPU: 0: transmit queue 0 timed out 2010 ms
Tue Jun 24 11:18:56 2025 kern.crit kernel: [312684.942322] bcmgenet fd580000.ethernet eth0: NETDEV WATCHDOG: CPU: 0: transmit queue 0 timed out 2010 ms
Tue Jun 24 11:18:59 2025 kern.crit kernel: [312686.952342] bcmgenet fd580000.ethernet eth0: NETDEV WATCHDOG: CPU: 0: transmit queue 0 timed out 2010 ms
Tue Jun 24 11:19:01 2025 kern.crit kernel: [312688.962358] bcmgenet fd580000.ethernet eth0: NETDEV WATCHDOG: CPU: 0: transmit queue 0 timed out 2010 ms
Tue Jun 24 11:19:03 2025 kern.crit kernel: [312690.972377] bcmgenet fd580000.ethernet eth0: NETDEV WATCHDOG: CPU: 0: transmit queue 0 timed out 2010 ms
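
To gauge how often this happens, the watchdog lines can be tallied per interface. This is a minimal sketch that assumes the log format shown above; `summarize_tx_timeouts` is a hypothetical helper name, and on the router the input would come from `logread`:

```shell
#!/bin/sh
# Hypothetical helper: tally NETDEV WATCHDOG transmit timeouts per interface.
# Reads log text on stdin and prints "<iface> <count>" lines.
summarize_tx_timeouts() {
    awk '/NETDEV WATCHDOG/ && /transmit queue/ {
        for (i = 2; i <= NF; i++)
            if ($i == "NETDEV") {
                iface = $(i - 1)        # field before NETDEV is e.g. "eth0:"
                sub(/:$/, "", iface)
                n[iface]++
            }
    }
    END { for (k in n) print k, n[k] }'
}

# On the router (assumes OpenWrt's logread):
#   logread | summarize_tx_timeouts
```

Comparing the count before and after a change (for example a firmware upgrade) gives a rough signal, though with a time-to-failure of up to several weeks a zero count proves little.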

It seems as though I can recover the port without restarting the RPi by issuing these commands:

ip link set eth0 down; ip link set eth0 up

If I do this, then plug the patch cable back in, the link comes to life and can again act as the LAN port.
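
The manual recovery could be automated as a stopgap. The following is only a sketch under stated assumptions: `tx_hung` and `bounce` are hypothetical helper names, the match pattern is taken from the log lines above, and whether the port recovers while the patch cable stays plugged in is untested here.

```shell
#!/bin/sh
# Sketch: detect the TX watchdog message for an interface and bounce the link.

# Succeeds if the log text on stdin contains a watchdog timeout for interface $1.
tx_hung() {
    grep -q "$1: NETDEV WATCHDOG.*transmit queue [0-9]* timed out"
}

# Same recovery as the manual commands above, with a short pause in between.
bounce() {
    ip link set "$1" down
    sleep 2
    ip link set "$1" up
}

# Possible use from cron on the router (logread -l N prints the last N log lines):
#   logread -l 20 | tx_hung eth0 && bounce eth0
```

Old watchdog lines stay within the last N log lines for a while, so a real script would need to remember which events it has already handled to avoid bouncing the link repeatedly.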

In trying to diagnose this I've found various OpenWRT threads, and GitHub bug reports, indicating that this onboard bcmgenet eth driver can be flakey. The various threads make recommendations about how to work around the problem, but they are evidently not working. From my rc.local:

# https://forum.openwrt.org/t/bizarre-dns-problems-with-debian-linux-after-upgrade-to-24-10/225860/34
# https://github.com/openwrt/openwrt/issues/18122
ethtool -K eth0 rx-gro-list off
# https://github.com/openwrt/openwrt/issues/9305#issuecomment-1047344495
# https://github.com/raspberrypi/linux/issues/5561
ethtool -K eth0 rx off

yet the failure just happened. Note that the time-to-failure can be as little as a few minutes, and as long as several weeks.
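
Since the workarounds are applied from rc.local, it may be worth confirming that they are still in effect when the failure occurs. A minimal sketch: `feature_state` is a hypothetical helper that reads `ethtool -k` output (lowercase `-k` lists the current settings) and prints the state of one feature.

```shell
#!/bin/sh
# Hypothetical helper: print the state ("on"/"off") of one offload feature.
# $1 = feature name as printed by `ethtool -k`; stdin = that command's output.
feature_state() {
    awk -v f="$1:" '$1 == f { print $2 }'
}

# On the router:
#   ethtool -k eth0 | feature_state rx-gro-list
#   ethtool -k eth0 | feature_state rx-checksumming          # set by "-K eth0 rx off"
#   ethtool -k eth0 | feature_state generic-receive-offload  # set by "option gro '0'"
```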

The /etc/config/network config for br-lan and the device

config device
        option name 'br-lan'
        option type 'bridge'
        option ipv6 '0'
        option stp '1'
        list ports 'eth0'
        list ports 'eth3'
...
config device
        option name 'eth0'
        option ipv6 '0'
        option gro '0'

The end of my too long question - is there anything to be done? Is this a hardware failure with this RPi4? Would there be any different behavior if I upgraded to an RPi5? Should I just not use the onboard eth and run everything off the USB eth interfaces plugged into the USB3 ports (these ports have far greater bandwidth than the network links connected to them, so I'm not super concerned about the performance impact)?

Any guidance?

Thank you!

psherman · June 24, 2025, 8:18pm

Remove STP.

If that doesn't fix it, please show your complete config:

Please connect to your OpenWrt device using ssh and copy the output of the following commands and post it here using the "Preformatted text </> " button:

Remember to redact passwords, MAC addresses and any public IP addresses you may have:

ubus call system board
cat /etc/config/network
cat /etc/config/wireless
cat /etc/config/dhcp
cat /etc/config/firewall

jmccabe06 · June 24, 2025, 8:26pm

STP was added when eth3 was added to br-lan. While probably not strictly required, it seemed a good idea to prevent a (possible) route loop.

The behavior started before STP (and eth3) was added.

The (current) version of the network config file:

config interface 'loopback'
        option device 'lo'
        option proto 'static'
        option ipaddr '127.0.0.1'
        option netmask '255.0.0.0'

config globals 'globals'
        option ula_prefix 'fdb5:19d8:6528::/48'
        option packet_steering '1'
        option steering_flows '128'

config device
        option name 'br-lan'
        option type 'bridge'
        option ipv6 '0'
        option stp '1'
        list ports 'eth0'
        list ports 'eth3'
        option hello_time '4'

config interface 'lan'
        option device 'br-lan'
        option proto 'static'
        option ipaddr '172.16.100.1'
        option netmask '255.255.255.0'
        option ip6assign '60'
        option force_link '0'
        list dns '172.16.100.97'
        list dns_search 'local'

config route
        option interface 'lan'
        option target '192.168.100.0/24'
        option gateway '136.22.110.161'

config interface 'wan'
        option proto 'dhcp'
        option device 'eth1'
        option metric '10'

config interface 'wanb'
        option proto 'dhcp'
        option device 'eth2'
        option metric '20'

config device
        option name 'eth0'
        option ipv6 '0'
        option gro '0'

config device
        option name 'eth3'
        option ipv6 '0'
        option macaddr 'b8:8d:12:53:a0:f3'

config device
        option name 'eth1'
        option ipv6 '0'
        option macaddr '00:0a:cd:3f:3d:92'

config device
        option name 'eth2'
        option ipv6 '0'
        option macaddr '00:0a:cd:3f:3d:93'

Update: As expected, disabling STP caused eth0 and eth3 to be blocked as a route loop at the switch. To recover each patch cable had to be unplugged, then plugged back in one at a time. Now one is blocked and one is permitted at the switch.

I'll keep it with eth0 active, eth3 disabled, and STP disabled (on OpenWrt) to continue troubleshooting.

jmccabe06 · June 25, 2025, 1:06am

I have installed the newly released 24.10.2. The release notes do not mention any changes to bcmgenet, but they do mention upgrading the kernel from 6.6.83 to 6.6.92. Maybe there's something buried in there.

jmccabe06 · June 27, 2025, 10:20pm

Three days so far, no problems.

As I noted above the time-to-fail can sometimes be measured in weeks, so I'm not out of the woods yet. But ≥3 days is a positive sign.

jmccabe06 · June 29, 2025, 4:25pm

Aaaand ... it failed catastrophically last night. Completely hung eth0.

For now I have re-enabled eth3, added it back to br-lan, and removed eth0 from br-lan. I've unplugged the eth0 patch cable, effectively disabling the interface.

My network config file is above. I don't see anything that is very far from a vanilla config.

If this were a problem endemic to the RPi4 or the bcmgenet device and driver then I'd expect to see other people reporting it. This leads me to think that this is local to my config (though, as noted, unlikely IMHO) or my device. This is not a new RPi4, having had 4+ years of service before being re-purposed into an OpenWRT router. So ... maybe a NIC can just start becoming flaky and failing? Is that a thing?

Pico · June 29, 2025, 6:45pm

guessing:
Running 2 additional USB LAN adapters over long periods could be demanding on a typical default RPi power supply.
There are countless adapters on the market, some requiring less power and having buffering capacitors and others just being cheap circuits.
I guess some might have current peaks that when combined with current peaks of the CPU might exceed what a default PSU is willing to spend. I have an 1gb adapter from AMZN that tends to crash, while a noname 100mbit LAN adapter runs rock solid.

RPIs with more than 1 USB port have some kind of power mgmt chip and tend to drop power on USB ports first when voltage drops.

jmccabe06 · June 29, 2025, 7:58pm

Solid guesses, thank you.

The power supply should be able to provide more than enough power - it's a 180W USB-C power brick from a retired MacBook Pro. I have no idea how to ask the RPi if it's feeling power-starved, however.
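
One way to ask the Pi about this is the firmware's throttling status, reported by `vcgencmd get_throttled` as a hex bitmask. The sketch below decodes that mask using the bit meanings from the Raspberry Pi documentation. It assumes `vcgencmd` is installed on the OpenWrt image (which package provides it is not confirmed here), and `decode_throttled` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: decode the bitmask printed by `vcgencmd get_throttled`.
# Accepts either "throttled=0x50005" or a bare "0x50005".
decode_throttled() {
    raw="${1#throttled=}"
    val=$(( $raw ))
    any=0
    for entry in \
        "0x1:under-voltage detected now" \
        "0x2:ARM frequency capped now" \
        "0x4:currently throttled" \
        "0x8:soft temperature limit active" \
        "0x10000:under-voltage has occurred since boot" \
        "0x20000:ARM frequency capping has occurred since boot" \
        "0x40000:throttling has occurred since boot" \
        "0x80000:soft temperature limit has occurred since boot"
    do
        mask="${entry%%:*}"
        text="${entry#*:}"
        if [ $(( val & $mask )) -ne 0 ]; then
            echo "$text"
            any=1
        fi
    done
    if [ "$any" -eq 0 ]; then echo "no flags set"; fi
    return 0
}

# Only runs where vcgencmd is actually available (assumption: on the Pi itself).
if command -v vcgencmd >/dev/null 2>&1; then
    decode_throttled "$(vcgencmd get_throttled)"
fi
```

Any "under-voltage" line after a failure would support the power-supply theory; "no flags set" would point elsewhere.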

The USB adapters are:

1 dual-port USB3 ethernet interface (link). This device provides the interfaces for WAN and WANB (eth1 and eth2 respectively). It uses Realtek chipset and has been, as near as I can tell, rock solid.
1 single-port USBC (with USB3 adapter) ethernet (link). This device was added to act as a LAN fallback (eth3) when trying to troubleshoot eth0 hanging, so I doubt that it is complicit in the problem. It uses Realtek chipset and appears, so far, pretty solid.
Additionally there is a small (30G? Less?) USB storage device hanging off of USB2 that provides archival storage of scripts, tools, collectd RRD databases, etc.

Nothing I've seen so far implicated USB failure, though now that you mention it I believe that the RPi hangs the ethernet interface off the USB bridge somewhere. So ... maybe?

I have seen mentioned somewhere that people noticed better stability from the RPi4 ethernet when forcing it to 100M instead of 1G. Perhaps I'll try forcing the switch to negotiate the link at 100M and see if that has a better stability profile.
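
The same experiment can be run from the Pi side instead of the switch. A minimal sketch, assuming `ethtool` is installed: `limit_to_100m` and `negotiated_speed` are hypothetical helper names, and `0x008` is ethtool's advertise mask for 100baseT/Full.

```shell
#!/bin/sh
# Restrict what interface $1 advertises to 100baseT/Full (mask 0x008),
# leaving autonegotiation itself enabled.
limit_to_100m() {
    ethtool -s "$1" advertise 0x008
}

# Print the negotiated speed from `ethtool <iface>` output read on stdin.
negotiated_speed() {
    awk -F': ' '/Speed:/ { print $2 }'
}

# On the router:
#   limit_to_100m eth0
#   ethtool eth0 | negotiated_speed
```

A setting made this way does not survive a reboot, so it would need to be reapplied at boot for a long-running test.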

jmccabe06 · June 29, 2025, 8:02pm

I skipped over this. Sorry about that.

I suppose I could pick up a powered USB3 hub and see if moving the USB power-load off of the RPi might be worth exploring. Though, again - the power supply really should be sending more than enough power to cover the RPi and these low impact USB devices. It's not like they're mining BitCoin or powering a coffee-cup warmer.

Pico · June 29, 2025, 8:10pm

That's a wrong assumption. To deliver 180W, a USB-C PSU needs a voltage higher than 5V negotiated between device and PSU.
The RPi4 will not negotiate higher voltages than 5V.

That will likely leave your 180W PSU only providing around 3A x 5V = 15Watt to the RPi4 (you need to check the PSU specs what its max amperes are for 5V).

psherman · June 29, 2025, 11:04pm

The PSU for the Pi itself is only partially relevant here.

While I haven't been able to find official documentation to back this up, this thread states that the total power available on the USB-A ports is 1.2A across all 4 ports.

The official specs require a minimum of a 3A PSU, but say 2.5A can be used if the total USB peripherals draw less than 500mA total.

I am quite certain, even if the maximum current is actually higher than 1.2A total, it's not much greater than that. Running multiple high-ish power USB devices is not likely to work well. Check the rated max power consumption (or current draw) for each of your USB peripherals.
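
The budget check itself is simple addition. In the sketch below, the 1200 mA limit is the unverified figure quoted above and the per-device draws are made-up placeholders; substitute the rated values from each peripheral's datasheet. `usb_budget` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: sum per-device current draws (mA) against a budget (mA).
# Usage: usb_budget <budget_mA> <device_mA>...
usb_budget() {
    budget="$1"
    shift
    total=0
    for ma in "$@"; do
        total=$(( total + ma ))
    done
    if [ "$total" -le "$budget" ]; then
        verdict="within"
    else
        verdict="over"
    fi
    echo "total=${total}mA budget=${budget}mA ${verdict} budget"
}

# Placeholder draws for three peripherals, NOT measured values.
usb_budget 1200 400 400 300
```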

You'll be better served by using a single multi-port USB adapter and/or using a powered USB hub.

jmccabe06 · June 29, 2025, 11:59pm

That's ... a really good point. Quite a rabbit hole of reading about USB-C, power negotiation, and potential output.

I just assumed that if a USB-C device needed more watts, it would negotiate a higher voltage as well and keep amperage within the capabilities of a narrow strip of copper. Turns out this is true, and the RPi4 doesn't care. It wants its 5V and, obviously, 50W at 5V would be getting into pretty sketchy amperage for those small wires (10A for those unfamiliar).

My next step, as you allude to and @psherman states outright, is to try a powered USB hub. I find it curious, though, that if throttling is happening due to power shortages, the on-board eth0 (again, somehow connected to the USB bus by the RPi) is more impacted than the actual USB3 devices plugged into the ports. I'll also swap out the USB brick for one more specifically tuned to what the RPi wants - something that will do 5V 3A specifically.

Thanks again for the advice. I'll report back if these make a difference.

wilsonyan · June 30, 2025, 12:35am

lan over usb has just always been a bit of a gamble with linux
If you want just hardcore 'have things work well advice' i'd say just never do it

Perhaps grab a pi 5 and a pcie adapter cable and get an extra nic on it that way, or there are many options out there of rockchip based rk35xx devices with multiple ports or mini pc stuff like n100 based stuff

this post is basically a 'my 2 cents thing' of where I lean lately (away from lan over usb)

jmccabe06 · June 30, 2025, 8:06pm

tl;dr: I'm trying a VLAN configuration instead of USB Ethernet dongles. Fingers crossed.

It seems to have disappeared, but I believe someone recommended possibly using VLANs with the managed switch to avoid all these USB ethernet adapters. I decided to give this a shot while waiting for the powered hub.

Current status (as of five minutes ago):

RPi, using 802.1q VLAN:

eth0
  |-->eth0.10 (LAN)
  |-->eth0.20 (WAN)
  |-->eth0.30 (WANB)

Managed switch:

Port 1
  |-->"Trunk" to RPi, "tagged" member of VLAN 10 (LAN), 20 (WAN), 30 (WANB)

Port 2
  |-->"WAN" port (connects to ISP modem), "untagged" member of VLAN 20

Port 3
  |-->"WANB" port (connects to ISP modem), "untagged" member of VLAN 30

I won't bore you with the various port memberships (outside of these) for the rest of the switch. There are other VLANs, but they are minimal.

On the OpenWRT I:

Network->Interfaces->Devices I added eth0.10 to br-lan and removed eth0 and eth3
Network->Interfaces replaced eth1 and eth2 from WAN and WANB with eth0.20 and eth0.30
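
For anyone doing this over ssh instead of LuCI, the two steps above roughly correspond to the `uci` commands printed by the sketch below. It only prints the commands so they can be reviewed first. It assumes that br-lan is the first `config device` stanza (`@device[0]`), as in the config posted earlier, and that the interfaces are named `wan` and `wanb`; `emit_vlan_uci` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) uci commands that move LAN/WAN/WANB
# onto 802.1q sub-devices of base device $1 using VLAN IDs 10, 20 and 30.
emit_vlan_uci() {
    base="$1"
    # Assumption: @device[0] is the br-lan bridge stanza.
    printf "uci del_list network.@device[0].ports='%s'\n" "$base"
    printf "uci del_list network.@device[0].ports='eth3'\n"
    printf "uci add_list network.@device[0].ports='%s.10'\n" "$base"
    printf "uci set network.wan.device='%s.20'\n" "$base"
    printf "uci set network.wanb.device='%s.30'\n" "$base"
    printf "uci commit network\n"
}

emit_vlan_uci eth0
```

After reviewing the output, the commands can be run by hand, followed by a network reload. As with the LuCI route, a mistake here can lock you out, so have a recovery path ready.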

On Save&Apply (and the moving of all relevant patch cables) blinky lights began blinking appropriately and, weirdly, everything began routing correctly ("weirdly" only in the sense that I fully expected to be locked out and to have to pull the memory card, mount it on a different device, undo the configuration changes, etc).

So now two USB devices are removed from the configuration. I'll keep monitoring to see if this appears more stable.

Whoever made this recommendation, thanks! If nothing else, I've learned a great deal more about VLANs than I knew this morning, and I might have a more stable configuration now. Thanks!