STP doesn't work

Traffic between two LAN ports is hardware switched, bypassing the Linux kernel. Most consumer hardware does not include loop detection, so a loop will cause the switch to jam up with looped packets and DoS the network.

STP is a software feature that is only active on packets that pass through the kernel.

Yes, it's the Spanning Tree Protocol. I created a loop as the simplest way to check whether STP actually works.

I don't think you created an STP loop. The kernel is smarter than that: if the connection worked, it would have seen its own MAC address on the other end anyway.

That is a network fault in the broadcast domain (i.e. the bridge), at multiple layers of abstraction, before STP even comes into play.

I got the following logs after creating the loop with a patch cord:

Fri Nov 11 01:21:45 2022 kern.info kernel: [35074.183948] br-lan: port 1(lan1) entered blocking state
Fri Nov 11 01:21:45 2022 kern.info kernel: [35074.189211] br-lan: port 1(lan1) entered listening state
Fri Nov 11 01:21:45 2022 daemon.notice netifd: Network device 'lan1' link is up
Fri Nov 11 01:21:45 2022 kern.info kernel: [35074.195368] br-lan: port 2(lan2) entered blocking state
Fri Nov 11 01:21:45 2022 kern.info kernel: [35074.200613] br-lan: port 2(lan2) entered listening state
Fri Nov 11 01:21:45 2022 daemon.notice netifd: Network device 'lan2' link is up
Fri Nov 11 01:21:46 2022 kern.info kernel: [35075.015195] br-lan: port 2(lan2) entered blocking state
Fri Nov 11 01:21:48 2022 kern.info kernel: [35076.419264] mt7530 mdio-bus:1f lan2: Link is Down
Fri Nov 11 01:21:48 2022 kern.info kernel: [35076.424208] mt7530 mdio-bus:1f lan1: Link is Down
Fri Nov 11 01:21:48 2022 kern.info kernel: [35076.429861] br-lan: port 2(lan2) entered disabled state
Fri Nov 11 01:21:48 2022 daemon.notice netifd: Network device 'lan2' link is down
Fri Nov 11 01:21:48 2022 daemon.notice netifd: Network device 'lan1' link is down
Fri Nov 11 01:21:48 2022 kern.info kernel: [35076.436310] br-lan: port 1(lan1) e
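To follow port state changes like these without reading the whole log by eye, a small filter can pull out just the bridge transitions. This is a minimal sketch, assuming Python 3 is installed on the device (it is not part of a default OpenWrt image) and that the lines follow the `kernel: [uptime] <bridge>: port N(<ifname>) entered <state> state` format shown above; `port_transitions` is a hypothetical helper name. Truncated lines, such as the last one in the log above, are simply skipped.

```python
import re
from typing import Iterable, List, Tuple

# Bridge port state change, e.g. "br-lan: port 2(lan2) entered blocking state"
STATE_RE = re.compile(
    r"(?P<bridge>\S+): port \d+\((?P<ifname>[^)]+)\) entered (?P<state>\w+) state"
)
# Kernel uptime stamp in brackets, e.g. "[35074.183948]"
STAMP_RE = re.compile(r"\[\s*(?P<ts>\d+\.\d+)\]")


def port_transitions(lines: Iterable[str]) -> List[Tuple[float, str, str, str]]:
    """Return (uptime_seconds, bridge, port_ifname, new_state) for each state change found."""
    events = []
    for line in lines:
        state = STATE_RE.search(line)
        stamp = STAMP_RE.search(line)
        if state and stamp:  # netifd lines and truncated entries do not match
            events.append((float(stamp.group("ts")), state.group("bridge"),
                           state.group("ifname"), state.group("state")))
    return events


if __name__ == "__main__":
    sample = [
        "Fri Nov 11 01:21:45 2022 kern.info kernel: [35074.195368] br-lan: port 2(lan2) entered blocking state",
        "Fri Nov 11 01:21:45 2022 daemon.notice netifd: Network device 'lan2' link is up",
        "Fri Nov 11 01:21:46 2022 kern.info kernel: [35075.015195] br-lan: port 2(lan2) entered blocking state",
    ]
    for ts, bridge, ifname, new_state in port_transitions(sample):
        print(f"{ts:.6f} {bridge} {ifname} -> {new_state}")
```

On the router itself, the same function could be fed the system log instead, for example by passing `sys.stdin` to `port_transitions` and piping `logread` into the script.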

So STP actually works and it's a hardware fault? I'm really confused, because with kernel 5.4 a loop doesn't cause the interface to get stuck.

It's not a hardware fault; it's a hardware and/or DSA driver limitation that the switch can't handle loops. DSA doesn't necessarily enable every function that every piece of hardware supports yet. Don't create Ethernet loops.

You would have to test with two (2) intermediate switches. This log only shows the connection going up and down, which I would expect in the signaling when the device realizes that it is its own MAC requesting a connection to itself.

I'm referring to the signaling at the hardware level when the Layer 2 connection is established. Since this prevents the Layer 2 connection from being established, you're left with just a hardware connection that doesn't work (i.e. at Layer 1, the hardware).

This created the Layer 1 fault in question.

We can put "fault" in quotes and use your wording too.

So for a true STP test I need a second device... And the Ethernet loop fault is not a Layer 2 problem but a Layer 1 problem, and it's related to the DSA driver overall?

I believe 3 or 4 additional...

You have to create paths with switches so that there are multiple Layer 1/2 paths from A, through the OpenWrt device and another switch, to a separate client/switch X.

See images here:

Note: managed switches have a MAC address, so @mk24's warning may apply to this setup with a DSA kernel, but I think it applies to any "client" machine on Ethernet (i.e. it sees its own MAC on the wire and hence fails).

I ran an experiment with the topology below:


The results: I can't SSH into any device, and one host (PC-1 or PC-2) can barely ping the other. If I unplug one of the parallel-connected cables, everything works fine. Surprisingly, if I replace the WE1602 with a WG3526, everything works fine...

  • Are you saying that you have a third device that works?
  • If so, what version of OpenWrt?

The OpenWrt version is 22.03. Both the WE1602 and WG3526 devices run it.

Disregard my inquiry. I understand now that you're attempting this with only two devices despite my information (and the images from Wikipedia in the provided link).

(There's still only a path through the same 2 devices; you have to create at least one more path via a third device. That obviously requires more equipment, as I already noted.)
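To see what STP is supposed to do once there really are redundant paths between several switches, here is a simplified model of how classic 802.1D STP elects a root bridge and decides which ports to block. It is a sketch under heavy assumptions, not the kernel's implementation: bridge IDs are plain integers standing in for the priority-plus-MAC bridge ID, every link is point-to-point with a fixed cost, BPDU exchange and timers are ignored, and the names `Link` and `port_roles` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Port = Tuple[int, int]  # (bridge_id, port_number); a lower bridge_id is a better priority


@dataclass(frozen=True)
class Link:
    a: Port
    b: Port
    cost: int = 19  # 802.1D path cost commonly used for 100 Mb/s links


def port_roles(bridges: List[int], links: List[Link]) -> Dict[Port, str]:
    root = min(bridges)  # root election: the lowest bridge ID wins

    # Root path cost of every bridge (Bellman-Ford relaxation over all links).
    cost: Dict[int, float] = {b: float("inf") for b in bridges}
    cost[root] = 0
    for _ in range(len(bridges)):
        for link in links:
            for src, dst in ((link.a, link.b), (link.b, link.a)):
                cost[dst[0]] = min(cost[dst[0]], cost[src[0]] + link.cost)

    # Root port of each non-root bridge: lowest (path cost via the port,
    # neighbour bridge ID, neighbour port, own port), following 802.1D tie-breaking.
    root_port: Dict[int, Port] = {}
    for b in bridges:
        if b == root:
            continue
        best: Optional[Tuple[Tuple[float, int, int, int], Port]] = None
        for link in links:
            for mine, other in ((link.a, link.b), (link.b, link.a)):
                if mine[0] == b and other[0] != b:  # a self-looped cable never leads to the root
                    key = (cost[other[0]] + link.cost, other[0], other[1], mine[1])
                    if best is None or key < best[0]:
                        best = (key, mine)
        if best is not None:
            root_port[b] = best[1]

    # On each link, the end offering the best (root cost, bridge ID, port) is designated;
    # the other end forwards only if it is its bridge's root port, otherwise it blocks.
    roles: Dict[Port, str] = {}
    for link in links:
        designated = min((link.a, link.b), key=lambda p: (cost[p[0]], p[0], p[1]))
        for p in (link.a, link.b):
            if p == designated:
                roles[p] = "designated"
            elif root_port.get(p[0]) == p:
                roles[p] = "root"
            else:
                roles[p] = "blocking"
    return roles


if __name__ == "__main__":
    # Triangle of three switches: bridge 1 becomes root, one port of the 2-3 link blocks.
    triangle = [Link((1, 1), (2, 1)), Link((1, 2), (3, 1)), Link((2, 2), (3, 2))]
    for port, role in sorted(port_roles([1, 2, 3], triangle).items()):
        print(port, role)
```

In the triangle, exactly one port (bridge 3's end of the 2-3 link) ends up blocking, which is the redundant path STP is meant to cut. Two parallel cables between the same pair of switches, or one cable looped between two ports of the same bridge, also leave one blocked port in this model.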

All right. I will try topologies just like the ones in the wiki, with 3 and 5 identical devices (Zbtlink WE1602).

What is the chipset in the 1602?

The main chipset is MT7621A

That is the same chip as the WG3526, so I don't see why there would be a difference.

Well, I ran an STP test with the topology below, and the results are slightly better.

I can barely type anything while STP is running, and sometimes some of the devices stop responding.

Is there any way to debug the STP feature live to find out what the problem is?
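One low-tech way to watch STP live is to poll the Linux bridge's sysfs attributes, which report whether STP is enabled on the bridge and the current state of each member port. This is a sketch assuming Python 3 is installed on the router, the bridge is named `br-lan` as in the logs in this thread, and the standard `/sys/class/net/<bridge>/bridge/stp_state` and `/sys/class/net/<bridge>/brif/<port>/state` attributes are present; `bridge_snapshot` and `watch` are hypothetical helper names. It only shows what the kernel's software bridge believes; as noted earlier in the thread, traffic switched entirely in hardware never reaches it.

```python
import os
import time
from typing import Dict, Optional

# Port state numbers as defined by BR_STATE_* in the kernel's linux/if_bridge.h
PORT_STATES = {0: "disabled", 1: "listening", 2: "learning", 3: "forwarding", 4: "blocking"}
# Values of the bridge's stp_state attribute
STP_MODES = {0: "off", 1: "kernel STP", 2: "user-space STP"}


def read_int(path: str) -> int:
    with open(path) as f:
        return int(f.read().strip())


def bridge_snapshot(bridge: str = "br-lan", sysfs: str = "/sys/class/net") -> Dict[str, str]:
    """Return {"stp": mode, "<port>": state, ...} for one bridge."""
    base = os.path.join(sysfs, bridge)
    mode = read_int(os.path.join(base, "bridge", "stp_state"))
    snap = {"stp": STP_MODES.get(mode, f"unknown ({mode})")}
    brif = os.path.join(base, "brif")  # one entry per bridge member port
    for port in sorted(os.listdir(brif)):
        state = read_int(os.path.join(brif, port, "state"))
        snap[port] = PORT_STATES.get(state, f"unknown ({state})")
    return snap


def watch(bridge: str = "br-lan", interval: float = 1.0) -> None:
    """Print a timestamped snapshot whenever anything changes (runs until interrupted)."""
    previous: Optional[Dict[str, str]] = None
    while True:
        snap = bridge_snapshot(bridge)
        if snap != previous:
            print(time.strftime("%H:%M:%S"), snap)
            previous = snap
        time.sleep(interval)
```

Calling `watch("br-lan")` on the router prints a line whenever a port changes state, which can be lined up against the moments the devices stop responding.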

If (and this is just a supposition) you are bridging all the LAN ports on the routers, then all the traffic handling relies on the internal switch, and I am not sure it supports STP. I think (and this is another wild guess) that you need to define a separate interface for each LAN port, so the bridging happens on the CPU, which does support STP.

According to an OpenWrt developer's post, when these DSA ports are bridged, the DSA driver is smart enough to do the bridging on the switch alone, without going through the CPU. It offloads the work to the hardware switch, relieving the CPU of unnecessary processing, but obviously the cheap integrated switch does not support STP. I have never tried that and really have no idea what would happen either.

To answer the original question: yes, STP does work on my non-DSA devices. You have to use VLANs to force traffic through the CPU and enable STP. In my simple home setup, it takes 20 to 40 seconds to detect a broken link and fail over to another one. I don't think plugging one cable into 2 ports is a good way to test STP, though (I hope someone can explain why it doesn't work).
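Failover times of a few tens of seconds are what classic STP's timers lead you to expect. As a rough worst-case estimate, assuming the IEEE 802.1D default timers (max age 20 s, forward delay 15 s; the timers actually configured on that setup are not stated) and ignoring detection and BPDU processing delays: a blocked port has to pass through listening and learning, one forward delay each, and if the failure is not seen as a local link-down, the bridge first waits for the stored BPDU information to age out. `stp_failover_estimate` is a hypothetical helper name.

```python
def stp_failover_estimate(forward_delay: float = 15.0, max_age: float = 20.0,
                          link_down_seen_locally: bool = True) -> float:
    """Rough classic (802.1D) STP worst case, in seconds, before a blocked port forwards.

    Defaults are the 802.1D default timers; pass the bridge's real values if known.
    """
    # Listening, then learning: each state lasts one forward delay.
    transition = 2 * forward_delay
    if link_down_seen_locally:
        return transition
    # Indirect failure: the stored BPDU information must first expire after max_age.
    return max_age + transition


if __name__ == "__main__":
    print("direct link failure:  ", stp_failover_estimate(), "s")  # 30.0 s
    print("indirect link failure:", stp_failover_estimate(link_down_seen_locally=False), "s")  # 50.0 s
```

With shorter configured timers, both figures shrink proportionally, so the numbers above are only an upper-bound reference for the defaults.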
