RSTP with ustp: constantly flapping between learning and blocking

I have two HPE 1920 switches (rtl838x) running OpenWrt 24.10.2 with ustp installed for RSTP and network.switch.stp='1'. Apart from changed IPs and system names, the configuration is default:

  • one hpe,1920-8g-poe-65w, network.switch.priority='36864' (as root bridge)
  • one hpe,1920-24g, network.switch.priority='40960'

The two devices are connected by a single Ethernet cable and no other ports are connected, but the link almost never reaches the forwarding state on both sides.
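
For reference when reading the brctl output further down: the bridge ID that brctl prints is the 16-bit bridge priority in hex, a dot, and the bridge MAC address, and the root election picks the numerically lowest bridge ID. In RSTP the configurable priority must be a multiple of 4096 (0 to 61440). The following is a minimal Python sketch of that mapping, using the priorities from the configuration above and the IDs printed by brctl. The helper names are hypothetical.

```python
def priority_to_hex(priority: int) -> str:
    """Format a bridge priority the way brctl prints it (four hex digits)."""
    # RSTP only accepts multiples of 4096 in the range 0..61440.
    if priority % 4096 != 0 or not 0 <= priority <= 61440:
        raise ValueError(f"invalid RSTP bridge priority: {priority}")
    return f"{priority:04x}"


def parse_bridge_id(bridge_id: str) -> tuple[int, int]:
    """Split a brctl-style ID like 'a000.2c233a677932' into (priority, MAC as int)."""
    prio_hex, mac_hex = bridge_id.split(".")
    return int(prio_hex, 16), int(mac_hex, 16)


def elected_root(*bridge_ids: str) -> str:
    """Return the ID that wins the root election: lowest priority, then lowest MAC."""
    return min(bridge_ids, key=parse_bridge_id)


if __name__ == "__main__":
    print(priority_to_hex(36864))  # 9000
    print(priority_to_hex(40960))  # a000
    print(elected_root("b000.d894036aa2eb", "a000.2c233a677932"))  # a000.2c233a677932
```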

On one switch the port almost always ends up flapping constantly between blocking and learning, while the port on the other switch is forwarding:

00:51:42 kern.info kernel: [  126.196468] rtl83xx-switch switch@1b000000 lan24: Link is Up - 1Gbps/Full - flow control rx/tx
00:51:42 kern.info kernel: [  126.225270] switch: port 24(lan24) entered blocking state
00:51:42 daemon.info ustpd[1836]: set_if_up: Port lan24 : up
00:51:43 kern.info kernel: [  126.401680] rtl83xx-switch switch@1b000000 lan24: Link is Down
00:51:43 kern.info kernel: [  126.421296] switch: port 24(lan24) entered disabled state
00:51:43 daemon.info ustpd[1836]: set_if_up: Port lan24 : down
00:51:47 kern.info kernel: [  130.915608] RTL8380 Link change: status: 1, ports 800000
00:51:48 kern.info kernel: [  131.833002] rtl83xx-switch switch@1b000000 lan24: Link is Up - 1Gbps/Full - flow control rx/tx
00:51:48 kern.info kernel: [  131.861869] switch: port 24(lan24) entered blocking state
00:51:48 daemon.info ustpd[1836]: set_if_up: Port lan24 : up
00:51:50 daemon.info ustpd[1836]: MSTP_OUT_set_state: switch:lan24:0 entering learning state
00:51:50 daemon.info ustpd[1836]: MSTP_OUT_set_state: switch:lan24:0 entering forwarding state
00:51:50 kern.info kernel: [  134.206228] switch: port 24(lan24) entered learning state
00:51:50 kern.info kernel: [  134.227378] switch: port 24(lan24) entered forwarding state
00:51:51 daemon.info ustpd[1836]: set_if_up: Port lan24 : up
00:51:51 daemon.info ustpd[1836]: set_if_up: Port lan24 : up
00:51:51 daemon.info ustpd[1836]: MSTP_OUT_set_state: switch:lan24:0 entering blocking state
00:51:51 kern.info kernel: [  135.143311] switch: port 24(lan24) entered blocking state
00:51:51 daemon.info ustpd[1836]: set_if_up: Port lan24 : up
00:51:51 daemon.info ustpd[1836]: MSTP_OUT_set_state: switch:lan24:0 entering learning state
00:51:51 kern.info kernel: [  135.206017] switch: port 24(lan24) entered learning state
00:51:51 daemon.info ustpd[1836]: set_if_up: Port lan24 : up
00:51:52 daemon.info ustpd[1836]: MSTP_OUT_set_state: switch:lan24:0 entering blocking state
00:51:52 daemon.info ustpd[1836]: set_if_up: Port lan24 : up
00:51:52 kern.info kernel: [  136.135588] switch: port 24(lan24) entered blocking state
00:51:52 daemon.info ustpd[1836]: MSTP_OUT_set_state: switch:lan24:0 entering learning state
00:51:52 daemon.info ustpd[1836]: set_if_up: Port lan24 : up
00:51:52 kern.info kernel: [  136.205921] switch: port 24(lan24) entered learning state
00:51:53 daemon.info ustpd[1836]: MSTP_OUT_set_state: switch:lan24:0 entering blocking state
00:51:53 kern.info kernel: [  137.141008] switch: port 24(lan24) entered blocking state
00:51:53 daemon.info ustpd[1836]: set_if_up: Port lan24 : up
00:51:53 daemon.info ustpd[1836]: MSTP_OUT_set_state: switch:lan24:0 entering learning state
00:51:53 kern.info kernel: [  137.205389] switch: port 24(lan24) entered learning state
00:51:53 daemon.info ustpd[1836]: set_if_up: Port lan24 : up
00:51:54 daemon.info ustpd[1836]: MSTP_OUT_set_state: switch:lan24:0 entering blocking state
00:51:54 kern.info kernel: [  138.139204] switch: port 24(lan24) entered blocking state
00:51:54 daemon.info ustpd[1836]: set_if_up: Port lan24 : up
00:51:54 daemon.info ustpd[1836]: MSTP_OUT_set_state: switch:lan24:0 entering learning state
00:51:54 kern.info kernel: [  138.210883] switch: port 24(lan24) entered learning state
00:51:54 daemon.info ustpd[1836]: set_if_up: Port lan24 : up
00:51:55 daemon.info ustpd[1836]: MSTP_OUT_set_state: switch:lan24:0 entering blocking state
00:51:55 kern.info kernel: [  139.139155] switch: port 24(lan24) entered blocking state
00:51:55 daemon.info ustpd[1836]: set_if_up: Port lan24 : up
00:51:55 daemon.info ustpd[1836]: MSTP_OUT_set_state: switch:lan24:0 entering learning state
00:51:55 daemon.info ustpd[1836]: set_if_up: Port lan24 : up
00:51:55 kern.info kernel: [  139.206967] switch: port 24(lan24) entered learning state
00:51:56 daemon.info ustpd[1836]: MSTP_OUT_set_state: switch:lan24:0 entering blocking state
00:51:56 kern.info kernel: [  140.140014] switch: port 24(lan24) entered blocking state
00:51:56 daemon.info ustpd[1836]: set_if_up: Port lan24 : up
00:51:56 daemon.info ustpd[1836]: MSTP_OUT_set_state: switch:lan24:0 entering learning state
00:51:56 daemon.info ustpd[1836]: set_if_up: Port lan24 : up
00:51:56 kern.info kernel: [  140.207609] switch: port 24(lan24) entered learning state
00:51:57 daemon.info ustpd[1836]: MSTP_OUT_set_state: switch:lan24:0 entering blocking state
00:51:57 kern.info kernel: [  141.141071] switch: port 24(lan24) entered blocking state
00:51:57 daemon.info ustpd[1836]: set_if_up: Port lan24 : up
00:51:57 daemon.info ustpd[1836]: MSTP_OUT_set_state: switch:lan24:0 entering learning state
00:51:57 kern.info kernel: [  141.209503] switch: port 24(lan24) entered learning state
00:51:57 daemon.info ustpd[1836]: set_if_up: Port lan24 : up
00:51:58 daemon.info ustpd[1836]: MSTP_OUT_set_state: switch:lan24:0 entering blocking state
00:51:58 daemon.info ustpd[1836]: set_if_up: Port lan24 : up
00:51:58 kern.info kernel: [  142.142237] switch: port 24(lan24) entered blocking state
00:51:58 daemon.info ustpd[1836]: MSTP_OUT_set_state: switch:lan24:0 entering learning state
00:51:58 daemon.info ustpd[1836]: set_if_up: Port lan24 : up
00:51:58 kern.info kernel: [  142.207413] switch: port 24(lan24) entered learning state
00:51:59 daemon.info ustpd[1836]: MSTP_OUT_set_state: switch:lan24:0 entering blocking state
00:51:59 daemon.info ustpd[1836]: set_if_up: Port lan24 : up
00:51:59 kern.info kernel: [  143.143393] switch: port 24(lan24) entered blocking state
00:51:59 daemon.info ustpd[1836]: MSTP_OUT_set_state: switch:lan24:0 entering learning state
00:51:59 daemon.info ustpd[1836]: set_if_up: Port lan24 : up
00:51:59 kern.info kernel: [  143.206737] switch: port 24(lan24) entered learning state
00:52:00 daemon.info ustpd[1836]: MSTP_OUT_set_state: switch:lan24:0 entering blocking state
00:52:00 daemon.info ustpd[1836]: set_if_up: Port lan24 : up
00:52:00 kern.info kernel: [  144.144553] switch: port 24(lan24) entered blocking state
00:52:00 daemon.info ustpd[1836]: MSTP_OUT_set_state: switch:lan24:0 entering learning state
00:52:00 daemon.info ustpd[1836]: set_if_up: Port lan24 : up
00:52:00 kern.info kernel: [  144.206766] switch: port 24(lan24) entered learning state
00:52:01 daemon.info ustpd[1836]: MSTP_OUT_set_state: switch:lan24:0 entering blocking state
00:52:01 daemon.info ustpd[1836]: set_if_up: Port lan24 : up
00:52:01 kern.info kernel: [  145.152377] switch: port 24(lan24) entered blocking state
00:52:01 daemon.info ustpd[1836]: MSTP_OUT_set_state: switch:lan24:0 entering learning state
00:52:01 daemon.info ustpd[1836]: set_if_up: Port lan24 : up
00:52:01 kern.info kernel: [  145.206717] switch: port 24(lan24) entered learning state
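
To quantify the flapping in a log like the one above, a small Python sketch can extract the ustpd state transitions for one port and count how often each state is entered. It assumes the exact "MSTP_OUT_set_state: <bridge>:<port>:<msti> entering <state> state" wording seen in these logs. The function names are hypothetical.

```python
import re
from collections import Counter

# Matches ustpd lines such as:
# "00:51:50 daemon.info ustpd[1836]: MSTP_OUT_set_state: switch:lan24:0 entering learning state"
STATE_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}) .*MSTP_OUT_set_state: "
    r"(?P<bridge>[^:]+):(?P<port>[^:]+):(?P<msti>\d+) entering (?P<state>\w+) state"
)


def state_transitions(lines, port):
    """Yield (time, state) tuples for every ustpd state change on the given port."""
    for line in lines:
        m = STATE_RE.search(line)
        if m and m.group("port") == port:
            yield m.group("time"), m.group("state")


def summarize(lines, port):
    """Count transitions per state and return them together with the last state seen."""
    transitions = list(state_transitions(lines, port))
    counts = Counter(state for _, state in transitions)
    last_state = transitions[-1][1] if transitions else None
    return counts, last_state


if __name__ == "__main__":
    sample = [
        "00:51:50 daemon.info ustpd[1836]: MSTP_OUT_set_state: switch:lan24:0 entering learning state",
        "00:51:50 daemon.info ustpd[1836]: MSTP_OUT_set_state: switch:lan24:0 entering forwarding state",
        "00:51:51 daemon.info ustpd[1836]: set_if_up: Port lan24 : up",
        "00:51:51 daemon.info ustpd[1836]: MSTP_OUT_set_state: switch:lan24:0 entering blocking state",
        "00:51:51 daemon.info ustpd[1836]: MSTP_OUT_set_state: switch:lan24:0 entering learning state",
    ]
    counts, last = summarize(sample, "lan24")
    print(dict(counts), last)  # {'learning': 2, 'forwarding': 1, 'blocking': 1} learning
```

Feeding it the full logread output would show whether the port ever stays in forwarding or keeps cycling between blocking and learning.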

The weird thing is that if I connect both to a third switch (a stock HP 1810, priority 61440), the ports go into forwarding as expected, and disconnecting one of the links to the 1920 switches usually works as well. Sometimes, however, one of the switches gets stuck in the learning/blocking flapping again, leaving the connected switch permanently unreachable.

Taking the link down and up again from the other (forwarding) side sometimes resolves it, but after the next link down/up it flaps again and connectivity is lost.

brctl showstp looks unremarkable, except that each switch shows itself as the root bridge:

root@hpe-1920-8g:~# brctl showstp switch
switch
 bridge id              b000.d894036aa2eb
 designated root        b000.d894036aa2eb
 root port                 0                    path cost                  0
 max age                  10.00                 bridge max age            10.00
 hello time                1.00                 bridge hello time          1.00
 forward delay             8.00                 bridge forward delay       8.00
 ageing time             300.00
 hello timer               0.00                 tcn timer                  0.00
 topology change timer     0.00                 gc timer                 186.09
 flags

...

lan7 (7)
 port id                8007                    state                forwarding
 designated root        b000.d894036aa2eb       path cost                  5
 designated bridge      b000.d894036aa2eb       message age timer          0.00
 designated port        8007                    forward delay timer        0.00
 designated cost           0                    hold timer                 0.00
 flags

----

root@hpe-1920-24g:~# brctl showstp switch
switch
 bridge id              a000.2c233a677932
 designated root        a000.2c233a677932
 root port                 0                    path cost                  0
 max age                  10.00                 bridge max age            10.00
 hello time                1.00                 bridge hello time          1.00
 forward delay             8.00                 bridge forward delay       8.00
 ageing time             300.00
 hello timer               0.00                 tcn timer                  0.00
 topology change timer     0.00                 gc timer                   1.75
 flags

lan24 (24)
 port id                8018                    state                  learning
 designated root        a000.2c233a677932       path cost                  5
 designated bridge      a000.2c233a677932       message age timer          0.00
 designated port        8018                    forward delay timer        0.00
 designated cost           0                    hold timer                 0.00
 flags
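
The telling part of both outputs is that designated root equals the local bridge id on each switch. Below is a minimal Python sketch that extracts the bridge-level fields from brctl showstp output and checks exactly that. It also verifies the 802.1D timer relationship 2 * (forward delay - 1) >= max age >= 2 * (hello time + 1), which the configured 10 / 1 / 8 second values satisfy. It assumes the two-column layout shown above, and the function names are hypothetical.

```python
import re

# Bridge-level fields: name -> (regex capturing the value, converter).
FIELDS = {
    "bridge_id": (r"bridge id\s+(\S+)", str),
    "designated_root": (r"designated root\s+(\S+)", str),
    "max_age": (r"bridge max age\s+([\d.]+)", float),
    "hello_time": (r"bridge hello time\s+([\d.]+)", float),
    "forward_delay": (r"bridge forward delay\s+([\d.]+)", float),
}


def parse_bridge_header(text: str) -> dict:
    """Extract bridge-level fields from 'brctl showstp <bridge>' output."""
    header = text.split("\n\n", 1)[0]  # the bridge section ends at the first blank line
    info = {}
    for name, (pattern, convert) in FIELDS.items():
        match = re.search(pattern, header)
        if match:
            info[name] = convert(match.group(1))
    return info


def believes_it_is_root(info: dict) -> bool:
    """True if the bridge lists its own ID as the designated root."""
    return info["bridge_id"] == info["designated_root"]


def timers_consistent(info: dict) -> bool:
    """802.1D timer rule: 2*(forward_delay - 1) >= max_age >= 2*(hello_time + 1)."""
    return 2 * (info["forward_delay"] - 1) >= info["max_age"] >= 2 * (info["hello_time"] + 1)
```

If Python is available, the text could be obtained with subprocess.run(["brctl", "showstp", "switch"], capture_output=True, text=True).stdout and passed to parse_bridge_header().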

With ustp disabled, the bridge forwards normally (in kernel STP mode). Am I missing something?

I've been debugging this: the issue is always triggered if the affected switch receives a BPDU from the other switch first.
ustpd then puts the port into the disputed state. BPDUs are sent out on the affected switch/port (visible in tcpdump), but interestingly they never show up in tcpdump on the other side.
If the affected switch sends out an RSTP packet before receiving one, both sides go into forwarding.
I suspect this could be due to DSA bridge offloading, which might be blocking the STP frames as well.
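
Some background on the "disputed" state: in IEEE 802.1D-2004 RSTP (the recordDispute() procedure), a port that receives inferior information from a neighbour claiming the designated role with the Learning flag set marks itself as disputed and falls back to discarding. If one side's BPDUs are dropped on egress, the other switch keeps sending such inferior designated BPDUs every hello time. That would be consistent with the once-per-second blocking/learning cycle in the log. The following is a simplified Python sketch of that check. It decodes an RST BPDU without its LLC header and compares priority vectors. It ignores the same-designated-bridge case, the receiving port's own IDs, and timers. build_rst_bpdu and the other helpers are hypothetical.

```python
import struct
from dataclasses import dataclass

# RST BPDU body from IEEE 802.1D-2004 (36 octets, big-endian, LLC header stripped):
# protocol id, version, type, flags, root id, root path cost, bridge id, port id,
# message age, max age, hello time, forward delay, version 1 length.
RST_BPDU = struct.Struct(">HBBB8sI8sHHHHHB")
ROLES = {0: "unknown", 1: "alternate/backup", 2: "root", 3: "designated"}


@dataclass
class Bpdu:
    flags: int
    root_id: bytes
    root_path_cost: int
    bridge_id: bytes
    port_id: int

    @property
    def role(self) -> str:
        return ROLES[(self.flags >> 2) & 0x3]  # flag bits 2-3 encode the port role

    @property
    def learning(self) -> bool:
        return bool(self.flags & 0x10)  # flag bit 4

    def vector(self) -> tuple:
        # Priority vector: a lower tuple compares as better information.
        return (self.root_id, self.root_path_cost, self.bridge_id, self.port_id)


def build_rst_bpdu(flags, root_hex, cost, bridge_hex, port) -> bytes:
    """Hypothetical test helper: encode an RST BPDU (timers in 1/256 s: max age 10, hello 1, fwd delay 8)."""
    return RST_BPDU.pack(0, 2, 2, flags, bytes.fromhex(root_hex), cost,
                         bytes.fromhex(bridge_hex), port, 0, 10 * 256, 1 * 256, 8 * 256, 0)


def parse_rst_bpdu(data: bytes) -> Bpdu:
    proto, version, bpdu_type, flags, root, cost, bridge, port, *_rest = RST_BPDU.unpack(data[:RST_BPDU.size])
    if proto != 0 or version != 2 or bpdu_type != 2:
        raise ValueError("not an RST BPDU")
    return Bpdu(flags, root, cost, bridge, port)


def is_dispute(received: Bpdu, own_vector: tuple) -> bool:
    """Simplified: inferior info from a port claiming the designated role with Learning set."""
    return received.role == "designated" and received.learning and received.vector() > own_vector


if __name__ == "__main__":
    # Designated role (0x0C) + learning (0x10) + forwarding (0x20), root = the b000 bridge.
    rx = parse_rst_bpdu(build_rst_bpdu(0x3C, "b000d894036aa2eb", 0, "b000d894036aa2eb", 0x8007))
    own = (bytes.fromhex("a0002c233a677932"), 0, bytes.fromhex("a0002c233a677932"), 0x8018)
    print(rx.role, rx.learning, is_dispute(rx, own))  # designated True True
```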

So I’m still not sure why the BPDUs on the affected port are not sent out. For this switch family there is actually no difference between blocking and listening, so there should be no reason for the switch to drop a BPDU coming from the CPU.
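
In RSTP terms (IEEE 802.1D-2004), the legacy disabled, blocking and listening states all collapse into a single discarding state, and only learning and forwarding remain distinct. The port state governs ordinary data frames. Whether BPDUs are sent depends on the port role, so a designated port still transmits BPDUs while discarding. This small Python sketch of the mapping can be used when reading the kernel's "entered ... state" lines next to ustpd's messages. The dictionary and function names are hypothetical.

```python
# Legacy STP port states (as printed by the kernel bridge) -> RSTP port states.
STP_TO_RSTP = {
    "disabled": "discarding",
    "blocking": "discarding",
    "listening": "discarding",
    "learning": "learning",
    "forwarding": "forwarding",
}

# What each RSTP state does with ordinary data frames. BPDU transmission is governed
# by the port role instead (a designated port still sends BPDUs while discarding).
DATA_PLANE = {
    "discarding": {"learn_macs": False, "forward_frames": False},
    "learning": {"learn_macs": True, "forward_frames": False},
    "forwarding": {"learn_macs": True, "forward_frames": True},
}


def data_plane(kernel_state: str) -> dict:
    """Return the data-frame behaviour for a kernel state name like 'blocking'."""
    return DATA_PLANE[STP_TO_RSTP[kernel_state]]


if __name__ == "__main__":
    print(data_plane("blocking") == data_plane("listening"))  # True
```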

But my theory seems to be correct: the rtl838x debug STP egress drop counter, STP_EGR_DROP, is high and constantly incrementing, matching the number of RSTP packets sent out by ustpd:

root@switch-lounge:~# cat /sys/kernel/debug/rtl838x/drop_counters
STP_IGR_DROP: 102
STP_EGR_DROP: 381
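
To correlate STP_EGR_DROP with the BPDUs ustpd sends (one per hello time, so roughly one per second with the timers above), a Python sketch can read /sys/kernel/debug/rtl838x/drop_counters twice and compute the increase per second. It assumes the "NAME: value" line format shown above, a mounted debugfs, and Python being available on the switch. The function names are hypothetical.

```python
import time

COUNTERS_PATH = "/sys/kernel/debug/rtl838x/drop_counters"


def parse_counters(text: str) -> dict:
    """Parse 'NAME: value' lines into a {name: int} dict, skipping anything else."""
    counters = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            counters[name.strip()] = int(value)
    return counters


def rates(before: dict, after: dict, seconds: float) -> dict:
    """Counter increase per second between two snapshots."""
    return {name: (after[name] - before.get(name, 0)) / seconds for name in after}


def sample_rates(interval: float = 10.0, path: str = COUNTERS_PATH) -> dict:
    """Read the counters twice, `interval` seconds apart (run on the switch itself)."""
    with open(path) as f:
        before = parse_counters(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_counters(f.read())
    return rates(before, after, interval)
```

An STP_EGR_DROP rate close to 1.0 per second would match one dropped BPDU per hello interval.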

I’d actually like to move this topic to the development category since it’s rtl838x-specific, but it seems that posts can no longer be edited after some time.

Did you ever find out what’s going on here? For me ustpd does not seem to work. Kernel STP without ustpd works fine together with my TP-Link switches, but when ustpd is enabled, the RSTP packets never seem to reach the kernel. I can see them via tcpdump on the connected LAN port, but not on eth0, and neither my Zyxel 1900-10HP switch (on main) nor my GL-MT6000 router (on 24.10-SNAPSHOT) sets the root bridge correctly when ustpd is running (i.e. brctl showstp <bridge> shows the local bridge ID as root).
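
STP and RSTP BPDUs are 802.3 frames sent to the Bridge Group Address 01:80:C2:00:00:00 with an LLC header of DSAP 0x42, SSAP 0x42, control 0x03. When comparing captures on the LAN port and on the conduit interface eth0, a small Python sketch like the following can classify raw frames. It assumes the frame starts with a plain, untagged Ethernet header with no DSA/CPU tag in front. The function names are hypothetical, and getting the frames (from a pcap file or an AF_PACKET socket) is left out.

```python
BRIDGE_GROUP_ADDRESS = bytes.fromhex("0180c2000000")
LLC_STP = bytes([0x42, 0x42, 0x03])  # DSAP, SSAP, UI control


def is_stp_bpdu(frame: bytes) -> bool:
    """Return True for an untagged 802.3/LLC frame carrying an STP/RSTP BPDU."""
    if len(frame) < 14 + 3 + 4:  # Ethernet header + LLC + start of BPDU
        return False
    length = int.from_bytes(frame[12:14], "big")
    # Values <= 1500 in this field are an 802.3 length, not an EtherType.
    return frame[0:6] == BRIDGE_GROUP_ADDRESS and length <= 1500 and frame[14:17] == LLC_STP


def bpdu_kind(frame: bytes) -> str:
    """Classify by the protocol version and BPDU type octets that follow the LLC header."""
    version, bpdu_type = frame[19], frame[20]
    if bpdu_type == 0x00:
        return "config (STP)"
    if bpdu_type == 0x80:
        return "TCN (STP)"
    if bpdu_type == 0x02 and version >= 2:
        return "RST BPDU" if version == 2 else "MST BPDU"
    return "unknown"
```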