Debug Environment:
- Devices: Two OpenWrt routers connected via a VLAN-bridged network (bridge name: br).
- VLAN Tagging: LAN traffic uses VLAN tag 10.
- Second Bridge: Created a second bridge (brl) above br.10 (just for a clearer view of the bridged ports and wireless interfaces).
- LAN Configuration: LAN network is attached to the brl bridge.
- Note: Both OpenWrt devices have identical configurations.
Symptoms & Observations:
- Unexpected DHCP Discover: When an STA moves from the proximity of one device to the other, it sends a DHCP Discover message, which is unexpected when Fast Transition (FT) is enabled.
- DHCP Offer Limited to First Device: The DHCP Offer reply is only visible on the first device, which acts as the DHCP server. It does not propagate to the second device, to which the STA has roamed.
Bridge FDB Observations:
I ran bridge fdb show | grep 'macaddr of STA' on the second OpenWrt device. It shows the STA's MAC address on several interfaces and VLANs at the same time, which points to stale entries. It seems the second device either discards the packets or sends them back toward the source because of these stale or erroneous entries.
First device:
bridge fdb show | grep '??:??:??:??:??:??'
??:??:??:??:??:?? dev lan1 vlan 10 master br
??:??:??:??:??:?? dev br.10 master brl
Second device:
bridge fdb show | grep '??:??:??:??:??:??'
??:??:??:??:??:?? dev sw-eth1 vlan 10 master br
??:??:??:??:??:?? dev sw-eth1 vlan 10 self
??:??:??:??:??:?? dev phy1-ap0 master brl
phy0-ap0: wireless interface, 2.4 GHz
phy1-ap0: wireless interface, 5 GHz
The same thing happens when the STA simply switches from 'phy0-ap0' to 'phy1-ap0' (2.4 GHz to 5 GHz), both with and without FT enabled.
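To spot this condition quickly, a small helper along these lines can list every port on which a given MAC currently appears. This is only a sketch: fdb_ports_for_mac is a made-up name, and it simply parses text in the bridge fdb show format from the listings above, read from stdin.

```shell
#!/bin/sh
# Hypothetical helper (not part of OpenWrt): print the distinct ports on
# which a MAC address appears in `bridge fdb show` output read from stdin.
fdb_ports_for_mac() {
    mac="$1"
    awk -v mac="$mac" '
    # Match the MAC in the first column, case-insensitively.
    tolower($1) == tolower(mac) {
        # The port name is the field that follows the "dev" keyword.
        for (i = 2; i < NF; i++)
            if ($i == "dev") print $(i + 1)
    }' | sort -u
}
```

Usage would be something like bridge fdb show | fdb_ports_for_mac '??:??:??:??:??:??'. More than one port in the output for a wireless client suggests that an old entry has not been removed yet.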
Setting the MAC address aging time of the bridge containing the VLAN to a low value with
brctl setageing br 3
seems to fix both issues, at least as a temporary workaround.
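To confirm that the new aging time took effect, the value can be read back from sysfs. The sketch below assumes the kernel exposes it in centiseconds under /sys/class/net/<bridge>/bridge/ageing_time. The function name and the optional second argument (an alternate sysfs root, useful only for testing) are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print a bridge's FDB ageing time in seconds.
# Assumption: sysfs reports the value in centiseconds (1/100 s).
bridge_ageing_seconds() {
    br="$1"
    sysfs_root="${2:-/sys/class/net}"   # override only for testing
    raw=$(cat "$sysfs_root/$br/bridge/ageing_time") || return 1
    echo $((raw / 100))
}
```

After brctl setageing br 3, calling bridge_ageing_seconds br should print 3 if the assumption about units holds.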
Conclusion:
I believe there is a flaw or shortcoming in the DSA implementation when dealing with VLANs and MAC address aging. It seems the bridge does not flush stale MAC entries promptly, which affects both Fast Transition and 5 GHz connectivity.
Problem is present on:
- 23.05.0-rc3 r23389-5deed175a5
- SNAPSHOT r23995-ce7209bd21
- also some version of SNAPSHOT r24xxx (but I forgot to write down the exact number)
Also something to note:
My first attempt to solve the issue was to delete the old entries dynamically with a script. However, when I tried to remove them manually, the command
bridge fdb del '??:??:??:??:??:??' dev sw-eth1 vlan 10 master
executes with no issues, but
bridge fdb del '??:??:??:??:??:??' dev sw-eth1 vlan 10 self
always fails with
RTNETLINK answers: No such file or directory
so the next logical step was to set the aging time to a very low value.
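For reference, the scripted cleanup I had in mind could look roughly like the sketch below. It only prints the bridge fdb del commands (a dry run) for entries of a MAC that sit on any port other than the one the STA is currently attached to. The function name and its arguments are hypothetical, and it assumes the same bridge fdb show line format as in the listings above.

```shell
#!/bin/sh
# Hypothetical dry-run helper: read `bridge fdb show` output on stdin and
# print delete commands for entries of MAC ($1) not on the port to keep ($2).
fdb_del_commands() {
    mac="$1"
    keep="$2"
    awk -v mac="$mac" -v keep="$keep" '
    tolower($1) == tolower(mac) && $2 == "dev" && $3 != keep {
        vlan = ""; flag = ""
        for (i = 4; i <= NF; i++) {
            if ($i == "vlan")   vlan = " vlan " $(i + 1)
            if ($i == "self")   flag = " self"
            if ($i == "master") flag = " master"
        }
        # Nothing is executed here; the command is only printed.
        print "bridge fdb del " $1 " dev " $3 vlan flag
    }'
}
```

For example, bridge fdb show | fdb_del_commands '??:??:??:??:??:??' phy1-ap0 would print one command per stale entry for inspection. As described above, the commands ending in self failed on my devices, so this alone did not solve the problem.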