br-vlX: received packet on bat0.X with own address as source address (addr:...., vlan:0)

Hi,

can you please help me debug this? It has been occurring since I upgraded to 21.02.0-rc.3 (from 19.07.7). Unit B, which is connected and manageable only through the batman-adv tunnel over wireless, sporadically loses ping and WebIF access for minutes at a time. In between, some packets get through, so our monitoring system does not detect the "short" outages. I noticed that the connection problems started whenever the above kernel message appeared.

unit A: LAN uplink, TP-Link Archer C7v2 running OpenWrt 21.02.0-rc.3
unit B: WiFi uplink through BATMAN-ADV, TP-Link Archer C7v5 running OpenWrt 21.02.0-rc.3

I've examined the network config again but found no loops or anything else suspicious in it.

Thanks for your help in advance.

Unit B - network switch config:

  • vl10 is the batman-adv "uplink over wifi 802.11s".


My setup is as follows:

  • (LAN, wired) == unit A (batman-adv, bridge between eth0.10 and bat0.10) == (WiFi 802.11s mesh) == unit B (batman-adv, bridge between eth0.10 and bat0.10) == (Second AP on another radio for clients)

Is there ANY device that routes packets on the LAN side of your network that is not running OpenWrt?

Also, go ahead and enable STP, and disable multicast unless you need it for some reason.

Forgot to mention: STP needs to be on for all devices that are bridged together for it to take effect.

Thanks for your reply. I've got this config for bat0. I need multicast to calm down packet storms caused by Chromecast devices doing mDNS discovery. I wonder why it worked so smoothly for a year on 19.07.x?!

Where exactly should I set STP = on in the config text files? Isn't this the same as bridge_loop_avoidance?

config interface 'bat0'
	option proto 'batadv'
	option routing_algo 'BATMAN_IV'
	option aggregation '1'
	option ap_isolation '0'
	option bonding '0'
	option fragmentation '1'
	option gw_mode 'off'
	option log_level '0'
	option orig_interval '10000'
	option bridge_loop_avoidance '1'
	option distributed_arp_table '1'
	option multicast_mode '1'
	option network_coding '0'
	option hop_penalty '30'
	option isolation_mark '0x00000000/0x00000000'

I've now additionally enabled STP in the bridge. Is it correct to have bridge loop avoidance and bridge STP enabled at the same time?


If you enable STP for all devices in the network, you can disable the batman-adv-specific "bridge loop avoidance", but having both enabled should be harmless.


No, it's OpenWrt only.

I did enable STP and it went fine for about 7 days, but now the problem has come back.

[335039.834769] net_ratelimit: 17 callbacks suppressed
[335039.834780] br-vl10: received packet on bat0.10 with own address as source address (addr:xx:xx:xx:xx:xx:xx, vlan:0)

Could this be a bug in the bridge or batman-adv? I only see this on one batman-adv device; all other OpenWrt devices (wired, wireless) don't show this problem after some days of uptime. The MAC address in question is not found in our network documentation, so I suspect it is something "internal" to the WiFi AP or bridge.

@mpratt14 Enabling STP did not help. I still got "received packet on bat0.10 with own address" randomly on one of the five APs; it seemed like "one day AP 1, the other day AP 4, ...".

config device
	option type 'bridge'
(...)
	option stp '1'
	option hello_time '10'

I've now removed STP from the bridge device (which holds eth0.10 and bat0.10) and added

option ipv6 '0'

instead. Will see if that helps. (batman-adv bridge loop avoidance is still on - unchanged)

I've noticed there are about 5 minutes between those blocks in the logread output. Does anyone have a guess as to what might be causing this? I've triple-checked all bridges and switches and temporarily disabled the bat0 interface, but the problem does not go away. The other (physical) switches do not blink like crazy, and I'm sure there isn't a loop on the network. What can it be?

Aug  9 12:12:58 WifiAP-01 kernel: [ 2118.755113] br-vl10: received packet on eth0.10 with own address as source address (addr:a4:2b:b0:xx:xx:xx, vlan:0)
Aug  9 12:17:20 WifiAP-01 kernel: [ 2380.792039] br-vl10: received packet on eth0.10 with own address as source address (addr:a4:2b:b0:xx:xx:xx, vlan:0)
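
For reference, the spacing between the two warnings above can be checked from the kernel timestamps in square brackets. This is only a minimal Python sketch: the helper name `gaps_between` is made up for illustration, and the two sample lines are copied from the log above.

```python
import re

# Kernel uptime stamp as printed in the log, e.g. "[ 2118.755113]".
STAMP = re.compile(r"\[\s*(\d+\.\d+)\]")

def gaps_between(lines):
    """Return the seconds elapsed between consecutive stamped log lines."""
    stamps = [float(m.group(1)) for m in map(STAMP.search, lines) if m]
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]

log = [
    "Aug  9 12:12:58 WifiAP-01 kernel: [ 2118.755113] br-vl10: received packet on eth0.10 with own address as source address (addr:a4:2b:b0:xx:xx:xx, vlan:0)",
    "Aug  9 12:17:20 WifiAP-01 kernel: [ 2380.792039] br-vl10: received packet on eth0.10 with own address as source address (addr:a4:2b:b0:xx:xx:xx, vlan:0)",
]

for gap in gaps_between(log):
    print(f"{gap:.0f} s ({gap / 60:.1f} min)")
```

For these two lines the gap comes out at roughly 262 seconds, which agrees with the wall-clock times (12:12:58 and 12:17:20).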

STP on all devices that route packets, including APs?

Sorry, no. We have no possibility of a "round trip" because there are no redundant extra connections available. This problem appeared with 21.02.0-rcX and wasn't there on 19.07.x.

I've now solved the problem. The culprit was a wrongly configured broadcast address in /etc/config/network. The bridge br-vl10 is part of the IP range 10.20.10.0/23. The broadcast address was WRONGLY configured as 10.20.10.255. I changed it to 10.20.11.255 by removing the option from the config file; OpenWrt then calculated the correct address automatically and showed it in the LuCI web UI.

After applying the new broadcast address which fits "the existing physical network range", the errors "received packet with own address as source" ceased.
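
The arithmetic behind the corrected address can be double-checked with Python's standard `ipaddress` module. This is just a sketch to verify the numbers from this thread, not something that runs on the router; the helper name `broadcast_of` is made up for illustration.

```python
import ipaddress

def broadcast_of(cidr):
    """Return the broadcast address of an IPv4 network given in CIDR form."""
    return ipaddress.ip_network(cidr).broadcast_address

net = ipaddress.ip_network("10.20.10.0/23")
wrong = ipaddress.ip_address("10.20.10.255")

# A /23 spans 10.20.10.0 through 10.20.11.255, so the broadcast address
# is the last address of the second /24 half, not of the first.
print(broadcast_of("10.20.10.0/23"))

# 10.20.10.255 lies inside the /23 but is not its broadcast address,
# i.e. in this network it is an ordinary host address.
print(wrong in net and wrong != net.broadcast_address)
```

The first line printed is 10.20.11.255; the second is True, showing that 10.20.10.255 would only be the broadcast address if the network were a /24.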

I did not change my STP/bridge_loop_avoidance settings. They are still the same, so STP = off (as I don't have redundant LAN/WiFi inter-connections) and bridge_loop_avoidance = 1 (batman-adv setting).

Everything is running error-free now. Thanks again for your help! :)


Just for reference, if anyone has the same problem with the wrong broadcast address in use, those packets will show up simultaneously in Wireshark when the "br-vlX: received packet on bat0.X with own address as source address" error appears in "logread -f".


The problem came back, and it's similar to: Kernel Warning - Received packet with own address as source address - #53 by Ellah1

Note: the affected host (WifiAP-xx) changes randomly between 01 and 05 every 5-8 minutes.

Aug 24 09:36:00 WifiAP-02 kernel: [19829.723799] net_ratelimit: 28 callbacks suppressed
Aug 24 09:36:00 WifiAP-02 kernel: [19829.723811] br-vl10: received packet on eth0.10 with own address as source address (addr:98:da:c4:bf:c6:cc, vlan:0)
Aug 24 09:36:00 WifiAP-02 kernel: [19829.740174] br-vl10: received packet on eth0.10 with own address as source address (addr:98:da:c4:bf:c6:cc, vlan:0)
Aug 24 09:36:00 WifiAP-02 kernel: [19829.751075] br-vl10: received packet on eth0.10 with own address as source address (addr:98:da:c4:bf:c6:cc, vlan:0)
Aug 24 09:36:00 WifiAP-02 kernel: [19829.761939] br-vl10: received packet on eth0.10 with own address as source address (addr:98:da:c4:bf:c6:cc, vlan:0)
Aug 24 09:36:00 WifiAP-02 kernel: [19829.772890] br-vl10: received packet on eth0.10 with own address as source address (addr:98:da:c4:bf:c6:cc, vlan:0)
Aug 24 09:36:00 WifiAP-02 kernel: [19829.784751] br-vl10: received packet on eth0.10 with own address as source address (addr:98:da:c4:bf:c6:cc, vlan:0)
Aug 24 09:36:00 WifiAP-02 kernel: [19829.795667] br-vl10: received packet on eth0.10 with own address as source address (addr:98:da:c4:bf:c6:cc, vlan:0)
Aug 24 09:36:00 WifiAP-02 kernel: [19829.806527] br-vl10: received packet on eth0.10 with own address as source address (addr:98:da:c4:bf:c6:cc, vlan:0)
Aug 24 09:36:00 WifiAP-02 kernel: [19829.817393] br-vl10: received packet on eth0.10 with own address as source address (addr:98:da:c4:bf:c6:cc, vlan:0)
Aug 24 09:36:00 WifiAP-02 kernel: [19829.828253] br-vl10: received packet on eth0.10 with own address as source address (addr:98:da:c4:bf:c6:cc, vlan:0)
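
To see at a glance which AP, bridge port and MAC the warnings cluster on, a saved log can be tallied. The following is only a sketch in Python, meant to be run on a workstation against a copy of the `logread` output; the names `WARNING` and `tally` are made up for illustration, and the regular expression assumes the exact message format shown above.

```python
import re
from collections import Counter

# Matches the bridge warning, with or without the "<host> kernel:" prefix.
WARNING = re.compile(
    r"(?:(?P<host>\S+) kernel: )?\[\s*\d+\.\d+\] "
    r"(?P<bridge>\S+): received packet on (?P<port>\S+) "
    r"with own address as source address "
    r"\(addr:(?P<mac>[0-9A-Fa-fx:]+), vlan:(?P<vlan>\d+)\)"
)

def tally(lines):
    """Count warnings per (host, bridge, port, mac); other lines are skipped."""
    counts = Counter()
    for line in lines:
        m = WARNING.search(line)
        if m:
            counts[(m["host"], m["bridge"], m["port"], m["mac"])] += 1
    return counts

sample = [
    "Aug 24 09:36:00 WifiAP-02 kernel: [19829.723799] net_ratelimit: 28 callbacks suppressed",
    "Aug 24 09:36:00 WifiAP-02 kernel: [19829.723811] br-vl10: received packet on eth0.10 with own address as source address (addr:98:da:c4:bf:c6:cc, vlan:0)",
    "Aug 24 09:36:00 WifiAP-02 kernel: [19829.740174] br-vl10: received packet on eth0.10 with own address as source address (addr:98:da:c4:bf:c6:cc, vlan:0)",
]

for key, count in tally(sample).most_common():
    print(count, key)
```

Note that the `net_ratelimit` lines mean the kernel suppressed further messages, so the real number of offending packets is higher than any such count.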

I ran

tcpdump -i br-vl10 -evn not host 10.20.10.61 and ether host a4:2b:b0:de:86:21

on WifiAP-01 and got:

(...a lot more of the same...)
09:31:44.368052 a4:2b:b0:de:86:21 > 01:00:5e:00:00:01, ethertype IPv4 (0x0800), length 1496: (tos 0xc0, ttl 1, id 0, offset 0, flags [DF], proto IGMP (2), length 32, options (RA))
    0.0.0.0 > 224.0.0.1: igmp query v2

What can I do about this problem? Seems like a bug?!

The MAC 01:00:5e:00:00:01 is unknown to me; Google says: https://superuser.com/questions/1234592/why-does-router-send-packets-to-the-multicast-address
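
For what it's worth, 01:00:5e:00:00:01 is not the MAC of a particular device. It is the Ethernet multicast address that the IPv4 group 224.0.0.1 (the destination of the IGMP query in the tcpdump output above) maps to: the low 23 bits of the group address are copied into the fixed 01:00:5e prefix. A small Python sketch of that mapping, with a function name made up for illustration:

```python
import ipaddress

def multicast_mac(group):
    """Map an IPv4 multicast group to its Ethernet multicast MAC address."""
    addr = ipaddress.IPv4Address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not an IPv4 multicast address")
    # Keep the low 23 bits of the group and place them under 01:00:5e.
    mac = 0x01005E000000 | (int(addr) & 0x7FFFFF)
    return ":".join(f"{(mac >> shift) & 0xFF:02x}" for shift in range(40, -1, -8))

print(multicast_mac("224.0.0.1"))
```

This prints 01:00:5e:00:00:01, so the frame in the capture is simply an IGMP general query addressed to all hosts on the segment.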

I also get "br-lan: received packet on bat0.100 with own address as source address" and I don't know how to solve it ...

Try setting "option multicast_querier '0'" on the bridge device. That solved it for me.

I think it was because the mesh interface and the br-lan bridge shared the same MAC, so I had to change the MAC of the mesh interface, and the message no longer appears. Where is that option? In bat0 or br-lan?

In my case, bat0 got a random auto-generated MAC by itself. I did not have to set the MAC explicitly.

Yes, I also get a random MAC on bat0, but the one I had to change is the one for the 802.11s mesh interface in the /etc/config/wireless file, because apparently one of the wireless interfaces takes its MAC from eth0. In my case, after I changed the 802.11s MAC, the AP interface took the eth0 MAC, and the other guest AP then generated a MAC derived from eth0.
I also see that VLANs take the MAC of the interface, for example eth0.1 from eth0 and bat0.100 from bat0.
So I guess it is a MAC address allocation and creation problem.

I thought that too, but I was only able to solve the problem by turning the multicast querier off. What's remarkable here: I have two identical mesh setups in two different places (two independent networks). The first and the second OpenWrt devices have exactly the same settings. Network 1 didn't show the "received packet with own address" problem; network 2 did. Network 2 has a different switch (Brocade) than network 1 (TP-Link). According to its config, the Brocade handles multicast as "multicast passive". I suspect a conflict between what the Brocade does and what OpenWrt does, hence the problem first appearing some hours after the OpenWrt multicast querier was set to 1 on network 2.

Thanks, that solved it for me. :)