Br-<device>: received packet on bat0.<VID> with own address as source address (addr:<MAC>, vlan:0)

After installing and configuring a batman-adv interface using this guide, I'm receiving lots of the following errors on both of my nodes:

br-guest: received packet on bat0.3 with own address as source address (addr:<MAC>, vlan:0)

My setup is as follows:
AP X has internet access and functions as my router and firewall.
AP Y is connected wirelessly via mesh and batman-adv.

/etc/config/wireless on AP Y:

config wifi-iface 'mesh_living'
        option device 'radio1'
        option network 'bat_mesh' 
        option mode 'mesh'
        option mesh_id 'BATMESH'
        option encryption 'sae'
        option key '<PASSWORD>'
        option mesh_fwding '0'
        option mesh_ttl '1'
        option mcast_rate '24000'
        option mesh_rssi_threshold '0'

and /etc/config/network on the same AP:

config interface 'bat0'
        option proto 'batadv'
        option routing_algo 'BATMAN_IV'
        option aggregated_ogms '1'
        option ap_isolation '0'
        option bonding '0'
        option bridge_loop_avoidance '1'
        option distributed_arp_table '1'
        option fragmentation '1'
        option gw_mode 'off'
        option hop_penalty '30'
        option isolation_mark '0x00000000/0x00000000'
        option log_level '0'
        option multicast_mode '1'
        option multicast_fanout '16'
        option network_coding '0'
        option orig_interval '1000'

config interface 'bat_mesh'
        option proto 'batadv_hardif'
        option master 'bat0'
        option mtu '1536'

And in the same file I've bridged my VLANs:

config device
        option name 'br-lan'
        option type 'bridge'
        list ports 'eth1.1'
        list ports 'bat0.1'

config device
        option type 'bridge'
        option name 'br-guest'
        list ports 'bat0.3'

The config on my router (AP X) is very similar, the only difference is that it also has config for routing, firewalling....

So how could I solve this? I've found a similar thread, but I do not have the option multicast_querier enabled, so this couldn't be the culprit.

kr
wouter

My workaround is to set an individual macaddr for the batman interface and for each bridge.

[...]
config device
    option  name            'bat0'
    option  macaddr         '02:00:01:00:00:01'
[...]
config device
    option  name            'br-lan'
    option  type            'bridge'
    list    ports           'eth0.1'
    list    ports           'bat0.1'
    option  macaddr         '02:00:01:01:00:01'
[...]
config device
    option  name            'br-vlan17'
    option  type            'bridge'
    list    ports           'eth0.17'
    list    ports           'bat0.17'
    option  macaddr         '02:00:01:01:00:11'

Are you running a wired AND a wireless backhaul simultaneously? I’ve dealt with this error myself and it was because I misconfigured my backhaul. This is caused by a loop in the bridge you’re using.

If you are running both wired and wireless backhaul, I would keep the wire and shut down the wireless. Best practice for a wired backhaul is to dedicate an Ethernet vlan to your switch for just the backhaul and refrain from sharing it to a bridge used for a network interface.

At the moment it is only 2.4 and 5 GHz, but we could count them as two links, too. I'm not quite certain I understand your point, because having multiple links is the whole point of STP in general, and of loop avoidance with batman-adv in particular.

No, but I'm planning to do this on a later stage.
At this moment the APs are connected via a wifi backhaul running on 2.4 GHz (AP Y does not have an Ethernet port nearby, that's why).

Would I benefit from creating a dedicated management VLAN now and using it already for my wifi backhaul (and later on reusing it for my Ethernet backhaul)?

Thanks for the suggestion. So this would not solve the root cause but take the error log entries away? The mac address is just a random one (i.e. not conflicting with something in your local network)?

I have two hands full of VLAN bridges (old Archer C7 and WDR3600, so no DSA or VLAN-aware bridge support) and each VLAN bridge has the tagged eth0 and the tagged bat0 subinterface enslaved...

All pure APs / dumb APs have a "management" VLAN which is configured with proto dhcp; all others are proto none.

The macaddr is "statically" assigned. In my case 02: is one of the "private use" ranges, and I encode a device id, a scope/function and then an incremental value. In the end, the CPE and all APs have an individual macaddr for each interface device, to avoid this "error". I have never encountered an actual issue but was "annoyed" by these messages in the log. Do you see / experience any malfunction?
But yes, you could just pick random addresses and make sure they are unique, but I think this results in more work than just having a "simple" address plan and allocation scheme :person_shrugging:
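
For anyone who wants to script such an address plan, here is a minimal Python sketch. It assumes the layout 02:00:<device id>:<scope>:<two-octet index>, which is only inferred from the addresses in the workaround config earlier in this thread (bat0 as scope 00, bridges as scope 01, VLAN 17 ending in 00:11). The names planned_mac and bridge_stanza are made up for illustration and are not part of OpenWrt.

```python
# Assumed scope values, inferred from the example addresses in this thread.
SCOPE_BAT = 0x00
SCOPE_BRIDGE = 0x01


def planned_mac(device_id: int, scope: int, index: int) -> str:
    """Build 02:00:<device>:<scope>:<index hi>:<index lo> (locally administered)."""
    if not 0 <= device_id <= 0xFF:
        raise ValueError("device_id must fit in one octet")
    if not 0 <= scope <= 0xFF:
        raise ValueError("scope must fit in one octet")
    if not 0 <= index <= 0xFFFF:
        raise ValueError("index must fit in two octets")
    octets = [0x02, 0x00, device_id, scope, index >> 8, index & 0xFF]
    return ":".join(f"{octet:02x}" for octet in octets)


def bridge_stanza(name: str, ports: list[str], mac: str) -> str:
    """Render a 'config device' bridge block in the style used in this thread."""
    def line(kind: str, key: str, value: str) -> str:
        return f"    {kind:<8}{key:<16}'{value}'"

    lines = ["config device", line("option", "name", name), line("option", "type", "bridge")]
    lines += [line("list", "ports", port) for port in ports]
    lines.append(line("option", "macaddr", mac))
    return "\n".join(lines)


if __name__ == "__main__":
    # Device 1, bridge for VLAN 17 (17 decimal is 0x11 in the last octet).
    mac = planned_mac(1, SCOPE_BRIDGE, 17)
    print(bridge_stanza("br-vlan17", ["eth0.17", "bat0.17"], mac))
```

The printed stanza still has to be pasted into /etc/config/network by hand; the sketch does not touch any config files.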

Actually no, I've had this setup for one week now and so far no strange hiccups. The only thing I've noticed is that the wifi mesh backhaul is not up after a reboot; in LEDE I see the following:

However, if I click edit I do see a network interface associated with this wifi:

If I restart my wifi radio, the mesh is up & running again. In the error log I see the following:

Tue Jan  3 13:10:33 2023 daemon.notice mesh11sd[2081]: mesh1 is not up - giving up for now.
Tue Jan  3 13:10:54 2023 daemon.notice mesh11sd[2081]: mesh1 is not up - giving up for now.

I'm not sure whether this has something to do with it (fyi: I've installed mesh11sd previously to create an 802.11s mesh network as described here)

PS: what does CPE stand for?

Two links is what’s causing the loop in the log. STP on the device and bridge loop avoidance are doing their job, which is why you haven’t noticed any hiccups other than the log report, but having those two links is what the system sees as a “loop” per se. Just use your 5 GHz radio for your backhaul. If you’re using any radio for both mesh and AP, you’re effectively cutting your throughput in half for that radio anyway. Dedicate one of your radios to your mesh point (again, preferably your 5 GHz band) and call it a day. The loop in the log should disappear and you’ll get the full throughput on the 2.4 GHz radio.

Regarding the wired backhaul, if you do as I recommended in my first post when you’re ready to setup a wired link, you’ll be good to go.

I know it as "Customer Provider Edge", in my case doing the PPPoE dial-in and the routing. Sometimes it's also called "Customer-premises equipment". (See also: https://en.wikipedia.org/wiki/Customer-premises_equipment.)

I doubt that this is the issue, because it's the whole point: a "redundant" network should have multiple links available. On bridged Layer 2 you need either STP, Rapid STP, Multiple VLAN Rapid STP, or whatever a vendor calls their proprietary implementation... or in our case, batman-adv.
Like I said before, I have never encountered any blocked links, hiccups or issues, only this warning, and it went away when I assigned an individual macaddr to each device.

Point is: STP, without batman-adv, takes 20 to 30 seconds to unblock a link. And sadly (AFAIK) there is no Rapid STP or RSTP per VLAN available for Linux, or at least not for OpenWrt :confused:

For small home networks batman-adv is therefore a nice and suitable option to bridge Layer 2.
I have no deeper knowledge of how batman-adv actually does it, and I just trust its implementation; however, https://www.open-mesh.org/projects/batman-adv/wiki/Bridge-loop-avoidance is a good starting point. (I can happily live with this lack of knowledge.)

I will just share the relevant portion of my setup:

config wifi-iface 'mesh0'
    option  disabled    '0'
    option  device      'radio0'
    option  ifname      'mesh0'
    option  macaddr     '02:00:01:02:00:01'
    option  network     'bat0_hardif_mesh0'
    option  mode        'mesh'
    option  mesh_fwding '0'
    option  mesh_id     'fde6:a09a:b373::/48'
    option  encryption  'psk2+ccmp'

config wifi-iface 'mesh1'
    option  disabled    '0'
    option  device      'radio1'
    option  ifname      'mesh1'
    option  macaddr     '02:00:01:03:00:01'
    option  network     'bat0_hardif_mesh1'
    option  mode        'mesh'
    option  mesh_fwding '0'
    option  mesh_id     'fde6:a09a:b373::/48'
    option  encryption  'psk2+ccmp'

config device
    option  name            'bat0'
    option  macaddr         '02:00:01:00:00:01'

config interface            'bat0'
    option  proto           'batadv'
    option  routing_algo    'BATMAN_IV'

config interface            'bat0_hardif_mesh0'
    option  proto           'batadv_hardif'
    option  master          'bat0'
    option  mtu             '2304'

config interface            'bat0_hardif_mesh1'
    option  proto           'batadv_hardif'
    option  master          'bat0'
    option  mtu             '2304'

I never paid close attention to which link (2.4 or 5 GHz) is actually used and just trust that the better one is picked. Maybe some day I will run cables between the APs, but at least this has worked quite well for a few years :person_shrugging:

PS: @wouterVE If you like: compare your batman settings with the defaults; I assume you will find that you do not actually need to set a bunch of these options. Again, I've been running with defaults for years on a network with 3 to 5 APs and do not experience any noticeable issues. But the APs do not move, and it does not happen very often that I need to reboot an AP or that one loses power. So the network is quite stable. And I use only batman-adv and not 80211sd.

I suppose this is a case of YMMV. If it works, great! On a personal level, I don’t like multiple links for reasons as mentioned.

  1. Sharing a mesh point and an AP on a radio cuts the throughput of the AP in half or worse. (This wouldn’t be so much an issue for those with phenomenal speeds from their ISP, but I only have a 50/10 link right now)

  2. I get those annoying kernel warnings, but I may retry the wired backhaul approach and take your advice to change the macs.

I've switched off all my other wifi APs that use the same radio as the backhaul (i.e. 2.4 GHz) on both mesh nodes and I'm still receiving the same errors. I've done some quick speed tests between both nodes (before and after turning off the APs) and I'm getting the same results (around 80 Mbps with iperf3, which I'm totally fine with).
So I don't see any effect from turning off the APs, or am I missing something in my test?

Thank you very much for providing your config. It's about the same as described in this guide (which is very good btw). I'll start by adding an 02 MAC address and see if my kernel errors are gone.
In the next step I'll use both wifi radios for my wireless backhaul, and in the last step I'll add my 2 other APs with a wired backhaul.

PS: the official docu is also helpful for this matter.

I’m not surprised, because I encountered the same when using an iperf test. There’s a plethora of posts on this forum claiming this, and I only realized it was true when I ran a simpler Speedtest using one of my Fire Sticks. When I shared my 2.4 GHz radio between mesh and AP, I was only getting 15 Mbps. When I dedicated my backhaul to 5 GHz, I was getting my full speed from my ISP, 50 Mbps. Very strange…

Fwiw, my batman-adv stopped the kernel errors (at least for me) when I dedicated one radio to mesh and assigned my main router as server node and my other nodes as client nodes and initiated the bonding feature. It’s been a roller coaster ride learning batman and getting it to work in a manner to my liking. I like the MAC address advice tho and I’ve seen it mentioned before on other posts. Bonding can lower your throughput, especially if your client nodes don’t have a straightforward connection to the server node (I have 4 nodes).

Yes, I was also thinking about measuring with speedtest on my laptop as the next step.

What do you mean by server node & client node? Is this defined by the option gw_mode (+ gw_bandwidth) for the bat0 device (official docu)?

Setting up a network with OpenWrt has been a roller coaster for me, but a very educational one. Thanks to this I now have much more knowledge about networks in general. Getting batman-adv up & running for my whole network will also teach me a lot :slight_smile:

If you go to your bat0 device interface using luci, you’ll see a tab for mesh. In that tab you can set the gateway mode for your node. The main node connected to the internet should be set to ‘server’ so that any other nodes know to send traffic there. You can set any other node to either ‘off’ or ‘client’. From what I’ve gathered in batman’s documentation and forums, setting it to client forces client nodes to find the best route to hop traffic back to the server node.

Ok thx you probably mean this setting:

I thought this was only necessary if you have multiple internet gateways in your network, but I haven't dug into this matter yet. It will probably be a matter of tweaking & measuring to see what's best for my needs.

AFAIK this only matters if you have multiple gateways (uplinks) and/or multiple DHCP servers.

See https://openwrt.org/docs/guide-user/network/wifi/mesh/batman and search for gw_mode, and https://www.open-mesh.org/projects/batman-adv/wiki/Gateways

I have not configured gw_mode, so every mesh point has it off, and I experience no issues, neither with getting DHCP leases nor with finding the gateway. :person_shrugging:

PS: batctl meshif bat0 gw

Indeed, hence you also need to configure gw_bandwidth for each gateway.
In case of 1 internet gateway, routing is handled by the ipv4 settings (i.e. IP of gateway) I think.

Hello,
I've implemented the fixed macaddr solution as proposed by @_bernd and it has been working without problems for a few weeks now.
To summarize, you need to assign a private mac-address* to the following:

Device/interface               Location
bat0 device                    /etc/config/network
Bridge devices (e.g. br-lan)   /etc/config/network
Wifi interfaces                /etc/config/wireless

Do this by adding option macaddr xx:xx:xx:xx:xx:xx to each block in the config.
As a good practice, you can create a logical numbering system for your MAC addresses. I've used the following scheme:

02:00:00:AA:00:00
       |  |  |  |
       |  |  |  ╰---for bat0: 00
       |  |  |      for bridge: vlan id (you can also use the previous 2 positions)
       |  |  |      for Wifi interface: 2 for 2.4ghz | 5 for 5ghz radio
       |  |  |
       |  |  ╰------for bridge: vlan id (you can also use the next 2 positions)
       |  |   
       |  ╰---------Type: A = bat0 device
       |            B = Bridge device
       |            C = Wifi interface
       |
       ╰------------Last number of the router's IP-address eg 01
 
 examples:
    02:00:01:AA:00:00
 Router at IP 192.168.x.01 - bat0 device
    02:00:04:BB:06:53
 Router at IP 192.168.x.04 - Bridge device with vlan 653
    02:00:02:CC:00:05
 Router at IP 192.168.x.02 - wifi interface on radio at 5Ghz

Of course, this is optional: you could also assign random MAC addresses. But using this kind of logic makes troubleshooting much easier (e.g. if you see a MAC address in your log, you know immediately where it originates from).
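
The numbering scheme above can be sketched in a few lines of Python, both for building an address and for reading one back from a log entry. This is only an illustration under some assumptions: the type codes are written doubled (AA/BB/CC) as in the three examples, the router number is limited to two decimal digits, and the VLAN id or radio band is written as four decimal digits across the last two positions. The function names scheme_mac and describe_mac are hypothetical.

```python
# Type codes as they appear in the examples (assumed to be written doubled).
TYPE_CODES = {"bat0": "AA", "bridge": "BB", "wifi": "CC"}


def scheme_mac(router_id: int, kind: str, number: int = 0) -> str:
    """Build 02:00:<router>:<type>:<number as four decimal digits>.

    number is 0 for bat0, the VLAN id for a bridge, and 2 or 5 for a
    wifi interface on the 2.4 GHz or 5 GHz radio.
    """
    if kind not in TYPE_CODES:
        raise ValueError(f"unknown kind: {kind}")
    if not 0 <= router_id <= 99:
        raise ValueError("router_id must fit in two decimal digits")
    if not 0 <= number <= 9999:
        raise ValueError("number must fit in four decimal digits")
    digits = f"{number:04d}"
    return f"02:00:{router_id:02d}:{TYPE_CODES[kind]}:{digits[:2]}:{digits[2:]}"


def describe_mac(mac: str) -> tuple[int, str, int]:
    """Decode an address built by scheme_mac into (router_id, kind, number)."""
    parts = mac.upper().split(":")
    if len(parts) != 6 or parts[:2] != ["02", "00"]:
        raise ValueError("not an address from this scheme")
    kinds = {code: kind for kind, code in TYPE_CODES.items()}
    if parts[3] not in kinds:
        raise ValueError("unknown type code")
    return int(parts[2]), kinds[parts[3]], int(parts[4] + parts[5])


if __name__ == "__main__":
    print(scheme_mac(4, "bridge", 653))
    print(describe_mac("02:00:02:CC:00:05"))
```

Writing the numbers as decimal digits keeps the addresses readable in a log at the cost of wasting part of the address space, which should not matter for a home network.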

*private mac-addresses belong to one of the following blocks:

  • x2:xx:xx:xx:xx:xx
  • x6:xx:xx:xx:xx:xx
  • xA:xx:xx:xx:xx:xx
  • xE:xx:xx:xx:xx:xx

Edit: corrected the information of mac-addresses - thanks @_bernd for this addition
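
To double-check a hand-picked address against these blocks, a small Python sketch like the following could be used. It relies on the fact that a second hex digit of 2, 6, A or E means the "locally administered" bit (0x02) of the first octet is set while the multicast bit (0x01) is clear. The function name is made up for this example.

```python
def is_locally_administered_unicast(mac: str) -> bool:
    """Return True if the MAC falls in one of the x2/x6/xA/xE blocks."""
    # Accept both 02:00:... and 02-00-... notation.
    first_octet = int(mac.replace("-", ":").split(":")[0], 16)
    local_bit = bool(first_octet & 0x02)      # set: locally administered
    multicast_bit = bool(first_octet & 0x01)  # set: multicast/group address
    return local_bit and not multicast_bit


if __name__ == "__main__":
    for candidate in ("02:00:01:AA:00:00", "00:11:22:33:44:55"):
        print(candidate, is_locally_administered_unicast(candidate))
```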

Just an addition:

  • "Private" aka locally administered addresses are
    x2‑xx‑xx‑xx‑xx‑xx
    x6‑xx‑xx‑xx‑xx‑xx
    xA‑xx‑xx‑xx‑xx‑xx
    xE‑xx‑xx‑xx‑xx‑xx
  • To ease your life have a look at /etc/bat-hosts