WiFi mesh using batman-adv: advice needed (it seems I have a loop)

Hi everyone, and I hope you have a happy new year.

I recently bought three Zyxel T-56 routers to use as wireless mesh nodes in my network and retire my ASUS mesh, which completes my migration to a fully OpenWrt-based network.

My goals are as follows:

  • Obviously, I want a mesh network.
  • I want to be able to plug the uplink cable into any port of any node and magically have internet on the other two.
  • The mesh should run over both 5 GHz and 2.4 GHz.
  • In the future I want to be able to add an isolated wireless network to my setup.

To achieve this I read several guides (1, 2, 3, 4) and tried to implement the network as in the following diagram. At this stage only one node works as a bridge between the wired and wireless sides, but that is easy to replicate on the others:

  • On the batman side I can L2-ping every other device by MAC address.
  • For now 802.11s runs only over 5 GHz.
  • The two wireless-only devices have internet access.
  • From the bridge device I can't L3-ping the other two devices, obviously because bat0 is not attached to br-lan.
  • When I add bat0 to br-lan, it seems to create a loop between the cable connection and the batman connection; this causes outages, and the devices apparently can no longer see each other. Without bat0 in br-lan, everything seems to work.
  • Loop prevention (bridge_loop_avoidance) is enabled on all batman devices.

So I have some fundamental questions:

  1. Do I need a batman tunnel between my gateway and the bridge devices? It feels redundant to me, since we already have a LAN cable.
  2. Why is loop prevention not working as it should?

Any suggestions and advice would be appreciated.
Thanks.

Please post all relevant config files as pre-formatted text. Otherwise no one will be able to assist you.

Thanks for your answer.
These are my configs.
On the x86-64 router:

/etc/config/network

config interface 'loopback'
        option device 'lo'
        option proto 'static'
        option ipaddr '127.0.0.1'
        option netmask '255.0.0.0'

config globals 'globals'
        option ula_prefix 'fdd3:2fb3:2c59::/48'

config device
        option name 'br-lan'
        option type 'bridge'
        list ports 'bat0'
        list ports 'eth0'

config interface 'lan'
        option device 'br-lan'
        option proto 'static'
        option ipaddr '192.168.1.254'
        option netmask '255.255.255.0'
        option ip6assign '60'

config interface 'wan'
        option device 'eth1.300'
        option proto 'dhcp'
        option peerdns '0'

config interface '4g_dongle'
        option proto 'dhcp'
        option device 'eth2'
        option defaultroute '0'
        option peerdns '0'

config interface 'tailscale'
        option proto 'none'
        option device 'tailscale0'
        option defaultroute '0'

config device
        option name 'eth1.300'
        option type '8021q'
        option ifname 'eth1'
        option vid '300'
        list ingress_qos_mapping '0:0'
        list egress_qos_mapping '0:0'

config interface 'bat0'
        option proto 'batadv'
        option routing_algo 'BATMAN_IV'
        option bridge_loop_avoidance '1'
        option gw_mode 'server'
        option hop_penalty '30'
        option defaultroute '0'

config interface 'batwire'
        option proto 'batadv_hardif'
        option master 'bat0'
        option device 'br-m'

config device
        option type 'bridge'
        option name 'br-m'
        list ports 'eth0.40'

On the Zyxel T-56 (the config is the same on all of them, as I backed it up from the first one and restored it on the others):

/etc/config/network

config interface 'loopback'
	option device 'lo'
	option proto 'static'
	option ipaddr '127.0.0.1'
	option netmask '255.0.0.0'

config globals 'globals'
	option ula_prefix 'fdfe:1b37:1c21::/48'

config device
	option name 'br-lan'
	option type 'bridge'
	list ports 'bat0'
	list ports 'eth1'
	list ports 'lan1'
	list ports 'lan2'
	list ports 'lan3'
	list ports 'lan4'

config interface 'lan'
	option device 'br-lan'
	option proto 'dhcp'

config interface 'bat0'
	option proto 'batadv'
	option routing_algo 'BATMAN_IV'
	option bridge_loop_avoidance '1'
	option gw_mode 'server'
	option hop_penalty '30'
	option defaultroute '0'

config interface 'batmesh'
	option proto 'batadv_hardif'
	option master 'bat0'

config interface 'batwire'
	option proto 'batadv_hardif'
	option device 'br-m'
	option master 'bat0'
	option defaultroute '0'

config device
	option type 'bridge'
	option name 'br-m'
	list ports 'lan1.40'
	list ports 'lan2.40'
	list ports 'lan3.40'
	list ports 'lan4.40'
	list ports 'eth0.40'

/etc/config/wireless


config wifi-device 'radio0'
	option type 'mac80211'
	option path 'platform/soc/18000000.wifi'
	option channel 'auto'
	option band '2g'
	option htmode 'HE20'
	option cell_density '0'

config wifi-device 'radio1'
	option type 'mac80211'
	option path 'platform/soc/18000000.wifi+1'
	option channel '12'
	option band '5g'
	option htmode 'HE80'
	option cell_density '0'

config wifi-iface 'wifinet0'
	option device 'radio1'
	option mode 'mesh'
	option encryption 'sae'
	option mesh_id 'abyz-mesh'
	option mesh_fwding '0'
	option mesh_rssi_threshold '0'
	option key 'kei6geiLaikaiH4oWaen7UaM'
	option network 'batmesh'

config wifi-iface 'wifinet2'
	option device 'radio0'
	option mode 'ap'
	option ssid 'OpenWrt'
	option encryption 'sae-mixed'
	option key 'SecurePass'
	option network 'lan'
	option ieee80211r '1'
	option mobility_domain 'c1c2'
	option ft_over_ds '0'

config wifi-iface 'wifinet3'
	option device 'radio1'
	option mode 'ap'
	option ssid 'LEDE_5G'
	option encryption 'sae-mixed'
	option network 'lan'
	option key 'SecurePass'

At first glance I spot:

  1. ULA
    Remove the ULA config from all devices that are not routers. Only the router needs to know it.

  2. Your bat0 config.
    If I understood you correctly, you have one router and several APs or switches behind it, but you have copied the same config to every device?
    On everything that is not connected to the internet directly, remove option gw_mode 'server'. I have no clue what option defaultroute '0' is doing there...

  3. cont.

I'm not sure it is correct to use option device in the batadv_hardif section...

In my config I only have the interface sections:

config interface            'bat0_hardif_mesh0'
    option  proto           'batadv_hardif'
    option  master          'bat0'
    option  mtu             '2304'

config interface            'bat0_hardif_mesh1'
    option  proto           'batadv_hardif'
    option  master          'bat0'
    option  mtu             '2304'

And later I list the bat0.N instance on each VLAN. Which brings me to the next point:

In the bridge port list I would expect either the ethN or the lanN ports, but not both. On x86 I have only come across ethN "ports"/physical interfaces. Besides that, you need to add your bat0 interface, with the tag too: bat0.40
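To make "listing the bat0.N instance on each VLAN" concrete, here is a minimal sketch for /etc/config/network, not a tested config. The VLAN ID 20, the names br-guest and guest, and the 192.168.20.0/24 subnet are placeholders I made up for illustration; the idea is only that each VLAN gets its own bridge with the matching tagged bat0 instance as a port.

```
# Hypothetical example: carry an extra VLAN (ID 20) over the mesh.
# All names and addresses below are placeholders.
config device
	option name 'br-guest'
	option type 'bridge'
	list ports 'bat0.20'

config interface 'guest'
	option device 'br-guest'
	option proto 'static'
	option ipaddr '192.168.20.1'
	option netmask '255.255.255.0'
```

The static address would normally live only on the router; on the APs the same bridge would probably use option proto 'none' or 'dhcp' instead.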

Maybe someone else can offer another pair of eyes. But yes, you can probably start with the points I raised...
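As a rough sketch of point 2, the bat0 section on a T-56 that has no direct internet uplink might end up looking like this. It is simply your posted section with gw_mode 'server' and defaultroute '0' removed; I have not tested it on your hardware, and the x86 router would keep its own section as it is.

```
# /etc/config/network on a non-gateway mesh node (sketch)
config interface 'bat0'
	option proto 'batadv'
	option routing_algo 'BATMAN_IV'
	option bridge_loop_avoidance '1'
	option hop_penalty '30'
```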

Thanks for your suggestions. I will try them and let you know.

  1. I will remove it.
  2. About the second part:

Yes, one x86 box and three T-56 devices.

All T-56 devices have the same config.

It was a mistake on my side; I will remove it from the clients and test. As for option defaultroute '0', I think it is a leftover from testing and should be removed.

As far as I have learned, the batman interface has to be attached to an actual device so it can do its magic on layer 2. For the mesh part, that attachment is made in the wireless config file; for the wired part, it has to be done explicitly. Can you share your wireless config file? I think you just have two wireless meshes, and these two interfaces are attached to them.
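To show what I mean by the two kinds of attachment, here are the relevant pieces of my T-56 config side by side (trimmed to the options that matter; encryption and key are omitted here, so this excerpt is not a complete config):

```
# /etc/config/network
# Wireless hard interface: no device here, the binding comes from
# "option network 'batmesh'" in the wireless config.
config interface 'batmesh'
	option proto 'batadv_hardif'
	option master 'bat0'

# Wired hard interface: the device is named explicitly.
config interface 'batwire'
	option proto 'batadv_hardif'
	option master 'bat0'
	option device 'br-m'

# /etc/config/wireless
config wifi-iface 'wifinet0'
	option device 'radio1'
	option mode 'mesh'
	option mesh_id 'abyz-mesh'
	option mesh_fwding '0'
	option network 'batmesh'
```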

For the VLAN part, if I didn't put the batman interface on the "management" VLAN it didn't connect, and I think that was because bat0 is also bridged into br-lan.

This part is related to the APs. The AP has eth0, which is the WAN port, and eth1, which is a switch consisting of lan1 to lan4. So adding just eth1 was not enough.

config wifi-iface 'mesh0'
    option  disabled    '0'
    option  device      'radio0'
    option  ifname      'mesh0'
    option  macaddr     '02:00:01:02:00:01'
    option  network     'bat0_hardif_mesh0'
    option  mode        'mesh'
    option  mesh_fwding '0'
    option  mesh_id     'fde6:a09a:b373::/48'
    option  encryption  'psk2+ccmp'
    option  key         'XXX'

config wifi-iface 'mesh1'
    option  disabled    '0'
    option  device      'radio1'
    option  ifname      'mesh1'
    option  macaddr     '02:00:01:03:00:01'
    option  network     'bat0_hardif_mesh1'
    option  mode        'mesh'
    option  mesh_fwding '0'
    option  mesh_id     'fde6:a09a:b373::/48'
    option  encryption  'psk2+ccmp'
    option  key         'XXX'
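Applied to the T-56 config posted earlier, a second mesh link on 2.4 GHz could be sketched roughly as follows. The names batmesh2g and wifinet1 are hypothetical, the key is a placeholder, and I am assuming radio0 can run a mesh interface alongside the existing AP (both would then share the same channel). Treat it as a starting point rather than a verified config.

```
# /etc/config/network: an additional hard interface for the 2.4 GHz mesh
config interface 'batmesh2g'
	option proto 'batadv_hardif'
	option master 'bat0'

# /etc/config/wireless: mesh interface on radio0 (2.4 GHz)
config wifi-iface 'wifinet1'
	option device 'radio0'
	option mode 'mesh'
	option encryption 'sae'
	option mesh_id 'abyz-mesh'
	option mesh_fwding '0'
	option key 'XXX'
	option network 'batmesh2g'
```

With both hard interfaces attached to bat0, batman-adv can choose between the 5 GHz and 2.4 GHz links per neighbour based on its own link metric.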