BATMAN-adv not working as expected - MX4200v2

To date I have always used an ethernet backbone, but of late I have started to play with using batman-adv and a couple of tri-band routers.

I have an existing working network that uses Ethernet with VLANs 1, 4 and 7 for the backbone, although in reality almost everything runs on VLAN 1. I have created a mesh network to get me started, and I have configured the bat0 devices to join br-lan as untagged members of VLAN 1.

On both my mesh client routers I haven’t bothered to set up VLANs, because everything will be done on VLAN 1 and the server router is set to treat bat0 as an untagged VLAN 1 device.

So far so good: this works perfectly with the WHW03v2, which uses the mesh without any problems. The client router picks up an IP address through DHCP, as do any clients of this client router.

Enter the problem with the MX4200v2, which is configured almost identically to the WHW03v2 except that it obviously has a slightly larger switch. Again I have added bat0 to br-lan, but now neither the MX4200v2 client nor any of its clients can get an address via DHCP. I might think the mesh isn’t working properly more generally, but if you allocate static IPs to the clients you can ssh through the mesh network to other devices, ping things, etc.

Interestingly, if you plug an Ethernet cable into any of the LAN ports, things work exactly as expected; it is only over the mesh network that DHCP doesn’t work.

It is a puzzle I have been trying to crack for the past 24 hours and I am completely stumped. Any thoughts or suggestions?

FTR: I have no experience (yet) with mesh networks and/or batman-adv.
But in 6.18-rc1 I found this commit:
87b95082db32 ("batman-adv: remove network coding support")

Which removes support for batman-adv from the upstream kernel.
From its commit message:

    The Network Coding feature, introduced in 2013, is based on the master
    thesis "Inter-Flow Network Coding for Wireless Mesh Networks". It relies on
    the assumption that neighboring mesh nodes can reliably overhear each
    other's transmissions in promiscuous mode, allowing packets to be combined
    to reduce forwarding overhead.
    
    This assumption no longer holds for modern wireless mesh networks, which
    are heterogeneous and make overhearing increasingly unreliable. Factors
    such as multiple spatial streams, varying data rates, beamforming, and
    OFDMA all prevent nodes from consistently overhearing each other. The current
    implementation in batman-adv is not able to detect these conditions and would
    require a more complex layer beyond its neighbor discovery process to do so.
    
    In addition, the feature has been unmaintained for years and is discouraged
    for use.

So trying to use/convert to using batman-adv does not sound very future-proof.

That patch removes one single deprecated, unmaintained feature from batman-adv, not the whole thing.

Oh, you're right. Thanks for the correction!

Please share all your related config files, then we can have a look at it.