I am posting this for anyone who wants to set up a mesh using either the kmod-batman-adv or the luci-proto-batman-adv software package.
OneMarcFifty (https://www.youtube.com/watch?v=t4A0kfg2olo)
and https://cgomesu.com/blog/Mesh-networking-openwrt-batman/
explain how to configure a B.A.T.M.A.N. advanced mesh for OpenWrt version 22.03, but I initially found it difficult to use these guides with the current version 23.05. Access points would disappear for random periods, and eventually they became unreachable by IP address and could not be managed using LuCI or ssh.
After a lot of trial and error, I now have a solid working mesh on 23.05.3 but there are a few important differences from the setup under 22.03:
1. do not disable dnsmasq on any router but, on all routers except the DNS server, tick the DHCP Server -> Ignore interface option for each interface in Network -> Interfaces (e.g. lan, Guest, IOT),
2. set "Use default gateway" on the lan interface on 'dumb access points' (but not on other network interfaces such as Guest and IOT),
   (points 1 and 2 make sure that dumb access points will always be available for LuCI and ssh access and that they will also be able to access the internet so that NTP runs when they are rebooted)
3. use BATMAN_V for the bat0 interface as it works fine and seems to be faster and easier to inspect with batctl than BATMAN_IV (this is set on the Mesh Routing tab, Routing Algorithm),
4. don't try changing the MTU (any attempt to change it has always ended in a system reset for me and fragmentation doesn't seem to be an issue when I run "batctl s" to view statistics),
5. the bat0 Gateway Mode (under the Mesh Routing tab in LuCI) should be left unset for all routers including the internet gateway unless your network contains more than one internet gateway.
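For anyone who prefers the command line to LuCI, the DHCP "Ignore interface" and routing algorithm settings from the list above can be sketched with uci. This is only a sketch under assumptions: the function name dumb_ap_uci_cmds is made up for illustration, it assumes each section in /etc/config/dhcp has the same name as its interface, and it only prints the commands so they can be reviewed before being run on a dumb access point.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not run) the uci commands for a dumb AP.
# Arguments: names of the DHCP sections to ignore (assumed here to match
# the interface names, e.g. lan guest iot).
dumb_ap_uci_cmds() {
    for section in "$@"; do
        # Same effect as ticking DHCP Server -> Ignore interface in LuCI
        printf "uci set dhcp.%s.ignore='1'\n" "$section"
    done
    # Routing algorithm for bat0, as on the Mesh Routing tab
    printf "uci set network.bat0.routing_algo='BATMAN_V'\n"
    printf "uci commit\n"
}

# Print the commands for three example sections
dumb_ap_uci_cmds lan guest iot
```

Once the printed commands look right for your section names, they can be pasted into an ssh session on the access point.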
I use three Asus RT-AX53U and one Asus RT-AX54 in my mesh and with these changes to the installation, the mesh and its VLANs are 100% reliable for months without any reboots. And, for these router models, OpenWrt can easily be installed using mtd write (please see the OpenWrt documentation for these models).
I would like to clarify some of the points you have stated:
do not disable dnsmasq on any router but,
This goes against the general convention for configuring dumb APs. dnsmasq is not a required service in dumb AP mode and it complicates troubleshooting, which is why most people disable it on every dumb AP in the network. The same goes for DHCP and the firewall.
the bat0 Gateway Mode
batman needs to calculate the shortest path to the gateway regardless of how many gateways you actually have, so in practice it needs to be set. Your setup may not use this parameter, in which case it does not need to be set.
don't try changing the MTU
The recommended MTU setting is 2403, but I think this is only a general guideline and you can experiment with it. If you are experiencing resets, though, I doubt the MTU is the cause; something else in your setup is more likely to be responsible.
network for the router that connects to the internet (I chose to use Google's DNS servers 8.8.8.8 and 8.8.4.4, but I don't know if that's a sensible choice!)
config interface 'loopback'
	option device 'lo'
	option proto 'static'
	option ipaddr '127.0.0.1'
	option netmask '255.0.0.0'

config globals 'globals'
	option ula_prefix 'fdd2:3e14:24b8::/48'
	option packet_steering '1'

config device
	option name 'br-lan'
	option type 'bridge'
	list ports 'lan1'
	list ports 'lan2'
	list ports 'lan3'
	list ports 'lan4'
	list ports 'bat0.99'

config interface 'lan'
	option device 'br-lan'
	option proto 'static'
	option ipaddr '192.168.1.4'
	option netmask '255.255.255.0'
	option ip6assign '60'
	list dns '8.8.8.8'
	list dns '8.8.4.4'
	option defaultroute '0'

config interface 'wan'
	option device 'wan'
	option proto 'dhcp'

config interface 'wan6'
	option device 'wan'
	option proto 'dhcpv6'

config interface 'bat0'
	option proto 'batadv'
	option routing_algo 'BATMAN_V'
	option bridge_loop_avoidance '1'
	option gw_mode 'off'
	option hop_penalty '30'

config interface 'batmesh'
	option proto 'batadv_hardif'
	option master 'bat0'

config interface 'iot'
	option proto 'static'
	option ipaddr '10.21.1.1'
	option netmask '255.255.255.0'
	option device 'br-iot'
	list dns '8.8.8.8'
	list dns '8.8.4.4'
	option defaultroute '0'

config interface 'guest'
	option proto 'static'
	option ipaddr '10.20.30.40'
	option netmask '255.255.255.0'
	option device 'br-guest'
	list dns '8.8.8.8'
	list dns '8.8.4.4'
	option defaultroute '0'

config device
	option type 'bridge'
	option name 'br-guest'
	list ports 'bat0.4'

config device
	option type 'bridge'
	option name 'br-iot'
	list ports 'bat0.3'

network for other access points
config interface 'loopback'
	option device 'lo'
	option proto 'static'
	option ipaddr '127.0.0.1'
	option netmask '255.0.0.0'

config globals 'globals'
	option ula_prefix 'fda3:7ea0:9cf0::/48'
	option packet_steering '1'

config device
	option name 'br-lan'
	option type 'bridge'
	list ports 'lan1'
	list ports 'lan2'
	list ports 'lan3'
	list ports 'wan'
	list ports 'bat0.99'

config interface 'lan'
	option device 'br-lan'
	option proto 'dhcp'

config interface 'bat0'
	option proto 'batadv'
	option routing_algo 'BATMAN_V'
	option bridge_loop_avoidance '1'
	option gw_mode 'off'
	option hop_penalty '30'

config interface 'batMesh'
	option proto 'batadv_hardif'
	option master 'bat0'

config device
	option type 'bridge'
	option name 'br-guest'
	list ports 'bat0.4'

config interface 'guest'
	option proto 'dhcp'
	option device 'br-guest'
	option defaultroute '0'

config device
	option type 'bridge'
	option name 'br-iot'
	list ports 'bat0.3'

config interface 'iot'
	option proto 'dhcp'
	option device 'br-iot'
	option defaultroute '0'

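Both network files above bridge the same three batman VLANs (bat0.99 into br-lan, bat0.4 into br-guest, bat0.3 into br-iot), and a mismatch between routers is an easy mistake to make when copying configs around. A rough way to check is to list the bat0 VLAN ports on each router and compare the output. The helper name bat_vlan_ports is made up for this sketch, and it assumes the ports are written exactly in the `list ports 'bat0.N'` form used above.

```shell
#!/bin/sh
# Hypothetical helper: list the bat0 VLAN ports bridged in a UCI network file.
# $1: path to the file (on a router this would be /etc/config/network)
bat_vlan_ports() {
    sed -n "s/^[[:space:]]*list ports '\(bat0\.[0-9][0-9]*\)'[[:space:]]*\$/\1/p" "$1" | sort
}

# Only runs where the file exists, i.e. on the router itself
if [ -r /etc/config/network ]; then
    bat_vlan_ports /etc/config/network
fi
```

Run it on the gateway and on each access point; every router should print the same list of ports.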
wireless (I use 2.4GHz for the mesh and 5GHz for access points. Make sure every device uses the same passwords and mobility domains. On any one router, the 5GHz access points for all SSIDs should share the same channel, but different routers should use different channels.)
I agree that you should be able to disable dnsmasq and you could use a mesh with version 22.03 with it disabled but, in practice, you cannot for 23.05. If you have a working mesh using 23.05 with dnsmasq disabled, please let us know how!
For my case, I only have one gateway so this setting is irrelevant.
Regarding the MTU and fragmentation, perhaps it's because I use 'Hardware flow offloading' or the hardware I use cannot support larger MTU values but, when I check the mesh network statistics with batctl s, the amount of fragmentation isn't very large so I haven't investigated this fully.
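To make the fragmentation check a bit easier to repeat, the output of batctl s can be filtered down to the fragmentation counters. This is a sketch, not a definitive tool: the function name frag_counters is made up, and it assumes the counters are printed one per line with names starting with frag_ (frag_tx, frag_rx, frag_fwd and their _bytes variants).

```shell
#!/bin/sh
# Hypothetical helper: keep only the fragmentation counters from the
# "batctl s" output read on stdin.
frag_counters() {
    grep -E '^[[:space:]]*frag_(tx|rx|fwd)(_bytes)?:'
}
```

On a mesh node it would be used as "batctl s | frag_counters". Comparing the frag_tx value with the plain tx counter from the full output gives a rough idea of how much of the traffic is being fragmented.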
I have been running b.a.t ... with 23.05 for as long as it has been available. On my dumb access points I run this in local startup:
# Disable and stop the services a dumb AP does not need
for i in firewall dnsmasq odhcpd; do
	if /etc/init.d/"$i" enabled; then
		/etc/init.d/"$i" disable
		/etc/init.d/"$i" stop
	fi
done
I have also always set the MTU to 1536 on the interfaces where batman is active, with no problems at all. Our configurations are almost identical (same source).
I have a WDS network setup now with batman-adv running on dumb APs without any dnsmasq installed on the remote APs. I am only using the tunneling feature of batman-adv and not any of its meshing features and all config settings are at default.