Batman (in production with post 19.07 snapshot) not working under 21.02

But I still don't know how to specify the bridges. Let me be a little clearer about what I need.
In each mesh network, one of the nodes is the gateway. On that node, bat0 is bridged to the LAN ethernet, and so is the 2.4 GHz radio (radio1) via "option network 'lan'". On the remaining (non-gateway) nodes, bat0 is bridged to the WAN. As always, the 2.4 GHz radio is bridged to the LAN.

Looking at it from the perspective of DHCP service:
Each node's LAN is a unique network to which it provides DHCP service.
Now, the gateway node of each mesh in effect provides DHCP service not only to its respective wireless LAN, but also to its respective mesh network. However, within a given mesh, each node's IP is static, so DHCP service has very little impact on the meshes themselves. (I tried making the meshes use DHCP. It worked, but it wasn't robust enough to be practical. Static IPs worked much better.)

Patience! I'm old, obsessively methodical, and slow. You'll be the first to know, though.

My apologies! (lol) I'm now tracking this thread to follow your progress.

The DSA mini-tutorial is very interesting, but according to this support matrix, DSA is not yet relevant to TP-Link routers. So it still isn't at all clear to me what the syntax should be for the bridges I need.

The DSA mini-tutorial contains this example. It looks promising because it's a syntax I would never have guessed and therefore didn't try.

config device
	option name 'br-lan'
	option type 'bridge'
	list ports 'lan1'
	list ports 'lan2'
	list ports 'lan3'
	list ports 'lan4'

config bridge-vlan
	option device 'br-lan'
	option vlan '1'
	list ports 'lan1'
	list ports 'lan2'

config bridge-vlan
	option device 'br-lan'
	option vlan '2'
	list ports 'lan3'
	list ports 'lan4'

However, I still don't know how I might bridge bridge-vlan.1 and bridge-vlan.2 (which I admit doesn't make any sense in terms of the example shown in the mini-tutorial). That piece of the puzzle is still missing.

Hooray, the router no longer "goes completely silent" now that I have removed the otiose "default_radio1" from the "device" option.

However, the bridging of bat0 and eth0.1 does not work under the old syntax (regardless of whether I use "ifname" or "device" as the parameter name), because bridging is different now and I have yet to see a working example (or indeed any example) of the syntax of such a bridging specification. It is not obvious how to do it; none of my guesses have worked.

OK it's not DSA, so as far as the Ethernet ports go, eth0.1 is the four LAN ports and eth0.2 is the WAN port.

DSA is a way to abstract a hardware switch so that, in the configuration, its ports look like independent network interfaces in a software bridge. Internally, though, the DSA driver will use the switching hardware where it is available.

But again DSA has not come to the C7 yet, so don't worry about it.

Connecting a batman mesh to a regular AP is done with a software bridge. A bridge is configured by creating an overlaying virtual "device" of type bridge, with a name that starts with br-, and attaching the member ports to it with list ports, one line for each port.

config device
    option name 'br-lan'
    option type 'bridge'
    list ports 'eth0.1'
    list ports 'bat0'

The results of this can be examined with brctl show, which will list the bridges and their members, including wifi interfaces that were attached through option network in /etc/config/wireless.
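To give the bridge an address, an interface stanza then refers to it by name with option device, where 19.07 configurations typically put option type 'bridge' and option ifname into the interface stanza itself. The following is only a sketch: the address is a placeholder, and the br-wan variant assumes that eth0.2 is the WAN port, as on this router.

```uci
# 21.02 style: the interface points at the bridge device by name.
config interface 'lan'
	option device 'br-lan'
	option proto 'static'
	# placeholder address, substitute your own
	option ipaddr '192.168.1.1'
	option netmask '255.255.255.0'

# 19.07 style, shown for comparison only (no separate 'config device'):
# config interface 'lan'
#	option type 'bridge'
#	option ifname 'eth0.1 bat0'
#	option proto 'static'

# Non-gateway node: bat0 joins the WAN bridge instead of br-lan.
config device
	option name 'br-wan'
	option type 'bridge'
	list ports 'eth0.2'
	list ports 'bat0'
```

On such a node the 'wan' interface would then use option device 'br-wan' in the same way that 'lan' uses 'br-lan' above.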


OK. Many thanks; brctl now shows appropriate stuff. I have reported the working /etc/config/network, /etc/config/wireless, and brctl output in this discussion.

But the problems are not quite solved yet, because Batman still doesn't work. The only relevant complaint I have found in the log suggests that perhaps wpa_supplicant is incompatible with what used to work in /etc/config/wireless. Here's what I'm seeing in the log:

Thu Jun 24 20:58:15 2021 daemon.err wpa_supplicant[1480]: Line 8: too large mode (value=5 max_value=4)
Thu Jun 24 20:58:15 2021 daemon.err wpa_supplicant[1480]: Line 8: failed to parse mode '5'.
Thu Jun 24 20:58:15 2021 daemon.err wpa_supplicant[1480]: Line 16: failed to parse network block.
Thu Jun 24 20:58:15 2021 daemon.err wpa_supplicant[1480]: Failed to read or parse configuration '/var/run/wpa_supplicant-wlan0.conf'.
Thu Jun 24 20:58:15 2021 daemon.notice netifd: radio0 (5073): Interface 0 setup failed: WPA_SUPPLICANT_FAILED
Thu Jun 24 20:58:15 2021 daemon.notice netifd: radio0 (5073): WARNING (wireless_add_process): executable path /usr/sbin/wpad does not match process  path (/proc/exe)

So:

root@rpc149:~# cat /var/run/wpa_supplicant-wlan0.conf
network={
        ssid="meshD"
        key_mgmt=SAE
        mode=5
        fixed_freq=1
        frequency=5180
        ht40=1
        vht=1
        max_oper_chwidth=1
        sae_password="XXXX"
        beacon_int=100
}

On another node where my lucky snapshot is running in production, here's what the log says about wpa:

root@rpc150:~# logread | fgrep -i wpa
Thu Jun 24 09:04:05 2021 daemon.notice wpa_supplicant[1367]: Successfully initialized wpa_supplicant
Fri Jun 25 04:35:03 2021 daemon.notice wpa_supplicant[1367]: wlan0: leaving mesh
Fri Jun 25 04:35:04 2021 daemon.err wpa_supplicant[1367]: wlan0: mesh leave error=-134
Fri Jun 25 04:35:06 2021 daemon.notice wpa_supplicant[1367]: wlan0: interface state UNINITIALIZED->ENABLED
Fri Jun 25 04:35:06 2021 daemon.notice wpa_supplicant[1367]: wlan0: AP-ENABLED
Fri Jun 25 04:35:06 2021 daemon.notice wpa_supplicant[1367]: wlan0: joining mesh meshD
Fri Jun 25 04:35:06 2021 daemon.notice wpa_supplicant[1367]: wlan0: CTRL-EVENT-CONNECTED - Connection to 00:00:00:00:00:00 completed [id=0 id_str=]
Fri Jun 25 04:35:06 2021 daemon.notice wpa_supplicant[1367]: wlan0: MESH-GROUP-STARTED ssid="meshD" id=0

On that same (working) node:

root@rpc150:~# cat /var/run/wpa_supplicant-wlan0.conf
country=US
network={
        ssid="meshD"
        key_mgmt=SAE
        mode=5
        mesh_fwding=0
        fixed_freq=1
        frequency=5745
        ht40=1
        vht=1
        max_oper_chwidth=1
        sae_password="XXXX"
        beacon_int=100
}

At this moment, I'm guessing that wpa_supplicant isn't supporting mesh mode, so I'm wondering whether I need to make a change to wpad. I'm looking into it now.

Replace package wpad-wolfssl with wpad-mesh-wolfssl.

So. I removed wpad-basic-wolfssl and installed wpa-supplicant-mesh-openssl. It helped:

root@rpc149:~# logread | grep -i wpa
Fri Jun 25 10:12:59 2021 daemon.notice wpa_supplicant[1477]: Successfully initialized wpa_supplicant
Fri Jun 25 10:13:53 2021 daemon.notice wpa_supplicant[1477]: wlan0: leaving mesh
Fri Jun 25 10:13:54 2021 daemon.err wpa_supplicant[1477]: wlan0: mesh leave error=-134
Fri Jun 25 10:13:56 2021 daemon.notice wpa_supplicant[1477]: wlan0: interface state UNINITIALIZED->ENABLED
Fri Jun 25 10:13:56 2021 daemon.notice wpa_supplicant[1477]: wlan0: AP-ENABLED
Fri Jun 25 10:13:56 2021 daemon.notice wpa_supplicant[1477]: wlan0: joining mesh meshD
Fri Jun 25 10:13:56 2021 daemon.notice wpa_supplicant[1477]: wlan0: CTRL-EVENT-CONNECTED - Connection to 00:00:00:00:00:00 completed [id=0 id_str=]
Fri Jun 25 10:13:56 2021 daemon.notice wpa_supplicant[1477]: wlan0: MESH-GROUP-STARTED ssid="meshD" id=0

...but Batman still wasn't working. "batctl o" still produced no output at all. And what's this about "wlan0", an identifier which appears nowhere in my configuration??

Experimentally, I changed "config wifi-iface 'mesh0'" to "config wifi-iface 'wlan0'". No help. So I put it back the way it was. Then I changed "option network 'nwi_mesh0'" in that same stanza to "option network 'wlan0'". Accordingly, in /etc/config/network, I also changed "config interface 'nwi_mesh0'" to "config interface 'wlan0'". Now, at last:

root@rpc149:~# batctl o
[B.A.T.M.A.N. adv 2021.1-openwrt-2, MainIF/MAC: wlan0/36:9b:9b:5e:27:95 (bat0/8a:0e:dd:c4:2c:9f BATMAN_IV)]
   Originator        last-seen (#/255) Nexthop           [outgoingIF]

So batctl now thinks there's something there, but I'm still not joining the mesh. Why? I see nothing untoward in the log.

Following your advice, mk24, I replaced wpa-supplicant-mesh-openssl with wpad-mesh-wolfssl. No change; still not joining the mesh. But at least "batctl o" has some output.

Is it not weird that I have a wlan0 whether I want it or not?

There's no need to force names onto radio interfaces, just let it propagate the other way with option network.
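As a sketch of what that means: the section names in /etc/config/wireless and /etc/config/network are only labels, option network ties the two files together, and the kernel device name (wlan0 here) is assigned by the system. The logical name 'mesh' below is a hypothetical choice, encryption options are omitted, and the fragment is untested on 21.02. Earlier in this thread a differently named interface did not appear to work, so treat it as illustrative only.

```uci
# /etc/config/wireless
config wifi-iface 'mesh0'
	option device 'radio0'
	option mode 'mesh'
	option mesh_id 'meshD'
	option mesh_fwding '0'
	# 'mesh' is a logical network name, not a kernel device name
	option network 'mesh'

# /etc/config/network
config interface 'mesh'
	option proto 'batadv_hardif'
	option master 'bat0'
```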

The wpad series of packages includes both hostapd and wpa-supplicant capability, so you don't need to install wpa-supplicant separately.

Run iwinfo wlan0 assoclist to see if there is a radio link to at least one other node. If there is, you can conclude that the radio and wpad are configured properly, and it should be sending packets to the BATMAN layer.

But BATMAN used to be very strict about linking only with other nodes running the same version of BATMAN; any node running a different version was invisible to it. Since BATMAN is part of the kernel, getting all your nodes onto the same version means running similar kernels. I think there are some ways to override that now.

Aha! I will take appropriate measures and see what happens then.

On the new node, iwinfo wlan0 assoclist outputs nothing. However, radio0's LED is lit and it blinks every now and then. On the working batman nodes, radio0's LED blinks pretty constantly. And, provided there is at least one other node that is online and a member of the same mesh, iwinfo wlan0 assoclist has output that looks reasonable.

Even if only to eliminate the possibility of malfunctioning hardware, I need to set up at least one more 21.02 node. You can expect a report!

In your files you're using channel 100, a DFS channel. Mesh operation on DFS is an uncertain thing, since the mesh expects to be on a defined channel and if radar is detected there's no defined way to move the whole mesh to another channel. So I'd try a non-DFS channel.

Good call, but yeah, I know about that. In fact I have written a cron job that detects when that has happened and reboots in order to revert to the correct channel. (I haven't successfully drunk the Freifunk kool-aid, but they offer a similar watchdog.) Radar interruptions happen to at least some nodes a couple of times per month, usually in the wee small hours. We're in a near-urban environment. Interestingly, local police radars interrupt channels 100 and 116, but not 132.

But for these tests I'm using channel 149 -- it's not a DFS channel.

I have another A7 coming in four days. Evidently my test bed needs to have four nodes in case one of them is ill-behaved, as I currently suspect, and interferes with the others. With only three test nodes, I can't tell which one is the culprit.

Mystery solved. It works.

My error was that I failed to say
option country 'US'
in
config wifi-device 'radio0'

Whether channel 149 may be used depends on the regulatory domain (regdom), a fact that I knew once but had forgotten.
Many thanks for all your help, mk24!!!
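For anyone hitting the same symptom, the fix amounts to one line in the radio stanza. This is a sketch assuming a US deployment on channel 149. The likely mechanism, stated here as an assumption rather than something confirmed in this thread, is that without a country code the radio stays in the restrictive default regulatory domain, where it may not initiate transmissions on that channel, so the mesh never starts.

```uci
# /etc/config/wireless
config wifi-device 'radio0'
	option type 'mac80211'
	option channel '149'
	# Without this line the regulatory domain is left at its default.
	option country 'US'
```

Running iw reg get on the node shows which regulatory domain is currently in effect.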


For the record, and so I can mark this topic "solved" (but see here for a remaining issue having to do with the specification of mac addresses within a mesh network), here are the /etc/config/network and /etc/config/wireless files that actually worked (modulo the mac address problem).

/etc/config/network
(This one is for the router that plays the "server" role in its mesh, which means that the mesh interface is in the LAN.)

config interface 'loopback'
	option device 'lo'
	option ipaddr '127.0.0.1'
	option netmask '255.0.0.0'
	option proto 'static'

config globals 'globals'
	option ula_prefix 'fdf9:f652:f605::/48'

config interface 'lan'
	option delegate '0'
	option device 'br-lan'
	option ipaddr '192.168.150.1'
	option mtu '1312'
	option netmask '255.255.255.0'
	option proto 'static'
	option stp '0'

config device
	option macaddr '26:9b:9b:5e:27:96'
	option name 'br-lan'
	list ports 'bat0.1'
	list ports 'eth0.1'
	option type 'bridge'

config device
	option macaddr '66:9b:9b:5e:27:96'
	option name 'br-wan'
	list ports 'eth0.2'
	option type 'bridge'

config interface 'wan'
	option delegate '0'
	option device 'br-wan'
	option dns '192.168.4.1'
	option gateway '192.168.4.1'
	option ipaddr '192.168.4.150'
	option mtu '1312'
	option netmask '255.255.255.0'
	option proto 'static'
	option stp '0'

config switch
	option enable_vlan '1'
	option name 'switch0'
	option reset '1'

config switch_vlan
	option device 'switch0'
	option ports '2 3 4 5 0t'
	option vlan '1'

config switch_vlan
	option device 'switch0'
	option ports '1 0t'
	option vlan '2'

config interface 'bat0'
	option aggregated_ogms '1'
	option ap_isolation '0'
	option bonding '0'
	option bridge_loop_avoidance '1'
	option distributed_arp_table '1'
	option fragmentation '1'
	option gw_mode 'server'
	option hop_penalty '30'
	option isolation_mark '0x00000000/0x00000000'
	option log_level '0'
	option multicast_fanout '16'
	option multicast_mode '1'
	option network_coding '0'
	option orig_interval '1000'
	option proto 'batadv'
	option routing_algo 'BATMAN_IV'

config interface 'wlan0'
	option master 'bat0'
	option mtu '1500'
	option proto 'batadv_hardif'

/etc/config/wireless

config wifi-device 'radio0'
	option channel '36'
	option country 'US'
	option disabled '0'
	option htmode 'VHT80'
	option hwmode '11a'
	option path 'pci0000:00/0000:00:00.0'
	#option txpower '23'
	option type 'mac80211'

config wifi-device 'radio1'
	option channel '11'
	option country 'US'
	option disabled '1'
	option htmode 'HT20'
	option hwmode '11g'
	option path 'platform/ahb/18100000.wmac'
	option txpower '24'
	option type 'mac80211'

config wifi-iface 'default_radio1'
	option device 'radio1'
	option encryption 'psk2'
	option key 'XXXX'
	option macaddr '56:9b:9b:5e:27:96'
	option mode 'ap'
	option network 'lan'
	option ssid 'rpc150.rosepark.us'

config wifi-iface 'mesh0'
	option device 'radio0'
	option encryption 'psk2+ccmp'
	option key 'XXXX'
	#option macaddr '36:9b:9b:5e:27:96'
	option mesh_fwding '0'
	option mesh_id 'meshD'
	option mode 'mesh'
	option network 'wlan0'

I can no longer edit the previous solution, alas, so I'll just mention here that removing the "option macaddr" from the "config interface wlan0" stanza made a world of difference in both reliability and performance. Moreover, it is now no longer necessary to specify "option mtu '1312'" in the "config interface lan" and "config interface wan" stanzas, and "option mtu '1560'" can be specified in the "config interface 'wlan0'" stanzas (as recommended in a log warning before I did that). None of those things used to work. Now they work. I don't think I ever got reasonable performance from my mesh networks before, but they are performing well now.
So it is not a good idea to attempt to control the MAC address of a mesh interface. I didn't know that. I've updated that discussion here.
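Put together, the changes described above amount to something like the following. This is a sketch reconstructed from the description, not a copy of the running files. Options that are unchanged from the earlier listing (addresses, encryption and so on) are omitted, and because it isn't clear which stanza carried "option macaddr", the sketch simply has none on either the mesh wifi-iface or the wlan0 interface.

```uci
# /etc/config/network
config interface 'wlan0'
	option master 'bat0'
	# raised from 1500, as recommended by the log warning
	option mtu '1560'
	option proto 'batadv_hardif'

config interface 'lan'
	option device 'br-lan'
	option proto 'static'
	# "option mtu '1312'" is no longer needed here (same for 'wan')

# /etc/config/wireless
config wifi-iface 'mesh0'
	option device 'radio0'
	option mode 'mesh'
	option mesh_id 'meshD'
	option network 'wlan0'
	# no "option macaddr": the MAC address of the mesh interface is left alone
```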
