Batman (in production with post 19.07 snapshot) not working under 21.02

Again, thanks for the advice, but I'm still confused.

You say: Is the swconfig (config switch_vlan) stuff still in there?
Where is "there"?
Why do you ask?

You say: Attaching a wifi AP to br-lan is the same as it always was, just put "option network lan" in the /etc/config/wireless.
I never did that before, and specifying that eth0.1 was bridged to default_radio1.1 was apparently necessary before the thing would work. It was definitely not defaulted, perhaps because of something else I was doing, but I have no idea what that might be. Where in the /etc/config/wireless file should "option network lan" appear? In the "radio1" stanza or in the "default_radio1" stanza or in some other stanza that I don't yet know about?

You say: ... you'd need to make a config device (type bridge) called br-wan containing eth0.2 and bat0.
"Containing"? Exactly how "containing"? What exactly should this stanza actually look like? I did try a number of things but nothing I tried worked, so I really haven't a clue. I ask again, perhaps more explicitly now: How exactly does one specify a multiple-device bridge under the new drill? Verbatim, character by character, please?

In /etc/config/network, like it is in 19.07.

This may be what you're asking about. From rc3, /etc/config/network, in the default configuration of an Archer A7v5:

config switch
	option name 'switch0'
	option reset '1'
	option enable_vlan '1'

config switch_vlan
	option device 'switch0'
	option vlan '1'
	option ports '2 3 4 5 0t'

config switch_vlan
	option device 'switch0'
	option vlan '2'
	option ports '1 0t'


I withdraw my previous remark about 'default_radio1.1' having to be in the bridge. I took it out and it still worked in my lucky snapshot. Moreover, I was also wrong about "option network 'lan'", which it turns out was explicit in /etc/config/wireless. I guess I haven't understood its significance until now; thanks for the education!
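
For reference, a minimal sketch of where that line lives, assuming the stock stanza names mentioned in this thread (radio1, default_radio1). The ssid and encryption values are placeholders, not taken from the poster's file. The option belongs in the wifi-iface stanza, not in the wifi-device ('radio1') stanza:

```
# /etc/config/wireless (sketch; ssid and encryption are placeholders)
config wifi-iface 'default_radio1'
	option device 'radio1'
	option mode 'ap'
	option network 'lan'
	option ssid 'OpenWrt'
	option encryption 'none'
```

With "option network 'lan'" present, netifd adds the AP interface to whatever device the 'lan' interface uses, so the wifi interface does not need to be listed in the bridge itself.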

hey, @SteveNewcomb ! is it working in 21 now?

@frollic and @mk24 , is there a documentation of the syntax changes in the /etc/config/ files between the 19 and 21 releases?

So far, I've only seen this Mini tutorial for DSA network config


But I still don't know how to specify the bridges. Let me be a little clearer about what I need.
In each mesh network, one of the nodes is the gateway. On that node, bat0 is bridged to the LAN ethernet, and so is the 2.4 GHz radio (radio1) via "option network 'lan'". On the remaining (non-gateway) nodes, bat0 is bridged to the WAN. As always, the 2.4 GHz radio is bridged to the LAN.

Looking at it from the perspective of DHCP service:
Each node's LAN is a unique network to which it provides DHCP service.
Now, the gateway node of each mesh in effect provides DHCP service not only to its respective wireless LAN, but also to its respective mesh network. However, within a given mesh, each node's IP is static, so DHCP service has very little impact on the meshes themselves. (I tried making the meshes use DHCP. It worked, but it wasn't robust enough to be practical. Static IPs worked much better.)

Patience! I'm old, obsessively methodical, and slow. You'll be the first to know, though.

my apologies! (lol) I'm now tracking this thread to follow your progress.

The DSA mini-tutorial is very interesting, but according to this support matrix, DSA is not yet relevant to TP-Link routers. So it still isn't at all clear to me what the syntax should be for the bridges I need.

The DSA mini-tutorial contains this example. It looks promising because it's a syntax I would never have guessed and therefore didn't try.

config device
	option name 'br-lan'
	option type 'bridge'
	list ports 'lan1'
	list ports 'lan2'
	list ports 'lan3'
	list ports 'lan4'

config bridge-vlan
	option device 'br-lan'
	option vlan '1'
	list ports 'lan1'
	list ports 'lan2'

config bridge-vlan
	option device 'br-lan'
	option vlan '2'
	list ports 'lan3'
	list ports 'lan4'

However, I still don't know how I might bridge bridge-vlan.1 and bridge-vlan.2 (which I admit doesn't make any sense in terms of the example shown in the mini-tutorial). That piece of the puzzle is still missing.

Hooray, the router no longer "goes completely silent" now that I have removed the otiose "default_radio1" from the "device" option.

However, the bridging of bat0 and eth0.1 does not work under the old syntax (regardless of whether I use "ifname" or "device" as the parameter name), because bridging is different now and I have yet to see a working example (or indeed any example) of the syntax of such a bridging specification. It is not obvious how to do it; none of my guesses have worked.

OK it's not DSA, so as far as the Ethernet ports go, eth0.1 is the four LAN ports and eth0.2 is the WAN port.

DSA is a way to abstract a hardware switch so that in the configuration, it looks like independent CPU ports in a software bridge. Internally though, the DSA executive will utilize switching hardware where it is available.

But again DSA has not come to the C7 yet, so don't worry about it.

Switching a batman mesh output to a regular AP is a software bridge. These are configured by creating an overlaying virtual "device" of type bridge, and a name that starts with br-, and attaching the enslaved ports to it with list ports, one line for each port.

config device
    option name 'br-lan'
    option type 'bridge'
    list ports 'eth0.1'
    list ports 'bat0'

The results of this can be examined with brctl show, which will list the bridges and their members, including wifi interfaces that were attached through /etc/config/wireless.
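
For the non-gateway nodes described earlier, where bat0 is bridged to the WAN side instead, the same pattern would presumably look like the sketch below. The name br-wan and the members eth0.2 and bat0 come from this thread; the interface stanza is an assumption about how the rest of the file is laid out, and its protocol and address options are deliberately left out.

```
# /etc/config/network (sketch for a non-gateway node)
config device
	option name 'br-wan'
	option type 'bridge'
	list ports 'eth0.2'
	list ports 'bat0'

config interface 'wan'
	# point the interface at the bridge; in 21.02 this is 'device', not 'ifname'
	option device 'br-wan'
	# keep the existing proto/address options of the wan stanza here
```

The interface stanza names the bridge device rather than the individual ports, which is what replaces the old "option type 'bridge'" plus "option ifname" pair inside the interface itself.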


OK. Many thanks; brctl now shows appropriate stuff. I have reported the working /etc/config/network, /etc/config/wireless, and brctl output in this discussion.

But the problems are not quite solved yet, because Batman still doesn't work. The only relevant complaint I have found in the log suggests that perhaps wpa_supplicant is incompatible with what used to work in /etc/config/wireless. Here's what I'm seeing in the log:

Thu Jun 24 20:58:15 2021 daemon.err wpa_supplicant[1480]: Line 8: too large mode (value=5 max_value=4)
Thu Jun 24 20:58:15 2021 daemon.err wpa_supplicant[1480]: Line 8: failed to parse mode '5'.
Thu Jun 24 20:58:15 2021 daemon.err wpa_supplicant[1480]: Line 16: failed to parse network block.
Thu Jun 24 20:58:15 2021 daemon.err wpa_supplicant[1480]: Failed to read or parse configuration '/var/run/wpa_supplicant-wlan0.conf'.
Thu Jun 24 20:58:15 2021 daemon.notice netifd: radio0 (5073): Interface 0 setup failed: WPA_SUPPLICANT_FAILED
Thu Jun 24 20:58:15 2021 daemon.notice netifd: radio0 (5073): WARNING (wireless_add_process): executable path /usr/sbin/wpad does not match process  path (/proc/exe)

So:

root@rpc149:~# cat /var/run/wpa_supplicant-wlan0.conf
network={
        ssid="meshD"
        key_mgmt=SAE
        mode=5
        fixed_freq=1
        frequency=5180
        ht40=1
        vht=1
        max_oper_chwidth=1
        sae_password="XXXX"
        beacon_int=100
}

On another node where my lucky snapshot is running in production, here's what the log says about wpa:

root@rpc150:~# logread | fgrep -i wpa
Thu Jun 24 09:04:05 2021 daemon.notice wpa_supplicant[1367]: Successfully initialized wpa_supplicant
Fri Jun 25 04:35:03 2021 daemon.notice wpa_supplicant[1367]: wlan0: leaving mesh
Fri Jun 25 04:35:04 2021 daemon.err wpa_supplicant[1367]: wlan0: mesh leave error=-134
Fri Jun 25 04:35:06 2021 daemon.notice wpa_supplicant[1367]: wlan0: interface state UNINITIALIZED->ENABLED
Fri Jun 25 04:35:06 2021 daemon.notice wpa_supplicant[1367]: wlan0: AP-ENABLED
Fri Jun 25 04:35:06 2021 daemon.notice wpa_supplicant[1367]: wlan0: joining mesh meshD
Fri Jun 25 04:35:06 2021 daemon.notice wpa_supplicant[1367]: wlan0: CTRL-EVENT-CONNECTED - Connection to 00:00:00:00:00:00 completed [id=0 id_str=]
Fri Jun 25 04:35:06 2021 daemon.notice wpa_supplicant[1367]: wlan0: MESH-GROUP-STARTED ssid="meshD" id=0

On that same (working) node:

root@rpc150:~# cat /var/run/wpa_supplicant-wlan0.conf
country=US
network={
        ssid="meshD"
        key_mgmt=SAE
        mode=5
        mesh_fwding=0
        fixed_freq=1
        frequency=5745
        ht40=1
        vht=1
        max_oper_chwidth=1
        sae_password="XXXX"
        beacon_int=100
}

At this moment, I'm guessing that wpa_supplicant isn't supporting mesh mode, so I'm wondering whether I need to make a change to wpad. I'm looking into it now.

Replace package wpad-basic-wolfssl with wpad-mesh-wolfssl.
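
The "too large mode (value=5 max_value=4)" error is consistent with a wpa_supplicant build that lacks mesh support, since mode=5 is the mesh mode in a network block. A small sketch of how one might check for this before swapping packages; the function name is made up for illustration, and the opkg lines are left as comments because they only make sense on the router itself:

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given wpa_supplicant config
# contains a network block that requests mesh mode (mode=5).
conf_requests_mesh() {
    grep -Eq '^[[:space:]]*mode=5[[:space:]]*$' "$1"
}

# On the router, the check and the package swap might look like this:
#   conf_requests_mesh /var/run/wpa_supplicant-wlan0.conf && echo "mesh mode requested"
#   opkg list-installed | grep -E '^(wpad|wpa-supplicant)'
#   opkg update
#   opkg remove wpad-basic-wolfssl
#   opkg install wpad-mesh-wolfssl
```

If the config requests mesh mode and only a "basic" wpad variant is installed, the parse failure above is the expected result.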

So. I removed wpad-basic-wolfssl and installed wpa-supplicant-mesh-openssl. It helped:

root@rpc149:~# logread | grep -i wpa
Fri Jun 25 10:12:59 2021 daemon.notice wpa_supplicant[1477]: Successfully initialized wpa_supplicant
Fri Jun 25 10:13:53 2021 daemon.notice wpa_supplicant[1477]: wlan0: leaving mesh
Fri Jun 25 10:13:54 2021 daemon.err wpa_supplicant[1477]: wlan0: mesh leave error=-134
Fri Jun 25 10:13:56 2021 daemon.notice wpa_supplicant[1477]: wlan0: interface state UNINITIALIZED->ENABLED
Fri Jun 25 10:13:56 2021 daemon.notice wpa_supplicant[1477]: wlan0: AP-ENABLED
Fri Jun 25 10:13:56 2021 daemon.notice wpa_supplicant[1477]: wlan0: joining mesh meshD
Fri Jun 25 10:13:56 2021 daemon.notice wpa_supplicant[1477]: wlan0: CTRL-EVENT-CONNECTED - Connection to 00:00:00:00:00:00 completed [id=0 id_str=]
Fri Jun 25 10:13:56 2021 daemon.notice wpa_supplicant[1477]: wlan0: MESH-GROUP-STARTED ssid="meshD" id=0

...but Batman still wasn't working. "batctl o" still produced no output at all. And what's this about "wlan0", an identifier which appears nowhere in my configuration??

Experimentally, I changed "config wifi-iface 'mesh0'" to "config wifi-iface 'wlan0'". No help. So I put it back the way it was. Then I changed "option network 'nwi_mesh0'" in that same stanza to "option network 'wlan0'". Accordingly, in /etc/config/network, I also changed "config interface 'nwi_mesh0'" to "config interface 'wlan0'". Now, at last:

root@rpc149:~# batctl o
[B.A.T.M.A.N. adv 2021.1-openwrt-2, MainIF/MAC: wlan0/36:9b:9b:5e:27:95 (bat0/8a:0e:dd:c4:2c:9f BATMAN_IV)]
   Originator        last-seen (#/255) Nexthop           [outgoingIF]

So batctl now thinks there's something there, but I'm still not joining the mesh. Why? I see nothing untoward in the log.

Following your advice, mk24, I replaced wpa-supplicant-mesh-openssl with wpad-mesh-wolfssl. No change; still not joining the mesh. But at least "batctl o" has some output.

Is it not weird that I have a wlan0 whether I want it or not?

There's no need to force names onto radio interfaces, just let it propagate the other way with option network.
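
A sketch of what that can look like, using the names from this thread (mesh0, nwi_mesh0, radio0, meshD, bat0). The stanzas are assumptions, since the poster's actual files are not quoted here. The batadv_hardif/master form is, as far as I know, what the batman-adv package shipped with 21.02 expects in place of the older "option proto 'batadv'" plus "option mesh 'bat0'" pair, so it is worth comparing against a config carried over from 19.07.

```
# /etc/config/wireless (sketch)
config wifi-iface 'mesh0'
	option device 'radio0'
	option mode 'mesh'
	option mesh_id 'meshD'
	option encryption 'sae'
	option key 'XXXX'
	option mesh_fwding '0'
	option network 'nwi_mesh0'

# /etc/config/network (sketch)
config interface 'bat0'
	option proto 'batadv'
	option routing_algo 'BATMAN_IV'

config interface 'nwi_mesh0'
	option proto 'batadv_hardif'
	option master 'bat0'
```

The kernel still names the radio interface wlan0; the logical name nwi_mesh0 only ties the wifi-iface stanza to the network stanza. Note also that the working node's generated wpa_supplicant file contains mesh_fwding=0 while the failing node's does not, which suggests that option may be missing on the 21.02 node.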

The wpad series of packages include both hostapd and wpa-supplicant capability, so you don't need to install wpa-supplicant separately.

Run iwinfo wlan0 assoclist to see if there is a radio link to at least one other node. If there is, you can conclude that the radio and wpad are configured properly, and it should be sending packets to the BATMAN layer.
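
The same kind of check can be made one layer up, on the "batctl o" output. A small sketch, assuming the output has the banner and column-header lines shown earlier in this thread; the function name is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: count originator entries in `batctl o` output read
# from stdin. Skips the "[B.A.T.M.A.N. ...]" banner, the column header
# line, and blank lines; every remaining line is counted as one entry.
count_originators() {
    awk '!/^\[/ && !/Originator/ && NF { n++ } END { print n + 0 }'
}

# On the router:
#   iwinfo wlan0 assoclist        # radio-level peers
#   batctl o | count_originators  # batman-level originators
```

Peers in the assoclist but a count of zero here would point at the BATMAN layer rather than at the radio or wpad.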

But BATMAN used to be real critical about only linking with other nodes running the same version of BATMAN-- any node running a different version was invisible to it. Since BATMAN is part of the kernel, getting all your nodes onto the same version means running similar kernels. I think there are some ways to override that now.

Aha! I will take appropriate measures and see what happens then.

It outputs nothing. However, radio0's LED is lit and it blinks every now and then. On the working batman nodes, radio0's LED blinks pretty constantly. And, provided there is at least one other node that is online and a member of the same mesh, iwinfo wlan0 assoclist has output that looks reasonable.

Even if only to eliminate the possibility of malfunctioning hardware, I need to set up at least one more 21.02 node. You can expect a report!

In your files you're using channel 100, a DFS channel. Mesh operation on DFS is an uncertain thing, since the mesh expects to be on a defined channel and if radar is detected there's no defined way to move the whole mesh to another channel. So I'd try a non-DFS channel.
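
To make the channel question concrete, here is a small sketch that converts the frequency= values seen in the generated wpa_supplicant files to channel numbers and checks them against the DFS ranges. It assumes the US regulatory domain (the working node's file has country=US), where channels 52-64 and 100-144 require DFS; the function names are made up for illustration.

```shell
#!/bin/sh
# Convert a 5 GHz centre frequency in MHz to its channel number.
freq_to_channel() {
    echo $(( ($1 - 5000) / 5 ))
}

# Succeed if the channel falls in a DFS range under US (FCC) rules:
# 52-64 and 100-144. Other regulatory domains differ.
is_dfs_channel_us() {
    ch=$1
    { [ "$ch" -ge 52 ] && [ "$ch" -le 64 ]; } ||
        { [ "$ch" -ge 100 ] && [ "$ch" -le 144 ]; }
}

# Example:
#   ch=$(freq_to_channel 5745)
#   is_dfs_channel_us "$ch" && echo "channel $ch is DFS" || echo "channel $ch is not DFS"
```

By this arithmetic the two files quoted earlier (frequency=5180 and frequency=5745) correspond to channels 36 and 149. Neither is a DFS channel, but they are not the same channel either, and nodes can only join one mesh if they share a channel, so that is worth checking alongside the DFS question.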