Batman (in production on a post-19.07 snapshot) not working under 21.02

We have a set of neighborhood mesh networks that has been in reliable production for well over a year, involving 11 Archer C7 v2/v4/v5 and A7 v5 routers (topology described here). When the existing network, wireless, and firewall configurations (here) are deployed on OpenWrt 21.02, the router goes completely silent and I can only recover it using TFTP. I do not know why, but /etc/config/network is definitely implicated in whatever the problem is. The existing production platform is OpenWrt snapshot r14277-ff5dd32164 (2020-08-27). The ready-to-build source for this "lucky snapshot" is here, and the builds are here.

Could this be because of the move from swconfig in 19.07 to DSA in 21.02?

Moving to DSA usually requires discarding the old configuration (a reset) during the upgrade.

There is also the new network config syntax where a bridge is defined in two steps: a config device of type bridge, then attaching that bridge to an interface. Before, the bridge property was added directly to the interface.
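A minimal sketch of the difference, assuming a plain LAN on eth0.1 with the default addressing. The two halves are alternatives for comparison, not one file: in 19.07 the interface itself was the bridge, while in 21.02 a separate device section carries the bridge and the interface refers to it by name.

```
# 19.07 style: the bridge is declared on the interface itself
config interface 'lan'
	option type 'bridge'
	option ifname 'eth0.1'
	option proto 'static'
	option ipaddr '192.168.1.1'
	option netmask '255.255.255.0'

# 21.02 style, step 1: declare the bridge device and list its ports
config device
	option name 'br-lan'
	option type 'bridge'
	list ports 'eth0.1'

# 21.02 style, step 2: point the interface at that bridge device
config interface 'lan'
	option device 'br-lan'
	option proto 'static'
	option ipaddr '192.168.1.1'
	option netmask '255.255.255.0'
```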

Also, the hardware path of the radio may have changed. If it appears that there are no wifi radios at all, confirm that option path is correct.
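For orientation, option path lives in the wifi-device (radio) section of /etc/config/wireless. In this sketch the section name radio1 follows the thread, but the path value is a deliberate placeholder, not the real value for any Archer model; copy the actual value from the default config that 21.02 generates rather than from the old file.

```
config wifi-device 'radio1'
	option type 'mac80211'
	# placeholder: replace with the path from the stock 21.02 /etc/config/wireless
	option path 'REPLACE-WITH-DEFAULT-PATH'
```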

Flash the pre-built rc3 release onto one router and look through the default config files.

Thanks for this advice, but I guess I need some hand-holding. Here are some relevant stanzas from rc3's /etc/config/network:

config device
	option name 'br-lan'
	option type 'bridge'
	list ports 'eth0.1'

config interface 'lan'
	option device 'br-lan'
	option proto 'static'
	option ipaddr '192.168.1.1'
	option netmask '255.255.255.0'
	option ip6assign '60'

config device
	option name 'eth0.2'
	option macaddr 'b0:be:76:50:2f:94'

config interface 'wan'
	option device 'eth0.2'
	option proto 'dhcp'

You say "in 2 steps". I see a line "option type bridge" but I don't see 2 devices being bridged. I need the wan interface to bridge eth0.2 and bat0.
I also need the lan interface to bridge eth0.1 and radio1. I suppose there are 4 steps involved in accomplishing those two agenda items. What, exactly, are the 4 steps?

Is the swconfig (config switch_vlan) stuff still in there? I ask because I haven't loaded 21.02 onto my A7 to see whether it uses DSA.

Attaching a wifi AP to br-lan works the same as it always did: just put option network 'lan' in /etc/config/wireless. That is also in the default configuration; you only need to enable the radio.
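In the stock 21.02 /etc/config/wireless, option network sits in the wifi-iface section that defines the AP, not in the wifi-device (radio) section. A sketch assuming the default section names; the SSID and encryption values are placeholders taken from the stock defaults:

```
config wifi-iface 'default_radio1'
	option device 'radio1'
	# joins this AP to the bridge used by the 'lan' interface (br-lan)
	option network 'lan'
	option mode 'ap'
	option ssid 'OpenWrt'
	option encryption 'none'
```

Enabling the radio means deleting option disabled '1' from the matching config wifi-device 'radio1' section, or setting it to '0'.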

As for wan, you'd need to make a config device (type bridge) called br-wan containing eth0.2 and bat0. Or, if you don't want to bridge the wan out to an Ethernet port, you could simply replace eth0.2 with bat0 in the existing non-bridged wan interface.
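A sketch of both options in /etc/config/network, assuming the rc3 default wan shown earlier. The two wan sections are alternatives; use one or the other. proto 'dhcp' is simply carried over from the rc3 default, so a node with a static mesh address would use proto 'static' with its own address settings instead.

```
# Option 1: bridge bat0 with the WAN Ethernet port
config device
	option name 'br-wan'
	option type 'bridge'
	list ports 'eth0.2'
	list ports 'bat0'

config interface 'wan'
	option device 'br-wan'
	option proto 'dhcp'

# Option 2 (instead of option 1): no Ethernet on wan, bat0 only
config interface 'wan'
	option device 'bat0'
	option proto 'dhcp'
```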

Again, thanks for the advice, but I'm still confused.

You say: Is the swconfig (config switch_vlan) stuff still in there?
Where is "there"?
Why do you ask?

You say: Attaching a wifi AP to br-lan is the same as it always was, just put option network lan in the /etc/config/wireless.
I never did that before, and specifying that eth0.1 was bridged to default_radio1.1 was apparently necessary before the thing would work. It was definitely not the default, perhaps because of something else I was doing, but I have no idea what that might be. Where in the /etc/config/wireless file should "option network lan" appear? In the "radio1" stanza, in the "default_radio1" stanza, or in some other stanza that I don't yet know about?

You say: ... you'd need to make a config device (type bridge) called br-wan containing eth0.2 and bat0.
"Containing"? Exactly how "containing"? What exactly should this stanza actually look like? I did try a number of things but nothing I tried worked, so I really haven't a clue. I ask again, perhaps more explicitly now: How exactly does one specify a multiple-device bridge under the new drill? Verbatim, character by character, please?

In /etc/config/network, like it is in 19.07.

This may be what you're asking about. From rc3, /etc/config/network, in the default configuration of an Archer A7v5:

config switch
	option name 'switch0'
	option reset '1'
	option enable_vlan '1'

config switch_vlan
	option device 'switch0'
	option vlan '1'
	option ports '2 3 4 5 0t'

config switch_vlan
	option device 'switch0'
	option vlan '2'
	option ports '1 0t'


blush
I withdraw my previous remark about 'default_radio1.1' having to be in the bridge. I took it out and it still worked in my lucky snapshot. Moreover, I was also wrong about "option network 'lan'", which it turns out was explicit in /etc/config/wireless. I guess I hadn't understood its significance until now; thanks for the education!

Hey, @SteveNewcomb! Is it working in 21 now?

@frollic and @mk24, is there any documentation of the syntax changes in the /etc/config/ files between the 19 and 21 releases?

So far, I've only seen this mini tutorial for DSA network config.


But I still don't know how to specify the bridges. Let me be a little clearer about what I need.
In each mesh network, one of the nodes is the gateway. On that node, bat0 is bridged to the LAN ethernet, and so is the 2.4 GHz radio (radio1) via "option network 'lan'". On the remaining (non-gateway) nodes, bat0 is bridged to the WAN. As always, the 2.4 GHz radio is bridged to the LAN.

Looking at it from the perspective of DHCP service:
Each node's LAN is a unique network to which it provides DHCP service.
Now, the gateway node of each mesh in effect provides DHCP service not only to its respective wireless LAN, but also to its respective mesh network. However, within a given mesh, each node's IP is static, so DHCP service has very little impact on the meshes themselves. (I tried making the meshes use DHCP. It worked, but it wasn't robust enough to be practical. Static IPs worked much better.)

Patience! I'm old, obsessively methodical, and slow. You'll be the first to know, though.

My apologies! (lol) I'm now tracking this thread to follow your progress.

The DSA mini-tutorial is very interesting, but according to this support matrix, DSA is not yet relevant to TP-Link routers. So it still isn't at all clear to me what the syntax should be for the bridges I need.

The DSA mini-tutorial contains this example. It looks promising because it's a syntax I would never have guessed and therefore didn't try.

config device
	option name 'br-lan'
	option type 'bridge'
	list ports 'lan1'
	list ports 'lan2'
	list ports 'lan3'
	list ports 'lan4'

config bridge-vlan
	option device 'br-lan'
	option vlan '1'
	list ports 'lan1'
	list ports 'lan2'

config bridge-vlan
	option device 'br-lan'
	option vlan '2'
	list ports 'lan3'
	list ports 'lan4'

However, I still don't know how I might bridge bridge-vlan.1 and bridge-vlan.2 (which I admit doesn't make any sense in terms of the example shown in the mini-tutorial). That piece of the puzzle is still missing.

Hooray, the router no longer "goes completely silent" now that I have removed the otiose "default_radio1" from the "device" option.

However, the bridging of bat0 and eth0.1 does not work under the old syntax (regardless of whether I use "ifname" or "device" as the parameter name), because bridging is different now and I have yet to see a working example (or indeed any example) of the syntax of such a bridging specification. It is not obvious how to do it; none of my guesses have worked.

OK, it's not DSA, so as far as the Ethernet ports go, eth0.1 is the four LAN ports and eth0.2 is the WAN port.

DSA is a way to abstract a hardware switch so that, in the configuration, its ports look like independent ports in a software bridge. Internally, though, DSA will still use the switching hardware where it is available.

But again, DSA has not come to the C7 yet, so don't worry about it.

Switching a batman mesh output to a regular AP is done with a software bridge. Bridges are configured by creating an overlaying virtual "device" of type bridge, with a name that starts with br-, and attaching the enslaved ports to it with list ports, one line for each port.

config device
    option name 'br-lan'
    option type 'bridge'
    list ports 'eth0.1'
    list ports 'bat0'

The results of this can be examined with brctl show, which lists the bridges and their members, including wifi interfaces that were attached via /etc/config/wireless.


OK. Many thanks; brctl now shows appropriate stuff. I have reported the working /etc/config/network, /etc/config/wireless, and brctl output in this discussion.

But the problems are not quite solved yet, because Batman still doesn't work. The only relevant complaint I have found in the log suggests that perhaps wpa_supplicant is incompatible with what used to work in /etc/config/wireless. Here's what I'm seeing in the log:

Thu Jun 24 20:58:15 2021 daemon.err wpa_supplicant[1480]: Line 8: too large mode (value=5 max_value=4)
Thu Jun 24 20:58:15 2021 daemon.err wpa_supplicant[1480]: Line 8: failed to parse mode '5'.
Thu Jun 24 20:58:15 2021 daemon.err wpa_supplicant[1480]: Line 16: failed to parse network block.
Thu Jun 24 20:58:15 2021 daemon.err wpa_supplicant[1480]: Failed to read or parse configuration '/var/run/wpa_supplicant-wlan0.conf'.
Thu Jun 24 20:58:15 2021 daemon.notice netifd: radio0 (5073): Interface 0 setup failed: WPA_SUPPLICANT_FAILED
Thu Jun 24 20:58:15 2021 daemon.notice netifd: radio0 (5073): WARNING (wireless_add_process): executable path /usr/sbin/wpad does not match process  path (/proc/exe)

So:

root@rpc149:~# cat /var/run/wpa_supplicant-wlan0.conf
network={
        ssid="meshD"
        key_mgmt=SAE
        mode=5
        fixed_freq=1
        frequency=5180
        ht40=1
        vht=1
        max_oper_chwidth=1
        sae_password="XXXX"
        beacon_int=100
}

On another node where my lucky snapshot is running in production, here's what the log says about wpa:

root@rpc150:~# logread | fgrep -i wpa
Thu Jun 24 09:04:05 2021 daemon.notice wpa_supplicant[1367]: Successfully initialized wpa_supplicant
Fri Jun 25 04:35:03 2021 daemon.notice wpa_supplicant[1367]: wlan0: leaving mesh
Fri Jun 25 04:35:04 2021 daemon.err wpa_supplicant[1367]: wlan0: mesh leave error=-134
Fri Jun 25 04:35:06 2021 daemon.notice wpa_supplicant[1367]: wlan0: interface state UNINITIALIZED->ENABLED
Fri Jun 25 04:35:06 2021 daemon.notice wpa_supplicant[1367]: wlan0: AP-ENABLED
Fri Jun 25 04:35:06 2021 daemon.notice wpa_supplicant[1367]: wlan0: joining mesh meshD
Fri Jun 25 04:35:06 2021 daemon.notice wpa_supplicant[1367]: wlan0: CTRL-EVENT-CONNECTED - Connection to 00:00:00:00:00:00 completed [id=0 id_str=]
Fri Jun 25 04:35:06 2021 daemon.notice wpa_supplicant[1367]: wlan0: MESH-GROUP-STARTED ssid="meshD" id=0

On that same (working) node:

root@rpc150:~# cat /var/run/wpa_supplicant-wlan0.conf
country=US
network={
        ssid="meshD"
        key_mgmt=SAE
        mode=5
        mesh_fwding=0
        fixed_freq=1
        frequency=5745
        ht40=1
        vht=1
        max_oper_chwidth=1
        sae_password="XXXX"
        beacon_int=100
}
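For comparison, a generated file like this comes from a mesh wifi-iface section in /etc/config/wireless. The author's actual file is not shown in the thread, so this is only a sketch: the SSID and options are read off the generated files above, and the section and interface name mesh0 is hypothetical. In wpa_supplicant's network block, mode=5 is mesh mode, which is what option mode 'mesh' produces.

```
# /etc/config/wireless (sketch; mesh0 is a hypothetical name)
config wifi-iface 'mesh0'
	option device 'radio0'
	# becomes mode=5 (mesh) in the generated wpa_supplicant config
	option mode 'mesh'
	option mesh_id 'meshD'
	option encryption 'sae'
	option key 'XXXX'
	# becomes mesh_fwding=0: batman-adv, not 802.11s, does the forwarding
	option mesh_fwding '0'
	option network 'mesh0'

# /etc/config/network (sketch): attach the mesh radio to bat0
config interface 'bat0'
	option proto 'batadv'

config interface 'mesh0'
	option proto 'batadv_hardif'
	option master 'bat0'
```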

At this moment, I'm guessing that wpa_supplicant isn't supporting mesh mode, so I'm wondering whether I need to make a change to wpad. I'm looking into it now.

Replace package wpad-wolfssl with wpad-mesh-wolfssl.