Having trouble with secure mesh links

This is my simple mesh link config

/etc/config/wireless

config wifi-iface 'wmesh'
	option device 'radio1'
	option ifname 'if-mesh'
	option network 'nwi_mesh0'
	option mode 'mesh'
	option mesh_id 'MeshCloud'
	option key 'ItsASecureMeshLink'
	option mesh_fwding '0'
	option mesh_ttl '1'
	option mesh_rssi_threshold '0'
	option encryption 'sae'

/etc/config/network

config interface 'lan'
	option type 'bridge'
	option ifname 'eth0.1 bat0' 
	option proto 'static'
	option ipaddr '192.168.2.1'
	option netmask '255.255.255.0'
	option ip6assign '60'
..
..
..
config interface 'bat0'
	option proto 'batadv'
	option routing_algo 'BATMAN_IV'
	option aggregated_ogms 1
	option ap_isolation 0
	option bonding 0
	option fragmentation 1
	#option gw_bandwidth '10000/2000'
	option gw_mode 'off'
	#option gw_sel_class 20
	option log_level 0
	option orig_interval 1000
	option bridge_loop_avoidance 1
	option distributed_arp_table 1
	option multicast_mode 1
	option network_coding 0
	option hop_penalty 30
	option isolation_mark '0x00000000/0x00000000'

config interface 'nwi_mesh0'
	option mtu '2304'
	option proto 'batadv_hardif'
	option master 'bat0'

I have the necessary package installed for setting up the secure mesh link.

But for some reason the nodes do not connect on the first attempt. This is what the syslog of the gateway node shows:

Sun Feb 28 15:55:14 2021 daemon.notice wpa_supplicant[2090]: if-mesh: MESH-SAE-AUTH-FAILURE addr=84:d8:1b:4a:77:10
Sun Feb 28 15:55:31 2021 daemon.notice wpa_supplicant[2090]: if-mesh: MESH-SAE-AUTH-FAILURE addr=84:d8:1b:4a:77:10
Sun Feb 28 15:55:46 2021 daemon.notice wpa_supplicant[2090]: if-mesh: MESH-SAE-AUTH-FAILURE addr=84:d8:1b:4a:77:10
Sun Feb 28 15:56:01 2021 daemon.notice wpa_supplicant[2090]: if-mesh: MESH-SAE-AUTH-FAILURE addr=84:d8:1b:4a:77:10
Sun Feb 28 15:56:01 2021 daemon.notice wpa_supplicant[2090]: if-mesh: MESH-SAE-AUTH-BLOCKED addr=84:d8:1b:4a:77:10 duration=300

[Edit]: Most of the time it connects after a few minutes, but sometimes it still has not connected even after 10 minutes.
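To see at a glance how often each peer fails and gets blocked, the syslog lines can be tallied per MAC address. This is only a minimal sketch: the function name `summarize_sae_events` is made up for illustration, and it assumes the log lines have the same `MESH-SAE-AUTH-... addr=<mac>` shape as the excerpt above.

```shell
#!/bin/sh
# Summarise SAE mesh authentication events per peer from syslog text on stdin.
# Prints one line per peer: "<mac> failures=<n> blocked=<n>"
# (summarize_sae_events is a hypothetical helper name, not an OpenWrt tool.)
summarize_sae_events() {
    awk '
        /MESH-SAE-AUTH-(FAILURE|BLOCKED)/ {
            mac = ""
            # Find the addr=<mac> field and strip the "addr=" prefix.
            for (i = 1; i <= NF; i++)
                if ($i ~ /^addr=/) mac = substr($i, 6)
            if (mac == "") next
            seen[mac] = 1
            if ($0 ~ /MESH-SAE-AUTH-FAILURE/) fail[mac]++
            else blocked[mac]++
        }
        END {
            for (mac in seen)
                printf "%s failures=%d blocked=%d\n", mac, fail[mac], blocked[mac]
        }
    '
}
```

On a node it could be fed from the system log, for example `logread | summarize_sae_events`. With several peers the order of the output lines is not guaranteed.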

I assume you mean "Secure Mesh", rather than "Sure Mesh".
You do not give any details about your mesh, e.g. number of nodes, geographic spread, hardware in use, backhaul channel, etc.

A large community mesh with segments of backhaul over cable benefits from Batman and its layer 3 routing functionality, but for an "all radio" mesh, even one covering a large outdoor venue, an 802.11s mesh is all that is needed. As you say you want a "simple mesh link", I wonder if you need Batman at all.

The errors you are getting are typical of inadequate signal strength and possibly high noise. How far apart are the nodes? Are you using 5GHz?

Try moving the nodes closer together.

A mesh network requires a fixed channel number, and you have not shown that part of the wireless config.
You have also not set the RSSI threshold (a value of zero means "try to connect regardless of signal strength"). Try setting it to -70. As far as I am aware, though, this does not get set from the config file; instead you have to set it with the iw utility once the mesh interface has come up. I'm not sure about that, but you can test it using iw.
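A minimal sketch of that test with iw, assuming the interface is named `if-mesh` as in the posted config and that the installed iw accepts the `param=value` form of `set mesh_param` (some older builds may expect the parameter and value as separate arguments). The two function names and the `IW` override variable are made up for illustration.

```shell
#!/bin/sh
# Apply an RSSI threshold to a running 802.11s interface with iw.
# IW can be overridden (e.g. IW=echo) to preview the command without running it.
set_mesh_rssi_threshold() {
    ifname="$1"      # mesh interface name, e.g. if-mesh (from the posted config)
    threshold="$2"   # in dBm, e.g. -70; 0 means no threshold
    ${IW:-iw} dev "$ifname" set mesh_param mesh_rssi_threshold="$threshold"
}

# Read the value back to confirm what the interface is actually using.
get_mesh_rssi_threshold() {
    ${IW:-iw} dev "$1" get mesh_param mesh_rssi_threshold
}
```

For example, `set_mesh_rssi_threshold if-mesh -70` followed by `get_mesh_rssi_threshold if-mesh`, run once the mesh interface is up. A value set this way is a runtime setting and would need to be applied again after the interface is restarted.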

Yes :sweat_smile:

I am using 4 nodes in total; the hardware is TP-Link Archer C20 v4 and my backhaul channel is 36.

I was using a simple 802.11s mesh, but since I wanted to try BATMAN out I started using BATMAN.

Signal strength is not the issue: I get a good -56 dBm signal between the nodes, and yes, I am using 5 GHz.
It could be noise, but I am not so sure. My neighbor uses the same channel, but changing the channel didn't make much of a difference, so maybe not.

I placed them side by side and still have the same issue.

Will try that out.

I'm currently experiencing exactly the same MESH-SAE-AUTH-FAILURE followed by MESH-SAE-AUTH-BLOCKED using wpad-mesh-wolfssl on OpenWrt 21.02.0-rc.1. But it only seems to be an issue if one mesh partner reboots (we have a maintenance reboot during the night once a week): the MESH-SAE-AUTH-FAILURE messages then come up, while it worked perfectly before that reboot event.

Something is very strange. I had two weeks in a row where the problem appeared during the nightly reboot. If I now issue the reboot command to one of the nodes, or reboot via the Web UI, I cannot reproduce the problem.

My devices: TP-Link Archer C7v2 (mesh AP 1), TP-Link Archer C7v5 (mesh AP 2) using ath10k-non-ct drivers for 5 GHz mesh.

Same problem here:
Aug 24 14:02:35 WifiAP-01 wpa_supplicant[14237]: wlan0: MESH-SAE-AUTH-BLOCKED addr=68:ff:7b:0e:32:98 duration=300

When this happens, I'll have to wait 5 minutes "for nothing" and then the mesh starts to work again. I only have two AP devices doing mesh with each other.

Related: https://www.reddit.com/r/openwrt/comments/lujh0t/having_trouble_with_sure_mesh_links/

I have a regular mesh of C7 v2 units running on 2.4 GHz and I occasionally get that message too.

I suspect it's due to interference in my case (busy neighborhood). The nodes it happened to are only 20 ft apart with a -30 dBm signal.

I also observed it when I tried to connect an AR750S to the C7 mesh on 2.4 GHz; it then became unresponsive for a period of time (i.e. 5 minutes, same log message).

The C7 (and the AR750S) are running 19.07.7, ath10k non-CT drivers (for 5 GHz, I know) and wpad-mesh-openssl.

@16F84
I'm currently trying "long preamble", as my mesh partners are about 15 feet and one wall apart.

The option can be set for the wifi-iface in "/etc/config/wireless" :

config wifi-iface 'wifinet0'
    option short_preamble '0'
    (...)

YMMV

I'm leaning more towards watchcat and scheduled reboots as I lose 2.4Ghz nodes that transmit lots of data like my TV.

@16F84 I've replaced the Archer C7v2 with C7v5 units. No more 2.4 GHz problems after that.

After a planned reboot of one AP at 04:00, my mesh consisting of two APs failed again.

Log:

Aug 31 04:01:35 WifiAP-01 wpa_supplicant[1424]: wlan0: MESH-SAE-AUTH-FAILURE addr=68:ff:7b:0e:xx:xx
Aug 31 04:01:51 WifiAP-01 wpa_supplicant[1424]: wlan0: MESH-SAE-AUTH-FAILURE addr=68:ff:7b:0e:xx:xx
Aug 31 04:02:07 WifiAP-01 wpa_supplicant[1424]: wlan0: MESH-SAE-AUTH-FAILURE addr=68:ff:7b:0e:xx:xx
Aug 31 04:02:22 WifiAP-01 wpa_supplicant[1424]: wlan0: MESH-SAE-AUTH-FAILURE addr=68:ff:7b:0e:xx:xx
Aug 31 04:02:22 WifiAP-01 wpa_supplicant[1424]: wlan0: MESH-SAE-AUTH-BLOCKED addr=68:ff:7b:0e:xx:xx duration=300
Aug 31 04:06:43 WifiAP-01 wpa_supplicant[1424]: wlan0: MESH-SAE-AUTH-FAILURE addr=68:ff:7b:0e:xx:xx
Aug 31 04:06:53 WifiAP-01 wpa_supplicant[1424]: wlan0: MESH-SAE-AUTH-FAILURE addr=68:ff:7b:0e:xx:xx
Aug 31 04:07:07 WifiAP-01 wpa_supplicant[1424]: wlan0: MESH-SAE-AUTH-FAILURE addr=68:ff:7b:0e:xx:xx
Aug 31 04:07:17 WifiAP-01 wpa_supplicant[1424]: wlan0: MESH-SAE-AUTH-FAILURE addr=68:ff:7b:0e:xx:xx
Aug 31 04:07:17 WifiAP-01 wpa_supplicant[1424]: wlan0: MESH-SAE-AUTH-BLOCKED addr=68:ff:7b:0e:xx:xx duration=300
Aug 31 04:11:43 WifiAP-01 wpa_supplicant[1424]: wlan0: MESH-SAE-AUTH-FAILURE addr=68:ff:7b:0e:xx:xx
Aug 31 04:11:55 WifiAP-01 wpa_supplicant[1424]: wlan0: MESH-SAE-AUTH-FAILURE addr=68:ff:7b:0e:xx:xx
Aug 31 04:12:14 WifiAP-01 wpa_supplicant[1424]: wlan0: MESH-SAE-AUTH-FAILURE addr=68:ff:7b:0e:xx:xx
Aug 31 04:12:33 WifiAP-01 wpa_supplicant[1424]: wlan0: MESH-SAE-AUTH-FAILURE addr=68:ff:7b:0e:xx:xx
Aug 31 04:12:33 WifiAP-01 wpa_supplicant[1424]: wlan0: MESH-SAE-AUTH-BLOCKED addr=68:ff:7b:0e:xx:xx duration=300
Aug 31 04:16:44 WifiAP-01 wpa_supplicant[1424]: wlan0: MESH-SAE-AUTH-FAILURE addr=68:ff:7b:0e:xx:xx
Aug 31 04:16:55 WifiAP-01 wpa_supplicant[1424]: wlan0: MESH-SAE-AUTH-FAILURE addr=68:ff:7b:0e:xx:xx

After all this troubleshooting, with no other concurrent issues left to rule out, I'm sure this is a bug in OpenWrt 21.02.0-rcX or its packages.

Also tried "/etc/init.d/wpad restart" and the same error appears.

Also tried "/etc/init.d/network restart" and the same error appears.

Aug 31 09:11:29 HST-WifiAP-01 netifd: Interface 'nwi_mesh0' is now up
Aug 31 09:11:30 WifiAP-01 wpa_supplicant[26069]: wlan0: MESH-SAE-AUTH-FAILURE addr=68:ff:7b:0e:xx:xx
Aug 31 09:11:41 WifiAP-01 wpa_supplicant[26069]: wlan0: MESH-SAE-AUTH-FAILURE addr=68:ff:7b:0e:xx:xx

Related: https://github.com/libremesh/lime-packages/issues/837

MANUAL WORKAROUND FOUND: "DISABLE, WAIT FOR 5+ MINUTES, RE-ENABLE MESH"

Aug 31 09:26:03 Disabled the mesh SSID interface via LuCI manually.

(Waited for 6 minutes, i.e. more than 5 minutes, then re-enabled the mesh SSID interface via LuCI.)

Aug 31 09:34:11 HST-WifiAP-01 hostapd: wlan0-1: AP-ENABLED
Aug 31 09:34:18 WifiAP-01 netifd: Interface 'nwi_mesh0' is now up
Aug 31 09:34:19 WifiAP-01 wpa_supplicant[7331]: wlan0: new peer notification for 68:ff:7b:0e:xx:xx
Aug 31 09:34:19 WifiAP-01 wpa_supplicant[7331]: wlan0: new peer notification for 68:ff:7b:0e:xx:xx
Aug 31 09:34:19 WifiAP-01 wpa_supplicant[7331]: wlan0: new peer notification for 68:ff:7b:0e:xx:xx
Aug 31 09:34:19 WifiAP-01 wpa_supplicant[7331]: wlan0: new peer notification for 68:ff:7b:0e:xx:xx
Aug 31 09:34:19 WifiAP-01 wpa_supplicant[7331]: wlan0: new peer notification for 68:ff:7b:0e:xx:xx
Aug 31 09:34:19 WifiAP-01 wpa_supplicant[7331]: wlan0: mesh plink with 68:ff:7b:0e:xx:xx established
Aug 31 09:34:19 WifiAP-01 wpa_supplicant[7331]: wlan0: MESH-PEER-CONNECTED 68:ff:7b:0e:xx:xx

The mesh link came up immediately and was usable.
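The same disable, wait, re-enable cycle could be scripted instead of clicking through LuCI. This is only a sketch under assumptions: the function name `cycle_mesh_iface` and the `RUN` override are made up, the UCI section name of the mesh wifi-iface has to be supplied by the caller, and it assumes that setting `disabled` on the wifi-iface followed by `wifi reload` has the same effect as the LuCI buttons.

```shell
#!/bin/sh
# Disable a mesh wifi-iface, wait out the SAE block window, then re-enable it.
# Set RUN=echo to print the commands instead of executing them.
# (cycle_mesh_iface and RUN are hypothetical names for this sketch.)
cycle_mesh_iface() {
    section="$1"          # UCI section name of the mesh wifi-iface (assumption)
    wait_s="${2:-360}"    # seconds to wait; the log reports duration=300
    run="${RUN:-}"

    $run uci set "wireless.$section.disabled=1"
    $run uci commit wireless
    $run wifi reload
    $run sleep "$wait_s"
    $run uci set "wireless.$section.disabled=0"
    $run uci commit wireless
    $run wifi reload
}
```

`RUN=echo cycle_mesh_iface wmesh` only prints the seven commands, which is a safe way to check them before running the function for real on a node. Note that `wifi reload` may briefly affect other interfaces on the same radio.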

@devs @hnyman: does this help you identify the issue?

Weekly reboot here too; the mesh has failed to come up since 03:30. I'm on OpenWrt 21.02.0 stable.

The issue occurred again after the weekly reboot of the "backhaul node A".

They are connected like this: LAN ==> node A ==> MESH ==> node B.

Node B had the BLOCKED SAE failure and wasn't able to re-establish the mesh with node A after node A did its planned reboot.

I left the situation as it was and didn't interfere with any node. After 30 hours, our monitoring system reported "node B reachable again", so it managed to re-establish the mesh by itself. It SHOULD have done this much more quickly.

node A uptime:

10:46:37 up 1 day, 6:46, load average: 0.42, 0.34, 0.24

node B uptime:

10:46:56 up 1 day, 21:50, load average: 0.15, 0.10, 0.09

Note: the planned reboot was 1 day 6:46 hours ago.

I opened a bug report in the OpenWrt bug tracker. If you have any useful information (e.g. device/hardware info where this is happening), please add it there:
https://bugs.openwrt.org/index.php?do=details&task_id=4098
