Mesh nodes losing connection sporadically ("plink closed with reason 55", MESH-SAE-AUTH-FAILURE, MESH-SAE-AUTH-BLOCKED)

I have a simple wpad-mesh-wolfssl-based mesh of 3 nodes (all of them are Wavlink WL-WN530HG4, MT7620A/MT76x2E SoC).

2 of them make a pretty stable connection, but the 3rd one is getting sporadically disconnected:

Tue Feb  1 10:36:58 2022 daemon.notice wpa_supplicant[1291]: wlan0: MESH-SAE-AUTH-FAILURE addr=80:3f:5d:f6:75:0a
Tue Feb  1 10:36:59 2022 daemon.notice wpa_supplicant[1291]: wlan0: new peer notification for 80:3f:5d:f6:75:0a
Tue Feb  1 10:36:59 2022 daemon.notice wpa_supplicant[1291]: wlan0: mesh plink with 80:3f:5d:f6:75:0a established
Tue Feb  1 10:36:59 2022 daemon.notice wpa_supplicant[1291]: wlan0: MESH-PEER-CONNECTED 80:3f:5d:f6:75:0a
Tue Feb  1 10:37:00 2022 daemon.warn dnsmasq-dhcp[1]: no address range available for DHCP request via br-lan
Tue Feb  1 10:37:00 2022 daemon.warn dnsmasq-dhcp[1]: no address range available for DHCP request via br-lan
Tue Feb  1 10:37:06 2022 daemon.warn dnsmasq-dhcp[1]: no address range available for DHCP request via br-lan
Tue Feb  1 10:38:09 2022 daemon.notice wpa_supplicant[1291]: wlan0: mesh plink with 80:3f:5d:f6:4b:06 closed with reason 55
Tue Feb  1 10:38:09 2022 daemon.notice wpa_supplicant[1291]: wlan0: MESH-PEER-DISCONNECTED 80:3f:5d:f6:4b:06

I've found that reason 55 means "The mesh STA has received a Mesh Peering Close message requesting to close the mesh peering".
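
To see which peer and which event type dominate, the wpa_supplicant lines shown above can be tallied per MAC address. This is only a rough sketch: the function name `count_mesh_events` is made up for illustration, and it assumes the peer address is the field right after the event name, as in the log excerpt above.

```shell
#!/bin/sh
# Tally wpa_supplicant mesh events per peer from syslog text on stdin.
# count_mesh_events is a hypothetical helper name, not part of OpenWrt.
count_mesh_events() {
    awk '
        {
            for (i = 1; i < NF; i++) {
                # Event names look like MESH-SAE-AUTH-FAILURE or MESH-PEER-CONNECTED.
                if ($i ~ /^MESH-(SAE-AUTH|PEER)-/) {
                    mac = $(i + 1)            # peer address follows the event name
                    sub(/^addr=/, "", mac)    # SAE events prefix it with "addr="
                    count[mac " " $i]++
                }
            }
        }
        END { for (k in count) print count[k], k }
    ' | sort -k2
}
```

On a node it could be fed from the system log, for example `logread | count_mesh_events`.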

All 3 devices have identical settings (restored from the same config backup), running OpenWrt v21.02.1. I've also tried wpad-mesh-openssl, as well as the latest snapshot build, with the same results.

Is there anything that could help with this disconnect issue? Am I possibly dealing with this known MESH-SAE-AUTH-FAILURE problem?

I'm new to the whole OpenWrt mesh thing. Thanks for any insights!

You can get this error for various reasons, but the most common is that the "bad" mesh node has a low signal-to-noise ratio.
Test by moving the bad one closer to a good one. Start very close then try progressively moving apart.
Sometimes just relocating within a room is enough.
If this works for you then there are additional non-config settings that help prevent the "now you see me, now you don't" scenarios from occurring.

Hi @bluewavenet, thanks for replying and for your suggestion! I've now tried keeping them all in the same room, and the outcome is the same: eventually only 2 out of 3 hold the connection.

Then, assuming there is no hardware problem, it will be a setup issue, i.e. the basic configuration plus the required extras.
Using the </> button on the top line of the text box, please show the output of:
uci show wireless

Thank you, here's the output from uci show wireless. It's identical on all units, and I only use the 5 GHz band.

wireless.radio0=wifi-device
wireless.radio0.type='mac80211'
wireless.radio0.path='pci0000:00/0000:00:00.0/0000:01:00.0'
wireless.radio0.band='5g'
wireless.radio0.htmode='VHT80'
wireless.radio0.cell_density='0'
wireless.radio0.country='AU'
wireless.radio0.channel='149'
wireless.radio0.txpower='23'
wireless.default_radio0=wifi-iface
wireless.default_radio0.device='radio0'
wireless.default_radio0.network='lan'
wireless.default_radio0.mode='mesh'
wireless.default_radio0.mesh_id='mymeshid'
wireless.default_radio0.mesh_fwding='1'
wireless.default_radio0.encryption='sae'
wireless.default_radio0.key='mymeshpassword'
wireless.default_radio0.disassoc_low_ack='0'
wireless.default_radio0.mesh_rssi_threshold='0'
wireless.radio1=wifi-device
wireless.radio1.type='mac80211'
wireless.radio1.path='platform/10180000.wmac'
wireless.radio1.channel='1'
wireless.radio1.band='2g'
wireless.radio1.htmode='HT20'
wireless.radio1.disabled='1'
wireless.default_radio1=wifi-iface
wireless.default_radio1.device='radio1'
wireless.default_radio1.network='lan'
wireless.default_radio1.mode='ap'
wireless.default_radio1.ssid='OpenWrt'
wireless.default_radio1.encryption='none'

Try the following (doing it on all 3 mesh nodes):

uci delete wireless.default_radio0.disassoc_low_ack
uci set wireless.default_radio0.mesh_rssi_threshold='-80'
uci commit wireless

Now power cycle all 3 nodes.

Next you need to set some parameters that cannot be set via the uci config, as the interface needs to be up first.
On each node do:

iw dev wlan0 set mesh_param mesh_fwding 1
iw dev wlan0 set mesh_param mesh_gate_announcements 1
iw dev wlan0 set mesh_param mesh_rssi_threshold -80
iw dev wlan0 set mesh_param mesh_hwmp_rootmode 3

Now wait a minute or so then test again without restarting the nodes.

Obviously these "iw" commands will not survive a reboot or a wireless/network restart, so you will have to run a script to do this for you. See if it works first :wink:
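
A minimal sketch of such a script, assuming the mesh interface is called wlan0 as in the commands above. The `IW` variable, the `apply_mesh_params` function name and the `apply` argument are conventions invented for this sketch, not OpenWrt features. It only waits for the interface to exist, so an extra delay may still be needed before the mesh has actually joined.

```shell
#!/bin/sh
# Re-apply the run-time mesh parameters after a reboot or wifi restart.
# IW can be overridden for testing; it defaults to the real iw binary.
IW="${IW:-iw}"

apply_mesh_params() {
    dev="${1:-wlan0}"
    tries=0
    # Wait up to about 30 seconds for the interface to appear.
    until "$IW" dev "$dev" info >/dev/null 2>&1; do
        tries=$((tries + 1))
        [ "$tries" -ge 30 ] && return 1
        sleep 1
    done
    # Same parameters and values as the manual commands above.
    "$IW" dev "$dev" set mesh_param mesh_fwding 1 &&
    "$IW" dev "$dev" set mesh_param mesh_gate_announcements 1 &&
    "$IW" dev "$dev" set mesh_param mesh_rssi_threshold -80 &&
    "$IW" dev "$dev" set mesh_param mesh_hwmp_rootmode 3
}

# Only act when called as: <script> apply [interface]
if [ "${1:-}" = "apply" ]; then
    apply_mesh_params "${2:-wlan0}"
fi
```

Saved under a name of your choice (say /root/mesh-params.sh, a hypothetical path), it could be run as `/root/mesh-params.sh apply wlan0` after every wifi restart. The result can be checked with `iw dev wlan0 get mesh_param mesh_rssi_threshold`.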

Thanks much @bluewavenet, this seems to be working! So far, the flight has been normal, one hour in :slight_smile: Greatly appreciated!

Is there a decent tutorial about those mesh_* params, that would go beyond the OpenWrt docs I had followed initially?

What I've found so far:

Not that I am aware of. Choices are to accept it as Faerie Magic or read source code.... :wink:

Hi @bluewavenet, sadly it was a temporary success. The mesh didn't hold overnight, and in the morning only a reboot of all nodes helped (after I tried "wifi down; wifi up" and [edited] "service network restart" with no luck).

All nodes had a bunch of MESH-SAE-AUTH-FAILURE errors in syslog, while the signal level was approx. -60 dBm on each node.

Are there any other troubleshooting steps I could try? At this point I tend to believe it's a problem with the current MediaTek MT7620A/MT7612E driver in OpenWrt v21.02. Maybe I should try the older v19.07 release and see if there's any difference.

I also have a pair of TP-Link Archer A6 v3 (MT7620A/MT7612E based) and I'm going to give them a try as well.


Edit: there is no v19.07 for either the WAVLINK WL-WN530HG4 or the TP-Link Archer A6 v3. These devices are pretty young in terms of OpenWrt support.

Maybe I should just shelve them until better times and get something more mainstream, like Google Mesh :thinking:

This will clear all the iw command configurations done earlier so it will no longer work anyway.

I'm not sure what that does. It should be service network restart, and that would also clear the iw settings.

The most common cause of this on an otherwise working mesh is a very low signal-to-noise level occurring for a period. The nodes will blacklist the others that fail. The blacklist blocking drops off again after some long interval. I forget the default value; hours, I think.

However, maybe try changing to wpad-mesh-openssl.

Run dmesg and see if any errors are reported there.

Thanks again @bluewavenet for following up with me on this one :slight_smile:

Right, I already have a script in-place to re-issue those commands. I was running it manually after restarting wifi.

Sorry, I made a typo in the post, but I was indeed doing service network restart.

I'm observing a pretty good signal all the time I'm awake :slight_smile:

I did; there were no errors or warnings in the kernel log besides this one, which seems to be a long-known issue:

[   25.415606] WARNING: CPU: 3 PID: 197 at net/core/flow_dissector.c:960 0x8042bd24

I tried that already with the default settings (mentioned in the thread start post), but it might be a good idea to try it with the custom settings you suggested.

Maybe I should try running this mesh without encryption at all, as a troubleshooting step, and see if it holds up.

Yes, you read my mind!

How did you do this on all the nodes? The mesh was not working.....
On the "gateway" node, i.e. the one connected to your router, keep monitoring the output of:
iw dev wlan0 mpath dump
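
One way to keep an eye on this over time is a small loop that prints a timestamped snapshot every few seconds. This is only a sketch: `mesh_snapshot`, the `IW` variable and the `watch` argument are names made up here, and the grep pattern assumes the field labels (`Station`, `signal:`, `mesh plink:`) that iw's station dump normally prints, which may differ between iw versions.

```shell
#!/bin/sh
# Print a timestamped view of mesh paths and peer link state.
# IW can be overridden for testing; it defaults to the real iw binary.
IW="${IW:-iw}"

mesh_snapshot() {
    dev="${1:-wlan0}"
    echo "=== $(date) ==="
    # Current mesh path table (destination, next hop, metric, ...).
    "$IW" dev "$dev" mpath dump
    # Per-peer summary: address, signal level and peer link state.
    "$IW" dev "$dev" station dump | grep -E 'Station|signal:|mesh plink:'
}

# Only loop when called as: <script> watch [interface]
if [ "${1:-}" = "watch" ]; then
    while :; do
        mesh_snapshot "${2:-wlan0}"
        sleep 10
    done
fi
```

Redirecting the output to a file on the gateway node would show roughly when a peer drops out of the path table.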

Maybe, but you are not getting any errors/crashes, just normal reports of sometimes not being able to communicate.

And it works. An unencrypted 802.11s mesh on all three nodes with all the default settings has remained stable overnight. I didn't have to tweak any settings.
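
For anyone wanting to repeat this test, the change could be sketched roughly as below, assuming the mesh interface section is named default_radio0 as in the uci output earlier in the thread. The `mesh_disable_encryption` function and the `UCI` variable are hypothetical names for this sketch. It is meant for troubleshooting only, since mesh traffic is then unencrypted.

```shell
#!/bin/sh
# Switch a mesh wifi-iface section to an open (unencrypted) mesh.
# UCI can be overridden for testing; it defaults to the real uci binary.
UCI="${UCI:-uci}"

mesh_disable_encryption() {
    iface="${1:-default_radio0}"
    "$UCI" set "wireless.$iface.encryption=none" || return 1
    # The SAE key is no longer needed; ignore the error if it is already gone.
    "$UCI" -q delete "wireless.$iface.key" || true
    "$UCI" commit wireless
}

# Only act when called as: <script> apply [section]
if [ "${1:-}" = "apply" ]; then
    mesh_disable_encryption "${2:-default_radio0}"
fi
```

After running it on every node, `wifi` (or a reboot) reloads the wireless configuration.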

Not sure if it's MediaTek (mt76) driver related, it might be this one that also reportedly happens with Atheros drivers:

I had the same issue and was searching for a solution. I ended up doing 2 things.

  1. I adjusted the RSSI threshold for the mesh APs only (Wireless, edit mesh wifi, Interface Configuration, Advanced Settings). This keeps the two mesh APs from connecting to and disconnecting from each other, as the hub sits right between them. I looked at the current connection strength and then added a few dB just in case.

  2. I used the following code, dropped it in the root folder and set a cron job to run it every few minutes. I originally wrote this to automate rebooting/resetting the router, but it works great on the mesh point. With this running, it's been almost a week since I've seen the troublesome mesh AP stay disconnected (well, it has disconnected, but it came back up thanks to the task).
    I also messed with the

Just need to:

  1. define the gateway for your network in the code
  2. make the file executable (chmod a+x)
  3. setup the cronjob to run the task (*/5 * * * * /root/updateDynhost)
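#!/bin/sh

# Gateway address to ping; set this to your own gateway.
GW=192.168.1.1

if ping -c 2 $GW > /dev/null 2>&1
then
  # Gateway reachable: nothing to do.
  :
else
  # Gateway unreachable: restart networking and log the time.
  /etc/init.d/network restart
  sleep 20
  date >> /root/restart.log
  if ping -c 2 $GW > /dev/null 2>&1
  then
    :
  else
    # Still unreachable: log the time and reboot.
    date >> /root/reboot.log
    reboot
  fi
fi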

Thanks for sharing! I personally gave up on the mesh and switched to WDS, which works for my purposes and has been rock solid (at least on this MT7xx platform).