I have a simple wpad-mesh-wolfssl-based mesh of 3 nodes (all of them are Wavlink WL-WN530HG4, MT7620A/MT76x2E SoC).
2 of them make a pretty stable connection, but the 3rd one is getting sporadically disconnected:
Tue Feb 1 10:36:58 2022 daemon.notice wpa_supplicant[1291]: wlan0: MESH-SAE-AUTH-FAILURE addr=80:3f:5d:f6:75:0a
Tue Feb 1 10:36:59 2022 daemon.notice wpa_supplicant[1291]: wlan0: new peer notification for 80:3f:5d:f6:75:0a
Tue Feb 1 10:36:59 2022 daemon.notice wpa_supplicant[1291]: wlan0: mesh plink with 80:3f:5d:f6:75:0a established
Tue Feb 1 10:36:59 2022 daemon.notice wpa_supplicant[1291]: wlan0: MESH-PEER-CONNECTED 80:3f:5d:f6:75:0a
Tue Feb 1 10:37:00 2022 daemon.warn dnsmasq-dhcp[1]: no address range available for DHCP request via br-lan
Tue Feb 1 10:37:00 2022 daemon.warn dnsmasq-dhcp[1]: no address range available for DHCP request via br-lan
Tue Feb 1 10:37:06 2022 daemon.warn dnsmasq-dhcp[1]: no address range available for DHCP request via br-lan
Tue Feb 1 10:38:09 2022 daemon.notice wpa_supplicant[1291]: wlan0: mesh plink with 80:3f:5d:f6:4b:06 closed with reason 55
Tue Feb 1 10:38:09 2022 daemon.notice wpa_supplicant[1291]: wlan0: MESH-PEER-DISCONNECTED 80:3f:5d:f6:4b:06
I've found that reason 55 means "The mesh STA has received a Mesh Peering Close message requesting to close the mesh peering".
All 3 devices have identical settings (restored from the same config backup), running OpenWrt v21.02.1. I've also tried wpad-mesh-openssl, as well as the latest snapshot build, with the same results.
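To see at a glance which peer keeps failing SAE authentication, the syslog lines can be tallied per MAC address. This is a minimal sketch that assumes the log format pasted above; the function name `count_sae_failures` is made up for illustration.

```shell
# Read syslog text on stdin and print "<count> <peer MAC>" for every
# peer that logged a MESH-SAE-AUTH-FAILURE event.
# Assumes the wpa_supplicant line format shown in the log above.
count_sae_failures() {
    sed -n 's/.*MESH-SAE-AUTH-FAILURE addr=\([0-9a-fA-F:]*\).*/\1/p' |
        sort | uniq -c
}
```

On a node this would be fed from the system log, for example `logread | count_sae_failures`.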
You can get this error for various reasons but the most common is the "bad" mesh node has a low signal to noise ratio.
Test by moving the bad one closer to a good one. Start very close then try progressively moving apart.
Sometimes just relocating within a room is enough.
If this works for you then there are additional non-config settings that help prevent the "now you see me, now you don't" scenarios from occurring.
Hi @bluewavenet, thanks for replying and for your suggestion! I've now tried keeping them all in the same room, and the outcome is the same: eventually only 2 out of 3 hold the connection.
Then, assuming there is no hardware problem, it will be a setup issue, i.e. basic configuration plus required extras.
Using the </> button on the top line of the text box, please show the output of: uci show wireless
uci delete wireless.default_radio0.disassoc_low_ack='0'
uci set wireless.default_radio0.mesh_rssi_threshold='-80'
uci commit wireless
Now power cycle all 3 nodes.
Next you need to set some parameters that cannot be set via the uci config, as the interface needs to be up first.
On each node do:
iw dev wlan0 set mesh_param mesh_fwding 1
iw dev wlan0 set mesh_param mesh_gate_announcements 1
iw dev wlan0 set mesh_param mesh_rssi_threshold -80
iw dev wlan0 set mesh_param mesh_hwmp_rootmode 3
Now wait a minute or so then test again without restarting the nodes.
Obviously these "iw" commands will not survive a reboot or a wireless/network restart, so you will have to run a script to do this for you. See if it works first
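A script for this could look like the sketch below. It only wraps the four `iw` commands listed above; the function name `apply_mesh_params` and the `IW` override (handy for a dry run with `IW=echo`) are assumptions of this example, not something provided by OpenWrt.

```shell
# Re-apply the runtime mesh parameters from this thread on one interface.
# Stops at the first command that fails.
# IW may be set to another command (e.g. "echo") for a dry run.
apply_mesh_params() {
    ifname=$1
    iw_cmd=${IW:-iw}
    "$iw_cmd" dev "$ifname" set mesh_param mesh_fwding 1 &&
    "$iw_cmd" dev "$ifname" set mesh_param mesh_gate_announcements 1 &&
    "$iw_cmd" dev "$ifname" set mesh_param mesh_rssi_threshold -80 &&
    "$iw_cmd" dev "$ifname" set mesh_param mesh_hwmp_rootmode 3
}
```

After the mesh interface is up, call it as `apply_mesh_params wlan0`. It has to be run again after every reboot or wireless/network restart.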
Hi @bluewavenet, sadly it was a temporary success. The mesh didn't hold overnight, and in the morning only a reboot helped on all nodes (after I tried "wifi down; wifi up" and [edited] "service network restart" with no luck).
All nodes had a bunch of MESH-SAE-AUTH-FAILURE errors in syslog, while the signal level was approx -60 dBm on each node.
Are there any other troubleshooting steps I could try? At this point I tend to believe it's a problem with the current MediaTek MT7620A/MT7612E driver in OpenWrt v21.02. Maybe I should try the older v19.07 release and see if there's any difference.
I also have a pair of TP-Link Archer A6 v3 (MT7620A/MT7612E based) and I'm going to give them a try as well.
This will clear all the iw command configurations done earlier so it will no longer work anyway.
I'm not sure what that does. It should be service network restart, and that would also clear the iw settings.
The most common cause of this on an otherwise working mesh is a very low signal-to-noise level occurring for a period. The nodes will blacklist the others that fail. The blacklist blocking drops off again at some long interval; I forget the default value, hours I think.
However, maybe try changing to wpad-mesh-openssl.
Run dmesg and see if any errors are reported there.
Thanks again @bluewavenet for following up with me on this one
Right, I already have a script in-place to re-issue those commands. I was running it manually after restarting wifi.
Sorry, I made a typo in the post, but I was indeed doing service network restart.
I'm observing a pretty good signal all the time I'm awake
I did; there were no errors or warnings in the kernel log besides this one, which seems to be a long-known issue:
[ 25.415606] WARNING: CPU: 3 PID: 197 at net/core/flow_dissector.c:960 0x8042bd24
I tried that already with the default settings (mentioned in the thread start post), but it might be a good idea to try it with the custom settings you suggested.
Maybe I should try running this mesh without encryption at all, as a troubleshooting step, and see if it holds up.
How did you do this on all the nodes? The mesh was not working.....
On the "gateway" node, i.e. the one connected to your router, keep monitoring the output of: iw dev wlan0 mpath dump
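One way to keep an eye on the path table is to sample it at a fixed interval with a timestamp, so that a vanishing path can be matched against syslog later. A rough sketch; the function name `log_mpaths`, its arguments, and the `IW` override are assumptions of this example.

```shell
# Print the mesh path table a number of times with a timestamp each time.
#   $1 = mesh interface, $2 = number of samples, $3 = seconds between samples
# IW may be set to another command (e.g. "echo") for a dry run.
log_mpaths() {
    ifname=$1
    rounds=$2
    interval=$3
    i=0
    while [ "$i" -lt "$rounds" ]; do
        echo "== $(date)"
        "${IW:-iw}" dev "$ifname" mpath dump
        i=$((i + 1))
        # Sleep only between samples, not after the last one.
        if [ "$i" -lt "$rounds" ]; then
            sleep "$interval"
        fi
    done
}
```

For example, `log_mpaths wlan0 360 10 >> /tmp/mpath.log` would take a sample every 10 seconds for about an hour.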
Maybe, but you are not getting any errors/crashes, just normal reports of sometimes not being able to communicate.
And it works. An unencrypted 802.11s mesh on all three nodes with all the default settings has remained stable overnight. I didn't have to tweak any settings.
Not sure if it's MediaTek (mt76) driver related, it might be this one that also reportedly happens with Atheros drivers:
I had the same issue and was searching for a solution. I ended up doing 2 things.
I messed with the RSSI threshold for the mesh APs only (Wireless, edit mesh wifi, Interface Configuration, Advanced Settings). This keeps the two mesh APs from connecting to and disconnecting from each other, as the hub sits right between the two mesh APs. I looked at the current connection strength and then added a few dB just in case.
I used the following code, dropped it in the root folder and set a cron job to run it every few minutes. I originally wrote this to automate rebooting/resetting a router, but it works great on the mesh point. With this running it's been almost a week since I've seen the troublesome mesh AP disconnect (well, it has disconnected but came back up due to the task).
I also messed with the
Just need to:
define the gateway for your network in the code
make the file executable (chmod a+x)
set up the cron job to run the task (*/5 * * * * /root/updateDynhost)
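#!/bin/sh
# Gateway address to ping; change this for your network.
GW=192.168.1.1

# Nothing to do while the gateway answers.
if ping -c 2 "$GW" > /dev/null 2>&1; then
    exit 0
fi

# Gateway unreachable: restart the network and log the time.
/etc/init.d/network restart
sleep 20
date >> /root/restart.log

# Still unreachable after the restart: log the time and reboot.
if ! ping -c 2 "$GW" > /dev/null 2>&1; then
    date >> /root/reboot.log
    reboot
fi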
Thanks for sharing! I personally gave up on the mesh and switched to WDS, that works for my purposes and has been rock solid (at least on this MT7xx platform).
I have a similar problem that came with a recent upgrade to 22.03.6. It was working perfectly until yesterday. I have an Engenius EAP1300 that had a mesh connection to two ASUS AC58Us. And I have another pair of ASUS AC58Us on a separate mesh on a different channel. The mesh nodes would just bounce continuously and wouldn't stay up more than a few seconds. I would see multiple mesh11sd processes running at the same time, sometimes even 3.
It seems that mesh11sd and some luci package around it were upgraded at this time.
I upgraded to 23.05.3, no love. Exactly the same problem. I did an installation from scratch and that seemed to help some, but I would still see this happen.
Going through this conversation and playing with things, I was able to get it working, I think. I did the following on every node:
uci set mesh11sd.mesh_params.mesh_gate_announcements='1'
uci set mesh11sd.mesh_params.mesh_hwmp_rootmode='3'
uci set mesh11sd.mesh_params.mesh_rssi_threshold='-90'
uci commit mesh11sd
uci set wireless.wifinet2.mesh_gate_announcements='1'
uci set wireless.wifinet2.mesh_hwmp_rootmode='3'
uci set wireless.wifinet2.mesh_rssi_threshold='-90'
uci commit wireless
reboot
Using "iw dev <ifname> mesh_param dump" I could see that the parameters were definitely being set wrong. I'm not sure which of the above fixed the problem, but it looks like the settings in the mesh11sd config file no longer work; you have to set them in the wireless config.
This was fairly painful for me, not sure if this is the right solution or if the problem will come back. But it's stable right now.
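To check which values are actually in effect after a reboot, the dump output can be filtered for a single parameter. This is a small sketch, assuming the dump prints one `name = value` pair per line; the helper name `mesh_param_value` is made up here, and the interface name will differ between setups.

```shell
# Read "name = value" lines (the assumed format of a mesh_param dump)
# on stdin and print the value of the parameter named in $1.
mesh_param_value() {
    awk -v name="$1" -F ' = ' '$1 == name { print $2 }'
}
```

Usage would be along the lines of `iw dev wlan0 mesh_param dump | mesh_param_value mesh_hwmp_rootmode`. If the installed iw has no dump subcommand, `iw dev wlan0 get mesh_param mesh_hwmp_rootmode` queries a single value directly.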