OpenWRT BATMAN MESH-SAE-AUTH-BLOCKED

I have created a BATMAN mesh with OpenWRT 24.10.2 using 2 x Linksys EA8300 + 6 x ASUS Lyra MAP AC2200.

One of the Linksys EA8300 is my router & DHCP. All the other devices are Access Points.

The wireless mesh 'backhaul' uses WPA3-SAE encryption.

So far, performance/stability/reliability has been deplorable with Access Points dropping off the mesh fairly randomly & various pings taking 10+ seconds to reply. I think I have sorted some of the problems, but probably not all. At least I'm learning a few things as I go along!

Currently, in the OpenWRT System Log for one of the Lyra Access Points, I keep seeing this:

Wed Jul 23 14:27:55 2025 daemon.notice hostapd: phy1-ap0: AP-STA-DISCONNECTED 00:c0:00:05:35:d7
Wed Jul 23 14:27:56 2025 daemon.info hostapd: phy1-ap0: STA 00:c0:00:05:35:d7 IEEE 802.11: disassociated
Wed Jul 23 14:27:56 2025 daemon.info hostapd: phy1-ap0: STA 00:c0:00:05:35:d7 IEEE 802.11: authenticated
Wed Jul 23 14:27:56 2025 daemon.info hostapd: phy1-ap0: STA 00:c0:00:05:35:d7 IEEE 802.11: associated (aid 1)
Wed Jul 23 14:27:56 2025 daemon.notice hostapd: phy1-ap0: AP-STA-CONNECTED 00:c0:00:05:35:d7 auth_alg=open
Wed Jul 23 14:27:56 2025 daemon.info hostapd: phy1-ap0: STA 00:c0:00:05:35:d7 RADIUS: starting accounting session 344621C070507A17
Wed Jul 23 14:27:56 2025 daemon.info hostapd: phy1-ap0: STA 00:c0:00:05:35:d7 WPA: pairwise key handshake completed (RSN)
Wed Jul 23 14:27:56 2025 daemon.notice hostapd: phy1-ap0: EAPOL-4WAY-HS-COMPLETED 00:c0:00:05:35:d7
Wed Jul 23 14:31:55 2025 daemon.notice wpa_supplicant[829]: phy0-mesh0: new peer notification for 10:7b:44:ce:05:d4
Wed Jul 23 14:32:11 2025 daemon.notice wpa_supplicant[829]: phy0-mesh0: MESH-SAE-AUTH-FAILURE addr=10:7b:44:ce:05:d4
Wed Jul 23 14:32:21 2025 daemon.notice wpa_supplicant[829]: phy0-mesh0: MESH-SAE-AUTH-FAILURE addr=10:7b:44:ce:05:d4
Wed Jul 23 14:32:36 2025 daemon.notice wpa_supplicant[829]: phy0-mesh0: MESH-SAE-AUTH-FAILURE addr=10:7b:44:ce:05:d4
Wed Jul 23 14:32:54 2025 daemon.notice wpa_supplicant[829]: phy0-mesh0: MESH-SAE-AUTH-FAILURE addr=10:7b:44:ce:05:d4
Wed Jul 23 14:32:54 2025 daemon.notice wpa_supplicant[829]: phy0-mesh0: MESH-SAE-AUTH-BLOCKED addr=10:7b:44:ce:05:d4 duration=300
Wed Jul 23 14:32:57 2025 daemon.notice hostapd: phy1-ap0: AP-STA-DISCONNECTED 00:c0:00:05:35:d7
Wed Jul 23 14:32:57 2025 daemon.info hostapd: phy1-ap0: STA 00:c0:00:05:35:d7 IEEE 802.11: disassociated
Wed Jul 23 14:32:57 2025 daemon.info hostapd: phy1-ap0: STA 00:c0:00:05:35:d7 IEEE 802.11: authenticated
Wed Jul 23 14:32:57 2025 daemon.info hostapd: phy1-ap0: STA 00:c0:00:05:35:d7 IEEE 802.11: associated (aid 1)
Wed Jul 23 14:32:58 2025 daemon.notice hostapd: phy1-ap0: AP-STA-CONNECTED 00:c0:00:05:35:d7 auth_alg=open
Wed Jul 23 14:32:58 2025 daemon.info hostapd: phy1-ap0: STA 00:c0:00:05:35:d7 RADIUS: starting accounting session 5503A052852D1AA0
Wed Jul 23 14:32:58 2025 daemon.info hostapd: phy1-ap0: STA 00:c0:00:05:35:d7 WPA: pairwise key handshake completed (RSN)
Wed Jul 23 14:32:58 2025 daemon.notice hostapd: phy1-ap0: EAPOL-4WAY-HS-COMPLETED 00:c0:00:05:35:d7
Wed Jul 23 14:36:57 2025 daemon.notice wpa_supplicant[829]: phy0-mesh0: new peer notification for 10:7b:44:ce:05:d4
Wed Jul 23 14:37:15 2025 daemon.notice wpa_supplicant[829]: phy0-mesh0: MESH-SAE-AUTH-FAILURE addr=10:7b:44:ce:05:d4
Wed Jul 23 14:37:34 2025 daemon.notice wpa_supplicant[829]: phy0-mesh0: MESH-SAE-AUTH-FAILURE addr=10:7b:44:ce:05:d4
Wed Jul 23 14:37:52 2025 daemon.notice wpa_supplicant[829]: phy0-mesh0: MESH-SAE-AUTH-FAILURE addr=10:7b:44:ce:05:d4
Wed Jul 23 14:37:59 2025 daemon.notice hostapd: phy1-ap0: AP-STA-DISCONNECTED 00:c0:00:05:35:d7
Wed Jul 23 14:37:59 2025 daemon.info hostapd: phy1-ap0: STA 00:c0:00:05:35:d7 IEEE 802.11: disassociated
Wed Jul 23 14:38:00 2025 daemon.info hostapd: phy1-ap0: STA 00:c0:00:05:35:d7 IEEE 802.11: authenticated
Wed Jul 23 14:38:00 2025 daemon.info hostapd: phy1-ap0: STA 00:c0:00:05:35:d7 IEEE 802.11: associated (aid 1)
Wed Jul 23 14:38:00 2025 daemon.notice hostapd: phy1-ap0: AP-STA-CONNECTED 00:c0:00:05:35:d7 auth_alg=open
Wed Jul 23 14:38:00 2025 daemon.info hostapd: phy1-ap0: STA 00:c0:00:05:35:d7 RADIUS: starting accounting session C43ECD6B828F8294
Wed Jul 23 14:38:00 2025 daemon.info hostapd: phy1-ap0: STA 00:c0:00:05:35:d7 WPA: pairwise key handshake completed (RSN)
Wed Jul 23 14:38:00 2025 daemon.notice hostapd: phy1-ap0: EAPOL-4WAY-HS-COMPLETED 00:c0:00:05:35:d7

with this continual MESH-SAE-AUTH-FAILURE/MESH-SAE-AUTH-BLOCKED.
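To see at a glance which peers are failing and how often, the log can be condensed per peer address. The following is a minimal sketch, not something from the original post: the function name `sae_summary` is made up for illustration, and it assumes log lines in the same format as the excerpt above.

```shell
# Summarise wpa_supplicant mesh SAE events per peer MAC address.
# Reads syslog-style lines on stdin and prints one line per peer,
# in the order the peers first appear.
sae_summary() {
    awk '
        /MESH-SAE-AUTH-(FAILURE|BLOCKED)/ {
            event = ""; addr = ""
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^MESH-SAE-AUTH-/) event = $i
                if ($i ~ /^addr=/)          addr = substr($i, 6)
            }
            if (addr == "") next
            if (!(addr in seen)) { seen[addr] = 1; order[++n] = addr }
            if (event == "MESH-SAE-AUTH-FAILURE") fail[addr]++
            if (event == "MESH-SAE-AUTH-BLOCKED") block[addr]++
        }
        END {
            for (i = 1; i <= n; i++)
                printf "%s failures=%d blocked=%d\n", order[i], fail[order[i]], block[order[i]]
        }
    '
}
```

On a node this could be run as `logread | sae_summary`. Running it on every node should show whether the failures involve one particular peer or are spread across the mesh.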

Google doesn't seem to have many clues, but there are similar issues reported, such as this one:

which seems to hint at some kind of issue using wpad-mesh-wolfssl which is what I included in the firmware build for all my OpenWRT devices.

It suggests that wpad-mesh-openssl doesn't exhibit the same issue.

I don't really want to go back to 'square one' and rebuild all my devices using the wpad-mesh-openssl package instead, but in some of the Google search results, there is a suggestion of using "nohwcrypt=1" so that the firmware drivers don't do hardware encryption but instead it's done in software (at a performance penalty).

So, Google told me to edit /etc/modules.conf and insert the line ath10k nohwcrypt=1 which I did and rebooted. But this doesn't appear to have resolved the issue.

Questions:

  1. Where can I put nohwcrypt=1 where it will have an effect?
  2. Can I just switch my mesh backhaul encryption to WPA2 and expect the problem to just go away? (or even no encryption? do I need encryption on the mesh backhaul for a home LAN?)
  3. Can I just use opkg to uninstall wpad-mesh-wolfssl & instead install wpad-mesh-openssl on all my devices and hey-presto! problem solved?

I've just tried editing the /etc/modules.d/ath10k file so it now says:

ath10k_core frame_mode=2 nohwcrypt=1
ath10k_pci

I'll reboot and see what effect that has.
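After the reboot, one way to check whether the driver actually accepted the option is to list the parameters the loaded module exposes under sysfs. This is only a sketch: the helper name `module_params` is hypothetical, and whether the non-ct `ath10k_core` module recognises a `nohwcrypt` parameter at all is not confirmed anywhere in this thread, which is exactly what the listing would reveal.

```shell
# Print "name=value" for every parameter a loaded kernel module exposes.
# $1: module name
# $2: sysfs module root (defaults to /sys/module; overridable for testing)
module_params() {
    dir="${2:-/sys/module}/$1/parameters"
    [ -d "$dir" ] || { echo "module $1 not loaded or has no parameters" >&2; return 1; }
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        printf '%s=%s\n' "$(basename "$f")" "$(cat "$f")"
    done
}
```

On the access point this would be `module_params ath10k_core`. If `nohwcrypt` is missing from the output, the option was probably ignored, and `dmesg | grep -i ath10k` may show a warning about an unknown parameter.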

I'm assuming for testing, I only need to do this on the one Access Point that seems to exhibit the problem most of the time?

Because it works sometimes, it is safe to say that this is caused by a poor signal-to-noise ratio between the meshnodes in question, causing the 802.11s link to time out and drop. The remote node tries to re-establish the link, but the poor signal-to-noise ratio results in it failing. If it does this repeatedly, it will be blocked for 300 seconds.

When the link is down, any clients connected to the access point of the meshnode will fail their "canary test" (AKA CPD test). This is a test to determine if an Internet upstream is available and if it fails the client device will try another SSID it knows about, disconnecting and giving the AP-STA-DISCONNECTED message. If it cannot connect to another SSID, it will eventually come round to this one and connect up again, re-starting the cycle.

Because wifi uses microwave frequencies, the signal to noise ratio can vary considerably due to multipath reflections increasing the noise level as well as insufficient signal, both having a knock on variable effect on the actual S/N ratio.
Sometimes the mesh backhaul will reconnect, sometimes it will disconnect, sometimes it will end up blocked.

The answer to this problem is to increase the signal to noise ratio. The quickest way to try things out is to move the meshnode that drops out closer.

If that works, then you have limited choices:

  1. Increase the rssi_threshold (i.e. make it a less negative number). The default setting of "0" means "just try to connect even if the received signal is very low". The result is often exactly what you are seeing. The very poor performance you report while the meshnode is actually connected supports this scenario.
  2. Decrease the transmitted power. This seems counterintuitive, but it can reduce the severity of multipath reflections, i.e. reflected signals will be attenuated in proportion to the direct path, resulting in a higher S/N ratio.
  3. Add an intermediate meshnode.
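The first two options could be applied through uci roughly as sketched below. The function only prints the commands so they can be reviewed before anything is changed. The section names (`mesh0`, `radio0`) and the values used in the usage example are placeholders, not recommendations; `mesh_rssi_threshold` and `txpower` are, as far as I know, the option names OpenWrt uses on the wifi-iface and wifi-device sections respectively.

```shell
# Print (do not run) the uci commands for a mesh RSSI threshold and tx power.
# $1: wifi-iface section name of the mesh interface (hypothetical: mesh0)
# $2: RSSI threshold in dBm
# $3: wifi-device section name of the radio (hypothetical: radio0)
# $4: transmit power in dBm
mesh_tuning_cmds() {
    printf "uci set wireless.%s.mesh_rssi_threshold='%s'\n" "$1" "$2"
    printf "uci set wireless.%s.txpower='%s'\n" "$3" "$4"
    printf 'uci commit wireless\n'
    printf 'wifi reload\n'
}
```

For example, `mesh_tuning_cmds mesh0 -75 radio0 17` prints four commands; once they look right for the node, the output can be piped to `sh`. `uci show wireless` lists the real section names.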

Why do you have wpad-mesh-wolfssl in the first place? Is that the video you watched giving you very outdated information?

You should be using wpad-mesh-mbedtls or even wpad-mbedtls (full version). These are the defaults and should only be changed if you have a very good reason to and an outdated video is not a good reason.
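To confirm which variant a node is actually running before changing anything, the installed package list can be filtered. A small sketch follows; the helper name `installed_wpad` is made up, and it assumes the usual `name - version` line format of `opkg list-installed`.

```shell
# Print the names of installed wpad / hostapd / wpa-supplicant packages.
# Reads `opkg list-installed` output ("name - version" per line) on stdin.
installed_wpad() {
    awk '$1 ~ /^(wpad|hostapd|wpa-supplicant)/ { print $1 }'
}
```

On a node: `opkg list-installed | installed_wpad`. Switching variants would then be along the lines of `opkg update`, `opkg remove wpad-mesh-wolfssl` and `opkg install wpad-mesh-mbedtls`, assuming the node still has a working path to the package repository while the mesh is down.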

Yes, your meshnodes are ath10k. From our previous discussions on another thread, you need to make sure you are NOT using the -ct versions.

By now you are probably realising that the numerous "gotchas" you found with Mesh11sd apply equally to Batman, because the underlying 802.11s transport is the same in both.

If I can take the points you raised in reverse order as that's probably simplest for me:

  1. I'm definitely using the ath10k non-ct versions as I built the OpenWRT firmware with the non-ct packages

  2. I installed the wpad-mesh-wolfssl package because the OpenWRT Guide for BATMAN says you must install wpad-mesh-openssl or wpad-mesh-wolfssl

(as well as every other guide I read)

  3. Whilst it's not impossible that weak signal or noise is responsible for my problem, it does seem very unlikely:

a) the Lyra devices are really not very far apart (perhaps about 10-20ft) but they are in different rooms
b) there are a total of six Lyra devices & two EA8300 devices all dotted around rooms in a conventional two storey house. Even my Bluetooth earbuds & battery powered Zigbee sensors can all still work between the same rooms
c) my old 3 node TP-Link Deco S4 Mesh could mostly cover the whole house & my replacement TP-Link Deco X50 mesh has no problem with Deco units dropping off the mesh.
d) I thought the whole point of mesh was that if a node loses comms with its 'associated station', it would then look to its neighbours to try to rejoin the mesh via them? With my home setup, there's probably at least one, possibly two, maybe even three, other nodes within a reachable comms distance of the disconnected node - why isn't it switching to them to rejoin the mesh? (and okay, it'll no longer be a star-network, but that surely doesn't matter with mesh??)

I really can't believe that an 8 node OpenWRT/BATMAN mesh covering the same two storey house, struggles to keep all nodes linked when compared to a 3 year old, 3 or 6 node commercial TP-Link mesh system??

Surely there is something else going on here??

Currently, the mesh node I have in my diningroom (in the next room to the mesh node I'm currently connected on) is 'offline' - I can't reach it.

Out in the hallway, is another mesh node that is halfway between me & the unreachable mesh node in the diningroom. I can connect to the mesh node in the hallway no problem.

So why hasn't the unreachable diningroom mesh node, re-associated itself to the hallway mesh node, allowing it to rejoin the mesh albeit with the extra hop of going via the hallway mesh node??

One other thing that might be worth mentioning: on my 8 node mesh network, I currently have just TWO client devices connected - my phone & one IP camera for testing. So there's negligible activity on the mesh network.

Batman is really designed for community networks rather than home networks, so some unique problems of microwave propagation inside occupied buildings are not taken into account. Sometimes this matters, sometimes it does not... roll of the dice.

The TP-Link mesh system is not actually a mesh at all. It is a proprietary WDS-like system that is designed for home use and can work reasonably well within some limitations.

That in itself can be an issue. If you "enable" the kernel support for "mesh self healing", multiple nodes all within reach of each other can result in multiple possible paths, with active path changes, which in turn can, and will, cause TCP connections to drop amongst other things.
We have discussed this in other threads - you need to somehow actively control the HWMP mac-routing to make it work properly.
Batman does not bother as it has its own layer 3 routing algorithm that assumes you do not have multiple layer 2 paths with dynamic path changes working underneath it.

Oversimplifying for clarity, think of it like this:
Batman's layer 3 routing algo can choose the best path out of existing layer 2 paths. But it has no mechanism to create paths at layer 2 like the HWMP protocol does.

Batman needs the layer 2 paths to be stable.

Is there an 'idiots guide' for setting up WDS with OpenWRT (preferably with pictures! :grin:) ? Although nodes must connect to the router over one of the 5GHz radios (I can't cable them all back to the router).

I'm thinking that Mesh is probably overkill for my purposes. Very few things in my home 'move' between nodes, and if they do, a delay of a few seconds whilst the client picks an Access Point with a better signal, is no problem at all. In fact, the only two 'mobile' clients that occur to me are: my phone & the robo-vac!

That would apply to both mesh and wds. With wds you are making the paths fixed, but not guaranteeing they would stay up.

Roaming client devices have nothing to do with 802.11s. I thought that was covered already.

The auto deployment of mesh11sd is no option for you then?

Well, I tried Mesh11sd before, but I struggled to understand the 'walkthrough' document (I gave @bluewavenet my feedback) and I couldn't manage to make a few customisations I wanted (like which radio to use for backhaul & a few other bits). In the end, I concluded mesh11sd was beyond my skill level.

Oh, and I discovered that Mesh11sd didn't support WPA (and I had quite a few older smart devices that only supported WPA) & also Luci is disabled by default & that would've left me feeling very insecure :grimacing:

LuCI is still disabled by default in the v6.0.0 release (ETA imminent) but can be turned on.

Of course it doesn't. Mesh11sd is an 802.11s management daemon. WDS is an entirely different thing.

These are simple mesh11sd uci config settings, but no, you cannot set these from Luci.
Most people have 5 or 6 meshnodes to roll out, so normally a custom firmware is made with Imagebuilder or Firmware Selector, then flash, flash, flash and the job is done.

If you go to Github and open an issue, someone on the project will talk you through it if you want to try the new version.
Mesh11sd Issues on Github

I'm still getting MESH-SAE-AUTH-FAILURE/MESH-SAE-AUTH-BLOCKED reported in my System Logs fairly regularly, but I think there has been a perceptible reduction since I logged into Luci on each node and set my Wireless Country Code thus:

option country 'GB'

Before setting it to GB, it was the default value (which I presume is undefined or perhaps US??)
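Rather than guessing what each radio is set to, the configured value can be read back from uci. The sketch below (helper name `radio_countries` is hypothetical) reports the `country` option for every wifi-device section and prints "unset" where the option is absent, in which case the driver's own default applies.

```shell
# Report the country option of each wifi-device section.
# Reads `uci show wireless` output on stdin, e.g. lines such as
#   wireless.radio0=wifi-device
#   wireless.radio0.country='GB'
radio_countries() {
    awk -F= -v q="'" '
        $2 == "wifi-device" {
            name = $1; sub(/^wireless\./, "", name)
            radios[++n] = name
        }
        $1 ~ /\.country$/ {
            name = $1
            sub(/^wireless\./, "", name); sub(/\.country$/, "", name)
            val = $2; gsub(q, "", val)        # strip the single quotes uci adds
            country[name] = val
        }
        END {
            for (i = 1; i <= n; i++)
                print radios[i], (radios[i] in country ? country[radios[i]] : "unset")
        }
    '
}
```

On a node: `uci show wireless | radio_countries`. `iw reg get` additionally shows the regulatory domain the kernel is actually enforcing.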

Not sure why this would make a difference to MESH-SAE-AUTH-BLOCKED occurrences? Perhaps just my imagination & they'll soon start popping up again all the time!

The default is country "00". This probably is not the cause, although it would limit channel choices on 5GHz.

Background noise might be lower.
or
Something moved in your house - simply a door open when it was closed or closed when it was open can have this effect.

You could try running:
iw dev your_mesh_ifname station dump, replacing "your_mesh_ifname" with its actual name.

The output will show a list of directly connected nodes and lots of information including the signal and average signal levels in dBm.
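Because the full dump is long, it can help to condense it to the few fields that matter here. This is a rough sketch, assuming the field labels `iw` prints (`signal avg:`, `mesh plink:`, `mesh airtime link metric:`); the function name `station_summary` is made up.

```shell
# Condense `iw dev <ifname> station dump` output (stdin) to one line per peer:
# MAC address, average signal (dBm), mesh peer-link state, airtime link metric.
station_summary() {
    awk '
        function emit() { if (mac != "") print mac, avg, plink, metric }
        $1 == "Station"                                 { emit(); mac = $2; avg = plink = metric = "?" }
        $1 == "signal" && $2 == "avg:"                  { avg = $3 }
        $1 == "mesh" && $2 == "plink:"                  { plink = $3 }
        $1 == "mesh" && $2 == "airtime" && $3 == "link" { metric = $NF }
        END { emit() }
    '
}
```

Usage on a node would be `iw dev your_mesh_ifname station dump | station_summary`, again replacing "your_mesh_ifname" with the actual interface name.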

That command seems to list a lot of stuff - none of which looks bad to me??

Station 10:7b:44:ce:05:xx (on phy2-mesh0)
        inactive time:  130 ms
        rx bytes:       99038290
        rx packets:     868524
        tx bytes:       27769205
        tx packets:     113909
        tx retries:     40765
        tx failed:      0
        rx drop misc:   773
        signal:         -66 [-70, -69, -95, -95] dBm
        signal avg:     -67 [-72, -71, -95, -95] dBm
        Toffset:        472094585 us
        tx bitrate:     325.0 MBit/s VHT-MCS 7 80MHz short GI VHT-NSS 1
        tx duration:    2646696 us
        rx bitrate:     292.6 MBit/s VHT-MCS 6 80MHz short GI VHT-NSS 1
        rx duration:    69122732 us
        last ack signal:-95 dBm
        avg ack signal: -95 dBm
        airtime weight: 256
        mesh llid:      0
        mesh plid:      0
        mesh plink:     ESTAB
        mesh airtime link metric: 26
        mesh connected to gate: no
        mesh connected to auth server:  no
        mesh local PS mode:     ACTIVE
        mesh peer PS mode:      ACTIVE
        mesh non-peer PS mode:  ACTIVE
        authorized:     yes
        authenticated:  yes
        associated:     yes
        preamble:       long
        WMM/WME:        yes
        MFP:            yes
        TDLS peer:      no
        DTIM period:    2
        beacon interval:100
        connected time: 29510 seconds
        associated at [boottime]:       134818.705s
        associated at:  1753335056201 ms
        current time:   1753364565410 ms
Station 10:7b:44:d5:12:xx (on phy2-mesh0)
        inactive time:  100 ms
        rx bytes:       98816416
        rx packets:     868757
        tx bytes:       28335614
        tx packets:     115781
        tx retries:     39234
        tx failed:      0
        rx drop misc:   1259
        signal:         -64 [-66, -68, -95, -95] dBm
        signal avg:     -66 [-70, -72, -95, -95] dBm
        Toffset:        95947005 us
        tx bitrate:     351.0 MBit/s VHT-MCS 8 80MHz VHT-NSS 1
        tx duration:    2702863 us
        rx bitrate:     325.0 MBit/s VHT-MCS 7 80MHz short GI VHT-NSS 1
        rx duration:    73873824 us
        last ack signal:-95 dBm
        avg ack signal: -95 dBm
        airtime weight: 256
        mesh llid:      0
        mesh plid:      0
        mesh plink:     ESTAB
        mesh airtime link metric: 26
        mesh connected to gate: no
        mesh connected to auth server:  no
        mesh local PS mode:     ACTIVE
        mesh peer PS mode:      ACTIVE
        mesh non-peer PS mode:  ACTIVE
        authorized:     yes
        authenticated:  yes
        associated:     yes
        preamble:       long
        WMM/WME:        yes
        MFP:            yes
        TDLS peer:      no
        DTIM period:    2
        beacon interval:100
        connected time: 29507 seconds
        associated at [boottime]:       134820.053s
        associated at:  1753335057549 ms
        current time:   1753364565411 ms
Station ea:9f:80:a5:84:xx (on phy2-mesh0)
        inactive time:  90 ms
        rx bytes:       97990038
        rx packets:     861422
        tx bytes:       27716650
        tx packets:     113745
        tx retries:     43897
        tx failed:      2
        rx drop misc:   780
        signal:         -72 [-71, -84, -95, -95] dBm
        signal avg:     -72 [-72, -84, -95, -95] dBm
        Toffset:        18446744073638271743 us
        tx bitrate:     325.0 MBit/s VHT-MCS 7 80MHz short GI VHT-NSS 1
        tx duration:    2542739 us
        rx bitrate:     325.0 MBit/s VHT-MCS 7 80MHz short GI VHT-NSS 1
        rx duration:    67307284 us
        last ack signal:-95 dBm
        avg ack signal: -95 dBm
        airtime weight: 256
        mesh llid:      0
        mesh plid:      0
        mesh plink:     ESTAB
        mesh airtime link metric: 26
        mesh connected to gate: no
        mesh connected to auth server:  no
        mesh local PS mode:     ACTIVE
        mesh peer PS mode:      ACTIVE
        mesh non-peer PS mode:  ACTIVE
        authorized:     yes
        authenticated:  yes
        associated:     yes
        preamble:       long
        WMM/WME:        yes
        MFP:            yes
        TDLS peer:      no
        DTIM period:    2
        beacon interval:100
        connected time: 29452 seconds
        associated at [boottime]:       134891.829s
        associated at:  1753335129325 ms
        current time:   1753364565412 ms
Station 10:7b:44:d5:12:xx (on phy2-mesh0)
        inactive time:  60 ms
        rx bytes:       95663451
        rx packets:     843040
        tx bytes:       28853794
        tx packets:     114928
        tx retries:     42528
        tx failed:      0
        rx drop misc:   263
        signal:         -55 [-71, -68, -95, -95] dBm
        signal avg:     -57 [-63, -60, -95, -95] dBm
        Toffset:        18446744072077562733 us
        tx bitrate:     650.0 MBit/s VHT-MCS 7 80MHz short GI VHT-NSS 2
        tx duration:    2317090 us
        rx bitrate:     585.1 MBit/s VHT-MCS 6 80MHz short GI VHT-NSS 2
        rx duration:    59992768 us
        last ack signal:-95 dBm
        avg ack signal: -95 dBm
        airtime weight: 256
        mesh llid:      0
        mesh plid:      0
        mesh plink:     ESTAB
        mesh airtime link metric: 13
        mesh connected to gate: no
        mesh connected to auth server:  no
        mesh local PS mode:     ACTIVE
        mesh peer PS mode:      ACTIVE
        mesh non-peer PS mode:  ACTIVE
        authorized:     yes
        authenticated:  yes
        associated:     yes
        preamble:       long
        WMM/WME:        yes
        MFP:            yes
        TDLS peer:      no
        DTIM period:    2
        beacon interval:100
        connected time: 27886 seconds
        associated at [boottime]:       136466.915s
        associated at:  1753336704411 ms
        current time:   1753364565413 ms
Station 10:7b:44:ce:05:xx (on phy2-mesh0)
        inactive time:  40 ms
        rx bytes:       84447015
        rx packets:     769776
        tx bytes:       21745420
        tx packets:     89610
        tx retries:     41267
        tx failed:      4
        rx drop misc:   202
        signal:         -66 [-67, -71, -95, -95] dBm
        signal avg:     -67 [-70, -73, -95, -95] dBm
        Toffset:        18446744072026199664 us
        tx bitrate:     195.0 MBit/s VHT-MCS 4 80MHz short GI VHT-NSS 1
        tx duration:    2409920 us
        rx bitrate:     260.0 MBit/s VHT-MCS 3 80MHz short GI VHT-NSS 2
        rx duration:    43813808 us
        last ack signal:-95 dBm
        avg ack signal: -95 dBm
        airtime weight: 256
        mesh llid:      0
        mesh plid:      0
        mesh plink:     ESTAB
        mesh airtime link metric: 43
        mesh connected to gate: no
        mesh connected to auth server:  no
        mesh local PS mode:     ACTIVE
        mesh peer PS mode:      ACTIVE
        mesh non-peer PS mode:  ACTIVE
        authorized:     yes
        authenticated:  yes
        associated:     yes
        preamble:       long
        WMM/WME:        yes
        MFP:            yes
        TDLS peer:      no
        DTIM period:    2
        beacon interval:100
        connected time: 27535 seconds
        associated at [boottime]:       136792.851s
        associated at:  1753337030346 ms
        current time:   1753364565413 ms
Station 10:7b:44:d5:0e:xx (on phy2-mesh0)
        inactive time:  130 ms
        rx bytes:       89188678
        rx packets:     788072
        tx bytes:       26667237
        tx packets:     105274
        tx retries:     37016
        tx failed:      0
        rx drop misc:   531
        signal:         -66 [-67, -72, -95, -95] dBm
        signal avg:     -69 [-71, -77, -95, -95] dBm
        Toffset:        18446744070556058309 us
        tx bitrate:     325.0 MBit/s VHT-MCS 7 80MHz short GI VHT-NSS 1
        tx duration:    2517291 us
        rx bitrate:     292.6 MBit/s VHT-MCS 6 80MHz short GI VHT-NSS 1
        rx duration:    73241236 us
        last ack signal:-95 dBm
        avg ack signal: -95 dBm
        airtime weight: 256
        mesh llid:      0
        mesh plid:      0
        mesh plink:     ESTAB
        mesh airtime link metric: 26
        mesh connected to gate: no
        mesh connected to auth server:  no
        mesh local PS mode:     ACTIVE
        mesh peer PS mode:      ACTIVE
        mesh non-peer PS mode:  ACTIVE
        authorized:     yes
        authenticated:  yes
        associated:     yes
        preamble:       long
        WMM/WME:        yes
        MFP:            yes
        TDLS peer:      no
        DTIM period:    2
        beacon interval:100
        connected time: 26354 seconds
        associated at [boottime]:       137978.107s
        associated at:  1753338215602 ms
        current time:   1753364565414 ms
Station 10:7b:44:ce:06:xx (on phy2-mesh0)
        inactive time:  10 ms
        rx bytes:       12155827
        rx packets:     104933
        tx bytes:       3833123
        tx packets:     15643
        tx retries:     5086
        tx failed:      1
        rx drop misc:   24
        signal:         -77 [-80, -80, -95, -95] dBm
        signal avg:     -72 [-76, -77, -95, -95] dBm
        Toffset:        18446744047974907875 us
        tx bitrate:     351.0 MBit/s VHT-MCS 8 80MHz VHT-NSS 1
        tx duration:    379842 us
        rx bitrate:     260.0 MBit/s VHT-MCS 3 80MHz short GI VHT-NSS 2
        rx duration:    7398572 us
        last ack signal:-95 dBm
        avg ack signal: -95 dBm
        airtime weight: 256
        mesh llid:      0
        mesh plid:      0
        mesh plink:     ESTAB
        mesh airtime link metric: 27
        mesh connected to gate: no
        mesh connected to auth server:  no
        mesh local PS mode:     ACTIVE
        mesh peer PS mode:      ACTIVE
        mesh non-peer PS mode:  ACTIVE
        authorized:     yes
        authenticated:  yes
        associated:     yes
        preamble:       long
        WMM/WME:        yes
        MFP:            yes
        TDLS peer:      no
        DTIM period:    2
        beacon interval:100
        connected time: 3774 seconds
        associated at [boottime]:       160554.790s
        associated at:  1753360792285 ms
        current time:   1753364565415 ms

It's generated on the main Linksys EA8300 (my router) which is mostly at the centre of the house. It seems to me that all 7 other stations are 'linked' to the router okay? (one of them is linked by cable on my 'batwire' interface but presumably still also has an active wireless 'batmesh' interface with 'avoid bridge loops' on everywhere).

Since about 8am this morning (it's now after 3pm) when I set the Wireless Country Code to 'GB', I don't seem to have had a single MESH-SAE-AUTH-BLOCKED in the System Log on any of my 8 mesh nodes, but I do still see a few MESH-SAE-AUTH-FAILURE.

The channel for my mesh backhaul is fixed on Channel 36, so I don't understand why setting the Country Code would have any effect? I don't think I changed anything else.

Incidentally, logging into Luci on 8 different OpenWRT devices to look at the System Log, is a pain in the backside - is there an easier way from a single device? Is setting up OpenWISP a nightmare to do? (given that I've never heard of 'Ansible' before)

Set up rsyslog on, e.g., the main router and configure all APs to log remotely to that rsyslog server.
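On the AP side, OpenWrt's own logd can forward to a remote syslog server via the `log_ip`, `log_port` and `log_proto` options in /etc/config/system. The sketch below only prints the commands for review; the function name `remote_log_cmds` and the server address in the usage example are placeholders, and the rsyslog server itself still has to be configured to listen on the chosen port.

```shell
# Print (do not run) uci commands that point a node's logd at a remote syslog server.
# $1: syslog server IP address
# $2: port (defaults to 514)
# $3: protocol, udp or tcp (defaults to udp)
remote_log_cmds() {
    printf "uci set system.@system[0].log_ip='%s'\n" "$1"
    printf "uci set system.@system[0].log_port='%s'\n" "${2:-514}"
    printf "uci set system.@system[0].log_proto='%s'\n" "${3:-udp}"
    printf 'uci commit system\n'
    printf '/etc/init.d/log restart\n'
}
```

For example, `remote_log_cmds 192.168.1.1` prints the five commands for a server at that (assumed) address; after checking them they can be piped to `sh` on each AP.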

Btw avoid bridge loop is enabled for obvious reasons by default anyway.

00 allows for 20 dBm (100mW) on channel 36, GB allows for 23 dBm (200mW) on channel 36, which is twice the power output. That's a difference. But that shouldn't cause the "BLOCKED" thing (I guess that doesn't include "blocked because of hitting any kind of quality threshold").
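The dBm figures convert to milliwatts as mW = 10^(dBm/10), so every 3 dB is roughly a doubling. A quick way to check the numbers, using a made-up helper name `dbm_to_mw`:

```shell
# Convert a power level in dBm to milliwatts: mW = 10^(dBm/10).
dbm_to_mw() {
    awk -v dbm="$1" 'BEGIN { printf "%.1f\n", 10 ^ (dbm / 10) }'
}

dbm_to_mw 20   # prints 100.0
dbm_to_mw 23   # prints 199.5
```

So the "200mW" for 23 dBm is a rounded figure, and the extra 3 dB is what doubles the output power.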

It's a red herring. Like I said, something changed, something trivial for the household...

Your station dump shows the problem.
All the nodes are connecting directly to the router.
But they can also connect to each other.

Batman runs with HWMP in its minimal passive mode, so the paths from one node to the other will depend only on the signal to noise ratio. Noise is not just background; it is also multipath reflections and interference patterns that can change rapidly (physics of electromagnetic waves).

The path used at any one moment is chosen by the value of the mesh airtime link metric, the lowest wins.

The key things to look at in the station dump output are the signal and signal average.

For this node (station ea:9f:80:a5:84:xx in your dump) the average signal is -72 dBm, calculated over a small moving average time window.
This is marginal.
But the worst per-chain average is -84 dBm. This is unusable and will result in a path change / hop count change, causing a dropout for several seconds.
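Peers in this state can be picked out of a station dump automatically. The sketch below flags any peer whose weakest chain in the "signal avg" line falls below a chosen threshold. Two assumptions are baked in and are mine rather than anything stated in this thread: chains reporting exactly -95 are treated as unused and skipped, and the threshold itself (e.g. -80) is an arbitrary cut-off to tune. The function name `weak_chains` is hypothetical.

```shell
# Flag peers whose weakest active chain in "signal avg" is below a threshold.
# $1: threshold in dBm (e.g. -80). Reads `iw ... station dump` output on stdin.
# Assumption: chains reporting exactly -95 are unused and are skipped.
weak_chains() {
    awk -v limit="$1" '
        $1 == "Station" { mac = $2 }
        $1 == "signal" && $2 == "avg:" {
            worst = 0
            for (i = 4; i <= NF; i++) {
                v = $i
                gsub(/[^-0-9]/, "", v)        # strip brackets, commas and the unit
                if (v == "" || v + 0 == -95) continue
                if (v + 0 < worst) worst = v + 0
            }
            if (worst < limit + 0) print mac, "worst-chain=" worst, "avg=" $3
        }
    '
}
```

Usage: `iw dev your_mesh_ifname station dump | weak_chains -80`. Run against the dump posted earlier, only the ea:9f:80:a5:84:xx entry has a chain (-84) below that cut-off.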

There are several ways to fix this problem - here are the best two.

  1. Enable active HWMP - unfortunately Batman cannot do this, neither can LuCI.
  2. Very carefully tune tx-power, mesh_rssi_threshold and physical location.

Number 2 is your only option.

Gosh, that's useful to know! Can it go higher than 23dBm? What if I tested setting the Country Code to US rather than GB to see if that made things better (or worse)?