Mesh11sd configuration with existing mesh

I'm trying to set up mesh11sd 3.1.0 with an existing mesh. That is, I defined the mesh network through the web interface, it is up and running and working (2 nodes). I've installed the mesh11sd and kmod-nft-bridge packages on both nodes (initially only installed mesh11sd, but got a warning in the logs about installing kmod-nft-bridge).

All of that went fine as far as I can tell and the mesh is still working, but I'm getting

Error getting mesh interface name

in the syslog every few seconds. The mesh11sd status output doesn't list any peers or active stations even though the mesh is up and working. I've tried the auto configuration and manually configuring it, with the same result. For the manual configuration, I don't know how to point it to the existing mesh -- is this picked up automatically from /etc/config/wireless or do I need to do something for that?

mesh11sd status output:

{
  "setup":{
    "version":"3.1.0",
    "enabled":"1",
    "procd_status":"inactive",
    "portal_detect":"1",
    "portal_channel":"default",
    "mesh_basename":"m-11s-",
    "auto_config":"0",
    "auto_mesh_network":"lan",
    "auto_mesh_band":"2g40",
    "auto_mesh_id":"92d490daf46cfe534c56ddd669297e",
    "mesh_gate_enable":"1",
    "txpower":"24",
    "mesh_path_cost":"10",
    "checkinterval":"10",
    "interface_timeout":"10",
    "ssid_suffix_enable":"1",
    "debuglevel":"1"
  }
  "interfaces":{
    "phy0-mesh0":{
      "mesh_retry_timeout":"100",
      "mesh_confirm_timeout":"100",
      "mesh_holding_timeout":"100",
      "mesh_max_peer_links":"99",
      "mesh_max_retries":"3",
      "mesh_ttl":"31",
      "mesh_element_ttl":"31",
      "mesh_auto_open_plinks":"0",
      "mesh_hwmp_max_preq_retries":"4",
      "mesh_path_refresh_time":"1000",
      "mesh_min_discovery_timeout":"100",
      "mesh_hwmp_active_path_timeout":"5000",
      "mesh_hwmp_preq_min_interval":"10",
      "mesh_hwmp_net_diameter_traversal_time":"50",
      "mesh_hwmp_rootmode":"0",
      "mesh_hwmp_rann_interval":"5000",
      "mesh_gate_announcements":"0",
      "mesh_fwding":"1",
      "mesh_sync_offset_max_neighor":"50",
      "mesh_rssi_threshold":"-80",
      "mesh_hwmp_active_path_to_root_timeout":"6000",
      "mesh_hwmp_root_interval":"5000",
      "mesh_hwmp_confirmation_interval":"2000",
      "mesh_power_mode":"active",
      "mesh_awake_window":"10",
      "mesh_plink_timeout":"0",
      "mesh_connected_to_gate":"0",
      "mesh_nolearn":"0",
      "mesh_connected_to_as":"0",
      "interface":"unmanaged"
    }
  }
}

According to this, it has identified the mesh interface (phy0-mesh0) correctly.
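The kernel's own view of the mesh can be cross-checked independently of mesh11sd with iw. The following is a minimal sketch: the helper name count_estab_peers is made up for illustration, and it assumes the station dump reports each peer link state on a "mesh plink:" line.

```sh
#!/bin/sh
# Count mesh peers whose peer link is established, reading the text of
# `iw dev <ifname> station dump` on stdin.
# count_estab_peers is a hypothetical helper, not part of mesh11sd.
count_estab_peers() {
    # grep -c prints 0 and exits non-zero when nothing matches; keep the 0.
    grep -c 'mesh plink:[[:space:]]*ESTAB' || true
}

# On a node (interface name taken from the status output above):
#   iw dev phy0-mesh0 station dump | count_estab_peers
#   iw dev phy0-mesh0 mpath dump
```

If this reports peers while mesh11sd status lists none, the mesh itself is fine and the problem is in how mesh11sd finds or manages the interface.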

Relevant part of /etc/config/wireless:

config wifi-iface 'wifinet2'
	option device 'radio0'
	option mode 'mesh'
	option encryption 'sae'
	option mesh_id 'XXX'
	option mesh_fwding '1'
	option mesh_rssi_threshold '-80'
	option key 'XXX'
	option network 'lan'

Configuration and output is the same on both mesh nodes. Happy to provide more information. Thanks for your help!

I would suggest you first read the user guide here:
https://openwrt.org/docs/guide-user/network/wifi/mesh/mesh11sd

The 3.x.x versions of mesh11sd default to an autoconfigure mode and will be incompatible with any "mesh config" you may have done in Luci.

Note, the mesh11sd configuration is not normally written to the /etc/config files so cannot be seen or changed by Luci.

You can of course turn off autoconfigure (although there is a bug in v3.1.0 we will touch on later). Manual configuration is aimed at users that are very experienced with 802.11s mesh configuration and even then not recommended for a first install.

The simplest way to get started is to flash all nodes with the same configuration.
Using the Firmware Selector, create an image with:

  1. wpad-basic-mbedtls replaced with wpad-mesh-mbedtls
  2. kmod-nft-bridge added
  3. mesh11sd added

This can be with any model of router (they do not have to be the same).

Note: If a node uses the ath10k drivers, you need to change to the non-ct versions

Reflash all the nodes with the relevant firmware for each node that you just generated.

Connect the wan port of one of them to a lan port of your isp router and power it up.

Power up all the other nodes with no ethernet connection.

Wait a few minutes.

You should see the default ssid of each node in the form of OpenWrt-2g-xxxx where xxxx is the last 4 digits of the node's mac address.

Let me know if this makes sense and if you have any questions....

Thanks -- all nodes are running the same hardware (not ath10k) and have the same configuration (wpad-mbedtls, kmod-nft-bridge, mesh11sd). All of this is working fine (including the mesh). My question is how I can get mesh11sd to work with this existing mesh rather than starting again from scratch.

How many nodes do you have and do you have any special non-mesh configs?
Or is it just a basic OpenWrt install?

Can I assume you have the "remote" nodes configured with a "Wireless AP" (aka "Dumb AP") type of config?
If you do, unless you are well experienced, I really would advise to start again, it should be very quick and simple to let mesh11sd do all the work.

What is the hardware?

If using the existing configs is mandatory, then you really need mesh11sd v3.1.1 (available in master/snapshot - will be in 23.05 as soon as the OpenWrt routing build system is fixed - there is a current issue there preventing new versions of packages being added).

Two nodes. One is essentially a dumb AP, the other one has DNS, Adblock, and a few other things that make the config not really basic anymore -- this is the main reason I don't want to start again from scratch (and then of course also that the mesh works without mesh11sd).

Hardware are two Dynalink DL-WRX36.

I would say that I'm reasonably experienced and happy to try advanced stuff, but it also sounds like the answer is essentially "wait for 3.1.1"?

Two nodes will often work without mesh11sd but can be unstable.
More nodes and you will start to hit big problems.

You can download 3.1.1 from here and it will work safely on 23.05 (it will be identical)
https://downloads.openwrt.org/snapshots/packages/aarch64_cortex-a53/routing/

Totally up to you if you want to try.
The end result will be no different to what you currently have apart from potentially being more stable and able to support more mesh nodes (of any make/model with 802.11s support - and that is just about all except devices with Broadcom wireless).

If you want to try just let me know and I can talk you through it (downloading, installing, setting autoconfig to off etc.)

Hi, I currently have these packages installed:

libustream-openssl wpad-openssl mesh11sd

For mesh, should I add wpad-mesh-openssl as well, or remove wpad-openssl first?

thanks

Thanks, I'll try this version and get back to you.

wpad-openssl is fine. It is the largest wpad version, but if you have lots of free flash it is no problem.

See the user guide I linked in my first reply, it shows a list of suitable wpad versions.
Quoting from there:

You could, according to your own requirements, install one of the other wpad options, i.e.:

wpad-mbedtls
wpad-mesh-wolfssl
wpad-wolfssl
wpad-mesh-openssl
wpad-openssl

Remember you need to configure manual mode. See:
https://openwrt.org/docs/guide-user/network/wifi/mesh/mesh11sd#manual_configuration

The new version doesn't seem to make a difference. I'm still getting lots of "Error getting mesh interface name" messages in the log. I've increased the debug level, and I'm also seeing "Path to station [ ] is stable". It also looks like mesh11sd is restarting itself every few minutes (getting message about configuration options in the log) -- I've also noticed high CPU usage when mesh11sd is running, which would be explained by frequent restarts.

So the new version seems to behave exactly like the old version -- no active stations or peers in the mesh11sd status output. This is for both auto configuration and manual configuration.

Did you configure manual mode?

Yes, I tried both manual and automatic mode. No difference as far as I can tell.

For your existing mesh config, automatic mode will not work.

To set manual mode:

service mesh11sd stop
uci set mesh11sd.setup.auto_config='0'
uci commit mesh11sd
service mesh11sd start

You must do this on every node.

Yes -- that's precisely what I did, on each node, with the old version and the new, which made no difference.

It looks like the problem is that line 2533 in mesh11sd assumes that the ifname is set in the configuration -- this isn't the case if the mesh is defined in the web interface and this is where it fails. If I "override" the interface name it is set in the UCI config and this step succeeds.

It looks like my interface takes quite a bit of time to be up though for whatever reason and the retries/sleeps weren't enough. mesh11sd then set the MAC address of the mesh interface, which on one node caused all sorts of issues -- the interface was permanently dead and didn't come up again until I reset the MAC address manually to what it was before.

And now the mesh isn't working anymore while mesh11sd is running. The nodes are not meshing. I don't see anything in the mesh11sd log output...

It should be and is an error if it is not. Is this a bug in Luci?

Depending on wireless drivers, if a wireless interface has the same mac address as another, either the mesh interface or the AP interface may fail to come up. In addition the possibility of mesh bridge loop storms is opened due to duplicate mac addresses.

Setting both wireless interface names and mac addresses should be a part of the manual configuration.
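As a rough sketch of what that could look like for the config posted earlier: the section name wifinet2 and the interface name phy0-mesh0 come from this thread, while the function name set_mesh_ident and the MAC address in the example are purely illustrative. The MAC must be replaced with one that is unique on the node.

```sh
#!/bin/sh
UCI="${UCI:-uci}"   # set UCI=echo for a dry run that only prints the commands

# set_mesh_ident SECTION IFNAME MACADDR
# Pin the interface name and MAC address of a wifi-iface section so that
# neither is left to be generated automatically.
# (set_mesh_ident is a hypothetical helper, not part of mesh11sd.)
set_mesh_ident() {
    section="$1"; ifname="$2"; mac="$3"
    $UCI set "wireless.$section.ifname=$ifname"
    $UCI set "wireless.$section.macaddr=$mac"
    $UCI commit wireless
}

# Example (placeholder MAC address, choose your own):
#   set_mesh_ident wifinet2 phy0-mesh0 02:00:00:00:00:01 && wifi reload
```

Running it once with UCI=echo makes it easy to review the exact commands before they touch /etc/config/wireless.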

When mesh11sd corrects the mesh interface mac address, any existing connections are dropped for a short interval and then re-established, which can sometimes take a few minutes.

It should be and is an error if it is not. Is this a bug in Luci?

Well, whether it's a bug or intended behavior, Luci does not set either MAC address or ifname unless it's explicitly specified (and both are optional).

The MAC addresses of the AP and mesh interface were indeed the same -- I've changed that so that they are different now.

Still no luck with the daemon though. I get "[iface] is not up - giving up for now." after a while (increased timeout to 90 seconds without luck) and nothing happens. There's nothing else in the log, even if I increase the log level to debug. As soon as I stop mesh11sd everything works -- the mesh interface comes up, the nodes connect and even mesh11sd status shows the expected output.

The "dumb" AP doesn't have a wired backhaul connected by default -- is it possible that this is where the problem lies? The refresh_bridgemac() function might fail because the bridge interface isn't up (no cables connected, no mesh node connected). It doesn't look to me like this would cause an issue, but I also don't understand the entire code...

If it is not set, an internally generated name is used and in some circumstances it can change if the wireless system is restarted or after a reboot.
It is expected in manual config mode that required configs are actually done manually.
Having said that it would be possible to parse iw or iwinfo output to try to detect the interface name. I will look into it.

However, do users who want manual mode want everything to be manual or not?

If the bridge was not up (on your Dumb AP), I would suggest its config is incorrect anyway. Wireless interfaces should always be linked to the bridge device. If the bridge ports are empty (ie no ethernet ports configured), the bridge must be configured to come up when "empty" to allow the wireless interface(s) to communicate.
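One way to let a bridge come up with no ports is the bridge_empty option on the bridge's device section in /etc/config/network. A minimal sketch, assuming the bridge is the first device section (network.@device[0]); check with uci show network before using it. The function name enable_bridge_empty is made up for illustration.

```sh
#!/bin/sh
UCI="${UCI:-uci}"   # set UCI=echo for a dry run that only prints the commands

# enable_bridge_empty SECTION
# Allow the bridge defined in the given network device section to be
# created even when it has no member ports.
# (enable_bridge_empty is a hypothetical helper.)
enable_bridge_empty() {
    section="$1"
    $UCI set "network.$section.bridge_empty=1"
    $UCI commit network
}

# Example (section index is an assumption):
#   enable_bridge_empty '@device[0]' && /etc/init.d/network reload
```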

I'll have to look at the "Dumb AP" user guide as it may well be incorrect, particularly now DSA is the standard. It does look like very many people have issues with "Dumb AP" due to that guide without bringing mesh into the picture.

However, do users who want manual mode want everything to be manual or not?

I'm fine with either, as long as it's documented what is expected.

I'll check the dumb AP settings tonight.

When looking through the script, I saw a wifi function called in restart_mesh() and elsewhere, but I can't find the definition of it. What does it do?

Ok, mystery solved.

In the configuration of the mesh wireless interface, I had set an RSSI threshold of -80. In the mesh11sd configuration, I hadn't set this. So mesh11sd starts, sees that the RSSI threshold is different from its default of -65 and not set explicitly, changes the RSSI threshold, and restarts the mesh. This restarts the interfaces, which resets the RSSI threshold to -80 (as defined in the wireless config). Cue infinite loop.

Solution for me was to explicitly set the RSSI thresholds for both the wifi interface and mesh11sd to -65.
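For reference, a sketch of that fix as uci commands. The wireless section name wifinet2 is from my config above; the mesh11sd section name mesh_params is an assumption about the package's config layout, so verify it with uci show mesh11sd first. The function name set_rssi_threshold is only for illustration.

```sh
#!/bin/sh
UCI="${UCI:-uci}"   # set UCI=echo for a dry run that only prints the commands

# set_rssi_threshold VALUE
# Write the same RSSI threshold to the wireless config and to mesh11sd,
# so neither side has a reason to override the other.
# (set_rssi_threshold is a hypothetical helper; mesh_params is assumed.)
set_rssi_threshold() {
    thr="$1"
    $UCI set "wireless.wifinet2.mesh_rssi_threshold=$thr"
    $UCI set "mesh11sd.mesh_params.mesh_rssi_threshold=$thr"
    $UCI commit wireless
    $UCI commit mesh11sd
}

# Example:
#   set_rssi_threshold -65 && wifi reload && service mesh11sd restart
```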

mesh11sd could probably check whether an RSSI threshold is already configured for the mesh interfaces. It's also not clear to me that the wifi interface needs to be restarted when the RSSI threshold changes, but I may be missing something...
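Such a check might look roughly like the sketch below. It is not taken from the mesh11sd source: the function name rssi_conflict is hypothetical, the default of -65 is the one I observed above, and the mesh11sd section name mesh_params in the usage comment is an assumption.

```sh
#!/bin/sh
MESH11SD_DEFAULT_RSSI=-65   # default observed in this thread

# rssi_conflict WIRELESS_VALUE MESH11SD_VALUE
# Either argument may be empty, meaning "option not set".
# Succeeds (exit 0) when the wireless config pins a threshold that differs
# from the value mesh11sd would apply, i.e. the restart loop described above.
rssi_conflict() {
    wireless_val="$1"
    mesh11sd_val="${2:-$MESH11SD_DEFAULT_RSSI}"
    # Nothing pinned in the wireless config: a restart cannot reset the value.
    [ -n "$wireless_val" ] || return 1
    [ "$wireless_val" != "$mesh11sd_val" ]
}

# Example on a node:
#   rssi_conflict "$(uci -q get wireless.wifinet2.mesh_rssi_threshold)" \
#                 "$(uci -q get mesh11sd.mesh_params.mesh_rssi_threshold)" \
#       && echo "RSSI threshold mismatch: expect repeated mesh restarts"
```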