Mesh failing using gateway, base point, and mesh point

brainchild · January 1, 2022, 10:52pm

I have been furiously learning the basic principles of mesh networks and how to configure them in OpenWrt, while trying to build a trivial deployment.

I have learned quite a bit, but as my recent frustrations suggest, I am still facing a few omissions or misconceptions in my understanding.

I attempted a short guide on mesh networks in OpenWrt, and appreciate comments for any clarifications.

Otherwise, I need help understanding why I have not yet succeeded in building a simple mesh network.

My previous configuration consisted of an appliance serving as an Ethernet-only (no wireless) NAT router, operating as my gateway, with an Ethernet connection to a functioning OpenWrt AP (Cudy x6). The appliance providing NAT routing also provides DHCP and DNS services. The OpenWrt device is assigned a static IP address.

My objective is to add a second OpenWrt device, as a mesh point, using the first OpenWrt device as the mesh base.

I successfully joined the second device as a client to the primary AP. I also successfully provisioned master-mode networks of the same SSID and WPA/WPA2 key as the primary AP.

I have provisioned, hopefully successfully, 802.11s networks complementing the master-mode networks, on the second device.

Currently, the second device is able to connect internally to the primary AP, the router, and the internet, demonstrated by pulling package updates through Luci and pinging from a shell prompt.

However, clients refuse to maintain wireless connections to it. The administrative portal seems to show temporary connections, but I believe the clients are closing them due to problems acquiring a DHCP lease.

The target configuration is to support the clients connecting to the second device as a mesh point, and acquiring leases to the DHCP server on the NAT appliance, with the base AP as the IP intermediary. It seems my configuration, however, omits some aspect required to support such behavior.

To review, the following represents the target sequence for clients connecting:

Client finds mesh point SSID and connects with credentials.
Client requests DHCP lease.
Lease request is forwarded from mesh point to mesh base.
Lease request is forwarded from mesh base to router.
Lease request is returned to client.
Subsequent normal IP traffic opens using leased IP address for client connected to mesh point.

With the non-mesh configuration, clients are successfully connecting to the base AP, and leases are forwarded from it to the router.

The sequence is not completing, however, with the additional mesh hop.

Following are some questions I have considered, but which have so far led only to unsuccessful experiments:

How should I configure the Lan bridge network on the second device? When I tried joining it to the subnet of the client network and the base AP, it broke routing of outbound connections.
Should I bridge any combination of the existing Lan bridge, the client networks, or the mesh networks, on the second device?
Which network, the client, the Lan, or another, should be given as the target network for the mesh mode wireless networks, on the second device?
Do I need any special provisioning on the base, to accommodate mesh points joining it in a working mesh network?

It would certainly be very helpful for someone to clarify these questions, as I suspect my solution rests in one of them.

bluewavenet · January 2, 2022, 3:26pm

The misconception that causes most problems for people is that IP routing is somehow built in to an 802.11s network. It is not. An 802.11s network knows nothing about nor even cares about IP traffic.
802.11s is a layer 2 protocol. IP is layer 3 and is blindly carried by the layer 2 infrastructure.

The important thing to achieve first is to establish the layer 2 infrastructure. The .11s "protocol" builds a multi-point to multi-point layer 2 routing table based on mac addresses.
Once established the mesh network is self configuring and self healing with potentially multiple paths from any one node to another, with each node's layer 2 routing table version giving it the best route to any other node.

There are many parameters that can be tuned in the mesh network. Timeouts are extensively used for detecting active nodes and keeping the layer 2 routing tables fresh. If continuous traffic passes from each node to the others, then all will continue, but if any given node does not contribute any packets, it will time out and drop from the mesh network. As soon as that node tries to send layer 2 packets again it will rejoin the mesh, a process that can take some time. This is perhaps a little oversimplified, but should indicate what is happening in your case.

There is no such thing as a "base" as far as the 802.11s network is concerned, all mesh nodes are equal by default.

Some nodes may just contribute to the infrastructure. These nodes are just that - plain and simple nodes that relay packets to and from other nodes. These should sensibly be named Mesh Nodes.

Some nodes may provide a gateway from a WiFi access point to the mesh, and some may provide a gateway from the mesh to an internet feed. These are "Mesh Nodes with a Gateway". So let's call them Mesh Gateways.

Now, you may well ask, how do we make all the nodes in our mesh infrastructure stay on standby, ready to send and receive packets instantly, rather than timeout and go to sleep if there was no traffic for a while?

The key is in the previously mentioned "many parameters that can be tuned in the mesh network".

To cut a long story short - The parameter we most need is "mesh_gate_announcements".
This parameter is intended to allow important gateways to periodically announce their presence on the mesh. There should be at least one node with this parameter set. All other nodes will (hopefully) see the announcement and do whatever is needed to "stay awake".
It would be sensible to have a number of these announcing nodes for resilience, but if all nodes were configured in this way, there would be a potentially large traffic overhead. In practice though, this is only significant for large networks.

In OpenWrt though, there is a small problem. Config[uration] is done in the uci config file and this is acted upon only as the network interface is being brought up.
The problem is that some mesh parameters can only be set once the interface is up.
The mesh_gate_announcements parameter is one of these.

In OpenWrt we must set mesh_gate_announcements to "1" using the iw utility once the mesh interface is up. This is best achieved by running a script at boot time (at least), making sure the mesh interface is up before setting.
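As a rough sketch of that boot-time step (not a tested recipe), the logic could look like the Python below. The interface name `mesh0` is an assumption, as is using the appearance of `/sys/class/net/<ifname>` as a stand-in for "the interface is up"; the `iw dev <ifname> set mesh_param <param>=<value>` form is my understanding of the iw syntax, so check `iw help` on your build. On a real router the same thing would more likely be a few lines of shell.

```python
import os
import subprocess
import time

MESH_IFNAME = "mesh0"  # assumed name; check `iw dev` for the real one


def build_iw_command(ifname, param, value):
    """Return the argv that asks iw to set one mesh parameter."""
    return ["iw", "dev", ifname, "set", "mesh_param", f"{param}={value}"]


def wait_for_interface(ifname, timeout_s=60, poll_s=1.0, exists=os.path.exists):
    """Poll until /sys/class/net/<ifname> appears or the timeout expires.

    Existence in sysfs is only a rough proxy for "up and joined to the mesh";
    an extra delay may be needed in practice.
    """
    path = f"/sys/class/net/{ifname}"
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if exists(path):
            return True
        time.sleep(poll_s)
    return False


def enable_gate_announcements(ifname=MESH_IFNAME, run=subprocess.run):
    """Wait for the mesh interface, then set mesh_gate_announcements to 1."""
    if not wait_for_interface(ifname):
        raise RuntimeError(f"{ifname} did not appear; parameter not set")
    run(build_iw_command(ifname, "mesh_gate_announcements", 1), check=True)
```

Calling `enable_gate_announcements()` from whatever runs at boot would then issue the single iw command once the interface exists.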

Another very important parameter to set is mesh_rssi_threshold. Setting this to "0" is very bad.
A zero value means "try to connect regardless of signal strength" and results in very unreliable connectivity if the signal is not good enough.
An excellent setting as a starting point is mesh_rssi_threshold = -80
Values more negative than -80 result in rapidly decreasing performance if the nodes are moved apart for example. Increasing towards zero has the opposite effect at the expense of range. BUT do not set to 0!
For larger networks you might want to set the parameters "mesh_hwmp_rootmode" and "mesh_max_peer_links" but that is getting into the details of optimisation we do not need to go into here.
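To illustrate the effect of the threshold, here is a toy model in Python. It is not how the driver implements it, and the node names and signal levels are made up; it only shows why 0 admits every peer while -80 filters out the weak ones.

```python
def peer_allowed(rssi_dbm, threshold_dbm):
    """Toy model of mesh_rssi_threshold.

    A threshold of 0 is treated as "no threshold": every peer is tried,
    however weak. Otherwise only peers at or above the threshold are tried.
    """
    if threshold_dbm == 0:
        return True
    return rssi_dbm >= threshold_dbm


# Hypothetical neighbours and their signal levels in dBm.
candidates = {"node-a": -62, "node-b": -79, "node-c": -91}

for threshold in (0, -80):
    accepted = sorted(
        name for name, rssi in candidates.items() if peer_allowed(rssi, threshold)
    )
    print(threshold, accepted)
```

With a threshold of 0 the weak `node-c` link is still attempted, which is the unreliable case described above; with -80 it is left out.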

Once the mesh layer 2 infrastructure is established, it will be working pretty much just like a large "virtual unmanaged layer 2 switch".
Just as with a real physical unmanaged layer 2 switch, the mesh layer 2 network will fully support a layer 3 IP network (arp and dhcp being the layer 2 links for establishing IP connectivity).
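The "virtual switch" idea can be sketched with a small Python model. The device names and the topology are assumptions chosen to match the setup in this thread, and real 802.11s path selection is far more involved. The point is only that a broadcast frame (such as a DHCP discover) reaches the router purely by layer 2 flooding, with no IP forwarding at any hop.

```python
from collections import deque

# Hypothetical layer 2 topology: who can hand frames directly to whom.
LINKS = {
    "client": ["mesh-node"],
    "mesh-node": ["client", "mesh-gateway"],
    "mesh-gateway": ["mesh-node", "router"],
    "router": ["mesh-gateway"],
}


def flood(links, source):
    """Return the devices a broadcast frame from `source` reaches, in order.

    Each device passes the frame to every neighbour that has not seen it yet,
    which is roughly what an unmanaged switch does with a broadcast.
    """
    seen = {source}
    order = []
    queue = deque([source])
    while queue:
        here = queue.popleft()
        for neighbour in links[here]:
            if neighbour not in seen:
                seen.add(neighbour)
                order.append(neighbour)
                queue.append(neighbour)
    return order


print(flood(LINKS, "client"))  # ['mesh-node', 'mesh-gateway', 'router']
```

Nothing in `flood` looks at an IP address; the DHCP server on the router simply sees the broadcast arrive as if the client were plugged into the same switch.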

TL;DR Use the iw utility to set mesh_gate_announcements to "1"

brainchild · January 2, 2022, 8:03pm

Thanks for the detailed explanation.

Let me put aside for the moment the issue of announcements for maintaining mutual awareness of the devices, because I feel I first need more fully to understand the purpose and function of the 802.11s network and its relation to routing.

Do I understand correctly from your comments that every AP participating in a mesh configuration must be provisioned for the 802.11s network, whether or not it is also the only one providing a link to cabled infrastructure?

Do the various APs that constitute the mesh network need to be reachable to each other by routed L3 connections independent of the mesh network, or does L3 connectivity emerge as an effect of the mesh network, by joining all the nodes in an L2 connection?

I read that the mesh protocols depend on IP protocols, and so joining nodes as meshed is not possible unless they already have been configured for routed L3 connections to one another.

bluewavenet · January 3, 2022, 7:58am

Yes you understand correctly.

No they do not need to be reachable to each other by L3 connections for the 802.11s mesh to work.

Yes.

802.11s mesh protocol does not depend on IP protocol for any purpose.

This is not true. The easiest way to configure 802.11s mesh nodes is to initially configure them as Wireless Access Points and connect each in turn by ethernet to your ISP router where it will get an IP connection via dhcp. You can then SSH or LuciUI to it and set up the mesh interface. Thereafter the ethernet port[s] on the newly configured mesh node become ports on the "virtual layer 2 switch" that the mesh network creates.

Of course, you will leave one of the mesh nodes connected to the ISP router by ethernet so it can become the Mesh Gateway to the Internet.

Lynx · January 3, 2022, 8:34am

In what situations is a mesh network to be preferred over WDS? Presumably the vast majority of users have a fixed number of points in fixed locations (rather than a varying number of points that vary in location), say a router and two further access points. In such cases is WDS not always preferable? My own experience with three RT3200s is that mesh worked reliably for a while with snapshot upgrades, but months ago something broke, and now mesh is unusable because a node eventually becomes inaccessible until the main router is rebooted. I reported the issue, gave up, and went to WDS instead, which so far has seemed reliable. I think the throughput I see through WDS is also higher.

brainchild · January 5, 2022, 7:20am

Well, now I am beginning to understand, as I had many incorrect ideas that obviously would never have led to a useful outcome.

The idea of identically provisioning each device from a direct connection through the router makes me see the symmetrical relation among the nodes in the mesh network, and it also leads me to a few observations.

First, the way mesh networks are portrayed in OpenWrt, at least through Luci, I think is misleading. When 802.11s is provisioned, an association is made with an existing "network" on the device. I understand a network, to practical approximation, as a set of entities with various MAC addresses that have in common how they are reached by routing. A network, in such use, is an L3 concept, whereas 802.11s, as you noted, is L2. I would then understand the meaning of this selection to be that the 802.11s system is provisioned on whatever wireless interfaces happen to be associated with the selected network, but not, in any robust sense, on the network itself.

Second, though your method for provisioning is simple, it seems to allow no possibility of making an administrative connection to a device after it has been removed from the router. It is an awkward scenario: without an assigned IP address, routing to the device, and therefore administrative access, is not possible unless further configuration is added. Would it be possible to add an additional virtual network, with an assigned IP address or as a DHCP client, to be used for administrative access on the device otherwise serving only as a mesh node?

Third, it may be useful, if possible, to support more than one node in a mesh network with a cabled connection, to avoid a single point of failure for the uplink. Yet physical constraints may offer no way to connect both devices by cable, except on separate subnets. If the two devices have no L2 link, how may they utilize an L3 connection to support failover for outbound routing?

bluewavenet · January 5, 2022, 9:34am

I suppose it could be seen that way, but only if the admin user takes the simplistic view that all networks are IP networks, a view that is of course far from the truth.

This is not at all true. You just need to look up the node's ip address from the ip router's dhcp, then use the same tools you used to set up that node. In fact you will find the node will most likely always have the same ip address as it did when you set it up (dhcp works that way) and it is a simple task to make sure it gets the same ip allocation in the dhcp configuration.

Possible, but totally unnecessary.

Yes this is indeed possible. It is however purely an ip routing issue and there are many ways to achieve this. Don't forget to think of the mesh network as a virtual layer 2 switch.

There are several OpenWrt packages you can install to support multiple ip gateways such as olsrd, quagga-ospf and bmx

brainchild · January 5, 2022, 10:07am

To me it feels that way because of how the term is portrayed in the interface provided by Luci, even if the term itself is more general. When I opened the settings for the network automatically provisioned as Lan, the forms prompted me to choose static versus dynamic IP allocation, and such a choice is only meaningful in the context of an L3 network.

How could the node have an IP address of its own if it shares an L2 link with the node connected to a router?

How could an L2 switch connect two L3 networks without performing routing?

bluewavenet · January 5, 2022, 10:10am

WDS is a point to point layer 2 wireless extension method.
802.11s mesh is a fully autonomous layer 2 mac-routing protocol.
If you just have two devices, then there will be very little difference between WDS and 802.11s, except that:

Not all wireless drivers support 802.11s (eg Broadcom)
Not all wireless drivers support WDS (eg Broadcom)
WDS has to be set up for a particular pair of devices
802.11s can have the same config for any number of devices

TL;DR Make your own choice depending on the situation.

This is irrelevant, neither WDS nor 802.11s are designed to provide a "mobile" infrastructure (ie they are not like a cell phone network).

That is the nature of the "daily" snapshots - things break. Use the official release unless you are experimenting/testing.

There might be slightly less overhead with WDS, but in general practice it would normally be unnoticeable. For some wireless drivers, it could be more of an issue perhaps.

Lynx · January 5, 2022, 10:17am

Thanks for your insight.

OK so dumb question time.

Firstly I thought movement was highly significant because with 802.11s relative movement between the nodes and adding and subtracting nodes can result in reconfiguration in terms of paths, whereas with WDS this is not possible?

Otherwise, in what situations should one contemplate use of 802.11s rather than WDS? Can you give a couple of examples? Is the situation that WDS requires extension nodes to connect to fixed single base node, whereas with MESH you could have extensions from extensions?

bluewavenet · January 5, 2022, 10:25am

Lan - local area network. This does not mean local area IP network.
Like I said "the simplistic view that all networks are IP networks, a view that is of course far from the truth". OpenWrt/Luci is not a UI for unknowledgeable home users - ie some knowledge is required - but the nature of it is that you can, with the right mind set, very rapidly learn.

Yes, you have the choice of adding IP support on top of the basic network configuration.

Because it provides the underlying L2 infrastructure that an L3 network requires.

bluewavenet · January 5, 2022, 10:39am

This is true, but nodes become multi peer to multi peer link devices in the underlying infrastructure. If you move a node then the infrastructure will reconfigure itself for the new location. But if, for example, you put a node in your car and drive through the mesh "zone" then your car node will never establish a proper multi peer to multi peer link status.

If the hardware is 802.11s compatible, then consider it in all cases.

Pretty much. Also, WDS does not support the multi path redundancy that 802.11s gives you.

Lynx · January 5, 2022, 10:45am

Thanks. Your responses are insightful, and I am likely testing your patience, but what about the typical home scenario in which there is very often just one main router and, say, two or three wireless extension points?

Would the following be reasonable:

if the main router sits approximately in the middle of the wireless extension points, use WDS (with all extensions points connected to the main router)
else, use 802.11s.

The main consideration is the geometry of the mesh and the location of the router relative to the access points, since a router that is not located in the middle may hurt WDS.

Are there any studies on the relative throughputs between different technologies? I came across one but it was very outdated.

https://www.semanticscholar.org/paper/Efficiency-of-WLAN-802.11xx-in-the-multi-hop-Kubal/14715c0bfd1f2120c8afca6676d3d16d4ea1b582

This includes OLSR.

brainchild · January 5, 2022, 10:53am

I need some further guidance on how to get an admin portal accessible on the mesh nodes not connected to the router by cable. Once I provision the 802.11s network, how do I make the same node into a DHCP client?

Lynx · January 5, 2022, 10:56am

The documentation is a little esoteric but most individuals find this helpful:

bluewavenet · January 5, 2022, 10:58am

If the hardware supports it, why use anything other than 802.11s? If you move things around you can indeed break WDS, whereas 802.11s will reconfigure itself.

Like I said, WDS probably has a slightly lower overhead, but is the difference noticeable? In most scenarios, no.

OLSR is an IP routing protocol that sits on top of a layer 2 infrastructure so is irrelevant as far as testing WDS against 802.11s, both of which are to provide the layer 2 links.

bluewavenet · January 5, 2022, 11:46am

If you understand my suggestion of doing the initial configuration with an ethernet connection to your router, then all you need is an extra step or so.
When configuring, give the node a unique host name eg meshnode1
Then access the node by name eg https://meshnode1.lan should take you into Luci on the node even after putting it in place as a mesh node.

brainchild · January 5, 2022, 12:06pm

So would you have multiple IP addresses on the same interface?

bluewavenet · January 5, 2022, 3:33pm

No. Why do you think so?
The bridge interface on the mesh node will be a dhcp client so it will get an IP address in exactly the same way that it did when configuring with an ethernet connection. This is independent of the mesh configuration. DHCP client does a layer 2 broadcast for DHCP servers - this is how DHCP works.
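A toy DHCP server in Python may make this concrete. It is only a sketch: the address pool and the MAC addresses are invented, and a real server also handles lease times, renewals and much more. What it shows is that leases are keyed by MAC address, so two nodes on the same layer 2 segment each get their own IP address, and a node that asks again gets the same one back.

```python
import ipaddress


class ToyDhcpServer:
    """Hand out one address per MAC from a small pool (no lease expiry)."""

    def __init__(self, first_ip, pool_size):
        start = ipaddress.IPv4Address(first_ip)
        self._free = [start + offset for offset in range(pool_size)]
        self._leases = {}

    def request(self, mac):
        """Return the address leased to `mac`, allocating one if needed."""
        mac = mac.lower()
        if mac not in self._leases:
            if not self._free:
                raise RuntimeError("address pool exhausted")
            self._leases[mac] = self._free.pop(0)
        return str(self._leases[mac])


# Hypothetical pool and MAC addresses for two bridges on the same L2 segment.
server = ToyDhcpServer("192.168.1.100", pool_size=50)
gateway_ip = server.request("02:00:00:00:00:01")
node_ip = server.request("02:00:00:00:00:02")

print(gateway_ip, node_ip)                   # 192.168.1.100 192.168.1.101
print(server.request("02:00:00:00:00:02"))  # 192.168.1.101
```

Sharing a layer 2 segment means the two bridges can hear each other's broadcasts, not that they share an address.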

brainchild · January 5, 2022, 10:57pm

The 802.11s network joins the two bridge interfaces on either node, so any IP address assigned to one must be assigned to the other, at least according to my best understanding from all of the information presented so far. Then, it would be impossible to assign an IP address to one device but not to assign the same address to the other device.