Handover between mesh links

Hi all,

Currently I am working on my bachelor's thesis, for which I need to design a handover from one 802.11s connection to another between two nodes, as shown in the figure below.

Context:

  • Node 1 has one laptop and one wireless router WR 1.a
  • Node 2 has one laptop and two wireless routers: WR 2.a and WR 2.b
    • Node 2 has two WRs so it can pick the strongest possible link to Node 1
    • On laptop 2 Wireshark is active in wireless monitor mode
  • Each WR runs OpenWRT; laptops run Linux. All devices have static IPs and DHCP disabled.
  • All radios are statically configured to have mesh i/f and be in the same MBSS
  • Transparent L2 connection between the laptops.
  • Handover is required from WR 2.a to WR 2.b
    • Existing situation: connection from WR 1.a to WR 2.a
    • Desired situation: connection from WR 1.a to WR 2.b
  • Laptops should experience as little effect of handover as possible in terms of packet loss, delay, etc. while transferring data.

All WRs run OpenWrt 21.02.3 with iw-full, kmod-ath9k and wpad-mesh-wolfssl installed using the Image Builder. /etc/config/network is default (with the radio enabled).

/etc/config/wireless:
config wifi-device 'radio0' 
    option type 'mac80211' 
    option path 'pci0000:00/0000:00:0e.0' 
    option channel '36' 
    option band '5g' 
    option htmode 'HT20' 
    option disabled '0' 

config wifi-iface 'mesh0' 
    option device 'radio0' 
    option network 'lan' 
    option mode 'mesh' 
    option ssid 'MaMaNet' 
    option mesh_id 'MaMaNet' 
    option encryption 'none' 
    option disabled '0' 
ifconfig output (truncated with ... and showing only the relevant interfaces):
br-lan    Link encap:Ethernet  HWaddr 00:0D:B9:39:7B:10
          inet addr:192.168.1.7  Bcast:192.168.1.255  Mask:255.255.255.0
          ... 
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          ... 

wlan0     Link encap:Ethernet  HWaddr 04:F0:21:17:7B:7E
          ... 
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1 
          ... 

With WR 2.b powered off, there is a functional mesh network in which I can ping all devices from both laptops. In order to measure mesh link setup time, I make WR 2.a leave the mesh, and subsequently join the mesh after which I can measure the time before the connection is usable (with pings).

Questions:

  1. Why does ifconfig report wlan0 while the configured interface is called mesh0?
  2. After WR 2.a joins the mesh, it takes 10-30 seconds before the first ping (laptop to laptop) is successful. Why does it take so long? (Wireshark shows that the Mesh Peering sequence is about 100 ms).
  3. To facilitate the handover we want to disable automatic peering and only enable/disable specific peerings when needed. We did the following:
    • WR 1.a powered on and issued: iw dev wlan0 set mesh_param mesh_auto_open_plinks=0. The setting is confirmed using iw dev wlan0 get mesh_param mesh_auto_open_plinks.
    • Then WR 2.a is powered on and connects as usual. We expected WR 2.a not to connect to WR 1.a, or to be blocked. How is this possible?

Thank you for your support!

This is a misuse of the term "mesh network". In a mesh network all links are active at all times: the "best" link is chosen automatically, and if one fails another quickly takes over.

Your diagram shows three mesh nodes. A mesh node is a device that establishes links to all other nodes it can see within range and contributes to the overall layer 2 mesh network.

A mesh interface on a mesh node will not have an SSID.

No, you only have a point-to-point link using 802.11s.

Because you have named the config section but not the interface itself, it is given a default name.

To name the mesh interface "mesh0", you need to add:
option ifname 'mesh0'
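
For clarity, the full wifi-iface stanza would then read something like the following (based on the config posted above, with the superfluous ssid option dropped since, as noted above, a mesh interface has no SSID):

```
config wifi-iface 'mesh0'
    option device 'radio0'
    option network 'lan'
    option ifname 'mesh0'
    option mode 'mesh'
    option mesh_id 'MaMaNet'
    option encryption 'none'
    option disabled '0'
```

After `wifi reload`, the interface should show up as mesh0 rather than wlan0.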

When it joins the mesh, all that exists is a layer 2 link. Ping uses layer 3 to check whether IP is working correctly between two IP addresses and reports the time interval required for a reply to be received. Before any IP packets (layer 3 ICMP in this case) will get through, the ARP protocol (layer 2) must first resolve each peer's IP address to its MAC address and populate the ARP (neighbour) cache.
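
As an illustration of that resolution step, the neighbour cache can be inspected, and even pre-seeded so the first ping needs no ARP exchange. This is only a sketch: the interface name, IP and MAC below are placeholders, and the commands need root on a real system.

```
# Inspect the neighbour (ARP) cache on laptop 1 (eth0 is an example name):
ip neigh show dev eth0

# Optionally pre-seed a permanent entry for laptop 2 (placeholder values):
ip neigh replace 192.168.1.8 lladdr 04:f0:21:17:7b:7e dev eth0 nud permanent
```

A permanent entry never ages out, so it is only safe while the peer's MAC address is fixed.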

This is the default, so you are not changing anything.


Hi Bluewavenet,
Thank you for your response, it has been very valuable!

It might be good to say that we use mesh not for its routing capabilities, but for its peer interfaces (instead of AP/STA interfaces) and its four-address mode, which supports transparent L2 links between the laptops. We also don't want a connection between WR 2.a and WR 2.b, because they are communication interfaces of the same communication node.

The context of this project is that the radios are actually connected to directional antennas, and so far a separate control algorithm was used to determine if and when a connection from WR 2.a needs to be handed over to WR 2.b. We are now exploring whether mesh interfaces could be used to perform this handover automatically, by exploiting the property that mesh automatically selects the best link. We also want to explore how fast the switching between mesh links can occur. Do you have any idea what switching times we can expect?

Indeed, the naming is a bit confusing. We consider Laptop 1 and WR 1.a one communication node, and Laptop 2, WR 2.a and WR 2.b another. These are designed identically, but for simplicity only a minimal number of components is shown. Indeed there are three mesh nodes, but WR 2.a and WR 2.b belong to the same communication node.

Yes, you're right, but it is the smallest possible mesh network.

Ahhh, alright. Thank you!

I understand, but why does this take so long? Looking at a Wireshark capture, the path-request/reply sequence finishes within milliseconds, but it takes at least 10 seconds before the ARP request is even sent; the ARP response then arrives within milliseconds. As soon as the ARP is answered, data can flow at the IP level. How can I speed this process up? And are you aware of any L2 connection-testing application (e.g. an L2 ping)?

When I run iw dev wlan0 get mesh_param mesh_auto_open_plinks (where wlan0 will soon be renamed to mesh0 following your help) on a fresh install, it returns 1. We observe it is not the default!

Looking forward to your response, much appreciated!

On a fresh install on OpenWrt 21.02.3:

root@BlueWave-22272:~# iw dev mesh0 get mesh_param mesh_auto_open_plinks 
0

"0" has always been and continues to be the default value as far as I have seen.
However, I don't think it does what you think it does.
I could be wrong, but I think setting mesh_auto_open_plinks to 1 "auto-opens" all potential peer links, effectively disabling HWMP peer blocking. HWMP peer blocking is important in preventing fringe mesh nodes from peering with remote nodes, which would result in a degraded path.
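
One way to see what the driver actually decided per peer is to dump the station list and look at the peer-link state. A minimal sketch, assuming the mesh interface is named mesh0 (adjust to match your system):

```
# List mesh peers and their peer-link state (ESTAB, LISTEN, BLOCKED, ...):
iw dev mesh0 station dump | grep -E 'Station|mesh plink'
```

A remote node that HWMP peer blocking has rejected should show up as BLOCKED rather than ESTAB.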

That is how ARP works. You could try one of the more sophisticated IP routing protocols, e.g. OSPF.


It is possible to block nodes via MAC, right?


If you are asking "Is it possible to block mesh nodes via MAC?", then by default, no.
This would require installing the iw-full package.


A clean install gives me no result because iw is not installed by default, but if I use the image I built with the Image Builder (packages: iw-full, kmod-ath9k and wpad-mesh-wolfssl, rest default) it returns 1. Strange that we don't see the same result.

This could very well be true; I've only once seen someone discuss this, on Open802.11s Mesh Parameters, where I interpreted disabling it as "only connect when manually told to do so" (e.g. using iw dev mesh0 station set $target_mac plink_action open, which requires iw-full). However, this is not what we observed, so I have nothing to support this claim.

Thank you for the pointer. I will research this more.

Using the iw-full package, as @bluewavenet mentioned, it is possible. However, the node has to be part of the mesh before blocking can occur; MAC filtering is not implemented in mesh. We had hoped that mesh_auto_open_plinks=0 would automatically block all nodes unless specifically unblocked, but this didn't work: we didn't observe any difference between enabling and disabling mesh_auto_open_plinks.
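
For anyone trying to reproduce this, the per-peer workflow looks roughly as follows. This is a sketch assuming iw-full is installed, the mesh interface is named mesh0, and $PEER_MAC stands for the MAC address of a peer that is already a known station on the interface:

```
# Block an existing peer link by MAC address:
iw dev mesh0 station set $PEER_MAC plink_action block

# Re-open it later:
iw dev mesh0 station set $PEER_MAC plink_action open
```

The limitation described above applies: the peer must already appear in the station list before plink_action can act on it.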

I will try to keep this thread updated with new findings, but no guarantees. Thank you all! More input is always appreciated.


I have a test system here with iw, kmod-ath9k and wpad-mesh-wolfssl, and the result is "0". Over some 3 years of development it has always been "0", with all the other drivers used as well.

Are you sure you have not set it to "1" somewhere in your image build?

But anyway it is irrelevant as it does not do what you want.

I think I may be correct. I just ran a few tests with a remote mesh node, i.e. one with low signal strength. It was being blocked on the node I was looking at.

Setting mesh_auto_open_plinks to 1 allowed the remote mesh node to connect, supporting my theory about HWMP peer blocking.


As an update to everyone from the future reading this: it turns out the switch was the issue. For simplicity I hadn't mentioned the switch between laptop 2 and WR 2.a/2.b. This switch learned which MAC addresses were reachable via which port, and therefore only forwarded data destined for laptop 1 to WR 2.a. After WR 2.b took over, the switch kept sending all that data to WR 2.a. Making the switch flood all data out of all ports fixed this.
This doesn't mean it is without issues: UDP is still troublesome, but that is a new chapter.
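
For reference, if the switch in question were a Linux bridge instead of a hardware switch, the same stale-FDB symptom could be inspected and worked around like this. A sketch only: br-lan is a placeholder bridge name, and setting the ageing time to 0 disables learning entirely (flooding everything, as in the fix above), which costs bandwidth.

```
# Inspect the learned MAC -> port entries of the bridge:
bridge fdb show br br-lan

# Disable learning so all frames are flooded out of every port:
ip link set br-lan type bridge ageing_time 0
```

A less drastic alternative on a Linux bridge would be to flush the FDB once at the moment of handover rather than flooding permanently.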