BUG Report: 802.11s Mesh (V19.07.4)

I have spent 3 days checking my facts ... and I am pretty sure this is a bug.

To be clear - I do have the 802.11s mesh working in simple cases. It is on the 'edges' that I see problems.

Consider a case with:

  • 3 OpenWrt devices [A],[B],[C]
  • V19.07.4
  • All in a row
  • All have static IP addresses on the LAN-WLAN0 bridge
  • All OpenWrt devices have STP (Spanning Tree Protocol) enabled on the LAN device. (But with only one LAN connection, this is irrelevant.)
  • Only [C] has a hardwire link (Ethernet) to the local LAN. The LAN has a DHCP server.
  • Mesh radio links exist as follows
    • [A]-[B]
    • [A]-[C]
    • [B]-[C]
  • Connect a laptop to LAN port on [A]
  • Laptop gets DHCP address.
  • From the LAN can ping [A], [B], [C], and Laptop
  • SSH into each of [A], [B], [C]
    • each OpenWrt device can ping the other two OpenWrt devices
  • all good!

Now ... move [A] so that it is beyond the range of [C]

  • Mesh links exist only as follows
    • [A]-[B]
    • [B]-[C]
  • Connect a laptop to LAN port on [A]
  • Failure of Laptop to get DHCP address
  • SSH into each of [A], [B], [C]
    • each OpenWrt device can (still) ping the other two OpenWrt devices
  • From the LAN can ping [A], [B], but NOT [C] or laptop
  • Disabling STP does not change anything
  • :thinking:

So ... it appears that the 'design rule' for these mesh networks is as below (with a Portal Node being defined as one with a hardwire LAN connection):

ALL Mesh Points must be in radio contact with ALL Portal Node(s)

This largely defeats the value of using the mesh.

This is a great pity as the other 802.11s mesh benefits are great.

Suggestions?

Comments?

I have nailed the problem down to ARP requests not being re-transmitted out the WLAN interface.

Others have reported the same issue here https://bugs.openwrt.org/index.php?do=details&task_id=714, but worryingly ... no fix seems to be on its way.

BTW - I tried the 'hairpin' fix but it had no effect.

In a nutshell, if [C] pings [A]:

  • [C] issues an ARP broadcast :"Who has IP_A"
  • The ARP arrives on wlan0 of [B].
  • [B] does not reply (as it is not [A])
  • BUT !!!! [B] does NOT forward the ARP back out of wlan0, so the ARP request never reaches [A]
  • [A] never replies, so the ping fails

This is a pretty fundamental issue.

My assumption is that [B] should consult its mesh topology, and if it determines that there are Mesh Points that did not get the ARP broadcast, but which [B] does have a direct path to, then [B] should resend the ARP request out of the same wireless interface it arrived on.
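The path table a node would consult can be inspected from the shell. Here is a rough sketch of a helper that looks up the next hop towards a given mesh peer. It assumes that `iw dev <ifname> mpath dump` prints one path per line with the destination MAC in the first column and the next-hop MAC in the second; the function name and the MAC address in the example are made up for illustration.

```shell
#!/bin/sh
# Sketch: look up the next hop a node would use to reach a given mesh peer.
# Assumption: `iw dev <ifname> mpath dump` prints one path per line, with the
# destination MAC in column 1 and the next-hop MAC in column 2.
IW="${IW:-iw}"   # overridable so the helper can be tried without a radio

# mesh_next_hop IFNAME DEST_MAC
# Prints the next-hop MAC, or returns 1 if no path to DEST_MAC is listed.
mesh_next_hop() {
    "$IW" dev "$1" mpath dump 2>/dev/null |
        awk -v dst="$2" '
            tolower($1) == tolower(dst) { print $2; found = 1; exit }
            END { exit found ? 0 : 1 }'
}

# Example, run on [C] with a hypothetical MAC address for [A]:
#   mesh_next_hop wlan0 aa:bb:cc:dd:ee:01 || echo "no mesh path to [A]"
```

If [C] shows [B] as the next hop towards [A], the mesh path itself exists and the problem is more likely in how broadcast frames are forwarded.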

This is a show stopper for me ... and means 802.11s on OpenWrt is quite broken.


This may be a better way of getting the attention of the developers
Reporting Bugs

Add to this open task:


Hello! I was planning on setting up 802.11s Mesh at home using OpenWrt in the near future. This report gives me pause. My current mesh setup has wired LAN to WAN connectivity on all mesh nodes. I'm hoping this bug wouldn't impact an identical setup with OpenWrt. I suspect each node would be a portal node in this case, so the wired connection on each node would be the default way of accessing the LAN. Let me know if you have any insight.

I need to ask: if all the mesh point nodes are wired with Ethernet, will this problem still occur?

No ... but you are not using the mesh feature. You might as well just have a bunch of APs.

As far as I can see this is not a bug as such - more a config issue.
It all works on first power up, but as in the OP's example, if one link goes out of range this can happen. Rebooting all the mesh nodes fixes it.

The issue is that the uci config for the following options does not take effect, but they are essential for autonomous layer 2 "routing":

mesh_fwding='1'
mesh_gate_announcements='1'
mesh_rssi_threshold='-80'

These have to be set using the iw utility. My workaround is to have a background process that checks and sets these three parameters at a set interval, as they are reset to default by a network restart or, e.g., by running the wifi command.

Setting the rssi threshold is important. Set it to a value that still gives acceptable performance: -80 is usually good, while -90 is too weak.
mesh_gate_announcements should be set on at least one node. As it does generate some traffic, if you had hundreds of nodes it would be a lot of traffic, but with just tens of nodes, I put it on all of them as it helps to get rapid convergence.

I know.
I tried connecting 2 APs to the same switch with the same SSID and password on both, but some devices, such as iOS devices or laptops, can't automatically switch to the nearest AP. That's why I am trying to use mesh.

Have you progressed on this bug?

Is this an issue with 18.06.9 as well?

hi!

so you basically wrote a script that says

sudo iw dev wlan0 set mesh_param mesh_fwding '1'
sudo iw dev wlan0 set mesh_param mesh_gate_announcements '1'
sudo iw dev wlan0 set mesh_param mesh_rssi_threshold '-80'

and scheduled its execution via LuCI?

if it's more complex, would you mind sharing it?

thanks

No.
A script that loops in the background. It uses iw to check settings compared to config values and sets using iw if required - sleep - repeat etc
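For anyone reading along, here is a rough sketch of what such a check-then-set loop could look like. It is not the script described above, only a guess at the general shape. The function names are made up, the wanted values are the three from this thread hard-coded rather than read from the config, and it assumes iw prints the value as the first word of its output (for example "-80 dBm").

```shell
#!/bin/sh
# Sketch of a check-then-set watchdog for mesh parameters (hypothetical names).
IW="${IW:-iw}"   # overridable for testing; defaults to the real iw

# Print the first word of the current value, e.g. "-80" from "-80 dBm".
get_mesh_param() {
    "$IW" dev "$1" get mesh_param "$2" 2>/dev/null | awk '{ print $1; exit }'
}

# ensure_mesh_param IFNAME NAME WANT
# Sets NAME to WANT only if the current value differs.
ensure_mesh_param() {
    [ "$(get_mesh_param "$1" "$2")" = "$3" ] && return 0
    "$IW" dev "$1" set mesh_param "$2" "$3"
}

# watch_mesh_params IFNAME [INTERVAL]
# Re-checks the three parameters from this thread every INTERVAL seconds.
watch_mesh_params() {
    while :; do
        ensure_mesh_param "$1" mesh_fwding 1
        ensure_mesh_param "$1" mesh_gate_announcements 1
        ensure_mesh_param "$1" mesh_rssi_threshold -80
        sleep "${2:-60}"
    done
}

# Example: watch_mesh_params wlan0 60 &
```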

thanks. and why do you even check and not just set them?

still, the script would be of interest :slight_smile:

I wrote now this:

#!/bin/sh
# Read the current values as reported by iw
mesh_fwding=$(iw dev wlan0 get mesh_param mesh_fwding)
mesh_gate_announcements=$(iw dev wlan0 get mesh_param mesh_gate_announcements)
mesh_rssi_threshold=$(iw dev wlan0 get mesh_param mesh_rssi_threshold)

echo "the current forwarding is: $mesh_fwding"
echo "the current gate announcements is: $mesh_gate_announcements"
echo "the current rssi threshold is: $mesh_rssi_threshold"

# The variables are quoted so the tests do not fail if iw returns nothing
if [ "$mesh_fwding" = "0" ]; then
	iw dev wlan0 set mesh_param mesh_fwding '1'
	echo "forwarding set"
else
	echo "forwarding already set"
fi

if [ "$mesh_gate_announcements" = "0" ]; then
	iw dev wlan0 set mesh_param mesh_gate_announcements '1'
	echo "announcements set"
else
	echo "announcements already set"
fi

if [ "$mesh_rssi_threshold" != "-80 dBm" ]; then
	iw dev wlan0 set mesh_param mesh_rssi_threshold '-80'
	echo "threshold set"
else
	echo "threshold already set"
fi



I guess yours is similar. I then put this on the scheduled tasks to run it every 5min:

*/5 * * * * /root/mesh_parameter.sh

so far I only have this set on the mesh nodes, not the master. But I guess it would be a good idea there as well?

and do you only run/set these on the mesh nodes, or also on the master?

I found this thread when looking for a solution to this problem, which still exists on the v21.02 snapshot.

For those interested I use /etc/hotplug.d/net/99-mesh-param

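#!/bin/sh

[ "$ACTION" = "add" ] && {
        # Only act on mesh point interfaces
        iwinfo "$DEVICENAME" info | grep -q 'Mesh Point' || exit 0

        # Run the helper in the background so the hotplug handler returns at once
        /root/mesh_param.sh >/dev/null 2>&1 &
}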

and /root/mesh_param.sh

#!/bin/sh

. /lib/functions.sh

parse_list() {
        local value="$1"
        local _device="$2"

        iw $_device set mesh_param $value
}

parse_interface() {
        local section="$1"
        local _mode
        local _mesh_param
        local _mesh_id
        local _device

        config_get _mode "$section" mode
        # Not mesh then exit
        [ "$_mode" != "mesh" ] && return 0

        config_get _mesh_param "$section" mesh_param
        # No mesh_param list then exit
        [ -z "$_mesh_param" ] && return 0

        config_get _mesh_id "$section" mesh_id
        while true; do
                sleep 10
                _device=$(iwinfo | grep "$_mesh_id" | awk '{print $1}')
                [ -z "$_device" ] && continue

                config_list_foreach "$section" mesh_param parse_list $_device
                break
        done
}

config_load wireless
config_foreach parse_interface wifi-iface

I'd be interested if someone has a better solution. The hotplug event runs BEFORE the mesh is up so you can't run the iw commands right away. They have to run later. Also my script will run for each mesh the system has and process ALL meshes each time.
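One possible variation on the timing problem, offered only as a sketch: instead of looping forever until the interface shows up, retry the command itself with an upper time limit. This assumes that iw exits with a non-zero status while the mesh is not yet up, which I have not verified on every release. The function name `retry` is made up.

```shell
#!/bin/sh
# Sketch: run a command repeatedly until it succeeds or a time limit passes.
# Assumption: the command (e.g. an iw call) exits non-zero while the mesh
# is not yet up.

# retry LIMIT STEP CMD...
# Runs CMD every STEP seconds; returns 0 on first success, 1 once LIMIT
# seconds of waiting have been used up.
retry() {
    limit="$1"; step="$2"; shift 2
    waited=0
    until "$@"; do
        waited=$((waited + step))
        [ "$waited" -ge "$limit" ] && return 1
        sleep "$step"
    done
    return 0
}

# Example: give the mesh up to 2 minutes, trying every 5 seconds:
#   retry 120 5 iw dev wlan0 set mesh_param mesh_gate_announcements 1
```

The time limit means a mesh that never comes up does not leave a stray background process behind.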

My /etc/config/wireless contains:

        option mesh_fwding '1'
        option mesh_rssi_threshold '-80'
        list mesh_param 'mesh_gate_announcements=1'

Any mesh parameters that aren't recognized properly can be added to the mesh_param list.


I am running a mesh on OpenWrt 21, all wifi interfaces and mesh interfaces are bridged with the lan ports.

WiFi clients can get DHCP addresses and I see no problem with ARP.

You are bridging all of them together, right? Or not?

I ran into this once but did something else instead of analyzing the issue. I think I can summarize:

The problem exists when one of the APs is in wireless range only of another wireless-only AP, and not of an AP that is wired. Example:

                  AP 1 ----------- AP 2 ------------ AP 3
          wired /        802.11s           802.11s
               /
Internet Router

where AP 3 will have problems with DHCP for itself and downstream devices, but no problem communicating upstream IF the address is manually set.

For anyone with total control of their network who is looking for a simpler workaround, I would suggest trying to make all points totally static.

That includes setting the address and gateway statically on each AP, and also having each AP run a DHCP server that serves a different range within your available address pool.
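As a rough illustration of the "different ranges" part, here is a small sketch that prints the uci commands for one AP's slice of the pool. It assumes the DHCP section is the default one named `lan`, and the helper name and the default numbers (first lease offset 100, 50 leases per AP) are made up. It only prints the commands so you can review them before running anything.

```shell
#!/bin/sh
# Sketch: print uci commands giving AP number N its own slice of the pool.
# Assumptions: the DHCP section is the default one named 'lan'; FIRST is the
# offset of the first lease address and SIZE is the number of leases per AP.

# dhcp_slice_cmds N [FIRST] [SIZE]
dhcp_slice_cmds() {
    n="$1"; first="${2:-100}"; size="${3:-50}"
    start=$((first + (n - 1) * size))
    echo "uci set dhcp.lan.start='$start'"
    echo "uci set dhcp.lan.limit='$size'"
    echo "uci commit dhcp"
}

# Example: dhcp_slice_cmds 2   (commands for the second AP)
```

With these defaults, three APs fit inside a /24 without their ranges overlapping.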

I have no problem communicating in each direction. I have never had this issue, neither on OpenWrt 19, nor on 21, nor on the current master.

What kind of configurations are you guys using?
Are you sure your bridge is configured correctly? All mesh and wlan interfaces must be bridged together for this to work.

However, there is a time where something like what you describe happens to me, but it's an exceptional case and a bug, which I described and reported here:
https://bugs.openwrt.org/index.php?do=details&task_id=4099

Above you said

all wifi interfaces and mesh interfaces are bridged with the lan ports.

Does that mean you're running all nodes on the same interconnected and fully bridged (layer 2) LAN / Ethernet? So your ARP and DHCP traffic travels over your LAN / Ethernet between all devices?

If that's the case, then that's not what the people here are talking about. They use the 802.11s mesh as the primary layer 2 interconnection, and the report is about ARP and DHCP only travelling one hop through the mesh, but not multiple hops (which they should, if the mesh is to be a full layer 2 network).