802.11s mesh packages cause DHCP to misbehave

Hi folks!

I'm neck-deep in an issue that started happening when I upgraded to OpenWrt 23.05.5 (from, I think, 23.05.0). My device is a Totolink X5000R.

The issues originally started with my mesh, but I've brought everything down to a barebones configuration without a mesh as my setup was far too complicated to debug.

I now have the following:
[Image: network diagram of the setup]

What I'm observing, and what is baffling me, is that when I have the following packages installed: mesh11sd, kmod-nft-bridge, and wpad-mbedtls (or wpad-openssl; I've tried each of them, always after removing wpad-basic-mbedtls), I sometimes (and it really is random) stop receiving responses to DHCP discovery requests. This happens both over the wifi and over the cable, as long as the cable is connected to the AP.

This is purely with the packages present, but without any mesh configuration in place.

And then just by flashing a sysupgrade that only has wpad-mbedtls or just wpad-openssl everything starts working again.
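To confirm which of these packages a given image actually contains, a minimal sketch along these lines may help. The helper name `filter_mesh_pkgs` is made up for illustration; it only filters the `name - version` lines that `opkg list-installed` prints.

```shell
# Hypothetical helper: keep only the packages discussed in this thread
# from a package list read on stdin (one "name - version" entry per line).
filter_mesh_pkgs() {
    grep -E '^(mesh11sd|kmod-nft-bridge|wpad)'
}

# Typical use on the device:
#   opkg list-installed | filter_mesh_pkgs
```

Running it on both the working and the failing image and comparing the two lists shows exactly which packages differ between them.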

The thing that's really confusing me is that this isn't a clear black/white, install the packages and everything stops working. Some devices are still able to get a DHCP allocation given enough time (a few long minutes) but clearly something is misbehaving as I've done tcpdumps and there are discoveries simply going unanswered.
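For anyone wanting to quantify the unanswered discoveries, here is a rough sketch. It assumes the capture was saved as text from `tcpdump -v`, that the LAN bridge is named `br-lan` (the OpenWrt default, which may differ on your device), and that the verbose output contains `DHCP-Message ... : Discover` style lines. The function name `count_dhcp` and the file path are illustrative only.

```shell
# Save a verbose capture of DHCP traffic first, for example:
#   tcpdump -i br-lan -n -v 'udp port 67 or udp port 68' > /tmp/dhcp.txt
# (br-lan and /tmp/dhcp.txt are assumptions, adjust to your setup)

# Hypothetical helper: count DHCP messages of one type in a saved text capture.
#   $1: message type as printed by tcpdump, e.g. Discover or Offer
#   $2: path to the saved capture
count_dhcp() {
    # grep -c prints 0 and exits non-zero when nothing matches; ignore the status
    grep -c "DHCP-Message.*: $1" "$2" || true
}

# Typical use:
#   count_dhcp Discover /tmp/dhcp.txt
#   count_dhcp Offer /tmp/dhcp.txt
```

Clients retransmit, so the two counts will rarely match exactly, but a large number of Discovers against few or no Offers is consistent with requests going unanswered. Capturing on both the AP and the router at the same time would show on which side the packets disappear.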

I don't think it's the VLAN, as the AP is completely unaware of it, but I put it in the diagram for the sake of completeness. The DHCP server (i.e., the router) is VLAN-aware and internally uses 1681, but I'm connecting to the AP via an untagged port, so the AP shouldn't see VLANs at all...

The other thing is that I used to have a mesh working before I upgraded OpenWRT (and crucially, before I upgraded mesh11sd). Now I think I was using mesh11sd 2.X and from reading the forums, it sounds like there's been a major rework in 3.X and 4.X so I appreciate the mesh will require a bit more massaging, but I'd like to first understand why even without a mesh configuration and just with the packages present, things start to go wrong...

Any help is much appreciated!

A "dumb ap" connected by ethernet to a router lan port?

Did you upgrade and keep the previous config?

If you have an ethernet connection, why do you need a mesh backbone?

Have you read the documentation? - Particularly this:
https://openwrt.org/docs/guide-user/network/wifi/mesh/mesh11sd#are_you_sure_you_want_a_mesh

FYI: For any package and indeed even for OpenWrt itself, if the first number of the version is incremented, this means the new version is very likely to be incompatible with configs of the old version. See:

Having said all this, please show the output of:
uci show mesh11sd

If you have a GitHub account, you can also consider opening an issue there.

Yeah. I don't know whether dumb is the right terminology. It's a device that has a wireless AP but is on the same network, so the goal is that it doesn't do routing. It's on the same subnet as everything else. The wireless is on the only lan network.

Originally yes, but then from reading the docs I saw that that likely results in issues, so I'm starting from scratch now.

And what I'm trying at the moment (and that was giving me issues with DHCP) is from a reset to factory defaults, with literally no mesh11sd configuration at all.

From reading the docs, I expect mesh11sd defaults to manual config and therefore would just sit there dormant.

This device was meant to be the mesh "gateway". The one device that's wired to the network, and the other mesh nodes (which for now I've removed from the picture) would mesh against this one for internet connectivity.

I have. When I was actually trying to mesh I had a device configured following the settings under Gateway Node and the other devices with the config under Peer Node. Manually configured using uci.

But currently I don't have any mesh config set at all, and I'm just trying to work out why this AP is getting in the way of DHCP (which I was also observing in the mesh configuration - the mesh otherwise worked, but DHCP discovery was sometimes impacted, just like now).

Makes sense :+1:

I'll try this again later tonight and will post the output. I'll have to reflash the image I have with the packages as I currently don't have them installed.

Will do :+1: wanted to first work out whether this is an actual bug or just a problem of my config...

It will be good to have both here and GitHub. During office hours you will most likely get a rapid response on GitHub. Outside those times it is likely to be just me.

The most likely cause is a problem with config. As you say, with mesh11sd in default manual mode it should just sit there waiting for you to configure.
But if starting from scratch it is probably worth trying the Rapid Deployment method:
https://openwrt.org/docs/guide-user/network/wifi/mesh/rapiddeployment
That way you do not have to worry about "dumb ap", dhcp etc on the device(s) as it does it all for you.

Note on terminology, to prevent confusion:

  • A Mesh Peer - a device that connects to a mesh backhaul and cooperates with other mesh peers to mac-route layer 2 packets, becoming a node in the mesh backhaul.

  • A Mesh Gateway - not the same as an ip gateway. The mesh is concerned only with layer 2. A mesh gateway is a peer node that has a downstream connection on the same layer 2 network eg an access point wireless, an ethernet lan port etc.

  • A Mesh Portal - a special case of a Mesh Peer that has an ip-routing upstream link to another network, eg an Internet feed. A mesh portal can also be a mesh gateway, in that it can also have a downstream link eg a wireless network.


I got the output of that command now with the packages installed but no configuration at all:

root@OpenWrt:~# uci show mesh11sd
mesh11sd.setup=mesh11sd
mesh11sd.mesh_params=mesh11sd
mesh11sd.mesh_params.mesh_fwding='1'
mesh11sd.mesh_params.mesh_rssi_threshold='-65'
mesh11sd.mesh_params.mesh_gate_announcements='1'
mesh11sd.mesh_params.mesh_hwmp_rootmode='4'
mesh11sd.mesh_params.mesh_hwmp_rann_interval='5000'
mesh11sd.mesh_params.mesh_hwmp_root_interval='5000'
mesh11sd.mesh_params.mesh_hwmp_active_path_timeout='5000'
mesh11sd.mesh_params.mesh_hwmp_active_path_to_root_timeout='6000'
mesh11sd.mesh_params.mesh_max_peer_links='16'
mesh11sd.mesh_params.mesh_plink_timeout='10'

One thing I noticed, this keeps cropping up in the logs:

Sun Oct 27 20:03:41 2024 daemon.err mesh11sd[2178]: auto_config is disabled. Please configure a mesh interface or enable auto_config....

Perhaps more interestingly, this also shows up at startup:

Sun Oct 27 06:14:03 2024 daemon.warn mesh11sd[2178]: option auto_mesh_network is not configured, defaulting to [ lan ]. Is this what you intended?
Sun Oct 27 06:14:04 2024 daemon.notice mesh11sd[2178]: mesh11sd is in startup

and also noticed the following:

Sun Oct 27 06:14:04 2024 daemon.notice mesh11sd[2178]: mesh11sd v4.1.1 has started: mesh management mode 1

But maybe all of that is expected since I didn't set auto_config and have no other manual configuration of the mesh...
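To see how often each kind of mesh11sd message turns up, a small sketch like the following could be used. It assumes the default `logread` line layout shown above, where the sixth whitespace-separated field is `facility.severity`; the function name `mesh11sd_log_summary` is made up for illustration.

```shell
# Hypothetical helper: count mesh11sd log lines per facility.severity.
# Reads log lines on stdin in the format shown above, e.g.
#   Sun Oct 27 20:03:41 2024 daemon.err mesh11sd[2178]: ...
mesh11sd_log_summary() {
    grep 'mesh11sd\[' \
        | awk '{ count[$6]++ } END { for (s in count) print s, count[s] }' \
        | sort
}

# Typical use on the device:
#   logread | mesh11sd_log_summary
```

A steadily growing `daemon.err` count with nothing else changing would match the repeating "auto_config is disabled" message rather than point to a new failure.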

I'm going to be away until Friday, so won't be able to play around with this until then, but when I'm back I'll raise an issue on GitHub as well and will be able to run further tests.

In either case, many thanks for taking the time to help me out with this. I'm not so experienced in the world of 802.11s and am trying to learn as I go...

Self explanatory really....

Yes, all as expected, mesh11sd is waiting for you to do some configuration....

We should look at all your other configs too, when you get back.