I've got a three-router mesh setup, with the whole thing working as an AP (no DHCP servers running on the routers, firewalls disabled, etc.). I'm using tri-band routers (with two 5 GHz radios), with one of the 5 GHz radios working as the backhaul.
This all works fine if I place the routers so that the main node is in the middle, one satellite is to the left of that main node, and the other satellite is to the right of that main node. I'm able to ping all nodes, traffic seems to flow correctly, etc. in this case. Graphically, this is what works:
Sat 1 ----------- Main Node ------------- Sat 2
However, if I place the nodes such that one satellite tries to communicate through another satellite in order to get back to the main node (because this is the best route), it doesn't work. Graphically, this is what doesn't work:
Main Node -------- Sat 1 -------------- Sat 2
In this case (which is really the one I need, since the network hardware that's all being hooked up is at one end of the building), Main Node can ping and access Sat 1 (and vice-versa), and Sat 1 and Sat 2 can ping and access each other, but Main Node and Sat 2 cannot communicate with each other. Further, no devices plugged into Sat 2 can communicate with Main Node (but can communicate with devices on Sat 1).
All nodes have the firewall, odhcpd, and dnsmasq disabled. In the non-working case, the nodes still all seem to know about each other: running `iw dev phy2-mesh0 mpath dump` shows that Sat 2 and Main Node each know the other's MAC address and know that they can reach each other via a next hop of Sat 1 (which should be correct?). Even so, I've never gotten any packets to make it between the two.
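For anyone who wants to repeat the path check, here is a minimal sketch of how the next hop for one destination could be pulled out of the path table. It assumes the `mpath dump` output keeps the destination MAC in the first column and the next hop in the second; the helper name `mesh_next_hop` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the next hop that the mesh path table lists
# for a destination MAC on a given mesh interface.
# Assumption: column 1 = destination MAC, column 2 = next-hop MAC.
mesh_next_hop() {
    ifname="$1"
    # Normalise the MAC to lower case so the comparison is case-insensitive.
    dest="$(printf '%s' "$2" | tr 'A-F' 'a-f')"
    iw dev "$ifname" mpath dump | awk -v d="$dest" '
        tolower($1) == d { print tolower($2); found = 1 }
        END { exit (found ? 0 : 1) }
    '
}
```

Running something like `mesh_next_hop phy2-mesh0 <MAC of Sat 2>` on Main Node should print Sat 1's mesh MAC if the path really goes through it, and return a non-zero status if there is no path entry at all.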
Various things I've tried:
- Changing `mesh_hwmp_rootmode` value on the main node (was initially 4, also tried 2).
- Changing `mesh_hwmp_rootmode` value on the satellites (was initially 0, also tried 2).
- Enabling `multicast_to_unicast_all` on all nodes.
- Enabling `mesh_fwding` on all nodes (it was already enabled on the main node, but not the satellites -- this was the one I thought would fix it, but it did not).
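Since a UCI option being set doesn't guarantee the driver actually applied it, one way to double-check the settings above is to read the live values back with `iw`. This is only a rough sketch: it assumes the interface name `phy2-mesh0` mentioned earlier, and the function name `show_mesh_params` is purely illustrative.

```shell
#!/bin/sh
# Hypothetical helper: print the runtime value of a few mesh parameters
# as reported by the driver, one "name: value" line per parameter.
show_mesh_params() {
    ifname="$1"
    for p in mesh_fwding mesh_hwmp_rootmode mesh_gate_announcements mesh_ttl; do
        printf '%s: ' "$p"
        iw dev "$ifname" get mesh_param "$p"
    done
}
```

Calling `show_mesh_params phy2-mesh0` on each node would show whether, for example, `mesh_fwding` is really 1 on Sat 1, which is the node that has to forward frames between the other two.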
This mesh isn't using mesh11sd; I just configured it manually, as I thought that would be doable (but maybe not?). Snippets of the configs, as currently set:
Satellite nodes:
config wifi-iface 'mesh'
    option device 'radio2'
    option encryption 'sae'
    option key 'redacted'
    option mesh_id 'MESH'
    option mode 'mesh'
    option network 'lan'
    option mesh_fwding '1'
    option mesh_gate_announcements '0'
    option mesh_hwmp_rootmode '0'
    option mesh_max_peer_links '3'
    option mesh_ttl '5'
    option mesh_element_ttl '3'
    option mesh_hwmp_max_preq_retries '2'
    option mesh_rssi_threshold '-75'
    option multicast_to_unicast_all '1'
Main node:
config wifi-iface 'mesh'
    option device 'radio2'
    option encryption 'sae'
    option key 'redacted'
    option mesh_id 'MESH'
    option mode 'mesh'
    option network 'lan'
    option mesh_fwding '1'
    option mesh_gate_announcements '1'
    option mesh_hwmp_rootmode '2'
    option mesh_max_peer_links '5'
    option mesh_ttl '5'
    option mesh_element_ttl '3'
    option mesh_hwmp_max_preq_retries '2'
    option mesh_rssi_threshold '-75'
    option multicast_to_unicast_all '1'
Anyone know what else I should try? This is driving me nuts. It feels like a Layer 3 problem, but I'm not sure why that would be when everything is just bridged on all the nodes and the firewall isn't even running.
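One possible way to narrow down Layer 2 vs. Layer 3 is to look at the neighbor (ARP) table right after a failed ping. A sketch, assuming `ip neigh` is available on the build; the helper name `neigh_state` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print the neighbor-table state (the last field of
# the first matching `ip neigh` line) for one IP address.
# Prints nothing if there is no entry for that address.
neigh_state() {
    ip neigh show to "$1" | awk 'NR == 1 { print $NF }'
}
```

For example, on Main Node, ping Sat 2's IP and then run `neigh_state <IP of Sat 2>`. A state of `FAILED` or `INCOMPLETE` would mean ARP was never answered, which points at frames not getting across the mesh rather than at anything IP-related. `REACHABLE` or `STALE` would suggest ARP got through and the problem is elsewhere.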
Full disclosure: this is an NSS build (on LN1301 / MX4300), so it is possible this is just an NSS issue, but I'm hoping I've just screwed something up in the config and it's workable...
Thanks!