Troubleshoot a Mesh 802.11s setup

The setup: a home network with 3 wireless routers:
Router 1: GL.iNet GL-MT6000 on OpenWrt SNAPSHOT r25465-53252eeb3b (router only supported on Snapshot so far)
Router 2: ZyXEL NWA55AXE on OpenWrt 22.03.5
Router 3: GL.iNet GL-MT3000 also on OpenWrt 22.03.5

Router 1 is connected to wired internet. All routers are connected to each other wirelessly via an 802.11s mesh, and all of them expose the same SSID. This works fine in the house: I have good, fast roaming and Wifi/Internet/... is 'fast' everywhere. 802.11r Fast Transition is also active, though I don't think it is relevant to the issue.

Except that every 2-3 days the mesh stops working. At least, that is what I call it. All routers still expose their wireless networks, but router 2 and router 3 lose their connection to router 1, so all wifi clients on router 2 and router 3 lose their internet connection. New wifi clients on these 2 routers cannot register and don't receive an IP address, which is logical as the DHCP server is on the LAN behind router 1. Routers 2 and 3 are also no longer accessible from the LAN or from router 1.
Router 1 remains fully functional for its wifi clients and LAN clients.
You also learn from this that a wifi client will not try to connect to another access point when DHCP is not working on the strongest AP. A side note: forget about it ;-).
A reboot of router 1 solves the problem.

I pasted the system and kernel log from router 1 below. But I only copied the system log after 10:32 and later found out the issue started around 9:55, and after the reboot the log from before 11:14 is gone... (strange: the system log shows info from April 8 and then jumps to today, April 14 11:14; why is it showing info from April 8 but losing info from today...?)
On router 2 I see "daemon.err hostapd: nl80211: kernel reports: key addition failed". Router 3 had the same message, but I am not sure of the time (I found out the time wasn't set correctly).

When the issue comes back (I have now started monitoring for the problem), I will update with more info.
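
A minimal sketch of what such monitoring could look like on router 1, assuming the mesh interface is `phy1-mesh0` (the name that shows up in the kernel log further down) and that a lost link shows up as fewer peers in the `ESTAB` state in the output of `iw dev <if> station dump`. The function names and the log path are made up for illustration. `/tmp` is RAM-backed on OpenWrt, so `LOG_FILE` would have to point at USB storage or similar if the record must survive a reboot.

```shell
#!/bin/sh
# Hypothetical defaults: adjust the interface name and log path to your setup.
MESH_IF="${MESH_IF:-phy1-mesh0}"
LOG_FILE="${LOG_FILE:-/tmp/mesh-watch.log}"

# Read `iw dev <if> station dump` output on stdin and print the number
# of peers whose mesh peer link state is ESTAB.
count_estab_peers() {
    # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true"
    grep -c 'mesh plink:[[:space:]]*ESTAB' || true
}

# Append one timestamped line with the current number of established peers.
log_mesh_state() {
    peers=$(iw dev "$MESH_IF" station dump 2>/dev/null | count_estab_peers)
    echo "$(date '+%Y-%m-%d %H:%M:%S') $MESH_IF established_peers=$peers" >> "$LOG_FILE"
}
```

Calling `log_mesh_state` once a minute (for example from cron) would give a timeline showing when the peer count drops, which could then be lined up with the system log.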

All ideas on how to investigate, and possible insights into the root cause, are welcome.

system info

root@MT6000:/# ubus call system board
{
	"kernel": "6.1.80",
	"hostname": "MT6000",
	"system": "ARMv8 Processor rev 4",
	"model": "GL.iNet GL-MT6000",
	"board_name": "glinet,gl-mt6000",
	"rootfs_type": "squashfs",
	"release": {
		"distribution": "OpenWrt",
		"version": "SNAPSHOT",
		"revision": "r25465-53252eeb3b",
		"target": "mediatek/filogic",
		"description": "OpenWrt SNAPSHOT r25465-53252eeb3b"
	}
}
root@MT6000:/# cat /etc/config/network

config interface 'loopback'
	option device 'lo'
	option proto 'static'
	option ipaddr '127.0.0.1'
	option netmask '255.0.0.0'

config globals 'globals'
	option ula_prefix 'xxx:xxxx:a47d::/48'

config device
	option name 'br-lan'
	option type 'bridge'
	list ports 'eth1'
	list ports 'lan1'
	list ports 'lan2'
	list ports 'lan3'
	list ports 'lan4'
	list ports 'lan5'

config interface 'lan'
	option device 'br-lan'
	option proto 'dhcp'

config interface 'wan'
	option device 'eth1'
	option proto 'dhcp'

config interface 'wan6'
	option device 'eth1'
	option proto 'dhcpv6'

root@MT6000:/# 
root@MT6000:/# cat /etc/config/wireless

config wifi-device 'radio0'
	option type 'mac80211'
	option path 'platform/soc/18000000.wifi'
	option channel '3'
	option band '2g'
	option htmode 'HE20'
	option cell_density '0'

config wifi-iface 'default_radio0'
	option device 'radio0'
	option network 'lan'
	option mode 'ap'
	option ssid 'backbone'
	option encryption 'sae'
	option key 'xxxx'

config wifi-device 'radio1'
	option type 'mac80211'
	option path 'platform/soc/18000000.wifi+1'
	option channel '36'
	option band '5g'
	option htmode 'HE80'
	option cell_density '0'

config wifi-iface 'default_radio1'
	option device 'radio1'
	option network 'lan'
	option mode 'ap'
	option ssid 'OpenWrt'
	option encryption 'none'
	option disabled '1'

config wifi-iface 'wifinet2'
	option device 'radio1'
	option mode 'mesh'
	option encryption 'sae'
	option mesh_id 'xxxx'
	option mesh_fwding '1'
	option mesh_rssi_threshold '0'
	option key 'xxxx'
	option network 'lan'

config wifi-iface 'wifinet3'
	option device 'radio1'
	option mode 'ap'
	option ssid 'ZONNEWIND2'
	option encryption 'psk2'
	option network 'lan'
	option key 'xxxx'
	option ieee80211r '1'
	option mobility_domain '1234'
	option ft_over_ds '0'
	option bss_transition '1'
	option ft_psk_generate_local '1'

config wifi-iface 'wifinet4'
	option device 'radio0'
	option mode 'ap'
	option ssid 'ZONNEWIND1'
	option encryption 'psk-mixed'
	option key 'xxxx'
	option network 'lan'

root@MT6000:/# cat /etc/config/dhcp

config dnsmasq
	option domainneeded '1'
	option boguspriv '1'
	option filterwin2k '0'
	option localise_queries '1'
	option rebind_protection '1'
	option rebind_localhost '1'
	option local '/lan/'
	option domain 'lan'
	option expandhosts '1'
	option nonegcache '0'
	option cachesize '1000'
	option authoritative '1'
	option readethers '1'
	option leasefile '/tmp/dhcp.leases'
	option resolvfile '/tmp/resolv.conf.d/resolv.conf.auto'
	option nonwildcard '1'
	option localservice '1'
	option ednspacket_max '1232'
	option filter_aaaa '0'
	option filter_a '0'

config dhcp 'lan'
	option interface 'lan'
	option start '100'
	option limit '150'
	option leasetime '12h'
	option dhcpv4 'server'
	option dhcpv6 'server'
	option ra 'server'
	list ra_flags 'managed-config'
	list ra_flags 'other-config'
	option ignore '1'

config dhcp 'wan'
	option interface 'wan'
	option ignore '1'

config odhcpd 'odhcpd'
	option maindhcp '0'
	option leasefile '/tmp/hosts/odhcpd'
	option leasetrigger '/usr/sbin/odhcpd-update'
	option loglevel '4'
root@MT6000:/# cat /etc/config/firewall
config defaults
	option syn_flood	1
	option input		REJECT
	option output		ACCEPT
	option forward		REJECT
# Uncomment this line to disable ipv6 rules
#	option disable_ipv6	1

config zone
	option name		lan
	list   network		'lan'
	option input		ACCEPT
	option output		ACCEPT
	option forward		ACCEPT

config zone
	option name		wan
	list   network		'wan'
	list   network		'wan6'
	option input		REJECT
	option output		ACCEPT
	option forward		REJECT
	option masq		1
	option mtu_fix		1

config forwarding
	option src		lan
	option dest		wan

# We need to accept udp packets on port 68,
# see https://dev.openwrt.org/ticket/4108
config rule
	option name		Allow-DHCP-Renew
	option src		wan
	option proto		udp
	option dest_port	68
	option target		ACCEPT
	option family		ipv4

# Allow IPv4 ping
config rule
	option name		Allow-Ping
	option src		wan
	option proto		icmp
	option icmp_type	echo-request
	option family		ipv4
	option target		ACCEPT

config rule
	option name		Allow-IGMP
	option src		wan
	option proto		igmp
	option family		ipv4
	option target		ACCEPT

# Allow DHCPv6 replies
# see https://github.com/openwrt/openwrt/issues/5066
config rule
	option name		Allow-DHCPv6
	option src		wan
	option proto		udp
	option dest_port	546
	option family		ipv6
	option target		ACCEPT

config rule
	option name		Allow-MLD
	option src		wan
	option proto		icmp
	option src_ip		fe80::/10
	list icmp_type		'130/0'
	list icmp_type		'131/0'
	list icmp_type		'132/0'
	list icmp_type		'143/0'
	option family		ipv6
	option target		ACCEPT

# Allow essential incoming IPv6 ICMP traffic
config rule
	option name		Allow-ICMPv6-Input
	option src		wan
	option proto	icmp
	list icmp_type		echo-request
	list icmp_type		echo-reply
	list icmp_type		destination-unreachable
	list icmp_type		packet-too-big
	list icmp_type		time-exceeded
	list icmp_type		bad-header
	list icmp_type		unknown-header-type
	list icmp_type		router-solicitation
	list icmp_type		neighbour-solicitation
	list icmp_type		router-advertisement
	list icmp_type		neighbour-advertisement
	option limit		1000/sec
	option family		ipv6
	option target		ACCEPT

# Allow essential forwarded IPv6 ICMP traffic
config rule
	option name		Allow-ICMPv6-Forward
	option src		wan
	option dest		*
	option proto		icmp
	list icmp_type		echo-request
	list icmp_type		echo-reply
	list icmp_type		destination-unreachable
	list icmp_type		packet-too-big
	list icmp_type		time-exceeded
	list icmp_type		bad-header
	list icmp_type		unknown-header-type
	option limit		1000/sec
	option family		ipv6
	option target		ACCEPT

config rule
	option name		Allow-IPSec-ESP
	option src		wan
	option dest		lan
	option proto		esp
	option target		ACCEPT

config rule
	option name		Allow-ISAKMP
	option src		wan
	option dest		lan
	option dest_port	500
	option proto		udp
	option target		ACCEPT


### EXAMPLE CONFIG SECTIONS
# do not allow a specific ip to access wan
#config rule
#	option src		lan
#	option src_ip	192.168.45.2
#	option dest		wan
#	option proto	tcp
#	option target	REJECT

# block a specific mac on wan
#config rule
#	option dest		wan
#	option src_mac	00:11:22:33:44:66
#	option target	REJECT

# block incoming ICMP traffic on a zone
#config rule
#	option src		lan
#	option proto	ICMP
#	option target	DROP

# port redirect port coming in on wan to lan
#config redirect
#	option src			wan
#	option src_dport	80
#	option dest			lan
#	option dest_ip		192.168.16.235
#	option dest_port	80
#	option proto		tcp

# port redirect of remapped ssh port (22001) on wan
#config redirect
#	option src		wan
#	option src_dport	22001
#	option dest		lan
#	option dest_port	22
#	option proto		tcp

### FULL CONFIG SECTIONS
#config rule
#	option src		lan
#	option src_ip	192.168.45.2
#	option src_mac	00:11:22:33:44:55
#	option src_port	80
#	option dest		wan
#	option dest_ip	194.25.2.129
#	option dest_port	120
#	option proto	tcp
#	option target	REJECT

#config redirect
#	option src		lan
#	option src_ip	192.168.45.2
#	option src_mac	00:11:22:33:44:55
#	option src_port		1024
#	option src_dport	80
#	option dest_ip	194.25.2.129
#	option dest_port	120
#	option proto	tcp

system & kernel log

Sun Apr 14 10:32:48 2024 daemon.warn odhcpd[1794]: No default route present, overriding ra_lifetime!
Sun Apr 14 10:32:53 2024 daemon.warn odhcpd[1794]: No default route present, overriding ra_lifetime!
Sun Apr 14 10:32:58 2024 daemon.warn odhcpd[1794]: No default route present, overriding ra_lifetime!
Sun Apr 14 10:33:44 2024 daemon.warn odhcpd[1794]: No default route present, overriding ra_lifetime!
Sun Apr 14 10:33:47 2024 daemon.warn odhcpd[1794]: No default route present, overriding ra_lifetime!
Sun Apr 14 10:33:51 2024 daemon.warn odhcpd[1794]: No default route present, overriding ra_lifetime!
Sun Apr 14 10:33:56 2024 daemon.warn odhcpd[1794]: No default route present, overriding ra_lifetime!
Sun Apr 14 10:34:01 2024 daemon.warn odhcpd[1794]: No default route present, overriding ra_lifetime!
Sun Apr 14 10:34:57 2024 daemon.warn odhcpd[1794]: No default route present, overriding ra_lifetime!
Sun Apr 14 10:35:01 2024 daemon.warn odhcpd[1794]: No default route present, overriding ra_lifetime!
Sun Apr 14 10:35:53 2024 daemon.warn odhcpd[1794]: No default route present, overriding ra_lifetime!
Sun Apr 14 10:35:57 2024 daemon.warn odhcpd[1794]: No default route present, overriding ra_lifetime!
Sun Apr 14 10:36:01 2024 daemon.warn odhcpd[1794]: No default route present, overriding ra_lifetime!
Sun Apr 14 10:36:54 2024 daemon.warn odhcpd[1794]: No default route present, overriding ra_lifetime!
Sun Apr 14 10:36:58 2024 daemon.warn odhcpd[1794]: No default route present, overriding ra_lifetime!
Sun Apr 14 10:37:02 2024 daemon.warn odhcpd[1794]: No default route present, overriding ra_lifetime!
Sun Apr 14 10:37:14 2024 daemon.warn odhcpd[1794]: No default route present, overriding ra_lifetime!
Sun Apr 14 10:37:25 2024 daemon.warn odhcpd[1794]: No default route present, overriding ra_lifetime!
Sun Apr 14 10:37:55 2024 daemon.warn odhcpd[1794]: No default route present, overriding ra_lifetime!
Sun Apr 14 10:37:59 2024 daemon.warn odhcpd[1794]: No default route present, overriding ra_lifetime!
Sun Apr 14 10:38:04 2024 daemon.warn odhcpd[1794]: No default route present, overriding ra_lifetime!
Sun Apr 14 10:38:56 2024 daemon.warn odhcpd[1794]: No default route present, overriding ra_lifetime!
Sun Apr 14 10:39:00 2024 daemon.warn odhcpd[1794]: No default route present, overriding ra_lifetime!
Sun Apr 14 10:39:04 2024 daemon.warn odhcpd[1794]: No default route present, overriding ra_lifetime!
Sun Apr 14 10:39:56 2024 daemon.warn odhcpd[1794]: No default route present, overriding ra_lifetime!
Sun Apr 14 10:40:01 2024 daemon.warn odhcpd[1794]: No default route present, overriding ra_lifetime!
Sun Apr 14 10:40:05 2024 daemon.warn odhcpd[1794]: No default route present, overriding ra_lifetime!
Sun Apr 14 10:40:14 2024 daemon.warn odhcpd[1794]: No default route present, overriding ra_lifetime!
Sun Apr 14 10:40:59 2024 daemon.warn odhcpd[1794]: No default route present, overriding ra_lifetime!
Sun Apr 14 10:41:04 2024 daemon.warn odhcpd[1794]: No default route present, overriding ra_lifetime!
Sun Apr 14 10:41:08 2024 daemon.warn odhcpd[1794]: No default route present, overriding ra_lifetime!
Sun Apr 14 10:42:02 2024 daemon.warn odhcpd[1794]: No default route present, overriding ra_lifetime!
Sun Apr 14 10:42:07 2024 daemon.warn odhcpd[1794]: No default route present, overriding ra_lifetime!
Sun Apr 14 10:42:11 2024 daemon.warn odhcpd[1794]: No default route present, overriding ra_lifetime!
Sun Apr 14 10:43:05 2024 daemon.warn odhcpd[1794]: No default route present, overriding ra_lifetime!
Sun Apr 14 10:43:10 2024 daemon.warn odhcpd[1794]: No default route present, overriding ra_lifetime!
Sun Apr 14 10:43:14 2024 daemon.warn odhcpd[1794]: No default route present, overriding ra_lifetime!
Sun Apr 14 10:43:14 2024 daemon.warn odhcpd[1794]: No default route present, overriding ra_lifetime!
Sun Apr 14 10:43:33 2024 daemon.warn odhcpd[1794]: No default route present, overriding ra_lifetime!
Sun Apr 14 10:43:52 2024 daemon.warn odhcpd[1794]: No default route present, overriding ra_lifetime!
Sun Apr 14 10:44:08 2024 daemon.warn odhcpd[1794]: No default route present, overriding ra_lifetime!
Sun Apr 14 10:44:11 2024 daemon.warn odhcpd[1794]: No default route present, overriding ra_lifetime!
Sun Apr 14 10:44:13 2024 daemon.warn odhcpd[1794]: No default route present, overriding ra_lifetime!
Sun Apr 14 10:44:16 2024 daemon.warn odhcpd[1794]: No default route present, overriding ra_lifetime!
Sun Apr 14 10:44:17 2024 daemon.warn odhcpd[1794]: No default route present, overriding ra_lifetime!
Sun Apr 14 10:45:11 2024 daemon.warn odhcpd[1794]: No default route present, overriding ra_lifetime!
Sun Apr 14 10:45:16 2024 daemon.warn odhcpd[1794]: No default route present, overriding ra_lifetime!
Sun Apr 14 10:45:20 2024 daemon.warn odhcpd[1794]: No default route present, overriding ra_lifetime!
Sun Apr 14 10:46:16 2024 daemon.warn odhcpd[1794]: No default route present, overriding ra_lifetime!
Sun Apr 14 10:46:21 2024 daemon.warn odhcpd[1794]: No default route present, overriding ra_lifetime!
Sun Apr 14 10:46:30 2024 daemon.warn odhcpd[1794]: No default route present, overriding ra_lifetime!
Sun Apr 14 10:47:15 2024 daemon.warn odhcpd[1794]: No default route present, overriding ra_lifetime!
Sun Apr 14 10:47:19 2024 daemon.warn odhcpd[1794]: No default route present, overriding ra_lifetime!
Sun Apr 14 10:47:24 2024 daemon.warn odhcpd[1794]: No default route present, overriding ra_lifetime!
Sun Apr 14 10:47:51 2024 daemon.warn odhcpd[1794]: No default route present, overriding ra_lifetime!
Sun Apr 14 10:48:18 2024 daemon.warn odhcpd[1794]: No default route present, overriding ra_lifetime!
Sun Apr 14 10:48:23 2024 daemon.warn odhcpd[1794]: No default route present, overriding ra_lifetime!
Sun Apr 14 10:48:28 2024 daemon.warn odhcpd[1794]: No default route present, overriding ra_lifetime!
Sun Apr 14 10:48:34 2024 daemon.warn odhcpd[1794]: No default route present, overriding ra_lifetime!
Sun Apr 14 10:49:21 2024 daemon.warn odhcpd[1794]: No default route present, overriding ra_lifetime!
Sun Apr 14 10:49:25 2024 daemon.warn odhcpd[1794]: No default route present, overriding ra_lifetime!
Sun Apr 14 10:49:29 2024 daemon.warn odhcpd[1794]: No default route present, overriding ra_lifetime!
Sun Apr 14 10:50:23 2024 daemon.warn odhcpd[1794]: No default route present, overriding ra_lifetime!
Sun Apr 14 10:50:28 2024 daemon.warn odhcpd[1794]: No default route present, overriding ra_lifetime!
Sun Apr 14 10:50:32 2024 daemon.warn odhcpd[1794]: No default route present, overriding ra_lifetime!
Sun Apr 14 10:51:26 2024 daemon.warn odhcpd[1794]: No default route present, overriding ra_lifetime!
Sun Apr 14 10:51:30 2024 daemon.warn odhcpd[1794]: No default route present, overriding ra_lifetime!
Sun Apr 14 10:51:35 2024 daemon.warn odhcpd[1794]: No default route present, overriding ra_lifetime!
Sun Apr 14 10:52:33 2024 daemon.warn odhcpd[1794]: No default route present, overriding ra_lifetime!
Sun Apr 14 10:52:38 2024 daemon.warn odhcpd[1794]: No default route present, overriding ra_lifetime!
Sun Apr 14 10:53:19 2024 daemon.warn odhcpd[1794]: No default route present, overriding ra_lifetime!
Sun Apr 14 10:53:32 2024 daemon.warn odhcpd[1794]: No default route present, overriding ra_lifetime!
Sun Apr 14 10:53:36 2024 daemon.warn odhcpd[1794]: No default route present, overriding ra_lifetime!
Sun Apr 14 10:53:41 2024 daemon.warn odhcpd[1794]: No default route present, overriding ra_lifetime!
Sun Apr 14 10:54:01 2024 daemon.warn odhcpd[1794]: No default route present, overriding ra_lifetime!
Sun Apr 14 10:54:33 2024 daemon.warn odhcpd[1794]: No default route present, overriding ra_lifetime!
Sun Apr 14 10:54:37 2024 daemon.warn odhcpd[1794]: No default route present, overriding ra_lifetime!
Sun Apr 14 10:54:41 2024 daemon.warn odhcpd[1794]: No default route present, overriding ra_lifetime!
Sun Apr 14 10:55:24 2024 daemon.warn odhcpd[1794]: No default route present, overriding ra_lifetime!
Sun Apr 14 10:55:28 2024 daemon.warn odhcpd[1794]: No default route present, overriding ra_lifetime!
Sun Apr 14 10:55:33 2024 daemon.warn odhcpd[1794]: No default route present, overriding ra_lifetime!
Sun Apr 14 10:56:05 2024 daemon.warn odhcpd[1794]: No default route present, overriding ra_lifetime!
Sun Apr 14 10:56:09 2024 daemon.warn odhcpd[1794]: No default route present, overriding ra_lifetime!
Sun Apr 14 10:57:02 2024 daemon.warn odhcpd[1794]: No default route present, overriding ra_lifetime!
Sun Apr 14 10:59:05 2024 daemon.warn odhcpd[1794]: No default route present, overriding ra_lifetime!
Sun Apr 14 11:05:05 2024 daemon.warn odhcpd[1794]: No default route present, overriding ra_lifetime!
Sun Apr 14 11:07:11 2024 daemon.info hostapd: phy0-ap1: STA ba:3d:1d:3f:de:10 IEEE 802.11: authenticated
Sun Apr 14 11:07:11 2024 daemon.info hostapd: phy0-ap1: STA ba:3d:1d:3f:de:10 IEEE 802.11: associated (aid 2)
Sun Apr 14 11:07:11 2024 daemon.notice hostapd: phy0-ap1: AP-STA-CONNECTED ba:3d:1d:3f:de:10 auth_alg=open
Sun Apr 14 11:07:11 2024 daemon.info hostapd: phy0-ap1: STA ba:3d:1d:3f:de:10 RADIUS: starting accounting session DE0QF03D88B637D1
Sun Apr 14 11:07:11 2024 daemon.info hostapd: phy0-ap1: STA ba:3d:1d:3f:de:10 WPA: pairwise key handshake completed (RSN)
Sun Apr 14 11:07:11 2024 daemon.notice hostapd: phy0-ap1: EAPOL-4WAY-HS-COMPLETED ba:3d:1d:3f:de:10
Sun Apr 14 11:07:12 2024 daemon.warn odhcpd[1794]: No default route present, overriding ra_lifetime!
Sun Apr 14 11:07:28 2024 daemon.warn odhcpd[1794]: No default route present, overriding ra_lifetime!
Sun Apr 14 11:07:38 2024 daemon.warn odhcpd[1794]: No default route present, overriding ra_lifetime!
Sun Apr 14 11:10:17 2024 daemon.notice netifd: Network device 'lan2' link is down
Sun Apr 14 11:10:17 2024 kern.info kernel: [247228.427094] br-lan: port 3(lan2) entered disabled state
Sun Apr 14 11:10:17 2024 kern.info kernel: [247228.428324] mt7530-mdio mdio-bus:1f lan2: Link is Down
Sun Apr 14 11:10:20 2024 kern.info kernel: [247230.841473] mt7530-mdio mdio-bus:1f lan2: Link is Up - 10Mbps/Full - flow control rx/tx
Sun Apr 14 11:10:20 2024 kern.info kernel: [247230.849602] br-lan: port 3(lan2) entered blocking state
Sun Apr 14 11:10:20 2024 kern.info kernel: [247230.854901] br-lan: port 3(lan2) entered forwarding state
Sun Apr 14 11:10:20 2024 daemon.notice netifd: Network device 'lan2' link is up
Sun Apr 14 11:10:21 2024 daemon.notice netifd: Network device 'lan2' link is down
Sun Apr 14 11:10:21 2024 kern.info kernel: [247231.922370] br-lan: port 3(lan2) entered disabled state
Sun Apr 14 11:10:21 2024 kern.info kernel: [247231.924237] mt7530-mdio mdio-bus:1f lan2: Link is Down
Sun Apr 14 11:10:26 2024 kern.info kernel: [247237.030713] mt7530-mdio mdio-bus:1f lan2: Link is Up - 1Gbps/Full - flow control rx/tx
Sun Apr 14 11:10:26 2024 kern.info kernel: [247237.038751] br-lan: port 3(lan2) entered blocking state
Sun Apr 14 11:10:26 2024 kern.info kernel: [247237.044058] br-lan: port 3(lan2) entered forwarding state
Sun Apr 14 11:10:26 2024 daemon.notice netifd: Network device 'lan2' link is up
Sun Apr 14 11:10:34 2024 daemon.notice netifd: Network device 'lan2' link is down
Sun Apr 14 11:10:34 2024 kern.info kernel: [247245.094563] br-lan: port 3(lan2) entered disabled state
Sun Apr 14 11:10:34 2024 kern.info kernel: [247245.100043] mt7530-mdio mdio-bus:1f lan2: Link is Down
Sun Apr 14 11:10:56 2024 kern.info kernel: [247266.456451] mt7530-mdio mdio-bus:1f lan2: Link is Up - 1Gbps/Full - flow control rx/tx
Sun Apr 14 11:10:56 2024 kern.info kernel: [247266.464487] br-lan: port 3(lan2) entered blocking state
Sun Apr 14 11:10:56 2024 kern.info kernel: [247266.469800] br-lan: port 3(lan2) entered forwarding state
Sun Apr 14 11:10:56 2024 daemon.notice netifd: Network device 'lan2' link is up
Sun Apr 14 11:11:02 2024 daemon.notice netifd: Network device 'lan2' link is down
Sun Apr 14 11:11:02 2024 kern.info kernel: [247272.875767] br-lan: port 3(lan2) entered disabled state
Sun Apr 14 11:11:02 2024 kern.info kernel: [247272.880538] mt7530-mdio mdio-bus:1f lan2: Link is Down
Sun Apr 14 11:11:06 2024 kern.info kernel: [247276.807398] mt7530-mdio mdio-bus:1f lan2: Link is Up - 1Gbps/Full - flow control rx/tx
Sun Apr 14 11:11:06 2024 kern.info kernel: [247276.815423] br-lan: port 3(lan2) entered blocking state
Sun Apr 14 11:11:06 2024 kern.info kernel: [247276.820731] br-lan: port 3(lan2) entered forwarding state
Sun Apr 14 11:11:06 2024 daemon.notice netifd: Network device 'lan2' link is up




[   23.741684] br-lan: port 8(phy1-mesh0) entered forwarding state
[197320.705956] br-lan: port 3(lan2) entered disabled state
[197320.711414] mt7530-mdio mdio-bus:1f lan2: Link is Down
[197326.398697] mt7530-mdio mdio-bus:1f lan2: Link is Up - 1Gbps/Full - flow control rx/tx
[197326.406737] br-lan: port 3(lan2) entered blocking state
[197326.412046] br-lan: port 3(lan2) entered forwarding state
[217868.352731] br-lan: port 3(lan2) entered disabled state
[217868.357947] mt7530-mdio mdio-bus:1f lan2: Link is Down
[217871.373875] mt7530-mdio mdio-bus:1f lan2: Link is Up - 1Gbps/Full - flow control rx/tx
[217871.381912] br-lan: port 3(lan2) entered blocking state
[217871.387222] br-lan: port 3(lan2) entered forwarding state
[217922.206521] br-lan: port 3(lan2) entered disabled state
[217922.212134] mt7530-mdio mdio-bus:1f lan2: Link is Down
[217925.313799] mt7530-mdio mdio-bus:1f lan2: Link is Up - 1Gbps/Full - flow control rx/tx
[217925.321837] br-lan: port 3(lan2) entered blocking state
[217925.327136] br-lan: port 3(lan2) entered forwarding state
[217942.663982] br-lan: port 3(lan2) entered disabled state
[217942.669373] mt7530-mdio mdio-bus:1f lan2: Link is Down
[217945.197253] mt7530-mdio mdio-bus:1f lan2: Link is Up - 10Mbps/Full - flow control rx/tx
[217945.205371] br-lan: port 3(lan2) entered blocking state
[217945.210687] br-lan: port 3(lan2) entered forwarding state
[247228.427094] br-lan: port 3(lan2) entered disabled state
[247228.428324] mt7530-mdio mdio-bus:1f lan2: Link is Down
[247230.841473] mt7530-mdio mdio-bus:1f lan2: Link is Up - 10Mbps/Full - flow control rx/tx
[247230.849602] br-lan: port 3(lan2) entered blocking state
[247230.854901] br-lan: port 3(lan2) entered forwarding state
[247231.922370] br-lan: port 3(lan2) entered disabled state
[247231.924237] mt7530-mdio mdio-bus:1f lan2: Link is Down
[247237.030713] mt7530-mdio mdio-bus:1f lan2: Link is Up - 1Gbps/Full - flow control rx/tx
[247237.038751] br-lan: port 3(lan2) entered blocking state
[247237.044058] br-lan: port 3(lan2) entered forwarding state
[247245.094563] br-lan: port 3(lan2) entered disabled state
[247245.100043] mt7530-mdio mdio-bus:1f lan2: Link is Down
[247266.456451] mt7530-mdio mdio-bus:1f lan2: Link is Up - 1Gbps/Full - flow control rx/tx
[247266.464487] br-lan: port 3(lan2) entered blocking state
[247266.469800] br-lan: port 3(lan2) entered forwarding state
[247272.875767] br-lan: port 3(lan2) entered disabled state
[247272.880538] mt7530-mdio mdio-bus:1f lan2: Link is Down
[247276.807398] mt7530-mdio mdio-bus:1f lan2: Link is Up - 1Gbps/Full - flow control rx/tx
[247276.815423] br-lan: port 3(lan2) entered blocking state
[247276.820731] br-lan: port 3(lan2) entered forwarding state

Edit April 14 14:11: added system info

Please do the above, thanks.

The time leaping is because there's no real time clock chip present, so the device comes up with a date and then NTP kicks in to set the correct date and time.

I added the system info. If I missed removing any private info, let me know ...and I will reconfigure my system ;-).

This is a known problem with certain wireless hardware (basically Mediatek I'm pretty sure). I made a thread about running mesh and ran into this (what I think is the same problem as you): A foolproof guide to setting up further wifi coverage - #32 by alex24

The reply: A foolproof guide to setting up further wifi coverage - #37 by bluewavenet

You can try mesh11sd to help, but I think it may be a problem with multiple SSIDs, which you need for mesh connectivity. Apparently setting up WDS instead with the same SSID can fix stability problems: OpenWrt for Zyxel WSM20 (Multy M1) development discussion - #846 by falver

Of course it's not true mesh, but it should work without problems. I never went too in depth.

Sorry, but 802.11s mesh does not use multiple SSIDs. It has a common mesh id that must be the same on all nodes as it identifies the mesh.

A typical output of iwinfo would look like this:

root@OpenWrt:~# iwinfo
m-11s-0   ESSID: "92d490daf46cfe534c56ddd669297e"
          Access Point: 96:83:C4:08:14:83
          Mode: Mesh Point  Channel: 1 (2.412 GHz)  HT Mode: HT40
          Center Channel 1: 3 2: unknown
          Tx-Power: 20 dBm  Link Quality: 35/70
          Signal: -75 dBm  Noise: unknown
          Bit Rate: 120.5 MBit/s
          Encryption: WPA3 SAE (CCMP)
          Type: nl80211  HW Mode(s): 802.11b/g/n
          Hardware: embedded [MediaTek MT7628]
          TX power offset: none
          Frequency offset: none
          Supports VAPs: yes  PHY name: phy0

phy0-ap0  ESSID: "OpenWrt-2g-1483"
          Access Point: 94:83:C4:08:14:83
          Mode: Master  Channel: 1 (2.412 GHz)  HT Mode: HT40
          Center Channel 1: 3 2: unknown
          Tx-Power: 20 dBm  Link Quality: unknown/70
          Signal: unknown  Noise: unknown
          Bit Rate: unknown
          Encryption: none
          Type: nl80211  HW Mode(s): 802.11b/g/n
          Hardware: embedded [MediaTek MT7628]
          TX power offset: none
          Frequency offset: none
          Supports VAPs: yes  PHY name: phy0

root@OpenWrt:~#

You can see here that iwinfo does not differentiate between ssid and meshid so it only adds to the confusion.

This is typical with more than 2 mesh nodes when required parameters are not set.
These parameters however cannot be set in the wireless config.
This is where mesh11sd comes in as it runs as a service, updating mesh parameters as required.

Problem though with the version of mesh11sd on snapshot (3.1.1-r1) is that it can fail with kernel 6+ depending on wireless drivers.

There is a fix for this however, in testing now, with release planned shortly.

Thanks all, some interesting feedback... ;-).

I'll check some things...

I am confused here: what do you mean by WDS on the same SSID?

Not sure what you mean by 'not a real mesh'. Note that I do need layer 2 connectivity on my network (which is what I understand as 'mesh': wifi backhaul on layer 2).

Is the 'more than 2 mesh nodes' the key? I can of course create a second mesh on the 'Router 1'.
Mesh 1 connect router 1 and router 2, Mesh 2 connects router 1 and router 3. Would this fix the issue?

I just tried to read up on mesh11sd; it is a bit confusing ;-). I need a mesh11sd explanation for dummies...
What I understand: I just install mesh11sd on top of my mesh setup on all nodes and it will do the job?
I ask because I would expect somebody to be able to indicate which parameters are 'wrong' without the mesh11sd service (maybe somebody did this but I missed it...).

It is not that parameters are wrong, but not set, and cannot be set from the wireless config as they have to be set after the mesh interface has come up. The wireless config can only set static options needed to bring up an interface.

The mesh11sd service can and does set mesh operating parameters dynamically as required.

Here is a link to the possible mesh parameters:
https://openwrt.org/docs/guide-user/network/wifi/mesh/mesh11sd#mesh_parameter_options
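
For reference, such parameters can also be read and changed by hand with `iw` once the mesh interface is up. A rough sketch, assuming the interface is called `phy1-mesh0` (taken from the kernel log earlier in the thread); the wrapper function names are invented for illustration. The three parameter names are standard `iw` mesh parameters, but which values suit this particular network is not something this thread establishes.

```shell
#!/bin/sh
# Print a few 802.11s parameters of a running mesh interface.
# $1: mesh interface name (assumption: phy1-mesh0 on router 1)
show_mesh_params() {
    ifname="$1"
    for p in mesh_hwmp_rootmode mesh_gate_announcements mesh_fwding; do
        # `iw dev <if> get mesh_param <name>` prints the current value
        printf '%s=%s\n' "$p" "$(iw dev "$ifname" get mesh_param "$p" 2>/dev/null)"
    done
}

# Set one parameter on the running interface.
# $1: interface, $2: parameter name, $3: value
set_mesh_param() {
    iw dev "$1" set mesh_param "$2=$3"
}
```

Usage would be along the lines of `show_mesh_params phy1-mesh0` and `set_mesh_param phy1-mesh0 mesh_fwding 1` (the latter simply mirrors the `mesh_fwding '1'` already in the wireless config). Anything set this way applies to the running interface only, so it would presumably need reapplying whenever the interface restarts.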

As you are "getting by" for a few days at a time, I would suggest you wait for the new version to be released, hopefully in a few days from now (2024-4-14).

Thanks for the feedback.
That is the link to the page that got me completely confused about what mesh11sd is... ;-) (like why one would need this in the first place...).

In what software would this be released (another snapshot?). Or when I install the daemon in a week, would I automatically get this version?

Note that at the moment I am thinking more of creating a second mesh on router 1 so that routers 2 and 3 connect for sure only to router 1. I see this as the only option to prevent a useless connection between 2 and 3.

Is it a bit clearer now? It is good to talk to people struggling with the documentation, it helps to get it improved.

It will first be released in master/snapshot, then quickly after in 23.05.

Errr - nope, a bad idea if it is indeed possible. Most wireless drivers only allow 1 mesh interface per physical radio.

Do this every night for now. You will hopefully only have to wait a few days.

The GUI allows it ;-). But I removed it again and didn't try to activate it.
Edit: when trying to understand what it would technically mean, my idea was that 'mesh' wifi is just another SSID, bridged with some network. Easy enough in software. But then I thought it probably actually runs in hardware, and I had no idea what the HW engineer had in mind or how the software engineers could map this from some network model onto all the different hardware, so I removed it again...

But this is why I got a bit frustrated with this topic. I am an old system engineer. I have no problem with my systems not working, but I have a problem when I don't know why... In this case: why does it look like the bridge between my networks is broken? What is actually not working, and how could I fix it without a restart? And what part of this system should restart? In my professional life restarting a device always felt like a full defeat... At home it is even worse: my children will spot when the network reboots and complain that whatever they are monitoring lost its control for 30 seconds...
I understand it is useful advice to do restarts, but I just hate to do that.

More seriously: is there a log that records the restarts of OpenWrt? This would be better proof of how often this happens.

Might work on that chipset then, but really don't even go there, it sounds like more trouble than it is worth.

Nope, it is very different. Every meshnode has just a single mesh interface for multi point to multi point communications. Compare this with wds that has both an ap and a sta mode interface on each device.

Me too :wink:

After a period of time the layer 2 mac-routing of the mesh backhaul stops working because HWMP (Hybrid Wireless Mesh Protocol) is not turned on. If there is no traffic for a period of time, nodes lose contact with each other. Turning HWMP on prevents this scenario from happening.
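If you want to see what a node currently has set for HWMP, the kernel's mesh parameters can be listed with iw. This is only a minimal sketch: it assumes the mesh interface is called mesh0 (check `iw dev` for the real name), and `hwmp_params` and `show_hwmp` are helper names made up for this example.

```shell
#!/bin/sh
# Assumption: the mesh interface is called mesh0; override with MESH_IF=<name>.
MESH_IF="${MESH_IF:-mesh0}"

# Keep only the HWMP-related lines of "iw dev <if> get mesh_param" output,
# which prints one "name = value" pair per line.
hwmp_params() {
    grep '^mesh_hwmp_'
}

# List the HWMP settings of the live mesh interface.
show_hwmp() {
    iw dev "$MESH_IF" get mesh_param | hwmp_params
}
```

Running `show_hwmp` on each node lets you compare the settings across the mesh. A `mesh_hwmp_rootmode` of 0 means that node is not acting as an HWMP root.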

Rather than rebooting, try running the wifi command. This will restart the radios (yes the kids will still drop out but the process is much faster than a reboot).
The mesh interface will restart and try to connect to other mesh nodes, restoring the link(s).
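If you would rather not do this by hand, the check can be scripted. The following is a rough sketch, not a tested solution: it assumes the mesh interface is called mesh0, and the function names and the `mesh-watchdog` log tag are made up for this example. It counts peers whose link state is ESTAB in the output of `iw dev <if> station dump` and only runs `wifi` when there are none.

```shell
#!/bin/sh
# Assumption: the mesh interface is called mesh0; override with MESH_IF=<name>.
MESH_IF="${MESH_IF:-mesh0}"

# Count established peer links in "iw dev <if> station dump" output on stdin.
# grep -c prints 0 and returns non-zero when nothing matches, hence "|| true".
count_estab_peers() {
    grep -c 'mesh plink:[[:space:]]*ESTAB' || true
}

# Restart the radios only when no mesh peer link is established.
check_mesh() {
    peers=$(iw dev "$MESH_IF" station dump | count_estab_peers)
    if [ "$peers" -eq 0 ]; then
        logger -t mesh-watchdog "no established peers on $MESH_IF, running wifi"
        wifi
    fi
}

# Only act when called as "<script> run", so the file can also be sourced.
if [ "${1:-}" = "run" ]; then
    check_mesh
fi
```

Saved under a path of your choice and called with the argument `run` from cron every few minutes, this also leaves a line in the system log each time it triggers, at least until the next reboot. Be aware that a node which legitimately has no peers in range would restart its radios on every run.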

I am only suggesting this workaround to the problem while you are waiting for the mesh11sd update.

No, not really as logs do not survive reboots.

You do not need to prove this happens, I know it does :wink:


No, still confusing... For example: when I configured my mesh, it surprised me that there was no way to indicate what 'role' each device plays in the mesh (which one has 'internet access', which device is just an AP...). I understand this daemon will manage such config, which makes a lot of sense. But I read something like '...if there is internet access on a router it will become a portal mesh node and be the DHCP server...'. But I don't want this device to be my DHCP server. And how does it check if there is internet, and how do I check the result of this check? And why is such info only in the 'installation' section and not given in the overview? And the comments about the 'non-mesh backhaul' seem interesting for a developer of the software, but I have no clue why I should know this.

Or also: there is a comment stating "It is essential that all meshnodes are configured to use the same radio channel, the same key and the same mesh_id. By default, Mesh11sd will configure this [walter: I assume the writer means the same radio channel, not the mesh_id/key, that gets automatically configured] for you". But in my installation, without Mesh11sd, this is also what happens: when I set my radios to 'auto channel', the channels all end up the same. So what will the daemon do?

I am still confused about the logical order of things... Do I first create my mesh wireless devices on the routers and then install/configure/activate the daemon?
A statement like 'After completing installation, no configuration is required for the mesh to become active': Euh, but if I don't need to do anything, how will each router know it belongs to the mesh?
And there is much more... ;-).

Anyway: if I think I have ideas on improving the documentation, do I create an issue or send you something directly (I can also update it myself, but chances are too high I would get it wrong)?

I checked a bit on the difference between WDS and mesh; basically mesh comes with the promise that it will handle the layer 2 routing fully for you, while WDS needs the different roles as a minimal config. Maybe WDS maps better to my use case, as the roles on each router are very clear. But then again: I just found another option to route an ethernet cable in my house, and then mesh would be superior, with multiple routers that provide internet connections. And mesh still sounds cooler...

It was just for me to 'calculate' how often it has happened. I could also disable a router/AP for some time until the better daemon arrives.

mesh11sd has two modes, auto_config and manual config.

Auto is the default. This allows a mesh backhaul to be very simply rolled out using the exact same config baked into a flash image. This is a "no-brainer" task with no actual configuration needed to get the mesh up and running.

Manual mode is for the advanced user who wants to do all the interface configuration along with custom services of all kinds. In this case, mesh11sd just manages the mesh parameters.

All meshnodes are equal peers in the mesh network. There are no mesh roles.

There are other things that a mesh peer can also do, like connect to an upstream network, e.g. the Internet, or connect to a downstream network, e.g. an access point for users to connect their devices to (these have names: mesh portal and mesh gate respectively).

That is fine, but you will have to configure manual mode, which requires knowledge of how a mesh works.

A mesh network is a wireless network that only mesh nodes can connect to and provides a multi-point to multi-point "array" of connections between meshnodes.
This is known as the mesh backhaul.

Normal wifi enabled user devices cannot connect to a mesh network.

Think of the "mesh backhaul" as a virtual ethernet switch. Layer 2 packets are mac-routed through this virtual switch to their intended destination.
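You can look at this mac-routing on a running node: `iw dev <if> mpath dump` lists, for every destination the node knows, which neighbour is the next hop. A small sketch, again assuming the interface is called mesh0; `mesh_next_hops` and `show_mesh_paths` are names made up for this example.

```shell
#!/bin/sh
# Assumption: the mesh interface is called mesh0; override with MESH_IF=<name>.
MESH_IF="${MESH_IF:-mesh0}"

# Reduce "iw dev <if> mpath dump" output to "destination via next-hop" lines.
# The first line of the dump is a column header, so it is skipped.
mesh_next_hops() {
    awk 'NR > 1 && NF >= 2 { print $1 " via " $2 }'
}

# Show the current path table of the live mesh interface.
show_mesh_paths() {
    iw dev "$MESH_IF" mpath dump | mesh_next_hops
}
```

If a destination and its next hop are the same MAC address, the two nodes talk directly; otherwise the frame is relayed through the next hop. An empty table on a node that should have peers is a hint that the backhaul is down.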

A mesh can have links within its infrastructure that are non-mesh, typically an ethernet cable between two particular meshnodes. This is what is meant by "non-mesh backhaul". If you are manually configuring and have some ethernet links, you need to know that mesh11sd will prevent bridge loop storms you would otherwise end up with.

Yes it will, in auto config mode, and yes it means channel, key and id. In manual mode you must do this yourself.

This is by luck rather than good judgment. You cannot reliably use "auto channel" for a mesh network. However, mesh11sd will make peer nodes track the channel of the portal node (the one with the Internet feed).
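Without mesh11sd, the usual way around this is to pin every node to the same fixed channel. A minimal sketch using uci; the radio section name `radio1` and channel 36 in the usage line are example values only, and `set_fixed_channel` is a helper name made up here.

```shell
#!/bin/sh
# Set a fixed channel on one radio section of /etc/config/wireless.
# $1: radio section name (e.g. radio1), $2: channel number.
set_fixed_channel() {
    radio="$1"
    channel="$2"
    # Refuse "auto" or an empty value: the whole point is a fixed channel.
    if [ -z "$channel" ] || [ "$channel" = "auto" ]; then
        echo "a fixed channel number is required" >&2
        return 1
    fi
    uci set "wireless.${radio}.channel=${channel}"
    uci commit wireless
}
```

For example, `set_fixed_channel radio1 36 && wifi` on every node, using whichever radio carries the mesh interface. Pick a channel that all of your devices and your regulatory domain allow.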

If you want a simple mesh rollout, use auto_config (the default). It will do everything for you, starting from the basic flash image for the router.

If you want manual mode, you must do everything yourself.
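As a rough idea of what "everything" means for the mesh interface itself, here is a sketch using the standard OpenWrt wireless options. The section name `mesh0`, the `lan` network and the arguments in the usage line are assumptions for illustration, and `configure_mesh_iface` is a made-up helper name. Encrypted mesh (`sae`) also needs a wpad build with mesh support installed.

```shell
#!/bin/sh
# Create (or overwrite) a mesh wifi-iface section named mesh0.
# $1: radio section (e.g. radio1), $2: mesh_id, $3: mesh key.
# The same mesh_id and key must be used on every node.
configure_mesh_iface() {
    radio="$1"
    mesh_id="$2"
    mesh_key="$3"
    uci set wireless.mesh0=wifi-iface
    uci set wireless.mesh0.device="$radio"
    uci set wireless.mesh0.mode='mesh'
    uci set wireless.mesh0.mesh_id="$mesh_id"
    uci set wireless.mesh0.encryption='sae'
    uci set wireless.mesh0.key="$mesh_key"
    # Bridge the mesh backhaul into the lan network (assumed name).
    uci set wireless.mesh0.network='lan'
    uci commit wireless
}
```

For example, `configure_mesh_iface radio1 my-mesh-id 'a-long-secret' && wifi`, repeated with identical values on each node.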

You can have manual mode on your portal node, allowing services like dhcp to reside elsewhere, and auto_config on normal peer nodes.

Each node knows because auto_config sets the mesh id and mesh key for you on every node. You can of course set your own mesh_id string and mesh_key string in the config of mesh11sd (conveniently done by the firmware selector, or you can do this later, using mesh11sd tools).

Open an issue on GitHub. There we can discuss changes in depth.

Your input will be much appreciated.
