OpenWrt reboots after a device connects

Hello,

I am running OpenWrt 23.05.2 on a TP-Link Archer C6 v2, which is my router. Two more TP-Link Archer C6 v2 devices are connected to it and run as 'dumb' access points.

A few weeks ago, I created two VLANs on my network. This has mostly worked fine, but ever since then the main router seems to crash occasionally.

Every now and then, the main router seems to reboot when a device connects to my network and makes a DHCP request. It happens with both wired and wireless clients.

After the reboot, my devices will reconnect and everything works fine again.

To troubleshoot this, I set up a Raspberry Pi with a tmux session that stays SSH'd into the router running logread -f, and left it there until the problem reappeared.

Today I encountered the problem again, so I took a look at the logs. I think the problem has something to do with the last line. Can you help me troubleshoot this?

Tue Mar 19 18:42:17 2024 daemon.notice hostapd: nl80211: nl80211_recv_beacons->nl_recvmsgs failed: -5
client_loop: send disconnect: Broken pipe
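The "client_loop: send disconnect: Broken pipe" line is printed by the SSH client on the Pi when its connection to the router dies, so it roughly marks the moment the router went down. A minimal POSIX sh sketch that keeps a copy of the stream on the Pi and stamps the time the stream ended; the function `capture_until_eof` and the file name `router.log` are hypothetical.

```sh
# capture_until_eof FILE
# Copies stdin to the terminal and appends it to FILE. When the input
# stream ends (for example because the SSH session to the router dropped),
# a timestamped marker line is appended to FILE.
capture_until_eof() {
  tee -a "$1"
  echo "--- stream ended at $(date '+%Y-%m-%d %H:%M:%S') ---" >> "$1"
}

# Example on the Pi (router address from the LAN config in this thread):
#   ssh -o ServerAliveInterval=10 root@192.168.1.1 logread -f | capture_until_eof router.log
```

`ServerAliveInterval` makes the SSH client notice a dead connection after a few missed keepalives instead of waiting for its next write, so the marker lands closer to the actual reboot. It is still only an upper bound on when the router went away.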

Thanks in advance.

Well, it's VLANs, so @psherman handles those best, but if I were you I'd put the Pi in charge as your main router and use all the TP-Links as dumb APs.

He will ask for at least some of these, so you might as well get them out of the way:

Please connect to your OpenWrt device using ssh and copy the output of the following commands and post it here using the "Preformatted text </> " button:
Remember to redact passwords, MAC addresses and any public IP addresses you may have:

ubus call system board
cat /etc/config/network
cat /etc/config/wireless
cat /etc/config/dhcp
cat /etc/config/firewall
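Redacting configs this long by hand is easy to get wrong. A rough sed-based filter that can be run on the router before copying the output; the function name `redact` is hypothetical, and the patterns are assumptions that only cover `option key`/`option password` values and colon-separated MAC addresses. Public IP addresses still need a manual check.

```sh
# redact: mask Wi-Fi keys/passwords and MAC addresses read from stdin.
# Only the two patterns below are handled; review the output before posting.
redact() {
  sed -E \
    -e "s/(option (key|password) ')[^']*'/\1redacted'/" \
    -e 's/([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}/xx:xx:xx:xx:xx:xx/g'
}

# Example on the router:
#   redact < /etc/config/wireless
```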
root@openwrt:~# ubus call system board
{
	"kernel": "5.15.137",
	"hostname": "openwrt",
	"system": "Qualcomm Atheros QCA956X ver 1 rev 0",
	"model": "TP-Link Archer C6 v2 (EU/RU/JP)",
	"board_name": "tplink,archer-c6-v2",
	"rootfs_type": "squashfs",
	"release": {
		"distribution": "OpenWrt",
		"version": "23.05.2",
		"revision": "r23630-842932a63d",
		"target": "ath79/generic",
		"description": "OpenWrt 23.05.2 r23630-842932a63d"
	}
}

root@openwrt:~# cat /etc/config/network

config interface 'loopback'
	option device 'lo'
	option proto 'static'
	option ipaddr '127.0.0.1'
	option netmask '255.0.0.0'

config globals 'globals'
	option ula_prefix 'redacted/48'

config device
	option name 'br-lan'
	option type 'bridge'
	list ports 'eth0.1'

config interface 'lan'
	option device 'br-lan'
	option proto 'static'
	option ipaddr '192.168.1.1'
	option netmask '255.255.255.0'
	option ip6assign '60'

config interface 'wan'
	option device 'eth0.2'
	option proto 'dhcp'

config interface 'wan6'
	option device 'eth0.2'
	option proto 'dhcpv6'

config switch
	option name 'switch0'
	option reset '1'
	option enable_vlan '1'

config switch_vlan
	option device 'switch0'
	option vlan '1'
	option ports '0t 2t 3 4t 5'
	option vid '1'
	option description 'LAN'

config switch_vlan
	option device 'switch0'
	option vlan '2'
	option ports '0t 1'
	option vid '2'
	option description 'WAN'

config interface 'iot'
	option proto 'static'
	option ipaddr '192.168.3.1'
	option netmask '255.255.255.0'
	option device 'br-iot'

config switch_vlan
	option device 'switch0'
	option vlan '3'
	option vid '3'
	option description 'IOT'
	option ports '0t 2t 4t'

config device
	option type 'bridge'
	option name 'br-iot'
	list ports 'eth0.3'
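In the `switch_vlan` sections above, each entry in `option ports` is a switch port number; a trailing `t` means the port carries that VLAN tagged, and no suffix means it is an untagged member. Port 0 is tagged in all three VLANs, which suggests it is the CPU port, but port numbering is device specific. A small POSIX sh sketch (function names are hypothetical) that decodes a ports string and lists any port that is untagged in more than one VLAN, since a port can only have one untagged VLAN:

```sh
# describe_ports "PORTS": print each port and whether it is tagged.
describe_ports() {
  for p in $1; do            # unquoted on purpose: split on spaces
    case "$p" in
      *t) echo "port ${p%t}: tagged" ;;
      *)  echo "port $p: untagged" ;;
    esac
  done
}

# untagged_conflicts: read one ports string per line on stdin and print
# every port that is an untagged member of more than one VLAN.
untagged_conflicts() {
  tr ' ' '\n' | grep -v -e 't$' -e '^$' | sort | uniq -d
}

# Example with the three VLANs from this config (prints nothing):
#   printf '%s\n' '0t 2t 3 4t 5' '0t 1' '0t 2t 4t' | untagged_conflicts
```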
root@openwrt:~# cat /etc/config/wireless

config wifi-device 'radio0'
	option type 'mac80211'
	option path 'pci0000:00/0000:00:00.0'
	option channel 'auto'
	option band '5g'
	option htmode 'VHT80'
	option cell_density '0'

config wifi-iface 'default_radio0'
	option device 'radio0'
	option network 'lan'
	option mode 'ap'
	option ssid 'redacted'
	option encryption 'psk2'
	option key 'redacted'
	option ieee80211r '1'
	option mobility_domain '1111'
	option ft_over_ds '0'
	option ft_psk_generate_local '1'
	option dtim_period '3'

config wifi-device 'radio1'
	option type 'mac80211'
	option path 'platform/ahb/18100000.wmac'
	option channel 'auto'
	option band '2g'
	option cell_density '0'
	option htmode 'HT20'
	option legacy_rates '1'

config wifi-iface 'default_radio1'
	option device 'radio1'
	option network 'lan'
	option mode 'ap'
	option ssid 'redacted'
	option encryption 'psk2'
	option key 'redacted'
	option ieee80211r '1'
	option mobility_domain '2222'
	option ft_over_ds '0'
	option ft_psk_generate_local '1'
	option dtim_period '3'

config wifi-iface 'wifinet2'
	option device 'radio1'
	option mode 'ap'
	option ssid 'redacted'
	option encryption 'psk2'
	option key 'redacted'
	option network 'iot'
	option ieee80211r '1'
	option mobility_domain '3333'
	option ft_over_ds '0'
	option ft_psk_generate_local '1'
	option wmm '0'
root@openwrt:~# cat /etc/config/dhcp

config dnsmasq
	option domainneeded '1'
	option localise_queries '1'
	option rebind_protection '1'
	option rebind_localhost '1'
	option local '/lan/'
	option domain 'lan'
	option expandhosts '1'
	option cachesize '1000'
	option authoritative '1'
	option readethers '1'
	option leasefile '/tmp/dhcp.leases'
	option resolvfile '/tmp/resolv.conf.d/resolv.conf.auto'
	option localservice '1'
	option ednspacket_max '1232'
	list addnmount '/bin/busybox'

config dhcp 'lan'
	option interface 'lan'
	option start '100'
	option limit '150'
	option leasetime '1d'
	option dhcpv4 'server'
	option dhcpv6 'server'
	option ra 'server'
	list ra_flags 'managed-config'
	list ra_flags 'other-config'

config dhcp 'wan'
	option interface 'wan'
	option ignore '1'

config odhcpd 'odhcpd'
	option maindhcp '0'
	option leasefile '/tmp/hosts/odhcpd'
	option leasetrigger '/usr/sbin/odhcpd-update'
	option loglevel '4'

config dhcp 'iot'
	option interface 'iot'
	option start '100'
	option limit '150'
	option leasetime '1d'

(many host and cname configs here, redacted to keep it tidy and for privacy reasons)
root@openwrt:~# cat /etc/config/firewall

config defaults
	option input 'ACCEPT'
	option output 'ACCEPT'
	option forward 'REJECT'
	option synflood_protect '1'

config zone
	option name 'lan'
	option input 'ACCEPT'
	option output 'ACCEPT'
	option forward 'ACCEPT'
	list network 'lan'

config zone
	option name 'wan'
	option input 'REJECT'
	option output 'ACCEPT'
	option forward 'REJECT'
	option masq '1'
	option mtu_fix '1'
	list network 'wan'
	list network 'wan6'

config forwarding
	option src 'lan'
	option dest 'wan'

config rule
	option name 'Allow-DHCP-Renew'
	option src 'wan'
	option proto 'udp'
	option dest_port '68'
	option target 'ACCEPT'
	option family 'ipv4'

config rule
	option name 'Allow-Ping'
	option src 'wan'
	option proto 'icmp'
	option icmp_type 'echo-request'
	option family 'ipv4'
	option target 'ACCEPT'

config rule
	option name 'Allow-IGMP'
	option src 'wan'
	option proto 'igmp'
	option family 'ipv4'
	option target 'ACCEPT'

config rule
	option name 'Allow-DHCPv6'
	option src 'wan'
	option proto 'udp'
	option dest_port '546'
	option family 'ipv6'
	option target 'ACCEPT'

config rule
	option name 'Allow-MLD'
	option src 'wan'
	option proto 'icmp'
	option src_ip 'fe80::/10'
	list icmp_type '130/0'
	list icmp_type '131/0'
	list icmp_type '132/0'
	list icmp_type '143/0'
	option family 'ipv6'
	option target 'ACCEPT'

config rule
	option name 'Allow-ICMPv6-Input'
	option src 'wan'
	option proto 'icmp'
	list icmp_type 'echo-request'
	list icmp_type 'echo-reply'
	list icmp_type 'destination-unreachable'
	list icmp_type 'packet-too-big'
	list icmp_type 'time-exceeded'
	list icmp_type 'bad-header'
	list icmp_type 'unknown-header-type'
	list icmp_type 'router-solicitation'
	list icmp_type 'neighbour-solicitation'
	list icmp_type 'router-advertisement'
	list icmp_type 'neighbour-advertisement'
	option limit '1000/sec'
	option family 'ipv6'
	option target 'ACCEPT'

config rule
	option name 'Allow-ICMPv6-Forward'
	option src 'wan'
	option dest '*'
	option proto 'icmp'
	list icmp_type 'echo-request'
	list icmp_type 'echo-reply'
	list icmp_type 'destination-unreachable'
	list icmp_type 'packet-too-big'
	list icmp_type 'time-exceeded'
	list icmp_type 'bad-header'
	list icmp_type 'unknown-header-type'
	option limit '1000/sec'
	option family 'ipv6'
	option target 'ACCEPT'

config rule
	option name 'Allow-IPSec-ESP'
	option src 'wan'
	option dest 'lan'
	option proto 'esp'
	option target 'ACCEPT'

config rule
	option name 'Allow-ISAKMP'
	option src 'wan'
	option dest 'lan'
	option dest_port '500'
	option proto 'udp'
	option target 'ACCEPT'

config zone
	option name 'iot'
	option input 'ACCEPT'
	option output 'ACCEPT'
	option forward 'REJECT'
	list network 'iot'

config forwarding
	option src 'lan'
	option dest 'iot'

(a few config redirects here - redacted for security reasons)

Great!

Just hang tight.

Nothing jumps out as problematic...

A few thoughts:

  1. Consider removing all of the 802.11r related stuff from your wifi configs.
  2. Is the device in question currently part of the DHCP leases (or otherwise listed) in the dhcp config file?
  3. Does the problem occur regardless of how the device connects (i.e. if it can connect both wired and wirelessly, does it happen in both cases; if it is wireless, does it cause the same crash regardless of whether it connects to your main router's wifi directly or to one of the other 2 APs?)

How do you define crash? Does it physically reboot the hardware or just stop responding?

VLAN settings or DHCP don't usually cause a “crash”, however badly configured they may be. You may lose a meaningful network connection, but the router keeps running, waiting for a better config.

Consider removing all of the 802.11r related stuff from your wifi configs.

I might try that, but Fast Roaming is something I really need on my network.

Is the device in question currently pat of the DHCP leases (or otherwise listed) in the dhcp config file?

I saw the reboot ('crash') happening with two devices connecting: a wired PC and my phone on wireless. My PC has no static lease assigned to it, but does get an IP via DHCP. My phone is assigned a hostname via the Static Leases tab without assigning a pre-set IP to it.

Does the problem occur regardless how the device connects (i.e. if it is wired and wireless, does it happen in both cases, or if it is wireless, does it cause the same crash regardless if it connects to your main router's wifi directly or one of the other 2 APs?)

The crash has happened two times now:

  • My PC was connected to AP 192.168.1.2 with an Ethernet cable when the crash happened
  • My phone was probably connected to AP 192.168.1.3 wirelessly when the crash happened.

You're right, the wording is a bit ambiguous. The router stops responding and then reboots, I can tell by the Uptime.
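Since the reboot is confirmed through the uptime, the boot moment can be turned into a timestamp and lined up with the last lines captured on the Pi. A sketch using the first field of `/proc/uptime` (seconds since boot); the function name `boot_epoch` is hypothetical.

```sh
# boot_epoch NOW UPTIME_LINE
# NOW is the current time in seconds since the epoch (date +%s) and
# UPTIME_LINE is the content of /proc/uptime ("<seconds> <idle seconds>").
# Prints the approximate boot time in seconds since the epoch.
boot_epoch() {
  up=${2%% *}    # keep the first field, e.g. "3605.27"
  up=${up%.*}    # drop the fractional part
  echo $(( $1 - up ))
}

# On the router (after its clock has been set via NTP):
#   boot_epoch "$(date +%s)" "$(cat /proc/uptime)"
```

On the Pi, GNU `date -d @<seconds>` converts the result into a readable time that can be compared with the last log line received before the Broken pipe message.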

Why?
Have you carefully tuned each AP (channel + channel width, power levels, location where possible) before enabling 802.11r?

Is the crash stimulus repeatable?

Why? Have you carefully tuned each AP (channel + channel width, power levels, location where possible) before enabling 802.11r?

My house has different levels and I want to keep connection when starting a call in the basement and walking to the attic. But I will disable 802.11r for now to make troubleshooting easier.
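One way to switch 802.11r off for testing is to set `ieee80211r` to `0` on each wifi-iface with `uci` and reload Wi-Fi. A sketch, assuming the section names from the wireless config posted above (the APs will have their own section names); the wrapper `disable_ft` is hypothetical.

```sh
# disable_ft SECTION...: set ieee80211r=0 on the given wifi-iface sections,
# commit the change and reload the wireless configuration.
disable_ft() {
  for iface in "$@"; do
    uci set "wireless.$iface.ieee80211r=0"
  done
  uci commit wireless
  wifi reload
}

# On the main router:
#   disable_ft default_radio0 default_radio1 wifinet2
```

Setting the option back to `1` the same way re-enables Fast Roaming once testing is done.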

Is the crash stimulus repeatable?

I have only seen it happen twice so far, so it's also hard for me to say what exactly caused the device to crash. It's bound to happen again in the next few days, so I already have a new logread -f running.

Keep in mind that it could be a red herring. The crash could have been for any number of reasons, including a failing power adapter that is marginal most of the time, and just beyond the limit at those specific moments.

But yes, please monitor and report back.


OK, will do. Anyway, thanks so far.

In theory, 802.11r is a good thing.
In practice it is one of the first things that come to mind if you're experiencing issues, especially with your client devices.

So it's great if it works (and if you have actually confirmed it works on all of your clients), but in many cases clients don't like it being there and show a number of different, hard-to-pinpoint issues and quirks. Always be prepared to disable 802.11r for testing, to confirm or rule it out as a potential cause.

It's sad, but many proprietary clients (printers, phones, anything smarthome or IoT) are programmed very badly and anything diverging from the norm will cause issues.


Hmm, in theory it could be the hardware watchdog (WDT) doing this.

But in the years I have been using OpenWrt, I have seen an automatic reboot only once, and that was during a normal boot of a snapshot that failed because of a package fault at build time.

I have seen PSUs fail many more times, causing brownouts and reboots, or in other words “crash and reboot”.


I understand you can deduce when it restarts, but how do you know when it stops responding? If there is a way to tell, copy the system log as soon as possible; try to catch it and go straight to the last 10 lines.

I doubt that, and here is why:
If I run a large company and offer Wi-Fi from the front door throughout the building, Fast Roaming gets clients off that first AP 'Fast'. Deeper inside, it is handy for handing out further slices of bandwidth, offloading the frontline APs even more. If your handheld device drops an AP and then doesn't look for a replacement ASAP, that is not a router issue.

It had better be, by adding everything they can throw at the AP. Otherwise we are barking up the wrong tree based on presumptions.

You're right, it's not entirely clear whether the router stops responding. I said that because the webpage I was trying to access stopped loading, but that does not mean the router itself stopped responding. The only thing we know is that it reboots. I edited the title and question to match this insight.

Thanks for explaining on Fast Roaming, @slh and @LilRedDog, I will keep it disabled for now.

For now, I will keep logread -f running while SSH'd into the router. If there's anything I can do to get more useful / detailed logging, please let me know.

I just encountered another reboot. This time the trigger was not a new device connecting, but probably me starting an Ubuntu ISO BitTorrent download.

This was on my PC, which is wired to AP 192.168.1.2.

The reboot happened at around 14:25; these are the last log lines:

Wed Mar 20 14:10:00 2024 cron.err crond[5336]: USER root pid 5810 cmd scp /tmp/dhcp.leases root@192.168.1.3:/tmp/dhcp.leases
Wed Mar 20 14:20:00 2024 cron.err crond[5336]: USER root pid 5828 cmd scp /tmp/dhcp.leases root@192.168.1.2:/tmp/dhcp.leases
client_loop: send disconnect: Broken pipe

This could be a power issue. Do you have another power adapter for your device? You need the same voltage and the same (or ideally higher) current rating.

OK, I replaced the 12V 1A adapter with a 12V 2A adapter. Let's see how it goes.
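The replacement rule from the reply above (same voltage, equal or higher current rating) is simple arithmetic: a higher current rating only means the adapter can supply more, while the router still draws only what it needs. A sketch in sh using millivolts and milliamps, because shell arithmetic is integer only; `adapter_ok` and `max_mw` are hypothetical helpers.

```sh
# adapter_ok REQ_MV REQ_MA ADAPTER_MV ADAPTER_MA
# Succeeds if the adapter voltage matches exactly and its current rating
# is at least the required current.
adapter_ok() {
  [ "$3" -eq "$1" ] && [ "$4" -ge "$2" ]
}

# max_mw MV MA: maximum power the adapter can deliver, in milliwatts.
max_mw() {
  echo $(( $1 * $2 / 1000 ))
}

adapter_ok 12000 1000 12000 2000 && echo "12 V 2 A is a valid replacement"
max_mw 12000 1000   # prints 12000, i.e. 12 W (old adapter)
max_mw 12000 2000   # prints 24000, i.e. 24 W (new adapter)
```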

Great. You might run another bittorrent or similar stress test and/or wait and see what happens.