Chromecast do not reconnect after router reboot

Netgear R7800 / 19.07-SNAPSHOT / ath10k-ct

I do not not know when this started happening. If I reboot/reconnect those devices they connect fine. If instead of a reboot, I simply restart the wireless then everything reconnects happily. So only a reboot is affected. The issue is reliably reproducible most of the time, but not 100%. I tried waiting for a long time but no luck.

Out of all devices, the chromcast is the most stubborn and annoying, because it creates an open AP if not connected to a WiFi. Upon logging into the router (after a reboot), I can see an arp entry for it, but I cannot ping its IP. The logs show that it got an IP followed by AP-STA-DISCONNECTED 15 seconds later. The peculiar thing that it would not even try to reconnect no matter how long I wait. Rebooting the Chromecast makes it connect fine.

Is there a way to debug this? I would like to figure this out.

I need two logs:

  • Client: dmesg output (if Linux)
  • AP: logread

It is a Chromecast, so cannot get anything from it. I just tested with a non-CT firmware and both 2GHz and 5GHz bands and the behaviour is the same.

The log messages for this MAC (Chromecast) are below.

daemon.info hostapd: STA IEEE 802.11: authenticated
daemon.info hostapd: STA IEEE 802.11: associated (aid 1)
daemon.notice hostapd: AP-STA-CONNECTED
daemon.info hostapd: STA RADIUS: starting accounting session
daemon.info hostapd: STA WPA: pairwise key handshake completed (RSN)
daemon.info dnsmasq-dhcp[10188]: DHCPDISCOVER
daemon.info dnsmasq-dhcp[10188]: DHCPOFFER
daemon.info dnsmasq-dhcp[10188]: DHCPREQUEST
daemon.info dnsmasq-dhcp[10188]: DHCPACK
daemon.notice hostapd: AP-STA-DISCONNECTED

Apologies for making an obvious suggestion, but:

since it does not seem to correlate with ct vs non ct (I assume you also tried the non ct kernel driver as well) perhaps try reverting hostapd and see if the behavior changes.

FWIW on snapshots from master I have observed older ios/android clients connected (on 2.4GHz) to an AP but not having internet access. Repeatedly forcing them to disconnect/reconnect seems to resolve it. I can't reproduce this yet and I've made some significant changes to my network (recent ct firmware update, change from openwrt router as AP/GW to AP only, testing 4.19 kernel, etc.)

Seems like this started ~ 1 month ago for me

Yes I do. In addition to that I also copy the latest firmware/board files form kvalo, but because CT is standard I will be troubleshooting with it only.

I am experiencing the same issue with a MacBook (it can never reconnect after a router reboot while Chromecast sometimes does) and I can provide logs for it. When the router is coming down I see these messages below in wifi.log. When the router is up and I see the AP, the MacBook is not even trying to connect: no activity in any logs until I manually connect it to the AP.
I also tried restarting the wireless and noticed the difference: the middle portion of the log is missing during a wifi restart, but is present during a router reboot.

Sat Nov  9 12:50:28.156 <kernel> IO80211AssociationJoinSnapshot::captureRequestCallback Problem reported from corecapture
Sat Nov  9 12:50:28.163 <kernel> en0: link down event, informing supplicant.
Sat Nov  9 12:50:28.171 <kernel> AirPort_BrcmNIC::getSSIDData(): Get failure: APPLE80211_IOC_SSID: 6
Sat Nov  9 12:50:28.254 <kernel> en0: Terminating supplicant.
Sat Nov  9 12:50:28.254 <kernel> RSNSupplicant: Releasing authenticator for YY:YY:YY:YY:YY:YY
Sat Nov  9 12:50:28.298 <airportd[223]> _processIPv4Changes: ARP/NDP offloads disabled, not programming the offload
Sat Nov  9 12:50:28.314 <airportd[223]> _processIPv4Changes: ARP/NDP offloads disabled, not programming the offload
Sat Nov  9 12:50:28.438 <kernel> parseRSNIE: groupCipherType = 5 pairwiseCipherType = 5 authSel = 6
Sat Nov  9 12:50:28.438 <kernel> initWithInterfaceAndIE: _myMacAddress XX:XX:XX:XX:XX:XX
Sat Nov  9 12:50:28.438 <kernel> setPMK: PMK SET!
Sat Nov  9 12:50:31.968 <kernel> Unexpected payload found for message 9, dataLen 0
Sat Nov  9 12:50:31.987 <kernel> en0: Terminating supplicant.
Sat Nov  9 12:50:31.987 <kernel> RSNSupplicant: Releasing authenticator for 00:00:00:00:00:00
Sat Nov  9 12:50:32.291 <kernel> parseRSNIE: groupCipherType = 5 pairwiseCipherType = 5 authSel = 6

######
###### The following six lines are missing on MacBook when wireless is restarted on the router with **wifi** command.
######
Sat Nov  9 12:50:32.291 <kernel> initWithInterfaceAndIE: _myMacAddress XX:XX:XX:XX:XX:XX
Sat Nov  9 12:50:32.291 <kernel> setPMK: PMK SET!
Sat Nov  9 12:50:35.836 <kernel> Unexpected payload found for message 9, dataLen 0
Sat Nov  9 12:50:35.856 <kernel> en0: Terminating supplicant.
Sat Nov  9 12:50:35.857 <kernel> RSNSupplicant: Releasing authenticator for 00:00:00:00:00:00
Sat Nov  9 12:50:37.238 <kernel> parseRSNIE: groupCipherType = 5 pairwiseCipherType = 5 authSel = 6


Sat Nov  9 12:50:37.238 <kernel> initWithInterfaceAndIE: _myMacAddress XX:XX:XX:XX:XX:XX
Sat Nov  9 12:50:37.238 <kernel> setPMK: PMK SET!
Sat Nov  9 12:50:40.776 <kernel> Unexpected payload found for message 9, dataLen 0
Sat Nov  9 12:50:40.796 <kernel> en0: Terminating supplicant.
Sat Nov  9 12:50:40.796 <kernel> RSNSupplicant: Releasing authenticator for 00:00:00:00:00:00

my only other suggestion comes after a bit of googling

For the senario of certain clients are not staying connecting after a router reboot, see if the router does or does not have an internet connection out (on WAN) when the clients first attempt a (re)connection. The chromecast log above seems to indicate that it did connect after the reboot but then left (no indication to me if the router dropped it or the chromcast "chose" to leave).

The hint I found indicated that ios or google devices may try to determine if they are on a "captive portal" (i.e. no internet access after associating to an AP). What ios or google os's do if they don't think they can get out may be part of the issue... (at least I think this could be related to my issues).

good luck

Well, it is all possible (I am using a PPPoE connection that takes a while to establish), but I do not think is explains why the clients do not try to reconnect ever. Below is what I see in the log. Is there a way to disable automatic WiFi start? That way I could bring it up manually once the PPPoE tunnel is established. I do not mind trying this theory.

Hmm, I have only now noticed that it took seven minutes between Chromcast associating and getting an IP address. Now need to figure out how to delay WiFi...

15:22:27 - Chromecast associated
15:22:45 - dnsmasq[916]: exiting on receipt of SIGTERM
15:22:45 - dnsmasq-dhcp[4696] starts
15:22:46 - pppd Exit (first failed attempt)
15:23:09 - dnsmasq-dhcp[5080] starts with a new PID
15:23:14 - Network device 'pppoe-wan' link is up
15:23:15 - Reloading firewall due to ifup of wan (pppoe-wan)
15:29:51 - DHCPDISCOVER/DHCPOFFER/DHCPREQUEST/DHCPACK from the Chromecast
15:29:51 - AP-STA-DISCONNECTED

if your just looking to test the theory:

connect to router with wired
manually stop wifi,
manually stop WAN,
then manual bring wifi back up and see what happens with your clients

Figuring out how to slow wifi down could be as simple as a sleep command to an init script

EDIT: if you do this, perhaps slow down the wifi connection of specific clients by their MAC address. Slowing down your whole wifi net seems silly to fix google &/or apple's issues

or writing a watch script to grep for " 'pppoe-wan' link is up" in the logs.

I did something similar in the WOL script below (now modified but you'll get the idea)

VERSION=0.2
INTERVAL=240
NUMP=3
OLD=""
PORT=443
WOLPORT=9
TARGET=XX.XXX.XX.XX
BROADCAST=XX.XXX.XX.255
MAC=00:00:00:00:00:00
WOLIF=br-lan
#$WOL -i $BROADCAST -p $WOLPORT $MAC
#WOL=/usr/bin/wakeonlan
#$WOL -b -i $WOLIF $MAC
WOL=/usr/bin/etherwake
LOGGER="/usr/bin/logger"
LOGPREFIX="WOL: "
echo "wol.sh: version $VERSION started" | $LOGGER

while sleep $INTERVAL;do
#uncomment to search logs for trigger
#NEW=`logread | awk '/'"$LOGPREFIX"'/ && /DST='"$TARGET"'/ && /DPT='"$PORT"'/ {print }' | tail -1`
#SRC=`logread | awk '/'"$LOGPREFIX"'/ && /DST='"$TARGET"'/ && /DPT='"$PORT"'/ {print }' | sed -e 's/.*SRC=\(.*\) DST=.*/\1/p' | tail -1`
#if [ "$NEW" != "" -a "$NEW" != "$OLD" ]; then

   if ping -qc $NUMP $TARGET >/dev/null; then
      echo "wol.sh: $TARGET is alive" | $LOGGER
   else
      echo "wol.sh: $TARGET not responding to pings" | $LOGGER
      echo "wol.sh: $WOL -b -i $WOLIF $MAC" `$WOL -b -i $WOLIF $MAC` | $LOGGER
   fi
#   OLD=$NEW
   sleep 360

#fi
done

So, I figured it out. There are actually two issues helping each other make things worse.

This is how things are supposed to work:

  1. dnsmasq is started by S19dnsmasq and the expectation is that it comes up very quickly.
  2. wifi is started a bit later by S20network
  3. It looks like after the network interfaces are up, dnsmasq is restarted via SIGTERM
  4. WiFi clients start getting associated between steps 2 and 3 above and a second or two after step 3 they start getting IP addresses

Things are different with adblock...

  1. Upon the first activation, adblock adds an entry to /etc/config/dhcp: option serversfile '/tmp/adb_list.overall'
  2. Once everything is setup, the router is rebooted
  3. Now dnsmasq cannot start because /tmp/adb_list.overall does not exist yet
  4. WiFi comes up and clients are being associated, but they cannot get an IP address because dnsmasq is not running
  5. adblock does not do anything until a WAN trigger is activated and in my case it is wan (pppoe-wan) and sometimes it takes time before the tunnel is up
  6. Now adblock starts, creates an empty /tmp/adb_list.overall and finally dnsmasq can fully come up
  7. Clients waited for a long time between associating and getting an IP address and if they get it too late, some will not try this AP until those clients are rebooted or forced to reconnect. Most clients will keep re-trying, but apparently not Chromecast...

That was issue #1. Turned out that Chromecast does indeed need Internet access right after getting an IP address, but with PPPoE that is sometimes problematic. That is issue #2: Chromecast just stops re-trying.

Here are is how I fixed the issue.

/etc/init.d/adblock_fix creates an empty /tmp/adb_list.overall before dnsmasq starts

#!/bin/sh /etc/rc.common

START=18
USE_PROCD=1

start_service() {
	/usr/bin/logger -t "adblock_fix" "start_service"
	/bin/touch /tmp/adb_list.overall
}

/etc/init.d/wifi_disable_on_boot disables radios before network starts

#!/bin/sh /etc/rc.common

START=18
USE_PROCD=1

start_service() {
	/usr/bin/logger -t "wifi_disable_on_boot" "start_service"
	/sbin/uci set wireless.radio0.disabled='1'
	/sbin/uci set wireless.radio1.disabled='1'
	/sbin/uci commit wireless
}

/etc/hotplug.d/iface/99-wifi-pppoe-wan enables radios and starts wifi after pppoe-wan interface is up

#!/bin/sh
. /lib/functions.sh

[ "$ACTION" = ifup ] || exit 0
[ "$INTERFACE" = wan ] || exit 0
[ "$DEVICE" = pppoe-wan ] || exit 0

logger -t "99-wifi-pppoe-wan" "Enabling WiFi due to $ACTION on $INTERFACE ($DEVICE)"
uci set wireless.radio0.disabled='0'
uci set wireless.radio1.disabled='0'
uci commit wireless
wifi
1 Like

This topic was automatically closed 10 days after the last reply. New replies are no longer allowed.