Debugging Wi-Fi disconnections for a Windows machine

Hello,

I have recently moved to a new place, and gotten some additional hardware for Wi-Fi coverage throughout the house. My setup is relatively straightforward: a Netgear R7800 as the main router & DHCP server, then two TP-Link Archer C6 v3.2s acting as 'dumb' access points/managed switches. These are all on the same subnet, and I do not have any fancy VLAN/VPN/firewall business.

I have flashed the latest OpenWRT versions on these devices, and it seems to be working relatively okay for most of my devices (2 android phones, a television connected by cable, and a laptop running Linux). However, my partner's Windows machine seems to run into WiFi trouble (cable works fine). At seemingly random moments, the machine reports that it is connected to WiFi, but not to the internet. This usually resolves after a couple minutes have passed. I'd like to fix it, but am not sure how to proceed in debugging.

I've looked at the logs of the access point, where I can see the disconnect happening. However, the device successfully reconnects to the WiFi immediately, so I am not sure why it reports no internet connectivity. Running ipconfig shows a local IPv4 address, although its global IPv6 address is missing. pinging does not work, also not for local addresses.

A snippet of such a disconnect happening:

Wed Dec 28 12:00:14 2022 daemon.notice hostapd: phy1-ap0: AP-STA-DISCONNECTED [mac redacted]
Wed Dec 28 12:00:14 2022 daemon.info hostapd: phy1-ap0: STA [mac redacted] IEEE 802.11: authenticated
Wed Dec 28 12:00:14 2022 daemon.info hostapd: phy1-ap0: STA [mac redacted] IEEE 802.11: associated (aid 1)
Wed Dec 28 12:00:14 2022 daemon.notice hostapd: phy1-ap0: STA-OPMODE-SMPS-MODE-CHANGED [mac redacted] off
Wed Dec 28 12:00:14 2022 daemon.notice hostapd: phy1-ap0: AP-STA-CONNECTED [mac redacted] auth_alg=open
Wed Dec 28 12:00:14 2022 daemon.info hostapd: phy1-ap0: STA [mac redacted] RADIUS: starting accounting session 5AB6FC4A7C292C6A
Wed Dec 28 12:00:14 2022 daemon.info hostapd: phy1-ap0: STA [mac redacted] WPA: pairwise key handshake completed (RSN)
Wed Dec 28 12:00:14 2022 daemon.notice hostapd: phy1-ap0: EAPOL-4WAY-HS-COMPLETED [mac redacted]
Wed Dec 28 12:00:20 2022 daemon.notice hostapd: phy1-ap0: BEACON-REQ-TX-STATUS [mac redacted] 185 ack=1

Let me know if more information on the configuration is wanted. I've mostly used LuCi to do the configuration. I have also installed DAWN on all devices, but the disconnects are not preceded by anything dawn related in the logs as far as I can tell.

Is there any activity on the other AP ?
Possibly trying to switch, failing, and returning to the first one?

Also, what’s the debug level set at?

Also, starting hostapd manually gives you a lot more info to trace issues.

Nope, there is no activity on the other switches when this happens -- I guess they are out of range. The only other activity I see in the log is in that of the DHCP server, which looks like a regular renewal after the reconnect occurs.

I haven't touched the debug levels. I'll try increasing them and will post back later if this gets me more information.

Alright, I increased the hostapd debug level from info=2 to debug=1 and waited for another such disconnection. It looks like this in the log:

Fri Jan  6 11:19:24 2023 daemon.notice hostapd: phy1-ap0: AP-STA-DISCONNECTED [mac redacted]
Fri Jan  6 11:19:24 2023 daemon.debug hostapd: phy1-ap0: STA [mac redacted] WPA: event 3 notification
Fri Jan  6 11:19:24 2023 daemon.debug hostapd: phy1-ap0: STA [mac redacted] IEEE 802.1X: unauthorizing port
Fri Jan  6 11:19:24 2023 daemon.debug hostapd: phy1-ap0: STA [mac redacted] IEEE 802.11: deauthenticated
Fri Jan  6 11:19:24 2023 daemon.debug hostapd: phy1-ap0: STA [mac redacted] MLME: MLME-DEAUTHENTICATE.indication([mac redacted], 1)
Fri Jan  6 11:19:24 2023 daemon.debug hostapd: phy1-ap0: STA [mac redacted] MLME: MLME-DELETEKEYS.request([mac redacted])
Fri Jan  6 11:19:25 2023 daemon.notice hostapd: phy0-ap0: BEACON-REQ-TX-STATUS [unrelated-mac-redacted] 79 ack=1
Fri Jan  6 11:19:28 2023 daemon.notice hostapd: phy0-ap0: BEACON-REQ-TX-STATUS [unrelated-mac-redacted] 80 ack=1
Fri Jan  6 11:19:31 2023 daemon.debug hostapd: phy1-ap0: STA [mac redacted] IEEE 802.11: authentication OK (open system)
Fri Jan  6 11:19:31 2023 daemon.debug hostapd: phy1-ap0: STA [mac redacted] MLME: MLME-AUTHENTICATE.indication([mac redacted], OPEN_SYSTEM)
Fri Jan  6 11:19:31 2023 daemon.debug hostapd: phy1-ap0: STA [mac redacted] MLME: MLME-DELETEKEYS.request([mac redacted])
Fri Jan  6 11:19:31 2023 daemon.info hostapd: phy1-ap0: STA [mac redacted] IEEE 802.11: authenticated
Fri Jan  6 11:19:31 2023 daemon.debug hostapd: phy1-ap0: STA [mac redacted] IEEE 802.11: association OK (aid 1)
Fri Jan  6 11:19:31 2023 daemon.info hostapd: phy1-ap0: STA [mac redacted] IEEE 802.11: associated (aid 1)
Fri Jan  6 11:19:31 2023 daemon.debug hostapd: phy1-ap0: STA [mac redacted] MLME: MLME-ASSOCIATE.indication([mac redacted])
Fri Jan  6 11:19:31 2023 daemon.debug hostapd: phy1-ap0: STA [mac redacted] MLME: MLME-DELETEKEYS.request([mac redacted])
Fri Jan  6 11:19:31 2023 daemon.debug hostapd: phy1-ap0: STA [mac redacted] IEEE 802.11: binding station to interface 'phy1-ap0'
Fri Jan  6 11:19:31 2023 daemon.debug hostapd: phy1-ap0: STA [mac redacted] WPA: event 1 notification
Fri Jan  6 11:19:31 2023 daemon.debug hostapd: phy1-ap0: STA [mac redacted] WPA: start authentication
Fri Jan  6 11:19:31 2023 daemon.debug hostapd: phy1-ap0: STA [mac redacted] IEEE 802.1X: unauthorizing port
Fri Jan  6 11:19:31 2023 daemon.debug hostapd: phy1-ap0: STA [mac redacted] WPA: sending 1/4 msg of 4-Way Handshake
Fri Jan  6 11:19:31 2023 daemon.notice hostapd: phy1-ap0: STA-OPMODE-SMPS-MODE-CHANGED [mac redacted] off
Fri Jan  6 11:19:31 2023 daemon.debug hostapd: phy1-ap0: STA [mac redacted] WPA: received EAPOL-Key frame (2/4 Pairwise)
Fri Jan  6 11:19:31 2023 daemon.debug hostapd: phy1-ap0: STA [mac redacted] WPA: sending 3/4 msg of 4-Way Handshake
Fri Jan  6 11:19:31 2023 daemon.debug hostapd: phy1-ap0: STA [mac redacted] WPA: received EAPOL-Key frame (4/4 Pairwise)
Fri Jan  6 11:19:31 2023 daemon.notice hostapd: phy1-ap0: AP-STA-CONNECTED [mac redacted] auth_alg=open
Fri Jan  6 11:19:31 2023 daemon.debug hostapd: phy1-ap0: STA [mac redacted] IEEE 802.1X: authorizing port
Fri Jan  6 11:19:31 2023 daemon.info hostapd: phy1-ap0: STA [mac redacted] RADIUS: starting accounting session 2D404439E95CD44B
Fri Jan  6 11:19:31 2023 daemon.info hostapd: phy1-ap0: STA [mac redacted] WPA: pairwise key handshake completed (RSN)
Fri Jan  6 11:19:31 2023 daemon.notice hostapd: phy1-ap0: EAPOL-4WAY-HS-COMPLETED [mac redacted]
Fri Jan  6 11:19:32 2023 daemon.notice hostapd: phy0-ap0: BEACON-REQ-TX-STATUS [unrelated-mac-redacted] 81 ack=1
Fri Jan  6 11:19:35 2023 daemon.notice hostapd: phy1-ap0: BEACON-REQ-TX-STATUS [mac redacted] 159 ack=1

???

Can you provide details?

I've read your thread a few times, I only see RADIUS noted in the log - you never mention this.

You mean the 2 TP's?

1 Like

Hey, thanks for your response!

I haven't mentioned RADIUS because I haven't touched or configured RADIUS in any way. The wifi network has a regular password, no need to enter credentials or anything.
I'm not sure why it occurs in the log, that is curious.

Yes, I have installed DAWN on both the TP-Links, and on the R7800.

Taking a different angle, I've tried to track down where in the source of hostapd these log entries come from, in the hope of learning more. I found out that this line:

Fri Jan  6 11:19:24 2023 daemon.debug hostapd: phy1-ap0: STA [mac redacted] MLME: MLME-DEAUTHENTICATE.indication([mac redacted], 1)

indicates that the client sends a 802.11 deauthentication frame to the access point, with reason code 1 = Unspecified reason. This differs from when I manually disconnect from the wifi with e.g. my phone, that will show reason code 3 = station is leaving. Not sure how to do something with this information, though.

…running hostapd manually gives you a lot more insight, especially when navigating it’s source code.

From the log it seems key share is ok (aid 1) but the transition somehow fails…

Sorry for the radio silence: I had not much to go on, and the problem appeared less frequently. Unfortunately, I also started noticing connection problems on my Linux laptop.

My laptop appeared to disconnect randomly, but never when I was using it. If I left the laptop running unattended, it would disconnect, but once I started using it again, it would reconnect. I initially thought this might be some power saving thing, but I have since ruled that out. I can now reproduce the following reliably: My laptop will lose its connection within 2-6 minutes when it is the only WiFi device connected to the access point. My laptop disconnects when I leave because I take my phone with me, which disconnects from the access point. My laptop reconnects when I return, because my phone reconnects to the access point. If left the phone and laptop unattended in the same room, the connection stays stable.

By losing connection I mean: I am unable to ping any address from my laptop, even local addresses: I get a 'Destination Host Unreachable' error for everything. Using the advice given above, I ran hostapd manually with the -dd flag, although it did not give me more information than before. Curiously, hostapd does not see the disconnect, and reports succes when sending the BEACON- messages to my device. Pinging the access point from my laptop does not show anything interesting, except that it does really lose connection: I get Destination Host Unreachable at some point, which repeats until my phone is back in the room. A couple of ping responses arrive very late. Pinging my laptop from the access point is somewhat more interesting: no errors, but alot of the ping messages somehow clog up until the phone is back, then they all return at once. In the log, this shows as a sudden jump in response time.

Here are the logs: hostapd log and ping laptop from router log.

Any pointers on possible solutions/how to continue investigating the problem would be much appreciated!