Unexpected switching between APs


I am running a network of 4 OpenWrt APs (version git-19.079.18950-11e64f8) on NETGEAR WNDR4300 hardware. Authentication is WPA2-EAP based with a central RADIUS server. The WiFi clients are essentially Raspberry Pi 3 boards running the latest distribution. All APs use the same SSID and all APs have been set up to support pre-authentication. Checking the stations confirms that pre-authentication has been performed.

Now the strange behaviour. One of my Pi 3 boards is only 3 meters away from its closest AP. The RSSI is > -48 dB, which looks perfect in WiFi terms. What I notice is that at irregular intervals (1-2 hours) the station disconnects and connects to a different AP with a worse RSSI (< -60 dB). After 1-2 hours the station switches back to the better AP. BTW, my other Pi 3 boards show the same behaviour, although their "best" AP is not as good as the one described above.

Trying to force the station to switch to the better AP (using the roam command in wpa_cli) results in a failure. Some deeper analysis reveals that the better AP has been blacklisted in the station. So I understand that the AP is skipped because it is blacklisted (that's normal), but the question is why the station disconnects in the first place and why the AP gets blacklisted.

Any clue?

My best guess would be that the AP's WiFi briefly fails, so the station connects to the next best AP. For the sake of a trial, you could put 2 stations in the same place and see whether they both do that at exactly the same time.

Looking at the APs' logs at those times may help you see what happens.
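
If you do run that two-station trial, something like the following could help compare the disconnect times. It is only a rough sketch: the function name, the 5-second tolerance, and the sample timestamps are all made up for illustration, and you would have to pull the real times out of your own logs.

```python
from datetime import datetime, timedelta


def simultaneous_events(times_a, times_b, tolerance=timedelta(seconds=5)):
    """Return (a, b) pairs of timestamps that lie within `tolerance` of each other.

    times_a / times_b are disconnect times seen on two different stations.
    The default tolerance of 5 seconds is an arbitrary assumption.
    """
    pairs = []
    for a in times_a:
        for b in times_b:
            if abs(a - b) <= tolerance:
                pairs.append((a, b))
    return pairs


if __name__ == "__main__":
    fmt = "%Y-%m-%d %H:%M:%S"
    # Hypothetical disconnect times for two stations placed side by side.
    station_1 = [datetime.strptime("2019-04-01 10:00:02", fmt),
                 datetime.strptime("2019-04-01 11:47:30", fmt)]
    station_2 = [datetime.strptime("2019-04-01 10:00:04", fmt),
                 datetime.strptime("2019-04-01 13:12:00", fmt)]
    for a, b in simultaneous_events(station_1, station_2):
        print("both stations dropped around", a, "/", b)
```

If most disconnects pair up, that points at the AP side. If they never do, the problem is more likely on the individual stations.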

My guess: Pi power-save/sleep related (plus the Pi OS .11 deterministic algo). It seems like they find the weaker signal just as good, or the first signal just as good?

Yeah, the first thing I always do on Pi 3 units is to kill the WLAN power management, as it makes the onboard wifi HORRENDOUSLY flaky.
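
For reference, here is a minimal sketch of checking and switching off power save from Python by calling `iw`. It assumes the interface is called `wlan0`, that `iw` is installed, and that the script runs with enough privileges. The helper names are my own, not part of any library. As far as I know the setting does not survive a reboot, so it would need to be reapplied at boot.

```python
import subprocess


def parse_power_save(iw_output):
    """Parse the output of `iw dev <iface> get power_save`.

    Returns True for "on", False for "off", None if the line is not found.
    """
    for line in iw_output.splitlines():
        key, _, value = line.partition(":")
        if key.strip().lower() == "power save":
            state = value.strip().lower()
            if state in ("on", "off"):
                return state == "on"
    return None


def get_power_save(iface="wlan0"):
    """Ask iw for the current power save state of `iface` (assumed name)."""
    result = subprocess.run(
        ["iw", "dev", iface, "get", "power_save"],
        capture_output=True, text=True, check=True,
    )
    return parse_power_save(result.stdout)


def disable_power_save(iface="wlan0"):
    """Turn WLAN power save off; usually needs root."""
    subprocess.run(["iw", "dev", iface, "set", "power_save", "off"], check=True)


if __name__ == "__main__":
    if get_power_save():
        disable_power_save()
    print("power save on:", get_power_save())
```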

So I did some further debugging this week and learned that the station moves from (let's say) AP1 (best signal) to AP2 (poor signal) because of an EAPOL-Key timeout during the 4-way handshake with AP1. The handshake works fine most of the time, but sometimes the message is not received despite 5 retransmits, which results in the mentioned timeout.
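
To see how often this happens and against which AP, one option is to tally the disconnect events in the wpa_supplicant log on the station. The sketch below assumes the log contains the usual `CTRL-EVENT-DISCONNECTED bssid=... reason=...` lines. The function name and the sample lines are invented for illustration. In IEEE 802.11, reason code 15 stands for a 4-way handshake timeout, so that is probably the one to look for, though your logs may show something else.

```python
import re
from collections import Counter

# Matches wpa_supplicant disconnect events and captures BSSID and reason code.
DISCONNECT_RE = re.compile(
    r"CTRL-EVENT-DISCONNECTED bssid=([0-9a-fA-F:]{17}) reason=(\d+)"
)


def tally_disconnects(log_lines):
    """Count disconnect events per (bssid, reason code) pair."""
    counts = Counter()
    for line in log_lines:
        match = DISCONNECT_RE.search(line)
        if match:
            bssid = match.group(1).lower()
            reason = int(match.group(2))
            counts[(bssid, reason)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical log excerpt; real lines come from your own station.
    sample = [
        "wlan0: CTRL-EVENT-DISCONNECTED bssid=aa:bb:cc:dd:ee:01 reason=15",
        "wlan0: CTRL-EVENT-DISCONNECTED bssid=aa:bb:cc:dd:ee:02 reason=3",
        "wlan0: CTRL-EVENT-DISCONNECTED bssid=aa:bb:cc:dd:ee:01 reason=15",
    ]
    for (bssid, reason), n in tally_disconnects(sample).most_common():
        print(bssid, "reason", reason, "count", n)
```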

Any clue how to solve this, or where to check further?