There was a power blackout in our dormitory a few weeks ago. Since then every 24 hours after the power came back ~20-25% of the STAs are dropped. The logs show that the WPA rekeying GTK process is started and due to group key handshake failed after 4 tried, the WPA_PTK state machine disconnects the STAs.
There are APs rebooted since the power had came back, and they are not affected by the upper symptom.
This looks quite impossible for me, but I cannot figure out any other explanation. The config files are unchanged.
STA number drop on the whole network:
Here are some log records:
May 24 19:30:22 AP-03 hostapd: moranet_wlan_5: WPA GMK rekeyd
May 24 19:30:22 AP-03 hostapd: moranet_wlan_5: WPA rekeying GTK
May 24 19:30:22 AP-03 hostapd: moranet_wlan_5: STA b8:3b:cc:c3:ca:4e WPA: sending 1/2 msg of Group Key Handshake
May 24 19:30:22 AP-03 hostapd: moranet_wlan_5: STA e0:cc:f8:73:53:8e WPA: sending 1/2 msg of Group Key Handshake
May 24 19:30:22 AP-03 hostapd: moranet_wlan_5: STA b8:3b:cc:c3:ca:4e WPA: EAPOL-Key timeout
May 24 19:30:22 AP-03 hostapd: moranet_wlan_5: STA b8:3b:cc:c3:ca:4e WPA: sending 1/2 msg of Group Key Handshake
May 24 19:30:22 AP-03 hostapd: moranet_wlan_5: STA e0:cc:f8:73:53:8e WPA: EAPOL-Key timeout
May 24 19:30:22 AP-03 hostapd: moranet_wlan_5: STA e0:cc:f8:73:53:8e WPA: sending 1/2 msg of Group Key Handshake
May 24 19:30:23 AP-03 hostapd: moranet_wlan_5: STA b8:3b:cc:c3:ca:4e WPA: EAPOL-Key timeout
May 24 19:30:23 AP-03 hostapd: moranet_wlan_5: STA b8:3b:cc:c3:ca:4e WPA: sending 1/2 msg of Group Key Handshake
May 24 19:30:23 AP-03 hostapd: moranet_wlan_5: STA e0:cc:f8:73:53:8e WPA: EAPOL-Key timeout
May 24 19:30:23 AP-03 hostapd: moranet_wlan_5: STA e0:cc:f8:73:53:8e WPA: sending 1/2 msg of Group Key Handshake
May 24 19:30:24 AP-03 hostapd: moranet_wlan_5: STA b8:3b:cc:c3:ca:4e WPA: EAPOL-Key timeout
May 24 19:30:24 AP-03 hostapd: moranet_wlan_5: STA b8:3b:cc:c3:ca:4e WPA: sending 1/2 msg of Group Key Handshake
May 24 19:30:24 AP-03 hostapd: moranet_wlan_5: STA e0:cc:f8:73:53:8e WPA: EAPOL-Key timeout
May 24 19:30:24 AP-03 hostapd: moranet_wlan_5: STA e0:cc:f8:73:53:8e WPA: sending 1/2 msg of Group Key Handshake
May 24 19:30:24 AP-03 hostapd: moranet_wlan_2n: WPA GMK rekeyd
May 24 19:30:24 AP-03 hostapd: moranet_wlan_2n: WPA rekeying GTK
May 24 19:30:24 AP-03 hostapd: moranet_wlan_2n: STA 72:6e:c8:65:42:dc WPA: sending 1/2 msg of Group Key Handshake
May 24 19:30:24 AP-03 hostapd: moranet_wlan_2n: STA 00:45:e2:6f:9f:e1 WPA: sending 1/2 msg of Group Key Handshake
May 24 19:30:24 AP-03 hostapd: moranet_wlan_2n: STA 18:65:90:d8:20:51 WPA: sending 1/2 msg of Group Key Handshake
May 24 19:30:24 AP-03 hostapd: moranet_wlan_2n: STA 00:45:e2:6f:9f:e1 IEEE 802.1X: did not Ack EAPOL-Key frame (broadcast index=0)
May 24 19:30:25 AP-03 hostapd: moranet_wlan_2n: STA 18:65:90:d8:20:51 WPA: received EAPOL-Key frame (2/2 Group)
May 24 19:30:25 AP-03 hostapd: moranet_wlan_2n: STA 18:65:90:d8:20:51 WPA: group key handshake completed (RSN)
May 24 19:30:25 AP-03 hostapd: moranet_wlan_2n: STA 72:6e:c8:65:42:dc WPA: EAPOL-Key timeout
May 24 19:30:25 AP-03 hostapd: moranet_wlan_2n: STA 72:6e:c8:65:42:dc WPA: sending 1/2 msg of Group Key Handshake
May 24 19:30:25 AP-03 hostapd: moranet_wlan_2n: STA 00:45:e2:6f:9f:e1 WPA: EAPOL-Key timeout
May 24 19:30:25 AP-03 hostapd: moranet_wlan_2n: STA 00:45:e2:6f:9f:e1 WPA: sending 1/2 msg of Group Key Handshake
May 24 19:30:25 AP-03 hostapd: moranet_wlan_2n: STA 00:45:e2:6f:9f:e1 IEEE 802.1X: did not Ack EAPOL-Key frame (broadcast index=0)
May 24 19:30:25 AP-03 hostapd: moranet_wlan_5: STA b8:3b:cc:c3:ca:4e WPA: EAPOL-Key timeout
May 24 19:30:25 AP-03 hostapd: moranet_wlan_5: STA b8:3b:cc:c3:ca:4e WPA: group key handshake failed (RSN) after 4 tries
May 24 19:30:25 AP-03 hostapd: moranet_wlan_5: STA b8:3b:cc:c3:ca:4e WPA: WPA_PTK: sm->Disconnect
May 24 19:30:25 AP-03 hostapd: moranet_wlan_5: AP-STA-DISCONNECTED b8:3b:cc:c3:ca:4e
May 24 19:30:25 AP-03 hostapd: moranet_wlan_5: STA b8:3b:cc:c3:ca:4e WPA: event 3 notification
May 24 19:30:25 AP-03 hostapd: moranet_wlan_5: STA b8:3b:cc:c3:ca:4e IEEE 802.1X: unauthorizing port
May 24 19:30:25 AP-03 hostapd: moranet_wlan_5: STA e0:cc:f8:73:53:8e WPA: EAPOL-Key timeout
May 24 19:30:25 AP-03 hostapd: moranet_wlan_5: STA e0:cc:f8:73:53:8e WPA: group key handshake failed (RSN) after 4 tries
May 24 19:30:25 AP-03 hostapd: moranet_wlan_5: STA e0:cc:f8:73:53:8e WPA: WPA_PTK: sm->Disconnect
There are multiple SSIDs. Some of them has 802.11r, some of them has not. Both cases the problem stands. The wpa_group_rekey parameter is the default and the wpa_disable_eapol_key_retries is 0.
I cannot clearly see the mechanism taking place every 24 hours on the simultaneously rebooted APs. But if there is any, was it a good workaround to develop a service, which delays the startup with a few random minutes?