Wifi WPA2-Enterprise EAP-TLS connection drops

Dear All,

My wifi has been based on WPA2-Enterprise for many years now. My RADIUS server is freeradius3-3.0.17 running on a pfSense firewall. My aim is to replace proprietary wifi access points with hardware running OpenWrt.

I obtained a Linksys WRT3200ACM and installed OpenWrt 18.06.2 r7676-cddd7b4c77. Unlike with my current Lancom access points, WPA2-Enterprise EAP-TLS based wifi connections drop approximately hourly. The log contains the following information around the time of a drop:

Sun May 19 23:20:57 2019 daemon.info hostapd: wlan0: STA c0:ee:fb:e1:9a:ce IEEE 802.11: associated (aid 1)
Sun May 19 23:20:57 2019 daemon.notice hostapd: wlan0: CTRL-EVENT-EAP-STARTED c0:ee:fb:e1:9a:ce
Sun May 19 23:20:57 2019 daemon.notice hostapd: wlan0: CTRL-EVENT-EAP-PROPOSED-METHOD vendor=0 method=1
Sun May 19 23:20:57 2019 daemon.notice hostapd: wlan0: CTRL-EVENT-EAP-SUCCESS2 c0:ee:fb:e1:9a:ce
Sun May 19 23:20:57 2019 daemon.info hostapd: wlan0: STA c0:ee:fb:e1:9a:ce WPA: pairwise key handshake completed (RSN)
Sun May 19 23:20:57 2019 daemon.notice hostapd: wlan0: AP-STA-CONNECTED c0:ee:fb:e1:9a:ce
Sun May 19 23:20:57 2019 daemon.info hostapd: wlan0: STA c0:ee:fb:e1:9a:ce RADIUS: starting accounting session 81D6B10F3D57E7F7
Sun May 19 23:20:57 2019 daemon.info hostapd: wlan0: STA c0:ee:fb:e1:9a:ce IEEE 802.1X: authenticated - EAP type: 13 (TLS)
Sun May 19 23:20:57 2019 daemon.info hostapd: wlan0: STA c0:ee:fb:e1:9a:ce IEEE 802.11: authenticated
Sun May 19 23:21:00 2019 kern.debug kernel: [30744.824639] ieee80211 phy1: staid 3 deleted
Sun May 19 23:21:01 2019 daemon.info hostapd: wlan1: STA c0:ee:fb:e1:9a:ce RADIUS: stopped accounting session 92391275E0F3E5DA
Sun May 19 23:21:01 2019 daemon.info hostapd: wlan1: STA c0:ee:fb:e1:9a:ce IEEE 802.11: associated (aid 3)
Sun May 19 23:21:01 2019 daemon.notice hostapd: wlan1: CTRL-EVENT-EAP-STARTED c0:ee:fb:e1:9a:ce
Sun May 19 23:21:01 2019 daemon.notice hostapd: wlan1: CTRL-EVENT-EAP-PROPOSED-METHOD vendor=0 method=1
Sun May 19 23:21:01 2019 daemon.info hostapd: wlan1: STA c0:ee:fb:e1:9a:ce IEEE 802.11: authenticated
Sun May 19 23:21:01 2019 daemon.notice hostapd: wlan1: CTRL-EVENT-EAP-SUCCESS2 c0:ee:fb:e1:9a:ce
Sun May 19 23:21:01 2019 daemon.notice hostapd: wlan1: AP-STA-CONNECTED c0:ee:fb:e1:9a:ce
Sun May 19 23:21:01 2019 daemon.info hostapd: wlan1: STA c0:ee:fb:e1:9a:ce RADIUS: starting accounting session 92391275E0F3E5DA
Sun May 19 23:21:01 2019 daemon.info hostapd: wlan1: STA c0:ee:fb:e1:9a:ce IEEE 802.1X: authenticated - EAP type: 13 (TLS)
Sun May 19 23:21:01 2019 daemon.info hostapd: wlan1: STA c0:ee:fb:e1:9a:ce WPA: pairwise key handshake completed (RSN)
Sun May 19 23:21:05 2019 daemon.notice hostapd: wlan1: CTRL-EVENT-EAP-STARTED b4:f7:a1:e7:29:e4
Sun May 19 23:21:05 2019 daemon.notice hostapd: wlan1: CTRL-EVENT-EAP-PROPOSED-METHOD vendor=0 method=1
Sun May 19 23:21:05 2019 daemon.notice hostapd: wlan1: CTRL-EVENT-EAP-SUCCESS2 b4:f7:a1:e7:29:e4
Sun May 19 23:21:05 2019 daemon.info hostapd: wlan1: STA b4:f7:a1:e7:29:e4 IEEE 802.1X: authenticated - EAP type: 13 (TLS)
Sun May 19 23:21:05 2019 daemon.info hostapd: wlan1: STA b4:f7:a1:e7:29:e4 WPA: pairwise key handshake completed (RSN)
Sun May 19 23:23:33 2019 daemon.notice hostapd: wlan1: CTRL-EVENT-EAP-STARTED e0:b9:a5:d2:ea:1b
Sun May 19 23:23:33 2019 daemon.notice hostapd: wlan1: CTRL-EVENT-EAP-PROPOSED-METHOD vendor=0 method=1
Sun May 19 23:23:36 2019 daemon.notice hostapd: wlan1: CTRL-EVENT-EAP-RETRANSMIT2 e0:b9:a5:d2:ea:1b
Sun May 19 23:23:36 2019 daemon.notice hostapd: wlan1: CTRL-EVENT-EAP-SUCCESS2 e0:b9:a5:d2:ea:1b
Sun May 19 23:23:36 2019 daemon.info hostapd: wlan1: STA e0:b9:a5:d2:ea:1b IEEE 802.1X: authenticated - EAP type: 13 (TLS)
Sun May 19 23:23:36 2019 daemon.info hostapd: wlan1: STA e0:b9:a5:d2:ea:1b WPA: pairwise key handshake completed (RSN)

wlan0 is 802.11n/ac and wlan1 is 802.11b/g/n. The SSID is the same for both frequencies. That is intended, as the user should not have to bother about the band when deciding to use a particular network. With Lancom devices, that is commonly no problem. They even have a "band steering" mechanism pushing a client towards the more appropriate frequency.

The connection drop seems to be a consequence of switching between wlan0 and wlan1. After the drop, the client cannot ping the OpenWrt access point, the router or anything else. The only solution is to disconnect wifi and connect it again manually on the client side. Nevertheless, according to the log the re-authentication itself seems to happen quickly (approximately four seconds?).
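
To see how often a station really hops between the two radios, the association lines in the log can be filtered. This is only a minimal sketch, assuming the log format shown above; the function name band_switches is made up for illustration and is not part of OpenWrt or hostapd.

```shell
# Hypothetical helper: reads hostapd log lines on stdin and prints one line
# each time a station associates on a different interface than before.
band_switches() {
    awk '/IEEE 802.11: associated/ {
        # The interface follows the "hostapd:" token, the MAC follows "STA".
        for (i = 1; i <= NF; i++) {
            if ($i == "hostapd:") iface = $(i + 1)
            if ($i == "STA")      mac   = $(i + 1)
        }
        sub(/:$/, "", iface)            # "wlan0:" -> "wlan0"
        if (mac in last && last[mac] != iface)
            print mac, last[mac], "->", iface
        last[mac] = iface
    }'
}

# Usage on the access point:
#   logread | band_switches
```

If the drops line up with the lines this prints, the band switch is a likely trigger; if they do not, the cause is probably elsewhere.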

Can someone detect something that can be improved in order to avoid the drops, please?

Regards,

Michael Schefczyk

First, I had what seems to be the same issue for years, until it got so annoying that I tracked it down.

The result was surprising: there can be only a few WPA-EAP installations using Linux with hostapd that do not freeze at the 1h mark if there is traffic at that time.
With EAP, hostapd automatically rekeys the PTK after 1h. And PTK rekeying can best be described as "mostly broken". (I found many more broken devices/drivers than working ones! And not only for Linux... long story.)

So unless this interval is set manually, WPA-EAP will rekey the PTK every hour, while WPA-PSK will just continue using the initial key. Therefore nearly all setups work fine with WPA-PSK but get 1h long connection freezes or reconnects after 1h when you use EAP.

Disabling PTK rekeying for EAP (eap_reauth_period=0 in hostapd.conf) should solve the issues you have. On my OpenWrt AP (now tracking git master) I set it via:

uci set wireless.wifinet0.eap_reauth_period=0
uci commit
wifi
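
Note that wifinet0 is the section name from my config; yours may be named differently. A minimal sketch for checking that the option actually reached the generated hostapd configuration follows. The helper name show_reauth_period is made up, and the path /var/run/hostapd-phy*.conf is an assumption about where your OpenWrt build writes the generated files. If the option is missing there, hostapd falls back to its default, which its sample configuration documents as 3600 seconds.

```shell
# Hypothetical helper: show the eap_reauth_period line of each given hostapd
# config file, or report that the option is not set in that file.
show_reauth_period() {
    for f in "$@"; do
        grep -H '^eap_reauth_period=' "$f" || echo "$f: eap_reauth_period not set"
    done
}

# Usage on the access point (the path is an assumption, see above):
#   uci show wireless | grep "=wifi-iface"    # find your own section names
#   show_reauth_period /var/run/hostapd-phy*.conf
```

With two radios you need to set the option on the wifi-iface section of each radio.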

My AP can now rekey the PTK correctly, but none of the Android devices I have can. Therefore I still either use very high rekey intervals or disable rekeying altogether.

If this does not seem to work, double-check that the key is indeed not rekeyed. (Normally a rekey is triggered by the AP, but a STA can also request it.)
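
One way to do that double check is to list when pairwise key handshakes complete for each station. This is a rough sketch, assuming the log format from the first post; handshake_times is a made-up helper name.

```shell
# Hypothetical helper: reads hostapd log lines on stdin and prints
# "<station MAC> <HH:MM:SS>" for every completed pairwise key handshake.
handshake_times() {
    awk '/pairwise key handshake completed/ {
        for (i = 1; i <= NF; i++)
            if ($i == "STA") mac = $(i + 1)
        # Field 4 is the time of day in the log format shown in this thread.
        print mac, $4
    }'
}

# Usage on the access point:
#   logread | handshake_times | sort
```

Handshakes that recur about an hour apart for the same station, without the station having associated again in between, would suggest the key is still being replaced.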

PTK rekeying really is something you had better avoid, even when using APs able to handle it. I got it working with ath9k and iwlwifi, but both the AP and the STAs must use mac80211 from Linux >= 4.20.

If you want to know more:
Here is the intro to the patch set which got PTK rekeying from "mostly broken" to "mostly working" for mac80211 drivers:
https://marc.info/?l=linux-wireless&m=153572160619006&w=2
Hostapd/wpa_supplicant patches to complement the patch above are not yet available.

Indeed, surprising result. The issue that I had for months got solved by disabling rekeying. Just took very long to find this article. So 2 questions remaining:

Enjoying a connection which is already stable for 48. This never happened before.

I wouldn't expect that much activity for mwlwifi anymore…

Be aware that regular rekeying intervals are an important security feature; disabling rekeying (or extending the interval too much) isn't a good idea.

@TurboWrt
The main issue is all the broken devices/drivers out in the wild. And since frequent pairwise rekeys are comparatively rare, most developers are not aware of the issue or do not care much about it.

The best solution (short of amending IEEE 802.11) seems to be to enforce a full disconnect/reconnect instead of replacing only the key. This may well take > 10s, and either it would also hit those using "correct" drivers/devices or we would not be able to prevent the issue in all cases. While that would be an improvement, most users will consider that a bug, too. (I'm planning to release some code in that direction, but there is no timeline or guarantee it will be accepted in hostapd/wpa_supplicant.)

Getting the issue better documented, reaching developers and the users affected by rekey problems, and starting to disable rekeying for EAP by default in OpenWrt is probably the best course of action for now.

@slh
The issue is pretty generic and not restricted to mwlwifi. The same is true for the "fix": Looks like it will just disconnect a STA when either end of the connection tries to rekey.

But the best workaround when you know you have the issue is for now still disabling pairwise rekeys.

While this should in principle degrade WLAN security (at least for EAP, where rekeying is more often enabled by default), how much really depends on the cards/drivers you are using.

Let's look at the broader picture:

  • Most WLANs are not rekeying the pairwise key (PTK) at all. It only changes when you reconnect somehow. This is still considered sufficient for basically all users. (It may not be when you are targeted by the NSA for some reason. But brute forcing a PSK handed out by a RADIUS server will not be simple/cheap and will only reveal the traffic of one connection.)

Of course rekeying and using each key for e.g. only 1h should make it much harder/more expensive to crack the PSK... but only when we rekey "correctly". With my limited understanding of the cryptography, ath9k with a kernel < 4.20 is all but broadcasting the PSK to any attacker listening in when you enable rekeying and are (un)lucky.

  • At least ath9k (and probably also ath5k/ath6k) with mac80211 from a kernel < 4.20 does not stop Tx when replacing the key in the card. The card therefore happily sends out cleartext frames after the old key has been deleted but before the new key is installed. I have captures with retransmits of one MPDU where the retransmit is cleartext. Only the encryption is missing; all other fields, including the PN, are set up for crypto. That should be sufficient to calculate the PTK with next to no effort, allowing the attacker to decode any MPDUs sent until the rekey. Now other drivers are "safe", mostly by what seems to be design happenstance. (They still may cause freezes when using/supporting A-MPDU.)
    Luckily TKIP is already considered insecure, since the "attack detection -> pairwise rekey" response is exactly what an attacker needs to "time" the rekey to increase the chance of getting "interesting" packets.
    Regardless if you have pairwise rekey enabled or not...

So my conclusion here is that using pairwise rekey with a card which has not been "cleared" for it is way more dangerous than just not rekeying the pairwise key at all. And if you have not disabled TKIP, it's one more reason to do so...

And trust me: having a (good) chance every hour to "freeze" the WLAN connection, forcing a manual reconnect, gets annoying fast even as a private user. Having that in a campus/company WLAN sounds like a nightmare.