[SOLVED] WDS client won't stay connected (PREV_AUTH_NOT_VALID) using recent snapshot builds

Yes, I've experienced this and have been investigating. I have tried several devices on both 2 GHz and 5 GHz, with ath9k, ath10k, and MediaTek radios. I don't think it is directly related to the wifi system at all.

Set log_level to 0 on both devices (in the radio section) and some more details become apparent.
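
For anyone who wants to try this, here is a minimal sketch using uci. It assumes the radio section is named radio0, which may not match your device (check with `uci show wireless`).

```shell
# Assumption: the wifi-device section is called radio0; adjust to your config.
uci set wireless.radio0.log_level='0'   # 0 = verbose debugging
uci commit wireless
wifi                                    # reload the wireless configuration

# Follow the system log while the client tries to connect.
logread -f
```

Remember to remove the option again afterwards, since level 0 is quite noisy.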

The failure occurs when the AP sends message 1 of the EAPOL 4-way handshake but the client does not acknowledge it. The AP repeats the message 3 times, then gives up and disconnects the client. The next packet from the client is refused with PREV_AUTH_NOT_VALID. The client then waits for a while, restarts the connection from scratch, and fails again.

EAPOL is used in a PSK system to check that both sides are using the same preshared key, and then it generates a session key which is used for the AES encryption of data. (Interestingly, both "authentication" and "association" do not depend on the PSK at all. It is the third stage, "authorization" where the AP does some cryptography to actually decide whether to allow the client access to the network.)

On the client, using tcpdump on wlan0 shows the EAPOL messages were received through the radio. They should next pass through the bridge that wlan0 is part of for userspace processing by wpa_supplicant. However tcpdump on br-wan (in my case I'm using the WDS connection as part of the WAN network) will show no EAPOL packets received.

With a firmware that works, EAPOL packets are received on br-wan and the key negotiation and the connection completes. I believe that wpa_supplicant is listening to br-wan and the problem is the packets are not passing through, and what @ratking said about it being a problem with the kernel bridge is correct.

This is likely to happen only if the link is encrypted with PSK. I have not actually tried it (in retrospect, obviously I should have), but I suspect that an unencrypted WDS connection will complete and operate normally.

The CT drivers can be made very "chatty" (to the point of overwhelming the logs). See, for example:

https://www.candelatech.com/ath10k-bugs.php

https://www.candelatech.com/ath10k-ug.php

The /etc/modules[-boot].d/ approach described there works on OpenWrt as well.

Ok, I've tried using this radio to connect to a WDS AP on a C7v2 running 18.06.2. Pretty much the same behavior described above with the integrated radio.

The only obvious difference is that both authentication and association sometimes take two or three tries. I'm guessing the retries are due to contention and interference from nearby 5 GHz networks on the higher frequency channels I'm using.

Thanks for checking. That tells me it is unlikely to be the IPQ4019 itself, which seems to support mk24's hypothesis that this is not a hardware-specific issue.

As an experiment, I tried connecting to my C2600 AP as a (non-WPS) client using IPQ4019 radio.

It seems to be working fine.

I think this is likely the same bug described in FS#2286 and in another forum thread here. If it is, I've also been able to solve it in a local build using the patch that @ratking posted above.

If you can confirm that you tested the fix and that it resolves the problem, commenting on the PR might speed up core-dev review and acceptance.

I can also confirm that the updated kernel patch fixes it.

Apparently something has changed in the kernel data structures so that the previous OpenWrt version of the patch no longer works.

I can confirm the same problem, tested with a WR1200JS and a Xiaomi R3G.

With one box on OpenWrt with kernel 4.14.102 and the other on 4.14.121, there is no problem.

With both boxes on 4.14.121, I get this on the WDS connection:

[ 96.503771] wlan1: send auth to d4:5f:25:eb:09:82 (try 1/3)
[ 96.517956] wlan1: authenticated
[ 96.526209] wlan1: associate with d4:5f:25:eb:09:82 (try 1/3)
[ 97.086156] wlan1: associate with d4:5f:25:eb:09:82 (try 2/3)
[ 97.200919] wlan1: RX AssocResp from d4:5f:25:eb:09:82 (capab=0x11 status=0 aid=1)
[ 97.208706] wlan1: associated
[ 101.312913] wlan1: deauthenticated from d4:5f:25:eb:09:82 (Reason: 2=PREV_AUTH_NOT_VALID)
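
To check whether a client is stuck in this loop, a small sketch like the one below can count how often "associated" is followed by a deauthentication with PREV_AUTH_NOT_VALID. It relies only on the message format shown in the log above; the function name `count_wds_drops` is made up for illustration.

```shell
#!/bin/sh
# Count "associated, then deauthenticated with PREV_AUTH_NOT_VALID" cycles
# in a kernel log read from stdin. count_wds_drops is an illustrative name.
count_wds_drops() {
    awk '
        /: associated[ \t]*$/ { assoc = 1; next }
        /deauthenticated from .*PREV_AUTH_NOT_VALID/ {
            if (assoc) n++
            assoc = 0
        }
        END { print n + 0 }
    '
}

# Usage on the WDS client:
#   dmesg | count_wds_drops
```

A count that keeps growing while the link never comes up suggests the handshake is failing right after association, as in the log above.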

I'll check it asap.

@anon69880279

Do you mind if I use your log inside the commit message?

Thanks

Fixes are merged into my staging tree. (master & 18.06 branches)
Could someone please build/test it and report back?

Thanks all :)

I don't have the time and attention to get things set up so I can do a build right now, but if someone builds a firmware with the patched kernel for the ea8300 and shares it I can flash it and report back.

The @xback build worked. Test system is a MTC WR1201 (MT7621 / MT7602 based) tested on 2 GHz.

1 - git clone … openwrt.git
2 - scripts/feeds update -a && scripts/feeds install -a
3 - get xback-9dd6b51.tar.gz, extract it and overwrite the openwrt folder
4 - compile with my old .config for the WR1200JS

I updated the firmware on both WR1200JS boxes (WDS master and client) and there is no problem, exactly as with the x-wrt firmware.

This patch solved part of the problem. Without it the devices would associate and then drop; now they stay associated, but still no data passes through. I built r10011. The AP-side log shows there are still key problems:

Thu Jun  6 19:38:43 2019 daemon.notice hostapd: wlan0: AP-STA-DISCONNECTED 18:e8:29:30:9b:2c
Thu Jun  6 19:38:43 2019 kern.info kernel: [  625.840896] device wlan0.sta1 left promiscuous mode
Thu Jun  6 19:38:43 2019 kern.info kernel: [  625.846146] br-lan: port 3(wlan0.sta1) entered disabled state
Thu Jun  6 19:38:43 2019 kern.warn kernel: [  625.854349] ------------[ cut here ]------------
Thu Jun  6 19:38:43 2019 kern.warn kernel: [  625.859253] WARNING: CPU: 0 PID: 1417 at backports-4.19.32-1/net/mac80211/key.c:907 ieee80211_free_keys+0x170/0x228 [mac80211]
Thu Jun  6 19:38:43 2019 kern.warn kernel: [  625.870871] Modules linked in: ath9k ath9k_common pppoe ppp_async ath9k_hw ath10k_pci ath10k_core ath pppox ppp_generic mac80211 iptable_nat iptable_mangle iptable_filter ipt_REJECT ipt_MASQUERADE ip_tables cfg80211 xt_time xt_tcpudp xt_state xt_nat xt_multiport xt_mark xt_mac xt_limit xt_conntrack xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_FLOWOFFLOAD x_tables thermal_sys slhc nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_conntrack_ipv4 nf_nat_ipv4 nf_nat nf_log_ipv4 nf_log_common nf_flow_table_hw nf_flow_table nf_defrag_ipv4 nf_conntrack_rtcache nf_conntrack hwmon crc_ccitt compat ehci_platform ehci_hcd gpio_button_hotplug usbcore nls_base usb_common
Thu Jun  6 19:38:43 2019 kern.warn kernel: [  625.930674] CPU: 0 PID: 1417 Comm: hostapd Not tainted 4.14.118 #0
Thu Jun  6 19:38:43 2019 kern.warn kernel: [  625.936958] Stack : 804d0000 80489150 00000000 00000000 80460fc0 82e79a9c 82e9d35c 804b1307
Thu Jun  6 19:38:43 2019 kern.warn kernel: [  625.945494]         8045d190 00000589 80603670 0000038b 804838a0 00000001 82e79a50 688195c4
Thu Jun  6 19:38:43 2019 kern.warn kernel: [  625.954197]         00000000 00000000 80600000 00003dd0 00000000 00000000 00000008 00000000
Thu Jun  6 19:38:43 2019 kern.warn kernel: [  625.962745]         000000c8 d9e4e916 000000c7 00000000 80000000 00000000 83266130 83234cfc
Thu Jun  6 19:38:43 2019 kern.warn kernel: [  625.971247]         00000009 0000038b 804838a0 00000100 00000001 8026bd44 00000000 80600000
Thu Jun  6 19:38:43 2019 kern.warn kernel: [  625.979754]         ...
Thu Jun  6 19:38:43 2019 kern.warn kernel: [  625.982249] Call Trace:
Thu Jun  6 19:38:43 2019 kern.warn kernel: [  625.984764] [<8006a9ec>] show_stack+0x58/0x100
Thu Jun  6 19:38:43 2019 kern.warn kernel: [  625.989305] [<80085080>] __warn+0xe4/0x118
Thu Jun  6 19:38:43 2019 kern.warn kernel: [  625.993492] [<80085144>] warn_slowpath_null+0x1c/0x28
Thu Jun  6 19:38:43 2019 kern.warn kernel: [  625.998793] [<83234cfc>] ieee80211_free_keys+0x170/0x228 [mac80211]
Thu Jun  6 19:38:43 2019 kern.warn kernel: [  626.005308] [<8321611c>] ieee80211_ibss_leave+0xa70/0x1940 [mac80211]
Thu Jun  6 19:38:43 2019 kern.warn kernel: [  626.011970] [<802f4998>] rollback_registered_many+0x2dc/0x414
Thu Jun  6 19:38:43 2019 kern.warn kernel: [  626.017813] [<802f60b0>] unregister_netdevice_queue+0x94/0xec
Thu Jun  6 19:38:43 2019 kern.warn kernel: [  626.023762] [<8321fd8c>] ieee80211_nan_func_match+0x2894/0x29a0 [mac80211]
Thu Jun  6 19:38:43 2019 kern.warn kernel: [  626.030795] ---[ end trace 5309fee2cf0ee39d ]---
Thu Jun  6 19:38:43 2019 daemon.notice hostapd: wlan0: WDS-STA-INTERFACE-REMOVED ifname=wlan0.sta1 sta_addr=18:e8:29:30:9b:2c
Thu Jun  6 19:38:48 2019 daemon.info hostapd: wlan0: STA 18:e8:29:30:9b:2c IEEE 802.11: authenticated
Thu Jun  6 19:38:48 2019 daemon.info hostapd: wlan0: STA 18:e8:29:30:9b:2c IEEE 802.11: associated (aid 1)
Thu Jun  6 19:38:48 2019 kern.info kernel: [  631.114522] br-lan: port 3(wlan0.sta1) entered blocking state
Thu Jun  6 19:38:48 2019 kern.info kernel: [  631.120364] br-lan: port 3(wlan0.sta1) entered disabled state
Thu Jun  6 19:38:48 2019 kern.info kernel: [  631.126603] device wlan0.sta1 entered promiscuous mode
Thu Jun  6 19:38:48 2019 daemon.notice hostapd: wlan0: WDS-STA-INTERFACE-ADDED ifname=wlan0.sta1 sta_addr=18:e8:29:30:9b:2c
Thu Jun  6 19:38:48 2019 daemon.err hostapd: Could not set interface wlan0.sta1 flags (UP): Invalid argument
Thu Jun  6 19:38:48 2019 daemon.err hostapd: nl80211: Failed to set WDS STA interface wlan0.sta1 up
Thu Jun  6 19:38:48 2019 daemon.err hostapd: nl80211: NL80211_ATTR_STA_VLAN (addr=18:e8:29:30:9b:2c ifname=wlan0.sta1 vlan_id=0) failed: -127 (Network is down)
Thu Jun  6 19:38:48 2019 daemon.notice hostapd: wlan0: AP-STA-CONNECTED 18:e8:29:30:9b:2c
Thu Jun  6 19:38:48 2019 daemon.info hostapd: wlan0: STA 18:e8:29:30:9b:2c WPA: pairwise key handshake completed (RSN)

The interesting thing is that I was able to bridge the Archer C7 v1's 5 GHz radio using TP-Link firmware without any problems, even before I applied this patch.

Which hardware is this? OpenWrt / ath10k will not work on Archer C7v1. That first version of the chip has silicon bugs that the developers were unwilling to work around, so they abandoned it.

The bridge patch only affects clients. The unpatched kernel still worked on APs.

Though the memories are dusty, I still recall being very happy that I didn't buy a C7v1 when they were introduced, because of the lack of open-source driver support. As far as I know, the C7v1's chip revision still isn't (and never will be) supported for 5 GHz.

It's a LiteBeam AC Gen2. The Archer was running TP-Link firmware when the WDS bridge was established with one of these.

Set log_level to 0 on the AP; there may be more information.

It looks like this is an ath10k bug in AP mode that doesn't belong in this thread.