WDS repeater instability

Hello,
I needed to cover a bigger area with my wifi, so I set up a repeater following the guide at https://openwrt.org/docs/guide-user/network/wifi/wifiextenders/wds.
My main AP is a Xiaomi Redmi Router AX6S running OpenWrt 23.05.4 r24012-d8dd03c46f / LuCI openwrt-23.05 branch git-24.086.45142-09d5a38. The WDS client is a TP-Link Archer C7 v2 running OpenWrt 23.05.3 r23809-234f1a2efa / LuCI openwrt-23.05 branch git-24.073.29889-cd7e519. The client shows a signal/noise of -75/-93 dBm.

The setup works fine, but I'm facing random disconnects that usually do not recover without manual intervention. When I find the client disconnected, restarting the AP wifi interface or the client helps. Both result in an immediate reconnect and normal operation.
I have also caught a state where the client was still connected, but the connection was very poor (more than 50% ping packet loss). Again, restarting the AP wifi interface fixed it immediately.

These disconnects happen quite rarely, about once every one or two days, but it's really annoying, mainly because it needs a manual restart.

Is there anything I can setup better? I have almost all settings at defaults, just the wds ap and client from the wiki guide. There is nothing special in the logs.

best regards
Jan

-75 dBm is a very low signal. Can you try to move the AP a bit closer to the main router? Like a few meters closer.

Another test you can make is to use the other radio as the backhaul. It seems you're using 2.4GHz; you can do a test using the 5GHz radio, although at such low signal levels 5GHz can be worse. But give it a try.

I have WDS working fine at my GF's apartment between a Redmi AC2100 as main router and a Xiaomi Mi 4A 100M as AP and using the 5GHz radio.

I hope you can solve the situation.

Oh, and 23.05.4 had some throughput issues that were fixed in 23.05.5 (or avoided by downgrading to 23.05.3). You could update both routers first.

-75 dBm is a very low signal. Can you try to move the AP a bit closer to the main router? Like a few meters closer.

not an option, unfortunately :frowning:

Another test you can make is to use the other radio as the backhaul. It seems you're using 2.4GHz; you can do a test using the 5GHz radio, although at such low signal levels 5GHz can be worse. But give it a try.

you are right, it is too far for 5GHz, no signal

Oh, and 23.05.4 had some throughput issues that were fixed in 23.05.5 (or avoided by downgrading to 23.05.3). You could update both routers first.

I had older firmware on it until a few days ago, and the situation was the same.

I would understand connectivity issues because of the low SNR; what makes me mad is that after an interface reset it works like a charm for some time...

Please share your UCI configurations and also check CPU usage, memory usage and load average when the issue occurs (especially on the secondary AP, since it seems to be the problem).

it helps to restart AP wifi interface or the client

How do you restart it? (just the AP and not the whole radio?)

Also check the output of "logread" and "dmesg" when this happens.
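To gather the load, memory and log information requested above in one go while the link is broken, a small snapshot script could be run on each router. This is a minimal sketch, not an OpenWrt tool: the script name `wds-snapshot.sh`, the default output path under `/tmp` and the function names are made up for illustration, and any command missing on the device is noted in the output instead of aborting the run.

```shell
#!/bin/sh
# Hypothetical diagnostic snapshot for a broken WDS link (not an OpenWrt tool).
# Run on both routers while the problem is visible: sh wds-snapshot.sh run

# Append "== <command> ==" and that command's output to file $1.
# A missing command or a failing one is recorded instead of stopping the run.
section() {
    out=$1
    shift
    echo "== $* ==" >> "$out"
    if command -v "$1" >/dev/null 2>&1; then
        "$@" >> "$out" 2>&1 || echo "(exit status $?)" >> "$out"
    else
        echo "($1 not available)" >> "$out"
    fi
}

# Write a snapshot to $1 (default: /tmp/wds-snapshot-<epoch>.txt), print its path.
snapshot() {
    out=${1:-/tmp/wds-snapshot-$(date +%s).txt}
    : > "$out"
    section "$out" date
    section "$out" cat /proc/loadavg   # 1/5/15-minute load average
    section "$out" free                # memory usage
    section "$out" top -b -n 1         # per-process CPU usage, one iteration
    section "$out" iw dev              # wireless interfaces and channels
    section "$out" dmesg               # kernel and driver messages
    section "$out" logread             # system log (hostapd, wpa_supplicant)
    echo "$out"
}

# Only take a snapshot when called as "wds-snapshot.sh run [file]";
# otherwise the file just defines the functions.
if [ "${1:-}" = "run" ]; then
    snapshot "${2:-}"
fi
```

Running `sh wds-snapshot.sh run` prints the path of the file it wrote. A snapshot taken during the failure and another taken right after the radio restart can then be compared side by side.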

AP configuration:

package wireless

config wifi-device 'radio0'
	option type 'mac80211'
	option path 'platform/18000000.wmac'
	option channel '1'
	option band '2g'
	option htmode 'HT20'
	option cell_density '0'
	option country 'CZ'

config wifi-iface 'default_radio0'
	option device 'radio0'
	option network 'lan'
	option mode 'ap'
	option ssid 'puda'
	option encryption 'sae-mixed'
	option key 'xxx'
	option wds '1'
	option ieee80211r '1'
	option ft_over_ds '0'

Client configuration:

config wifi-device 'radio1'
	option type 'mac80211'
	option hwmode '11g'
	option log_level '2'
	option country 'CZ'
	option path 'platform/ahb/18100000.wmac'
	option cell_density '0'
	option htmode 'HT20'
	option channel '1'

config wifi-iface 'wifinet2'
	option device 'radio1'
	option mode 'sta'
	option ssid 'puda'
	option encryption 'sae-mixed'
	option key 'xxx'
	option wds '1'
	option network 'lan'

config wifi-iface 'wifinet3'
	option device 'radio1'
	option mode 'ap'
	option encryption 'psk-mixed'
	option network 'lan'
	option ssid 'puda'
	option key 'xxx'
	option ieee80211r '1'
	option ft_over_ds '0'
	option ft_psk_generate_local '1'

How do you restart it? (just the AP and not the whole radio?)

Restarting the AP radio is enough to get back to a working state. I don't think I have tried restarting only the client radio, because it was easier to just reboot the client, which helps as well.

I have not noticed anything weird in logread
AP: logread | grep <client MAC>

Wed Nov  6 20:40:16 2024 daemon.info hostapd: wl0-ap0: STA d2:6e:0e:28:e7:ae WPA: group key handshake completed (RSN)
Thu Nov  7 20:40:18 2024 daemon.info hostapd: wl0-ap0: STA d2:6e:0e:28:e7:ae WPA: group key handshake completed (RSN)
Thu Nov  7 22:00:52 2024 daemon.info hostapd: wl0-ap0: STA d2:6e:0e:28:e7:ae IEEE 802.11: associated (aid 1)
Thu Nov  7 22:00:52 2024 daemon.info hostapd: wl0-ap0: STA d2:6e:0e:28:e7:ae IEEE 802.11: associated (aid 1)
Thu Nov  7 22:00:53 2024 daemon.info hostapd: wl0-ap0: STA d2:6e:0e:28:e7:ae WPA: pairwise key handshake completed (RSN)
Thu Nov  7 22:00:53 2024 daemon.notice hostapd: wl0-ap0: EAPOL-4WAY-HS-COMPLETED d2:6e:0e:28:e7:ae
Fri Nov  8 07:19:23 2024 daemon.info hostapd: wl0-ap0: STA d2:6e:0e:28:e7:ae IEEE 802.11: associated (aid 1)
Fri Nov  8 07:19:23 2024 daemon.info hostapd: wl0-ap0: STA d2:6e:0e:28:e7:ae IEEE 802.11: associated (aid 1)
Fri Nov  8 07:19:24 2024 daemon.info hostapd: wl0-ap0: STA d2:6e:0e:28:e7:ae WPA: pairwise key handshake completed (RSN)
Fri Nov  8 07:19:24 2024 daemon.notice hostapd: wl0-ap0: EAPOL-4WAY-HS-COMPLETED d2:6e:0e:28:e7:ae
Fri Nov  8 20:40:19 2024 daemon.info hostapd: wl0-ap0: STA d2:6e:0e:28:e7:ae WPA: group key handshake failed (RSN) after 4 tries
Fri Nov  8 20:40:19 2024 daemon.notice hostapd: wl0-ap0: AP-STA-DISCONNECTED d2:6e:0e:28:e7:ae
Fri Nov  8 20:40:24 2024 daemon.info hostapd: wl0-ap0: STA d2:6e:0e:28:e7:ae IEEE 802.11: deauthenticated due to local deauth request
Fri Nov  8 20:40:24 2024 daemon.err hostapd: nl80211: NL80211_ATTR_STA_VLAN (addr=d2:6e:0e:28:e7:ae ifname=wl0-ap0 vlan_id=0) failed: -2 (No such file or directory)
Fri Nov  8 20:40:25 2024 daemon.notice hostapd: wl0-ap0: WDS-STA-INTERFACE-REMOVED ifname=wl0-ap0.sta1 sta_addr=d2:6e:0e:28:e7:ae
Fri Nov  8 20:43:35 2024 daemon.info hostapd: wl0-ap0: STA d2:6e:0e:28:e7:ae IEEE 802.11: associated (aid 1)
Fri Nov  8 20:43:35 2024 daemon.notice hostapd: wl0-ap0: WDS-STA-INTERFACE-ADDED ifname=wl0-ap0.sta1 sta_addr=d2:6e:0e:28:e7:ae
Fri Nov  8 20:43:36 2024 daemon.notice hostapd: wl0-ap0: AP-STA-CONNECTED d2:6e:0e:28:e7:ae auth_alg=sae
Fri Nov  8 20:43:36 2024 daemon.info hostapd: wl0-ap0: STA d2:6e:0e:28:e7:ae WPA: pairwise key handshake completed (RSN)
Fri Nov  8 20:43:36 2024 daemon.notice hostapd: wl0-ap0: EAPOL-4WAY-HS-COMPLETED d2:6e:0e:28:e7:ae

Client: logread | grep <AP MAC>

Fri Nov  8 07:19:20 2024 daemon.notice wpa_supplicant[1547]: phy1-sta0: CTRL-EVENT-DISCONNECTED bssid=5c:02:14:b0:18:ca reason=4 locally_generated=1
Fri Nov  8 07:19:22 2024 daemon.notice wpa_supplicant[1547]: phy1-sta0: SME: Trying to authenticate with 5c:02:14:b0:18:ca (SSID='puda' freq=2412 MHz)
Fri Nov  8 07:19:22 2024 kern.info kernel: [175337.106920] phy1-sta0: authenticate with 5c:02:14:b0:18:ca
Fri Nov  8 07:19:22 2024 kern.info kernel: [175337.143991] phy1-sta0: send auth to 5c:02:14:b0:18:ca (try 1/3)
Fri Nov  8 07:19:22 2024 daemon.notice wpa_supplicant[1547]: phy1-sta0: SME: Trying to authenticate with 5c:02:14:b0:18:ca (SSID='puda' freq=2412 MHz)
Fri Nov  8 07:19:22 2024 kern.info kernel: [175337.249623] phy1-sta0: authenticate with 5c:02:14:b0:18:ca
Fri Nov  8 07:19:22 2024 kern.info kernel: [175337.255431] phy1-sta0: send auth to 5c:02:14:b0:18:ca (try 1/3)
Fri Nov  8 07:19:22 2024 daemon.notice wpa_supplicant[1547]: phy1-sta0: Trying to associate with 5c:02:14:b0:18:ca (SSID='puda' freq=2412 MHz)
Fri Nov  8 07:19:22 2024 kern.info kernel: [175337.274699] phy1-sta0: associate with 5c:02:14:b0:18:ca (try 1/3)
Fri Nov  8 07:19:22 2024 kern.info kernel: [175337.283139] phy1-sta0: RX AssocResp from 5c:02:14:b0:18:ca (capab=0x431 status=30 aid=1)
Fri Nov  8 07:19:22 2024 kern.info kernel: [175337.291542] phy1-sta0: 5c:02:14:b0:18:ca rejected association temporarily; comeback duration 1000 TU (1024 ms)
Fri Nov  8 07:19:23 2024 kern.info kernel: [175338.345130] phy1-sta0: associate with 5c:02:14:b0:18:ca (try 2/3)
Fri Nov  8 07:19:23 2024 kern.info kernel: [175338.454649] phy1-sta0: associate with 5c:02:14:b0:18:ca (try 3/3)
Fri Nov  8 07:19:23 2024 kern.info kernel: [175338.523529] phy1-sta0: RX AssocResp from 5c:02:14:b0:18:ca (capab=0x431 status=0 aid=1)
Fri Nov  8 07:19:23 2024 daemon.notice wpa_supplicant[1547]: phy1-sta0: Associated with 5c:02:14:b0:18:ca
Fri Nov  8 07:19:24 2024 daemon.notice wpa_supplicant[1547]: phy1-sta0: WPA: Key negotiation completed with 5c:02:14:b0:18:ca [PTK=CCMP GTK=CCMP]
Fri Nov  8 07:19:24 2024 daemon.notice wpa_supplicant[1547]: phy1-sta0: CTRL-EVENT-CONNECTED - Connection to 5c:02:14:b0:18:ca completed [id=0 id_str=]
Fri Nov  8 20:43:34 2024 kern.info kernel: [223588.732413] phy1-sta0: deauthenticating from 5c:02:14:b0:18:ca by local choice (Reason: 2=PREV_AUTH_NOT_VALID)
Fri Nov  8 20:43:34 2024 daemon.notice wpa_supplicant[1547]: phy1-sta0: CTRL-EVENT-DISCONNECTED bssid=5c:02:14:b0:18:ca reason=2 locally_generated=1
Fri Nov  8 20:43:35 2024 daemon.notice wpa_supplicant[1547]: phy1-sta0: SME: Trying to authenticate with 5c:02:14:b0:18:ca (SSID='puda' freq=2412 MHz)
Fri Nov  8 20:43:35 2024 kern.info kernel: [223590.043410] phy1-sta0: authenticate with 5c:02:14:b0:18:ca
Fri Nov  8 20:43:35 2024 kern.info kernel: [223590.080446] phy1-sta0: send auth to 5c:02:14:b0:18:ca (try 1/3)
Fri Nov  8 20:43:35 2024 daemon.notice wpa_supplicant[1547]: phy1-sta0: SME: Trying to authenticate with 5c:02:14:b0:18:ca (SSID='puda' freq=2412 MHz)
Fri Nov  8 20:43:35 2024 kern.info kernel: [223590.199606] phy1-sta0: authenticate with 5c:02:14:b0:18:ca
Fri Nov  8 20:43:35 2024 kern.info kernel: [223590.205330] phy1-sta0: send auth to 5c:02:14:b0:18:ca (try 1/3)
Fri Nov  8 20:43:35 2024 daemon.notice wpa_supplicant[1547]: phy1-sta0: Trying to associate with 5c:02:14:b0:18:ca (SSID='puda' freq=2412 MHz)
Fri Nov  8 20:43:35 2024 kern.info kernel: [223590.253992] phy1-sta0: associate with 5c:02:14:b0:18:ca (try 1/3)
Fri Nov  8 20:43:35 2024 kern.info kernel: [223590.315059] phy1-sta0: RX AssocResp from 5c:02:14:b0:18:ca (capab=0x431 status=0 aid=1)
Fri Nov  8 20:43:35 2024 daemon.notice wpa_supplicant[1547]: phy1-sta0: Associated with 5c:02:14:b0:18:ca
Fri Nov  8 20:43:35 2024 daemon.notice wpa_supplicant[1547]: phy1-sta0: WPA: Key negotiation completed with 5c:02:14:b0:18:ca [PTK=CCMP GTK=CCMP]
Fri Nov  8 20:43:35 2024 daemon.notice wpa_supplicant[1547]: phy1-sta0: CTRL-EVENT-CONNECTED - Connection to 5c:02:14:b0:18:ca completed [id=0 id_str=]

I'll try to get the rest of the info during the failed state, if I manage to catch it again...
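With logs this long, it can help to filter down to the link lifecycle events only. A minimal sketch; the function name `link_events` is hypothetical, and the patterns are simply event strings that appear in the two logs above (hostapd on the AP, wpa_supplicant and the kernel on the client). In the filtered AP log, for example, it is easier to see that the group key handshakes recur at about 20:40 each day and that the Nov 8 drop starts with the one that failed.

```shell
#!/bin/sh
# Hypothetical filter: keep only association/handshake lifecycle lines from
# a hostapd / wpa_supplicant / kernel log read on stdin.
# Usage on OpenWrt: logread | link_events
link_events() {
    grep -E 'AP-STA-(CONNECTED|DISCONNECTED)|CTRL-EVENT-(CONNECTED|DISCONNECTED|BEACON-LOSS)|handshake failed|deauthenticat'
}
```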

Could you try disabling the 802.11r/FT options? I also see that some of your interfaces have encryption set to 'psk-mixed' and others to 'sae-mixed'. What do you actually need for authentication/encryption?
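For reference, disabling 802.11r with the `uci` command line could look like this. The section names (`default_radio0` on the main AP, `wifinet3` on the repeater) are taken from the configurations posted in this thread; adjust them if yours differ. This only touches the FT options and leaves the encryption settings as they are.

```shell
# On the main AP (Redmi AX6S): the AP interface from the posted config
uci set wireless.default_radio0.ieee80211r='0'
uci delete wireless.default_radio0.ft_over_ds
uci commit wireless
wifi reload

# On the repeater (Archer C7): the local AP interface 'wifinet3'
uci set wireless.wifinet3.ieee80211r='0'
uci delete wireless.wifinet3.ft_over_ds
uci delete wireless.wifinet3.ft_psk_generate_local
uci commit wireless
wifi reload
```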

I know it is not the solution, but in this kind of case what can be done in the meantime is to look for some parameter you can monitor to detect that the problem is occurring, so that at least a script can "recover" the system automatically. In your case restarting the radio fixes it, so it would be a script that restarts the radio when it "detects" that the problem is occurring.

That's why I suggest you look at RAM, flash, CPU % and load average to check for abnormalities.

That, combined with the ping test you do and/or anything more specific, could be taken into account to make the script as accurate as possible, so that it only resets the radio when truly necessary.
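A minimal sketch of such a recovery script for the repeater, built on the ping test idea. Everything here is an assumption for illustration: the main AP address `192.168.1.1`, the thresholds and the script name `wds-watchdog.sh`. `radio1` is the backhaul radio in the posted client config. Note that the thread only confirms that restarting the main AP's radio or rebooting the repeater helps; whether restarting the repeater's own radio is enough is untested, and since `radio1` also carries the local `wifinet3` AP, restarting it briefly drops local clients too.

```shell
#!/bin/sh
# Hypothetical link watchdog for the WDS repeater (Archer C7). Not an OpenWrt
# package; the address and thresholds below are assumptions, not from the thread.

MAIN_AP_IP="${MAIN_AP_IP:-192.168.1.1}"   # assumed LAN address of the main AP
RADIO="${RADIO:-radio1}"                  # backhaul radio in the posted client config
LOSS_LIMIT="${LOSS_LIMIT:-50}"            # a round is "bad" at or above this loss %
FAILS_NEEDED="${FAILS_NEEDED:-3}"         # consecutive bad rounds before acting
INTERVAL="${INTERVAL:-60}"                # seconds between rounds

# Read ping output on stdin and print the packet-loss percentage as an integer.
# Prints 100 when no summary line is found (e.g. ping could not run at all).
loss_percent() {
    awk '/packet loss/ {
             for (i = 1; i <= NF; i++)
                 if ($i ~ /%$/) { sub(/%$/, "", $i); pct = int($i); found = 1 }
         }
         END { print (found ? pct : 100) }'
}

# Succeed when the loss percentage given as $1 counts as a bad round.
is_bad_round() {
    [ "$1" -ge "$LOSS_LIMIT" ]
}

# Bounce only the backhaul radio and note it in the system log.
restart_backhaul() {
    logger -t wds-watchdog "restarting $RADIO after $FAILS_NEEDED bad rounds"
    wifi down "$RADIO"
    sleep 5
    wifi up "$RADIO"
}

watch_loop() {
    fails=0
    while :; do
        loss=$(ping -c 10 -W 2 "$MAIN_AP_IP" 2>/dev/null | loss_percent)
        if is_bad_round "$loss"; then
            fails=$((fails + 1))
        else
            fails=0
        fi
        if [ "$fails" -ge "$FAILS_NEEDED" ]; then
            restart_backhaul
            fails=0
            sleep 120   # let the station re-associate before measuring again
        fi
        sleep "$INTERVAL"
    done
}

# Start the loop only when called as "wds-watchdog.sh run"; otherwise the file
# just defines the functions (handy for testing them).
if [ "${1:-}" = "run" ]; then
    watch_loop
fi
```

It could be started in the background with `sh /root/wds-watchdog.sh run &`, and its actions found with `logread | grep wds-watchdog`. Requiring several consecutive bad rounds is meant to avoid restarting on a single short fade. If restarting the repeater's radio turns out not to help, `restart_backhaul` could call `reboot` instead, which the thread confirms does recover the link.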

This is very likely the cause of the problem.
Remember that wifi uses microwave frequencies. There is a reason that radar also uses microwaves: you get well-defined reflections. This is great for radar, but bad for wifi.

Typically, especially indoors, you will get many reflections, giving rise to multiple paths and both constructive and destructive interference. To make things worse, these reflections can and will be dynamic in nature, e.g. people walking around, opening and closing doors, your cat walking up the stairs, etc.

With an average signal strength of only -75 dBm, your cat could make it drop below the noise level for a few seconds...
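To put numbers on this, using the -75/-93 dBm signal/noise pair from the first post: the link has only about 18 dB of margin above the noise floor, so any fade deeper than that, i.e. received power dropping by a factor of roughly 63, pushes the signal into the noise.

```latex
\mathrm{SNR} = P_{\mathrm{signal}} - P_{\mathrm{noise}}
             = -75\,\mathrm{dBm} - (-93\,\mathrm{dBm}) = 18\,\mathrm{dB},
\qquad 10^{18/10} = 10^{1.8} \approx 63
```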

Yes, relocating both sides of the WDS link to be closer together will often fix the problem, but if you cannot do this, just moving a few centimetres, even further away or side to side, might be enough (because the wavelengths involved are also centimetres).
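The centimetre scale follows from the wavelength. Channel 1 (freq=2412 MHz in the client log) gives roughly 12 cm, and multipath fading nulls are typically on the order of half a wavelength apart, so a shift of about 6 cm can be enough to move an antenna out of a null.

```latex
\lambda = \frac{c}{f} = \frac{3.0\times10^{8}\,\mathrm{m/s}}{2.412\times10^{9}\,\mathrm{Hz}}
        \approx 0.124\,\mathrm{m} \approx 12.4\,\mathrm{cm},
\qquad \frac{\lambda}{2} \approx 6.2\,\mathrm{cm}
```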

I will try to disable fast roaming; let's see.

But maybe this is the issue: I noticed that the client log is filled with "daemon.notice wpa_supplicant[1547]: phy1-sta0: CTRL-EVENT-BEACON-LOSS" when the connection is broken.
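Since the beacon-loss messages seem to show up in the broken state, they could serve as the detection signal for the kind of recovery script suggested in the thread. A minimal sketch; the function names and the threshold are hypothetical, and it assumes the messages contain the exact string `CTRL-EVENT-BEACON-LOSS` as quoted above.

```shell
#!/bin/sh
# Hypothetical beacon-loss trigger for the repeater (not an OpenWrt tool).
# Example use on OpenWrt (window and threshold are assumptions):
#   if logread | tail -n 100 | beacon_loss_exceeds 5; then
#       wifi down radio1 && wifi up radio1
#   fi

# Print how many CTRL-EVENT-BEACON-LOSS lines appear on stdin (0 if none).
count_beacon_loss() {
    grep -c 'CTRL-EVENT-BEACON-LOSS' || true
}

# Succeed if the number of beacon-loss lines on stdin is at least $1.
beacon_loss_exceeds() {
    n=$(count_beacon_loss)
    [ "$n" -ge "$1" ]
}
```

Looking at only the last lines of the log is a crude time window: old beacon-loss lines stay in `logread` until they scroll out, so a check run from cron could fire again right after a restart unless it also waits or remembers when it last acted.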

CPU and flash load seem to be very low in all cases.

I know the signal is weak, and I would understand if there are connection drops.
The problem is that it won't recover without manual intervention.

Also my notebook is connected to my AP from the same place with no issues...

You can use watchcat to monitor the link and restart the network if necessary.
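A sketch of a watchcat setup on the repeater, assuming the main AP answers at 192.168.1.1 (adjust to your LAN). The option names and the `ping_reboot` mode are as I understand the watchcat package on 23.05; please check them against `/etc/config/watchcat` and the package documentation. `ping_reboot` reboots the repeater when the main AP stops answering for `period`, which is the recovery already confirmed to work in this thread.

```shell
# Install watchcat (and optionally its LuCI page) on the repeater.
opkg update
opkg install watchcat luci-app-watchcat

# Add a ping_reboot section; option names assumed, verify on your build.
uci add watchcat watchcat
uci set watchcat.@watchcat[-1].mode='ping_reboot'
uci set watchcat.@watchcat[-1].pinghosts='192.168.1.1'   # assumed main AP address
uci set watchcat.@watchcat[-1].period='10m'              # reboot after 10 min without replies
uci set watchcat.@watchcat[-1].pingperiod='30s'          # how often to ping
uci set watchcat.@watchcat[-1].forcedelay='1m'           # force reboot if soft reboot hangs
uci commit watchcat
/etc/init.d/watchcat restart
```

If the package ships a default section (for example one pinging a public address), review it as well, so that an Internet outage alone does not reboot the repeater.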

Can you dump the dmesg of the second router after the connection drops?

You may be getting errors such as a "wmac error".

There are a few things you could try....

  1. Both your WDS AP and client devices have movable antennas. Try re-orienting the antennas on both devices to see if you can get a better signal between them. In most cases, adjusting the antennas improves signal strength considerably. If your WDS client is on the upper floor while your WDS AP is on the ground floor, you would get a much better signal if you lay the antennas horizontally.

  2. Purchase another AP and dedicate it to the backhaul (or use it as a WiFi access point for your client devices). This means you will have 2 x AP on the remote side: 1 x AP for managing WiFi device connections and the other specifically for the wireless backhaul. When placing 2 x AP next to each other, care must be taken to prevent channel interference.

  3. Same as above, but with separate APs entirely dedicated to the backhaul and additional APs connected by LAN to the backhaul APs purely to handle client device connections. This is how I have configured my setup at home, and everything runs smoothly and reliably. My main AP gateway's WiFi is configured for client device connections, and connected to it by LAN is another AP running as the WDS AP, whose only function is to provide the backhaul. I have this dual-router arrangement in two other locations around my house, so in total I have 6 x routers = 12 radios (i.e. 6 x 2.4 GHz and 6 x 5 GHz). I found PingTools on my Android phone critical in helping me analyze the airwaves and select which frequencies those 12 radios use. This setup supports not only multiple mobile client devices but also multiple smart TVs running streaming services simultaneously.