WiFi on WDR3600 unstable after 2-3 Days

Hello LEDE Community,

I'm using two TP-LINK WDR3600s (one per floor). After around 2 days the WiFi on both devices starts to get unstable: mobile devices don't get any data throughput, connections drop, and they are not able to connect to the WiFi...

I also have a TP-Link CPE210 acting as a client, whose syslog starts spamming messages like this:

Tue Jan 3 22:18:51 2017 kern.info kernel: [40666.993067] wlan0: authenticate with a0:...
Tue Jan 3 22:18:51 2017 kern.info kernel: [40667.015504] wlan0: send auth to a0:... (try 1/3)
Tue Jan 3 22:18:51 2017 kern.info kernel: [40667.132279] wlan0: send auth to a0:... (try 2/3)
Tue Jan 3 22:18:52 2017 kern.info kernel: [40667.242280] wlan0: send auth to a0:... (try 3/3)
Tue Jan 3 22:18:52 2017 kern.info kernel: [40667.354084] wlan0: authentication with a0:... timed out

Tue Jan 3 22:18:50 2017 kern.info kernel: [40665.373188] wlan0: authenticate with 64:...
Tue Jan 3 22:18:50 2017 kern.info kernel: [40665.395635] wlan0: send auth to 64:... (try 1/3)
Tue Jan 3 22:18:50 2017 kern.info kernel: [40665.472852] wlan0: send auth to 64:... (try 2/3)
Tue Jan 3 22:18:50 2017 kern.info kernel: [40665.588255] wlan0: send auth to 64:... (try 3/3)
Tue Jan 3 22:18:50 2017 kern.info kernel: [40665.679647] wlan0: authentication with 64:... timed out
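To see how often the CPE210 fails against each AP over a longer period, the timeouts can be counted per BSSID. This is a minimal sketch that assumes the mac80211 message format shown above (`authentication with <bssid> timed out`); the function name is made up for illustration.

```sh
#!/bin/sh
# Sketch: count "authentication with <bssid> timed out" lines per AP in a
# kernel/syslog excerpt read from stdin. Assumes the message format above.
count_auth_timeouts() {
    awk '
        /authentication with .* timed out/ {
            # The BSSID is the word right after "with"
            for (i = 1; i < NF; i++)
                if ($i == "with") { n[$(i + 1)]++; break }
        }
        END { for (b in n) print b, n[b] }
    '
}
```

On the CPE210 itself this would be used as `logread | count_auth_timeouts` (logread being the LEDE syslog reader). If only one of the two BSSIDs shows up, the problem is more likely on that AP than on the client.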

The a0:... MAC is the WDR3600 on the lower floor, 64:... the one on the upper floor. The syslog of the lower-floor WDR3600 starts spamming entries like:

Wed Jan 4 03:18:43 2017 daemon.info hostapd: wlan0: STA 84:... WPA: group key handshake completed (RSN)
Wed Jan 4 03:28:43 2017 daemon.info hostapd: wlan0: STA 84:... WPA: group key handshake completed (RSN)
Wed Jan 4 03:38:46 2017 daemon.info hostapd: wlan0: STA 84:... WPA: group key handshake completed (RSN)
Wed Jan 4 03:48:43 2017 daemon.info hostapd: wlan0: STA 84:... WPA: group key handshake completed (RSN)
Wed Jan 4 03:58:43 2017 daemon.info hostapd: wlan0: STA 84:... WPA: group key handshake completed (RSN)
Wed Jan 4 04:08:43 2017 daemon.info hostapd: wlan0: STA 84:... WPA: group key handshake completed (RSN)
Wed Jan 4 04:18:43 2017 daemon.info hostapd: wlan0: STA 84:... WPA: group key handshake completed (RSN)
Wed Jan 4 04:28:43 2017 daemon.info hostapd: wlan0: STA 84:... WPA: group key handshake completed (RSN)

84:... is the MAC of my CPE210 client, but it's the same with any other client. This, for example, is my mobile phone on 5 GHz trying to get a grip:

Wed Jan 4 00:18:43 2017 daemon.info hostapd: wlan1: STA c0:...WPA: group key handshake completed (RSN)
Wed Jan 4 00:28:43 2017 daemon.info hostapd: wlan1: STA c0:...WPA: group key handshake completed (RSN)
Wed Jan 4 00:38:44 2017 daemon.info hostapd: wlan1: STA c0:...WPA: group key handshake completed (RSN)
Wed Jan 4 00:48:43 2017 daemon.info hostapd: wlan1: STA c0:...WPA: group key handshake completed (RSN)
Wed Jan 4 00:58:43 2017 daemon.info hostapd: wlan1: STA c0:...WPA: group key handshake completed (RSN)
Wed Jan 4 01:08:43 2017 daemon.info hostapd: wlan1: STA c0:...WPA: group key handshake completed (RSN)
Wed Jan 4 01:18:43 2017 daemon.info hostapd: wlan1: STA c0:...WPA: group key handshake completed (RSN)
Wed Jan 4 01:28:43 2017 daemon.info hostapd: wlan1: STA c0:...WPA: group key handshake completed (RSN)
Wed Jan 4 01:36:58 2017 daemon.info hostapd: wlan1: STA c0:...IEEE 802.11: authenticated
Wed Jan 4 01:36:58 2017 daemon.info hostapd: wlan1: STA c0:...IEEE 802.11: associated (aid 3)
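hostapd logs "group key handshake completed" when it distributes a new group key (GTK) to a station, so it helps to check how regular these lines are. The sketch below prints the gap in seconds between consecutive lines; a steady ~600 s spacing would be consistent with a periodic group rekey interval (hostapd's `wpa_group_rekey` option) rather than clients reconnecting. Whether that rekey has anything to do with the fault is not established here. It assumes the logread time format above and an excerpt that does not cross midnight; the optional filter argument is a hypothetical convenience for picking one station.

```sh
#!/bin/sh
# Sketch: print seconds between consecutive hostapd "group key handshake
# completed" lines from stdin. Optional $1 = substring to select one station,
# e.g. 'STA 84:'. Assumes time of day in field 4 and no midnight crossing.
handshake_gaps() {
    awk -v sta="$1" '
        /group key handshake completed/ && (sta == "" || index($0, sta)) {
            split($4, t, ":")
            s = t[1] * 3600 + t[2] * 60 + t[3]
            if (seen) print s - prev
            prev = s; seen = 1
        }'
}
```

Example: `logread | handshake_gaps 'STA 84:'`. A gap that breaks the pattern, like the 01:36:58 re-authentication of the phone above, is where to look more closely.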

As soon as I restart both WDR3600s, everything is OK again for around 2-3 days. This problem has been around for a long time now; I'm not sure when it all started, but I have the feeling it got worse over the last 2-3 weeks. I'm updating my devices regularly with the latest LEDE snapshot.

Any ideas?

Best Regards - Daniel

I also happen to have two WDR3600s, both running LEDE Reboot r2677, and both currently have an uptime of about 8 days with no issues. Have you tried reinstalling the firmware, wiping all settings and configuring everything from scratch?

Did you have similar problems? I've been postponing a fresh install: I've got VLANs, multiple SSIDs, ser2net, SNMP and a print server running on them, so a clean install will take 1 or 2 hours.

Today it got worse after just a few hours, so I had to restart them again.

I know it's a pain in the ass to redo everything. I did it anyway, as they were on OpenWrt and I wasn't confident about moving them to LEDE without wiping everything. I've got a similar setup to yours: VLANs, multiple SSIDs, print server, Asterisk, OpenVPN.

Looks like I have to bite the bullet. I think I will return to the stock TP-LINK firmware and then clean flash LEDE again, just to be sure.

That is definitely not necessary; just issue "firstboot" or "sysupgrade -n new-firmware-snapshot.bin" to nuke the existing config on your device.

I know, but better safe than sorry :wink:

I did a fresh installation yesterday and the problems reappeared after just 3.5 hours :frowning:

Fri Jan  6 07:54:36 2017 kern.info kernel: [59991.324982] wlan0: authenticate with a0:...
Fri Jan  6 07:54:36 2017 kern.info kernel: [59991.347418] wlan0: send auth to a0:... (try 1/3)
Fri Jan  6 07:54:36 2017 kern.info kernel: [59991.421572] wlan0: send auth to a0:... (try 2/3)
Fri Jan  6 07:54:36 2017 kern.info kernel: [59991.491070] wlan0: send auth to a0:... (try 3/3)
Fri Jan  6 07:54:36 2017 kern.info kernel: [59991.551076] wlan0: authentication with a0:... timed out

I have absolutely no clue what the problem is here. It's the exact same problem on both WDR3600s, so a hardware defect is very unlikely.

I picked one WDR3600, disabled 5 GHz, enabled the MAC filter on 2.4 GHz to only accept the MAC of the CPE210, and told the CPE210 to only connect to this BSSID. So there is only this one client connected to this one WDR3600. Rebooted both devices...

BAAM, unstable from the start:

PING 192.168.1.5 (192.168.1.5): 56 data bytes
64 bytes from 192.168.1.5: seq=0 ttl=64 time=609.641 ms
64 bytes from 192.168.1.5: seq=1 ttl=64 time=191.071 ms
64 bytes from 192.168.1.5: seq=2 ttl=64 time=914.054 ms
64 bytes from 192.168.1.5: seq=3 ttl=64 time=982.639 ms
64 bytes from 192.168.1.5: seq=4 ttl=64 time=589.829 ms
64 bytes from 192.168.1.5: seq=5 ttl=64 time=933.197 ms
64 bytes from 192.168.1.5: seq=6 ttl=64 time=694.081 ms
64 bytes from 192.168.1.5: seq=7 ttl=64 time=514.259 ms
64 bytes from 192.168.1.5: seq=8 ttl=64 time=1012.370 ms
64 bytes from 192.168.1.5: seq=9 ttl=64 time=22.127 ms
64 bytes from 192.168.1.5: seq=10 ttl=64 time=948.874 ms
64 bytes from 192.168.1.5: seq=11 ttl=64 time=89.344 ms
64 bytes from 192.168.1.5: seq=12 ttl=64 time=541.406 ms
64 bytes from 192.168.1.5: seq=13 ttl=64 time=460.133 ms
64 bytes from 192.168.1.5: seq=14 ttl=64 time=837.942 ms
64 bytes from 192.168.1.5: seq=15 ttl=64 time=87.990 ms
64 bytes from 192.168.1.5: seq=16 ttl=64 time=47.822 ms
^C
--- 192.168.1.5 ping statistics ---
18 packets transmitted, 17 packets received, 5% packet loss
round-trip min/avg/max = 22.127/557.457/1012.370 ms
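The min/avg/max line hides how many replies are actually bad. A minimal awk sketch that reads ping output and also counts replies above a threshold; the 500 ms default is an arbitrary choice, and the function name is made up for illustration.

```sh
#!/bin/sh
# Sketch: summarise "time=... ms" values from ping output on stdin.
# $1 = threshold in ms (default 500, an arbitrary choice).
ping_summary() {
    awk -v limit="${1:-500}" '
        /time=/ {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^time=/) { v = substr($i, 6) + 0; break }
            n++; sum += v
            if (n == 1 || v < min) min = v
            if (n == 1 || v > max) max = v
            if (v > limit) slow++
        }
        END {
            if (n == 0) { print "no replies"; exit }
            printf "replies=%d min=%.3f avg=%.3f max=%.3f over%d=%d\n",
                n, min, sum / n, max, limit, slow
        }'
}
```

Usage: `ping -c 20 192.168.1.5 | ping_summary`. For the run above this reports 11 of the 17 replies over 500 ms, so the link is bad most of the time, not just occasionally.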

Will try a TP-LINK WR-841N as a client later.

The lag it's giving you is insane. Meanwhile my two WDR3600s now have uptimes of 8 and 9 days respectively, with no issues. The only difference I can think of is that I built my own images rather than using the default ones downloadable here, because I wanted to squeeze in packages like Asterisk without using extroot.

The ping was from a PC wired to the 3600 to the CPE210, which is bridged through WDS. I have a feeling it has something to do with either the CPE210 or WDS.

I switched both 3600s to master mode without WDS and turned off the CPE210. Running without problems for a few hours now. Will investigate further...

There were ath9k updates a few days ago, so I did a clean install again... It's been running for 2 days without problems now, with WDS turned on... Will try putting some load on the CPE210 and see if it's stable.

No, it's not. Even with the latest build it gets unstable after 1-2 days, until it drops all 2.4 GHz clients and does not accept any new ones :frowning: Everything is fine after a reboot.

Personally I cannot reproduce that issue (uptime 14 days, no problems; tl-wd4300 <--> tl-wdr3600 WDS).

And here it goes again... It went away again around 13:55; at most 10 minutes later, no 2.4 GHz client was connected or able to connect.

Syslog: http://pastebin.com/HXLZ3sg8
Kernel Log: http://pastebin.com/v9d05yXx
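To pin down exactly when the 2.4 GHz clients disappear, one option is to log the number of associated stations periodically, so the syslog itself records the drop. This is a sketch: `iw dev <iface> station dump` prints one block per station starting with `Station <mac>`, while the interface names (wlan0/wlan1, taken from the logs above), the 60 s interval and the `stawatch` tag are assumptions to adjust; check `iw dev` for which interface is 2.4 GHz.

```sh
#!/bin/sh
# Sketch: count stations in `iw dev <iface> station dump` output on stdin.
# Each station block starts with "Station <mac> (on <iface>)".
count_stations() {
    grep -c '^Station ' || true   # grep -c prints 0 but exits 1 on no match
}

# Hypothetical watch loop: log the count per interface once a minute.
# Interface names and interval are assumptions, not taken from LEDE defaults.
watch_stations() {
    while :; do
        for ifc in wlan0 wlan1; do
            n=$(iw dev "$ifc" station dump 2>/dev/null | count_stations)
            logger -t stawatch "$ifc stations=$n"
        done
        sleep 60
    done
}
```

Started in the background (`watch_stations &`), `logread | grep stawatch` would then show the minute the count on the 2.4 GHz interface falls to zero, which can be lined up with the kernel log.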

Hi, I also have a TL-WDR3600 v1, currently running 17.01.0-rc2, and I also experience WiFi instabilities (after about 24h). My clients can connect, but don't get IP addresses. I can connect to the router using a static IP (even over a WDS bridge; I'm really starting to like WDS...). After restarting dnsmasq everything seems to be back to normal again.
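Before restarting dnsmasq next time, it may help to check whether DHCP requests still reach it. dnsmasq logs each DHCP message as `DHCPDISCOVER(<iface>)`, `DHCPOFFER(<iface>)` and so on; if DISCOVERs arrive but no OFFERs go out, dnsmasq is the problem, while no DISCOVERs at all would point back at the wireless side. A minimal sketch, assuming dnsmasq's DHCP logging is not turned off:

```sh
#!/bin/sh
# Sketch: count dnsmasq DHCP message types in a log read from stdin.
dhcp_counts() {
    awk '
        /DHCPDISCOVER\(/ { d++ }
        /DHCPOFFER\(/    { o++ }
        /DHCPREQUEST\(/  { r++ }
        /DHCPACK\(/      { a++ }
        END { printf "discover=%d offer=%d request=%d ack=%d\n", d, o, r, a }'
}
```

Usage on the router: `logread | dhcp_counts`, then `/etc/init.d/dnsmasq restart`.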

When looking through the logs I found this trace: http://pastebin.com/RKkJKLp7

I heard other people had WiFi problems after ath9k airtime fairness was enabled, but I used a LEDE master snapshot with the airtime fairness patches a few weeks ago and never had problems, so I suspect this is something else.

It wouldn't hurt for @cyablo to disable ATF (airtime fairness) and see if it helps...

On the command line:

echo 0 > /sys/kernel/debug/ieee80211/phy0/ath9k/airtime_flags
echo 0 > /sys/kernel/debug/ieee80211/phy1/ath9k/airtime_flags
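The two commands above assume the radios are exactly phy0 and phy1. A slightly more defensive sketch loops over every phy that exposes `ath9k/airtime_flags`, prints the value before and after writing 0, and skips the case where nothing matches (e.g. debugfs not mounted). Like the original commands, this is a runtime-only debugfs change and is gone after a reboot. `ATF_ROOT` is a hypothetical override that only exists so the function can be pointed at a test directory.

```sh
#!/bin/sh
# Sketch: write 0 to airtime_flags on every ath9k phy found under debugfs.
# ATF_ROOT is a hypothetical override; the default is the debugfs path above.
disable_atf() {
    root="${ATF_ROOT:-/sys/kernel/debug/ieee80211}"
    for f in "$root"/phy*/ath9k/airtime_flags; do
        # If the glob matched nothing, it stays literal; skip it.
        [ -e "$f" ] || continue
        echo "$f: was $(cat "$f")"
        echo 0 > "$f"
        echo "$f: now $(cat "$f")"
    done
}
```

Printing the "now" value makes it easy to confirm the flags really are 0 when reporting back whether it helped.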

Will try disabling airtime fairness and reply if it helps. The problem still exists on the current snapshot.

Just to let you know: it's unrelated to airtime fairness; the problem persists with the flags set to 0.

I am running a WDR3600 and had to reboot today. My configuration is minimal, because hosting applications on a firewall is just crazy. Next time I will try to connect over SSH and display memory usage, disk usage, etc. Looks like a memory leak IMHO.

Definitely no memory leak: I'm monitoring all my hardware via SNMP. Memory usage stays under 30% and CPU mostly under 10%. These devices are quite powerful, with enough juice to run smaller applications like a print server as well.
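For anyone without SNMP who wants the same memory figure over SSH, a minimal sketch that computes used memory in percent from /proc/meminfo. It uses MemAvailable when the kernel provides it and otherwise falls back to MemFree + Buffers + Cached as a rough estimate; `MEMINFO` is a hypothetical override for testing.

```sh
#!/bin/sh
# Sketch: print used memory in percent, computed from a meminfo file
# (default /proc/meminfo; MEMINFO is a hypothetical override).
mem_used_pct() {
    awk '
        $1 == "MemTotal:"     { total = $2 }
        $1 == "MemAvailable:" { avail = $2; have = 1 }
        $1 == "MemFree:"      { free = $2 }
        $1 == "Buffers:"      { buf = $2 }
        $1 == "Cached:"       { cached = $2 }
        END {
            if (total == 0) { print "n/a"; exit }
            if (!have) avail = free + buf + cached
            printf "%d\n", (total - avail) * 100 / total
        }' "${MEMINFO:-/proc/meminfo}"
}
```

Something like `logger -t memwatch "mem used: $(mem_used_pct)%"` run every few minutes would leave a trail in the syslog to compare against the time the WiFi drops.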