Android device disconnects after fast roaming due to lost DHCP replies

Got a more detailed description of what you're trying and then observing? Also syslog output with debug logging for hostapd turned on?

I just looked again... The syslog output is confusing. The STA connects at 21:39:46 using 4-way-hs. Then either is happily connected or disappears for about a minute and shows back up with a disconnection? And then instantly reconnects using FT? Did you cut out log messages?

Also, for Android, please use the Ubiquiti WiFiMan app:
When you open it, hit "wireless" bottom center. Then select your SSID and you'll see all your APs. Then select "signal mapper". This shows you signal strength and latency to 8.8.8.8.
In addition, and this is the interesting part, it shows you the handoffs at the bottom. When the device roams from one AP to another you either see "AP1 --> AP2", which means FT, or you see "AP1 --> disconnected" and "disconnected --> AP2" which means 4-way-hs.
You can then walk around and see where signal strength drops off and when the device roams and how it does it.

1 Like

Exactly the way you said it, that's what's happening. The disappearance is because I walked towards the second AP (didn't share the logs for second AP but basically it did the FT in the same manner as first AP, exactly same logs 21:40:43 to 21:40:43 appear on the second AP). I'm not sure why the AP-STA-DISCONNECTED appears, does this appear with you? The thing is the FT on second AP works and I get connectivity, and then when I walk back to first AP it behaves like in the logs I posted.

Will do the WiFiMan app steps you shared and will post an update, thanks for sharing!

So I have option max_inactivity '20' which causes the disconnect due to timeout (generally) to happen before the next reconnect. Dunno whether that makes a difference -- it shouldn't unless there's a bug, worth a shot perhaps...

OMG, this is a bit embarrassing! I rebooted both APs, and Finally! FT is working! :smiling_face_with_tear: :tada: (typical did you try "turning it off and on again")

And the strangest part is that my config was right all along from the beginning. It's very strange and I tried to recall what I have done on the APs. AP#2 was not touched at all, all of my changes were done on AP#1.

At one point (within the past two weeks) I tried creating a vlan on AP#1 which didn't work out for some reason so I reverted back the configuration. Then I created this post when I discovered that the FT was not working properly. I tried the reassociation_deadline or max_inactivity on AP#1 and AP#2 but that didn't solve the problem so I reverted. Finally I thought of updating firewall package on AP#1. Then I tried updating the packages on AP#1, and the firewall package got updated. Afterwards I realized that the iptables were loaded with fw3 rules on AP#1 but there were none in AP#2 (here I'm not sure if the iptables were loaded before the firewall package upgrade or afterwards, but my guess was afterwards as I reviewed the upgrade output and it showed it was loading the rules). I ran /etc/init.d/firewall stop which removed all the rules, but that also didn't fix the FT. Finally I decided to try rebooting the APs (both of them) and that finally did the trick.

I'll try to keep an eye on this issue, hopefully I could catch the culprit that caused all of this. But indeed it was something on the lan ports/switch side, just literally blocking the packets after FT (I wished I could figure out what was going in the switch, via the commands, I just don't know how).

The weird thing is that after performing full wifi handshake with any of the APs, the packets for the phone would go through without any problems. On top of that and when the problem was still happening, I would also notice that the device wouldn't be able to get dhcp-reply after doing full wifi handshake, this behavior happens for a few minutes and then eventually the packets are unblocked and the device receives dhcp-reply and connects successfully.

Now AP#1->AP#2->AP#1->AP#2->AP#1->AP#2->... works beautifully, even with secondary ssid's on the wlan nic.

This is how a disabled firewall should look like in iptables (there is no need to remove /etc/config/firewall, just disabling it and stopping the firewall service will do the trick):

~# iptables -L -v -n
Chain INPUT (policy ACCEPT 8406 packets, 683K bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain FORWARD (policy ACCEPT 3 packets, 1002 bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain OUTPUT (policy ACCEPT 5063 packets, 2618K bytes)
 pkts bytes target     prot opt in     out     source               destination

And here's my config (didn't even need reassociation_deadline or max_inactivity):

config wifi-iface 'default_radio1'
        option device 'radio1'
        option network 'lan'
        option mode 'ap'
        option ssid '<redacted>'
        option dtim_period '3'
        option encryption 'psk2'
        option key '<redacted>'
        option ieee80211r '1'
        option mobility_domain '123F'
        option ft_over_ds '0'
        option ft_psk_generate_local '1'

Thanks for everyone's replies and suggestions.

2 Likes

As I understand it, the whole problem was in the firewall #AP1 ?

Unfortunately no, both APs were doing the same blocking behavior after carrying out the scenario of: connect to first AP, FT to second AP, then FT back to first AP. Regardless of which AP I connected to first, I ended up with the blocking behavior on that same AP.

My configuration: 2 linksys ea7500 2 and 1 xiaomi mi router 4a gigabit edition.
All routers are connected via wifi using 802.11s protocol on 5Ghz
All routers have a Wi-Fi network configured with one ssid on both bands
My scenario:

  1. Android phone (lg v50) is connected to ap1 5GHz
  2. I go to ap2 and see a switch to ap2
  3. I go back to ap1, there is a switch to ap1, then a connection break and a new connection

Got syslog output to post with debug logging for hostapd turned on?

AP1

Mon Sep 12 18:50:44 2022 daemon.err hostapd: nl80211: kernel reports: Channel is disabled
Mon Sep 12 18:51:25 2022 daemon.notice netifd: wan (3082): udhcpc: sending renew to 172.23.0.1
Mon Sep 12 18:51:25 2022 daemon.notice netifd: wan (3082): udhcpc: lease of 172.23.53.216 obtained, lease time 3600
Mon Sep 12 18:51:45 2022 daemon.err hostapd: nl80211: kernel reports: Channel is disabled
Mon Sep 12 18:51:45 2022 daemon.notice hostapd: handle_probe_req: send failed
Mon Sep 12 18:52:00 2022 daemon.info dnsmasq-dhcp[3428]: DHCPREQUEST(br-lan) 192.168.5.159 98:b8:ba:14:fe:51
Mon Sep 12 18:52:00 2022 daemon.info dnsmasq-dhcp[3428]: DHCPACK(br-lan) 192.168.5.159 98:b8:ba:14:fe:51 V50-ThinQ
Mon Sep 12 18:52:07 2022 daemon.err hostapd: nl80211: kernel reports: Channel is disabled
Mon Sep 12 18:52:07 2022 daemon.notice hostapd: handle_probe_req: send failed
Mon Sep 12 18:52:07 2022 daemon.err hostapd: nl80211: kernel reports: Channel is disabled
Mon Sep 12 18:52:07 2022 daemon.notice hostapd: handle_probe_req: send failed
Mon Sep 12 18:52:07 2022 daemon.err hostapd: nl80211: kernel reports: Channel is disabled
Mon Sep 12 18:52:07 2022 daemon.notice hostapd: handle_probe_req: send failed
Mon Sep 12 18:52:13 2022 daemon.err hostapd: nl80211: kernel reports: Channel is disabled
Mon Sep 12 18:52:13 2022 daemon.notice hostapd: wlan1-1: AP-STA-DISCONNECTED 98:b8:ba:14:fe:51
Mon Sep 12 18:52:13 2022 daemon.info hostapd: wlan1-1: STA 98:b8:ba:14:fe:51 IEEE 802.11: disassociated due to inactivity
Mon Sep 12 18:52:13 2022 daemon.debug hostapd: wlan1-1: STA 98:b8:ba:14:fe:51 MLME: MLME-DISASSOCIATE.indication(98:b8:ba:14:fe:51, 4)
Mon Sep 12 18:52:13 2022 daemon.debug hostapd: wlan1-1: STA 98:b8:ba:14:fe:51 MLME: MLME-DELETEKEYS.request(98:b8:ba:14:fe:51)
Mon Sep 12 18:52:14 2022 daemon.err hostapd: nl80211: kernel reports: Channel is disabled
Mon Sep 12 18:52:14 2022 daemon.info hostapd: wlan1-1: STA 98:b8:ba:14:fe:51 IEEE 802.11: deauthenticated due to inactivity (timer DEAUTH/REMOVE)
Mon Sep 12 18:52:14 2022 daemon.debug hostapd: wlan1-1: STA 98:b8:ba:14:fe:51 MLME: MLME-DEAUTHENTICATE.indication(98:b8:ba:14:fe:51, 2)
Mon Sep 12 18:52:14 2022 daemon.debug hostapd: wlan1-1: STA 98:b8:ba:14:fe:51 MLME: MLME-DELETEKEYS.request(98:b8:ba:14:fe:51)

AP2

Mon Sep 12 18:51:51 2022 daemon.err hostapd: nl80211: kernel reports: key addition failed
Mon Sep 12 18:51:51 2022 daemon.debug hostapd: wlan1-2: STA 98:b8:ba:14:fe:51 IEEE 802.11: binding station to interface 'wlan1-2'
Mon Sep 12 18:51:51 2022 daemon.debug hostapd: wlan1-2: STA 98:b8:ba:14:fe:51 IEEE 802.11: authentication OK (FT)
Mon Sep 12 18:51:51 2022 daemon.debug hostapd: wlan1-2: STA 98:b8:ba:14:fe:51 MLME: MLME-AUTHENTICATE.indication(98:b8:ba:14:fe:51, FT)
Mon Sep 12 18:51:51 2022 daemon.debug hostapd: wlan1-2: STA 98:b8:ba:14:fe:51 IEEE 802.11: association OK (aid 1)
Mon Sep 12 18:51:51 2022 daemon.info hostapd: wlan1-2: STA 98:b8:ba:14:fe:51 IEEE 802.11: associated (aid 1)
Mon Sep 12 18:51:51 2022 daemon.notice hostapd: wlan1-2: AP-STA-CONNECTED 98:b8:ba:14:fe:51
Mon Sep 12 18:51:51 2022 daemon.debug hostapd: wlan1-2: STA 98:b8:ba:14:fe:51 MLME: MLME-REASSOCIATE.indication(98:b8:ba:14:fe:51)
Mon Sep 12 18:51:51 2022 daemon.debug hostapd: wlan1-2: STA 98:b8:ba:14:fe:51 IEEE 802.11: binding station to interface 'wlan1-2'
Mon Sep 12 18:51:51 2022 daemon.debug hostapd: wlan1-2: STA 98:b8:ba:14:fe:51 WPA: event 6 notification
Mon Sep 12 18:51:51 2022 daemon.debug hostapd: wlan1-2: STA 98:b8:ba:14:fe:51 WPA: FT authentication already completed - do not start 4-way handshake
Mon Sep 12 18:52:08 2022 daemon.err hostapd: nl80211: kernel reports: key addition failed
Mon Sep 12 18:52:08 2022 daemon.debug hostapd: wlan0: STA 98:b8:ba:14:fe:51 IEEE 802.11: binding station to interface 'wlan0'
Mon Sep 12 18:52:08 2022 daemon.debug hostapd: wlan0: STA 98:b8:ba:14:fe:51 IEEE 802.11: authentication OK (FT)
Mon Sep 12 18:52:08 2022 daemon.debug hostapd: wlan0: STA 98:b8:ba:14:fe:51 MLME: MLME-AUTHENTICATE.indication(98:b8:ba:14:fe:51, FT)
Mon Sep 12 18:52:08 2022 daemon.debug hostapd: wlan0: STA 98:b8:ba:14:fe:51 IEEE 802.11: association OK (aid 1)
Mon Sep 12 18:52:08 2022 daemon.info hostapd: wlan0: STA 98:b8:ba:14:fe:51 IEEE 802.11: associated (aid 1)
Mon Sep 12 18:52:08 2022 daemon.notice hostapd: wlan0: AP-STA-CONNECTED 98:b8:ba:14:fe:51
Mon Sep 12 18:52:08 2022 daemon.debug hostapd: wlan0: STA 98:b8:ba:14:fe:51 MLME: MLME-REASSOCIATE.indication(98:b8:ba:14:fe:51)
Mon Sep 12 18:52:08 2022 daemon.debug hostapd: wlan0: STA 98:b8:ba:14:fe:51 IEEE 802.11: binding station to interface 'wlan0'
Mon Sep 12 18:52:08 2022 daemon.debug hostapd: wlan0: STA 98:b8:ba:14:fe:51 WPA: event 6 notification
Mon Sep 12 18:52:08 2022 daemon.notice hostapd: wlan1-2: Prune association for 98:b8:ba:14:fe:51
Mon Sep 12 18:52:08 2022 daemon.notice hostapd: wlan1-2: AP-STA-DISCONNECTED 98:b8:ba:14:fe:51
Mon Sep 12 18:52:08 2022 daemon.debug hostapd: wlan0: STA 98:b8:ba:14:fe:51 WPA: FT authentication already completed - do not start 4-way handshake
Mon Sep 12 18:52:10 2022 daemon.debug hostapd: wlan1-2: STA 98:b8:ba:14:fe:51 MLME: MLME-DISASSOCIATE.indication(98:b8:ba:14:fe:51, 1)
Mon Sep 12 18:52:10 2022 daemon.debug hostapd: wlan1-2: STA 98:b8:ba:14:fe:51 MLME: MLME-DELETEKEYS.request(98:b8:ba:14:fe:51)
Mon Sep 12 18:52:38 2022 daemon.info hostapd: wlan1-2: STA 98:b8:ba:14:fe:51 IEEE 802.11: deauthenticated due to inactivity (timer DEAUTH/REMOVE)
Mon Sep 12 18:52:38 2022 daemon.debug hostapd: wlan1-2: STA 98:b8:ba:14:fe:51 MLME: MLME-DEAUTHENTICATE.indication(98:b8:ba:14:fe:51, 2)
Mon Sep 12 18:52:38 2022 daemon.debug hostapd: wlan1-2: STA 98:b8:ba:14:fe:51 MLME: MLME-DELETEKEYS.request(98:b8:ba:14:fe:51)
Mon Sep 12 18:52:49 2022 daemon.debug hostapd: wlan1-2: WPA rekeying GTK```

Suggestion: maybe this is because 1 ssid for two bands is used?

These errors look pretty disconcerting... Something is not right?

Also, the logs for AP2 do not show any issue. The STA successfully connects to wlan1-2 using FT and then "changes its mind" 17 seconds later and switches to wlan0, again successfully using FT. You may wonder why it doesn't connect to wlan0 from the outset but that's not a bug.

That's common and desirable practice.

I think that DHCP replies are broadcast at layer 3 but unicast at layer 2. So the switch needs to direct them properly to the AP that the client was last heard on. (also you'd want to configure tcpdump so you can see MAC addresses and confirm they are as expected).

Try connecting the two APs to an Ethernet switch (either unmanaged or managed with all the ports untagged in the same VLAN) then connect the switch to one port of your x86. This will differentiate if the problem is something with the software bridge emulating a layer 2 switch.

Running three instances of tcpdump, on the three bridge ports involved, may show something.

Whom are you replying to?

1 Like

But in reality it doesn't look like that.
I went to ap2, on the phone the wifi signal is -72 dB, then it connects and then breaks (no wifi signal) and connect again

Here's the tcpdump command that I used (need to opkg install tcpdump first):

# tcpdump br-lan and save into br-lan.pcap and output to console as well, 
# -e shows the mac addresses, br-lan with wlan1 for example
tcpdump -i br-lan -w - -U port 67 or port 68 or arp | tee br-lan.pcap | tcpdump -en -r -

Exactly like how @mk24 said, run tcpdumps on the APs, and run tcpdump on the main switch connecting the APs (if it is a managed switch or in my case a virtualized managed switch on x86 openwrt), things will become clearer where the problem is happening.

And one more thing, best to reboot the APs before starting the trial.

1 Like

:rofl: :rofl: :rofl:

1 Like

@SpinozaWeb , the logs you shared show as if your device was connected AP#2 5GHz, then it asked for DHCP request and received reply, then it FT'ed to AP#2 2.4Ghz (as if you were walking away from the AP#2 so your phone switched to 2.4GHz cause signal is better), and that's it. It never tried connecting to AP#1. I do think that two bands with same ssid may be complicating things with understanding the problem, so change the configs on both APs to have the ssid on one band only, and then try again the FT from AP to AP (I actually did that in my case and it helped a lot).

1 Like

Hi @ahasbini
Sorry to tag onto your post but you seem to know about these devices, I have the exact same APs as you (WAC104).
I'm new to Openwrt as i was just using them with the Netgear firmware but thought I'd try this.
Anyway I'm using a Draytek router/modem with WiFi turned off as it's useless and that's hard wired to 2 WAC104's.
My question is although they seem to be working ok but a load of my apps like Facebook, Twitter etc can take ages for images to load.
When I do a speed test I'm getting not far off maximum download speeds not sure if the fast transitioning is working as it should as the SSIDs are the same so I may still be on the furthest AP.
I'm going to copy your settings for the fast transitioning but I was hoping you might have done suggestions about what it could be.
Thanks

Hi @steveuk23 , I'd be happy to help, though I think it would be better to create a separate topic, share your network and dhcp config of the dumb APs, the version of openwrt you're running on them, etc...

Just a tip for now, for the apps you mentioned, what I would think that matter the most is ping and dns lookups to be as fast as possible, in addition to having a high download speed. Focus on those first, fast roaming can be worked on afterwards. Once you create the topic I could give you some commands to run and check.

1 Like

Thanks ill do that tomorrow when i've finished work hopefully you can help as it's painfully slow at the moment.