Frequent ARPs and no traffic over WLAN

One device on my wifi network (an Honor View 10) is sporadically losing its network connection while the wifi connection stays up. I think I have narrowed it down to an ARP problem of some description.

If I ping the device I see something like:

64 bytes from 192.168.1.199: icmp_seq=419 ttl=64 time=50.5 ms
64 bytes from 192.168.1.199: icmp_seq=427 ttl=64 time=15439 ms
64 bytes from 192.168.1.199: icmp_seq=428 ttl=64 time=14419 ms
<snip>
64 bytes from 192.168.1.199: icmp_seq=436 ttl=64 time=6267 ms
64 bytes from 192.168.1.199: icmp_seq=437 ttl=64 time=5244 ms
64 bytes from 192.168.1.199: icmp_seq=439 ttl=64 time=3196 ms
64 bytes from 192.168.1.199: icmp_seq=440 ttl=64 time=2174 ms
64 bytes from 192.168.1.199: icmp_seq=441 ttl=64 time=1150 ms
64 bytes from 192.168.1.199: icmp_seq=442 ttl=64 time=126 ms
64 bytes from 192.168.1.199: icmp_seq=443 ttl=64 time=250 ms
64 bytes from 192.168.1.199: icmp_seq=444 ttl=64 time=3.03 ms
64 bytes from 192.168.1.199: icmp_seq=445 ttl=64 time=3.54 ms
From 192.168.1.100 icmp_seq=458 Destination Host Unreachable
From 192.168.1.100 icmp_seq=459 Destination Host Unreachable
From 192.168.1.100 icmp_seq=460 Destination Host Unreachable
From 192.168.1.100 icmp_seq=461 Destination Host Unreachable

This shows highly variable ping times, followed by a disconnect that can take a minute or so to recover from.
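One way to read the delayed replies is to estimate when each one actually came back. The sketch below is only a rough check: it assumes ping was sending one request per second (the default interval; the exact command is not shown in the post), and the names `parse_replies` and `arrival_times` are made up for illustration.

```python
import re

# Delayed replies copied from the ping output above.
PING_LOG = """\
64 bytes from 192.168.1.199: icmp_seq=427 ttl=64 time=15439 ms
64 bytes from 192.168.1.199: icmp_seq=428 ttl=64 time=14419 ms
64 bytes from 192.168.1.199: icmp_seq=436 ttl=64 time=6267 ms
64 bytes from 192.168.1.199: icmp_seq=437 ttl=64 time=5244 ms
64 bytes from 192.168.1.199: icmp_seq=439 ttl=64 time=3196 ms
64 bytes from 192.168.1.199: icmp_seq=440 ttl=64 time=2174 ms
64 bytes from 192.168.1.199: icmp_seq=441 ttl=64 time=1150 ms
64 bytes from 192.168.1.199: icmp_seq=442 ttl=64 time=126 ms
"""

REPLY_RE = re.compile(r"icmp_seq=(\d+) ttl=\d+ time=([\d.]+) ms")


def parse_replies(log):
    """Return a list of (icmp_seq, rtt_ms) tuples, one per echo reply line."""
    return [(int(seq), float(rtt)) for seq, rtt in REPLY_RE.findall(log)]


def arrival_times(replies, interval_s=1.0):
    """Estimate arrival time in seconds: send time (seq * interval) plus RTT.

    interval_s=1.0 is an assumption (ping's default send interval).
    """
    return [seq * interval_s + rtt_ms / 1000.0 for seq, rtt_ms in replies]


replies = parse_replies(PING_LOG)
arrivals = arrival_times(replies)
send_span = (replies[-1][0] - replies[0][0]) * 1.0
arrival_span = max(arrivals) - min(arrivals)
print(f"sent over ~{send_span:.0f} s, replies arrived within ~{arrival_span:.2f} s")
```

Under that one-second assumption, requests sent over about 15 seconds all get their replies within a fraction of a second of each other. That pattern is consistent with packets being held somewhere and released in a burst, rather than with ordinary radio latency, though it does not show where they are held.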

If I run tcpdump -i wlan1-3 arp host 192.168.1.199 or tcpdump -i br-home arp host 192.168.1.199, I see a constant flood of:

20:18:46.490180 ARP, Request who-has downstairs_router.lan tell honorview.lan, length 28
20:18:46.490235 ARP, Request who-has downstairs_router.lan tell honorview.lan, length 28
20:18:46.490442 ARP, Reply downstairs_router.lan is-at 9c:3d:xx:xx:xx:xx (oui Unknown), length

This is repeated every second while the ping isn't working, which shows that the wifi is staying up and the L2 network is fine; it is just the ARP failures that are stopping IP connectivity.
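To put numbers on "repeated every second", the tcpdump output can be tallied per second and per ARP type. This is a minimal sketch that assumes the capture was saved as text in the format shown above; `arp_counts_per_second` is a hypothetical helper name, and the sample is just the three lines from the post (the last one is truncated in the original and is left as it is).

```python
import re
from collections import Counter

# The three lines from the capture above, unchanged.
TCPDUMP_LOG = """\
20:18:46.490180 ARP, Request who-has downstairs_router.lan tell honorview.lan, length 28
20:18:46.490235 ARP, Request who-has downstairs_router.lan tell honorview.lan, length 28
20:18:46.490442 ARP, Reply downstairs_router.lan is-at 9c:3d:xx:xx:xx:xx (oui Unknown), length
"""

# Capture the HH:MM:SS part of the timestamp and whether it is a Request or Reply.
ARP_RE = re.compile(r"^(\d{2}:\d{2}:\d{2})\.\d+ ARP, (Request|Reply)\b", re.MULTILINE)


def arp_counts_per_second(log):
    """Count ARP packets keyed by (second, kind)."""
    return Counter(ARP_RE.findall(log))


counts = arp_counts_per_second(TCPDUMP_LOG)
for (second, kind), n in sorted(counts.items()):
    print(second, kind, n)
```

Run over a longer capture, a steady stream of requests with matching replies from the router would suggest the replies are being sent but not acted on by the phone.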

When the IP connection restarts (after a minute or so) the ARP requests slow down. I don't have this problem on any other device.

For info, I am running OpenWRT 21.02.1 on downstairs_router, which has a different bridge and SSID for each of home, work, admin and guest. A second router runs OpenWRT 21.02.0, with trunked ethernet between the two. I have tried removing the VLANs and the secondary device to no avail, so I don't believe they are part of the problem.

What could this be? I have disabled every power saving setting on the phone and can adb shell in to debug, but I'm pretty much at the end of my debugging skill.

Threads like this do suggest it could be something to do with the timing of ARP replies, but I have no idea if this is something I can test or debug.

Does this happen for this particular device only?
The ping response times are very bad for a device within the LAN. Does this happen to other devices pinged from the same location as the phone?
If you move the phone close to the AP does it get better?

Yes, it is only on one device, all other mobile devices are fine.

That device works fine on other wireless networks.

Location makes no difference at all, and I can see that both the device and the router report a good signal.

It's hard to pinpoint the culprit when everything works fine as long as these two are not combined.
I can suggest (back up first) resetting to defaults and trying to connect with a bare-minimum configuration. VLANs don't matter. But maybe you can wait for another opinion before you go down this road.

When I upgraded to 21.02 I wiped out the entire config and slowly rebuilt it, so I feel like I've been through that step. I also have two Netgear R7800s, and I swapped them over a few weeks ago, so I'm confident that it's not a hardware problem.


I am moderately sure this problem has been resolved by reducing the beacon interval to 75 (from the default of 100).

My assumption is that ARP requests from the device were being received and answered by the access point, but that the device had stopped listening (as it tries to save power?) and never received the ARP response, so it kept sending new ARP requests.
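For scale, here is what the change means in wall-clock terms. This assumes the 75 and 100 above are in 802.11 time units (TU, 1024 microseconds each), which is how beacon intervals are normally specified; `beacon_interval_ms` is an illustrative helper, not part of any router tooling.

```python
TU_MICROSECONDS = 1024  # one 802.11 time unit (TU) is 1024 microseconds


def beacon_interval_ms(interval_tu):
    """Convert a beacon interval from TU to milliseconds."""
    return interval_tu * TU_MICROSECONDS / 1000.0


for tu in (100, 75):
    ms = beacon_interval_ms(tu)
    print(f"{tu} TU -> {ms:.1f} ms between beacons, ~{1000.0 / ms:.1f} beacons/s")
```

A sleeping client learns from beacons that the access point is holding traffic for it, so a shorter interval gives it more frequent chances to find out. That fits the assumption above, but it does not prove it.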
