One device on my wifi network (an Honor View 10) sporadically loses its network connection, whilst the wifi connection itself stays up. I think I have narrowed it down to an ARP problem of some description.
If I ping the device, I see something like:
64 bytes from 192.168.1.199: icmp_seq=419 ttl=64 time=50.5 ms
64 bytes from 192.168.1.199: icmp_seq=427 ttl=64 time=15439 ms
64 bytes from 192.168.1.199: icmp_seq=428 ttl=64 time=14419 ms
<snip>
64 bytes from 192.168.1.199: icmp_seq=436 ttl=64 time=6267 ms
64 bytes from 192.168.1.199: icmp_seq=437 ttl=64 time=5244 ms
64 bytes from 192.168.1.199: icmp_seq=439 ttl=64 time=3196 ms
64 bytes from 192.168.1.199: icmp_seq=440 ttl=64 time=2174 ms
64 bytes from 192.168.1.199: icmp_seq=441 ttl=64 time=1150 ms
64 bytes from 192.168.1.199: icmp_seq=442 ttl=64 time=126 ms
64 bytes from 192.168.1.199: icmp_seq=443 ttl=64 time=250 ms
64 bytes from 192.168.1.199: icmp_seq=444 ttl=64 time=3.03 ms
64 bytes from 192.168.1.199: icmp_seq=445 ttl=64 time=3.54 ms
From 192.168.1.100 icmp_seq=458 Destination Host Unreachable
From 192.168.1.100 icmp_seq=459 Destination Host Unreachable
From 192.168.1.100 icmp_seq=460 Destination Host Unreachable
From 192.168.1.100 icmp_seq=461 Destination Host Unreachable
This shows highly variable ping times, followed by a disconnect that can take a minute or so to recover from.
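The RTTs are easier to interpret alongside the sequence numbers. ping sends one request per second by default, so adding each reply's RTT to its send offset gives the moment the reply actually arrived. A minimal Python sketch of that calculation, assuming the default 1 s interval and the ping output format shown above (arrival_offsets is just an illustrative name):

```python
import re

# Matches reply lines such as
# "64 bytes from 192.168.1.199: icmp_seq=427 ttl=64 time=15439 ms".
# "Destination Host Unreachable" lines have no time= field and are skipped.
REPLY_RE = re.compile(r"icmp_seq=(\d+) .*time=([\d.]+) ms")


def arrival_offsets(lines, interval_ms=1000.0):
    """Return (seq, arrival_ms) pairs for each reply line.

    arrival_ms is measured from the moment the first parsed request was sent.
    Assumes request n was sent at (n - first_seq) * interval_ms, i.e. ping's
    default 1 s interval.
    """
    replies = [
        (int(m.group(1)), float(m.group(2)))
        for m in map(REPLY_RE.search, lines)
        if m
    ]
    if not replies:
        return []
    first_seq = replies[0][0]
    return [(seq, (seq - first_seq) * interval_ms + rtt) for seq, rtt in replies]


# Reply lines copied from the ping output above (seq 427 to 442).
sample = [
    "64 bytes from 192.168.1.199: icmp_seq=427 ttl=64 time=15439 ms",
    "64 bytes from 192.168.1.199: icmp_seq=428 ttl=64 time=14419 ms",
    "64 bytes from 192.168.1.199: icmp_seq=436 ttl=64 time=6267 ms",
    "64 bytes from 192.168.1.199: icmp_seq=437 ttl=64 time=5244 ms",
    "64 bytes from 192.168.1.199: icmp_seq=439 ttl=64 time=3196 ms",
    "64 bytes from 192.168.1.199: icmp_seq=440 ttl=64 time=2174 ms",
    "64 bytes from 192.168.1.199: icmp_seq=441 ttl=64 time=1150 ms",
    "64 bytes from 192.168.1.199: icmp_seq=442 ttl=64 time=126 ms",
]

offsets = arrival_offsets(sample)
for seq, arrival in offsets:
    print(f"seq {seq}: reply arrived {arrival:.0f} ms after seq {offsets[0][0]} was sent")
```

With these numbers every reply from seq 427 to 442 lands within about 0.3 s of the others (15126 to 15439 ms), so one plausible reading is that the requests were queued somewhere along the path and released in a single burst once connectivity returned, rather than each reply being individually slow.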
If I run either
tcpdump -i wlan1-3 arp host 192.168.1.199
or
tcpdump -i br-home arp host 192.168.1.199
I see a constant flood of:
20:18:46.490180 ARP, Request who-has downstairs_router.lan tell honorview.lan, length 28
20:18:46.490235 ARP, Request who-has downstairs_router.lan tell honorview.lan, length 28
20:18:46.490442 ARP, Reply downstairs_router.lan is-at 9c:3d:xx:xx:xx:xx (oui Unknown), length
This repeats every second while the ping is failing, which shows that the wifi link is staying up and the L2 network is fine; it is just the ARP failures that are stopping IP connectivity.
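To put a number on the flood, a rough sketch that tallies ARP requests and replies per second from tcpdump's text output (for example, output redirected to a file) may help. The captures were presumably taken on downstairs_router, given the wlan1-3 and br-home interface names, so a reply in them shows the router sent it, not that the phone received it. The sketch assumes tcpdump's default HH:MM:SS.ffffff timestamps; arp_counts_per_second is an illustrative name.

```python
from collections import Counter


def arp_counts_per_second(lines):
    """Count ARP requests and replies per wall-clock second in tcpdump text output.

    Assumes tcpdump's default timestamp format (HH:MM:SS.ffffff) at the start
    of each line, as in the capture above.
    """
    requests, replies = Counter(), Counter()
    for line in lines:
        stamp, _, rest = line.partition(" ")
        second = stamp.split(".")[0]  # "20:18:46.490180" -> "20:18:46"
        if rest.startswith("ARP, Request"):
            requests[second] += 1
        elif rest.startswith("ARP, Reply"):
            replies[second] += 1
    return requests, replies


# The three lines from the capture above (the last one is truncated as posted).
capture = [
    "20:18:46.490180 ARP, Request who-has downstairs_router.lan tell honorview.lan, length 28",
    "20:18:46.490235 ARP, Request who-has downstairs_router.lan tell honorview.lan, length 28",
    "20:18:46.490442 ARP, Reply downstairs_router.lan is-at 9c:3d:xx:xx:xx:xx (oui Unknown), length",
]

requests, replies = arp_counts_per_second(capture)
for second in sorted(requests.keys() | replies.keys()):
    print(second, "requests:", requests[second], "replies:", replies[second])
```

If the per-second reply count stays non-zero throughout an outage while the requests keep coming, that would suggest the router is answering and the problem lies in the replies reaching, or being accepted by, the phone.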
When IP connectivity comes back (after a minute or so), the ARP requests slow down. I don't have this problem on any other device.
For info, downstairs_router runs OpenWrt 21.02.1, with a separate bridge and SSID for each of home, work, admin and guest. A second router running OpenWrt 21.02.0 is connected to it over a trunked Ethernet link. I have tried removing the VLANs and the second router, to no avail, so I don't believe they are part of the problem.
What could this be? I have disabled every power-saving setting on the phone, and I can use adb shell to debug, but I'm pretty much at the end of my debugging skills.
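With adb shell access, one thing worth watching from the phone's side during an outage is its own neighbour (ARP) cache entry for the router, e.g. via `ip neigh`, if the phone's shell permits it (not verified for this device). A small parser sketch, assuming the usual iproute2 line format; the address 192.168.1.1 and interface wlan0 are hypothetical, since the router's IP is not given above, and neigh_state is an illustrative name.

```python
def neigh_state(ip_neigh_output, ip):
    """Return (lladdr, state) for `ip` from `ip neigh` text, or None if absent.

    Assumes iproute2-style lines such as
    "192.168.1.1 dev wlan0 lladdr 9c:3d:xx:xx:xx:xx REACHABLE"
    or "192.168.1.1 dev wlan0  INCOMPLETE" (no lladdr while unresolved).
    """
    for line in ip_neigh_output.splitlines():
        fields = line.split()
        if not fields or fields[0] != ip:
            continue
        lladdr = None
        if "lladdr" in fields:
            idx = fields.index("lladdr")
            if idx + 1 < len(fields):
                lladdr = fields[idx + 1]
        return lladdr, fields[-1]  # the neighbour state is the last field
    return None


# Hypothetical snapshots of `adb shell ip neigh`, one while the link is healthy
# and one during an outage; 192.168.1.1 stands in for downstairs_router.
healthy = "192.168.1.1 dev wlan0 lladdr 9c:3d:xx:xx:xx:xx REACHABLE\n"
outage = "192.168.1.1 dev wlan0  INCOMPLETE\n"

print(neigh_state(healthy, "192.168.1.1"))
print(neigh_state(outage, "192.168.1.1"))
```

If the phone's entry sits in INCOMPLETE or FAILED while replies still appear in the router-side capture, that would point at the replies not arriving at, or not being accepted by, the phone.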
Threads like this do suggest it could be something to do with the timing of ARP replies, but I have no idea whether this is something I can test or debug.
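On the reply-timing question: the tcpdump captures already carry timestamps, so the gap between the phone's request and the router's reply can be measured directly. A sketch assuming the same default tcpdump text format, and assuming every request in the capture is the phone asking for the router (a real capture filtered on the phone's address might also contain the router asking for the phone); reply_delays_us and to_microseconds are illustrative names.

```python
def to_microseconds(stamp):
    """Convert a tcpdump "HH:MM:SS.ffffff" timestamp to integer microseconds."""
    hms, _, frac = stamp.partition(".")
    hh, mm, ss = (int(x) for x in hms.split(":"))
    return ((hh * 60 + mm) * 60 + ss) * 1_000_000 + int(frac.ljust(6, "0")[:6])


def reply_delays_us(lines):
    """For each ARP reply, return microseconds since the most recent request."""
    delays = []
    last_request = None
    for line in lines:
        stamp, _, rest = line.partition(" ")
        if rest.startswith("ARP, Request"):
            last_request = to_microseconds(stamp)
        elif rest.startswith("ARP, Reply") and last_request is not None:
            delays.append(to_microseconds(stamp) - last_request)
            last_request = None  # pair each reply with one burst of requests
    return delays


# The three lines from the capture above (the last one is truncated as posted).
capture = [
    "20:18:46.490180 ARP, Request who-has downstairs_router.lan tell honorview.lan, length 28",
    "20:18:46.490235 ARP, Request who-has downstairs_router.lan tell honorview.lan, length 28",
    "20:18:46.490442 ARP, Reply downstairs_router.lan is-at 9c:3d:xx:xx:xx:xx (oui Unknown), length",
]

print(reply_delays_us(capture))
```

For these three lines the reply leaves about 0.2 ms after the latest request, so the router at least appears to answer promptly; whether the phone receives the reply in time cannot be seen from a router-side capture alone.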