ARP resolution failing in LAN

Hey, I'm trying to debug an issue with my network setup.

I'm running 1 router and 2 APs connected via Ethernet. The router is a Xiaomi Mi Router 3 Pro and the APs are Xiaomi Mi R3g v1s. All are running OpenWrt 19.07.7.

I have 2 Wi-Fi networks on the LAN, one on 2.4 GHz and the other on 5 GHz.

The issue I'm experiencing is similar to this:

But I haven't been able to find a config that works just from that topic, and I would also like to better understand the logic here.


The issue I see is that ARP resolution starts failing after a while. Clients are no longer able to reach each other in the LAN, but can still reach the internet just fine.

Ping example:

PING host (172.16.***.***): 56 data bytes
Request timeout for icmp_seq 0
Request timeout for icmp_seq 1
Request timeout for icmp_seq 2
Request timeout for icmp_seq 3
^C
--- host ping statistics ---
5 packets transmitted, 0 packets received, 100.0% packet loss

Running tcpdump on the host (tcpdump -i any -w test.pcap), I see a constant stream of ARP requests with no answer.
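
To see at a glance whether any ARP replies arrive at all, the capture can be summarised. This is only a minimal sketch: it assumes tcpdump's usual text output for ARP ("ARP, Request who-has ..." and "ARP, Reply ... is-at ..."), and arp_summary is a hypothetical helper name, not an existing tool.

```shell
#!/bin/sh
# Count ARP requests and replies in tcpdump's text output (read on stdin).
# Assumption: lines look like "ARP, Request who-has ..." / "ARP, Reply ... is-at ...".
arp_summary() {
    awk '
        /ARP, Request/ { req++ }
        /ARP, Reply/   { rep++ }
        END { printf "requests=%d replies=%d\n", req, rep }
    '
}

# Possible invocation against the capture file written above:
#   tcpdump -n -r test.pcap arp | arp_summary
```

Many requests with zero replies would match the symptom described here; it does not by itself say where the replies are being dropped.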


/etc/config/network:

config interface 'loopback'
	option ifname 'lo'
	option proto 'static'
	option ipaddr '127.0.0.1'
	option netmask '255.0.0.0'

config globals 'globals'
	option ula_prefix '******'

config interface 'lan'
	option type 'bridge'
	option ifname 'eth0.1'
	option proto 'static'
	option ip6assign '60'
	option netmask '255.255.***.***'
	option ipaddr '*****'
	option igmp_snooping '1'
	option multicast_to_unicast '1'

config device 'lan_eth0_1_dev'
	option name 'eth0.1'
	option macaddr '*********'

config interface 'wan'
	option ifname 'eth0.2'
	option proto 'dhcp'
	option peerdns '0'
	option dns '127.0.0.1'

config interface 'wan6'
	option ifname 'eth0.2'
	option proto 'dhcpv6'
	option peerdns '0'
	option dns '0::1'

config switch
	option name 'switch0'
	option reset '1'
	option enable_vlan '1'

config switch_vlan
	option device 'switch0'
	option vlan '1'
	option ports '2 3 6t'

config switch_vlan
	option device 'switch0'
	option vlan '2'
	option ports '4 6t'

config interface 'isplan'
	option ifname 'eth0.3'
	option proto 'dhcp'
	option defaultroute '0'
	option peerdns '0'
	option dns '127.0.0.1'

config switch_vlan 'isp'
	option device 'switch0'
	option vlan '3'
	option ports '1 6t'

/etc/config/wireless:

config wifi-device 'radio0'
  option type 'mac80211'
  option hwmode '11g'
  option path 'pci0000:00/0000:00:00.0/0000:01:00.0'
  option htmode 'HT20'
  option channel '2'

config wifi-iface 'default_radio0'
  option device 'radio0'
  option network 'lan'
  option mode 'ap'
  option ft_over_ds '1'
  option ssid '******'
  option encryption 'psk2+ccmp'
  # https://forum.openwrt.org/t/ieee-802-11-could-not-set-sta-to-kernel-driver/84998
  option ieee80211w '1'
  option ft_psk_generate_local '1'
  option key '********'
  option ieee80211r '1'
  option pmk_r1_push '1'
  option mobility_domain '****'
  option isolate '1'

config wifi-device 'radio1'
  option type 'mac80211'
  option hwmode '11a'
  option path 'pci0000:00/0000:00:01.0/0000:02:00.0'
  option htmode 'VHT80'
  option channel '44'
  option country '**'

config wifi-iface 'default_radio1'
  option device 'radio1'
  option network 'lan'
  option mode 'ap'
  option ft_over_ds '1'
  option ssid '*******'
  option encryption 'psk2+ccmp'
  # https://forum.openwrt.org/t/ieee-802-11-could-not-set-sta-to-kernel-driver/84998
  option ieee80211w '1'
  option ft_psk_generate_local '1'
  option key '**********'
  option ieee80211r '1'
  option pmk_r1_push '1'
  option mobility_domain '****'
  option isolate '1'

These are the values I have in /sys that the topic above recommended checking.

/sys/kernel/debug/ieee80211/phy1/netdev:wlan1/multicast_to_unicast
0x0
/sys/kernel/debug/ieee80211/phy0/netdev:wlan0/multicast_to_unicast
0x0
/sys/devices/pci0000:00/0000:00:00.0/0000:01:00.0/net/wlan0/brport/hairpin_mode
0
/sys/devices/pci0000:00/0000:00:00.0/0000:01:00.0/net/wlan0/brport/multicast_to_unicast
1
/sys/devices/pci0000:00/0000:00:00.0/0000:01:00.0/net/wlan0/brport/isolated
0
/sys/devices/pci0000:00/0000:00:01.0/0000:02:00.0/net/wlan1/brport/hairpin_mode
0
/sys/devices/pci0000:00/0000:00:01.0/0000:02:00.0/net/wlan1/brport/multicast_to_unicast
1
/sys/devices/pci0000:00/0000:00:01.0/0000:02:00.0/net/wlan1/brport/isolated
0
/sys/devices/system/cpu/isolated

/sys/devices/virtual/net/eth0.1/brport/hairpin_mode
0
/sys/devices/virtual/net/eth0.1/brport/multicast_to_unicast
0
/sys/devices/virtual/net/eth0.1/brport/isolated
0
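
For repeated checks (for example before and after the failure appears), the same flags can be collected in one go. A rough sketch, assuming the bridge ports are visible under /sys/class/net/<iface>/brport; dump_brport_flags is a hypothetical helper name.

```shell
#!/bin/sh
# Print selected bridge-port flags for every interface that is a bridge port.
# $1 is the directory holding per-interface entries (default: /sys/class/net).
dump_brport_flags() {
    netdir="${1:-/sys/class/net}"
    for port in "$netdir"/*/brport; do
        [ -d "$port" ] || continue             # skip interfaces that are not bridge ports
        iface=$(basename "$(dirname "$port")")
        for flag in hairpin_mode multicast_to_unicast isolated; do
            [ -r "$port/$flag" ] || continue   # some kernels may not expose every flag
            printf '%s %s=%s\n' "$iface" "$flag" "$(cat "$port/$flag")"
        done
    done
}

# Possible use: dump_brport_flags > /tmp/brport.before
```

Saving the output while things work and again once ARP stops resolving would show whether any of these flags changes in between.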

After an /etc/init.d/network restart, clients can reach each other again.

I would appreciate some help in debugging this issue. The only thing I can think of that may not be working properly is the dots in the eth interface names (eth0.1); as suggested in the topic above, that may be causing issues. But these are physical interface names that are set by default.

My two cents:

  • Local IP addresses are safe to share, and hiding them makes it hard to diagnose the issue.
  • You have added an "isolate = 1" option to your wireless config, which explicitly blocks traffic between wireless clients.
  • The "option DNS 127.0.0.1" makes no sense.
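
To confirm which wireless interfaces actually have isolation turned on, something like the following could be used. It is only a sketch that scans the UCI file as plain text; isolated_ifaces is a hypothetical helper name, and it assumes the "config wifi-iface 'name'" / "option isolate '1'" layout shown in the config above.

```shell
#!/bin/sh
# List wifi-iface sections in a UCI wireless config that set isolate to 1.
# $1 is the config file (default: /etc/config/wireless).
isolated_ifaces() {
    awk '
        $1 == "config" {
            in_iface = ($2 == "wifi-iface")       # only wifi-iface sections matter
            name = $3
            gsub(/[^0-9A-Za-z_]/, "", name)       # strip the surrounding quotes
        }
        in_iface && $1 == "option" && $2 == "isolate" {
            val = $3
            gsub(/[^0-9A-Za-z_]/, "", val)
            if (val == "1") print (name == "" ? "(unnamed)" : name)
        }
    ' "${1:-/etc/config/wireless}"
}
```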

Well, yes and no; I'd prefer not to leak info about my internal network setup. But I can edit the post and reveal the addresses if that would make diagnosing the issue easier.

According to the post I linked, isolate = 1 isolates clients in hostapd. Packets sent by the clients then have to be passed over to the LAN interface, and rules (firewall rules, for example) are applied from there.

This is the post I was mentioning:

I've tested these settings and in fact they make it easier for wireless clients to reach each other (paired with multicast to unicast and igmp snooping). That can't be the problem in any case because this has happened since before I added these settings (no snooping, no multicast to unicast, isolate=0), and always happens a few hours after the network stack has been restarted.

I'm running a local forwarder, DNS is working fine.

The issue that I'm having seems to be transient in nature and disappears after a network restart, which means the settings themselves should be ok.

Back up and factory reset.
If the issue persists, then it's hardware-related.

Otherwise, isolate the cause of the problem.
Use diff, or change one setting at a time.
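
A minimal sketch of the diff approach, assuming the backed-up config files have been unpacked into some directory; config_changes is a hypothetical helper name and the backup path in the comment is only an example.

```shell
#!/bin/sh
# Show what differs between a saved copy of the config directory and another one.
# $1 = directory with the backed-up files, $2 = directory to compare (default: /etc/config).
# Note: diff exits with status 1 when the directories differ.
config_changes() {
    diff -ru "$1" "${2:-/etc/config}"
}

# Possible use, with the backup unpacked under a hypothetical /tmp/backup:
#   config_changes /tmp/backup/etc/config
```

Lines starting with "-" come from the backup and lines starting with "+" from the current config, so each difference can be re-applied one at a time.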


Adding an update on this: it seems that upgrading to the new 21.02 release fixed my issues.

None of the other workarounds/solutions presented here seemed to have any impact on the problem, and my suspicion is that it is related to the multiple swconfig bugs reported on the mt7621 platform.

Since upgrading I have been running for about 3 weeks without the issue creeping back up and the routers seem much more stable.

