Clients in same WLAN can't reach each other

I have exactly this problem too on Reboot (17.01.4, r3560-79f57e422d).

On my 2.4GHz radio I have a second SSID with option isolate 1; the main SSID does not have this option. /sys/devices/virtual/net/br-lan/lower_wlan0/brport/hairpin_mode shows 1, as it should, multicast_to_unicast is 1, and /var/run/hostapd-phy0.conf shows ap_isolate=1 on all SSIDs, as it should (because client-to-client traffic is supposed to be hairpinned via the bridge). Still, Wi-Fi client-to-client traffic fails: I see my laptop send ARP requests, but it never gets a reply from the other Wi-Fi clients. This looks like a bug to me. Or an undocumented feature :)
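To double-check which SSID each ap_isolate line belongs to, here is a minimal sketch that pairs every interface=/bss= line in the generated hostapd config with the ap_isolate value that follows it. It assumes the usual hostapd layout, where the first BSS starts with interface= and each additional SSID starts with bss=. The function name show_ap_isolate is made up for illustration, and /var/run/hostapd-phy0.conf is only the default taken from the post above.

```shell
#!/bin/sh
# Hypothetical helper: print "<ifname> ap_isolate=<value>" for every BSS
# in a hostapd config file (default: the file discussed in this thread).
show_ap_isolate() {
    conf=${1:-/var/run/hostapd-phy0.conf}
    awk -F= '
        # The first BSS is introduced by interface=, extra SSIDs by bss=
        $1 == "interface" || $1 == "bss" {
            if (name != "") print name, "ap_isolate=" iso
            name = $2; iso = "unset"
        }
        # Remember the ap_isolate value of the BSS currently being read
        $1 == "ap_isolate" { iso = $2 }
        END { if (name != "") print name, "ap_isolate=" iso }
    ' "$conf"
}
```

Run it as show_ap_isolate, or pass another hostapd config path as the first argument. A BSS reported as ap_isolate=unset has no ap_isolate line in its section.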


I'm also running 17.01.4, but I had the same problem under previous releases too. It works perfectly for 6-10 days, then ARP requests to find my wireless printer and Raspberry Pi no longer get replies. From the router itself, everything is fine!

My config also uses 2 SSIDs on the 2.4GHz radio...

For now, my workaround is to run the "wifi" command from cron every night at 1am. It reloads the Wi-Fi config and I get my wireless printer back. It does kick all Wi-Fi users off for a few seconds, though.
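A sketch of that nightly reload as a cron entry, assuming the stock OpenWrt/LEDE cron with the root crontab in /etc/crontabs/root. The helper name add_nightly_wifi_reload is hypothetical; it appends the 01:00 /sbin/wifi entry only if the exact line is not already there, so running it twice does no harm.

```shell
#!/bin/sh
# Hypothetical helper: append a nightly 01:00 "wifi" reload to a crontab
# file (defaults to the root crontab location assumed above).
add_nightly_wifi_reload() {
    crontab_file=${1:-/etc/crontabs/root}
    entry='0 1 * * * /sbin/wifi'
    # Only append if the exact line is not already present
    if ! grep -qxF "$entry" "$crontab_file" 2>/dev/null; then
        echo "$entry" >> "$crontab_file"
    fi
}
```

After adding the entry, restarting cron (/etc/init.d/cron restart) should make it pick up the change. As noted, every reload briefly disconnects all Wi-Fi clients.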


I'm running 17.01.4 and have my LAN bridged to both the 2.4GHz and 5GHz WLANs. I'm seeing exactly the same thing - works fine for a few days then my desktop on the LAN can't get ARP replies from my wireless devices. But the wireless devices can still access my file server on the LAN and so I can still access them from there and from the router itself. Bouncing the wifi fixes the problem, so I've added a cron job to do that.


Same problem (17.01.2, r3435-65eec8bd5f) on a Linksys WRT AC3200. Wifi clients can't see each other. I've looked at the values in /sys/devices/*/hairpin_mode, multicast_to_unicast and isolate_mode, and they match what people say are the "correct values". Rebooting makes it work for an hour or a day or two, then it stops again. Even when it is working, ping times between Wifi clients seem long (200-300ms).

Settings appear to be the same before and after boot:

root@openWRT:/sys# for f in $(find . -name hairpin_mode -o -name multicast_to_unicast -o -name isolate_mode); do echo $f; cat $f; done
./devices/platform/soc/soc:pcie-controller/pci0000:00/0000:00:01.0/0000:01:00.0/net/wlan0/brport/isolate_mode
0
./devices/platform/soc/soc:pcie-controller/pci0000:00/0000:00:01.0/0000:01:00.0/net/wlan0/brport/hairpin_mode
1
./devices/platform/soc/soc:pcie-controller/pci0000:00/0000:00:01.0/0000:01:00.0/net/wlan0/brport/multicast_to_unicast
1
./devices/platform/soc/soc:pcie-controller/pci0000:00/0000:00:02.0/0000:02:00.0/net/wlan1/brport/isolate_mode
0
./devices/platform/soc/soc:pcie-controller/pci0000:00/0000:00:02.0/0000:02:00.0/net/wlan1/brport/hairpin_mode
1
./devices/platform/soc/soc:pcie-controller/pci0000:00/0000:00:02.0/0000:02:00.0/net/wlan1/brport/multicast_to_unicast
1
./devices/platform/soc/soc:internal-regs/f10d8000.sdhci/mmc_host/mmc0/mmc0:0001/mmc0:0001:1/net/wlan2/brport/isolate_mode
0
./devices/platform/soc/soc:internal-regs/f10d8000.sdhci/mmc_host/mmc0/mmc0:0001/mmc0:0001:1/net/wlan2/brport/hairpin_mode
1
./devices/platform/soc/soc:internal-regs/f10d8000.sdhci/mmc_host/mmc0/mmc0:0001/mmc0:0001:1/net/wlan2/brport/multicast_to_unicast
1
./devices/platform/soc/soc:internal-regs/f1034000.ethernet/net/eth0/brport/isolate_mode
0
./devices/platform/soc/soc:internal-regs/f1034000.ethernet/net/eth0/brport/hairpin_mode
0
./devices/platform/soc/soc:internal-regs/f1034000.ethernet/net/eth0/brport/multicast_to_unicast
0
./kernel/debug/ieee80211/phy1/netdev:wlan1/multicast_to_unicast
0x0
./kernel/debug/ieee80211/phy0/netdev:wlan0/multicast_to_unicast
0x0

Is there any hope that a newer version fixes this? If not, I'm going to give up and to go back to my old router.
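To compare values like the ones above against what this thread treats as correct (hairpin_mode=1 and isolate_mode=0 on the wireless ports), a small checker could walk the bridge's brif directory and mark wireless ports that deviate. This is a sketch under assumptions: check_bridge_ports is a hypothetical name, /sys/class/net/br-lan/brif is assumed to hold one entry per bridge port, and only ports named wlan* are judged.

```shell
#!/bin/sh
# Hypothetical helper: print hairpin_mode / multicast_to_unicast /
# isolate_mode for every port of a bridge, and mark wlan* ports whose
# hairpin_mode is not 1 or whose isolate_mode is 1. Returns 1 if any
# port was marked, 0 otherwise.
check_bridge_ports() {
    brif_dir=${1:-/sys/class/net/br-lan/brif}
    bad=0
    for port in "$brif_dir"/*; do
        [ -d "$port" ] || continue          # glob matched nothing
        name=${port##*/}
        line="$name"
        for attr in hairpin_mode multicast_to_unicast isolate_mode; do
            # Some kernels may lack an attribute; show "-" for those
            if [ -r "$port/$attr" ]; then
                val=$(cat "$port/$attr")
            else
                val=-
            fi
            line="$line $attr=$val"
        done
        case "$name" in
            wlan*)
                hp=$(cat "$port/hairpin_mode" 2>/dev/null)
                iso=$(cat "$port/isolate_mode" 2>/dev/null)
                if [ "$hp" != 1 ] || [ "$iso" = 1 ]; then
                    line="$line  <-- unexpected"
                    bad=1
                fi
                ;;
        esac
        echo "$line"
    done
    return "$bad"
}
```

Running it once while things work and again after clients stop seeing each other would show whether any of these flags actually changed, or whether the cause lies elsewhere.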


Same ping times when using the "wifi" command?

I don't know what you mean by 'when using the "wifi" command'.

I've been doing some additional testing, and it looks like the long ping times happen with certain wifi clients even when pinged from the wired network.

It also seems like when it stops working, clients on the 5GHz wlan can talk to each other, but clients on the 5GHz wlan can't talk to clients on the 2.4GHz wlan. Wired clients can talk to all wifi clients.

The wifi command, when issued with no arguments, reloads your wifi. It resets things and allows your clients to see each other again. When I first ran into this issue a reboot oddly didn't help, but issuing the wifi command did.

wifi --help
Usage: /sbin/wifi [config|down|reload|status]
enables (default), disables or configures devices not yet configured.

The wifi command didn't seem to affect ping times. Rebooting did appear to reduce ping times for one client, but not for another.

After spending a couple more hours fighting with it, I had to revert to using a Netgear WNDR3800 with OpenWRT 15.05.1. The Linksys WRT AC3200 running 17.01.2 would only pass packets between wifi clients for a few hours at a time, and having to restart/reboot several times a day isn't viable. The 2.4GHz radio would also sometimes lock up completely (though that only happens a few times a month rather than several times a day).

Funny issue; I'm encountering the same on a brand-new WRT AC3200 with the latest LEDE.
I had very inconsistent, slow inter-device bandwidth and poor latency.
I could not prove that it has anything to do with the radio (2.4GHz vs. 5GHz).

Nevertheless, I can clearly confirm that the following solves the whole issue (the dirty way, I believe):

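for fn in /sys/devices/virtual/net/br-lan/*/brport/hairpin_mode \
          /sys/devices/virtual/net/br-lan/*/brport/multicast_to_unicast \
          /sys/devices/virtual/net/br-lan/*/brport/isolate_mode
do
    echo "$fn"
    # isolate_mode must be 0; hairpin_mode and multicast_to_unicast get 1
    case "$fn" in
        *isolate_mode) echo 0 > "$fn" ;;
        *)             echo 1 > "$fn" ;;
    esac
done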

hairpin_mode is the most important one, with the highest impact (no communication vs. good communication)
multicast_to_unicast seems to do something, but I haven't collected enough evidence yet
isolate_mode HAS to be 0, or there will be no communication between clients

Clearly this is far from elegant, so: how do I make sure LEDE takes care of these settings itself?

Does that script end up applying the same settings to both wireless interfaces and eth0?

How long does the fix last?

Is it permanent (until reboot), or does it start failing again after a few hours?

Any progress on a "real" fix for this problem?

I've spent quite some time digging into this over the last few days, and these are my (for now) final thoughts and conclusions:

  • in my experience, the WRT3200ACM and WRT1200ACv2 are far from ready for regular use with LEDE
  • both devices suffer from bad Wi-Fi drivers, meaning no reliable connection (once started they seem to work fine for a few seconds, then the Wi-Fi breaks down and all packets through the interface(s) get lost)
  • my aforementioned hack only works for the 3200; the 1200v2 devices are nearly useless with LEDE in its current state
  • bottom line: it would have been great if the wikis and docs mentioned this situation with these devices, because it cost a lot of time and I never ended up with a reliable, fast and solid Wi-Fi AP/router

I fully understand that there are limitations due to the strict open-source policy for drivers; nevertheless, from my recent experience, listing such flawed devices as "working" is not appropriate.

DD-WRT, in comparison, delivers the expected performance on these devices, and after the initial configuration it has now run for several days without issues. Don't get me wrong, I would also prefer LEDE (I intentionally chose it initially), but since the bare basics (a decent Wi-Fi router/AP) don't work, I had to drop LEDE.

Sorry for not having better news on this topic; hopefully it will be resolved sooner or later...

Thanks for the update. For now, I think I'll stick with OpenWRT 15.05.1 on my old WNDR3800. Wireless speeds aren't quite as good, but it's completely reliable.

Having the very same issue on a WRT3200ACM.
I can't ping LAN-to-WLAN or WLAN-to-WLAN, but from the router's shell I can ping everything.
Whenever a WLAN client pings a LAN client, the LAN client learns the WLAN client's MAC address and can then ping it as well.
So it appears hairpin mode doesn't work properly, either because of a LEDE bug or because of the Marvell Wi-Fi driver.

Boy, talk about disappointing: this problem has been around for almost a year and there is no evidence of any progress toward solving it. The bug report referenced above shows Status: Unconfirmed, Assigned To: No-one, Priority: Very Low, Reported Version: All. How many reports of the same problem does it take to confirm it, or to get it assigned a priority above Very Low?

I just installed LEDE 17.01.4 on my Buffalo WZR-600DHP, and nothing connected to the router can contact anything else connected to the router. Everything can contact the router and everything can reach the internet. It's a very basic setup: I just set up the radios and gave it a try, and to be sure I hadn't done anything stupid, I did a firstboot and set it up again.

This appears to be affecting several architectures and making them virtually unusable. Yes, most look to be fairly old, but they were/are very popular boxes. Keeping compatibility with older hardware isn't as sexy as adding new features for new silicon in the latest kernel, but one of the main reasons I want to use OpenWrt is to avoid being locked into old, unmaintained firmware if I don't buy the latest, greatest box du jour. So much for that idea; I'm back on Chaos Calmer.

Have you checked client isolation mode? Maybe this is an accidentally enabled feature rather than a bug?
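One way to check whether client isolation was switched on by accident is to list the wifi-iface sections that set option isolate to 1. This sketch parses /etc/config/wireless with awk so it does not need uci; the function name list_isolated_ifaces is hypothetical, and it assumes the usual one-option-per-line UCI layout. On the router itself, uci show wireless | grep isolate gives similar information.

```shell
#!/bin/sh
# Hypothetical helper: print the SSID of every wifi-iface section in a UCI
# wireless config that has client isolation enabled (option isolate 1).
list_isolated_ifaces() {
    conf=${1:-/etc/config/wireless}
    awk -v sq="'" '
        # Strip a leading and a trailing single or double quote from a value
        function unq(s) {
            gsub("^[\"" sq "]|[\"" sq "]$", "", s)
            return s
        }
        # Called when a section ends: report it if it is an isolated wifi-iface
        function report_section() {
            if (in_iface && iso == "1")
                print (ssid != "" ? ssid : "(no ssid)")
        }
        $1 == "config" {
            report_section()
            in_iface = ($2 == "wifi-iface")
            ssid = ""; iso = ""
        }
        in_iface && $1 == "option" && $2 == "ssid" {
            # Keep the whole rest of the line so SSIDs with spaces survive
            v = $0
            sub(/^[ \t]*option[ \t]+ssid[ \t]+/, "", v)
            sub(/[ \t]+$/, "", v)
            ssid = unq(v)
        }
        in_iface && $1 == "option" && $2 == "isolate" { iso = unq($3) }
        END { report_section() }
    ' "$conf"
}
```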

In my frustration I didn't mention that I have already tried what is recommended in this thread:

root@LEDE:/# for f in $(find . -name hairpin_mode); do echo -n $f; echo -n " ";cat $f; done
./sys/devices/pci0000:00/0000:00:11.0/net/wlan0/brport/hairpin_mode 1
./sys/devices/pci0000:00/0000:00:12.0/net/wlan1/brport/hairpin_mode 1
./sys/devices/platform/ag71xx.0/net/eth0/brport/hairpin_mode 1
root@LEDE:/# for f in $(find . -name multicast_to_unicast); do echo -n $f; echo -n " ";cat $f; done
./sys/devices/pci0000:00/0000:00:11.0/net/wlan0/brport/multicast_to_unicast 1
./sys/devices/pci0000:00/0000:00:12.0/net/wlan1/brport/multicast_to_unicast 1
./sys/devices/platform/ag71xx.0/net/eth0/brport/multicast_to_unicast 1
./sys/kernel/debug/ieee80211/phy1/netdev:wlan1/multicast_to_unicast 0x0
./sys/kernel/debug/ieee80211/phy0/netdev:wlan0/multicast_to_unicast 0x0
root@LEDE:/# for f in $(find . -name isolate_mode); do echo -n $f; echo -n " ";cat $f; done
./sys/devices/pci0000:00/0000:00:11.0/net/wlan0/brport/isolate_mode 0
./sys/devices/pci0000:00/0000:00:12.0/net/wlan1/brport/isolate_mode 0
./sys/devices/platform/ag71xx.0/net/eth0/brport/isolate_mode 0
root@LEDE:/#

I can't find a magic incantation to change the values in the /sys/kernel/debug entries.

root@LEDE:/# grep -r isolate_mode /etc/*
grep: /etc/localtime: No such file or directory
grep: /etc/ppp/resolv.conf: No such file or directory
root@LEDE:/# grep -r hairpin_mode /etc/*
grep: /etc/localtime: No such file or directory
grep: /etc/ppp/resolv.conf: No such file or directory
root@LEDE:/# grep -r multicast_to_unicast /etc/*
grep: /etc/localtime: No such file or directory
grep: /etc/ppp/resolv.conf: No such file or directory

I was just looking through my interfaces and attempting to run this script. It looks like the cause might be the sysfs path changing from

/sys/devices/virtual/net/br-lan/brport/...

to

/sys/devices/virtual/net/br-lan/brif/...
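If the ports only show up under brif/, a variant of the loop posted earlier could walk that directory instead and, unlike the original, touch only the wireless ports. This is a sketch, not a tested fix: apply_wifi_brport_flags is a hypothetical name, the wlan* pattern assumes OpenWrt-style interface names (wlan0, wlan0-1, ...), and leaving eth0 alone is a judgment call given the question earlier in the thread about the script also changing eth0.

```shell
#!/bin/sh
# Hypothetical helper: set hairpin_mode=1, multicast_to_unicast=1 and
# isolate_mode=0 on the wlan* ports of a bridge, using the
# <bridge>/brif/<port>/ layout (each entry links to that port's brport dir).
apply_wifi_brport_flags() {
    brif_dir=${1:-/sys/devices/virtual/net/br-lan/brif}
    for port in "$brif_dir"/wlan*; do
        [ -d "$port" ] || continue          # no wlan* ports found
        for setting in hairpin_mode=1 multicast_to_unicast=1 isolate_mode=0; do
            attr=${setting%%=*}
            val=${setting#*=}
            # Skip attributes this kernel does not provide
            if [ -e "$port/$attr" ]; then
                echo "$val" > "$port/$attr" && echo "$port/$attr -> $val"
            fi
        done
    done
}
```

Writing only to attributes that already exist avoids errors on kernels that lack one of them; the outputs posted in this thread show all three present, but that may differ between builds.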

I am unable to reproduce the problem on a fresh install of 17.01.4 on a TL-WR1043nd and a snapshot build on an Alix APU with ath9k radios.

Will give it another try tonight.

Update:
Unfortunately I am still not able to reproduce this on a standard LEDE 17.01.4 install, this time with a TP-Link TL-WDR4900 v1. I tried ping tests between an iPhone and a laptop in the same wifi, between iPhone and laptop with each being on a different radio, between desktop on ethernet and iPhone and vice versa - it all worked as expected.

Could the issue occur only on particular combinations of hardware, while other devices, even ones using the same Wi-Fi driver, remain unaffected?

I have seen this issue on a netgear_r6100 (Atheros AR9344) with 17.01.4, between Ethernet and 2.4GHz. I was unable to test the 5GHz radio.

I found this thread after upgrading to LEDE 17.01.4 from 15.01 and discovering that Chromecast had broken.

I got it working again by changing /sys/devices/virtual/net/br-lan/brif/wlan0/brport/hairpin_mode from 1 to 0. wlan0 is my 2.4 GHz interface.

I updated bug report 714.

https://bugs.openwrt.org/index.php?do=details&task_id=714