Client isolation on guest vlan in BATMAN mesh doesn't work

Don't worry. Someone will know the answer to this.

In the meantime here is a little food for thought / questionable workarounds:

  1. If you have enough routers, you could attach an individual router to each mesh node by LAN cable (or as AP/STA, though that is more demanding on the mesh nodes' resources) and run real LANs on independent subnets, with each attached device acting as a router rather than a dumb AP. Individual routers running 19.07.7 should be able to isolate clients. This is what I am moving towards until pioneers such as yourself carve a path forward.

  2. Another alternative is to make an individual VLAN for each device you want isolated. Do laugh; I have done this before, just for a laugh, with a different router / product family. This is more work, but UCI is arguably cut and paste after the first few are set up: you can copy them and change a few variables. Layer 2 isolation is imperfect even on other products and may not be secure, hence separating everything onto isolated VLANs. Even that is not perfect and does not guarantee security. The number of VLANs you can make is memory constrained, so make a note of how much memory is in use each time you create one.

  3. Turn off software/hardware flow offloading on all routers. Maybe it is bypassing layer 2 functionality the same way it impacts QoS/SQM?

Thoughts...

The problem your routers are exhibiting is probably a mixture of router hardware and router software not working correctly more so than your config IMHO.

It looks like BATMAN and OpenWrt are not playing nicely on layer 2 (unable to isolate clients) despite most of your settings looking good. If this is the case then this is something the developers would have to address.

Crazy ideas...

Maybe your phone is caching results (like Firefox caching webpages); try verifying isolation by alternative means. Erase the app and reinstall?

Buy more used routers cheap off the internet? That's what I am doing. It's fun!

Solution #1 costs more but then you have more redundancy and functionality / distributed processing power.

Solution #2 doesn't cost anything. VLANs are free! If you can make one, you can make 1001!

Last thoughts. The IPQ40xx product family has always been problematic with VLANs, and many choose to avoid it and go for MediaTek or other chipsets when they need VLANs.

I have IPQ4019 / EA8300 meshed with regular 802.11s (you have IPQ4018 in the ASUS I think) so I am probably not going to have success with layer 2 isolation either unless the developers can patch them or whatever may be the case. Eventually I will test and configure BATMAN. But I am fairly happy with the ease of use and stability I have right now with plain 802.11s and my config is twice as fast as my 150Mbps internet connection.

Hey, @andy_archer_asus . I can try to help you out but I'm finding it hard to follow everything that you've done and cannot tell how your devices are currently configured. As @mk24 pointed out, client isolation within the same network has nothing to do with batman and it shouldn't be overly complicated to implement it--it's just a matter of adding option isolate 1 to the guest wifi-iface stanza in /etc/config/wireless.

I know you've already put several hours into trying to solve this issue using your current configuration but moving forward, my suggestion is the following:

  • First, reset both your network devices to their default settings.
  • Then follow my mesh guide until the VLAN configuration part. Test connectivity to rule out any hardware-specific issue with the mesh configuration. (Do not modify any default values unless you know what you are doing.)
  • If all looks good, then go ahead and create mesh VLANs (bat0.1, bat0.2, etc.), as explained in the guide. Test connectivity in each VLAN.
  • If the VLAN config is good, create a wireless access point for your guest network and add option isolate 1 to the guest wifi-iface stanza in /etc/config/wireless. Reboot your device and once it comes back up, connect two clients to the guest WAP and try pinging each other.
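
Before rebooting for that last step, it may help to confirm which wifi-iface stanzas actually carry option isolate. The following is only a sketch: list_isolate is a made-up helper name, it assumes the usual one-option-per-line layout of /etc/config/wireless, and it only reads the file it is given.

```shell
#!/bin/sh
# list_isolate FILE
# Print every wifi-iface section in a UCI wireless config together with
# its isolate value ("0" when the option is absent). Hypothetical helper.
list_isolate() {
    awk '
        # Print the section collected so far, if it was a wifi-iface.
        function emit() {
            if (insec) printf "%s isolate=%s\n", (name == "" ? "(unnamed)" : name), iso
        }
        $1 == "config" {
            emit()
            insec = ($2 == "wifi-iface")
            name = $3; gsub(/["\047]/, "", name)   # strip quotes around the name
            iso = "0"
            next
        }
        insec && $1 == "option" && $2 == "isolate" {
            iso = $3; gsub(/["\047]/, "", iso)     # strip quotes around the value
        }
        END { emit() }
    ' "$1"
}
```

On the router this would be called as list_isolate /etc/config/wireless; a guest stanza that prints isolate=0 is missing the option.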

If that doesn't work, then reply to this message with the current config files (/etc/config/network, /etc/config/wireless, /etc/config/dhcp, /etc/config/firewall) of all your network devices (remove the passwords before posting them here) and, for your "dumb" AP, let me know if you disabled /etc/init.d/odhcpd, /etc/init.d/dnsmasq, and /etc/init.d/firewall.


Client isolation works just fine and it's trivial to implement (see @mk24's comments). Nothing wrong with the implementation.

If it's helpful you should be aware that you can run iptables on bridges. nftables is even easier. But current OpenWrt runs iptables, so you just have to enable a sysctl for iptables on bridges, and then write some code to make it so that no LAN->LAN packets are allowed to bridge.

Thank you, Carlos! In fact, I spent several days with your guide - very awesome and I think it's the guide that made setting up the mesh just a breeze. Really really great work.
Now I did have to re-read and re-check some parts until I got a good enough understanding of what's happening there, but at this point I'm pretty certain that the mesh itself works as it is supposed to.

I'm at the point where VLANs are created and working, wireless interfaces for the vlans are configured with option isolate 1, so that's not a culprit. Now as I said previously toggling batman's ap_isolation option for vlans changes the behavior in a way that clients are not pingable any more, so batman does do something for this.

Please also check this out:

ap isolation

Available since: batman-adv 2011.4.0

Standard WiFi AccessPoints support a feature known as 'AP isolation' which prevents one connected wireless client to talk to another wireless client. In most situations this is considered a security feature. If the WiFi AP interface is bridged into a batman-adv mesh network it might be desirable to extend this wireless client isolation throughout the mesh network, therefore batman-adv has the ability to do just that (turned off by default). This setting only affects packets without any VLAN tag (untagged packets). The ap isolation setting for VLAN tagged packets is modifiable through the per-VLAN settings.

https://www.open-mesh.org/projects/batman-adv/wiki/Tweaking

This protocol can be used to change VLAN-specific settings of a batman-adv soft-interface. At the moment only ap_isolation is available, but new ones can be easily added as soon as batman-adv get to support more.

https://b.a.t.m.a.n.open-mesh.narkive.com/DGlFDtrs/b-a-t-m-a-n-patch-openwrt-feed-batman-adv-enable-batadv-vlan-protocol

I did post my configuration and updated it in the first post - both for the gateway and the bridge.

If you have a setup available for a quick test, can you please try to set ap_isolation for a vlan (batctl -m bat0.1 ap 1), check the status batctl -m bat0.1 ap and try something like service network restart. After you do that, check the status of the option again batctl -m bat0.1 ap and see if it's still enabled.
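
The test sequence above could be scripted roughly as follows. This is a sketch, not a verified procedure: it reuses the batctl invocations exactly as written in this post, uses /etc/init.d/network restart (the form used elsewhere in this thread) for the restart, and check_ap_persist, RESTART_CMD and SETTLE_SECS are hypothetical names introduced only for illustration.

```shell
#!/bin/sh
# Sketch: does per-VLAN ap_isolation survive a network restart?
# check_ap_persist, RESTART_CMD and SETTLE_SECS are hypothetical names.

RESTART_CMD="${RESTART_CMD:-/etc/init.d/network restart}"

check_ap_persist() {
    vlan="$1"                               # e.g. bat0.1
    batctl -m "$vlan" ap 1 || return 2      # enable, as in the post
    before="$(batctl -m "$vlan" ap)"        # status right after enabling
    $RESTART_CMD >/dev/null 2>&1            # restart the network
    sleep "${SETTLE_SECS:-10}"              # give the interfaces time to return
    after="$(batctl -m "$vlan" ap)"         # status after the restart
    echo "before restart: $before"
    echo "after restart:  $after"
    [ "$before" = "$after" ]                # non-zero exit if the status changed
}
```

Running check_ap_persist bat0.1 and getting a non-zero exit status would match the behavior described here, where the setting is lost after the restart.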

The last few times I did that I had to reset my routers. Now I'm thoroughly terrified of iptables and things like this.

Did you ever have success with it on a batman vlan?

Write a Cron job to reset the firewall every 3 mins... Test your code, then if it locks you out, wait 3 mins.
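
One way to read this suggestion is a cron job that puts a known-good firewall config back in place while you experiment. The sketch below assumes you saved a working copy beforehand; the paths /etc/config/firewall.good and /root/fw-failsafe.sh, the function name restore_firewall and the FW_RELOAD variable are all hypothetical.

```shell
#!/bin/sh
# Hypothetical failsafe: restore a known-good firewall config and reload.
# Example crontab entry (every 3 minutes), assuming this file is saved as
# /root/fw-failsafe.sh and ends with a call to restore_firewall:
#   */3 * * * * /root/fw-failsafe.sh

FW_RELOAD="${FW_RELOAD:-/etc/init.d/firewall restart}"

restore_firewall() {
    good="$1"   # known-good backup, e.g. /etc/config/firewall.good
    live="$2"   # live config, e.g. /etc/config/firewall
    if [ ! -f "$good" ]; then
        echo "no backup at $good, leaving $live untouched" >&2
        return 1
    fi
    # Overwrite the live config, then reload the firewall.
    cp "$good" "$live" && $FW_RELOAD
}
```

The script would end with restore_firewall /etc/config/firewall.good /etc/config/firewall, and the crontab entry should be removed once the new rules are confirmed to work, otherwise they get overwritten on the next run.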

Hi
i came across this post because i was cited earlier, and since the first part of the question is still not marked as solved and it sounds similar to what i had built, i may be able to give a hint
but since a lot is going on in the thread already, sorry if any of this was suggested before...

so first of all you can usually set client isolation in the wifi config of OpenWrt, which should work for single ap devices but obviously not with whole vlans. the batman isolation is, from what i have read, merely an extension of this exact feature, so it is probably also not doing what you want in the long run.

if you really want to do that then you are probably on your own doing low level network configuration, which still might be hard because batman is considered a layer 2 technology, so i am not sure firewall rules even work on this level

a quick search brought up this post by jeff for me which may be related

I never thought of that! What an awesome way to learn firewall language!

So I will probably call it quits with this. I think I will just add the batman VLAN ap_isolation enable commands to the user firewall script (/etc/firewall.user), so every time the network is reset it will reload the firewall and re-enable the isolation on the VLANs...
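
Roughly, the lines added to /etc/firewall.user might look like the sketch below. It assumes, as described above, that a firewall reload follows every network restart; the VLAN list is a placeholder, reapply_isolation and ISOLATED_VLANS are hypothetical names, and the batctl call is the one used earlier in this thread.

```shell
# Hypothetical addition to /etc/firewall.user: re-enable batman-adv
# ap_isolation on selected VLAN interfaces whenever the firewall reloads.

ISOLATED_VLANS="${ISOLATED_VLANS:-bat0.1 bat0.2}"   # placeholder list

reapply_isolation() {
    for vlan in $ISOLATED_VLANS; do
        # Same command as used manually earlier in the thread.
        batctl -m "$vlan" ap 1 ||
            logger -t firewall.user "could not enable ap_isolation on $vlan"
    done
}

# Only run where batctl is actually installed.
if command -v batctl >/dev/null 2>&1; then
    reapply_isolation
fi
```

This works around the setting being lost rather than explaining why it is lost, so it is worth re-checking the status with batctl -m bat0.1 ap after a restart.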

Thanks all for all the help!

I really don't want to discourage anyone. open source is all about understanding the issue and doing it yourself (of course with support from the community :) ). what i mostly wanted to do is point out that i think the magic is happening on layer 2 inside the kernel, managed by batman, which may overrule some of the common OpenWrt configs.

please someone correct me if i am wrong...
i think that the router your client device is connected to will receive the packet from that client device. this packet at least contains the target ip address. the router more or less knows which mac address this ip address is related to, since the router itself is part of the large virtual batman switch (see distributed arp table). with this information the ip address frame (layer 3) gets wrapped in the mac address frame (layer 2) and handed over to batman. batman then decides where this packet goes within the virtual switch, and it pops out at the other router's kernel level. and here i don't know what will happen first, but i would guess the kernel again looks at layer 2 first and immediately decides that it knows the mac address and hands the packet over to the related switch (this part might be wrong and maybe you are able to apply firewall rules here). if i am right and your client devices are connected via cable then you are bypassing the wifi isolation management.

what i could think of is having your devices, by firewall rule, only establish communication to the internet and actively block (DROP) communication within their firewall zone on all the routers. maybe you are able to create a minimalist test setup with two routers and try that (ofc only if you have not already)

Yes. I have at least three different locations that use client isolation inside their guest network, meaning clients connected to the guest WAP cannot reach each other. (Then depending on the particulars of the private networks, I also have additional firewall rules at the gateway level to deny/allow guest network clients from reaching other networks.) The guest network is my only use-case for client isolation, so I don't use it with any other network (and do not quite understand why you are using it with almost every single private network at your location).


The reason I mentioned that you should reset your devices is because I noticed multiple unusual settings in your first post. For example, in your /etc/config/network files:

there are two options--namely, delegate and ap_isolation--that are not listed in the OpenWrt documentation (https://openwrt.org/docs/guide-user/base-system/basic-networking). (If you restart your network settings via /etc/init.d/network restart and then check the syslog via logread, do you see anything unusual there?)

In addition, your bat0 config contains ap_isolation 1 but you do not need it for your non-mesh clients (anyone joining your guest WAP, for example). Set it to ap_isolation 0 instead.


Then, specifically talking about your "dumb" AP, disable dnsmasq (and dhcp), firewall, and odhcpd, as follows:

/etc/init.d/dnsmasq stop && /etc/init.d/dnsmasq disable 
/etc/init.d/odhcpd stop && /etc/init.d/odhcpd disable
/etc/init.d/firewall stop && /etc/init.d/firewall disable

This is what makes it "dumb" and ignore the respective config files in /etc/config/. I'm mentioning this because in your original post, you pasted the dhcp and firewall config files for your "dumb" AP, which should be ignored if the AP is actually "dumb".
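
To confirm the result afterwards, a small check along these lines could be run on the AP. It is a sketch that relies on the enabled command of OpenWrt init scripts, which only reports whether a service is set to start at boot, not whether it is running right now; report_services and INITD are hypothetical names.

```shell
#!/bin/sh
# Report whether the services that make an AP non-"dumb" are still
# enabled at boot. report_services and INITD are hypothetical names.

INITD="${INITD:-/etc/init.d}"

report_services() {
    for svc in dnsmasq odhcpd firewall; do
        # "<init script> enabled" exits 0 when the service starts at boot.
        if "$INITD/$svc" enabled; then
            echo "$svc: enabled"
        else
            echo "$svc: disabled"
        fi
    done
}
```

On a "dumb" AP set up as described above, report_services would be expected to print disabled for all three.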

And still in the "dumb" AP, notice that in /etc/config/wireless, its config wifi-iface 'mesh0' stanza has a duplicated option encryption. Also, in the /etc/config/network config file, we have (a) a duplicated option, (b) options that do not exist, and (c) your AP does not even specify a protocol for the guest interface:

(For reference, see a standard mesh-bridge config here.)


Do you see what I mean? There are multiple strange things going on that make it so much harder to troubleshoot the client isolation issue you are experiencing, and I've not even mentioned the use of batctl to change batman-adv config. If you've not given up on the issue, then follow the suggestion in my previous message and let me know how it goes.

Can we dig a little deeper here? What exactly do you mean by "cannot reach"? Is it they can't see each other? They can't ping each other? They can't see any ports on other clients?
I'm asking because when batctl -m bat0.4 ap 0, my clients can ping each other, but when it is batctl -m bat0.4 ap 1, no ping is possible and no ports are reachable. So they are not reachable! But they can still somehow see each other (ipv6?)

The guest network is my only use-case for client isolation

This case is good enough - one of the three "shared" vlans I'm setting up is a guest network. If I can set it up this way, I can just replicate the settings for the other vlans.

you should reset your devices

I respectfully disagree - this is not Windows 98 we are playing with here, and you shouldn't be reinstalling the OS every time you get a segfault from the Borland IDE...

there are two options--namely, delegate and ap_isolation -- that are not listed in the OpenWrt documentation

You are looking in the wrong place:

In addition, your bat0 config contains ap_isolation 1 but you do not need it for your non-mesh clients

It does change the behavior - without it the clients can ping each other and with it they cannot. Nothing else changes.

Then, specifically talking about your "dumb" AP, disable dnsmasq (and dhcp), firewall, and odhcpd, as follows:

This has been done as my configuration is 95% based on your guide!

ignore the respective config files in /etc/config/

I didn't know that! Maybe this explains no reaction of the dumb ap router on the firewall config change...

has a duplicated option encryption

Has been corrected previously

your AP does not even specify a protocol for the guest interface:

I've not even mentioned the use of batctl to change batman-adv config

Hey, that's the only thing that moves me in the right direction! :)

I do think there's a bug with the batman VLAN config. I might even have it set up right, I'm just not sure what I should really expect from client isolation...

So this is indeed weird. I'm reading up on OSI, it says:

3. Network Layer

The network layer has two main functions. One is breaking up segments into network packets, and reassembling the packets on the receiving end. The other is routing packets by discovering the best path across a physical network. The network layer uses network addresses (typically Internet Protocol addresses) to route packets to a destination node.

That means that any discovery packet my phone might be sending (idk, ping or port scan or whatever) should indeed be routed through the gateway (remember the destination node above?) IN ANY CASE, and the gateway should stop this discovery attempt, no matter what happens on Layer 2. And in fact batman's ap_isolation should have no effect on the behavior. But it does. What's going on there?

Or is it not going through the gateway? We have option distributed_arp_table '1' enabled, which, if I understand correctly, keeps a table of IP-to-MAC records, and we have bridge_loop_avoidance '1' and aggregated_ogms '1', which should in fact forward a packet directly to its destination without ever sending it to the gateway, before it even has a chance to pass through the firewall. Is that what you guys were trying to tell me?

Packets on a given LAN don't ever go through the gateway. On a single LAN they just go through switches. In a situation like a BATMAN mesh they get shunted through the various bridges.

https://www.open-mesh.org/projects/batman-adv/wiki/Ap-isolation

says that it doesn't do anything to stuff connected via wired links. If you have any wired LAN clients, you won't be able to isolate them with BATMAN only.

What exactly is the behavior you're trying to achieve? Be as specific as possible please!

If you have any wired LAN clients

No, not on the VLANs

What exactly is the behavior you're trying to achieve? Be as specific as possible please!

Currently I got as far as this:

  1. A phone connected to the wireless iot network and running a network scanner (such as Fring or Net Analyzer) is able to list all the other devices on the network and Fring even says they are "online".
  2. When I try to ping another device, I get 100% packet loss
  3. Traceroute hops fail every time
  4. The scanners cannot find any open ports

If I disable batman's isolation (but not wireless AP isolation), I can ping other devices and see open ports on them.

I am trying to figure out if this is expected behavior, and how the network analyzer and Fring are able to discover all the other devices when there are no open ports and no pings are possible. Can complete stealth be achieved in this scenario?

well I was hoping you could tell me what you wanted rather than what you've currently got.

But based on your question about "complete stealth" I'd say it sounds like what you want is that any device connected to your wifi access points can only send to / receive from the MAC address of the gateway/router.

is that right?

Btw I tried one more thing. I enabled firewall on the ASUS dumb AP bridge and configured the following rules:

config defaults
        option input 'ACCEPT'
        option output 'ACCEPT'
        option forward 'REJECT'
        option synflood_protect '1'

config include
        option path '/etc/firewall.user'

config zone
        option name 'iot'
        option input 'REJECT'
        option forward 'REJECT'
        option network 'iot'
        option output 'ACCEPT'

config rule
        option dest_port '67'
        option src 'iot'
        option name 'iot_DHCP'
        option target 'ACCEPT'
        list dest_ip '192.168.10.1'   #this is the ARCHER gateway running DHCP and DNS and giving internet access

config rule
        option dest_port '53'
        option src 'iot'
        option name 'iot_DNS'
        option target 'ACCEPT'
        list dest_ip '192.168.10.1'

And it didn't change the behavior in any way.

That I'm not sure, I want the scanners to not show other devices on the networks - if this is only achievable by clients only talking to the gateway MAC - then yes, that's it.

yeah, that's basically the gist of it. within the LAN you want devices to only talk to the gateway, not others in the same LAN, and that's basically at the MAC level.