VLANs break connectivity between devices directly connected to OpenWRT

Hello! I have OpenWRT running on x86_64 with this 4-port network device:

Intel Corporation Ethernet Controller I225-V (rev 03)

When there are no VLANs configured, devices connected to this OpenWRT router can communicate with each other and with the OpenWRT machine. As soon as I enable my VLAN configuration, they cannot communicate with each other but they can still communicate with the OpenWRT machine.

# from machine A to B
user@machine-a:~$ ping 192.168.142.170
PING 192.168.142.170 (192.168.142.170): 56 data bytes
92 bytes from openwrt.lan (192.168.142.1): Destination Port Unreachable
Vr HL TOS  Len   ID Flg  off TTL Pro  cks      Src      Dst
 4  5  00 5400 6d53   0 0000  40  01 6eaa 192.168.142.176  192.168.142.170
# from OpenWRT to A or B
root@OpenWrt:~# ping 192.168.142.176
PING 192.168.142.176 (192.168.142.176): 56 data bytes
64 bytes from 192.168.142.176: seq=0 ttl=64 time=3.228 ms

root@OpenWrt:~# ping 192.168.142.170
PING 192.168.142.170 (192.168.142.170): 56 data bytes
64 bytes from 192.168.142.170: seq=0 ttl=64 time=0.792 ms

I can also confirm that machine A knows the MAC address of machine B:

user@machine-a:~$ arp -a -n | grep 170
? (192.168.142.170) at b8:27:eb:c1:dd:6d on en0 ifscope [ethernet]

Why am I using VLANs? I have a 4G LTE modem/router located far away from the OpenWRT machine. It shares the LAN's physical infrastructure but is segregated on its own VLAN, because OpenWRT treats it as a WAN. This architecture works great; in fact, I'm using it right now with a temporary fix. I've connected all devices to a dumb switch attached to one of the OpenWRT ports, so that they can communicate with each other.

Here are the relevant parts of my configuration:

# /etc/config/network

config device
	option name 'br-lan'
	option type 'bridge'
	list ports 'eth1'
	list ports 'eth2'
	list ports 'eth3'
	list ports 'zt<zero tier VPN port>'


config bridge-vlan
	option device 'br-lan'
	option vlan '1'
	list ports 'eth1'
	list ports 'eth2'
	list ports 'eth3'
	list ports 'zt<zero tier VPN port>'

config device
	option type '8021q'
	option ifname 'br-lan'
	option vid '1'
	option name 'br-lan.1'

config interface 'lan'
	option device 'br-lan.1'
	option proto 'static'
	option ipaddr '192.168.142.1'
	option netmask '255.255.255.0'
	option ip6assign '60'
	option delegate '0'


config bridge-vlan
	option device 'br-lan'
	option vlan '44'
	list ports 'eth1:t'
	list ports 'eth2:t'
	list ports 'eth3:t'

config device
	option name 'br-lan.44'
	option type '8021q'
	option ifname 'br-lan'
	option vid '44'

config interface 'wan_lte'
	option device 'br-lan.44'
	option proto 'static'
	option ipaddr '192.168.44.2'
	option netmask '255.255.255.0'
	option gateway '192.168.44.1'
	option metric '20'

# /etc/config/firewall

config defaults
	option input 'ACCEPT'
	option output 'ACCEPT'
	option forward 'REJECT'
	option synflood_protect '1'

config zone
	option name 'lan'
	option input 'ACCEPT'
	option output 'ACCEPT'
	option forward 'ACCEPT'
	list network 'lan'

config zone
	option name 'wan'
	option input 'REJECT'
	option output 'ACCEPT'
	option forward 'REJECT'
	option masq '1'
	option mtu_fix '1'
	list network 'wan_lte'

config forwarding
	option src 'lan'
	option dest 'wan'

Can you see where I'm going wrong or what I should try next?

Remove all the 802.1q stanzas. Also remove zerotier - that is a routed protocol and does not belong in the bridge.

And for VLAN 1, if you wish to have all ports untagged + PVID, specify that explicitly by adding :u* after each port.
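A minimal sketch of the VLAN 1 stanza with that suggestion applied, assuming zerotier has been taken out of the bridge as suggested and every other section stays as posted:

```
config bridge-vlan
	option device 'br-lan'
	option vlan '1'
	# u = egress untagged, * = primary VLAN (PVID) for untagged ingress
	list ports 'eth1:u*'
	list ports 'eth2:u*'
	list ports 'eth3:u*'
```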

@psherman thanks for the lightning fast response!

remove zerotier - that is a routed protocol and does not belong in the bridge.

zerotier can do L2 bridging, that's working rather nicely as is.

specify that explicitly by adding :u* after each port.

This configuration was generated by LuCI. It seems that the default list ports type is untagged (u), so it's omitted. The * indicates "is primary VLAN", which is a no-op for interfaces that have only one untagged VLAN. Can you explain what I'd be achieving by adding u*?

Remove all the 802.1q stanzas.

Can you please explain what I would be achieving by removing these lines? In LuCI there are three VLAN device types: 802.1q, 802.1ad and MAC VLAN. Which one is the default, i.e. which would take effect if I removed the 802.1q type?

Thanks again

At least for now, remove zerotier from the bridge. You can add it back once things are working and see if it causes any issues.

Specifying u* shouldn’t be mandatory, but if that is the intent of the port, I always recommend explicitly setting it. The u means frames leave the port untagged (egress). The * marks the VLAN as the port's primary VLAN (PVID) for ingress, which ensures that untagged incoming traffic is associated with the correct VLAN.

And the 802.1q stanzas are not required at all. I have a theory (currently unproven) that they may actually conflict with bridge VLANs and swconfig based VLANs in certain circumstances. Regardless, I can say with confidence that you do not need them at all - the underlying 802.1q device is created automatically under the hood when you use bridge VLANs. Just remove them and it should be fine.
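For illustration, a sketch of the relevant sections after removing the two 8021q stanzas, assuming everything else stays as posted. The bridge-vlan sections are unchanged, and the interfaces keep referencing br-lan.1 and br-lan.44, which are then set up from the bridge VLANs without an explicit device section:

```
# The 'config device' sections with option type '8021q'
# (br-lan.1 and br-lan.44) are deleted; nothing replaces them.

config interface 'lan'
	option device 'br-lan.1'
	option proto 'static'
	option ipaddr '192.168.142.1'
	option netmask '255.255.255.0'
	option ip6assign '60'
	option delegate '0'

config interface 'wan_lte'
	option device 'br-lan.44'
	option proto 'static'
	option ipaddr '192.168.44.2'
	option netmask '255.255.255.0'
	option gateway '192.168.44.1'
	option metric '20'
```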

At least for now, remove zerotier from the bridge. You can add it back once things are working and see if it causes any issues.

A fair point. I did this first. No change.

Specifying u* shouldn’t be mandatory, but if that is the intent of the port, I always recommend explicitly setting it.

Selecting "Is Primary VLAN" in LuCI adds this u*, so I've done that for all untagged list ports. No change.

And the 802.1q stanzas are not required at all ... Just remove them and it should be fine.

I'd like to do this, but I'm not sure how to do it in LuCI, and I'm concerned that I might lose access to the machine if I edit the configuration files directly. I can't find anything about automatic rollback outside LuCI, so do you know how I can do this via LuCI, or at least with automatic rollback?

Mixing tagged and untagged on the same port is not recommended. Though technically standards-compliant, it is not supported by all hardware and drivers. Think of ports as two types:
Access port: untagged in exactly one VLAN, connected to an ordinary non-VLAN device such as a desktop computer or unmanaged switch.
Trunk port: tagged in multiple VLANs, connected to a VLAN-aware device at the other end of the cable.

On x86 there is no DSA switch; each port has its own independent connection to the CPU. A DSA-style configuration should work, but you can also use the old method of multiple bridges which each handle only one network and VLAN.

@mk24 thanks for the detailed information.

Mixing tagged and untagged on the same port is not recommended

I'm pretty sure I tried this, but I will try it again.

you can also use the old method of multiple bridges which each handle only one network and VLAN

Can you say a little more about this approach? My guess is that I would need to create a VLAN device for each physical port, then create a bridge between them. Is that it?

Also, given that there is no DSA (which I assume to be roughly equivalent to offloading the switching to hardware), there should be no discernible difference in performance between the two approaches. Correct?

@psherman I tried running these commands

uci delete network.@device[3].type
uci delete network.@device[4].type
service network restart

but there was no change in behaviour.

I didn't commit the configuration so that I was able to revert after testing with:

uci revert network.@device[3].type
uci revert network.@device[4].type

Distinguishing trunk from access ports does not help either:

config bridge-vlan
	option device 'br-lan'
	option vlan '1'
	list ports 'eth1:t'
	list ports 'eth2:u*'
	list ports 'eth3:u*'

config bridge-vlan
	option device 'br-lan'
	option vlan '44'
	list ports 'eth1:t'

Hi @rcambrj,
if this is a 4-port card, then I'd expect eth0 in your config, but I don't see it.
Maybe you missed it, and your devices are on eth0?

@NPeca75 you have a keen eye, and I apologise for missing that crucial information in my initial post.

eth0 is used for another uplink which runs on PPPoE over VLAN6 to the ONT. Not bridged. Do you think this might affect the bridged VLANs?

In parallel, I tried configuring OpenWRT with separate VLAN devices per physical port, then using one bridge per VLAN to bring those together. I tried it first with "Enable VLAN filtering" off, and then on. The network didn't come up, so both times LuCI rolled the changes back. I might have got something wrong, so if this is expected to work, please let me know and I'll keep trying.

After all, since VLAN 6 does not interfere with the other VLANs, it is best to put ALL VLANs in the VLAN-filtering bridge and then add:

config bridge-vlan
	option device 'br-lan'
	option vlan '6'
	list ports 'eth0:t'

After this change, paste your complete /etc/config/network file, with the PPPoE credentials redacted.

@NPeca75 since eth0 is not a part of the LAN bridge, it does not come up as an option to select tagged/untagged/primary in LuCI. The only reason VLAN6 is used is because that's the requirement provided by the ISP.

Are you sure about this suggestion?

Yes, I am sure. It is best to deal with VLANs in one place, in your case the VLAN-filtering bridge.

Put eth0 in the bridge with the other ethX ports.

When you finish, paste the output of this command from SSH:

bridge vlan
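A sketch of what putting eth0 into the bridge could look like. It assumes the PPPoE interface is called 'wan' and sits in the firewall wan zone (its real name and credentials were not posted, so both are placeholders here), and that eth0 carries nothing but tagged VLAN 6 towards the ONT:

```
config device
	option name 'br-lan'
	option type 'bridge'
	list ports 'eth0'
	list ports 'eth1'
	list ports 'eth2'
	list ports 'eth3'

# eth0 is a member of VLAN 6 only, tagged towards the ONT;
# the VLAN 1 and VLAN 44 stanzas do not list eth0.
config bridge-vlan
	option device 'br-lan'
	option vlan '6'
	list ports 'eth0:t'

# Hypothetical interface name; credentials are placeholders.
config interface 'wan'
	option device 'br-lan.6'
	option proto 'pppoe'
	option username 'REDACTED'
	option password 'REDACTED'
```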

I realised that I made an error when testing the setup with separate VLAN devices per physical port and one bridge per VLAN to bring them together: I left all devices plugged into the dumb switch. I'll try that again.

@NPeca75 I see what you mean now, thanks for the suggestion. I'm happy to try this, but I'm also curious: does this mean that, when using VLANs, none of the ports can use physically separate infrastructure? I would prefer to maintain that physical separation where possible.

In the meantime, I ran the command you suggested, just to see what the output would be beforehand:

root@OpenWrt:~# bridge vlan
-ash: bridge: not found

No, please don't mix per-port VLAN devices and VLAN filtering on the bridge.

First, make sure that VLANs work with bridge VLAN filtering; then you can make your life hard, if you want to.

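About SSH, these are the relevant packages:

ip-bridge - 6.3.0-1
ip-full - 6.3.0-1
ip-tiny - 6.3.0-1

Without ip-bridge there is no bridge command (hence the "bridge: not found" above), and if I am right, bridge VLAN filtering will not work without it either. So make sure these packages are installed.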

Make sure that no VLAN tags are passed into the unmanaged switch. There can only be a single network on that switch -- it must be untagged.

Let's see the latest config.

This is actually not required in this case -- the ports on the network card are not part of a switch.

Separate bridges per VLAN look like this:

config device
	option name 'br-vlan10'
	option type 'bridge'
	list ports 'eth1.10'
	list ports 'eth2'

config device
	option name 'br-vlan44'
	option type 'bridge'
	list ports 'eth1.44'
	list ports 'eth3'

config interface 'lan'
	option device 'br-vlan10'
	....

config interface 'guest'
	option device 'br-vlan44'
	....

Here eth1 is a trunk port, eth2 is an access port in VLAN 10 (associated with the lan network), and eth3 is an access port in VLAN 44 (associated with the guest network). The names of the bridges are completely arbitrary, but they should include something related to the function or the VLAN tag number to help you keep track.

Also under this paradigm it's possible to do really weird things like rewrite the VLAN number as a packet traverses between trunk ports, but that will be confusing.

This will not work on routers that have a switch. It only works on x86 and other boards that have each port independently linked to the CPU. A multi-port Ethernet card is not a switch. The PHY ports are connected to the CPU through some sort of PCI expander chip which makes them enumerate as completely independent NICs to the kernel.
