VLANs, trunks, port tagging, oh my

I've been trying to delve into more complex setups with OpenWrt to finally manage my home network the way I want. However, VLAN setups are still too mysterious for me, and I could use some advice, guidance, or just to be told if I'm being stupid.

My current network setup is a bit complex. For the gateway between WAN and LAN, I'm running OpenWrt in a Proxmox VM on an Odroid H2. The first Ethernet port is handed directly to the VM via PCI passthrough (this is WAN, and it's done to keep the Proxmox host from accessing the internet directly), while the second port is part of a simple Linux bridge.

This second Ethernet port is hooked up directly to a Belkin RT3200, also running OpenWrt, currently in a dumb AP setup (all Ethernet ports bridged, including wan, with both DHCP and DHCPv6 clients set up, and WLANs added to it). Connected to this device are two end devices (a PS4 and an Xbox), another RT3200, and a Synology RT2600ac, the latter two in similar dumb AP setups. The second RT3200 has two wired clients, as does the Syno, and both also have wireless APs set up.

Ideally I'd like to create four networks in total:

  • MAIN - the main network as it is right now: the 10.0.0.1/23 address range, with the DHCP pool on the second /24 segment (10.0.1.x), and all static devices (routers, NAS, Proxmox box, VMs in Proxmox) on the first segment (10.0.0.x) with static IPs
  • GUEST - this would use 10.0.2.1/24, with WAN access but no access to any other network. Guests would use this.
  • IOT - this is 10.0.3.1/24, with no internet access, and it can only reach one host on MAIN: my Home Assistant instance. Any local-only IoT devices would connect to this network.
  • VPN - to no surprise, this would be 10.0.4.1/24, and devices on this network should have access to MAIN, but instead of connecting to the internet through WAN, they'd use a WireGuard instance
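As a rough illustration of the MAIN and IOT parts of this plan on the gateway, here is a sketch in OpenWrt's UCI syntax (/etc/config/network, dhcp and firewall sections shown together). It assumes the default `lan` and `wan` zone names and an existing `iot` interface; the Home Assistant address `10.0.0.50` is purely hypothetical. One detail worth knowing: the DHCP `start` option is an offset from the network address, so keeping the pool on the second /24 of a /23 needs an offset above 256.

```
# Sketch only: gateway side, default 'lan'/'wan' zone names assumed
config interface 'lan'
        option device 'br-lan'
        option proto 'static'
        option ipaddr '10.0.0.1'
        # /23 = 10.0.0.0 - 10.0.1.255
        option netmask '255.255.254.0'

config dhcp 'lan'
        option interface 'lan'
        # offset from 10.0.0.0: 257 -> 10.0.1.1
        # last lease: 257 + 254 - 1 = 510 -> 10.0.1.254
        option start '257'
        option limit '254'
        option leasetime '12h'

config zone
        option name 'iot'
        list network 'iot'
        option input 'REJECT'
        option output 'ACCEPT'
        option forward 'REJECT'
        # IoT clients still need input rules for DHCP (udp 67-68) and DNS (53)

# No 'config forwarding' from iot to wan, so IoT gets no internet access

config rule
        option name 'Allow-IoT-HomeAssistant'
        option src 'iot'
        option dest 'lan'
        # hypothetical Home Assistant address; 8123 is its default web/API port
        option dest_ip '10.0.0.50'
        option dest_port '8123'
        option proto 'tcp'
        option target 'ACCEPT'
```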

What I was hoping for is that I could make VLANs work so that all ports on the RT3200s act as what I understand a trunk to be, while end clients plugged directly into any port still end up on MAIN only. If I understand things correctly, this would be a combined tagged/untagged setup, where any untagged packet goes to MAIN, packets tagged with 1 go to GUEST, 2 to IOT, 3 to VPN, and so on, should I add any further VLANs in the future.

I've tried setting this up. On the gateway I created a VLAN with ID 1 on br-lan, assigned it to a new bridge, br-guest, created an interface on top of that using the static address protocol, set up the DHCP server, allowed DHCP and DNS in the firewall rules for the new guest zone, and disabled forwarding to LAN. Then I did the same on the RT3200, except that the guest interface there is a DHCP client and there's no firewall. Finally, I created a new wireless AP, assigned it to br-guest, saved the config, and tried to test it.
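For reference, the guest firewall part of the gateway setup described above might look roughly like this in /etc/config/firewall. This is a sketch, not the actual config from the thread: it assumes an interface named `guest` and the default `wan` zone name.

```
# Sketch only: guest zone with WAN access and nothing else
config zone
        option name 'guest'
        list network 'guest'
        option input 'REJECT'
        option output 'ACCEPT'
        option forward 'REJECT'

# Guest may reach WAN; no forwarding to 'lan' is defined
config forwarding
        option src 'guest'
        option dest 'wan'

# Let guest clients reach the router's own DHCP server
config rule
        option name 'Allow-Guest-DHCP'
        option src 'guest'
        option proto 'udp'
        option dest_port '67-68'
        option target 'ACCEPT'

# ...and its DNS resolver
config rule
        option name 'Allow-Guest-DNS'
        option src 'guest'
        option proto 'tcp udp'
        option dest_port '53'
        option target 'ACCEPT'
```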

The guest interface on the RT3200 gets an IP address (though at first it took quite some time); however, clients over WiFi are unable to get addresses. In the dnsmasq logs I can see DHCPDISCOVER and DHCPOFFER events, but no DHCPREQUEST or DHCPACK.

Where am I going wrong with this setup?

The VM is running OpenWrt 21.02.0-rc4, while the RT3200 APs are running recent snapshot builds (r17217).

DISCLAIMER: I'm too lazy to VLAN - I just plug in more routers. Real LANs. More redundancy. Less downtime for other users. Easier to troubleshoot. Keep it real.

Well, one of the first things to point out is that the default VLAN is, in fact, 1. Creating one explicitly breaks that assumption.

Well, almost two weeks after posting this, and two more weeks struggling before the post, I've finally cracked this nut.

My first mistake was assuming that creating a bridge with all the ports and then defining the VLANs on top of it would work. It did, to an extent: wired devices could get DHCP leases and communicate with each other, but bridging that with a wireless AP did not work. Devices connecting to that wireless network received no IP address, and static IPs didn't work either.

The breakthrough came thanks to the Netgear GS308T I bought two days ago. Its default config shed some light on what I'd been doing wrong.

So instead of defining a bridge with all ports (say, br-lan) and then creating VLANs on that (e.g. br-lan.2), this switch does it the other way around: it defines an empty bridge (in this case named switch), then creates bridge-vlan sections on top of it, each defining the VLAN ID, the member ports, and their tagged/untagged status.

My final solution is basically the following:

  1. First, define the main switch bridge (empty, of course)
config device 'switch'
        option name 'switch'
        option type 'bridge' 
  2. Then add the bridge-vlan sections
config bridge-vlan 'lan_vlan'
        option device 'switch'
        option vlan '1'
        option ports 'lan1 lan2 lan3 lan4 lan5 lan6 lan7 lan8'  

config bridge-vlan 'guest_vlan'
        option device 'switch'
        option vlan '2'
        option ports 'lan1:t lan2:t lan3:t lan4:t lan5:t lan6:t lan7:t lan8:t'

config bridge-vlan 'iot_vlan'
        option device 'switch'
        option vlan '3'
        option ports 'lan1:t lan2:t lan3:t lan4:t lan5:t lan6:t lan7:t lan8:t'

config bridge-vlan 'vpn_vlan'
        option device 'switch'
        option vlan '4'
        option ports 'lan1:t lan2:t lan3:t lan4:t lan5:t lan6:t lan7:t lan8:t'
  3. Next, create the interfaces on the appropriate VLAN devices
config interface 'lan'
        option device 'switch.1'
        option proto 'dhcp'

config interface 'guest'
        option device 'switch.2'
        option proto 'dhcp'

config interface 'iot'
        option device 'switch.3'
        option proto 'dhcp'

config interface 'vpn'
        option device 'switch.4'
        option proto 'dhcp'

config interface 'mgmt'
        option device 'switch.100'
        option proto 'dhcp'
  4. Then I just needed to assign the WiFi APs to the appropriate networks (which I already had in place, luckily)
config wifi-iface 'wifinet1'
        option device 'radio1'
        option network 'lan'
        option mode 'ap'
        option ssid 'xxxxxx'
        option encryption 'psk-mixed'
        option key 'xxxxxx'

config wifi-iface 'wifinet2'
        option device 'radio0'
        option mode 'ap'
        option ssid 'xxxxxx Guest'
        option encryption 'none'
        option network 'guest'

config wifi-iface 'wifinet3'
        option device 'radio0'
        option mode 'ap'
        option ssid 'xxxxxx_iot'
        option encryption 'psk-mixed'
        option hidden '1'
        option key 'xxxxxx'
        option network 'iot'
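The steps above cover the AP side. On the gateway VM, the same bridge-vlan pattern could in principle be applied to the bridge that holds the LAN NIC. A minimal sketch, assuming that NIC is called `eth1` (a hypothetical name) and showing only MAIN and GUEST; IOT and VPN would follow the GUEST pattern with VLANs 3 and 4. In the port lists, `:u*` marks a port as untagged with this VLAN as its primary (PVID) VLAN, and `:t` marks it tagged.

```
# Sketch only: gateway side, 'eth1' is a hypothetical LAN NIC name
config device
        option name 'br-lan'
        option type 'bridge'
        list ports 'eth1'

# VLAN 1: untagged + PVID, so untagged frames land on MAIN
config bridge-vlan
        option device 'br-lan'
        option vlan '1'
        list ports 'eth1:u*'

# VLAN 2: tagged, carried over the trunk to the APs
config bridge-vlan
        option device 'br-lan'
        option vlan '2'
        list ports 'eth1:t'

config interface 'lan'
        option device 'br-lan.1'
        option proto 'static'
        option ipaddr '10.0.0.1'
        option netmask '255.255.254.0'

config interface 'guest'
        option device 'br-lan.2'
        option proto 'static'
        option ipaddr '10.0.2.1'
        option netmask '255.255.255.0'
```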

I believe the issue lies within LuCI, as it generates the following config if one goes by the official guides and sets up VLANs with it:

config device
        option name 'br-lan'
        option type 'bridge'
        list ports 'lan1'
        list ports 'lan2'
        list ports 'lan3'
        list ports 'lan4'
        list ports 'wan'

config device
        option type '8021q'
        option ifname 'br-lan'
        option vid '2'
        option name 'br-lan.2'

config device
        option type '8021q'
        option ifname 'br-lan'
        option vid '3'
        option name 'br-lan.3'

config device
        option type '8021q'
        option ifname 'br-lan'
        option vid '4'
        option name 'br-lan.4'

config device
        option type 'bridge'
        option name 'br-guest'
        list ports 'br-lan.2'

config device
        option type 'bridge'
        option name 'br-iot'
        list ports 'br-lan.3'

config device
        option type 'bridge'
        option name 'br-vpn'
        list ports 'br-lan.4'

config interface 'lan'
        option device 'br-lan'
        option proto 'dhcp'

config interface 'guest'
        option proto 'dhcp'
        option device 'br-guest'

config interface 'iot'
        option proto 'dhcp'
        option device 'br-iot'

config interface 'vpn'
        option proto 'dhcp'
        option device 'br-vpn'

As you can see, instead of bridge-vlan devices, it sets up 8021q type devices, which, when added to a regular bridge, apparently lose their VLAN capabilities, hence why my WiFi setup with the latter config failed miserably.

I have to note that BOTH configurations appear exactly the same on the Network - Interfaces - Devices page. Both bridge-vlan and 8021q devices show up the same way, the main bridge (switch in the first config, br-lan in the second) shows the same ports assigned, and the VLAN configuration looks exactly the same as well. This, combined with the limited resources available on DSA, makes it an incredibly confusing experience if one does not dig into the config files. The documentation says absolutely nothing about where bridge ports should be assigned.

So I finally have a config where every port is a trunk port, but defaults to the main network if I plug a device in. This is the intended behaviour, and matches what I had correctly understood: VLAN 1 is untagged, and every other VLAN is tagged.

@tmomas sorry for tagging you directly, but I believe you're the right person to answer. Do you think that this would be worthy of a bug report? I'd think that LuCI generating incorrect config based on incorrect assumptions, while presenting both the working and non-working setup the same way is a big issue that would need to be addressed in the next 21.02 RC.


just send a link with your post/findings to the openwrt-devel mailing list...

pinging @jow (done) for his take on your findings is likely a near equivalent...


Thanks. I will write up a more fitting description for the mailing list and send it within the next few days.


Thank you for posting. I'm struggling with the new VLAN config structure as well.

LuCI is working fine for that setup here. You probably created individual VLAN devices instead of using the bridge VLAN configuration tab of the br-lan bridge device.

The official documentation you refer to is likely wrong and hasn't been updated for DSA and/or refers to doing VLANs on switch-less devices.
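For comparison with the 8021q-based config quoted earlier, using the bridge VLAN filtering tab of the br-lan device should produce sections along these lines. This is an approximation, not verified LuCI output: the exact form may vary between LuCI versions, only VLANs 1 and 2 are shown, and the port names are the RT3200 ones from the LuCI-generated config above.

```
# Approximation of a bridge-VLAN-filtering setup on the RT3200
config device
        option name 'br-lan'
        option type 'bridge'
        list ports 'lan1'
        list ports 'lan2'
        list ports 'lan3'
        list ports 'lan4'
        list ports 'wan'

# untagged (u) + primary/PVID (*) on every port: plain clients land on VLAN 1
config bridge-vlan
        option device 'br-lan'
        option vlan '1'
        list ports 'lan1:u*'
        list ports 'lan2:u*'
        list ports 'lan3:u*'
        list ports 'lan4:u*'
        list ports 'wan:u*'

# tagged (t) on every port: GUEST travels over the trunk
config bridge-vlan
        option device 'br-lan'
        option vlan '2'
        list ports 'lan1:t'
        list ports 'lan2:t'
        list ports 'lan3:t'
        list ports 'lan4:t'
        list ports 'wan:t'

config interface 'lan'
        option device 'br-lan.1'
        option proto 'dhcp'

config interface 'guest'
        option device 'br-lan.2'
        option proto 'dhcp'
```

The key difference from the 8021q layout is that VLAN membership is configured on the bridge itself, so a wireless interface attached to the `guest` network joins the VLAN 2 side of the same bridge rather than a separate bridge stacked on top of it.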