Trying to make sense of vlan

terrytw · October 20, 2024, 2:43am

Hi everyone, vlan in openwrt is very confusing for me. After a lot of reading, I think I understand the basic concepts of vlans, like how tagged and untagged ports work, how PVID works, etc. I have managed to make it work for my setup, but I don't really understand some of the details, so here I am, asking questions:

I have an openwrt VM in PVE. Device eth0 is a PVE bridge (vmbr0), and devices eth1, eth2, eth3 and eth4 are passed-through physical ports. eth1 is used for wan, and eth0, eth2, eth3 and eth4 are used for br-lan.

I have set up vlans for this router:

It works, but I am still confused.

1. I never set up a vlan for the eth1 device. If a data frame comes in on eth2 and is tagged with vlan ID 1 in the process, where is the tag stripped before the frame goes out through eth1 (wan)?
2. How is the vlan tag handled for wireless? I never configured anything, I just defined the wireless network to use the lan interface, and it just works. Why?
3. Why can't I use vlan 2 and 3 for the guest and IoT interfaces, and continue using device br-lan for interface lan? It seems that once I start using vlan filtering, I must set up a vlan ID for interface lan as well.

Thank you for the help! Really appreciate it.

psherman · October 20, 2024, 3:08am

VLANs are a layer 2 (switching) concept only. They do not apply to L3 (routing), so the VLAN tag (on a tagged interface) is stripped when frames have entered through the ethernet port.

VLANs don't apply to (normal) wifi. It can apply for less common wifi scenarios involving some mesh configurations and point-to-point radio links (these are not normal AP modes).

I'm not sure what you mean by this. Can you be more specific? In fact, it's best if you show us your config:

Please connect to your OpenWrt device using ssh and copy the output of the following commands and post it here using the "Preformatted text </> " button:

Remember to redact passwords, MAC addresses and any public IP addresses you may have:

ubus call system board
cat /etc/config/network

terrytw · October 20, 2024, 6:18pm

EDIT: If you are looking for answers, read bold-fonted part.

In fact, it's best if you show us your config:

Sorry I didn't do so earlier, because my network topology is quite complicated. I have removed irrelevant parts:

Main router (PVE VM)

config interface 'loopback'
	option device 'lo'
	option proto 'static'
	option ipaddr '127.0.0.1'
	option netmask '255.0.0.0'

config globals 'globals'
	option ula_prefix ''

config device
	option name 'br-lan'
	option type 'bridge'
	list ports 'eth0'
	list ports 'eth2'
	list ports 'eth3'
	list ports 'eth4'

config interface 'lan'
	option device 'br-lan.1'
	option proto 'static'
	option ipaddr '192.168.1.1'
	option netmask '255.255.255.0'
	option ip6assign '64'
	option delegate '0'

config interface 'wan'
	option device 'eth1'
	option proto 'dhcp'
	option peerdns '0'
	list dns '127.0.0.1'

config interface 'wan6'
	option device 'eth1'
	option proto 'dhcpv6'
	option reqaddress 'try'
	option reqprefix 'auto'
	option peerdns '0'
	list dns '::1'

config interface 'guest'
	option proto 'static'
	option device 'br-lan.2'
	option ipaddr '192.168.2.1'
	option netmask '255.255.255.0'
	option delegate '0'

config interface 'IoT'
	option proto 'static'
	option device 'br-lan.3'
	option ipaddr '192.168.3.1'
	option netmask '255.255.255.0'
	option delegate '0'

config bridge-vlan
	option device 'br-lan'
	option vlan '1'
	list ports 'eth0'
	list ports 'eth2'
	list ports 'eth3:t'
	list ports 'eth4'

config bridge-vlan
	option device 'br-lan'
	option vlan '2'
	list ports 'eth0:t'
	list ports 'eth3:t'

config bridge-vlan
	option device 'br-lan'
	option vlan '3'
	list ports 'eth0:t'
	list ports 'eth3:t'

Dumb AP: (its wan is connected to Main router's eth3)

config interface 'loopback'
	option device 'lo'
	option proto 'static'
	option ipaddr '127.0.0.1'
	option netmask '255.0.0.0'

config globals 'globals'
	option ula_prefix ''

config device
	option name 'br-lan'
	option type 'bridge'
	list ports 'lan1'
	list ports 'lan2'
	list ports 'lan3'
	list ports 'wan'

config interface 'lan'
	option device 'br-lan.1'
	option proto 'static'
	option ipaddr '192.168.1.2'
	option netmask '255.255.255.0'
	option gateway '192.168.1.1'
	list dns '192.168.1.1'
	option delegate '0'

config bridge-vlan
	option device 'br-lan'
	option vlan '1'
	list ports 'lan1'
	list ports 'lan2'
	list ports 'lan3'
	list ports 'wan:t'

config bridge-vlan
	option device 'br-lan'
	option vlan '2'
	list ports 'wan:t'

config bridge-vlan
	option device 'br-lan'
	option vlan '3'
	list ports 'wan:t'

config interface 'guest'
	option proto 'static'
	option device 'br-lan.2'
	option delegate '0'

config interface 'IoT'
	option proto 'static'
	option device 'br-lan.3'
	option delegate '0'
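
To make the main router's bridge-vlan sections above easier to read, here is a minimal Python sketch that turns them into a per-port table. It is only an illustration, not anything OpenWrt runs. The `port_table` helper is hypothetical, and it assumes that a port listed without a suffix is untagged and takes that VLAN as its PVID, that `:t` means tagged, and that `*` marks an explicit PVID.

```python
from collections import defaultdict

# bridge-vlan sections copied from the main router config above
BRIDGE_VLANS = {
    1: ["eth0", "eth2", "eth3:t", "eth4"],
    2: ["eth0:t", "eth3:t"],
    3: ["eth0:t", "eth3:t"],
}


def port_table(bridge_vlans):
    """Return {port: {"pvid": vid or None, "untagged": [...], "tagged": [...]}}."""
    table = defaultdict(lambda: {"pvid": None, "untagged": [], "tagged": []})
    for vid in sorted(bridge_vlans):
        for entry in bridge_vlans[vid]:
            port, _, flags = entry.partition(":")
            if "t" in flags:
                table[port]["tagged"].append(vid)
            else:
                table[port]["untagged"].append(vid)
                # Assumption: the first untagged VLAN becomes the PVID
                if table[port]["pvid"] is None:
                    table[port]["pvid"] = vid
            if "*" in flags:
                # Assumption: '*' marks an explicit PVID
                table[port]["pvid"] = vid
    return dict(table)


for port, info in sorted(port_table(BRIDGE_VLANS).items()):
    print(port, info)
```

Under those assumptions, eth3 (the trunk to the dumb AP) comes out as tagged-only for VLANs 1, 2 and 3 with no PVID, while eth2 and eth4 are plain untagged members of VLAN 1.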

Thanks for the explanation! I am still a little confused:

the VLAN tag (on a tagged interface) is stripped when frames have entered through the ethernet port.

I am not sure I follow this. On the main router, eth2 (part of br-lan) and eth1 (wan) are both physical ethernet ports that are passed through to Openwrt. If the VLAN tag is stripped when a frame enters through eth2 (which is an ethernet port), then it seems to me that tagging does nothing at all, because the tag is added at eth2 and then removed at eth2 as well (on entering, i.e. ingress). My understanding is that the tag should remain until the frame leaves through an untagged port, which leads to my question: if tag 1 is added at eth2, where is it removed before the frame leaves through eth1?

EDIT: I have done some more reading, and have a temporary explanation:

As shown in the graph, data goes from Port 0 to the CPU port (Port 8), then through RAM to the CPU. I guess the Linux kernel handles (or consumes) the VLAN tag, and then the data goes from the CPU back to Port 8 and out through Port 3 without a VLAN tag. If this is wrong, please point it out.

The frame from the switch is received by the CPU's Ethernet controller, and the driver calls netif_receive_skb() to pass the frame to the network stack in the normal way. eth_type_trans() is called to determine the Ether Type of the frame. As part of eth_type_trans(), a check is made to see if the ingress interface is a DSA master interface, i.e. netdev_uses_dsa(). If so, tagged frames are expected. The tag protocol receiver function is then invoked on the frame. This extracts the information from the tag, and then removes the tag from the frame. If the switch ingress port is valid, the DSA slave interface is determined, and the ingress interface is updated in the skb to point to the slave device. The frame is then again passed to the network stack using netif_receive_skb(). This time the true Ether Type can be extracted from the frame, and the frame is passed on for IP processing, etc. The transmit path is similar. The slave's transmit function invokes the tagger transmit function. It inserts the switch tag, and then calls the master interface's transmit function via dev_queue_xmit().

VLANs don't apply to (normal) wifi. It can apply for less common wifi scenarios involving some mesh configurations and point-to-point radio links (these are not normal AP modes).

Does this mean that wifi does not "see" the VLAN tag at all and sends the frame as is? Or does wifi strip the VLAN tag when sending data out?

EDIT: I may have figured this part out, according to https://openwrt.org/docs/guide-user/network/dsa/converting-to-dsa, it seems that wireless interfaces act as untagged ports.

I'm not sure what you mean by this. Can you be more specific?

Right now, on the main router, I am using br-lan.1 for interface lan, br-lan.2 for interface guest, br-lan.3 for interface IoT.

My original plan was to use br-lan for interface lan, but it doesn't work. I'm just curious why.

EDIT: I think I figured this one out too. When using DSA based openwrt, there is actually a default VLAN setting enabled like this:

When I manually set up vlan filtering, the default setting is removed, so I have to configure the lan interface's vlan myself.

OK at least now I can sleep well. Until someone points out that I actually got it all wrong lol.

wilsonyan · October 21, 2024, 12:14am

The basics are pretty simple. In general there's no reason for 'tagged' at all unless you want to have more than one network working over a port.

Otherwise, basically no point, so basically...

untagged = one / default network on the port
tagged = several networks on the port (typically referred to as a 'trunk' port)
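
For anyone wondering what "tagged" means on the wire, here is a small Python sketch of the 4-byte 802.1Q tag (TPID 0x8100 followed by priority, DEI and a 12-bit VLAN ID) that sits right after the two MAC addresses. The helper names `add_tag` and `strip_tag` are made up for illustration, and the frame is a dummy with zeroed MAC addresses.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q tag


def add_tag(frame: bytes, vid: int, pcp: int = 0, dei: int = 0) -> bytes:
    """Insert a 4-byte 802.1Q tag after the destination and source MACs."""
    if not 0 < vid < 4095:
        raise ValueError("usable VLAN IDs are 1..4094")
    tci = (pcp << 13) | (dei << 12) | vid  # 3-bit priority, 1-bit DEI, 12-bit VID
    return frame[:12] + struct.pack("!HH", TPID_8021Q, tci) + frame[12:]


def strip_tag(frame: bytes):
    """Return (vid, frame_without_tag); vid is None if the frame was untagged."""
    tpid, tci = struct.unpack("!HH", frame[12:16])
    if tpid != TPID_8021Q:
        return None, frame
    return tci & 0x0FFF, frame[:12] + frame[16:]


# Dummy frame: zeroed dst MAC, zeroed src MAC, EtherType 0x0800 (IPv4), payload
untagged = bytes(6) + bytes(6) + b"\x08\x00" + b"payload"
tagged = add_tag(untagged, vid=2)
print(tagged[12:16].hex())          # the inserted tag bytes
print(strip_tag(tagged)[0])         # VLAN ID recovered from the tag
```

An untagged port sends and receives the shorter form, and a trunk port carries the longer form so the device on the other end can tell the networks apart.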

terrytw · October 21, 2024, 12:23am

Thanks, the basic concept is pretty simple, but the Openwrt implementation is not very well documented.

For example, psherman seems to mix it up by saying VLANs don't apply to wifi, when Openwrt actually seems to treat wifi as an untagged port.

Anyway, there is still one thing that I am not understanding, if you can enlighten me, that would be great:

Let's assume the basic setup of one network, where all lan ports are untagged and Openwrt seems to use a default PVID of 1. When a data frame enters Openwrt from my computer via an ethernet port, it should be tagged with VLAN ID 1, right? If that data is going to google.com, it will be forwarded to the wan interface and leave the Openwrt router. I am wondering, at which point in this process does the VLAN ID 1 tag get removed?

wilsonyan · October 21, 2024, 1:14am

vlan over wifi seems like a special case, i've never had a need to go beyond having multiple networks on a radio so don't have much experience with it

There's a complex answer to that above my head, but ultimately i'd say it doesn't matter, it says egress untagged, and the sausage is made haha.

terrytw · October 21, 2024, 1:17am

and the sausage is made

Haha! I wish I could adopt this attitude more! My life would be happier, with less head scratching!

psherman · October 21, 2024, 5:25am

Strictly speaking, VLANs are not applicable to WiFi (in terms of standard AP - STA mode connectivity), as the 802.11 standard doesn't have a provision for tagging. That is what I was referring to.

As far as OpenWrt's approach, you connect an SSID with a network. Yes, it is, for all practical purposes, an untagged 'port', but really the SSID connects to the bridge, and there is never a mechanism to make the SSID/WiFi tagged, anyway.

(when I refer to the fact that standard 802.11 WiFi doesn't support VLANs, I'm talking about the AP-to-end device (phones/computers/etc.) in a standard AP-STA relationship. This discussion does not include mesh/B.A.T.M.A.N/GRE-TAP type methods to transport VLANs over the radio link which are not 'general purpose' WiFi connectivity.)

I should have been more clear -- there is a bit more nuance. When there is a switch chip involved, there will be a method inside that switch chip to maintain the VLAN identities -- this may be 802.1q tags or other methods, depending on the internal architecture. The switch-CPU connection will carry the tags. Individually routed ports behave a bit differently insofar as tags are carried directly to the CPU. But the tags are really stripped at the point that the frame crosses the OpenWrt L2/L3 boundary, because VLANs don't exist on L3. So, in the context of a bare-metal config, that happens as soon as the frames pass through the Ethernet port and into the CPU. I probably should have said "at the CPU."
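
As a rough illustration of the explanation above, here is a toy Python model of the main router's bridge. It is a sketch of how I understand a VLAN-aware bridge to behave, not how the kernel is implemented. The tables are assumed from the bridge-vlan sections posted earlier, and the function names are hypothetical.

```python
# Assumed from the main router config: PVIDs and per-VLAN port membership
PVID = {"eth0": 1, "eth2": 1, "eth4": 1}  # eth3 is tagged-only, so it has no PVID
MEMBERS = {
    1: {"eth0": "untagged", "eth2": "untagged", "eth3": "tagged", "eth4": "untagged"},
    2: {"eth0": "tagged", "eth3": "tagged"},
    3: {"eth0": "tagged", "eth3": "tagged"},
}
L3_INTERFACES = {1: "lan", 2: "guest", 3: "IoT"}  # br-lan.1, br-lan.2, br-lan.3


def bridge_ingress(port, tag):
    """Classify a frame arriving on a bridge port; None means it is dropped."""
    vid = tag if tag is not None else PVID.get(port)
    if vid is None or port not in MEMBERS.get(vid, {}):
        return None
    return vid


def bridge_egress(port, vid):
    """Describe how a frame in VLAN vid would leave through a bridge port."""
    mode = MEMBERS[vid].get(port)
    if mode is None:
        return "not forwarded"
    return f"tag {vid}" if mode == "tagged" else "untagged"


def route_to_wan(port, tag):
    """Follow a frame from a bridge port to the wan device (eth1)."""
    vid = bridge_ingress(port, tag)
    if vid is None:
        return "dropped at ingress"
    # Delivery to br-lan.<vid> removes the VLAN information; from here on the
    # router handles an IP packet. eth1 is not a bridge port and has no VLAN
    # configured, so the packet simply leaves untagged.
    return f"{L3_INTERFACES[vid]} -> routed -> eth1 untagged"


print(route_to_wan("eth2", None))                         # PC on eth2 going to the internet
print(bridge_egress("eth3", bridge_ingress("eth2", None)))  # same PC towards the AP trunk
print(route_to_wan("eth3", None))                         # untagged frame on the trunk
```

In this model the tag never reaches eth1 at all: VLAN membership only matters while a frame is being switched inside br-lan, and it is gone once the packet is handed to the lan, guest or IoT interface for routing.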

terrytw · October 21, 2024, 6:04am

Thanks for the technical detail and clarification.

So I guess my rudimentary understanding of the VLAN tag of ingress traffic being "consumed" at the CPU is close enough. The same goes for the AP part.

What I am trying to get at is really just to understand how data flows through VLANs in Openwrt: why it is routed here and there, why it is allowed through or discarded, and why certain configs work while others don't. Just the logic. I have never really studied the OSI model to make sense of it, since my field of study and work is nowhere near tech.

There are a lot of questions and answers and tutorials floating around, but not nearly enough official documentation to explain the Openwrt approach to VLANs, which is unfortunate, at least from my perspective.

psherman · October 21, 2024, 6:13am

The difficulty in answering is that your questions are too general… for example:

Well, one config is probably correct and the other is not. I don’t say that to be flippant, but rather to point out that if you want to know why a config doesn’t work, we would have to look at the details of that config (i.e. the config files themselves) and then evaluate the syntax, structure, or possibly the dependencies on the external connections (upstream/downstream devices and their configs).

If you can provide more specific questions and config examples, that might help us dive into a substantive discussion.

Also, the Wikipedia article on 802.1q/VLANs is actually pretty good - it is still Wikipedia, so be aware of potential errors, but you don’t have to buy a networking textbook to get most of the information about how VLANs work in general and how they fit into the OSI model. From there, if you have questions about how they are implemented in OpenWrt, you may be able to ask more targeted questions.
