Swconfig: enable_vlan and enable_vlan4k documentation

znark · June 29, 2022, 10:45am

On devices which are still stuck with the old swconfig system for configuring their switches, what is the actual difference between the enable_vlan and enable_vlan4k options in the switch configuration sections defined in the /etc/config/network file?

I looked for the answer in the documentation, but none of these articles explained it. They either do not mention the enable_vlan4k option at all, or mention it only in passing (in a capture of command output, without explaining it):

Documentation / User guide / Base system:
Network basics /etc/config/network – Switch configuration (legacy swconfig)

Documentation / User guide / Network / VLAN (Virtual LAN):
VLAN – Assigning VLAN IDs on VLAN-enabled switch hardware

Documentation / Technical Reference:
swconfig

So my questions are:

1. enable_vlan apparently enables 802.1Q VLAN bridging/filtering in the “switching fabric” of the switch hardware, but for some smaller range of VLAN IDs than the standard 1 – 4094 (12-bit ID where the values 0x000 and 0xFFF are reserved). What is this “smaller range” and how is it determined? Where does this limitation come from?
2. enable_vlan4k apparently enables the full 802.1Q VLAN ID range (1 – 4094)? Is this assumption correct? If it is, why doesn't enable_vlan (without the 4k) already do this by default? Why do we need a separate enable_vlan4k option? (Why would you ever want to use enable_vlan without also setting enable_vlan4k at the same time?)
3. Does enable_vlan4k always need to be paired with the enable_vlan option to work correctly, or can you use it in a stand-alone fashion?
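
For reference, the standard range mentioned in the first question can be checked with a minimal Python sketch. The names (VID_BITS, is_usable_vid) are made up for illustration and do not come from swconfig or OpenWrt:

```python
# 802.1Q carries a 12-bit VLAN ID; 0x000 and 0xFFF are reserved.
VID_BITS = 12
RESERVED_VIDS = {0x000, 0xFFF}


def is_usable_vid(vid: int) -> bool:
    """Return True if vid is a usable 802.1Q VLAN ID (1..4094)."""
    return 0 <= vid < (1 << VID_BITS) and vid not in RESERVED_VIDS


usable = [v for v in range(1 << VID_BITS) if is_usable_vid(v)]
print(usable[0], usable[-1], len(usable))  # 1 4094 4094
```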

lleachii · June 29, 2022, 11:39am

Usually 0x0 - 0xf (i.e. 15 not counting VLAN 0)
Hexadecimal numbering

I thought that was clear, it was programmed. That covers a lot of your other questions.

There's also deeper reasoning on some hardware:

Do you need more than 15 VLANs?

znark · June 29, 2022, 12:33pm

Programmed by whom? (The switch ASIC vendor? The Linux kernel project? Contributors to the OpenWrt project?) Programmed where? (In the switch ASIC firmware? In some kernel driver? In some logic that only lives in the codebase of OpenWrt userspace tools and scripts?) To what end / for what purpose?

For instance, it seems that on this switch...

root@OpenWrt:~# swconfig dev switch1 help
switch1: rtl8367rb(RTL8367B), ports: 8 (cpu @ 6), vlans: 4096
[...]

...with enable_vlan set to 1, you can only use VLAN IDs 1...31 unless you also set enable_vlan4k to 1, after which OpenWrt starts accepting configurations with higher-numbered VLAN IDs. But why not accept the higher-numbered VLAN IDs right from the get-go with just enable_vlan, and without needing to set this 4k configuration directive separately?

lleachii · June 29, 2022, 12:36pm

I'm guessing you didn't see the post regarding switch hardware limitations (or lack thereof) linked above; or you didn't understand its relevance to your inquiries?

Also:

Are you experiencing some issue (e.g. numbering a VLAN greater than x); or is this just a general VLAN inquiry?

I wanted to just elaborate on what's relevant to your use case.

znark · June 29, 2022, 1:18pm

I did not understand its relevance to my inquiries. Please enlighten me. What I’m asking about is the allowed range of VLAN IDs, not the (maximum) number of VLANs that a particular switch ASIC/firmware implementation supports, which is a different thing.

For instance, I have a D-Link DGS-1100-16 (a managed “smart switch”) which clearly says in its UI the maximum number of VLANs that the hardware/firmware supports is 32. Yet, you can use any VLAN ID from the range of 1 – 4094 for defining those 32 VLANs (as you should.)

OpenWrt’s (swconfig’s) behavior with the built-in switch of my router (rtl8367rb(RTL8367B)) seems to be based on some different logic where it initially limits you to 1 – 31 by default (not only the number of VLANs but their IDs, too!) but then allows overriding this limit (at least the ID limit) with a special, underdocumented configuration directive (enable_vlan4k). I do not understand what this overridable initial limit is trying to accomplish. The switch seems to work well with the higher-numbered VLAN IDs so why initially disable them (for this switch) — what is the original thinking here?

My issue is, with a managed switch (any managed switch), I would normally expect to be able to set my defined 802.1Q VLANs to any VLAN ID from the range of 1 – 4094 without doing any special configuration, except enabling VLAN filtering/bridging in the switch. (VLAN IDs are often assigned not in strict numeric order but according to a site-specific local scheme which may use e.g. IDs 10, 550, 620 etc. and reserve many IDs in-between for future need.)

I do not expect to be able to define 4094 separate VLANs on all hardware, that is a different thing.

(I am also interested in improving the OpenWrt documentation in the Wiki on this matter but it is difficult to improve it properly if the reasoning behind certain design choices remains unclear.)

mk24 · June 29, 2022, 2:20pm

On older hardware like the 10/100 switches built into the Atheros 9300 series SoCs, the hardware memory table that holds VLAN assignments has only 16 entries. Using option vlan with a number larger than 15 or 16 will not work. However, it is possible to have any standard number in the VLAN tag by adding 'option vid' with the tag number.

The entire /etc/config/network file is built within the router during the first boot of OpenWrt, using hardware properties found in other files. It should select enable_vlan or enable_vlan4k based on what is appropriate for the hardware.

lleachii · June 29, 2022, 3:00pm

(i.e. in some devices the hardware memory VLAN ID register is only one ~~byte~~ nibble wide)

So @znark - this is why I asked; because you should not be having this issue; but it seems you found vid information.

znark · June 29, 2022, 3:10pm

This is all understandable, many switches have hardware/firmware limitations in the number of VLANs they allow to be configured.

Wait, so in an option vlan 'x' configuration segment that does not have a separate vid definition, what is the x?

1. Is it the 12-bit 802.1Q VLAN ID?
2. Is it some sort of an index number to the switch ASIC’s/firmware’s internal VLAN definition table that identifies a particular entry in that table (hence setting the upper limit if the table on that particular switch implementation is <4094)?
3. Is it something that... tries to conflate both of the above (totally different concepts) into the same number? If so, why?

lleachii · June 29, 2022, 3:34pm

I don't see how any of your options follow logic or abstraction (i.e. what computers do on each level).

I'll give you an example:

You have such a device with the hardware limits already shown to you - and your ISP requires you to use VLAN 741. OK, so most OpenWrt devices with switches use VLAN 1 (LAN) and VLAN 2 (WAN). So you can use 3-15. You set VLAN 3 to listen for tags destined for VLAN ID 741. Simple. Or you can e.g. edit the on-wire tag on WAN...to VLAN 741.
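
As a rough illustration of that example, here is a small Python helper that prints what such a stanza could look like, using the 'option vid' setting mk24 mentioned. The device name 'switch0' and the port list '0t 6t' are hypothetical placeholders, and the helper itself is only a sketch, not part of OpenWrt:

```python
from typing import Optional


def render_switch_vlan(device: str, index: int, ports: str,
                       vid: Optional[int] = None) -> str:
    """Render a swconfig-style switch_vlan stanza for /etc/config/network."""
    lines = [
        "config switch_vlan",
        f"\toption device '{device}'",
        f"\toption vlan '{index}'",
    ]
    if vid is not None:
        # Emit 'option vid' only when an explicit on-wire tag was requested.
        lines.append(f"\toption vid '{vid}'")
    lines.append(f"\toption ports '{ports}'")
    return "\n".join(lines)


# Hypothetical device name and ports: slot 3 carrying on-wire tag 741.
print(render_switch_vlan("switch0", 3, "0t 6t", vid=741))
```

Whether the driver for a given switch actually honours 'option vid' is a separate matter and depends on the hardware.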

I would say:

You obviously must create an index of some kind if you need to define or configure a frame on-wire greater than 15 (0xf).

To be 100% clear:

≤ 15

(less than or equal to 15) - well, in the example being used thru this entire thread

This is why I didn't pick option 2 - I wasn't sure if you were i.) implying the developers place an artificial limitation - ii.) if you're really having an issue - or iii.) you still don't understand the hardware problem on some devices. But option 2 most closely describes what I would have selected.

This follows no concept of abstraction in a machine. I didn't even follow the logic of what the value would represent to man or machine. Maybe I missed something.

mk24 · June 29, 2022, 4:40pm

option vlan x is the index in the chip table. If it is not a chip with a 4096 entry table, x needs to be within the chip limit such as 1-15. The IEEE tag number inserted / removed from tagged packets defaults to x unless there is also option vid.
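
A minimal Python sketch of that rule, as I understand it from the description above. This is a simplified model, not the swconfig code; the function name wire_tag and the 16-entry default (taken from the older Atheros example) are assumptions for illustration:

```python
from typing import Optional


def wire_tag(index: int, vid: Optional[int] = None, table_size: int = 16) -> int:
    """Return the 802.1Q tag used on the wire for chip table slot `index`.

    table_size is an assumed chip limit; real chips differ.
    """
    if not 0 <= index < table_size:
        raise ValueError(
            f"slot {index} does not exist in a {table_size}-entry table")
    # Without an explicit 'option vid', the tag defaults to the slot number.
    return index if vid is None else vid


print(wire_tag(2))           # 2
print(wire_tag(3, vid=741))  # 741
```

Calling wire_tag(741) with the default table size raises ValueError, which corresponds to a plain option vlan '741' not fitting into a small chip table.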

znark · June 29, 2022, 4:57pm

Why? Which one of my three options was not understandable? I am happy to clarify if you let me.

Thank you. This example suggests that you (or the original swconfig designers) think of VLAN definitions in terms of an indexed table where the index numbers are “important” and visible to the end user, and the end user needs to manage the VLAN definitions by the index numbers, assigning each definition to a numbered “slot”.

This is a somewhat unusual design choice, though. Normally “managed” switches (Cisco, HP ProCurve etc.) or “smart” switches (the entry-level D-Link, ZyXEL etc. models) hide such index numbers from the users as an “implementation detail” and only identify VLAN definitions by their actual (802.1Q) VLAN IDs in the configuration files, command-line interfaces, or GUIs, never exposing any internal indices or “slot numbers” to the end users.

For instance, here’s some VLAN configuration from an HP ProCurve switch (an excerpt from the configuration file):

vlan 1
   name "DEFAULT_VLAN"
   no untagged 1-10
   no ip address
   exit
vlan 1031
   name "knm_kara"
   untagged 3-8
   tagged 10
   no ip address
   exit
vlan 1050
   name "knm_sisa"
   untagged 1-2,9-10
   ip address 172.30.50.9 255.255.255.0
   exit
primary-vlan 1050
management-vlan 1050

As you can see, the VLAN definitions are based on 802.1Q VLAN identifiers only. This configuration file format does not require the user to set two IDs (an index ID and a 802.1Q VLAN ID) separately. Obviously, the switch stores the actual definitions in some indexed table deep down in its bowels, but the end user does not need to know about that — it’s an implementation detail hidden from the user.

What causes some confusion is that OpenWrt (swconfig) appears to do the same by default (since option vlan 'x' not only stores the VLAN definition in the specified index/slot x but also sets the actual 802.1Q VLAN ID to the same number x), but this then falls apart if you try to define a VLAN (without explicitly setting the VID) using an x that is larger than the number of available slots.

This still does not explain what enable_vlan4k '1' then does.

znark · June 29, 2022, 5:41pm

Thank you, this is the clear technical explanation I was trying to extract from this discussion — also mentioning the “defaults to x” detail, which was the important missing piece.

Someone who has worked on Cisco, HP, etc. managed switches and their text-based configuration formats might quite easily assume it to be the other way around, with the x specifying primarily the 802.1Q VLAN ID (since that is how it even works for the low numbers due to this default initialization), instead of it primarily specifying the low-level, firmware/chip register-level, run-time storage “slot” for that VLAN definition.

Since I am still missing a definition for what enable_vlan4k does (behind the scenes) I am now assuming it is some sort of an override for cases where the chip actually reserves room for configuring “all the 802.1Q VLANs” but there’s some component between swconfig and the chip that incorrectly reports a smaller number? Correct or not?

lleachii · June 29, 2022, 8:11pm

No!

I literally said my concern was you would imply the developers limited something...and you imply it anyways?

I really can't see what you don't understand - so I think I'll stop replying here.

It's like you're purposely missing the point...

What "override"???

If a chip can only handle 15 VLANs and you want to address a VLAN on-wire (its actual tag as seen when monitoring the Ethernet cable) greater than the number 15, something would have to keep track. I don't see why that concept is difficult.

Example: If you wanted to tag a trunk with VLAN 741 on device that only handles up to VLAN 15...the vid concept is how that's done.

(I really don't see how one doesn't understand the example of needing to use a number of 741, which is greater than 15 - knowing said example device can only handle 15 VLANs total.)

slh · June 29, 2022, 8:15pm

Hardware and driver implementations differ, News at 11:00.

Especially older designs tend to be more limited than contemporary ones.

lleachii · June 29, 2022, 8:20pm

I woulda marked this as the solution when MK said it first...lol

I woulda thought that concept was crystal clear to the OP by now.

znark · June 29, 2022, 8:42pm

Interestingly, on my specific swconfig-based device (TP-Link Archer C2 AC750 V1), with OpenWrt 21.02.3, using...

switch1: rtl8367rb(RTL8367B), ports: 8 (cpu @ 6), vlans: 4096

...as its outer, user-accessible switch, I observe the following behavior:

The default OpenWrt “factory” configuration is:

config switch
        option name 'switch1'
        option reset '1'
        option enable_vlan '1'

config switch_vlan
        option device 'switch1'
        option vlan '1'
        option ports '1 2 3 4 6t'

config switch_vlan
        option device 'switch1'
        option vlan '2'
        option ports '0 6t'

Adding explicit option vid 'x' to either of the VLAN definitions (by any ID number in place of x, low or high) and rebooting does not have any effect — the switch VLAN configuration remains unchanged when observed using swconfig dev switch1 show; traffic still flows to the original eth0.1 and eth0.2 devices.
Removing the attempted option vid 'x' lines and changing the index/ID number specified on the option vlan 'x' lines, instead, does change the switch 802.1Q VLAN ID configuration in the manner desired, but only up to the ID 31 (which is no good since the rest of the network / external switch / WAN is generally using a 802.1Q VLAN ID scheme with much higher-numbered IDs).
Adding the mysterious enable_vlan4k '1' option to the switch1 configuration removes the ≤31 VLAN index/ID limitation from option vlan 'x' lines, and makes everything work right, except for the fact that explicit option vid 'x' lines still do not have any effect (I tried just out of curiosity). But an explicit vid setup is not really required since the basic option vlan 'x' stanzas can now be used to set the higher-numbered 802.1Q VLAN IDs directly/implicitly.

This does not exactly match the information given in the above discussion about how the option vid 'x' lines should work (they don’t) but maybe this device is misconfigured in the official images somehow, and should have the enable_vlan4k '1' option in its default configuration?

lleachii · June 30, 2022, 9:45am

I've experienced no such limitation. I commonly set e.g. wan to VLAN IDs greater than 31 and set the port as tagged (or "trunked" in Cisco terms). It instantly works. I don't have to do any of the advanced configuration you're describing.

e.g. - as I noted:

So it's clear I don't experience that issue. But it's very difficult to understand if you're just lost on configuration or your hardware is experiencing the limitation already explained to you.

znark · June 30, 2022, 4:03pm

Since there’s some doubt on whether the configurations I tried were sane, here goes:

This is just for demonstrative purposes (and posterity in case someone owning the same device – a TP-Link Archer C2 AC750 V1 – would find this useful):

Default “factory” configuration (redacted a bit for brevity):

config device
        option name 'br-lan'
        option type 'bridge'
        list ports 'eth0.1'

config interface 'lan'
        option device 'br-lan'
        option proto 'static'
        option ipaddr '192.168.1.1'
        option netmask '255.255.255.0'
        option ip6assign '60'

config device
        option name 'eth0.2'
        option macaddr 'xx:xx:xx:xx:xx:xx'

config interface 'wan'
        option device 'eth0.2'
        option proto 'dhcp'

config interface 'wan6'
        option device 'eth0.2'
        option proto 'dhcpv6'

config switch
        option name 'switch1'
        option reset '1'
        option enable_vlan '1'

config switch_vlan
        option device 'switch1'
        option vlan '1'
        option ports '1 2 3 4 6t'

config switch_vlan
        option device 'switch1'
        option vlan '2'
        option ports '0 6t'

The above works as expected: eth0.2 (the wan device) gets an IP address via DHCP from the WAN port (port 0), which is connected to “the Internet” through an upstream managed switch.

Altered configuration #1 (showing only the relevant sections from now on):

config device
        option name 'eth0.31'
        option macaddr 'xx:xx:xx:xx:xx:xx'

config interface 'wan'
        option device 'eth0.31'
        option proto 'dhcp'

config interface 'wan6'
        option device 'eth0.31'
        option proto 'dhcpv6'

config switch
        option name 'switch1'
        option reset '1'
        option enable_vlan '1'

config switch_vlan
        option device 'switch1'
        option vlan '31'
        option ports '0 6t'

The above works as well. eth0.31 (which is now the wan device) gets an IP address via DHCP from the WAN port (port 0). In fact, the option vlan indices 1 – 31 all seem to be usable.

Altered configuration #2:

config device
        option name 'eth0.32'
        option macaddr 'xx:xx:xx:xx:xx:xx'

config interface 'wan'
        option device 'eth0.32'
        option proto 'dhcp'

config interface 'wan6'
        option device 'eth0.32'
        option proto 'dhcpv6'

config switch
        option name 'switch1'
        option reset '1'
        option enable_vlan '1'

config switch_vlan
        option device 'switch1'
        option vlan '32'
        option ports '0 6t'

The above no longer works. eth0.32 does not get an IP address and swconfig dev switch1 show only displays VLAN 1 as configured. In fact, option vlan indices above 31 all seem to be unusable.

Altered configuration #3 (adding option enable_vlan4k '1' to the switch1 configuration):

config device
        option name 'eth0.32'
        option macaddr 'xx:xx:xx:xx:xx:xx'

config interface 'wan'
        option device 'eth0.32'
        option proto 'dhcp'

config interface 'wan6'
        option device 'eth0.32'
        option proto 'dhcpv6'

config switch
        option name 'switch1'
        option reset '1'
        option enable_vlan '1'
        option enable_vlan4k '1'

config switch_vlan
        option device 'switch1'
        option vlan '32'
        option ports '0 6t'

Now VLAN 32 works. I can freely use option vlan indices above 31 (and therefore also get the VLANs to use the corresponding 802.1Q VLAN IDs by default).
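
The results of altered configurations #1 to #3 can be summarised in a small Python sketch. This is only an empirical model of what I observed on this one device (RTL8367B); the function name is made up, and the upper bound of 4094 with enable_vlan4k is an assumption based on the 802.1Q maximum, not something I tested exhaustively:

```python
def index_accepted(index: int, enable_vlan4k: bool) -> bool:
    """Empirical model of the RTL8367B behavior reported in this post only."""
    # Observed: indices 1..31 work with enable_vlan alone;
    # higher ones only work once enable_vlan4k is also set.
    # The 4094 ceiling is assumed from the 802.1Q standard.
    limit = 4094 if enable_vlan4k else 31
    return 1 <= index <= limit


for index, vlan4k in [(31, False), (32, False), (32, True)]:
    print(index, vlan4k, index_accepted(index, vlan4k))
```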

Altered configuration #4 (making the WAN port a member of 802.1Q VLAN 500 in tagged mode):

config device
        option name 'eth0.500'
        option macaddr 'xx:xx:xx:xx:xx:xx'

config interface 'wan'
        option device 'eth0.500'
        option proto 'dhcp'

config interface 'wan6'
        option device 'eth0.500'
        option proto 'dhcpv6'

config switch
        option name 'switch1'
        option reset '1'
        option enable_vlan '1'
        option enable_vlan4k '1'

config switch_vlan
        option device 'switch1'
        option vlan '500'
        option ports '0t 6t'

Note the 0t above. The upstream switch port has been configured in trunk mode with “the Internet” being provided via a tagged 802.1Q VLAN 500 (I commonly connect switches and routers in trunk mode with several VLANs carried in the trunk). Works fine, eth0.500 gets an IP address from the ISP’s DHCP server as expected.

Altered configuration #5 (trying the same with the original VLAN index 2 and an explicit option vid '500' to set the VID separately; also removing the enable_vlan4k option from the switch settings):

config device
        option name 'eth0.500'
        option macaddr 'xx:xx:xx:xx:xx:xx'

config interface 'wan'
        option device 'eth0.500'
        option proto 'dhcp'

config interface 'wan6'
        option device 'eth0.500'
        option proto 'dhcpv6'

config switch
        option name 'switch1'
        option reset '1'
        option enable_vlan '1'

config switch_vlan
        option device 'switch1'
        option vlan '2'
        option vid '500'
        option ports '0t 6t'

The above does not work – eth0.500 does not get an IP address from the tagged VLAN provided by the uplink switch. Also, swconfig dev switch1 show gives no indication of the 802.1Q VLAN ID 500 having been configured:

VLAN 1:
        info: VLAN 1: Ports: '12346t', members=005e, untag=001e, fid=0
        ports: 1 2 3 4 6t 
VLAN 2:
        info: VLAN 2: Ports: '0t6t', members=0041, untag=0000, fid=0
        ports: 0t 6t
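
As a side note, the members= and untag= values in that output look like hexadecimal port bitmasks (bit n set means port n). That reading is my assumption, but it is consistent with the ports lines shown. A minimal Python sketch that decodes them (helper names are made up):

```python
from typing import List


def ports_from_mask(mask_hex: str) -> List[int]:
    """Return the port numbers whose bits are set in a hex port mask."""
    mask = int(mask_hex, 16)
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


def describe(members: str, untag: str) -> str:
    """Rebuild a 'ports' string; members not in the untag mask get a 't'."""
    untagged = set(ports_from_mask(untag))
    return " ".join(
        f"{port}" if port in untagged else f"{port}t"
        for port in ports_from_mask(members)
    )


print(describe("005e", "001e"))  # 1 2 3 4 6t
print(describe("0041", "0000"))  # 0t 6t
```

Nothing in either mask hints at the VID 500, which matches the observation that the option vid line was ignored.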

I tried lower VID numbers as well, such as option vid '10' (configuring the upstream switch port with a matching tagged VLAN) just to see if the 802.1Q ID range itself is somehow limited in the non-vlan4k mode – but no dice; the explicit option vid setting does not seem to have any effect on this switch/router.

Maybe there’s something wrong with the above simple option vid line but whatever it is, I cannot figure it out.

It is fortunate that enable_vlan4k unlocks access to 802.1Q VLAN IDs above 31 via the implicit option vlan 'x' VID setting mechanism, though, since otherwise it seems I could not have used higher-numbered 802.1Q VLAN IDs on this device at all.

So now you can probably see why I was curious about the swconfig architecture and the logic behind this whole thing.

Previously, I’ve only used OpenWrt on routers whose switch configuration is already DSA-based and where option vlan '1050' (in the bridge-vlan sections) configures a 802.1Q VLAN 1050 straight away with no extra configuration gymnastics needed on top of that. This TP-Link was a bit older gear, though, and got me wondering, especially as the default config does not include the required enable_vlan4k (so I only found out about it accidentally), and enable_vlan4k is not listed or documented among the available switch configuration options in the Wiki.

lleachii · July 1, 2022, 6:24am

@znark, I saw this linked in another thread - hope this helps and clears confusion on what the value means:

znark · July 1, 2022, 8:43am

@lleachii wrote:

Thanks, but as you can see from the non-working examples above, I am not (explicitly) setting or trying to set the PVID value. I am setting the VID / 802.1Q VLAN ID for the VLAN. As noted in the comment you quote, that typically also implicitly sets the PVID for ingress tagging if the port is an “access port” (a member of a VLAN in untagged mode). However, the non-working example above with the VLAN ID 500 concerned tagged traffic on a trunk port (0t). That port is not set up for untagged traffic at all, so the PVID should be of no concern.

I have been working with VLANs and managed switches for quite some time. Maybe there’s some confusion, but it is not necessarily totally at my end.