Vlan trunk debugging question

i have read several posts and tutorials, but i'm still unable to set up the typical home setup:

3 vlans for 3 AP's: lan, guest, and iot, and a trunked vlan connection between my gw (BPI-R4, which is DSA) and my dumb AP (tp-link wdr4300, which is still swconfig based).

both are running OpenWrt 24.10.0-rc3.

i'm an experienced programmer, but my network admin fu is rather limited, apparently.

the AP seems to work: the lan wifi works, and if i try to connect to the guest wifi, then i see the DHCP requests leaving in tcpdump -n -e --interface eth0 | grep -v "vlan 8":

00:22:13.927887 fe:fb:1a:xx:xx:xx > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 342: vlan 4, p 0, ethertype IPv4 (0x0800), 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fe:fb:1a:8c:da:fc, length 296

but these packets don't show up on any interfaces on the gw. i tried tcpdumping br-lan, br-lan.4, and lan1. and accordingly, no DHCP response is received by the AP, and wifi fails to connect.

question one:

is it a valid expectation of mine that a tcpdump -n -e --interface br-lan.4 on the gw should show the packets that i see on the AP being sent on the trunk?

the trunk works, kinda, because my lan (10.0.8.1, vlan 8) works through the same trunked port. what doesn't work is the two new vlans. which is, btw, baffling to me: how come one of the vlans works and the other two don't? i'm clearly missing something here.

my AP:

# ip route
default via 10.0.8.1 dev br-lan proto static 
10.0.2.0/24 dev br-iot proto kernel scope link src 10.0.2.2 
10.0.4.0/24 dev br-guest proto kernel scope link src 10.0.4.2 
10.0.8.0/24 dev br-lan proto kernel scope link src 10.0.8.2 
# cat /etc/config/network 

config interface 'loopback'
	option device 'lo'
	option proto 'static'
	option ipaddr '127.0.0.1'
	option netmask '255.0.0.0'

config globals 'globals'
	option ula_prefix 'xxx::/48'

config switch
	option name 'switch0'
	option reset '1'
	option enable_vlan '1'

config device 'iot_dev'
	option type 'bridge'
	option name 'br-iot'
	option ports 'eth0.2'

config interface 'iot'
	option proto 'static'
	option device 'br-iot'
	option ipaddr '10.0.2.2'
	option netmask '255.255.255.0'

config device 'guest_dev'
	option type 'bridge'
	option name 'br-guest'
	list ports 'eth0.4'

config interface 'guest'
	option proto 'static'
	option device 'br-guest'
	option ipaddr '10.0.4.2'
	option netmask '255.255.255.0'

config device 'br_lan'
	option ports 'eth0.8'
	option type 'bridge'
	option name 'br-lan'

config interface 'lan'
	option proto 'static'
	option device 'br-lan'
	option ipaddr '10.0.8.2'
	option netmask '255.255.255.0'
	list dns '10.0.8.1'

config route
	option target '0.0.0.0/0'
	option gateway '10.0.8.1'

config switch_vlan
	option device 'switch0'
	option vlan '0'
	option ports '0t 1t'
	option vid '2'

config switch_vlan
	option device 'switch0'
	option vlan '0'
	option ports '0t 1t'
	option vid '4'

config switch_vlan
	option device 'switch0'
	option vlan '0'
	option ports '0t 1t 2 3 4 5'
	option vid '8'

my gw:

# cat /etc/config/network 

config interface 'loopback'
	option device 'lo'
	option proto 'static'
	option ipaddr '127.0.0.1'
	option netmask '255.0.0.0'

config globals 'globals'
	option ula_prefix 'xxx::/48'

config device
	option name 'br-lan'
	option type 'bridge'
	list ports 'eth1'
	list ports 'lan1'
	list ports 'lan2'
	list ports 'lan3'

config interface 'lan'
	option device 'br-lan.8'
	option proto 'static'
	option ipaddr '10.0.8.1'
	option netmask '255.255.255.0'
	option ip6assign '60'

config device
	option name 'br-wan'
	option type 'bridge'
	list ports 'wan'
	list ports 'eth2'

config device
	option name 'wan'
	option macaddr '56:e6:75:xx:xx:xx'

config device
	option name 'eth2'
	option macaddr '56:e6:75:xx:xx:xx'

config interface 'wan'
	option device 'br-wan'
	option proto 'pppoe'
	option username 'xxx'
	option password 'xxx'
	option ipv6 'auto'
	option keepalive '0 1'

config interface 'wan6'
	option device 'br-wan'
	option proto 'dhcpv6'

config device 'iot_dev'
	option type 'bridge'
	option name 'br-iot'
	option ports 'br-lan.2'

config interface 'iot'
	option proto 'static'
	option device 'br-iot'
	option ipaddr '10.0.2.1'
	option netmask '255.255.255.0'

config device 'guest_dev'
	option type 'bridge'
	option name 'br-guest'
	option ports 'br-lan.4'

config interface 'guest'
	option proto 'static'
	option device 'br-guest'
	option ipaddr '10.0.4.1'
	option netmask '255.255.255.0'

config bridge-vlan
	option device 'br-lan'
	option vlan '2'
	list ports 'eth1:t'
	list ports 'lan1:t'

config bridge-vlan
	option device 'br-lan'
	option vlan '4'
	list ports 'eth1:t'
	list ports 'lan1:t'

config bridge-vlan
	option device 'br-lan'
	option vlan '8'
	list ports 'eth1:t'
	list ports 'lan1:t'
	list ports 'lan2:u*'
	list ports 'lan3:u*'

question two:

if i try to ping 10.0.4.1 (guest vlan) from my AP, then i can see the ARP who-has packets:

00:36:59.050754 c0:4a:00:xx:xx:xx > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 46: vlan 4, p 0, ethertype ARP (0x0806), Request who-has 10.0.4.1 tell 10.0.4.2, length 28

but no one responds to them. root cause is probably the same as in my first question.

if i delete the routes for the vlans and only leave the default one, then ping starts to work also for the vlan subnets. i guess they get through in a higher layer, routed through the default gw.

# ip route del 10.0.4.0/24
# ping 10.0.4.1
PING 10.0.4.1 (10.0.4.1): 56 data bytes
64 bytes from 10.0.4.1: seq=0 ttl=64 time=0.455 ms

ultimate goals

find a good intro material for linux networking that also contains the part of this puzzle that i'm missing.

find the right tools for debugging such issues myself.

and to run my AP in a very dumb mode, i.e. no routing on it at all. it's old hw, i'm hoping to squeeze out the most wifi performance from it.

There are lots of issues here. Let's start with the router:

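Delete this:

config device 'iot_dev'
	option type 'bridge'
	option name 'br-iot'
	option ports 'br-lan.2'

And this:

config device 'guest_dev'
	option type 'bridge'
	option name 'br-guest'
	option ports 'br-lan.4'
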
Edit the iot network to use br-lan.2:

config interface 'iot'
	option proto 'static'
	option device 'br-lan.2'
	option ipaddr '10.0.2.1'
	option netmask '255.255.255.0'

and similarly the guest network to use br-lan.4:

config interface 'guest'
	option proto 'static'
	option device 'br-lan.4'
	option ipaddr '10.0.4.1'
	option netmask '255.255.255.0'

Restart the router and then we will move on to the AP:

The iot network should be unmanaged:

config interface 'iot'
	option proto 'none'
	option device 'br-iot'

Same with guest:

config interface 'guest'
	option proto 'none'
	option device 'br-guest'

Delete the route:

config route
	option target '0.0.0.0/0'
	option gateway '10.0.8.1'

The vlan lines are wrong here: don't use 0, and don't repeat the same value. Each switch_vlan section needs its own index: 1, 2, 3, etc. I think you can actually enter the VLAN ID directly as the vlan value (and remove vid), but for now, let's do it this way:

config switch_vlan
	option device 'switch0'
	option vlan '1'
	option ports '0t 1t'
	option vid '2'

config switch_vlan
	option device 'switch0'
	option vlan '2'
	option ports '0t 1t'
	option vid '4'

config switch_vlan
	option device 'switch0'
	option vlan '3'
	option ports '0t 1t 2 3 4 5'
	option vid '8'

Reboot the AP after these changes are complete and test again.

wow, where is the heart response...?

thank you very much @psherman, you have fixed my issue!

i think the crux of it was that the vlan entries had those zeros. not sure how. i was copying the uci calls from the luci changes view. maybe i have copy-pasted them and forgot to properly edit the ids.

but now that it works, back to the methodology: i guess you just looked at the config and saw what's broken... but what could someone less experienced have done?

how would this error in my config manifest in some logs, or how could i have otherwise identified it? i looked at the output of:

  • brctl show
  • bridge vlan show
  • ip -json -pretty -details link show br-lan
  • at the luci pages of course

i tried to:

  • arping
  • tcpdump -n -e --interface eth0 | grep -v "vlan 8"

but i couldn't find the root of the problem with my newbie-to-networking eyes.

maybe we can extract something that i could add to the openwrt wiki that would help other newcomers...

Glad to hear it!

I am very familiar with VLANs on OpenWrt and in general. VLANs can be tricky to wrap your head around at a theoretical level, which is the primary thing that I think most people struggle with. But that said, VLANs on OpenWrt are well documented for both swconfig and DSA.

I think that swconfig via the GUI is much more straightforward than DSA is on the GUI. Specifically, on swconfig, one can add a VLAN and assign ports right from the web interface. This has the added benefit of helping to ensure that the physical ports are assigned as desired, since in the text-based configuration there can be ambiguity about the logical-to-physical port mapping.

Conversely, DSA is a bit more tricky to set up via the web interface, as it is necessary to enable bridge VLAN filtering under devices and then add the VLANs and port assignments accordingly. The most critical issue that comes up here is that most users will forget to change the device (in the lan interface) to br-lan.x, where x is the VLAN ID they have assigned for the lan. But one major benefit of DSA is that it is equally clear in the GUI and the text configs which physical ports are being assigned.
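
A check for the forgotten br-lan.x step can be sketched in a few lines of shell. This is only a sketch, not an OpenWrt tool: the function name check_bare_bridge is made up for illustration, and it assumes the usual /etc/config/network layout with one option per line.

```shell
#!/bin/sh
# check_bare_bridge FILE (hypothetical helper, not part of OpenWrt):
# flag interfaces attached to a bare bridge such as 'br-lan' when that
# bridge has bridge-vlan sections, where 'br-lan.<vid>' is expected.
check_bare_bridge() {
    awk -v q="'" '
        # strip single and double quotes from a UCI value
        function unq(s) { gsub("[\"" q "]", "", s); return s }
        # remember the type and name of the section we are in
        $1 == "config" { type = $2; name = unq($3); next }
        $1 == "option" && $2 == "device" {
            dev = unq($3)
            if (type == "bridge-vlan") filtered[dev] = 1
            else if (type == "interface") iface[name] = dev
        }
        END {
            for (n in iface)
                if (iface[n] in filtered)
                    printf "interface %s uses %s; expected %s.<vid>\n", n, iface[n], iface[n]
        }
    ' "$1"
}
# usage on a router: check_bare_bridge /etc/config/network
```

It prints nothing when every interface on a VLAN-filtered bridge already points at a br-lan.x style device.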

I stumbled across this issue as well.

Enlightenment came when I found both the DHCP query and the offer by tcpdumping the VLAN on my router and by wiresharking a mirror of the switch port connected to the AP, while the response no longer showed up in the tcpdumps on the AP.
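
That hop-by-hop comparison can be written down as a small capture plan. The sketch below only prints the tcpdump commands, one per hop; the helper name capture_plan is made up, the interface names (eth0, lan1, br-lan.x) are the ones from this thread, and it assumes the 802.1Q tag is visible in captures on the trunk ports.

```shell
#!/bin/sh
# capture_plan VID (hypothetical helper): print, not run, one tcpdump
# command per hop for that VLAN. Interface names are taken from this
# thread; adjust them for other setups.
capture_plan() {
    vid=$1
    # DHCP (ports 67/68) and ARP are enough to see whether a client can join
    filter="arp or port 67 or port 68"
    # the AP uplink and the gw trunk port carry tagged frames: match the tag first
    echo "AP: tcpdump -n -e -i eth0 'vlan $vid and ($filter)'"
    echo "gw: tcpdump -n -e -i lan1 'vlan $vid and ($filter)'"
    # the VLAN device on top of the gw bridge sees the same frames untagged
    echo "gw: tcpdump -n -e -i br-lan.$vid '$filter'"
}
capture_plan 4
```

The first hop where the request or the reply stops showing up is where to look at the config.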

As I have a dozen VLANs and a couple of APs, I decided to generate the synced config for network and wireless by script. Maybe it takes longer to write the script than to configure things manually, but once this is done, it is consistent.
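
A generator along those lines could start as small as this. It is only a sketch of the idea, not the script mentioned above: gen_bridge_vlans is a made-up name, it covers just the DSA bridge-vlan sections of the network config, and it tags every listed port.

```shell
#!/bin/sh
# gen_bridge_vlans BRIDGE "PORT PORT ..." VID... (hypothetical helper)
# Print one DSA bridge-vlan section per VID, tagging every listed trunk port.
gen_bridge_vlans() {
    bridge=$1; ports=$2; shift 2
    for vid in "$@"; do
        printf "config bridge-vlan\n"
        printf "\toption device '%s'\n" "$bridge"
        printf "\toption vlan '%s'\n" "$vid"
        # $ports is left unquoted on purpose so it splits into one port per word
        for p in $ports; do
            printf "\tlist ports '%s:t'\n" "$p"
        done
        printf "\n"
    done
}
# example: the iot and guest VLANs from this thread on the gw trunk ports
gen_bridge_vlans br-lan "eth1 lan1" 2 4
```

Because every section comes from the same VLAN list, each ID appears exactly once and cannot drift between devices.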

I also think that DSA broke the dynamic generation of VLANs by hostapd, at least when I compare it to the OpenWrt documentation and the plethora of forum threads on the case.

However, at least configuring the password per VLAN by hostapd.WPA_PSK still works dynamically, albeit only for VLANs properly generated in the network and wireless config.

FTR, i can confirm that it works: i can use just option vlan '1' and get rid of the vid options.

so, the conclusion is that my immediate problem has been solved (much appreciated @psherman !), but there's not much i can add to the wiki. i still don't know how i could have debugged this myself.

i suspect that the observability of openwrt needs some love (the part that converts openwrt's /etc/config/ stuff into the various config files of the components of the system). my config was basically illegal with the repeated zero vlan value, and there wasn't anything in the log about it.
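
for this one specific mistake, a check outside of openwrt's own tooling is easy to sketch. the following is a minimal sketch, not an existing openwrt command: the function name check_switch_vlans is made up, and it only looks for a vlan value that repeats across switch_vlan sections of the same switch.

```shell
#!/bin/sh
# check_switch_vlans FILE (hypothetical helper, not part of OpenWrt):
# report 'option vlan' values that repeat across switch_vlan sections
# of the same switch device.
check_switch_vlans() {
    awk -v q="'" '
        # strip single and double quotes from a UCI value
        function unq(s) { gsub("[\"" q "]", "", s); return s }
        # evaluate the section that just ended, then reset its state
        function check() {
            if (in_sv && vlan != "") {
                key = dev ":" vlan
                if (++seen[key] == 2) print "duplicate switch_vlan index: " key
            }
            dev = ""; vlan = ""
        }
        $1 == "config" { check(); in_sv = ($2 == "switch_vlan"); next }
        in_sv && $1 == "option" && $2 == "device" { dev = unq($3) }
        in_sv && $1 == "option" && $2 == "vlan" { vlan = unq($3) }
        END { check() }
    ' "$1"
}
# usage on the AP: check_switch_vlans /etc/config/network
```

run against my original AP config, with its three vlan '0' sections, this reports the repeated index once; against the fixed config it prints nothing.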
