ER-X-SFP: VLANs not working properly with kernel 5.4

Hey all,

I have built and started testing PR https://github.com/openwrt/openwrt/pull/2901 and I must say that I am excited to try out the features and stability of the new kernel and underlying changes to the codebase.

Unfortunately I am having problems with VLANs. In order to really be able to test the router under load, VLAN support is needed.

Here is a reduced and redacted version of my network config:

  • I have tried various combinations of config options auto '1' and force_link '1'
  • This is a freifunk setup using OLSR with multiple PtP links to neighbors (don't let the netmask 255.255.255.255 scare you).
  • This setup works with older firmware (using swconfig) on both the ER-X-SFP and many other devices (RB450g, WDR4900).
  • There are some extra bridges defined which were not needed with a switch that supported swconfig. The router itself does not need access to these VLANs, but the ability to switch packets between ports is very important. These are the interface sections with proto 'none'.

config interface 'mgmtlan'
	option proto 'static'
	option ifname 'eth0.5 eth1.5 eth2.5 eth3'
	option delegate '0'
	option netmask '255.255.255.0'
	option ipaddr '192.168.5.1'
	option type 'bridge'

config interface 'lan'
	option proto 'static'
	option netmask '255.255.255.0'
	option ip6assign '64'
	option ip6hint 'f'
	option ipaddr '192.168.0.1'
	option ifname 'eth0.10 eth1.10 eth2.10 eth4'
	option dns '8.8.8.8 1.1.1.1'
	option type 'bridge'

# I'm not sure how to set this properly in this situation.  I have also tried it
# with this section removed.
config device 'dhcp_dev'
	option name 'eth1.10'
	option macaddr 'f0:WW:XX:YY:ZZ:d7'

config interface 'transtor'
	option proto 'none'
	option type 'bridge'
	option auto '1'
	option ifname 'eth1.11'

config interface 'service1'
	option proto 'none'
	option type 'bridge'
	option auto '1'
	option ifname 'eth1.12 eth2.12'
	option force_link '1'

config interface 'mgmtVlan50'
	option proto 'none'
	option type 'bridge'
	option auto '1'
	option ifname 'eth0.50 eth2.50'
	option force_link '1'

config interface 'olsrVlan201'
	option proto 'static'
	option type 'bridge'
	option ifname 'eth0.201'
	option ipaddr '10.1.1.201'
	option netmask '255.255.255.255'
	option ip6assign '64'
	option ip6hint '1'
	option auto '1'
	option force_link '1'

config interface 'olsrVlan202'
	option proto 'static'
	option ifname 'eth0.202'
	option ipaddr '10.1.1.202'
	option netmask '255.255.255.255'
	option ip6assign '64'
	option ip6hint '2'
	option auto  '1'
	option force_link '1'

config interface 'olsrVlan203'
	option proto 'static'
	option ifname 'eth0.203'
	option ipaddr '10.1.1.204'
	option netmask '255.255.255.255'
	option ip6assign '64'
	option ip6hint '3'
	option auto '1'
	option force_link '1'

config interface 'olsrVlan204'
	option proto 'static'
	option ifname 'eth0.204'
	option ipaddr '10.1.1.205'
	option netmask '255.255.255.255'
	option ip6assign '64'
	option ip6hint '4'
	option auto '1'
	option force_link '1'

config interface 'olsrVlan205'
	option proto 'static'
	option ifname 'eth0.205'
	option ipaddr '10.1.1.205'
	option netmask '255.255.255.255'
	option ip6assign '64'
	option ip6hint '5'
	option auto '1'
	option force_link '1'

config interface 'olsrVlan206'
	option proto 'static'
	option ifname 'eth0.206'
	option ipaddr '10.1.1.206'
	option netmask '255.255.255.255'
	option ip6assign '64'
	option ip6hint '6'
	option auto '1'
	option force_link '1'

config interface 'meshlan'
	option proto 'static'
	option ifname 'eth1.211'
	option ipaddr '10.1.1.211'
	option netmask '255.255.255.255'
	option ip6assign '64'
	option ip6hint 'b'
	option auto '1'
	option force_link '1'

config interface 'dslMgmt300'
	option proto 'none'
	option type 'bridge'
	option auto '1'
	option ifname 'eth1.300 eth2.300'
	option force_link '1'

config interface 'dlsMgmt500'
	option proto 'none'
	option type 'bridge'
	option auto '1'
	option ifname 'eth1.500 eth2.500'
	option force_link '1'
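
For readers more used to CIDR notation: the 255.255.255.255 netmask on the OLSR interfaces above is a /32, i.e. one host address per point-to-point link. A small sketch of the conversion in plain shell follows; `mask_to_prefix` is a made-up helper name and not part of OpenWrt.

```shell
#!/bin/sh
# Hypothetical helper: convert a dotted netmask to a prefix length.
# Note: it does not check that the mask bits are contiguous.
# The body runs in a subshell so the IFS change does not leak out.
mask_to_prefix() (
    IFS=.
    # Split the argument on dots into the positional parameters.
    set -- $1
    [ "$#" -eq 4 ] || { echo "expected four octets" >&2; exit 1; }
    prefix=0
    for octet in "$@"; do
        case "$octet" in
            255) bits=8 ;;
            254) bits=7 ;;
            252) bits=6 ;;
            248) bits=5 ;;
            240) bits=4 ;;
            224) bits=3 ;;
            192) bits=2 ;;
            128) bits=1 ;;
            0)   bits=0 ;;
            *) echo "invalid netmask octet: $octet" >&2; exit 1 ;;
        esac
        prefix=$((prefix + bits))
    done
    echo "$prefix"
)

mask_to_prefix 255.255.255.255   # prints 32
mask_to_prefix 255.255.255.0     # prints 24
```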

If there are any specific log files or other /sys or /proc information which would be helpful, please let me know.

Thanks again to all the developers who have brought these latest changes into OpenWrt.

AFAIK DSA does not support this configuration; you have to use VLAN filtering.
Unfortunately, OpenWrt does not have a UI for this.

Using VLAN filtering can avoid the configuration of multiple bridges.
ping @Thirsty
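
As a rough sketch of what that could look like for the switch-only VLANs in the original config (the proto 'none' bridges), the helper below only prints `bridge vlan add` commands for a VID-to-ports map, so they can be reviewed before being piped to `sh`. The function name `emit_vlan_members` and the map format are made up for illustration. The VIDs and port names are copied from the config above, and the sketch assumes all ports are already members of one bridge with vlan_filtering enabled and working.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) the bridge commands that make each
# listed port a tagged member of a VLAN. Input lines look like "VID: port port".
emit_vlan_members() {
    while IFS=: read -r vid ports; do
        # Skip blank lines and comments.
        case "$vid" in ''|'#'*) continue ;; esac
        # Without "pvid" or "untagged" the port stays a tagged member.
        for port in $ports; do
            echo "bridge vlan add dev $port vid $vid"
        done
    done
}

# VIDs and ports copied from the proto 'none' bridges in the original post.
vlan_map='12: eth1 eth2
50: eth0 eth2
300: eth1 eth2
500: eth1 eth2'

printf '%s\n' "$vlan_map" | emit_vlan_members
```

Once the printed commands look right, the same pipeline can be extended with `| sh` to apply them. Nothing here touches the CPU side, which matches the point that the router itself does not need access to these VLANs.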

VLAN is a known issue, and I think DENQ is working on it.

IRC log:

< dengqf6> https://lore.kernel.org/netdev/E1jEB0y-0006iF-5g@rmk-PC.armlinux.org.uk/
< dengqf6> Rene__: Currently, set vlan_filtering to 1 immediately blocks all traffic in the bridge. With this patch, the 2 if statements removed, and "ds->vlan_bridge_vtu = 1;" in mt7530_setup(), it is fixed

Someone also mentioned it on the OpenWrt mailing list:
http://lists.infradead.org/pipermail/openwrt-devel/2020-April/022670.html

dengqf6 sent a patch to linux netdev to fix this issue: https://lore.kernel.org/netdev/20200414063408.4026-1-dqfext@gmail.com/T/#u

I did some basic testing: I added two VLANs on top of a normal network, pinged them, and removed one of the VLANs.
I had no issues.

@pmelange I added an extra branch to test this patch: https://github.com/vDorst/openwrt/commits/ramips-5.4%2Bfixes
It is only for testing!

I tested VLANs on top of a non-bridged interface (eth5) and on top of a bridged interface (br-lan).
Everything seems to work as expected.

I'm building the freifunk firmware now with your new branch. I look forward to testing it tomorrow.

How did you do your test? With uci settings, or as described in the post "Xiaomi Router 3G V2 - VLAN with DSA"? Or even better, could you post your setup?

Just using plain Linux tools.
They can be used in the same way on both sides of the network.

Example: adding VLAN 200 to br-lan

ip link add link br-lan name vlan200 type vlan id 200
ifconfig vlan200 10.10.200.1 netmask 255.255.255.0
ifconfig vlan200 up
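
On builds where `ifconfig` is not available, the same three steps can be done with `ip` alone. The sketch below only prints the commands for a given parent device and VLAN ID; `emit_vlan_iface` is a hypothetical helper name, and the 10.10.<vid>.1/24 addressing simply mirrors the example above (which is why the VLAN ID is limited to 1-255 here).

```shell
#!/bin/sh
# Hypothetical helper: print the iproute2 commands that create a VLAN
# interface on top of a parent device and give it 10.10.<vid>.1/24.
emit_vlan_iface() {
    parent=$1
    vid=$2
    # Reject anything that is not a plain decimal number.
    case "$vid" in
        ''|*[!0-9]*) echo "VLAN ID must be numeric" >&2; return 1 ;;
    esac
    # The VLAN ID is reused as the third octet, so it has to fit in one.
    if [ "$vid" -lt 1 ] || [ "$vid" -gt 255 ]; then
        echo "VLAN ID must be 1-255 for this addressing scheme" >&2
        return 1
    fi
    echo "ip link add link $parent name vlan$vid type vlan id $vid"
    echo "ip addr add 10.10.$vid.1/24 dev vlan$vid"
    echo "ip link set dev vlan$vid up"
}

emit_vlan_iface br-lan 200
```

Piping the output to `sh` applies it; /24 is the prefix-length form of the 255.255.255.0 netmask used above.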

Or with this config in /etc/config/network:

root@OpenWrt:/# cat /etc/config/network

config interface 'loopback'
        option ifname 'lo'
        option proto 'static'
        option ipaddr '127.0.0.1'
        option netmask '255.0.0.0'

config globals 'globals'
        option ula_prefix 'fdd3:5e2f:da83::/48'

config interface 'lan'
        option type 'bridge'
        option ifname 'eth0 eth1 eth2 eth3 eth4'
        option proto 'static'
        option ipaddr '192.168.1.1'
        option netmask '255.255.255.0'
        option ip6assign '60'

config device
        option type '8021q'
        option ifname 'br-lan'
        option vid '100'
        option name 'vlan100'

config interface 'vlan100'
        option ifname 'vlan100'
        option proto 'static'
        option ipaddr '10.10.100.1'
        option netmask '255.255.255.0'

config device
        option type '8021q'
        option ifname 'eth5'
        option vid '200'
        option name 'vlan200'

config interface 'vlan200'
        option ifname 'vlan200'
        option proto 'static'
        option ipaddr '10.10.200.1'
        option netmask '255.255.255.0'

config interface 'wan'
        option ifname 'eth5'
        option proto 'dhcp'

config interface 'wan6'
        option ifname 'eth5'
        option proto 'dhcpv6'

But turning on vlan_filtering on the bridge kills all traffic until a reboot:
echo "1" > /sys/class/net/br-lan/bridge/vlan_filtering
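
Before and after flipping that switch it can help to confirm what the bridge currently reports. A minimal sketch, assuming the sysfs path used in the command above; the function name `vlan_filtering_state` and its optional second argument (an alternate sysfs root, handy for dry runs) are illustrative only.

```shell
#!/bin/sh
# Hypothetical helper: report whether vlan_filtering is on for a bridge.
# $1 = bridge name, $2 = optional sysfs root (defaults to /sys).
vlan_filtering_state() {
    bridge=$1
    sysroot=${2:-/sys}
    file="$sysroot/class/net/$bridge/bridge/vlan_filtering"
    # A missing file means the device does not exist or is not a bridge.
    if [ ! -r "$file" ]; then
        echo "unknown"
        return 1
    fi
    case "$(cat "$file")" in
        1) echo "on" ;;
        0) echo "off" ;;
        *) echo "unknown"; return 1 ;;
    esac
}

# Example call; prints "unknown" if br-lan does not exist on this machine.
vlan_filtering_state br-lan || true
```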

You can use "@lan.200" as the interface too.

hi @Thirsty
one question:
did you try to bridge TAGGED and UNTAGGED ports?

for example:
the trunk is on WAN, say wan.200,
and I want to bridge it to, let's say, lan2.
It is common practice that a tagged trunk comes into the device and the switch separates the various VLANs onto untagged (access) ports.

After applying the patch you pointed to, VLANs are indeed working.
But when I try to make a bridge, devices behind OpenWrt (on the untagged port) get an IPv4/IPv6 address from the trunk and nothing more. I tried to ping: nothing. ARP shows nothing. I can see packets moving (rx/tx), but for some reason there is no communication.

config interface 'l2'
        option proto 'none'
        option type 'bridge'
        option ifname 'wan.200 lan2'

did you test this scenario?

Bridging untagged and tagged ports or bridging tagged ports with different VID does not work on DSA right now. You have to use VLAN filtering.

hi @LGA1150
thanks for the info. Yes, you are right. Last night, after a Wireshark session :), I came to the same conclusion. When a device behind the router tries to communicate, the packet gets properly tagged when leaving the router. But when the reply comes back, the bridge acts as a black hole and none of the replies leave the untagged (access) port.

well, at least the DengQ patch made a tagged WAN possible

Is that good practice? According to the Linux documentation, this appears to be managed with the bridge v command instead (rather than by creating a virtual interface on top of the bridge netdev), e.g. bridge v a dev br-lan vid 200 tagged self

Yes, but that seems to be only a small part of the setup.

# Enable vlan_filtering on the bridge
ip link set dev br-lan type bridge vlan_filtering 1

# Tell the bridge to tag vlan 100 on eth0
bridge vlan add dev eth0 vid 100
# Tell the bridge that we want vlan 100 on the CPU side
bridge vlan add dev br-lan self vid 100

# Create an interface for vlan 100 and assign an IP.
ID=100 && ip link add link br-lan name vlan${ID} type vlan id ${ID} && ifconfig vlan${ID} 10.10.${ID}.1 netmask 255.255.255.0 && ifconfig vlan${ID} up;

Now I can ping the remote IP on VLAN 100.

I updated my branch with an extra VLAN filter patch as well:

Let's see if I can make that example.

for example:
the trunk is on WAN, say wan.200,
and I want to bridge it to, let's say, lan2.
It is common practice that a tagged trunk comes into the device and the switch separates the various VLANs onto untagged (access) ports.

# Set vlan 200 tagged on wan
bridge vlan add dev wan vid 200

# Set vlan 200 untagged on lan2
bridge vlan add dev lan2 vid 200 pvid 200 untagged
# Also use it on the CPU side: add vlan 200 to the bridge itself
bridge vlan add dev br-lan self vid 200
ID=200 && ip link add link br-lan name vlan${ID} type vlan id ${ID} && ifconfig vlan${ID} 10.10.${ID}.1 netmask 255.255.255.0 && ifconfig vlan${ID} up;

Assigning an IP/mask to br-lan and managing VLAN tagging for that bridge netdev with bridge v should suffice. Adding a virtual interface on top of the bridge netdev and giving it a VLAN ID seems redundant; doesn't it add complexity that could cause issues?

As I understand it, the kernel still needs to know what to do with the vlan tagged packets.
By adding a vlan interface on top of br-lan, the packet will flow to that vlan interface.

hi @Thirsty
about the patch you linked here: 778-mt7530-vlan-filtering.patch

Which version of OpenWrt do you use? I tried with the latest git, but there is a compile error.

That is taken care of through switchdev (utilized by DSA) via the bridge netdev layer [1], as far as I understand from various documentation [2].

DSA directly utilizes SWITCHDEV when interfacing with the bridge layer, and more specifically with its VLAN filtering portion when configuring VLANs


[1] https://github.com/torvalds/linux/blob/master/Documentation/networking/switchdev.txt
[2] https://www.kernel.org/doc/html/latest/networking/dsa/dsa.html#switchdev

So you are saying that:

If I want eth2 connected untagged to VLAN ID 200, with VLAN 200 tagged on eth5,
I just create untagged VLAN 200 on eth2, assign an IP to eth2, and add VLAN 200 tagged to eth5?
Like this?

# Set bridge in vlan_filtering
ip link set dev br-lan type bridge vlan_filtering 1
# Remove vid 1 eth2
bridge vlan del dev eth2 vid 1
# Set eth2 untag vlan 200
bridge vlan add dev eth2 vid 200 pvid 200 untagged master
# configure eth2
ifconfig eth2 10.10.200.1 netmask 255.255.255.0
# Remove vid 1 eth5
bridge vlan del dev eth5 vid 1
# add tagged vlan 200 on eth5
bridge vlan add dev eth5 vid 200 tagged

With tcpdump running on eth2, I see:

20:33:08.662157 ARP, Request who-has 10.10.200.1 tell 10.10.200.2, length 42
20:33:09.686170 ARP, Request who-has 10.10.200.1 tell 10.10.200.2, length 42
20:33:10.710199 ARP, Request who-has 10.10.200.1 tell 10.10.200.2, length 42
20:33:11.734159 ARP, Request who-has 10.10.200.1 tell 10.10.200.2, length 42

But the device is not replying.

The remote host on eth5 VLAN 200 (10.10.200.2) can ping the remote host on eth2 (10.10.200.3).
Neither can ping the device.

You need all three patches: kernel: net: dsa: mt7530: vlan filtering., kernel: pending: net: dsa: mv88e6xxx: fix vlan setup, and kernel: backport: net: dsa: mt7530: fix tagged frames pass-through in…
I am using master ad19751edc as of today.