Link-Aggregation in LAN with VLANs

Ars3n · November 3, 2022, 8:56pm

Hi all,
I'm looking for advice on configuring link aggregation (bonding/port-channel) of 2 LAN ports on my OpenWrt router (Linksys WRT-3200ACM / OpenWrt 21.02.3 r16554-1d4dea6d4f) in the LuCI UI, and on routing multiple VLANs, each with its own network interface, through the port-channel.

I googled for quite some time and read multiple threads in this forum, but I'm just not able to wrap my head around this topic.

I installed these extra packages and I think I'm able to do all the necessary things in LuCI, but I'm not sure.
kmod-bonding 5.4.188-1
luci-proto-bonding git-21.222.28122-085bb7c
proto-bonding 2021-04-09-1

I'm trying to achieve something like this:

I would be very grateful if someone could help me with this topic.

Do I have to create a bond interface, enslave lan3 and lan4, and create 802.1q devices with the bond interface as the base?
And then create network interfaces (each with its own IP network) based on the 802.1q devices?

trendy · November 3, 2022, 10:41pm

That seems like quite a reasonable approach to me.

Ars3n · November 4, 2022, 7:16pm

Sadly it's not working for me, or I'm too dumb to configure it right.
Do you know if there are any other modules I need to install for it to work?

psherman · November 4, 2022, 7:24pm

I haven't ever attempted a LAG on OpenWrt, but my hunch is that you won't get any significant speed benefit from it unless you have a lot of inter-VLAN traffic that is routed. Your WAN is likely 1Gbps max, and therefore you'll never really benefit from multiple connections for the purposes of standard WAN traffic.

However, you may find this useful if you either require redundancy or if you have lots of inter-VLAN traffic that could cause the link to saturate. In the latter case, you may be limited by the router... but if the router's switch architecture actually has multiple internal 1G links, you could simplify by splitting VLANs across multiple ports/links on your router and switch.

IMO, the use case for LAG as you've drawn it is a) redundancy, or b) educational; I think speed/performance will not improve (at least that is my hypothesis). But let us know what gains you experience once you do implement it.

slh · November 4, 2022, 8:38pm

Most traditional routers just don't have enough ethernet ports (5+1, with additional constraints coming from the router's CPU ports!), making this rarely worthwhile (I even find it hard to come up with a use case for LAG on 8-port managed switches; above that, sure).

Ars3n · November 5, 2022, 10:09pm

I've got it to work with the packages I mentioned in the original post, solely using LuCI.
But I found this forum thread very helpful while doing low-level debugging, especially the answer from @mj5030. Thanks to you, mate.
I just redid the same steps I did before in the UI. Seems like there was a hiccup in OpenWrt on my first try.

UI step-by-step guide for other people with the same problem:

Install the packages kmod-bonding 5.4.188-1, luci-proto-bonding git-21.222.28122-085bb7c, proto-bonding 2021-04-09-1 => two devices called "bond0" and "bonding_masters" should be created automatically
Create a new interface using the protocol link-aggregation (channel-bonding)
Configure the interface with an IP and subnet mask (I never used them, but they are necessary for the interface creation) (I would recommend holding off on pressing save, because I have a feeling that's where my hiccup was)
Go to advanced settings
Check force link
Choose the interfaces/devices you want to bond (enslave) (e.g. lan3 and lan4)
Save; save and apply
Go to devices, and there you should find a device called "bonding-"
Create new 802.1q devices as your VLAN devices, with the bond device as their base device
Create new interfaces for your VLAN devices
Do your typical config stuff for interfaces
Done

If the bond-device config you created in /etc/config/network contains just a name option and nothing else, give it a second try.
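For readers who prefer to compare against /etc/config/network directly, the LuCI steps above might end up as something like the fragment printed by this small sketch. It is not taken from a working router: the option names `slaves` and `bonding_policy`, the `802.3ad` policy, the `bonding-<interface>` device name (guessed from the "bonding-" prefix mentioned above), the VLAN ID and all IP addresses are assumptions for illustration only.

```shell
#!/bin/sh
# Sketch only: prints a candidate /etc/config/network fragment to stdout.
# ASSUMPTIONS (not confirmed by the thread): proto-bonding option names
# 'slaves' and 'bonding_policy', the 'bonding-<interface>' device name,
# the 802.3ad policy, and every address below.

print_bond_vlan_config() {
    bond_if="$1"    # logical interface name of the bond, e.g. bond
    vid="$2"        # 802.1q VLAN ID, e.g. 10
    vlan_ip="$3"    # router address inside that VLAN
    cat <<EOF
config interface '${bond_if}'
    option proto 'bonding'
    option ipaddr '192.168.99.1'
    option netmask '255.255.255.0'
    option bonding_policy '802.3ad'
    option force_link '1'
    list slaves 'lan3'
    list slaves 'lan4'

config device
    option type '8021q'
    option ifname 'bonding-${bond_if}'
    option vid '${vid}'
    option name 'bonding-${bond_if}.${vid}'

config interface 'vlan${vid}'
    option proto 'static'
    option device 'bonding-${bond_if}.${vid}'
    option ipaddr '${vlan_ip}'
    option netmask '255.255.255.0'
EOF
}

# Example: one bond named "bond" carrying VLAN 10.
print_bond_vlan_config bond 10 192.168.10.1
```

The output is meant to be read and merged by hand, not piped into the config file; one `config device` plus one `config interface` block would be repeated per VLAN.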

psherman · November 5, 2022, 10:11pm

Now that you have it physically working, can you comment on the practical benefit of doing this? Have you measured performance (I can't imagine it would be any different)? Or are you doing this for redundancy? Or is this purely for learning?

Ars3n · November 5, 2022, 10:16pm

Redundancy.
Once it's working, it's easier for me to configure, because I don't have to think about which VLAN should be on which port (tl;dr: it fits my network architecture pretty well, in my opinion).
I'm doing a lot of inter-VLAN routing (I didn't test the performance benefits, and I'm not planning to).

psherman · November 5, 2022, 10:32pm

I get this point... redundancy can be a good thing if you worry about a single-point-of-failure in your cables (probably not the most likely thing to fail, but nonetheless can be useful).

I don't understand your second point...

VLANs on a LAG operate the same way as they do on a single physical link, except that you are using 2 or more physical cables to make the trunk. They fundamentally carry the same data, though... so one link or many, it's just a difference in the physical medium. You could just as easily have your VLANs on a single physical link.

Can you elaborate more on why a LAG is better than a single link WRT VLANs?

I highly doubt you will see any performance benefit at all here, unless your router actually has a higher bandwidth on the internal switch-to-CPU connection. Most devices have 1 or at most 2 internal gigabit links between the built-in switch and the CPU... one of them is often dedicated to the WAN, the other (if present) would carry your LAN/VLAN traffic. So you wouldn't gain any performance benefit.

I'm not sure why you're not planning to do any inter-VLAN routing performance measurements after asserting that this is a reason for you to use a LAG in the first place. Wouldn't it be good to know if it is actually helping (or worse, what about if it is hurting performance)?

juanitu · April 23, 2023, 10:31am

Good morning,
I am reconfiguring my network due to a house move. Previously I had my old rt-ac88u configured as switch + AP only, with VLANs on the new DSA architecture, and I managed to get everything working.
Due to the location of the equipment in the new apartment, I will need to connect more devices, including a NAS with services for the internal network and the internet, and I would like to configure two ports with LACP (as a trunk to the router, on which I want to configure the two remaining ports as another LAG for devices in another part of the apartment).
Following the instructions of @Ars3n, I have configured LACP as static on two ports of my router (I have not been able to make it work as active or passive) and on two ports of the rt-ac88u running OpenWrt. The router shows the LAG as up and both devices can ping each other.
I have also created several VLANs on top of the LAG following your instructions, but I cannot correctly configure the VLANs on the rest of the ports so that the traffic goes through the trunk.
I have tried adding both the LAG interface and the VLANs to the LAN bridge (both show up at 2 Gbit/s) and tagging them, but it doesn't work.
My technical knowledge is limited, can someone help me?
Thank you very much in advance.

dannutu · October 12, 2023, 1:31am

Hi all

I'd like to configure LACP on 2x 1Gbps links between a 20 port 1Gbps smart managed switch that supports LACP and a NanoPi R4S running OpenWrt 22.03.5 used only as a router/firewall (no WiFi). The reason I'm looking to configure LACP is that I have multiple VLANs configured on the switch and a lot of inter-VLAN traffic filtering (multiple sources to multiple destinations) via ACLs on the R4S router.

The R4S has 2 separate built-in 1Gbps Ethernet interfaces (eth0 and eth1) to be used for the LAN LACP, and also a USB3.0 1Gbps Ethernet adapter (eth2) for the WAN connection.

Hardware performance is not an issue, as the 6-core R4S is not CPU bound for the 2 built-in 1Gbps Ethernet interfaces (eth0 and eth1), and the USB3.0 Ethernet adapter (eth2) is never going to get anywhere close to its 1Gbps, as the WAN connection is 250Mbit down / 25Mbit up.

I am OK with using the CLI to configure the bond interface or anything else, but I am looking for up-to-date steps for the current OpenWrt version 22.03.5. Installing the luci-proto-bonding package seems useless on 22.03.5, as no new menu entry appeared in LuCI (even after a reboot). After installing all of kmod-bonding, proto-bonding, luci-proto-bonding, git and mii-tool, no devices called "bond0" and "bonding_masters" were created automatically, so I'm not sure what I could do to make them show up.

Could anyone please point me to an already existing post describing the steps working on 22.03, or maybe describe the steps here if it's not too much trouble? Cheers!

_bernd · October 15, 2023, 9:07am

As a first step, try to configure it manually, following this brief explanation: http://www.uni-koeln.de/~pbogusze/posts/LACP_configuration_using_iproute2.html

If you have not done so already, install the ip-full package to get a complete iproute2, along with the related bonding modules. If that works, have another look at how to use UCI/LuCI.
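Before touching any configuration, it can help to confirm that the bonding driver is actually loaded. A minimal sketch, assuming only that the driver exposes `/sys/class/net/bonding_masters` once loaded (the function name is made up for this example, and the optional argument exists only so the check can be pointed at another directory for testing):

```shell
#!/bin/sh
# Returns success if the bonding driver's sysfs control file is present.
# $1 (optional): sysfs net directory, defaults to /sys/class/net.
bonding_available() {
    sysfs_net="${1:-/sys/class/net}"
    [ -e "${sysfs_net}/bonding_masters" ]
}

if bonding_available; then
    # bonding_masters lists the bond devices that currently exist, e.g. bond0
    echo "bonding driver loaded; bonds: $(cat /sys/class/net/bonding_masters)"
else
    echo "bonding driver not loaded (are kmod-bonding and ip-full installed?)"
fi
```

If the file is missing, installing the packages named in this thread (for example `opkg update && opkg install kmod-bonding ip-full`) and loading the module would be the next thing to try.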

dannutu · October 15, 2023, 10:31am

Thanks a lot for your help. I had a little bit of time yesterday and I figured I was missing the ip-full package, I had only the "lite" built-in one. Once I installed ip-full I could at least see the bond0 (linux) interface. But then, once I configured the 2 onboard gigabit Ethernet interfaces as slaves in bond0 and I set br-lan to (only) bond0 (as I have no other Ethernet ports, except the USB3.0 adapter for WAN) I promptly lost access to the R4S (and yes, the corresponding ports on the managed switch were configured for 802.3ad LACP).

Unfortunately my hardware (case) "mod" to add the USB2.0 and serial connectors to the R4S went only ~86% right, in the sense that the 4 wires for USB2.0 are OK and the GND and TX wires for the serial are also working, but not the RX; it probably got disconnected while I closed the case (the fact that the case doesn't accommodate the DuPont connectors is very, very annoying: I had to replace the plastic casing with some shrink tubing and then bend the connectors at around 45 degrees, which explains why the RX was probably disconnected). This means that I can see the console messages but I cannot type anything in the shell. That in turn means that for each bond configuration change I try, I have to wait some 90 seconds for LuCI to revert it... this got annoying very quickly.

Since then I repurposed the USB3.0 Ethernet adapter as LAN instead of WAN so I can stay connected, and I also dug up an extra USB TTL serial adapter so I can set up a fully working serial console, too, just in case. I hope to have some more time this evening to have another go at fixing the bonding, I've done some reading this morning on how to configure and troubleshoot it. For now I can't figure out what's wrong, it should be so simple - config the switch ports for 802.3ad LACP, load the bonding module with "modprobe bonding mode=802.3ad miimon=100", ifconfig it up, add the underlying slave eth interfaces and basta. But it just doesn't want to work, at least for now. I can see LACP v1 packets with tcpdump but I can't tell which side sends them (the switch, the router or both).
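One way to tell which side is sending the LACPDUs is to let tcpdump print the Ethernet header (`-e`) and compare the source MAC with the router's own port MAC. The helper below is only a sketch: it assumes the usual `tcpdump -e` line layout of `<time> <src-mac> > <dst-mac>, ethertype ...`, and the function name is made up for this example.

```shell
#!/bin/sh
# Classify one line of `tcpdump -e` output as sent by this host or by the peer.
# $1: MAC address of the local port (any letter case)
# $2: one line of tcpdump output
# ASSUMPTION: the second whitespace-separated field is the source MAC.
lacpdu_sender() {
    local_mac="$(printf '%s' "$1" | tr 'A-F' 'a-f')"
    src_mac="$(printf '%s\n' "$2" | awk '{print tolower($2)}')"
    if [ "$src_mac" = "$local_mac" ]; then
        echo local
    else
        echo peer
    fi
}

# Usage on the router (needs root and tcpdump, so it is left commented out).
# 0x8809 is the Slow Protocols EtherType that LACP uses; for the local MAC,
# the "Permanent HW addr" line in /proc/net/bonding/bond0 is a reasonable source.
#   tcpdump -e -n -i eth0 ether proto 0x8809 | while read -r line; do
#       lacpdu_sender "<MAC of eth0>" "$line"
#   done
```

If only "peer" lines show up, the switch is talking LACP but the router is not answering, which would point at the bond not running in 802.3ad mode.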

_bernd · October 15, 2023, 10:36am

What switch do you use? Look up the vendor docs for the LACP-related settings and their defaults and mimic them on Linux, and/or try not to customize the bond on Linux and stick with the defaults. Yeah, trial and error...
But the best approach is to cross-check with the switch vendor docs about their LACP defaults... Happy hacking!

dannutu · October 15, 2023, 10:51am

D-Link DGS-1210-16. I've got it since it also runs OpenWrt - in case I get fed up with the original firmware. But its manual states: "Extensive Layer 2 Features. Implemented as complete L2 devices, these switches include functions such as IGMP snooping, port mirroring, Spanning Tree, 802.3ad LACP and Loopback Detection to enhance performance and network resiliency."

I've had a quick glance at the article you linked to above - it mentions "ip link add dev bond0 type bond is recommended to configure bonding using iproute2". Maybe that's it, as I was using ifconfig, as I found in some posts on the OpenWrt forum. I'm very tempted to try now, but I have some serious house matters to take care of and I'm afraid that if I start now with this it will become a time sink / rabbit hole for the day.

The switch manual lists very few parameters for link aggregation - essentially just group id, ports in each group, static or LACP, active/passive and short/long timeout, and the defaults are all sane and compatible with the default linux bonding module config.

The switch also has a CLI, but I don't think it is as advanced as the IOS one, so it probably has no equivalent of "sh lacp neighbor".

dannutu · October 15, 2023, 11:22am

Ok, I had a quick run at it, trying with eth0 only (while connected on br-lan via eth1).

"ip link add dev bond0 type bond" wasn't required, as the bond0 (linux) interface showed up right after "modprobe bonding miimon=100 mode=802.3ad lacp_rate=slow" (beware the typo in the article, it's miimon not mmiimon)

Then

ip link set dev eth0 down
ip link set dev bond0 down

ip link set dev eth0 master bond0

ip link set dev eth0 up
ip link set dev bond0 up

ip link set bond0 up

all went without any errors (although I'm not sure if the last command was actually required, or did anything)

"ifenslave bond0 eth0" obviously didn't work, as there is no ifenslave executable.

And, finally, /proc/net/bonding/bond0 is completely empty.

Any ideas?

dannutu · October 15, 2023, 12:10pm

Content of /proc/net/bonding/bond0:

Ethernet Channel Bonding Driver: v5.10.176

Bonding Mode: load balancing (round-robin)
MII Status: up
MII Polling Interval (ms): 0
Up Delay (ms): 0
Down Delay (ms): 0
Peer Notification Delay (ms): 0

Slave Interface: eth0
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: <MAC of eth0>
Slave queue ID: 0

/sys/class/net/bond0/statistics/* files show packets being both sent and received, with no errors of any type. No idea where to go from here.

LE: I am wondering about "Bonding Mode: load balancing (round-robin)": shouldn't this be 802.3ad because of "modprobe bonding miimon=100 mode=802.3ad lacp_rate=slow"?

LE2: yes, that is indeed a problem, I'm looking at the relevant kernel source code file https://github.com/torvalds/linux/blob/master/drivers/net/bonding/bond_procfs.c and the bonding module is definitely NOT running in 802.3ad mode, despite the mode=802.3ad argument. I've also tried with mode=4 and I got exactly the same behaviour. I wonder if the modprobe version on openwrt (/sbin/modprobe, symlink to /sbin/kmodloader) passes the cmdline args to the bonding module... its Usage: doesn't mention any arguments:

Usage:
        modprobe [-q] [-v] filename
        modprobe -a [-q] [-v] filename [filename...]
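A quick way to check which mode the driver actually ended up in, without reading the whole status file by eye, is to pull out the "Bonding Mode:" line. This is a small sketch with made-up function names; it relies only on the status file format shown in this thread.

```shell
#!/bin/sh
# Print the value of the "Bonding Mode:" line from a bonding status file.
# $1 (optional): status file, defaults to /proc/net/bonding/bond0.
bond_mode() {
    status_file="${1:-/proc/net/bonding/bond0}"
    sed -n 's/^Bonding Mode: //p' "$status_file"
}

# Succeeds only if the bond reports IEEE 802.3ad (LACP) mode.
is_lacp() {
    case "$(bond_mode "$1")" in
        "IEEE 802.3ad"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Only query the live bond if it exists on this machine.
if [ -r /proc/net/bonding/bond0 ]; then
    if is_lacp; then
        echo "bond0 is in 802.3ad mode"
    else
        echo "bond0 is NOT in 802.3ad mode: $(bond_mode)"
    fi
fi
```

With the status output posted above, `bond_mode` would print "load balancing (round-robin)" and `is_lacp` would fail, which matches the suspicion that the mode argument never reached the module.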

_bernd · October 15, 2023, 1:23pm

Chapter 3.1

As I said, you have to get the parameters right. Both sides obviously have to use the same settings. You can set everything. Mode 4 is LACP. Leave out the timers.
I'm currently AFK and cannot try it on my setup, sorry.

dannutu · October 15, 2023, 1:33pm

I think I got this, I was right just above when I wrote "I wonder if the modprobe version on openwrt (/sbin/modprobe, symlink to /sbin/kmodloader) passes the cmdline args to the bonding module... its Usage: doesn't mention any arguments" - it actually does NOT pass any arguments!

A quick search surfaced this 6.5 year old post: https://forum.openwrt.org/t/pass-parameters-to-kernel-module/5156

Used insmod with exactly the same arguments and voila! /proc/net/bonding/bond0 is now showing

Bonding Mode: IEEE 802.3ad Dynamic link aggregation

Can I just type a big fat DUHHHH! here?

dannutu · October 15, 2023, 3:57pm

Alright, so link aggregation is now working great, I've put the commands in a small script called from rc.local so that bond0 is automatically recreated at boot time, and I tested it and it does its job with no issues. Pheeewww
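The script itself was not posted, so the following is only a guess at what such a boot-time helper could look like, built from the commands that appear earlier in this thread (insmod with the module parameters, then the `ip link` sequence). The `run` wrapper and the `DRY_RUN` switch are additions for this sketch: by default it only prints the commands so they can be reviewed before being executed as root.

```shell
#!/bin/sh
# Sketch of a bonding helper that could be called from /etc/rc.local.
# DRY_RUN=1 (default here) prints the commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1: bond device, remaining arguments: slave interfaces
bond_up() {
    bond="$1"
    shift
    # insmod is used because OpenWrt's modprobe does not pass module parameters
    run insmod bonding mode=802.3ad miimon=100 lacp_rate=slow
    run ip link set dev "$bond" down
    for slave in "$@"; do
        run ip link set dev "$slave" down
        run ip link set dev "$slave" master "$bond"
        run ip link set dev "$slave" up
    done
    run ip link set dev "$bond" up
}

bond_up bond0 eth0 eth1
```

Running it with `DRY_RUN=0` would execute the commands for real; doing that over a connection that runs through eth0 or eth1 will most likely cut the session, so a serial console or a separate management port is advisable.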

I'm now moving back to tackle the subject of this thread, i.e. "Link-Aggregation in LAN with VLANs"

I am a bit ashamed to admit this, but I still can't figure out whether I should create the 802.1q interfaces using bond0 or br-lan as the base interface. I've done a fair amount of reading on this, but the move to DSA makes it difficult for me to figure out which old posts are too old to still be relevant and which old posts are not so old that they are irrelevant, if you can follow what I'm trying to say here. I can't even figure out if this device (R4S) is using DSA - I reckon it doesn't, since it doesn't have a built-in switch, just a few separate Ethernet interfaces, each of them connected independently to the CPU (showing up as eth0, eth1 and eth2).

I was able to create br-lan.1 and configure it to work with the tagged ports set for LACP and just 1 other port on the switch but before going any further creating more VLANs I'd really like to understand if I should also create the rest based on br-lan (it seems counterintuitive to me, as the whole point is to separate them from the LAN), or based on bond0 (this feels more intuitive to me, however I tried and it didn't work, had to revert), or based on something else (I don't think so).

Essentially I'm struggling to paint a mental picture of what device/interface "sits on top" (or "should sit on top") of what other device/interface, in the OSI layer sense. I reckon eth0 and eth1 are at the lowest level, bond0 "unifies" them, then br-lan comes over bond0 and br-lan.1 does the VLAN thing, all at layer 2. Frankly I feel br-lan is kinda useless here, since it only "bridges" bond0 (and no other interface in my scenario), but I guess I'm wrong. Then LAN is layer 3 with an IP address, using br-lan.1. I reckon my question comes down to: do I really need br-lan in this setup to be able to create VLANs, or can I get away without it?
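As a point of comparison for the layering question, plain Linux allows 802.1q devices to be stacked directly on a bond, giving eth0 + eth1 -> bond0 -> bond0.<vid>, with the IP addresses on the VLAN devices and no bridge involved. The sketch below only prints the iproute2 commands for that layout; the VLAN IDs are placeholders, the function name is made up, and it assumes bond0 is not also a member of br-lan. Whether this is the right layout under OpenWrt's netifd on the R4S is not confirmed anywhere in this thread.

```shell
#!/bin/sh
# Print (do not run) iproute2 commands that create VLAN devices on top of a bond.
# $1: bond device, remaining arguments: VLAN IDs (placeholders in this example)
vlan_on_bond_cmds() {
    bond="$1"
    shift
    for vid in "$@"; do
        echo "ip link add link ${bond} name ${bond}.${vid} type vlan id ${vid}"
        echo "ip link set dev ${bond}.${vid} up"
    done
}

vlan_on_bond_cmds bond0 10 20
```

A bridge would only come into play if several ports had to share one layer 2 segment; with a single bond as the only LAN-side link, each `bond0.<vid>` device could carry its own subnet directly.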