Link aggregation from managed switch

I asked this in another post, but that may have been confusing since that thread also covered a different question.

I have a QNAP QSW-M408S managed 1GbE/10GbE switch. I run OpenWrt on an Odroid H2 with a netcard and wind up with 2x 1GbE and 4x 2.5GbE interfaces. The switch isn't going to negotiate 2.5GbE, so I want to set up link aggregation on 2 or 3 of the LAN-side interfaces.

I tried setting up a link aggregation group of 3x 1GbE ports on the switch and bridging 3 ports to 1 LAN interface of the H2. After a few minutes the connection would drop, and I'm not quite sure why. Is this how it should be set up, or is there something else I should be doing with the LAN ports of the H2 in OpenWrt?

Bridging is not the same as aggregation. You probably created a loop in your network, where the same packets traveled back and forth until the link saturated.

As far as I know, UCI does not support link aggregation; you will have to write your own scripts to configure the connection.


Are you saying that bonding needs to be done at both the switch and in OpenWrt? I've done this before using a dumb switch and a bonded interface with OpenMediaVault. This time I was trying to bond at the switch (link aggregation) without having to do it in OpenWrt, but I'm not sure how the interfaces would be set up in OpenWrt.

I see there are packages available for bonding: kmod-bonding, luci-proto-bonding, and proto-bonding. I assume these would do it. The problem is that the packages available in the repo require kernel 5.4, while I'm using a custom snapshot with kernel 5.10 since my RTL8125 NICs aren't supported in the stock images. Any way around this?

Install the latest snapshot, then install the packages immediately afterwards.


The RTL8125 does not work with the latest snapshot.

Bonding requires a managed (at least smart-managed/L2) switch.

Your OpenWrt router 'can' be used for this, but that's hardly a good idea for most devices (unless you're exclusively interested in speeding up a 1:1 link between two devices). For bonding to work well, you need enough bandwidth coming in and going out; most OpenWrt devices only have 4+1 ports, CPU ports limited to 1 GBit/s, and not enough resources to sustain even that. So in practice you really want a managed switch for bonding.


Thanks for the response, but as explained above, I do have a managed switch: a QSW-M408S. I don't know anything about the layer levels, but it supports VLANs and link aggregation, both LACP and static.

I don't know what "most OpenWrt devices" are; I assume a consumer router. I am not using a consumer router with OpenWrt, I am using an Odroid H2 x86_64 single-board computer, specifically a J4105 Celeron with 8 GB RAM. It has 2 onboard 1GbE NICs and a PCIe Gen2 x4 NVMe slot that can also take a "netcard" with 4x 2.5GbE NICs, each on its own PCIe lane. I have no doubt that this setup can switch beyond the capacity I need it to.

To reiterate, the problem is that the modem has a 2.5GbE port connected to the 2.5GbE WAN port on my OpenWrt device, while my switch only supports 1GbE/10GbE, not 2.5GbE. My only option to remove the bottleneck on the LAN side is to bond several of the OpenWrt NICs at 1GbE each. So the questions are:

If I am to set up a LAG only at the switch, how do I set up the interface for the multiple NICs in OpenWrt? It's suggested above that it also needs to be bonded on the OpenWrt device; OK, then I have another issue. Current snapshots do not support the RTL8125 NICs, so I needed to use a custom snapshot (found in a thread) whose kernel version mismatches what opkg expects, and I cannot install the bonding kernel modules. I don't really know how to build OpenWrt and would rather not unless I have to.


If you cannot create an image with both the drivers your hardware requires and bonding support, I do not know how to do this.


Yes, you'll have to configure the managed switch AND the Linux box it is talking to with compatible parameters (e.g. LACP/802.3ad; see your man pages for details). There are two ways to do it in Linux: the kernel bonding driver, or the kernel team driver plus the teamd userspace daemon. I don't know if OpenWrt supports the team approach, though.

To get the aggregated bandwidth fully used, you'll need multiple simultaneous parallel connections to the box.
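
For reference, a minimal sketch of the kernel-bonding-driver route with iproute2 on a generic Linux box; the interface names and the address are placeholders, and the matching LACP LAG must already be configured on the switch:

# create an 802.3ad (LACP) bond; miimon polls link state every 100 ms
ip link add bond0 type bond mode 802.3ad miimon 100
# slaves must be down before they can be enslaved
ip link set eth1 down
ip link set eth2 down
ip link set eth1 master bond0
ip link set eth2 master bond0
ip link set bond0 up
ip addr add 192.168.2.1/24 dev bond0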


OK, this was tricky, but I think I got it working. The fella that posted the OpenWrt image for 19.07.7 re-uploaded a working image, so I moved to that. The kmod-bonding package threw errors about the kernel version, which I don't understand since it was the same kernel, but --force-depends got it going. So FYI, I read up on the forum and wound up with the following:

/etc/rc.local

# Put your custom commands here that should be executed once
# the system init finished. By default this file does nothing.

# load the bonding driver; note balance-rr is mode 0 (round-robin),
# not LACP; a switch LAG configured for LACP expects 802.3ad (mode 4)
modprobe bonding mode=balance-rr miimon=100
# give the bond an address and bring it up
ifconfig bond0 192.168.2.1 netmask 255.255.255.0 up
# enslave the NICs (they must be down for the enslaving to succeed)
ip link set eth1 master bond0
ip link set eth2 master bond0
ip link set eth3 master bond0

exit 0
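
To sanity-check that the NICs actually attached to the bond, the bonding driver's state can be read from procfs; it shows the active mode and the MII status of each slave:

cat /proc/net/bonding/bond0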

in /etc/config/network


config interface 'loopback'
        option ifname 'lo'
        option proto 'static'
        option ipaddr '127.0.0.1'
        option netmask '255.0.0.0'

config globals 'globals'
        option ula_prefix 'fd9e:4b25:870b::/48'

config interface 'bond0'
        # note: in UCI, repeated 'option ifname' lines override each other,
        # so only the last one ('bond0.3') actually takes effect here
        option ifname 'bond0.1'
        option ifname 'bond0.2'
        option ifname 'bond0.3'

config interface 'lan'
        option type 'bridge'
        option ifname 'bond0'
        option proto 'static'
        option ipaddr '192.168.2.1'
        option netmask '255.255.255.0'
        option ip6assign '60'
        list dns '208.67.222.222'
        list dns '208.67.220.220'

config interface 'wan'
        option ifname 'eth0'
        option proto 'dhcp'

config interface 'wan6'
        option ifname 'eth0'
        option proto 'dhcpv6'

This bonds the LAN interface to 3 NICs. They're 2.5GbE NICs, but the switch only supports 1GbE on each. A LAG was set on the switch using LACP for the 3 ports. I have no idea how to test this; AFAIK I'll still see a max of 1GbE on a single transfer. I ran iperf3 from the desktop, which is connected at 10GbE to the switch, and saw 1GbE to OpenWrt as expected. I also don't know if round-robin is the best mode to choose. If anyone has any tuning tips or a test I can run...

Also, does anyone know how I can get an ipk for luci-proto-bonding on 19.07.7? (besides building it myself lol)


First things first, what do you mean by "The switch isn't going to negotiate 2.5gb"?

Of course it won't do that on the 1Gb ports, but you have 4 SFP+ ports with 10Gbit of bandwidth each; what about those?

The SFP+ slots per se don't force you to run at 10Gbit: the transceiver module you install in them has a 10Gbit connection to the switch, but it can then negotiate whatever speed it wants on the wire.

Have you already tried buying an SFP+ module for your switch, like for example this one (unaffiliated, just a random one that actually lists some specs): https://www.amazon.com/QSFPTEK-Transceiver-10GBASE-T-SFP-10G-T-S-UF-RJ45-10G/dp/B07QXNQTXG/ and confirmed that you can't just plug in a cable that goes to one of your 2.5Gbit ports?
Here is another one, from Mikrotik (a more well-known brand), on another website: https://www.roc-noc.com/mikrotik/routerboard/SplusRJ10.html Same story, it can autonegotiate whatever it feels like on the wire.

So as long as you buy a module that says it can autoneg down to 2.5Gbit, you should be fine.

Getting a module and connecting a wire is the "recommended", "best performance", and "smooth-brain-proof" choice. It should work as long as your switch does not require special transceiver modules due to stupid vendor lock-in limitations, like Cisco for example.

I mean, yeah, it will cost you half of what you paid for the Odroid, but hey, it could have been worse: you could have been stuck with bonding.



Usual bonding disclaimer:
Bonding, especially done in software like this, is not what most people think it is: you join the total bandwidth of the links, but the max bandwidth per connection stays the same (you can just make more connections). So for example if you bond two 1Gbit links, two applications can run at the same time, each using 1Gbit without hurting the other, but each single connection can still only get 1Gbit, since a flow is kept on one physical link (splitting it across links would cause packet reordering that TCP handles poorly).

This is also what you saw in your iperf3 test. The max speed per connection is still 1Gbit because, surprise surprise, that's the speed of the ports you are bonding.

Some business switches may or may not break spec and actually pull off "true" bonding the way people imagine it, but that's not standard and won't work unless both switches are the same brand, which is very much not the case here, so let's not get into that.


For the hardware support:
https://bugzilla.kernel.org/show_bug.cgi?id=208361
Support for that driver was merged in upstream Linux kernel 5.9; the OpenWrt snapshot is still on 5.4, with kernel 5.10 in testing. Good luck waiting on that.

Since your device's hardware specs are better than most entry-level firewall appliances anyway, OpenWrt is not your only option.

There are pfSense and OPNsense, and you can run either of those OSes instead of OpenWrt. Both, at their latest versions, should also support the RTL8125 NICs and still offer a web interface, documentation, and a full "serious firewall" experience.
Both can do link aggregation, so you can do what you wanted to do:
https://docs.netgate.com/pfsense/en/latest/interfaces/lagg.html
https://docs.opnsense.org/manual/other-interfaces.html

But what I said above about bonding limitations still applies to them too. So yeah, you won't go faster than that.

how I can get an ipk for luci-proto-bonding on 19.07.7? (besides building it myself lol)

LuCI is a bunch of scripts and text files; you can try just installing whatever is in the snapshot and see if it works.

If it does not work, then it's not a matter of building (there is nothing to build), but of backporting what works in the snapshot to the LuCI version 19.07 uses, which is more than a year old.
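
If you try that, something along these lines might do it; the .ipk path is a placeholder for wherever you drop the snapshot package, and --force-depends is the same hammer already used for kmod-bonding:

opkg update
# install the locally downloaded package, ignoring the kernel-version pin
opkg install --force-depends /tmp/luci-proto-bonding*.ipk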


C'mon bud, I bought a 10GbE switch with SFP+ ports. Do you really think I chose the bonding route because I was too cheap to buy a transceiver? I have 6com transceivers on 10GbE-to-10GbE links and they work fine. When doing this I looked at this review...
https://www.servethehome.com/sfp-to-10gbase-t-adapter-module-buyers-guide/
And purchased a Wiitek transceiver that is verified to do 2.5GbE on a switch that supports it, and it does not work. If autonegotiation is enabled on the OpenWrt device I get 150Mb; if I turn autoneg off and force the link to 1GbE, I get 1GbE. The switch never mentions supporting 2.5GbE and has no configuration for the SFP ports. Switches that do support multi-gig are labeled as such... I hoped it worked as you said, but I can confirm it does NOT.

FYI, by the time the Odroid is fitted with RAM, storage, and a netcard, it costs more than the switch... it's not about being cheap.

As explained, I understand that 1GbE is the limit for a single transfer. I asked how to test that the multiple links are actually providing some benefit over a single one.

I'm running a home network, nothing too crazy. I'm familiar with OpenWrt, so as long as I CAN get things working, which it appears I have... I'll stick with OpenWrt.

Have you tried iperf3 with '-P'?


I tend to assume ignorance if not told otherwise; too many times someone did not think of the most obvious solution and went chasing butterflies.

Uhhh, that's bad. They went out of their way to lock it down then, like Cisco.

That would have been enough proof for me to return the switch and get hardware from a more honest brand. For example, I have some Mikrotik gear where a 6C-SFP-10G-T is recognized, can autoneg down to 10Mbit, and is configurable too.

Most people I see here looking into bonding can't just buy better hardware for some reason, and are looking for a way to band-aid their situation.
Is that "being cheap"? Maybe. I didn't say you were being cheap, though.
My mention of the SFP+ module's cost was mostly a jab at the prices of copper 10Gbit modules, when you can get used 10Gbit fiber modules for peanuts on eBay and such.

I kind of hinted at that

You need to run more than one process or application that loads the network. If you run three iperf processes at the same time, each should get 1Gbit of bandwidth, for a total of 3Gbit of network bandwidth, since you are bonding 3x 1Gbit ports.
This behaviour is obviously not possible with only a single 1Gbit port.
The other guy told you the specific option for iperf3.


Hadn't noticed that option. It's probably a valid test with the desktop as the client, as it's connected at 10GbE. I don't think I have bonding working right yet, because I'm getting a summed total of only ~1Gbit.

Usually the LAG hashes on either MAC or IP addresses, so to test the aggregate you'll need several different machines.
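
On the Linux side, the transmit hash policy is tunable at runtime through sysfs. A sketch (it applies to the 802.3ad and balance-xor modes, not balance-rr; 'bond0' is the name used earlier in the thread): the layer3+4 policy hashes on ports as well as addresses, so parallel flows between the same two hosts can land on different links:

# show the current policy, then switch to L3+L4 hashing
cat /sys/class/net/bond0/bonding/xmit_hash_policy
echo layer3+4 > /sys/class/net/bond0/bonding/xmit_hash_policy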


You can also use luci-proto-bonding to get the web UI bonding interface, and you can set it to different modes like round-robin, failover, etc. (see the bonding man pages).
To properly load the connection, you'd need to run multiple iperf connections. Assuming your iperf "server" is at 192.168.0.15, iperf3 can make 10 concurrent connections; look at the "SUM" row of the output to get the aggregate throughput. Note that the server and client both need several bonded interfaces or equal theoretical throughput (no point in running iperf from the H2 (3 NICs bonded) through the switch (3 ports aggregated) to a test server with only 1 NIC):

iperf3 -c 192.168.0.15 -P 10 -t 60

I've done this between 2 OpenWrt routers (OK, x86 motherboards with many PCIe Ethernet cards, much like the H2 with the quad-NIC card); not sure if your off-the-shelf switch plays along with Linux bonding, though.


Funny you mention that. luci-proto-bonding is not in 19.07.7; you need to use a snapshot, and then you wind up with kernel mismatches and can't install kernel modules. It's a bit trickier here because no current version of OpenWrt has kernel support for my NICs, the RTL8125.

I use a custom image from gitgay, who was also kind enough to build luci-proto-bonding for 19.07.7 at my request. I've tried it; the GUI stuff all works and it looks like it's properly modifying the configs, yet I haven't gotten a working bonding config out of it; I was unable to connect to anything. I'll try again soon.

My first iperf3 test with the bonding config that seemed to work used the OpenWrt router as the server and my 10GbE-connected desktop as the client, running 3 parallel connections.

I'll be doing a "multicast"-style test on iperf when I get a chance. I have to look into it a bit more, but you can run multiple servers against one client. I've got more than 3 Linux boxes connected at 1GbE or 10GbE to that switch, so getting several boxes running against the router won't be hard; that's the whole point of bonding, after all.
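
Something like this is what I have in mind; the addresses are placeholders for the other Linux boxes:

# run 'iperf3 -s' on each of the other boxes first (each iperf3
# server instance handles one client at a time), then from the
# router run a client against each box in parallel:
iperf3 -c 192.168.2.10 -t 60 &
iperf3 -c 192.168.2.11 -t 60 &
iperf3 -c 192.168.2.12 -t 60 &
wait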

Makes sense; I build from master for most of my devices.