I have a switch with 28 ports. I would like to combine two of its ports into a link aggregation using LACP. The connected server has two network cards, which are likewise bonded with LACP into a “bond interface”.
With tcpdump I can see LACP packets passing over this connection, but even a simple ping does not arrive on the other side. I assume there is a parameter somewhere that blocks all other traffic.
All switch ports should work like a normal switch, with VLAN tagging where necessary. The server's services should be reachable via the two network connections to the server (the bond on the switch side). If possible, network access to the switch itself should also go through the bond interface.
What should the configuration look like? Or, what is wrong with my configuration?
~# cat /etc/config/network
config interface 'loopback'
	option device 'lo'
	option proto 'static'
	option ipaddr '127.0.0.1'
	option netmask '255.0.0.0'

config globals 'globals'
	option ula_prefix 'cccc:cccc:cccc::/48'

config device 'switch'
	option name 'switch'
	option type 'bridge'
	option macaddr 'aa:aa:aa:aa:aa:aa'
	list ports 'bond-switch0'
	list ports 'lan1'
	list ports 'lan2'
	list ports 'lan3'
	list ports 'lan4'
	list ports 'lan5'
	list ports 'lan6'
	list ports 'lan7'
	list ports 'lan8'
	list ports 'lan9'
	list ports 'lan10'
	list ports 'lan11'
	list ports 'lan12'
	list ports 'lan13'
	list ports 'lan14'
	list ports 'lan15'
	list ports 'lan16'
	list ports 'lan17'
	list ports 'lan18'
	list ports 'lan19'
	list ports 'lan20'
	list ports 'lan21'
	list ports 'lan22'
	list ports 'lan23'
	list ports 'lan24'
	list ports 'lan25'
	list ports 'lan26'
	list ports 'lan27'
	list ports 'lan28'

config bridge-vlan 'lan_vlan'
	option device 'switch'
	option vlan '5'
	list ports 'bond-switch0'

config device
	option name 'switch.5'
	option macaddr 'aa:aa:aa:aa:aa:aa'

config interface 'switch0'
	option proto 'bonding'
	option ipaddr '192.1.1.10'
	option netmask '255.255.255.0'
	list slaves 'lan25'
	list slaves 'lan26'
	option bonding_policy '802.3ad'
	option min_links '1'
	option ad_actor_sys_prio '65535'
	option ad_select 'bandwidth'
	option lacp_rate 'slow'
	option xmit_hash_policy 'layer2+3'
	option all_slaves_active '0'
	option link_monitoring 'mii'
	option miimon '300'
	option downdelay '600'
	option updelay '600'
	option use_carrier '0'
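One thing visible in this configuration is that lan25 and lan26 are listed both as ports of the bridge 'switch' and as slaves of the bonding interface 'switch0'. A Linux network device can only have one master, so the bridge and the bond compete for the same ports. The following is a minimal Python sketch, not an OpenWrt tool, that parses a UCI network file and reports ports claimed by both a bridge and a bond. It assumes the simple one-statement-per-line layout shown above, only understands `config`, `option` and `list` lines, and only knows the `proto 'bonding'` interface form used here; the function names are hypothetical.

```python
import shlex


def parse_uci(text):
    """Parse UCI text into a list of (section_type, section_name, options).

    options maps every option/list key to a list of its string values.
    Only 'config', 'option' and 'list' lines are handled.
    """
    sections = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        words = shlex.split(line)  # strips the single quotes UCI uses
        if words[0] == "config":
            name = words[2] if len(words) > 2 else None
            sections.append((words[1], name, {}))
        elif words[0] in ("option", "list") and sections and len(words) >= 3:
            sections[-1][2].setdefault(words[1], []).append(words[2])
    return sections


def find_port_conflicts(text):
    """Return {port: [owners]} for ports claimed by more than one master."""
    owners = {}
    for stype, name, opts in parse_uci(text):
        if stype == "device" and opts.get("type") == ["bridge"]:
            label = "bridge " + opts.get("name", [name])[0]
            for port in opts.get("ports", []):
                owners.setdefault(port, []).append(label)
        elif stype == "interface" and opts.get("proto") == ["bonding"]:
            for port in opts.get("slaves", []):
                owners.setdefault(port, []).append("bond " + name)
    return {port: o for port, o in owners.items() if len(o) > 1}


if __name__ == "__main__":
    # Shortened version of the configuration above.
    sample = """
config device 'switch'
	option name 'switch'
	option type 'bridge'
	list ports 'lan24'
	list ports 'lan25'
	list ports 'lan26'

config interface 'switch0'
	option proto 'bonding'
	list slaves 'lan25'
	list slaves 'lan26'
"""
    print(find_port_conflicts(sample))
    # {'lan25': ['bridge switch', 'bond switch0'], 'lan26': ['bridge switch', 'bond switch0']}
```

On the switch, the same check could be run against the contents of /etc/config/network.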
The configuration on my server (with Debian) looks like this:
$ cat /etc/network/interfaces
auto lo
iface lo inet loopback

auto eno1
iface eno1 inet manual

auto eno2
iface eno2 inet manual

auto bond1
iface bond1 inet static
    address 192.168.1.11/24
    bond-slaves eno1 eno2
    bond-miimon 100
    bond-mode 802.3ad
    bond-xmit-hash-policy layer2+3
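Independently of LACP, the addresses in the two configurations as posted are not in the same subnet: the switch uses 192.1.1.10 with netmask 255.255.255.0, the server 192.168.1.11/24, so the two hosts could not reach each other directly even over a working link. (The later ip a output shows 192.168.6.10 and 192.168.6.11, which do match.) Python's standard `ipaddress` module can check this; the helper name `same_subnet` is only for illustration.

```python
import ipaddress


def same_subnet(a, b):
    """True if two interface addresses (address/prefix or address/netmask) share a network."""
    return ipaddress.ip_interface(a).network == ipaddress.ip_interface(b).network


# Switch: ipaddr '192.1.1.10' + netmask '255.255.255.0' from /etc/config/network
switch = "192.1.1.10/255.255.255.0"
# Server: address 192.168.1.11/24 from /etc/network/interfaces
server = "192.168.1.11/24"

print(same_subnet(switch, server))                        # False
print(same_subnet("192.168.6.10/24", "192.168.6.11/24"))  # True
```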
Oh..., that doesn't sound good. Thanks for the information. I have two DGS-1210-28 switches because I want to connect my server redundantly; the server has 4 network interfaces.
The two switches should also be connected to each other with a LAG/LACP. None of this works at the moment.
Is there anything I can do to help fix this bug?
Alternatively: which 28-port switch is still supported by OpenWrt and works with LAG/LACP?
OK, I removed "lan25", "lan26" and "bond-switch0" from the bridge "switch".
Unfortunately, it still doesn't work. The interfaces are up on both systems, but nothing arrives on the other side.
One network cable connects port “lan25” on the switch to network card eno1 on the server; a second cable connects port “lan26” on the switch to network card eno2.
lan25 and lan26 should form an LACP aggregate with eno1 and eno2. The current configuration does not work. Or is there something wrong with the hardware?
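On the Debian side, the Linux bonding driver reports the LACP negotiation state in /proc/net/bonding/bond1. If eno1 and eno2 both carry the Aggregator ID of the active aggregator and the partner MAC address is not all zeros, the server considers LACP negotiated with the switch, which would point at the switch side. Below is a minimal parser for that text; the `Key: value` layout and field names it looks for are an assumption based on the usual output of the bonding driver, so compare with your own file. The sample text is illustrative, not taken from this thread, and the function names are hypothetical.

```python
def parse_bonding_status(text):
    """Parse the 802.3ad part of /proc/net/bonding/<bond>.

    Returns (active_aggregator_id, partner_mac, {slave: aggregator_id}).
    """
    active_id = None
    partner_mac = None
    slaves = {}
    in_active = False  # inside the "Active Aggregator Info" block?
    current = None     # slave section currently being read
    for raw in text.splitlines():
        key, sep, value = raw.strip().partition(":")
        if not sep:
            continue
        value = value.strip()
        if key == "Active Aggregator Info":
            in_active = True
        elif key == "Slave Interface":
            in_active = False
            current = value
            slaves[current] = None
        elif key == "Aggregator ID":
            if in_active:
                active_id = int(value)
            elif current is not None and slaves[current] is None:
                slaves[current] = int(value)
        elif key == "Partner Mac Address" and in_active:
            partner_mac = value
    return active_id, partner_mac, slaves


def lacp_looks_ok(text):
    """True if a partner was seen and every slave joined the active aggregator."""
    active_id, partner_mac, slaves = parse_bonding_status(text)
    if active_id is None or partner_mac in (None, "00:00:00:00:00:00"):
        return False
    return bool(slaves) and all(agg == active_id for agg in slaves.values())


# Illustrative excerpt in the format of /proc/net/bonding/bond1 (not real data).
SAMPLE = """\
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
802.3ad info
Active Aggregator Info:
\tAggregator ID: 1
\tNumber of ports: 1
\tPartner Mac Address: 00:00:00:00:00:00

Slave Interface: eno1
MII Status: up
Aggregator ID: 1

Slave Interface: eno2
MII Status: up
Aggregator ID: 2
"""

if __name__ == "__main__":
    # On the server: text = open("/proc/net/bonding/bond1").read()
    print(parse_bonding_status(SAMPLE))
    print(lacp_looks_ok(SAMPLE))  # False: no partner, slaves in different aggregators
```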
switch $> ip a
27: lan25@eth0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc noqueue master bond-pve01a state UP group default qlen 1000
    link/ether a2:a3:f0:30:a0:a8 brd ff:ff:ff:ff:ff:ff permaddr 00:00:00:01:00:00
28: lan26@eth0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc noqueue master bond-pve01a state UP group default qlen 1000
    link/ether a2:a3:f0:30:a0:a8 brd ff:ff:ff:ff:ff:ff permaddr 00:00:00:01:00:00
38: bond-switch0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether a2:a3:f0:30:a0:a8 brd ff:ff:ff:ff:ff:ff
    inet 192.168.6.10/24 brd 192.168.6.255 scope global bond-pve01a
       valid_lft forever preferred_lft forever
    inet6 fe80::a0a3:f0ff:fe30:a0a8/64 scope link
       valid_lft forever preferred_lft forever
server $> ip a
4: eno1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond1 state UP group default qlen 1000
    link/ether 0c:c4:7a:84:aa:20 brd ff:ff:ff:ff:ff:ff
    altname enp2s0f0
    altname ens3f0
5: eno2: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond1 state UP group default qlen 1000
    link/ether 0c:c4:7a:84:aa:20 brd ff:ff:ff:ff:ff:ff permaddr 0c:c4:7a:84:aa:21
    altname enp2s0f1
    altname ens3f1
87: bond1: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 0c:c4:7a:84:aa:20 brd ff:ff:ff:ff:ff:ff
    inet 192.168.6.11/24 scope global bond1
       valid_lft forever preferred_lft forever
    inet6 fe80::ec4:7aff:fe84:aa20/64 scope link
       valid_lft forever preferred_lft forever
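To confirm which bond each port is actually enslaved to, the JSON output of iproute2 (`ip -j link show`) is easier to check than the text above. In the output as posted, lan25 and lan26 report `master bond-pve01a` while the bond itself is listed as `bond-switch0`; a check like the one below makes such a mismatch obvious. This is a minimal sketch that assumes each entry in the JSON has an `ifname` key and, for enslaved ports, a `master` key; the sample is a trimmed reconstruction of the posted output, and the helper names are hypothetical.

```python
import json


def ports_of(ip_json, bond):
    """Return the sorted names of interfaces whose master is `bond`."""
    return sorted(link["ifname"] for link in json.loads(ip_json)
                  if link.get("master") == bond)


def check_bond(ip_json, bond, expected):
    """Return (actual members of `bond`, expected ports that are missing)."""
    actual = ports_of(ip_json, bond)
    missing = sorted(set(expected) - set(actual))
    return actual, missing


if __name__ == "__main__":
    # On the switch, the real input would be the stdout of: ip -j link show
    sample = json.dumps([
        {"ifname": "lan25", "master": "bond-pve01a", "operstate": "UP"},
        {"ifname": "lan26", "master": "bond-pve01a", "operstate": "UP"},
        {"ifname": "bond-switch0", "operstate": "UP"},
    ])
    print(check_bond(sample, "bond-switch0", ["lan25", "lan26"]))
    # ([], ['lan25', 'lan26'])
```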
The bridge section of /etc/config/network now looks like this:

config device 'switch'
	option name 'switch'
	option type 'bridge'
	option macaddr 'a0:a3:f0:30:a0:90'
	list ports 'bond-switch0'
	list ports 'lan1'
	list ports 'lan2'
	list ports 'lan3'
	list ports 'lan4'
	list ports 'lan5'
	list ports 'lan6'
	list ports 'lan7'
	list ports 'lan8'
	list ports 'lan9'
	list ports 'lan10'
	list ports 'lan11'
	list ports 'lan12'
	list ports 'lan13'
	list ports 'lan14'
	list ports 'lan15'
	list ports 'lan16'
	list ports 'lan17'
	list ports 'lan18'
	list ports 'lan19'
	list ports 'lan20'
	list ports 'lan21'
	list ports 'lan22'
	list ports 'lan23'
	list ports 'lan24'
	list ports 'lan27'
	list ports 'lan28'
I think I should be able to ping 192.168.6.11 (server) from 192.168.6.10 (switch), but nothing arrives.
The driver has a netdevice event handler that directly calls the functions for configuring LAG hardware offloading (without DSA knowing about it). This can be seen in the kernel log:
[ 628.803920] rtl83xx_lag_add: Added port 14 to LAG 0. Members now 0000000000004000.
[ 629.736573] rtl83xx_lag_add: Added port 15 to LAG 0. Members now 000000000000c000.
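The "Members now" value in these log lines is a bitmask with one bit per switch port: 0x4000 has only bit 14 set, and 0xc000 has bits 14 and 15 set, matching the "Added port 14" and "Added port 15" messages. A small helper (hypothetical, not part of the driver) decodes it:

```python
def lag_members(mask_hex):
    """Return the port numbers whose bits are set in a hex LAG member mask."""
    mask = int(mask_hex, 16)
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


for line in (
    "rtl83xx_lag_add: Added port 14 to LAG 0. Members now 0000000000004000.",
    "rtl83xx_lag_add: Added port 15 to LAG 0. Members now 000000000000c000.",
):
    mask = line.rsplit(" ", 1)[1].rstrip(".")  # last word without the trailing dot
    print(mask, lag_members(mask))
# 0000000000004000 [14]
# 000000000000c000 [14, 15]
```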
In addition, the driver also implements the proper DSA methods (such as port_lag_join). However, these don't actually do anything, because a check in rtl83xx_lag_can_offload always returns false due to an uninitialized field (num_lag_ids).
Without the netdevice event handler, traffic would simply go through the CPU, as you wrote. As it is, however, you end up in a broken state where the bonding interface doesn't work properly.
(It is still possible to get a working bonding interface without hardware offloading by choosing a policy that the driver does not try to offload, i.e. anything other than "balance-xor" and "802.3ad".)
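Put as a rule: per the explanation above, the driver only attempts LAG offload for "balance-xor" and "802.3ad", so the remaining Linux bonding modes should stay plain software bonds handled by the CPU. The mode names below are those of the Linux bonding driver; the offload set reflects only what is stated in this thread for this driver version, and the server's bond mode would have to be chosen to match whatever the switch uses.

```python
# Linux bonding modes, by the names the bonding driver accepts.
BONDING_MODES = ["balance-rr", "active-backup", "balance-xor", "broadcast",
                 "802.3ad", "balance-tlb", "balance-alb"]

# Per this thread, the rtl83xx driver only tries to offload these two modes.
OFFLOADED_BY_DRIVER = {"balance-xor", "802.3ad"}


def software_only_modes():
    """Modes that, per the thread, avoid the broken offload path."""
    return [m for m in BONDING_MODES if m not in OFFLOADED_BY_DRIVER]


print(software_only_modes())
# ['balance-rr', 'active-backup', 'broadcast', 'balance-tlb', 'balance-alb']
```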
The last time I tried this, I couldn't get the bonding interface to be part of a bridge while it had an IP address. It seems to work now, so I'm not sure whether I did something wrong back then or the behaviour has changed. Just to be sure, check with brctl show that the interface is actually part of the bridge.
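brctl show prints each bridge on one line, with its first member port in the last column, followed by indented continuation lines for the remaining ports. A minimal parser assuming that layout (the exact spacing can differ between bridge-utils and BusyBox) makes it easy to check whether bond-switch0 is really a member of the bridge "switch". The function name and sample output are illustrative; listing /sys/class/net/switch/brif/ gives the same information without brctl.

```python
def parse_brctl_show(text):
    """Map bridge name -> list of member interfaces from `brctl show` output."""
    bridges = {}
    current = None
    for line in text.splitlines()[1:]:  # skip the header line
        if not line.strip():
            continue
        fields = line.split()
        if line[0].isspace():
            # Continuation line: one more port of the previous bridge.
            if current is not None:
                bridges[current].append(fields[0])
        else:
            # New bridge: name, bridge id, STP flag, optionally its first port.
            current = fields[0]
            bridges[current] = fields[3:4]
    return bridges


# Illustrative output, not captured from the switch in this thread.
sample = (
    "bridge name\tbridge id\t\tSTP enabled\tinterfaces\n"
    "switch\t\t8000.a0a3f030a090\tno\t\tbond-switch0\n"
    "\t\t\t\t\t\t\tlan1\n"
    "\t\t\t\t\t\t\tlan2\n"
)
members = parse_brctl_show(sample)
print("bond-switch0" in members.get("switch", []))  # True
```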
In any case, it is also possible to configure bonding as a device without an IP address instead of as an interface. LuCI support for this was added recently and is included in the OpenWrt main branch and in 24.10 (luci-proto-bonding no longer exists).
You can try, but it probably won't work because the driver is still broken. At least it didn't work for me when I tried yesterday (with the policy set to one that the driver tries to offload).
I have now installed OpenWrt 24.10.0-rc5. The bonding packages have changed slightly, and you now create a bonding device under “Network -> Interfaces -> Devices”. No IP address is required. Everything looks much better for now.
However, if I now create a “Bonding/Aggregation device” with “Bonding Policy: LACP - 802.3ad”, the system load is about 20 after a reboot. This is probably an effect of the driver bug described earlier.
What options do I still have to get a usable bonding configuration on this switch?