D-Link DGS-1210-28 (RTL8382) with OpenWrt 23.05.5: connecting a server via a “bonding interface”

I have a switch with 28 ports, and I would like to aggregate 2 of those ports using LACP. The connected server has 2 network cards, which are likewise combined with LACP into a “bond interface”.

I can see with tcpdump that LACP packets are exchanged over this connection, but even a simple ping does not arrive on the other side. I assume some parameter somewhere is blocking everything.

All ports of the switch should behave like a normal switch, with VLAN tagging where necessary. The services on the server should be reached over the 2 network connections to the server (the bond switch). If possible, network access to the switch itself should also go through the bond interface.
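For reference, VLAN membership in an OpenWrt DSA bridge is expressed with bridge-vlan sections, where a port suffix of ':t' marks a tagged member and ':u*' an untagged member whose PVID is that VLAN. A minimal sketch, assuming VLAN 1 as the untagged default for the access ports and the server bond, and VLAN 5 carried tagged on the bond (both VLAN IDs and the port roles are hypothetical):

```
# Sketch only: VLAN IDs and port roles are hypothetical.
config bridge-vlan
        option device 'switch'
        option vlan '1'
        # ':u*' = untagged member, and this VLAN is the port's PVID
        list ports 'bond-switch0:u*'
        list ports 'lan1:u*'
        list ports 'lan2:u*'
        # lan3 to lan24, lan27 and lan28 follow the same pattern

config bridge-vlan
        option device 'switch'
        option vlan '5'
        # ':t' = tagged member; the server would need a VLAN 5 subinterface on bond1
        list ports 'bond-switch0:t'
```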

What should the configuration look like? Or, what is wrong with my configuration?

~# cat /etc/config/network

config interface 'loopback'
        option device 'lo'
        option proto 'static'
        option ipaddr '127.0.0.1'
        option netmask '255.0.0.0'

config globals 'globals'
        option ula_prefix 'cccc:cccc:cccc::/48'

config device 'switch'
        option name 'switch'
        option type 'bridge'
        option macaddr 'aa:aa:aa:aa:aa:aa'
        list ports 'bond-switch0'
        list ports 'lan1'
        list ports 'lan2'
        list ports 'lan3'
        list ports 'lan4'
        list ports 'lan5'
        list ports 'lan6'
        list ports 'lan7'
        list ports 'lan8'
        list ports 'lan9'
        list ports 'lan10'
        list ports 'lan11'
        list ports 'lan12'
        list ports 'lan13'
        list ports 'lan14'
        list ports 'lan15'
        list ports 'lan16'
        list ports 'lan17'
        list ports 'lan18'
        list ports 'lan19'
        list ports 'lan20'
        list ports 'lan21'
        list ports 'lan22'
        list ports 'lan23'
        list ports 'lan24'
        list ports 'lan25'
        list ports 'lan26'
        list ports 'lan27'
        list ports 'lan28'

config bridge-vlan 'lan_vlan'
        option device 'switch'
        option vlan '5'
        list ports 'bond-switch0'

config device
        option name 'switch.5'
        option macaddr 'aa:aa:aa:aa:aa:aa'

config interface 'switch0'
        option proto 'bonding'
        option ipaddr '192.1.1.10'
        option netmask '255.255.255.0'
        list slaves 'lan25'
        list slaves 'lan26'
        option bonding_policy '802.3ad'
        option min_links '1'
        option ad_actor_sys_prio '65535'
        option ad_select 'bandwidth'
        option lacp_rate 'slow'
        option xmit_hash_policy 'layer2+3'
        option all_slaves_active '0'
        option link_monitoring 'mii'
        option miimon '300'
        option downdelay '600'
        option updelay '600'
        option use_carrier '0'

The configuration on my server (with Debian) looks like this:

$ cat /etc/network/interfaces

auto lo
iface lo inet loopback

auto eno1
iface eno1 inet manual

auto eno2
iface eno2 inet manual

auto bond1
iface bond1 inet static
        address 192.168.1.11/24
        bond-slaves eno1 eno2
        bond-miimon 100
        bond-mode 802.3ad
        bond-xmit-hash-policy layer2+3

Bonding and link aggregation are not offloaded to hardware on rtl838x (which makes them effectively unusable).

Even so, that means it should work with the appropriate network configuration, and that is the configuration I would like to find.

Also, a slow connection is not the same as a connection that doesn't work at all.

Remove the bonding slaves from the other bridge.

The switch driver contains a broken bonding implementation. See my previous post in the RTL838x thread for details.

Oh, that doesn't sound good. Thanks for the information. I have two DGS-1210-28 switches because I want to connect my server redundantly; the server has 4 network interfaces.

And the switches should be connected to each other with a LAG/LACP. None of this works now.

Is there anything I can do to help fix this bug?

Alternatively: which 28-port switch is still supported by OpenWrt and works with LAG/LACP?

Did you remove the ports from the bridge? "Broken implementation" means an (extremely slow) implementation that goes through the CPU.

Yes, I did remove the bonding port from the bridge:

list ports 'bond-switch0'

Or did you mean I should remove all ports from the bridge?

You should remove the bonding slaves from the bridge; the aggregation group itself should stay.
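Applied to the configuration from the first post, that advice means the bridge lists the bond device as a port, while lan25 and lan26 appear only as slaves of the bonding interface. A trimmed sketch using the names from that post, where the comment lines stand in for the unchanged ports and options:

```
config device 'switch'
        option name 'switch'
        option type 'bridge'
        # the aggregate (bond) device is a bridge port...
        list ports 'bond-switch0'
        list ports 'lan1'
        # lan2 to lan24 unchanged
        list ports 'lan27'
        list ports 'lan28'
        # ...but its slaves lan25 and lan26 are not listed here

config interface 'switch0'
        option proto 'bonding'
        list slaves 'lan25'
        list slaves 'lan26'
        option bonding_policy '802.3ad'
        # remaining options (ipaddr, netmask, miimon, ...) as in the first post
```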

OK, I removed "lan25", "lan26" and "bond-switch0" from the bridge "switch".

Unfortunately, it still doesn't work. The interfaces are active on both systems, but nothing arrives on the other side.
One network cable connects port “lan25” on the switch to network card eno1 on the server; a second cable connects port “lan26” on the switch to network card eno2.
lan25/lan26 and eno1/eno2 should form an LACP aggregate, but the current configuration does not work. Is the configuration wrong, or is there something wrong with the hardware?

switch $> ip a

27: lan25@eth0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc noqueue master bond-pve01a state UP group default qlen 1000
    link/ether a2:a3:f0:30:a0:a8 brd ff:ff:ff:ff:ff:ff permaddr 00:00:00:01:00:00
28: lan26@eth0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc noqueue master bond-pve01a state UP group default qlen 1000
    link/ether a2:a3:f0:30:a0:a8 brd ff:ff:ff:ff:ff:ff permaddr 00:00:00:01:00:00
38: bond-switch0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether a2:a3:f0:30:a0:a8 brd ff:ff:ff:ff:ff:ff
    inet 192.168.6.10/24 brd 192.168.6.255 scope global bond-pve01a
       valid_lft forever preferred_lft forever
    inet6 fe80::a0a3:f0ff:fe30:a0a8/64 scope link 
       valid_lft forever preferred_lft forever

server $> ip a

4: eno1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond1 state UP group default qlen 1000
    link/ether 0c:c4:7a:84:aa:20 brd ff:ff:ff:ff:ff:ff
    altname enp2s0f0
    altname ens3f0
5: eno2: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond1 state UP group default qlen 1000
    link/ether 0c:c4:7a:84:aa:20 brd ff:ff:ff:ff:ff:ff permaddr 0c:c4:7a:84:aa:21
    altname enp2s0f1
    altname ens3f1
87: bond1: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 0c:c4:7a:84:aa:20 brd ff:ff:ff:ff:ff:ff
    inet 192.168.6.11/24 scope global bond1
       valid_lft forever preferred_lft forever
    inet6 fe80::ec4:7aff:fe84:aa20/64 scope link 
       valid_lft forever preferred_lft forever

I can't get any further at the moment. :frowning:

The aggregate interface has to be in the bridge, OK?

Ah, ok. But it doesn't work with that either. :frowning:

Here is the current configuration of the bridge.

config device 'switch'
        option name 'switch'
        option type 'bridge'
        option macaddr 'a0:a3:f0:30:a0:90'
        list ports 'bond-switch0'
        list ports 'lan1'
        list ports 'lan2'
        list ports 'lan3'
        list ports 'lan4'
        list ports 'lan5'
        list ports 'lan6'
        list ports 'lan7'
        list ports 'lan8'
        list ports 'lan9'
        list ports 'lan10'
        list ports 'lan11'
        list ports 'lan12'
        list ports 'lan13'
        list ports 'lan14'
        list ports 'lan15'
        list ports 'lan16'
        list ports 'lan17'
        list ports 'lan18'
        list ports 'lan19'
        list ports 'lan20'
        list ports 'lan21'
        list ports 'lan22'
        list ports 'lan23'
        list ports 'lan24'
        list ports 'lan27'
        list ports 'lan28'

I think I should be able to ping from IP 192.168.6.10 (switch) to IP 192.168.6.11 (server). But nothing arrives.

Nope, that IP is useless. You should be able to ping the br-lan IP from the server at the other end of the LAG.
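In other words, the switch's own management address belongs on the bridge rather than on the bond, and the server reaches it through the LAG. A minimal sketch, assuming a new interface named 'lan' (hypothetical) on the bridge device 'switch' and the 192.168.6.0/24 addresses from the ip a output above:

```
# Hypothetical interface name 'lan'; addresses taken from the ip a output.
config interface 'lan'
        # if bridge-vlan sections are in use, this would be a VLAN device
        # instead, e.g. 'switch.5' from the first post
        option device 'switch'
        option proto 'static'
        # the same address must then no longer be configured on the bond
        option ipaddr '192.168.6.10'
        option netmask '255.255.255.0'
```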

Unfortunately, that is not the case.

The driver has a netdevice event handler which actually calls the functions for configuring LAG hardware offloading (without DSA knowing about it). This can be seen in the kernel log:

[  628.803920] rtl83xx_lag_add: Added port 14 to LAG 0. Members now 0000000000004000.
[  629.736573] rtl83xx_lag_add: Added port 15 to LAG 0. Members now 000000000000c000.

In addition, the driver also implements the proper DSA methods (such as port_lag_join). However, these don't actually do anything, because a check in rtl83xx_lag_can_offload always returns false due to an uninitialized field (num_lag_ids).

Without the netdevice event handler, it would just go through the CPU like you wrote. But as it is, you actually end up with a broken state where the bonding interface doesn't work properly.

(It still is possible to get a working bonding interface without hardware offloading by choosing a policy that the driver does not support, i.e. anything other than "balance-xor" and "802.3ad".)
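With the proto-bonding interface from the first post, that workaround amounts to changing the policy. A sketch, assuming active-backup is acceptable: it gives failover rather than LACP aggregation, the bond traffic is then handled in software by the CPU, and the server's bond1 would have to use bond-mode active-backup as well:

```
config interface 'switch0'
        option proto 'bonding'
        list slaves 'lan25'
        list slaves 'lan26'
        # per the post above, only 'balance-xor' and '802.3ad' make the
        # rtl83xx driver attempt (broken) LAG offloading
        option bonding_policy 'active-backup'
        option link_monitoring 'mii'
        option miimon '100'
        # ipaddr/netmask as in the first post
```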

The last time I tried this, I couldn't get the bonding interface to be part of a bridge if it had an IP address. It seems to work now, so I'm not sure whether I did something wrong back then or the behaviour has changed. But just to make sure, maybe run brctl show to confirm that the interface is actually part of the bridge.

In any case, it is also possible to configure bonding as a device without an IP address, instead of as an interface. Support for this in LuCI was added recently and is included in the OpenWrt main branch and 24.10 (luci-proto-bonding no longer exists).
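On 24.10 or main, the device-based variant could look roughly like the following. The option names reflect my understanding of netifd's native bonding device support (they differ from the old proto-bonding names such as bonding_policy and slaves), so treat them as assumptions and compare them with what LuCI writes to /etc/config/network; the device name 'bond0' is hypothetical:

```
# Option names are assumptions; verify against LuCI's output.
config device
        option name 'bond0'
        option type 'bonding'
        list ports 'lan25'
        list ports 'lan26'
        # per the posts above, '802.3ad' and 'balance-xor' trigger the
        # driver's broken LAG offload on rtl838x
        option policy '802.3ad'
        option xmit_hash_policy 'layer2+3'
        option lacp_rate 'slow'
        option monitor_mode 'mii'
        option monitor_interval '100'

config device 'switch'
        option name 'switch'
        option type 'bridge'
        # the bond device is a bridge port; lan25 and lan26 are not
        list ports 'bond0'
        list ports 'lan1'
        # remaining lanN ports unchanged
```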

Setting an irrelevant IP address helps.
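One way to read that suggestion, assuming it refers to the address on the proto-bonding interface switch0: give the bond an address that nothing uses, for example from the documentation range 192.0.2.0/24 (TEST-NET-1, RFC 5737), and keep the real management address on the bridge. The placeholder address below is hypothetical:

```
config interface 'switch0'
        option proto 'bonding'
        # hypothetical placeholder address that nothing should ever use
        option ipaddr '192.0.2.1'
        option netmask '255.255.255.0'
        list slaves 'lan25'
        list slaves 'lan26'
        # remaining bonding options as before
```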

Should I upgrade to OpenWrt 24.x?

That's a good idea; back up the current config first.

You can try, but it probably won't work, because the driver is still broken. At least it didn't work for me when I tried yesterday (with the policy set to one that the driver tries to offload).


I have now installed OpenWrt 24.10.0-rc5. The bonding packages have changed slightly, and you now create a bonding device under “Network -> Interfaces -> Devices”. No IP address is required. Everything looks much better for the time being.
However, if I now create a “Bonding/Aggregation device” with “Bonding Policy: LACP - 802.3ad”, the system has a load of approx. 20 after a restart. This is probably the effect of the driver bug described earlier. :slight_smile:
What options do I still have to get a usable bonding configuration on this switch?