Support for RTL838x based managed switches

Yes, indeed, on the 838x platform using a GPIO to make a hard reset is the preferred way of performing a reset in the later GPL code dumps I saw, very likely to avoid this issue. However, I would not bet that this GPIO is always there, especially since it is not used in the earlier source drops, which were often quite specific to a particular device. My feeling is that the companies that did the board designs were fighting with the issue quite a bit, too.

You're right that this requires the vendor to have included a reset GPIO in their design. I'm currently playing around with the watchdog timer, but I fear that will only provide the same (broken) reset. Like you note, the lock-up also happens after reset, so by then the watchdog will already be disabled...

Awesome, thanks!
I also have a DGS-1210-26 and would like to run OpenWRT on it.

Yes, the DGS-1210-26 has no combo ports. All 24 copper ports and 2 SFP slots can be used simultaneously.

Has anyone tried to create an OpenWrt image that can be flashed from the stock firmware, to avoid opening the case and breaking the warranty sticker?
I have found https://memo205.wordpress.com/2021/09/20/dgs-1210-28-f1-フゑームウェをパヒ/ and it appears all required information about D-Link's image format is known or included in the GPL sources.

Meanwhile I also got a DGS-1210-28 and for fun traced out its SFP I2C lines. It seems to use the exact same pin assignment as given by @musashino for the ApresiaLightGS120GT-SS. The only difference: there is no "LOOP" LED in that device.

The reset button input and a SoC reset output (which works!) are also connected exactly the same, probably all copied from some reference design.

What is the right way to specify the SFPs in the DTS definitions for those shared copper/fiber ports?

I tried it like this for the DGS-1210-28:

	/* SFP slot, port 25 */
	i2c0: i2c-gpio-0 {
		compatible = "i2c-gpio";
		sda-gpios = <&gpio1 6 (GPIO_ACTIVE_HIGH | GPIO_OPEN_DRAIN)>;
		scl-gpios = <&gpio1 7 (GPIO_ACTIVE_HIGH | GPIO_OPEN_DRAIN)>;
		i2c-gpio,delay-us = <2>;
		#address-cells = <1>;
		#size-cells = <0>;
	};

	sfp0: sfp-p25 {
		compatible = "sff,sfp";
		i2c-bus = <&i2c0>;
		mod-def0-gpio = <&gpio1 8 GPIO_ACTIVE_LOW>;
		los-gpio = <&gpio1 9 GPIO_ACTIVE_HIGH>;
	};

Of course also enable the rtl8231:

&gpio1 {
	status = "okay";
};

And then replace the

		SWITCH_PORT(24, 25, qsgmii)

entry with a

		port@24 {
			reg = <24>;
			label = "lan25";
			phy-mode = "qsgmii";
			sfp = <&sfp0>;
		};

and this makes it possible, for example, to query the EEPROM/diagnostic data with ethtool -m lan25.

But then I can get neither copper nor fiber links to work. I can switch the port to either mii or fibre using ethtool -s (there is also the option tp, which does nothing):

root@OpenWrt:/etc/config# ethtool -s lan25 port fibre
[11698.006650] rtl8380_rtl8214fc_media_set: port 24, set_fibre: 1
[11698.024069] Current media 3
[11698.027288] Powering off COPPER
[11698.057640] RTL8380 Link change: status: 1, ports 1000000
root@OpenWrt:/etc/config# ethtool -s lan25 port mii
[11698.088529] Powering on FIBRE
[11704.054310] rtl8380_rtl8214fc_media_set: port 24, set_fibre: 0
[11704.075065] Current media 1
[11704.078280] Powering off FIBRE
[11704.129827] Powering on COPPER
[11706.796739] RTL8380 Link change: status: 1, ports 1000000

This also enables/disables the SFP module and the copper port: their link goes up/down on the other side of the fibre or copper cable, so there is light/power on the links.

What is the right definition for the shared copper + SFP ports?

Or is this simply not yet implemented, and putting an sfp node in there confuses the programmed logic? Without the sfp node, switching with ethtool -s lan25 port fibre or mii works as intended.

You'd need a PHY driver with support for the SFP port. And the port entry in the DTS should point to this PHY, which then points to the SFP slot. I don't know if this is implemented for this Realtek PHY.

I was wondering if Open vSwitch is able to use the switch hardware in these devices or if it's limited to software (i.e. the weak CPU). I'm suspecting the answer is no, it's probably too soon to discuss this when hardware support is still not 100% done, but I think it's an important thing to have eventually

My understanding is that this relies on powerful CPUs or at least OpenFlow-compatible hardware. A target device could be one with an RTL931x, which supports OpenFlow, and there are devices with 1GB DRAM and decent CPU performance. Sebastian and I were able to squeeze the first ping out of such a device a couple of days ago.

If it doesn't exist already, it can be implemented fairly simply using the generic SFP ops: you just need to validate the modes the module supports against the PHY, and then the generic ops will attach the SFP to the upstream Ethernet device.
If you need to reconfigure the PHY into fiber mode, you can do that upon insert as well, but note that this requires the SFP cage to be compatible with the generic SFP bus driver, meaning it needs I2C and GPIOs.

AFAIK OpenFlow is the name of the management protocol. There are "OpenFlow" switches that provide it, but it is just an abstract API provided by the management CPU. The backend ASIC can be very different.

Open vSwitch is open-source software with a Linux kernel driver for software-defined networking that supports this protocol, but it has "hardware accelerated data paths" too.
I did some digging, and according to https://hareshkhandelwal.blog/2020/03/11/lets-understand-the-openvswitch-hardware-offload/ the first packets hit the CPU when the hardware acceleration options are enabled, but once it has figured out where each packet stream belongs it uses the "TC flower utility" to program the switch hardware to handle the rest:

Once all flows pertaining to a particular traffic stream are formed, ovs will use TC flower utility to push and program these flows on NIC hardware.
...
Ovs uses TC data path when hw-offload flag is enabled. TC flower is an iproute2 utility and used to write data path flows on hardware.

You were talking about using tc flower to test switching hardware offload in this PR https://github.com/openwrt/openwrt/pull/4535#issuecomment-917434775 (that was merged a few hours ago, btw)

Does this mean that we should get the same hardware acceleration on OpenvSwitch too without additional work?

The RTL931x have OpenFlow hardware compatibility options. They can be switched on for the forwarding database and the Packet Inspection Engine. I am not sure what they actually do, but I stumbled over them in the past days working on the RTL931x port of OpenWRT.

Anyway: tc flower support has been in OpenWrt master (5.10 testing kernel) since this morning. All of it is offloaded using the Packet Inspection Engine (PIE) of the SoCs. Support is there for the 838x and later, with more complex filters being possible on the newer SoCs. In principle hundreds of filter rules are supported, with several filters stacked on top of each other.

But not all combinations are always possible. The boundary conditions are very complex and differ from SoC to SoC, and I only wrote a very simple solution finder that makes use of the templates predefined in hardware.

The PIE works by having match-fields (against packet properties such as IP-prefixes, ports, packet types, protocols), some logic conditions to combine with other rules, and action fields (drop, re-route, mirror to a different port, trap to CPU, change VLAN-tags, apply priority rules...). All SoCs have limitations as to what combination of properties can be matched without resorting to chaining rules (which is not implemented yet). And the 838x has a limitation on the size (in bytes!) of the total list of actions, while newer SoCs can combine all types of actions. The PIE memory is a three-way memory with match masks, match templates and packet data, where different areas of the memory allow different (even programmable) templates. Programming templates of your choosing and value ranges are not implemented. You should get an error if you try something that is too complex.

The PIE is an extremely powerful but also complex beast, and it will need someone who knows how to implement algorithmic solution finding and optimization to make it work with all its features.

To try it out, make sure you have all the tc flower kernel modules in your image, plus the tc command.

tc qdisc add dev eth0 ingress # Always needed

tc filter add dev eth0 protocol ip parent ffff: handle 2 flower src_ip 192.168.2.150 skip_sw action drop

tc filter add dev eth0 protocol ip parent ffff: handle 2 flower src_mac 24:5e:be:50:cb:98 skip_sw action drop

tc filter show dev eth0 ingress
tc filter del dev eth0 ingress
tc -s filter show dev eth0 ingress

# Redirection
tc filter add dev eth0 protocol ip parent ffff: handle 2 flower src_ip 192.168.2.150 skip_sw action mirred egress redirect dev lan3

tc filter add dev eth0 protocol ip parent ffff: handle 2 flower src_ip 192.168.2.150 skip_sw action mirred egress mirror dev lan3

# VLAN
tc filter add dev eth0 parent ffff: protocol 802.1Q flower vlan_id 100 skip_sw action drop

tc filter add dev eth0 parent ffff: protocol 802.1Q flower vlan_id 100 skip_sw action vlan pop

tc filter add dev eth0 parent ffff: protocol 802.1Q flower vlan_id 100 skip_sw action vlan push id 200

tc filter add dev eth0 protocol ip parent ffff: handle 2 flower dst_ip 192.168.2.150 skip_sw action vlan push id 100
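
To double-check that a rule really ended up in hardware, the flags in the tc filter listing can be counted. This is only a sketch: the helper name offload_summary is made up for illustration, and it assumes that tc marks each filter with either in_hw or not_in_hw in its output, as recent iproute2 versions do.

```shell
# Count offloaded and non-offloaded filters in "tc filter show" output (stdin).
# offload_summary is a hypothetical helper name, not part of iproute2.
offload_summary() {
    awk '
        /not_in_hw/ { sw++; next }   # filter exists only in software
        /in_hw/     { hw++ }         # filter was accepted by the hardware
        END { printf "in_hw=%d not_in_hw=%d\n", hw, sw }
    '
}

# On the switch:
#   tc -s filter show dev eth0 ingress | offload_summary
```

With skip_sw, as used in the commands above, a rule that cannot be offloaded should be rejected when it is added, so a non-zero not_in_hw count would mainly show up for rules added without that flag.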

Some modules you might need:

root@OpenWrt:/# lsmod
act_bpf                 3746  0
act_connmark            2978  0
act_csum                6658  0
act_ctinfo              4706  0
act_gact                2914  0
act_ipt                 4098  0
act_mirred              4962  0
act_pedit               5122  0
act_police              4802  0
act_simple              2530  0
act_skbedit             3874  0
act_vlan                3618  0
cls_basic               4306  0
cls_bpf                 6770  0
cls_flow                6898  0
cls_flower             26258  0
cls_fw                  5106  0
cls_matchall            4658  0 

As a general rule: if you get a "file not found" error after a tc command, you are missing a module.
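
One way to check this up front is to compare the loaded modules against what the examples use. The sketch below is only an illustration: the function name missing_modules is hypothetical, and the list of four modules is an assumption based on the flower classifier and the actions used above (drop via act_gact, mirred, vlan), not an official list.

```shell
# Modules the tc examples above appear to rely on (assumption, see lead-in).
REQUIRED_MODULES="cls_flower act_gact act_mirred act_vlan"

# Read lsmod output on stdin and print every required module that is absent.
missing_modules() {
    loaded=$(awk '{ print $1 }')     # first column of lsmod is the module name
    for m in $REQUIRED_MODULES; do
        echo "$loaded" | grep -qx "$m" || echo "$m"
    done
}

# On the switch:
#   lsmod | missing_modules
```

No output means all four modules are loaded; any name printed is a candidate for the missing kmod package.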

Hi there!

With the current master and the 5.10 kernel, I have some trouble with large outgoing packets. It seems packets larger than 1496 bytes originating from the CPU are not sent out. Switched packets between outside ports are not affected; switching seems to work fine, at least with the bugfixes of PR #4535 included.

These outgoing packets are visible in tcpdump -i switch.vlanid but don't actually seem to make it onto the wire. Here are two dumps, one from the switch:

root@OpenWrt:~# tcpdump -i switch.100 -n
[ 3159.589614] device switch.100 entered promiscuous mode
[ 3159.595488] device switch entered promiscuous mode
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on switch.100, link-type EN10MB (Ethernet), capture size 262144 bytes
22:51:17.620667 IP 10.1.1.100 > 10.1.0.102: ICMP echo request, id 69, seq 1, length 1476
22:51:17.621103 IP 10.1.0.102 > 10.1.1.100: ICMP echo reply, id 69, seq 1, length 1476
22:51:18.621795 IP 10.1.1.100 > 10.1.0.102: ICMP echo request, id 69, seq 2, length 1476
22:51:18.622173 IP 10.1.0.102 > 10.1.1.100: ICMP echo reply, id 69, seq 2, length 1476
22:51:19.622698 IP 10.1.1.100 > 10.1.0.102: ICMP echo request, id 69, seq 3, length 1476
22:51:19.623085 IP 10.1.0.102 > 10.1.1.100: ICMP echo reply, id 69, seq 3, length 1476
22:51:24.628064 ARP, Request who-has 10.1.0.1 tell 10.1.0.102, length 28
22:51:24.628624 ARP, Reply 10.1.0.1 is-at 4a:17:f1:89:2f:c2, length 42
22:51:26.775001 IP 10.1.1.100 > 10.1.0.102: ICMP echo request, id 70, seq 1, length 1477
22:51:26.775437 IP 10.1.0.102 > 10.1.1.100: ICMP echo reply, id 70, seq 1, length 1477
22:51:27.786844 IP 10.1.1.100 > 10.1.0.102: ICMP echo request, id 70, seq 2, length 1477
22:51:27.787219 IP 10.1.0.102 > 10.1.1.100: ICMP echo reply, id 70, seq 2, length 1477
22:51:28.810738 IP 10.1.1.100 > 10.1.0.102: ICMP echo request, id 70, seq 3, length 1477
22:51:28.811113 IP 10.1.0.102 > 10.1.1.100: ICMP echo reply, id 70, seq 3, length 1477

and from the router (10.1.0.1) in-between the switch and my desktop:

root@router:~# tcpdump -i eth0.100 host 10.1.0.102 -n
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0.100, link-type EN10MB (Ethernet), capture size 262144 bytes
00:51:17.616106 IP 10.1.1.100 > 10.1.0.102: ICMP echo request, id 69, seq 1, length 1476
00:51:17.617416 IP 10.1.0.102 > 10.1.1.100: ICMP echo reply, id 69, seq 1, length 1476
00:51:18.617245 IP 10.1.1.100 > 10.1.0.102: ICMP echo request, id 69, seq 2, length 1476
00:51:18.618372 IP 10.1.0.102 > 10.1.1.100: ICMP echo reply, id 69, seq 2, length 1476
00:51:19.618132 IP 10.1.1.100 > 10.1.0.102: ICMP echo request, id 69, seq 3, length 1476
00:51:19.619246 IP 10.1.0.102 > 10.1.1.100: ICMP echo reply, id 69, seq 3, length 1476
00:51:24.624110 ARP, Request who-has 10.1.0.1 tell 10.1.0.102, length 46
00:51:24.624271 ARP, Reply 10.1.0.1 is-at 4a:17:f1:89:2f:c2, length 28
00:51:26.770443 IP 10.1.1.100 > 10.1.0.102: ICMP echo request, id 70, seq 1, length 1477
00:51:27.782251 IP 10.1.1.100 > 10.1.0.102: ICMP echo request, id 70, seq 2, length 1477
00:51:28.806175 IP 10.1.1.100 > 10.1.0.102: ICMP echo request, id 70, seq 3, length 1477

(Forgive the different time-zones)

This seems to have been present already before the recently merged PR #4535; I just tested a build of commit f887c93.

Any ideas? Is this another quirk of the DGS-1210 bootloader initialization?
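
For anyone who wants to reproduce this, the sizes in the dumps can be translated into ping payload sizes. tcpdump prints the ICMP length (8-byte ICMP header plus payload) on these lines, so the IP packet is 20 bytes longer, assuming an IPv4 header without options. The helper names below are made up for illustration.

```shell
# Header sizes in bytes (IPv4 without options, ICMP echo header).
IP_HDR=20
ICMP_HDR=8

# IP packet length for the "length N" that tcpdump prints on an ICMP line.
ip_len_from_tcpdump() {
    echo $(( $1 + IP_HDR ))
}

# ICMP payload (the value for ping -s) that yields a given IP packet length.
ping_size_for_ip_len() {
    echo $(( $1 - IP_HDR - ICMP_HDR ))
}

ip_len_from_tcpdump 1476     # prints 1496, the last size that gets a reply
ip_len_from_tcpdump 1477     # prints 1497, the first size whose reply is lost
ping_size_for_ip_len 1496    # prints 1468

# From the desktop (Linux iputils ping; -M do sets the don't-fragment bit):
#   ping -M do -c 3 -s 1468 10.1.0.102
#   ping -M do -c 3 -s 1469 10.1.0.102
```

This matches the observation above: replies with an IP length of up to 1496 bytes arrive, while the 1497-byte reply shows up in the dump on the switch but never reaches the router.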

Is the RTL93xx supported in OpenWrt at this time? We are looking at a switch with an RTL9311 chip in it, and are also evaluating a model with an RTL9301 chip. What do you think about these two chips? Which of the two is currently supported and will work for basic switching today?

This seems to be a longer standing problem. It might even be a silicon bug. The outgoing buffer in fact has a size of 1600 bytes and there is no other limit set as far as I know. Someone stated that lowering the MTU to 1400 and then up again might work. Could you try that? I'll also dig into whether there might be any other limits imposed e.g. by the rate control system.

Does not seem to help.

I have been investigating further. On the RTL93xx platform, packets with sizes of 1496 and larger are automatically fragmented by the NIC. I will dig further to find any size limitations in the configuration, but there is a possibility that this 1496-byte limit is a limitation of the NIC.

Related to Paul Fertser's restart patch on the mailing list, I've written a driver for the watchdog peripheral that can also restart the system.

Driver is included only for the 5.10 kernel. Feel free to test and report back! If your device hangs on reboot, you may need to add "reboot=warm" or "reboot=soft" to the bootargs.
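
To see whether one of these options is already active on a running system, the kernel command line can be inspected. A minimal sketch: reboot_mode is a hypothetical helper, and "default" here only means that no reboot= argument was found.

```shell
# Print the reboot= mode found in a kernel command line string,
# or "default" if the command line carries no reboot= argument.
reboot_mode() {
    for arg in $1; do
        case "$arg" in
            reboot=*) echo "${arg#reboot=}"; return 0 ;;
        esac
    done
    echo "default"
}

# On the device:
#   reboot_mode "$(cat /proc/cmdline)"
```

If this prints "default" and the device hangs on reboot, adding one of the two options to the bootargs is the next thing to try.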

I just tested the DGS-1210-28 with the stock firmware: There I do receive back 1500 byte ping responses from the switch, also from ports with a VLAN tag (I saw ethernet frames going out with 1514 bytes without VLAN tags and 1518 bytes tagged).

However, I couldn't elicit responses with big packets from e.g. the web UI for a clear-cut result. I'm not sure why, but I guess the ICMP replies must originate from the CPU port anyway and cannot be offloaded somehow.

The pings will also come from the CPU port. I didn't think it would be possible to do this kind of test on the original firmware, great you figured this out. I'll continue to dig into any settings that might limit the packet size, then.

Another datum: It occurred to me that on my Netgear GS108T v3 with OpenWrt 21.02.0 I also never noticed any issues talking to the device itself, so I just tested there as well and can also receive back packets with MTU 1500 just fine, with and without VLAN tags.