Support for RTL838x based managed switches

I believe @svanheule has some code lying around for driving the LEDs, but I'm unsure about its state. The GS108T v3 suffers from the same problem.

I do, but it's pretty stale and depends on an MFD implementation of the SoC's switchcore:


So, here finally is the answer to your issue: it is an interoperability issue between the RTL8218B PHY (internal to the RTL838x SoC) or external RTL8218B/D PHYs and several other PHYs.

Unfortunately, it does not seem to be fixable. I completely rewrote all of the EEE code; to test it, you need to apply:

6fbc7c5001 realtek: Improve EEE suport in PHY drivers
0035dae116 realtek: Add EEE support for RTL931X
3a1ad31315 realtek: improve EEE handling
bf755b2120 realtek: Fix bugs in MMD-register access on RTL838x

from my rtl8214qf_merge branch. In particular, it adds support for displaying LPI timers and uses standard MMD register reads/writes to set up EEE and read its status. All of this is now based on the SDK 4 code rather than on the bits and pieces originally found in the U-Boot source. Unfortunately, there is a comment in the RTL8218B driver which says:

     * For Marvell EEE IOP issue, Need to take care to RTL8218B

but the code that should follow it is missing. What happens is that autonegotiation of the EEE capabilities fails, as witnessed by:

Link partner advertised EEE link modes:  Not reported

I tested the OEM firmware on the TP-Link T2500G-10TS, which has the same hardware as the GS1900-10HP, and saw the same behaviour.
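To check many ports for this failure mode, the check on the ethtool output quoted above can be scripted. Below is a minimal POSIX sh sketch; eee_partner_status is a hypothetical helper name, and it only assumes the "Link partner advertised EEE link modes:" line that ethtool --show-eee prints, as shown above.

```sh
#!/bin/sh
# Minimal sketch: classify the "Link partner advertised EEE link modes:"
# line of `ethtool --show-eee IFACE` output read from stdin.
# Exit status: 0 = partner modes reported, 1 = "Not reported",
#              2 = the line was not found at all.
eee_partner_status() {
    awk '
        /Link partner advertised EEE link modes:/ {
            found = 1
            if ($0 ~ /Not reported/) notrep = 1
        }
        END {
            if (!found) exit 2
            if (notrep) exit 1
            exit 0
        }
    '
}
```

Usage: ethtool --show-eee lan1 | eee_partner_status || echo "lan1: link partner EEE modes missing". Any non-zero exit flags the port.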

So I did my own interoperability (IOP) testing:


Switch (OpenWrt):  GS1900-10HP  XGS1210    XGS1210          Edgecore          XGS1250
Link partner       RTL8218B     RTL8218D   RTL8226          RTL8221B          AQR113C
RTL8218B           OK           OK         OK               OK                No link
RTL8218D           OK           OK         OK               OK                No link
RTL8221B           OK           OK         -                -                 OK, 2.5G n.r.
RTL8226            OK           OK         OK               -                 -
AQR113C            No link      No link    OK               OK, 2.5G n.r.     -
AQC100             OK [1]       OK [1]     OK, 2.5G n.r.    -                 OK, 2.5/5G n.r.
EU-4307 [2]        OK           OK         OK, 2.5G n.r.    -                 -

n.r. = EEE mode not reported; - = no result.
[1] Initially OK, but after unplugging the cable the remote side needs to be switched off and on again.
[2] Edimax EU-4307 with the R8169 driver: no EEE API in 5.10, but EEE is on by default.

Columns: switches running OpenWrt with the above patches.
Rows: various link partners running OEM firmware or desktop Linux. The AQC100 sits on an Intel PCI module in my desktop; the EU-4307 is a USB adapter. Both are on Debian testing with kernel 5.10.

It looks like Realtek is compatible with itself. The AQC100 from Marvell does not like the RTL8218B/D: it requires disabling and re-enabling it whenever the cable is unplugged, as you reported for the Pi.
The AQR113C has trouble with OpenWrt. Note that we are using the default 5.10 driver (5.18 does not change this). In principle, we have much improved SDK code available to fix this. Up to current kernels, Linux has very poor EEE support for the 2.5/5 Gbit modes.


Did some limited testing, on my Netgear GS108Tv3 only. I cherry-picked the listed commits onto current master, but ended up just copying the PHY driver from your branch instead of fighting all the conflicts.

I only have remote access at the moment, so I can't unplug cables. I also cannot reboot any of the already connected devices (one is my server/router and the other is the PoE switch powering the Netgear). But the tests I've done look good. I'm unable to get into the state where the reported link partner modes are wrong or missing.

And whatever you did seems to have fixed the bug related to --show-eee on lan1 on the GS108Tv3. I can now do that without breaking networking.

This looks very good to me so far.


Hm, that doesn't look as if support for the port LEDs is just a missing DTS entry... I don't quite understand why the LEDs are not hardware-controlled on this device. Is there no option to enable hardware control apart from your WIP branch?

The control registers for the port LEDs are in the switchcore, and since that's a terrible mess of registers it's difficult/impossible to write an independent driver for it.

One possible alternative is to modify bootcmd in the U-Boot environment. If you prefix it with "rtk network on", the switch will always be initialised by the bootloader, which enables the port LEDs.
For example, if bootcmd=boota:

setenv bootcmd rtk network on\; boota

Or the equivalent operation to set bootcmd from OpenWrt.
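On the OpenWrt side, building the new value can be separated from writing it, so it can be checked first. A minimal POSIX sh sketch; prefixed_bootcmd and RTK_PREFIX are hypothetical names, and the helper only prepends "rtk network on; " when bootcmd does not already start with it.

```sh
#!/bin/sh
# Minimal sketch: build a bootcmd that runs "rtk network on" first, so the
# bootloader initialises the switch (and with it the port LEDs).
# Idempotent: a bootcmd that already starts with the prefix is kept as-is.
RTK_PREFIX='rtk network on; '

prefixed_bootcmd() {
    case "$1" in
        "$RTK_PREFIX"*) printf '%s\n' "$1" ;;
        *)              printf '%s%s\n' "$RTK_PREFIX" "$1" ;;
    esac
}
```

On the device, assuming the U-Boot environment partition is writable and fw_printenv/fw_setenv are installed, this could be used as: fw_setenv bootcmd "$(prefixed_bootcmd "$(fw_printenv -n bootcmd)")". Quoting the value in the shell takes the place of the \; escaping needed at the U-Boot prompt.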


Would/should this work on the GS108T v3 too?

Seems to me it should work on any device. As long as "rtk network on" initialises the port LEDs, they will behave the same when Linux is running. Our drivers don't touch the relevant registers, at least not on RTL838x and RTL839x, IIRC.


Just tested, and indeed it does bring the LEDs to life. For that the u-boot env partition needs to be writable though :stuck_out_tongue:. Sent in a patch for that :slight_smile:.

Is this just for the port LEDs? If so, the GS308T and GS310TP DTS files are still missing support for the status LED. The LED code for the GS110TPP may work for the GS308T and GS310TP status LEDs. I may have time to try that this weekend.

I already did that for the GS308T (read a few posts back) and created a PR! I will try the U-Boot hack for the port LEDs this weekend.

Ref: https://github.com/openwrt/openwrt/pull/10065

Maybe a bit of clarification: for the RTL839x and RTL93xx, we do actually have configuration routines in the network driver to set up automatic handling of the port LEDs. They are all quite similar. The following function exists for all three of these SoC families:

It takes up to 4 magic configuration values (led_setX) from the .dts, which define how a particular set of LEDs is accessed serially through an RTL8231. Each port can then be made part of an LED set, e.g. via:

	led-set = <0>;

Additional information, such as the type of PHY, is taken from the PHY configuration that defines all fibre ports (RTL930X_LED_PORT_FIB_SET_SEL_CTRL). This only supports automatic LED handling driven by the SoC polling the PHYs via MDIO. The SoC can associate up to 4 LEDs with a port, which are used to encode the link type, e.g. 100M/1G/2.5G/5G/10G. Normally multiple LEDs are combined for one link type: on the XGS1250, for example, 5G is pink (blue plus orange), while 100M is orange alone and 10G is blue.

This configuration also happens on the RTL839x. The values set by U-Boot are overwritten, but with exactly the same values U-Boot uses.

The RTL838x does things slightly differently, but the general approach is very similar: set up some registers with port<->LED relationships and configure how the serial lines are wired using some magic values. For this there actually was a driver, part of the SoC's GPIO driver, that provided a GPIO for each LED. The original GPIO driver even allowed controlling the port LEDs from the kernel, e.g. to turn them off in a dark environment. I am still using an old image for the Allnet SG8208M switch in my living room, as I really don't like LEDs flickering while watching a streamed movie.

Unfortunately, these features were sacrificed for the upstream GPIO driver. At least the automatic configuration of SoC-controlled LEDs on the RTL838x could come back via a function similar to those for the other three SoC families, so that "rtk network on" would no longer be needed to get such a basic feature of the switch working. Writing a GPIO driver on top of this would not be difficult, but my impression is that it would always be shot down, or ripped out again, by the "aesthetics over features" faction, as there is really no way to do this with aesthetically pleasing code.

I2C seems to work fine: the SFP module is detected (model and serial number are printed in the kernel log), I can read the EEPROM using ethtool, and hwmon also works. (There is some issue with duplicate entries in /sys/class/hwmon/, though. It looks like a new one gets added whenever the SFP line is printed in the kernel log, and the hwmon entry is not removed even when the module is unplugged.)
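To see whether the hwmon duplicates accumulate over plug/unplug cycles, one could list hwmon names that occur more than once. A minimal POSIX sh sketch; duplicate_hwmon_names is a hypothetical helper, and it assumes each hwmonN directory exposes the standard name attribute and that the duplicates share the same name (an assumption, not verified on this device).

```sh
#!/bin/sh
# Minimal sketch: print hwmon device names that occur more than once.
# The sysfs base directory is a parameter (default /sys/class/hwmon) so the
# function can also be tried on a copy or a test directory.
duplicate_hwmon_names() {
    base=${1:-/sys/class/hwmon}
    for d in "$base"/hwmon*; do
        # Skip the unexpanded glob and entries without a readable name file.
        [ -r "$d/name" ] || continue
        cat "$d/name"
    done | sort | uniq -d
}
```

Running duplicate_hwmon_names before and after re-inserting the module shows whether a name starts appearing twice.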

The other GPIOs also appear to be working. In /sys/kernel/debug/gpio I can see that mod-def0 goes hi when a module is inserted, and los changes as expected when I toggle the port on the other end (a Cisco SG300-20 switch).
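When watching these pins while inserting modules or toggling the remote port, it helps to filter the debugfs dump down to the SFP lines. A minimal POSIX sh sketch; sfp_gpio_lines is a hypothetical helper, and it assumes the SFP driver's GPIO consumer labels match the device-tree property names (mod-def0, los, tx-fault, tx-disable, rate-select0).

```sh
#!/bin/sh
# Minimal sketch: keep only the SFP-related lines of the GPIO debugfs dump
# (read from stdin). Assumption: the SFP driver's GPIO consumer labels match
# the device-tree property names mod-def0, los, tx-fault, tx-disable and
# rate-select0. -w avoids matching e.g. "los" inside "close".
sfp_gpio_lines() {
    grep -wE 'mod-def0|los|tx-fault|tx-disable|rate-select0'
}
```

Usage: sfp_gpio_lines < /sys/kernel/debug/gpio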

I don't think the TX disable pin is an issue. As long as the PHY handle is specified in the device tree, it is always possible to get into a state where the other end reports the link as up (which should mean that the laser is active).

However, with I2C/GPIO/PHY all specified in the device tree, the real issue is that the link is not detected as up on the OpenWrt side. The LED for the SFP port is on, and it even blinks in sync with the activity LED on the other end. But I can't see any packets being received, neither on the switch itself nor on any devices connected to the other ports. This behaviour is the same whether networking was enabled in the bootloader or not.

The only difference when networking was not enabled in the bootloader is that I first need to switch the media type using ethtool -s lan20 port fibre. (For some reason it is also necessary to run ifconfig lan20 down and ifconfig lan20 up afterwards; I'm not sure whether this is expected. This is only necessary when setting the media type for the first time. Running ethtool -s lan20 port [tp|fibre] again takes effect immediately, i.e. the LED state changes and the port state on the other end also updates as expected.)
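The first-time sequence described above can be captured in a small POSIX sh helper. This is a sketch: set_combo_media is a hypothetical name, it uses ip link set in place of ifconfig (same down/up effect), and RUN is a hypothetical dry-run hook (RUN=echo prints the commands instead of running them).

```sh
#!/bin/sh
# Minimal sketch: select the media type of a combo port, then bounce the
# interface so the change takes effect the first time.
# RUN is a hypothetical dry-run hook: RUN=echo prints instead of executing.
set_combo_media() {
    iface=$1
    media=$2          # "fibre" or "tp", as accepted by ethtool -s ... port
    case "$media" in
        fibre|tp) ;;
        *) echo "unsupported media: $media" >&2; return 1 ;;
    esac
    $RUN ethtool -s "$iface" port "$media" || return 1
    $RUN ip link set dev "$iface" down || return 1
    $RUN ip link set dev "$iface" up
}
```

Usage: set_combo_media lan20 fibre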

What happens with packets that are only received, i.e. if you ping the device but don't send anything out? Does the LED continue to blink in sync with the other side? That would mean the packets are received at the MAC layer of the RTL8214FC (OK, MAC is probably the wrong word for that PHY, but there is something in the PHY that translates e.g. a 1000BX connection to 1/4 of a QSGMII link and can see IPGs to control the LEDs) or of the SoC, depending on who controls the LEDs. The RTL8214FC can in principle do that, but it is quite unlikely in combination with an RTL838x. Maybe you can double-check where the SFP port LEDs are wired to?

If the LEDs are indeed controlled by the RTL8214FC, then there is an issue with the link between the SerDes of the SoC and the SerDes of the PHY. That would be strange, as there is really not much that is configurable: both always talk QSGMII with each other. It is much more likely that the SoC controls the LEDs, in which case you have an issue with your switch settings (in particular L2) if the LED blinks but you don't see packets arriving. In that case, have a look at the drop counters in /sys/kernel/debug/rtl838x/drop_counters. They are cleared with every read. If the counters show the packets being dropped, the link to the SoC is fine but there is a configuration issue with the switch logic, and the drop counter that increases should give you a hint about what is wrong.
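Because the counters are cleared on every read, sampling the file repeatedly while the other end sends traffic shows which counter grows in each interval. A minimal POSIX sh sketch; sample_drop_counters is a hypothetical helper, and it prints the file verbatim since its exact format is not described here.

```sh
#!/bin/sh
# Minimal sketch: read the drop counters COUNT times, INTERVAL seconds apart.
# Since the file is cleared on each read, every sample only contains the
# drops since the previous sample. The file format is not assumed.
sample_drop_counters() {
    file=${1:-/sys/kernel/debug/rtl838x/drop_counters}
    count=${2:-5}
    interval=${3:-1}
    i=0
    while [ "$i" -lt "$count" ]; do
        i=$((i + 1))
        [ "$i" -gt 1 ] && sleep "$interval"
        echo "--- sample $i ---"
        cat "$file" || return 1
    done
}
```

Usage: run sample_drop_counters /sys/kernel/debug/rtl838x/drop_counters 10 1 while pinging the switch from the remote end.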

I could finally test the U-Boot hack to turn on the port LEDs on the Netgear GS308Tv1 and it is working properly. I will submit a separate PR for making the U-Boot partition writable on the 3xx gigabit devices.

The complete command for this mod within OpenWrt is: fw_setenv bootcmd rtk network on\; boota


What I2C driver package do you use for the communication with the SFP modules?

The LED blinks when the other end is sending packets. It doesn't blink when I try to send packets from the switch (I tested using broadcast ping, which makes the LEDs for non-SFP ports blink).

It looks like all port LEDs are connected to the same RTL8231 (pictures of LED board are in the wiki).

STP_IGR_DROP increases when packets are transmitted on the other end.

The fact that no packets are being sent, plus STP_IGR_DROP, sounds to me like an issue with the L2 configuration rather than with the PHY configuration: if nothing is being sent, the switch may simply not know that packets need to go out over a particular link, and this correlates with packets being dropped on ingress because STP is not set up correctly (that is my interpretation of the drop counter's name; there is no documentation on this).
So: what does the forwarding database look like, and what happens if you put a static STP entry into it with the right port and destination?
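One way to try this from Linux is iproute2's bridge tool, reading the suggestion above as a static forwarding-database (FDB) entry. This is a sketch: show_port_fdb, add_static_fdb and the RUN dry-run hook are hypothetical, lan20 is just the port from this thread, and whether a static entry on the Linux bridge is offloaded into the RTL838x L2 table depends on the DSA driver and is not verified here.

```sh
#!/bin/sh
# Minimal sketch: inspect the FDB of one port and add a static entry for a
# destination MAC via iproute2's `bridge` tool.
# RUN is a hypothetical dry-run hook: RUN=echo prints instead of executing.
show_port_fdb() {
    $RUN bridge fdb show dev "$1"
}

add_static_fdb() {
    mac=$1
    port=$2
    # Reject anything that is not six colon-separated hex octets.
    printf '%s\n' "$mac" | grep -qiE '^([0-9a-f]{2}:){5}[0-9a-f]{2}$' || {
        echo "invalid MAC: $mac" >&2
        return 1
    }
    $RUN bridge fdb add "$mac" dev "$port" master static
}
```

Usage: show_port_fdb lan20, then add_static_fdb <destination MAC> lan20 and watch whether STP_IGR_DROP still increases.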

Would it be acceptable to include a preinit script that runs this command to modify the U-Boot environment on the affected devices? If so, I might work on a patch for that.

I think that's a bit invasive and opaque. With the present changes in the tree, at least the power LEDs seem to be functional, and code-wise it seems possible to get the port LEDs going (with DTS and/or driver changes).