Maybe a bit of clarification: for the RTL839x and RTL93xx we do actually have configuration routines in the network driver that set up automatic handling of the port LEDs. They are all quite similar. The following function exists for all three of these SoC families:
It takes up to 4 magic configuration values (led_setX) from the .dts, which define how a particular set of LEDs is accessed serially through an RTL8231. Each port can then be made part of an LED set via e.g.:
led-set = <0>;
Additional information such as the type of PHY is taken from the PHY configuration, which is used to define all fiber ports: RTL930X_LED_PORT_FIB_SET_SEL_CTRL. This only allows setting up automatic LED handling, steered by the SoC polling the PHYs via MDIO. The SoC is able to associate up to 4 LEDs with a port, which are used to encode the type of link, e.g. 100MBit/1G/2.5G/5G and 10G. Normally multiple LEDs are used for one link type: for example on the XGS1250, 5G is pink, which is blue plus orange, while 100M is orange alone and 10G is blue.
This configuration also happens on the RTL839x. The values set by u-boot are overwritten, but with exactly the same values that u-boot uses.
The RTL838x has a slightly different way of doing things, but the general way of working is very similar: set up some registers with port<->LED relationships and configure the way the serial lines are wired with some magic values. For this, there actually was a driver, as part of the SoC's GPIO driver, which provided a GPIO for each LED. The original GPIO driver even allowed steering the port LEDs from the kernel and turning them off in a dark environment. I am still using an old image for the Allnet SG8208M switch in my living room, as I really don't like LEDs flickering while watching a streamed movie.
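To make the colour coding concrete, here is a minimal sketch of how several LEDs combine into one visible link-type indication. It only models the three XGS1250 examples given above; the names LED_SETS, MIXED_COLOURS and visible_colour are made up for illustration and do not correspond to anything in the driver.

```python
# Hypothetical model of the XGS1250 colour coding described in the post.
# Only the three link speeds that the post names are included.
LED_SETS = {
    "100M": {"orange"},
    "5G": {"blue", "orange"},
    "10G": {"blue"},
}

# Colour seen by the user when several LEDs of one port are lit together.
MIXED_COLOURS = {frozenset({"blue", "orange"}): "pink"}


def visible_colour(speed):
    """Return the colour a user sees for the given link speed."""
    leds = frozenset(LED_SETS[speed])
    if len(leds) == 1:
        # A single lit LED shows its own colour.
        return next(iter(leds))
    return MIXED_COLOURS[leds]


if __name__ == "__main__":
    for speed in LED_SETS:
        print(speed, visible_colour(speed))
```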
Unfortunately, these features were sacrificed for the upstream GPIO driver. At least the automatic configuration for the LEDs to be SoC-controlled on the RTL838x could come back with a function similar to the ones for the other 3 SoC families, so that "rtk network on" would not be necessary to have such a basic feature of the switch working. Writing a GPIO driver on top of this would not be difficult, but my impression is that it would always be shot down or again ripped out by the "aesthetics over features" faction as there is really no way this can be done with aesthetically pleasing code.
I2C seems to work fine, the SFP module is detected (model and serial number are printed in kernel log), I can read the EEPROM using ethtool and hwmon also works. (There is some issue with duplicate entries in /sys/class/hwmon/ though. It looks like one gets added whenever the sfp line is printed in kernel log. Even when unplugging the module, the hwmon is not removed.)
The other GPIOs also appear to be working. In /sys/kernel/debug/gpio I can see that mod-def0 goes high when a module is inserted, and los changes as expected when I toggle the port on the other end (a Cisco SG300-20 switch).
I don't think the TX disable pin is an issue. As long as the PHY handle is specified in the device tree, it is always possible to get into a state where the other end reports the link as up (which should mean that the laser is active).
However, with I2C/GPIO/PHY all specified in the device tree, the real issue is that the link is not detected as up on the OpenWrt side. The LED for the SFP port is on, and it even blinks in sync with the activity LED on the other end. But I can't see any packets being received, neither on the switch itself nor on any devices connected to the other ports. This behaviour is the same whether networking was enabled in the bootloader or not.
The only difference when networking was not enabled in the bootloader is that I need to switch the media type using ethtool -s lan20 port fibre first. (For some reason it is also necessary to run ifconfig lan20 down and ifconfig lan20 up afterwards, not sure if this is expected? This is only necessary when setting the media type for the first time. Running ethtool -s lan20 port [tp|fibre] again takes effect immediately, i.e. the LED state changes and the port state on the other end also updates as expected.)
What happens when packets are only being received, i.e. if you ping the device but don't send anything out? Does it continue to blink in sync with the other side? This would mean the packets are received at the MAC layer of the RTL8214FC (OK, MAC is probably the wrong word for that PHY, but there is something in the PHY that translates e.g. a 1000BX connection to 1/4 of a QSGMII link and can see IPGs to control the LEDs) or the SoC, depending on who controls the LEDs. The RTL8214FC can in principle do that, but it is quite unlikely this is the case in combination with an RTL838x. Maybe you can double-check where the SFP port LEDs lead to?
If the LEDs are indeed controlled by the RTL8214FC, then there is an issue with the link between the SerDes of the SoC and the SerDes of the PHY. That would be strange, as there is really not much that is configurable: both always talk QSGMII with each other. It is much more likely that the SoC controls the LEDs, and then you would have an issue with your switch settings (in particular L2) if the LED blinks but you don't see packets arriving. In that case have a look at the drop counters in /sys/kernel/debug/rtl838x/drop_counters . They get cleared with every read. If the counters see the packets being dropped, then the link to the SoC is fine but there is a configuration issue with the switch logic, and the drop counter that increases should give you a hint as to what is wrong.
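Since the counters are cleared on every read, it helps to capture one read and filter it rather than cat the file repeatedly. Below is a rough sketch of that. The layout of the debugfs file is not described in this thread, so the "NAME: value" (or "NAME value") line format is purely an assumption, as is the counter name EXAMPLE_OTHER_DROP; adjust the pattern to whatever the file really prints.

```python
import re

# Path as given in the post above.
DROP_COUNTERS = "/sys/kernel/debug/rtl838x/drop_counters"

# ASSUMPTION: one counter per line, "NAME: value" or "NAME value".
COUNTER_RE = re.compile(r"\s*(\w+)(?:\s*:\s*|\s+)(\d+)\s*$")


def nonzero_counters(text):
    """Return {name: count} for every counter with a non-zero value."""
    counters = {}
    for line in text.splitlines():
        match = COUNTER_RE.match(line)
        if match and int(match.group(2)) > 0:
            counters[match.group(1)] = int(match.group(2))
    return counters


def read_drop_counters(path=DROP_COUNTERS):
    """Read the file exactly once (reading clears the counters)."""
    with open(path) as handle:
        return nonzero_counters(handle.read())


if __name__ == "__main__":
    # Made-up sample text, not real output from a switch.
    sample = "STP_IGR_DROP: 17\nEXAMPLE_OTHER_DROP: 0\n"
    print(nonzero_counters(sample))
```

On the device you would call read_drop_counters() once, send some traffic, and call it again; whatever shows up in the second result was dropped in between.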
I could finally test the U-Boot hack to turn on the port LEDs on the Netgear GS308Tv1 and it is working properly. I will submit a separate PR for making the U-Boot partition writable on the 3xx gigabit devices.
The complete command for this mod within OpenWrt is: fw_setenv bootcmd rtk network on\; boota
The fact that no packets are being sent, plus STP_IGR_DROP, sounds to me like an issue with the L2 configuration, not with the configuration of the PHY. If nothing is being sent, this can also mean that the switch does not know that packets need to be sent over a particular link. That correlates with packets being dropped on ingress because STP is not set up correctly (my interpretation of the drop counter's name; there is no documentation on this).
So: what does the forwarding database look like, and what happens if you put a static STP entry into it with the right port and destination?
I think that's a bit invasive and opaque. With the present changes in the tree, at least the power LEDs seem to be functional, and code-wise it seems possible to get the port LEDs going (with DTS and/or driver changes).
Amazed that OpenWrt can be used on switches these days, I received a Zyxel XGS1250-12 yesterday. I managed to install OpenWrt, but I'm having boot issues. In approx. 3/4 of the boot attempts, the rtl9300 kernel module gets stuck at a calibration step.
Thanks for reporting this. Interesting that the link calibration can actually fail completely.
Do you see a similar problem with the image you can find here: https://famko.zapto.org/s/aAgmnwJeFwqBp5i
It includes several improvements to the calibration code, which should also add support for the SFP+ cage.
I am sorry! That is not good. I assume you did not change the environment variables in the bootloader to allow interrupting the boot process? I believe someone else fell into that trap when the testing was done for the initial support, and I think the conclusion was that you now need to edit the u-boot configuration directly in the flash memory. Fortunately the chip is supported by flashrom. You need a SOIC clamp and a flash adapter; a bundle with an adapter based on the CH341A, including the clamp, is about 12 euros on e.g. Amazon. Changing the bootcmd to anything that does not exist will drop you onto the u-boot prompt. If you already have a clamp, a Raspberry Pi can also be used for flashing.
Is the right one, I guess; the chip inscription says it's a Macronix MX 25L128338F. The procedure is straightforward: copy the entire chip to your workstation, then use e.g. hexer to edit the text that says "bootcmd=boota" (could also be just boot), changing it into e.g. xoota. The boot will then be interrupted next time, so that you can set the environment variables as you like. You are also asked for a password when you drop to the prompt, but it is empty.
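If you would rather script the edit than do it by hand in a hex editor, something along these lines should do the same thing. This is only a sketch: the file names dump.bin and patched.bin are placeholders, the function name patch_bootcmd is made up, and it does not touch any checksum the environment block may carry. It simply performs the same-length text replacement described above.

```python
def patch_bootcmd(image, old=b"bootcmd=boota", new=b"bootcmd=xoota"):
    """Return a copy of the flash image with the boot command renamed.

    The replacement must have the same length as the original so that
    nothing else in the image shifts position.
    """
    if len(old) != len(new):
        raise ValueError("replacement must keep the image size unchanged")
    if image.count(old) == 0:
        # The post notes the variable may also read just "boot".
        raise ValueError("pattern not found in image")
    # Replaces every occurrence of the pattern in the dump.
    return image.replace(old, new)


if __name__ == "__main__":
    # Placeholder file names; keep the untouched dump as a backup.
    with open("dump.bin", "rb") as src:
        original = src.read()
    patched = patch_bootcmd(original)
    with open("patched.bin", "wb") as dst:
        dst.write(patched)
```

Keep the original dump somewhere safe before writing anything back to the chip.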
bridge link shows that the bridge port stays in disabled state, so this explains why packets are not received and the STP counter increases.
22: lan20@eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 master switch state disabled priority 32 cost 100
According to the kernel log, the bridge port never reaches the forwarding state; it remains in the blocking or disabled state the entire time. The link status of the interface also always remains at NO-CARRIER / LOWERLAYERDOWN, and ethtool always prints Link detected: no. I assume the link state not being reported correctly is the actual issue here?
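For anyone who wants to watch this from a script, here is a small sketch that pulls the port name, flags and STP state out of a bridge link line like the one quoted above. The regular expression is written against that single sample line only, so treat the pattern and the helper name parse_bridge_link as assumptions rather than a complete parser for the tool's output.

```python
import re

# Written against the one sample line from this thread; other layouts
# of `bridge link` output may need a different pattern.
BRIDGE_LINK_RE = re.compile(
    r"^\d+:\s+(?P<port>[^:@\s]+)(?:@(?P<parent>[^:\s]+))?:\s+"
    r"<(?P<flags>[^>]*)>.*?\bstate\s+(?P<state>\S+)"
)


def parse_bridge_link(line):
    """Return port, flags and STP state, or None if the line is unknown."""
    match = BRIDGE_LINK_RE.match(line.strip())
    if match is None:
        return None
    return {
        "port": match.group("port"),
        "flags": match.group("flags").split(","),
        "state": match.group("state"),
    }


if __name__ == "__main__":
    sample = ("22: lan20@eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 "
              "master switch state disabled priority 32 cost 100")
    info = parse_bridge_link(sample)
    # A port with NO-CARRIER set is not expected to leave the disabled state.
    print(info["port"], info["state"], "NO-CARRIER" in info["flags"])
```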
I did some testing with the bridge removed entirely from the network config. Then I can see received packets with tcpdump. Sending doesn't work, no packets appear in tcpdump.
(Another thing I tried configuring is fixed-link in the device tree for the SFP ports along with removing the managed property. That makes this particular issue go away, but then it isn't possible to control the state of the ports anymore. This includes changing the media type, so there is no way to make the SFP ports work when the bootloader didn't initialize networking.)