Support for RTL838x based managed switches

Maybe a bit of clarification: for the RTL839x and RTL93xx we actually do have configuration routines in the network driver that set up automatic handling of the port LEDs. They are all quite similar. The following function exists for all 3 of these SoC families:

What it does is take up to 4 magic configuration values (led_setX) from the .dts, which define how a particular set of LEDs is accessed serially through an RTL8231. Each port can then be made part of an LED set via e.g.:

	led-set = <0>;

Additional information like the type of PHY is taken from the PHY configuration used to define all fiber ports: RTL930X_LED_PORT_FIB_SET_SEL_CTRL. This only allows setting up automatic LED handling, steered by the SoC polling the PHYs via MDIO. The SoC is able to associate up to 4 LEDs with a port, which are used to encode the type of link, e.g. 100MBit/1G/2.5G/5G and 10G. Normally multiple LEDs are used for one link type; for example, on the XGS1250, 5G is pink, which is blue plus orange, while 100M is orange alone and 10G is blue.
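To illustrate the idea of encoding a link type as a combination of LEDs, here is a minimal Python sketch. It only covers the three XGS1250 speeds named above; the bit values and the names Led, SPEED_TO_LEDS and lit_leds are made up for illustration and are not the SoC's register layout.

```python
from enum import IntFlag


class Led(IntFlag):
    # Hypothetical bit assignment, NOT the real register layout.
    ORANGE = 1
    BLUE = 2


# Only the speeds whose colours are mentioned in the post above.
SPEED_TO_LEDS = {
    "100M": Led.ORANGE,
    "5G": Led.ORANGE | Led.BLUE,  # orange + blue is seen as pink
    "10G": Led.BLUE,
}


def lit_leds(speed):
    """Return the names of the LEDs that are lit for a given link speed."""
    mask = SPEED_TO_LEDS[speed]
    return sorted(led.name for led in (Led.ORANGE, Led.BLUE) if led in mask)


if __name__ == "__main__":
    for speed in SPEED_TO_LEDS:
        print(speed, lit_leds(speed))
```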

This configuration also happens on the RTL839x. The values from u-boot are overwritten, but with exactly the same values u-boot itself uses.

The RTL838x has a slightly different way of doing things, but the general way of working is very similar: set up some registers with port<->LED relationships and configure the way the serial lines are wired with some magic values. For this, there actually was a driver, as part of the SoC's GPIO driver, which provided a GPIO for each LED. The original GPIO driver even allowed steering the port LEDs from the kernel and turning them off in a dark environment. I am still using an old image for the Allnet SG8208M switch in my living room, as I really don't like LEDs flickering while watching a streamed movie.

Unfortunately, these features were sacrificed for the upstream GPIO driver. At least the automatic configuration for the LEDs to be SoC-controlled on the RTL838x could come back with a function similar to the ones for the other 3 SoC families, so that "rtk network on" would not be necessary to have such a basic feature of the switch working. Writing a GPIO driver on top of this would not be difficult, but my impression is that it would always be shot down or again ripped out by the "aesthetics over features" faction as there is really no way this can be done with aesthetically pleasing code.

I2C seems to work fine, the SFP module is detected (model and serial number are printed in kernel log), I can read the EEPROM using ethtool and hwmon also works. (There is some issue with duplicate entries in /sys/class/hwmon/ though. It looks like one gets added whenever the sfp line is printed in kernel log. Even when unplugging the module, the hwmon is not removed.)

The other GPIOs appear also to be working. In /sys/kernel/debug/gpio I can see that mod-def0 becomes hi when a module is inserted, and los changes as expected when I toggle the port on the other end (a Cisco SG300-20 switch).

I don't think the TX disable pin is an issue. As long as the PHY handle is specified in the device tree, it is always possible to get into a state where the other end reports the link as up (which should mean that the laser is active).

However, with I2C/GPIO/PHY all specified in the device tree, the real issue is that the link is not detected as up on the OpenWrt side. The LED for the SFP port is on, and it even blinks in sync with the activity LED on the other end. But I can't see any packets being received, neither on the switch itself nor on any devices connected to the other ports. This behaviour is the same whether networking was enabled in the bootloader or not.

The only difference when networking was not enabled in the bootloader is that I need to switch the media type using ethtool -s lan20 port fibre first. (For some reason it is also necessary to run ifconfig lan20 down and ifconfig lan20 up afterwards, not sure if this is expected? This is only necessary when setting the media type for the first time. Running ethtool -s lan20 port [tp|fibre] again takes effect immediately, i.e. the LED state changes and the port state on the other end also updates as expected.)

What happens when packets are only being received, i.e. if you ping the device but don't send anything out? Does the LED continue to blink in sync with the other side? This would mean the packets are received at the MAC layer of the RTL8214FC (OK, MAC is probably the wrong word for that PHY, but there is something in the PHY that translates e.g. a 1000BX connection to 1/4 of a QSGMII link and can see IPGs to control the LEDs) or of the SoC, depending on who controls the LEDs.

The RTL8214FC can in principle do that, but it is quite unlikely that this is the case in combination with an RTL838x. Maybe you can double-check where the SFP port LEDs lead to? If the LEDs are indeed controlled by the RTL8214FC, then there is an issue with the link between the SerDes of the SoC and the SerDes of the PHY. That would be strange, as there is really not much that is configurable: both always talk QSGMII with each other.

It is much more likely that the SoC controls the LEDs, and then you would have an issue with your switch settings (in particular L2) if the LED blinks but you don't see packets arriving. In that case have a look at the drop counters in /sys/kernel/debug/rtl838x/drop_counters. They get cleared with every read. If the counters see the packets being dropped, then the link to the SoC is fine but there is a configuration issue with the switch logic, and the drop counter that increases should give you a hint about what is wrong.
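Since the counters are cleared on every read, a single read already shows what was dropped since the previous read. A rough Python sketch for picking out the non-zero counters could look like the following. Note that the line format assumed here ("NAME: value" or "NAME value") is a guess, as the actual output of the debugfs file is not shown in this thread, and nonzero_counters is a made-up helper name.

```python
import re

# ASSUMPTION: one counter per line, an upper-case name followed by a
# decimal value, separated by ':' / '=' or whitespace. Adjust the
# pattern to whatever the debugfs file really prints.
COUNTER_LINE = re.compile(r"^\s*([A-Z0-9_]+)(?:\s*[:=]\s*|\s+)(\d+)\s*$")


def nonzero_counters(text):
    """Return {counter_name: value} for every counter that is not zero."""
    hits = {}
    for line in text.splitlines():
        match = COUNTER_LINE.match(line)
        if match and int(match.group(2)) > 0:
            hits[match.group(1)] = int(match.group(2))
    return hits


if __name__ == "__main__":
    path = "/sys/kernel/debug/rtl838x/drop_counters"
    with open(path) as f:
        f.read()  # first read only clears the counters
    input("Send some traffic from the other end, then press Enter...")
    with open(path) as f:
        print(nonzero_counters(f.read()))
```

The first read is thrown away so that the second one only reflects the traffic generated in between.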

I could finally test the U-Boot hack to turn on the port LEDs on the Netgear GS308Tv1 and it is working properly. I will submit a separate PR for making the U-Boot partition writable on the 3xx gigabit devices.

The complete command for this mod within OpenWrt is: fw_setenv bootcmd rtk network on\; boota

What I2C driver package do you use for the communication with the SFP modules?

The LED blinks when the other end is sending packets. It doesn't blink when I try to send packets from the switch (I tested using broadcast ping, which makes the LEDs for non-SFP ports blink).

It looks like all port LEDs are connected to the same RTL8231 (pictures of LED board are in the wiki).

STP_IGR_DROP increases when packets are transmitted on the other end.

The fact that no packets are being sent, plus STP_IGR_DROP increasing, sounds to me as if you have an issue with the L2 configuration, not with the configuration of the PHY. If nothing is being sent, this can also mean that the switch does not know that packets need to be sent over a particular link, and this correlates with packets being dropped on ingress because STP is not set up correctly (my interpretation of the drop counter's name; there is no documentation on this).
So: what does the forwarding database look like, and what happens if you put a static STP entry into it with the right port and destination?

Would including a preinit script to run this command to modify the uboot environment on the affected devices be acceptable? If so I might work on a patch to do that.

I think that's a bit invasive and opaque. With the present changes in the tree at least the power LEDs are functional, it seems, and code-wise it seems possible to get the port LEDs going (DTS and/or driver changes).

Hello forum users,

Amazed that OpenWrt can be used on switches these days, I received a Zyxel XGS1250-12 yesterday. I managed to install OpenWrt but I'm having boot issues. In approx. 3/4 of the boot attempts the kernel module rtl9300 gets stuck at a calibration step.

[    0.962644] REALTEK RTL9300 SERDES mdio-bus:1b: Detected internal RTL9300 Serdes
[    0.970962] rtl9300_configure_serdes: Port 27, SerDes is 9
[    0.981096] rtl9300_configure_serdes CMU BAND is 16
[    0.986513] rtl9300_sds_rst 31
[    1.009903] rtl9300_configure_serdes PATCHING SerDes 9
[    1.016640] rtl9300_phy_enable_10g_1g 1gbit phy: 00001140
[    1.022663] rtl9300_phy_enable_10g_1g 1gbit phy enabled: 00001140
[    1.030461] rtl9300_phy_enable_10g_1g 10gbit phy: 00002040
[    1.036551] rtl9300_phy_enable_10g_1g 10gbit phy after: 00002040
[    1.044251] rtl9300_phy_enable_10g_1g set medium: 00000002
[    1.050371] rtl9300_phy_enable_10g_1g set medium after: 00000002
[    1.077028] rtl9300_configure_serdes: Configuring RTL9300 SERDES 9, mode 1a
[    1.086792] rtl9300_serdes_mac_link_config: registers before 00000000 00001403
[    1.096852] rtl9300_serdes_mac_link_config: registers after 00000000 00001403
[    1.124785] rtl9300_force_sds_mode --------------------- serdes 9 forcing to 0 ...
[    1.133227] rtl9300_force_sds_mode: SDS: 9, mode 0
[    1.138581] rtl9300_force_sds_mode: SDS mode 1f
[    1.146610] rtl9300_force_sds_mode --------------------- serdes 9 forcing to 0 ...
[    1.155048] rtl9300_force_sds_mode: SDS: 9, mode 25
[    1.160497] rtl9300_force_sds_mode: SDS mode 1a
[    5.876146] rtl9300_force_sds_mode --------------------- serdes 9 forced to 1a DONE
[    5.884689] start_1.1.1 initial value for sds 9
[    5.917733] end_1.1.1 --
[    5.920548] start_1.1.2 Load DFE init. value
[    5.926285] end_1.1.2
[    5.928835] start_1.1.3 disable LEQ training,enable DFE clock
[    5.941243] end_1.1.3 --
[    5.944053] start_1.1.4 offset cali setting
[    5.949723] end_1.1.4
[    5.952242] start_1.1.5 LEQ and DFE setting
[    5.963890] end_1.1.5
[    5.973418] start_1.2.1 ForegroundOffsetCal_Manual
[    5.980764] end_1.2.1
[    5.988770] start_1.2.3 Foreground Calibration
[    6.002237] rtl9300_do_rx_calibration_2_3: fgcal_gray: 0, fgcal_binary 0
[    6.025687] rtl9300_do_rx_calibration_2_3: fgcal_gray: 0, fgcal_binary 0
[    6.049162] rtl9300_do_rx_calibration_2_3: fgcal_gray: 0, fgcal_binary 0
[    6.072619] rtl9300_do_rx_calibration_2_3: fgcal_gray: 0, fgcal_binary 0
[    6.096070] rtl9300_do_rx_calibration_2_3: fgcal_gray: 0, fgcal_binary 0
[    6.119536] rtl9300_do_rx_calibration_2_3: fgcal_gray: 0, fgcal_binary 0
[    6.142985] rtl9300_do_rx_calibration_2_3: fgcal_gray: 0, fgcal_binary 0
If it works, the log looks like this:
[    5.938764] end_1.1.4
[    5.941308] start_1.1.5 LEQ and DFE setting
[    5.952970] end_1.1.5
[    5.962497] start_1.2.1 ForegroundOffsetCal_Manual
[    5.969845] end_1.2.1
[    5.977829] start_1.2.3 Foreground Calibration
[    5.991329] rtl9300_do_rx_calibration_2_3: fgcal_gray: 10, fgcal_binary 10
[    6.000002] rtl9300_do_rx_calibration_2_3: end_1.2.3
[    6.005518] start_1.4.1
[    6.227107] end_1.4.1
[    6.229855] start_1.4.2

This happens with the 22.02 rc4 and with the current snapshot images.
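The difference between the two logs above can be checked mechanically. The following Python sketch classifies a kernel log by whether step 1.2.3 ever reports end_1.2.3, or whether it keeps printing fgcal_gray: 0, fgcal_binary 0. The function name calibration_status and the "more than one zero read" threshold are arbitrary choices for illustration, not something taken from the driver.

```python
import re

FGCAL = re.compile(
    r"rtl9300_do_rx_calibration_2_3: fgcal_gray: (\d+), fgcal_binary (\d+)"
)


def calibration_status(log):
    """Classify the foreground calibration (step 1.2.3) in a kernel log."""
    started = False
    zero_reads = 0
    for line in log.splitlines():
        if "start_1.2.3" in line:
            started = True
            zero_reads = 0
        elif "end_1.2.3" in line:
            return "completed"
        elif started:
            match = FGCAL.search(line)
            if match and match.group(1) == "0" and match.group(2) == "0":
                zero_reads += 1
    if not started:
        return "not reached"
    # Threshold is a guess: repeated all-zero reads without an end marker.
    return "stalled" if zero_reads > 1 else "unknown"


if __name__ == "__main__":
    import sys
    print(calibration_status(sys.stdin.read()))
```

It could be fed with e.g. the output of dmesg on a boot that did come up, or with a captured serial log of a boot that hung.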

Thanks for reporting this. Interesting that the link calibration can actually fail completely.
Do you see a similar problem with the image you can find here: https://famko.zapto.org/s/aAgmnwJeFwqBp5i
It contains several improvements to the calibration code and should also add support for the SFP+ cage.

I tried openwrt-realtek-rtl930x-zyxel_xgs1250-12-squashfs-sysupgrade from your link but now there is a kernel panic followed by a reboot.

[    0.731963] Probing RTL838X eth device pdev: 82076800, dev: 82076810
[    0.758554] Found SoC ID: 9302: RTL9302B, family 9300
[    0.764256] Using MAC 000000e04c000000
[    0.768514] set sds port 0 to 2
[    0.772040] set sds port 24 to 6
[    0.775672] set sds port 25 to 7
[    0.779258] set sds port 26 to 8
[    0.782839] rtl838x_mdio_init: illegal SMI bus number 4
[    0.788671] Error setting up netdev, freeing it again.
[    0.794948] i2c /dev entries driver
[    0.799075] rtl9300_i2c_probe probing I2C adapter
[    0.804335] i2c-rtl9300 1b00036c.i2c-rtl9300: SCL speed 100000, mode is 0
[    0.811935] rtl9300_i2c_probe scl_num 0
[    0.816222] rtl9300_i2c_probe sda_num 1
[    0.822339] NET: Registered protocol family 10
[    0.865063] Segment Routing with IPv6
[    0.869307] NET: Registered protocol family 17
[    0.874587] 8021q: 802.1Q VLAN Support v1.8
[    0.880421] sfp sfp-p12: Host maximum power 1.0W
[    0.902088] VFS: Mounted root (squashfs filesystem) readonly on device 31:7.
[    0.915241] Freeing unused kernel memory: 1196K
[    0.920287] This architecture does not have kernel memory protection.
[    0.927495] Run /sbin/init as init process
[    0.932045]   with arguments:
[    0.935364]     /sbin/init
[    0.938363]   with environment:
[    0.941840]     HOME=/
[    0.944447]     TERM=linux
[    1.771389] init: Console is alive
[    1.775746] init: - watchdog -
[    2.303762] random: fast init done
[    2.345702] kmodloader: loading kernel modules from /etc/modules-boot.d/*
[    2.473308] CPU 0 Unable to handle kernel paging request at virtual address 00000000, epc == 00000000, ra == 8031fb90
[    2.485202] Oops[#1]:
[    2.487732] CPU: 0 PID: 856 Comm: kworker/0:2 Not tainted 5.10.112 #0
[    2.494914] Workqueue: events deferred_probe_work_func
[    2.500617] $ 0   : 00000000 00000001 8077af7c 00000000
[    2.506427] $ 4   : 00000000 00000000 8244bd04 80658640
[    2.512237] $ 8   : 00125cff 802cd638 00000110 00000001
[    2.518048] $12   : 00000000 822b0889 822b088a fffffffc
[    2.523858] $16   : 82076a10 8075be68 00000000 808c0000
[    2.529669] $20   : 8075be68 808c0000 00000004 fffffffe
[    2.535479] $24   : 00000000 00000020
[    2.541290] $28   : 8244a000 8244bd78 806d0000 8031fb90
[    2.547101] Hi    : 00125cff
[    2.550287] Lo    : 1e000000
[    2.553477] epc   : 00000000 0x0
[    2.557061] ra    : 8031fb90 platform_drv_probe+0x40/0x94
[    2.563048] Status: 1100fc03 KERNEL EXL IE
[    2.567701] Cause : 50800008 (ExcCode 02)
[    2.572150] BadVA : 00000000
[    2.575338] PrId  : 00019555 (MIPS 34Kc)
[    2.579689] Modules linked in: gpio_button_hotplug crc32c_generic
[    2.586471] Process kworker/0:2 (pid: 856, threadinfo=(ptrval), task=(ptrval), tls=00000000)
[    2.595840] Stack : 82076a10 00000000 00000000 808c0000 8075be68 82076a10 00000000 8031d898
[    2.605143]         8075be68 8244be08 8075be68 8031e0e4 82010828 00000000 8244be08 8031e064
[    2.614447]         80755460 80750000 00000000 80750000 fffffffe 8031b498 827f8530 00000009
[    2.623750]         806d6720 800532e8 82015c5c 8234fb34 82076a10 00000001 82076a54 8031de6c
[    2.633054]         82076a10 80317b18 808c9de0 827f8530 82076a10 01000000 82078580 82076a10
[    2.642358]         ...
[    2.645072] Call Trace:
[    2.645075]
[    2.649434] [<8031d898>] really_probe+0x108/0x4d8
[    2.654664] [<8031e0e4>] __device_attach_driver+0x80/0x18c
[    2.660752] [<8031e064>] __device_attach_driver+0x0/0x18c
[    2.666747] [<8031b498>] bus_for_each_drv+0x70/0xb0
[    2.672175] [<800532e8>] dequeue_entity+0x38/0x2fc
[    2.677491] [<8031de6c>] __device_attach+0xe4/0x15c
[    2.682905] [<80317b18>] device_reorder_to_tail+0x6c/0x100
[    2.688997] [<8031c79c>] bus_probe_device+0x9c/0xb8
[    2.694411] [<8004dd18>] switch_mm.constprop.0+0x78/0x138
[    2.700408] [<8031ccd8>] deferred_probe_work_func+0x90/0xd0
[    2.706595] [<8058cbac>] __schedule+0x264/0x568
[    2.711631] [<8003f8bc>] process_one_work+0x1f0/0x460
[    2.717243] [<8003fcac>] worker_thread+0x180/0x5c4
[    2.722560] [<8003fb2c>] worker_thread+0x0/0x5c4
[    2.727687] [<8003fb2c>] worker_thread+0x0/0x5c4
[    2.732822] [<80045ffc>] kthread+0x13c/0x144
[    2.737562] [<80045ec0>] kthread+0x0/0x144
[    2.742107] [<80045ec0>] kthread+0x0/0x144
[    2.746656] [<80001a18>] ret_from_kernel_thread+0x14/0x1c
[    2.752649]
[    2.754293] Code: (Bad address in epc)
[    2.758454]
[    2.760094]
[    2.761814] ---[ end trace d1a4b58bfc9d9ef0 ]---
[    2.766980] Kernel panic - not syncing: Fatal exception
[    2.772785] Rebooting in 1 seconds..
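For reading a log like the one above, a small Python sketch can pull out the few lines that matter: earlier error messages, whether epc is zero (the CPU jumped to address 0, which would be consistent with a call through a NULL function pointer), and the symbol in ra that made the call. The function name summarize_oops and the choice of keywords are mine and only illustrative.

```python
import re

TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s?")


def summarize_oops(log):
    """Extract a few key facts from a MIPS kernel oops log."""
    summary = {"null_jump": False, "caller": None, "earlier_errors": []}
    for line in log.splitlines():
        msg = TIMESTAMP.sub("", line)
        # Keyword choice is a heuristic, based on the messages in this log.
        if "illegal" in msg or msg.startswith("Error"):
            summary["earlier_errors"].append(msg)
        match = re.match(r"epc\s*:\s*([0-9a-f]{8})", msg)
        if match:
            summary["null_jump"] = int(match.group(1), 16) == 0
        match = re.match(r"ra\s*:\s*[0-9a-f]{8}\s+(\S+)", msg)
        if match:
            summary["caller"] = match.group(1)
    return summary


if __name__ == "__main__":
    import sys
    print(summarize_oops(sys.stdin.read()))
```

For the log in this post it would report the "illegal SMI bus number 4" and "Error setting up netdev" messages, a jump to address 0, and platform_drv_probe as the caller.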

I updated the XGS1250 images in place, could you try again?

I'm struggling to get it unbricked.

I am sorry! That is not good. I assume you did not change the environment variables in the bootloader to allow interrupting the boot process? I believe someone else fell into that trap when the testing was done for the initial support, and I think the conclusion was that you then need to edit the u-boot configuration directly in flash memory. Fortunately the chip is supported by flashrom. You need a SOIC clamp and a flash adapter; a bundle with a CH341A-based adapter, including the clamp, is about 12 euros on e.g. Amazon. Changing the bootcmd to anything that does not exist will drop you to the u-boot prompt. If you already have a clamp, a Raspberry Pi can also be used for flashing.

I have such a flash adapter here. Used it only twice.
Now flashrom detects two chips: MX25L12805D and MX25L12835F/....
Is the flash procedure already described somewhere here?

Is the right one, I guess; the chip inscription says it's a Macronix MX 25L128338F. The procedure is straightforward: copy the entire chip to your workstation, then use e.g. hexer to edit the text that says "bootcmd=boota" (could also be just boot), changing it into e.g. xoota. The boot will then be interrupted next time, so that you can set the environment variables as you like. A password is also asked for when you drop to the prompt, but it is empty.

So far I managed to read the chip

flashrom --programmer ch341a_spi -r backup1.bin -c "MX25L12805D"

Did that twice and compared the md5sums.

Changed bootcmd with hexedit. In my case it appeared twice and the second occurrence had to be changed. I had changed it before, so it was bootcmd=rtk network on;boota here.

Wrote the changes back to the chip
flashrom --programmer ch341a_spi -w backup1.bin -c "MX25L12805D"

On the shell I had to enable networking first. Did a ping test to the TFTP server and uploaded the firmware. Then booted with the image in memory.

rtk network on
ping 192.168.1.111
tftp  0x84f00000 192.168.1.111:zyxel-initramfs.bin
bootm

For flashrom, both detected chips can be used for reading and writing.
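The compare-and-hex-edit part of the procedure above could also be scripted. Here is a hedged Python sketch: it checks that two dumps are identical, lists every bootcmd= occurrence, and overwrites the first character of the chosen command with "x" so that the image size and all other offsets stay unchanged. The function names and the file name backup2.bin are made up for illustration; which occurrence is the active one has to be decided by looking at the dump (in the post above it was the second).

```python
import hashlib

KEY = b"bootcmd="


def same_dump(first, second):
    """True if two flash reads are byte-identical (compared via MD5)."""
    return hashlib.md5(first).digest() == hashlib.md5(second).digest()


def find_all(image, needle):
    """Return the offsets of every occurrence of needle in image."""
    offsets = []
    pos = image.find(needle)
    while pos != -1:
        offsets.append(pos)
        pos = image.find(needle, pos + 1)
    return offsets


def break_bootcmd(image, occurrence):
    """Replace the first character of the chosen bootcmd value with 'x'."""
    offsets = find_all(image, KEY)
    pos = offsets[occurrence] + len(KEY)
    patched = bytearray(image)
    patched[pos] = ord("x")  # e.g. "boota" becomes "xoota"
    return bytes(patched)


if __name__ == "__main__":
    with open("backup1.bin", "rb") as f:
        dump1 = f.read()
    with open("backup2.bin", "rb") as f:  # hypothetical second read
        dump2 = f.read()
    assert same_dump(dump1, dump2), "reads differ, check the clamp"
    print("bootcmd found at offsets:", find_all(dump1, KEY))
    with open("patched.bin", "wb") as f:
        f.write(break_bootcmd(dump1, occurrence=1))
```

Keeping an untouched copy of the original dump around before writing anything back is advisable.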

Upgraded to your refreshed image, fixed bootcmd and rebooted warm/cold a few times. So far no stuck calibration or kernel panics. Thank you for your work and the support!

bridge link shows that the bridge port stays in disabled state, so this explains why packets are not received and the STP counter increases.

22: lan20@eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 master switch state disabled priority 32 cost 100 

According to the kernel log, the bridge port never reaches forwarding state; it remains in blocking or disabled state the entire time. The link status of the interface also always remains at NO-CARRIER / LOWERLAYERDOWN, and ethtool always prints Link detected: no. I assume the link state not being reported correctly is the actual issue here?
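To watch for this condition across several ports or reboots, the bridge link line can be parsed. The Python sketch below is written only against the single output line quoted above, so the regular expression may need adjusting for other iproute2 versions; parse_bridge_link is a made-up name.

```python
import re

# Written against the one `bridge link` line quoted above.
BRIDGE_LINK = re.compile(
    r"^\d+:\s+(?P<port>[^@:\s]+)(?:@\S+)?:\s+"
    r"<(?P<flags>[^>]*)>.*?\bstate\s+(?P<state>\S+)"
)


def parse_bridge_link(line):
    """Return port name, carrier flag and STP state, or None if no match."""
    match = BRIDGE_LINK.match(line.strip())
    if match is None:
        return None
    flags = match.group("flags").split(",")
    return {
        "port": match.group("port"),
        "carrier": "NO-CARRIER" not in flags,
        "state": match.group("state"),
    }


if __name__ == "__main__":
    line = ("22: lan20@eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 "
            "master switch state disabled priority 32 cost 100")
    print(parse_bridge_link(line))
```

A port that reports no carrier will not be moved to forwarding by the bridge, which matches the behaviour described here.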

I did some testing with the bridge removed entirely from the network config. Then I can see received packets with tcpdump. Sending doesn't work, no packets appear in tcpdump.

(Another thing I tried configuring is fixed-link in the device tree for the SFP ports along with removing the managed property. That makes this particular issue go away, but then it isn't possible to control the state of the ports anymore. This includes changing the media type, so there is no way to make the SFP ports work when the bootloader didn't initialize networking.)