Support for RTL838x and RTL93xx based managed switches

I installed a snapshot on my JG928A about 22h ago because of these recent fixes. So far, load average has decreased significantly (from ~3 to 0.25) and memory consumption is low as well.
Currently, about 15 devices are powered via PoE.

Thanks, I installed some tools (ethtool-full to decode the EEPROMs as well as i2c-tools) to have a closer look at the working vs. non-working modules. All my working ones claim to be fibre interfaces, whereas the non-working Ubiquiti one truthfully says it's an RJ45 port.

Non-working:

        Identifier                                : 0x03 (SFP)
        Extended identifier                       : 0x04 (GBIC/SFP defined by 2-wire interface ID)
        Connector                                 : 0x22 (RJ45)
        Transceiver codes                         : 0x10 0x00 0x00 0x00 0x20 0x40 0x04 0x80 0x16
        Transceiver type                          : Extended: 10Gbase-T with SFI electrical interface
        Encoding                                  : 0x03 (NRZ)
        BR Nominal                                : 10300MBd
        Rate identifier                           : 0x00 (unspecified)
        Length (SMF)                              : 0km
        Length (OM2)                              : 0m
        Length (OM1)                              : 0m
        Length (Copper or Active cable)           : 100m
        Length (OM3)                              : 0m
        Laser wavelength                          : 850nm
        Vendor name                               : Ubiquiti Inc.
        Vendor OUI                                : 24:5a:4c
        Vendor PN                                 : UACC-CM-RJ45-MG
        Vendor rev                                : U09
        Option values                             : 0x00 0x00
        BR margin max                             : 0%
        BR margin min                             : 0%
        Vendor SN                                 : XXXXXXXXXXXXX
        Date code                                 : 250102

Working:

        Identifier                                : 0x03 (SFP)
        Extended identifier                       : 0x04 (GBIC/SFP defined by 2-wire interface ID)
        Connector                                 : 0x07 (LC)
        Transceiver codes                         : 0x10 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
        Transceiver type                          : 10G Ethernet: 10G Base-SR
        Encoding                                  : 0x06 (64B/66B)
        BR Nominal                                : 10300MBd
        Rate identifier                           : 0x00 (unspecified)
        Length (SMF)                              : 0km
        Length (OM2)                              : 80m
        Length (OM1)                              : 20m
        Length (Copper or Active cable)           : 0m
        Length (OM3)                              : 300m
        Laser wavelength                          : 850nm
        Vendor name                               : OEM
        Vendor OUI                                : 00:90:65
        Vendor PN                                 : SFP-10G-SR
        Vendor rev                                : 02
        Option values                             : 0x00 0x1a
        Option                                    : TX_DISABLE implemented
        BR margin max                             : 0%
        BR margin min                             : 0%
        Vendor SN                                 : YYYYYYYYYYYY
        Date code                                 : 230423

Addendum:

The two working modules (despite ostensibly being from two different "brands") have the exact same EEPROM data (except for their serial number and date code). One of them supports NBASE-T, the other only 10GBASE-T.

The non-working Ubiquiti module works fine on a TP-LINK T1700G-28TQ (with vendor firmware).

Linux actually has quite a few quirks for various vendor + model combinations: https://elixir.bootlin.com/linux/v6.11/source/drivers/net/phy/sfp.c#L466. There is definitely no entry there that fixes up anything on yours.

In your case you probably want to fix up the supported link modes. 10000baseSR/Full is certainly wrong and will probably confuse Linux. This is part of the Transceiver codes. You probably also want to add 1G, 2.5G and 5G to that list. Otherwise I guess it looks fine. I am not 100% sure about the Transceiver type, but that does not look unreasonable in general.

I don't think anything other than 10Gbase-T and 1000base-T is part of the SFP+ specification (table 5-3 Transceiver Compliance Codes in the linked document), no?
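To see which compliance bits a module actually advertises, the nine values that ethtool prints as Transceiver codes can be decoded by hand. Below is a minimal Python sketch, assuming ethtool lists EEPROM bytes 3 to 10 followed by byte 36, and covering only a handful of the SFF-8472 / SFF-8024 codes relevant to this thread. The function and table names are made up for illustration.

```python
# Assumption: ethtool's "Transceiver codes" line is EEPROM bytes 3..10 plus byte 36.
# Only a small subset of the compliance codes is listed here.

# Byte 3, bits 4..7: 10G Ethernet compliance codes
BYTE3_10G = {4: "10GBASE-SR", 5: "10GBASE-LR", 6: "10GBASE-LRM", 7: "10GBASE-ER"}

# Byte 6, bits 0..3: Gigabit Ethernet compliance codes
BYTE6_1G = {0: "1000BASE-SX", 1: "1000BASE-LX", 2: "1000BASE-CX", 3: "1000BASE-T"}

# Byte 36: extended compliance code (single value, not a bit field)
EXT_CODES = {
    0x16: "10GBASE-T with SFI electrical interface",
    0x1C: "10GBASE-T Short Reach",
    0x1D: "5GBASE-T",
    0x1E: "2.5GBASE-T",
}


def decode_transceiver_codes(codes):
    """Return the names of the known compliance codes set in the 9 values."""
    if len(codes) != 9:
        raise ValueError("expected 9 values (bytes 3..10 and byte 36)")
    byte3, byte6, byte36 = codes[0], codes[3], codes[8]
    modes = [name for bit, name in sorted(BYTE3_10G.items()) if byte3 & (1 << bit)]
    modes += [name for bit, name in sorted(BYTE6_1G.items()) if byte6 & (1 << bit)]
    if byte36 in EXT_CODES:
        modes.append(EXT_CODES[byte36])
    return modes


if __name__ == "__main__":
    ubiquiti = [0x10, 0x00, 0x00, 0x00, 0x20, 0x40, 0x04, 0x80, 0x16]
    for mode in decode_transceiver_codes(ubiquiti):
        print(mode)
```

Under that byte-order assumption, the Ubiquiti dump above decodes to both 10GBASE-SR (bit 4 of byte 3) and the extended code 0x16 for 10GBASE-T.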

True. ChatGPT recommends this:

Byte  Old value  New value  Why
3     0x10       0x01       Set 10GBASE-T
6     0x40       0x60       Add 1000BASE-T + 100BASE-TX
8     0x80       0x80       Leave as-is (10GBASE-T)

Afterwards, just unplug and replug the device. It will complain about an incorrect CRC but list the correct one. Then just set the expected one and it should work after reconnecting.

Ideally, back up your EEPROM with i2cdump before making any changes.
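The expected value can also be worked out offline from a saved dump. In SFF-8472 the check codes are plain 8-bit sums rather than a true CRC: CC_BASE (byte 63) is the low byte of the sum of bytes 0 to 62, and CC_EXT (byte 95) is the low byte of the sum of bytes 64 to 94. Here is a minimal Python sketch that works on an in-memory copy of the first 96 bytes only. The helper names are hypothetical, and nothing here talks to the module.

```python
def sfp_check_codes(eeprom: bytes) -> tuple:
    """Return (CC_BASE, CC_EXT) computed from the first 96 EEPROM bytes."""
    if len(eeprom) < 96:
        raise ValueError("need at least the first 96 bytes of the EEPROM")
    cc_base = sum(eeprom[0:63]) & 0xFF   # bytes 0..62, stored at byte 63
    cc_ext = sum(eeprom[64:95]) & 0xFF   # bytes 64..94, stored at byte 95
    return cc_base, cc_ext


def with_patched_byte(eeprom: bytes, offset: int, value: int) -> bytes:
    """Return a copy with one byte changed and both check codes refreshed."""
    patched = bytearray(eeprom)
    patched[offset] = value
    cc_base, cc_ext = sfp_check_codes(bytes(patched))
    patched[63] = cc_base
    patched[95] = cc_ext
    return bytes(patched)


if __name__ == "__main__":
    dump = bytes(range(96))  # stand-in data, not a real module dump
    base, ext = sfp_check_codes(dump)
    print(f"CC_BASE=0x{base:02x} CC_EXT=0x{ext:02x}")
```

Comparing the computed pair against bytes 63 and 95 of the backup is a quick sanity check before writing anything back.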

How decoupled is the rtl93xx switch driver currently? Can it be used over SPI on a Qualcomm platform (Verizon cr1000a router)?

I don’t think the driver is currently regmap’d… so it would be a bit of effort. It would first need to be migrated to regmap MMIO. It would probably also want a bunch of extra things added like volatile registers, default register values, caching, and additional tuning for SPI latencies (i.e. busy polling things like the SMI CMD bit will be different under SPI than MMIO).

There has also been quite a bit of movement around this driver recently. Do you have any particular test hardware in mind already?

Yes, the Verizon cr1000a is such hardware, and I have a spare one to play with.

I remember @olliver had a bunch of patches for this driver. Is the current effort a continuation of his work or brand-new code?

I’m pretty sure it’s mostly been others getting code in recently. I think @olliver got a bit frustrated with PRs going stagnant and not getting merged, so he hasn’t been so involved recently.
There is still quite a lot left to sort out on the RTL930x and RTL931x series in particular. There’s still a requirement for U-Boot to do some pre-configuration, which isn’t guaranteed on all devices.

So getting to the bottom of what changes are missing in the Linux drivers to get Ethernet working independently is still “up for grabs”, along with tidying up a number of other interfaces (including the SPI slave capabilities for all of the various chip functions).

As @bevanweiss just noted: RTL930x definitely still requires some work. It’s not in the same good state as RTL838x/RTL839x.

You could try building your own image of snapshot and add these commits to it, but it’s only a piece of the puzzle.

Unless you want to help get PRs like these tested, feel free to check the state again in a couple of months or half a year. Until then you are stuck with using SFP modules that already work.

I’m definitely curious whether all the modules you tested will work in a couple of months, but I assume you might return them or otherwise won’t be able to test them again by then, right?

And, importantly: there is no guarantee at all that it will take just x months. It could be years before they work. I don’t have a crystal ball to look into the future. I mostly wanted to underline that you have to be patient before all of these kinds of features of rtl930x get support.

Has anyone looked at the TP-Link SG3400 JetStream Rackmount 2.5G Managed Switch, 24x RJ-45, 4x SFP+, 500W PoE++ (TL-SG3428XPP-M2)? Based on the GPL dump, that one seems to have an RTL93xx chip, a U-Boot bootloader and Linux.

I’m considering buying one of those because I want multi-Gigabit and PoE.

Thanks, I am certainly willing to test PRs, but I have to set up a build environment for the realtek devices first (I've never tried adding custom kernel patches to my builds before). I have indeed noticed that the ST1800F is much less stable than my Zyxel 1900-10HP: the TP-Link switch loses interface connectivity after a few hours, apparently at random, because the last log messages before the interface goes down look completely normal. Even before that, trying to read logfiles or dmesg sometimes hangs the SSH process. All the while, switching continues to work fine even when the management interface is down.

As for the errant SFP+ modules, I have not decided yet. I might keep at least one as I can use it on my main switch and it has got a better power usage than the older modules.

My guess is that this module implements “proper” c45 over i2c, and that our c45 over SMBus implementation doesn’t work. Maybe you could look into that if this SFP+ really does use that protocol?

The mainline implementation shows what the module expects:

Our SMBus version is a (to my knowledge) untested attempt to fake that using single byte SMBus transfers:

I'm not actually sure if this is possible to fix, but things may have improved with the latest fixes to the rtl9300-i2c driver.

Are you able to connect to your TP-Link switch via serial when that happens? It would be interesting to know whether the OS hangs or just the connection between the interface and the OS. In other words: does logread/dmesg run fine over serial while the interface, and therefore SSH, hangs?

Also, when was the last time it hung? Was the snapshot release weeks or months old?

I've only had the device for two days, so it's always been a very recent snapshot. However, the instability is likely thermal: I just realized that I might have positioned the switch in a sub-optimal place (above the T1700G-28TQ and below a Sat/IP receiver, without much clearance).

I've since installed lm-sensors and repositioned the devices a bit and the cpu_thermal-virtual-0 sensor has gone down from 69 °C to 66 °C. If that does not improve things in the next hour or so, I'll power it off and try to test it in another place entirely once it has cooled down.

Apparently everything but very quick non-interactive programs now locks up the current process (but not the network interface or SSH). sensors works, ps w too, but ethtool lan1 hangs. dmesg used to run for a page or two before hanging; now it doesn't even get that far.

I am on the current snapshot, so unless the code is still in an unmerged PR, it has not been enough.

You can probably see if the PR has been included by listing the log at your snapshot's rev number. When you log into the device, it shows something like OpenWrt SNAPSHOT, r30920-1c92e468d5. Grab the hash from the end and tack it onto this as the h= parameter:

https://git.openwrt.org/?p=openwrt/openwrt.git;a=shortlog;h=1c92e468d5

Everything below that commit should be in the build you're running...
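If you do this often, the hash extraction is easy to script. Here is a minimal Python sketch, assuming the version string always has the rNNNNN-hash form shown above. The function name is made up for illustration.

```python
import re

SHORTLOG_BASE = "https://git.openwrt.org/?p=openwrt/openwrt.git;a=shortlog;h="


def shortlog_url(banner: str) -> str:
    """Build the shortlog URL from a string like 'OpenWrt SNAPSHOT, r30920-1c92e468d5'."""
    # Revision counter, a dash, then the abbreviated commit hash.
    match = re.search(r"\br\d+-([0-9a-f]{7,40})\b", banner)
    if match is None:
        raise ValueError("no rNNNNN-<hash> revision found in: " + banner)
    return SHORTLOG_BASE + match.group(1)


if __name__ == "__main__":
    print(shortlog_url("OpenWrt SNAPSHOT, r30920-1c92e468d5"))
```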

EDIT: And if you've got owut installed to do upgrades, it should do all the work for you and list that link fully formed (but pointing to the latest available build, not the installed one):

$ owut check
owut - OpenWrt Upgrade Tool 2025.09.03~49e9bce7-r1 (/usr/bin/owut)
...
Build-commit   https://git.openwrt.org/?p=openwrt/openwrt.git;a=shortlog;h=d751f1e57e
...

I added an external fan to my device. That lowers my temps to 40 to 45 degrees. Those devices should definitely come with fans.

Yeah, the commits from June 2025 are definitely included in the build, so no luck there.

I'm currently on r30955-d751f1e57e.

The problem is, I want/need one without a fan because it sits in my living room. But I'm open to some Frankenstein solution (a large heatsink on the case or adding some heatsinks to the SFP cages inside).

However, something appears to be wrong with my unit: I let it cool down and removed all the SFP modules except one, and it still froze the process when reading the log. I'll let it sit for an hour or two and try again with a single 1G connection to check it with no load at all.

Reading EEPROMs etc. was fine yesterday, but does not work today. Either the accumulated heat has loosened some solder joints or one of the new builds using owut messed something up (I rebuilt images several times to add packages).

Do you have a working serial connection via a USB tty adapter or similar? Does that one freeze, too?