Support for RTL838x based managed switches

What was the typo?

It had reg = <25> for both ports 24 and 25
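
A duplicate like this is easy to miss by eye. Below is a minimal sketch of a check for it, assuming port nodes are written as `port@N { reg = <M>; ... }` with decimal numbers. The sample fragment and its labels are made up for illustration and are not taken from the actual dts.

```python
import re
from collections import Counter

# Matches "port@N { reg = <M>" where reg is the first property of the node.
PORT = re.compile(r"port@(\d+)\s*\{\s*reg\s*=\s*<(\d+)>")

def check_ports(dts_text):
    """Return a list of human-readable problems found in the port nodes."""
    pairs = [(int(unit), int(reg)) for unit, reg in PORT.findall(dts_text)]
    seen = Counter(reg for _, reg in pairs)
    problems = []
    for unit, reg in pairs:
        if unit != reg:
            problems.append(f"port@{unit}: reg = <{reg}> does not match the unit address")
        if seen[reg] > 1:
            problems.append(f"port@{unit}: reg = <{reg}> is used by {seen[reg]} nodes")
    return problems

# Hypothetical fragment reproducing the typo described above.
SAMPLE = """
port@24 { reg = <25>; label = "lan4"; };
port@25 { reg = <25>; label = "lan5"; };
"""

for problem in check_ports(SAMPLE):
    print(problem)
```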

This is the fun stuff. Is there nothing plugged in at all?

Correct, a bare switch with nothing but power connected will come up with its port 1 link LED lit, and dmesg has an entry saying:

rtl83xx-switch lexra-bus0:switch@1b000000: Link is Up - 1Gbps/Full - flow control off

Later, when it sets up the PHYs again, it doesn't treat that port differently and just logs the same as for other ports:

rtl83xx-switch lexra-bus0:switch@1b000000 lan1: configuring for phy/usxgmii link mode
rtl930x_phylink_mac_config port 8, mode 0, phy-mode: usxgmii, speed -1, link 0
rtl930x_phylink_mac_config SDS is 3
rtl930x_phylink_mac_config: Unsupported speed: -1

Can you actually plug in cables and see status changes on any of the ports? Can you actually get a working link?

I'm afraid not. I'm going by ip link, I hope that's right? Independently of where I plug or unplug cables it says NO-CARRIER and correspondingly LOWERLAYERDOWN.

# ip l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether ce:a0:42:33:81:5c brd ff:ff:ff:ff:ff:ff permaddr 00:e0:4c:00:00:00
3: lan1@eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue master switch0 state LOWERLAYERDOWN mode DEFAULT group default qlen 1000
    link/ether ce:a0:42:33:81:5d brd ff:ff:ff:ff:ff:ff permaddr 00:e0:4c:00:00:00
4: lan2@eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue master switch0 state LOWERLAYERDOWN mode DEFAULT group default qlen 1000
    link/ether ce:a0:42:33:81:5e brd ff:ff:ff:ff:ff:ff permaddr 00:e0:4c:00:00:00
5: lan3@eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue master switch0 state LOWERLAYERDOWN mode DEFAULT group default qlen 1000
    link/ether ce:a0:42:33:81:5f brd ff:ff:ff:ff:ff:ff permaddr 00:e0:4c:00:00:00
6: lan4@eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue master switch0 state LOWERLAYERDOWN mode DEFAULT group default qlen 1000
    link/ether ce:a0:42:33:81:60 brd ff:ff:ff:ff:ff:ff permaddr 00:e0:4c:00:00:00
7: lan5@eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue master switch0 state LOWERLAYERDOWN mode DEFAULT group default qlen 1000
    link/ether ce:a0:42:33:81:61 brd ff:ff:ff:ff:ff:ff permaddr 00:e0:4c:00:00:00
8: switch0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000
    link/ether ce:a0:42:33:81:5c brd ff:ff:ff:ff:ff:ff
9: switch0.1@switch0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state LOWERLAYERDOWN mode DEFAULT group default qlen 1000
    link/ether ce:a0:42:33:81:5d brd ff:ff:ff:ff:ff:ff

Mainline with dts patch looks the same as far as ip link is concerned, but on the remote end the link light turns on when I plug into any of the TEG-S750 ports. There is no link until the rtl838x_eth_stop line is logged, after which the PHYs are set up again and then the remote switch's link light turns on (and continues to work as expected when unplugging/plugging into another port). I'm guessing among the many changes in your wip branch something has diverged with how the PHYs are initialized?

I'll backport the dts to your dev branch and see how that goes. Any other ideas for diagnosing are obviously welcome but you've already done a lot of work on this and I feel I must earn my keep :slight_smile:

Edit: The -dev branch also shows no link on the remote end, so the change must be present there already.

Yeah, there's some changes, but my switches work fine :stuck_out_tongue:

did you try building/booting the -wip xgs1250 image? Would make me curious.

I have an rtl9301 and rtl9302b device here that I use for testing. But I have done most of the work now with the rtl9302b only.

Anyway, first things first. Your switch heavily relies on what U-Boot does to bring up the PHYs. U-Boot is initializing them for you. It has to, because the 1250 image will initialize 5 PHYs: sds2 represents the octal PHY, sds6, 7 and 8 are the Aquantia 10GE PHYs, and sds9 is the SFP port.

But your switch uses 3, 4, 5, 6 and 7. So we do have overlap on sds 6 and 7 (lan 4 and 5), but that's it. So if the others are working, it's not because of what the xgs1250 image is doing for you.

Now, the PHY initialization has changed quite a bit, which is why I asked about xgs1250 on -wip. If that's also broken, then the init code doesn't work (but as I don't have any 10GE PHY's on any ports, I wouldn't know). If that's the case, it's bisect time :slight_smile:

So, assuming for a moment it's not PHY initialization, we have MDIO bus and SDS selection still.
AFAIK, a PHY doesn't need to be told anything to actually start working with some basic stuff (the defaults work, and probably come from an EEPROM, as we see 1 per chip). The MDIO bus is really all that is needed to detect plug-in/plug-out and all those messages. That is, the PHY chip will handle the plug-in and use MDIO to inform the kernel.

This actually brings up a fair point, we know how the mdio is configured (in u-boot/linux when working) and should be able to extract some useful information from there.
https://svanheule.net/realtek/longan/register/smi_glb_ctrl
https://svanheule.net/realtek/longan/register/smi_mac_type_ctrl
https://svanheule.net/realtek/longan/register/smi_port0_15_polling_sel
https://svanheule.net/realtek/longan/register/smi_port16_27_polling_sel
https://svanheule.net/realtek/longan/register/smi_prvte_polling_ctrl
https://svanheule.net/realtek/longan/register/smi_10gphy_polling_sel_0
https://svanheule.net/realtek/longan/register/smi_port0_5_addr_ctrl
https://svanheule.net/realtek/longan/register/smi_port6_11_addr_ctrl
https://svanheule.net/realtek/longan/register/smi_port12_17_addr_ctrl
https://svanheule.net/realtek/longan/register/smi_port18_23_addr_ctrl
https://svanheule.net/realtek/longan/register/smi_port24_27_addr_ctrl
https://svanheule.net/realtek/longan/register/smi_ctrl

I'd need those registers to see what's what. (md in u-boot AFTER running rtk network on, devmem in linux are your friends)

So I've been busy bisecting (SLOWLY, my goodness those build times...) and the link broke somewhere between 6e65a3af5a8f939580d2882feac45bb6dec3d10a (most recent known working on the -dev branch) and 23d3a878101137634093c037ebec5c95038e4381 (oldest known broken on the -dev branch). A whole bunch of commits in between don't boot on my switch because of clock issues so it's hard to identify a single commit right now... I'm currently trying to see if I can identify the offending commit that prevents my switch from booting (by doing more bisecting), revert that, and then keep bisecting the remaining commits until I find the commit that broke the link light. Progress is quite slow though.

For all of this bisecting I've always just used the xgs1250 image without any modifications, and while it doesn't work particularly well for my switch (meaning: it can't find any lanN devices, presumably because MDIO/SMI addresses don't match), it does boot nicely and show a link light at least up until 6e65a3af5a8f939580d2882feac45bb6dec3d10a.

edit: To answer your question, the most recent -wip xgs1250 image also doesn't produce a link light on the remote end when booted on my switch.

On my 12-year-old rig I can get it down to about 3 minutes (2 1/2 minutes to build, 30 seconds to boot) by using an 'external' tree, which saves all that unpacking. With the normal setup, it takes nearly 5 minutes :frowning: Also, ccache is important, as it can at least cache some stuff, shaving off a lot of time too.

I'll await your bisectness, as that'll be truly useful (also I can't replicate it anyway :p)

A small status update on my WIP branch. I've integrated the new I2C/regmap etc. drivers I spoke about before, and adapted the dts to use them. For a change (which I know I should do by default, but I usually leave them off as I prefer faster compile times :p) I built all dts files, so there shouldn't be any bugs in those anymore, other than of course misconfigurations.

I used iperf3 to test connectivity on a random ethernet port, 'shorted' my 2g5 ethernet ports, and plugged in my DAC cable to short things (which still spams up/down messages, gotta figure that out), but the modules are properly detected.

So the next step is to start on a pinctrl driver, as that's holding me back on some things ... and fix the arch base driver; I need to split that up, as the SPI variant needs that too.

Hopefully others can run the same tests so I can push the current state to -dev (I'll build it one more time first, to see if the SFP behavior was there too).

So no, boo, sadly no working switching between ports yet :stuck_out_tongue: just wrapping up all these other things I was working on.

Ccache can hide some nasty surprises. I've been bitten by it a few times myself and ever since I do not use it anymore.

Fair enough; and I do occasionally nuke 'build_dir' (which holds my ccache). But 15m compile times are not acceptable for doing actual development/bisecting :slight_smile:

Alright, after the latest updates I was able to build the image and load it via tftpboot. The 10G link works (tested with another XGS1250), but the 2.5G link does not and falls back to 1G. At boot time the LED is light blue, indicating a successful 2.5G connection, but ~25s after OpenWrt has finished booting up, it turns green.

Here is the output of the session:
OpenWrt XGS1250 Log Output

Here is the md output:

RTL9300# # md.l 0xbb00ca00 8 
bb00ca00: 0f0e5500 00049fff 55550000 00f9aaaa    ..U.....UU......
bb00ca10: 000000ff 00000259 001fa434 00000194    .......Y...4....
RTL9300# # md.l 0xbb00cb80 8
bb00cb80: 0a418820 16a4a0e6 2307b9ac 2f6ad272    .A. ....#.../j.r
bb00cb90: 000da108 00000000 00f00000 00000000    ................

Looks like it is the correct offset:

SMI_MAC_TYPE_CTRL (0xca04)

00000000 000 001 001 001 11 11 11 11 11 11

Field               LSB  Width  Value  Meaning
RESERVED             24      8
MAC_P27_TYPE         21      3  0x0    10G/1G Fiber (SerDes)
MAC_P26_TYPE         18      3  0x1    10G/2G5 GPHY
MAC_P25_TYPE         15      3  0x1    10G/2G5 GPHY
MAC_P24_TYPE         12      3  0x1    10G/2G5 GPHY
MAC_P23_P20_TYPE     10      2  0x3    GPHY
MAC_P19_P16_TYPE      8      2  0x3    GPHY
MAC_P15_P12_TYPE      6      2  0x3    GPHY
MAC_P11_P8_TYPE       4      2  0x3    GPHY
MAC_P7_P4_TYPE        2      2  0x3    GPHY
MAC_P3_P0_TYPE        0      2  0x3    GPHY
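
The same split can be done mechanically rather than by hand. A minimal sketch, using only the field positions and widths from the table above and the value 0x00049fff from the md dump; the helper name `decode_fields` is made up for this example.

```python
# Field layout copied from the table above: (name, lowest bit, width).
MAC_TYPE_FIELDS = [
    ("MAC_P27_TYPE", 21, 3),
    ("MAC_P26_TYPE", 18, 3),
    ("MAC_P25_TYPE", 15, 3),
    ("MAC_P24_TYPE", 12, 3),
    ("MAC_P23_P20_TYPE", 10, 2),
    ("MAC_P19_P16_TYPE", 8, 2),
    ("MAC_P15_P12_TYPE", 6, 2),
    ("MAC_P11_P8_TYPE", 4, 2),
    ("MAC_P7_P4_TYPE", 2, 2),
    ("MAC_P3_P0_TYPE", 0, 2),
]

def decode_fields(value, fields):
    """Return {field name: field value} for a 32-bit register value."""
    return {name: (value >> lsb) & ((1 << width) - 1)
            for name, lsb, width in fields}

# Second word of the "md.l 0xbb00ca00 8" dump (offset 0xca04).
mac_type = decode_fields(0x00049FFF, MAC_TYPE_FIELDS)
for name, field in mac_type.items():
    print(f"{name:18s} {field:#x}")
```

Feeding it a value read back under Linux would show at a glance which port groups changed type after OpenWrt took over.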

Hope it helps. If there is other information you need just let me know.

Great news (actually not yet :p) everyone!
[ 0.000000] Linux version 6.1.29 (buildbot@820c9bd29424) (mips-openwrt-linux-musl-gcc (OpenWrt GCC 12.3.0 unknown) 12.3.0, GNU ld (GNU Binutils) 2.40.0) #0 Thu Feb 16 17:00:03 2023

So I just pushed https://github.com/openwrt/openwrt/pull/12726 and will start debugging stuff, hoping it's just mangled patches on my end, as I couldn't see any needed DT changes ...

But we have (almost) 6.1 :slight_smile:

I think that's because initially you are getting the U-Boot config; then OpenWrt takes over and breaks things.

I noticed incorrect led behavior when I set the phy-mode to 10gbase-r (as afaik it should be), if I set it to 1000base-x it works fine. However, this is only for SFP. My switch only has a 2.5G phy, which works in all speeds.

But what does Linux have in these registers after booting? Are they still the same? :stuck_out_tongue:

I'm not liking the port-mapping bits (0xbb00cb80) though; they are all set to 'default' except the last one. I would expect that 'portX' is SMI address 'Y' as per the devicetree. In Linux you'll definitely get that.

I'm a bit too busy to translate those 32 bits into 5-bit fields :stuck_out_tongue: so double-check those.
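
For anyone who wants to do that translation, here is a minimal sketch. It assumes each address-control word packs 5-bit SMI addresses with the lowest-numbered port in the lowest bits, six ports per word (four in the last one). That layout is inferred from the register names and is not quoted from a datasheet. The values are the five words at 0xbb00cb80 from the md dump above.

```python
def port_addresses(value, first_port, count=6, width=5):
    """Split one address-control word into {port: SMI address}."""
    mask = (1 << width) - 1
    return {first_port + i: (value >> (i * width)) & mask
            for i in range(count)}

# First four words of "md.l 0xbb00cb80 8", keyed by their first port.
dump = {0: 0x0A418820, 6: 0x16A4A0E6, 12: 0x2307B9AC, 18: 0x2F6AD272}

addresses = {}
for first_port, value in dump.items():
    addresses.update(port_addresses(value, first_port))
# The fifth word covers only ports 24 to 27.
addresses.update(port_addresses(0x000DA108, 24, count=4))

for port, address in sorted(addresses.items()):
    note = "" if port == address else "  <- differs from port number"
    print(f"port {port:2d}: SMI address {address:2d}{note}")
```

Under that assumed layout, ports 0 to 23 decode to an address equal to the port number, and only ports 24, 25 and 26 in the last word differ (all three decode to address 8).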

Hmm, I've tried hexdump -C -s 0xbb00cb80 /dev/mem but it returns:

hexdump: /dev/mem: Bad address
bb00cb80

devmem 0xbb00cb80 32 just stalls

Any suggestions? :grinning:

If I recall correctly, that used to work before.

but how long ago? I don't think anybody tests the 1250 target from master ...

Well, if I'm not mistaken, the 2.5G also doesn't work with the v22.03.5 image - that's actually why I'm here.

2.5G on the XGS1250 never worked in official OpenWrt, only on one of Birger's branches. I don't know exactly which, but there is a working binary, at least with my Realtek 2.5G USB/PCIe adapter.

Sorry for the radio silence. I've had a tough time getting a working build to further sort out PHY initialization on my TEG-S750. I determined that (commits in the dev branch):

  • 6e65a3af "realtek: Add SMP ops registration support to MIPS generic" still has link
  • the next commit, 119a78c0 "realtek: Migrate to MIPS_GENERIC" causes my switch to hang during early boot with "Failed to get CPU clock: -2"
  • the first commit to not hang again is ad464a0b "realtek: clk: Switch RTL930X devicetree syntax" but here PHY initialization seems broken, as I no longer get link.

I tried cherry-picking ranges (e.g. ones seemingly related to the clock) in an attempt to get a non-hanging commit to further bisect the initialization but so far with no luck, all the cherry-picks end up hanging with "failed to get CPU clock".

To change things up a bit I pulled the registers you mentioned before:

smi_glb_ctrl = 0x0f895500 (modified: SMI2_POLL_SEL=0, SMI1_POLL_SEL=0, SMI0_POLL_SEL=0, SMI0_INTF_SEL=1)
smi_mac_type_ctrl = 0x000095df (modified: MAC_P25_TYPE=0x1, MAC_P24_TYPE=0x1, MAC_P23_P20_TYPE=0x1, MAC_P19_P16_TYPE=0x1, MAC_P11_P8_TYPE=0x1)
smi_port0_15_polling_sel = 0x55540000 (modified: SMI_PORT8_POLLING_SEL=0x0)
smi_port16_27_polling_sel = 0x00f0a8a8 (modified: SMI_PORT25_POLLING_SEL=0x0, SMI_PORT24_POLLING_SEL=0x0, SMI_PORT20_POLLING_SEL=0x0, SMI_PORT16_POLLING_SEL=0x0)
smi_prvte_polling_ctrl = 0x00000000 (default)
smi_10gphy_polling_sel_0 = 0x001fa434 (default)
smi_port0_5_addr_ctrl = 0x0a418820 (default)
smi_port6_11_addr_ctrl = 0x16a490e6 (modified: PORT8_ADDR = 4)
smi_port12_17_addr_ctrl = 0x2207b9ac (modified: PORT16_ADDR = 0)
smi_port18_23_addr_ctrl = 0x2f6a8672 (modified: PORT20_ADDR = 1)
smi_port24_27_addr_ctrl = 0x000de862 (modified: PORT25_ADDR = 3, PORT24_ADDR = 2)
smi_ctrl = 0x00f00000 (default)
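
To cross-check the "modified" annotations, the values can be decoded programmatically. This is a rough sketch that assumes 2-bit polling-select fields and 5-bit address fields, lowest port first. The widths are inferred from the field names and are not taken from a datasheet, and the helper name `fields` is made up.

```python
def fields(value, first_port, count, width):
    """Split a register into equal-width per-port fields, lowest port first."""
    mask = (1 << width) - 1
    return {first_port + i: (value >> (i * width)) & mask
            for i in range(count)}

# Register values as posted above for the TEG-S750.
polling = {}
polling.update(fields(0x55540000, 0, 16, 2))   # smi_port0_15_polling_sel
polling.update(fields(0x00F0A8A8, 16, 12, 2))  # smi_port16_27_polling_sel

addr = {}
addr.update(fields(0x0A418820, 0, 6, 5))   # smi_port0_5_addr_ctrl
addr.update(fields(0x16A490E6, 6, 6, 5))   # smi_port6_11_addr_ctrl
addr.update(fields(0x2207B9AC, 12, 6, 5))  # smi_port12_17_addr_ctrl
addr.update(fields(0x2F6A8672, 18, 6, 5))  # smi_port18_23_addr_ctrl
addr.update(fields(0x000DE862, 24, 4, 5))  # smi_port24_27_addr_ctrl

# The ports the list above marks as modified.
modified_ports = [8, 16, 20, 24, 25]
mapping = {port: (polling[port], addr[port]) for port in modified_ports}
for port, (sel, address) in mapping.items():
    print(f"port {port:2d}: polling sel {sel}, SMI address {address}")
```

With these assumptions the five modified ports all decode to polling select 0, with SMI addresses 4, 0, 1, 2 and 3, which agrees with the annotations in the list.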

Same thing for me, I can't access it once I've booted into OpenWrt. devmem 0xbb00cb80 32 hangs and eventually the switch resets. Did you figure this out?

I tried various approaches but unfortunately, I haven't been able to find a solution yet. The devmem command still hangs for me too when I try to read from that address.

I think we can rule out the values of the SMI registers, even though that seemed like a promising direction.
Here's what upstream was doing (link light on, but the kernel is oblivious):

[    0.825429] rtl930x_mdio_reset: RTL930X_SMI_GLB_CTRL 0f895500
[    0.831886] rtl930x_mdio_reset: RTL930X_SMI_PORT0_15_POLLING_SEL 00000000
[    0.839472] rtl930x_mdio_reset: RTL930X_SMI_PORT16_27_POLLING_SEL 00000000
[    0.847139] rtl930x_mdio_reset: RTL930X_SMI_MAC_TYPE_CTRL 00009510
[    0.854011] rtl930x_mdio_reset: RTL930X_SMI_10GPHY_POLLING_SEL_0 001fa434
[    0.861576] rtl930x_mdio_reset: RTL930X_SMI_10GPHY_POLLING_SEL_1 001fa435
[    0.869152] rtl930x_mdio_reset: RTL930X_SMI_10GPHY_POLLING_REG0_CFG 01010000
[    0.877017] rtl930x_mdio_reset: RTL930X_SMI_10GPHY_POLLING_REG9_CFG 01e7c400
[    0.884854] rtl930x_mdio_reset: RTL930X_SMI_10GPHY_POLLING_REG10_CFG 01e7e820
[    0.892806] rtl930x_mdio_reset: RTL930X_SMI_PRVTE_POLLING_CTRL 00000000

I then modified upstream's rtl838x_eth.c to hard-code the values I observed in the stock image, resulting in:

[    0.826597] dtw was going to set RTL930X_SMI_MAC_TYPE_CTRL = 00009510 but am instead setting 0x000095df
[    0.837173] dtw setting polling reg0,9,10
[    0.841626] dtw setting polling port polling sel
[    0.846779] rtl930x_mdio_reset: RTL930X_SMI_GLB_CTRL 0f895500
[    0.853168] rtl930x_mdio_reset: RTL930X_SMI_PORT0_15_POLLING_SEL 55540000
[    0.860735] rtl930x_mdio_reset: RTL930X_SMI_PORT16_27_POLLING_SEL 00f0a8a8
[    0.868406] rtl930x_mdio_reset: RTL930X_SMI_MAC_TYPE_CTRL 000095df
[    0.875278] rtl930x_mdio_reset: RTL930X_SMI_10GPHY_POLLING_SEL_0 001fa434
[    0.882844] rtl930x_mdio_reset: RTL930X_SMI_10GPHY_POLLING_SEL_1 001fa435
[    0.890419] rtl930x_mdio_reset: RTL930X_SMI_10GPHY_POLLING_REG0_CFG 0107ffe0
[    0.898284] rtl930x_mdio_reset: RTL930X_SMI_10GPHY_POLLING_REG9_CFG 0127ffe9
[    0.906148] rtl930x_mdio_reset: RTL930X_SMI_10GPHY_POLLING_REG10_CFG 0167ffea
[    0.914082] rtl930x_mdio_reset: RTL930X_SMI_PRVTE_POLLING_CTRL 00000000

... with no observable difference. The link light still comes on and ip link still says NO-CARRIER.
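
To make it obvious which registers actually differ between the two boots, the two dumps can be compared with a few lines of script. A minimal sketch, assuming only the `rtl930x_mdio_reset: NAME VALUE` line format seen in the logs above; the function names are made up.

```python
import re

# Only lines of the form "rtl930x_mdio_reset: NAME HEXVALUE" are considered.
LINE = re.compile(r"rtl930x_mdio_reset: (\w+) ([0-9a-fA-F]+)")

def parse(log):
    """Return {register name: value} for every matching log line."""
    return {m.group(1): int(m.group(2), 16) for m in LINE.finditer(log)}

def changed(before, after):
    """Return {name: (old, new)} for registers whose value differs."""
    return {name: (before[name], after[name])
            for name in before if name in after and before[name] != after[name]}

UPSTREAM = """
rtl930x_mdio_reset: RTL930X_SMI_GLB_CTRL 0f895500
rtl930x_mdio_reset: RTL930X_SMI_PORT0_15_POLLING_SEL 00000000
rtl930x_mdio_reset: RTL930X_SMI_PORT16_27_POLLING_SEL 00000000
rtl930x_mdio_reset: RTL930X_SMI_MAC_TYPE_CTRL 00009510
rtl930x_mdio_reset: RTL930X_SMI_10GPHY_POLLING_SEL_0 001fa434
rtl930x_mdio_reset: RTL930X_SMI_10GPHY_POLLING_SEL_1 001fa435
rtl930x_mdio_reset: RTL930X_SMI_10GPHY_POLLING_REG0_CFG 01010000
rtl930x_mdio_reset: RTL930X_SMI_10GPHY_POLLING_REG9_CFG 01e7c400
rtl930x_mdio_reset: RTL930X_SMI_10GPHY_POLLING_REG10_CFG 01e7e820
rtl930x_mdio_reset: RTL930X_SMI_PRVTE_POLLING_CTRL 00000000
"""

HARDCODED = """
rtl930x_mdio_reset: RTL930X_SMI_GLB_CTRL 0f895500
rtl930x_mdio_reset: RTL930X_SMI_PORT0_15_POLLING_SEL 55540000
rtl930x_mdio_reset: RTL930X_SMI_PORT16_27_POLLING_SEL 00f0a8a8
rtl930x_mdio_reset: RTL930X_SMI_MAC_TYPE_CTRL 000095df
rtl930x_mdio_reset: RTL930X_SMI_10GPHY_POLLING_SEL_0 001fa434
rtl930x_mdio_reset: RTL930X_SMI_10GPHY_POLLING_SEL_1 001fa435
rtl930x_mdio_reset: RTL930X_SMI_10GPHY_POLLING_REG0_CFG 0107ffe0
rtl930x_mdio_reset: RTL930X_SMI_10GPHY_POLLING_REG9_CFG 0127ffe9
rtl930x_mdio_reset: RTL930X_SMI_10GPHY_POLLING_REG10_CFG 0167ffea
rtl930x_mdio_reset: RTL930X_SMI_PRVTE_POLLING_CTRL 00000000
"""

diff = changed(parse(UPSTREAM), parse(HARDCODED))
for name, (old, new) in diff.items():
    print(f"{name}: {old:08x} -> {new:08x}")
```

Piping real `dmesg` output through `parse` works the same way, since timestamps and unrelated lines are ignored by the pattern.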

I then also hard-coded the same values in the wip branch, hoping it might help with PHY initialization and hence getting a link light, but even with those registers I get no link, let alone anything that the kernel would see.

Any other recommendations for what to look at? I'll also keep trying to narrow down which commit broke the link light but it's slow going.

Some progress!
I enabled more verbose output of MDIO reads/writes and realized that those were going through rtl930x_read_phy instead of rtl930x_read_mmd_phy, which I gather is what is more appropriate for my PHYs?
I modified rtl838x_eth.c to use the c45 status for both rtl930x_mdio_read_paged and rtl930x_mdio_write_paged:

- if (regnum & (MII_ADDR_C45 | MII_ADDR_C22_MMD)) {
+ if (priv->smi_bus_isc45[priv->smi_bus[mii_id]]) {

and found that the random stack traces disappeared. Ethtool is now returning more complete output, but also has imaginary link partners for all ports, independent of whether or not they have a cable connected :grimacing:

root@OpenWrt:/# ethtool lan1
Settings for lan1:
        Supported ports: [ ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
                                10000baseT/Full
        Supported pause frame use: Symmetric Receive-only
        Supports auto-negotiation: Yes
        Supported FEC modes: Not reported
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
                                10000baseT/Full
        Advertised pause frame use: Symmetric Receive-only
        Advertised auto-negotiation: Yes
        Advertised FEC modes: Not reported
        Link partner advertised link modes:  10baseT/Half 10baseT/Full
                                             100baseT/Half 100baseT/Full
                                             10000baseT/Full
                                             2500baseT/Full
                                             5000baseT/Full
        Link partner advertised pause frame use: Symmetric Receive-only
        Link partner advertised auto-negotiation: Yes
        Link partner advertised FEC modes: Not reported
        Speed: 10000Mb/s
        Duplex: Full
        Port: MII
        PHYAD: 8
        Transceiver: external
        Auto-negotiation: on
        Supports Wake-on: d
        Wake-on: d
        Link detected: no
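
To spell out what the one-line change to rtl930x_mdio_read_paged and rtl930x_mdio_write_paged does, here is a small model of the two checks. It is only an illustration and not driver code. `MII_ADDR_C45` is the `1 << 30` flag from `<linux/mdio.h>` in kernels of this era, the `MII_ADDR_C22_MMD` flag from the original condition is left out, and the `smi_bus` / `smi_bus_isc45` table contents are hypothetical.

```python
MII_ADDR_C45 = 1 << 30  # Clause 45 flag encoded into regnum by the caller

def is_c45_by_regnum(regnum):
    """Original check: trust the flag bits the caller put into regnum."""
    return bool(regnum & MII_ADDR_C45)

def is_c45_by_bus(smi_bus_isc45, smi_bus, mii_id):
    """Modified check: use the Clause 45 marking of the port's SMI bus."""
    return smi_bus_isc45[smi_bus[mii_id]]

# Hypothetical tables: ports 8 and 16 on SMI bus 0, which is marked Clause 45.
smi_bus = {8: 0, 16: 0}
smi_bus_isc45 = {0: True}

plain_regnum = 1  # a bare register number (MII_BMSR) with no C45 flag set

print(is_c45_by_regnum(plain_regnum))             # takes the Clause 22 path
print(is_c45_by_bus(smi_bus_isc45, smi_bus, 8))   # takes the Clause 45 path
```

The point of the model: with the original condition, an access whose regnum carries no flag is treated as Clause 22 even if the PHY sits on a Clause 45 bus, whereas the modified condition decides purely by bus.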

The wip branch still doesn't have a working link, but encouraged by this result I modified upstream (where the remote shows link) to use rtl930x_{read,write}_mmd_phy and saw some promising signs there:

root@OpenWrt:/# ethtool lan1 # has link
Settings for lan1:
        Supported ports: [ ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
                                10000baseT/Full
        Supported pause frame use: Symmetric Receive-only
        Supports auto-negotiation: Yes
        Supported FEC modes: Not reported
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
                                10000baseT/Full
        Advertised pause frame use: Symmetric Receive-only
        Advertised auto-negotiation: Yes
        Advertised FEC modes: Not reported
        Link partner advertised link modes:  10baseT/Half 10baseT/Full
                                             100baseT/Half 100baseT/Full
        Link partner advertised pause frame use: No
        Link partner advertised auto-negotiation: Yes
        Link partner advertised FEC modes: Not reported
        Speed: 100Mb/s
        Duplex: Full
        Port: MII
        PHYAD: 8
        Transceiver: external
        Auto-negotiation: on
        Supports Wake-on: d
        Wake-on: d
        Link detected: no
root@OpenWrt:/# ethtool lan2 # no link
Settings for lan2:
        Supported ports: [ ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
                                10000baseT/Full
        Supported pause frame use: Symmetric Receive-only
        Supports auto-negotiation: Yes
        Supported FEC modes: Not reported
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
                                10000baseT/Full
        Advertised pause frame use: Symmetric Receive-only
        Advertised auto-negotiation: Yes
        Advertised FEC modes: Not reported
        Speed: Unknown!
        Duplex: Unknown! (255)
        Port: MII
        PHYAD: 16
        Transceiver: external
        Auto-negotiation: on
        Supports Wake-on: d
        Wake-on: d
        Link detected: no

Note that it still says no link was detected, but it does list advertised link modes (it somehow omits 1gbit...) and shows mode and speed only for the port where there's a link.