Support for RTL838x based managed switches

I had a .dts for the GS1900-48 for the 5.4 kernel. Currently that switch is the device I use to validate (V)SMP on a simpler platform than the 930x and the 931x, so the images I supplied (running 5.10) are my current testing images. Here is an overview of the different areas I am working on:

839x: This now seems to run relatively stably with SMP enabled. This required an update of the interrupt controller code. Last week I hunted down a nasty locking bug in the Ethernet driver code that was only triggered on SMP under heavy load. The driver now works quite differently, with cleanups done not in IRQ context but in deferred work queues, which costs 8% performance compared to a single processor without deferred work. IRQ balancing will probably be needed to offset this. The open questions are how this impacts the 838x, and whether there are similar bugs in other drivers. For now we use the internal MIPS timer.

930x: I am unable to run SMP on this platform. The 930x has a broken MIPS timer interrupt, although it is in principle the same platform as the 8390. This means the RTL9300 SoC hardware timer needs to be used, which must trigger different wake-up IRQs together with the new IRQ driver to wake up both processors independently. There are still RCU stalls, deadlocks and the like, and the question is whether the issues are in the timer code, the IRQ driver code or even in the silicon. There are 2 similar examples of an external timer with its own IRQ controller on MIPS in the kernel (sibyte bcm1480 and sb1250), and they use considerable amounts of customized MIPS SMP code.

On the Ubiquiti USW switch I now have the SFP+ ports working with 10Gig and 1Gig fibre modules; DAC cables do not work. But I cannot bring up the SFP+ ports on the XGS1210/1250, probably because I am missing something that is done in u-boot on the USW. The USW is the only switch that allows the use of SFP+ ports already under u-boot (and only with 10Gig fibre); the XGS models don't even manage all their Ethernet ports in u-boot. It does not help that the Ubiquiti seems to be doing something very different to the SFP+ port once booted, compared to what it does under u-boot and to what I would expect from the SDK. It has a devmem binary in its busybox, and that shows register settings that are sometimes very different from what I would expect.

931x: Another mess. I am trying to get this running using the dual-core InterAptiv CPU plus either the GIC timer or the 9300 timer. It is quite telling that there are no devices in the wild that use SMP, so I am not even sure this is possible. Chances are the GIC timer is also broken; at least the Edgecore ECS4125 uses the RTL9300 timer instead, single core of course. Getting that working together with the GIC IRQ controller and dual core is something Sebastian and I have been working on for months with little progress. Linux really insists very hard on using the internal MIPS GIC timer. The switch layer is also still work in progress, and there were stability issues with the Ethernet driver under SMP, plus things in the SMI bus polling that are not understood.


Just wanted to say thank you for all the hard work and creative thinking. I've used OpenWrt on a few devices, and was very interested to hear that the two GS1920-24 units I have may end up having a new lease of life through what you're working on; there are contexts where for security reasons you can't use EOL hardware, but old hardware running actively-maintained OSS is a different story. I've really enjoyed reading this thread, and look forward to trying out my GS1920s when it's ready for noobs like me. Thanks again, really appreciate it.


I have a new OEM switch with an RTL9301 inside that has 24GE PoE and 10SFP+ ports on it. What would be the process for testing OpenWrt on this device? It has a built-in serial port, and I can access the command line / u-boot.

I am unsure which OpenWrt image is the latest to download, or whether I need to build one myself.

Can you say anything about its availability?

There is enough support in master to try this. You need the right .dts, of course. Could you post the output of the OEM "show tech-support" command? That usually contains enough info to construct a .dts.


Do I need to do this through a special menu? The regular output seems quite basic:

testswitch(diag)%%show tech-support
--------show system--------
system configuration
system description     : Switch
system object id       : 1.3.6.1.4.1.54367.1.3.32.1
system name            : Device Name
system run time        : 04 hour 35 minute 46 second 90 tick
system location        : 
administrator contact  : 
product name           : 
interface of system    : 28
--------show version--------
  Device, Compiled on Wed Sep 22 11:35:17 CST 2021
  Software Version V1.6
  BootRom Version V1.0
  Hardware Version V1.0
  CPLD Version N/A
  Processor: 34Kc MIPS 1GHz
  Serial No.: 123456789
  Copyright (C) 2001-2021
  All Rights Reserved.
  Last reboot is cold reset.
  Uptime is 0 weeks, 0 days,  4 hours, 35 minutes
--------show cpu-utilization--------
CPU Information:
CPU Idle               : 90 %
--------show memory--------
Memory information:
Total Memory           : 256 MB
Free Memory            : 128 MB
--------show cpu-car--------
Send packet to CPU rate: 2000 pps.

Note that it reports a 34Kc MIPS at 1GHz as the processor, but doesn't the RTL9301 have an 800MHz CPU?

There is normally no magic to the output of show tech-support. The output is fairly standardized by the Realtek SDK and is very comprehensive. Either this has not been implemented on this test device yet, or they are brewing their own stuff, like Zyxel with ZynOS on the XS1930 or Ubiquiti on the USW. Could you post the console output during boot?

Indeed, the RTL930x officially only supports 800MHz, but the SoC is probably about 10 years old, and if they use a newer production process or good thermal management, then 1GHz sounds absolutely possible. The thermal management solutions I have seen so far are fairly basic, apart from the Ubiquiti USW, which plays in a league of its own in terms of engineering, probably designed in the US and not the far east.
You can try to boot any image for the 9300, by the way; it should get you fairly far.

Just when you think there is nowhere to make progress, suddenly there is an opening: it turns out the RTL931x platform now runs like a charm with the generic GIC timer and IRQ controller of the MIPS InterAptiv architecture with dual core SMP -- a completely different setup from the Realtek SDK-based machines that can be bought, which use only a single core and the RTL9300 timer. The same Ethernet driver fix that made the RTL8390 run with SMP worked here, too. Then some lessons learned from the Ubiquiti USW switch suddenly also made the MAC and PHYs work nicely together, plus some SMI taming, and now the mango platform looks peachy. The RTL931x supports OpenFlow for offloading of firewall, NAT and routing even for encapsulated flows, for which there is Linux driver support, so I am looking forward to some fun hacking.


@svanheule Have you ever figured out whether there is something like show tech-support for D-Link?
It still won't reset properly, and I have no clue how to figure out the reset GPIO from the GPL dump.

Which D-Link do you have? Paul has figured out the restart GPIO for the DGS-1210-10, does that help you? If you have that model, Paul has been waiting on someone to test those patches as well.

For an alternative to show tech-support, I'm afraid I don't really know a firmware-based one. You could trace pins, or loop over all GPIOs to see which one reboots your system (active low). If you get lucky, the pin assignments may also be included in the GPL dump. This was the case for the Cisco SG220 series.
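The "loop over all GPIOs" idea can be sketched roughly as below. This is an illustration only, not a tested tool: `reset_candidates`, `probe_reset` and the `drive_low` callback are hypothetical names, and how a pin is actually driven (SoC GPIO, an expander, devmem, ...) is left to the caller. It assumes the reset line is active low, so only pins that idle high are tried, and it logs each pin before driving it so that the last logged line identifies the pin that rebooted the system.

```python
from typing import Callable, Dict, Iterable, List


def reset_candidates(default_levels: Dict[int, int]) -> List[int]:
    """Return the GPIOs that idle high, in ascending order.

    An active-low reset line must sit high during normal operation,
    so pins that are already low cannot be the reset line.
    """
    return sorted(pin for pin, level in default_levels.items() if level == 1)


def probe_reset(
    candidates: Iterable[int],
    drive_low: Callable[[int], None],
    log: Callable[[str], None] = print,
) -> None:
    """Drive each candidate low in turn, logging the pin first.

    drive_low is a hypothetical, caller-supplied function that pulls
    one pin low. If the system reboots, the last logged pin is the
    likely reset GPIO.
    """
    for pin in candidates:
        log(f"driving GPIO {pin} low")
        drive_low(pin)


if __name__ == "__main__":
    # Made-up default levels, with a stand-in that only records the pins.
    driven: List[int] = []
    probe_reset(reset_candidates({0: 1, 1: 0, 2: 1}), driven.append)
```

On real hardware the log should go somewhere that survives the reboot, such as the serial console. The candidate list should also cover pins on any GPIO expander, since the reset line is not necessarily wired to the SoC itself.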


I have the DGS-1210-28P F2 rev.

Thanks for the link, I will check it out. I tried setting all of the GPIOs that default to HIGH to LOW, but it looks like the reset line is not on the SoC controller but on the expander, which is why I wasn't able to find it.

It could be in the GPL dump; I am trying to make sense of the SDK layout.

The reset GPIO on the DGS-1210-28 appears to be the same as what Paul found, so it's 34 on the 8231.


Sorry for the long delay. I just rebuilt the latest branch (running make dirclean before just to be safe), and still get the CPU stall. Same with the image you provided for @Peppino.
I noticed that your bootlog from a working GS1900-48 says RTL839X model is 83936806, while the bootlogs from @Peppino's and my non-working models say RTL839X model is 83936802.

Could you try building this:

We made some stability improvements over the last few days, and just this morning I did extensive performance and stability testing with that image on my GS1900-48 without any issues. But the different CPU ID could be an issue. The last digits are the revision number of the CPU. I will need to check where there might be potential issues with this. Normally during development I disregard that revision and only take the latest version into account, as the earlier ones usually are FPGA implementations of the SoC. 02 is a very early revision, so there could indeed be peculiarities, as this is probably the first one actually sold.

edwinistrator,

If you manage to compile the image, share it please, I simply don't have the time now to learn all the bits and pieces of the image build process...:frowning: yet it would be awesome to finally see this board alive again.

I mostly just wanted to make sure there was no user error on my part. There seem to be so many different login modes (enable mode, diagnostic mode, u-boot, config t, etc.) that I wanted to make sure I was using the right one to provide the output. Which mode do you want the show tech-support output from?

Here you go:

U-Boot 2011.12.(3.6.1.1) (Sep 22 2021 - 11:34:08)

Board: RTL9300 CPU:800MHz LX:175MHz DDR:600MHz
DRAM:  256 MB
SPI-F: MXIC/C22019/MMIO32-4/ModeC 1x32 MB (plr_flash_info @ 83f7cd34)
Loading 65536B env. variables from offset 0xe0000
Net:   Net Initialization Skipped
No ethernet found.
Hit Esc key to stop autoboot:  0
## dual_image_sel ... -1-0-<NULL>
## Booting image from partition ... 0
## Booting kernel from Legacy Image at 81000000 ...
   Image Name:   RTK_SDK
   Created:      2021-09-22  11:39:46 UTC
   Image Type:   MIPS Linux Kernel Image (lzma compressed)
   Data Size:    9094915 Bytes = 8.7 MB
   Load Address: 80000000
   Entry Point:  802a65b0
   Verifying Checksum ... OK
   Uncompressing Kernel Image ... OK

Starting kernel ...


Loading drivers..................................OK
Loading basic manager applications...............OK
Loading L2 applications..........................OK
Loading L3 applications..........................OK
Loading manager applications.....................OK

Username(1-64 chars):

I did some investigation. Basically, the chip revision is the last two digits of the CPU model number divided by 2:
REV-A: 0, REV-B: 1, REV-C: 2, REV-D: 3. So yours is a REV-B, mine is a REV-D. There are plenty of statements in the SDK code that distinguish between chips with REV < C and the newer ones. One difference concerns the RTL8396, the rest concern the multicast configuration table, which does not seem to exist on the early devices. I can try to disable that table for devices with REV < C. But debugging this will be very hard for me without access to an actual device. I will see what can be done.
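As a small worked example of that rule, here is a sketch applied to the two model IDs from the boot logs. The function names are made up for illustration, and it assumes the ID is printed in hex, so the last two characters are parsed as hex digits (for 02 and 06 this makes no difference).

```python
REVISION_LETTERS = "ABCD"  # REV-A: 0, REV-B: 1, REV-C: 2, REV-D: 3


def chip_revision(model_id: str) -> str:
    """Return the revision letter for a model ID string such as "83936806".

    Assumption: the ID is printed in hex, so the last two characters
    are hex digits. The revision index is that value divided by 2.
    """
    index = int(model_id[-2:], 16) // 2
    if index >= len(REVISION_LETTERS):
        raise ValueError(f"unknown revision index {index} in model {model_id}")
    return REVISION_LETTERS[index]


def is_early_revision(model_id: str) -> bool:
    """True for REV < C, the chips the SDK code treats differently."""
    return chip_revision(model_id) < "C"


if __name__ == "__main__":
    for model in ("83936806", "83936802"):
        kind = "early" if is_early_revision(model) else "late"
        print(model, "REV-" + chip_revision(model), kind)
```

A driver could use a check like `is_early_revision` to skip features that only exist from REV-C onwards; whether skipping them is enough on the early chips is not known at this point.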

Sorry for asking questions that I am sure have already been answered; I am trying to get RTL8382 support working under 5.16-rc4.
I got the RTL8231 working on the DGS-1210-28P, which got the reboot working, but only with some hacks to drop the mutex and soc_info, as those don't exist upstream.

I understand that the Realtek SMI is used to talk to the RTL8231 and it's not really a second name for MDIO like with other vendors.
Is it by any chance compatible with the upstream Realtek SMI driver they added because of the switches?
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/dsa/realtek-smi-core.c?h=v5.16-rc4

Because despite the code working, all of the drivers are currently interconnected, and that doesn't look upstream-friendly at all.
I haven't dealt with upstreaming MIPS so far, but I have done a lot of ARM/ARM64 upstreaming.
I don't want to be one of those who just complain, but without upstreaming, maintaining this will become a pain.