Support for RTL838x based managed switches

That patch has a relatively big impact on the general behaviour of these switches. Its purpose is to make the layer 2 forwarding (the switching) depend on both the destination MAC of a device and the VLAN. Without it, the forwarding depends on the MAC and a station ID and while this is the default hardware behaviour, it is neither documented how it works nor is it the default software configuration of these switches with the OEM software. Forwarding based on MAC and VLAN ID is fairly standard behaviour for switches in general, but there is a quirk with the RTL devices: they require a VLAN 1 to be present for anything VLAN-related to work, see for example dsa.c: rtl83xx_vlan_del(). Normally, this VLAN 1 is set up from userspace via /etc/config/network, see the default configuration. This VLAN 1 is probably set up too late for the failsafe mode.
Could you investigate to which point the VLANs are configured already at the time the failsafe starts and what the bridge configuration fixes?

Thanks for the information. My concern was that with this patch the driver doesn't behave like other DSA drivers (and the general DSA documentation), where you can use the interfaces for each port separately if you like, and that this might ultimately make configuration more difficult (and device-specific).

Without it, the forwarding depends on the MAC and a station ID and while this is the default hardware behaviour, it is neither documented how it works nor is it the default software configuration of these switches with the OEM software.

The default behaviour here sounds like the expected default DSA behaviour (assuming that the "station ID" identifies the physical port?). If the OEM behaviour is desired, can't that be achieved through the configuration interface, rather than as a default setup in the driver?

I don't think it's that concerning that the OEM software is always configured like this, because it was presumably only designed to do normal switch things, whereas with openwrt it seems really useful to be able to use each port separately if we want.

Anyway, sorry if this has already been discussed... to answer your question:

This VLAN 1 is probably set up too late for the failsafe mode.
Could you investigate to which point the VLANs are configured already at the time the failsafe starts and what the bridge configuration fixes?

By default the failsafe / preinit network configuration just does something like this (repeating myself slightly to make it clearer):

ip link set dev $ifname up
ip -4 address add 192.168.1.1/24 broadcast 192.168.1.255 dev $ifname

On other DSA targets, it's something like this:

ip link set dev eth0 up
ip link set dev lan1 up
ip -4 address add 192.168.1.1/24 broadcast 192.168.1.255 dev lan1

With my script above, that turns into:

ip link set eth0 up
brctl addbr lan
brctl addif lan lan1
ip link set lan1 up
ip link set dev lan up
ip -4 address add 192.168.1.1/24 broadcast 192.168.1.255 dev lan

It knows how to configure vlans by adding a vlan interface on top of the specified interface, but this also doesn't seem to work. The code is here if you're interested. I'll have a poke around to try to find out why the bridge makes it work.
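
For reference, the bridge variant above can also be written with iproute2 only, without brctl. This is just a sketch: it assumes the `ip` applet in the image supports `type bridge` and `master`, and the RUN variable is a made-up hook so the commands are printed instead of executed unless you set RUN="" on a device.

```shell
#!/bin/sh
# RUN is a hypothetical dry-run hook (not part of preinit):
# default "echo" prints the commands, RUN="" executes them.
RUN="${RUN-echo}"

failsafe_bridge() {
	cpu_if="$1"    # DSA conduit interface, e.g. eth0
	port_if="$2"   # user port, e.g. lan1
	br_if="$3"     # bridge name, e.g. lan

	$RUN ip link set dev "$cpu_if" up
	$RUN ip link add name "$br_if" type bridge
	$RUN ip link set dev "$port_if" master "$br_if"
	$RUN ip link set dev "$port_if" up
	$RUN ip link set dev "$br_if" up
	$RUN ip -4 address add 192.168.1.1/24 broadcast 192.168.1.255 dev "$br_if"
}

failsafe_bridge eth0 lan1 lan
```

The interface names and the address are the ones from the commands above; nothing else about the preinit script is implied.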

The Station ID is the MSTI, see e.g. here: https://en.wikipedia.org/wiki/Multiple_Spanning_Tree_Protocol
It can be set per-VLAN on the RTL-SoCs, see here: https://svanheule.net/realtek/maple/table/vlan
but it is not clear how to assign an MSTI to a VLAN, so we just use the VLAN-ID itself. In principle this should be the same, except for this strange quirk that VLAN 1 needs to exist. There was not much discussion on whether this quirk needs to be fixed in the driver itself. But it is probably better to be aware of this VLAN in userspace to avoid opaque behaviour.

Thanks again for the information; I was a bit confused!

Poking around, it seems like the only change required to get traffic to flow is to add the ports to vlan 1 (via a call to vlan_set_tagged); the rest of rtl83xx_vlan_setup doesn't affect this.

It works in bridge mode because for some reason something higher up in the stack adds and then removes vlan 1 on every configured port, and rtl83xx_vlan_del skips the removal for vlan 1. This does have the effect you'd expect, in that vlan 1 traffic gets to ports where it shouldn't. This is quite unfortunate from a security perspective.

Does the vendor firmware have this issue too? (edit: looks like yes, I'll do some more reading of this thread...) If not, maybe there's some extra configuration to be done, or otherwise some hack involving the vlan mapping rules might be possible.

I do not think there is necessarily a fundamental issue with the hardware. The firmware merely configures a VLAN 1 in hardware but does not tag any ports with it. Internally this seems to be the default VLAN-id used for packets that are not tagged. My suspicion is that the hardware does the following mapping: internal VLAN 1 = no 802.1q or vlan 1, internal VLAN 0 = any VLAN. This solution is found on all SoCs until the 9300 family, the 9310 family does not seem to have this quirk. @bmork suggested the solution that is currently in place. If you want to investigate yourself, there is a reasonably readable SDK available: https://biot.com/gpl/XGS1210_OSC_20201116.7z
For the VLAN default configuration have a look at e.g. sdk/src/dal/maple/dal_maple_vlan.c:_dal_maple_vlan_init_config()

What I found in early experiments was that removing VLAN 1 from a port would block all traffic on that port regardless of VLAN. Tagged or not doesn't matter. Only membership. My "fix" was to let the driver fake removals of VLAN 1 from any port. Very ugly since it creates a hidden domain with all ports, which can't be disabled.
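
If someone wants to check whether a given build still behaves like this, the experiment could be repeated from userspace roughly as follows. This is only a sketch: it assumes the port sits in a VLAN-filtering bridge, the port name lan1 and VLAN 100 are placeholders, and RUN is a made-up dry-run hook (default prints the commands, RUN="" executes them).

```shell
#!/bin/sh
# RUN is a hypothetical dry-run hook: "echo" prints, "" executes.
RUN="${RUN-echo}"

vlan1_removal_test() {
	port="$1"        # bridge port to test, e.g. lan1
	other_vid="$2"   # a second VLAN that should keep working, e.g. 100

	# Make the port a member of another VLAN first ...
	$RUN bridge vlan add dev "$port" vid "$other_vid"
	# ... then drop VLAN 1 and look at the resulting membership.
	$RUN bridge vlan del dev "$port" vid 1
	$RUN bridge vlan show dev "$port"
}

vlan1_removal_test lan1 100
```

After that, test whether traffic on the other VLAN still passes through the port. Keep in mind that the bridge layer only shows what was requested; with the faked removal in the driver, the hardware table can differ from what `bridge vlan show` reports.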

Now this is a long time ago, and I must admit I haven't followed the development closely. So for all I know this was an issue related to how we configured the SoC at the time, and not relevant anymore.

But magic behaviour related to VLAN 1 (or another ID configured as the switch "management VLAN") is/was very common. So common that some guides recommend staying away from VLAN 1.

Playing with the vendor firmware, it looks like you can disable vlan 1 on a port and still have traffic on other vlans flow, so either some configuration is not quite right, or they have implemented a work-around (or my test was bogus!)

The poking continues...

This PR should fix the VLAN1 strangeness that was discussed above:

I've only been able to test on RTL8380M; it'd be good to hear if it works for any other devices.

For completeness, some of the things I said above about how this works were wrong, probably because I didn't have quite enough visibility into what was going on. With JTAG set up things were much clearer.

Worked for me on a Netgear GS108T v3.

Any experts here who could help with a build for Zyxel GS1900-16 please?

@RaylynnKnight has worked on device support for this model at https://github.com/RaylynnKnight/openwrt/tree/ZyXEL_GS1900-16, but it has not yet been merged into OpenWrt master (nor submitted as a PR or patch series on the mailing list). That means you'll have to build it from source, pulling the changes from that branch into master, and start testing it. You probably (really) want serial console access at this point and the ability to fine-tune and rebuild at the source level as needed.

Thanks. I've been in touch with @RaylynnKnight but he is (understandably) busy with life. I am looking for someone with build skills who can help debug/fix issues when we run into them.

Have a go with

just replace/augment steps 2.1 and 2.2 with

$ git clone https://git.openwrt.org/openwrt/openwrt.git
Cloning into 'openwrt'...
remote: Enumerating objects: 567392, done.
remote: Counting objects: 100% (567392/567392), done.
remote: Compressing objects: 100% (149334/149334), done.
remote: Total 567392 (delta 395908), reused 565238 (delta 394532)
Receiving objects: 100% (567392/567392), 167.21 MiB | 24.78 MiB/s, done.
Resolving deltas: 100% (395908/395908), done.

$ cd openwrt/

$ git pull https://github.com/RaylynnKnight/openwrt/ ZyXEL_GS1900-16
remote: Enumerating objects: 44, done.
remote: Counting objects: 100% (24/24), done.
remote: Total 44 (delta 23), reused 23 (delta 23), pack-reused 20
Unpacking objects: 100% (44/44), 9.66 KiB | 824.00 KiB/s, done.
From https://github.com/RaylynnKnight/openwrt
 * branch                  ZyXEL_GS1900-16 -> FETCH_HEAD
Merge made by the 'recursive' strategy.
 package/boot/uboot-envtools/files/realtek            |  7 ++++---
 target/linux/realtek/dts/rtl8380_zyxel_gs1900-16.dts | 36 ++++++++++++++++++++++++++++++++++++
 target/linux/realtek/image/Makefile                  |  7 +++++++
 3 files changed, 47 insertions(+), 3 deletions(-)
 create mode 100644 target/linux/realtek/dts/rtl8380_zyxel_gs1900-16.dts

Select realtek/zyxel_gs1900-16, add luci and enable TARGET_ROOTFS_INITRAMFS (as you will need the initramfs image for testing over the serial console).

Feel free to use this config:

##### realtek.init: start #####

### Use "make defconfig oldconfig" to expand this to a full .config

CONFIG_TARGET_realtek=y
CONFIG_TARGET_realtek_generic=y
CONFIG_TARGET_DEVICE_realtek_generic_DEVICE_allnet_all-sg8208m=y
CONFIG_TARGET_DEVICE_PACKAGES_realtek_generic_DEVICE_allnet_all-sg8208m=""
CONFIG_TARGET_DEVICE_realtek_generic_DEVICE_d-link_dgs-1210-10p=y
CONFIG_TARGET_DEVICE_PACKAGES_realtek_generic_DEVICE_d-link_dgs-1210-10p=""
CONFIG_TARGET_DEVICE_realtek_generic_DEVICE_d-link_dgs-1210-16=y
CONFIG_TARGET_DEVICE_PACKAGES_realtek_generic_DEVICE_d-link_dgs-1210-16=""
CONFIG_TARGET_DEVICE_realtek_generic_DEVICE_d-link_dgs-1210-28=y
CONFIG_TARGET_DEVICE_PACKAGES_realtek_generic_DEVICE_d-link_dgs-1210-28=""
CONFIG_TARGET_DEVICE_realtek_generic_DEVICE_inaba_aml2-17gp=y
CONFIG_TARGET_DEVICE_PACKAGES_realtek_generic_DEVICE_inaba_aml2-17gp=""
CONFIG_TARGET_DEVICE_realtek_generic_DEVICE_netgear_gs108t-v3=y
CONFIG_TARGET_DEVICE_PACKAGES_realtek_generic_DEVICE_netgear_gs108t-v3=""
CONFIG_TARGET_DEVICE_realtek_generic_DEVICE_netgear_gs110tpp-v1=y
CONFIG_TARGET_DEVICE_PACKAGES_realtek_generic_DEVICE_netgear_gs110tpp-v1=""
CONFIG_TARGET_DEVICE_realtek_generic_DEVICE_netgear_gs308t-v1=y
CONFIG_TARGET_DEVICE_PACKAGES_realtek_generic_DEVICE_netgear_gs308t-v1=""
CONFIG_TARGET_DEVICE_realtek_generic_DEVICE_netgear_gs310tp-v1=y
CONFIG_TARGET_DEVICE_PACKAGES_realtek_generic_DEVICE_netgear_gs310tp-v1=""
CONFIG_TARGET_DEVICE_realtek_generic_DEVICE_zyxel_gs1900-10hp=y
CONFIG_TARGET_DEVICE_PACKAGES_realtek_generic_DEVICE_zyxel_gs1900-10hp=""
CONFIG_TARGET_DEVICE_realtek_generic_DEVICE_zyxel_gs1900-16=y
CONFIG_TARGET_DEVICE_PACKAGES_realtek_generic_DEVICE_zyxel_gs1900-16=""
CONFIG_TARGET_DEVICE_realtek_generic_DEVICE_zyxel_gs1900-8=y
CONFIG_TARGET_DEVICE_PACKAGES_realtek_generic_DEVICE_zyxel_gs1900-8=""
CONFIG_TARGET_DEVICE_realtek_generic_DEVICE_zyxel_gs1900-8hp-v1=y
CONFIG_TARGET_DEVICE_PACKAGES_realtek_generic_DEVICE_zyxel_gs1900-8hp-v1=""
CONFIG_TARGET_DEVICE_realtek_generic_DEVICE_zyxel_gs1900-8hp-v2=y
CONFIG_TARGET_DEVICE_PACKAGES_realtek_generic_DEVICE_zyxel_gs1900-8hp-v2=""

### Enable per device rootfs
CONFIG_TARGET_MULTI_PROFILE=y
CONFIG_TARGET_PER_DEVICE_ROOTFS=y

### Debugging options
# CONFIG_KERNEL_DEBUG_INFO is not set
# CONFIG_KERNEL_DEBUG_KERNEL is not set
# CONFIG_KERNEL_KALLSYMS is not set

### Per-package build logs in <buildroot>/logs
CONFIG_DEVEL=y
CONFIG_BUILD_LOG=y

### Include package list in build
CONFIG_INCLUDE_CONFIG=y

### Longer waiting for failsafe button push
CONFIG_IMAGEOPT=y
CONFIG_PREINITOPT=y
CONFIG_TARGET_PREINIT_TIMEOUT=5

### Minify lua
CONFIG_LUCI_SRCDIET=y

### build kmod-mtd-rw, so it is available for installing later
CONFIG_PACKAGE_kmod-mtd-rw=m

### display and edit the uboot environment
CONFIG_PACKAGE_uboot-envtools=y

### luci
CONFIG_PACKAGE_uhttpd=y
CONFIG_PACKAGE_uhttpd-mod-ubus=y
CONFIG_PACKAGE_luci-mod-admin-full=y
CONFIG_PACKAGE_luci-app-firewall=y
CONFIG_PACKAGE_luci-app-opkg=y
CONFIG_PACKAGE_luci-app-uhttpd=y
CONFIG_PACKAGE_luci-app-wol=y
CONFIG_PACKAGE_luci-proto-ipv6=y
CONFIG_PACKAGE_rpcd-mod-rrdns=y
CONFIG_PACKAGE_luci-theme-bootstrap=y

### luci-ssl (wolfssl)
CONFIG_PACKAGE_px5g-wolfssl=y

### disable signed packages
CONFIG_PACKAGE_opkg=y
# CONFIG_PACKAGE_openwrt-keyring is not set
# CONFIG_PACKAGE_usign is not set
# CONFIG_SIGNED_PACKAGES is not set

##### realtek.init: stop #####

Given that you really, really want to test the initramfs image (using tftpboot over the serial console) first, there's little danger of doing anything wrong, even if you'd somehow mess up the building.
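
Before starting a long build it can be worth checking that the expanded .config really has what you need. A small sketch, assuming the seed above was copied to .config and expanded with "make defconfig"; config_enabled and the CFG variable are helper names made up for this example. Note that the seed as posted does not set TARGET_ROOTFS_INITRAMFS explicitly, so whether it ends up enabled may depend on the target defaults.

```shell
#!/bin/sh
# config_enabled FILE SYMBOL: succeed if SYMBOL=y appears as a whole line.
config_enabled() {
	grep -Fxq "$2=y" "$1"
}

# CFG is a hypothetical override; by default look at ./.config.
cfg="${CFG:-.config}"
if [ -f "$cfg" ]; then
	for sym in \
		CONFIG_TARGET_DEVICE_realtek_generic_DEVICE_zyxel_gs1900-16 \
		CONFIG_TARGET_ROOTFS_INITRAMFS
	do
		if config_enabled "$cfg" "$sym"; then
			echo "ok:      $sym"
		else
			echo "missing: $sym"
		fi
	done
fi
```

If the initramfs symbol shows up as missing, enable it in "make menuconfig" before building.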

Obscure SFPs are supported too, in case anyone wondered. This is a BX-U SFP "donated" by my ISP's CPE. I am using a ZyXEL GS1900-10HP paired with an RPi4 to replace that CPE for this fibre access:

root@gs1900-10hp-f:~# ethtool -m lan10
        Identifier                                : 0x03 (SFP)
        Extended identifier                       : 0x04 (GBIC/SFP defined by 2-wire interface ID)
        Connector                                 : 0x01 (SC)
        Transceiver codes                         : 0x00 0x00 0x00 0x40 0x12 0x00 0x01 0x00 0x00
        Transceiver type                          : Ethernet: BASE-BX10
        Transceiver type                          : FC: long distance (L)
        Transceiver type                          : FC: Longwave laser (LC)
        Transceiver type                          : FC: Single Mode (SM)
        Encoding                                  : 0x01 (8B/10B)
        BR, Nominal                               : 1300MBd
        Rate identifier                           : 0x00 (unspecified)
        Length (SMF,km)                           : 10km
        Length (SMF)                              : 10000m
        Length (50um)                             : 0m
        Length (62.5um)                           : 0m
        Length (Copper)                           : 0m
        Length (OM3)                              : 0m
        Laser wavelength                          : 1310nm
        Vendor name                               : Tsuhan
        Vendor OUI                                : 00:00:00
        Vendor PN                                 : THMPRS-3511-10A
        Vendor rev                                : A
        Option values                             : 0x00 0x1a
        Option                                    : RX_LOS implemented
        Option                                    : TX_FAULT implemented
        Option                                    : TX_DISABLE implemented
        BR margin, max                            : 0%
        BR margin, min                            : 0%
        Vendor SN                                 : F19021506991
        Date code                                 : 190409
        Optical diagnostics support               : Yes
        Laser bias current                        : 19.530 mA
        Laser output power                        : 0.3220 mW / -4.92 dBm
        Receiver signal average optical power     : 0.1778 mW / -7.50 dBm
        Module temperature                        : 49.00 degrees C / 120.20 degrees F
        Module voltage                            : 3.2446 V
        Alarm/warning flags implemented           : Yes
        Laser bias current high alarm             : Off
        Laser bias current low alarm              : Off
        Laser bias current high warning           : Off
        Laser bias current low warning            : Off
        Laser output power high alarm             : Off
        Laser output power low alarm              : Off
        Laser output power high warning           : Off
        Laser output power low warning            : Off
        Module temperature high alarm             : Off
        Module temperature low alarm              : Off
        Module temperature high warning           : Off
        Module temperature low warning            : Off
        Module voltage high alarm                 : Off
        Module voltage low alarm                  : Off
        Module voltage high warning               : Off
        Module voltage low warning                : Off
        Laser rx power high alarm                 : Off
        Laser rx power low alarm                  : Off
        Laser rx power high warning               : Off
        Laser rx power low warning                : Off
        Laser bias current high alarm threshold   : 65.000 mA
        Laser bias current low alarm threshold    : 1.000 mA
        Laser bias current high warning threshold : 55.000 mA
        Laser bias current low warning threshold  : 3.000 mA
        Laser output power high alarm threshold   : 1.2589 mW / 1.00 dBm
        Laser output power low alarm threshold    : 0.0447 mW / -13.50 dBm
        Laser output power high warning threshold : 0.5012 mW / -3.00 dBm
        Laser output power low warning threshold  : 0.1122 mW / -9.50 dBm
        Module temperature high alarm threshold   : 90.00 degrees C / 194.00 degrees F
        Module temperature low alarm threshold    : -10.00 degrees C / 14.00 degrees F
        Module temperature high warning threshold : 85.00 degrees C / 185.00 degrees F
        Module temperature low warning threshold  : -5.00 degrees C / 23.00 degrees F
        Module voltage high alarm threshold       : 3.6000 V
        Module voltage low alarm threshold        : 3.0000 V
        Module voltage high warning threshold     : 3.5000 V
        Module voltage low warning threshold      : 3.1000 V
        Laser rx power high alarm threshold       : 1.2589 mW / 1.00 dBm
        Laser rx power low alarm threshold        : 0.0050 mW / -23.01 dBm
        Laser rx power high warning threshold     : 0.5012 mW / -3.00 dBm
        Laser rx power low warning threshold      : 0.0126 mW / -19.00 dBm

Works perfectly. Extra bonus is that the switch is powering the RPi (with a PoE hat) as well as two APs. The RPi is set up as a router-on-a-stick, using different VLANs as WAN and LAN.

Is there any way to see which firmware slot (on, e.g. the GS308T) corresponds to which image1/image2 (for the purpose of setting boot partition via fw_setsys)? I'd like to flash the original netgear image into image2 and be able to boot from it, but I don't know where to point mtd.

Update: found the answer at biot.com. It's mtd6

image1 is "firmware" and image2 is "runtime2" in OpenWrt. Just use the "runtime2" name with mtd, or look at /proc/mtd to see the mapping.

Note that the bootpartition variable is zero-based and the OEM partition names are one-based. So "bootpartition = 1" means "image2". Somewhat confusing...
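
To make the two mappings concrete, here is a small sketch. The helper names are made up; the first one just parses the usual /proc/mtd layout (device, size, erasesize, quoted name), the second one applies the zero-based/one-based offset described above.

```shell
#!/bin/sh
# mtd_by_name NAME [TABLE]: print the mtd device whose name matches NAME.
# TABLE defaults to /proc/mtd; a saved copy can be passed instead.
mtd_by_name() {
	awk -v want="\"$1\"" '$4 == want { sub(/:$/, "", $1); print $1 }' \
		"${2:-/proc/mtd}"
}

# oem_image_for_bootpartition N: bootpartition is zero-based,
# the OEM image names are one-based.
oem_image_for_bootpartition() {
	echo "image$(( $1 + 1 ))"
}

# Only query the live table if it exists (i.e. when run on the switch).
if [ -r /proc/mtd ]; then
	mtd_by_name runtime2
fi
oem_image_for_bootpartition 1
```

The name match assumes partition names without spaces, which holds for "firmware" and "runtime2".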

Hmm, does the GS308T firmware need to be stripped before flashing?
(from https://openwrt.org/docs/guide-user/installation/generic.uninstall#via_openwrt_cli)

I'm not sure what that page means by "stripped". That part definitely needs some more explanation and examples.

But I'm guessing that it is about additional firmware headers which should not end up in flash, used by the original firmware to verify/validate the image before flashing. This depends on vendor/device, and I have no experience with the GS308T. But the realtek devices I have tested do not use such headers. They come with firmware which can be written directly as-is. The ZyXEL firmware has a trailer, but writing that to flash is harmless.

You can try to verify the theory by comparing the start of a partition with the start of an image, using e.g. hexdump. Checksums, sizes etc. will of course be different, but you should recognize the pattern in the header, in particular the magic number in the first 4 bytes.
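
A minimal sketch of that comparison, using od because it is available on most build hosts; on the switch itself you may only have hexdump, in which case eyeballing "hexdump -C" output of both files does the same job. The function names and the file names in the usage comment are placeholders.

```shell
#!/bin/sh
# magic_of FILE: print the first 4 bytes of FILE as 8 hex digits.
magic_of() {
	od -A n -t x1 -N 4 "$1" | tr -d ' \n'
}

# same_magic A B: succeed if both files start with the same 4 bytes.
same_magic() {
	[ "$(magic_of "$1")" = "$(magic_of "$2")" ]
}

# Usage, with placeholder file names (a dump of the partition and the
# vendor image you intend to write):
#   same_magic partition-dump.bin vendor-image.bin && echo "magic matches"
```

A matching magic number only suggests that the image has no extra header in front of it; it says nothing about trailers or checksums.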

It looks like the GS1900-8HP offers the same elegant access integrated into one of the "grills" that let air in and out on the sides (at least the B1 revision).

Overall openwrt has been working great on my GS1900-8HP.

One weirdness I have noticed is that traffic destined for the switch itself seems to be leaked/broadcast to different ports unexpectedly. Steps to reproduce on openwrt master:

  1. Start from stock config (management VLAN 100 on port 1 tagged)
  2. Set port 2 to VLAN 100 untagged
  3. Run wireshark on a machine connected to port 1
  4. Access the openwrt web interface on a different machine connected to port 2
  5. Observe the HTTP packets sent out port 1. (Only the packets to the switch, not from the switch.)
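
For step 3, a capture filter narrows things down to exactly the leaked packets. The sketch below builds one; since port 1 carries VLAN 100 tagged, the frames may or may not arrive with an 802.1Q tag depending on the capture setup, so the filter covers both cases. The helper name is made up and 192.0.2.1 is a documentation address standing in for the switch's management IP.

```shell
#!/bin/sh
# leak_filter SWITCH_IP: print a pcap filter matching HTTP traffic
# addressed to the switch, whether the frame is untagged or 802.1Q-tagged.
leak_filter() {
	f="tcp port 80 and dst host $1"
	# "vlan" shifts the decode offsets for everything after it,
	# so the untagged alternative has to come first.
	printf '(%s) or (vlan and %s)\n' "$f" "$f"
}

leak_filter 192.0.2.1
# On the capture machine (the interface name eth0 is an assumption):
#   tcpdump -ni eth0 "$(leak_filter 192.0.2.1)"
```

The same expression can be pasted into wireshark's capture filter field, since it uses the same pcap filter syntax.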

This does not happen with the stock firmware.

Has anyone seen this behavior?