Support for RTL838x based managed switches

I can confirm that this works. Great work, bmork!
What a stupid mistake on my side to disregard the value in
rtl8231_pin_dir :wink:
Would you mind sending a PR to the 83xx_dev branch?

And BTW: It should now be possible to use the generic SFP support in Linux with these modules: https://www.mjmwired.net/kernel/Documentation/devicetree/bindings/net/sff,sfp.txt

Wow. That was too easy. So I probably overlooked something. But anyway: added two commits to the sfp branch I sent a PR for. Some basic functionality works:

root@OpenWrt:/# ethtool -m lan10
Offset          Values
------          ------
0x0000:         03 04 07 00 00 00 01 40 00 0c 00 01 0d 00 00 00 
0x0010:         37 1b 00 00 46 69 62 65 72 53 74 6f 72 65 20 20 
0x0020:         20 20 20 20 00 00 00 00 53 46 50 31 47 2d 53 58 
0x0030:         2d 38 35 20 20 20 20 20 20 20 20 20 03 52 00 b8 
0x0040:         00 1a 00 00 44 38 37 42 32 33 31 32 34 36 35 20 
0x0050:         20 20 20 20 31 37 31 32 30 32 20 20 68 f0 01 dc 
0x0060:         00 00 08 78 6f f5 14 69 89 9b 8d 5b 28 e7 c7 ea 
0x0070:         2e 6c 40 00 00 00 00 00 00 00 00 00 ce 8f 7d cc 
0x0080:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
0x0090:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
0x00a0:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
0x00b0:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
0x00c0:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
0x00d0:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
0x00e0:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
0x00f0:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
0x0100:         64 00 ce 00 55 00 d8 00 94 3e 6d 92 87 5a 7a 76 
0x0110:         af c8 00 00 a6 04 00 00 1b a7 03 7b 13 93 04 ea 
0x0120:         31 2d 00 3f 1f 07 00 64 00 00 00 00 00 00 00 00 
0x0130:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
0x0140:         00 00 00 00 3f 80 00 00 00 00 00 00 01 00 00 00 
0x0150:         01 00 00 00 01 00 00 00 01 00 00 00 00 00 00 e0 
0x0160:         24 e1 80 ed 00 00 00 00 00 00 ff ff ff ff 82 00 
0x0170:         00 40 00 ff 00 40 ff ff 00 00 ff 00 00 00 00 00 
0x0180:         57 4f 54 52 42 39 56 42 41 41 31 30 2d 32 36 32 
0x0190:         36 2d 30 31 56 30 31 20 89 fb 55 00 00 00 00 7d 
0x01a0:         00 00 00 00 00 00 00 00 00 00 62 a4 64 00 69 9c 
0x01b0:         75 1b 81 f1 12 26 0b f5 0c f2 0f b6 00 00 aa aa 
0x01c0:         47 4c 43 2d 53 58 2d 4d 4d 44 20 20 20 20 20 20 
0x01d0:         20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 59 
0x01e0:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
0x01f0:         00 00 00 00 00 00 00 00 ff c0 ff ff ff c0 ff ff 
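For readers who want to make sense of the dump above: the first page of an SFP EEPROM follows the SFF-8472 layout, so the vendor strings can be read straight out of the hex. The following is only a minimal sketch, with the first 96 bytes of the dump pasted in by hand; the helper names `text_field` and `decode_sfp_id` are made up for illustration and are not part of ethtool or the kernel.

```python
# First 96 bytes of the dump above, copied by hand (offsets 0x00-0x5f).
EEPROM_HEX = """
03 04 07 00 00 00 01 40 00 0c 00 01 0d 00 00 00
37 1b 00 00 46 69 62 65 72 53 74 6f 72 65 20 20
20 20 20 20 00 00 00 00 53 46 50 31 47 2d 53 58
2d 38 35 20 20 20 20 20 20 20 20 20 03 52 00 b8
00 1a 00 00 44 38 37 42 32 33 31 32 34 36 35 20
20 20 20 20 31 37 31 32 30 32 20 20 68 f0 01 dc
"""

data = bytes.fromhex("".join(EEPROM_HEX.split()))


def text_field(buf, start, end):
    """Return the space-padded ASCII field buf[start:end] without padding."""
    return buf[start:end].decode("ascii").strip()


def decode_sfp_id(buf):
    """Decode a few SFF-8472 base ID fields (byte offsets per the spec)."""
    return {
        "identifier": buf[0],                      # 0x03 means SFP/SFP+
        "connector": buf[2],                       # 0x07 means LC
        "vendor_name": text_field(buf, 20, 36),    # bytes 20-35
        "vendor_pn": text_field(buf, 40, 56),      # bytes 40-55
        "wavelength_nm": int.from_bytes(buf[60:62], "big"),  # bytes 60-61
        "vendor_sn": text_field(buf, 68, 84),      # bytes 68-83
        "date_code": text_field(buf, 84, 92),      # bytes 84-91, YYMMDD
    }


info = decode_sfp_id(data)
for key, value in info.items():
    print(f"{key}: {value}")
```

On a system with the full ethtool, `ethtool -m lan10` without raw output should print the same fields already decoded.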

root@OpenWrt:/# grep . /sys/class/hwmon/hwmon0/*
/sys/class/hwmon/hwmon0/curr1_crit:90
/sys/class/hwmon/hwmon0/curr1_crit_alarm:0
/sys/class/hwmon/hwmon0/curr1_input:0
/sys/class/hwmon/hwmon0/curr1_label:bias
/sys/class/hwmon/hwmon0/curr1_lcrit:0
/sys/class/hwmon/hwmon0/curr1_lcrit_alarm:0
/sys/class/hwmon/hwmon0/curr1_max:85
/sys/class/hwmon/hwmon0/curr1_max_alarm:0
/sys/class/hwmon/hwmon0/curr1_min:0
/sys/class/hwmon/hwmon0/curr1_min_alarm:0
/sys/class/hwmon/hwmon0/in0_crit:3795
/sys/class/hwmon/hwmon0/in0_crit_alarm:0
/sys/class/hwmon/hwmon0/in0_input:3301
/sys/class/hwmon/hwmon0/in0_label:VCC
/sys/class/hwmon/hwmon0/in0_lcrit:2805
/sys/class/hwmon/hwmon0/in0_lcrit_alarm:0
/sys/class/hwmon/hwmon0/in0_max:3465
/sys/class/hwmon/hwmon0/in0_max_alarm:0
/sys/class/hwmon/hwmon0/in0_min:3135
/sys/class/hwmon/hwmon0/in0_min_alarm:0
/sys/class/hwmon/hwmon0/name:sfp_p10
/sys/class/hwmon/hwmon0/power1_crit:708
/sys/class/hwmon/hwmon0/power1_crit_alarm:0
/sys/class/hwmon/hwmon0/power1_input:0
/sys/class/hwmon/hwmon0/power1_label:TX_power
/sys/class/hwmon/hwmon0/power1_lcrit:89
/sys/class/hwmon/hwmon0/power1_lcrit_alarm:0
/sys/class/hwmon/hwmon0/power1_max:501
/sys/class/hwmon/hwmon0/power1_max_alarm:0
/sys/class/hwmon/hwmon0/power1_min:126
/sys/class/hwmon/hwmon0/power1_min_alarm:0
/sys/class/hwmon/hwmon0/power2_crit:1259
/sys/class/hwmon/hwmon0/power2_crit_alarm:0
/sys/class/hwmon/hwmon0/power2_input:0
/sys/class/hwmon/hwmon0/power2_label:RX_power
/sys/class/hwmon/hwmon0/power2_lcrit:6
/sys/class/hwmon/hwmon0/power2_lcrit_alarm:1
/sys/class/hwmon/hwmon0/power2_max:794
/sys/class/hwmon/hwmon0/power2_max_alarm:0
/sys/class/hwmon/hwmon0/power2_min:10
/sys/class/hwmon/hwmon0/power2_min_alarm:1
/sys/class/hwmon/hwmon0/temp1_crit:100000
/sys/class/hwmon/hwmon0/temp1_crit_alarm:0
/sys/class/hwmon/hwmon0/temp1_input:36879
/sys/class/hwmon/hwmon0/temp1_label:temperature
/sys/class/hwmon/hwmon0/temp1_lcrit:-50000
/sys/class/hwmon/hwmon0/temp1_lcrit_alarm:0
/sys/class/hwmon/hwmon0/temp1_max:85000
/sys/class/hwmon/hwmon0/temp1_max_alarm:0
/sys/class/hwmon/hwmon0/temp1_min:-40000
/sys/class/hwmon/hwmon0/temp1_min_alarm:0
/sys/class/hwmon/hwmon0/uevent:OF_NAME=sfp-p10
/sys/class/hwmon/hwmon0/uevent:OF_FULLNAME=/sfp-p10
/sys/class/hwmon/hwmon0/uevent:OF_COMPATIBLE_0=sff,sfp
/sys/class/hwmon/hwmon0/uevent:OF_COMPATIBLE_N=1
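The raw hwmon numbers above are easier to judge once converted: the hwmon sysfs interface reports temperatures in millidegrees Celsius, voltages in millivolts, currents in milliamperes and power in microwatts. A small sketch of that conversion, using a few values copied from the listing; the `convert` helper is hypothetical and only exists for this example.

```python
# Scale factors for the hwmon sysfs attribute prefixes seen above.
# raw / scale gives the value in the unit named on the right.
HWMON_UNITS = {
    "temp": (1000.0, "°C"),   # sysfs unit: millidegrees Celsius
    "in": (1000.0, "V"),      # sysfs unit: millivolts
    "curr": (1.0, "mA"),      # sysfs unit: milliamperes
    "power": (1000.0, "mW"),  # sysfs unit: microwatts
}


def convert(attr, raw):
    """Convert a raw hwmon value to (value, unit) based on the attribute name."""
    for prefix, (scale, unit) in HWMON_UNITS.items():
        if attr.startswith(prefix):
            return raw / scale, unit
    raise ValueError(f"unknown hwmon attribute: {attr}")


# A few samples copied from the listing above.
samples = {
    "temp1_input": 36879,
    "in0_input": 3301,
    "curr1_max": 85,
    "power2_min": 10,
}

for attr, raw in samples.items():
    value, unit = convert(attr, raw)
    print(f"{attr}: {value} {unit}")
```

Read this way, the module sits at about 36.9 °C on a 3.3 V supply. The two RX power alarms are what one would expect with no light arriving on the receiver, although that is an interpretation and not something the listing states.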

Wow, that is seriously cool stuff. So now we have working PoE and SFP on a budget device! We need to get that into OpenWrt/master.

On a side note: There is now a description of some of the basic workings of the RTL838x/9x SoCs. We have tried to make it as easily readable as possible. I believe it is quite interesting to see how such a heart of a switch actually works: https://biot.com/switches/description_of_the_rtl_socs

I can also report some progress on the T1600G-52PS: Using my self-compiled U-Boot, I was able to boot OpenWrt on the switch.

Now the fun with GPIOs/LEDs and PoE can begin!

I think I need some guidance with LEDs, i2c and GPIOs.

LEDs: So far, I've found some LEDs, but none of the port LEDs. I noticed that all port LEDs flash green for a second and then amber for another second if I set GPIO 37 high. I can see 13 HC164 shift registers near the port LEDs, but no RTL8231 GPIO expander (I did not look at the bottom of the board).

i2c: How do I determine the i2c pins that I assume are used for configuring the PoE PSE ICs?

GPIOs: Two of the three fans are speed-controlled. Any ideas on how to determine how they are controlled? There is a FET near the fan connectors, so maybe they are PWM-controlled.

Regarding the LEDs: the port LEDs are probably driven directly by the SoC, using two lines that feed all those shift registers. Normally the SoC controls them itself. If you want to control them yourself, have a look at the .dts of the T2500G; you need to put the LEDs under software control. The relevant code is:

&gpio0 {
        take-port-leds;
        leds-per-port = <2>;
        led-mode = <0xf501ea>;
        num-leds = <32>;
        min-led = <0>;
};

Concerning PoE, you probably need to look into the data sheet of the PoE chip, identify where the I2C lines are, and follow them to pins on either the SoC or the RTL8231. Maybe you have a picture that could help us?

With regards to the fans, there is not much experience. There seem to be automatic controls where the fans are directly steered by temperature sensors. In another case (GS728TPv2) there is an MSP430 microcontroller on an I2C bus controlled by the SoC somehow which does the steering, but it is not understood how this happens. Maybe a picture could help here.

I've put the pictures on the Wiki: https://biot.com/switches/t1600g-52ps

I also noticed that the port LEDs do not work in U-Boot either. They are probably (that's an educated guess) controlled by the software: Using the stock firmware, a button push toggles the LED functionality between speed indication and PoE status indication.

The speed of all three fans is supervised by the stock firmware: as soon as I stop one of the fans, the fan LED changes its color.

Regarding tracing the pins: I don't want to take off any of the soldered heat sinks and they are, unfortunately, too big to have access to the pins.

What's the current status here? I was convinced that I had tested vlan_filtering, but of course I forgot that I didn't have the full iproute2 suite. It turns out the command I had put in rc.local was failing:

ip link set br-lan type bridge vlan_filtering 1 ageing_time 5000

so vlan_filtering was never enabled.

But I have now started writing a real network config, trying to use the recent netifd support for DSA, which enables vlan_filtering by default on any bridge with VLANs configured. Unfortunately, that seems to be broken?

I am testing with a simple setup with a tagged uplink port and a local subinterface:

root@OpenWrt:/tmp# brctl show
bridge name     bridge id               STP enabled     interfaces
br-lan          7fff.bccf4fd16b32       no              lan8
                                                        lan6
                                                        lan4
                                                        lan2
                                                        lan9
                                                        lan7
                                                        lan10
                                                        lan5
                                                        lan3
                                                        lan1
root@OpenWrt:/tmp# bridge vlan
port              vlan-id  
lan1              1 PVID Egress Untagged
lan2              1 PVID Egress Untagged
lan3              1 PVID Egress Untagged
lan4              1 PVID Egress Untagged
lan5              1 PVID Egress Untagged
lan6              1 PVID Egress Untagged
lan7              1 PVID Egress Untagged
lan8              1 PVID Egress Untagged
                  203
lan9              1 PVID Egress Untagged
lan10             1 PVID Egress Untagged
br-lan            1 PVID Egress Untagged
                  203
root@OpenWrt:/tmp# ifconfig br-lan.203
br-lan.203 Link encap:Ethernet  HWaddr BC:CF:4F:D1:6B:32  
          inet addr:192.168.99.51  Bcast:192.168.99.255  Mask:255.255.255.0
          inet6 addr: fe80::becf:4fff:fed1:6b32/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:378 errors:0 dropped:0 overruns:0 frame:0
          TX packets:22 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:19356 (18.9 KiB)  TX bytes:1852 (1.8 KiB)

root@OpenWrt:/tmp# ping 192.168.99.1 -c 3
PING 192.168.99.1 (192.168.99.1): 56 data bytes
64 bytes from 192.168.99.1: seq=0 ttl=64 time=1.182 ms
64 bytes from 192.168.99.1: seq=1 ttl=64 time=0.792 ms
64 bytes from 192.168.99.1: seq=2 ttl=64 time=0.771 ms

--- 192.168.99.1 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.771/0.915/1.182 ms

The setup on the other end of the link is similar, and I can see that the packets do arrive nicely tagged there, as expected. But as soon as I enable vlan_filtering, the traffic stops:

root@OpenWrt:/tmp# echo 1 >/sys/class/net/br-lan/bridge/vlan_filtering 
[61526.831621] rtl838x_vlan_filtering: port 8
[61526.837085] rtl838x_vlan_filtering: port 9
[61526.843048] rtl838x_vlan_filtering: port 10
[61526.849091] rtl838x_vlan_filtering: port 11
[61526.853781] rtl838x_vlan_filtering: port 12
[61526.858469] rtl838x_vlan_filtering: port 13
[61526.863160] rtl838x_vlan_filtering: port 14
[61526.867852] rtl838x_vlan_filtering: port 15
[61526.872547] rtl838x_vlan_filtering: port 24
[61526.877233] rtl838x_vlan_filtering: port 26
root@OpenWrt:/tmp# ping 192.168.99.1 -c 3
PING 192.168.99.1 (192.168.99.1): 56 data bytes

--- 192.168.99.1 ping statistics ---
3 packets transmitted, 0 packets received, 100% packet loss

Nothing at all is received at the other side after this. Known issue? Something I broke? Or real bug?

It is very likely this is a bug in the driver. We will investigate this.

I'm making some progress: If I play around with GPIOs 33 and 37 (which are not real GPIOs, as far as I've understood) I can get the port LEDs to flicker amber.

What I don't really understand: According to the leaked RTL838x datasheet (I'm on an RTL8393M, and I don't have a datasheet for that), there are only very few GPIO lines available. However, if I toggle GPIO 15, which is not one of the GPIOs defined in the datasheet, I can measure a changed pin on the PoE board. Which pin did I actually toggle!? If I try to toggle GPIO 14, which is defined as a GPIO in the datasheet, I get an "operation not permitted" error. Why is that?

In the meantime, I've identified the buttons and fan status inputs.

Thanks.

I found that I can sort of make it work by adding VIDs to a port after enabling vlan_filtering. Which I believe is a well known issue with DSA drivers (and maybe not well defined in the spec). I believe there were similar issues on mvebu and ramips. Maybe there is a solution we can copy there?

But there are probably more snags. Toggling vlan_filtering multiple times tends to end up with a completely broken network, with duplicate and lost packets.

@bmork, could you please walk me through the exact sequence that makes it work? nbd added the new bridge-vlan feature to /e/c/network and it is not working out of the box for me. I have /etc/board.d patches ready to make the unit come up by default with a master switch0 bridge using that API. However, the moment I add a VLAN, traffic deadlocks. If I know the manual steps to fix it, I can ask Felix to work around that inside netifd.

I was also thinking that it might make sense for the default setup to have the switches run DHCP and have a static IP on VLAN 100 on port 1 for management purposes?

"GPIO"s 32-63 map to the global LED control register, see a brief description here:

For the port leds you need to enable their control via the shift registers and then try pins:
* 64-95: PORT-LED 0 (green)
* 96-127: PORT-LED 1 (yellow)
* 128-159: PORT-LED 2 (red)
This was never tested on a 9x, so work on the driver may be needed.
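To keep the numbering straight, the ranges above can be written down as a small lookup table. This is only a sketch of the numbering scheme as described in this post: the 0-31 range being the real SoC GPIOs is an assumption, as is treating the offset inside a port-LED range as a per-port index, and `describe_gpio` is a made-up helper, not a driver function.

```python
# (first, last, description) for each "GPIO" number range.
GPIO_BANKS = [
    (0, 31, "SoC GPIO (assumed)"),          # assumption, not stated above
    (32, 63, "global LED control register"),
    (64, 95, "port LED 0 (green)"),
    (96, 127, "port LED 1 (yellow)"),
    (128, 159, "port LED 2 (red)"),
]


def describe_gpio(num):
    """Return (bank description, offset within the bank) for a GPIO number."""
    for first, last, name in GPIO_BANKS:
        if first <= num <= last:
            return name, num - first
    raise ValueError(f"GPIO {num} is outside the known ranges")


for gpio in (15, 33, 37, 64, 100, 159):
    name, offset = describe_gpio(gpio)
    print(f"GPIO {gpio}: {name}, offset {offset}")
```

With this table, the "GPIOs" 33 and 37 mentioned earlier land at offsets 1 and 5 of the global LED control register range, which fits the observation that they are not real GPIO pins.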

There is no datasheet for the RTL839x that we know of, but we suspect that it has considerably more actual GPIO pins than the RTL838x, rather than just registers that are not connected to any pin, as on the 8x.

If it says "operation not permitted", then it is probably an issue with the .dts or a driver bug: the driver thinks that GPIO is not of that type.

Yes, this is the behaviour I see too. So it's definitely not working like we need it to.

What I meant is that I can make this simple use case "work" with filtering:

A clean boot without any matching bridge-vlan entries in /e/c/network, to avoid having filtering enabled automatically.

So I start with this, where lan8 is connected to a trunk port accepting VLAN 203 on the other side:

root@OpenWrt:/# brctl show
bridge name     bridge id               STP enabled     interfaces
br-lan          7fff.bccf4fd16b32       no              lan8
                                                        lan3
root@OpenWrt:/# bridge vlan
port              vlan-id  
lan3              1 PVID Egress Untagged
lan8              1 PVID Egress Untagged
br-lan            1 PVID Egress Untagged
                  203
root@OpenWrt:/# ifconfig br-lan.203
br-lan.203 Link encap:Ethernet  HWaddr BC:CF:4F:D1:6B:32  
          inet addr:192.168.99.51  Bcast:192.168.99.255  Mask:255.255.255.0
          inet6 addr: fe80::becf:4fff:fed1:6b32/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:319 errors:0 dropped:0 overruns:0 frame:0
          TX packets:225 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:25124 (24.5 KiB)  TX bytes:21322 (20.8 KiB)

root@OpenWrt:/# grep . /sys/class/net/br-lan/bridge/vlan_filtering 
0

This state works in the sense that there are VLAN 203 tagged packets coming out of lan8, and the VLAN 203 tagged packets received on lan8 are switched to the CPU port and end up on the subinterface there.

Enabling vlan_filtering in this state breaks this connectivity, as expected. But then adding VLAN 203 to port lan8 makes it work again:

root@OpenWrt:/# echo 1 >/sys/class/net/br-lan/bridge/vlan_filtering 
[  420.852915] rtl838x_vlan_filtering: port 10
[  420.858389] rtl838x_vlan_filtering: port 15
root@OpenWrt:/# bridge vlan add vid 203 dev lan8
[  490.599784] rtl838x_vlan_prepare: port 15
[  490.604344] VLAN 0: L2 learning: 0, L2 Unknown MultiCast Field 0,            IPv4 Unknown MultiCast Field 0, IPv6 Unknown MultiCast Field: 0
[  490.604367] Tagged ports 10008400, untag 1fffffff, prof 0, MC# 0, UC# 0, FID 0
[  490.625525] rtl838x_vlan_prepare: port 28
[  490.630014] VLAN 0: L2 learning: 0, L2 Unknown MultiCast Field 0,            IPv4 Unknown MultiCast Field 0, IPv6 Unknown MultiCast Field: 0
[  490.630034] Tagged ports 10008400, untag 1fffffff, prof 0, MC# 0, UC# 0, FID 0
[  490.651191] rtl838x_vlan_add port 15, vid_end 203, vid_end 203, flags 0
[  490.658581] rtl838x_vlan_add port 28, vid_end 203, vid_end 203, flags 0
root@OpenWrt:/# bridge vlan
port              vlan-id  
lan3              1 PVID Egress Untagged
lan8              1 PVID Egress Untagged
                  203
br-lan            1 PVID Egress Untagged
                  203
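The "Tagged ports" and "untag" values in the log above look like hexadecimal port bit masks. Assuming that reading is right, they can be decoded with a few lines of Python. The port names in `PORT_NAMES` are inferred from the rtl838x_vlan_filtering and rtl838x_vlan_prepare messages in this thread (lan3 as port 10, lan8 as port 15); calling port 28 the CPU port is an assumption.

```python
def ports_in_mask(mask):
    """Return the list of bit positions (port numbers) set in a port mask."""
    return [bit for bit in range(mask.bit_length()) if mask & (1 << bit)]


# Names inferred from the log messages in this thread; "cpu" is an assumption.
PORT_NAMES = {10: "lan3", 15: "lan8", 28: "cpu (assumed)"}

tagged = int("10008400", 16)   # "Tagged ports 10008400"
untag = int("1fffffff", 16)    # "untag 1fffffff"

tagged_ports = ports_in_mask(tagged)
untag_ports = ports_in_mask(untag)

print("tagged:", [PORT_NAMES.get(p, p) for p in tagged_ports])
print("untag covers ports", untag_ports[0], "to", untag_ports[-1])
```

Under that assumption the tagged mask contains exactly the two bridge members plus port 28, while the untag mask has every bit from 0 to 28 set.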

A ping from the other side of the link, showing the exact moment when I added the VLAN. Note the DUP. I typically see that at this point:

From 192.168.99.1 icmp_seq=24288 Destination Host Unreachable
From 192.168.99.1 icmp_seq=24289 Destination Host Unreachable
From 192.168.99.1 icmp_seq=24290 Destination Host Unreachable
From 192.168.99.1 icmp_seq=24291 Destination Host Unreachable
From 192.168.99.1 icmp_seq=24292 Destination Host Unreachable
From 192.168.99.1 icmp_seq=24293 Destination Host Unreachable
From 192.168.99.1 icmp_seq=24294 Destination Host Unreachable
From 192.168.99.1 icmp_seq=24295 Destination Host Unreachable
64 bytes from 192.168.99.51: icmp_seq=24297 ttl=64 time=24.1 ms
64 bytes from 192.168.99.51: icmp_seq=24297 ttl=64 time=24.3 ms (DUP!)
64 bytes from 192.168.99.51: icmp_seq=24298 ttl=64 time=0.506 ms
64 bytes from 192.168.99.51: icmp_seq=24299 ttl=64 time=0.493 ms
64 bytes from 192.168.99.51: icmp_seq=24300 ttl=64 time=0.510 ms
64 bytes from 192.168.99.51: icmp_seq=24301 ttl=64 time=0.502 ms
64 bytes from 192.168.99.51: icmp_seq=24302 ttl=64 time=0.537 ms

So this is what I meant by "working". There are lots of issues. I have not been able to do this after netifd has configured VLANs and therefore filtering. I don't know why. And filtering on the CPU port doesn't exist at all, for better or worse. And I cannot make this work with any untagged port.

With some luck, all of these issues are related to the same minor bug :slight_smile:

See the recent discussion here: https://lore.kernel.org/netdev/20200907182910.1285496-5-olteanv@gmail.com/T/

Getting closer with this:

bjorn@canardo:/usr/local/src/openwrt$ git diff
diff --git a/target/linux/rtl838x/files-5.4/drivers/net/dsa/rtl838x_sw.c b/target/linux/rtl838x/files-5.4/drivers/net/dsa/rtl838x_sw.c
index 034a20bd5f6e..fb9f7d5cf7f4 100644
--- a/target/linux/rtl838x/files-5.4/drivers/net/dsa/rtl838x_sw.c
+++ b/target/linux/rtl838x/files-5.4/drivers/net/dsa/rtl838x_sw.c
@@ -1282,6 +1282,8 @@ static int rtl838x_setup(struct dsa_switch *ds)
 
        rtl838x_init_stats(priv);
 
+       ds->configure_vlan_while_not_filtering = true;
+
        /* Enable MAC Polling PHY again */
        rtl838x_enable_phy_polling(priv);
        pr_info("Please wait until PHY is settled\n");

But there are still bugs left. One of the more interesting is that VLAN 1 is "magic". Testing with this simplified /e/c/network:

config interface 'lan'
        option type 'bridge'
        option ifname 'lan3 lan8'
        option proto 'none'

config interface 'oob'
        option ifname 'br-lan.203'
        option proto 'static'
        option ipaddr '192.168.99.51'
        option netmask '255.255.255.0'
        option dns '192.168.99.1'
        option dns_search 'mork.no'

config bridge-vlan
        option device 'br-lan'
        option vlan '7'
        option ports 'lan3:* lan8:t'

config bridge-vlan 
        option device 'br-lan'
        option vlan '203'
        option ports 'lan8:t'

which makes netifd configure:

root@OpenWrt:/# bridge vlan
port              vlan-id  
lan3              7 PVID Egress Untagged
lan8              7
                  203
br-lan            7
                  203
root@OpenWrt:/# grep . /sys/class/net/br-lan/bridge/vlan_filtering 
1

This still does not work. But here comes the "magic": If I add VLAN 1 to port lan8, then VLAN 203 starts working. That is, I did

root@OpenWrt:/# bridge vlan add vid 1 dev lan8
[   87.512963] rtl838x_vlan_prepare: port 15
[   87.517520] VLAN 0: L2 learning: 0, L2 Unknown MultiCast Field 0,            IPv4 Unknown MultiCast Field 0, IPv6 Unknown MultiCast Field: 0
[   87.517542] Tagged ports 10008400, untag 1fffffff, prof 0, MC# 0, UC# 0, FID 0
[   87.538698] rtl838x_vlan_prepare: port 28
[   87.543187] VLAN 0: L2 learning: 0, L2 Unknown MultiCast Field 0,            IPv4 Unknown MultiCast Field 0, IPv6 Unknown MultiCast Field: 0
[   87.543206] Tagged ports 10008400, untag 1fffffff, prof 0, MC# 0, UC# 0, FID 0
[   87.564369] rtl838x_vlan_add port 15, vid_end 1, vid_end 1, flags 0
[   87.571373] rtl838x_vlan_add port 28, vid_end 1, vid_end 1, flags 0

After this I can ping the br-lan.203 interface from the other end of the lan8 link!

Which is quite surprising. VLAN 1 is not configured on the other side of the link. Snooping shows neither untagged nor VLAN 1 tagged packets on the link. Removing VLAN 1 again blocks VLAN 203. Adding any other VLAN to lan8 makes no difference. Adding VLAN 1 as an untagged PVID does work. The only trigger I've found enabling the port is adding a tagged VLAN 1. Which should not really be configured at all on this bridge.

I am hoping this magic behaviour is a signature bug for anyone knowing the inner workings of this driver...

In the original firmware, there seems to always be a VLAN 1 set up by default. So maybe that is the reason for that magic. I had already considered setting it up in the driver by default, but was not sure it was necessary, and it is anyway better left to userspace.

This is easy to work around in userspace, so let's leave it there for now and see if the assumption holds.

But I still have serious trouble with "access ports", i.e. a port with a single VLAN configured as PVID and untagged. I read the excellent hw design description on https://biot.com/switches/description_of_the_rtl_socs , but had one question I couldn't find the answer to: Should we create and update entries in the VLAN table when adding an untagged VLAN?

So I went looking at the SDK for an answer. And to my surprise, this is actually quite readable once you look through the auto-generated files and figure out which parts are related to this chip.

Looking at src/dal/maple/dal_maple_vlan.c, I see that dal_maple_vlan_port_add() always sets member_portmask, and additionally sets untag_portmask for untagged VLANs. And _dal_maple_setVlan(), which writes VLAN entries to hw, will always program both the VLAN table and the UNTAG table, regardless of whether a tagged or an untagged VLAN is being added. The end result is that member_portmask has all ports with that VLAN enabled, and untag_portmask will always hold the (possibly empty) subset of ports where this VLAN is untagged.
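The bookkeeping described above can be modelled in a few lines. This is a toy model in Python, not the SDK or driver code: the class `VlanTable` and its method names are invented for illustration, and clearing the untag bit when a port is re-added as tagged is my assumption rather than something taken from the SDK source.

```python
class VlanTable:
    """Toy model: per-VLAN member mask plus untag mask (a subset of members)."""

    def __init__(self):
        self.member = {}  # vid -> bit mask of ports that are members
        self.untag = {}   # vid -> bit mask of member ports that egress untagged

    def port_add(self, vid, port, untagged):
        bit = 1 << port
        # The member mask is always updated, tagged or not.
        self.member[vid] = self.member.get(vid, 0) | bit
        if untagged:
            self.untag[vid] = self.untag.get(vid, 0) | bit
        else:
            # Assumption: re-adding a port as tagged clears its untag bit.
            self.untag[vid] = self.untag.get(vid, 0) & ~bit

    def write_hw(self, vid):
        """Return both masks, mimicking that both tables are always programmed."""
        return self.member.get(vid, 0), self.untag.get(vid, 0)


table = VlanTable()
table.port_add(7, 10, untagged=True)   # lan3 (port 10): untagged member of VLAN 7
table.port_add(7, 15, untagged=False)  # lan8 (port 15): tagged member of VLAN 7
table.port_add(203, 15, untagged=False)

member7, untag7 = table.write_hw(7)
print(f"VLAN 7: member {member7:#x}, untag {untag7:#x}")
```

The property worth checking in any real implementation is the last sentence of the description: the untag mask must never contain a port that is missing from the member mask.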

What do you think? I can try to cook up a patch and test.

EDIT: testing this now, with mostly success: https://github.com/bmork/openwrt/tree/rtl83xx-vlan

I am not sending it as a PR since there still is a major issue I can't figure out: Packets are duplicated when using PVID/untagged VLANs. I see the same packet transmitted both with and without a tag in both directions when forwarding between an untagged and a tagged port. So that works because the other end only picks up the correct one. But it's still a serious bug in my eyes.

And I am not sure I understand the outer/inner thing correctly. Is this supposed to deal with q-in-q? Or is the reference to outer and inner related to something else? I am not sure how we'd configure an inner PVID, but in my testing I could not get PVID VLANs working when programming the port for both inner and outer PVIDs. Might not be correct..

As for the duplicate packets - is there some egress filtering we are missing somewhere maybe?

I wonder if maybe there is something

Looking closer, I noticed that all ports in fact already are members of VLAN 1 after hw init. So you don't need to do anything to configure it.

The reason it doesn't work after netifd has set up vlan_filtering is that netifd explicitly deletes VLAN 1. Probably because the bridge code adds the default PVID, which also is 1 unless overridden, to all ports. So the netifd code works around that by deleting VLAN 1 (it probably wants to delete the current default_pvid of the bridge instead of the hard-coded 1, but that's an entirely different discussion). This netifd workaround breaks the rtl838x since you honour the delete, removing the port from the hw default "allports" member mask.

Anyway, all that's needed to make it work and hide some of the ugliness is to drop the member mask update for VLAN 1 on delete.
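In other words, the delete path would simply leave VLAN 1 alone. A minimal sketch of that idea, again as a Python toy model and not the actual driver change: `vlan_del`, `DEFAULT_VID` and the dict of member masks are all hypothetical names used only here.

```python
DEFAULT_VID = 1  # the VLAN all ports are members of after hw init


def vlan_del(member, vid, port):
    """Remove port from the member mask of vid, except for the default VLAN.

    member maps vid -> port bit mask. Returns True if the mask was updated.
    """
    if vid == DEFAULT_VID:
        # Ignore the delete so the port stays in the hw default member mask.
        return False
    member[vid] = member.get(vid, 0) & ~(1 << port)
    return True


# State after hw init plus one tagged VLAN on port 15 (lan8).
member = {1: 0x1fffffff, 203: 1 << 15}

print(vlan_del(member, 1, 15))    # the netifd-style delete of VLAN 1 is ignored
print(vlan_del(member, 203, 15))  # other VLANs are removed as usual
print({vid: hex(mask) for vid, mask in member.items()})
```

The trade-off is the one mentioned in this thread: with such a guard, VLAN 1 can never really be removed from a port.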

Of course, it's still ugly in the way that you can't really remove VLAN 1 from any port. But this is a common enough problem that most people probably are aware of, and just stay away from ever using VLAN 1.

Hi, has anyone tested NAT performance? Does LuCI run smoothly?