DSA: two-switch setup in devicetree

That's it! The driver was hardcoded to ext1. Now I have traffic. Let me undo every hack I did...

In my case, it is wan (port 4/(0..4)). It might be a similar problem. I'll post if I have any news.

I still couldn't find the issue. In the bootloader, the wan port behaves just like a lan port, but once the system boots, that behavior changes. I believe wan goes into a blocked state (to avoid LAN leakage until a driver loads), and we just need to revert that. If it were only a VLAN configuration issue, the port should still register link events.

RTL8367RB is very close to RTL8367S that I have. If you want to use swconfig, you can use target/linux/mediatek/files-5.10/drivers/net/phy/rtk/rtl8367c driver (for example: https://github.com/openwrt/openwrt/pull/4178) or the modified rtl8367b (https://github.com/openwrt/openwrt/pull/4327)

The upstream DSA driver I'm basing my tests on was tested on a chip version that only has 4 LAN ports and no WAN. If the driver had an issue with WAN, the developer would not have seen it. I added extra initialization code from another driver, but nothing solved the issue.

I dumped all the registers and I'm looking for an explanation.

wan port = port4 of rtl8367c?
Your register dump shows that the PHY of port4 is in the power-down state (reg 0x2080 = 0x1940).

Thanks a lot! But I'm not sure whether it is the cause or a symptom.
And yes, it is port4. My ports are (0=lan4, 1=lan3, 2=lan2, 3=lan1, 4=wan, 7=cpu).

When I bring an interface down (ip link set dev lan1 down), I can see it set PHY3@0x0 to 0x1940, and the register at 0x2060 instantly reflects that change. For link up, the process is the same but the value is 0x1140.

Now, when I monitor the same process with wan, 0x2080 gets out of sync. Although I can write and read back 0x1140 at PHY4@0x0, the corresponding 0x2080 is still 0x1940.

Just as a test, add the reg/value pair { 0x2080, 0x1340 } to the rtl8367c_init_jam_common init array.
The value 0x1340 clears power down (bitmask 0x800) and sets restart autoneg (bitmask 0x200).
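
To make the values in this thread easier to read, here is a minimal sketch that decodes a PHY register 0 value into its bit names. It assumes the standard Clause 22 control register layout; the names `BMCR_BITS` and `decode_bmcr` are made up for this illustration and are not part of any driver.

```python
# Bit masks of the standard Clause 22 control register (PHY register 0).
# Assumption: the internal PHYs of the switch follow this standard layout.
BMCR_BITS = {
    0x8000: "reset",
    0x4000: "loopback",
    0x2000: "speed-lsb",
    0x1000: "autoneg-enable",
    0x0800: "power-down",
    0x0400: "isolate",
    0x0200: "restart-autoneg",
    0x0100: "full-duplex",
    0x0080: "collision-test",
    0x0040: "speed-msb",
}


def decode_bmcr(value):
    """Return the names of the bits set in a register 0 value, MSB first."""
    return [name for mask, name in BMCR_BITS.items() if value & mask]


# The three values discussed in this thread.
for value in (0x1940, 0x1140, 0x1340):
    print(f"0x{value:04x}: {', '.join(decode_bmcr(value))}")
```

Read this way, 0x1940 and 0x1140 differ only in the power-down bit, and 0x1340 is 0x1140 plus restart-autoneg.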

0x2080 is 0x1940 after reset (just like all other ports). During DSA setup, it changes to 0x1140 with no one actually writing to "phy4 0x0" (at least not through dsa_switch_ops->phy_write).

The interesting part is that I can load rtl8367b driver (with rtl8367s patches) together with the DSA driver. It messes with CPU port but it gives me an interesting view inside the switch. swconfig does not have a problem detecting link:

root@OpenWrt:~# swconfig dev switch1 show | grep link
        link: port:0 link:down
        link: port:1 link:down
        link: port:2 link:up speed:1000baseT full-duplex auto
        link: port:3 link:down
        link: port:4 link:up speed:1000baseT full-duplex txflow rxflow auto
        link: port:5 link:down
        link: port:6 link:down
        link: port:7 link:up speed:1000baseT full-duplex txflow rxflow

Meanwhile, the wan port is still down.

Before writing that register, PHY4@0x0 and 0x2080 were both 0x1940. When I wrote 0x1340 to 0x2080, it became 0x1140 but PHY4@0x0 was still 0x1940. After I wrote 0x1340 to PHY4@0x0, both became 0x1140. wan is still down.

I still don't understand the relation between register PHY @ 0x0 and 0x2000+(n*0x20). Is it the chip or the driver that synchronizes these two values? Would a change in either location affect the other one? I just know that for port4 (wan), that relation is broken.
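
For reference, the address arithmetic behind 0x2000+(n*0x20) can be sketched as follows. The base and stride come from the values quoted in this thread; the helper name `direct_phy_reg` and the idea that other PHY registers sit at `reg` offsets inside the same window are assumptions for illustration only.

```python
PHY_BASE = 0x2000    # window of port 0, as quoted in the thread
PHY_STRIDE = 0x20    # distance between two port windows, as quoted in the thread


def direct_phy_reg(port, reg=0):
    """Switch register address mirroring PHY register `reg` of `port`.

    Only reg 0 is confirmed by the thread; adding `reg` as an offset
    inside the window is an assumption.
    """
    if not 0 <= reg < PHY_STRIDE:
        raise ValueError("reg does not fit in one port window")
    return PHY_BASE + port * PHY_STRIDE + reg


for port in (3, 4):
    print(f"PHY{port}@0x0 -> 0x{direct_phy_reg(port):04x}")
```

This reproduces the two addresses seen above: 0x2060 for port 3 (lan1) and 0x2080 for port 4 (wan).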

I'm starting to wonder if it is something special about DSA. The port4 register 0x1356 does change from 0x00e0 to 0x01f6 when a cable is plugged and that seems to be enough for swconfig.

After I added lots and lots of printk inside DSA, I noticed that PHY4@0x0 changes from 0x1940 to 0x1140 during the first time dsa_port_setup(dp) is executed (for port 0), while PHY1@0x0, PHY2@0x0 and PHY3@0x0 are unchanged. Maybe something is overflowing. I'll try to get even deeper inside the driver.

My English is not very good, so it is difficult for me to answer all your questions. Moreover, I'm not sure that my understanding of Linux kernel internals and of the rtl8367s hardware is correct enough. So I'll only try to comment on your message :slight_smile:
And not in order :frowning:

As I pointed out above, that is the default state of the internal PHYs after a hardware reset.

phy_{read,write} use indirect access to the PHY registers. See here and here. Direct access ignores the busy/wait flags in the indirect control registers (but it works well during init). Bitmask 0x200 in PHY@0 forces an autoneg restart and is self-clearing after completion. So the value 0x1140 after writing 0x1340 is correct.
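
The self-clearing behavior can be modeled in a couple of lines. This is only a toy model of what a read-back would show once autonegotiation restart has completed; `readback_after_write` is a hypothetical name, not a driver function.

```python
RESTART_AUTONEG = 0x0200  # self-clearing bit in PHY register 0


def readback_after_write(written):
    """Value expected on read-back once the restart-autoneg bit has cleared."""
    return written & ~RESTART_AUTONEG


print(hex(readback_after_write(0x1340)))
```

So writing 0x1340 and later reading 0x1140 does not mean the write was lost.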

swconfig reads the connection state directly from the port's PHY. That causes no problem because in the swconfig configuration there is no Linux netdev object for each port of the rtl8367s, only one for the SoC ethernet netdev.
In the DSA configuration, the kernel tries to create a netdev for every port of the rtl8367s. But if a netdev is not connected to a PHY, it doesn't know the state of the physical link. So the wan netdev is down.
BTW, if you look here and compare with:

[  225.759377] realtek-mdio mdio-bus:1d lan1 (uninitialized): failed to connect to PHY: -ENODEV
[  225.768034] realtek-mdio mdio-bus:1d lan1 (uninitialized): error -19 setting up PHY for tree 0, switch 0, port 0

you will see that DSA fails to set up the correspondence between a port (and the netdev for that port) and its PHY.

I am the author of the original patch adding support for mdio-realtek switches and the rtl8367s switch, so it's simpler for me to "link" to the swconfig files. I'm sorry, but I'm not familiar enough with the DSA configuration to point out errors.
Just as a test, I suggest defining the port_{enable,disable} members of ds_ops as simple hooks to determine where DSA tries to enable/disable port4 (wan).

I found the problem! Indirect access to port 4 was really accessing port 0:

https://lore.kernel.org/netdev/20211126063645.19094-1-luizluca@gmail.com/T/#u
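
One way this kind of aliasing can happen is a PHY-number bit field that is one bit too narrow. The sketch below only illustrates that mechanism under that assumption; the field widths and the name `pack_phy_num` are hypothetical, and the linked patch is the authoritative description of the actual fix.

```python
def pack_phy_num(port, field_width):
    """Keep only the low `field_width` bits of `port`, as a bit field would."""
    return port & ((1 << field_width) - 1)


# With a 2-bit field, ports 4..7 silently fold onto ports 0..3.
for port in range(8):
    print(f"port {port}: 2-bit field -> {pack_phy_num(port, 2)}, "
          f"3-bit field -> {pack_phy_num(port, 3)}")
```

This would match what I saw earlier: PHY4@0x0 changing while DSA was setting up port 0.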

@LinuxInside I think that the QCA8k ports should not be in the gswip definition (the phy query fails) and that the qca8k register is 0x18 and not 0x10 (according to the avm sources). I also think that the phys need to be defined on the gswip mdio, because the mdio bus cannot be allocated twice.
In the avm source, the vlan_setup method (see here) suggests that the cpu port is 0 and the vlanids from the config are 2 and 3, so I would guess that ports 2 and 3 on the qca8k are the actual ports used for the 5490/91, while the mdio addresses are phy1 and phy2.
I don't know whether two ports are allowed to point to the same ethernet gmac (ethernet = <&eth0>;); that is unclear to me. What is also left open is how the two switches can be connected in the device tree.

Nevertheless, maybe the following works to bring up the qca8k (but you need to check in the kernel sources whether the module is actually compiled).

&gswip_mdio {
	//disabled because it's the qca8k switch cpu
	phy0: ethernet-phy@0 {
		status = "disabled";
	};

	//leave phy1 enabled and add phy2 of qca8k
	phy2: ethernet-phy@2 {
		reg = <0x02>;
	};

	phy5: ethernet-phy@5 {
		reg = <0x05>;
	};

	phy6: ethernet-phy@6 {
		reg = <0x06>;
		reset-gpios = <&gpio 32 GPIO_ACTIVE_LOW>;
	};

	phy9: ethernet-phy@9 {
		reg = <0x09>;
	};

	phy11: ethernet-phy@11 {
		status = "disabled";
	};

	phy13: ethernet-phy@13 {
		status = "disabled";
	};

	//add the qca8k definition here

	switch@18 {
		compatible = "qca,qca8337";
		#address-cells = <1>;
		#size-cells = <0>;
		reset-gpios = <&gpio 44 GPIO_ACTIVE_LOW>;
		reg = <0x18>;

		ports {
			#address-cells = <1>;
			#size-cells = <0>;

			port@0 {
				reg = <0>;
				label = "cpu2";
				ethernet = <&eth0>;
				phy-mode = "rgmii";

				fixed-link {
					speed = <1000>;
					full-duplex;
				};
			};

			port@2 {
				reg = <2>;
				label = "lan6";
				phy-handle = <&phy1>;
			};

			port@3 {
				reg = <3>;
				label = "lan7";
				phy-handle = <&phy2>;
			};
		};
	};
};

&gswip_ports {
	port@0 {
		reg = <0>;
		label = "wan";
		phy-mode = "sgmii";
		phy-handle = <&phy6>;
	};

	port@1 {
		reg = <1>;
		label = "lan3";
		phy-mode = "rgmii";

		fixed-link {
			speed = <1000>;
			full-duplex;
		};
	};

	port@2 {
		reg = <2>;
		label = "lan2";
		phy-mode = "internal";
		phy-handle = <&phy5>;
	};

	port@4 {
		reg = <4>;
		label = "lan1";
		phy-mode = "internal";
		phy-handle = <&phy9>;
	};
};

Maybe you can test it and post any qca8k-related messages from dmesg. Or maybe there is a way to test whether the two ports are working but just not connected to the gswip. Anyway, it seems hard to interpret those without having the actual device. I also wonder whether phy0 needs to be linked to the cpu port instead of being defined as a fixed link.

The above does not work. However, after moving the qca8k port to the inside of the switch and without specifying a link between the two switches, there is an -EEXIST error, so I assume that the probe has actually worked. Now maybe this works:

&gswip_mdio {
	phy5: ethernet-phy@5 {
		reg = <0x05>;
	};

	phy6: ethernet-phy@6 {
		reg = <0x06>;
		reset-gpios = <&gpio 32 GPIO_ACTIVE_LOW>;
	};

	phy9: ethernet-phy@9 {
		reg = <0x09>;
	};

	switch@18 {
		compatible = "qca,qca8337";
		#address-cells = <1>;
		#size-cells = <0>;
		reset-gpios = <&gpio 44 GPIO_ACTIVE_LOW>;
		reg = <0x18>;

		dsa,member = <0 1>;

		ports {
			#address-cells = <1>;
			#size-cells = <0>;

			sw1port0: port@0 {
				reg = <0>;
				label = "cpu";
				phy-mode = "rgmii";
				link = <&sw0port1>;

				fixed-link {
					speed = <1000>;
					full-duplex;
				};
			};

			port@2 {
				reg = <2>;
				label = "lan6";
				phy-handle = <&phy1>;
			};

			port@3 {
				reg = <3>;
				label = "lan7";
				phy-handle = <&phy2>;
			};
		};

		mdio {
			phy0: ethernet-phy@0 {
				reg = <0x00>;
			};

			phy1: ethernet-phy@1 {
				reg = <0x01>;
			};

			phy2: ethernet-phy@2 {
				reg = <0x02>;
			};
		};
	};
};