Netgear Orbi Pro support-booting from MMC

A good way to bisect this is to check how the bootloader initializes the port.

Are you sure that the bootloader's Ethernet configuration isn't overwritten by the kernel driver? Anyway, tomorrow I'll look for hints. So far I have searched only the kernel side of the GPL tarball, without any success...

In the meantime let me recap the current situation and my tests for everyone.

The "Orbi Pro Router" is the SRR60 while the "Orbi Pro Satellite" is the SRS60.

They correspond to the RBR50 and RBS50, respectively.

I own only the SRS60, so from now on I will refer to my unit only. I'll number the ports from 1 (the one nearest the Sync button) to 4 (the one nearest the power button).


TEST 1 - OK
LAN: 1,2,3
WAN: 4

/etc/board.d/02_network:
		ucidef_set_interfaces_lan_wan "eth0" "eth1"
		ucidef_add_switch "switch0" \
			"0u@eth0" "1:lan" "2:lan" "3:lan"
DTS:
/ {
	soc {
		ess-switch@c000000 {
			status = "okay";

			switch_lan_bmp = <0x0e>;
			switch_wan_bmp = <0x10>;
		};
	};
};

&gmac0 {
	qcom,phy_mdio_addr = <4>;
	vlan_tag = <1 0x0e>;
};

&gmac1 {
	qcom,phy_mdio_addr = <3>;
	vlan_tag = <2 0x10>;
};

This configuration works, but it does not match the OEM configuration (ports 1 and 4 are swapped).
When I connect port 4 I also get this message (look at the 90-second mark):

[   36.434811] ess_edma c080000.edma: eth1: GMAC Link is down
[   37.441007] br-lan: port 1(eth0) entered blocking state
[   37.441174] br-lan: port 1(eth0) entered forwarding state
[   37.446656] IPv6: ADDRCONF(NETDEV_CHANGE): br-lan: link becomes ready
[   90.552461] ess_edma c080000.edma: eth1: GMAC Link is up with phy_speed=1000
[   90.553446] IPv6: ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready
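
As a side note, the switch_lan_bmp and switch_wan_bmp values above are consistent with a simple rule: bit n of the bitmap stands for switch port n, so ports 1, 2, 3 give 0x0e and port 4 gives 0x10. The following is only a minimal sketch of that arithmetic; port_bitmap is a hypothetical helper written for this illustration and is not part of any OpenWrt or Qualcomm driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of the ar40xx/essedma drivers):
 * build a port bitmap in which bit n stands for switch port n. */
static uint32_t port_bitmap(const unsigned int *ports, size_t count)
{
	uint32_t bmp = 0;

	for (size_t i = 0; i < count; i++) {
		assert(ports[i] < 32);	/* keep the shift well defined */
		bmp |= UINT32_C(1) << ports[i];
	}
	return bmp;
}
```

Called with the port list {1, 2, 3} it yields 0x0e, and with {4} it yields 0x10, which are the values used in the DTS for this test.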

TEST 2 - KO
LAN: 2,3,4
WAN: 1

/etc/board.d/02_network:
		ucidef_set_interfaces_lan_wan "eth0" "eth1"
		ucidef_add_switch "switch0" \
			"0u@eth0" "2:lan" "3:lan" "4:lan"
DTS:
/ {
	soc {
		ess-switch@c000000 {
			status = "okay";

			switch_lan_bmp = <0x1c>;
			switch_wan_bmp = <0x02>;
		};
	};
};

&gmac0 {
	vlan_tag = <1 0x1c>;
};

&gmac1 {
	qcom,phy_mdio_addr = <0>;
	vlan_tag = <2 0x02>;
};

This configuration should mimic the OEM configuration. The LAN ports work, but the WAN port does not work correctly: I receive frames on the WAN port of the router (the RX counter increases when the WAN is connected), but the TX counter is always 0.
Moreover, when I connect the WAN port I don't get the message ess_edma c080000.edma: eth1: GMAC Link is up with phy_speed=1000

The cause seems to be qcom,phy_mdio_addr = <0>;. However, phy_mdio_addr 0 should be right because it should be tied to LAN 1. In fact, if I change it to qcom,phy_mdio_addr = <1>; the WAN port detects the link when I connect LAN port 2, and so on.

I managed to "solve" this by using qcom,poll_required = <0>; in the &gmac1 section, but I don't like it very much because OpenWrt then thinks that the WAN port is always connected.

And that may be exactly the problem... It could be that a special register needs to be set to make the WAN port work correctly... A good source would be to check the bootloader for the reduced code...

It gets even stranger... The switch detects the link, but it seems that the driver doesn't notify the system...

WAN port disconnected:
root@OpenWrt:/# swconfig dev switch0 show | grep link
	linkdown: ???
	link: port:0 link:up speed:1000baseT full-duplex txflow rxflow 
	link: port:1 link:down
	link: port:2 link:down
	link: port:3 link:down
	link: port:4 link:up speed:100baseT full-duplex txflow rxflow auto
	link: port:5 link:down
WAN port connected:
root@OpenWrt:/# swconfig dev switch0 show | grep link
	linkdown: ???
	link: port:0 link:up speed:1000baseT full-duplex txflow rxflow 
	link: port:1 link:up speed:1000baseT full-duplex txflow rxflow auto
	link: port:2 link:down
	link: port:3 link:down
	link: port:4 link:up speed:100baseT full-duplex txflow rxflow auto
	link: port:5 link:down
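
To compare two dumps like these without reading them by eye, the per-port link state can be extracted mechanically. This is a minimal sketch, assuming the line format shown above; parse_link_line is a hypothetical helper name, not an existing swconfig API.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: parse one "link: port:N link:up|down ..." line as
 * printed by `swconfig dev switch0 show`. Returns true on success and
 * fills in the port number and whether the link is up. */
static bool parse_link_line(const char *line, int *port, bool *up)
{
	char state[8];

	/* The leading space in the format skips the tab; the "linkdown: ???"
	 * line fails to match the literal "link:" and is rejected. */
	if (sscanf(line, " link: port:%d link:%7s", port, state) != 2)
		return false;

	*up = strcmp(state, "up") == 0;
	return true;
}
```

Running it over both dumps shows that only port 1 changes state, which is the port wired to the WAN jack in this test.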

In the meantime I am still searching the U-Boot part for some special register... However, I have a doubt. If the bootloader sets some special register, it must be set for both the OEM image and the OpenWrt image, because I boot both directly from the MMC. Or not?

If the system resets the switch, all the special initialization U-Boot does is lost.

I think I found the problem! :slight_smile:
https://github.com/openwrt/openwrt/blob/master/target/linux/ipq40xx/patches-5.4/705-net-add-qualcomm-ar40xx-phy.patch#L1882
This line prevents the status change of the port at addr 0. Removing that line makes the WAN port work as it should, but this probably isn't the right way.
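
To illustrate why a guard on MDIO address 0 would produce exactly this symptom (the switch sees the link, the system never hears about it), here is a userspace mock. It is not the kernel code: struct mock_phy and all function names are hypothetical, and the guard is reconstructed from the description in this thread rather than copied from the patch.

```c
#include <stdbool.h>

/* Userspace mock, not kernel code: all names here are hypothetical. */
struct mock_phy {
	int  addr;	/* MDIO address of the PHY */
	bool hw_link;	/* link state as the hardware sees it */
	bool link;	/* link state reported to the network stack */
};

/* Stand-in for genphy_read_status(): copy hardware state to reported state. */
static int mock_generic_read_status(struct mock_phy *phy)
{
	phy->link = phy->hw_link;
	return 0;
}

/* Guarded variant: address 0 is skipped, so phy->link is never refreshed. */
static int read_status_guarded(struct mock_phy *phy)
{
	if (phy->addr != 0)
		return mock_generic_read_status(phy);
	return 0;
}

/* Unguarded variant: every address is polled. */
static int read_status_unguarded(struct mock_phy *phy)
{
	return mock_generic_read_status(phy);
}
```

With a PHY at address 0 whose hardware link is up, the guarded variant leaves the reported link down forever, while the unguarded one updates it on the next poll.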

Looking into the GPL sources (here), Qualcomm doesn't use the genphy_read_status function for port 0, but they change the status manually. EDIT: WRONG SOURCES!

What do you think about it? Should we write to a core developer?

EDIT:
This user had the same problem with a different router:

Actually this looks wrong... Why can't the MDIO addr be zero? On the ipq806x gmac we have gmac1 with MDIO addr 0 and gmac2 with MDIO addr 4... So a device at MDIO addr 0 can exist...

I think the check is wrongly used to verify that the interface is set up correctly (no MDIO addr = problem with the configuration), but it seems that on some SoCs something can be connected at addr 0 of the MDIO interface...

Also, the fact that qcom doesn't use this check can confirm my theory...
The best thing would be to find out who wrote the driver and send an email about this.

I think a simple fix would be to check whether phydev->mdio exists.

Thanks for your advice... In the meantime I created a pull request for this device using the qcom,poll_required = <0>; trick, because I don't think a proper fix will be ready soon, or even before the DSA driver for ipq40xx.

Reading here, the creator of the patch should be @chunkeey. Am I right?

If you want, the fix would just consist of:

if (phydev->mdio)
		return genphy_read_status(phydev);

But I need to check whether mdio is initialized even when there are errors... If you want, I can check this.

Yeah, it's got to be wrong: unless an RGMII-based PHY is used, there will be a QCA807x PHY connected on MDIO addresses 0-5.
0-4 are the gigabit ports from the switch and 5 is the PSGMII PHY that gives you 5xSGMII in one interface to the switch.
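
The address layout described here can be written down as a small lookup. This is only a sketch of the description in this post; the enum and function names are hypothetical and do not exist in the driver.

```c
/* Hypothetical classification of MDIO addresses, as described above. */
enum mdio_role {
	MDIO_ROLE_PORT_PHY,	/* addresses 0-4: gigabit port PHYs */
	MDIO_ROLE_PSGMII_PHY,	/* address 5: PSGMII PHY towards the switch */
	MDIO_ROLE_UNKNOWN	/* anything else */
};

static enum mdio_role mdio_addr_role(int addr)
{
	if (addr >= 0 && addr <= 4)
		return MDIO_ROLE_PORT_PHY;
	if (addr == 5)
		return MDIO_ROLE_PSGMII_PHY;
	return MDIO_ROLE_UNKNOWN;
}
```

The point for this thread is that address 0 falls in the first group: it is an ordinary port PHY, so treating it as "no PHY configured" cannot be right on these boards.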

That whole driver was supposed to be a temporary solution, but it ended up being a mess that is still used today.
Mostly due to the whole mess Qualcomm made with PSGMII, needing to access both the PHY registers and the SoC memory-mapped registers in different subsystems.
Pretty much ignoring the Linux model of PHYs being their own thing and forcing you to mess with Ethernet, switch and PHY registers at the same time.

So they made things even more complex than before, ahahah. And this is funny, since DSA doesn't support multiport and would use only one of these communication paths...

DSA is perfect here as IPQ40xx only has one ethernet port and that's the switch uplink.
But it's damn hard to integrate everything to work.

Okay... I am now learning how these things work. Let me know if what I am saying is correct. :smile:

The function ar40xx_phy_read_status gets called after every "polling event" (I don't know if there is a specific name for this) on a specific MDIO address associated to a PHY.

However, because of the wrong implementation of the driver, we can choose an MDIO address for every GMAC port (CPU <--> SWITCH), instead of choosing an MDIO address for every PHY (SWITCH <--> LANs). Right?

Besides theory, I think that in the current state, the ar40xx_phy_read_status function is pointless, because if we set a wrong MDIO address, the driver crashes during the initialization. We can then remove it and the ar40xx_phy_config_aneg function altogether because of this.

I removed these functions, and everything seems to work correctly on my SRS60.

If we want to check this, I think that we can use something like this:

if (mode != PORT_WRAPPER_PSGMII) {
	if (phydev->mdio.addr != 0)
		return genphy_read_status(phydev);
}

I will try it in a few hours :wink:

EDIT:
@Ansuel

if (phydev->mdio)

doesn't compile: error: used struct type value where scalar is required.
I tried:

if (phydev->mdio.bus != NULL)

and it seems good to me if we want to check that the mdio exists...
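
The compile error comes from mdio being a struct embedded in the PHY device rather than a pointer to one. A minimal userspace sketch of the difference follows; the mock_ types are simplified stand-ins written for this illustration and contain only the fields needed here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel types; only the fields needed here. */
struct mock_mii_bus {
	int id;
};

struct mock_mdio_device {
	struct mock_mii_bus *bus;	/* pointer: can be compared against NULL */
	int addr;
};

struct mock_phy_device {
	struct mock_mdio_device mdio;	/* embedded struct, not a pointer */
};

/* `if (phydev->mdio)` would not compile: a struct is not a scalar.
 * Testing the bus pointer inside the embedded struct is valid C. */
static bool phy_has_mdio_bus(const struct mock_phy_device *phydev)
{
	return phydev->mdio.bus != NULL;
}
```

Whether a NULL bus pointer is actually possible by the time the status callback runs is a separate question that this sketch does not answer.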

Hi,
Well, I don't know why MDIO 0 is supposed to be special at all.
We don't choose MDIO addresses for every GMAC, as GMACs don't talk via MDIO and we only have 1 actual physical GMAC.
Having 2 GMACs is just trickery that the current driver does.
Yeah, ar40xx_phy_read_status should be pointless, as it will just call the generic Clause 22 compliant status poll function.

Probably, as Ansuel said, it is just a leftover from an old check of whether the mdio exists. I don't know if removing it will cause problems with devices using the PHY in RGMII mode, like the AVM Fritzrepeater 1200...

Yep, that should be the right condition... (I didn't notice that mdio wasn't a pointer...)

Is anybody willing to help test on 5-port and RGMII boards under OpenWrt?

Anyone willing to do some testing?

I have it all working except that VLAN doesn't work. I see that there have been various issues with ipq40xx and VLAN. Can someone summarize? Is it possible to use VLANs on this device currently, with kernel 5.4?
Thanks.

I had it working, but sometimes it randomly stopped working. My fix is a USB-Ethernet adapter, which works without any issues. I never really figured out what the issue with the onboard NIC was; since the device has a USB port, it was just easier to use that.