Netgear Orbi Pro support-booting from MMC

Never mind. It seems the offset & size of the first kernel is the same, and the offset for rootfs is also the same for the layouts I looked at - I guess those are the numbers that matter. There must be a way official firmware builds up the partitions. Personally I'll run extroot on reserved and unallocated space LOL.
Edit: maybe we can be more audacious and set all the space following rootfs for use in rootfs? I know for Netgear WNDR4500v3/4300v2 all nonsense partitions are combined to rootfs. I'll test that on my rbr50.

I was running RBR50 firmware 2.3.0.32 before flashing. The blkdevparts is set as follows to use all the space following the kernel, 3.3 G :-) (As a side note, when I put all the OEM partitions into the blkdevparts, dmesg reported a size error in the partition cmdline and loaded the GPT table with no mmcblk0p20; not sure why...)

blkdevparts=mmcblk0:512K@17K(0:SBL1),512K(0:BOOTCONFIG),512K(0:QSEE),512K(0:QSEE_ALT),256K(0:CDT),256K(0:CDT_ALT),256K(0:DDRPARAMS),256K(0:APPSBLENV),1M(0:APPSBL),1M(0:APPSBL_ALT),256K(0:ART),256K(ARTMTD),2M(language),256K(config),256K(pot),256K(traffic_meter),256K(pot_bak),256K(traffic_meter.bak),3840K(kernel),-(rootfs)
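
To double-check the offsets this string produces, here is a minimal sketch that walks the list the same way the kernel's cmdline partition parser is documented to (size, optional @offset, name in parentheses, "-" meaning the rest of the device). The function names are mine and hypothetical; only the string itself comes from the post.

```python
import re

UNITS = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}

CMDLINE = (
    "blkdevparts=mmcblk0:512K@17K(0:SBL1),512K(0:BOOTCONFIG),512K(0:QSEE),"
    "512K(0:QSEE_ALT),256K(0:CDT),256K(0:CDT_ALT),256K(0:DDRPARAMS),"
    "256K(0:APPSBLENV),1M(0:APPSBL),1M(0:APPSBL_ALT),256K(0:ART),"
    "256K(ARTMTD),2M(language),256K(config),256K(pot),256K(traffic_meter),"
    "256K(pot_bak),256K(traffic_meter.bak),3840K(kernel),-(rootfs)"
)


def to_bytes(text):
    """Convert a size such as '512K' or '1M' to bytes."""
    number, unit = re.fullmatch(r"(\d+)([KMG]?)", text).groups()
    return int(number) * UNITS[unit]


def parse_blkdevparts(value):
    """Return (device, [(name, offset, size)]); size is None for '-'."""
    value = value.split("=", 1)[-1]          # drop a leading 'blkdevparts='
    device, _, parts = value.partition(":")  # first ':' ends the device name
    layout, cursor = [], 0
    for part in parts.split(","):
        size, offset, name = re.fullmatch(
            r"(-|\d+[KMG]?)(?:@(\d+[KMG]?))?\((.+)\)", part
        ).groups()
        if offset is not None:
            cursor = to_bytes(offset)        # explicit @offset moves the cursor
        length = None if size == "-" else to_bytes(size)
        layout.append((name, cursor, length))
        if length is not None:
            cursor += length                 # next partition starts right after
    return device, layout


device, layout = parse_blkdevparts(CMDLINE)
for name, offset, size in layout:
    shown = "rest of device" if size is None else f"{size // 1024}K"
    print(f"{name:20s} offset {offset // 1024:6d}K  size {shown}")
```

The list has 20 entries, so rootfs ends up as the 20th partition of mmcblk0.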

For Makefile, according to efi.c from firmware 2.3.0.32 I changed to these numbers: ROOTFS_SIZE := 48496640, IMAGE_SIZE := 52428800
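
A quick sanity check of those two numbers, as a small sketch. The relation IMAGE_SIZE = kernel + ROOTFS_SIZE is only my reading of the values (it is not stated in efi.c as far as this post shows); KERNEL_SIZE is taken from the 3840K(kernel) entry in the blkdevparts above.

```python
KIB = 1024
MIB = 1024 * KIB

ROOTFS_SIZE = 48496640       # from the Makefile change above
IMAGE_SIZE = 52428800        # from the Makefile change above
KERNEL_SIZE = 3840 * KIB     # 3840K(kernel) in blkdevparts

rootfs_mib = ROOTFS_SIZE / MIB
image_mib = IMAGE_SIZE / MIB
gap = IMAGE_SIZE - ROOTFS_SIZE

print(f"rootfs: {rootfs_mib} MiB, image: {image_mib} MiB")
print(f"image - rootfs = {gap // KIB} KiB, kernel = {KERNEL_SIZE // KIB} KiB")
```

The difference between the two values is exactly the size of the kernel partition, which suggests the image limit is meant to cover kernel plus rootfs.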

And the generated images work well:

Flashing from router web interface works. Overlayfs is in /tmp/root after flashing; requires a subsequent sysupgrade to initialize the filesystem with 3.3G size. Even if I erase the previous kernel-2 partition the router still boots well.

Reverting to OEM firmware (2.3.0.32) works fine using nmrpflash. 2nd firmware is missing so it requires updating again with OEM firmware to populate 2nd firmware partitions (for subsequent webpage flashing to openwrt).

Failsafe, ethernet, three Wifis, USB, LEDs, reset button and WPS button all work. The CSR8811 bluetooth and its ttyQHS0 are not there, same case with Linksys EA8300. Anyway it's really awesome!

Hi @zhoushiyi213, thank you very much for your tests :slight_smile:

Probably the safest way is to use the smallest rootfs partition. The main advantage is that the other partitions won't be messed up (allowing the user to boot the other partition set without re-flashing [1]).
The main disadvantage is that the user will have less free space: ~31 Megabyte vs ~47 Megabyte. (However, I think that even 31 Megabyte is more than enough space for the average user.)

I am almost sure that this is feasible but, as I said before, in my opinion it's not worth the risk of messing with the partitions when there are already 31 Megabyte of free space (even if the risk is very, very small).

Obviously this is just my opinion... We should ask the reviewers what they think about it for a definitive answer :slight_smile:

However I have a bigger problem now to solve: the wan port.

Is the wan port working for you? I wasn't able to get it working. The SRS60 doesn't have the WAN port in the OEM fw, so I didn't add it in the openwrt, but your RBR50 (and SRR60) has one and should be the port 1 of the switch.
Using this patch should be enough, but on my router only the RX counter of the WAN port works, while TX counter is always 0.

If you still have the OEM fw on the router, could you please boot it and show me the output of these commands?

hexdump /proc/device-tree/soc/ess-switch@c000000/switch_lan_bmp
hexdump /proc/device-tree/soc/ess-switch@c000000/switch_wan_bmp
hexdump /proc/device-tree/soc/edma@c080000/gmac0/vlan_tag
hexdump /proc/device-tree/soc/edma@c080000/gmac1/vlan_tag

[1] Currently it is possible to tell which partition set is being booted by reading the byte at offset 297 of /dev/mmcblk0p12 with this line (at least on the SRS60/SRR60):

hexdump -v -s 297 -n 1 -e '1 "%_p"' /dev/mmcblk0p12

By changing that byte to 1 or 2, it is possible to boot the other partition set.
Unfortunately I don't know how eMMC works, so I don't know whether it is possible to edit a single byte on the storage or whether we must dump the partition, edit it and then write it back. If you know which is the safer method, we can create a simple script to allow the user to reboot into the OEM partition :wink:
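
Such a script could look roughly like the sketch below, assuming the partition can be opened like a regular file and patched in place. That is an assumption: I have not verified it on the eMMC, nor whether the bootloader checksums this partition. The helper names are hypothetical, and whether the flag is stored as the ASCII character '1'/'2' or as a raw value is left to the caller, since the hexdump line only shows the printable form.

```python
import os

BOOT_FLAG_OFFSET = 297  # same offset as `hexdump -s 297` above


def read_flag(path, offset=BOOT_FLAG_OFFSET):
    """Return the single byte stored at `offset` as a bytes object."""
    with open(path, "rb") as dev:
        dev.seek(offset)
        data = dev.read(1)
    if len(data) != 1:
        raise ValueError("offset is past the end of the device")
    return data


def write_flag(path, value, offset=BOOT_FLAG_OFFSET):
    """Overwrite exactly one byte at `offset`, leaving the rest untouched."""
    if len(value) != 1:
        raise ValueError("value must be exactly one byte")
    # 'r+b' opens for update without truncating the file or device
    with open(path, "r+b") as dev:
        dev.seek(offset)
        dev.write(value)
        dev.flush()
        os.fsync(dev.fileno())  # push the change out of the page cache
```

On the router this would be something like write_flag("/dev/mmcblk0p12", b"2") after saving a copy of the whole partition with dd; trying it on a dump file first seems the safer order.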

Thanks for the feedback! No, I didn't test the wan port; I am using a weird internet setup at home and I don't know much about vlan stuff, so I neglected it. I'll test the wan port tomorrow or over the weekend.

I agree with your compatibility point. But, for the partition layout, is it possible to create two profiles? Or I guess I'll figure out a way to edit the device tree before flashing when it comes to mainline (clearly uboot checks the checksum so direct binary editing doesn't work :sweat_smile:).

As for the emmc, based on my experience with smartphones and this router, it is safe to treat it as a block device. There is already a flash translation layer in between. I've been using dd and cat for quite a while on mmcblk0 of many of my phones for flashing purposes and it works fine.

Here's the output. I'll follow up with the openwrt wan tests later.

root@RBR50:/# hexdump /proc/device-tree/soc/ess-switch@c000000/switch_lan_bmp
0000000 0000 1e00
0000004
root@RBR50:/# hexdump /proc/device-tree/soc/ess-switch@c000000/switch_wan_bmp
0000000 0000 2000
0000004
root@RBR50:/# hexdump /proc/device-tree/soc/edma@c080000/gmac0/vlan_tag
0000000 0000 0200 0000 0200
0000008
root@RBR50:/# hexdump /proc/device-tree/soc/edma@c080000/gmac1/vlan_tag
0000000 0000 0100 0000 3c00
0000008
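
For anyone comparing these dumps with DTS values: by default hexdump prints 16-bit words in host byte order (little-endian on this SoC), while device-tree cells are 32-bit big-endian, so the bytes look swapped. A small sketch that converts the word column back into cells (decode_cells is a hypothetical helper; pass the words without the leading offset column):

```python
def decode_cells(hexdump_words):
    """Convert default `hexdump` 16-bit words into 32-bit big-endian cells."""
    raw = bytearray()
    for word in hexdump_words.split():
        # each printed word is a little-endian 16-bit value,
        # so its low byte is the one that comes first in the file
        raw += int(word, 16).to_bytes(2, "little")
    if len(raw) % 4:
        raise ValueError("expected a whole number of 32-bit cells")
    return [int.from_bytes(raw[i:i + 4], "big") for i in range(0, len(raw), 4)]


dumps = {
    "switch_lan_bmp": "0000 1e00",
    "switch_wan_bmp": "0000 2000",
    "gmac0/vlan_tag": "0000 0200 0000 0200",
    "gmac1/vlan_tag": "0000 0100 0000 3c00",
}
for prop, words in dumps.items():
    cells = " ".join(hex(cell) for cell in decode_cells(words))
    print(f"{prop} = <{cells}>")
```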

Thank you!
Those values were very similar to mine, but the missing piece was:
qcom,phy_mdio_addr = <4>; and qcom,phy_mdio_addr = <3>; in the gmac0 and gmac1 section.

I bet that if you type these two lines in the OEM fw

hexdump /proc/device-tree/soc/edma@c080000/gmac1/qcom,phy_mdio_addr
hexdump /proc/device-tree/soc/edma@c080000/gmac0/qcom,phy_mdio_addr

you will get something like
0000000 0000 0300 and 0000000 0000 0400

However the wan port should be working now :slight_smile:

I also fixed the led brightness (actually some of the 8 leds were off)

Here you can find my latest commit rebased on the latest master:

Once we find someone with an SRR60 and an RBS50, we can create the pull request :slight_smile:

Unfortunately the wan port is still not working correctly. :roll_eyes:
It works only if I connect my pc to the 4th port.... it's a very strange behaviour.
I will probably have to test some more qcom,phy_mdio_addr combinations in the next few days...

A good way to bisect this is to check how the bootloader initializes the port.

Are you sure that the bootloader configuration for the ethernet isn't overwritten by the kernel driver? Anyway, tomorrow I'll try to look for any hint; so far I have searched only the kernel side of the GPL tarball, without any success...

In the meantime let me recap the current situation and my tests for everyone.

The "Orbi Pro Router" is the SRR60 while the "Orbi Pro Satellite" is the SRS60.

They correspond to the RBR50 and RBS50, respectively.

I own just the SRS60, so from now on I will refer to my unit only. I'll call the ports from 1 (the one near the Sync button) to 4 (the one near the power button)


TEST 1 - OK
LAN: 1,2,3
WAN: 4

/etc/board.d/02_network:
		ucidef_set_interfaces_lan_wan "eth0" "eth1"
		ucidef_add_switch "switch0" \
			"0u@eth0" "1:lan" "2:lan" "3:lan"
DTS:
/ {
	soc {
		ess-switch@c000000 {
			status = "okay";

			switch_lan_bmp = <0x0e>;
			switch_wan_bmp = <0x10>;
		};
	};
};

&gmac0 {
	qcom,phy_mdio_addr = <4>;
	vlan_tag = <1 0x0e>;
};

&gmac1 {
	qcom,phy_mdio_addr = <3>;
	vlan_tag = <2 0x10>;
};

This configuration works, but it doesn't match the OEM configuration (ports 1 and 4 are swapped).
When I connect port 4 I also get this message (look at the 90-second mark):

[   36.434811] ess_edma c080000.edma: eth1: GMAC Link is down
[   37.441007] br-lan: port 1(eth0) entered blocking state
[   37.441174] br-lan: port 1(eth0) entered forwarding state
[   37.446656] IPv6: ADDRCONF(NETDEV_CHANGE): br-lan: link becomes ready
[   90.552461] ess_edma c080000.edma: eth1: GMAC Link is up with phy_speed=1000
[   90.553446] IPv6: ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready
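
As a side note on the switch_lan_bmp / switch_wan_bmp values used in these tests: judging from the numbers, bit n of the bitmap stands for switch port n (bit 0 being the CPU port). This is inferred from the values in this thread, not taken from the driver documentation. A minimal sketch with two hypothetical helpers for converting between port lists and bitmaps:

```python
def ports_to_bitmap(ports):
    """Set bit n for every switch port n in `ports`."""
    bitmap = 0
    for port in ports:
        bitmap |= 1 << port
    return bitmap


def bitmap_to_ports(bitmap):
    """Return the sorted list of port numbers whose bit is set."""
    return [bit for bit in range(bitmap.bit_length()) if (bitmap >> bit) & 1]


print(hex(ports_to_bitmap([1, 2, 3])))   # LAN on ports 1-3
print(hex(ports_to_bitmap([4])))         # WAN on port 4
print(bitmap_to_ports(0x1c))             # which ports does 0x1c cover?
```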

TEST 2 - KO
LAN: 2,3,4
WAN: 1

/etc/board.d/02_network:
		ucidef_set_interfaces_lan_wan "eth0" "eth1"
		ucidef_add_switch "switch0" \
			"0u@eth0" "2:lan" "3:lan" "4:lan"
DTS:
/ {
	soc {
		ess-switch@c000000 {
			status = "okay";

			switch_lan_bmp = <0x1c>;
			switch_wan_bmp = <0x02>;
		};
	};
};

&gmac0 {
	vlan_tag = <1 0x1c>;
};

&gmac1 {
	qcom,phy_mdio_addr = <0>;
	vlan_tag = <2 0x02>;
};

This configuration should mimic the OEM configuration. The LAN ports are working, but the WAN port is not working correctly. I receive frames on the WAN port of the router (the RX counter increases when the WAN is connected, but the TX counter is always 0).
Moreover, when I connect the WAN port I don't receive the message ess_edma c080000.edma: eth1: GMAC Link is up with phy_speed=1000

The cause of this seems to be qcom,phy_mdio_addr = <0>;. However, phy_mdio_addr 0 should be right because it should be tied to LAN 1. In fact, if I change it to qcom,phy_mdio_addr = <1>; the WAN port detects the link when I connect LAN port 2, and so on.

I managed to "solve" this using qcom,poll_required = <0>; in the &gmac1 section, but I don't like it very much because OpenWrt then thinks that the WAN port is always connected.

And that can be exactly the problem... It could be that a special register needs to be set to make the wan port work correctly... A good source would be to check the bootloader for the reduced code...

It gets even stranger... The switch detects the link, but it seems that the driver doesn't notify the system...

WAN port disconnected:
root@OpenWrt:/# swconfig dev switch0 show | grep link
	linkdown: ???
	link: port:0 link:up speed:1000baseT full-duplex txflow rxflow 
	link: port:1 link:down
	link: port:2 link:down
	link: port:3 link:down
	link: port:4 link:up speed:100baseT full-duplex txflow rxflow auto
	link: port:5 link:down
WAN port connected:
root@OpenWrt:/# swconfig dev switch0 show | grep link
	linkdown: ???
	link: port:0 link:up speed:1000baseT full-duplex txflow rxflow 
	link: port:1 link:up speed:1000baseT full-duplex txflow rxflow auto
	link: port:2 link:down
	link: port:3 link:down
	link: port:4 link:up speed:100baseT full-duplex txflow rxflow auto
	link: port:5 link:down

In the meantime I am still searching into the uboot part for some special register... however I have a doubt. If the bootloader sets some special register, it must be set for both the OEM image and the openwrt image because I boot them directly from the MMC. Or not?

If the system resets the switch, all the special init that uboot does is lost.

I think I found the problem! :slight_smile:
https://github.com/openwrt/openwrt/blob/master/target/linux/ipq40xx/patches-5.4/705-net-add-qualcomm-ar40xx-phy.patch#L1882
This line prevents the status change of the port at addr 0. Removing that line makes the WAN port work as it should, but this probably isn't the right way.

Looking into the GPL sources (here), Qualcomm doesn't use the genphy_read_status function for port 0, but they change the status manually. EDIT: WRONG SOURCES!

What do you think about it? Should we write to a core developer?

EDIT:
This user had the same problem with a different router:

Actually this looks wrong... why can't the mdio addr be zero? On the ipq806x gmac we have gmac1 with mdio addr 0 and gmac2 with mdio addr 4... So a device at mdio addr 0 can exist...

I think the check is wrongly used to verify that the interface is correctly set up (no mdio addr = problem with the configuration), but it seems that on some SoCs something can be connected at addr 0 of the mdio interface...

Also the fact that qcom doesn't use this check can confirm my theory...
The best thing would be to check who wrote the driver and send an email about this.

I think a simple fix would be to check if phydev->mdio exists

Thanks for your advice... In the meantime I created a pull request for this device using the qcom,poll_required = <0>; trick because I don't think a proper fix would be ready soon or even before the DSA driver for ipq40xx.

Reading here, the creator of the patch should be @chunkeey. Am I right?

If you want, the fix would just consist of

if (phydev->mdio)
		return genphy_read_status(phydev);

But I need to check whether mdio is initialized even when there are errors... If you want I can check this.

Yeah, it's gotta be wrong, because unless an RGMII-based PHY is used there will be a QCA807x PHY connected on MDIO addresses 0-5.
0-4 are gigabit ports from the switch and 5 is PSGMII PHY that gives you 5xSGMII in one interface to the switch.

That whole driver was supposed to be a temporary solution, but it ended up being a mess that is still used today.
That is mostly due to the whole mess Qualcomm made with PSGMII, which needs access to both the PHY registers and the SoC memory-mapped registers in different subsystems.
It pretty much ignores the Linux model of PHYs being their own thing and forces you to mess with ethernet, switch and PHY registers at the same time.

So they made things even more complex than before ahahah. And this is funny, since dsa doesn't support multiport and would use only one of these communication paths...

DSA is perfect here as IPQ40xx only has one ethernet port and that's the switch uplink.
But it's damn hard to integrate everything to work.

Okay... I am now learning how these things work. Let me know if what I am saying is correct. :smile:

The function ar40xx_phy_read_status gets called after every "polling event" (I don't know if there is a specific name for this) on a specific MDIO address associated to a PHY.

However, because of the wrong implementation of the driver, we can choose an MDIO address for every GMAC port (CPU <--> SWITCH), instead of choosing an MDIO address for every PHY (SWITCH <--> LANs). Right?

Besides theory, I think that in the current state, the ar40xx_phy_read_status function is pointless, because if we set a wrong MDIO address, the driver crashes during the initialization. We can then remove it and the ar40xx_phy_config_aneg function altogether because of this.

I removed these functions, and everything seems to work correctly on my SRS60.

If we want to check this, I think that we can use something like this:

if (mode != PORT_WRAPPER_PSGMII) {
	if (phydev->mdio.addr != 0)
		return genphy_read_status(phydev);
}

I will try it in a few hours :wink:

EDIT:
@Ansuel

if (phydev->mdio)

doesn't compile: error: used struct type value where scalar is required.
I tried:

if (phydev->mdio.bus != NULL)

and it seems good to me if we want to check that mdio exists...