Cudy M3000 with Motorcomm PHY -- how to fix it

Hello,

I am seeking feedback on how to best add support for the new Cudy M3000 with the Motorcomm YT8821 PHY on its WAN port. The current situation is that I understand why the 2.5G WAN port behaves unreliably and have one possible fix ready as a PR (beware of the giant message history). However, I am not sure which approach is best for OpenWrt.

EDIT: Some additional context that I forgot to add: an older hardware revision of the Cudy M3000 is already supported by OpenWrt. That hardware revision had a different PHY chip on the WAN port (RTL8221B-VB-CG).

Background

The 2.5G WAN port in the new Cudy M3000 is connected to the Motorcomm YT8821 PHY. The PHY is not working reliably because there is an address conflict on the router's MDIO bus.

  • There is an internal GbE PHY in the MT7981B SoC (this PHY is connected to the LAN port). This PHY always listens on MDIO address 0.

  • Then there is the external YT8821 PHY. This PHY is strapped to listen on MDIO address 1, but the PHY also listens on address 0 by default (Motorcomm considers that address a broadcast address by default).

The result is that the YT8821 reacts to commands intended for the internal MT7981B PHY and this messes up its internal state. For example, it only runs at 100 Mbps on 1 Gbps links. Another example is that calling "ip link set dev eth1 down" (i.e. bringing down the LAN port) brings down the WAN port as well.
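To make the conflict concrete, here is a toy Python model of the bus. It is only a sketch, not driver code: the `Phy` and `MdioBus` classes and the PHY names are made up for illustration. Register 0 with bit 11 is the standard Clause 22 power-down bit; that bringing the LAN interface down ends in such a write is an assumption on my part.

```python
from dataclasses import dataclass, field

BMCR = 0x00          # Clause 22 basic control register
BMCR_PDOWN = 0x0800  # Clause 22 power-down bit


@dataclass
class Phy:
    """Hypothetical stand-in for a PHY on the MDIO bus."""
    name: str
    addr: int
    broadcast_on_0: bool = False  # YT8821 default according to the post
    regs: dict = field(default_factory=dict)

    def accepts(self, addr: int) -> bool:
        return addr == self.addr or (self.broadcast_on_0 and addr == 0)


class MdioBus:
    def __init__(self, *phys: Phy):
        self.phys = phys

    def write(self, addr: int, reg: int, value: int) -> list:
        """Deliver a write to every PHY that accepts the address."""
        hit = [p for p in self.phys if p.accepts(addr)]
        for p in hit:
            p.regs[reg] = value
        return [p.name for p in hit]


internal = Phy("mt7981b-gbe", addr=0)
external = Phy("yt8821", addr=1, broadcast_on_0=True)
bus = MdioBus(internal, external)

# A power-down meant only for the LAN PHY also lands in the WAN PHY.
print(bus.write(0, BMCR, BMCR_PDOWN))

# Once the broadcast behaviour is off, address 0 reaches the LAN PHY alone.
external.broadcast_on_0 = False
external.regs.clear()
print(bus.write(0, BMCR, BMCR_PDOWN))
```

The first write reports both PHYs as recipients, the second only the internal one, which is the state the register fix is meant to reach.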

The canonical way of fixing this is to reconfigure one internal register in the YT8821 such that the YT8821 stops listening on address 0. The problem is how to achieve that.

What I have tried

I initially thought that this could be fixed in the kernel. I prototyped some patches and sent them to LKML, but the patches do not fit nicely into mainline (the fix is either not reliable, or it requires ugly changes in the core code) and so I'd say they were rejected.

The two kernel-based approaches I tried were the following:

  • First, I tried to patch the YT8821 driver so that it would stop the PHY from listening on address 0 during PHY probing. This fixed the PHY issues on the router, but closer discussion revealed that this fix is not guaranteed to work. The problem is that a probe callback in the driver is called too late. That is, the kernel will communicate with the MT7981B on address 0 before the YT8821 is told to not listen on address 0. On the other hand, the patch was just a few lines of code.

    After sending the patch to LKML, I also received feedback that the PHY should be reconfigured by the bootloader/U-Boot before the kernel boots.

    If anyone wants to try this patch, it is just the https://github.com/openwrt/openwrt/pull/21584 PR.

  • Replacing the U-Boot in the Cudy M3000 is not without disadvantages (IIUC you lose the easy TFTP rollback path to the Cudy firmware). So I tried to rework how Linux detects and brings up the PHYs such that the YT8821 would be reconfigured before Linux first touches MDIO address 0. I did succeed at that and I think the fix is now reliable. However, the cost paid for that is that the patch is more complex and touches the core PHY infrastructure in the Linux kernel.

    I sent this patch set to LKML and I again received feedback that this should be fixed by the bootloader. The PHY subsystem maintainers were (quite understandably) uneasy about the workarounds introduced to the PHY core code. (Please don't harass them, I mostly agree with their conclusions.)

    If anyone wants to try this patch, it is on the following branch: https://github.com/JakubVanek/openwrt/tree/cudy-m3000-kernel-based-solution/

So, I think the result is that there is no kernel-based solution that would be acceptable to the mainline kernel.

I have some ideas about how to handle this in U-Boot. I played with the U-Boot CLI and I think that the fix can be implemented through a few U-Boot commands. You just need to deassert the PHY reset line, wait for a bit and then do a few MDIO bus writes. However, the commands (mii) are not supported by the stock Cudy U-Boot and so another U-Boot build would have to be used.

What now?

I am not sure how to proceed. As I outlined in https://github.com/openwrt/openwrt/pull/21584#issuecomment-3980643746 , I see the following options:

  • OpenWrt could carry the "simple" kernel patch that appears to be working, but is not guaranteed to do so. The patch is not upstreamable.

  • OpenWrt could carry the "complex" kernel patch that is more robust, but will be more difficult to keep up-to-date and carries more risk (it touches core code instead of just one driver). This patch is also not upstreamable.

  • I think another "simple" kernel patch could be created. That patch would hard-reset the YT8821 PHY during its initialization and then quickly disable the broadcast address. This could be more reliable than the original "simple" patch. I think a similar patch was used for the RTL8221B-VB-CG earlier. The disadvantage here is that I think it might not be possible to use the same device tree for the old RTL8221B-based Cudy M3000 and for the new YT8821-based Cudy M3000. (How should new users know which one to pick, though?) I also doubt that this patch will be upstreamable.

  • OpenWrt could adopt a new "second-stage" U-Boot for the Cudy M3000. The idea is that the vendor U-Boot would boot another small U-Boot that would fix the PHY configuration and then boot Linux. I think it could be possible to achieve this by embedding both the U-Boot and Linux into a single FIT image. No kernel patches or hacks would be required then.

  • OpenWrt could decide that the Cudy M3000 is only supported after replacing the vendor's bootloader with a custom one. The custom bootloader would also fix the PHY configuration.

  • Finally, OpenWrt could decide that the maintenance burden is not worth adding support for the new Cudy M3000.

What are your thoughts on this?

Best regards

Jakub Vaněk

So you actually have not tried the PR and are proposing to replace it with a 5-stage bootloader chain?

Hello, I don't understand. I authored the PR and tested it on an M3000 that I bought. The PR does fix the issue on the actual router. The problem is that we cannot be sure that it will always work. The situation with the PR is the following:

  • The kernel detects both the MT7981B internal PHY and the YT8821
  • The kernel does some basic configuration of the MT7981B internal PHY. The MDIO commands it issues are interpreted by both the MT7981B PHY and the YT8821 PHY. These commands do not corrupt the YT8821 state badly enough that it needs to be hard-reset.
  • Then the kernel configures the YT8821 PHY, including the disabling of the broadcast address
  • Finally the kernel brings up more of the MT7981B PHY. This involves some MDIO commands that would break the YT8821 if it were still listening on MDIO address 0.

The problem is that there is a window of time where the YT8821 accepts commands that it shouldn't. It doesn't appear to be a problem in practice now (the PHY does achieve stable gigabit speeds with that patch).
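The window can be illustrated with a small Python sketch that replays the ordering above and lists which address-0 writes the YT8821 would also receive. The step labels and the `stray_writes` helper are hypothetical and only mirror the bullet points; they are not taken from the kernel.

```python
def stray_writes(steps):
    """Return labels of address-0 writes the YT8821 would also receive."""
    listening_on_0 = True  # power-on default described in the post
    seen = []
    for step in steps:
        if step == "yt8821: disable broadcast":
            listening_on_0 = False
        elif step.startswith("mt7981b:") and listening_on_0:
            seen.append(step)
    return seen


# Ordering with the simple patch, as described in the list above.
simple_patch = [
    "mt7981b: basic config",
    "yt8821: disable broadcast",
    "mt7981b: full bring-up",
]

# Ordering that a bootloader fix (or the complex patch) aims for.
reordered = [
    "yt8821: disable broadcast",
    "mt7981b: basic config",
    "mt7981b: full bring-up",
]

print(stray_writes(simple_patch))
print(stray_writes(reordered))
```

With the simple patch only the early basic configuration leaks into the YT8821. The reordered sequence leaks nothing, which is why the maintainers prefer the fix to happen before the kernel touches address 0 at all.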

I ported a similar device, Cudy TR3000 (with the RTL8221B) to OpenWrt.

Reading #21584, it looks like solution 1 shows results similar to the RTL8221B without the recent fixes. That was a very miserable experience for me and I think this solution is unacceptable for M3000 users.

When porting the TR3000, I didn't think much of the fact that it wasn't partitioned properly. To partition it properly (and go from 40 MB of free space to 110 MB), you need a new bootloader, and the people who made https://github.com/openwrt/openwrt/commit/6f8c58bfd8f380dfdc1d89aab29fdcdfde0ee65b did just that on a new DTS.

Personally, I would have preferred to repair the partitions in TR3000 on my initial commit even if that forced users to update their bootloader just to install OpenWrt. But yes, the loss of the signed TFTP Cudy recovery is not great.


I’m interested in solution 3, I think it’s the lesser evil. Solution 4 looks good too, but I don’t know how “easy” it is to do. A lot of old MIPS devices (TP-Link, etc.) use a 2nd-stage bootloader to work around U-Boot bugs.

I think a Cudy engineer told us how to identify a new change on the TR3000 using the serial number (I don’t remember if it was a new Motorcomm PHY or a flash change). Maybe someone at support {at} cudy dot com knows more.

Do you have a reference to this?
And a reference to the YT8821 PHY Datasheet?

It does sound like perhaps the Motorcomm PHY should be held in reset until it is init'd, by which stage hopefully the internal PHY has reduced its conversations...

Sure:

I also have the YT8821 datasheet, but I had to fill in my personal information on the Motorcomm website to download it and it looks watermarked. I therefore don't want to share it in full. However, the relevant sections are:

The YT8821 cannot be easily held in reset until initialized. The Linux PHY core will deassert the reset line early (when the PHY is registered). This is what creates the problematic time window for the simple patch. However, what I can do is to toggle the reset line twice in yt8821_config_init() to achieve the same effect.

EDIT: The effect is not entirely the same, though. If there is a time when the YT8821 is out of reset, not yet reconfigured to not listen on address 0 and the kernel is configuring the MT7981B PHY, bad things might still happen to the MT7981B PHY. The YT8821 could be responding to MDIO reads and causing collisions even for the MT7981B driver (it could read incorrect register values). In practice it did not appear to be happening. I am not sure why, perhaps the MT7981B MDIO bus controller gives priority to the internal PHY.

The comment history under the linked PR documents the investigations. Comment https://github.com/openwrt/openwrt/pull/21584#issuecomment-3923563367 is when I realized the issue is due to an address conflict.

I am curious - which comment are you referring to? The PR went through multiple iterations as I was trying to find out what is wrong with the YT8821. (yeah, turning a single PR into a ~200 comment thread perhaps wasn't a great idea). The current revision seems to work OK: https://github.com/openwrt/openwrt/pull/21584#issuecomment-3941232307

Hello, I decided to go with your suggestion and so I created the PR https://github.com/openwrt/openwrt/pull/22259 . Thank you!
