Broken rt5350 ethernet after ralink reset controller etc patchset

ai8 · February 21, 2022, 12:41pm

I am attempting to learn how to make progress with this issue.

I opened an issue on github and it contains links to the patchset causing the issue and a serial console log showing the broken ethernet.

Evidently the patch author wasn't aware rt5350 requires the frame engine and the switch to be reset at the same time to prevent a pdma panic. I saw about it in a comment in the original ethernet patchset for these systems from long ago.

Also, the last of the patches in this patchset says "implement carrier detection similar to mt7620" and here is some of the new code:

int rt3050_esw_has_carrier(struct fe_priv *priv)
{
       struct rt305x_esw *esw = priv->soc->swpriv;
       u32 link;
       int i;
       bool cpuport;

       link = esw_r32(esw, RT305X_ESW_REG_POA);
       link >>= RT305X_ESW_POA_LINK_SHIFT;
       cpuport = link & BIT(RT305X_ESW_PORT6);
       link &= RT305X_ESW_POA_LINK_MASK;
       for (i = 0; i <= RT305X_ESW_PORT5; i++) {
               if (priv->link[i] != (link & BIT(i)))
                       dev_info(esw->dev, "port %d link %s\n", i, link & BIT(i) ? "up" : "down");
               priv->link[i] = link & BIT(i);
       }

       return !!link && cpuport;
}

The POA register port status bits are manipulated then checked, recorded and port link changes are reported.

The overall objective of the patch is to be able to call netif_carrier_on() when at least 1 port is up otherwise netif_carrier_off() is called.

Port 6 is the cpu port and making use of it here appears to be unnecessary. I checked a running system and in the context of the above code the cpu port bit is always 1.

I am not much of a programmer these days but I am willing to learn.

How to explain the superfluous use of the cpu port bit and the strange return statement?
Is reverting these patches an option?
Or should I try to make a new patch to fix my rt5350 issue?
Or is it normally expected that I contact the patch author and discuss the issue with him?

Any insight would be very much appreciated.

Borromini · February 21, 2022, 12:49pm

Ping @lynxis

As patch author he might be able to offer some insights.

lynxis · February 21, 2022, 1:28pm

Welcome to OpenWrt @ai8

How to explain the superfluous use of the cpu port bit and the strange return statement?

This function should return 1 is link is up or 0 if link is down.
The link variable might contain 0x5 if multiple ports have link. !!link would result in 0x0 if link == 0. But would be 0x1 if any bit is set.
By using cpuport I'm sure the link is working and present. Usually the link should never be 0. But it only "should".

# same
i = !!link;
if (link > 0) i = 1; else i = 0;

To be precise the last line could be return link && cpuport and it would be the same because the boolean operator && will treat the link > 0 as 1 and return 0 or 1.
I would guess the line was return !!link and got later extended to include the cpuport.

Is reverting these patches an option?
I would like to not revert it or only revert it for the rt5350 but not for the other platforms.
But in general it's always an option.

Or should I try to make a new patch to fix my rt5350 issue?

You could try to fix the problem by a different patch. Could you try to call the reset controller after each other?

Or is it normally expected that I contact the patch author and discuss the issue with him?

It depends how you would like to do it. It's also fine to contact the author and discuss it
because the author might know more about the platform.

In any case. Nice catch. I even tested the patch against multiple board and multiple platforms. It would be best to move the discussion of the issue itself towards the github issue.

ai8 · February 21, 2022, 3:27pm

Thank you for the welcome and the insight.

The code still has the option to use a custom reset function rather than the general fe_reset_fe() function so I'll look into that as a possible solution.