Asus TUF AX4200 support

do you think this test is great ? thanks

FYI: Latest SNAPSHOT (r25876-923d7c5531) from today, for TUF-AX6000, broke my internet connection. Not sure if that applies to AX4200 (snapshot versions are the same), but just in case, so you know.

Maybe it's the final time to move on to stable builds?

P.S.
Just installed stable build 23.05.3, and looks like like package 'fitblk' was removed from it, while looks like it should not. Somebody please confirm is it needed for this device (ASUS TUF). It is a block device driver that might mess with any subsequent sysupgrade on some devices.

1 Like

@daniel

I've tested kernel 6.6 and the latest snapshot including the mt7530 DSA patches. These changes seemed to have broke something related to how the lan ports are setup (eth0), on this device and tuf-ax6000 at least. The wan port (eth1) works as it's not controlled, afaik, by the same driver.

I reverted 1, 2 and this got me wired lan networking back.

First of all, please try with only the 2nd commit reverted. And maybe @arinc9 can help you debugging the issue in detail, we will definitely need much more information:

  • what exactly is not working?
  • is that always the case or only certain link partners are affected?
  • please install ethtool-full and show us the output of these command with and without the commit which breaks things:
    • ethtool lan1
    • ethtool -a lan1
    • ethtool --show-eee lan1
  • full bootlog with and without those two commits
2 Likes

23.05.x doesn't need fitblk yet as it is using (and will always be using) the now deprecated uImage.FIT partition parser.

1 Like

@Gingernut @deeddy I've checked TUF-AX6000 device tree. The mdio bus of the switch is defined, but there is no phy handle given on the switch ports for the switch PHYs. Fix the device tree and the issue should go away.

2 Likes

Strange that is was working before commit 4354b34f6fc5 as we had @blocktrron's original patch to allow referencing the MDIO bus applied already, and that was required and working for MT7988 as well, so it did have an effect. Looks like the solution @arinc9 then applied in favor of @blocktrron's long-pending patch is more strict then and we are going to have problems with more downstream devices...

Edit: Just double-checked, and yes, @blocktrron also did submit his patch upstream, see
https://patchwork.kernel.org/project/netdevbpf/patch/20230430112834.11520-1-mail@david-bauer.net/

@arinc9 Why did you not work with him to get the changes you wanted but just applied your own solution instead? I suppose you were aware of his patch as you mention him as Suggested-by:, but what he did was a but more than just a suggestion, it was a ready patch which had received approving reviews and the only thing he did wrong was that net-next was closed at the time.

Please try this

1 Like

This is not the case. David's patch series that included "mt7530: register OF node for internal MDIO bus" included "dt-bindings: net: dsa: mediatek,mt7530: document MDIO-bus" as well. I've made requests on that patch. After that, no new version was submitted by David.

https://patchwork.kernel.org/project/netdevbpf/list/?series=&submitter=185583&state=*&q=&archive=both&delegate=

So I've started working on the dt-bindings documentation myself. I eventually realised that describing switch MDIO buses properly is a bigger task than what it initially looked so I delayed that. It is still on my task list. In the meantime, Vladimir had started describing a proper implementation for registering OF-based switch MDIO buses. In coordination with Vladimir, I've improved the implementation and then submitted OF-based MDIO bus registration support for the MT7530 DSA subdriver. David was mentioned as they were the person that suggested OF-based MDIO bus registration. Vladimir had approved the patch. Read these patches for details:

https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=91374ba537bd60caa9ae052c9f1c0fe055b39149

https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=ae94dc25fd73cb41df233a20ca638ba136cc5f3a

To relieve your burden of fixing the device trees hosted on OpenWrt that were described incorrectly, I am willing to help fixing them.

Edit: That said, I don't think this issue is caused by an evolved implementation of OF-based MDIO bus registration. Read Andrew's comment:

Arınç:

Like you said, if the mdio node is not defined, the driver will assume 1:1
mapping. If not, it will need all the PHYs to be defined on the mdio node
along with on the ports node.

Andrew:

The problem is __of_mdiobus_register() :frowning:

If there is a node in DT, the scanning of all addresses is
disabled. It looks at just the addresses listed in DT. Plain
mdiobus_register() by default looks at all addresses to discover if a
PHY is there.

Hence when slave.c tries to connect to the PHY, only the PHY you
listed in DT has been found.

I don't see a way around this, at least not without adding a new
variant of of_mdiobus_register which combined DT with a bus scan.

https://patchwork.kernel.org/comment/25318259/

I don't know why this issue appeared only after my patch, but I believe my patch only revealed the issue, it did not cause it.

My thesis is: The only difference here is that instead of @blocktrron's patch we are now using the one written by @arinc9. We never had the disputed DT documentaion patch in our tree, and it wouldn't matter either, obviously, as that's only documentation. Hence I conclude that using @arinc9 patch for MDIO-bus registration from OF instead of the original implementation of @blocktrron broke our downstream DT.

I happily accept any other plausible thesis, but if #15141 works as a work-around I consider q.e.d. of my thesis above.

Edit: Maybe we are missing other upstream patches which would make @arinc9 version work as intended?

Ok I studied what my patch does (it's been a while). In this case, it's the not assigning the switch MDIO bus to ds->user_mii_bus when the switch MDIO bus is defined on the device tree.

When ds->user_mii_bus is populated, DSA will 1:1 map the port with PHY. Meaning port with address 1 will be mapped to PHY with address 1. This clearly shows that ds->user_mii_bus is only supposed to be populated when the switch MDIO bus is not defined on the device tree. Because when the switch MDIO bus is defined, the PHY addresses are defined, therefore the port nodes get the PHY addresses from the device tree. Read Vladimir's mail that is linked on my patch for the details.

https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=91374ba537bd60caa9ae052c9f1c0fe055b39149

In conclusion, the PR you opened is the right approach and will address this issue. That said, please don't jump to conclusions about me such as "Why did you not work with him to get the changes you wanted but just applied your own solution instead?" in the future.

While I agree in theory, in practice that 1:1 mapping is also enforced by that hardware. You can't use PHY at address 2 for port 1, for example. It will always have to be the PHY address matching the port number on all switches support by this driver. Hence it is questionable what the additional freedom of being able to define anything else would be useful for, except for introducing bugs.

Once I get a chance I'll test the new patch and let you know.

Thanks for the quick response and support.

1 Like

Incorrect. The PHY address for the MT7530 switch and its PHYs can be changed via bootstrapping configuration. PHY address and port address values won't match in that case.

I don't see the change as introducing bugs. It reveals bad device trees as also said by Vladimir.

1 Like

This is a misunderstanding. The MDIO address of the switch itself can maybe be modified to be something else than 0x1f (maybe, yet to be seen in the wild). Nobody does that, but in theory you (maybe) could. The addresses of the built-in PHYs are fixed.

The internal PHYs are also the only things ever present on that purely internal MDIO bus of MT7531 and MT7988, which cannot be used to connect other external PHYs what-so-ever. What would be the requirement which would lead engineers to implement a way to modify their addresses? And you imagine them all modified at once or one-by-one?

There are only two pins related to MDIO on the MT7531 package SMI_MDIO and SMI_MDC. That's the bus used to connect the switch IC to the host MDIO bus where the switch will then be present on address 0x1f by default. The built-in PHYs are not present on that bus. They have to be accessed using the PHY Indirect Access Control register of the switch, which is what the driver you are maintaining does.

On MT7530 and MT7621 only bit 3 and 4 of the MDIO bus address of the switch itself can be modified using bootstrap pins (see ht_smi_addr field in HWTRAP register). Deviations from the default (0x1f) have yet to be encountered in the wild. The PHYs are present on the same bus, and seem to have fixed addresses.

Edit: Looking at MT7621 Giga Switch Programming Guide it seems like also MT7621 and MT7530 provide the option to indirectly access the PHYs rather than having them present on the same bus as the switch itself. csr_c_mdio_bps_n bit in modified hwtrap register hints to that. Method how to access PHYs indirectly doesn't seem to be documented though, but I'd assume it works just like on MT7531 and later switches. In both cases, direct or indirect access to switch PHYs I don't see any option to modify the addresses of the built-in PHYs.

"Revealing" as in "breaking". You shouldn't ever break DT compatibility, not even downstream DT. Especially not if you don't have a really good reason. Mentioned documentation-related debate has nothing to do with assigning ds->user_mii_bus or not. Not doing so created an unnecessary problem and having the only "feature" of having to provide those phy-handles is the possibility to introduce bugs.

The whole debate you are referencing here was also related to something completely different: Not being able to detect the PHYs at all if an empty mdio child node is present. This is not the case here. The MDIO bus is populated, just individual PHYs are not referenced by the corresponding ports. The solution would have been easy (using @blocktrron variant of the patch and only fixing the dt-bindings part).

1 Like

Tested your patch and it's working correctly on latest snapshot.

Thank you @arinc9 @daniel

1 Like

This is again incorrect. As I said, the PHY address for the MT7530 switch and its PHYs can be changed via bootstrapping configuration. If you don't believe me, you can read the bootstrapping section of the MT7530 programming guide to confirm that. And I did test this on an MT7530. Regardless of direct or indirect PHY access, the PHY addresses will change on the bus(es) the PHYs listen on. My company is planning to utilise this as they want to use an MT7530 and an MT7531 switch on the same bus on one of their products.

If David's patch was applied, DSA would incorrectly attempt to connect to PHYs on the bus using PHY addresses no PHYs listen on. In addition to that, since the DSA subdriver currently configures direct PHY access on MT7530 to have its PHYs on the same MDIO bus the switch also listens on, on a hardware design where there's an external PHY listening on address 0, DSA would incorrectly connect the switch's port 0 to that PHY.

I'm talking about MT7530. I'm aware that there's no bootstrapping option to change the PHY address for the MT7531 and MT7988 SoC switch and its PHYs.

I am aware. The PHY indirect access control register on MT7530 is the same as MT7531. I've tested it and can confirm it works fine. I am planning to unify PHY access on the MT7530 DSA subdriver once I complete the requirements for it.

Take a better look at the documentation.

If you have a bad device tree, don't be surprised when the driver starts enforcing what the device tree says. That's not breaking the ABI. I would say this change on the driver is absolutely justified. We don't gatekeep support of different hardware designs because of device trees that didn't properly describe certain hardware.

There're no bugs to be introduced. We actually simplify the process by not making guesses and just take what the device tree says. Unless you see the situation of drivers breaking when an improper device tree is being used as a bug.

Are we reading the same conversation? Here's part of Vladimir's mail:

This right here is the core ds->user_mii_bus functionality:

ret = phylink_of_phy_connect(dp->pl, port_dn, phy_flags);
if (ret == -ENODEV && ds->user_mii_bus) {
/* We could not connect to a designated PHY or SFP, so try to
* use the switch internal MDIO bus instead
*/
ret = dsa_user_phy_connect(user_dev, dp->index, phy_flags);
}

So, the ds->user_mii_bus is only used if we cant't do phylink_of_phy_connect(),
which follows the "phy-handle" reference.

The reasons (that I can see) why we can't do phylink_of_phy_connect() are:
(a) port_dn is NULL (probing on platform_data and not OF)
(b) port_dn exists, but has no phy-handle
(c) port_dn exists and has a phy-handle to a PHY that doesn't respond
(wrong address, ?!)

Out of those, it only makes practical sense to design for (a) and (b).

If the ds->user_mii_bus has an OF node, it means that we are not in case
(a). We are in case (b).

Normally to be in case (b), it means that the device tree looks like this:

switch {
ports {
port@0 {
reg = <0>;
};
};
};

aka port@0 is a user port with an internal PHY, not described in OF.

But to combine case (b) with the additional constraint that ds->user_mii_bus
has an OF node means to have this device tree:

switch {
ports {
port@0 {
reg = <0>;
// no phy-handle to <&phy0>
};
};

  mdio {
  	phy0: ethernet-phy@0 {
  	};
  };

};

Which is simply a broken device tree which should not be like that [1].
If the MDIO bus appears in OF, then all its PHYs must appear in OF [2].
And if all PHYs appear in OF, then you must have a phy-handle to
reference them.

[1] There exists an exception, which is the hack added by Florian in
commit 771089c2a485 ("net: dsa: bcm_sf2: Ensure that MDIO diversion
is used"). There, he starts with a valid phy-handle in the device
tree, but the DSA driver removes it. This makes phylink_of_phy_connect()
fail, and "diverts" the code to dsa_user_phy_connect(), where the
mii_bus read and write ops are in control of the DSA driver. Hence
the name "diversion to ds->user_mii_bus". It's a huge hack, make no
mistake about it.

[2] This is because __of_mdiobus_register() does this:

/* Mask out all PHYs from auto probing. Instead the PHYs listed in

  • the device tree are populated after the bus has been registered */
    mdio->phy_mask = ~0;

Incorrect, read my first paragraph.

1 Like

Where do you see a part which allows changing the address of the built-in switch PHYs? Please quote and provide a reference, I don't see that option in my datasheet, which could of course be an incomplete/stripped version. I only see the option to change the address of the switch itself to be used instead of 0x1f and using indirect-addressing also on MT7530.

Always using indirect addressing also on MT7530 chips which often don't have the interrupt line wired up means significant overhead (e.g. BananaPi R2 is such a board, MT7530 interrupt doesn't work there). You will need a multiple of MDIO bus interactions compared to having the PHYs present directly on the bus. It's surely nice to have the option to use indirect addressing if that is intended on a specific board (ie. selected via bootstrap pins). Always using indirect addressing (by overriding bootstrap setting) on MT7530 is not a good idea imho. On MT7621 it would be an option, as we know that interrupt always works and overhead is hence manageable.

Page 7 of MT7621 Programming Guide v0.4.

http://gw.stasoft.net/share/nts/datasheets/MT7621_ProgrammingGuide_GSW_v04.pdf

If I understand correctly, we're talking about the interrupts for the switch PHYs. When there're no interrupts, software will poll the PHYs every second to learn the link status and the like, which takes more computing power to achieve compared to if there were interrupts.

Let's analyse the code path for the direct and indirect access methods. I'll focus on the Clause 22 write operation. The two methods diverge with the calling of mdiobus_write_nested() for direct, mt753x_ind_c22_phy_write() for indirect access.

Direct access:

mdiobus_write_nested()
-> __mdiobus_write()
   -> bus->write()

Indirect access:

mt753x_ind_c22_phy_write()
-> readx_poll_timeout()
   -> read_poll_timeout()
      -> _mt7530_unlocked_read()
         -> mt7530_mii_read()
            -> regmap_read()
               -> _regmap_read()
                  -> map->reg_read(): mt7530_regmap_read()
                     -> bus->read()
-> mt7530_mii_write()
   -> regmap_write()
      -> _regmap_write()
         -> map->reg_write(): mt7530_regmap_write()
            -> bus->write()
-> readx_poll_timeout()
   -> read_poll_timeout()
      -> _mt7530_unlocked_read()
         -> mt7530_mii_read()
            -> regmap_read()
               -> _regmap_read()
                  -> map->reg_read(): mt7530_regmap_read()
                     -> bus->read()

I then assume we just endure the overhead on boards with MT7531 without interrupts as MT7531 does not provide direct PHY access?

Could you explain how significant is the overhead? Do you think it is significant in the amount that it would justify not having the benefits of:

  • Clearing 5 out of 32 PHY addresses on the MDIO bus the switch listens on.
  • Assuring Clause 45 access to switch PHYs. This should more than halve the amount of MDIO bus interactions for the core_write(), core_set(), and core_rmw() operations as they reach Clause 45 registers via Clause 22.
  • Simplification of the DSA subdriver.

Yes, this is identical with my datasheet. You are referring to ht_smi_addr, right?
So that's the address of the switch itself (0x1f by default). Not the addresses (0~4) of the 5 built-in PHYs. Where do you see the option to modify those exactly?

Why would we want to do that? Why keep the MDIO bus busy and waste CPU cycles on something which can also be done (and is currently done) much cheaper?
I understand the argument of "code simplicity", of course, but looking at the call trace for indirect access (thanks for tracking it down and documenting it!) compared to direct access and also the fact that a multiple (I guess in magnitude 3x~5x) of the otherwise needed load on the rather slow MDIO bus would caused when forcing indirect access, this doesn't seem to be a good idea. Not just because of phy-polling, but it would also multiply the load on mtk_gephy_config_init.

Generally, I believe the best would be to decide whether to use direct or indirect access on MT7530 and MT7621 depending on the content of ht_c_mdio_bps_n and refrain from touching csr_c_mdio_bps_n for purposes other than testing or during development. If you really really really feel the urge to always use indirect access even boards where this isn't decided using the corresponding bootstrap pin, please add a (downstream?) DT property you can use on your board to modify csr_c_mdio_bps_n -- however, as you mentioned the hardware you are using is developed by the company you work for, I suppose the best would to use the bootstrap pin to select indirect access on that hardware rather than overriding the setting in csr_c_mdio_bps_n.