Mt7621 / mt7530 programming: Disabling Flow Control on all ports

Transmit queue timeouts are unfortunately still a thing on devices with a mt7530 switch, as evident from this thread: Mtk_soc_eth watchdog timeout after r11573

One patch that has made a big difference in how often this issue is triggered, is the disabling of Flow Control on the CPU port: https://git.openwrt.org/?p=openwrt/openwrt.git;a=commit;h=498f1f4f5df2d077ba524f5735906bb52c12d580

However, this patch does not disable flow control completely. It only disables it on the CPU port. Devices connected directly to the switch still see flow control as being advertised, and thus will still send pause frames, which seems to be the direct cause of all the issues as documented by @kristrev. This is evident from running ethtool on a PC connected to the router: Link partner advertised pause frame use: Symmetric.

Now I REALLY want to try and disable flow control on ALL ports, to hopefully prevent any connected devices from sending any pause frames (if they honor the advertisement). I am hoping this will fix the timeouts completely. I have 6 mt7621 devices in production use. 5 of them can get hundreds of days of uptime without any issues. And 1 is running into the timeout issue ~once a week. The big difference is that the one having an issue is connected to an unmanaged switch, which presumably uses pause frames.

I am trying to reverse engineer the patch that disables flow control, so I can hopefully apply it to the other ports as well. But this has proven to be a bit more difficult than I had hoped. I am using the following document as reference: http://47.107.224.89/redmine/attachments/download/49/MT7621_ProgrammingGuide_GSW_v01.pdf

The big change in the above mentioned patch that disabled flow control on the CPU port, is this:

/* (GE1, Force 1000M/FD, FC ON, MAX_RX_LENGTH 1536) */
mtk_switch_w32(gsw, 0x2305e33b, GSW_REG_MAC_P0_MCR);
mt7530_mdio_w32(gsw, 0x3600, 0x5e33b);

is changed to:

/* (GE1, Force 1000M/FD, FC OFF, MAX_RX_LENGTH 1536) */
mtk_switch_w32(gsw, 0x2305e30b, GSW_REG_MAC_P0_MCR);
mt7530_mdio_w32(gsw, 0x3600, 0x5e30b);

Looking at the mt7530_mdio_w32 function, we can see that it is writing a 32-bit value to the switch registers. 0x3600 is the location, which is the register for MAC 6 according to the mediatek documentation. In the old version this value is:

0x5e33b = 1011110001100111011

And the new value is:

0x5e30b = 1011110001100001011

As we can see, counting from zero and from the right to left, the 4th and 5th bit have been switched from 1 to 0. Looking at the documentation, these bits are for FORCE_TX_FC_P6 and FORCE_RX_FC_P6 respectively, ie it enables/disables flow control for TX and RX on MAC 6.

The registers for MAC 0 through 5 are also in the same programming guide: 0x3000 through 0x3500. Now I am running into multiple issues that prevent me from knowing how to disable flow control on the other ports:

  • The programming guide clearly states "We would suggest don't use the register 0x3000 to 0x3400. It may not work.". What's preventing us from writing the registers for MAC 0 through MAC 4? What happens if we try anyway? I'd love to just try, but this is where the next issues come into play.
  • The 0x5e30b value contains bits specifically tailored for MAC 6. For example, it's configured as NOT being connected to a PHY and operates in MAC mode, which makes sense since it's the CPU port. However, I am unsure what values I should write for the other MACs. Which 5 MACs are the ones connected the the 5 physical ports on my device? And why is there a 7th MAC? Are the PHY/MAC mode bit and external PHY bit the only two bits that should differ from the CPU port to the 5 physical ports, or should more bits be changed?
  • Why is the same value (with some more bits in front set) also written with that weird mtk_switch_w32 function? It's using the GSW_REG_MAC_P0_MCR 0x100 constant as the register address. From the name of the constant it seems to be a register for Port 0. If I were to want to disable flow control on all ports, would I also have to use this function multiple times?

There is also another potential solution that I found, but also here I am missing some crucial knowledge that is preventing me from implementing a patch. The earlier mentioned Mediatek document also mentions something interesting specifically regarding flow control in section 2.16. Namely, bit 31 at register 0x1fe0 enables/disables flow control globally (FC_EN). My questions regarding this are:

  • the 0x1fe0 register is never written to in any of OpenWRT's code. Therefor, it's still at the default values. However, AFAIK it is impossible to only write to a single bit. So I either need to read from that register (how?) and flip the 31th bit to 0. Or does the "Reset" row in the documentation mean the default values after a reset? Because if that's the case, I can simply copy those values, change the 31th bit, and write those 32 bits in one go without having to read first.
  • Again, should I write to these registers with the mt7530_mdio_w32 function, the mtk_switch_w32 function, or both?
  • Should I write to this register at the same location in the code as where FC is disabled on MAC 6? Ie, just after the switch has been reset?

Thank you very much in advance for any help! By the way, I am trying to write this patch for the 19.07 codebase (non dsa-driver).

3 Likes

Here is a patch that mentions register 0x1fe0 and seems to set bit 31 to value 0. This read register, modify bit 31 and then write.

You will probably have to modify it or write those lines:

+   val = mt7530_mdio_r32(gsw, 0x1FE0);
+	val &= ~BIT(31);
+	mt7530_mdio_w32(gsw, 0x1FE0, val);

directly in the gsw_mt7621.c file before compiling, since the if conditions were already removed in 19.07.4 and the whole patch probably give an errors when applied.

1 Like

I've compiled my image with above changes, but unfortunately it doesn't seem to have disabled flow control completely. Running ethtool on my computer connected by cable to the router, pause frame support is still being advertised by the router it seems:

Link partner advertised pause frame use: Symmetric

Which means I probably do have to disable FC on a per-port basis as well. Your idea gave me an idea on how to do that. I am not entirely sure if this is the correct way to do this, but at least the device is able to boot and works. Unfortunately, flow control STILL seems to be enabled. Code I am using (Note: a for-loop would be much cleaner, but just trying stuff out atm):

	/* Disable Flow Control Globally */
	val = mt7530_mdio_r32(gsw, 0x1FE0);
	val &= ~BIT(31);
	mt7530_mdio_w32(gsw, 0x1FE0, val);

	/* (GE1, Force 1000M/FD, FC OFF, MAX_RX_LENGTH 1536) */
	mtk_switch_w32(gsw, 0x2305e30b, GSW_REG_MAC_P0_MCR);
	mt7530_mdio_w32(gsw, 0x3600, 0x5e30b);

	/* Disable Flow Control on MAC 0 */
	val = mt7530_mdio_r32(gsw, 0x3000);
	val &= ~BIT(4);
	val &= ~BIT(5);
	mt7530_mdio_w32(gsw, 0x3000, val);

	/* Disable Flow Control on MAC 1 */
	val = mt7530_mdio_r32(gsw, 0x3100);
	val &= ~BIT(4);
	val &= ~BIT(5);
	mt7530_mdio_w32(gsw, 0x3100, val);

	/* Disable Flow Control on MAC 2 */
	val = mt7530_mdio_r32(gsw, 0x3200);
	val &= ~BIT(4);
	val &= ~BIT(5);
	mt7530_mdio_w32(gsw, 0x3200, val);

	/* Disable Flow Control on MAC 3 */
	val = mt7530_mdio_r32(gsw, 0x3300);
	val &= ~BIT(4);
	val &= ~BIT(5);
	mt7530_mdio_w32(gsw, 0x3300, val);

	/* Disable Flow Control on MAC 4 */
	val = mt7530_mdio_r32(gsw, 0x3400);
	val &= ~BIT(4);
	val &= ~BIT(5);
	mt7530_mdio_w32(gsw, 0x3400, val);

	/* Disable Flow Control on MAC 5 */
	val = mt7530_mdio_r32(gsw, 0x3500);
	val &= ~BIT(4);
	val &= ~BIT(5);
	mt7530_mdio_w32(gsw, 0x3500, val);

	/* Disable Flow Control on MAC 6 */
	val = mt7530_mdio_r32(gsw, 0x3600);
	val &= ~BIT(4);
	val &= ~BIT(5);
	mt7530_mdio_w32(gsw, 0x3600, val);

Actually, it isn't sufficient to set the FORCE_RX_FC_PU and FORCE_TX_FC_PU bits, because they are only used when the FORCE_MODE_PU bit is set to 1. And setting this to 1 doesn't only force the FC settings, but also the other FORCE options. This has lead me to the following new version (again, this needs clean-up by defining the bitmasks once and reusing them, and a nice for-loop, but that's not the point for now):

	/* reset the switch */
	mt7530_mdio_w32(gsw, 0x7000, 0x3);
	usleep_range(10, 20);

	/* Disable Flow Control Globally */
	val = mt7530_mdio_r32(gsw, 0x1FE0);
	val &= ~BIT(31);
	mt7530_mdio_w32(gsw, 0x1FE0, val);

	/* (GE1, Force 1000M/FD, FC OFF, MAX_RX_LENGTH 1536) */
	mtk_switch_w32(gsw, 0x2305e30b, GSW_REG_MAC_P0_MCR);
	mt7530_mdio_w32(gsw, 0x3600, 0x5e30b);

	/* Disable Flow Control on MAC 0 */
	val = mt7530_mdio_r32(gsw, 0x3000);
	val |= BIT(0);
	val |= BIT(1);
	val &= ~BIT(2);
	val |= BIT(3);
	val &= ~BIT(4);
	val &= ~BIT(5);
	val &= ~BIT(6);
	val &= ~BIT(7);
	val |= BIT(15);
	mt7530_mdio_w32(gsw, 0x3000, val);

	/* Disable Flow Control on MAC 1 */
	val = mt7530_mdio_r32(gsw, 0x3100);
	val |= BIT(0);
	val |= BIT(1);
	val &= ~BIT(2);
	val |= BIT(3);
	val &= ~BIT(4);
	val &= ~BIT(5);
	val &= ~BIT(6);
	val &= ~BIT(7);
	val |= BIT(15);
	mt7530_mdio_w32(gsw, 0x3100, val);

	/* Disable Flow Control on MAC 2 */
	val = mt7530_mdio_r32(gsw, 0x3200);
	val |= BIT(0);
	val |= BIT(1);
	val &= ~BIT(2);
	val |= BIT(3);
	val &= ~BIT(4);
	val &= ~BIT(5);
	val &= ~BIT(6);
	val &= ~BIT(7);
	val |= BIT(15);
	mt7530_mdio_w32(gsw, 0x3200, val);

	/* Disable Flow Control on MAC 3 */
	val = mt7530_mdio_r32(gsw, 0x3300);
	val |= BIT(0);
	val |= BIT(1);
	val &= ~BIT(2);
	val |= BIT(3);
	val &= ~BIT(4);
	val &= ~BIT(5);
	val &= ~BIT(6);
	val &= ~BIT(7);
	val |= BIT(15);
	mt7530_mdio_w32(gsw, 0x3300, val);

	/* Disable Flow Control on MAC 4 */
	val = mt7530_mdio_r32(gsw, 0x3400);
	val |= BIT(0);
	val |= BIT(1);
	val &= ~BIT(2);
	val |= BIT(3);
	val &= ~BIT(4);
	val &= ~BIT(5);
	val &= ~BIT(6);
	val &= ~BIT(7);
	val |= BIT(15);
	mt7530_mdio_w32(gsw, 0x3400, val);

	/* Disable Flow Control on MAC 5 */
	val = mt7530_mdio_r32(gsw, 0x3500);
	val |= BIT(0);
	val |= BIT(1);
	val &= ~BIT(2);
	val |= BIT(3);
	val &= ~BIT(4);
	val &= ~BIT(5);
	val &= ~BIT(6);
	val &= ~BIT(7);
	val |= BIT(15);
	mt7530_mdio_w32(gsw, 0x3500, val);

	/* Disable Flow Control on MAC 6 */
	val = mt7530_mdio_r32(gsw, 0x3600);
	val |= BIT(0);
	val |= BIT(1);
	val &= ~BIT(2);
	val |= BIT(3);
	val &= ~BIT(4);
	val &= ~BIT(5);
	val &= ~BIT(6);
	val &= ~BIT(7);
	val |= BIT(15);
	mt7530_mdio_w32(gsw, 0x3600, val);

Unfortunately, this didn't bring me to my goal of disabling flow control either. Running ethtool on my desktop connected to the router still brings me the following:

Link partner advertised pause frame use: Symmetric

I am assuming using mt7530_mdio_w32 isn't enough, and I actually have to use mtk_switch_w32 as well to set these registers properly. But I have absolutely no clue on where to start. Any ideas?

	/* MT7621 E2 has FC bug, disable FC */
	if ((ralink_asic_rev_id & 0xFFFF) == 0x0101)
		regLink &= ~(0x3 << 4);

or
1 Like

Thank you very much for these new pointers! Unfortunately, I don't think this does anything else than what I am already doing. In the part you have quoted:

/* MT7621 E2 has FC bug, disable FC */
	if ((ralink_asic_rev_id & 0xFFFF) == 0x0101)
		regLink &= ~(0x3 << 4);

The 4th and 5th bit in regLink are set to zero. This value is then used twice in subsequent code:

mii_mgr_write(MT7530_MDIO_ADDR, 0x3600, regLink);

and

mii_mgr_write(MT7530_MDIO_ADDR, 0x3500, regLink);

Ie, bit 4 & 5 for Mac control register for Mac 5 and 6 is set to 0. Which is what I am already doing in my previous code example.

I think that you need to find what is advertising the pause feature, as it may be advertised by default by the driver.
Just disabling the bit does not mean that kernel has any idea that it's disabled.

You mean that flow control is actually disabled as wanted, but the driver still advertises support (by default)? That could very well be it actually! Still, in order to prevent connected devices from sending pause frames, we should try to shut down this advertisement completely. However, I don't see any registers that could control this behavior in the switch's documentation, nor do I see anything in the driver's code. Do you have any idea where I should be looking for hints?

Kernel only sees stuff that is advertised to it, and that is done by ethernet and/or PHY driver.

This is not set in switch HW, but rather ethernet ethtool driver most likely as that is responsible for ultimately advertising stuff to the kernel.
Note that it may just be pulling info from PHY-s and advertising that as everything is ultimatively advertised through PHY-s to the other side.

I dont work on Mediatek MIPS at all so I dont know how networking drivers are for it in OpenWrt, but its gotta be there.

Are they converted to linkmode or are still array bassed?

can it be done via the devicetree perhaps ?

Usually not, you can try disabling pauses in the ethernet fixed link.
Maybe its propagating them from there

Sorry for any confusion maybe, wasn't entirely sure if you knew what I meant with my ethtool story. But ethtool was run on my desktop connected to the router, NOT the router itself. It was my desktop which got the advertisement from its link partner (in this case the router) that pause frame use was available in both directions:

Link partner advertised pause frame use: Symmetric

I would use ethtool on the router to get pause settings, but unfortunately many ethtool functions are not implemented in the driver, including getting the pause settings:

root@OpenWrt:~# ethtool -a eth0
Pause parameters for eth0:
Cannot get device pause settings: Not supported

I think my knowledge is falling short here. I don't know what either devicetree or ethernet fixed link means here. I will start reading into these topics, but if either of you have some clarification that would be much appreciated :slight_smile:

I know that you are running it from the link partner side, but that value is not appearing from thin air but is being advertised by the PHY from the other side.

In the MT7621 dtsi you can see that under the GMAC node there is a fixed link defined for connection to the switch and that pauses are advertised there.

I can only find this in mt7621.dtsi in the master branch. The one in the 19.07 branch (which is the branch in which I am trying to disable flow control) doesn't contain the pause: https://github.com/openwrt/openwrt/blob/openwrt-19.07/target/linux/ramips/dts/mt7621.dtsi

edit: Actually, it's not even in the master branch. Only the upstream version included in the kernel contains the pause.

Well, then its gotta be hardcoded somewhere in the drivers

Unfortunately, I don't good enough in c/c++.
These 2 repo is implementation of half (Andy Padavan)/full(Wive-ng-mt) commercial device on MT7621 with no stability complaints.

I have done a lot of digging, and unfortunately, I haven't been able to find the code where the pause frame advertisement is coded in. I'd be very grateful if anyone has some pointers on where to look.

Mushoz,

In my past experience with other switch devices, the pause feature should be controlled by modifying the ANAR register in the PHY. This is a standard register. So each port will have it's own set of phy registers. The ANAR register is at address 4. The pause feature is bit 10, 1 to enable 0 to disable. After modifying the ANAR it is necessary to force a link renegotiation. This is accomplished by setting bit 9 in the Control register at address 0.

Read and write Phy registers is usually over the mdio bus so should be some kind of mdiobus_read() and mdiobus_write() call. Sorry, but I don't know the specifics for your device.

I hope this helps in someway.
Regards,
Mark

3 Likes

Ethtool driver in ramips is most likely simply calling generic functions as PHY-s should be Clause 22 compliant and pause settings are a standard thing.
So, like @mschank said you can either clear the bits or use linkmode(Dont know if that was used in 4.14 kernel) to clear the Asym and Sym pause advertisement to the kernel.

Thank you so much, this is really useful information! Unfortunately, those addresses and registers aren't documented in the mt7621 programming guide, but I have found them in the mt7620 programming guide, which fortunately shares many of the same details.

You mentioned that after disabling the pause bit I will have to force renegotiation by flipping bit 9 in the control register at address 0. In the current drivers the PHYs are disabled by setting bit 11 (POWERDOWN) in the control registers (at address 0). After setting many registers correctly, they are then brought back up by clearing bit 11 again.

I am assuming that if I clear the pause bit 10 at address 4 after the PHYs have been brought down and before they have been brought back up, forcing renegotiation isn't required, correct? Since negotiation will be done when they are brought up anyway, correct?