AR8021 phy not working in gigabit mode; how to configure pll-data cells?

Hi everyone,

I'm trying to add support for D-Link DAP-2360 A1, which is based on AR7242 using an external AR8021 10/100/1000 phy.

However, it only works reliably at 100M. At 1000M the connection is very unstable: lots of TCP timeouts, retransmissions, duplicate ACKs, out-of-order packets etc. (resulting in much worse overall performance than at 100M).

I currently have:

// edit: please skip to Post #2 for a more sophisticated attempt at solving this, including an explanation of what the pll-reg cells mean

&mdio0 {
	status = "okay";

	phy-mask = <0x1>;

	phy0: ethernet-phy@0 {
		reg = <0>;
		phy-mode = "rgmii";
	};
};

&eth0 {
	status = "okay";

	phy-mode = "rgmii";
	phy-handle = <&phy0>;

	pll-data = <0x0804081e 0x00000101 0x040800c0>;
};

Even when setting

fixed-link {
	speed = <100>;
	full-duplex;
};

there is no auto-negotiation, i.e. it would only work when using a 100M switch or by manually setting the computer's network card to 100M.
In 1000M mode, the link state is reported correctly as 1000M on both ends, but not stable.

I see that ar7242.dtsi overwrites the pll settings for eth0:

	pll-data = <0x16000000 0x00000101 0x00001616>;
	pll-reg = <0x4 0x2c 17>;
	pll-handle = <&pll>;

It's curious though that 17 / 0x11 is not 32-bit aligned, and there is no corresponding register in the AR7242 datasheet (besides DDR_PLL_CONFIG and ETH_XMII, which seem plausible). When dumping the regs via uboot I get this:

ar7240> md 18050004 1
18050004: 0804081e    ....
ar7240> md 1805002c 1
1805002c: 00000101    ....
ar7240> md 18050010 1
18050010: 00040800    ....
ar7240> md 18050014 1
18050014: c00a0000    ....
ar7240>

which I initially attempted to interpret as

pll-data = <0x0804081e 0x00000101 0x040800c0>;
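
For reference, the third cell above is exactly what an unaligned 32-bit read at byte offset 0x11 yields from the two dumped words, assuming big-endian byte order (that is an assumption, the dump itself does not state it). A minimal C sketch of the arithmetic; the helper name is made up for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Read a 32-bit big-endian value starting at an arbitrary byte offset
 * within a run of consecutive 32-bit words (assumed big-endian). */
static uint32_t read_be32_unaligned(const uint32_t *words, unsigned nwords,
                                    unsigned byte_off)
{
	uint32_t out = 0;

	assert(byte_off + 4 <= nwords * 4);
	for (unsigned i = 0; i < 4; i++) {
		unsigned pos = byte_off + i;
		uint32_t word = words[pos / 4];
		/* byte 0 of a big-endian word is its most significant byte */
		uint8_t byte = (uint8_t)(word >> (8 * (3 - pos % 4)));

		out = (out << 8) | byte;
	}
	return out;
}
```

With the words dumped from 0x18050010 and 0x18050014 (0x00040800, 0xc00a0000) and a byte offset of 1, this returns 0x040800c0. It only reproduces the initial reading; it says nothing about whether that reading is the right one.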

The phy is using an external 25 MHz crystal, so all the pll stuff should probably just be related to the SoC clock when communicating with the phy...

So, what exactly does pll-data do, will it overwrite the SoC PLL registers starting at 0x18050000 with those values, or will it just use them to calculate stuff for the phy timing?

The only register using the address 0x11 would be PHY_STATUS that can be accessed via MDIO, but that one is read-only (the contents being 0x7c52 after uboot initialisation). So, what exactly is the third cell for?

I could not find much about phy pll in the devicetree bindings Documentation, however I stumbled on this (not sure if eee is even relevant for such an old phy):

From source/Documentation/devicetree/bindings/net/phy.txt :

- eee-broken-100tx:
- eee-broken-1000t:
- eee-broken-10gt:
- eee-broken-1000kx:
- eee-broken-10gkx4:
- eee-broken-10gkr:
  Mark the corresponding energy efficient ethernet mode as broken and
  request the ethernet to stop advertising it.

Also I see there are modes that change the configuration of additional timing used with other ath79 devices; the AR8021 databrief states

RGMII timing modes: support both internal delay and external delay on both Rx and Tx paths

but trying rgmii-id or rgmii-rxid did not change anything either.

I only found a driver for ar8031, so ar8021 should probably work with a generic one (I guess also ar8031 would just include additional functionality besides the IEEE compliant gmii interface!?)

So at this moment, I'm not sure what could be the next thing to try...

Any help or documentation regarding the pll-data dt-binding is appreciated :slight_smile:

Just a little progress, while digging deeper into the rabbit hole... :innocent:

I'll leave this here for anyone else who might be wondering "what the hell are pll-data cells in the phy dts nodes actually doing?"

Initially I found this kernel documentation on the PHY abstraction layer quite helpful; I should have looked there first, rather than checking the docs for Devicetree bindings:

https://www.kernel.org/doc/html/latest/networking/phy.html

Then accidentally stumbled on some ar71xx mach files while grepping, and they have statements writing to a pll data structure e.g. like this:

ath79_eth0_pll_data.pll_1000 = 0x0e000000;
ath79_eth0_pll_data.pll_100 = 0x00000101;
ath79_eth0_pll_data.pll_10 = 0x00001313;

So as it turns out, the three pll-data cells in the .dts are actually the PLL settings for the individual link speeds 1000 / 100 / 10, e.g. in ar7242.dtsi we find:

    pll-data = <0x16000000 0x00000101 0x00001616>;
    pll-reg = <0x4 0x2c 17>;
    pll-handle = <&pll>;

For ath79 SoCs, the &pll handle refers to the base address 0x18050000, while the cells of pll-reg represent the PLL config register, the GMAC PLL register and the bit position (shift value) of the GMAC config bits:

  • 0x04 is the PLL config register (at 0x18050004), where the reset and nopwdown bits for the RGMII PLL are located.
  • 0x2c is the GMAC PLL register for ar7242 GMAC0 (at 0x1805002c), where the value from the pll-data cell for the current link speed will be written.
  • 17 is the shift value for the config register, defining the position of the reset and nopwdown bits (17 for GMAC0, 19 for GMAC1).
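
To make the roles of the cells concrete, here is a minimal C sketch. It is not the driver code; the struct and function names are hypothetical, and treating the reset and nopwdown bits as two adjacent bits starting at the shift value is an assumption:

```c
#include <assert.h>
#include <stdint.h>

#define ATH79_PLL_BASE 0x18050000u /* base address behind &pll, as stated above */

/* The three pll-reg cells, e.g. <0x4 0x2c 17> from ar7242.dtsi */
struct pll_reg_cells {
	uint32_t cfg_off;  /* 0x04: PLL config register */
	uint32_t gmac_off; /* 0x2c: GMAC0 PLL register */
	uint32_t shift;    /* 17: bit position inside the config register */
};

/* Pick the pll-data cell for a link speed; cells are ordered 1000, 100, 10. */
static uint32_t pll_value_for_speed(const uint32_t pll_data[3], int speed_mbit)
{
	switch (speed_mbit) {
	case 1000: return pll_data[0];
	case 100:  return pll_data[1];
	case 10:   return pll_data[2];
	default:
		assert(!"unsupported link speed");
		return 0;
	}
}

/* Absolute address that receives the per-speed value. */
static uint32_t gmac_pll_addr(const struct pll_reg_cells *r)
{
	return ATH79_PLL_BASE + r->gmac_off;
}

/* Mask of the control bits in the config register.
 * Assumption: two adjacent bits starting at `shift`. */
static uint32_t cfg_ctrl_mask(const struct pll_reg_cells *r)
{
	return 3u << r->shift;
}
```

For pll-reg = <0x4 0x2c 17> the per-speed value would land at 0x1805002c, and under the assumption above the control bits would be covered by the mask 0x00060000 in the register at 0x18050004.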

This is also why, when using the recovery (e.g. on DAP-2360), the ethernet cable needs to be connected before entering it: the PLL initialisation happens only once, so whichever link speed was negotiated before entering the recovery determines the PLL setting until reboot.

To find out all three values of pll-data, set your computer's network card to 1000, 100 and 10 Mbit in turn, enter the uboot shell each time and dump the content of the PLL register for the currently active GMAC node and link speed:

#define AR71XX_PLL_REG_ETH0_INT_CLOCK	0x10
#define AR7242_PLL_REG_ETH0_INT_CLOCK	0x2c
#define AR913X_PLL_REG_ETH0_INT_CLOCK	0x14

#define AR71XX_PLL_REG_ETH1_INT_CLOCK	0x14
/* nope, AR7242 only has one GMAC... */
#define AR913X_PLL_REG_ETH1_INT_CLOCK	0x18

For AR7242 there is only GMAC0, so in the uboot shell we would just do

md 1805002c 1

and get the values 0x12000000 for 1000 Mbit, 0x00000101 for 100 Mbit and 0x00001313 for 10 Mbit.

Putting this into the .dts to override &eth0 pll-data, aaand... it doesn't work :sweat_smile:

Tried to use phy-mode = "rgmii-id", no change so far; iperf shows 96 Mbit/s on 100M but only like 500k for 1000M.

Will further try different combinations of rgmii-id, -txid, -rxid and so on.
I assume they need to be set in both the &eth0 and the &phy0 node, or is this redundant?

Any advice is still appreciated :slightly_smiling_face:

Since I'm just updating the wiki on the DAP-2xxx devices, I will at least add a short note on DAP-2360 and DAP-2553 here:

Could this comment be related?

Maybe the AR72xx devices have not been successfully tested when using external PHYs yet, considering these devices are usually quite dated, hence most of these wouldn't have enough memory for porting to ath79 anyways.

I have an MR12 ath79 PR up. Same SoC, same AR8021. It uses the AR7242's FE PHY with one of its GMACs, and the AR7242's other GMAC with the AR8021. I also did the MR16 ath79 port, same AR8021 but the AR7161 CPU uses different pll-data. Here's MR12: https://github.com/openwrt/openwrt/pull/3634.

Also the AR7242 spec sheet: http://www.sarimesh.net/wp-content/uploads/2016/10/AR7242_datasheet.pdf

The MR12 PR has an important conversation about how to identify the state of the GMAC+GMAC/FE-Switch.

I don't have a moment to read your code but the aforementioned conversation in the MR12 may be enlightening.

Thanks, I saw that PR a while ago (it was probably where I also found the link to the discussion in PR #1146), but hadn't followed all the details since then.

It's curious to see there are actually two GMACs in ar7242, though it's been too long for me to recall how I ended up assuming there was only one... :innocent: I wonder why they would use it though, as there is only one Gigabit port on DAP-2360 (maybe such a weird setup is required for hardware NAT from WAN to WLAN?)

I hope I can find some time on the weekend to dig this device back out and continue testing :slight_smile: Also there would be DAP-2553 and DAP-1353 Rev. B that could profit from this.
Curiously, I managed to get the external Gigabit PHY working on another ar724x device (Zyxel NWA-112x), so I wonder what is different with the D-Link, and how the second GMAC could be used to get the first working (which indeed works already, but only when the link partner would force 100Base-T) :thinking:

for ar7240, ar7241

GMAC0:

  • MII interface to internal PHY4
  • 1 port 100/10

GMAC1:

  • GMII interface to internal switch
  • internal PHY0 - PHY3, 4 ports 100/10

for ar7242

GMAC0:

  • RGMII interface, no PHY

GMAC1:

  • GMII interface to internal MAC / PHY block
  • 1 port at 100/10

GMAC1 is very similar to the internal switch in both ar7240 and ar934x series, but instead of many ports on a switch it is just a single port

Set phy-mode to rgmii-id and try each of these for pll-data
(as the first 2 hex digits of the first element):

  • 0x02: only Gig E
  • 0x0e: TX delay and Gig E
  • 0x1a: both delays (medium) and Gig E
  • 0xae: invert, both delays, Gig E
  • 0x03: Gig E and Offset Phase
  • 0x16: both delays (small) and Gig E
  • 0x06: TX delay and Gig E (small)
  • 0x3e: both delays and Gig E (small)
  • 0x9a: invert, both delays, Gig E (small)
  • 0x92: invert, rx delays, Gig E
  • 0xbe: invert, both delays, Gig E
  • 0x82: invert clock, Gig E
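
A small C sketch for generating the 1000M cell for each of these candidates, so they can be pasted into the .dts one by one. The function and array names are made up for illustration, and keeping the lower 24 bits of the existing cell unchanged is an assumption:

```c
#include <assert.h>
#include <stdint.h>

/* Replace the top byte (the "first 2 digits") of the 1000M pll-data cell.
 * Assumption: the lower 24 bits of the existing cell are kept as they are. */
static uint32_t with_top_byte(uint32_t cell, uint8_t top)
{
	return ((uint32_t)top << 24) | (cell & 0x00ffffffu);
}

/* Top-byte candidates from the list above, in the same order. */
static const uint8_t candidates[] = {
	0x02, 0x0e, 0x1a, 0xae, 0x03, 0x16,
	0x06, 0x3e, 0x9a, 0x92, 0xbe, 0x82,
};

/* Fill `out` with one 1000M cell per candidate; returns the count. */
static unsigned build_candidates(uint32_t base_cell, uint32_t *out, unsigned max)
{
	unsigned n = sizeof(candidates) / sizeof(candidates[0]);

	assert(max >= n);
	for (unsigned i = 0; i < n; i++)
		out[i] = with_top_byte(base_cell, candidates[i]);
	return n;
}
```

Starting from 0x12000000, the first candidate gives 0x02000000 and the second 0x0e000000; the 100M and 10M cells stay untouched.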

Can you post the full dts settings?

Hi, the latest state of .dts contained these settings:

&mdio0 {
	status = "okay";

	phy0: ethernet-phy@0 {
		reg = <0>;
		eee-broken-1000t;
	};
};

&eth0 {
	status = "okay";

	pll-data = <0x12000000 0x00000101 0x00001313>;
	phy-mode = "rgmii-txid";
	phy-handle = <&phy0>;
};

I hadn't yet tested with the changes from the latest kernel though; the last commit timestamp is from September or something...

Probably that one bit (the 1 in 0x12) in the speed 1000 value is not needed:

pll-data = <0x02000000 0x00000101 0x00001313>;
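
To see which bit that is, the two 1000M values can be compared bit by bit. A minimal C sketch (the helper name is hypothetical):

```c
#include <stdint.h>

/* Return the index of the single bit in which a and b differ,
 * or -1 if they differ in no bit or in more than one bit. */
static int single_diff_bit(uint32_t a, uint32_t b)
{
	uint32_t x = a ^ b;
	int idx = 0;

	if (x == 0 || (x & (x - 1)) != 0)
		return -1;
	while ((x & 1u) == 0) {
		x >>= 1;
		idx++;
	}
	return idx;
}
```

For 0x12000000 and 0x02000000 this returns 28, i.e. the two values differ only in bit 28 of the register.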