Help with RTL8211F / RK3588 (TX Issues)

Hi, I am currently working through enabling the RK3588 NanoPi R6S on Linux kernel 6.1.

I have noticed that both of the 2.5 Gbit PCIe-attached RTL8125B interfaces work as expected. However, the GMAC / MDIO-connected RTL8211F interface can never obtain an IP address or ping any other client.

After some digging I found that none of the traffic transmitted from the 8211F interface is seen in tcpdump on the other side of the connection. On the 8211F side, however, traffic is seen being received (see below):

listening on eth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
00:03:34.282286 ARP, Request who-has 192.168.1.229 tell 192.168.1.154, length 46
00:03:35.648448 ARP, Reply 192.168.1.234 is-at 1c:53:f9:1a:ba:3a (oui Unknown), length 46
00:03:43.220916 ARP, Request who-has 192.168.1.226 tell 192.168.1.218, length 46
00:03:47.585991 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 3e:b8:b6:91:30:59 (oui Unknown), length 290
00:03:49.368303 IP 192.168.1.226.10101 > 239.255.255.251.10101: UDP, length 45
00:03:49.368304 IP 192.168.1.226.10101 > 224.0.0.250.10101: UDP, length 45
00:03:54.703367 ARP, Request who-has 192.168.1.162 (Broadcast) tell 192.168.1.162, length 46
00:03:54.703368 ARP, Reply 192.168.1.162 is-at 8c:3b:ad:ba:b4:d6 (oui Unknown), length 46
00:03:55.628557 ARP, Request who-has 192.168.1.180 tell 192.168.1.218, length 46
00:03:56.631452 ARP, Request who-has 192.168.1.180 tell 192.168.1.218, length 46
00:03:57.635842 ARP, Request who-has 192.168.1.180 tell 192.168.1.218, length 46
00:03:57.780110 ARP, Request who-has 192.168.1.180 (Broadcast) tell 192.168.1.180, length 46
00:03:57.780110 ARP, Reply 192.168.1.180 is-at 8c:3b:ad:ba:b4:d6 (oui Unknown), length 46

Netstat shows no dropped packets on the TX side (eth0 in the list), and interrupts are being received:

netstat -i eth0 -e
Kernel Interface table
enP3p49s0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        ether be:8e:ef:94:cf:db  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

enP4p65s0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        ether 1a:bd:25:53:6a:f0  txqueuelen 1000  (Ethernet)
        RX packets 135  bytes 12380 (12.0 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 110  bytes 10470 (10.2 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        ether 3e:b8:b6:91:30:59  txqueuelen 1000  (Ethernet)
        RX packets 173  bytes 19168 (18.7 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 30  bytes 9960 (9.7 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
        device interrupt 57  

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 152  bytes 12560 (12.2 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 152  bytes 12560 (12.2 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

And finally, as far as I can tell, things look OK in ethtool:

sudo ethtool eth0
Settings for eth0:
        Supported ports: [ TP    MII ]
        Supported link modes:   10baseT/Full
                                100baseT/Full
                                1000baseT/Full
        Supported pause frame use: Symmetric Receive-only
        Supports auto-negotiation: Yes
        Supported FEC modes: Not reported
        Advertised link modes:  10baseT/Full
                                100baseT/Full
                                1000baseT/Full
        Advertised pause frame use: Symmetric Receive-only
        Advertised auto-negotiation: Yes
        Advertised FEC modes: Not reported
        Link partner advertised link modes:  10baseT/Half 10baseT/Full
                                             100baseT/Half 100baseT/Full
                                             1000baseT/Full
        Link partner advertised pause frame use: Symmetric Receive-only
        Link partner advertised auto-negotiation: Yes
        Link partner advertised FEC modes: Not reported
        Speed: 1000Mb/s
        Duplex: Full
        Auto-negotiation: on
        master-slave cfg: preferred slave
        master-slave status: slave
        Port: Twisted Pair
        PHYAD: 1
        Transceiver: external
        MDI-X: Unknown
        Supports Wake-on: ug
        Wake-on: d
        Current message level: 0x0000003f (63)
                               drv probe link timer ifdown ifup
        Link detected: yes

Can anyone point me in the right direction to diagnose this further? Or has anyone seen this issue elsewhere?

Thanks in advance

Set up the RGMII tx-delay on the GMAC side (or the RGMII rx-delay on the RTL8211F) to correct the synchronization.

I copied the delay values and RGMII mode from the BSP kernel DTS (note that I have tried both with and without the commented-out rx_delay value):

&gmac1 {
	/* Use rgmii-rxid mode to disable rx delay inside Soc */
	phy-mode = "rgmii-rxid";
	clock_in_out = "output";

	snps,no-vlhash;
	snps,reset-gpio = <&gpio3 RK_PB7 GPIO_ACTIVE_LOW>;
	snps,reset-active-low;
	/* Reset time is 20ms, 100ms for rtl8211f */
	snps,reset-delays-us = <0 20000 100000>;

	pinctrl-names = "default";
	pinctrl-0 = <&gmac1_miim
		     &gmac1_tx_bus2
		     &gmac1_rx_bus2
		     &gmac1_rgmii_clk
		     &gmac1_rgmii_bus>;

	tx_delay = <0x42>;
	/* rx_delay = <0x4f>; */

	phy-handle = <&rgmii_phy1>;
	status = "okay";
};

&mdio1 {
	rgmii_phy1: phy@1 {
		compatible = "ethernet-phy-ieee802.3-c22";
		reg = <0x1>;
	};
};

I don't know what tx_delay = <0x42>; means for the MAC driver. RGMII usually requires a delay of 1.5 ns or 2 ns. So you could try removing all delays from the MAC side and configuring all of them on the PHY side. The Realtek PHY driver, at least, supports configuring both delays. Something like:

&mdio1 {
	rgmii_phy1: phy@1 {
		compatible = "ethernet-phy-ieee802.3-c22";
		reg = <0x1>;
		phy-mode = "rgmii-id";
	};
};
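
One caveat on the snippet above: as far as I know, mainline reads phy-mode from the MAC node (&gmac1) and passes it on to the PHY driver, so the property may have no effect when placed in the PHY node. A minimal sketch of the same idea with the mode set on the MAC side, assuming the dwmac-rk glue applies no MAC-side delay for "rgmii-id" and the Realtek PHY driver then enables both of its internal delays. Only the changed property is shown; the rest of the &gmac1 node stays as posted earlier:

```dts
&gmac1 {
	/*
	 * Assumption: with "rgmii-id" the PHY adds both the TX and the RX
	 * delay, so tx_delay / rx_delay on the MAC side are not used.
	 */
	phy-mode = "rgmii-id";
};

&mdio1 {
	rgmii_phy1: phy@1 {
		compatible = "ethernet-phy-ieee802.3-c22";
		reg = <0x1>;
	};
};
```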

Unfortunately, I have already tried multiple combinations of the PHY mode on both gmac1 and the PHY, as well as a few different GMAC rx / tx delay values, all with the same results.

Wouldn't a sync issue cause dropped packets?

The number of TX packets tracked in the interface statistics lines up with what is expected to be sent.

I also dumped the GMAC configuration registers on the OEM kernel and on 6.1:

OEM

Linux localhost 5.10.110 #7 SMP Sun Mar 19 08:47:02 GMT 2023 aarch64 GNU/Linux
[   25.647045] rk_gmac-dwmac fe1c0000.ethernet: rk3588_set_to_rgmii - RK3588_GRF_GMAC_CON0 : 544
[   25.647117] rk_gmac-dwmac fe1c0000.ethernet: rk3588_set_to_rgmii - RK3588_GRF_CLK_CON1 : 512
[   25.647149] rk_gmac-dwmac fe1c0000.ethernet: rk3588_set_to_rgmii - RK3588_GRF_GMAC_CON7 : 0
[   25.647179] rk_gmac-dwmac fe1c0000.ethernet: rk3588_set_to_rgmii - offset_con reg : 0
[   25.647207] rk_gmac-dwmac fe1c0000.ethernet: rk3588_set_to_rgmii - offset_con val : 804
[   25.647237] rk_gmac-dwmac fe1c0000.ethernet: rk3588_set_gmac_speed - RK3588_GRF_CLK_CON1 : 512
[   25.650571] rk_gmac-dwmac fe1c0000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx
[   25.650643] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 8 (xid=0xa5d92c07)
DHCPOFFER of 192.168.1.157 from 192.168.1.1
DHCPREQUEST for 192.168.1.157 on eth0 to 255.255.255.255 port 67 (xid=0x72cd9a5)
DHCPACK of 192.168.1.157 from 192.168.1.1 (

6.1

Linux localhost 6.1.0 #41 SMP PREEMPT_DYNAMIC Sun Mar 19 08:51:51 GMT 2023 aarch64 GNU/Linux
[   15.528232] rk_gmac-dwmac fe1c0000.ethernet eth0: Register MEM_TYPE_PAGE_POOL RxQ-0
[   15.529783] rk_gmac-dwmac fe1c0000.ethernet eth0: Register MEM_TYPE_PAGE_POOL RxQ-1
[   19.706892] rk_gmac-dwmac fe1c0000.ethernet: rk3588_set_to_rgmii - RK3588_GRF_GMAC_CON0 : 544
[   19.707644] rk_gmac-dwmac fe1c0000.ethernet: rk3588_set_to_rgmii - RK3588_GRF_CLK_CON1 : 512
[   19.708384] rk_gmac-dwmac fe1c0000.ethernet: rk3588_set_to_rgmii - RK3588_GRF_GMAC_CON7 : 0
[   19.709133] rk_gmac-dwmac fe1c0000.ethernet: rk3588_set_to_rgmii - offset_con reg : 0
[   19.709819] rk_gmac-dwmac fe1c0000.ethernet: rk3588_set_to_rgmii - offset_con val : 804
[   19.710519] rk_gmac-dwmac fe1c0000.ethernet: rk3588_set_gmac_speed - RK3588_GRF_CLK_CON1 : 512
[   19.714047] rk_gmac-dwmac fe1c0000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx
DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 8 (xid=0x4a28f749)
[   32.052485] vdd_gpu_s0: disabling
DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 8 (xid=0x4a28f749)
DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 4 (xid=0x4a28f749)
No DHCPOFFERS received.

And they all match.

OK. Does the OEM kernel include the Realtek PHY driver (or a hardcoded PHY init procedure)? Does your kernel include the Realtek PHY driver?

From the looks of it, they just use the mainline one from 5.10, unmodified.

See https://github.com/friendlyarm/kernel-rockchip/blob/nanopi5-v5.10.y_opt/arch/arm64/configs/nanopi6_linux_defconfig#L802

I'm starting to think it's a regression in stmmac, but it's hard to tell where.

It's worth comparing the Realtek PHY driver too.

I ported the OEM PHY driver, with exactly the same results. :confused: I'm going to get the stats and a register dump from ethtool on both 6.1 and OEM 5.10.

@123serge123 I have also noticed the following:

6.1

     tx_pkt_n: 218
     rx_pkt_n: 1014
     normal_irq_n: 944
     rx_normal_irq_n: 936
     napi_poll: 3044
     tx_normal_irq_n: 8
...
     irq_tx_path_in_lpi_mode_n: 49
     irq_tx_path_exit_lpi_mode_n: 49
     irq_rx_path_in_lpi_mode_n: 2
     irq_rx_path_exit_lpi_mode_n: 2
     phy_eee_wakeup_error_n: 1

Stock

     tx_pkt_n: 98
     rx_pkt_n: 143
     normal_irq_n: 127
     rx_normal_irq_n: 124
     napi_poll: 312
     tx_normal_irq_n: 127
.....
     irq_tx_path_in_lpi_mode_n: 50
     irq_tx_path_exit_lpi_mode_n: 49
     irq_rx_path_in_lpi_mode_n: 0
     irq_rx_path_exit_lpi_mode_n: 0
     phy_eee_wakeup_error_n: 0

So there are almost no TX IRQs on 6.1 (8, versus 127 on stock), plus an EEE wakeup error that stock does not show, but I guess that could also be linked to the delay issue. :confused:

Found the fix: there is a mismatch between mainline and downstream U-Boot that causes its clock "fix up", which is based on the kernel's DTB, to set the wrong clocks for (at least) GPLL and CPLL.

With this fixed, the RGMII interface now works as intended.

I have also ported the Rockchip loopback tuning code to 6.1.
