[kernel 5.4.52] netifd / mvneta - PHY_INTERFACE state issue

{"kernel":"5.4.52","hostname":"OpenWrt","system":"ARMv7 Processor rev 1 (v7l)","model":"Turris Omnia","board_name":"cznic,turris-omnia","release":{"distribution":"OpenWrt","version":"SNAPSHOT","revision":"r13958-bcd7a0c095","target":"mvebu/cortexa9","description":"OpenWrt SNAPSHOT r13958-bcd7a0c095"}}


There appears to be some sort of regression bug (netifd?):

 kernel: [   14.688110] ------------[ cut here ]------------
 kernel: [   14.688125] WARNING: CPU: 1 PID: 3552 at drivers/net/ethernet/marvell/mvneta.c:3490 mvneta_start_dev+0x2b0/0x2b4
 kernel: [   14.688347] CPU: 1 PID: 3552 Comm: netifd Not tainted 5.4.52 #0
 kernel: [   14.688349] Hardware name: Marvell Armada 380/385 (Device Tree)
 kernel: [   14.688362] [<c010ee8c>] (unwind_backtrace) from [<c010aff4>] (show_stack+0x10/0x14)
 kernel: [   14.688373] [<c010aff4>] (show_stack) from [<c0739038>] (dump_stack+0x94/0xa8)
 kernel: [   14.688383] [<c0739038>] (dump_stack) from [<c0127648>] (__warn+0xbc/0xd8)
 kernel: [   14.688388] [<c0127648>] (__warn) from [<c01276b4>] (warn_slowpath_fmt+0x50/0x94)
 kernel: [   14.688393] [<c01276b4>] (warn_slowpath_fmt) from [<c053bc40>] (mvneta_start_dev+0x2b0/0x2b4)
 kernel: [   14.688398] [<c053bc40>] (mvneta_start_dev) from [<c053bd90>] (mvneta_open+0x14c/0x288)
 kernel: [   14.688410] [<c053bd90>] (mvneta_open) from [<c05f0d8c>] (__dev_open+0xb0/0x138)
 kernel: [   14.688416] [<c05f0d8c>] (__dev_open) from [<c05f111c>] (__dev_change_flags+0x148/0x1a4)
 kernel: [   14.688421] [<c05f111c>] (__dev_change_flags) from [<c05f1190>] (dev_change_flags+0x18/0x48)
 kernel: [   14.688433] [<c05f1190>] (dev_change_flags) from [<c0619390>] (dev_ifsioc+0x294/0x2f8)
 kernel: [   14.688438] [<c0619390>] (dev_ifsioc) from [<c06197d0>] (dev_ioctl+0x31c/0x5c4)
 kernel: [   14.688445] [<c06197d0>] (dev_ioctl) from [<c05cc388>] (sock_ioctl+0x3f4/0x58c)
 kernel: [   14.688453] [<c05cc388>] (sock_ioctl) from [<c0254d74>] (do_vfs_ioctl+0x9c/0x8cc)
 kernel: [   14.688458] [<c0254d74>] (do_vfs_ioctl) from [<c02555d8>] (ksys_ioctl+0x34/0x60)
 kernel: [   14.688463] [<c02555d8>] (ksys_ioctl) from [<c0101000>] (ret_fast_syscall+0x0/0x54)
 kernel: [   14.688466] Exception stack(0xebd97fa8 to 0xebd97ff0)
 kernel: [   14.688471] 7fa0:                   0003e004 00000000 00000008 00008914 bed5ea88 bed5ea80
 kernel: [   14.688475] 7fc0: 0003e004 00000000 00000001 00000036 b6e806a9 b6eca8f0 b6e80684 0003e790
 kernel: [   14.688478] 7fe0: 0003dd3c bed5ea68 0001552c b6f13890
 kernel: [   14.688480] ---[ end trace bedf9403d1df59de ]---

The kernel source code points to https://github.com/torvalds/linux/blob/v5.4/drivers/net/ethernet/marvell/mvneta.c#L3490

state->interface == PHY_INTERFACE_MODE_SGMII
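
For reference, the symbol+offset in the warning can be mapped back to that source line with the kernel's scripts/faddr2line helper; a rough sketch, assuming a build tree with a debug-info vmlinux:

./scripts/faddr2line vmlinux mvneta_start_dev+0x2b0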


There is another such report: Davidc502- wrt1200ac wrt1900acx wrt3200acm wrt32x builds - #5349 by shm0

Entered master after this revision:

./scripts/getver.sh r13831 
52cdd6185ec3320fe58de992c4a1750794082bfd
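
To see which commits landed after that point for this target, something like the following can be run from an openwrt.git checkout (a sketch; the path filter is just an assumption to narrow the output):

git log --oneline 52cdd6185ec3320fe58de992c4a1750794082bfd..HEAD -- target/linux/mvebu target/linux/generic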

My finger is currently pointing at the 5.4.52 push; the changelog has some mvneta changes. The 5.4.53 bump on the ML does not compile for this target for me, but there was a 5.4.54 bump today, so maybe reconcile the current 5.4 mvebu patch set against that to see whether the issue lies there.
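
For reference, a rough sketch of the usual OpenWrt quilt workflow for reconciling the target patch set against a new kernel bump (run from the buildroot; the build_dir path is indicative only):

make target/linux/prepare V=s QUILT=1
cd build_dir/target-*/linux-mvebu_cortexa9/linux-5.4.*
quilt push -a                  # stops at the first patch that no longer applies
quilt refresh                  # after fixing up the failing patch
cd -
make target/linux/refresh V=s  # write the refreshed patches back to target/linux/mvebu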

There are just two patches related to SGMII, and only this one appears even remotely / potentially applicable, considering the node's log:

mvneta f1034000.ethernet eth2: switched to inband/1000base-x link mode

though I'm not sure why it would cause an issue for netifd.

I decided to just short-circuit the breakage on the 5.4.53 ML patch by removing the two sfp patches that were breaking things. Flashed the image to a mamba and the error is gone; there is more mvneta in the changelog.
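
The two patches are not named above; as a sketch, the sfp-related ones can be located and dropped along these lines, assuming they live under target/linux/mvebu/patches-5.4 (the placeholder stands in for whichever two patches are actually at fault):

grep -l -i sfp target/linux/mvebu/patches-5.4/*.patch
rm target/linux/mvebu/patches-5.4/<the-two-offending-sfp-patches>
make target/linux/refresh V=s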

This is fixed in .53

I did the same, but I have also had this weird bug for a while now.

If I enable RX/TX flow control on any of my ports, my upload speed drops from 37 Mbit/s to 10 Mbit/s; this is independent of the OS, test system, or network card.
For example, I have a separate unmanaged switch behind my WRT-1200AC which always negotiates RX/TX flow control, and any device behind this switch will then also get only 10 Mbit/s uploads.

mv88e6085 f1072004.mdio-mii:00 wan: Link is Up - 1Gbps/Full - flow control rx/tx
mv88e6085 f1072004.mdio-mii:00 lan1: Link is Up - 1Gbps/Full - flow control rx/tx

vs

mv88e6085 f1072004.mdio-mii:00 wan: Link is Up - 1Gbps/Full - flow control rx/tx
mv88e6085 f1072004.mdio-mii:00 lan1: Link is Up - 1Gbps/Full - flow control off

Maybe someone else can reproduce this?

PS: My download speeds stay the same and don't suffer from this weird bug. I think I first noticed this a while ago, when switching from the 4.x to the 5.4 kernel images.

I have flow control enabled and don't have such problems.
Is it even possible to disable flow control via ethtool on the WRT* devices? Last time I tried, it didn't work.
However, when I disable flow control on any device that is connected to the router, flow control gets correctly disabled.
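
For anyone who wants to check or toggle pause frames per port, the standard ethtool knobs look like this (a sketch; whether the mvneta/DSA ports actually honour -A is exactly what is in question here):

ethtool -a lan1                            # show current pause (flow control) settings
ethtool -A lan1 autoneg off rx off tx off  # try to force pause off on that port
ethtool -a lan1                            # verify what stuck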

Can't say I have noticed anything along this line in my environs.

That's how I do it; the only problem is the unmanaged switch, since I can't disable flow control on it, so any device behind it gets reduced speeds even if I disable flow control on the device itself. Kinda sucks, since only devices I can connect directly to the router ports get max uploads if I also disable flow control there.

@anomeome

I also just observed this. I have a TPE-TG44ES switch and my upload speed is capped at 11 Mbps; with a dumb switch it's working as it should.
I don't recall when the slow speed began.

[20311.763328] mv88e6085 f1072004.mdio-mii:00 lan1: Link is Up - 1Gbps/Full - flow control rx/tx

Upload

Server listening on 5201
-----------------------------------------------------------
Accepted connection from 10.160.20.20, port 1655
[  5] local 10.160.20.1 port 5201 connected to 10.160.20.20 port 1656
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  1.11 MBytes  9.33 Mbits/sec
[  5]   1.00-2.00   sec  1.13 MBytes  9.51 Mbits/sec
[  5]   2.00-3.00   sec  1.13 MBytes  9.50 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-3.00   sec  3.49 MBytes  9.76 Mbits/sec                  receiver
iperf3: the client has terminated

Download

-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------
Accepted connection from 10.160.20.20, port 1830
[  5] local 10.160.20.1 port 5201 connected to 10.160.20.20 port 1831
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  48.8 MBytes   410 Mbits/sec    0    218 KBytes
[  5]   1.00-2.00   sec  56.2 MBytes   472 Mbits/sec    0    218 KBytes
[  5]   2.00-3.00   sec  95.0 MBytes   797 Mbits/sec    0    218 KBytes
[  5]   3.00-4.00   sec   112 MBytes   944 Mbits/sec    0    218 KBytes
[  5]   4.00-5.00   sec  93.8 MBytes   786 Mbits/sec    0    218 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-5.00   sec   421 MBytes   707 Mbits/sec    0             sender
iperf3: the client has terminated
-----------------------------------------------------------
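
For reproduction, the tests above boil down to roughly the following (addresses taken from the output; exact options are an assumption):

# on the router (10.160.20.1)
iperf3 -s
# on a client behind the switch (10.160.20.20): upload, then download (reverse mode)
iperf3 -c 10.160.20.1 -t 3
iperf3 -c 10.160.20.1 -t 5 -R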

I'm talking about the LAN ports, not the WAN port.

LE:
Update to .62 seems to fix the issue!

Has this feature reappeared? I am seeing low throughput on both 5.4.x and 5.10.x images; interestingly, only on a mamba target, with no issue noticed on rango.

I only have a 3200acm running 5.10.5, no issues, as you mentioned.
