BPI-R4 10G Ethernet transceiver frequently disconnects

Hello! I hope it is okay to ask here, as this is primarily a hardware question and not really an OpenWrt issue.

I am using a BPI-R4 as my main router, running the latest release candidate at the time of writing (OpenWrt 24.10.0-rc4 r28211-d55754ce0d). I had the same issue on older versions. Originally I powered the BPI-R4 from the weak factory-supplied 24W PSU; I have switched to a 65W USB-C PD PSU for the time being to rule out one possible cause.

My desktop PC ("Killer" E3100G 2.5G NIC) is currently connected to one of the SFP+ ports with a 10Gtek ASF-10G2-T 10Gbit Ethernet transceiver (Marvell AQR113C, according to the manufacturer). It works, in that I can get a stable 2.35 Gbit/s with iperf, but every now and then the link disconnects and reconnects (sometimes it works for hours, sometimes it drops every 10 seconds).

System log of such an event:

Sat Dec 28 16:40:00 2024 daemon.notice netifd: Network device 'eth1' link is down
Sat Dec 28 16:40:00 2024 kern.info kernel: [ 3777.108737] mtk_soc_eth 15100000.ethernet eth1: Link is Down
Sat Dec 28 16:40:00 2024 kern.info kernel: [ 3777.114719] br-lan: port 1(eth1) entered disabled state
Sat Dec 28 16:40:06 2024 kern.info kernel: [ 3782.935451] mtk_soc_eth 15100000.ethernet eth1: Link is Up - 2.5Gbps/Full - flow control rx/tx
Sat Dec 28 16:40:06 2024 kern.info kernel: [ 3782.935477] br-lan: port 1(eth1) entered blocking state
Sat Dec 28 16:40:06 2024 kern.info kernel: [ 3782.949295] br-lan: port 1(eth1) entered forwarding state
Sat Dec 28 16:40:06 2024 daemon.notice netifd: Network device 'eth1' link is up
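To see how long each outage lasts and how often it happens, the kernel timestamps in a log like the one above can be paired up. Below is a minimal Python sketch, assuming the log was saved to a file (for example from logread) and that the kernel lines keep the "[ seconds] ... ethX: Link is Down/Up" shape shown here. The helper name link_outages is made up for illustration and is not part of any OpenWrt tool.

```python
import re

# Matches kernel lines such as:
#   kernel: [ 3777.108737] mtk_soc_eth 15100000.ethernet eth1: Link is Down
# Group 1: kernel uptime in seconds, group 2: interface, group 3: new state.
LINK_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+(?:.*\s)?(\S+): Link is (Down|Up)")


def link_outages(lines, iface="eth1"):
    """Return a list of (down_time, duration) pairs in seconds of kernel uptime.

    Each "Link is Down" line for `iface` is paired with the next
    "Link is Up" line for the same interface.
    """
    outages = []
    down_at = None
    for line in lines:
        m = LINK_RE.search(line)
        if not m or m.group(2) != iface:
            continue  # not a link state line for this interface
        ts = float(m.group(1))
        if m.group(3) == "Down":
            down_at = ts
        elif down_at is not None:
            outages.append((down_at, ts - down_at))
            down_at = None
    return outages
```

Calling link_outages(open("syslog.txt")) on a saved copy of the log (the file name is only an example) would list every flap. For the single event above it reports one outage of roughly 5.8 seconds, which is about what a fresh autonegotiation to 2.5G might take, so the timing alone does not point to a specific culprit.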

The output of ip -s -s link show shows RX CRC errors (which do not increase with every disconnect) and TX transns errors (which increase with every disconnect):

root@nexus:~# ip -s -s link show eth1
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master br-lan state UP mode DEFAULT group default qlen 1000
    link/ether ce:57:c4:7f:88:de brd ff:ff:ff:ff:ff:ff
    RX:  bytes packets errors dropped  missed   mcast
      11890970   66116      0     151       0       0
    RX errors:  length    crc   frame    fifo overrun
                     0      4       0       0       0
    TX:  bytes packets errors dropped carrier collsns
      66402906   76715      0       0       0       0
    TX errors: aborted   fifo  window heartbt transns
                     0      0       0       0     119
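To correlate the two counters over time, they can also be read directly from sysfs instead of parsing the ip output. The following is a rough Python sketch under a few assumptions: Python 3 is installed on the router (it is not part of a default OpenWrt image), the transns column corresponds to the kernel's carrier_changes counter (as far as I know that is what iproute2 prints there), and the function names read_counters, counter_deltas and watch are purely illustrative.

```python
import time
from pathlib import Path


def read_counters(iface="eth1", sysfs_root="/sys/class/net"):
    """Read the carrier transition and RX CRC error counters for one interface."""
    base = Path(sysfs_root) / iface
    return {
        "carrier_changes": int((base / "carrier_changes").read_text()),
        "rx_crc_errors": int((base / "statistics" / "rx_crc_errors").read_text()),
    }


def counter_deltas(before, after):
    """Return how much each counter grew between two snapshots."""
    return {name: after[name] - before[name] for name in before}


def watch(iface="eth1", interval=10.0, rounds=6):
    """Print counter deltas every `interval` seconds, `rounds` times."""
    prev = read_counters(iface)
    for _ in range(rounds):
        time.sleep(interval)
        cur = read_counters(iface)
        print(time.strftime("%H:%M:%S"), counter_deltas(prev, cur))
        prev = cur
```

Running watch("eth1") while an iperf test is active would show whether CRC errors pile up shortly before a carrier change or whether the two are unrelated. Note that carrier_changes counts down and up transitions separately, so one disconnect and reconnect should add 2.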

I am using a 1m CAT7 cable to connect the transceiver to the patch panel. The patch panel is a CAT6a model. There are approximately 10m of CAT7 cable inside the walls, connected on the other end to a CAT6a outlet. From the outlet to my PC I am using another meter of CAT7 cable.

Is there any way to further debug this issue? Is it a transceiver issue? A cable issue? A hardware issue of the BPI-R4? I can't physically move my desktop over to the other room, and I can't move the router to my room without shutting my whole network down for good, so I can't really test with a short direct connection.

The metrics reported by the transceiver seem uneventful:

Can you connect the PC to one of the other (1gbit) ports over the same cabling to see if that is stable?

It's not really like for like but, lacking additional options, it would give you at least some indication about the cabling and the desktop NIC (some of those 2.5gbit things are, let's say, substandard).

Personally, I'd suspect the transceiver. 2.5gbit is a weird mode for many of them. If the above is stable, try running the 10gbit transceiver at 1gbit for a while.

The PC connected to a 1 Gbit port on the BPI works fine. I guess I could patch some other device in the house that only has a 1 Gbit NIC to the transceiver and see what happens. Thank you!

In this case, my suspicion would be the transceiver, but it's hard to be completely sure...

Any mention of eth0 in that log as well?

I had the issue that the whole eth0 was gone and the driver / kernel module had to be reloaded.