EdgeRouter 4: port numbers are a real mess in the log

Port 1 receive error code 7, packet dropped

I am getting a lot of these errors in the system log. The code changes (it can be 7, 10 or 11), but the messages are flooding the log.
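To get a feel for how bad the flooding is, the messages can be tallied per port and error code. This is only a minimal sketch: it assumes the message format shown above, and the sample lines are made up for illustration, not copied from a real log.

```python
import re
from collections import Counter

# Message format as it appears in the system log line quoted above.
MSG_RE = re.compile(r"Port (\d+) receive error code (\d+), packet dropped")

def tally(lines):
    """Count dropped-packet messages per (port, error code)."""
    counts = Counter()
    for line in lines:
        m = MSG_RE.search(line)
        if m:
            counts[(int(m.group(1)), int(m.group(2)))] += 1
    return counts

# Illustrative sample only, not real log output.
sample = [
    "Port 1 receive error code 7, packet dropped",
    "Port 1 receive error code 10, packet dropped",
    "Port 1 receive error code 7, packet dropped",
    "some unrelated log line",
]

for (port, code), n in sorted(tally(sample).items()):
    print(f"port {port}, code {code}: {n}")
```

On a machine with Python, the same function could be fed a saved copy of the system log, one line at a time, instead of the sample list.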

It happens in both 21.02-rc1 and today’s 21.02-snapshot. I had hoped I would get lucky and it would be gone after I installed today's snapshot.

It only happens on the EdgeRouter 4; I have never seen the error on the WRT3200ACM.

I found a similar forum post https://forum.openwrt.org/t/port-0-receive-error-code-10-packet-dropped/67049 where the cause turned out to be a broken Ethernet cable. I tried removing the cable attached to port 1 (eth1), but that cable only runs to the patch panel and has no actual device connected, and removing it made no difference to the errors.

What makes the broken-hardware theory more worrying is that port 2 (eth2) is an identical twin of port 1 in the network setup, yet port 2 has never had this error. So could the router hardware be broken?

So I moved one of my computers from my external switch to ER4 port 1 (eth1) to see if it works. It does work, and the errors in the log got worse when I ran a speed test.

But when I looked at what was written to the log just above the port errors, at the moment I moved the computer's Ethernet cable, I found something disturbing!?

Sat Jul 10 11:42:37 2021 kern.info kernel: [ 3594.470962] switch0: port 4(lan3) entered disabled state
Sat Jul 10 11:42:37 2021 daemon.notice netifd: Network device 'lan3' link is down
Sat Jul 10 11:42:58 2021 kern.notice kernel: [ 3615.941460] lan1: 1000 Mbps Full duplex, port 2, queue 2
Sat Jul 10 11:42:58 2021 kern.info kernel: [ 3615.946870] switch0: port 2(lan1) entered blocking state
Sat Jul 10 11:42:58 2021 kern.info kernel: [ 3615.952201] switch0: port 2(lan1) entered forwarding state
Sat Jul 10 11:42:58 2021 daemon.notice netifd: Network device 'lan1' link is up

Aren't the port numbers in sync with the OpenWRT LAN numbers and the eth numbers painted on the EdgeRouter 4 hardware?

Because this would imply that the Port 1 error is on the internet cable on LAN0/eth0?
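The offset is easy to see if you pull the "port N(label)" pairs out of the bridge messages. A small sketch, using three of the kernel lines from the excerpt above; the function name is just something I made up for illustration:

```python
import re

# Kernel lines copied from the log excerpt in this post.
LOG = """\
Sat Jul 10 11:42:37 2021 kern.info kernel: [ 3594.470962] switch0: port 4(lan3) entered disabled state
Sat Jul 10 11:42:58 2021 kern.info kernel: [ 3615.946870] switch0: port 2(lan1) entered blocking state
Sat Jul 10 11:42:58 2021 kern.info kernel: [ 3615.952201] switch0: port 2(lan1) entered forwarding state
"""

# Matches the "port N(label)" part of the switch0 state messages.
PORT_RE = re.compile(r"port (\d+)\((\w+)\)")

def port_labels(log_text):
    """Return {port number: interface label} for every 'port N(label)' found."""
    return {int(num): label for num, label in PORT_RE.findall(log_text)}

mapping = port_labels(LOG)
for num in sorted(mapping):
    print(f"port {num} -> {mapping[num]}")
```

For these lines it reports port 2 as lan1 and port 4 as lan3, so the logged port number is one higher than the lanX number.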

Now that I know Port 1 is actually LAN0/eth0, it was easier to find the cause of the dropped packets. They turned out to be going missing in the gigabit HV (overvoltage) protection of the APC UPS.

I filed a bug report: https://bugs.openwrt.org/index.php?do=details&task_id=3923

The issue is that the network setup labels the ports as lanX rather than ethX.

base-files/etc/board.d/01_network:      ucidef_set_interfaces_lan_wan "eth1 eth2" "eth0"
base-files/etc/board.d/01_network:      ucidef_set_interfaces_lan_wan "lan1 lan2 lan3" "lan0"
base-files/etc/board.d/01_network:      ucidef_set_interfaces_lan_wan "lan1 lan2 lan3 lan4 lan5" "lan0"
base-files/etc/board.d/01_network:      ucidef_set_interfaces_lan_wan "eth0" "eth1"
files/arch/mips/boot/dts/cavium-octeon/cn7130_ubnt_edgerouter-6p.dts:                   label = "lan5";
files/arch/mips/boot/dts/cavium-octeon/cn7130_ubnt_edgerouter-6p.dts:                   label = "lan3";
files/arch/mips/boot/dts/cavium-octeon/cn7130_ubnt_edgerouter-6p.dts:                   label = "lan4";
files/arch/mips/boot/dts/cavium-octeon/cn7130_ubnt_edgerouter-4.dts:                    label = "lan3";
files/arch/mips/boot/dts/cavium-octeon/cn7130_ubnt_edgerouter-e300.dtsi:                        label = "lan0";
files/arch/mips/boot/dts/cavium-octeon/cn7130_ubnt_edgerouter-e300.dtsi:                        label = "lan1";
files/arch/mips/boot/dts/cavium-octeon/cn7130_ubnt_edgerouter-e300.dtsi:                        label = "lan2";
patches-5.10/700-allocate_interface_by_label.patch:label = "lan0";
patches-5.4/700-allocate_interface_by_label.patch:label = "lan0";

[Image: excerpt from target/linux/octeon/base-files/etc/board.d/01_network]

I had heard people talk about the lan labels causing issues, but I'm not sure if it's going to be changed or not.

https://edgewaternetworks.secure.force.com/kb/articles/KnowledgeBase_Solution/10861 has a list of the error codes.

They list code 7 as: 7 = GMX FCS error: the RGMII packet had an FCS error.

The LANx labels were programmed in right from the beginning, from PHYx, when the device was introduced to OpenWRT.

I have no real idea why, because the white paint on the black tin can says ethX. Why couldn’t we just call the ports what they are actually named?

I guess we have to be happy that there isn’t a port hardwired as WAN in OpenWRT, because that would be a complete disaster :smiley:

But the errors seem to be gone if I don't run the data through the overvoltage protection in the UPS. The strange thing is that the WRT3200ACM doesn’t see this problem?

I suppose you can call it whatever you want :slight_smile: When I brought up the Itus devices, I went with an eth0/eth1/eth2 format and matched it to the labels on the device (eth0 as WAN, eth1/2 as LAN). Confusing the issue is that "eth0" will default to "xxx1" because "lan0" looks silly, I guess.

However, I don't have a switch to worry about, either. I could set any of the ports to be WAN and get away with it in software.

I'm working on the ER-10x and I'm going to run into similar issues, I think. Not only dealing with 2 separate switches and how they are labelled, but that one of the switches isn't even DSA.

In DSA I think unlabeled ports are so much better than a port called WAN or LAN, because it gives more freedom when building a bigger system.
Even on the WRT3200ACM, where the ports are called WAN and LAN1-4, that is really only paint on the plastic.
Inside the hardware every port is only defined in the switch setup. But in OpenWRT the ports are labeled WAN and LAN1-4, which is out of sync with the actual switch port numbers.
It feels more like there has always been a WAN port and LANx ports, so it always has to be that way.
Think about how many would file a complaint if home routers came from the factory without this painted on? :joy:

But the EdgeRouters are not primarily made for home users, so even from the factory the ER4 comes without a setup or port labels. What you want to use the ports for is up to you.

But the biggest problem in this post is that if the log says I have a problem on Port 3, then it actually means Port 2.
Where has Port 0 gone in the system that handles the system log data?
So now if you get a PortX error, the fault-finding key looks like this:
Port0= ?
Port1= LAN0/Eth0.
Port2= LAN1/Eth1.
Port3= LAN2/Eth2.
Port4= LAN3/Eth3.

What makes everything more confusing is that from rc2 on we also have ports without numbers in the DSA setup, connected directly to the LANx interfaces as actual ports.

Programmers have a saying: There are 10 kinds of people in the world, the ones that know binary and the ones that don’t.

You are probably right, 99% of the people in the world don’t know that 0 is a number with an important value in binary.
This makes a mess for the ones that actually read 0 as a significant value.

But Windows also had this problem up to version 7.
If you named files 1,2,3,4,5,6,7,8,9,10 and told Windows to sort them, you got:
10
1
2
3

9

But if you named them 01,02,03,04,05,06,07,08,09,10
You got
01
02
03

10

Now that the Port1 fault is gone, it seems that we actually have a Port0 with error 10, packets dropped. It shows up as early as the kernel log, just after the port initialization (which doesn't mention any Port0!?).
But the EdgeRouter 4 doesn’t have an Eth-1, so as far as I can guess that means the CPU port?

10 = length mismatch error: the RGMII packet had a length that did not match the length field in the L2 HDR.
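For reading the log a bit faster, the two codes quoted in this thread can be put into a small lookup. Just a sketch: it only knows codes 7 and 10 because those are the only descriptions quoted here, and the function name is my own invention.

```python
import re

# Only the two descriptions quoted in this thread; any other code
# (such as 11) is deliberately left undescribed rather than guessed.
ERROR_CODES = {
    7: "GMX FCS error: the RGMII packet had an FCS error",
    10: ("length mismatch error: the RGMII packet had a length that "
         "did not match the length field in the L2 HDR"),
}

MSG_RE = re.compile(r"Port (\d+) receive error code (\d+), packet dropped")

def explain(line):
    """Return (port, code, meaning) for a receive-error line, else None."""
    m = MSG_RE.search(line)
    if m is None:
        return None
    port, code = int(m.group(1)), int(m.group(2))
    meaning = ERROR_CODES.get(code, "not described in this thread")
    return port, code, meaning

print(explain("Port 1 receive error code 7, packet dropped"))
print(explain("Port 0 receive error code 10, packet dropped"))
```

Keep in mind that the port number it returns is the logged one, not the lanX/ethX number.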

Is this fault really anything to put energy into? It doesn’t flood the log anymore. Everything works as expected.
It seems more like the EdgeRouter 4 simply has better functions for detecting the fault than the WRT3200ACM, because both were used with the same UPS that caused the first fault. But the WRT3200ACM doesn’t see anything wrong according to its log?
