Severe WAN performance degradation when Fast Ethernet devices are connected to a Gigabit switch and transfer data over LAN (probably present on most OpenWrt versions)

I also assume this means a device with a built-in gigabit ethernet switch?

You also didn't mention that you have conntrack drop errors, and that the LAN 1 port LED changes color to indicate a 100 Mbps connection when this occurs on a downstream switch...

Have you ruled out the gigabit switch and cabling?

Also what was causing that error?

Yes, I mean a device with a built-in gigabit Ethernet switch.
It makes no difference, though, whether an additional external gigabit switch is cabled to the built-in switch.
Regarding the cabling, see this post: Ipq806x NSS build (Netgear R7800 / TP-Link C2600 / Linksys EA8500) - #1878 by sppmaster
With the ISP's cheap Gigabit router, however, I have no problems at all with the same cabling, network setup and devices.
[Screenshot: speedtest result on the desktop PC with the ISP router]
This is the speedtest result on the desktop PC (used as a file server) with the ISP router, taken while three 4K movies with a total video bitrate of over 300 Mbps play simultaneously on the three devices connected to the Gigabit switch. The desktop PC runs a Plex server and streams the three 4K movies to three 100 Mbps devices (a smart TV and two Android TV boxes), while the WAN speed test runs on the same PC at the same time.
All other gigabit routers I've tested (Netgear R7800, TP-Link WDR4300, Belkin RT3200) fail this test, regardless of whether they run OpenWrt or stock firmware.

Wait...

I edited my post earlier... I thought my statement would be factitious... but from your statements (I've now read the wording that made me think so three times), it seems feasible... so I'll put it as a question instead...

Yes/no: Are you under the impression that your download speed in the pic is ~1230.04 Mbps (i.e. 930.04 + 300 == 1230.04)?

Also, I assume the 4K movies are streaming from the same WAN as the speedtest...or are they on the fileserver...or an IPTV network/VLAN?

When a switch has only a 100 Mb link out to a device, but incoming packets for it from a 1000 port are received at a higher data rate than that, there is a problem.

What it should do is drop those packets that can't be down-converted and dispatched immediately because the outgoing port is still busy with the previous packet. TCP logic should eventually cause the source to slow down, and the rest of the network is unaffected.

What it actually does is flow-control the device on the 1000 port to stop sending. But there's no provision to say "Don't send any more packets for MAC address X" -- only "Don't send any more packets at all." So all other links from the 1000 device are also interrupted.
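
A toy, slot-based model makes the difference between the two behaviours concrete. One 1 Gbps sender (the PC) offers 200 Mbps worth of frames to a 100 Mbps port and 800 Mbps to a 1 Gbps port standing in for the WAN uplink. All rates, queue sizes and pause watermarks are arbitrary assumptions rather than values from any real switch, and TCP's reaction to drops is not modelled; the sketch only shows the mechanism (a link-wide PAUSE stalls the sender's traffic for every destination), not the magnitude measured in this thread.

```python
from dataclasses import dataclass

LINK_1G_FRAMES_PER_SLOT = 10   # assumption: a 1 Gbps link moves 10 frames per time slot
LINK_100M_FRAMES_PER_SLOT = 1  # assumption: a 100 Mbps link moves 1 frame per time slot


@dataclass
class Result:
    fast_delivered: int  # frames forwarded toward the 1 Gbps port (e.g. the WAN uplink)
    slow_delivered: int  # frames transmitted on the 100 Mbps port
    dropped: int         # frames discarded at the 100 Mbps egress queue


def simulate(mode: str, slots: int = 1000, slow_demand: int = 2, fast_demand: int = 8,
             queue_limit: int = 20, pause_on: int = 16, pause_off: int = 4) -> Result:
    """Toy model of one 1 Gbps sender feeding a 100 Mbps port and a 1 Gbps port.

    mode="drop":  the switch tail-drops frames the 100 Mbps queue cannot hold.
    mode="pause": when the 100 Mbps queue reaches pause_on, the switch pauses the
                  sender (802.3x style), halting ALL of its traffic until the
                  queue drains to pause_off. The buffer is assumed large enough.
    """
    if mode not in ("drop", "pause"):
        raise ValueError("mode must be 'drop' or 'pause'")
    if slow_demand + fast_demand > LINK_1G_FRAMES_PER_SLOT:
        raise ValueError("offered load exceeds the sender's 1 Gbps link")
    queue = fast = slow = dropped = 0
    paused = False
    for _ in range(slots):
        if not paused:
            # Fast-port traffic has spare egress capacity, so it is forwarded at once.
            fast += fast_demand
            for _ in range(slow_demand):
                if mode == "drop" and queue >= queue_limit:
                    dropped += 1
                else:
                    queue += 1
        # The 100 Mbps egress port drains at its line rate.
        sent = min(queue, LINK_100M_FRAMES_PER_SLOT)
        queue -= sent
        slow += sent
        if mode == "pause":
            if queue >= pause_on:
                paused = True
            elif paused and queue <= pause_off:
                paused = False
    return Result(fast, slow, dropped)
```

With these defaults, tail drop forwards all 8000 fast-port frames over 1000 slots, while the pause variant forwards only 4064 because the sender spends about half its time paused; the 100 Mbps port delivers 1000 frames either way.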


Sounds like a reasonable approach to take, when explained like this...


No.
As I wrote, the movies are streamed from the PC file server, i.e. over the LAN.

I don't see any logical reason why the Gigabit device, while sending data over the LAN to a 100 Mbps LAN device, should be unable to download from the WAN at full-duplex Gigabit speed. Instead it can only download/upload at 20-30 Mbps (with huge ping times), which is ridiculously slow and doesn't even come close to Fast Ethernet speeds.
It has no trouble doing so while sending data to another Gigabit LAN device. Put simply, the desktop PC can download from the WAN at the full 1 Gbps while simultaneously sending data to another Gigabit LAN device at 1 Gbps. That is full-duplex Gigabit operation, exactly as expected.

This issue makes Gigabit Ethernet look like a joke when a 100 Mbps device is present on the LAN, because with a 100 Mbps switch instead, the PC would still download from the WAN at 100 Mbps.
See another post with tests here - Ipq806x NSS build (Netgear R7800 / TP-Link C2600 / Linksys EA8500) - #1884 by sppmaster
It gets really tricky when an additional gigabit switch (not the built-in one) is connected to the router. The link speed shows 1000 Mbps, but some clients have only 100 Mbps NICs, and when there is LAN traffic between the clients it becomes a nightmare: none of the clients connected to the external switch can download from the Internet. Most of the time the speed is just 5-6 Mbps.
As soon as the LAN traffic stops, all three 100 Mbps devices can download from the WAN at 100 Mbps simultaneously, as expected.

As I don't have a large number of routers to test, I suggest that anyone who has the will and the means run these tests with a 100 Mbps device and confirm or refute this behaviour.
I've described the steps to reproduce the issue in this post - Netgear R7800 exploration (IPQ8065, QCA9984) - #3187 by sppmaster
Several other users confirmed the observations, as can be seen in the discussion before and after that post.
See also @quarky's post - Netgear R7800 exploration (IPQ8065, QCA9984) - #3190 by quarky

I would repeat those tests, but measure the speed between the 1 Gbps device and the router (not the Internet). If @mk24 is right, you will see a slow transfer speed, which would prove the issue is in the switch and not related to OpenWrt.
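
The suggested PC-to-router measurement can be scripted with iperf3's JSON output (`-J`) so that runs with and without LAN streaming are easy to compare. This is a minimal sketch assuming iperf3 is installed on both ends and the router is already running `iperf3 -s`; the helper names are hypothetical, and only the TCP summary fields of the JSON report are read.

```python
import json
import subprocess
from typing import Dict, List


def iperf3_command(server: str, seconds: int = 10, reverse: bool = False) -> List[str]:
    """TCP iperf3 client command with JSON output (-J)."""
    cmd = ["iperf3", "-c", server, "-t", str(seconds), "-J"]
    if reverse:
        cmd.append("-R")  # reverse mode: the server sends, the client receives
    return cmd


def parse_iperf3_json(text: str) -> Dict[str, float]:
    """Extract TCP throughput in Mbit/s from the 'end' summary of `iperf3 -J` output."""
    end = json.loads(text)["end"]
    return {
        "sent_mbps": end["sum_sent"]["bits_per_second"] / 1e6,
        "received_mbps": end["sum_received"]["bits_per_second"] / 1e6,
    }


def measure(server: str, seconds: int = 10, reverse: bool = False) -> Dict[str, float]:
    """Run one test; the server side must already be running `iperf3 -s`."""
    result = subprocess.run(iperf3_command(server, seconds, reverse),
                            capture_output=True, text=True, check=True)
    return parse_iperf3_json(result.stdout)
```

Calling `measure("192.168.1.1")` (the router address used later in the thread) once with the LAN streams idle and once while they run would isolate the PC-to-router hop; `reverse=True` tests the other direction.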


@eduperez Thanks for the suggestion. @Ansuel @quarky may be interested too.
Let's rock.

  1. Case 1: only a single 4K TV channel plays on a TV (100Mbps - 192.168.1.116). This is a download from the WAN.

  2. Case 2: I've added a 4K movie streamed from the PC (1Gbps - 192.168.1.2) to an AndroidTV box (100Mbps - 192.168.1.188). This is a LAN transfer from the PC to the AndroidTV box.

  3. Case 3: two 4K movies play on two AndroidTV boxes, so the PC sends two movie streams, i.e. two LAN transfers from the PC to two 100Mbps devices. The 4K TV channel from case 1 is stopped.

Tests were performed on a Belkin RT3200 running the latest OpenWrt snapshot. Software and hardware offloading are turned on, and CPU load is near zero during the tests.
This probably confirms a switch issue when mixed 100 Mbps and 1 Gbps devices are connected. In the second screenshot of each case, the top-left and bottom-left windows show the iperf3 test between the PC (192.168.1.2) and the router (192.168.1.1). They speak for themselves.
I still think this is a software bug, because I see the same result on different routers with different built-in switches.
All speedtests were performed on the PC (1Gbps - 192.168.1.2).
For reference, I've included the ping times to the AndroidTV boxes (IPs 116 and 188), to the router, and to the WAN (pinging 1.1.1.1).
A Task Manager screenshot shows the current LAN transfer bitrate from the PC to the 100 Mbps devices.


This is what pause frames are for. From the problem description, I guess it's not enabled here. Check status with

```
ethtool --show-pause lanxx
```

and enable it where needed.
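
For the "enable it where needed" step, the ethtool subcommand is `--pause` (short form `-A`), taking `autoneg`, `rx` and `tx` each set to `on` or `off`. The sketch below only assembles, and optionally runs, those commands for a list of ports; the helper names are hypothetical, whether a given switch driver accepts every combination is not guaranteed, and settings applied this way are assumed not to survive a reboot unless something re-applies them.

```python
import subprocess
from typing import Iterable, List


def _onoff(flag: bool) -> str:
    return "on" if flag else "off"


def pause_command(iface: str, rx: bool, tx: bool, autoneg: bool = True) -> List[str]:
    """Build an `ethtool --pause` (same as `ethtool -A`) command line for one port."""
    return ["ethtool", "--pause", iface,
            "autoneg", _onoff(autoneg), "rx", _onoff(rx), "tx", _onoff(tx)]


def set_pause(ifaces: Iterable[str], rx: bool, tx: bool, dry_run: bool = True) -> List[List[str]]:
    """Enable/disable pause frames on several ports; with dry_run=True only return the commands."""
    cmds = [pause_command(i, rx, tx) for i in ifaces]
    if not dry_run:
        for cmd in cmds:
            # check=True raises CalledProcessError if ethtool/the driver rejects the setting
            subprocess.run(cmd, check=True)
    return cmds
```

For example, `set_pause(["lan1", "lan2", "lan3", "lan4"], rx=True, tx=True)` returns the four commands for review; passing `dry_run=False` on the router would execute them.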



What device?

```
root@RT3200:~# ethtool --show-pause lan1
Pause parameters for lan1:
Autonegotiate:  on
RX:             off
TX:             off
RX negotiated:  on
TX negotiated:  on
```

Switch and sending device primarily

```
root@RT3200:~# ethtool --show-pause br-lan
Pause parameters for br-lan:
Cannot get device pause settings: Not supported
root@RT3200:~# root@RT3200:~# ethtool --show-pause eth0
-ash: root@RT3200:~#: not found
```

I meant the switch port like you showed first. Sorry for the confusion. Bridge devices cannot support this. But the sending device has to, since that's the one which needs to pause when the 100M link is full.

```
root@RT3200:~# ethtool --show-pause lan1
Pause parameters for lan1:
Autonegotiate:  on
RX:             off
TX:             off
RX negotiated:  on
TX negotiated:  on

root@RT3200:~# ethtool --show-pause lan2
Pause parameters for lan2:
Autonegotiate:  on
RX:             off
TX:             off

root@RT3200:~# ethtool --show-pause lan3
Pause parameters for lan3:
Autonegotiate:  on
RX:             off
TX:             off

root@RT3200:~# ethtool --show-pause lan4
Pause parameters for lan4:
Autonegotiate:  on
RX:             off
TX:             off
RX negotiated:  on
TX negotiated:  on
```
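
Since the same `--show-pause` output is being compared across several ports, a small parser makes the comparison mechanical. This is a sketch that assumes the `Field: value` layout shown above; the function names are hypothetical. Note that lan2 and lan3 print no "negotiated" lines, so the summary simply omits them when absent.

```python
from typing import Dict, Tuple


def parse_show_pause(text: str) -> Tuple[str, Dict[str, str]]:
    """Parse `ethtool --show-pause IFACE` output into (iface, {field: value})."""
    iface, fields = "", {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Pause parameters for "):
            iface = line[len("Pause parameters for "):].rstrip(":")
        elif ":" in line:
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    return iface, fields


def pause_summary(text: str) -> str:
    """One-line summary, e.g. 'lan1: rx=off tx=off (negotiated rx=on tx=on)'."""
    iface, f = parse_show_pause(text)
    summary = f"{iface}: rx={f.get('RX', '?')} tx={f.get('TX', '?')}"
    if "RX negotiated" in f:
        summary += f" (negotiated rx={f['RX negotiated']} tx={f.get('TX negotiated', '?')})"
    return summary
```

Feeding it the lan1 block above yields `lan1: rx=off tx=off (negotiated rx=on tx=on)`.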

I'm not a fan of pause frames at all; try disabling them entirely.

Secondly, the switch needs per-port buffering to keep packets flowing at higher rates. If it uses a global buffer shared by all ports, the 100 Mbit backlog will hurt.
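
The shared-versus-per-port point can be shown with another toy model: a 100 Mbps port is offered twice its line rate, while a 1 Gbps port receives short bursts. With a global pool, the slow port's backlog eventually occupies nearly every buffer and the fast port's bursts are dropped; with a per-port split, the bursts always fit. Buffer sizes, burst sizes and rates are arbitrary assumptions, not taken from the switches discussed here, and real switch chips use more elaborate buffer management.

```python
from typing import Tuple


def simulate_buffers(shared: bool, slots: int = 200, total_buffer: int = 32,
                     slow_demand: int = 2, burst: int = 15,
                     burst_every: int = 5) -> Tuple[int, int]:
    """Toy switch with `total_buffer` frame buffers and two egress ports.

    slow port: 100 Mbps, drains 1 frame/slot, offered `slow_demand` frames/slot.
    fast port: 1 Gbps, drains 10 frames/slot, receives `burst` frames every
               `burst_every` slots.
    shared=True  -> both ports draw from one global pool.
    shared=False -> each port owns half of the buffers.
    Returns (slow_drops, fast_drops).
    """
    half = total_buffer // 2
    slow_q = fast_q = slow_drops = fast_drops = 0

    def has_room(port_q: int) -> bool:
        if shared:
            return slow_q + fast_q < total_buffer
        return port_q < half

    for t in range(slots):
        for _ in range(slow_demand):
            if has_room(slow_q):
                slow_q += 1
            else:
                slow_drops += 1
        if t % burst_every == 0:
            for _ in range(burst):
                if has_room(fast_q):
                    fast_q += 1
                else:
                    fast_drops += 1
        slow_q -= min(slow_q, 1)   # 100 Mbps egress
        fast_q -= min(fast_q, 10)  # 1 Gbps egress
    return slow_drops, fast_drops
```

In both modes the 100 Mbps port drops its excess, but only the shared pool also drops the 1 Gbps port's bursts, which is the "backlog will hurt" effect.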


Sure, but is that typically an option on the switches found in Wi-Fi home routers? I'm asking because I don't know...

If it isn't, then what's the second best alternative? Disconnect all the 100M devices? Restrict all the other devices to 100M? Dedicate another cheap switch to 100M devices and connect it to a second NIC on the PC, at least ensuring per-port buffers there?

And the PC? That's the device which has to halt if this is to do any good.


I've disabled Flow Control on the PC's Gigabit NIC (Realtek).
Now, when I run one or two data transfers from the PC to 100 Mbps devices (cases 2 and 3), I can download from the WAN at full speed on the PC. That wasn't possible with Flow Control enabled.
The downside is that in case 3 the LAN traffic is now interrupted (ping gives an error) and the movies just stop playing.
In case 2 it still manages to transfer one movie stream, but the goal is three simultaneous LAN streams without interruptions.
I think we are getting closer to the goal and narrowing down the list of possible causes.
From this video https://www.youtube.com/watch?v=ULSJxhfD244
I understand that I now need to check whether 802.1Qbb (Priority-based Flow Control) is supported and enabled.
Looking through the R7800 switch datasheet [https://github.com/Deoptim/atheros/blob/master/QCA8337-datasheet.pdf],
I see a lot of information about 802.1Q.
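
The earlier point that a pause cannot say "stop only traffic for MAC address X" is visible in the frame formats. A plain IEEE 802.3x PAUSE carries one pause time for the whole link, while an 802.1Qbb PFC frame carries an enable bit and a pause time for each of the eight 802.1p priorities. PFC still cannot target a MAC address, and it would only help here if the LAN streams and the WAN traffic were tagged with different priorities, which is an assumption about the network. General 802.1Q (VLAN/priority tagging) support does not by itself imply 802.1Qbb; whether the QCA8337 or the Realtek NIC implements PFC is not established here. The sketch only builds frame bytes (nothing is sent), and the source MAC is a made-up, locally administered address.

```python
import struct

MAC_CONTROL_DST = bytes.fromhex("0180c2000001")  # reserved multicast address for MAC Control frames
MAC_CONTROL_ETHERTYPE = 0x8808
OPCODE_PAUSE = 0x0001   # IEEE 802.3x PAUSE
OPCODE_PFC = 0x0101     # IEEE 802.1Qbb Priority-based Flow Control
MIN_FRAME_NO_FCS = 60   # minimum Ethernet frame size without the 4-byte FCS


def _pad(frame: bytes) -> bytes:
    return frame + bytes(max(0, MIN_FRAME_NO_FCS - len(frame)))


def pause_frame(src_mac: bytes, quanta: int) -> bytes:
    """802.3x PAUSE: a single pause time that stops ALL traffic from the link partner."""
    header = MAC_CONTROL_DST + src_mac + struct.pack("!HH", MAC_CONTROL_ETHERTYPE, OPCODE_PAUSE)
    return _pad(header + struct.pack("!H", quanta))


def pfc_frame(src_mac: bytes, quanta_per_priority: dict) -> bytes:
    """802.1Qbb PFC: class-enable vector plus one pause time per priority (0-7)."""
    enable_vector = 0
    times = [0] * 8
    for prio, quanta in quanta_per_priority.items():
        enable_vector |= 1 << prio
        times[prio] = quanta
    header = MAC_CONTROL_DST + src_mac + struct.pack("!HH", MAC_CONTROL_ETHERTYPE, OPCODE_PFC)
    return _pad(header + struct.pack("!H8H", enable_vector, *times))


def quanta_to_seconds(quanta: int, link_bps: float) -> float:
    """One pause quantum is 512 bit times at the link's speed."""
    return quanta * 512 / link_bps
```

For example, the maximum pause time of 0xFFFF quanta is about 33.6 ms on a 1 Gbps link, during which a plain PAUSE silences everything the PC sends, WAN traffic included.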

The simplest solution is to connect another gigabit switch to the gigabit switch and hang the 100 Mbit devices off the secondary switch. Packets will then be buffered in that switch.


If you can, configure the movie server to limit its bandwidth to under 100 Mbps, so that its packets don't cause a traffic jam in the switch.
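
Such rate limits are commonly built on a token bucket: tokens accrue at the target rate, each packet spends tokens equal to its size, and a sender that runs out waits instead of bursting at 1 Gbps into a 100 Mbps port. This generic sketch does not describe how any particular media server exposes a limit; the class name and the 80 Mbps / 15000-byte values are hypothetical examples.

```python
import time
from typing import Optional


class TokenBucket:
    """Token-bucket pacer: average `rate_bps` bits/s, bursts up to `burst_bytes`."""

    def __init__(self, rate_bps: float, burst_bytes: int, now: Optional[float] = None):
        self.rate = rate_bps / 8.0          # refill rate in bytes per second
        self.capacity = float(burst_bytes)
        self.tokens = float(burst_bytes)    # start with a full bucket
        self.last = time.monotonic() if now is None else now

    def try_send(self, nbytes: int, now: Optional[float] = None) -> bool:
        """Refill for elapsed time; return True and spend tokens if `nbytes` may go now."""
        now = time.monotonic() if now is None else now
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= nbytes:
            self.tokens -= nbytes
            return True
        return False

    def delay_for(self, nbytes: int) -> float:
        """Seconds until `nbytes` of tokens are available (call right after a failed try_send)."""
        return max(0.0, (nbytes - self.tokens) / self.rate)
```

A paced sender would call `try_send(len(chunk))` before each write and `time.sleep(bucket.delay_for(len(chunk)))` whenever it returns False. A small burst size keeps the sender from ever arriving at the switch much faster than the 100 Mbps port can drain.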