Severe WAN performance degradation when Fast Ethernet devices are connected to a Gigabit switch and transfer data over LAN (probably present on most OpenWrt versions)

Wait...

I edited my post before the time...and I thought my statement was gonna be facetious...but from your statements (3rd time I've read the wording that made me think), it seems feasible...so I'll put it in question form instead...

Yes/no: Are you under the impression that your download speed in the pic is ~1230.04 Mbps (i.e. 930.04 + 300 == 1230.04)?

Also, I assume the 4K movies are streaming from the same WAN as the speedtest...or are they on the fileserver...or an IPTV network/VLAN?

When a switch has only a 100 Mb link out to a device, but incoming packets for it from a 1000 port are received at a higher data rate than that, there is a problem.

What it should do is drop those packets that can't be down-converted and dispatched immediately because the outgoing port is still busy with the previous packet. TCP logic should eventually cause the source to slow down, and the rest of the network is unaffected.

What it actually does is flow-control the device on the 1000 port to stop sending. But there's no provision to say "Don't send any more packets for MAC address X" -- only "Don't send any more packets at all." So all other links from the 1000 device are also interrupted.

Sounds like a reasonable approach to take, when explained like this...

No.
As written, the movies are streamed from the PC file server, hence from the LAN.

I don't see a logical reason why the Gigabit device, while sending data over LAN to a 100Mbps LAN device, is unable to download from WAN at full-duplex Gigabit speed. Instead it can only download/upload at 20-30Mbps (with huge ping), which is ridiculously slow and doesn't even come close to Fast Ethernet speeds.
It is perfectly able to do so while sending data to another Gigabit LAN device. Simply put, the desktop PC can download from WAN at the full 1Gbps while simultaneously sending data to another Gigabit LAN device at 1Gbps. That is the full-duplex Gigabit speed one would expect.

This issue makes Gigabit Ethernet look like a joke when a 100Mbps device is present on the LAN, simply because if I put in a 100Mbps switch the PC will still download from WAN at 100Mbps.
See another post with tests here - Ipq806x NSS build (Netgear R7800 / TP-Link C2600 / Linksys EA8500) - #1884 by sppmaster
It's really tricky when there is an additional gigabit switch (not the built-in one) connected to the router, because I can see the link speed is 1000Mbps, but some clients have only 100Mbps NICs. When there is LAN traffic between the clients, this becomes a nightmare, because none of the clients connected to the external switch can download from the Internet. The speed most of the time is just 5-6Mbps.
As long as the LAN traffic is stopped, all three 100Mbps devices can download from WAN at 100Mbps simultaneously, as expected.

As I don't have a large number of routers at hand to test, I suggest that everyone who has the will and the means run these tests with a 100Mbps device and confirm or deny this behaviour.
I've described the steps to reproduce the issue in this post - Netgear R7800 exploration (IPQ8065, QCA9984) - #3187 by sppmaster
Several other users confirmed the observations, as can be seen in the discussion preceding and following the above post.
You can see the @quarky post too - Netgear R7800 exploration (IPQ8065, QCA9984) - #3190 by quarky

I would repeat those tests, but measuring the speed between the 1Gbps device and the router (not the internet). If @mk24 is right, you will see a slow transfer speed, thus proving the issue is at the switch and is not related to OpenWrt.
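A minimal sketch of that measurement, assuming iperf3 is installed on both ends, that `iperf3 -s` is already running on the router, that the router's LAN address is 192.168.1.1, and that the 1Gbps machine has a POSIX shell. `lan_throughput` is a hypothetical helper name, not an existing tool, and the `IPERF3` override is only there to preview the commands.

```shell
# Sketch only: lan_throughput is a hypothetical helper, not part of iperf3 or OpenWrt.
# Prerequisite on the router: iperf3 -s
# Usage: lan_throughput [SERVER_IP] [SECONDS]
lan_throughput() {
    server="${1:-192.168.1.1}"   # assumed router LAN address
    secs="${2:-10}"              # test duration in seconds
    cmd="${IPERF3:-iperf3}"      # set IPERF3=echo to print instead of run

    # Client -> router (the direction the 1Gbps device sends in).
    "$cmd" -c "$server" -t "$secs"
    # Router -> client (-R reverses the direction of the test).
    "$cmd" -c "$server" -t "$secs" -R
}
```

Running it once with the LAN stream to the 100Mbps device stopped and once with it active gives two figures to compare; a large drop in the second run would point at the switch rather than at WAN routing.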

@eduperez Thanks for the suggestion. @Ansuel @quarky may be interested too.
Let's rock.

  1. The first results are with only a single 4K TV channel playing on a TV (100Mbps - 192.168.1.116). That's a download transfer from WAN.

  2. In the second case I've added a 4K movie streamed from the PC (1Gbps - 192.168.1.2) to an AndroidTV box (100Mbps - 192.168.1.188). That's a LAN transfer from the PC to the AndroidTV box.

  3. In the third case two 4K movies play on two AndroidTV boxes. The PC sends one movie stream to each box, so there are two LAN transfers from the PC to two 100Mbps devices. The 4K TV channel from the first case is stopped.

Tests were performed on a Belkin RT3200 with the latest snapshot version of OpenWrt. Software and hardware offloading are turned on and CPU load is near zero during the tests.
This probably confirms a switch issue when mixed 100Mbps and 1Gbps devices are connected. In the second screenshot of each case, the top-left and bottom-left windows show the iperf3 results between the PC (192.168.1.2) and the router (192.168.1.1). They speak for themselves.
I still think this is a software bug, because I see the same result on different routers with different built-in switches.
All speedtests were performed on the PC (1Gbps - 192.168.1.2).
For reference I've included the ping times to the AndroidTV boxes (IPs 116 and 188), to the router and to WAN (pinging 1.1.1.1).
There is a screenshot from the task manager that shows the current LAN data transfer bit-rate from the PC to the 100Mbps devices.

This is what pause frames are for. From the problem description, I guess they are not enabled here. Check the status with

ethtool --show-pause lanxx

and enable them where needed.
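Changing the setting could look like the sketch below. It assumes the driver behind each port accepts `ethtool --pause` (not every switch driver does) and that forcing the setting with `autoneg off` is acceptable on the link. `set_pause` is a hypothetical helper, and the `ETHTOOL` override exists only to preview the commands without touching the hardware.

```shell
# Sketch only: set_pause is a hypothetical helper, not an OpenWrt tool.
# Usage: set_pause on|off PORT...
set_pause() {
    state="$1"
    case "$state" in
        on|off) ;;
        *) echo "usage: set_pause on|off PORT..." >&2; return 1 ;;
    esac
    shift
    for port in "$@"; do
        # Force RX and TX pause to the requested state on this port.
        # Set ETHTOOL=echo to print the arguments instead of applying them.
        "${ETHTOOL:-ethtool}" --pause "$port" autoneg off rx "$state" tx "$state"
    done
}

# Example (port names are assumptions for a 4-port router):
# set_pause on lan1 lan2 lan3 lan4
```

The same helper with `off` turns pause frames off again, and `ethtool --show-pause` on each port shows whether the driver actually applied the change.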


What device?

root@RT3200:~# ethtool --show-pause lan1
Pause parameters for lan1:
Autonegotiate:  on
RX:             off
TX:             off
RX negotiated:  on
TX negotiated:  on

Switch and sending device primarily

root@RT3200:~# ethtool --show-pause br-lan
Pause parameters for br-lan:
Cannot get device pause settings: Not supported
root@RT3200:~# root@RT3200:~# ethtool --show-pause eth0
-ash: root@RT3200:~#: not found

I meant the switch port like you showed first. Sorry for the confusion. Bridge devices cannot support this. But the sending device has to, since that's the one which needs to pause when the 100M link is full.

root@RT3200:~# ethtool --show-pause lan1
Pause parameters for lan1:
Autonegotiate:  on
RX:             off
TX:             off
RX negotiated:  on
TX negotiated:  on

root@RT3200:~# ethtool --show-pause lan2
Pause parameters for lan2:
Autonegotiate:  on
RX:             off
TX:             off

root@RT3200:~# ethtool --show-pause lan3
Pause parameters for lan3:
Autonegotiate:  on
RX:             off
TX:             off

root@RT3200:~# ethtool --show-pause lan4
Pause parameters for lan4:
Autonegotiate:  on
RX:             off
TX:             off
RX negotiated:  on
TX negotiated:  on

I am not big on pause frames at all, try disabling them entirely.

Secondly, per-port buffering on the switch is required to keep packets flowing at higher rates. If the switch has a single global buffer, the 100Mbit backlog will hurt.

Sure, but is that typically an option on the switches you find in some wifi home router? Asking because I don't know...

If it isn't, then what's the second best alternative? Disconnect all the 100M devices? Restrict all the other devices to 100M? Dedicate another cheap switch to 100M devices and connect it to a second NIC on the PC, at least ensuring per-port buffers there?

And the PC? That's the device which has to halt if this should do any good.

I've disabled Flow Control on the PC Gbit NIC (Realtek).
Now when I run a single or double data transfer from the PC to 100Mbps device/s (case 2 and 3) I am able to download from WAN at full speed on the PC. That wasn't possible with Flow Control Enabled.
The downside of this is that now in case 3 the LAN traffic is interrupted (ping command gives an error) and movies just stop playing.
In case 2 it still manages to transfer one movie stream but the goal is to have three simultaneous LAN data streams without interruptions.
I think we are approaching the main goal and narrowing down the list of possible reasons.
From this video https://www.youtube.com/watch?v=ULSJxhfD244
I understand that now I need to check if the 802.1Qbb is supported and enabled.
Looking through the R7800 switch documentation [https://github.com/Deoptim/atheros/blob/master/QCA8337-datasheet.pdf]
I see lots of info about 802.1Q.

The simplest solution is to connect another gbit switch to the gbit switch, and hang the 100Mbit devices off the secondary switch. Packets will then queue up in that switch's buffers.

If you can, configure the movie server to restrict its bandwidth use to under 100 Mb/s, so that its packets don't cause a traffic jam in the switch.
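If the server application has no such setting, a per-destination cap on the sending machine is one way to approximate it. The sketch below assumes a Linux sender with `tc` available, which may not match the PC in this thread; the interface name, address and rate in the usage line are placeholders, `cap_stream` is a hypothetical helper, and the `TC` override is only for previewing the commands.

```shell
# Sketch only: cap_stream is a hypothetical helper for a Linux sender.
# Usage: cap_stream DEV DST_IP RATE    e.g. cap_stream eth0 192.168.1.188 90mbit
cap_stream() {
    dev="$1"; dst="$2"; rate="$3"
    tc_cmd="${TC:-tc}"   # set TC=echo to print instead of apply

    # Root HTB qdisc; traffic that matches no filter goes to class 1:10.
    "$tc_cmd" qdisc add dev "$dev" root handle 1: htb default 10
    # Default class: everything else keeps the full gigabit rate.
    "$tc_cmd" class add dev "$dev" parent 1: classid 1:10 htb rate 1000mbit
    # Capped class for traffic to the 100Mbps device.
    "$tc_cmd" class add dev "$dev" parent 1: classid 1:20 htb rate "$rate" ceil "$rate"
    # Steer packets addressed to the slow device into the capped class.
    "$tc_cmd" filter add dev "$dev" protocol ip parent 1: prio 1 u32 \
        match ip dst "$dst"/32 flowid 1:20
}
```

Only traffic to the named address is shaped, so transfers to other hosts and to WAN are left alone; `tc qdisc del dev DEV root` removes the setup again.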

This is my current setup: an additional TP-Link gigabit switch is connected to the built-in gigabit switch of the RT3200 or R7800, and all the 100Mbps devices are connected to that additional TP-Link switch. It doesn't work well.

I'll have to find out if that is possible. I use that PC for other tasks too and would prefer to keep full WAN speed.

Or as others suggested you need a switch dedicated to doing the 1000->100 conversion for the problem device. That should be the only thing that switch does, and the only device connected to it. It's going to jam when the movies start but the point is it should not disrupt the whole network.

Configure for no flow control between the two switches, if possible. The extra packets the movie server is trying to send need to be dropped.

I meant that the movie server application itself should have an internal bandwidth limit, since the PC it runs on is also doing other things.
