Severe WAN performance degradation when Fast Ethernet devices are connected to a Gigabit switch and transfer data over LAN (probably present on most OpenWrt versions)

I've filed an issue regarding this performance degradation on GitHub - https://github.com/openwrt/openwrt/issues/9985
Tracking continues here https://github.com/openwrt/openwrt/issues/9202

I've found a new, unpleasant issue: a WAN slowdown (bug) when there is simultaneous LAN traffic between a 1Gbps device and a 100Mbps device.
Summary of the problem: when one or more 100Mbps (Fast Ethernet) clients are connected to the router's gigabit switch and there is a LAN data transfer between a Gigabit device (a desktop PC in my case) and at least one 100Mbps device, the desktop PC can only download from or upload to the WAN at really low speeds. WAN performance drops to just 20-30Mbps download/upload, with very high ping.

All details are given in several threads on the forum, starting with this one.

I've tried this with several gigabit routers - Netgear R7800, TP-Link WDR4300, Belkin RT3200. With all of them the results are completely repeatable using the same bench test setup.
I've tried several OpenWrt versions with default and custom configurations. The issue is present on all stock firmwares too, and they are based on older OpenWrt code.

No such loss of performance can be observed when only Gigabit clients are connected to the gigabit switch. Instead we get Gigabit Full Duplex speeds, as expected - simultaneous 1Gbps download and 1Gbps upload WAN/LAN performance, regardless of the LAN traffic.


Not to diminish your issue, but I think your claim "All OpenWrt versions" (I corrected the spelling to the desired capitalization) requires more evidence than just a few examples. Also, claiming only what is backed by evidence ("several router models and OpenWrt versions") is already severe enough.

Same for "The issue is present on all stock firmwares too"...

But to summarize: you observe that when an end device with a gigabit link to the switch ports "talks" to a fastethernet device on the same switch, all communications between that device and the switch get downgraded to fastethernet speeds? Does this also affect other devices that communicate at gigabit speeds over different switch ports?


I probably didn't explain it with the proper wording. I've started by editing the title.
By "All OpenWrt versions" I meant all the versions I've tried, so the original wording was not exactly correct. Same for the stock firmwares. I hope it's a bit clearer now.
Back to the WAN/LAN performance drop issue.
I've explained the issue in great detail in several other posts on the forum. I don't know if it is appropriate to copy and paste the same information here again.
I think that this issue is significant and deserves its own thread.
So here is what I've found so far.
A desktop PC is connected at 1Gbps to the R7800 LAN port 4. There is a Plex server installed on the desktop PC.
When a 4K movie with a bitrate of ~70Mbps is played on the TV box and on the 4K TV set, the Internet (WAN) download/upload speeds on the cable-connected desktop PC drop dramatically, from 940/680Mbps with 1ms ping to anywhere between 25-350/100-200Mbps. The speedtest results are really low and inconsistent (varying hugely, from as low as 25Mbps up to 350Mbps), with high ping from 50 to over 350ms.
I can barely reach the speeds in the picture below while the desktop PC is streaming two 4K movies with video bitrate around 70Mbps over the LAN to an Android TV (playing the movie) and an Android set-top box.

[Image: speedtest screenshot, sppmaster_0-1649420207833.png]

I can reproduce this using iperf3 to create LAN traffic between two or more LAN clients.

If I connect a laptop with a 1Gbps connection instead of the 100Mbps device, the WAN slowdown doesn't occur.

But it does occur when I connect the same laptop with a 100Mbps cable (4 wires).

I see this WAN download/upload speed limitation only during active LAN transfers between the PC and one or more 100Mbps LAN clients. You need simultaneous WAN and LAN transfers. I use iperf3 to better shape the scenario.
Otherwise this doesn't occur.
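
For anyone who wants to try this, here is a minimal sketch of how the LAN load could be generated with iperf3. It assumes `iperf3 -s` is already running on each 100Mbps client. The helper name `lan_load` and the addresses in the usage line are only illustrative, so substitute your own devices.

```shell
# lan_load SECONDS HOST...
# Hypothetical helper: start one iperf3 client per 100Mbps host, all in
# parallel, and wait until every transfer has finished.
# Assumes an iperf3 server ("iperf3 -s") is listening on each HOST.
lan_load() {
    secs=$1
    shift
    for host in "$@"; do
        # -c: connect to HOST as client, -t: run for SECONDS seconds
        iperf3 -c "$host" -t "$secs" &
    done
    wait
}
```

Run it on the gigabit PC, for example `lan_load 60 192.168.1.116 192.168.1.188`, and start your usual WAN speed test on the same PC while the transfers are active.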

I also assume this means a device with a built-in gigabit ethernet switch?

You also didn't mention that you have conntrack drop errors and that the LAN 1 port changes colors to indicate a 100 Mbps connection when this occurs on a downstream switch...

Have you ruled out the gigabit switch and cabling?

Also what was causing that error?

Yes, I mean a device with a built-in gigabit ethernet switch.
But actually it doesn't matter if we use an additional external gigabit switch connected by cable to the built-in switch.
About the cabling, you can see this post: Ipq806x NSS build (Netgear R7800 / TP-Link C2600 / Linksys EA8500) - #1878 by sppmaster
But with the ISP's cheap Gigabit router I have no problems at all with cable connections, using the same network setup and devices.
[Image: speedtest result with the ISP router]
This is the speedtest result on the desktop PC (used as a file server) with the ISP router while three 4K movies, with a total video bitrate over 300Mbps, play simultaneously on all three devices connected to the Gigabit switch. The desktop PC, with the Plex server installed on it, streams the three 4K movies to three 100Mbps devices (a smart TV and two Android TV boxes). The WAN speed test is run on the PC at the same time.
All other gigabit routers I've tested (Netgear R7800, TP-Link WDR4300, Belkin RT3200) simply fail at this, no matter whether they use OpenWrt or stock firmware.

Wait...

I edited my post before the time...and I thought my statement was gonna be facetious...but from your statements (3rd time I've read the wording that made me think), it seems feasible...so I'll put it in question form instead...

Yes/no: Are you under the impression that your download speed in the pic is ~1230.04 Mbps (i.e. 930.04 + 300 == 1230.04)?

Also, I assume the 4K movies are streaming from the same WAN as the speedtest...or are they on the fileserver...or an IPTV network/VLAN?

When a switch has only a 100 Mb link out to a device, but incoming packets for it from a 1000 port are received at a higher data rate than that, there is a problem.

What it should do is drop those packets that can't be down-converted and dispatched immediately because the outgoing port is still busy with the previous packet. TCP logic should eventually cause the source to slow down, and the rest of the network is unaffected.

What it actually does is flow-control the device on the 1000 port to stop sending. But there's no provision to say "Don't send any more packets for MAC address X" -- only "Don't send any more packets at all." So all other links from the 1000 device are also interrupted.
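
To make the difference between the two behaviours concrete, here is a toy simulation in plain shell arithmetic. It is only a sketch under made-up assumptions: one tick is one frame time at 1 Gb, the 100 Mb port drains one frame every 10 ticks, the sender alternates between a frame for the slow device and a frame for the WAN, and the slow port's queue holds 8 frames. It does not model any real switch chip, pause quanta, or TCP backing off.

```shell
# simulate POLICY TICKS  -> prints the number of WAN-bound frames forwarded
# POLICY is "drop" (discard frames the slow port cannot queue) or
# "pause" (flow-control the sender while the slow port's queue is full).
simulate() {
    policy=$1
    ticks=$2
    buf=0        # frames currently queued for the 100 Mb port
    cap=8        # hypothetical queue capacity, in frames
    wan=0        # WAN-bound frames forwarded so far
    next=slow    # the sender alternates: slow-port frame, WAN frame, ...
    t=0
    while [ "$t" -lt "$ticks" ]; do
        # The 100 Mb port transmits one queued frame every 10 ticks.
        if [ $((t % 10)) -eq 0 ] && [ "$buf" -gt 0 ]; then
            buf=$((buf - 1))
        fi
        if [ "$policy" = pause ] && [ "$buf" -ge "$cap" ]; then
            :   # pause asserted: the sender may not transmit anything this tick
        elif [ "$next" = wan ]; then
            wan=$((wan + 1))
            next=slow
        else
            # Frame for the slow port: queue it, or silently drop it if full.
            if [ "$buf" -lt "$cap" ]; then
                buf=$((buf + 1))
            fi
            next=wan
        fi
        t=$((t + 1))
    done
    echo "$wan"
}
```

In this model `simulate drop 1000` forwards a WAN frame on every second tick, while `simulate pause 1000` forwards far fewer, because once the slow port's queue fills up the sender is only allowed to transmit when that queue drains.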


Sounds like a reasonable approach to take, when explained like this...


No.
As I wrote, the movies are streamed from the PC file server, hence from the LAN.

I don't see a logical reason why a Gigabit device, while sending data over the LAN to a 100Mbps LAN device, is unable to download from the WAN at Full Duplex Gigabit speed. Instead it can only download/upload at 20-30Mbps (with huge ping), which is ridiculously slow and doesn't even come close to Fast Ethernet era speeds.
It is perfectly able to do so while sending data to another Gigabit LAN device. Simply put, the desktop PC can download from the WAN at the full 1Gbps while at the same time sending data to another Gigabit LAN device at 1Gbps. We get Full Duplex Gigabit speed, as should be expected.

This issue makes Gigabit Ethernet look like a joke when a 100Mbps device is present on the LAN, simply because if I put in a 100Mbps switch instead, the PC will still download from the WAN at 100Mbps.
See another post with tests here - Ipq806x NSS build (Netgear R7800 / TP-Link C2600 / Linksys EA8500) - #1884 by sppmaster
It gets really tricky when an additional gigabit switch (not the built-in one) is connected to the router: I can see that the link speed is 1000Mbps, but some clients have only 100Mbps NICs, and when there is LAN traffic between the clients this becomes a nightmare, because all the clients connected to the external switch can barely download from the Internet. Most of the time the speed is just 5-6Mbps.
As long as the LAN traffic is stopped, all three 100Mbps devices can download from the WAN at 100Mbps simultaneously, as expected.

As I don't have an endless supply of routers on hand to test, I suggest that everyone who has the will and the means try running these tests with a 100Mbps device and confirm or deny this behaviour.
I've described the steps to reproduce the issue in this post - Netgear R7800 exploration (IPQ8065, QCA9984) - #3187 by sppmaster
Several other users confirmed the observations, as can be seen in the discussion preceding and following the above post.
You can see @quarky's post too - Netgear R7800 exploration (IPQ8065, QCA9984) - #3190 by quarky

I would repeat those tests, but measuring the speed between the 1Gbps device and the router (not the internet). If @mk24 is right, you will see a slow transfer speed, thus proving the issue is at the switch and is not related to OpenWrt.
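
Something along these lines could do it, as a rough sketch only. It assumes iperf3 is installed and `iperf3 -s` is running both on the router and on the 100Mbps client. The function name `compare_router_speed` and the 2-second warm-up are arbitrary choices of mine.

```shell
# compare_router_speed ROUTER SLOW_HOST SECONDS
# Hypothetical helper: measure PC -> router throughput twice, first with
# no LAN load and then while a transfer to a 100Mbps client is running.
compare_router_speed() {
    router=$1
    slow=$2
    secs=$3

    echo "== baseline: PC -> router, no LAN load =="
    iperf3 -c "$router" -t "$secs"

    echo "== PC -> router while PC -> 100Mbps client is running =="
    # Background LAN load; runs a bit longer than the measurement itself.
    iperf3 -c "$slow" -t "$((secs + 5))" > /dev/null &
    load_pid=$!
    sleep 2   # give the LAN flow a moment to ramp up
    iperf3 -c "$router" -t "$secs"
    wait "$load_pid"
}
```

For example, `compare_router_speed 192.168.1.1 192.168.1.188 10` run on the gigabit PC. If the second measurement collapses while the first one is fine, the slowdown happens before any WAN routing is involved.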


@eduperez Thanks for the suggestion. @Ansuel @quarky may be interested too.
Let's rock.

  1. First results are when only a single 4K TV channel plays on a TV (100Mbps - 192.168.1.116). That's a download transfer from WAN.

  2. In the second case I've added a 4K movie streamed from PC (1Gbps - 192.168.1.2) to AndroidTV box (100Mbps - 192.168.1.188). That's a LAN transfer from PC to AndroidTV.

  3. In the third case two 4K movies play on two AndroidTV boxes. The PC transfers two movie streams to both AndroidTV boxes. These are two LAN transfers from PC to two 100Mbps devices. The 4K TV channel from first case is stopped.

Tests were performed on a Belkin RT3200 with the latest snapshot version of OpenWrt. Software and hardware offloading are turned on and CPU load is near zero during the tests.
This probably confirms a switch issue when a mix of 100Mbps and 1Gbps devices is connected. In the second screenshot of every case, the top-left and bottom-left windows show the data from the iperf3 test between the PC (192.168.1.2) and the router (192.168.1.1). They speak for themselves.
I still think this is a software bug, because I see the same result on different routers with different built-in switches.
All speedtests were performed on the PC (1Gbps - 192.168.1.2).
For reference I've included the ping times to the AndroidTV boxes (IPs 116 and 188), to the router and to the WAN (pinging 1.1.1.1).
There is also a screenshot of the task manager that shows the current LAN data transfer bitrate from the PC to the 100Mbps devices.


This is what pause frames are for. From the problem description, I guess it's not enabled here. Check status with

ethtool --show-pause lanxx

and enable it where needed.
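
To check every port in one go, a small loop like this sketch may help. The helper name `show_pause` is made up, and the port names in the usage line are an assumption, so adjust them to whatever your device actually exposes.

```shell
# show_pause PORT...
# Hypothetical helper: print the pause (flow control) parameters of each
# named interface, one after another.
show_pause() {
    for port in "$@"; do
        ethtool --show-pause "$port"
    done
}
```

Usage would be something like `show_pause lan1 lan2 lan3 lan4` on the router.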



What device?

root@RT3200:~# ethtool --show-pause lan1
Pause parameters for lan1:
Autonegotiate:  on
RX:             off
TX:             off
RX negotiated:  on
TX negotiated:  on

Switch and sending device primarily

root@RT3200:~# ethtool --show-pause br-lan
Pause parameters for br-lan:
Cannot get device pause settings: Not supported
root@RT3200:~# root@RT3200:~# ethtool --show-pause eth0
-ash: root@RT3200:~#: not found

I meant the switch port like you showed first. Sorry for the confusion. Bridge devices cannot support this. But the sending device has to, since that's the one which needs to pause when the 100M link is full.

root@RT3200:~# ethtool --show-pause lan1
Pause parameters for lan1:
Autonegotiate:  on
RX:             off
TX:             off
RX negotiated:  on
TX negotiated:  on

root@RT3200:~# ethtool --show-pause lan2
Pause parameters for lan2:
Autonegotiate:  on
RX:             off
TX:             off

root@RT3200:~# ethtool --show-pause lan3
Pause parameters for lan3:
Autonegotiate:  on
RX:             off
TX:             off

root@RT3200:~# ethtool --show-pause lan4
Pause parameters for lan4:
Autonegotiate:  on
RX:             off
TX:             off
RX negotiated:  on
TX negotiated:  on

I am not big on pause frames at all, try disabling them entirely.

Secondly, per port buffering on the switch is required to keep packets flowing at higher rates. If it's a global buffer on the switch, the 100Mbit backlog will hurt.
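
For the first suggestion (disabling pause frames), a sketch of what that could look like with ethtool is below. Whether the switch driver on a given router actually honours these settings is an assumption that has to be verified on the device, and the helper name `disable_pause` is made up. Settings applied this way do not survive a reboot.

```shell
# disable_pause PORT...
# Hypothetical helper: turn off pause autonegotiation and pause frames in
# both directions on each named interface, then print the resulting state.
disable_pause() {
    for port in "$@"; do
        # ethtool reports an error here if the driver does not support
        # changing pause parameters on this interface.
        ethtool --pause "$port" autoneg off rx off tx off
        ethtool --show-pause "$port"
    done
}
```

For example `disable_pause lan1 lan4`, followed by a repeat of the same speed tests to see whether anything changes.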


Sure, but is that typically an option on the switches you find in some wifi home router? Asking because I don't know...

If it isn't, then what's the second best alternative? Disconnect all the 100M devices? Restrict all the other devices to 100M? Dedicate another cheap switch to 100M devices and connect it to a second NIC on the PC, at least ensuring per-port buffers there?

And the PC? That's the device which has to halt if this should do any good.
