Severe WAN performance degradation when Fast Ethernet devices are connected to a Gigabit switch and transfer data over LAN (probably present on most OpenWrt versions)

Maybe a router in between the main router and the 100 Mbps devices, with the 100 Mbps devices in a different subnet, could work around the issue that @sppmaster is facing? I figured the router should have sufficient memory and capabilities to throttle the network traffic enough to not swamp the router's switches connected to the 100 Mbps devices.

A router in between would probably work, but would likely be overkill. All that would really be doing is breaking the flow control pause frame propagation between the router's switch and the downstream switch, and just disabling flow control on the router ports would achieve the same thing.

I looked at the programming guide for the MT7531 switch used on the E8450 and RT3200, but it doesn't really describe how its flow control algorithm works other than what bits are used to turn it on and off. However, it's probably safe to say that in most of these basic switch devices in home routers, flow control is pretty dumb and will result in pause frames being sent on upstream ports when a downstream port's transmit queue fills up. That could be because the port runs at a slower link speed, or because it is connected to another switch which is sending out pause frames of its own. You can end up with one slow device having unnecessary ripple effects across the entire network.
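To illustrate the ripple effect, here is a toy simulation in Python. It is only a sketch of the "dumb" behaviour described above, not a model of the MT7531 or any other real chip: the `simulate` function, the queue size of 64 frames, and the strictly alternating sender are all assumptions made up for the example.

```python
def simulate(flow_control, ticks=100_000, queue_limit=64):
    """Toy model: one 1 Gbps sender feeding a 100 Mbps and a 1 Gbps receiver.

    One tick is the time needed to send one frame at 1 Gbps. The sender
    alternates frames between the slow and the fast destination.
    Returns (fast_mbps, slow_mbps, dropped_frames).
    """
    slow_queue = 0          # frames waiting on the 100 Mbps egress port
    next_is_slow = True     # the sender alternates destinations
    fast_delivered = slow_delivered = dropped = 0

    for tick in range(ticks):
        # The slow port drains one frame every 10 ticks (100 Mbps vs 1 Gbps).
        if tick % 10 == 0 and slow_queue > 0:
            slow_queue -= 1
            slow_delivered += 1

        # With flow control, a full slow queue pauses the whole ingress link,
        # so frames for the fast destination are held back as well.
        if flow_control and slow_queue >= queue_limit:
            continue

        if next_is_slow:
            if slow_queue < queue_limit:
                slow_queue += 1
            else:
                dropped += 1     # tail drop, only reachable without flow control
        else:
            fast_delivered += 1  # the 1 Gbps egress port never queues here
        next_is_slow = not next_is_slow

    to_mbps = 1000 / ticks       # one frame per tick corresponds to 1000 Mbps
    return fast_delivered * to_mbps, slow_delivered * to_mbps, dropped


for fc in (True, False):
    fast, slow, dropped = simulate(fc)
    print(f"flow_control={fc}: fast={fast:.0f} Mbps, slow={slow:.0f} Mbps, dropped={dropped}")
```

In this toy model the slow receiver gets about the same throughput either way, but with flow control the flow to the gigabit receiver is dragged down to roughly the slow port's rate, while without it the switch drops the excess frames for the slow port and leaves the fast flow alone.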

Ethernet flow control should really only be enabled in specific cases where you know all devices in use are going to handle it intelligently, and there are no slow devices on the network to clog up the works. On a typical home network it's just likely to cause issues.


Thanks for your comments.
I've deployed my R7800 back in service now.
On it, ethtool only gives me info about eth0 and eth1.
I cannot query LAN1, LAN2, etc.

R7800 currently is still using swconfig for the switches. Typically eth1 is the LAN switch port GMAC, so disable flow control for that.


@robhancock @quarky

root@R7800:~# ethtool -A eth1 autoneg off rx off tx off
rx unmodified, ignoring
tx unmodified, ignoring

The command is not accepted.

root@R7800:~# ethtool --show-pause eth1
Pause parameters for eth1:
Autonegotiate:  on
RX:             off
TX:             off

Are you running your R7800 on any of the NSS accelerated builds? If you are, and hopefully the SSDK Shell is also compiled in, you can use the ssdk_sh utility to change just about any of the qca8337 switch parameters.

I'm currently using the ACwifidude NSS build.

I checked @ACwifidude's repo. It doesn't include SSDK. My own personal builds do include the SSDK utility.

With the help of @quarky I managed to turn off Flow Control of the R7800 switch.

Unfortunately, it only helped with WAN speed, and at the expense of LAN traffic disruption.

So in this situation I can choose only one option: WAN or LAN. Both are not possible at the same time.
That does not give me what I can achieve with the ISP router, but the ISP router has terrible WLAN and of course has nothing that OpenWrt gives me.

It is interesting that there are similar performance issues on the Mikrotik forum when using mixed 100Mbps and 1Gbps devices. First one is here https://forum.mikrotik.com/viewtopic.php?t=185253

I've even considered getting a Mikrotik RB5009UG+S+IN, as it has a 10G SFP+ cage, but a similar WAN performance issue is reported for it, although there the users complain when they mix 2.5G and 1G devices.
https://forum.mikrotik.com/viewtopic.php?p=920760#p895221
So spending a sum of money only to see the same issue with 100 Mbps devices may not be a good choice.

I guess your options are:

  1. get Gbps-capable end devices
  2. get a better switch with per port queues (might help)
  3. try to fudge something with cheap gigabit switches isolating each of the 100 Mbps end devices
  4. try to rate limit the 100 Mbps devices so that they never trigger the pause frame generation.
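
For option 4, the usual building block is a token bucket. Below is a minimal Python sketch of the idea only, not a drop-in shaper for any router or OS: the `TokenBucket` class, the `shaped_mbps` helper, the 90 Mbps rate and the 15000-byte burst are all hypothetical values chosen for illustration.

```python
class TokenBucket:
    """Token-bucket rate limiter: tokens are bytes, refilled at a fixed rate."""

    def __init__(self, rate_mbps, burst_bytes):
        self.rate = rate_mbps * 1e6 / 8   # refill rate in bytes per second
        self.capacity = burst_bytes       # maximum burst size in bytes
        self.tokens = burst_bytes         # start with a full bucket
        self.last = 0.0                   # time of the previous update (seconds)

    def allow(self, now, size):
        """Return True if a frame of `size` bytes may be sent at time `now`."""
        elapsed = now - self.last
        self.last = now
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if size <= self.tokens:
            self.tokens -= size
            return True
        return False                      # caller should delay or queue the frame


def shaped_mbps(rate_mbps=90, burst_bytes=15000, frame=1500, seconds=1.0):
    """Offer back-to-back frames at 1 Gbps and report what the bucket lets through."""
    bucket = TokenBucket(rate_mbps, burst_bytes)
    frame_time = frame * 8 / 1e9          # time per frame at 1 Gbps
    sent = 0
    for i in range(int(seconds / frame_time)):
        if bucket.allow(i * frame_time, frame):
            sent += frame
    return sent * 8 / 1e6 / seconds


print(f"{shaped_mbps():.1f} Mbps passes the shaper")
```

The point of keeping the rate a little below 100 Mbps and the burst small is that the 100 Mbps port's queue should then never fill far enough for the switch to start sending pause frames.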

Any other options?

P.S.: Have you tried just disabling pause frame processing on the server?

  1. I cannot do anything about it - the TV manufacturer put in a 100 Mbps NIC, and the other devices are from the ISP for the TV service (they have 100 Mbps NICs too).
  2, 3 and 4. I don't think I'll bother with them. I'm looking for a one-device solution for a neat environment.

I've tried here. Same result as in my previous post - WAN traffic at full speed at the expense of LAN traffic disruption.

The only thing that I don't understand is how another manufacturer (I'm talking about the ISP router) can handle the same situation, and what they use to overcome the issue I face when mixed-speed devices are connected to the switch. It's strange that big and famous manufacturers cannot resolve this. And as we all see, there are still a lot of Fast Ethernet devices around.

I guess there are more options: since you know the IP addresses of the problematic machines, you could implement traffic shapers on the server that make sure these three(?) hosts will only ever be serviced at 94 Mbps or so, which, if set correctly, should avoid tickling the switch into generating pause frames. (The problem is that pause frames operate per link, not per flow/connection/destination, so a pause stops all traffic on that link; the way to avoid the issue is to make sure you never trigger pause frames in the first place.)
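
As a rough sanity check on where a figure like 94 Mbps comes from, here is the arithmetic for TCP payload over a 100 Mbps Ethernet link, as a small Python sketch. The function name and the default of 12 bytes of TCP options (timestamps) are assumptions for the example; which layer a given shaper counts bytes at differs between tools, so treat the result as a ballpark, not a setting to copy.

```python
def tcp_goodput_mbps(link_mbps=100.0, mtu=1500, tcp_option_bytes=12):
    """Best-case TCP payload rate over Ethernet with full-size frames."""
    # Per-frame Ethernet overhead on the wire:
    # preamble + SFD (8), MAC header (14), FCS (4), inter-frame gap (12).
    eth_overhead = 8 + 14 + 4 + 12
    # IPv4 header (20) + TCP header (20) + TCP options (assumed: timestamps).
    ip_tcp_headers = 20 + 20 + tcp_option_bytes
    payload = mtu - ip_tcp_headers
    wire_bytes = mtu + eth_overhead
    return link_mbps * payload / wire_bytes


print(f"with timestamps:    {tcp_goodput_mbps():.1f} Mbps")
print(f"without TCP options: {tcp_goodput_mbps(tcp_option_bytes=0):.1f} Mbps")
```

Either way the payload rate lands between 94 and 95 Mbps, so a shaper set somewhat below what the 100 Mbps link can carry leaves the slow port's queue room to drain.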

Add a multiport NIC to the server and connect each of the offending devices directly with one NIC port.


Today I had some spare time and devoted it to testing this further for almost three hours.
You can see here that when Flow Control was disabled on the R7800 built-in switch I couldn't get uninterrupted LAN traffic.
In the next tests I've used R7800 with default switch settings (Flow Control is Enabled).
This time I decided to disable the Flow Control of the PC NIC (Realtek) as I posted here.
As you can see from the post I didn't get the uninterrupted LAN traffic with Belkin RT3200.
For my latest tests I used the R7800 but disabled Flow Control on the PC's Realtek NIC, as @moeller0 suggested.

To my surprise, this time I was able to get full WAN speed on the PC while it was simultaneously streaming the same 4K movie stream (the one I used for the previous tests) to three 100 Mbps devices (two 4K AndroidTV boxes and a 4K Smart TV) connected to a second gigabit switch, and while one laptop, connected as a 100 Mbps device to the R7800 built-in switch, was running an iperf3 session to the PC server.
This way I had four simultaneous LAN transfers, all from the PC to four different 100 Mbps devices. All LAN traffic was going almost completely uninterrupted. I had only 4 or 5 single occasional ping losses to the 100 Mbps devices over more than 30 minutes, while I was running numerous (more than 50) WAN speed tests on the PC at full WAN speed. I didn't observe any LAN traffic interruption during this long test session.
With this setup I can say that my task is 99% achievable.
I hesitate to say the issue is completely resolved, only because there are other hardware combinations that still cannot achieve the goal of uninterrupted WAN/LAN traffic at full-duplex speeds.
Any other thoughts from anyone here on the subject?

Reading this Hamlet-style question https://blogs.cisco.com/perspectives/to-flow-or-not-to-flow
and this one http://rjapproves.com/netapp-vs-vmware-flow-control-dilemma/
I learned at least that enabling or disabling Flow Control can give different results depending on the specific configuration and use case.
What remains open is that on some routers, such as the R7800, there is no way to disable Flow Control on the built-in switch and, more importantly, whether there are other ways to work around this situation.

LAN traffic will necessarily be interrupted since the speedtest or movie server is throwing data toward the 100 Mb device at 1 Gb. When that happens, the switch's options are either to flow-control the source or to drop most of the packets that are arriving faster than they can be sent out.
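
To put rough numbers on that, here is a small Python sketch. The 128 KiB port buffer is purely an assumed figure for illustration (I don't know the real buffer sizes of these switch chips), and both function names are made up for the example.

```python
def seconds_until_full(buffer_bytes, in_mbps=1000.0, out_mbps=100.0):
    """Time for an egress buffer to fill when data arrives faster than it leaves."""
    surplus_bits_per_s = (in_mbps - out_mbps) * 1e6
    return buffer_bytes * 8 / surplus_bits_per_s


def drop_fraction(in_mbps=1000.0, out_mbps=100.0):
    """Share of arriving data that must be dropped if the sender never slows down."""
    return 1 - out_mbps / in_mbps


buffer_bytes = 128 * 1024   # assumed buffer size, not taken from any datasheet
print(f"buffer full after {seconds_until_full(buffer_bytes) * 1000:.2f} ms")
print(f"then {drop_fraction():.0%} of the incoming data has to go somewhere")
```

With these assumed numbers the buffer is full in a bit over a millisecond, after which the switch has to either pause the sender or drop about nine out of every ten bytes until the sender backs off.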

The general principle of the Internet is that in case of a bottleneck somewhere along the way, packets will be dropped. There's nothing else to do with them. There are more sophisticated schemes (not found in consumer-grade switches) to also notify the source that specific packets are being dropped, and that it should please slow down sending on that connection.


DCTCP with CE-marking switches perhaps?


Any form of ECN will work, no need for DCTCP! Aggh, @moeller0...


Didn't know there are switches with RFC 3168 marking behaviour.

WRED is commonly available on mid-to-enterprise grade switches. I have no idea whether anyone uses the brick-wall RED configuration for DCTCP, and while we've been disparaging about RED, it does work somewhat.
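
For anyone not familiar with the terms, the classic RED marking curve and the "brick wall" variant can be sketched in a few lines of Python. This is the textbook curve only, with made-up thresholds and a hypothetical function name; real WRED implementations add per-class profiles and queue averaging that are left out here.

```python
def red_mark_probability(avg_queue, min_th, max_th, max_p=0.1):
    """Textbook RED: probability of marking (or dropping) an arriving packet.

    Below min_th nothing is marked, between the thresholds the probability
    ramps linearly from 0 up to max_p, and at or above max_th every packet
    is marked. Setting min_th == max_th gives the step-shaped "brick wall".
    """
    if avg_queue < min_th:
        return 0.0
    if avg_queue >= max_th:
        return 1.0
    return max_p * (avg_queue - min_th) / (max_th - min_th)


# Gentle ramp between 10 and 30 packets of queue (illustrative thresholds).
for q in (5, 20, 30):
    print("RED       ", q, red_mark_probability(q, min_th=10, max_th=30))

# Brick wall: a single threshold K = 20, as used for DCTCP-style marking.
for q in (19, 20):
    print("brick wall", q, red_mark_probability(q, min_th=20, max_th=20))
```

With ECN the "mark" is a CE codepoint set on the packet instead of a drop, so the sender learns about the congestion without losing data.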

I want to say here that a working workaround for me is to disable Flow Control on the desktop PC server's gigabit NIC, at least when using the R7800.
This is for the people that experience the same performance degradation as in my case.
