I've disabled Flow Control on the PC Gbit NIC (Realtek).
Now when I run a single or double data transfer from the PC to 100Mbps device/s (case 2 and 3) I am able to download from WAN at full speed on the PC. That wasn't possible with Flow Control Enabled.
The downside of this is that now in case 3 the LAN traffic is interrupted (ping command gives an error) and movies just stop playing.
In case 2 it still manages to transfer one movie stream but the goal is to have three simultaneous LAN data streams without interruptions.
I think we are approaching the main goal and narrowing down the list of possible reasons.
From this video https://www.youtube.com/watch?v=ULSJxhfD244
I understand that now I need to check if the 802.1Qbb is supported and enabled.
Looking through the R7800 switch documentation [https://github.com/Deoptim/atheros/blob/master/QCA8337-datasheet.pdf]
I see lots of info about 802.1Q.
The simplest solution is to connect another gigabit switch to the router's gigabit switch and hang the 100Mbit devices off the secondary switch. Packets will then queue in that switch's buffers.
If you can, configure the movie server to restrict its bandwidth use to under 100 Mb, so that its packets don't cause a traffic jam in the switch.
This is my current setup (additional gigabit switch TP-Link connected to the built-in RT3200 or R7800 gigabit switch) and all the 100Mbps devices are connected to the additional TP-Link switch. Doesn't work well.
I'll have to find out if that is possible. I use that PC for other tasks too and preferably want a full WAN speed.
Or as others suggested you need a switch dedicated to doing the 1000->100 conversion for the problem device. That should be the only thing that switch does, and the only device connected to it. It's going to jam when the movies start but the point is it should not disrupt the whole network.
Configure for no flow control between the two switches, if possible. The extra packets the movie server is trying to send need to be dropped.
I meant that the movie server application, running on a PC that also does other things, should have its own internal bandwidth limit.
I have to add once again that WLAN devices and other gigabit devices cabled to the built-in switch can download at full WAN speed as long as they are not transferring data over the LAN to a 100Mbps device. The bottleneck appears only between the gigabit client on LAN port 1 and the LAN port 2 that the 100Mbps device is connected to, and only when there is simultaneous WAN and LAN traffic between those clients/ports.
I'll check but three 4K movies total to more than 200-250Mbps sometimes.
And the main issue is that if only one movie is streamed over LAN, the other 100Mbps AndroidTV box that has HBOMax cannot play another movie (from HBOMax) because the WAN speed on this device suffers a lot.
The setup should allow three TVs/AndroidTV boxes to play three mixed WAN/LAN streams simultaneously (HBOMax, YouTube or any other video streaming service as WAN transfers, plus 1 or 2 LAN streams) without WAN performance loss. Currently this is not possible with the OpenWrt routers I've tried. It is possible with the ISP's cheap gigabit router, but that one unfortunately doesn't have good WLAN, and I want OpenWrt to support my networks.
It would seem to me that @mk24 is right on the money here.
Would the solution be to disable the switch flow-control setting for the LAN port connected to the gigabit server?
But as @dtaht stated, it could swamp the switch buffers, as the switch cannot send the frames fast enough to the 100 Mbps devices. I would have thought that TCP would have throttled the traffic between the gigabit server and the 100 Mbps client.
Edit: It would seem that one solution would be, as @mk24 suggested, for the streaming server application to limit the per-stream bandwidth. If it can cap a 4K stream at, say, 50 Mbps, a gigabit server should be able to serve maybe 15 clients (all on different 100 Mbps ports) without affecting its download speed, since the switch would not be exhausting its buffers?
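To make that arithmetic concrete, here is a rough sanity check as a shell sketch. The function name `streams_fit` and the figures passed to it are illustrative assumptions, not measurements. It only checks that each stream fits its client's port and that the total fits the server's link; it says nothing about bursts or how deep the switch buffers are.

```shell
# Hypothetical helper: do N rate-capped streams fit the links involved?
# usage: streams_fit <per_stream_mbps> <clients> <client_port_mbps> <server_link_mbps>
streams_fit() {
    per_stream=$1; clients=$2; port=$3; uplink=$4
    total=$(( per_stream * clients ))
    if [ "$per_stream" -le "$port" ] && [ "$total" -le "$uplink" ]; then
        echo "ok: ${total} Mbps total, ${per_stream} Mbps per ${port} Mbps port"
    else
        echo "too much: ${total} Mbps total, ${per_stream} Mbps per ${port} Mbps port"
        return 1
    fi
}

# The 15-client case from the post: 50 Mbps streams, 100 Mbps ports, 1 Gbps server.
streams_fit 50 15 100 1000
```

An uncapped stream that peaks above 100 Mbps fails the per-port check no matter how few clients there are, which is why the cap has to live in the server application.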
I am using a mix of Newifi D2 (gigabit) and a lot of TL-WR740N (fast ethernet). They have been working fine for years and I haven't observed such an issue. Maybe the issue is not present on these two models.
In cases like this where you have mixed speed devices on the switch, it's not surprising that flow control is causing problems. It should generally NOT be turned on - for proper rate control and bandwidth sharing you want packets to be dropped if there is not enough bandwidth for them, you don't want pause frames being sent that may obstruct unrelated traffic. If a 1Gbps device is sending packets to a 100 Mbps device, you don't want the switch to pause the traffic from the 1 Gbps device when the 100 Mbps port output buffer fills up, because there's still bandwidth available for it to send to other ports on the switch, which it can't when it's paused. When the 100 Mbps output buffers fill up, the excess packets should just get dropped and TCP will know to limit the transmit rate accordingly.
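For a feel of how quickly a 100 Mbps egress queue fills when a gigabit sender bursts into it, here is a back-of-the-envelope sketch. The 128 KiB buffer size is a made-up placeholder (I don't know the real per-port figure for any of these switches), and `fill_time_us` is just a name for this example.

```shell
# Time for an egress queue to fill when a fast sender feeds a slower port.
# All figures are illustrative; the buffer size is a hypothetical placeholder.
# usage: fill_time_us <in_mbps> <out_mbps> <buffer_bytes>
fill_time_us() {
    in_mbps=$1; out_mbps=$2; buf_bytes=$3
    if [ "$in_mbps" -le "$out_mbps" ]; then
        echo "never"          # the port drains at least as fast as it fills
        return 0
    fi
    # 1 Mbit/s is 1 bit per microsecond, so bits / Mbps gives microseconds.
    echo $(( buf_bytes * 8 / (in_mbps - out_mbps) ))
}

fill_time_us 1000 100 131072   # prints 1165 (microseconds, integer division)
```

With numbers in that range the queue is full in about a millisecond, after which the switch has to either drop the excess or, with flow control on, pause the sender.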
I'm not sure if there's an easy way to turn off flow control on all ports in OpenWrt, but it should probably be disabled by default.
Edit: I've added this to /etc/rc.local (System - Startup in LuCI) on my E8450 to do this on all ports at startup; not sure if there is a better way to do it. This obviously requires the ethtool package to be installed:
ethtool -A lan1 autoneg off rx off tx off
ethtool -A lan2 autoneg off rx off tx off
ethtool -A lan3 autoneg off rx off tx off
ethtool -A lan4 autoneg off rx off tx off
ethtool -A wan autoneg off rx off tx off
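The same five lines can also be written as a loop, which is a little easier to adapt to routers with different port names. This is only a sketch: `disable_pause` and the `ETHTOOL` override variable are names made up for this example, and the port names are device specific (`ip link` lists them).

```shell
# Sketch for /etc/rc.local: turn off pause frames on a list of ports.
# ETHTOOL can be overridden (e.g. ETHTOOL=echo) for a dry run; that
# variable and the function name are inventions of this example.
disable_pause() {
    for port in "$@"; do
        "${ETHTOOL:-ethtool}" -A "$port" autoneg off rx off tx off ||
            echo "warning: ethtool exited non-zero for $port" >&2
    done
}

# On the E8450 the call would be:
#   disable_pause lan1 lan2 lan3 lan4 wan
```

Setting `ETHTOOL=echo` before calling the function prints the arguments that would be passed instead of running ethtool, which is handy for checking the port list first.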
Would UDP traffic swamp the switch buffer causing the same issue?
Yes, if something blasts UDP at a device faster than its link can handle it, the switch will end up dropping some packets. In that case it's up to the application layer to figure out it is sending too fast.
The problem with flow control is it doesn't just make the slow device suffer with packet loss, but makes everyone else suffer because their traffic is getting stalled out, potentially even when the slow device is not involved with those streams.
On the E8450 and RT3200, the CPU port into the switch also has flow control enabled. Though it is running at 2.5 Gbps, so it's less likely to run into problems - but it could still be an issue depending on how exactly the switch manages flow control. It appears that can't be controlled using ethtool because it's not auto-negotiated, but specified in the device tree in the fixed-link settings between the CPU and the switch, so it would likely need to be changed there.
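To see what the CPU-port link is set to on a running system, the live device tree can be inspected. The sketch below assumes the standard `fixed-link` binding, where the presence of a `pause` property in the node enables flow control, and the usual `/sys/firmware/devicetree/base` location. The function name is made up, and I have not verified the node layout on the E8450 or RT3200.

```shell
# List every fixed-link node in the live device tree and report whether
# it carries the boolean "pause" property. The root path is a parameter
# so the function can also be pointed at a copy of the tree.
list_fixed_link_pause() {
    dt_root="${1:-/sys/firmware/devicetree/base}"
    find "$dt_root" -type d -name fixed-link 2>/dev/null |
    while read -r node; do
        if [ -e "$node/pause" ]; then
            echo "$node: pause set"
        else
            echo "$node: pause not set"
        fi
    done
}

# On the router:
#   list_fixed_link_pause
```

This only reads the setting. Changing it would still mean editing the device tree source and rebuilding the image, as described above.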
Maybe a router in between the main router and the 100 Mbps devices, with the 100 Mbps devices in a different subnet, could work around the issue that @sppmaster is facing? I figured the router should have sufficient memory and capabilities to throttle the network traffic enough not to swamp the switches connected to the 100 Mbps devices.
A router in between would probably work, but would likely be overkill. All that would really be doing is breaking the flow control pause frame propagation between the router's switch and the downstream switch, and just disabling flow control on the router ports would achieve the same thing.
I looked at the programming guide for the MT7531 switch used on the E8450 and RT3200, but it doesn't really describe how its flow control algorithm works other than what bits are used to turn it on and off. However it's probably safe to say that in most of these basic switch devices in home routers, it is pretty dumb and will result in pause frames being sent on upstream ports when a downstream port's transmit queue fills up - which could be due to it being a slower link speed, or due to it being connected to another switch which is sending out pause frames of its own. You can end up with one slow device having unnecessary ripple effects across the entire network.
Ethernet flow control should really only be enabled in specific cases where you know all devices in use are going to handle it intelligently, and there are no slow devices on the network to clog up the works. On a typical home network it's just likely to cause issues.
Thanks for your comments.
I've deployed my R7800 back in service now.
On it I can only get info from ethtool about eth0 and eth1
I cannot use LAN1, 2, etc.
R7800 currently is still using swconfig for the switches. Typically eth1 is the LAN switch port GMAC, so disable flow control for that.
root@R7800:~# ethtool -A eth1 autoneg off rx off tx off
rx unmodified, ignoring
tx unmodified, ignoring
The command is not accepted.
root@R7800:~# ethtool --show-pause eth1
Pause parameters for eth1:
Autonegotiate: on
RX: off
TX: off
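As far as I can tell, the "unmodified, ignoring" messages mean the requested rx/tx values already matched the current ones, which fits the RX: off / TX: off shown by --show-pause; only Autonegotiate is still on. A small sketch for pulling those three fields out of the output so they can be checked in a script (the function name `pause_state` is made up for this example):

```shell
# Read `ethtool --show-pause <dev>` output on stdin and print
# "<autoneg> <rx> <tx>" on one line, e.g. "on off off".
# Exact field matching skips lines such as "RX negotiated".
pause_state() {
    awk -F':[ \t]*' '
        $1 == "Autonegotiate" { a = $2 }
        $1 == "RX"            { r = $2 }
        $1 == "TX"            { t = $2 }
        END                   { print a, r, t }
    '
}

# On the router:
#   ethtool --show-pause eth1 | pause_state
```

This only reports what the eth1 GMAC says about itself. It does not show what the individual switch ports behind it are doing.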
Are you running your R7800 on any of the NSS accelerated builds? If you are, and hopefully the SSDK Shell is also compiled in, you can use the ssdk_sh utility to change just about any of the qca8337 switch parameters.
I'm currently using ACwifidude NSS build.