10Mb client to a cheap 1Gb switch => HOL blocking?

Dear everyone,

Apologies for an OT question. This is only marginally related to OpenWRT, but it is closely related to cheap switch hardware. For lack of a better forum, I'm asking here, because this place is blessed with people who get their hands dirty with this tech.

This hasn't happened to me personally; I was asked this question in another forum... but I'm very curious whether there is an explanation, or at least a confirmation of this issue.
The problem can be formulated as follows:

Consider a cheap dumb gigabit SoHo switch, or a WiFi AP with a small switch subsystem built in. Under normal circumstances, the switch forwards at gigabit wire speed, and the router CPU can handle several hundred Mbps.
The switch is in operation, everything works, users are doing their internet stuff. If upstream bandwidth allows, hundreds of Mbps are flying.
And then suddenly, someone plugs in an Ethernet client device that links at 10 Mbps. In my particular example, a computer gets shut down or suspended, with its LAN port kept powered and linked for WoL; the NIC merely drops to 10 Mbps to save standby power.
The moment a port on the switch links at 10 Mbps, everybody else's TRAFFIC SLOWS DOWN to a 10 Mbps crawl.

There's no significant multicast volume, the sleeping PC isn't sending anything on its NIC, and there's no apparent reason for the 10 Mbps port to trigger 802.3x flow control or head-of-line blocking.

I didn't want to believe it, and had no explanation (except that it smells of 802.3x flow control), but then I found another report of this.

The fresh case reported to me involves a Mikrotik hAP AX3, based on the IPQ6010; I'm not sure whether the switch is integrated in the SoC.

The behavior doesn't make sense to me unless it's a bug or a serious deficiency in the switch silicon. Per-port egress queuing should take care of this: frames waiting for the slow port sit in that port's own queue and shouldn't delay frames headed to other ports. Most switch chipset datasheets claim that the switching matrix is non-blocking. I'm not sure about the IPQ6010.
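To make the per-port egress queuing argument concrete, here is a toy model in Python. It is only an illustration of head-of-line blocking, not a model of how the IPQ6010 or any real switch ASIC is built. It compares a single shared FIFO, where a frame waiting for the 10 Mbps port blocks every frame queued behind it, against independent per-port egress queues. The port names, the 1500-byte frame size and the traffic mix (one frame for the slow port out of every ten) are made-up assumptions.

```python
# Toy model: shared FIFO (head-of-line blocking) vs. per-port egress queues.
# Port names, link rates, frame size and traffic mix are hypothetical assumptions.

LINK_MBPS = {"fast": 1000, "slow": 10}  # egress link rates in Mbit/s
FRAME_BITS = 1500 * 8  # assume every frame is 1500 bytes


def tx_time_us(port: str) -> float:
    """Serialization time of one frame in microseconds (bits / Mbit/s = us)."""
    return FRAME_BITS / LINK_MBPS[port]


def fast_port_mbps(frames: list[str], shared_fifo: bool) -> float:
    """Throughput seen by the 'fast' port for a burst of frames that are all
    queued at time 0, in the given order."""
    free_at = {port: 0.0 for port in LINK_MBPS}  # when each egress port is idle again
    last_dispatch = 0.0  # when the previous frame left the shared FIFO
    fast_bits, fast_done_us = 0, 0.0
    for port in frames:
        start = free_at[port]
        if shared_fifo:
            # Strict FIFO order: a frame cannot leave before the one ahead of it,
            # so a frame stuck waiting for the slow port blocks everything behind it.
            start = max(start, last_dispatch)
            last_dispatch = start
        free_at[port] = start + tx_time_us(port)
        if port == "fast":
            fast_bits += FRAME_BITS
            fast_done_us = free_at[port]
    return fast_bits / fast_done_us  # bits per microsecond == Mbit/s


# One frame for the slow port in every ten (hypothetical traffic mix).
pattern = (["fast"] * 9 + ["slow"]) * 10
print(fast_port_mbps(pattern, shared_fifo=False))  # 1000.0
print(round(fast_port_mbps(pattern, shared_fifo=True), 1))  # 110.2
```

With independent queues the fast port keeps its full 1000 Mbps; with the shared FIFO it drops to about 110 Mbps in this pattern, because each frame for the slow port occupies that port for 1.2 ms instead of 12 µs at 1 Gbps. Note that in this model the blocking only happens if frames are actually sent to the slow port (e.g. flooded broadcasts or unknown unicast); a port that receives nothing stalls nothing.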

Has anyone run into this before?
Any ideas are welcome :slight_smile:

I don't need workarounds; I can come up with some myself.
I would just appreciate confirmation of the issue, if it rings a bell.
Thanks for your attention :slight_smile:

There have been two other recent threads on this general topic... It seems like a misconfiguration of the switch chip.

Thank you for confirming that this is real, and for such a polite response from a moderator to an OT question :slight_smile:

The symptoms make me wonder whether this "feature" was intended to be somehow useful in practical devices (analogous to the stolen VLAN in the IPQ4018), whether this "innovation" could be contagious across vendors, or whether it's a default in the chipmaker's firmware or default config that no one cares much about... Or whether it's just a sloppy silicon design shortcut (buffering arrangement?) that cannot run different ports at independent rates.

Regarding one of the other threads: yes, I did suggest to my "other original poster" that they turn off anything on the switch that smells of 802.3x flow control. Reportedly, turning it off on the culprit port doesn't help. There are also anecdotal reports claiming that the effect can even spread between boxes: if you add an external switch and plug the slow device into it, the symptom still shows up on the WiFi AP. I wasn't able to verify whether disabling 802.3x helps in such scenarios. (I guess 802.3x is the only way this could spread between switch matrices.) So one possible workaround may be "insert a gigabit switch that doesn't do this" :slight_smile:
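Since 802.3x keeps coming up, here is a small Python sketch of what a PAUSE frame looks like and how long it can stall a link. The destination address, EtherType, opcode and 512-bit-time quantum are the standard MAC Control PAUSE values; the source MAC and the helper names are made up, and the code only builds bytes, it doesn't send anything. Two properties matter here: a PAUSE frame stops all traffic the peer sends on that link, regardless of destination, and its duration is counted in 512-bit-time quanta, so the same pause_time lasts 100 times longer on a 10 Mbps link than on a 1 Gbps one.

```python
import struct

# Standard IEEE 802.3 MAC Control PAUSE values (originally 802.3x).
PAUSE_DST = bytes.fromhex("0180c2000001")  # reserved multicast destination
MAC_CONTROL_ETHERTYPE = 0x8808
PAUSE_OPCODE = 0x0001
QUANTUM_BIT_TIMES = 512  # one pause_time unit = 512 bit times
MIN_FRAME_NO_FCS = 60  # 64-byte minimum frame minus the 4-byte FCS


def build_pause_frame(src_mac: bytes, pause_time: int) -> bytes:
    """Build a PAUSE frame (without FCS) asking the link peer to stop sending."""
    if len(src_mac) != 6 or not 0 <= pause_time <= 0xFFFF:
        raise ValueError("need a 6-byte MAC and a 16-bit pause_time")
    header = PAUSE_DST + src_mac + struct.pack("!H", MAC_CONTROL_ETHERTYPE)
    body = struct.pack("!HH", PAUSE_OPCODE, pause_time)
    return (header + body).ljust(MIN_FRAME_NO_FCS, b"\x00")  # zero padding


def pause_seconds(pause_time: int, link_mbps: float) -> float:
    """Wall-clock duration of a pause request on a link of the given speed."""
    return pause_time * QUANTUM_BIT_TIMES / (link_mbps * 1e6)


# Hypothetical locally administered source MAC, maximum pause_time.
frame = build_pause_frame(bytes.fromhex("020000000001"), 0xFFFF)
print(len(frame), frame[12:16].hex())  # 60 88080001
for mbps in (10, 1000):
    print(f"{mbps} Mbps: {pause_seconds(0xFFFF, mbps) * 1e3:.1f} ms")
```

If a switch turned congestion on the slow port into PAUSE frames on its uplink, everything on that uplink would stall, which would be one way for the effect to spread from an external switch to the AP as the reports describe. That is a hypothesis; this code doesn't verify it.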

I don't know what is happening under the hood, but I do know that the generally accepted and expected behavior of modern switches is that each port negotiates its speed with its own peer, and slower ports do not impact faster ports. That is what non-blocking behavior means.

Honestly, this is the best I can offer at this point. :laughing:

On the other side of this, although I don't have any useful information or suggestions (beyond using another gigabit switch), you could look at the host that drops to 10 Mbps and see if you can get its port to behave better. Presumably that means disabling the Ethernet energy-saving features and/or having the Ethernet port turn off entirely when the host goes to sleep.