Hello,
I have a pretty weird issue.
I installed OpenWrt (19.07.6) on a TL-WDR3600 v1.4. All went fine until I used it to replace my current router.
When I start uploading something big to the internet, soon after the upload starts the router begins flooding ALL network ports (including WAN) with pause frames, and it never stops until I unplug the WAN port.
When I unplug it, the flood stops, and if I try the upload again, the flooding starts again.
This effectively disables my entire network, and it also affects the network in front of the router. Basically everything is flooded with pause frames and freezes.
I assumed faulty hardware and tried another TL-WDR3600, also v1.4: same behavior.
Then I suspected the vendor and flashed a MikroTik RB951G-2HnD, which has basically the same Atheros chipset: still the same behavior.
I have the radios switched off and am using only cables.
Can anyone give me some clues as to what might be causing the problem?
I currently have a TL-WDR3600 v1.1 running a pretty old version of dd-wrt (from 2014), and I have no such issues with it. I also never tested with it, because I do not want to break the only working router.
Hello,
Thank you for your reply. Some of your questions are answered in the initial description, but I agree it could have been more clear.
Here is how the network works
While the router is flooding pause frames, both IPTV boxes also get disconnected and the LAN has no connectivity to the router (no ping, no ARP resolution), because the router is flooding pause frames on all its ports.
Unplugging the WAN port of the router for a few seconds stops the pause frame flood and everything goes back to normal, until the next time I try to upload something to the internet.
To be honest, I did not spend a lot of time investigating whether downloads have a similar effect.
At first I thought that the ISP is sending me these frames, but today I tested the following:
Hooked a laptop into the unmanaged L2 switch and started sniffing.
Reproduced the issue.
Unplugged everything from the L2 switch, leaving only the sniffing laptop and the router. The frames continue to flow without interruption until I unplug the WAN port of the router for a few seconds.
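For anyone repeating this capture, here is a minimal sketch of how a raw Ethernet frame can be checked for being a pause frame and which station claims to have sent it. It relies on the standard IEEE 802.3x layout (MAC Control EtherType 0x8808, PAUSE opcode 0x0001, usually sent to 01:80:C2:00:00:01). The function name `parse_pause_frame` and the sample MAC address are made up for illustration. The source MAC it returns is probably the most useful field here, since it should show whether the frames originate from the router or from something else.

```python
import struct

# Values from IEEE 802.3x (MAC Control / PAUSE).
MAC_CONTROL_ETHERTYPE = 0x8808
PAUSE_OPCODE = 0x0001
PAUSE_MULTICAST_DST = bytes.fromhex("0180c2000001")


def parse_pause_frame(frame):
    """Return (src_mac, pause_quanta, to_reserved_multicast) for a PAUSE
    frame, or None if the frame is anything else.

    `frame` is the raw Ethernet frame starting at the destination MAC
    (no preamble). Hypothetical helper, not part of any library.
    """
    if len(frame) < 18:
        return None
    dst, src = frame[0:6], frame[6:12]
    ethertype, opcode, quanta = struct.unpack("!HHH", frame[12:18])
    if ethertype != MAC_CONTROL_ETHERTYPE or opcode != PAUSE_OPCODE:
        return None
    src_mac = ":".join("%02x" % b for b in src)
    return src_mac, quanta, dst == PAUSE_MULTICAST_DST


# Example: a hand-built PAUSE frame from a made-up sender 00:11:22:33:44:55.
sample = (PAUSE_MULTICAST_DST + bytes.fromhex("001122334455")
          + struct.pack("!HHH", 0x8808, 0x0001, 0xFFFF) + bytes(42))
print(parse_pause_frame(sample))
```

The pause quanta value is expressed in units of 512 bit times; 0xFFFF asks the peer to stop for the maximum duration, and 0 cancels a previous pause.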
Ok, that's useful debugging. Next question: does the problem happen if you bypass the unmanaged switch and connect the router directly to the ISP?
I suspect an interaction between the unmanaged switch and the router's built-in switch is causing the pause frame flood. Without the unmanaged switch I expect it to go away. I'm not sure what the long-term solution is, but it would be good to do this test.
Hello,
I did the following to test with the L2 switch removed:
Configured 3 ports, including the WAN port in the same VLAN
Connected ISP + both STBs into this vlan, effectively removing the L2 switch
Ran the same test, and here it got even weirder. The STBs again got disconnected, complaining that they had no connection, while the internet traffic continued flowing.
I did not sniff the traffic with this config, but I can give it another try if you think this can help.
The IPTV boxes work until I trigger the weird condition. If you think it can help, I can reproduce it and sniff the traffic before and after the router, so we can be sure what is happening.
Ok I've been thinking about what is going on here, and here's my working hypothesis:
There's nothing on the OpenWrt router which would normally generate pause frames unless you somehow enable that or there's a driver issue in this hardware (I actually have/had this hardware and never experienced this so ... I think it's not a problem with the OpenWrt unless it's introduced recently).
So I think what's going on is:
The ISP router / device is generating pause frames
The noncompliant unmanaged switch is flooding them to every port
The OpenWrt switch is maybe also noncompliant and flooding to every port....
The unmanaged switch is receiving the flood from the OpenWrt and.... flooding to every port
Lather rinse repeat
In the absence of the unmanaged switch... you may still get a pause frame and it may still flood to every port on the OpenWrt but none of the devices it floods to will reflect it back to the OpenWrt so you will probably not get a continuous flood-loop.
Is that at all consistent with what you see going on?
I trigger the condition with an upload to the internet.
The switch is 10/100 TP-LINK.
Now the test:
Here is the topology:
When I start heavy uploading towards internet, the upload goes fine.
The TV looks like it is working, but if I start playing with the STBs they do not respond to commands. If I restart them, they are unable to connect anymore.
I sniffed on the LAN side, nothing unusual.
On the WAN side, however, the situation is as follows: every 5-10 seconds I see a burst of pause frames (around 200-300 packets).
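To put numbers on those bursts, the capture timestamps of the pause frames can be grouped wherever the gap between two consecutive frames is small. The following is only a sketch: the function name `group_bursts` and the 1-second gap threshold are assumptions, and the timestamps in the example are invented, not taken from the actual capture.

```python
def group_bursts(timestamps, max_gap=1.0):
    """Group frame timestamps (seconds) into bursts.

    Two consecutive frames belong to the same burst when they are at most
    `max_gap` seconds apart. Returns a list of (start, end, frame_count).
    """
    bursts = []
    for t in sorted(timestamps):
        if bursts and t - bursts[-1][-1] <= max_gap:
            bursts[-1].append(t)   # still inside the current burst
        else:
            bursts.append([t])     # gap too large: start a new burst
    return [(b[0], b[-1], len(b)) for b in bursts]


# Invented example: three frames close together, then two more 7 s later.
print(group_bursts([0.0, 0.1, 0.2, 7.0, 7.05]))
```

Comparing the burst start times with the moments the upload saturates the link might show whether the bursts track the upload or arrive on their own schedule.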
ok, so it's probably old enough that it doesn't comply with pause frame requirements. This suggests why it's flooding.
All the stuff you've said suggests that my suggested mechanism is probably the one at work. The thing to figure out is why the ISP device is sending pause frames. Can you configure that ISP device at all and turn off ethernet flow control?
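On a Linux machine where the NIC driver supports it, flow control can be inspected with `ethtool -a <iface>` and changed with `ethtool -A <iface> ...`. Below is a small sketch that builds and optionally runs that command. The helper names are made up, `eth0` is only a placeholder, and this is an assumption-laden illustration: it obviously cannot change the ISP device, and on routers like these the built-in switch ports may not be reachable through ethtool at all.

```python
import shutil
import subprocess


def pause_off_command(iface):
    """Build the ethtool argv that turns off pause autonegotiation and
    RX/TX pause on `iface` (hypothetical helper)."""
    return ["ethtool", "-A", iface, "autoneg", "off", "rx", "off", "tx", "off"]


def disable_flow_control(iface):
    """Run the command; return True on success, False if ethtool is
    missing or the driver rejects the request. Needs root privileges."""
    if shutil.which("ethtool") is None:
        return False
    result = subprocess.run(pause_off_command(iface),
                            capture_output=True, text=True)
    return result.returncode == 0


# Only prints the command; call disable_flow_control("eth0") to apply it.
print(" ".join(pause_off_command("eth0")))
```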
put it in place of your previous unmanaged switch. This switch is suitable for use on the WAN side of your router because you can place its management interface on a different VLAN and it will respect that VLAN setting. The other candidate, a TP-Link SG108E, has an intentional flaw: it listens for untagged packets on any port.
I think that the frames are being generated and sent by my router and not by the ISP. Remember the test with the 10/100 switch in front: I was able to see the pause frames with everything but my laptop and the router disconnected. Unless it somehow loops back on itself, I still think that the router is generating the packets.
I have no access to the ISP device and cannot configure it.
Regarding the switch, I will probably borrow a Juniper from the office, just for the test, and see what happens.
I am not sure what you mean, but if I am not mistaken I think that they use Huawei on the other side of the connection.
After all, nothing explains why I have no such issues when I use my current router, the one I am trying to replace.
We can assume a faulty 10/100 switch, bad behavior on the ISP side and so on, but all of these problems disappear when I put my current TL-WDR3600 v1.1 in place, running dd-wrt with kernel 3.10.37. It just works.
The mechanism of the 10/100 switch flooding, and also the OpenWrt flooding causes the flood-loop which locks up your network.
If you have only one device, you don't get that flood loop. So yes, the OpenWrt is non-compliant and flooding the packets (maybe, did you see them on the LAN of your OpenWrt?) but without that second switch in place you don't get the continuous flood-loop.
Who is generating them? I still think it's the ISP. Without the second switch in place, the OpenWrt may flood them, but it doesn't loop back and therefore there's no infinite pause.