LAN stops working every now and then

That doesn't seem to be quite the problem I'm having. When I'm connected via Ethernet, I can't ping the router or any other device. The WAN interface also fails to get an IP address, and a wired client can't get an IP from DHCP. WiFi works fine: I can ping other devices (except the Ethernet one), access the router through SSH, etc.

I don't think my router ever recovered on its own, but I couldn't afford to wait 20 minutes to get my network back online.

Do you use DNSCrypt-Proxy?

No, I don't.

If I were you, I would check whether this is just a DNS problem. Have you tried pinging IP addresses directly, such as Google's DNS server (8.8.8.8)?

If you haven't, you should try it the next time this happens.

Does renewing a lease cause a client to lose its connection (say, would a Roku stop streaming)?

I haven't noticed the "hang" being related to DHCP.
@lacamester said his DHCP service (dnsmasq) was disabled entirely. A DHCP problem also wouldn't explain the issue with the WAN port. If this were a DHCP issue, only some computers would have lost their connection, since their leases expire at different times, and Internet over WiFi would be working fine. The WiFi clients are unaffected.

My SmokePing has now been running for 4 days without a single lost packet on the LAN. I can also say for sure that I have never had an outage lasting more than 10 to 15 minutes, because one of my monitoring servers is behind the router. So I'm afraid I can't help debug this problem; my router doesn't seem to be affected.

Update 2017-05-24 12:28: Added specs:

I am running LEDE Reboot 17.01.1 r3316-7eb58cf109 with essentially default settings, apart from the following: the LAN is 10.0.0.0/24, the router has static IPs, the NAT leakage workaround is applied, and WLAN, odhcpd, and some IPv6 features are disabled.

Can you tell me which build you are running, and whether you use SQM by any chance?

Sorry, I should have. I've edited my original post. I run the router without any added packages and without special QoS.

So, I've been using the latest TP-Link "Portugal" firmware for more than a week now... and have had no issues with it, no LAN/WAN stops whatsoever. It seems like it's a firmware issue after all (if it stays stable for a month, I'll use it permanently). :slight_smile:

But how would that affect LEDE, then?
Would you mind flashing LEDE onto your device again after some time to check whether the issue has been resolved there too? Maybe there was an issue with U-Boot or something?

I'm thinking of a kernel patch or some device-driver fixes (according to the log entries, TP-Link uses the same GNU tools and daemons as LEDE/*WRT).

I had an issue when I installed a brand-new v4 with LEDE in a datacenter as a firewall between the servers and the rest of the network.
With default settings, it did not pass any traffic towards the WAN port (if I remember correctly, the RX packet counter was growing but TX wasn't, or maybe the other way around).
Anyway, after I lowered the MTU to 1400, it immediately started working properly.
Maybe you could give that a try.

How do I lower MTU in LEDE?

config interface wan
        option ifname    'eth0.2'
        option proto     'static'
        option mtu       '1400'
        list ipaddr      'xxx/24'
        list ipaddr      'yyy/32'
        option gateway   'zzz'
        list dns         'aaa'
        list dns         'bbb'

  1. Are you using any USB devices with your v4?
  2. Are you using SQM/QoS?
  3. How fast is your Internet connection?

Also, do you think you could at some point try LEDE 17.01.1 (or the soon-to-be-released 17.01.2) after flashing the Portugal firmware? I wonder whether the Portugal firmware changed some parts of the flash memory that LEDE doesn't touch, and that this is what fixed the problem. I would try this myself, but I don't have access to my v4 for now.

Everyone who's having this issue, please add your vote here, in the bug report.
Thank you.

https://bugs.lede-project.org/index.php?do=details&task_id=762

On my 17.01.1 ar71xx-based build, I was having similar events where my static connection became unresponsive, including pings to 8.8.8.8 and to the router itself (192.168.2.1). On OpenBSD, if I took my network connection down and brought it back up, the connection would be restored.

I was not using IPv6, and I disabled all the IPv6 check boxes I could find in LuCI. At the same time, I specified DNS servers per @mike's advice here. I have not had an unresponsive connection in 3 days. I think the improvement is likely due to the IPv6 changes; if others can replicate this, it should help identify the problem.

I had been using OpenWrt Chaos Calmer 15.05.1 on a TP-Link TL-WR1043N/ND v2 for a long time. Since LEDE is a successor of OpenWrt, I tried the latest stable build, LEDE 17.01.2, and almost immediately hit this bug. The built-in switch fails on all Ethernet ports about every 10 minutes, so there is no WAN connection and no LAN connection. The only way out is to reload the switch configuration over Wi-Fi or to power-cycle the whole device.

OpenWrt on the WR1043ND v4 doesn't really work: there is no 15.05.1 release for it, only trunk.

Can you try flashing the original firmware (the Portugal version) and then LEDE?
Can you also provide a list of the custom packages you installed on the router? I have never had the switch fail so soon after flashing LEDE.

As a follow-up: my static NIC connections used to become unresponsive on a daily basis. When I disabled IPv6 and manually set my WAN DNS servers, the problem went away. I have not had an unresponsive connection in over 2 weeks. I'm running 17.01.2/ar71xx on a TRENDnet TEW-732BR.
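For anyone wanting to try the same workaround from the command line rather than LuCI, here is a minimal sketch of what "disable IPv6 and set WAN DNS servers manually" might look like in the UCI config files. It is not taken from the posters' routers: it assumes the default LEDE interface names `wan`, `wan6` and `lan`, a DHCP-configured WAN on `eth0.2` (as in the MTU example above), and uses Google Public DNS purely as example resolvers.

```
# /etc/config/network (excerpt), hypothetical example: adjust names and devices to your board
config interface 'wan'
        option ifname    'eth0.2'
        option proto     'dhcp'
        # ignore DNS servers handed out by the upstream DHCP server
        option peerdns   '0'
        # example resolvers (Google Public DNS), use your own choice
        list dns         '8.8.8.8'
        list dns         '8.8.4.4'

config interface 'wan6'
        option ifname    'eth0.2'
        option proto     'dhcpv6'
        # do not bring up the IPv6 WAN interface at boot
        option auto      '0'

# /etc/config/dhcp (excerpt): turn off odhcpd's IPv6 services on the LAN,
# leaving the other options in this section unchanged
config dhcp 'lan'
        option interface 'lan'
        option ra        'disabled'
        option dhcpv6    'disabled'
```

After editing, `/etc/init.d/network restart` and `/etc/init.d/odhcpd restart` should apply the changes; doing the same through LuCI writes equivalent options.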