Archer AC1750 Dropping Connection

The attached image shows the current network topology I have configured at home.


For reasons beyond my grasp, the TP-Link AC1750 periodically loses its internet connection.

During these outages LuCI remains responsive. On the overview page the IPv4 and IPv6 links appear to be active. Traffic on the various interfaces drops but the CPU load stays up.

Since everything looks normal in the UI, I'm not sure what kind of errors to look for or how to begin troubleshooting the problem. I've already re-flashed OpenWrt.

I've skimmed the system log looking for clues but I'm not really sure how to read it. I'm happy to post bits here if it will be of help.

Other notes:
The same router (running a slightly older version of OpenWrt) was previously in use with a DSL modem with no issues.
I applied SQM following the instructions at https://openwrt.org/docs/guide-user/network/traffic-shaping/sqm (this caused a 10 Mbps slowdown of my downstream speed and a commensurate increase in my upstream, but that's a topic for another thread).


What does the syslog say when it happens? Which OpenWrt version?

I haven't been around to see exactly when it happens. I estimated the last dropout time and checked the log for clues but the same messages that appear around then seem to appear at times when the connection was stable.

The Archer AC1750 is running 19.07.4. I'm trying to update my post with a new image but the option seems to have vanished.

To try and identify the moment of disconnect I did the following:

  • Restored the router to stock using firstboot && reboot now
  • Installed luci-app-statistics to log bandwidth usage
  • Began writing System Log to a USB drive plugged into the router.
    I'll post the log file and a screenshot of the RRDTOOL graphs when I capture something that looks relevant.

So far I've been unable to correlate an outage to a System or Kernel log entry.

But I have observed that during an outage there is still a lot of activity on eth0.1 but not very much on br-lan.
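
A rough way to put numbers on that difference is to compare two snapshots of the kernel's per-interface byte counters taken during an outage. The sketch below assumes the output of `cat /proc/net/dev` has been saved twice, a known number of seconds apart (for example over SSH), and that the script runs on any machine with Python 3. The function names `parse_net_dev` and `byte_rates` are hypothetical helpers, not part of OpenWrt or LuCI.

```python
def parse_net_dev(text):
    """Map interface name -> (rx_bytes, tx_bytes) from /proc/net/dev content."""
    counters = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain no colon
        name, _, rest = line.partition(":")
        fields = rest.split()
        if len(fields) < 16:
            continue  # not a complete counter row
        # Column 0 is received bytes, column 8 is transmitted bytes.
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def byte_rates(before, after, seconds):
    """Return interface -> (rx_bytes_per_s, tx_bytes_per_s) between two snapshots."""
    rates = {}
    for name, (rx1, tx1) in before.items():
        if name not in after:
            continue  # interface disappeared between snapshots
        rx2, tx2 = after[name]
        rates[name] = ((rx2 - rx1) / seconds, (tx2 - tx1) / seconds)
    return rates


# Example use, assuming two saved snapshots taken 10 seconds apart:
# with open("snap1.txt") as a, open("snap2.txt") as b:
#     rates = byte_rates(parse_net_dev(a.read()), parse_net_dev(b.read()), 10)
# for name in ("eth0.1", "br-lan", "lo"):
#     print(name, rates.get(name))
```

Comparing eth0.1, br-lan and lo side by side this way gives actual rates per interface instead of an impression from the graphs.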

How would I proceed if the logs contained no useful information in diagnosing the disconnect? As I mentioned earlier, during an outage the overview page shows no change to the WAN connections.

Still multiple dropouts every day.
Just now I noticed a large gap (about 10 minutes) in the system log around the time a dropout occurs. This was reflected in the log file on the USB stick.

The most recent dropout occurred at around 21:40 today. Here are the surrounding logs:

Wed Dec  9 20:34:09 2020 daemon.info dnsmasq-dhcp[18776]: DHCPREQUEST(br-lan) 192.168.1.222 xx:xx:xx:xx:xx:xx
Wed Dec  9 20:34:09 2020 daemon.info dnsmasq-dhcp[18776]: DHCPACK(br-lan) 192.168.1.222 xx:xx:xx:xx:xx:xx P50
Wed Dec  9 21:44:56 2020 daemon.err collectd[1497]: Exiting normally.
Wed Dec  9 21:44:56 2020 daemon.err collectd[1497]: collectd: Stopping 2 read threads.

Note no activity recorded between 21:34 and 21:44
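
Spotting gaps like this by eye is tedious, so here is a minimal sketch that scans a saved copy of the log and reports any silence longer than a threshold. It assumes every entry starts with a timestamp in the format shown above (e.g. `Wed Dec  9 20:34:09 2020`) and that English day and month names are in use. `parse_stamp` and `find_gaps` are hypothetical names chosen for illustration.

```python
from datetime import datetime, timedelta

STAMP_FORMAT = "%a %b %d %H:%M:%S %Y"


def parse_stamp(line):
    """Return the timestamp at the start of a log line, or None if there is none."""
    fields = line.split()
    if len(fields) < 5:
        return None
    try:
        # Re-joining with single spaces normalises the padded day ("Dec  9").
        return datetime.strptime(" ".join(fields[:5]), STAMP_FORMAT)
    except ValueError:
        return None


def find_gaps(lines, threshold=timedelta(minutes=5)):
    """Return (previous, current, delta) for consecutive entries further apart than threshold."""
    gaps = []
    previous = None
    for line in lines:
        current = parse_stamp(line)
        if current is None:
            continue  # skip lines without a recognisable timestamp
        if previous is not None and current - previous > threshold:
            gaps.append((previous, current, current - previous))
        previous = current
    return gaps


# Example use on the log copy kept on the USB stick:
# with open("/mnt/sda1/syslog.txt") as log:
#     for before, after, delta in find_gaps(log):
#         print(f"silent for {delta} between {before} and {after}")
```

Only forward gaps are reported. A clock that jumps backwards, for instance after a reboot before the time is synchronised, would not be flagged by this sketch.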

During outages OpenWrt's Ping, Traceroute and Nslookup diagnostics all fail.
The overview page, however, shows that the WAN connection is still up, and while I do see dropouts on the realtime graphs and RRDTOOL statistics page, LuCI behaves normally.
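
Since the router's own UI does not flag the outage, one option is to timestamp it from a wired LAN client instead. The following is only a sketch under several assumptions: the router answers on 192.168.1.1 port 80 (only the DHCP range 192.168.1.100 to 192.168.1.249 appears in the logs, so the router address is a guess), and the ISP resolver 206.248.154.22 from the syslog accepts TCP connections on port 53. `is_reachable` and `watch` are hypothetical names.

```python
import socket
import time
from datetime import datetime


def is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def watch(targets, checks, interval=10.0, probe=is_reachable, sleep=time.sleep):
    """Probe every target `checks` times and record each change of state.

    targets maps a label to a (host, port) pair. The first observation of
    each target counts as a change, so the initial state is logged too.
    """
    last = {}
    events = []
    for _ in range(checks):
        for name, (host, port) in targets.items():
            up = probe(host, port)
            if last.get(name) != up:
                stamp = datetime.now()
                events.append((stamp, name, up))
                print(f"{stamp:%Y-%m-%d %H:%M:%S} {name} {'UP' if up else 'DOWN'}")
                last[name] = up
        sleep(interval)
    return events


# Example (addresses and ports are assumptions, adjust to the actual network):
# targets = {
#     "router": ("192.168.1.1", 80),       # LuCI web UI
#     "isp-dns": ("206.248.154.22", 53),   # resolver listed in the syslog
# }
# watch(targets, checks=8640, interval=10)  # roughly 24 hours
```

If "isp-dns" goes DOWN while "router" stays UP, the timestamp marks the start of the outage to within one interval and can be matched against the syslog and the RRDTOOL graphs.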

Could this be caused by malware on one of the network devices?

I think luci-app-statistics is insufficient for diagnosing the issue.
I'd like to monitor each individual device on the network.
I once tried and failed to get YAMon running. Can anyone recommend another per-client monitoring extension?

Just now I returned to my computer to find it without internet again.
When I opened LuCI and went to check the RRDTOOL stats, the Graphs page was not displaying the tabs for the various reports. I logged out and back in, refreshed the page. Nada. I restarted the router and the tabs reappeared but there was no activity in the graphs. :roll_eyes:

I checked the RRDTOOL storage folder, /mnt/sda1/RRDTOOL, and all the log files seem to be intact with recent time stamps but the graph tool won't pull any records except for the few minutes that elapsed in the time since I rebooted the router.

Moreover, for reasons I have yet to understand, the syslog is missing all of yesterday (Friday) and has entries for today starting at 14:15 (it's 14:45 as of this writing).

I checked my log backup file stored in /mnt/sda1/syslog.txt and it contains the same data as the live report. I assumed that this file gets appended and not overwritten, but I guess that's not the case...?

It really feels like someone is messing with me.

This is becoming quite the exercise in frustration.

Here's a recent sys log file (MAC addresses redacted):

Sat Dec 12 14:14:20 2020 daemon.info dnsmasq[1096]: exiting on receipt of SIGTERM
Sat Dec 12 14:14:20 2020 daemon.info dnsmasq[1783]: started, version 2.80 cachesize 150
Sat Dec 12 14:14:20 2020 daemon.info dnsmasq[1783]: DNS service limited to local subnets
Sat Dec 12 14:14:20 2020 daemon.info dnsmasq[1783]: compile time options: IPv6 GNU-getopt no-DBus no-i18n no-IDN DHCP no-DHCPv6 no-Lua TFTP no-conntrack no-ipset no-auth no-DNSSEC no-ID loop-detect inotify dumpfile
Sat Dec 12 14:14:20 2020 daemon.info dnsmasq-dhcp[1783]: DHCP, IP range 192.168.1.100 -- 192.168.1.249, lease time 12h
Sat Dec 12 14:14:20 2020 daemon.info dnsmasq[1783]: using local addresses only for domain test
Sat Dec 12 14:14:20 2020 daemon.info dnsmasq[1783]: using local addresses only for domain onion
Sat Dec 12 14:14:20 2020 daemon.info dnsmasq[1783]: using local addresses only for domain localhost
Sat Dec 12 14:14:20 2020 daemon.info dnsmasq[1783]: using local addresses only for domain local
Sat Dec 12 14:14:20 2020 daemon.info dnsmasq[1783]: using local addresses only for domain invalid
Sat Dec 12 14:14:20 2020 daemon.info dnsmasq[1783]: using local addresses only for domain bind
Sat Dec 12 14:14:20 2020 daemon.info dnsmasq[1783]: using local addresses only for domain lan
Sat Dec 12 14:14:20 2020 daemon.info dnsmasq[1783]: reading /tmp/resolv.conf.auto
Sat Dec 12 14:14:20 2020 daemon.info dnsmasq[1783]: using local addresses only for domain test
Sat Dec 12 14:14:20 2020 daemon.info dnsmasq[1783]: using local addresses only for domain onion
Sat Dec 12 14:14:20 2020 daemon.info dnsmasq[1783]: using local addresses only for domain localhost
Sat Dec 12 14:14:20 2020 daemon.info dnsmasq[1783]: using local addresses only for domain local
Sat Dec 12 14:14:20 2020 daemon.info dnsmasq[1783]: using local addresses only for domain invalid
Sat Dec 12 14:14:20 2020 daemon.info dnsmasq[1783]: using local addresses only for domain bind
Sat Dec 12 14:14:20 2020 daemon.info dnsmasq[1783]: using local addresses only for domain lan
Sat Dec 12 14:14:20 2020 daemon.info dnsmasq[1783]: using nameserver 206.248.154.22#53
Sat Dec 12 14:14:20 2020 daemon.info dnsmasq[1783]: using nameserver 206.248.154.170#53
Sat Dec 12 14:14:20 2020 daemon.info dnsmasq[1783]: read /etc/hosts - 4 addresses
Sat Dec 12 14:14:20 2020 daemon.info dnsmasq[1783]: read /tmp/hosts/dhcp.cfg01411c - 2 addresses
Sat Dec 12 14:14:20 2020 daemon.info dnsmasq-dhcp[1783]: read /etc/ethers - 0 addresses
Sat Dec 12 14:14:20 2020 daemon.info dnsmasq[1783]: read /etc/hosts - 4 addresses
Sat Dec 12 14:14:20 2020 daemon.info dnsmasq[1783]: read /tmp/hosts/dhcp.cfg01411c - 2 addresses
Sat Dec 12 14:14:20 2020 daemon.info dnsmasq-dhcp[1783]: read /etc/ethers - 0 addresses
Sat Dec 12 14:14:21 2020 daemon.notice netifd: Interface 'wan6' is now up
Sat Dec 12 14:14:21 2020 daemon.info dnsmasq[1783]: reading /tmp/resolv.conf.auto
Sat Dec 12 14:14:21 2020 daemon.info dnsmasq[1783]: using local addresses only for domain test
Sat Dec 12 14:14:21 2020 daemon.info dnsmasq[1783]: using local addresses only for domain onion
Sat Dec 12 14:14:21 2020 daemon.info dnsmasq[1783]: using local addresses only for domain localhost
Sat Dec 12 14:14:21 2020 daemon.info dnsmasq[1783]: using local addresses only for domain local
Sat Dec 12 14:14:21 2020 daemon.info dnsmasq[1783]: using local addresses only for domain invalid
Sat Dec 12 14:14:21 2020 daemon.info dnsmasq[1783]: using local addresses only for domain bind
Sat Dec 12 14:14:21 2020 daemon.info dnsmasq[1783]: using local addresses only for domain lan
Sat Dec 12 14:14:21 2020 daemon.info dnsmasq[1783]: using nameserver 206.248.154.22#53
Sat Dec 12 14:14:21 2020 daemon.info dnsmasq[1783]: using nameserver 206.248.154.170#53
Sat Dec 12 14:14:21 2020 daemon.info dnsmasq[1783]: using nameserver 2607:f2c0::1#53
Sat Dec 12 14:14:21 2020 daemon.info dnsmasq[1783]: using nameserver 2607:f2c0::2#53
Sat Dec 12 14:14:21 2020 user.notice firewall: Reloading firewall due to ifup of wan6 (eth0.2)
Sat Dec 12 14:14:22 2020 daemon.warn odhcpd[1220]: A default route is present but there is no public prefix on lan thus we don't announce a default route!
Sat Dec 12 14:14:22 2020 daemon.info dnsmasq[1783]: read /etc/hosts - 4 addresses
Sat Dec 12 14:14:22 2020 daemon.info dnsmasq[1783]: read /tmp/hosts/odhcpd - 0 addresses
Sat Dec 12 14:14:22 2020 daemon.info dnsmasq[1783]: read /tmp/hosts/dhcp.cfg01411c - 2 addresses
Sat Dec 12 14:14:22 2020 daemon.info dnsmasq-dhcp[1783]: read /etc/ethers - 0 addresses
Sat Dec 12 14:14:27 2020 daemon.info dnsmasq-dhcp[1783]: DHCPDISCOVER(br-lan) 192.168.1.109 xx:xx:xx:xx:xx:xx
Sat Dec 12 14:14:27 2020 daemon.info dnsmasq-dhcp[1783]: DHCPOFFER(br-lan) 192.168.1.109 xx:xx:xx:xx:xx:xx
Sat Dec 12 14:14:27 2020 daemon.info dnsmasq-dhcp[1783]: DHCPDISCOVER(br-lan) 192.168.1.109 xx:xx:xx:xx:xx:xx
Sat Dec 12 14:14:27 2020 daemon.info dnsmasq-dhcp[1783]: DHCPOFFER(br-lan) 192.168.1.109 xx:xx:xx:xx:xx:xx
Sat Dec 12 14:14:27 2020 daemon.info dnsmasq-dhcp[1783]: DHCPREQUEST(br-lan) 192.168.1.109 xx:xx:xx:xx:xx:xx
Sat Dec 12 14:14:27 2020 daemon.info dnsmasq-dhcp[1783]: DHCPACK(br-lan) 192.168.1.109 xx:xx:xx:xx:xx:xx mercku
Sat Dec 12 14:14:48 2020 daemon.err collectd[1334]: Sleeping only 2s because the next interval is 169365.703 seconds in the past!
Sat Dec 12 14:14:50 2020 daemon.err uhttpd[1280]: luci: accepted login on / for root from 192.168.1.222
Sat Dec 12 14:15:28 2020 authpriv.info dropbear[2036]: Child connection from 192.168.1.222:56311
Sat Dec 12 14:15:35 2020 authpriv.notice dropbear[2036]: Password auth succeeded for 'root' from 192.168.1.222:56311

Looks like I'm just using this thread as a personal troubleshooting journal but I'm open to outside wisdom if anyone wants to jump in.

Here's a screencap from luci-app-statistics from the most recent outage.


You can see the outage occurring at approximately 16:15. What's interesting to note are the two last charts with activity, "Transfer on lo" and "Packets on lo". Both are inactive until the moment of the outage, at which point there is a surge followed by continued activity until the router is reset and traffic resumes on the other interfaces.

As well, there is a surge in CPU activity during the outages:

What would explain the sudden surge of activity on the loopback adapter and the correlated CPU spike?

The syslog shows nothing that I would consider unusual in the entries preceding the outage (hardware addresses redacted):

Sun Dec 13 14:56:49 2020 daemon.info dnsmasq-dhcp[1783]: DHCPDISCOVER(br-lan) xx:xx:xx:xx:xx:xx
Sun Dec 13 14:56:49 2020 daemon.info dnsmasq-dhcp[1783]: DHCPOFFER(br-lan) 192.168.1.109 xx:xx:xx:xx:xx:xx
Sun Dec 13 14:56:49 2020 daemon.info dnsmasq-dhcp[1783]: DHCPREQUEST(br-lan) 192.168.1.109 xx:xx:xx:xx:xx:xx
Sun Dec 13 14:56:49 2020 daemon.info dnsmasq-dhcp[1783]: DHCPACK(br-lan) 192.168.1.109 xx:xx:xx:xx:xx:xx mercku
Sun Dec 13 16:16:24 2020 daemon.err uhttpd[1280]: luci: accepted login on / for root from 192.168.1.222

I've swapped the D-Link and TP-Link routers and the internet has been up steadily for about 34 hours, which is longer than it'd stayed up in the previous config. I'd still rather be using OpenWrt so I'll probably find another router that can run a late version and put that in place of the D-Link.
In its second-NAT position the TP-Link has been stably connected as well. The outages may remain a mystery.

The D-Link had an outage this morning but I don't have any monitoring on it yet so I can't see the time or any other information. The kids were all in 'class' so I set the Mercku up in pole position.