Router Misbehaving After WAN Connection Drops

Hi

I have an annoying bug with my Linksys WRT1900AC loaded with LEDE Reboot 17.01.4 r3560-79f57e422d. Every time the WAN connection drops, my router locks up: it doesn't respond to pings, the Wi-Fi LEDs on the front of the router freeze (they don't flicker) and the LuCI interface is inaccessible.

I previously made a forum post about this, and I was recommended to use the Travelmate package and to start a new thread to get some help setting it up.

Is this the package to use to solve the problem mentioned above? If so, could someone please go through how to set it up in the LuCI interface?

Any help is highly appreciated. Many thanks

Will

In the old thread there was no Travelmate recommendation regarding your problem (wired WAN). Travelmate is only a wireless WAN (WWAN) manager; the package is mainly intended for travel routers.

Cheers for the reply. What do you suggest I can use to keep the LAN and DHCP alive while the WAN tries to reconnect?

Will

Sorry, I have no idea. Maybe you should reset your config and start with a fresh install; personally, I've never seen such behaviour. And please change the misleading title of this thread ... thanks.

This is not expected behavior; there is no reason why a router should lock up when it loses its WAN connection. Please post your "/etc/config/network" file here. Also, it would be interesting to open an SSH session, execute "logread -f", and then cut the WAN connection; perhaps we can get some log messages about why this is happening.

Hi

I've updated the title. Please let me know if it needs a more suitable name.

This is the network configuration file below:-

config interface 'loopback'
	option ifname 'lo'
	option proto 'static'
	option ipaddr '127.0.0.1'
	option netmask '255.0.0.0'

config globals 'globals'
	option ula_prefix 'fd39:7eb9:e056::/48'

config interface 'lan'
	option type 'bridge'
	option ifname 'eth0'
	option proto 'static'
	option ipaddr '192.168.1.1'
	option netmask '255.255.255.0'
	option delegate '0'

config interface 'wan'
	option ifname 'eth1'
	option _orig_ifname 'eth1'
	option _orig_bridge 'false'
	option proto 'pppoe'
	option username 'isp_email@domain.com'
	option password 'isp_password'
	option keepalive '5 5'
	option ipv6 'auto'
	option mtu '1508'

config switch
	option name 'switch0'
	option reset '1'
	option enable_vlan '1'

config switch_vlan
	option device 'switch0'
	option vlan '1'
	option ports '0 1 2 3 5'

config switch_vlan
	option device 'switch0'
	option vlan '2'
	option ports '4 6'

config interface 'guest_wlan'
	option _orig_ifname 'wlan0-1'
	option _orig_bridge 'false'
	option proto 'static'
	option ipaddr '10.0.0.1'
	option netmask '255.255.255.0'
	option delegate '0'
	option dns '208.67.220.220 208.67.222.222'

Last night I managed to note down how to replicate my problem. I simply unplugged the WAN cable from my fibre-optic modem, disconnected from the 5GHz AP and then re-connected. I then went to my router's LuCI interface page and was presented with this message.

Kind regards

Will

Looks like your router runs out of RAM when the WAN goes down... that's weird, I cannot imagine why.
Could you please try the test that I suggested in my last post?

Hi eduperez

I ran the following command in PuTTY

logread -f

and pulled the WAN cable out. Is there a way to hide or filter out my personal information before I post the log? I would like to remove it, but there is so much of it.
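One possible way to mask personal details before posting a log is to pipe it through sed. The sketch below is only an illustration: `redact_log` is a made-up helper name, and the three patterns (MAC addresses, IPv4 addresses, e-mail style usernames such as a PPPoE login) are assumptions about what counts as personal in this log. It does not handle IPv6 addresses or hostnames, and it also masks private addresses like 192.168.1.1, so the result should still be read through by hand.

```shell
#!/bin/sh
# redact_log (hypothetical helper): read log text on stdin, mask common
# identifiers, and write the result to stdout.
# Note: uses sed -E (extended regex); on a sed without -E, -r is the usual
# equivalent. Easiest to run on the PC where the log file was saved.
redact_log() {
    sed -E \
        -e 's/([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}/XX:XX:XX:XX:XX:XX/g' \
        -e 's/([0-9]{1,3}\.){3}[0-9]{1,3}/x.x.x.x/g' \
        -e 's/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+/user@redacted/g'
}

# Example use (file names are placeholders):
#   redact_log < mylogread.txt > mylogread-redacted.txt
```

Timestamps such as 11:49:29 are left alone because the MAC pattern needs six colon-separated pairs.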

To give you a timescale of what happened to the physical router: almost straight after pulling the cable out, the LuCI GUI became slow and the webpages were taking a long time to load. After about ten minutes the Wi-Fi AP dropped out (the lights on the front of the unit confirmed this).

Also, is there a way to write this log to a file instead, or is that not recommended?

Many thanks

Will

logread -f >> /tmp/mylogread.txt

Hi jwoods

Can I generate the file anywhere else? It is lost when I have to switch the router off and on again after it crashes.

Will

Sure.../tmp was just an example.
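For context, /tmp on LEDE lives in RAM, which is why the file disappears after a power-cycle; a path on the root filesystem or on a mounted USB stick survives it. The sketch below is one way to wrap the same `logread -f` command. `capture_log` and the `LOG_CMD` variable are names invented for this example (the variable only exists so the command can be swapped out for testing), and the destination paths in the comments are placeholders. Keep in mind that the router's flash is small, so a long capture is probably better pointed at USB storage.

```shell
#!/bin/sh
# capture_log DEST (hypothetical helper): append the live system log to DEST
# in the background and print the PID of the capturing process.
# LOG_CMD is an illustrative override; by default "logread -f" is used.
capture_log() {
    dest=$1
    case "$dest" in
        # /tmp is RAM-backed on LEDE, so warn that the file will not persist.
        /tmp/*) echo "warning: $dest is in RAM and will not survive a reboot" >&2 ;;
    esac
    ${LOG_CMD:-logread -f} >> "$dest" 2>&1 &
    echo $!
}

# Example use (paths are placeholders):
#   capture_log /root/mylogread.txt      # root filesystem, uses flash space
#   capture_log /mnt/usb/mylogread.txt   # assumes a USB stick mounted at /mnt/usb
```

The printed PID can be passed to `kill` once the test is finished.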

Hi guys

I opened Command Prompt on Windows and ran a constant ping to the router (192.168.1.1), which never dropped, and then I ran the following command in PuTTY:-

logread -f >> /mylogread.txt

and I unplugged the cable from the WAN port, left it for five minutes and then disconnected my Android phone from the AP, after which it failed to re-connect. The LuCI GUI also became inaccessible, showing this message:-
[Image: Lede Error]

After about 6 or 7 minutes I decided to reconnect the CAT6a cable and see if the WAN would re-establish. Weirdly enough, LuCI started working again, but it was ridiculously slow. However, to no surprise, the WAN didn't reconnect, even after starting it through LuCI > Network > Interfaces.

mylogread.txt
https://pastebin.com/vmgyjqVv

I hope someone can shed some light on this.

Many thanks

Will

This is what I could see in your logs:

  • There are lots of DNS queries for "localhost" from your own router... that looks odd to me. You seem to be running a script or a program that is constantly launching those queries.
  • WAN connection is lost at about 11:49:29; I guess that is when you unplugged the Ethernet cable.
  • Device then keeps trying to re-establish the WAN connection.
  • At 11:51:28, you open a new connection from your computer.
  • At 11:54:58, the disaster starts: uhttpd fails (Segmentation Fault) and starts reporting "Out of memory".
  • At 11:58:24, squid also fails because there is not enough memory.

Thus, my two cents are:

  • Your device is not crashing, just running out of RAM.
  • With HTTPS now so widespread, caching web proxies such as Squid are of little use, because they cannot cache encrypted traffic.
  • Perhaps it is not related to the issue here, but I would investigate the DNS requests.
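To check the same things in a saved log without reading it line by line, something like the following could be used. It is a rough sketch: `summarize_log` is a made-up name, and the first pattern assumes the DNS queries appear in dnsmasq's usual `query[A] localhost from ...` form, which may not match every log.

```shell
#!/bin/sh
# summarize_log FILE (hypothetical helper): count the events discussed above
# in a saved logread capture.
summarize_log() {
    f=$1
    # Assumed dnsmasq query format: "query[A] localhost from 127.0.0.1"
    printf 'localhost DNS queries: %s\n' "$(grep -c 'query\[[A-Z]*\] localhost ' "$f")"
    printf 'out-of-memory lines: %s\n' "$(grep -ci 'out of memory' "$f")"
    printf 'segfault lines: %s\n' "$(grep -ci 'segmentation fault' "$f")"
}

# Example use (file name is a placeholder):
#   summarize_log mylogread.txt
```

Running `grep -n` with the same patterns shows the matching lines and their timestamps instead of just the counts.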

Thank you very much for the reply.

I've had a quick look into the DNS. It could possibly be the DNSCrypt and/or Dynamic DNS packages I have installed. I'll try disabling them, re-test and report back.

As a side note in relation to RAM, is there a way to monitor or log RAM usage?

Will

Try disabling Squid; I think your device does not have enough resources for the Squid cache.

top, htop, free, ...
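Those tools show a snapshot; to see how free memory changes over time after the WAN is unplugged, the values can be printed at intervals in an SSH session. A minimal sketch, assuming the standard Linux /proc/meminfo layout (`mem_snapshot` is a name made up for this example):

```shell
#!/bin/sh
# mem_snapshot [MEMINFO] (hypothetical helper): print one timestamped line
# with free and total memory, read from /proc/meminfo by default.
mem_snapshot() {
    meminfo=${1:-/proc/meminfo}
    awk -v ts="$(date '+%Y-%m-%d %H:%M:%S')" '
        /^MemTotal:/ { total = $2 }
        /^MemFree:/  { free = $2 }
        END {
            pct = (total ? free * 100 / total : 0)
            printf "%s free=%d kB total=%d kB (%d%%)\n", ts, free, total, pct
        }
    ' "$meminfo"
}

# Example loop, one line every 30 seconds until interrupted with Ctrl+C:
#   while true; do mem_snapshot; sleep 30; done
```

Leaving the output on the terminal (and letting PuTTY log the session) avoids writing the readings into /tmp, which is itself held in RAM.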

Hey Guys I'm back and unfortunately with bad news...

I ended up re-installing LEDE Reboot 17.01.4, updated all my packages via SSH using PuTTY and installed the following packages:
luci-ssl
luci-app-upnp
luci-app-sqm
luci-app-ddns
ca-bundle
ddns-scripts
hostapd-common
hostapd-utils
wpad
sqm-scripts

I then disabled the DDNS scripts package, thinking this was the cause of the DNS requests, via the menu 'System > Startup' by clicking the 'Enabled' button, which then turned into 'Disabled'. Finally I rebooted my router and left it for a few minutes before unplugging the WAN cable, then sat and waited patiently for LuCI to start slowing down and locking up, which took about 15 minutes or so. On the overview page I took some screenshots of the available RAM as it slowly drifted down to 9%, compared to the 78% I have when everything is up and working. As well as slow-loading webpages within LuCI, I would also get an out-of-memory message in plain text. What's weird is that when I could get to the 'Status > Processes' page and look under the RAM usage column, there wasn't anything over 1%. I was expecting something to be hogging the RAM...

I'll be uploading images in the morning.

[Image: System Log]

This is all very odd. Hope to hear back from someone soon.

Many thanks

Will

@willowen100 -

How are you updating the packages? It is not recommended to update core packages because it can introduce issues related to kernel versions and other dependencies.

My recommendation would be to re-flash with 17.01.4 and only set up the basics (do not update any existing packages, do not install any new ones). Test at this point.

Then, one at a time, install packages (again, do not update core packages; installing new packages is fine, though)... do it one at a time (aside from direct dependencies, of course) and test between each package installation. At some point, the problem will show up again and you should have a better idea of what might be causing the issue (base install, or a specific package, etc.). The best way to troubleshoot is to keep it simple and isolate variables.
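The one-at-a-time procedure could be scripted roughly like this. It is only a sketch: `install_stepwise` and the `INSTALL_CMD` variable are invented for the example (the variable exists so the install command can be replaced when trying the script out), and the package names in the comment are simply the ones listed earlier in this thread. Note that `opkg update` only refreshes the package lists; it is `opkg upgrade` that replaces installed packages.

```shell
#!/bin/sh
# install_stepwise PKG... (hypothetical helper): install packages one by one,
# pausing after each so the WAN unplug test can be repeated in between.
# INSTALL_CMD is an illustrative override; by default "opkg install" is used.
install_stepwise() {
    for pkg in "$@"; do
        echo "== installing $pkg =="
        ${INSTALL_CMD:-opkg install} "$pkg" || return 1
        printf 'Repeat the WAN unplug test now; press Enter to continue: '
        read -r reply || return 0
    done
}

# Example use on the router:
#   opkg update    # refreshes package lists only, does not upgrade anything
#   install_stepwise luci-ssl luci-app-upnp luci-app-sqm luci-app-ddns
```

If the problem reappears after a given step, the package installed just before that pause is the first one to look at.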


Thank you for the reply psherman.

When I updated the packages I did it through SSH using opkg package manager commands. If you recommend not updating the packages, and only flashing the firmware and installing the packages I actually want, I'll give that a go and let you know.

Many thanks

Will

Yes, please try what I recommended. Start simple. Does the problem manifest when you have a fresh install (with no extra packages installed, no upgrades)? If so, you are dealing with either a bug or a configuration issue in the base firmware. Installing packages one by one after verifying that the base install is okay (assuming that is the case) will help narrow down the packages responsible for the issue.

Read this thread (and feel free to search for other related threads) for information about why it is not recommended to update the core packages. If you are really intent on updating those core packages, it would probably be best to build your own firmware from source or to use a snapshot.