R7800 loses some wired connections after a few days

Hi everyone,

Something weird is happening on my local network. So far I've managed to narrow it down to my R7800 (OpenWrt 19.07.4 r11208-ce6496d796) and a couple of switches, and I really have no idea how to narrow down the real source of the problem any further.

This is what my network looks like:

                   Modem
                    |
                    |
         Netgear    R7800    19.07
         |       |       |       |
         |       |       |     appliance
         |       |   Switch3
         |  Switch 2
   Switch 1

In the last few months my network has collapsed several times. When that happens, anything connected to Switch 1 and Switch 2 is unreachable. Switch 3 and everything connected to it seem to work just fine, and so does the appliance.

The only way I can regain stability is by turning off the R7800 and rebooting Switch 1 and Switch 2.

Like I said, this has happened several times already. This last time (today) I tried to gather some logs on the R7800, but I couldn't. logread wouldn't let me go back 6 hours (it happened in the middle of the night), and despite my efforts I couldn't figure out where the R7800/OpenWrt stores the log files that logread displays. dmesg didn't show anything out of the ordinary, though.

I know that if I could provide you guys with logs, you would be able to help me much quicker but I just couldn't afford to have my network down as I'm running important services.

As the other culprits are switches, I guess there is a Layer 2 issue, but it feels weird that everything runs fine for weeks and then, all of a sudden, this problem hits me. This last time my R7800 had been up for 20 days in a row.

What would you guys suggest I do next time this happens?

Thanks!

Nobody? I would appreciate any input. Please forgive me if I'm missing any detail.
Thanks

Guys, could somebody help me be prepared for when this issue hits me again?

First:
The log files are stored here (in LuCI):
System/Logging

Second:
Try to narrow down your problem to ONE device. You said you had to reboot the switches and the R7800.
Try rebooting one device at a time to find out which one causes the problem.
Try to find out which service doesn't work: DNS, DHCP, IP ...
Wireshark and tcpdump are our friends.
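To separate "IP doesn't work" from "DNS doesn't work" while the outage is happening, something like the following could be run from a PC on a segment that still works (e.g. behind Switch 3). It is a minimal sketch: the addresses, ports and labels in `TCP_TARGETS` are hypothetical placeholders, the ports must be ones that are actually open on those hosts, and DHCP is not covered (that would need a client to renew its lease).

```python
import socket


def tcp_probe(host: str, port: int, timeout: float = 2.0) -> str:
    """IP-level check: try to open a TCP connection to host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"  # host answered with a reset: reachable at IP level, port closed
    except OSError:
        return "no answer"  # timed out or network unreachable


def resolves(name: str) -> bool:
    """DNS check through the system resolver (usually the router's DNS server)."""
    try:
        socket.getaddrinfo(name, None)
        return True
    except socket.gaierror:
        return False


# Hypothetical targets: replace with real addresses/ports on each segment.
TCP_TARGETS = {
    "router (LuCI)": ("192.168.1.1", 80),
    "host behind Switch 1": ("192.168.1.10", 22),
    "host behind Switch 2": ("192.168.1.20", 22),
    "host behind Switch 3": ("192.168.1.30", 22),
}

if __name__ == "__main__":
    for label, (host, port) in TCP_TARGETS.items():
        print(f"{label:22} {host}:{port:<5} {tcp_probe(host, port)}")
    print("DNS openwrt.org:", "ok" if resolves("openwrt.org") else "FAILED")
```

If the router and the Switch 3 host answer but nothing behind Switch 1/2 does, that points at the path to those switches rather than at DNS.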

Logs are stored in a circular RAM buffer.
The size is defined in /etc/config/system (in kB).
You can increase the size to keep a longer period of logs. I have increased it to 128 kB, but even 512 or 1024 should be fine on the R7800, which has lots of RAM.
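The setting in question is the `log_size` option of the `config system` section. On the router itself the usual route is the `uci` command (`uci set system.@system[0].log_size='512'`, then `uci commit system`), and the new size presumably only takes effect once the logging service restarts or the router reboots. For editing a copy of the file elsewhere, here is a minimal sketch; the function name `set_log_size` is hypothetical, and it assumes the usual UCI layout where each section starts with a `config <type>` line.

```python
def set_log_size(config_text: str, size_kb: int) -> str:
    """Return config_text with log_size set in the first 'config system' section."""
    out = []
    in_system = False  # currently inside the section being edited
    edited = False     # only the first 'config system' section is touched
    for line in config_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "config":
            section_type = parts[1].strip("'\"") if len(parts) > 1 else ""
            in_system = not edited and section_type == "system"
            out.append(line)
            if in_system:
                # UCI does not care about option order, so write the new value first
                out.append(f"\toption log_size '{size_kb}'")
                edited = True
            continue
        if in_system and parts[:2] == ["option", "log_size"]:
            continue  # drop the old value; the new one was written above
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    example = """config system
\toption hostname 'OpenWrt'
\toption log_size '64'

config timeserver 'ntp'
\toption enabled '1'
"""
    print(set_log_size(example, 512))
```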

Separate kernel logs (shown with "dmesg") might show something if something has gone really wrong at the kernel level.

You might also sysupgrade to the current 19.07.7, which contains 6 months' worth of fixes compared to 19.07.4.

Yes, I already tried to isolate the problem further by just acting on one device at a time but the network wouldn’t come up. Thanks for the idea though!

Thanks! I assume this would give logread access to a longer stretch of history, correct?

dmesg didn’t show anything strange

To be honest, I have never upgraded my OpenWrt. I read up a bit on it when I installed it for the first time, but it felt a little daunting. I already suspected that newer versions might contain a fix for my case. I will give this another try. Many thanks!!

Right. The default buffer is only 64 kB, which can get consumed quickly on a busy router. If you enlarge it to 512 kB, there is 8x the space for log entries.
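A rough way to see what the 8x means in hours: divide the buffer size by an average message size, then by the message rate. The 128-byte messages and 200 messages per hour below are hypothetical figures, and real entries vary in length (the ring buffer may also add per-entry overhead), so treat the result as an order of magnitude only.

```python
def hours_retained(buffer_kb: int, avg_line_bytes: int, lines_per_hour: float) -> float:
    """Rough hours of history a circular log buffer of buffer_kb can hold."""
    lines_that_fit = buffer_kb * 1024 // avg_line_bytes
    return lines_that_fit / lines_per_hour


if __name__ == "__main__":
    # Hypothetical traffic: 128-byte messages, 200 messages per hour.
    for size_kb in (64, 512):
        print(f"{size_kb:>3} kB buffer: about {hours_retained(size_kb, 128, 200):.1f} h")
```

With those assumed numbers, 64 kB holds about 2.6 hours, too short to reach back to a failure 6 hours earlier, while 512 kB holds about 20.5 hours.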


Hi @hnyman! I've attempted an upgrade a couple of times, but I gave up each time because I just didn't feel comfortable.

I found this article: https://www.reddit.com/r/openwrt/comments/kikmjk/proper_way_of_upgrading_openwrt_and_keeping_all/
which links to this set of scripts https://github.com/richb-hanover/OpenWrtScripts#opkgscriptsh
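The core idea behind restoring packages after a sysupgrade is to save the output of `opkg list-installed` before upgrading and compare it with the list afterwards. Below is a minimal sketch of that comparison, not the linked script's actual method; the file names are hypothetical, versions are ignored, and the result also includes dependencies, which opkg would pull in anyway when the top-level packages are reinstalled.

```python
from typing import List, Set


def parse_opkg_list(text: str) -> Set[str]:
    """Package names from `opkg list-installed` output ("name - version" lines)."""
    names = set()
    for line in text.splitlines():
        name = line.split(" - ", 1)[0].strip()
        if name:
            names.add(name)
    return names


def missing_after_upgrade(before: str, after: str) -> List[str]:
    """Packages present before the sysupgrade but absent afterwards."""
    return sorted(parse_opkg_list(before) - parse_opkg_list(after))


if __name__ == "__main__":
    # Hypothetical file names holding `opkg list-installed` output from before/after.
    with open("packages-before.txt") as f_before, open("packages-after.txt") as f_after:
        for pkg in missing_after_upgrade(f_before.read(), f_after.read()):
            print(pkg)
```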

Additionally, I considered this https://openwrt.org/docs/guide-user/additional-software/imagebuilder but apparently one needs a Linux OS at hand, which I don't have.

I feel I must be doing something wrong, as these days things are usually way more streamlined than this.

Do you happen to know of a solid how-to or tutorial? Can I just do a sysupgrade and then run the script I referenced above?

Thanks a million in advance!