Something weird is happening on my local network. So far I've managed to narrow it down to my R7800 (OpenWrt 19.07.4 r11208-ce6496d796) and a couple of switches, but I really have no idea how to narrow down the real source of the problem any further.
In the last few months my network has collapsed several times. When that happens, anything connected to Switch 1 and Switch 2 is unreachable. Switch 3 and everything connected to it seem to work just fine, and so does the appliance.
The only way I can regain stability is by turning off the R7800 and rebooting Switch 1 and Switch 2.
Like I said, this has happened several times already. This last time (today) I tried to gather some logs on the R7800 but couldn't. logread wouldn't let me go back 6 hours (it happened in the middle of the night), and despite my efforts I couldn't figure out where the R7800/OpenWrt stores the log files that logread reads from. dmesg didn't show anything out of the ordinary, though.
I know that if I could provide you guys with logs, you would be able to help me much quicker but I just couldn't afford to have my network down as I'm running important services.
As the other culprits are switches, I guess there is a Layer 2 issue, but it feels weird that everything runs fine for weeks and then all of a sudden this problem hits me. This last time my R7800 had been up for 20 days in a row.
What would you guys suggest I do next time this happens?
The log files are stored here (in LuCI):
Try to narrow down your problem to ONE device. You said you had to reboot the switches and the R7800.
Try rebooting one device after the other to find out which one causes the problem.
Try to find out which service doesn't work: DNS, DHCP, IP ...
Wireshark and tcpdump are our friends.
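To make the "which service doesn't work" check a bit more concrete, here is a minimal Python sketch you could run from a wired client the next time the network collapses. It only separates "can't reach the router by IP" from "DNS doesn't resolve"; it does not test DHCP. The function names are made up for this example, and the router address, port and test hostname you pass in are assumptions you would have to adapt to your own setup.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds (basic IP reachability)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and unreachable networks.
        return False


def can_resolve(hostname: str) -> bool:
    """Return True if the system resolver can turn hostname into an address (DNS)."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        return False


def diagnose(router_ip: str, router_port: int, test_name: str) -> str:
    """Return "ip", "dns" or "ok" depending on the first check that fails."""
    if not can_connect(router_ip, router_port):
        return "ip"   # the router is not reachable by address at all
    if not can_resolve(test_name):
        return "dns"  # the router is reachable, but name resolution fails
    return "ok"
```

For example, `diagnose("192.168.1.1", 80, "openwrt.org")` assumes the router still has the OpenWrt default LAN address and that its web interface listens on port 80. Running it from one client behind each switch would show whether the failure looks the same everywhere.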
Logs are stored in a circular RAM buffer.
The size is defined in /etc/config/system (the log_size option, in kB).
You can increase the size to keep a longer period of logs. I have increased it to 128 kB, but even 512 or 1024 should be quite OK on the R7800, which has lots of RAM.
The separate kernel log (shown with "dmesg") might show something if something has gone really wrong at the kernel level.
You might also sysupgrade to the current 19.07.7, which contains 6 months' worth of fixes compared to 19.07.4.
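As a rough sketch, the relevant part of /etc/config/system could look like this after the change. It assumes the default single `config system` section, and 512 is just one of the values mentioned above; leave your other options as they are.

```
config system
	# ... your existing options (hostname, timezone, ...) stay unchanged ...
	# size of the RAM log buffer in kB; 512 is an example value
	option log_size '512'
```

The new size should take effect once the logging service is restarted (on 19.07 that should be `/etc/init.d/log restart`) or after a reboot. Since the buffer lives in RAM, its contents are still lost when the router is powered off.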
Thanks! I assume that this would result in logread obtaining access to larger chunks of data then, correct?
dmesg didn’t show anything strange.
To be honest, I have never dealt with upgrading my OpenWrt. I read up a bit on it when I installed it for the first time, but it felt a little daunting. I had already thought that newer versions might contain a fix for my case. I will give this another try. Many thanks!!