Move/copy syslog


#1

Hi there,
I run OpenWrt 18.06.1 r7258-5eb055306f / LuCI openwrt-18.06 branch (git-18.228.31946-f64b152)
Kernel Version 4.9.120
on a TP-LINK TD-W8970 modem/router.
I have strange issue loosing LAN connection, that I'd like to investigate. However I find that when this occurs, the syslog file is missing for more than 24 hours. This seem to delete part of the log even when it was working fine.
So I'd like to copy the log in another outside place at regular interval. I have a HDD connected /mnt/sda.
How could I copy it on the HDD? What file, what trigger?
Thanks


#2

Assuming youve installed rsyslogd package, you should just need to configure a selector and then specify a filename:


#3

Log buffers are in RAM, so after a reboot there are no logs available.

Most likely there has been a reboot for some reason.

you can use cron for timing. E.g. every 15 minutes.

There is no file, but just the round-robin internal log buffers, and you can see the log contents as output of "logread" and "dmesg" commands.

Most routers have no real-time clock, so the date/time will revert to last known good time based on the newest /etc file.

If there is a kernel crash leading to reboot, you may need to have a serial console connection, as the network connectivity might break up rather early, maybe too early for rsyslog etc. tools to catch crashes in logs.


#4

I agree with hnyman , it does sound like the device is rebooting for some reason so serial console logging is probably the best way to go.

But if you dont yet have a serial connection ready you couid try this kind of selector in rsyslog.
Make a directory for your logs.

mkdir /mnt/sda/syslog

Then in /etc/rsyslog.conf put this in there

*.warn          /mnt/sda/syslog/msgs

Then manually retart rsyslogd and wait for thr problem to happen next. But serail logging is probably going to be needed . Good luck.


#5

My "trivially simple" approach to ad-hoc log retention is to ssh in from a "desktop" and follow the logs, using logread for a "stock" OpenWrt install. It won't catch anything that happens after ssh or the network goes down, nor does it capture the subsequent reboot, but it's fast if you don't have rsyslog or syslog-ng installed and configured to log to persistent storage or a remote host.


#6

Thank you for your answers.
I installed rsyslog and added the line Sparks wrote. It's working but I get the kernel log only few lines. I'd prefer to get the full syslog (what I can read with logread).
I don't know how to get it.

What is strange is for example the last issue: The router is always on and I used the computers 30/12, 31/12. 01/01, morning I looked at the syslog. About half the log was 30 afternoon and the next line was from 01 morning the other half. Nothing on 31/12.


#7

So long as youre getting kernel events your in with a decent chance of catching the error. Other events while nice probably wont be needed to see the problem. But serial is the only sure way to see everything at the point of failure


#8

Nothing so strange, and the basic reason why I assume that there has been a reboot on 01 morning.

There is no real-time clock, so the router in the boot process the clock is initially set by sysfixtime to the most recent timestamp of any file found in /etc. Unless something writes there, the last time stamp is from last boot time, usually /etc/urandom.seed file. By default, writing to flash in runtime is avoided, so nothing normally writes there after the boot has completed.

(At the end of the boot process ntpd starts, and it may also take a while before ntpd gets and sets the correct time, so the time stamp may also remain at firmware compilation date.)

So, your log having dates of 30 in early part and 01 after NTP has git the real time, sounds quite normal for a reboot on 01.

You can observe that behaviour also at manual reboot. There will be a gap in log time stamps.

You can minimise boot time log time stamp gap by touching something in /etc every hour or so, so that in the next reboot the time starts from the last touch moment.


#9

Thank you all of you.
I have now a better understanding what could have happened. I'll leave like that, wait and see if happen again.
However I have an error message that appears at regular manner but has no effect on the router.
Fri Jan 4 10:40:29 2019 kern.warn kernel: [227517.098040] br-lan: received packet on eth0.1 with own address as source address
Why? How to avoid it?


#10

Sounds like some process is sending packets to the router’s own lan ip which is traversing the network stack needlessly.

However the warning is there since it is more likely something (interfaces/vlan tags, routes, firewall/iptables)has been misconfigured ..


#11

Are you using some wireless extender, some guest network? Maybe you could start a new topic where you will provide the necessary configuration.


#12

@trendy
I don't use wireless extender nor guest network, only basic network config.

@Sparks
How to find out what is the process that could do that. Is there a tool to use?


#13

tcpdump, which on an OpenWrt device can be supplied by the tcpdump-mini package for most uses.

Wireshark is an excellent tool, but far too much for any all-in-one router to run. It can run on a "sesktop" either monitoring a switch port of a switch configurable for port mirroring (which includes many OpenWrt devices), or being fed a packet stream from, for example, tcpdump through ssh.

I would start with tcpdump on the router and go from there. The first thing I would look for is packets with the address of br-lan being received on eth0 (you can skip filtering on the VLAN number to start with).

Perhaps something like (from the man page of tcpdump and pcap-filter from any `nix-like machine or the Internet)

tcpdump -ni eth0 -Q in src host [the IP address of br-lan] 

That should give you insight into the source MAC address, which may help in understanding where they are coming from. It will also give packet highlights; adding -v will give more detail on the packet contents.


#14

For some reason I didn't get any more the error message like

This is the good news.
But, after few days waiting, I got the original issue twice in 2 days.
The configuration is:
Cable from antenna -> PoE -> Lan4/WAN -> router. Lan3, Lan 1 -> desktops, Lan2 -> switch 4 ports.
Switch1 from router, switch2 -> laptop. switch3 -> analogue telephone adapter (VoIP)

First issue:
Laptop (switch2), 1 desktop (lan3), phone on. I was in another room. I didn't notice any power outage. When I am back,
Lan3 LED off, desktop is not connected any more.
Laptop working fine. From LUCI, I see router switch Lan3 "no link"

[my comment: cron started a job to switch off wifi]
Jan  8 22:15:00 OpenWrt hostapd: wlan0: interface state ENABLED->DISABLED
Jan  8 22:15:00 OpenWrt hostapd: wlan0: AP-STA-DISCONNECTED 00:9c:95:xxxxxxx
Jan  8 22:15:00 OpenWrt hostapd: wlan0: AP-DISABLED 
Jan  8 22:15:00 OpenWrt hostapd: wlan0: CTRL-EVENT-TERMINATING 
Jan  8 22:15:00 OpenWrt hostapd: nl80211: deinit ifname=wlan0 disabled_11b_rates=0
[ my comment: reboot from laptop, LUCI]
Jan  7 09:29:09 OpenWrt rsyslogd: [origin software="rsyslogd" swVersion="8.18.0" x-pid="970" x-info="http://www.rsyslog.com"] start
Jan  7 09:29:10 OpenWrt kernel: [   31.911074] lantiq,xrx200-net 1e108000.eth eth0: port 0 got link
Jan  7 09:29:10 OpenWrt firewall: Reloading firewall due to ifup of wan (eth0.2)
Jan  7 09:29:11 OpenWrt hostapd: Configuration file: /var/run/hostapd-phy0.conf
Jan  7 09:29:11 OpenWrt kernel: [   33.723695] IPv6: ADDRCONF(NETDEV_UP): wlan0: link is not ready
Jan  7 09:29:11 OpenWrt kernel: [   33.752781] br-lan: port 2(wlan0) entered blocking state
Jan  7 09:29:11 OpenWrt kernel: [   33.756732] br-lan: port 2(wlan0) entered disabled state
Jan  7 09:29:11 OpenWrt kernel: [   33.762993] device wlan0 entered promiscuous mode
Jan  7 09:29:11 OpenWrt hostapd: wlan0: interface state UNINITIALIZED->COUNTRY_UPDATE

As you can see, I had to reboot to get Lan3 on. So the issue is not an inopportune reboot. Nothing is logged.

Second issue the day after:
Lan3 and Lan4/WAN become off. The only way was to reboot from laptop, LUCI.
No log with error before the issue.
The only difference is that when I restarted the second time I had a clean shutdown with disabling wlan, vlan and unmount the fs before restart. As you can see I didn't get that the first time.

Not easy to see what happen. I'll carry on the log.


#15

Maybe try process of elimination - try each if the below individually over 1-2 weeks

  • swapping the network cable between your desktop and lan3 port
  • move your desktop from lan2 to something else (lan1 if free)
  • if any other switches are currently plugged in , suggest removing those for a period and see if the issue still happens
  • remove any usb devices if they are power hungry. The power supply for most devices is offen only just able to do job
  • check if the device is maybe overheating

If the issue still keeps happening , maybe the device is on its last legs :man_shrugging:


#16

Good idea, I swapped Lan2 and Lan3
Now waiting

Edit Sat 19 Jan
I had the issue yesterday. I was looking video from Internet, suddenly it stopped downloading.
Lan3/Switch and Lan4/Wan gone, no link, Lan2 was OK.
Nothing noticeable in the log, look like a reboot but when I reboot from Luci all the ports are restating and working fine.
More and more I think the culprit is the device itself.


#17

Yeh if the same ports on the owrt device do that again, and with different devices plugged in, then your device is probably in its final months/weeks.

On the positive side, its the perfect excuse to upgrade to something better :sunglasses: