I can pipe the ping responses from 8.8.8.8 to a text file in OpenWrt.
Is there a way to record only the failed responses on the WAN port? I am trying to work out whether the problem is something in my network or whether AT&T isn't giving me a stable connection.
Something like the following might work but I have not tested it.
Adjust the parameters to your liking and run it from the command line. It writes to /tmp/noping and pings every 5 seconds; if you ping faster you might get blocked:
{ while sleep 5; do if ! ping -qc1 -W2 8.8.8.8 > /dev/null 2>&1; then echo "no ping at $(date)" >> /tmp/noping; fi; done; } &
Thanks for making a solution for me. I am going to test it and get back. I might just use a cron job that executes every 5 seconds and the command can be inspired by your code.
When you say "AT&T", what type of service do you mean?
ADSL over copper? Or some kind of fiber optic service?
AT&T's aging copper network is no longer being well-maintained because in many places >80% of their former land-line customer base has migrated to cell phones.
About 4 years ago I had a customer with an AT&T-provided 8 Mbit down / 1 Mbit up ADSL line that had terrible intermittent problems with packet loss, random loss of sync, etc. Of course the issues got blamed on pretty much everything else in the network, Wifi, and home wiring -- until I finally made some shell scripts that polled the ADSL modem's command-line status utility once per second and logged the output to a text file. I gathered data for a period of weeks and transformed it into graphs showing when the ADSL carrier signal quality degraded or dropped entirely. Combined with L3 (TCP/IP-layer) ping-time charts, it was clear that the bad pings were being caused by physical issues with the ADSL line. I even got a duplicate ADSL modem and swapped that in, to rule out any possible issue with the ADSL modem itself.
That's what it took to eventually make my case to AT&T and get it escalated up the ladder. In the end they had to send out techs on 2 or 3 different visits to fix issues that were indeed on AT&T's part of the copper loop & DSLAM systems.
John: I am in Sonoma County, and where we live the fibre terminates about 400 yards away. From there we have copper, and those copper wires are very old. The local AT&T guys don't even want to touch them unless they have to. Between AT&T, the inherent reliability of the sites we visit, and the stability of the Wifi connections, I am trying to narrow things down, working from upstream.
Exactly what I need. Mine is 50 down and 8 up. Do you have these scripts somewhere, maybe? I wish I was smart with programming and Linux. My plus is that I have no fear, and I use the recipes I get from well-wishers on this forum.
I see. Yes, that last 400 yards of old copper could be a factor.
If you are trying to troubleshoot a customer or site problem from the outside it could be difficult. If there's a way to gain access and a CLI or shell-like access to the device that terminates the copper connection you could do similar to what I described above. Just keep it in mind if you have exhausted all other options and still can't figure out the cause.
They don't give shell access any more. Someone I know has TMO internet; there, even changing the DNS address is restricted. Basically, nothing can be changed on a TMO box.
Well, the scripts were specific to the exact device in question, which was a ZyXel P-660HN-51 ADSL2 modem. It had / has a management interface, accessible from the LAN side, with a basic CLI including status-reporting functions.
The final result was a graph like this, over a period of several weeks. The red-shaded area was before AT&T's first service call. The yellow-shaded area was after the first service call. The green area was after AT&T's second service call. The black line is ADSL modem signal-to-noise ratio in dB, the green line above is ICMP ping latency in milliseconds, shown inverted for easier visual correlation with the ADSL line problems.
I believe the initial problem was a failing DSLAM port, which, when fixed, still left issues with a flaky copper splice somewhere between the DSLAM and the CPE, which they then fixed.
Graph created in "DataGraph" for Mac. (Excellent piece of software, that.)
If you can obtain the make and model number of the CPE (customer-premises equipment) which provides internet gateway access to the router, I can help you determine whether it has an accessible command-line interface or similar functionality for gathering line-quality statistics.
I would suggest simply pinging normally (every second) and capturing all of the results. The ping latency (in milliseconds), if graphed over a long period, can also be a very useful clue for diagnosing line-quality issues. Copper (like Wifi) has a wide range of usable signal margin; devices will "fall back" to lower and lower carrier speeds while still keeping the connection up if at all possible.
So, perhaps gather 100% of the ping data & then graph the response time on a chart. Again, highly recommend DataGraph for this purpose if you happen to be a Mac user, though it has its own learning curve. If you want to provide raw logs to me in .CSV format I can probably help you out by producing graphs fairly quickly.
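A rough sketch of how that "log every ping" idea could look on the router, untested on your hardware. It assumes the ping reply line contains a "time=" field (as BusyBox and iputils ping print it); the function names and the /tmp/pinglog.csv path are made up for illustration.

```shell
#!/bin/sh
# Extract the round-trip time in ms from one line of ping output.
# Prints nothing if the line has no "time=" field (e.g. a lost ping).
parse_rtt() {
    printf '%s\n' "$1" | sed -n 's/.*time=\([0-9.]*\).*/\1/p'
}

# Ping HOST once and append "epoch_seconds,rtt_ms" to FILE.
# A lost ping is logged with an empty rtt field.
log_one_ping() {
    host=$1
    file=$2
    reply=$(ping -c1 -W2 "$host" 2>/dev/null | grep 'time=')
    printf '%s,%s\n' "$(date +%s)" "$(parse_rtt "$reply")" >> "$file"
}

# Hypothetical usage: one sample per second, in the background.
# while sleep 1; do log_one_ping 8.8.8.8 /tmp/pinglog.csv; done &
```

The result is a two-column CSV (timestamp, latency) that a graphing tool can read directly; rows with an empty second column are the lost pings. Keep in mind that /tmp on OpenWrt lives in RAM, so the file is gone after a reboot.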
Hi John
I got the noping script tested and working as follows. Many thanks to this forum. Now I will wait a few days to find out whether AT&T is in fact the problem. It may be something else.
Thanks a lot.
I used the following:
I have a reboot every day at 3 am on my router and so my scheduled job reads like this:
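I created a shell script, made it executable with chmod +x, and saved it as follows:
root@R7800:/tmp# cat /etc/config/pingtest.sh
#!/bin/sh
# Every 5 seconds, send one ping to 8.8.8.8 and log a timestamped
# line to /tmp/noping if it fails.
# Start it in the background with: /etc/config/pingtest.sh &
while true; do
    if ! ping -qc1 -W2 8.8.8.8 > /dev/null 2>&1; then
        echo "no ping at $(date)" >> /tmp/noping
    fi
    sleep 5
done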
The outages are not consistent, but they do cluster together in time. For example, all of the outages were between 3:00 pm and 5:00 pm.
The noping file suggests that the outage is never more than 2 or 3 minutes. And, all the outages put together, they don't add up to more than 3 to 5 instances in a narrow window.
There are days that I have no outages.
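To see that kind of clustering without reading the file by eye, something like the sketch below could count the logged failures per hour. It assumes the lines look like "no ping at Tue Mar  5 15:04:05 PST 2024" (the default date output), so the field positions are an assumption, and the function name is made up.

```shell
#!/bin/sh
# Count "no ping at ..." lines per day and hour, in the order first seen.
# Assumes field 5 is the month, field 6 the day and field 7 HH:MM:SS.
count_failures_per_hour() {
    awk '/^no ping at/ {
             split($7, t, ":")
             key = $5 " " $6 " " t[1] ":00"
             if (!(key in n)) order[++m] = key
             n[key]++
         }
         END { for (i = 1; i <= m; i++) print order[i], n[order[i]] }' "$@"
}

# Hypothetical usage:
# count_failures_per_hour /tmp/noping
```

Each output line is the month, day and hour followed by the number of failed pings in that hour. With a ping every 5 seconds, a count of about 12 corresponds to roughly one minute without replies.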
And yet the impression my wife and I both have is that our phones often struggle to load content, and we experience this throughout the day. All our devices are on wireless, and the access points are on gigabit Ethernet and configured with fast roaming. The same goes for the laptop. We often disable and re-enable Wifi to get our connections back, a habit I suspect was ingrained in the early days of power-cycling Windows.
I have a two vlan network. One for IOT devices, guests and TV etc. The second is for residents surfing, zooming, accessing file shares (SMB).
I will try to debug/study and report back my discovery. Thank you for the support.
When you have the outages, are you able to still connect to the router itself?
It would be good to see if there is an issue with the upstream network, but not your router... if your ISP uses DHCP to provide an address to your router, we could check the logs to see if the DHCP lease failed to renew.
Also, while you are at it, take a look at the DHCP lease time that the ISP issues... if the outage occurs when it is set to expire, that's a clear sign that things are not renewing.
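A small sketch of how the DHCP side could be checked from the router's shell. It assumes the WAN interface gets its address from udhcpc (the BusyBox DHCP client) and that its messages show up in the system log tagged with "udhcpc"; the function name is made up, and the exact wording of the log messages may differ between releases.

```shell
#!/bin/sh
# Keep only the log lines written by udhcpc, the BusyBox DHCP client.
# Reads log text on stdin and prints the matching lines.
dhcp_client_lines() {
    grep 'udhcpc'
}

# Hypothetical usage on the router (logread prints the system log):
# logread | dhcp_client_lines
```

The timestamps of any renew or lease messages can then be compared with the entries in /tmp/noping. The system log is normally kept in RAM, so it starts empty after each reboot.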