Regarding the dhcp fw rules: Same here ...
It's not clear which one this is. I am referring to the IP you are using on your OpenWrt as the gateway on the wan interface, which is the default gateway for your network.
The problem occurred again a few minutes ago. The WAN interface had been connected for 2d 1h 9m 16s and the lease was due to expire in 4 days (4d 22h 50m 44s).
- dmesg:
[46022.627061] ieee80211 phy0: Mac80211 start BA a4:45:19:19:2d:af
[46607.075744] ieee80211 phy0: radar detected by firmware
[46607.623240] ieee80211 phy0: channel switch is done
[46607.628058] ieee80211 phy0: change: 0x60
[46608.117127] ieee80211 phy1: Mac80211 start BA a4:45:19:19:2d:af
[46612.247855] ieee80211 phy0: Mac80211 start BA a4:45:19:19:2d:af
[46612.282072] ieee80211 phy0: Mac80211 start BA a4:45:19:19:2d:af
[46612.342072] ieee80211 phy0: Mac80211 start BA a4:45:19:19:2d:af
[46612.402062] ieee80211 phy0: Mac80211 start BA a4:45:19:19:2d:af
[46612.462065] ieee80211 phy0: Mac80211 start BA a4:45:19:19:2d:af
[46612.522065] ieee80211 phy0: Mac80211 start BA a4:45:19:19:2d:af
[46612.582069] ieee80211 phy0: Mac80211 start BA a4:45:19:19:2d:af
[46613.368079] ieee80211 phy0: Mac80211 start BA a4:45:19:19:2d:af
[46715.662384] ieee80211 phy1: Mac80211 start BA a4:45:19:19:2d:af
[53815.348931] ieee80211 phy1: Mac80211 start BA a4:45:19:19:2d:af
[64732.883841] ieee80211 phy0: Mac80211 start BA a4:45:19:19:2d:af
[77440.893488] ieee80211 phy0: Mac80211 start BA a4:45:19:19:2d:af
[110682.524963] ieee80211 phy0: Mac80211 start BA a4:45:19:19:2d:af
[132533.645448] ieee80211 phy1: Mac80211 start BA a4:45:19:19:2d:af
- logread. For reference, I lost the connection a few minutes before May 28 13:00.
Thu May 28 10:17:08 2020 authpriv.info dropbear[9975]: Child connection from 192.168.0.160:50369
Thu May 28 10:17:11 2020 authpriv.notice dropbear[9975]: Password auth succeeded for 'myuser' from 192.168.0.160:50369
Thu May 28 10:17:50 2020 authpriv.notice sudo: myuser : TTY=pts/0 ; PWD=/home/myuser ; USER=root ; COMMAND=/bin/ash
Thu May 28 10:39:32 2020 daemon.err uhttpd[1764]: luci: accepted login on / for root from 192.168.0.160
Thu May 28 10:49:13 2020 authpriv.info dropbear[9975]: Exit (myuser): Disconnect received
Thu May 28 12:59:05 2020 daemon.err uhttpd[1764]: luci: accepted login on / for root from 192.168.0.160
Thu May 28 12:59:43 2020 authpriv.info dropbear[10602]: Child connection from 192.168.0.160:54109
Thu May 28 12:59:46 2020 authpriv.notice dropbear[10602]: Password auth succeeded for 'myuser' from 192.168.0.160:54109
Thu May 28 12:59:52 2020 authpriv.notice sudo: myuser : TTY=pts/0 ; PWD=/home/myuser ; USER=root ; COMMAND=/bin/ash
- ping WAN IP: SUCCESS
- ping ISP modem 'administration' IP: SUCCESS
- ping default Gateway: FAILED (Note: I can't ping even when my connection is working.)
- ping ISP DNS servers: FAILED
- ping Google DNS: FAILED
- ping Quad9 DNS: FAILED
- ip route: routes are ok with the correct default gateway
- netstat:
  - the only established connection is my SSH session.
  - the remaining entries are listening ports.
  - no connection to outside the LAN.
- ifstatus wan: the only difference from this morning is the uptime
- arp:
  - the arp content is identical to what I got earlier this morning.
  - the arp entry for the default gateway is present.
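The manual checks above could be scripted so that they run the same way at every occurrence. This is only a minimal sketch, assuming a POSIX shell on the router; the `probe` and `run_checks` names and the `PING_CMD` override are made up for illustration and the target list is up to you.

```shell
#!/bin/sh
# Ping each target and print SUCCESS or FAILED, mirroring the manual checklist.
# PING_CMD is a hypothetical override so the probe can be swapped or stubbed.
PING_CMD="${PING_CMD:-ping -c 2 -w 3}"

probe() {
    # $1 = address to ping, $2 = human-readable label
    if $PING_CMD "$1" </dev/null >/dev/null 2>&1; then
        echo "ping $2 ($1): SUCCESS"
    else
        echo "ping $2 ($1): FAILED"
    fi
}

run_checks() {
    # Reads "address label..." pairs from stdin, one per line.
    while read -r addr label; do
        [ -n "$addr" ] || continue
        probe "$addr" "$label"
    done
}

# Example:
#   printf '%s\n' '8.8.8.8 Google DNS' '9.9.9.9 Quad9 DNS' | run_checks
```

Feeding it the same list each time makes it easier to compare one outage with the next.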
Once I had collected all that information I ran ifup wan. The DHCP lease expiration went back to 7 days but I still could not connect outside the LAN.
This time I didn't reboot the router. I only restarted the WAN interface and the connection to internet started to work again.
Let me know if there is other information I should collect.
Here is the config of my wan interface:
Regarding my last post, the *194 is the inside interface and the *193 the outside interface. When I lose connection to the internet (no ping to 1.1.1.1), I can still ping both the *194 and the *193 IP!
Then it is not a DHCP issue.
This makes troubleshooting more difficult.
Last resort, run a tcpdump on the wan interface when the problem reoccurs:
tcpdump -i eth1.2 -ev
Try to ping, browse, to see if traffic is going out.
Try to traceroute or mtr 1.1.1.1 to see where it stops.
However, if you can reach your ISP then the problem may well not be on your side.
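To see whether pings actually leave the WAN interface and whether replies come back, the capture could be narrowed to ICMP for one test address and then summarised. A rough sketch, assuming tcpdump's usual text output for ICMP ("ICMP echo request" / "ICMP echo reply"); `icmp_summary` is a hypothetical helper name, not an existing tool.

```shell
#!/bin/sh
# Count ICMP echo requests and replies in tcpdump text output read from stdin.
icmp_summary() {
    awk '
        /ICMP echo request/ { req++ }
        /ICMP echo reply/   { rep++ }
        END { printf "requests=%d replies=%d\n", req, rep }
    '
}

# Example (on the router, while pinging 1.1.1.1 from a LAN client).
# -c 20 stops after 20 packets; if nothing matches, stop it with Ctrl-C.
#   tcpdump -i eth1.2 -n -c 20 icmp and host 1.1.1.1 | icmp_summary
```

Requests without replies would suggest the traffic leaves the router and gets lost upstream; no requests at all would point back at the router itself.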
hmmm... was that via:
ip neigh show | grep REACHABLE
You can trigger a tcpdump or your ping scripts from the connectiondown function:
#!/bin/bash
VERBOSE=1
LOOPDELAY=3
BACKOFFUP=3
BACKOFFSTALE=1
STALEMAX="3"
WANREPAIRMAX=10000 #WANREPAIRMAX=10
###############
STALECOUNT="0"
DOWNCOUNT="0"
CHECKCOUNT="0"
WANREPAIRCOUNT=0
WANREPAIRSTATE="INIT"
WANSTATUS=
GWIP="`ip route | grep default | cut -d' ' -f3`"
if [ ! -z "$GWIP" ]; then WANSTATUS="UP"; else WANSTATUS="DOWN"; fi
WANIF=$(uci show network | grep 'network.wan.ifname' | cut -d"'" -f2)
[ -z "$WANIF" ] && echo "unknown WANIF" && exit 0
WANIP=$(ip route show | grep default | cut -d' ' -f9)
WANIFAUTO="`ip neigh show | grep -w "$GWIP" | cut -d' ' -f3`"
echo " WANIF: $WANIF@$WANIP>$GWIP"
sleep 2
# Refresh the default gateway IP and derive a coarse WAN status from it.
getgwip() {
    GWIP="`ip route | grep default | cut -d' ' -f3`"
    if [ ! -z "$GWIP" ]; then WANSTATUS="UP"; else WANSTATUS="DOWN"; fi
}
# Actively ping the gateway; returns 0 when it answers.
checkup() {
    CHECKCOUNT=$((CHECKCOUNT + 1))
    echo -n "CHECKUP:"
    if ping -c 2 -w 3 "${GWIP}" &>/dev/null; then
        #echo " [ok]";
        WANSTATUS="UP"; return 0
    else
        #echo " [down]";
        WANSTATUS="DOWN"; return 1
    fi
}
# Called when the link is considered down: restart the network once,
# then only report until the gateway is REACHABLE again.
connectiondown() {
    if [ "$WANREPAIRSTATE" = "FIXING" ]; then
        echo "connection still down"
    else
        logger -t wan-mon "${WANIF} DOWN:`date +%Y%m%d-%H%M` ${DOWNCOUNT}"
        /etc/init.d/network restart; sleep 7
        logger -t wan-mon "${WANIF} `date +%Y%m%d-%H%M` RESTARTING NETWORK..."
        WANREPAIRCOUNT=$((WANREPAIRCOUNT + 1))
        WANREPAIRSTATE="FIXING" #REPAIR FAILED WAIT
    fi
    WANSTATE="DOWN"
    showstats
}
connectionup() {
    logger -t wan-mon "${WANIF} UP:`date +%Y%m%d-%H%M` ${DOWNCOUNT}"
}
# Print a one-line summary of the current counters and state.
showstats() {
    echo -n "${WANRESULT:-LEARNING}"
    echo -n ":::"
    echo -n "$WANIF@$WANIP>$GWIP [$WANSTATUS]"
    echo -n " STALE[$STALECOUNT]"
    echo -n " CHECK[$CHECKCOUNT]"
    echo -n " DOWN[$DOWNCOUNT]"
    echo -n " WANREPAIR[$WANREPAIRCOUNT]"
    echo -n " REPAIRSTATE[$WANREPAIRSTATE]"
    echo ""
}
while true; do
    # Stop once the repair budget is used up.
    if (("$WANREPAIRCOUNT" > "$WANREPAIRMAX")); then
        echo "ENDING SCRIPT ::: WANREPAIRCOUNT: $WANREPAIRCOUNT > WANREPAIRMAX: $WANREPAIRMAX"
        exit 21
    fi
    getgwip
    if [ -z "$GWIP" ]; then
        connectiondown
    else
        # Neighbour-table entry for the gateway; its state drives the case below.
        WANSTATE="`ip neigh show dev $WANIF | grep -w "$GWIP"`"
        case $WANSTATE in
            *"FAILED"*)
                WANRESULT="FAILED"
                WANSTATUS="DOWN"
                if [ "$WANREPAIRSTATE" = "FIXING" ]; then
                    echo "hmmm... fixing but still down"
                fi
                ;;
            *"STALE"*)
                WANRESULT="STALE"
                STALECOUNT=$((STALECOUNT + 1))
                ;;
            *"REACHABLE"*)
                WANRESULT="REACHABLE"
                STALECOUNT=0
                if [ "$WANREPAIRSTATE" = "FIXING" ]; then
                    WANREPAIRSTATE="REPAIRED"
                    logger -t wan-mon "${WANIF} UP:`date +%Y%m%d-%H%M` ${DOWNCOUNT} NETWORK RESTORED..."
                fi
                sleep $BACKOFFUP
                ;;
            *"DELAY"*)
                WANRESULT="DELAY"
                ;;
            *)
                WANRESULT="UNKNOWN"
                WANSTATUS="DOWN"
                ;;
        esac
        showstats
        if (("$VERBOSE" > "0")); then
            : #logger -t wan-mon-stat "`showstats`";
        fi
        # Too many consecutive STALE readings: confirm with an active ping.
        if (("$STALECOUNT" > "$STALEMAX")); then
            if checkup; then
                STALECOUNT=0
            else
                DOWNCOUNT=$((DOWNCOUNT + 1))
                connectiondown
            fi
        fi
        if [ "$WANSTATUS" == "DOWN" ]; then
            echo "STATE is DOWN at bottom loop"; sleep 1
            if checkup; then
                STALECOUNT=0
            else
                DOWNCOUNT=$((DOWNCOUNT + 1))
                connectiondown
            fi
        fi
        if (("$STALECOUNT" < "$STALEMAX")); then sleep $BACKOFFSTALE; fi
    fi
    sleep $LOOPDELAY
done
echo "Exiting"
exit 0
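One way the capture might be triggered from connectiondown is a small helper that starts a single background tcpdump and refuses to start a second one while the first is still running. This is a sketch only: `start_capture_once`, `stop_capture`, `CAPTURE_CMD` and `PIDFILE` are hypothetical names that do not exist in the script above, and the output path is a placeholder.

```shell
#!/bin/sh
# Start at most one background capture when the link is declared down.
# -c 5000 bounds the capture so it cannot grow without limit.
CAPTURE_CMD="${CAPTURE_CMD:-tcpdump -i eth1.2 -n -e -c 5000 -w /tmp/wan-down.pcap}"
PIDFILE="${PIDFILE:-/tmp/wan-capture.pid}"

start_capture_once() {
    # Return 1 without doing anything if a previous capture is still alive.
    if [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null; then
        return 1
    fi
    $CAPTURE_CMD >/dev/null 2>&1 &
    echo $! > "$PIDFILE"
}

stop_capture() {
    # Kill the recorded capture, if any, and forget its PID.
    [ -f "$PIDFILE" ] || return 0
    kill "$(cat "$PIDFILE")" 2>/dev/null
    rm -f "$PIDFILE"
}
```

Calling `start_capture_once` before the `/etc/init.d/network restart` line would record the traffic while the fault is still present; whether a running capture survives that restart on the wan device is something to check on the router.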
Unfortunately, no. I simply ran the arp command and checked that the gateway's IP was there. I'll use the command you provided next time.
I'll try to check with my ISP if there is a reason why I cannot ping the gateway. In the meantime I'll add the tcpdump command to my script.
Here you wrote that you do not have internet access.
Here you wrote that you do have internet access.
Both can't be true. Either you have internet access or you do not.
Please note that it can take up to 30 seconds to assign the wan interface a new IP.
The next time the failure appears, check the wan IP before you run ifup wan, then run ifup wan, wait 1 minute and check the wan IP again. If the old and new IPs are different, that confirms the theory from my earlier post.
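The before/after comparison could be scripted roughly as follows. It is a sketch that assumes the wan device is eth1.2, as in the tcpdump command earlier in the thread; `first_ipv4` and `compare_ips` are hypothetical helper names.

```shell
#!/bin/sh
# Print the first IPv4 address found in `ip -4 addr show dev <if>` output.
first_ipv4() {
    awk '{
        for (i = 1; i < NF; i++)
            if ($i == "inet") { sub(/\/.*/, "", $(i+1)); print $(i+1); exit }
    }'
}

# Report whether the address changed across the interface restart.
compare_ips() {
    if [ "$1" = "$2" ]; then
        echo "unchanged: $1"
    else
        echo "changed: $1 -> $2"
    fi
}

# Example on the router:
#   old=$(ip -4 addr show dev eth1.2 | first_ipv4)
#   ifup wan; sleep 60
#   new=$(ip -4 addr show dev eth1.2 | first_ipv4)
#   compare_ips "$old" "$new"
```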
However it's exactly what is happening every time. Just before 13h00 I lost internet connection. By no internet connection I mean that the computer I was using (macbook pro, wired connection) couldn't reach any website in a browser. I couldn't connect to any website (I use www.perdu.com for test) or ping public DNS servers. I couldn't connect outside the LAN, thus no internet connection.
As I mentioned previously all LAN connections were working. From the router I could ping my public IPv4 address (82.64.x.x) on the WAN interface. I still couldn't reach the internet. The IP is static. By static I mean that my ISP always gives me the same IPv4 address: a static DHCP lease.
The WAN interface still had the 82.64.x.x IP when I executed ifup wan. The lease was renewed immediately and I kept the same 82.64.x.x IP. My internet connection was restored only after restarting the WAN interface. The modem was up the whole time and it didn't require any action.
The next time it happens I will also test a few IPv6 addresses to see if I get the same thing. I'd like to see if both the WAN and WAN6 interfaces are impacted. I'll also get the output of ip a to have all the IPs assigned on the router. I'll post back when I have updates.
Finally, just to be sure I'll contact my ISP again tomorrow to ask them if they see anything suspicious on my line or the modem itself.
This falsifies my suspicion. My assumption was that the modem changes the IP and the Linksys router keeps operating with the old, invalid IP.
If you always get the same IP from the modem, ifup wan doesn't change anything.
Restarting the wan interface from LuCI is the same as running ifup wan from an SSH command line.
So far I have no further ideas. To me, your problem is located either in the modem or with your ISP.
If the problem does not occur when you operate your modem in router mode, then I suspect the bridge mode of your modem is the cause of your troubles.
It happened again today. Same thing as yesterday: no 'connection' outside the LAN. Unfortunately there is nothing new in logread and dmesg.
But I got something interesting this time. IPv6 is still working when it happens. For example I could not ping the IPv4 address of forum.openwrt.org (139.59.210.197), but it works if I use its IPv6 address 2a03:b0c0:3:d0::168b:9001.
Same for other destinations. For example I can ping the AdGuard DNS IPv6 address (2a00:5a60::ad2:ff), but not its IPv4 address (176.103.130.131). Same thing with the Google and Quad9 DNS servers.
It's clear that the internet connection is still working but some traffic (IPv4 in these tests) doesn't get through.
This time I didn't have time to run tcpdump. My script was still running ping and traceroute against some IPs when things started to work again. This time I didn't restart anything. I would say it took about 10-15 minutes for the connection to work again. Next time I'll start tcpdump first and let it run in the background while the other tests are running.
Edit: When I mention tcpdump here, I mean the command suggested previously to capture only packets going out of wan (tcpdump -i eth1.2 -ev).
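Since IPv6 keeps working while IPv4 fails, it might help to log both families side by side at each check. A minimal sketch: `classify_stack`, `PING4` and `PING6` are hypothetical names, and the defaults assume a ping that accepts -4/-6 (some BusyBox builds ship a separate ping6 instead).

```shell
#!/bin/sh
# Probe one IPv4 and one IPv6 address and print which families respond.
PING4="${PING4:-ping -4 -c 2 -w 3}"
PING6="${PING6:-ping -6 -c 2 -w 3}"

classify_stack() {
    # $1 = IPv4 test address, $2 = IPv6 test address
    v4=down; v6=down
    $PING4 "$1" >/dev/null 2>&1 && v4=up
    $PING6 "$2" >/dev/null 2>&1 && v6=up
    echo "ipv4=$v4 ipv6=$v6"
}

# Example, logged with a timestamp through syslog:
#   logger -t wan-mon "$(classify_stack 176.103.130.131 2a00:5a60::ad2:ff)"
```

Run every minute or so, this would show when IPv4 drops and when it comes back, including the cases where it recovers on its own.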
That might choke your router. The example I gave earlier will capture all packets going out of wan.
Of course. I edited my last reply to clarify my 'intentions'. When I mention tcpdump, I actually use the command you suggested (tcpdump -i eth1.2 -ev). I'll also add the -w flag to 'record' the dump so I can have a look at it afterwards. Hopefully the next data collection will be usable.
This might be worse, as it will fill the flash.
Nothing to worry about: I have an SSD connected to the router with about 200 GB of free space. That should be enough.
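Even with plenty of disk, the capture could be bounded with tcpdump's rotating-file options: -C sets the size per file (in millions of bytes) and -W the number of files before it starts overwriting the oldest. The helper below only builds the command line; `capture_cmd` and the /mnt/ssd path are assumptions for illustration, not tested on this router.

```shell
#!/bin/sh
# Build a tcpdump command line that writes a bounded ring of capture files.
capture_cmd() {
    # $1 = interface, $2 = output file prefix, $3 = MB per file, $4 = file count
    echo "tcpdump -i $1 -n -e -C $3 -W $4 -w $2.pcap"
}

# Example: keep at most 20 files of roughly 100 MB each on the SSD.
#   $(capture_cmd eth1.2 /mnt/ssd/wan 100 20) &
```

A file written with -w can be read back later with tcpdump -r.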
I'm trying to figure out if this is similar to my issue: openwrt.org/t/replug-ethernet-everyday/55686/2
Does unplugging the ethernet cable from your modem and plugging it back in reconnect the network to the internet?
I am also not able to connect to my modem.
And it's happening during high internet use on random days with no pattern.
I didn't try that. So far I either rebooted or restarted the network interface.
I'm back: a new occurrence of the problem today. Hopefully the data I collected will make sense.
- ip neigh show all: the default gateway for my internet access (82.64.x.x) shows as REACHABLE.
- ping IPv4 addresses: they all fail.
  - my ISP DNS servers
  - 9.9.9.9
  - 8.8.8.8
  - 1.1.1.1
- ping IPv6 addresses: OK
  - 2a00:5a60::ad1:ff
  - 2a03:b0c0:3:d0::168b:9001
- I captured a tcpdump (tcpdump -i eth1.2 -ev). Unfortunately my skills here are limited and I don't know what I should look for exactly (44k lines).
  - I did a dumb grep -c on a few strings that seem to appear the most.
    - ACK: 37219
    - TCP segment of a reassembled PDU: 11190
    - Application Data: 4724
- I noticed in the tcpdump that one IP appears more often than any other. Of the 44k lines, 31532 are for 68.65.192.73 alone (CrashPlan). Indeed CrashPlan is running on my MacBook Pro.
- I can ping IPv6 addresses and not IPv4. I think it's safe to say that my internet connection is working, but only for IPv6 traffic.
- Could it be a form of network congestion? Like CrashPlan "flooding" my WAN connection?
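To check whether one host really dominates the capture, as the 68.65.192.73 count suggests, the IPv4 addresses in the text dump could be ranked by how often they appear. A rough sketch: `top_talkers` is a hypothetical helper, and it counts occurrences in the text, which is not exactly the same as counting packets or bytes.

```shell
#!/bin/sh
# Rank IPv4 addresses by how many times they appear in tcpdump text output.
top_talkers() {
    # $1 = how many addresses to list (default 10); reads the dump from stdin
    grep -oE '([0-9]{1,3}\.){3}[0-9]{1,3}' \
        | sort | uniq -c | sort -rn \
        | head -n "${1:-10}" \
        | awk '{ print $1, $2 }'
}

# Example: top_talkers 5 < /path/to/capture.txt
```

If the same address tops the list during an outage and during normal operation, the backup traffic alone probably does not explain the IPv4 loss.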