PPPoE disconnects every few hours

Hi!

I have 19.07.2 on Netgear WNDR 3700v2 connected to a FTTH-ethernet adapter, using PPPoE.
The WAN devide is a VLAN ( eth1.3902 ).
It works fine, except once or twice per day it disconnects. (or sometimes more often)
The log shows each time something like this:

Mon Apr 20 02:28:02 2020 daemon.notice pppd[5025]: Modem hangup
Mon Apr 20 02:28:02 2020 daemon.info pppd[5025]: Connect time 361.2 minutes.
Mon Apr 20 02:28:02 2020 daemon.info pppd[5025]: Sent 29699099 bytes, received 741119006 bytes.
Mon Apr 20 02:28:02 2020 daemon.notice netifd: Network device 'pppoe-wan' link is down
Mon Apr 20 02:28:02 2020 daemon.notice pppd[5025]: Connection terminated.
Mon Apr 20 02:28:02 2020 daemon.info pppd[5025]: Sent PADT
Mon Apr 20 02:28:02 2020 daemon.notice netifd: Interface 'wan' has lost the connection
Mon Apr 20 02:28:02 2020 daemon.warn dnsmasq[1565]: no servers found in /tmp/resolv.conf.auto, will retry
Mon Apr 20 02:28:02 2020 daemon.info pppd[5025]: Exit.
Mon Apr 20 02:28:03 2020 daemon.notice netifd: Interface 'wan' is now down
Mon Apr 20 02:28:03 2020 daemon.notice netifd: Interface 'wan' is disabled
Mon Apr 20 02:28:03 2020 daemon.notice netifd: Interface 'wan' is enabled
Mon Apr 20 02:28:03 2020 daemon.notice netifd: Interface 'wan' is setting up now
Mon Apr 20 02:28:03 2020 daemon.err insmod: module is already loaded - slhc
Mon Apr 20 02:28:03 2020 daemon.err insmod: module is already loaded - ppp_generic
Mon Apr 20 02:28:03 2020 daemon.err insmod: module is already loaded - pppox
Mon Apr 20 02:28:03 2020 daemon.err insmod: module is already loaded - pppoe
Mon Apr 20 02:28:03 2020 daemon.info pppd[5475]: Plugin rp-pppoe.so loaded.
Mon Apr 20 02:28:03 2020 daemon.info pppd[5475]: RP-PPPoE plugin version 3.8p compiled against pppd 2.4.7
Mon Apr 20 02:28:03 2020 daemon.notice pppd[5475]: pppd 2.4.7 started by root, uid 0
Mon Apr 20 02:28:03 2020 daemon.info pppd[5475]: PPP session is 1
Mon Apr 20 02:28:03 2020 daemon.warn pppd[5475]: Connected to a4:XX:XX:XX:XX:44 via interface eth1.3902
Mon Apr 20 02:28:03 2020 kern.info kernel: [32285.373578] pppoe-wan: renamed from ppp0
Mon Apr 20 02:28:03 2020 daemon.info pppd[5475]: Renamed interface ppp0 to pppoe-wan
Mon Apr 20 02:28:03 2020 daemon.info pppd[5475]: Using interface pppoe-wan
Mon Apr 20 02:28:03 2020 daemon.notice pppd[5475]: Connect: pppoe-wan <--> eth1.3902
Mon Apr 20 02:28:06 2020 daemon.info pppd[5475]: CHAP authentication succeeded: CHAP authentication success
Mon Apr 20 02:28:06 2020 daemon.notice pppd[5475]: CHAP authentication succeeded
Mon Apr 20 02:28:06 2020 daemon.notice pppd[5475]: peer from calling number A4:XX:XX:XX:XX:44 authorized
Mon Apr 20 02:28:06 2020 daemon.notice pppd[5475]: local  IP address 46.164.XX.XX
Mon Apr 20 02:28:06 2020 daemon.notice pppd[5475]: remote IP address 94.127.XX.XX
Mon Apr 20 02:28:06 2020 daemon.notice pppd[5475]: primary   DNS address 212.18.32.10
Mon Apr 20 02:28:06 2020 daemon.notice pppd[5475]: secondary DNS address 212.18.32.12
Mon Apr 20 02:28:06 2020 daemon.notice netifd: Network device 'pppoe-wan' link is up
Mon Apr 20 02:28:06 2020 daemon.notice netifd: Interface 'wan' is now up
Mon Apr 20 02:28:06 2020 daemon.info dnsmasq[1565]: reading /tmp/resolv.conf.auto
Mon Apr 20 02:28:06 2020 daemon.info dnsmasq[1565]: using local addresses only for domain test
Mon Apr 20 02:28:06 2020 daemon.info dnsmasq[1565]: using local addresses only for domain onion
Mon Apr 20 02:28:06 2020 daemon.info dnsmasq[1565]: using local addresses only for domain localhost
Mon Apr 20 02:28:06 2020 daemon.info dnsmasq[1565]: using local addresses only for domain local
Mon Apr 20 02:28:06 2020 daemon.info dnsmasq[1565]: using local addresses only for domain invalid
Mon Apr 20 02:28:06 2020 daemon.info dnsmasq[1565]: using local addresses only for domain bind
Mon Apr 20 02:28:06 2020 daemon.info dnsmasq[1565]: using local addresses only for domain lan
Mon Apr 20 02:28:06 2020 daemon.info dnsmasq[1565]: using nameserver 212.18.32.10#53
Mon Apr 20 02:28:06 2020 daemon.info dnsmasq[1565]: using nameserver 212.18.32.12#53
Mon Apr 20 02:28:06 2020 user.notice firewall: Reloading firewall due to ifup of wan (pppoe-wan)

How to proceed? Enable debug log for PPP?

When I use the providers router instead of mine, no problems arise. (the connection is stable)

Regards,
David

Yes, enable the PPP debug log. To do so, run this command over ssh:

echo debug >> /etc/ppp/options

I am not sure if there is a way to do this via the web GUI (Luci).

1 Like

I enabled debug (can verify from logs that debug lines are logged), but on next disconnect, nothing was logged:

Tue Apr 21 10:04:32 2020 daemon.notice pppd[6515]: Modem hangup
Tue Apr 21 10:04:32 2020 daemon.info pppd[6515]: Connect time 1094.4 minutes.
Tue Apr 21 10:04:32 2020 daemon.info pppd[6515]: Sent 612360704 bytes, received 3423392867 bytes.
Tue Apr 21 10:04:32 2020 daemon.notice netifd: Network device 'pppoe-wan' link is down
Tue Apr 21 10:04:32 2020 daemon.debug pppd[6515]: Script /lib/netifd/ppp-down started (pid 7655)
Tue Apr 21 10:04:32 2020 daemon.notice pppd[6515]: Connection terminated.
Tue Apr 21 10:04:32 2020 daemon.debug pppd[6515]: Send PPPOE Discovery V1T1 PADT session 0x1 length 28
Tue Apr 21 10:04:32 2020 daemon.debug pppd[6515]:  dst a4:xx:xx:xx:xx:44  src c4:xx:xx:xx:xx:ed
Tue Apr 21 10:04:32 2020 daemon.debug pppd[6515]:  [host-uniq  00 00 19 73] [AC-cookie  ed 89 xx xx xx xx xx xx xx xx xx xx xx xx xx 75]
Tue Apr 21 10:04:32 2020 daemon.info pppd[6515]: Sent PADT
Tue Apr 21 10:04:32 2020 daemon.debug pppd[6515]: Waiting for 1 child processes...
Tue Apr 21 10:04:32 2020 daemon.debug pppd[6515]:   script /lib/netifd/ppp-down, pid 7655
Tue Apr 21 10:04:32 2020 daemon.debug pppd[6515]: Script /lib/netifd/ppp-down finished (pid 7655), status = 0x1
Tue Apr 21 10:04:32 2020 daemon.info pppd[6515]: Exit.

after that the next connection is logged (successful, until it breaks later in the same way)

What do the ~10 lines before this line show? Anything like PADT received?

No, nothing. The last line is totally unrelated:

Tue Apr 21 09:55:28 2020 authpriv.info dropbear[7623]: Exit (root): Exited normally
Tue Apr 21 10:04:32 2020 daemon.notice pppd[6515]: Modem hangup

1 Like

Some more details...
The ISP is A1.si
The owner of the FTTH cable/fiber is telekom.si

The fiber ends in a Innbox G21 device from where a regular ethernet cable goes to an Ubee TVW625 router.
I replaced that router with mine.

I started "tcpdump -i eth1.3902 pppoed" and it captured this when a disconnect happened:

tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth1.3902, link-type EN10MB (Ethernet), capture size 262144 bytes
11:44:09.749322 PPPoE PADO [Service-Name] [AC-Name "SIMB_TABOR_BNG1"] [Host-Uniq 0x44**************************************************AAAAAAAA] [AC-Cookie ".5b.***********v"]
11:44:09.754297 PPPoE PADS [ses 0x1] [Service-Name] [Host-Uniq 0x44**************************************************AAAAAAAA]
11:44:13.115312 PPPoE PADT [ses 0x1]
11:44:13.126422 PPPoE PADT [ses 0x1] [Host-Uniq 0x00002A04] [AC-Cookie 0xED****************************75]
tcpdump: pcap_loop: The interface went down
4 packets captured
22 packets received by filter
0 packets dropped by kernel

The logs:

Thu Apr 23 11:44:13 2020 daemon.notice pppd[10756]: Modem hangup

Also, the stats on eth1.3902 gets reset:

before: ifconfig eth1.3902

eth1.3902 Link encap:Ethernet  HWaddr C4:XX:XX:XX:XX:ED
          inet6 addr: fe80::c6xx:xxff:fexx:xxed/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:682037 errors:0 dropped:0 overruns:0 frame:0
          TX packets:91218 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:373700490 (356.3 MiB)  TX bytes:26176360 (24.9 MiB)

after:

eth1.3902 Link encap:Ethernet  HWaddr C4:XX:XX:XX:XX:ED
          inet6 addr: fe80::c6xx:xxff:fexx:xxed/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:255930 errors:0 dropped:0 overruns:0 frame:0
          TX packets:42015 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:126494373 (120.6 MiB)  TX bytes:25853728 (24.6 MiB)

Mmmh, a PADT is a termination message, so which side was sending this, your CPE to the Upstream PPPoE-server, or the server to your modem?

For sniffing Layer 2 protocols, like PPPoE, please use the "-e" option, so that tcpdump shows MAC addresses.

Not sure. If pppd sent it, would that not be logged? I have the debug option specified and it logs other ppp packets.

I will try next time tcpdump with the -e option.

OK, it happened again, this time tcpdump had the -e option:

OK, more data:
I started tcpdump with the -e option as suggested on openwrt forum: tcpdump -e -v -i eth1.3902 pppoed

on disconnect, this was logged (times are UTC):

tcpdump: listening on eth1.3902, link-type EN10MB (Ethernet), capture size 262144 bytes
18:23:09.204991 a4:7b:2c:9e:c7:44 (oui Unknown) > 44:4e:6d:fd:c7:39 (oui Unknown), ethertype PPPoE D (0x8863), length 97: PPPoE PADO [Service-Name] [AC-Name "SIMB_TABOR_BNG1"] [Host-Uniq 0x44************long_number******************************AA] [AC-Cookie ".5b************v"]
18:23:09.209905 a4:7b:2c:9e:c7:44 (oui Unknown) > 44:4e:6d:fd:c7:39 (oui Unknown), ethertype PPPoE D (0x8863), length 58: PPPoE PADS [ses 0x1] [Service-Name] [Host-Uniq 0x44************long_number******************************AA]
tcpdump: pcap_loop: The interface went down
2 packets captured
4 packets received by filter
0 packets dropped by kernel

ppp logs (excerpt):

11:44:13 2020 daemon.debug pppd[10756]: Send PPPOE Discovery V1T1 PADT session 0x1 length 28
11:44:13 2020 daemon.debug pppd[10756]:  dst a4:7b:2c:9e:c7:44  src c4:xx:xx:xx:xx:ed

11:44:13 2020 daemon.debug pppd[11381]: Send PPPOE Discovery V1T1 PADI session 0x0 length 12
11:44:13 2020 daemon.debug pppd[11381]:  dst ff:ff:ff:ff:ff:ff  src c4:xx:xx:xx:xx:ed

11:44:13 2020 daemon.debug pppd[11381]: Recv PPPOE Discovery V1T1 PADO session 0x0 length 51
11:44:13 2020 daemon.debug pppd[11381]:  dst c4:xx:xx:xx:xx:ed  src a4:7b:2c:9e:c7:44


11:44:13 2020 daemon.debug pppd[11381]: Send PPPOE Discovery V1T1 PADR session 0x0 length 32
11:44:13 2020 daemon.debug pppd[11381]:  dst a4:7b:2c:9e:c7:44  src c4:xx:xx:xx:xx:ed

11:44:13 2020 daemon.debug pppd[11381]: Recv PPPOE Discovery V1T1 PADS session 0x1 length 1211:44:13 2020 daemon.debug pppd[11381]:  dst c4:xx:xx:xx:xx:ed  src a4:7b:2c:9e:c7:44

11:44:13 2020 daemon.warn pppd[11381]: Connected to a4:7b:2c:9e:c7:44 via interface eth1.3902



18:23:12 2020 daemon.notice pppd[11381]: Modem hangup

Actually I think it might be interesting to capture not just pppoed packets, but all packets that clearly do not encapsulate IPv4 or IPv6. This would be (tested only without VLANs):

tcpdump -evpni eth1.3902 'not ether[0x14:2] = 0x21 and not ether[0x14:2] = 0x57'

Note: this also captures LCP echo messages, might be a bit too noisy. Decide yourself whether you want the comfort of knowing that they are sent and received well just before the disconnect, or if you want to filter them out. Add and not (ether[0x14:2] = 0xc021 and (ether[0x16] = 0x09 or ether[0x16] = 0x0a)) if you want to suppress them too.

I had a disconnect again, this time I captured the traffic with: tcpdump -evpni eth1.3902 'not ether[0x14:2] = 0x21 and not ether[0x14:2] = 0x57' as suggested.

10:42:01.795885 c4:xx:xx:xx:xx:ed > a4:xx:xx:xx:xx:44, ethertype PPPoE S (0x8864), length 30: PPPoE  [ses 0x1] LCP (0xc021), length 10: LCP, Echo-Request (0x09), id 20, length 10
        encoded length 8 (=Option(s) length 4)
          Magic-Num 0x277f6119
10:42:01.807239 a4:xx:xx:xx:xx:44 > c4:xx:xx:xx:xx:ed, ethertype PPPoE S (0x8864), length 56: PPPoE  [ses 0x1] LCP (0xc021), length 10: LCP, Echo-Reply (0x0a), id 20, length 10
        encoded length 8 (=Option(s) length 4)
          Magic-Num 0x631aa5be
10:42:04.799255 c4:xx:xx:xx:xx:ed > a4:xx:xx:xx:xx:44, ethertype PPPoE S (0x8864), length 30: PPPoE  [ses 0x1] LCP (0xc021), length 10: LCP, Echo-Request (0x09), id 21, length 10
        encoded length 8 (=Option(s) length 4)
          Magic-Num 0x277f6119
10:42:04.809234 a4:xx:xx:xx:xx:44 > c4:xx:xx:xx:xx:ed, ethertype PPPoE S (0x8864), length 56: PPPoE  [ses 0x1] LCP (0xc021), length 10: LCP, Echo-Reply (0x0a), id 21, length 10
        encoded length 8 (=Option(s) length 4)
          Magic-Num 0x631aa5be
10:42:06.801533 c4:xx:xx:xx:xx:ed > a4:xx:xx:xx:xx:44, ethertype PPPoE S (0x8864), length 30: PPPoE  [ses 0x1] LCP (0xc021), length 10: LCP, Echo-Request (0x09), id 22, length 10
        encoded length 8 (=Option(s) length 4)
          Magic-Num 0x277f6119
10:42:06.807236 a4:xx:xx:xx:xx:44 > c4:xx:xx:xx:xx:ed, ethertype PPPoE S (0x8864), length 56: PPPoE  [ses 0x1] LCP (0xc021), length 10: LCP, Echo-Reply (0x0a), id 22, length 10
        encoded length 8 (=Option(s) length 4)
          Magic-Num 0x631aa5be
10:42:08.803817 c4:xx:xx:xx:xx:ed > a4:xx:xx:xx:xx:44, ethertype PPPoE S (0x8864), length 30: PPPoE  [ses 0x1] LCP (0xc021), length 10: LCP, Echo-Request (0x09), id 23, length 10
        encoded length 8 (=Option(s) length 4)
          Magic-Num 0x277f6119
10:42:08.817232 a4:xx:xx:xx:xx:44 > c4:xx:xx:xx:xx:ed, ethertype PPPoE S (0x8864), length 56: PPPoE  [ses 0x1] LCP (0xc021), length 10: LCP, Echo-Reply (0x0a), id 23, length 10
        encoded length 8 (=Option(s) length 4)
          Magic-Num 0x631aa5be
tcpdump: pcap_loop: The interface went down
52753 packets captured
52760 packets received by filter
0 packets dropped by kernel

ppp log:

Fri Apr 24 10:42:32 2020 daemon.notice pppd[11986]: Modem hangup
Fri Apr 24 10:42:32 2020 daemon.info pppd[11986]: Connect time 979.3 minutes.
Fri Apr 24 10:42:32 2020 daemon.info pppd[11986]: Sent 325423487 bytes, received 1643504692 bytes.

It seems echo requests come more often towards the end, almost every second. Earlier they are also irregular, but come at 30-60 second intervals.

I have the same ISP and the same type of home connection that you do. But a different modem device. I'm also getting regular PPPoE disconnects. Sometimes it's stable for days, weeks and sometimes it will reset every 20m-4h or so.

Maybe it's not a software issue, but an upstream ISP issue?

Of course that is possible. That is why I opened this thread: to find out the cause (and possibly find a solution)

I noticed that with the ISP router there is no such problem (I run ping all the time, with my router, I see ping failures for at least 10 seconds each time it disconnects, while with ISP router it never fails)

Maybe they don't support custom routers.

So what we see here is that LCP echo requests appear approximately once in two seconds, but there are 24 seconds between the last request/reply pair and the disconnection. The echo requests are generated locally, so no, this is not an ISP issue, we have just confirmed it is something on your end. So, please: cat /etc/config/network, let's see if there are any dangerous or missing options there.

# cat /etc/config/network

config interface 'loopback'
        option ifname 'lo'
        option proto 'static'
        option ipaddr '127.0.0.1'
        option netmask '255.0.0.0'

config globals 'globals'
        option ula_prefix 'fd84:d068:c6a3::/48'

config interface 'lan'
        option type 'bridge'
        option ifname 'eth0'
        option proto 'static'
        option ipaddr '192.168.1.1'
        option netmask '255.255.255.0'
        option ip6assign '60'

config device 'lan_eth0_dev'
        option name 'eth0'
        option macaddr 'c6:**:**:**:**:ec'

config interface 'wan'
        option proto 'pppoe'
        option password '********'
        option ipv6 'auto'
        option username '*********'
        option ifname 'eth1.3902'

config interface 'wan6'
        option ifname 'eth1'
        option proto 'dhcpv6'

config switch
        option name 'switch0'
        option reset '1'
        option enable_vlan '1'
        option blinkrate '2'

config switch_vlan
        option device 'switch0'
        option vlan '1'
        option ports '0 1 2 3 5'

config switch_port
        option device 'switch0'
        option port '1'
        option led '6'

config switch_port
        option device 'switch0'
        option port '2'
        option led '9'

config switch_port
        option device 'switch0'
        option port '5'
        option led '2'

It is a fresh install of OpenWRT 19.07.2, I just set a root password and set up the PPPoE (entering: username, password, net device)

wan6 looks wrong, but not sure if it is relevant (i.e. whether the DHCPv6 code can bring the interface down, thus interfering with PPPoE). Still, please edit it as follows:

config interface 'wan6'
	option ifname '@wan'
	option proto 'dhcpv6'

... so that it does the DHCPv6 thingy on pppoe-wan (via the symbolic reference), not on eth1.

Here is my config from a Buffalo WBMR-HP-G300H, where I have to fake the MAC address:

config interface 'dsl0'
	option ifname 'dsl0'
	option proto 'none'
	option macaddr '00:24:a5:bd:40:21'

config interface 'wan'
	option ifname '@dsl0'
	option proto 'pppoe'
	option ipv6 '1'
	option username '....'
	option keepalive '60 5'
	option password '....'

config interface 'wan6'
	option ifname '@wan'
	option proto 'dhcpv6'

Yes, dsl0 is a real interface here (because of the built-in ADSL modem), you'll have to use ifname eth1.3902 in the first block.

is proto dhcpv6 OK?
When it is actually PPPoE? (actually there is no IPv6 service at all)

Anyway, I did as you suggested. Followed by /etc/init.d/network reload

Yes, it's OK, and it works here. DHCPv6, when deployed on DSL, is layered upon an existing PPPoE link, where PPP only configured link-local addresses. If you are sure that there is no IPv6, just delete the wan6 interface. You can always recreate it later when your ISP upgrades the infrastructure.