Pppoe intermittent disconnects on FTTH

I have been experiencing intermittent disconnects (usually happens between 2-150 minutes) with a pppoe connection with FTTH.

I enabled debug logging in the pppd options and it appears that my connection is terminated due to the ISP server sending me an LCP TermReq packet:

Here is an example:

Thu Oct  1 21:37:37 2020 daemon.debug pppd[24186]: rcvd [LCP TermReq id=0x5b] 07 38 00 00 00 00 00 00 01 98 00 00 01 da 2e 9e 3d 7f e2 4f 22 40
Thu Oct  1 21:37:37 2020 daemon.info pppd[24186]: LCP terminated by peer
Thu Oct  1 21:37:37 2020 daemon.info pppd[24186]: Connect time 32.8 minutes.
Thu Oct  1 21:37:37 2020 daemon.info pppd[24186]: Sent 6776715 bytes, received 66813993 bytes.
Thu Oct  1 21:37:37 2020 daemon.notice netifd: Network device 'pppoe-wan' link is down
Thu Oct  1 21:37:37 2020 daemon.debug dnsmasq[2575]: stopped listening on pppoe-wan(#66): 84.x.x.x port 53
Thu Oct  1 21:37:37 2020 daemon.debug pppd[24186]: Script /lib/netifd/ppp-down started (pid 24402)
Thu Oct  1 21:37:37 2020 daemon.debug pppd[24186]: sent [LCP TermAck id=0x5b]
Thu Oct  1 21:37:37 2020 daemon.notice netifd: Interface 'wan' has lost the connection
Thu Oct  1 21:37:37 2020 daemon.warn dnsmasq[2575]: no servers found in /tmp/resolv.conf.d/resolv.conf.auto, will retry
Thu Oct  1 21:37:37 2020 daemon.debug pppd[24186]: Script /lib/netifd/ppp-down finished (pid 24402), status = 0x1

I captured the pppoe data (and then all of the data) on the wan port waiting for this to reproduce, and this is the non IP packets:

$ /usr/sbin/tcpdump -e -r ./pppoe_dump_1.pcap  | tail
reading from file ./pppoe_dump_1.pcap, link-type EN10MB (Ethernet)
12:11:21.516842 00:02:3b:10:2e:b1 (oui Unknown) > 74:da:da:xx:xx:xx (oui Unknown), ethertype PPPoE S (0x8864), length 60: PPPoE  [ses 0x1a] LCP (0xc021), length 28: LCP, Echo-Reply (0x0a), id 93, length 28
12:11:22.215794 00:06:19:30:1c:ee (oui Unknown) > cf:00:00:00:00:00 (oui Unknown), ethertype Loopback (0x9000), length 64: Loopback, skipCount 0, Reply, receipt number 1, data (44 octets)
12:11:26.256096 00:06:19:30:1c:ee (oui Unknown) > cf:00:00:00:00:00 (oui Unknown), ethertype Loopback (0x9000), length 64: Loopback, skipCount 0, Reply, receipt number 1, data (44 octets)
12:11:26.519692 74:da:da:xx:xx:xx (oui Unknown) > 00:02:3b:10:2e:b1 (oui Unknown), ethertype PPPoE S (0x8864), length 30: PPPoE  [ses 0x1a] LCP (0xc021), length 10: LCP, Echo-Request (0x09), id 94, length 10
12:11:27.264920 00:06:19:30:1c:ee (oui Unknown) > cf:00:00:00:00:00 (oui Unknown), ethertype Loopback (0x9000), length 64: Loopback, skipCount 0, Reply, receipt number 1, data (44 octets)
12:11:28.274663 00:06:19:30:1c:ee (oui Unknown) > cf:00:00:00:00:00 (oui Unknown), ethertype Loopback (0x9000), length 64: Loopback, skipCount 0, Reply, receipt number 1, data (44 octets)
12:11:29.112870 00:02:3b:10:2e:b1 (oui Unknown) > 74:da:da:xx:xx:xx (oui Unknown), ethertype PPPoE S (0x8864), length 60: PPPoE  [ses 0x1a] LCP (0xc021), length 28: LCP, Term-Request (0x05), id 16, length 28
12:11:29.121857 74:da:da:xx:xx:xx (oui Unknown) > 00:02:3b:10:2e:b1 (oui Unknown), ethertype PPPoE S (0x8864), length 26: PPPoE  [ses 0x1a] LCP (0xc021), length 6: LCP, Term-Ack (0x06), id 16, length 6
12:11:29.285003 00:06:19:30:1c:ee (oui Unknown) > cf:00:00:00:00:00 (oui Unknown), ethertype Loopback (0x9000), length 64: Loopback, skipCount 0, Reply, receipt number 1, data (44 octets)
12:11:31.305106 00:06:19:30:1c:ee (oui Unknown) > cf:00:00:00:00:00 (oui Unknown), ethertype Loopback (0x9000), length 64: Loopback, skipCount 0, Reply, receipt number 1, data (44 octets)

My simple wan config is below:

config interface 'wan'
        option ifname 'wan'
        option proto 'pppoe'
        option username 'xxx@xx'
        option password 'xxxx'

I have not yet contacted my ISP because when I plug a Raspberry Pi 4 running Ubuntu 20.04.1 directly into the connection and use it as a pppoe client and router, the connection is stable. It stands to reason that something unique with my router is causing this.

I am using the a home built latest SNAPSHOT build on my router. This router is not yet supported officially, but I already have a PR to add support here: https://github.com/openwrt/openwrt/pull/3468 None of the code changes are related to pppd. It only has minor changes in partition layout, boot headers, firmware image signatures that the factory firmware accepts, etc.

I have tried enabling and disabling keepalive_adaptive in the network configuration as well as adjusting ping intervals and it did not help. I also tried building ppp with and without multilink (which didn't help either).

I am wondering what else I can try before contacting my ISP because I suspect this issue is addressable on my side. Also, I have my doubts whether they'll route my case to an engineer that knows what an LCP TermReq is.

Thanks!

Bump.

This appears to be an issue in the latest snapshot builds, as earlier snapshot builds do not have this issue.

I tried installing older builds on my router (manually patched), and they do not have this issue. As of now I've narrowed it down to:

Good: 17e64b9447959858c5c85f7f6c98264775585711
Bad: bf2870c1d9e162f27925e9c40eca9e0069096f3b

To put it another way, there are 63 possible commits:

$ git log --oneline 17e64b9447959858c5c85f7f6c98264775585711..bf2870c1d9e162f27925e9c40eca9e0069096f3b | wc -l
63

Thanks to luuhungit I guessed correctly to check at the beginning and end of 5.4.65, which narrowed it down. Many of these commits are related to my router's chipset (MT7621), so I'm guessing that the issue only occurs on these boards.

It seems I'm not the only one having these issues. See here and here.

Any help on this would be greatly appreciated. I'm out of ideas and won't have time to bisect this for a while.

I am using DIR-882 with MT7615N , my log output is different, I changed the LCP interval and threshold to 60 5 respectivley.

I have enabled debug logs for /etc/ppp/options but not getting any info on why I am getting modem hangup.

Tue Oct 13 04:41:57 2020 daemon.notice pppd[28784]: Modem hangup
Tue Oct 13 04:41:57 2020 daemon.info pppd[28784]: Connect time 16.2 minutes.
Tue Oct 13 04:41:57 2020 daemon.info pppd[28784]: Sent 2203993 bytes, received 79961744 bytes.
Tue Oct 13 04:41:57 2020 daemon.notice netifd: Network device 'pppoe-wan' link is down
Tue Oct 13 04:41:57 2020 daemon.debug dnsmasq[2212]: stopped listening on pppoe-wan(#23): 87.200.238.220 port 53
Tue Oct 13 04:41:57 2020 daemon.debug pppd[28784]: Script /lib/netifd/ppp-down started (pid 30028)
Tue Oct 13 04:41:57 2020 daemon.notice pppd[28784]: Connection terminated.
Tue Oct 13 04:41:57 2020 daemon.debug pppd[28784]: Send PPPOE Discovery V1T1 PADT session 0x1 length 28
Tue Oct 13 04:41:57 2020 daemon.debug pppd[28784]:  dst 00:00:5e:00:01:76  src 74:da:da:be:d1:8b
Tue Oct 13 04:41:57 2020 daemon.debug pppd[28784]:  [host-uniq  70 70 00 00] [AC-cookie  51 42 44 80 74 3b ae 1d 5f 04 f0 77 3e 08 03 11]
Tue Oct 13 04:41:57 2020 daemon.info pppd[28784]: Sent PADT
Tue Oct 13 04:41:57 2020 daemon.debug pppd[28784]: Waiting for 1 child processes...
Tue Oct 13 04:41:57 2020 daemon.debug pppd[28784]:   script /lib/netifd/ppp-down, pid 30028
Tue Oct 13 04:41:57 2020 daemon.notice netifd: Interface 'wan' has lost the connection
Tue Oct 13 04:41:57 2020 daemon.debug pppd[28784]: Script /lib/netifd/ppp-down finished (pid 30028), status = 0x1
Tue Oct 13 04:41:57 2020 daemon.info pppd[28784]: Exit.
Tue Oct 13 04:41:57 2020 daemon.notice netifd: Interface 'wan' is now down
Tue Oct 13 04:41:57 2020 daemon.notice netifd: Interface 'wan' is setting up now
Tue Oct 13 04:41:57 2020 daemon.err insmod: module is already loaded - slhc
Tue Oct 13 04:41:57 2020 daemon.err insmod: module is already loaded - ppp_generic
Tue Oct 13 04:41:57 2020 daemon.err insmod: module is already loaded - pppox
Tue Oct 13 04:41:57 2020 daemon.err insmod: module is already loaded - pppoe
Tue Oct 13 04:41:57 2020 daemon.info pppd[30065]: Plugin rp-pppoe.so loaded.
Tue Oct 13 04:41:57 2020 daemon.info pppd[30065]: RP-PPPoE plugin version 3.8p compiled against pppd 2.4.8
Tue Oct 13 04:41:57 2020 daemon.notice pppd[30065]: pppd 2.4.8 started by root, uid 0
Tue Oct 13 04:41:57 2020 daemon.debug pppd[30065]: Send PPPOE Discovery V1T1 PADI session 0x0 length 12
Tue Oct 13 04:41:57 2020 daemon.debug pppd[30065]:  dst ff:ff:ff:ff:ff:ff  src 74:da:da:be:d1:8b
Tue Oct 13 04:41:57 2020 daemon.debug pppd[30065]:  [service-name] [host-uniq  71 75 00 00]
Tue Oct 13 04:41:57 2020 daemon.debug pppd[30065]: Recv PPPOE Discovery V1T1 PADO session 0x0 length 58
Tue Oct 13 04:41:57 2020 daemon.debug pppd[30065]:  dst 74:da:da:be:d1:8b  src 00:00:5e:00:01:76
Tue Oct 13 04:41:57 2020 daemon.debug pppd[30065]:  [service-name] [AC-name DXB-KAZ-BNG-7750SR12-1] [host-uniq  71 75 00 00] [AC-cookie  51 42 44 80 74 3b ae 1d 5f 04 f0 77 3e 08 03 11]
Tue Oct 13 04:41:57 2020 daemon.debug pppd[30065]: Send PPPOE Discovery V1T1 PADR session 0x0 length 32
Tue Oct 13 04:41:57 2020 daemon.debug pppd[30065]:  dst 00:00:5e:00:01:76  src 74:da:da:be:d1:8b
Tue Oct 13 04:41:57 2020 daemon.debug pppd[30065]:  [service-name] [host-uniq  71 75 00 00] [AC-cookie  51 42 44 80 74 3b ae 1d 5f 04 f0 77 3e 08 03 11]
Tue Oct 13 04:41:57 2020 daemon.debug pppd[30065]: Recv PPPOE Discovery V1T1 PADS session 0x1 length 12
Tue Oct 13 04:41:57 2020 daemon.debug pppd[30065]:  dst 74:da:da:be:d1:8b  src 00:00:5e:00:01:76
Tue Oct 13 04:41:57 2020 daemon.debug pppd[30065]:  [service-name] [host-uniq  71 75 00 00]
Tue Oct 13 04:41:57 2020 daemon.debug pppd[30065]: PADS: Service-Name: ''
Tue Oct 13 04:41:57 2020 daemon.info pppd[30065]: PPP session is 1
Tue Oct 13 04:41:57 2020 daemon.warn pppd[30065]: Connected to 00:00:5e:00:01:76 via interface wan
Tue Oct 13 04:41:57 2020 daemon.debug pppd[30065]: using channel 14
Tue Oct 13 04:41:57 2020 kern.info kernel: [ 2461.875282] pppoe-wan: renamed from ppp0
Tue Oct 13 04:41:57 2020 daemon.info pppd[30065]: Renamed interface ppp0 to pppoe-wan
Tue Oct 13 04:41:57 2020 daemon.info pppd[30065]: Using interface pppoe-wan
Tue Oct 13 04:41:57 2020 daemon.notice pppd[30065]: Connect: pppoe-wan <--> wan
Tue Oct 13 04:41:57 2020 daemon.debug pppd[30065]: sent [LCP ConfReq id=0x1 <mru 1492> <magic 0x604cfbc5>]
Tue Oct 13 04:41:58 2020 daemon.info dnsmasq-dhcp[2212]: DHCPREQUEST(br-lan) 10.10.9.112 24:18:1d:a1:05:16
Tue Oct 13 04:41:58 2020 daemon.info dnsmasq-dhcp[2212]: DHCPACK(br-lan) 10.10.9.112 24:18:1d:a1:05:16 Galaxy-S8
Tue Oct 13 04:42:00 2020 daemon.debug pppd[30065]: sent [LCP ConfReq id=0x1 <mru 1492> <magic 0x604cfbc5>]
Tue Oct 13 04:42:00 2020 daemon.debug pppd[30065]: rcvd [LCP ConfAck id=0x1 <mru 1492> <magic 0x604cfbc5>]
Tue Oct 13 04:42:03 2020 daemon.debug pppd[30065]: sent [LCP ConfReq id=0x1 <mru 1492> <magic 0x604cfbc5>]
Tue Oct 13 04:42:06 2020 daemon.debug pppd[30065]: sent [LCP ConfReq id=0x1 <mru 1492> <magic 0x604cfbc5>]
Tue Oct 13 04:42:07 2020 daemon.debug pppd[30065]: rcvd [LCP ConfReq id=0xdc <mru 1492> <auth pap> <magic 0x270de922>]
Tue Oct 13 04:42:07 2020 daemon.debug pppd[30065]: sent [LCP ConfAck id=0xdc <mru 1492> <auth pap> <magic 0x270de922>]
Tue Oct 13 04:42:09 2020 daemon.debug pppd[30065]: sent [LCP ConfReq id=0x1 <mru 1492> <magic 0x604cfbc5>]
Tue Oct 13 04:42:12 2020 daemon.debug pppd[30065]: sent [LCP ConfReq id=0x1 <mru 1492> <magic 0x604cfbc5>]
Tue Oct 13 04:42:15 2020 daemon.debug pppd[30065]: sent [LCP ConfReq id=0x1 <mru 1492> <magic 0x604cfbc5>]
Tue Oct 13 04:42:16 2020 daemon.notice pppd[30065]: Modem hangup
Tue Oct 13 04:42:16 2020 daemon.notice pppd[30065]: Connection terminated.
Tue Oct 13 04:42:16 2020 daemon.debug pppd[30065]: Send PPPOE Discovery V1T1 PADT session 0x1 length 28
Tue Oct 13 04:42:16 2020 daemon.debug pppd[30065]:  dst 00:00:5e:00:01:76  src 74:da:da:be:d1:8b
Tue Oct 13 04:42:16 2020 daemon.debug pppd[30065]:  [host-uniq  71 75 00 00] [AC-cookie  51 42 44 80 74 3b ae 1d 5f 04 f0 77 3e 08 03 11]
Tue Oct 13 04:42:16 2020 daemon.info pppd[30065]: Sent PADT
Tue Oct 13 04:42:16 2020 daemon.info pppd[30065]: Exit.
Tue Oct 13 04:42:16 2020 daemon.notice netifd: Interface 'wan' is now down
Tue Oct 13 04:42:16 2020 daemon.notice netifd: Interface 'wan' is setting up now
Tue Oct 13 04:42:16 2020 daemon.err insmod: module is already loaded - slhc
Tue Oct 13 04:42:16 2020 daemon.err insmod: module is already loaded - ppp_generic
Tue Oct 13 04:42:16 2020 daemon.err insmod: module is already loaded - pppox
Tue Oct 13 04:42:16 2020 daemon.err insmod: module is already loaded - pppoe
Tue Oct 13 04:42:16 2020 daemon.info pppd[30249]: Plugin rp-pppoe.so loaded.
Tue Oct 13 04:42:16 2020 daemon.info pppd[30249]: RP-PPPoE plugin version 3.8p compiled against pppd 2.4.8
Tue Oct 13 04:42:16 2020 daemon.notice pppd[30249]: pppd 2.4.8 started by root, uid 0
Tue Oct 13 04:42:16 2020 daemon.debug pppd[30249]: Send PPPOE Discovery V1T1 PADI session 0x0 length 12
Tue Oct 13 04:42:16 2020 daemon.debug pppd[30249]:  dst ff:ff:ff:ff:ff:ff  src 74:da:da:be:d1:8b
Tue Oct 13 04:42:16 2020 daemon.debug pppd[30249]:  [service-name] [host-uniq  29 76 00 00]
Tue Oct 13 04:42:21 2020 daemon.debug pppd[30249]: Send PPPOE Discovery V1T1 PADI session 0x0 length 12
Tue Oct 13 04:42:21 2020 daemon.debug pppd[30249]:  dst ff:ff:ff:ff:ff:ff  src 74:da:da:be:d1:8b
Tue Oct 13 04:42:21 2020 daemon.debug pppd[30249]:  [service-name] [host-uniq  29 76 00 00]
Tue Oct 13 04:42:26 2020 daemon.debug pppd[30249]: Send PPPOE Discovery V1T1 PADI session 0x0 length 12
Tue Oct 13 04:42:26 2020 daemon.debug pppd[30249]:  dst ff:ff:ff:ff:ff:ff  src 74:da:da:be:d1:8b
Tue Oct 13 04:42:26 2020 daemon.debug pppd[30249]:  [service-name] [host-uniq  29 76 00 00]
Tue Oct 13 04:42:31 2020 daemon.warn pppd[30249]: Timeout waiting for PADO packets
Tue Oct 13 04:42:31 2020 daemon.err pppd[30249]: Unable to complete PPPoE Discovery
Tue Oct 13 04:42:31 2020 daemon.info pppd[30249]: Exit.
Tue Oct 13 04:42:31 2020 daemon.notice netifd: Interface 'wan' is now down

What version of OpenWRT are you using. This is in the status overview page in LuCI. Example: OpenWrt SNAPSHOT r0+14461-bedbb3fb8a

Also, could you post the logs before the disconnect occurs (after the connection is associated). The logs you posted are of a failed reconnect, not of the disconnect itself.

Before these lines, the router had a connection properly established. That's the disconnect reason that is more interesting.

@dchard ran into the same issue, and has already found out which commit is the culprit by doing a git bisect. His findings can be found in this post: Mtk_soc_eth watchdog timeout after r11573

1 Like

Yep.. The problem still persists. Reverting that single commit solves the issue. Notified @nbd about it via email 2 weeks ago, nothing happened yet. He is a busy man, hopefully this will get resolved soon.

1 Like

Today https://git.openwrt.org/?p=openwrt/openwrt.git;a=commit;h=b59d5c8f0eebb6d15d7cefe487c17fad0ee4a524 was merged. I compiled a new build and it seems the PPPoE problem is gone finally.

1 Like

Thanks man! I will test today or tomorrow Xiaomi AC2100 snapshot.

Tested last 24 hours and PPPOE is stable. Thanks !

1 Like

This topic was automatically closed 10 days after the last reply. New replies are no longer allowed.