Possible regression in pppd

hi.

after i upgraded to latest 22.03.2 stable release my router started acting strangely: on daily basis my pppoe connection was broke. after enabling pppd debug I found that in such cases pppd logs this message rcvd [LCP TermReq id=0x2 "Peer not responding"] and as good peer, owrt pppd terminates the connection. then owrt redials and re-establishes the connection.

i reported this to ISP pointing out that i receive the above error message, whether they using any connection related timeout for example. it had been happened roughly every 24 hours so it seemed a plausible cause. but ISP confirmed no timeout is set.

because i have never had this problem before i reverted to 21.02.5 and for a week now i have not experienced this issue. same router, same cabling, only owrt version and as i think pppd version is changed.

is this a possible regression? i.e. pppd version 22.03.2 has a bug like not responding to peer's keepalive request hence the ISP sends termination request. or this totally different, i should set this or that in pppoe config?

setting keepalive (lcp-echo) on my side did not helped on 22.03. the only thing helped so far is using 21.02.

any idea please?

Did you keep the settings when upgraded from 21 to 22 or configured from scratch?
What are the settings that are running in both cases?

ps wwww | grep ppp
uci show network.wan

root@fw:~# ps wwww | grep ppp
24936 root      1100 R    grep ppp
30205 root      1072 S    /usr/sbin/pppd nodetach ipparam wan ifname pppoe-wan lcp-echo-interval 60 lcp-echo-failure 1 lcp-echo-adaptive +ipv6 set AUTOIPV6=1 nodefaultroute usepeerdns maxfail 1 user ???????? password ???????? ip-up-script /lib/netifd/ppp-up ipv6-up-script /lib/netifd/ppp6-up ip-down-script /lib/netifd/ppp-down ipv6-down-script /lib/netifd/ppp-down mtu 1492 mru 1492 plugin rp-pppoe.so nic-eth1
30384 root       804 S    odhcp6c -s /lib/netifd/dhcpv6.script -P0 -t120 pppoe-wan

root@fw:~# uci show network.wan
network.wan=interface
network.wan.proto='pppoe'
network.wan.username='????????'
network.wan.password='????????'
network.wan.ipv6='auto'
network.wan.device='eth1'
network.wan.keepalive='1 60'

i copied network config after upgrade as i have not read / been aware of any changes to network setup between v21 and v22 impacting pppoe. was i wrong?

This is too tight, as it will consider the peer dead with one failed lcp echo request, sent every 60 seconds. Better change it to 6 10

Better not copy, because there may be certain changes and the configuration files will not be compatible. What I do is to take a backup of the configuration running on the old version, upgrade without keeping settings, and then use the backup as a guide to configure from the beginning. If you are sure that the config files remain the same, then you can copy-paste.

but this means my end will send lcp echo requests and expects answer. the error message i saw is different: the remote end (i.e. the isp) sends a termination request because peer (i.e. my end) does not respond. so i don't think it is related to this keepalive setting.
anyhow, i understand it is too aggressive and can change it back, but with default 5 60 (as i recall the values) had the same result.

re upgrade process: usually i do your way but there was not a single note i could find nowhere (not even in source) if pppoe configuration method is changed. but i can change protocol, save, set pppoe again to see any difference it will be.

changing protocol back and forth on wan interface, no significant difference i can see comparing to previously set config:

root@fw:~# uci show network.wan
network.wan=interface
network.wan.device='eth1'
network.wan.keepalive='1 60'
network.wan.proto='none'
root@fw:~# uci show network.wan
network.wan=interface
network.wan.device='eth1'
network.wan.proto='pppoe'
network.wan.username='????????'
network.wan.password='????????'
network.wan.ipv6='auto'

Do you have handy the logs from the debug ppp by any chance?

yup

Summary
 rcvd [LCP TermReq id=0x2 "Peer not responding"]
 LCP terminated by peer (Peer not responding)
 Connect time 667.1 minutes.
 Sent 2744739301 bytes, received 168791919 bytes.
 Script /lib/netifd/ppp-down started (pid 28328)
 sent [LCP TermAck id=0x2]
 Script /lib/netifd/ppp-down finished (pid 28328), status = 0x1
 Modem hangup
 Connection terminated.
 Connect time 667.1 minutes.
 Sent 2744739301 bytes, received 168791919 bytes.
 Plugin pppoe.so loaded.
 PPPoE plugin from pppd 2.4.9
 Send PPPOE Discovery V1T1 PADI session 0x0 length 12
 dst ff:ff:ff:ff:ff:ff  src xx:xx:xx:xx:xx:6b
 [service-name] [host-uniq  4c 6f 00 00]
 Recv PPPOE Discovery V1T1 PADO session 0x0 length 44
 dst xx:xx:xx:xx:xx:6b  src xx:xx:xx:xx:xx:9d
 [AC-name ISP] [service-name] [AC-cookie  d2 cf ea 07 75 d6 69 0d 1e 
 aa e6 cb f7 69 bb f6 60 42 00 00] [host-uniq  4c 6f 00 00]
 Send PPPOE Discovery V1T1 PADR session 0x0 length 36
 dst xx:xx:xx:xx:xx:9d  src xx:xx:xx:xx:xx:6b
 [service-name] [host-uniq  4c 6f 00 00] [AC-cookie  d2 cf ea 07 75 d6 
 69 0d 1e aa e6 cb f7 69 bb f6 60 42 00 00]
 Recv PPPOE Discovery V1T1 PADO session 0x0 length 44
 dst xx:xx:xx:xx:xx:6b  src xx:xx:xx:xx:xx:57
 [AC-name ISP] [service-name] [AC-cookie  56 49 b6 c0 34 3b cf 19 46 
 31 18
 43 d6 e2 b4 f0 67 42 00 00] [host-uniq  4c 6f 00 00]
 Recv PPPOE Discovery V1T1 PADO session 0x0 length 44
 dst xx:xx:xx:xx:xx:6b  src xx:xx:xx:xx:xx:a3
 [AC-name ISP] [service-name] [AC-cookie  65 37 db 48 c2 1b 06 f2 62 
 6c 78
 f0 60 d7 c2 b6 6b 42 00 00] [host-uniq  4c 6f 00 00]
 Recv PPPOE Discovery V1T1 PADO session 0x0 length 44
 dst xx:xx:xx:xx:xx:6b  src xx:xx:xx:xx:xx:15
 [AC-name ISP] [service-name] [AC-cookie  d1 3d 80 58 52 63 da 8b fb 
 90 f8 ec 37 a7 09 b7 76 42 00 00] [host-uniq  4c 6f 00 00]
 Recv PPPOE Discovery V1T1 PADS session 0x525 length 12
 dst xx:xx:xx:xx:xx:6b  src xx:xx:xx:xx:xx:9d
 [service-name] [host-uniq  4c 6f 00 00]
 PADS: Service-Name: ''
 PPP session is 1317
 Connected to xx:xx:xx:xx:xx:9d via interface br-wan
 using channel 68
 Renamed interface ppp0 to pppoe-wan
 Using interface pppoe-wan
 Connect: pppoe-wan <-- br-wan
 sent [LCP ConfReq id=0x1 <mru 1492 <magic 0x85bf01c4]
 rcvd [LCP ConfReq id=0x1 <mru 1492 <auth pap <magic 0xe959c12d]
 sent [LCP ConfAck id=0x1 <mru 1492 <auth pap <magic 0xe959c12d]
 rcvd [LCP ConfAck id=0x1 <mru 1492 <magic 0x85bf01c4]
 sent [LCP EchoReq id=0x0 magic=0x85bf01c4]
 sent [PAP AuthReq id=0x1 user="???????" password=<hidden]
 rcvd [LCP EchoReq id=0x0 magic=0xe959c12d]
 sent [LCP EchoRep id=0x0 magic=0x85bf01c4]
 rcvd [LCP EchoRep id=0x0 magic=0xe959c12d]
 rcvd [PAP AuthNak id=0x1 "Too many sessions\n"]
 Remote message: Too many sessions^J
 PAP authentication failed
 sent [LCP TermReq id=0x2 "Failed to authenticate ourselves to peer"]
 rcvd [LCP TermReq id=0x2 "Authentication failed"]
 sent [LCP TermAck id=0x2]
 rcvd [LCP TermAck id=0x2]
 Connection terminated.
 Plugin pppoe.so loaded.
 PPPoE plugin from pppd 2.4.9
 Send PPPOE Discovery V1T1 PADI session 0x0 length 12
 dst ff:ff:ff:ff:ff:ff  src xx:xx:xx:xx:xx:6b
 [service-name] [host-uniq  49 70 00 00]
 Recv PPPOE Discovery V1T1 PADO session 0x0 length 44
 dst xx:xx:xx:xx:xx:6b  src xx:xx:xx:xx:xx:9d
 [AC-name ISP] [service-name] [AC-cookie  d2 cf ea 07 75 d6 69 0d 1e 
 aa e6 cb f7 69 bb f6 60 42 00 00] [host-uniq  49 70 00 00]
 Send PPPOE Discovery V1T1 PADR session 0x0 length 36
 dst xx:xx:xx:xx:xx:9d  src xx:xx:xx:xx:xx:6b
 [service-name] [host-uniq  49 70 00 00] [AC-cookie  d2 cf ea 07 75 d6 
 69 0d 1e aa e6 cb f7 69 bb f6 60 42 00 00]
 Recv PPPOE Discovery V1T1 PADO session 0x0 length 44
 dst xx:xx:xx:xx:xx:6b  src xx:xx:xx:xx:xx:15
 [AC-name ISP] [service-name] [AC-cookie  d1 3d 80 58 52 63 da 8b fb 
 90 f8 ec 37 a7 09 b7 76 42 00 00] [host-uniq  49 70 00 00]
 Recv PPPOE Discovery V1T1 PADO session 0x0 length 44
 dst xx:xx:xx:xx:xx:6b  src xx:xx:xx:xx:xx:a3
 [AC-name ISP] [service-name] [AC-cookie  65 37 db 48 c2 1b 06 f2 62 
 6c 78
 f0 60 d7 c2 b6 6b 42 00 00] [host-uniq  49 70 00 00]
 Recv PPPOE Discovery V1T1 PADO session 0x0 length 44
 dst xx:xx:xx:xx:xx:6b  src xx:xx:xx:xx:xx:57
 [AC-name ISP] [service-name] [AC-cookie  56 49 b6 c0 34 3b cf 19 46 
 31 18
 43 d6 e2 b4 f0 67 42 00 00] [host-uniq  49 70 00 00]
 Recv PPPOE Discovery V1T1 PADS session 0x73 length 12
 dst xx:xx:xx:xx:xx:6b  src xx:xx:xx:xx:xx:9d
 [service-name] [host-uniq  49 70 00 00]
 PADS: Service-Name: ''
 PPP session is 115
 Connected to xx:xx:xx:xx:xx:9d via interface br-wan
 using channel 69
 Renamed interface ppp0 to pppoe-wan
 Using interface pppoe-wan
 Connect: pppoe-wan <-- br-wan
 sent [LCP ConfReq id=0x1 <mru 1492 <magic 0x2838f07a]
 rcvd [LCP ConfReq id=0x1 <mru 1492 <auth pap <magic 0xcf4a60ec]
 sent [LCP ConfAck id=0x1 <mru 1492 <auth pap <magic 0xcf4a60ec]
 sent [LCP ConfReq id=0x1 <mru 1492 <magic 0x2838f07a]
 rcvd [LCP ConfAck id=0x1 <mru 1492 <magic 0x2838f07a]
 sent [LCP EchoReq id=0x0 magic=0x2838f07a]
 sent [PAP AuthReq id=0x1 user="???????" password=<hidden]
 rcvd [LCP EchoReq id=0x0 magic=0xcf4a60ec]
 sent [LCP EchoRep id=0x0 magic=0x2838f07a]
 rcvd [LCP EchoRep id=0x0 magic=0xcf4a60ec]
 rcvd [PAP AuthAck id=0x1 ""]
 PAP authentication succeeded
 peer from calling number xx:xx:xx:xx:xx:9d authorized
 sent [IPCP ConfReq id=0x1 <addr 0.0.0.0 <ms-dns1 0.0.0.0 <ms-dns2 
 0.0.0.0]
 sent [IPV6CP ConfReq id=0x1 <addr fe80::196c:a709:fd40:b718]
 rcvd [IPCP ConfReq id=0x1 <addr 10.0.0.1]
 sent [IPCP ConfAck id=0x1 <addr 10.0.0.1]
 rcvd [IPV6CP ConfReq id=0x1 <addr fe80::a1b8:325c:bfd3:02a0]
 sent [IPV6CP ConfAck id=0x1 <addr fe80::a1b8:325c:bfd3:02a0]
 rcvd [IPCP ConfNak id=0x1 <addr ??.??.??.136 <ms-dns1 ??.??.??.4
 <ms-dns2 ??.??.??.8]
 sent [IPCP ConfReq id=0x2 <addr ??.??.??.136 <ms-dns1 ??.??.??.4
 <ms-dns2 ??.??.??.8]
 rcvd [IPV6CP ConfAck id=0x1 <addr fe80::196c:a709:fd40:b718]
 local  LL address fe80::196c:a709:fd40:b718
 remote LL address fe80::a1b8:325c:bfd3:02a0
 rcvd [IPCP ConfAck id=0x2 <addr ??.??.??.136 <ms-dns1 ??.??.??.4
 <ms-dns2 ??.??.??.8]
 local  IP address ??.??.??.136
 remote IP address 10.0.0.1
 primary   DNS address ??.??.??.4
 secondary DNS address ??.??.??.8
 Script /lib/netifd/ppp-up started (pid 28949)
 Script /lib/netifd/ppp-up finished (pid 28949), status = 0x1

Do you have anything before receiving the "peer not responding" ?

nope. that's the beauty of it.

This seems to indicate that your ISP still thinks the original session is operational, no?

Then maybe you could run this to (hopefully) capture the culprit?
tcpdump -i eth1 pppoed
You can add -w /tmp/pppoe.pcap to make it save to a file.

maybe. it looks to me that 3 servers are responding to discovery message, not sure each want to start the connection negotiation and then got this too many session failure (which would make sense imho).

or there is some bug in pppd which causing this whole problem:
i mean, receiving termination request due to peer not responding implies that my ppp client has vanished in the air from isp point of view. so when owrt ppp tries to re-establish connection almost immediately (which is by design i guess) then on isp side accouting did not cleared off previous session. that can be in other reason for too many session message.

am not sure which, or even only these 2 causes exists but on v21 this is not happening at all.

i tried that, it does not help. i see nothing LCP related before the actual disconnection, only normal data traffic. and as when wan is down obviously tcpdump exists so cannot even see the early packages when wan is coming up.

tcpdump should not stop because it is running on eth1, not on pppoe-wan.

remember this is happening roughly every 24hours, unpredictable the exact time, so running tcpdump on my wan would generate a lots of file IO :slight_smile: I can use -W to rotate files but still heavy file IO. but probably that's only thing i can do. I'll give a try and report back (will need some time though).

pppoed filter is not producing hardly any traffic.

tcpdump 'ether[0x0c:2] == 0x8863 or ether[0x0c:2] == 0x8864 and ether[0x0f] != 0'
you mean?

It should be the same as 8863.
However I am puzzled that you don't get any other LCP messages before the "peer not responding" and I was hoping to at least verify that you receive them.

# tcpdump -r eth1.log0
reading from file eth1.log0, link-type EN10MB (Ethernet)
16:55:07.433844 PPPoE PADT [ses 0x8ab] [Generic-Error "RP-PPPoE: Child pppd process terminated"]
16:55:10.994155 PPPoE PADI [Service-Name] [Host-Uniq 0x346B0000]
16:55:15.999543 PPPoE PADI [Service-Name] [Host-Uniq 0x346B0000]
16:55:16.147136 PPPoE PADO [AC-Name "ISP"] [Service-Name] [AC-Cookie 0xD2CFEA0775D6690D1EAAE6CBF769BBF660420000] [Host-Uniq 0x346B0000]
16:55:16.147136 PPPoE PADO [AC-Name "ISP"] [Service-Name] [AC-Cookie 0x6537DB48C21B06F2626C78F060D7C2B66B420000] [Host-Uniq 0x346B0000]
16:55:16.147136 PPPoE PADO [AC-Name "ISP"] [Service-Name] [AC-Cookie 0xD13D80585263DA8BFB90F8EC37A709B776420000] [Host-Uniq 0x346B0000]
16:55:16.147777 PPPoE PADO [AC-Name "ISP"] [Service-Name] [AC-Cookie 0x5649B6C0343BCF1946311843D6E2B4F067420000] [Host-Uniq 0x346B0000]
16:55:16.148247 PPPoE PADR [Service-Name] [Host-Uniq 0x346B0000] [AC-Cookie 0xD2CFEA0775D6690D1EAAE6CBF769BBF660420000]
16:55:16.151939 PPPoE PADS [ses 0x507] [Service-Name] [Host-Uniq 0x346B0000]

/storage/var/log/ppp.log:
rcvd [LCP TermReq id=0x3 "Peer not responding"]
LCP terminated by peer (Peer not responding)
Connect time 1010.0 minutes.
Sent 3114903594 bytes, received 1962732333 bytes.
Script /lib/netifd/ppp-down started (pid 27101)
Script /lib/netifd/ppp-down started (pid 27110)
sent [LCP TermAck id=0x3]
Script /lib/netifd/ppp-down finished (pid 27101), status = 0x1
Script /lib/netifd/ppp-down finished (pid 27110), status = 0x1
Modem hangup
Connection terminated.
Connect time 1010.0 minutes.
Sent 3114903594 bytes, received 1962732333 bytes.
Plugin pppoe.so loaded.
PPPoE plugin from pppd 2.4.9
pppd options in effect:
debug           # (from /etc/ppp/options)
nodetach                # (from command line)
logfile /storage/var/log/ppp.log                # (from /etc/ppp/options)
maxfail 1               # (from command line)
ifname pppoe-wan                # (from command line)
dump            # (from /etc/ppp/options)
set AUTOIPV6=1          # (from command line)
ip-up-script /lib/netifd/ppp-up         # (from command line)
ip-down-script /lib/netifd/ppp-down             # (from command line)
ipv6-up-script /lib/netifd/ppp6-up              # (from command line)
ipv6-down-script /lib/netifd/ppp-down           # (from command line)
plugin pppoe.so         # (from command line)
user ??????            # (from command line)
password ??????         # (from command line)
eth1            # (from command line)
eth1            # (from command line)
noaccomp                # (from /etc/ppp/options)
mru 1492                # (from command line)
mtu 1492                # (from command line)
nopcomp         # (from /etc/ppp/options)
lcp-echo-failure 1              # (from command line)
lcp-echo-interval 60            # (from command line)
lcp-echo-adaptive               # (from command line)
ipparam wan             # (from command line)
noipdefault             # (from /etc/ppp/options)
nodefaultroute          # (from command line)
usepeerdns              # (from command line)
+ipv6           # (from command line)
Send PPPOE Discovery V1T1 PADI session 0x0 length 12
 dst ff:ff:ff:ff:ff:ff  src xx:xx:xx:xx:xx:6b
 [service-name] [host-uniq  34 6b 00 00]
Send PPPOE Discovery V1T1 PADI session 0x0 length 12
 dst ff:ff:ff:ff:ff:ff  src xx:xx:xx:xx:xx:6b
 [service-name] [host-uniq  34 6b 00 00]
Recv PPPOE Discovery V1T1 PADO session 0x0 length 44
 dst xx:xx:xx:xx:xx:6b  src xx:xx:xx:xx:xx:9d
 [AC-name ISP] [service-name] [AC-cookie  d2 cf ea 07 75 d6 69 0d 1e aa e6 cb f7 69 bb f6 60 42 00 00] [host-uniq  34 6b 00 00]
Send PPPOE Discovery V1T1 PADR session 0x0 length 36
 dst xx:xx:xx:xx:xx:9d  src xx:xx:xx:xx:xx:6b
 [service-name] [host-uniq  34 6b 00 00] [AC-cookie  d2 cf ea 07 75 d6 69 0d 1e aa e6 cb f7 69 bb f6 60 42 00 00]
Recv PPPOE Discovery V1T1 PADO session 0x0 length 44
 dst xx:xx:xx:xx:xx:6b  src xx:xx:xx:xx:xx:a3
 [AC-name ISP] [service-name] [AC-cookie  65 37 db 48 c2 1b 06 f2 62 6c 78 f0 60 d7 c2 b6 6b 42 00 00] [host-uniq  34 6b 00 00]
Recv PPPOE Discovery V1T1 PADO session 0x0 length 44
 dst xx:xx:xx:xx:xx:6b  src xx:xx:xx:xx:xx:15
 [AC-name ISP] [service-name] [AC-cookie  d1 3d 80 58 52 63 da 8b fb 90 f8 ec 37 a7 09 b7 76 42 00 00] [host-uniq  34 6b 00 00]
Recv PPPOE Discovery V1T1 PADO session 0x0 length 44
 dst xx:xx:xx:xx:xx:6b  src xx:xx:xx:xx:xx:57
 [AC-name ISP] [service-name] [AC-cookie  56 49 b6 c0 34 3b cf 19 46 31 18 43 d6 e2 b4 f0 67 42 00 00] [host-uniq  34 6b 00 00]
Recv PPPOE Discovery V1T1 PADS session 0x507 length 12
 dst xx:xx:xx:xx:xx:6b  src xx:xx:xx:xx:xx:9d
 [service-name] [host-uniq  34 6b 00 00]
PADS: Service-Name: ''
PPP session is 1287
Connected to xx:xx:xx:xx:xx:9d via interface eth1
using channel 3
Renamed interface ppp0 to pppoe-wan
Using interface pppoe-wan
Connect: pppoe-wan <--> eth1
sent [LCP ConfReq id=0x1 <mru 1492> <magic 0x5de5c2e>]
rcvd [LCP ConfReq id=0x1 <mru 1492> <auth pap> <magic 0xe13dd43f>]
sent [LCP ConfAck id=0x1 <mru 1492> <auth pap> <magic 0xe13dd43f>]
sent [LCP ConfReq id=0x1 <mru 1492> <magic 0x5de5c2e>]
rcvd [LCP ConfAck id=0x1 <mru 1492> <magic 0x5de5c2e>]
sent [LCP EchoReq id=0x0 magic=0x5de5c2e]
sent [PAP AuthReq id=0x1 user="???????" password=<hidden>]
rcvd [LCP EchoReq id=0x0 magic=0xe13dd43f]
sent [LCP EchoRep id=0x0 magic=0x5de5c2e]
rcvd [LCP EchoRep id=0x0 magic=0xe13dd43f]
rcvd [PAP AuthAck id=0x1 ""]
PAP authentication succeeded
peer from calling number xx:xx:xx:xx:xx:9d authorized
sent [IPCP ConfReq id=0x1 <addr 0.0.0.0> <ms-dns1 0.0.0.0> <ms-dns2 0.0.0.0>]
sent [IPV6CP ConfReq id=0x1 <addr fe80::bdff:c969:d175:456b>]
rcvd [IPV6CP ConfReq id=0x1 <addr fe80::f864:fb04:ca5e:7e1a>]
sent [IPV6CP ConfAck id=0x1 <addr fe80::f864:fb04:ca5e:7e1a>]
rcvd [IPCP ConfReq id=0x1 <addr 10.0.0.1>]
sent [IPCP ConfAck id=0x1 <addr 10.0.0.1>]
rcvd [IPCP ConfNak id=0x1 <addr ??.??.??.210> <ms-dns1 ??.??.??.4> <ms-dns2 ??.??.??.8>]
sent [IPCP ConfReq id=0x2 <addr ??.??.??.210> <ms-dns1 ??.??.??.4> <ms-dns2 ??.??.??.8>]
rcvd [IPV6CP ConfAck id=0x1 <addr fe80::bdff:c969:d175:456b>]
local  LL address fe80::bdff:c969:d175:456b
remote LL address fe80::f864:fb04:ca5e:7e1a
Script /lib/netifd/ppp6-up started (pid 27588)
rcvd [IPCP ConfAck id=0x2 <addr ??.??.??.210> <ms-dns1 ??.??.??.4> <ms-dns2 ??.??.??.8>]
local  IP address ??.??.??.210
remote IP address 10.0.0.1
primary   DNS address ??.??.??.4
secondary DNS address ??.??.??.8
Script /lib/netifd/ppp-up started (pid 27600)
Script /lib/netifd/ppp-up finished (pid 27600), status = 0x1
Script /lib/netifd/ppp6-up finished (pid 27588), status = 0x0
---- 2022-12-02 16:56:25: ifup of wan (pppoe-wan)

I'm out of ideas. Looking at the uptime of the terminated sessions, it looks like it's random. Unfortunately there is no log of lost LCP, but I suppose that your line is working fine when the termination occurs. Have you noticed any packet loss before it occurs by any chance?