After I upgraded to the latest 22.03.2 stable release my router started acting strangely: on a daily basis my PPPoE connection broke. After enabling pppd debug I found that in such cases pppd logs this message: rcvd [LCP TermReq id=0x2 "Peer not responding"] and, as a good peer, the OpenWrt pppd terminates the connection, then redials and re-establishes it.
I reported this to my ISP, pointing out the above error message and asking whether they use any connection-related timeout. It happened roughly every 24 hours, so that seemed a plausible cause, but the ISP confirmed that no timeout is set.
Because I had never had this problem before, I reverted to 21.02.5, and for a week now I have not experienced the issue. Same router, same cabling; only the OpenWrt version (and, I think, the pppd version) changed.
Is this a possible regression? I.e., does pppd in 22.03.2 have a bug such as not responding to the peer's keepalive requests, so that the ISP sends a termination request? Or is this something totally different, and I should set this or that in the PPPoE config?
Setting keepalive (lcp-echo) on my side did not help on 22.03. The only thing that has helped so far is using 21.02.
root@fw:~# ps wwww | grep ppp
24936 root 1100 R grep ppp
30205 root 1072 S /usr/sbin/pppd nodetach ipparam wan ifname pppoe-wan lcp-echo-interval 60 lcp-echo-failure 1 lcp-echo-adaptive +ipv6 set AUTOIPV6=1 nodefaultroute usepeerdns maxfail 1 user ???????? password ???????? ip-up-script /lib/netifd/ppp-up ipv6-up-script /lib/netifd/ppp6-up ip-down-script /lib/netifd/ppp-down ipv6-down-script /lib/netifd/ppp-down mtu 1492 mru 1492 plugin rp-pppoe.so nic-eth1
30384 root 804 S odhcp6c -s /lib/netifd/dhcpv6.script -P0 -t120 pppoe-wan
root@fw:~# uci show network.wan
network.wan=interface
network.wan.proto='pppoe'
network.wan.username='????????'
network.wan.password='????????'
network.wan.ipv6='auto'
network.wan.device='eth1'
network.wan.keepalive='1 60'
I copied the network config after the upgrade, as I had not read about / been aware of any changes to the network setup between v21 and v22 impacting PPPoE. Was I wrong?
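For anyone wanting to reproduce the debug output above: as far as I know, extra flags can be passed to pppd through the pppd_options UCI option (the interface name wan matches the config shown; treat this as a sketch, not verified on your build):

```shell
# append extra flags to the pppd command line; 'debug' makes pppd log
# every LCP/IPCP control packet, visible with logread
uci set network.wan.pppd_options='debug'
uci commit network
ifup wan
logread -f | grep pppd
```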
This is too tight, as it will consider the peer dead after a single failed LCP echo request, sent every 60 seconds. Better change it to 6 10.
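If it helps, the suggested value can be applied like this (assuming the interface is still named wan, as in your output above):

```shell
# keepalive='<failures> <interval>': only declare the peer dead after
# 6 consecutive unanswered LCP echo requests, sent 10 seconds apart
uci set network.wan.keepalive='6 10'
uci commit network
ifup wan
```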
Better not to copy, because there may be certain changes and the configuration files may not be compatible. What I do is take a backup of the configuration running on the old version, upgrade without keeping settings, and then use the backup as a guide to configure from scratch. If you are sure that the config files remain the same, then you can copy-paste.
But this means my end sends the LCP echo requests and expects the answers. The error message I saw is different: the remote end (i.e. the ISP) sends a termination request because the peer (i.e. my end) does not respond. So I don't think it is related to this keepalive setting.
Anyhow, I understand it is too aggressive and I can change it back, but the default 5 60 (as I recall the values) gave the same result.
Re the upgrade process: usually I do it your way, but I could not find a single note anywhere (not even in the source) saying that the PPPoE configuration method had changed. But I can change the protocol, save, and set pppoe again to see what difference it makes.
Changing the protocol back and forth on the wan interface, I see no significant difference compared to the previously set config:
root@fw:~# uci show network.wan
network.wan=interface
network.wan.device='eth1'
network.wan.keepalive='1 60'
network.wan.proto='none'
root@fw:~# uci show network.wan
network.wan=interface
network.wan.device='eth1'
network.wan.proto='pppoe'
network.wan.username='????????'
network.wan.password='????????'
network.wan.ipv6='auto'
Maybe. It looks to me like 3 servers are responding to the discovery message; I'm not sure whether each wants to start the connection negotiation and I then get this "too many sessions" failure (which would make sense IMHO).
Or there is some bug in pppd which is causing this whole problem:
I mean, receiving a termination request due to "peer not responding" implies that my PPP client has vanished into thin air from the ISP's point of view. So when the OpenWrt pppd tries to re-establish the connection almost immediately (which is by design, I guess), the accounting on the ISP side has not yet cleared the previous session. That could be another reason for the "too many sessions" message.
I am not sure which it is, or even whether only these 2 causes exist, but on v21 this is not happening at all.
I tried that; it does not help. I see nothing LCP-related before the actual disconnection, only normal data traffic. And since tcpdump exits when the WAN goes down, I cannot even see the early packets when the WAN comes back up.
Remember, this happens roughly every 24 hours at an unpredictable time, so running tcpdump on my WAN would generate a lot of file I/O. I can use -W to rotate files, but that is still heavy file I/O. Still, that's probably the only thing I can do. I'll give it a try and report back (it will need some time, though).
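To keep the I/O down, the capture could be restricted to PPPoE discovery frames plus only the LCP packets inside the session, with a bounded set of rotating files. A sketch (the interface name, file sizes, and the offset-based LCP filter are my assumptions, not tested on your link):

```shell
# Keep all PPPoE discovery frames (ethertype 0x8863) and, from the PPPoE
# session (ethertype 0x8864), only LCP packets: the 2-byte PPP protocol
# field (0xc021 = LCP) sits at offset 20 (14-byte Ethernet + 6-byte PPPoE
# header). Rotate through 10 files of 5 MB each so disk usage stays bounded.
tcpdump -i eth1 -s 0 -C 5 -W 10 -w /tmp/pppoe-lcp.pcap \
    '(ether proto 0x8863) or (ether proto 0x8864 and ether[20:2] == 0xc021)'
```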
It should be the same as 8863.
However, I am puzzled that you don't get any other LCP messages before the "Peer not responding", and I was hoping to at least verify that you receive them.
I'm out of ideas. Looking at the uptimes of the terminated sessions, they look random. Unfortunately there is no log of lost LCP packets, but I suppose your line is working fine when the termination occurs. Have you noticed any packet loss before it occurs, by any chance?
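One cheap way to answer that would be to log timestamped ping failures over the PPPoE link and correlate them with the next disconnection. A minimal sketch (the interface name and 8.8.8.8 as the probe target are assumptions; any stable host works):

```shell
# every 10 s send one ping out of the PPPoE interface; on failure append a
# timestamped line, so losses can be matched against pppd's termination time
while sleep 10; do
    ping -c 1 -W 2 -I pppoe-wan 8.8.8.8 >/dev/null 2>&1 \
        || echo "$(date '+%F %T') ping lost" >> /tmp/pppoe-loss.log
done
```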