Pppd default lcp-echo options

emulti · July 19, 2019, 3:48pm

HW: Netgear DGN3500 OpenWrt 18.06.4 r7808-ef686b7292
Since upgrading to 18.06 on this router, I have been experiencing ADSL line drops several times a day. Often the link is unable to reconnect without a reboot.

The message in syslog at the start of the link drop is something like 'no LCP echo received', so I started to investigate the LCP echo settings in the WAN 'Advanced Settings' of Luci. I wanted to set the LCP echo failure threshold to 0 (ignore failures) to see if false failures were being detected, but found that a '0' value is not reflected in the UI after saving.

Therefore I investigated the ppp command-line options that were running with '0' set in Luci:

/usr/sbin/pppd nodetach ipparam wan ifname pppoe-wan lcp-echo-interval 1 lcp-echo-failure 5 lcp-echo-adaptive ....

This sends LCP echo requests once a second and tears down the connection after five failures, so failures are in fact not ignored as the UI suggests.

According to this commit (Aug 2018):
https://git.openwrt.org/?p=openwrt/openwrt.git;a=commitdiff;h=555c592304023a0d24216a6d8ed9d525602ae21

When LCP echos are disabled, the fallback is to hardcoded values in /lib/netifd/proto/ppp.sh set around line 120:
[ -n "$keepalive" ] || keepalive="5 1"

However, it seems that the hardcoded values are reversed, because
[ -n "$keepalive" ] || keepalive="1 5"
gives the expected (from Luci UI) values for lcp-echo-failure and lcp-echo-interval:

/usr/sbin/pppd nodetach ipparam wan ifname pppoe-wan lcp-echo-interval 5 lcp-echo-failure 1 lcp-echo-adaptive....

It would also appear that the 'keepalive' variable pulled by the json_get function in ppp.sh is not being read, because when set to '0' it does not result in the lcp-echo parameters being omitted from the pppd command line. I don't know enough about the internal structure of OpenWrt to debug this further.

I'll keep an eye on the number of disconnects I get with the longer lcp-echo-interval, but thought it worth mentioning that the ppp options were not being set as expected.

emulti

anon45274024 · July 19, 2019, 3:57pm

/etc/ppp/options

- #debug
+ debug

gets ppp pretty chatty in the logs and likely helps to debug issues.

emulti · July 19, 2019, 4:23pm

Yes, it gives a lot of info about the connection setup and the initial LCP echo request/reply. But it doesn't seem to give ongoing monitoring of LCP messages after the link is up (unless there aren't any).

It seems lcp-echo-adaptive is a patch, with the function:
```
/*
 * If adaptive echos have been enabled, only send the echo request if
 * no traffic was received since the last one.
 */
```

So the LCP echo interval only applies when the line is quiet.
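
The rule described in that comment can be sketched in a few lines of shell. This is only an illustration of the decision, not the actual pppd patch: the function name `should_send_echo` and its three arguments (adaptive flag, received-packet count at the previous tick, received-packet count now) are made up for the example.

```shell
#!/bin/sh
# Hypothetical sketch of the adaptive-echo decision (not pppd source).
# $1: 1 if lcp-echo-adaptive is enabled, 0 otherwise
# $2: received-packet counter at the previous echo timer tick
# $3: received-packet counter at this tick
# Returns 0 (true) if an echo request should be sent at this tick.
should_send_echo() {
    adaptive="$1"
    pkts_last="$2"
    pkts_now="$3"

    # Traffic arrived since the last tick, so the link is evidently alive:
    # skip the echo request when adaptive mode is on.
    if [ "$adaptive" -eq 1 ] && [ "$pkts_now" -ne "$pkts_last" ]; then
        return 1
    fi
    return 0
}

if should_send_echo 1 100 100; then
    echo "quiet line: send echo request"
fi
```

With adaptive mode off, the function always returns true, which corresponds to echo requests being sent on every interval regardless of traffic.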

I'm pretty sure the order of the fallback keepalive parameters in ppp.sh is a bug. On my system, pppd now reconnects quickly and reliably. With '0' set in the Luci UI the command-line parameters match the UI defaults, even if echo requests are not in fact disabled.

Time will tell if the link is more stable.

jow · July 19, 2019, 4:31pm

The issue is that LuCI does not actually write 0 into the config but omits the parameter entirely and simply displays a 0 placeholder instead. The ppp.sh handler on the other hand falls back due to [ -n "$keepalive" ] || keepalive="5 1"

You can try to edit /usr/lib/lua/luci/model/cbi/admin_network/proto_pppoe.lua on your router and change m:del(section, "keepalive") to m:set(section, "keepalive", "0")
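
An alternative that avoids editing the LuCI file is to write the value directly with `uci`, which should produce the same config entry as the `m:set(...)` change. A minimal sketch, assuming the PPPoE interface section is called `wan` (as the `ipparam wan` in the pppd command line suggests); the helper name `set_keepalive` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: store an explicit keepalive value for one interface
# in /etc/config/network instead of leaving the option unset.
# $1: interface section name (assumed to be "wan" here)
# $2: keepalive value, e.g. 0
set_keepalive() {
    iface="$1"
    value="$2"
    uci set "network.${iface}.keepalive=${value}" &&
        uci commit network
}

# On the router, followed by an interface restart:
#   set_keepalive wan 0 && ifup wan
```

Saving the interface in LuCI afterwards may remove the option again on builds that still contain the `m:del(section, "keepalive")` behaviour.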

emulti · July 19, 2019, 4:36pm

Thanks, will try that.

Looking at the commit I linked to, the values 5 and 1 could be wrong way round:

  [ -n "$keepalive" ] || keepalive="5 1"

  local lcp_failure="${keepalive%%[, ]*}"
  local lcp_interval="${keepalive##*[, ]}"

Should the defaults be 5 seconds between echo requests, 1 failure, or 1 second between requests, 5 failures? I have read that sending echo requests too frequently has an undesirable effect on the link throughput.

jow · July 19, 2019, 4:37pm

It means 1 second between requests and 5 failures. Failures is the first number, interval the second.
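
The order can be checked by running the two parameter expansions quoted above in any POSIX shell. A small sketch: the wrapper function `parse_keepalive` is hypothetical and only reproduces the fallback and the two expansions from the thread, not the rest of ppp.sh.

```shell
#!/bin/sh
# Hypothetical wrapper around the lines quoted from ppp.sh.
# $1: keepalive value as stored in the config ("<failure> <interval>"),
#     or an empty string if the option is absent.
parse_keepalive() {
    keepalive="$1"

    # Fallback quoted in the thread, used when the option is not set.
    [ -n "$keepalive" ] || keepalive="5 1"

    # %% strips the longest suffix starting at the first comma or space,
    # leaving the first field: the failure threshold.
    lcp_failure="${keepalive%%[, ]*}"
    # ## strips the longest prefix ending at the last comma or space,
    # leaving the last field: the interval in seconds.
    lcp_interval="${keepalive##*[, ]}"

    echo "lcp-echo-failure $lcp_failure lcp-echo-interval $lcp_interval"
}

parse_keepalive ""       # prints: lcp-echo-failure 5 lcp-echo-interval 1
parse_keepalive "6 10"   # prints: lcp-echo-failure 6 lcp-echo-interval 10
```

The empty-string case reproduces the command line seen at the top of the thread: 5 failures at a 1 second interval.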

anon45274024 · July 19, 2019, 4:40pm

Suppose that would cause a flood on the system log, at least judging by the LCP frequency observed from a tcpdump.

Just mentioned the debug option since it might be more verbose on the disconnects.

I played around with those settings and monitored the changes with tcpdump. As far as I can tell, it works as intended/advertised.

This node is set (via LuCI) with interval 10 and threshold 6

emulti · July 19, 2019, 4:53pm

The edit to proto_pppoe.lua as suggested does indeed make the lcp-echo options from the command line disappear, thank you.

If I understand you correctly, LCP requests are deliberately suppressed from the logs, which is sensible.

The Luci-displayed default is 5 seconds for LCP echo-interval, which was why I thought the default values might be the wrong way around. The values 5 and 1 only apply when the Luci parameter is set (or defaulted) to zero.

Thank you both for your help. I'll keep monitoring the connection to see if it is more stable with a larger number of permitted failures between echo request/replies.

I am thinking that there might be errors caused by transient noise on the line which were causing a drop when a single echo failure is detected.

jow · July 19, 2019, 7:04pm

Turns out that the fix was already in LuCI master but not backported to OpenWrt 18.06. This was done with https://github.com/openwrt/luci/commit/a388be0f28751707f8b52fc53f3dc94ba2ad9ace now

emulti · July 20, 2019, 11:38am

That's good. I can confirm that since applying the change to the pppoe interface settings so that the hardcoded defaults are no longer used, my connection has been stable with no LCP echo checking.

However, current running parameters are
lcp-echo-failure 3
lcp-echo-interval 5
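
As a rough guide to what these numbers mean, the time a dead link can go unnoticed is approximately the interval multiplied by the failure threshold. The sketch below is only that back-of-the-envelope estimate; the function name `detection_time` is hypothetical, and the exact timing depends on pppd's timers and on whether lcp-echo-adaptive is in effect.

```shell
#!/bin/sh
# Hypothetical estimate: seconds of unanswered echo requests before
# pppd presumes the peer dead (approximation, not an exact figure).
# $1: lcp-echo-interval in seconds
# $2: lcp-echo-failure threshold
detection_time() {
    interval="$1"
    failure="$2"
    echo $((interval * failure))
}

detection_time 1 5   # old hardcoded fallback: about 5 seconds
detection_time 5 3   # values above: about 15 seconds
```

By this estimate the settings above tolerate roughly three times as long an outage as the old fallback before the link is torn down.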

This site is subject to RF interference. The unscreened copper cable runs down the front of the house only 1.5m from a constant flow of vehicles arriving and leaving roadside parking.