PPPoE reconnection failing (Lantiq built-in modem)

I have a rather strange situation where my DSL connection fails to complete PPPoE discovery after the ISP disconnects it.

Setup:

  • Running a W8970 with a master build, using the onboard DSL modem to connect to the ISP (no intermediate PPPoE devices)
  • PTM-based setup
  • ISP uses VLAN 7
  • My ISP disconnects the circuit after a week online
  • The DSL modem tries to reconnect but never succeeds.

Here is the log from the timeout and disconnect:

Jul 19 05:04:31 imagiswitch pppd[1781]: No response to 10 echo-requests
Jul 19 05:04:31 imagiswitch pppd[1781]: Serial link appears to be disconnected.
Jul 19 05:04:31 imagiswitch pppd[1781]: Connect time 8040.8 minutes.
Jul 19 05:04:31 imagiswitch pppd[1781]: Sent 779840993 bytes, received 3319298415 bytes.
Jul 19 05:04:32 imagiswitch pppd[1781]: Script /lib/netifd/ppp-down started (pid 31702)
Jul 19 05:04:32 imagiswitch netifd: Network device 'pppoe-easybell' link is down
Jul 19 05:04:32 imagiswitch pppd[1781]: sent [LCP TermReq id=0x2 "Peer not responding"]
Jul 19 05:04:32 imagiswitch netifd: Network device 'ffvpn' link is down
Jul 19 05:04:32 imagiswitch netifd: Interface 'ffvpn' has link connectivity loss
Jul 19 05:04:32 imagiswitch netifd: Interface 'ffvpn' is now down
Jul 19 05:04:35 imagiswitch pppd[1781]: sent [LCP TermReq id=0x3 "Peer not responding"]
Jul 19 05:04:36 imagiswitch netifd: Interface 'ffvpn' is disabled
Jul 19 05:04:36 imagiswitch netifd: Interface 'easybell' has lost the connection
Jul 19 05:04:36 imagiswitch netifd: Interface 'henet' has lost the connection
Jul 19 05:04:36 imagiswitch netifd: tunnel '6in4-henet' link is down
Jul 19 05:04:36 imagiswitch netifd: Interface 'henet' is now down
Jul 19 05:04:36 imagiswitch netifd: Interface 'henet' is setting up now
Jul 19 05:04:37 imagiswitch netifd: Interface 'henet' is now down
Jul 19 05:04:38 imagiswitch pppd[1781]: Connection terminated.
Jul 19 05:04:38 imagiswitch pppd[1781]: Send PPPOE Discovery V1T1 PADT session 0x1862 length 0
Jul 19 05:04:38 imagiswitch pppd[1781]:  dst fc:48:ef:2c:57:f4  src 00:20:da:86:23:75
Jul 19 05:04:38 imagiswitch pppd[1781]:
Jul 19 05:04:38 imagiswitch pppd[1781]: Sent PADT
Jul 19 05:04:38 imagiswitch pppd[1781]: Modem hangup
Jul 19 05:04:42 imagiswitch pppd[1781]: Script /lib/netifd/ppp-down finished (pid 31702), status = 0x0
Jul 19 05:04:45 imagiswitch netifd: Network device 'ptm0' link is down
Jul 19 05:04:45 imagiswitch netifd: VLAN 'ptm0.7' link is down
Jul 19 05:04:45 imagiswitch netifd: Interface 'easybell' has link connectivity loss
Jul 19 05:04:46 imagiswitch pppd[1781]: Terminating on signal 15
Jul 19 05:04:46 imagiswitch pppd[1781]: Exit.
Jul 19 05:05:27 imagiswitch netifd: Network device 'ptm0' link is up
Jul 19 05:05:27 imagiswitch netifd: VLAN 'ptm0.7' link is up
Jul 19 05:05:27 imagiswitch netifd: Interface 'easybell' has link connectivity
Jul 19 05:05:27 imagiswitch netifd: Interface 'easybell' is setting up now

So far so good. However I noticed that the discovery phase never completes:

Jul 19 05:05:27 imagiswitch pppd[32407]: Plugin rp-pppoe.so loaded.
Jul 19 05:05:27 imagiswitch pppd[32407]: RP-PPPoE plugin version 3.8p compiled against pppd     2.4.7
Jul 19 05:05:27 imagiswitch pppd[32407]: pppd 2.4.7 started by root, uid 0
Jul 19 05:05:27 imagiswitch pppd[32407]: Send PPPOE Discovery V1T1 PADI session 0x0 length 10
Jul 19 05:05:27 imagiswitch pppd[32407]:  dst ff:ff:ff:ff:ff:ff  src 00:20:da:86:23:75
Jul 19 05:05:27 imagiswitch pppd[32407]:  [service-name] [PPP-max-payload  05 dc]
Jul 19 05:05:32 imagiswitch pppd[32407]: Send PPPOE Discovery V1T1 PADI session 0x0 length 10
Jul 19 05:05:32 imagiswitch pppd[32407]:  dst ff:ff:ff:ff:ff:ff  src 00:20:da:86:23:75
Jul 19 05:05:32 imagiswitch pppd[32407]:  [service-name] [PPP-max-payload  05 dc]
Jul 19 05:05:37 imagiswitch pppd[32407]: Send PPPOE Discovery V1T1 PADI session 0x0 length 10
Jul 19 05:05:37 imagiswitch pppd[32407]:  dst ff:ff:ff:ff:ff:ff  src 00:20:da:86:23:75
Jul 19 05:05:37 imagiswitch pppd[32407]:  [service-name] [PPP-max-payload  05 dc]
Jul 19 05:05:42 imagiswitch pppd[32407]: Timeout waiting for PADO packets
Jul 19 05:05:42 imagiswitch pppd[32407]: Unable to complete PPPoE Discovery
<snip>
Jul 19 05:17:33 imagiswitch netifd: Network device 'ptm0' link is down
Jul 19 05:17:33 imagiswitch netifd: VLAN 'ptm0.7' link is down
Jul 19 05:17:33 imagiswitch netifd: Interface 'easybell' has link connectivity loss
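
A quick way to confirm this pattern in a long log is to count PADI requests against PADO replies. The following is only a sketch: the function name discovery_summary is made up for illustration, and it assumes pppd is running with the debug option so that the "Send PPPOE Discovery" and "Recv PPPOE Discovery" lines appear as they do above.

```shell
#!/bin/sh
# Summarise PPPoE discovery progress from pppd debug log lines on stdin.
# Prints "padi=<n> pado=<n>" and returns non-zero when PADIs were sent
# but no PADO was ever received (the failure pattern in the log above).
discovery_summary() {
    awk '
        /Send PPPOE Discovery V1T1 PADI/ { padi++ }
        /Recv PPPOE Discovery V1T1 PADO/ { pado++ }
        END {
            printf "padi=%d pado=%d\n", padi, pado
            exit ((padi > 0 && pado == 0) ? 1 : 0)
        }
    '
}

# Usage on the router (logread is OpenWrt's syslog reader):
#   logread | discovery_summary || echo "discovery is not getting any replies"
```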

To get around this, I thought about adding the following ip-down hook:

cat /etc/ppp/ip-down.d/reset-dsl.sh
#!/bin/sh
/etc/init.d/dsl_control restart

To debug this, I have tried the following:

/etc/init.d/dsl_control restart - doesn't work
/etc/init.d/networking restart - doesn't work
Rebooting the w8970  - works :)

I presume this has something to do with the VDSL firmware, or perhaps the system not sending a reset to the remote end, or ...?

To check for auto-reconnect during the week before the ISP does their weekly reconnect, I have pulled the DSL cable for a few seconds - results in a reconnect.

Configs are as follows:

config switch_vlan
    option device 'ptm0'
    option vlan '7'

config dsl 'dsl'
    option xfer_mode 'ptm'
    option firmware '/lib/firmware/lantiq-vrx200-a.bin'
    option annex 'a'
    option line_mode 'vdsl'

config device 'dsl_dev'
    option name 'ptm0'
    option proto 'none'
    option mtu '1508'
    option delegate '0'

config device 'dsl_vlan'
    option name 'ptm0.7'
    option proto 'none'
    option mtu '1508'
    option delegate '0'

config interface 'easybell'
    option proto 'pppoe'
    option ifname 'ptm0.7'
    option username 'xxxx'
    option password 'xxxx'
    option mtu '1500'
    option peerdns '0'
    option ipv6 '0'
    option demand '0'
    option persist 'true'
    option maxfail '0'
    option holdoff '10'
    option keepalive '10 5'
    option pppd_options 'lcp-echo-adaptive mtu 1500 debug'
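
For reference, the keepalive '10 5' setting above lines up with the "No response to 10 echo-requests" message in the disconnect log: as far as I understand OpenWrt's PPP handling, the first number becomes pppd's lcp-echo-failure and the second lcp-echo-interval. A rough sketch of the resulting detection time follows. The function name is hypothetical, and because lcp-echo-adaptive only sends echoes when no traffic is arriving, the figure is an approximation.

```shell
#!/bin/sh
# Given an OpenWrt-style keepalive value "<failures> <interval>", print the
# approximate number of seconds before pppd declares the peer dead.
keepalive_timeout() {
    set -- $1                # split "10 5" into $1=10 and $2=5
    failures=$1
    interval=${2:-5}         # assumption: the interval falls back to 5 s if omitted
    echo $((failures * interval))
}

keepalive_timeout "10 5"     # 10 unanswered echoes, 5 s apart
```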

For the record, a normal connect looks like this and I can reconnect any time before the 7 day disconnect. The device will reconnect just fine.

Jul 19 06:03:14 imagiswitch netifd: Network device 'ptm0' link is up
Jul 19 06:03:14 imagiswitch netifd: VLAN 'ptm0.7' link is up
Jul 19 06:03:14 imagiswitch netifd: Interface 'easybell' has link connectivity
Jul 19 06:03:14 imagiswitch netifd: Interface 'easybell' is setting up now
Jul 19 06:03:15 imagiswitch pppd[6386]: Plugin rp-pppoe.so loaded.
Jul 19 06:03:15 imagiswitch pppd[6386]: RP-PPPoE plugin version 3.8p compiled against pppd 2.4.7
Jul 19 06:03:15 imagiswitch pppd[6386]: pppd 2.4.7 started by root, uid 0
Jul 19 06:03:15 imagiswitch pppd[6386]: Send PPPOE Discovery V1T1 PADI session 0x0 length 10
Jul 19 06:03:15 imagiswitch pppd[6386]:  dst ff:ff:ff:ff:ff:ff  src 00:20:da:86:23:75
Jul 19 06:03:15 imagiswitch pppd[6386]:  [service-name] [PPP-max-payload  05 dc]
Jul 19 06:03:15 imagiswitch pppd[6386]: Recv PPPOE Discovery V1T1 PADO session 0x0 length 42
Jul 19 06:03:15 imagiswitch pppd[6386]:  dst 00:20:da:86:23:75  src fc:48:ef:2c:58:9e
Jul 19 06:03:15 imagiswitch pppd[6386]:  [service-name] [PPP-max-payload  05 dc] [AC-name rdsl-brln-de80.mediaways.net]
Jul 19 06:03:15 imagiswitch pppd[6386]: Send PPPOE Discovery V1T1 PADR session 0x0 length 10
Jul 19 06:03:15 imagiswitch pppd[6386]:  dst fc:48:ef:2c:58:9e  src 00:20:da:86:23:75
Jul 19 06:03:15 imagiswitch pppd[6386]:  [service-name] [PPP-max-payload  05 dc]
Jul 19 06:03:15 imagiswitch pppd[6386]: PADS: Service-Name: ''
Jul 19 06:03:15 imagiswitch pppd[6386]: PPP session is 30413
Jul 19 06:03:15 imagiswitch pppd[6386]: Connected to fc:48:ef:2c:58:9e via interface ptm0.7
Jul 19 06:03:15 imagiswitch pppd[6386]: using channel 3
Jul 19 06:03:15 imagiswitch pppd[6386]: Using interface pppoe-easybell
Jul 19 06:03:15 imagiswitch pppd[6386]: Connect: pppoe-easybell <--> ptm0.7
Jul 19 06:03:15 imagiswitch pppd[6386]: sent [LCP ConfReq id=0x1 <magic 0x49a2daca>]

Any pointers would be most helpful!

Would you please check the output of dmesg for the kernel warning mentioned in FS#494? I suspect you are facing the same issue. After that crash, no traffic flows via the ptm interface.

@mkresin I checked: dmesg shows no kernel warning for this.

However this case seems similar (ifdown/up and dsl_control don't help reset the device).

In my case the remote end terminates the connection after a set amount of time (7 days). I'm guessing this is weekly IP address reshuffling. But it seems to leave the ptm0 device in an odd state.

Jul 19 05:04:31 imagiswitch pppd[1781]: No response to 10 echo-requests
Jul 19 05:04:31 imagiswitch pppd[1781]: Serial link appears to be disconnected.
Jul 19 05:04:31 imagiswitch pppd[1781]: Connect time 8040.8 minutes.
Jul 19 05:04:31 imagiswitch pppd[1781]: Sent 779840993 bytes, received 3319298415 bytes.

When that happens again, please try /etc/init.d/dsl_control stop and then unload the modules.
The module order is:
rmmod drv_dsl_cpe_api
rmmod ltq_ptm_vr9
rmmod drv_mei_cpe
rmmod drv_ifxos

If no crash happens here, try loading them again with modprobe in the reverse order.

Also, are mini jumbo frames supported? In Germany, providers (Vodafone, Telekom, etc.) normally don't support them by default.
EDIT: The interface name easybell makes me think you use a German provider, but Annex A should not be used in Germany.
Don't put the module unloading in a script; it can trigger a reboot loop.
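
The unload and reload order above can be written down as a sketch to run by hand, not from a hook, given the reboot-loop warning. The DRY_RUN switch and the function names are made up for illustration, and the final dsl_control start is an assumption (the post only mentions stop). By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Unload order as given above; reloading uses the reverse order.
MODULES="drv_dsl_cpe_api ltq_ptm_vr9 drv_mei_cpe drv_ifxos"
DRY_RUN=${DRY_RUN:-1}   # hypothetical switch: 1 = only print the commands

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
}

# Print the arguments in reverse order, separated by spaces.
reverse() {
    out=""
    for m in "$@"; do out="$m${out:+ $out}"; done
    echo "$out"
}

unload_dsl() {
    run /etc/init.d/dsl_control stop
    for m in $MODULES; do run rmmod "$m"; done
}

reload_dsl() {
    for m in $(reverse $MODULES); do run modprobe "$m"; done
    run /etc/init.d/dsl_control start   # assumption: restart the control daemon last
}

unload_dsl
reload_dsl
```

Running it with DRY_RUN=0 on the router would execute the commands for real; stopping after unload_dsl to check dmesg for a crash matches the advice above.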

Great point @trismo on unloading the modules. I'll make sure that I add that to a ppp-down script as a workaround. Thanks for the ordering too.

To your comment - yes in Germany on Easybell. The mini-jumbo was me being optimistic after seeing the following in the PPPoE setup:

PPP-max-payload  05 dc

Is that size (1500) referring to the VLAN-tagged frame or the untagged frame?
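
As a worked example of the arithmetic: the two bytes 05 dc are the hexadecimal value of the tag, and as far as I understand RFC 4638, PPP-Max-Payload describes the PPP payload only. That means it excludes the 8 bytes of PPPoE and PPP headers, and the Ethernet header and VLAN tag are not counted either. The helper names below are made up for illustration.

```shell
#!/bin/sh
# Decode the two value bytes of a PPP-Max-Payload tag as pppd prints them
# (e.g. "05 dc") into a decimal byte count.
max_payload_bytes() {
    printf '%d\n' "0x$1$2"
}

# Ethernet MTU needed underneath PPPoE for a given PPP payload:
# 6-byte PPPoE header + 2-byte PPP protocol field = 8 bytes of overhead.
required_ether_mtu() {
    echo $(( $1 + 8 ))
}

payload=$(max_payload_bytes 05 dc)
echo "PPP payload: $payload, Ethernet MTU needed: $(required_ether_mtu "$payload")"
```

This is where the 1508 on ptm0 comes from: a 1500-byte PPP payload only fits if the underlying Ethernet device can carry 1508 bytes.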

Thanks for the Annex A pointer. Interestingly, it still connected; the output of /etc/init.d/dsl_control status is below. I've now switched to the B version of the firmware.

ATU-C Vendor ID:                          Broadcom 161.183
ATU-C System Vendor ID:                   Broadcom
Chipset:                                  Lantiq-VRX200 Unknown
Firmware Version:                         5.8.1.8.1.6
API Version:                              4.17.18.6
XTSE Capabilities:                        0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x2
Annex:                                    B
Line Mode:                                G.993.2 (VDSL2)
Profile:                                  17a
Line State:                               UP [0x801: showtime_tc_sync]
Forward Error Correction Seconds (FECS):  Near: 1823 / Far: 1908244
Errored seconds (ES):                     Near: 1 / Far: 141
Severely Errored Seconds (SES):           Near: 0 / Far: 44
Loss of Signal Seconds (LOSS):            Near: 0 / Far: 23
Unavailable Seconds (UAS):                Near: 29 / Far: 29
Header Error Code Errors (HEC):           Near: 0 / Far: 0
Non Pre-emtive CRC errors (CRC_P):        Near: 38 / Far: 0
Pre-emtive CRC errors (CRCP_P):           Near: 0 / Far: 0
Power Management Mode:                    L0 - Synchronized
Latency [Interleave Delay]:               8.0 ms [Interleave]   8.0 ms [Interleave]
Data Rate:                                Down: 38.066 Mb/s / Up: 9.861 Mb/s
Line Attenuation (LATN):                  Down: 25.7 dB / Up: 35.7 dB
Signal Attenuation (SATN):                Down: 23.3 dB / Up: 36.7 dB
Noise Margin (SNR):                       Down: 6.1 dB / Up: 5.7 dB
Aggregate Transmit Power (ACTATP):        Down: 7.4 dB / Up: 14.5 dB
Max. Attainable Data Rate (ATTNDR):       Down: 40.943 Mb/s / Up: 10.276 Mb/s
Line Uptime Seconds:                      923
Line Uptime:                              15m 23s

You are using the latest LEDE version (17.01 or newer).
Let's clean up your config first.

# Delete this section; the 'ptm0.7' ifname under the WAN interface does the VLAN magic :)
config switch_vlan
    option device 'ptm0'
    option vlan '7'

# Delete "option firmware" (option annex 'b' selects the firmware for you)
config dsl 'dsl'
    option xfer_mode 'ptm'
    option firmware '/lib/firmware/lantiq-vrx200-a.bin'
    option annex 'b' # change from 'a' to 'b'
    option line_mode 'vdsl'

config device 'dsl_dev'
    option name 'ptm0'
    option proto 'none' # delete
    option mtu '1508'
    option delegate '0' # delete

# Delete this section completely
config device 'dsl_vlan'
    option name 'ptm0.7'
    option proto 'none'
    option mtu '1508'
    option delegate '0'

# This looks OK
config interface 'easybell'
    option proto 'pppoe'
    option ifname 'ptm0.7'
    option username 'xxxx'
    option password 'xxxx'
    option mtu '1500'
    option peerdns '0'
    option ipv6 '0'
    option demand '0'
    option persist 'true'
    option maxfail '0'
    option holdoff '10'
    option keepalive '10 5'
    option pppd_options 'lcp-echo-adaptive mtu 1500 debug'

As for the MTU: the PPPoE WAN interface should now have an MTU of 1500.
Check with "ip link show dev pppoe-easybell".

Semi-related, under Firewall ensure that WAN has MSS Clamping enabled.
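
For context on what the clamp does: it rewrites the MSS in TCP SYN packets so that a full segment plus headers fits the WAN MTU. In OpenWrt's firewall config this is the mtu_fix option on the WAN zone. The sketch below only shows the arithmetic; the function name is made up, and it assumes TCP and IP headers without options.

```shell
#!/bin/sh
# TCP MSS that fits in one packet of the given MTU.
# IPv4: 20-byte IP header + 20-byte TCP header (no options) = 40 bytes.
# IPv6: 40-byte IP header + 20-byte TCP header (no options) = 60 bytes.
mss_for_mtu() {
    mtu=$1
    family=${2:-4}           # 4 or 6; defaults to IPv4
    case "$family" in
        4) echo $((mtu - 40)) ;;
        6) echo $((mtu - 60)) ;;
        *) echo "unknown address family: $family" >&2; return 1 ;;
    esac
}

mss_for_mtu 1492      # PPPoE link without baby jumbo frames, IPv4
mss_for_mtu 1492 6    # same link, IPv6
```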

Thanks @jackiechun and @trismo.

Sadly, the PPPoE interface still comes up with an MTU of 1492:

root@imagiswitch:~# ip link show dev ptm0
7: ptm0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1508 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
    link/ether 00:20:da:86:23:75 brd ff:ff:ff:ff:ff:ff
root@imagiswitch:~# ip link show dev ptm0.7
16: ptm0.7@ptm0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 00:20:da:86:23:75 brd ff:ff:ff:ff:ff:ff
root@imagiswitch:~# ip link show dev pppoe-easybell
20: pppoe-easybell: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1492 qdisc htb state UNKNOWN mode DEFAULT group default qlen 3
    link/ppp
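
The three MTU values can be pulled out of that output with a small helper. This is a sketch with made-up function names. One possible reading of the numbers, offered as an assumption and not a diagnosis, is that ptm0.7 is at 1500, so PPPoE on top of it cannot exceed 1500 - 8 = 1492 regardless of the 1508 on ptm0.

```shell
#!/bin/sh
# Extract the MTU value from `ip link show dev <name>` output on stdin.
mtu_from_ip_link() {
    sed -n 's/.* mtu \([0-9][0-9]*\) .*/\1/p' | head -n 1
}

# Largest PPP MTU that fits on an Ethernet-like device with the given MTU
# (8 bytes are taken by the PPPoE header and the PPP protocol field).
max_ppp_mtu() {
    echo $(( $1 - 8 ))
}

# Usage on the router:
#   for dev in ptm0 ptm0.7 pppoe-easybell; do
#       printf '%s %s\n' "$dev" "$(ip link show dev "$dev" | mtu_from_ip_link)"
#   done
```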

Current config is as follows - I've pushed the most relevant bits of my config into a gist at https://gist.github.com/imaginator/bd831471e30048d1922c4c9b99f82ee9

and the diffconfig here: https://gist.github.com/imaginator/5cea39f3ffe8bb9afe992c71d3f4a607 (interestingly I'm missing the RTC feature to set the HWclock but that's another story...)

Ok try with
config interface 'easybell'
option mtu '1508'

Setting

config interface 'easybell'
option mtu '1508'

doesn't help either. I expect this is the baby-jumbo frame issue (Lantiq vrx200 pppoe @ 1500 bytes) recurring.

I think mini jumbo frames were activated by mistake.
You can ask support, but I don't think you will get mini jumbo frames.
Remove all MTU-related options from the config.

For example:
Telekom Germany's BNG DSLAMs support jumbo frames with a maximum MTU of 8000, but Telekom's closed-source proprietary PPPoE server software doesn't allow an MRU of more than 1492, even on real fiber connections.