PPPoE disconnects all the time

Updates:

  • I've tested configuring the LAN ports for the WAN connection and the problem persists;
  • Out of desperation, I contacted the ISP. Level 2 support verified their PPP logs and confirmed they're deauthing the connection. This seems to be the root cause. They suspect a problem with my modem and are sending someone to verify. They reset it to defaults so I lost IPv6, oh well...

I'll mark it solved once everything is settled.

More updates:

Considering @RHBH has got his PPPoE connection working just fine on a DIR-882 on r13938-15f585afc5, I flashed the same image to test if it'd work.
Spoiler: it didn't.
But the "great" news is that, for the first time, one of the LAN interfaces showed intermittency on this version! So I believe there's definitely something wrong with the mt76 drivers - RHBH is having WiFi issues.
Granted, I flashed a DIR-882 firmware on a DIR-878 device, but it's very nearly the same hardware and I'm writing this through it. If it were incompatible, it'd probably have bricked the device or at least LAN wouldn't work at all.

I'm trying a different setup to see if my connection becomes stable: I configured the ISP modem as a router, and the DIR-878 (running the DIR-882 r13938 OpenWRT image) gets its WAN address through DHCP, effectively doing double NAT.

Here's the system log that shows the newfound LAN intermittency: https://pastebin.com/CqFNPmEu

Further testing, with the desktop connected directly to the modem and dialing PPPoE from Linux, shows everything works fine. The drops only happen when OpenWRT is involved.

Should I file a bug report at this point?

Correct. The DIR-867 A1, DIR-878 A1 and DIR-882 A1 share exactly the same hardware specs and even the same base board. The differences are that the DIR-867 is limited to just 3x3 streams at the hardware level, while the DIR-882 has the USB ports the other two lack. The DIR-1760 A1, DIR-1960 A1 and DIR-2660 A1 also share the same base board and hardware specs, apart from having 256 MB of RAM instead of 128 MB and 128 MB of NAND flash instead of 16 MB of NOR flash.

Maybe you have a defective unit?
Did anyone with the same model complain about having the same problem as you?
Was your unit working as it should on stock firmware?

It is a possibility. I didn't test the stock firmware because I specifically purchased this unit to run OpenWRT on it, so I got it out of the box and flashed. I'll search how to restore to stock firmware for further testing - I need to "make sure the firmware is unencrypted" but I have no idea how to do that.

  • The wan interface connect/disconnect persists with DHCP - either I have a defective unit or this is a driver/kernel bug.

  • Someone filed a bug report facing the exact same issue on another device. - @RHBH the bug OP also is facing WiFi chaos

I'll try building the firmware from the master branch - once I find out how to do that - as suggested in the bug comments and, if that doesn't work, I'll try testing with the stock firmware.

Thanks again for your help @Handyman

Read this, it'll help...

Absolutely, thank you. The wiki is pretty good but it's sometimes hard to find where exactly in it the information you need is.

EDIT:
Flashed the stock firmware v1.20, successfully decrypted with dlink-decrypt.
The WAN connection is up and I'm gonna see if it lasts or if the random drops persist, to rule out a layer 1 failure completely.

My DIR-882 with r13938 had no WiFi issues, I've been using it since July 26.

I had WiFi issues with r14088, but I didn't reset to factory defaults when upgrading from r13938 to r14088. Could this be the reason I had issues? Anyway, I'll do further testing later; currently I'm back on r13938.

Regarding PPPoE, I never encountered issues as you described.

I would try the following settings on the WAN interface:

# Override MTU
uci set network.wan.mtu='1492'
# Inactivity timeout - Close inactive connection after the given amount of seconds, use 0 to persist connection
uci set network.wan.demand='0'
# LCP echo failure threshold - Presume peer to be dead after given amount of LCP echo failures, use 0 to ignore failures
uci set network.wan.keepalive='0 5'
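# Force Link - Set interface properties regardless of the link carrier (If set, carrier sense events do not invoke hotplug handlers).
uci set network.wan.force_link='1'
# Save the changes and reload the network configuration so they take effect
uci commit network
/etc/init.d/network reload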

Make sure you set the MTU properly. I guess for PPPoE you shouldn't use anything above 1492, due to the PPPoE overhead (8 bytes).
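To make the 1492 figure concrete, here is a small Python sketch of the arithmetic. It assumes a standard 1500-byte Ethernet link and minimal IPv4/TCP headers without options; the function names are made up for illustration and are not part of OpenWRT or pppd.

```python
ETHERNET_MTU = 1500   # standard Ethernet payload size in bytes (assumed link)
PPPOE_OVERHEAD = 8    # 6-byte PPPoE header + 2-byte PPP protocol ID
IPV4_HEADER = 20      # minimal IPv4 header, no options
TCP_HEADER = 20       # minimal TCP header, no options


def pppoe_mtu(link_mtu: int = ETHERNET_MTU) -> int:
    """Largest IP packet that fits in one PPPoE frame on the given link."""
    return link_mtu - PPPOE_OVERHEAD


def tcp_mss(mtu: int) -> int:
    """Largest TCP payload for an IPv4 packet of the given MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER


if __name__ == "__main__":
    mtu = pppoe_mtu()
    print(f"PPPoE MTU: {mtu}, TCP MSS: {tcp_mss(mtu)}")
```

If the ISP link carries larger frames, the same subtraction applies to whatever link MTU it actually offers.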

This is a possibility, but at this point I'm pretty sure we have a bad driver. Just doing some further testing to rule out what's left to rule out - so far the stock firmware hasn't had any drops.

This wouldn't work - it turns out the drops are happening at the link layer, considering that:

, my router is either defective or this is an OpenWRT bug. So far I believe the latter: the stock firmware is running stable, but I'll wait for an uptime of at least 3h before jumping to any conclusions.

Thanks for the help!

As my WiFi issues started in the newest build, I think they might be related to the new drivers available in snapshot since Aug 06.

https://git.openwrt.org/?p=openwrt/openwrt.git;a=commit;h=bab0d1c33c5b2c3ed7a02ed840d22f7dd3624619

But this is a subject for discussion in another topic.

I think our issues might be related if they're both caused by a similar or the same bug in the driver. But I don't know the code to be able to tell.

Update:

The stock firmware has been running for over 3h without any issues or a single drop, so I consider the router itself confirmed good.
This is an OpenWRT bug. I'm going to open a bug report related to the one above.

EDIT: Bug report opened. I'm done troubleshooting this issue unless someone else asks for more information. I'm sticking to the stock firmware until this bug is fixed or someone shows me I made a mistake somewhere and I can somehow fix it by myself.

As stated on the OpenWRT webpage, snapshots are development builds. These contain the latest technology, but may not work well, or at all.
So stay on stock firmware until a stable build is released for the D-Link DIR-878.

I also have problems with WAN, see https://bugs.openwrt.org/index.php?do=details&task_id=3271 (Archer C60: WAN disconnects and udhcpc does not work correctly).

Thanks! The more people speak up, the more likely it is to get fixed.
I'll link your reports in my bug report.

On bugs.openwrt.org I see that several people have similar problems.

Hey @Dynamo, is this fixed for most people on 21.02.0? I have just upgraded to 21.02.0 and I appear to be experiencing the issue.

I have an Elecom WRC-2533GST2 (https://openwrt.org/toh/hwdata/elecom/elecom_wrc-2533gst2), and for the last year I was using the snapshot version that was linked from the wiki at 2020/06/27 09:05. I don't know which precise hash it was running or what code was included. About 2 months ago the WAN disconnects got progressively worse. I thought it might be the hardware failing or perhaps a bad cable to the router, but I tried various things to no avail.

After reading the various pages linked from this one, it seemed like perhaps the related code had been fixed, and just a few weeks ago a 21.02.0 firmware was released for my device, so I gave it a go.
The WAN down errors are much worse on this new version 21.02.0 as far as I can tell. Is there a workaround or patch that you have found to help with the issue? The only fixes I found were in code related to drivers other than the MediaTek MT7621A.

Any info on the current status of this issue would be appreciated. Are others still experiencing it? It makes OpenWRT pretty unusable for me, unfortunately.

Hey @ev6ds !

I can confirm that this problem has been fixed for my device (the DIR-878) on 21.02. I'm still running the rc4 version as I still haven't had time to upgrade to the final release.

WAN link uptime is about 4 days now, never ran into this issue again after installing 21.02.0-rc4.

I opted to simply wait until a version that had the issue fixed was available.

I'm baffled to hear it doesn't fix it for your device... I sincerely have no idea what's going on. If I can provide any other info that might help you, please let me know.

Maybe you could test rc4?

Godspeed,

Thanks so much for the fast reply Dynamo!

Just now I did a quick compare of the logs for the issue I was getting with the snapshot version from (2020-06-27) and the logs from the error I'm getting on 21.02.0 and they appear slightly different.

snapshot version from (2020-06-27)

Sat Sep 25 21:53:05 2021 daemon.info pppd[31865]: Plugin rp-pppoe.so loaded.
Sat Sep 25 21:53:05 2021 daemon.info pppd[31865]: RP-PPPoE plugin version 3.8p compiled against pppd 2.4.8
Sat Sep 25 21:53:05 2021 daemon.notice pppd[31865]: pppd 2.4.8 started by root, uid 0
Sat Sep 25 21:53:06 2021 kern.info kernel: [796777.181834] mt7530 mdio-bus:1f wan: Link is Down
Sat Sep 25 21:53:06 2021 daemon.notice netifd: Network device 'wan' link is down
Sat Sep 25 21:53:06 2021 daemon.notice netifd: Interface 'wan6' has link connectivity loss
Sat Sep 25 21:53:06 2021 daemon.notice netifd: Interface 'wan' has link connectivity loss
...
Sat Sep 25 21:53:15 2021 daemon.info pppd[32185]: Plugin rp-pppoe.so loaded.
Sat Sep 25 21:53:15 2021 daemon.info pppd[32185]: RP-PPPoE plugin version 3.8p compiled against pppd 2.4.8
Sat Sep 25 21:53:15 2021 daemon.notice pppd[32185]: pppd 2.4.8 started by root, uid 0
Sat Sep 25 21:53:20 2021 daemon.notice netifd: Network device 'wan' link is down
Sat Sep 25 21:53:20 2021 daemon.notice netifd: Interface 'wan6' has link connectivity loss
Sat Sep 25 21:53:20 2021 daemon.notice netifd: Interface 'wan' has link connectivity loss
Sat Sep 25 21:53:20 2021 kern.info kernel: [796790.493625] mt7530 mdio-bus:1f wan: Link is Down

21.02.0

Sun Sep 26 07:13:40 2021 daemon.info pppd[30717]: Plugin rp-pppoe.so loaded.
Sun Sep 26 07:13:40 2021 daemon.info pppd[30717]: RP-PPPoE plugin version 3.8p compiled against pppd 2.4.8
Sun Sep 26 07:13:40 2021 daemon.notice pppd[30717]: pppd 2.4.8 started by root, uid 0
Sun Sep 26 07:13:55 2021 daemon.warn pppd[30717]: Timeout waiting for PADO packets
Sun Sep 26 07:13:55 2021 daemon.err pppd[30717]: Unable to complete PPPoE Discovery
Sun Sep 26 07:13:55 2021 daemon.info pppd[30717]: Exit.
Sun Sep 26 07:13:55 2021 daemon.notice netifd: Interface 'wan' is now down

They do appear slightly different so perhaps the original issue I was having with the older snapshot version has been fixed. It was originally failing after 5 seconds with no error at all. Now it's failing after 15 seconds with a warning "Timeout waiting for PADO packets". Perhaps I just got unlucky with my service provider experiencing some outage precisely when I was testing it. I'll keep my eye on things and update here if I find the new cause when it happens again. It looks like the new issue is some kind of timeout by design rather than a random failure of the underlying firmware.
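The comparison above can be sketched as a small Python script that reports, for each pppd start, which kind of failure followed and after how many seconds. The match patterns are taken from the log excerpts in this thread and other builds may word their messages differently; the function and constant names are made up for illustration.

```python
import re
from datetime import datetime

STAMP = "%a %b %d %H:%M:%S %Y"   # e.g. "Sat Sep 25 21:53:15 2021"
STARTED = re.compile(r"pppd \S+ started by")
# Only the netifd message is matched for link loss, so the kernel's own
# "Link is Down" line for the same event is not counted twice.
FAILURES = {
    "link_down": re.compile(r"Network device 'wan' link is down"),
    "pado_timeout": re.compile(r"Timeout waiting for PADO packets"),
}


def parse_time(line: str) -> datetime:
    """Parse the 24-character timestamp at the start of a log line."""
    return datetime.strptime(line[:24], STAMP)


def failures(log_text: str) -> list:
    """Return (kind, seconds since pppd start) for each failed attempt."""
    results = []
    start = None
    for line in log_text.splitlines():
        if STARTED.search(line):
            start = parse_time(line)
        elif start is not None:
            for kind, pattern in FAILURES.items():
                if pattern.search(line):
                    elapsed = int((parse_time(line) - start).total_seconds())
                    results.append((kind, elapsed))
                    start = None   # wait for the next pppd start
                    break
    return results


OLD_LOG = """\
Sat Sep 25 21:53:15 2021 daemon.notice pppd[32185]: pppd 2.4.8 started by root, uid 0
Sat Sep 25 21:53:20 2021 daemon.notice netifd: Network device 'wan' link is down
"""
NEW_LOG = """\
Sun Sep 26 07:13:40 2021 daemon.notice pppd[30717]: pppd 2.4.8 started by root, uid 0
Sun Sep 26 07:13:55 2021 daemon.warn pppd[30717]: Timeout waiting for PADO packets
"""

if __name__ == "__main__":
    print(failures(OLD_LOG))
    print(failures(NEW_LOG))
```

On the router the same function could be fed the output of logread saved to a file, assuming the timestamps use the same format as the excerpts above.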

Just updating: things appear to have stabilized and I haven't had the issue again. I'm guessing it was a provider issue that coincidentally happened right after updating.
