Sysupgrade vs PPPoE: redial / hang-up problems with fast booting devices (Xiaomi AX3200 (mediatek/mt7622, 22.03), Xiaomi AX3600, Dynalink DL-WRX36 (ipq807x/generic))

My ISP (DIGI) uses PPPoE with a weekly connect time limit.
They don't allow multiple dial-ins and respond with Too many sessions. When the router doesn't hang up the connection properly, one has to wait a few minutes for the Too many sessions response to time out.

Problem 1: immediate redialing

Solved! :white_check_mark:
I'm unable to reproduce it on SNAPSHOT

Here is a log about my problem:

  1. PPPoE connection ends with Connect time expired
  2. router redials immediately and ISP accepts
  3. got a modem hangup with connection time 0.1 minutes
  4. subsequent redials fail until the Too many sessions times out at ISP
  5. after 6 minutes the router is able to create the pppoe-wan connection
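
The pattern in steps 1 to 3 can be spotted in a saved log by listing pppd sessions that ended after a very short connect time. This is only a minimal sketch: the function name short_pppd_sessions and the 1-minute default threshold are assumptions, and it relies solely on the "Connect time N minutes." line that pppd writes to the log.

```sh
#!/bin/sh
# Sketch: list pppd sessions whose connect time is below a threshold.
# Reads log lines from stdin, e.g. piped from logread.
# Assumption: lines look like "... pppd[PID]: Connect time 0.1 minutes."
short_pppd_sessions() {
    threshold="${1:-1}"   # minutes; hypothetical default
    awk -v max="$threshold" '
        /pppd\[[0-9]+\]: Connect time [0-9.]+ minutes/ {
            minutes = $(NF - 1)      # the number before "minutes."
            who = $(NF - 4)          # the "pppd[PID]:" field
            key = who " " minutes
            # pppd logs the connect time twice per session, so report once
            if (!seen[key]++ && minutes + 0 < max + 0)
                print who, minutes
        }'
}
```

Something like `logread | short_pppd_sessions 1` would then print one line per short-lived session.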

I could reproduce a similar problem with a simple ifdown wan command:

  • the router redials the PPPoE connection
  • ISP denies with Too many sessions
  • wired access to router lost for a few seconds (:thinking:)
  • ISP accepts in a minute ... instead of 5+ minutes

What's going on?
Why does the first redial terminate after a few bytes?
Why does the router start redialing even though I explicitly asked for ifdown wan?

Problem 2: Too many sessions after sysupgrading the device

See comment #11!

hi,

did it ever resolve for you?

my theory is that your ISP has multiple PPP servers and the initial PPP discovery is received and answered by all of them while the previous session may not yet be fully torn down; then the accounting system kicks in and disallows multiple sessions because you only pay for one service.

I'll try to retest in a few days!

thanks. i am just wondering if my problem here ("Possible regression in pppd") is related or not. i am using the same ISP as you, so it might be an ISP problem in the end. but it is basically impossible to get any proper response from them (other than everything is superb on their end but for an extra charge they can come to me and check locally if the problem exists) ...

I don't think so: I never had this problem before with my earlier devices. I had ath79, ramips/mt7621, ipq40xx devices before.

Just tested with ifdown wan on SNAPSHOT r21900-ac21dff5b6. It works nicely, no redial happens:

root@AX3200-B24 ~ # ifdown wan; logread -f -l 20 |grep pppd
Wed Jan 25 20:21:56 2023 daemon.info pppd[2191]: Terminating on signal 15
Wed Jan 25 20:21:56 2023 daemon.info pppd[2191]: Connect time 4.3 minutes.
Wed Jan 25 20:21:56 2023 daemon.info pppd[2191]: Sent 3877277 bytes, received 160303956 bytes.
Wed Jan 25 20:21:56 2023 daemon.debug pppd[2191]: Script /lib/netifd/ppp-down started (pid 11222)
Wed Jan 25 20:21:56 2023 daemon.debug pppd[2191]: Script /lib/netifd/ppp-down started (pid 11228)
Wed Jan 25 20:21:56 2023 daemon.debug pppd[2191]: sent [LCP TermReq id=0x2 "User request"]
Wed Jan 25 20:21:56 2023 daemon.debug pppd[2191]: Script /lib/netifd/ppp-down finished (pid 11222), status = 0x1
Wed Jan 25 20:21:56 2023 daemon.debug pppd[2191]: Script /lib/netifd/ppp-down finished (pid 11228), status = 0x1
Wed Jan 25 20:21:56 2023 daemon.debug pppd[2191]: rcvd [LCP TermAck id=0x2]
Wed Jan 25 20:21:56 2023 daemon.notice pppd[2191]: Connection terminated.
Wed Jan 25 20:21:56 2023 daemon.info pppd[2191]: Connect time 4.3 minutes.
Wed Jan 25 20:21:56 2023 daemon.info pppd[2191]: Sent 3877277 bytes, received 160303956 bytes.
Wed Jan 25 20:21:56 2023 daemon.info pppd[2191]: Exit.
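
In a log like the one above, a clean hang-up shows up as a sent LCP TermReq followed by a received TermAck. The check can be sketched as below; the function name pppd_clean_hangup is an assumption, and the daemon.debug lines it looks for are presumably only logged when pppd debugging is enabled.

```sh
#!/bin/sh
# Sketch: succeed (exit status 0) if the log on stdin contains a
# "sent [LCP TermReq" line followed later by "rcvd [LCP TermAck".
# Assumption: the log comes from a single pppd instance.
pppd_clean_hangup() {
    awk '
        /sent \[LCP TermReq/ { req = 1 }
        /rcvd \[LCP TermAck/ { if (req) ack = 1 }
        END { exit (ack ? 0 : 1) }
    '
}
```

For example, `logread | grep pppd | pppd_clean_hangup && echo clean` would print clean only when both packets were logged in that order.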

It also works with reboot.

EDIT:
sysupgrade doesn't work - Too many sessions after reboot

Testing on 22.03-SNAPSHOT, r20042-28e1770a3b:

  • ifdown wan works, no redials
  • reboot works, no redials
  • sysupgrade doesn't work - Too many sessions after reboot

Testing on 22.03.3, r20028-43d71ad93e:

  • ifdown wan works, no redials
  • reboot works, no redials
  • sysupgrade doesn't work - Too many sessions after reboot

I'll keep the router on 22.03.3 and check the weekly connection hang-up.

well if you are comfortable with mtkwifi drivers and swconfig i could let you try a build
22.03.2 or 22.03.3

Thanks, no need!

thanks for doing these tests. could you please elaborate sysupgrade part?

Sure.
I'm mostly on SNAPSHOT builds: the builds come from master or openwrt-22.03.

So sysupgrade means running auc and upgrading from one nightly build to another: a simple sysupgrade command which upgrades the device and reboots.

It looks like the router doesn't hang up the connection during sysupgrade, and after the restart it gets Too many sessions from the ISP for a few minutes.

that's interesting, as reboot (i guess you mean issuing the reboot command) works. one would assume a manual reboot and a sysupgrade reboot would be the same.

I changed to the ipq807x based Xiaomi AX3600, and I was able to reproduce the Too many sessions problem with sysupgrade. :ok_hand:

Anyway, it has 3 days of uptime, so I need a few more days to check the automatic redial.
EDIT: it works nicely! :ok_hand:


Now I think this sysupgrade vs PPPoE hang-up stuff never worked, and it is only surfaced by these new devices. They apply updates and reboot in a few seconds. :upside_down_face:

I also have digi with an ax3600 and had problems with the pppoe stuff.
Sometimes, when the router crashed or I rebooted it by disconnecting it from the mains, I had to reboot the ONT because that error blocked me from connecting to the internet.

As a workaround, I run ifdown wan after downloading the new image from the URL obtained from auc -y -n.
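
That workaround could be wrapped roughly as follows. It is a sketch under stated assumptions: the function name pppoe_safe_sysupgrade, the RUN and HANGUP_WAIT variables and the 5-second wait are made up for illustration, the interface is assumed to be called wan, and the image is assumed to be already downloaded. Only ifdown and sysupgrade themselves come from this thread.

```sh
#!/bin/sh
# Sketch: hang up PPPoE cleanly before flashing, so the ISP sees the
# session end before the router reboots.
# RUN=echo prints the commands instead of running them (dry run).
pppoe_safe_sysupgrade() {
    image="$1"
    iface="${2:-wan}"      # assumed interface name
    run="${RUN:-}"

    [ -f "$image" ] || { echo "image not found: $image" >&2; return 1; }

    # Bring the interface down; pppd should terminate the session here.
    $run ifdown "$iface" || return 1
    # Hypothetical grace period before flashing.
    $run sleep "${HANGUP_WAIT:-5}"
    $run sysupgrade "$image"
}
```

A dry run such as `RUN=echo pppoe_safe_sysupgrade /tmp/firmware.bin` (the path is a placeholder) only prints the three commands, which makes it easy to check the order before flashing for real.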

I also discovered that the problem seems to resurface at my end whenever digi gives me a new session. Sometimes, at random times (probably because my session expires), the router loses the wan interface with the same problem (not really sure, I'll have to get logs the next time). I normally don't keep the router up for a week, but I will have to check that in the future.

I also have a Xiaomi 4a Gigabit (MT7621) flashed with openwrt so this weekend if I have time I'll check if the problem persists.

Hello.
I have the same problem with DIGI and ax3600 and one of the latest snapshots. Did you solve the problem?

This is the log for the problem:

Wed May 24 08:10:34 2023 daemon.info pppd[2613]: LCP terminated by peer (Peer not responding)
Wed May 24 08:10:34 2023 daemon.info pppd[2613]: Connect time 722.3 minutes.
Wed May 24 08:10:34 2023 daemon.info pppd[2613]: Sent 1848981962 bytes, received 724318089 bytes.
Wed May 24 08:10:34 2023 daemon.notice netifd: Network device 'pppoe-wan' link is down
Wed May 24 08:10:34 2023 daemon.notice netifd: Interface 'wan' has lost the connection
Wed May 24 08:10:37 2023 daemon.notice pppd[2613]: Connection terminated.
Wed May 24 08:10:37 2023 daemon.info pppd[2613]: Connect time 722.3 minutes.
Wed May 24 08:10:37 2023 daemon.info pppd[2613]: Sent 1848981962 bytes, received 724318089 bytes.
Wed May 24 08:10:37 2023 daemon.notice pppd[2613]: Modem hangup
Wed May 24 08:10:37 2023 daemon.info pppd[2613]: Exit.
Wed May 24 08:10:37 2023 daemon.notice netifd: Interface 'wan' is now down
Wed May 24 08:10:37 2023 daemon.notice netifd: Interface 'wan' is setting up now
Wed May 24 08:10:37 2023 daemon.info pppd[17766]: Plugin pppoe.so loaded.
Wed May 24 08:10:37 2023 daemon.info pppd[17766]: PPPoE plugin from pppd 2.4.9
Wed May 24 08:10:37 2023 daemon.notice pppd[17766]: pppd 2.4.9 started by root, uid 0
Wed May 24 08:10:47 2023 daemon.notice hostapd: phy1-ap0: AP-STA-DISCONNECTED [deleted mac]
Wed May 24 08:10:52 2023 daemon.warn pppd[17766]: Timeout waiting for PADO packets
Wed May 24 08:10:52 2023 daemon.err pppd[17766]: Unable to complete PPPoE Discovery
Wed May 24 08:10:52 2023 daemon.info pppd[17766]: Exit.
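
To compare logs like this one across reports, each pppd exit can be reduced to a short reason. The sketch below is an illustration only: the function name classify_pppd_exit and the reason labels are assumptions, and the matched strings are just the messages that appear in the logs in this thread.

```sh
#!/bin/sh
# Sketch: print one reason per pppd "Exit." line found on stdin.
# Assumption: lines of different pppd instances are not interleaved.
classify_pppd_exit() {
    awk '
        /Terminating on signal 15/         { reason = "local-request" }
        /LCP terminated by peer/           { reason = "peer-terminated" }
        /Modem hangup/                     { if (reason == "") reason = "modem-hangup" }
        /Timeout waiting for PADO packets/ { reason = "no-pado" }
        /pppd\[[0-9]+\]: Exit\./ {
            print (reason == "" ? "unknown" : reason)
            reason = ""
        }
    '
}
```

Fed with the log above, for example via `logread | classify_pppd_exit`, the first exit would be reported as peer-terminated and the failed redial as no-pado.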

Unfortunately no.
@McGiverGim and I have been talking for a bit about the problems with digi.
One theory that I have is that digi is doing some sort of load balancing and changing the server that the router connects to; openwrt doesn't know how to reconnect, and you lose the pppoe session.

I want to try pfsense as the router and see whether the problem is with the pppoe implementation of openwrt or digi is doing something non-standard.

Maybe @McGiverGim can help you a bit more than me.

If it is useful, I have to say that I was using DIGI's own router, a ZTE H298Q, for almost 2 years with no complaints about disconnections. I don't remember any in that period. The problems arrived when I swapped this router for the xiaomi ax3600 on openwrt.

Very interesting, I hadn't seen this thread until now.
I'm on AX3600 too, and these are the two problems with Digi that I've noticed:

  • The first, as you have detected, is the amount of time needed to reconnect after a sysupgrade. I hadn't noticed that it only happens on sysupgrade and not on reboots, very interesting. The router seems to get the IP from the ISP but routing does not work at all. It usually needs some minutes to start working, and sometimes I need several reboots because it does not seem to start working.
  • The second one: with IPv6 enabled, it works perfectly, but after some time (maybe two days, maybe 3 weeks...) some applications start failing and have no internet. It happens with wallapop, hbo max, etc. It only happens to applications that try to use IPv6. IPv4 continues working. The strangest part is that other applications continue working. I can do a ping using IPv6 without problem, for example. In my latest tests it seemed that a service network reload (not restart) fixes the issue, without losing connectivity, but I couldn't confirm it until it failed again. (EDIT: I verified that this is not true)

I will keep an eye on this thread, let's see if we can fix the problem.