BT HomeHub 5A not reconnecting when re-plugging DSL cable

Hi all,
in the last days I was testing a bit with the BTHH5A to verify if it's reliable in case of DSL disconnections, or cable unplug/replug (which may happen sometimes in my case), and check if it reconnects without issues.

Speaking of the unplug/replug use case, what I see is some random behaviour. Sometimes the connection is re-established as soon as the DSL cable is plugged back in, other times the wan interface just seems to never wake up again, independently from how much time the modem has been left without cable. When this happens, no matter what kind of reset I perform (ip link set up/down, reset of the interface via LuCI, reset of the network service), the interface is just dead and reconnection never triggered... or at least it isn't for a looong time.

This night I unplugged/replugged the cable a couple of times until the interface didn't came back again (yes, this is all it's required to make it happen, just a handful of tries with some random minutes between each other). I then left the cable plugged in for the whole night, just to see if it would ever connect again by its own. And well, in the morning I found it back online! Great! But... these are the logs since the disconnection, up to the reconnection complete with PPPoE session established (some pppd debug noise stripped):

Sep 29 00:57:58 OpenWrt kernel: [102292.565925] leave showtime                  
Sep 29 00:57:58 OpenWrt netifd: Network device 'dsl0' link is down              
Sep 29 00:57:58 OpenWrt netifd: Interface 'wan' has link connectivity loss      
Sep 29 00:57:59 OpenWrt pppd[21809]: Terminating on signal 15                   
Sep 29 00:57:59 OpenWrt pppd[21809]: Connect time 36.9 minutes.                 
Sep 29 00:57:59 OpenWrt pppd[21809]: Sent 5257644 bytes, received 138515652 bytes.
Sep 29 00:57:59 OpenWrt netifd: Network device 'pppoe-wan' link is down         
Sep 29 00:57:59 OpenWrt pppd[21809]: Script /lib/netifd/ppp-down started (pid 22040)
Sep 29 00:57:59 OpenWrt pppd[21809]: sent [LCP TermReq id=0x2 "User request"]   
Sep 29 00:57:59 OpenWrt pppd[21809]: Script /lib/netifd/ppp-down finished (pid 22040), status = 0x1
Sep 29 00:58:02 OpenWrt pppd[21809]: sent [LCP TermReq id=0x3 "User request"]   
Sep 29 00:58:03 OpenWrt netifd: Interface 'wan' is now down                     
Sep 29 00:58:03 OpenWrt netifd: Interface 'wan' is disabled                     
Sep 29 00:58:03 OpenWrt kernel: [102297.599151] IPv6: ADDRCONF(NETDEV_UP): dsl0: link is not ready
Sep 29 00:58:03 OpenWrt netifd: Interface 'wan' is enabled                      
Sep 29 00:58:03 OpenWrt dnsmasq[21250]: no servers found in /tmp/resolv.conf.auto, will retry
Sep 29 03:48:17 OpenWrt kernel: [112510.502412] enter showtime, cell rate: 0 - 2150, 1 - 2150, xdata addr: 0xa6040000
Sep 29 03:48:17 OpenWrt kernel: [112510.508878] IPv6: ADDRCONF(NETDEV_CHANGE): dsl0: link becomes ready
Sep 29 03:48:17 OpenWrt netifd: Network device 'dsl0' link is up                
Sep 29 03:48:17 OpenWrt netifd: Interface 'wan' has link connectivity           
Sep 29 03:48:17 OpenWrt netifd: Interface 'wan' is setting up now               
Sep 29 03:48:17 OpenWrt kernel: [112510.526041] enter showtime, cell rate: 0 - 2150, 1 - 2150, xdata addr: 0xa6040000
Sep 29 03:48:17 OpenWrt insmod: module is already loaded - slhc                 
Sep 29 03:48:17 OpenWrt insmod: module is already loaded - ppp_generic          
Sep 29 03:48:17 OpenWrt insmod: module is already loaded - pppox                
Sep 29 03:48:17 OpenWrt insmod: module is already loaded - pppoe                
Sep 29 03:48:17 OpenWrt pppd[22678]: Plugin rp-pppoe.so loaded.                 
Sep 29 03:48:17 OpenWrt pppd[22678]: RP-PPPoE plugin version 3.8p compiled against pppd 2.4.7
Sep 29 03:48:17 OpenWrt pppd[22678]: pppd 2.4.7 started by root, uid 0          
Sep 29 03:48:22 OpenWrt pppd[22678]: PPP session is 1863                        
Sep 29 03:48:22 OpenWrt pppd[22678]: Connected to 54:4b:8c:97:37:10 via interface dsl0
Sep 29 03:48:22 OpenWrt pppd[22678]: using channel 8                            
Sep 29 03:48:22 OpenWrt kernel: [112516.011569] pppoe-wan: renamed from ppp0    
Sep 29 03:48:22 OpenWrt pppd[22678]: Renamed interface ppp0 to pppoe-wan        
Sep 29 03:48:22 OpenWrt pppd[22678]: Using interface pppoe-wan                  
Sep 29 03:48:22 OpenWrt pppd[22678]: Connect: pppoe-wan <--> dsl0               
Sep 29 03:48:22 OpenWrt pppd[22678]: PAP authentication succeeded               
Sep 29 03:48:22 OpenWrt pppd[22678]: local  IP address edit                     
Sep 29 03:48:22 OpenWrt pppd[22678]: remote IP address 192.168.100.1            
Sep 29 03:48:22 OpenWrt pppd[22678]: primary   DNS address 85.38.28.6           
Sep 29 03:48:22 OpenWrt pppd[22678]: secondary DNS address 85.38.28.7           
Sep 29 03:48:22 OpenWrt pppd[22678]: Script /lib/netifd/ppp-up started (pid 22775)
Sep 29 03:48:23 OpenWrt netifd: Network device 'pppoe-wan' link is up           
Sep 29 03:48:23 OpenWrt netifd: Interface 'wan' is now up                       
Sep 29 03:48:23 OpenWrt dnsmasq[21250]: reading /tmp/resolv.conf.auto           
Sep 29 03:48:23 OpenWrt dnsmasq[21250]: using local addresses only for domain test
Sep 29 03:48:23 OpenWrt dnsmasq[21250]: using local addresses only for domain onion
Sep 29 03:48:23 OpenWrt dnsmasq[21250]: using local addresses only for domain localhost
Sep 29 03:48:23 OpenWrt dnsmasq[21250]: using local addresses only for domain local
Sep 29 03:48:23 OpenWrt dnsmasq[21250]: using local addresses only for domain invalid
Sep 29 03:48:23 OpenWrt dnsmasq[21250]: using local addresses only for domain bind
Sep 29 03:48:23 OpenWrt dnsmasq[21250]: using local addresses only for domain lan
Sep 29 03:48:23 OpenWrt dnsmasq[21250]: using nameserver 85.38.28.6#53          
Sep 29 03:48:23 OpenWrt dnsmasq[21250]: using nameserver 85.38.28.7#53          
Sep 29 03:48:23 OpenWrt pppd[22678]: Script /lib/netifd/ppp-up finished (pid 22775), status = 0x1
Sep 29 03:48:23 OpenWrt firewall: Reloading firewall due to ifup of wan (pppoe-wan)

As you can see, it took almost 3 hrs to reconnect since the time i replugged the cable (just few minutes after the cable disconnection at the start of the logs). How can this be?

Also, when the wan interface goes down, pppd terminates, and I guess this is the expected behaviour as far as I've read around, since interfaces and connections should be managed by netifd, which is in charge to restart it where appropriate. So as I understand it, the culprit should not be pppd itself, but the condition that triggers a new PPPoE session -presumably some sort of DSL line detection- which is not triggered for a long time, as we can see. Please correct me about any of this if I'm wrong.

In any case, this is how pppd gets executed, if it matters:

/usr/sbin/pppd nodetach ipparam wan ifname pppoe-wan lcp-echo-interval 1 lcp-echo-failure 5 lcp-echo-adaptive +ipv6 nodefaultroute usepeerdns maxfail 1 user <user> password <password> ip-up-script /lib/netifd/ppp-up ipv6-up-script /lib/netifd/ppp6-up ip-down-script /lib/netifd/ppp-down ipv6-down-script /lib/netifd/ppp-down mtu 1492 mru 1492 plugin rp-pppoe.so nic-dsl0

I played a bit with the persist and maxfail options in /etc/ppp/options trying to keep pppd always up on the interface, but they don't seem the get applied, and since pppd executes fine whenever it's called, they don't seem to be much relevant after all.

I'd like to note that I never ever had this sort of problem using other commercial modem with OEM firmwares, so it should not be on the ISP side. And in general, when not pulling wires around, the DSL wan looks reasonably stable.

Thanks everyone for the support.

What version of OpenWrt is installed?

See also
https://forum.openwrt.org/t/bt-homehub-dsl-wan-gets-stuck-on-boot/74683

Perhaps test an older version such as OpenWrt 18 or LEDE 17 ?

Hi @bill888,

What version of OpenWrt is installed?

OpenWrt 19.07.4, r11208-ce6496d796

See also
BT Homehub DSL WAN gets "stuck" on boot

I guess I already stumbled on that post. I don't know if it's the same problem... quite the opposite actually: my modem is working fine on boot, or at least in many tests it always succeeded on the first connection. Also, when the interface enters that locked state, a reboot is exactly the only thing that seems to unlock it, while an interface restart from LuCI doesn't do anything, as opposed to him.

Perhaps test an older version such as OpenWrt 18 or LEDE 17 ?

This is another test, yes... I don't know if I can take it shortly.

Waiting to downgrade and test with version 18, I was thinking about this a bit more... considering that no activity is shown whatsoever when re-plugging the dsl cable (not in system logs nor in ubus monitor which implies nothing in the LuCI interface either), and that the first relevant message after this long interval is

kernel: [112510.502412] enter showtime, cell rate: 0 - 2150, 1 - 2150, xdata addr: 0xa6040000

does it mean that this sould be debugged below the kernel layer? Or is there some option to let the interface be more verbose and notify more detailed status informations (like dsl line probing and the like) during this dead interval of time? If someone has more suggestions about debugging, I can see what i can do with it.

Thanks.

Hi all, some updates on this issue...

I tried with OpenWrt 18.06.8 r7989-82fbd85747, and after many tests it looks like the issue is not present. The modem always starts resyncing as soon as the cable is plugged back in... like in a matter of seconds as I would expect it to be.

Now I'm curious... I've gone through the 19.07.0 changelog for lantiq to see if some changes introduced by the new major may have caused this, but I'm not sure there is anything related... So if someone has some advices about further debugging it let me know. Also I should probably fill a bug report at this point i guess...

1 Like

fwiw, I don't know whether you wish to test 19.07.0 to confirm whether the issue you discovered has been present ever since 19.07 was released?

fwiw, debugging will be a nightmare without the help of developer who is familiar with the device, or someone who knows how to build images and roll back commits until the culprit commit is found which is extremely time consuming, when previously having to trace bugs in HH5A. (Random Reboot bug in 17.01 and Packet Steering bug in 18.06)

I personally only use 17.01 or 18.06 on a pair of actual DSL connected HH5As. I do use 19.07 for non-DSL applications.

Yes I was taking 19.07.0 as reference just for the fact that it contains the most of the changes for the lantiq target, compared to the subsequent minors, but yes, I could also try 19.07.0 maybe. Anyways now it's just a matter to see when it was introduced (always keeping in mind that 18.06 could be a false negative as well due to the random nature of the issue itself)...

Yes I can imagine, especially time consuming because in this case one has to manually plug/unplug cables, and wait to see what happens, and also it's not deterministic at all, and during this process no one at home would be able to access the internet, which is bad... But if some dev has some easy but meaningful test to do which doesn't steal too much effort and time on their side I can try nonetheless.

Thanks for sharing your experience, 18.0.6 looks stable enough indeed... I would say the same for 19.07.4 if it wasn't for this issue, for what i could see.

I would speculate other Lantiq devices may be affected too. I haven't yet seen other similar comments posted on the forum though.

We don't know what version of OpenWrt is running on this HH5A I previously quoted.
https://forum.openwrt.org/t/bt-homehub-dsl-wan-gets-stuck-on-boot/74683

It just seems too much of a coincidence that both reports appear in past fortnight with 19.07.4 only released about one month ago.

Well in my case it appears now because 19.07.4 is the first release I used with dsl ever. I wish I had a longer experience so I could have more comparison elements from older versions...

Also the issue itself is not that easy to spot, since it requires someone who purposely starts unplugging/replugging the cable multiple times until it fails, which is not a common and sane relationship with a modem, I guess. I wouldn't have noticed it either if I left the modem connected for months until the next power outage, which wouldn't have triggered it anyways. But (un)fortunately I explicitly needed to check its reliability in case of cable reconnection. So there may be a chance that it was already present on previous releases and dormant for long time as well...

Ok, I can confirm this for 19.07.0 too. I'm going to stick with 18.06.8 again for some time to double check that it's not affected.

1 Like

This is likely not even an OpenWrt issue. Your log file entries

Sep 29 00:58:03 OpenWrt dnsmasq[21250]: no servers found in /tmp/resolv.conf.auto, will retry
Sep 29 03:48:17 OpenWrt kernel: [112510.502412] enter showtime, cell rate: 0 - 2150, 1 - 2150, xdata addr: 0xa6040000

suggest that the modem itself was unable to synchronize in these three hours. If it were synchronized and OpenWrt was unable to connect, you would have a slew of PPPoE connection attempts in the log. (Once the firmware is uploaded to the modem chipset, and the modem is started, it really does its own thing. It's its own machine in a way.)

Consequently, there are only two possibilities:

  1. you are hitting some bug in the modem firmware

or, what I think is more likely:

  1. your ISP/DSLAM/DLM/"the other side of the DSL line" is taking offense with multiple unprompted disconnects and will refuse to synchronize with your end for some time

Hi @takimata, thanks for joining.

Yes... in fact, just to be sure, I posted the logs with pppd in debug mode and also asked for hints to get some more infos from the modem itself. Like you say, it's not syncing at all.

Yes, this is what I suspect the most, also because:

I don't know... I first noticed the issue exactly because it happened within the first couple of tries. If it didn't, I'm sure I would have labeled it as working since I wouldn't have taken more than 5ish tries (if I know myself enough...). Also:

  • considering a random interval of at least 5 minutes between retries, to a maximum of -let's say- 1 hour (that's how I tested it more or less)
  • in some tests it happened after ~2/3 attempts after at least a day of uptime, so no previous "suspect" activity... even with 5 mins interval it doesn't sound like a lot of noise until the 3rd one to me... but ok, you never know the ISP setup
  • with 18.06 I think I literally flooded them with reconnection attempts and they always succeeded
  • in some other tests I left it unplugged overnight, and when I plugged it back after ~8 hours it wasn't syncing either
  • if it was ISP side, it doesn't explain why a reboot always suddenly fixes it
  • I remember I never had this issue with commercial modems over the years

But hey, everything is possible... I'll try my best to make 18.06 fail as well.

When the problem appears, do you recall what does DSL Status report in LuCI > Status > Overview page?

ie. is 'Status' reported as 'Down' with no 'Data Rate' displayed?

fwiw, it wouldn't be too difficult to try alternative Lantiq (vr9) firmware .bin file in 19.07 to see if it turns out to be a simple fix. eg. use earlier one if 18.06 doesn't exhibit the issue?

Just to complete the picture: What kind of DSL line are you on? ADSL2+? VDSL2? VDSL2 with vectoring? (It can't very well be VDSL with Supervectoring since the HH5A chipset doesn't support it.)

Yes it's DOWN with no data rates and stats, just as the cable was still unplugged.

which one is it? /lib/firmware/lantiq-vrx200-a.bin ? I also see the vdsl_fw_install.sh script which can download some firmware... about the 18.06 firmware, maybe I can try once it proves to be good.

ADSL2+

Correction. I just compared 18.06.4 to 19.07.0 and it turns out the Lantiq firmware is the same. Same version also reported in 17.01.6 with no active DSL connection.

-----------------------------------------------------
 OpenWrt 18.06.4, r7808-ef686b7292
 -----------------------------------------------------
root@OpenWrt:~# cd /lib
root@OpenWrt:/lib# cd firmware
root@OpenWrt:/lib/firmware# ls
ath10k
ath9k-eeprom-pci-0000:00:0e.0.bin
lantiq-vrx200-a-to-b.bspatch
lantiq-vrx200-a.bin
regulatory.db
xcpe_581816_580B11.LICENSE
xcpe_581816_580B11.bin
xcpe_581816_580B11_to_579906_574402.bspatch

root@OpenWrt:/lib/firmware# /etc/init.d/dsl_control status
ATU-C Vendor ID:
ATU-C System Vendor ID:
Chipset:                                  Lantiq-VRX200
Firmware Version:                         5.8.1.8.1.6
API Version:                              4.17.18.6


 -----------------------------------------------------
 OpenWrt 19.07.0, r10860-a3ffeb413b
 -----------------------------------------------------
root@OpenWrt:~# ls /lib/firmware
ath10k
ath9k-eeprom-pci-0000:00:0e.0.bin
lantiq-vrx200-a-to-b.bspatch
lantiq-vrx200-a.bin
regulatory.db
xcpe_581816_580B11.LICENSE
xcpe_581816_580B11.bin
xcpe_581816_580B11_to_579906_574402.bspatch

root@OpenWrt:~# /etc/init.d/dsl_control status
ATU-C Vendor ID:
ATU-C System Vendor ID:
Chipset:                                  Lantiq-VRX200
Firmware Version:                         5.8.1.8.1.6
API Version:                              4.17.18.6

When there is an active VDSL2 annex B connection, the version number is reported as: 5.7.9.9.0.6

   \  DE    \  /    Reboot (17.01.6, r3979-2252731af4)
    \________\/    -----------------------------------------------------------

root@LEDE:~# /etc/init.d/dsl_control status
ATU-C Vendor ID:                          Broadcom 193.144
ATU-C System Vendor ID:                   Broadcom
Chipset:                                  Lantiq-VRX200 Unknown
Firmware Version:                         5.7.9.9.0.6
API Version:                              4.17.18.6
XTSE Capabilities:                        0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x2
Annex:                                    B
Line Mode:                                G.993.2 (VDSL2)
Profile:                                  17a
Line State:                               UP [0x801: showtime_tc_sync]

The vdsl_fw_install.sh script is used to apparently download a vectoring firmware.
https://git.openwrt.org/?p=openwrt/openwrt.git;a=commitdiff;h=0653e3127054993e908878f70fbbe1a1b784dd61

1 Like

It would have to be. There hasn't been a firmware update since late 2016, which I believe is due to more recent firmware versions not being open sourced anymore.

To the OP: I have been using OpenWrt on Lantiq with an ADSL line until recently, for the better part of a decade actually, and never once encountered the problem you are describing. Especially the fact that the line eventually comes back seems to confirm my guess is that you are provoking the DLM just a bit too much. But again, it's just a guess. You would probably have to talk to your ISP to confirm what's going on, maybe they can even check the logfiles of your line card to see what's breaking and why.

1 Like

Ok so there should be no difference... interesting. Maybe I can try with a different firmware.

Mhh ok I wouldn't have suspected a firmware bug so much if I knew that... so to sum up 18.06 should be definitely affected as well, and if it is either the bug is not in the firmware or it has been there for at least 4 yrs, which seems a bit unlikely but who knows...

Good to hear... what I'm concerned about is that it may have gone unnoticed given its very specific nature: it doesn't happen once connection is established, nor on reboots, just when purposely unplugging the cable and not every time. I guess most users (myself included, but not this time) can leave a cable attached to the back of a device for months, or years I'd say... In that case one may never stumble upon it, or even that rare time it happened somehow, reporting a thing that a reboot can fix and you never see again for the rest of your life may not be worth the effort...

(If you want to know why it's relevant to me, that's because when our area is hit by lightning storms, they may happen to propagate especially through the phone line. Over the past years, we already suffered damages to various equipments more than few times... so we often find ourselves unplugging only the phone cable during those events, rather than the power adapter too, which in my case is also harder to reach)

Thanks for the link, very interesting... but is that jsut a BT thing or does it apply to DSL in general? Because I'm not a BT customer, i don't even live in England...
In any case, I would not explain some of the tests i documented above, one for all the reboot which always get things working again... or the cable replug after 8hrs and no sync going on (fixed by a reboot, obviously).

I'm not the first to suggest a phone line surge protector, am I?

I really can't answer that, sorry.

Eheh not really, thanks... but I've never really considered one of them as I read not so positive opinions about their effectiveness against lightning strikes. I'm talking about those sort of pass-through boxes with a hanging ground wire fitting in a hand, idk if more solid and expensive ones are availabe which can withstand the power of a lightning strike... also they require proper grounding which makes the installation a bit tricky.
OT: Do you recommend them?