ZyXEL NBG6616 boot loop after sysupgrade from 17.01.5 to 18.06.1

My ZyXEL NBG6616 was happily running 17.01.5 until I did a sysupgrade to 18.06.1 (keeping settings). Unfortunately it is now stuck in a boot loop, so something went wrong.

The NBG6616 device page mentions a report of a boot loop with 18.06: "WARNING 18.06 currently seems to brick this device (I was stuck in a boot loop). Remove this if confirmed otherwise". I had seen that before upgrading and hoped it was either a one-off or something that had been fixed in 18.06.1.

It doesn't look like it's fixed in 18.06.1. I can probably recover it using TFTP, but does anyone know what went wrong here so we can prevent other people from running into the same issue?

I'm not sure if any active developer has access to the hardware to investigate and fix the problem. It would be helpful if someone could capture a serial boot log and attach it to a ticket.

Thanks! I don't have any equipment for reading serial output, but I can certainly look into that. Hopefully someone else (perhaps the person who added that warning to the device page?) already has a boot log and can help us get to the bottom of this.

OK, the USB-to-serial cable will be delivered tomorrow, so I should be able to provide a boot log then.

Meanwhile, the damage seems to be a bit more serious than I expected: I can't get it to pick up any firmware from the TFTP server. It might be rebooting before the TFTP routine even starts.

OK, the cause turns out to be BUG1724: I get the same boot log as the one posted there.

I have now changed the warning on the device page to alert people not to install 18.06 or 18.06.1, and I have set 17.01.5 as the most recent compatible version. Hopefully this will prevent more people from soft-bricking their devices. It crashes before it can enter the TFTP stage, so the only way to recover is with a TTL serial cable and quite a bit of hassle.

In that ticket it's suggested that commit 0cd5e85e7ad621223b0787e66d8ad20fb2694135 could be the cause.

I'm not so sure, as that commit removes the NBG6716 but not the NBG6616. Looking at the history of that file, it has never contained a direct reference to the NBG6616.

I might be missing something here (some of this is new territory for me), but if this file never referenced the NBG6616, how could it have worked under LEDE 17? Has the device always been covered in a more generic way instead of being named explicitly? Or has the build system changed so much since LEDE 17 that it now requires a specific reference where it didn't in the past?

I'm no expert, but after looking through the source code I found this file: https://git.openwrt.org/?p=openwrt/openwrt.git;a=blob;f=target/linux/ar71xx/files/arch/mips/ath79/mach-nbg6716.c;h=1c08e53f15529c689adae00fed5544c40157ed07;hb=HEAD

As you can see, the file registers both the NBG6716 and the NBG6616 as devices at the end of the file, which is why deleting the reference to the NBG6716 also breaks the NBG6616.

So most likely the commit you reference is the cause of the problem.
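To illustrate that reasoning, here is a toy Python model (not real Kbuild code) of how a single config symbol can gate two boards. It assumes, as with other ar71xx board files, that `mach-nbg6716.o` is only compiled when `CONFIG_ATH79_MACH_NBG6716=y` (the usual Kbuild `obj-$(CONFIG_...)` pattern), and that this one object registers both the NBG6716 and the NBG6616. The `mach-example.o` entry and its symbol are hypothetical placeholders, there only for contrast.

```python
"""Toy model: why dropping one Kconfig symbol removes two boards.

Assumption: mach-nbg6716.o is built only when CONFIG_ATH79_MACH_NBG6716=y,
and that single object registers both the NBG6716 and NBG6616 machines.
The 'mach-example.o' entry is a hypothetical placeholder for contrast.
"""

# Object file -> (Kconfig symbol that controls it, machines it registers).
BOARD_OBJECTS = {
    "mach-nbg6716.o": ("CONFIG_ATH79_MACH_NBG6716", ["NBG6716", "NBG6616"]),
    "mach-example.o": ("CONFIG_ATH79_MACH_EXAMPLE", ["EXAMPLE"]),  # hypothetical
}


def registered_machines(config: dict[str, str]) -> set[str]:
    """Return the machine names a kernel built with `config` would register."""
    machines: set[str] = set()
    for symbol, boards in BOARD_OBJECTS.values():
        # In this model an object is only linked in when its symbol is 'y'.
        if config.get(symbol) == "y":
            machines.update(boards)
    return machines


if __name__ == "__main__":
    before = {"CONFIG_ATH79_MACH_NBG6716": "y", "CONFIG_ATH79_MACH_EXAMPLE": "y"}
    after = {"CONFIG_ATH79_MACH_EXAMPLE": "y"}  # the NBG6716 line was dropped
    print(sorted(registered_machines(before)))  # ['EXAMPLE', 'NBG6616', 'NBG6716']
    print(sorted(registered_machines(after)))   # ['EXAMPLE']
```

In this model the NBG6616 never has a symbol of its own, so no file ever needs to mention it by name; it simply disappears along with the NBG6716.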


That makes sense, good find!

You said you got it to work, sort of, by adding CONFIG_ATH79_MACH_NBG6616=y to that config-default file.

What happens if you restore it fully and add CONFIG_ATH79_MACH_NBG6716=y to the file instead? That might work better as it would bring the situation for both devices back to how it was.

If it does, it should be a very straightforward patch and we can get it fixed on trunk.
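For anyone building their own image in the meantime, a quick sanity check is to confirm that the subtarget's kernel config enables the NBG6716 symbol, since that symbol also covers the NBG6616. The Python sketch below is a hypothetical helper, not part of the OpenWrt build system; it assumes the usual `CONFIG_FOO=y` / `# CONFIG_FOO is not set` syntax used in kernel config files such as `generic/config-default`.

```python
import re

# "CONFIG_FOO=value" lines and the "# CONFIG_FOO is not set" form.
_SET = re.compile(r"^(CONFIG_[A-Za-z0-9_]+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_[A-Za-z0-9_]+) is not set$")


def parse_kconfig(text: str) -> dict[str, str]:
    """Parse kernel-config-style text into {symbol: value}; unset symbols map to 'n'."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if m := _SET.match(line):
            values[m.group(1)] = m.group(2)
        elif m := _UNSET.match(line):
            values[m.group(1)] = "n"
    return values


def nbg6616_supported(text: str) -> bool:
    """True if the NBG6716 board code (which also registers the NBG6616) is enabled."""
    return parse_kconfig(text).get("CONFIG_ATH79_MACH_NBG6716") == "y"


def check_file(path: str) -> bool:
    """Read a config file (for example generic/config-default) and run the check."""
    with open(path, encoding="utf-8") as fh:
        return nbg6616_supported(fh.read())
```

For example, `check_file("target/linux/ar71xx/generic/config-default")` run from the top of a source tree should return `True` if the fix re-adds that line; both the path and that assumption about the fix are mine, not confirmed in this thread.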

@NeoRaider should know whether the removal of the NBG6616 (and 6716) by that commit was accidental or intentional.

Most likely it (or rather the mach-nbg6716 config item, which also covers the 6616) was accidentally removed from generic/config-default along with the "tiny" devices, and never added back.

Resolved in the openwrt-18.06 branch:

commit e3022727658166e736198529582a46abf2397ea4 (origin/openwrt-18.06)
Author: Matthias Schiffer
Date:   Mon Aug 27 20:25:01 2018 +0200

    ar71xx/generic: enable Zyxel NBG6616 in kernel config again

Thanks Matthias! After a couple of hours of testing I can confirm that it has resolved the issue.

I have upgraded from 17.01.5 to a snapshot using 'Sysupgrade' via the web interface, and I have also reflashed with a snapshot 'Factory' image using the TFTP recovery method.

In both tests the upgrade went without any issues, and the device boots and runs as it should on the most recent OpenWrt 18 snapshot.

I will now update the device page and the original bug ticket to confirm it as resolved for the next official release.

If your problem is solved, please consider marking this topic as [Solved]. (Click the pencil behind the topic...)

