Netgear R7800 enters boot loop in LEDE but works with official firmware

Dear community,

I am having a problem with my new Netgear R7800. Please let me know if I'm asking for help correctly! I am new to this forum but have been using LEDE for a few months already. I am running it successfully on a BT Home Hub 5A and a BT Home Hub 2B but wanted something with a faster CPU.

I bought a Linksys WRT3200ACM from Maplin but found its Wi-Fi to be very weak, so I returned it and bought this Netgear R7800 on eBay:

https://www.ebay.co.uk/itm/NETGEAR-Nighthawk-X4S-R7800-Wireless-Cable-Fibre-Router-AC-2600-Dual-band/202157127648

I am pleased with its Wi-Fi and it has been running the latest version of its official firmware, V1.0.2.44, for over nine days without interruption but when I install LEDE it promptly goes into a boot loop. I'm so glad it has a TFTP recovery mode!

I have installed the LEDE 17.01.4 factory image using both TFTP and the official firmware's update function. I also tried the previous version, LEDE 17.01.3. I verified the images' hashes and signatures before I flashed them. I made sure I used TFTP in binary mode. Every time I get the same result: LEDE comes up normally and I start to configure it. At a random time in the next few minutes it will reboot itself. Once it has started it will continue to reboot until I power it off.

When I turn it back on again, it goes into a boot loop immediately, but when I reinstall the official firmware using TFTP, it works again. I tried factory-resetting LEDE and the official firmware, to no avail. Please help me solve this mystery!

I am reluctant to do anything to the R7800 that I cannot undo, since I still have the option of returning it within 30 days of getting it on the 21st of January 2018. I see that custom builds of LEDE are popular as they make more flash space available but they advise backing up the default partitions before installation. Unfortunately, I note that the first step to back up those partitions is to boot into LEDE, which obviously doesn't work.

So, is everybody using custom builds and leaving the "stable" builds for the R7800 unstable, or does mine have a hardware fault, or is there something else going on? I would be very grateful to anyone who can help. Any troubleshooting you recommend, please make it non-destructive and something that I can execute quickly, before LEDE reboots!

Sincerely,
Mark (Baud of troubleshooting)

That is very unusual, as you are flashing LEDE's own stable firmware image, with nothing extra in between.

The only possible thing I could think of (notwithstanding a possible hardware defect) is that the factory firmware image is doing something when switching over to LEDE. I'm curious to see whether anyone else has had success or failure when flashing LEDE from Netgear's stock firmware version 1.0.2.44. If it's feasible, see if the flash works from an earlier stock firmware (e.g. I used stock version 1.0.2.36 to flash to LEDE).

I've had one bootloop happen to me as well when going from LEDE back to stock firmware via a forced .img flash. I had to fix it by instead flashing the stock firmware over TFTP mode.

That all being said, I cannot think of any plausible cause as to why your R7800 doesn't appear to get along with LEDE. I would recommend you give hnyman's builds a try—his stable or master builds both work very well on the R7800.

As a last resort, you could try what is known as a "clean flash". This usually entails resetting the current working firmware to defaults, flashing the same firmware over itself (if you want to move to a different firmware, you can skip these first two steps), then flashing the new, desired firmware and resetting once more. LEDE has a nice feature that lets you choose whether to keep or erase settings when flashing (to my knowledge, this only works with OpenWrt/LEDE firmware). So when going from one LEDE version to another, a clean flash is largely unnecessary.

At least my own custom build (the community build linked above) is fully reversible, as it does not modify/extend the flash size. You could try either the master or the 17.01 build.
Build for Netgear R7800

I have flashed my R7800 over 300 times and have seen a bootloop only a few times, and that was due to testing kernel 4.9 patches (so that it was known beforehand that something may be wrong). Then it was some memory or bus timing error, which the core devs eventually fixed.

There has been something similar with the WRT3200ACM lately, where the manufacturer has quietly changed the flash chip to a new model from Winbond that is not yet fully supported by upstream Linux. So it is possible that the hardware in the new R7800s is incompatible. But as your device initially boots OK, that is not that likely.

The best way to analyse that kind of bootloop is to open the router and attach a serial console cable. Then you can follow the early kernel bootlog from a terminal. It is not complicated but requires opening the R7800:

EDIT:
Well, I read your description again, and your problem is not actually a boot loop during the boot process, but a crash later. Analysing that is a bit easier.

You could start the router normally and, right after boot, before you start configuring, open two SSH terminal windows on the side and monitor the kernel and system logs there. Depending on which driver the crash happens in, you might be able to see some errors and a crash dump, and then identify which driver causes it.

logread -f

But it is possible that there is some hardware failure, e.g. a faulty RAM chip.
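For the log monitoring suggested above, keep in mind that the router's log buffer lives in RAM and is lost at every reboot, so it helps to save the output on the PC side. Below is a minimal sketch of that, not a tested recipe for the R7800. It assumes the router is at LEDE's default LAN address 192.168.1.1 and that SSH login as root works; `capture` is a hypothetical helper name, and `cat /proc/kmsg` is just one assumed way to follow kernel messages (as far as I know, `logread` on LEDE already includes kernel messages, so the second terminal may be redundant).

```shell
#!/bin/sh
# Assumption: the router is at LEDE's default LAN address; adjust as needed.
ROUTER="${ROUTER:-root@192.168.1.1}"

# capture is a hypothetical helper, not part of LEDE: it runs a command,
# prefixes every output line with the local time, shows it on screen,
# and appends it to a file on the PC so it survives a router reboot.
capture() {
    out="$1"; shift
    "$@" 2>&1 | while IFS= read -r line; do
        printf '%s %s\n' "$(date '+%H:%M:%S')" "$line"
    done | tee -a "$out"
}

# Run each of these in its own terminal on the PC, right after boot:
#   capture system.log ssh "$ROUTER" logread -f
#   capture kernel.log ssh "$ROUTER" cat /proc/kmsg
```

When the router crashes, the SSH sessions drop, and the last lines in system.log and kernel.log are the ones closest to the crash. Those are the lines worth posting here.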

PS: The "clean flash" advice looks overly complicated. As all settings in OpenWrt/LEDE are stored as files in the same partition, flashing without keeping settings clears the settings.

It looks as if you've only tested the 17.01.x releases so far. Especially given that the R7800 has a pretty robust TFTP recovery, testing a more recent snapshot build (or e.g. hnyman's master builds) would probably be a good idea.

Ah yes, you're right about that.

Yes, as slh said above, the general consensus is that community builds and snapshot releases have been well received (at least for the R7800), @Baud.