Grub2 hangs before boot menu after restoring SSD

Hello friends.

So, I've been preparing a PC I built to be my new router. I've been using Macrium to back up the SSD but hadn't tried restoring it yet.

I took a break over the weekend, and when I resumed work on it today I decided to take the chance to test restoring the SSD. I restored the whole drive, not just a partition.

And now Grub2 has stopped working. It only shows "GRUB" and a blinking cursor; the boot menu never appears.

From what I found (https://plosquare.blogspot.com/2010/05/troubleshooting-grub-hangs.html), it may be that the boot files were moved outside the area GRUB can reach, so it hangs in stage 2. I'm not sure if that's the cause.

I restored the backup to a VMware VM, where it doesn't even show the GRUB message. Anyway, I booted from ubuntu-20.04.2.0-desktop-amd64.iso and followed https://help.ubuntu.com/community/Grub2/Installing > Fixing a Broken System > via the LiveCD terminal.

Grub2 was reinstalled successfully, but it still fails to boot.

Shouldn't a standard Grub2 boot partition work with OpenWrt? Should I try uninstalling and doing a clean install?

I was able to properly replicate the issue on a VM and create a snapshot.

Cursor keeps blinking.

I noted every app I installed and keep all my configs in a Subversion project, so I guess I could just wipe the SSD and reinstall everything. But I need a working backup solution (reliable, with quick restores), so I'm gonna try to make this work on the VM and get backups working too.

Any help is much appreciated :)

Update: I managed to fix the boot using Ubuntu, but now Grub2 doesn't know what to do, and neither do I lol


It seems you're missing the grub.cfg in /boot/grub.

You can regenerate it using grub2-mkconfig (called grub-mkconfig on Ubuntu).

But it isn't really an OpenWrt issue.


Recreate a bigger openwrt disk image based on the official original disk image - #4 by vgaetera

You may need specific options to reinstall GRUB2 properly.
Also verify that the PARTUUID matches the one in the GRUB2 config.
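To make the PARTUUID check concrete, here is a minimal Python sketch. It assumes an MBR-partitioned disk (as with the OpenWrt x86 combined image), 512-byte sectors, and that the rootfs is partition 2; all function names are hypothetical. On MBR disks Linux derives the PARTUUID from the 32-bit disk signature stored at byte offset 440 of the first sector, printed as eight hex digits, then a dash and the partition number as two hex digits.

```python
import re
import struct

SECTOR_SIZE = 512  # assumption: 512-byte logical sectors


def mbr_partuuid(first_sector: bytes, partition_number: int) -> str:
    """Build the Linux-style PARTUUID ("ssssssss-pp") for an MBR partition."""
    if len(first_sector) < SECTOR_SIZE or first_sector[510:512] != b"\x55\xaa":
        raise ValueError("first sector does not end with the 0x55AA MBR signature")
    # The 32-bit disk signature is stored little-endian at byte offset 440.
    (disk_signature,) = struct.unpack_from("<I", first_sector, 440)
    return f"{disk_signature:08x}-{partition_number:02x}"


def grub_root_partuuids(grub_cfg_text: str) -> list[str]:
    """Return every root=PARTUUID=... value found in grub.cfg text, lowercased."""
    return [m.lower() for m in re.findall(r"root=PARTUUID=([0-9A-Fa-f-]+)", grub_cfg_text)]


def partuuid_mismatches(disk_path: str, grub_cfg_path: str, root_partition: int = 2) -> list[str]:
    """Return grub.cfg PARTUUIDs that differ from the disk's (empty list = consistent)."""
    with open(disk_path, "rb") as disk:  # reading a real /dev node needs root
        expected = mbr_partuuid(disk.read(SECTOR_SIZE), root_partition)
    with open(grub_cfg_path, encoding="utf-8") as cfg:
        found = grub_root_partuuids(cfg.read())
    if not found:
        raise ValueError(f"no root=PARTUUID= entry in {grub_cfg_path}")
    return [value for value in found if value != expected]
```

Usage might look like `partuuid_mismatches("/dev/sda", "/mnt/boot/grub/grub.cfg")`, where both paths are placeholders for wherever the restored disk and its boot partition appear on the rescue system.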


Well, I was unable to reproduce the issue on a clean VM. Both Macrium and Clonezilla correctly created and restored backups of the disk and of the partition, both at the original sizes and after resizing.

The only explanation I can think of is that this particular backup is broken. Macrium verifies it successfully, yet whenever it's restored, Grub2 hangs again.

At least this happened before the router went into production. Now I'm gonna try restoring a clean backup, then restoring the OpenWrt partition over it, and see if I can recover it.

As I said, I could just do a fresh install and redo the setup, but after all this stress I wanna at least take the opportunity to learn and be better prepared next time it happens.

From now on I plan to keep two backups, one with Macrium and one with Clonezilla. If one fails, I'll still have the other.

Update: I was able to recover it with Macrium in two ways: (1) restoring the new disk backup, then restoring the original root partition over it; and (2) restoring the original disk backup, then restoring the new backup's boot partition + MBR, leaving the root partition intact. I just have to note the new PARTUUID and update grub.cfg.
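The "note the new PARTUUID and update grub.cfg" step could be scripted roughly as below. This is a minimal Python sketch, assuming the kernel lines in grub.cfg use root=PARTUUID=... and the new value is an MBR-style PARTUUID (8 hex digits, a dash, 2 hex digits); the helper names are hypothetical. It saves a .bak copy before writing, since a broken grub.cfg leaves the system just as unbootable as the original problem.

```python
import re
import shutil

ROOT_PARTUUID_RE = re.compile(r"(root=PARTUUID=)[0-9A-Fa-f-]+")
MBR_PARTUUID_RE = re.compile(r"[0-9A-Fa-f]{8}-[0-9A-Fa-f]{2}")


def replace_root_partuuid(grub_cfg_text: str, new_partuuid: str) -> tuple[str, int]:
    """Point every root=PARTUUID=... at new_partuuid; return (new_text, replacements)."""
    if not MBR_PARTUUID_RE.fullmatch(new_partuuid):
        raise ValueError(f"not an MBR-style PARTUUID: {new_partuuid!r}")
    # A function replacement avoids backslash/group-reference surprises in re.sub.
    return ROOT_PARTUUID_RE.subn(lambda m: m.group(1) + new_partuuid.lower(), grub_cfg_text)


def update_grub_cfg(path: str, new_partuuid: str) -> int:
    """Rewrite grub.cfg in place after saving a .bak copy; return the number of edits."""
    with open(path, encoding="utf-8") as f:
        new_text, count = replace_root_partuuid(f.read(), new_partuuid)
    if count == 0:
        raise ValueError(f"no root=PARTUUID= entry found in {path}")
    shutil.copy2(path, path + ".bak")  # keep the old config for rollback
    with open(path, "w", encoding="utf-8") as f:
        f.write(new_text)
    return count
```

Every menu entry (normal and failsafe) is rewritten, so they cannot drift apart after a restore.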

It's really odd that the corruption happened during backup creation. Time to try to revive the router.


This is very odd. Yesterday, as I described, I was able to recover the router using Macrium itself.

Today I created a new (full, not incremental) backup from it and restored that backup to the VM, and the same GRUB message appears.

I then used GParted to delete both partitions, but the message still appears. Where is it coming from?!

I guess this code is stored in the MBR itself. Or I'm going crazy and leaving something behind.

I'm gonna delete this virtual disk, create a new one, and see what happens.

Update 1: Indeed, with a clean virtual disk, VMware falls through to PXE boot. After restoring the latest backup, the GRUB message hang happens. If I restore the recovery backup over it, Grub2 boots again.

My bet is that something is wrong on my router, be it the NVMe controller, the SSD, or maybe Macrium's WinPE driver. Backups made on it store a corrupted MBR, and when these backups are restored the disk can no longer boot.

I'm gonna create a backup with Clonezilla to see how it behaves.

Update 2: This gets worse the deeper I investigate.

While creating the backup, Clonezilla reported that the SSD contains mismatched GPT and MBR partition tables. I know OpenWrt uses MBR, so I ran sgdisk -z /dev/nvme0n1 to destroy the GPT.
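For context on that warning: an MBR disk keeps its partition table in the first sector, while a GPT disk keeps a "protective" MBR there (an entry of type 0xEE) and its primary header, beginning with the signature "EFI PART", in the next sector. A minimal Python sketch of that check follows. It only looks at the primary GPT header, assumes 512-byte sectors, and the names are illustrative; it is not how Clonezilla or sgdisk actually detect the problem, just a way to see the same kind of mismatch.

```python
SECTOR_SIZE = 512       # assumption: 512-byte logical sectors
GPT_SIGNATURE = b"EFI PART"
PROTECTIVE_TYPE = 0xEE  # MBR partition type used by a GPT protective MBR


def mbr_partition_types(mbr: bytes) -> list[int]:
    """Type byte of each non-empty entry in the four-entry MBR partition table."""
    types = []
    for index in range(4):
        entry = mbr[446 + 16 * index : 446 + 16 * (index + 1)]
        if entry[4] != 0:  # byte 4 of each 16-byte entry is the partition type
            types.append(entry[4])
    return types


def classify_partition_table(first_two_sectors: bytes) -> str:
    """Return "gpt", "mbr", "mismatch" or "none" for the first two sectors of a disk."""
    mbr = first_two_sectors[:SECTOR_SIZE]
    has_mbr = mbr[510:512] == b"\x55\xaa"
    has_gpt_header = first_two_sectors[SECTOR_SIZE:SECTOR_SIZE + 8] == GPT_SIGNATURE
    protective = has_mbr and PROTECTIVE_TYPE in mbr_partition_types(mbr)
    if has_gpt_header and protective:
        return "gpt"
    if has_gpt_header or protective:
        # GPT header without a protective MBR, or a protective MBR without a header
        return "mismatch"
    return "mbr" if has_mbr else "none"


def classify_disk(path: str) -> str:
    """Classify a raw disk image file (a /dev node also works, but needs root)."""
    with open(path, "rb") as disk:
        return classify_partition_table(disk.read(2 * SECTOR_SIZE))
```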

The backup was made, but then when I rebooted, the same issue happened!!

When the PC was built, I installed Win10 on it to test that all the hardware worked. Could installing OpenWrt with dd have failed to wipe the GPT area and caused this?

During the backup it also reported that an inode was wrong; I didn't understand it and just let it fix the problem.

I then used Macrium to restore the recovery backup, and it booted again. I tried creating a Clonezilla backup again; this time no corruption was reported, and it still boots.

Any idea what might be happening?

Update 3: I was gonna test restoring the Clonezilla backup in VMware, but Clonezilla doesn't support restoring to a smaller drive, even if the destination drive has enough space and the rest of the source is just unpartitioned.

I created a new virtual disk and finally restored the Clonezilla backup, and it boots. It was restored to sdb, but the PARTUUID didn't change, so I didn't need to adapt grub.cfg or fstab.

Now both the Macrium and Clonezilla backups are tested and working. I still wanna test both on real hardware, but that takes a lot of time and is boring and frustrating, so I'm gonna postpone it. It's time to finish the setup and see it routing my LAN!


Make sure you perform the backup while the file system is offline.
This means you should boot another OS and back up unmounted partitions.
Otherwise the backed-up file system can be inconsistent and lose its integrity.
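A pre-flight check along these lines can confirm that nothing on the disk is mounted before imaging it. This is a minimal Python sketch, assuming a Linux rescue system with /proc/mounts and kernel-style device names; the function names are hypothetical. It compares device names literally, so mounts listed via /dev/root or /dev/disk/by-* symlinks would not be caught.

```python
import re


def mounted_sources(mounts_text: str) -> set[str]:
    """First field (the source device) of every line in /proc/mounts-style text."""
    return {line.split()[0] for line in mounts_text.splitlines() if line.strip()}


def mounted_partitions(disk: str, mounts_text: str) -> list[str]:
    """Mounted sources that are `disk` itself or one of its partitions."""
    # Kernel naming: names ending in a digit put a "p" before the partition
    # number (nvme0n1 -> nvme0n1p2); others don't (sda -> sda1).
    separator = "p" if disk[-1].isdigit() else ""
    pattern = re.compile(re.escape(disk) + r"(?:" + separator + r"\d+)?")
    return sorted(src for src in mounted_sources(mounts_text) if pattern.fullmatch(src))


def assert_safe_to_image(disk: str, mounts_path: str = "/proc/mounts") -> None:
    """Raise if any partition of `disk` is still mounted."""
    with open(mounts_path, encoding="utf-8") as f:
        busy = mounted_partitions(disk, f.read())
    if busy:
        raise RuntimeError(f"unmount these before backing up {disk}: {', '.join(busy)}")
```

For example, `assert_safe_to_image("/dev/nvme0n1")` would run just before starting the imaging tool.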


I always do. It seems Clonezilla makes sure the partition is unmounted.

I'm not sure, but I guess that dd (or the OpenWrt combined image) doesn't expect a GPT on the drive. Maybe when I deleted the Windows partition using GParted it left the drive as GPT, then dd wrote the MBR image, which left the GPT corrupted but still present.

Clonezilla detected it and refused to make the backup until it was fixed, and I hope it was. Macrium didn't, and made a broken backup that corrupts the drive it's restored to.

If that's it, maybe I should have converted the drive to MBR before dd-ing OpenWrt onto it.

All of this is an assumption, but it's the best explanation I have for now.
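The guess above is consistent with how GPT is laid out: besides the primary header near the start of the disk, GPT keeps a backup header in the very last sector. dd writes only as many bytes as the image contains, so an image much smaller than the SSD overwrites the start of the disk but never touches that last sector, leaving a stale backup GPT behind. A minimal Python sketch of the arithmetic plus a check for the leftover header follows; it assumes 512-byte sectors, the names are illustrative, and it does not prove this is what confused Macrium here.

```python
import os

SECTOR_SIZE = 512  # assumption: 512-byte logical sectors
GPT_SIGNATURE = b"EFI PART"


def backup_gpt_header_offset(disk_size: int, sector_size: int = SECTOR_SIZE) -> int:
    """Byte offset of the backup GPT header, which lives in the disk's last LBA."""
    return (disk_size // sector_size - 1) * sector_size


def dd_reaches_backup_gpt(disk_size: int, image_size: int, sector_size: int = SECTOR_SIZE) -> bool:
    """True if writing image_size bytes from offset 0 overwrites the backup GPT header."""
    return image_size > backup_gpt_header_offset(disk_size, sector_size)


def has_backup_gpt_header(path: str, sector_size: int = SECTOR_SIZE) -> bool:
    """Check the last sector of a disk or disk image for the GPT signature."""
    with open(path, "rb") as disk:
        size = disk.seek(0, os.SEEK_END)  # image files and Linux block devices report their size
        disk.seek(backup_gpt_header_offset(size, sector_size))
        return disk.read(len(GPT_SIGNATURE)) == GPT_SIGNATURE
```

With any realistic OpenWrt image on a multi-gigabyte SSD, `dd_reaches_backup_gpt` is false, which is why converting or zapping the old GPT before running dd avoids the leftover.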


I installed and configured a few more things and made another backup. Macrium restored it to the VM with no issues.

