Router booting from hard drive

I've noticed a peculiar issue recently on a few routers running the 23.X firmware. I have a 1900acs, a 3200acm, and a rockpro64, all with hard drives connected for NAS purposes. When one of them reboots due to maintenance, an internet outage (watchcat program), or a power outage, it will sometimes try to boot from the NAS hard drive and keep hammering the HDDs until I physically turn off the router and hard drives and power them back up. It happens about 1 in 20 times. The hard drives keep making the same noises until I intervene. My question is simply: how can I troubleshoot this issue if it is happening at bootup? All the hard drives are 6TB or larger and have a sturdy case with a dedicated power supply. It occurs on both USB 3.0 and USB 2.0 ports. The issue did not occur in any previous firmware. I'd say I started noticing it in 23.05.0 but wrote it off as a fluke until it happened twice this week.
Thanks,

I wonder if there is any relation to this:

Are they directly connected or through a USB hub? Do you have a powered USB hub to which you can connect them (unlike the other thread, I know you have external power supplies for those drives, but it would be curious if there is an issue that can be resolved with a hub).

The rockpro64 uses a powered USB-C hub, while the 1900acs and 3200acm are directly connected.

I should also specify that the 1900acs technically uses eSATA on the eSATA/USB 2.0 combo port. The rockpro64 uses one USB 3.0 port and the powered USB-C hub (3 drives altogether). The 3200acm uses one USB 2.0 and one USB 3.0 port.

I'd be curious: if you simply unplug the USB cable when the problem manifests and then plug it back in again, does that resolve it? The thinking here is that there may be a race condition or some other USB enumeration issue that can be resolved by physically affecting the USB bus.

I will try to recreate the issue now on my rockpro64 and get back to you. Thanks.

I simulated the issue, and unplugging the USB-C hub and the USB 3.0 drive didn't make a difference. I didn't have to actually power down the hard drives, but I did have to restart the rockpro64.

Does the problem occur only when the usb is connected at the time that the router boots, or does it happen even if the usb connection is made after booting?

I have a few other routers with the latest firmware that do not have USB connections, and they don't have this issue, another 3200acm included, so I'm guessing it only happens when the USB is connected at boot time. I can also hear the HDDs churning when the issue occurs... The 1900acs HDD has woken me up in the middle of the night trying to boot off of the HDD. All of the HDDs use btrfs with LUKS encryption.

What happens if you remove the respective mount stanzas from fstab?

I'm admittedly not great at fstab, so I do my best to avoid it and will answer to the best of my ability.

My /etc/fstab is empty by default:

:~# cat /etc/fstab
# <file system> <mount point> <type> <options> <dump> <pass>
root@

I use a hdd script in my /etc/rc.local file.

# cat /etc/config/hddboot
# Open each LUKS volume with its key file and mount it. Device names change
# between boots, so every key is tried against /dev/sda through /dev/sdc.
# -e is used instead of -f: mapper entries are device nodes, not regular
# files, so a -f test never matches and the opens are always retried.
if [ ! -e /dev/mapper/morty ]; then
    if cryptsetup luksOpen --key-file /etc/openvpn/morty /dev/sda morty; then
        mount /dev/mapper/morty /mnt/backup
    fi
    if cryptsetup luksOpen --key-file /etc/openvpn/morty /dev/sdb morty; then
        mount /dev/mapper/morty /mnt/backup
    fi
    if cryptsetup luksOpen --key-file /etc/openvpn/morty /dev/sdc morty; then
        mount /dev/mapper/morty /mnt/backup
    fi
fi
if [ ! -e /dev/mapper/8terry ]; then
    if cryptsetup luksOpen --key-file /etc/openvpn/8terry /dev/sda 8terry; then
        mount /dev/mapper/8terry /mnt/archive
    fi
    if cryptsetup luksOpen --key-file /etc/openvpn/8terry /dev/sdb 8terry; then
        mount /dev/mapper/8terry /mnt/archive
    fi
    if cryptsetup luksOpen --key-file /etc/openvpn/8terry /dev/sdc 8terry; then
        mount /dev/mapper/8terry /mnt/archive
    fi
fi
if [ ! -e /dev/mapper/carl ]; then
    if cryptsetup luksOpen --key-file /etc/openvpn/carl /dev/sda carl; then
        mount /dev/mapper/carl /mnt/usb
    fi
    if cryptsetup luksOpen --key-file /etc/openvpn/carl /dev/sdb carl; then
        mount /dev/mapper/carl /mnt/usb
    fi
    if cryptsetup luksOpen --key-file /etc/openvpn/carl /dev/sdc carl; then
        mount /dev/mapper/carl /mnt/usb
    fi
fi

Ok... so maybe try disabling that script (comment it out or remove it) and then reboot the router with the drive plugged in. Does the same issue occur, or does it boot smoothly without causing the drive to go nuts?

Assuming it is all good at that point, manually mount your drive after the system has completed booting. Does that work as expected?

Meanwhile, why do you have the 2 different commands repeated 3 times?

I will try it.

I can't predict which drive will be on /dev/sda through /dev/sdc. They have always booted up differently for years and the long if statement just tries every key on /dev/sda through /dev/sdc.

Ah... ok. So this would be a good place for fstab, because you can mount deterministically using UUIDs. I don't know how it interacts with LUKS disk encryption, though.
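One possible way to get the same determinism without fstab is to match on the UUID stored in each LUKS header. This is only a sketch, not a tested replacement for the script above: the function names `find_luks_dev` and `open_by_uuid` are made up for illustration, the UUID in the usage comment is a placeholder, and it assumes the drives still show up as /dev/sd?. `cryptsetup luksUUID <device>` prints the header UUID of a LUKS device.

```shell
#!/bin/sh
# Print the first candidate device whose LUKS header UUID matches $1.
# Usage: find_luks_dev <uuid> <device>...
find_luks_dev() {
    want="$1"; shift
    for dev in "$@"; do
        # luksUUID fails on devices that are missing or not LUKS; skip those
        got="$(cryptsetup luksUUID "$dev" 2>/dev/null)" || continue
        if [ "$got" = "$want" ]; then
            echo "$dev"
            return 0
        fi
    done
    return 1
}

# Open and mount one volume identified by its LUKS UUID.
# Usage: open_by_uuid <uuid> <mapper name> <key file> <mount point>
open_by_uuid() {
    uuid="$1"; name="$2"; key="$3"; mnt="$4"
    [ -e "/dev/mapper/$name" ] && return 0   # already open, nothing to do
    dev="$(find_luks_dev "$uuid" /dev/sd?)" || return 1
    cryptsetup luksOpen --key-file "$key" "$dev" "$name" &&
        mount "/dev/mapper/$name" "$mnt"
}

# Hypothetical use (replace <uuid> with the output of "cryptsetup luksUUID"):
# open_by_uuid <uuid> morty /etc/openvpn/morty /mnt/backup
```

With this approach each key is only ever offered to the drive it belongs to, so there are no failed luksOpen attempts against the wrong disks.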

Are we talking about old mechanical HDDs here, or SSDs?
If they are mechanical HDDs, how many hours does SMART say they have been operational?
Are you sure they aren't simply broken and worn out? That is what used to happen to old HDDs when they wore out: they stopped working at boot time and made a lot of noise while drawing their last breath…
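For reference, a rough way to read the operating hours, assuming the smartmontools package is installed and the drive reports the usual Power_On_Hours attribute. The helper name `power_on_hours` and the device /dev/sda are just examples, and some USB enclosures only pass SMART through with an extra option such as `-d sat`.

```shell
#!/bin/sh
# Read "smartctl -A" output on stdin and print the raw Power_On_Hours value
# (the tenth column of the attribute table). Prints nothing if it is absent.
power_on_hours() {
    awk '$2 == "Power_On_Hours" { print $10 }'
}

# Hypothetical use:
# smartctl -A /dev/sda | power_on_hours
```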

I will look into using UUIDs. It will reduce my script size, but I'm not sure that my /etc/rc.local file is even reached, since I can't ping my router at any point.
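One way to find out whether rc.local is reached at all would be to leave a marker on persistent storage before and after the HDD script runs, then read it back after the next hang. A minimal sketch, assuming /root is on flash and survives a reboot; the helper name `boot_mark` and the log path are made up for illustration, and the timestamps may be wrong until the clock has synced over NTP.

```shell
#!/bin/sh
# Append a timestamped marker line to a log file and flush it to storage.
# Usage: boot_mark <log file> <message>
boot_mark() {
    log="$1"; shift
    echo "$(date '+%Y-%m-%d %H:%M:%S') $*" >> "$log"
    sync   # flush so the line survives a hard power-off
}

# Hypothetical use in /etc/rc.local (the log path is an assumption):
# boot_mark /root/boot.log "rc.local reached"
# sh /etc/config/hddboot
# boot_mark /root/boot.log "hddboot finished"
```

If the first marker is missing after a hang, the boot stalled before rc.local; if only the second is missing, the script itself is the likely suspect.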

I will get the SMART output and get back to you, but it doesn't sound like the "click of death". It sounds like a hard drive initializing and trying to access files over and over, every 20 seconds or so.

What do you mean by this? Is the router not booting up properly?

rc.local will be the last thing run during the startup sequence. It is important to recognize that this doesn't mean that all previously launched tasks are actually complete; that is not guaranteed by the init.d process. init.d's sequencing only specifies the order in which things start, and it does not wait for the previous task to complete. That's why I think it is worth testing without the startup mounting scripts, so we can see if there might be a race condition.
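If a race like that turns out to be the cause, one common workaround is to have the script wait for the device node to appear before touching it, with a timeout so boot is never blocked indefinitely. This is only a sketch under that assumption: the name `wait_for_path` and the 30-second default are arbitrary choices, not anything OpenWrt provides.

```shell
#!/bin/sh
# Poll once per second until a path exists or the timeout expires.
# Usage: wait_for_path <path> [timeout in seconds, default 30]
# Returns 0 if the path appeared, 1 if the timeout ran out first.
wait_for_path() {
    path="$1"
    remaining="${2:-30}"
    while [ ! -e "$path" ]; do
        [ "$remaining" -le 0 ] && return 1
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}

# Hypothetical use before opening a drive (/dev/sda is an assumption):
# wait_for_path /dev/sda 30 || exit 1
```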

I do not think the router is booting up properly. They all seem to get hosed trying to do USB discovery, but these are only physical observations. I don't think I can recreate the issue without my hard drives connected. On the rockpro64 there is a BIOS I can tune, but the 1900acs and 3200acm don't have a BIOS I can look into.

Unless you have done something unusual with your setup, OpenWrt doesn't require any USB drives to be connected to boot. Even if you've done extroot or an overlay pivot to an external drive, the device should be able to boot without that drive connected.