Boot process spends ages enumerating block devices

Hi, forum.

I have a TP-Link Archer C2600 running a custom build (it calls itself r22658+49-2c530fcb97), that's giving me some trouble. Specifically, I can see, thanks to some debugging I've added to the hotplug machinery, that it's taking several seconds to process each block device early in boot, totaling around an unfortunate 90 seconds before it considers any other kind of hardware:

Sun May 14 15:35:00 2023 user.warn kernel: [   36.163578] hotplug-debug: BLOCK; env: ACTION=add DEVICENAME=loop1 DEVNAME=loop1 DEVPATH=/devices/virtual/block/loop1 DEVTYPE=disk DISKSEQ=2 HOME=/ HOTPLUG_TYPE=block LOGNAME=root MAJOR=7 MINOR=1 PATH=/usr/sbin:/usr/bin:/sbin:/bin PWD=/ SEQNUM=983 SHLVL=1 SUBSYSTEM=block SYNTH_UUID=0 TERM=linux USER=root
Sun May 14 15:35:00 2023 user.warn kernel: [   38.752012] hotplug-debug: BLOCK; env: ACTION=add DEVICENAME=mtdblock23 DEVNAME=mtdblock23 DEVPATH=/devices/platform/soc/1a200000.gsbi/1a280000.spi/spi_master/spi0/spi0.0/mtd/mtd23/mtdblock23 DEVTYPE=disk DISKSEQ=32 HOME=/ HOTPLUG_TYPE=block LOGNAME=root MAJOR=31 MINOR=23 PATH=/usr/sbin:/usr/bin:/sbin:/bin PWD=/ SEQNUM=984 SHLVL=1 SUBSYSTEM=block SYNTH_UUID=0 TERM=linux USER=root
Sun May 14 15:35:00 2023 user.warn kernel: [   41.303332] hotplug-debug: BLOCK; env: ACTION=add DEVICENAME=mtdblock13 DEVNAME=mtdblock13 DEVPATH=/devices/platform/soc/1a200000.gsbi/1a280000.spi/spi_master/spi0/spi0.0/mtd/mtd13/mtdblock13 DEVTYPE=disk DISKSEQ=22 HOME=/ HOTPLUG_TYPE=block LOGNAME=root MAJOR=31 MINOR=13 PATH=/usr/sbin:/usr/bin:/sbin:/bin PWD=/ SEQNUM=985 SHLVL=1 SUBSYSTEM=block SYNTH_UUID=0 TERM=linux USER=root
...
Sun May 14 15:35:00 2023 user.warn kernel: [  123.002933] hotplug-debug: BLOCK; env: ACTION=add DEVICENAME=mtdblock25 DEVNAME=mtdblock25 DEVPATH=/devices/platform/soc/1a200000.gsbi/1a280000.spi/spi_master/spi0/spi0.0/mtd/mtd25/mtdblock25 DEVTYPE=disk DISKSEQ=34 HOME=/ HOTPLUG_TYPE=block LOGNAME=root MAJOR=31 MINOR=25 PATH=/usr/sbin:/usr/bin:/sbin:/bin PWD=/ SEQNUM=1015 SHLVL=1 SUBSYSTEM=block SYNTH_UUID=0 TERM=linux USER=root
Sun May 14 15:35:00 2023 user.warn kernel: [  125.636408] hotplug-debug: I2C; env: ACTION=add DEVICENAME=i2c-0 DEVPATH=/devices/platform/soc/100f8800.usb3/10000000.dwc3/xhci-hcd.0.auto/usb1/1-1/1-1.4/1-1.4:1.0/i2c-0 HOME=/ HOTPLUG_TYPE=i2c LOGNAME=root PATH=/usr/sbin:/usr/bin:/sbin:/bin PWD=/ SEQNUM=1017 SHLVL=1 SUBSYSTEM=i2c TERM=linux USER=root

The only thing that's at all unusual about this machine is that it's got a USB hub attached to it with a USB flash drive, a Yubikey, and an I2C bridge to an RTC. All of that machinery seems to be working quite nicely, once the machine is booted, it just... takes far longer to boot than ideal. Suggestions?

If you remove the hub and extra devices, does it boot quickly?

1 Like

Sorry for the long delay in answering; with everyone working from home (and keeping in touch with various timezones) it's hard to find a good time to reboot the gateway. In any case, no, the boot is not appreciably faster without USB devices connected. Key events' timestamps from dmesg:

                                                                    With USB        Without USB
"VFS: Mounted root (squashfs filesystem) readonly on device 31:12"  [    2.916059]  [    2.916219]
"init: Console is alive"                                            [    4.384900]  [    3.839212]
"procd: - init -"                                                   [   22.979026]  [   23.350285]
First "hotplug-debug: BLOCK;"                                       [   28.154923]  [   28.872889]
Last "hotplug-debug: BLOCK;"                                        [  126.217021]  [  119.062529]
Last boot kernel message                                            [  162.880062]  [  158.774940]

What's the output of

ls -lR /etc/hotplug.d/

Big. Since I think you are mostly asking about which files exist and not their metadata, here's find /etc/hotplug.d -type f:

/etc/hotplug.d/block/00-debug-kmsg
/etc/hotplug.d/block/00-media-change
/etc/hotplug.d/block/10-mount
/etc/hotplug.d/dhcp/00-debug
/etc/hotplug.d/firmware/11-ath10k-caldata
/etc/hotplug.d/i2c/00-debug-kmsg
/etc/hotplug.d/i2c/99-bus
/etc/hotplug.d/i2c/99-rtc-probe-failed
/etc/hotplug.d/ieee80211/10-wifi-detect
/etc/hotplug.d/ieee80211/10_fix_wifi_mac
/etc/hotplug.d/iface/00-debug
/etc/hotplug.d/iface/00-netstate
/etc/hotplug.d/iface/11-sqm
/etc/hotplug.d/iface/15-teql
/etc/hotplug.d/iface/20-firewall
/etc/hotplug.d/iface/30-nlbwmon
/etc/hotplug.d/iface/70-pbr
/etc/hotplug.d/iface/95-ddns
/etc/hotplug.d/iface/99-nwf-tun-rt
/etc/hotplug.d/net/00-debug
/etc/hotplug.d/net/00-sysctl
/etc/hotplug.d/ntp/00-debug
/etc/hotplug.d/ntp/25-dnsmasqsec
/etc/hotplug.d/ntp/99-rtc0
/etc/hotplug.d/openvpn/01-user
/etc/hotplug.d/rtc/00-debug-kmsg
/etc/hotplug.d/rtc/99-read-time
/etc/hotplug.d/tty/40-nut_serial
/etc/hotplug.d/usb/00-debug-kmsg
/etc/hotplug.d/usb/30-libhid-ups
/etc/hotplug.d/usb/99-i2c-if
/etc/hotplug.d/usb/99-yubikey-entropy

Most of those are stock; the ones I've written are

etc/hotplug.d/block/00-debug-kmsg
etc/hotplug.d/dhcp/00-debug
etc/hotplug.d/i2c/00-debug-kmsg
etc/hotplug.d/i2c/99-bus
etc/hotplug.d/i2c/99-rtc-probe-failed
etc/hotplug.d/iface/00-debug
etc/hotplug.d/iface/99-nwf-tun-rt
etc/hotplug.d/net/00-debug
etc/hotplug.d/ntp/00-debug
etc/hotplug.d/ntp/99-rtc0
etc/hotplug.d/rtc/00-debug-kmsg
etc/hotplug.d/rtc/99-read-time
etc/hotplug.d/usb/00-debug-kmsg
etc/hotplug.d/usb/99-i2c-if
etc/hotplug.d/usb/99-yubikey-entropy

FWIW. /etc/hotplug-preinit.json and /etc/hotplug.json are stock.

What are the contents of

/etc/hotplug.d/block/00-debug-kmsg
/etc/hotplug.d/block/00-media-change
/etc/hotplug.d/block/10-mount

Perhaps one of those is causing the slow boot.

The first is custom and should be quite short to execute:

# cat /etc/hotplug.d/block/00-debug-kmsg
(echo -n 'hotplug-debug: BLOCK; env: ' ; env | sort | tr '\n' ' ') | dd iflag=fullblock 2>/dev/null >/dev/kmsg

The other two are stock:

# cat /etc/hotplug.d/block/00-media-change
[ -n "$DISK_MEDIA_CHANGE" ] && /sbin/block info

if [ "$ACTION" = "add" -a "$DEVTYPE" = "disk" ]; then
	case "$DEVNAME" in
		mtd*) : ;;
		*) echo 2000 > /sys/block/$DEVNAME/events_poll_msecs ;;
	esac
fi
# cat /etc/hotplug.d/block/10-mount
[ "$ACTION" = "add" -o "$ACTION" = "remove" ] && /sbin/block hotplug

Can you add the following to the top and bottom of each script and reboot.

logger "$0"

We should be able to see which script is the slow one.

No need for that. Poking around myself, /sbin/block hotplug is the culprit:

ACTION=add DEVNAME=mtdblock23 time /sbin/block hotplug
Command exited with non-zero status 255
real	0m 2.51s
user	0m 0.00s
sys	0m 2.49s

I'm afraid this is a case of accidentally quadratic behavior. Despite the intention of looking for things relevant to a single device ($DEVNAME), each invocation of /sbin/block hotplug probes all block devices:

ACTION=add DEVNAME=mtdblock23 strace -t -e file /sbin/block hotplug
...
22:17:00 open("/dev/", O_RDONLY|O_LARGEFILE|O_CLOEXEC|O_DIRECTORY) = 3
22:17:00 open("/sys/class/mtd/mtd0/type", O_RDONLY|O_LARGEFILE) = 3
22:17:00 statx(AT_FDCWD, "/dev/mtdblock0", AT_STATX_SYNC_AS_STAT, STATX_BASIC_STATS, {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFBLK|0600, stx_size=0, ...}) = 0
22:17:00 open("/dev/mtdblock0", O_RDONLY|O_LARGEFILE) = 3
22:17:00 open("/lib/libblkid.so", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
22:17:00 open("/usr/local/lib/libblkid.so", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
22:17:00 open("/usr/lib/libblkid.so", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
22:17:00 open("/lib/libblkid.so.1", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
22:17:00 open("/usr/local/lib/libblkid.so.1", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
22:17:00 open("/usr/lib/libblkid.so.1", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 3
22:17:00 statx(3, "", AT_STATX_SYNC_AS_STAT|AT_EMPTY_PATH, STATX_BASIC_STATS, {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFREG|0755, stx_size=197071, ...}) = 0
22:17:00 open("/dev/mtdblock0", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_CLOEXEC) = 3
22:17:00 statx(3, "", AT_STATX_SYNC_AS_STAT|AT_EMPTY_PATH, STATX_BASIC_STATS, {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFBLK|0600, stx_size=0, ...}) = 0
22:17:00 open("/sys/dev/block/31:0", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 4
22:17:00 openat(4, "dm/uuid", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
22:17:00 open("/sys/class/mtd/mtd1/type", O_RDONLY|O_LARGEFILE) = 3
22:17:00 statx(AT_FDCWD, "/dev/mtdblock1", AT_STATX_SYNC_AS_STAT, STATX_BASIC_STATS, {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFBLK|0600, stx_size=0, ...}) = 0
22:17:00 open("/dev/mtdblock1", O_RDONLY|O_LARGEFILE) = 3
...
22:17:00 open("/dev/mtdblock10", O_RDONLY|O_LARGEFILE) = 3
...
22:17:03 open("/dev/mtdblock9", O_RDONLY|O_LARGEFILE) = 3
...
22:17:03 open("/dev/loop7", O_RDONLY|O_LARGEFILE) = 3
...

This is all happening because of the logic at https://lxr.openwrt.org/source/fstools/block.c#L1256, which probes every block device it can think of, rather than using the device name it already has! I think that call could be replaced with something like

    char buf[PATH_MAX];
    snprintf(buf, sizeof(buf), "/dev/%s", device);
    _cache_load(buf);

to just populate the one device we know we're already looking for.

I've raised an issue report in what I hope is the right place.

1 Like