X86 sysupgrade not working

Hi all -

I've been struggling to get sysupgrade working on x86 and was hoping someone here could share some insight. Please read the details below...

Issue

When upgrading via either sysupgrade or LuCI, the OS appears to process the update and reboot, but on closer inspection the new image has not been installed. I've tried custom images from my own image builder as well as the public images on the OpenWrt mirror.

Steps to reproduce

Tested with both openwrt-18.06.2-x86-64-combined-ext4 & openwrt-18.06.2-x86-64-combined-squashfs

  1. Use image builder to produce VDI and gunzipped IMG files
  2. Use VDI as the boot disk for a VM on virtual box
  3. Boot VM, login either via SSH, local console, or Luci
  4. For the CLI, wget the image to /tmp and then run sysupgrade -v -n /tmp/openwrt-18.06.2-x86-64-combined-XXXX.img.gz
     For LuCI, upload the image through the UI, opt not to save settings, and then flash the image
  5. In both cases, the VM reboots after a few seconds
  6. In both cases, the OS does not update. I can tell because a build marker (a file in the overlay containing the epoch time of the build) is unchanged, and other overlay additions such as updated config files never appear on the filesystem.
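
The build-marker check in step 6 can be made repeatable with a couple of small helpers. This is only a minimal sketch: the marker path etc/build_marker and both function names are hypothetical, since the post does not say where the marker file lives.

```shell
#!/bin/sh
# Hypothetical marker location, relative to the image builder FILES overlay.
MARKER_REL="etc/build_marker"

# Write the current epoch time into <overlay_dir>/etc/build_marker.
write_build_marker() {
	overlay_dir="$1"
	mkdir -p "$(dirname "$overlay_dir/$MARKER_REL")" || return 1
	date +%s > "$overlay_dir/$MARKER_REL"
}

# Succeed only if the marker found under <root_dir> equals <expected>.
marker_matches() {
	root_dir="$1"
	expected="$2"
	[ -f "$root_dir/$MARKER_REL" ] || return 1
	[ "$(cat "$root_dir/$MARKER_REL")" = "$expected" ]
}
```

Run write_build_marker against the FILES directory before each build and note the value. After the upgrade and reboot, marker_matches / followed by that value tells you whether the new image is actually the one running.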

Config / Environment

Built using openwrt-imagebuilder-18.06.2-x86-64.Linux-x86_64 on Debian 8

Build config:

CONFIG_TARGET_IMAGES_PAD=y
CONFIG_TARGET_ROOTFS_PARTSIZE=64
CONFIG_VDI_IMAGES=y
PACKAGES="nano htop curl gzip"
PROFILE=Generic
FILES=someCustomOverlayPath

Overlay contents:

├── etc
│ ├── banner
│ ├── config
│ │ ├── firewall
│ │ ├── network
│ │ └── system
│ ├── dropbear
│ │ ├── authorized_keys
│ │ └── dropbear_rsa_host_key
│ ├── hosts
│ ├── opkg
│ │ └── distfeeds.conf
│ ├── rc.local
│ └── shadow

VM / Host Info:

Virtual Box Version 5.2.18 r124319
macOS 10.14.4 (18E226)
1x CPU, 128 MB RAM
boot disk is set up on an IDE controller (tried SATA as well)
dual ethernet interfaces (e1000)

Other

Any thoughts? :slight_smile: From what I've seen from a few posts on this site and the documentation, this method should work but no matter what I try, I seem to get the same results.

As a side note, I'll be trying to set up a serial console so I have some output independent of SSH or the VM console. The reboot is too quick to catch any meaningful output.

Here are the other threads and documentation that I've come across:

Cheers

Quick update, I did catch the last few lines of output via a screen recording. Nothing too informative, looks like all is well.

...2nd image...because new users can only post one :slight_smile:

@ScriptGiddy, welcome to the community!

Use SquashFS instead of EXT4 if you wish to use the sysupgrade feature.

@lleachii - thanks for the welcome!

Unfortunately I've tried combined SquashFS (base image + update image) and have gotten the same results. I'll build a clean VM right now just to confirm...

I also got serial working last night and was able to catch something interesting on the tail end of sysupgrade before the reboot:

/usr/bin/zcat: exec: line 51: gzip: not found

Is my ramdisk somehow missing the appropriate binaries to unzip the upgrade image?

I haven't looked at the source for sysupgrade, but I assume it's using zcat piped into dd or something similar to write the image...
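
A quick way to test the "missing binary" hypothesis is to list which of the tools involved cannot be resolved in the current PATH. This is a rough sketch; missing_tools is a hypothetical helper, not part of sysupgrade.

```shell
#!/bin/sh
# Print each named tool that cannot be resolved in the current PATH.
missing_tools() {
	for tool in "$@"; do
		command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
	done
}
```

Calling missing_tools zcat gzip dd on the running system only describes the normal root filesystem. To say anything about the upgrade ramdisk, the same check would have to run after the switch to the ramdisk, which is where the error above is printed.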

cheers

Also, for what it's worth, I'm guessing this isn't enabled by default on the builds produced at https://downloads.openwrt.org/releases/18.06.2/targets/x86/64/ since I can't write to the overlay (no space). The documentation you provided (thank you!) says

combined-squashfs.img.gz This disk image uses the traditional OpenWrt layout, a squashfs read-only root filesystem and a read-write partition where settings and packages you install are stored. Due to how this image is assembled, you will have only 230-ish MB of space

Is it safe to assume that the remaining space detailed above would be available to /overlay or am I misunderstanding?

Can you recommend any other diagnostics I can use to verify an image has been actually written and upgraded? I'd love to see if the stock images work so I can rule out something upstream from my custom image build setup.
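
One possible diagnostic is to compare the start of the target disk against the decompressed image, byte for byte. This is a sketch under assumptions: image_matches_target is a hypothetical helper, it expects gzip, head, and md5sum to be available, and it only gives a meaningful answer when the disk has not been modified since the write (for example when checked from a live CD before the first boot).

```shell
#!/bin/sh
# Succeed if the first bytes of <target> equal the decompressed <image.gz>.
image_matches_target() {
	image_gz="$1"
	target="$2"
	# Size of the decompressed image in bytes (tr strips padding from wc).
	size="$(gzip -dc "$image_gz" 2>/dev/null | wc -c | tr -d ' ')"
	[ "$size" -gt 0 ] || return 1
	want="$(gzip -dc "$image_gz" 2>/dev/null | md5sum | cut -d ' ' -f 1)"
	have="$(head -c "$size" "$target" | md5sum | cut -d ' ' -f 1)"
	[ "$want" = "$have" ]
}
```

A mismatch on a system that has already booted is not conclusive, because a writable filesystem changes as soon as it is mounted. A match right after flashing is good evidence that the image was written.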

For the record, these are the docs I'm following to produce the VDI file from the stock images:

got the full serial dump from the vm during sysupgrade (grub kept clearing my terminal so I had to be quick). hopefully there is a clue in here :slight_smile:

this is with openwrt-18.06.2-x86-64-combined-squashfs.img.gz stock

Image metadata not found
Reading partition table from bootdisk...

gzip: stdout: Broken pipe
Reading partition table from image...
Commencing upgrade. Closing all shell sessions.
killall: telnetd: no process killed
killall: can't kill pid 1639: No such process
Sending TERM to remaining processes ... logd dnsmasq netifd odhcpd ntpd ubusd askfirst 
Sending KILL to remaining processes ... 
Switching to ramdisk...
Performing system upgrade...
Reading partition table from bootdisk...
/usr/bin/zcat: exec: line 51: gzip: not found
0+0 records in[   43.365042] sd 0:0:0:0: [sda] Synchronizing SCSI cache

0+0 records out
Reading partition table from image...
Invalid partition table on /tmp/image.bs
[   43.615093] reboot: Restarting system
[   43.619757] reboot: machine restart
[   43.635266] ACPI MEMORY or I/O RESET_

As an alternative update method you could create a small 2nd partition with another OpenWrt instance and update the 1st instance from there.
opkg -o /root/sda3 list-upgradable | cut -f 1 -d ' ' |xargs opkg -o /root/sda3 upgrade --noaction
(remove the --noaction if command works properly)

I switch between partitions with something like this (depends on your environment though. I'm on GPT/EFI with 1st partition as vfat/efi):

#!/bin/sh
# Toggle the default GRUB entry between 0 and 1.
# Assumes grub.cfg lives on the vfat/EFI partition /dev/sda1.
mount -t vfat /dev/sda1 /boot || exit 1
if grep -q 'set default="0"' /boot/grub/grub.cfg; then
	sed -i -e 's/set default="0"/set default="1"/' /boot/grub/grub.cfg
elif grep -q 'set default="1"' /boot/grub/grub.cfg; then
	sed -i -e 's/set default="1"/set default="0"/' /boot/grub/grub.cfg
fi
# Show the resulting setting, then release the partition.
printf 'Grub set to: '
grep default /boot/grub/grub.cfg
umount /boot

The sysupgrade function on x86 works with both squashfs and ext4. The only differences are in factory reset and failsafe. On x86, there are enough alternative solutions for those purposes on ext4 builds.

I constantly sysupgrade my ext4 snapshot builds since the days of LEDE 17.0.

Judging from your serial output, it's all about a broken partition table.
Does your virtual disk have enough space for openwrt?
You can try -p switch (no restore of partition table) of sysupgrade after backing up your valuable data.
-n does not help here, since config files are restored only after a successful flash. The option has no effect when the flash itself fails.
Or even further, you can try dd'ing the complete img into your virtual disk as a starting point.

@DazzyWalkman

Thanks for the input...responses below...

The sysupgrade function on x86 works with both squashfs and ext4. The only differences are in factory reset and failsafe. On x86, there are enough alternative solutions for those purposes on ext4 builds. I constantly sysupgrade my ext4 snapshot builds since the days of LEDE 17.0.

good to know! glad someone out there has had some success. out of curiosity, are you still using older LEDE builds or have you had success now that things have merged back into OpenWrt?

Does your virtual disk have enough space for openwrt?

Yes, it should. I'm using the same build for both the initial VDI and the img.gz upgrade file - in theory, the partitions are 1:1.

I've also tried this with the stock images provided by OpenWRT (both squash and ext).

You can try -p switch (no restore of partition table) of sysupgrade after backing up your valuable data.

thanks, gave that a try! same results as the previously posted serial output.

you can try dd'ing the complete img into your virtual disk as a starting point

good idea. I booted an Ubuntu live CD on the VM and used dd to write a source img to the disk. That works without any issue.
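
For reference, the dd approach can be sketched roughly like this. write_image is a hypothetical helper, and the target device name depends entirely on the environment, so double-check it before running anything like this.

```shell
#!/bin/sh
# Decompress <image.gz> and write it to <target> (a block device or a file).
# WARNING: this overwrites <target> starting at offset 0.
write_image() {
	image_gz="$1"
	target="$2"
	[ -r "$image_gz" ] || { echo "cannot read $image_gz" >&2; return 1; }
	# bs is given in bytes (1 MiB) so the same line works across dd variants.
	gzip -dc "$image_gz" | dd of="$target" bs=1048576 2>/dev/null || return 1
	sync
}
```

From a live CD this might be called as write_image openwrt-18.06.2-x86-64-combined-squashfs.img.gz /dev/sda, assuming /dev/sda really is the VM's boot disk.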

cheers

@HectoPascal

cool - thanks for sharing! if I'm reading this right... that would upgrade the opkg packages, but not necessarily any custom files on an overlay... right?

either way, it's a handy tip!

Huh, okay...this is exciting :slight_smile: I've found the issue.

Before diving into the details - can anyone share the best path for me to file a bug?

As I mentioned earlier, I suspected that gzip wasn't making it to the ramdisk, so I did some digging into how the ramdisk is assembled and found that gzip was indeed missing from the list of binaries copied over in /lib/upgrade/stage2.

Here is the relevant function, note how gzip is missing from the list of binaries! :slight_smile:

switch_to_ramfs() {
	for binary in \
		/bin/busybox /bin/ash /bin/sh /bin/mount /bin/umount	\
		pivot_root mount_root reboot sync kill sleep		\
		md5sum hexdump cat zcat bzcat dd tar			\
		ls basename find cp mv rm mkdir rmdir mknod touch chmod \
		'[' printf wc grep awk sed cut				\
		mtd partx losetup mkfs.ext4				\
		ubiupdatevol ubiattach ubiblock ubiformat		\
		ubidetach ubirsvol ubirmvol ubimkvol			\
		snapshot snapshot_tool					\
		$RAMFS_COPY_BIN
	do

after adding gzip to this list, sysupgrade worked perfectly!
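
A rough sketch of that change as a script, for anyone who wants to try it. add_gzip_to_ramfs_list is a hypothetical helper, and it assumes the list line still contains "zcat bzcat" exactly as shown above; other releases may lay the list out differently.

```shell
#!/bin/sh
# Insert "gzip" after "zcat" in the binary list of <stage2_file>.
# Does nothing if gzip is already listed next to zcat.
add_gzip_to_ramfs_list() {
	stage2_file="$1"
	[ -f "$stage2_file" ] || return 1
	grep -q ' zcat gzip ' "$stage2_file" && return 0
	sed 's/ zcat bzcat / zcat gzip bzcat /' "$stage2_file" > "$stage2_file.new" || return 1
	# Copy the contents back so the original file keeps its permissions.
	cat "$stage2_file.new" > "$stage2_file" && rm -f "$stage2_file.new"
	# Fail loudly if the expected pattern was not found.
	grep -q ' zcat gzip ' "$stage2_file"
}
```

On a running system this would be add_gzip_to_ramfs_list /lib/upgrade/stage2. The loop above also copies whatever $RAMFS_COPY_BIN names, so setting that variable may be another route, though that is untested here.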

I've patched my personal images, so I'm happy... but this is definitely broken in the mainline OpenWrt x86 stock builds of 18.06.2. Again, any guidance on getting a bug filed would be much appreciated!

oh, and @DazzyWalkman - this worked for ext images...so that's super cool. thanks again for the tip.

cheers

flyspray

@anomeome thanks :+1:

Congrats on your findings.
I follow the snapshot code, compiling and flashing constantly in a 'rolling update' fashion, from r2xx to the present r1xxxx.
I prefer ext4 builds on x86.
In the early days, dd was occasionally needed to recover from a botched sysupgrade, but not much in the last year or two, as long as I umount all unnecessary partitions (local or remote) before running sysupgrade.

I wouldn't use that method on an overlayfs system. It would end up in a mixed files hell (kernel<>kmods) that would render the system unstable unless you strictly don't update any system files on the overlays.

What's the progress on this issue? I've run into the same problem and it seems it still isn't solved. The bug report website shows "Unconfirmed" in the state column.

This bug still exists today, unfortunately.