This is a placeholder thread for reporting issues when upgrading using the LuCI Attended Sysupgrade app. The app itself will point to this thread as the place to report issues (it currently says you should report issues on the ASU github, where they go unnoticed).
I am having an issue with what I think is a stale build, and I can't get the expected ASU customized image.
When the ASU server was broken yesterday, one of my systems was at position 80 in the queue, but I had closed the window and came back to it later today. Now, when I click on the request build button, it doesn't show the build process. Rather, after a few seconds, it jumps to the "Successfully created firmware image" modal. But, the download link doesn't work (page not found) -- I'm guessing that the file was already purged from the ASU server. But, I don't seem to be able to get it to actually build a new image. I restarted the device, but that didn't change the situation.
Any ideas how I can force it to actually build via ASU? Would you like me to share the link?
Same issue here as @psherman. Requested an image on April 20th, which was about 2500 in the queue. After a few hours I closed the window, and now when I go back and request again it immediately says the image has been built but the image download page cannot be found (I assume it was purged). Similarly, I can't force it to build a fresh image via ASU. Any ideas on how to work around this?
I've just searched for a firmware upgrade, but the SHA-256 reported doesn't match what's shown at the official firmware selector.
The attended sysupgrade output:
Oh... is this because the attended sysupgrade includes layered packages?
Oh 2... I also see there's an update for attended sysupgrade I hadn't installed yet, but even after updating and rebooting, sysupgrade is still defaulting to the firmware with a different hash.
OK, I managed to force the creation of a new firmware image using ASU by simply adding a new package to my existing OpenWrt instance (something basic, like some extra statistics using collectd). That forces a new build.
For me, it turns out to be no help because my BPI R3 needs a fresh install, as I'd previously changed the NAND flash layout (otherwise OpenWrt wouldn't take advantage of the extra storage in an R3). But for others, hopefully this is an easy workaround.
Yes, that's almost certainly the cause as the symptoms are consistent. The ASU server has the job results still lingering in its database, pointing to the build artifacts, but the build artifacts have been deleted...
The job results are subject to a time-to-live value, and Paul just changed those as part of the "clean out sooner" changes yesterday. Previously, successful builds were kept around for 24 hours; now they are kept for 3. The TTL is stored in the job metadata, and this new default was probably set after you requested your build, so your job will most likely stick around for the full 24h period.
How to work around this:

1. Wait 24h from the previous build and try again after the job has expired from the ASU server's database.
2. Since jobs are identified by the hash key of the build request data (including version, target, platform, package names, etc.), you can change the build request somehow and start a new job. The simplest way to change the job id in the LuCI app is by adding or deleting a package. This is sort of intrusive and ugly, but since LuCI ASU is pretty limited in its options, this is about the best you can do. (Firmware Selector and owut both allow you to add an "init script" to the build request; setting this to a script that does nothing, e.g. `#xyz`, is one easy way to change the hash key and force a new build.) See the sketch below for how the hash key idea plays out.
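To make the "hash key of the build request" idea concrete, here's a minimal Python sketch of how a request-derived job id behaves. The field names and hashing details are my own illustration, not the actual ASU server code, but the principle is the same: any change to the request (an extra package, a no-op init script) produces a different id, and therefore a fresh build.

```python
import hashlib
import json

def request_hash(req: dict) -> str:
    # Hash a normalized (sorted-key) JSON form of the request; purely illustrative.
    canonical = json.dumps(req, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()[:32]

base = {
    "version": "24.10.1",             # hypothetical request fields
    "target": "mediatek/filogic",
    "profile": "bananapi_bpi-r3",
    "packages": ["luci", "luci-app-attendedsysupgrade"],
}
tweaked = dict(base, packages=base["packages"] + ["collectd"])

print(request_hash(base))     # original job id
print(request_hash(tweaked))  # different id, so the server starts a new build
```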
Correct. The new SHA is the result of building a new image containing all the extra packages you've installed, and hence should not match the SHA on the firmware selector or download sites.
If you can tolerate running upgrades from the CLI, owut gives you access to the ROOTFS_PARTSIZE option of the imagebuilder, so you can increase it up to 1GB... (Details in "owut: expanding root file system"; send any questions to "Owut: OpenWrt Upgrade Tool".)
I ran ASU again today and it worked without issue. So it does seem like it was the persistence of the hash despite the actual image file having been purged. But since it's been over 24 hours, things are sync'd again.
I think when Paul saw the disk was full on the server, he just whacked all the artifacts, without regard for whether there were any jobs still looking for them.
With the current code, each job expires at 24h (well, 3h now), and a daily cron job looks for artifacts over 24h old and deletes them, so things shouldn't normally get out of sync... But, it does leave a lot of junk on disk well past its expiration date.
I've been running an experimental "janitor" for some months that sits inside the server, and every 15m or so runs through the artifacts to see whether the job that created each one still exists, deleting the artifacts when the job is gone. Same ultimate result, but maybe a bit more aggressive in that artifacts are removed much sooner but still safely, hopefully averting the "disk full" that seems to pop up every time we do a release and the thundering herd requests builds all at once. (https://github.com/openwrt/asu/pull/1370)
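For anyone curious what that janitor amounts to, here's a minimal sketch of the loop. The paths and the job lookup are placeholders I've invented to keep it self-contained; the real implementation is in the PR above.

```python
import shutil
import time
from pathlib import Path

STORE_DIR = Path("/srv/asu/store")  # hypothetical artifact location
INTERVAL = 15 * 60                  # the ~15 minute cadence mentioned above


def live_job_ids() -> set[str]:
    """Placeholder: return the ids of jobs still present in the job database.
    The real janitor asks the server's job store; this stub keeps the sketch runnable."""
    return set()


def janitor_pass() -> None:
    if not STORE_DIR.is_dir():
        return
    live = live_job_ids()
    # Assume each build's artifacts sit in a directory named after its job id.
    for artifact_dir in (p for p in STORE_DIR.iterdir() if p.is_dir()):
        if artifact_dir.name not in live:
            # The job has expired from the database, so its artifacts are orphans.
            shutil.rmtree(artifact_dir)


if __name__ == "__main__":
    while True:
        janitor_pass()
        time.sleep(INTERVAL)
```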
Could the janitor job be adaptive based on available disk space? For example, if it sees that the disk space is approaching some critical quota, it could start clearing out the oldest builds on a much more aggressive schedule.
And, for that matter, does the server have (or could it be set up to collect) statistics about the average and standard deviation of times from job-complete-to-latest-download, so that there could be some data-driven decision making about how aggressive the schedule should be, balancing disk space vs processing load? We don't want to be too eager to delete a build if people are downloading it (again) later (say an hour or two), but at the same time, if the average user is downloading within a really short time (for example, 5 minutes), we could go average + 3 std dev and probably keep a good balance of disk vs processor.
Thoughts? (disclaimer: I'm not a software engineer, nor do I have any insight into the backend of the sysupgrade server, so I have no idea how complex this would be to code into the existing environment)
Oh, I like the idea of using disk space as the metric. The janitor probably can't be made more aggressive without shortening the TTL on builds even further, but using disk space to do rate limiting could solve the same problem. (The lag between a job expiring out of the database and its artifacts being removed is only 10 minutes with that pending janitor PR, so it's pretty much the TTL on the jobs that dictates the schedule.)
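As a rough sketch of the "clear out the oldest builds when disk usage crosses a quota" idea suggested above (all paths and thresholds here are made up, not taken from the server):

```python
import shutil
from pathlib import Path

STORE_DIR = Path("/srv/asu/store")  # hypothetical artifact location
HIGH_WATER = 0.90                   # start pruning above 90% disk usage


def disk_usage_fraction(path: Path) -> float:
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def prune_oldest_until_below(threshold: float) -> None:
    # Oldest artifact directories first, by modification time.
    dirs = sorted((p for p in STORE_DIR.iterdir() if p.is_dir()),
                  key=lambda p: p.stat().st_mtime)
    for artifact_dir in dirs:
        if disk_usage_fraction(STORE_DIR) < threshold:
            break
        shutil.rmtree(artifact_dir)


if __name__ == "__main__":
    if disk_usage_fraction(STORE_DIR) >= HIGH_WATER:
        prune_oldest_until_below(HIGH_WATER)
```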
The server does log build duration for each job, but it's doing it wrong (dur = now - time-imagebuilder-created), so the data is unusable right now. The logs contain timestamps for both build initiation and every time the results are downloaded (dammit, it's doing that wrong, too!!! It logs both HEAD and GET requests).
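Once that logging is fixed, computing the retention window described above (average + 3 std dev of the job-complete-to-last-download lag) is trivial. Here's a sketch with made-up samples, just to show the arithmetic, not real server data:

```python
from statistics import mean, stdev

# Hypothetical samples: minutes from "build finished" to the last download of that build.
download_lags_min = [2.0, 3.5, 1.0, 4.2, 2.8, 60.0, 3.1, 2.2, 5.0, 1.7]

# Keep artifacts for average + 3 standard deviations of the observed lag,
# with a floor so the window never gets absurdly short.
keep_minutes = max(30, mean(download_lags_min) + 3 * stdev(download_lags_min))
print(f"retain artifacts for ~{keep_minutes:.0f} minutes")
```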
I did measure clearance rates for the day we had 2000+ entries in the queue, though, and with the server setup (number of "worker" instances) at that time, it was running at about 5.1 jobs/min (so a backlog that size takes roughly 6-7 hours to clear).
One issue I foresee in adjusting TTL as jobs are added is that it's a (very) lagging indicator. By the time you need to drop the TTL to clear out old jobs, the old jobs that need a shorter TTL have already been assigned a long one... Not sure how much work it would be to scan the jobs and reset their TTL (or if it's even possible). There is a way to outright kill jobs; not sure how to find out their remaining TTL, but maybe that would be the way to go.
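On finding the remaining TTL: assuming the jobs are rq jobs stored in Redis under the usual key layout (an assumption on my part, not something I've verified against the server), the remaining TTL can be read straight off the job key:

```python
from redis import Redis  # redis-py

r = Redis()  # connection details are site-specific

def remaining_ttl_seconds(job_id: str) -> int:
    # rq normally stores each job under an "rq:job:<id>" hash; TTL is in seconds,
    # -1 if no expiry is set, -2 if the key does not exist.
    return r.ttl(f"rq:job:{job_id}")

print(remaining_ttl_seconds("0123456789abcdef"))  # hypothetical job id
```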
Hello, quick question because I couldn't find the answer in the manual: both LuCI ASU and auc on the CLI tell me that I'm on the latest version (23.05.5) despite 24.10.1 being available for download for my router, but when I enter:
`auc -b 24.10`
...then it offers me to upgrade to 24.10.1.
Is this expected behavior? Should I upgrade with the -b flag?
auc, owut and the LuCI ASU app are all written to stay on a given release, so you have to state explicitly that you want to jump the boundary. Prior to some features added over the last couple of years, the ASU clients were not very good at crossing release boundaries: you'd end up with errors when there were package changes and things like that. It wasn't until a couple of months ago that the LuCI app was enhanced to be on par with the other clients, so now they are all capable of this.
Yes, `auc -b 24.10` should do the job for you. This assumes there aren't major changes on your device, like the swconfig -> DSA conversion, for which you'll want to reconfigure from scratch.