This is a placeholder thread for reporting issues when upgrading using the LuCI Attended Sysupgrade app. The app itself will point to this thread as the place to report issues (it currently says you should report issues on the ASU github, where they go unnoticed).
I am having an issue with what I think is a stale build, and I can't get the expected ASU customized image.
When the ASU server was broken yesterday, one of my systems was at position 80 in the queue, but I had closed the window and came back to it later today. Now, when I click on the request build button, it doesn't show the build process. Rather, after a few seconds, it jumps to the "Successfully created firmware image" modal. But, the download link doesn't work (page not found) -- I'm guessing that the file was already purged from the ASU server. But, I don't seem to be able to get it to actually build a new image. I restarted the device, but that didn't change the situation.
Any ideas how I can force it to actually build via ASU? Would you like me to share the link?
Same issue here as @psherman. Requested an image on April 20th, which was about 2500 in the queue. After a few hours I closed the window, and now when I go back and request again it immediately says the image has been built but the image download page cannot be found (I assume it was purged). Similarly, I can't force it to build a fresh image via ASU. Any ideas on how to work around this?
I've just searched for a firmware upgrade, but the SHA-256 reported doesn't match what's shown at the official firmware selector.
The attended sysupgrade output:
Oh... is this because the attended sysupgrade includes layered packages?
Oh 2... I also see there's an update for attended sysupgrade I hadn't installed yet, but even after updating and rebooting, sysupgrade is still defaulting to the firmware with a different hash.
OK, I managed to force the creation of a new firmware image using ASU by simply adding a new package to my existing OpenWrt instance (something basic, like some extra statistics using collectd). That forces a new build.
For me, it turns out to be no help because my BPI R3 needs a fresh install, as I'd previously changed the NAND flash layout (otherwise OpenWrt wouldn't take advantage of the extra storage in an R3). But for others, hopefully this is an easy workaround.
Yes, that's almost certainly the cause as the symptoms are consistent. The ASU server has the job results still lingering in its database, pointing to the build artifacts, but the build artifacts have been deleted...
The job results are subject to a time-to-live value, and Paul just changed those as part of the "clean out sooner" changes yesterday. Previously, successful builds were kept around for 24 hours; now they are kept for 3. The TTL is stored in the job metadata, and this new default was probably set after you requested your build, so your job will most likely stick around for the full 24h period.
How to work around this:

1. Wait 24h from the previous build and try again after the job has expired from the ASU server's database.
2. Since jobs are identified by the hash key of the build request data (including version, target, platform, package names, etc.), you can change the build request somehow and start a new job. The simplest way to change the job id in the LuCI app is by adding or deleting a package. This is sort of intrusive and ugly, but since LuCI ASU is pretty limited in its options, this is about the best you can do. (Firmware Selector and owut both allow you to add an "init script" to the build request; setting this to a script that does nothing, e.g. `#xyz`, is one easy way to change the hash key and force a new build.) See the sketch below for how the hash key idea plays out.
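To make the "hash key of the build request" idea concrete, here's a minimal Python sketch of how a request-derived job id behaves. The field names and hashing details are my own illustration, not the actual ASU server code, but the principle is the same: any change to the request (an extra package, a no-op init script) produces a different id, and therefore a fresh build.

```python
import hashlib
import json

def request_hash(req: dict) -> str:
    # Hash a normalized (sorted-key) JSON form of the request; purely illustrative.
    canonical = json.dumps(req, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()[:32]

base = {
    "version": "24.10.1",             # hypothetical request fields
    "target": "mediatek/filogic",
    "profile": "bananapi_bpi-r3",
    "packages": ["luci", "luci-app-attendedsysupgrade"],
}
tweaked = dict(base, packages=base["packages"] + ["collectd"])

print(request_hash(base))     # original job id
print(request_hash(tweaked))  # different id, so the server starts a new build
```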
Correct. The new SHA is the result of building a new image containing all the extra packages you've installed, and hence should not match the SHA on the firmware selector or download sites.
If you can tolerate running upgrades from the CLI, owut gives you access to the ROOTFS_PARTSIZE option of the imagebuilder, so you can increase it up to 1GB... (Details in "owut: expanding root file system"; send any questions to "Owut: OpenWrt Upgrade Tool".)
I ran ASU again today and it worked without issue. So it does seem like it was the persistence of the hash despite the actual image file having been purged. But since it's been over 24 hours, things are sync'd again.
I think when Paul saw the disk was full on the server, he just whacked all the artifacts, without regard for whether there were any jobs still looking for them.
With the current code, each job expires at 24h (well, 3h now), and a daily cron job looks for artifacts over 24h old and deletes them, so things shouldn't normally get out of sync... But, it does leave a lot of junk on disk well past its expiration date.
I've been running an experimental "janitor" for some months that sits inside the server, and every 15m or so runs through the artifacts to see whether the job that created each one still exists, deleting the artifacts when the job is gone. Same ultimate result, but maybe a bit more aggressive in that artifacts are removed much sooner but still safely, hopefully averting the "disk full" that seems to pop up every time we do a release and the thundering herd requests builds all at once. (https://github.com/openwrt/asu/pull/1370)
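For anyone curious what that janitor amounts to, here's a minimal sketch of the loop. The paths and the job lookup are placeholders I've invented to keep it self-contained; the real implementation is in the PR above.

```python
import shutil
import time
from pathlib import Path

STORE_DIR = Path("/srv/asu/store")  # hypothetical artifact location
INTERVAL = 15 * 60                  # the ~15 minute cadence mentioned above


def live_job_ids() -> set[str]:
    """Placeholder: return the ids of jobs still present in the job database.
    The real janitor asks the server's job store; this stub keeps the sketch runnable."""
    return set()


def janitor_pass() -> None:
    if not STORE_DIR.is_dir():
        return
    live = live_job_ids()
    # Assume each build's artifacts sit in a directory named after its job id.
    for artifact_dir in (p for p in STORE_DIR.iterdir() if p.is_dir()):
        if artifact_dir.name not in live:
            # The job has expired from the database, so its artifacts are orphans.
            shutil.rmtree(artifact_dir)


if __name__ == "__main__":
    while True:
        janitor_pass()
        time.sleep(INTERVAL)
```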
Could the janitor job be adaptive based on available disk space? For example, if it sees that the disk space is approaching some critical quota, it could start clearing out the oldest builds on a much more aggressive schedule.
And, for that matter, does the server have (or could it be set up to collect) statistics about the average and standard deviation of times from job-complete-to-latest-download, so that there could be some data-driven decision making about how aggressive the schedule should be, balancing disk space vs processing load? We don't want to be too eager to delete a build if people are downloading it (again) later (say an hour or two), but at the same time, if the average user is downloading within a really short time (for example, 5 minutes), we could go average + 3 std dev and probably keep a good balance of disk vs processor.
Thoughts? (disclaimer: I'm not a software engineer, nor do I have any insight into the backend of the sysupgrade server, so I have no idea how complex this would be to code into the existing environment)
Oh, I like the idea of using disk space as the metric. The janitor probably can't be made more aggressive without shortening the TTL on builds even further, but using disk space to do rate limiting could solve the same problem. (The lag between a job expiring out of the database and its artifacts being removed is only 10 minutes with that pending janitor PR, so it's pretty much the TTL on the jobs that dictates the schedule.)
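As a rough sketch of the "clear out the oldest builds when disk usage crosses a quota" idea suggested above (all paths and thresholds here are made up, not taken from the server):

```python
import shutil
from pathlib import Path

STORE_DIR = Path("/srv/asu/store")  # hypothetical artifact location
HIGH_WATER = 0.90                   # start pruning above 90% disk usage


def disk_usage_fraction(path: Path) -> float:
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def prune_oldest_until_below(threshold: float) -> None:
    # Oldest artifact directories first, by modification time.
    dirs = sorted((p for p in STORE_DIR.iterdir() if p.is_dir()),
                  key=lambda p: p.stat().st_mtime)
    for artifact_dir in dirs:
        if disk_usage_fraction(STORE_DIR) < threshold:
            break
        shutil.rmtree(artifact_dir)


if __name__ == "__main__":
    if disk_usage_fraction(STORE_DIR) >= HIGH_WATER:
        prune_oldest_until_below(HIGH_WATER)
```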
The server does log build duration for each job, but it's doing it wrong (dur = now - time-imagebuilder-created), so the data is unusable right now. The logs contain timestamps for both build initiation and every time the results are downloaded (dammit, it's doing that wrong, too!!! It logs both HEAD and GET requests).
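Once that logging is fixed, computing the retention window described above (average + 3 std dev of the job-complete-to-last-download lag) is trivial. Here's a sketch with made-up samples, just to show the arithmetic, not real server data:

```python
from statistics import mean, stdev

# Hypothetical samples: minutes from "build finished" to the last download of that build.
download_lags_min = [2.0, 3.5, 1.0, 4.2, 2.8, 60.0, 3.1, 2.2, 5.0, 1.7]

# Keep artifacts for average + 3 standard deviations of the observed lag,
# with a floor so the window never gets absurdly short.
keep_minutes = max(30, mean(download_lags_min) + 3 * stdev(download_lags_min))
print(f"retain artifacts for ~{keep_minutes:.0f} minutes")
```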
I did measure clearance rates for the day we had 2000+ entries in the queue, though, and with the server setup (number of "worker" instances) at that time, it was running at about 5.1 jobs/min (so a backlog that size takes roughly 6-7 hours to clear).
One issue I foresee in adjusting TTL as jobs are added is that it's a (very) lagging indicator. By the time you need to drop the TTL to clear out old jobs, the old jobs that need a shorter TTL have already been assigned a long one... Not sure how much work it would be to scan the jobs and reset their TTL (or if it's even possible). There is a way to outright kill jobs; not sure how to find out their remaining TTL, but maybe that would be the way to go.
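On finding the remaining TTL: assuming the jobs are rq jobs stored in Redis under the usual key layout (an assumption on my part, not something I've verified against the server), the remaining TTL can be read straight off the job key:

```python
from redis import Redis  # redis-py

r = Redis()  # connection details are site-specific

def remaining_ttl_seconds(job_id: str) -> int:
    # rq normally stores each job under an "rq:job:<id>" hash; TTL is in seconds,
    # -1 if no expiry is set, -2 if the key does not exist.
    return r.ttl(f"rq:job:{job_id}")

print(remaining_ttl_seconds("0123456789abcdef"))  # hypothetical job id
```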
Hello, quick question because I couldn't find the answer in the manual: both LuCI ASU and auc on the CLI tell me that I'm on the latest version (23.05.5) despite 24.10.1 being available for download for my router, but when I enter:
`auc -b 24.10`
...then it offers me to upgrade to 24.10.1.
Is this expected behavior? Should I upgrade with the -b flag?
auc, owut and the LuCI ASU app are all written to stay on a given release, so you have to state explicitly that you want to jump the boundary. Prior to some features added over the last couple of years, the ASU clients were not very good at crossing release boundaries: you'd end up with errors when there were package changes and things like that. It wasn't until a couple of months ago that the LuCI app was enhanced to be on par with the other clients, so now they are all capable of this.
Yes, `auc -b 24.10` should do the job for you. This assumes there aren't major changes on your device, like the swconfig -> DSA conversion, for which you'll want to reconfigure from scratch.