Making blind opkg upgrade safer

stazthebox · October 27, 2021, 4:08am

Hello everyone,
I know that searching the forums for "opkg upgrade" provides one with a plethora of posts explaining why you should never blindly upgrade all packages and I fully understand the dangers of upgrading everything...in a normal distro. However, almost all of those posts have the asterisk attached of "unless you know what you are doing" which is what I am trying to get at here. I'd like to know a little bit more about why blind upgrades are so dangerous. I've been experimenting a lot with the buildroot, and making my own mirrors, and I have noticed the nonshared, hold, and essential flags, though I haven't interacted much with them. I assume that they would prevent the upgrading of such packages without a --force flag of some kind. If that's the case, why is it so ill-advised to have a crontab running, and if I was to host my own package mirror, what would I have to do to make it safer?

Thanks in advance.

slh · October 27, 2021, 4:52am

The "if you know what you're doing" implies not doing anything blindly, but rather surgically restricting oneself to leaf packages that 'need an upgrade'.

While the details have been laid out numerous times, let's summarize some main points:

you can't upgrade kernel or libc anyways, as they're not packaged - so quite major components can't be upgraded either way.
most devices don't have enough space for major upgrades.
upgrading base-files could lead to a disaster (not restricted to this package).
the closer you get to core components necessary during early boot, the more funny overlay issues get.
opkg as a package manager isn't capable enough for proper dependency management.
ABI tracking (SONAME, symbol versioning, etc.) and packaging are not quite sufficient for the task, libraries are typically neither versioned, nor co-installable for different SONAMEs.

If you want hands-on experience of what will go wrong, upgrade a ~6 weeks old master snapshot and see what happens. Spoiler alert, basically everything will crash and burn (old musl 1.1.x libc present in the snapshot, newer packages built against the musl 1.2.x libc and its ABI incompatible 64-bit time_t).
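To make the time_t point a bit more concrete, here is a rough illustration of what happens when one side of a call writes a seconds/nanoseconds pair with a 64-bit seconds field and the other side still reads it with the old 32-bit layout. This is not musl's actual struct definition; the two layouts and the function names are assumptions chosen for illustration, and real padding differs per architecture.

```python
import struct

# Hypothetical layouts for a (seconds, nanoseconds) pair on a 32-bit,
# little-endian target. Real struct definitions and padding differ.
OLD_LAYOUT = "<ii"  # 32-bit seconds, 32-bit nanoseconds
NEW_LAYOUT = "<qi"  # 64-bit seconds, 32-bit nanoseconds


def write_new(seconds, nanoseconds):
    """Serialize the pair the way code built for a 64-bit time_t would."""
    return struct.pack(NEW_LAYOUT, seconds, nanoseconds)


def read_old(buf):
    """Read the pair the way code built for a 32-bit time_t would."""
    return struct.unpack_from(OLD_LAYOUT, buf)


buf = write_new(1635307680, 500)
old_view = read_old(buf)
# The old reader interprets the upper half of the 64-bit seconds field
# as the nanoseconds value, so the real nanoseconds are silently lost.
print(old_view)
```

Nothing crashes at the point of the mismatch; the two sides simply disagree about where the fields are, which is why mixing packages built against different libc ABIs fails in confusing ways.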

A strange game. The only winning move is not to play. How about a nice game of chess?

slh · October 27, 2021, 5:22am

OpenWrt is designed and built with severely resource constrained devices in mind, that comes with consequences.

Devices with 8-16 MB flash and 64-128 MB RAM still make up the bulk of systems running OpenWrt. On most devices with 8 MB flash, you might get 7.5 MB usable space at most - a default OpenWrt image already weighs around 6.5 MB (and further limits in terms of partitioning, bootloader constraints, fixed offsets and similar are common). NAND devices are usually partitioned in rather special ways (e.g. dual-boot capable), leaving only a fraction of their raw NAND size usable (often between 40-100 MB at most). Even high-end devices that easily cost half a thousand bucks rarely exceed 256 MB (raw) storage.
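As a back-of-the-envelope sketch of the 8 MB case, using the rough figures from this post: on squashfs-style images an upgraded package is written to the writable overlay while the original copy stays in the read-only image, so every upgrade eats into the remaining headroom. The package names and sizes below are made up for illustration.

```python
# Rough figures from the post, converted to KiB; treat them as estimates.
USABLE_KIB = 7680         # about 7.5 MB usable on an 8 MB flash device
DEFAULT_IMAGE_KIB = 6656  # about 6.5 MB default image

# Hypothetical installed sizes of packages someone wants to upgrade.
upgrade_sizes_kib = {"pkg-a": 300, "pkg-b": 450, "pkg-c": 400}


def headroom_kib(usable_kib, image_kib):
    """Space left over once the base image is flashed."""
    return usable_kib - image_kib


def fits_in_overlay(sizes_kib, free_kib):
    """True if all upgraded copies fit into the remaining space."""
    return sum(sizes_kib.values()) <= free_kib


free = headroom_kib(USABLE_KIB, DEFAULT_IMAGE_KIB)
print(free, fits_in_overlay(upgrade_sizes_kib, free))
```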

For devices like these, OpenWrt is pretty much the only option, the only reason why you can boot a non-vendor firmware on these. While it's modular, it's still a 'firmware' - not a general purpose distribution. If you want those, look at debian, fedora, arch, gentoo, SuSE, mageia, etc.

anon50098793 · October 27, 2021, 5:40am

in addition to the thorough explanations by slh...

on the point above... this is particularly relevant when it comes to the ecosystem and software development lifecycle/s...

all the knobs and whistles needed to support "blind/in-place" automated upgrades boil down to a huge, ongoing cost in man-hours, which grows exponentially with the number of packages supported

pinning packages to a state in the git tree (branch to package repo ref) is a sane, manageable and light way for the project to keep its critical resources (core maintainers) free to focus on what they do best.

and the broader community (package developers) to assume all the hours of work and complications that come with extended lifecycles and interoperability...

(check samba/ksmbd issues and pr's over the last 16 months to see just how much burden one package can place on developer time... multiply that by 10 000 and imagine doing that work by yourself)

stazthebox · October 27, 2021, 6:50am

The perspectives you all laid out here are very helpful as I look to understand how these intricacies work and how some decisions were made. The OpenWRT project is honestly probably one of the favorite things I've been able to work with, and I have almost never had to fight the system.

This is probably one of the most helpful things I've seen to understand why the OpenWRT mirrors work the way they do, and I completely agree with you.

From what I understand, this has to do with managing dependency versioning, which makes sense. So, theoretically, it should be relatively safe for me to only upgrade packages that are not marked as essential or on hold, and that the dependencies of these packages at compile time have the same version numbers as what is currently installed.

I understand opkg can't actually accomplish that, but would this principle prevent most of the issues you all have seen when people blindly upgrade?
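The principle described in this post could be sketched roughly as follows. This is only a model of the idea, not something opkg does or exposes: the dictionaries, the `built_against` field and the package names are all hypothetical, and opkg does not record which dependency versions a package was compiled against.

```python
# Hypothetical view of what is installed on the device.
installed = {
    "base-files": {"version": "1", "essential": True, "hold": False},
    "libfoo": {"version": "1.0", "essential": False, "hold": False},
    "app-a": {"version": "1.0", "essential": False, "hold": False},
    "app-b": {"version": "2.0", "essential": False, "hold": False},
    "held-app": {"version": "1.0", "essential": False, "hold": True},
}

# Hypothetical view of the mirror, including the dependency versions
# each new build was compiled against.
available = {
    "base-files": {"version": "2", "built_against": {}},
    "app-a": {"version": "1.1", "built_against": {"libfoo": "1.0"}},
    "app-b": {"version": "2.1", "built_against": {"libfoo": "1.1"}},
    "held-app": {"version": "1.1", "built_against": {}},
}


def safe_candidates(installed, available):
    """Names that are upgradable, not essential, not held, and built
    against exactly the dependency versions currently installed."""
    candidates = []
    for name, new in sorted(available.items()):
        current = installed.get(name)
        if current is None or current["version"] == new["version"]:
            continue  # not installed, or nothing to upgrade
        if current["essential"] or current["hold"]:
            continue
        deps_match = all(
            dep in installed and installed[dep]["version"] == version
            for dep, version in new["built_against"].items()
        )
        if deps_match:
            candidates.append(name)
    return candidates


print(safe_candidates(installed, available))
```

Note that this filter only looks downwards. It says nothing about reverse dependencies (what else breaks if a library itself is upgraded), free space, or upgrade scripts, so it narrows the risk but does not remove it.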

anon50098793 · October 27, 2021, 6:55am

so the above impacts 'snapshot/master' very differently to 'stable/release-branches'

master = something can break from one day/revision to the next
master = there can be huge package 'variation', which extends to dependencies, day by day
master = core changes can and often do break packages

(grain of salt... given the scope of change, these occurrences are few, but need to be overstated in this context)

release = updated packages are typically 'backported/pushed/otherwise-branch specifically modded' (see samba) when it is beneficial / required
release = core does not really change(much), so packages are mostly responsible for breakages

so your statements are generally true IF we are discussing the stable/release branches

(and when core stuff changes, typically there will be a new (point)-release or it is done in the preparation for it)

doesn't really work like that (although that's the theory behind their inception)

all these flags are really used for under the surface is:

to identify and usually not touch some core stuff

if we started walking back up the dependency tree, we get back into the opkg limitations and management overhead constraints, so in practice the full extent/capability of these flags is not utilized

stazthebox · October 27, 2021, 7:01pm

I understand that opkg isn't actually able to walk a dependency tree around like that, but just to solidify my understanding, if it could do it like I described, it would work, right?

vgaetera · October 27, 2021, 7:19pm

That's just the beginning, you will need to solve more problems like those:

bobafetthotmail · October 28, 2021, 9:04am

You can work around opkg limitations if you implement the package management and dependency tree logic in Luci (and then give direct commands to opkg), since modern Luci implements a lot of the logic as javascript that is run in the client (which is the user PC).
And that's assuming that the flags are used correctly in the packages so you can actually know what to do.

But imho, also because of disk space limitations (a lot of "blind updates" would just fill the small rw partition) and the fact that some parts of the system must be updated through a firmware upgrade anyway (kernel, libc, busybox, other core libs), the best solution is to use the "attended-sysupgrade" packages (on the device) and upstream server infrastructure https://github.com/aparcar/asu

The "attended-sysupgrade" software allows the device to contact a server (either the generic one using normal OpenWrt packages or one you run yourself if you have your own custom firmware/packages), tell it the packages it has installed, and the server will run the imagebuilder to generate a new firmware image with all packages required and send it down when it's ready.

Then the device does a sysupgrade. Since the image has all the same packages included, and the sysupgrade preserves configuration, everything should be fine after the reboot, no need to reinstall packages or do other complex/tedious operations like with normal sysupgrade using the default OpenWrt image. And also no problem with doing "blind upgrades" either, nor wasting space in the smaller rw partition.
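As a loose sketch of the "tell the server what is installed" step: the device collects its target, profile, version and package list and sends that description to the build server. The field names and example values below are assumptions for illustration only and may not match the real request format; check the asu repository for the actual API before relying on any of this.

```python
import json


def build_request(target, profile, version, installed_packages):
    """Assemble the description a device would send to a build server.

    The keys used here are hypothetical, not the documented ASU schema.
    """
    return {
        "target": target,
        "profile": profile,
        "version": version,
        # De-duplicate and sort so identical package sets produce
        # identical requests.
        "packages": sorted(set(installed_packages)),
    }


# Example values only.
payload = build_request("x86/64", "generic", "21.02.0", ["luci", "tcpdump", "luci"])
body = json.dumps(payload, indent=2)
print(body)
```

The server side then runs the imagebuilder with that package list, as described above, and the device flashes the resulting image with a normal sysupgrade.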

I personally think this should become more prominent and the "recommended" way to update your firmware in OpenWrt.

stazthebox · October 28, 2021, 6:33pm

Hmm, I haven't heard about attended-sysupgrade before, that's intriguing. Unfortunately, I don't think it will really help me that much. My intention is to deploy OpenWRT enabled x86 machines to locations I don't have physical access to, and I'm under the impression that running sysupgrade remotely is more dangerous than running opkg upgrade.

What I am trying to figure out here is how I can have my own opkg mirror and what packages I can upgrade in that mirror before I start breaking things. I have some of my own personal leaf packages I intend on auto upgrading as well. I am starting to feel like I should just avoid merging in version upgrades unless critical issues or security vulnerabilities are identified.

grrr2 · October 28, 2021, 7:13pm

but should we not have two different approaches here: one for embedded devices and one for x86? for the former, all the points raised (limited storage, overlay fs mechanics etc etc) justify the "no package upgrades" recommendation. but on the other hand, on an x86 box storage is hardly an issue.

although i do find one thing a bit confusing: if the recommendation is not to do package upgrades, for valid reasons, how does backporting work? what does it even mean? I mean, doesn't backporting mean that a newer version of a package is built against a particular release, which includes the same libc, kernel etc as when it was released? so - in theory - a new package version could safely work with a given release environment? if that's not true, why do package updates exist, why not do a monthly minor release roll-out with all package updates included?
yes, in the snapshot branch things are changing, the core environment is changing, that's the purpose of it, so yes, ABI changes happen and packages might not be able to catch up, but the release branch is fixed. so by locking the core component packages, i.e. all the packages which are installed by a vanilla release, any user package could be kept fresh. ok, i can already hear the reply: maintaining a snapshot and a release (or multiple releases) version of a package multiplies the effort; but the user base is mostly using the stable branch, and the snapshot branch changes frequently, so what's the point of updating user packages in snapshot too? I think the snapshot branch should be kept for next core development and not mixed with user packages.

the other confusing bit is the "x86 storage is not an issue" statement. well, it is actually. 21.02 shrank to 100M from 256M, there are many questions about how to expand root space, e.g. overlayfs is not working. or does. or not, depending on which thread you read. adding additional drive space does not increase your root fs. the guide is outdated imho. the idea of doing sysupgrade just to update a package in x86 world would be cumbersome, see upgrade section on the link.

anyhow, i understand it is not easy due to the many factors and resource constraints, and really appreciate the work of all devs and package maintainers. so, thank you anyhow, it is still the best foss project.

bobafetthotmail · October 28, 2021, 7:15pm

I just told you it is safer and why.
With a sysupgrade you have near-zero chances of a package upgrade screwing up the system as the "package install" phase is done by the build server and can't be interrupted leaving you in a broken state because you are updating system components that are in use by the package update itself.
On the end device the update is done while all services are stopped and the system is rebooted afterwards so all services are restarted to use the new files, again limiting breakage and instability.

Just think about it: how are most embedded devices updated? Do they give you piecemeal updates, or do they give you a fully independent firmware you upgrade whole?

Either way you decide to go (sysupgrade or piecemeal packages), if you MUST ensure that bad upgrades don't happen, I highly recommend getting a local x86 machine identical to the ones in the field to be your guinea pig and testing your updates on it first.

OpenWrt does not have the manpower and equipment to provide the same level of testing and validation that commercial server-grade Linux OSes like RHEL or SLES provide, or even of community-powered server-grade distros like Debian, so if you need strong validation you probably need to DIY it.

stazthebox · October 28, 2021, 7:47pm

I understand that sysupgrade doesn't have the core package dependency hell issue that opkg upgrade has, but can I be confident that it will work reliably? I feel like it's a bit of a blunt approach and overwriting the entire disk seems to be more susceptible to outside factors, the obvious one being a power outage...

bobafetthotmail · October 28, 2021, 8:09pm

Why should OpenWrt try to compete with Ubuntu, Debian, RedHat and so on?
If you want to handle your device like a PC/server you have literally dozens of pre-made and high-quality Linux distros that can do it, and have done it for decades.

OpenWrt targets embedded devices, and was expanded to work on x86 while still treating them as embedded devices. Because some people wanted a firmware-like system.

The issue is not compiling the package, or the API changes, but testing many different environments and making sure all the package update procedures don't break them.

Then you get other limitations of Opkg like that its check for free space either isn't accurate or does not work (I'm not sure there is one).

The recommendation comes from experience of many years of OpenWrt and forum.
It's supposed to work and update fine, but quite a few times people have encountered bugs or things that were not properly planned for, and the system breaks and needs manual intervention.

Oh boy don't get me started. Sometimes I think some core devs are not thinking straight

this one decides to move from 256 to 128 to "save space" by shrinking an image that is gzipped anyway by default (????)

github.com/openwrt/openwrt

build: set TARGET_ROOTFS_PARTSIZE to make combined image fit in 128MB

committed 11:48AM - 21 Sep 19 UTC

neocturne

+1 -1

Change TARGET_ROOTFS_PARTSIZE from 128 to 104 MiB, so the whole image (bootloader + boot + root) will fit on a 128MB CF card by default. With these settings, the generated images (tested on x86-generic and x86-64) have 126,353,408 bytes; the smallest CF card marketed as "128MB" that I found a datasheet for (a Transcend TS128MCF80) has 126,959,616 bytes. Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>

And this one shrunk it even further so that it can fit on the "smallest possible CF card". Yes that matters for someone in 2019.
(????????)
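For reference, the numbers quoted in that commit message work out as follows. This is just arithmetic on the figures given there; what exactly makes up the non-rootfs remainder is not broken down in the message, so it is only shown as a total.

```python
MIB = 1024 * 1024

rootfs_partsize = 104 * MIB  # new TARGET_ROOTFS_PARTSIZE
image_bytes = 126_353_408    # generated combined image
card_bytes = 126_959_616     # card capacity quoted from the datasheet

# Space left on the card once the image is written.
headroom_bytes = card_bytes - image_bytes
# Part of the image that is not the root filesystem partition.
non_rootfs_bytes = image_bytes - rootfs_partsize

print(headroom_bytes / 1024, "KiB spare on the card")
print(non_rootfs_bytes / MIB, "MiB of the image outside the rootfs partition")
```

So the image fits on that card with well under 1 MiB to spare, which is why the root partition ended up at 104 MiB rather than a rounder number.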

I tried to make the partition enlarging process better https://github.com/openwrt/openwrt/pull/1669
but as you see I got rejected

There is a PR that sidesteps the above nonsense bs by just resizing the overlay partition to use all the free space it finds on the device. You know, like Raspberry Pi and all other devices running a normal OS do.

github.com/openwrt/openwrt

x86: grow rootfs_data partition before format

openwrt:main ← luizluca:grow_rootfspart

opened 04:40AM - 23 Jan 21 UTC

luizluca

+361 -0

The non-default f2fs-tools can expand a f2fs only when not in use. /overlay is only unmounted while in failsafe, which also prevents the use of any installed packages. Today, the easier way is to resize the partition after installation is resizing the partition and use failsafe to reformat the /overlay, requiring physical device access. And, after an upgrade, the partition is reverted. This patch adds a grow_rootfspart shellscript partitioner that expands the seconds MBR or GPT partition until the end of the device. It can run directly to an img file or from the installed device. It only requires standard Linux commands already available on default x86 images. After the partition resizes, it will refresh partition information in kernel. For GPT, it will always touch the disk at least once as the current generated images do not use all the available space and it also misses the GPT backup header at the end of the disk. preinit/43_grow_rootfspart will run it when it does not detect an f2fs/ext4 filesystem at the overlay position and only when not in failsafe. That condition is also met during the first boot after an upgrade. mount_root will format rootfs_data alter that. Signed-off-by: Luiz Angelo Daros de Luca <luizluca@gmail.com>

Is it going to get merged? Does not look like there is any rush to.

if you treat it like a monolithic firmware, not really.

But yeeah if you want to install packages and then do a sysupgrade then you need to reinstall everything and it's a pain

bobafetthotmail · October 28, 2021, 8:20pm

As long as the device isn't starved for RAM (as it loads the firmware to a ramdisk before flashing), yeah. Sysupgrade is a much simpler thing and there isn't much that can go wrong. I've been doing sysupgrades for more than 5 years on different devices, never failed if there was enough RAM.
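Since the image is loaded into RAM before flashing, a pre-flight check along these lines could be run before kicking off a remote upgrade. It is a minimal sketch: it parses the `MemAvailable` line of `/proc/meminfo`, and the safety margin of 2x the image size is an arbitrary assumption, not a documented sysupgrade requirement.

```python
def mem_available_kib(meminfo_text):
    """Extract the MemAvailable value (in kB) from /proc/meminfo text."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1])
    raise ValueError("MemAvailable not found")


def enough_ram_for_image(meminfo_text, image_bytes, margin=2.0):
    """True if available RAM covers the image size times a safety margin.

    The margin is a guess for illustration, not a known requirement.
    """
    needed_kib = image_bytes * margin / 1024
    return mem_available_kib(meminfo_text) >= needed_kib


# Sample text in the /proc/meminfo format; the values are made up.
sample = "MemTotal:         126976 kB\nMemFree:           40000 kB\nMemAvailable:      65536 kB\n"
print(enough_ram_for_image(sample, 16 * 1024 * 1024))
print(enough_ram_for_image(sample, 64 * 1024 * 1024))
```

On a real device the text would come from reading `/proc/meminfo` instead of the sample string.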

And that's for routers, on any x86 you always have enough ram.

a power outage while updating system packages isn't much better. We all agree that updating secondary packages on their own is fine, the issue here is when you start updating the more important things like system libraries or core applications like busybox

Consider the size of the OpenWrt firmware and the speed of the storage drive. This isn't Windows Updates, a sysupgrade is going to be over in less than 10 seconds unless your custom packages are huge for some reason.

grrr2 · October 28, 2021, 9:02pm

you see, this is where i'd challenge this: the assumption is that a release (e.g. 21.02) is a frozen environment with a frozen kernel, frozen libc, frozen ABI, right, and when it is launched the user (= non-core) packages are launched as well. later, if there is an update to a user package, then it should work against that exact same frozen environment. so no need to test many different environments (beyond what anyhow should/would be done: testing against the various hw platforms).

my point is that it would probably be better to treat core components as part of a "firmware" and other, non-core packages as additions to that particular firmware. then any update to a user package should work against that very same "firmware".

and no, i dont want another linux distro, my point to have two approaches is purely about the fact that embedded devices are very limited so i understand why "no-update" policy is recommended, but on x86 ram, storage is not an issue, and luckily configs are per platform so for example 100MB root on x86 seems a strange decision (we agree on this).

[ my work life experience is that we tend to assume the only best way to do things is how we are used to doing things. which is obviously not true, sometimes it is worth finding different angles. or challenging the status quo; and also it does not mean that every time someone comes with a "better" idea it is actually better. so please grant me the right to make ignorant comments. ]

bobafetthotmail · October 28, 2021, 9:51pm

By different environments I meant different selections of installed packages. Many things receive only limited testing against default minimal image on a few different architectures. Doing proper testing with all packages requires a lot more work and does not happen often if at all.

Plus there is the fact that opkg has to be simple and light, and that the developers who work on it may or may not be able (or care enough) to make it comparable to the vastly more advanced package managers used in Linux distros.

For example, the Zypper package manager used in OpenSUSE (what I have here at the moment) is vastly better on every metric but is also nearly 3 MB, which is huge for OpenWrt standards. (Opkg is around 150KB plus some libraries)

The current method is just to make a branch from Master, let it "stabilize" for a while and after no more people report issues it's deemed ready enough and pushed as a release.

Then only security backports and minor things are done to this branch, until it is eventually retired.

This should theoretically work. But in practice eh.

The "no update" policy is a recommendation in the forum and the wiki because we have seen over the years that a lot of people that did this have hosed their system. It is not a conscious choice from the developers.
In theory, as long as you do have free space, the system should not break when updating packages inside a stable release.

Yeah, and that's why I'm talking about attended-sysupgrades and maybe we should start treating OpenWrt like a monolithic firmware where you assemble stuff on a PC or a server and just do a firmware upgrade on the device.
Opkg and packages and updates is how we used to do things, but it has blown up in so many different ways in many people's faces and actually fixing this would require so much more work on the package maintainer side and opkg development that maybe it is not the best way forward.

In general, OpenWrt does not have a strong leadership or very strict roadmap. This is by design. Each developer works on what they like more and so on, so the evolution is organic. So in most cases the choice we have is not "what is best" but "what has worked better so far".

anon50098793 · October 29, 2021, 1:51am

running untested is a no-no...

i've delivered approx 35 x 10 upgrades = 350 ( conservative by half ) over approx 13 months...

failure rate is less than 2% and in these cases it was due to some crap i'd done... so you need a few locations where someone is onsite to test after your initial test...

there is a reason turris et. al. have beta updates... for your use case... i'd recommend avoiding attended-sysupgrade at least for 12-24 months...

stazthebox · October 29, 2021, 5:30am

Hmm... didn't realize that sysupgrade is such a viable choice. I guess I'll look into using both? Sysupgrade for the core components, and then opkg upgrade the leaf packages that I have to deploy very quickly.
https://c.tenor.com/3yTRibbSqIEAAAAd/both-the-road-to-el-dorado.gif
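For the opkg half of that plan, one way to keep an automated job away from core packages is an explicit allowlist of leaf packages. The sketch below assumes `opkg list-upgradable` prints one `name - old version - new version` line per package; the package names and versions in the sample are made up, and the function only builds the command rather than running it.

```python
def parse_upgradable(output):
    """Map package name to (old, new) from 'name - old - new' lines."""
    upgrades = {}
    for line in output.splitlines():
        parts = line.split(" - ")
        if len(parts) == 3:
            name, old, new = (part.strip() for part in parts)
            upgrades[name] = (old, new)
    return upgrades


def upgrade_command(output, allowlist):
    """Build an 'opkg upgrade' argument list limited to allowlisted names.

    Returns an empty list when nothing on the allowlist is upgradable.
    """
    names = sorted(name for name in parse_upgradable(output) if name in allowlist)
    return ["opkg", "upgrade", *names] if names else []


# Hypothetical sample output and allowlist.
sample_output = (
    "busybox - 1.0-1 - 1.0-2\n"
    "my-leaf-app - 0.3-1 - 0.4-1\n"
    "my-other-app - 2.0-1 - 2.1-1\n"
)
allowlist = {"my-leaf-app", "my-other-app"}
print(upgrade_command(sample_output, allowlist))
```

Anything not on the allowlist, such as busybox in the sample, is left for the next sysupgrade image.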

Genuinely didn't know anything about how stable sysupgrade is. I'll thoroughly test out how well it works and if there are any quirks with the hardware I am using, but if it seems to be stable, I agree it's the best solution. Is there anyone else here who has a lot of experience with sysupgrade?

I definitely wouldn't deploy any changes into the world without testing them first locally. Luckily, at any point in time I always have access to several of these servers, and my deployment is very flat, so I am completely okay with providing all the servers with an image builder file.

bobafetthotmail · October 29, 2021, 1:58pm

Sounds good, that way you should get the best of both worlds.

For my home devices I like to live on the edge with the latest features so those run on snapshot, I just integrate all packages I need in a new image every month or so, and sysupgrade. I've never had upgrade problems even with a very volatile environment like snapshot where stuff is added and changed all the time. Doing the same with opkg on a running system (even an x86 device with no space limitations) would simply not work.
(not to say I've never had any issue, sometimes the newer version of some application is just broken on its own and I need to roll back. But such is life on the edge.)