Mvebu: target upgrade borked

Something that entered master after

./scripts/getver.sh r13974
68ac3f2cddab8422d7de0ce1a78d23edf29012e7

is stopping sysupgrade from working. There is just dead air on an upgrade attempt: no indication of any malfunction, just a return to the currently running image.

Likely the switch to DSA. Try an upgrade without keeping settings.

edit:
Might be this one: https://git.openwrt.org/?p=openwrt/openwrt.git;a=commit;h=494f12c52df6767ec0fabf2b2fac8f453323a4c5

No, I am long past the DSA change; I have been running it since the PR was being worked on. I am wondering about the latest series of commits that try to stop people from keeping their settings across changes of that ilk.

Ya, that stuff, but it should not stop me from going from one build to the next after that commit. And I have tried a couple of them now.

Ya, so timing is everything; I got caught in the middle. That stuff needs to output a message rather than fail silently. It is going to cause some grief down the road.

We should probably ask @adrianschmutzler how that is actually supposed to play out in sysupgrade.

There is an error message "echoed" in https://github.com/openwrt/openwrt/commit/ad3e1f9db4cffaec6700d780308fb6241c09a96f, but if that goes to the log and the device is then rebooted, it may get lost. Or, as the error is not sent to the error log (right?), it may not be easily visible. That might be happening to you.

(Not sure if that error is visible e.g. in LuCI. Haven't tried yet with my WRT3200ACM.)

The GUI is silent, as is a window on a serial connection. I was not running logread -f, but I would guess the GUI would be the main stumbling block for most people.

I suspect you are facing a message about incompatibility on a device that has recently been moved to DSA. I've tried to write a manual for this here:
https://openwrt.org/docs/guide-user/installation/generic.sysupgrade#upgrade_compatibility

If your device has been migrated to DSA already before the latest upgrade, just force the upgrade with -F and without -n, i.e. keep settings. Afterwards you will need to manually bump the compat_version in /etc/config/system to 1.1. If you still have swconfig on a device that's been migrated to DSA, flash with -F -n.
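The two cases just described can be summarized as a small decision helper. This is only a sketch: `upgrade_plan` is a hypothetical function, not part of sysupgrade, and it merely prints the steps for a given image path and config type. The uci lines it prints are meant to be run after the router has rebooted into the new image; the image path in the usage line is a placeholder.

```shell
#!/bin/sh
# Hypothetical helper (not part of OpenWrt): prints the upgrade steps for a
# device whose target has moved to DSA, following the advice in this thread.
#   $1 = path to the sysupgrade image
#   $2 = "dsa" if the running config is already DSA, "swconfig" otherwise
upgrade_plan() {
	local image="$1" config="$2"
	case "$config" in
		dsa)
			# Force, but keep settings (no -n) ...
			echo "sysupgrade -F $image"
			# ... then, after the reboot into the new image, bump the compat version
			echo "uci set system.@system[0].compat_version=1.1"
			echo "uci commit system"
			;;
		swconfig)
			# swconfig settings cannot be migrated: force and wipe settings
			echo "sysupgrade -F -n $image"
			;;
		*)
			echo "unknown config type: $config" >&2
			return 1
			;;
	esac
}

upgrade_plan /tmp/openwrt-sysupgrade.bin dsa
```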

There is supposed to be a message output, can you provide detail about what you did?

I tested sysupgrading a WRT3200ACM in LuCI from DSA-aware r13951-6c57fb7aa9 (of 2020-07-26, before your compat check changes) to r14035-3d167ed805 with the new sysupgrade check logic.
Worked ok.
I received a warning in LuCI and needed to force the upgrade, just as you intended in 02d6ac1060b4.

EDIT:
Console sysupgrade also warns properly, and forcing works:

root@router3:/tmp# sysupgrade wrt3200acm-master-r14035-3d167ed805-20200805-2239-sqfs-sysupgrade.bin
Device linksys,wrt3200acm not supported by this image
Supported devices: Image version 1.1 incompatible to device: Config cannot be migrated from swconfig to DSA
Image check failed.
...
root@router3:/tmp# sysupgrade -F wrt3200acm-master-r14035-3d167ed805-20200805-2239-sqfs-sysupgrade.bin
Device linksys,wrt3200acm not supported by this image
Supported devices: Image version 1.1 incompatible to device: Config cannot be migrated from swconfig to DSA
Image check failed but --force given - will update anyway!
Saving config files...
Commencing upgrade. Closing all shell sessions.
...

Thanks, so the warning/hint is also displayed in LuCI, though not exactly in a bold way.

It's a bit unfortunate that existing users already upgraded to DSA are affected, but that was the only way and I considered it the smaller harm compared to all 19.xx users upgrading into 20.xx with DSA without warning.

Note that if you did not wipe your config, you need to manually update the compatibility version:

uci set system.@system[0].compat_version=1.1
uci commit system

Yes, you do get the message the first time. To duplicate the situation I found myself in: in the r14035 image, edit /etc/config/system, delete the option compat_version '1.1' line, reboot, and flash an image from there. There is no message this time around, and I am not sure why. The reason I ended up there is that I was already running DSA, so I chose to force and keep settings; probably exceptional.

That is probably the drawback that anomeome has stumbled on: the string is stored in regular uci config, not permanently in the image itself (the way e.g. /etc/opkg/distfeeds.conf is defined).

I just tested it, and yep, that is buggy. Without the uci setting, the sysupgrade from the new image silently fails and the router quickly reboots.

Maybe we should have a uci-defaults script that would set that uci option after sysupgrade in case it is missing from the system config.

I first thought that /etc/board.d/05_compat-version was meant for that, but apparently not.
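A uci-defaults fix of the kind suggested here could look roughly like the sketch below. Everything in it is an assumption for illustration: the file name `/etc/uci-defaults/99-compat-version`, the hard-coded `IMAGE_COMPAT="1.1"` and the helper `compat_fixup_value` are hypothetical, not OpenWrt code. Note that writing the image's value whenever the option is missing ignores what the kept config actually contains, so it follows the "image version" idea rather than the board.d behaviour.

```shell
#!/bin/sh
# Hypothetical /etc/uci-defaults/99-compat-version (file name is an assumption).
# Sets system.@system[0].compat_version only if it is missing after sysupgrade.

IMAGE_COMPAT="1.1"   # assumption: compat version this image expects

# Prints the value that should end up in uci: keep an existing value,
# otherwise fall back to the image's compat version.
compat_fixup_value() {
	if [ -n "$1" ]; then
		echo "$1"
	else
		echo "$IMAGE_COMPAT"
	fi
}

# Only touch the config on a real OpenWrt system where uci exists.
if command -v uci >/dev/null 2>&1; then
	current="$(uci -q get 'system.@system[0].compat_version')"
	wanted="$(compat_fixup_value "$current")"
	if [ "$wanted" != "$current" ]; then
		uci set "system.@system[0].compat_version=$wanted"
		uci commit system
	fi
fi
```

uci-defaults scripts are run once at boot and removed when they exit with status 0, so a sketch like this would only act on the first boot after the upgrade.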

@adrianschmutzler Could the formatting of the message be improved? As it stands now it's a pretty dense read, and not everyone's mastery of English will be sufficient to fully parse it. The first line is clear:

Device linksys,wrt3200acm not supported by this image

The second line, however, is very weird to read.

  • 'Supported devices' prints an image version and not a device? I suppose this is the supported image version.
  • Tagged onto this is 'incompatible to device:', where the colon suggests an actual device name follows; instead, it prints the reason migration is not possible.
Supported devices: Image version 1.1 incompatible to device: Config cannot be migrated from swconfig to DSA

Wouldn't the following be more readable:

Device linksys,wrt3200acm not supported by this image (image version: 1.1).
Reason: config from older image versions (< 1.1) cannot be migrated from swconfig to DSA.

Borromini: The formatting you see results from the fact that the routine is actually stored on the "old" firmware, i.e. I had to hack the image metadata in order to support existing installations. Since the only thing the existing installations would print is the supported devices string, I just put the error message there. So, for old devices, I can essentially change everything after "Supported devices:", as that's what I set in the image metadata.

@hnyman

Essentially, this is a conceptual question I'd have liked to discuss beforehand, but unfortunately nobody was interested. As I see it, there are two ways of implementing the compat-version on device:

  1. As a "config version"
    That's what's currently done: The central assumption is that the compat-version is a property of the config, i.e. if I have swconfig setup in /etc/config/network, I would have version 1.0, if I have DSA setup there, I would have 1.1. Therefore, implementation via board.d files will only set the compat version during config creation, i.e. when no /etc/config/system and /etc/config/network exist. Since the idea of the compat-version is to force the user to wipe his config, this is generally not a problem. On the contrary, this would e.g. even provide the correct compat_version when backup/restore was used, as the compat_version would match the setup of data in the config files.
    The only reason for problems right now is the temporary issue that some users already have DSA config without the version bump, which should not happen (regularly) in the future.

  2. As an "image version"
    In this case, the compat-version would be a property of the installed image and not care about the config. If implemented cleanly, this would mean putting a file in e.g. /etc somewhere which would contain a string. If the user tricks himself into a new image with old config (e.g. -F without -n), then we still have the new compat_version although he wouldn't have updated his config (or, positively, for those already on DSA the new version would show up).
    A similar result could be achieved by using a uci-defaults script to set the value in uci config; however, the value would then still be exposed to the user as if it were a config parameter, though it actually isn't.
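To make the difference between the two concepts concrete, here is a toy model of where the device-side value comes from in each case. It is a hypothetical sketch, not OpenWrt code: `effective_compat` is an invented name, and treating a config that predates the feature as "1.0" under concept 1 is an assumption. The usage lines reproduce the wrt3200acm example posed later in the thread (19.07 swconfig config kept across a forced upgrade to a 1.1 image).

```shell
#!/bin/sh
# Hypothetical model of the two ways the device-side compat version could be
# defined. Not OpenWrt code.
#   $1 = "config" (concept 1) or "image" (concept 2)
#   $2 = compat version shipped with the installed image
#   $3 = compat_version stored in /etc/config/system (may be empty)
effective_compat() {
	local concept="$1" image_compat="$2" config_compat="$3"
	case "$concept" in
		config)
			# Concept 1: the value travels with the config; a config that
			# predates the feature is treated as 1.0 (assumption).
			echo "${config_compat:-1.0}"
			;;
		image)
			# Concept 2: the value travels with the image, whatever config is kept.
			echo "$image_compat"
			;;
		*)
			echo "unknown concept: $concept" >&2
			return 1
			;;
	esac
}

# 19.07 swconfig config kept across a forced upgrade to a 1.1 image:
effective_compat config 1.1 ""   # prints 1.0
effective_compat image  1.1 ""   # prints 1.1
```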

I chose the first approach because, IMO, if the update is done properly, this is the cleaner solution. The devices affected by the "already DSA" issue are a temporary situation. However, since I still have to cover the entire mt7621 target in a subsequent patch, it might not be that simple or fun.

While switching to a file in /etc might not be that simple as it would change the mechanism again, switching to uci-defaults should actually be achievable quite easily. This could be done permanently, switching from concept 1 to concept 2, and board.d implementation could be dropped.
Or one could implement the uci-defaults part only temporarily, so the current incoherent situation of early adopters is resolved, and then remove the uci-defaults script from master again after the 20.xx branch, when early adopters would have updated. This latter approach would resolve the current issue but stay with the idea of concept 1.

I think that the main task of the feature is to prevent accidental & unknowing sysupgrade to a possibly incompatible version. User getting notification the first time is the key thing.

But if he chooses to go forward, via force and/or clearing settings, he is knowingly taking the risk.

This current situation of silently failing the next sysupgrade in case of the missing setting in a router with the new logic is nasty. There should at least be an error message about the missing setting. Now there is just a reboot (as seen from LuCI, not console).

As long as committers aren't willing to at least bother documenting changes that will cause breakage, this will never end, but documentation has always been an issue...

Something like this would be a good starting point
https://svnweb.freebsd.org/base/head/UPDATING?revision=363723&view=markup

What is "failing silently" here? Nothing should fail silently, but the user would just be displayed the same message for the next upgrade again and again, until the setting is fixed.

For the conceptual discussion, let me just put this into an example:
Assume you have a linksys,wrt3200acm on 19.07 with swconfig ("1.0"), and now update into master with -F, but keep your config. So, you have the new image/DTS, but your config is unchanged, still swconfig. Would you want that to have "1.0" (option 1) or "1.1" (option 2) then?

@Borromini
I can play with everything after the "Supported devices", so maybe you'd be happier with something like:

Device linksys,wrt3200acm not supported by this image
Supported devices: linksys,wrt3200acm linksys-whateverelse - Image version mismatch: image 1.1, local 1.0. Image incompatible to device. Reason: Config cannot be migrated from swconfig to DSA

I could try to add "\n" to the string as well, but I hesitate to do that, as we will quickly end up in escape-hell then.
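Since old firmware only prints the string verbatim after "Supported devices:", the proposed wording can be assembled as a single line without any "\n" escapes. A minimal sketch, assuming a hypothetical helper `compat_supported_string` (not part of the build system) that fills in the format proposed here:

```shell
#!/bin/sh
# Hypothetical helper: builds the one-line text placed after "Supported devices:"
# in the image metadata. Kept on one line on purpose (no "\n" escapes).
#   $1 = space-separated list of supported device names
#   $2 = image compat version, $3 = local compat version, $4 = reason text
compat_supported_string() {
	local devices="$1" image_ver="$2" local_ver="$3" reason="$4"
	printf '%s - Image version mismatch: image %s, local %s. Image incompatible to device. Reason: %s' \
		"$devices" "$image_ver" "$local_ver" "$reason"
}

echo "Supported devices: $(compat_supported_string "linksys,wrt3200acm" 1.1 1.0 "Config cannot be migrated from swconfig to DSA")"
```

Passing the parts as printf arguments rather than embedding them in the format string means a reason text containing `%` cannot be misinterpreted.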

This:

I (and apparently also anomeome) sysupgraded after the first warning to a new image with "force". But I did not manually set the uci config value after that, as there is no hint about that.

On the next sysupgrade from the new image (which already has the new logic), the system notices that there is no uci config value and silently fails without any visible error message (at least in LuCI). The router just reboots. It looks pretty much as if the sysupgrade went OK, but the router simply reboots into the old image.
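One way to avoid the silent failure would be for the check to treat a missing value as its own error case and say so on stderr. The sketch below is hypothetical and deliberately simplified (plain string comparison of versions, invented function name `check_compat` and return codes); it is not the actual sysupgrade/fwtool code.

```shell
#!/bin/sh
# Hypothetical sketch, not the actual sysupgrade/fwtool code.
#   $1 = compat version of the image
#   $2 = compat_version from /etc/config/system (empty if missing)
#   $3 = 1 if -F/--force was given, 0 otherwise
# Returns 0 to proceed, 1 for a version mismatch without force,
# 2 if the device value is missing.
check_compat() {
	local image_compat="$1" device_compat="$2" force="$3"
	if [ -z "$device_compat" ]; then
		# Report the missing setting instead of failing silently
		echo "compat_version is missing in /etc/config/system." >&2
		echo "Set it with: uci set system.@system[0].compat_version=<version>; uci commit system" >&2
		return 2
	fi
	if [ "$image_compat" != "$device_compat" ]; then
		echo "Image version $image_compat incompatible to device version $device_compat" >&2
		if [ "$force" = "1" ]; then
			echo "Compat check failed but force given - will update anyway" >&2
			return 0
		fi
		return 1
	fi
	return 0
}
```

With this shape the user would see the same explicit message on every attempt until the setting is fixed, instead of an unexplained reboot.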