Is it normal to include breaking changes between release candidates?

For example, my ISP is MAC-locked...and syntax to change the MAC presented on the WAN link has been unchanged for years, yet changed between rc1 and rc2? In fact, there were many configuration syntax changes between rc1 and rc2...why? Isn't that the sort of thing you do during a development cycle, not in run-up to release?

Disclaimer: I'm not an OpenWrt developer and can't speak for the project.

First of all, this interpretation is down to the project and its developers - arguing about semantics is (IMHO) rather presumptuous.

That out of the way, the release notes of -rc1 explicitly mentioned that luci support for managing VLANs for DSA devices would be retrofitted for -rc2, in order not to delay this first -rc even further; to quote from OpenWrt 21.02.0 first release candidate:

Known issues

  • DSA support is new and might not be complete or fully working
  • The LuCI web interface has no support for DSA yet

This work exposed an ambiguity in the original netifd configuration syntax between L2 and L3 configurations, which had direct functional implications and limitations (particularly for DSA and wireless bridging). Given that DSA support as a whole (beyond mere software bridging of DSA members) is new to 21.02.x (and has only been available at all for around half a year, first as external branches, then in master), the last chance to fix this is now. The alternatives are to live with these problems for the whole life cycle of this stable release branch, or to abandon the 21.02.x branch unreleased and switch over to a hypothetical ~21.10.x. Pick your poison; those who end up having both to do the work and to live with potential mistakes in user configuration well beyond just this release have picked theirs[0].

--
[0] and they have even worked on smoothing the migration between the different syntax styles, trying to avoid breakage or loss of comments.

RC is for testing

Developers need to get the majority of their work out for testing in a RC when their tasks have been mostly completed.

Some tasks might need more work and will be held back for the next RC.

First, I'd think it more presumptuous for any F/OSS project to redefine the meaning of an industry-standard term that's existed a lot longer than OpenWrt and has a very real intended meaning. That said, the reasoning for wanting to ensure it makes it into a stable branch is sound...and all the more reason to have delayed the RC1 label until it was complete. At the very least, when you're introducing a configuration change that is going to cause issues for a known, common configuration requirement (fixed MAC on the WAN side), I'd argue it warrants a hell of a lot more than a one-liner in the release notes that makes it sound like a LuCI behavioral change (which it isn't). It's a configuration element that isn't migrated or removed (which is indicated in the release notes) that worked-then-didn't without obvious cause, which is problematic. It's concerning that this doesn't concern people.

What do you expect as a result? A public apology, TEPCO-style, to never ever do this again, pinky promise?!

Issues with MAC overrides for protocols like PPPoE got reported, and the reason behind them found, around 18 hours ago - at a time when -rc2 was already tagged and had been building on the buildbots (and was already (partially) available in public, albeit unannounced) for the last two days. The known issue was accordingly added to the release notes a couple of hours before the release notes were published (and the bug was actually fixed ~2 hours before -rc2 got announced). At that point it was a judgement call whether to formally announce -rc2 with this -documented- known issue or to skip/retract -rc2, although the birds were already singing about it from the wires (and, newsflash, this option was discussed).

If you do expect SLA style behaviour, you'd need to get an SLA first - in the absence of which you have to accept what unpaid volunteers provide to you for free - or vote with your wallet and move elsewhere.

First, it wouldn't suck if things were managed as I described - get the heavy lifting out of the way, then start release candidates. It's not hard. It's open source...it's not like there's a fixed release schedule or a gun to anyone's head.

Second, I'd dispute most of the second paragraph...it's not a bug, it's an unanticipated side effect. Nothing in the release notes addresses it, either. In fact, let me quote directly from it:

LuCI network migration tool doesn't migrate custom bridge MAC addresses. Custom device MAC has to be set again manually.

It's not a LuCI problem, it's a data reorganization problem. Custom device MACs do indeed have to be set again manually, except...

Under 'New network configuration syntax':

The old syntax is still supported to facilitate transition

Mostly, yes...except for the breaking syntax change. There's nothing preventing the implementation from supporting MAC changes at either the interface or device level (with the latter taking precedence over the former, of course)...except it doesn't. Is this the "bug" you were referring to, and are you saying it was fixed in the 21.02 branch after the buildbots went to work? If so, that's understandable...but there was (and is) time to include that explicitly in the release notes, because users WILL find that working configurations mysteriously lose connectivity to their ISP without obvious reason as RC2 proliferates. This is the kind of thing you call out in a red 72-point font. It's a breaking design change.

Yes, you can argue that one shouldn't use release candidates in 'production' scenarios. Hell, for that matter, we could argue that you shouldn't use F/OSS in production scenarios as well, and then OpenWrt would be a footnote in computing rather than the predominant third-party OS on home networking devices.

I'm not a wet-behind-the-ears newbie that just discovered OpenWrt. I've been running it since before White Russian, and I've been doing F/OSS development since LONG before it carried the moniker. How about instead of downplaying and excusing it, you at least let whoever is maintaining the release notes know that this should perhaps be called out a little more prominently? This isn't ENRON, granted, but if there's a better way to handle it, why not do so?

Guess what, I can guarantee[0] that ≥1 device model 'supported' by OpenWrt will be hard-bricked by upgrading from 19.07.x to 21.02.0. No, I don't have any example in mind, nor can I point to a breaking change - this proposition is solely based on statistics and the stochastic probability that ≥1 out of the roughly 750 supported devices[1] will break with openwrt-21.02. Chances are high that ≥1 will even break between 21.02.0-rc1 and 21.02.0-final (and be it just by chance, like image sizes just growing a few bytes into the next erase block).
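
As a back-of-the-envelope illustration of that argument: the per-device failure probability p below is an assumed number, not project data, and devices are treated as independent, which is only roughly true.

$$P(\geq 1 \text{ device breaks}) = 1 - (1 - p)^{n}, \qquad n = 750,\; p = 0.005 \;\Rightarrow\; 1 - 0.995^{750} \approx 0.977$$

Under those assumptions, even a half-percent chance per device makes at least one breakage close to certain.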

No one denies that this is an "unwanted behaviour (-change)", nor that things like this 'should' have been recognized before branching off, before tagging -rc1, before ${doing_anything}. But the fact of the matter is, it has happened - it was documented as soon as it was known (and even before the release notes were published) and mitigations (vulgo bugfixes, to the extent possible at this moment) have already been deployed (for the openwrt-21.02 branch, which will end up in -rc3 or -final) as well. Bugs[2] happen, will always happen; what matters is how to deal with them[3], [4]. Parties expecting otherwise, and having a need to enforce these -their own- expectations, will have to get an SLA, or do the necessary homework of integration testing themselves[5]. If it breaks their environment, they get to keep the pieces[6].

Let's get down to earth for a minute.
This bug[7] fails to properly apply an administrative override for the declared MAC address of virtual interfaces[8], [9]. It does not brick your device, it does not set the house on fire, nor does it do "unspeakable acts of perversion with your pet dachshund"[10], [11].
The worst it can do is make your upstream network admin/ISP 'unhappy'[12] and place you on the naughty chair[14], making you call them, take the road to Canossa and ask for eternal forgiveness.
Worse has happened, will happen, will always happen[15]…

The question of this being an accidental bug[16] or unforeseen[17] syntax/protocol change changes neither the fact nor the consequences. Neither of us even knows how this will be presented in the final release notes or if means to mitigate this completely can still be found; the final release is still in the future and RCs are meant to find and fix issues, in the broadest definition of the word. For many users the change from swconfig based configurations to DSA based ones will be a much more serious issue, inviting a huge opportunity for user error, as --force omits all kinds of sanity checks sysupgrade applies to the uploaded image file (which may accidentally be a photo of one's own left toe, instead of a valid sysupgrade image) to be flashed.

What we come back to, are differing expectations towards an opensource project of unpaid volunteers owing external and unaffiliated entities 'a public apology'[19], the potential refund of damages and 'wasted' man-hours of debugging, based on semantics of what a release candidate entails. One could opt to remain on this high horse or accept reality and go with the notion of the clauses about warranties present in most opensource licenses[20]. Keep in mind that the only way to avoid errors, would be not doing anything at all - and that's usually the result if blame games continue and "annoy" these unpaid volunteers enough to make them look for less stressful endeavours in their free time, outside of the critics' scrutiny.

Disclaimer: as mentioned already, I'm not an OpenWrt developer, nor formally affiliated to the project. The opinions and statements expressed are my own and not representative for the OpenWrt project in any way, shape or form. But as an individual who has been (and still is) sitting in similar seats, I have grown rather allergic to trace amounts of entitlement towards the work of unpaid volunteers and how they have to spend their own time for the benefit of others. Little changes in phrasing can already make a big difference here, approaching the issue constructively, rather than confrontational[21].

…and now I will refrain from this thread for at least 24 hours, preferably forever.

--
[0] and if I'm wrong, then I will be wrong. I can live with the risk of potentially having made a wrong statistical assessment on the internet - and the inclined reader will have to, or take the consequences (of never believing ${me} again).
[1] of which quite a few legacy targets don't get any testing at all, let alone their individual devices
[2] in the widest possible definition of the word.
[3] and that's where one has to draw their very own conclusions.
[4] and at some point a decision has to be made, hold the press or document an erratum. By the very nature of this, every individual might assess every issue differently - be it because of their personal level of impact or more theoretical differences in evaluating it.
[5] it's still an -rc, not a production release - and as raised in the first paragraph, I can guarantee[0] that the production release won't be bug-free either.
[6] blame oneself for running an -rc in a production environment, not others working their gluteus maximus off for the sake of ${others}.
[7] bug, behaviour change, unexpected result, ${insert_from_buzzword_bingo_of_the day}
[8] among others, PPPoE.
[9] if previously expressed as option macaddr within the configuration stanza for said virtual interface, instead of applying it to the underlying physical device.
[10] "You could send […] the […] mailing list a note about it anyway, of course. (And perhaps pictures, if your dachshund is involved. Not that we'd be interested, of course. No. Just so that we'd know to avoid it next time)."[11]
[11] https://lwn.net/Articles/211904/
[12] duplicate MAC addresses happen[13], anyone managing an enterprise- or public access network, needs to deal with that and contain the damage properly.
[13] anyone of this proclaimed vintage, should have been baptized in fire by hme%d, le%d and be%d already and learned that lesson.
[14] temporarily blacklisted.
[15] -rc or not.
[16] in the sense of a typo, misplaced bracket or break() or return().
[17] at least this being an unintended/ unpredicted/ unrecognized change I can attest[18].
[18] believe it - or not, not that it makes a difference anyways.
[19] which I referred to as TEPCO style before, enshrined gestures, with no material essence behind it - but some designated scapegoat will have to jump into his sword, to appease the crowds.
[20] to quote a short one, the ISC license in this case - https://en.wikipedia.org/wiki/ISC_license
[21] http://www.catb.org/~esr/faqs/smart-questions.html

I think you both have valid points except that I don't own a dachshund...

While I think the past criticism that the project has sometimes been lacking in regard to communicating change to the userbase is fair... I actually wanted to weigh in here to congratulate the project in this case... as I think they've done a great job ( yes there is still room for things to be made clearer )... informing the userbase of potential changes...

I'd disagree with the point about making changes during normal development being 'easy'... in a project this large... if pushing something into RC is what is required to force (impetus) change... and if it takes until RC11... I'm happy if this means that important (positive) fundamental change is occurring and is communicated to the userbase as best it can be... (so happy... that if I ever regain employment and this thing goes to an RC3 I'll contribute $230AUD towards the additional hosting burden of extra RCs)

At the end of the day;

  1. It is an RC
  2. It's the developers themselves that ultimately have to deal with the consequences so I'm sure they are well aware of all the points you have raised and even considered them whilst performing all the actions discussed herein...

Short answer to the question: yes it is normal to include breaking changes. And there are a lot of reasons to do so. Intentional and Unintentional.

Are you unhappy how it is handled by the current devs? Yes? --> Include yourself. Everybody can contribute.

So the question "Why" is imho not relevant and this discussion won't contribute anything to the project.

@slh While the enormous amount of academia pouring from your fingers is impressive, I'd argue pretty heavily that the time you invested would have been infinitely better spent updating this:

and this:

to simply, plainly say "IMPORTANT - if your ISP requires a specific MAC address for your WAN connection, you will need to take the following manual steps post-upgrade to restore functionality..." and lay out what they are, versus more or less mansplaining how "well, a lot of other people are statistically likely to get fucked as well".

This affects nearly ALL pppoe users, in addition to Charter Communications, Verizon FiOS, and God knows who else. A wink and a shrug isn't showing your best side, but your call.

@hauke @bjonglez Can you please comment on this?

By the way, it's worth pointing out this is NOT a bridge migration problem. It's about a forced MAC address at the interface level no longer being honored because it's now expected to be defined at the device level. I just looked through every checkin after rc2 and there's no fix for this.
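
To make the difference concrete, here is a rough sketch of the two forms in /etc/config/network. The device name eth0.2, the dhcp protocol and the MAC address are placeholders assumed for illustration, not taken from anyone's actual configuration; the two variants are alternatives, not one file.

```
# Variant A: override declared on the interface
# (the form described above as no longer honoured in -rc2)
config interface 'wan'
	option ifname 'eth0.2'
	option proto 'dhcp'
	option macaddr '00:11:22:33:44:55'

# Variant B: override declared on the underlying device,
# with the interface only referring to that device
config device
	option name 'eth0.2'
	option macaddr '00:11:22:33:44:55'

config interface 'wan'
	option device 'eth0.2'
	option proto 'dhcp'
```

If in doubt, check the result against the release notes for your exact build, since the syntax was still moving between release candidates.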

I'd like to offer a workaround for the immediate issue you're facing, and then after touch upon the approach I take to unexpected changes in behaviour of FOSS projects.

The first is simply utilising /etc/rc.local to set anything you require at startup, though it is limited: it runs only once per boot, and there is no guarantee that the components or services you touch have finished initializing by the time it runs. What I've done with the personal build I maintain for my x86_64 rig is utilise the existing infrastructure in the /etc/hotplug.d subdirectories (example: /etc/hotplug.d/iface/00-netstate for an interface up/down event), plus callbacks for various services, depending on any custom requirements, using /etc/firewall.user or any of the other available user callback options. Additionally, you can write scripts for any need you can think of, place them in /bin, and call them like binary commands from the various callbacks, as in the examples I gave.

For instance, my isp's dhcp server doesn't have a mac lock but is very sensitive to option changes, so on top of instructing udhcpc to use whichever mac I'm using at the moment (via luci, I might add), I also set the wan mac address using ifconfig (I know I know I need to learn "ip" but I'm old and lazy) from rc.local and verify it from a firewall.user callback.
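
A minimal sketch of that set-and-verify idea as a small shell helper, using ip rather than ifconfig. Everything named here is hypothetical: the function names, the WAN_DEV and WANT_MAC defaults, and the SYSFS_NET variable (which exists only so the functions can be exercised outside a router). Treat it as a starting point under those assumptions, not a tested recipe for any particular device.

```shell
#!/bin/sh
# Hypothetical helper, e.g. saved as /bin/wan-mac and sourced from a callback.
# WAN_DEV and WANT_MAC are placeholders; adjust them to your own setup.
WAN_DEV="${WAN_DEV:-eth0.2}"
WANT_MAC="${WANT_MAC:-00:11:22:33:44:55}"
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

# Lower-case a MAC address so comparisons ignore case.
normalize_mac() {
    printf '%s' "$1" | tr 'A-F' 'a-f'
}

# Print the MAC the kernel currently reports for a device (nothing if absent).
current_mac() {
    [ -r "$SYSFS_NET/$1/address" ] && normalize_mac "$(cat "$SYSFS_NET/$1/address")"
}

# Succeed if the device already carries the wanted MAC.
mac_matches() {
    [ "$(current_mac "$1")" = "$(normalize_mac "$2")" ]
}

# Apply the override only when needed. The link is taken down first because
# many drivers refuse an address change on a running interface.
apply_mac() {
    mac_matches "$1" "$2" && return 0
    ip link set dev "$1" down
    ip link set dev "$1" address "$2"
    ip link set dev "$1" up
}
```

From /etc/rc.local this could be used as `. /bin/wan-mac; apply_mac "$WAN_DEV" "$WANT_MAC"`, and `mac_matches "$WAN_DEV" "$WANT_MAC"` could serve as the later verification step. Because rc.local runs late, the upstream link may need to be renegotiated afterwards (for example with `ifup wan`) before the ISP sees the new address.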

I know it sounds like a lot of work, but this is where I'll offer my approach to all things foss: unless I'm involved with a project in an official capacity, or maintaining a personal customized fork that accounts for potential upstream changes, I never expect software behaviour or setting schemas to remain a constant. Basically it's like how Linus discourages people from becoming dependent on sysfs interfaces within the linux kernel, because they are explicitly provided without guarantees regarding paths, operational consistency, or existence between versions.

Having used openwrt for a long while, I'm sure you've seen some of the major changes take place a few times, so I encourage you to get ahead of it by utilizing the customization offered to users via callback scripts and make them a part of your upgrade/backup/restore process so you don't get caught like this in the future.

Take care!