Buttonless Failsafe Mode

CFSworks · March 14, 2023, 8:18pm

Hi Peter! Thanks for taking the time to provide feedback.

The idea would be that the writable partition (not the overlay) is temporarily mounted read-only just long enough to peek in there and grab the key, then immediately unmounted. Much later on, the overlay is mounted normally.

Come to think of it, this same mechanism could be used to implement a reboot-to-failsafe, which would make this whole thing a lot simpler: the "go to failsafe" signal could be received later in the boot process, perhaps even when the system is fully booted. Hmm...!

Yeah, fair point. I was wondering about that in the back of my mind the whole time I was mentioning WiFi, but figured I'd include it in case any devices had WiFi interfaces (certain hardmac devices with on-chip firmware) that could be available that early in the boot process. (The real motivation for supporting WiFi at all is that WiFi interfaces don't have those pesky switchdevs in the way.)

The quote this is under says that the failsafe process brings up the Ethernet interfaces. They can be taken back down again before proceeding in the boot process.

As an aside: I ask that you keep the criticism -- though it is appreciated -- constructive. "X won't work" doesn't give me a path forward. "Y may work better than X" is most preferred, as it offers an alternative. "For X to work, we would have to do Z. I am concerned the costs of Z outweigh the benefits of X" allows community ownership of the difficulty of problem Z: perhaps someone will come forward who is willing to do Z, or someone can offer an innovative way of achieving X without Z. Alas, much of the feedback so far in this thread seems to be "this requested feature can't work in the current boot process," which is... pretty circular reasoning given that the request is to modify the boot process!

While I still believe this mechanism is workable, and it does solve a real-world problem ("How does one get a device into failsafe mode when reaching the reset button entails a personal safety risk?"), I do wonder if it's practical or would even see a lot of use. There's a certain amount of complexity that would have to be introduced into the boot process that the indoor-only users would not need (or, perhaps, tolerate) just for the few users with outdoor devices. It's an important problem, because it's addressing a safety concern (we do not want to make our users go up on ladders), but I'm wondering if specifically targeting failsafe mode isn't quite the right call here: it's a lot of complexity for one very specific situation.

After sleeping on it, I realized that this type of protocol would be a lot more useful to me personally (in the sense that it would save me having to get up and walk over to the device if I bungle the config) if it could work on a fully-booted device and get me a root shell. The same L2-protocol-only trick is used to bypass as much of the network stack as possible, getting around ip/netfilter misconfigurations -- indeed, the only requirement for access via WiFi Action frames would be that the PHYs be kept awake, which isn't hard at all to do. Such a daemon can be an entirely separate project independent of OpenWrt, available as an optional package for those that want it. (And who knows, it may prove to be so popular that it works its way into the core distribution.) I'd certainly enjoy writing such a thing, too, once I have the free time for it!
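To make the L2-only idea a bit more concrete, here is a minimal sketch of how such a daemon might authenticate a raw Ethernet frame before acting on it. Everything in it is an assumption for illustration, not a proposed wire format: the frame layout, the use of the IEEE "local experimental" EtherType 0x88B5, the HMAC-SHA256 tag, the replay counter, and the names `build_frame` and `parse_frame` are all hypothetical. The only thing taken from the discussion above is that the device and the operator share a key.

```python
import hashlib
import hmac
import struct

# Assumption: 0x88B5 is the IEEE 802 "Local Experimental EtherType 1".
# A real project would need its own EtherType; this one is a placeholder.
ETHERTYPE_RESCUE = 0x88B5
HEADER_LEN = 14   # dst MAC (6) + src MAC (6) + EtherType (2)
FIXED_LEN = 10    # counter (8) + command length (2)
TAG_LEN = 32      # HMAC-SHA256 digest size


def build_frame(dst: bytes, src: bytes, key: bytes, counter: int, command: bytes) -> bytes:
    """Build a hypothetical rescue frame: header, counter, length, command, HMAC tag."""
    body = struct.pack("!QH", counter, len(command)) + command
    tag = hmac.new(key, body, hashlib.sha256).digest()
    return dst + src + struct.pack("!H", ETHERTYPE_RESCUE) + body + tag


def parse_frame(frame: bytes, key: bytes, last_counter: int):
    """Return (counter, command) if the frame is authentic and fresh, else None."""
    if len(frame) < HEADER_LEN + FIXED_LEN + TAG_LEN:
        return None
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != ETHERTYPE_RESCUE:
        return None
    counter, cmd_len = struct.unpack_from("!QH", frame, HEADER_LEN)
    body_end = HEADER_LEN + FIXED_LEN + cmd_len
    if len(frame) < body_end + TAG_LEN:
        return None
    body = frame[HEADER_LEN:body_end]
    # The explicit length field lets us ignore any trailing Ethernet padding.
    tag = frame[body_end:body_end + TAG_LEN]
    expected = hmac.new(key, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return None
    if counter <= last_counter:
        return None  # stale or replayed frame
    return counter, body[FIXED_LEN:]
```

On Linux, frames like this could plausibly be read from an `AF_PACKET` socket bound to the interface, which is what keeps the IP stack and netfilter out of the picture; that part is left out here because it needs root and real hardware.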

Regardless, I believe we should address some of the shortcomings in UCI that led to this problem happening in the first place:

This is a pretty clear sign that UCI is a far cry from LuCI's reliability. @Lynx essentially states that if the same change had been made through the latter, this particular soft-bricking would have been avoided. I think there are three pieces of low-hanging fruit we should consider:

1. Invent a uci apply command that mirrors LuCI's analogue: it applies (but does not commit) the changes, tells the user they need to run uci apply --confirm to prevent the rollback, and sleeps for 5 seconds before exiting (so the user won't confirm too soon). A uci commit would also cancel the rollback, but uci apply --confirm is the recommended way to go because it won't work after the rollback has already happened.
2. @Lynx was editing /etc/config/network directly, which (although I do the same myself) I think is pretty bad practice: you are saving changes before testing them. We should probably learn from tools like visudo that when a file is so critical that you could be locked out if you screw it up, you shouldn't be editing it directly. So, we should invent a uci edit command (so I can run uci edit network instead of vi /etc/config/network). It would copy the pending config to a temporary location, run the editor on it, and check whether any changes were made. If so, it would do a quick check for syntax errors and for conflicting config changes that happened during the editing, before staging the new file for commit. (And, since it's meant to be an interactive command, it can even suggest that the user run uci apply next.)
3. Discourage writing to the saved configuration without testing it first! UCI could stick a notice at the top of the /etc/config files that says something like "editing this directly is discouraged, use uci edit instead." The documentation also needs to do a much better job of communicating that uci commit should not be used until any changes are tested. The first example in the UCI page of the user guide currently recommends the opposite.
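The apply/confirm/rollback semantics in the first suggestion are subtle enough that a small model may help. The sketch below is not how UCI or LuCI work today; `PendingApply`, its methods, and the 30-second timeout are hypothetical names and values chosen only to illustrate the proposed behaviour. The 5-second sleep in the proposed uci apply is left out to keep the model small.

```python
import time


class PendingApply:
    """Model of one applied-but-unconfirmed change (hypothetical, for illustration)."""

    def __init__(self, revert, timeout=30.0, clock=time.monotonic):
        self._revert = revert          # callable that restores the old running config
        self._timeout = timeout        # hypothetical rollback window, in seconds
        self._clock = clock
        self._applied_at = clock()
        self.state = "pending"         # pending -> confirmed | rolled_back

    def tick(self):
        """Call periodically: roll back once the window passes without confirmation."""
        if self.state == "pending" and self._clock() - self._applied_at >= self._timeout:
            self._revert()
            self.state = "rolled_back"
        return self.state

    def confirm(self):
        """Model of 'uci apply --confirm': refuses if the rollback already happened."""
        if self.tick() == "rolled_back":
            return False
        self.state = "confirmed"
        return True

    def commit(self):
        """Model of 'uci commit': cancels a pending rollback, but also succeeds
        after one, which would save changes that were never confirmed to work."""
        if self.tick() == "pending":
            self.state = "confirmed"
        return True
```

The point of the distinction is the return value after a rollback: `confirm()` fails and tells the user something went wrong, while `commit()` would happily save the config that just locked them out.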

@Lynx would these be a satisfactory resolution if we ditched the "failsafe mode" idea specifically?