Configuration file normalization/linting

A persistent annoyance when upgrading any package that has some configuration in /etc/config is that there are conflicts is configuration files which need manual resolution.

Most of the time, these changes turn out to be representation minutiae, rather than anything of substance. Things like:

  • single vs double vs no quotes for string options
  • tabs vs spaces for indentation
  • presence of a leading empty line
  • order of configuration blocks
  • empty lines vs lines containing only spaces between blocks

Wouldn't it be useful if the distributed files were always normalized to a specific format? This would eliminate configuration conflicts in many cases and make merging easier for the rest.

What would be the right way to implement this?

Technically it's not so hard to compare files while discarding unimportant differences, such as changed lines order. Example from my code (used in geoip-shell):

https://github.com/friendly-bits/posix-lists-and-strings/blob/2286c1fbabf6b0c5c6db452e8ea6d11ca8807a5d/posix-lists-and-strings.sh#L295

So how would you propose to use something akin to compare_files?

IMHO, in its current form, compare_files is more of a band-aid solution that covers only the less common and annoying of infranctions. (In my experience, changes in quoting is the most common and most annoying on.) Additionally, it only reports an exit code. So, for any false conflicts that are not covered, you still have to do manual inspection with your regular tools.

Moreover, openwrt runs on routers. I would guess that most users don't keep their curated list of shell helpers handy on their router. Resolving conflicts should be straightforward using diff and an editor.

Finally, many of the users are likely to prefer upgrading packages from luci. For this, avoiding false conflicts is essential for a smooth experience.

What I had in mind is linting/normalizing configuration files when building the packages or when configuration files are updated through luci.

It could be incorporated into the code which decides whether given file changed during the update on client device.

The code is not cut in stone - it can be adapted to ignore additional issues, and/or output the diff. This can also be implemented via different set of utilities, not necessarily awk. For example, via sort+uniq. That said, some issues (such as changed quoting) may be difficult to deal with programmatically. I've never seen this particular issue, but if it does happen then IMO the proper way to deal with it is at the level of the human who introduces the changes in the quotes.

This can only happen if someone incorrectly edits the /etc/config/ file directly.

OpenWrt uses UCI ( Unified Configuration Interface) to manage config files.

TL;DR - always use uci to modify config files, then any problems will be bugs in the package, rather than incorrectly edited files.

Of course you can edit the files directly, and many people do, more out of a tradition of editing config files rather than using a utility that automatically does "syntax-checking", "representation minutiae" checking etc.

That is handled already by UCI.

For example if you wanted to change the ssid of an AP, you could add a line to /etc/config/wireless starting with option ssid ... , but what is the full line and where to put it?

With UCI, it is easy.

First, look at the config (using a test system on my desk as an example):

root@meshnode-16ad:/etc/config# uci show wireless
wireless.radio0=wifi-device
wireless.radio0.type='mac80211'
wireless.radio0.path='platform/10300000.wmac'
wireless.radio0.band='2g'
wireless.radio0.channel='1'
wireless.radio0.htmode='HT40'
wireless.radio0.disabled='0'
wireless.radio0.log_level='0'
wireless.radio0.country='GB'
wireless.radio0.noscan='1'
wireless.default_radio0=wifi-iface
wireless.default_radio0.device='radio0'
wireless.default_radio0.network='lan'
wireless.default_radio0.mode='ap'
wireless.default_radio0.ssid='OpenWrt-2g-16ad'
wireless.default_radio0.encryption='none'
wireless.default_radio0.ifname='phy0-ap0'
wireless.default_radio0.macaddr='96:83:c4:18:16:ad'
wireless.default_radio0.disabled='0'

We can see the ssid is OpenWrt-2g-16ad.

Changing it is simple:

uci set wireless.default_radio0.ssid='SaltyChocolateCafe'

Now you can test it:

wifi

If it does not work as expected, you can revert the change:
uci revert wireless

If you bricked the router and lost contact, reboot it and it will revert itself.

If you are happy with the result and want to make it permanent, do:
uci commit wireless

Almost all packages use uci config files. Those that do not should, in my opinion, be updated to use uci.

1 Like

I agree on all points, except the last one. While UCI is good, it has some shortcomings and is not universally the best solution. To give a few examples of the shortcomings:

  • some UCI commands exit with code 0 regardless of success or failure
  • the higher-level UCI abstraction (i.e. via the uci CLI) encourages creating a subshell for every entry which one wants to retrieve, potentially leading to dozens of subshells created. I won't give a specific example here, although I have one in mind.
  • the lower-level UCI abstraction (i.e. via config_load, config_get, config_set etc) do not create subshells, but are quirky to use and the documentation (IMO) is not very user-friendly, forcing developers to spend a ton of time learning and testing the API
  • both the higher- and the lower-level UCI abstractions do not expect the config files to be modified directly by users, which nevertheless does happen, and sometimes even encouraged when troubleshooting issues. UCI then doesn't really have enough error checking and logic to deal with slight deviations from the expected machine-generated format
  • when adding OpenWrt package for an application not specific to OpenWrt, it is much easier to reuse its built-in config management system than to code a config translation layer

Nothing is perfect and improvements can always be made!

I would put these two the other way round.

The uci executable (that provides the cli) is the root of the uci system and is not an abstraction of anything "lower" in the weeds of OpenWrt.

The config_load, config_get, config_set etc functions are provided by the OpenWrt shell libraries, all of which ultimately call the uci executable.
All these shell functions do is complicate and obfuscate the handling of the uci config files. Documentation is, as you have found out, pretty much nonexistent.
I would go as far as to say that in practice these functions serve purely as an obfuscation tool...

I don't follow what you mean by this, perhaps you could give a little detail and maybe your example. Note that the abstraction obfuscation layer provided by the shell library calls also ultimately call the uci executable.

This is 100% laziness on behalf of the maintainer of the package. PRs for packages should not be accepted if a uci config is not part of the package, unless there is a very strong justification for not having one.

Right enough, implementing a uci config using the undocumented library functions is a PITA. But using uci directly it is a very simple task.

It should not be encouraged at all. It is not helped by the wiki (I think I might add a paragraph there on this subject).

Direct editing amounts to hacking the config system. Ok, if you want to, but "don't complain if something goes wrong later".

This is a very important subject for developers of packages and users alike, so is good to discuss it here.
Thanks to @m000 for bringing it up!

Sorry in advance about the long reply but I think it's important to dig into details here.

Perhaps I was incorrect to use the "higher" and "lower" words, however I don't think your description is 100% correct either.

I'll give an example:

# printf %s "'uci get' result: "; wan=$(uci get network.wan.ifname); echo "$wan"
'uci get' result: eth1
# printf %s "'config_get' result: "; . /lib/functions.sh; config_load network; config_get wan wan ifname; echo "$wan"
'config_get' result: pppoe-wan

First, as you can see, the results are not the same. eth1 is l3 device and pppoe-wan is logica interface name (pardon me if I'm using incorrect lingo but you get the point). Ignorint that, there is another difference in the efficiency of fetching the config.

What the command

$(uci get network.wan.ifname)

does is create a subshell, execute the command uci get network.wan.ifname in that subshell, then substitute the whole expression with the output of that command and kill the newly-created subshell. Creating a subshell is perhaps the one slowest native shell command, by far. It is indeed, truely slow.

Let's examine the other alternative:
. /lib/functions.sh - source the functions.sh script - doesn't create a subshell
config_load network: let's look at the called function in functions.sh:

config_load() {
        [ -n "$IPKG_INSTROOT" ] && return 0
        uci_load "$@"
}

Essentially it calls uci_load, without creating subshells. Let's look at uci_load() in /lib/config/uci.sh:

uci_load() {
        local PACKAGE="$1"
        local DATA
        local RET
        local VAR

        _C=0
        if [ -z "$CONFIG_APPEND" ]; then
                for VAR in $CONFIG_LIST_STATE; do
                        export ${NO_EXPORT:+-n} CONFIG_${VAR}=
                        export ${NO_EXPORT:+-n} CONFIG_${VAR}_LENGTH=
                done
                export ${NO_EXPORT:+-n} CONFIG_LIST_STATE=
                export ${NO_EXPORT:+-n} CONFIG_SECTIONS=
                export ${NO_EXPORT:+-n} CONFIG_NUM_SECTIONS=0
                export ${NO_EXPORT:+-n} CONFIG_SECTION=
        fi

        DATA="$(/sbin/uci ${UCI_CONFIG_DIR:+-c $UCI_CONFIG_DIR} ${LOAD_STATE:+-P /var/state} -S -n export "$PACKAGE" 2>/dev/null)"
        RET="$?"
        [ "$RET" != 0 -o -z "$DATA" ] || eval "$DATA"
        unset DATA

        ${CONFIG_SECTION:+config_cb}
        return "$RET"
}

Essentially it creates 1 subshell, then prints all config contents in that subshell and assigns the result to variable DATA. Then it evals the contents of $DATA in order to set shell variables for each config option. Ignoring (for now) the security aspect of this operation, I'll focus on the fact that only 1 subshell is created.

Now let's look at the function config_get() in /lib/functions.sh:

# config_get <variable> <section> <option> [<default>]
# config_get <section> <option>
config_get() {
        case "$2${3:-$1}" in
                *[!A-Za-z0-9_]*) : ;;
                *)
                        case "$3" in
                                "") eval echo "\"\${CONFIG_${1}_${2}:-\${4}}\"";;
                                *)  eval export ${NO_EXPORT:+-n} -- "${1}=\${CONFIG_${2}_${3}:-\${4}}";;
                        esac
                ;;
        esac
}

Essentially it assigns the value of the relevant config option to the variable whose name is provided as 1st argument. I'm ignoring the fact that it can also simply print the value if no variable name is provided as 1st argument because programmatically fetching the value this way would create a subshell, which is undesirable as explained above.

So for a single uci get or config_get command, the performance will be about the same - 1 subshell is created. The time consumed by all the other shell commands combined is dwarfed by the time it takes to create and destroy the subshell, so it can be ignored. However when parsing a config file, you will normally want to parse multiple options, not just one. Let's say that your config file has 50 options and you need to retrieve all of them. Using uci get to do that will create and then destroy 50 subshells. The alternative is to call config_load once, creating 1 subshell, then call config_get 50 times, creating 0 subshells.

Most likely this is the reason to having the config_load, config_get functions in the first place. If all OpenWrt programs would be using uci get, the user would be experiencing noticeably degraded responsiveness simply due to that.

You seem to be making an implicit assumption that uci is necessarily better than the package's own config management system. I'll give my own project 'geoip-shell' as an example, although I could also give an example of 'adblock-lean' which I designed the config system for, so I know it intimately.

geoip-shell's config system doesn't use UCI. What it does is the same thing as UCI, except it creates 0 subshells, plus it doesn't blindly eval the contents of the config file, in order to minimize security vulnerabilities. Rather it parses and validates each option, makes sure that it is safe for eval, and only then uses eval to assign the values. Moreover, it ignores minute differences, such as changed lines order, added empty lines or leading and trailing whitespaces and tabs, when making the judgement whether the config file changed. If it didn't change then it doesn't write to flash, essentially preventing unnecessary flash writes. Moreover, it uses a more compact format than UCI, so less data is written to flash to begin with.

Now why would you mandate me to replace a universal, better performing and safer config management system with UCI? It seems that while acknowledging UCI's shortcomings, you are making the assumption that nobody is able to design a better config management system than UCI. This just doesn't make sense to me. If your reasoning is that most developers can not do it or can not be bothered to do it then fair enough, but IMO this doesn't warrant a general rule like "use UCI or look for a core developer to make a detailed code review (which will never happen)".

Edit: Thinking about the security aspect of UCI a bit longer, I'm not sure that the uci binary doesn't implement precautions to sanitize the keys and values before outputting them, and I don't want to be unfair to UCI, so I'm taking back the security argument. The other points still stand.

I don't think that expecting people to never edit config files is reasonable, and I doubt that a wiki recommendation will change that behavior. IMHO the appropriate way to deal with this is implementing config validation, including checking for things like quotes. Example from adblock-lean code (obviously not directly applicable to UCI):

https://github.com/friendly-bits/adblock-lean/blob/c1656d1b2335c8a38a4b497da2f2532968ee59a1/adblock-lean#L534

This is a very good example of why it is NOT a good idea to use the library calls.

On a test system on my desk:

root@meshnode-8ecb:~# uci get network.wan.ifname
uci: Entry not found

Here uci get gives the correct answer, as ifname is invalid in a uci network config.

Compare with:

root@meshnode-8ecb:~# . /lib/functions.sh; config_load network; config_get wan wan ifname; echo "$wan"
eth1

Here the answer is eth1, an incorrect result.

Ok, I'm with you on that.

Indeed.

First of all, as the config is parsed, usually on startup of the package, waiting a little longer hardly matters.
But if you might want to save some time, so get all the options in one go with uci export or uci show and parse the result. This is what the library call attempts to do (but sometimes gets wrong, maybe when parsing lists as if they were options - not sure).

That is not the point. UCI is the OpenWrt Unified Configuration Interface.
UCI is not necessarily better of course, a package ported from elsewhere might have the very best handling of its own config file format, optimised for its own use, BUT it will not follow the Unified idea of UCI and is most likely unique to the package.
Now maybe this is justified in some edge cases, for example, your geoip-shell package, I am not familiar with it so cannot usefully comment.

This begs the question of why a package startup config processing would be writing to flash in the first place, it would write to a file on ram-disk if it was initialising some sort of database.... but that is an edge case again and tangential to the topic here.

I wouldn't. But I would ask the question whether the improved performance was essential to operation of the package. For almost all packages, better performance on startup will have no effect on operational performance.
For geoip-shell, it probably imports/sorts/checks a huge database of ip addresses... I'm guessing. But this would be best done by the package itself and not by something external like uci or library calls on startup.

No, that is not the reasoning. The reasoning is for a UNIFIED approach to config parsing. Often a maintainer of a package being ported to OpenWrt will do what is necessary to make the package work within some standardised guidelines. The "rule" or guideline should be to use uci config unless there is a very good reason for not doing so.
It is usually only necessary for the init.d script to create a "package style" unique config file on ram-disk and point the package executable at it. Just a little more work with the advantage of supporting the unified approach.

OpenWrt is an open system and it is fine to do whatever you like. But if you hack about with stuff, against recommendations, then you should maybe expect consequences at some stage.

To clarify this point, config writes may be performed under all sorts of conditions (speaking broadly, regardless of the specific application), the vast majority of them is when the user asks the application to change the config. Same with geoip-shell. The ip lists processing performed by geoip-shell is completely unrelated to config processing and the ip lists are never written to flash.

Of course saving say 500ms is not "essential". It is just nice and gives the user a pleasant experience. When scaled across many applications and startup services though, these small savings add up. Also processing config is not limited to application startup. Just one example. Let's say that my application changes something in the firewall config. Then it will need to reload the firewall service. This will trigger a lot of config processing in the background. Some other services may be triggered by that action. In fact, geoip-shell, when installed, adds a "firewall include", which is essentially a script which gets executed on firewall reload. Depending on how all the stuff triggered by the action is implemented, it may or may not cause a substantial lag.

In short, I don't tend to discount even minor inefficiencies easily, and on top of that, I don't consider inefficient config loading as minor inefficiency because it happens a lot and it adds up.

Let me ask you: what is the benefit of unified approach? I can see how it is beneficial for system components, essentially providing a single API for all other applications to work with. What I can not see is how it is beneficial for non-system applications. The only benefit that I can think of is that it saves the developer of a particular application the trouble of implementing their own config system, and it saves some space which would be taken by that code. However for an application which is not exclusive to OpenWrt, the opposite is true: this would force the maintainer to implement a custom config translation layer, which would supplement but not replace the application's own config management system, and that actually will take more space. Then who would benefit from that and in which way?

Config writes should only be done when a static, non-volatile change is required in the configuration of a package and at no other time.
This does not prevent a package having a dynamic configuration where it modifies its functionality depending on runtime circumstances. UCI has built in support for this, keeping the dynamic config in memory and its static "startup defaults" in /etc/config.

Indeed, that is why, if you want to save those milliseconds on startup you would not do many consecutive uci get commands.

FYI, uci has many cli options, not just "get":

Usage: uci [<options>] <command> [<arguments>]

Commands:
	batch
	export     [<config>]
	import     [<config>]
	changes    [<config>]
	commit     [<config>]
	add        <config> <section-type>
	add_list   <config>.<section>.<option>=<string>
	del_list   <config>.<section>.<option>=<string>
	show       [<config>[.<section>[.<option>]]]
	get        <config>.<section>[.<option>]
	set        <config>.<section>[.<option>]=<value>
	delete     <config>[.<section>[[.<option>][=<id>]]]
	rename     <config>.<section>[.<option>]=<name>
	revert     <config>[.<section>[.<option>]]
	reorder    <config>.<section>=<position>

Options:
	-c <path>  set the search path for config files (default: /etc/config)
	-d <str>   set the delimiter for list values in uci show
	-f <file>  use <file> as input instead of stdin
	-m         when importing, merge data into an existing package
	-n         name unnamed sections on export (default)
	-N         don't name unnamed sections
	-p <path>  add a search path for config change files
	-P <path>  add a search path for config change files and use as default
	-t <path>  set save path for config change files
	-q         quiet mode (don't print error messages)
	-s         force strict mode (stop on parser errors, default)
	-S         disable strict mode
	-X         do not use extended syntax on 'show'


Maybe you use this yourself?

I guess you write this to /etc/config/firewall? Do you check before adding it that you did not add it last time round?
You could add it using uci set.firewall. .... etc" and do a system firewall reload" and later do a uci revert firewall` when your package exits (a reboot will reset to defaults without any flash rewrites).

The unified UCI approach is beneficial for all service daemons with the single "API", but also for non service packages, where a running daemon may want to make a dynamic change to that package.

This is true, but will rarely be a difficult task, achieved by a few lines added to the init.d script.
The overheads are (usually) very small and the advantages can be very significant.

This is not far from the actual implementation in geoip-shell, except I do issue the uci commit command because I don't think that changing system config without committing it is a good idea, and this flash write should only be performed once in application's lifetime on a given device (well, it will recreate it during version updates but that's it). And yes, I do check if it exists already, before adding it.

https://github.com/friendly-bits/geoip-shell/blob/89390be81829f49ca8004ada452103c850510edc/OpenWrt/geoip-shell-owrt-mk-fw-include.tpl#L22

I am not against using UCI, as you can see by looking at that code. I just don't see why it should be mandated. This statement:

Doesn't show why it should be mandated for non-system applications. I agree that in some cases, UCI provides benefits which are essentially "free", requiring no development effort. But these benefits are not universally applicable. At least the 2 applications which I took a significant role in developing do just fine without them. A third application which I'm starting to help with right now is currently using UCI in a suboptimal way, which I will try to improve.

Anyway, this is already very much off-topic.

To get back to the topic, IMO a request to improve parsing and validation in UCI is not unreasonable. People do and will continue to manually edit the config files. This is just a fact of life. Why not put a bit of development effort to better handle the manually edited files?

I don't mind to do some work by myself, but given how hard it is to get core developers to look at PR's, I'm reluctant to put any work into it before hearing that the result of that work is wanted and will be looked at.

I'm not saying it should be mandated. I'm saying it should be used unless a good reason not to is presented for review/discussion.

I am sure that is the case. But how hard would it have been to have the config information in a uci file. I'm not accusing you, but I do know is is far too easy to take the line of least resistance and not bother, particularly if the guidelines/rules whatever are secretly hidden away (like the documentation for the OpenWrt script libraries).

Yes, sort of, but it has brought together some important viewpoints.

Yes, yes and yes!

It could on the one hand, be useful just to provide proper documentation for uci and uci related library scripts, then no PR would be needed.

On the other hand a new UCI helper package/utility would need a PR but will be easier to get accepted as it would have no potential for a ripple effect through who knows what, that patches or extensions to existing packages could have.

I'm happy to give more input and feedback (or even argue a bit more) if you feel it would be useful - here or elsewhere.