Why no mawk for OpenWrt?

Recently I got to compare gawk vs mawk performance when parsing one particularly large CSV file, and was surprised that mawk was about 2x as fast. A bit of research revealed that this is not an accident and mawk is generally much faster. Looking at the binary size, /usr/bin/mawk is 155K while /usr/bin/gawk is 689K. Those are the binaries installed by the Debian packages (mawk is v1.3.4, apparently developed here). Of course, the gawk binary on OpenWrt is smaller, 460K (for the x86 architecture), but this is still 3x larger than mawk, and mawk's size could probably be reduced further by stripping debug info etc.
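For anyone who wants to reproduce this kind of comparison, here is a rough sketch of how it could be timed. It is not the test I actually ran: the synthetic CSV, the row count (`ROWS`), the column-sum program and the `bench` helper are all made up for illustration, and `date +%s` only has one-second resolution, so raise `ROWS` until the differences become visible.

```shell
#!/bin/sh
# Sketch: run the same awk program with every awk that happens to be installed.
# File name, ROWS and the awk program are illustrative assumptions.
ROWS="${ROWS:-200000}"
csv="${TMPDIR:-/tmp}/awk_bench_$$.csv"
trap 'rm -f "$csv"' EXIT

# Generate ROWS lines of "id,value,label" with whatever awk is the default.
awk -v n="$ROWS" 'BEGIN { for (i = 1; i <= n; i++) printf "%d,%d,row%d\n", i, i % 97, i }' > "$csv"

bench() {
    # $1 = awk binary; sums column 2 and reports coarse wall-clock seconds
    start=$(date +%s)
    sum=$("$1" -F, '{ s += $2 } END { print s }' "$csv")
    end=$(date +%s)
    echo "$1: sum=$sum elapsed=$((end - start))s"
}

for impl in mawk gawk awk; do
    # Skip implementations that are not installed on this system
    if command -v "$impl" >/dev/null 2>&1; then
        bench "$impl"
    fi
done
```

On OpenWrt the plain `awk` entry will normally be the busybox applet, so the loop covers all three implementations if they are present.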

So I wonder: why is there no mawk package for OpenWrt?

The default awk implementation on OpenWrt is busybox-awk; everything else is an optional add-on package (which, yes, may be pulled in as a package dependency). None of the default packages generally depend on other awk implementations (beyond busybox-awk).

Similarly to bash vs the defined requirements for a (minimal) POSIX-compliant shell, gawk extends the feature set of POSIX's awk requirements quite a bit. And yes, there are also some things mawk can do which aren't supported by gawk (e.g. random numbers).

In general, most userspace software depending on awk functionality tends to either be content with just about 'any' (POSIXly correct) awk provider or make heavy use of gawk-specific features. So the need to package gawk (for leaf packages using it) is usually glaringly obvious, while for most other uses busybox awk will do. There is little software specifically depending on mawk features, or BSD awk features, or …, though such instances certainly exist.

I'm pretty sure this is the reason why gawk had to be packaged while mawk wasn't: no one (really) specifically needed it. At the same time I'm also confident that mawk would be accepted into the packages feed if anyone wants to maintain it longer term. I see no reason for mawk to be rejected, but I would expect a stern question about whether the additional maintenance burden really is necessary (the answer to this could very well be 'yes').

EDIT: for most common uses on a typical OpenWrt installation, awk performance is not necessarily in the hot path; there typically is no large-scale data crunching/transposing involved. Even in your (presumed) case (processing large adblock lists), this is 'only' a startup/refresh penalty, not a runtime (DNS query) penalty. So in many cases, the size and lower-maintenance aspects prevail.

You can install gcc on your x86 vm and try it out.

I'm sure there is a way to compile it but I don't really want to deal with submitting and maintaining the package for now.

Just for the record, Busybox awk is even slower than gawk, and for some types of tasks it is several times slower. So projects which do awk processing on large lists (all adblockers except maybe AGH, banip, geoip-shell and probably some others) strongly recommend that users install gawk. If mawk were available, it would surely become the go-to awk immediately because of both the speed and the size. But yeah, someone will need to port it and maintain it, and as I wrote above, it won't be me, at least for now. But maybe this will give an idea to someone else.
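To illustrate what "go-to awk" could look like inside such a script, here is a minimal sketch that picks the first awk found in a preference order. The `pick_awk` helper, the order (mawk, then gawk, then the default awk) and the sample domain list are my assumptions for illustration, not what any of the projects mentioned actually ship, and it only makes sense for awk programs that stick to POSIX features.

```shell
#!/bin/sh
# Sketch: prefer a faster awk when one is installed, fall back to the default.
# pick_awk and the preference order are hypothetical.
pick_awk() {
    for cand in mawk gawk awk; do
        if command -v "$cand" >/dev/null 2>&1; then
            echo "$cand"
            return 0
        fi
    done
    return 1
}

AWK=$(pick_awk) || { echo "no awk implementation found" >&2; exit 1; }

# Example use: drop duplicate lines from a (tiny, made-up) domain list,
# keeping the first occurrence of each line.
printf 'example.com\nads.example.net\nexample.com\n' | "$AWK" '!seen[$0]++'
```

The deduplication one-liner uses only POSIX awk features, so it behaves the same whichever implementation gets selected.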

Yes, it is more compact, and it builds with a plain ./configure ; make

root@OpenWrt:/tmp/tmp.bmkhfC/mawk-1.3.4-20260302# ls -l mawk
-rwxr-xr-x    1 root     root        179496 Mar  3 04:47 mawk
root@OpenWrt:/tmp/tmp.bmkhfC/mawk-1.3.4-20260302# ls -l `which gawk`
-rwxr-xr-x    1 root     root        471297 Mar  1 06:51 /usr/bin/gawk
root@OpenWrt:/tmp/tmp.bmkhfC/mawk-1.3.4-20260302# 
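Roughly, the build above boils down to the following sketch. It assumes the mawk source is already unpacked and a compiler and make are installed; the `build_mawk` helper is just a made-up wrapper, and the `strip` step is an optional extra (it was not part of the listing above) to see how much the binary shrinks without debug info.

```shell
#!/bin/sh
# Sketch: build mawk from an already unpacked source tree and show its size.
# build_mawk is a hypothetical helper; pass the source directory as $1.
build_mawk() {
    src_dir=$1
    if [ ! -x "$src_dir/configure" ]; then
        echo "no configure script in $src_dir" >&2
        return 1
    fi
    # Plain autoconf-style build in a subshell so the caller's cwd is untouched
    ( cd "$src_dir" && ./configure && make ) || return 1
    ls -l "$src_dir/mawk"
    # Optional: strip symbols and list the binary again to compare sizes
    if command -v strip >/dev/null 2>&1; then
        strip "$src_dir/mawk" && ls -l "$src_dir/mawk"
    fi
}

# Usage (the path is whatever directory the tarball was unpacked into):
#   build_mawk /tmp/mawk-src
```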

When parsing files locally on OpenWrt, the timing differences fall below what is measurable (gawk leans towards 1 ms, probably because it is a bit larger).
