Solved: Default_postinst hangs

Hi;

I am having issues with a package postinst script that hangs a much larger sequence of install commands, run from a script under LuCI (nginx).

There is no postinst-pkg script

/usr/lib/opkg/info/$pkg_name.postinst:
#!/bin/sh
[ "${IPKG_NO_SCRIPT}" = "1" ] && exit 0
. ${IPKG_INSTROOT}/lib/functions.sh
default_postinst $0 $@

The hang is due to password locks:

ps -Af | grep lock:
root     20226     1  0 Oct09 ?        00:00:00 lock /var/lock/group
root     20230     1  0 Oct09 ?        00:00:00 lock /var/lock/passwd
root     20240 20221  0 Oct09 ?        00:00:00 lock /var/lock/passwd

ps -Af | grep 20221
root      7433  5768  0 07:28 pts/0    00:00:00 grep 20221
root     20221 20220  0 Oct09 ?        00:00:00 /bin/sh ///usr/lib/opkg/info/$PKG_NAME.postinst configure
root     20240 20221  0 Oct09 ?        00:00:00 lock /var/lock/passwd

It appears that postinst is calling /lib/functions.sh:default_postinst, which hangs due to the pre-existing locks held by root. As far as I can see, there is no reason to call /lib/functions.sh:default_postinst.
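
For reference, the hang matches the user/group handling that default_postinst triggers, which serialises /etc/passwd and /etc/group edits via /bin/lock. Below is only an illustration of that locking pattern, not the actual code in /lib/functions.sh:

# illustration only: roughly how a stale lock makes the install hang
add_user_sketch() {
	lock /var/lock/passwd      # blocks until the lock is free; hangs forever on a stale lock
	echo "$1:x:$2:$2::/var/run/$1:/bin/false" >> ${IPKG_INSTROOT}/etc/passwd
	lock -u /var/lock/passwd   # release; never reached while the stale lock is held
}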

What I would like is:
a) Has this behavior been seen and corrected before? (I'm at r50108, slowly migrating to trunk.)
b) An explanation of the logic behind this behavior, and whether it is normal.
c) What, if anything, I can do in the Makefile to avoid calling /lib/functions.sh:default_postinst, or to change postinst to just "#!/bin/sh; exit 0".

Thanks; Bill

First of all: can you replicate the same issue with uhttpd?

Hi Jow; the issue showed up with nginx, so no.

Then the problem is likely with the interaction of nginx and LuCI. Since nginx does not natively support CGI I assume you use some kind of wrapper which might or might not be the culprit. Maybe something leaks FDs across forks, leading to the observed behaviour.

To answer your actual questions:

a) There have been various issues with nginx that required workarounds in LuCI, so maybe it is addressed by now.

b) The behaviour is not normal.

c) You can't influence that from makefiles.

Thanks Jow; I will check the LuCI git issue log and possibly update LuCI to trunk. If I find a non-hacky solution, I will report back.

Regards;
Bill

So, the best/pragmatic approach was to create custom package.mk and opkg.mk files for inclusion in the problematic Makefile. This generates a postinst script that just returns, eliminating the lock problem.
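
For illustration, the postinst generated with the modified .mk files reduces to a no-op along these lines:

#!/bin/sh
# no-op postinst: skip default_postinst entirely, so no /var/lock/passwd
# or /var/lock/group lock is ever taken during install
exit 0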

My reasons are:
a) None of the luci/nginx packages have any reported issues that seem pertinent.
b) Not enough info to debug yet; additional problems may show up later, allowing more insight.
c) My working revision (r50108) is throwaway, a stepping stone to trunk (I have many custom packages such as xorg and vmware workstation).
d) It allows time for trunk to catch up to this bug.

Regards;
Bill

To add a bit more context for the above... default_postinst() is supposed to always be called, whether or not a package-specific script exists, since it performs key functions. By bypassing it, you also skip running any uci-defaults and skip enabling and starting the service. Hacking around it is OK to make other progress, but the root cause is what seems worrisome.
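
Roughly, default_postinst does something along these lines (heavily simplified; the real code in /lib/functions.sh also handles user/group creation and only touches the files shipped by the package being installed):

default_postinst_sketch() {
	# execute and then remove any uci-defaults scripts the package installed
	for f in /etc/uci-defaults/*; do
		[ -f "$f" ] && ( . "$f" ) && rm -f "$f"
	done
	# enable and start the package's init scripts
	for i in /etc/init.d/*; do
		"$i" enable
		"$i" start
	done
	return 0
}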

Can you emulate the "much larger sequence of install commands" you mentioned before, to determine what earlier package install is taking the locks? Does ps tell you the parent process of the one holding the first lock (i.e. PID 20230 in your first post)?

I have already put a wrapper around /bin/lock which dumps the parent process and command line to a logfile to determine just that. The first two (root) locks came from ash; I'm not sure exactly where in the command sequence.
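
For anyone wanting to reproduce the diagnostic, a wrapper along these lines would do it (the /bin/lock.real path and the logfile name are just illustrative choices, nothing OpenWrt ships):

#!/bin/sh
# diagnostic wrapper installed as /bin/lock (illustration only);
# the original binary is assumed to have been moved aside to /bin/lock.real
LOG=/tmp/lock-wrapper.log
PARENT_CMD=$(tr '\0' ' ' < /proc/$PPID/cmdline 2>/dev/null)
echo "$(date) pid=$$ ppid=$PPID parent='$PARENT_CMD' args='$*'" >> "$LOG"
exec /bin/lock.real "$@"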

When I revisit, I will make /bin/lock (wrapper) stall to answer the question.

Thanks for the explanation and for pointing out that default_postinst is the norm and that the REAL problem is the pre-existing locks.

Regards;
Bill

So, I revisited this issue, reverted the package.mk changes (so default_postinst runs as usual), and monitored the locks.

My custom script under LuCI demands a substantial amount of work from nginx (2 threads). It was also stalling often.

Turns out the "problem" was /etc/uwsgi.conf, in particular "limit-as = 200". Changing it to "1000" eliminated both the stalls and the locks. My /bin/lock wrapper script now reports no locks.

I suggest that uwsgi.conf be changed in two areas:
limit-as = 1000
logger = file:/var/log/uwsgi.log (to reduce syslog clutter)

I speculate the following chain of events:
uwsgi thread a hit resource limits, locked
uwsgi thread b hit resource limits, locked
deadlock

Regards;
Bill


Nice detective work!

This seems to be the sort of thing to easily suck up a huge amount of someone else's troubleshooting time :confused:, so perhaps also raise an issue/PR in the package repo and see what the maintainer thinks about your suggestion?

Best regards...

done: https://github.com/openwrt/packages/issues/7250
