Hotplug/iface script gets not executed on boot up

Add this line before the the first echo command, reboot, and see if there is anything in that file.

exec >>/tmp/exec.log 2>&1

Also, remove kmsg and use /tmp/if_dbg to monitor the script.

UPDATE: I do not know the sequence of things, but it is possible that kmsg is not always fully setup yet by the time the hot plug scripts are executed, so using a file is a more reliable solution.

I did some further tests.

Installed a new and clean OpenWRT instance in VMware.
Only changed its static lan IP address to DHCP.

Created a test script /root/test.sh

#!/bin/sh
sleep 30

if logread | grep -F "Reloading firewall"; then
        reboot
fi

Added a call for this script to /etc/rc.local.

This script issues a reboot after 30s if there is "Reloading firewall" firewall detected in the log, which gets created by the /etc/hotplug.d/iface/20-firewall script which came with OpenWRT.

This was running for more than an hour and it never stopped rebooting.
Means there was no problem running the hotplug scrips.

Then I installed these modules, but changed nothing else. Same modules are installed on the other instance which causes the problems.
base-files busybox comgt-ncm dnsmasq dropbear e2fsprogs flock htop ip6tables iptables-mod-dnetmap iptables-mod-nat-extra kernel kmod-3c59x kmod-8139too kmod-bnx2 kmod-button-hotplug kmod-e100 kmod-e1000 kmod-e1000e kmod-igb kmod-ipt-offload kmod-natsemi kmod-ne2k-pci kmod-nf-nathelper-extra kmod-pcnet32 kmod-r8169 kmod-sis900 kmod-tg3 kmod-usb-net-rndis kmod-via-rhine kmod-via-velocity logd luci-app-wol luci-proto-3g luci-ssl miniupnpc mkf2fs mtd natpmpc ntpclient odhcp6c odhcpd-ipv6only openvpn-openssl partx-utils picocom ppp ppp-mod-pppoe ppp-mod-pptp px5g-mbedtls uci urandom-seed urngd usb-modeswitch usbutils

I did not change anything in the configuration, but after the installation of these modules my test script stopped after a couple reboots.
Means to me that probably one of these packages causes a problem to the execution of the iface hotplug script.

2 Likes

Now, you can start bisecting the list to foggier out which package makes this happen.

1 Like

The reason messages could be missing is because the logger might not be fully operational yet. The more packages you install, the more work the router has to do on startup, the higher probability that the logger starts later sometimes.
The best approach is to use flock and write in your own file: the file system is definitely up and operational when the scripts are invoked.

1 Like

this might help you... replace / make a new version of /sbin/hotplug-call

sbin-hotplugcall-dbg
#!/bin/sh
export HOTPLUG_TYPE="$1"
. /lib/functions.sh
PATH="/usr/sbin:/usr/bin:/sbin:/bin"
LOGNAME=root
USER=root
export PATH LOGNAME USER
export DEVICENAME="${DEVPATH##*/}"

filterset() {
	while read THING; do
		case $THING in
			"PS"*|"LINENO"*|"HOSTNAME"*) continue; ;;
			"USER"*|"TERM"*|"SHLVL"*) continue; ;;
			"PPID"*|"PATH"*|"OPTIND"*) continue; ;;
			"IFS="*|"N="*|"'") continue; ;;
			"PWD="*|"HOME="*|"_C"*) continue; ;;
			"LOGNAME="*|"hlogfile"*|"_C"*) continue; ;;
			*) echo $THING; ;;
		esac
	done
}


STMP=`date +%H%M%S`
hlogfile="/tmp/HOTPLUG-sbin-hotplug-call-$HOTPLUG_TYPE"
hlogfilec="/tmp/HOTPLUG-sbin-hotplug-call-cumulative"

STMP=`date +%H%M%S`
if ! ps w | grep '/sbin/logd' | grep -v 'grep'; then STMP="${STMP}-nologd"; fi

echo "sbin-hotplugcall-$STMP $HOTPLUG_TYPE" > /dev/kmsg

case $HOTPLUG_TYPE in
	ntp) :; ;;
	firewall) echo `date +%H%M%S` ${*} ${DEVPATH##*/} $ACTION INTERFACE:$INTERFACE >> $hlogfilec; ;;
	net) echo `date +%H%M%S` ${*} $ACTION $DEVICENAME $DEVPATH >> $hlogfilec; ;;
	iface) echo `date +%H%M%S` ${*} $ACTION INTERFACE:$INTERFACE $DEVPATH >> $hlogfilec; ;;
	*) set | filterset >> $hlogfile; ;;
esac

echo /sys/$DEVPATH >> $hlogfile.$DEVICENAME.alluevent
cat /sys/$DEVPATH/uevent >> $hlogfile.$DEVICENAME.alluevent

if [ \! -d /etc/hotplug.d/$1 ]; then
	echo "sbin-hotplugcall-$STMP $HOTPLUG_TYPE NODIR-/etc/hotplug.d/$1" > /dev/kmsg
	echo "sbin-hotplugcall-$STMP $HOTPLUG_TYPE NODIR-/etc/hotplug.d/$1" >> $hlogfilec
fi

if [ \! -z "$1" -a -d /etc/hotplug.d/$1 ]; then
	for script in $(ls /etc/hotplug.d/$1/* 2>&-); do (
		[ -f $script ] && . $script
	); done
fi
1 Like

I just had a read through this thread and coincidentally something I posted last night may be helpful here: see this Libudev package

In particular, the udevadm monitor command will print out the real kernel uevents as they happen and you can use it to see if the uevent itself is being generated or not. Just start it up with a procd script very early in the boot process and pipe the output to a file.

If you use this in combination with the hotplug debug script @anon50098793 posted above, you'll get a detailed view of the events being generated and being caught by the hotplug handler that should help to debug a bit more transparently.

3 Likes

I was able to trace that issue back to the module ntpclient.
As soon as I install this module, the problem with the hotplug script occours.

This is the comparision of the log files, HOTPLUG-sbin-hotplug-call-firewall..alluevent was created only on R4.


R6 has the ntpclient package installed and it does not run some of the hotplug scrips.
All other log files are identical.

It the /etc/hotplug.d/iface/20-ntpclient script which causes the problem.
If I add there exit 0 on the beginn of the script, then the issue stops.

Solution:
Remove ntpclient since it is already included in the current OpenWRT version.

On R5 ist ntpclient not installed.

root@OpenWrt-R5:~# ps | grep hotplug
 2673 root      1044 S<   /usr/sbin/ntpd -n -N -S /usr/sbin/ntpd-hotplug -p 0.openwrt.pool.ntp.org -p 1.openwrt.pool.ntp.org -p 2.openwrt.pool.ntp.org -p 3.openwrt.pool.ntp.org
 2724 root      1040 R    grep hotplug

On R6 is ntpclient installed.

root@OpenWrt-R6:~# ps | grep hotplug
 2477 root      1088 S    /bin/sh /sbin/hotplug-call iface
 2517 root      1108 S    /bin/sh /sbin/hotplug-call iface
 2633 root      1040 S<   /usr/sbin/ntpd -n -N -S /usr/sbin/ntpd-hotplug -p 0.openwrt.pool.ntp.org -p 1.openwrt.pool.ntp.org -p 2.openwrt.pool.ntp.org -p 3.openwrt.pool.ntp.org
 2764 root      1040 R    grep hotplug

It seems to me as if the hotplug-call is hanging.

Even if there is a problem within the 20-ntpclient script, why does it suppress the execution of other scrips?

That I do not know. You would have to debug the scripts to figure out why.

The process tree shows that there a two hotplug-call scripts in a row.

|- 2191 /sbin/netifd
|   |- 2421 /bin/sh /sbin/hotplug-call iface
|   |   L- 2477 /bin/sh /sbin/hotplug-call iface
|   |       L- 2491 /usr/sbin/ntpclient -c 1 -p 123 -i 2 -h 0.openwrt.pool.ntp.org
|   L- 2437 udhcpc -p /var/run/udhcpc-br-lan.pid -s /lib/netifd/dhcp.script -f -t 0 -i br-lan -x hostname:OpenWrt-R6 -C -O 121

Does this mean, this is is a recursive call of the hotplug script? Is this the way how it should be?

uci show system | grep ntp
netstat -lnp | grep ntp
root@OpenWrt-R5:~# uci show system | grep ntp
system.ntp=timeserver
system.ntp.enabled='1'
system.ntp.enable_server='0'
system.ntp.server='0.openwrt.pool.ntp.org' '1.openwrt.pool.ntp.org' '2.openwrt.pool.ntp.org' '3.openwrt.pool.ntp.org'

root@OpenWrt-R5:~# netstat -lnp | grep ntp

root@OpenWrt-R5:~# ./pstree.sh
1 /sbin/procd
|- 1811 /sbin/ubusd
|- 1812 /sbin/askfirst /usr/libexec/login.sh
|- 1813 /bin/ash --login
|- 2041 /sbin/logd -S 64
|- 2072 /sbin/rpcd -s /var/run/ubus.sock -t 30
|- 2135 /usr/sbin/dnsmasq -C /var/etc/dnsmasq.conf.cfg01411c -k -x /var/run/dnsmasq/dnsmasq.cfg01411c.pid
|- 2168 /usr/sbin/dropbear -F -P /var/run/dropbear.1.pid -p 22 -K 300 -T 3
|   |- 2599 /usr/sbin/dropbear -F -P /var/run/dropbear.1.pid -p 22 -K 300 -T 3
|   |   L- 2600 -bash
|   L- 2715 /usr/sbin/dropbear -F -P /var/run/dropbear.1.pid -p 22 -K 300 -T 3
|       L- 2716 -bash
|           L- 2913 /bin/sh ./pstree.sh
|- 2223 /sbin/netifd
|   |- 2478 /bin/sh /sbin/hotplug-call iface
|   |   L- 2518 /bin/sh /sbin/hotplug-call iface
|   |       L- 2526 /usr/sbin/ntpclient -c 1 -p 123 -i 2 -h 0.openwrt.pool.ntp.org
|   L- 2493 udhcpc -p /var/run/udhcpc-br-lan.pid -s /lib/netifd/dhcp.script -f -t 0 -i br-lan -x hostname:OpenWrt-R5 -C -O 121
|- 2260 /usr/sbin/odhcpd
|- 2320 /usr/sbin/uhttpd -f -h /www -r OpenWrt-R5 -x /cgi-bin -t 60 -T 30 -k 20 -A 1 -n 3 -N 100 -R -p 0.0.0.0:80 -p [::]:80
L- 2641 /usr/sbin/ntpd -n -N -S /usr/sbin/ntpd-hotplug -p 0.openwrt.pool.ntp.org -p 1.openwrt.pool.ntp.org -p 2.openwrt.pool.ntp.org -p 3.openwrt.pool.ntp.org

root@OpenWrt-R5:~# kill 2526
After that it did execute the firewall hotplug script,

[  249.441107] sbin-hotplugcall-122148 iface
[  249.456593] sbin-hotplugcall-122148 firewall
[  249.458356] sbin-hotplugcall-122148 firewall NODIR-/etc/hotplug.d/firewall

but this does not happen all the time I run the kill command.

root@OpenWrt-R5:~# ./pstree.sh
1 /sbin/procd
|- 1811 /sbin/ubusd
|- 1812 /sbin/askfirst /usr/libexec/login.sh
|- 1813 /bin/ash --login
|- 2041 /sbin/logd -S 64
|- 2072 /sbin/rpcd -s /var/run/ubus.sock -t 30
|- 2135 /usr/sbin/dnsmasq -C /var/etc/dnsmasq.conf.cfg01411c -k -x /var/run/dnsmasq/dnsmasq.cfg01411c.pid
|- 2168 /usr/sbin/dropbear -F -P /var/run/dropbear.1.pid -p 22 -K 300 -T 3
|   |- 2599 /usr/sbin/dropbear -F -P /var/run/dropbear.1.pid -p 22 -K 300 -T 3
|   |   L- 2600 -bash
|   L- 2715 /usr/sbin/dropbear -F -P /var/run/dropbear.1.pid -p 22 -K 300 -T 3
|       L- 2716 -bash
|           L- 3308 /bin/sh ./pstree.sh
|- 2223 /sbin/netifd
|   L- 2493 udhcpc -p /var/run/udhcpc-br-lan.pid -s /lib/netifd/dhcp.script -f -t 0 -i br-lan -x hostname:OpenWrt-R5 -C -O 121
|- 2260 /usr/sbin/odhcpd
|- 2320 /usr/sbin/uhttpd -f -h /www -r OpenWrt-R5 -x /cgi-bin -t 60 -T 30 -k 20 -A 1 -n 3 -N 100 -R -p 0.0.0.0:80 -p [::]:80
|- 2641 /usr/sbin/ntpd -n -N -S /usr/sbin/ntpd-hotplug -p 0.openwrt.pool.ntp.org -p 1.openwrt.pool.ntp.org -p 2.openwrt.pool.ntp.org -p 3.openwrt.pool.ntp.org
L- 3101 /usr/sbin/ntpclient -i 600 -s -l -D -p 123 -h 1.openwrt.pool.ntp.org

After I have killed all the ntpclients

root@OpenWrt-R5:~# netstat -lnp | grep ntp
udp        0      0 0.0.0.0:49476           0.0.0.0:*                           2640/ntpd
root@OpenWrt-R5:~# /usr/sbin/ntpclient -c 1 -p 123 -i 2 -h 0.openwrt.pool.ntp.org
44013 45069.716   11032.0     66.1    3724.2  24566.7         0

root@OpenWrt-R5:~# /usr/sbin/ntpclient -c 1 -p 123 -i 2 -h 0.openwrt.pool.ntp.org
The second call hangs, I have to cancel it via Ctl+c.
If I call it a couple times in a row, then it stucks and this is probably what happens in the hotplug script and what causes the issue.

In general, for what is the ntpclient actually required?
So far as I have seen has the OpenWRT system 19.07.3 an ntp client already included.
I did setup my router with a setup script created a couple years ago, maybe was ntp not included in older versions of OpenWRT and this is the reason why the script installs ntpclient?
It does not help to set system.ntp.enabled='0', means it is not a problem to run both at the same time. Even if I kill all ntp processes and if I start a couple ntpclient commands in a row, then it stucks.

So what is then wrong with the ntpclient command that causes it to stuck?

Are you using DoT or DoH for DNS?

I am sure you can google it, but I do not have it on my router and never needed. It is not needed to keep the router system time in sync.

Not that I am aware of, I just installed a fresh OpenWRT system and changed its LAN interface to DHCP. After that I installed the ntpclient and the problem occurred.

opkg list-installed | grep ntp returned nothing on my router. It is up to you to decide if you need the ntpclient, which seems to be super old.

ntpclient - 2015_365-2 - NTP client for setting system time from NTP servers

The default ntpd from busybox seems to be doing the job for me:

ls -la /usr/sbin/ntpd /usr/sbin/ntpd-hotplug
lrwxrwxrwx    1 root     root            17 Jul  1 11:07 /usr/sbin/ntpd -> ../../bin/busybox
-rwxr-xr-x    1 root     root            45 Jun 28 08:37 /usr/sbin/ntpd-hotplug

The ntpclient hotplug script will block all subsequent scripts until the initial NTP client sync completed. Possibly there's a also a bug in this script and the ntpclient process never returns or only times out way later. Arguably that is a bug that should be adressed in the ntpclient package itself. Maybe it does not cope well with missing default routes - could be that it requires a very long time to time out.

To workaround the problem, you can prefix your 99-netmap script with something <= 20 to ensure it runs before the ntpclient hotplug. Even 20-netmap will do since 20-ne < 20-nt.

1 Like

I will remove ntpclient on my routers since it is not required anymore to sync the time.

Just wondering if ntpclient should get removed from OpenWRT in general since it is not required and it could cause problems to other users as well.
To not run some hotplug scripts like firewall might cause security issues.

Shouldn't it run then also 20-firewall before 20-ntpclient?

Yes, it should.