Mikrotik RBM33: too many udhcpc / odhcp6c processes being started

I am trying to track down an issue affecting many OpenWrt installations on Mikrotik RBM33 boards in the field. The OpenWrt version is rather old: 19.07.8. Each router has a Quectel EC200T 4G module that provides the WAN connection. The customer has SIM cards with private static IPs installed in them, and monitors a remote device connected to the LAN from their monitoring system.
The issue is that every time the cellular connection reconnects, a new udhcpc process and a new odhcp6c process are spawned. This keeps going until the router eventually stops responding and has to be power-cycled.
I am attaching a screenshot showing the excess processes.

Can someone tell me why this is happening?

19.07? omg, that's old. Maybe you hit a very old bug in some very old software.

I don't even remember which bug tracker this project used back then.

Do you have a lab with a replica of the live setup, where you can upgrade to 23.05 and see if that works better, before any potential roll-out?

That would be my first suggestion.

Other than that, since I really have no idea :slight_smile:, best I can do is google it for you..

There's a call here that looks like it might select command-line arguments for running udhcpc, and something here that looks like a companion for killing that process. Similarly, looks like command-line arguments for odhcp6c might be selected here with a companion to kill here.

I think the command plus arguments end up being wrapped in JSON by a shell function here and here. The resulting JSON is thrown at _proto_notify(), which seems to roll it into a ubus call that probably ends up looking something like this:

root@XNET:~# ubus call network.interface notify_proto '{ "action": 1, "command": ["blah", "blah"], "interface": "eth0.2" }'
root@XNET:~# ubus call network.interface notify_proto '{ "action": 2, "signal": blah, "interface": "eth0.2" }'

There's something looking like a notify_proto ubus call handler here.

Seems like "action 1" (run) is forwarded to proto_shell_run_command() and "action 2" (kill) is forwarded to proto_shell_kill_command().

The kill code just looks up the PID in some struct and sends it a signal via kill().

The run code calls into netifd_start_process() to fork the relevant daemon and record its PID.
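As a rough illustration of that run/kill pairing (not netifd's actual code), here is a minimal shell sketch. The names start_tracked, kill_tracked and tracked_pid are made up for this example; the point is only that a daemon can be killed later solely if whoever started it still remembers its PID.

```shell
#!/bin/sh
# Hypothetical stand-in for the bookkeeping described above:
# "run" starts a process and records its PID, "kill" signals that PID.
tracked_pid=""

start_tracked() {
    "$@" &            # start the command in the background
    tracked_pid=$!    # remember its PID for a later kill
}

kill_tracked() {
    # Nothing recorded means nothing can be killed.
    [ -n "$tracked_pid" ] || return 1
    kill -s "${1:-TERM}" "$tracked_pid"
}

# Demo: start a placeholder "daemon", then stop it again.
start_tracked sleep 60
echo "started sleep as PID $tracked_pid"
kill_tracked TERM
wait "$tracked_pid" 2>/dev/null || true
```

If tracked_pid were overwritten or lost between the two calls, the first process would simply keep running, which is the kind of leftover the screenshot seems to show.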

There's also some more logic to kill processes (maybe when interfaces are no longer referenced and such?), see the rest of the netifd code.

The struct that netifd uses to keep track of a process is here.

My best guess would be... maybe the netifd process got killed and that is why it's losing track of daemons? Just a completely wild guess though.

You could take a look at the logs with the logread command and see if anything like that sticks out. Perhaps also look at the Parent-PID of each dhcp daemon and see if the daemons were spawned by the same netifd process or not (PPID should be in /proc/<pid>/stat in field no. 4).
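Something like the following could do the PPID check in one go. It is only a sketch: the helper name ppid_of is made up here, and it assumes pidof is available (as it normally is with BusyBox).

```shell
#!/bin/sh
# Print the parent PID of the given PID, read from /proc/<pid>/stat.
ppid_of() {
    stat=$(cat "/proc/$1/stat" 2>/dev/null) || return 1
    # Field 2 (the command name) sits in parentheses and may contain
    # spaces, so drop everything up to the last ") " before splitting.
    rest=${stat##*) }
    # $rest now starts with field 3 (state), followed by field 4 (PPID).
    set -- $rest
    echo "$2"
}

# List every DHCP client daemon together with its parent PID.
for name in udhcpc odhcp6c; do
    for pid in $(pidof "$name" 2>/dev/null); do
        echo "$name pid=$pid ppid=$(ppid_of "$pid")"
    done
done

# For comparison: the PID(s) of netifd itself.
echo "netifd pid(s): $(pidof netifd 2>/dev/null)"
```

If all the daemons show the current netifd PID as their parent, netifd itself is spawning them without cleaning up. If older ones show PPID 1, their original parent has probably exited.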

Fingers crossed that someone with more experience chimes in here! :crossed_fingers: