Ubus open file limit / lots of ubus call processes

rikt · October 10, 2024, 3:24pm

Hi,

Every now and then, I have to reboot my openwrt router as connecting to its wifi becomes an issue. In that case it's no longer possible to log in over SSH, and sometimes also not the web interface.

When I am able to log in on the web interface, I notice there are a lot of ubus call processes running.

I've rebooted the router and after a few days there are already more than 400 ubus processes running. They all seem to be the same call:

ubus call network.interface notify_proto { "action": 0, "link-up": true, "data": { "passthru": "001700202a0218000100000000000000004300022a0218000100000000000000004300010018000c0774656c656e657402626500001f00202a0218000100000200000000000000032a021800010000020000000000000001" }, "keep": false, "ip6addr": [ { "ipaddr": "2a02:1810:a59b:ef00:413e:501e:9bd9:a839", "mask": "128", "preferred": 68279, "valid": 241079, "offlink": true }, { "ipaddr": "2a02:1810:a59b:ef00:ce40:d0ff:fe49:8707", "mask": "64", "preferred": 59288, "valid": 232088, "offlink": true } ], "routes6": [ { "target": "::", "netmask": "0", "gateway": "fe80::5667:51ff:fe3c:fa01", "metric": 384, "valid": 1800, "source": "2a02:1810:a59b:eff0::\/60" }, { "target": "::", "netmask": "0", "gateway": "fe80::5667:51ff:fe3c:fa01", "metric": 384, "valid": 1800, "source": "2a02:1810:a59b:ef00:413e:501e:9bd9:a839\/128" }, { "target": "::", "netmask": "0", "gateway": "fe80::5667:51ff:fe3c:fa01", "metric": 384, "valid": 1800, "source": "2a02:1810:a59b:ef00:ce40:d0ff:fe49:8707\/64" }, { "target": "2a02:1810:a59b:ef00::", "netmask": "64", "metric": 256, "valid": 232088 }, { "target": "2a02:1810:a59b:ef00::", "netmask": "64", "gateway": "fe80::5667:51ff:fe3c:fa01", "metric": 512, "valid": 259200 } ], "ip6prefix": [ "2a02:1810:a59b:eff0::\/60,68279,241079" ], "dns": [ "2a02:1800:100::43:2", "2a02:1800:100::43:1" ], "dns_search": [ "telenet.be" ], "interface": "WAN6" }

All processes seem to be similar. Any idea on how to debug where those processes are coming from? They all have pid 1 as their parent pid. Which process could be spawning those processes, and why?

Regards,
Rik

Cthulhu88 · October 10, 2024, 6:26pm

It's coming from netifd, but this seems like yet another case of "My ISP doesn't know how to properly configure IPv6".

Try disabling wan6 and see if the problem goes away.

This is PPPoE, right?

rikt · October 10, 2024, 6:41pm

Hi,

No, the openwrt router just does DHCP to the CPE (router) of my ISP.

I need IPv6 so can't just disable it. Any idea what triggers this?

Regards,
Rik

Cthulhu88 · October 10, 2024, 6:52pm

As I've mentioned above, it's netifd through https://github.com/openwrt/netifd/blob/master/scripts/netifd-proto.sh#L318-L323

Now what exactly is triggering it like that within netifd, I don't know (lack of information to make a better judgment).
From what you posted, mentioning they are all like the one you posted, I am assuming this is another case of your ISP having a badly configured IPv6 network (and likely spamming you with it).

Also, like I suggested, you will have to disable wan6 to troubleshoot the issue. If the issue goes away, plug your PC directly into the modem and start capturing DHCPv6 and RA exchanges with something like Wireshark.

Cthulhu88 · October 10, 2024, 7:05pm

My best guess: proto_send_update "$INTERFACE" from /lib/netifd/dhcpv6.script, which is invoked by odhcp6c.

Can you post the results of continuously running pgrep -af odhcp6c | wc -l ?

rikt · October 10, 2024, 9:16pm

Hi,

The pgrep in a loop doesn't show much. It only lists a single process.

I've checked the start time of all the ubus commands and I've noticed that a lot of them have started at the same time, with several hours between them.

Here are the start times of all the processes and their frequency. They only started at 7:39 as that's when I pkilled all of them.

So they come in burst. But why do they hang? All the processes have pid 1 as their parent pid, so their real parent probably already terminated and they were re-parented to pid 1.

I've added a "logger" command to dhcpv6.script so I can trace when it's called.

Regards,
Rik

rikt · October 10, 2024, 9:43pm

The script is called every few minutes. I've also added "before" and "after" logger statements to the proto_send_update in that file. The before statements are logged by syslog, but the after statements are never logged. So this indeed seems to trigger the processes.

Unfortunately that still doesn't help me understand why proto_send_update would hang.

Regards,
Rik

Cthulhu88 · October 10, 2024, 9:55pm

Add a logger just above this line case "$2" in and log the value of $2.

rikt · October 11, 2024, 5:45am

Hi,

I've added the additional logger line.

To make sure my router would remain workable, I killed all ubus call processes. In the past the system would prevent me from logging in at some point.

So far, the processes are still being called but don't result in a process that remains. The parameter 2 is always ra-updated so far.

I will update when the ubus call ... processes start to increase again.

Regards,
Rik

Cthulhu88 · October 11, 2024, 5:00pm

How often are those ra-updated calls happening? They happen when odhcp6c receives new RA (router advertisement) from your ISP, but there are problems; see my own problem for reference: Odhcp6c issue with my ISP's configuration

If the RAs you're receiving aren't a mess like mine, you can delay these by setting this /etc/config/network, wan6 interface:

option ra_holdoff '180'

180 being the number of seconds to ignore identical RAs. Set it to whatever you feel like.

rikt · October 11, 2024, 5:04pm

Hi,

They happen about every one to 3 minutes.

So far there are no processing hanging. So it's not always an issue that they are called so frequently, but sometimes the ubus calls seem to hang. What could cause ubus to hang at that point?

I will look into your issue and comment on the config change.

Regards,
Rik

rikt · October 15, 2024, 6:29pm

Hi,

Issue happens again frequently. Although the script is called frequently, it does not explain why suddenly ubus calls starts to hang.

If I run ubus call network.interface.WAN6 status manually, it also hangs and can not be aborted with ctrl-c.

Running the same command for other interfaces also hangs.

Running other ubus calls still works, for example ubus call system info works fine. So it seems the service that ubus tries to call for network.interface calls hangs at some point. Is this the netifd daemon?

Regards,
Rik

Cthulhu88 · October 16, 2024, 9:02pm

Yes, netifd is the one that handles that call.

Should've probably asked early, but what version of OpenWrt are you running?

rikt · October 17, 2024, 5:24am

Hi,

I'm running 23.05.5, which should be the latest stable version.

Regards,
Rik

Cthulhu88 · October 17, 2024, 5:26am

Try forcing a deadlock and then immediately check logread | grep netifd, see if anything pops up.

rikt · October 19, 2024, 11:25am

Hi,

I haven't found a way to cause a deadlock yet.

I've also noticed that running logread also hangs. I also log to a remote syslog server (which seems to be done using logread) and that is still logging entries.

When I run ubus monitor in a terminal and then execute logread, it logs:

-> 77b01914 #77b01914          hello: {}
<- 77b01914 #00000000         lookup: {"objpath":"log"}
-> 77b01914 #00000000           data: {"objpath":"log","objid":-288538791,"objtype":1535440382,"signature":{"read":{"lines":5,"stream":7,"oneshot":7},"write":{"event":3}}}
-> 77b01914 #00000000         status: {"status":0}
<- 77b01914 #eecd3f59         invoke: {"objid":-288538791,"method":"read","data":{"stream":true,"oneshot":true}}
-> f33b77fc #77b01914         invoke: {"objid":-288538791,"method":"read","data":{"stream":true,"oneshot":true},"user":"root","group":"root"}

Running ubus call log read also hangs, so it seems some of the ubus services always hang while others still work.

Regards,
Rik

rikt · October 19, 2024, 11:31am

Hi,

After running strace -p <PID OF logd process>, a log of older logs started to stream to my syslog server, and this also unblocked the logread command.

The logread command (and the ubus call log read) now work again.

Regards,
Rik