Ubus open file limit / lots of ubus call processes

Hi,

Every now and then, I have to reboot my openwrt router as connecting to its wifi becomes an issue. In that case it's no longer possible to log in over SSH, and sometimes also not the web interface.

When I am able to log in on the web interface, I notice there are a lot of ubus call processes running.

I've rebooted the router and after a few days there are already more than 400 ubus processes running. They all seem to be the same call:

ubus call network.interface notify_proto { "action": 0, "link-up": true, "data": { "passthru": "001700202a0218000100000000000000004300022a0218000100000000000000004300010018000c0774656c656e657402626500001f00202a0218000100000200000000000000032a021800010000020000000000000001" }, "keep": false, "ip6addr": [ { "ipaddr": "2a02:1810:a59b:ef00:413e:501e:9bd9:a839", "mask": "128", "preferred": 68279, "valid": 241079, "offlink": true }, { "ipaddr": "2a02:1810:a59b:ef00:ce40:d0ff:fe49:8707", "mask": "64", "preferred": 59288, "valid": 232088, "offlink": true } ], "routes6": [ { "target": "::", "netmask": "0", "gateway": "fe80::5667:51ff:fe3c:fa01", "metric": 384, "valid": 1800, "source": "2a02:1810:a59b:eff0::\/60" }, { "target": "::", "netmask": "0", "gateway": "fe80::5667:51ff:fe3c:fa01", "metric": 384, "valid": 1800, "source": "2a02:1810:a59b:ef00:413e:501e:9bd9:a839\/128" }, { "target": "::", "netmask": "0", "gateway": "fe80::5667:51ff:fe3c:fa01", "metric": 384, "valid": 1800, "source": "2a02:1810:a59b:ef00:ce40:d0ff:fe49:8707\/64" }, { "target": "2a02:1810:a59b:ef00::", "netmask": "64", "metric": 256, "valid": 232088 }, { "target": "2a02:1810:a59b:ef00::", "netmask": "64", "gateway": "fe80::5667:51ff:fe3c:fa01", "metric": 512, "valid": 259200 } ], "ip6prefix": [ "2a02:1810:a59b:eff0::\/60,68279,241079" ], "dns": [ "2a02:1800:100::43:2", "2a02:1800:100::43:1" ], "dns_search": [ "telenet.be" ], "interface": "WAN6" }

All processes seem to be similar. Any idea on how to debug where those processes are coming from? They all have pid 1 as their parent pid. Which process could be spawning those processes, and why?

Regards,
Rik

It's coming from netifd, but this seems like yet another case of "My ISP doesn't know how to properly configure IPv6".

Try disabling wan6 and see if the problem goes away.

This is PPPoE, right?

Hi,

No, the openwrt router just does DHCP to the CPE (router) of my ISP.

I need IPv6 so can't just disable it. Any idea what triggers this?

Regards,
Rik

As I've mentioned above, it's netifd through https://github.com/openwrt/netifd/blob/master/scripts/netifd-proto.sh#L318-L323

Now what exactly is triggering it like that within netifd, I don't know (lack of information to make a better judgment).
From what you posted, mentioning they are all like the one you posted, I am assuming this is another case of your ISP having a badly configured IPv6 network (and likely spamming you with it).

Also, like I suggested, you will have to disable wan6 to troubleshoot the issue. If the issue goes away, plug your PC directly into the modem and start capturing DHCPv6 and RA exchanges with something like Wireshark.

My best guess: proto_send_update "$INTERFACE" from /lib/netifd/dhcpv6.script, which is invoked by odhcp6c.

Can you post the results of continuously running pgrep -af odhcp6c | wc -l ?

Hi,

The pgrep in a loop doesn't show much. It only lists a single process.

I've checked the start time of all the ubus commands and I've noticed that a lot of them have started at the same time, with several hours between them.

Here are the start times of all the processes and their frequency. They only started at 7:39 as that's when I pkilled all of them.

   147 07:39
      1 07:40
      1 07:42
      1 07:45
      1 07:47
      2 07:53
    287 17:11
      1 17:13
    165 22:50
      2 22:53

So they come in burst. But why do they hang? All the processes have pid 1 as their parent pid, so their real parent probably already terminated and they were re-parented to pid 1.

I've added a "logger" command to dhcpv6.script so I can trace when it's called.

Regards,
Rik

The script is called every few minutes. I've also added "before" and "after" logger statements to the proto_send_update in that file. The before statements are logged by syslog, but the after statements are never logged. So this indeed seems to trigger the processes.

Unfortunately that still doesn't help me understand why proto_send_update would hang.

Regards,
Rik

Add a logger just above this line case "$2" in and log the value of $2.

Hi,

I've added the additional logger line.

To make sure my router would remain workable, I killed all ubus call processes. In the past the system would prevent me from logging in at some point.

So far, the processes are still being called but don't result in a process that remains. The parameter 2 is always ra-updated so far.

I will update when the ubus call ... processes start to increase again.

Regards,
Rik

How often are those ra-updated calls happening? They happen when odhcp6c receives new RA (router advertisement) from your ISP, but there are problems; see my own problem for reference: Odhcp6c issue with my ISP's configuration

If the RAs you're receiving aren't a mess like mine, you can delay these by setting this /etc/config/network, wan6 interface:

option ra_holdoff '180'

180 being the number of seconds to ignore identical RAs. Set it to whatever you feel like.

Hi,

They happen about every one to 3 minutes.

So far there are no processing hanging. So it's not always an issue that they are called so frequently, but sometimes the ubus calls seem to hang. What could cause ubus to hang at that point?

I will look into your issue and comment on the config change.

Regards,
Rik

Hi,

Issue happens again frequently. Although the script is called frequently, it does not explain why suddenly ubus calls starts to hang.

If I run ubus call network.interface.WAN6 status manually, it also hangs and can not be aborted with ctrl-c.

Running the same command for other interfaces also hangs.

Running other ubus calls still works, for example ubus call system info works fine. So it seems the service that ubus tries to call for network.interface calls hangs at some point. Is this the netifd daemon?

Regards,
Rik

Yes, netifd is the one that handles that call.

Should've probably asked early, but what version of OpenWrt are you running?

Hi,

I'm running 23.05.5, which should be the latest stable version.

Regards,
Rik

Try forcing a deadlock and then immediately check logread | grep netifd, see if anything pops up.

Hi,

I haven't found a way to cause a deadlock yet.

I've also noticed that running logread also hangs. I also log to a remote syslog server (which seems to be done using logread) and that is still logging entries.

When I run ubus monitor in a terminal and then execute logread, it logs:

-> 77b01914 #77b01914          hello: {}
<- 77b01914 #00000000         lookup: {"objpath":"log"}
-> 77b01914 #00000000           data: {"objpath":"log","objid":-288538791,"objtype":1535440382,"signature":{"read":{"lines":5,"stream":7,"oneshot":7},"write":{"event":3}}}
-> 77b01914 #00000000         status: {"status":0}
<- 77b01914 #eecd3f59         invoke: {"objid":-288538791,"method":"read","data":{"stream":true,"oneshot":true}}
-> f33b77fc #77b01914         invoke: {"objid":-288538791,"method":"read","data":{"stream":true,"oneshot":true},"user":"root","group":"root"}

Running ubus call log read also hangs, so it seems some of the ubus services always hang while others still work.

Regards,
Rik

Hi,

After running strace -p <PID OF logd process>, a log of older logs started to stream to my syslog server, and this also unblocked the logread command.

The logread command (and the ubus call log read) now work again.

Regards,
Rik