Ubus call uci commit '{"config":"network"}' is not reloading the network

HI,

I have a random issue as follows.
ubus call uci commit '{"config":"network"}' is not reloading the network. where as "ubus call network reload" is working perfectly.
ubus monitor shows

  1. config.change event is received by procd and the procd braodcasted the same
root@xxxx:/# ubus call uci commit '{"config":"network"}'
-> 71b95a84 #71b95a84          hello: {}
<- 71b95a84 #00000000         lookup: {"objpath":"uci"}
-> 71b95a84 #00000000           data: {"objpath":"uci","objid":-532061543,"objtype":227694426,"signature":{"configs":{},"get":{"config":3,"section":3,"option":3,"type":3,"match":2,"ubus_rpc_session":3},"state":{"config":3,"section":3,"option":3,"type":3,"match":2,"ubus_rpc_session":3},"add":{"config":3,"type":3,"name":3,"values":2,"ubus_rpc_session":3},"set":{"config":3,"section":3,"type":3,"match":2,"values":2,"ubus_rpc_session":3},"delete":{"config":3,"section":3,"type":3,"match":2,"option":3,"options":1,"ubus_rpc_session":3},"rename":{"config":3,"section":3,"option":3,"name":3,"ubus_rpc_session":3},"order":{"config":3,"sections":1,"ubus_rpc_session":3},"changes":{"config":3,"ubus_rpc_session":3},"revert":{"config":3,"ubus_rpc_session":3},"commit":{"config":3,"ubus_rpc_session":3},"apply":{"rollback":7,"timeout":5,"ubus_rpc_session":3},"confirm":{"ubus_rpc_session":3},"rollback":{"ubus_rpc_session":3},"reload_config":{},"backup_restore":{"value":7}}}
-> 71b95a84 #00000000         status: {"status":0}
<- 71b95a84 #e0496299         invoke: {"objid":-532061543,"method":"commit","data":{"config":"network"}}
-> 81d840fd #71b95a84         invoke: {"objid":-532061543,"method":"commit","data":{"config":"network"},"user":"root","group":"root"}
<- 81d840fd #00000000         lookup: {"objpath":"service"}
-> 81d840fd #00000000           data: {"objpath":"service","objid":-619821689,"objtype":-47218302,"signature":{"event":{"type":3,"data":2}}}
-> 81d840fd #00000000         status: {"status":0}
###########  service_handle_event:556 -- event received  ##############
###########  service_handle_event:577 -- Config.change event received for network  ##############
<- 81d840fd #608c8479         invoke: {"objid":-619821689,"method":"event","data":{"type":"config.change","data":{"package":"network"}}}
-> 6f05c0c9 #81d840fd         invoke: {"objid":-619821689,"method":"event","data":{"type":"config.change","data":{"package":"network"}},"user":"rpcd","group":"rpcd"}
<- 6f05c0c9 #608c8479         notify: {"objid":-619821689,"method":"config.change","data":{"config":"network"},"no_reply":true}
<- 6f05c0c9 #81d840fd         status: {"status":0,"objid":-619821689}
-> 81d840fd #608c8479         status: {"status":0,"objid":-619821689}
<- 81d840fd #71b95a84         status: {"status":0,"objid":-532061543}
-> 71b95a84 #e0496299         status: {"status":0,"objid":-532061543}

After this "/etc/init.d/network reload" is expected from procd triggers which was not happened. And it is random in nature. I am in 19.07.2 release.

ubus call service list for network does not have any issues, all the triggers are fine.

Any clue? how can it be debugged.
please help.

Committing changes doesn't reload the service automatically.

This should save the runtime UCI changes to the persistent config.

This should reload the service applying the config changes if any.

Hi @vgaetera
I did "ubus call uci add/set" and the new changes are part of /etc/config/network. But the same changes are not seen in system as netifd reload is not triggered with "ubus call uci commit".

What makes you believe that it should be triggered?
Committing and applying changes are 2 logically different operations.


I see, there's a difference between the following commands:

uci commit network
ubus call uci commit '{"config":"network"}'

The latter actually applies the changes.
Perhaps there's really some problem then.

My flow as follows:

  1. create an interface using ubus call uci set '{--- data ---}'
  2. commit the changes with > ubus call uci commit '{"config":"network"}'

Expectation:
New interface should be created and the same has been happening most of the times... But very randomly this is not happening....
Usually after #2, /etc/config/network will be updated, the netifd will be reloaded and the new interface can be seen in the "ifconfig".
But in the failure case... netifd is not reloaded... and it will start working only after restarting it.

1 Like

Any clues for further debugging? As of now i have added a prints in ubus and checking where it gt terminated.
Kindly share me if any otherways are available.

Thanks

1 Like

Try to isolate the issue and identify regularities in its behavior.
If the issue depends on the section type, protocol, option name and/or current status.
Whether it affects newly configured sections, or already existing ones.

There's also the following method:

ubus call uci reload_config

Perhaps this is the recommended way?

On the other hand, it might be a bug:

In that case, you can report it:

@vgaetera
I have debugged and found that in trigger_command_api function(sevice/trigger.c), cmd entry is always available in avl and procd requeuing the command. The sample code snippet as follows. But the cmd is not deleting from the avl. Ideally the timeout should happen entry should be removed from the avl.

  /* Command currently running? */
                if (!cmd->delay.pending) {
                        cmd->requeue = true;
                        return;
                }
1 Like

@srinivaskodali - can you provide us with a way to reproduce the issue? I've been briefly looking into it and couldn't spot anything obviously wrong.

2 Likes

@jow
Using RPCD, when i update the /etc/config/network those changes are not reflected in the system.

Steps are as follows:

  1. ubus call uci add interface '{--data--}'
  2. ubus call uci commit '{"config":"network"}'

Here RPCD will raise a config.change event to "procd" and the same has been confirmed in ubus monitor.
eventually rule_handle_command will be invoked from the trigger event(), and trigger_command_add will be invoked.

Check if the entry is available in avl tree.

 cmd = avl_find_element(&trigger_pending, data, cmd, avl);

If entry is available, assumption is cmd is already running. But, how can cmd->delay.pending be false. Ideally when timer starts this value should be true. If timer expires there should not be any entry in avl tree.

       if (cmd) {
                /* Command currently running? */
                if (!cmd->delay.pending) {
                        cmd->requeue = true;
                        return;
                }

During failure case, cmd->delay.pending is always false. And, ubus call uci commit '{"config":"network"}' always returning from above.

Upon timer expiry(delay.pending=false) entry should be removed from the avl tree.

2 Likes

@jow
I figured out that cmd entry is not getting removed from the AVL. And its keep on requeuing. I just wanted to understand what is the harm by removing this requeue logic in trigger_command_run & trigger_command_complete.
Can somebody explains the logic behind requeue

if (cmd) {
                /* Command currently running? */
                if (!cmd->delay.pending) {
                        cmd->requeue = true;
                        return;
                }

Once the issue is observed, it is impacting all the other services....
config.change event for any of the config is not working.

Ideally, after adding the task to queue, it has to remove upon completion of task and task_complete should be called to remove the entries from data.
When the task will be running continuously without expiry? If one task is pending and the following tasks will be waiting in queue for their turn? Please confirm.

The following commit is related to the same line where my debug got stuck. Is there any clue why the following changes are introduced and the bug behind this?

https://git.openwrt.org/?p=project/libubox.git;a=commit;h=89fb6136ad7484e4e8f9b618e530e098cf573665

Did you read the commit description????? It even has a direct reference to the bug report FS#3016 ...

libubox: runqueue: fix use-after-free bug

Fixes a use-after-free bug in runqueue_task_kill():

Invalid read of size 8
at runqueue_task_kill (runqueue.c:200)
by uloop_process_timeouts (uloop.c:505)
by uloop_run_timeout (uloop.c:542)
by uloop_run (uloop.h:111)
by main (tests/test-runqueue.c:126)
Address 0x5a4b058 is 24 bytes inside a block of size 208 free'd
at free
by runqueue_task_complete (runqueue.c:234)
by runqueue_task_kill (runqueue.c:199)
by uloop_process_timeouts (uloop.c:505)
by uloop_run_timeout (uloop.c:542)
by uloop_run (uloop.h:111)
by main (tests/test-runqueue.c:126)
Block was alloc'd at
at calloc
by add_sleeper (tests/test-runqueue.c:101)
by main (tests/test-runqueue.c:123)

Since commit 11e8afea (runqueue should call the complete handler from
more places) the call to the complete() callback has been moved to
runqueue_task_complete(). However in runqueue_task_kill()
runqueue_task_complete() is called before the kill() callback. This
will result in a use after free if the complete() callback frees the
task struct.

Furthermore runqueue_start_next() is already called at the end of
runqueue_task_complete(), so there is no need to call it again in
runqueue_task_kill().

The issue was that the _complete() callback frees the memory used by the
task struct, which is then read after the _complete() callback returns.

Ref: FS#3016

The issues is the child process is getting termianted and the same is not getting detected by waitpid(uloop_handle_processes). So this process is in queue always and the subsequent requests will never be triggered.

As a fix, set a run_timeout option before adding command to the queue. if the process is not available in process list by th run_timeout happened(runque_cancel_timeout), clear the processes from uloop as well from procd avl tree.

2 Likes

Thanks for the analysis @srinivaskodali. Would you mind to post a simple diff here illustrating your solution? Also did you investigate why the child process has been terminated? Was there some abnormal termination in your case?

1 Like

@jow

Unfortunately I could not figure out why the child process has been terminated. But i see the child process PID is getting removed from the uloop process list from uloop_done() api.

The following to be confirmed yet.

  1. Why the child process termination is not detected by waitpid()
  2. Who is calling uloop_done().
+Index: libubox-2020-02-27-7da66430/runqueue.c
+===================================================================
+--- libubox-2020-02-27-7da66430.orig/runqueue.c
++++ libubox-2020-02-27-7da66430/runqueue.c
+@@ -251,6 +251,11 @@ void runqueue_process_cancel_cb(struct r
+ 		type = SIGTERM;
+ 
+ -	kill(p->proc.pid, type);
++	if (kill(p->proc.pid, type) != 0) {
++		/* pid is not available after run timeout. Clean the process list */
++		uloop_process_delete(&p->proc);
++		runqueue_task_complete(&p->task);
++	}
+ }
+ 
+ void runqueue_process_kill_cb(struct runqueue *q, struct runqueue_task *t)
+Index: procd-2020-01-24-31e4b2df/service/trigger.c
+===================================================================
+--- procd-2020-01-24-31e4b2df.orig/service/trigger.c
++++ procd-2020-01-24-31e4b2df/service/trigger.c
+@@ -201,6 +201,7 @@ static void trigger_command_start(struct
+ 
+ 	cmd->proc.task.type = &trigger_command_type;
+ 	cmd->proc.task.complete = trigger_command_complete;
++	cmd->proc.task.run_timeout = 20000;
+ 	runqueue_task_add(&q, &cmd->proc.task, false);
+ }
+
1 Like

This topic was automatically closed 10 days after the last reply. New replies are no longer allowed.