I've run into an issue where rpcd crashes at times.
To reproduce this, I created a small dummy plugin we can call:
# Create our `ubus call foo bar` dummy.
cat << EOF > /usr/libexec/rpcd/foo
#!/usr/bin/env lua
function list ()
  return print('{"bar":{}}')
end

function bar ()
  local json = require 'luci.jsonc'
  return print(json.stringify({
    baz = "qux"
  }))
end

if arg[1] == 'list' then
  return list()
elseif arg[1] == 'call' then
  if arg[2] == 'bar' then
    return bar()
  end
end
EOF
# Set the executable bit.
chmod +x /usr/libexec/rpcd/foo
# Define an ACL to allow calling our dummy via uhttpd's /ubus endpoint.
cat << EOF > /usr/share/rpcd/acl.d/foo.json
{
  "unauthenticated": {
    "description": "Access controls for unauthenticated requests to foo",
    "read": {
      "ubus": {
        "foo": [ "bar" ]
      }
    }
  }
}
EOF
# Reload rpcd so it registers our dummy with ubus and picks up the new ACL.
/etc/init.d/rpcd reload
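Before involving uhttpd, it may help to confirm that the dummy behaves as expected when run by hand. The following is only a sketch: `check_plugin` is a hypothetical helper (not part of rpcd), and it assumes the plugin is invoked as `<plugin> list` and `<plugin> call <method>`, which is what the Lua script above handles.

```shell
#!/bin/sh
# Sanity-check an executable plugin by invoking it the way the dummy above
# expects to be invoked: `<plugin> list` and `<plugin> call <method>`.
# check_plugin is a hypothetical helper, not part of rpcd.
check_plugin() {
    plugin="$1"
    method="$2"

    if [ ! -x "$plugin" ]; then
        echo "not executable: $plugin" >&2
        return 1
    fi

    # Should print the method signature object.
    "$plugin" list || return 1

    # Assumption: call arguments arrive as JSON on stdin; the dummy ignores them.
    echo '{}' | "$plugin" call "$method" || return 1
}

# On the router:
#   check_plugin /usr/libexec/rpcd/foo bar
#   ubus call foo bar
```

If the plugin is set up correctly, the first command should print the output of `list` followed by the output of `bar`, and `ubus call foo bar` should return the same `baz` object through rpcd.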
Next, I wrote a small stress-test program to bombard rpcd with a lot of requests. It boils down to spamming the equivalent of the following command across all available cores:
curl --data '{ "jsonrpc": "2.0", "id": 1, "method": "call", "params": [ "00000000000000000000000000000000", "foo", "bar", {} ] }' http://192.168.27.1/ubus
If everything goes well, this should return:
{"jsonrpc":"2.0","id":1,"result":[0,{"baz":"qux"}]}
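For anyone who wants to reproduce this without my stress-tester, a rough shell equivalent could look like the sketch below. This is not the original program: the `payload`, `worker` and `stress` functions and the `REQUESTS` and `WORKERS` variables are made up for illustration, and the per-worker request count is an arbitrary assumption.

```shell
#!/bin/sh
# Hypothetical stress-test sketch; not the original stress-test program.
UBUS_URL="${UBUS_URL:-http://192.168.27.1/ubus}"   # router address from this post
REQUESTS="${REQUESTS:-1000}"                       # assumed request count per worker
WORKERS="${WORKERS:-$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 4)}"

# Build the JSON-RPC body for `ubus call foo bar` with the all-zero session ID.
payload() {
    printf '%s' '{"jsonrpc":"2.0","id":1,"method":"call","params":["00000000000000000000000000000000","foo","bar",{}]}'
}

# One worker: send REQUESTS requests back to back, ignoring individual failures.
worker() {
    i=0
    while [ "$i" -lt "$REQUESTS" ]; do
        curl --silent --output /dev/null --data "$(payload)" "$UBUS_URL" || true
        i=$((i + 1))
    done
}

# Start one background worker per core and wait for all of them to finish.
stress() {
    n=0
    while [ "$n" -lt "$WORKERS" ]; do
        worker &
        n=$((n + 1))
    done
    wait
}

# Only fire requests when explicitly asked to.
if [ "${1:-}" = "run" ]; then
    stress
fi
```

Saved as, say, `stress.sh`, it would be started with `sh stress.sh run`; `WORKERS` and `REQUESTS` can be overridden from the environment to vary the load.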
Next, I ran gdbserver on the OpenWrt router and attached to it from my own machine by following the gdb instructions:
# On the OpenWrt router.
gdbserver 0.0.0.0:9000 --attach $(pidof rpcd)
# On the local machine.
./scripts/remote-gdb 192.168.27.1:9000 build_dir/target-mipsel_24kc_musl/rpcd-2020-05-26-7be1f171/rpcd
Spawning the stress-tester quickly leads to rpcd terminating with a segmentation fault. I've observed two distinct (but undoubtedly related) stack traces:
Program received signal SIGSEGV, Segmentation fault.
free (p=0x77f55220) at src/malloc/malloc.c:476
476 if (next->psize != self->csize) a_crash();
(gdb) bt
#0 free (p=0x77f55220) at src/malloc/malloc.c:476
#1 0x77e456a9 in json_tokener_free (tok=0x77f551e0) at json_tokener.c:132
#2 0x004057bb in rpc_plugin_call_finish_cb (blob=<optimized out>, stat=<optimized out>, priv=0xbc3070) at /home/alchiadus/development/openwrt/build_dir/target-mipsel_24kc_musl/rpcd-2020-05-26-7be1f171/plugin.c:123
#3 0x0040215b in rpc_exec_reply (c=0x77f554e0, rv=rv@entry=0) at /home/alchiadus/development/openwrt/build_dir/target-mipsel_24kc_musl/rpcd-2020-05-26-7be1f171/exec.c:136
#4 0x00402211 in rpc_exec_opipe_state_cb (s=<optimized out>) at /home/alchiadus/development/openwrt/build_dir/target-mipsel_24kc_musl/rpcd-2020-05-26-7be1f171/exec.c:266
#5 0x77e9fbbd in ustream_state_change_cb (t=0x77f55610) at /home/alchiadus/development/openwrt/build_dir/target-mipsel_24kc_musl/libubox-2018-07-25-c83a84af/ustream.c:109
#6 0x77e9f207 in uloop_process_timeouts (tv=<optimized out>) at /home/alchiadus/development/openwrt/build_dir/target-mipsel_24kc_musl/libubox-2018-07-25-c83a84af/uloop.c:505
#7 uloop_run_timeout (timeout=-1) at /home/alchiadus/development/openwrt/build_dir/target-mipsel_24kc_musl/libubox-2018-07-25-c83a84af/uloop.c:542
#8 0x00401d4f in uloop_run () at /home/alchiadus/development/openwrt/staging_dir/target-mipsel_24kc_musl/usr/include/libubox/uloop.h:111
#9 main (argc=<optimized out>, argv=<optimized out>) at /home/alchiadus/development/openwrt/build_dir/target-mipsel_24kc_musl/rpcd-2020-05-26-7be1f171/main.c:120
Program received signal SIGSEGV, Segmentation fault.
free (p=0xb7f120) at src/malloc/malloc.c:476
476 if (next->psize != self->csize) a_crash();
(gdb) bt
#0 free (p=0xb7f120) at src/malloc/malloc.c:476
#1 0x00402143 in rpc_exec_reply (c=0x77ee2550, rv=5, rv@entry=0) at /home/alchiadus/development/openwrt/build_dir/target-mipsel_24kc_musl/rpcd-2020-05-26-7be1f171/exec.c:153
#2 0x00402211 in rpc_exec_opipe_state_cb (s=<optimized out>) at /home/alchiadus/development/openwrt/build_dir/target-mipsel_24kc_musl/rpcd-2020-05-26-7be1f171/exec.c:266
#3 0x77efdbbd in ustream_state_change_cb (t=0x77ee2680) at /home/alchiadus/development/openwrt/build_dir/target-mipsel_24kc_musl/libubox-2018-07-25-c83a84af/ustream.c:109
#4 0x77efd207 in uloop_process_timeouts (tv=<optimized out>) at /home/alchiadus/development/openwrt/build_dir/target-mipsel_24kc_musl/libubox-2018-07-25-c83a84af/uloop.c:505
#5 uloop_run_timeout (timeout=-1) at /home/alchiadus/development/openwrt/build_dir/target-mipsel_24kc_musl/libubox-2018-07-25-c83a84af/uloop.c:542
#6 0x00401d4f in uloop_run () at /home/alchiadus/development/openwrt/staging_dir/target-mipsel_24kc_musl/usr/include/libubox/uloop.h:111
#7 main (argc=<optimized out>, argv=<optimized out>) at /home/alchiadus/development/openwrt/build_dir/target-mipsel_24kc_musl/rpcd-2020-05-26-7be1f171/main.c:120
To rule out multi-threading issues, I also tried setting max_requests to 1 in /etc/config/uhttpd (and restarting uhttpd afterwards) as follows:
# Maximum number of concurrent requests.
# If this number is exceeded, further requests are
# queued until the number of running requests drops
# below the limit again.
option max_requests 1
This made no difference: I could still reproduce the crashes shown above.
Unfortunately, I am not familiar enough with the codebase to know which part is responsible for freeing what. Both traces end in free(), so this looks like it could be a double free. Does anyone have insights or pointers to help debug and fix this?