Zerotier segmentation fault on upgrade

Model               Linksys WRT3200ACM
Architecture        ARMv7 Processor rev 1 (v7l)
Target Platform     mvebu/cortexa9
Firmware Version    OpenWrt 21.02.1 r16325-88151b8303
Kernel Version      5.4.154

root@OpenWrt:~# strace zerotier-cli info

execve("/usr/bin/zerotier-cli", ["zerotier-cli", "info"], 0xbebb7dc4 /* 13 vars */) = 0
set_tls(0xb6f215d8)                     = 0
set_tid_address(0xb6f2218c)             = 11293
open("/etc/ld-musl-armhf.path", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/lib/libminiupnpc.so.17", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/usr/local/lib/libminiupnpc.so.17", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/usr/lib/libminiupnpc.so.17", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 3
fcntl64(3, F_SETFD, FD_CLOEXEC)         = 0
fstat64(3, {st_mode=S_IFREG|0644, st_size=40963, ...}) = 0
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0(\0\1\0\0\0\214\32\0\0004\0\0\0"..., 936) = 936
mmap2(NULL, 110592, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0xb6e8b000
mmap2(0xb6ea4000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0x9000) = 0xb6ea4000
close(3)                                = 0
open("/lib/libnatpmp.so.1", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/usr/local/lib/libnatpmp.so.1", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/usr/lib/libnatpmp.so.1", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 3
fcntl64(3, F_SETFD, FD_CLOEXEC)         = 0
fstat64(3, {st_mode=S_IFREG|0644, st_size=8195, ...}) = 0
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0(\0\1\0\0\0(\10\0\0004\0\0\0"..., 936) = 936
mmap2(NULL, 77824, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0xb6e78000
mmap2(0xb6e89000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0x1000) = 0xb6e89000
close(3)                                = 0
open("/lib/libstdc++.so.6", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/usr/local/lib/libstdc++.so.6", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/usr/lib/libstdc++.so.6", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 3
fcntl64(3, F_SETFD, FD_CLOEXEC)         = 0
fstat64(3, {st_mode=S_IFREG|0755, st_size=743655, ...}) = 0
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0(\0\1\0\0\0\310v\5\0004\0\0\0"..., 936) = 936
mmap2(NULL, 819200, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0xb6db0000
mmap2(0xb6e70000, 32768, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0xb0000) = 0xb6e70000
mmap2(0xb6e76000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb6e76000
close(3)                                = 0
open("/lib/libgcc_s.so.1", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 3
fcntl64(3, F_SETFD, FD_CLOEXEC)         = 0
fstat64(3, {st_mode=S_IFREG|0644, st_size=40963, ...}) = 0
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0(\0\1\0\0\0000E\0\0004\0\0\0"..., 936) = 936
mmap2(NULL, 110592, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0xb6d95000
mmap2(0xb6dae000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0x9000) = 0xb6dae000
close(3)                                = 0
mprotect(0xb6ea4000, 4096, PROT_READ)   = 0
mprotect(0xb6e89000, 4096, PROT_READ)   = 0
mprotect(0xb6e70000, 16384, PROT_READ)  = 0
mprotect(0xb6dae000, 4096, PROT_READ)   = 0
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_ACCERR, si_addr=0x47f854} ---
+++ killed by SIGSEGV +++
Segmentation fault

First of all, I am not experienced with either debugging with gdb or building OpenWrt.
I followed the articles:

Configured and built OpenWrt (on 21.02.2, checked out with "git checkout v21.02.2") with:

CONFIG_TARGET_ipq40xx=y
CONFIG_TARGET_ipq40xx_generic=y
CONFIG_TARGET_ipq40xx_generic_DEVICE_glinet_gl-b1300=y
CONFIG_PACKAGE_ip-tiny=m
CONFIG_PACKAGE_kmod-tun=m
CONFIG_PACKAGE_libminiupnpc=m
CONFIG_PACKAGE_libnatpmp=m
CONFIG_PACKAGE_libstdcpp=m
CONFIG_PACKAGE_zerotier=m

Before compiling, added this line to the zerotier package Makefile:

TARGET_CFLAGS += -ggdb3

Started gdbserver on router with:

root@gl-b1300:~# gdbserver :9000 /usr/bin/zerotier-one
Process /usr/bin/zerotier-one created; pid = 18070
Listening on port 9000
Remote debugging from host ::ffff:10.10.XX.2, port 46516

Started gdb on source tree:

bzcanli@core:~/openwrt$ ./scripts/remote-gdb 10.10.XX.1:9000 ./build_dir/target-arm_cortex-a7+neon-vfpv4_musl_eabi/ZeroTierOne-1.6.6/zerotier-one
Using target arm_cortex-a7+neon-vfpv4 (musl, eabi)
GNU gdb (GDB) 10.1
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "--host=x86_64-pc-linux-gnu --target=arm-openwrt-linux-muslgnueabi".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./build_dir/target-arm_cortex-a7+neon-vfpv4_musl_eabi/ZeroTierOne-1.6.6/zerotier-one...
(No debugging symbols found in ./build_dir/target-arm_cortex-a7+neon-vfpv4_musl_eabi/ZeroTierOne-1.6.6/zerotier-one)
0xb6fda6e0 in _dlstart () from /home/bzcanli/openwrt/scripts/../staging_dir/target-arm_cortex-a7+neon-vfpv4_musl_eabi/root-ipq40xx/lib/ld-musl-armhf.so.1
(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0xb6fdbeec in do_relocs (dso=dso@entry=0xb6ffd6d0 <app>, rel=0x403e40, rel_size=4208522, stride=stride@entry=2) at ldso/dynlink.c:423
423                             *reloc_addr = (size_t)base + addend;
(gdb) bt
#0  0xb6fdbeec in do_relocs (dso=dso@entry=0xb6ffd6d0 <app>, rel=0x403e40, rel_size=4208522, stride=stride@entry=2) at ldso/dynlink.c:423
#1  0xb6fdc120 in reloc_all (p=p@entry=0xb6ffd6d0 <app>) at ldso/dynlink.c:1327
#2  0xb6fddb4c in __dls3 (sp=0xb6f909b4) at ldso/dynlink.c:1906
#3  0xb6fdd07c in __dls2 (base=<optimized out>, sp=0xbefffdd0) at ldso/dynlink.c:1650
#4  0xb6fda700 in _dlstart () from /home/bzcanli/openwrt/scripts/../staging_dir/target-arm_cortex-a7+neon-vfpv4_musl_eabi/root-ipq40xx/lib/ld-musl-armhf.so.1
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
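
A small aside on the numbers in the do_relocs frame: gdb prints rel_size in decimal and rel in hex, so they are easier to compare after a conversion. A minimal sketch (to_hex is just an illustrative helper name, not part of gdb or OpenWrt):

```shell
#!/bin/sh
# Convert a decimal value printed by gdb to hex so it can be compared
# with the addresses in the same frame.
# to_hex is a hypothetical helper name used only for this illustration.
to_hex() {
    printf '0x%x\n' "$1"
}

to_hex 4208522    # rel_size from frame #0 of the backtrace
```

This prints 0x40378a, which lies in the same range as rel=0x403e40, so the value looks more like an address than a plausible table size. That may hint that the relocation data read by the dynamic linker is not what it expects, but argument values shown by gdb for optimized code can be misleading, so treat it as a hint only.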

Any useful information to troubleshoot? More suggestions?

Note that the build_dir contains stripped binaries, while the staging_dir contains the originals before stripping.
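
One way to check which copy of a binary still carries symbols is to look at what file(1) reports for it. The sketch below classifies that description text; the helper names are made up for illustration, and the exact wording matched ("stripped", "not stripped", "with debug_info") is an assumption about the file(1) version on the build host:

```shell
#!/bin/sh
# Classify the description text that file(1) prints for an ELF binary.
# classify_elf_description and classify_elf_file are hypothetical helper
# names; the matched phrases are assumed to be what file(1) prints.
classify_elf_description() {
    case "$1" in
        *"with debug_info"*"not stripped"* | *"not stripped"*"with debug_info"*)
            echo "unstripped, with debug info" ;;
        *"not stripped"*)
            echo "unstripped, no debug info" ;;
        *stripped*)
            echo "stripped" ;;
        *)
            echo "unknown" ;;
    esac
}

# Convenience wrapper: runs file(1) in brief mode on a path.
classify_elf_file() {
    classify_elf_description "$(file -b "$1")"
}

# Example on a literal description string:
classify_elf_description "ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV), dynamically linked, with debug_info, not stripped"
```

Running classify_elf_file on both the build_dir copy and the staging_dir copy of the same program shows quickly which of the two is worth handing to gdb.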

I have done remote debugging in several ways:

  • providing directly the unstripped binary:
    ./scripts/remote-gdb 192.168.1.2:9000 staging_dir/target-mips_34kc_musl-1.1.10/root-ar71xx/usr/sbin/collectd

  • debugging a crash core file:
    Run "ulimit -c unlimited" on the router first, to make sure that cores can be stored.
    Then analyse the core once it has been transferred to the build host:
    ./build_dir/toolchain-mips_34kc_gcc-4.8-linaro_musl-1.1.10/gdb-7.8/gdb/gdb ./staging_dir/target-mips_34kc_musl-1.1.10/root-ar71xx/usr/sbin/collectd ./collectd.3328.11.1439752039.core

The gdb command was:

./scripts/remote-gdb 10.10.XX.1:9000 staging_dir/target-arm_cortex-a7+neon-vfpv4_musl_eabi/root-ipq40xx/usr/bin/zerotier-one

Then the remote gdb session becomes:

bzcanli@core:~/openwrt$ ./scripts/remote-gdb 10.10.10.1:9000 staging_dir/target-arm_cortex-a7+neon-vfpv4_musl_eabi/root-ipq40xx/usr/bin/zerotier-one
Using target arm_cortex-a7+neon-vfpv4 (musl, eabi)
GNU gdb (GDB) 10.1
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "--host=x86_64-pc-linux-gnu --target=arm-openwrt-linux-muslgnueabi".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from staging_dir/target-arm_cortex-a7+neon-vfpv4_musl_eabi/root-ipq40xx/usr/bin/zerotier-one...
(No debugging symbols found in staging_dir/target-arm_cortex-a7+neon-vfpv4_musl_eabi/root-ipq40xx/usr/bin/zerotier-one)
0xb6fda6e0 in _dlstart ()
   from /home/bzcanli/openwrt/scripts/../staging_dir/target-arm_cortex-a7+neon-vfpv4_musl_eabi/root-ipq40xx/lib/ld-musl-armhf.so.1
(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0xb6fdbeec in do_relocs (dso=dso@entry=0xb6ffd6d0 <app>, rel=0x403e40, rel_size=4208522, stride=stride@entry=2) at ldso/dynlink.c:423
423                             *reloc_addr = (size_t)base + addend;
(gdb) bt
#0  0xb6fdbeec in do_relocs (dso=dso@entry=0xb6ffd6d0 <app>, rel=0x403e40, rel_size=4208522, stride=stride@entry=2) at ldso/dynlink.c:423
#1  0xb6fdc120 in reloc_all (p=p@entry=0xb6ffd6d0 <app>) at ldso/dynlink.c:1327
#2  0xb6fddb4c in __dls3 (sp=0xb6f909b4) at ldso/dynlink.c:1906
#3  0xb6fdd07c in __dls2 (base=<optimized out>, sp=0xbefffdc0) at ldso/dynlink.c:1650
#4  0xb6fda700 in _dlstart ()
   from /home/bzcanli/openwrt/scripts/../staging_dir/target-arm_cortex-a7+neon-vfpv4_musl_eabi/root-ipq40xx/lib/ld-musl-armhf.so.1
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Transferred the core to the build host source tree.
Executed this command:

 ./build_dir/toolchain-arm_cortex-a7+neon-vfpv4_gcc-8.4.0_musl_eabi/gdb-10.1/gdb/gdb ./staging_dir/target-arm_cortex-a7+neon-vfpv4_musl_eabi/root-ipq40xx/usr/bin/zerotier-one ./zerotier-one.22656.11.1648474736.core

The result is:

bzcanli@core:~/openwrt$ ./build_dir/toolchain-arm_cortex-a7+neon-vfpv4_gcc-8.4.0_musl_eabi/gdb-10.1/gdb/gdb ./staging_dir/target-arm_cortex-a7+neon-vfpv4_musl_eabi/root-ipq40xx/usr/bin/zerotier-one ./zerotier-one.22656.11.1648474736.core
GNU gdb (GDB) 10.1
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "--host=x86_64-pc-linux-gnu --target=arm-openwrt-linux-muslgnueabi".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./staging_dir/target-arm_cortex-a7+neon-vfpv4_musl_eabi/root-ipq40xx/usr/bin/zerotier-one...
(No debugging symbols found in ./staging_dir/target-arm_cortex-a7+neon-vfpv4_musl_eabi/root-ipq40xx/usr/bin/zerotier-one)

warning: Can't open file /usr/bin/zerotier-one during file-backed mapping note processing

warning: Can't open file /lib/libgcc_s.so.1 during file-backed mapping note processing

warning: Can't open file /usr/lib/libstdc++.so.6.0.25 during file-backed mapping note processing

warning: Can't open file /usr/lib/libnatpmp.so.20150609 during file-backed mapping note processing

warning: Can't open file /usr/lib/libminiupnpc.so.2.2.1 during file-backed mapping note processing

warning: Can't open file /lib/libc.so during file-backed mapping note processing
[New LWP 22656]
Core was generated by `zerotier-one'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0xb6fbbeec in ?? ()
(gdb) bt
#0  0xb6fbbeec in ?? ()
#1  0xb6fbc120 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
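
As a side note, the core file name itself carries some information. Assuming the name follows the layout <program>.<pid>.<signal>.<time>.core (an inference from the name above and from the "[New LWP 22656]" line, not something confirmed by the router's core_pattern), it can be split like this; parse_core_name is a hypothetical helper:

```shell
#!/bin/sh
# Split a core file name of the assumed form
#   <program>.<pid>.<signal>.<time>.core
# into its fields. Parsing goes from the right, so dots in the program
# name are tolerated. parse_core_name is a hypothetical helper name.
parse_core_name() (
    name=${1##*/}        # drop any leading directory
    name=${name%.core}   # drop the .core suffix
    time=${name##*.}; name=${name%.*}
    sig=${name##*.};  name=${name%.*}
    pid=${name##*.};  exe=${name%.*}
    printf 'exe=%s pid=%s signal=%s time=%s\n' "$exe" "$pid" "$sig" "$time"
)

parse_core_name ./zerotier-one.22656.11.1648474736.core
```

Under that assumption the fields are exe=zerotier-one, pid=22656, signal=11 and time=1648474736. Signal 11 is SIGSEGV on Linux, which agrees with what gdb reports for the core.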

You still get the complaint that there are no debugging symbols in staging. I guess that you need to enable the debugging option in the build options, so that the debug symbols end up in staging properly: in menuconfig, under "Global build settings", enable "Compile packages with debugging info".

And if I remember right, it is possible to tell gdb the correct library path.

(So far that reloc_addr sounds more like generic linking routines, not zerotier.)

See a more recent example, where I used this inside gdb

set solib-search-path ./staging_dir/target-arm_cortex-a15+neon-vfpv4_musl_eabi/usr/lib/

to tell gdb where it should find the other loaded libraries.
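
For a different target the same gdb setting can be generated from the root directory of the staged files. This is only a sketch: solib_search_cmd is a made-up helper, and it assumes the libraries sit in lib/ and usr/lib/ under the directory given. gdb accepts several directories in solib-search-path, separated by colons on a Linux host.

```shell
#!/bin/sh
# Print a gdb "set solib-search-path" line that covers usr/lib/ and lib/
# under a given root directory.
# solib_search_cmd is a hypothetical helper name, and the lib/ + usr/lib/
# layout under the root is an assumption.
solib_search_cmd() (
    root=${1%/}    # drop one trailing slash, if present
    printf 'set solib-search-path %s/usr/lib:%s/lib\n' "$root" "$root"
)

solib_search_cmd ./staging_dir/target-arm_cortex-a7+neon-vfpv4_musl_eabi/root-ipq40xx/
```

The printed line can be pasted at the (gdb) prompt, or redirected to a file and loaded inside gdb with the "source" command.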

I read these:

https://oldwiki.archive.openwrt.org/doc/devel/gdb
https://openwrt.org/docs/guide-developer/gdb
https://forum.archive.openwrt.org/viewtopic.php?id=52415

It was hard for me to create an unstripped binary.
I tried removing "-Os" from include/target.mk. Still a stripped binary.
I tried setting Global build settings > Compile packages with debugging info (Symbol: DEBUG [=y]). Still a stripped binary.
I tried setting Symbol: Binary stripping method (none) (NO_STRIP [=y]). Still a stripped binary.
I added TARGET_CFLAGS += -ggdb3 to the zerotier Makefile and observed during compilation that -ggdb3 was there. Still a stripped binary.

Whatever I tried, "arm-openwrt-linux-muslgnueabi-strip" was still called at the end.

I'm sure it's not appropriate (and there are probably better ways to do this), but:
I replaced the binary /staging_dir/toolchain-arm_cortex-a7+neon-vfpv4_gcc-8.4.0_musl_eabi/bin/arm-openwrt-linux-muslgnueabi-strip with a symbolic link to /bin/true.
I replaced the symbolic link /staging_dir/toolchain-arm_cortex-a7+neon-vfpv4_gcc-8.4.0_musl_eabi/bin/arm-openwrt-linux-strip with a symbolic link to /bin/true.

At that stage I got an unstripped binary:

bzcanli@core:~/openwrt$ file staging_dir/target-arm_cortex-a7+neon-vfpv4_musl_eabi/root-ipq40xx/usr/bin/zerotier-one
staging_dir/target-arm_cortex-a7+neon-vfpv4_musl_eabi/root-ipq40xx/usr/bin/zerotier-one: ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV), dynamically linked, interpreter /lib/ld-musl-armhf.so.1, with debug_info, not stripped

I don't understand the difference in detail, but I think all I achieved was getting rid of the message about missing debugging symbols.
The remote gdb session now looks like this:

bzcanli@core:~/openwrt$ ./scripts/remote-gdb 10.10.10.1:9000 staging_dir/target-arm_cortex-a7+neon-vfpv4_musl_eabi/root-ipq40xx/usr/bin/zerotier-one
Using target arm_cortex-a7+neon-vfpv4 (musl, eabi)
GNU gdb (GDB) 10.1
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "--host=x86_64-pc-linux-gnu --target=arm-openwrt-linux-muslgnueabi".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from staging_dir/target-arm_cortex-a7+neon-vfpv4_musl_eabi/root-ipq40xx/usr/bin/zerotier-one...
(gdb) run
Starting program: /home/bzcanli/openwrt/staging_dir/target-arm_cortex-a7+neon-vfpv4_musl_eabi/root-ipq40xx/usr/bin/zerotier-one

Program received signal SIGSEGV, Segmentation fault.
0xb6fdbeec in do_relocs (dso=dso@entry=0xb6ffd6d0 <app>, rel=0x403e40, rel_size=4208522, stride=stride@entry=2) at ldso/dynlink.c:423
423                             *reloc_addr = (size_t)base + addend;
(gdb) bt
#0  0xb6fdbeec in do_relocs (dso=dso@entry=0xb6ffd6d0 <app>, rel=0x403e40, rel_size=4208522, stride=stride@entry=2) at ldso/dynlink.c:423
#1  0xb6fdc120 in reloc_all (p=p@entry=0xb6ffd6d0 <app>) at ldso/dynlink.c:1327
#2  0xb6fddb4c in __dls3 (sp=0xb6f909b4) at ldso/dynlink.c:1906
#3  0xb6fdd07c in __dls2 (base=<optimized out>, sp=0xbefffdc0) at ldso/dynlink.c:1650
#4  0xb6fda700 in _dlstart ()
   from /home/bzcanli/openwrt/scripts/../staging_dir/target-arm_cortex-a7+neon-vfpv4_musl_eabi/root-ipq40xx/lib/ld-musl-armhf.so.1
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb)

Would analysing the core with gdb give you more details? Should I go on with the core in gdb?

Any news on getting Zerotier fixed?

Also, I noticed that Zerotier v1.87 for Windows and Linux is now available on the Zerotier website.

Solved after zerotier 1.8.4-2 upgrade.

Zerotier 1.8.4-2 fixed my problem.
