Nslookup ipv6 in lede 17.01.1 (fix: update busybox to 1.25.1-4)

I single stepped further and your determination look right.

Breakpoint 1, nslookup_main (argc=argc@entry=4, argv=argv@entry=0xbefffdd4)
    at networking/nslookup_lede.c:744
744			ptr = llist_pop(&type_strings);
(gdb) s
752			for (c = 0; qtypes[c].name; c++)
(gdb) s
744			ptr = llist_pop(&type_strings);
(gdb) s
llist_pop (head=0xbefffcd4, head@entry=0xbefffccc) at libbb/llist.c:39
39		llist_t *temp = *head;
(gdb) s
37	{
(gdb) s
41		if (temp) {
(gdb) s
38		void *data = NULL;
(gdb) s
41		if (temp) {
(gdb) s
42			data = temp->data;
(gdb) s
43			*head = temp->link;
(gdb) s
44			free(temp);
(gdb) s

Program received signal SIGILL, Illegal instruction.
free (p=0xbefffede) at src/malloc/malloc.c:465
465			if (extra & 1) a_crash();

my bad. I just noticed that I used the line numbers from the patch, not the actual C file...

Not quite sure what to make of this. p *type_strings always seems to give memory access error:

Breakpoint 1, nslookup_main (argc=argc@entry=4, argv=argv@entry=0xbefffdd4)
    at networking/nslookup_lede.c:743
743		while (type_strings) {
(gdb) p type_strings
$1 = (llist_t *) 0xbefffede
(gdb) p *type_strings
$2 = {link = 0x41414141, 
  data = 0x6f6f6700 <error: Cannot access memory at address 0x6f6f6700>}
(gdb) s
744			ptr = llist_pop(&type_strings);
(gdb) p type_strings
$3 = (llist_t *) 0xbefffede
(gdb) p *type_strings
$4 = {link = 0x41414141, 
  data = 0x6f6f6700 <error: Cannot access memory at address 0x6f6f6700>}
(gdb) s
752			for (c = 0; qtypes[c].name; c++)
(gdb) p *type_strings
$5 = {link = 0x41414141, 
  data = 0x6f6f6700 <error: Cannot access memory at address 0x6f6f6700>}
(gdb) s
744			ptr = llist_pop(&type_strings);
(gdb) s
llist_pop (head=0xbefffcd4, head@entry=0xbefffccc) at libbb/llist.c:39
39		llist_t *temp = *head;
(gdb) s
37	{
(gdb) s
41		if (temp) {
(gdb) s
38		void *data = NULL;
(gdb) s
41		if (temp) {
(gdb) s
42			data = temp->data;
(gdb) s
43			*head = temp->link;
(gdb) s
44			free(temp);
(gdb) s

Program received signal SIGILL, Illegal instruction.
free (p=0xbefffede) at src/malloc/malloc.c:465

So it seems the list is incorrectly built by busybox' own getopt32() (unlikely) or corrupted by some other code (more likely). Can you step through getopt32() and in particular look at the llist_add_to_end() invocations? Maybe start with a breakpoint on llist_add_to_end and see what it receives as input and what it allocates.

@jow

I have tried setting breakpoints there, but that line with llist is never reached.

It looks like busybox in 17.01 decides NOT to use llist type for the arguments at all. It determines the option type to be string (I think) and does not enter the line inside PARAM_LIST type that contains the llist_add_to_end(). Both PARAM_LIST and PARAM_INT forks are passed so it ends with the default type PARAM_STRING (=0).

(this attempt was using q=AAAA, including "=")

581			if (optarg) {
(gdb) p optarg
$2 = 0xbefffedd "=AAAA"
(gdb) p on_off->param_type
$3 = 0
(gdb) s
582				if (on_off->param_type == PARAM_LIST) {
(gdb) s
584				} else if (on_off->param_type == PARAM_INT) {
(gdb) s
587				} else if (on_off->optarg) {
(gdb) s
588					*(char **)(on_off->optarg) = optarg;
(gdb) s
556		while ((c = getopt_long(argc, argv, applet_opts,
(gdb) s
...
608		option_mask32 = flags;
(gdb) s
609		return flags;
(gdb) s
615	}
(gdb) s
nslookup_main (argc=argc@entry=3, argv=argv@entry=0xbefffdd4)
    at networking/nslookup_lede.c:732
732		unsigned int types = 0;
(gdb) p type_strings
$5 = (llist_t *) 0xbefffedd
(gdb) p *type_strings
$6 = {link = 0x4141413d, 
  data = 0x6f670041 <error: Cannot access memory at address 0x6f670041>}
(gdb) 

That actually seems to match the earlier debug, where the link address was link = 0x41414141, which is just AAAA instead of a good memory reference. (This time the "=" has been embedded into it 0x4141413d address and the remaining part of the string"A\0" is at the end of 0x6f670041.) So busybox getopt32 has not returned a llist but a string.

I wonder if the getopt32 option definition string needs edting.

@jow
This might be due to busybox 1.25.1 in stable 17.01 vs 1.26.2 in master.

EDIT:
specifically, this getopt32.c commit introducing the syntax "o:*" is missing from 1.25.1 in 17.01. That syntax is used for the "q" option in nslookup_lede.
https://git.busybox.net/busybox/commit/libbb/getopt32.c?h=1_26_2&id=237bedd499c58034a1355484d6d4d906f0180308

Full commit edits quite many applets, but I am not sure if we need to change them in case we want to backport this to 1.25.1
https://git.busybox.net/busybox/commit/?h=1_26_2&id=237bedd499c58034a1355484d6d4d906f0180308

Indeed, I just came to the same conclusion :slight_smile:

Can you try the patch below? It should fix the issue on both busybox versions:

diff --git a/package/utils/busybox/patches/230-add_nslookup_lede.patch b/package/utils/busybox/patches/230-add_nslookup_lede.patch
index 51034e6bc4..334c0f4a65 100644
--- a/package/utils/busybox/patches/230-add_nslookup_lede.patch
+++ b/package/utils/busybox/patches/230-add_nslookup_lede.patch
@@ -760,6 +760,7 @@ index 000000000..6f1f86502
 +      applet_long_options = nslookup_longopts;
 +#endif
 +
++      opt_complementary = "q::";
 +      opts = getopt32(argv, "+q:*p:+r:+t:+s",
 +                      &type_strings, &default_port,
 +                      &default_retry, &default_timeout);

I pushed two commits to my staging tree, the first commit (http://git.lede-project.org/b10608c) addresses the crash issue when using the -q flag and the second commit (https://git.lede-project.org/5dbb90e) aligns the behaviour with busybox nslookup, meaning a mere nslookup example.org will lookup both IPv4 and IPv6 addresses and display them in a uniform way.

I have tested the first commit in my build and it seems to work in 17.01. I applied the change manually based on your input here.

I will test the second commit later today.
Is the second commit supposed to fix the missing "-q ANY" functionality, too?

Reboot (17.01-SNAPSHOT, r3326-1ab41265c3)

root@LEDE:~# nslookup -q AAAA google.com
Server:         127.0.0.1
Address:        127.0.0.1#53

Non-authoritative answer:
google.com      has AAAA address 2a00:1450:400f:808::200e

root@LEDE:~# nslookup google.com
Server:         127.0.0.1
Address:        127.0.0.1#53

Non-authoritative answer:
Name:   google.com
Address: 172.217.22.174
Name:   google.com
Address: 172.217.22.174
Name:   google.com
Address: 172.217.22.174

root@LEDE:~# nslookup -q ANY google.com
Server:         127.0.0.1
Address:        127.0.0.1#53

Non-authoritative answer:
*** Can't find google.com: No answer

ANY is a specific DNS query type which instructs the NS to respond with anything it knows about a domain but due to the size limitations of UDP packets it is often not useful and some servers might not even answer such queries.

What the 2nd commit does is automatically issuing both A and AAAA queries in parallel and displaying the received IPv4 and IPv6 addresses in a single list, to behave exactly like the old nslookup applet did.

This special behaviour is triggered when no query type (-q) is explicitly specified and when the given name argument is neither an IPv4 or IPv6 address.

You are right. ANY works, but not with dnsmasq as the DNS server...

root@LEDE:~# nslookup -q ANY google.com
Server:         127.0.0.1
Address:        127.0.0.1#53

Non-authoritative answer:
*** Can't find google.com: No answer

root@LEDE:~# nslookup -q ANY google.com 8.8.8.8
Server:         8.8.8.8
Address:        8.8.8.8#53

Non-authoritative answer:
Name:   google.com
Address: 172.217.22.174
Name:   google.com
Address: 172.217.22.174
Name:   google.com
Address: 172.217.22.174
google.com      has AAAA address 2a00:1450:400f:807::200e
google.com      mail exchanger = 20 alt1.aspmx.l.google.com
...

Is the second commit ok? It seems to end abruptly.
The 230-... patch seems to end in a middle of a function.

sorry for noise if I am wrong, but it looks strange.

No you're right; I fixed the commits and updated the links in my original post.

Seems to work. This is from a 17.01 build:

root@LEDE:~# nslookup google.com
Server:         127.0.0.1
Address:        127.0.0.1#53

Name:      google.com
Address 1: 172.217.22.174
Address 2: 172.217.22.174
Address 3: 172.217.22.174
Address 4: 2a00:1450:400f:807::200e

root@LEDE:~# nslookup -q AAAA lede-project.org
Server:         127.0.0.1
Address:        127.0.0.1#53

Non-authoritative answer:
lede-project.org        has AAAA address 2a03:b0c0:3:d0::1af1:1

Thanks. I see that commit was pushed to master and the 17.01 branch. Would this be backported to 17.01.1 ?

It will be part of 17.01.2. If you have lots of free space (~1MB) you can try installing the updated busybox package with opkg.

I actually failed in that when we debugged the issue two days ago...

Busybox package is marked as "essential" in opkg data and trying to reinstall .ipk with the same version number while overriding that restriction bricked my router two days ago. (I ended up reflashing the whole firmware via OEM TFTP recovery)

Based on your answer above, it should have succeeded.

EDIT:
you are right, upgrading to newer version works.
manually installed 1.25.1-4 over 1.25.1-3 and went smoothly:

root@LEDE:~# opkg install /tmp/busybox_1.25.1-4_arm_cortex-a15_neon-vfpv4.ipk
Upgrading busybox on root from 1.25.1-3 to 1.25.1-4...
Configuring busybox.

Thanks. I was able to upgrade using opkg upgrade. Everything works well now.

1 Like