[Workaround] GL-AR150: No DHCP if LAN cable is not plugged during boot

Hi :)

I recently flashed the current OpenWrt release (18.06.2) on a GL-AR150. It's my first try with "real" OpenWrt (apart from all kinds of Freifunk routers I'm taking care of).

Unfortunately, I ran into a problem (a severe one for my use case) that I reported at https://bugs.openwrt.org/index.php?do=details&task_id=2145 : no IPv4 DHCP addresses are offered if no LAN cable is plugged in when the device boots (see the bug report).

I'm pretty sure this is an issue that should (and will) be fixed, but my problem is that – at the moment – I simply can't use the device at all, because I do need IPv4 addresses and the router is intended to be booted without a LAN cable attached.

I don't want to reflash the original firmware if I can avoid it, so here's my question: can anybody tell me how to work around this until a release that fixes the issue is (hopefully) published?

Thanks for all help in advance!

EDIT:

Here's the gist of the lengthy discussion below:

Apparently, the issue is caused by the dnsmasq init script doing unnecessary checks that fail and prevent the DHCP range from being added to the generated dnsmasq config. As a result, dnsmasq runs but does not offer DHCPv4 addresses.

The simplest workaround is to add the "force" option to the "lan" section of /etc/config/dhcp. GL.iNet's original firmware is confirmed to set this option as well (as posted by vgaetera in this thread).
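A minimal sketch of that change, assuming the stock 18.06 "lan" section. The start, limit and leasetime values shown are assumed defaults that match the 192.168.1.100 to 192.168.1.249, 12h range seen in the logs further down; keep whatever your file already contains and only add the force line:

```
# /etc/config/dhcp (excerpt; other existing options unchanged)
config dhcp 'lan'
	option interface 'lan'
	option start '100'
	option limit '150'
	option leasetime '12h'
	# added: makes dhcp_add() skip dhcp_check() for this interface
	option force '1'
```

Then restart dnsmasq with /etc/init.d/dnsmasq restart. The same change can be made from the shell with uci set dhcp.lan.force='1' followed by uci commit dhcp. The trade-off is that force also skips the probe for another DHCP server already running on the LAN, which dhcp_check() otherwise performs with udhcpc.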

Could it help to add an init.d script that restarts (or stops and then starts) the service you are referring to at the end of the boot process?

I already tried this (cf. the bug report): I put /etc/init.d/dnsmasq restart into /etc/rc.local, but it didn't help. Dnsmasq does restart after the boot sequence finishes, but still no IPv4 DHCP range is set and no IPv4 addresses are offered.

It looks like you've confirmed the first thing (whether dnsmasq starts at all) and put in a quick hack to make sure that it isn't a timing issue.

Is the "LAN" interface "up" without the cable plugged in? (ip link should show the link status for the interface and the bridge.)

Dnsmasq is started in each case. It is even restarted once during the boot process, both with the LAN cable plugged in and unplugged. If you have some spare time, have a look at the two boot logs I posted. ;-)

It seems both the LAN device and the bridge are up, no matter whether a LAN cable is plugged in at boot:

LAN cable plugged (IPv4 DHCP working):

root@OpenWrt:~# ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc fq_codel state DOWN qlen 1000
    link/ether e4:95:6e:44:a8:3a brd ff:ff:ff:ff:ff:ff
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master br-lan state UP qlen 1000
    link/ether e4:95:6e:44:a8:3a brd ff:ff:ff:ff:ff:ff
4: wlan0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether e4:95:6e:44:a8:3a brd ff:ff:ff:ff:ff:ff
5: br-lan: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 1000
    link/ether e4:95:6e:44:a8:3a brd ff:ff:ff:ff:ff:ff

LAN cable unplugged (no IPv4 DHCP):

root@OpenWrt:~# ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc fq_codel state DOWN qlen 1000
    link/ether e4:95:6e:44:a8:3a brd ff:ff:ff:ff:ff:ff
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master br-lan state UP qlen 1000
    link/ether e4:95:6e:44:a8:3a brd ff:ff:ff:ff:ff:ff
4: wlan0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether e4:95:6e:44:a8:3a brd ff:ff:ff:ff:ff:ff
5: br-lan: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 1000
    link/ether e4:95:6e:44:a8:3a brd ff:ff:ff:ff:ff:ff
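To compare ip link outputs like the two above at a glance, a small POSIX-shell helper can reduce each interface to its link state. This is a sketch, not an OpenWrt tool: the function name link_summary is hypothetical, and it assumes the output format shown here, where LOWER_UP means the link layer has a carrier and NO-CARRIER means the interface is up but nothing is connected.

```sh
# Hypothetical helper: summarise each interface's link state from
# `ip link` output read on stdin, printing one "name state" pair per line.
link_summary() {
	awk '
		# Interface header lines look like "3: eth1: <FLAGS> mtu ..."
		/^[0-9]+: / {
			name = $2
			sub(/:$/, "", name)
			flags = $3
			# UP must appear as a whole flag, not as part of LOWER_UP
			if (flags !~ /[<,]UP[,>]/)       state = "admin-down"
			else if (flags ~ /NO-CARRIER/)   state = "no-carrier"
			else if (flags ~ /LOWER_UP/)     state = "carrier"
			else                             state = "unknown"
			print name, state
		}
	'
}
```

On the router, run ip link | link_summary. For the output above it reports eth1 and br-lan as "carrier" in both cases, eth0 as "no-carrier" and wlan0 as "admin-down".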

I'm quite inexperienced with OpenWrt, but it seems that dnsmasq does not use the "normal" configuration file directly; instead, the init script assembles one during startup.

So may I assume that somewhere in this (quite complex) init script a check fails that shouldn't, which prevents the DHCPv4 range from being added to the generated config file and, in consequence, leaves dnsmasq running without offering DHCPv4 addresses?


Yes, that's my understanding as well. It's perhaps worth comparing what is present when things work with what is present when they don't.

This command might help find the auto-generated files (output from an AR300M-Lite, just as an example, not an AR150):

root@OpenWrt:~# find /tmp/ -name '*dns*' -type f -exec ls -l {} \;
-rw-r--r--    1 root     root           646 Feb 28 18:07 /tmp/etc/dnsmasq.conf.cfg01411c
-rw-r--r--    1 root     root             0 Feb 28 18:07 /tmp/lock/procd_dnsmasq.lock
-rw-r--r--    1 root     root             2 Feb 28 18:07 /tmp/run/dnsmasq.cfg01411c.br-lan.dhcp
-rw-r--r--    1 dnsmasq  dnsmasq          5 Feb 28 18:07 /tmp/run/dnsmasq/dnsmasq.cfg01411c.pid

I already found /tmp/etc/dnsmasq.conf.cfg01411c, which is apparently the config file actually used by dnsmasq, and compared the "plugged" and "unplugged" versions.

The only difference is that the "plugged" version contains the line

dhcp-range=set:lan,192.168.1.100,192.168.1.249,255.255.255.0,12h

whereas the "unplugged" version lacks this line (so dnsmasq does not offer DHCPv4 addresses).
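The same comparison can be scripted. A minimal sketch with a hypothetical helper, check_dhcp_range, which prints any dhcp-range= lines found in a generated dnsmasq config and returns non-zero if there are none:

```sh
# Hypothetical helper: print the dhcp-range lines of a generated dnsmasq
# config; if there are none, say so on stderr and return 1.
check_dhcp_range() {
	conf="$1"
	if grep '^dhcp-range=' "$conf"; then
		return 0
	fi
	echo "no dhcp-range in $conf" >&2
	return 1
}
```

The cfg01411c suffix differs between installs, so on the router a glob is convenient: for f in /tmp/etc/dnsmasq.conf.*; do check_dhcp_range "$f"; done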


Might be related to this part of /etc/init.d/dnsmasq:

dhcp_check() {
	local ifname="$1"
	local stamp="${BASEDHCPSTAMPFILE_CFG}.${ifname}.dhcp"
	local rv=0

	[ -s "$stamp" ] && return $(cat "$stamp")

	# If there's no carrier yet, skip this interface.
	# The init script will be called again once the link is up
	case "$(devstatus "$ifname" | jsonfilter -e @.carrier)" in
		false) return 1;;
	esac

	udhcpc -n -q -s /bin/true -t 1 -i "$ifname" >&- && rv=1 || rv=0

	[ $rv -eq 1 ] && \
		logger -t dnsmasq \
			"found already running DHCP-server on interface '$ifname'" \
			"refusing to start, use 'option force 1' to override"

	echo $rv > "$stamp"
	return $rv
}

(I have not followed the logic through on this other than seeing "If there's no carrier yet, skip this interface")

I would have thought that the hotplug scripts would update this when the interface comes up ("The init script will be called again once the link is up"). To track this down, I would start by understanding the logic in /etc/init.d/dnsmasq and the hotplug scripts.

logger can be used to trace the flow of these scripts. They can be edited on a "live" install like any other file.
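As a sketch of that kind of tracing, here is a hypothetical trace helper that could be pasted near the top of /etc/init.d/dnsmasq. It tags messages so they are easy to find in the system log and also echoes them to stderr for runs by hand. The tag dnsmasq-trace is an arbitrary choice.

```sh
# Hypothetical tracing helper for init scripts: send a tagged message to
# syslog via logger (errors silenced if logger is unavailable) and echo
# the same message to stderr.
trace() {
	logger -t dnsmasq-trace "$*" 2>/dev/null
	echo "dnsmasq-trace: $*" >&2
}
```

For example, at the start of dhcp_check() one could add trace "dhcp_check $ifname carrier=$(devstatus "$ifname" | jsonfilter -e @.carrier)" and, after a reboot, read the result with logread | grep dnsmasq-trace.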

Is this Linux shell scripting language?

How do I decode it? I mean, how can I read it easily?

Yes, #!/bin/sh

I can't understand the 'jsonfilter' bit.

Isn't JSON something Java-related?

JSON is a text-based data-representation format, like XML, but different (and, to many humans, much more readable).

You're in the "guts" of an OS with this, and a good understanding of how shell scripts work is assumed.

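devstatus "$ifname" | jsonfilter -e @.carrier reads the device status (reported as JSON) and extracts its carrier field, i.e. whether a carrier is present on the device. If it is false, the case statement makes dhcp_check() return 1.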

The relevant line is added by dhcp_add():

dhcp_add() {
    local cfg="$1"
    ...
    config_get dhcpv4 "$cfg" dhcpv4
    ...
    if [ "$dhcpv4" != "disabled" ] ; then
        xappend "--dhcp-range=$tags$nettag$START,$END,$NETMASK,$leasetime${options:+ $options}"
        ...
    fi

which is called by dnsmasq_start():

dnsmasq_start()
...
    elif [ "$DNSMASQ_DHCP_VER" -gt 0 ] ; then
        [ -n "$BOOT" ] || config_foreach filter_dnsmasq dhcp dhcp_add "$cfg"
    fi

Actually, the init script is called twice in both cases (see the boot logs I posted in the bug report).

I don't see how a LAN cable being plugged or not influences this …

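Looks like dhcp_add() gets short-circuited by a no-carrier condition at this line:

    [ $force -gt 0 ] || dhcp_check "$ifname" || return 0

because dhcp_check() returns 1 when there is no carrier, so unless force is set, dhcp_add() returns before the dhcp-range line is appended.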
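To make the short-circuit concrete, here is a stripped-down model of that guard. The function names are hypothetical; the carrier state is passed in as an argument instead of being read with devstatus/jsonfilter, and the udhcpc probe for a foreign DHCP server is left out (assumed to find none).

```sh
# Hypothetical model of dhcp_check(): returns 1 ("skip this interface")
# when there is no carrier. The real function also probes for another
# DHCP server with udhcpc; this model assumes none is found.
dhcp_check_model() {
	case "$1" in
		false) return 1 ;;
	esac
	return 0
}

# Hypothetical model of the guard in dhcp_add():
#   [ $force -gt 0 ] || dhcp_check "$ifname" || return 0
# Prints a marker only when the dhcp-range line would be appended.
dhcp_add_model() {
	force="$1"
	carrier="$2"
	[ "$force" -gt 0 ] || dhcp_check_model "$carrier" || return 0
	echo "dhcp-range added"
}
```

dhcp_add_model 0 false prints nothing (no carrier, range skipped), dhcp_add_model 0 true prints "dhcp-range added", and so does dhcp_add_model 1 false: with force set, the carrier is never checked.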

It's still puzzling why the later call, assuming the carrier is detected by then, fails to bring up DHCP. For that matter, it's also puzzling why a bridged wireless adapter that is up doesn't get DHCP started.

OK, thanks, found it:

https://github.com/benschw/jsonfilter

I'll try to figure out the carrier part later.

Thanks as always!

So would simply changing return 0 to return 1 make the router behave as @l3u needs?

Could it be that the init.d script checks the Ethernet link too late or too early?

Regarding the wireless adapter powering up: doesn't OpenWrt have Wi-Fi off by default?

What happens if OpenWrt is flashed to a router with Wi-Fi on instead of off as the standard setup?

This happens no matter whether Wi-Fi is on or off. It only depends on whether a LAN cable is plugged in.

Maybe this really is a timing problem … here's part of the boot log with a LAN cable plugged in:

Mon Feb 25 11:10:42 2019 kern.info kernel: [   27.288895] br-lan: port 1(eth1) entered disabled state
Mon Feb 25 11:10:43 2019 kern.info kernel: [   27.990110] eth1: link up (1000Mbps/Full duplex)
Mon Feb 25 11:10:43 2019 kern.info kernel: [   27.993339] br-lan: port 1(eth1) entered blocking state
Mon Feb 25 11:10:43 2019 kern.info kernel: [   27.998492] br-lan: port 1(eth1) entered forwarding state
Mon Feb 25 11:10:43 2019 daemon.notice netifd: Network device 'eth1' link is up
Mon Feb 25 11:10:43 2019 daemon.notice netifd: bridge 'br-lan' link is up
Mon Feb 25 11:10:43 2019 daemon.notice netifd: Interface 'lan' has link connectivity
Mon Feb 25 11:10:43 2019 kern.info kernel: [   28.048892] IPv6: ADDRCONF(NETDEV_CHANGE): br-lan: link becomes ready
Mon Feb 25 11:10:43 2019 daemon.info procd: - init complete -
Mon Feb 25 11:10:48 2019 daemon.info dnsmasq[726]: exiting on receipt of SIGTERM
Mon Feb 25 11:10:48 2019 daemon.info dnsmasq[1210]: started, version 2.80 cachesize 150
Mon Feb 25 11:10:48 2019 daemon.info dnsmasq[1210]: DNS service limited to local subnets
Mon Feb 25 11:10:48 2019 daemon.info dnsmasq[1210]: compile time options: IPv6 GNU-getopt no-DBus no-i18n no-IDN DHCP no-DHCPv6 no-Lua TFTP no-conntrack no-ipset no-auth no-DNSSEC no-ID loop-detect inotify dumpfile
Mon Feb 25 11:10:48 2019 daemon.info dnsmasq-dhcp[1210]: DHCP, IP range 192.168.1.100 -- 192.168.1.249, lease time 12h
Mon Feb 25 11:10:48 2019 daemon.info dnsmasq[1210]: using local addresses only for domain test
Mon Feb 25 11:10:48 2019 daemon.info dnsmasq[1210]: using local addresses only for domain onion
Mon Feb 25 11:10:48 2019 daemon.info dnsmasq[1210]: using local addresses only for domain localhost
Mon Feb 25 11:10:48 2019 daemon.info dnsmasq[1210]: using local addresses only for domain local
Mon Feb 25 11:10:48 2019 daemon.info dnsmasq[1210]: using local addresses only for domain invalid
Mon Feb 25 11:10:48 2019 daemon.info dnsmasq[1210]: using local addresses only for domain bind
Mon Feb 25 11:10:48 2019 daemon.info dnsmasq[1210]: using local addresses only for domain lan
Mon Feb 25 11:10:48 2019 daemon.warn dnsmasq[1210]: no servers found in /tmp/resolv.conf.auto, will retry
Mon Feb 25 11:10:48 2019 daemon.info dnsmasq[1210]: read /etc/hosts - 4 addresses
Mon Feb 25 11:10:48 2019 daemon.info dnsmasq[1210]: read /tmp/hosts/dhcp.cfg01411c - 2 addresses
Mon Feb 25 11:10:48 2019 daemon.info dnsmasq-dhcp[1210]: read /etc/ethers - 0 addresses
Mon Feb 25 11:10:48 2019 daemon.info dnsmasq[1210]: read /etc/hosts - 4 addresses
Mon Feb 25 11:10:48 2019 daemon.info dnsmasq[1210]: read /tmp/hosts/dhcp.cfg01411c - 2 addresses
Mon Feb 25 11:10:48 2019 daemon.info dnsmasq-dhcp[1210]: read /etc/ethers - 0 addresses
Mon Feb 25 11:11:26 2019 daemon.notice netifd: Network device 'eth1' link is down
Mon Feb 25 11:11:26 2019 kern.info kernel: [   71.409043] eth1: link down
Mon Feb 25 11:11:26 2019 kern.info kernel: [   71.410881] br-lan: port 1(eth1) entered disabled state
Mon Feb 25 11:11:28 2019 daemon.notice netifd: bridge 'br-lan' link is down
Mon Feb 25 11:11:28 2019 daemon.notice netifd: Interface 'lan' has link connectivity loss
Mon Feb 25 11:11:28 2019 kern.info kernel: [   72.939935] eth1: link up (1000Mbps/Full duplex)
Mon Feb 25 11:11:28 2019 kern.info kernel: [   72.943174] br-lan: port 1(eth1) entered blocking state
Mon Feb 25 11:11:28 2019 kern.info kernel: [   72.948341] br-lan: port 1(eth1) entered forwarding state

And here is the same part without a LAN cable:

Mon Feb 25 11:10:42 2019 kern.info kernel: [   27.288888] br-lan: port 1(eth1) entered disabled state
Mon Feb 25 11:10:43 2019 daemon.info procd: - init complete -
Mon Feb 25 11:10:45 2019 daemon.info dnsmasq[727]: read /etc/hosts - 4 addresses
Mon Feb 25 11:10:45 2019 daemon.info dnsmasq[727]: read /tmp/hosts/dhcp.cfg01411c - 0 addresses
Mon Feb 25 11:10:54 2019 kern.info kernel: [   38.629933] eth1: link up (1000Mbps/Full duplex)
Mon Feb 25 11:10:54 2019 kern.info kernel: [   38.633169] br-lan: port 1(eth1) entered blocking state
Mon Feb 25 11:10:54 2019 kern.info kernel: [   38.638318] br-lan: port 1(eth1) entered forwarding state
Mon Feb 25 11:10:54 2019 daemon.notice netifd: Network device 'eth1' link is up
Mon Feb 25 11:10:54 2019 kern.info kernel: [   38.646337] IPv6: ADDRCONF(NETDEV_CHANGE): br-lan: link becomes ready
Mon Feb 25 11:10:54 2019 daemon.notice netifd: bridge 'br-lan' link is up
Mon Feb 25 11:10:54 2019 daemon.notice netifd: Interface 'lan' has link connectivity
Mon Feb 25 11:11:00 2019 daemon.notice netifd: Network device 'eth1' link is down
Mon Feb 25 11:11:00 2019 kern.info kernel: [   44.749048] eth1: link down
Mon Feb 25 11:11:00 2019 kern.info kernel: [   44.750885] br-lan: port 1(eth1) entered disabled state
Mon Feb 25 11:11:01 2019 daemon.notice netifd: bridge 'br-lan' link is down
Mon Feb 25 11:11:01 2019 daemon.notice netifd: Interface 'lan' has link connectivity loss
Mon Feb 25 11:11:01 2019 kern.info kernel: [   46.279931] eth1: link up (1000Mbps/Full duplex)
Mon Feb 25 11:11:01 2019 kern.info kernel: [   46.283167] br-lan: port 1(eth1) entered blocking state
Mon Feb 25 11:11:01 2019 kern.info kernel: [   46.288315] br-lan: port 1(eth1) entered forwarding state

It seems dnsmasq is not restarted in this case … but still: why doesn't restarting it via /etc/rc.local, at the very end of the boot process, fix it?!
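To compare boot logs like the two excerpts above more systematically, a hypothetical helper can count dnsmasq starts and configured DHCP ranges in syslog text. It relies only on the message formats visible in those logs ("started, version" and "DHCP, IP range"), so it is a sketch rather than a general log parser.

```sh
# Hypothetical helper: count dnsmasq starts and "DHCP, IP range" messages
# in syslog text read on stdin. A start without a matching range line
# means dnsmasq is running without any DHCPv4 range configured.
dnsmasq_log_summary() {
	awk '
		/dnsmasq\[[0-9]+\]: started, version/      { starts++ }
		/dnsmasq-dhcp\[[0-9]+\]: DHCP, IP range/   { ranges++ }
		END { printf "starts=%d dhcp_ranges=%d\n", starts + 0, ranges + 0 }
	'
}
```

On the router, run logread | dnsmasq_log_summary. For the "plugged" excerpt it gives starts=1 dhcp_ranges=1, while the "unplugged" excerpt contains no "started" line at all, consistent with dnsmasq not being restarted in that case.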