Upgraded 19.07 to 21.02 and router stopped working

I have a 19.07 LXD installation of OpenWrt serving as the router for my network. Since there is now an official way to install it in LXD (as opposed to the hacky way I used before), I did so. I also migrated the LXD config and then restored the OpenWrt config backup onto the new 21.02 container.

The container starts just fine, and lxc list shows it gets the correct DHCP IP from the ISP and the static IP on the LAN interface. I am able to log in to LuCI and the router can ping both ways (internet addresses and LAN), but LAN machines (including the LXD host) are totally cut off from the internet. They can ping only as far as the router's internal and external IPs (and each other).

Pinging 8.8.8.8 gives no result, and pinging google.com returns:

ping: gooogle.com: Name or service not known

Here are the key config files:

# ls -l /etc/config/
-rw-------    1 root     root          2479 Feb 11 01:32 dhcp
-rw-------    1 root     root          1000 Jun 14  2021 dhcp-opkg
-rw-------    1 root     root            86 Jun 15  2021 dropbear
-rw-------    1 root     root          6126 Feb 11 01:03 firewall
-rw-------    1 root     root           862 Dec  6  2020 luci
-rw-r--r--    1 root     root           687 Oct  2 17:41 luci-opkg
-rw-------    1 root     root           675 Feb 11 00:59 network
-rw-------    1 root     root           167 Feb 11 00:58 rpcd
-rw-------    1 root     root           423 Oct 27 09:23 system
-rw-r--r--    1 root     root           807 Oct  2 17:41 ucitrack
-rw-------    1 root     root          4140 Dec  1  2020 uhttpd

/etc/config/network: https://pastebin.com/raw/bup10Tkq
/etc/config/firewall : https://pastebin.com/raw/fKyGEESA
/etc/config/dhcp : https://pastebin.com/raw/6yNcLa83
/etc/config/dhcp-opkg : https://pastebin.com/raw/tjtk5Uqi
/etc/config/ucitrack : https://pastebin.com/raw/pKecmT1P

Also, this entry keeps repeating in the system log (cgi-bin/luci/admin/status/syslog):

Fri Feb 11 20:27:40 2022 daemon.warn dnsmasq-dhcp[1595]: no address range available for DHCP request via red0
Fri Feb 11 20:27:41 2022 daemon.warn dnsmasq-dhcp[1595]: no address range available for DHCP request via red0
Fri Feb 11 20:27:42 2022 daemon.warn dnsmasq-dhcp[1595]: no address range available for DHCP request via red0
Fri Feb 11 20:27:43 2022 daemon.warn dnsmasq-dhcp[1595]: no address range available for DHCP request via red0
Fri Feb 11 20:27:43 2022 daemon.warn dnsmasq-dhcp[1595]: no address range available for DHCP request via red0
Fri Feb 11 20:27:43 2022 daemon.warn dnsmasq-dhcp[1595]: no address range available for DHCP request via red0

Nope, don't restore a backup from one major release to another. There are significant changes between releases and you'll end up with weird issues.
Reset the config to defaults, then use the contents of the backup as a guide to reconfigure the router manually.

I followed your advice and set up everything from scratch manually.
The end result is the exact same situation as described in the opening post.

I am not sure if you have fixed it already, but in the network configuration you are using GRN and RED as interface names, while in the dhcp configuration you kept lan/wan. If you are not sure, keep the original names.

Is that a problem? I have the same setup in the 19.07 config and it is working fine. I'll try it out tomorrow.

Well yes. How else will Dnsmasq know on which interface to apply each policy?

Because 'lan' and 'wan' are names of zones, not interfaces, and the zones are named correctly in /etc/config/firewall? That is how I understand it, and it is supported by the fact that 19.07 works just fine with the above configuration.

@trendy is right: your network and dhcp configs should match. Use the interface names defined in the network config in the dhcp config as well.

Firewall zones are not used in the dhcp config:

config dhcp 'lan'
	option interface 'lan'

Here you created a named dhcp config section (called lan) in which option interface is set to lan, but it should be set to GRN or RED.
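
As a rough sketch of what matching sections could look like, assuming the interfaces in /etc/config/network really are named GRN (LAN side) and RED (ISP side): the section names and the start, limit and leasetime values below are just the usual OpenWrt defaults used for illustration, not values taken from the posted configs.

```
# Assumption: 'GRN' and 'RED' are the interface names from /etc/config/network.
config dhcp 'GRN'
	option interface 'GRN'
	# Illustrative pool settings (stock defaults), adjust to your LAN
	option start '100'
	option limit '150'
	option leasetime '12h'

config dhcp 'RED'
	option interface 'RED'
	# Do not serve DHCP on the ISP-facing interface
	option ignore '1'
```

After editing, dnsmasq has to be restarted (/etc/init.d/dnsmasq restart) for the change to take effect.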

I see. I updated the configuration with your suggestions, but the problem remains the same:
/etc/config/dhcp: https://pastebin.com/raw/AYPxJiC2

Install tcpdump: opkg update; opkg install tcpdump
Run a packet capture to see what is happening to the packets: tcpdump -i any -evn host 8.8.4.4
Then run ping 8.8.4.4 on a LAN host, stop it after a few lost packets, and count them. Stop tcpdump and paste the output here, along with the number of lost packets.

And post the output of sysctl net.ipv4.ip_forward

tcpdump -i any -evn host 8.8.4.4
https://pastebin.com/raw/7juQ4n3q

# ping 8.8.4.4
PING 8.8.4.4 (8.8.4.4) 56(84) bytes of data.
--- 8.8.4.4 ping statistics ---
4 packets transmitted, 0 received, 100% packet loss, time 3096ms

# sysctl net.ipv4.ip_forward
net.ipv4.ip_forward = 1

The problem is that packets are leaving the router with their original source address, when they should be NATed. For example, compare the source address in the In and Out entries:

18:19:47.944977  In 56:8c:15:24:6e:09 ethertype IPv4 (0x0800), length 100: (tos 0x0, ttl 64, id 63814, offset 0, flags [DF], proto ICMP (1), length 84)
    192.168.7.2 > 8.8.4.4: ICMP echo request, id 2, seq 4, length 64

18:19:47.945072 Out 00:90:27:77:fb:02 ethertype IPv4 (0x0800), length 100: (tos 0x0, ttl 63, id 63814, offset 0, flags [DF], proto ICMP (1), length 84)
    192.168.7.2 > 8.8.4.4: ICMP echo request, id 2, seq 4, length 64
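
If the capture is long, the same check can be done mechanically. This is only a sketch: find_unnatted is a made-up helper name, and it assumes the two-line layout shown above (a header line whose second field is In or Out, followed by an indented "SRC > DST:" line). Newer tcpdump builds may print the interface name before the direction, in which case the field numbers would need adjusting.

```shell
# find_unnatted (hypothetical helper): reads `tcpdump -i any -evn` output on
# stdin and reports packets that go Out while still carrying a private
# (RFC 1918) source address, i.e. packets that were not NATed.
find_unnatted() {
    awk '
        # Header line: remember the direction, then move to the next line.
        $2 == "In" || $2 == "Out" { dir = $2; next }
        # Continuation line of an outgoing packet: "SRC > DST: ..."
        dir == "Out" && $2 == ">" {
            if ($1 ~ /^(10\.|192\.168\.|172\.(1[6-9]|2[0-9]|3[01])\.)/) {
                dst = $3
                sub(/:$/, "", dst)   # strip the trailing colon after DST
                print "not NATed: " $1 " -> " dst
            }
        }
        # Any other line ends the current packet.
        { dir = "" }
    '
}
```

Usage would be something like tcpdump -i any -evn host 8.8.4.4 | find_unnatted, or feeding it a saved capture file. Note that it flags every outgoing packet with a private source, so it only makes sense with a filter on an internet host as above.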

What is the output of iptables-save -c -t nat?

# iptables-save -c -t nat 
# Generated by iptables-save v1.8.7 on Tue Feb 15 19:35:51 2022
*nat
:PREROUTING ACCEPT [0:0]
:INPUT ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
COMMIT
# Completed on Tue Feb 15 19:35:51 2022

There is nothing in there. Run service firewall restart and paste the output here.
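
For reference, a quick way to tell whether the nat table holds any rules at all is to count the "-A" lines in the dump. A minimal sketch, with count_nat_rules being a made-up helper name; a setup that masquerades would normally show at least one rule, while a dump containing only chain policies and COMMIT gives 0.

```shell
# count_nat_rules (hypothetical helper): reads `iptables-save -t nat` output
# (with or without -c counters) on stdin and prints the number of rule lines,
# i.e. lines containing "-A <chain> ...".
count_nat_rules() {
    # grep -c exits non-zero when the count is 0, so mask the exit status.
    grep -c -E '(^|[[:space:]])-A[[:space:]]' || true
}
```

Usage: iptables-save -c -t nat | count_nat_rules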

# service firewall restart 
sh: service: not found

/etc/init.d/firewall restart

# /etc/init.d/firewall restart
Warning: Unable to locate ipset utility, disabling ipset support
Warning: Section @defaults[0] requires unavailable target extension FLOWOFFLOAD, disabling
Warning: Section @defaults[0] requires unavailable target extension FLOWOFFLOAD, disabling
 * Set tcp_ecn to off
 * Set tcp_syncookies to on
 * Set tcp_window_scaling to on
 * Running script '/etc/firewall.user'

Evidently there is something weird going on with the firewall as well as with the image, since service is missing. As I am not experienced with container installations, I'll leave this to other members of the forum, and hopefully you'll have better luck with them.
Just mention where you got the image from.

I got the image from here: https://openwrt.org/docs/guide-user/virtualization/lxc
Except I am using an LXD command, not the naked LXC. It is:

lxc launch images:openwrt/21.02 router

Which service is missing?