(Solved) Modifying / restarting the LAN (br-lan) interface in LuCI causes the clients to remove their default / LAN routes during the next DHCP renew

When I modify or restart the LAN interface (br-lan) in LuCI after a configuration change, the 'LAN' routes on some of my clients disappear (I think during the next DHCP renew) and are not restored automatically. I have to either add the routes by hand or reboot the computer(s), and even a reboot does not always help. Once the routes are gone, even a manual DHCP renew does not restore them.

The result of ip route on my desktop before the routes are removed:

default via 10.170.0.1 dev enp0s31f6 proto dhcp src 10.170.0.165 metric 203 
10.170.0.0/16 dev enp0s31f6 proto dhcp scope link src 10.170.0.165 metric 203 

Note: After the problem occurs (I suspect during the next DHCP renew), the above routes are gone.

The result of ip addr on my desktop before the routes are removed (the output is the same afterwards):

3: enp0s31f6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 4c:cc:6a:05:56:cc brd ff:ff:ff:ff:ff:ff
    inet 10.170.0.165/16 brd 10.170.200.255 scope global noprefixroute enp0s31f6
       valid_lft forever preferred_lft forever

I suspect that this happens after the following DHCP request I captured using tcpdump (changed hostnames for privacy reasons):

15:45:57.161373 IP (tos 0x0, ttl 64, id 23833, offset 0, flags [DF], proto UDP (17), length 363)
    desktop.lan.68 > router.lan.67: [udp sum ok] BOOTP/DHCP, Request from b8:27:eb:27:69:6c (oui Unknown), length 335, xid 0x569ee1b8, secs 1119, Flags [none] (0x0000)
          Client-IP desktop.lan
          Client-Ethernet-Address b8:27:eb:27:69:6c (oui Unknown)
          Vendor-rfc1048 Extensions
            Magic Cookie 0x63825363
            DHCP-Message Option 53, length 1: Request
            Client-ID Option 61, length 7: ether b8:27:eb:27:69:6c
            MSZ Option 57, length 2: 1472
            Vendor-Class Option 60, length 46: "dhcpcd-6.11.5:Linux-4.14.76-v7+:armv7l:BCM2835"
            Hostname Option 12, length 8: "desktop"
            T145 Option 145, length 1: 1
            Parameter-Request Option 55, length 15: 
              Subnet-Mask, Classless-Static-Route, Static-Route, Default-Gateway
              Domain-Name-Server, Hostname, Domain-Name, MTU
              BR, NTP, Lease-Time, Server-ID
              RN, RB, Option 119
            END Option 255, length 0
15:45:57.166027 IP (tos 0xc0, ttl 64, id 37747, offset 0, flags [none], proto UDP (17), length 335)
    router.lan.67 > desktop.lan.68: [bad udp cksum 0x16b0 -> 0xc236!] BOOTP/DHCP, Reply, length 307, xid 0x569ee1b8, secs 1119, Flags [none] (0x0000)
          Client-IP desktop.lan
          Your-IP desktop.lan
          Server-IP router.lan
          Client-Ethernet-Address b8:27:eb:27:69:6c (oui Unknown)
          sname "router"[|bootp]

The relevant part of the /etc/config/dhcp configuration:

config dnsmasq
        option domainneeded '1'
        option localise_queries '1'
        option rebind_protection '1'
        option rebind_localhost '1'
        option local '/lan/'
        option domain 'lan'
        option expandhosts '1'
        option authoritative '1'
        option readethers '1'
        option leasefile '/tmp/dhcp.leases'
        option resolvfile '/tmp/resolv.conf.auto'
        option localservice '1'
        option enable_tftp '1'
        option tftp_root '/mnt/tftproot'
        option nonwildcard '0'

config dhcp 'lan'
        option interface 'lan'
        option start '100'
        option limit '150'
        option dhcpv6 'server'
        option ra 'server'
        option ra_management '1'
        list dhcp_option '6,10.170.0.1'
        list dhcp_option '3,10.170.0.1'
        option leasetime '5m'
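
As a rough cross-check of the DHCP settings above, the following Python sketch works out which addresses the pool covers and how soon a client would normally try to renew. It assumes that `start` is an offset from the network address and `limit` is the number of addresses in the pool, and that the client uses the RFC 2131 default of renewing at half the lease time. The variable names are illustrative only.

```python
import ipaddress

# Values copied from the configs in this post.
lan = ipaddress.ip_interface("10.170.0.1/16")
start, limit = 100, 150          # option start / option limit
lease_seconds = 5 * 60           # option leasetime '5m'

# Assumption: 'start' is an offset from the network address and
# 'limit' is the number of addresses in the pool.
first = lan.network.network_address + start
last = first + (limit - 1)

# Assumption: the client follows the RFC 2131 default and starts
# renewing at 50% of the lease time (T1).
t1_seconds = lease_seconds * 0.5

desktop = ipaddress.ip_address("10.170.0.165")
print("pool:", first, "-", last)
print("desktop in pool:", first <= desktop <= last)
print("renew after (s):", t1_seconds)
```

With a 5-minute lease, renew requests like the one in the capture would be expected every few minutes, which fits with the routes disappearing shortly after the interface restart.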

The relevant part of the /etc/config/network configuration:

config interface 'lan'
        option type 'bridge'
        option proto 'static'
        option ip6assign '60'
        option igmp_snooping '1'
        option stp '1'
        option ipaddr '10.170.0.1'
        option netmask '255.255.0.0'
        option ifname 'eth0.1'

The relevant part of the /etc/config/firewall:

config zone
        option name 'lan'
        option input 'ACCEPT'
        option output 'ACCEPT'
        option network 'lan'
        option forward 'REJECT'

config forwarding
        option dest 'wan'
        option src 'lan'

Custom additions to /etc/dnsmasq.conf:

# Polycom provisioning
dhcp-option=66,"10.170.0.1"

# Computer provisioning
dhcp-boot=pxelinux.0,router,10.170.0.1

# Restrict listener
listen-address=127.0.0.1,10.170.0.1,10.180.0.1
bind-interfaces

I used the above for years without issues, with one important difference: I recently migrated my network from the 192.168.170.0/24 subnet to the 10.170.0.0/16 subnet.

Any idea why this is happening (only after I modified / restarted the LAN interface on the router)?

I think you should add these to /etc/config/dhcp. I think your dhcp config rewrites the conf file upon restart.

Also, you don't need to specify the router using DHCP option 3; the router knows it's a router.

The file /etc/dnsmasq.conf is not rewritten. Instead, a new file is created at /var/etc/dnsmasq.conf*, which uses the values from /etc/config/dhcp plus any custom settings from /etc/dnsmasq.conf.

Some devices do/did not automatically assume that the DHCP server is also the router, which is why I added that. I've been doing this for years (maybe even more than 10) without issues, so I do not think that is what is causing my problem.

There is another thing I changed after this started happening: I removed the following parts from the configuration files, and I believe the issue is now gone.

Relevant parts that I removed from /etc/config/firewall:

config zone
       option input 'ACCEPT'
       option output 'ACCEPT'
       option forward 'ACCEPT'
       option masq '1'
       option mtu_fix '1'
       option name 'vpn'
       option network 'vpn'

config forwarding
       option src 'lan'
       option dest 'vpn'

config forwarding
       option src 'wan'
       option dest 'vpn'

config forwarding
       option dest 'lan'
       option src 'vpn'

config forwarding
       option dest 'wan'
       option src 'vpn'

Relevant parts that I removed from /etc/config/network (removed sensitive parts):

config interface 'vpn'
        option proto 'wireguard'
        option private_key 'XXXXXXXXXXXXXXXXXXXXXXXXXX'
        option listen_port '50400'
        list addresses '10.170.200.1/24'

config wireguard_vpn
        option public_key 'XXXXXXXXXXXXXXXXXXXXXXXXXX'
        list allowed_ips '10.170.200.100/32'
        option route_allowed_ips '1'
        option persistent_keepalive '25'

And the following in /etc/sysctl.d/15-arp-proxy-test.conf:

net.ipv4.conf.all.proxy_arp = 1

When running ip route on my router (removed public IPs):

default via <internet-gw> dev eth1.34 proto static src <internet-ip> metric 10 
10.170.0.0/16 dev br-lan proto kernel scope link src 10.170.0.1 
10.170.200.0/24 dev vpn proto kernel scope link src 10.170.200.1 
10.170.200.100 dev vpn proto static scope link 
10.180.0.0/16 dev br-iot proto kernel scope link src 10.180.0.1 
<internet ip>/23 dev eth1.34 proto static scope link metric 10 

When running ip addr on my router (removed public IPs):

86: vpn: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1420 qdisc noqueue state UNKNOWN group default qlen 1
    link/none 
    inet 10.170.200.1/24 brd 10.170.200.255 scope global vpn
       valid_lft forever preferred_lft forever
92: br-lan: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 32:46:9a:fe:2d:37 brd ff:ff:ff:ff:ff:ff
    inet 10.170.0.1/16 brd 10.170.255.255 scope global br-lan
       valid_lft forever preferred_lft forever

Could any of this cause the issues I mentioned?

Update: I did some more tests, and when I disable everything related to the VPN interface and zone I cannot reproduce the issue. I think it must have something to do with...

10.170.0.0/16 dev br-lan proto kernel scope link src 10.170.0.1 
10.170.200.0/24 dev vpn proto kernel scope link src 10.170.200.1 
10.170.200.100 dev vpn proto static scope link 

... causing some sort of loop. I still have to try removing the 10.170.200.0/24 line, as it seems unnecessary.
It is weird, because no clients are even using the 10.170.200.0/24 range, yet somehow DHCP gets confused.
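
To make the overlap between these three routes concrete, here is a minimal Python sketch of a longest-prefix-match lookup over them. It only models which route is the most specific match for a destination; it is not the kernel's actual lookup code and does not by itself explain the DHCP behaviour. The `lookup` helper and `ROUTES` list are hypothetical names for this illustration.

```python
import ipaddress

# Router routes from the 'ip route' output above (destination, device).
ROUTES = [
    ("10.170.0.0/16", "br-lan"),
    ("10.170.200.0/24", "vpn"),
    ("10.170.200.100/32", "vpn"),
]

def lookup(dest, routes):
    """Return (network, device) of the most specific matching route, or None."""
    addr = ipaddress.ip_address(dest)
    matches = [
        (ipaddress.ip_network(net), dev)
        for net, dev in routes
        if addr in ipaddress.ip_network(net)
    ]
    # The longest prefix (largest prefixlen) is the most specific route.
    return max(matches, key=lambda m: m[0].prefixlen, default=None)

for dest in ("10.170.0.165", "10.170.200.7", "10.170.200.100"):
    print(dest, "->", lookup(dest, ROUTES))
```

In this model, any destination in 10.170.200.0/24 is sent to the vpn device rather than br-lan, even though the whole /24 lies inside the LAN's 10.170.0.0/16.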

Update 2: It seems that by removing the following line from the routing table...

10.170.200.0/24 dev vpn proto kernel scope link src 10.170.200.1 

...I fixed the issue. I did this by specifying 10.170.200.1/32 as the IP address for WireGuard instead of 10.170.200.1/24 as shown in most tutorials (BTW, I noticed that many WireGuard tutorials mention things that are clearly wrong). I think the /24 route causes some kind of loop, but I am not sure, as I could not verify this by looking at the captures. The line 10.170.200.0/24 dev vpn is redundant anyway, as the line 10.170.200.100 dev vpn already provides what is needed. Maybe someone more knowledgeable than me can explain why the DHCP issue happened?
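
The difference between the two address notations can be sketched in Python as follows. The sketch only computes the prefix implied by each interface address and how much of the LAN's 10.170.0.0/16 it covers; `implied_prefix` is a hypothetical helper, and the explanation of the DHCP symptom itself remains open.

```python
import ipaddress

lan = ipaddress.ip_network("10.170.0.0/16")
peer = ipaddress.ip_address("10.170.200.100")   # allowed_ips entry of the peer

def implied_prefix(address):
    """Return the network prefix implied by an interface address like 'a.b.c.d/len'."""
    return ipaddress.ip_interface(address).network

for address in ("10.170.200.1/24", "10.170.200.1/32"):
    net = implied_prefix(address)
    covered = net.num_addresses if net.subnet_of(lan) else 0
    print(address, "->", net,
          "| LAN addresses covered:", covered,
          "| peer inside prefix:", peer in net)
```

With the /32 address the peer is no longer inside the interface's own prefix, so it is reached only through the static 10.170.200.100 dev vpn route that route_allowed_ips adds.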
