[22.03] No routing until firewall gets reloaded

I am running 22.03 on my RB5009UG. I've added a few firewall rules, and a WireGuard instance, but otherwise it's pretty much stock. After booting, routing is broken: DNS resolution works, but 'the internet' does not. After reloading the firewall, routing functionality is as it should be. Looks like somehow the firewall is loaded a bit too early?

I diffed the nftables ruleset prior and after reload, I turned it into a gist to have a colourised diff (command nft list ruleset):

Ruleset:

# cat /etc/config/firewall 

config defaults
	option syn_flood '1'
	option input 'ACCEPT'
	option output 'ACCEPT'
	option forward 'REJECT'
	option drop_invalid '1'

config zone
	option name 'lan'
	list network 'lan'
	option input 'ACCEPT'
	option output 'ACCEPT'
	option forward 'ACCEPT'

config zone
	option name 'wan'
	list network 'wan'
	list network 'wan6'
	option input 'REJECT'
	option output 'ACCEPT'
	option forward 'REJECT'
	option masq '1'
	option mtu_fix '1'

config forwarding
	option src 'lan'
	option dest 'wan'

config rule
	option name 'Allow-DHCP-Renew'
	option src 'wan'
	option proto 'udp'
	option dest_port '68'
	option target 'ACCEPT'
	option family 'ipv4'

config rule
	option name 'Allow-Ping'
	option src 'wan'
	option proto 'icmp'
	option icmp_type 'echo-request'
	option family 'ipv4'
	option target 'ACCEPT'

config rule
	option name 'Allow-IGMP'
	option src 'wan'
	option proto 'igmp'
	option family 'ipv4'
	option target 'ACCEPT'

config rule
	option name 'Allow-DHCPv6'
	option src 'wan'
	option proto 'udp'
	option dest_port '546'
	option family 'ipv6'
	option target 'ACCEPT'

config rule
	option name 'Allow-MLD'
	option src 'wan'
	option proto 'icmp'
	option src_ip 'fe80::/10'
	list icmp_type '130/0'
	list icmp_type '131/0'
	list icmp_type '132/0'
	list icmp_type '143/0'
	option family 'ipv6'
	option target 'ACCEPT'

config rule
	option name 'Allow-ICMPv6-Input'
	option src 'wan'
	option proto 'icmp'
	list icmp_type 'echo-request'
	list icmp_type 'echo-reply'
	list icmp_type 'destination-unreachable'
	list icmp_type 'packet-too-big'
	list icmp_type 'time-exceeded'
	list icmp_type 'bad-header'
	list icmp_type 'unknown-header-type'
	list icmp_type 'router-solicitation'
	list icmp_type 'neighbour-solicitation'
	list icmp_type 'router-advertisement'
	list icmp_type 'neighbour-advertisement'
	option limit '1000/sec'
	option family 'ipv6'
	option target 'ACCEPT'

config rule
	option name 'Allow-ICMPv6-Forward'
	option src 'wan'
	option dest '*'
	option proto 'icmp'
	list icmp_type 'echo-request'
	list icmp_type 'echo-reply'
	list icmp_type 'destination-unreachable'
	list icmp_type 'packet-too-big'
	list icmp_type 'time-exceeded'
	list icmp_type 'bad-header'
	list icmp_type 'unknown-header-type'
	option limit '1000/sec'
	option family 'ipv6'
	option target 'ACCEPT'

config rule
	option name 'Allow-IPSec-ESP'
	option src 'wan'
	option dest 'lan'
	option proto 'esp'
	option target 'ACCEPT'

config rule
	option name 'Allow-ISAKMP'
	option src 'wan'
	option dest 'lan'
	option dest_port '500'
	option proto 'udp'
	option target 'ACCEPT'

config zone
	option name 'wg'
	option input 'ACCEPT'
	option forward 'ACCEPT'
	option output 'ACCEPT'
	option network 'wg0'
	option masq '1'

config forwarding
	option src 'wg'
	option dest 'lan'

config forwarding
	option src 'lan'
	option dest 'wg'

config rule
	option src '*'
	option target 'ACCEPT'
	list proto 'udp'
	option dest_port '8192'
	option name 'Allow-Wireguard-Inbound'

config redirect
	option target 'DNAT'
	option src 'wan'
	option dest 'lan'
	list proto 'tcp'
	option src_dport '4505'
	option dest_ip '10.0.0.5'
	option dest_port '4505'
	option name 'Salt 1'

config redirect
	option target 'DNAT'
	option src 'wan'
	option dest 'lan'
	list proto 'tcp'
	option src_dport '4506'
	option dest_ip '10.0.0.5'
	option dest_port '4506'
	option name 'Salt 2'

Are there any signs in the log that /etc/hotplug.d/iface/20-firewall reloads the firewall when the pppoe-wan and wg0 interfaces go up?

1 Like

Nope. Just this, only for br-lan...

# logread -e firewall
Sun May 29 21:09:27 2022 user.notice firewall: Reloading firewall due to ifup of lan (br-lan)

Further down (few seconds later) the PPP interface gets brought up but no firewall reload (21:09:31), same for wg (21:09:28).

@pavelgl 's been doing his homework.

Is this related?

https://git.openwrt.org/?p=project/firewall4.git;a=commitdiff;h=628d7917ea03a24de43a35fd90894cf8d5d62dc0

1 Like

That commit is included in 22.03.0-rc3. @Borromini which specific 22.03 RC are you using?

I'm on 22.03 HEAD, compiled a few days ago. So it's current since its latest fw4 bump which happened around 9 days ago.

It's been over a month since I set up OpenWrt on this device first, so I'm not sure if it's a regression. Will see what reverting 628d7917ea03a24de43a35fd90894cf8d5d62dc0 does.

Reverting 628d7917ea03a24de43a35fd90894cf8d5d62dc0 doesn't help unfortunately.

@jow Any idea what might be going on there?

No, not really. You need to debug the 20-firewall hotplug script, see if it is invoked at all for INTERFACE=wan and if yes, where it bails out exactly. Sprinkling logger statements will help.

Granted I'm not running 22.03 [Any Flavor], nor fw4; but would changing the /etc/init.d/firewall directly for a quick glimpse at the logger help?

Out of curiosity @Borromini since the revert failed; did the "reload, restart" of firewall get routing back?

Yeah, a firewall reload brings routing back.

@jow Hotplug script now looks like this:

#!/bin/sh

has_zone() {
	fw4 -q network "$INTERFACE" >/dev/null && return 0
	eval $(ubus call "network.interface.$INTERFACE" status | jsonfilter -e 'ZONE=@.data.zone')
	fw4 -q zone "$ZONE" >/dev/null
}

logger -t firewall "Interface up or update"
[ "$ACTION" = ifup -o "$ACTION" = ifupdate ] || exit 0

logger -t firewall "Interface update with IP addresses"
[ "$ACTION" = ifupdate -a -z "$IFUPDATE_ADDRESSES" -a -z "$IFUPDATE_DATA" ] && exit 0

/etc/init.d/firewall enabled || exit 0

logger -t firewall "Zone check"
has_zone || exit 0

logger -t firewall "Reloading firewall due to $ACTION of $INTERFACE ($DEVICE)"
fw4 -q reload

Which yields this:

# logread  -e firewall
Mon May 30 22:38:52 2022 user.notice firewall: Interface up or update
Mon May 30 22:38:52 2022 user.notice firewall: Interface update with IP addresses
Mon May 30 22:38:52 2022 user.notice firewall: Zone check
Mon May 30 22:38:52 2022 user.notice firewall: Reloading firewall due to ifup of lan (br-lan)

I can say for sure that the PPPoE interface only comes up at 22:39:01 and there's no firewall messages in the log anymore after that, only before (the ones above).

That looks as if no hotplug event is triggered at all when PPPoE is established. Is there any other prior scripts to 20-firewall? Do you see dangling shell processes in ps www ?

Ps output:

root@OpenWrt:~# ps ww
  PID USER       VSZ STAT COMMAND
    1 root      1652 S    /sbin/procd
    2 root         0 SW   [kthreadd]
    3 root         0 IW<  [rcu_gp]
    4 root         0 IW<  [rcu_par_gp]
    5 root         0 IW   [kworker/0:0-eve]
    6 root         0 IW<  [kworker/0:0H-ev]
    7 root         0 IW   [kworker/u8:0-ev]
    8 root         0 IW<  [mm_percpu_wq]
    9 root         0 SW   [rcu_tasks_trace]
   10 root         0 SW   [ksoftirqd/0]
   11 root         0 IW   [rcu_sched]
   12 root         0 SW   [migration/0]
   13 root         0 SW   [cpuhp/0]
   14 root         0 SW   [cpuhp/1]
   15 root         0 SW   [migration/1]
   16 root         0 SW   [ksoftirqd/1]
   17 root         0 IW   [kworker/1:0-eve]
   18 root         0 IW<  [kworker/1:0H-kb]
   19 root         0 SW   [cpuhp/2]
   20 root         0 SW   [migration/2]
   21 root         0 SW   [ksoftirqd/2]
   22 root         0 IW   [kworker/2:0-rcu]
   23 root         0 IW<  [kworker/2:0H-ev]
   24 root         0 SW   [cpuhp/3]
   25 root         0 SW   [migration/3]
   26 root         0 SW   [ksoftirqd/3]
   27 root         0 IW   [kworker/3:0-rcu]
   28 root         0 IW<  [kworker/3:0H-kb]
   29 root         0 IW<  [netns]
   30 root         0 IW   [kworker/u8:1-ev]
   33 root         0 IW   [kworker/u8:2-ev]
   39 root         0 IW   [kworker/1:1-eve]
   52 root         0 IW   [kworker/3:1-eve]
   63 root         0 IW   [kworker/0:1-eve]
   64 root         0 IW   [kworker/2:1-eve]
  297 root         0 IW   [kworker/u8:3-ev]
  298 root         0 SW   [oom_reaper]
  300 root         0 IW<  [writeback]
  302 root         0 SW   [kcompactd0]
  308 root         0 IW<  [pencrypt_serial]
  310 root         0 IW<  [pdecrypt_serial]
  312 root         0 IW<  [cryptd]
  329 root         0 IW<  [kblockd]
  330 root         0 IW<  [blkcg_punt_bio]
  340 root         0 IW<  [ata_sff]
  365 root         0 SW   [watchdogd]
  380 root         0 IW<  [kworker/0:1H-kb]
  417 root         0 SW   [kswapd0]
  418 root         0 IW<  [kthrotld]
  502 root         0 IW   [kworker/0:2-eve]
  503 root         0 IW   [kworker/0:3-eve]
  504 root         0 IW   [kworker/0:4-rcu]
  505 root         0 IW   [kworker/0:5-eve]
  506 root         0 IW   [kworker/0:6-eve]
  507 root         0 IW   [kworker/0:7-rcu]
  508 root         0 IW   [kworker/0:8]
  517 root         0 SW   [hwrng]
  518 root         0 IW   [kworker/1:2-eve]
  522 root         0 IW   [kworker/1:3-ubi]
  523 root         0 IW   [kworker/1:4]
  544 root         0 IW<  [nvme-wq]
  546 root         0 IW<  [nvme-reset-wq]
  548 root         0 IW<  [nvme-delete-wq]
  570 root         0 SW   [spi0]
  739 root         0 IW<  [stats-wq-eth0]
  742 root         0 IW   [kworker/u8:4-st]
  761 root         0 IW   [kworker/2:2-rcu]
  789 root         0 SW   [irq/43-f06f8000]
  790 root         0 IW   [kworker/2:3-eve]
  819 root         0 IW<  [ipv6_addrconf]
  820 root         0 IW<  [kworker/2:1H-kb]
  821 root         0 IW<  [dsa_ordered]
  832 root         0 SW   [f212a200.mdio-m]
  833 root         0 SW   [irq/51-mv88e6xx]
  834 root         0 SW   [irq/69-mv88e6xx]
  835 root         0 SW   [irq/47-mv88e6xx]
  836 root         0 SW   [irq/49-mv88e6xx]
  871 root         0 SW   [irq/54-mv88e6xx]
  872 root         0 SW   [irq/55-!cp0!con]
  876 root         0 SW   [irq/56-!cp0!con]
  880 root         0 SW   [irq/57-!cp0!con]
  884 root         0 SW   [irq/58-!cp0!con]
  888 root         0 SW   [irq/59-!cp0!con]
  892 root         0 SW   [irq/60-!cp0!con]
  896 root         0 SW   [irq/61-!cp0!con]
  908 root         0 SW   [irq/70-f2400000]
  916 root         0 SW   [ubi_bgt0d]
  922 root         0 IW<  [kworker/3:1H-kb]
  923 root         0 SW   [irq/71-keys]
  930 root         0 IW   [kworker/3:2-eve]
  935 root         0 IW<  [kworker/1:1H-kb]
  978 root         0 IW   [kworker/3:3-eve]
 1008 root         0 SW   [ubifs_bgt0_1]
 1084 ubus      1240 S    /sbin/ubusd
 1085 root       848 S    /sbin/askfirst /usr/libexec/login.sh
 1116 root       984 S    /sbin/urngd
 1204 root         0 SW   [irq/72-sfp-mod-]
 1205 root         0 SW   [irq/73-sfp-los]
 1206 root         0 SW   [irq/74-sfp-tx-f]
 1429 logd      1220 S    /sbin/logd -S 64
 1483 root      2152 S    /sbin/rpcd -s /var/run/ubus/ubus.sock -t 30
 1556 root      1080 S    /usr/sbin/dropbear -F -P /var/run/dropbear.1.pid -p 22 -K 300 -T 3
 1724 root      1828 S    /sbin/netifd
 1786 root      1592 S    /usr/sbin/odhcpd
 1851 root      1248 S    /usr/sbin/crond -f -c /etc/crontabs -l 5
 1939 root         0 SW   [irq/63-mv88e6xx]
 1954 root      5304 S    /usr/sbin/uhttpd -f -h /www -r OpenWrt -x /cgi-bin -u /ubus -t 60 -T 30 -k 20 -A 1 -n 3 -N 100 -R -p 0.0.0.0:80 -p [::]:80 -C /etc/uhttpd.crt -K /etc/uhttpd.key -s 0.0.0.0:443 -s [::]:443
 2005 root      1384 S    {hotplug-call} /bin/sh /sbin/hotplug-call iface
 2089 root         0 SW   [irq/64-mv88e6xx]
 2110 root         0 IW<  [wg-crypt-wg0]
 2149 root      1404 SN   /usr/sbin/nlbwmon -o /var/lib/nlbwmon -b 524288 -i 24h -r 30s -p /usr/share/nlbwmon/protocols -G 10 -I 1 -L 10000 -Z -s 192.168.0.0/16 -s 172.16.0.0/12 -s 10.0.0.0/8 -s 10.0.0.1/24 -s fd7b:dcca:3c0e::1/60 -s 10.0.0.0/8
 2179 root      1404 S    {hotplug-call} /bin/sh /sbin/hotplug-call iface
 2375 root      4316 SN   /usr/sbin/collectd -C /tmp/collectd.conf -f
 2805 root      1004 S    odhcp6c -s /lib/netifd/dhcpv6.script -t120 br-lan
 2819 root      1312 S    /usr/sbin/pppd nodetach ipparam wan ifname pppoe-wan lcp-echo-interval 1 lcp-echo-failure 5 lcp-echo-adaptive +ipv6 nodefaultroute usepeerdns maxfail 1 user user@domain password ?????? ip-up-script /lib/netifd/ppp-up ipv6-up-script /lib/netifd/ppp6-up ip-down-script /lib/netifd/ppp-down ipv6-down-script /lib/netifd/ppp-down mtu 1492 mru 1492 plugin pppoe.so nic-p8
 3157 unbound  15312 S    /usr/sbin/unbound -d -c /var/lib/unbound/unbound.conf
 3259 root      1652 S    {dynamic_dns_upd} /bin/sh /usr/lib/ddns/dynamic_dns_updater.sh -v 0 -S NoIP -- start
 3512 root      1248 S    sleep 60
 3666 root      1004 S    odhcp6c -s /lib/netifd/dhcpv6.script -P0 -t120 pppoe-wan
 3942 root       860 S    /usr/sbin/ntpclient -c 1 -p 123 -i 2 -h 2.openwrt.pool.ntp.org
 4187 root      1100 S    /usr/sbin/dropbear -F -P /var/run/dropbear.1.pid -p 22 -K 300 -T 3 -2
 4188 root      1260 S    -ash
 4197 root      1256 R    ps ww

/etc/hotplug.d contents:

# ls /etc/hotplug.d/*
/etc/hotplug.d/iface:
00-netstate   20-firewall   20-ntpclient  30-nlbwmon    95-ddns

/etc/hotplug.d/net:
00-sysctl               20-smp-packet-steering

/etc/hotplug.d/ntp:
25-unbound

So it seems the hotplug calls are indeed stuck. I suppose one of 20-ntpclient, 30-nlbwmon or 95-ddns is the culprit. One of these scripts does not return/hangs, preventing /sbin/hotplug-call to complete, causing events to pile up.

My first hunch would be the ddns script, due to the presence of a ddns script in your ps output

OK. I've deleted the 95-ddns hotplug script, but no difference AFAICT:

# logread -e firewall
Tue May 31 19:14:56 2022 user.notice firewall: Interface up or update
Tue May 31 19:14:56 2022 user.notice firewall: Interface update with IP addresses
Tue May 31 19:14:56 2022 user.notice firewall: Zone check
Tue May 31 19:14:56 2022 user.notice firewall: Reloading firewall due to ifup of lan (br-lan)

Can you please provide a complete logread output after boot?

Sure, here you go: https://0paste.com/377045-63b2219c.

Anything else, let me know.

After discussing this with @jow, it turned out the ntpclient hotplug handler spawns an ntpclient command that never returns, which blocks all subsequent hotplug events.

Ntpclient is pulled in by the luci-app-ntpc package, I will be filing a bug against the former as requested.

Thanks jow!

1 Like

I wonder whether hotplug should not actually allow to configure scripts as best-effort and time these out if they do not return in a user configurable amount of time? Or at the very least a monitoring process that can add a warning to the log that execution of a specific hotplug script blocked for XX seconds?

(I suffered from the same issue on turris OS for a long time and the best fudge I came up with (not being able to figure out the root cause) was to synthesize calls to the /etc/hotplug.d/iface scripts (where the offending ntpclient script lives I believe) from /etc/hotplug.d/net, rather crude).

1 Like

This topic was automatically closed 10 days after the last reply. New replies are no longer allowed.