FRR OSPF rebooting assistance

I just replaced an EOL router with OpenWrt (OWRT) for a small business network. It is being used as a gateway/firewall only. The old router used OSPF to pass the default route to the L3 switches. I installed FRR's OSPF daemon and configured it, and things are working well. My only issue is rebooting. When the router is rebooted, the routes are not passed soon enough and the network goes down. After a few minutes everything slowly comes back up. I need to make sure the FRR OSPF daemon loads quickly after reboot and starts passing routes. I can handle this on a full-blown Linux box, but I'm a little lost with OpenWrt. I searched and found MANY different ways to start a service at boot, but they are all so different and use different shells that I need some assistance. Can someone chime in on this? Much appreciated.

Please post the output of

ubus call system board
opkg list-installed frr
cat /etc/config/firewall

edit away secrets if any

most likely cause for delay is STP

root@edge:~# ubus call system board
{
	"kernel": "6.6.86",
	"hostname": "edge",
	"system": "UBNT_E300 (CN7030p1.2-1000-AAP)",
	"model": "Ubiquiti EdgeRouter 4",
	"board_name": "ubnt,edgerouter-4",
	"rootfs_type": "squashfs",
	"release": {
		"distribution": "OpenWrt",
		"version": "24.10.1",
		"revision": "r28597-0425664679",
		"target": "octeon/generic",
		"description": "OpenWrt 24.10.1 r28597-0425664679",
		"builddate": "1744562312"
	}
}
root@edge:~# opkg list-installed frr
frr - 10.2.1-r2
root@edge:~# cat /etc/config/firewall

config defaults
	option input 'DROP'
	option output 'ACCEPT'
	option forward 'DROP'
	option synflood_protect '1'
	option flow_offloading '1'

config zone
	option name 'lan'
	option input 'ACCEPT'
	option output 'ACCEPT'
	option forward 'ACCEPT'
	list network 'lan'
	list network 'UPLINK'

config zone
	option name 'wan'
	option input 'DROP'
	option output 'ACCEPT'
	option forward 'DROP'
	option masq '1'
	option mtu_fix '1'
	list network 'WAN'

config forwarding
	option src 'lan'
	option dest 'wan'

config rule
	option name 'Allow-DHCP-Renew'
	option src 'wan'
	option proto 'udp'
	option dest_port '68'
	option target 'ACCEPT'
	option family 'ipv4'

config rule
	option name 'Allow-Ping'
	option src 'wan'
	option proto 'icmp'
	option icmp_type 'echo-request'
	option family 'ipv4'
	option target 'ACCEPT'
	option enabled '0'

config rule
	option name 'Allow-IGMP'
	option src 'wan'
	option proto 'igmp'
	option family 'ipv4'
	option target 'ACCEPT'

config rule
	option name 'Allow-DHCPv6'
	option src 'wan'
	option proto 'udp'
	option dest_port '546'
	option family 'ipv6'
	option target 'ACCEPT'
	option enabled '0'

config rule
	option name 'Allow-MLD'
	option src 'wan'
	option proto 'icmp'
	option src_ip 'fe80::/10'
	list icmp_type '130/0'
	list icmp_type '131/0'
	list icmp_type '132/0'
	list icmp_type '143/0'
	option family 'ipv6'
	option target 'ACCEPT'
	option enabled '0'

config rule
	option name 'Allow-ICMPv6-Input'
	option src 'wan'
	option proto 'icmp'
	list icmp_type 'echo-request'
	list icmp_type 'echo-reply'
	list icmp_type 'destination-unreachable'
	list icmp_type 'packet-too-big'
	list icmp_type 'time-exceeded'
	list icmp_type 'bad-header'
	list icmp_type 'unknown-header-type'
	list icmp_type 'router-solicitation'
	list icmp_type 'neighbour-solicitation'
	list icmp_type 'router-advertisement'
	list icmp_type 'neighbour-advertisement'
	option limit '1000/sec'
	option family 'ipv6'
	option target 'ACCEPT'
	option enabled '0'

config rule
	option name 'Allow-ICMPv6-Forward'
	option src 'wan'
	option dest '*'
	option proto 'icmp'
	list icmp_type 'echo-request'
	list icmp_type 'echo-reply'
	list icmp_type 'destination-unreachable'
	list icmp_type 'packet-too-big'
	list icmp_type 'time-exceeded'
	list icmp_type 'bad-header'
	list icmp_type 'unknown-header-type'
	option limit '1000/sec'
	option family 'ipv6'
	option target 'ACCEPT'
	option enabled '0'

config rule
	option name 'Allow-IPSec-ESP'
	option src 'wan'
	option dest 'lan'
	option proto 'esp'
	option target 'ACCEPT'
	option enabled '0'

config rule
	option name 'Allow-ISAKMP'
	option src 'wan'
	option dest 'lan'
	option dest_port '500'
	option proto 'udp'
	option target 'ACCEPT'
	option enabled '0'

config redirect
	option dest 'lan'
	option target 'DNAT'
	option name 'GuardDog'
	option family 'ipv4'
	list proto 'tcp'
	option src 'wan'
	option src_ip 'xxxxxxxxxxxxx'
	option src_dport 'xxx'
	option dest_ip 'xxxxxxxxxx'
	option dest_port 'xxx'
	option reflection '0'

config redirect
	option dest 'lan'
	option target 'DNAT'
	option name 'GuardDog2'
	option family 'ipv4'
	list proto 'tcp'
	option src 'wan'
	option src_ip 'xxxxxxxxxx'
	option src_dport 'xxxx'
	option dest_ip '10.10.20.7'
	option dest_port 'xxxx'
	option reflection '0'

config redirect
	option dest 'lan'
	option target 'DNAT'
	option name 'GuardDog3'
	option family 'ipv4'
	list proto 'tcp'
	option src 'wan'
	option src_ip 'xxxxxxxxxxxx'
	option src_dport 'xxxx'
	option dest_ip '10.10.20.7'
	option dest_port 'xxx'
	option reflection '0'

config redirect
	option dest 'lan'
	option target 'DNAT'
	option name 'GuardDog4'
	option family 'ipv4'
	list proto 'tcp'
	option src 'wan'
	option src_ip 'xxxxxxxxx'
	option src_dport 'xxx'
	option dest_ip 'xxxxxxxxx'
	option dest_port 'xxx'
	option reflection '0'

config redirect
	option dest 'lan'
	option target 'DNAT'
	option name 'GAP'
	option src 'wan'
	option src_dport 'xxx'
	option dest_ip 'xxxxxxxxxx'
	option dest_port 'xxx'
	option reflection '0'

config redirect
	option dest 'lan'
	option target 'DNAT'
	option name 'GAP2'
	option src 'wan'
	option src_dport 'xxx'
	option dest_ip 'xxxxxxxxxx'
	option dest_port 'xxx'
	option reflection '0'

config redirect
	option dest 'lan'
	option target 'DNAT'
	option name 'GAP3'
	option src 'wan'
	option src_dport 'xxx'
	option dest_ip 'xxxxxxxxx'
	option dest_port 'xxx'
	option reflection '0'

config redirect
	option dest 'lan'
	option target 'DNAT'
	option name 'VPN'
	option src 'wan'
	option src_dport 'xxxxx'
	option dest_ip 'xxxxxxxxx'
	option dest_port 'xxxxx'
	option reflection '0'
	list proto 'udp'

config include 'pbr'
	option fw4_compatible '1'
	option type 'script'
	option path '/usr/share/pbr/firewall.include'

What's on the other side? Do you use the default ospf timers?

Mind to share your frr conf too?!

No prob at all. I’m not sure I have the conf complete but it works. Any tweaks would be appreciated. I haven’t set timers.


root@edge:~# more /etc/frr/frr.conf
frr version 10.2.1
frr defaults traditional
hostname edge
log syslog
service integrated-vtysh-config
!
password zebra
!
router ospf
 ospf router-id 1.1.1.1
 network 10.10.90.8/29 area 0.0.0.0
 area 0.0.0.0 range 10.10.90.8/29
 default-information originate always
exit
!
access-list vty seq 5 permit 127.0.0.0/8
access-list vty seq 10 deny any
!
line vty
 access-class vty
exit
!
root@edge:~# 

root@edge:~# more /etc/frr/daemons
# The staticd,watchfrr and zebra daemons are always started.
#
bgpd=no
ospfd=yes
#ospfd_instances=1,20
ospf6d=no
ripd=no
ripngd=no
isisd=no
pimd=no
ldpd=no
nhrpd=no
eigrpd=no
babeld=no
sharpd=no
pathd=no
pbrd=no
bfdd=no
fabricd=no
vrrpd=no

#
# If this option is set the /etc/init.d/frr script automatically loads
# the config via "vtysh -b" when the servers are started.
# Check /etc/pam.d/frr if you intend to use "vtysh"!
#
vtysh_enable=yes
zebra_options="  -A 127.0.0.1 -s 90000000"
mgmtd_options="  -A 127.0.0.1"
bgpd_options="   -A 127.0.0.1"
ospfd_options="  -A 127.0.0.1"
ospf6d_options=" -A ::1"
ripd_options="   -A 127.0.0.1"
ripngd_options=" -A ::1"
isisd_options="  -A 127.0.0.1"
pimd_options="   -A 127.0.0.1"
ldpd_options="   -A 127.0.0.1"
nhrpd_options="  -A 127.0.0.1"
eigrpd_options=" -A 127.0.0.1"
babeld_options=" -A 127.0.0.1"
sharpd_options=" -A 127.0.0.1"
pbrd_options="   -A 127.0.0.1"
staticd_options="-A 127.0.0.1"
bfdd_options="   -A 127.0.0.1"
fabricd_options="-A 127.0.0.1"
vrrpd_options="  -A 127.0.0.1"

# The list of daemons to watch is automatically generated by the init script.
#watchfrr_options=""

# for debugging purposes, you can specify a "wrap" command to start instead
# of starting the daemon directly, e.g. to use valgrind on ospfd:
#   ospfd_wrap="/usr/bin/valgrind"
# or you can use "all_wrap" for all daemons, e.g. to use perf record:
#   all_wrap="/usr/bin/perf record --call-graph -"
# the normal daemon command is added to this at the end.

The OWRT box connects to an L3 switch which does the routing. The L3 switch passes the VLAN subnets to OWRT and gets the default route from OWRT. Below is the config from the old router, which worked great but had not been updated in a long time, so I became concerned and switched to OWRT (at least for the time being). I'm just having a problem getting this info into the config on OWRT.

 ospf {
        area 0.0.0.0 {
            area-type {
                normal
            }
            network 10.10.90.8/29
        }
        default-information {
            originate {
                metric-type 2
            }
        }
        parameters {
            abr-type cisco
            router-id 1.1.1.1
        }
        redistribute {
        }
 ospf {
                dead-interval 40
                hello-interval 10
                priority 1
                retransmit-interval 5
                transmit-delay 1

OK... A wannabe router, but does the switch do switching or routing?

Could it be that the openwrt frr box does not terminate frr correctly so your router just drops out and times out?
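
One way to check both ends of that question is sketched below. It assumes the usual OpenWrt layout where an enabled init script shows up as `S??name` (start) and `K??name` (stop) symlinks in `/etc/rc.d`. The function names and the directory parameter are made up for illustration, and matching on `Full/` is only an assumption about how the neighbor state column looks.

```shell
#!/bin/sh
# Sketch: see how FRR is hooked into boot/shutdown, and count Full neighbors.

# List the start/stop symlinks for frr. The directory is a parameter only so
# the function can be tried off-box; on the router the default is what counts.
frr_boot_links() {
    rc_dir="${1:-/etc/rc.d}"
    for link in "$rc_dir"/S??frr "$rc_dir"/K??frr; do
        # Skip patterns that matched nothing.
        [ -e "$link" ] || [ -L "$link" ] || continue
        basename "$link"
    done
}

# Count lines containing "Full/" in neighbor output read from stdin.
count_full_neighbors() {
    grep -c 'Full/' || true
}

# Prints nothing if no links exist (or when run on a non-OpenWrt machine).
frr_boot_links
```

If no `S` link is listed, `/etc/init.d/frr enable` should create it. A missing `K` link would fit the "does not terminate correctly" theory. After a reboot, `vtysh -c 'show ip ospf neighbor' | count_full_neighbors` run every few seconds gives a rough idea of when the adjacency actually comes up.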

Also, OSPF master selection (the DR/BDR election) can take some time. Either configure the priority or just disable the election on the downstream stub router.
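
As a rough sketch of the "disable the election" option: on a link with exactly two OSPF speakers, FRR lets you set the interface to point-to-point, which skips the DR/BDR election (the selection mentioned above). The interface name `eth1` is a placeholder, not taken from the thread, and the switch side would need the matching network type, which is not known here.

```shell
#!/bin/sh
# Sketch: print an FRR interface stanza that avoids DR/BDR election.
# The interface name is a parameter; "eth1" below is hypothetical.

ospf_p2p_snippet() {
    cat <<EOF
interface $1
 ip ospf network point-to-point
exit
EOF
}

# Print the stanza for review before adding it to /etc/frr/frr.conf.
ospf_p2p_snippet eth1
```

The priority route would instead use `ip ospf priority 0` on the side that should never become DR. Either way, only change one side if you are sure the other side agrees on the network type.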

If you have issues with frr you could also try bird2, which in my opinion is far easier to use, and if you don't need BGP EVPN support then bird2 can do everything else...

This info? What's the issue?
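
If the goal is just to carry those settings over, here is a rough sketch of how they might map onto FRR syntax. The router ID, network, and timer values are the ones posted from the old router. The interface name is a placeholder (`eth1` is hypothetical), and `always` is kept from the posted frr.conf rather than from the old config. The timers shown match FRR's defaults as far as I know, so they mostly document intent.

```shell
#!/bin/sh
# Sketch: render the old router's OSPF settings as an FRR config fragment.
# $1 is the interface facing the L3 switch (placeholder, not from the thread).

render_ospf_conf() {
    iface="$1"
    cat <<EOF
interface $iface
 ip ospf hello-interval 10
 ip ospf dead-interval 40
 ip ospf priority 1
 ip ospf retransmit-interval 5
 ip ospf transmit-delay 1
exit
!
router ospf
 ospf router-id 1.1.1.1
 ospf abr-type cisco
 network 10.10.90.8/29 area 0.0.0.0
 default-information originate always metric-type 2
exit
EOF
}

# Print the fragment for review; merge it into /etc/frr/frr.conf by hand.
render_ospf_conf eth1
```

Note that the `area 0.0.0.0 range 10.10.90.8/29` line in the posted frr.conf has no counterpart in the old config. As far as I can tell it only matters on an area border router, so it is left out here.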

Thanks for your help. I've tried several reconfigs of ospf today and the frr box does not start routing upon reboot. The network goes back online after about 3 to 5 minutes. After multiple attempts I decided to set static routes and everything is back online and working great. In fact the traffic starts flowing immediately when OWRT comes up after reboot. I will stay with that for a while.

I would really like to go back to dynamic routing at some point so I will take a look at BIRD. It's much easier in a small business environment to scale up using dynamic routing.

I said in the first line of my post above that the switch is doing the routing. Yes, I've heard the mantra "routers are for routing" and "switches are for switching" but this network has 2 L3 switches in the core layer doing the routing. One of them is a 2.5G switch with 10G fiber uplinks and it's AMAZINGLY fast. The next network I design I may consider this topology but for now I will let the "switches do the routing" and the "firewalls do the firewalling" since I don't want to have to dismantle and rebuild the entire network. However, I do agree that home users don't need all of this complexity.

But that can be totally normal.

Btw. What's on the other side of the link?

That's the spirit :slight_smile:

For now it could be an option but when you have two links, this breaks.

Back to OSPF timers: I repeat the question, what's on the other side? You can configure quite "aggressive" timers. At least that's what it was called like 30 years ago.
IIRC frr has an option for these really fast OSPF timers. It is intended for modern datacenters.

Please look this up in the user guide / user manual; and check if you can adjust the other side accordingly.
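
To give an idea of what to look for, here is a small sketch that prints the two variants I have in mind: plain shortened hello/dead intervals, and FRR's sub-second form (`dead-interval minimal hello-multiplier`). The interface name and the numbers are placeholders, not recommendations, and whether the switch supports matching values is exactly the open question.

```shell
#!/bin/sh
# Sketch: print FRR interface stanzas with shorter OSPF timers.
# "eth1" and the values used below are hypothetical examples.

# Variant 1: explicit hello and dead intervals, in seconds.
ospf_timers_snippet() {
    iface="$1"; hello="$2"; dead="$3"
    cat <<EOF
interface $iface
 ip ospf hello-interval $hello
 ip ospf dead-interval $dead
exit
EOF
}

# Variant 2: 1 s dead interval with $2 hellos sent per second.
ospf_fast_hello_snippet() {
    iface="$1"; mult="$2"
    cat <<EOF
interface $iface
 ip ospf dead-interval minimal hello-multiplier $mult
exit
EOF
}

# Print variant 1 for review: hello every 1 s, dead after 4 s.
ospf_timers_snippet eth1 1 4
```

Hello and dead intervals have to match on both ends of the link, otherwise the adjacency will not form at all, so change the switch and the router together.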

Just in case.... Remember: OSPF comes from a time when like 5 min downtime^w recovery time after boot was considered fast.
You could also use BGP.... frr can do unnumbered BGP... but maybe now we're getting too fancy.
If you like you could give some more context. You can also PM me if you like.

IF you know your limits, this is a valid option. E.g. if the l3pseudorouterwithswitchcapabilities can speak OSPF and hold something like 100 routes, then that's fine. You could still optimize your OSPF and propagate aggregated routes, or split into sub-areas (which by now nobody does any more on "real" routers or x86 boxes with bird/frr)...

Anyhow, good luck, and don't let a setback like this get you down. Realizing that all this shit is far away from sub-second failover is normal... BUT if you have multiple routers and run BGP with BFD, you get sub-second failure detection, though recovery still needs some time....

It's quite a walk; take your time :wink:

PS: If you need a sandbox/playground for more dynamic routing, especially BGP, then have a look at DN42; it's a self-organized private network that mimics the "Internet"... kind of... You'll learn something. Promised.

Thanks. I have decided to stay with static routing for the time being while I research and lab another solution. I have done some searching on installing and configuring BIRD, but it would be most helpful if you could point me to info to get BIRD up and running sooner in OWRT. Much appreciated!

In fact, if you don’t mind sharing your bird config, that would be great!!