I've finally set up my WWAN LTE backup connection and have been testing it out, and I'm pretty buzzed by the results: I can switch between FTTP and LTE without even causing a noticeable interruption on a Teams video call.
I do this by updating the route metrics, and it's seamless.
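#!/bin/sh
# Make LTE the preferred uplink: the route with the lower metric wins.
WAN1=wan    # FTTP interface
WAN2=wwan   # LTE (WWAN) interface
uci set network.$WAN2.metric='0'
uci set network.$WAN1.metric='1'
uci commit network
# Reload so the new metrics are applied to the routing table
/etc/init.d/network reload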
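If the switch is to be driven by a failover script, it may help to wrap the metric swap in a function that takes the preferred and backup interfaces as arguments. This is a minimal sketch; `set_primary`, `UCI` and `RELOAD` are hypothetical names introduced here, and the two variables exist only so the commands can be dry-run with `echo`.

```sh
#!/bin/sh
# Hypothetical helper: make $1 the preferred uplink and $2 the backup.
# Lower metric wins, so the preferred interface gets metric 0.
# UCI and RELOAD are hypothetical overrides (e.g. UCI=echo for a dry run).
UCI=${UCI:-uci}
RELOAD=${RELOAD:-/etc/init.d/network reload}

set_primary() {
    primary=$1
    backup=$2
    $UCI set "network.$primary.metric=0"
    $UCI set "network.$backup.metric=1"
    $UCI commit network
    $RELOAD   # unquoted on purpose: splits into command + argument
}
```

`set_primary wwan wan` does the same as the script above, and `set_primary wan wwan` switches back to FTTP.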
I would love to get this automated to the point I could bridge a real-life WAN outage without interrupting a video call, but this means detecting the WAN outage in a timely fashion. Most methods I've seen rely on pinging some known public infrastructure like Google's DNS servers (8.8.8.8, 8.8.4.4).
The first issue is that, to do this fast enough to make the transition seamless, you'd need to ping multiple times per second. Does Google allow this, or would they eventually block you?
The second problem is how many failed pings to wait for before declaring a route down: a single failed ping is too little evidence, but waiting too long would interrupt streaming applications.
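The trade-off in the second question can be made explicit with a counter of consecutive failures that resets on any success. A minimal sketch, assuming a POSIX shell: `watch_link`, `PROBE` and `THRESHOLD` are hypothetical names, and the default probe (one ping to 8.8.8.8 with a 1 s wait) and the threshold of 3 are illustrative values, not recommendations.

```sh
#!/bin/sh
# Hypothetical hook: any command that exits 0 when the link is up.
PROBE=${PROBE:-"ping -c 1 -W 1 8.8.8.8"}
THRESHOLD=${THRESHOLD:-3}   # consecutive failures before declaring "down"

# watch_link MAX_ROUNDS INTERVAL
# Probes up to MAX_ROUNDS times, sleeping INTERVAL seconds between probes.
# Prints "down" at the first run of THRESHOLD consecutive failures,
# or "up" if that never happens.
watch_link() {
    max_rounds=$1
    interval=$2
    fails=0
    round=0
    while [ "$round" -lt "$max_rounds" ]; do
        round=$((round + 1))
        if $PROBE >/dev/null 2>&1; then
            fails=0                      # any success resets the counter
        else
            fails=$((fails + 1))
            if [ "$fails" -ge "$THRESHOLD" ]; then
                echo down
                return 1
            fi
        fi
        sleep "$interval"
    done
    echo up
    return 0
}
```

Raising `THRESHOLD` or the interval gives fewer false alarms but slower detection. In practice the probe would probably also need to be bound to the interface under test (BusyBox and iputils `ping` both accept `-I`); otherwise it simply follows whichever route currently has the lowest metric.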
Google and others will probably not allow that and will block you.
You can, however, ping a host name that resolves to multiple IP addresses in a round-robin fashion; I do this for my failover script. Under DHCP and DNS > Hostnames (/etc/config/dhcp, config domain), add:
ping-host.mylan 8.8.8.8
ping-host.mylan 9.9.9.9
Check that the name resolves with nslookup ping-host.mylan. Then use ping-host.mylan as the ping address: all addresses of ping-host.mylan will be used in a round-robin fashion, which also adds redundancy if one server is down.
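The same two Hostnames entries can presumably also be created from the command line. This sketch assumes the stock OpenWrt `uci` tool and dnsmasq, with each entry stored as an anonymous `config domain` section that has `name` and `ip` options (to the best of my knowledge, this is how the Hostnames page saves them). It is a configuration snippet to run on the router, not a tested script.

```sh
#!/bin/sh
# Add one 'domain' section per address, all sharing the same name.
for ip in 8.8.8.8 9.9.9.9; do
    uci add dhcp domain >/dev/null          # new anonymous section
    uci set dhcp.@domain[-1].name='ping-host.mylan'
    uci set dhcp.@domain[-1].ip="$ip"
done
uci commit dhcp
/etc/init.d/dnsmasq restart                 # pick up the new entries
```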
But I do not ping more than once every 6 seconds, and I use 6 pings to decide whether the connection is down.
I do not think you can make this seamless, with no interruption at all.
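As a rough rule of thumb (a simplified model, not a measurement): if probes go out on a fixed schedule every interval and the link is declared down after N consecutive failures, an outage that starts just after a successful probe is detected after about N × interval plus one reply timeout. The helper name `detection_ms` is hypothetical; it only makes the arithmetic explicit, using integer milliseconds because POSIX shell arithmetic is integer-only.

```sh
#!/bin/sh
# detection_ms INTERVAL_MS COUNT [TIMEOUT_MS]
# Approximate worst-case time to declare a link down under the fixed-schedule
# model: COUNT failed probes spaced INTERVAL_MS apart, plus the last timeout.
detection_ms() {
    interval_ms=$1
    count=$2
    timeout_ms=${3:-0}
    echo $(( count * interval_ms + timeout_ms ))
}
```

`detection_ms 6000 6` prints 36000, i.e. 36 s before any timeout is added, whereas 3 probes at 200 ms with a 100 ms timeout (`detection_ms 200 3 100`) give 700 ms. Getting under a second therefore means probing several times per second, which is exactly where the rate-limiting concern above comes in.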
On business or datacenter connections, you would use a dynamic routing protocol across multiple routers, in conjunction with BFD (Bidirectional Forwarding Detection), to get sub-second failure detection.
And no, ICMP is not reliable: the reply is generated by the target's CPU, and under heavy load these probe requests are delayed or not answered at all.
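For comparison, BFD's detection time is set by negotiated timers rather than by a ping loop. In asynchronous mode, RFC 5880 (section 6.8.4) defines it as the Detect Mult received from the remote system multiplied by the agreed interval, i.e. the greater of the local Required Min RX Interval and the remote Desired Min TX Interval, all in microseconds. Here is a small sketch of that formula; `bfd_detection_us` is a hypothetical name, and the timer values below are illustrative.

```sh
#!/bin/sh
# bfd_detection_us LOCAL_REQ_MIN_RX_US REMOTE_DESIRED_MIN_TX_US REMOTE_DETECT_MULT
# RFC 5880 asynchronous-mode Detection Time:
#   remote Detect Mult x max(local Required Min RX, remote Desired Min TX)
bfd_detection_us() {
    local_req_min_rx_us=$1
    remote_desired_min_tx_us=$2
    remote_detect_mult=$3
    agreed=$local_req_min_rx_us
    if [ "$remote_desired_min_tx_us" -gt "$agreed" ]; then
        agreed=$remote_desired_min_tx_us
    fi
    echo $(( remote_detect_mult * agreed ))
}
```

With 50 ms timers on both sides and a multiplier of 3, `bfd_detection_us 50000 50000 3` prints 150000, i.e. 150 ms. Which timer values a given router actually supports varies by platform.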
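One partial mitigation for unanswered probes (a suggestion of mine, not something proposed in the replies above) is to declare the link down only when every one of several targets fails. That way a single host that drops or rate-limits ICMP under load is not mistaken for an outage. A sketch; `all_targets_down`, `PROBE_ONE` and `probe_ping` are hypothetical names.

```sh
#!/bin/sh
# Hypothetical hook: a command taking one target, exiting 0 if it answered.
PROBE_ONE=${PROBE_ONE:-probe_ping}

probe_ping() {
    ping -c 1 -W 1 "$1" >/dev/null 2>&1
}

# all_targets_down TARGET...
# Returns 0 (down) only if no target answers; stops at the first reply.
all_targets_down() {
    for target in "$@"; do
        if $PROBE_ONE "$target"; then
            return 1   # at least one target answered: treat link as up
        fi
    done
    return 0           # nobody answered
}
```

This does not make ICMP itself reliable. It only makes a false "down" less likely, and checking the targets one after another costs extra time when they really are unreachable.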