Wireguard routing issues

I'm working on a travel router. It was configured with the default setup from the wireguard client: the wireguard interface is in the 'wan' zone with route_allowed_ips '1'. It works OK and all traffic goes through wireguard. I need to be able to toggle wireguard via the sliding switch on the side of the GL-MT300A. Below is my rc.button/BTN_1:

#!/bin/sh

#logger "the button was ${BUTTON} and the action was ${ACTION}"

WGIF="wg0"

if [ "$ACTION" = "released" ] && [ "$BUTTON" = "BTN_1" ];
then
        logger "Wireguard off"
        uci set network.${WGIF}.auto="0"
        uci commit
        /sbin/ifdown ${WGIF}
fi

if [ "$ACTION" = "pressed" ] &&  [ "$BUTTON" = "BTN_1" ];
then
        logger "Wireguard on"
        uci set network.${WGIF}.auto="1"
        uci commit
        /sbin/ifup ${WGIF}
fi

This works almost perfectly. When I bring up wg0 I set auto '1' for it so it's brought up on next boot, and conversely when I bring it down I set auto '0' so the VPN won't start on next boot. All fine and dandy. Issues arise when I disable wg0: it drops the wg0 interface but doesn't restore the default route for the wan interface, so all traffic stalls. I know I could just save the route to a file and restore it from that file upon disabling wg0, but I would like a more OpenWrt-idiomatic solution for this. Any way to restore those routes properly? Hardcoding them won't solve the problem - I have a dozen crazy USB modems, radios etc. that I want to act as wan on a whim, and those routes will be different each time. My priority is fast enable/disable without restarting the network, and maximum leak prevention. I would like to avoid PBR at the moment.

I tried implementing routing all traffic but it has the same issues and requires a lot more uci tweaking in the firewall to remove forwards. I tried implementing this without route_allowed_ips '1' but I don't think it's possible without PBR, which I don't want to bake into my custom image just yet (I initially mistakenly thought that forwarding rules add routes, but this is not the case).

Guys, halp, frustration is eating me away :smiley:

Two main options:

  1. (easy) set a metric for the wan interface. This will prevent the default route from being overwritten by WG (and in turn not reinstated when WG is taken down)
  2. restart the wan interface after WG is down.
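Option 1 could be applied from the shell roughly like this. It is only a minimal sketch: the helper name set_iface_metric is made up for illustration, and the interface name and metric value you pass in are assumptions to adapt to your own config.

```shell
#!/bin/sh
# Hypothetical helper: give a WAN-side interface an explicit route metric so
# its default route is kept alongside the one WireGuard installs.
set_iface_metric() {
        iface="$1"    # logical interface name from /etc/config/network
        metric="$2"   # any fixed value; lower metrics win
        uci set "network.${iface}.metric=${metric}" || return 1
        uci commit network || return 1
        # Cycle the interface so its routes are reinstalled with the metric.
        ifup "$iface"
}
```

For example, set_iface_metric wan 1024 would set the metric on the interface named 'wan' and bring it up again.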

Thanks for rapid response!

Option 1 seems the best, but won't it cause the traffic to be automatically routed through WAN when VPN dies (no kill-switch)?

yes, that is a potential issue.

The solution to that is to put the VPN in a separate firewall zone. Add lan > vpn forwarding, and then remove lan > wan forwarding. Make the forwarding rules part of the slide switch script: you can always have lan > vpn, but delete the lan > wan forwarding when you want the VPN kill switch active, and add lan > wan back when it's okay for traffic to egress the standard wan.
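A one-time setup of such a zone might look like the sketch below. The helper name create_vpn_zone is hypothetical, the zone and interface names are passed in as arguments, and the zone policy values simply mirror a typical wan zone, so treat them as assumptions rather than required settings.

```shell
#!/bin/sh
# Hypothetical one-time setup: a dedicated firewall zone for the tunnel
# plus a lan > vpn forwarding. Adjust names and policies to your config.
create_vpn_zone() {
        zone="$1"    # e.g. wg0_zone
        iface="$2"   # e.g. wg0
        uci add firewall zone >/dev/null
        uci set "firewall.@zone[-1].name=${zone}"
        uci add_list "firewall.@zone[-1].network=${iface}"
        uci set "firewall.@zone[-1].input=REJECT"
        uci set "firewall.@zone[-1].output=ACCEPT"
        uci set "firewall.@zone[-1].forward=REJECT"
        uci set "firewall.@zone[-1].masq=1"
        uci set "firewall.@zone[-1].mtu_fix=1"
        # lan > vpn forwarding; the lan > wan forwarding is toggled separately.
        uci add firewall forwarding >/dev/null
        uci set "firewall.@forwarding[-1].src=lan"
        uci set "firewall.@forwarding[-1].dest=${zone}"
        uci commit firewall
}
```

After something like create_vpn_zone wg0_zone wg0, the firewall still needs a reload for the new zone to take effect.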

Also... in case you haven't run into it... make sure that you only start the VPN after NTP has successfully sync'd, or set up appropriate firewall rules to allow one or more time server IPs to bypass the tunnel. If you try to start WG before there is a valid time sync, the tunnel won't come up properly, but the routing table will prevent the NTP request traffic from egressing via the normal wan, creating a chicken-or-egg situation with the tunnel and time. You need generally accurate time in order to have a functioning tunnel, and your device doesn't have a real-time clock.
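One possible way to start the tunnel only after a time sync is an NTP hotplug hook. The sketch below assumes your build runs scripts in /etc/hotplug.d/ntp/ with $ACTION set to the busybox ntpd event name (worth verifying on your image). The function name wg_up_after_sync is made up, and the check on the auto option assumes the button script above is what toggles it.

```shell
#!/bin/sh
# Sketch of a hook body for /etc/hotplug.d/ntp/ (assumed path, verify it
# exists on your build before relying on this).
wg_up_after_sync() {
        action="$1"   # ntpd event name, passed in as $ACTION by hotplug
        wgif="$2"     # e.g. wg0
        # Ignore everything except the event sent once a time source is in use.
        if [ "$action" != "stratum" ]; then
                return 0
        fi
        # Respect the boot toggle from the button script: auto '0' stays down.
        if [ "$(uci -q get "network.${wgif}.auto")" = "0" ]; then
                return 0
        fi
        ifup "$wgif"
}
```

Inside the hook file this would be called as wg_up_after_sync "$ACTION" wg0.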

Option 2: add to your script:
service network restart to restore routing

Thanks a lot! Setting the metric and a dedicated zone helped! I made a dedicated zone and added a few lines to switch forwarding rules (a few lines, since I have 3 wireless networks and I guess I can't route them selectively without PBR?). Do I understand correctly: since forwarding is set to wg0_zone, when wireguard dies the zone stays in place but does not route any traffic (it's not in the wan zone anymore), so the kill switch is engaged?
This works great, now for the NTP problem - I found this solution - do I understand right: we have no time, so wg crypto can't work and can't connect. We restart ntp so it updates the time and we're golden. In my case I need to run the script that changes forwarding to wg0_zone after ntp starts - is there any way to do this? I initially thought about using /etc/rc.local - it's executed after whole init has finished so ntp should be up and running. I will just check if wg0 interface is up and if it is I will switch forwarding to wg0_zone. Is it a sane approach? :slight_smile:

Thanks! This indeed fixes the issue. My only gripe is that it's painfully slow on my hardware and tears the wifi down in the process :slight_smile:

You can try service network reload. Not sure if that would restore routing, or how much faster it is.

EDIT: marked this as solution for clarity, but credit goes to @psherman

What this setup does:

  • start/stop wireguard on button action
  • router will remember if wireguard is active or not next reboot
  • takes race condition with ntp into account
  • uses dedicated wireguard zone for a kill switch to work (if the wireguard connection dies router users are cut off internet instead of falling back to clearnet)
  • stopping wireguard at runtime properly restores routes

You need to follow setup described here. All files need to be made executable (chmod +x).

You need to add a metric to your WAN interface(s) - otherwise shutting down wireguard will not bring back old default gateway and network will be down:

config interface 'wwan_rndis'
        option device 'usb0'
        option proto 'dhcp'
        option metric '1023'

config interface 'wwan_cdc'
        option device 'eth1'
        option proto 'dhcp'
        option metric '1023'

config interface 'wan'
        option device 'eth0.2'
        option proto 'dhcp'
        option metric '1024'

(and so on...)
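To check that the metrics actually ended up in the routing table, something like the following could be used. It is a small sketch: the function name default_route_metrics is made up, and it only parses the text that ip route show default prints.

```shell
#!/bin/sh
# Print "<device> <metric>" for every default route read from stdin.
# Routes without an explicit metric are reported as 0.
default_route_metrics() {
        awk '$1 == "default" {
                dev = "?"; metric = 0
                for (i = 2; i < NF; i++) {
                        if ($i == "dev") dev = $(i + 1)
                        if ($i == "metric") metric = $(i + 1)
                }
                print dev, metric
        }'
}
```

Run it as ip route show default | default_route_metrics. With the tunnel up, the WAN devices should still be listed with the metrics configured above.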

I decided to use a procd init script to launch my config script - this way I can be sure that it runs after sysntpd (which is START=98). Initially I tried running all the commands from within the procd script, but apparently running service firewall reload failed for some reason. I moved my configuration to an external script in /usr/bin/wg_service and everything works really smoothly now. It takes 1-3s for the change to take place, which is absolutely awesome imo :smiley:
Here is the /etc/init.d/wireguard service:

#!/bin/sh /etc/rc.common

START=99
USE_PROCD=1

start_service() {
        /usr/bin/wg_service start
}

stop_service() {
        /usr/bin/wg_service stop
}
service_stopped() {
        return 0
}

NOTE: I'm not using procd_open_instance and other procd machinery on purpose. Otherwise procd would expect the wg_service command not to exit, and running service wireguard stop would not work, since procd would think the service never started successfully.

Here is the /usr/bin/wg_service (EDIT: this is the final version, which searches all forwarding rules):

#!/bin/sh

ACTION=$1
WGIF="wg0"
WAN_ZONE="wan"
WG_ZONE="wg0_zone"

. /lib/functions.sh

wan_to_wireguard() {
        local dest
        config_get dest "$1" dest
        if [ "$dest" = "$WAN_ZONE" ]; then
                uci_set firewall "$1" dest "$WG_ZONE"
        fi
}

wireguard_to_wan() {
        local dest
        config_get dest "$1" dest
        if [ "$dest" = "$WG_ZONE" ]; then
                uci_set firewall "$1" dest "$WAN_ZONE"
        fi
}


case "$ACTION" in
        start)
                logger "Wireguard on"
                config_load firewall
                config_foreach wan_to_wireguard forwarding
                uci_commit firewall
                # service firewall reload not needed - it's done by ifup
                /sbin/ifup ${WGIF}
        ;;
        stop)
                logger "Wireguard off"
                config_load firewall
                config_foreach wireguard_to_wan forwarding
                uci_commit firewall
                /sbin/ifdown ${WGIF}
                # wg0 needs some time to stop, otherwise the firewall reload fails
                sleep 3
                # ifdown does not reload the firewall on its own
                service firewall reload
        ;;
        *)
        exit 1
        ;;
esac
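The fixed sleep 3 could probably be replaced by polling. The sketch below assumes the reload fails only because the wg0 device still exists right after ifdown, which I have not confirmed. The function name wait_for_iface_gone is made up.

```shell
#!/bin/sh
# Wait until network device $1 no longer exists, checking once per second
# for at most $2 checks (default 5). Returns 1 if it is still present.
wait_for_iface_gone() {
        dev="$1"
        tries="${2:-5}"
        while ip link show dev "$dev" >/dev/null 2>&1; do
                tries=$((tries - 1))
                if [ "$tries" -le 0 ]; then
                        return 1
                fi
                sleep 1
        done
        return 0
}
```

In the stop branch this would be called as wait_for_iface_gone "$WGIF" right after ifdown, before the firewall reload.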

Here is an old version that does not search all forwarding rules; it just changes the first forwarding rule:

#!/bin/sh

ACTION=$1
WGIF="wg0"

case "$ACTION" in
        start)
                logger "Wireguard on"
                uci set firewall.@forwarding[0].dest="wg0_zone"
                uci commit
                /sbin/ifup ${WGIF}
        ;;
        stop)
                logger "Wireguard off"
                uci set firewall.@forwarding[0].dest="wan"
                uci commit
                /sbin/ifdown ${WGIF}
                # ifdown does not reload the firewall on its own
                service firewall reload
        ;;
        *)
        exit 1
        ;;
esac

And finally the simplified button script in /etc/rc.button/BTN_1:

#!/bin/sh

if [ "$ACTION" = "released" ] && [ "$BUTTON" = "BTN_1" ];
then
        service wireguard disable
        service wireguard stop
fi

if [ "$ACTION" = "pressed" ] &&  [ "$BUTTON" = "BTN_1" ];
then
        service wireguard enable
        service wireguard start
fi

As you can see, it just enables/disables the service (so the setting is preserved on the next boot) and starts/stops it.

If there is a way to start/stop services from within procd script I'd be grateful for a hint - the less files the better.

Omg, working with OpenWRT is soooo satisfying. The stuff in /lib is pure awesome!
