Hi everyone,
I'm running OpenWrt 24.10.5 with a strongSwan IPsec IKEv2 site-to-site tunnel between two routers. The tunnel establishes fine initially, but after some time it simply dies and I cannot bring it back up.
Environment
- OpenWrt 24.10.5
- Kernel: Linux 6.6.119 aarch64
- strongSwan 5.9.14-r8 (full package set installed)
- Authentication: PSK, IKEv2, AES-256, SHA-256, DH group 14
The problem
The tunnel runs fine after initial setup. After an unpredictable amount of time — sometimes minutes, sometimes longer — the tunnel drops. When I try to bring it back:
ipsec up bb2-tun
It just hangs, retransmitting indefinitely with no response from the peer:
initiating IKE_SA bb2-tun[1] to 10.0.0.32
sending packet: from 10.0.0.31[500] to 10.0.0.32[500] (1144 bytes)
retransmit 1 of request with message ID 0
sending packet: from 10.0.0.31[500] to 10.0.0.32[500] (1144 bytes)
retransmit 2 of request with message ID 0
...
This happens even though basic connectivity between the two routers is fine (ping works). ipsec statusall shows that the connection is defined but no Security Associations are up:
Connections:
bb2-tun: %any...10.0.0.32 IKEv2
bb2-tun: child: 192.168.101.0/24 === 192.168.102.0/24 TUNNEL
Security Associations (0 up, 0 connecting):
none
Root cause I suspect
After investigation, I believe the actual problem is not the tunnel negotiation itself. Rather, the charon daemon on one or both sides is silently dying or getting into a broken state, and /etc/init.d/ipsec restart does not properly revive it.
When the daemon is in this broken state, the following stale files remain in /var/run/:
/var/run/charon.ctl
/var/run/charon.pid
/var/run/charon.vici
/var/run/charon.xml
/var/run/charon.wlst
/var/run/charon.dck
/var/run/starter.charon.pid
When start is called again, the starter sees these files and skips launching the daemon entirely:
charon is already running (/var/run/charon.pid exists) -- skipping daemon start
starter is already running (/var/run/starter.charon.pid exists) -- no fork done
So from the outside it looks like the tunnel simply won't come back up, but the real issue is that charon isn't actually running at all. The starter only assumes it is because of the leftover lock files.
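To tell a stale PID file from a live daemon, a check along these lines could be used. This is only a sketch: pid_file_is_stale is a name made up for illustration, and it assumes the PID files contain a plain numeric PID and that the script runs as root.

```shell
#!/bin/sh
# Sketch only: pid_file_is_stale is a hypothetical helper, not part of strongSwan.
# Returns 0 (stale) if the PID file exists but no live process has that PID.
pid_file_is_stale() {
    pid_file="$1"
    [ -f "$pid_file" ] || return 1        # no file, so nothing is stale
    pid="$(cat "$pid_file" 2>/dev/null)"
    case "$pid" in
        ''|*[!0-9]*) return 0 ;;          # empty or non-numeric content: treat as stale
    esac
    # kill -0 delivers no signal; it only reports whether the PID exists
    if kill -0 "$pid" 2>/dev/null; then
        return 1
    fi
    return 0
}

# Report which of the two PID files listed above are stale (read-only check)
for f in /var/run/charon.pid /var/run/starter.charon.pid; do
    if pid_file_is_stale "$f"; then
        echo "$f is stale"
    fi
done
```

If both files are reported as stale while ipsec statusall still shows no Security Associations, that would be consistent with the daemon being gone rather than the negotiation failing.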
Current workaround
I found that manually cleaning the lock files and forcing the starter up brings the tunnel back:
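# Force-kill any leftover charon/starter processes
killall -9 charon starter 2>/dev/null
sleep 2
# Remove the stale PID and socket files
rm -f /var/run/charon.* /var/run/starter.*
/etc/init.d/ipsec start
sleep 1
# Remove the files again so the manual starter launch below is not skipped
rm -f /var/run/charon.* /var/run/starter.*
sleep 1
# Launch the starter by hand in the background
/usr/lib/ipsec/starter --daemon charon --nofork &
# Give charon time to come up before initiating the tunnel
sleep 10
ipsec up bb2-tun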
This works and the tunnel comes back up cleanly. But obviously this is a terrible long-term solution — I can't be running this manually every time the tunnel drops, especially in a production environment.
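Until there is a cleaner fix, the cleanup could at least be run from cron instead of by hand. The following is a rough, untested sketch. RUN_DIR and all function names are hypothetical. It only handles the case where no charon process exists at all, and it assumes a plain /etc/init.d/ipsec start is enough once the files are gone, whereas the manual sequence additionally launches the starter by hand.

```shell
#!/bin/sh
# Hypothetical watchdog sketch; RUN_DIR and the function names are illustrative.
RUN_DIR="${RUN_DIR:-/var/run}"

# True when no process named charon exists.
charon_is_dead() {
    ! pidof charon >/dev/null 2>&1
}

# Remove leftover charon/starter runtime files from the given directory.
clear_runtime_files() {
    rm -f "$1"/charon.* "$1"/starter.*
}

# If charon is gone, clear its leftovers and start the service again.
# Assumption: a plain start suffices once the stale files are removed.
recover_if_dead() {
    charon_is_dead || return 0
    clear_runtime_files "$RUN_DIR"
    /etc/init.d/ipsec start
}

# Only act when called with "run", so the functions can be loaded safely.
if [ "${1:-}" = "run" ]; then
    recover_if_dead
fi
```

Saved under a path such as /root/ipsec-watchdog.sh (hypothetical), it could be called from /etc/crontabs/root with a line like `*/5 * * * * /root/ipsec-watchdog.sh run`. This would not catch the case where charon is still alive but wedged.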
Is there a native procd way to ensure these stale lock files are automatically cleared if the service crashes, or is this a known issue with the strongSwan package in 24.10?
Any help is appreciated.