I have a really strange problem. Right now I use a Linksys EA8500 and a Netgear R6220 as routers, with conntrackd and keepalived as a failover cluster behind a dumb DSL router (which works well and doesn't seem to be part of the problem)! I use WireGuard on a third (virtual) OpenWrt router with several clients (mainly OpenWrt routers) behind it. The WireGuard port (UDP port 51820) is forwarded by the cluster.
Everything works pretty much perfectly as long as I don't reboot one of my cluster routers. After a reboot the tunnels stay down until one of the clients gets a new external IP (after reconnecting its DSL line). Everything else still works after the reboot, like port forwarding for a web server or SSH…
I see the incoming connections on the WAN interface, but the router doesn't forward them to the WireGuard router (checked with tcpdump on the inbound and outbound interfaces). When I reconnect one of the clients' DSL routers (so that it gets a new external IP), it connects within seconds. This only happens when the tunnel was up at the time of the reboot. Any dial-in connection works after the reboot as long as it was not online when the reboot happened. I rebooted both routers at the same time to be sure that there is no session being synchronized between the routers. I also deactivated the forwarding rules and activated them again, still not working!
It feels like there are sessions cached, and I can't find a way to flush that cache. I tried flushing the conntrack sessions and restarting the firewall, with no effect! I recently updated to 19.07.1 (coming from 18.06.6), also with no effect.
Since I see the incoming sessions on WAN but no forwarding on LAN, the problem must be on the cluster routers, right? Any idea of a (reboot-persistent) cache for active sessions or connections, or any firewall caches?
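To make the "cached session" suspicion concrete, here is a minimal toy model in Python, not real netfilter code. It assumes a router that evaluates its port-forward rules only for the first packet of a flow and then reuses the cached verdict, and that a client packet arrives after the reboot before the forward rule is loaded. The `Router` class, the addresses and the internal host `192.168.1.3` are all made up for illustration. Under those assumptions the model reproduces the symptoms above: the old flow stays unforwarded, while a client with a new external IP works immediately.

```python
from dataclasses import dataclass, field


@dataclass
class Router:
    """Toy NAT router: forward rules are consulted only for untracked flows."""
    dnat_rules: dict = field(default_factory=dict)  # (proto, dport) -> internal host
    conntrack: dict = field(default_factory=dict)   # flow tuple -> cached verdict

    def receive(self, src_ip, src_port, proto, dport):
        flow = (src_ip, src_port, proto, dport)
        if flow not in self.conntrack:
            # First packet of a flow: evaluate the rules once and cache the result.
            self.conntrack[flow] = self.dnat_rules.get((proto, dport))
        return self.conntrack[flow]  # None means "not forwarded"

    def flush(self):
        """Drop all tracked flows so the next packet is evaluated again."""
        self.conntrack.clear()


router = Router()

# Assumption: a keepalive from the old client arrives before the rule is loaded.
first = router.receive("198.51.100.7", 51820, "udp", 51820)

# The forward rule is loaded a moment later (hypothetical internal host).
router.dnat_rules[("udp", 51820)] = "192.168.1.3"

# Same client, same tuple: the cached "no forward" verdict is reused.
stale = router.receive("198.51.100.7", 51820, "udp", 51820)

# A client with a new external IP is a new flow and gets forwarded.
fresh = router.receive("203.0.113.9", 51820, "udp", 51820)

# In this model a flush fixes the old client as well.
router.flush()
after_flush = router.receive("198.51.100.7", 51820, "udp", 51820)

print(first, stale, fresh, after_flush)
```

Note that in this model a flush repairs the old flow, whereas flushing conntrack on the real routers reportedly had no effect. So this can only be part of the picture, or the stale state lives somewhere else (for example in another NAT device on the path).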
Are you sure this isn't time related? WireGuard uses the clock to provide protection against replay attacks, so the timestamps a peer sends must always increase. If a router reboots and doesn't get its clock synchronized properly, its handshakes will be rejected and the tunnel will stay down.
Perhaps what really happens is that when you restart the WAN you also finally manage to get the clock set? Make sure that your router can talk to an NTP server over the WAN (and not through the tunnel).
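As a rough illustration of the clock argument: a WireGuard responder remembers the greatest handshake timestamp it has seen from each peer and ignores initiations that are not newer. The sketch below models only that comparison in Python. The `Responder` class and the integer timestamps are simplifications invented for this example (the real protocol uses TAI64N timestamps inside an encrypted handshake message).

```python
class Responder:
    """Toy model of per-peer handshake timestamp checking."""

    def __init__(self):
        self.latest = {}  # peer name -> greatest timestamp accepted so far

    def accept_initiation(self, peer, timestamp):
        # Reject anything that is not strictly newer than what was seen before.
        if peer in self.latest and timestamp <= self.latest[peer]:
            return False
        self.latest[peer] = timestamp
        return True


server = Responder()

# Client handshakes normally while its clock is correct.
before_reboot = server.accept_initiation("client", 1000)

# After a reboot without NTP the client's clock has jumped back.
clock_went_back = server.accept_initiation("client", 900)

# Once the clock is set correctly again, handshakes are accepted.
clock_fixed = server.accept_initiation("client", 1001)

print(before_reboot, clock_went_back, clock_fixed)
```

This only matters if the handshake packets actually reach the WireGuard router, so it is worth checking with tcpdump on that router first.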
Good idea, but since the traffic doesn't get through the first router, the WireGuard router never gets the chance to drop it!
I figured out that it also starts working after I reboot my own DSL uplink router (VMG1312-B30A). So it is related to the NAT sessions on both sides, but since the traffic gets through the DSL router and is dropped on the way through the second router (the cluster), it must be related to that router.
Maybe there is an active session between the two NAT routers and the router behind can't deal with it after a reboot. Maybe some session ID is different.
I flushed it with the conntrack tools with no effect, but I will try that way too.
I also did some further analysis. I switched from the Zyxel router to a Fritzbox (7430), which is slower on DSL, but I hoped it would fix my problem. It did in a way, but the stream still stopped after reconnects, and now the traffic does not even show up on the WAN side of my OpenWrt router. I dumped the raw WAN traffic from the Fritzbox and see that the traffic is somehow filtered: