Port forwarding problem after reboot with an active UDP session (WireGuard)

toxic-tonic · February 16, 2020, 11:54am

Hi,

I have a really strange problem. Right now I use a Linksys EA8500 and a Netgear R6220 as routers, running conntrackd and keepalived as a failover cluster behind a dumb DSL router (which works well and doesn't seem to be part of the problem). I use WireGuard on a third (virtual) OpenWrt router with several clients (mainly OpenWrt routers) behind it. The WireGuard port (UDP port 51820) is forwarded by the cluster.

Everything works fine as long as I don't reboot one of my cluster routers. After a reboot the tunnels stay down until one of the clients gets a new external IP (after reconnecting the DSL line). Everything else still works after the reboot, like port forwarding for a web server or SSH…

I see the incoming connections on the WAN interface, but the router doesn't forward them to the WireGuard router (checked with tcpdump on the inbound and outbound interfaces). When I reconnect the DSL router of one of the clients (which renews its external IP), it connects within seconds. This only happens when the tunnel was up at the time of the reboot. Any dial-in connection works after the reboot as long as it was not online when the reboot happened. I rebooted both routers at the same time to be sure that no session is synchronized between them. I also deactivated the forward rules and activated them again, still not working!

It feels like sessions are cached and I can't find a way to flush that cache. I tried flushing the conntrack sessions and restarting the firewall, with no effect! I recently updated to 19.07.1 (from 18.06.6), also with no effect.

Since I see the incoming sessions on WAN but no forwarding on LAN, the problem must be on the cluster routers, right? Any idea of a (reboot-persistent) cache for active sessions or connections, or any firewall caches?

Thanks in advance

Tobias

ulmwind · February 16, 2020, 2:46pm

I don't know, port forwarding stops working after reboot? BTW, UDP is not the best choice for connection tracking.

toxic-tonic · February 16, 2020, 5:36pm

No, it works for any protocol but WireGuard. And WireGuard uses UDP, so there is no choice...

dlakelan · February 16, 2020, 5:55pm

Are you sure this isn't time related? WireGuard uses the clock to provide protection against replay attacks. The clock must always increase in WireGuard. When you reboot routers, if they don't get their clock synchronized properly, the tunnel will be blocked.

Perhaps what really happens is that when you restart the wan you also managed to get the clock set finally? Make sure that your router can talk to an NTP server over the WAN (and not through the tunnel).
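
To illustrate the clock point above, here is a minimal sketch of the idea: the responder remembers the greatest handshake timestamp it has accepted from a peer and refuses anything that is not newer. This is a simplification and not WireGuard's actual code; the names `PeerState` and `accept_initiation` are made up for the example, and plain integers stand in for the real timestamp format.

```python
from dataclasses import dataclass


@dataclass
class PeerState:
    # Greatest handshake timestamp accepted from this peer so far.
    # (Hypothetical structure; plain integers stand in for real timestamps.)
    latest_timestamp: int = 0


def accept_initiation(peer: PeerState, timestamp: int) -> bool:
    """Accept a handshake initiation only if its timestamp is strictly newer."""
    if timestamp <= peer.latest_timestamp:
        # Replayed packet, or a sender whose clock jumped backwards.
        return False
    peer.latest_timestamp = timestamp
    return True


peer = PeerState()
print(accept_initiation(peer, 1000))  # first handshake
print(accept_initiation(peer, 500))   # clock went backwards after a reboot
```

With this logic, a client whose clock resets to an earlier time after a reboot keeps getting rejected until its clock passes the last accepted value, which is why a working NTP sync matters.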

toxic-tonic · February 17, 2020, 12:00pm

Good idea, but since the traffic doesn't get through the first router, the WireGuard router can't be the one dropping it!

I figured out that it also starts working after I reboot my own DSL uplink router (VMG1312-B30A). So it is related to the NAT sessions on both sides, but since the traffic gets through the DSL router and is dropped on the way through the second router (the cluster), it must be related to that router.

Maybe there is an active session between the two NAT routers and the router behind can't deal with it after the reboot. Maybe some session ID that is different.

Any idea how to debug this in the firewall?

Thanks

Tobias

jow · February 17, 2020, 12:39pm

Sounds like conntrack cached the stream with an old address, preventing the new flow with the updated addresses from being handled properly.

Does it start working if you flush the conntrack table on OpenWrt using echo f > /proc/net/nf_conntrack ?
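
Before flushing, it may help to look at what conntrack currently holds for the WireGuard port. The following is a rough sketch that filters lines in the usual `/proc/net/nf_conntrack` text format for UDP flows on port 51820. The function name `wireguard_entries` and the two sample lines are made up for illustration (the addresses are documentation placeholders, not taken from this thread).

```python
WG_PORT = 51820  # WireGuard port from the first post


def wireguard_entries(lines, port=WG_PORT):
    """Return the conntrack lines that describe UDP flows using the given port."""
    matches = []
    for line in lines:
        fields = line.split()
        if "udp" not in fields:
            continue
        # A flow can carry the port as source or destination, in either direction.
        if f"dport={port}" in fields or f"sport={port}" in fields:
            matches.append(line.strip())
    return matches


# Hypothetical sample lines, only to show the expected input format.
SAMPLE = [
    "ipv4     2 udp      17 175 src=198.51.100.7 dst=192.0.2.10 sport=62028 "
    "dport=51820 src=192.168.1.5 dst=198.51.100.7 sport=51820 dport=62028 "
    "[ASSURED] mark=0 zone=0 use=2",
    "ipv4     2 tcp      6 7439 ESTABLISHED src=198.51.100.7 dst=192.0.2.10 "
    "sport=40000 dport=22 src=192.168.1.2 dst=198.51.100.7 sport=22 "
    "dport=40000 [ASSURED] mark=0 zone=0 use=2",
]

for entry in wireguard_entries(SAMPLE):
    print(entry)
```

On the router itself, the same function could be fed `open("/proc/net/nf_conntrack")` instead of `SAMPLE`. If an entry still shows up after the reboot with an address or port that no longer matches the live flow, that would support the stale-entry theory.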

toxic-tonic · March 4, 2020, 10:03am

Hi!

I flushed it with the conntrack tools with no effect, but I will try that way too.

I also did some further analysis. I switched from the Zyxel router to a Fritzbox (7430), which is slower on DSL, but I hoped it would fix my problem. It did in a way, but the stream still stops after reconnects, and now the traffic doesn't even show up on the WAN side of my OpenWrt router. I dumped the raw WAN traffic from the Fritzbox and see that the traffic is somehow filtered:

Frame 29: 78 bytes on wire (624 bits), 78 bytes captured (624 bits)
Ethernet II, Src: HuaweiTe_83:bb:b3 (54:...), Dst: AVMAudio_1b:8f:86 (e0:...)
PPP-over-Ethernet Session
Point-to-Point Protocol
Internet Protocol Version 4, Src: 87.138.xx.xx, Dst: 83.135.xx.xx
    0100 .... = Version: 4
    .... 0101 = Header Length: 20 bytes (5)
    Differentiated Services Field: 0x00 (DSCP: CS0, ECN: Not-ECT)
    Total Length: 56
    Identification: 0x4182 (16770)
    Flags: 0x0000
    Fragment offset: 0
    Time to live: 58
    Protocol: ICMP (1)
    Header checksum: 0xb042 [validation disabled]
    [Header checksum status: Unverified]
    Source: 87.138.xx.xx
    Destination: 83.135.xx.xx
Internet Control Message Protocol
    Type: 3 (Destination unreachable)
    Code: 13 (Communication administratively filtered)
    Checksum: 0x8df3 [correct]
    [Checksum Status: Good]
    Unused: 00000000
    Internet Protocol Version 4, Src: 83.135.xx.xx, Dst: 87.138.xx.xx
        0100 .... = Version: 4
        .... 0101 = Header Length: 20 bytes (5)
        Differentiated Services Field: 0x00 (DSCP: CS0, ECN: Not-ECT)
        Total Length: 120
        Identification: 0x4182 (16770)
        Flags: 0x0000
        Fragment offset: 0
        Time to live: 55
        Protocol: UDP (17)
        Header checksum: 0xb2f2 [validation disabled]
        [Header checksum status: Unverified]
        Source: 83.135.xx.xx
        Destination: 87.138.xx.xx
    User Datagram Protocol, Src Port: 62028, Dst Port: 51820

Any idea why it is seen as ICMP traffic and why it is "Communication administratively filtered"?

Best regards

Tobias

toxic-tonic · March 4, 2020, 11:41am

In addition, here is the (failing) session initialisation:

No.	Time	Source	Destination	Protocol	Length	Info
22	3.939998	87.138.xx.xx	172.16.x.x	WireGuard	190	Handshake Initiation, sender=0x96E2105E
23	3.942495	172.16.x.x	87.138.xx.xx	WireGuard	134	Handshake Response, sender=0x347AD717, receiver=0x96E2105E
24	3.961361	87.138.xx.xx	172.16.x.x	ICMP	70	Destination unreachable (Communication administratively filtered)
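
On the question of why the frame shows up as ICMP: an ICMP "Destination unreachable" message is an error report that quotes the IP header and the first 8 bytes of the packet that triggered it, so the UDP header in the dump is the quoted original, not a new packet. Below is a minimal sketch of pulling that quoted header apart. The function names are made up for the example, the addresses are documentation placeholders (the real ones are redacted in the dump), and the checksums are left at zero because the parser doesn't validate them.

```python
import socket
import struct


def parse_icmp_error(icmp: bytes) -> dict:
    """Decode an ICMP error and the IPv4/UDP header it quotes (assumes UDP or TCP inside)."""
    icmp_type, code = icmp[0], icmp[1]
    inner = icmp[8:]                      # skip type, code, checksum, 4 unused bytes
    ihl = (inner[0] & 0x0F) * 4           # quoted IP header length in bytes
    sport, dport = struct.unpack("!HH", inner[ihl:ihl + 4])
    return {
        "type": icmp_type,
        "code": code,
        "proto": inner[9],
        "src": socket.inet_ntoa(inner[12:16]),
        "dst": socket.inet_ntoa(inner[16:20]),
        "sport": sport,
        "dport": dport,
    }


def build_sample() -> bytes:
    """Rebuild a message shaped like the one in the dump, with placeholder addresses."""
    icmp_header = struct.pack("!BBHI", 3, 13, 0, 0)           # type 3, code 13
    inner_ip = struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, 120, 0x4182, 0, 55, 17, 0,                   # IPv4, TTL 55, UDP
        socket.inet_aton("192.0.2.1"),                        # placeholder source
        socket.inet_aton("198.51.100.1"),                     # placeholder destination
    )
    inner_udp = struct.pack("!HHHH", 62028, 51820, 100, 0)    # ports from the dump
    return icmp_header + inner_ip + inner_udp


print(parse_icmp_error(build_sample()))
```

The sizes add up with the dump: 20 bytes outer IP header, 8 bytes ICMP header, 20 bytes quoted IP header and 8 bytes quoted UDP header give the Total Length of 56. Read this way, the frame says that the host at 87.138.xx.xx refused a UDP packet that was sent from 83.135.xx.xx port 62028 to its port 51820.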