[22.03-RC1] Odd port-forwarding source IP problem

I'm currently testing 22.03-rc1 on a NanoPi R4S and loving it for the most part. However, I've been experiencing a strange intermittent issue with a port-forwarding rule.

I have a port-forwarding rule that forwards tcp/443 traffic arriving from the internet on the WAN interface to a LAN host, also on tcp/443, with NAT loopback enabled by default. I can test the rule using curl from an external cloud-hosted server and tcpdump on the target LAN host, and I can verify that it works, but only some of the time.

When it works, I see the expected three-way handshake and traffic flow at my LAN host using tcpdump:

public source IP --> LAN host 443
LAN host 443 --> public source IP

When it doesn't work, I see one of the following traffic patterns at my LAN host:

OpenWRT WAN (public) IP --> LAN host 443
LAN host 443 --> OpenWRT WAN (public) IP

OR

OpenWRT LAN IP --> LAN host 443
LAN host 443 --> OpenWRT LAN IP

This server isn't terribly busy, so by watching tcpdump live I could easily correlate my external test requests from the cloud host to these specific traffic patterns on the LAN host. When it behaves as expected, the curl command quickly completes and returns me to a prompt. When it misbehaves, the curl command hangs, waiting for a response.
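Because the failure shows up as a hang rather than an error, repeating the test many times from the cloud host can turn "works sometimes" into a rough hit rate. The following is a minimal Python sketch, assuming a completed TCP connect plus TLS handshake to port 443 is a fair stand-in for curl succeeding (it does not send an HTTP request, so it may not catch a hang that happens later in the exchange). The function names `tls_probe` and `probe_many`, and the timeout and attempt counts, are hypothetical and not part of the original test.

```python
import socket
import ssl


def tls_probe(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection and TLS handshake complete within `timeout` seconds."""
    ctx = ssl.create_default_context()
    # Only reachability is being tested, so skip hostname and certificate checks.
    # check_hostname must be disabled before verify_mode can be set to CERT_NONE.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # The handshake honours the socket timeout, so a silent peer fails here.
            with ctx.wrap_socket(sock, server_hostname=host):
                return True
    except OSError:
        # Covers refused/reset connections, timeouts and ssl.SSLError (an OSError subclass).
        return False


def probe_many(host: str, port: int = 443, attempts: int = 20, timeout: float = 5.0):
    """Run `attempts` sequential probes and return (successes, failures)."""
    ok = sum(tls_probe(host, port, timeout) for _ in range(attempts))
    return ok, attempts - ok
```

Run from the external host as, for example, `probe_many("<router public IP>")`; a result with a non-zero failure count while tcpdump runs on the LAN host makes it easier to line up failed attempts with the odd source addresses.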

My thought is that perhaps the fw4/nftables NAT loopback code is malfunctioning in some way and is randomly replacing the external source IP with the router's own WAN/LAN IP, as if it thought this were an internal (LAN) reflection flow. At first I thought I had messed up the router config, but that wouldn't explain why it sometimes works normally and sometimes doesn't, nor why the source IP I observe alternates between the router's WAN IP and its LAN IP (instead of the actual public source IP of the cloud host).

Has anyone else observed this, or is anyone able to check whether this happens to them on 22.03-RC1?

Thanks in advance for any replies!

Can you show the Port forward rule?

Sure thing.

From /etc/config/firewall:

config redirect
	option target 'DNAT'
	option src 'wan'
	option proto 'tcp'
	option src_dport '443'
	option dest_ip '<lan host>'
	option name 'Web Server'
	option dest 'lan'
	option dest_port '443'

From nft list ruleset - chain dstnat_wan:

meta nfproto ipv4 tcp dport 443 counter packets 411 bytes 24408 dnat ip to <lan-host>:443 comment "!fw4: Web Server"

From nft list ruleset - chain dstnat_lan:

ip saddr { <lan subnet> } ip daddr <wan-ip> tcp dport 443 dnat ip to <lan-host>:443 comment "!fw4: Web Server (reflection)"

From nft list ruleset - chain srcnat_lan:

ip saddr { <lan subnet> } ip daddr <lan-host> tcp dport 443 snat ip to <router-lan-ip> comment "!fw4: Web Server (reflection)"

Note that I've replaced the actual IPs/subnets above with <...> placeholders, but you get the idea. These rules are the ones created by adding the port-forward rule in LuCI on 22.03-rc1, and they are pretty simple and straightforward.
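To see which of these three rules should touch an external request versus a reflected one, here is a simplified Python model of their match conditions. It is a sketch, not fw4 or nftables itself: conntrack, hook ordering and interface matching are reduced to an `iif` zone field, every packet is assumed to leave via the LAN zone after DNAT, and the addresses (including 203.0.113.10 standing in for `<wan-ip>`) are hypothetical placeholders for the redacted ones.

```python
import ipaddress
from dataclasses import dataclass, replace

# Hypothetical stand-ins for the redacted values in the post.
LAN_SUBNET = ipaddress.ip_network("192.168.0.0/24")   # <lan subnet>
ROUTER_LAN_IP = ipaddress.ip_address("192.168.0.1")   # <router-lan-ip>
WAN_IP = ipaddress.ip_address("203.0.113.10")         # <wan-ip>
LAN_HOST = ipaddress.ip_address("192.168.0.11")       # <lan-host>


@dataclass(frozen=True)
class Packet:
    iif: str  # ingress zone, "wan" or "lan"
    saddr: ipaddress.IPv4Address
    sport: int
    daddr: ipaddress.IPv4Address
    dport: int


def dstnat(pkt: Packet) -> Packet:
    """Model of the dstnat_wan and dstnat_lan rules (prerouting)."""
    if pkt.iif == "wan" and pkt.dport == 443:
        # dstnat_wan: any source, tcp dport 443 -> <lan-host>:443
        return replace(pkt, daddr=LAN_HOST, dport=443)
    if (pkt.iif == "lan" and pkt.saddr in LAN_SUBNET
            and pkt.daddr == WAN_IP and pkt.dport == 443):
        # dstnat_lan (reflection): only LAN sources aimed at the WAN IP
        return replace(pkt, daddr=LAN_HOST, dport=443)
    return pkt


def srcnat(pkt: Packet) -> Packet:
    """Model of the srcnat_lan reflection rule (postrouting towards the LAN)."""
    if pkt.saddr in LAN_SUBNET and pkt.daddr == LAN_HOST and pkt.dport == 443:
        return replace(pkt, saddr=ROUTER_LAN_IP)
    return pkt


def forward(pkt: Packet) -> Packet:
    return srcnat(dstnat(pkt))
```

Under this model an external client's packet only matches dstnat_wan, so its public source address should reach the LAN host unchanged; only sources inside the LAN subnet are rewritten to the router's LAN IP. Seeing the router's own addresses as the source of a genuinely external request would therefore suggest the traffic is not taking the path these rules describe.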

Thanks!

This problem continues to occur; I am now running my NanoPi R4S on 22.03-rc2. I ran tcpdump on my router's LAN interface and see some very odd results:

22:17:35.141904 IP 192.168.0.1.56294 > 192.168.0.11.443: Flags [S], seq 3459579803, win 29200, options [mss 1460,sackOK,TS val 812406056 ecr 0,nop,wscale 7], length 0
22:17:35.142456 IP 192.168.0.11.443 > 192.168.0.1.56294: Flags [S.], seq 3863732853, ack 3459579804, win 28960, options [mss 1460,sackOK,TS val 2718536222 ecr 812406056,nop,wscale 7], length 0
22:17:35.142664 IP public.wan.ip.addr.443 > 192.168.0.11.56294: Flags [S.], seq 3863732853, ack 3459579804, win 28960, options [mss 1460,sackOK,TS val 2718536222 ecr 812406056,nop,wscale 7], length 0
22:18:20.102112 IP 192.168.0.11.56294 > public.wan.ip.addr.443: Flags [P.], seq 770:801, ack 4586, win 315, options [nop,nop,TS val 812451017 ecr 2718536273], length 31
22:18:20.102383 IP 192.168.0.1.56294 > 192.168.0.11.443: Flags [P.], seq 770:801, ack 4586, win 315, options [nop,nop,TS val 812451017 ecr 2718536273], length 31
22:18:20.102437 IP 192.168.0.11.56294 > public.wan.ip.addr.443: Flags [F.], seq 801, ack 4586, win 315, options [nop,nop,TS val 812451017 ecr 2718536273], length 0
22:18:20.102495 IP 192.168.0.1.56294 > 192.168.0.11.443: Flags [F.], seq 801, ack 4586, win 315, options [nop,nop,TS val 812451017 ecr 2718536273], length 0
22:18:20.103026 IP 192.168.0.11.443 > 192.168.0.1.56294: Flags [.], ack 801, win 243, options [nop,nop,TS val 2718581182 ecr 812451017], length 0
22:18:20.103215 IP public.wan.ip.addr.443 > 192.168.0.11.56294: Flags [.], ack 801, win 243, options [nop,nop,TS val 2718581182 ecr 812451017], length 0
22:18:20.103026 IP 192.168.0.11.443 > 192.168.0.1.56294: Flags [R.], seq 4586, ack 802, win 243, options [nop,nop,TS val 2718581183 ecr 812451017], length 0
22:18:20.103290 IP public.wan.ip.addr.443 > 192.168.0.11.56294: Flags [R.], seq 4586, ack 802, win 243, options [nop,nop,TS val 2718581183 ecr 812451017], length 0

The actors above are:
192.168.0.1 = Router LAN interface IP
192.168.0.11 = LAN Web Server (with inbound port forward/reflection rules for tcp/443 on router WAN)
public.wan.ip.addr = Router WAN Public IP address

This traffic sequence was initiated by an internal LAN host (source) towards my LAN web server via the router's public IP (destination), so reflection should handle it.

Notice that the initial SYN packet appears correct: the router is SNAT'ing the source to its own LAN IP in place of my LAN host's address, and DNAT'ing the public WAN IP to the internal web server. All good.

The second packet is a SYN/ACK response from my web server, towards my router LAN IP address. This is also expected and seems fine.

However....

The third packet is something I can't figure out. It is sourced from my router's public WAN address on port 443 and sent to my internal LAN web server, yet it reuses the same destination port and TCP sequence/ack numbers as the second packet, which was the web server's SYN/ACK to the router's LAN IP immediately before it. This makes zero sense to me.

Can anyone else come up with any possible reason why this would occur? You can see the behavior continue several times in the tcpdump snippet above.

I'm at my wit's end trying to figure out why this is happening. Any help is appreciated.

And where is this fourth actor in your tcpdump snippet?

This should be a SYN-ACK message from the router's WAN address to the connection initiator, and it appears that the initiator is the LAN Web server itself.

Here is my tcpdump snippet under the same circumstances (22.03.0-rc1).
Router LAN - 192.168.3.1
Router WAN - router.wan.ip
LAN host (Initiator) - 192.168.3.135
Web server - 192.168.3.180

# LAN host -> Router WAN - SYN
10:08:52.072354 IP 192.168.3.135.54110 > router.wan.ip.443: Flags [S], seq 3647667317, win 8192, options [mss 1460,nop,wscale 2,nop,nop,sackOK], length 0
# Router LAN -> Web server - SYN
10:08:52.072845 IP 192.168.3.1.54110 > 192.168.3.180.443: Flags [S], seq 3647667317, win 8192, options [mss 1460,nop,wscale 2,nop,nop,sackOK], length 0
# Web server -> Router LAN - SYN-ACK
10:08:52.073769 IP 192.168.3.180.443 > 192.168.3.1.54110: Flags [S.], seq 3863104422, ack 3647667318, win 64240, options [mss 1460,nop,nop,sackOK,nop,wscale 7], length 0
# Router WAN -> LAN host - SYN-ACK
10:08:52.074171 IP router.wan.ip.443 > 192.168.3.135.54110: Flags [S.], seq 3863104422, ack 3647667318, win 64240, options [mss 1460,nop,nop,sackOK,nop,wscale 7], length 0
# LAN host -> Router WAN - ACK
10:08:52.075923 IP 192.168.3.135.54110 > router.wan.ip.443: Flags [.], ack 1, win 16425, length 0
# Router LAN -> Web server - ACK
10:08:52.076123 IP 192.168.3.1.54110 > 192.168.3.180.443: Flags [.], ack 1, win 16425, length 0
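The explanation above can be illustrated with a toy model of how connection tracking un-NATs replies: the router remembers the original tuple and the translated tuple, and a reply to the translated tuple is rewritten into the reverse of the original tuple, so ports and TCP sequence/ack numbers carry over unchanged. This is a simplified Python sketch, not the kernel's nf_conntrack code; the class and method names are hypothetical, and `public.wan.ip.addr` is kept as the same placeholder string used in the earlier trace.

```python
from typing import NamedTuple


class Tuple4(NamedTuple):
    saddr: str
    sport: int
    daddr: str
    dport: int

    def inverse(self) -> "Tuple4":
        """Swap source and destination, as a reply packet would."""
        return Tuple4(self.daddr, self.dport, self.saddr, self.sport)


class Conntrack:
    """Toy record of one NAT'ed connection (original tuple and post-DNAT/SNAT tuple)."""

    def __init__(self, original: Tuple4, translated: Tuple4):
        self.original = original
        self.translated = translated

    def reply_out(self, reply: Tuple4) -> Tuple4:
        # A reply belongs to this entry only if it is the exact inverse of the translated tuple.
        if reply != self.translated.inverse():
            raise ValueError("packet does not belong to this connection")
        # Undo both NATs: the reply leaves as the inverse of the original tuple.
        return self.original.inverse()
```

Plugging in the earlier 22.03-rc2 trace, with the web server 192.168.0.11 as its own initiator: the original tuple is `Tuple4("192.168.0.11", 56294, "public.wan.ip.addr", 443)`, the translated one is `Tuple4("192.168.0.1", 56294, "192.168.0.11", 443)`, and the web server's SYN/ACK to 192.168.0.1:56294 comes out as `public.wan.ip.addr.443 > 192.168.0.11.56294`. That is exactly the puzzling third packet, which is consistent with reading it as the SYN/ACK returned to the initiator rather than a new, spurious flow.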

Pavelgl,

Thank you for your reply and information. I've completely blown away and rebuilt my web server and router, and now I cannot get the behavior to happen again.

Perhaps there was some corruption or misconfiguration somewhere; I'm still researching and testing. Thank you again for your feedback.