Weird connectivity issue:

Hi guys.

I have a TP-Link 3600 router (hostname a81m4) running OpenWrt 19.07.1.

router IPs:
br-lan - 192.168.173.1/28
br-voice - 192.168.173.33/29
eth.0 (WAN) - 192.204.x.x/32

It's connected via an L2L VPN (strongSwan) to the remote subnet 192.168.172.0/28. The tunnel is up:

root@a81m4:/home/sam# ipsec status
Security Associations (1 up, 0 connecting):
gate.st1[1]: ESTABLISHED 2 hours ago, 192.204.x.x[192.204.x.x]...107.y.y.y[107.y.y.y]
net-192.168.173-32{5}:  INSTALLED, TUNNEL, reqid 2, ESP SPIs: c4d4dd34_i 5d5e392f_o
net-192.168.173-32{5}:   192.168.173.32/29 === 192.168.172.0/28
gate.st1{6}:  INSTALLED, TUNNEL, reqid 3, ESP SPIs: c4eba6b5_i 6e3ea942_o
gate.st1{6}:   192.168.173.0/28 === 192.168.172.0/28
root@a81m4:/home/sam#
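
A quick way to see which of the two child SAs a given local source address falls under is to check it against the traffic selectors printed above. The following is only a minimal Python sketch using the standard ipaddress module; the SA names and prefixes are copied from the ipsec status output, while the helper name matching_sa is made up for illustration.

```python
import ipaddress

# Local traffic selectors copied from the `ipsec status` output above.
LOCAL_SELECTORS = {
    "gate.st1": ipaddress.ip_network("192.168.173.0/28"),
    "net-192.168.173-32": ipaddress.ip_network("192.168.173.32/29"),
}
# Remote traffic selector shared by both child SAs.
REMOTE_NET = ipaddress.ip_network("192.168.172.0/28")


def matching_sa(src, dst):
    """Return the name of the child SA whose selectors cover src -> dst, or None."""
    src_ip = ipaddress.ip_address(src)
    dst_ip = ipaddress.ip_address(dst)
    if dst_ip not in REMOTE_NET:
        return None
    for name, local_net in LOCAL_SELECTORS.items():
        if src_ip in local_net:
            return name
    return None


for src in ("192.168.173.1", "192.168.173.33"):
    print(src, "->", matching_sa(src, "192.168.172.2"))
```

Both router addresses are covered by a selector, so traffic sourced from either .1 or .33 is eligible for the tunnel; only the child SA differs.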

There are two Linux computers in the remote subnet: 192.168.172.2 and 192.168.172.4.

I can ping both 192.168.172.2 and 192.168.172.4 from the router 192.168.173.1 (and from the computers behind it) without problems.

root@a81m4:/home/sam# ping -c 1 192.168.172.2
PING 192.168.172.2 (192.168.172.2): 56 data bytes
64 bytes from 192.168.172.2: seq=0 ttl=63 time=21.183 ms

--- 192.168.172.2 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 21.183/21.183/21.183 ms
root@a81m4:/home/sam# ping -c 1 192.168.172.4
PING 192.168.172.4 (192.168.172.4): 56 data bytes
64 bytes from 192.168.172.4: seq=0 ttl=63 time=18.069 ms

--- 192.168.172.4 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 18.069/18.069/18.069 ms
root@a81m4:/home/sam#

Moreover, I can open ssh to 192.168.172.4 from 192.168.173.1 (for some reason the source IP is shown as 192.168.173.33):

root@a81m4:/home/sam# ssh sam@192.168.172.4
sam@192.168.172.4's password:
Linux 2.6.35.4.
sam@sw122:~$

So the tunnel works correctly.

I can open ssh from 192.168.172.4 to 192.168.172.2 (of course, they are in the same subnet):

sam@sw122:~$ ssh sam@192.168.172.2
sam@192.168.172.2's password:
Last login: Wed Feb 12 11:49:32 2020 from 192.168.172.4
Linux 4.19.34.

Sometimes a cigar is just a cigar.
                -- Sigmund Freud


In the beginning there was nothing.  And the Lord said "Let There Be Light!"
And still there was nothing, but at least now you could see it.

sam@st10:~$

But I CANNOT open ssh from 192.168.173.1 (or .33, or the computers behind the a81m4 router) to 192.168.172.2:

root@a81m4:/home/sam# ssh sam@192.168.172.2

ssh: Connection to sam@192.168.172.2:22 exited: Connect failed: Operation timed out
root@a81m4:/home/sam#

ssh is permitted from 192.168.173.1 (.33) to 192.168.172.2:

[root@st10 sam]# iptables -vnL | grep .168.173
    0     0 ACCEPT     tcp  --  bond0  *       192.168.173.1        0.0.0.0/0            state NEW tcp dpt:179
    0     0 ACCEPT     tcp  --  bond0  *       192.168.173.1        0.0.0.0/0            state NEW tcp dpt:22
    0     0 ACCEPT     tcp  --  bond0  *       192.168.173.33       0.0.0.0/0            state NEW tcp dpt:22
[root@st10 sam]#

But there are no hit counts. The VPN tunnel is wide open between subnets 192.168.173 and 192.168.172 (I have not set up VPN inline filtering yet).

While I was trying to open ssh to 192.168.172.2, tcpdump @ 192.168.173.1(32) showed the following:

    192.168.173.33.36336 > 192.168.172.2.22: Flags [S], cksum 0x593c (correct), seq 846566414, win 29200, options [mss 1320,sackOK,TS val 1238172790 ecr 0,nop,wscale 4], length 0
    192.168.172.2.22 > 192.168.173.33.36336: Flags [S.], cksum 0xda9b (incorrect -> 0xfaba), seq 2124895372, ack 846566415, win 29200, options [mss 1460,nop,nop,sackOK,nop,wscale 9], length 0

Could you guys help me with this incorrect checksum issue?
I could assume it's because of the different MSS, but why does ssh from 192.168.173.1 to 192.168.172.4 work within the same VPN tunnel while ssh to .2 does not?
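
As background for the MSS question: each side advertises its own MSS in its SYN, and the two values do not have to match. For IPv4 without options, the advertised MSS is normally the sender's MTU minus 40 bytes (20 IP + 20 TCP). The small Python sketch below only does that arithmetic on the two values from the capture; the function names are illustrative. Separately, tcpdump commonly reports "incorrect" checksums on packets captured on the machine that sends them when checksum offloading is enabled, so that message alone may not indicate corruption; whether that applies here depends on where the capture was taken.

```python
import re

IPV4_HEADER = 20  # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, fixed TCP header (TCP options are not counted in the MSS)


def advertised_mss(tcpdump_line):
    """Extract the MSS option from one tcpdump line, or None if absent."""
    match = re.search(r"\bmss (\d+)", tcpdump_line)
    return int(match.group(1)) if match else None


def implied_mtu(mss):
    """MTU the sender appears to assume, given the MSS it advertised."""
    return mss + IPV4_HEADER + TCP_HEADER


# Option fields copied from the two captured packets above.
syn = "Flags [S], options [mss 1320,sackOK,TS val 1238172790 ecr 0,nop,wscale 4]"
syn_ack = "Flags [S.], options [mss 1460,nop,nop,sackOK,nop,wscale 9]"

for name, line in (("SYN", syn), ("SYN-ACK", syn_ack)):
    mss = advertised_mss(line)
    print(name, "mss", mss, "implied MTU", implied_mtu(mss))
```

Differing values in the two directions are normal for TCP and do not by themselves explain a failed handshake.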

Thank you.

That's because it was used as the source IP. To specify another SRC IP, use:

ssh sam@192.168.172.4 -b 192.168.173.1

So you're receiving reply traffic - despite the firewall showing 0 hits?

:thinking:

Yes. With the source bound to 192.168.173.1, ssh to 192.168.172.4 still works but ssh to 192.168.172.2 doesn't:

root@a81m4:/home/sam#
root@a81m4:/home/sam# ssh sam@192.168.172.4 -b 192.168.173.1
sam@192.168.172.4's password:
Linux 2.6.35.4.
sam@sw122:~$ exit
logout
root@a81m4:/home/sam#
root@a81m4:/home/sam# ssh sam@192.168.172.2 -b 192.168.173.1

-= freezes for a couple of minutes here =-

ssh: Connection to sam@192.168.172.2:22 exited: Connect failed: Operation timed out
root@a81m4:/home/sam#

Even if I temporarily clear all iptables chains @ 192.168.172.2, ssh from 192.168.173.1 (.33) does not go through.

P.S. You wrote "That's because it was used as the source IP", but why did the system decide to use 173.33 (the SVI for VLAN 13) and not 173.1 (the SVI for VLAN 10) as the source? Is this because of the VLAN ID? VID13 > VID10, so VID13 is the source?

Any ideas?

Thank you.

Then you should troubleshoot the 192.168.172.0/28 router/firewall at the far end.

From the remote router (which holds the L2L VPN to the OpenWrt 19.07.1 router) I can ping all hosts in vlan10:

gate#p 192.168.172.2
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.172.2, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/1/1 ms
gate#p 192.168.172.4
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.172.4, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/2/4 ms
gate#

Also, I can ping the a81m4 vlan10 SVI 192.168.173.1:

gate#p 192.168.173.1
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.173.1, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/2/4 ms
gate#

The VPN is still wide open for a81m4.
Also, I see the packet counters increase:

gate#sh cry ips sa pe 192.204.x.x | i local|remote|caps
    Crypto map tag: vpn-cryptomap, local addr 107.y.y.y
   local  ident (addr/mask/prot/port): (192.168.172.0/255.255.255.240/0/0)
   remote ident (addr/mask/prot/port): (192.168.173.0/255.255.255.240/0/0)
    #pkts encaps: 10101, #pkts encrypt: 10101, #pkts digest: 10101
    #pkts decaps: 9038, #pkts decrypt: 9038, #pkts verify: 9038
     local crypto endpt.: 107.y.y.y, remote crypto endpt.: 192.204.x.x
   local  ident (addr/mask/prot/port): (192.168.172.0/255.255.255.240/0/0)
   remote ident (addr/mask/prot/port): (192.168.173.32/255.255.255.248/0/0)
    #pkts encaps: 9855, #pkts encrypt: 9855, #pkts digest: 9855
    #pkts decaps: 9961, #pkts decrypt: 9961, #pkts verify: 9961
     local crypto endpt.: 107.y.y.y, remote crypto endpt.: 192.204.x.x
gate#
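
To tell whether an ssh attempt actually moves packets in both directions, one option is to take this output before and after the attempt and compare the encaps/decaps counters per remote selector. The Python sketch below only parses one snapshot; the text is a trimmed copy of the output above, and the function name parse_sa_counters is made up for illustration.

```python
import re

# Trimmed copy of the `sh cry ips sa` output above (counter lines shortened).
SA_OUTPUT = """\
   remote ident (addr/mask/prot/port): (192.168.173.0/255.255.255.240/0/0)
    #pkts encaps: 10101, #pkts encrypt: 10101, #pkts digest: 10101
    #pkts decaps: 9038, #pkts decrypt: 9038, #pkts verify: 9038
   remote ident (addr/mask/prot/port): (192.168.173.32/255.255.255.248/0/0)
    #pkts encaps: 9855, #pkts encrypt: 9855, #pkts digest: 9855
    #pkts decaps: 9961, #pkts decrypt: 9961, #pkts verify: 9961
"""


def parse_sa_counters(text):
    """Map each remote selector address to its encaps/decaps packet counters."""
    counters = {}
    current = None
    for line in text.splitlines():
        ident = re.search(r"remote\s+ident.*?:\s*\((\d+(?:\.\d+){3})/", line)
        if ident:
            current = ident.group(1)
            counters[current] = {}
            continue
        for field in ("encaps", "decaps"):
            value = re.search(r"#pkts %s: (\d+)" % field, line)
            if value and current is not None:
                counters[current][field] = int(value.group(1))
    return counters


for remote, values in parse_sa_counters(SA_OUTPUT).items():
    print(remote, values)
```

Running it on two snapshots and subtracting the values would show which direction, if any, stops counting during the failing connection.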

But I am not able to reach 192.168.173.1 on the ssh port from the remote-side router (subnet 192.168.172.0):

gate#sh ip int brie | i 172.1
Vlan10                     192.168.172.1   YES NVRAM  up                    up
gate#
gate#telnet 192.168.173.1 22 /source-interface vlan10
Trying 192.168.173.1, 22 ...
% Connection timed out; remote host not responding

gate#

It's a kind of magic.....
Any other ideas?

Thank you.

Update:

Even after synchronizing the MSS on both sides like this:

192.168.173.1
+++++++

root@a81m4:/home/sam# ping 192.168.172.2 -s 1200 -c 2
PING 192.168.172.2 (192.168.172.2): 1200 data bytes
1208 bytes from 192.168.172.2: seq=0 ttl=63 time=42.456 ms
1208 bytes from 192.168.172.2: seq=1 ttl=63 time=72.053 ms

--- 192.168.172.2 ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 42.456/57.254/72.053 ms
root@a81m4:/home/sam#
root@a81m4:/home/sam#iptables -t mangle -A POSTROUTING -p tcp --tcp-flags SYN,RST SYN -d 192.168.172.2 -j TCPMSS --set-mss 1200
root@a81m4:/home/sam#

192.168.172.1
+++++++

[root@st1 sam]#iptables -t mangle -A POSTROUTING -p tcp --tcp-flags SYN,RST SYN -d 192.168.173.1 -j TCPMSS --set-mss 1200
[root@st1 sam]#

I see packets hitting the POSTROUTING chain:

[root@st1 sam]# iptables -t mangle -vnL | fgrep "POSTROUTING\|MSS"
Chain POSTROUTING (policy ACCEPT 186K packets, 108M bytes)
 pkts bytes target     prot opt in     out     source               destination
   39  2028 TCPMSS     tcp  --  *      *       0.0.0.0/0            192.168.173.1        tcp flags:0x06/0x02 TCPMSS set 1200
[root@st1 sam]#

and tcpdump is not showing the "incorrect" error anymore, and the MSS is the same from both sides:

16:53:08.971763 IP 192.168.173.1.39163 > 192.168.172.2.22: Flags [S], seq 528595826, win 29200, options [mss 1200,sackOK,TS val 974265337 ecr 0,nop,wscale 4], length 0
16:53:08.971796 IP 192.168.172.2.22 > 192.168.173.1.39163: Flags [S.], seq 3347028345, ack 528595827, win 29200, options [mss 1200,nop,nop,sackOK,nop,wscale 9], length 0
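
One detail that can be read off the two lines above is the gap between the SYN and the SYN-ACK. The Python sketch below just subtracts the two timestamps copied from the capture. If both lines come from a single capture point, a gap far below the roughly 18 to 21 ms ping round trip shown earlier would suggest the capture was taken close to 192.168.172.2 rather than on the a81m4 side. That interpretation is an assumption, not something stated in the thread.

```python
from datetime import datetime, timedelta


def gap_microseconds(first, second):
    """Return the time between two tcpdump timestamps, in microseconds."""
    fmt = "%H:%M:%S.%f"
    delta = datetime.strptime(second, fmt) - datetime.strptime(first, fmt)
    return delta // timedelta(microseconds=1)


# Timestamps copied from the SYN and SYN-ACK lines above.
print(gap_microseconds("16:53:08.971763", "16:53:08.971796"))  # 33
```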

but ssh-ing from 173.1 to 172.2 is still failing:

root@a81m4:/home/sam# ssh  sam@192.168.172.2 -b 192.168.173.1
packet_write_wait: Connection to 192.168.173.1 port 22: Broken pipe
[sam@sidko ~]$

What's going on here?

A few suggestions:

  • Use a computer other than the gateway as the SSH client.
  • Run tcpdump on the SSH client and server machines.
  • Also try to capture TCP retransmissions or the connection teardown, and any related ICMP messages.
  • Determine the first packet missing from the TCP connection, and where it was lost (on a gateway / on an end system).
  • Use ipsec statusall to see the xfrm counters.
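
The "first missing packet" step can be done by eye, but with longer captures a small script may help. Below is a rough Python sketch, not a finished tool: it assumes both captures are tcpdump text output taken with -S (absolute sequence numbers) and no NAT in between, and it skips lines without a seq field, such as pure ACKs. The function names are made up, and the client-side list in the example is hypothetical, not data from this thread.

```python
import re

PACKET_RE = re.compile(r"IP (\S+) > (\S+): Flags \[([^\]]+)\], seq (\d+(?::\d+)?)")


def packet_key(line):
    """Return (src, dst, flags, seq) for one tcpdump line, or None if it has no seq."""
    match = PACKET_RE.search(line)
    return match.groups() if match else None


def first_missing(capture_a, capture_b):
    """Return the first line of capture_a whose packet does not appear in capture_b."""
    seen = {packet_key(line) for line in capture_b}
    for line in capture_a:
        key = packet_key(line)
        if key is not None and key not in seen:
            return line
    return None


# Server-side lines copied from the capture quoted earlier in the thread.
server_side = [
    "16:53:08.971763 IP 192.168.173.1.39163 > 192.168.172.2.22: Flags [S], seq 528595826, win 29200",
    "16:53:08.971796 IP 192.168.172.2.22 > 192.168.173.1.39163: Flags [S.], seq 3347028345, ack 528595827, win 29200",
]
# HYPOTHETICAL client-side capture in which only the SYN is visible.
client_side = [
    "16:53:08.950000 IP 192.168.173.1.39163 > 192.168.172.2.22: Flags [S], seq 528595826, win 29200",
]

print(first_missing(server_side, client_side))
```

With these inputs the SYN-ACK line is reported, which would point at the return path; with real captures from both ends the result may of course differ.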