SSH connection loss during DHCP renewal/VM unfreeze? (19.07.0-rc2)

I just installed 19.07.0-rc2 and noticed that my PuTTY SSH session to the router ended abruptly after several hours. PuTTY normally sends a keepalive every 5 seconds, and it did, but for some reason my Windows 7 computer stopped sending the ACK back after the ping-pong, and from then on it wouldn't ACK any retransmissions of the server's (OpenWRT) pong. Yet I could easily reconnect in a new session.

I happened to be capturing in Wireshark at the time, and I could see that when the first session died my computer was in the middle of negotiating DHCP with the router. I've attached a screenshot, and I wonder if anyone with network experience can tell me whether this is an OpenWRT issue, a PuTTY issue or a Windows 7 issue.

In particular I am interested in why OpenWRT sends a ping immediately after the discover and then waits several seconds to send the offer, and whether that's a bug that could have something to do with it. One theory I have is that the DHCP lease expired and Windows couldn't get a renewal in time, but I don't know why it would wait until the last 5 seconds to get a renewal, or why Windows wouldn't then ACK the retransmissions from the first session after the renewal succeeded.

Well, how else would it check to see if that IP is in use?

I don't know the answer to that question; I'm surprised it even did that. It just happened again this evening: the SSH connection died during DHCP renewal. I've now turned on Microsoft-Windows-Dhcp-Client/Operational logging in the Event Viewer, so maybe that will give me some more information.

I'll have to observe my network; but I advise using:

-o ServerAliveInterval=x

in your ssh command, where x is the number of seconds between keepalives. This solved my disconnect issue with ssh. I use 60.

The weird thing here is that your PC has an IP, yet it sends a Discover packet.
Normally, a client that already has an IP should start sending Requests once half the lease time has elapsed, so that it keeps the leased address and does not have to start the whole DHCP packet exchange from the beginning.

This is how DHCP should work: https://packetlife.net/media/captures/DHCP.cap
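To make the timing concrete, here is a minimal Python sketch of when a client would be expected to renew. It assumes the RFC 2131 default ratios (renew at 50% of the lease, rebind at 87.5%); a server can override both through DHCP options. The lease start time and the 12-hour lease length are made-up illustration values, not taken from the capture.

```python
from datetime import datetime, timedelta

def dhcp_timers(lease_start, lease_seconds):
    """Return the renew (T1), rebind (T2) and expiry times for a lease.

    Assumes the RFC 2131 default ratios: T1 = 0.5 * lease, T2 = 0.875 * lease.
    """
    lease = timedelta(seconds=lease_seconds)
    return {
        "T1_renew": lease_start + lease * 0.5,     # unicast Request to the server
        "T2_rebind": lease_start + lease * 0.875,  # broadcast Request to any server
        "expiry": lease_start + lease,             # address must be dropped
    }

# Hypothetical lease: obtained at midnight, valid for 12 hours.
timers = dhcp_timers(datetime(2019, 12, 22, 0, 0, 0), 12 * 3600)
for name, when in timers.items():
    print(name, when)
```

A Discover from a client that still holds an address would only be expected after the expiry time has passed without a successful Request.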

Thanks but it seems PuTTY already does that automatically like every 5 seconds.

That capture looks a lot different from mine. So in my case it appears that what is happening is not actually a DHCP renewal but a brand-new DHCP request? Maybe that's why Windows doesn't allow existing connections to resume.

Possibly this is a VM issue, Windows 7 is a VM guest and when the host hibernates the guest is not modified (remains running but since the host is hibernating it effectively hibernates with it). Then when I resume the host the guest is of course resumed with it and VMware tools updates the time in the guest. Perhaps there is some window in lease, renewal or rebinding that is never hit because the time is changed in the guest. Since I only SSH into the router from the guest after resuming from hibernate on the host I hadn't considered this but I suppose it's possible. The renewal is missed and then Windows realizes it's 12 hours later and starts a new DHCP request. OpenWRT sees it's a new request so it pings the existing .124, and when there is no reply it gives Windows that address.
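The hypothesis above can be sketched in a few lines of Python. This is only an illustration of the idea, not a model of the Windows DHCP client: the phase names BOUND, RENEWING and REBINDING follow RFC 2131, "EXPIRED" is a label made up here, and the 12-hour lease and the awake times are assumed values.

```python
def dhcp_phase(elapsed_s, lease_s):
    """Classify the lease phase from the seconds elapsed since the lease began.

    Uses the RFC 2131 default ratios (T1 = 50%, T2 = 87.5% of the lease).
    """
    if elapsed_s >= lease_s:
        return "EXPIRED"      # made-up label: client must start over with Discover
    if elapsed_s >= 0.875 * lease_s:
        return "REBINDING"
    if elapsed_s >= 0.5 * lease_s:
        return "RENEWING"
    return "BOUND"

LEASE_S = 12 * 3600  # assumed 12-hour lease

# Hypothetical moments (hours since the lease began) at which the guest is
# actually running; it is frozen with the host between hour 1 and hour 13.
awake_at_hours = [0.5, 1.0, 13.0, 13.1]
phases = [dhcp_phase(h * 3600, LEASE_S) for h in awake_at_hours]
print(phases)
```

In this sketch the guest never observes the RENEWING or REBINDING phase: the first thing it sees after the clock jump is a lease that has already run out.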

If that's the case, an interesting question would be why the DHCP server can't match the existing lease even though the MAC, hostname and all other fields in the new DHCP request are the same. It seems I get the same requested IP simply because the OS doesn't respond to the ICMP ping.

Where do they do that!?!?

Your description of the state and condition of the Windows VM is quite false; and greatly lends to your misconception of this so-called issue. You should have originally noted that you do this to the VM guest. This is more than likely your problem. I've never heard of safely (for guests) hibernating a VM Host!

I would have never been able to replicate your issue - because I would have never thought to do such a thing!

To be clear, I do not suggest putting your VM into such a state...I honestly think you're trying to "pull our leg" with this issue.

You sure?

(To be clear, just because time is updated doesn't mean that Windows handles this state change gracefully.)

Ummmmm YES!

Pause...you resume a machine from a freeze and expect the SSH connection to have remained up?

How do you expect the OpenWrt end to maintain that connection state?

:confused:

WHAT!?!?

By your own admission, there is no "existing lease":

...you already understand why this is happening, so I'm unsure why you're making a post about your perceived "issue"...?

I edited the title to better reflect your problem.

The issue is as I described it. I work with many VMs and hibernate the host daily. The guests do not hibernate or go to sleep, but because they are VMs they're effectively hibernated since the host is hibernated. On resume VMware tools syncs the host time to the guests.

As I said I make an SSH connection to the router from the guest after resuming, and at some point later it drops. The title update you made is wrong and is not the issue. I have no way to change it back.

Please spare us all any more incredulous replies.

Just happened again and since I have logging enabled now I can see the DHCP lease expires. The address is unplumbed (removed) and then several seconds later plumbed (added).

12/23/2019 12:42:05 AM: Lease is expired in the adapter 19. Expired address is 192.168.1.124
12/23/2019 12:42:06 AM: Discover-Offer-Request-Ack is initiated on the adapter with Interface Id 19
12/23/2019 12:42:06 AM: Discover is sent from the adapter 19.Status code is 0x0
12/23/2019 12:42:10 AM: Request is sent from the adapter 19. Status code is 0x0
12/23/2019 12:42:10 AM: Offer is accepted in the adapter 19.Offered Address is 192.168.1.124.Server address is 192.168.1.1
12/23/2019 12:42:10 AM: Address 192.168.1.124 is unplumbed from  the adapter 19. Status code is 0x0
12/23/2019 12:42:10 AM: Routes are updated in the adapter 19. Status Code is 0x0
12/23/2019 12:42:10 AM: PERFTRACK (DORA): Offer is accepted in the adapter 19.Offered Address is 192.168.1.124.Server address is 192.168.1.1
12/23/2019 12:42:10 AM: Ack is accepted in the adapter 19. Received Address is 192.168.1.124.Server address is 192.168.1.1
12/23/2019 12:42:10 AM: PERFTRACK (DORA): Offer is accepted in the adapter 19.Offered Address is 192.168.1.124.Server address is 192.168.1.1
12/23/2019 12:42:10 AM: Dhcp has notified NLA for the configuration changes for the interface 19
12/23/2019 12:42:14 AM: Your computer was successfully assigned an address from the network, and it can now connect to other computers.
12/23/2019 12:42:14 AM: Dns registration has happened for the adapter 19. Status Code is 0x0. DNS Flag settings is 32.
12/23/2019 12:42:14 AM: Address 192.168.1.124 is plumbed to the adapter 19. Status code is 0x0

The event logs show the expired lease was established on 12/22/2019 12:36:51 AM so approximately 24 hours ago. There's no indication of any attempted renewal, which it had plenty of time to do. This is either a Windows issue or a VM issue but it doesn't appear to be an OpenWRT issue.
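The timing in the log can be checked with a short Python sketch. The four timestamps are copied from the entries and the post above; the format string is an assumption about how Event Viewer rendered them (US-style month/day/year with AM/PM).

```python
from datetime import datetime

FMT = "%m/%d/%Y %I:%M:%S %p"  # assumed US-style Event Viewer timestamp format

def ts(text):
    """Parse one timestamp as it appears in the log lines."""
    return datetime.strptime(text, FMT)

lease_obtained = ts("12/22/2019 12:36:51 AM")  # when the expired lease was established
lease_expired = ts("12/23/2019 12:42:05 AM")   # "Lease is expired"
unplumbed = ts("12/23/2019 12:42:10 AM")       # address removed from the adapter
plumbed = ts("12/23/2019 12:42:14 AM")         # address added back

lease_age = lease_expired - lease_obtained
no_address_gap = plumbed - unplumbed
expiry_to_plumbed = plumbed - lease_expired

print("lease age at expiry:", lease_age)
print("time without the address:", no_address_gap.total_seconds(), "s")
print("expiry to address restored:", expiry_to_plumbed.total_seconds(), "s")
```

So the lease was a little over 24 hours old when it expired, and the adapter had no address for about 4 seconds, which is close to PuTTY's 5-second keepalive interval. The log by itself does not show whether that gap is what killed the existing TCP session.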

  • Can you honestly explain what you think the guests are magically doing when the host is hibernating???
  • Can you describe any physical state (on the hardware level) of a machine that's identical (i.e. the computer is magically suspended in time, power, etc., and resumes later)? Please don't say hibernation, as you didn't hibernate the machine (the assumption is that Microsoft built that feature properly and that it should be used).

So this is really seeming like some joke.

You didn't believe the 12 hour config and new request? Is there something you're trying to show about DHCP, or did you want to tell us again/verify?

If DHCP hasn't taken place yet, your clarification about before/after doesn't matter. Which means you still needed to state an issue.

At least you've realized that. If you don't wish to believe why, that's OK.

  • I will edit the title to reflect your original intention and note to the community that you are doing something uncommon (I get the impression you don't think the unfreeze even matters)
  • Every web search I find for "Hibernate VM Host" yields advice on how to pause or shutdown Guest VMs before the machine hibernates - I even found pages with issues about Windows guests!
  • I find no web results for "Hibernate VMWare Host" - I was also unaware there was a feature
  • I realized you never mentioned what OS is running on the Host
  • Since this is not an OpenWrt issue, I'll ask the mods to close the thread

(I'm totally lost at the posters who portray an attitude of: "I was given an answer, but I'd rather not believe it and pretend no answer was given at all.")

Here's one link to script hibernation of Guests on VMWare (if you decide it should be done): https://pclinuxoshelp.com/index.php/Hibernate_host_OS_with_running_VirtualBox


If you are solely talking about the SSH connection drop, you do realize there's also a TCP connection timeout on both ends, correct?

So if you want to believe it's a Windows issue, I would say it's likely that the TCP connection times out ungracefully when the time is fixed (I think "updated" is an improper term, as the feature is likely there to fix guest time drift, not for what you're doing).

And you should be able to edit your own titles.

My comments speak for themselves and should be clear to any native English speaker. If I knew the issue wasn't related to OpenWRT I wouldn't have asked here.

I think the SSH session just timed out on the OpenWrt side? PuTTY was frozen, so obviously no keepalive packets were answered.

I'm not insulted or forget my hood or state/area (i.e. that I'm a native speaker of American English - odd, I had to tell someone that in another thread, LOL) just because you forgot what you typed.

:laughing:

I tried to offer another logical explanation that indeed followed what you stated.

Another idea I thought of in the interim was: time on the SSH encryption is skewed. Feel free to blame either side of the link. It's similar to what @orangepizza said, but I think it's the fact the time is being changed so abruptly.

There are also many other ungraceful and corrupt (since we don't want to address the freeze) processes I thought of on the Windows VM, especially if it connects to Microsoft services.
