[SOLVED] SSH over wifi stops working on RT3200/E8450 with 22.03.0-rc6

I installed OpenWrt 22.03.0-rc6 on a Belkin RT3200 router. The Wi-Fi and traffic all seem fine, and I can SSH in. But...

Shortly after a reboot, the SSH sessions freeze. At that point, I cannot establish any new SSH logins. This occurs within a few minutes of a reboot, or perhaps as long as 20 minutes. This is repeatable. (This also happened with -rc4 - I updated to -rc6 before reporting the problem.)

A reboot clears up the SSH problem - I can log in as expected (for a while). In the meantime, even when the SSH process is frozen, the LuCI GUI works as expected, and the router passes traffic normally.

Update: My current test is to ssh into the router and run htop. The Uptime: value shows how long the SSH session runs before freezing...

What other troubleshooting information could I provide? Thanks.

Okay, finally! I'm not having trouble establishing sessions with the router itself. However, I am seeing many problems with my SSH sessions to my rsync.net backup server, and my partner cannot use SecureLink to tunnel remote Citrix sessions for work.
We had to revert to 22.02.1 during our workdays. Initially I thought it was just SecureLink's version, but no, so I'm glad to read that someone else is facing the same issues.

Question: Is your issue only with connecting to your Belkin RT3200, or with remote sessions too?

At this point, I'm only seeing problems with a loss of SSH access, so I cannot speak to your rsync.net / SecureLink question.

Interestingly, I am running two RT3200s at two different locations on RC5, and I don't have this issue. Is RC5 fine for you as well?

Update:

  1. It happens with -rc6 and -rc4 - I did not try -rc5
  2. This seems to happen only when I'm ssh'd in. More evidence:

I rebooted the router and ssh'd in briefly when the router started up to ensure it worked. I then logged out and left the router alone, using the wifi for light work on the internet. The router ran without incident overnight.

This morning, I ssh'd in and left htop running. Uptime was 22:50.

I checked back a few minutes ago, and htop had frozen with uptime of 23:28 - about 38 minutes after the start of the SSH session. At that time, I was not able to re-establish another new SSH session from my laptop.

A reboot via LuCI web interface restored SSH access.

So my current test is to SSH in, and let htop run 'til it fails. What do you see if you try this? Thanks.

Is there anything special in your build, firewall or such ?

I am just wondering about any possible peculiarities, as others have not reported anything similar to my knowledge. If there were a widespread bug since 22.03.0-rc4, I would expect more people to have been hit by now.

I tried SSHing into my own RT3200 build after 5 days of uptime, and it worked ok.
I rebooted it, SSHed in and started htop. Let's see what happens.

Gonna try now. HTOP is now open & running:

[screenshot: htop running]

I have had an SSH session open with htop running for 49 minutes without problems, so far...
(wired connection from PC)

I need to shut down my PC now, but it hasn't locked up. The difference in uptime of this screenshot vs the first shows it's been running for over 2 hours, so I am not experiencing any stability issues with SSH.

Is SSH inaccessible or just really slow? For me, with my RT3200s over WDS, it just became really slow.

No - nothing special installed. I installed RC6 and then added the packages below.

bash
htop
iputils-ping
luci-app-sqm
luci-app-wireguard
nano

NB: CAKE-autorate service is disabled and not started.

For more info, look at https://pastebin.com/CqjLZ8C7 - the output of OpenWrtScripts getstats.sh: https://github.com/richb-hanover/OpenWrtScripts/blob/main/getstats.sh

It's inaccessible. I walk away from htop and a while later it's frozen. Attempts to begin a new SSH session time out after 30-60 seconds.
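
To separate "inaccessible" from "just really slow", one option is to time how long a plain TCP connection to the SSH port takes. The following is only a minimal sketch: `probe_tcp`, `classify`, and the 1-second "slow" threshold are hypothetical names and values chosen for illustration, not part of OpenWrt or of any tool mentioned in this thread. Note that a successful TCP connect does not prove that an SSH login would succeed; it only shows that the port still accepts connections.

```python
import socket
import time


def probe_tcp(host, port=22, timeout=5.0):
    """Return the seconds needed to open a TCP connection, or None on failure."""
    start = time.monotonic()
    try:
        # create_connection raises OSError (including timeouts) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        return None


def classify(elapsed, slow_after=1.0):
    """Map a probe result to a rough label (threshold is an assumption)."""
    if elapsed is None:
        return "inaccessible"
    return "slow" if elapsed > slow_after else "ok"
```

Calling `classify(probe_tcp("192.168.1.1"))` (the address is a placeholder for the router's LAN IP) once over Wi-Fi and once over Ethernet would show whether the two paths behave differently at the TCP level.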

I would like to share that our SecureLink (ssh tunnels) issues have disappeared with this commit:

Why? I don't properly understand it yet, but give it a go.

Are you experiencing the issue via WiFi? Because my tests (in which I experienced no instability) were all done in a wired setup.

Yes - I have only tested for these lockups on Wi-Fi. I now have these tests in my queue:

  1. Test on a wired connection
    • tested with Ethernet - ran successfully for > 1 hour
    • tested with wi-fi again, failed after 9 minutes
  2. Re-flash with rc6, with no additional packages except htop and nano
    • reconnected over wi-fi, froze after 10 minutes
    • immediately (within 5 minutes) connected to Ethernet - ssh connection works
  3. Flash with a snapshot that contains the 80211 uninitialized lock fix
    • flashing now...

It'll take a couple days to report back...

This sounds suspiciously like the issue I had with -rc4 on an MBL, except I'm going through an external AP (running OpenWrt, too). Perhaps there's a relation?

(JFTR, I have not tried with another rc or snapshot yet.)

How is this going? No complaints from the missus here after 2 days, for what it's worth.

Larger bug report: Here's the testing I have done (#1 & #2 below were reported earlier). See the OP for the initial description of the problem, but the short story is that htop freezes after some time while ssh'd in over wi-fi.

  1. Test on a wired connection
    • tested with Ethernet - ran successfully for > 1 hour
    • tested with wi-fi again, failed after 9 minutes
  2. Re-flash with rc6, installing only htop and nano
    • Connected over wi-fi, froze after 10 minutes
    • Immediately connected using Ethernet (no reboot) - ssh connection works
  3. Flash with a snapshot that contains the 80211 uninitialized lock fix
    • Connection over wi-fi failed after ~1h 14 minutes
    • Reconnected over Ethernet (no reboot) - ssh works
    • NB: All the while wi-fi works to connect to the router and browse the internet
    • I then used LuCI to restart the LAN interface. I was able to connect to ssh over wi-fi.

The remainder of this post is my notes, taken as I was doing these steps:

Belkin RT3200

- Initially flashed with dangowrt UBI instructions

On Ethernet: htop - 1d 4:33:50 or so
Still working at 1d 05:36:51 (~1 hour later)

Switch to Wi-Fi: htop 1d 5:36:51
Froze at 1d 05:45:33 (~9 minutes later)
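
The "time until freeze" figures in these notes come from subtracting two htop Uptime readings. As a rough sketch of that arithmetic, assuming readings written in the `1d 05:45:33` / `00:21:11` style used here (the function names are hypothetical):

```python
import re

# Optional "<N>d " day prefix, then H:MM:SS, as written in the notes.
_UPTIME = re.compile(r"^(?:(\d+)d\s+)?(\d{1,2}):(\d{2}):(\d{2})$")


def uptime_seconds(text):
    """Convert an uptime reading such as '1d 5:36:51' to seconds."""
    match = _UPTIME.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized uptime reading: {text!r}")
    days, hours, minutes, seconds = match.groups()
    return (int(days or 0) * 86400 + int(hours) * 3600
            + int(minutes) * 60 + int(seconds))


def session_length(start, end):
    """Seconds between two uptime readings taken during one boot."""
    return uptime_seconds(end) - uptime_seconds(start)


print(session_length("1d 5:36:51", "1d 05:45:33"))  # 522 seconds, about 9 minutes
```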

==================
Flash with RC6 again

- Don't keep settings (none of the three checkboxes checked)
- Set password
- Update packages
- install htop
- install nano (not nano-full or nano-plus)
- Configure Wi-Fi - open
- Change LAN subnet to 192.168.249.1/24
- Start htop
- Uptime: 00:11:16
- Froze at 00:21:11

========================
Flash with snapshot from 9Aug2022, 18:12 EDT

Powered by LuCI Master (git-22.213.35850-abd9125) / OpenWrt SNAPSHOT r20265-e6e4f97999

- Don't keep settings (none of the three checkboxes checked)
- ssh in
- opkg update; opkg install luci
- (from LuCI...)
- Set password
- Update packages
- install htop
- install nano (not nano-full or nano-plus)
- Change System Name to Belkin-RT3200
- Configure Wi-Fi - open
- Change LAN subnet to 192.168.249.1/24
- Start htop
- Uptime: 00:13:10
- Disconnect with htop showing 01:27:21

Restart LAN interface from LuCI

Start htop on wifi at - 08:57:34
Stopped at: 09:39:06 (~12 minutes)
Restarted LAN interface using LuCI - ssh access over Wi-Fi works again

Another report: DIR-2660 admin ssh unstable through wifi (22.03 rc6)

I edited the title to contain "wifi".

This is likely caused by the wifi dropping the connection, which breaks the long-lived TCP session (?) that SSH relies on, and that in turn freezes the SSH terminal.
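
If the TCP session does die silently, a client only notices once something probes the connection. The sketch below shows how a test client written in Python could turn on TCP keepalives so that a dead path is reported as an error instead of an indefinite freeze. It is an illustration under assumptions, not a fix for the underlying problem: `enable_keepalive` and its timing values are hypothetical, and the `TCP_KEEP*` options are platform-dependent, so they are applied only where the `socket` module defines them.

```python
import socket


def enable_keepalive(sock, idle=30, interval=10, count=3):
    """Enable TCP keepalive probes on a socket and return it.

    idle/interval/count are illustrative values: first probe after `idle`
    seconds of silence, then every `interval` seconds, giving up after
    `count` unanswered probes.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    tuning = (("TCP_KEEPIDLE", idle),
              ("TCP_KEEPINTVL", interval),
              ("TCP_KEEPCNT", count))
    for name, value in tuning:
        option = getattr(socket, name, None)  # not defined on every platform
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
    return sock
```

With these values, a connection whose peer stops answering would be reported as broken after roughly a minute, which makes it easier to timestamp the moment the Wi-Fi path stops carrying the session.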
