[SOLVED] SSH over wifi stops working on RT3200/E8450 with 22.03.0-rc6

Are you experiencing the issue via WiFi? Because my tests (in which I experienced no instability) were all done in a wired setup.

Yes - I have only tested for these lockups on Wi-Fi. I now have these tests in my queue:

  1. Test on a wired connection
    • tested with Ethernet - ran successfully for > 1 hour
    • tested with wi-fi again, failed after 9 minutes
  2. Re-flash with rc6 and install no additional packages other than htop and nano
    • reconnected over wi-fi, froze after 10 minutes
    • immediately (within 5 minutes) connected to Ethernet - ssh connection works
  3. Flash with a snapshot that contains the 80211 uninitialized lock fix
    • flashing now...

It'll take a couple days to report back...


This sounds suspiciously like the issue I had with -rc4 on an MBL, except I'm going through an external AP (running OpenWrt, too). Perhaps there's a relation?

(JFTR, I have not tried with another rc or snapshot yet.)

How is this going? No complaints from the missus here after 2 days, for the record.

Larger bug report: here's the testing I have done (#1 and #2 below were reported earlier). See the OP for the initial description of the problem; the short story is that htop freezes after a while when I'm ssh'd in over Wi-Fi.

  1. Test on a wired connection
    • tested with Ethernet - ran successfully for > 1 hour
    • tested with wi-fi again, failed after 9 minutes
  2. Re-flash with rc6, installing only htop and nano
    • Connected over wi-fi, froze after 10 minutes
    • Immediately connected using Ethernet (no reboot) - ssh connection works
  3. Flash with a snapshot that contains the 80211 uninitialized lock fix
    • Connection over wi-fi failed after ~1h 14 minutes
    • Reconnected over Ethernet (no reboot) - ssh works
    • NB: All the while, Wi-Fi still works for connecting to the router and browsing the internet
    • I then used LuCI to restart the LAN interface, after which I was able to connect with ssh over Wi-Fi again.

The rest of this post is the notes I took while doing these steps:

Belkin RT3200

- Initially flashed with dangowrt UBI instructions

On Ethernet: htop - 1d 4:33:50 or so
Still working at 1d 05:36:51 (~1 hour later)

Switch to Wi-Fi: htop 1d 5:36:51
Froze at 1d 05:45:33 (~9 minutes later)

Flash with RC6 again

- Don't keep settings (none of the three checkboxes checked)
- Set password
- Update packages
- install htop
- install nano (not nano-full or nano-plus)
- Configure Wi-Fi - open
- Change LAN subnet to
- Start htop
- Uptime: 00:11:16
- Froze at 00:21:11

Flash with snapshot from 9 Aug 2022, 18:12 EDT

Powered by LuCI Master (git-22.213.35850-abd9125) / OpenWrt SNAPSHOT r20265-e6e4f97999

- Don't keep settings (none of the three checkboxes checked)
- ssh in
- opkg update; opkg install luci
- (from LuCI...)
- Set password
- Update packages
- install htop
- install nano (not nano-full or nano-plus)
- Change System Name to Belkin-RT3200
- Configure Wi-Fi - open
- Change LAN subnet to
- Start htop
- Uptime: 00:13:10
- Disconnect with htop showing 01:27:21

Restart LAN interface from LuCI

Start htop on wifi at - 08:57:34
Stopped at: 09:39:06 (~42 minutes)
Restarted LAN interface using LuCI - ssh access over Wi-Fi works again
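For comparing the runs, the elapsed time between the first and last htop reading of each test can be computed from the values in the notes. A small Python sketch; the parser only assumes the `1d h:mm:ss` / `hh:mm:ss` forms written above, and the labels are summaries, not anything htop prints:

```python
import re
from datetime import timedelta

# Accepts "1d 05:36:51", "1d 4:33:50" or "00:21:11": an optional day count
# followed by h:mm:ss, as written in the notes.
_READING = re.compile(r"(?:(\d+)d\s+)?(\d{1,2}):(\d{2}):(\d{2})")


def parse_reading(text: str) -> timedelta:
    """Turn an htop-style uptime/time reading into a timedelta."""
    m = _READING.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"unrecognised reading: {text!r}")
    days, hours, minutes, seconds = m.groups()
    return timedelta(days=int(days or 0), hours=int(hours),
                     minutes=int(minutes), seconds=int(seconds))


def elapsed(start: str, end: str) -> timedelta:
    """Time between two readings taken from the same clock."""
    return parse_reading(end) - parse_reading(start)


# (label, first reading, last reading) as recorded in the notes
READINGS = [
    ("rc6, Ethernet (still working)", "1d 4:33:50", "1d 05:36:51"),
    ("rc6, Wi-Fi (froze)", "1d 5:36:51", "1d 05:45:33"),
    ("rc6 re-flash, Wi-Fi (froze)", "00:11:16", "00:21:11"),
    ("snapshot r20265, Wi-Fi (disconnected)", "00:13:10", "01:27:21"),
    ("after LAN restart, Wi-Fi (stopped)", "08:57:34", "09:39:06"),
]

if __name__ == "__main__":
    for label, start, end in READINGS:
        print(f"{label:40} {elapsed(start, end)}")
```

Both readings in a pair only need to come from the same clock (uptime or wall clock) for the subtraction to be meaningful.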

Another report: DIR-2660 admin ssh unstable through wifi (22.03 rc6)

I edited the title to contain "wifi".

This is likely something about the Wi-Fi dropping the connection, which breaks the long-lived TCP session (?) that SSH runs over and in turn freezes the SSH terminal.


What I find extremely curious is that, while my case looks and feels the same, my AP and the 22.03 device are separate devices on the same network, and the Wi-Fi connection was established through a 21.02 device. My 22.03-rc4 device that showed the issue does not even have Wi-Fi.

This means that it's somehow not related to Wi-Fi on the device itself, but to an SSH connection that at some point passed through an (OpenWrt) Wi-Fi link.

Honestly, I don't know what to make of it, and how this is possible. But here's hoping this will help to narrow down the problem.


Thanks for fixing the title of the post.

Is it worth going back to 21.02.3 to test if it fails there?

More about the test case: all the Wi-Fi tests have been done with my laptop on my dining room table (much to the chagrin of my wife 🙂), about 25 feet from the router. (So it's not likely to be a weak Wi-Fi signal causing disconnects.)

Maybe. However, I have several devices running 21.02.3, most notably the AP (MT7621AT) and my main router (X86-64), none of which exhibit the problem. I also think if it were a problem with 21.02, we would have heard more reports by now.

I will set up some other device on a different target with a 22.03-rc and put it on the network to see if the problem is reproducible. Let's see what I have lying around.

Edit: an R6220, that will do.

Edit 2: 30 minutes in and SSH is still up and responsive. So that's a "fail": the (or at least, my) problem doesn't seem to be universal to the 22.03-rcs. Next up, I will test again with the MBL that first gave me the issue and try to reproduce it.

Edit 3: "success", I guess? 22.03.0-rc6 on the MBL, as before: fresh install from the official download, firewall disabled (no dnsmasq or odhcpd present), no other software installed. SSH from wired clients on the network works fine. SSH through Wi-Fi (again, an external AP!) produces an "established connection" syslog entry after a loooong wait, but never makes it to the login prompt before timing out.

Edit 4: The fact that LuCI works fine even when SSH fails makes me think it is some kind of dropbear issue.

Edit 5: Well, this is getting weirder and weirder. After some 10 minutes, it somehow ... "recovered"? SSH is now possible again, as if there had never been any issue. Googling the issue with dropbear turned up some (older) posts suggesting it is a 5 GHz issue, so I tried 2.4 GHz in between, with no difference: SSH was still timing out. But then, all of a sudden ... it recovered.

Edit 6: ... aaaand it stopped responding to SSH again, some 5 to 10 minutes later.

This is more or less the behaviour we're seeing: the SSH tunnel doesn't open and times out; wait 10 minutes and it works again, just because; then the SSH tunnel becomes unresponsive and the connection drops, with no way to reconnect because of more timeouts. Wait a few minutes, or reboot the computer, or reboot the access point, and lo and behold, we are back in business.

Hey, on a positive note, it seems to me that there is some consistency.
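To put timestamps on this time-out / recover / time-out-again cycle, one option is to probe the SSH port at a fixed interval from a Wi-Fi client and a wired client at the same time and compare the logs. This Python sketch only waits for the `SSH-2.0-...` identification line the server sends when a connection opens (RFC 4253, section 4.2), which is the stage Edit 3 describes never completing; it does not log in. The interval, round count and timeout are arbitrary choices:

```python
import socket
import time
from typing import Optional, Tuple


def probe_ssh_banner(host: str, port: int = 22,
                     timeout: float = 10.0) -> Tuple[Optional[str], float]:
    """Open a TCP connection and wait for the server's SSH identification line.

    Returns (banner, seconds_taken); banner is None if the connection failed,
    timed out, or the first line did not start with "SSH-".
    """
    start = time.monotonic()
    data = b""
    try:
        # create_connection applies `timeout` to the connect and to each recv.
        with socket.create_connection((host, port), timeout=timeout) as sock:
            while b"\n" not in data and len(data) < 1024:
                chunk = sock.recv(256)
                if not chunk:  # server closed the connection
                    break
                data += chunk
    except OSError:  # refused, unreachable, or socket timeout
        return None, time.monotonic() - start
    line = data.split(b"\n", 1)[0].rstrip(b"\r").decode("ascii", "replace")
    return (line if line.startswith("SSH-") else None), time.monotonic() - start


def watch(host: str, interval: float = 30.0, rounds: int = 60) -> None:
    """Probe repeatedly and print one timestamped line per attempt."""
    for _ in range(rounds):
        banner, took = probe_ssh_banner(host)
        print(f"{time.strftime('%H:%M:%S')}  {took:6.2f}s  {banner or 'NO BANNER'}")
        time.sleep(interval)
```

For example `watch("192.168.1.1")`, assuming OpenWrt's default LAN address; substitute your router's. Rows showing `NO BANNER` on the Wi-Fi client while the wired client still gets a banner would match the pattern described in this thread.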

Could this be a sign of insufficient entropy? Can you reproduce the hang with haveged or urngd running in the background?

Not any more.
The kernel's entropy collection was changed a while ago.


Always at 256, no matter what. Which is more than enough.

I know. In fact, I was CC'ed on that change. But it works only if the device has a working high-resolution clock, and I don't know whether routers have one. Hence the double-check.
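To double-check the entropy theory during a hang, the kernel's reported entropy and whether its CRNG is ready can be logged side by side. A minimal Python sketch, assuming `python3` is installed on the router (it is not in a default image; `cat /proc/sys/kernel/random/entropy_avail` gives the first number without it). Given the report above that entropy_avail always reads 256, the non-blocking `getrandom()` check is probably the more telling of the two:

```python
import os
from pathlib import Path
from typing import Optional

# Standard Linux procfs location for the kernel's entropy estimate.
ENTROPY_AVAIL = Path("/proc/sys/kernel/random/entropy_avail")


def entropy_avail(path: Path = ENTROPY_AVAIL) -> int:
    """Return the entropy estimate (in bits) the kernel currently reports."""
    return int(path.read_text().strip())


def crng_ready() -> Optional[bool]:
    """True if getrandom() returns bytes without blocking, False if it would
    block (CRNG not yet initialised), None if getrandom() is unavailable."""
    if not hasattr(os, "getrandom"):
        return None
    try:
        os.getrandom(1, os.GRND_NONBLOCK)
    except BlockingIOError:
        return False
    return True


if __name__ == "__main__" and ENTROPY_AVAIL.exists():
    print("entropy_avail:", entropy_avail())
    print("crng ready:   ", crng_ready())
```

If `crng ready` ever reports `False` while SSH is stalling, that would point back at entropy; `True` throughout would argue against it.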

Someone CMIIW, but if it were an entropy issue, it would hit all clients, wired or wireless. It does not: it only affects clients coming through Wi-Fi and, most confusingly, even when that Wi-Fi connection is external to the device.

So what makes packets that at some point passed through a wifi connection so particular that they trigger this bug?

This and the point about entropy above seem interesting.

In my case, SSH over WDS on 2.4 GHz has become very slow, to the point where it times out. But LuCI and data transfer are fine. No issue on 5 GHz.

Isn't there something about entropy being generated from Wi-Fi, or something like that?

JFTR: I had an extended chat with @jow on #openwrt-devel about the issue. The current working theory is that the packets get mangled somehow on their way through the MT76 wifi, possibly as a result of some flow optimization, and dropbear on 22.03 seems to be sensitive to that.

It may be related to this issue, which has potentially been fixed in this commit. That may explain my issues (my connection goes through a 21.02 MT76 OpenWrt AP before it reaches the OpenWrt 22.03-rc). However, that fix has been backported to rc5, so it should have fixed the issue if the MT76 Wi-Fi is on the same 22.03-rc6 device that runs dropbear.

Another theory is that it might be related to packet sizes.

The next step is to tcpdump capture the stalling/failing SSH connection attempt both from the client and the server.
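For the packet-size theory, once a capture exists on each side, the TCP payload sizes of the port-22 segments can be listed and compared between client and server. A minimal pure-Python sketch that reads a classic libpcap file; it assumes Ethernet framing and IPv4 and skips everything else (pcapng, VLAN-tagged frames, IPv6, non-first IP fragments). A capture made with something like `tcpdump -i br-lan -w ssh.pcap port 22` on the router (`br-lan` being the usual OpenWrt LAN bridge; adjust to your setup) and a matching one on the client should fit, as long as it is taken on a real interface rather than `any`, which uses a different link type:

```python
import struct
import sys
from typing import Iterator, NamedTuple


class SshSegment(NamedTuple):
    ts: float          # capture timestamp, seconds
    sport: int
    dport: int
    frame_len: int     # original length of the Ethernet frame on the wire
    payload_len: int   # TCP payload bytes, from the IPv4 total-length field


def ssh_segments(pcap: bytes, port: int = 22) -> Iterator[SshSegment]:
    """Yield IPv4/TCP segments to or from `port` in a classic pcap capture."""
    if pcap[:4] == b"\xd4\xc3\xb2\xa1":
        endian = "<"   # written by a little-endian host
    elif pcap[:4] == b"\xa1\xb2\xc3\xd4":
        endian = ">"   # written by a big-endian host
    else:
        raise ValueError("not a classic microsecond pcap file (pcapng is not handled)")
    (linktype,) = struct.unpack(endian + "I", pcap[20:24])
    if linktype != 1:
        raise ValueError(f"only Ethernet captures (linktype 1) are handled, got {linktype}")

    off = 24  # size of the global header
    while off + 16 <= len(pcap):
        ts_sec, ts_usec, incl_len, orig_len = struct.unpack(endian + "IIII", pcap[off:off + 16])
        frame = pcap[off + 16:off + 16 + incl_len]
        off += 16 + incl_len
        if len(frame) < 14 + 20 or frame[12:14] != b"\x08\x00":
            continue  # too short, or not an untagged IPv4 frame
        ip = frame[14:]
        ihl = (ip[0] & 0x0F) * 4
        (frag,) = struct.unpack("!H", ip[6:8])
        if ip[9] != 6 or frag & 0x1FFF or len(ip) < ihl + 20:
            continue  # not TCP, a non-first fragment, or truncated before the TCP header
        (total_len,) = struct.unpack("!H", ip[2:4])
        tcp = ip[ihl:]
        sport, dport = struct.unpack("!HH", tcp[:4])
        tcp_hlen = (tcp[12] >> 4) * 4
        if port in (sport, dport):
            yield SshSegment(ts_sec + ts_usec / 1e6, sport, dport, orig_len,
                             total_len - ihl - tcp_hlen)


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        for s in ssh_segments(f.read()):
            print(f"{s.ts:.6f}  {s.sport:>5} -> {s.dport:<5}  "
                  f"frame={s.frame_len:<5}  tcp_payload={s.payload_len}")
```

Run it with the capture file as the only argument (e.g. `python3 pcap_sizes.py ssh.pcap`, a hypothetical script name) on both the client-side and server-side captures, then line the two listings up by time.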

(And of course, now that I'm watching intently, my SSH connection works just fine for some inexplicable reason, possibly because I restarted the LAN on the AP while testing. I will make another attempt to reproduce the issue tomorrow, after the AP has had a bit of a workout.)

Do you think my issue, very slow SSH that ends in connection timeouts only over 2.4 GHz, is related? I recently set up three RT3200s for a neighbour and saw exactly the same thing happen there.

As @richb-hanover-priv linked above, I experienced the same problem with an unstable SSH connection (and with some proprietary, closed-source management software). But since restarting the LAN interface, it has been stable for 12 hours.