I currently have one 2.4 GHz device that gets disconnected unexpectedly once every 5-7 days. The device then pretends to still be connected, showing zero wifi signal strength, but no IP connection is possible any more. Basically, wifi on the device is frozen.
To get this client back onto the wifi, I have to remove the wifi config on the device and re-enroll it on my network. Then the 5-7 day cycle repeats, over and over.
All other client devices have no issue.
It took me several weeks to identify that the following option is most likely responsible for the unwanted disconnect:
option disassoc_low_ack '1'
I have now set this to 0, and for the first time ever, the strange disconnect has not happened for two weeks.
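For anyone who wants to try the same thing, this is roughly what the change looks like in /etc/config/wireless. It is only a sketch: the section name 'default_radio0', the radio name and the SSID are placeholders I made up, so keep whatever your own config already has and just add or change the last line.

```
# /etc/config/wireless (excerpt; names below are placeholders)
config wifi-iface 'default_radio0'
	option device 'radio0'
	option mode 'ap'
	option ssid 'OpenWrt'
	# do not kick stations because of missing ACKs
	option disassoc_low_ack '0'
```

After editing, running `wifi reload` (or simply rebooting the router) should apply the change.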
This option seems to have been present in OpenWrt for quite a while now, and it is enabled by default in all versions.
I understand that this option might be helpful in professional environments, to kick lingering devices that would otherwise just block valuable AP resources.
But can someone tell me what advantage this option brings when it is enabled by default on a typical SOHO OpenWrt router at home?
I think it would be a good idea to open a PR on GitHub or post to the mailing list to change this default. They will either accept it or reject it, and either way you will get a better answer than a couple of lines from the old commit.
Technically, this is a client bug. Of course, since you have no control over that, it may still be possible to work around it.
Many battery-powered clients simply shut down their wifi radio when they don't need the network. They may not even transmit a disassociation frame first, since to them that is a waste of battery power.
Yep, sounds like you've found one of the big causes. From my hanging around the Bufferbloat developer community (as a spectator, not a developer!) and other places, I've heard about that for years. Many devices have poor powersave handling that causes a lot of delay going in and out of it. I don't know whether that's getting better over the years, but one can turn it off most of the time.
This does sound like bad design in the station code, not handling a disassociation well. I guess there's always checking for an updated driver that (hopefully) addresses it...
Hmmm... I don't know that much about the mechanics of disassociating. Might it be that the client disassociated, rather than the router, and both cases are reported as "due to inactivity"?
I am also unfamiliar with MLME. I have never seen it in my log entries of this kind. Do you get something different when you have the low ack disassociate enabled?
There are both disassociate on low activity and disassociate on low acknowledgement.
# Station inactivity limit
#
# If a station does not send anything in ap_max_inactivity seconds, an
# empty data frame is sent to it in order to verify whether it is
# still in range. If this frame is not ACKed, the station will be
# disassociated and then deauthenticated. This feature is used to
# clear station table of old entries when the STAs move out of the
# range.
#
# The station can associate again with the AP if it is still in range;
# this inactivity poll is just used as a nicer way of verifying
# inactivity; i.e., client will not report broken connection because
# disassociation frame is not sent immediately without first polling
# the STA with a data frame.
# default: 300 (i.e., 5 minutes)
#ap_max_inactivity=300
#
# The inactivity polling can be disabled to disconnect stations based on
# inactivity timeout so that idle stations are more likely to be disconnected
# even if they are still in range of the AP. This can be done by setting
# skip_inactivity_poll to 1 (default 0).
#skip_inactivity_poll=0
# Disassociate stations based on excessive transmission failures or other
# indications of connection loss. This depends on the driver capabilities and
# may not be available with all drivers.
#disassoc_low_ack=1
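As far as I know, OpenWrt exposes these three hostapd settings as per-interface options in /etc/config/wireless, with ap_max_inactivity going by the shorter name max_inactivity. A sketch under that assumption (the section name is a placeholder; the first two values are the ones shown in the excerpt above, and disassoc_low_ack is turned off as in the first post):

```
# /etc/config/wireless (excerpt; section name is a placeholder)
config wifi-iface 'default_radio0'
	# poll an idle station after this many seconds (hostapd: ap_max_inactivity)
	option max_inactivity '300'
	# 0 = send the empty data frame before disassociating an idle station
	option skip_inactivity_poll '0'
	# 0 = do not disassociate on excessive transmission failures
	option disassoc_low_ack '0'
```

Check the generated hostapd config on your own build before relying on this, since the option names are from memory and driver support for low-ack detection varies.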