HP Printer no longer connecting after upgrade

Apologies if this is the wrong place to post this.

So this is an odd one. I've been running OpenWrt on a Ubiquti UniFi AC LR router for a few years now, and it has been brilliant. I recently upgraded from an old master build to the latest 21.02 branch and my HP printer was no longer able to establish a connection. It's able to find the network and attach to it -- I can see it sending out DHCP requests and I can see the responses going back -- but the printer clearly doesn't receive them and sends out another DHCP request after a suitable pause. Manually setting an IP address doesn't help.

So I git bisected to find the commit where things stopped working and came up with this one:

commit 37752336bdfb361d597b316cd5bb9d8dc6ac1762
Author: Felix Fietkau <nbd@nbd.name>
Date:   Sat Jan 23 00:17:31 2021 +0100

    mac80211: add significant minstrel_ht performance improvements
    
    Completely redesign the rate sampling approach
    
    Signed-off-by: Felix Fietkau <nbd@nbd.name>

That contains six kernel patches. The first three appear to be fine. The third causes things to break:

349-mac80211-minstrel_ht-significantly-redesign-the-rate.patch

Can anyone suggest how I might further diagnose this issue?

Thanks,
-Ted

1 Like

The author of that commit @nbd might be able to offer you debugging advive.

Thanks @hynman.

For reference, there are over a dozen devices connected to the AP, and only the HP printer is having any issues.

Have you tried updating the printer's firmware? HP does a decent job of providing updates for most of its printers.

Good suggestion, but the printer was already running the latest firmware.

Another data point: this issue (or something that produces the same symptoms) has always been present on the 5 GHz network, but the 2 GHz network was fine. I had previously blamed the 5 GHz network issue on HP not properly supporting that network band, but now I'm wondering if the issue is actually something to do with OpenWrt. I'll try to track down a non OpenWrt access point to test that theory.

Since the change that broke things for me reimplemented way the best rate is found, I thought it might be useful to look at what's in the rc_rates file before and after the change. When things are working, the output looks like this:

              best   ____________rate__________    ____statistics___    _____last____    ______sum-of________
mode guard #  rate  [name   idx airtime  max_tp]  [avg(tp) avg(prob)]  [retry|suc|att]  [#success | #attempts]
CCK    LP  1          1.0M  160   10548     0.0       0.0     100.0       0     0 0            12   16       
CCK    LP  1          2.0M  161    5476     0.0       0.0       0.0       0     0 0             0   0        
CCK    LP  1          5.5M  162    2411     2.4       0.0       0.0       0     0 0             0   0        
CCK    LP  1         11.0M  163    1535     4.8       0.0       0.0       0     0 0             0   0        
CCK    SP  1          2.0M  165    5380     0.0       0.0       0.0       0     0 0             0   0        
CCK    SP  1          5.5M  166    2315     2.4       0.0       0.0       0     0 0             0   0        
CCK    SP  1         11.0M  167    1439     4.8       0.0       0.0       0     0 0             0   0        
HT20  LGI  1         MCS0     0    1477     4.8       0.0       0.0       1     0 0             0   9        
HT20  LGI  1         MCS1     1     738     9.7       0.0       0.0       0     0 0             0   0        
HT20  LGI  1         MCS2     2     492    14.6       0.0       0.0       0     0 0             0   2        
HT20  LGI  1         MCS3     3     369    17.0       0.0       0.0       0     0 0             0   2        
HT20  LGI  1         MCS4     4     246    24.4       0.0       0.0       0     0 0             0   40       
HT20  LGI  1         MCS5     5     185    29.2       0.0       0.0       0     0 0             0   93       
HT20  LGI  1         MCS6     6     164    31.7       0.0       0.0       0     0 0             0   93       
HT20  LGI  1         MCS7     7     148    34.1       0.0       0.0       0     0 0             0   94       
HT20  LGI  2         MCS8    10     738     9.7       9.7     100.0       0     0 0             1   1        
HT20  LGI  2         MCS9    11     369    17.0      17.0     100.0       0     0 0             1   1        
HT20  LGI  2         MCS10   12     246    24.4      24.4     100.0       5     0 0            17   24       
HT20  LGI  2         MCS11   13     185    29.2      29.2      96.7       5     0 0           128   152      
HT20  LGI  2         MCS12   14     123    36.6      36.6     100.0       6     0 0          1120   1238     
HT20  LGI  2         MCS13   15      92    43.9      26.8      54.7       6     0 0          3321   3697     
HT20  LGI  2    C    MCS14   16      82    46.3      39.0      76.1       6     0 0          2513   2827     
HT20  LGI  2     D   MCS15   17      74    48.8      39.0      72.5       6     0 0          4202   4746     
HT20  SGI  1         MCS0    40    1329     4.8       0.0       0.0       0     0 0             0   0        
HT20  SGI  1         MCS1    41     665     9.7       0.0       0.0       0     0 0             0   0        
HT20  SGI  1         MCS2    42     443    14.6       0.0       0.0       0     0 0             0   2        
HT20  SGI  1         MCS3    43     332    19.5       0.0       0.0       0     0 0             0   5        
HT20  SGI  1         MCS4    44     222    26.8       0.0       0.0       0     0 0             0   68       
HT20  SGI  1         MCS5    45     166    31.7       0.0       0.0       0     0 0             0   96       
HT20  SGI  1         MCS6    46     148    34.1       0.0       0.0       0     0 0             0   93       
HT20  SGI  1         MCS7    47     133    36.6       0.0       0.0       0     0 0             0   95       
HT20  SGI  2         MCS8    50     665     9.7       0.0       0.0       0     0 0             0   0        
HT20  SGI  2         MCS9    51     332    19.5      19.5      96.4       0     0 0             3   4        
HT20  SGI  2         MCS10   52     222    26.8      26.8     100.0       5     0 0            28   42       
HT20  SGI  2         MCS11   53     166    31.7      31.7     100.0       5     0 0           395   464      
HT20  SGI  2   B     MCS12   54     111    39.0      39.0     100.0       6     0 0          1596   1767     
HT20  SGI  2  A   P  MCS13   55      83    46.3      46.3     100.0       6     1 1          6736   7252     
HT20  SGI  2         MCS14   56      74    48.8      36.6      67.2       6     0 0          5387   6024     
HT20  SGI  2         MCS15   57      67    51.2      34.1      63.6       6     0 0          4739   5518     

Total packet count::    ideal 28834      lookaround 1354
Average # of aggregated frames per A-MPDU: 1.0

When they're not, it looks like this:

             best    ____________rate__________    ____statistics___    _____last____    ______sum-of________
mode guard #  rate   [name   idx airtime  max_tp]  [avg(tp) avg(prob)]  [retry|suc|att]  [#success | #attempts]
HT20  LGI  1  ABCDPS  MCS0     0    1477     5.6       0.0       0.0       1     0 7             0   1442     
HT20  LGI  1       S  MCS1     1     738    10.6       0.0       0.0       0     0 0             0   0        
HT20  LGI  1       S  MCS2     2     492    14.9       0.0       0.0       0     0 0             0   0        
HT20  LGI  1          MCS3     3     369    18.8       0.0       0.0       0     0 0             0   0        
HT20  LGI  1          MCS4     4     246    25.4       0.0       0.0       0     0 0             0   0        
HT20  LGI  1          MCS5     5     185    30.7       0.0       0.0       0     0 0             0   0        
HT20  LGI  1          MCS6     6     164    33.0       0.0       0.0       0     0 0             0   0        
HT20  LGI  1          MCS7     7     148    35.1       0.0       0.0       0     0 0             0   0        
HT20  LGI  2       S  MCS8    16     738    10.6       0.0       0.0       0     0 0             0   0        
HT20  LGI  2          MCS9    17     369    18.8       0.0       0.0       0     0 0             0   0        
HT20  LGI  2          MCS10   18     246    25.4       0.0       0.0       0     0 0             0   0        
HT20  LGI  2          MCS11   19     185    30.7       0.0       0.0       0     0 0             0   0        
HT20  LGI  2          MCS12   20     123    38.9       0.0       0.0       0     0 0             0   0        
HT20  LGI  2          MCS13   21      92    44.8       0.0       0.0       0     0 0             0   0        
HT20  LGI  2          MCS14   22      82    47.3       0.0       0.0       0     0 0             0   0        
HT20  LGI  2       S  MCS15   23      74    49.4       0.0       0.0       0     0 0             0   0        
HT20  SGI  1       S  MCS0    64    1329     6.2       0.0       0.0       0     0 0             0   0        
HT20  SGI  1          MCS1    65     665    11.6       0.0       0.0       0     0 0             0   0        
HT20  SGI  1          MCS2    66     443    16.3       0.0       0.0       0     0 0             0   0        
HT20  SGI  1       S  MCS3    67     332    20.4       0.0       0.0       0     0 0             0   0        
HT20  SGI  1          MCS4    68     222    27.2       0.0       0.0       0     0 0             0   0        
HT20  SGI  1          MCS5    69     166    32.8       0.0       0.0       0     0 0             0   0        
HT20  SGI  1          MCS6    70     148    35.1       0.0       0.0       0     0 0             0   0        
HT20  SGI  1       S  MCS7    71     133    37.3       0.0       0.0       0     0 0             0   0        
HT20  SGI  2       S  MCS8    80     665    11.6       0.0       0.0       0     0 0             0   0        
HT20  SGI  2          MCS9    81     332    20.4       0.0       0.0       0     0 0             0   0        
HT20  SGI  2          MCS10   82     222    27.2       0.0       0.0       0     0 0             0   0        
HT20  SGI  2       S  MCS11   83     166    32.8       0.0       0.0       0     0 0             0   0        
HT20  SGI  2          MCS12   84     111    41.0       0.0       0.0       0     0 0             0   0        
HT20  SGI  2          MCS13   85      83    47.0       0.0       0.0       0     0 0             0   0        
HT20  SGI  2          MCS14   86      74    49.4       0.0       0.0       0     0 0             0   0        
HT20  SGI  2          MCS15   87      67    51.5       0.0       0.0       0     0 0             0   0        

Total packet count::    ideal 417      lookaround 0
Average # of aggregated frames per A-MPDU: 1.0

I've worked out that the A, B, C and D represent the four best rates, and that 'P' is maximum probability rate. I'm less confident about 'S'. I know that flag appears if minstrel_ht_is_sample_rate finds it in the cur_sample_rates table, so I'm guessing that it indicates that it's a rate being considered by the algorithm. I've noticed the 'S' flags bounce around over time, but the ABCDP flags remain stuck on the first line. I assume that's a bad sign, as is the absence of any successes on any row.

I finally noticed the knobs under /sys/kernel/debug/ieee80211/phy1/rc, including fixed_rate_idx. Forcing that to a known good index was sufficient to allow things to get started. However, even clearing fixed_rate_idx to 4294967295 afterwards didn't cause the algorithm to probe any other rates. Most puzzling.

I've updated to the latest openwrt-21.02 build and used the fixed_rate_idx trick to get things going. I did see the rate shifting around there, though the reported lookaround remained at zero until about 800 packets had been exchanged. After that, I saw some probe packets sent out and the algorithm started sending packets over both long and short GI entries. In other words, it seems to be working as expected.

My hunch is that the first listed rate (MCS0, LGI) not being able to successfully transmit is somehow causing the algorithm to get stuck in a bad state. All of my other 2.4 GHz devices are able to communicate on their first listed rate, and none of them have this problem.

I guess I'll hack some debug info into the file to see if I refute this theory.

1 Like

I've found the problem. In the middle of minstrel_ht_get_rate is this line:

	if (time_is_before_jiffies(mi->sample_time))
		return;

I assume that this is supposed to cause the function to return if it's called too early. If so, the logic is backwards; it'll return if time (i.e. mi->sample_time) is before jiffies (the current time). In other words, it'll return early only if the sample_time has passed.

Replacing that with time_is_after_jiffies makes everything much better -- I each station select a rate other than the first in the list right away.

@nbd, does this sound right to you?

2 Likes

It looks like this is the same as https://bugs.openwrt.org/index.php?do=details&task_id=760, so I've added my solution to that ticket.

1 Like

Sorry for the delayed response. What you found makes perfect sense, and I will push a fix to OpenWrt (and send it upstream) right away.
Thanks for tracking this one down, I have no idea why it works so well in all of my tests without this fix.

Great, thanks!

This topic was automatically closed 10 days after the last reply. New replies are no longer allowed.