I think I may have stumbled across why the new virtual time-based airtime scheduler is causing problems for ath10k routers. The issue may be traced to one method (`ath10k_mac_op_wake_tx_queue`) in the ath10k driver that tries to schedule packets to wireless clients.
From what I can understand of this method, it is called by mac80211 to make a transmit queue (`txq`) active in the firmware so that it can start transmitting the packets in that queue. Instead of activating the `txq` that mac80211 wants, it asks the scheduler (`ieee80211_next_txq()`) for another `txq` and activates that one. This method also updates the firmware with accounting information regarding the transmit duration of the `txq`, and it does so for both the `txq` that mac80211 passes in and the new `txq` that it found.
So the net effect is that the firmware may try to send packets for a `txq` that mac80211 did not intend to schedule, or that has nothing to send, while the `txq` that actually has data to transmit is artificially starved because of the incorrect accounting. This probably will not affect routers with a small number of clients, but it is likely to become more pronounced with more clients or the longer the router stays up. That may explain why the WiFi interface needed to be restarted every once in a while.
I think the following patch should solve the problem.
```diff
--- a/drivers/net/wireless/ath/ath10k/mac.c
+++ b/drivers/net/wireless/ath/ath10k/mac.c
@@ -4601,17 +4601,15 @@ static void ath10k_mac_op_wake_tx_queue(
 {
 	struct ath10k *ar = hw->priv;
 	int ret;
-	u8 ac;
 
-	ath10k_htt_tx_txq_update(hw, txq);
 	if (ar->htt.tx_q_state.mode != HTT_TX_MODE_SWITCH_PUSH)
 		return;
 
-	ac = txq->ac;
-	ieee80211_txq_schedule_start(hw, ac);
-	txq = ieee80211_next_txq(hw, ac);
 	if (!txq)
-		goto out;
+		return;
+
+	if (!ieee80211_txq_may_transmit(hw, txq))
+		return;
 
 	while (ath10k_mac_tx_can_push(hw, txq)) {
 		ret = ath10k_mac_tx_push_txq(hw, txq);
@@ -4620,8 +4618,6 @@ static void ath10k_mac_op_wake_tx_queue(
 	}
 	ieee80211_return_txq(hw, txq, false);
 	ath10k_htt_tx_txq_update(hw, txq);
-out:
-	ieee80211_txq_schedule_end(hw, ac);
 }
 
 /* Must not be called with conf_mutex held as workers can use that also. */
```
My R7800 is currently not available for testing. Is anyone willing to test the patch? @KONG, fancy giving it a try? You seem to have a setup that could reproduce this problem.