Just a thought: if you have 30 clients connected and they all raise the event at the same time, you will spawn 30 scans, which could overwhelm the CPU. We need to prevent multiple spawns.
I tested this: when a scan has been triggered but has not completed and the next scan command fires, iw throws a message like "device busy - code XX".
What would be smarter: if the last scan was less than 10 seconds ago, sleep 10 seconds and then trigger a new scan. This would lower the total scan count and guarantee that every iw event 60 gets a scan afterwards. I'm not sure how to cancel the "iw event 60 pending buffer" in a script once a scan has just launched; otherwise the described mechanism would probably queue up scans faster than the 10 s + scan delay passes.
My bad, I just moved the goalposts and replaced scan with mkdir. It is still an external command call. Each event line has an epoch time in its first field, so the check can use shell built-ins:
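dt=0
iw event -f -t | while read -r line; do
    # ${line%%.*} is the integer epoch seconds at the start of the event line
    if [ "${line%%.*}" -gt "$dt" ]; then
        scan
        # no further scan until 10 seconds from now
        dt=$(($(date +%s)+10))
    fi
done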
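To see how the throttle behaves without a live radio, here is a minimal sketch that feeds the same check with sample lines. The sample lines, the function name throttle_events and the COOLDOWN variable are made up for illustration, and the cooldown is computed from the event timestamp rather than from date +%s so that the result is reproducible.

```shell
#!/bin/sh
# Sketch of the throttle idea, fed with sample lines instead of a live
# "iw event -t -f" stream.
# Assumption: every line starts with "<epoch>.<fraction>: ".

COOLDOWN=10   # seconds between two scans (value from the thread)

throttle_events() {
    next_allowed=0
    while read -r line; do
        ts=${line%%.*}                  # integer seconds before the first dot
        if [ "$ts" -gt "$next_allowed" ]; then
            echo "scan at $ts"          # placeholder for the real scan command
            next_allowed=$((ts + COOLDOWN))
        fi
    done
}

# Three hypothetical events: the second one falls inside the cooldown.
printf '%s\n' \
    '1000.000001: wlan0 (phy #0): unknown event 60' \
    '1003.500000: wlan0 (phy #0): unknown event 60' \
    '1011.000000: wlan0 (phy #0): unknown event 60' | throttle_events
```

Events that arrive during the cooldown are simply dropped, so nothing queues up behind a running scan.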
# Put your custom commands here that should be executed once
# the system init finished. By default this file does nothing.
/bin/bash /root/ath9k-watchdog.sh start &
exit 0
/root/ath9k-watchdog.sh
#!/bin/bash
#
# Purpose:
# Watch ath9k 2.4 GHz radio event 60 and if the event occurs, trigger a WiFi scan as a workaround to avoid issues
# slowly creeping up like higher WiFi latency or unexpected WiFi client disconnects.
#
# Installation:
# bash "/root/ath9k-watchdog.sh" install
#
# Command line:
# bash "/root/ath9k-watchdog.sh" livelog
# bash "/root/ath9k-watchdog.sh" start &
# bash "/root/ath9k-watchdog.sh" stop
#
# For testing purposes only:
# bash "/root/ath9k-watchdog.sh" start
#
# Prerequisites:
# BASH
#
# Script Configuration.
PATH=/usr/bin:/usr/sbin:/sbin:/bin
SCRIPT_FULLFN="$(basename -- "${0}")"
SCRIPT_NAME="${SCRIPT_FULLFN%.*}"
LOGFILE="/tmp/${SCRIPT_NAME}.log"
LOG_MAX_LINES="1000"
#
#
# -----------------------------------------------------
# -------------- START OF FUNCTION BLOCK --------------
# -----------------------------------------------------
logAdd ()
{
TMP_DATETIME="$(date '+%Y-%m-%d [%H-%M-%S]')"
TMP_LOGSTREAM="$(tail -n ${LOG_MAX_LINES} ${LOGFILE} 2>/dev/null)"
echo "${TMP_LOGSTREAM}" > "$LOGFILE"
echo "${TMP_DATETIME} $*" | tee -a "${LOGFILE}"
return
}
serviceMain ()
{
#
# Usage: serviceMain
# Called By: MAIN
#
logAdd "[INFO] === SERVICE START ==="
#
logAdd "[INFO] Waiting to discover 2.4 GHz radio interface ..."
while (true); do
RADIO_ATH9K="$(iw dev|grep "Interface\|channel\|type"|grep -B 2 'channel.*24..'|grep -B 1 'AP'|tail -n 2|grep Interface|awk '{print $2}')"
if [ ! -z "${RADIO_ATH9K}" ]; then
break
fi
sleep 10
done
#
logAdd "[INFO] Setup iw_event_scan_trigger on interface [${RADIO_ATH9K}]"
#
dt=0
iw event -t -f | while read line; do
if $(echo -n "${line}" | grep -q "${RADIO_ATH9K}.*: unknown event 60"); then
#
# Check if the last scan was more than 10 seconds ago.
if [ ${line%%.*} -gt ${dt} ] ; then
echo "$(date +%Y-%m-%d_%H-%M-%S): ${dt} ${RADIO_ATH9K} scan ..."
iw dev ${RADIO_ATH9K} scan trigger freq 2447 flush >/dev/null 2>&1
dt=$(($(date +%s)+10))
fi
fi
done
#
return 0
}
# ---------------------------------------------------
# -------------- END OF FUNCTION BLOCK --------------
# ---------------------------------------------------
#
# Check shell
if [ ! -n "${BASH_VERSION}" ]; then
logAdd "[ERROR] Wrong shell environment, please run with bash."
exit 99
fi
#
trap "" SIGHUP
trap "trap - SIGTERM && kill -- -$$" SIGINT SIGTERM EXIT
#
if [ "${1}" = "install" ]; then
if ( grep -q "$(which bash) $(readlink -f "${0}") start &" "/etc/rc.local"); then
echo "[INFO] Script already present in startup."
exit 0
fi
sed -i "\~^exit 0~i $(which bash) $(readlink -f "${0}") start &\n" "/etc/rc.local"
echo "[INFO] Script successfully added to startup."
exit 0
elif [ "${1}" = "livelog" ]; then
tail -f "${LOGFILE}"
exit 0
elif [ "${1}" = "start" ]; then
serviceMain &
#
# Wait for kill -INT.
wait
exit 0
elif [ "${1}" = "stop" ]; then
ps w | grep -v grep | grep "$(basename -- $(which bash)) .*$(basename -- ${0}) start" | sed 's/ \+/|/g' | sed 's/^|//' | cut -d '|' -f 1 | grep -v "^$$" | while read pidhandle; do
echo "[INFO] Terminating old service instance [${pidhandle}] ..."
kill -INT "${pidhandle}" 2>/dev/null
kill "${pidhandle}" 2>/dev/null
done
#
# Check if parts of the service are still running.
if [ "$(ps w | grep -v grep | grep "$(basename -- $(which bash)) .*$(basename -- ${0}) start" | sed 's/ \+/|/g' | sed 's/^|//' | cut -d '|' -f 1 | grep -v "^$$" | wc -l)" -gt 0 ]; then
logAdd "[ERROR] === SERVICE FAILED TO STOP ==="
ps w | grep "iw event\|${SCRIPT_NAME}" | grep -v grep
exit 99
fi
#
killall iw 2>/dev/null
#
logAdd "[INFO] === SERVICE STOPPED ==="
exit 0
fi
#
logAdd "[ERROR] Parameter #1 missing."
logAdd "[INFO] Usage: bash ${SCRIPT_FULLFN} {install|livelog|start|stop}"
exit 99
To see whether the watchdog triggers the scans appropriately, run "iw event -t -f" in a second SSH shell in parallel. If the scan was triggered by the watchdog script, each "unknown event 60" line should be followed by "unknown event 33/34" lines.
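If you would rather check a captured log than watch the terminal, the same test can be sketched with awk. This is only an illustration: the function name check_scans and the sample lines are made up, and it assumes the captured lines contain the literal texts "unknown event 60" and "unknown event 33" as described above.

```shell
#!/bin/sh
# check_scans: reads captured "iw event -t -f" output on stdin and reports,
# for every "unknown event 60", whether an "unknown event 33" showed up
# before the next event 60 (or before the end of the log).
check_scans() {
    awk '
        /unknown event 60/ {
            if (pending) print "no scan after " stamp
            pending = 1
            stamp = $1
            sub(/:$/, "", stamp)    # drop the trailing colon of the timestamp
            next
        }
        /unknown event 33/ {
            if (pending) print "scan after " stamp
            pending = 0
        }
        END { if (pending) print "no scan after " stamp }
    '
}

# Hypothetical capture: the second event 60 falls into the 10 s throttle.
printf '%s\n' \
    '1000.1: wlan0 (phy #0): unknown event 60' \
    '1000.2: wlan0 (phy #0): unknown event 33' \
    '1001.9: wlan0 (phy #0): unknown event 34' \
    '1002.0: wlan0 (phy #0): unknown event 60' \
    '1020.0: wlan0 (phy #0): unknown event 60' \
    '1020.1: wlan0 (phy #0): unknown event 33' | check_scans
```

With the 10 second throttle active, "no scan after" entries for events that arrive shortly after a scan are expected and not a sign of a problem.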
@anon2180415 Thanks for contacting me to update my original ath10k-ct-watchdog script and to put both into my openwrt "useful scripts" repo for the TP-Link Archer C7v2|5.
@sammo Perfect, I've integrated your snippet into the script and it works well. It now throttles scans if the last scan was less than 10 seconds ago.
We've now agreed to put the recent version of the script here:
I would swap these 2 lines around. The echo | grep pipeline forks and runs an external command for every event, which uses up resources, while the timestamp check needs only shell built-ins.
if $(echo -n "${line}" | grep -q "${RADIO_ATH9K}.*: unknown event 60"); then
#
# Check if the last scan was more than 10 seconds ago.
if [ ${line%%.*} -gt ${dt} ] ; then
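A minimal sketch of the swapped order, going one step further and replacing grep with the shell's built-in case pattern match. The function name should_scan and its three arguments are made up for illustration; this is not the version in the repo.

```shell
#!/bin/sh
# should_scan LINE IFACE NEXT_ALLOWED
# Returns 0 when LINE is an "unknown event 60" for IFACE whose timestamp is
# newer than NEXT_ALLOWED, and 1 otherwise. No external command is called.
should_scan() {
    line=$1 iface=$2 next_allowed=$3
    ts=${line%%.*}                       # integer epoch seconds of the event

    # Cheap integer test first; a non-numeric prefix also ends up here.
    [ "$ts" -gt "$next_allowed" ] 2>/dev/null || return 1

    # Built-in pattern match instead of echo | grep.
    case $line in
        *"$iface"*": unknown event 60"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example with a hypothetical event line and interface name:
if should_scan '1011.000000: wlan0 (phy #0): unknown event 60' wlan0 1010; then
    echo "would trigger a scan"
fi
```

Inside the watchdog loop this would replace both if tests with a single `if should_scan "${line}" "${RADIO_ATH9K}" "${dt}"; then`.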
While I've made a Telegram notifier that pushes me a message whenever one of my Archers outputs a syslog line indicating that an unexpected reboot occurred (procd -- init complete --), I've discovered something interesting over the last few days:
All Archers ran fine and stable for 9 days without sudden reboots.
One unit (configured very similarly to the others) often has sudden reboots. After 8 days it rebooted and my script reported this immediately (from the syslog server to Telegram).
The "problematic" unit rebooted as soon as I turned on my old TV (in the same room, short 2.4 GHz WiFi distance) and used it to view video from a miniDLNA server.
So I wonder whether those "sudden reboots" have something to do with multicast/broadcast announcements of UPnP?!
I put the workaround in my rc.local:
iw dev wlan_2g scan trigger freq 2437 flush >/dev/null 2>&1
It kind of works: my WiFi connection doesn't stop anymore, and YouTube, downloads and browsing work fine. But when I'm using a VPN or certain other apps, they crash or disconnect every time, probably because of the iw scan.
I think I have identified the cause of the ath9k slowness. I have a GL AR300m running OpenWRT 21.02, which has only a single 2.4 GHz radio. I ran iperf, and the slowness shows up in under 3 hours in AP mode. This rate increases in repeater mode, with the repeater losing beacons and the WiFi stack reconnecting.
I think the problem is the txq buffer size.
Check which physical radio is your 2.4 GHz one; it's either phy0 or phy1.
Determine your 2.4 GHz phy: iwinfo|grep -m 1 -A 10 '.*2\.4'|grep -Eo 'phy[0-9]+$'
Check the memory limit and double it. Mine was originally set to 4194304.
iw phy phy0 get txq
iw phy phy0 set txq memory_limit 8388608
Don't forget to disable the scan workaround if you have it running.
I can confirm the below setting on a GL AR300 with 128M memory has resolved my issues.
The above setting was tested for AP/STA. When testing as a dumb AP, it dies a lot quicker,
so we know txq is part of the equation. I've also doubled the packet limit to 16384 and am currently testing.
@sammo Which value do you suggest optimizing here in order to stabilize the Archer C7v2 ath9k 2.4 GHz WiFi without using the ath9k-watchdog.sh script? Ahhh, silly me, phy1 is the 2.4 GHz radio, and yes, I'll give the 8 M memory limit a try now :-). Thank you.
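The doubling step can be sketched as a small dry-run helper that only prints the command to run. The function name double_txq_cmd is made up for illustration, and you pass in the current limit yourself after reading it from "iw phy phy0 get txq"; the helper does not parse that output.

```shell
#!/bin/sh
# double_txq_cmd PHY CURRENT_LIMIT
# Prints the "iw" command that sets the txq memory limit of PHY to twice
# CURRENT_LIMIT. Dry run only: nothing is changed on the radio.
double_txq_cmd() {
    phy=$1 current=$2
    case $current in
        ''|*[!0-9]*)
            echo "current limit must be a non-negative integer" >&2
            return 1
            ;;
    esac
    echo "iw phy $phy set txq memory_limit $((current * 2))"
}

# With the value quoted above (4194304 on phy0):
double_txq_cmd phy0 4194304
```

Printing the command first makes it easy to double-check the phy name before changing anything.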
ToDo for me:
iw phy phy1 set txq memory_limit 8388608
and I'll test if it is enough to set it once after startup or if it must be regularly refreshed e.g. when the adapter restarts.
To see how long the change sticks:
while(true);do clear; iw phy phy1 get txq; sleep 1; done;
UPDATE: To make the setting stick, put it in /etc/rc.local; at the top, for example, is fine.
# Put your custom commands here that should be executed once
# the system init finished. By default this file does nothing.
/usr/sbin/iw phy phy1 set txq memory_limit 8388608
exit 0