Mmmeh, 8.8.8.8 and also gstatic.com are reasonably okay, but in a pinch they will deprioritize ICMP echo generation and do strange things with ping in general (like responding to large ICMP echo requests with short responses). The bigger issue really is that unless one can guarantee that congestion only hits in the downstream, two one-way delay measurements are much better than a single round-trip time, but almost no server will respond to ICMP type 13 (timestamp) requests with a type 14 response... That's why using irtt against a server under one's own control seems like a more robust design to me.
Then again, if the goal is "just" an improvement over the status quo, then sure, 8.8.8.8 and friends should get you started.
Statistics my friend, statistics....
My idea is to select 5 unconnected, different reflectors, establish a baseline for each, and then at each evaluation see how much each reflector's ping time has increased over its baseline and take the median increment. It seems unlikely that they'd all deprioritize or otherwise futz with the response at once, and the median is extremely robust to outliers. If the median response increases by more than a few ms then you're experiencing congestion.
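To make the "median increment over baseline" idea concrete, here is a minimal POSIX sh sketch. The function names `median` and `median_increase` and the `baseline:current` argument format are made up for illustration; nothing here comes from an existing tool.

```sh
#!/bin/sh
# Hypothetical helper: print the median of the numbers given as arguments.
median() {
    [ "$#" -gt 0 ] || return 1
    printf '%s\n' "$@" | sort -n | awk '
        { v[NR] = $1 }
        END {
            if (NR % 2) m = v[(NR + 1) / 2]
            else        m = (v[NR / 2] + v[NR / 2 + 1]) / 2
            print m
        }'
}

# Hypothetical helper: each argument is "baseline:current" (RTT in ms) for
# one reflector; print the median of the per-reflector increases.
median_increase() {
    diffs=""
    for pair in "$@"; do
        diffs="$diffs $(awk -v b="${pair%%:*}" -v c="${pair#*:}" \
            'BEGIN { print c - b }')"
    done
    # Word splitting of $diffs is intended here.
    median $diffs
}
```

With five reflectors where one misbehaves, `median_increase 12:13 20:21 9:60 15:15 30:32` prints `1`: the single 51 ms outlier does not move the median.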
Of course, how much value you're going to get from dropping your shaper speed when it's the upstream that's congested is anyone's guess. I would expect you'd mainly get some benefit if you are yourself using a fair amount of bandwidth, so that by reducing your speed you reduce the upstream congestion. This will only work if the other users don't just fill in the bandwidth you freed up.
Also, rather than rewrite the sqm config and reload, I'm thinking to just directly replace the IFB qdisc, which will be nicer to the flash.
So I'm thinking something like this (pseudocode)
pings = establish_baselines()
forever
    ping all the things
    calculate differences from baseline
    calculate m = median(differences)
    if (m > 10ms)
        x = 1 - rand()*0.2
        bw = bw * x
        bw = max(bw, myminbw)
        replace shaper with new bw
    else if (m < 3ms)
        x = 1 + rand()*0.25
        bw = bw * x
        bw = min(bw, mymaxbw)
        replace shaper with new bw
    endif
    sleep 10
end
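The decision step of that loop can be sketched as a small sh function. This is only an illustration under assumptions: `next_bw` and its argument order are invented here, and the random number is passed in by the caller so the function stays deterministic and easy to test.

```sh
#!/bin/sh
# Sketch of the bandwidth decision from the pseudocode.
# Arguments (all names are illustrative, not from any existing tool):
#   $1 current bandwidth (kbit/s)   $2 median delay increase (ms)
#   $3 minimum bandwidth            $4 maximum bandwidth
#   $5 random number in [0,1), supplied by the caller
next_bw() {
    awk -v bw="$1" -v m="$2" -v lo="$3" -v hi="$4" -v r="$5" 'BEGIN {
        if (m > 10)     bw = bw * (1 - r * 0.2)    # congested: cut by up to 20%
        else if (m < 3) bw = bw * (1 + r * 0.25)   # quiet: raise by up to 25%
        if (bw < lo) bw = lo                       # respect the lower bound
        if (bw > hi) bw = hi                       # respect the upper bound
        printf "%d\n", bw + 0.5                    # round to whole kbit/s
    }'
}
```

For example, `next_bw 10000 15 3000 10000 0.5` prints `9000`, while a median increase between 3 and 10 ms leaves the bandwidth unchanged.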
You can change the settings of a running cake instance with tc qdisc change.
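As a rough sketch of what that looks like in practice, the helper below only prints the command instead of running it, so it can be inspected first. The name `cake_bw_cmd` is hypothetical; the command form is the one used by the shell script later in this thread, and the device names in the example are placeholders.

```sh
#!/bin/sh
# Hypothetical helper: print (not run) the tc command that changes the
# bandwidth of the cake qdisc at the root of device $1 to $2 kbit/s.
cake_bw_cmd() {
    printf 'tc qdisc change root dev %s cake bandwidth %skbit\n' "$1" "$2"
}

# Example (placeholder device names): egress on the WAN device, ingress on
# the matching IFB device.
# cake_bw_cmd eth1 800
# cake_bw_cmd ifb4eth1 6000
```

Piping the output to `sh` would apply it; printing first is just a cautious default for experimenting.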
Using multiple independent reflectors is a good idea; 1.1.1.1/Cloudflare, 8.8.8.8/Google, and 9.9.9.9/Quad9 might be a decent mix. The median will still work, as will voting. That still leaves the downstream/upstream question. And the other question you raised, whether backing off is actually a productive strategy, should be relatively easy to test given an automated mechanism. The final point is policy: how deep into the network does one want to try to fix issues from one's own router? Sure, traffic shaping is great to control one's own internet access link, and might still be decent for, say, a DSLAM's/OLT's/CMTS's upstream, but it is probably not a good solution for a selective peering overload between one's own ISP and any other AS....
By all means go and test; gargoyle has something like that. All my limited testing with ICMP reflectors on the internet made me turn away in disgust, but in retrospect I was not aiming for a decent good-enough solution but was trying to measure a 0.250 millisecond delay increase...
Yeah, if you need that resolution you'll want your own custom reflector. But if you want to measure say a median of 20ms increase... I think this should be very doable.
After all the goal is to accept the occasional 5-15ms delays but avoid the
I think we may be able to repurpose some of the code I wrote for router performance monitoring... there wasn't much in the way of interest on that front, but it had a similar flavor..
For example it already has some code to ping stuff. We probably don't need a sqlite database, but it could just keep a collection of recent stats in memory.
Lua is really just not quite a real language... sigh
I'd love to do this in Erlang, which is all about high availability and fault tolerance and concurrency, and stuff like that, but it's not exactly lightweight, requiring 4.8MB of flash space. It would make things easy though, you just spawn off an erlang thread to read the latency from each ping responder, then have a loop that looks at the current latency compared to the historical latency and spawns off a thread to reduce the SQM speed when the increment is sufficiently high.
@Bndwdthseekr, since you're running extroot, are you up for an Erlang solution? I've been looking for an excuse to try it out anyway.
I'm just going to put some Erlang based design notes here since I am not going to be able to immediately start coding it, but don't want to lose the ideas:
There are two basic processes involved:
- Several processes that each ping a different reflector and detect a delay condition... Each is basically a loop that pings, reads the time, remembers the shortest say 5 times, and detects when the current time is more than say 20ms above the 5th shortest time for this reflector. When that happens it sends a "delayed" message to the adjuster thread...
- The adjuster thread. It waits for delayed messages. It counts delayed messages in a time window (say 30 seconds) and if it gets N or more of them in that window it cuts the bandwidth, multiplying it by a random factor between 0.85 and 0.95... If it receives fewer than N-2 delayed messages in its time window it increases the bandwidth, multiplying it by a factor between 1.05 and 1.15, respecting both a minimum and maximum bound. If it gets N-2 or N-1 it does nothing. I'm thinking 5 reflectors, N=3 but you could also do something like 7 reflectors N=4.
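The two decisions described above (is this reflector delayed, and what should the adjuster do with the count) can be sketched in plain sh, independent of Erlang. The names `is_delayed` and `adjust_decision` are invented for this sketch, and the 20 ms margin and the "5th shortest" rule are simply the example values from the notes.

```sh
#!/bin/sh
# Hypothetical helper: $1 is the current RTT in ms, the remaining arguments
# are the remembered shortest RTTs for this reflector. Succeeds (exit 0)
# when the current RTT is more than 20 ms above the 5th shortest one.
is_delayed() {
    cur="$1"; shift
    ref="$(printf '%s\n' "$@" | sort -n | sed -n '5p')"
    [ -n "$ref" ] || return 1   # fewer than 5 samples: not enough history
    awk -v c="$cur" -v r="$ref" 'BEGIN { exit !(c > r + 20) }'
}

# Hypothetical helper: $1 is the number of "delayed" messages seen in the
# time window, $2 is the threshold N. Prints the adjuster's decision.
adjust_decision() {
    if [ "$1" -ge "$2" ]; then
        echo decrease            # N or more delayed reflectors
    elif [ "$1" -lt "$(( $2 - 2 ))" ]; then
        echo increase            # fewer than N-2 delayed reflectors
    else
        echo hold                # N-2 or N-1: do nothing
    fi
}
```

With N=3, a count of 0 gives `increase`, 1 or 2 gives `hold`, and 3 or more gives `decrease`, which leaves a dead band so the shaper does not flap on a single noisy reflector.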
Required functions:
- pingstats(N,A) should ping address A N times and return a list of times in ms, basically requires calling os:cmd and parsing out the results.
- setbw(I,B) should set the bandwidth on the cake instance on interface I to value B in kbps, by calling os:cmd("tc qdisc change ...")
- parsesqm(F) This should basically parse out the sqm parameters from the file F (normally /etc/config/sqm) so it knows the upper bound and extra options in use.
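For the pingstats part, the parsing step is the same whatever language ends up calling ping. A minimal sh sketch, assuming the usual `time=12.3 ms` field in each reply line (the names `parse_ping_times` and `pingstats` here are illustrative shell stand-ins, not the Erlang functions):

```sh
#!/bin/sh
# Hypothetical helper: read ping output on stdin and print one round-trip
# time (in ms) per reply line, taken from the "time=... ms" field.
parse_ping_times() {
    sed -n 's/.*time=\([0-9.]*\) *ms.*/\1/p'
}

# Hypothetical shell counterpart of pingstats(N,A): ping address $2 $1 times
# and print the individual times, one per line.
pingstats() {
    ping -c "$1" "$2" 2> /dev/null | parse_ping_times
}
```

Summary lines such as the min/avg/max line do not contain `time=`, so they are skipped and only the per-reply times come out.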
Seems doable in a reasonable time.
Yes, but I find it quite interesting to use the limited busybox shell.
Here is my try to implement a ping monitor:
#!/bin/sh
#title : SQM Adaptive Rate
#purpose : Serve the cake
#author : shm0
#date : 01/03/2020
#version : 0.5
LC_ALL=C
readonly base_dir="/var/shm/sqm_adaptive_rate"
readonly ping_monitor_dir="${base_dir}/ping_monitor"
readonly bandwidth_monitor_dir="${base_dir}/bandwidth_monitor"
init() {
# Initialize File and Folder Structure
rm -rf "${base_dir}" 2> /dev/null \
|| {
printf "%s\n" "Error: Failed to remove ${base_dir}!" >&2
exit 1
}
mkdir -p "${base_dir}" 2> /dev/null \
|| {
printf "%s\n" "Error: Failed to create ${base_dir}!" >&2
exit 1
}
mkdir -p "${ping_monitor_dir}" 2> /dev/null \
|| {
printf "%s\n" "Error: Failed to create ${ping_monitor_dir}!" >&2
exit 1
}
mkdir -p "${bandwidth_monitor_dir}" 2> /dev/null \
|| {
printf "%s\n" "Error: Failed to create ${bandwidth_monitor_dir}!" >&2
exit 1
}
return 0
}
deinit() {
# Clean up on exit
rm -rf "${base_dir}" 2> /dev/null \
|| {
printf "%s\n" "Error: Failed to remove ${base_dir}!" >&2
exit 1
}
exit 0
}
ping_monitor() {
# Parameters
# $1: Target to ping
# $2: Amount of ping samples to keep
# $3: Amount of time between pings
local ping_target="${1}"
local ping_samples="${2}"
local ping_interval="${3}"
# Local Variables
local ping_sample_file="${ping_monitor_dir}/${ping_target}.spl"
local ping_sample_list
local ping_time
local sample_count
local i
# FixMe
# Print error, even when running in the background
# Implement some better signal feedback to main function?
touch "${ping_sample_file}" 2> /dev/null \
|| {
printf "%s\n" "Error: Failed to create ${ping_sample_file}!" >&2
kill ${$}
exit 1
}
# Check if the sample file has been Initialized
# If not fill with 0 x ping samples num separated by whitespace
# So awk can process it
if [ ! -s "${ping_sample_file}" ]; then
i="1"
ping_sample_list="0"
while [ "${i}" -lt "${ping_samples}" ]; do
ping_sample_list="$(printf "%s %s" "${ping_sample_list}" "0")"
i="$((i + 1))"
done
printf "%s\n" "${ping_sample_list}" > "${ping_sample_file}"
fi
sample_count="1"
while [ 1 = 1 ]; do
ping_time="$(ping -q -c1 -W1 -A -s 56 -i 1 "${ping_target}" 2> /dev/null \
| grep -Eo '\/[0-9]*\.[0-9]*\/' \
| tr -d '/')"
sleep "${ping_interval}"
if [ -n "${ping_time}" ]; then
ping_time="$(printf "%.0f" "${ping_time}")"
else
ping_time="999"
fi
ping_sample_list="$(awk '{ $'"${sample_count}"' = '"${ping_time}"';
print $0 }' "${ping_sample_file}")"
printf "%s\n" "${ping_sample_list}" > "${ping_sample_file}"
if [ "${sample_count}" -ge "${ping_samples}" ]; then
sample_count="1"
else
sample_count="$((sample_count + 1))"
fi
done
return 0
}
bandwidth_monitor() {
# Parameters
# $1: Monitor Rates on this interface
local interface="${1}"
# Local Variables
local rx_rate_file="${bandwidth_monitor_dir}/${interface}.rx"
local tx_rate_file="${bandwidth_monitor_dir}/${interface}.tx"
local RXPREV="-1"
local TXPREV="-1"
local RX
local TX
local BWRX
local BWTX
# FixMe
# Print error, even when running in the background
# Implement some better signal feedback to main function?
touch "${rx_rate_file}" 2> /dev/null \
|| {
printf "%s\n" "Error: Failed to create ${rx_rate_file}!" >&2
kill ${$}
exit 1
}
touch "${tx_rate_file}" 2> /dev/null \
|| {
printf "%s\n" "Error: Failed to create ${tx_rate_file}!" >&2
kill ${$}
exit 1
}
while [ 1 = 1 ]; do
RX="$(cat /sys/class/net/"${interface}"/statistics/rx_bytes)"
TX="$(cat /sys/class/net/"${interface}"/statistics/tx_bytes)"
if [ "${RXPREV}" -ne -1 ]; then
BWRX="$(((RX - RXPREV) * 8 / 1000))"
printf "%d\n" "${BWRX}" > "${rx_rate_file}"
fi
if [ "${TXPREV}" -ne -1 ]; then
BWTX="$(((TX - TXPREV) * 8 / 1000))"
printf "%d\n" "${BWTX}" > "${tx_rate_file}"
fi
RXPREV="${RX}"
TXPREV="${TX}"
sleep 1
done
return 0
}
get_ping_samples() {
# Parameters
local ping_target="${1}"
# Local Variables
local ping_sample_file="${ping_monitor_dir}/${ping_target}.spl"
local ping_samples
if ping_samples="$(cat "${ping_sample_file}" 2> /dev/null)" \
&& [ -n "${ping_samples}" ]; then
printf "%s" "${ping_samples}"
return 0
else
return 1
fi
}
get_avg_median_ping() {
# Parameters
# $1: Ping target whose sample file should be evaluated
local ping_target="${1}"
# Local Variables
local ping_sample_file="${ping_monitor_dir}/${ping_target}.spl"
local ping_sample_list
local ping_median_list
local ping_sample_num
local ping_avg
if ! ping_sample_list="$(cat "${ping_sample_file}" 2> /dev/null)"; then
return 1
fi
ping_sample_num="$(printf "%s\n" "${ping_sample_list}" \
| wc -w)"
ping_median_list="$(printf "%s\n" "${ping_sample_list}" \
| tr " " "\n" \
| sort -nr \
| head -n -"$(((ping_sample_num + 10) / 10))" \
| tail -n +"$((((ping_sample_num + 10) / 10) + 1))" \
| tr "\n" " ")"
# As long as the ping sample file is not completely filled
# use normal avg ping calculation
if printf "%s\n" "${ping_median_list}" | grep -Eoq '\b0\b'; then
ping_median_list="${ping_sample_list}"
fi
ping_avg="$(printf "%s\n" "${ping_median_list}" \
| awk '
{
avg = 0; skip = 0;
for (i = 1; i <= NF; i++) {
if ($i == 0)
skip += 1;
avg += $i };
if (avg == 0)
avg = 0;
else
avg /= (NF - skip);
print avg
}
')"
if [ -n "${ping_avg}" ]; then
printf "%.0f" "${ping_avg}"
return 0
else
return 1
fi
}
get_new_ping_target() {
# Parameters
local ping_target_curr="${1}"
local ping_target_list="${2}"
local blacklist_duration="${3}"
# Local Variables
local blacklist_target_file="${ping_monitor_dir}/${ping_target_curr}.blacklist"
local blacklist_target_name
local blacklist_target_timestamp
local ping_sample_file
local current_time
local target
touch "${blacklist_target_file}" 2> /dev/null \
|| {
printf "%s\n" "Error: Failed to create ${blacklist_target_file}!" >&2
exit 1
}
# Check if the current ping target can be removed from blacklist
current_time="$(date '+%s')"
blacklist_target_name="$(basename "${blacklist_target_file}" .blacklist)"
ping_sample_file="${ping_monitor_dir}/${blacklist_target_name}.spl"
if blacklist_target_timestamp="$(cat "${blacklist_target_file}" 2> /dev/null)" \
&& [ -n "${blacklist_target_timestamp}" ] \
&& [ "${blacklist_target_timestamp}" -lt "${current_time}" ]; then
rm -rf "${blacklist_target_file}"
rm -rf "${ping_sample_file}"
else
printf "%d\n" "$((current_time + blacklist_duration))" > "${blacklist_target_file}"
fi
# Check if we can re-add some ping_targets
find "${ping_monitor_dir}" ! -name "$(printf "*\n*")" -name '*.blacklist' -maxdepth 1 > "${ping_monitor_dir}/blacklist.tmp"
while IFS= read -r target; do
blacklist_target_file="${target}"
blacklist_target_name="$(basename "${target}" .blacklist)"
blacklist_target_timestamp="$(cat "${target}")"
ping_sample_file="${ping_monitor_dir}/${blacklist_target_name}.spl"
if [ "${blacklist_target_timestamp}" -gt "${current_time}" ]; then
# Escape the dots in IPs/Domains to make the second sed properly work
blacklist_target_name="$(printf "%s\n" "${blacklist_target_name}" | sed 's/\./\\\./g')"
ping_target_list="$(printf "%s\n" "${ping_target_list}" | sed 's/'"${blacklist_target_name}"'//g')"
else
rm -f "${blacklist_target_file}" 2> /dev/null
rm -f "${ping_sample_file}" 2> /dev/null
fi
done < "${ping_monitor_dir}/blacklist.tmp"
rm -f "${ping_monitor_dir}/blacklist.tmp" 2> /dev/null
# Clean up string, just to be sure
# Replace 2 or more whitespace characters with 1
# Remove leading and trailing white space characters
ping_target_list="$(printf "%s\n" "${ping_target_list}" | sed -E -e 's/\s{2,}/ /g' -e 's/(^\ |\ $)//g')"
if [ -n "${ping_target_list}" ]; then
printf "%s" "${ping_target_list}"
return 0
else
return 1
fi
}
get_rx_current_rate() {
# Parameters
local interface="${1}"
# Local Variables
local rx_rate_file="${bandwidth_monitor_dir}/${interface}.rx"
local rx_rate
if rx_rate="$(cat "${rx_rate_file}" 2> /dev/null)" \
&& [ -n "${rx_rate}" ]; then
printf "%d" "${rx_rate}"
return 0
else
return 1
fi
}
get_tx_current_rate() {
# Parameters
local interface="${1}"
# Local Variables
local tx_rate_file="${bandwidth_monitor_dir}/${interface}.tx"
local tx_rate
if tx_rate="$(cat "${tx_rate_file}" 2> /dev/null)" \
&& [ -n "${tx_rate}" ]; then
printf "%d" "${tx_rate}"
return 0
else
return 1
fi
}
get_rx_max_rate() {
# Parameters
# $1: Get max rx rate from sqm config for this interface
local interface="${1}"
# Local Variables
local rx_max_rate
if rx_max_rate="$(uci get sqm."${interface}".download 2> /dev/null)" \
&& [ -n "${rx_max_rate}" ]; then
printf "%d" "${rx_max_rate}"
return 0
else
return 1
fi
}
get_tx_max_rate() {
# Parameters
# $1: Get max tx rate from sqm config for this interface
local interface="${1}"
# Local Variables
local tx_max_rate
if tx_max_rate="$(uci get sqm."${interface}".upload 2> /dev/null)" \
&& [ -n "${tx_max_rate}" ]; then
printf "%d" "${tx_max_rate}"
return 0
else
return 1
fi
}
usage() {
echo "Usage:"
echo "Nothing here yet!"
exit 0
}
main() {
# Args
# -i|--interface Interface: to adjust Rates on (as in sqm config)
# -r|--rx-min-threshold: Minimum RX Rate in Percent, Default: 30
# -t|--tx-min-threshold: Minimum TX Rate in Percent, Default: 30
# -f|--reduce-factor: Reduce Factor in Percent, amount will be subtracted from current Rates, Default: 20
# -l|--ping-limit: Maximum Ping Limit in ms, above this Limit Rates will be reduced, Default: auto
# -z|--ping-target: Target to Ping, Default: 1.1.1.1
# -s|--ping-samples: Amount of Ping Samples to keep, Default: 5
# -c|--ping-interval: Amount of time in seconds between pings, Default: 1
# -a|--ping-fail-count: Max failed pings before a ping target gets blacklisted
# -d|--cooldown-time: Amount of time to keep new rates, Default: 3600
# -b|--blacklist-duration: Amount of time to black list a failing ping target, Default 900 sec
# -h|--help: Prints help text
local interface=""
local rx_min_rate_threshold="30"
local tx_min_rate_threshold="30"
local reduce_factor="20"
local ping_limit="auto"
local ping_target_list="1.1.1.1 8.8.8.8 9.9.9.9"
local ping_samples="5"
local ping_interval="1"
local ping_max_ping_fail_cnt="3"
local ping_target_blacklist_duration="900"
local cooldown_time="3600"
# Local Variables
local ping_mode
local ping_mode_auto_offset
local ping_avg
local ping_sample_list
local ping_target_list_tmp
local rx_current_rate
local tx_current_rate
local rx_current_max_rate
local tx_current_max_rate
local rx_min_rate
local tx_min_rate
local rx_max_rate
local tx_max_rate
local cooldown_counter
local sleep_counter
local sleep_time
local ping_monitor_pid
local status
while [ -n "${1}" ]; do
case "${1}" in
-i | --interface)
shift
interface="${1}"
;;
-r | --rx-min-threshold)
shift
rx_min_rate_threshold="${1}"
;;
-t | --tx-min-threshold)
shift
tx_min_rate_threshold="${1}"
;;
-f | --reduce-factor)
shift
reduce_factor="${1}"
;;
-l | --ping-limit)
shift
ping_limit="${1}"
;;
-z | --ping-target | --ping-targets)
shift
ping_target_list="${1}"
;;
-s | --ping-samples)
shift
ping_samples="${1}"
;;
-c | --ping-interval)
shift
ping_interval="${1}"
;;
-a | --ping-fail-count | --ping-fails)
shift
ping_max_ping_fail_cnt="${1}"
;;
-d | --cooldown-time)
shift
cooldown_time="${1}"
;;
-b | --blacklist-duration)
shift
# Store in the variable that is validated and used below
ping_target_blacklist_duration="${1}"
;;
-h | --help)
usage
exit 0
;;
--)
shift
break
;;
*)
printf "%s" "Unrecognized option ${1}." && {
usage
exit 1
}
;;
esac
shift
done
if ! uci -q get sqm."${interface}".interface > /dev/null 2>&1; then
printf "%s" "Invalid interface specified!"
exit 1
fi
if [ "$(uci -q get sqm."${interface}".enabled)" -eq "0" ]; then
printf "%s" "SQM not enabled!"
exit 1
fi
if [ "$(uci -q get sqm."${interface}".qdisc)" != "cake" ]; then
printf "%s" "SQM is not using cake qdisc!"
exit 1
fi
if ! echo "${rx_min_rate_threshold}" | grep -Eoq '^[0-9]+$' \
|| [ "${rx_min_rate_threshold}" -lt "1" ] \
|| [ "${rx_min_rate_threshold}" -gt "100" ]; then
printf "%s" "Invalid minimum RX rate specified! (1% - 100%)"
exit 64
fi
if ! echo "${tx_min_rate_threshold}" | grep -Eoq '^[0-9]+$' \
|| [ "${tx_min_rate_threshold}" -lt "1" ] \
|| [ "${tx_min_rate_threshold}" -gt "100" ]; then
printf "%s" "Invalid minimum TX rate specified! (1% - 100%)"
exit 64
fi
if ! echo "${reduce_factor}" | grep -Eoq '^[0-9]+$' \
|| [ "${reduce_factor}" -lt "1" ] \
|| [ "${reduce_factor}" -gt "99" ]; then
printf "%s" "Invalid reduce factor specified! (1% - 99%)"
exit 64
fi
if echo "${ping_limit}" | grep -Eoq '^auto.*$'; then
ping_mode="auto"
ping_mode_auto_offset="$({ echo "${ping_limit}" \
| grep -Eo '(\+|\-)[0-9]+'; } \
|| echo "+0")"
if [ "${ping_mode_auto_offset}" -gt "100" ]; then
ping_mode_auto_offset="+100"
fi
if [ "${ping_mode_auto_offset}" -lt "-100" ]; then
ping_mode_auto_offset="-100"
fi
ping_limit="0"
fi
if { ! echo "${ping_limit}" | grep -Eoq '(^[0-9]+$)' \
|| [ "${ping_limit}" -lt "1" ] \
|| [ "${ping_limit}" -gt "100" ]; } \
&& [ "${ping_mode}" != "auto" ]; then
printf "%s" "Invalid maximum ping limit specified! (1 - 100 ms)"
exit 64
fi
# Remove duplicate ping targets
ping_target_list="$(echo "${ping_target_list}" \
| tr ' ' '\n' \
| sort -u \
| tr '\n' ' ' \
| sed -E -e 's/\s{2,}/ /g' -e 's/(^\ |\ $)//g')"
for ping_target in ${ping_target_list}; do
if ! echo "${ping_target}" \
| grep -Eoq -e '\b(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])){3}\b' \
-e '([a-z0-9A-Z]\.)*[a-z0-9-]+\.([a-z0-9]{2,24})+(\.co\.([a-z0-9]{2,24})|\.([a-z0-9]{2,24}))*'; then
printf "%s" "No valid ping target specified!"
exit 64
fi
done
if ! echo "${ping_samples}" | grep -Eoq '^[0-9]+$' \
|| [ "${ping_samples}" -lt "5" ] \
|| [ "${ping_samples}" -gt "20" ]; then
printf "%s" "Invalid ping sample amount specified! (5 - 20)"
exit 64
fi
if ! echo "${ping_interval}" | grep -Eoq '^[0-9]+$' \
|| [ "${ping_interval}" -lt "1" ] \
|| [ "${ping_interval}" -gt "5" ]; then
printf "%s" "Invalid ping interval specified! (1 - 5 seconds)"
exit 64
fi
if ! echo "${ping_max_ping_fail_cnt}" | grep -Eoq '^[0-9]+$' \
|| [ "${ping_max_ping_fail_cnt}" -lt "1" ] \
|| [ "${ping_max_ping_fail_cnt}" -gt "1000" ]; then
printf "%s" "Invalid max ping failed ping amount specified! (1 - 1000)"
exit 64
fi
if ! echo "${ping_target_blacklist_duration}" | grep -Eoq '^[0-9]+$' \
|| [ "${ping_target_blacklist_duration}" -lt "30" ] \
|| [ "${ping_target_blacklist_duration}" -gt "86400" ]; then
printf "%s" "Invalid blacklist duration specified! (30 - 86400 seconds)"
exit 64
fi
if ! echo "${cooldown_time}" | grep -Eoq '^[0-9]+$' \
|| [ "${cooldown_time}" -lt "1" ] \
|| [ "${cooldown_time}" -gt "86400" ]; then
printf "%s" "Invalid cooldown time specified! (1 - 86400 seconds)"
exit 64
fi
init
bandwidth_monitor "${interface}" &
trap "deinit" INT TERM
trap "kill 0" EXIT
rx_max_rate="$(get_rx_max_rate "${interface}" || echo "0")"
tx_max_rate="$(get_tx_max_rate "${interface}" || echo "0")"
rx_min_rate="$((rx_max_rate * rx_min_rate_threshold / 100))"
tx_min_rate="$((tx_max_rate * tx_min_rate_threshold / 100))"
rx_current_max_rate="${rx_max_rate}"
tx_current_max_rate="${tx_max_rate}"
status="INIT"
ping_target="$(echo "${ping_target_list}" | awk '{ print $1 };')"
ping_monitor_pid="-1"
sleep_time="$(((ping_interval * ping_samples) + ping_interval))"
sleep_counter="0"
cooldown_counter="$((cooldown_time + 1))"
display_width="$(((ping_samples * 3) + ((ping_samples - 1) + 3)))"
while [ 1 = 1 ]; do
ping_sample_list="$(get_ping_samples "${ping_target}" || echo "0")"
ping_avg="$(get_avg_median_ping "${ping_target}" || echo "0")"
rx_current_rate="$(get_rx_current_rate "${interface}" || echo "0")"
tx_current_rate="$(get_tx_current_rate "${interface}" || echo "0")"
clear && printf '\e[3J'
printf "Current Rates : %"$((display_width - 13))"s / %7s kbit/s (RX/TX)\n" \
"${rx_current_rate}" "${tx_current_rate}"
printf "Current Maximum Rates: %"$((display_width - 13))"s / %7s kbit/s (RX/TX)\n" \
"${rx_current_max_rate}" "${tx_current_max_rate}"
printf "Maximum Rates : %"$((display_width - 13))"s / %7s kbit/s (RX/TX)\n" \
"${rx_max_rate}" "${tx_max_rate}"
printf "Minimum Rates : %"$((display_width - 13))"s / %7s kbit/s (RX/TX)\n" \
"${rx_min_rate}" "${tx_min_rate}"
printf "Reduction Rates : %"$((display_width - 13))"s / %7s kbit/s (RX/TX)\n" \
"$((rx_current_max_rate * reduce_factor / 100))" \
"$((tx_current_max_rate * reduce_factor / 100))"
printf "Reduction Factor : %"$((display_width - 3))"s %%\n" \
"${reduce_factor}"
printf "Ping Limit / Offset : %"$((display_width - 3))"s ms %11s\n" \
"${ping_limit}" "${ping_mode_auto_offset} ms"
printf "Ping (avg) / Interval: %"$((display_width - 3))"s ms %11s\n" \
"${ping_avg}" "${ping_interval} sec"
printf "Last Pings / Samples : %"$((display_width - 3))"s ms %11s\n" \
"${ping_sample_list}" "${ping_samples} spl"
printf "Ping Target : %"$((display_width - 3))"s\n" \
"${ping_target}"
printf "Ping Monitor : %"$((display_width - 3))"s\n" \
"$({ [ "${ping_monitor_pid}" -ne "-1" ] && echo "ACTIVE"; } \
|| echo "INACTIVE")"
printf "Cooldown : %"$((display_width + 1))"s\n" \
"$({ [ "${status}" = "IDLE" ] \
&& [ "${cooldown_counter}" -gt "0" ] \
&& [ "${cooldown_counter}" -lt "${cooldown_time}" ] \
&& echo "${cooldown_counter} sec"; } \
|| echo "INACTIVE ")"
printf "Status : %"$((display_width - 3))"s\n" \
"${status}"
# Check if ping target is down
if [ "$(echo "${ping_sample_list}" | grep -o '999' | wc -w)" -ge "${ping_max_ping_fail_cnt}" ]; then
status="DOWN"
fi
case "${status}" in
INIT)
if [ "${sleep_counter}" -ge "${sleep_time}" ]; then
if [ "${ping_mode}" = "auto" ]; then
ping_limit="$(((((ping_avg + 5) * 2) / 10 * 10) + ping_mode_auto_offset))"
fi
if [ "${ping_limit}" -le "$((ping_avg + 10))" ]; then
ping_limit="$((((ping_avg + 5) * 2) / 10 * 10))"
fi
if [ "${ping_limit}" -gt "100" ]; then
ping_limit="100"
fi
sleep_counter="-1"
status="IDLE"
else
if [ "${ping_monitor_pid}" -eq "-1" ]; then
rx_current_max_rate="${rx_min_rate}"
tx_current_max_rate="${tx_min_rate}"
tc qdisc change root dev ifb4"${interface}" cake bandwidth "${rx_current_max_rate}"kbit
tc qdisc change root dev "${interface}" cake bandwidth "${tx_current_max_rate}"kbit
sleep "${sleep_time}"
ping_monitor "${ping_target}" "${ping_samples}" "${ping_interval}" &
ping_monitor_pid="${!}"
fi
sleep_counter="$((sleep_counter + 1))"
fi
;;
IDLE)
if [ "${rx_current_max_rate}" -lt "${rx_max_rate}" ] \
|| [ "${tx_current_max_rate}" -lt "${tx_max_rate}" ]; then
if [ "${cooldown_counter}" -gt "0" ] \
&& [ "${cooldown_counter}" -le "${cooldown_time}" ]; then
cooldown_counter="$((cooldown_counter - 1))"
else
rx_current_max_rate="${rx_max_rate}"
tx_current_max_rate="${tx_max_rate}"
tc qdisc change root dev ifb4"${interface}" cake bandwidth "${rx_current_max_rate}"kbit
tc qdisc change root dev "${interface}" cake bandwidth "${tx_current_max_rate}"kbit
cooldown_counter="$((cooldown_time + 1))"
fi
fi
if [ "${ping_monitor_pid}" -ne "-1" ]; then
kill "${ping_monitor_pid}"
ping_monitor_pid="-1"
fi
if [ "${rx_current_rate}" -ge "${rx_min_rate}" ] \
|| [ "${tx_current_rate}" -ge "${tx_min_rate}" ]; then
status="MONITORING"
fi
;;
MONITORING)
if [ "${ping_monitor_pid}" -eq "-1" ]; then
ping_monitor "${ping_target}" "${ping_samples}" "${ping_interval}" &
ping_monitor_pid="${!}"
fi
if [ "${rx_current_rate}" -ge "${rx_min_rate}" ] \
|| [ "${tx_current_rate}" -ge "${tx_min_rate}" ]; then
if [ "${ping_avg}" -ge "${ping_limit}" ]; then
cooldown_counter="${cooldown_time}"
status="CONGESTION"
fi
else
status="IDLE"
fi
;;
CONGESTION)
if [ "${rx_current_rate}" -ge "${rx_min_rate}" ]; then
rx_current_max_rate="$((rx_current_max_rate - (rx_current_max_rate * reduce_factor / 100)))"
if [ "${rx_current_max_rate}" -le "${rx_min_rate}" ]; then
rx_current_max_rate="${rx_min_rate}"
fi
tc qdisc change root dev ifb4"${interface}" cake bandwidth "${rx_current_max_rate}"kbit
status="WAITING"
elif [ "${tx_current_rate}" -ge "${tx_min_rate}" ]; then
tx_current_max_rate="$((tx_current_max_rate - (tx_current_max_rate * reduce_factor / 100)))"
if [ "${tx_current_max_rate}" -le "${tx_min_rate}" ]; then
tx_current_max_rate="${tx_min_rate}"
fi
tc qdisc change root dev "${interface}" cake bandwidth "${tx_current_max_rate}"kbit
status="WAITING"
else
status="MONITORING"
fi
;;
WAITING)
if [ "${sleep_counter}" -ge "${sleep_time}" ]; then
sleep_counter="0"
status="MONITORING"
else
sleep_counter="$((sleep_counter + 1))"
fi
;;
DOWN)
if [ "${ping_monitor_pid}" -ne "-1" ]; then
kill "${ping_monitor_pid}"
ping_monitor_pid="-1"
fi
while ! ping_target_list_tmp="$(get_new_ping_target \
"${ping_target}" \
''"${ping_target_list}"'' \
"${ping_target_blacklist_duration}")"; do
wait_time="$((ping_target_blacklist_duration + 60))"
while [ "${wait_time}" -gt "0" ]; do
clear && printf '\e[3J'
printf "%s\n" "No more ping targets left!"
printf "%s\n" "Waiting ${wait_time} seconds!"
wait_time="$((wait_time - 1))"
sleep 1
done
done
ping_target="$(echo "${ping_target_list_tmp}" | awk '{ print $1 }')"
sleep_counter="0"
status="INIT"
;;
*)
status="IDLE"
;;
esac
sleep 1
done
exit 0
}
main "${@}"
Args
-i|--interface Interface: to adjust Rates on (as in sqm config)
-r|--rx-min-threshold: Minimum RX Rate in Percent, Default: 30
-t|--tx-min-threshold: Minimum TX Rate in Percent, Default: 30
-f|--reduce-factor: Reduce Factor in Percent, amount will be subtracted from current Rates, Default: 20
-l|--ping-limit: Maximum Ping Limit in ms, above this Limit Rates will be reduced, Default: auto
-z|--ping-target: Targets to Ping (space separated), Default: 1.1.1.1 8.8.8.8 9.9.9.9
-s|--ping-samples: Amount of Ping Samples to keep, Default: 5
-c|--ping-interval: Amount of time in seconds between pings, Default: 1
-d|--cooldown-time: Amount of time to keep new rates, Default: 3600
-h|--help: Prints help text
// Changelog 01.03.2020
- Added a simplistic round robin ping target fail over. When a ping target fails to respond 3 times it gets blacklisted for 15min. When no more ping targets are available the script will wait for the same amount as the blacklist time + 1 min.
- multiple ping targets can now be passed, use " " to enclose the ping targets.
For Example script.sh -z "1.1.1.1 8.8.8.8 9.9.9.9"
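The failover described in the changelog boils down to two small operations: drop the failing target from the candidate list, and decide whether a blacklist entry has expired. A minimal sketch follows; `remove_target` and `blacklist_active` are hypothetical names, not functions from the script above. It compares whole words, which avoids accidentally cutting a piece out of a longer address.

```sh
#!/bin/sh
# Hypothetical helper: print the target list ($2...) without the target $1.
remove_target() {
    bad="$1"; shift
    out=""
    for t in "$@"; do
        [ "$t" = "$bad" ] || out="${out:+$out }$t"
    done
    printf '%s\n' "$out"
}

# Hypothetical helper: a blacklist entry with expiry time $1 (epoch seconds)
# is still active while the current time $2 has not passed it.
blacklist_active() {
    [ "$1" -gt "$2" ]
}
```

For example, `remove_target 8.8.8.8 1.1.1.1 8.8.8.8 9.9.9.9` prints `1.1.1.1 9.9.9.9`, and the first word of that result would be the next target to ping.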
Well, I'm fiddling with Erlang, hopefully not while Rome burns. I've got it pinging things, and I'm hoping I can extract the ping times using regexes after a little more playing tomorrow morning.
Pretty sure once I have ping times I can get it to do the math... To get started I think I'll simply put the sqm command line into the erlang code manually, with placeholders for the parameters... easier than writing a parser for config files.
Sure, I have an insane amount of storage, so I'm for sure willing to give it a go.
I'm reading through this thread, and there's a lot of info for me to digest. I appreciate everyone's input so far! I'm going to be working tonight, so not sure if I will have the chance to do anything with this until later this morning, but I am still here, and will be checking back in as soon as I have time xD
Good! we'll try it out... I don't have easy access to a testing situation here, but we'll see what we can do.
I'm not ready yet, but I do have a couple pages of Erlang code... so far it can ping sites and collect the timing statistics. It can update the bandwidth, given the appropriate tc command, and there's some slightly twisted logic to monitor a number of sites and decide to trigger a bandwidth reduction/increase based on sufficient number of sites with delays.... But none of it is tested, and some of it isn't quite finished...
It's a good project though, so thanks for the opportunity to test it out. I've been thinking about this kind of feedback mechanism for a while now. There are plenty of people with this variable bandwidth issue.
Ok. I've pushed the first draft of a script... https://github.com/dlakelan/routerperf/blob/master/sqmfeedback.erl
here's how you should test it..
- download onto your router in /root
- make sure you have installed erlang
- edit the file near the bottom to change the interface names to the ones in use by your upstream and downstream interfaces, and the bandwidths (in Kbit/s); the three bandwidths for each are the lower, initial, and upper limits.
- Then, you'll need to compile and run it.
erlc sqmfeedback.erl
erl -noshell -s sqmfeedback main
Now it will print some logs and monitoring info about what it's up to. I can't promise it works or even that it doesn't break everything. I can say it hasn't got any malicious code, which you should be able to see fairly straightforwardly. Basically it spawns off a bunch of threads to ping a number of big internet sites using names (so it'll use ipv6 if available). There's also a thread that just waits to get info about delays, and if enough of the sites have a delay, it asks another thread to reduce the bandwidth. It'll increase the bandwidth if there are no delays. Every time it monkeys with your bandwidth it should print a message about the command it's running.
As I say, be prepared to have it cause breakage etc, and we can monkey with it to be more fault tolerant and easier to use later. Let's just see if we can get an erlang script to work and ping things and maybe even amazingly adjust bandwidth!
For me, it does run and ping and periodically check the delays.
@moeller0 you might also enjoy playing with this in your copious spare time
when I run: erlc sqmfeedback.erl this is what I get. I'm pretty sure I'm not missing any dependencies though
{"init terminating in do_boot",{'cannot get bootfile','no_dot_erlang.boot'}}
init terminating in do_boot ({cannot get bootfile,no_dot_erlang.boot})
I'm not sure if setting all 3 at the same value for egress will muck things up, but I almost never have any fluctuation in my upstream, which I'm very thankful for.
monitor_ifaces([{"tc qdisc change root dev pppoe-wan cake bandwidth ~BKbit diffserv4 dual-srchost overhead 34 ", 1024, 1024, 1024},
{"tc qdisc change root dev pppoe-wan cake bandwidth ~BKbit diffserv4 dual-dsthost nat overhead 34 ingress",1024,6944,6944}],
Here's a typical Friday night, first using a limit of 2080kbps, then switching to a limit of 1200kbps. Not sure what the rest of the fam is doing online, but after setting the limit lower, I was streaming 480p youtube, and it seems to be handling that along with everyone else's usage pretty well. While still not perfect, that shows the difference it makes, and I can live with that! xD
There are somewhere between five and fifteen devices actively using bandwidth 24/7, so this will be a good test once I figure out why I can't compile the script. I'm looking around to see if I can find what's causing the problem, but no luck yet.
There are a bunch of erlang packages... let me see, perhaps the standard one only offers the interpreter... yeah, it looks like you probably need to install erlang-compiler
EDIT: also, https://github.com/exercism/erlang/issues/113 suggests try installing erlang-tools
please note that this is my first erlang project (I started reading about the language a year ago but didn't have a project to do in it), so I'm learning how it works while we go along
I would say that it's ok to start things at the upper end, but I'd recommend not having the lower end also be at the upper end... give it a little wiggle room, so for the upload direction maybe try 800,1024,1024
I did try that, and still no luck. I did try a few other packages which I thought may be related to this issue as well, then I got ticked and installed everything erlang related on the repo... still nothing
ok, I'll get it running on an OpenWrt vm and see if I can figure out what's up. will have to be tomorrow though.
In the meantime, what happens if you run the erl command without the erlc compile step?
I didn't try that until now, here's what I got:
root@Main_SQM:~# erl sqmfeedback.erl
Erlang/OTP 21 [erts-10.0] [source] [smp:1:1] [ds:1:1:10] [async-threads:1]
Eshell V10.0 (abort with ^G)
1>
try that one instead
Hmm...
root@Main_SQM:~# erl -noshell -s sqmfeedback main
{"init terminating in do_boot",{undef,[{sqmfeedback,main,[],[]},{init,start_em,1,[]},{init,do_boot,3,[]}]}}
init terminating in do_boot ({undef,[{sqmfeedback,main,[],[]},{init,start_em,1,[]},{init,do_boot,3,[]}]})
Crash dump is being written to: erl_crash.dump...done
No rush man, I appreciate the help to begin with xD
I'll do some research on my own as well and see if I can come up with anything.
yeah it's weird. I'll have to try it in the VM, it all works fine on my desktop machine. thanks for being the guinea pig.
you could run erl by itself and then type c(sqmfeedback). (note the period) at the erlang shell but I'm guessing you'll get more of the same.
ahh, this does shed more light on the issue...
1> c(sqmfeedback).
sqmfeedback.erl:46: Warning: variable 'Time' is unused
sqmfeedback.erl:74: Warning: variable 'T' is unused
sqmfeedback.erl:80: Warning: variable 'SitePids' is unused
sqmfeedback.erl:83: Warning: variable 'TimerPid' is unused
{ok,sqmfeedback}
actually, it doesn't, that was a clean compile, those warnings are just there to help you catch bugs in case you were supposed to use a variable but didn't. in this case I just assigned some stuff but didn't use the assigned value... now I'm really confused because it compiled fine
My ISP pretty much gives me stable rates with decent peering/transit, so I am confident that would rarely/never trigger on my link. (And I do not consider this to be a reasonable solution for the fact that on my link the sync speed varies between retrains, as this is only happening every few days and speeds are reliable while the link is up). So yes I am tempted to play with that, although I am also puzzled by selecting erlang for this task, that seems like using ICBMs to drive away a flock of sparrows