SQM autorate-ingress: Can I set thresholds for this?

Bndwdthseekr · February 19, 2020, 3:07am

I have been using SQM/CAKE for a while now and have had good results, except for when my ISP is congested at the DSLAM, which is every evening. This gives me massive latency spikes unless SQM is configured with a lower downstream limit.

This afternoon I decided to try autorate-ingress. I have a script that adjusts bandwidth based on latency, but it's just a cronjob that runs every few minutes, so it can't adjust fast enough to be seamless, even though it already makes a world of difference for gaming/VOIP applications. After disabling my cronjobs and enabling autorate I noticed I was only getting around 1.3-2.1mbit/s, when my scripts allow me to get around 3.8-5mbit/s. PingPlotter was running on my Windows machine, and there wasn't much difference between the two besides jitter, and even then it was 5-15ms more (if that) over the hour or so, and these spikes didn't last for more than a second. With my ADSL connection being oversubscribed on the ISP side of things (thankfully only the downstream), most likely with their own queuing trying to balance everyone in the area because they are too lazy to put a cent into it, these things are going to happen. I would rather have a few more 5-15ms spikes and 4ish megabit/s than be stuck at 2 or less, considering the meager bandwidth I have to distribute to the household in the first place.

I did try setting round trip time (rtt) to a few set values, but it didn't seem to have any effect on the autorate deal. Also, is it supposed to continue adjusting down when bandwidth isn't being used?

qdisc noqueue 0: dev lo root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc fq_codel 0: dev eth0 root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 4Mb ecn
 Sent 4049805618 bytes 10087961 pkt (dropped 0, overlimits 0 requeues 3)
 backlog 0b 0p requeues 3
  maxpacket 1506 drop_overlimit 0 new_flow_count 4 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc noqueue 0: dev br-lan root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc noqueue 0: dev eth0.2 root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc noqueue 0: dev eth0.3 root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc noqueue 0: dev eth0.1 root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc cake 813d: dev pppoe-wan root refcnt 2 bandwidth 1012Kbit besteffort dual-srchost nat nowash no-ack-filter split-gso rtt 100.0ms atm overhead 32
 Sent 35025209 bytes 425065 pkt (dropped 86, overlimits 127140 requeues 0)
 backlog 0b 0p requeues 0
 memory used: 147168b of 4Mb
 capacity estimate: 1012Kbit
 min/max network layer size:           29 /    1492
 min/max overhead-adjusted size:      106 /    1696
 average network hdr offset:            0

                  Tin 0
  thresh       1012Kbit
  target         18.0ms
  interval      113.0ms
  pk_delay        3.0ms
  av_delay        100us
  sp_delay         34us
  backlog            0b
  pkts           425151
  bytes        35121808
  way_inds        21721
  way_miss         5828
  way_cols            0
  drops              86
  marks               3
  ack_drop            0
  sp_flows            0
  bk_flows            1
  un_flows            0
  max_len         17008
  quantum           300

qdisc ingress ffff: dev pppoe-wan parent ffff:fff1 ----------------
 Sent 313932235 bytes 475219 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc cake 813e: dev ifb4pppoe-wan root refcnt 2 bandwidth 128984bit autorate-ingress besteffort dual-dsthost nat wash no-ack-filter split-gso rtt 100.0ms atm overhead 32
 Sent 287920728 bytes 455153 pkt (dropped 20066, overlimits 751965 requeues 0)
 backlog 0b 0p requeues 0
 memory used: 167328b of 4Mb
 capacity estimate: 137584bit
 min/max network layer size:           29 /    1492
 min/max overhead-adjusted size:      106 /    1696
 average network hdr offset:            0

                  Tin 0
  thresh      128984bit
  target        140.9ms
  interval      281.7ms
  pk_delay       41.4ms
  av_delay        2.7ms
  sp_delay         58us
  backlog            0b
  pkts           475219
  bytes       313932235
  way_inds        53173
  way_miss         5356
  way_cols            0
  drops           20066
  marks            4213
  ack_drop            0
  sp_flows            9
  bk_flows            1
  un_flows            0
  max_len          1492
  quantum           300

PS. If I see a comment saying "why don't you get another ISP" I may actually stand up and flip my desk BUT SERIOUSLY.... can't wait for starlink.

dlakelan · February 19, 2020, 4:37am

I think this argues for continuing the cron job. I could maybe give you some suggestions as to how to improve it though. Can you paste the script here?

Bndwdthseekr · February 19, 2020, 5:30am

uh... it's a bit thrown together and still a work in progress... but sure

Actually there are two, but the other one connects to my Ubuntu server and runs a speed test while simultaneously checking ping on the router... I can post it if you want, but since it's even more of a work in progress than this one I'm not really ready to share it yet. When I have some time to nail down a few things that need to be addressed, I will post it as well.

Ideally, the SQM autorate-ingress could be tweaked so that everything is basically unnoticeable, rather than being in a game and spiking to 130-150ms for a little while until the script kicks in, which could take up to 2 minutes... or more if it's a holiday and plenty of people in the area started Netflix at the same time... I have thought about using a loop until the ping condition is met, but I haven't had time to mess with it.

I'm not sure if this is the best way to do it, but it updates on the web UI which is what I was going for.

Here's the 'reducer' :

#!/bin/sh
#---------------------------------------------------------------------------------
# This script deals with fluctuating, low bandwidth connections.
# When high latency is detected, the bandwidth target for SQM will be reduced.
# Written by: (Bndwdthseekr)
# lowestbw needs to be set at the lowest expected bandwidth +1 the way the script is now. Can I fix this? Yes, but I am lazy.
#=================================================================================
lowestbw=1473
targms=110	#Set the latency at which the SQM bandwidth target is triggered here.
reducekb=304	#Set how much (in kbps) the bandwidth will be reduced by when triggered.
targbw=$(awk -F"\t" '/download/ { print $(NF-0) }' /etc/config/sqm | cut -d "'" -f2)
num1=$((targbw-reducekb))
#----------------------------------------------------------------------------------
sleep 5
average=$(ping -c 3 -q tester1.ec2.dslreports.com | cut -d "/" -s -f4 | cut -d "." -f1)
if [ "$average" -gt "$targms" ] && [ "$targbw" -gt "$lowestbw" ]; then
	#echo "Adaptive SQM: Latency high ${average}ms Bandwidth of ${targbw}kbps reduced by ${reducekb}kbps"
	# Quote the tag so that "Adaptive SQM" is the log tag, not tag plus message.
	logger -t "Adaptive SQM" "Latencywatch: Latency high ${average}ms Bandwidth of ${targbw}kbps reduced by ${reducekb}kbps"
	sed -i "s/.*download.*/\toption download '${num1}'/" /etc/config/sqm
	/etc/init.d/sqm reload
	/usr/bin/latency_watchdog.sh
	sleep 8
	# exit instead of return: this is the top level of the script, not a function.
	exit 0
else
	if [ "$targbw" -gt "$lowestbw" ] && [ "$average" -lt "$targms" ]; then
		#echo "Adaptive SQM: Latency checked and appears to be normal avg ping: ${average}ms"
		logger -t "Adaptive SQM" "Latencywatch: Latency appears to be normal. avg ping: ${average}ms Bandwidth: ${targbw}kbps"
	fi
	if [ "$targbw" -lt "$lowestbw" ] && [ "$average" -gt "$targms" ]; then
		#echo "Adaptive SQM: Target Bandwidth: ${targbw}kbps"
		# Only the average is parsed from ping, so min and max are not logged.
		logger -t "Adaptive SQM" "Bandwidth is at its lower limit. avg ping: ${average}ms  Target Bandwidth: ${targbw}kbps"
	fi
fi
sleep 5
average=$(ping -c 3 -q tester1.ec2.dslreports.com | cut -d "/" -s -f4 | cut -d "." -f1)
if [ "$average" -gt "$targms" ] && [ "$targbw" -gt "$lowestbw" ]; then
	# Second check: latency is high now, so reduce the bandwidth.
	sed -i "s/.*download.*/\toption download '${num1}'/" /etc/config/sqm
	/etc/init.d/sqm reload
	/usr/bin/latency_watchdog.sh
	exit 0
fi

Here's where the bandwidth seems to settle at 12:50ish AM when I KNOW for a fact I have the bandwidth to download at higher rates, because here moments later (I deleted the file and restarted the download) I'm getting decent bandwidth, and no change in ping between the two. Is there something obvious I'm missing while setting this up?!

dlakelan · February 19, 2020, 6:48am

Ok I've got some ideas but will get back to you tomorrow about them. The basic idea is to just run the script in a loop, continuously adjusting the bandwidth. I think you will want to keep the SQM script in RAM so you can adjust without writing to flash. The basic idea is to do a kind of biased exponential random walk... Biased up when LAG ok, biased down when LAG too high. Multiply bandwidth by a random factor rather than increment it.
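
A minimal sketch of one step of such a biased multiplicative walk, in plain sh with awk doing the arithmetic. The function name `next_bw`, the 0.2 and 0.25 factor spans, and the optional fifth argument (a fixed draw used for reproducible testing) are assumptions made for illustration, not part of any existing script.

```sh
#!/bin/sh
# next_bw CURRENT_KBPS DIRECTION MIN_KBPS MAX_KBPS [U]
# DIRECTION is "up" (lag OK) or "down" (lag too high).
# U is an optional number in [0,1) that replaces the random draw.
next_bw() {
    awk -v bw="$1" -v dir="$2" -v lo="$3" -v hi="$4" -v u="${5:-}" 'BEGIN {
        # srand() seeds from the clock, so two calls within the same
        # second draw the same value; good enough for a sketch.
        if (u == "") { srand(); u = rand() }
        # Multiply by a random factor instead of adding a fixed increment.
        if (dir == "down") f = 1 - 0.2 * u    # factor in (0.8, 1.0]
        else               f = 1 + 0.25 * u   # factor in [1.0, 1.25)
        bw = int(bw * f)
        # Keep the result inside the configured bounds.
        if (bw < lo) bw = lo
        if (bw > hi) bw = hi
        print bw
    }'
}
```

For example, `next_bw 4000 up 1500 5000 0.5` prints 4500, and any downward step from 1600 with a floor of 1500 never goes below 1500.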

Bndwdthseekr · February 19, 2020, 7:55am

Probably not the best practice to write to the flash like that, eh? But I am using a Linksys E3000 bare board, overclocked CPU, with a USB drive extroot... so I think I'm okay? I have used this router since 2012 for multiple projects, and de-bricked it several times back in the day after failed attempts to get both wifi bands working properly, so if it dies on me after a few more writes to the flash, it deserves a break, and I'll grab another for $10.

I guess I can use my lan server to host a page with the current bandwidth if I or other users need to check since it hosts a DSL modem stats web page already.

I would definitely appreciate any improvement you can suggest, and I already have two friends in the same boat as me wanting to get set up with something that works as well as mine already does.

shm0 · February 19, 2020, 8:09am

I would suggest running the ping logger only when a certain traffic threshold is reached.
For example:

#!/bin/sh
LC_ALL=C
#set -x

trap "kill 0" INT

get_interface_traffic_rx()
(
interface="$1"

if [ -z "$interface" ]; then
        interface=$(ls -1 /sys/class/net/ | head -1)
fi

RXPREV="-1"

while [ 1 = 1 ] ; do
        RX=$(cat /sys/class/net/"${interface}"/statistics/rx_bytes)
        if [ $RXPREV -ne -1 ] ; then
				BWRX=$(((RX - RXPREV) * 8 / 1024 / 1024))
				echo "$BWRX" > /var/bw_${interface}_rx
        fi
        RXPREV=$RX
        sleep 1
done
)

get_interface_traffic_tx()
(
interface="$1"

if [ -z "$interface" ]; then
        interface=$(ls -1 /sys/class/net/ | head -1)
fi

TXPREV="-1"

while [ 1 = 1 ] ; do
        TX=$(cat /sys/class/net/"${interface}"/statistics/tx_bytes)
        if [ $TXPREV -ne -1 ] ; then
				BWTX=$(((TX - TXPREV) * 8 / 1024 / 1024))
				echo "$BWTX" > /var/bw_${interface}_tx
        fi
        TXPREV=$TX
        sleep 1
done
)

main()
(
interface="$1"
	get_interface_traffic_rx "$interface" &
	get_interface_traffic_tx "$interface" &
	
	while [ 1 = 1 ] ; do
		# Read from the same files the monitor functions write to.
		BWRX=$(cat "/var/bw_${interface}_rx")
		BWTX=$(cat "/var/bw_${interface}_tx")
		printf "\rRX Speed: %4d Mbit/s | TX-Speed: %4d Mbit/s" "$BWRX" "$BWTX"
		sleep 1
	done
)

main "$1"

It seems that getting variables back from a subfunction/subshell is not that easy.
So is the best way to do that through files or pipes?
And if you want to keep everything POSIX-conformant you can't use {} with local variables (local is undefined in POSIX).
So you have to use () to spawn the subfunctions.
Managing the subprocesses is also a bit messy. I think the best way would be to store the PIDs of each subprocess/subshell somewhere and then kill everything on exit.
I would also take a similar approach and store the ping results in a file.
When the bandwidth threshold is exceeded, launch the ping logger and save the ping results to a file (maybe max 10-30 results).
Then evaluate the results in the main function and adjust the bandwidth accordingly.
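
A rough sketch of the "store the PIDs and kill everything on exit" idea, assuming plain POSIX sh. `worker` is a hypothetical stand-in for a monitor loop, and the helper names `start_worker` and `stop_workers` are made up for this example.

```sh
#!/bin/sh
# Remember the PID of every background worker and stop them all on exit.
pids=""

# Hypothetical stand-in for a ping or bandwidth monitor loop.
worker() {
    while :; do sleep 1; done
}

start_worker() {
    worker "$@" &
    pids="$pids $!"      # $! is the PID of the subshell just started
}

stop_workers() {
    for pid in $pids; do
        kill "$pid" 2>/dev/null
    done
    wait 2>/dev/null     # reap the children so no zombies remain
    pids=""
}

# Clean up on normal exit; turn INT/TERM into an exit so the EXIT trap runs.
trap stop_workers EXIT
trap 'exit 130' INT TERM

start_worker
start_worker
```

Killing the subshell leaves its current `sleep 1` running for at most one more second; that was judged acceptable here to keep the sketch short.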

moeller0 · February 19, 2020, 8:49am

I believe that autorate-ingress is designed to work well, if cake actually is in control of a queue of a length that correlates with actual congestion, which for upstream congestion is not really the case.

The next best thing really is to run a more or less continuous RTT test against a non-overloaded ICMP reflector upstream of the suspected bottleneck and use reliable RTT changes to reciprocally control the shaper bandwidth. If the reflector is under your control, you could use irtt instead of ping to get one-way delays (assuming properly set clocks) to deal with the fact that congestion can affect both up- and downstream direction independently. But the crux really is that you need to be able to rely on the ICMP/irtt reflector, and if your script is too trusting, DOSing your reflector can effectively choke your internet access link... (but that can and should be dealt with in the script evaluating the delay probes).

dlakelan · February 19, 2020, 2:54pm

@shm0, for doing actual computing it might make sense to use Lua since it's usually on the router anyway and it has real programming constructs

@moeller0 probably a selection of public devices like 8.8.8.8 and 1.1.1.1 and a few others would work fine as ping reflectors.

@Bndwdthseekr I use a f2fs partition on a USB key as my /var/log on my RPI, that's probably a safe bet.

moeller0 · February 19, 2020, 3:14pm

Mmmeh, 8.8.8.8 and also gstatic.com are reasonably okay, but in a pinch will deprioritize ICMP echo generation and do strange things with ping in general (like responding to large ICMP echo requests with short responses). The bigger issue really is that unless one can guarantee that congestion only hits in the downstream, two one-way delay measurements are much better than a single round-trip time, but almost no server will respond to ICMP type 13 requests with a type 14 response... That's why using irtt against a server under one's own control seems like a more robust design to me.
Then again, if the goal is "just" an improvement over the status quo then sure, 8.8.8.8 and friends should get you started.

dlakelan · February 19, 2020, 4:04pm

Statistics my friend, statistics....

My idea is to select 5 unconnected different reflectors, establish a baseline, and then at each evaluation see how much each reflector increases over the baseline and take the median increment. It seems unlikely that they'd all deprioritize or otherwise futz with the response at once, and the median is extremely robust to outliers. If the median response increases more than a few ms then you're experiencing congestion.

Of course, how much value you're going to get from dropping your shaper speed when it's the upstream that's congested is anyone's guess. I would expect you'd mainly get some benefit if you are yourself using a fair amount of bandwidth, so that by reducing your speed you reduce the upstream congestion. This will only work if the other users don't just fill in that bandwidth you freed up.

Also, rather than rewrite the sqm config and reload, I'm thinking to just directly replace the IFB qdisc, which will be nicer to the flash.

So I'm thinking something like this (pseudocode)

pings = establish_baselines()

forever
  ping all the things
  calculate differences from baseline
  calculate m = median(differences)
  if (m > 10ms)
    x = 1 - rand()*0.2
    bw = bw * x
    bw = max(bw,myminbw)
    replace shaper with new bw
  else if (m < 3ms)
    x = 1+rand()*0.25
    bw = bw * x
    bw = min(bw, mymaxbw)
    replace shaper with new bw
  endif
  sleep 10
end
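
The median step of the pseudocode above could look roughly like this in sh. This is only a sketch: the function names `median_increase` and `decide` are invented here, and the RTT lists are assumed to be space-separated values in ms, one per reflector, in the same order for baseline and current readings.

```sh
#!/bin/sh
# median_increase "BASELINES" "CURRENT"
# Prints the median of (current - baseline) across all reflectors.
median_increase() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        NR == 1 { n = split($0, base, " ") }
        NR == 2 { for (i = 1; i <= n; i++) print $i - base[i] }
    ' | sort -n | awk '
        { v[NR] = $1 }
        END {
            if (NR == 0) exit 1
            if (NR % 2) print v[(NR + 1) / 2]
            else        print (v[NR / 2] + v[NR / 2 + 1]) / 2
        }'
}

# decide MEDIAN_MS
# Applies the 10 ms / 3 ms thresholds from the pseudocode.
decide() {
    awk -v m="$1" 'BEGIN {
        if (m > 10)     print "down"
        else if (m < 3) print "up"
        else            print "hold"
    }'
}
```

With baselines "20 15 30 25 18" and current readings "22 45 31 27 19" the increments are 2, 30, 1, 2, 1, so the median is 2 and the single 30 ms outlier does not trigger a reduction.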

moeller0 · February 19, 2020, 4:37pm

You can change the settings of a running cake instance
tc qdisc change
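
As a hedged illustration, a small wrapper around that command might look like the following. The function name `set_cake_bw` and the `DRY_RUN` switch are assumptions for this example; the interface name is taken from the tc output earlier in the thread, and running the real command needs root.

```sh
#!/bin/sh
# set_cake_bw IFACE KBIT
# Changes only the bandwidth of the cake instance at the root of IFACE;
# the other cake options of the running instance are left alone.
# With DRY_RUN=1 the command is printed instead of executed.
set_cake_bw() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "tc qdisc change dev $1 root cake bandwidth ${2}kbit"
    else
        tc qdisc change dev "$1" root cake bandwidth "${2}kbit"
    fi
}

# Example: shape the ingress IFB to 3800 kbit/s (uncomment to run)
# set_cake_bw ifb4pppoe-wan 3800
```

Because nothing is written to /etc/config/sqm, this avoids flash writes, but the change is lost whenever SQM is reloaded.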

Using multiple independent reflectors is a good idea; 1.1.1.1/cloudflare, 8.8.8.8/Google, and 9.9.9.9/IBM might be a decent mix. Median will still work, as will voting. That still leaves the downstream/upstream question. And the other question you raised is whether backing off is actually a productive strategy, but given an automated mechanism that should be relatively easy to test. The final point is policy: how far does one want to go in trying to fix network issues at one's own router? Sure, traffic shaping is great for controlling one's own internet access link, and might still be decent for, say, a DSLAM's/OLT's/CMTS's upstream, but it is probably not a good solution for a selective peering overload between one's own ISP and some other AS....

By all means go and test, gargoyle has something like that. All my limited testing with ICMP reflectors in the internet made me turn away in disgust, but in retrospect I was not aiming for a decent "good enough" but was trying to measure a 0.250 millisecond delay increase...

dlakelan · February 19, 2020, 4:46pm

Yeah, if you need that resolution you'll want your own custom reflector. But if you want to measure say a median of 20ms increase... I think this should be very doable.

After all the goal is to accept the occasional 5-15ms delays but avoid the

I think we may be able to repurpose some of the code I wrote for router performance monitoring... there wasn't much in the way of interest on that front, but it had a similar flavor..

github.com

dlakelan/routerperf/blob/master/luamonitor.lua

#!/usr/bin/lua


luamonitordb="/tmp/sqmmonitor.db"



require("posix")
sqlite3 = require("luasql.sqlite3")

-- returns a posixTimespec, which has two components tv_sec and tv_nsec (nanoseconds)
function timenow()
   return posix.time.clock_gettime(posix.time.CLOCK_REALTIME)
end


function getpingstats(ip,packsize)
   local f = io.popen("ping -c 5 -i 0.2 -s "..packsize .. " "..ip)
   local dat = f:read("*line") -- ignore the first line
   local l = {}

This file has been truncated.

For example it already has some code to ping stuff. We probably don't need a sqlite database, but it could just keep a collection of recent stats in memory.

dlakelan · February 19, 2020, 6:24pm

Lua is really just not quite a real language... sigh

I'd love to do this in Erlang, which is all about high availability and fault tolerance and concurrency, and stuff like that, but it's not exactly lightweight, requiring 4.8MB of flash space. It would make things easy though, you just spawn off an erlang thread to read the latency from each ping responder, then have a loop that looks at the current latency compared to the historical latency and spawns off a thread to reduce the SQM speed when the increment is sufficiently high.

@Bndwdthseekr, since you're running extroot, are you up for an Erlang solution? I've been looking for an excuse to try it out anyway.

I'm just going to put some Erlang based design notes here since I am not going to be able to immediately start coding it, but don't want to lose the ideas:

There are two basic processes involved:

Several processes that each ping a different reflector and detect a delay condition... Each one is basically a loop that pings, reads the time, remembers the shortest (say) 5 times, and detects when the current time is more than (say) 20ms above the 5th shortest time for this reflector. When that happens it sends a "delayed" message to the adjuster thread...
The adjuster thread. It waits for delayed messages and counts them in a time window (say 30 seconds). If it gets N or more of them in that window, it multiplies the bandwidth by a random factor between 0.85 and 0.95... If it receives fewer than N-2 delayed messages in its time window, it multiplies the bandwidth by a random factor between 1.05 and 1.15, respecting both a minimum and maximum bound. If it gets N-2 or N-1 it does nothing. I'm thinking 5 reflectors, N=3, but you could also do something like 7 reflectors, N=4.
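
The two decision rules in these notes are independent of Erlang, so here is a sketch of them in sh rather than Erlang. The function names are hypothetical; the 20 ms margin and the N thresholds follow the notes above.

```sh
#!/bin/sh
# update_shortest "LIST" NEW
# LIST holds up to 5 RTTs in ms; prints the 5 smallest after adding NEW.
update_shortest() {
    # $1 is left unquoted on purpose so each stored value becomes one line.
    printf '%s\n' $1 "$2" | sort -n |
        awk 'NR <= 5 { printf "%s%s", (NR > 1 ? " " : ""), $1 } END { print "" }'
}

# is_delayed "LIST" CURRENT
# Succeeds when CURRENT is more than 20 ms above the largest kept value,
# which is the 5th shortest time once the list is full.
is_delayed() {
    printf '%s\n' "$1" |
        awk -v cur="$2" -v margin=20 '{ exit !(NF > 0 && cur > $NF + margin) }'
}

# adjust_action COUNT N
# COUNT is the number of "delayed" messages seen in the time window.
adjust_action() {
    if [ "$1" -ge "$2" ]; then
        echo down
    elif [ "$1" -lt $(( $2 - 2 )) ]; then
        echo up
    else
        echo hold    # N-2 or N-1 messages: do nothing
    fi
}
```

With N=3, three or more delayed messages in the window give "down", zero gives "up", and one or two give "hold".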

Required functions:

pingstats(N,A) should ping address A N times and return a list of times in ms, basically requires calling os:cmd and parsing out the results.
setbw(I,B) should set the bandwidth on the cake instance on interface I to value B in kbps, by calling os:cmd("tc qdisc change ...")
parsesqm(F) This should basically parse out the sqm parameters from the file F (normally /etc/config/sqm) so it knows the upper bound and extra options in use.
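
To make the pingstats(N,A) idea concrete, here is a sketch of the parsing part in sh instead of Erlang. It assumes the usual per-reply `time=... ms` field that both busybox and iputils ping print; `parse_ping_times` is a name invented for this example.

```sh
#!/bin/sh
# parse_ping_times
# Reads ping output on stdin and prints one round-trip time (ms) per line.
parse_ping_times() {
    sed -n 's/.*time=\([0-9.]*\) *ms.*/\1/p'
}

# pingstats N ADDR
# Pings ADDR N times and prints the individual times in ms.
pingstats() {
    ping -c "$1" "$2" 2>/dev/null | parse_ping_times
}

# Example (needs network access):
# pingstats 5 1.1.1.1
```

Lost replies simply produce no line, so the caller can compare the number of lines returned with N to detect packet loss.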

Seems doable in a reasonable time.

shm0 · February 19, 2020, 11:39pm

Yes, but I find it quite interesting to use the limited busybox shell.

Here is my try to implement a ping monitor:

#!/bin/sh
#title		: SQM Adaptive Rate
#purpose    : Serve the cake
#author		: shm0
#date       : 01/03/2020
#version	: 0.5

LC_ALL=C

readonly base_dir="/var/shm/sqm_adaptive_rate"
readonly ping_monitor_dir="${base_dir}/ping_monitor"
readonly bandwidth_monitor_dir="${base_dir}/bandwidth_monitor"

init() {
  # Initialize File and Folder Structure
  rm -rf "${base_dir}" 2> /dev/null \
    || {
      printf "%s\n" "Error: Failed to remove ${base_dir}!" >&2
      exit 1
    }

  mkdir -p "${base_dir}" 2> /dev/null \
    || {
      printf "%s\n" "Error: Failed to create ${base_dir}!" >&2
      exit 1
    }

  mkdir -p "${ping_monitor_dir}" 2> /dev/null \
    || {
      printf "%s\n" "Error: Failed to create ${ping_monitor_dir}!" >&2
      exit 1
    }

  mkdir -p "${bandwidth_monitor_dir}" 2> /dev/null \
    || {
      printf "%s\n" "Error: Failed to create ${bandwidth_monitor_dir}!" >&2
      exit 1
    }

  return 0
}

deinit() {
  # Clean up on exit
  rm -rf "${base_dir}" 2> /dev/null \
    || {
      printf "%s\n" "Error: Failed to remove ${base_dir}!" >&2
      exit 1
    }

  exit 0
}

ping_monitor() {
  # Parameters
  # $1: Target to ping
  # $2: Amount of ping samples to keep
  # $3: Amount of time between pings
  local ping_target="${1}"
  local ping_samples="${2}"
  local ping_interval="${3}"
  # Local Variables
  local ping_sample_file="${ping_monitor_dir}/${ping_target}.spl"
  local ping_sample_list
  local ping_time
  local sample_count
  local i

  # FixMe
  # Print error, even when running in the background
  # Implement some better signal feedback to main function?
  touch "${ping_sample_file}" 2> /dev/null \
    || {
      printf "%s\n" "Error: Failed to create ${ping_sample_file}!" >&2
      kill ${$}
      exit 1
    }

  # Check if the sample file has been Initialized
  # If not fill with 0 x ping samples num separated by whitespace
  # So awk can process it
  if [ ! -s "${ping_sample_file}" ]; then
    i="1"
    ping_sample_list="0"
    while [ "${i}" -lt "${ping_samples}" ]; do
      ping_sample_list="$(printf "%s %s" "${ping_sample_list}" "0")"
      i="$((i + 1))"
    done
    printf "%s\n" "${ping_sample_list}" > "${ping_sample_file}"
  fi

  sample_count="1"
  while [ 1 = 1 ]; do
    ping_time="$(ping -q -c1 -W1 -A -s 56 -i 1 "${ping_target}" 2> /dev/null \
      | grep -Eo '\/[0-9]*\.[0-9]*\/' \
      | tr -d '/')"

    sleep "${ping_interval}"

    if [ -n "${ping_time}" ]; then
      ping_time="$(printf "%.0f" "${ping_time}")"
    else
      ping_time="999"
    fi

    ping_sample_list="$(awk '{ $'"${sample_count}"' = '"${ping_time}"';
                                  print $0 }' "${ping_sample_file}")"

    printf "%s\n" "${ping_sample_list}" > "${ping_sample_file}"

    if [ "${sample_count}" -ge "${ping_samples}" ]; then
      sample_count="1"
    else
      sample_count="$((sample_count + 1))"
    fi

  done
  return 0
}

bandwidth_monitor() {
  # Parameters
  # $1: Monitor Rates on this interface
  local interface="${1}"
  # Local Variables
  local rx_rate_file="${bandwidth_monitor_dir}/${interface}.rx"
  local tx_rate_file="${bandwidth_monitor_dir}/${interface}.tx"
  local RXPREV="-1"
  local TXPREV="-1"
  local RX
  local TX
  local BWRX
  local BWTX

  # FixMe
  # Print error, even when running in the background
  # Implement some better signal feedback to main function?
  touch "${rx_rate_file}" 2> /dev/null \
    || {
      printf "%s\n" "Error: Failed to create ${rx_rate_file}!" >&2
      kill ${$}
      exit 1
    }

  touch "${tx_rate_file}" 2> /dev/null \
    || {
      printf "%s\n" "Error: Failed to create ${tx_rate_file}!" >&2
      kill ${$}
      exit 1
    }

  while [ 1 = 1 ]; do
    RX="$(cat /sys/class/net/"${interface}"/statistics/rx_bytes)"
    TX="$(cat /sys/class/net/"${interface}"/statistics/tx_bytes)"

    if [ "${RXPREV}" -ne -1 ]; then
      BWRX="$(((RX - RXPREV) * 8 / 1000))"
      printf "%d\n" "${BWRX}" > "${rx_rate_file}"
    fi

    if [ "${TXPREV}" -ne -1 ]; then
      BWTX="$(((TX - TXPREV) * 8 / 1000))"
      printf "%d\n" "${BWTX}" > "${tx_rate_file}"
    fi

    RXPREV="${RX}"
    TXPREV="${TX}"
    sleep 1
  done

  return 0
}

get_ping_samples() {
  # Parameters
  local ping_target="${1}"
  # Local Variables
  local ping_sample_file="${ping_monitor_dir}/${ping_target}.spl"
  local ping_samples

  if ping_samples="$(cat "${ping_sample_file}" 2> /dev/null)" \
    && [ -n "${ping_samples}" ]; then
    printf "%s" "${ping_samples}"
    return 0
  else
    return 1
  fi
}

get_avg_median_ping() {
  # Parameters
  # $1: Ping target whose sample file is evaluated
  local ping_target="${1}"
  # Local Variables
  local ping_sample_file="${ping_monitor_dir}/${ping_target}.spl"
  local ping_sample_list
  local ping_median_list
  local ping_sample_num
  local ping_avg

  if ! ping_sample_list="$(cat "${ping_sample_file}" 2> /dev/null)"; then
    return 1
  fi

  ping_sample_num="$(printf "%s\n" "${ping_sample_list}" \
    | wc -w)"

  ping_median_list="$(printf "%s\n" "${ping_sample_list}" \
    | tr " " "\n" \
    | sort -nr \
    | head -n -"$(((ping_sample_num + 10) / 10))" \
    | tail -n +"$((((ping_sample_num + 10) / 10) + 1))" \
    | tr "\n" " ")"

  # As long as the ping sample file is not completely filled
  # use normal avg ping calculation
  if printf "%s\n" "${ping_median_list}" | grep -Eoq '\b0\b'; then
    ping_median_list="${ping_sample_list}"
  fi

  ping_avg="$(printf "%s\n" "${ping_median_list}" \
    | awk '
      {
        # Average the non-zero samples; zeros mark unfilled slots.
        avg = 0; skip = 0
        for (i = 1; i <= NF; i++) {
          if ($i == 0)
            skip += 1
          avg += $i
        }
        if (avg != 0)
          avg /= (NF - skip)
        print avg
      }')"

  if [ -n "${ping_avg}" ]; then
    printf "%.0f" "${ping_avg}"
    return 0
  else
    return 1
  fi
}

get_new_ping_target() {
  # Parameters
  local ping_target_curr="${1}"
  local ping_target_list="${2}"
  local blacklist_duration="${3}"
  # Local Variables
  local blacklist_target_file="${ping_monitor_dir}/${ping_target_curr}.blacklist"
  local blacklist_target_name
  local blacklist_target_timestamp
  local ping_sample_file
  local current_time
  local target

  touch "${blacklist_target_file}" 2> /dev/null \
    || {
      printf "%s\n" "Error: Failed to create ${blacklist_target_file}!" >&2
      exit 1
    }

  # Check if the current ping target can be removed from blacklist
  current_time="$(date '+%s')"
  blacklist_target_name="$(basename "${blacklist_target_file}" .blacklist)"
  ping_sample_file="${ping_monitor_dir}/${blacklist_target_name}.spl"

  if blacklist_target_timestamp="$(cat "${blacklist_target_file}" 2> /dev/null)" \
    && [ -n "${blacklist_target_timestamp}" ] \
    && [ "${blacklist_target_timestamp}" -lt "${current_time}" ]; then
    rm -rf "${blacklist_target_file}"
    rm -rf "${ping_sample_file}"
  else
    printf "%d\n" "$((current_time + blacklist_duration))" > "${blacklist_target_file}"
  fi

  # Check if we can re-add some ping_targets
  find "${ping_monitor_dir}" ! -name "$(printf "*\n*")" -name '*.blacklist' -maxdepth 1 > "${ping_monitor_dir}/blacklist.tmp"
  while IFS= read -r target; do
    blacklist_target_file="${target}"
    blacklist_target_name="$(basename "${target}" .blacklist)"
    blacklist_target_timestamp="$(cat "${target}")"
    ping_sample_file="${ping_monitor_dir}/${blacklist_target_name}.spl"

    if [ "${blacklist_target_timestamp}" -gt "${current_time}" ]; then
      # Escape the dots in IPs/Domains to make the second sed properly work
      blacklist_target_name="$(printf "%s\n" "${blacklist_target_name}" | sed 's/\./\\\./g')"
      ping_target_list="$(printf "%s\n" "${ping_target_list}" | sed 's/'"${blacklist_target_name}"'//g')"
    else
      rm -f "${blacklist_target_file}" 2> /dev/null
      rm -f "${ping_sample_file}" 2> /dev/null
    fi
  done < "${ping_monitor_dir}/blacklist.tmp"
  rm -f "${ping_monitor_dir}/blacklist.tmp" 2> /dev/null

  # Clean up string, just to be sure
  # Replace 2 or more whitespace characters with 1
  # Remove leading and trailing white space characters
  ping_target_list="$(printf "%s\n" "${ping_target_list}" | sed -E -e 's/\s{2,}/ /g' -e 's/(^\ |\ $)//g')"

  if [ -n "${ping_target_list}" ]; then
    printf "%s" "${ping_target_list}"
    return 0
  else
    return 1
  fi
}

get_rx_current_rate() {
  # Parameters
  local interface="${1}"
  # Local Variables
  local rx_rate_file="${bandwidth_monitor_dir}/${interface}.rx"
  local rx_rate

  if rx_rate="$(cat "${rx_rate_file}" 2> /dev/null)" \
    && [ -n "${rx_rate}" ]; then
    printf "%d" "${rx_rate}"
    return 0
  else
    return 1
  fi
}

get_tx_current_rate() {
  # Parameters
  local interface="${1}"
  # Local Variables
  local tx_rate_file="${bandwidth_monitor_dir}/${interface}.tx"
  local tx_rate

  if tx_rate="$(cat "${tx_rate_file}" 2> /dev/null)" \
    && [ -n "${tx_rate}" ]; then
    printf "%d" "${tx_rate}"
    return 0
  else
    return 1
  fi
}

get_rx_max_rate() {
  # Parameters
  # $1: Get max rx rate from sqm config for this interface
  local interface="${1}"
  # Local Variables
  local rx_max_rate

  if rx_max_rate="$(uci get sqm."${interface}".download 2> /dev/null)" \
    && [ -n "${rx_max_rate}" ]; then
    printf "%d" "${rx_max_rate}"
    return 0
  else
    return 1
  fi
}

get_tx_max_rate() {
  # Parameters
  # $1: Get max tx rate from sqm config for this interface
  local interface="${1}"
  # Local Variables
  local tx_max_rate

  if tx_max_rate="$(uci get sqm."${interface}".upload 2> /dev/null)" \
    && [ -n "${tx_max_rate}" ]; then
    printf "%d" "${tx_max_rate}"
    return 0
  else
    return 1
  fi
}

usage() {
  echo "Usage:"
  echo "Nothing here yet!"
  exit 0
}

main() {
  # Args
  # -i|--interface Interface: to adjust Rates on (as in sqm config)
  # -r|--rx-min-threshold: Minimum RX Rate in Percent, Default: 30
  # -t|--tx-min-threshold: Minimum TX Rate in Percent, Default: 30
  # -f|--reduce-factor: Reduce Factor in Percent, amount will be subtracted from current Rates, Default: 20
  # -l|--ping-limit: Maximum Ping Limit in ms, above this Limit Rates will be reduced, Default: auto
  # -z|--ping-targets: Space separated list of Targets to Ping, Default: 1.1.1.1 8.8.8.8 9.9.9.9
  # -s|--ping-samples: Amount of Ping Samples to keep, Default: 5
  # -c|--ping-interval: Amount of time in seconds between pings, Default: 1
  # -a|--ping-fails: Max failed pings before a ping target gets blacklisted, Default: 3
  # -d|--cooldown-time: Amount of time to keep new rates, Default: 3600
  # -b|--blacklist-duration: Amount of time to black list a failing ping target, Default 900 sec
  # -h|--help: Prints help text
  local interface=""
  local rx_min_rate_threshold="30"
  local tx_min_rate_threshold="30"
  local reduce_factor="20"
  local ping_limit="auto"
  local ping_target_list="1.1.1.1 8.8.8.8 9.9.9.9"
  local ping_samples="5"
  local ping_interval="1"
  local ping_max_ping_fail_cnt="3"
  local ping_target_blacklist_duration="900"
  local cooldown_time="3600"
  # Local Variables
  local ping_mode
  local ping_mode_auto_offset
  local ping_avg
  local ping_sample_list
  local ping_target_list_tmp
  local rx_current_rate
  local tx_current_rate
  local rx_current_max_rate
  local tx_current_max_rate
  local rx_min_rate
  local tx_min_rate
  local rx_max_rate
  local tx_max_rate
  local cooldown_counter
  local sleep_counter
  local sleep_time
  local ping_monitor_pid
  local status

  while [ -n "${1}" ]; do
    case "${1}" in
      -i | --interface)
        shift
        interface="${1}"
        ;;
      -r | --rx-min-threshold)
        shift
        rx_min_rate_threshold="${1}"
        ;;
      -t | --tx-min-threshold)
        shift
        tx_min_rate_threshold="${1}"
        ;;
      -f | --reduce-factor)
        shift
        reduce_factor="${1}"
        ;;
      -l | --ping-limit)
        shift
        ping_limit="${1}"
        ;;
      -z | --ping-targets)
        shift
        ping_target_list="${1}"
        ;;
      -s | --ping-samples)
        shift
        ping_samples="${1}"
        ;;
      -c | --ping-interval)
        shift
        ping_interval="${1}"
        ;;
      -a | --ping-fails)
        shift
        ping_max_ping_fail_cnt="${1}"
        ;;
      -d | --cooldown-time)
        shift
        cooldown_time="${1}"
        ;;
      -b | --blacklist-duration)
        shift
        ping_target_blacklist_duration="${1}"
        ;;
      -h | --help)
        usage
        exit 0
        ;;
      --)
        shift
        break
        ;;
      *)
        printf "%s" "Unrecognized option ${1}." && {
          usage
          exit 1
        }
        ;;
    esac
    shift
  done
  if ! uci -q get sqm."${interface}".interface > /dev/null 2>&1; then
    printf "%s" "Invalid interface specified!"
    exit 1
  fi

  if [ "$(uci -q get sqm."${interface}".enabled)" -eq "0" ]; then
    printf "%s" "SQM not enabled!"
    exit 1
  fi

  if [ "$(uci -q get sqm."${interface}".qdisc)" != "cake" ]; then
    printf "%s" "SQM is not using cake qdisc!"
    exit 1
  fi

  if ! echo "${rx_min_rate_threshold}" | grep -Eoq '^[0-9]+$' \
    || [ "${rx_min_rate_threshold}" -lt "1" ] \
    || [ "${rx_min_rate_threshold}" -gt "100" ]; then
    printf "%s" "Invalid minimum RX rate specified! (1% - 100%)"
    exit 64
  fi

  if ! echo "${tx_min_rate_threshold}" | grep -Eoq '^[0-9]+$' \
    || [ "${tx_min_rate_threshold}" -lt "1" ] \
    || [ "${tx_min_rate_threshold}" -gt "100" ]; then
    printf "%s" "Invalid minimum TX rate specified! (1% - 100%)"
    exit 64
  fi

  if ! echo "${reduce_factor}" | grep -Eoq '^[0-9]+$' \
    || [ "${reduce_factor}" -lt "1" ] \
    || [ "${reduce_factor}" -gt "99" ]; then
    printf "%s" "Invalid reduce factor specified! (1% - 99%)"
    exit 64
  fi

  if echo "${ping_limit}" | grep -Eoq '^auto.*$'; then
    ping_mode="auto"
    ping_mode_auto_offset="$({ echo "${ping_limit}" \
      | grep -Eo '(\+|\-)[0-9]+'; } \
      || echo "+0")"

    if [ "${ping_mode_auto_offset}" -gt "100" ]; then
      ping_mode_auto_offset="+100"
    fi

    if [ "${ping_mode_auto_offset}" -lt "-100" ]; then
      ping_mode_auto_offset="-100"
    fi
    ping_limit="0"
  fi

  if { ! echo "${ping_limit}" | grep -Eoq '(^[0-9]+$)' \
    || [ "${ping_limit}" -lt "1" ] \
    || [ "${ping_limit}" -gt "100" ]; } \
    && [ "${ping_mode}" != "auto" ]; then
    printf "%s" "Invalid maximum ping limit specified! (1 - 100 ms)"
    exit 64
  fi

  # Remove duplicate ping targets
  ping_target_list="$(echo "${ping_target_list}" \
    | tr ' ' '\n' \
    | sort -u \
    | tr '\n' ' ' \
    | sed -E -e 's/\s{2,}/ /g' -e 's/(^\ |\ $)//g')"

  for ping_target in ${ping_target_list}; do
    if ! echo "${ping_target}" \
      | grep -Eoq -e '\b(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}\b' \
        -e '([a-z0-9A-Z]\.)*[a-z0-9-]+\.([a-z0-9]{2,24})+(\.co\.([a-z0-9]{2,24})|\.([a-z0-9]{2,24}))*'; then
      printf "%s" "No valid ping target specified!"
      exit 64
    fi
  done

  if ! echo "${ping_samples}" | grep -Eoq '^[0-9]+$' \
    || [ "${ping_samples}" -lt "5" ] \
    || [ "${ping_samples}" -gt "20" ]; then
    printf "%s" "Invalid ping sample amount specified! (5 - 20)"
    exit 64
  fi

  if ! echo "${ping_interval}" | grep -Eoq '^[0-9]+$' \
    || [ "${ping_interval}" -lt "1" ] \
    || [ "${ping_interval}" -gt "5" ]; then
    printf "%s" "Invalid ping interval specified! (1 - 5 seconds)"
    exit 64
  fi

  if ! echo "${ping_max_ping_fail_cnt}" | grep -Eoq '^[0-9]+$' \
    || [ "${ping_max_ping_fail_cnt}" -lt "1" ] \
    || [ "${ping_max_ping_fail_cnt}" -gt "1000" ]; then
    printf "%s" "Invalid maximum failed ping count specified! (1 - 1000)"
    exit 64
  fi

  if ! echo "${ping_target_blacklist_duration}" | grep -Eoq '^[0-9]+$' \
    || [ "${ping_target_blacklist_duration}" -lt "30" ] \
    || [ "${ping_target_blacklist_duration}" -gt "86400" ]; then
    printf "%s" "Invalid blacklist duration specified! (30 - 86400 seconds)"
    exit 64
  fi

  if ! echo "${cooldown_time}" | grep -Eoq '^[0-9]+$' \
    || [ "${cooldown_time}" -lt "1" ] \
    || [ "${cooldown_time}" -gt "86400" ]; then
    printf "%s" "Invalid cooldown time specified! (1 - 86400 seconds)"
    exit 64
  fi

  init

  bandwidth_monitor "${interface}" &

  trap "deinit" INT TERM
  trap "kill 0" EXIT

  rx_max_rate="$(get_rx_max_rate "${interface}" || echo "0")"
  tx_max_rate="$(get_tx_max_rate "${interface}" || echo "0")"
  rx_min_rate="$((rx_max_rate * rx_min_rate_threshold / 100))"
  tx_min_rate="$((tx_max_rate * tx_min_rate_threshold / 100))"
  rx_current_max_rate="${rx_max_rate}"
  tx_current_max_rate="${tx_max_rate}"

  status="INIT"
  ping_target="$(echo "${ping_target_list}" | awk '{ print $1 };')"
  ping_monitor_pid="-1"
  sleep_time="$(((ping_interval * ping_samples) + ping_interval))"
  sleep_counter="0"
  cooldown_counter="$((cooldown_time + 1))"
  display_width="$(((ping_samples * 3) + ((ping_samples - 1) + 3)))"

  while [ 1 = 1 ]; do
    ping_sample_list="$(get_ping_samples "${ping_target}" || echo "0")"
    ping_avg="$(get_avg_median_ping "${ping_target}" || echo "0")"
    rx_current_rate="$(get_rx_current_rate "${interface}" || echo "0")"
    tx_current_rate="$(get_tx_current_rate "${interface}" || echo "0")"

    clear && printf '\e[3J'

    printf "Current Rates        : %"$((display_width - 13))"s / %7s kbit/s (RX/TX)\n" \
      "${rx_current_rate}" "${tx_current_rate}"

    printf "Current Maximum Rates: %"$((display_width - 13))"s / %7s kbit/s (RX/TX)\n" \
      "${rx_current_max_rate}" "${tx_current_max_rate}"

    printf "Maximum Rates        : %"$((display_width - 13))"s / %7s kbit/s (RX/TX)\n" \
      "${rx_max_rate}" "${tx_max_rate}"

    printf "Minimum Rates        : %"$((display_width - 13))"s / %7s kbit/s (RX/TX)\n" \
      "${rx_min_rate}" "${tx_min_rate}"

    printf "Reduction Rates      : %"$((display_width - 13))"s / %7s kbit/s (RX/TX)\n" \
      "$((rx_current_max_rate * reduce_factor / 100))" \
      "$((tx_current_max_rate * reduce_factor / 100))"

    printf "Reduction Factor     : %"$((display_width - 3))"s %%\n" \
      "${reduce_factor}"

    printf "Ping Limit / Offset  : %"$((display_width - 3))"s ms %11s\n" \
      "${ping_limit}" "${ping_mode_auto_offset}  ms"

    printf "Ping (avg) / Interval: %"$((display_width - 3))"s ms %11s\n" \
      "${ping_avg}" "${ping_interval} sec"

    printf "Last Pings / Samples : %"$((display_width - 3))"s ms %11s\n" \
      "${ping_sample_list}" "${ping_samples} spl"

    printf "Ping Target          : %"$((display_width - 3))"s\n" \
      "${ping_target}"

    printf "Ping Monitor         : %"$((display_width - 3))"s\n" \
      "$({ [ "${ping_monitor_pid}" -ne "-1" ] && echo "ACTIVE"; } \
        || echo "INACTIVE")"

    printf "Cooldown             : %"$((display_width + 1))"s\n" \
      "$({ [ "${status}" = "IDLE" ] \
        && [ "${cooldown_counter}" -gt "0" ] \
        && [ "${cooldown_counter}" -lt "${cooldown_time}" ] \
        && echo "${cooldown_counter} sec"; } \
        || echo "INACTIVE    ")"

    printf "Status               : %"$((display_width - 3))"s\n" \
      "${status}"

    # Check if ping target is down
    if [ "$(echo "${ping_sample_list}" | grep -o '999' | wc -w)" -ge "${ping_max_ping_fail_cnt}" ]; then
      status="DOWN"
    fi

    case "${status}" in
      INIT)
        if [ "${sleep_counter}" -ge "${sleep_time}" ]; then
          if [ "${ping_mode}" = "auto" ]; then
            ping_limit="$(((((ping_avg + 5) * 2) / 10 * 10) + ping_mode_auto_offset))"
          fi

          if [ "${ping_limit}" -le "$((ping_avg + 10))" ]; then
            ping_limit="$((((ping_avg + 5) * 2) / 10 * 10))"
          fi

          if [ "${ping_limit}" -gt "100" ]; then
            ping_limit="100"
          fi

          sleep_counter="-1"
          status="IDLE"
        else
          if [ "${ping_monitor_pid}" -eq "-1" ]; then
            rx_current_max_rate="${rx_min_rate}"
            tx_current_max_rate="${tx_min_rate}"

            tc qdisc change root dev ifb4"${interface}" cake bandwidth "${rx_current_max_rate}"kbit
            tc qdisc change root dev "${interface}" cake bandwidth "${tx_current_max_rate}"kbit

            sleep "${sleep_time}"

            ping_monitor "${ping_target}" "${ping_samples}" "${ping_interval}" &
            ping_monitor_pid="${!}"
          fi
          sleep_counter="$((sleep_counter + 1))"
        fi
        ;;
      IDLE)
        if [ "${rx_current_max_rate}" -lt "${rx_max_rate}" ] \
          || [ "${tx_current_max_rate}" -lt "${tx_max_rate}" ]; then
          if [ "${cooldown_counter}" -gt "0" ] \
            && [ "${cooldown_counter}" -le "${cooldown_time}" ]; then
            cooldown_counter="$((cooldown_counter - 1))"
          else
            rx_current_max_rate="${rx_max_rate}"
            tx_current_max_rate="${tx_max_rate}"

            tc qdisc change root dev ifb4"${interface}" cake bandwidth "${rx_current_max_rate}"kbit
            tc qdisc change root dev "${interface}" cake bandwidth "${tx_current_max_rate}"kbit

            cooldown_counter="$((cooldown_time + 1))"
          fi
        fi

        if [ "${ping_monitor_pid}" -ne "-1" ]; then
          kill "${ping_monitor_pid}"
          ping_monitor_pid="-1"
        fi

        if [ "${rx_current_rate}" -ge "${rx_min_rate}" ] \
          || [ "${tx_current_rate}" -ge "${tx_min_rate}" ]; then
          status="MONITORING"
        fi

        ;;
      MONITORING)
        if [ "${ping_monitor_pid}" -eq "-1" ]; then
          ping_monitor "${ping_target}" "${ping_samples}" "${ping_interval}" &
          ping_monitor_pid="${!}"
        fi

        if [ "${rx_current_rate}" -ge "${rx_min_rate}" ] \
          || [ "${tx_current_rate}" -ge "${tx_min_rate}" ]; then
          if [ "${ping_avg}" -ge "${ping_limit}" ]; then
            cooldown_counter="${cooldown_time}"
            status="CONGESTION"
          fi
        else
          status="IDLE"
        fi
        ;;
      CONGESTION)
        if [ "${rx_current_rate}" -ge "${rx_min_rate}" ]; then
          rx_current_max_rate="$((rx_current_max_rate - (rx_current_max_rate * reduce_factor / 100)))"

          if [ "${rx_current_max_rate}" -le "${rx_min_rate}" ]; then
            rx_current_max_rate="${rx_min_rate}"
          fi

          tc qdisc change root dev ifb4"${interface}" cake bandwidth "${rx_current_max_rate}"kbit

          status="WAITING"
        elif [ "${tx_current_rate}" -ge "${tx_min_rate}" ]; then
          tx_current_max_rate="$((tx_current_max_rate - (tx_current_max_rate * reduce_factor / 100)))"

          if [ "${tx_current_max_rate}" -le "${tx_min_rate}" ]; then
            tx_current_max_rate="${tx_min_rate}"
          fi

          tc qdisc change root dev "${interface}" cake bandwidth "${tx_current_max_rate}"kbit

          status="WAITING"
        else
          status="MONITORING"
        fi
        ;;
      WAITING)
        if [ "${sleep_counter}" -ge "${sleep_time}" ]; then
          sleep_counter="0"
          status="MONITORING"
        else
          sleep_counter="$((sleep_counter + 1))"
        fi
        ;;
      DOWN)
        if [ "${ping_monitor_pid}" -ne "-1" ]; then
          kill "${ping_monitor_pid}"
          ping_monitor_pid="-1"
        fi

        while ! ping_target_list_tmp="$(get_new_ping_target \
          "${ping_target}" \
          ''"${ping_target_list}"'' \
          "${ping_target_blacklist_duration}")"; do

          wait_time="$((ping_target_blacklist_duration + 60))"

          while [ "${wait_time}" -gt "0" ]; do
            clear && printf '\e[3J'
            printf "%s\n" "No more ping targets left!"
            printf "%s\n" "Waiting ${wait_time} seconds!"
            wait_time="$((wait_time - 1))"
            sleep 1
          done
        done

        ping_target="$(echo "${ping_target_list_tmp}" | awk '{ print $1 }')"
        sleep_counter="0"
        status="INIT"
        ;;
      *)
        status="IDLE"
        ;;
    esac

    sleep 1
  done

  exit 0
}

main "${@}"

Args

-i|--interface: Interface to adjust rates on (as in sqm config)
-r|--rx-min-threshold: Minimum RX rate in percent, Default: 30
-t|--tx-min-threshold: Minimum TX rate in percent, Default: 30
-f|--reduce-factor: Reduce factor in percent, this amount is subtracted from the current rates, Default: 20
-l|--ping-limit: Maximum ping limit in ms, above this limit rates are reduced, Default: auto
-z|--ping-targets: Targets to ping, Default: "1.1.1.1 8.8.8.8 9.9.9.9"
-s|--ping-samples: Number of ping samples to keep, Default: 5
-c|--ping-interval: Time in seconds between pings, Default: 1
-a|--ping-fails: Failed pings before a ping target gets blacklisted, Default: 3
-d|--cooldown-time: Time in seconds to keep new rates, Default: 3600
-b|--blacklist-duration: Time in seconds to blacklist a failing ping target, Default: 900
-h|--help: Prints help text
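
To make the percentage options a bit more concrete, here is a minimal sketch of the arithmetic the script does with them. The function names (min_rate, reduced_rate, auto_ping_limit) are made up for this illustration; the formulas are copied from the main loop above, and everything is integer math, so results are rounded down.

```shell
#!/bin/sh
# Hypothetical helper names; only the formulas come from the script above.

# Minimum rate in kbit/s: max rate ($1) times threshold percent ($2).
min_rate() {
  echo "$(($1 * $2 / 100))"
}

# One congestion step: subtract reduce-factor percent ($2) from the
# current rate ($1), but never go below the minimum rate ($3).
reduced_rate() {
  new_rate="$(($1 - ($1 * $2 / 100)))"
  if [ "${new_rate}" -le "$3" ]; then
    new_rate="$3"
  fi
  echo "${new_rate}"
}

# "auto" ping limit: double (average ping + 5 ms), round down to a
# multiple of 10, add the optional offset ($2), ignore the offset if the
# result is within 10 ms of the average, and cap at 100 ms.
auto_ping_limit() {
  avg="$1"
  offset="${2:-0}"
  base="$((((avg + 5) * 2) / 10 * 10))"
  limit="$((base + offset))"
  if [ "${limit}" -le "$((avg + 10))" ]; then
    limit="${base}"
  fi
  if [ "${limit}" -gt "100" ]; then
    limit="100"
  fi
  echo "${limit}"
}
```

For example, with a 6944 kbit/s maximum and the default 30% threshold, min_rate 6944 30 gives 2083, and the first congestion step reduced_rate 6944 20 2083 gives 5556.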

// Changelog 01.03.2020

Added a simplistic round-robin ping target failover. When a ping target fails to respond 3 times, it gets blacklisted for 15 min. When no more ping targets are available, the script waits for the blacklist time + 1 min.
Multiple ping targets can now be passed; enclose the list in double quotes.
For example: script.sh -z "1.1.1.1 8.8.8.8 9.9.9.9"
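
The failover idea can be sketched roughly like this. It is not the script's own get_new_ping_target function, just an illustration under a few assumptions: pick_target is a hypothetical name, and the blacklist is assumed to be a plain string of target=expiry entries, where expiry is a Unix timestamp.

```shell
#!/bin/sh
# Hypothetical sketch, not the script's get_new_ping_target implementation.
# $1: space-separated target list
# $2: blacklist as space-separated "target=expiry_epoch" entries
# $3: current time as Unix timestamp
# Prints the first usable target; returns 1 if every target is blacklisted.
pick_target() {
  targets="$1"
  blacklist="$2"
  now="$3"
  for target in ${targets}; do
    blocked="0"
    for entry in ${blacklist}; do
      # An entry only blocks its target while the expiry is in the future.
      if [ "${entry%%=*}" = "${target}" ] \
        && [ "${entry##*=}" -gt "${now}" ]; then
        blocked="1"
      fi
    done
    if [ "${blocked}" -eq "0" ]; then
      echo "${target}"
      return 0
    fi
  done
  return 1
}
```

A failing target would be added to the blacklist as target=$((now + 900)) for the default 15 min, and a non-zero return corresponds to the "no more ping targets left" wait.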

dlakelan · February 20, 2020, 1:10am

Well, I'm fiddling with Erlang, hopefully not while Rome burns. I've got it pinging things, and I'm hoping I can extract the ping times using regexes after a little more playing tomorrow morning.

Pretty sure once I have ping times I can get it to do the math... To get started I think I'll simply put the sqm command line into the erlang code manually, with placeholders for the parameters... easier than writing a parser for config files.

Bndwdthseekr · February 20, 2020, 5:16am

Sure, I have an insane amount of storage, so I'm for sure willing to give it a go.

I'm reading through this thread, and there's a lot of info for me to digest. I appreciate everyone's input so far! I'm going to be working tonight, so I'm not sure if I will have the chance to do anything with this until later this morning, but I am still here, and will be checking back in as soon as I have time xD

dlakelan · February 20, 2020, 9:53pm

Good! We'll try it out... I don't have easy access to a testing situation here, but we'll see what we can do.

I'm not ready yet, but I do have a couple pages of Erlang code... so far it can ping sites and collect the timing statistics. It can update the bandwidth, given the appropriate tc command, and there's some slightly twisted logic to monitor a number of sites and decide to trigger a bandwidth reduction/increase based on a sufficient number of sites with delays.... But none of it is tested, and some of it isn't quite finished...

It's a good project though, so thanks for the opportunity to test it out. I've been thinking about this kind of feedback mechanism for a while now. There are plenty of people with this variable bandwidth issue.

Ok. I've pushed the first draft of a script... https://github.com/dlakelan/routerperf/blob/master/sqmfeedback.erl

Here's how you should test it:

Download it onto your router in /root.
Make sure you have installed Erlang.
Edit the file near the bottom to change the interface names to the ones used by your upstream and downstream interfaces, and set the bandwidths (in Kbit/s). The three bandwidths for each interface are the lower, initial, and upper limits.
Then you'll need to compile and run it:

erlc sqmfeedback.erl
erl -noshell -s sqmfeedback main

Now it will print some logs and monitoring info about what it's up to. I can't promise it works or even that it doesn't break everything. I can say it doesn't contain any malicious code, which you should be able to verify fairly straightforwardly by reading it. Basically it spawns off a bunch of threads to ping a number of big internet sites using names (so it'll use IPv6 if available). There's also a thread that just waits to get info about delays, and if enough of the sites have a delay, it asks another thread to reduce the bandwidth. It'll increase the bandwidth if there are no delays. Every time it monkeys with your bandwidth it should print a message about the command it's running.
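
The "enough of the sites have a delay" idea boils down to a simple vote. Here is a rough sketch of it in shell rather than Erlang, to match the script earlier in the thread. The function name decide, the threshold, the quorum, and the "hold" outcome for the in-between case are all assumptions for illustration and are not taken from sqmfeedback.erl.

```shell
#!/bin/sh
# Hypothetical sketch of the voting idea; names and numbers are assumptions.
# $1: delay threshold in ms
# $2: number of delayed sites needed before reducing bandwidth
# remaining args: one measured delay (integer ms) per site
decide() {
  threshold="$1"
  quorum="$2"
  shift 2
  delayed="0"
  for delay in "$@"; do
    if [ "${delay}" -gt "${threshold}" ]; then
      delayed="$((delayed + 1))"
    fi
  done
  if [ "${delayed}" -ge "${quorum}" ]; then
    echo "decrease"
  elif [ "${delayed}" -eq "0" ]; then
    echo "increase"
  else
    # Some sites are slow, but not enough of them to blame the local link.
    echo "hold"
  fi
}
```

Requiring several sites to agree avoids cutting the bandwidth just because one remote server happens to answer slowly.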

As I say, be prepared to have it cause breakage etc, and we can monkey with it to be more fault tolerant and easier to use later. Let's just see if we can get an erlang script to work and ping things and maybe even amazingly adjust bandwidth!

For me, it does run and ping and periodically check the delays.

@moeller0 you might also enjoy playing with this in your copious spare time

Bndwdthseekr · February 22, 2020, 2:51am

When I run erlc sqmfeedback.erl, this is what I get. I'm pretty sure I'm not missing any dependencies though.

{"init terminating in do_boot",{'cannot get bootfile','no_dot_erlang.boot'}}
init terminating in do_boot ({cannot get bootfile,no_dot_erlang.boot})

I'm not sure if setting all 3 at the same value for egress will muck things up, but I almost never have any fluctuation in my upstream, which I'm very thankful for.

    monitor_ifaces([{"tc qdisc change root dev pppoe-wan cake bandwidth ~BKbit diffserv4 dual-srchost overhead 34 ", 1024, 1024, 1024},
		    {"tc qdisc change root dev pppoe-wan cake bandwidth ~BKbit diffserv4 dual-dsthost nat overhead 34 ingress",1024,6944,6944}],

Here's a typical Friday night, first using a limit of 2080kbps, then switching to a limit of 1200kbps. Not sure what the rest of the fam is doing online, but after setting the limit lower, I was streaming 480p YouTube, and it seems to be handling that along with everyone else's usage pretty well. While still not perfect, that shows the difference it makes, and I can live with that! xD

There are somewhere between five and fifteen devices actively using bandwidth 24/7, so this will be a good test once I figure out why I can't compile the script. I'm looking around to see if I can find what's causing the problem, but no luck yet.

dlakelan · February 22, 2020, 4:46am

There are a bunch of Erlang packages... let me see, perhaps the standard one only offers the interpreter... yeah, it looks like you probably need to install erlang-compiler

EDIT: also, https://github.com/exercism/erlang/issues/113 suggests installing erlang-tools

Please note that this is my first Erlang project (I started reading about the language a year ago but didn't have a project to do in it), so I'm learning how it works as we go along.

I would say that it's ok to start things at the upper end, but I'd recommend not having the lower end also be at the upper end... give it a little wiggle room, so for the upload direction maybe try 800,1024,1024
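
As a rough illustration of why the lower limit should sit below the upper one, here is a small sketch of a sanity check and a clamp for the lower/initial/upper triple. The names limits_ok and clamp_rate are hypothetical and the checks are my assumption of what "wiggle room" means here, not something sqmfeedback.erl does.

```shell
#!/bin/sh
# Hypothetical helpers; names and checks are assumptions for illustration.

# Succeeds only if lower < upper and lower <= initial <= upper.
# $1: lower, $2: initial, $3: upper (all in Kbit/s)
limits_ok() {
  if [ "$1" -lt "$3" ] && [ "$2" -ge "$1" ] && [ "$2" -le "$3" ]; then
    return 0
  fi
  return 1
}

# Clamp a proposed rate ($1) into the range [lower ($2), upper ($3)].
clamp_rate() {
  rate="$1"
  if [ "${rate}" -lt "$2" ]; then
    rate="$2"
  fi
  if [ "${rate}" -gt "$3" ]; then
    rate="$3"
  fi
  echo "${rate}"
}
```

With 800,1024,1024 the check passes and a reduction can move the rate anywhere down to 800; with 1024,1024,1024 there is nothing left to adjust.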

Bndwdthseekr · February 22, 2020, 5:33am

I did try that, and still no luck. I also tried a few other packages which I thought might be related to this issue, then I got ticked and installed everything Erlang-related in the repo... still nothing