SQM autorate-ingress: Can I set thresholds for this?

Mmmeh, 8.8.8.8 and also gstatic.com are reasonably okay, but in a pinch they will deprioritize ICMP echo generation and do strange things with ping in general (like responding to large ICMP echo requests with short responses). The bigger issue, really, is that unless one can guarantee that congestion only hits in the downstream, two one-way delay measurements are much better than a single round-trip time, and almost no server will respond to ICMP type 13 requests with a type 14 response... That's why using irtt against a server under one's own control seems like a more robust design to me.
Then again, if the goal is "just" an improvement over the status quo, then sure, 8.8.8.8 and friends should get you started. :wink:

Statistics my friend, statistics.... :smiley:

My idea is to select 5 different, unconnected reflectors, establish a baseline, and then at each evaluation see how much each reflector increases over the baseline and take the median increment. It seems unlikely that they'd all deprioritize or otherwise futz with the response at once, and the median is extremely robust to outliers. If the median response increases more than a few ms then you're experiencing congestion.

Of course, how much value you're going to get from dropping your shaper speed when it's the upstream that's congested is anyone's guess. I would expect you'd mainly get some benefit if you are yourself using a fair amount of bandwidth, so that by reducing your speed you reduce the upstream congestion. This will only work if the other users don't just fill in the bandwidth you freed up.

Also, rather than rewrite the sqm config and reload, I'm thinking to just directly replace the IFB qdisc, which will be nicer to the flash.

So I'm thinking something like this (pseudocode)

pings = establish_baselines()

forever
  ping all the things
  calculate differences from baseline
  calculate m = median(differences)
  if (m > 10ms)
    x = 1 - rand()*0.2
    bw = bw * x
    bw = max(bw,myminbw)
    replace shaper with new bw
  else if (m < 3ms)
    x = 1+rand()*0.25
    bw = bw * x
    bw = min(bw, mymaxbw)
    replace shaper with new bw
  endif
  sleep 10
end
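
The "median of differences" step in the pseudocode above could be sketched in plain POSIX shell roughly like this. It is only an illustration: the helper names `median` and `median_increase` are made up here, RTTs are assumed to be whole milliseconds, and both lists are assumed to name the reflectors in the same order.

```sh
#!/bin/sh
# Hypothetical helpers, not part of any existing script.

# Print the median of the integer arguments
# (for an even count, the truncated mean of the two middle values).
median() {
  [ "$#" -gt 0 ] || return 1
  printf '%s\n' "$@" | sort -n | awk '
    { v[NR] = $1 }
    END {
      if (NR % 2) print v[(NR + 1) / 2]
      else print int((v[NR / 2] + v[NR / 2 + 1]) / 2)
    }'
}

# $1: baseline RTTs in ms, $2: current RTTs in ms (space separated, same order).
# Prints the median per-reflector increase over the baseline.
median_increase() {
  mi_baseline="$1"
  mi_diffs=""
  # Word splitting is intentional: one positional parameter per current RTT.
  set -- $2
  for mi_b in ${mi_baseline}; do
    mi_diffs="${mi_diffs} $(( $1 - ${mi_b} ))"
    shift
  done
  median ${mi_diffs}
}

# The third reflector misbehaves badly (+65 ms); the median ignores it.
median_increase "20 15 30 25 18" "22 16 95 27 19"   # prints 2
```

With five reflectors, up to two of them can report nonsense without moving the median, which is the robustness argument made above.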

You can change the settings of a running cake instance with:
tc qdisc change
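
As a rough sketch of how that might be wrapped, the helper below only prints the command so it can be inspected before anything is changed. The function name `cake_bw_cmd` is hypothetical, and `ifb4eth0` is just an assumed device name (sqm-scripts typically names the ingress IFB `ifb4<interface>`); substitute whatever `tc qdisc show` reports on your router.

```sh
#!/bin/sh
# Hypothetical helper: print the tc command that would retune a running cake instance.
# $1: device (e.g. the IFB used for ingress shaping)
# $2: new bandwidth in kbit/s (whole number)
cake_bw_cmd() {
  case "$2" in
    '' | *[!0-9]*)
      echo "bandwidth must be a whole number of kbit/s" >&2
      return 1
      ;;
  esac
  printf 'tc qdisc change root dev %s cake bandwidth %skbit\n' "$1" "$2"
}

# Dry run: show what would be executed.
cake_bw_cmd ifb4eth0 8000

# To actually apply it (needs root and an existing cake qdisc on that device):
#   cake_bw_cmd ifb4eth0 8000 | sh
```

Because `tc qdisc change` only touches the running qdisc, nothing is written to the sqm config, so the flash is left alone.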

Using multiple independent reflectors is a good idea; 1.1.1.1/Cloudflare, 8.8.8.8/Google, and 9.9.9.9/IBM might be a decent mix. The median will still work, as will voting. That still leaves the downstream/upstream question, and the other question you raised: is backing off actually a productive strategy? Given an automated mechanism, that should be relatively easy to test. The final point is policy: how deep into the network does one want to try to fix issues from one's own router? Sure, traffic shaping is great for controlling one's own internet access link, and might still be decent for, say, a DSLAM's/OLT's/CMTS's upstream, but it is probably not a good solution for a selective peering overload between one's own ISP and any other AS.... :wink:

By all means go and test; gargoyle has something like that. All my limited testing with ICMP reflectors on the internet made me turn away in disgust, but in retrospect I was not aiming for a decent "good enough" but was trying to measure a 0.250 millisecond delay increase...

Yeah, if you need that resolution you'll want your own custom reflector. But if you want to measure say a median of 20ms increase... I think this should be very doable.

After all the goal is to accept the occasional 5-15ms delays but avoid the

I think we may be able to repurpose some of the code I wrote for router performance monitoring... there wasn't much in the way of interest on that front, but it had a similar flavor..

For example it already has some code to ping stuff. We probably don't need a sqlite database, but it could just keep a collection of recent stats in memory.

Lua is really just not quite a real language... sigh

I'd love to do this in Erlang, which is all about high availability and fault tolerance and concurrency, and stuff like that, but it's not exactly lightweight, requiring 4.8MB of flash space. It would make things easy though, you just spawn off an erlang thread to read the latency from each ping responder, then have a loop that looks at the current latency compared to the historical latency and spawns off a thread to reduce the SQM speed when the increment is sufficiently high.

@Bndwdthseekr, since you're running extroot, are you up for an Erlang solution? I've been looking for an excuse to try it out anyway.

I'm just going to put some Erlang based design notes here since I am not going to be able to immediately start coding it, but don't want to lose the ideas:

There are two basic processes involved:

  1. Several processes that each ping a different reflector and detect a delay condition... Each is basically a loop that pings, reads the time, remembers the shortest say 5 times, and detects when the current time is more than say 20ms above the 5th shortest time for this reflector. When that happens it sends a "delayed" message to the adjuster thread...

  2. The adjuster thread. It waits for delayed messages. It counts delayed messages in a time window (say 30 seconds), and if it gets N or more of them in that window it multiplies the bandwidth by a random factor between 0.85 and 0.95... If it receives fewer than N-2 delayed messages in its time window it multiplies the bandwidth by a random factor between 1.05 and 1.15, respecting both a minimum and maximum bound. If it gets N-2 or N-1 it does nothing. I'm thinking 5 reflectors, N=3, but you could also do something like 7 reflectors, N=4.
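
The decision logic of the two processes above can be sketched without any Erlang. The shell below is only an illustration under stated assumptions: all function names (`is_delayed`, `decide`, `scale_bw`, `rand_percent`) are invented here, times are whole milliseconds, bandwidths are in kbit/s, and the "5th shortest" value is taken from whatever recent history the caller passes in.

```sh
#!/bin/sh
# Hypothetical helpers sketching the design notes; not the Erlang implementation.

# $1: current RTT (ms), $2: space-separated recent RTTs for this reflector,
# $3: allowed excess in ms (20 in the notes). Succeeds if the ping counts as delayed.
is_delayed() {
  id_ref="$(printf '%s\n' $2 | sort -n | sed -n '5p')"   # 5th shortest time
  [ -n "${id_ref}" ] || return 1                         # fewer than 5 samples so far
  [ "$1" -gt "$(( ${id_ref} + $3 ))" ]
}

# $1: delayed messages seen in the window, $2: N. Prints the action to take.
decide() {
  if [ "$1" -ge "$2" ]; then
    echo decrease
  elif [ "$1" -lt "$(( $2 - 2 ))" ]; then
    echo increase
  else
    echo hold
  fi
}

# $1: bandwidth, $2: factor in percent, $3: lower bound, $4: upper bound.
# Prints the scaled bandwidth, clamped to the bounds.
scale_bw() {
  sb_new="$(( $1 * $2 / 100 ))"
  [ "${sb_new}" -lt "$3" ] && sb_new="$3"
  [ "${sb_new}" -gt "$4" ] && sb_new="$4"
  echo "${sb_new}"
}

# $1, $2: inclusive percent range. Prints a random integer in that range.
# Note: awk's srand() seeds from the clock, so calls within the same second repeat.
rand_percent() {
  awk -v lo="$1" -v hi="$2" 'BEGIN { srand(); print lo + int(rand() * (hi - lo + 1)) }'
}

decide 3 3                      # prints decrease
decide 1 3                      # prints hold
scale_bw 10000 90 3000 12000    # prints 9000
```

A caller would combine them along the lines of `scale_bw "${bw}" "$(rand_percent 85 95)" "${min}" "${max}"` when `decide` prints `decrease`, and use `rand_percent 105 115` for `increase`.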

Required functions:

  1. pingstats(N,A) should ping address A N times and return a list of times in ms, basically requires calling os:cmd and parsing out the results.
  2. setbw(I,B) should set the bandwidth on the cake instance on interface I to value B in kbps, by calling os:cmd("tc qdisc change ...")
  3. parsesqm(F) This should basically parse out the sqm parameters from the file F (normally /etc/config/sqm) so it knows the upper bound and extra options in use.
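
The parsing half of `pingstats` is the same whichever language ends up calling `ping`. As a rough shell sketch (the function name `parse_ping_times` is made up, and the reply format is assumed to contain `time=<rtt> ms`, which is what busybox and iputils ping normally print):

```sh
#!/bin/sh
# Hypothetical helper: read ping output on stdin and print one RTT (ms) per reply line.
# Header and summary lines do not contain "time=<number> ms" and are skipped.
parse_ping_times() {
  sed -n 's/.*time=\([0-9.][0-9.]*\) *ms.*/\1/p'
}

# Canned sample output, so the sketch can be tried without network access.
parse_ping_times <<'EOF'
PING 1.1.1.1 (1.1.1.1): 56 data bytes
64 bytes from 1.1.1.1: seq=0 ttl=57 time=12.345 ms
64 bytes from 1.1.1.1: seq=1 ttl=57 time=11.902 ms
round-trip min/avg/max = 11.902/12.123/12.345 ms
EOF

# Live use would look something like:
#   ping -c 5 1.1.1.1 | parse_ping_times
```

Lost replies simply produce fewer output lines, so the caller can compare the line count against N to detect loss.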

Seems doable in a reasonable time.

Yes, but I find it quite interesting to use the limited busybox shell.

Here is my try to implement a ping monitor:

#!/bin/sh
#title		: SQM Adaptive Rate
#purpose    : Serve the cake
#author		: shm0
#date       : 01/03/2020
#version	: 0.5

LC_ALL=C

readonly base_dir="/var/shm/sqm_adaptive_rate"
readonly ping_monitor_dir="${base_dir}/ping_monitor"
readonly bandwidth_monitor_dir="${base_dir}/bandwidth_monitor"

init() {
  # Initialize File and Folder Structure
  rm -rf "${base_dir}" 2> /dev/null \
    || {
      printf "%s\n" "Error: Failed to remove ${base_dir}!" >&2
      exit 1
    }

  mkdir -p "${base_dir}" 2> /dev/null \
    || {
      printf "%s\n" "Error: Failed to create ${base_dir}!" >&2
      exit 1
    }

  mkdir -p "${ping_monitor_dir}" 2> /dev/null \
    || {
      printf "%s\n" "Error: Failed to create ${ping_monitor_dir}!" >&2
      exit 1
    }

  mkdir -p "${bandwidth_monitor_dir}" 2> /dev/null \
    || {
      printf "%s\n" "Error: Failed to create ${bandwidth_monitor_dir}!" >&2
      exit 1
    }

  return 0
}

deinit() {
  # Clean up on exit
  rm -rf "${base_dir}" 2> /dev/null \
    || {
      printf "%s\n" "Error: Failed to remove ${base_dir}!" >&2
      exit 1
    }

  exit 0
}

ping_monitor() {
  # Parameters
  # $1: Target to ping
  # $2: Amount of ping samples to keep
  # $3: Amount of time between pings
  local ping_target="${1}"
  local ping_samples="${2}"
  local ping_interval="${3}"
  # Local Variables
  local ping_sample_file="${ping_monitor_dir}/${ping_target}.spl"
  local ping_sample_list
  local ping_time
  local sample_count
  local i

  # FixMe
  # Print error, even when running in the background
  # Implement some better signal feedback to main function?
  touch "${ping_sample_file}" 2> /dev/null \
    || {
      printf "%s\n" "Error: Failed to create ${ping_sample_file}!" >&2
      kill ${$}
      exit 1
    }

  # Check if the sample file has been Initialized
  # If not fill with 0 x ping samples num separated by whitespace
  # So awk can process it
  if [ ! -s "${ping_sample_file}" ]; then
    i="1"
    ping_sample_list="0"
    while [ "${i}" -lt "${ping_samples}" ]; do
      ping_sample_list="$(printf "%s %s" "${ping_sample_list}" "0")"
      i="$((i + 1))"
    done
    printf "%s\n" "${ping_sample_list}" > "${ping_sample_file}"
  fi

  sample_count="1"
  while [ 1 = 1 ]; do
    ping_time="$(ping -q -c1 -W1 -A -s 56 -i 1 "${ping_target}" 2> /dev/null \
      | grep -Eo '\/[0-9]*\.[0-9]*\/' \
      | tr -d '/')"

    sleep "${ping_interval}"

    if [ -n "${ping_time}" ]; then
      ping_time="$(printf "%.0f" "${ping_time}")"
    else
      ping_time="999"
    fi

    ping_sample_list="$(awk '{ $'"${sample_count}"' = '"${ping_time}"';
                                  print $0 }' "${ping_sample_file}")"

    printf "%s\n" "${ping_sample_list}" > "${ping_sample_file}"

    if [ "${sample_count}" -ge "${ping_samples}" ]; then
      sample_count="1"
    else
      sample_count="$((sample_count + 1))"
    fi

  done
  return 0
}

bandwidth_monitor() {
  # Parameters
  # $1: Monitor Rates on this interface
  local interface="${1}"
  # Local Variables
  local rx_rate_file="${bandwidth_monitor_dir}/${interface}.rx"
  local tx_rate_file="${bandwidth_monitor_dir}/${interface}.tx"
  local RXPREV="-1"
  local TXPREV="-1"
  local RX
  local TX
  local BWRX
  local BWTX

  # FixMe
  # Print error, even when running in the background
  # Implement some better signal feedback to main function?
  touch "${rx_rate_file}" 2> /dev/null \
    || {
      printf "%s\n" "Error: Failed to create ${rx_rate_file}!" >&2
      kill ${$}
      exit 1
    }

  touch "${tx_rate_file}" 2> /dev/null \
    || {
      printf "%s\n" "Error: Failed to create ${tx_rate_file}!" >&2
      kill ${$}
      exit 1
    }

  while [ 1 = 1 ]; do
    RX="$(cat /sys/class/net/"${interface}"/statistics/rx_bytes)"
    TX="$(cat /sys/class/net/"${interface}"/statistics/tx_bytes)"

    if [ "${RXPREV}" -ne -1 ]; then
      BWRX="$(((RX - RXPREV) * 8 / 1000))"
      printf "%d\n" "${BWRX}" > "${rx_rate_file}"
    fi

    if [ "${TXPREV}" -ne -1 ]; then
      BWTX="$(((TX - TXPREV) * 8 / 1000))"
      printf "%d\n" "${BWTX}" > "${tx_rate_file}"
    fi

    RXPREV="${RX}"
    TXPREV="${TX}"
    sleep 1
  done

  return 0
}

get_ping_samples() {
  # Parameters
  local ping_target="${1}"
  # Local Variables
  local ping_sample_file="${ping_monitor_dir}/${ping_target}.spl"
  local ping_samples

  if ping_samples="$(cat "${ping_sample_file}" 2> /dev/null)" \
    && [ -n "${ping_samples}" ]; then
    printf "%s" "${ping_samples}"
    return 0
  else
    return 1
  fi
}

get_avg_median_ping() {
  # Parameters
  # $1: Ping target whose sample file should be evaluated
  local ping_target="${1}"
  # Local Variables
  local ping_sample_file="${ping_monitor_dir}/${ping_target}.spl"
  local ping_sample_list
  local ping_median_list
  local ping_sample_num
  local ping_avg

  if ! ping_sample_list="$(cat "${ping_sample_file}" 2> /dev/null)"; then
    return 1
  fi

  ping_sample_num="$(printf "%s\n" "${ping_sample_list}" \
    | wc -w)"

  ping_median_list="$(printf "%s\n" "${ping_sample_list}" \
    | tr " " "\n" \
    | sort -nr \
    | head -n -"$(((ping_sample_num + 10) / 10))" \
    | tail -n +"$((((ping_sample_num + 10) / 10) + 1))" \
    | tr "\n" " ")"

  # As long as the ping sample file is not completely filled
  # use normal avg ping calculation
  if printf "%s\n" "${ping_median_list}" | grep -Eoq '\b0\b'; then
    ping_median_list="${ping_sample_list}"
  fi

  ping_avg="$(printf "%s\n" "${ping_median_list}" \
    | awk '
        {
          # Sum all samples; zeros are unfilled slots and are
          # excluded from the divisor
          avg = 0; skip = 0
          for (i = 1; i <= NF; i++) {
            if ($i == 0)
              skip += 1
            avg += $i
          }
          if (avg != 0)
            avg /= (NF - skip)
          print avg
        }
      ')"

  if [ -n "${ping_avg}" ]; then
    printf "%.0f" "${ping_avg}"
    return 0
  else
    return 1
  fi
}

get_new_ping_target() {
  # Parameters
  local ping_target_curr="${1}"
  local ping_target_list="${2}"
  local blacklist_duration="${3}"
  # Local Variables
  local blacklist_target_file="${ping_monitor_dir}/${ping_target_curr}.blacklist"
  local blacklist_target_name
  local blacklist_target_timestamp
  local ping_sample_file
  local current_time
  local target

  touch "${blacklist_target_file}" 2> /dev/null \
    || {
      printf "%s\n" "Error: Failed to create ${blacklist_target_file}!" >&2
      exit 1
    }

  # Check if the current ping target can be removed from blacklist
  current_time="$(date '+%s')"
  blacklist_target_name="$(basename "${blacklist_target_file}" .blacklist)"
  ping_sample_file="${ping_monitor_dir}/${blacklist_target_name}.spl"

  if blacklist_target_timestamp="$(cat "${blacklist_target_file}" 2> /dev/null)" \
    && [ -n "${blacklist_target_timestamp}" ] \
    && [ "${blacklist_target_timestamp}" -lt "${current_time}" ]; then
    rm -rf "${blacklist_target_file}"
    rm -rf "${ping_sample_file}"
  else
    printf "%d\n" "$((current_time + blacklist_duration))" > "${blacklist_target_file}"
  fi

  # Check if we can re-add some ping_targets
  find "${ping_monitor_dir}" ! -name "$(printf "*\n*")" -name '*.blacklist' -maxdepth 1 > "${ping_monitor_dir}/blacklist.tmp"
  while IFS= read -r target; do
    blacklist_target_file="${target}"
    blacklist_target_name="$(basename "${target}" .blacklist)"
    blacklist_target_timestamp="$(cat "${target}")"
    ping_sample_file="${ping_monitor_dir}/${blacklist_target_name}.spl"

    if [ "${blacklist_target_timestamp}" -gt "${current_time}" ]; then
      # Escape the dots in IPs/Domains to make the second sed properly work
      blacklist_target_name="$(printf "%s\n" "${blacklist_target_name}" | sed 's/\./\\\./g')"
      ping_target_list="$(printf "%s\n" "${ping_target_list}" | sed 's/'"${blacklist_target_name}"'//g')"
    else
      rm -f "${blacklist_target_file}" 2> /dev/null
      rm -f "${ping_sample_file}" 2> /dev/null
    fi
  done < "${ping_monitor_dir}/blacklist.tmp"
  rm -f "${ping_monitor_dir}/blacklist.tmp" 2> /dev/null

  # Clean up string, just to be sure
  # Replace 2 or more whitespace characters with 1
  # Remove leading and trailing white space characters
  ping_target_list="$(printf "%s\n" "${ping_target_list}" | sed -E -e 's/\s{2,}/ /g' -e 's/(^\ |\ $)//g')"

  if [ -n "${ping_target_list}" ]; then
    printf "%s" "${ping_target_list}"
    return 0
  else
    return 1
  fi
}

get_rx_current_rate() {
  # Parameters
  local interface="${1}"
  # Local Variables
  local rx_rate_file="${bandwidth_monitor_dir}/${interface}.rx"
  local rx_rate

  if rx_rate="$(cat "${rx_rate_file}" 2> /dev/null)" \
    && [ -n "${rx_rate}" ]; then
    printf "%d" "${rx_rate}"
    return 0
  else
    return 1
  fi
}

get_tx_current_rate() {
  # Parameters
  local interface="${1}"
  # Local Variables
  local tx_rate_file="${bandwidth_monitor_dir}/${interface}.tx"
  local tx_rate

  if tx_rate="$(cat "${tx_rate_file}" 2> /dev/null)" \
    && [ -n "${tx_rate}" ]; then
    printf "%d" "${tx_rate}"
    return 0
  else
    return 1
  fi
}

get_rx_max_rate() {
  # Parameters
  # $1: Get max rx rate from sqm config for this interface
  local interface="${1}"
  # Local Variables
  local rx_max_rate

  if rx_max_rate="$(uci get sqm."${interface}".download 2> /dev/null)" \
    && [ -n "${rx_max_rate}" ]; then
    printf "%d" "${rx_max_rate}"
    return 0
  else
    return 1
  fi
}

get_tx_max_rate() {
  # Parameters
  # $1: Get max tx rate from sqm config for this interface
  local interface="${1}"
  # Local Variables
  local tx_max_rate

  if tx_max_rate="$(uci get sqm."${interface}".upload 2> /dev/null)" \
    && [ -n "${tx_max_rate}" ]; then
    printf "%d" "${tx_max_rate}"
    return 0
  else
    return 1
  fi
}

usage() {
  echo "Usage:"
  echo "Nothing here yet!"
  # Return instead of exiting so callers can choose the exit status
  return 0
}

main() {
  # Args
  # -i|--interface Interface: to adjust Rates on (as in sqm config)
  # -r|--rx-min-threshold: Minimum RX Rate in Percent, Default: 30
  # -t|--tx-min-threshold: Minimum TX Rate in Percent, Default: 30
  # -f|--reduce-factor: Reduce Factor in Percent, amount will be subtracted from current Rates, Default: 20
  # -l|--ping-limit: Maximum Ping Limit in ms, above this Limit Rates will be reduced, Default: auto
  # -z|--ping-targets: Space-separated list of targets to ping, Default: "1.1.1.1 8.8.8.8 9.9.9.9"
  # -s|--ping-samples: Amount of Ping Samples to keep, Default: 5
  # -c|--ping-interval: Amount of time in seconds between pings, Default: 1
  # -a|--ping-fails: Max failed pings before a ping target gets blacklisted, Default: 3
  # -d|--cooldown-time: Amount of time to keep new rates, Default: 3600
  # -b|--blacklist-duration: Amount of time to black list a failing ping target, Default 900 sec
  # -h|--help: Prints help text
  local interface=""
  local rx_min_rate_threshold="30"
  local tx_min_rate_threshold="30"
  local reduce_factor="20"
  local ping_limit="auto"
  local ping_target_list="1.1.1.1 8.8.8.8 9.9.9.9"
  local ping_samples="5"
  local ping_interval="1"
  local ping_max_ping_fail_cnt="3"
  local ping_target_blacklist_duration="900"
  local cooldown_time="3600"
  # Local Variables
  local ping_mode
  local ping_mode_auto_offset
  local ping_avg
  local ping_sample_list
  local ping_target_list_tmp
  local rx_current_rate
  local tx_current_rate
  local rx_current_max_rate
  local tx_current_max_rate
  local rx_min_rate
  local tx_min_rate
  local rx_max_rate
  local tx_max_rate
  local cooldown_counter
  local sleep_counter
  local sleep_time
  local ping_monitor_pid
  local status

  while [ -n "${1}" ]; do
    case "${1}" in
      -i | --interface)
        shift
        interface="${1}"
        ;;
      -r | --rx-min-threshold)
        shift
        rx_min_rate_threshold="${1}"
        ;;
      -t | --tx-min-threshold)
        shift
        tx_min_rate_threshold="${1}"
        ;;
      -f | --reduce-factor)
        shift
        reduce_factor="${1}"
        ;;
      -l | --ping-limit)
        shift
        ping_limit="${1}"
        ;;
      -z | --ping-targets)
        shift
        ping_target_list="${1}"
        ;;
      -s | --ping-samples)
        shift
        ping_samples="${1}"
        ;;
      -c | --ping-interval)
        shift
        ping_interval="${1}"
        ;;
      -a | --ping-fails)
        shift
        ping_max_ping_fail_cnt="${1}"
        ;;
      -d | --cooldown-time)
        shift
        cooldown_time="${1}"
        ;;
      -b | --blacklist-duration)
        shift
        ping_target_blacklist_duration="${1}"
        ;;
      -h | --help)
        usage
        exit 0
        ;;
      --)
        shift
        break
        ;;
      *)
        printf "%s" "Unrecognized option ${1}." && {
          usage
          exit 1
        }
        ;;
    esac
    shift
  done
  if ! uci -q get sqm."${interface}".interface > /dev/null 2>&1; then
    printf "%s" "Invalid interface specified!"
    exit 1
  fi

  if [ "$(uci -q get sqm."${interface}".enabled)" -eq "0" ]; then
    printf "%s" "SQM not enabled!"
    exit 1
  fi

  if [ "$(uci -q get sqm."${interface}".qdisc)" != "cake" ]; then
    printf "%s" "SQM is not using cake qdisc!"
    exit 1
  fi

  if ! echo "${rx_min_rate_threshold}" | grep -Eoq '^[0-9]+$' \
    || [ "${rx_min_rate_threshold}" -lt "1" ] \
    || [ "${rx_min_rate_threshold}" -gt "100" ]; then
    printf "%s" "Invalid minimum RX rate specified! (1% - 100%)"
    exit 64
  fi

  if ! echo "${tx_min_rate_threshold}" | grep -Eoq '^[0-9]+$' \
    || [ "${tx_min_rate_threshold}" -lt "1" ] \
    || [ "${tx_min_rate_threshold}" -gt "100" ]; then
    printf "%s" "Invalid minimum TX rate specified! (1% - 100%)"
    exit 64
  fi

  if ! echo "${reduce_factor}" | grep -Eoq '^[0-9]+$' \
    || [ "${reduce_factor}" -lt "1" ] \
    || [ "${reduce_factor}" -gt "99" ]; then
    printf "%s" "Invalid reduce factor specified! (1% - 99%)"
    exit 64
  fi

  if echo "${ping_limit}" | grep -Eoq '^auto.*$'; then
    ping_mode="auto"
    ping_mode_auto_offset="$({ echo "${ping_limit}" \
      | grep -Eo '(\+|\-)[0-9]+'; } \
      || echo "+0")"

    if [ "${ping_mode_auto_offset}" -gt "100" ]; then
      ping_mode_auto_offset="+100"
    fi

    if [ "${ping_mode_auto_offset}" -lt "-100" ]; then
      ping_mode_auto_offset="-100"
    fi
    ping_limit="0"
  fi

  if { ! echo "${ping_limit}" | grep -Eoq '(^[0-9]+$)' \
    || [ "${ping_limit}" -lt "1" ] \
    || [ "${ping_limit}" -gt "100" ]; } \
    && [ "${ping_mode}" != "auto" ]; then
    printf "%s" "Invalid maximum ping limit specified! (1 - 100 ms)"
    exit 64
  fi

  # Remove duplicate ping targets
  ping_target_list="$(echo "${ping_target_list}" \
    | tr ' ' '\n' \
    | sort -u \
    | tr '\n' ' ' \
    | sed -E -e 's/\s{2,}/ /g' -e 's/(^\ |\ $)//g')"

  for ping_target in ${ping_target_list}; do
    if ! echo "${ping_target}" \
      | grep -Eoq -e '\b(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}\b' \
        -e '([a-z0-9A-Z]\.)*[a-z0-9-]+\.([a-z0-9]{2,24})+(\.co\.([a-z0-9]{2,24})|\.([a-z0-9]{2,24}))*'; then
      printf "%s" "No valid ping target specified!"
      exit 64
    fi
  done

  if ! echo "${ping_samples}" | grep -Eoq '^[0-9]+$' \
    || [ "${ping_samples}" -lt "5" ] \
    || [ "${ping_samples}" -gt "20" ]; then
    printf "%s" "Invalid ping sample amount specified! (5 - 20)"
    exit 64
  fi

  if ! echo "${ping_interval}" | grep -Eoq '^[0-9]+$' \
    || [ "${ping_interval}" -lt "1" ] \
    || [ "${ping_interval}" -gt "5" ]; then
    printf "%s" "Invalid ping interval specified! (1 - 5 seconds)"
    exit 64
  fi

  if ! echo "${ping_max_ping_fail_cnt}" | grep -Eoq '^[0-9]+$' \
    || [ "${ping_max_ping_fail_cnt}" -lt "1" ] \
    || [ "${ping_max_ping_fail_cnt}" -gt "1000" ]; then
    printf "%s" "Invalid max ping failed ping amount specified! (1 - 1000)"
    exit 64
  fi

  if ! echo "${ping_target_blacklist_duration}" | grep -Eoq '^[0-9]+$' \
    || [ "${ping_target_blacklist_duration}" -lt "30" ] \
    || [ "${ping_target_blacklist_duration}" -gt "86400" ]; then
    printf "%s" "Invalid blacklist duration specified! (30 - 86400 seconds)"
    exit 64
  fi

  if ! echo "${cooldown_time}" | grep -Eoq '^[0-9]+$' \
    || [ "${cooldown_time}" -lt "1" ] \
    || [ "${cooldown_time}" -gt "86400" ]; then
    printf "%s" "Invalid cooldown time specified! (1 - 86400 seconds)"
    exit 64
  fi

  init

  bandwidth_monitor "${interface}" &

  trap "deinit" INT TERM
  trap "kill 0" EXIT

  rx_max_rate="$(get_rx_max_rate "${interface}" || echo "0")"
  tx_max_rate="$(get_tx_max_rate "${interface}" || echo "0")"
  rx_min_rate="$((rx_max_rate * rx_min_rate_threshold / 100))"
  tx_min_rate="$((tx_max_rate * tx_min_rate_threshold / 100))"
  rx_current_max_rate="${rx_max_rate}"
  tx_current_max_rate="${tx_max_rate}"

  status="INIT"
  ping_target="$(echo "${ping_target_list}" | awk '{ print $1 };')"
  ping_monitor_pid="-1"
  sleep_time="$(((ping_interval * ping_samples) + ping_interval))"
  sleep_counter="0"
  cooldown_counter="$((cooldown_time + 1))"
  display_width="$(((ping_samples * 3) + ((ping_samples - 1) + 3)))"

  while [ 1 = 1 ]; do
    ping_sample_list="$(get_ping_samples "${ping_target}" || echo "0")"
    ping_avg="$(get_avg_median_ping "${ping_target}" || echo "0")"
    rx_current_rate="$(get_rx_current_rate "${interface}" || echo "0")"
    tx_current_rate="$(get_tx_current_rate "${interface}" || echo "0")"

    clear && printf '\e[3J'

    printf "Current Rates        : %"$((display_width - 13))"s / %7s kbit/s (RX/TX)\n" \
      "${rx_current_rate}" "${tx_current_rate}"

    printf "Current Maximum Rates: %"$((display_width - 13))"s / %7s kbit/s (RX/TX)\n" \
      "${rx_current_max_rate}" "${tx_current_max_rate}"

    printf "Maximum Rates        : %"$((display_width - 13))"s / %7s kbit/s (RX/TX)\n" \
      "${rx_max_rate}" "${tx_max_rate}"

    printf "Minimum Rates        : %"$((display_width - 13))"s / %7s kbit/s (RX/TX)\n" \
      "${rx_min_rate}" "${tx_min_rate}"

    printf "Reduction Rates      : %"$((display_width - 13))"s / %7s kbit/s (RX/TX)\n" \
      "$((rx_current_max_rate * reduce_factor / 100))" \
      "$((tx_current_max_rate * reduce_factor / 100))"

    printf "Reduction Factor     : %"$((display_width - 3))"s %%\n" \
      "${reduce_factor}"

    printf "Ping Limit / Offset  : %"$((display_width - 3))"s ms %11s\n" \
      "${ping_limit}" "${ping_mode_auto_offset}  ms"

    printf "Ping (avg) / Interval: %"$((display_width - 3))"s ms %11s\n" \
      "${ping_avg}" "${ping_interval} sec"

    printf "Last Pings / Samples : %"$((display_width - 3))"s ms %11s\n" \
      "${ping_sample_list}" "${ping_samples} spl"

    printf "Ping Target          : %"$((display_width - 3))"s\n" \
      "${ping_target}"

    printf "Ping Monitor         : %"$((display_width - 3))"s\n" \
      "$({ [ "${ping_monitor_pid}" -ne "-1" ] && echo "ACTIVE"; } \
        || echo "INACTIVE")"

    printf "Cooldown             : %"$((display_width + 1))"s\n" \
      "$({ [ "${status}" = "IDLE" ] \
        && [ "${cooldown_counter}" -gt "0" ] \
        && [ "${cooldown_counter}" -lt "${cooldown_time}" ] \
        && echo "${cooldown_counter} sec"; } \
        || echo "INACTIVE    ")"

    printf "Status               : %"$((display_width - 3))"s\n" \
      "${status}"

    # Check if ping target is down
    if [ "$(echo "${ping_sample_list}" | grep -o '999' | wc -w)" -ge "${ping_max_ping_fail_cnt}" ]; then
      status="DOWN"
    fi

    case "${status}" in
      INIT)
        if [ "${sleep_counter}" -ge "${sleep_time}" ]; then
          if [ "${ping_mode}" = "auto" ]; then
            ping_limit="$(((((ping_avg + 5) * 2) / 10 * 10) + ping_mode_auto_offset))"
          fi

          if [ "${ping_limit}" -le "$((ping_avg + 10))" ]; then
            ping_limit="$((((ping_avg + 5) * 2) / 10 * 10))"
          fi

          if [ "${ping_limit}" -gt "100" ]; then
            ping_limit="100"
          fi

          sleep_counter="-1"
          status="IDLE"
        else
          if [ "${ping_monitor_pid}" -eq "-1" ]; then
            rx_current_max_rate="${rx_min_rate}"
            tx_current_max_rate="${tx_min_rate}"

            tc qdisc change root dev ifb4"${interface}" cake bandwidth "${rx_current_max_rate}"kbit
            tc qdisc change root dev "${interface}" cake bandwidth "${tx_current_max_rate}"kbit

            sleep "${sleep_time}"

            ping_monitor "${ping_target}" "${ping_samples}" "${ping_interval}" &
            ping_monitor_pid="${!}"
          fi
          sleep_counter="$((sleep_counter + 1))"
        fi
        ;;
      IDLE)
        if [ "${rx_current_max_rate}" -lt "${rx_max_rate}" ] \
          || [ "${tx_current_max_rate}" -lt "${tx_max_rate}" ]; then
          if [ "${cooldown_counter}" -gt "0" ] \
            && [ "${cooldown_counter}" -le "${cooldown_time}" ]; then
            cooldown_counter="$((cooldown_counter - 1))"
          else
            rx_current_max_rate="${rx_max_rate}"
            tx_current_max_rate="${tx_max_rate}"

            tc qdisc change root dev ifb4"${interface}" cake bandwidth "${rx_current_max_rate}"kbit
            tc qdisc change root dev "${interface}" cake bandwidth "${tx_current_max_rate}"kbit

            cooldown_counter="$((cooldown_time + 1))"
          fi
        fi

        if [ "${ping_monitor_pid}" -ne "-1" ]; then
          kill "${ping_monitor_pid}"
          ping_monitor_pid="-1"
        fi

        if [ "${rx_current_rate}" -ge "${rx_min_rate}" ] \
          || [ "${tx_current_rate}" -ge "${tx_min_rate}" ]; then
          status="MONITORING"
        fi

        ;;
      MONITORING)
        if [ "${ping_monitor_pid}" -eq "-1" ]; then
          ping_monitor "${ping_target}" "${ping_samples}" "${ping_interval}" &
          ping_monitor_pid="${!}"
        fi

        if [ "${rx_current_rate}" -ge "${rx_min_rate}" ] \
          || [ "${tx_current_rate}" -ge "${tx_min_rate}" ]; then
          if [ "${ping_avg}" -ge "${ping_limit}" ]; then
            cooldown_counter="${cooldown_time}"
            status="CONGESTION"
          fi
        else
          status="IDLE"
        fi
        ;;
      CONGESTION)
        if [ "${rx_current_rate}" -ge "${rx_min_rate}" ]; then
          rx_current_max_rate="$((rx_current_max_rate - (rx_current_max_rate * reduce_factor / 100)))"

          if [ "${rx_current_max_rate}" -le "${rx_min_rate}" ]; then
            rx_current_max_rate="${rx_min_rate}"
          fi

          tc qdisc change root dev ifb4"${interface}" cake bandwidth "${rx_current_max_rate}"kbit

          status="WAITING"
        elif [ "${tx_current_rate}" -ge "${tx_min_rate}" ]; then
          tx_current_max_rate="$((tx_current_max_rate - (tx_current_max_rate * reduce_factor / 100)))"

          if [ "${tx_current_max_rate}" -le "${tx_min_rate}" ]; then
            tx_current_max_rate="${tx_min_rate}"
          fi

          tc qdisc change root dev "${interface}" cake bandwidth "${tx_current_max_rate}"kbit

          status="WAITING"
        else
          status="MONITORING"
        fi
        ;;
      WAITING)
        if [ "${sleep_counter}" -ge "${sleep_time}" ]; then
          sleep_counter="0"
          status="MONITORING"
        else
          sleep_counter="$((sleep_counter + 1))"
        fi
        ;;
      DOWN)
        if [ "${ping_monitor_pid}" -ne "-1" ]; then
          kill "${ping_monitor_pid}"
          ping_monitor_pid="-1"
        fi

        while ! ping_target_list_tmp="$(get_new_ping_target \
          "${ping_target}" \
          ''"${ping_target_list}"'' \
          "${ping_target_blacklist_duration}")"; do

          wait_time="$((ping_target_blacklist_duration + 60))"

          while [ "${wait_time}" -gt "0" ]; do
            clear && printf '\e[3J'
            printf "%s\n" "No more ping targets left!"
            printf "%s\n" "Waiting ${wait_time} seconds!"
            wait_time="$((wait_time - 1))"
            sleep 1
          done
        done

        ping_target="$(echo "${ping_target_list_tmp}" | awk '{ print $1 }')"
        sleep_counter="0"
        status="INIT"
        ;;
      *)
        status="IDLE"
        ;;
    esac

    sleep 1
  done

  exit 0
}

main "${@}"


Args

-i|--interface Interface: to adjust Rates on (as in sqm config)
-r|--rx-min-threshold: Minimum RX Rate in Percent, Default: 30
-t|--tx-min-threshold: Minimum TX Rate in Percent, Default: 30
-f|--reduce-factor: Reduce Factor in Percent, amount will be subtracted from current Rates, Default: 20
-l|--ping-limit: Maximum Ping Limit in ms, above this Limit Rates will be reduced, Default: auto
-z|--ping-targets: Targets to Ping, Default: "1.1.1.1 8.8.8.8 9.9.9.9"
-s|--ping-samples: Amount of Ping Samples to keep, Default: 5
-c|--ping-interval: Amount of time in seconds between pings, Default: 1
-a|--ping-fails: Max failed pings before a ping target gets blacklisted, Default: 3
-d|--cooldown-time: Amount of time in seconds to keep new rates, Default: 3600
-b|--blacklist-duration: Amount of time in seconds to blacklist a failing ping target, Default: 900
-h|--help: Prints help text

// Changelog 01.03.2020

  • Added a simplistic round robin ping target fail over. When a ping target fails to respond 3 times it gets blacklisted for 15min. When no more ping targets are available the script will wait for the same amount as the blacklist time + 1 min.
  • Multiple ping targets can now be passed; enclose the list in double quotes.
    For example: script.sh -z "1.1.1.1 8.8.8.8 9.9.9.9"

Well, I'm fiddling with Erlang, hopefully not while Rome burns. I've got it pinging things, and I'm hoping I can extract the ping times using regexes after a little more playing tomorrow morning.

Pretty sure once I have ping times I can get it to do the math... To get started I think I'll simply put the sqm command line into the erlang code manually, with placeholders for the parameters... easier than writing a parser for config files.

Sure, I have an insane amount of storage, so I'm for sure willing to give it a go.

I'm reading through this thread, and there's a lot of info for me to digest. I appreciate everyone's input so far! I'm going to be working tonight, so not sure if I will have the chance to do anything with this until later this morning, but I am still here, and will be checking back in as soon as I have time xD

Good! we'll try it out... I don't have easy access to a testing situation here, but we'll see what we can do.

I'm not ready yet, but I do have a couple pages of Erlang code... so far it can ping sites and collect the timing statistics. It can update the bandwidth, given the appropriate tc command, and there's some slightly twisted logic to monitor a number of sites and decide to trigger a bandwidth reduction/increase based on a sufficient number of sites with delays.... But none of it is tested, and some of it isn't quite finished...

It's a good project though, so thanks for the opportunity to test it out. I've been thinking about this kind of feedback mechanism for a while now. There are plenty of people with this variable bandwidth issue.

Ok. I've pushed the first draft of a script... https://github.com/dlakelan/routerperf/blob/master/sqmfeedback.erl

here's how you should test it..

  1. download onto your router in /root
  2. make sure you have installed erlang
  3. edit the file near the bottom: change the interface names to the ones used by your upstream and downstream interfaces, and set the bandwidths (in Kbit/s). The three bandwidths for each interface are the lower, initial, and upper limits.
  4. Then, you'll need to compile and run it.
erlc sqmfeedback.erl
erl -noshell -s sqmfeedback main

Now it will print some logs and monitoring info about what it's up to. I can't promise it works or even that it doesn't break everything. I can say it hasn't got any malicious code, which you should be able to see fairly straightforwardly. Basically it spawns off a bunch of threads to ping a number of big internet sites using names (so it'll use ipv6 if available). There's also a thread that just waits to get info about delays, and if enough of the sites have a delay, it asks another thread to reduce the bandwidth. It'll increase the bandwidth if there are no delays. Every time it monkeys with your bandwidth it should print a message about the command it's running.
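
The decision part of that description, stripped of all the process plumbing, boils down to something like the following. This is a Python sketch and not the Erlang code itself; the function name `decide`, the 10 ms threshold, and the "at least 3 sites" rule are placeholder assumptions, so check the actual script for what it really uses.

```python
def decide(baselines, latest, threshold_ms=10.0, min_delayed=3):
    """Return "decrease", "increase", or "hold" from per-site ping times in ms.

    baselines and latest map a site name to a round-trip time.
    threshold_ms and min_delayed are illustrative values, not the script's.
    """
    delayed = sum(
        1
        for site, rtt in latest.items()
        if rtt - baselines[site] > threshold_ms
    )
    if delayed >= min_delayed:
        return "decrease"   # enough sites show extra delay: shrink the shaper
    if delayed == 0:
        return "increase"   # nobody is delayed: probe for more bandwidth
    return "hold"           # a few outliers only: leave the rate alone


baselines = {"a": 10.0, "b": 12.0, "c": 9.0, "d": 20.0}
print(decide(baselines, {"a": 45.0, "b": 50.0, "c": 40.0, "d": 21.0}))
```

Counting sites rather than averaging keeps one reflector that deprioritizes ICMP from triggering a rate change on its own.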

As I say, be prepared to have it cause breakage etc, and we can monkey with it to be more fault tolerant and easier to use later. Let's just see if we can get an erlang script to work and ping things and maybe even amazingly adjust bandwidth!

For me, it does run and ping and periodically check the delays.

@moeller0 you might also enjoy playing with this in your copious spare time :rofl:


when I run: erlc sqmfeedback.erl this is what I get. I'm pretty sure I'm not missing any dependencies though :confused:

{"init terminating in do_boot",{'cannot get bootfile','no_dot_erlang.boot'}}
init terminating in do_boot ({cannot get bootfile,no_dot_erlang.boot})

I'm not sure if setting all 3 at the same value for egress will muck things up, but I almost never have any fluctuation in my upstream, which I'm very thankful for.

    monitor_ifaces([{"tc qdisc change root dev pppoe-wan cake bandwidth ~BKbit diffserv4 dual-srchost overhead 34 ", 1024, 1024, 1024},
		    {"tc qdisc change root dev pppoe-wan cake bandwidth ~BKbit diffserv4 dual-dsthost nat overhead 34 ingress",1024,6944,6944}],

Here's a typical Friday night, first using a limit of 2080kbps, then switching to a limit of 1200kbps. Not sure what the rest of the fam is doing online, but after setting the limit lower, I was streaming 480p youtube, and it seems to be handling that along with everyone else's usage pretty well. While still not perfect, that shows the difference it makes, and I can live with that! xD

There are somewhere between five and fifteen devices actively using bandwidth 24/7, so this will be a good test once I figure out why I can't compile the script. I'm looking around to see if I can find what's causing the problem, but no luck yet.


There are a bunch of erlang packages... let me see, perhaps the standard one only offers the interpreter... yeah, it looks like you probably need to install erlang-compiler

EDIT: also, https://github.com/exercism/erlang/issues/113 suggests trying to install erlang-tools

please note that this is my first erlang project (I started reading about the language a year ago but didn't have a project to do in it), so I'm learning how it works while we go along :wink:

I would say that it's ok to start things at the upper end, but I'd recommend not having the lower end also be at the upper end... give it a little wiggle room, so for the upload direction maybe try 800,1024,1024
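
To see why identical limits leave no room to move, here is a small Python sketch of stepping a rate and clamping it between the lower and upper values from a `monitor_ifaces` tuple. The function names and the 0.9 step factor are assumptions for illustration; the thread does not show how the Erlang script actually steps the rate. The `~B` in the command string is Erlang's integer format directive, which the sketch simply substitutes.

```python
def next_rate(current, lower, upper, direction, factor=0.9):
    """Step the rate down or up multiplicatively, staying within [lower, upper].

    factor is an illustrative step size, not taken from the script.
    """
    if direction == "decrease":
        proposed = int(current * factor)
    elif direction == "increase":
        proposed = int(current / factor)
    else:
        proposed = current
    return max(lower, min(upper, proposed))


def render(template, kbit):
    """Fill the Erlang-style ~B integer placeholder in a tc command template."""
    return template.replace("~B", str(kbit))


template = ("tc qdisc change root dev pppoe-wan cake bandwidth ~BKbit "
            "diffserv4 dual-srchost overhead 34 ")

# With 1024,1024,1024 a decrease is clamped straight back to 1024.
print(next_rate(1024, 1024, 1024, "decrease"))
# With 800,1024,1024 the same decrease has room to take effect.
print(render(template, next_rate(1024, 800, 1024, "decrease")))
```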

I did try that, and still no luck. I did try a few other packages which I thought may be related to this issue as well, then I got ticked and installed everything erlang related on the repo... still nothing :confused:

ok, I'll get it running on an OpenWrt vm and see if I can figure out what's up. will have to be tomorrow though.

In the meantime, what happens if you run the erl command without the erlc compile step?

I didn't try that until now, here's what I got:

root@Main_SQM:~# erl sqmfeedback.erl
Erlang/OTP 21 [erts-10.0] [source] [smp:1:1] [ds:1:1:10] [async-threads:1]

Eshell V10.0  (abort with ^G)
1>

try that one instead

Hmm...

root@Main_SQM:~# erl -noshell -s sqmfeedback main
{"init terminating in do_boot",{undef,[{sqmfeedback,main,[],[]},{init,start_em,1,[]},{init,do_boot,3,[]}]}}
init terminating in do_boot ({undef,[{sqmfeedback,main,[],[]},{init,start_em,1,[]},{init,do_boot,3,[]}]})

Crash dump is being written to: erl_crash.dump...done

No rush man, I appreciate the help to begin with xD
I'll do some research on my own as well and see if I can come up with anything.

yeah it's weird. I'll have to try it in the VM, it all works fine on my desktop machine. thanks for being the guinea pig.

you could run erl by itself and then type c(sqmfeedback). (note the period) at the erlang shell but I'm guessing you'll get more of the same.

ahh, this does shed more light on the issue...

1> c(sqmfeedback).
sqmfeedback.erl:46: Warning: variable 'Time' is unused
sqmfeedback.erl:74: Warning: variable 'T' is unused
sqmfeedback.erl:80: Warning: variable 'SitePids' is unused
sqmfeedback.erl:83: Warning: variable 'TimerPid' is unused
{ok,sqmfeedback}

actually, it doesn't, that was a clean compile, those warnings are just there to help you catch bugs in case you were supposed to use a variable but didn't. in this case I just assigned some stuff but didn't use the assigned value... now I'm really confused because it compiled fine :man_shrugging:

My ISP pretty much gives me stable rates with decent peering/transit, so I am confident that would rarely/never trigger on my link. (And I do not consider this to be a reasonable solution for the fact that on my link the sync speed varies between retrains, as this is only happening every few days and speeds are reliable while the link is up). So yes I am tempted to play with that, although I am also puzzled by selecting erlang for this task, that seems like using ICBMs to drive away a flock of sparrows :wink: