Gotcha. Should the free check space be hard coded and if so what value to use?
Not sure; the easiest would be to punt that to the user ;).
But thinking this over, I think the free space test cannot be used for initiating rotations*, but only as a circuit breaker that stops logging with a final error message (also piped to the OpenWrt log, which can be accessed via logread) informing the user that autorate stopped logging to avoid causing a critical condition and that the user should fix things.
*) It would work on the first rotation, when the previous old file gets deleted and space reappears, but I see a problem if by that method the log files become increasingly smaller, so we might end up rotating on every write... which might still be preferable to no logging, but that does not seem a healthy thing to do.
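A minimal sketch of such a circuit breaker, just to make the idea concrete. The threshold value, the variable names and both function names are assumptions for illustration; none of them exist in cake-autorate.

```bash
#!/bin/bash
# Sketch only: log_min_free_kb, logging_enabled and both functions are hypothetical.
log_min_free_kb=1024	# assumed threshold; the thread leaves the value to the user
logging_enabled=1

get_free_kb()
{
	# print the space (in KiB) still available on the filesystem holding $1
	df -Pk "$1" | awk 'NR==2 { print $4 }'
}

check_log_space()
{
	# circuit breaker: $1 = free space in KiB, $2 = minimum required in KiB
	# on a breach, disable logging and emit one final message
	local free_kb=$1 min_free_kb=$2
	[ "$free_kb" -ge "$min_free_kb" ] && return 0
	if [ "$logging_enabled" -eq 1 ]; then
		logging_enabled=0
		printf 'ERROR: only %s KiB free (< %s KiB); logging stopped, please free up space.\n' \
			"$free_kb" "$min_free_kb" >&2
	fi
	return 1
}
```

It could be called before each log write as `check_log_space "$(get_free_kb /tmp)" "$log_min_free_kb"`. On OpenWrt the final message could additionally be handed to `logger` so that it shows up in logread.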
Now would be a good time to think about how to design alternative delay collection functions.
I think something with an initialize_pingers() function that handles the differences and starts the pinger(s) might work, so that multiple individual processes for ip-utils ping and hping3 can be selected or a single pinger with interleaved addresses for fping.
I really think it is worth the hassle to keep it possible to switch the delay source via a simple configurable parameter (maybe even at run-time, e.g. if all ICMP timestamp sources fail, fall back to ICMP echo requests).
The obvious change is to switch RTT sources to reporting pseudo OWDs (fill both with the RTT value) and make the rest of the code treat incoming and outgoing delays separately.
What we have at present in the main branch is actually not that far off from this.
We have the main loop read from a global 'ping_fifo' like this:
while read -t $global_ping_response_timeout_s -r timestamp reflector seq rtt_baseline_us rtt_us rtt_delta_us
do
# do stuff
done</tmp/cake-autorate/ping_fifo
Therefore, so long as we have processes write out to the 'ping_fifo' in the format:
timestamp reflector seq rtt_baseline_us rtt_us rtt_delta_us
Then the main loop doesn't care what feeds it.
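To illustrate the point, here is a small self-contained sketch in which two different producers feed the same consumer. The `emit_*` functions and the sample values are made up for the example; in the real script the pipe would be the 'ping_fifo'.

```bash
#!/bin/bash
# Sketch only: the emit_* functions are hypothetical stand-ins for real pinger wrappers,
# and the record values are invented for illustration.

emit_record()
{
	# write one record in the common format shared by all ping utilities
	printf '%s %s %s %s %s %s\n' "$1" "$2" "$3" "$4" "$5" "$6"
}

emit_from_iputils_ping() { emit_record 1663845057.100000 8.8.8.8 648 5317 6440 1123; }
emit_from_fping()        { emit_record 1663845057.150000 1.1.1.1 649 5436 7270 1834; }

process_records()
{
	# stand-in for the main loop: it depends only on the record format
	local timestamp reflector seq rtt_baseline_us rtt_us rtt_delta_us
	while read -r timestamp reflector seq rtt_baseline_us rtt_us rtt_delta_us
	do
		printf '%s seq=%s delta_us=%s\n' "$reflector" "$seq" "$rtt_delta_us"
	done
}

# both producers write into the same pipe; the consumer cannot tell them apart
{ emit_from_iputils_ping; emit_from_fping; } | process_records
```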
Right now for the iputils-ping case I have each ping process write out to its own fifo, and those are read in to write out to the global fifo:
monitor_reflector_responses()
{
# maintain baseline and output deltas to a common fifo
local pinger=$1
local rtt_baseline_us=$2
while read -r timestamp _ _ _ reflector seq_rtt
do
# If no match then skip onto the next one
[[ $seq_rtt =~ icmp_[sr]eq=([0-9]+).*time=([0-9]+)\.?([0-9]+)?[[:space:]]ms ]] || continue
seq=${BASH_REMATCH[1]}
rtt_us=${BASH_REMATCH[3]}000
rtt_us=$((${BASH_REMATCH[2]}000+10#${rtt_us:0:3}))
reflector=${reflector//:/}
rtt_delta_us=$(( $rtt_us-$rtt_baseline_us ))
alpha=$(( (( $rtt_delta_us >=0 )) ? $alpha_baseline_increase : $alpha_baseline_decrease ))
rtt_baseline_us=$(( ( (1000-$alpha)*$rtt_baseline_us+$alpha*$rtt_us )/1000 ))
printf '%s %s %s %s %s %s\n' "$timestamp" "$reflector" "$seq" "$rtt_baseline_us" "$rtt_us" "$rtt_delta_us" > /tmp/cake-autorate/ping_fifo
printf '%s' "${timestamp//[[\[\].]}" > /tmp/cake-autorate/reflector_${pinger}_last_timestamp_us
done</tmp/cake-autorate/pinger_${pinger}_fifo
}
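The baseline tracking inside that function is an exponentially weighted moving average with two different weights (in per mille). Isolated as a sketch, with a hypothetical function name and example alpha values that are only assumptions:

```bash
#!/bin/bash
# Sketch only: update_baseline_us is a hypothetical helper, not part of the script.

update_baseline_us()
{
	# $1 = current baseline (us), $2 = new sample (us)
	# $3 = alpha (per mille) used when the sample is at or above the baseline
	# $4 = alpha (per mille) used when the sample is below the baseline
	local baseline_us=$1 sample_us=$2 alpha_increase=$3 alpha_decrease=$4 alpha
	if [ "$sample_us" -ge "$baseline_us" ]; then
		alpha=$alpha_increase
	else
		alpha=$alpha_decrease
	fi
	# weighted average of the old baseline and the new sample, in integer arithmetic
	printf '%s\n' $(( ((1000 - alpha) * baseline_us + alpha * sample_us) / 1000 ))
}

update_baseline_us 5000 36000 1 900	# slow rise: (999*5000 + 1*36000)/1000 = 5031
update_baseline_us 5000 4000 1 900	# fast fall: (100*5000 + 900*4000)/1000 = 4100
```

With a small alpha for increases and a large one for decreases, a delay spike barely moves the baseline, while a genuinely lower delay is adopted almost immediately.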
So perhaps I should modify the fping approach to actually use this format (i.e. the same main loop format) to help establish a common format to easily plug and play different utilities.
Start and stop functions can be provided as part of 'maintain_pingers()', which for the main branch already has e.g.:
start_pinger_next_pinger_time_slot()
{
# wait until next pinger time slot and start pinger in its slot
# this allows pingers to be stopped and started (e.g. during sleep or reflector rotation)
# whilst ensuring pings will remain spaced out appropriately to maintain granularity
local pinger=$1
local -n pinger_pid=$2
t_start_us=${EPOCHREALTIME/./}
time_to_next_time_slot_us=$(( ($reflector_ping_interval_us-($t_start_us-$pingers_t_start_us)%$reflector_ping_interval_us) + $pinger*$ping_response_interval_us ))
sleep_remaining_tick_time $t_start_us $time_to_next_time_slot_us
if (($debug)); then
ping -D -i $reflector_ping_interval_s ${reflectors[$pinger]} > /tmp/cake-autorate/pinger_${pinger}_fifo &
pinger_pid=$!
else
ping -D -i $reflector_ping_interval_s ${reflectors[$pinger]} > /tmp/cake-autorate/pinger_${pinger}_fifo 2> /dev/null &
pinger_pid=$!
fi
monitor_reflector_responses $pinger ${rtt_baselines_us[$pinger]} &
}
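The slot arithmetic in that function can be checked in isolation. A sketch with a hypothetical helper name and made-up numbers (four pingers, 300 ms per-reflector interval, hence 75 ms between pingers):

```bash
#!/bin/bash
# Sketch only: the helper name and the example numbers are assumptions.

time_to_next_time_slot_us()
{
	# $1 = now (us), $2 = common pinger start time (us)
	# $3 = per-reflector ping interval (us), $4 = spacing between pingers (us)
	# $5 = pinger index
	local now_us=$1 start_us=$2 interval_us=$3 spacing_us=$4 pinger=$5
	# time left in the current interval plus this pinger's offset within the next one
	printf '%s\n' $(( (interval_us - (now_us - start_us) % interval_us) + pinger * spacing_us ))
}

time_to_next_time_slot_us 1250000 0 300000 75000 0	# prints 250000
time_to_next_time_slot_us 1250000 0 300000 75000 2	# prints 400000
```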
and
kill_pingers()
{
for (( pinger=0; pinger<$no_pingers; pinger++))
do
kill ${pinger_pids[$pinger]} 2> /dev/null
[[ -p /tmp/cake-autorate/pinger_${pinger}_fifo ]] && rm /tmp/cake-autorate/pinger_${pinger}_fifo
done
exit
}
So we basically have:
- a) common global format for all ping utilities in main loop
- b) maintain_pingers() wrapper for every ping utility with appropriate start/stop functions. And maintain_pingers() is responsible for ensuring that the global fifo is written to correctly in format:
timestamp reflector seq rtt_baseline_us rtt_us rtt_delta_us
So @moeller0 I think the change you are suggesting is to switch from rtt to owd in the main loop and adapt accordingly in maintain_pingers() for the iputils-ping and fping cases.
Other than that, do we have all the essential elements in this format? If not, what shall we add in?
And maybe separate sourceable files for iputils-ping and fping in the main branch? Or alternatively separate functions, defined depending on a variable that selects the pinger, to keep things in one file and avoid installation complexity for the basic use case. And provide a template file for anyone to write their own wrapper?
I think you've been calling for this all along, but I realise now is the perfect time to do it as we now have four options: iputils-ping, fping, hping3 and also @tievolu's perl-based ping_ts. And I still have the generic-ish format in the main branch (albeit I didn't follow it in fping, but I totally see the value in making this all generic now).
Finally @patrakov well done in getting hping3 taken on!
@tievolu I have a question regarding your perl-based ping_ts. Would it be a lot of extra work to make it either offer round robin pinging or, alternatively, have each instance send its ping ECHOs at precisely defined points relative to a common clock? A problem @patrakov identified with multiple ping instances is that they tend to drift. So one needs either round robin with tightly defined intervals, like fping, or sends at fixed offsets relative to a common clock, so that the sends stay well spaced and good granularity is maintained. Hope this makes sense? So not just 'send with spacing X', but 'send with spacing X at offset Y from system clock position 0 seconds'. See my 'start_pinger_next_pinger_time_slot()' function above, which staggers the pings, but this only works for a while; then the spread drifts and the pings can even become synchronised in weird situations.
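To make the 'spacing X at offset Y' idea concrete, here is a sketch of how each send time could be derived from the clock itself rather than from the previous send, so that errors cannot accumulate. The function name is hypothetical and this is not how ping_ts currently works.

```bash
#!/bin/bash
# Sketch only: next_send_time_us is a hypothetical helper.

next_send_time_us()
{
	# $1 = now (us), $2 = spacing X (us), $3 = offset Y (us) from clock position 0
	# returns the first time strictly after now that lies on the grid Y, Y+X, Y+2X, ...
	local now_us=$1 spacing_us=$2 offset_us=$3
	printf '%s\n' $(( ((now_us - offset_us) / spacing_us + 1) * spacing_us + offset_us ))
}

# grid for X=300000, Y=75000 is 75000, 375000, 675000, 975000, 1275000, ...
next_send_time_us 1000123 300000 75000	# prints 1275000
```

Each instance would then sleep until its own next grid point (with bash 5, 'now' can be taken from `${EPOCHREALTIME/./}` as in the function above). Two instances with the same X and different Y keep their spacing for as long as they share the same system clock.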
@Lynx Here's the log:
https://drive.google.com/file/d/1Vhax8iFPjE4SEPaEsQH9B2aauduKES9O/view?usp=sharing
Let me know if you need anything else, and, if possible, show us your graphs, I'm curious.
Yes except we probably want:
timestamp reflector seq dl_owd_baseline_us dl_owd_us dl_owd_delta_us ul_owd_baseline_us ul_owd_us ul_owd_delta_us
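For RTT-only sources, the pseudo OWD idea (fill both directions with the RTT value) would amount to a filter like the following sketch. The function name is hypothetical and the sample record is invented.

```bash
#!/bin/bash
# Sketch only: rtt_records_to_pseudo_owd_records is a hypothetical helper.

rtt_records_to_pseudo_owd_records()
{
	# read RTT records on stdin and write records in the nine-field OWD format,
	# using each RTT value for both the download and the upload fields
	local timestamp reflector seq rtt_baseline_us rtt_us rtt_delta_us
	while read -r timestamp reflector seq rtt_baseline_us rtt_us rtt_delta_us
	do
		printf '%s %s %s %s %s %s %s %s %s\n' "$timestamp" "$reflector" "$seq" \
			"$rtt_baseline_us" "$rtt_us" "$rtt_delta_us" \
			"$rtt_baseline_us" "$rtt_us" "$rtt_delta_us"
	done
}

printf '%s\n' '1663845057.100000 8.8.8.8 648 5317 6440 1123' | rtt_records_to_pseudo_owd_records
```

The main loop could then treat download and upload delays separately regardless of whether the source delivers true OWDs or only RTTs.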
maybe just use
monitor_reflector_responses_iputils_ping_RTT(), monitor_reflector_responses_fping_RTT(), monitor_reflector_responses_hping3_OWD()
that is, create a specific function for each type and just construct the function name from variables:
cur_probe_binary="hping3"
cur_probe_type="OWD"
REFLECTOR_MONITOR_FUNCTION="monitor_reflector_responses_${cur_probe_binary}_${cur_probe_type}"
${REFLECTOR_MONITOR_FUNCTION} "$@"	# i.e. pass all call arguments as before
Yes, have the same service function, having a single fifo to multiplex the different results into and reading that from the main loop is a decent interface.
Yes, and it might be possible to avoid a switch statement by simply constructing the function name like shown above...
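A sketch of that dispatch with a check that the constructed function actually exists. The two monitor functions are empty stand-ins that only echo their arguments, and `run_reflector_monitor` is a hypothetical wrapper name.

```bash
#!/bin/bash
# Sketch only: the monitor functions below are stubs and run_reflector_monitor is hypothetical.

monitor_reflector_responses_iputils_ping_RTT() { echo "iputils_ping RTT monitor: $*"; }
monitor_reflector_responses_hping3_OWD()       { echo "hping3 OWD monitor: $*"; }

run_reflector_monitor()
{
	# $1 = probe binary, $2 = probe type, remaining arguments are passed through
	local monitor_function="monitor_reflector_responses_${1}_${2}"
	shift 2
	# command -v also succeeds for shell functions, so it can serve as an existence check
	if ! command -v "$monitor_function" > /dev/null 2>&1; then
		echo "ERROR: no monitor function ${monitor_function}" >&2
		return 1
	fi
	"$monitor_function" "$@"
}

run_reflector_monitor hping3 OWD 0 5000	# prints: hping3 OWD monitor: 0 5000
```

The non-zero return for an unknown combination gives the caller a hook for falling back to another delay source.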
That makes sense, once these become large enough a separate file might be useful to keep things readable.
+1; as you say now is a good time, because we should be able to actually test and exercise these new delay measurement methods.
Not sure what to make of this since it looks so different to the typical LTE graph we see. This is on cable? Seems like we see a sleep event followed by some upload usage and then bufferbloat in the middle, resulting in a drop to 150 Mbit/s.
Example of bufferbloat event on upload and surrounding data:
DATA 2022-09-22-08:10:56 1663845057 1663845057 827 36841 0 99 1663845057 8.8.8.8 648 5317 6440 1124 25377 0 dl_low ul_high 250000 37000
DATA 2022-09-22-08:10:56 1663845057 1663845057 830 36844 0 99 1663845057 8.8.4.4 648 4988 6140 1153 25377 0 dl_low ul_high 250000 37000
DATA 2022-09-22-08:10:56 1663845057 1663845057 830 36844 0 99 1663845057 1.1.1.1 649 5436 7270 1835 25377 0 dl_low ul_high 250000 37000
DATA 2022-09-22-08:10:56 1663845057 1663845057 830 36844 0 99 1663845057 1.0.0.1 649 5037 6900 1864 25377 0 dl_low ul_high 250000 37000
DATA 2022-09-22-08:10:57 1663845057 1663845057 830 36844 0 99 1663845057 8.8.8.8 649 5318 6720 1403 25377 0 dl_low ul_high 250000 37000
DATA 2022-09-22-08:10:57 1663845057 1663845057 822 36841 0 99 1663845057 8.8.4.4 649 4989 6560 1572 25377 0 dl_low ul_high 250000 37000
DATA 2022-09-22-08:10:57 1663845057 1663845057 822 36841 0 99 1663845057 1.1.1.1 650 5457 26800 21364 25377 0 dl_low ul_high 250000 37000
DATA 2022-09-22-08:10:57 1663845057 1663845057 822 36841 0 99 1663845057 1.0.0.1 650 5063 31700 26663 25377 1 dl_low ul_high 250000 37000
DATA 2022-09-22-08:10:57 1663845057 1663845057 822 36841 0 99 1663845057 8.8.8.8 650 5348 36300 30982 25377 2 dl_low_bb ul_high_bb 225000 33156
DATA 2022-09-22-08:10:57 1663845057 1663845057 670 33147 0 99 1663845057 8.8.4.4 650 5016 32000 27011 25421 3 dl_low_bb ul_high_bb 225000 33156
DATA 2022-09-22-08:10:57 1663845057 1663845057 670 33147 0 99 1663845057 1.1.1.1 651 5484 32600 27143 25421 4 dl_low_bb ul_high_bb 225000 33156
DATA 2022-09-22-08:10:57 1663845057 1663845057 670 33147 0 99 1663845057 1.0.0.1 651 5088 30700 25637 25421 4 dl_low_bb ul_high_bb 225000 33156
DATA 2022-09-22-08:10:57 1663845057 1663845057 670 33147 0 99 1663845057 8.8.8.8 651 5372 30200 24852 25421 3 dl_low_bb ul_high_bb 225000 33156
DATA 2022-09-22-08:10:57 1663845058 1663845058 765 31739 0 95 1663845058 8.8.4.4 651 5040 29400 24384 25421 2 dl_low_bb ul_high_bb 225000 33156
DATA 2022-09-22-08:10:57 1663845058 1663845058 765 31739 0 95 1663845058 1.1.1.1 652 5508 30000 24516 25421 1 dl_low ul_high 225000 33156
DATA 2022-09-22-08:10:57 1663845058 1663845058 765 31739 0 95 1663845058 1.0.0.1 652 5111 29000 23912 25421 0 dl_low ul_high 225000 33487
DATA 2022-09-22-08:10:57 1663845058 1663845058 765 31739 0 94 1663845058 8.8.8.8 652 5395 28400 23028 25417 0 dl_low ul_high 225000 33821
DATA 2022-09-22-08:10:57 1663845058 1663845058 772 33153 0 98 1663845058 8.8.4.4 652 5063 28300 23260 25414 0 dl_low ul_high 225000 34159
DATA 2022-09-22-08:10:57 1663845058 1663845058 772 33153 0 97 1663845058 1.1.1.1 653 5529 27500 21992 25410 0 dl_low ul_high 225000 34500
DATA 2022-09-22-08:10:57 1663845058 1663845058 772 33153 0 96 1663845058 1.0.0.1 653 5134 28700 23589 25406 0 dl_low ul_high 225000 34845
And here is another upload related bufferbloat event:
DATA 2022-09-22-08:12:47 1663845167 1663845167 76507 36676 62 99 1663845167 8.8.4.4 1199 5176 7900 2726 25427 0 dl_low ul_high 123269 37000
DATA 2022-09-22-08:12:47 1663845167 1663845167 76507 36676 62 99 1663845167 1.1.1.1 1200 5595 6240 645 25427 0 dl_low ul_high 123269 37000
DATA 2022-09-22-08:12:47 1663845167 1663845167 76507 36676 62 99 1663845167 1.0.0.1 1200 5752 7650 1899 25427 0 dl_low ul_high 123269 37000
DATA 2022-09-22-08:12:47 1663845167 1663845167 76507 36676 62 99 1663845167 8.8.8.8 1200 5158 6300 1143 25427 0 dl_low ul_high 124501 37000
DATA 2022-09-22-08:12:47 1663845167 1663845167 76695 36620 61 98 1663845167 8.8.4.4 1200 5187 17100 11924 25426 0 dl_low ul_high 124501 37000
DATA 2022-09-22-08:12:47 1663845168 1663845168 76695 36620 61 98 1663845168 1.1.1.1 1201 5619 30300 24705 25426 0 dl_low ul_high 124501 37000
DATA 2022-09-22-08:12:47 1663845168 1663845168 76695 36620 61 98 1663845168 1.0.0.1 1201 5801 55000 49248 25426 1 dl_low ul_high 124501 37000
DATA 2022-09-22-08:12:47 1663845168 1663845168 74539 27255 59 73 1663845168 8.8.8.8 1201 5196 43600 38442 25426 2 dl_low_bb ul_low_bb 112050 24529
DATA 2022-09-22-08:12:47 1663845168 1663845168 74539 27255 66 111 1663845168 8.8.4.4 1201 5213 31200 26013 25604 3 dl_low_bb ul_high_bb 112050 24529
DATA 2022-09-22-08:12:47 1663845168 1663845168 74539 27255 66 111 1663845168 1.1.1.1 1202 5651 38500 32881 25604 4 dl_low_bb ul_high_bb 112050 24529
DATA 2022-09-22-08:12:47 1663845168 1663845168 74539 27255 66 111 1663845168 1.0.0.1 1202 5831 36500 30699 25604 4 dl_low_bb ul_high_bb 112050 24529
DATA 2022-09-22-08:12:47 1663845168 1663845168 74539 27255 66 111 1663845168 8.8.8.8 1202 5226 36000 30804 25604 4 dl_low_bb ul_high_bb 112050 24529
DATA 2022-09-22-08:12:47 1663845168 1663845168 75036 24941 66 101 1663845168 8.8.4.4 1202 5243 35600 30387 25604 4 dl_low_bb ul_high_bb 112050 24529
DATA 2022-09-22-08:12:47 1663845168 1663845168 75036 24941 66 101 1663845168 1.1.1.1 1203 5683 38400 32749 25604 4 dl_low_bb ul_high_bb 112050 24529
DATA 2022-09-22-08:12:47 1663845168 1663845168 75036 24941 66 101 1663845168 1.0.0.1 1203 5860 35800 29969 25604 4 dl_low_bb ul_high_bb 100845 22076
DATA 2022-09-22-08:12:48 1663845168 1663845168 75036 24941 74 112 1663845168 8.8.8.8 1203 5259 38300 33074 25671 4 dl_low_bb ul_high_bb 100845 22076
DATA 2022-09-22-08:12:48 1663845168 1663845168 68454 23692 67 107 1663845168 8.8.4.4 1203 5275 37500 32257 25671 4 dl_low_bb ul_high_bb 100845 22076
DATA 2022-09-22-08:12:48 1663845168 1663845168 68454 23692 67 107 1663845168 1.1.1.1 1204 5716 38900 33217 25671 4 dl_low_bb ul_high_bb 100845 22076
DATA 2022-09-22-08:12:48 1663845168 1663845168 68454 23692 67 107 1663845168 1.0.0.1 1204 5891 37300 31440 25671 4 dl_low_bb ul_high_bb 100845 22076
DATA 2022-09-22-08:12:48 1663845168 1663845168 68454 23692 67 107 1663845168 8.8.8.8 1204 5290 36600 31341 25671 4 dl_low_bb ul_high_bb 100845 22076
DATA 2022-09-22-08:12:48 1663845168 1663845168 66174 21842 65 98 1663845168 8.8.4.4 1204 5305 35600 30325 25671 4 dl_low_bb ul_high_bb 100845 22076
DATA 2022-09-22-08:12:48 1663845168 1663845168 66174 21842 65 98 1663845168 1.1.1.1 1205 5748 38500 32784 25671 4 dl_low_bb ul_high_bb 100000 19657
DATA 2022-09-22-08:12:48 1663845168 1663845168 66174 21842 66 111 1663845168 1.0.0.1 1205 5923 37900 32009 25740 4 dl_low_bb ul_high_bb 100000 19657
DATA 2022-09-22-08:12:48 1663845168 1663845168 66174 21842 66 111 1663845168 8.8.8.8 1205 5323 38500 33210 25740 4 dl_low_bb ul_high_bb 100000 19657
DATA 2022-09-22-08:12:48 1663845168 1663845168 66634 20662 66 105 1663845168 8.8.4.4 1205 5338 38700 33395 25740 4 dl_low_bb ul_high_bb 100000 19657
DATA 2022-09-22-08:12:48 1663845169 1663845169 66634 20662 66 105 1663845169 1.1.1.1 1206 5780 38700 32952 25740 4 dl_low_bb ul_high_bb 100000 19657
DATA 2022-09-22-08:12:48 1663845169 1663845169 66634 20662 66 105 1663845169 1.0.0.1 1206 5954 37000 31077 25740 4 dl_low_bb ul_high_bb 100000 19657
DATA 2022-09-22-08:12:48 1663845169 1663845169 66634 20662 66 105 1663845169 8.8.8.8 1206 5354 36700 31377 25740 4 dl_low_bb ul_high_bb 100000 19657
DATA 2022-09-22-08:12:48 1663845169 1663845169 63897 19449 63 98 1663845169 8.8.4.4 1206 5369 37300 31962 25740 4 dl_low_bb ul_high_bb 100000 17504
DATA 2022-09-22-08:12:48 1663845169 1663845169 63897 19449 63 111 1663845169 1.1.1.1 1207 5814 40000 34220 25816 4 dl_low_bb ul_high_bb 100000 17504
DATA 2022-09-22-08:12:48 1663845169 1663845169 63897 19449 63 111 1663845169 1.0.0.1 1207 5986 38100 32146 25816 4 dl_low_bb ul_high_bb 100000 17504
DATA 2022-09-22-08:12:48 1663845169 1663845169 63897 19449 63 111 1663845169 8.8.8.8 1207 5386 38300 32946 25816 4 dl_low_bb ul_high_bb 100000 17504
DATA 2022-09-22-08:12:48 1663845169 1663845169 66006 17779 66 101 1663845169 8.8.4.4 1207 5401 38100 32731 25816 4 dl_low_bb ul_high_bb 100000 17504
DATA 2022-09-22-08:12:48 1663845169 1663845169 66006 17779 66 101 1663845169 1.1.1.1 1208 5846 38000 32186 25816 4 dl_low_bb ul_high_bb 100000 17504
DATA 2022-09-22-08:12:48 1663845169 1663845169 66006 17779 66 101 1663845169 1.0.0.1 1208 6016 36600 30614 25816 4 dl_low_bb ul_high_bb 100000 17504
DATA 2022-09-22-08:12:49 1663845169 1663845169 66006 17779 66 101 1663845169 8.8.8.8 1208 5416 35700 30314 25816 4 dl_low_bb ul_high_bb 100000 15753
DATA 2022-09-22-08:12:49 1663845169 1663845169 71880 17268 71 109 1663845169 8.8.4.4 1208 5434 39100 33699 25893 4 dl_low_bb ul_high_bb 100000 15753
DATA 2022-09-22-08:12:49 1663845169 1663845169 71880 17268 71 109 1663845169 1.1.1.1 1209 5879 39400 33554 25893 4 dl_low_bb ul_high_bb 100000 15753
DATA 2022-09-22-08:12:49 1663845169 1663845169 71880 17268 71 109 1663845169 1.0.0.1 1209 6048 38100 32084 25893 4 dl_low_bb ul_high_bb 100000 15753
DATA 2022-09-22-08:12:49 1663845169 1663845169 71880 17268 71 109 1663845169 8.8.8.8 1209 5449 38600 33184 25893 4 dl_low_bb ul_high_bb 100000 15753
DATA 2022-09-22-08:12:49 1663845169 1663845169 68792 15547 68 98 1663845169 8.8.4.4 1209 5465 37100 31666 25893 4 dl_low_bb ul_high_bb 100000 15753
DATA 2022-09-22-08:12:49 1663845169 1663845169 68792 15547 68 98 1663845169 1.1.1.1 1210 5910 37600 31721 25893 4 dl_low_bb ul_high_bb 100000 13992
DATA 2022-09-22-08:12:49 1663845169 1663845169 68792 15547 68 111 1663845169 1.0.0.1 1210 6081 39400 33352 25991 4 dl_low_bb ul_high_bb 100000 13992
DATA 2022-09-22-08:12:49 1663845169 1663845169 68792 15547 68 111 1663845169 8.8.8.8 1210 5483 39500 34051 25991 4 dl_low_bb ul_high_bb 100000 13992
DATA 2022-09-22-08:12:49 1663845169 1663845169 66765 14619 66 104 1663845169 8.8.4.4 1210 5499 40300 34835 25991 4 dl_low_bb ul_high_bb 100000 13992
DATA 2022-09-22-08:12:49 1663845170 1663845170 66765 14619 66 104 1663845170 1.1.1.1 1211 5945 41900 35990 25991 4 dl_low_bb ul_high_bb 100000 13992
DATA 2022-09-22-08:12:49 1663845170 1663845170 66765 14619 66 104 1663845170 1.0.0.1 1211 6113 38400 32319 25991 4 dl_low_bb ul_high_bb 100000 13992
DATA 2022-09-22-08:12:49 1663845170 1663845170 66765 14619 66 104 1663845170 8.8.8.8 1211 5517 39600 34117 25991 4 dl_low_bb ul_high_bb 100000 12592
DATA 2022-09-22-08:12:49 1663845170 1663845170 65001 13787 65 109 1663845170 8.8.4.4 1211 5532 38700 33201 26087 4 dl_low_bb ul_high_bb 100000 12592
DATA 2022-09-22-08:12:49 1663845170 1663845170 65001 13787 65 109 1663845170 1.1.1.1 1212 5980 41600 35655 26087 4 dl_low_bb ul_high_bb 100000 12592
DATA 2022-09-22-08:12:49 1663845170 1663845170 65001 13787 65 109 1663845170 1.0.0.1 1212 6144 37700 31587 26087 4 dl_low_bb ul_high_bb 100000 12592
DATA 2022-09-22-08:12:49 1663845170 1663845170 65001 13787 65 109 1663845170 8.8.8.8 1212 5550 39500 33983 26087 4 dl_low_bb ul_high_bb 100000 12592
DATA 2022-09-22-08:12:49 1663845170 1663845170 63766 12424 63 98 1663845170 8.8.4.4 1212 5564 38400 32868 26087 4 dl_low_bb ul_high_bb 100000 12592
DATA 2022-09-22-08:12:49 1663845170 1663845170 63766 12424 63 98 1663845170 1.1.1.1 1213 6012 38500 32520 26087 4 dl_low_bb ul_high_bb 100000 12592
DATA 2022-09-22-08:12:49 1663845170 1663845170 63766 12424 63 98 1663845170 1.0.0.1 1213 6174 37000 30856 26087 4 dl_low_bb ul_high_bb 100000 11181
DATA 2022-09-22-08:12:50 1663845170 1663845170 63766 12424 63 111 1663845170 8.8.8.8 1213 5583 39100 33550 26209 4 dl_low_bb ul_high_bb 100000 11181
DATA 2022-09-22-08:12:50 1663845170 1663845170 67288 12003 67 107 1663845170 8.8.4.4 1213 5598 39900 34336 26209 4 dl_low_bb ul_high_bb 100000 11181
DATA 2022-09-22-08:12:50 1663845170 1663845170 67288 12003 67 107 1663845170 1.1.1.1 1214 6045 39200 33188 26209 4 dl_low_bb ul_high_bb 100000 11181
DATA 2022-09-22-08:12:50 1663845170 1663845170 67288 12003 67 107 1663845170 1.0.0.1 1214 6207 40100 33926 26209 4 dl_low_bb ul_high_bb 100000 11181
DATA 2022-09-22-08:12:50 1663845170 1663845170 67288 12003 67 107 1663845170 8.8.8.8 1214 5616 38800 33217 26209 4 dl_low_bb ul_high_bb 100000 11181
DATA 2022-09-22-08:12:50 1663845170 1663845170 69291 10991 69 98 1663845170 8.8.4.4 1214 5631 38600 33002 26209 4 dl_low_bb ul_high_bb 100000 10062
DATA 2022-09-22-08:12:50 1663845170 1663845170 69291 10991 69 109 1663845170 1.1.1.1 1215 6080 41600 35555 26331 4 dl_low_bb ul_high_bb 100000 10062
DATA 2022-09-22-08:12:50 1663845170 1663845170 69291 10991 69 109 1663845170 1.0.0.1 1215 6239 38700 32493 26331 4 dl_low_bb ul_high_bb 100000 10062
DATA 2022-09-22-08:12:50 1663845170 1663845170 69291 10991 69 109 1663845170 8.8.8.8 1215 5649 38700 33084 26331 4 dl_low_bb ul_high_bb 100000 10062
DATA 2022-09-22-08:12:50 1663845170 1663845170 69522 10170 69 101 1663845170 8.8.4.4 1215 5663 38500 32869 26331 4 dl_low_bb ul_high_bb 100000 10062
DATA 2022-09-22-08:12:50 1663845171 1663845171 69522 10170 69 101 1663845171 1.1.1.1 1216 6110 36100 30020 26331 4 dl_low_bb ul_high_bb 100000 10062
DATA 2022-09-22-08:12:50 1663845171 1663845171 69522 10170 69 101 1663845171 1.0.0.1 1216 6270 37800 31561 26331 4 dl_low_bb ul_high_bb 100000 10062
DATA 2022-09-22-08:12:50 1663845171 1663845171 69522 10170 69 101 1663845171 8.8.8.8 1216 5680 37100 31451 26331 4 dl_low_bb ul_high_bb 100000 10000
DATA 2022-09-22-08:12:50 1663845171 1663845171 65154 9829 65 98 1663845171 8.8.4.4 1216 5694 36800 31137 26338 4 dl_low_bb ul_high_bb 100000 10000
DATA 2022-09-22-08:12:50 1663845171 1663845171 65154 9829 65 98 1663845171 1.1.1.1 1217 6142 38200 32090 26338 4 dl_low_bb ul_high_bb 100000 10000
DATA 2022-09-22-08:12:50 1663845171 1663845171 65154 9829 65 98 1663845171 1.0.0.1 1217 6302 38500 32230 26338 4 dl_low_bb ul_high_bb 100000 10000
DATA 2022-09-22-08:12:50 1663845171 1663845171 65154 9829 65 98 1663845171 8.8.8.8 1217 5710 36400 30720 26338 4 dl_low_bb ul_high_bb 100000 10000
DATA 2022-09-22-08:12:50 1663845171 1663845171 65545 9790 65 97 1663845171 8.8.4.4 1217 5723 35300 29606 26338 4 dl_low_bb ul_high_bb 100000 10000
DATA 2022-09-22-08:12:50 1663845171 1663845171 65545 9790 65 97 1663845171 1.1.1.1 1218 6172 37100 30958 26338 4 dl_low_bb ul_high_bb 100000 10000
DATA 2022-09-22-08:12:50 1663845171 1663845171 65545 9790 65 97 1663845171 1.0.0.1 1218 6331 35400 29098 26338 4 dl_low_bb ul_high_bb 100000 10000
DATA 2022-09-22-08:12:51 1663845171 1663845171 65545 9790 65 97 1663845171 8.8.8.8 1218 5739 35600 29890 26338 4 dl_low_bb ul_high_bb 100000 10000
DATA 2022-09-22-08:12:51 1663845171 1663845171 64899 9822 64 98 1663845171 8.8.4.4 1218 5751 34000 28277 26338 4 dl_low_bb ul_high_bb 100000 10000
DATA 2022-09-22-08:12:51 1663845171 1663845171 64899 9822 64 98 1663845171 1.1.1.1 1219 6199 33200 27028 26338 4 dl_low_bb ul_high_bb 100000 10000
DATA 2022-09-22-08:12:51 1663845171 1663845171 64899 9822 64 98 1663845171 1.0.0.1 1219 6355 31200 24869 26338 3 dl_low_bb ul_high_bb 100000 10000
DATA 2022-09-22-08:12:51 1663845171 1663845171 64899 9822 64 98 1663845171 8.8.8.8 1219 5765 32000 26261 26338 2 dl_low_bb ul_high_bb 100000 10000
DATA 2022-09-22-08:12:51 1663845171 1663845171 68929 9804 68 98 1663845171 8.8.4.4 1219 5774 29400 23649 26338 1 dl_low ul_high 101000 10000
DATA 2022-09-22-08:12:51 1663845171 1663845171 68929 9804 68 98 1663845171 1.1.1.1 1220 6225 32700 26501 26337 1 dl_low ul_high 101000 10100
DATA 2022-09-22-08:12:51 1663845171 1663845171 68929 9804 68 97 1663845171 1.0.0.1 1220 6380 32100 25745 26325 1 dl_low ul_high 101000 10201
DATA 2022-09-22-08:12:51 1663845171 1663845171 68929 9804 68 96 1663845171 8.8.8.8 1220 5790 31100 25335 26313 1 dl_low ul_high 101000 10303
DATA 2022-09-22-08:12:51 1663845171 1663845171 71344 9867 70 95 1663845171 8.8.4.4 1220 5797 29400 23626 26301 1 dl_low ul_high 101000 10406
DATA 2022-09-22-08:12:51 1663845172 1663845172 71344 9867 70 94 1663845172 1.1.1.1 1221 6251 32800 26575 26290 1 dl_low ul_high 101000 10510
DATA 2022-09-22-08:12:51 1663845172 1663845172 71344 9867 70 93 1663845172 1.0.0.1 1221 6403 30100 23720 26278 1 dl_low ul_high 101000 10615
DATA 2022-09-22-08:12:51 1663845172 1663845172 71344 9867 70 92 1663845172 8.8.8.8 1221 5812 28100 22310 26267 1 dl_low ul_high 101000 10721
DATA 2022-09-22-08:12:51 1663845172 1663845172 71613 10222 70 95 1663845172 8.8.4.4 1221 5818 27200 21403 26255 1 dl_low ul_high 101000 10828
Any thoughts @moeller0 on this?
Correct.
Isn't that what it's supposed to do?
Looks like you only get upload-related bufferbloat, is that right? If so, you might be better off with just fixed download and lower upload?
Is your cable connection truly variable capacity? And in both directions or just upload?
For any fixed-capacity direction you should probably not use these autorate approaches, since they necessarily cost some performance by probing the capacity all the time.
So the upload in the middle seems to work well, ramps up quickly and stays high pretty consistently. I just wonder about the download starting at 2233, this initially reveals that the base rate was too high, but then stays more or less constant, while the shaper rate climbs back up to baseline. I would guess that baseline here still is incorrect... (but that is fine, climbing back to baseline if there are no delay spikes is intended behaviour).
I would say in general yes, but I have not thought through all of the details yet.
Since we only have RTTs we really cannot figure out what caused the 3 RTT spike starting from 2341: it could be that the upload was too high, but it is also possible that the lowish download already exceeded capacity. I keep harping on this, but the two achieved rates we have (or their ratio to the respective shaper rates) are not sufficient to resolve that ambiguity.
Hummm... I don't know. Does the graph tell that?
By fixed download you mean, use the maximum download rate achieved without sqm, reduced by ~25%, in all three download variables in the config file?
It hasn't been the case in the last months, but since Monday I've noticed a link deterioration, randomly lowering the rates. Because of that, I thought it would be better to set up adaptive bandwidth.
You mean cpu-wise?
You mean, base_dl_shaper_rate_kbps should be lowered?
This is about your long discussion on alternative approaches like ntpserver, hping etc, right? Or is there anything I can do to improve the test?
Just to illustrate what I said above, here are two graphs I use to monitor link quality.
(The circled area shows the link degradation on Monday. The gap comes from an outage. Before the outage, you can see how stable the Internet link was, with latency in the 99th percentile stuck around 11ms)
The leftmost graph shows cake bandwidth, the rightmost shows ping results to 8.8.8.8. The ping results come from this code, executed every minute:
/usr/bin/ping -qc20 -i 0.5 8.8.8.8 | /etc/snmp/ping-to-json/ping_to_json.sh
In these ping results, I mostly focus on latency.
Potentially... Since I suffer from the same issue, namely that I cannot determine for certain which direction was congested...
@gadolf do I read that graph correctly that you had cake-autorate running and it held the bandwidth constant for days until problems started developing? If so this is a first for me and super cool. But I may be missing something and it seems too good to be true.
Certainly I suppose that if base is set to max and min below that this is what would happen anyway. So that way the system is prepared for any congestion or other link deterioration that may arise in the future. It's a bit like @richb-hanover-priv's idea that this is used as a passive monitoring tool, but here the added benefit of kicking in and actively getting on top of any problems if they arise.
Perhaps from the latter angle everyone should be running this?
Round robin pinging is certainly doable. This is basically what the code in my sqm-autorate script does, and ping_ts uses a simplified version of that code that targets just a single reflector, so it could be "unsimplified" if necessary.
However, there's the ongoing problem of perl threads - ping_ts now also uses perl threads to enable it to handle timeouts properly, so it won't work for @patrakov and probably many others.
I did briefly look into what can be done without threads, but it's a massive pain because none of the decent event-driven perl modules/frameworks are available on OpenWrt without installing CPAN, which requires a ton of other dependencies. This means that there's no easy way to do multiple concurrent timers/alarms for events such as sending and timing out ping requests.
Ah bother. Linux is apparently due to ship with rust at some point in the future. But otherwise is the only other good option C? @Lochnair said he had an ICMP type 13 approach already written in C - @Lochnair could that be made to round robin and/or issue ECHO requests by sending out requests with interval X between them and all sent out according to offset Y from a common point of time? To elaborate on the latter I mean something like this:
We want the timings to be maintained so that the increased granularity we get from multiple reflectors persists (i.e. we don't want the ECHO requests to drift). Otherwise the pings can start off correctly spaced, but owing to drift we lose the granularity.
fping is nice because it explicitly offers round robin and forces maintenance of granularity by specifying 'period' between packets to given reflector and 'interval' between individual echos:
fping --period $reflector_ping_interval_ms --interval $ping_response_interval_ms
So I think capturing this would be nice, either by true round robin or by individual instances working on the common offset Y1/Y2 as identified above.
Hum, I'm afraid not. Before the gap in the graph, I was running cake with fixed bandwidth, no cake-autorate running, sorry.
1 - Cake, fixed bdwt
2 - Cake, adaptive, standard
3 - Cake, adaptive, stall
4 - Cake, fixed bdwt
EDIT: During 1 and 2, I lowered the max bandwidth to around 300 Mbps, since I wasn't sure the ISP was capable of handling more than that, as per some speed tests in the period.
Simplified code, but what it's currently doing is this:
while (true) {
sleep_time = tick_duration / reflectors_len
for (reflector in reflectors) {
send_ping(reflector)
nanosleep(sleep_time)
}
}
In fping terms I guess this would be fping -l --period=0 --interval=200 1.1.1.1 1.0.0.1, though that doesn't actually work.
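One possible reading of that loop in fping terms, offered as an untested assumption: every reflector is pinged once per tick, so the per-reflector period would be tick_duration and the gap between consecutive packets would be tick_duration divided by the number of reflectors. The sketch below only derives and prints the command line; `build_fping_command` is a hypothetical helper and nothing is actually sent.

```bash
#!/bin/bash
# Sketch only: build_fping_command is hypothetical, and the mapping of tick_duration
# onto --period/--interval is an assumption that has not been verified against fping.

build_fping_command()
{
	# $1 = tick duration (ms); remaining arguments = reflectors
	local tick_duration_ms=$1
	shift
	[ "$#" -gt 0 ] || return 1	# avoid dividing by zero reflectors
	local period_ms=$tick_duration_ms
	local interval_ms=$(( tick_duration_ms / $# ))
	printf 'fping -l --period %s --interval %s %s\n' "$period_ms" "$interval_ms" "$*"
}

build_fping_command 400 1.1.1.1 1.0.0.1
# prints: fping -l --period 400 --interval 200 1.1.1.1 1.0.0.1
```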
Would it be feasible to support user-defined reflectors?
I can imagine use cases that would want to optimize with respect to a specific host or hosts, rather than a generic sampling of public reflectors.
For instance, most online games run on a known set of dedicated servers.
An esports competitor would prefer to optimize with respect to whichever servers host their game.