CAKE w/ Adaptive Bandwidth [August 2022 to March 2024]

Gotcha. Should the free check space be hard coded and if so what value to use?

Not sure; the easiest is to punt that to the user ;).
But thinking this over, I think the free space test cannot be used for initiating rotations*, only as a circuit breaker that stops logging with a final error message (also piped to the OpenWrt log, which can be accessed with logread) informing the user that autorate stopped logging to avoid causing a critical condition and that the user should fix things.

*) It would work on the first rotation, when the previous old file gets deleted and space reappears, but I see a problem if by that method the log files become increasingly smaller, so we might end up rotating on every write... which might still be preferable to no logging, but that does not seem a healthy thing to do.
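A minimal sketch of such a circuit breaker, just to make the idea concrete. The function names and the min_free_kb threshold are made up for illustration (nothing like this exists in autorate yet), and it assumes df and logger behave as they do in busybox:

```bash
# Hypothetical sketch: stop logging once free space falls below a user-set floor.
# All names here are illustrative, not existing autorate settings.

free_space_kb()
{
	# Available space (KiB) on the filesystem holding directory $1
	df -k -P "$1" | awk 'NR==2 { print $4 }'
}

logging_allowed()
{
	# Succeeds while free space ($1, KiB) is at or above the floor ($2, KiB)
	local free_kb="$1"
	local min_free_kb="$2"
	[ "$free_kb" -ge "$min_free_kb" ]
}

check_log_space()
{
	# Circuit breaker: returns 1 and writes one final message when logging must stop
	local log_dir="$1"
	local min_free_kb="$2"
	if ! logging_allowed "$(free_space_kb "$log_dir")" "$min_free_kb"; then
		# logger writes to the system log, which logread displays on OpenWrt
		logger -t cake-autorate "free space in ${log_dir} below ${min_free_kb} KiB: logging stopped, please free up space" 2>/dev/null || true
		return 1
	fi
	return 0
}
```

The logging code would call check_log_space before each write (or every N writes) and simply stop logging for good once it returns 1, rather than trying to rotate.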

@Lynx hping3 package is now in the master branch.

Now would be a good time to think about how to design alternative delay collection functions.
I think something with an initialize_pingers() function that handles the differences and starts the pinger(s) might work, so that multiple individual processes for ip-utils ping and hping3 can be selected or a single pinger with interleaved addresses for fping.
I really think it is worth the hassle to keep it possible to switch the delay source by a simple configurable parameter (maybe even at run-time, e.g. if all ICMP timestamp sources fail, maybe fall-back to ICMP echo requests).

The obvious change is to switch RTT sources to reporting pseudo OWDs (fill both with the RTT value) and make the rest of the code treat incoming and outgoing delays separately.
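Roughly what I mean, as a sketch only: an RTT source copies its value into both directions, and the consumer then checks each direction on its own. The function names and the threshold value are illustrative assumptions, not existing code:

```bash
# Hypothetical sketch of pseudo OWDs for RTT-only sources.

rtt_to_pseudo_owd()
{
	# An RTT source cannot split the delay, so report the RTT for both
	# directions; prints "dl_owd_us ul_owd_us"
	local rtt_us="$1"
	printf '%s %s\n' "$rtt_us" "$rtt_us"
}

delay_exceeded()
{
	# Consumer side: evaluate each direction separately against a threshold.
	# Prints one of: dl, ul, both, none
	local dl_delta_us="$1"
	local ul_delta_us="$2"
	local threshold_us="$3"
	local dl=0
	local ul=0
	[ "$dl_delta_us" -gt "$threshold_us" ] && dl=1
	[ "$ul_delta_us" -gt "$threshold_us" ] && ul=1
	case "$dl$ul" in
		11) echo both ;;
		10) echo dl ;;
		01) echo ul ;;
		*)  echo none ;;
	esac
}
```

With a true OWD source only the congested direction trips, while a pseudo OWD always trips both together, which is exactly the behaviour we have today with RTTs.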

What we have at present in the main branch is actually not that far off from this.

We have the main loop read from a global 'ping_fifo' like this:

while read -t $global_ping_response_timeout_s -r timestamp reflector seq rtt_baseline_us rtt_us rtt_delta_us
do 
    # do stuff
done</tmp/cake-autorate/ping_fifo 

Therefore, so long as we have processes write out to the 'ping_fifo' in the format:

timestamp reflector seq rtt_baseline_us rtt_us rtt_delta_us

Then the main loop doesn't care what feeds it.
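To illustrate the point, here is a stripped-down sketch in which the consumer reads from stdin instead of the fifo, and the producer is a stand-in with made-up values. Neither function is in the script; the real main loop also uses read -t with a timeout, which I left out here:

```bash
# Hypothetical sketch: the consumer only depends on the six-field record format.

process_ping_records()
{
	# Reads "timestamp reflector seq rtt_baseline_us rtt_us rtt_delta_us" lines from stdin
	local timestamp reflector seq rtt_baseline_us rtt_us rtt_delta_us
	while read -r timestamp reflector seq rtt_baseline_us rtt_us rtt_delta_us
	do
		printf '%s seq=%s delta_us=%s\n' "$reflector" "$seq" "$rtt_delta_us"
	done
}

fake_pinger()
{
	# Stand-in producer with invented values; a wrapper around any ping utility
	# could take its place as long as it prints the same six fields
	printf '%s %s %s %s %s %s\n' 1663845057.100000 8.8.8.8 1 5000 6200 1200
	printf '%s %s %s %s %s %s\n' 1663845057.150000 1.1.1.1 1 5400 5900 500
}
```

Running `fake_pinger | process_ping_records` prints one summary line per record; swapping the pipe for the fifo gives the arrangement described above.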

Right now for the iputils-ping case I have each ping process write out to its own fifo, and those are read in to write out to the global fifo:

monitor_reflector_responses() 
{
	# maintain baseline and output deltas to a common fifo

	local pinger=$1
	local rtt_baseline_us=$2

	while read -r  timestamp _ _ _ reflector seq_rtt
	do
		# If no match then skip onto the next one
		[[ $seq_rtt =~ icmp_[sr]eq=([0-9]+).*time=([0-9]+)\.?([0-9]+)?[[:space:]]ms ]] || continue

		seq=${BASH_REMATCH[1]}

		rtt_us=${BASH_REMATCH[3]}000
		rtt_us=$((${BASH_REMATCH[2]}000+10#${rtt_us:0:3}))

		reflector=${reflector//:/}

		rtt_delta_us=$(( $rtt_us-$rtt_baseline_us ))

		alpha=$(( (( $rtt_delta_us >=0 )) ? $alpha_baseline_increase : $alpha_baseline_decrease ))

		rtt_baseline_us=$(( ( (1000-$alpha)*$rtt_baseline_us+$alpha*$rtt_us )/1000 ))

		printf '%s %s %s %s %s %s\n' "$timestamp" "$reflector" "$seq" "$rtt_baseline_us" "$rtt_us" "$rtt_delta_us" > /tmp/cake-autorate/ping_fifo
	
		printf '%s' "${timestamp//[[\[\].]}" > /tmp/cake-autorate/reflector_${pinger}_last_timestamp_us

	done</tmp/cake-autorate/pinger_${pinger}_fifo
}
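For anyone following along, the two non-obvious steps in that function (turning "time=9.37 ms" into integer microseconds, and the asymmetric baseline update) can be pulled out and tried on their own. This is only a sketch: the helper names are mine, and the alpha values in the usage note are illustrative rather than the configured ones:

```bash
# Hypothetical standalone helpers mirroring the parsing and baseline steps above.

parse_ping_line()
{
	# Prints "seq rtt_us" for an iputils-ping reply line, returns 1 for any other line
	local line=$1 rtt_frac_us
	local re='icmp_[sr]eq=([0-9]+).*time=([0-9]+)\.?([0-9]+)?[[:space:]]ms'
	[[ $line =~ $re ]] || return 1
	# Pad the fractional ms part to three digits, e.g. "37" -> "370" us
	rtt_frac_us=${BASH_REMATCH[3]}000
	printf '%s %s\n' "${BASH_REMATCH[1]}" "$(( 10#${BASH_REMATCH[2]}000 + 10#${rtt_frac_us:0:3} ))"
}

update_baseline()
{
	# Exponentially weighted moving average with per-direction alpha (scaled by 1000):
	# the baseline rises slowly on increases and drops quickly on decreases
	local rtt_baseline_us=$1 rtt_us=$2 alpha_increase=$3 alpha_decrease=$4 alpha
	if (( rtt_us - rtt_baseline_us >= 0 )); then
		alpha=$alpha_increase
	else
		alpha=$alpha_decrease
	fi
	printf '%s\n' "$(( ( (1000 - alpha) * rtt_baseline_us + alpha * rtt_us ) / 1000 ))"
}
```

For example, with alpha values of 1 and 900, a 6000 us sample against a 5000 us baseline only nudges the baseline up to 5001 us, whereas a 4000 us sample pulls it straight down to 4100 us.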

So perhaps I should modify the fping approach to actually use this format (i.e. the same main loop format) to help establish a common format to easily plug and play different utilities.

Start and stop functions can be provided as part of 'maintain_pingers()', which for the main branch already has e.g.:

start_pinger_next_pinger_time_slot()
{
	# wait until next pinger time slot and start pinger in its slot
	# this allows pingers to be stopped and started (e.g. during sleep or reflector rotation)
	# whilst ensuring pings will remain spaced out appropriately to maintain granularity

	local pinger=$1
	local -n pinger_pid=$2
	t_start_us=${EPOCHREALTIME/./}
	time_to_next_time_slot_us=$(( ($reflector_ping_interval_us-($t_start_us-$pingers_t_start_us)%$reflector_ping_interval_us) + $pinger*$ping_response_interval_us ))
	sleep_remaining_tick_time $t_start_us $time_to_next_time_slot_us
	if (($debug)); then
		ping -D -i $reflector_ping_interval_s ${reflectors[$pinger]} > /tmp/cake-autorate/pinger_${pinger}_fifo &
		pinger_pid=$!
	else
		ping -D -i $reflector_ping_interval_s ${reflectors[$pinger]} > /tmp/cake-autorate/pinger_${pinger}_fifo 2> /dev/null &
		pinger_pid=$!
	fi
	monitor_reflector_responses $pinger ${rtt_baselines_us[$pinger]} &
}
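The slot arithmetic is easier to check in isolation. A sketch with the same formula wrapped in a helper (the helper name and the numbers in the example are made up):

```bash
# Hypothetical helper: same time-slot formula as above, taking everything as arguments.

time_to_next_slot_us()
{
	local now_us="$1"
	local pingers_t_start_us="$2"
	local reflector_ping_interval_us="$3"
	local pinger="$4"
	local ping_response_interval_us="$5"
	# Time left until the next interval boundary, plus this pinger's stagger
	printf '%s\n' "$(( ( $reflector_ping_interval_us - ( $now_us - $pingers_t_start_us ) % $reflector_ping_interval_us ) + $pinger * $ping_response_interval_us ))"
}
```

With a 200000 us reflector interval, pingers started at t=1000000 us and the clock now at 1250000 us, pinger 0 waits 150000 us for the next boundary and pinger 1 (staggered by 50000 us) waits 200000 us.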

and

kill_pingers()
{
	for (( pinger=0; pinger<$no_pingers; pinger++))
	do
		kill ${pinger_pids[$pinger]} 2> /dev/null
		[[ -p /tmp/cake-autorate/pinger_${pinger}_fifo ]] && rm /tmp/cake-autorate/pinger_${pinger}_fifo
	done
	exit
}

So we basically have:

  • a) common global format for all ping utilities in main loop
  • b) maintain_pingers() wrapper for every ping utility with appropriate start/stop functions. And maintain_pingers() is responsible for ensuring that the global fifo is written to correctly in format:
timestamp reflector seq rtt_baseline_us rtt_us rtt_delta_us

So @moeller0 I think the change you are suggesting is to switch from rtt to owd in the main loop and adapt accordingly in maintain_pingers() for the iputils-ping and fping cases.

Other than that, do we have all the essential elements in this format? If not, what shall we add in?

And maybe separate sourceable files for iputils-ping and fping in the main branch? Or alternatively separate functions, defined depending on a variable that selects the pinger, to keep things in one file and avoid installation complexity for the basic use case. And provide a template file for anyone to write their own wrapper?

I think you've been calling for this all along, but I realise now is the perfect time to do it, as we now have four options: iputils-ping, fping, hping3 and also @tievolu's perl-based ping_ts. And I still have the generic-ish format in the main branch (albeit I didn't follow it in fping, but I totally see the value in making this all generic now).

Finally @patrakov well done in getting hping3 taken on!

@tievolu I have a question regarding your perl-based ping_ts. Would it be a lot of extra work to make it either offer round robin pinging, or alternatively have each instance send its ping ECHOs at precisely defined points relative to a common clock? A problem @patrakov identified with multiple ping instances is that they tend to drift. So one needs either round robin like fping with tightly defined intervals, or sends at fixed offsets relative to a common clock, so that the sends stay well spaced and good granularity is maintained. Hope this makes sense? So not just 'send with spacing X' but 'send with spacing X at offset Y from system clock position 0 seconds'. See my 'start_pinger_next_pinger_time_slot()' function above, which staggers the pings, but this only works for a while; then the spread drifts and the pingers can even become synchronised in weird situations.
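In case it helps, this is the calculation I have in mind, sketched in shell (the function name is made up, and it assumes the current time is not earlier than the offset). Each send time is derived from the clock itself rather than from the previous send, so timing errors cannot accumulate:

```bash
# Hypothetical sketch: next send time on a grid of spacing X with offset Y
# from clock position 0, all in microseconds.

next_send_time_us()
{
	local now_us="$1"
	local interval_us="$2"   # spacing X
	local offset_us="$3"     # offset Y for this instance
	# Next grid point strictly after now_us (assumes now_us >= offset_us)
	printf '%s\n' "$(( ( ( $now_us - $offset_us ) / $interval_us + 1 ) * $interval_us + $offset_us ))"
}
```

With X = 200000 us, an instance with Y = 0 and one with Y = 100000 always end up 100000 us apart, however late each wakes up: at now = 1000050 us they schedule sends for 1200000 us and 1100000 us respectively.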

@Lynx Here's the log:

https://drive.google.com/file/d/1Vhax8iFPjE4SEPaEsQH9B2aauduKES9O/view?usp=sharing

Let me know if you need anything else, and, if possible, show us your graphs, I'm curious.

Yes except we probably want:
timestamp reflector seq dl_owd_baseline_us dl_owd_us dl_owd_delta_us ul_owd_baseline_us ul_owd_us ul_owd_delta_us
maybe just use
monitor_reflector_responses_iputils_ping_RTT(), monitor_reflector_responses_fping_RTT(), monitor_reflector_responses_hping3_OWD()
that is, create a specific function for each type and construct the function name from variables

cur_probe_binary="hping3"
cur_probe_type="OWD"
REFLECTOR_MONITOR_FUNCTION="monitor_reflector_responses_${cur_probe_binary}_${cur_probe_type}"
${REFLECTOR_MONITOR_FUNCTION} $pinger ${rtt_baselines_us[$pinger]} &	# same call arguments as before
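A slightly fuller sketch of that dispatch, with a guard so a typo in the configuration fails loudly instead of producing "command not found". The two monitor functions are stubs that only print their arguments, and start_monitor is a made-up wrapper name:

```bash
# Hypothetical sketch of dispatch by constructed function name.

monitor_reflector_responses_iputils_ping_RTT()
{
	printf 'iputils_ping RTT monitor: pinger=%s baseline_us=%s\n' "$1" "$2"
}

monitor_reflector_responses_hping3_OWD()
{
	printf 'hping3 OWD monitor: pinger=%s baseline_us=%s\n' "$1" "$2"
}

start_monitor()
{
	local cur_probe_binary="$1"
	local cur_probe_type="$2"
	shift 2
	local monitor_function="monitor_reflector_responses_${cur_probe_binary}_${cur_probe_type}"
	# Refuse combinations for which no monitor function has been defined
	if ! command -v "$monitor_function" > /dev/null 2>&1; then
		printf 'unsupported probe: %s/%s\n' "$cur_probe_binary" "$cur_probe_type" >&2
		return 1
	fi
	"$monitor_function" "$@"
}
```

So `start_monitor hping3 OWD 0 5000` runs the hping3 stub, while `start_monitor fping OWD 0 5000` returns an error because no such function exists here.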

Yes, have the same service function, having a single fifo to multiplex the different results into and reading that from the main loop is a decent interface.

Yes, and it might be possible to avoid a switch statement by simply constructing the function name as shown above...

That makes sense, once these become large enough a separate file might be useful to keep things readable.

+1; as you say now is a good time, because we should be able to actually test and exercise these new delay measurement methods.

Not sure what to make of this since it looks so different to the typical LTE graph we see. This is on cable? Seems like we see a sleep event followed by some upload usage and then bufferbloat in the middle resulting in drop to 150Mbit/s.

Example of bufferbloat event on upload and surrounding data:

DATA	 2022-09-22-08:10:56	1663845057	1663845057	827	36841	0	99	1663845057	 8.8.8.8	648	5317	6440	1124	25377	0	 dl_low	 ul_high	250000	37000
DATA	 2022-09-22-08:10:56	1663845057	1663845057	830	36844	0	99	1663845057	 8.8.4.4	648	4988	6140	1153	25377	0	 dl_low	 ul_high	250000	37000
DATA	 2022-09-22-08:10:56	1663845057	1663845057	830	36844	0	99	1663845057	 1.1.1.1	649	5436	7270	1835	25377	0	 dl_low	 ul_high	250000	37000
DATA	 2022-09-22-08:10:56	1663845057	1663845057	830	36844	0	99	1663845057	 1.0.0.1	649	5037	6900	1864	25377	0	 dl_low	 ul_high	250000	37000
DATA	 2022-09-22-08:10:57	1663845057	1663845057	830	36844	0	99	1663845057	 8.8.8.8	649	5318	6720	1403	25377	0	 dl_low	 ul_high	250000	37000
DATA	 2022-09-22-08:10:57	1663845057	1663845057	822	36841	0	99	1663845057	 8.8.4.4	649	4989	6560	1572	25377	0	 dl_low	 ul_high	250000	37000
DATA	 2022-09-22-08:10:57	1663845057	1663845057	822	36841	0	99	1663845057	 1.1.1.1	650	5457	26800	21364	25377	0	 dl_low	 ul_high	250000	37000
DATA	 2022-09-22-08:10:57	1663845057	1663845057	822	36841	0	99	1663845057	 1.0.0.1	650	5063	31700	26663	25377	1	 dl_low	 ul_high	250000	37000
DATA	 2022-09-22-08:10:57	1663845057	1663845057	822	36841	0	99	1663845057	 8.8.8.8	650	5348	36300	30982	25377	2	 dl_low_bb	 ul_high_bb	225000	33156
DATA	 2022-09-22-08:10:57	1663845057	1663845057	670	33147	0	99	1663845057	 8.8.4.4	650	5016	32000	27011	25421	3	 dl_low_bb	 ul_high_bb	225000	33156
DATA	 2022-09-22-08:10:57	1663845057	1663845057	670	33147	0	99	1663845057	 1.1.1.1	651	5484	32600	27143	25421	4	 dl_low_bb	 ul_high_bb	225000	33156
DATA	 2022-09-22-08:10:57	1663845057	1663845057	670	33147	0	99	1663845057	 1.0.0.1	651	5088	30700	25637	25421	4	 dl_low_bb	 ul_high_bb	225000	33156
DATA	 2022-09-22-08:10:57	1663845057	1663845057	670	33147	0	99	1663845057	 8.8.8.8	651	5372	30200	24852	25421	3	 dl_low_bb	 ul_high_bb	225000	33156
DATA	 2022-09-22-08:10:57	1663845058	1663845058	765	31739	0	95	1663845058	 8.8.4.4	651	5040	29400	24384	25421	2	 dl_low_bb	 ul_high_bb	225000	33156
DATA	 2022-09-22-08:10:57	1663845058	1663845058	765	31739	0	95	1663845058	 1.1.1.1	652	5508	30000	24516	25421	1	 dl_low	 ul_high	225000	33156
DATA	 2022-09-22-08:10:57	1663845058	1663845058	765	31739	0	95	1663845058	 1.0.0.1	652	5111	29000	23912	25421	0	 dl_low	 ul_high	225000	33487
DATA	 2022-09-22-08:10:57	1663845058	1663845058	765	31739	0	94	1663845058	 8.8.8.8	652	5395	28400	23028	25417	0	 dl_low	 ul_high	225000	33821
DATA	 2022-09-22-08:10:57	1663845058	1663845058	772	33153	0	98	1663845058	 8.8.4.4	652	5063	28300	23260	25414	0	 dl_low	 ul_high	225000	34159
DATA	 2022-09-22-08:10:57	1663845058	1663845058	772	33153	0	97	1663845058	 1.1.1.1	653	5529	27500	21992	25410	0	 dl_low	 ul_high	225000	34500
DATA	 2022-09-22-08:10:57	1663845058	1663845058	772	33153	0	96	1663845058	 1.0.0.1	653	5134	28700	23589	25406	0	 dl_low	 ul_high	225000	34845

And here is another upload related bufferbloat event:

DATA	 2022-09-22-08:12:47	1663845167	1663845167	76507	36676	62	99	1663845167	 8.8.4.4	1199	5176	7900	2726	25427	0	 dl_low	 ul_high	123269	37000
DATA	 2022-09-22-08:12:47	1663845167	1663845167	76507	36676	62	99	1663845167	 1.1.1.1	1200	5595	6240	645	25427	0	 dl_low	 ul_high	123269	37000
DATA	 2022-09-22-08:12:47	1663845167	1663845167	76507	36676	62	99	1663845167	 1.0.0.1	1200	5752	7650	1899	25427	0	 dl_low	 ul_high	123269	37000
DATA	 2022-09-22-08:12:47	1663845167	1663845167	76507	36676	62	99	1663845167	 8.8.8.8	1200	5158	6300	1143	25427	0	 dl_low	 ul_high	124501	37000
DATA	 2022-09-22-08:12:47	1663845167	1663845167	76695	36620	61	98	1663845167	 8.8.4.4	1200	5187	17100	11924	25426	0	 dl_low	 ul_high	124501	37000
DATA	 2022-09-22-08:12:47	1663845168	1663845168	76695	36620	61	98	1663845168	 1.1.1.1	1201	5619	30300	24705	25426	0	 dl_low	 ul_high	124501	37000
DATA	 2022-09-22-08:12:47	1663845168	1663845168	76695	36620	61	98	1663845168	 1.0.0.1	1201	5801	55000	49248	25426	1	 dl_low	 ul_high	124501	37000
DATA	 2022-09-22-08:12:47	1663845168	1663845168	74539	27255	59	73	1663845168	 8.8.8.8	1201	5196	43600	38442	25426	2	 dl_low_bb	 ul_low_bb	112050	24529
DATA	 2022-09-22-08:12:47	1663845168	1663845168	74539	27255	66	111	1663845168	 8.8.4.4	1201	5213	31200	26013	25604	3	 dl_low_bb	 ul_high_bb	112050	24529
DATA	 2022-09-22-08:12:47	1663845168	1663845168	74539	27255	66	111	1663845168	 1.1.1.1	1202	5651	38500	32881	25604	4	 dl_low_bb	 ul_high_bb	112050	24529
DATA	 2022-09-22-08:12:47	1663845168	1663845168	74539	27255	66	111	1663845168	 1.0.0.1	1202	5831	36500	30699	25604	4	 dl_low_bb	 ul_high_bb	112050	24529
DATA	 2022-09-22-08:12:47	1663845168	1663845168	74539	27255	66	111	1663845168	 8.8.8.8	1202	5226	36000	30804	25604	4	 dl_low_bb	 ul_high_bb	112050	24529
DATA	 2022-09-22-08:12:47	1663845168	1663845168	75036	24941	66	101	1663845168	 8.8.4.4	1202	5243	35600	30387	25604	4	 dl_low_bb	 ul_high_bb	112050	24529
DATA	 2022-09-22-08:12:47	1663845168	1663845168	75036	24941	66	101	1663845168	 1.1.1.1	1203	5683	38400	32749	25604	4	 dl_low_bb	 ul_high_bb	112050	24529
DATA	 2022-09-22-08:12:47	1663845168	1663845168	75036	24941	66	101	1663845168	 1.0.0.1	1203	5860	35800	29969	25604	4	 dl_low_bb	 ul_high_bb	100845	22076
DATA	 2022-09-22-08:12:48	1663845168	1663845168	75036	24941	74	112	1663845168	 8.8.8.8	1203	5259	38300	33074	25671	4	 dl_low_bb	 ul_high_bb	100845	22076
DATA	 2022-09-22-08:12:48	1663845168	1663845168	68454	23692	67	107	1663845168	 8.8.4.4	1203	5275	37500	32257	25671	4	 dl_low_bb	 ul_high_bb	100845	22076
DATA	 2022-09-22-08:12:48	1663845168	1663845168	68454	23692	67	107	1663845168	 1.1.1.1	1204	5716	38900	33217	25671	4	 dl_low_bb	 ul_high_bb	100845	22076
DATA	 2022-09-22-08:12:48	1663845168	1663845168	68454	23692	67	107	1663845168	 1.0.0.1	1204	5891	37300	31440	25671	4	 dl_low_bb	 ul_high_bb	100845	22076
DATA	 2022-09-22-08:12:48	1663845168	1663845168	68454	23692	67	107	1663845168	 8.8.8.8	1204	5290	36600	31341	25671	4	 dl_low_bb	 ul_high_bb	100845	22076
DATA	 2022-09-22-08:12:48	1663845168	1663845168	66174	21842	65	98	1663845168	 8.8.4.4	1204	5305	35600	30325	25671	4	 dl_low_bb	 ul_high_bb	100845	22076
DATA	 2022-09-22-08:12:48	1663845168	1663845168	66174	21842	65	98	1663845168	 1.1.1.1	1205	5748	38500	32784	25671	4	 dl_low_bb	 ul_high_bb	100000	19657
DATA	 2022-09-22-08:12:48	1663845168	1663845168	66174	21842	66	111	1663845168	 1.0.0.1	1205	5923	37900	32009	25740	4	 dl_low_bb	 ul_high_bb	100000	19657
DATA	 2022-09-22-08:12:48	1663845168	1663845168	66174	21842	66	111	1663845168	 8.8.8.8	1205	5323	38500	33210	25740	4	 dl_low_bb	 ul_high_bb	100000	19657
DATA	 2022-09-22-08:12:48	1663845168	1663845168	66634	20662	66	105	1663845168	 8.8.4.4	1205	5338	38700	33395	25740	4	 dl_low_bb	 ul_high_bb	100000	19657
DATA	 2022-09-22-08:12:48	1663845169	1663845169	66634	20662	66	105	1663845169	 1.1.1.1	1206	5780	38700	32952	25740	4	 dl_low_bb	 ul_high_bb	100000	19657
DATA	 2022-09-22-08:12:48	1663845169	1663845169	66634	20662	66	105	1663845169	 1.0.0.1	1206	5954	37000	31077	25740	4	 dl_low_bb	 ul_high_bb	100000	19657
DATA	 2022-09-22-08:12:48	1663845169	1663845169	66634	20662	66	105	1663845169	 8.8.8.8	1206	5354	36700	31377	25740	4	 dl_low_bb	 ul_high_bb	100000	19657
DATA	 2022-09-22-08:12:48	1663845169	1663845169	63897	19449	63	98	1663845169	 8.8.4.4	1206	5369	37300	31962	25740	4	 dl_low_bb	 ul_high_bb	100000	17504
DATA	 2022-09-22-08:12:48	1663845169	1663845169	63897	19449	63	111	1663845169	 1.1.1.1	1207	5814	40000	34220	25816	4	 dl_low_bb	 ul_high_bb	100000	17504
DATA	 2022-09-22-08:12:48	1663845169	1663845169	63897	19449	63	111	1663845169	 1.0.0.1	1207	5986	38100	32146	25816	4	 dl_low_bb	 ul_high_bb	100000	17504
DATA	 2022-09-22-08:12:48	1663845169	1663845169	63897	19449	63	111	1663845169	 8.8.8.8	1207	5386	38300	32946	25816	4	 dl_low_bb	 ul_high_bb	100000	17504
DATA	 2022-09-22-08:12:48	1663845169	1663845169	66006	17779	66	101	1663845169	 8.8.4.4	1207	5401	38100	32731	25816	4	 dl_low_bb	 ul_high_bb	100000	17504
DATA	 2022-09-22-08:12:48	1663845169	1663845169	66006	17779	66	101	1663845169	 1.1.1.1	1208	5846	38000	32186	25816	4	 dl_low_bb	 ul_high_bb	100000	17504
DATA	 2022-09-22-08:12:48	1663845169	1663845169	66006	17779	66	101	1663845169	 1.0.0.1	1208	6016	36600	30614	25816	4	 dl_low_bb	 ul_high_bb	100000	17504
DATA	 2022-09-22-08:12:49	1663845169	1663845169	66006	17779	66	101	1663845169	 8.8.8.8	1208	5416	35700	30314	25816	4	 dl_low_bb	 ul_high_bb	100000	15753
DATA	 2022-09-22-08:12:49	1663845169	1663845169	71880	17268	71	109	1663845169	 8.8.4.4	1208	5434	39100	33699	25893	4	 dl_low_bb	 ul_high_bb	100000	15753
DATA	 2022-09-22-08:12:49	1663845169	1663845169	71880	17268	71	109	1663845169	 1.1.1.1	1209	5879	39400	33554	25893	4	 dl_low_bb	 ul_high_bb	100000	15753
DATA	 2022-09-22-08:12:49	1663845169	1663845169	71880	17268	71	109	1663845169	 1.0.0.1	1209	6048	38100	32084	25893	4	 dl_low_bb	 ul_high_bb	100000	15753
DATA	 2022-09-22-08:12:49	1663845169	1663845169	71880	17268	71	109	1663845169	 8.8.8.8	1209	5449	38600	33184	25893	4	 dl_low_bb	 ul_high_bb	100000	15753
DATA	 2022-09-22-08:12:49	1663845169	1663845169	68792	15547	68	98	1663845169	 8.8.4.4	1209	5465	37100	31666	25893	4	 dl_low_bb	 ul_high_bb	100000	15753
DATA	 2022-09-22-08:12:49	1663845169	1663845169	68792	15547	68	98	1663845169	 1.1.1.1	1210	5910	37600	31721	25893	4	 dl_low_bb	 ul_high_bb	100000	13992
DATA	 2022-09-22-08:12:49	1663845169	1663845169	68792	15547	68	111	1663845169	 1.0.0.1	1210	6081	39400	33352	25991	4	 dl_low_bb	 ul_high_bb	100000	13992
DATA	 2022-09-22-08:12:49	1663845169	1663845169	68792	15547	68	111	1663845169	 8.8.8.8	1210	5483	39500	34051	25991	4	 dl_low_bb	 ul_high_bb	100000	13992
DATA	 2022-09-22-08:12:49	1663845169	1663845169	66765	14619	66	104	1663845169	 8.8.4.4	1210	5499	40300	34835	25991	4	 dl_low_bb	 ul_high_bb	100000	13992
DATA	 2022-09-22-08:12:49	1663845170	1663845170	66765	14619	66	104	1663845170	 1.1.1.1	1211	5945	41900	35990	25991	4	 dl_low_bb	 ul_high_bb	100000	13992
DATA	 2022-09-22-08:12:49	1663845170	1663845170	66765	14619	66	104	1663845170	 1.0.0.1	1211	6113	38400	32319	25991	4	 dl_low_bb	 ul_high_bb	100000	13992
DATA	 2022-09-22-08:12:49	1663845170	1663845170	66765	14619	66	104	1663845170	 8.8.8.8	1211	5517	39600	34117	25991	4	 dl_low_bb	 ul_high_bb	100000	12592
DATA	 2022-09-22-08:12:49	1663845170	1663845170	65001	13787	65	109	1663845170	 8.8.4.4	1211	5532	38700	33201	26087	4	 dl_low_bb	 ul_high_bb	100000	12592
DATA	 2022-09-22-08:12:49	1663845170	1663845170	65001	13787	65	109	1663845170	 1.1.1.1	1212	5980	41600	35655	26087	4	 dl_low_bb	 ul_high_bb	100000	12592
DATA	 2022-09-22-08:12:49	1663845170	1663845170	65001	13787	65	109	1663845170	 1.0.0.1	1212	6144	37700	31587	26087	4	 dl_low_bb	 ul_high_bb	100000	12592
DATA	 2022-09-22-08:12:49	1663845170	1663845170	65001	13787	65	109	1663845170	 8.8.8.8	1212	5550	39500	33983	26087	4	 dl_low_bb	 ul_high_bb	100000	12592
DATA	 2022-09-22-08:12:49	1663845170	1663845170	63766	12424	63	98	1663845170	 8.8.4.4	1212	5564	38400	32868	26087	4	 dl_low_bb	 ul_high_bb	100000	12592
DATA	 2022-09-22-08:12:49	1663845170	1663845170	63766	12424	63	98	1663845170	 1.1.1.1	1213	6012	38500	32520	26087	4	 dl_low_bb	 ul_high_bb	100000	12592
DATA	 2022-09-22-08:12:49	1663845170	1663845170	63766	12424	63	98	1663845170	 1.0.0.1	1213	6174	37000	30856	26087	4	 dl_low_bb	 ul_high_bb	100000	11181
DATA	 2022-09-22-08:12:50	1663845170	1663845170	63766	12424	63	111	1663845170	 8.8.8.8	1213	5583	39100	33550	26209	4	 dl_low_bb	 ul_high_bb	100000	11181
DATA	 2022-09-22-08:12:50	1663845170	1663845170	67288	12003	67	107	1663845170	 8.8.4.4	1213	5598	39900	34336	26209	4	 dl_low_bb	 ul_high_bb	100000	11181
DATA	 2022-09-22-08:12:50	1663845170	1663845170	67288	12003	67	107	1663845170	 1.1.1.1	1214	6045	39200	33188	26209	4	 dl_low_bb	 ul_high_bb	100000	11181
DATA	 2022-09-22-08:12:50	1663845170	1663845170	67288	12003	67	107	1663845170	 1.0.0.1	1214	6207	40100	33926	26209	4	 dl_low_bb	 ul_high_bb	100000	11181
DATA	 2022-09-22-08:12:50	1663845170	1663845170	67288	12003	67	107	1663845170	 8.8.8.8	1214	5616	38800	33217	26209	4	 dl_low_bb	 ul_high_bb	100000	11181
DATA	 2022-09-22-08:12:50	1663845170	1663845170	69291	10991	69	98	1663845170	 8.8.4.4	1214	5631	38600	33002	26209	4	 dl_low_bb	 ul_high_bb	100000	10062
DATA	 2022-09-22-08:12:50	1663845170	1663845170	69291	10991	69	109	1663845170	 1.1.1.1	1215	6080	41600	35555	26331	4	 dl_low_bb	 ul_high_bb	100000	10062
DATA	 2022-09-22-08:12:50	1663845170	1663845170	69291	10991	69	109	1663845170	 1.0.0.1	1215	6239	38700	32493	26331	4	 dl_low_bb	 ul_high_bb	100000	10062
DATA	 2022-09-22-08:12:50	1663845170	1663845170	69291	10991	69	109	1663845170	 8.8.8.8	1215	5649	38700	33084	26331	4	 dl_low_bb	 ul_high_bb	100000	10062
DATA	 2022-09-22-08:12:50	1663845170	1663845170	69522	10170	69	101	1663845170	 8.8.4.4	1215	5663	38500	32869	26331	4	 dl_low_bb	 ul_high_bb	100000	10062
DATA	 2022-09-22-08:12:50	1663845171	1663845171	69522	10170	69	101	1663845171	 1.1.1.1	1216	6110	36100	30020	26331	4	 dl_low_bb	 ul_high_bb	100000	10062
DATA	 2022-09-22-08:12:50	1663845171	1663845171	69522	10170	69	101	1663845171	 1.0.0.1	1216	6270	37800	31561	26331	4	 dl_low_bb	 ul_high_bb	100000	10062
DATA	 2022-09-22-08:12:50	1663845171	1663845171	69522	10170	69	101	1663845171	 8.8.8.8	1216	5680	37100	31451	26331	4	 dl_low_bb	 ul_high_bb	100000	10000
DATA	 2022-09-22-08:12:50	1663845171	1663845171	65154	9829	65	98	1663845171	 8.8.4.4	1216	5694	36800	31137	26338	4	 dl_low_bb	 ul_high_bb	100000	10000
DATA	 2022-09-22-08:12:50	1663845171	1663845171	65154	9829	65	98	1663845171	 1.1.1.1	1217	6142	38200	32090	26338	4	 dl_low_bb	 ul_high_bb	100000	10000
DATA	 2022-09-22-08:12:50	1663845171	1663845171	65154	9829	65	98	1663845171	 1.0.0.1	1217	6302	38500	32230	26338	4	 dl_low_bb	 ul_high_bb	100000	10000
DATA	 2022-09-22-08:12:50	1663845171	1663845171	65154	9829	65	98	1663845171	 8.8.8.8	1217	5710	36400	30720	26338	4	 dl_low_bb	 ul_high_bb	100000	10000
DATA	 2022-09-22-08:12:50	1663845171	1663845171	65545	9790	65	97	1663845171	 8.8.4.4	1217	5723	35300	29606	26338	4	 dl_low_bb	 ul_high_bb	100000	10000
DATA	 2022-09-22-08:12:50	1663845171	1663845171	65545	9790	65	97	1663845171	 1.1.1.1	1218	6172	37100	30958	26338	4	 dl_low_bb	 ul_high_bb	100000	10000
DATA	 2022-09-22-08:12:50	1663845171	1663845171	65545	9790	65	97	1663845171	 1.0.0.1	1218	6331	35400	29098	26338	4	 dl_low_bb	 ul_high_bb	100000	10000
DATA	 2022-09-22-08:12:51	1663845171	1663845171	65545	9790	65	97	1663845171	 8.8.8.8	1218	5739	35600	29890	26338	4	 dl_low_bb	 ul_high_bb	100000	10000
DATA	 2022-09-22-08:12:51	1663845171	1663845171	64899	9822	64	98	1663845171	 8.8.4.4	1218	5751	34000	28277	26338	4	 dl_low_bb	 ul_high_bb	100000	10000
DATA	 2022-09-22-08:12:51	1663845171	1663845171	64899	9822	64	98	1663845171	 1.1.1.1	1219	6199	33200	27028	26338	4	 dl_low_bb	 ul_high_bb	100000	10000
DATA	 2022-09-22-08:12:51	1663845171	1663845171	64899	9822	64	98	1663845171	 1.0.0.1	1219	6355	31200	24869	26338	3	 dl_low_bb	 ul_high_bb	100000	10000
DATA	 2022-09-22-08:12:51	1663845171	1663845171	64899	9822	64	98	1663845171	 8.8.8.8	1219	5765	32000	26261	26338	2	 dl_low_bb	 ul_high_bb	100000	10000
DATA	 2022-09-22-08:12:51	1663845171	1663845171	68929	9804	68	98	1663845171	 8.8.4.4	1219	5774	29400	23649	26338	1	 dl_low	 ul_high	101000	10000
DATA	 2022-09-22-08:12:51	1663845171	1663845171	68929	9804	68	98	1663845171	 1.1.1.1	1220	6225	32700	26501	26337	1	 dl_low	 ul_high	101000	10100
DATA	 2022-09-22-08:12:51	1663845171	1663845171	68929	9804	68	97	1663845171	 1.0.0.1	1220	6380	32100	25745	26325	1	 dl_low	 ul_high	101000	10201
DATA	 2022-09-22-08:12:51	1663845171	1663845171	68929	9804	68	96	1663845171	 8.8.8.8	1220	5790	31100	25335	26313	1	 dl_low	 ul_high	101000	10303
DATA	 2022-09-22-08:12:51	1663845171	1663845171	71344	9867	70	95	1663845171	 8.8.4.4	1220	5797	29400	23626	26301	1	 dl_low	 ul_high	101000	10406
DATA	 2022-09-22-08:12:51	1663845172	1663845172	71344	9867	70	94	1663845172	 1.1.1.1	1221	6251	32800	26575	26290	1	 dl_low	 ul_high	101000	10510
DATA	 2022-09-22-08:12:51	1663845172	1663845172	71344	9867	70	93	1663845172	 1.0.0.1	1221	6403	30100	23720	26278	1	 dl_low	 ul_high	101000	10615
DATA	 2022-09-22-08:12:51	1663845172	1663845172	71344	9867	70	92	1663845172	 8.8.8.8	1221	5812	28100	22310	26267	1	 dl_low	 ul_high	101000	10721
DATA	 2022-09-22-08:12:51	1663845172	1663845172	71613	10222	70	95	1663845172	 8.8.4.4	1221	5818	27200	21403	26255	1	 dl_low	 ul_high	101000	10828

Any thoughts @moeller0 on this?

Correct.

Isn't that what it's supposed to do?

Looks like you only get upload-related bufferbloat, is that right? If so you might be better off with just a fixed download and a lower upload?

Is your cable connection truly variable capacity? And in both directions or just upload?

For any fixed direction you should probably discard using these autorate approaches since they will necessarily worsen performance to test the capacity all the time.

So the upload in the middle seems to work well, ramps up quickly and stays high pretty consistently. I just wonder about the download starting at 2233, this initially reveals that the base rate was too high, but then stays more or less constant, while the shaper rate climbs back up to baseline. I would guess that baseline here still is incorrect... (but that is fine, climbing back to baseline if there are no delay spikes is intended behaviour).

I would say in general yes, but I have not thought through all of the details yet.

Since we only have RTTs we really can not figure out what caused the 3 RTT spike starting from 2341, could be that the upload was too high, but also possible that the lowish download already exceeded capacity. I keep harping on this, but the two achieved rates we have (or their ratio to the respective shaper rates) are not sufficient to resolve that ambiguity.

Hummm... I don't know. Does the graph tell that?

By fixed download you mean, use the maximum download rate achieved without sqm, reduced by ~25%, in all three download variables in the config file?

It hasn't been the case in the last months, but since Monday I've noticed a link deterioration, randomly lowering the rates. Because of that, I thought it would be better to set up adaptive bandwidth.

You mean cpu-wise?

You mean, base_dl_shaper_rate_kbps should be lowered?

This is about your long discussion on alternative approaches like ntpserver, hping etc, right? Or is there anything I can do to improve the test?

Just to illustrate what I said above, here are two graphs I use to monitor link quality.

(The circled area shows the link degradation on Monday. The gap comes from an outage. Before the outage, you can see how stable the Internet link was, with latency in the 99th percentile stuck around 11ms)

The leftmost graph shows cake bandwidth, the rightmost shows ping results to 8.8.8.8. The ping results come from this code, executed every minute:

/usr/bin/ping -qc20 -i 0.5 8.8.8.8 | /etc/snmp/ping-to-json/ping_to_json.sh

In these ping results, I mostly focus on latency.

Potentially... Since I suffer from the same issue that I can not for certain determine which direction was congested...

@gadolf do I read that graph correctly that you had cake-autorate running and it held the bandwidth constant for days until problems started developing? If so this is a first for me and super cool. But I may be missing something and it seems too good to be true.

Certainly I suppose that if base is set to max and min below that this is what would happen anyway. So that way the system is prepared for any congestion or other link deterioration that may arise in the future. It's a bit like @richb-hanover-priv's idea that this is used as a passive monitoring tool, but here the added benefit of kicking in and actively getting on top of any problems if they arise.

Perhaps from the latter angle everyone should be running this?

Round robin pinging is certainly doable. This is basically what the code in my sqm-autorate script does, and ping_ts uses a simplified version of that code that targets just a single reflector, so it could be "unsimplified" if necessary.

However, there's the ongoing problem of perl threads - ping_ts now also uses perl threads to enable it to handle timeouts properly, so it won't work for @patrakov and probably many others.

I did briefly look into what can be done without threads, but it's a massive pain because none of the decent event-driven perl modules/frameworks are available on OpenWrt without installing CPAN which requires a ton of other dependencies. This means that there's no easy way to do multiple concurrent timers/alarms for events such as sending and timing out ping requests :disappointed:

Ah bother. Linux is apparently due to ship with rust at some point in the future. But otherwise is the only other good option C? @Lochnair said he had an ICMP type 13 approach already written in C - @Lochnair could that be made to round robin and/or issue ECHO requests by sending out requests with interval X between them and all sent out according to offset Y from a common point of time? To elaborate on the latter I mean something like this:

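[image: timing diagram of ECHO requests sent at interval X, each instance offset by Y1/Y2 from a common point in time]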

Because we want the timings to be maintained such that we get increased granularity with multiple reflectors that is maintained (so we don't want ECHO requests to drift). Otherwise pings can start off in the right way but owing to drift we can lose the granularity.

fping is nice because it explicitly offers round robin and forces maintenance of granularity by specifying 'period' between packets to given reflector and 'interval' between individual echos:

fping --period $reflector_ping_interval_ms --interval $ping_response_interval_ms

So I think capturing this would be nice, either by true round robin or by individual instances working on the common offset Y1/Y2 as identified above.
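To spell out how the two values relate, here is a sketch that derives 'interval' from 'period' and the number of reflectors. It assumes we want the echoes evenly spread, i.e. interval = period / number of reflectors; the function name is made up, and it only prints the options rather than running fping:

```bash
# Hypothetical sketch: derive fping spacing options for evenly spread echoes.

fping_spacing_args()
{
	local reflector_ping_interval_ms="$1"   # time between pings to the same reflector
	local no_reflectors="$2"
	# Assumption: spread the reflectors evenly across one period
	local ping_response_interval_ms=$(( $reflector_ping_interval_ms / $no_reflectors ))
	printf -- '--period %s --interval %s\n' "$reflector_ping_interval_ms" "$ping_response_interval_ms"
}
```

For four reflectors at a 200 ms period this prints `--period 200 --interval 50`, so a response should arrive roughly every 50 ms, e.g. via `fping --timestamp --loop $(fping_spacing_args 200 4) 1.1.1.1 1.0.0.1 8.8.8.8 8.8.4.4`.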

Hum, I'm afraid not. Before the gap in the graph, I was running cake with fixed bandwidth, no cake-autorate running, sorry.

1 - Cake, fixed bdwt
2 - Cake, adaptive, standard
3 - Cake, adaptive, stall
4 - Cake, fixed bdwt

EDIT: During 1 and 2, I lowered the max bandwidth to around 300 Mbps, since I wasn't sure the ISP was capable of handling more than that, as per some speed tests in the period.

Simplified code, but what it's currently doing is this:

while (true) {
    sleep_time = tick_duration / reflectors_len
    for (reflector in reflectors) {
        send_ping(reflector)
        nanosleep(sleep_time)
    }
}

In fping terms I guess this would be fping -l --period=0 --interval=200 1.1.1.1 1.0.0.1, though that doesn't actually work.

Would it be feasible to support user-defined reflectors?

I can imagine use cases that would want to optimize with respect to a specific host or hosts.
Rather than a generic sampling of public reflectors.

For instance, most online games run on a known set of dedicated servers.
An esports competitor would prefer to optimize with respect to whichever servers host their game.