This might just be a coincidence of a fiber line snapping right as I did the test, as there was a high-winds warning.
But the issue I've been having with cake is that during ingress it would be very, very jittery, and you can feel that while gaming, whereas codel worked the way cake did while I was in the States.

I'm not actually sure; there weren't any documents provided with the unit I was given. They claim they don't throttle the connection, but they also said that if a lot of users connect to the internet it will lower the speeds... so I'm not sure they understood what I asked.

But from my experience using this type of network, I'm frustrated and annoyed, as I don't know whether it's due to something I did or something the ISP is doing... because all results feel extremely variable, except for the past few days.

@moeller0 I was also playing around a lot with cake-autorate today. It seems that using tsping, and hence pure OWDs, actually doesn't work so well on my irregular connection: since in cake-autorate only the download OWD is considered when setting the download shaper rate, even though the upload OWD climbs to, say, 400ms on a heavy saturating download, the download shaper rate is simply allowed to increase and increase. By contrast, using fping, and hence synthetic OWDs set based on RTTs, the increase in upload OWD is factored in when setting the download shaper rate, and hence large RTT spikes associated with increased upload OWD are avoided.
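To illustrate the mechanism (a toy sketch of my own in Python with invented numbers, not actual cake-autorate code): with true OWDs the download delta stays flat while only the upload queue fills, whereas RTT-derived synthetic OWDs split the RTT increase across both directions, so the download controller reacts to it as well.

```python
# Toy sketch (not cake-autorate code; numbers invented) of why RTT-derived
# "synthetic" OWDs couple the two directions while true OWDs do not.

def dl_delta_true_owd(dl_owd_ms, dl_owd_baseline_ms):
    # tsping-style: the download delta only sees download-direction delay
    return dl_owd_ms - dl_owd_baseline_ms

def dl_delta_synthetic_owd(rtt_ms, rtt_baseline_ms):
    # fping-style: each synthetic OWD is RTT/2, so upload bloat leaks into
    # the download delta as well
    return (rtt_ms - rtt_baseline_ms) / 2

# Upload queue adds ~400ms while the download path itself stays clean:
print(dl_delta_true_owd(20, 20))        # 0 ms   -> download shaper keeps climbing
print(dl_delta_synthetic_owd(440, 40))  # 200 ms -> download shaper backs off
```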

This simply seems to be an oddity associated with my connection since at least one GitHub user does not see this issue and gets fantastic results with tsping and its OWDs.

This made me play around quite a bit with the delay settings. And I also found that on my connection the default bufferbloat refractory period of 300ms may not be enough to clear bufferbloat. From my testing it seemed to take more like 2 seconds for the reduced shaper rate to clear the excess data. Is that plausible or absurd?

As it should; we have a strong assumption that congestion is directional... and it seems to be, otherwise downstream rates would not increase... Now, I agree that for whatever your ISP does on your link, a policy of coupling both OWDs to detect non-directional congestion might be a better policy, and luckily that policy is easily selected by using one of the RTT delay sources instead of the OWD delay source.

My question is more, what happens in the upload direction if you actually start generating an upload load?

As so often, that really depends... IMHO waiting 2 seconds after each rate decrease is not helpful. Assume the rate reduction step was not big enough, say the bottleneck dropped from 100 Mbps to 10 Mbps while we only dropped the shaper rate from 100 to 75 (assuming the largest reduction step); we will still operate at a rate 7.5 times higher than the bottleneck rate. From that perspective, the longer the refractory period, the larger the rate reduction step needs to be if we want to maintain lowish delay under load. So in your case maybe we need to introduce a minimum reduction step on bufferbloat detection (the current code will essentially just maintain the current shaper rate on detected congestion, so maybe we need to start with a "gentle" reduction already)?
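To put rough numbers on that (a back-of-the-envelope sketch of my own, reusing the 100 -> 10 Mbps example above, not anything from the actual code):

```python
import math

# How many consecutive maximum-size reduction steps are needed to get the shaper
# below the new bottleneck rate, and how long that takes per refractory period.
shaper_mbps = 100
bottleneck_mbps = 10
max_reduction = 0.25

steps = math.ceil(math.log(bottleneck_mbps / shaper_mbps, 1 - max_reduction))
print(steps)                  # 9 steps of -25% to go from 100 Mbps to ~7.5 Mbps
print(round(steps * 0.3, 1))  # 2.7 s with the default 300ms refractory period
print(steps * 2.0)            # 18.0 s if each step has to wait a full 2 seconds
```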

The upload OWD then decreases, but only under substantial load. I tested, and a Teams load is not enough; it needs to be a total upload load of, say, 4 Mbit/s or more.

But the flip side is that without this we drop far too low on my connection, so we swing from, say, 60 down to 15 when 40 would have sufficed, because with each drop the buffers are still emptying. The net effect is that the shaper rate is brought too low and takes time to recover, so lots of bandwidth is lost. I am finding that the 0.25 drop with the 2s wait seems to be enough to clear things by the end of the 2s, and during that time the delay and the number of detected delays (X out of Y) steadily decrease from, say, 6 to 0.

Currently we scale the reduction between 0 and 0.25 based on the ratio between the average delay and the average OWD delta threshold.

I don't think the reduction will often be zero, because for bufferbloat detection X out of Y samples have to exceed the OWD delta threshold.

Say the OWD delta threshold is set to 100ms and the average OWD delta threshold to 150ms; then the average is not likely to be much less than 100ms if 3 out of 6 samples exceeded 100ms. If the average is 100, then we scale by 100/150 (so 2/3 of the full 0.25 reduction).

So there's a kind of built-in minimum already.

I'd wondered before about switching from the X out of Y to just the plain average for detection, which would seem simpler to understand and would harmonize bufferbloat detection with the rate reduction determination by basing both on average deltas, but you didn't think this would be a good idea.

I would talk to the ISP, this is rather problematic...

Swinging too low is part of our design... we really need to get below the bottleneck rate quickly, dissipate all queues and then slowly ramp up again. This is all a trade-off between throughput and latency under load; there are different (too many) toggles in our controller to affect that trade-off (and as local policy all is fair game, but as defaults I think we should aim for low latency over throughput).

But if on bloat detection we really just reduce the rate by ~0% (because we only just crossed the threshold), we will stay at too high a shaper rate for a full 2 seconds, and that is not going to be great for responsiveness. The thing is, we do not really know how many reduction steps in a row we need to track the true bottleneck rate, so from that perspective we would like to be able to make reduction steps as quickly as possible. We introduced the refractory period really only because we grudgingly accept that the system needs a bit of time to react. It is well possible that the default (300ms?) might not be ideal for your link and you might need to set it higher (I doubt, though, that a full 2 seconds is really the minimum).

Yes. For me that means I would set the detection threshold a bit lower than we did originally, so that we are more likely to get a conservative "hold current rate" event when the delay is close to the acceptable delay, but that is not going to help much if the rate is kept too high for too long due to our refractory period.

It will often be close to zero, I think, essentially holding the current rate (maybe with a gentle reduction) instead of causing a noticeable reduction step.

I might be missing something here, but I thought that if the threshold is 100 and the average OWD delta threshold is set to 150ms, we will reduce between 0 (average close to 100ms) and 25% (average close to 150ms), so if the true average is 100, i.e. just above the threshold, we will reduce by 0%. But I might not understand what the current code actually does.

No, this is, as I tried to explain before, poison for our diversity approach. The X out of Y approach accounts for the fact that none of our reflectors are truly trustworthy, and each individual path to a reflector might be congested upstream of the path segment we want to control. That is why initially we basically operated on the minimum over the full set of reflectors. Now with interleaved/paced reflector emission the minimum over the full set has temporal side-effects, and hence we have the X out of Y approach. But the main point still holds: each individual reflector owes us nothing, and hence its data needs to be handled with care.
I might be insufficiently creative, but I see no way to maintain the diversity angle when operating on the average RTTs over all reflectors, because then even a single reflector can pull the average over the threshold.
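A toy illustration of that concern (my own sketch with invented numbers, not cake-autorate logic): one reflector with congestion upstream of our link can drag a plain average over the threshold, but cannot on its own satisfy a 3-out-of-6 vote.

```python
deltas_ms = [5, 8, 6, 7, 4, 250]   # five clean reflectors, one congested elsewhere
threshold_ms = 30

x_out_of_y = sum(d > threshold_ms for d in deltas_ms) >= 3
average_based = sum(deltas_ms) / len(deltas_ms) > threshold_ms

print(x_out_of_y)     # False: a single untrustworthy reflector cannot trigger bufferbloat
print(average_based)  # True: the average is ~46.7ms, so one outlier triggers it
```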

No, we are not scaling like this at present. Perhaps we should?

Right now:

  • bufferbloat detected if X samples out of Y samples > delta OWD threshold (default is 30ms);

  • reduce shaper rate by reduction factor from 0 to max bufferbloat reduction (default 0.25) based on the ratio between the average delta across Y and the average OWD delta threshold (default is 60ms). That is, scale the reduction factor according to average delta / average OWD delta threshold; and

  • do not change the shaper rate whilst bufferbloat refractory period active (default 300ms).

See here:

So if the average delta is 100, and the average threshold is 150, we jump by 2/3 of the maximum 0.25 reduction. If the average delta is >= 150, we jump by the maximum 3/3, i.e. 0.25 reduction. If the average were to be 0, then we would jump by 0, but it will never(?) be 0 because X out of Y still has to trigger and deltas are generally positive.
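In sketch form (illustrative Python only, not the actual cake-autorate shell code; the numbers mirror the example above):

```python
# Current scaling: reduction factor proportional to
# average delta / average OWD delta threshold, capped at the maximum reduction.
def reduction_current(avg_delta_ms, avg_threshold_ms=150, max_reduction=0.25):
    return min(avg_delta_ms / avg_threshold_ms, 1.0) * max_reduction

print(round(reduction_current(100), 4))  # 0.1667 -> 2/3 of the maximum 0.25 reduction
print(round(reduction_current(150), 4))  # 0.25   -> the full maximum reduction
print(round(reduction_current(30), 4))   # 0.05   -> effective floor with a 30ms detection threshold
```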

Yeah, this is not what I envisioned. I always thought to make this not proportional to the total delay, but just proportional to the above-threshold delay. However, at that point I think it is useful to introduce a minimum reduction step variable as well. The idea was to detect congestion events with the old diversity approach (which we still do) and then scale the reduction step between minimum_reductionstep_pct and maximum_reductionstep_pct, linearly proportional to the position of the average OWD between the threshold and shaper_rate_max_adjust_down_bufferbloat. That gives more direct control over the size of the reduction step. With your approach there is now more coupling between the detection threshold and shaper_rate_max_adjust_down_bufferbloat, which IMHO makes things somewhat hard to predict.
With your current settings we seem to start with a reduction step of
(30/150) * 0.25 = 0.2 * 0.25 = 0.05 -> 5%
go over
(100/150) * 0.25 = 0.667 * 0.25 = 0.167 -> ~17%
to finally reach
(150/150) * 0.25 = 1 * 0.25 = 0.25 -> 25%
If we set the threshold to 50, we get
(50/150) * 0.25 = 0.333 * 0.25 = 0.083 -> ~8%

Almost a doubling. I think the alternative of only scaling above the threshold makes sense, as does the explicit minimum reduction step, because if I need to change the detection threshold I do not necessarily want to change the reduction step size.

OK so scale according to:

(average OWD delta - delta OWD threshold) / (average OWD threshold - delta OWD threshold)?

So with a delta OWD threshold of 30ms, and average delta OWD threshold 60ms, we'd scale according to:

  • average of 30:

(30 - 30)/(60 - 30) = 0/30 => so 0 * 0.25 = 0

  • average of 45:

(45 - 30) / (60 - 30) = 15/30 => so 0.5 * 0.25 = 0.125

  • average of 60:

(60 - 30) / (60 - 30) = 30/30 => so 1 * 0.25 = 0.25

Subject to a minimum, which should be configurable and default to what?

Something like
(average OWD delta - delta OWD threshold) / (average OWD threshold - delta OWD threshold)
to get the fraction in range -> reduction_scale_fraction. Then apply this, something like (untested):
minimum_reduction + ((maximum_reduction - minimum_reduction) * reduction_scale_fraction)

And obviously bound reduction_scale_fraction to be in the range of 0-1...
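A possible runnable rendering of that pseudocode (an untested sketch; the function name and default values are my own guesses, not the shipped code):

```python
def reduction_proposed(avg_delta_ms,
                       delta_threshold_ms=30,
                       avg_threshold_ms=60,
                       minimum_reduction=0.01,
                       maximum_reduction=0.25):
    frac = (avg_delta_ms - delta_threshold_ms) / (avg_threshold_ms - delta_threshold_ms)
    frac = max(0.0, min(1.0, frac))  # bound reduction_scale_fraction to 0..1
    return minimum_reduction + (maximum_reduction - minimum_reduction) * frac

print(round(reduction_proposed(30), 4))  # 0.01 -> minimum reduction, right at the detection threshold
print(round(reduction_proposed(45), 4))  # 0.13 -> halfway between minimum and maximum
print(round(reduction_proposed(60), 4))  # 0.25 -> maximum reduction
```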

Gotcha. And default minimum reduction?

Not sure; start with 1% or so? That way we can see whether this results in a sort of "stay the course" behaviour for small delay deviations... or set it to 5% if you think the current behaviour was decent. This requires some testing and tweaking; at least I can see no simple first principle here to invoke to pick a good value...

What's your prediction about the effect of switching from what I've implemented so far to this modified plan? I suppose now there will be more of the full range from 1% to 100% of the 0.25 reduction, whereas with my approach it was in practice maybe more like 50% to 100% of the 0.25 reduction, since by triggering with X out of Y the average was already pretty high anyway.

Nah, your method scaled from 5% to 25% (see calculations above); with the discussed changes it will still scale from 1% to 25%, but this scaling will now be more independent of the detection threshold (differently dependent, and easier to predict). So this is primarily a change that should have little influence on behaviour, but it now makes it easier to change the minimum and maximum reduction steps.
It might be interesting to try minimum values from 0-5% and see whether these make the controller less "jumpy".
I guess there is still the question of whether we can give useful guidance on how to select the "average OWD threshold" (same for the delta OWD threshold, but there we at least already have some ideas, based on percentiles of low-load samples...).

BTW, is there a way to have OpenWrt run as a virtual machine that translates WiFi into a virtual ethernet interface passed through cake-autorate?

Looks like there might be:

I wonder how random WiFi networks compare with 4G and Starlink in terms of providing a variable bandwidth connection that could be addressed by cake-autorate.

Here is a waveform test from the one I'm connected to right now:

There might be; however, this might require internal reflector(s)... that said, @dtaht's make-wifi-fast effort tried to move enough fq_codel into the WiFi stack to solve the latency issue right there... so maybe get a better AP first? :wink:

But if you ask about what you can do to fix crappy WiFi while traveling, then things become tricky fast :wink:

But that only worked with a small OWD delta threshold (30ms) and a large average OWD delta threshold (150ms).

If we increase the OWD delta threshold to 100ms then with an average OWD delta of 100ms:

  • old method → 100/150 → 2/3 of 0.25; and

  • new method → (100-100)/(150-100) → 0 of 0.25, so set to the minimum.

This will evaluate to minimum_reduction for reduction_scale_fraction equal to 0...

But that is my point: with your original method, changing the detection threshold will immediately change the minimum_reduction magnitude, which I find surprising. For any fixed set of detection and average thresholds, the new method can be set to match the old; the opposite, however, is not true...
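A quick numerical check of that point (my own sketch): between the detection threshold and the average threshold both methods are linear in the average delta, so choosing minimum_reduction = (detection threshold / average threshold) * maximum_reduction makes the new method reproduce the old one exactly; the old method, by contrast, has its effective minimum pinned at that ratio.

```python
def old_reduction(avg, avg_thr=150, max_red=0.25):
    return (avg / avg_thr) * max_red

def new_reduction(avg, thr=100, avg_thr=150, max_red=0.25):
    min_red = (thr / avg_thr) * max_red          # the matching choice of minimum
    frac = (avg - thr) / (avg_thr - thr)
    return min_red + (max_red - min_red) * frac

for avg in (100, 125, 150):
    print(round(old_reduction(avg), 4), round(new_reduction(avg), 4))
# 0.1667 0.1667
# 0.2083 0.2083
# 0.25 0.25
```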

I agree that your proposal makes more sense. I think I've captured it with this:

I also wonder about the best way to disable this now. Values that set the divisor to zero would give an error, so disabling by setting the average OWD delta threshold equal to the OWD delta threshold seems logical to me.

I think the simplest would be to set minimum and maximum reduction to the same value...

But it depends on how you want to implement this: via an if statement with two code paths, or by simply making sure the computation always returns a fixed result...

All sorted now I think.

Here is a new log file capture for when you get a chance to tweak your Octave plotter: