CAKE w/ Adaptive Bandwidth [October 2021 to September 2022]

I have patches to do this but failed testing due to missing ‘vstruct’. Going out for the evening, will check how to install vstruct then submit PRs when I get home

On openwrt, most modules for Lua should be installed via opkg if available, but vstruct has to be installed via Luarocks.

luarocks install vstruct

If you don’t have Luarocks installed, you can first get it via opkg.

In regards to EWMA, that’s going to be a @dlakelan question. That’s the math you guys run circles around me with.

But I am more than happy to help answer any questions around Lua syntax :slight_smile:

You see I'm lost about what this does:

            for ref,val in pairs(OWDrecent) do
                upewma = val.upewma and val.upewma or "?" -- Hacky Lua version of a ternary
                downewma = val.downewma and val.downewma or "?" -- Hacky Lua version of a ternary
                print("Reflector " .. ref .. " up baseline = " .. upewma  .. " down baseline = " .. downewma)
            end

In the shell script (and benefiting from @moeller0's suggestion rather far up above in the thread to apply separate up and down EWMA factors, which worked wonders) we calculate the deltas and then check whether deltas are positive or negative. And then in dependence upon the latter we update the EWMA with either a high factor (we want to increase slowly) or a low factor (we want to reduce quickly). This is because we want baselines to increase only slowly so that spikes get detected before the baselines increase and also because we want baselines to drop quickly because whilst delay can increase you cannot get a negative delay so it makes sense to reduce baselines sharply on a sequence of low values.

In summary:

  • determine delta = new value - baseline
  • is delta positive? If so, update baseline slowly
  • is delta negative? If so, update baseline quickly

The EWMA is just bog standard moving average - see here:

EWMA(t) = a * x(t) + (1-a) * EWMA(t-1)

So we just set a to either high or low to update baselines either quickly or slowly for the new values.

There was a suggestion to use a two-pole butterworth filter but I doubt anyone will want to code that into shell. But might be worth using that in lua instead (although that's untested and presumably complicates things).

I just can't understand the logic / how the lua code works in this respect. I don't understand the OWDrecent for example. @moeller0 can you make sense of the code here:

OWDbaseline[timedata.reflector].upewma = OWDbaseline[timedata.reflector].upewma * slowfactor + (1-slowfactor) * timedata.uplink_time
            OWDrecent[timedata.reflector].upewma = OWDrecent[timedata.reflector].upewma * fastfactor + (1-fastfactor) * timedata.uplink_time
            OWDbaseline[timedata.reflector].downewma = OWDbaseline[timedata.reflector].downewma * slowfactor + (1-slowfactor) * timedata.downlink_time
            OWDrecent[timedata.reflector].downewma = OWDrecent[timedata.reflector].downewma * fastfactor + (1-fastfactor) * timedata.downlink_time

            for ref,val in pairs(OWDbaseline) do
                upewma = val.upewma and val.upewma or "?" -- Hacky Lua version of a ternary
                downewma = val.downewma and val.downewma or "?" -- Hacky Lua version of a ternary
                print("Reflector " .. ref .. " up baseline = " .. upewma  .. " down baseline = " .. downewma)
            end
            for ref,val in pairs(OWDrecent) do
                upewma = val.upewma and val.upewma or "?" -- Hacky Lua version of a ternary
                downewma = val.downewma and val.downewma or "?" -- Hacky Lua version of a ternary
                print("Reflector " .. ref .. " up baseline = " .. upewma  .. " down baseline = " .. downewma)
            end
        end

But I can just about understand the rest.

I'll let @moeller0 and @dlakelan weigh in with their expert opinions on the math here, but I will say that the Lua script is still very much a WIP. There is a lot in the current version of the script that is subject to change and improvement.

I know @Lochnair and I were just trying to prove out the looping constructs in concert with @dlakelan's experience with coroutines. I think we're at a super place now where the coroutine configuration is going to work well. But as for actually tuning the process and math--that's where we're just now able to start focusing.

I consider this to be hilarious, @dlakelan certainly is a math expert by virtue of his profession and education, but I am a wet-ware biologist for crying out loud (and over here biologist do not have the reputation of being mathematically gifted ;))

Honestly, I do not understand that fully (I would need to look at the code that "consumes" these structure fields), but I think that @dlakelan actually proposed a modified algorithm and these might be inputs required for that.

Side-note about the use of EWMA, there is nothing clever about this, it just happens to be a rather simple way to keep some historic state about the (recent) past in a single variable that also will auto-tune/adjust to new conditions, allowing us to not having to bother about the static delays to the reflectors, as the code will quickly converge on a reasonable estimate based on the recent history. The use of a faster convergence rate for baseline corrections down compared to up, is just based on the fact that we know that under load delay will artificially increase and we would like to keep these known delay increases out of our baseline (as that ideally reflects the minimal or path delay to/from the respective reflector).
In theory we could try different ways to account for the recent past, like moving averages or fancier models like moving median or 2-pole butterworth filter, but as far as I am concerned @Lynx design and tests indicate that the EWMA part seems tp work well enough to not bother at the moment to change that.

1 Like

After testing sockets in C last night, I ended up doing another timestamp PoC in C just to see how difficult or not it would be. Chucked in pthreads to test that as well.

If it's gonna be used at all or not I don't really care, this was just because I thought it'd be interesting.

Pushed to a new branch on @_FailSafe's repository in case anyone else is curious about this.

Next up - UDP reflector.

1 Like

@Lynx Using your latest OWD script, I had two instances over this weekend where something "bad" happened which caused TC to completely tighten download bandwidth to 0 (or near 0) to the point where my internet connectivity was completely dead. I had to restart my SQM service each time to restore connectivity.

Unfortunately this happened back-to-back and as luck would have it, it happened whilst I was troubleshooting a work incident and did not have availability to dive into this issue. It was a "just fix it now" kind of situation and I didn't get any meaningful logs or data to pass along during this time frame.

I know that doesn't help improve anything, but I'm curious if anyone else has had a similar occurrence with the OWD variant.

Yes, we will re-introduce a hard lower limit (that is @Lynx is in the driver's seat here, I will just try to convince him :wink: ), setting a shaper to 0 is NEVER the right thing, because at that point latency under load is as long as the shaper remains in that wrong position, keep in mind that e.g. TCP absolutely requires to be able transfer packets in both direction other wise the lack of ACK packet will or data packets will make the flow stop hard.

The issue, I think, was that minimum and maximum as required parameters where removed from the OWD version to aim for a single rate configuration per direction; and we simply have not yet come around to re-introduce both of them as safety parameters (min should probably be >= 100Kbps and max = to the current interface link speed, no use in setting the shaper for a 100Mbps ethernet link to 1Gbps).

@moeller0 beat me to it!

There is still no hard minimum so it can suffer from chopping all the way down to zero and then failing under certain conditions. Hard lower limit seems to be the way to go. @moeller0 what did you suggest again? 1Mbit/s? Baseline drift may also be an issue.

Indeed, I just wanted to get some stats into a table, it isn't trying to do anything fancy just a big standard EWMA so far. However it is doing it at two different response rates, one that represents more recent samples and one that represents a longer history. This is all subject to major change.

2 Likes

For my link and use-case 1Mbps is just fine, but I think for safety reasons we could go a bit deeper if need be, but I fail to see much use in going below 100Kbps. Mind you a shaper throttling to 100 Kbps will not be fun, but it should not be as catastrophic as shutting down transmission in that direction completely.

What about min as a percentage of max instead of a static min?

FWIW I suggest adopting the baseline handling adopted in the shell script, configurable lower bandwidth limit, but then modifying the sending of requests based on:

I like the sound of this bit:

with every new returning request we can repeat our evaluation

Does this leave outstanding questions? Perhaps how the load is factored in. At the moment in the shell script load is calculated per tick. How would load be calculated for the different responses? Just based on present instantaneous load calculated between the requests? Or based on another time delta? This requires some thinking about I think.

For safety I recommend a default hard minimum below which we know things will break, otherwise problems with getting the maximum correct might invalidate our attempt at keeping the minimum sane. I am making things up here obviously, but the goal would be for that Minimum to never/rarely trigger, we really just need tp avoid the throttle hard situation, because delays will not recover from that....

Fair, but I start seeing major issues far before I even get down to 1mbps.

We still should have a lowish forces event rate, so that we are sure we slowly converge back to the set rates if neither load not delay indicate urgent changes, no?

Hmm. I suspect this relates to the configurable parameters in the script, like how the baselines update or how much the download and upload rates are punished (maybe too much). Or bad choice of reflectors for your connection. Or something similar?

We could try to fix the shell script to work, but honestly I think it is worth that you guys get something that works in lua now since that seems the best way forward. We can then optimise that to work on multiple connections. I am happy to test to death on my connection. And maybe I can even contribute to lua since I can actually understand most of it.

I am really keen that multiple individuals work on the same thing. Going back to my very first post on this thread, a problem hitherto has been that too many individuals have been working on their own DIY hacks.

Cheesy graphic time. Now we can work as a team:

image

So the goal from my perspective is to pull everything good together into one solution for OpenWrt. One adaptive rate routine to rule them all. @moeller0 has convinced me that the CAKE was not half-baked. It just needed icing.

Or one of these:

image

Or maybe something else, since not everyone approves of alcohol.

Haha - 'Sherry' the OpenWrt adaptive rate script to complement CAKE.

1 Like

I am not opposed to:
a) setting a default to > 1 mbps (mind you for such low rates we might also need to automatically adjust the threshold parameter) as a single full MTU will hog the link for 1000*(1500*8)/(1000*1000) = 12 ms and we probably need to allow for ~2 packets before our probes.

b) keeping this still as a parameter users could modify, that is we need a default that avoids hard failure and beyond that user's can tune it if they feel like it.

1 Like

As mentioned in my thread Why you need at least 3Mbps upload to get good game performance with ~1500byte packets: Doing the math lots of latency sensitive stuff will start to break down below 3Mbps. On the other hand plenty of people have slower connections than that. So I don't think we can easily just select that as the minimum. However perhaps the lower of 3Mbps or 10% of max but not less than 200kbps ?

1 Like