CAKE DiffServ theory

In respect of this paper:

And specifically:

Please can a kind soul give me a back-of-the-envelope-style calculation explaining the 1250 ms latency spike associated with the initialisation of the 32 TCP streams, and the 200 ms steady state?

Is this example in the paper with the shaper applied? If so, I don't entirely understand the spike and the steady-state increase.

Really appreciate any thoughts.

So is this:

Not enough on its own then? Why not?

In other words why is DiffServ still needed?

I ask because of all the threads on this forum about DSCP markings, which practice seems a little nebulous and messy.

Whereas the theory in respect of the initial shaper part of the paper makes sense to me and seems nice and clean.

For modern connections, is this only a problem relating to the upload direction and a huge number of parallel streams? If not, I wonder whether a couple of real-world examples could help my weak but very eager understanding?

BOTE: 32 bulk flows all starting at the same time, each with an initial window of, say, 10 packets, inject 32 * 10 packets into the network, or roughly 32*10*1514*8 = 3875840 bits. At 10 Mbps (or rather 10-2, remember our probe flow runs at a fixed 2 Mbps) these take (32*10*1514*8)/((10-2)*1000^2) = 0.48448 seconds, or roughly 484 milliseconds, to drain. But these flows will not stand still; they will try to ramp up their congestion windows, dumping even more packets into the queue until they get sufficient feedback to slow down again. The exact dynamics of that depend on the TCP variants used by the senders, the path RTT to the receiver, and the resulting ACK flow.
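For anyone who wants to play with the numbers, here is the same arithmetic as a small Python sketch; the initial window of 10 packets (IW10) and the 1514-byte frame size are the assumptions stated above:

```python
# Back-of-the-envelope: initial burst from 32 TCP flows starting at once,
# drained at the shaper rate minus the fixed-rate 2 Mbps probe flow.
flows = 32          # bulk TCP flows starting simultaneously
iw_packets = 10     # assumed initial congestion window (IW10)
pkt_bytes = 1514    # assumed full-size Ethernet frame
shaper_mbps = 10.0
probe_mbps = 2.0    # fixed-rate unresponsive flow

burst_bits = flows * iw_packets * pkt_bytes * 8
drain_bps = (shaper_mbps - probe_mbps) * 1e6   # capacity left for the TCP burst

print(f"initial burst: {burst_bits} bits")                   # 3875840 bits
print(f"drain time: {burst_bits / drain_bps * 1e3:.0f} ms")  # ~484 ms
```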

The 200 ms steady state is probably a result of the number of flows: with 33 flows, each gets access to the link every 33 * 1514 * 8 / (10*1000^2) = 40 ms (assuming fair round-robin and all flows using maximum-sized packets), and due to fair queueing each flow gets 10*1000/33 = 303 kbps of capacity. The TCP flows will probably adapt to that rate more or less, but the real-time flow will keep running at 2 Mbps, so cake will ramp up its dropping rate. It seems that the 200 ms is the steady state (at the given temporal resolution of the measurements) of cake using drops to try to rein in the unresponsive flow while that flow unrelentingly continues to send packets at 2 Mbps.
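The same steady-state numbers as a sketch, under the same assumptions (fair round-robin, all flows backlogged with maximum-sized packets):

```python
# Steady state: with 33 flows in fair round-robin over full-size packets,
# how often does each flow get a transmit opportunity, and what is its
# fair share of the shaped capacity?
flows = 33          # 32 TCP flows + the 2 Mbps probe flow
pkt_bytes = 1514    # assumed maximum-sized packets for every flow
shaper_mbps = 10.0

round_s = flows * pkt_bytes * 8 / (shaper_mbps * 1e6)  # one scheduling round
fair_share_kbps = shaper_mbps * 1e3 / flows

print(f"transmit opportunity every {round_s * 1e3:.0f} ms")  # ~40 ms
print(f"fair share per flow: {fair_share_kbps:.0f} kbps")    # ~303 kbps
```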

In this context each of N flows will get 1/N of the total capacity and will have to wait for all other N-1 flows to have their packets transmitted before getting its next packet transmitted. Sometimes this period of N transmission times can be too large. For example, think of an online game that sends updates of the world state as a burst of packets every 1/60 seconds, where the client can only make sense of the new world state after having evaluated all of the packets; ideally these bursts are not maximally interleaved at the AQM but transmitted back to back. That requires a way to select packets for special treatment, and that is what DSCPs are used for.
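To make the interleaving cost concrete, here is a toy calculation; the burst size of 5 packets and the 200-byte game packet size are made-up illustrative values, not from the paper:

```python
# Toy illustration: a game sends a burst of small packets every 1/60 s and
# the client acts only once the whole burst has arrived. Under fair
# round-robin against 32 other backlogged flows, the burst is spread over
# roughly one scheduling round per packet.
competing = 33           # total flows sharing the link, as above
bulk_pkt_bytes = 1514    # competing bulk flows use full-size frames
burst_packets = 5        # hypothetical game burst
game_pkt_bytes = 200     # hypothetical game packet size
shaper_mbps = 10.0

round_s = competing * bulk_pkt_bytes * 8 / (shaper_mbps * 1e6)  # ~40 ms

interleaved_ms = burst_packets * round_s * 1e3
back_to_back_ms = burst_packets * game_pkt_bytes * 8 / (shaper_mbps * 1e6) * 1e3

print(f"maximally interleaved: ~{interleaved_ms:.0f} ms")   # ~200 ms
print(f"back to back:          ~{back_to_back_ms:.2f} ms")  # ~0.8 ms
```

(Cake's DRR actually uses a byte quantum, so a flow of small packets may get more than one packet per round; that changes the constants, not the point.)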

Not really; it is a function of whether the latency requirements of your important flow can be fulfilled by getting transmitted every Nth "timeslot" or not. If yes, no special treatment is required; if no, you need to carve out an exception for this flow. This is true in general (as transport protocols are designed to "fairly" share bottlenecks), but it is even more important for a flow-queueing system like cake, which can and does enforce pretty equitable sharing of the bottleneck capacity between all active flows. Note how this description does not mention traffic direction at all. Naively one might expect this problem to show up more often on the uplink, because that is often smaller than the downlink for many popular access technologies, but at the same time many users have more ingress/download traffic than upload traffic (downloads, video streaming, ...), so I am not sure the naive hypothesis is correct.

Just look at all the gaming use cases in the qosify thread or the "Ultimate SQM settings" thread. Admittedly, the issue there is often that the links are slow and gaming traffic adds up to a considerable fraction of capacity, but this is still where the desire for prioritisation comes from.

My take on this is that prioritisation requires self-restraint: up- or down-prioritising a few carefully selected "connections" can do wonders, but one needs to take care not to go overboard, and I think one should always ask and test whether prioritisation is actually required/helpful. Prioritisation will not "create" low latency de novo; it really just shifts resources (like transmit opportunities) around, so it is essentially a zero-sum game: for every packet that is transmitted earlier than its "fair share", other packets will be delayed by that packet's transmission time. (This is the rationale for why I push back against configurations where apparently most traffic ends up in the high-priority tiers; at that point prioritisation becomes futile.)


Thank you very much indeed for the helpful explanation. I think I get it all save for:

I struggle with this portion. Will each TCP flow not get 8*1000/32 kbps, and the real-time flow 2 Mbit/s? I don't understand 'cake will ramp up the dropping rate': is that the dropping of the TCP flows, the real-time flow, or both? And why does this result in a steady-state latency increase?

It should not, because in flows mode cake treats all flows equally, so each of the 33 flows gets 10/33 = 0.303 Mbps. The TCP flows will adjust their sending rate (indirectly, via the congestion window) to oscillate around that value, while the 2 Mbps flow will also only get 0.303 Mbps. That is how flow queueing works: all flows seeking capacity >= their "fair share" are effectively throttled to that fair share.

That is true for all flows: codel/cobalt will start to drop gently, in a way that is tailored to how TCP responds to drops, but it will reduce the interval to the next drop if a flow persistently shows sojourn times above target, and hence it will keep dropping faster and faster if a flow does not respond. Actually, cake will also switch to a second mode similar to the BLUE AQM, which is designed to handle unresponsive flows somewhat less gently than the codel component; I do not remember, though, how the switch from codel dropping to blue dropping is controlled.
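For intuition, here is a minimal sketch of codel's drop-scheduling law as I understand it: after the first drop, the next drop comes interval/sqrt(count) later, so against a flow that never responds the drops get denser and denser. Cake's actual COBALT code also has the BLUE-style component mentioned above, which is not modelled here:

```python
# Minimal sketch of codel's control law for a flow that never responds:
# each successive drop is scheduled interval/sqrt(count) after the
# previous one, so the drop rate keeps increasing over time.
from math import sqrt

interval_ms = 100.0  # codel's default interval

t_ms = 0.0  # time of the first drop, taken as the origin
for count in range(1, 11):
    print(f"drop {count:2d} at t = {t_ms:6.1f} ms")
    t_ms += interval_ms / sqrt(count)
```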

My hypothesis here is that with the totally unresponsive 2 Mbps flow being allowed only 0.3 Mbps, the dynamics between the 2 Mbps arrival rate and the gently increasing drop rate settle at a drop rate commensurate with the arrival rate at ~200 ms worth of packets in the queue (or rather 200-40 = 160 ms, as the queue delay is additional to the round-robin scheduling delay*). The point is that, unlike TCP, which classically will halve its congestion window (and hence its sending rate) after recognising a single dropped packet, our unrelenting 2 Mbps flow will not reduce its sending rate at all, so to get it down to 0.3 Mbps we need to drop packets worth 1.7 Mbit every second. Mind you, without flow queueing our 2 Mbps flow would mostly get its 2 Mbps while the 32 TCPs would share the remaining 8 Mbps more or less fairly among themselves.
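The back-of-the-envelope for that drop rate, assuming the probe flow also uses full 1514-byte packets:

```python
# How hard must the AQM drop to hold an unresponsive 2 Mbps flow to its
# 0.303 Mbps fair share? Everything above the fair share has to be
# dropped, because the sender never slows down.
arrival_mbps = 2.0
fair_share_mbps = 10.0 / 33
pkt_bits = 1514 * 8   # assuming full-size packets on the probe flow

excess_bps = (arrival_mbps - fair_share_mbps) * 1e6
print(f"excess: {excess_bps / 1e6:.2f} Mbps")                  # ~1.70 Mbps
print(f"drops needed: {excess_bps / pkt_bits:.0f} packets/s")  # ~140 packets/s
```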

There are some who claim that flow queueing (FQ) is broken because it will not allow our hypothetical 2 Mbps streaming load, but to that I call BS: without FQ, every sufficiently unresponsive flow gets the lion's share of the capacity, potentially starving all other, more responsive flows to a trickle. That might be fine if the unresponsive flow is the most important flow on that link, but if the now-starved flows are more important to the end user, he/she is out of luck... FQ might not be optimal, but it sure as hell is rarely pessimal...
But it does leave the question of what to do if one wants to accommodate flows that require more than their fair share of the capacity... For cake, one possible solution is to selectively move such flows into the higher-priority tins. E.g. in diffserv3 the highest (Voice) tin gets 1/4 of the total shaper capacity, so if on a 10 Mbps shaper we move our 2 Mbps unresponsive flow into this Voice tin while keeping the 32 TCPs in Best Effort, we can still enjoy both the TCPs making progress and the unresponsive flow running at its desired 2 Mbps (because 10/4 = 2.5, the tin has enough capacity to accommodate that flow; if we want to push an 8 Mbps fixed-rate flow we are out of luck, as cake does not offer that).
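A trivial check of the tin arithmetic, using the 1/4 Voice-tin threshold for diffserv3 described above:

```python
# Tin arithmetic for the diffserv3 example: the Voice tin's threshold is
# 1/4 of the shaper rate, so a fixed-rate flow fits only if it stays
# below that threshold.
shaper_mbps = 10.0
voice_tin_mbps = shaper_mbps / 4   # 2.5 Mbps on a 10 Mbps shaper

for flow_mbps in (2.0, 8.0):
    verdict = "fits" if flow_mbps <= voice_tin_mbps else "does not fit"
    print(f"{flow_mbps:.0f} Mbps fixed-rate flow {verdict} "
          f"in the {voice_tin_mbps} Mbps Voice tin")
```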
That said, others like @dtaht, @tohojo, @chromi can probably offer far more detail and information.

*) Which in all likelihood is not exactly 40 ms either, but that should be the right ballpark here. But that is splitting hairs; the sojourn time of packets in the 2 Mbps flow's queue will be ~200 ms.
