I’m interested in applying cake-qos-simple to only specific MACs. I’ve commented out the necessary parts and it appears to work. Is there a better way to go about it?
ether saddr { XX:XX:XX:XX:XX:XX, XX:XX:XX:XX:XX:XX } oifname wan ct mark & 128 == 0 goto store-dscp-in-conntrack
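One way to avoid editing the rule itself for every new device would be a named set in the custom nft.rules file. This is only a sketch - the set name is a placeholder I made up, and the elements are the same placeholder MACs as above:

```
# Hypothetical named set - the set name "qos_macs" is illustrative,
# not something cake-qos-simple defines.
set qos_macs {
    type ether_addr
    elements = { XX:XX:XX:XX:XX:XX, XX:XX:XX:XX:XX:XX }
}

# The rule then matches on the set, so adding a device means adding
# one element rather than editing the rule:
ether saddr @qos_macs oifname wan ct mark & 128 == 0 goto store-dscp-in-conntrack
```

Elements can also be added at runtime with `nft add element` without reloading the ruleset.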
Edit:
Looks like there’s more I would need to do and I’m not sure it’s within this project’s scope. Sorry for the noise.
I’ve been running with it since I posted about it and it’s working well. I just feel more comfortable applying conntrack to only machines that need it and that I’m in complete control of. It limits the potential for abuse and unforeseen issues.
I thought I could make it more efficient if I did it via the ‘tc filter’ portion of the script but my knowledge is lacking. It seems I would need to add a new line per MAC if I go that route. I didn’t get too far. I just have this in my notes should I get motivated to come back to this.
I wouldn't go that route. What you could do instead is have nftables apply a mark and then match on that mark in tc - like so:
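Something along these lines (sketch only: the mark value, interface name and flow IDs are placeholders, and the tc handle would have to match whatever qdisc setup the script actually creates):

```
# nftables side: tag packets from the chosen MACs with a packet mark
ether saddr { XX:XX:XX:XX:XX:XX, XX:XX:XX:XX:XX:XX } oifname wan meta mark set 0x1

# tc side: match that mark with the fw classifier and steer those packets
tc filter add dev wan parent 1: protocol all handle 0x1 fw flowid 1:1
```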
But I'm not sure what the advantage of that would be versus instead just limiting classification in nftables to source MAC as you do already.
Correct - it's up to the user to set the outbound DSCPs as desired using nftables and/or by setting them in the individual LAN clients. I personally have a mixture - e.g. the router sets stuff like DNS requests or NTP requests to 'voice' and my office computer sets Teams/Zoom traffic to a mixture of 'video' and 'voice'.
It's difficult to generalise. My thinking is that with the custom nft.rules file users can easily add their own rules for individual use cases by following the nftables format. I don't see the point in making a separate system for cake-qos-simple that translates to nftables rules.
Yes it’s very powerful. I think leveraging connection tracking is super useful to enable setting of DSCPs on upload at router and/or by individual LAN clients, saving those to conntrack, and then restoring DSCPs on download. And harnessing nftables directly provides the flexibility and power that comes with that.
@moeller0 I have a question for you. I've been observing the tc cake stats whilst running speed tests and have observed, for example, the below taken during a saturating download speed test with the 'ondemand' CPU frequency governor:
I am trying to understand 'pk_delay', 'av_delay' and 'sp_delay' - I gather that these are short for 'peak delay', 'average delay' and 'sparse delay', and that EWMAs are involved. But what these represent is very hazy in my mind.
I'd really value a kind of summary on what these values represent and how we can interpret them.
Looking through various posts on the forum in which you comment on 'pk_delay', I gather that sometimes large values can be indicative of CPU saturation.
In what way can CPU saturation give rise to an increase in 'pk_delay', and how can we tell whether a large 'pk_delay' is a result of CPU saturation or something else? I'm still curious about whether the 'ondemand' CPU frequency governor might be hurting performance on my router.
Yes, these are EWMAs of the peak, the average and the "sparse" delay... sparse delay is for flows classified as sparse (sp_flows); these get a slight boost over other flows and hence are expected to see lower latencies. The main problem with these measures is that they are EWMAs updated with every packet dequeued, so after a saturating load, as long as there still is some traffic, even the peak delay value will come down relatively quickly... I always fantasize about having the peak value be the true maximum, but reset on each read (so the maximum sojourn time since the last read-out), but I have not even thought about how to accomplish that...
pk_delay weights the most recent value more heavily than av_delay does, so it will show larger swings and hence is easier to interpret. Other than that, if cake runs out of CPU cycles (or does not get scheduled quickly enough) it will not drop packets hard but will show increased delay, and as I said pk_delay tends to be more affected by that.
We can only take this as a hint, not as proof, so I would correlate it with current CPU load (ideally per CPU) and potentially CPU frequency to get stronger evidence.
Thanks for this explanation. I'm still a bit hazy. Is there any documentation on these values I wonder? Might it be possible or helpful to describe the situation from the perspective of a few particular packets and what cake does with them, and how this affects these metrics?
For example:
What governs this other than available CPU time?
Is there some sense in which the connection itself has an impact?
Suppose cake bandwidth is set to 60Mbit/s and true bandwidth as in capacity on the line is 10Mbit/s, and there is an attempt to exchange a) load that is 20Mbit/s b) load that is 70Mbit/s. How does the scheduling work in those situations? Or perhaps other such hypothetical examples might be better.
So pk_delay simply weights past and current value differently than av_delay, but both get updated for every dequeued packet...
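As a toy illustration of that weighting (the constants below are made up for readability and are not cake's actual EWMA shifts), something like this shows why a faster-reacting peak EWMA spikes harder and decays more slowly than the average:

```shell
# Toy EWMA demo - weights are illustrative, not cake's real constants.
# Stream of sojourn-time samples (us): steady 500us with one 8000us spike.
result=$(awk 'BEGIN {
    n = split("500 500 500 8000 500 500 500 500 500 500", s, " ")
    av = 500; pk = 500
    for (i = 1; i <= n; i++) {
        d = s[i]
        av += (d - av) / 16              # average: slow weight in both directions
        if (d > pk) pk += (d - pk) / 2   # peak: reacts quickly to larger samples...
        else        pk += (d - pk) / 16  # ...but decays slowly afterwards
    }
    printf "%.0f %.0f", av, pk
}')
av_final=${result% *}
pk_final=${result#* }
echo "after the spike: av_delay=${av_final}us pk_delay=${pk_final}us"
```

Six packets after the spike, the peak estimate is still several times the average, which matches the observation that pk_delay is the easier of the two to interpret during transients.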
If I set up a timer to wake/run cake again in 10ms and the kernel takes 15ms, there is not much I can do, as I will be 5ms later than desired; the question then is how to deal with that unsolicited delay...
So are the 'pk_delay', 'av_delay' and 'sp_delay' values always associated with (avoidable) delays introduced by processing limitations (in terms of taking time to get round to doing something), or is there a situation in which something else can give rise to the delays?
Would a hypothetical computer with infinite processing capability result in these values always being zero?
No, these really are just (biased) EWMAs over the sojourn times of dequeued packets... anything that makes a packet stay enqueued longer will result in delay spikes, be it that the queue cannot be serviced as quickly as desired or that the number of arriving packets exceeds the number of departing packets (and that, to a degree, is the reason for having queues: they act as "shock absorbers").
Is the servicing of the queues in any sense limited by the cake bandwidth? I initially wondered about that, but I thought the cake bandwidth dictates the rate of dropping, not the timing of servicing packets in the queue.
Does a slow connection somehow limit the rate of servicing a queue? If so, how does that work?
Once the packets have been downloaded, and cake can either send them on their way or drop them, shouldn't there be essentially zero build-up of queues given sufficient processing capacity? If so, then aren't significant delay values always a measure of processing limitation?
Yes and no, conceptually cake needs to become active when:
a) a new packet arrives and is enqueued (and timestamped)
b) a packet is about to be dequeued (if operating as traffic shaper cake will dequeue some packets and then wait for some time to dequeue the next packets)
And b) is clearly affected by cake's bandwidth setting.
As long as b) is not happening, packets already in the queue will "age": the current time increases while the enqueuing timestamp stays constant, so the sojourn time at eventual dequeue time will increase. Any packet being enqueued while cake is waiting for the next time to dequeue packet(s) will grow the queue (actually cake uses stochastic flow queueing, so there are multiple queues in parallel, but that is irrelevant here).
Anything that delays b) beyond the time it was supposed to happen will result in lower throughput than expected from the shaper setting, unless the traffic shaper tries to compensate and then dequeues a bit too much (resulting in less throughput sacrifice for being late, at the cost of transiently increasing latency a bit).
Yes, if we keep things simplistic and assume only fixed-size packets and a set shaper rate of say 100 packets/second, that means that if cake dequeued a packet now, the next packet will ideally be dequeued at time now+10ms. During that 10ms, packets being passed to cake will need to be put into a queue and stored... (now, during that wait time, conceptually for a given queue there could be a drop scheduled; I am not sure whether such drops will actually be executed at that time or whether they will wait for the next dequeue action, I believe the latter).
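To make that concrete with made-up numbers: at 100 packets/second the shaper releases one packet every 10ms, so if packets arrive every 5ms, each successive packet waits 5ms longer than the one before it:

```shell
# Toy shaper timing: arrivals every 5 ms into a shaper that releases one
# packet every 10 ms. Sojourn time = dequeue time - enqueue time.
sojourns=$(awk 'BEGIN {
    for (i = 0; i < 8; i++) {
        arrive = i * 5           # ms: packets arrive at 200 pkt/s
        depart = i * 10          # ms: shaper dequeues at 100 pkt/s
        printf "%d%s", depart - arrive, (i < 7 ? " " : "")
    }
}')
echo "per-packet sojourn times (ms): $sojourns"
```

The queue (and with it the reported delays) grows linearly for as long as the arrival rate exceeds the shaper rate, with no CPU limitation involved at all.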
No, the whole idea of AQM is to better manage the queue, and the trick of sqm with its traffic shapers is to get control over the relevant queue... the only time a queue does not change noticeably (that is, the queue size will only change by one packet) is if the incoming rate and the outgoing rate are identical (with the limit being ingress rate < egress rate, in which case the queue size fluctuates between 0 and 1).
But the consequence of this is that with a traffic shaper, we cannot disambiguate whether the queueing delay is growing because of more ingress than ideal, or because the egress rate is lower than expected/desired...
As I said, to diagnose a "processing limitation" we would also like to see that, correlated with longer queues, we also see (close to) saturated CPUs.
I guess from my simplistic description, we could try to add a "dequeueing slack" statistic where we would EWMA the difference between desired and realized dequeue times, with the expectation that this value will increase when we are not getting CPU access in a timely enough fashion...
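A sketch of that hypothetical statistic (it does not exist in cake; the sample times and the EWMA weight here are invented): on each dequeue, smooth the lateness of the realized dequeue time versus the scheduled one:

```shell
# Hypothetical "dequeueing slack" statistic - not an existing cake counter.
# slack = realized dequeue time - scheduled dequeue time, smoothed by an EWMA.
slack=$(LC_ALL=C awk 'BEGIN {
    # scheduled every 10 ms; realized times drift late, e.g. under CPU pressure
    n = split("10 20 31 43 56 70", realized, " ")
    ewma = 0
    for (i = 1; i <= n; i++) {
        late = realized[i] - i * 10   # ms late vs. the 10 ms schedule
        ewma += (late - ewma) / 4     # illustrative weight
    }
    printf "%.2f", ewma
}')
echo "smoothed dequeue slack: ${slack}ms"
```

Unlike the sojourn-time EWMAs, this value would stay near zero under pure traffic overload and rise only when the shaper itself is being serviced late, which is exactly the disambiguation the delay statistics cannot provide.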
So it seems that cake is queueing up packets, retaining them for some time depending on the shaper rate (not just processing limitations), and then removing packets from the queues and either sending them onwards or dropping them, again depending on the shaper rate.
Doesn't this introduce some degree of avoidable delay between the time a packet is received by the modem and sent on its merry way, which seems to run counter to avoiding any latency increase? I am sure you are going to tell me that this is not avoidable, but I am not fully getting why yet.
I am thinking - isn't buffering only needed to compensate for processing limitations, rather than for effecting the shaper rate?
Returning to the 100 packets/second example: rather than queueing packets and picking them out of the queues every 10ms, wouldn't it instead be possible to simply process every packet as quickly as possible (with queueing only as needed to deal with processing limitations) and drop packets as necessary depending on the shaper rate and on whether the packet is to be prioritised or not? I am supposing this should mean not seeing the sort of 10ms+ values I am seeing in the various delay statistics.