The memory used figure represents the peak queue length, not necessarily the current figure. Also, saying that cake is consuming memory is not quite correct: the kernel manages the actual packet buffers and hence the memory required for them. Cake's job (or any other qdisc's) is to arrange those buffers into a line, or queue.
High queue lengths suggest a traffic flow, or flows, that are not responding to the usual 'congestion indicated' signals of either ECN or packet drop. As the overall queue length approaches 'memlimit', cake simply starts throwing away packets in an attempt to get the queue length back under control... this is actually the BLUE part of the combined CoDel/BLUE algorithm, COBALT.
You are unlikely to improve the situation by increasing memlimit; there may be more benefit in decreasing it, but overall I'd investigate what traffic is going across the link before playing too much with the parameters. Also check CPU usage.
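If it helps, both the peak figure and the limit show up in cake's stats; a minimal way to eyeball them (interface name is just an example):

```
tc -s qdisc show dev eth0
# Look for a line like:
#   memory used: 1248Kb of 4Mb
# i.e. peak queue memory versus the configured memlimit.
```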
I have seen some remote backup clients play badly with ECN enabled: CAKE signals congestion, but the other end doesn't pass the message back, so the local backup client just keeps sending more and more. CAKE classifies those as 'unresponsive flows', and when memlimit is hit, BLUE kicks in and kills packets in that flow with prejudice! In my case, despite the high queue lengths, CAKE kept the other flows nicely isolated and everything was responsive.
@ldir Thanks for the info. BTW it's an x86 box with an E6400 in it; CPU usage is almost non-existent.
When time permits I'll see what type of traffic it is. It's a student house, so who knows; most of them move out soon, so I may never find out. I'll wait and see how many complaints I get. If it's only one of them, CAKE may be doing its job properly with the DSCP marks and keeping the internet working well for everyone else.
The usage in those screenshots represents just under 48 hours, if I remember correctly.
The connection I pay for is 40 Mbit, but it's over-provisioned by about 4 Mbit, so 44 in total. I've already lowered the shaper to 38 Mbit. It's been good so far, though there isn't the same amount of traffic as when the problem was reported to me.
With 100% upload there will be complaints for sure. It's looking more like a cable network problem right now: the cable modem has gone offline or been reset by the students living there twice in the last 12 hours.
The hard part now is being able to VPN in and see the logs and signal levels on the modem before the modem is reset.
^^^ I can't help with custom SQM files because I have not written a script for the tins; it's too complicated, so I just separate the flows by DSCP. Right now my issue is that I can't set the default DSCP for an interface to bulk, so that I could then mark only the connections I want to mark and go that route. I looked into ipset, but I really don't want to make custom tins.
Hello, I'm not sure if the stats below are normal. Tins 2 and 4 register some packets on the LAN interface (eth0.1) but are empty on the WAN interface (eth0.2). Is this even possible?
Can you help me understand this better? Isn't piece_of_cake.qos essentially "besteffort" using a single tin (T0)? Wouldn't the diffserv4 option actually alter the underlying script so it uses layer_cake.qos instead, or am I missing something?
Also, how would wash vs. nowash come into play in this scenario? Since nowash is the default, wouldn't it be recommended to place wash in the eqdisc_opts for both the WAN and LAN queues?
FWIW, here is my current sqm config file and I am very open to any opportunities for improving it:
IIRC, layer_cake.qos gets its default diffserv (tin) mode from defaults.sh, and that happens to be diffserv3. Sure, you can also use layer_cake here, but it will not make much of a difference.
That really depends on whether you consider your internal DSCPs worth hiding from your ISP; in that case adding wash on egress should be fine. For ingress, adding wash might be more relevant and a better default than nowash.
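For reference, a minimal sketch of how that could look in /etc/config/sqm, assuming the advanced option strings are used to pass the keywords through to cake (interface name and choices are just examples):

```
config queue 'wan'
        option interface 'eth0'
        option qdisc 'cake'
        option script 'layer_cake.qos'
        option qdisc_advanced '1'
        option qdisc_really_really_advanced '1'
        option eqdisc_opts 'nowash'   # egress: leave internal DSCPs untouched
        option iqdisc_opts 'wash'     # ingress: use incoming DSCPs for tinning, then clear them
```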
I would rather use the link layer tab to configure 18 bytes of overhead and an MPU of 64 bytes explicitly, but that is a matter of taste, as the docsis keyword will result in the same outcome.
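If I read the tc-cake man page correctly, the two forms below should end up with the same link-layer compensation (bandwidth and interface are placeholders):

```
tc qdisc replace dev eth0 root cake bandwidth 40Mbit docsis
tc qdisc replace dev eth0 root cake bandwidth 40Mbit noatm overhead 18 mpu 64
```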
And what is the logic behind three individual ingress shaper instances all set to option upload '450000'? Depending on the routing, you might be trying to sink 450,000 * 3 = 1,350,000 Kbps (1.35 Gbps) of traffic, or the IOT and GUEST shapers might rarely trigger, since the lan shaper already makes sure that internet traffic stays below 450,000 Kbps. Or are you trying to limit IOT's and GUEST's access to LAN? (I assume, from their names, that IOT and GUEST are isolated from LAN.)
Makes sense as well. I'm not concerned about my ISP's interpretation of my internal dscps. They're going to do whatever they are going to do anyway, regardless of my wishes, I'm sure. But based on my readings and understanding, it does seem having wash on the ingress would be desirable.
Oh dear... the logic. We had to go there.
I originally started out with a basic SQM setup (single queue on my WAN interface) and it's been working great as far as I could tell. You probably will recall some of my posts on other threads about my SQM performance and I had it tuned quite nicely. But, I was very bothered this weekend by the fact that I would see almost no activity in my BK tin, tons in BE, some in VI, and tiny amounts in VO. I started trying to investigate and ended up finding this thread. After reading hundreds of posts between this thread and the previous "Ultimate SQM settings" (>700 posts), I decided to give some of this a try and ended up going down a rabbit hole trying to solve for something that maybe doesn't matter in my case at the end of the day...
Back to the details--apart from my tinkering with a NanoPi R2S, my main router is a dedicated OpenWrt x86_64 VM. My AP is another standalone R7800 running OpenWrt, so it's outside this equation. I have my primary LAN VLAN, as well as a VLAN for guest (eth1.9) and another for IoT (eth1.99).
I was at the point last evening where I had SQM on my WAN (eth0) and LAN (eth1) as well as several iptables rules and dnsmasq ipset settings, mostly the way @hisham2630 had them with a few tweaks+additions. I am seeing a much better balance of usage between the four diffserv tins, overall. But, I started to wonder if the eth1 (LAN) queue was enough for all traffic heading into that physical interface or if it was literally only shaping for eth1, but not for eth1.9 and eth1.99. So I added queues for those VLANs as well.
Unfortunately, after >800 posts between this thread and the previous, there is a fair amount of conflicting information, mixed with various "noise", and a lot of very specialized one-off use-cases thrown in. So I probably ended up with some very goofy "logic", evidenced by my SQM config I shared.
I am NOT trying to limit bandwidth to my guest and IoT subnets. Ultimately I just wanted them to be included in good prioritization that SQM does on the whole. Having said that, perhaps this is all just over-the-top in my use-case and I should go back to a single SQM queue on the WAN. Your thoughts?
So a shaper on eth1 should shape all VLANs on eth1 as well as untagged packets, but a shaper on eth1.99 will only act on packets with VLAN tag 99. That should be relatively easy to confirm: just set the eth1 shaper to a really low value and run speedtests from all your subnets. But if you have enough CPU to spare, running the same packet through two shapers should work as well.
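You can also see the same thing from the counters, assuming cake is instantiated on both the parent and the VLAN interface:

```
# The parent's counters should rise for traffic on any of its VLANs,
# while the per-VLAN qdisc only counts packets carrying its own tag.
tc -s qdisc show dev eth1
tc -s qdisc show dev eth1.99
```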
Spot on, Sir! After I wrote my last reply to you, I started rethinking the whole thing. Funny what a night of sleep and writing out one's thoughts can do to help bring clarity to a situation.
I actually started peeling this thing back after some testing. The counters in my eth1 tins would increase at the same rates as the counters in my eth1.9 and eth1.99 tins. During a heavy speed test, this was very obvious using my netdata-chart-sqm chart. This conclusion is very logical, again, after a night of sleep and "talking it out" with you here.
I went back down to a single queue for eth1 and single queue for eth0. While I do have CPU to spare on my x86_64 box, my goal is to eventually get things worked out to run fully from my NanoPi R2S and I'm sure having four SQM queues on it would make it beg for mercy. I'm not even sure it will like two SQM queues.
But anyway, in the interest of keeping SQM straightforward (which is why I have loved ditching traditional QoS), I think this is good for now.
Going back to this wash vs nowash notion for a moment, though... something else I thought about is that given @hisham2630's dscp script already does this:
... this means that with eth1 having its own queue now, DSCP from the upstream ISP was getting washed anyway, regardless of CAKE's default of nowash. So adding the wash option is not going to break anything; it's just redundant in this case. Is that the way you would interpret it as well?
For anyone who's been testing and tinkering with all this for a while, I would love to have some feedback on a PR I submitted against @hisham2630's project.
The intent of this PR was to combine the ipv4 and ipv6 DSCP scripts into a single script that would act on both iptables and ip6tables in dual-stack environments, or just iptables in ipv4 environments. The driver for this is ease of administration if one is adding multiple rules. This new script would, in a dual-stack environment, create both the ipv4 & ipv6 rules based on a single rule definition instead of separate iptmark and ipt6mark calls for each rule.
This would also allow a relatively seamless migration of current configuration during a transition from ipv4 to dual-stack.
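This is not the PR code itself, just a rough sketch of the idea, assuming the existing iptmark/ipt6mark helpers wrap iptables and ip6tables respectively; a single hypothetical helper could apply one rule definition to both tables when ip6tables is available:

```
#!/bin/sh
# Hypothetical combined helper: apply the same mangle-table rule via
# iptables and, on dual-stack systems, ip6tables as well.
ipt46mark() {
    iptables -t mangle "$@"
    if command -v ip6tables >/dev/null 2>&1; then
        ip6tables -t mangle "$@"
    fi
}

# Example: one definition yields both the IPv4 and the IPv6 rule.
# ipt46mark -A dscp_mark -p udp --dport 3478:3481 -j DSCP --set-dscp-class AF41
```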
egress: If the ISP is going to ignore my DSCPs, and they probably should, then why should I waste CPU in clearing them?
ingress: Incoming (from the ISP) marks are mostly irrelevant for me anyway, since I re-mark with my own mapping, courtesy of act_ctinfo, for the benefit of CAKE's tins. Now, whether CAKE should then wash those restored markings is a debate to be had, especially where WiFi is involved.
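For anyone curious, a rough sketch (from memory, so treat the exact values as an example) of what such an ingress restore filter can look like, assuming an ingress qdisc on the WAN and an ifb device named ifb4eth0:

```
# Restore DSCP from the conntrack mark (lower 6 bits hold the DSCP,
# bit 0x80 flags that a value was stored), then redirect into the ifb
# that cake shapes on.
tc qdisc add dev eth0 handle ffff: ingress
tc filter add dev eth0 parent ffff: protocol all prio 10 u32 \
    match u32 0 0 flowid 1:1 \
    action ctinfo dscp 63 128 \
    action mirred egress redirect dev ifb4eth0
```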
I realize the 2nd and 3rd options create three rules each in the dscp_mark chain (one rule per subnet). In several cases, such as this MS Teams use-case, I would prefer to prioritize traffic based on IP/subnet + port. But if the resulting rules create a performance hit for packets traversing the dscp_mark chain, I would be willing to reconsider.
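For illustration, one such combined rule might look like the following; the subnet, ports, and DSCP class are only an example of the pattern, not a recommendation:

```
iptables -t mangle -A dscp_mark -d 52.112.0.0/14 -p udp --dport 3478:3481 \
    -j DSCP --set-dscp-class AF41
```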
Update
Pondering this some more... by adding one or more destination subnets, this would negate the ingress shaping of the same port, yes? So it's probably a bad idea to add any IP qualifications to these rules. Maybe I'm overthinking it.