Cake is known to be CPU-intensive.
I'm aware, but I haven't explored any other options yet, so for now I'll keep focused on
cake; otherwise I can't test anything or draw any conclusions!
I was trying to find info on those 2 last options that I have commented out, because I took them from a friend's config, but he was using
simplest_qos and I don't know if these two options can be used with cake.
But I couldn't find much on those options. I'm still searching the wiki.
Those would replace cake and piece_of_cake.
For my use cases, I got better performance with fq_codel.
As mentioned above, now I don't use either (no SQM at all).
I get a slight uptick in latency, but still get an A+ on Waveform.
All tests were done on a wired connection.
Bufferbloat.net is an excellent resource -
Those options only work with cake. Have a look:
No, cake does not ignore ECN on packets at all. It only ignores SQM's ECN/NOECN configuration options, since cake has no way to disable ECN usage. But for ECN to be used, the endpoints need to be configured first... Looking at the marks row in cake's statistics output gives a feel for how often cake encountered ECN-enabled flows (not a strict flow counter, but the ratio of drops to marks should give an idea of the ratio of NOECN to ECN flows).
A flow that negotiated ECN usage between the endpoints will set the two ECN bits (the lowest 2 bits in the old TOS byte of the IP header) to either 01 or 10 (in ECN nomenclature these two bit patterns are called ECT(1) and ECT(0), respectively); an ECN-enabled AQM like cake will then rewrite these two bits to 11 (called CE) where it otherwise would have dropped the packet, to signal the flow to slow down. Flows that do not use ECN leave the two ECN bits at their default value of 00, often called NotECT. If you want more details, just have a look at RFC 3168.
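To make the bit patterns above concrete, here is a small Python sketch (my illustration, not from the thread) that decodes the ECN codepoint from a TOS/traffic-class byte:

```python
# ECN codepoints per RFC 3168: the two least-significant bits of the TOS byte
ECN_NAMES = {0b00: "NotECT", 0b01: "ECT(1)", 0b10: "ECT(0)", 0b11: "CE"}

def ecn_codepoint(tos_byte: int) -> str:
    """Return the name of the ECN codepoint encoded in a TOS/traffic-class byte."""
    return ECN_NAMES[tos_byte & 0b11]
```

So a TOS byte of 0x02 carries ECT(0), and an ECN-enabled AQM marks a packet by rewriting those two bits to 0b11 (CE) instead of dropping it.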
That means that one or more CPUs are fully used, so the code cannot run any faster...
In the context of SQM this often shows up as: increasing the configured shaper rate does not result in higher throughput measured in speedtests.
Why is this? GSO-style super-packets can be as large as 64 KB (compared to the typical ~1.5 KB), and hence while a super-packet is being transmitted the link can be blocked for quite some time and all other packets need to wait until the transmission is finished.
At 75000 Kbps this delay can reach up to ~7 milliseconds (although it is quite unlikely for GSO to ever aggregate the 43 full-MTU packets of a single flow to build such a large super-packet in the first place).
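The delay figure is plain serialization-delay arithmetic and can be checked with a few lines of Python (my sketch, nothing cake-specific):

```python
def serialization_delay_ms(packet_bytes: int, rate_kbps: int) -> float:
    """Time to put packet_bytes on the wire at rate_kbps, in milliseconds."""
    return packet_bytes * 8 / (rate_kbps * 1000) * 1000

# a 64 KB GSO super-packet at 75000 Kbps vs a normal 1500-byte packet
big = serialization_delay_ms(64 * 1024, 75000)   # ~7.0 ms
small = serialization_delay_ms(1500, 75000)      # ~0.16 ms
```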
In case you wonder, splitting up super packets has some CPU cost, so there are situations where disabling it is warranted, but you need to be aware of the associated increase in delay/jitter.
To elaborate, it also does more than HTB+fq_codel, so depending on one's desires the additional cycles might or might not be well-invested.
However, on the R7800 cake appears even more costly than it usually is on other platforms, no idea why... and since I do not have an R7800 myself to test with, I will refrain from speculating. There are probably threads in this forum addressing that, for anybody interested.
These two commented lines (ignoring the no-split-gso) will configure cake for internal IP fairness. That is, your available capacity is shared equally between all active machines (actually, all active IP addresses), so if only one machine is active it will get 100% of the capacity, but if, say, 3 machines are active with capacity-seeking downloads, each will get capacity/3.
Often (though not always) that is a useful configuration for home networks, as e.g. bit-torrenting on one machine will not unduly affect video streaming on other machines, assuming the fair capacity share is high enough for streaming at the desired quality.
Note this will not address the problem that on the torrent machine itself other interactive uses will suffer; any application like BitTorrent that uses loads of parallel flows cannot be controlled well by flow-fair queueing (but such applications are also not well controlled without flow-fair queueing, so at least there is no regression).
Also, this capability comes at a slight CPU cost, so think about whether you want to use it. Personally I do, as this nicely isolates the rest of my users from being accidentally affected by whatever experiment I might be running (I disable it if the experiment requires that, but then I also schedule such experiments for when the network is quiet).
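For reference, the internal-IP-fairness setup described above corresponds to SQM option lines like these (a sketch matching the commented options discussed in this thread):

```
option iqdisc_opts 'nat dual-dsthost'  # ingress: fair share per internal destination IP
option eqdisc_opts 'nat dual-srchost'  # egress: fair share per internal source IP
```

The nat keyword makes cake look up the internal addresses behind the router's NAT, so the fairness applies per LAN machine rather than per public IP.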
In the case of the mentioned 2 options, well, my RockPro64 device is probably the one using the most available bandwidth and I want to keep it that way. That device has priority. So, according to your explanation, I don't want to split the capacity and potentially cap the usage of my RockPro64.
@slh said something about an RPi4 and r4s... What is an r4s? And why did you mention them? Can we use such devices for this job instead of the router itself? I have a Pine RockPro64 board that is probably being underused. If it is doable, I can transfer this traffic-shaping task to the board.
Fair enough, that is a policy choice each network administrator needs to make for the network under her/his care. Side note though: "priority" and capacity sharing are slightly different concepts; you can give your RockPro64 priority (by switching to layer_cake.qos and setting higher-priority DSCPs only on packets to/from the RockPro64*) without giving it a higher capacity share.
That said, cake's default policy triple-isolate tries to approximate fair sharing by IP, so if you really do not want that you need to add the flows keyword to option iqdisc_opts and option eqdisc_opts.
From man tc-cake:
Flows are defined by the 5-tuple, and fairness is applied over source and destination addresses intelligently (ie. not merely by host-pairs), and also over individual flows. Use this if you're not certain whether to use dual-srchost or dual-dsthost; it'll do both jobs at once, preventing any one host on either side of the link from monopolising it with a large number of flows.
Please note that with flows the application with the most parallel flows gets most of the capacity; in a sense, the by-internal-IP-isolation mode's nicest feature is that it makes sure no single host can crowd out any other host.
*) Doing that for egress is relatively simple, but doing it for the ingress direction is considerably trickier.
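In SQM terms, opting out of the per-IP approximation means plain flow-fairness, i.e. the same keyword the man page describes:

```
option iqdisc_opts 'flows'
option eqdisc_opts 'flows'
```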
Hum, ok, this can get quite complex in the blink of an eye!
Will see later when I get home.
But as for one of my questions: is it possible to use my RockPro64 to manage all traffic shaping, or does it have to be done on the router exclusively? The idea would be to take advantage of the superior processing power of the SBC, obviously!
It depends; for traffic shaping to be effective, the shaper needs to see all traffic (or better, account for all traffic). That leaves multiple options that preserve low bufferbloat, like:
a) use the RockPro64 as a wired-only router with two Ethernet ports (you might need to add an Ethernet NIC via the PCIe slot) and use the R7800 as AP only. That should also allow cake to be used up to your 500/100 contracted rates.
b) use the RockPro64 as traffic shaper for everything behind it and still use the R7800 as WiFi router; for this to work you need to make sure that between the RockPro64 and the R7800 no additional traffic is injected, so that:
rockPro64-shaper rate + injected traffic <= contracted rate
stays true. I see no real way of implementing this short of adding a traffic shaper on the R7800's WiFi at X Mbps (combined for both bands), connecting all wired devices behind the RockPro64 set to Y Mbps, and making sure that X+Y stays below either 500 for the download or 100 for the upload direction.
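As a quick sanity check of option b)'s budget, here is the constraint with hypothetical shaper values (X and Y are made-up placeholders, not recommendations):

```python
# contracted rates from the thread (Mbps); the X/Y split is an illustrative assumption
CONTRACT_DOWN, CONTRACT_UP = 500, 100

x_wifi = 100    # hypothetical R7800 WiFi shaper, both bands combined
y_wired = 380   # hypothetical rockPro64 shaper for the wired devices

# the combined shaped rate must stay within the contracted download rate
assert x_wifi + y_wired <= CONTRACT_DOWN, "download budget exceeded"
```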
c) something even more complicated...
Personally, I would probably opt for a).
Hum, about option a), I am already using the PCIe slot with an M.2 adapter for my NVMe drive. I have to think about this!
If you use the RockPro64 as a wired-only router, I would propose reconsidering whether it should do file serving as well... IIUC the RockPro64 has an SD card slot, so it could run without requiring an SSD for its own OS.
But this is clearly a decision with some noticeable side effects you need to feel comfortable with.
I would suggest investing effort in making this device your main router. With two A72 cores and four A53 cores you have more than enough power to do SQM/cake shaping at your line speeds. This device will easily support ad-blocking and any kind of DPI if required. Let the R7800 be a dumb AP just to provide WiFi access to your network. Between using an RPi4 and your PINE board there won't be a huge difference. Of course, as commented by @moeller0, you'll need 2 Ethernet ports.
This is what I'm doing with a RPi4 on a 1000/50 line.
Ok, I have to think about this because there is more to it than just transferring SQM management to the board. IIRC, I have a specific setting in my router to be able to stream my IPTV because I'm not using any ISP devices other than the ONT. I mean, I remember I had to create a VLAN, I guess, and give it a specific setting to be able to get the stream on my TV via my router, and I'm not comfortable at all changing this if it needs to be changed by transferring all the traffic to the RP64.
I might need to create a new thread for this matter as this one is supposed to handle SQM settings!
option iqdisc_opts 'nat dual-dsthost no-split-gso flows'
option eqdisc_opts 'nat dual-srchost flows'

or

option iqdisc_opts 'nat flows'
option eqdisc_opts 'nat flows'

or how exactly?
I see from the man pages the following for option flows:
Flows are defined by the entire 5-tuple of source address,
destination address, transport protocol, source port and
destination port. This is the type of flow isolation performed
by SFQ and fq_codel.
So, it says only for SFQ (I don't know what this is) and
fq_codel, therefore not cake. I'm a bit confused!
Only flows; neither nat nor the dual-xxxhost modes work together with flows (actually, flows is a mutually exclusive alternative to the dual-xxxhost modes).
No, this just tells you that two other well-known and popular qdiscs use the same mode. Say you want cake to behave like fq_codel: this text instructs you to use the flows keyword.
So, does this look correct:
config queue 'wan'
	option enabled '1'
	option interface 'eth0.12'
	option download '75000'
	option upload '75000'
	option qdisc 'cake'
	option script 'piece_of_cake.qos'
	option qdisc_advanced '1'
	# option ingress_ecn 'ECN'
	# option egress_ecn 'ECN'
	option qdisc_really_really_advanced '0'
	option itarget 'auto'
	option etarget 'auto'
	option linklayer 'ethernet'
	option squash_dscp '1'
	option squash_ingress '1'
	option overhead '44'
	option debug_logging '1'
	option verbosity '5'
	option iqdisc_opts 'flows'
	option eqdisc_opts 'flows'
Yes, that should do the trick... However, once you configure SQM that way, you might want to consider switching to simplest.qos/fq_codel or even simplest_TBF.qos/fq_codel, as these might be more CPU-efficient when none of cake's bells and whistles are used. If you are still using the R7800, simplest_TBF.qos/fq_codel especially might allow you to shape closer to your actual contracted rates (but you still need to test whether bufferbloat stays under control).
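If you try that switch, the change would be a sketch like the following in the same 'wan' section (script names as shipped by sqm-scripts; verify the exact filenames on your install):

```
option qdisc 'fq_codel'
option script 'simplest_tbf.qos'
```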
Keep in mind, though, that fairness will be strictly by flows; that is, an application that uses two parallel flows will get twice as much capacity as an application using one flow (at small numbers that is mostly benign/acceptable), but an application using ship-loads of flows (say a torrenting app) will get the lion's share of the capacity. If that becomes a problem, cake's more advanced isolation modes can be quite helpful; if it is not a problem at all, not using cake can help make SQM more efficient.