Best practice for SQM 2.5Gigabit on WAN and 1GB on LAN?

moeller0 · April 20, 2024, 8:51pm

And that is A-OK, as I said it is a local policy each network admin needs to take for their network.

My point that this is an individual decision that each network needs to take based on local condition and requirements? What is controversial or requiring a proof? If for you no download shaping wirks well, by all means make this decision and be happy. But claiming, as you did,

is not a generally useful statement... for one the ISP will also limit upload traffic and the ISP might do this limiting in a way that result in high working latency and low responsiveness of the link under load. But as I said, I am not doubting that for your network this is not the right policy, just that this is true for all networks.

linutic · April 20, 2024, 8:57pm

type or paste code hereNow, I am the first to admit that whether to use sqm or not s a policy decision each network needs to make individually; BUT there is no magical access rate from which on sqm becomes futile... if your ISP delivers acceptable latency under full saturating loads, by all means do not use local AQM, but that is true independent of rate...
Say your ISP uses e.g. libreqos for your download traffic then in all likelihood you do not need to run a local sqm instance yourself for the download even on a 20 Mbps link, but there are enough links in the 1 Gbps class with atrocious latency under load, where SQM can really improve things...
And on a DOCSIS 3.1 link the mandatory PIE instance in the upload direction might also be good enough for an individual network's requirements.
Hence I always recommend to try bot egress and ingress shaping to figure out for each individual network which policy to select.

I think I can construct an example where SQM download makes sense.
Suppose your Internet download speed is faster than your local network. For example if your Internet is 1.1Gbps and your local network is 1Gbps ethernet. In that case, the input queue to your router get filled at 1.1G and can only be drained at 1Gbps. Further suppose that the input queue is 1MB in length. The delay could be 125KB/1MB ==> 1/8 second, worst case. Every so often the queue overflows, and the router throws it all away, as it should. Then the delay drops to 0 and gradually increases again. So the average delay gradually increases to 1/8 second over a 10 second period, drops to zero, and repeats. In this case, an SQM limit of say 990,000 bps makes sense. That would eliminate delay.

I've never had a network like that, so I didn't think of it until just now.

moeller0 · April 20, 2024, 9:04pm

A queue will fill if for long enough ingress > egress, it is that simple, if the queue is well managed that does not need to result in poor performance, but typically ISP do not manage these queues well and so ingress shaping becomes an improvement, IFF the link is actually saturated often enough. ISPs often operate their backbone sich that the load never significantly exceeds 80% of capacity and then no queues will fill up unduly (except when smoothing out the occasional burst, exactly what queues are designed to mitigate). But for an access link that is hard, escpecially given that e.g. TCP is designed with the goal to saturate every link...
Again, I am not doubting your decision for your network, I am just trying to explain the principles and why I recommend for every network to try ot figure out the best fitting local policy.

linutic · April 20, 2024, 9:10pm

I got all the numbers wrong. Typing faster than I was thinking. Should have been a delay of 1 MBps / 1 Gbps ==> 8 ms. The bufffer would flush every 1 MBps / .1Gbps ==> 80 ms. I need to think before I type.

linutic · April 20, 2024, 9:25pm

Good point. rsolis76 has a 2.5 Gbps ingress connected to a 1 Gbps egress. The average input is 1.5 Gbps, but the cable modem will be bursing data to the router at a rate above 1.5 Gbps. So there will be a queue backup in the egress. Probably the delay will average less than 10 msec, but it is a delay that could be reduced with SQM. I suspect that this particular router will be able to handle it w/o CPU saturation, but the CPU be checked under test conditions. CPU saturation can cause all sorts of problems.

linutic · April 20, 2024, 9:40pm

To understand all this best, I recommend:
An Engineering Approach to Computer Networking: Atm Networks, the Internet, and the Telephone Network 1997. Srinisav Keshav
It's old, but the math surrounding networking hasn't changed. It's a remarkable book. Still available on Amazon.

leon.smit · April 21, 2024, 10:04am

i will know when i setup router to the end, thx. on Wrt inventing

moeller0 · April 21, 2024, 10:26am

The biggest issue typically is the queue at the ISP's side of the link. The ISP's device typically has more capacity towards the internet than to the customer and hence is likely to act as bottleneck link. Now such ISP devices often have buffers that are over-sized but under-managed, and that under load fill up and cause undesired delay. The whole idea behind ingress shaping is to move the effective bottleneck from the ISP side to one's homerouter, but that is only approximate, as high enough inrushing traffic might still cause (hopefully transient) queueing at the ISP side.

dlakelan · April 21, 2024, 3:54pm

Image confirms, your download speed is actually 1Gbps because all download traffic goes into a 1Gbps switch.

Just set your download link (eth1?) to 900Mbps cake as shown in your test and walk away.

Works great, problem solved!

brada4 · April 21, 2024, 4:25pm

Just dont complicate your life with 20 variables to adjust. One day some telco engineer will run their favourite FPS game and change cake-dt to codel-dt in everybodys docsis profile.

moeller0 · April 21, 2024, 5:13pm

Unlikely, DOCSIS mandates PIE and low latency DOCSIS a variant of the IETF's DualQ, neither cake nor codel are mandated, let alone proper fq scheduling...

moeller0 · April 21, 2024, 5:15pm

Yes, and no... I still want to understand why BQL and fq_codel do not work better than they seem to do...

brada4 · April 21, 2024, 5:52pm

pity you cannot fight past them.

dlakelan · April 21, 2024, 6:48pm

Yeah, this is a great question. I didn't follow the details. It seems like fq_codel + BQL would by default be enabled on the link between the router and the switch and this should do its job. I didn't go back and look in detail to see how the performance was during that operation. Can you provide some quotes of the specific test examples where this difference was visible?

linutic · April 22, 2024, 1:24am

I did some experiments on my 1 G fiber link connected through an OpenWRT router on a 2.5 G lan to a computer on the lan. The results show some improvement using SQM. Perhaps the sweet spot is 970M down & up. That reduces latency variation by 10 msec when the link is saturated. Might be important to a 200 Hz gamer downloading one game while playing another.
Hard to find another situation where it is detectable to a human.

rsolis76 · April 22, 2024, 3:15am

I was doing some tests today, trying to simplify the setup.
Basically just using 900mb limit on the download and 46mb limit on the upload.

And I found something interesting while comparing different qdiscs.

The thing is that if I use cake/piece of cake with the parameters previously shared in previous posts (Based on the openwrt SQM guide)

rsolis76:

WITH SQM
0. SQM Config

config queue 'eth1'
	option enabled '1'
	option interface 'eth1'
	option download '900000'
	option upload '46480'
	option qdisc 'cake'
	option script 'piece_of_cake.qos'
	option linklayer 'ethernet'
	option debug_logging '0'
	option verbosity '5'
	option overhead '42'
	option qdisc_advanced '1'
	option squash_dscp '1'
	option squash_ingress '1'
	option ingress_ecn 'ECN'
	option egress_ecn 'NOECN'
	option qdisc_really_really_advanced '1'
	option iqdisc_opts 'mpu 84 nat dual-dsthost ingress'
	option eqdisc_opts 'mpu 84 nat dual-srchost'

I get ratings that stil have bufferbloat, I tried some variations like not using advanced parameters such as nat, mpu, etc, and still same results. Both sites, waveform and fast dot com consistently show an increased latency in the order of 15ms to 25ms while using cake and piece of cake.

However if I use FQ_DEL with Simplest_tbf with the same limits, then I get very good results.

config queue 'eth1'
	option enabled '1'
	option interface 'eth1'
	option download '900000'
	option upload '46480'
	option qdisc 'fq_codel'
	option script 'simplest_tbf.qos'
	option linklayer 'ethernet'
	option debug_logging '0'
	option verbosity '5'
	option overhead '42'

I will post some before and after tc discs tomorrow.

dlakelan · April 22, 2024, 3:21am

This is more or less the way things are supposed to work. Maybe more like 10-15ms but it's basically expected that you'll have some increase during load.

However, what you're describing may be caused by excessive CPU usage in Cake. You may be able to mitigate this by using received packet steering, irqbalance, and some other tricks to spread CPU usage among multiple cores.

moeller0 · April 22, 2024, 6:04am

I guess this:

but there the tc-s qdisc output was missing...

moeller0 · April 22, 2024, 6:08am

As I said, you are responsible for setting policy for your link, and I will not second guess your decision as long as you are happy/satisfied. If your ISP does not over buffer, great, and if that ISP also does appropriate sharing under load even better. From my limited experience I would say not all ISPs are that enlightened.

moeller0 · April 22, 2024, 6:29am

The biggest cost, last time I looked closer some years ago, was the actual traffic shaper, the scheduler and AQM even with bells and whistles are comparatively frugal in their CPU needs. Simple/simplest/simplest_TBF.qos all use somewhat cheaper traffic shapers (and HTB TBF as configured by SQM will compromise throughput more than latency whgen CPU starved, while cake tends to allow more latency under load).

I wonder whether you would be willing to try HTB+cake and/or TBF+cake as that might be even better and allow some of cake's more advanced features? If yes, let me know and I will modify the respectice qos.scripts.

@hnyman, since simplest_tbf is your work, would you be opposed to simply allowing cake as leaf qdisc? This, cake as leaf qdisc (hanging off a HTB tree) is what libreqos uses quite successfully? If not I will whip up a new script...

But you get 810/63... so either fast.com fumbles the upload test and reports bogus numbers (IMHO that is likely, these numbers are probably NOT taking at the remote end, but simply are what the browser reported and modern browsers are works of marvel, but not the best environments for high precision measurements).
Here are the expected throughput limits for your configuration:

IPv4:
900 * ((1500-20-20) / (1500+42)) = 852.14 Mbps
46.48 * ((1500-20-20) / (1500+42)) = 44.01 Mbps
IPv6:
900 * ((1500-40-20) / (1500+42)) = 840.46692607 Mbps
46.48 * ((1500-40-20) / (1500+42)) = 43.4054474708 Mbps

Your download (810 Mbps) looks OK (but could already indicate that the CPU will not allow the 850Mbps, hard to say, as most transfers are not 100% efficient), but the upload (63 Mbps) is off...

Please do, also, in simplest_tbf.qos try changing:

sqm_prepare_script() {
    do_modules
    verify_qdisc "tbf" || return 1

    case $QDISC in
        cake*)
            sqm_warn "Cake is not supported with this script; falling back to FQ-CoDel"
            QDISC=fq_codel ;;
    esac
}

to

sqm_prepare_script() {
    do_modules
    verify_qdisc "tbf" || return 1

    #case $QDISC in
    #    cake*)
    #        sqm_warn "Cake is not supported with this script; falling back to FQ-CoDel"
    #        QDISC=fq_codel ;;
    #esac
}

And set cake as qdisc, to see whether the issue is cake's traffic shaper (which would then imply CPU starvation for cake as shaper).

That is also not wrong, but the increase for cake as shaper in download direction strikes me as higher than easily explained by cake's known behaviour.

I think the interrupts are reasonably distributed already...but that is certainly something to look at as well.