General Discussion about QOS/SQM

Unfortunately, the qos-scripts don't give you good control over the HFSC parameters. I'll hand you a custom script you can run to see how it works.

Can you provide me with some estimates as follows:

Total bandwidth your game uses during play?

Do you use any VOIP?

Do you do any bulk downloading, torrents, etc.? What ports would define those?

  1. Average bandwidth for gaming per Wireshark is about 14 kB/s, or roughly 114 kbit/s.
  2. I don't use VoIP at the moment, but I do use FaceTime, WhatsApp, and Skype.
  3. No torrents

Can you try these out and Wireshark them and tell me bandwidth and packet size and/or packets per second?

Look, gaming I know nothing about, but I am using VoIP, Skype, and Google Hangouts video conferencing, all over sqm-scripts since before we called them that, and I have had no issues with latency and jitter. And that experience covers ADSL/ATM/AAL5, DOCSIS, and VDSL2/PTM, and includes calls from the US west coast to Central Europe. I guess I am simply lucky or jitter-insensitive....

Don't get me wrong, I do believe the reports about your experiences, I just have trouble understanding where your issues might come from.

So in the spirit of discussing data, @mindwolf, how do you measure VoIP latency and jitter?

If your sqm setup experiences a latency-under-load increase of 100ms I can understand your unhappiness, but that seems rather unusual. I am not trying to say sqm and cake are the solution for everybody, and I agree that your needs seem better handled by a bespoke QoS system. So good luck!
BTW the TCP/UDP destination port is not a reliable predictor of the latency requirements of a flow... @dlakelan knows a whole lot about how to custom-tailor competent QoS, he has lots of experience, so you are in good hands :wink:

My experience with many of those technologies is that they auto-adjust their jitter buffers, which can give you smooth audio and video at up to 150ms of latency increase, provided the jitter isn't too bad. Skype in particular has excellent codecs that do packet loss concealment. The main symptom is that you start to talk over the other person. That becomes really bad at around 300ms round trip.

I'm curious: when it comes to VoIP, do you use a 3rd-party provider or your ISP? I strongly believe that ISPs identify and prioritize their in-house VoIP services. If you use something like VoIP.ms or Anveo, they have to compete with best effort rather than getting priority on the backhaul... It can be much more annoying, particularly on DOCSIS links, in my experience.

Of course if the first hop isn't the problem, then no amount of shaping will fix it, but dropping jitter from say 10ms to 0.8ms could definitely reduce the effect of issues down the line.


I agree on all points. So far I have only used ISP VoIP services which, as I did not point out carefully enough, typically come with short RTTs to the entry point of a "dedicated" voice network (well, it is all IP now, but ISPs treat their own VoIP packets especially carefully :wink: ). So, as so often, it turns out that my anecdotal experience does not generalize well (luckily I believe I did not claim that :wink: )


Absolutely, you're right that ISPs treat their voice packets with kid gloves, and those of us poor folks with PBXes running on VPSes in other people's data centers get the short end of every stick... I have often wondered if AT&T didn't purposefully muck with my mother's VoIP packets. She just could not get 3rd-party VoIP over a DSL line to work reliably. Though I don't think it was a shaper issue; more likely carrier-grade NAT in the internal network would occasionally drop packets on the floor or something. Voice would just cut out for no reason at all.

@mindwolf

Here is a script that sets up an HFSC shaper with 3 realtime classes and 4 linkshare classes. You don't necessarily need to use all the classes; HFSC doesn't "reserve" bandwidth, so having unused classes doesn't really hurt.

I'm assuming eth0.1 is your LAN ethernet and eth0.2 is your WAN ethernet (but change things for your use case near the end of the script), and that prioritizing things for wifi is less important for testing. We should do the right thing for WiFi too, but that requires bridging a veth into your LAN and mucking about a little... so let's just test wired first, with no wifi traffic to muck it up :wink:

Also, this is not tested at all, so it could have typos or bugs.

#!/bin/sh -x

percent (){
    echo $(($1 * $2 / 100))
}


hfscsetup (){
    DEV=$1
    BW=$2
    RT1=$3
    RT2=$4
    RT3=$5
    
    
## remove any existing root qdisc (the error is harmless if none is configured)
tc qdisc del dev ${DEV} root 2>/dev/null
tc qdisc add dev ${DEV} stab overhead 42 linklayer ethernet handle 1: root hfsc default 40 

tc class add dev ${DEV} parent 1: classid 1:1 hfsc ls m2 ${BW}kbit ul m2 ${BW}kbit

## 3 realtime classes for UDP based streaming applications, with SFQ sharing the
## bandwidth among UDP streams, make the RT1, RT2, RT3 descending bandwidth

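## SFQ depth: roughly 50ms worth of 250-byte packets at RT1 kbit/s
## (RT1*1000/8 bytes/s, divided by 250 bytes/packet, times 0.050s);
## e.g. RT1=3000 gives a depth of 75 packets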
SFQLIM=$(($RT1*1000/250/8 * 50/1000))

tc class add dev ${DEV} parent 1:1 classid 1:10 hfsc sc m1 $(($RT1*10))kbit d 16ms m2 ${RT1}kbit
tc qdisc add dev ${DEV} parent 1:10 handle 10: sfq limit $(($SFQLIM*20)) depth $SFQLIM headdrop perturb 3

tc class add dev ${DEV} parent 1:1 classid 1:11 hfsc sc m1 $(($RT2*10))kbit d 16ms m2 ${RT2}kbit
tc qdisc add dev ${DEV} parent 1:11 handle 11: sfq limit $(($SFQLIM*20)) depth $SFQLIM headdrop perturb 3

tc class add dev ${DEV} parent 1:1 classid 1:12 hfsc sc m1 $(($RT3*10))kbit d 16ms m2 ${RT3}kbit
tc qdisc add dev ${DEV} parent 1:12 handle 12: sfq limit $(($SFQLIM*20)) depth $SFQLIM headdrop perturb 3


## LS classes, interactive
tc class add dev ${DEV} parent 1:1 classid 1:30 hfsc ls  m1 $(percent $BW 60)kbit d 64ms m2 $(percent $BW 20)kbit
tc qdisc add dev ${DEV} parent 1:30 handle 30: fq_codel

## default
tc class add dev ${DEV} parent 1:1 classid 1:40 hfsc ls  m1 $(percent $BW 20)kbit d 64ms m2 $(percent $BW 50)kbit
tc qdisc add dev ${DEV} parent 1:40 handle 40: fq_codel

## low priority
tc class add dev ${DEV} parent 1:1 classid 1:50 hfsc ls  m1 $(percent $BW 15)kbit d 64ms m2 $(percent $BW 20)kbit
tc qdisc add dev ${DEV} parent 1:50 handle 50: fq_codel

## Very lowprio, make sure it never uses more than 90% of bandwidth so we
## always have a little overhead to initiate higher prio flows
## particularly an issue when doing *downstream* QoS on egress of LAN


tc class add dev ${DEV} parent 1:1 classid 1:60 hfsc ls  m1 $(percent $BW 5)kbit d 64ms m2 $(percent $BW 10)kbit ul m2 $(percent $BW 90)kbit
tc qdisc add dev ${DEV} parent 1:60 handle 60: fq_codel
 
}

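## usage: hfscsetup <device> <total bandwidth kbit/s> <RT1> <RT2> <RT3 kbit/s>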
hfscsetup eth0.2 49000 3000 1000 1000

hfscsetup eth0.1 10000 3000 1000 1000 

Now you need to classify packets. You can add this to your /etc/firewall.user:

iptables -t mangle -N dscp_mark
ip6tables -t mangle -N dscp_mark
iptables -t mangle -F dscp_mark
ip6tables -t mangle -F dscp_mark

iptables -t mangle -A POSTROUTING -j dscp_mark
ip6tables -t mangle -A POSTROUTING -j dscp_mark

adddscpmark4 (){
    iptables -t mangle -A dscp_mark $@
}
adddscpmark6 (){
    ip6tables -t mangle -A dscp_mark $@
}

## wash DSCP on the WAN

adddscpmark4 -i eth0.2 -j DSCP --set-dscp-class CS0
adddscpmark6 -i eth0.2 -j DSCP --set-dscp-class CS0
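## note: -i matches the packet's ingress interface; called from mangle
## POSTROUTING this only applies to forwarded packets (locally generated
## traffic has no input interface), which is fine for washing DSCP on
## traffic received from the WAN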


## game traffic, CS6

adddscpmark4 -p udp -m multiport --ports 3074,5222,5223 -j DSCP --set-dscp-class CS6
adddscpmark6 -p udp -m multiport --ports 3074,5222,5223 -j DSCP --set-dscp-class CS6

## NTP traffic CS5

adddscpmark4 -p udp -m multiport --ports 123 -j DSCP --set-dscp-class CS5
adddscpmark6 -p udp -m multiport --ports 123 -j DSCP --set-dscp-class CS5


## DNS traffic and DHCP CS4

adddscpmark4 -p udp -m multiport --ports 53,67 -j DSCP --set-dscp-class CS4
adddscpmark6 -p udp -m multiport --ports 53,67 -j DSCP --set-dscp-class CS4


## long running TCP traffic downloads... CS1

adddscpmark4 -p tcp -m connbytes --connbytes 250000: --connbytes-dir both --connbytes-mode bytes -j DSCP --set-dscp-class CS1
adddscpmark6 -p tcp -m connbytes --connbytes 250000: --connbytes-dir both --connbytes-mode bytes -j DSCP --set-dscp-class CS1

## now classify packets on DSCP

adddscpmark4 -m dscp --dscp-class CS1 -j CLASSIFY --set-class 1:60
adddscpmark6 -m dscp --dscp-class CS1 -j CLASSIFY --set-class 1:60

adddscpmark4 -m dscp --dscp-class CS6 -j CLASSIFY --set-class 1:10
adddscpmark6 -m dscp --dscp-class CS6 -j CLASSIFY --set-class 1:10

adddscpmark4 -m dscp --dscp-class CS5 -j CLASSIFY --set-class 1:11
adddscpmark6 -m dscp --dscp-class CS5 -j CLASSIFY --set-class 1:11

adddscpmark4 -m dscp --dscp-class CS4 -j CLASSIFY --set-class 1:30
adddscpmark6 -m dscp --dscp-class CS4 -j CLASSIFY --set-class 1:30
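
To check that traffic actually lands in the intended classes, the standard counters are enough (these are ordinary tc/iptables inspection commands, not part of the script itself):

## per-class packet/byte counters on the shaped interface
tc -s class show dev eth0.2

## hit counts for the marking rules
iptables -t mangle -L dscp_mark -v -n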

Some thoughts on the shaper:

The 3 realtime classes are for UDP traffic only. They benefit from the per-flow fairness of sfq but don't benefit from the feedback that fq_codel tries to give to TCP streams. In fact, if you over-saturate any of those realtime classes they'll halt (well, not really, they also have a link-share curve, but the realtime guarantees will go away), so it's good to keep the queue lengths short so they can recover quickly. That's why I calculate the sfq limit to be approximately 50ms worth of packets (at 250 bytes/packet). Furthermore, since it's UDP, if you've got to drop something you might as well drop from the front, because that's the oldest packet and therefore the least likely to be useful; that's why I tell it to headdrop. The perturb 3 is questionable, and if it causes problems we could remove it, but bugs have been fixed in the kernel that should prevent packet loss or reordering during perturb, so the frequent perturbing should prevent two separate streams from mucking each other up for long.
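
To make the queue math concrete, here is the script's own SFQLIM calculation written out for the values passed below (RT1=3000 kbit/s from the invocation; the 250 bytes/packet figure is the assumption stated above):

## 3000 kbit/s / 8 = 375000 bytes/s; / 250 bytes/packet = 1500 packets/s;
## 50ms of that is a depth of 75 packets, and the overall limit is 20x the depth:
echo $((3000*1000/250/8 * 50/1000))        # depth -> 75
echo $((3000*1000/250/8 * 50/1000 * 20))   # limit -> 1500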

I've made the assumption that your game bandwidth doesn't go over 3Mbps. I think that's pretty likely to be the case, but it could be a bad assumption if you have, say, 5 of your friends all playing together on the LAN routinely... Fortunately, since the curves are specified as "sc" rather than "rt", the realtime queues also get link-share curves, so they should fail somewhat gracefully rather than just starve to death if you overdo it.
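
For reference, this is just standard tc-hfsc curve syntax; shown here on a simplified version of the script's first realtime class. With "rt" the class has only a realtime curve and no claim on excess bandwidth, while "sc" installs an identical realtime and link-share curve:

## realtime guarantee only; no link-share entitlement beyond the curve
tc class add dev eth0.2 parent 1:1 classid 1:10 hfsc rt m2 3000kbit

## "sc" = the same curve used as both rt and ls, so overload degrades to
## competing for link-share bandwidth instead of stalling
tc class add dev eth0.2 parent 1:1 classid 1:10 hfsc sc m2 3000kbit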

I put NTP into its own 1Mbps queue. That's clearly WAY too large, but it won't hurt anything in reality.

DNS and DHCP don't need realtime response; it's fine to just put them in a low-latency link-share queue. So we put them in queue 1:30, which can burst at 60% of your bandwidth for 64ms; that will ensure they get sent quickly.

The link-share classes are much more likely to have TCP in them, so each one has an fq_codel attached. One of the big benefits you may find from this is the down-prioritization of the long-running streams: those that have transferred more than 250 kBytes. At your 10Mbps upstream that's 200ms of transmission time, so anything running a TCP transfer longer than 200ms gets down-prioritized. You can easily bump this up if you like, say to 500ms (625,000 bytes) or even 1 second (1,250,000 bytes).
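
If you do want a gentler threshold, the only thing that changes is the connbytes value in the two long-running-stream rules, e.g. for the 1-second (1,250,000 byte) variant mentioned above:

adddscpmark4 -p tcp -m connbytes --connbytes 1250000: --connbytes-dir both --connbytes-mode bytes -j DSCP --set-dscp-class CS1
adddscpmark6 -p tcp -m connbytes --connbytes 1250000: --connbytes-dir both --connbytes-mode bytes -j DSCP --set-dscp-class CS1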

@mindwolf any chance to try this yet?

Not yet, I've had a few things to finish for work, which is finally slowing down to a normal pace. I'll give it a try tomorrow and post the results of some tests.


Did you have a chance to try things out? Just checking in.

So I'm back, from outer space...

Apologies, my dog passed away Thursday of last week. We had the whole burial and mourning :cry:

Sunday I was able to give your script a shot, and it works as intended. However, my biggest sources of network jitter within my control turned out to be 2 items:

  1. interrupt coalescing/modulation (which I disabled)
  2. large tx ring buffers (lessened from 532 to 434)

Sorry to hear about your loss.

Would you mind sharing the commands you used to diagnose your low-level issues? How much did it matter?

Also, can you tell me what kind of tests you did for the shaper script?

I used a prettyping script with 64-byte packets (56 bytes of ICMP payload) to the local AT&T gateway, sweeping the ethtool rx-usecs coalescing value and measuring latency until I got it consistent:

./prettyping -s 56 -i 0.1 99.179.205.90 && ethtool -C eth0 rx-usecs 1-99    # tried values 1-99; the default is rx-usecs=100

The tx ring buffers were lowered because, from my understanding, packets sit in those buffers long before they reach a qdisc, by which point they are already queued. I use an NVG599 with IP Passthrough, and that device has at least 250ms of buffering in itself, not counting further down the chain. My goal is to reduce all the latency possible on my router.
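
For anyone wanting to try the same thing, the standard ethtool ring-buffer commands look like this (eth0 and the 434 value are just the ones from this post, not general recommendations):

## show current and maximum ring sizes
ethtool -g eth0

## shrink the tx ring
ethtool -G eth0 tx 434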

root@OpenWrt:~# ethtool -c eth0 && ethtool -c eth1
Coalesce parameters for eth0:
Adaptive RX: off  TX: off
stats-block-usecs: 0
sample-interval: 0
pkt-rate-low: 0
pkt-rate-high: 0

rx-usecs: 0
rx-frames: 0
rx-usecs-irq: 0
rx-frames-irq: 0

tx-usecs: 0
tx-frames: 0
tx-usecs-irq: 0
tx-frames-irq: 0

rx-usecs-low: 0
rx-frame-low: 0
tx-usecs-low: 0
tx-frame-low: 0

rx-usecs-high: 0
rx-frame-high: 0
tx-usecs-high: 0
tx-frame-high: 0

Coalesce parameters for eth1:
Adaptive RX: off  TX: off
stats-block-usecs: 0
sample-interval: 0
pkt-rate-low: 0
pkt-rate-high: 0

rx-usecs: 0
rx-frames: 0
rx-usecs-irq: 0
rx-frames-irq: 0

tx-usecs: 0
tx-frames: 0
tx-usecs-irq: 0
tx-frames-irq: 0

rx-usecs-low: 0
rx-frame-low: 0
tx-usecs-low: 0
tx-frame-low: 0

rx-usecs-high: 0
rx-frame-high: 0
tx-usecs-high: 0
tx-frame-high: 0

For tx ring buffers I'd expect them to be filled after the qdisc and then drained at the full hardware rate, say gigabit for modern HW.

500 packets should be less than 6ms worst case. If you are shaping with HFSC at tens of megabits, it would be rare to have a tx buffer with more than 1-10 packets, I think. The GigE hardware sends 100 times faster than the qdisc shapes.

On the other hand, interrupt coalescing might delay servicing the packets the qdisc puts in... so reducing the max delay there to, say, 1ms seems smart.
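
A minimal sketch of that idea, assuming the driver exposes the usual coalescing knobs (1ms = 1000 microseconds; the values here are illustrative, not from the original post):

ethtool -C eth0 rx-usecs 1000 tx-usecs 1000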

Big thanks for taking the time to write the HFSC script!

Yes, you are right about it draining at the full hardware rate, and I doubt the ring buffers play a huge role in packet transactions for the WAN; nonetheless, every bit helps :100:

The same thing applies to bursts: even though the WAN may be set to 100Mbit, it will burst at 1Gbit regardless.

Exactly, unless you set the port to 100Mbps your 1Gbps interface will always send at 1Gbps. By putting a shaping qdisc in front of it you basically decide upon the duty cycle of the sending. Typically it is thought that buffers are a good thing as long as they are properly managed. For your tx buffers, in theory BQL seems a better solution than setting them manually. I also wonder why your reduction by ~20% seems to have a measurable effect, especially since you have a shaper in front of it....
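
For reference, on drivers that support BQL its state is exposed per tx queue in sysfs (eth0 and tx-0 here are just examples):

## dynamically computed limit and its upper bound, in bytes
cat /sys/class/net/eth0/queues/tx-0/byte_queue_limits/limit
cat /sys/class/net/eth0/queues/tx-0/byte_queue_limits/limit_max

## cap the limit manually if you want to experiment
echo 15000 > /sys/class/net/eth0/queues/tx-0/byte_queue_limits/limit_max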

The problem with this is that you will get more interrupts and hence a higher CPU load, which is probably fine if all you want is low latency; see e.g. https://blog.packagecloud.io/eng/2016/06/22/monitoring-tuning-linux-networking-stack-receiving-data/#interrupt-coalescing for a discussion. On my desktop system I see the following defaults:

rx-usecs: 20
rx-frames: 5
rx-usecs-irq: 0
rx-frames-irq: 5

tx-usecs: 72
tx-frames: 53
tx-usecs-irq: 0
tx-frames-irq: 5

iptables -t mangle -N dscp_mark
ip6tables -t mangle -N dscp_mark
iptables -t mangle -F dscp_mark
ip6tables -t mangle -F dscp_mark

iptables -t mangle -A POSTROUTING -j dscp_mark
ip6tables -t mangle -A POSTROUTING -j dscp_mark

adddscpmark4 (){
iptables -t mangle -A dscp_mark $@
}
adddscpmark6 (){
ip6tables -t mangle -A dscp_mark $@
}

now condensed...

iptables -t mangle -N dscp_mark
ip6tables -t mangle -N dscp_mark
iptables -t mangle -F dscp_mark
ip6tables -t mangle -F dscp_mark

iptables -t mangle -A PREROUTING -j dscp_mark
ip6tables -t mangle -A PREROUTING -j dscp_mark
iptables -t mangle -A POSTROUTING -j dscp_mark
ip6tables -t mangle -A POSTROUTING -j dscp_mark

ipt (){
    iptables -t mangle -A dscp_mark $@
    ip6tables -t mangle -A dscp_mark $@
}

e.g. ipt -p udp -m multiport --ports 80,443 -j DSCP --set-dscp-class AF42