Tuning SQM for Gaming?

I'm on the most basic ADSL service from Bell Canada since I live out of the city. The main issue I have is that my connection gets saturated easily (5 Mbps down / 0.65 Mbps up). When I'm the only one using the internet, the connection is rock solid in terms of latency (31ms with zero jitter in League of Legends). However, that's rarely the case since I'm usually not the only one online. Even the simplest Cake + Piece of Cake configuration was already a huge improvement: I could actually play my game at 31-45ms ping, with occasional bigger spikes to 60ms and rarer 120-200ms spikes that last a second or two. I noticed that the larger spikes tend to occur with upload traffic. Nevertheless, this is a huge improvement over how it used to be, which was completely unplayable at 100-300ms whenever anyone used the internet. DSLReports reflects the change, going from a 'C' bufferbloat score to an 'A'. I lowered the ingress and egress thresholds considerably and the experience was roughly the same.

Now I want to try and dial it in a bit. My family doesn't torrent or do any heavy use of the internet for the most part, nor do they do anything that requires responsiveness. My ideal setup for SQM would be something that offers a very biased priority for my UDP traffic on ports 5000:5500 and keeps latency tight above anything else. I wouldn't particularly care about other traffic as long as the most basic level of browsing could be done even if it's choppy. I'd only be turning on this aggressive SQM during gaming.

I tried this firewall script with Layer Cake since it seemed like DSCP tagging was the way to go, but honestly I don't know what any of it actually does; I just copied it from a Gargoyle user's post. I got similar results to Piece of Cake, so I'm not sure how effective it is, and I can tell the tins aren't being used properly in Layer Cake.

##NORMAL
iptables -t mangle -A PREROUTING -j DSCP --set-dscp-class CS0

##ICMP
iptables -t mangle -A FORWARD -p icmp -j DSCP --set-dscp-class CS5

iptables -t mangle -A POSTROUTING -p icmp -j DSCP --set-dscp-class CS5

##GAMING
iptables -t mangle -A FORWARD -p udp --match multiport --sport 5000:5500 -j DSCP --set-dscp-class CS5

iptables -t mangle -A FORWARD -p udp --match multiport --dport 5000:5500 -j DSCP --set-dscp-class CS5


iptables -t mangle -A POSTROUTING -p udp --match multiport --sport 5000:5500 -j DSCP --set-dscp-class CS5

iptables -t mangle -A POSTROUTING -p udp --match multiport --dport 5000:5500 -j DSCP --set-dscp-class CS5

Here's my config and stats using the script. I only activated it at that moment while playing for a few minutes running speedtests and other types of loads from my phone and other PCs. Usually I have a bit more congestion on the network and I didn't experience any big spikes.

config queue 'eth1'
        option ingress_ecn 'ECN'
        option interface 'eth0'
        option upload '500'
        option debug_logging '0'
        option verbosity '5'
        option qdisc 'cake'
        option qdisc_advanced '1'
        option egress_ecn 'NOECN'
        option linklayer 'atm'
        option overhead '44'
        option qdisc_really_really_advanced '1'
        option script 'layer_cake.qos'
        option squash_dscp '0'
        option squash_ingress '0'
        option iqdisc_opts 'diffserv4 nat dual-dsthost'
        option eqdisc_opts 'diffserv4 nat dual-srchost'
        option enabled '1'
        option download '4750'

root@OpenWrt:~# tc -s qdisc
qdisc noqueue 0: dev lo root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc cake 800a: dev eth0 root refcnt 2 bandwidth 500Kbit diffserv4 dual-srchost nat nowash no-ack-filter split-gso rtt 100.0ms atm overhead 44
 Sent 19438888 bytes 147899 pkt (dropped 1218, overlimits 153491 requeues 0)
 backlog 0b 0p requeues 0
 memory used: 266336b of 4Mb
 capacity estimate: 500Kbit
 min/max network layer size:           28 /    1492
 min/max overhead-adjusted size:      106 /    1696
 average network hdr offset:           14

                   Bulk  Best Effort        Video        Voice
  thresh       31248bit      500Kbit      250Kbit      125Kbit
  target        581.4ms       36.3ms       72.7ms      145.3ms
  interval         1.2s      131.3ms      167.7ms      290.7ms
  pk_delay          0us       89.3ms        5.5ms       30.7ms
  av_delay          0us       28.4ms         86us        1.6ms
  sp_delay          0us        102us         86us        389us
  backlog            0b           0b           0b           0b
  pkts                0        92375            1        56741
  bytes               0     16213232           90      4074754
  way_inds            0          291            0            0
  way_miss            0         3237            1           11
  way_cols            0            0            0            0
  drops               0         1218            0            0
  marks               0            0            0            0
  ack_drop            0            0            0            0
  sp_flows            0            0            1            0
  bk_flows            0            1            0            0
  un_flows            0            0            0            0
  max_len             0         1506           90          590
  quantum           300          300          300          300

qdisc ingress ffff: dev eth0 parent ffff:fff1 ----------------
 Sent 170605852 bytes 182849 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc fq_codel 0: dev eth1 root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 4Mb ecn
 Sent 3952826764 bytes 4105252 pkt (dropped 0, overlimits 0 requeues 23)
 backlog 0b 0p requeues 23
  maxpacket 444 drop_overlimit 0 new_flow_count 15 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc noqueue 0: dev br-lan root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc noqueue 0: dev eth1.1 root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc noqueue 0: dev wlan0 root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc cake 800b: dev ifb4eth0 root refcnt 2 bandwidth 4750Kbit diffserv4 dual-dsthost nat nowash no-ack-filter split-gso rtt 100.0ms atm overhead 44
 Sent 168784549 bytes 179872 pkt (dropped 2977, overlimits 248121 requeues 0)
 backlog 0b 0p requeues 0
 memory used: 109120b of 4Mb
 capacity estimate: 4750Kbit
 min/max network layer size:           46 /    1492
 min/max overhead-adjusted size:      106 /    1696
 average network hdr offset:           14

                   Bulk  Best Effort        Video        Voice
  thresh      296872bit     4750Kbit     2375Kbit     1187Kbit
  target         61.2ms        5.0ms        7.6ms       15.3ms
  interval      156.2ms      100.0ms      102.6ms      110.3ms
  pk_delay        627us       12.6ms       14.9ms        6.1ms
  av_delay         11us        2.5ms        1.2ms        886us
  sp_delay         11us         20us        178us         11us
  backlog            0b           0b           0b           0b
  pkts                2       181813           86          948
  bytes             120    172824268        77082       264268
  way_inds            0         2304            0            6
  way_miss            2         1591            6           42
  way_cols            0            0            0            0
  drops               0         2977            0            0
  marks               0            0            0            0
  ack_drop            0            0            0            0
  sp_flows            1            0            1            1
  bk_flows            0            1            0            0
  un_flows            0            0            0            0
  max_len            60         1506         1392         1484
  quantum           300          300          300          300

root@OpenWrt:~# tc -d qdisc
qdisc noqueue 0: dev lo root refcnt 2
qdisc cake 800a: dev eth0 root refcnt 2 bandwidth 500Kbit diffserv4 dual-srchost nat nowash no-ack-filter split-gso rtt 100.0ms atm overhead 44
qdisc ingress ffff: dev eth0 parent ffff:fff1 ----------------
qdisc fq_codel 0: dev eth1 root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 4Mb ecn
qdisc noqueue 0: dev br-lan root refcnt 2
qdisc noqueue 0: dev eth1.1 root refcnt 2
qdisc noqueue 0: dev wlan0 root refcnt 2
qdisc cake 800b: dev ifb4eth0 root refcnt 2 bandwidth 4750Kbit diffserv4 dual-dsthost nat nowash no-ack-filter split-gso rtt 100.0ms atm overhead 44

This post on Ultimate SQM settings looks interesting but I'm not sure if I need to go this far for my use case. I realize also that this may be as good as it gets for my bandwidth, but I want some input before I give up!

Your upload rate is a big part of your problem: at 650 Kbps ADSL upload, a single full-MTU packet will take around:

((ceil((1500 + 44) / 48) * 53) * 8) * 1000 / (650 * 1000) = 21.526 ms. So if any full-size packet is in the process of being transferred, your gaming traffic will see up to 21 ms of additional delay, and your jitter is going to be >= 21 ms. With an unloaded ping RTT of 31 ms I would expect best-case multi-user RTTs of 31-52 ms, pretty much in line with what you report (31-45, close enough). The 60 ms spikes might come about when there are also high priority packets queued up on your egress, but I have no idea about the 120-200 ms ones.
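
The arithmetic above can be reproduced with plain shell integer math. This is only a sketch: the function name atm_delay_us is made up for illustration, and the second call simply plugs in the 500 kbit/s shaper rate from the posted config instead of the 650 kbit/s sync rate.

```shell
#!/bin/sh
# Serialization delay of one packet on an ATM/AAL5 link, in microseconds.
# Usage: atm_delay_us <ip_packet_bytes> <per_packet_overhead_bytes> <rate_kbps>
atm_delay_us() {
    payload=$(( $1 + $2 ))             # packet plus per-packet overhead
    cells=$(( (payload + 47) / 48 ))   # round up to whole 48-byte cell payloads
    bits=$(( cells * 53 * 8 ))         # every cell occupies 53 bytes on the wire
    echo $(( bits * 1000 / $3 ))       # bits / (kbit/s) = ms; times 1000 = us
}

atm_delay_us 1500 44 650   # sync rate used in the post above
atm_delay_us 1500 44 500   # shaper rate from the posted SQM config
```

The first call prints 21526 (about 21.5 ms, matching the figure above). The second prints 27984, so with the shaper set to 500 kbit/s a full-size packet ahead of a game packet costs closer to 28 ms.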

I suspected there'd be some kind of limit to my connection since everything I tried wasn't having much effect.

At least the game is playable now though. I'll take some jitter over not being able to play at all. Thanks for the detailed explanation. I'll take your word as gospel since it seems like you know your stuff about SQM from your posting.

I think the larger spikes could have something to do with my upload threshold, since the upload bandwidth occasionally fluctuates even on a completely quiet network. Maybe those momentary spikes happen when the upload bandwidth dips well below my egress threshold?

Ah, the usual suspect would be wifi, did you test over wifi or with a wired connection between test computer and router?
And/or router being overtaxed, what router are you using?

Don't prioritize ICMP. If anything you want to deprioritize ping (CS1). Yes, it means your ping measurement itself gets deprioritized, but ping gets used for a lot of things and prioritizing it doesn't help your game. Ideally you would use a specialized tool like irtt to measure an equivalent of ping and tell that tool to set the DSCP itself. See the rant here:

https://www.bufferbloat.net/projects/bloat/wiki/Wondershaper_Must_Die/

irtt is supported on most of the flent servers in the world, and you could look at this in more detail by running irtt client --dscp=8 (or 40 if you want to test CS5) flent-fremont.bufferbloat.net (or a server closer to you).

--dscp=dscp DSCP (ToS) value (default 0, 0x for hex), common values:
0 (Best effort)
8 (CS1- Bulk)
40 (CS5- Video)
46 (EF- Expedited forwarding)
https://www.tucny.com/Home/dscp-tos

And you should take a look at what UDP ports your game actually uses and prioritize those. Welcome to tcpdump and Wireshark. You are in for an adventure.

This is especially relevant as you can instruct ping to use a specific DSCP on the command line, and thereby probe the RTT of each priority tier independently. Hard-coding all ICMP to CS5 is a rather blunt tool in comparison... But note that the default ping application in Windows 10 states that its -v option is no longer implemented...
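
As a sketch of probing each tier separately: the iputils ping on Linux takes a TOS byte via -Q, and the TOS byte is the DSCP value shifted left by two bits. The helper name dscp_to_tos is made up, the script only prints the commands rather than running them, and it assumes an iputils ping on the test machine (the BusyBox ping on the router may not support -Q).

```shell
#!/bin/sh
# Convert a 6-bit DSCP value to the 8-bit TOS byte.
# DSCP occupies the upper six bits of that byte, hence the shift by two.
dscp_to_tos() {
    echo $(( $1 << 2 ))
}

# Print (not run) one ping command per tier: CS1 = 8, CS0 = 0, CS5 = 40.
for dscp in 8 0 40; do
    echo "ping -c 20 -Q $(dscp_to_tos "$dscp") flent-fremont.bufferbloat.net"
done
```

For CS5 this yields -Q 160 (0xa0). The marking only helps if nothing between the test machine and the shaper rewrites it, which the blanket CS0 rule in PREROUTING from the script above would do.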

The connection I'm testing latency on is wired. The router is a TP-Link Archer C5 AC2100v1. I'd imagine it can handle itself pretty well with a small number of devices, low bandwidth, and no real function other than basic routing and QoS. The load average is basically nonexistent unless I open up a realtime graph in LuCI.

@dtaht The rules I have under the ##GAMING comment use UDP 5000-5500 as per the game's support page.
The exact port is always different, but it stays within that range, and I've confirmed it before with Wireshark: that's pretty much what 99% of the traffic looks like while I play the game. Would there be any benefit or drawback to tagging it as higher-priority traffic (EF, CS6, CS7)?

Loosely, what I think my firewall rules are trying to do: all traffic is first marked as CS0, then any traffic from/to ports 5000-5500 is prioritized as CS5. It wouldn't matter which tag I use as long as my firewall rules put the priority traffic above the rest (the CS0 traffic), right? I've deprioritized ICMP based on what you explained.

As an aside, I appreciate someone developing cake took the time to answer my shallow end of the pool question. Thanks for your work.

Good. Could you, just for testing, disable the radios of your router and repeat your gaming test (for long enough that you would normally see the latency spikes), please? This is a single-core MIPS CPU (not to diss MIPS here, but they are getting long in the tooth) which might simply be getting close to its limits with wifi, NAT, firewalling, ... and traffic shaping. Also, maybe try any of (AF4x, AF3x, CS3, AF2x, TOS4, CS2, TOS1) to get into the Video tin, as that is twice as "broad" as the Voice tin (to rule out that your issue comes from gaming traffic exceeding the 125Kbps for CS5 and friends)?

For diffserv3, diffserv4, and diffserv8 there will be little drawback, as cake does not drop excess traffic in a tin but rather serves it at lower priority; with precedence, however, high priority traffic can easily starve lower priority traffic completely. That said, it makes little sense in your case to pack your games into the Voice tin if the gaming traffic reliably exceeds the 125Kbps limit. (That is for egress/upload; for ingress your problem is that cake sees the packets before the firewall/iptables DSCP remarking, and hence your game will not be prioritized at all.)
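
The 125Kbps figure is the Voice threshold in the posted tc -s qdisc output. Judging from that output alone, the diffserv4 thresholds look like fixed fractions of the shaper rate (Bulk 1/16, Video 1/2, Voice 1/4). A rough sketch under that assumption, with a made-up helper name:

```shell
#!/bin/sh
# Approximate cake diffserv4 tin thresholds for a shaper rate in kbit/s.
# The fractions are read off the tc output in this thread, not taken from
# cake's source, so treat them as an assumption.
cake_diffserv4_thresholds() {
    rate=$1
    echo "bulk=$(( rate / 16 )) besteffort=$rate video=$(( rate / 2 )) voice=$(( rate / 4 ))"
}

cake_diffserv4_thresholds 500    # egress rate from the posted config
cake_diffserv4_thresholds 4750   # ingress rate from the posted config
```

This prints bulk=31 besteffort=500 video=250 voice=125 for egress and bulk=296 besteffort=4750 video=2375 voice=1187 for ingress, which lines up with the thresh rows above after rounding.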

Is there a reason you want to mark DSCP on the router instead of on the machine that generates the packets? Is someone trying to misuse DSCP by setting all traffic to high priority?

If you play games on Windows, you could use the Local Group Policy Editor to mark DSCP:

https://support.microsoft.com/en-us/help/2733528/policy-based-qos-not-working-in-windows-7-clients

If you play games on a Linux distro, iptables is already supported, and in addition you could use network namespaces to apply the marking to individual processes/applications.

When marking DSCP on the machine that generates the packets, you can apply the marking to individual applications, so that if a low priority application (or a remote server) started using ports in the 5000:5500 range, its traffic wouldn't be incorrectly prioritized.

I'd try egress-only shaping first, as ingress shaping is tricky (for UDP even more so) and can sometimes make the situation worse. Ingress shaping is usually more about deprioritization: dropping/marking TCP packets to signal congestion. For egress-only shaping you'd need to remove the prioritization rules for source port 5000:5500.
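
A minimal sketch of what egress-only tagging could look like, keeping only a destination-port rule. Assumptions: eth0 is the WAN interface (as in the posted SQM config), the game server side uses UDP 5000:5500, and the run and tag_game_egress helpers are made up for illustration. The script prints the command by default; set DRY_RUN=0 to actually apply it as root.

```shell
#!/bin/sh
# Dry-run by default: commands are printed instead of executed.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
}

# Tag only upstream game packets (destination port in the game's range)
# leaving through the WAN interface. There is deliberately no --sport rule.
tag_game_egress() {
    wan=$1; ports=$2; class=$3
    run iptables -t mangle -A POSTROUTING -o "$wan" -p udp --dport "$ports" \
        -j DSCP --set-dscp-class "$class"
}

tag_game_egress eth0 5000:5500 CS5
```

The rule sits in mangle POSTROUTING, so the mark is in place before the packet reaches the egress qdisc.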

Although your upload bandwidth is a little low, as @moeller0 pointed out, I think it's still possible to reduce the jitter and latency further. If this is purely an outbound congestion problem, it should be possible to shape the traffic so that it behaves almost as if you were the only person using the connection. The jitter and latency could be caused by other traffic competing for the limited upload bandwidth when your game exceeds its bandwidth threshold. To address that, you'd need a prio qdisc (http://man7.org/linux/man-pages/man8/tc-prio.8.html), which implements a strict priority policy: your gaming traffic is prioritized over all lower priority traffic, and low priority traffic is only sent when there's spare bandwidth not used by high priority traffic. This can result in starvation, which normally doesn't matter if it's temporary (e.g. web pages taking one second longer to load matters much less than getting 200ms latency spikes in games). When I used this setup, jitter and latency spikes were completely eliminated; while playing a game it was impossible to tell whether a P2P (torrent) download was running, as low priority traffic had almost no impact on high priority traffic. I think if you set it up like this, jitter and latency spikes could be reduced to the point where they're undetectable.

In addition to the prio qdisc, which prioritizes traffic, you would need another qdisc to limit the overall bandwidth; you could use hfsc or htb. Here's an example of htb + prio: https://stackoverflow.com/questions/45978104/linux-htb-prio-qdisc-sometimes-empty-prio-1-queue . It is a simple example that doesn't care about fairness.
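
A rough sketch of such an htb + prio layout, not a tested configuration. Assumptions: eth0 is the WAN interface, SQM is stopped first (this replaces the root qdisc), game traffic is already marked CS5 (TOS byte 0xa0), and the helper names are made up. It does no ATM overhead accounting and no per-flow fairness, so the rate may need to be set lower than with cake. Commands are printed by default; set DRY_RUN=0 to apply them as root.

```shell
#!/bin/sh
# Dry-run by default: commands are printed instead of executed.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
}

setup_htb_prio() {
    dev=$1; rate=$2; tos=$3
    # HTB root with a single class that caps total upload at $rate.
    run tc qdisc add dev "$dev" root handle 1: htb default 1
    run tc class add dev "$dev" parent 1: classid 1:1 htb rate "$rate" ceil "$rate"
    # Two strict-priority bands below the cap. The priomap sends everything
    # that no filter claims to band 1 (low priority, class 10:2).
    run tc qdisc add dev "$dev" parent 1:1 handle 10: prio bands 2 \
        priomap 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
    # Packets whose DSCP bits match $tos go to band 0 (high priority, class 10:1).
    run tc filter add dev "$dev" parent 10: protocol ip prio 1 u32 \
        match ip tos "$tos" 0xfc flowid 10:1
}

setup_htb_prio eth0 500kbit 0xa0
```

The 0xfc mask compares only the six DSCP bits and ignores the two ECN bits.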

You could keep the realtime bandwidth graph open (wan tab) while playing games: http://openwrt.lan/cgi-bin/luci/admin/status/realtime/bandwidth . When you experience jitter or latency spikes, check the graph to see whether inbound or outbound traffic spiked; that would tell you whether you have an outbound congestion problem or an inbound one. Spikes can be short-lived, so you might have to check many times.
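
If the graph is awkward to watch while playing, a small logger over SSH gives timestamps to compare against the spikes afterwards. This is a sketch only: the function names are made up, eth0 is assumed to be the WAN interface, and it reads the standard Linux counters under /sys/class/net (counter wrap-around is not handled).

```shell
#!/bin/sh
# Average rate in kbit/s between two byte-counter readings taken $3 seconds apart.
rate_kbit() {
    echo $(( ($2 - $1) * 8 / 1000 / $3 ))
}

# Print one timestamped line of download/upload rates per interval.
watch_wan() {
    dev=$1; secs=${2:-1}
    stats=/sys/class/net/$dev/statistics
    rx0=$(cat "$stats/rx_bytes"); tx0=$(cat "$stats/tx_bytes")
    while sleep "$secs"; do
        rx1=$(cat "$stats/rx_bytes"); tx1=$(cat "$stats/tx_bytes")
        echo "$(date +%T) down=$(rate_kbit "$rx0" "$rx1" "$secs")kbit up=$(rate_kbit "$tx0" "$tx1" "$secs")kbit"
        rx0=$rx1; tx0=$tx1
    done
}
```

Running watch_wan eth0 1 on the router prints one line per second until interrupted; lines where up sits near the 500 kbit/s shaper rate at the moment of a spike would point at outbound congestion.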
