SQM and Cloud Gaming services - packet drop sensitivity

Hi,

I've been following the various SQM topics on the forum for a fair while and have created my own set of DSCP marking scripts. They combine iptables rules with a daemon that parses the socket output of netifyd and sets connmarks for detected flows according to a set of configurable DSCP classification rules (I'll look to share these with the community if people are interested).

Recently I've been testing Xbox Cloud Gaming and found that it is incredibly sensitive to any packet drops, with even 1-3 dropped packets causing significant on-screen tearing.

I suppose this behaviour is somewhat expected given the UDP connection's real-time and bursty nature (the video tears and artefacts whenever the camera view changes significantly, which causes a spike in data transmission).

Has anybody else encountered similar issues with packet loss sensitive connections and identified a solution that could play nicely with SQM?

My setup has a cake qdisc configured on the wan interface's egress, and a second cake qdisc on the lan interface's egress, which shapes the ingress (download) traffic.

The wan interface itself can sustain a rate of ~20 Mbps, and the ingress bandwidth is set to 15 Mbps.
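
For readers who want to reproduce the layout, the download-side shaper described above could be sketched roughly as follows. The interface name `br-lan`, the `diffserv4` scheme, and the dry-run wrapper are assumptions for illustration, not the poster's actual configuration; only the 15 Mbit figure comes from the post. The wan egress shaper is left out because its rate is not stated.

```shell
#!/bin/sh
# Dry-run by default: commands are printed, not executed.
# Set DRY_RUN=0 and run as root to apply them for real.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
}

lan="${lan:-br-lan}"   # assumption: LAN-facing interface name

# Shape download traffic where it leaves the router towards the LAN.
shape_download() {
    # $1 = interface, $2 = shaped rate, $3 = cake priority scheme
    run tc qdisc replace dev "$1" root cake bandwidth "$2" "$3"
}

shape_download "$lan" 15mbit diffserv4
```

Swapping `diffserv4` for `besteffort` or another scheme only changes the last argument.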

After testing each of cake's diffserv schemes (including besteffort), and classifying the stream packets into different tins, I've found that the only way I can reliably prevent this behaviour is to disable the ingress qdisc entirely, at which point the stream bandwidth tends to peak at ~12 Mbps and has no artefacts.

A congested link without SQM enabled does not appear to cause the same issue, i.e. downloading in parallel to the stream results in a reduced video quality, but no tearing or artefacts.

This is what leads me to the conclusion that the cake qdisc packet drops are the root cause here, since the stream happily adapts to a congested but 'non-lossy' link by lowering the stream bitrate.

I'm wondering whether there's a way to allow the marked stream connections to bypass the qdisc entirely, or if a tin could be tweaked to buffer more, but not drop packets?

Keen to hear the community's thoughts and ideas, particularly with game streaming services receiving a big push into the mainstream.

Thanks,

You could try simple.qos/fq_codel, which gives you more control over the target and interval of the codel part (cake only exposes the interval, with the 'rtt N' keyword) and does not have cake's BLUE response to non-responsive flows. My gut feeling is that you might be triggering the BLUE component here....
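
To make the difference concrete, here is a rough sketch of the two knobs. The device name, the `1:11` parent class (a hypothetical leaf under an HTB shaper, which is what simple.qos uses for shaping), and the numeric values are placeholders for illustration only; 5ms and 100ms are simply fq_codel's usual defaults.

```shell
#!/bin/sh
# Dry-run by default: commands are printed, not executed.
# Set DRY_RUN=0 and run as root to apply them for real.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
}

dev="${dev:-br-lan}"   # assumption: interface carrying the shaper

# fq_codel exposes both CoDel parameters directly.
set_fq_codel() {
    # $1 = device, $2 = parent class, $3 = target, $4 = interval
    run tc qdisc replace dev "$1" parent "$2" fq_codel target "$3" interval "$4"
}

# cake only takes the interval, via 'rtt'; it derives the target itself.
set_cake_rtt() {
    # $1 = device, $2 = rtt (interval)
    run tc qdisc change dev "$1" root cake rtt "$2"
}

set_fq_codel "$dev" 1:11 5ms 100ms
set_cake_rtt "$dev" 100ms
```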

Another dirty hack you could try is to use iptables and set ECT(0) on the game packets so cake will try CE marking instead of dropping....

Thanks moeller, after some more tests I think you're definitely right about this being the blue response behaviour.
Unfortunately, as it's UDP traffic, the ECT trick won't work. However, now that 21.02 is released, I'll rebase and see what can be achieved with fq_codel.

I'm also toying with the idea of using nftables to mark cloud streaming connections that exceed a bandwidth threshold on WAN ingress and steer them into a netem qdisc configured to introduce latency. This threshold would be set below that of the target cake tin on the LAN egress (a whacky idea, but a good way to get more experience with the nftables framework).

Why? The ECN bits are part of the IP header, and cake/fq_codel will not even look at the protocol used. If a packet is marked ECT(0) or ECT(1), it will be marked CE when the node experiences congestion (it will still be dropped when the node experiences severe overload). So ECN is orthogonal to TCP/UDP, and the trick might work. Sure, your UDP sender will not respond with an appropriate rate reduction, but this is why I called that approach a "dirty hack".
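
The layout being described is the old TOS byte: the upper six bits carry DSCP and the lower two carry ECN, independent of whatever transport protocol follows. A small worked example of the arithmetic (the function name is made up for illustration):

```shell
#!/bin/sh
# ECN codepoints in the two low bits of the TOS / Traffic Class byte:
#   0 = Not-ECT, 1 = ECT(1), 2 = ECT(0), 3 = CE
ECT0=2

# Combine a DSCP value (0-63) and an ECN codepoint (0-3) into one TOS byte.
tos_byte() {
    printf '0x%02x\n' "$(( ($1 << 2) | ($2 & 3) ))"
}

tos_byte 0  "$ECT0"   # default DSCP + ECT(0)   -> 0x02
tos_byte 34 "$ECT0"   # AF41 (DSCP 34) + ECT(0) -> 0x8a
tos_byte 32 "$ECT0"   # CS4  (DSCP 32) + ECT(0) -> 0x82
```

Nothing in that byte depends on TCP or UDP, which is why a qdisc can act on it for any IP packet.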

You're spot on moeller, that worked perfectly!
Thank you, I didn't realise cake would check the ECN bits for UDP packets, but in hindsight it makes complete sense since it doesn't care about the packet contents.
For those interested this can be achieved by using the iptables TOS target. I've attached examples with notes below:

# for reference the cloud gaming connections have a connmark of 2

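# set cloud gaming packets' TOS byte to decimal 2: binary '10' in the two ECN bits, which corresponds to ECT(0)
# (no mask is given, so the DSCP bits are cleared as well; the DSCP rules further down run afterwards and set them again)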
iptables -t mangle -A DSCP_mark -m comment --comment "Cloud Game ECT(0)" -m connmark --mark "2" -j TOS --set-tos "2"

# mark the outbound controller input packets as CS4
iptables -t mangle -A DSCP_mark -m comment --comment "Cloud Game Input" ! -o "$lan" -m connmark --mark "2" -j DSCP --set-dscp-class "CS4"

# mark the inbound video stream packets as AF41
iptables -t mangle -A DSCP_mark -m comment --comment "Cloud Game Stream" -o "$lan" -m connmark --mark "2" -j DSCP --set-dscp-class "AF41"

With TOS set using a method similar to the above, cake is much less likely to drop the packets (the DSCP targets are not required to achieve this).
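
One thing worth knowing about the TOS target: when no mask is given, `--set-tos` rewrites the whole byte, DSCP bits included. A variant that touches only the two ECN bits could look like the sketch below. The `value/mask` form is standard iptables syntax; the dry-run wrapper and function name are additions for illustration, while the chain name and connmark are taken from the rules above.

```shell
#!/bin/sh
# Dry-run by default: commands are printed, not executed.
# Set DRY_RUN=0 and run as root to apply them for real.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
}

# Set ECT(0) on packets of connections carrying the given connmark.
# Mask 0x03 limits the change to the ECN bits, so any DSCP already
# present on the packet is left alone.
mark_ect0() {
    # $1 = connmark value
    run iptables -t mangle -A DSCP_mark -m connmark --mark "$1" -j TOS --set-tos 0x02/0x03
}

mark_ect0 2
```

With this form the rule no longer has to come before the DSCP rules in the chain.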

According to your theory, it might be enough to do this for the inbound video stream... but I wonder whether the real problem is that your link is a bit too slow....

Yes, it works with TOS marking on just the inbound packets; there is probably little to no benefit from marking the outbound ones.

You're likely right about the bandwidth of the link. The cake tin threshold sits on the boundary between Microsoft's low and high quality bitrate requirements, and in my observations the adaptive bitrate steps up and down between the two settings (with packets starting to drop in the higher bitrate mode).

I'll see how much we can push things here anyway. There's definitely an improvement from setting ECT(0) for the stream, and cake is managing to keep RTTs low even with additional activity in the background.
Hopefully this proves useful for other users with a similar setup.
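
A quick way to check whether cake is CE-marking the stream instead of dropping it is to look at the per-tin `drops` and `marks` rows in the qdisc statistics. A small helper for that, assuming the LAN interface name is held in `$lan` as in the rules above (the function name is made up):

```shell
#!/bin/sh
# Keep only the per-tin drop and CE-mark counter rows from
# `tc -s qdisc show` output read on stdin.
cake_drops_marks() {
    grep -E '^[[:space:]]*(drops|marks)[[:space:]]'
}

# Usage, on the router:
#   tc -s qdisc show dev "$lan" | cake_drops_marks
```

If the ECT(0) marking is taking effect, the `marks` counter for the tin carrying the stream should climb under load while `drops` stays comparatively flat.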
