Prioritising relatively large Shadow PC stream

This is actually a no-op: only the ingress keyword changes cake's behavior. I did not even realize we had "egress" as a keyword, but sure, we do, and it is the default.

This is a bit of a mixed blessing: ack-filter does help with highly asymmetric links and also with highly bursty links (but cake as used in SQM typically does not see that burstiness, so this part of ACK filtering will not come into play). Unless your uplink gets overloaded, though, this will have very little effect on your actual issue, so by all means keep this setting ;).
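For reference, here is a minimal sketch of how these keywords look when cake is set up by hand with tc. SQM normally issues the equivalent commands itself. The helper names, device names, rates and the `TC` wrapper variable (modeled on the `$TC` used in sqm-scripts, so the commands can be dry-run) are all assumptions for illustration.

```shell
# Hypothetical helpers: device names and rates are placeholders; SQM
# normally generates equivalent tc commands from its own configuration.
TC="${TC:-tc}"

# cake_egress DEV RATE: upstream shaper on the WAN device.
cake_egress() {
    # "egress" is cake's default mode, so spelling it out is a no-op.
    # ack-filter drops queued pure TCP ACKs made redundant by a later ACK.
    # nat + dual-srchost: fairness per internal IP address on the uplink.
    $TC qdisc replace dev "$1" root cake bandwidth "$2" \
        nat dual-srchost ack-filter
}

# cake_ingress IFB_DEV RATE: downstream shaper on the IFB device.
cake_ingress() {
    # ingress: count dropped packets towards the shaped rate, since they
    # have already crossed the bottleneck link before cake sees them.
    # wash: clear the DSCP field after cake has classified the packet.
    $TC qdisc replace dev "$1" root cake bandwidth "$2" \
        nat dual-dsthost ingress wash
}
```

For example, `cake_egress eth0 20mbit` and `cake_ingress ifb4eth0 90mbit`. The `nat` plus `dual-srchost`/`dual-dsthost` combination is what provides per-internal-IP fairness.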

Well, the crux of the matter is that SQM replaces the ISP's under-managed and over-sized buffers with its own advanced queue/buffer management to reduce the latency-under-load increase (aka bufferbloat). For this to work, SQM must admit at most as much data per unit time into the ISP's equipment (in xDSL systems, the modem/CPE and indirectly the DSLAM/MSAN) as that equipment can actually transmit over the bottleneck link, so that these ISP buffers never fill up continuously to unhealthy levels.
To do this, SQM needs to calculate, for each packet it admits, how much time (or instantaneous bandwidth) that packet is going to require on the bottleneck link. In packet-based data transmission each packet carries a payload as well as a bit of overhead required to actually transport the packet. (This is loosely like sending a parcel, where the packaging and labeling add weight and volume to the content, and it's the combination of both that needs to fit into the carrier vehicle, volume- and weight-wise.) For SQM to make an accurate prediction of the transmission time, it needs to know the payload size (which is easy, as the kernel typically has that information at hand) as well as the applicable overhead. That second value is tricky to get, as the SQM host might not be directly connected to the actual bottleneck link and hence is in no position to know the actual overhead itself. This is why we need to configure that per-packet overhead manually. It is also immensely tricky to measure that overhead empirically in a robust and reliable way (we have a method that works for ATM/AAL5-based carriers, but these are a dying breed, and IMHO rightly so).
Now, what happens if the overhead is underestimated? Typically people are advised to set the per-packet overhead to the best of their knowledge (erring on the side of a bit too much) and then measure the bufferbloat resulting from different shaper bandwidth settings. This is a reasonable approach, but let's see what happens when we underestimate the per-packet overhead. For demonstration purposes I am setting it to 0, but the principle holds for any underestimation, just with rarer/subtler consequences. I will shamelessly use simple values here, but assume VDSL2:

gross-rate * optional-encoding * ((packet size - header size) / (packet size + per-packet-overhead)) = goodput (~speedtest result)

"Optional encoding" differs between link technologies and equals 64/65 for VDSL2@PTM
Side note: ATM/AAL5 is weirder and cannot really be modeled with a simple encoding factor, but that is not your issue.
So, assuming an IPv4/TCP measurement without any extras (20 bytes of IPv4 header plus 20 bytes of TCP header), a true bottleneck gross rate of 100 and a true per-packet overhead of 30 on VDSL2, we get a goodput of:

100 * 64/65 * ((1500-20-20) / (1500 + 30)) = 93.96

If we use this as our real achievable top speed, we can calculate which shaper gross rate we would need if the per-packet overhead were set to 0 instead of 30:

93.96 * 65/64 * ((1500)/(1500-20-20)) = 98.04

Setting the shaper to 98.04 units will control bufferbloat, BUT only if the packet size is 1500 bytes. If we redo our calculations for a packet size of 100 bytes, we get:

100 * 64/65 * ((100-20-20) / (100 + 30)) = 45.44


45.44 * 65/64 * ((100)/(100-20-20)) = 76.92

But since we set the shaper to 98.04, we will be admitting too much into the ISP's devices, and bufferbloat will increase again. Depending on the actual mix of packet sizes on your link, this issue will be more or less prominent, but it always lurks as a pitfall unless your configured per-packet overhead is equal to or larger than the real per-packet overhead. I hope this answers your question.
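The arithmetic above can be reproduced with a small shell sketch, with awk handling the floating point. It hard-codes the VDSL2/PTM 64/65 encoding and takes the header size (40 bytes for IPv4 + TCP) and the per-packet overhead as parameters. The function names are made up for this example.

```shell
# Hypothetical helpers following the goodput formula above (VDSL2/PTM only).

# goodput GROSS PKT HDR OVH
#   Goodput for PKT-byte packets at gross rate GROSS, with HDR bytes of
#   IP+TCP headers and OVH bytes of per-packet overhead, 64/65 encoding.
goodput() {
    awk -v g="$1" -v p="$2" -v h="$3" -v o="$4" \
        'BEGIN { printf "%.2f\n", g * 64 / 65 * (p - h) / (p + o) }'
}

# shaper_for GOODPUT PKT HDR OVH
#   Inverse of goodput: the shaper gross rate needed to reach GOODPUT when
#   the shaper assumes OVH bytes of per-packet overhead.
shaper_for() {
    awk -v gp="$1" -v p="$2" -v h="$3" -v o="$4" \
        'BEGIN { printf "%.2f\n", gp * 65 / 64 * (p + o) / (p - h) }'
}
```

`goodput 100 1500 40 30` gives 93.96, `shaper_for 93.96 1500 40 0` gives 98.04, `goodput 100 100 40 30` gives 45.44 and `shaper_for 45.44 100 40 0` gives 76.92, matching the numbers above. Going the other way, `goodput 98.04 100 40 0` gives 57.92. That means a shaper set to 98.04 with 0 overhead admits roughly 57.92 units of 100-byte-packet goodput while the link can only deliver 45.44.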

Hard to say. As above, I have no simple and reliable way to empirically measure the applicable per-packet overhead. But according to the ITU specs, VDSL2 will only give you 22 bytes of overhead (PPPoE would add another 8, but IPoE does not use PPP tunneling), and a potential VLAN tag would add another 4 bytes (some ISPs use double VLAN tagging). I would guess that 30 should be a decent estimate, with a high probability of slightly over- rather than underestimating, which is exactly what you should aim for.
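Adding up those components is a quick way to compare candidate values. Here is a throwaway shell sketch using the per-component sizes quoted above. The function name and its two arguments are invented for this example.

```shell
# Hypothetical helper: vdsl2_overhead PPPOE VLANS
#   PPPOE: 1 if the link uses PPPoE, 0 for IPoE
#   VLANS: number of VLAN tags (0, 1 or 2)
vdsl2_overhead() {
    # 22 bytes of VDSL2/PTM framing, +8 for PPPoE, +4 per VLAN tag
    echo $(( 22 + 8 * $1 + 4 * $2 ))
}
```

`vdsl2_overhead 0 1` gives 26 and `vdsl2_overhead 0 2` gives 30, so 30 covers IPoE even with double VLAN tagging. PPPoE plus one VLAN (`vdsl2_overhead 1 1`) would already need 34.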

Thanks again @moeller0 for your reply. Very informative! I just tried some other values for the overhead, but they do not seem to make a big difference to the bufferbloat results on dslreports. It is also hard to say, because my bufferbloat results are a bit inconsistent anyway. It is always A or A+, but there seem to be random bufferbloat spikes: sometimes in idle, sometimes in download, sometimes in upload, and sometimes in none of them. The spikes also occur at random moments: sometimes at the start of the test, in the middle, at the end, or several times. The spikes are short and sometimes not even counted in the score. Are these random spikes weird?

In short, from what I read, the keywords I have in place are OK to configure per internal IP fairness. I also tried playing VR with these settings and unfortunately the video stream still stutters when starting Netflix. Without Netflix on it is perfect. The stutter is not random like in dslreports. It is perfectly synced with someone starting a Netflix episode.

One thing to note is that the Wi-Fi drivers for the WRT32X and friends are abandoned, and the manufacturer sold that business, so they are unlikely to get much better. They are kind of OK for everyday use but definitely not without their problems.

You are washing DSCP on ingress, which is good... I would try adding CS5 to your one gaming stream and see if it helps with WMM priority.

Thanks @dlakelan. Do you have a link describing how to do this? This is really new to me. How do I identify the gaming stream? Based on IP of the gaming device?

Edit: If this is too complicated or simply too much work to explain to someone who is basically used to GUIs, just say so.

It should be possible to do this in the firewall LuCI GUI: in the action drop-down list, select DSCP classification and set the other fields accordingly...

Like @moeller0 said, it can be done in the GUI. You can identify the stream as UDP (most likely) coming from the IP address of your Shadow PC server. Honestly, you could probably do well by tagging everything going to or from that IP, as you probably want to prioritize your control stream on the upstream side as well.

Thanks again @dlakelan and @moeller0!

Found the screen. Looks clear to me. But just to be sure:

Source address - I did not show it in the screenshot, but I should click the only non-local IP address, right? The modem's IP, correct?

Destination address - The local IP of the Oculus Quest to which I am streaming

Correct? Should I change something else?

To me, this feels a bit like setting classes/priorities as I was used to in QoS before I got to know SQM. Is this like QoS inside cake? And did I read something about needing to change to 'layer cake' instead of 'piece of cake' to get these DSCP marks to work?

And one last question, would it be beneficial to set a low DSCP mark for Netflix, or the Chromecast Netflix is streaming to?

EDIT: Reading a bit about DSCP markings, I noticed the configuration guidelines table on Wikipedia. The last column is AQM, which equals SQM, right? The row for CS5 notes a 'no' for AQM. The 'AFxx' markings note 'Yes, per DSCP'. Shouldn't we be using those then?

You probably want 2 rules: one uses the address of your server in the data center as the destination address, and the other uses it as the source address. You'll have to type those in manually rather than click an option in the menu.

You could use layer cake on the upstream / upload setting of your SQM which would enable you to utilize these marks when sending your packets to the server in the cloud.

I recommend CS5; the DSCP system is pretty absurdly nonstandard. CS5 will typically hit the WMM VIDEO queue, and that's where you want it.
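In iptables terms, the two rules amount to something like the sketch below. This is an assumption-laden example, not what the LuCI form generates. The helper name, the `IPT` wrapper variable and the addresses are placeholders. It uses the mangle table's FORWARD chain, and on OpenWrt the DSCP target typically needs the iptables-mod-ipopt package. Leaving out the remote address matches everything to and from the headset.

```shell
# Hypothetical helper; addresses are placeholders. IPT=echo allows a dry run.
IPT="${IPT:-iptables}"

# mark_cs5 LOCAL_IP [REMOTE_IP]
mark_cs5() {
    local_ip="$1"
    remote_ip="$2"
    # Down rule: cloud PC (or, without REMOTE_IP, anything) -> headset.
    $IPT -t mangle -A FORWARD ${remote_ip:+-s "$remote_ip"} -d "$local_ip" \
        -j DSCP --set-dscp-class CS5
    # Up rule: headset -> cloud PC (or anything), e.g. the control stream.
    $IPT -t mangle -A FORWARD -s "$local_ip" ${remote_ip:+-d "$remote_ip"} \
        -j DSCP --set-dscp-class CS5
}
```

For example, `mark_cs5 192.168.1.20` marks all traffic to and from a headset at that (made-up) address. Once that proves helpful, `mark_cs5 192.168.1.20 203.0.113.10` narrows it to one cloud PC address.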

Thanks for your quick reply.

My Shadow cloud PC in the datacenter is a Windows PC. I just logged in and asked Google: 'What is my IP?'. It showed me 'Your public IP address is ...'. I guess that is the IP I need to have in the rules right?

'Down' Rule - Source: Public IP Shadow PC + Destination: Local IP Oculus Quest
'Up' Rule - Source: Local IP Oculus Quest + Destination: Public IP Shadow PC

The zones should still be wan for Shadow and lan for Oculus, right?

Why do you specifically say layer cake on the upstream? Not for downstream? In the SQM QoS settings I can select these on the Queue Discipline tab under Queue setup script. But I guess that sets it for both up and down? Right now it is set to piece of cake.

Yes, that seems correct.

Because on the downstream you are washing your DSCP (the "wash" keyword), all the packets will go into the best effort tier anyway. And there's no way to change this, because the iptables commands run after the packets have already been queued in SQM.


Aha. So the DSCP tagging is purely for improving Wi-Fi performance? But can also be used for upstream?

Exactly: it will improve Wi-Fi in the downstream direction and will be honored by SQM in the upstream direction.
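On the SQM side, switching to layer cake while still ignoring and washing DSCPs on ingress might look like the sketch below. The option names are assumed to match what luci-app-sqm writes to /etc/config/sqm, so compare them with your own config file before applying anything. The helper name and the `UCI` dry-run wrapper are made up for this example.

```shell
# Hypothetical helper; option names assumed from luci-app-sqm.
# UCI=echo allows a dry run.
UCI="${UCI:-uci}"

use_layer_cake() {
    # cake with the diffserv-aware layer_cake script, honored on egress.
    $UCI set "sqm.@queue[0].qdisc=cake"
    $UCI set "sqm.@queue[0].script=layer_cake.qos"
    # Keep ignoring (squash_ingress) and clearing (squash_dscp) DSCP on
    # ingress, since those marks arrive before any firewall rule runs.
    $UCI set "sqm.@queue[0].squash_ingress=1"
    $UCI set "sqm.@queue[0].squash_dscp=1"
    $UCI commit sqm
}
```

After `use_layer_cake`, running `/etc/init.d/sqm restart` should apply the change.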

There are a number of things going on here. First off, STREAM sounds like a genuine continuous, non-bursty stream (ahem) of packets. Netflix video 'streaming' isn't streaming in a continuous sense but is instead bursty. In other words 'send data as fast as you can for a bit, pause till video playback buffer reaches a low water mark, send data as fast as you can for a bit...'. So those 'as fast as you can' sections could well be stealing bandwidth from your genuine STREAM stream.

CAKE could potentially help here by using classification to guarantee a minimum bandwidth to 'video streaming' flows and allowing 'netflix best effort' flows to take up the spare space.

The problem with that, as has been mentioned, is that CAKE typically only gets to see packets on ingress before iptables and any classification rules (e.g. DSCP marks) have been applied.

A 'neat' workaround to that problem would be to use firewall connmarks to mark the connection as a particular category and have that category restored as part of a tc action at ingress time, just before CAKE gets to see packets.

I wrote act_ctinfo (in kernel 5.3/4) and the iptables connmark --set-dscpmark option (sadly not in the kernel, because the kernel wants this in nftables form as well) expressly for this sort of thing. Basically: on egress, use the iptables mangle table to choose a DSCP, use --set-dscpmark to store it in the connmark, and use tc act_ctinfo to restore the DSCP to incoming packets from the value stored in the connmark.


Thanks for your reply @ldir!

Yes, the Shadow stream is quite continuous, as it is a video game stream that reacts to my inputs. It never passes a certain bandwidth. Netflix just takes all it can, especially when buffering a new episode. When fully buffered, it still seems to load in bursts, but these do not seem to take as much bandwidth as initial buffering. Overall, using QoS or SQM, a Netflix video playing does not make my Shadow stream stutter, but initial buffering does.

This sounds really good, and when I read this forum, I think I see more people that would like to set classes while using SQM, is that correct? Like traditional priority based QoS inside Cake/SQM?

I can surely imagine the method you describe. But how to apply these settings I do not have a clue.

Which makes me wonder whether you think traditional QoS alone would also work for me. I know everyone here loves SQM and cake, but for the sake of keeping it simple, just putting Netflix on low priority or setting a bandwidth limit might already suffice. My network is really low on traffic. Large downloads are really occasional. It is mostly Netflix, Spotify and browsing. My network must be shocked to see so much traffic since I started using Shadow. There aren't a lot of traffic scenarios, and the only one I am having issues with is Netflix loading while I am using Shadow.

Here I would just accept any source IP, so that everything to your oculus rig gets into AC_VI initially, that way you can first test whether that helps at all before tightening the rules (especially since IPs of cloud devices often can change anytime).

Here I wonder about the value of doing that at the router. If the Oculus Quest is running on a specific internal PC, maybe set the DSCP in Windows instead? See for how that might work under Windows 10.

Well, that is probably a consequence of ingress shaping being somewhat approximate, so unless you can establish a better traffic shaper at the ISP end of your access link (unlikely) your gaming stays sensitive to netflix starting (or any other similar traffic source starting).

That might be true, but this is partly because that is what people were/are used to from other QoS/AQM solutions. SQM's explicit goal is not to be the one QoS/AQM solution that exposes all toggles and controls to power users, but rather to do the right thing for most users with the least need for twiddling and tweaking. But you can always fork the *.qos scripts and put in everything you like, so even for tweakers SQM should at least offer a decent starting point and a reasonable framework to quickly test and change QoS configurations. That said, @ldir's great addition is something I would like to pull into sqm-scripts somehow, but I keep failing to free up enough time to actually get it working and tested...

Well, if you come up with a rule that allows you to unambiguously identify Netflix traffic, you can always add an extra TBF instance to throttle only Netflix (or try nft-qos/luci-app-nft-qos in addition to SQM)?

Good one.

The Oculus Quest is a standalone device. It runs Android. It runs its own apps/games internally, but it also has the Virtual Desktop app to wirelessly connect to your own pc or to a cloud pc in (for example) Shadow datacenter. Using Virtual Desktop to get PC-grade VR onto the Oculus Quest is kind of a workaround. Shadow is working on their own Oculus Quest app actually.

Hmm, that would be a bummer. The stuttering when Netflix is doing initial buffering is exactly what I am trying to solve here... So I should give up? What are we trying to improve here then? Just optimising latency in general?

I see the SQM benefit, and I also love it. Great work you guys are keeping on improving it. You guys really rock!

Because we are almost always streaming Netflix to a Chromecast, could the rule be 'throttle everything to the Chromecast'? I will read up on TBF and nft-qos. These are new words for me.

Ah, okay, then I think setting DSCPs on the oculus is not going to be an option...

No, but I just want to prepare you for the potential outcome that this might not be fully fixable in your router.

In this thread? We are trying to find a solution to your issue, but as above there might not be an optimal solution.

Well, that feature is @ldir's invention, and I concur, he rocks!

That would probably work. In the LuCI GUI you can set fixed DHCP configs by MAC address. I would do this for the Chromecast(s), so that you have known internal IP addresses for that Netflix shaper...
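The same static lease can also be created from the command line. Here is a sketch with a placeholder name, MAC and IP. The helper name and the `UCI` dry-run wrapper are hypothetical.

```shell
# Hypothetical helper; name, MAC and IP are placeholders.
# UCI=echo allows a dry run.
UCI="${UCI:-uci}"

# static_lease NAME MAC IP: add a dhcp "host" section for one device.
static_lease() {
    $UCI add dhcp host
    $UCI set "dhcp.@host[-1].name=$1"
    $UCI set "dhcp.@host[-1].mac=$2"
    $UCI set "dhcp.@host[-1].ip=$3"
    $UCI commit dhcp
}
```

For example, run `static_lease chromecast aa:bb:cc:dd:ee:ff 192.168.1.50`, then `/etc/init.d/dnsmasq restart`. The Chromecast picks up the address on its next DHCP renewal.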
TBF is just a "simple" traffic shaper that might work well to throttle the Chromecast(s); it seems to have all the features you need. nft-qos is its own fancy bandwidth-by-IP-address allotment scheme, with a nice GUI that might allow you to throttle the Chromecast from the GUI. But I have never tried nft-qos and do not know how well it plays with SQM (I assume it should work well, but I have no data to base this optimism on).
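One caveat: TBF is classless, so on its own it throttles everything leaving the interface it is attached to. To hit only the Chromecast, the sketch below swaps in an HTB class with a hard ceiling plus a u32 filter on the Chromecast's address, attached to the LAN bridge. The helper name, `br-lan`, the rates and the address are assumptions, and how this interacts with SQM is untested here.

```shell
# Hypothetical helper using HTB instead of a bare TBF; names, rates and
# addresses are placeholders. TC=echo allows a dry run.
TC="${TC:-tc}"

# throttle_host DEV HOST_IP RATE: cap traffic leaving DEV towards HOST_IP.
throttle_host() {
    dev="$1"; host="$2"; rate="$3"
    # Unclassified traffic goes to class 1:10 (minor ids are hex in tc).
    $TC qdisc replace dev "$dev" root handle 1: htb default 10
    # Everything else: effectively unlimited on a gigabit LAN port.
    $TC class add dev "$dev" parent 1: classid 1:10 htb rate 1000mbit
    $TC qdisc add dev "$dev" parent 1:10 fq_codel
    # The throttled host: hard ceiling at RATE.
    $TC class add dev "$dev" parent 1: classid 1:20 htb rate "$rate" ceil "$rate"
    $TC qdisc add dev "$dev" parent 1:20 fq_codel
    # Steer packets addressed to HOST_IP into the capped class.
    $TC filter add dev "$dev" parent 1: protocol ip u32 \
        match ip dst "$host"/32 flowid 1:20
}
```

`throttle_host br-lan 192.168.1.50 20mbit` would cap the Chromecast at 20 Mbit/s. That should make Netflix's TCP settle at or below that rate instead of bursting up to the full downstream rate. `tc qdisc del dev br-lan root` removes it again.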

Conceptually it's pretty simple. Within sqm-scripts, where we instantiate an egress cake instance, we also (optionally) instantiate a tc ctinfo action. I did this to avoid the need for every egress packet to go through my iptables classification rules. It is a 'set once and forget' operation: the DSCP stored in the connmark is 'restored' to every subsequent egress packet of that connection.

    # Put an action on the egress interface to set DSCP from the stored connmark.
    # This seems counter intuitive but it ensures once the mark is set that all
    # subsequent egress packets have the same stored DSCP avoiding the need to have
    # iptables rules mark every packet.

    $TC filter add dev $IFACE protocol all u32 \
        match u32 0 0 \
        action ctinfo dscp 0xfc000000 0x02000000

Ingress makes use of the existing ifb4<interface> construct and adds a ctinfo action that restores the DSCP from the connmark before the redirect (tc mirred action) to the ifb interface, and hence before CAKE gets to see the packet:

    # redirect all IP packets arriving in $IFACE to ifb0
    # set DSCP from conntrack mark
    $TC filter add dev $IFACE parent ffff: protocol all u32 \
        match u32 0 0 \
        action ctinfo dscp 0xfc000000 0x02000000 \
        action mirred egress redirect dev $DEV
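The filter above presumes two things already exist: an ingress qdisc (handle `ffff:`) on `$IFACE` and the IFB device `$DEV` that carries the downstream cake instance. Below is a rough sketch of that plumbing with placeholder device names and rate. It is not the exact set of calls sqm-scripts makes, and the helper name is hypothetical. Note `diffserv4` instead of `besteffort`, since restoring DSCPs only pays off if cake actually sorts by them.

```shell
# Hypothetical helper; device names and rate are placeholders.
# IP=echo TC=echo allows a dry run.
IP="${IP:-ip}"
TC="${TC:-tc}"

# setup_ifb WAN_DEV IFB_DEV DOWN_RATE
setup_ifb() {
    # IFB device: lets a root qdisc shape what arrives on the WAN device.
    $IP link add name "$2" type ifb
    $IP link set dev "$2" up
    # Ingress qdisc on the WAN device; filters attach to it as parent ffff:.
    $TC qdisc add dev "$1" handle ffff: ingress
    # Downstream cake on the IFB, using DSCP tins for restored marks.
    $TC qdisc add dev "$2" root cake bandwidth "$3" diffserv4 ingress
}
```

For example, `setup_ifb eth0 ifb4eth0 90mbit` would run before adding the ctinfo/mirred filter.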

Storing a DSCP (and saying we stored it there so ctinfo knows to restore it) is achieved by using (a hacked) ip/6/tables connmark rule:

    # store the decided DSCP into connmark for later restoration by ctinfo
    ipt -A QOS_MARK_F_${IFACE} -t mangle -j CONNMARK --set-dscpmark 0xfc000000/0x02000000

The above simply makes DSCPs 'stick' across egress & (crucially) ingress so CAKE's diffserv awareness can be exercised.
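To see what the masks in `dscp 0xfc000000 0x02000000` and `--set-dscpmark 0xfc000000/0x02000000` mean, here is a small shell sketch of the bit layout as described in the tc-ctinfo documentation. The 6-bit DSCP sits in the top six bits of the 32-bit connmark, and the 0x02000000 bit flags that a DSCP was stored, so ctinfo only restores it when that bit is set. The function names are made up.

```shell
# Hypothetical helpers illustrating the connmark layout used above.
DSCP_MASK=0xfc000000   # bits 31..26 hold the DSCP value
DSCP_STATE=0x02000000  # bit 25 flags "a DSCP was stored here"

# dscp_to_connmark DSCP: connmark value that --set-dscpmark would store.
dscp_to_connmark() {
    printf '0x%08x\n' $(( ($1 << 26) | $DSCP_STATE ))
}

# connmark_to_dscp MARK: DSCP that ctinfo would restore (nothing if unflagged).
connmark_to_dscp() {
    if [ $(( $1 & $DSCP_STATE )) -ne 0 ]; then
        echo $(( ($1 & $DSCP_MASK) >> 26 ))
    fi
}
```

For CS5 (DSCP 40), `dscp_to_connmark 40` gives `0xa2000000`, and `connmark_to_dscp 0xa2000000` gives back 40. A mark without the state bit, such as `0xa0000000`, yields nothing.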

The actual classification process of identifying and then 'twiddling DSCP bits' to suit flows is a continually evolving challenge. So far I'm using a combination of ipsets (filled by dnsmasq) and ports but it's pretty low tech.
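Here is a minimal sketch of the ipset approach, for example to give Netflix the low-priority mark asked about earlier in the thread. The helper name, the set name, the two Netflix domains and the choice of CS1 are assumptions. The `ipset=` directive requires a dnsmasq build with ipset support (dnsmasq-full on OpenWrt).

```shell
# Hypothetical helper; set name, domains and CS1 are assumptions.
# IPSET=echo IPT=echo allows a dry run.
IPSET="${IPSET:-ipset}"
IPT="${IPT:-iptables}"

classify_netflix() {
    # dnsmasq adds every address it resolves for these domains to the set,
    # given a dnsmasq.conf line such as:
    #   ipset=/netflix.com/nflxvideo.net/netflix
    $IPSET -exist create netflix hash:ip
    # Mark forwarded packets towards those servers as CS1 (bulk).
    $IPT -t mangle -A FORWARD -m set --match-set netflix dst \
        -j DSCP --set-dscp-class CS1
}
```

With the patched connmark target, that CS1 would then be saved with `--set-dscpmark` so ctinfo can restore it on ingress as well.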

A deep packet inspection method could prove better but definitely at the cost of CPU. I'm (very slowly) writing a program to interface with the ndpi 'netifyd' and its JSON stream, the idea being it could take netifyd's flow identification, decide on a suitable DSCP and interface to conntrack to store that DSCP into the connmark ready for tc's act_ctinfo to use it.

I think netify already have something along those lines (well a firewall interface) but it's written in Python3 which is a bit too heavy for embedded type devices.

If a proper programmer wishes to join me then that'd be great. I'm using it as an excuse to improve my 'c' and learn about sockets etc, 'cos I've never dealt with that before. I've got as far as reading the netify socket and parsing the JSON stream that's coming from it.


What can I do about this missing kernel?

Update: I flashed DD-WRT onto the other bank, enabled QoS using fq_codel, and set the Chromecast to Bulk priority and the Oculus Quest to Maximum priority. I think I read that this means the Chromecast gets only 5% of the bandwidth when the Oculus needs it.

My problem was instantly solved. Big initial Netflix buffering causes only a small 5ms latency spike while using Shadow on the Oculus Quest. I see the spike on the latency counter but cannot even notice it when I am not looking. So I am happy.

What would be the easiest way to get this rule in place in OpenWRT?