SQM, cake and piece_of_cake.qos - High CPU usage

So I fear that this test, while sane in general, compares wired access with wireless and thereby drags wifi challenges into the mix, but I agree that there seems to be room for improvement...
The odd thing is that VoIP in general only uses around 100Kbps (in each direction) and hence should survive quite well with simple per flow fairness, because at 50Mbps you would need approximately 50000/100 = 500 flows, before the per flow share of ingress falls below the required 100Kbps (these numbers are not 100% correct, but the order of magnitude should be okay).
Maybe you really run into a CPU problem on the router...
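The back-of-the-envelope estimate above can be written out as a small shell calculation. This is only a sketch: the 50000 kbit/s link rate and the 100 kbit/s per call are the rough figures from the post, and the function name is made up for illustration.

```shell
#!/bin/sh
# Number of equally sharing flows at which the per-flow share of the link
# drops to the rate a single VoIP call needs (integer arithmetic, kbit/s).
flows_until_starved() {
    link_kbps=$1      # shaped link rate in kbit/s
    per_flow_kbps=$2  # rate one flow needs in kbit/s
    echo $(( link_kbps / per_flow_kbps ))
}

# Rough figures from the discussion: 50 Mbit/s link, ~100 kbit/s per call.
flows_until_starved 50000 100   # prints 500
```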

I'm not near my computer, so from what I remember the ports are:

  • Game - 27015 (UDP)
  • Torrent Client - 60435 (UDP?)

That's sad. I was sure the router should be fast enough, at least after upgrading from TL-WR1043ND.
Do you think this is the router model? Or maybe it could be caused by the image I've flashed?
Do you recommend using QoS instead?

One challenge with port-based priority assignments is that port usage is pretty ephemeral, and torrents especially are known to try to evade easy identification, so these might be a moving target. Then again, all that means is that any port-based QoS needs to be verified every now and then and potentially tweaked a bit to reflect changes in port usage.

Well, the new router still only sports a pretty ancient MIPS CPU, so the biggest improvement is probably the frequency jump, but that only holds if you used a TL-WR1043ND v1 (400MHz) before; v2-v4 of the TL-WR1043ND were already at 720-750MHz, so you might not have seen any significant frequency increase to begin with. Naively I would guess that the new router offers better WLAN but not necessarily more "punch".
Then again I have no personal experience with either router, so there might be big differences and improvements, and no matter what the archer should be able to work sufficiently well at your bandwidth...

Personally I do not, but it might in the end be a better solution for your problem. What I certainly would recommend is to use qos/scripts/luci-app-qos to verify that your old qos scheme still works better than sqm does for you.

Best Regards

I think custom QoS scripts are a good idea when you have specific quality tradeoffs you want to express that are different from "everyone gets similar bandwidth with pretty low latency under load" which is what Cake seems to try to do, and seems to do very well.

For example I have a PBX on a VPS server, so I can definitely identify all the VOIP traffic because it's all UDP coming to or from that given IP in a narrow port range. I want to prioritize this VOIP so that if there is anything in the queue at all from or to this IP it gets sent immediately completely starving all other bandwidth for up to say 20ms. I do this because I'm sure that I really do want as little latency as possible for this traffic because I know this endpoint very well.

Similarly, I recently started playing certain games with my kids, I can identify it because Steam has certain ports they've assigned to their traffic. Again I want that traffic to stall other non-realtime traffic for up to 20ms.

After those realtime classes are taken care of, the main thing is to keep my family happy and buffering-free on streaming videos, then almost all stuff after that is "default" except that I'm happy to let things like torrents be starved for hundreds of milliseconds, since who cares?

HFSC lets me express that very well. But other users may not have so clean of a set of requirements, for example if you're sharing a link with 3 other college roommates each of you may have your own totally separate requirements (some people are just facebooking, others are doing interactive ssh to a high performance computing cluster for example). Per-ip fairness is probably a good idea because expressing the "real" requirements may in fact be a nightmare and per-ip could be good enough (until the comp-sci major decides to load up 25 virtual machines / containers on his PC and take 25/28 of the bandwidth :wink: )
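To put a number on the per-IP fairness caveat: with one equal share per address, whoever owns more addresses owns more of the link. A tiny sketch, with a made-up helper name, using the 25 out of 28 addresses from the example:

```shell
#!/bin/sh
# Percentage of the link that goes to one user under per-IP fairness,
# given how many of the active addresses belong to that user.
share_percent() {
    echo $(( $1 * 100 / $2 ))   # integer percent, rounded down
}

# 25 VMs/containers plus 3 roommates = 28 active addresses.
share_percent 25 28   # prints 89
```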

I took the time to understand HFSC and wrote scripts at the shell level to use it on my router (running Debian). I'll see about plopping a script into github and linking it here, since it comes up often enough.

EDIT: in the meantime, @Dor, you can check out the basics that I posted on my blog: http://models.street-artists.org/2018/01/16/understanding-hfsc-in-linux-qos/

It gives the basic setup of the HFSC classes; you have to add your own filters, which are pretty simple for your 2 classification rules. I'll post the filter rules you need in a separate comment.

You want to filter udp port 27015 dst or src into the game queue 1:20 and udp or tcp port 60435 into the low priority queue 1:50

# match udp src port 27015 to game queue 1:20
tc filter add dev ${DEV} parent 1:0 protocol ip prio 10 u32 match ip protocol 17 0xff match ip sport 27015 0xffff flowid 1:20
#same for dst
tc filter add dev ${DEV} parent 1:0 protocol ip prio 11 u32 match ip protocol 17 0xff match ip dport 27015 0xffff flowid 1:20

#match udp  src or dst port 60435 to lowest priority 1:50
tc filter add dev ${DEV} parent 1:0 protocol ip prio 12 u32 match ip protocol 17 0xff match ip sport 60435 0xffff flowid 1:50
#same for dst
tc filter add dev ${DEV} parent 1:0 protocol ip prio 13 u32 match ip protocol 17 0xff match ip dport 60435 0xffff flowid 1:50

#match tcp  src or dst port 60435 to lowest priority 1:50
tc filter add dev ${DEV} parent 1:0 protocol ip prio 14 u32 match ip protocol 6 0xff match ip sport 60435 0xffff flowid 1:50
#same for dst
tc filter add dev ${DEV} parent 1:0 protocol ip prio 15 u32 match ip protocol 6 0xff match ip dport 60435 0xffff flowid 1:50
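Since the six rules above differ only in protocol, port, direction and target class, they can also be generated. The following is a sketch, not part of the original script: `port_filters` is a hypothetical helper and `eth0` is a placeholder interface. It only prints the commands so they can be reviewed before being run on the router.

```shell
#!/bin/sh
# Print (not execute) the u32 filters that send one port, matched as source
# and as destination, to a given class. Nothing here touches the live setup.
port_filters() {
    dev=$1; proto=$2; port=$3; flowid=$4; prio=$5
    for dir in sport dport; do
        echo "tc filter add dev $dev parent 1:0 protocol ip prio $prio u32" \
             "match ip protocol $proto 0xff match ip $dir $port 0xffff flowid $flowid"
        prio=$(( prio + 1 ))   # next rule gets the next priority number
    done
}

DEV=${DEV:-eth0}                       # placeholder interface name (assumption)
port_filters "$DEV" 17 27015 1:20 10   # game, UDP (IP protocol 17)
port_filters "$DEV" 17 60435 1:50 12   # torrent, UDP
port_filters "$DEV" 6  60435 1:50 14   # torrent, TCP (IP protocol 6)
```

Piping the output through `sh` on the router would apply the rules, assuming the HFSC classes 1:20 and 1:50 already exist.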

So recent cake can actually be addressed from tc filters, so one option would be to still use sqm-scripts with layer_cake and set up two filters to place torrents into the bulk tin and game traffic into the high priority tin. Not as nifty as your example, but if all that is required is to sort stuff into three priority classes that should be fine :wink:

But how to do this?

First run "tc -s qdisc" and figure out the major number of the cake instance in question:
qdisc cake 801b: dev pppoe-wan root refcnt 2 bandwidth 9545Kbit diffserv3 dual-srchost nat split-gso rtt 100.0ms noatm overhead 34 mpu 64
In this example it is 801b for interface pppoe-wan; remember that number.

Then look at the tins for diffserv3:

 average network hdr offset:            0

                   Bulk  Best Effort        Voice
  thresh      596560bit     9545Kbit     2386Kbit
  target         30.5ms        5.0ms        7.6ms
  interval      125.5ms      100.0ms      102.6ms

The tins are counted from the left, with 1 being Bulk (or Background); remember this as the minor number.
Then all you need to do is to add a filter for the torrent port:

tc filter add dev pppoe-wan parent 801b: protocol ip u32 match ip dport 60435 0xffff action skbedit priority 801b:1

Mind you, I have not tested that myself, but this was reported to work on the cake mailing list. @ldir, can you spot an error in my copy of your command above? :wink:
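The "figure out the major number" step can be scripted. Below is a minimal sketch, assuming the `tc -s qdisc` line format shown above; `cake_handle` is a hypothetical helper that reads the text on stdin, so it can be tried against the sample line without a router at hand. The final command is only printed, and is as untested as the one above.

```shell
#!/bin/sh
# Pull the cake handle for one interface out of `tc -s qdisc` style text.
# Expected line shape: qdisc cake <handle> dev <interface> root ...
cake_handle() {
    awk -v dev="$1" '$1 == "qdisc" && $2 == "cake" && $5 == dev { print $3; exit }'
}

# Sample line taken from the post above.
sample='qdisc cake 801b: dev pppoe-wan root refcnt 2 bandwidth 9545Kbit diffserv3 dual-srchost nat split-gso rtt 100.0ms noatm overhead 34 mpu 64'

handle=$(printf '%s\n' "$sample" | cake_handle pppoe-wan)   # "801b:"
tin=1                                                        # 1 = Bulk in diffserv3

# Print the filter command instead of running it.
echo "tc filter add dev pppoe-wan parent $handle protocol ip u32 match ip dport 60435 0xffff action skbedit priority ${handle}${tin}"
```

On the router itself the sample would be replaced by the live output, i.e. `tc -s qdisc | cake_handle pppoe-wan`.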

This is pretty cool, but also pretty involved. @Dor if you have time to do this and enjoy fiddling a bit this is a fine way to both improve your network and learn more tricks about openwrt/linux :wink:

@dlakelan Thanks for the info, I'll read it and try that out.

@moeller0 and @dlakelan, after uninstalling luci-app-qos and luci-app-sqm, I still have queue discipline configurations when running the command tc qdisc. I suspect it might be the problem, or once again it's the CPU. The CPU usage is relatively high even after uninstalling both packages and running a simple speed test: CPU is at around 42% usage for 52Mbps :disappointed:

You should stop the sqm instance by issuing:

/etc/init.d/sqm stop

then these should be gone.
luci-app-sqm is really just the configuration interface; the main package is sqm-scripts IIRC.

Yeah 42% for 52 Mbps is probably the overhead of NAT and conntrack and firewall etc, the CPU is not super powerful.

Also see @moeller0's instructions for actually disabling sqm.

@dlakelan
I find your blog post very interesting! You kinda make me want to try out HFSC for my online games, PS4, VOIP-client and torrents.
At the moment I'm using SQM@cake with piece_of_cake and it fits my setup quite well but only until torrents are coming into play...
Torrents are really the only thing that destroys my latency but the rest is pretty much under control with SQM@cake.

I'm hoping that the devs will improve SQM@cake even further so downloading torrents won't slow down my online gaming experience in the future... :slight_smile:

On my setup I put one instance on wan and one on lan, both are straight Ethernet, I don't have a bridge since my WiFi is separate APs. Each instance shapes output, so the output of lan is basically ingress. This is different from sqm using the IFB method... Ingress shaping is tricky.

That was what I did.

Still have this output:

qdisc fq_codel 0: dev eth0 root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 4Mb ecn
qdisc noqueue 0: dev br-lan root refcnt 2
qdisc noqueue 0: dev eth0.2 root refcnt 2
qdisc fq_codel 0: dev pppoe-wan root refcnt 2 limit 10240p flows 1024 quantum 1518 target 5.0ms interval 100.0ms memory_limit 4Mb ecn
qdisc noqueue 0: dev br-guest_lan root refcnt 2
qdisc noqueue 0: dev wlan1 root refcnt 2
qdisc noqueue 0: dev wlan1-1 root refcnt 2
qdisc noqueue 0: dev eth0.3 root refcnt 2
qdisc noqueue 0: dev eth0.1 root refcnt 2

This is fine, no cake instance left. I believe OpenWrt defaults to fq_codel for ethernet, as it should...

Seeing your qdisc output, it seems you will have difficulty shaping inbound traffic unless you go the IFB route as you have several places where your traffic can go (2 ethernet vlans and 2 wlans organized into two bridges lan and guest)

I'm hoping someone maybe @moeller0 knows where to copy and paste the 2 lines that create an ifb and add a filter to pppoe-wan or whatever to mirred redirect the inbound packets to the ifb.

It should look something like:

ip link add name ifb0 type ifb
tc filter add dev pppoe-wan parent 1: protocol ip prio 1 u32 \
 match ip dst 192.168.0.0/16 action mirred egress redirect dev ifb0

but can't guarantee that's totally correct.

Can you explain a bit more about this? I'm not sure I understand the problem.

Anyway, there is a lot of information in the recent comments. I'll try to do every suggestion you've written :slight_smile:
My next step is to run some iperf tests, maybe the results will help a little.

So here is what sqm-scripts does:

with
IFACE=pppoe-wan
DEV=ifb4pppoe-wan
TC=/usr/sbin/tc
IP=/sbin/ip or IP=/usr/sbin/ip (depending on whether one uses busybox's ip functionality or the real ip binary)

    $TC qdisc del dev $IFACE handle ffff: ingress 2>/dev/null
    $TC qdisc add dev $IFACE handle ffff: ingress
    $IP link add name $DEV type ifb
    $IP link set dev $DEV up
    # redirect all packets arriving on $IFACE to the IFB device $DEV
    $TC filter add dev $IFACE parent ffff: protocol all prio 10 u32 \
        match u32 0 0 flowid 1:1 action mirred egress redirect dev $DEV

That should work, but honestly, I would actually try 18.06 and see whether tc filter cannot be used to sort the torrent port packets into the background tin and the game packets into the voice/priority tin. This assumes the thresh of the voice tin is sufficient for the game's demand. For background this does not matter: the thresh is basically the guaranteed rate, and you want almost no guarantee for background so that this traffic nicely yields to more important packets. You just need enough guaranteed bandwidth to make some forward progress and avoid starving flows completely.
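To judge whether the voice tin's thresh is enough for a game, it helps to know how the thresholds scale with the configured bandwidth. As far as I can tell from the tc-cake man page, diffserv3 gives Bulk 1/16 (6.25%) and Voice 1/4 (25%) of the shaper rate, which agrees with the tin table earlier in the thread (596560bit and 2386Kbit for 9545Kbit). A sketch with a made-up function name, treating those fractions as an assumption:

```shell
#!/bin/sh
# Approximate diffserv3 tin thresholds in kbit/s for a given cake bandwidth.
# The 1/16 and 1/4 fractions are assumed from the tc-cake documentation.
diffserv3_thresh() {
    bw_kbit=$1
    echo "bulk=$(( bw_kbit / 16 )) besteffort=$bw_kbit voice=$(( bw_kbit / 4 ))"
}

diffserv3_thresh 9545   # prints bulk=596 besteffort=9545 voice=2386
```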

Best Regards

Is it possible to attach cake to one of the HFSC classes (similar to the way I'm using fq_codel) such as class 40 in my example that handles default connections? For aficionados this might be the ideal: you can prioritize VOIP and games as real-time classes in HFSC and then have a link-share class for "most everything else" with per-ip fairness, and a low priority class for torrents with pfifo behavior, thereby giving the torrents the lag under load they expect to see so that they throttle their bandwidth appropriately.

Sure, you can use cake as a leaf qdisc, but this will probably not work as intended, as I fail to see how cake would get meaningful backpressure unless you set cake to shape to the desired total shaped bandwidth minus the dedicated realtime bandwidth. But then you would run two stacked shapers and potentially leave bandwidth on the table (since each VoIP stream only takes around 100Kbps, this would not be too bad for a low number of pure VoIP channels).
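The "total minus realtime" rate for such a stacked cake leaf is simple arithmetic. A sketch with illustrative numbers: the 50 Mbit/s total and the two concurrent calls are assumptions, the roughly 100 kbit/s per call is the figure used in this thread, and the function name is hypothetical.

```shell
#!/bin/sh
# Rate left for a cake leaf once the realtime VoIP reservation is subtracted
# from the total shaped rate (all values in kbit/s).
cake_leaf_rate() {
    total_kbps=$1; calls=$2; per_call_kbps=$3
    echo $(( total_kbps - calls * per_call_kbps ))
}

# 50 Mbit/s shaped rate, 2 calls at ~100 kbit/s each.
cake_leaf_rate 50000 2 100   # prints 49800
```

With only a couple of calls reserved, the bandwidth left on the table when no call is active stays well under one percent of the link.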

But personally I would first try to inject VoIP packets into the highest priority tin of cake diffserv3 or diffserv4 and see whether the resulting jitter and real time response is not already good enough... Same for torrents.

IMHO the crux of all these schemes is to make sure the heuristic used to deduce a packet's intended priority is working as intended and does not introduce undesired side-effects. As you have shown nicely in your blog, achieving that is not impossible, but certainly involves some research and reading.

Best Regards

Yep, really what's needed for my scheme is just a per-ip-fairness scheduler separate from shaping. In fact, in the scheme I'm talking about HFSC is already choosing the bandwidth, it is the "bottleneck" so to speak, so you might do fine telling cake to shape to infinite bandwidth (or whatever, twice your full bandwidth for example) just to get its per-ip fairness. But it'd be a bunch of computing overhead for probably not a lot of benefit.

I do agree with you about trying to use filters to just hand cake the classified packets. That's probably a good thing to try. I'm not willing to give up my HFSC realtime VOIP shaper though. I can run 750Mbps multi-stream dslreports speed tests during phone calls and never experience more than about 1ms of VOIP jitter :wink: The same is true for gaming.

EDIT: also as far as I can tell there's no cake in Debian which is what my x86 router runs :wink: