SQM, cake and piece_of_cake.qos - High CPU usage

Most likely 8 is wrong. Would you be willing to disclose which ISP you are a customer of? Sometimes Google can help to find information about the encapsulation. Do you always get a 100 Mbps sync in your VDSL modem, or is the sync variable? Or, put differently, is the 55 Mbps the sync of the modem or the throughput you measure when the modem syncs at 100 Mbps?

That sounds a bit odd, if you ask me. Then again cake/fq_codel by default give a small priority boost to small sparse flows like typical ACK-streams, so you currently have the opposite of your old set-up.

Tangentially related rant: torrent apps try to be good citizens that do their transfers in the background. Unfortunately, their method to detect competing traffic is to measure latency increases caused by queueing and to use a threshold to decide when to throttle back. Competent AQM solutions like cake and fq_codel will only minimally increase the latency under load, so the torrent apps fail to notice competing traffic and overwhelm the connection. BBR, a novel TCP congestion avoidance algorithm by Google, used the same rationale, but they had the good sense to later add an explicit detector for competent AQM to avoid falling into the same trap. I so wish torrent programmers would revisit this issue in the near future.

Best Regards

Sure, it's Bezeq Int. / Bezeq International.

My VDSL modem is synced at 55 Mbps, and I'm getting very close results on speed tests.

Is that something you shouldn't do as best practice? I mostly want to create a stable connection for gaming, even if someone streams or uses torrents. Here is a similar image of my old setup that I found in an internet tutorial.

So if I understand correctly, when using AQM the bandwidth is somewhat stable, so the torrent clients fail to determine whether they are using all the bandwidth and "flood" the AQM with more packets?
I might be getting all of this wrong; I'm still learning, though.

Ah, bummer, Google Translate has a hard time with that web page...

Okay, are you using PPPoE on OpenWrt, or are you using Bezeq's router/modem for configuring the log-in data?

With a PPPoE connection on a VDSL2/PTM link (most likely what you have) you will need either 22 Bytes of overhead on top of the PPPoE payload or even 26, assuming your ISP also uses a VLAN. Now googling for bezeq seems to indicate that they do not use a VLAN, so I would recommend to configure the overhead for sqm to 22 if you instantiate sqm on eth0.2 or overhead 30 if you instantiate on pppoe-wan (on that interface the pppoe overhead is not yet added).
The next step would be to add the "ingress" keyword to your configuration, by adding the following to /etc/config/sqm:
option qdisc_advanced '1'
option qdisc_really_really_advanced '1'
option iqdisc_opts 'nat dual-dsthost ingress'
option eqdisc_opts 'nat dual-srchost'

These options will add per-IP fairness, which will not help with your torrents but might help, say, your phone on wlan.
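To make the overhead arithmetic explicit, here is a minimal sketch of how the two values relate: 22 bytes on eth0.2, plus the 8 bytes of PPPoE header that are not yet accounted for on pppoe-wan. The helper name sqm_overhead and the use of the first queue section (sqm.@queue[0]) are assumptions for illustration. The script only prints the uci commands so they can be reviewed before being run on the router.

```shell
#!/bin/sh
# Overhead figures quoted in this thread for a VDSL2/PTM link without VLAN (bytes).
BASE_OVERHEAD=22   # value suggested when sqm is instantiated on eth0.2
PPPOE_HEADER=8     # 30 - 22: PPPoE overhead not yet added on pppoe-wan

# Hypothetical helper: print the overhead to configure for a given interface.
sqm_overhead() {
    case "$1" in
        pppoe-*) echo $((BASE_OVERHEAD + PPPOE_HEADER)) ;;
        *)       echo "$BASE_OVERHEAD" ;;
    esac
}

IFACE=pppoe-wan
OVERHEAD=$(sqm_overhead "$IFACE")

# Dry run: print the commands instead of executing them.
# Assumption: the sqm instance to change is the first "queue" section.
cat <<EOF
uci set sqm.@queue[0].interface='$IFACE'
uci set sqm.@queue[0].overhead='$OVERHEAD'
uci commit sqm
/etc/init.d/sqm restart
EOF
```

If the ISP did use a VLAN, the thread's figure of 26 instead of 22 would apply, so BASE_OVERHEAD would need to be changed accordingly.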

Let's take it from there....

Oh, I believe that expediting slow sparse flows is a good idea in general; for example that will boost DNS queries slightly and in interactive browsing that helps.

Well, the per-internal-IP fairness should solve most of these issues, unless the streaming/torrenting is happening on the same machine as the gaming. I had assumed that no gamer would load his/her box with unrelated gunk during play, but here you see how little I know about gaming...
I am not 100% sure that not prioritizing ACKs really is the right way to deprioritize torrent traffic, especially since (most?) torrent clients should be using the UDP-based uTP instead of TCP and hence do not actually generate traditional ACK packets... Again, I am not using torrents, so for all I know your client might be using TCP...

Well, that is my theory anyways. I wonder whether you could throttle your torrent client to only accept a smallish number of concurrent connections and maybe restrict the bandwidth during those times you game, as an immediate work-around for your issues?

Best Regards

Yeah, I guess they don't provide any data about it anyway.

On OpenWrt. My setup is an Archer C7 v4 as a router and a Netgear DM200 modem. I set up the PPPoE login via OpenWrt.

Ok, I've changed the overhead to 30 and the interface to pppoe-wan (I was sure I'm working on pppoe-wan but somehow it was on eth0.2). After changing the overhead, the performance was better, but still had packet loss and high CPU usage, though it was lower than usual. I can post tc -s qdisc if needed.

No gamer would do that :slight_smile:
I'm just testing the performance of the internet connection and want to get the best result, as I managed to get with my old setup. I think I'll get the same result if I use the torrent client on another machine, because of the high CPU usage.

I've tested it too. I changed the maximum number of connections from 200 to 50, and it doesn't seem to matter. As the download speed increases, the CPU also works harder. To be more precise, a download speed of 4 Mbit/s results in 80% usage, and 5 Mbit/s in around 95%.

Getting the same result as I described above.

A key to slowing torrent down is not limiting max connections, but concurrent ones. Usually it's five per active torrent. I wish it was, like, capped at 5 no matter how many torrents.

packet loss is the normal way to slow tcp down.

I knew going into the aqm field that A) we were going to obsolete how torrent operated, and that B) it didn't matter, because of how big we generally won by aiming for 5ms induced delay rather than 100ms.

The relevant paper is here: https://perso.telecom-paristech.fr/drossi/paper/rossi14comnet-b.pdf

The point is that I get almost the same CPU usage when doing a speedtest, which, I guess, doesn't use concurrent connections. Or at least way fewer than a torrent does.

Edit: I've limited the connections per torrent to 5, and had only 1 torrent running; I still get the same result.

Hi Dave,

nice paper, I overlooked it initially it seems. Looking at LEDBAT's wikipedia page I see this gem:
"Both of the above implementations aim to limit the network queuing delay to 100ms."
What are these guys smoking, that inducing 100ms latency under load seems acceptable?

Anyway, I digress...

Just for my curiosity, could you do this test where torrenting and use are in different machines and you use the modifications to /etc/config/sqm I recommended above (mainly the nat and dual-XXXhost options)?

It's hard to tell and it doesn't seem to help. Connection was stable before torrenting, and then somewhat unstable after the torrent client's download speed reached 4.5Mbps.

Here are the results (I'm using Mumble voip to measure it, and it works on UDP). If there are better ways to test the connection, I would be happy to hear.

Torrent client PC:

Mobile device without the torrent client:

Can you identify the game and torrent traffic by port number or similar? If so I can help with a custom qos you could try

So I fear that this test, while sane in general, compares wired access with wireless and thereby drags wifi challenges into the mix, but I agree that there seems to be room for improvement...
The odd thing is that VoIP in general only uses around 100Kbps (in each direction) and hence should survive quite well with simple per flow fairness, because at 50Mbps you would need approximately 50000/100 = 500 flows, before the per flow share of ingress falls below the required 100Kbps (these numbers are not 100% correct, but the order of magnitude should be okay).
Maybe you really run into a CPU problem on the router...
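The back-of-the-envelope estimate above can be checked with plain shell arithmetic. This is only a sketch of the order-of-magnitude argument: it assumes ideal per-flow fairness and ignores overhead, and the function name fair_share_kbps is made up for illustration.

```shell
#!/bin/sh
LINK_KBPS=50000   # roughly the 50 Mbps ingress rate discussed above
VOIP_KBPS=100     # approximate per-direction VoIP requirement

# fair_share_kbps LINK_KBPS NFLOWS: equal share per flow under ideal per-flow fairness
fair_share_kbps() {
    echo $(( $1 / $2 ))
}

echo "flows at which the share drops to ${VOIP_KBPS} Kbps: $(( LINK_KBPS / VOIP_KBPS ))"
for n in 10 100 500 1000; do
    echo "$n flows: $(fair_share_kbps "$LINK_KBPS" "$n") Kbps each"
done
```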

I'm not near my computer, so from what I remember the ports are:

  • Game - 27015 (UDP)
  • Torrent Client - 60435 (UDP?)

That's sad. I was sure the router should be fast enough, at least after upgrading from TL-WR1043ND.
Do you think this is the router model? Or maybe it could be caused by the image I've flashed?
Do you recommend using QoS instead?

One challenge with port-based priority assignments is that port usage is pretty ephemeral, and torrents especially are known to try to evade easy identification, so these might be a moving target. Then again, all that means is that any port-based qos needs to be verified every now and then and potentially tweaked a bit to reflect changes in port usage.

Well, the new router still only sports a pretty ancient MIPS CPU, so the biggest improvement is probably the frequency jump, but that only holds if you used a TL-WR1043ND v1 (400MHz) before; v2-v4 of the TL-WR1043ND were already at 720-750MHz, so you might not have seen any significant frequency increase to begin with. Naively I would guess that the new router offers better wlan but not necessarily more "punch".
Then again I have no personal experience with either router, so there might be big differences and improvements, and no matter what the archer should be able to work sufficiently well at your bandwidth...

Personally I do not, but it might in the end be a better solution for your problem. What I certainly would recommend is to use qos-scripts/luci-app-qos to verify that your old qos scheme still works better for you than sqm does.

Best Regards

I think custom QoS scripts are a good idea when you have specific quality tradeoffs you want to express that are different from "everyone gets similar bandwidth with pretty low latency under load" which is what Cake seems to try to do, and seems to do very well.

For example I have a PBX on a VPS server, so I can definitely identify all the VOIP traffic because it's all UDP coming to or from that given IP in a narrow port range. I want to prioritize this VOIP so that if there is anything in the queue at all from or to this IP it gets sent immediately completely starving all other bandwidth for up to say 20ms. I do this because I'm sure that I really do want as little latency as possible for this traffic because I know this endpoint very well.

Similarly, I recently started playing certain games with my kids, I can identify it because Steam has certain ports they've assigned to their traffic. Again I want that traffic to stall other non-realtime traffic for up to 20ms.

After those realtime classes are taken care of, the main thing is to keep my family happy and buffering-free on streaming videos, then almost all stuff after that is "default" except that I'm happy to let things like torrents be starved for hundreds of milliseconds, since who cares?

HFSC lets me express that very well. But other users may not have so clean of a set of requirements, for example if you're sharing a link with 3 other college roommates each of you may have your own totally separate requirements (some people are just facebooking, others are doing interactive ssh to a high performance computing cluster for example). Per-ip fairness is probably a good idea because expressing the "real" requirements may in fact be a nightmare and per-ip could be good enough (until the comp-sci major decides to load up 25 virtual machines / containers on his PC and take 25/28 of the bandwidth :wink: )

I took the time to understand HFSC and wrote scripts at the shell level to use it on my router (running Debian). I'll see about plopping a script into github and linking it here; it comes up often enough.

EDIT: in the meantime, @Dor, you can check out the basics that I posted on my blog: http://models.street-artists.org/2018/01/16/understanding-hfsc-in-linux-qos/

It gives the basic setup of the HFSC classes; you have to add your own filters, which are pretty simple for your 2 classification rules. I'll post the filter rules you need in a separate comment.

You want to filter udp port 27015 dst or src into the game queue 1:20 and udp or tcp port 60435 into the low priority queue 1:50

# match udp src port 27015 to game queue 1:20
tc filter add dev ${DEV} parent 1:0 protocol ip prio 10 u32 match ip protocol 17 0xff match ip sport 27015 0xffff flowid 1:20
#same for dst
tc filter add dev ${DEV} parent 1:0 protocol ip prio 11 u32 match ip protocol 17 0xff match ip dport 27015 0xffff flowid 1:20

#match udp  src or dst port 60435 to lowest priority 1:50
tc filter add dev ${DEV} parent 1:0 protocol ip prio 12 u32 match ip protocol 17 0xff match ip sport 60435 0xffff flowid 1:50
#same for dst
tc filter add dev ${DEV} parent 1:0 protocol ip prio 13 u32 match ip protocol 17 0xff match ip dport 60435 0xffff flowid 1:50

#match tcp  src or dst port 60435 to lowest priority 1:50
tc filter add dev ${DEV} parent 1:0 protocol ip prio 14 u32 match ip protocol 6 0xff match ip sport 60435 0xffff flowid 1:50
#same for dst
tc filter add dev ${DEV} parent 1:0 protocol ip prio 15 u32 match ip protocol 6 0xff match ip dport 60435 0xffff flowid 1:50
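
The six rules above all follow one pattern (protocol number, port, target class, once for sport and once for dport), so they can also be generated by a small helper. This is just a sketch: the function name add_port_filters, the default value for DEV, and the dry-run TC variable are assumptions, and the classes 1:20 and 1:50 must already exist from the HFSC setup.

```shell
#!/bin/sh
DEV=${DEV:-pppoe-wan}   # assumption: the WAN interface used elsewhere in this thread
TC=${TC:-echo tc}       # dry run by default; set TC=tc to actually apply (needs root)

PRIO=10                 # first filter priority, as in the rules above

# add_port_filters PROTO_NUM PORT FLOWID
# Adds one filter for the source port and one for the destination port.
add_port_filters() {
    for dir in sport dport; do
        $TC filter add dev "$DEV" parent 1:0 protocol ip prio "$PRIO" u32 \
            match ip protocol "$1" 0xff match ip "$dir" "$2" 0xffff flowid "$3"
        PRIO=$((PRIO + 1))
    done
}

add_port_filters 17 27015 1:20   # UDP game traffic into the game queue
add_port_filters 17 60435 1:50   # UDP torrent traffic into the low priority queue
add_port_filters 6  60435 1:50   # TCP torrent traffic into the low priority queue
```

Run as is, it only prints the six tc commands, which makes it easy to compare them with the hand-written ones before applying anything.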

So recent cake can actually be addressed from tc filters, so one option would be to still use sqm-scripts with layer_cake and set up two filters to place torrents into the bulk tin and game traffic into the high priority tin. Not as nifty as your example, but if all that is required is to sort stuff into three priority classes that should be fine :wink:

But how to do this?

First run "tc -s qdisc" and figure out the major number of the cake instance in question:
qdisc cake 801b: dev pppoe-wan root refcnt 2 bandwidth 9545Kbit diffserv3 dual-srchost nat split-gso rtt 100.0ms noatm overhead 34 mpu 64
In this example it is 801b for interface pppoe-wan; remember that number.

Then look at the tins for diffserv3:

 average network hdr offset:            0

                   Bulk  Best Effort        Voice
  thresh      596560bit     9545Kbit     2386Kbit
  target         30.5ms        5.0ms        7.6ms
  interval      125.5ms      100.0ms      102.6ms

The tins are numbered from left to right starting at 1, so 1 is Bulk (or Background); remember this as the minor number.
Then all you need to do is to add a filter for the torrent port:

tc filter add dev pppoe-wan parent 801b: protocol ip u32 match ip dport 60435 0xffff action skbedit priority 801b:1

Mind you, I have not tested that myself, but this was reported to work on the cake mailing list. @ldir can you spot an error in my copy of your command above :wink:
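
The steps above could be scripted roughly as follows. This is an untested sketch in the same spirit as the command above: the helper names cake_handle and set_tin are made up, the sample line stands in for real tc output, and tin 3 for the game port is an assumption read off the column order of the tin table (Bulk, Best Effort, Voice). Like the original command, it matches on the destination port only.

```shell
#!/bin/sh
IFACE=${IFACE:-pppoe-wan}
TC=${TC:-echo tc}   # dry run by default; set TC=tc to actually apply (needs root)

# Read `tc qdisc` output on stdin and print the handle (e.g. "801b:")
# of the cake instance attached to the given device.
cake_handle() {
    awk -v dev="$1" '$1 == "qdisc" && $2 == "cake" && $5 == dev { print $3; exit }'
}

# set_tin HANDLE PORT TIN: steer packets with this destination port into a cake tin.
set_tin() {
    $TC filter add dev "$IFACE" parent "$1" protocol ip u32 \
        match ip dport "$2" 0xffff action skbedit priority "$1$3"
}

# Sample line from this thread; on the router use: tc qdisc show | cake_handle "$IFACE"
SAMPLE='qdisc cake 801b: dev pppoe-wan root refcnt 2 bandwidth 9545Kbit diffserv3'
HANDLE=$(printf '%s\n' "$SAMPLE" | cake_handle "$IFACE")

set_tin "$HANDLE" 60435 1   # torrent port into Bulk (tin 1)
set_tin "$HANDLE" 27015 3   # game port into Voice (tin 3, assumption)
```

The handle changes whenever sqm is restarted, which is why looking it up each time is safer than hard-coding 801b.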

This is pretty cool, but also pretty involved. @Dor if you have time to do this and enjoy fiddling a bit this is a fine way to both improve your network and learn more tricks about openwrt/linux :wink:

@dlakelan Thanks for the info, I'll read it and try that out.

@moeller0 and @dlakelan, after uninstalling luci-app-qos and luci-app-sqm, I still have queue discipline configurations when running the command tc qdisc. I suspect that might be the problem, or once again it's the CPU. The CPU usage is relatively high even after uninstalling both packages and running a simple speed test: the CPU is at around 42% usage for 52Mbps :disappointed:

You should stop the sqm instance by issuing:

/etc/init.d/sqm stop

Then these should be gone.
luci-app-sqm is really just the configuration interface; the main package is sqm-scripts, IIRC.
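
To check whether any shaper is still installed after stopping sqm, the output of tc qdisc can be filtered for the usual suspects. A minimal sketch: the helper name leftover_shapers and the list of qdisc types to look for are assumptions, and the sample lines merely stand in for real output.

```shell
#!/bin/sh
# Read `tc qdisc` output on stdin and print any shaper qdiscs still installed.
leftover_shapers() {
    grep -E '^qdisc (cake|hfsc|htb|ingress) ' || true
}

# On the router (needs root) this would be something like:
#   /etc/init.d/sqm stop
#   tc qdisc show | leftover_shapers
# Sample input for illustration only:
printf '%s\n' \
    'qdisc fq_codel 0: dev eth0 root refcnt 2 limit 10240p' \
    'qdisc cake 801b: dev pppoe-wan root refcnt 2 bandwidth 9545Kbit' \
    | leftover_shapers
```

Empty output means none of the listed qdisc types is attached any more.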

Yeah 42% for 52 Mbps is probably the overhead of NAT and conntrack and firewall etc, the CPU is not super powerful.

Also see @moeller0's instructions for actually disabling sqm.