General Discussion about QOS/SQM

  1. What does "calulate overhead" do for this script?
  2. Why isn't this utilized more?
  3. tc command shows no stats actually changing?

I've found for my personal use it works great. My ingress is controlled via an unknown shaper at my ISP, and for gaming/streaming it's hands-down simple but effective. I also
use a script created by @shm0 to disable ALL offloads (excellent work).

qdisc hfsc 1: root refcnt 2 default 30 
 Sent 418950647 bytes 4551413 pkt (dropped 0, overlimits 693026 requeues 0) 
 backlog 0b 0p requeues 0
qdisc fq_codel 400: parent 1:40 limit 800p flows 1024 quantum 300 target 5.0ms interval 100.0ms memory_limit 4Mb 
 Sent 97387730 bytes 1394574 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
  maxpacket 5656 drop_overlimit 0 new_flow_count 9893 ecn_mark 0
  new_flows_len 0 old_flows_len 1
qdisc fq_codel 100: parent 1:10 limit 800p flows 1024 quantum 300 target 5.0ms interval 100.0ms memory_limit 4Mb 
 Sent 309536 bytes 2977 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
  maxpacket 486 drop_overlimit 0 new_flow_count 2687 ecn_mark 0
  new_flows_len 1 old_flows_len 41
qdisc fq_codel 200: parent 1:20 limit 800p flows 1024 quantum 300 target 5.0ms interval 100.0ms memory_limit 4Mb 
 Sent 26810301 bytes 181888 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
  maxpacket 1412 drop_overlimit 0 new_flow_count 12111 ecn_mark 0
  new_flows_len 0 old_flows_len 2
qdisc fq_codel 300: parent 1:30 limit 800p flows 1024 quantum 300 target 5.0ms interval 100.0ms memory_limit 4Mb 
 Sent 294442948 bytes 2971972 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
  maxpacket 5656 drop_overlimit 0 new_flow_count 185864 ecn_mark 0
  new_flows_len 0 old_flows_len 1
qdisc ingress ffff: parent ffff:fff1 ---------------- 
 Sent 11404262989 bytes 8057927 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
+---(1:) hfsc 
     |    Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
     |    backlog 0b 0p requeues 0
     |
     +---(1:1) hfsc sc m1 0bit d 0us m2 9800Kbit ul m1 0bit d 0us m2 9800Kbit 
          |     Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
          |     backlog 0b 0p requeues 0
          |
          +---(1:10) hfsc rt m1 5716Kbit d 79us m2 980Kbit ls m1 5716Kbit d 79us m2 5444Kbit ul m1 0bit d 0us m2 9800Kbit 
          |           Sent 309536 bytes 2977 pkt (dropped 0, overlimits 0 requeues 0) 
          |           backlog 0b 0p requeues 0
          |     
          +---(1:20) hfsc rt m1 5215Kbit d 199us m2 4900Kbit ls m1 5215Kbit d 199us m2 2722Kbit ul m1 0bit d 0us m2 9800Kbit 
          |           Sent 26810301 bytes 181888 pkt (dropped 0, overlimits 0 requeues 0) 
          |           backlog 0b 0p requeues 0
          |     
          +---(1:30) hfsc ls m1 0bit d 100.0ms m2 1361Kbit ul m1 0bit d 0us m2 9800Kbit 
          |           Sent 294493086 bytes 2972098 pkt (dropped 0, overlimits 0 requeues 0) 
          |           backlog 0b 0p requeues 0
          |     
          +---(1:40) hfsc ls m1 0bit d 200.0ms m2 272Kbit ul m1 0bit d 0us m2 9800Kbit 
                      Sent 97387730 bytes 1394574 pkt (dropped 0, overlimits 0 requeues 0) 
                      backlog 0b 0p requeues 0
+---(1:) hfsc 
     |    Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
     |    backlog 0b 0p requeues 0
     |
     +---(1:1) hfsc sc m1 0bit d 0us m2 48999Kbit ul m1 0bit d 0us m2 48999Kbit 
          |     Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
          |     backlog 0b 0p requeues 0
          |
          +---(1:10) hfsc rt m1 11106Kbit d 39us m2 4899Kbit ls m1 11106Kbit d 39us m2 27221Kbit ul m1 0bit d 0us m2 48999Kbit 
          |           Sent 417871 bytes 2638 pkt (dropped 0, overlimits 0 requeues 0) 
          |           backlog 0b 0p requeues 0
          |     
          +---(1:20) hfsc rt m1 25533Kbit d 39us m2 24499Kbit ls m1 25533Kbit d 39us m2 13610Kbit ul m1 0bit d 0us m2 48999Kbit 
          |           Sent 45826590 bytes 92668 pkt (dropped 0, overlimits 0 requeues 0) 
          |           backlog 0b 0p requeues 0
          |     
          +---(1:30) hfsc ls m1 0bit d 100.0ms m2 6805Kbit ul m1 0bit d 0us m2 48999Kbit 
          |           Sent 7320662440 bytes 5211250 pkt (dropped 0, overlimits 0 requeues 0) 
          |           backlog 0b 0p requeues 0
          |     
          +---(1:40) hfsc ls m1 0bit d 200.0ms m2 1361Kbit ul m1 0bit d 0us m2 48999Kbit 
                      Sent 4149338464 bytes 2750688 pkt (dropped 0, overlimits 0 requeues 0) 
                      backlog 0b 0p requeues 0
          

Do you mind sharing the script?

Disclaimer: All credit for this script goes to @shm0

The sleep call can be uncommented as your use case requires.

#!/bin/sh

# sleep 15

log()
{
	local status="$1"
	local feature="$2"
	local interface="$3"
	
	if [ $status -eq 0 ]; then		
		logger "[ETHTOOL] $feature: Disabled on $interface"
	fi
	
	if [ $status -eq 1 ]; then
		logger -s "[ETHTOOL] $feature: Failed to disable on $interface"
	fi
	
	if [ $status -gt 1 ]; then
		logger "[ETHTOOL] $feature: Nothing changed on $interface"
	fi

}
	
disable_offloads()
{
	local interface="$1"
	local features
	local cmd
	
	# Check if we can change features
	if ethtool -k "$interface" 1>/dev/null 2>/dev/null; then
		
		# Filter whitespaces
		# Get only enabled/not fixed features
		# Filter features that are only changeable by global keyword
		# Filter empty lines
	# Cut to first column
		features=$(ethtool -k "$interface" | awk '{$1=$1;print}' \
										   | grep -E '^.+: on$' \
										   | grep -v -E '^tx-checksum-.+$' \
										   | grep -v -E '^tx-scatter-gather.+$' \
										   | grep -v -E '^tx-tcp.+segmentation.+$' \
										   | grep -v -E '^tx-udp-fragmentation$' \
										   | grep -v -E '^tx-generic-segmentation$' \
										   | grep -v -E '^rx-gro$' \
										   | grep -v -E '^$' \
										   | cut -d: -f1)

		# Replace feature name by global key word
		features=$(echo "$features" | sed -e s/rx-checksumming/rx/ \
										  -e s/tx-checksumming/tx/ \
										  -e s/scatter-gather/sg/ \
										  -e s/tcp-segmentation-offload/tso/ \
										  -e s/udp-fragmentation-offload/ufo/ \
										  -e s/generic-segmentation-offload/gso/ \
										  -e s/generic-receive-offload/gro/ \
										  -e s/large-receive-offload/lro/ \
										  -e s/rx-vlan-offload/rxvlan/ \
										  -e s/tx-vlan-offload/txvlan/ \
										  -e s/ntuple-filters/ntuple/ \
										  -e s/receive-hashing/rxhash/)
										  
	# Check if we can disable something
		if [ -z "$features" ]; then
			logger "[ETHTOOL] Offloads: Nothing changed on $interface"
			return 0
		fi
								  
		# Construct ethtool command line								  
		cmd="-K $interface"
			  
		for feature in $features; do			
			cmd="$cmd $feature off"
		done
		
		# Try to disable offloads		
		ethtool $cmd 1>/dev/null 2>/dev/null
		log $? "Offloads" "$interface"
		
	else
		log $? "Offloads" "$interface"	
	fi
}

disable_flow_control()
{
	local interface="$1"
	local features
	local cmd
	
	# Check if we can change settings
	if ethtool -a "$interface" 1>/dev/null 2>/dev/null; then
										  
		# Construct ethtool command line								  
		cmd="-A $interface autoneg off tx off rx off"
		
		# Try to disable flow control				
		ethtool $cmd 1>/dev/null 2>/dev/null
		log $? "Flow Control" "$interface"
		
	else
		log $? "Flow Control" "$interface"		
	fi	
}

disable_interrupt_moderation()
{
	local interface="$1"
	local features
	local cmd
	
	# Check if we can change settings
	if ethtool -c "$interface" 1>/dev/null 2>/dev/null; then
															
		# Construct ethtool command line								  
		cmd="-C $interface adaptive-tx off adaptive-rx off" 

		# Try to disable adaptive interrupt moderation				
		ethtool $cmd 1>/dev/null 2>/dev/null
		log $? "Adaptive Interrupt Moderation" "$interface"
		
	features=$(ethtool -c "$interface" | awk '{$1=$1;print}' \
										 | grep -v -E '^.+: 0$|Adaptive|Coalesce' \
										 | grep -v -E '^$' \
										 | cut -d: -f1)	
			
		# Check if we can disable something
		if [ -z "$features" ]; then
			logger "[ETHTOOL] Interrupt Moderation: Nothing changed on $interface"
			return 0
		fi
		
		# Construct ethtool command line								  
		cmd="-C $interface"
			  
		for feature in $features; do			
			cmd="$cmd $feature 0"
		done
			
		# Try to disable interrupt Moderation		
		ethtool $cmd 1>/dev/null 2>/dev/null
		log $? "Interrupt Moderation" "$interface"
		
	else
		log $? "Interrupt Moderation" "$interface"		
	fi	
}

main()
{

for interface in /sys/class/net/*; do
	interface=$(basename "$interface")

	# Skip Loopback
	if [ "$interface" = "lo" ]; then
		continue
	fi
	
	disable_offloads "$interface"
	disable_flow_control "$interface"
	disable_interrupt_moderation "$interface"
	
done
}


main
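
To run this automatically at boot on OpenWrt, one option (my own addition, not part of @shm0's script) is to save it as, say, /root/disable-offloads.sh and call it from /etc/rc.local:

# /etc/rc.local (executed at the end of the OpenWrt boot sequence)
# The path is a placeholder; adjust it to wherever you saved the script.
/root/disable-offloads.sh &

exit 0

Running it in the background with & keeps a slow ethtool probe from holding up the rest of the boot.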

In your original post you mention "calculate overhead", and then you show us the output of some commands, but you never show us the script that does this "calculate overhead", etc.

I'm referring to the original luci-app-qos package, prior to sqm-scripts...

This is best appreciated by looking into the code. The last time I looked into this for qos-scripts, some years ago, it did something approximate and imprecise. Having an ADSL line at the time, this made me look into alternatives, and so I tried to influence the then-beginnings of what turned into sqm-scripts to allow better/more precise overhead accounting.
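
For the curious, the mechanism sqm-scripts ended up using for this is the kernel's "stab" size table, which adjusts each packet's apparent size before the shaper sees it. A minimal sketch (the interface name and the 44-byte/ATM values are just an example for an ADSL-ish link, not a recommendation):

# Account for 44 bytes of per-packet overhead and ATM cell framing
# before the shaper (here HTB) calculates transmission times.
tc qdisc add dev pppoe-wan root handle 1: stab overhead 44 linklayer atm \
    htb default 10

With linklayer atm the kernel also rounds each adjusted packet up to an integer number of 53-byte ATM cells, which is the part that a simple percentage-style bandwidth reduction cannot capture.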

What exactly is under-utilized, in your opinion? It helps to try spelling everything out explicitly; otherwise it is way too easy to misunderstand your points.

What exactly did you do, and what output would you expect?

So cake will by default split large meta packets (GRO/GSO aggregates) before scheduling them, which generally makes GRO and GSO less of a problem and more of a feature, as these can noticeably lighten the average CPU load caused by routing packets (the routing decision/lookup only needs to be done once for all packets inside a meta packet). That said, there are situations where offloads cause issues, and hence it can be advantageous to disable them, but it is far from a one-size-fits-all solution IMHO.
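
If you want to check whether those meta-packet offloads are actually active on an interface before deciding to disable anything, something like this works (eth0 is a placeholder):

ethtool -k eth0 | grep -E '(generic-receive|generic-segmentation)-offload'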

Regarding the statistics, it would be interesting to learn what OpenWrt version and kernel version you are using. HFSC is not the kernel's most actively maintained traffic shaper and occasionally falls behind in being updated to support new tricks the kernel picked up....
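
On OpenWrt both can be read with, e.g.:

ubus call system board    # release and kernel in one JSON blob
uname -r                  # kernel version only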

Regarding the tc outputs you show, could you in the future please also include the actual command line invocation used to create that output?

It's more hands-on and requires more sophistication in understanding what HFSC does. But HFSC is absolutely the best way to get customized control over your quality of service.

Do you have some questions about how to further configure for your use case? I can try to help.


I am not sure that HFSC really offers that much more than, say, HTB with appropriately configured burst and cburst parameters per sub-shaper; well, maybe the abstraction is more elegant, but in the end all shapers more or less do the same and suffer from the same problems ;). I do not want to discourage the use of HFSC, as after all it is not that different from HTB, but I want to put things a bit into perspective, especially given the fact that Linux has no active HFSC maintainer.

That's not really true: HTB offers you bandwidth control via token buckets. It's equivalent to a pure linear service curve, where all you can control is the rate of sending.

HFSC offers both bandwidth and latency control independently via the nonlinear service curve (it's a broken-stick curve: one rate for a brief period, then a second rate for steady state). It also offers substantially better realtime latency guarantees when used correctly (there are rt, ls, and ul curves, with different behavior for rt vs ls).
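
In tc syntax a two-piece curve is written as m1 (initial rate) for duration d, then m2 (steady-state rate). A sketch with made-up numbers, just to show the shape:

# rt: serve at 2mbit for roughly the first 10ms of a backlog, then 200kbit
# ls: afterwards, share leftover bandwidth in proportion to this curve
tc class add dev eth0 parent 1:1 classid 1:10 hfsc \
    rt m1 2mbit d 10ms m2 200kbit \
    ls m2 500kbit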

However, it's harder to understand, I think. This is its big issue.

The fact that there is no active maintainer is an issue. I hope someone will step up as needed. It is truly an excellent design for a shaper: substantially better than a token-bucket approach for handling realtime interactive streams such as VOIP, games, and NTP, and substantially better for deprioritizing bulk downloads for the purpose of latency control of the realtime streams, in my opinion.

Figure 2 in the original HFSC paper gives a sense of how this works:

Suppose you have a VOIP call: it's extremely sensitive to latency, but it should be forced to use no more than, say, 200kbps. At the same time you have a bulk download going at 100Mbps, and it shouldn't be limited unless someone else has traffic, at which point it should give up its bandwidth unconditionally.

The HTB approach looks more like the left-hand side. You can set the bandwidth for each bucket. You will need to set the bandwidth for the VOIP bucket to 200kbps to avoid letting anyone abuse the high-priority bin... and you set the bulk bucket to 99800kbps to give the download free rein.

Now much of the time there are no VOIP packets, so the download will borrow the tokens from the VOIP stream. Then a VOIP packet arrives. The VOIP packet waits for HTB to drain the active bucket (the download) and then for the VOIP bucket to fill up... and then it sends the VOIP packet. (If you set the burst lower you can force a more granular approach, admittedly, but at substantial computational cost.) This means a substantial latency increase for VOIP in practice. That's what I experienced, at least, using fireqos, which implements strict-priority HTB shapers. The bandwidth was always guaranteed for the VOIP streams, but the jitter was substantially worse. I routinely get something like 1ms jitter on VOIP calls where under HTB I was getting more like 10-15ms, if I remember correctly.
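
In tc terms that HTB setup is roughly (all names and numbers are illustrative):

# Pure token-bucket rate control; prio only orders who borrows spare tokens.
tc qdisc add dev eth0 root handle 1: htb default 30
tc class add dev eth0 parent 1: classid 1:1 htb rate 100mbit
tc class add dev eth0 parent 1:1 classid 1:10 htb rate 200kbit ceil 200kbit prio 0   # VOIP
tc class add dev eth0 parent 1:1 classid 1:30 htb rate 99800kbit ceil 100mbit prio 7 # bulk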

Basically HFSC combines the functions of prioritization/scheduling together with bandwidth shaping into a single abstraction that is excellent at doing exactly what it promises: reducing jitter on some streams at the expense of inducing jitter on other non-sensitive streams.

Example jitter calculation: 200-byte packet * 8 bits/byte / 200e3 bits/s = 0.008s = 8ms to fill the bucket. So if the bucket is full, the packet will get sent right away; if the bucket is empty, it will be sent after an 8ms delay; and anything in between is possible.
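
And the HFSC version of the same scenario: the m1 segment of the rt curve lets a freshly arrived VOIP packet drain in about a millisecond (200 bytes at 2mbit ≈ 0.8ms), while m2 still caps the class at 200kbit long-term. Again, all names and numbers are illustrative:

tc qdisc add dev eth0 root handle 1: hfsc default 30
tc class add dev eth0 parent 1: classid 1:1 hfsc sc m2 100mbit ul m2 100mbit
# VOIP: realtime guarantee; the short 2mbit burst bounds queueing delay
tc class add dev eth0 parent 1:1 classid 1:10 hfsc rt m1 2mbit d 10ms m2 200kbit
# bulk: link-share only, takes whatever is left, capped at the line rate
tc class add dev eth0 parent 1:1 classid 1:30 hfsc ls m2 99800kbit ul m2 100mbit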


I would not hold my breath; it had basically been left to bitrot even before I first saw it around 2010 or so, which is a pity, because it is elegant. (My point really was that the differences between the advanced shapers are rather minor when compared to, say, a policer, not that there are no differences.)

Well, I don't know how much maintenance it really needs... but maybe I'll have to learn some kernel hacking if it ever gets in danger of being dropped for lack of compatibility :wink:

Interestingly, I asked on the nftables website about scheduling for HFSC, and right away someone answered that they were working on it and had filed some bugs in the nftables code.

The commit history shows commits throughout 2016 and 2017, and a few in 2018, so I think it's being maintained.

Well, it had a number of issues that were only fixed years after the bugs were introduced; I do not consider this to be active maintenance. Including one fix that just removed a user-visible error message instead of fixing the root cause...

That said, I hope it gets more love in the future! Competition is good!


It's a mature piece of code with a well-defined algorithm that works well; basically what's needed is for it not to break when other parts of the kernel are changed. It's definitely a problem if bugs show up and no one fixes them, but it's not necessarily a sign that it's abandoned if not much happens, provided people aren't filing bugs. The warning-message bug took a while to track down, and it was a pure warning (basically, HFSC asked its leaf qdisc fq_codel for a packet, but the leaf had dropped the packet in the meantime).

I don't see it as competition, really. HFSC is about custom control of bandwidth and latency; it's definitely what you'd want if you were running a large server cluster and needed to ensure, for example, precise NTP-synchronized time-stamping of particle accelerator events and turning detectors on and off at precise times, while hammering 10Gbps of detection data into a clustered redundant filesystem. Cake is about no-knobs "do something good" for typical home/small-business users.

The thing is that in a home-user situation there are relatively few real-time applications other than VOIP and games. And a bunch of effort goes into fudging around this, because providers of VOIP and game services can't assume everyone has a great shaper on their network... to the extent that some people here actually want to add delay to their networks to take advantage of the game fudges :wink:


I do not believe this to be 100% true, but in essence you are right: cake aims to do the right thing by default, so that for a lot of users not much configuration is required, with only a few heuristics (that need maintenance). Add to this an operating principle that is rather easy to understand and you have a winner :wink: for the no-to-few-knobs crowd; for anybody else there is always HFSC, HTB or, God help us, CBQ.

P.S.: I really doubt HFSC gets much play outside the BSDs; for Linux, the maintenance history really indicates neglect rather than active maintenance. And yes, the network subsystem is in constant enough flux that, without proactive maintenance, components get dangerous rather quickly.


I suspect HFSC is in use at more server farms than you might think. The biggest problem is how to use it, and that's both asked and, I think, pretty well answered here:

@mindwolf that link kind of explains why you really shouldn't have a bunch of high-bandwidth realtime curves... as apparently you do in some of your HFSC classes, like 1:10 and 1:20; in 1:20 you're guaranteeing up to 50% of your bandwidth to realtime. That's probably not what you want unless you really do have something like 5Mbps of your bandwidth used by realtime-sensitive traffic (like, are you running a call center with 50-100 simultaneous calls, or a gaming arcade where up to 100 players may be on Xboxes at once?)
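
For instance, a realtime curve sized for a handful of simultaneous calls or game streams might look more like this (numbers are illustrative; the ls/ul values echo your existing 1:20 class):

# ~1mbit steady-state rt guarantee instead of ~5mbit
tc class change dev eth0 parent 1:1 classid 1:20 hfsc \
    rt m1 2mbit d 20ms m2 1mbit ls m2 2722kbit ul m2 9800kbit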

Sorry I missed most of the responses while at work; I must be in a different time zone! :wink:

@dlakelan I'm experimenting with qos-scripts, so I don't claim to know the ins and outs of it yet. HFSC is very confusing, to say the least.

@moeller0 my commands are tc -d -s -g qdisc show dev ethx && tc -d -s -g class show dev ethx

I do think CAKE and HTB are great for bandwidth assurance for media downloads/streaming (which seems to be the larger audience). However, latency-sensitive applications such as VOIP and gaming do suffer from jitter. Gaming has slightly larger packets, ranging from 200-500 bytes on average, which can include video/audio simultaneously. A person streaming Netflix or downloading a large file will not notice the difference between an RTT of 100ms and 200ms; above 200ms some buffering and stuttering will occur. Anything above 150ms and gamers will start to feel upset. I don't feel as if CAKE has those guarantees for latency-sensitive applications. I want my UDP packets moving along with very little queueing compared to a TCP packet that is in no hurry :sunglasses:

Put your qos-scripts config file here and describe how you are identifying traffic; I will tune it up for you :wink: your goals are very similar to mine: guaranteed low latency for low-bandwidth real-time applications, while accepting moderately high latency (on the order of 100ms) for bulk TCP streams.

My goals are as follows:

  • dns,dhcp,ntp 53,67,123
  • gaming, chat & video calls 3074,5222,5223
  • ftp,mail,web,downloads/streaming 20,21,22,25,80,110,443,465,993,995,5222,5223,5228,8080,8888

(my experience is that web browsing and streaming can occupy the same space nicely)

  • bulk (default) 1-65535
root@OpenWrt:/etc/config# cat qos

config classify
	option target 'Priority'
	option proto 'icmp'

config classify
	option proto 'udp'
	option target 'Priority'
	option ports '53,67,123'
	option comment 'dhcp,dns,ntp'

config classify
	option proto 'udp'
	option ports '3074,5222,5223'
	option comment 'call of duty'
	option target 'Express'

config classify
	option target 'Normal'
	option ports '20,21,22,25,80,110,443,465,993,995,5228,8080,8888'
	option comment 'http(s),ftp,mail,chat'

config interface 'wan'
	option classgroup 'Default'
	option enabled '1'
	option download '50000'
	option upload '10000'

config default
	option target 'Express'
	option proto 'udp'
	option pktsize '-500'

config reclassify
	option target 'Priority'
	option proto 'icmp'

config default
	option target 'Bulk'
	option portrange '1024-65535'

config classgroup 'Default'
	option classes 'Priority Express Normal Bulk'
	option default 'Normal'

config class 'Priority'
	option packetsize '400'
	option avgrate '10'
	option priority '20'

config class 'Priority_down'
	option packetsize '1000'
	option avgrate '10'

config class 'Express'
	option packetsize '1000'
	option avgrate '50'
	option priority '10'

config class 'Normal'
	option packetsize '1500'
	option packetdelay '100'
	option avgrate '10'
	option priority '5'

config class 'Normal_down'
	option avgrate '20'

config class 'Bulk'
	option avgrate '1'
	option packetdelay '200'

This is my second day actually having time to do this :sweat_smile:
PS: this is the LuCI rough draft, before I found out that it could be edited further

I did try cake/fq_codel with DSCP markings, which did have a small effect, but not as substantial as HFSC. SQM is still my recommendation for people that want an easy button for most other things!