Simple QoS for VoIP

Plonk34 · January 7, 2018, 4:57pm

Hi there, I need a simple way to prioritize my SIP+RTP connection.
The tc scripts + luci-app are too complex and too resource-intensive for my O2-Box 6431 with Lantiq VRX200 chipset + 64 MB RAM.
In the past I used a simple script that I add (include) to the firewall.

To prioritize outgoing traffic I use the normal three-band priority via the TOS field (0x10, 0x12, 0x14, 0x16 for band 0).
(This can be done in Asterisk.)

To prioritize incoming traffic I use tc + an ingress filter like:

dev=$(uci get network.wan_dsl.ifname)
tc qdisc add dev "$dev" handle ffff: ingress
# prioritize packets that carry the TOS "low delay" flag
tc filter add dev "$dev" parent ffff: protocol ip prio 10 u32 match ip tos 0x10 0xff flowid :0 police pass
tc filter add dev "$dev" parent ffff: protocol ip prio 11 u32 match ip tos 0x12 0xff flowid :0 police pass
# ... same for 0x14 and 0x16
# everything else is policed to 50 Mbit/s, packets above that rate are dropped
tc filter add dev "$dev" parent ffff: protocol ip prio 30 u32 match ip src 0.0.0.0/0 flowid :2 police rate 50mbit burst 500k drop

1.) It worked in the past (LEDE 17.01.4), but now (LEDE snapshot 12/2017) I get this error:

How can I solve this?

2.) I no longer know where I copied this from, and I do not really understand what I am doing here.
2a.) What is the meaning of "prio 10" or "prio 11"?
2b.) What is the meaning of "flowid :0" or "flowid :2"? Are these the bands of the sending queue?

anon57995562 · January 7, 2018, 7:46pm

Snapshots are not considered stable.

tmomas · January 7, 2018, 8:33pm

If it worked in 17.01.4, why have you switched to snapshot in the first place?

anon57995562 · January 7, 2018, 8:38pm

I think some might interpret snapshots as release candidates...

slh · January 7, 2018, 9:38pm

The master branch can only act as a base for a future stable release if it actually gets tested before branching off and tagging the release, so arguing against this usage is counterproductive. Especially as long as it isn't clear whether this is caused by an intentional behaviour change in the netfilter and tc code (it just happened to set a sensible default in the absence of a clear declaration in the past) or by a bug of varying shades (from "netfilter has been changed to require explicit loading (installation) of submodules to achieve certain functionality 'recently'" to a real bug).
Just as a side note, especially in the case of VoIP usage, the stable release might not always be the best choice, be it because of the major changes (improvements) for lantiq post 17.01 or because of the highly security-sensitive nature of Asterisk acting as a VoIP PBX/ATA, with its enormous global attack surface.

Unfortunately I can't really contribute to the underlying question of orchestrating tc.

Plonk34 · January 7, 2018, 10:04pm

I think the reason is the newer version of tc (4.13 instead of 4.4), and I am sure it is not a bug and it will be in the next release as well.
But that does not solve my problem.
My main question is: what is the replacement for

tc ... police pass

?

moeller0 · January 7, 2018, 10:14pm

This looks a bit dated in that it seems to match the full 8 TOS bits, even though two of them (might) have been relegated to do ECN work. I think replacing the 0xff with 0xfc should do the trick.
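
To illustrate what the mask change does: a `match ip tos VALUE MASK` rule only compares the bits selected by the mask. The sketch below mimics that comparison in plain shell arithmetic. `tos_matches` is a made-up helper for illustration, not part of tc, and it assumes the rule value is masked the same way as the packet byte.

```shell
#!/bin/sh
# Hypothetical helper (not a tc command): mimics how a
# "match ip tos VALUE MASK" rule compares a packet's TOS byte.
# Only the bits that are set in MASK take part in the comparison.
tos_matches() {
    # $1 = TOS byte of the packet, $2 = rule value, $3 = rule mask
    [ $(( $1 & $3 )) -eq $(( $2 & $3 )) ]
}

# A packet marked 0x10 that picked up an ECN bit on the way (TOS byte 0x11):
for mask in 0xff 0xfc; do
    if tos_matches 0x11 0x10 "$mask"; then
        echo "mask $mask: match"
    else
        echo "mask $mask: no match"
    fi
done
```

With 0xfc the two low (ECN) bits are ignored, so in this sketch the 0x10 and 0x12 rules become the same comparison, and likewise 0x14 and 0x16.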

anon57995562 · January 7, 2018, 10:32pm

In this case, it appears it was the best choice...the snapshot broke it.

slh · January 7, 2018, 11:37pm

This will be my last remark in this non-productive subthread, but...

a) the snapshot isn't broken, it boots and works sufficiently to debug it or even to revert to an older release, if so desired (aka, not a brick).
b) VoIP works (and as a consequence so does the underlying essential infrastructure, like kernel, vdsl drivers, libc, busybox, uci, routing, wlan, etc.)

What does not work is the custom way he invoked tc to improve the latency for SIP traffic, which is very useful, especially for relatively weak (given the workload targeted here) devices, but not strictly necessary either. Given that this way to shape the traffic is indeed custom and not used (as-is) by any package in LEDE, there isn't really any bug to fix in LEDE either. Accordingly the initial poster didn't even complain about a bug in a non-release build, but asked for suggestions to fix the problem (not LEDE, but the way to invoke light-weight traffic shaping) in the future.

These condescending attacks (here and in the other thread) against users choosing the master branch, in the knowledge that stuff can break, are imho pretty unwarranted, and indeed counterproductive, as you can bet this issue would have appeared just the same if everyone had just waited for Santa, err, 18.xy.1, to appear; that is not how development works. Recommending stable over snapshots does make sense, but it's not always an option (be it because there is no stable support for $DEVICE yet or because there are new $FEATURES, $BUGFIXES, fixed $CVEs in master). In the context of pull requests, the spectrum between advanced users and developers isn't clear-cut, and both depend on early feedback and bug reports to actually fix something, to come up with something that warrants the term 'stable' beyond the mere label. Neither of our posts belongs in this thread, and neither actually contributes to the topic in question. I wouldn't have responded (because I can't help with the traffic shaping aspect) if I hadn't noticed this recent trend to sharply kill off support questions for snapshots without even looking at the actual problem (remember, this is not an "installing a snapshot bricked my router!!!1!!1!" demand for warranty, but a discussion about the behaviour change of the current tc).

For reference, the device in question -like most lantiq devices- is not brickable by software, but rather involved to flash LEDE on - someone who succeeds in that, can deal with any breakage using the master branch might involve (as reverting is always an option, even if not the preferred one).

anon57995562 · January 8, 2018, 12:09am

You are way off base...

No one stated that it bricked the router...it broke the functionality, regardless of custom configuration.
The nature of snapshots is that they can be, and often are, unstable. Depending on the goal of the OP...if you want the functionality back without further debugging, revert to 17.01.4.
No one is discouraging discussion. People are free to post "revert to 17.01.4" as well as the OP replying that "I'd like to look into it more with the snapshot".

It's the OP's call.

Unfortunately, keyboard warriors interpreting other members' intentions is non-productive.

Have a good day.

Plonk34 · January 9, 2018, 7:59pm

OK, sorry for my bad English: I did not mean "prior", I meant "prioritize" (I have corrected it).

@tmomas

The reason is: I need it for the Easybox 904xDSL. For this router there is only a fork of LEDE, which is based on a snapshot, not on 17.01.4.

My workaround is now "... police rate 200mbit burst 500k pass" instead of "police pass".
But it does not make sense, and it costs 1 MB/s of download speed instead of the 0.5 MB/s lost with "police pass" (both values compared to no tc rules at all).

Here is an example for downloading:
unlimited: wget -4 -O /dev/null http://speedtest.wtnet.de/files/1000mb.bin
with max. 20 Mbit/s: wget -4 -O /dev/null http://www.speedtestx.de/testfiles/data_1gb.test
(you have to solve a captcha at http://www.speedtestx.de first)

tc qdisc del dev pppoe-wan_dsl ingress
tc qdisc add dev pppoe-wan_dsl handle ffff: ingress
tc filter add dev pppoe-wan_dsl parent ffff: protocol ip prio 10 u32 match ip src $(nslookup speedtest.wtnet.de | grep "Address [[:digit:]]*:" | grep -E -o "([[:digit:]]{1,3}\.){3}[[:digit:]]{1,3}") flowid :0 police rate 200mbit burst 500k pass
tc filter add dev pppoe-wan_dsl parent ffff: protocol ip prio 10 u32 match ip src $(nslookup www.speedtestx.de | grep "Address [[:digit:]]*:" | grep -E -o "([[:digit:]]{1,3}\.){3}[[:digit:]]{1,3}") flowid :0 police rate 20mbit burst 500k drop
tc qdisc show dev pppoe-wan_dsl

Could it be that, with the new tc, "police" is simply the wrong tool if I only want to pass specific packets through?

moeller0 · January 10, 2018, 7:58am

Why these rather uncommon DSCP values? Why not use the more conventional EF (46), voice admit (44) and CS3 (24)? In that case, e.g. sqm-scripts' layer_cake.qos with manually configured diffserv4 mode:

static int cake_config_diffserv4(struct Qdisc *sch)
{
/*  Further pruned list of traffic classes for four-class system:
 *
 *      Latency Sensitive  (CS7, CS6, EF, VA, CS5, CS4)
 *      Streaming Media    (AF4x, AF3x, CS3, AF2x, TOS4, CS2, TOS1)
 *      Best Effort        (CS0, AF1x, TOS2, and those not specified)
 *      Background Traffic (CS1)
 *
 *              Total 4 traffic classes.
 */

should do the trick, assuming that sch_cake is actually available for your snapshot build.
And if you configure Asterisk yourself, switching its DSCPs should not be an issue.
In this case you might actually be able to use the DSCPs directly from the incoming VoIP packets (assuming they arrive suitably marked).

Plonk34 · January 10, 2018, 9:17pm

The reasons are that DSCP is too complex for me and that my intention was only to set bit 3 in order to use band 0.
I have to understand DSCP first before using it.
A table with all 256 DSCP values vs. the TOS field in decimal, binary and hex + the meanings would be great.

From the sip.conf:

But correct me if I am wrong: tos_audio=ef = 1011 1000 = 0xb8 = TOS 0x18 = minimize delay + maximize throughput = band 1 = bad, because band 0 is better.

Plonk34 · January 10, 2018, 9:25pm

The Latency Sensitive group without VA all have TOS=0x0 or TOS=0x18 = band 1.
VA has TOS 0x14 = minimize delay + maximize reliability = band 0.
Please correct me if I am wrong.

I think I will use tos_audio=va.

moeller0 · January 10, 2018, 9:35pm

There are only 6 bits worth of DSCPs, so just 64 values, the other two bits of the TOS byte are now doing duty for explicit congestion notification (ECN).
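
As a rough sketch of that bit layout (upper six bits DSCP, lower two bits ECN), the following shell helpers convert between a DSCP number and the TOS byte. The function names are invented for this example. The three code points are the ones mentioned earlier in the thread.

```shell
#!/bin/sh
# Illustrative helpers; the names are made up for this sketch.

# DSCP code point (0..63) -> whole TOS byte, with both ECN bits set to 0
dscp_to_tos() {
    printf '0x%02x\n' $(( ($1 & 0x3f) << 2 ))
}

# TOS byte -> its DSCP part (upper 6 bits) and ECN part (lower 2 bits)
split_tos() {
    echo "dscp=$(( ($1 & 0xfc) >> 2 )) ecn=$(( $1 & 0x03 ))"
}

# EF (46), voice admit (44) and CS3 (24) as TOS byte values
for entry in EF:46 VA:44 CS3:24; do
    name=${entry%%:*}
    dscp=${entry##*:}
    echo "$name dscp=$dscp tos=$(dscp_to_tos "$dscp")"
done

# And back again: 0xb8 is EF with no ECN bits set
split_tos 0xb8
```

Looping `dscp_to_tos` over 0..63 would give the complete table, which has 64 rows rather than 256.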

Well, honestly, DSCPs are really only ever defined inside a DSCP domain, and every network is free to (re-)map them to priorities any way it likes, or even change them (and almost no one is using the full 6 bits anyway).
That said, there are a few DSCPs that actually are widely used, for example the values in your table, and with a bit of luck your ISP might actually leave these markings intact for incoming packets (with a packet capture during a VoIP call that should be easy to check).

This does not make much sense, since the code points per se do exactly nothing; it really only matters if an AQM/QoS system actually starts treating packets differently based on the DSCPs. Inside your network you set the rules: nobody forces you to interpret CS1 as lowest-priority background, you could actually use that marking for highest priority. And nobody guarantees that your markings on outgoing packets will a) not be re-mapped to something else on the way and b) actually be interpreted the way you want by network elements in the internet between the endpoints. But again, using the few DSCPs that actually are used in the wild seems to give a better chance of the DSCPs surviving up to the endpoints.

Unfortunately the 6 DSCP bits were not initially separated into two groups of 3 bits: 3 bits for the intended code point and 3 bits for the actual re-mapping in the current DSCP domain. That way endpoints could at least signal intent end to end; but this was not done and now it is too late...

moeller0 · January 10, 2018, 9:37pm

I do not actually know what band 0 or band 1 is supposed to mean. Is band 0 the unrestricted band and band 1 all the rest?

Plonk34 · January 10, 2018, 10:14pm

Hmm, I can't find the link, but:
As I understand it, every Linux network driver has a sending queue with bands 0, 1 and 2.
Band 0 has the highest priority.

And as I understand it, the ToS field is used as described here:
https://www.cs.drexel.edu/cgi-bin/manServer.pl/usr/share/man/man8/tc-prio.8
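
The lookup described on that man page can be sketched in shell as follows. This is only a sketch under stated assumptions: it uses the default priomap "1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1" from tc-prio(8), it ignores the lowest of the four old TOS bits (which overlaps with ECN today), and `tos_to_band` is a made-up name. It only applies where a prio/pfifo_fast style qdisc is actually in use.

```shell
#!/bin/sh
# Sketch of the TOS -> band lookup from tc-prio(8).
# Assumptions: default priomap, and only TOS bits 2..4 are looked at
# (the old "minimize monetary cost" bit is skipped on purpose).
tos_to_band() {
    idx=$(( ($1 >> 2) & 0x07 ))
    # Linux priority per TOS bit pattern:
    # 0 = best effort, 2 = bulk, 6 = interactive, 4 = interactive bulk
    set -- 0 0 2 2 6 6 4 4
    shift "$idx"
    prio=$1
    # default priomap: Linux priority -> band
    set -- 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
    shift "$prio"
    echo "$1"
}

# 0xb8 is the TOS byte for EF, 0xb0 the one for voice admit (VA)
for tos in 0x10 0x14 0x18 0xb8 0xb0; do
    echo "tos=$tos band=$(tos_to_band "$tos")"
done
```

Under these assumptions EF (0xb8) ends up in band 1 and VA (0xb0) in band 0.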

moeller0 · January 10, 2018, 10:31pm

I believe this is only true if you use the pfifo_fast qdisc, which I would not recommend anymore...

Running "tc -d qdisc ; tc -s qdisc" might reveal which qdisc is actually in use.

Also, the strict priority precedence used in those bands is really not that desirable, as band 0 traffic will be able to completely starve the other bands. If possible, try sch_cake with the diffserv4 mode and set Asterisk to use more conventional DSCP markings. The real problem with pfifo_fast is that it is super easy to introduce massive bufferbloat with it, which does not sound like a good idea in general.

Plonk34 · January 10, 2018, 10:56pm

As I understand it, it is enabled by default:
http://www.tldp.org/HOWTO/html_single/Traffic-Control-HOWTO/#qs-pfifo_fast

tc -s qdisc show dev pppoe-wan_dsl
qdisc fq_codel 0: root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn 
 Sent 70365911 bytes 1059986 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0 
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0

But what does it mean?

moeller0 · January 10, 2018, 11:09pm

IIRC, LEDE switched to fq_codel as the default qdisc, so outside of your defined policers there should be no priority bands active, at least not on egress...
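
If you want to check this from a script, a small filter along these lines pulls just the qdisc kind out of the tc output. `qdisc_kinds` is a hypothetical helper name, and the sample line is copied from the output posted above.

```shell
#!/bin/sh
# Hypothetical helper: print the kind of every qdisc found in
# "tc qdisc show" style output read from stdin.
# Statistics lines are indented and do not start with "qdisc",
# so they are skipped.
qdisc_kinds() {
    awk '$1 == "qdisc" { print $2 }'
}

# Sample line taken from the output earlier in this thread
sample='qdisc fq_codel 0: root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn'
printf '%s\n' "$sample" | qdisc_kinds
```

On the router itself it would be fed from tc directly, e.g. `tc qdisc show dev pppoe-wan_dsl | qdisc_kinds`.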